Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next,v4, 4/6] net/mlx5: Add HV VHCA infrastructure
From: Eran Ben Elisha @ 2019-08-22 19:33 UTC (permalink / raw)
  To: Leon Romanovsky, haiyangz
  Cc: sashal@kernel.org, davem@davemloft.net, Saeed Mahameed,
	lorenzo.pieralisi@arm.com, bhelgaas@google.com,
	linux-pci@vger.kernel.org, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, kys, Stephen Hemminger,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190822185847.GP29433@mtr-leonro.mtl.com>



On 8/22/2019 9:58 PM, Leon Romanovsky wrote:
> On Thu, Aug 22, 2019 at 05:05:51AM +0000, Haiyang Zhang wrote:
>> From: Eran Ben Elisha <eranbe@mellanox.com>
>>
>> HV VHCA is a layer which provides PF to VF communication channel based on
>> HyperV PCI config channel. It implements Mellanox's Inter VHCA control
>> communication protocol. The protocol contains control block in order to
>> pass messages between the PF and VF drivers, and data blocks in order to
>> pass actual data.
>>
>> The infrastructure is agent based. Each agent will be responsible of
>> contiguous buffer blocks in the VHCA config space. This infrastructure will
>> bind agents to their blocks, and those agents can only access read/write
>> the buffer blocks assigned to them. Each agent will provide three
>> callbacks (control, invalidate, cleanup). Control will be invoked when
>> block-0 is invalidated with a command that concerns this agent. Invalidate
>> callback will be invoked if one of the blocks assigned to this agent was
>> invalidated. Cleanup will be invoked before the agent is being freed in
>> order to clean all of its open resources or deferred works.
>>
>> Block-0 serves as the control block. All execution commands from the PF
>> will be written by the PF over this block. VF will ack on those by
>> writing on block-0 as well. Its format is described by struct
>> mlx5_hv_vhca_control_block layout.
>>
>> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
>> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
>> ---
>>   drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
>>   .../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c  | 253 +++++++++++++++++++++
>>   .../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h  | 102 +++++++++
>>   drivers/net/ethernet/mellanox/mlx5/core/main.c     |   7 +
>>   include/linux/mlx5/driver.h                        |   2 +
>>   5 files changed, 365 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
>>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>> index fd32a5b..8d443fc 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>> @@ -45,7 +45,7 @@ mlx5_core-$(CONFIG_MLX5_ESWITCH)   += eswitch.o eswitch_offloads.o eswitch_offlo
>>   mlx5_core-$(CONFIG_MLX5_MPFS)      += lib/mpfs.o
>>   mlx5_core-$(CONFIG_VXLAN)          += lib/vxlan.o
>>   mlx5_core-$(CONFIG_PTP_1588_CLOCK) += lib/clock.o
>> -mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += lib/hv.o
>> +mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += lib/hv.o lib/hv_vhca.o
>>
>>   #
>>   # Ipoib netdev
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
>> new file mode 100644
>> index 0000000..84d1d75
>> --- /dev/null
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
>> @@ -0,0 +1,253 @@
>> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
>> +// Copyright (c) 2018 Mellanox Technologies
>> +
>> +#include <linux/hyperv.h>
>> +#include "mlx5_core.h"
>> +#include "lib/hv.h"
>> +#include "lib/hv_vhca.h"
>> +
>> +struct mlx5_hv_vhca {
>> +	struct mlx5_core_dev       *dev;
>> +	struct workqueue_struct    *work_queue;
>> +	struct mlx5_hv_vhca_agent  *agents[MLX5_HV_VHCA_AGENT_MAX];
>> +	struct mutex                agents_lock; /* Protect agents array */
>> +};
>> +
>> +struct mlx5_hv_vhca_work {
>> +	struct work_struct     invalidate_work;
>> +	struct mlx5_hv_vhca   *hv_vhca;
>> +	u64                    block_mask;
>> +};
>> +
>> +struct mlx5_hv_vhca_data_block {
>> +	u16     sequence;
>> +	u16     offset;
>> +	u8      reserved[4];
>> +	u64     data[15];
>> +};
>> +
>> +struct mlx5_hv_vhca_agent {
>> +	enum mlx5_hv_vhca_agent_type	 type;
>> +	struct mlx5_hv_vhca		*hv_vhca;
>> +	void				*priv;
>> +	u16                              seq;
>> +	void (*control)(struct mlx5_hv_vhca_agent *agent,
>> +			struct mlx5_hv_vhca_control_block *block);
>> +	void (*invalidate)(struct mlx5_hv_vhca_agent *agent,
>> +			   u64 block_mask);
>> +	void (*cleanup)(struct mlx5_hv_vhca_agent *agent);
>> +};
>> +
>> +struct mlx5_hv_vhca *mlx5_hv_vhca_create(struct mlx5_core_dev *dev)
>> +{
>> +	struct mlx5_hv_vhca *hv_vhca = NULL;
>> +
>> +	hv_vhca = kzalloc(sizeof(*hv_vhca), GFP_KERNEL);
>> +	if (!hv_vhca)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	hv_vhca->work_queue = create_singlethread_workqueue("mlx5_hv_vhca");
> 
> I was under impression that usage of create_* interfaces is discouraged,
> It has WQ_MEMORY_LEGACY flag inside and commit b71ab8c2025ca talks about
> this interface as legacy one.

mlx5 driver has dozens of singlethread workqueues. A general effort to 
remove them can be done later.

> 
>> +	if (!hv_vhca->work_queue) {
>> +		kfree(hv_vhca);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +
>> +	hv_vhca->dev = dev;
>> +	mutex_init(&hv_vhca->agents_lock);
>> +
>> +	return hv_vhca;
>> +}
>> +
>> +void mlx5_hv_vhca_destroy(struct mlx5_hv_vhca *hv_vhca)
>> +{
>> +	if (IS_ERR_OR_NULL(hv_vhca))
>> +		return;
>> +
>> +	destroy_workqueue(hv_vhca->work_queue);
>> +	kfree(hv_vhca);
>> +}
>> +
>> +static void mlx5_hv_vhca_invalidate_work(struct work_struct *work)
>> +{
>> +	struct mlx5_hv_vhca_work *hwork;
>> +	struct mlx5_hv_vhca *hv_vhca;
>> +	int i;
>> +
>> +	hwork = container_of(work, struct mlx5_hv_vhca_work, invalidate_work);
>> +	hv_vhca = hwork->hv_vhca;
>> +
>> +	mutex_lock(&hv_vhca->agents_lock);
>> +	for (i = 0; i < MLX5_HV_VHCA_AGENT_MAX; i++) {
>> +		struct mlx5_hv_vhca_agent *agent = hv_vhca->agents[i];
>> +
>> +		if (!agent || !agent->invalidate)
>> +			continue;
>> +
>> +		if (!(BIT(agent->type) & hwork->block_mask))
>> +			continue;
>> +
>> +		agent->invalidate(agent, hwork->block_mask);
>> +	}
>> +	mutex_unlock(&hv_vhca->agents_lock);
>> +
>> +	kfree(hwork);
>> +}
>> +
>> +void mlx5_hv_vhca_invalidate(void *context, u64 block_mask)
>> +{
>> +	struct mlx5_hv_vhca *hv_vhca = (struct mlx5_hv_vhca *)context;
>> +	struct mlx5_hv_vhca_work *work;
>> +
>> +	work = kzalloc(sizeof(*work), GFP_ATOMIC);
>> +	if (!work)
>> +		return;
>> +
>> +	INIT_WORK(&work->invalidate_work, mlx5_hv_vhca_invalidate_work);
>> +	work->hv_vhca    = hv_vhca;
>> +	work->block_mask = block_mask;
>> +
>> +	queue_work(hv_vhca->work_queue, &work->invalidate_work);
>> +}
>> +
>> +int mlx5_hv_vhca_init(struct mlx5_hv_vhca *hv_vhca)
>> +{
>> +	if (IS_ERR_OR_NULL(hv_vhca))
>> +		return IS_ERR_OR_NULL(hv_vhca);
>> +
>> +	return mlx5_hv_register_invalidate(hv_vhca->dev, hv_vhca,
>> +					   mlx5_hv_vhca_invalidate);
>> +}
>> +
>> +void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca)
>> +{
>> +	int i;
>> +
>> +	if (IS_ERR_OR_NULL(hv_vhca))
>> +		return;
>> +
>> +	mutex_lock(&hv_vhca->agents_lock);
>> +	for (i = 0; i < MLX5_HV_VHCA_AGENT_MAX; i++)
>> +		WARN_ON(hv_vhca->agents[i]);
>> +
>> +	mutex_unlock(&hv_vhca->agents_lock);
>> +
>> +	mlx5_hv_unregister_invalidate(hv_vhca->dev);
>> +}
>> +
>> +struct mlx5_hv_vhca_agent *
>> +mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
>> +			  enum mlx5_hv_vhca_agent_type type,
>> +			  void (*control)(struct mlx5_hv_vhca_agent*,
>> +					  struct mlx5_hv_vhca_control_block *block),
>> +			  void (*invalidate)(struct mlx5_hv_vhca_agent*,
>> +					     u64 block_mask),
>> +			  void (*cleaup)(struct mlx5_hv_vhca_agent *agent),
>> +			  void *priv)
>> +{
>> +	struct mlx5_hv_vhca_agent *agent;
>> +
>> +	if (IS_ERR_OR_NULL(hv_vhca))
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	if (type >= MLX5_HV_VHCA_AGENT_MAX)
>> +		return ERR_PTR(-EINVAL);
>> +
>> +	mutex_lock(&hv_vhca->agents_lock);
>> +	if (hv_vhca->agents[type]) {
>> +		mutex_unlock(&hv_vhca->agents_lock);
>> +		return ERR_PTR(-EINVAL);
>> +	}
>> +	mutex_unlock(&hv_vhca->agents_lock);
>> +
>> +	agent = kzalloc(sizeof(*agent), GFP_KERNEL);
>> +	if (!agent)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	agent->type      = type;
>> +	agent->hv_vhca   = hv_vhca;
>> +	agent->priv      = priv;
>> +	agent->control   = control;
>> +	agent->invalidate = invalidate;
>> +	agent->cleanup   = cleaup;
>> +
>> +	mutex_lock(&hv_vhca->agents_lock);
>> +	hv_vhca->agents[type] = agent;
>> +	mutex_unlock(&hv_vhca->agents_lock);
>> +
>> +	return agent;
>> +}
>> +
>> +void mlx5_hv_vhca_agent_destroy(struct mlx5_hv_vhca_agent *agent)
>> +{
>> +	struct mlx5_hv_vhca *hv_vhca = agent->hv_vhca;
>> +
>> +	mutex_lock(&hv_vhca->agents_lock);
>> +
>> +	if (WARN_ON(agent != hv_vhca->agents[agent->type])) {
>> +		mutex_unlock(&hv_vhca->agents_lock);
>> +		return;
>> +	}
>> +
>> +	hv_vhca->agents[agent->type] = NULL;
>> +	mutex_unlock(&hv_vhca->agents_lock);
>> +
>> +	if (agent->cleanup)
>> +		agent->cleanup(agent);
>> +
>> +	kfree(agent);
>> +}
>> +
>> +static int mlx5_hv_vhca_data_block_prepare(struct mlx5_hv_vhca_agent *agent,
>> +					   struct mlx5_hv_vhca_data_block *data_block,
>> +					   void *src, int len, int *offset)
>> +{
>> +	int bytes = min_t(int, (int)sizeof(data_block->data), len);
>> +
>> +	data_block->sequence = agent->seq;
>> +	data_block->offset   = (*offset)++;
>> +	memcpy(data_block->data, src, bytes);
>> +
>> +	return bytes;
>> +}
>> +
>> +static void mlx5_hv_vhca_agent_seq_update(struct mlx5_hv_vhca_agent *agent)
>> +{
>> +	agent->seq++;
>> +}
>> +
>> +int mlx5_hv_vhca_agent_write(struct mlx5_hv_vhca_agent *agent,
>> +			     void *buf, int len)
>> +{
>> +	int offset = agent->type * HV_CONFIG_BLOCK_SIZE_MAX;
>> +	int block_offset = 0;
>> +	int total = 0;
>> +	int err;
>> +
>> +	while (len) {
>> +		struct mlx5_hv_vhca_data_block data_block = {0};
>> +		int bytes;
>> +
>> +		bytes = mlx5_hv_vhca_data_block_prepare(agent, &data_block,
>> +							buf + total,
>> +							len, &block_offset);
>> +		if (!bytes)
>> +			return -ENOMEM;
>> +
>> +		err = mlx5_hv_write_config(agent->hv_vhca->dev, &data_block,
>> +					   sizeof(data_block), offset);
>> +		if (err)
>> +			return err;
>> +
>> +		total += bytes;
>> +		len   -= bytes;
>> +	}
>> +
>> +	mlx5_hv_vhca_agent_seq_update(agent);
>> +
>> +	return 0;
>> +}
>> +
>> +void *mlx5_hv_vhca_agent_priv(struct mlx5_hv_vhca_agent *agent)
>> +{
>> +	return agent->priv;
>> +}
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
>> new file mode 100644
>> index 0000000..cdf1303
>> --- /dev/null
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
>> @@ -0,0 +1,102 @@
>> +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
>> +/* Copyright (c) 2019 Mellanox Technologies. */
>> +
>> +#ifndef __LIB_HV_VHCA_H__
>> +#define __LIB_HV_VHCA_H__
>> +
>> +#include "en.h"
>> +#include "lib/hv.h"
>> +
>> +struct mlx5_hv_vhca_agent;
>> +struct mlx5_hv_vhca;
>> +struct mlx5_hv_vhca_control_block;
>> +
>> +enum mlx5_hv_vhca_agent_type {
>> +	MLX5_HV_VHCA_AGENT_MAX = 32,
>> +};
>> +
>> +#if IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
>> +
>> +struct mlx5_hv_vhca_control_block {
>> +	u32     capabilities;
>> +	u32     control;
>> +	u16     command;
>> +	u16     command_ack;
>> +	u16     version;
>> +	u16     rings;
>> +	u32     reserved1[28];
>> +};
>> +
>> +struct mlx5_hv_vhca *mlx5_hv_vhca_create(struct mlx5_core_dev *dev);
>> +void mlx5_hv_vhca_destroy(struct mlx5_hv_vhca *hv_vhca);
>> +int mlx5_hv_vhca_init(struct mlx5_hv_vhca *hv_vhca);
>> +void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca);
>> +void mlx5_hv_vhca_invalidate(void *context, u64 block_mask);
>> +
>> +struct mlx5_hv_vhca_agent *
>> +mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
>> +			  enum mlx5_hv_vhca_agent_type type,
>> +			  void (*control)(struct mlx5_hv_vhca_agent*,
>> +					  struct mlx5_hv_vhca_control_block *block),
>> +			  void (*invalidate)(struct mlx5_hv_vhca_agent*,
>> +					     u64 block_mask),
>> +			  void (*cleanup)(struct mlx5_hv_vhca_agent *agent),
>> +			  void *context);
>> +
>> +void mlx5_hv_vhca_agent_destroy(struct mlx5_hv_vhca_agent *agent);
>> +int mlx5_hv_vhca_agent_write(struct mlx5_hv_vhca_agent *agent,
>> +			     void *buf, int len);
>> +void *mlx5_hv_vhca_agent_priv(struct mlx5_hv_vhca_agent *agent);
>> +
>> +#else
>> +
>> +static inline struct mlx5_hv_vhca *
>> +mlx5_hv_vhca_create(struct mlx5_core_dev *dev)
>> +{
>> +	return NULL;
>> +}
>> +
>> +static inline void mlx5_hv_vhca_destroy(struct mlx5_hv_vhca *hv_vhca)
>> +{
>> +}
>> +
>> +static inline int mlx5_hv_vhca_init(struct mlx5_hv_vhca *hv_vhca)
>> +{
>> +	return 0;
>> +}
>> +
>> +static inline void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca)
>> +{
>> +}
>> +
>> +static inline void mlx5_hv_vhca_invalidate(void *context,
>> +					   u64 block_mask)
>> +{
>> +}
>> +
>> +static inline struct mlx5_hv_vhca_agent *
>> +mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
>> +			  enum mlx5_hv_vhca_agent_type type,
>> +			  void (*control)(struct mlx5_hv_vhca_agent*,
>> +					  struct mlx5_hv_vhca_control_block *block),
>> +			  void (*invalidate)(struct mlx5_hv_vhca_agent*,
>> +					     u64 block_mask),
>> +			  void (*cleanup)(struct mlx5_hv_vhca_agent *agent),
>> +			  void *context)
>> +{
>> +	return NULL;
>> +}
>> +
>> +static inline void mlx5_hv_vhca_agent_destroy(struct mlx5_hv_vhca_agent *agent)
>> +{
>> +}
>> +
>> +static inline int
>> +mlx5_hv_vhca_write_agent(struct mlx5_hv_vhca_agent *agent,
>> +			 void *buf, int len)
>> +{
>> +	return 0;
>> +}
>> +#endif
>> +
>> +#endif /* __LIB_HV_VHCA_H__ */
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> index 0b70b1d..61388ca 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> @@ -69,6 +69,7 @@
>>   #include "lib/pci_vsc.h"
>>   #include "diag/fw_tracer.h"
>>   #include "ecpf.h"
>> +#include "lib/hv_vhca.h"
>>
>>   MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>");
>>   MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) core driver");
>> @@ -870,6 +871,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
>>   	}
>>
>>   	dev->tracer = mlx5_fw_tracer_create(dev);
>> +	dev->hv_vhca = mlx5_hv_vhca_create(dev);
>>
>>   	return 0;
>>
>> @@ -900,6 +902,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
>>
>>   static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
>>   {
>> +	mlx5_hv_vhca_destroy(dev->hv_vhca);
>>   	mlx5_fw_tracer_destroy(dev->tracer);
>>   	mlx5_fpga_cleanup(dev);
>>   	mlx5_eswitch_cleanup(dev->priv.eswitch);
>> @@ -1067,6 +1070,8 @@ static int mlx5_load(struct mlx5_core_dev *dev)
>>   		goto err_fw_tracer;
>>   	}
>>
>> +	mlx5_hv_vhca_init(dev->hv_vhca);
> 
> What is the point to declare this function as "int ..." if you are not
> interested in result?
> 
>> +
>>   	err = mlx5_fpga_device_start(dev);
>>   	if (err) {
>>   		mlx5_core_err(dev, "fpga device start failed %d\n", err);
>> @@ -1122,6 +1127,7 @@ static int mlx5_load(struct mlx5_core_dev *dev)
>>   err_ipsec_start:
>>   	mlx5_fpga_device_stop(dev);
>>   err_fpga_start:
>> +	mlx5_hv_vhca_cleanup(dev->hv_vhca);
>>   	mlx5_fw_tracer_cleanup(dev->tracer);
>>   err_fw_tracer:
>>   	mlx5_eq_table_destroy(dev);
>> @@ -1142,6 +1148,7 @@ static void mlx5_unload(struct mlx5_core_dev *dev)
>>   	mlx5_accel_ipsec_cleanup(dev);
>>   	mlx5_accel_tls_cleanup(dev);
>>   	mlx5_fpga_device_stop(dev);
>> +	mlx5_hv_vhca_cleanup(dev->hv_vhca);
>>   	mlx5_fw_tracer_cleanup(dev->tracer);
>>   	mlx5_eq_table_destroy(dev);
>>   	mlx5_irq_table_destroy(dev);
>> diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
>> index df23f17..13b4cf2 100644
>> --- a/include/linux/mlx5/driver.h
>> +++ b/include/linux/mlx5/driver.h
>> @@ -659,6 +659,7 @@ struct mlx5_clock {
>>   struct mlx5_fw_tracer;
>>   struct mlx5_vxlan;
>>   struct mlx5_geneve;
>> +struct mlx5_hv_vhca;
>>
>>   struct mlx5_core_dev {
>>   	struct device *device;
>> @@ -706,6 +707,7 @@ struct mlx5_core_dev {
>>   	struct mlx5_ib_clock_info  *clock_info;
>>   	struct mlx5_fw_tracer   *tracer;
>>   	u32                      vsc_addr;
>> +	struct mlx5_hv_vhca	*hv_vhca;
>>   };
>>
>>   struct mlx5_db {
>> --
>> 1.8.3.1
>>

^ permalink raw reply

* Re: [PATCH] nexthops: remove redundant assignment to variable err
From: David Miller @ 2019-08-22 19:14 UTC (permalink / raw)
  To: colin.king
  Cc: dsahern, kuznet, yoshfuji, netdev, kernel-janitors, linux-kernel
In-Reply-To: <20190822125340.30783-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Thu, 22 Aug 2019 13:53:40 +0100

> From: Colin Ian King <colin.king@canonical.com>
> 
> Variable err is initialized to a value that is never read and it is
> re-assigned later. The initialization is redundant and can be removed.
> 
> Addresses-Coverity: ("Unused Value")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH net 0/9] rxrpc: Fix use of skb_cow_data()
From: David Miller @ 2019-08-22 19:12 UTC (permalink / raw)
  To: dhowells; +Cc: netdev, linux-afs, linux-kernel
In-Reply-To: <156647655350.10908.12081183247715153431.stgit@warthog.procyon.org.uk>

From: David Howells <dhowells@redhat.com>
Date: Thu, 22 Aug 2019 13:22:33 +0100

> Here's a series of patches that fixes the use of skb_cow_data() in rxrpc.
> The problem is that skb_cow_data() indirectly requires that the maximum
> usage count on an sk_buff be 1, and it may generate an assertion failure in
> pskb_expand_head() if not.

It sounds like you are effectively doing a late unshare when you have to
do in-place encryption.

Why don't you just do an skb_unshare() at the beginning when you know that
you'll need to do that?

^ permalink raw reply

* Re: [net] devlink: Add method for time-stamp on reporter's dump
From: Arnd Bergmann @ 2019-08-22 19:11 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Aya Levin, David S. Miller, Jiri Pirko, Networking,
	Linux Kernel Mailing List
In-Reply-To: <20190822174037.GA18030@splinter>

On Thu, Aug 22, 2019 at 7:40 PM Ido Schimmel <idosch@idosch.org> wrote:
> On Thu, Aug 22, 2019 at 04:06:35PM +0200, Andrew Lunn wrote:
> > On Thu, Aug 22, 2019 at 11:17:51AM +0300, Aya Levin wrote:
> > > When setting the dump's time-stamp, use ktime_get_real in addition to
> > > jiffies. This simplifies the user space implementation and bypasses
> > > some inconsistent behavior with translating jiffies to current time.
> >
> > Is this year 2038 safe? I don't know enough about this to answer the
> > question myself.
>
> Good point. 'struct timespec' is not considered year 2038 safe and
> unfortunately I recently made the mistake of using it to communicate
> timestamps to user space over netlink. :/ The code is still in net-next,
> so I will fix it while I can.
>
> Arnd, would it be acceptable to use 'struct __kernel_timespec' instead?

The in-kernel representation should just use 'timespec64' if you need
separate seconds and nanoseconds, you can convert that to
__kernel_timespec while copying to user space.

However, please consider two other points:

- for simplicity, the general recommendation is to use 64-bit nanoseconds
  without separate seconds for timestamps
- instead of CLOCK_REALTIME, you could use CLOCK_MONOTONIC
  timestamps that are not affected by clock_settime() or leap second jumps.

      Arnd

^ permalink raw reply

* [PATCH 0/3] Add NETIF_F_HW_BRIDGE feature
From: Horatiu Vultur @ 2019-08-22 19:07 UTC (permalink / raw)
  To: roopa, nikolay, davem, UNGLinuxDriver, alexandre.belloni,
	allan.nielsen, netdev, linux-kernel, bridge
  Cc: Horatiu Vultur

Current implementation of the SW bridge is setting the interfaces in
promisc mode when they are added to bridge if learning of the frames is
enabled.
In case of Ocelot which has HW capabilities to switch frames, it is not
needed to set the ports in promisc mode because the HW already capable of
doing that. Therefore add NETIF_F_HW_BRIDGE feature to indicate that the
HW has bridge capabilities. Therefore the SW bridge doesn't need to set
the ports in promisc mode to do the switching.
This optimization takes places only if all the interfaces that are part
of the bridge have this flag and have the same network driver.

If the bridge interfaces is added in promisc mode then also the ports part
of the bridge are set in promisc mode.

Horatiu Vultur (3):
  net: Add HW_BRIDGE offload feature
  net: mscc: Use NETIF_F_HW_BRIDGE
  net: mscc: Implement promisc mode.

 drivers/net/ethernet/mscc/ocelot.c | 26 ++++++++++++++++++++++++--
 include/linux/netdev_features.h    |  3 +++
 net/bridge/br_if.c                 | 29 ++++++++++++++++++++++++++++-
 net/core/ethtool.c                 |  1 +
 4 files changed, 56 insertions(+), 3 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [PATCH 1/3] net: Add HW_BRIDGE offload feature
From: Horatiu Vultur @ 2019-08-22 19:07 UTC (permalink / raw)
  To: roopa, nikolay, davem, UNGLinuxDriver, alexandre.belloni,
	allan.nielsen, netdev, linux-kernel, bridge
  Cc: Horatiu Vultur
In-Reply-To: <1566500850-6247-1-git-send-email-horatiu.vultur@microchip.com>

This patch adds a netdev feature to configure the HW as a switch.
The purpose of this flag is to show that the hardware can do switching
of the frames.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
---
 include/linux/netdev_features.h |  3 +++
 net/bridge/br_if.c              | 29 ++++++++++++++++++++++++++++-
 net/core/ethtool.c              |  1 +
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 4b19c54..34274de 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -78,6 +78,8 @@ enum {
 	NETIF_F_HW_TLS_TX_BIT,		/* Hardware TLS TX offload */
 	NETIF_F_HW_TLS_RX_BIT,		/* Hardware TLS RX offload */
 
+	NETIF_F_HW_BRIDGE_BIT,		/* Bridge offload support */
+
 	NETIF_F_GRO_HW_BIT,		/* Hardware Generic receive offload */
 	NETIF_F_HW_TLS_RECORD_BIT,	/* Offload TLS record */
 
@@ -150,6 +152,7 @@ enum {
 #define NETIF_F_GSO_UDP_L4	__NETIF_F(GSO_UDP_L4)
 #define NETIF_F_HW_TLS_TX	__NETIF_F(HW_TLS_TX)
 #define NETIF_F_HW_TLS_RX	__NETIF_F(HW_TLS_RX)
+#define NETIF_F_HW_BRIDGE	__NETIF_F(HW_BRIDGE)
 
 /* Finds the next feature with the highest number of the range of start till 0.
  */
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 4fe30b1..975a34c 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -127,6 +127,31 @@ static void br_port_clear_promisc(struct net_bridge_port *p)
 	p->flags &= ~BR_PROMISC;
 }
 
+/* Determin if the SW bridge can be offloaded to HW. Return true if all
+ * the interfaces of the bridge have the feature NETIF_F_HW_SWITCHDEV set
+ * and have the same netdev_ops.
+ */
+static int br_hw_offload(struct net_bridge *br)
+{
+	const struct net_device_ops *ops = NULL;
+	struct net_bridge_port *p;
+
+	list_for_each_entry(p, &br->port_list, list) {
+		if (!(p->dev->hw_features & NETIF_F_HW_BRIDGE))
+			return 0;
+
+		if (!ops) {
+			ops = p->dev->netdev_ops;
+			continue;
+		}
+
+		if (ops != p->dev->netdev_ops)
+			return 0;
+	}
+
+	return 1;
+}
+
 /* When a port is added or removed or when certain port flags
  * change, this function is called to automatically manage
  * promiscuity setting of all the bridge ports.  We are always called
@@ -134,6 +159,7 @@ static void br_port_clear_promisc(struct net_bridge_port *p)
  */
 void br_manage_promisc(struct net_bridge *br)
 {
+	bool hw_offload = br_hw_offload(br);
 	struct net_bridge_port *p;
 	bool set_all = false;
 
@@ -161,7 +187,8 @@ void br_manage_promisc(struct net_bridge *br)
 			    (br->auto_cnt == 1 && br_auto_port(p)))
 				br_port_clear_promisc(p);
 			else
-				br_port_set_promisc(p);
+				if (!hw_offload)
+					br_port_set_promisc(p);
 		}
 	}
 }
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 6288e69..4e224fe 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -111,6 +111,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
 	[NETIF_F_HW_TLS_RECORD_BIT] =	"tls-hw-record",
 	[NETIF_F_HW_TLS_TX_BIT] =	 "tls-hw-tx-offload",
 	[NETIF_F_HW_TLS_RX_BIT] =	 "tls-hw-rx-offload",
+	[NETIF_F_HW_BRIDGE_BIT] =	 "hw-bridge-offload",
 };
 
 static const char
-- 
2.7.4


^ permalink raw reply related

* [PATCH 2/3] net: mscc: Use NETIF_F_HW_BRIDGE
From: Horatiu Vultur @ 2019-08-22 19:07 UTC (permalink / raw)
  To: roopa, nikolay, davem, UNGLinuxDriver, alexandre.belloni,
	allan.nielsen, netdev, linux-kernel, bridge
  Cc: Horatiu Vultur
In-Reply-To: <1566500850-6247-1-git-send-email-horatiu.vultur@microchip.com>

Enable HW_BRIDGE feature for ocelot. In this way the HW will do all the
switching of the frames so it is not needed for the ports to be in promisc
mode.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
---
 drivers/net/ethernet/mscc/ocelot.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index 4d1bce4..c9cf2bee 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -2017,8 +2017,10 @@ int ocelot_probe_port(struct ocelot *ocelot, u8 port,
 	dev->ethtool_ops = &ocelot_ethtool_ops;
 
 	dev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_RXFCS |
-		NETIF_F_HW_TC;
-	dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_TC;
+		NETIF_F_HW_TC | NETIF_F_HW_BRIDGE;
+	dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_TC |
+		NETIF_F_HW_BRIDGE;
+	dev->priv_flags |= IFF_UNICAST_FLT;
 
 	memcpy(dev->dev_addr, ocelot->base_mac, ETH_ALEN);
 	dev->dev_addr[ETH_ALEN - 1] += port;
-- 
2.7.4


^ permalink raw reply related

* [PATCH 3/3] net: mscc: Implement promisc mode.
From: Horatiu Vultur @ 2019-08-22 19:07 UTC (permalink / raw)
  To: roopa, nikolay, davem, UNGLinuxDriver, alexandre.belloni,
	allan.nielsen, netdev, linux-kernel, bridge
  Cc: Horatiu Vultur
In-Reply-To: <1566500850-6247-1-git-send-email-horatiu.vultur@microchip.com>

Before when a port was added to a bridge then the port was added in
promisc mode. But because of the patches:
commit 6657c3d812dc5d ("net: Add HW_BRIDGE offload feature")
commit e2e3678c292f9c (net: mscc: Use NETIF_F_HW_BRIDGE")

the port is not needed to be in promisc mode to be part of the bridge.
So it is possible to togle the promisc mode of the port even if it is or
not part of the bridge.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
---
 drivers/net/ethernet/mscc/ocelot.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index c9cf2bee..9fa97fe 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -691,6 +691,25 @@ static void ocelot_set_rx_mode(struct net_device *dev)
 	__dev_mc_sync(dev, ocelot_mc_sync, ocelot_mc_unsync);
 }
 
+static void ocelot_change_rx_flags(struct net_device *dev, int flags)
+{
+	struct ocelot_port *port = netdev_priv(dev);
+	struct ocelot *ocelot = port->ocelot;
+	u32 val;
+
+	if (!(flags & IFF_PROMISC))
+		return;
+
+	val = ocelot_read_gix(ocelot, ANA_PORT_CPU_FWD_CFG,
+			      port->chip_port);
+	if (dev->flags & IFF_PROMISC)
+		val |= ANA_PORT_CPU_FWD_CFG_CPU_SRC_COPY_ENA;
+	else
+		val &= ~(ANA_PORT_CPU_FWD_CFG_CPU_SRC_COPY_ENA);
+
+	ocelot_write_gix(ocelot, val, ANA_PORT_CPU_FWD_CFG, port->chip_port);
+}
+
 static int ocelot_port_get_phys_port_name(struct net_device *dev,
 					  char *buf, size_t len)
 {
@@ -1070,6 +1089,7 @@ static const struct net_device_ops ocelot_port_netdev_ops = {
 	.ndo_stop			= ocelot_port_stop,
 	.ndo_start_xmit			= ocelot_port_xmit,
 	.ndo_set_rx_mode		= ocelot_set_rx_mode,
+	.ndo_change_rx_flags		= ocelot_change_rx_flags,
 	.ndo_get_phys_port_name		= ocelot_port_get_phys_port_name,
 	.ndo_set_mac_address		= ocelot_port_set_mac_address,
 	.ndo_get_stats64		= ocelot_get_stats64,
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH 0/3] rework netlink skb allocation
From: David Miller @ 2019-08-22 19:04 UTC (permalink / raw)
  To: jan.dakinevich
  Cc: linux-kernel, den, khorenko, jan.dakinevich, kuznet, yoshfuji,
	pablo, kadlec, fw, johannes.berg, dsahern, christian, stephen,
	Jason, jakub.kicinski, willemb, xiyou.wangcong, simon.horman,
	john.hurley, pabeni, brouer, bigeasy, edumazet, lirongqing,
	ap420073, ptalbert, herbert, tglx, dima, netdev, netfilter-devel,
	coreteam
In-Reply-To: <1566470851-4694-1-git-send-email-jan.dakinevich@virtuozzo.com>

From: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Date: Thu, 22 Aug 2019 10:48:08 +0000

> Currently, userspace is able to initiate costly high-order allocation in 
> kernel sending large broadcast netlink message, which is considered 
> undesirable. At the same time, unicast message are safe in this regard, 
> because they uses vmalloc-ed memory.
> 
> This series introduces changes, that allow broadcast messages to be 
> allocated with vmalloc() as well as unicast.

I'm tossing this series for the same reason I tossed the AF_UNIX change.

^ permalink raw reply

* Re: [PATCH] af_unix: utilize skb's fragment list for sending large datagrams
From: David Miller @ 2019-08-22 19:04 UTC (permalink / raw)
  To: jan.dakinevich
  Cc: linux-kernel, den, khorenko, pabeni, viro, axboe, hare, kgraul,
	kyeongdon.kim, tglx, netdev
In-Reply-To: <1566470311-4089-1-git-send-email-jan.dakinevich@virtuozzo.com>

From: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Date: Thu, 22 Aug 2019 10:38:39 +0000

> However, paged part can not exceed MAX_SKB_FRAGS * PAGE_SIZE, and large
> datagram causes increasing skb's data buffer. Thus, if any user-space
> program sets send buffer (by calling setsockopt(SO_SNDBUF, ...)) to
> maximum allowed size (wmem_max) it becomes able to cause any amount
> of uncontrolled high-order kernel allocations.

So?  You want huge SKBs you get the high order allocations, seems
rather reasonable to me.

SKBs using fragment lists are the most difficult and cpu intensive
geometry for an SKB to have and we should avoid using it where
feasible.

I don't want to apply this, sorry.

^ permalink raw reply

* Re: [PATCH net-next,v4, 4/6] net/mlx5: Add HV VHCA infrastructure
From: Leon Romanovsky @ 2019-08-22 18:58 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: sashal@kernel.org, davem@davemloft.net, saeedm@mellanox.com,
	eranbe@mellanox.com, lorenzo.pieralisi@arm.com,
	bhelgaas@google.com, linux-pci@vger.kernel.org,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
	KY Srinivasan, Stephen Hemminger, linux-kernel@vger.kernel.org
In-Reply-To: <1566450236-36757-5-git-send-email-haiyangz@microsoft.com>

On Thu, Aug 22, 2019 at 05:05:51AM +0000, Haiyang Zhang wrote:
> From: Eran Ben Elisha <eranbe@mellanox.com>
>
> HV VHCA is a layer which provides PF to VF communication channel based on
> HyperV PCI config channel. It implements Mellanox's Inter VHCA control
> communication protocol. The protocol contains control block in order to
> pass messages between the PF and VF drivers, and data blocks in order to
> pass actual data.
>
> The infrastructure is agent based. Each agent will be responsible of
> contiguous buffer blocks in the VHCA config space. This infrastructure will
> bind agents to their blocks, and those agents can only access read/write
> the buffer blocks assigned to them. Each agent will provide three
> callbacks (control, invalidate, cleanup). Control will be invoked when
> block-0 is invalidated with a command that concerns this agent. Invalidate
> callback will be invoked if one of the blocks assigned to this agent was
> invalidated. Cleanup will be invoked before the agent is being freed in
> order to clean all of its open resources or deferred works.
>
> Block-0 serves as the control block. All execution commands from the PF
> will be written by the PF over this block. VF will ack on those by
> writing on block-0 as well. Its format is described by struct
> mlx5_hv_vhca_control_block layout.
>
> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
>  .../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c  | 253 +++++++++++++++++++++
>  .../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h  | 102 +++++++++
>  drivers/net/ethernet/mellanox/mlx5/core/main.c     |   7 +
>  include/linux/mlx5/driver.h                        |   2 +
>  5 files changed, 365 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
>  create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> index fd32a5b..8d443fc 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> @@ -45,7 +45,7 @@ mlx5_core-$(CONFIG_MLX5_ESWITCH)   += eswitch.o eswitch_offloads.o eswitch_offlo
>  mlx5_core-$(CONFIG_MLX5_MPFS)      += lib/mpfs.o
>  mlx5_core-$(CONFIG_VXLAN)          += lib/vxlan.o
>  mlx5_core-$(CONFIG_PTP_1588_CLOCK) += lib/clock.o
> -mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += lib/hv.o
> +mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += lib/hv.o lib/hv_vhca.o
>
>  #
>  # Ipoib netdev
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
> new file mode 100644
> index 0000000..84d1d75
> --- /dev/null
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
> @@ -0,0 +1,253 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +// Copyright (c) 2018 Mellanox Technologies
> +
> +#include <linux/hyperv.h>
> +#include "mlx5_core.h"
> +#include "lib/hv.h"
> +#include "lib/hv_vhca.h"
> +
> +struct mlx5_hv_vhca {
> +	struct mlx5_core_dev       *dev;
> +	struct workqueue_struct    *work_queue;
> +	struct mlx5_hv_vhca_agent  *agents[MLX5_HV_VHCA_AGENT_MAX];
> +	struct mutex                agents_lock; /* Protect agents array */
> +};
> +
> +struct mlx5_hv_vhca_work {
> +	struct work_struct     invalidate_work;
> +	struct mlx5_hv_vhca   *hv_vhca;
> +	u64                    block_mask;
> +};
> +
> +struct mlx5_hv_vhca_data_block {
> +	u16     sequence;
> +	u16     offset;
> +	u8      reserved[4];
> +	u64     data[15];
> +};
> +
> +struct mlx5_hv_vhca_agent {
> +	enum mlx5_hv_vhca_agent_type	 type;
> +	struct mlx5_hv_vhca		*hv_vhca;
> +	void				*priv;
> +	u16                              seq;
> +	void (*control)(struct mlx5_hv_vhca_agent *agent,
> +			struct mlx5_hv_vhca_control_block *block);
> +	void (*invalidate)(struct mlx5_hv_vhca_agent *agent,
> +			   u64 block_mask);
> +	void (*cleanup)(struct mlx5_hv_vhca_agent *agent);
> +};
> +
> +struct mlx5_hv_vhca *mlx5_hv_vhca_create(struct mlx5_core_dev *dev)
> +{
> +	struct mlx5_hv_vhca *hv_vhca = NULL;
> +
> +	hv_vhca = kzalloc(sizeof(*hv_vhca), GFP_KERNEL);
> +	if (!hv_vhca)
> +		return ERR_PTR(-ENOMEM);
> +
> +	hv_vhca->work_queue = create_singlethread_workqueue("mlx5_hv_vhca");

I was under impression that usage of create_* interfaces is discouraged,
It has WQ_MEMORY_LEGACY flag inside and commit b71ab8c2025ca talks about
this interface as legacy one.

> +	if (!hv_vhca->work_queue) {
> +		kfree(hv_vhca);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	hv_vhca->dev = dev;
> +	mutex_init(&hv_vhca->agents_lock);
> +
> +	return hv_vhca;
> +}
> +
> +void mlx5_hv_vhca_destroy(struct mlx5_hv_vhca *hv_vhca)
> +{
> +	if (IS_ERR_OR_NULL(hv_vhca))
> +		return;
> +
> +	destroy_workqueue(hv_vhca->work_queue);
> +	kfree(hv_vhca);
> +}
> +
> +static void mlx5_hv_vhca_invalidate_work(struct work_struct *work)
> +{
> +	struct mlx5_hv_vhca_work *hwork;
> +	struct mlx5_hv_vhca *hv_vhca;
> +	int i;
> +
> +	hwork = container_of(work, struct mlx5_hv_vhca_work, invalidate_work);
> +	hv_vhca = hwork->hv_vhca;
> +
> +	mutex_lock(&hv_vhca->agents_lock);
> +	for (i = 0; i < MLX5_HV_VHCA_AGENT_MAX; i++) {
> +		struct mlx5_hv_vhca_agent *agent = hv_vhca->agents[i];
> +
> +		if (!agent || !agent->invalidate)
> +			continue;
> +
> +		if (!(BIT(agent->type) & hwork->block_mask))
> +			continue;
> +
> +		agent->invalidate(agent, hwork->block_mask);
> +	}
> +	mutex_unlock(&hv_vhca->agents_lock);
> +
> +	kfree(hwork);
> +}
> +
> +void mlx5_hv_vhca_invalidate(void *context, u64 block_mask)
> +{
> +	struct mlx5_hv_vhca *hv_vhca = (struct mlx5_hv_vhca *)context;
> +	struct mlx5_hv_vhca_work *work;
> +
> +	work = kzalloc(sizeof(*work), GFP_ATOMIC);
> +	if (!work)
> +		return;
> +
> +	INIT_WORK(&work->invalidate_work, mlx5_hv_vhca_invalidate_work);
> +	work->hv_vhca    = hv_vhca;
> +	work->block_mask = block_mask;
> +
> +	queue_work(hv_vhca->work_queue, &work->invalidate_work);
> +}
> +
> +int mlx5_hv_vhca_init(struct mlx5_hv_vhca *hv_vhca)
> +{
> +	if (IS_ERR_OR_NULL(hv_vhca))
> +		return IS_ERR_OR_NULL(hv_vhca);
> +
> +	return mlx5_hv_register_invalidate(hv_vhca->dev, hv_vhca,
> +					   mlx5_hv_vhca_invalidate);
> +}
> +
> +void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca)
> +{
> +	int i;
> +
> +	if (IS_ERR_OR_NULL(hv_vhca))
> +		return;
> +
> +	mutex_lock(&hv_vhca->agents_lock);
> +	for (i = 0; i < MLX5_HV_VHCA_AGENT_MAX; i++)
> +		WARN_ON(hv_vhca->agents[i]);
> +
> +	mutex_unlock(&hv_vhca->agents_lock);
> +
> +	mlx5_hv_unregister_invalidate(hv_vhca->dev);
> +}
> +
> +struct mlx5_hv_vhca_agent *
> +mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
> +			  enum mlx5_hv_vhca_agent_type type,
> +			  void (*control)(struct mlx5_hv_vhca_agent*,
> +					  struct mlx5_hv_vhca_control_block *block),
> +			  void (*invalidate)(struct mlx5_hv_vhca_agent*,
> +					     u64 block_mask),
> +			  void (*cleaup)(struct mlx5_hv_vhca_agent *agent),
> +			  void *priv)
> +{
> +	struct mlx5_hv_vhca_agent *agent;
> +
> +	if (IS_ERR_OR_NULL(hv_vhca))
> +		return ERR_PTR(-ENOMEM);
> +
> +	if (type >= MLX5_HV_VHCA_AGENT_MAX)
> +		return ERR_PTR(-EINVAL);
> +
> +	mutex_lock(&hv_vhca->agents_lock);
> +	if (hv_vhca->agents[type]) {
> +		mutex_unlock(&hv_vhca->agents_lock);
> +		return ERR_PTR(-EINVAL);
> +	}
> +	mutex_unlock(&hv_vhca->agents_lock);
> +
> +	agent = kzalloc(sizeof(*agent), GFP_KERNEL);
> +	if (!agent)
> +		return ERR_PTR(-ENOMEM);
> +
> +	agent->type      = type;
> +	agent->hv_vhca   = hv_vhca;
> +	agent->priv      = priv;
> +	agent->control   = control;
> +	agent->invalidate = invalidate;
> +	agent->cleanup   = cleaup;
> +
> +	mutex_lock(&hv_vhca->agents_lock);
> +	hv_vhca->agents[type] = agent;
> +	mutex_unlock(&hv_vhca->agents_lock);
> +
> +	return agent;
> +}
> +
> +void mlx5_hv_vhca_agent_destroy(struct mlx5_hv_vhca_agent *agent)
> +{
> +	struct mlx5_hv_vhca *hv_vhca = agent->hv_vhca;
> +
> +	mutex_lock(&hv_vhca->agents_lock);
> +
> +	if (WARN_ON(agent != hv_vhca->agents[agent->type])) {
> +		mutex_unlock(&hv_vhca->agents_lock);
> +		return;
> +	}
> +
> +	hv_vhca->agents[agent->type] = NULL;
> +	mutex_unlock(&hv_vhca->agents_lock);
> +
> +	if (agent->cleanup)
> +		agent->cleanup(agent);
> +
> +	kfree(agent);
> +}
> +
> +static int mlx5_hv_vhca_data_block_prepare(struct mlx5_hv_vhca_agent *agent,
> +					   struct mlx5_hv_vhca_data_block *data_block,
> +					   void *src, int len, int *offset)
> +{
> +	int bytes = min_t(int, (int)sizeof(data_block->data), len);
> +
> +	data_block->sequence = agent->seq;
> +	data_block->offset   = (*offset)++;
> +	memcpy(data_block->data, src, bytes);
> +
> +	return bytes;
> +}
> +
> +static void mlx5_hv_vhca_agent_seq_update(struct mlx5_hv_vhca_agent *agent)
> +{
> +	agent->seq++;
> +}
> +
> +int mlx5_hv_vhca_agent_write(struct mlx5_hv_vhca_agent *agent,
> +			     void *buf, int len)
> +{
> +	int offset = agent->type * HV_CONFIG_BLOCK_SIZE_MAX;
> +	int block_offset = 0;
> +	int total = 0;
> +	int err;
> +
> +	while (len) {
> +		struct mlx5_hv_vhca_data_block data_block = {0};
> +		int bytes;
> +
> +		bytes = mlx5_hv_vhca_data_block_prepare(agent, &data_block,
> +							buf + total,
> +							len, &block_offset);
> +		if (!bytes)
> +			return -ENOMEM;
> +
> +		err = mlx5_hv_write_config(agent->hv_vhca->dev, &data_block,
> +					   sizeof(data_block), offset);
> +		if (err)
> +			return err;
> +
> +		total += bytes;
> +		len   -= bytes;
> +	}
> +
> +	mlx5_hv_vhca_agent_seq_update(agent);
> +
> +	return 0;
> +}
> +
> +void *mlx5_hv_vhca_agent_priv(struct mlx5_hv_vhca_agent *agent)
> +{
> +	return agent->priv;
> +}
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
> new file mode 100644
> index 0000000..cdf1303
> --- /dev/null
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
> @@ -0,0 +1,102 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
> +/* Copyright (c) 2019 Mellanox Technologies. */
> +
> +#ifndef __LIB_HV_VHCA_H__
> +#define __LIB_HV_VHCA_H__
> +
> +#include "en.h"
> +#include "lib/hv.h"
> +
> +struct mlx5_hv_vhca_agent;
> +struct mlx5_hv_vhca;
> +struct mlx5_hv_vhca_control_block;
> +
> +enum mlx5_hv_vhca_agent_type {
> +	MLX5_HV_VHCA_AGENT_MAX = 32,
> +};
> +
> +#if IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
> +
> +struct mlx5_hv_vhca_control_block {
> +	u32     capabilities;
> +	u32     control;
> +	u16     command;
> +	u16     command_ack;
> +	u16     version;
> +	u16     rings;
> +	u32     reserved1[28];
> +};
> +
> +struct mlx5_hv_vhca *mlx5_hv_vhca_create(struct mlx5_core_dev *dev);
> +void mlx5_hv_vhca_destroy(struct mlx5_hv_vhca *hv_vhca);
> +int mlx5_hv_vhca_init(struct mlx5_hv_vhca *hv_vhca);
> +void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca);
> +void mlx5_hv_vhca_invalidate(void *context, u64 block_mask);
> +
> +struct mlx5_hv_vhca_agent *
> +mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
> +			  enum mlx5_hv_vhca_agent_type type,
> +			  void (*control)(struct mlx5_hv_vhca_agent*,
> +					  struct mlx5_hv_vhca_control_block *block),
> +			  void (*invalidate)(struct mlx5_hv_vhca_agent*,
> +					     u64 block_mask),
> +			  void (*cleanup)(struct mlx5_hv_vhca_agent *agent),
> +			  void *context);
> +
> +void mlx5_hv_vhca_agent_destroy(struct mlx5_hv_vhca_agent *agent);
> +int mlx5_hv_vhca_agent_write(struct mlx5_hv_vhca_agent *agent,
> +			     void *buf, int len);
> +void *mlx5_hv_vhca_agent_priv(struct mlx5_hv_vhca_agent *agent);
> +
> +#else
> +
> +static inline struct mlx5_hv_vhca *
> +mlx5_hv_vhca_create(struct mlx5_core_dev *dev)
> +{
> +	return NULL;
> +}
> +
> +static inline void mlx5_hv_vhca_destroy(struct mlx5_hv_vhca *hv_vhca)
> +{
> +}
> +
> +static inline int mlx5_hv_vhca_init(struct mlx5_hv_vhca *hv_vhca)
> +{
> +	return 0;
> +}
> +
> +static inline void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca)
> +{
> +}
> +
> +static inline void mlx5_hv_vhca_invalidate(void *context,
> +					   u64 block_mask)
> +{
> +}
> +
> +static inline struct mlx5_hv_vhca_agent *
> +mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
> +			  enum mlx5_hv_vhca_agent_type type,
> +			  void (*control)(struct mlx5_hv_vhca_agent*,
> +					  struct mlx5_hv_vhca_control_block *block),
> +			  void (*invalidate)(struct mlx5_hv_vhca_agent*,
> +					     u64 block_mask),
> +			  void (*cleanup)(struct mlx5_hv_vhca_agent *agent),
> +			  void *context)
> +{
> +	return NULL;
> +}
> +
> +static inline void mlx5_hv_vhca_agent_destroy(struct mlx5_hv_vhca_agent *agent)
> +{
> +}
> +
> +static inline int
> +mlx5_hv_vhca_write_agent(struct mlx5_hv_vhca_agent *agent,
> +			 void *buf, int len)
> +{
> +	return 0;
> +}
> +#endif
> +
> +#endif /* __LIB_HV_VHCA_H__ */
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index 0b70b1d..61388ca 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -69,6 +69,7 @@
>  #include "lib/pci_vsc.h"
>  #include "diag/fw_tracer.h"
>  #include "ecpf.h"
> +#include "lib/hv_vhca.h"
>
>  MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>");
>  MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) core driver");
> @@ -870,6 +871,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
>  	}
>
>  	dev->tracer = mlx5_fw_tracer_create(dev);
> +	dev->hv_vhca = mlx5_hv_vhca_create(dev);
>
>  	return 0;
>
> @@ -900,6 +902,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
>
>  static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
>  {
> +	mlx5_hv_vhca_destroy(dev->hv_vhca);
>  	mlx5_fw_tracer_destroy(dev->tracer);
>  	mlx5_fpga_cleanup(dev);
>  	mlx5_eswitch_cleanup(dev->priv.eswitch);
> @@ -1067,6 +1070,8 @@ static int mlx5_load(struct mlx5_core_dev *dev)
>  		goto err_fw_tracer;
>  	}
>
> +	mlx5_hv_vhca_init(dev->hv_vhca);

What is the point to declare this function as "int ..." if you are not
interested in result?

> +
>  	err = mlx5_fpga_device_start(dev);
>  	if (err) {
>  		mlx5_core_err(dev, "fpga device start failed %d\n", err);
> @@ -1122,6 +1127,7 @@ static int mlx5_load(struct mlx5_core_dev *dev)
>  err_ipsec_start:
>  	mlx5_fpga_device_stop(dev);
>  err_fpga_start:
> +	mlx5_hv_vhca_cleanup(dev->hv_vhca);
>  	mlx5_fw_tracer_cleanup(dev->tracer);
>  err_fw_tracer:
>  	mlx5_eq_table_destroy(dev);
> @@ -1142,6 +1148,7 @@ static void mlx5_unload(struct mlx5_core_dev *dev)
>  	mlx5_accel_ipsec_cleanup(dev);
>  	mlx5_accel_tls_cleanup(dev);
>  	mlx5_fpga_device_stop(dev);
> +	mlx5_hv_vhca_cleanup(dev->hv_vhca);
>  	mlx5_fw_tracer_cleanup(dev->tracer);
>  	mlx5_eq_table_destroy(dev);
>  	mlx5_irq_table_destroy(dev);
> diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
> index df23f17..13b4cf2 100644
> --- a/include/linux/mlx5/driver.h
> +++ b/include/linux/mlx5/driver.h
> @@ -659,6 +659,7 @@ struct mlx5_clock {
>  struct mlx5_fw_tracer;
>  struct mlx5_vxlan;
>  struct mlx5_geneve;
> +struct mlx5_hv_vhca;
>
>  struct mlx5_core_dev {
>  	struct device *device;
> @@ -706,6 +707,7 @@ struct mlx5_core_dev {
>  	struct mlx5_ib_clock_info  *clock_info;
>  	struct mlx5_fw_tracer   *tracer;
>  	u32                      vsc_addr;
> +	struct mlx5_hv_vhca	*hv_vhca;
>  };
>
>  struct mlx5_db {
> --
> 1.8.3.1
>

^ permalink raw reply

* Re: [PATCH net-next] r8152: saving the settings of EEE
From: David Miller @ 2019-08-22 18:52 UTC (permalink / raw)
  To: hayeswang; +Cc: netdev, nic_swsd, linux-kernel
In-Reply-To: <1394712342-15778-304-Taiwan-albertk@realtek.com>

From: Hayes Wang <hayeswang@realtek.com>
Date: Thu, 22 Aug 2019 16:07:18 +0800

> +	if (tp->eee_en) {
> +		r8152_eee_en(tp, true);
> +		r8152_mmd_write(tp, MDIO_MMD_AN, MDIO_AN_EEE_ADV, tp->eee_adv);
> +	} else {
> +		r8152_eee_en(tp, false);
> +		r8152_mmd_write(tp, MDIO_MMD_AN, MDIO_AN_EEE_ADV, 0);
> +	}

I see this same exact code sequence at least 4 times in your patch, please
make a helper function.

^ permalink raw reply

* Re: New skb extension for use by LSMs (skb "security blob")?
From: Casey Schaufler @ 2019-08-22 18:50 UTC (permalink / raw)
  To: David Miller, paul; +Cc: netdev, linux-security-module, selinux, casey
In-Reply-To: <20190821.205454.2103510420957943248.davem@davemloft.net>

On 8/21/2019 8:54 PM, David Miller wrote:
> From: Paul Moore <paul@paul-moore.com>
> Date: Wed, 21 Aug 2019 23:27:03 -0400
>
>> On Wed, Aug 21, 2019 at 6:50 PM David Miller <davem@davemloft.net> wrote:
>>> From: Paul Moore <paul@paul-moore.com>
>>> Date: Wed, 21 Aug 2019 18:00:09 -0400
>>>
>>>> I was just made aware of the skb extension work, and it looks very
>>>> appealing from a LSM perspective.  As some of you probably remember,
>>>> we (the LSM folks) have wanted a proper security blob in the skb for
>>>> quite some time, but netdev has been resistant to this idea thus far.
>>>>
>>>> If I were to propose a patchset to add a SKB_EXT_SECURITY skb
>>>> extension (a single extension ID to be shared among the different
>>>> LSMs), would that be something that netdev would consider merging, or
>>>> is there still a philosophical objection to things like this?
>>> Unlike it's main intended user (MPTCP), it sounds like LSM's would use
>>> this in a way such that it would be enabled on most systems all the
>>> time.

Only SELinux and Smack use the networking hooks today,
although I understand that AppArmor has plans to do so
in the not too distant future. Smack enables labeled
networking at all times. While Smack doesn't have the
expansive use that SELinux does because of Android, it
is used extensively in embedded systems via Tizen and
Yocto Project deployments.

>>> That really defeats the whole purpose of making it dynamic. :-/

It argues that fulfilling the needs of LSMs ought to be a basic
feature of the skb, rather than a dynamic extension. When LSMs were
introduced 20 years ago it was assumed their use would be rare, and
it was. Today almost all Linux systems use LSMs, and once AppArmor
adds network labeling it will be quite difficult to find a major
distribution that doesn't need the support.

>> I would be okay with only adding a skb extension when we needed it,
>> which I'm currently thinking would only be when we had labeled
>> networking actually configured at runtime and not just built into the
>> kernel.  In SELinux we do something similar today when it comes to our
>> per-packet access controls; if labeled networking is not configured we
>> bail out of the LSM hooks early to improve performance (we would just
>> be comparing unlabeled_t to unlabeled_t anyway).  I think the other
>> LSMs would be okay with this usage as well.

Smack uses labeled (CIPSO now, CALIPSO 'soon') networking by default
and depends on it heavily for basic system policy enforcement.

>> While a number of distros due enable some form of LSM and the labeled
>> networking bits at build time, vary few (if any?) provide a default
>> configuration so I would expect no additional overhead in the common
>> case.

Tizen isn't a distro, but neither is Android.

>> Would that be acceptable?
> I honestly don't know, I kinda feared that once the SKB extension went in
> people would start dumping things there and that's exactly what's happening.

As Paul has mentioned, the LSM community (Paul and me in particular)
have been looking for a better way to deal with the network stack
for a long time.

> I just so happened to be reviewing:
>
> 	https://patchwork.ozlabs.org/patch/1150091/
>
> while you were writing this email.
>
> It's rediculous, the vultures are out.

^ permalink raw reply

* Re: [PATCHv4 net 1/2] ipv4/icmp: fix rt dst dev null pointer dereference
From: Julian Anastasov @ 2019-08-22 18:46 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Stefano Brivio, wenxu, Alexei Starovoitov,
	David S . Miller, Eric Dumazet
In-Reply-To: <20190822141949.29561-2-liuhangbin@gmail.com>


	Hello,

On Thu, 22 Aug 2019, Hangbin Liu wrote:

> In __icmp_send() there is a possibility that the rt->dst.dev is NULL,
> e,g, with tunnel collect_md mode, which will cause kernel crash.
> Here is what the code path looks like, for GRE:
> 
> - ip6gre_tunnel_xmit
>   - ip6gre_xmit_ipv4
>     - __gre6_xmit
>       - ip6_tnl_xmit
>         - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
>     - icmp_send
>       - net = dev_net(rt->dst.dev); <-- here
> 
> The reason is __metadata_dst_init() init dst->dev to NULL by default.
> We could not fix it in __metadata_dst_init() as there is no dev supplied.
> On the other hand, the reason we need rt->dst.dev is to get the net.
> So we can just try get it from skb->dev when rt->dst.dev is NULL.
> 
> v4: Julian Anastasov remind skb->dev also could be NULL. We'd better
> still use dst.dev and do a check to avoid crash.
> 
> v3: No changes.
> 
> v2: fix the issue in __icmp_send() instead of updating shared dst dev
> in {ip_md, ip6}_tunnel_xmit.
> 
> Fixes: c8b34e680a09 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

	This patch looks good to me, thanks!

Reviewed-by: Julian Anastasov <ja@ssi.bg>

> ---
>  net/ipv4/icmp.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 1510e951f451..001f03f76bc4 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -582,7 +582,13 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
>  
>  	if (!rt)
>  		goto out;
> -	net = dev_net(rt->dst.dev);
> +
> +	if (rt->dst.dev)
> +		net = dev_net(rt->dst.dev);
> +	else if (skb_in->dev)
> +		net = dev_net(skb_in->dev);
> +	else
> +		goto out;
>  
>  	/*
>  	 *	Find the original header. It is expected to be valid, of course.
> -- 
> 2.19.2

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [PATCH bpf-next v5 03/11] xsk: add support to allow unaligned chunk placement
From: Jonathan Lemon @ 2019-08-22 18:43 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jakub.kicinski,
	saeedm, maximmi, stephen, bruce.richardson, ciara.loftus, bpf,
	intel-wired-lan
In-Reply-To: <20190822014427.49800-4-kevin.laatz@intel.com>



On 21 Aug 2019, at 18:44, Kevin Laatz wrote:

> Currently, addresses are chunk size aligned. This means, we are very
> restricted in terms of where we can place chunk within the umem. For
> example, if we have a chunk size of 2k, then our chunks can only be 
> placed
> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
>
> This patch introduces the ability to use unaligned chunks. With these
> changes, we are no longer bound to having to place chunks at a 2k (or
> whatever your chunk size is) interval. Since we are no longer dealing 
> with
> aligned chunks, they can now cross page boundaries. Checks for page
> contiguity have been added in order to keep track of which pages are
> followed by a physically contiguous page.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>
> ---
> v2:
>   - Add checks for the flags coming from userspace
>   - Fix how we get chunk_size in xsk_diag.c
>   - Add defines for masking the new descriptor format
>   - Modified the rx functions to use new descriptor format
>   - Modified the tx functions to use new descriptor format
>
> v3:
>   - Add helper function to do address/offset masking/addition
>
> v4:
>   - fixed page_start calculation in __xsk_rcv_memcpy().
>   - move offset handling to the xdp_umem_get_* functions
>   - modified the len field in xdp_umem_reg struct. We now use 16 bits 
> from
>     this for the flags field.
>   - removed next_pg_contig field from xdp_umem_page struct. Using low 
> 12
>     bits of addr to store flags instead.
>   - other minor changes based on review comments
>
> v5:
>   - Added accessors for getting addr and offset
>   - Added helper function to add offset to addr
>   - Fixed offset handling in xsk_rcv
>   - Removed bitfields from xdp_umem_reg
>   - Added struct size checking for xdp_umem_reg in xsk_setsockopt to 
> handle
>     different versions of the struct.
>   - fix conflicts after 'bpf-af-xdp-wakeup' was merged.
> ---
>  include/net/xdp_sock.h      | 75 +++++++++++++++++++++++++++--
>  include/uapi/linux/if_xdp.h |  9 ++++
>  net/xdp/xdp_umem.c          | 19 ++++++--
>  net/xdp/xsk.c               | 96 
> +++++++++++++++++++++++++++++--------
>  net/xdp/xsk_diag.c          |  2 +-
>  net/xdp/xsk_queue.h         | 68 ++++++++++++++++++++++----
>  6 files changed, 232 insertions(+), 37 deletions(-)
>
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index f023b9940d64..c9398ce7960f 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -16,6 +16,13 @@
>  struct net_device;
>  struct xsk_queue;
>
> +/* Masks for xdp_umem_page flags.
> + * The low 12-bits of the addr will be 0 since this is the page 
> address, so we
> + * can use them for flags.
> + */
> +#define XSK_NEXT_PG_CONTIG_SHIFT 0
> +#define XSK_NEXT_PG_CONTIG_MASK (1ULL << XSK_NEXT_PG_CONTIG_SHIFT)
> +
>  struct xdp_umem_page {
>  	void *addr;
>  	dma_addr_t dma;
> @@ -27,8 +34,12 @@ struct xdp_umem_fq_reuse {
>  	u64 handles[];
>  };
>
> -/* Flags for the umem flags field. */
> -#define XDP_UMEM_USES_NEED_WAKEUP (1 << 0)
> +/* Flags for the umem flags field.
> + *
> + * The NEED_WAKEUP flag is 1 due to the reuse of the flags field for 
> public
> + * flags. See inlude/uapi/include/linux/if_xdp.h.
> + */
> +#define XDP_UMEM_USES_NEED_WAKEUP (1 << 1)
>
>  struct xdp_umem {
>  	struct xsk_queue *fq;
> @@ -124,14 +135,36 @@ void xsk_map_try_sock_delete(struct xsk_map 
> *map, struct xdp_sock *xs,
>  int xsk_map_inc(struct xsk_map *map);
>  void xsk_map_put(struct xsk_map *map);
>
> +static inline u64 xsk_umem_extract_addr(u64 addr)
> +{
> +	return addr & XSK_UNALIGNED_BUF_ADDR_MASK;
> +}
> +
> +static inline u64 xsk_umem_extract_offset(u64 addr)
> +{
> +	return addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
> +}
> +
> +static inline u64 xsk_umem_add_offset_to_addr(u64 addr)
> +{
> +	return xsk_umem_extract_addr(addr) + xsk_umem_extract_offset(addr);
> +}
> +
>  static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 
> addr)
>  {
> -	return umem->pages[addr >> PAGE_SHIFT].addr + (addr & (PAGE_SIZE - 
> 1));
> +	unsigned long page_addr;
> +
> +	addr = xsk_umem_add_offset_to_addr(addr);
> +	page_addr = (unsigned long)umem->pages[addr >> PAGE_SHIFT].addr;
> +
> +	return (char *)(page_addr & PAGE_MASK) + (addr & ~PAGE_MASK);
>  }
>
>  static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 
> addr)
>  {
> -	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & (PAGE_SIZE - 
> 1));
> +	addr = xsk_umem_add_offset_to_addr(addr);
> +
> +	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & ~PAGE_MASK);
>  }
>
>  /* Reuse-queue aware version of FILL queue helpers */
> @@ -172,6 +205,19 @@ static inline void xsk_umem_fq_reuse(struct 
> xdp_umem *umem, u64 addr)
>
>  	rq->handles[rq->length++] = addr;
>  }
> +
> +/* Handle the offset appropriately depending on aligned or unaligned 
> mode.
> + * For unaligned mode, we store the offset in the upper 16-bits of 
> the address.
> + * For aligned mode, we simply add the offset to the address.
> + */
> +static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 
> address,
> +					 u64 offset)
> +{
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> +		return address + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> +	else
> +		return address + offset;
> +}
>  #else
>  static inline int xsk_generic_rcv(struct xdp_sock *xs, struct 
> xdp_buff *xdp)
>  {
> @@ -241,6 +287,21 @@ static inline struct xdp_umem 
> *xdp_get_umem_from_qid(struct net_device *dev,
>  	return NULL;
>  }
>
> +static inline u64 xsk_umem_extract_addr(u64 addr)
> +{
> +	return 0;
> +}
> +
> +static inline u64 xsk_umem_extract_offset(u64 addr)
> +{
> +	return 0;
> +}
> +
> +static inline u64 xsk_umem_add_offset_to_addr(u64 addr)
> +{
> +	return 0;
> +}
> +
>  static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 
> addr)
>  {
>  	return NULL;
> @@ -290,6 +351,12 @@ static inline bool 
> xsk_umem_uses_need_wakeup(struct xdp_umem *umem)
>  	return false;
>  }
>
> +static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 
> handle,
> +					 u64 offset)
> +{
> +	return 0;
> +}
> +
>  #endif /* CONFIG_XDP_SOCKETS */
>
>  #endif /* _LINUX_XDP_SOCK_H */
> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
> index 62b80d57b72a..be328c59389d 100644
> --- a/include/uapi/linux/if_xdp.h
> +++ b/include/uapi/linux/if_xdp.h
> @@ -26,6 +26,9 @@
>   */
>  #define XDP_USE_NEED_WAKEUP (1 << 3)
>
> +/* Flags for xsk_umem_config flags */
> +#define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
> +
>  struct sockaddr_xdp {
>  	__u16 sxdp_family;
>  	__u16 sxdp_flags;
> @@ -66,6 +69,7 @@ struct xdp_umem_reg {
>  	__u64 len; /* Length of packet data area */
>  	__u32 chunk_size;
>  	__u32 headroom;
> +	__u32 flags;
>  };
>
>  struct xdp_statistics {
> @@ -87,6 +91,11 @@ struct xdp_options {
>  #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
>  #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
>
> +/* Masks for unaligned chunks mode */
> +#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
> +#define XSK_UNALIGNED_BUF_ADDR_MASK \
> +	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
> +
>  /* Rx/Tx descriptor */
>  struct xdp_desc {
>  	__u64 addr;
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 2d65779282a1..e997b263a0dd 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -340,6 +340,7 @@ static int xdp_umem_account_pages(struct xdp_umem 
> *umem)
>
>  static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg 
> *mr)
>  {
> +	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG;
>  	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
>  	unsigned int chunks, chunks_per_page;
>  	u64 addr = mr->addr, size = mr->len;
> @@ -355,7 +356,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, 
> struct xdp_umem_reg *mr)
>  		return -EINVAL;
>  	}
>
> -	if (!is_power_of_2(chunk_size))
> +	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNK_FLAG |
> +			XDP_UMEM_USES_NEED_WAKEUP))
> +		return -EINVAL;
> +
> +	if (!unaligned_chunks && !is_power_of_2(chunk_size))
>  		return -EINVAL;
>
>  	if (!PAGE_ALIGNED(addr)) {
> @@ -372,9 +377,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, 
> struct xdp_umem_reg *mr)
>  	if (chunks == 0)
>  		return -EINVAL;
>
> -	chunks_per_page = PAGE_SIZE / chunk_size;
> -	if (chunks < chunks_per_page || chunks % chunks_per_page)
> -		return -EINVAL;
> +	if (!unaligned_chunks) {
> +		chunks_per_page = PAGE_SIZE / chunk_size;
> +		if (chunks < chunks_per_page || chunks % chunks_per_page)
> +			return -EINVAL;
> +	}
>
>  	headroom = ALIGN(headroom, 64);
>
> @@ -383,13 +390,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, 
> struct xdp_umem_reg *mr)
>  		return -EINVAL;
>
>  	umem->address = (unsigned long)addr;
> -	umem->chunk_mask = ~((u64)chunk_size - 1);
> +	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
> +					    : ~((u64)chunk_size - 1);
>  	umem->size = size;
>  	umem->headroom = headroom;
>  	umem->chunk_size_nohr = chunk_size - headroom;
>  	umem->npgs = size / PAGE_SIZE;
>  	umem->pgs = NULL;
>  	umem->user = NULL;
> +	umem->flags = mr->flags;
>  	INIT_LIST_HEAD(&umem->xsk_list);
>  	spin_lock_init(&umem->xsk_list_lock);
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index ee4428a892fa..907e5f12338f 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
>
>  u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
>  {
> -	return xskq_peek_addr(umem->fq, addr);
> +	return xskq_peek_addr(umem->fq, addr, umem);
>  }
>  EXPORT_SYMBOL(xsk_umem_peek_addr);
>
> @@ -115,21 +115,43 @@ bool xsk_umem_uses_need_wakeup(struct xdp_umem 
> *umem)
>  }
>  EXPORT_SYMBOL(xsk_umem_uses_need_wakeup);
>
> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one 
> for
> + * each page. This is only required in copy mode.
> + */
> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void 
> *from_buf,
> +			     u32 len, u32 metalen)
> +{
> +	void *to_buf = xdp_umem_get_data(umem, addr);
> +
> +	addr = xsk_umem_add_offset_to_addr(addr);
> +	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
> +		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
> +		u64 page_start = addr & ~(PAGE_SIZE - 1);
> +		u64 first_len = PAGE_SIZE - (addr - page_start);
> +
> +		memcpy(to_buf, from_buf, first_len + metalen);
> +		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
> +
> +		return;
> +	}
> +
> +	memcpy(to_buf, from_buf, len + metalen);
> +}
> +
>  static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 
> len)
>  {
> -	void *to_buf, *from_buf;
> +	u64 offset = xs->umem->headroom;
> +	u64 addr, memcpy_addr;
> +	void *from_buf;
>  	u32 metalen;
> -	u64 addr;
>  	int err;
>
> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>  	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>  		xs->rx_dropped++;
>  		return -ENOSPC;
>  	}
>
> -	addr += xs->umem->headroom;
> -
>  	if (unlikely(xdp_data_meta_unsupported(xdp))) {
>  		from_buf = xdp->data;
>  		metalen = 0;
> @@ -138,9 +160,11 @@ static int __xsk_rcv(struct xdp_sock *xs, struct 
> xdp_buff *xdp, u32 len)
>  		metalen = xdp->data - xdp->data_meta;
>  	}
>
> -	to_buf = xdp_umem_get_data(xs->umem, addr);
> -	memcpy(to_buf, from_buf, len + metalen);
> -	addr += metalen;
> +	memcpy_addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
> +	__xsk_rcv_memcpy(xs->umem, memcpy_addr, from_buf, len, metalen);
> +
> +	offset += metalen;
> +	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
>  	err = xskq_produce_batch_desc(xs->rx, addr, len);
>  	if (!err) {
>  		xskq_discard_addr(xs->umem->fq);
> @@ -185,6 +209,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct 
> xdp_buff *xdp)
>  {
>  	u32 metalen = xdp->data - xdp->data_meta;
>  	u32 len = xdp->data_end - xdp->data;
> +	u64 offset = xs->umem->headroom;
>  	void *buffer;
>  	u64 addr;
>  	int err;
> @@ -196,17 +221,17 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct 
> xdp_buff *xdp)
>  		goto out_unlock;
>  	}
>
> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>  	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>  		err = -ENOSPC;
>  		goto out_drop;
>  	}
>
> -	addr += xs->umem->headroom;
> -
> -	buffer = xdp_umem_get_data(xs->umem, addr);
> +	buffer = xdp_umem_get_data(xs->umem, addr + offset);
>  	memcpy(buffer, xdp->data_meta, len + metalen);
> -	addr += metalen;
> +	offset += metalen;
> +
> +	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
>  	err = xskq_produce_batch_desc(xs->rx, addr, len);
>  	if (err)
>  		goto out_drop;

Can't just add address and offset any longer.  This should read:

	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
	buffer = xdp_umem_get_data(xs->umem, addr);

	addr = xsk_umem_adjust_offset(xs->umem, addr, metalen);
	

so that offset and then metalen are added.  (or preserve the
address across the calls like memcpy_addr earlier).
-- 
Jonathan

> @@ -250,7 +275,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, 
> struct xdp_desc *desc)
>
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
> -		if (!xskq_peek_desc(xs->tx, desc))
> +		if (!xskq_peek_desc(xs->tx, desc, umem))
>  			continue;
>
>  		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
> @@ -304,7 +329,7 @@ static int xsk_generic_xmit(struct sock *sk, 
> struct msghdr *m,
>  	if (xs->queue_id >= xs->dev->real_num_tx_queues)
>  		goto out;
>
> -	while (xskq_peek_desc(xs->tx, &desc)) {
> +	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
>  		char *buffer;
>  		u64 addr;
>  		u32 len;
> @@ -333,7 +358,7 @@ static int xsk_generic_xmit(struct sock *sk, 
> struct msghdr *m,
>  		skb->dev = xs->dev;
>  		skb->priority = sk->sk_priority;
>  		skb->mark = sk->sk_mark;
> -		skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
> +		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
>  		skb->destructor = xsk_destruct_skb;
>
>  		err = dev_direct_xmit(skb, xs->queue_id);
> @@ -526,6 +551,24 @@ static struct socket *xsk_lookup_xsk_from_fd(int 
> fd)
>  	return sock;
>  }
>
> +/* Check if umem pages are contiguous.
> + * If zero-copy mode, use the DMA address to do the page contiguity 
> check
> + * For all other modes we use addr (kernel virtual address)
> + * Store the result in the low bits of addr.
> + */
> +static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 
> flags)
> +{
> +	struct xdp_umem_page *pgs = umem->pages;
> +	int i, is_contig;
> +
> +	for (i = 0; i < umem->npgs - 1; i++) {
> +		is_contig = (flags & XDP_ZEROCOPY) ?
> +			(pgs[i].dma + PAGE_SIZE == pgs[i + 1].dma) :
> +			(pgs[i].addr + PAGE_SIZE == pgs[i + 1].addr);
> +		pgs[i].addr += is_contig << XSK_NEXT_PG_CONTIG_SHIFT;
> +	}
> +}
> +
>  static int xsk_bind(struct socket *sock, struct sockaddr *addr, int 
> addr_len)
>  {
>  	struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
> @@ -616,6 +659,8 @@ static int xsk_bind(struct socket *sock, struct 
> sockaddr *addr, int addr_len)
>  		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
>  		if (err)
>  			goto out_unlock;
> +
> +		xsk_check_page_contiguity(xs->umem, flags);
>  	}
>
>  	xs->dev = dev;
> @@ -636,6 +681,13 @@ static int xsk_bind(struct socket *sock, struct 
> sockaddr *addr, int addr_len)
>  	return err;
>  }
>
> +struct xdp_umem_reg_v1 {
> +	__u64 addr; /* Start of packet data area */
> +	__u64 len; /* Length of packet data area */
> +	__u32 chunk_size;
> +	__u32 headroom;
> +};
> +
>  static int xsk_setsockopt(struct socket *sock, int level, int 
> optname,
>  			  char __user *optval, unsigned int optlen)
>  {
> @@ -673,10 +725,16 @@ static int xsk_setsockopt(struct socket *sock, 
> int level, int optname,
>  	}
>  	case XDP_UMEM_REG:
>  	{
> -		struct xdp_umem_reg mr;
> +		size_t mr_size = sizeof(struct xdp_umem_reg);
> +		struct xdp_umem_reg mr = {};
>  		struct xdp_umem *umem;
>
> -		if (copy_from_user(&mr, optval, sizeof(mr)))
> +		if (optlen < sizeof(struct xdp_umem_reg_v1))
> +			return -EINVAL;
> +		else if (optlen < sizeof(mr))
> +			mr_size = sizeof(struct xdp_umem_reg_v1);
> +
> +		if (copy_from_user(&mr, optval, mr_size))
>  			return -EFAULT;
>
>  		mutex_lock(&xs->mutex);
> diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
> index d5e06c8e0cbf..9986a759fe06 100644
> --- a/net/xdp/xsk_diag.c
> +++ b/net/xdp/xsk_diag.c
> @@ -56,7 +56,7 @@ static int xsk_diag_put_umem(const struct xdp_sock 
> *xs, struct sk_buff *nlskb)
>  	du.id = umem->id;
>  	du.size = umem->size;
>  	du.num_pages = umem->npgs;
> -	du.chunk_size = (__u32)(~umem->chunk_mask + 1);
> +	du.chunk_size = umem->chunk_size_nohr + umem->headroom;
>  	du.headroom = umem->headroom;
>  	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
>  	du.queue_id = umem->queue_id;
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index dd9e985c2461..6c67c9d0294f 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -134,6 +134,17 @@ static inline bool xskq_has_addrs(struct 
> xsk_queue *q, u32 cnt)
>
>  /* UMEM queue */
>
> +static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, 
> u64 addr,
> +					      u64 length)
> +{
> +	bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
> +	bool next_pg_contig =
> +		(unsigned long)umem->pages[(addr >> PAGE_SHIFT)].addr &
> +			XSK_NEXT_PG_CONTIG_MASK;
> +
> +	return cross_pg && !next_pg_contig;
> +}
> +
>  static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
>  {
>  	if (addr >= q->size) {
> @@ -144,23 +155,49 @@ static inline bool xskq_is_valid_addr(struct 
> xsk_queue *q, u64 addr)
>  	return true;
>  }
>
> -static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
> +static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, 
> u64 addr,
> +						u64 length,
> +						struct xdp_umem *umem)
> +{
> +	addr = xsk_umem_add_offset_to_addr(addr);
> +	if (addr >= q->size ||
> +	    xskq_crosses_non_contig_pg(umem, addr, length)) {
> +		q->invalid_descs++;
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
> +				      struct xdp_umem *umem)
>  {
>  	while (q->cons_tail != q->cons_head) {
>  		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
>  		unsigned int idx = q->cons_tail & q->ring_mask;
>
>  		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
> +
> +		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
> +			if (xskq_is_valid_addr_unaligned(q, *addr,
> +							 umem->chunk_size_nohr,
> +							 umem))
> +				return addr;
> +			goto out;
> +		}
> +
>  		if (xskq_is_valid_addr(q, *addr))
>  			return addr;
>
> +out:
>  		q->cons_tail++;
>  	}
>
>  	return NULL;
>  }
>
> -static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
> +static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
> +				  struct xdp_umem *umem)
>  {
>  	if (q->cons_tail == q->cons_head) {
>  		smp_mb(); /* D, matches A */
> @@ -171,7 +208,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue 
> *q, u64 *addr)
>  		smp_rmb();
>  	}
>
> -	return xskq_validate_addr(q, addr);
> +	return xskq_validate_addr(q, addr, umem);
>  }
>
>  static inline void xskq_discard_addr(struct xsk_queue *q)
> @@ -230,8 +267,21 @@ static inline int xskq_reserve_addr(struct 
> xsk_queue *q)
>
>  /* Rx/Tx queue */
>
> -static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct 
> xdp_desc *d)
> +static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct 
> xdp_desc *d,
> +				      struct xdp_umem *umem)
>  {
> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
> +		if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
> +			return false;
> +
> +		if (d->len > umem->chunk_size_nohr || d->options) {
> +			q->invalid_descs++;
> +			return false;
> +		}
> +
> +		return true;
> +	}
> +
>  	if (!xskq_is_valid_addr(q, d->addr))
>  		return false;
>
> @@ -245,14 +295,15 @@ static inline bool xskq_is_valid_desc(struct 
> xsk_queue *q, struct xdp_desc *d)
>  }
>
>  static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue 
> *q,
> -						  struct xdp_desc *desc)
> +						  struct xdp_desc *desc,
> +						  struct xdp_umem *umem)
>  {
>  	while (q->cons_tail != q->cons_head) {
>  		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
>  		unsigned int idx = q->cons_tail & q->ring_mask;
>
>  		*desc = READ_ONCE(ring->desc[idx]);
> -		if (xskq_is_valid_desc(q, desc))
> +		if (xskq_is_valid_desc(q, desc, umem))
>  			return desc;
>
>  		q->cons_tail++;
> @@ -262,7 +313,8 @@ static inline struct xdp_desc 
> *xskq_validate_desc(struct xsk_queue *q,
>  }
>
>  static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
> -					      struct xdp_desc *desc)
> +					      struct xdp_desc *desc,
> +					      struct xdp_umem *umem)
>  {
>  	if (q->cons_tail == q->cons_head) {
>  		smp_mb(); /* D, matches A */
> @@ -273,7 +325,7 @@ static inline struct xdp_desc 
> *xskq_peek_desc(struct xsk_queue *q,
>  		smp_rmb(); /* C, matches B */
>  	}
>
> -	return xskq_validate_desc(q, desc);
> +	return xskq_validate_desc(q, desc, umem);
>  }
>
>  static inline void xskq_discard_desc(struct xsk_queue *q)
> -- 
> 2.17.1

^ permalink raw reply

* [PATCH] igb/igc: Don't warn on fatal read failures when the device is removed
From: Lyude Paul @ 2019-08-22 18:33 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: Feng Tang, Sasha Neftin, Jeff Kirsher, David S. Miller,
	linux-kernel

Fatal read errors are worth warning about, unless of course the device
was just unplugged from the machine - something that's a rather normal
occurence when the igb/igc adapter is located on a Thunderbolt dock. So,
let's only WARN() if there's a fatal read error while the device is
still present.

This fixes the following WARN splat that's been appearing whenever I
unplug my Caldigit TS3 Thunderbolt dock from my laptop:

  igb 0000:09:00.0 enp9s0: PCIe link lost
  ------------[ cut here ]------------
  igb: Failed to read reg 0x18!
  WARNING: CPU: 7 PID: 516 at
  drivers/net/ethernet/intel/igb/igb_main.c:756 igb_rd32+0x57/0x6a [igb]
  Modules linked in: igb dca thunderbolt fuse vfat fat elan_i2c mei_wdt
  mei_hdcp i915 wmi_bmof intel_wmi_thunderbolt iTCO_wdt
  iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp joydev
  coretemp crct10dif_pclmul crc32_pclmul i2c_algo_bit ghash_clmulni_intel
  intel_cstate drm_kms_helper intel_uncore syscopyarea sysfillrect
  sysimgblt fb_sys_fops intel_rapl_perf intel_xhci_usb_role_switch mei_me
  drm roles idma64 i2c_i801 ucsi_acpi typec_ucsi mei intel_lpss_pci
  processor_thermal_device typec intel_pch_thermal intel_soc_dts_iosf
  intel_lpss int3403_thermal thinkpad_acpi wmi int340x_thermal_zone
  ledtrig_audio int3400_thermal acpi_thermal_rel acpi_pad video
  pcc_cpufreq ip_tables serio_raw nvme nvme_core crc32c_intel uas
  usb_storage e1000e i2c_dev
  CPU: 7 PID: 516 Comm: kworker/u16:3 Not tainted 5.2.0-rc1Lyude-Test+ #14
  Hardware name: LENOVO 20L8S2N800/20L8S2N800, BIOS N22ET35W (1.12 ) 04/09/2018
  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
  RIP: 0010:igb_rd32+0x57/0x6a [igb]
  Code: 87 b8 fc ff ff 48 c7 47 08 00 00 00 00 48 c7 c6 33 42 9b c0 4c 89
  c7 e8 47 45 cd dc 89 ee 48 c7 c7 43 42 9b c0 e8 c1 94 71 dc <0f> 0b eb
  08 8b 00 ff c0 75 b0 eb c8 44 89 e0 5d 41 5c c3 0f 1f 44
  RSP: 0018:ffffba5801cf7c48 EFLAGS: 00010286
  RAX: 0000000000000000 RBX: ffff9e7956608840 RCX: 0000000000000007
  RDX: 0000000000000000 RSI: ffffba5801cf7b24 RDI: ffff9e795e3d6a00
  RBP: 0000000000000018 R08: 000000009dec4a01 R09: ffffffff9e61018f
  R10: 0000000000000000 R11: ffffba5801cf7ae5 R12: 00000000ffffffff
  R13: ffff9e7956608840 R14: ffff9e795a6f10b0 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff9e795e3c0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000564317bc4088 CR3: 000000010e00a006 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   igb_release_hw_control+0x1a/0x30 [igb]
   igb_remove+0xc5/0x14b [igb]
   pci_device_remove+0x3b/0x93
   device_release_driver_internal+0xd7/0x17e
   pci_stop_bus_device+0x36/0x75
   pci_stop_bus_device+0x66/0x75
   pci_stop_bus_device+0x66/0x75
   pci_stop_and_remove_bus_device+0xf/0x19
   trim_stale_devices+0xc5/0x13a
   ? __pm_runtime_resume+0x6e/0x7b
   trim_stale_devices+0x103/0x13a
   ? __pm_runtime_resume+0x6e/0x7b
   trim_stale_devices+0x103/0x13a
   acpiphp_check_bridge+0xd8/0xf5
   acpiphp_hotplug_notify+0xf7/0x14b
   ? acpiphp_check_bridge+0xf5/0xf5
   acpi_device_hotplug+0x357/0x3b5
   acpi_hotplug_work_fn+0x1a/0x23
   process_one_work+0x1a7/0x296
   worker_thread+0x1a8/0x24c
   ? process_scheduled_works+0x2c/0x2c
   kthread+0xe9/0xee
   ? kthread_destroy_worker+0x41/0x41
   ret_from_fork+0x35/0x40
  ---[ end trace 252bf10352c63d22 ]---

Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 47e16692b26b ("igb/igc: warn when fatal read failure happens")
Cc: Feng Tang <feng.tang@intel.com>
Cc: Sasha Neftin <sasha.neftin@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
---
 drivers/net/ethernet/intel/igb/igb_main.c | 3 ++-
 drivers/net/ethernet/intel/igc/igc_main.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index e5b7e638df28..1a7f7cd28df9 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -753,7 +753,8 @@ u32 igb_rd32(struct e1000_hw *hw, u32 reg)
 		struct net_device *netdev = igb->netdev;
 		hw->hw_addr = NULL;
 		netdev_err(netdev, "PCIe link lost\n");
-		WARN(1, "igb: Failed to read reg 0x%x!\n", reg);
+		WARN(pci_device_is_present(igb->pdev),
+		     "igb: Failed to read reg 0x%x!\n", reg);
 	}
 
 	return value;
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 28072b9aa932..f873a4b35eaf 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -3934,7 +3934,8 @@ u32 igc_rd32(struct igc_hw *hw, u32 reg)
 		hw->hw_addr = NULL;
 		netif_device_detach(netdev);
 		netdev_err(netdev, "PCIe link lost, device now detached\n");
-		WARN(1, "igc: Failed to read reg 0x%x!\n", reg);
+		WARN(pci_device_is_present(igc->pdev),
+		     "igc: Failed to read reg 0x%x!\n", reg);
 	}
 
 	return value;
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH net] openvswitch: Fix conntrack cache with timeout
From: Yi-Hung Wei @ 2019-08-22 18:32 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Linux Kernel Network Developers, Pravin Shelar
In-Reply-To: <201908230208.0aRY5GdN%lkp@intel.com>

On Thu, Aug 22, 2019 at 11:12 AM kbuild test robot <lkp@intel.com> wrote:
>
> Hi Yi-Hung,
>
> Thank you for the patch! Perhaps something to improve:
>
> [auto build test WARNING on net/master]
>
> url:    https://github.com/0day-ci/linux/commits/Yi-Hung-Wei/openvswitch-Fix-conntrack-cache-with-timeout/20190822-212539
> reproduce:
>         # apt-get install sparse
>         # sparse version: v0.6.1-rc1-7-g2b96cd8-dirty
>         make ARCH=x86_64 allmodconfig
>         make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>
>
>
> sparse warnings: (new ones prefixed by >>)
>
>    include/linux/sched.h:609:43: sparse: sparse: bad integer constant expression
>    include/linux/sched.h:609:73: sparse: sparse: invalid named zero-width bitfield `value'
>    include/linux/sched.h:610:43: sparse: sparse: bad integer constant expression
>    include/linux/sched.h:610:67: sparse: sparse: invalid named zero-width bitfield `bucket_id'
> >> net/openvswitch/conntrack.c:706:41: sparse: sparse: incompatible types in comparison expression (different address spaces):
> >> net/openvswitch/conntrack.c:706:41: sparse:    struct nf_ct_timeout *
> >> net/openvswitch/conntrack.c:706:41: sparse:    struct nf_ct_timeout [noderef] <asn:4> *

My v1 does not take care of the rcu pointer properly.  I will fix the
reported issue and send v2.

Thanks,

-Yi-Hung

^ permalink raw reply

* [PATCH] ethernet: Delete unnecessary checks before the macro call “dev_kfree_skb”
From: Markus Elfring @ 2019-08-22 18:30 UTC (permalink / raw)
  To: netdev, linux-arm-kernel, linux-stm32, intel-wired-lan,
	bcm-kernel-feedback-list, UNGLinuxDriver, Alexandre Torgue,
	Alexios Zavras, Allison Randal, Bryan Whitehead, Claudiu Manoil,
	David S. Miller, Doug Berger, Douglas Miller, Florian Fainelli,
	Giuseppe Cavallaro, Greg Kroah-Hartman, Jeff Kirsher,
	Jilayne Lovejoy, Jonathan Lemon, Jose Abreu, Kate Stewart,
	Luis Chamberlain, Maxime Coquelin, Michael Heimpold,
	Nicolas Pitre, Petr Štetiar, Shannon Nelson, Stefan Wahren,
	Steve Winslow, Thomas Gleixner, Wei Yongjun, Wolfram Sang,
	Yang Wei, YueHaibing, zhong jiang
  Cc: LKML, kernel-janitors

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Thu, 22 Aug 2019 20:02:56 +0200

The dev_kfree_skb() function performs also input parameter validation.
Thus the test around the shown calls is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 drivers/net/ethernet/amd/ni65.c                   |  6 ++----
 drivers/net/ethernet/broadcom/bcmsysport.c        |  3 +--
 drivers/net/ethernet/broadcom/genet/bcmgenet.c    | 11 +++--------
 drivers/net/ethernet/freescale/gianfar.c          |  3 +--
 drivers/net/ethernet/ibm/ehea/ehea_main.c         | 12 ++++--------
 drivers/net/ethernet/intel/e1000/e1000_ethtool.c  |  3 +--
 drivers/net/ethernet/intel/e1000/e1000_main.c     |  3 +--
 drivers/net/ethernet/intel/e1000e/ethtool.c       |  6 ++----
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c   |  3 +--
 drivers/net/ethernet/intel/igb/igb_main.c         |  3 +--
 drivers/net/ethernet/intel/igc/igc_main.c         |  3 +--
 drivers/net/ethernet/micrel/ks8842.c              |  4 +---
 drivers/net/ethernet/microchip/lan743x_ptp.c      |  3 +--
 drivers/net/ethernet/packetengines/yellowfin.c    |  3 +--
 drivers/net/ethernet/qualcomm/qca_spi.c           |  3 +--
 drivers/net/ethernet/qualcomm/qca_uart.c          |  3 +--
 drivers/net/ethernet/sgi/meth.c                   |  3 +--
 drivers/net/ethernet/smsc/smc91x.c                |  3 +--
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |  3 +--
 drivers/net/ethernet/sun/sunvnet_common.c         |  3 +--
 20 files changed, 27 insertions(+), 57 deletions(-)

diff --git a/drivers/net/ethernet/amd/ni65.c b/drivers/net/ethernet/amd/ni65.c
index 87ff5d6d1b22..c6c2a54c1121 100644
--- a/drivers/net/ethernet/amd/ni65.c
+++ b/drivers/net/ethernet/amd/ni65.c
@@ -697,16 +697,14 @@ static void ni65_free_buffer(struct priv *p)
 	for(i=0;i<TMDNUM;i++) {
 		kfree(p->tmdbounce[i]);
 #ifdef XMT_VIA_SKB
-		if(p->tmd_skb[i])
-			dev_kfree_skb(p->tmd_skb[i]);
+		dev_kfree_skb(p->tmd_skb[i]);
 #endif
 	}

 	for(i=0;i<RMDNUM;i++)
 	{
 #ifdef RCV_VIA_SKB
-		if(p->recv_skb[i])
-			dev_kfree_skb(p->recv_skb[i]);
+		dev_kfree_skb(p->recv_skb[i]);
 #else
 		kfree(p->recvbounce[i]);
 #endif
diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 9483553ce444..6a47daec2302 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -708,8 +708,7 @@ static int bcm_sysport_alloc_rx_bufs(struct bcm_sysport_priv *priv)
 	for (i = 0; i < priv->num_rx_bds; i++) {
 		cb = &priv->rx_cbs[i];
 		skb = bcm_sysport_rx_refill(priv, cb);
-		if (skb)
-			dev_kfree_skb(skb);
+		dev_kfree_skb(skb);
 		if (!cb->skb)
 			return -ENOMEM;
 	}
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index d3a0b614dbfa..8b19ddcdafaa 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -2515,19 +2515,14 @@ static int bcmgenet_dma_teardown(struct bcmgenet_priv *priv)
 static void bcmgenet_fini_dma(struct bcmgenet_priv *priv)
 {
 	struct netdev_queue *txq;
-	struct sk_buff *skb;
-	struct enet_cb *cb;
 	int i;

 	bcmgenet_fini_rx_napi(priv);
 	bcmgenet_fini_tx_napi(priv);

-	for (i = 0; i < priv->num_tx_bds; i++) {
-		cb = priv->tx_cbs + i;
-		skb = bcmgenet_free_tx_cb(&priv->pdev->dev, cb);
-		if (skb)
-			dev_kfree_skb(skb);
-	}
+	for (i = 0; i < priv->num_tx_bds; i++)
+		dev_kfree_skb(bcmgenet_free_tx_cb(&priv->pdev->dev,
+						  priv->tx_cbs + i));

 	for (i = 0; i < priv->hw_params->tx_queues; i++) {
 		txq = netdev_get_tx_queue(priv->dev, priv->tx_rings[i].queue);
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 7ea19e173339..412c0340fed9 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -2005,8 +2005,7 @@ static void free_skb_rx_queue(struct gfar_priv_rx_q *rx_queue)

 	struct rxbd8 *rxbdp = rx_queue->rx_bd_base;

-	if (rx_queue->skb)
-		dev_kfree_skb(rx_queue->skb);
+	dev_kfree_skb(rx_queue->skb);

 	for (i = 0; i < rx_queue->rx_ring_size; i++) {
 		struct	gfar_rx_buff *rxb = &rx_queue->rx_buff[i];
diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c b/drivers/net/ethernet/ibm/ehea/ehea_main.c
index cca71ba7a74a..13e30eba5349 100644
--- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
+++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
@@ -1577,20 +1577,16 @@ static int ehea_clean_portres(struct ehea_port *port, struct ehea_port_res *pr)
 		ehea_destroy_eq(pr->eq);

 		for (i = 0; i < pr->rq1_skba.len; i++)
-			if (pr->rq1_skba.arr[i])
-				dev_kfree_skb(pr->rq1_skba.arr[i]);
+			dev_kfree_skb(pr->rq1_skba.arr[i]);

 		for (i = 0; i < pr->rq2_skba.len; i++)
-			if (pr->rq2_skba.arr[i])
-				dev_kfree_skb(pr->rq2_skba.arr[i]);
+			dev_kfree_skb(pr->rq2_skba.arr[i]);

 		for (i = 0; i < pr->rq3_skba.len; i++)
-			if (pr->rq3_skba.arr[i])
-				dev_kfree_skb(pr->rq3_skba.arr[i]);
+			dev_kfree_skb(pr->rq3_skba.arr[i]);

 		for (i = 0; i < pr->sq_skba.len; i++)
-			if (pr->sq_skba.arr[i])
-				dev_kfree_skb(pr->sq_skba.arr[i]);
+			dev_kfree_skb(pr->sq_skba.arr[i]);

 		vfree(pr->rq1_skba.arr);
 		vfree(pr->rq2_skba.arr);
diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
index a41008523c98..71d3d8854d8f 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
@@ -937,8 +937,7 @@ static void e1000_free_desc_rings(struct e1000_adapter *adapter)
 						 txdr->buffer_info[i].dma,
 						 txdr->buffer_info[i].length,
 						 DMA_TO_DEVICE);
-			if (txdr->buffer_info[i].skb)
-				dev_kfree_skb(txdr->buffer_info[i].skb);
+			dev_kfree_skb(txdr->buffer_info[i].skb);
 		}
 	}

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 6b6ba1c38235..86493fea56e4 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -4175,8 +4175,7 @@ static bool e1000_clean_jumbo_rx_irq(struct e1000_adapter *adapter,
 				/* an error means any chain goes out the window
 				 * too
 				 */
-				if (rx_ring->rx_skb_top)
-					dev_kfree_skb(rx_ring->rx_skb_top);
+				dev_kfree_skb(rx_ring->rx_skb_top);
 				rx_ring->rx_skb_top = NULL;
 				goto next_desc;
 			}
diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c
index 08342698386d..de8c5818a305 100644
--- a/drivers/net/ethernet/intel/e1000e/ethtool.c
+++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
@@ -1126,8 +1126,7 @@ static void e1000_free_desc_rings(struct e1000_adapter *adapter)
 						 buffer_info->dma,
 						 buffer_info->length,
 						 DMA_TO_DEVICE);
-			if (buffer_info->skb)
-				dev_kfree_skb(buffer_info->skb);
+			dev_kfree_skb(buffer_info->skb);
 		}
 	}

@@ -1139,8 +1138,7 @@ static void e1000_free_desc_rings(struct e1000_adapter *adapter)
 				dma_unmap_single(&pdev->dev,
 						 buffer_info->dma,
 						 2048, DMA_FROM_DEVICE);
-			if (buffer_info->skb)
-				dev_kfree_skb(buffer_info->skb);
+			dev_kfree_skb(buffer_info->skb);
 		}
 	}

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index d3e85480f46d..09f7a246e134 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -253,8 +253,7 @@ static void fm10k_clean_rx_ring(struct fm10k_ring *rx_ring)
 	if (!rx_ring->rx_buffer)
 		return;

-	if (rx_ring->skb)
-		dev_kfree_skb(rx_ring->skb);
+	dev_kfree_skb(rx_ring->skb);
 	rx_ring->skb = NULL;

 	/* Free all the Rx ring sk_buffs */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index b63e77528a91..105b0624081a 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -4731,8 +4731,7 @@ static void igb_clean_rx_ring(struct igb_ring *rx_ring)
 {
 	u16 i = rx_ring->next_to_clean;

-	if (rx_ring->skb)
-		dev_kfree_skb(rx_ring->skb);
+	dev_kfree_skb(rx_ring->skb);
 	rx_ring->skb = NULL;

 	/* Free all the Rx ring sk_buffs */
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index e5114bebd30b..251552855c40 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -352,8 +352,7 @@ static void igc_clean_rx_ring(struct igc_ring *rx_ring)
 {
 	u16 i = rx_ring->next_to_clean;

-	if (rx_ring->skb)
-		dev_kfree_skb(rx_ring->skb);
+	dev_kfree_skb(rx_ring->skb);
 	rx_ring->skb = NULL;

 	/* Free all the Rx ring sk_buffs */
diff --git a/drivers/net/ethernet/micrel/ks8842.c b/drivers/net/ethernet/micrel/ks8842.c
index ccd06702cc56..da329ca115cc 100644
--- a/drivers/net/ethernet/micrel/ks8842.c
+++ b/drivers/net/ethernet/micrel/ks8842.c
@@ -580,9 +580,7 @@ static int __ks8842_start_new_rx_dma(struct net_device *netdev)
 		dma_unmap_single(adapter->dev, sg_dma_address(sg),
 			DMA_BUFFER_SIZE, DMA_FROM_DEVICE);
 	sg_dma_address(sg) = 0;
-	if (ctl->skb)
-		dev_kfree_skb(ctl->skb);
-
+	dev_kfree_skb(ctl->skb);
 	ctl->skb = NULL;

 	printk(KERN_ERR DRV_NAME": Failed to start RX DMA: %d\n", err);
diff --git a/drivers/net/ethernet/microchip/lan743x_ptp.c b/drivers/net/ethernet/microchip/lan743x_ptp.c
index b2109eca81fd..57b26c2acf87 100644
--- a/drivers/net/ethernet/microchip/lan743x_ptp.c
+++ b/drivers/net/ethernet/microchip/lan743x_ptp.c
@@ -963,8 +963,7 @@ void lan743x_ptp_close(struct lan743x_adapter *adapter)
 		index++) {
 		struct sk_buff *skb = ptp->tx_ts_skb_queue[index];

-		if (skb)
-			dev_kfree_skb(skb);
+		dev_kfree_skb(skb);
 		ptp->tx_ts_skb_queue[index] = NULL;
 		ptp->tx_ts_seconds_queue[index] = 0;
 		ptp->tx_ts_nseconds_queue[index] = 0;
diff --git a/drivers/net/ethernet/packetengines/yellowfin.c b/drivers/net/ethernet/packetengines/yellowfin.c
index 6f8d6584f809..5113ee647090 100644
--- a/drivers/net/ethernet/packetengines/yellowfin.c
+++ b/drivers/net/ethernet/packetengines/yellowfin.c
@@ -1258,8 +1258,7 @@ static int yellowfin_close(struct net_device *dev)
 		yp->rx_skbuff[i] = NULL;
 	}
 	for (i = 0; i < TX_RING_SIZE; i++) {
-		if (yp->tx_skbuff[i])
-			dev_kfree_skb(yp->tx_skbuff[i]);
+		dev_kfree_skb(yp->tx_skbuff[i]);
 		yp->tx_skbuff[i] = NULL;
 	}

diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c
index b28360bc2255..5ecf61df78bd 100644
--- a/drivers/net/ethernet/qualcomm/qca_spi.c
+++ b/drivers/net/ethernet/qualcomm/qca_spi.c
@@ -837,8 +837,7 @@ qcaspi_netdev_uninit(struct net_device *dev)

 	kfree(qca->rx_buffer);
 	qca->buffer_size = 0;
-	if (qca->rx_skb)
-		dev_kfree_skb(qca->rx_skb);
+	dev_kfree_skb(qca->rx_skb);
 }

 static const struct net_device_ops qcaspi_netdev_ops = {
diff --git a/drivers/net/ethernet/qualcomm/qca_uart.c b/drivers/net/ethernet/qualcomm/qca_uart.c
index 590616846cd1..0981068504fa 100644
--- a/drivers/net/ethernet/qualcomm/qca_uart.c
+++ b/drivers/net/ethernet/qualcomm/qca_uart.c
@@ -285,8 +285,7 @@ static void qcauart_netdev_uninit(struct net_device *dev)
 {
 	struct qcauart *qca = netdev_priv(dev);

-	if (qca->rx_skb)
-		dev_kfree_skb(qca->rx_skb);
+	dev_kfree_skb(qca->rx_skb);
 }

 static const struct net_device_ops qcauart_netdev_ops = {
diff --git a/drivers/net/ethernet/sgi/meth.c b/drivers/net/ethernet/sgi/meth.c
index 00660dd820e2..539bc5db989c 100644
--- a/drivers/net/ethernet/sgi/meth.c
+++ b/drivers/net/ethernet/sgi/meth.c
@@ -247,8 +247,7 @@ static void meth_free_tx_ring(struct meth_private *priv)

 	/* Remove any pending skb */
 	for (i = 0; i < TX_RING_ENTRIES; i++) {
-		if (priv->tx_skbs[i])
-			dev_kfree_skb(priv->tx_skbs[i]);
+		dev_kfree_skb(priv->tx_skbs[i]);
 		priv->tx_skbs[i] = NULL;
 	}
 	dma_free_coherent(&priv->pdev->dev, TX_RING_BUFFER_SIZE, priv->tx_ring,
diff --git a/drivers/net/ethernet/smsc/smc91x.c b/drivers/net/ethernet/smsc/smc91x.c
index 601e76ad99a0..3a6761131f4c 100644
--- a/drivers/net/ethernet/smsc/smc91x.c
+++ b/drivers/net/ethernet/smsc/smc91x.c
@@ -378,8 +378,7 @@ static void smc_shutdown(struct net_device *dev)
 	pending_skb = lp->pending_tx_skb;
 	lp->pending_tx_skb = NULL;
 	spin_unlock_irq(&lp->lock);
-	if (pending_skb)
-		dev_kfree_skb(pending_skb);
+	dev_kfree_skb(pending_skb);

 	/* and tell the card to stay away from that nasty outside world */
 	SMC_SELECT_BANK(lp, 0);
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index bd1078433448..06ccd216ae90 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3519,8 +3519,7 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 		if (unlikely(error && (status & rx_not_ls)))
 			goto read_again;
 		if (unlikely(error)) {
-			if (skb)
-				dev_kfree_skb(skb);
+			dev_kfree_skb(skb);
 			continue;
 		}

diff --git a/drivers/net/ethernet/sun/sunvnet_common.c b/drivers/net/ethernet/sun/sunvnet_common.c
index 646e67236b65..8b94d9ad9e2b 100644
--- a/drivers/net/ethernet/sun/sunvnet_common.c
+++ b/drivers/net/ethernet/sun/sunvnet_common.c
@@ -1532,8 +1532,7 @@ sunvnet_start_xmit_common(struct sk_buff *skb, struct net_device *dev,
 	else if (port)
 		del_timer(&port->clean_timer);
 	rcu_read_unlock();
-	if (skb)
-		dev_kfree_skb(skb);
+	dev_kfree_skb(skb);
 	vnet_free_skbs(freeskbs);
 	dev->stats.tx_dropped++;
 	return NETDEV_TX_OK;
--
2.23.0


^ permalink raw reply related

* Re: [PATCH spi for-5.4 3/5] spi: spi-fsl-dspi: Use poll mode in case the platform IRQ is missing
From: Mark Brown @ 2019-08-22 17:38 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli,
	linux-spi, netdev
In-Reply-To: <20190818182600.3047-4-olteanv@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 435 bytes --]

On Sun, Aug 18, 2019 at 09:25:58PM +0300, Vladimir Oltean wrote:
> On platforms like LS1021A which use TCFQ mode, an interrupt needs to be
> processed after each byte is TXed/RXed. I tried to make the DSPI
> implementation on this SoC operate in other, more efficient modes (EOQ,
> DMA) but it looks like it simply isn't possible.

This doesn't apply against current code (I guess due to your cleanup
series), please check and resend.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH spi for-5.4 2/5] spi: Add a PTP system timestamp to the transfer structure
From: Mark Brown @ 2019-08-22 17:11 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli,
	linux-spi, netdev
In-Reply-To: <20190818182600.3047-3-olteanv@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1695 bytes --]

On Sun, Aug 18, 2019 at 09:25:57PM +0300, Vladimir Oltean wrote:

> @@ -1391,6 +1402,13 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
>  		goto out;
>  	}
>  
> +	if (!ctlr->ptp_sts_supported) {
> +		list_for_each_entry(xfer, &mesg->transfers, transfer_list) {
> +			xfer->ptp_sts_word_pre = 0;
> +			ptp_read_system_prets(xfer->ptp_sts);
> +		}
> +	}
> +

We can do better than this for controllers which use transfer_one().

> +const void *spi_xfer_ptp_sts_word(struct spi_transfer *xfer, bool pre)
> +{

xfer can be const here too.

> + * @ptp_sts_supported: If the driver sets this to true, it must provide a
> + *	time snapshot in @spi_transfer->ptp_sts as close as possible to the
> + *	moment in time when @spi_transfer->ptp_sts_word_pre and
> + *	@spi_transfer->ptp_sts_word_post were transmitted.
> + *	If the driver does not set this, the SPI core takes the snapshot as
> + *	close to the driver hand-over as possible.

A couple of issues here.  The big one is that for PIO transfers
this is going to either complicate the code or introduce overhead
in individual drivers for an extremely niche use case.  I guess
most drivers won't implement it which makes this a bit moot but
then this is a concern that pushes back against the idea of
implementing the feature.

The other is that it's not 100% clear what you're looking to
timestamp here - is it when the data goes on the wire, is it when
the data goes on the FIFO (which could be relatively large)?  I'm
guessing you're looking for the physical transfer here, if that's
the case should there be some effort to compensate for the delays
in the controller?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net] openvswitch: Fix conntrack cache with timeout
From: kbuild test robot @ 2019-08-22 18:11 UTC (permalink / raw)
  To: Yi-Hung Wei; +Cc: kbuild-all, netdev, pshelar, Yi-Hung Wei
In-Reply-To: <1566432854-35880-1-git-send-email-yihung.wei@gmail.com>

Hi Yi-Hung,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net/master]

url:    https://github.com/0day-ci/linux/commits/Yi-Hung-Wei/openvswitch-Fix-conntrack-cache-with-timeout/20190822-212539
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-rc1-7-g2b96cd8-dirty
        make ARCH=x86_64 allmodconfig
        make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

   include/linux/sched.h:609:43: sparse: sparse: bad integer constant expression
   include/linux/sched.h:609:73: sparse: sparse: invalid named zero-width bitfield `value'
   include/linux/sched.h:610:43: sparse: sparse: bad integer constant expression
   include/linux/sched.h:610:67: sparse: sparse: invalid named zero-width bitfield `bucket_id'
>> net/openvswitch/conntrack.c:706:41: sparse: sparse: incompatible types in comparison expression (different address spaces):
>> net/openvswitch/conntrack.c:706:41: sparse:    struct nf_ct_timeout *
>> net/openvswitch/conntrack.c:706:41: sparse:    struct nf_ct_timeout [noderef] <asn:4> *

vim +706 net/openvswitch/conntrack.c

   670	
   671	/* Determine whether skb->_nfct is equal to the result of conntrack lookup. */
   672	static bool skb_nfct_cached(struct net *net,
   673				    const struct sw_flow_key *key,
   674				    const struct ovs_conntrack_info *info,
   675				    struct sk_buff *skb)
   676	{
   677		enum ip_conntrack_info ctinfo;
   678		struct nf_conn *ct;
   679		bool ct_executed = true;
   680	
   681		ct = nf_ct_get(skb, &ctinfo);
   682		if (!ct)
   683			ct = ovs_ct_executed(net, key, info, skb, &ct_executed);
   684	
   685		if (ct)
   686			nf_ct_get(skb, &ctinfo);
   687		else
   688			return false;
   689	
   690		if (!net_eq(net, read_pnet(&ct->ct_net)))
   691			return false;
   692		if (!nf_ct_zone_equal_any(info->ct, nf_ct_zone(ct)))
   693			return false;
   694		if (info->helper) {
   695			struct nf_conn_help *help;
   696	
   697			help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER);
   698			if (help && rcu_access_pointer(help->helper) != info->helper)
   699				return false;
   700		}
   701		if (info->nf_ct_timeout) {
   702			struct nf_conn_timeout *timeout_ext;
   703	
   704			timeout_ext = nf_ct_timeout_find(ct);
   705			if (!timeout_ext ||
 > 706			    info->nf_ct_timeout != timeout_ext->timeout)
   707				return false;
   708		}
   709		/* Force conntrack entry direction to the current packet? */
   710		if (info->force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
   711			/* Delete the conntrack entry if confirmed, else just release
   712			 * the reference.
   713			 */
   714			if (nf_ct_is_confirmed(ct))
   715				nf_ct_delete(ct, 0, 0);
   716	
   717			nf_conntrack_put(&ct->ct_general);
   718			nf_ct_set(skb, NULL, 0);
   719			return false;
   720		}
   721	
   722		return ct_executed;
   723	}
   724	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* Re: [PATCH] nl80211: add NL80211_CMD_UPDATE_FT_IES to supported commands
From: Johannes Berg @ 2019-08-22 18:08 UTC (permalink / raw)
  To: Brian Norris, Matthew Wang; +Cc: davem, linux-wireless, netdev, linux-kernel
In-Reply-To: <20190822180737.GA177276@google.com>

On Thu, 2019-08-22 at 11:07 -0700, Brian Norris wrote:
> On Thu, Aug 22, 2019 at 10:48:06AM -0700, Matthew Wang wrote:
> > Add NL80211_CMD_UPDATE_FT_IES to supported commands. In mac80211 drivers,
> > this can be implemented via existing NL80211_CMD_AUTHENTICATE and
> > NL80211_ATTR_IE, but non-mac80211 drivers have a separate command for
> > this. A driver supports FT if it either is mac80211 or supports this
> > command.
> > 
> > Signed-off-by: Matthew Wang <matthewmwang@chromium.org>
> 
> FWIW:
> 
> Reviewed-by: Brian Norris <briannorris@chromium.org>
> 
> > Change-Id: I93e3d09a6d949466d1aea48bff2c3ad862edccc6
> 
> Oops :)

:)

No worries, I can edit that out.

johannes


^ permalink raw reply

* Re: [PATCH] nl80211: add NL80211_CMD_UPDATE_FT_IES to supported commands
From: Brian Norris @ 2019-08-22 18:07 UTC (permalink / raw)
  To: Matthew Wang; +Cc: johannes, davem, linux-wireless, netdev, linux-kernel
In-Reply-To: <20190822174806.2954-1-matthewmwang@chromium.org>

On Thu, Aug 22, 2019 at 10:48:06AM -0700, Matthew Wang wrote:
> Add NL80211_CMD_UPDATE_FT_IES to supported commands. In mac80211 drivers,
> this can be implemented via existing NL80211_CMD_AUTHENTICATE and
> NL80211_ATTR_IE, but non-mac80211 drivers have a separate command for
> this. A driver supports FT if it either is mac80211 or supports this
> command.
> 
> Signed-off-by: Matthew Wang <matthewmwang@chromium.org>

FWIW:

Reviewed-by: Brian Norris <briannorris@chromium.org>

> Change-Id: I93e3d09a6d949466d1aea48bff2c3ad862edccc6

Oops :)

^ permalink raw reply

* [PATCH] nl80211: add NL80211_CMD_UPDATE_FT_IES to supported commands
From: Matthew Wang @ 2019-08-22 17:48 UTC (permalink / raw)
  To: johannes; +Cc: davem, linux-wireless, netdev, linux-kernel, Matthew Wang

Add NL80211_CMD_UPDATE_FT_IES to supported commands. In mac80211 drivers,
this can be implemented via existing NL80211_CMD_AUTHENTICATE and
NL80211_ATTR_IE, but non-mac80211 drivers have a separate command for
this. A driver supports FT if it either is mac80211 or supports this
command.

Signed-off-by: Matthew Wang <matthewmwang@chromium.org>
Change-Id: I93e3d09a6d949466d1aea48bff2c3ad862edccc6
---
 net/wireless/nl80211.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index fd05ae1437a9..c2f9e6b429b2 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -2065,6 +2065,7 @@ static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
 				CMD(add_tx_ts, ADD_TX_TS);
 			CMD(set_multicast_to_unicast, SET_MULTICAST_TO_UNICAST);
 			CMD(update_connect_params, UPDATE_CONNECT_PARAMS);
+			CMD(update_ft_ies, UPDATE_FT_IES);
 		}
 #undef CMD
 
-- 
2.23.0.187.g17f5b7556c-goog


^ permalink raw reply related

* Re: [net] devlink: Add method for time-stamp on reporter's dump
From: Ido Schimmel @ 2019-08-22 17:40 UTC (permalink / raw)
  To: Andrew Lunn, arnd
  Cc: Aya Levin, David S. Miller, Jiri Pirko, netdev, linux-kernel
In-Reply-To: <20190822140635.GH13020@lunn.ch>

On Thu, Aug 22, 2019 at 04:06:35PM +0200, Andrew Lunn wrote:
> On Thu, Aug 22, 2019 at 11:17:51AM +0300, Aya Levin wrote:
> > When setting the dump's time-stamp, use ktime_get_real in addition to
> > jiffies. This simplifies the user space implementation and bypasses
> > some inconsistent behavior with translating jiffies to current time.
> 
> Hi Aya
> 
> Is this year 2038 safe? I don't know enough about this to answer the
> question myself.

Hi Andrew,

Good point. 'struct timespec' is not considered year 2038 safe and
unfortunately I recently made the mistake of using it to communicate
timestamps to user space over netlink. :/ The code is still in net-next,
so I will fix it while I can.

Arnd, would it be acceptable to use 'struct __kernel_timespec' instead?

Thanks

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox