Netdev List

Netdev List
 help / color / mirror / Atom feed

* RE: [PATCH net-next v3 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-09  7:38 UTC (permalink / raw)
  To: Ilias Apalodimas, David Miller
  Cc: Jose.Abreu@synopsys.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org, Joao.Pinto@synopsys.com,
	peppe.cavallaro@st.com, alexandre.torgue@st.com,
	brouer@redhat.com, arnd@arndb.de
In-Reply-To: <20190709072356.GA4599@apalos>

From: Ilias Apalodimas <ilias.apalodimas@linaro.org> | Date: Tue, Jul 09, 2019 at 08:23:56

> The patch from Ivan did get merged, can you change the free call to
> page_pool_destroy and re-spin? You can add my acked-by

Yes, I will re-spin then. Thanks!

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH 1/2] forcedeth: add recv cache make nic work steadily
From: Yanjun Zhu @ 2019-07-09  7:38 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, davem
In-Reply-To: <20190708135257.18200316@cakuba.netronome.com>


On 2019/7/9 4:52, Jakub Kicinski wrote:
> On Fri,  5 Jul 2019 02:19:27 -0400, Zhu Yanjun wrote:
>> A recv cache is added. The size of recv cache is 1000Mb / skb_length.
>> When the system memory is not enough, this recv cache can make nic work
>> steadily.
>> When nic is up, this recv cache and work queue are created. When nic
>> is down, this recv cache will be destroyed and delayed workqueue is
>> canceled.
>> When nic is polled or rx interrupt is triggerred, rx handler will
>> get a skb from recv cache. Then the state of recv cache is checked.
>> If recv cache is not in filling up state, a work is queued to fill
>> up recv cache.
>> When skb size is changed, the old recv cache is destroyed and new recv
>> cache is created.
>> When the system memory is not enough, the allocation of skb failed.
>> recv cache will continue allocate skb until the recv cache is filled up.
>> When the system memory is not enough, this can make nic work steadily.
>> Becase of recv cache, the performance of nic is enhanced.
>>
>> CC: Joe Jin <joe.jin@oracle.com>
>> CC: Junxiao Bi <junxiao.bi@oracle.com>
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> Could you tell us a little more about the use case and the system
> condition?

When the host run for long time, there are a lot of memory fragments in 
hosts.

In this condition, this patch can help us a lot.

>
>> diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
>> index b327b29..a673005 100644
>> --- a/drivers/net/ethernet/nvidia/forcedeth.c
>> +++ b/drivers/net/ethernet/nvidia/forcedeth.c
>> @@ -674,6 +674,11 @@ struct nv_ethtool_stats {
>>   	u64 tx_broadcast;
>>   };
>>   
>> +/* 1000Mb is 125M bytes, 125 * 1024 * 1024 bytes
>> + * The length of recv cache is 125M / skb_length
>> + */
>> +#define RECV_CACHE_LIST_LENGTH		(125 * 1024 * 1024 / np->rx_buf_sz)
>> +
>>   #define NV_DEV_STATISTICS_V3_COUNT (sizeof(struct nv_ethtool_stats)/sizeof(u64))
>>   #define NV_DEV_STATISTICS_V2_COUNT (NV_DEV_STATISTICS_V3_COUNT - 3)
>>   #define NV_DEV_STATISTICS_V1_COUNT (NV_DEV_STATISTICS_V2_COUNT - 6)
>> @@ -844,8 +849,18 @@ struct fe_priv {
>>   	char name_rx[IFNAMSIZ + 3];       /* -rx    */
>>   	char name_tx[IFNAMSIZ + 3];       /* -tx    */
>>   	char name_other[IFNAMSIZ + 6];    /* -other */
>> +
>> +	/* This is to schedule work */
>> +	struct delayed_work     recv_cache_work;
>> +	/* This list is to store skb queue for recv */
>> +	struct sk_buff_head recv_list;
>> +	unsigned long nv_recv_list_state;
>>   };
>>   
>> +/* This is recv list state to fill up recv cache */
>> +enum recv_list_state {
>> +	RECV_LIST_ALLOCATE
>> +};
>>   /*
>>    * Maximum number of loops until we assume that a bit in the irq mask
>>    * is stuck. Overridable with module param.
>> @@ -1804,7 +1819,11 @@ static int nv_alloc_rx(struct net_device *dev)
>>   		less_rx = np->last_rx.orig;
>>   
>>   	while (np->put_rx.orig != less_rx) {
>> -		struct sk_buff *skb = netdev_alloc_skb(dev, np->rx_buf_sz + NV_RX_ALLOC_PAD);
>> +		struct sk_buff *skb = skb_dequeue(&np->recv_list);
>> +
>> +		if (!test_bit(RECV_LIST_ALLOCATE, &np->nv_recv_list_state))
>> +			schedule_delayed_work(&np->recv_cache_work, 0);
> Interesting, this seems to be coming up in multiple places recently..
>
> Could you explain why you have your own RECV_LIST_ALLOCATE bit here?
> Workqueue implementation itself uses an atomic bit to avoid scheduling
> work mutliple times:

This can avoid function call overhead to optimize the recv process.

Thanks

Zhu Yanjun

>
> bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
> 			   struct delayed_work *dwork, unsigned long delay)
> {
> 	struct work_struct *work = &dwork->work;
> 	bool ret = false;
> 	unsigned long flags;
>
> 	/* read the comment in __queue_work() */
> 	local_irq_save(flags);
>
> 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
> 		__queue_delayed_work(cpu, wq, dwork, delay);
> 		ret = true;
> 	}
>
> 	local_irq_restore(flags);
> 	return ret;
> }
> EXPORT_SYMBOL(queue_delayed_work_on);
>
>>   		if (likely(skb)) {
>>   			np->put_rx_ctx->skb = skb;
>>   			np->put_rx_ctx->dma = dma_map_single(&np->pci_dev->dev,
>> @@ -1845,7 +1864,11 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
>>   		less_rx = np->last_rx.ex;
>>   
>>   	while (np->put_rx.ex != less_rx) {
>> -		struct sk_buff *skb = netdev_alloc_skb(dev, np->rx_buf_sz + NV_RX_ALLOC_PAD);
>> +		struct sk_buff *skb = skb_dequeue(&np->recv_list);
>> +
>> +		if (!test_bit(RECV_LIST_ALLOCATE, &np->nv_recv_list_state))
>> +			schedule_delayed_work(&np->recv_cache_work, 0);
> It seems a little heavy to schedule this work on every packet, would it
> make sense to add this in nv_napi_poll() instead?
>
>>   		if (likely(skb)) {
>>   			np->put_rx_ctx->skb = skb;
>>   			np->put_rx_ctx->dma = dma_map_single(&np->pci_dev->dev,
>> @@ -1957,6 +1980,40 @@ static void nv_init_tx(struct net_device *dev)
>>   	}
>>   }
>>   
>> +static void nv_init_recv_cache(struct net_device *dev)
>> +{
>> +	struct fe_priv *np = netdev_priv(dev);
>> +
>> +	skb_queue_head_init(&np->recv_list);
>> +	while (skb_queue_len(&np->recv_list) < RECV_CACHE_LIST_LENGTH) {
>> +		struct sk_buff *skb = netdev_alloc_skb(dev,
>> +				 np->rx_buf_sz + NV_RX_ALLOC_PAD);
>> +		/* skb is null. This indicates that memory is not
>> +		 * enough.
>> +		 */
>> +		if (unlikely(!skb)) {
>> +			ndelay(3);
>> +			continue;
> Does this path ever hit?  Seems like doing an ndelay() and retrying
> allocation is not the best idea for non-preempt kernel.
>
> Also perhaps you should consider using __netdev_alloc_skb() and passing
> GFP_KERNEL, that way the system has a chance to go into memory reclaim
> (I presume this function can sleep).
>
>> +		}
>> +
>> +		skb_queue_tail(&np->recv_list, skb);
>> +	}
>> +}
>> +
>> +static void nv_destroy_recv_cache(struct net_device *dev)
>> +{
>> +	struct sk_buff *skb;
>> +	struct fe_priv *np = netdev_priv(dev);
>> +
>> +	cancel_delayed_work_sync(&np->recv_cache_work);
>> +	WARN_ON(delayed_work_pending(&np->recv_cache_work));
>> +
>> +	while ((skb = skb_dequeue(&np->recv_list)))
>> +		kfree_skb(skb);
> skb_queue_purge()
>
>> +	WARN_ON(skb_queue_len(&np->recv_list));
>> +}
>> +
>>   static int nv_init_ring(struct net_device *dev)
>>   {
>>   	struct fe_priv *np = netdev_priv(dev);
>> @@ -3047,6 +3104,8 @@ static int nv_change_mtu(struct net_device *dev, int new_mtu)
>>   		nv_drain_rxtx(dev);
>>   		/* reinit driver view of the rx queue */
>>   		set_bufsize(dev);
>> +		nv_destroy_recv_cache(dev);
>> +		nv_init_recv_cache(dev);
>>   		if (nv_init_ring(dev)) {
>>   			if (!np->in_shutdown)
>>   				mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
>> @@ -4074,6 +4133,32 @@ static void nv_free_irq(struct net_device *dev)
>>   	}
>>   }
>>   
>> +static void nv_recv_cache_worker(struct work_struct *work)
>> +{
>> +	struct fe_priv *np = container_of(work, struct fe_priv,
>> +					  recv_cache_work.work);
>> +
>> +	set_bit(RECV_LIST_ALLOCATE, &np->nv_recv_list_state);
>> +	while (skb_queue_len(&np->recv_list) < RECV_CACHE_LIST_LENGTH) {
>> +		struct sk_buff *skb = netdev_alloc_skb(np->dev,
>> +				np->rx_buf_sz + NV_RX_ALLOC_PAD);
>> +		/* skb is null. This indicates that memory is not
>> +		 * enough.
>> +		 * When the system memory is not enough, the kernel
>> +		 * will compact memory or drop caches. At that time,
>> +		 * if memory allocation fails, it had better wait some
>> +		 * time for memory.
>> +		 */
>> +		if (unlikely(!skb)) {
>> +			ndelay(3);
>> +			continue;
> Same comments as for the init function.
>
>> +		}
>> +
>> +		skb_queue_tail(&np->recv_list, skb);
>> +	}
>> +	clear_bit(RECV_LIST_ALLOCATE, &np->nv_recv_list_state);
>> +}
>> +
>>   static void nv_do_nic_poll(struct timer_list *t)
>>   {
>>   	struct fe_priv *np = from_timer(np, t, nic_poll);
>> @@ -4129,6 +4214,8 @@ static void nv_do_nic_poll(struct timer_list *t)
>>   			nv_drain_rxtx(dev);
>>   			/* reinit driver view of the rx queue */
>>   			set_bufsize(dev);
>> +			nv_destroy_recv_cache(dev);
>> +			nv_init_recv_cache(dev);
>>   			if (nv_init_ring(dev)) {
>>   				if (!np->in_shutdown)
>>   					mod_timer(&np->oom_kick, jiffies + OOM_REFILL);

^ permalink raw reply

* [PATCH net-next] bnxt_en: Add page_pool_destroy() during RX ring cleanup.
From: Michael Chan @ 2019-07-09  7:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, Ilias Apalodimas, Andy Gospodarek

Add page_pool_destroy() in bnxt_free_rx_rings() during normal RX ring
cleanup, as Ilias has informed us that the following commit has been
merged:

1da4bbeffe41 ("net: core: page_pool: add user refcnt and reintroduce page_pool_destroy")

The special error handling code to call page_pool_free() can now be
removed.  bnxt_free_rx_rings() will always be called during normal
shutdown or any error paths.

Fixes: 322b87ca55f2 ("bnxt_en: add page_pool support")
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index e9d3bd8..2b5b0ab 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2500,6 +2500,7 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
 		if (xdp_rxq_info_is_reg(&rxr->xdp_rxq))
 			xdp_rxq_info_unreg(&rxr->xdp_rxq);
 
+		page_pool_destroy(rxr->page_pool);
 		rxr->page_pool = NULL;
 
 		kfree(rxr->rx_tpa);
@@ -2560,19 +2561,14 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
 			return rc;
 
 		rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i);
-		if (rc < 0) {
-			page_pool_free(rxr->page_pool);
-			rxr->page_pool = NULL;
+		if (rc < 0)
 			return rc;
-		}
 
 		rc = xdp_rxq_info_reg_mem_model(&rxr->xdp_rxq,
 						MEM_TYPE_PAGE_POOL,
 						rxr->page_pool);
 		if (rc) {
 			xdp_rxq_info_unreg(&rxr->xdp_rxq);
-			page_pool_free(rxr->page_pool);
-			rxr->page_pool = NULL;
 			return rc;
 		}
 
-- 
2.5.1


^ permalink raw reply related

* [PATCH net-next v4 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-09  8:03 UTC (permalink / raw)
  To: netdev
  Cc: Joao Pinto, Jose Abreu, Giuseppe Cavallaro, Alexandre Torgue,
	David S. Miller, Maxime Coquelin, linux-stm32, linux-arm-kernel,
	linux-kernel, Ilias Apalodimas, Jesper Dangaard Brouer,
	Arnd Bergmann
In-Reply-To: <cover.1562659012.git.joabreu@synopsys.com>

Mapping and unmapping DMA region is an high bottleneck in stmmac driver,
specially in the RX path.

This commit introduces support for Page Pool API and uses it in all RX
queues. With this change, we get more stable troughput and some increase
of banwidth with iperf:
	- MAC1000 - 950 Mbps
	- XGMAC: 9.22 Gbps

Changes from v3:
	- Use page_pool_destroy() (Ilias)
Changes from v2:
	- Uncoditionally call page_pool_free() (Jesper)
Changes from v1:
	- Use page_pool_get_dma_addr() (Jesper)
	- Add a comment (Jesper)
	- Add page_pool_free() call (Jesper)
	- Reintroduce sync_single_for_device (Arnd / Ilias)

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

---
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Jose Abreu <joabreu@synopsys.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
---
 drivers/net/ethernet/stmicro/stmmac/Kconfig       |   1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  10 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 203 +++++++---------------
 3 files changed, 70 insertions(+), 144 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig b/drivers/net/ethernet/stmicro/stmmac/Kconfig
index 943189dcccb1..2325b40dff6e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Kconfig
+++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig
@@ -3,6 +3,7 @@ config STMMAC_ETH
 	tristate "STMicroelectronics Multi-Gigabit Ethernet driver"
 	depends on HAS_IOMEM && HAS_DMA
 	select MII
+	select PAGE_POOL
 	select PHYLINK
 	select CRC32
 	imply PTP_1588_CLOCK
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 513f4e2df5f6..5cd966c154f3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -20,6 +20,7 @@
 #include <linux/ptp_clock_kernel.h>
 #include <linux/net_tstamp.h>
 #include <linux/reset.h>
+#include <net/page_pool.h>
 
 struct stmmac_resources {
 	void __iomem *addr;
@@ -54,14 +55,19 @@ struct stmmac_tx_queue {
 	u32 mss;
 };
 
+struct stmmac_rx_buffer {
+	struct page *page;
+	dma_addr_t addr;
+};
+
 struct stmmac_rx_queue {
 	u32 rx_count_frames;
 	u32 queue_index;
+	struct page_pool *page_pool;
+	struct stmmac_rx_buffer *buf_pool;
 	struct stmmac_priv *priv_data;
 	struct dma_extended_desc *dma_erx;
 	struct dma_desc *dma_rx ____cacheline_aligned_in_smp;
-	struct sk_buff **rx_skbuff;
-	dma_addr_t *rx_skbuff_dma;
 	unsigned int cur_rx;
 	unsigned int dirty_rx;
 	u32 rx_zeroc_thresh;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c142e9367a68..00f2df304e28 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1197,26 +1197,14 @@ static int stmmac_init_rx_buffers(struct stmmac_priv *priv, struct dma_desc *p,
 				  int i, gfp_t flags, u32 queue)
 {
 	struct stmmac_rx_queue *rx_q = &priv->rx_queue[queue];
-	struct sk_buff *skb;
+	struct stmmac_rx_buffer *buf = &rx_q->buf_pool[i];
 
-	skb = __netdev_alloc_skb_ip_align(priv->dev, priv->dma_buf_sz, flags);
-	if (!skb) {
-		netdev_err(priv->dev,
-			   "%s: Rx init fails; skb is NULL\n", __func__);
+	buf->page = page_pool_dev_alloc_pages(rx_q->page_pool);
+	if (!buf->page)
 		return -ENOMEM;
-	}
-	rx_q->rx_skbuff[i] = skb;
-	rx_q->rx_skbuff_dma[i] = dma_map_single(priv->device, skb->data,
-						priv->dma_buf_sz,
-						DMA_FROM_DEVICE);
-	if (dma_mapping_error(priv->device, rx_q->rx_skbuff_dma[i])) {
-		netdev_err(priv->dev, "%s: DMA mapping error\n", __func__);
-		dev_kfree_skb_any(skb);
-		return -EINVAL;
-	}
-
-	stmmac_set_desc_addr(priv, p, rx_q->rx_skbuff_dma[i]);
 
+	buf->addr = page_pool_get_dma_addr(buf->page);
+	stmmac_set_desc_addr(priv, p, buf->addr);
 	if (priv->dma_buf_sz == BUF_SIZE_16KiB)
 		stmmac_init_desc3(priv, p);
 
@@ -1232,13 +1220,11 @@ static int stmmac_init_rx_buffers(struct stmmac_priv *priv, struct dma_desc *p,
 static void stmmac_free_rx_buffer(struct stmmac_priv *priv, u32 queue, int i)
 {
 	struct stmmac_rx_queue *rx_q = &priv->rx_queue[queue];
+	struct stmmac_rx_buffer *buf = &rx_q->buf_pool[i];
 
-	if (rx_q->rx_skbuff[i]) {
-		dma_unmap_single(priv->device, rx_q->rx_skbuff_dma[i],
-				 priv->dma_buf_sz, DMA_FROM_DEVICE);
-		dev_kfree_skb_any(rx_q->rx_skbuff[i]);
-	}
-	rx_q->rx_skbuff[i] = NULL;
+	if (buf->page)
+		page_pool_put_page(rx_q->page_pool, buf->page, false);
+	buf->page = NULL;
 }
 
 /**
@@ -1321,10 +1307,6 @@ static int init_dma_rx_desc_rings(struct net_device *dev, gfp_t flags)
 						     queue);
 			if (ret)
 				goto err_init_rx_buffers;
-
-			netif_dbg(priv, probe, priv->dev, "[%p]\t[%p]\t[%x]\n",
-				  rx_q->rx_skbuff[i], rx_q->rx_skbuff[i]->data,
-				  (unsigned int)rx_q->rx_skbuff_dma[i]);
 		}
 
 		rx_q->cur_rx = 0;
@@ -1498,8 +1480,11 @@ static void free_dma_rx_desc_resources(struct stmmac_priv *priv)
 					  sizeof(struct dma_extended_desc),
 					  rx_q->dma_erx, rx_q->dma_rx_phy);
 
-		kfree(rx_q->rx_skbuff_dma);
-		kfree(rx_q->rx_skbuff);
+		kfree(rx_q->buf_pool);
+		if (rx_q->page_pool) {
+			page_pool_request_shutdown(rx_q->page_pool);
+			page_pool_destroy(rx_q->page_pool);
+		}
 	}
 }
 
@@ -1551,20 +1536,29 @@ static int alloc_dma_rx_desc_resources(struct stmmac_priv *priv)
 	/* RX queues buffers and DMA */
 	for (queue = 0; queue < rx_count; queue++) {
 		struct stmmac_rx_queue *rx_q = &priv->rx_queue[queue];
+		struct page_pool_params pp_params = { 0 };
 
 		rx_q->queue_index = queue;
 		rx_q->priv_data = priv;
 
-		rx_q->rx_skbuff_dma = kmalloc_array(DMA_RX_SIZE,
-						    sizeof(dma_addr_t),
-						    GFP_KERNEL);
-		if (!rx_q->rx_skbuff_dma)
+		pp_params.flags = PP_FLAG_DMA_MAP;
+		pp_params.pool_size = DMA_RX_SIZE;
+		pp_params.order = DIV_ROUND_UP(priv->dma_buf_sz, PAGE_SIZE);
+		pp_params.nid = dev_to_node(priv->device);
+		pp_params.dev = priv->device;
+		pp_params.dma_dir = DMA_FROM_DEVICE;
+
+		rx_q->page_pool = page_pool_create(&pp_params);
+		if (IS_ERR(rx_q->page_pool)) {
+			ret = PTR_ERR(rx_q->page_pool);
+			rx_q->page_pool = NULL;
 			goto err_dma;
+		}
 
-		rx_q->rx_skbuff = kmalloc_array(DMA_RX_SIZE,
-						sizeof(struct sk_buff *),
-						GFP_KERNEL);
-		if (!rx_q->rx_skbuff)
+		rx_q->buf_pool = kmalloc_array(DMA_RX_SIZE,
+					       sizeof(*rx_q->buf_pool),
+					       GFP_KERNEL);
+		if (!rx_q->buf_pool)
 			goto err_dma;
 
 		if (priv->extend_desc) {
@@ -3286,9 +3280,8 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv, u32 queue)
 	int dirty = stmmac_rx_dirty(priv, queue);
 	unsigned int entry = rx_q->dirty_rx;
 
-	int bfsize = priv->dma_buf_sz;
-
 	while (dirty-- > 0) {
+		struct stmmac_rx_buffer *buf = &rx_q->buf_pool[entry];
 		struct dma_desc *p;
 		bool use_rx_wd;
 
@@ -3297,49 +3290,22 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv, u32 queue)
 		else
 			p = rx_q->dma_rx + entry;
 
-		if (likely(!rx_q->rx_skbuff[entry])) {
-			struct sk_buff *skb;
-
-			skb = netdev_alloc_skb_ip_align(priv->dev, bfsize);
-			if (unlikely(!skb)) {
-				/* so for a while no zero-copy! */
-				rx_q->rx_zeroc_thresh = STMMAC_RX_THRESH;
-				if (unlikely(net_ratelimit()))
-					dev_err(priv->device,
-						"fail to alloc skb entry %d\n",
-						entry);
-				break;
-			}
-
-			rx_q->rx_skbuff[entry] = skb;
-			rx_q->rx_skbuff_dma[entry] =
-			    dma_map_single(priv->device, skb->data, bfsize,
-					   DMA_FROM_DEVICE);
-			if (dma_mapping_error(priv->device,
-					      rx_q->rx_skbuff_dma[entry])) {
-				netdev_err(priv->dev, "Rx DMA map failed\n");
-				dev_kfree_skb(skb);
+		if (!buf->page) {
+			buf->page = page_pool_dev_alloc_pages(rx_q->page_pool);
+			if (!buf->page)
 				break;
-			}
-
-			stmmac_set_desc_addr(priv, p, rx_q->rx_skbuff_dma[entry]);
-			stmmac_refill_desc3(priv, rx_q, p);
-
-			if (rx_q->rx_zeroc_thresh > 0)
-				rx_q->rx_zeroc_thresh--;
-
-			netif_dbg(priv, rx_status, priv->dev,
-				  "refill entry #%d\n", entry);
 		}
-		dma_wmb();
+
+		buf->addr = page_pool_get_dma_addr(buf->page);
+		stmmac_set_desc_addr(priv, p, buf->addr);
+		stmmac_refill_desc3(priv, rx_q, p);
 
 		rx_q->rx_count_frames++;
 		rx_q->rx_count_frames %= priv->rx_coal_frames;
 		use_rx_wd = priv->use_riwt && rx_q->rx_count_frames;
 
-		stmmac_set_rx_owner(priv, p, use_rx_wd);
-
 		dma_wmb();
+		stmmac_set_rx_owner(priv, p, use_rx_wd);
 
 		entry = STMMAC_GET_ENTRY(entry, DMA_RX_SIZE);
 	}
@@ -3364,9 +3330,6 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 	unsigned int next_entry = rx_q->cur_rx;
 	int coe = priv->hw->rx_csum;
 	unsigned int count = 0;
-	bool xmac;
-
-	xmac = priv->plat->has_gmac4 || priv->plat->has_xgmac;
 
 	if (netif_msg_rx_status(priv)) {
 		void *rx_head;
@@ -3380,11 +3343,12 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 		stmmac_display_ring(priv, rx_head, DMA_RX_SIZE, true);
 	}
 	while (count < limit) {
+		struct stmmac_rx_buffer *buf;
+		struct dma_desc *np, *p;
 		int entry, status;
-		struct dma_desc *p;
-		struct dma_desc *np;
 
 		entry = next_entry;
+		buf = &rx_q->buf_pool[entry];
 
 		if (priv->extend_desc)
 			p = (struct dma_desc *)(rx_q->dma_erx + entry);
@@ -3414,20 +3378,9 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 			stmmac_rx_extended_status(priv, &priv->dev->stats,
 					&priv->xstats, rx_q->dma_erx + entry);
 		if (unlikely(status == discard_frame)) {
+			page_pool_recycle_direct(rx_q->page_pool, buf->page);
 			priv->dev->stats.rx_errors++;
-			if (priv->hwts_rx_en && !priv->extend_desc) {
-				/* DESC2 & DESC3 will be overwritten by device
-				 * with timestamp value, hence reinitialize
-				 * them in stmmac_rx_refill() function so that
-				 * device can reuse it.
-				 */
-				dev_kfree_skb_any(rx_q->rx_skbuff[entry]);
-				rx_q->rx_skbuff[entry] = NULL;
-				dma_unmap_single(priv->device,
-						 rx_q->rx_skbuff_dma[entry],
-						 priv->dma_buf_sz,
-						 DMA_FROM_DEVICE);
-			}
+			buf->page = NULL;
 		} else {
 			struct sk_buff *skb;
 			int frame_len;
@@ -3467,58 +3420,20 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 					   frame_len, status);
 			}
 
-			/* The zero-copy is always used for all the sizes
-			 * in case of GMAC4 because it needs
-			 * to refill the used descriptors, always.
-			 */
-			if (unlikely(!xmac &&
-				     ((frame_len < priv->rx_copybreak) ||
-				     stmmac_rx_threshold_count(rx_q)))) {
-				skb = netdev_alloc_skb_ip_align(priv->dev,
-								frame_len);
-				if (unlikely(!skb)) {
-					if (net_ratelimit())
-						dev_warn(priv->device,
-							 "packet dropped\n");
-					priv->dev->stats.rx_dropped++;
-					continue;
-				}
-
-				dma_sync_single_for_cpu(priv->device,
-							rx_q->rx_skbuff_dma
-							[entry], frame_len,
-							DMA_FROM_DEVICE);
-				skb_copy_to_linear_data(skb,
-							rx_q->
-							rx_skbuff[entry]->data,
-							frame_len);
-
-				skb_put(skb, frame_len);
-				dma_sync_single_for_device(priv->device,
-							   rx_q->rx_skbuff_dma
-							   [entry], frame_len,
-							   DMA_FROM_DEVICE);
-			} else {
-				skb = rx_q->rx_skbuff[entry];
-				if (unlikely(!skb)) {
-					if (net_ratelimit())
-						netdev_err(priv->dev,
-							   "%s: Inconsistent Rx chain\n",
-							   priv->dev->name);
-					priv->dev->stats.rx_dropped++;
-					continue;
-				}
-				prefetch(skb->data - NET_IP_ALIGN);
-				rx_q->rx_skbuff[entry] = NULL;
-				rx_q->rx_zeroc_thresh++;
-
-				skb_put(skb, frame_len);
-				dma_unmap_single(priv->device,
-						 rx_q->rx_skbuff_dma[entry],
-						 priv->dma_buf_sz,
-						 DMA_FROM_DEVICE);
+			skb = netdev_alloc_skb_ip_align(priv->dev, frame_len);
+			if (unlikely(!skb)) {
+				priv->dev->stats.rx_dropped++;
+				continue;
 			}
 
+			dma_sync_single_for_cpu(priv->device, buf->addr,
+						frame_len, DMA_FROM_DEVICE);
+			skb_copy_to_linear_data(skb, page_address(buf->page),
+						frame_len);
+			skb_put(skb, frame_len);
+			dma_sync_single_for_device(priv->device, buf->addr,
+						   frame_len, DMA_FROM_DEVICE);
+
 			if (netif_msg_pktdata(priv)) {
 				netdev_dbg(priv->dev, "frame received (%dbytes)",
 					   frame_len);
@@ -3538,6 +3453,10 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 
 			napi_gro_receive(&ch->rx_napi, skb);
 
+			/* Data payload copied into SKB, page ready for recycle */
+			page_pool_recycle_direct(rx_q->page_pool, buf->page);
+			buf->page = NULL;
+
 			priv->dev->stats.rx_packets++;
 			priv->dev->stats.rx_bytes += frame_len;
 		}
-- 
2.7.4


^ permalink raw reply related

* [PATCH net-next v4 0/3] net: stmmac: Some improvements and a fix
From: Jose Abreu @ 2019-07-09  8:02 UTC (permalink / raw)
  To: netdev
  Cc: Joao Pinto, Jose Abreu, Giuseppe Cavallaro, Alexandre Torgue,
	David S. Miller, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-stm32, linux-arm-kernel, linux-kernel

Some performace improvements (01/03 and 03/03) and a fix (02/03), all for -next.

---
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Jose Abreu <joabreu@synopsys.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Maxime Ripard <maxime.ripard@bootlin.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: netdev@vger.kernel.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---

Jose Abreu (3):
  net: stmmac: Implement RX Coalesce Frames setting
  net: stmmac: Fix descriptors address being in > 32 bits address space
  net: stmmac: Introducing support for Page Pool

 drivers/net/ethernet/stmicro/stmmac/Kconfig        |   1 +
 drivers/net/ethernet/stmicro/stmmac/common.h       |   1 +
 drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c  |   8 +-
 .../net/ethernet/stmicro/stmmac/dwmac1000_dma.c    |   8 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c |   8 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c   |   8 +-
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h     |   2 +
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c |  10 +-
 drivers/net/ethernet/stmicro/stmmac/hwif.h         |   4 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac.h       |  12 +-
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   |   7 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 217 +++++++--------------
 12 files changed, 114 insertions(+), 172 deletions(-)

-- 
2.7.4


^ permalink raw reply

* [PATCH net-next v4 2/3] net: stmmac: Fix descriptors address being in > 32 bits address space
From: Jose Abreu @ 2019-07-09  8:02 UTC (permalink / raw)
  To: netdev
  Cc: Joao Pinto, Jose Abreu, Giuseppe Cavallaro, Alexandre Torgue,
	David S. Miller, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-stm32, linux-arm-kernel, linux-kernel
In-Reply-To: <cover.1562659012.git.joabreu@synopsys.com>

Commit a993db88d17d ("net: stmmac: Enable support for > 32 Bits
addressing in XGMAC"), introduced support for > 32 bits addressing in
XGMAC but the conversion of descriptors to dma_addr_t was left out.

As some devices assing coherent memory in regions > 32 bits we need to
set lower and upper value of descriptors address when initializing DMA
channels.

Luckly, this was working for me because I was assigning CMA to < 4GB
address space for performance reasons.

Fixes: a993db88d17d ("net: stmmac: Enable support for > 32 Bits addressing in XGMAC")
Signed-off-by: Jose Abreu <joabreu@synopsys.com>

---
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Jose Abreu <joabreu@synopsys.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Maxime Ripard <maxime.ripard@bootlin.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: netdev@vger.kernel.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c   |  8 ++++----
 drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c |  8 ++++----
 drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c  |  8 ++++----
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c    |  8 ++++----
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h      |  2 ++
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c  | 10 ++++++----
 drivers/net/ethernet/stmicro/stmmac/hwif.h          |  4 ++--
 7 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
index 6d5cba4075eb..2856f3fe5266 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
@@ -289,18 +289,18 @@ static void sun8i_dwmac_dma_init(void __iomem *ioaddr,
 
 static void sun8i_dwmac_dma_init_rx(void __iomem *ioaddr,
 				    struct stmmac_dma_cfg *dma_cfg,
-				    u32 dma_rx_phy, u32 chan)
+				    dma_addr_t dma_rx_phy, u32 chan)
 {
 	/* Write RX descriptors address */
-	writel(dma_rx_phy, ioaddr + EMAC_RX_DESC_LIST);
+	writel(lower_32_bits(dma_rx_phy), ioaddr + EMAC_RX_DESC_LIST);
 }
 
 static void sun8i_dwmac_dma_init_tx(void __iomem *ioaddr,
 				    struct stmmac_dma_cfg *dma_cfg,
-				    u32 dma_tx_phy, u32 chan)
+				    dma_addr_t dma_tx_phy, u32 chan)
 {
 	/* Write TX descriptors address */
-	writel(dma_tx_phy, ioaddr + EMAC_TX_DESC_LIST);
+	writel(lower_32_bits(dma_tx_phy), ioaddr + EMAC_TX_DESC_LIST);
 }
 
 /* sun8i_dwmac_dump_regs() - Dump EMAC address space
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
index 1fdedf77678f..2bac49b49f73 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
@@ -112,18 +112,18 @@ static void dwmac1000_dma_init(void __iomem *ioaddr,
 
 static void dwmac1000_dma_init_rx(void __iomem *ioaddr,
 				  struct stmmac_dma_cfg *dma_cfg,
-				  u32 dma_rx_phy, u32 chan)
+				  dma_addr_t dma_rx_phy, u32 chan)
 {
 	/* RX descriptor base address list must be written into DMA CSR3 */
-	writel(dma_rx_phy, ioaddr + DMA_RCV_BASE_ADDR);
+	writel(lower_32_bits(dma_rx_phy), ioaddr + DMA_RCV_BASE_ADDR);
 }
 
 static void dwmac1000_dma_init_tx(void __iomem *ioaddr,
 				  struct stmmac_dma_cfg *dma_cfg,
-				  u32 dma_tx_phy, u32 chan)
+				  dma_addr_t dma_tx_phy, u32 chan)
 {
 	/* TX descriptor base address list must be written into DMA CSR4 */
-	writel(dma_tx_phy, ioaddr + DMA_TX_BASE_ADDR);
+	writel(lower_32_bits(dma_tx_phy), ioaddr + DMA_TX_BASE_ADDR);
 }
 
 static u32 dwmac1000_configure_fc(u32 csr6, int rxfifosz)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
index c980cc7360a4..8f0d9bc7cab5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
@@ -31,18 +31,18 @@ static void dwmac100_dma_init(void __iomem *ioaddr,
 
 static void dwmac100_dma_init_rx(void __iomem *ioaddr,
 				 struct stmmac_dma_cfg *dma_cfg,
-				 u32 dma_rx_phy, u32 chan)
+				 dma_addr_t dma_rx_phy, u32 chan)
 {
 	/* RX descriptor base addr lists must be written into DMA CSR3 */
-	writel(dma_rx_phy, ioaddr + DMA_RCV_BASE_ADDR);
+	writel(lower_32_bits(dma_rx_phy), ioaddr + DMA_RCV_BASE_ADDR);
 }
 
 static void dwmac100_dma_init_tx(void __iomem *ioaddr,
 				 struct stmmac_dma_cfg *dma_cfg,
-				 u32 dma_tx_phy, u32 chan)
+				 dma_addr_t dma_tx_phy, u32 chan)
 {
 	/* TX descriptor base addr lists must be written into DMA CSR4 */
-	writel(dma_tx_phy, ioaddr + DMA_TX_BASE_ADDR);
+	writel(lower_32_bits(dma_tx_phy), ioaddr + DMA_TX_BASE_ADDR);
 }
 
 /* Store and Forward capability is not used at all.
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
index 0f208e13da9f..6cbcdaea55f6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
@@ -70,7 +70,7 @@ static void dwmac4_dma_axi(void __iomem *ioaddr, struct stmmac_axi *axi)
 
 static void dwmac4_dma_init_rx_chan(void __iomem *ioaddr,
 				    struct stmmac_dma_cfg *dma_cfg,
-				    u32 dma_rx_phy, u32 chan)
+				    dma_addr_t dma_rx_phy, u32 chan)
 {
 	u32 value;
 	u32 rxpbl = dma_cfg->rxpbl ?: dma_cfg->pbl;
@@ -79,12 +79,12 @@ static void dwmac4_dma_init_rx_chan(void __iomem *ioaddr,
 	value = value | (rxpbl << DMA_BUS_MODE_RPBL_SHIFT);
 	writel(value, ioaddr + DMA_CHAN_RX_CONTROL(chan));
 
-	writel(dma_rx_phy, ioaddr + DMA_CHAN_RX_BASE_ADDR(chan));
+	writel(lower_32_bits(dma_rx_phy), ioaddr + DMA_CHAN_RX_BASE_ADDR(chan));
 }
 
 static void dwmac4_dma_init_tx_chan(void __iomem *ioaddr,
 				    struct stmmac_dma_cfg *dma_cfg,
-				    u32 dma_tx_phy, u32 chan)
+				    dma_addr_t dma_tx_phy, u32 chan)
 {
 	u32 value;
 	u32 txpbl = dma_cfg->txpbl ?: dma_cfg->pbl;
@@ -97,7 +97,7 @@ static void dwmac4_dma_init_tx_chan(void __iomem *ioaddr,
 
 	writel(value, ioaddr + DMA_CHAN_TX_CONTROL(chan));
 
-	writel(dma_tx_phy, ioaddr + DMA_CHAN_TX_BASE_ADDR(chan));
+	writel(lower_32_bits(dma_tx_phy), ioaddr + DMA_CHAN_TX_BASE_ADDR(chan));
 }
 
 static void dwmac4_dma_init_channel(void __iomem *ioaddr,
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
index 9a9792527530..7f86dffb264d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
@@ -199,7 +199,9 @@
 #define XGMAC_RxPBL			GENMASK(21, 16)
 #define XGMAC_RxPBL_SHIFT		16
 #define XGMAC_RXST			BIT(0)
+#define XGMAC_DMA_CH_TxDESC_HADDR(x)	(0x00003110 + (0x80 * (x)))
 #define XGMAC_DMA_CH_TxDESC_LADDR(x)	(0x00003114 + (0x80 * (x)))
+#define XGMAC_DMA_CH_RxDESC_HADDR(x)	(0x00003118 + (0x80 * (x)))
 #define XGMAC_DMA_CH_RxDESC_LADDR(x)	(0x0000311c + (0x80 * (x)))
 #define XGMAC_DMA_CH_TxDESC_TAIL_LPTR(x)	(0x00003124 + (0x80 * (x)))
 #define XGMAC_DMA_CH_RxDESC_TAIL_LPTR(x)	(0x0000312c + (0x80 * (x)))
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c
index 229c58758cbd..a4f236e3593e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c
@@ -44,7 +44,7 @@ static void dwxgmac2_dma_init_chan(void __iomem *ioaddr,
 
 static void dwxgmac2_dma_init_rx_chan(void __iomem *ioaddr,
 				      struct stmmac_dma_cfg *dma_cfg,
-				      u32 dma_rx_phy, u32 chan)
+				      dma_addr_t phy, u32 chan)
 {
 	u32 rxpbl = dma_cfg->rxpbl ?: dma_cfg->pbl;
 	u32 value;
@@ -54,12 +54,13 @@ static void dwxgmac2_dma_init_rx_chan(void __iomem *ioaddr,
 	value |= (rxpbl << XGMAC_RxPBL_SHIFT) & XGMAC_RxPBL;
 	writel(value, ioaddr + XGMAC_DMA_CH_RX_CONTROL(chan));
 
-	writel(dma_rx_phy, ioaddr + XGMAC_DMA_CH_RxDESC_LADDR(chan));
+	writel(upper_32_bits(phy), ioaddr + XGMAC_DMA_CH_RxDESC_HADDR(chan));
+	writel(lower_32_bits(phy), ioaddr + XGMAC_DMA_CH_RxDESC_LADDR(chan));
 }
 
 static void dwxgmac2_dma_init_tx_chan(void __iomem *ioaddr,
 				      struct stmmac_dma_cfg *dma_cfg,
-				      u32 dma_tx_phy, u32 chan)
+				      dma_addr_t phy, u32 chan)
 {
 	u32 txpbl = dma_cfg->txpbl ?: dma_cfg->pbl;
 	u32 value;
@@ -70,7 +71,8 @@ static void dwxgmac2_dma_init_tx_chan(void __iomem *ioaddr,
 	value |= XGMAC_OSP;
 	writel(value, ioaddr + XGMAC_DMA_CH_TX_CONTROL(chan));
 
-	writel(dma_tx_phy, ioaddr + XGMAC_DMA_CH_TxDESC_LADDR(chan));
+	writel(upper_32_bits(phy), ioaddr + XGMAC_DMA_CH_TxDESC_HADDR(chan));
+	writel(lower_32_bits(phy), ioaddr + XGMAC_DMA_CH_TxDESC_LADDR(chan));
 }
 
 static void dwxgmac2_dma_axi(void __iomem *ioaddr, struct stmmac_axi *axi)
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.h b/drivers/net/ethernet/stmicro/stmmac/hwif.h
index 2acfbc70e3c8..278c0dbec9d9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.h
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.h
@@ -150,10 +150,10 @@ struct stmmac_dma_ops {
 			  struct stmmac_dma_cfg *dma_cfg, u32 chan);
 	void (*init_rx_chan)(void __iomem *ioaddr,
 			     struct stmmac_dma_cfg *dma_cfg,
-			     u32 dma_rx_phy, u32 chan);
+			     dma_addr_t phy, u32 chan);
 	void (*init_tx_chan)(void __iomem *ioaddr,
 			     struct stmmac_dma_cfg *dma_cfg,
-			     u32 dma_tx_phy, u32 chan);
+			     dma_addr_t phy, u32 chan);
 	/* Configure the AXI Bus Mode Register */
 	void (*axi)(void __iomem *ioaddr, struct stmmac_axi *axi);
 	/* Dump DMA registers */
-- 
2.7.4


^ permalink raw reply related

* [PATCH net-next v4 1/3] net: stmmac: Implement RX Coalesce Frames setting
From: Jose Abreu @ 2019-07-09  8:02 UTC (permalink / raw)
  To: netdev
  Cc: Joao Pinto, Jose Abreu, Giuseppe Cavallaro, Alexandre Torgue,
	David S. Miller, Maxime Coquelin, linux-stm32, linux-arm-kernel,
	linux-kernel, Jakub Kicinski
In-Reply-To: <cover.1562659012.git.joabreu@synopsys.com>

Add support for coalescing RX path by specifying number of frames which
don't need to have interrupt on completion bit set.

This is only available when RX Watchdog is enabled.

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jose Abreu <joabreu@synopsys.com>

---
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Jose Abreu <joabreu@synopsys.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/stmicro/stmmac/common.h         |  1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac.h         |  2 ++
 drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c |  7 +++++--
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c    | 18 ++++++++++++------
 4 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 2403a65167b2..dfd47fdfa447 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -252,6 +252,7 @@ struct stmmac_safety_stats {
 #define STMMAC_MAX_COAL_TX_TICK	100000
 #define STMMAC_TX_MAX_FRAMES	256
 #define STMMAC_TX_FRAMES	1
+#define STMMAC_RX_FRAMES	25
 
 /* Packets types */
 enum packets_types {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 123898235cb0..513f4e2df5f6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -55,6 +55,7 @@ struct stmmac_tx_queue {
 };
 
 struct stmmac_rx_queue {
+	u32 rx_count_frames;
 	u32 queue_index;
 	struct stmmac_priv *priv_data;
 	struct dma_extended_desc *dma_erx;
@@ -110,6 +111,7 @@ struct stmmac_priv {
 	/* Frequently used values are kept adjacent for cache effect */
 	u32 tx_coal_frames;
 	u32 tx_coal_timer;
+	u32 rx_coal_frames;
 
 	int tx_coalesce;
 	int hwts_tx_en;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index cfd93eefb50e..6efb66820d4c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -701,8 +701,10 @@ static int stmmac_get_coalesce(struct net_device *dev,
 	ec->tx_coalesce_usecs = priv->tx_coal_timer;
 	ec->tx_max_coalesced_frames = priv->tx_coal_frames;
 
-	if (priv->use_riwt)
+	if (priv->use_riwt) {
+		ec->rx_max_coalesced_frames = priv->rx_coal_frames;
 		ec->rx_coalesce_usecs = stmmac_riwt2usec(priv->rx_riwt, priv);
+	}
 
 	return 0;
 }
@@ -715,7 +717,7 @@ static int stmmac_set_coalesce(struct net_device *dev,
 	unsigned int rx_riwt;
 
 	/* Check not supported parameters  */
-	if ((ec->rx_max_coalesced_frames) || (ec->rx_coalesce_usecs_irq) ||
+	if ((ec->rx_coalesce_usecs_irq) ||
 	    (ec->rx_max_coalesced_frames_irq) || (ec->tx_coalesce_usecs_irq) ||
 	    (ec->use_adaptive_rx_coalesce) || (ec->use_adaptive_tx_coalesce) ||
 	    (ec->pkt_rate_low) || (ec->rx_coalesce_usecs_low) ||
@@ -749,6 +751,7 @@ static int stmmac_set_coalesce(struct net_device *dev,
 	/* Only copy relevant parameters, ignore all others. */
 	priv->tx_coal_frames = ec->tx_max_coalesced_frames;
 	priv->tx_coal_timer = ec->tx_coalesce_usecs;
+	priv->rx_coal_frames = ec->rx_max_coalesced_frames;
 	priv->rx_riwt = rx_riwt;
 	stmmac_rx_watchdog(priv, priv->ioaddr, priv->rx_riwt, rx_cnt);
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index bdedde99148a..c142e9367a68 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2268,20 +2268,21 @@ static void stmmac_tx_timer(struct timer_list *t)
 }
 
 /**
- * stmmac_init_tx_coalesce - init tx mitigation options.
+ * stmmac_init_coalesce - init mitigation options.
  * @priv: driver private structure
  * Description:
- * This inits the transmit coalesce parameters: i.e. timer rate,
+ * This inits the coalesce parameters: i.e. timer rate,
  * timer handler and default threshold used for enabling the
  * interrupt on completion bit.
  */
-static void stmmac_init_tx_coalesce(struct stmmac_priv *priv)
+static void stmmac_init_coalesce(struct stmmac_priv *priv)
 {
 	u32 tx_channel_count = priv->plat->tx_queues_to_use;
 	u32 chan;
 
 	priv->tx_coal_frames = STMMAC_TX_FRAMES;
 	priv->tx_coal_timer = STMMAC_COAL_TX_TIMER;
+	priv->rx_coal_frames = STMMAC_RX_FRAMES;
 
 	for (chan = 0; chan < tx_channel_count; chan++) {
 		struct stmmac_tx_queue *tx_q = &priv->tx_queue[chan];
@@ -2651,7 +2652,7 @@ static int stmmac_open(struct net_device *dev)
 		goto init_error;
 	}
 
-	stmmac_init_tx_coalesce(priv);
+	stmmac_init_coalesce(priv);
 
 	phylink_start(priv->phylink);
 
@@ -3289,6 +3290,7 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv, u32 queue)
 
 	while (dirty-- > 0) {
 		struct dma_desc *p;
+		bool use_rx_wd;
 
 		if (priv->extend_desc)
 			p = (struct dma_desc *)(rx_q->dma_erx + entry);
@@ -3331,7 +3333,11 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv, u32 queue)
 		}
 		dma_wmb();
 
-		stmmac_set_rx_owner(priv, p, priv->use_riwt);
+		rx_q->rx_count_frames++;
+		rx_q->rx_count_frames %= priv->rx_coal_frames;
+		use_rx_wd = priv->use_riwt && rx_q->rx_count_frames;
+
+		stmmac_set_rx_owner(priv, p, use_rx_wd);
 
 		dma_wmb();
 
@@ -4631,7 +4637,7 @@ int stmmac_resume(struct device *dev)
 	stmmac_clear_descriptors(priv);
 
 	stmmac_hw_setup(ndev, false);
-	stmmac_init_tx_coalesce(priv);
+	stmmac_init_coalesce(priv);
 	stmmac_set_rx_mode(ndev);
 
 	stmmac_enable_all_queues(priv);
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH 0/2] forcedeth: recv cache support
From: Yanjun Zhu @ 2019-07-09  8:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20190708.152352.710464914281100209.davem@davemloft.net>


On 2019/7/9 6:23, David Miller wrote:
> From: Zhu Yanjun <yanjun.zhu@oracle.com>
> Date: Fri,  5 Jul 2019 02:19:26 -0400
>
>> This recv cache is to make NIC work steadily when the system memory is
>> not enough.
> The system is supposed to hold onto enough atomic memory to absorb all
> reasonable situations like this.
>
> If anything a solution to this problem belongs generically somewhere,
> not in a driver.  And furthermore looping over an allocation attempt
> with a delay is strongly discouraged.

Thanks a lot for your suggestions. Now a user is testing this patch in 
LAB and production hosts.

After this patch can pass tests, I will send V2 based on your suggestions.

Thanks,

Zhu Yanjun


^ permalink raw reply

* Re:  Re: linux-next: build failure after merge of the net-next tree
From: Bernard Metzler @ 2019-07-09  9:00 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Stephen Rothwell, Doug Ledford, Jason Gunthorpe, David Miller,
	Networking, Linux Next Mailing List, Linux Kernel Mailing List
In-Reply-To: <20190709064346.GF7034@mtr-leonro.mtl.com>

-----"Leon Romanovsky" <leon@kernel.org> wrote: -----

>To: "Stephen Rothwell" <sfr@canb.auug.org.au>
>From: "Leon Romanovsky" <leon@kernel.org>
>Date: 07/09/2019 08:43AM
>Cc: "Doug Ledford" <dledford@redhat.com>, "Jason Gunthorpe"
><jgg@mellanox.com>, "David Miller" <davem@davemloft.net>,
>"Networking" <netdev@vger.kernel.org>, "Linux Next Mailing List"
><linux-next@vger.kernel.org>, "Linux Kernel Mailing List"
><linux-kernel@vger.kernel.org>, "Bernard Metzler"
><bmt@zurich.ibm.com>
>Subject: [EXTERNAL] Re: linux-next: build failure after merge of the
>net-next tree
>
>On Tue, Jul 09, 2019 at 01:56:36PM +1000, Stephen Rothwell wrote:
>> Hi all,
>>
>> After merging the net-next tree, today's linux-next build (x86_64
>> allmodconfig) failed like this:
>>
>> drivers/infiniband/sw/siw/siw_cm.c: In function
>'siw_create_listen':
>> drivers/infiniband/sw/siw/siw_cm.c:1978:3: error: implicit
>declaration of function 'for_ifa'; did you mean 'fork_idle'?
>[-Werror=implicit-function-declaration]
>>    for_ifa(in_dev)
>>    ^~~~~~~
>>    fork_idle
>> drivers/infiniband/sw/siw/siw_cm.c:1978:18: error: expected ';'
>before '{' token
>>    for_ifa(in_dev)
>>                   ^
>>                   ;
>>    {
>>    ~
>>
>> Caused by commit
>>
>>   6c52fdc244b5 ("rdma/siw: connection management")
>>
>> from the rdma tree.  I don't know why this didn't fail after I
>mereged
>> that tree.
>
>I had the same question, because I have this fix for a couple of days
>already.
>
>From 56c9e15ec670af580daa8c3ffde9503af3042d67 Mon Sep 17 00:00:00
>2001
>From: Leon Romanovsky <leonro@mellanox.com>
>Date: Sun, 7 Jul 2019 10:43:42 +0300
>Subject: [PATCH] Fixup to build SIW issue
>
>Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>---
> drivers/infiniband/sw/siw/siw_cm.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/infiniband/sw/siw/siw_cm.c
>b/drivers/infiniband/sw/siw/siw_cm.c
>index 8e618cb7261f..c883bf514341 100644
>--- a/drivers/infiniband/sw/siw/siw_cm.c
>+++ b/drivers/infiniband/sw/siw/siw_cm.c
>@@ -1954,6 +1954,7 @@ static void siw_drop_listeners(struct iw_cm_id
>*id)
> int siw_create_listen(struct iw_cm_id *id, int backlog)
> {
> 	struct net_device *dev = to_siw_dev(id->device)->netdev;
>+	const struct in_ifaddr *ifa;
> 	int rv = 0, listeners = 0;
>
> 	siw_dbg(id->device, "id 0x%p: backlog %d\n", id, backlog);
>@@ -1975,8 +1976,7 @@ int siw_create_listen(struct iw_cm_id *id, int
>backlog)
> 			id, &s_laddr.sin_addr, ntohs(s_laddr.sin_port),
> 			&s_raddr->sin_addr, ntohs(s_raddr->sin_port));
>
>-		for_ifa(in_dev)
>-		{
>+		in_dev_for_each_ifa_rcu(ifa, in_dev) {
> 			if (ipv4_is_zeronet(s_laddr.sin_addr.s_addr) ||
> 			    s_laddr.sin_addr.s_addr == ifa->ifa_address) {
> 				s_laddr.sin_addr.s_addr = ifa->ifa_address;
>@@ -1988,7 +1988,6 @@ int siw_create_listen(struct iw_cm_id *id, int
>backlog)
> 					listeners++;
> 			}
> 		}
>-		endfor_ifa(in_dev);
> 		in_dev_put(in_dev);
> 	} else if (id->local_addr.ss_family == AF_INET6) {
> 		struct inet6_dev *in6_dev = in6_dev_get(dev);
>--
>2.21.0
>
>
>>
>> I have marked that driver as depending on BROKEN for today.
>>
>> --
>> Cheers,
>> Stephen Rothwell
>
>
>
I am very sorry for that issues. Things are moving fast, and I
didn't realize 'for_ifa' recently went away. I agree with Leon,
his patch fixes the issue. So, please, let's apply that one.

Leon, many thanks for providing the fix.

Thanks very much,
Bernard.


^ permalink raw reply

* Re: [PATCH v5 rdma-next 2/6] RDMA/efa: Use the common mmap_xa helpers
From: Gal Pressman @ 2019-07-09  9:02 UTC (permalink / raw)
  To: Michal Kalderon, ariel.elior, jgg, dledford, galpress
  Cc: linux-rdma, davem, netdev
In-Reply-To: <20190708091503.14723-3-michal.kalderon@marvell.com>

On 08/07/2019 12:14, Michal Kalderon wrote:

Hi, a few nits:

> Remove the functions related to managing the mmap_xa database.
> This code was copied to the ib_core. Use the common API's instead.
> 
> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> ---
>  drivers/infiniband/hw/efa/efa.h       |   3 +-
>  drivers/infiniband/hw/efa/efa_main.c  |   1 +
>  drivers/infiniband/hw/efa/efa_verbs.c | 183 ++++++++--------------------------
>  3 files changed, 42 insertions(+), 145 deletions(-)
> diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
> index df77bc312a25..5dff892da161 100644
> --- a/drivers/infiniband/hw/efa/efa_verbs.c
> +++ b/drivers/infiniband/hw/efa/efa_verbs.c
> @@ -13,34 +13,15 @@
>  
>  #include "efa.h"
>  
> -#define EFA_MMAP_FLAG_SHIFT 56
> -#define EFA_MMAP_PAGE_MASK GENMASK(EFA_MMAP_FLAG_SHIFT - 1, 0)
> -#define EFA_MMAP_INVALID U64_MAX
> -

Don't delete the blank line please.

>  enum {
>  	EFA_MMAP_DMA_PAGE = 0,
>  	EFA_MMAP_IO_WC,
>  	EFA_MMAP_IO_NC,
>  };
> -
>  #define EFA_AENQ_ENABLED_GROUPS \
>  	(BIT(EFA_ADMIN_FATAL_ERROR) | BIT(EFA_ADMIN_WARNING) | \
>  	 BIT(EFA_ADMIN_NOTIFICATION) | BIT(EFA_ADMIN_KEEP_ALIVE))
>  
> -struct efa_mmap_entry {
> -	void  *obj;
> -	u64 address;
> -	u64 length;
> -	u32 mmap_page;
> -	u8 mmap_flag;
> -};
> -
> -static inline u64 get_mmap_key(const struct efa_mmap_entry *efa)
> -{
> -	return ((u64)efa->mmap_flag << EFA_MMAP_FLAG_SHIFT) |
> -	       ((u64)efa->mmap_page << PAGE_SHIFT);
> -}
> -
>  #define EFA_CHUNK_PAYLOAD_SHIFT       12
>  #define EFA_CHUNK_PAYLOAD_SIZE        BIT(EFA_CHUNK_PAYLOAD_SHIFT)
>  #define EFA_CHUNK_PAYLOAD_PTR_SIZE    8
> @@ -145,105 +126,7 @@ static void *efa_zalloc_mapped(struct efa_dev *dev, dma_addr_t *dma_addr,
>  	return addr;
>  }
>  
> -/*
> - * This is only called when the ucontext is destroyed and there can be no
> - * concurrent query via mmap or allocate on the xarray, thus we can be sure no
> - * other thread is using the entry pointer. We also know that all the BAR
> - * pages have either been zap'd or munmaped at this point.  Normal pages are
> - * refcounted and will be freed at the proper time.
> - */
> -static void mmap_entries_remove_free(struct efa_dev *dev,
> -				     struct efa_ucontext *ucontext)
> -{
> -	struct efa_mmap_entry *entry;
> -	unsigned long mmap_page;
>  
> -	xa_for_each(&ucontext->mmap_xa, mmap_page, entry) {
> -		xa_erase(&ucontext->mmap_xa, mmap_page);
> -
> -		ibdev_dbg(
> -			&dev->ibdev,
> -			"mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
> -			entry->obj, get_mmap_key(entry), entry->address,
> -			entry->length);
> -		if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
> -			/* DMA mapping is already gone, now free the pages */
> -			free_pages_exact(phys_to_virt(entry->address),
> -					 entry->length);
> -		kfree(entry);
> -	}
> -}
> -
> -static struct efa_mmap_entry *mmap_entry_get(struct efa_dev *dev,
> -					     struct efa_ucontext *ucontext,
> -					     u64 key, u64 len)
> -{
> -	struct efa_mmap_entry *entry;
> -	u64 mmap_page;
> -
> -	mmap_page = (key & EFA_MMAP_PAGE_MASK) >> PAGE_SHIFT;
> -	if (mmap_page > U32_MAX)
> -		return NULL;
> -
> -	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> -	if (!entry || get_mmap_key(entry) != key || entry->length != len)
> -		return NULL;
> -
> -	ibdev_dbg(&dev->ibdev,
> -		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
> -		  entry->obj, key, entry->address, entry->length);
> -
> -	return entry;
> -}
> -
> -/*
> - * Note this locking scheme cannot support removal of entries, except during
> - * ucontext destruction when the core code guarentees no concurrency.
> - */
> -static u64 mmap_entry_insert(struct efa_dev *dev, struct efa_ucontext *ucontext,
> -			     void *obj, u64 address, u64 length, u8 mmap_flag)
> -{
> -	struct efa_mmap_entry *entry;
> -	u32 next_mmap_page;
> -	int err;
> -
> -	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
> -	if (!entry)
> -		return EFA_MMAP_INVALID;
> -
> -	entry->obj = obj;
> -	entry->address = address;
> -	entry->length = length;
> -	entry->mmap_flag = mmap_flag;
> -
> -	xa_lock(&ucontext->mmap_xa);
> -	if (check_add_overflow(ucontext->mmap_xa_page,
> -			       (u32)(length >> PAGE_SHIFT),
> -			       &next_mmap_page))
> -		goto err_unlock;
> -
> -	entry->mmap_page = ucontext->mmap_xa_page;
> -	ucontext->mmap_xa_page = next_mmap_page;
> -	err = __xa_insert(&ucontext->mmap_xa, entry->mmap_page, entry,
> -			  GFP_KERNEL);
> -	if (err)
> -		goto err_unlock;
> -
> -	xa_unlock(&ucontext->mmap_xa);
> -
> -	ibdev_dbg(
> -		&dev->ibdev,
> -		"mmap: obj[0x%p] addr[%#llx], len[%#llx], key[%#llx] inserted\n",
> -		entry->obj, entry->address, entry->length, get_mmap_key(entry));
> -
> -	return get_mmap_key(entry);
> -
> -err_unlock:
> -	xa_unlock(&ucontext->mmap_xa);
> -	kfree(entry);
> -	return EFA_MMAP_INVALID;
> -
> -}
>  

You left two extra blank lines between efa_zalloc_mapped and efa_query_device.

>  int efa_query_device(struct ib_device *ibdev,
>  		     struct ib_device_attr *props,
> @@ -488,45 +371,52 @@ static int qp_mmap_entries_setup(struct efa_qp *qp,
>  				 struct efa_com_create_qp_params *params,
>  				 struct efa_ibv_create_qp_resp *resp)
>  {
> +	u64 address;
> +	u64 length;

Line break.

>  	/*
>  	 * Once an entry is inserted it might be mmapped, hence cannot be
>  	 * cleaned up until dealloc_ucontext.
>  	 */
>  	resp->sq_db_mmap_key =
> -		mmap_entry_insert(dev, ucontext, qp,
> -				  dev->db_bar_addr + resp->sq_db_offset,
> -				  PAGE_SIZE, EFA_MMAP_IO_NC);
> -	if (resp->sq_db_mmap_key == EFA_MMAP_INVALID)
> +		rdma_user_mmap_entry_insert(&ucontext->ibucontext, qp,
> +					    dev->db_bar_addr +
> +					    resp->sq_db_offset,
> +					    PAGE_SIZE, EFA_MMAP_IO_NC);
> +	if (resp->sq_db_mmap_key == RDMA_USER_MMAP_INVALID)
>  		return -ENOMEM;
>  
>  	resp->sq_db_offset &= ~PAGE_MASK;
>  
> +	address = dev->mem_bar_addr + resp->llq_desc_offset;
> +	length = PAGE_ALIGN(params->sq_ring_size_in_bytes +
> +			    (resp->llq_desc_offset & ~PAGE_MASK));
>  	resp->llq_desc_mmap_key =
> -		mmap_entry_insert(dev, ucontext, qp,
> -				  dev->mem_bar_addr + resp->llq_desc_offset,
> -				  PAGE_ALIGN(params->sq_ring_size_in_bytes +
> -					     (resp->llq_desc_offset & ~PAGE_MASK)),
> -				  EFA_MMAP_IO_WC);
> -	if (resp->llq_desc_mmap_key == EFA_MMAP_INVALID)
> +		rdma_user_mmap_entry_insert(&ucontext->ibucontext, qp,
> +					    address,
> +					    length,
> +					    EFA_MMAP_IO_WC);
> +	if (resp->llq_desc_mmap_key == RDMA_USER_MMAP_INVALID)
>  		return -ENOMEM;
>  
>  	resp->llq_desc_offset &= ~PAGE_MASK;
>  
>  	if (qp->rq_size) {
> +		address = dev->db_bar_addr + resp->rq_db_offset;
>  		resp->rq_db_mmap_key =
> -			mmap_entry_insert(dev, ucontext, qp,
> -					  dev->db_bar_addr + resp->rq_db_offset,
> -					  PAGE_SIZE, EFA_MMAP_IO_NC);
> -		if (resp->rq_db_mmap_key == EFA_MMAP_INVALID)
> +			rdma_user_mmap_entry_insert(&ucontext->ibucontext, qp,
> +						    address, PAGE_SIZE,
> +						    EFA_MMAP_IO_NC);
> +		if (resp->rq_db_mmap_key == RDMA_USER_MMAP_INVALID)
>  			return -ENOMEM;
>  
>  		resp->rq_db_offset &= ~PAGE_MASK;
>  
> +		address = virt_to_phys(qp->rq_cpu_addr);
>  		resp->rq_mmap_key =
> -			mmap_entry_insert(dev, ucontext, qp,
> -					  virt_to_phys(qp->rq_cpu_addr),
> -					  qp->rq_size, EFA_MMAP_DMA_PAGE);
> -		if (resp->rq_mmap_key == EFA_MMAP_INVALID)
> +			rdma_user_mmap_entry_insert(&ucontext->ibucontext, qp,
> +						    address, qp->rq_size,
> +						    EFA_MMAP_DMA_PAGE);
> +		if (resp->rq_mmap_key == RDMA_USER_MMAP_INVALID)
>  			return -ENOMEM;
>  
>  		resp->rq_mmap_size = qp->rq_size;
> @@ -875,11 +765,13 @@ void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
>  static int cq_mmap_entries_setup(struct efa_dev *dev, struct efa_cq *cq,
>  				 struct efa_ibv_create_cq_resp *resp)
>  {
> +	struct efa_ucontext *ucontext = cq->ucontext;

Line break.

>  	resp->q_mmap_size = cq->size;
> -	resp->q_mmap_key = mmap_entry_insert(dev, cq->ucontext, cq,
> -					     virt_to_phys(cq->cpu_addr),
> -					     cq->size, EFA_MMAP_DMA_PAGE);
> -	if (resp->q_mmap_key == EFA_MMAP_INVALID)
> +	resp->q_mmap_key =
> +		rdma_user_mmap_entry_insert(&ucontext->ibucontext, cq,
> +					    virt_to_phys(cq->cpu_addr),
> +					    cq->size, EFA_MMAP_DMA_PAGE);
> +	if (resp->q_mmap_key == RDMA_USER_MMAP_INVALID)
>  		return -ENOMEM;
>  
>  	return 0;
> @@ -1531,7 +1423,6 @@ int efa_alloc_ucontext(struct ib_ucontext *ibucontext, struct ib_udata *udata)
>  		goto err_out;
>  
>  	ucontext->uarn = result.uarn;
> -	xa_init(&ucontext->mmap_xa);
>  
>  	resp.cmds_supp_udata_mask |= EFA_USER_CMDS_SUPP_UDATA_QUERY_DEVICE;
>  	resp.cmds_supp_udata_mask |= EFA_USER_CMDS_SUPP_UDATA_CREATE_AH;
> @@ -1560,19 +1451,25 @@ void efa_dealloc_ucontext(struct ib_ucontext *ibucontext)
>  	struct efa_ucontext *ucontext = to_eucontext(ibucontext);
>  	struct efa_dev *dev = to_edev(ibucontext->device);
>  
> -	mmap_entries_remove_free(dev, ucontext);
>  	efa_dealloc_uar(dev, ucontext->uarn);
>  }
>  
> +void efa_mmap_free(u64 address, u64 length, u8 mmap_flag)
> +{
> +	/* DMA mapping is already gone, now free the pages */
> +	if (mmap_flag == EFA_MMAP_DMA_PAGE)
> +		free_pages_exact(phys_to_virt(address), length);
> +}
> +
>  static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
>  		      struct vm_area_struct *vma, u64 key, u64 length)
>  {
> -	struct efa_mmap_entry *entry;
> +	struct rdma_user_mmap_entry *entry;
>  	unsigned long va;
>  	u64 pfn;
>  	int err;
>  
> -	entry = mmap_entry_get(dev, ucontext, key, length);
> +	entry = rdma_user_mmap_entry_get(&ucontext->ibucontext, key, length);
>  	if (!entry) {
>  		ibdev_dbg(&dev->ibdev, "key[%#llx] does not have valid entry\n",
>  			  key);
> 

^ permalink raw reply

* Re: [PATCH v2 05/10] net: hisilicon: HI13X1_GMAX need dreq reset at first
From: Sergei Shtylyov @ 2019-07-09  9:35 UTC (permalink / raw)
  To: Jiangfeng Xiao, davem, robh+dt, yisen.zhuang, salil.mehta,
	mark.rutland, dingtianhong
  Cc: netdev, devicetree, linux-kernel, leeyou.li, nixiaoming,
	jianping.liu, xiekunxun
In-Reply-To: <1562643071-46811-6-git-send-email-xiaojiangfeng@huawei.com>

Hello!

On 09.07.2019 6:31, Jiangfeng Xiao wrote:

> HI13X1_GMAC delete request for soft reset at first,
> otherwise, the subsequent initialization will not
> take effect.
> 
> Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
> ---
>   drivers/net/ethernet/hisilicon/hip04_eth.c | 24 ++++++++++++++++++++++++
>   1 file changed, 24 insertions(+)
> 
> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
> index fe61b01..19d8cfd 100644
> --- a/drivers/net/ethernet/hisilicon/hip04_eth.c
> +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
[...]
> @@ -853,6 +867,15 @@ static int hip04_mac_probe(struct platform_device *pdev)
>   		goto init_fail;
>   	}
>   
> +#if defined(CONFIG_HI13X1_GMAC)
> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> +	priv->sysctrl_base = devm_ioremap_resource(d, res);

    There's devm_platform_ioremap_resource() now.

> +	if (IS_ERR(priv->sysctrl_base)) {
> +		ret = PTR_ERR(priv->sysctrl_base);
> +		goto init_fail;
> +	}
> +#endif
> +
>   	ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, &arg);
>   	if (ret < 0) {
>   		dev_warn(d, "no port-handle\n");
[...]

MBR, Sergei

^ permalink raw reply

* [PATCH v2 1/2] rtw88: pci: Rearrange the memory usage for skb in RX ISR
From: Jian-Hong Pan @ 2019-07-09 10:20 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller, Larry Finger,
	David Laight
  Cc: linux-wireless, netdev, linux-kernel, linux, Daniel Drake,
	Jian-Hong Pan, stable
In-Reply-To: <20190708063252.4756-1-jian-hong@endlessm.com>

Testing with RTL8822BE hardware, when available memory is low, we
frequently see a kernel panic and system freeze.

First, rtw_pci_rx_isr encounters a memory allocation failure (trimmed):

rx routine starvation
WARNING: CPU: 7 PID: 9871 at drivers/net/wireless/realtek/rtw88/pci.c:822 rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
[ 2356.580313] RIP: 0010:rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]

Then we see a variety of different error conditions and kernel panics,
such as this one (trimmed):

rtw_pci 0000:02:00.0: pci bus timeout, check dma status
skbuff: skb_over_panic: text:00000000091b6e66 len:415 put:415 head:00000000d2880c6f data:000000007a02b1ea tail:0x1df end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:105!
invalid opcode: 0000 [#1] SMP NOPTI
RIP: 0010:skb_panic+0x43/0x45

When skb allocation fails and the "rx routine starvation" is hit, the
function returns immediately without updating the RX ring. At this
point, the RX ring may continue referencing an old skb which was already
handed off to ieee80211_rx_irqsafe(). When it comes to be used again,
bad things happen.

This patch allocates a new, data-sized skb first in RX ISR. After
copying the data in, we pass it to the upper layers. However, if skb
allocation fails, we effectively drop the frame. In both cases, the
original, full size ring skb is reused.

In addition, to fixing the kernel crash, the RX routine should now
generally behave better under low memory conditions.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204053
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Cc: <stable@vger.kernel.org>
---
 drivers/net/wireless/realtek/rtw88/pci.c | 49 +++++++++++-------------
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index cfe05ba7280d..e9fe3ad896c8 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -763,6 +763,7 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
 	u32 pkt_offset;
 	u32 pkt_desc_sz = chip->rx_pkt_desc_sz;
 	u32 buf_desc_sz = chip->rx_buf_desc_sz;
+	u32 new_len;
 	u8 *rx_desc;
 	dma_addr_t dma;
 
@@ -790,40 +791,34 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
 		pkt_offset = pkt_desc_sz + pkt_stat.drv_info_sz +
 			     pkt_stat.shift;
 
-		if (pkt_stat.is_c2h) {
-			/* keep rx_desc, halmac needs it */
-			skb_put(skb, pkt_stat.pkt_len + pkt_offset);
+		/* discard current skb if the new skb cannot be allocated as a
+		 * new one in rx ring later
+		 */
+		new_len = pkt_stat.pkt_len + pkt_offset;
+		new = dev_alloc_skb(new_len);
+		if (WARN_ONCE(!new, "rx routine starvation\n"))
+			goto next_rp;
+
+		/* put the DMA data including rx_desc from phy to new skb */
+		skb_put_data(new, skb->data, new_len);
 
-			/* pass offset for further operation */
-			*((u32 *)skb->cb) = pkt_offset;
-			skb_queue_tail(&rtwdev->c2h_queue, skb);
+		if (pkt_stat.is_c2h) {
+			 /* pass rx_desc & offset for further operation */
+			*((u32 *)new->cb) = pkt_offset;
+			skb_queue_tail(&rtwdev->c2h_queue, new);
 			ieee80211_queue_work(rtwdev->hw, &rtwdev->c2h_work);
 		} else {
-			/* remove rx_desc, maybe use skb_pull? */
-			skb_put(skb, pkt_stat.pkt_len);
-			skb_reserve(skb, pkt_offset);
-
-			/* alloc a smaller skb to mac80211 */
-			new = dev_alloc_skb(pkt_stat.pkt_len);
-			if (!new) {
-				new = skb;
-			} else {
-				skb_put_data(new, skb->data, skb->len);
-				dev_kfree_skb_any(skb);
-			}
-			/* TODO: merge into rx.c */
-			rtw_rx_stats(rtwdev, pkt_stat.vif, skb);
+			/* remove rx_desc */
+			skb_pull(new, pkt_offset);
+
+			rtw_rx_stats(rtwdev, pkt_stat.vif, new);
 			memcpy(new->cb, &rx_status, sizeof(rx_status));
 			ieee80211_rx_irqsafe(rtwdev->hw, new);
 		}
 
-		/* skb delivered to mac80211, alloc a new one in rx ring */
-		new = dev_alloc_skb(RTK_PCI_RX_BUF_SIZE);
-		if (WARN(!new, "rx routine starvation\n"))
-			return;
-
-		ring->buf[cur_rp] = new;
-		rtw_pci_reset_rx_desc(rtwdev, new, ring, cur_rp, buf_desc_sz);
+next_rp:
+		/* new skb delivered to mac80211, re-enable original skb DMA */
+		rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
 
 		/* host read next element in ring */
 		if (++cur_rp >= ring->r.len)
-- 
2.22.0


^ permalink raw reply related

* [PATCH v2 2/2] rtw88: pci: Use DMA sync instead of remapping in RX ISR
From: Jian-Hong Pan @ 2019-07-09 10:21 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller, Larry Finger,
	David Laight
  Cc: linux-wireless, netdev, linux-kernel, linux, Daniel Drake,
	Jian-Hong Pan
In-Reply-To: <20190708063252.4756-1-jian-hong@endlessm.com>

Since each skb in RX ring is reused instead of new allocation, we can
treat the DMA in a more efficient way by DMA synchronization.

Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
---
 drivers/net/wireless/realtek/rtw88/pci.c | 35 ++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index e9fe3ad896c8..28ca76f71dfe 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -206,6 +206,35 @@ static int rtw_pci_reset_rx_desc(struct rtw_dev *rtwdev, struct sk_buff *skb,
 	return 0;
 }
 
+static int rtw_pci_sync_rx_desc_cpu(struct rtw_dev *rtwdev, dma_addr_t dma)
+{
+	struct device *dev = rtwdev->dev;
+	int buf_sz = RTK_PCI_RX_BUF_SIZE;
+
+	dma_sync_single_for_cpu(dev, dma, buf_sz, PCI_DMA_FROMDEVICE);
+
+	return 0;
+}
+
+static int rtw_pci_sync_rx_desc_device(struct rtw_dev *rtwdev, dma_addr_t dma,
+				       struct rtw_pci_rx_ring *rx_ring,
+				       u32 idx, u32 desc_sz)
+{
+	struct device *dev = rtwdev->dev;
+	struct rtw_pci_rx_buffer_desc *buf_desc;
+	int buf_sz = RTK_PCI_RX_BUF_SIZE;
+
+	dma_sync_single_for_device(dev, dma, buf_sz, PCI_DMA_FROMDEVICE);
+
+	buf_desc = (struct rtw_pci_rx_buffer_desc *)(rx_ring->r.head +
+						     idx * desc_sz);
+	memset(buf_desc, 0, sizeof(*buf_desc));
+	buf_desc->buf_size = cpu_to_le16(RTK_PCI_RX_BUF_SIZE);
+	buf_desc->dma = cpu_to_le32(dma);
+
+	return 0;
+}
+
 static int rtw_pci_init_rx_ring(struct rtw_dev *rtwdev,
 				struct rtw_pci_rx_ring *rx_ring,
 				u8 desc_size, u32 len)
@@ -782,8 +811,7 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
 		rtw_pci_dma_check(rtwdev, ring, cur_rp);
 		skb = ring->buf[cur_rp];
 		dma = *((dma_addr_t *)skb->cb);
-		pci_unmap_single(rtwpci->pdev, dma, RTK_PCI_RX_BUF_SIZE,
-				 PCI_DMA_FROMDEVICE);
+		rtw_pci_sync_rx_desc_cpu(rtwdev, dma);
 		rx_desc = skb->data;
 		chip->ops->query_rx_desc(rtwdev, rx_desc, &pkt_stat, &rx_status);
 
@@ -818,7 +846,8 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
 
 next_rp:
 		/* new skb delivered to mac80211, re-enable original skb DMA */
-		rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
+		rtw_pci_sync_rx_desc_device(rtwdev, dma, ring, cur_rp,
+					    buf_desc_sz);
 
 		/* host read next element in ring */
 		if (++cur_rp >= ring->r.len)
-- 
2.22.0


^ permalink raw reply related

* RE: [PATCH v5 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
From: Michal Kalderon @ 2019-07-09 10:26 UTC (permalink / raw)
  To: Gal Pressman, Ariel Elior, jgg@ziepe.ca, dledford@redhat.com
  Cc: linux-rdma@vger.kernel.org, davem@davemloft.net,
	netdev@vger.kernel.org
In-Reply-To: <da67a821-1b26-c795-ff43-af17324f07e5@amazon.com>

> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Gal Pressman
> 
> On 08/07/2019 12:14, Michal Kalderon wrote:
> > diff --git a/drivers/infiniband/core/device.c
> > b/drivers/infiniband/core/device.c
> > index 8a6ccb936dfe..a830c2c5d691 100644
> > --- a/drivers/infiniband/core/device.c
> > +++ b/drivers/infiniband/core/device.c
> > @@ -2521,6 +2521,7 @@ void ib_set_device_ops(struct ib_device *dev,
> const struct ib_device_ops *ops)
> >  	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
> >  	SET_DEVICE_OP(dev_ops, map_phys_fmr);
> >  	SET_DEVICE_OP(dev_ops, mmap);
> > +	SET_DEVICE_OP(dev_ops, mmap_free);
> >  	SET_DEVICE_OP(dev_ops, modify_ah);
> >  	SET_DEVICE_OP(dev_ops, modify_cq);
> >  	SET_DEVICE_OP(dev_ops, modify_device); diff --git
> > a/drivers/infiniband/core/rdma_core.c
> > b/drivers/infiniband/core/rdma_core.c
> > index ccf4d069c25c..7166741834c8 100644
> > --- a/drivers/infiniband/core/rdma_core.c
> > +++ b/drivers/infiniband/core/rdma_core.c
> > @@ -817,6 +817,7 @@ static void ufile_destroy_ucontext(struct
> ib_uverbs_file *ufile,
> >  	rdma_restrack_del(&ucontext->res);
> >
> >  	ib_dev->ops.dealloc_ucontext(ucontext);
> > +	rdma_user_mmap_entries_remove_free(ucontext);
> 
> This should happen before dealloc_ucontext.
Right, will fix.
> 
> > +struct rdma_user_mmap_entry *
> > +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64
> > +len) {
> > +	struct rdma_user_mmap_entry *entry;
> > +	u64 mmap_page;
> > +
> > +	mmap_page = key >> PAGE_SHIFT;
> > +	if (mmap_page > U32_MAX)
> > +		return NULL;
> > +
> > +	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> > +	if (!entry || rdma_user_mmap_get_key(entry) != key ||
> 
> I wonder if the 'rdma_user_mmap_get_key(entry) != key' check is still
> needed.
I guess it's not since the key is used to get the entry. I'll remove the check

> 
> > +/*
> > + * This is only called when the ucontext is destroyed and there can
> > +be no
> > + * concurrent query via mmap or allocate on the xarray, thus we can
> > +be sure no
> > + * other thread is using the entry pointer. We also know that all the
> > +BAR
> > + * pages have either been zap'd or munmaped at this point.  Normal
> > +pages are
> > + * refcounted and will be freed at the proper time.
> > + */
> > +void rdma_user_mmap_entries_remove_free(struct ib_ucontext
> *ucontext)
> > +{
> > +	struct rdma_user_mmap_entry *entry;
> > +	unsigned long mmap_page;
> > +
> > +	xa_for_each(&ucontext->mmap_xa, mmap_page, entry) {
> > +		xa_erase(&ucontext->mmap_xa, mmap_page);
> > +
> > +		ibdev_dbg(ucontext->device,
> > +			  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx]
> removed\n",
> > +			  entry->obj, rdma_user_mmap_get_key(entry),
> > +			  entry->address, entry->length);
> > +		if (ucontext->device->ops.mmap_free)
> > +			ucontext->device->ops.mmap_free(entry->address,
> > +							entry->length,
> > +							entry->mmap_flag);
> 
> Pass entry instead?
> 
> > +		kfree(entry);
> > +	}
> > +}
> > +
> >  void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)  {
> >  	struct rdma_umap_priv *priv, *next_priv; diff --git
> > a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index
> > 26e9c2594913..54ce3fdae180 100644
> > --- a/include/rdma/ib_verbs.h
> > +++ b/include/rdma/ib_verbs.h
> > @@ -1425,6 +1425,8 @@ struct ib_ucontext {
> >  	 * Implementation details of the RDMA core, don't use in drivers:
> >  	 */
> >  	struct rdma_restrack_entry res;
> > +	struct xarray mmap_xa;
> > +	u32 mmap_xa_page;
> >  };
> >
> >  struct ib_uobject {
> > @@ -2311,6 +2313,7 @@ struct ib_device_ops {
> >  			      struct ib_udata *udata);
> >  	void (*dealloc_ucontext)(struct ib_ucontext *context);
> >  	int (*mmap)(struct ib_ucontext *context, struct vm_area_struct
> > *vma);
> > +	void (*mmap_free)(u64 address, u64 length, u8 mmap_flag);
> 
> I feel like this callback needs some documentation.
> 
> >  	void (*disassociate_ucontext)(struct ib_ucontext *ibcontext);
> >  	int (*alloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
> >  	void (*dealloc_pd)(struct ib_pd *pd, struct ib_udata *udata); @@
> > -2706,9 +2709,23 @@ void  ib_set_client_data(struct ib_device *device,
> > struct ib_client *client,  void ib_set_device_ops(struct ib_device *device,
> >  		       const struct ib_device_ops *ops);
> >
> > +#define RDMA_USER_MMAP_INVALID U64_MAX struct
> rdma_user_mmap_entry {
> > +	void  *obj;
> 
> I know EFA is the culprit here, but please remove the extra space :).
> 
> > +	u64 address;
> > +	u64 length;
> > +	u32 mmap_page;
> > +	u8 mmap_flag;
> > +};
> > +

^ permalink raw reply

* RE: [PATCH v5 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
From: Michal Kalderon @ 2019-07-09 10:29 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Ariel Elior, jgg@ziepe.ca, dledford@redhat.com,
	galpress@amazon.com, linux-rdma@vger.kernel.org,
	davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <20190709070244.GH7034@mtr-leonro.mtl.com>

> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Leon Romanovsky
> 
> On Mon, Jul 08, 2019 at 12:14:58PM +0300, Michal Kalderon wrote:
> > Create some common API's for adding entries to a xa_mmap.
> > Searching for an entry and freeing one.
> >
> > The code was copied from the efa driver almost as is, just renamed
> > function to be generic and not efa specific.
> >
> > Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> > ---
> >  drivers/infiniband/core/device.c      |   1 +
> >  drivers/infiniband/core/rdma_core.c   |   1 +
> >  drivers/infiniband/core/uverbs_cmd.c  |   1 +
> >  drivers/infiniband/core/uverbs_main.c | 105
> ++++++++++++++++++++++++++++++++++
> >  include/rdma/ib_verbs.h               |  32 +++++++++++
> >  5 files changed, 140 insertions(+)
> >
> > diff --git a/drivers/infiniband/core/device.c
> > b/drivers/infiniband/core/device.c
> > index 8a6ccb936dfe..a830c2c5d691 100644
> > --- a/drivers/infiniband/core/device.c
> > +++ b/drivers/infiniband/core/device.c
> > @@ -2521,6 +2521,7 @@ void ib_set_device_ops(struct ib_device *dev,
> const struct ib_device_ops *ops)
> >  	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
> >  	SET_DEVICE_OP(dev_ops, map_phys_fmr);
> >  	SET_DEVICE_OP(dev_ops, mmap);
> > +	SET_DEVICE_OP(dev_ops, mmap_free);
> >  	SET_DEVICE_OP(dev_ops, modify_ah);
> >  	SET_DEVICE_OP(dev_ops, modify_cq);
> >  	SET_DEVICE_OP(dev_ops, modify_device); diff --git
> > a/drivers/infiniband/core/rdma_core.c
> > b/drivers/infiniband/core/rdma_core.c
> > index ccf4d069c25c..7166741834c8 100644
> > --- a/drivers/infiniband/core/rdma_core.c
> > +++ b/drivers/infiniband/core/rdma_core.c
> > @@ -817,6 +817,7 @@ static void ufile_destroy_ucontext(struct
> ib_uverbs_file *ufile,
> >  	rdma_restrack_del(&ucontext->res);
> >
> >  	ib_dev->ops.dealloc_ucontext(ucontext);
> > +	rdma_user_mmap_entries_remove_free(ucontext);
> >  	kfree(ucontext);
> >
> >  	ufile->ucontext = NULL;
> > diff --git a/drivers/infiniband/core/uverbs_cmd.c
> > b/drivers/infiniband/core/uverbs_cmd.c
> > index 7ddd0e5bc6b3..44c0600245e4 100644
> > --- a/drivers/infiniband/core/uverbs_cmd.c
> > +++ b/drivers/infiniband/core/uverbs_cmd.c
> > @@ -254,6 +254,7 @@ static int ib_uverbs_get_context(struct
> > uverbs_attr_bundle *attrs)
> >
> >  	mutex_init(&ucontext->per_mm_list_lock);
> >  	INIT_LIST_HEAD(&ucontext->per_mm_list);
> > +	xa_init(&ucontext->mmap_xa);
> >
> >  	ret = get_unused_fd_flags(O_CLOEXEC);
> >  	if (ret < 0)
> > diff --git a/drivers/infiniband/core/uverbs_main.c
> > b/drivers/infiniband/core/uverbs_main.c
> > index 11c13c1381cf..37507cc27e8c 100644
> > --- a/drivers/infiniband/core/uverbs_main.c
> > +++ b/drivers/infiniband/core/uverbs_main.c
> > @@ -965,6 +965,111 @@ int rdma_user_mmap_io(struct ib_ucontext
> > *ucontext, struct vm_area_struct *vma,  }
> > EXPORT_SYMBOL(rdma_user_mmap_io);
> >
> > +static inline u64
> > +rdma_user_mmap_get_key(const struct rdma_user_mmap_entry
> *entry) {
> > +	return (u64)entry->mmap_page << PAGE_SHIFT; }
> > +
> > +struct rdma_user_mmap_entry *
> > +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64
> > +len) {
> > +	struct rdma_user_mmap_entry *entry;
> > +	u64 mmap_page;
> > +
> > +	mmap_page = key >> PAGE_SHIFT;
> > +	if (mmap_page > U32_MAX)
> > +		return NULL;
> > +
> > +	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> > +	if (!entry || rdma_user_mmap_get_key(entry) != key ||
> > +	    entry->length != len)
> > +		return NULL;
> > +
> > +	ibdev_dbg(ucontext->device,
> > +		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx]
> removed\n",
> > +		  entry->obj, key, entry->address, entry->length);
> > +
> > +	return entry;
> > +}
> > +EXPORT_SYMBOL(rdma_user_mmap_entry_get);
> 
> Please add function description in kernel doc format for all newly
> EXPORT_SYMBOL() functions you introduced in RDMA/core.
Ok. Could you give me a reference to an example? Where should the 
Documentation be added to? 


> 
> > +
> > +/*
> > + * Note this locking scheme cannot support removal of entries, except
> > +during
> > + * ucontext destruction when the core code guarentees no concurrency.
> > + */
> > +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void
> *obj,
> > +				u64 address, u64 length, u8 mmap_flag) {
> > +	struct rdma_user_mmap_entry *entry;
> > +	u32 next_mmap_page;
> > +	int err;
> > +
> > +	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
> 
> It is worth to use kzalloc and not kmalloc.
> 
> Thanks

^ permalink raw reply

* RE: [PATCH v5 rdma-next 2/6] RDMA/efa: Use the common mmap_xa helpers
From: Michal Kalderon @ 2019-07-09 10:30 UTC (permalink / raw)
  To: Gal Pressman, Ariel Elior, jgg@ziepe.ca, dledford@redhat.com
  Cc: linux-rdma@vger.kernel.org, davem@davemloft.net,
	netdev@vger.kernel.org
In-Reply-To: <0a28f174-1875-452e-ea0a-c8db2d243ce5@amazon.com>

> From: Gal Pressman <galpress@amazon.com>
> Sent: Tuesday, July 9, 2019 12:03 PM
> 
> On 08/07/2019 12:14, Michal Kalderon wrote:
> 
> Hi, a few nits:
Thanks for the review, will fix them.

> 
> > Remove the functions related to managing the mmap_xa database.
> > This code was copied to the ib_core. Use the common API's instead.
> >
> > Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> > ---
> >  drivers/infiniband/hw/efa/efa.h       |   3 +-
> >  drivers/infiniband/hw/efa/efa_main.c  |   1 +
> >  drivers/infiniband/hw/efa/efa_verbs.c | 183
> > ++++++++--------------------------
> >  3 files changed, 42 insertions(+), 145 deletions(-) diff --git
> > a/drivers/infiniband/hw/efa/efa_verbs.c
> > b/drivers/infiniband/hw/efa/efa_verbs.c
> > index df77bc312a25..5dff892da161 100644
> > --- a/drivers/infiniband/hw/efa/efa_verbs.c
> > +++ b/drivers/infiniband/hw/efa/efa_verbs.c
> > @@ -13,34 +13,15 @@
> >
> >  #include "efa.h"
> >
> > -#define EFA_MMAP_FLAG_SHIFT 56
> > -#define EFA_MMAP_PAGE_MASK GENMASK(EFA_MMAP_FLAG_SHIFT -
> 1, 0)
> > -#define EFA_MMAP_INVALID U64_MAX
> > -
> 
> Don't delete the blank line please.
> 
> >  enum {
> >  	EFA_MMAP_DMA_PAGE = 0,
> >  	EFA_MMAP_IO_WC,
> >  	EFA_MMAP_IO_NC,
> >  };
> > -
> >  #define EFA_AENQ_ENABLED_GROUPS \
> >  	(BIT(EFA_ADMIN_FATAL_ERROR) | BIT(EFA_ADMIN_WARNING) | \
> >  	 BIT(EFA_ADMIN_NOTIFICATION) | BIT(EFA_ADMIN_KEEP_ALIVE))
> >
> > -struct efa_mmap_entry {
> > -	void  *obj;
> > -	u64 address;
> > -	u64 length;
> > -	u32 mmap_page;
> > -	u8 mmap_flag;
> > -};
> > -
> > -static inline u64 get_mmap_key(const struct efa_mmap_entry *efa) -{
> > -	return ((u64)efa->mmap_flag << EFA_MMAP_FLAG_SHIFT) |
> > -	       ((u64)efa->mmap_page << PAGE_SHIFT);
> > -}
> > -
> >  #define EFA_CHUNK_PAYLOAD_SHIFT       12
> >  #define EFA_CHUNK_PAYLOAD_SIZE
> BIT(EFA_CHUNK_PAYLOAD_SHIFT)
> >  #define EFA_CHUNK_PAYLOAD_PTR_SIZE    8
> > @@ -145,105 +126,7 @@ static void *efa_zalloc_mapped(struct efa_dev
> *dev, dma_addr_t *dma_addr,
> >  	return addr;
> >  }
> >
> > -/*
> > - * This is only called when the ucontext is destroyed and there can
> > be no
> > - * concurrent query via mmap or allocate on the xarray, thus we can
> > be sure no
> > - * other thread is using the entry pointer. We also know that all the
> > BAR
> > - * pages have either been zap'd or munmaped at this point.  Normal
> > pages are
> > - * refcounted and will be freed at the proper time.
> > - */
> > -static void mmap_entries_remove_free(struct efa_dev *dev,
> > -				     struct efa_ucontext *ucontext)
> > -{
> > -	struct efa_mmap_entry *entry;
> > -	unsigned long mmap_page;
> >
> > -	xa_for_each(&ucontext->mmap_xa, mmap_page, entry) {
> > -		xa_erase(&ucontext->mmap_xa, mmap_page);
> > -
> > -		ibdev_dbg(
> > -			&dev->ibdev,
> > -			"mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx]
> removed\n",
> > -			entry->obj, get_mmap_key(entry), entry->address,
> > -			entry->length);
> > -		if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
> > -			/* DMA mapping is already gone, now free the pages
> */
> > -			free_pages_exact(phys_to_virt(entry->address),
> > -					 entry->length);
> > -		kfree(entry);
> > -	}
> > -}
> > -
> > -static struct efa_mmap_entry *mmap_entry_get(struct efa_dev *dev,
> > -					     struct efa_ucontext *ucontext,
> > -					     u64 key, u64 len)
> > -{
> > -	struct efa_mmap_entry *entry;
> > -	u64 mmap_page;
> > -
> > -	mmap_page = (key & EFA_MMAP_PAGE_MASK) >> PAGE_SHIFT;
> > -	if (mmap_page > U32_MAX)
> > -		return NULL;
> > -
> > -	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> > -	if (!entry || get_mmap_key(entry) != key || entry->length != len)
> > -		return NULL;
> > -
> > -	ibdev_dbg(&dev->ibdev,
> > -		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx]
> removed\n",
> > -		  entry->obj, key, entry->address, entry->length);
> > -
> > -	return entry;
> > -}
> > -
> > -/*
> > - * Note this locking scheme cannot support removal of entries, except
> > during
> > - * ucontext destruction when the core code guarentees no concurrency.
> > - */
> > -static u64 mmap_entry_insert(struct efa_dev *dev, struct efa_ucontext
> *ucontext,
> > -			     void *obj, u64 address, u64 length, u8 mmap_flag)
> > -{
> > -	struct efa_mmap_entry *entry;
> > -	u32 next_mmap_page;
> > -	int err;
> > -
> > -	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
> > -	if (!entry)
> > -		return EFA_MMAP_INVALID;
> > -
> > -	entry->obj = obj;
> > -	entry->address = address;
> > -	entry->length = length;
> > -	entry->mmap_flag = mmap_flag;
> > -
> > -	xa_lock(&ucontext->mmap_xa);
> > -	if (check_add_overflow(ucontext->mmap_xa_page,
> > -			       (u32)(length >> PAGE_SHIFT),
> > -			       &next_mmap_page))
> > -		goto err_unlock;
> > -
> > -	entry->mmap_page = ucontext->mmap_xa_page;
> > -	ucontext->mmap_xa_page = next_mmap_page;
> > -	err = __xa_insert(&ucontext->mmap_xa, entry->mmap_page,
> entry,
> > -			  GFP_KERNEL);
> > -	if (err)
> > -		goto err_unlock;
> > -
> > -	xa_unlock(&ucontext->mmap_xa);
> > -
> > -	ibdev_dbg(
> > -		&dev->ibdev,
> > -		"mmap: obj[0x%p] addr[%#llx], len[%#llx], key[%#llx]
> inserted\n",
> > -		entry->obj, entry->address, entry->length,
> get_mmap_key(entry));
> > -
> > -	return get_mmap_key(entry);
> > -
> > -err_unlock:
> > -	xa_unlock(&ucontext->mmap_xa);
> > -	kfree(entry);
> > -	return EFA_MMAP_INVALID;
> > -
> > -}
> >
> 
> You left two extra blank lines between efa_zalloc_mapped and
> efa_query_device.
> 
> >  int efa_query_device(struct ib_device *ibdev,
> >  		     struct ib_device_attr *props,
> > @@ -488,45 +371,52 @@ static int qp_mmap_entries_setup(struct efa_qp
> *qp,
> >  				 struct efa_com_create_qp_params
> *params,
> >  				 struct efa_ibv_create_qp_resp *resp)  {
> > +	u64 address;
> > +	u64 length;
> 
> Line break.
> 
> >  	/*
> >  	 * Once an entry is inserted it might be mmapped, hence cannot be
> >  	 * cleaned up until dealloc_ucontext.
> >  	 */
> >  	resp->sq_db_mmap_key =
> > -		mmap_entry_insert(dev, ucontext, qp,
> > -				  dev->db_bar_addr + resp->sq_db_offset,
> > -				  PAGE_SIZE, EFA_MMAP_IO_NC);
> > -	if (resp->sq_db_mmap_key == EFA_MMAP_INVALID)
> > +		rdma_user_mmap_entry_insert(&ucontext->ibucontext, qp,
> > +					    dev->db_bar_addr +
> > +					    resp->sq_db_offset,
> > +					    PAGE_SIZE, EFA_MMAP_IO_NC);
> > +	if (resp->sq_db_mmap_key == RDMA_USER_MMAP_INVALID)
> >  		return -ENOMEM;
> >
> >  	resp->sq_db_offset &= ~PAGE_MASK;
> >
> > +	address = dev->mem_bar_addr + resp->llq_desc_offset;
> > +	length = PAGE_ALIGN(params->sq_ring_size_in_bytes +
> > +			    (resp->llq_desc_offset & ~PAGE_MASK));
> >  	resp->llq_desc_mmap_key =
> > -		mmap_entry_insert(dev, ucontext, qp,
> > -				  dev->mem_bar_addr + resp-
> >llq_desc_offset,
> > -				  PAGE_ALIGN(params-
> >sq_ring_size_in_bytes +
> > -					     (resp->llq_desc_offset &
> ~PAGE_MASK)),
> > -				  EFA_MMAP_IO_WC);
> > -	if (resp->llq_desc_mmap_key == EFA_MMAP_INVALID)
> > +		rdma_user_mmap_entry_insert(&ucontext->ibucontext, qp,
> > +					    address,
> > +					    length,
> > +					    EFA_MMAP_IO_WC);
> > +	if (resp->llq_desc_mmap_key == RDMA_USER_MMAP_INVALID)
> >  		return -ENOMEM;
> >
> >  	resp->llq_desc_offset &= ~PAGE_MASK;
> >
> >  	if (qp->rq_size) {
> > +		address = dev->db_bar_addr + resp->rq_db_offset;
> >  		resp->rq_db_mmap_key =
> > -			mmap_entry_insert(dev, ucontext, qp,
> > -					  dev->db_bar_addr + resp-
> >rq_db_offset,
> > -					  PAGE_SIZE, EFA_MMAP_IO_NC);
> > -		if (resp->rq_db_mmap_key == EFA_MMAP_INVALID)
> > +			rdma_user_mmap_entry_insert(&ucontext-
> >ibucontext, qp,
> > +						    address, PAGE_SIZE,
> > +						    EFA_MMAP_IO_NC);
> > +		if (resp->rq_db_mmap_key ==
> RDMA_USER_MMAP_INVALID)
> >  			return -ENOMEM;
> >
> >  		resp->rq_db_offset &= ~PAGE_MASK;
> >
> > +		address = virt_to_phys(qp->rq_cpu_addr);
> >  		resp->rq_mmap_key =
> > -			mmap_entry_insert(dev, ucontext, qp,
> > -					  virt_to_phys(qp->rq_cpu_addr),
> > -					  qp->rq_size,
> EFA_MMAP_DMA_PAGE);
> > -		if (resp->rq_mmap_key == EFA_MMAP_INVALID)
> > +			rdma_user_mmap_entry_insert(&ucontext-
> >ibucontext, qp,
> > +						    address, qp->rq_size,
> > +						    EFA_MMAP_DMA_PAGE);
> > +		if (resp->rq_mmap_key == RDMA_USER_MMAP_INVALID)
> >  			return -ENOMEM;
> >
> >  		resp->rq_mmap_size = qp->rq_size;
> > @@ -875,11 +765,13 @@ void efa_destroy_cq(struct ib_cq *ibcq, struct
> > ib_udata *udata)  static int cq_mmap_entries_setup(struct efa_dev *dev,
> struct efa_cq *cq,
> >  				 struct efa_ibv_create_cq_resp *resp)  {
> > +	struct efa_ucontext *ucontext = cq->ucontext;
> 
> Line break.
> 
> >  	resp->q_mmap_size = cq->size;
> > -	resp->q_mmap_key = mmap_entry_insert(dev, cq->ucontext, cq,
> > -					     virt_to_phys(cq->cpu_addr),
> > -					     cq->size,
> EFA_MMAP_DMA_PAGE);
> > -	if (resp->q_mmap_key == EFA_MMAP_INVALID)
> > +	resp->q_mmap_key =
> > +		rdma_user_mmap_entry_insert(&ucontext->ibucontext, cq,
> > +					    virt_to_phys(cq->cpu_addr),
> > +					    cq->size,
> EFA_MMAP_DMA_PAGE);
> > +	if (resp->q_mmap_key == RDMA_USER_MMAP_INVALID)
> >  		return -ENOMEM;
> >
> >  	return 0;
> > @@ -1531,7 +1423,6 @@ int efa_alloc_ucontext(struct ib_ucontext
> *ibucontext, struct ib_udata *udata)
> >  		goto err_out;
> >
> >  	ucontext->uarn = result.uarn;
> > -	xa_init(&ucontext->mmap_xa);
> >
> >  	resp.cmds_supp_udata_mask |=
> EFA_USER_CMDS_SUPP_UDATA_QUERY_DEVICE;
> >  	resp.cmds_supp_udata_mask |=
> EFA_USER_CMDS_SUPP_UDATA_CREATE_AH;
> > @@ -1560,19 +1451,25 @@ void efa_dealloc_ucontext(struct ib_ucontext
> *ibucontext)
> >  	struct efa_ucontext *ucontext = to_eucontext(ibucontext);
> >  	struct efa_dev *dev = to_edev(ibucontext->device);
> >
> > -	mmap_entries_remove_free(dev, ucontext);
> >  	efa_dealloc_uar(dev, ucontext->uarn);
> >  }
> >
> > +void efa_mmap_free(u64 address, u64 length, u8 mmap_flag)
> > +{
> > +	/* DMA mapping is already gone, now free the pages */
> > +	if (mmap_flag == EFA_MMAP_DMA_PAGE)
> > +		free_pages_exact(phys_to_virt(address), length);
> > +}
> > +
> >  static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext
> *ucontext,
> >  		      struct vm_area_struct *vma, u64 key, u64 length)
> >  {
> > -	struct efa_mmap_entry *entry;
> > +	struct rdma_user_mmap_entry *entry;
> >  	unsigned long va;
> >  	u64 pfn;
> >  	int err;
> >
> > -	entry = mmap_entry_get(dev, ucontext, key, length);
> > +	entry = rdma_user_mmap_entry_get(&ucontext->ibucontext, key,
> length);
> >  	if (!entry) {
> >  		ibdev_dbg(&dev->ibdev, "key[%#llx] does not have valid
> entry\n",
> >  			  key);
> >

^ permalink raw reply

* Re: [PATCH nf-next 1/3] netfilter: nf_nat_proto: add nf_nat_bridge_ops support
From: Florian Westphal @ 2019-07-09 10:42 UTC (permalink / raw)
  To: wenxu; +Cc: Florian Westphal, pablo, netfilter-devel, netdev
In-Reply-To: <0a4cf910-6c87-34b6-3018-3e25f6fecdce@ucloud.cn>

wenxu <wenxu@ucloud.cn> wrote:
> > For NAT on bridge, it should be possible already to push such packets
> > up the stack by
> >
> > bridge input meta iif eth0 ip saddr 192.168.0.0/16 \
> >        meta pkttype set unicast ether daddr set 00:11:22:33:44:55
> 
> yes, packet can be push up to IP stack to handle the nat through bridge device. 
> 
> In my case dnat 2.2.1.7 to 10.0.0.7, It assume the mac address of the two address
> is the same known by outer.

I think that in general they will have different MAC addresses, so plain
replacement of ip addresses won't work.

> But in This case modify the packet dmac to bridge device, the packet push up through bridge device
> Then do nat and route send back to bridge device.

Are you saying that you can use the send-to-ip-layer approach?

We might need/want a more convenient way to do this.
There are two ways that I can see:

1. a redirect support for nftables bridge family.
   The redirect expression would be same as "ether daddr set
   <bridge_mac>", but there is no need to know the bridge mac address.

2. Support ebtables -t broute in nftables.
   The route rework for ebtables has been completed already, so
   this needs a new expression.  Packet that is brouted behaves
   as if the bridge port was not part of the bridge.

^ permalink raw reply

* Re: [PATCH v5 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
From: Leon Romanovsky @ 2019-07-09 10:52 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Ariel Elior, jgg@ziepe.ca, dledford@redhat.com,
	galpress@amazon.com, linux-rdma@vger.kernel.org,
	davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <MN2PR18MB31825FB2CCC56C47763C9D0DA1F10@MN2PR18MB3182.namprd18.prod.outlook.com>

On Tue, Jul 09, 2019 at 10:29:36AM +0000, Michal Kalderon wrote:
> > From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > owner@vger.kernel.org> On Behalf Of Leon Romanovsky
> >
> > On Mon, Jul 08, 2019 at 12:14:58PM +0300, Michal Kalderon wrote:
> > > Create some common API's for adding entries to a xa_mmap.
> > > Searching for an entry and freeing one.
> > >
> > > The code was copied from the efa driver almost as is, just renamed
> > > function to be generic and not efa specific.
> > >
> > > Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> > > ---
> > >  drivers/infiniband/core/device.c      |   1 +
> > >  drivers/infiniband/core/rdma_core.c   |   1 +
> > >  drivers/infiniband/core/uverbs_cmd.c  |   1 +
> > >  drivers/infiniband/core/uverbs_main.c | 105
> > ++++++++++++++++++++++++++++++++++
> > >  include/rdma/ib_verbs.h               |  32 +++++++++++
> > >  5 files changed, 140 insertions(+)
> > >
> > > diff --git a/drivers/infiniband/core/device.c
> > > b/drivers/infiniband/core/device.c
> > > index 8a6ccb936dfe..a830c2c5d691 100644
> > > --- a/drivers/infiniband/core/device.c
> > > +++ b/drivers/infiniband/core/device.c
> > > @@ -2521,6 +2521,7 @@ void ib_set_device_ops(struct ib_device *dev,
> > const struct ib_device_ops *ops)
> > >  	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
> > >  	SET_DEVICE_OP(dev_ops, map_phys_fmr);
> > >  	SET_DEVICE_OP(dev_ops, mmap);
> > > +	SET_DEVICE_OP(dev_ops, mmap_free);
> > >  	SET_DEVICE_OP(dev_ops, modify_ah);
> > >  	SET_DEVICE_OP(dev_ops, modify_cq);
> > >  	SET_DEVICE_OP(dev_ops, modify_device); diff --git
> > > a/drivers/infiniband/core/rdma_core.c
> > > b/drivers/infiniband/core/rdma_core.c
> > > index ccf4d069c25c..7166741834c8 100644
> > > --- a/drivers/infiniband/core/rdma_core.c
> > > +++ b/drivers/infiniband/core/rdma_core.c
> > > @@ -817,6 +817,7 @@ static void ufile_destroy_ucontext(struct
> > ib_uverbs_file *ufile,
> > >  	rdma_restrack_del(&ucontext->res);
> > >
> > >  	ib_dev->ops.dealloc_ucontext(ucontext);
> > > +	rdma_user_mmap_entries_remove_free(ucontext);
> > >  	kfree(ucontext);
> > >
> > >  	ufile->ucontext = NULL;
> > > diff --git a/drivers/infiniband/core/uverbs_cmd.c
> > > b/drivers/infiniband/core/uverbs_cmd.c
> > > index 7ddd0e5bc6b3..44c0600245e4 100644
> > > --- a/drivers/infiniband/core/uverbs_cmd.c
> > > +++ b/drivers/infiniband/core/uverbs_cmd.c
> > > @@ -254,6 +254,7 @@ static int ib_uverbs_get_context(struct
> > > uverbs_attr_bundle *attrs)
> > >
> > >  	mutex_init(&ucontext->per_mm_list_lock);
> > >  	INIT_LIST_HEAD(&ucontext->per_mm_list);
> > > +	xa_init(&ucontext->mmap_xa);
> > >
> > >  	ret = get_unused_fd_flags(O_CLOEXEC);
> > >  	if (ret < 0)
> > > diff --git a/drivers/infiniband/core/uverbs_main.c
> > > b/drivers/infiniband/core/uverbs_main.c
> > > index 11c13c1381cf..37507cc27e8c 100644
> > > --- a/drivers/infiniband/core/uverbs_main.c
> > > +++ b/drivers/infiniband/core/uverbs_main.c
> > > @@ -965,6 +965,111 @@ int rdma_user_mmap_io(struct ib_ucontext
> > > *ucontext, struct vm_area_struct *vma,  }
> > > EXPORT_SYMBOL(rdma_user_mmap_io);
> > >
> > > +static inline u64
> > > +rdma_user_mmap_get_key(const struct rdma_user_mmap_entry
> > *entry) {
> > > +	return (u64)entry->mmap_page << PAGE_SHIFT; }
> > > +
> > > +struct rdma_user_mmap_entry *
> > > +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64
> > > +len) {
> > > +	struct rdma_user_mmap_entry *entry;
> > > +	u64 mmap_page;
> > > +
> > > +	mmap_page = key >> PAGE_SHIFT;
> > > +	if (mmap_page > U32_MAX)
> > > +		return NULL;
> > > +
> > > +	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> > > +	if (!entry || rdma_user_mmap_get_key(entry) != key ||
> > > +	    entry->length != len)
> > > +		return NULL;
> > > +
> > > +	ibdev_dbg(ucontext->device,
> > > +		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx]
> > removed\n",
> > > +		  entry->obj, key, entry->address, entry->length);
> > > +
> > > +	return entry;
> > > +}
> > > +EXPORT_SYMBOL(rdma_user_mmap_entry_get);
> >
> > Please add function description in kernel doc format for all newly
> > EXPORT_SYMBOL() functions you introduced in RDMA/core.
> Ok. Could you give me a reference to an example? Where should the
> Documentation be added to?

Above function in *.c file.

For example, see function rdma_set_ack_timeout():
https://patchwork.kernel.org/patch/10778827/

Thanks

^ permalink raw reply

* Re: [PATCH] iwlwifi: fix warning iwl-trans.h is included more than once
From: Luca Coelho @ 2019-07-09 10:58 UTC (permalink / raw)
  To: Hariprasad Kelam, Johannes Berg, Emmanuel Grumbach,
	Intel Linux Wireless, Kalle Valo, David S. Miller, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <20190526113815.GA6328@hari-Inspiron-1545>

On Sun, 2019-05-26 at 17:08 +0530, Hariprasad Kelam wrote:
> remove duplication include of iwl-trans.h
> 
> issue identified by includecheck
> 
> Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
> ---

Thanks! I have applied this (with small modifications to the commit
message) in our internal tree and it will reach the mainline following
our normal upstreaming process.

--
Cheers,
Luca.


^ permalink raw reply

* IPv6 flow label reflection behave for RST packets
From: Marek Majkowski @ 2019-07-09 11:10 UTC (permalink / raw)
  To: kuznet, yoshfuji, Jakub Sitnicki; +Cc: netdev, kernel-team

Morning,

I'm experimenting with flow label reflection from a server point of
view. I'm able to get it working in both supported ways:

(a) per-socket with flow manager IPV6_FL_F_REFLECT and flowlabel_consistency=0

(b) with global flowlabel_reflect sysctl

However, I was surprised to see that RST after the connection is torn
down, doesn't have the correct flow label value:

IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [S]
IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [S.]
IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [.]
IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [F.]
IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [P.]
IP6 (flowlabel 0xdfc46) ::1.1235 > ::1.59276: Flags [R]

Notice, the last RST packet has inconsistent flow label. Perhaps we
can argue this behaviour might be acceptable for a per-socket
IPV6_FL_F_REFLECT option, but with global flowlabel_reflect, I would
expect the RST to preserve the reflected flow label value.

I suspect the same behaviour is true for kernel-generated ICMPv6.

Prepared test case:
https://gist.github.com/majek/139081b84f9b5b6187c8ccff802e3ab3

This behaviour is not necessarily a bug, more of a surprise. Flow
label reflection is mostly useful in deployments where Linux servers
stand behind ECMP router, which uses flow-label to compute the hash.
Flow label reflection allows ICMP PTB message to be routed back to
correct server.

It's hard to imagine a situation where generated RST or ICMP echo
response would trigger a ICMP PTB. Flow label reflection is explained
here:
https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
and:
https://tools.ietf.org/html/rfc7098
https://tools.ietf.org/html/rfc6438

Cheers,
    Marek

(Note: the unrelated "fwmark_reflect" toggle is about something
different - flow marks, but also addresses RST and ICMP generated by
the server)

^ permalink raw reply

* [PATCH] crypto: user - make NETLINK_CRYPTO work inside netns
From: Ondrej Mosnacek @ 2019-07-09 11:11 UTC (permalink / raw)
  To: linux-crypto, Herbert Xu
  Cc: netdev, David S . Miller, Stephan Mueller, Steffen Klassert,
	Don Zickus

Currently, NETLINK_CRYPTO works only in the init network namespace. It
doesn't make much sense to cut it out of the other network namespaces,
so do the minor plumbing work necessary to make it work in any network
namespace. Code inspired by net/core/sock_diag.c.

Tested using kcapi-dgst from libkcapi [1]:
Before:
    # unshare -n kcapi-dgst -c sha256 </dev/null | wc -c
    libkcapi - Error: Netlink error: sendmsg failed
    libkcapi - Error: Netlink error: sendmsg failed
    libkcapi - Error: NETLINK_CRYPTO: cannot obtain cipher information for hmac(sha512) (is required crypto_user.c patch missing? see documentation)
    0

After:
    # unshare -n kcapi-dgst -c sha256 </dev/null | wc -c
    32

[1] https://github.com/smuellerDD/libkcapi

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
---
 crypto/crypto_user_base.c            | 37 +++++++++++++++++++---------
 crypto/crypto_user_stat.c            |  4 ++-
 include/crypto/internal/cryptouser.h |  2 --
 include/net/net_namespace.h          |  3 +++
 4 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/crypto/crypto_user_base.c b/crypto/crypto_user_base.c
index e48da3b75c71..c92d415eaf82 100644
--- a/crypto/crypto_user_base.c
+++ b/crypto/crypto_user_base.c
@@ -22,9 +22,10 @@
 #include <linux/crypto.h>
 #include <linux/cryptouser.h>
 #include <linux/sched.h>
-#include <net/netlink.h>
 #include <linux/security.h>
+#include <net/netlink.h>
 #include <net/net_namespace.h>
+#include <net/sock.h>
 #include <crypto/internal/skcipher.h>
 #include <crypto/internal/rng.h>
 #include <crypto/akcipher.h>
@@ -37,9 +38,6 @@
 
 static DEFINE_MUTEX(crypto_cfg_mutex);
 
-/* The crypto netlink socket */
-struct sock *crypto_nlsk;
-
 struct crypto_dump_info {
 	struct sk_buff *in_skb;
 	struct sk_buff *out_skb;
@@ -195,6 +193,7 @@ out:
 static int crypto_report(struct sk_buff *in_skb, struct nlmsghdr *in_nlh,
 			 struct nlattr **attrs)
 {
+	struct net *net = sock_net(in_skb->sk);
 	struct crypto_user_alg *p = nlmsg_data(in_nlh);
 	struct crypto_alg *alg;
 	struct sk_buff *skb;
@@ -226,7 +225,7 @@ drop_alg:
 	if (err)
 		return err;
 
-	return nlmsg_unicast(crypto_nlsk, skb, NETLINK_CB(in_skb).portid);
+	return nlmsg_unicast(net->crypto_nlsk, skb, NETLINK_CB(in_skb).portid);
 }
 
 static int crypto_dump_report(struct sk_buff *skb, struct netlink_callback *cb)
@@ -429,6 +428,7 @@ static const struct crypto_link {
 static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 			       struct netlink_ext_ack *extack)
 {
+	struct net *net = sock_net(skb->sk);
 	struct nlattr *attrs[CRYPTOCFGA_MAX+1];
 	const struct crypto_link *link;
 	int type, err;
@@ -459,7 +459,7 @@ static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 				.done = link->done,
 				.min_dump_alloc = min(dump_alloc, 65535UL),
 			};
-			err = netlink_dump_start(crypto_nlsk, skb, nlh, &c);
+			err = netlink_dump_start(net->crypto_nlsk, skb, nlh, &c);
 		}
 
 		return err;
@@ -483,22 +483,35 @@ static void crypto_netlink_rcv(struct sk_buff *skb)
 	mutex_unlock(&crypto_cfg_mutex);
 }
 
-static int __init crypto_user_init(void)
+static int __net_init crypto_netlink_init(struct net *net)
 {
 	struct netlink_kernel_cfg cfg = {
 		.input	= crypto_netlink_rcv,
 	};
 
-	crypto_nlsk = netlink_kernel_create(&init_net, NETLINK_CRYPTO, &cfg);
-	if (!crypto_nlsk)
-		return -ENOMEM;
+	net->crypto_nlsk = netlink_kernel_create(net, NETLINK_CRYPTO, &cfg);
+	return net->crypto_nlsk == NULL ? -ENOMEM : 0;
+}
 
-	return 0;
+static void __net_exit crypto_netlink_exit(struct net *net)
+{
+	netlink_kernel_release(net->crypto_nlsk);
+	net->crypto_nlsk = NULL;
+}
+
+static struct pernet_operations crypto_netlink_net_ops = {
+	.init = crypto_netlink_init,
+	.exit = crypto_netlink_exit,
+};
+
+static int __init crypto_user_init(void)
+{
+	return register_pernet_subsys(&crypto_netlink_net_ops);
 }
 
 static void __exit crypto_user_exit(void)
 {
-	netlink_kernel_release(crypto_nlsk);
+	unregister_pernet_subsys(&crypto_netlink_net_ops);
 }
 
 module_init(crypto_user_init);
diff --git a/crypto/crypto_user_stat.c b/crypto/crypto_user_stat.c
index a03f326a63d3..8bad88413de1 100644
--- a/crypto/crypto_user_stat.c
+++ b/crypto/crypto_user_stat.c
@@ -10,6 +10,7 @@
 #include <linux/cryptouser.h>
 #include <linux/sched.h>
 #include <net/netlink.h>
+#include <net/sock.h>
 #include <crypto/internal/skcipher.h>
 #include <crypto/internal/rng.h>
 #include <crypto/akcipher.h>
@@ -298,6 +299,7 @@ out:
 int crypto_reportstat(struct sk_buff *in_skb, struct nlmsghdr *in_nlh,
 		      struct nlattr **attrs)
 {
+	struct net *net = sock_net(in_skb->sk);
 	struct crypto_user_alg *p = nlmsg_data(in_nlh);
 	struct crypto_alg *alg;
 	struct sk_buff *skb;
@@ -329,7 +331,7 @@ drop_alg:
 	if (err)
 		return err;
 
-	return nlmsg_unicast(crypto_nlsk, skb, NETLINK_CB(in_skb).portid);
+	return nlmsg_unicast(net->crypto_nlsk, skb, NETLINK_CB(in_skb).portid);
 }
 
 MODULE_LICENSE("GPL");
diff --git a/include/crypto/internal/cryptouser.h b/include/crypto/internal/cryptouser.h
index 8c602b187c58..40623f4457df 100644
--- a/include/crypto/internal/cryptouser.h
+++ b/include/crypto/internal/cryptouser.h
@@ -1,8 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include <net/netlink.h>
 
-extern struct sock *crypto_nlsk;
-
 struct crypto_alg *crypto_alg_match(struct crypto_user_alg *p, int exact);
 
 #ifdef CONFIG_CRYPTO_STATS
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 12689ddfc24c..610e40eaea52 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -165,6 +165,9 @@ struct net {
 #endif
 #ifdef CONFIG_XDP_SOCKETS
 	struct netns_xdp	xdp;
+#endif
+#if IS_ENABLED(CONFIG_CRYPTO_USER)
+	struct sock		*crypto_nlsk;
 #endif
 	struct sock		*diag_nlsk;
 	atomic_t		fnhe_genid;
-- 
2.21.0


^ permalink raw reply related

* Re: IPv6 flow label reflection behave for RST packets
From: Eric Dumazet @ 2019-07-09 11:19 UTC (permalink / raw)
  To: Marek Majkowski, kuznet, yoshfuji, Jakub Sitnicki; +Cc: netdev, kernel-team
In-Reply-To: <CAJPywT++ibhPSzL8pCS6Jpej9EeR3g9x89xssK8U=vi6FqLUUw@mail.gmail.com>



On 7/9/19 1:10 PM, Marek Majkowski wrote:
> Morning,
> 
> I'm experimenting with flow label reflection from a server point of
> view. I'm able to get it working in both supported ways:
> 
> (a) per-socket with flow manager IPV6_FL_F_REFLECT and flowlabel_consistency=0
> 
> (b) with global flowlabel_reflect sysctl
> 
> However, I was surprised to see that RST after the connection is torn
> down, doesn't have the correct flow label value:
> 
> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [S]
> IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [S.]
> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [.]
> IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [F.]
> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [P.]
> IP6 (flowlabel 0xdfc46) ::1.1235 > ::1.59276: Flags [R]
> 
> Notice, the last RST packet has inconsistent flow label. Perhaps we
> can argue this behaviour might be acceptable for a per-socket
> IPV6_FL_F_REFLECT option, but with global flowlabel_reflect, I would
> expect the RST to preserve the reflected flow label value.
> 
> I suspect the same behaviour is true for kernel-generated ICMPv6.
> 
> Prepared test case:
> https://gist.github.com/majek/139081b84f9b5b6187c8ccff802e3ab3
> 
> This behaviour is not necessarily a bug, more of a surprise. Flow
> label reflection is mostly useful in deployments where Linux servers
> stand behind ECMP router, which uses flow-label to compute the hash.
> Flow label reflection allows ICMP PTB message to be routed back to
> correct server.
> 
> It's hard to imagine a situation where generated RST or ICMP echo
> response would trigger a ICMP PTB. Flow label reflection is explained
> here:
> https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
> and:
> https://tools.ietf.org/html/rfc7098
> https://tools.ietf.org/html/rfc6438
> 
> Cheers,
>     Marek
> 
> 
> (Note: the unrelated "fwmark_reflect" toggle is about something
> different - flow marks, but also addresses RST and ICMP generated by
> the server)
> 

Please check the recent commits, scheduled for linux-5.3

a346abe051bd2bd0d5d0140b2da9ec95639acad7 ipv6: icmp: allow flowlabel reflection in echo replies
c67b85558ff20cb1ff20874461d12af456bee5d0 ipv6: tcp: send consistent autoflowlabel in TIME_WAIT state
392096736a06bc9d8f2b42fd4bb1a44b245b9fed ipv6: tcp: fix potential NULL deref in tcp_v6_send_reset()
50a8accf10627b343109a9c9d5c361751bf753b0 ipv6: tcp: send consistent flowlabel in TIME_WAIT state
323a53c41292a0d7efc8748856c623324c8d7c21 ipv6: tcp: enable flowlabel reflection in some RST packets


^ permalink raw reply

* [PATCH] vhost: fix null pointer dereference in vhost_del_umem_range
From: Denis Kirjanov @ 2019-07-09 11:42 UTC (permalink / raw)
  To: mst, jasowang; +Cc: kvm, netdev, Denis Kirjanov

> UBSAN: Undefined behaviour in ../drivers/vhost/vhost.c:52:1
> member access within null pointer of type 'struct rb_root'
> CPU: 2 PID: 1450 Comm: syz-executor493 Not tainted
> 4.12.14-525.g4d6309b-default #1 SLE15 (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.0.0-prebuilt.qemu-project.org 04/01/2014
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0xc6/0x159 lib/dump_stack.c:52
>  ubsan_epilogue+0xe/0x81 lib/ubsan.c:164
>  handle_missaligned_access lib/ubsan.c:299 [inline]
>  __ubsan_handle_type_mismatch+0x2ed/0x44e lib/ubsan.c:325
>  vhost_chr_write_iter+0x103e/0x1330 [vhost]
>  call_write_iter include/linux/fs.h:1764 [inline]
>  new_sync_write fs/read_write.c:497 [inline]
>  __vfs_write+0x67c/0x9b0 fs/read_write.c:510
>  vfs_write+0x1a2/0x610 fs/read_write.c:558
>  SYSC_write fs/read_write.c:605 [inline]
>  SyS_write+0xd8/0x1f0 fs/read_write.c:597
>  do_syscall_64+0x26b/0x6e0 arch/x86/entry/common.c:284
>  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> RIP: 0033:0x7fdb7687a739
> RSP: 002b:00007ffd48719f38 EFLAGS: 00000213 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fdb7687a739
> RDX: 0000000000000068 RSI: 0000000020000300 RDI: 0000000000000003
> RBP: 00000000004006f0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000213 R12: 00000000004004a0
> R13: 00007ffd4871a020 R14: 0000000000000000 R15: 0000000000000000
> ================================================================================
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] SMP KASAN PTI
> Modules linked in: vhost_net tun vhost tap af_packet iscsi_ibft
> iscsi_boot_sysfs bochs_drm ttm ppdev drm_kms_helper drm
> drm_panel_orientation_quirks joydev e1000 syscopyarea sysfillrect
> pcspkr sysimgblt fb_sys_fops i2c_piix4 parport_pc parport qemu_fw_cfg
> button ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic
> ata_piix ahci libahci libata serio_raw floppy sg dm_multipath dm_mod
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
> Supported: No, Unreleased kernel
> CPU: 2 PID: 1450 Comm: syz-executor493 Not tainted
> 4.12.14-525.g4d6309b-default #1 SLE15 (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.0.0-prebuilt.qemu-project.org 04/01/2014
> task: ffff88004b858040 task.stack: ffff88004a820000
> RIP: 0010:vhost_chr_write_iter+0x525/0x1330 [vhost]
> RSP: 0018:ffff88004a827a38 EFLAGS: 00010246
> RAX: dffffc0000000000 RBX: 1ffff10009504f51 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffffed0009504f40
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 1ffff10009504ec1 R11: 0000000000000000 R12: 0000000020000040
> R13: dffffc0000000000 R14: ffff8800352f0000 R15: ffffed0006a5e005
> FS:  00007fdb76f79740(0000) GS:ffff880060d00000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055b42161d980 CR3: 000000003422c000 CR4: 00000000000006e0
> Call Trace:
>  call_write_iter include/linux/fs.h:1764 [inline]
>  new_sync_write fs/read_write.c:497 [inline]
>  __vfs_write+0x67c/0x9b0 fs/read_write.c:510
>  vfs_write+0x1a2/0x610 fs/read_write.c:558
>  SYSC_write fs/read_write.c:605 [inline]
>  SyS_write+0xd8/0x1f0 fs/read_write.c:597
>  do_syscall_64+0x26b/0x6e0 arch/x86/entry/common.c:284
>  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> RIP: 0033:0x7fdb7687a739
> RSP: 002b:00007ffd48719f38 EFLAGS: 00000213 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fdb7687a739
> RDX: 0000000000000068 RSI: 0000000020000300 RDI: 0000000000000003
> RBP: 00000000004006f0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000213 R12: 00000000004004a0
> R13: 00007ffd4871a020 R14: 0000000000000000 R15: 0000000000000000
> Code: fc ff df 80 3c 02 00 0f 85 3c 0b 00 00 49 8b 6e 60 48 85 ed 0f
> 84 1c 0b 00 00 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80>
> 3c 02 00 0f 85 f4 0a 00 00 4c 8b 7d 00 4d 85 ff 0f 84 e7 06
> RIP: vhost_chr_write_iter+0x525/0x1330 [vhost] RSP: ffff88004a827a38
> ---[ end trace 49849730b5255f76 ]---

Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
---
 drivers/vhost/vhost.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e995c12d8e24..026123a6fc7b 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -962,7 +962,8 @@ static void vhost_del_umem_range(struct vhost_umem *umem,
 
 	while ((node = vhost_umem_interval_tree_iter_first(&umem->umem_tree,
 							   start, end)))
-		vhost_umem_free(umem, node);
+		if (node)
+			vhost_umem_free(umem, node);
 }
 
 static void vhost_iotlb_notify_vq(struct vhost_dev *d,
-- 
2.12.3


^ permalink raw reply related

* Re: [PATCH net-next 0/2] tc-testing: Add plugin for simple traffic generation
From: Jamal Hadi Salim @ 2019-07-09 11:44 UTC (permalink / raw)
  To: Lucas Bates, davem
  Cc: netdev, xiyou.wangcong, jiri, mleitner, vladbu, dcaratti, kernel
In-Reply-To: <1562636067-1338-1-git-send-email-lucasb@mojatatu.com>

On 2019-07-08 9:34 p.m., Lucas Bates wrote:
> This series supersedes the previous submission that included a patch for test
> case verification using JSON output.  It adds a new tdc plugin, scapyPlugin, as
> a way to send traffic to test tc filters and actions.
> 
> The first patch makes a change to the TdcPlugin module that will allow tdc
> plugins to examine the test case currently being executed, so plugins can
> play a more active role in testing by accepting information or commands from
> the test case.  This is required for scapyPlugin to work.
> 
> The second patch adds scapyPlugin itself, and an example test case file to
> demonstrate how the scapy block works in the test cases.
> 

Shouldve said V3 in the subject line - but fwiw,

ACKed-by: Jamal Hadi Salim <jhs@mojatatu.com>


cheers,
jamal

^ permalink raw reply

* KASAN: global-out-of-bounds Read in load_next_firmware_from_table
From: syzbot @ 2019-07-09 12:27 UTC (permalink / raw)
  To: andreyknvl, davem, gregkh, kvalo, libertas-dev, linux-kernel,
	linux-usb, linux-wireless, netdev, syzkaller-bugs, tglx

Hello,

syzbot found the following crash on:

HEAD commit:    7829a896 usb-fuzzer: main usb gadget fuzzer driver
git tree:       https://github.com/google/kasan.git usb-fuzzer
console output: https://syzkaller.appspot.com/x/log.txt?x=12fd0e9ba00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=f6d4561982f71f63
dashboard link: https://syzkaller.appspot.com/bug?extid=98156c174c5a2cad9f8f
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=125f669ba00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=146b806ba00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+98156c174c5a2cad9f8f@syzkaller.appspotmail.com

usb 1-1: Direct firmware load for libertas/usb8388_v5.bin failed with error  
-2
usb 1-1: Direct firmware load for libertas/usb8388.bin failed with error -2
usb 1-1: Direct firmware load for usb8388.bin failed with error -2
==================================================================
BUG: KASAN: global-out-of-bounds in  
load_next_firmware_from_table+0x267/0x2d0  
drivers/net/wireless/marvell/libertas/firmware.c:99
Read of size 8 at addr ffffffff860942b8 by task kworker/1:1/21

CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 5.2.0-rc6+ #13
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Workqueue: events request_firmware_work_func
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0xca/0x13e lib/dump_stack.c:113
  print_address_description+0x67/0x231 mm/kasan/report.c:188
  __kasan_report.cold+0x1a/0x32 mm/kasan/report.c:317
  kasan_report+0xe/0x20 mm/kasan/common.c:614
  load_next_firmware_from_table+0x267/0x2d0  
drivers/net/wireless/marvell/libertas/firmware.c:99
  helper_firmware_cb+0xdc/0x100  
drivers/net/wireless/marvell/libertas/firmware.c:70
  request_firmware_work_func+0x126/0x242  
drivers/base/firmware_loader/main.c:785
  process_one_work+0x905/0x1570 kernel/workqueue.c:2269
  worker_thread+0x96/0xe20 kernel/workqueue.c:2415
  kthread+0x30b/0x410 kernel/kthread.c:255
  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the variable:
  fw_table+0x98/0x5c0

Memory state around the buggy address:
  ffffffff86094180: fa fa fa fa 00 04 fa fa fa fa fa fa 00 00 05 fa
  ffffffff86094200: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
> ffffffff86094280: 00 00 00 00 00 00 fa fa fa fa fa fa 00 00 00 00
                                         ^
  ffffffff86094300: 00 00 00 01 fa fa fa fa 00 00 00 00 02 fa fa fa
  ffffffff86094380: fa fa fa fa 00 03 fa fa fa fa fa fa 00 00 00 00
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox