* [PATCH v7 0/7] netdev: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-25 14:39 Sinan Kaya
  2018-03-25 14:39 ` [PATCH v7 1/7] net: qla3xxx: " Sinan Kaya
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Sinan Kaya @ 2018-03-25 14:39 UTC (permalink / raw)
  To: linux-arm-kernel
Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.
This ends up CPU observing two barriers back to back before executing the
register write.
Since code already has an explicit barrier call, changing writel() to
writel_relaxed().
I did a regex search for wmb() followed by writel() in each drivers
directory.
I scrubbed the ones I care about in this series.
I considered "ease of change", "popular usage" and "performance critical
path" as the determining criteria for my filtering.
We used relaxed API heavily on ARM for a long time but
it did not exist on other architectures. For this reason, relaxed
architectures have been paying double penalty in order to use the common
drivers.
Now that relaxed API is present on all architectures, we can go and scrub
all drivers to see what needs to change and what can remain.
We start with mostly used ones and hope to increase the coverage over time.
It will take a while to cover all drivers.
Feel free to apply patches individually.
Changes since v6:
- bring back amazon ena and add mmiowb, remove
  ena_com_write_sq_doorbell_rel(). 
- remove extra mmiowb in bnx2x
- correct spelling mistake in  bnx2x: Replace doorbell barrier() with wmb()
Sinan Kaya (7):
  net: qla3xxx: Eliminate duplicate barriers on weakly-ordered archs
  qlcnic: Eliminate duplicate barriers on weakly-ordered archs
  bnx2x: Replace doorbell barrier() with wmb()
  bnx2x: Eliminate duplicate barriers on weakly-ordered archs
  net: qlge: Eliminate duplicate barriers on weakly-ordered archs
  bnxt_en: Eliminate duplicate barriers on weakly-ordered archs
  net: ena: Eliminate duplicate barriers on weakly-ordered archs
 drivers/net/ethernet/amazon/ena/ena_com.c           |  8 ++++++--
 drivers/net/ethernet/amazon/ena/ena_eth_com.h       |  8 ++++++--
 drivers/net/ethernet/amazon/ena/ena_netdev.c        |  5 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h         | 12 ++++++++----
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c     |  5 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h     |  4 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c |  5 +++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c    |  4 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c    |  4 +++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c           |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h           |  9 +++++++++
 drivers/net/ethernet/qlogic/qla3xxx.c               |  5 +++--
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c |  2 +-
 drivers/net/ethernet/qlogic/qlge/qlge.h             | 16 ++++++++++++++++
 drivers/net/ethernet/qlogic/qlge/qlge_main.c        |  3 ++-
 15 files changed, 68 insertions(+), 24 deletions(-)
-- 
2.7.4
^ permalink raw reply	[flat|nested] 15+ messages in thread
* [PATCH v7 1/7] net: qla3xxx: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-25 14:39 [PATCH v7 0/7] netdev: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
@ 2018-03-25 14:39 ` Sinan Kaya
  2018-03-25 14:39 ` [PATCH v7 2/7] qlcnic: " Sinan Kaya
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Sinan Kaya @ 2018-03-25 14:39 UTC (permalink / raw)
  To: linux-arm-kernel
Code includes wmb() followed by writel(). writel() already has a
barrier on some architectures like arm64.
This ends up CPU observing two barriers back to back before executing
the register write.
Since code already has an explicit barrier call, changing code to
wmb()
writel_relaxed()
mmiowb()
for multi-arch support.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/qlogic/qla3xxx.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c b/drivers/net/ethernet/qlogic/qla3xxx.c
index 9e5264d..b48f761 100644
--- a/drivers/net/ethernet/qlogic/qla3xxx.c
+++ b/drivers/net/ethernet/qlogic/qla3xxx.c
@@ -1858,8 +1858,9 @@ static void ql_update_small_bufq_prod_index(struct ql3_adapter *qdev)
 			qdev->small_buf_release_cnt -= 8;
 		}
 		wmb();
-		writel(qdev->small_buf_q_producer_index,
-			&port_regs->CommonRegs.rxSmallQProducerIndex);
+		writel_relaxed(qdev->small_buf_q_producer_index,
+			       &port_regs->CommonRegs.rxSmallQProducerIndex);
+		mmiowb();
 	}
 }
 
-- 
2.7.4
^ permalink raw reply related	[flat|nested] 15+ messages in thread
* [PATCH v7 2/7] qlcnic: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-25 14:39 [PATCH v7 0/7] netdev: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
  2018-03-25 14:39 ` [PATCH v7 1/7] net: qla3xxx: " Sinan Kaya
@ 2018-03-25 14:39 ` Sinan Kaya
  2018-03-25 14:39 ` [PATCH v7 3/7] bnx2x: Replace doorbell barrier() with wmb() Sinan Kaya
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Sinan Kaya @ 2018-03-25 14:39 UTC (permalink / raw)
  To: linux-arm-kernel
Code includes wmb() followed by writel(). writel() already has a
barrier on some architectures like arm64.
This ends up CPU observing two barriers back to back before executing
the register write.
Since code already has an explicit barrier call, changing writel() to
writel_relaxed().
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Acked-by: Manish Chopra <manish.chopra@cavium.com>
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
index 46b0372..97c146e7 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
@@ -478,7 +478,7 @@ irqreturn_t qlcnic_83xx_clear_legacy_intr(struct qlcnic_adapter *adapter)
 	wmb();
 
 	/* clear the interrupt trigger control register */
-	writel(0, adapter->isr_int_vec);
+	writel_relaxed(0, adapter->isr_int_vec);
 	intr_val = readl(adapter->isr_int_vec);
 	do {
 		intr_val = readl(adapter->tgt_status_reg);
-- 
2.7.4
^ permalink raw reply related	[flat|nested] 15+ messages in thread
* [PATCH v7 3/7] bnx2x: Replace doorbell barrier() with wmb()
  2018-03-25 14:39 [PATCH v7 0/7] netdev: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
  2018-03-25 14:39 ` [PATCH v7 1/7] net: qla3xxx: " Sinan Kaya
  2018-03-25 14:39 ` [PATCH v7 2/7] qlcnic: " Sinan Kaya
@ 2018-03-25 14:39 ` Sinan Kaya
  2018-03-29  9:17   ` Elior, Ariel
  2018-03-25 14:39 ` [PATCH v7 4/7] bnx2x: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Sinan Kaya @ 2018-03-25 14:39 UTC (permalink / raw)
  To: linux-arm-kernel
barrier() doesn't guarantee memory writes to be observed by the hardware on
all architectures. barrier() only tells compiler not to move this code
with respect to other read/writes.
If memory write needs to be observed by the HW, wmb() is the right choice.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c     | 3 ++-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index d7c98e8..0f86f18 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4153,7 +4153,8 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	wmb();
 
 	txdata->tx_db.data.prod += nbd;
-	barrier();
+	/* make sure descriptor update is observed by HW */
+	wmb();
 
 	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
index 1e33abd..39af4f8 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
@@ -2591,7 +2591,8 @@ static int bnx2x_run_loopback(struct bnx2x *bp, int loopback_mode)
 	wmb();
 
 	txdata->tx_db.data.prod += 2;
-	barrier();
+	/* make sure descriptor update is observed by the HW */
+	wmb();
 	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
 
 	mmiowb();
-- 
2.7.4
^ permalink raw reply related	[flat|nested] 15+ messages in thread
* [PATCH v7 4/7] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-25 14:39 [PATCH v7 0/7] netdev: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
                   ` (2 preceding siblings ...)
  2018-03-25 14:39 ` [PATCH v7 3/7] bnx2x: Replace doorbell barrier() with wmb() Sinan Kaya
@ 2018-03-25 14:39 ` Sinan Kaya
  2018-03-25 14:39 ` [PATCH v7 5/7] net: qlge: " Sinan Kaya
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Sinan Kaya @ 2018-03-25 14:39 UTC (permalink / raw)
  To: linux-arm-kernel
Code includes wmb() followed by writel(). writel() already has a
barrier on some architectures like arm64.
This ends up CPU observing two barriers back to back before executing
the register write.
Since code already has an explicit barrier call, changing writel() to
writel_relaxed().
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h         | 12 ++++++++----
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c     |  2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h     |  4 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c |  2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c    |  4 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c    |  4 +++-
 6 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index 352beff..d847e1b 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -166,6 +166,12 @@ do {						\
 #define REG_RD8(bp, offset)		readb(REG_ADDR(bp, offset))
 #define REG_RD16(bp, offset)		readw(REG_ADDR(bp, offset))
 
+#define REG_WR_RELAXED(bp, offset, val)	\
+	writel_relaxed((u32)val, REG_ADDR(bp, offset))
+
+#define REG_WR16_RELAXED(bp, offset, val) \
+	writew_relaxed((u16)val, REG_ADDR(bp, offset))
+
 #define REG_WR(bp, offset, val)		writel((u32)val, REG_ADDR(bp, offset))
 #define REG_WR8(bp, offset, val)	writeb((u8)val, REG_ADDR(bp, offset))
 #define REG_WR16(bp, offset, val)	writew((u16)val, REG_ADDR(bp, offset))
@@ -758,10 +764,8 @@ struct bnx2x_fastpath {
 #if (BNX2X_DB_SHIFT < BNX2X_DB_MIN_SHIFT)
 #error "Min DB doorbell stride is 8"
 #endif
-#define DOORBELL(bp, cid, val) \
-	do { \
-		writel((u32)(val), bp->doorbells + (bp->db_size * (cid))); \
-	} while (0)
+#define DOORBELL_RELAXED(bp, cid, val) \
+	writel_relaxed((u32)(val), (bp)->doorbells + ((bp)->db_size * (cid)))
 
 /* TX CSUM helpers */
 #define SKB_CS_OFF(skb)		(offsetof(struct tcphdr, check) - \
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 0f86f18..95871576 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4156,7 +4156,7 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	/* make sure descriptor update is observed by HW */
 	wmb();
 
-	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
+	DOORBELL_RELAXED(bp, txdata->cid, txdata->tx_db.raw);
 
 	mmiowb();
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index a5265e1..a8ce5c5 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -522,8 +522,8 @@ static inline void bnx2x_update_rx_prod(struct bnx2x *bp,
 	wmb();
 
 	for (i = 0; i < sizeof(rx_prods)/4; i++)
-		REG_WR(bp, fp->ustorm_rx_prods_offset + i*4,
-		       ((u32 *)&rx_prods)[i]);
+		REG_WR_RELAXED(bp, fp->ustorm_rx_prods_offset + i * 4,
+			       ((u32 *)&rx_prods)[i]);
 
 	mmiowb(); /* keep prod updates ordered */
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
index 39af4f8..da18aa2 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
@@ -2593,7 +2593,7 @@ static int bnx2x_run_loopback(struct bnx2x *bp, int loopback_mode)
 	txdata->tx_db.data.prod += 2;
 	/* make sure descriptor update is observed by the HW */
 	wmb();
-	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
+	DOORBELL_RELAXED(bp, txdata->cid, txdata->tx_db.raw);
 
 	mmiowb();
 	barrier();
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 74fc9af..146c40d 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -3817,8 +3817,8 @@ static void bnx2x_sp_prod_update(struct bnx2x *bp)
 	 */
 	mb();
 
-	REG_WR16(bp, BAR_XSTRORM_INTMEM + XSTORM_SPQ_PROD_OFFSET(func),
-		 bp->spq_prod_idx);
+	REG_WR16_RELAXED(bp, BAR_XSTRORM_INTMEM + XSTORM_SPQ_PROD_OFFSET(func),
+			 bp->spq_prod_idx);
 	mmiowb();
 }
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
index 76a4668..8e0a317 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
@@ -170,7 +170,9 @@ static int bnx2x_send_msg2pf(struct bnx2x *bp, u8 *done, dma_addr_t msg_mapping)
 	wmb();
 
 	/* Trigger the PF FW */
-	writeb(1, &zone_data->trigger.vf_pf_channel.addr_valid);
+	writeb_relaxed(1, &zone_data->trigger.vf_pf_channel.addr_valid);
+
+	mmiowb();
 
 	/* Wait for PF to complete */
 	while ((tout >= 0) && (!*done)) {
-- 
2.7.4
^ permalink raw reply related	[flat|nested] 15+ messages in thread
* [PATCH v7 5/7] net: qlge: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-25 14:39 [PATCH v7 0/7] netdev: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
                   ` (3 preceding siblings ...)
  2018-03-25 14:39 ` [PATCH v7 4/7] bnx2x: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
@ 2018-03-25 14:39 ` Sinan Kaya
  2018-03-25 14:39 ` [PATCH v7 6/7] bnxt_en: " Sinan Kaya
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Sinan Kaya @ 2018-03-25 14:39 UTC (permalink / raw)
  To: linux-arm-kernel
Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.
This ends up CPU observing two barriers back to back before executing the
register write.
Create a new wrapper function with relaxed write operator. Use the new
wrapper when a write is following a wmb().
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/qlogic/qlge/qlge.h      | 16 ++++++++++++++++
 drivers/net/ethernet/qlogic/qlge/qlge_main.c |  3 ++-
 2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge.h b/drivers/net/ethernet/qlogic/qlge/qlge.h
index 84ac50f..3e71b65 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge.h
+++ b/drivers/net/ethernet/qlogic/qlge/qlge.h
@@ -2185,6 +2185,22 @@ static inline void ql_write_db_reg(u32 val, void __iomem *addr)
 }
 
 /*
+ * Doorbell Registers:
+ * Doorbell registers are virtual registers in the PCI memory space.
+ * The space is allocated by the chip during PCI initialization.  The
+ * device driver finds the doorbell address in BAR 3 in PCI config space.
+ * The registers are used to control outbound and inbound queues. For
+ * example, the producer index for an outbound queue.  Each queue uses
+ * 1 4k chunk of memory.  The lower half of the space is for outbound
+ * queues. The upper half is for inbound queues.
+ * Caller has to guarantee ordering.
+ */
+static inline void ql_write_db_reg_relaxed(u32 val, void __iomem *addr)
+{
+	writel_relaxed(val, addr);
+}
+
+/*
  * Shadow Registers:
  * Outbound queues have a consumer index that is maintained by the chip.
  * Inbound queues have a producer index that is maintained by the chip.
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 50038d9..8293c202 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -2700,7 +2700,8 @@ static netdev_tx_t qlge_send(struct sk_buff *skb, struct net_device *ndev)
 		tx_ring->prod_idx = 0;
 	wmb();
 
-	ql_write_db_reg(tx_ring->prod_idx, tx_ring->prod_idx_db_reg);
+	ql_write_db_reg_relaxed(tx_ring->prod_idx, tx_ring->prod_idx_db_reg);
+	mmiowb();
 	netif_printk(qdev, tx_queued, KERN_DEBUG, qdev->ndev,
 		     "tx queued, slot %d, len %d\n",
 		     tx_ring->prod_idx, skb->len);
-- 
2.7.4
^ permalink raw reply related	[flat|nested] 15+ messages in thread
* [PATCH v7 6/7] bnxt_en: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-25 14:39 [PATCH v7 0/7] netdev: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
                   ` (4 preceding siblings ...)
  2018-03-25 14:39 ` [PATCH v7 5/7] net: qlge: " Sinan Kaya
@ 2018-03-25 14:39 ` Sinan Kaya
  2018-03-25 20:24   ` Michael Chan
  2018-03-25 14:39 ` [PATCH v7 7/7] net: ena: " Sinan Kaya
  2018-03-26 16:48 ` [PATCH v7 0/7] netdev: " David Miller
  7 siblings, 1 reply; 15+ messages in thread
From: Sinan Kaya @ 2018-03-25 14:39 UTC (permalink / raw)
  To: linux-arm-kernel
Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.
This ends up CPU observing two barriers back to back before executing the
register write.
Create a new wrapper function with relaxed write operator. Use the new
wrapper when a write is following a wmb().
Since code already has an explicit barrier call, changing writel() to
writel_relaxed().
Also add mmiowb() so that write code doesn't move outside of scope.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 1500243..befb538 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1922,7 +1922,7 @@ static int bnxt_poll_work(struct bnxt *bp, struct bnxt_napi *bnapi, int budget)
 		/* Sync BD data before updating doorbell */
 		wmb();
 
-		bnxt_db_write(bp, db, DB_KEY_TX | prod);
+		bnxt_db_write_relaxed(bp, db, DB_KEY_TX | prod);
 	}
 
 	cpr->cp_raw_cons = raw_cons;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 1989c47..5e453b9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1401,6 +1401,15 @@ static inline u32 bnxt_tx_avail(struct bnxt *bp, struct bnxt_tx_ring_info *txr)
 		((txr->tx_prod - txr->tx_cons) & bp->tx_ring_mask);
 }
 
+/* For TX and RX ring doorbells with no ordering guarantee*/
+static inline void bnxt_db_write_relaxed(struct bnxt *bp, void __iomem *db,
+					 u32 val)
+{
+	writel_relaxed(val, db);
+	if (bp->flags & BNXT_FLAG_DOUBLE_DB)
+		writel_relaxed(val, db);
+}
+
 /* For TX and RX ring doorbells */
 static inline void bnxt_db_write(struct bnxt *bp, void __iomem *db, u32 val)
 {
-- 
2.7.4
^ permalink raw reply related	[flat|nested] 15+ messages in thread
* [PATCH v7 7/7] net: ena: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-25 14:39 [PATCH v7 0/7] netdev: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
                   ` (5 preceding siblings ...)
  2018-03-25 14:39 ` [PATCH v7 6/7] bnxt_en: " Sinan Kaya
@ 2018-03-25 14:39 ` Sinan Kaya
  2018-03-26 16:48 ` [PATCH v7 0/7] netdev: " David Miller
  7 siblings, 0 replies; 15+ messages in thread
From: Sinan Kaya @ 2018-03-25 14:39 UTC (permalink / raw)
  To: linux-arm-kernel
Code includes barrier() followed by writel(). writel() already has a
barrier on some architectures like arm64.
This ends up CPU observing two barriers back to back before executing the
register write.
Create a new wrapper function with relaxed write operator. Use the new
wrapper when a write is following a barrier().
Since code already has an explicit barrier call, changing writel() to
writel_relaxed() and adding mmiowb() for ordering protection.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/amazon/ena/ena_com.c     | 8 ++++++--
 drivers/net/ethernet/amazon/ena/ena_eth_com.h | 8 ++++++--
 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 5 +++--
 3 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index bf2de52..1b9d3130 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -631,8 +631,10 @@ static u32 ena_com_reg_bar_read32(struct ena_com_dev *ena_dev, u16 offset)
 	 */
 	wmb();
 
-	writel(mmio_read_reg, ena_dev->reg_bar + ENA_REGS_MMIO_REG_READ_OFF);
+	writel_relaxed(mmio_read_reg,
+		       ena_dev->reg_bar + ENA_REGS_MMIO_REG_READ_OFF);
 
+	mmiowb();
 	for (i = 0; i < timeout; i++) {
 		if (read_resp->req_id == mmio_read->seq_num)
 			break;
@@ -1826,7 +1828,9 @@ void ena_com_aenq_intr_handler(struct ena_com_dev *dev, void *data)
 
 	/* write the aenq doorbell after all AENQ descriptors were read */
 	mb();
-	writel((u32)aenq->head, dev->reg_bar + ENA_REGS_AENQ_HEAD_DB_OFF);
+	writel_relaxed((u32)aenq->head,
+		       dev->reg_bar + ENA_REGS_AENQ_HEAD_DB_OFF);
+	mmiowb();
 }
 
 int ena_com_dev_reset(struct ena_com_dev *ena_dev,
diff --git a/drivers/net/ethernet/amazon/ena/ena_eth_com.h b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
index 2f76572..6fdc753 100644
--- a/drivers/net/ethernet/amazon/ena/ena_eth_com.h
+++ b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
@@ -107,7 +107,8 @@ static inline int ena_com_sq_empty_space(struct ena_com_io_sq *io_sq)
 	return io_sq->q_depth - 1 - cnt;
 }
 
-static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq *io_sq)
+static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq *io_sq,
+					    bool relaxed)
 {
 	u16 tail;
 
@@ -116,7 +117,10 @@ static inline int ena_com_write_sq_doorbell(struct ena_com_io_sq *io_sq)
 	pr_debug("write submission queue doorbell for queue: %d tail: %d\n",
 		 io_sq->qid, tail);
 
-	writel(tail, io_sq->db_addr);
+	if (relaxed)
+		writel_relaxed(tail, io_sq->db_addr);
+	else
+		writel(tail, io_sq->db_addr);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 6975150..a822e70 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -556,7 +556,8 @@ static int ena_refill_rx_bufs(struct ena_ring *rx_ring, u32 num)
 		 * issue a doorbell
 		 */
 		wmb();
-		ena_com_write_sq_doorbell(rx_ring->ena_com_io_sq);
+		ena_com_write_sq_doorbell(rx_ring->ena_com_io_sq, true);
+		mmiowb();
 	}
 
 	rx_ring->next_to_use = next_to_use;
@@ -2151,7 +2152,7 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	if (netif_xmit_stopped(txq) || !skb->xmit_more) {
 		/* trigger the dma engine */
-		ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq);
+		ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq, false);
 		u64_stats_update_begin(&tx_ring->syncp);
 		tx_ring->tx_stats.doorbells++;
 		u64_stats_update_end(&tx_ring->syncp);
-- 
2.7.4
^ permalink raw reply related	[flat|nested] 15+ messages in thread
* [PATCH v7 6/7] bnxt_en: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-25 14:39 ` [PATCH v7 6/7] bnxt_en: " Sinan Kaya
@ 2018-03-25 20:24   ` Michael Chan
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Chan @ 2018-03-25 20:24 UTC (permalink / raw)
  To: linux-arm-kernel
On Sun, Mar 25, 2018 at 7:39 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Create a new wrapper function with relaxed write operator. Use the new
> wrapper when a write is following a wmb().
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Also add mmiowb() so that write code doesn't move outside of scope.
This line in the patch description is not needed anymore.  Other than that,
Acked-by: Michael Chan <michael.chan@broadcom.com>
Thanks.
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
^ permalink raw reply	[flat|nested] 15+ messages in thread
* [PATCH v7 0/7] netdev: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-25 14:39 [PATCH v7 0/7] netdev: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
                   ` (6 preceding siblings ...)
  2018-03-25 14:39 ` [PATCH v7 7/7] net: ena: " Sinan Kaya
@ 2018-03-26 16:48 ` David Miller
  2018-03-27 12:40   ` Sinan Kaya
  7 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2018-03-26 16:48 UTC (permalink / raw)
  To: linux-arm-kernel
From: Sinan Kaya <okaya@codeaurora.org>
Date: Sun, 25 Mar 2018 10:39:14 -0400
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> I did a regex search for wmb() followed by writel() in each drivers
> directory.
> I scrubbed the ones I care about in this series.
> 
> I considered "ease of change", "popular usage" and "performance critical
> path" as the determining criteria for my filtering.
> 
> We used relaxed API heavily on ARM for a long time but
> it did not exist on other architectures. For this reason, relaxed
> architectures have been paying double penalty in order to use the common
> drivers.
> 
> Now that relaxed API is present on all architectures, we can go and scrub
> all drivers to see what needs to change and what can remain.
> 
> We start with mostly used ones and hope to increase the coverage over time.
> It will take a while to cover all drivers.
> 
> Feel free to apply patches individually.
> 
> Changes since v6:
> - bring back amazon ena and add mmiowb, remove
>   ena_com_write_sq_doorbell_rel(). 
> - remove extra mmiowb in bnx2x
> - correct spelling mistake in  bnx2x: Replace doorbell barrier() with wmb()
Series applied, thank you.
^ permalink raw reply	[flat|nested] 15+ messages in thread
* [PATCH v7 0/7] netdev: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-26 16:48 ` [PATCH v7 0/7] netdev: " David Miller
@ 2018-03-27 12:40   ` Sinan Kaya
  2018-03-27 14:00     ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Sinan Kaya @ 2018-03-27 12:40 UTC (permalink / raw)
  To: linux-arm-kernel
Dave,
On 3/26/2018 12:48 PM, David Miller wrote:
> From: Sinan Kaya <okaya@codeaurora.org>
> Date: Sun, 25 Mar 2018 10:39:14 -0400
> 
>> Code includes wmb() followed by writel() in multiple places. writel()
>> already has a barrier on some architectures like arm64.
>>
>> This ends up CPU observing two barriers back to back before executing the
>> register write.
>>
>> Since code already has an explicit barrier call, changing writel() to
>> writel_relaxed().
>>
>> I did a regex search for wmb() followed by writel() in each drivers
>> directory.
>> I scrubbed the ones I care about in this series.
>>
>> I considered "ease of change", "popular usage" and "performance critical
>> path" as the determining criteria for my filtering.
>>
>> We used relaxed API heavily on ARM for a long time but
>> it did not exist on other architectures. For this reason, relaxed
>> architectures have been paying double penalty in order to use the common
>> drivers.
>>
>> Now that relaxed API is present on all architectures, we can go and scrub
>> all drivers to see what needs to change and what can remain.
>>
>> We start with mostly used ones and hope to increase the coverage over time.
>> It will take a while to cover all drivers.
>>
>> Feel free to apply patches individually.
>>
>> Changes since v6:
>> - bring back amazon ena and add mmiowb, remove
>>   ena_com_write_sq_doorbell_rel(). 
>> - remove extra mmiowb in bnx2x
>> - correct spelling mistake in  bnx2x: Replace doorbell barrier() with wmb()
> 
> Series applied, thank you.
> 
I don't know if you have been following "RFC on writel and writel_relaxed" thread
or not but there are some new developments about wmb() requirement. 
Basically, wmb() should never be used before writel() as writel() seem to
provide coherency and observability guarantee.
wmb()+writel_relaxed() is slower on some architectures than plain writel()
I'll have to rework these patches to have writel() only. 
Are you able to drop the applied ones so that I can post V7 or is it too late?
Sinan
-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply	[flat|nested] 15+ messages in thread
* [PATCH v7 0/7] netdev: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-27 12:40   ` Sinan Kaya
@ 2018-03-27 14:00     ` David Miller
  2018-03-27 14:02       ` Sinan Kaya
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2018-03-27 14:00 UTC (permalink / raw)
  To: linux-arm-kernel
From: Sinan Kaya <okaya@codeaurora.org>
Date: Tue, 27 Mar 2018 08:40:41 -0400
> Are you able to drop the applied ones so that I can post V7 or is it
> too late?
I cannot "drop" changes from my tree since my tree is used by thousands
of people and therefore immutable.
You must therefore send me relative fixes or reverts.
^ permalink raw reply	[flat|nested] 15+ messages in thread
* [PATCH v7 0/7] netdev: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-27 14:00     ` David Miller
@ 2018-03-27 14:02       ` Sinan Kaya
  0 siblings, 0 replies; 15+ messages in thread
From: Sinan Kaya @ 2018-03-27 14:02 UTC (permalink / raw)
  To: linux-arm-kernel
On 3/27/2018 10:00 AM, David Miller wrote:
> From: Sinan Kaya <okaya@codeaurora.org>
> Date: Tue, 27 Mar 2018 08:40:41 -0400
> 
>> Are you able to drop the applied ones so that I can post V7 or is it
>> too late?
> 
> I cannot "drop" changes from my tree since my tree is used by thousands
> of people and therefore immutable.
> 
> You must therefore send me relative fixes or reverts.
> 
Thanks, I'll send fixes. Just wanted to see whether it got merged or if it was
sitting on a branch.
-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply	[flat|nested] 15+ messages in thread
* [PATCH v7 3/7] bnx2x: Replace doorbell barrier() with wmb()
  2018-03-25 14:39 ` [PATCH v7 3/7] bnx2x: Replace doorbell barrier() with wmb() Sinan Kaya
@ 2018-03-29  9:17   ` Elior, Ariel
  2018-03-29 13:45     ` Sinan Kaya
  0 siblings, 1 reply; 15+ messages in thread
From: Elior, Ariel @ 2018-03-29  9:17 UTC (permalink / raw)
  To: linux-arm-kernel
> Subject: [PATCH v7 3/7] bnx2x: Replace doorbell barrier() with wmb()
> 
> barrier() doesn't guarantee memory writes to be observed by the hardware on
> all architectures. barrier() only tells compiler not to move this code
> with respect to other read/writes.
> 
> If memory write needs to be observed by the HW, wmb() is the right choice.
The wmb() is there (a couple of lines above). Your modification adds an
unnecessary fence which would hurt high pps scenarios. The memory
writes which the HW needs to observe are the buffer descriptors, not the
producer update message. The producer is written to the HW, and exists
on the stack. The barrier() is there to prevent the compiler from mixing the
order of the prod update message preparation and writing it to the host.
A possible alternative would be to move the existing wmb() to where
the barrier() is, achieving both goals, although in the existing design each
barrier has a distinct purpose. The comment location is misleading, though.
Thanks,
Ariel
^ permalink raw reply	[flat|nested] 15+ messages in thread
* [PATCH v7 3/7] bnx2x: Replace doorbell barrier() with wmb()
  2018-03-29  9:17   ` Elior, Ariel
@ 2018-03-29 13:45     ` Sinan Kaya
  0 siblings, 0 replies; 15+ messages in thread
From: Sinan Kaya @ 2018-03-29 13:45 UTC (permalink / raw)
  To: linux-arm-kernel
Hi Ariel,
On 3/29/2018 5:17 AM, Elior, Ariel wrote:
>> Subject: [PATCH v7 3/7] bnx2x: Replace doorbell barrier() with wmb()
>>
>> barrier() doesn't guarantee memory writes to be observed by the hardware on
>> all architectures. barrier() only tells compiler not to move this code
>> with respect to other read/writes.
>>
>> If memory write needs to be observed by the HW, wmb() is the right choice.
> The wmb() is there (a couple of lines above). Your modification adds an
> unnecessary fence which would hurt high pps scenarios. The memory
> writes which the HW needs to observe are the buffer descriptors, not the
> producer update message. The producer is written to the HW, and exists
> on the stack. The barrier() is there to prevent the compiler from mixing the
> order of the prod update message preparation and writing it to the host.
> A possible alternative would be to move the existing wmb() to where
> the barrier() is, achieving both goals, although in the existing design each
> barrier has a distinct purpose. The comment location is misleading, though.
I was told that barrier() is there to guarantee that HW is observing the memory
write before writel(). 
I reacted to this and changed barrier() to wmb() following the old directions. 
You are saying that this not true. 
Since then, Linus gave us direction not to have wmb() in front of writel() as
writel() already has memory-IO guarantee.
https://www.mail-archive.com/netdev at vger.kernel.org/msg225806.html
I'll be doing one more pass to remove wmb() before writel() soon. Please review
that carefully. 
Intel drivers use wmb() as a substitute for smp_wmb(). So, we can't always assume
that you can remove all wmb() in front of writel() as the write barrier seems to
serve dual purpose.
Please help me getting this right on the next version.
Sinan
-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply	[flat|nested] 15+ messages in thread
end of thread, other threads:[~2018-03-29 13:45 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-03-25 14:39 [PATCH v7 0/7] netdev: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
2018-03-25 14:39 ` [PATCH v7 1/7] net: qla3xxx: " Sinan Kaya
2018-03-25 14:39 ` [PATCH v7 2/7] qlcnic: " Sinan Kaya
2018-03-25 14:39 ` [PATCH v7 3/7] bnx2x: Replace doorbell barrier() with wmb() Sinan Kaya
2018-03-29  9:17   ` Elior, Ariel
2018-03-29 13:45     ` Sinan Kaya
2018-03-25 14:39 ` [PATCH v7 4/7] bnx2x: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
2018-03-25 14:39 ` [PATCH v7 5/7] net: qlge: " Sinan Kaya
2018-03-25 14:39 ` [PATCH v7 6/7] bnxt_en: " Sinan Kaya
2018-03-25 20:24   ` Michael Chan
2018-03-25 14:39 ` [PATCH v7 7/7] net: ena: " Sinan Kaya
2018-03-26 16:48 ` [PATCH v7 0/7] netdev: " David Miller
2018-03-27 12:40   ` Sinan Kaya
2018-03-27 14:00     ` David Miller
2018-03-27 14:02       ` Sinan Kaya
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).