public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends
@ 2017-11-16 15:37 Brian King
  2017-11-16 15:37 ` [PATCH 1/7] ixgbe: Fix skb list corruption on Power systems Brian King
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Brian King @ 2017-11-16 15:37 UTC (permalink / raw)
  Cc: brking, jeffrey.t.kirsher, intel-wired-lan, stable, maurosr,
	Brian King

Resending as the first attempt is not showing up in the list archive.

This patch converts several network drivers to use smp_rmb
rather than read_barrier_depends. The initial issue was
discovered with ixgbe on a Power machine which resulted
in skb list corruption due to fetching a stale skb pointer.
More details can be found in the ixgbe patch description.

Brian King (7):
  ixgbe: Fix skb list corruption on Power systems
  i40e: Use smp_rmb rather than read_barrier_depends
  ixgbevf: Use smp_rmb rather than read_barrier_depends
  igbvf: Use smp_rmb rather than read_barrier_depends
  igb: Use smp_rmb rather than read_barrier_depends
  fm10k: Use smp_rmb rather than read_barrier_depends
  i40evf: Use smp_rmb rather than read_barrier_depends

 drivers/net/ethernet/intel/fm10k/fm10k_main.c     | 2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c       | 2 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 2 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c     | 2 +-
 drivers/net/ethernet/intel/igb/igb_main.c         | 2 +-
 drivers/net/ethernet/intel/igbvf/netdev.c         | 2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 3 ++-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +-
 8 files changed, 9 insertions(+), 8 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/7] ixgbe: Fix skb list corruption on Power systems
  2017-11-16 15:37 [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends Brian King
@ 2017-11-16 15:37 ` Brian King
  2017-11-16 15:37 ` [PATCH 2/7] i40e: Use smp_rmb rather than read_barrier_depends Brian King
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Brian King @ 2017-11-16 15:37 UTC (permalink / raw)
  Cc: brking, jeffrey.t.kirsher, intel-wired-lan, stable, maurosr,
	Brian King

This patch fixes an issue seen on Power systems with ixgbe which results
in skb list corruption and an eventual kernel oops. The following is what
was observed:

CPU 1                                   CPU2
============================            ============================
1: ixgbe_xmit_frame_ring                ixgbe_clean_tx_irq
2:  first->skb = skb                     eop_desc = tx_buffer->next_to_watch
3:  ixgbe_tx_map                         read_barrier_depends()
4:   wmb                                 check adapter written status bit
5:   first->next_to_watch = tx_desc      napi_consume_skb(tx_buffer->skb ..);
6:   writel(i, tx_ring->tail);

The read_barrier_depends is insufficient to ensure that tx_buffer->skb does not
get loaded prior to tx_buffer->next_to_watch, which then results in loading
a stale skb pointer, since we aren't zeroing it, like is done in other
similar code in other networking drivers. This patch addresses both of these
issues, replacing the read_barrier_depends with an smp_rmb, which will ensure
the load of tx_buffer->skb will not occur until after the load from
tx_buffer->next_to_watch. Secondly, it zeroes tx_buffer->skb to make this
consistent with other Intel ethernet drivers and elso ensure we don't leave
a stale skb pointer sitting around.

Cc: stable<stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 6d5f31e..4d8c7bb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1192,7 +1192,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 			break;
 
 		/* prevent any other reads prior to eop_desc */
-		read_barrier_depends();
+		smp_rmb();
 
 		/* if DD is not set pending work has not been completed */
 		if (!(eop_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
@@ -1218,6 +1218,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 				 DMA_TO_DEVICE);
 
 		/* clear tx_buffer data */
+		tx_buffer->skb = NULL;
 		dma_unmap_len_set(tx_buffer, len, 0);
 
 		/* unmap remaining buffers */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/7] i40e: Use smp_rmb rather than read_barrier_depends
  2017-11-16 15:37 [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends Brian King
  2017-11-16 15:37 ` [PATCH 1/7] ixgbe: Fix skb list corruption on Power systems Brian King
@ 2017-11-16 15:37 ` Brian King
  2017-11-16 15:37 ` [PATCH 3/7] ixgbevf: " Brian King
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Brian King @ 2017-11-16 15:37 UTC (permalink / raw)
  Cc: brking, jeffrey.t.kirsher, intel-wired-lan, stable, maurosr,
	Brian King

The original issue being fixed in this patch was seen with the ixgbe
driver, but the same issue exists with i40e as well, as the code is
very similar. read_barrier_depends is not sufficient to ensure
loads following it are not speculatively loaded out of order
by the CPU, which can result in stale data being loaded, causing
potential system crashes.

Cc: stable<stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 6498da8..ea20aac 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3760,7 +3760,7 @@ static bool i40e_clean_fdir_tx_irq(struct i40e_ring *tx_ring, int budget)
 			break;
 
 		/* prevent any other reads prior to eop_desc */
-		read_barrier_depends();
+		smp_rmb();
 
 		/* if the descriptor isn't done, no work yet to do */
 		if (!(eop_desc->cmd_type_offset_bsz &
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 120c68f..3c07ff1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -759,7 +759,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
 			break;
 
 		/* prevent any other reads prior to eop_desc */
-		read_barrier_depends();
+		smp_rmb();
 
 		i40e_trace(clean_tx_irq, tx_ring, tx_desc, tx_buf);
 		/* we have caught up to head, no work left to do */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/7] ixgbevf: Use smp_rmb rather than read_barrier_depends
  2017-11-16 15:37 [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends Brian King
  2017-11-16 15:37 ` [PATCH 1/7] ixgbe: Fix skb list corruption on Power systems Brian King
  2017-11-16 15:37 ` [PATCH 2/7] i40e: Use smp_rmb rather than read_barrier_depends Brian King
@ 2017-11-16 15:37 ` Brian King
  2017-11-16 15:37 ` [PATCH 4/7] igbvf: " Brian King
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Brian King @ 2017-11-16 15:37 UTC (permalink / raw)
  Cc: brking, jeffrey.t.kirsher, intel-wired-lan, stable, maurosr,
	Brian King

The original issue being fixed in this patch was seen with the ixgbe
driver, but the same issue exists with ixgbevf as well, as the code is
very similar. read_barrier_depends is not sufficient to ensure
loads following it are not speculatively loaded out of order
by the CPU, which can result in stale data being loaded, causing
potential system crashes.

Cc: stable<stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 032f8ac..90ecc4b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -326,7 +326,7 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
 			break;
 
 		/* prevent any other reads prior to eop_desc */
-		read_barrier_depends();
+		smp_rmb();
 
 		/* if DD is not set pending work has not been completed */
 		if (!(eop_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/7] igbvf: Use smp_rmb rather than read_barrier_depends
  2017-11-16 15:37 [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends Brian King
                   ` (2 preceding siblings ...)
  2017-11-16 15:37 ` [PATCH 3/7] ixgbevf: " Brian King
@ 2017-11-16 15:37 ` Brian King
  2017-11-16 15:37 ` [PATCH 5/7] igb: " Brian King
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Brian King @ 2017-11-16 15:37 UTC (permalink / raw)
  Cc: brking, jeffrey.t.kirsher, intel-wired-lan, stable, maurosr,
	Brian King

The original issue being fixed in this patch was seen with the ixgbe
driver, but the same issue exists with igbvf as well, as the code is
very similar. read_barrier_depends is not sufficient to ensure
loads following it are not speculatively loaded out of order
by the CPU, which can result in stale data being loaded, causing
potential system crashes.

Cc: stable<stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/net/ethernet/intel/igbvf/netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index 1ed5569..6f5888b 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -810,7 +810,7 @@ static bool igbvf_clean_tx_irq(struct igbvf_ring *tx_ring)
 			break;
 
 		/* prevent any other reads prior to eop_desc */
-		read_barrier_depends();
+		smp_rmb();
 
 		/* if DD is not set pending work has not been completed */
 		if (!(eop_desc->wb.status & cpu_to_le32(E1000_TXD_STAT_DD)))
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/7] igb: Use smp_rmb rather than read_barrier_depends
  2017-11-16 15:37 [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends Brian King
                   ` (3 preceding siblings ...)
  2017-11-16 15:37 ` [PATCH 4/7] igbvf: " Brian King
@ 2017-11-16 15:37 ` Brian King
  2017-11-16 15:37 ` [PATCH 6/7] fm10k: " Brian King
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Brian King @ 2017-11-16 15:37 UTC (permalink / raw)
  Cc: brking, jeffrey.t.kirsher, intel-wired-lan, stable, maurosr,
	Brian King

The original issue being fixed in this patch was seen with the ixgbe
driver, but the same issue exists with igb as well, as the code is
very similar. read_barrier_depends is not sufficient to ensure
loads following it are not speculatively loaded out of order
by the CPU, which can result in stale data being loaded, causing
potential system crashes.

Cc: stable<stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index ea69af2..b0031c5 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6970,7 +6970,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
 			break;
 
 		/* prevent any other reads prior to eop_desc */
-		read_barrier_depends();
+		smp_rmb();
 
 		/* if DD is not set pending work has not been completed */
 		if (!(eop_desc->wb.status & cpu_to_le32(E1000_TXD_STAT_DD)))
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 6/7] fm10k: Use smp_rmb rather than read_barrier_depends
  2017-11-16 15:37 [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends Brian King
                   ` (4 preceding siblings ...)
  2017-11-16 15:37 ` [PATCH 5/7] igb: " Brian King
@ 2017-11-16 15:37 ` Brian King
  2017-11-16 15:37 ` [PATCH 7/7] i40evf: " Brian King
  2017-11-16 19:33 ` [Intel-wired-lan] [PATCH 0/7] [RESEND] [net] intel: " Jesse Brandeburg
  7 siblings, 0 replies; 15+ messages in thread
From: Brian King @ 2017-11-16 15:37 UTC (permalink / raw)
  Cc: brking, jeffrey.t.kirsher, intel-wired-lan, stable, maurosr,
	Brian King

The original issue being fixed in this patch was seen with the ixgbe
driver, but the same issue exists with fm10k as well, as the code is
very similar. read_barrier_depends is not sufficient to ensure
loads following it are not speculatively loaded out of order
by the CPU, which can result in stale data being loaded, causing
potential system crashes.

Cc: stable<stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 9dffaba..103c0a7 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -1229,7 +1229,7 @@ static bool fm10k_clean_tx_irq(struct fm10k_q_vector *q_vector,
 			break;
 
 		/* prevent any other reads prior to eop_desc */
-		read_barrier_depends();
+		smp_rmb();
 
 		/* if DD is not set pending work has not been completed */
 		if (!(eop_desc->flags & FM10K_TXD_FLAG_DONE))
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 7/7] i40evf: Use smp_rmb rather than read_barrier_depends
  2017-11-16 15:37 [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends Brian King
                   ` (5 preceding siblings ...)
  2017-11-16 15:37 ` [PATCH 6/7] fm10k: " Brian King
@ 2017-11-16 15:37 ` Brian King
  2017-11-16 19:33 ` [Intel-wired-lan] [PATCH 0/7] [RESEND] [net] intel: " Jesse Brandeburg
  7 siblings, 0 replies; 15+ messages in thread
From: Brian King @ 2017-11-16 15:37 UTC (permalink / raw)
  Cc: brking, jeffrey.t.kirsher, intel-wired-lan, stable, maurosr,
	Brian King

The original issue being fixed in this patch was seen with the ixgbe
driver, but the same issue exists with i40evf as well, as the code is
very similar. read_barrier_depends is not sufficient to ensure
loads following it are not speculatively loaded out of order
by the CPU, which can result in stale data being loaded, causing
potential system crashes.

Cc: stable<stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index c32c624..07a4e6e 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -179,7 +179,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
 			break;
 
 		/* prevent any other reads prior to eop_desc */
-		read_barrier_depends();
+		smp_rmb();
 
 		i40e_trace(clean_tx_irq, tx_ring, tx_desc, tx_buf);
 		/* if the descriptor isn't done, no work yet to do */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Intel-wired-lan] [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends
  2017-11-16 15:37 [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends Brian King
                   ` (6 preceding siblings ...)
  2017-11-16 15:37 ` [PATCH 7/7] i40evf: " Brian King
@ 2017-11-16 19:33 ` Jesse Brandeburg
  2017-11-16 20:03   ` Brian King
  7 siblings, 1 reply; 15+ messages in thread
From: Jesse Brandeburg @ 2017-11-16 19:33 UTC (permalink / raw)
  To: Brian King
  Cc: stable, intel-wired-lan, brking, jesse.brandeburg,
	alexander.h.duyck

On Thu, 16 Nov 2017 09:37:48 -0600
Brian King <brking@linux.vnet.ibm.com> wrote:

> Resending as the first attempt is not showing up in the list archive.
> 
> This patch converts several network drivers to use smp_rmb
> rather than read_barrier_depends. The initial issue was
> discovered with ixgbe on a Power machine which resulted
> in skb list corruption due to fetching a stale skb pointer.
> More details can be found in the ixgbe patch description.

Thanks for the fix Brian, I bet it was a tough debug.

The only users in the entire kernel of read_barrier_depends() (not
smp_read_barrier_depends) are the Intel network drivers.

Wouldn't it be better for power to just fix read_barrier_depends to do
the right thing on power? The question I'm not sure of the answer to is:
Is it really the wrong barrier to be using or is the implementation in
the kernel powerpc wrong?

So I think the right thing might actually to be to:
Fix arch powerpc read_barrier_depends to not be a noop, as the
semantics of the read_barrier_depends seems to be sufficient to solve
this problem, but it seems not to work for powerpc?

Alex Duyck may have the most clarity on this problem, and possible
solutions so I made sure to CC him.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-wired-lan] [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends
  2017-11-16 19:33 ` [Intel-wired-lan] [PATCH 0/7] [RESEND] [net] intel: " Jesse Brandeburg
@ 2017-11-16 20:03   ` Brian King
  2017-11-16 21:09     ` Duyck, Alexander H
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Brian King @ 2017-11-16 20:03 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: stable, intel-wired-lan, brking, alexander.h.duyck, dipankar,
	Michael Ellerman, linuxppc-dev

On 11/16/2017 01:33 PM, Jesse Brandeburg wrote:
> On Thu, 16 Nov 2017 09:37:48 -0600
> Brian King <brking@linux.vnet.ibm.com> wrote:
> 
>> Resending as the first attempt is not showing up in the list archive.
>>
>> This patch converts several network drivers to use smp_rmb
>> rather than read_barrier_depends. The initial issue was
>> discovered with ixgbe on a Power machine which resulted
>> in skb list corruption due to fetching a stale skb pointer.
>> More details can be found in the ixgbe patch description.
> 
> Thanks for the fix Brian, I bet it was a tough debug.
> 
> The only users in the entire kernel of read_barrier_depends() (not
> smp_read_barrier_depends) are the Intel network drivers.
> 
> Wouldn't it be better for power to just fix read_barrier_depends to do
> the right thing on power? The question I'm not sure of the answer to is:
> Is it really the wrong barrier to be using or is the implementation in
> the kernel powerpc wrong?
> 
> So I think the right thing might actually to be to:
> Fix arch powerpc read_barrier_depends to not be a noop, as the
> semantics of the read_barrier_depends seems to be sufficient to solve
> this problem, but it seems not to work for powerpc?

Jesse,

Thanks for the quick response.

Cc'ing linuxppc-dev as well. 

I did think about changing the powerpc definition of read_barrier_depends,
but after reading up on that barrier, decided it was not the correct barrier
to be used in this context. Here is some good historical background on
read_barrier_depends that I found, along with an example.

https://lwn.net/Articles/5159/

Since there is no data-dependency in the code in question here, I think
the smp_rmb is the proper barrier to use.

For background, the code in question looks like this:

CPU 1                                   CPU2
============================            ============================
1: ixgbe_xmit_frame_ring                ixgbe_clean_tx_irq
2:  first->skb = skb                     eop_desc = tx_buffer->next_to_watch
                                         if (!eop_desc)
                                             break;
3:  ixgbe_tx_map                         read_barrier_depends()
                                         if (!(eop_desc->wb.status) ... )
                                             break;
4:   wmb                                 
5:   first->next_to_watch = tx_desc      napi_consume_skb(tx_buffer->skb ..);
6:   writel(i, tx_ring->tail);

What we see on powerpc is that tx_buffer->skb on CPU2 is getting loaded
prior to tx_buffer->next_to_watch. Changing the read_barrier_depends
to a smp_rmb solves this and prevents us from dereferencing old pointer.

-Brian


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-wired-lan] [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends
  2017-11-16 20:03   ` Brian King
@ 2017-11-16 21:09     ` Duyck, Alexander H
  2017-11-16 22:01     ` Jesse Brandeburg
  2017-11-16 22:57     ` Michael Ellerman
  2 siblings, 0 replies; 15+ messages in thread
From: Duyck, Alexander H @ 2017-11-16 21:09 UTC (permalink / raw)
  To: Brandeburg, Jesse, brking@linux.vnet.ibm.com
  Cc: dipankar@linux.vnet.ibm.com, michaele@au1.ibm.com,
	linuxppc-dev@lists.ozlabs.org, intel-wired-lan@lists.osuosl.org,
	stable@vger.kernel.org, brking@pobox.com

On Thu, 2017-11-16 at 14:03 -0600, Brian King wrote:
> On 11/16/2017 01:33 PM, Jesse Brandeburg wrote:
> > On Thu, 16 Nov 2017 09:37:48 -0600
> > Brian King <brking@linux.vnet.ibm.com> wrote:
> > 
> > > Resending as the first attempt is not showing up in the list archive.
> > > 
> > > This patch converts several network drivers to use smp_rmb
> > > rather than read_barrier_depends. The initial issue was
> > > discovered with ixgbe on a Power machine which resulted
> > > in skb list corruption due to fetching a stale skb pointer.
> > > More details can be found in the ixgbe patch description.
> > 
> > Thanks for the fix Brian, I bet it was a tough debug.
> > 
> > The only users in the entire kernel of read_barrier_depends() (not
> > smp_read_barrier_depends) are the Intel network drivers.
> > 
> > Wouldn't it be better for power to just fix read_barrier_depends to do
> > the right thing on power? The question I'm not sure of the answer to is:
> > Is it really the wrong barrier to be using or is the implementation in
> > the kernel powerpc wrong?
> > 
> > So I think the right thing might actually to be to:
> > Fix arch powerpc read_barrier_depends to not be a noop, as the
> > semantics of the read_barrier_depends seems to be sufficient to solve
> > this problem, but it seems not to work for powerpc?
> 
> Jesse,
> 
> Thanks for the quick response.
> 
> Cc'ing linuxppc-dev as well. 
> 
> I did think about changing the powerpc definition of read_barrier_depends,
> but after reading up on that barrier, decided it was not the correct barrier
> to be used in this context. Here is some good historical background on
> read_barrier_depends that I found, along with an example.
> 
> https://lwn.net/Articles/5159/
> 
> Since there is no data-dependency in the code in question here, I think
> the smp_rmb is the proper barrier to use.
> 
> For background, the code in question looks like this:
> 
> CPU 1                                   CPU2
> ============================            ============================
> 1: ixgbe_xmit_frame_ring                ixgbe_clean_tx_irq
> 2:  first->skb = skb                     eop_desc = tx_buffer->next_to_watch
>                                          if (!eop_desc)
>                                              break;
> 3:  ixgbe_tx_map                         read_barrier_depends()
>                                          if (!(eop_desc->wb.status) ... )
>                                              break;
> 4:   wmb                                 
> 5:   first->next_to_watch = tx_desc      napi_consume_skb(tx_buffer->skb ..);
> 6:   writel(i, tx_ring->tail);
> 
> What we see on powerpc is that tx_buffer->skb on CPU2 is getting loaded
> prior to tx_buffer->next_to_watch. Changing the read_barrier_depends
> to a smp_rmb solves this and prevents us from dereferencing old pointer.
> 
> -Brian

So the barrier part I am okay with for all the drivers. I hadn't
accounted for the skb being read before next_to_watch. I was more
concerned about the descriptor ring versus buffer_info structure at the
time I made use of that.

The updates to clear tx_buffer->skb in ixgbe I am not okay with.
Basically the tell-tale sign for skb present is next_to_watch being
non-null. The extra writes add overhead and I want to avoid that at all
costs since I want to avoid as much bouncing between the xmit path and
the Tx clean-up as possible.

- Alex


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-wired-lan] [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends
  2017-11-16 20:03   ` Brian King
  2017-11-16 21:09     ` Duyck, Alexander H
@ 2017-11-16 22:01     ` Jesse Brandeburg
  2017-11-16 22:57     ` Michael Ellerman
  2 siblings, 0 replies; 15+ messages in thread
From: Jesse Brandeburg @ 2017-11-16 22:01 UTC (permalink / raw)
  To: Brian King
  Cc: stable, intel-wired-lan, brking, alexander.h.duyck, dipankar,
	Michael Ellerman, linuxppc-dev

On Thu, 16 Nov 2017 14:03:02 -0600
Brian King <brking@linux.vnet.ibm.com> wrote:
> I did think about changing the powerpc definition of read_barrier_depends,
> but after reading up on that barrier, decided it was not the correct barrier
> to be used in this context. Here is some good historical background on
> read_barrier_depends that I found, along with an example.
> 
> https://lwn.net/Articles/5159/
> 
> Since there is no data-dependency in the code in question here, I think
> the smp_rmb is the proper barrier to use.

Hey Brian, thanks for the explanation, I'll agree with you and Alex
that the smb_rmb replacement is okay.  Does your test still pass
without the ->skb NULLs?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-wired-lan] [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends
  2017-11-16 20:03   ` Brian King
  2017-11-16 21:09     ` Duyck, Alexander H
  2017-11-16 22:01     ` Jesse Brandeburg
@ 2017-11-16 22:57     ` Michael Ellerman
  2017-11-17 16:16       ` Brian King
  2 siblings, 1 reply; 15+ messages in thread
From: Michael Ellerman @ 2017-11-16 22:57 UTC (permalink / raw)
  To: Brian King, Jesse Brandeburg
  Cc: alexander.h.duyck, dipankar, stable, intel-wired-lan,
	linuxppc-dev, brking

Brian King <brking@linux.vnet.ibm.com> writes:

> On 11/16/2017 01:33 PM, Jesse Brandeburg wrote:
>> On Thu, 16 Nov 2017 09:37:48 -0600
>> Brian King <brking@linux.vnet.ibm.com> wrote:
>> 
>>> Resending as the first attempt is not showing up in the list archive.
>>>
>>> This patch converts several network drivers to use smp_rmb
>>> rather than read_barrier_depends. The initial issue was
>>> discovered with ixgbe on a Power machine which resulted
>>> in skb list corruption due to fetching a stale skb pointer.
>>> More details can be found in the ixgbe patch description.
>> 
>> Thanks for the fix Brian, I bet it was a tough debug.
>> 
>> The only users in the entire kernel of read_barrier_depends() (not
>> smp_read_barrier_depends) are the Intel network drivers.
>> 
>> Wouldn't it be better for power to just fix read_barrier_depends to do
>> the right thing on power? The question I'm not sure of the answer to is:
>> Is it really the wrong barrier to be using or is the implementation in
>> the kernel powerpc wrong?
>> 
>> So I think the right thing might actually to be to:
>> Fix arch powerpc read_barrier_depends to not be a noop, as the
>> semantics of the read_barrier_depends seems to be sufficient to solve
>> this problem, but it seems not to work for powerpc?
>
> Jesse,
>
> Thanks for the quick response.
>
> Cc'ing linuxppc-dev as well. 
>
> I did think about changing the powerpc definition of read_barrier_depends,
> but after reading up on that barrier, decided it was not the correct barrier
> to be used in this context. Here is some good historical background on
> read_barrier_depends that I found, along with an example.
>
> https://lwn.net/Articles/5159/
>
> Since there is no data-dependency in the code in question here, I think
> the smp_rmb is the proper barrier to use.

Yes I agree.

The read_barrier_depends() is correct to order the load of eop_desc and
then the dependent load of eop_desc->wb.status, but it's only required
or does anything on Alpha.

> For background, the code in question looks like this:
>
> CPU 1                                   CPU2
> ============================            ============================
> 1: ixgbe_xmit_frame_ring                ixgbe_clean_tx_irq
> 2:  first->skb = skb                     eop_desc = tx_buffer->next_to_watch
>                                          if (!eop_desc)
>                                              break;
> 3:  ixgbe_tx_map                         read_barrier_depends()
>                                          if (!(eop_desc->wb.status) ... )
>                                              break;
> 4:   wmb                                 
> 5:   first->next_to_watch = tx_desc      napi_consume_skb(tx_buffer->skb ..);
> 6:   writel(i, tx_ring->tail);
>
> What we see on powerpc is that tx_buffer->skb on CPU2 is getting loaded
> prior to tx_buffer->next_to_watch. Changing the read_barrier_depends
> to a smp_rmb solves this and prevents us from dereferencing old pointer.

Right. Given that read_barrier_depends() is a nop, there's nothing there
to order the load of tx_buffer->skb vs anything else.

If it's actually the load of tx_buffer->skb that's the issue then the
smp_rmb() should really be immediately prior to that, rather than where
the read_barrier_depends() currently is.

cheers

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-wired-lan] [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends
  2017-11-16 22:57     ` Michael Ellerman
@ 2017-11-17 16:16       ` Brian King
  2017-11-17 16:50         ` Duyck, Alexander H
  0 siblings, 1 reply; 15+ messages in thread
From: Brian King @ 2017-11-17 16:16 UTC (permalink / raw)
  To: Jesse Brandeburg, alexander.h.duyck
  Cc: Michael Ellerman, dipankar, stable, intel-wired-lan, linuxppc-dev,
	brking

On 11/16/2017 04:57 PM, Michael Ellerman wrote:
> Brian King <brking@linux.vnet.ibm.com> writes:
> 
>> On 11/16/2017 01:33 PM, Jesse Brandeburg wrote:
>>> On Thu, 16 Nov 2017 09:37:48 -0600
>>> Brian King <brking@linux.vnet.ibm.com> wrote:
>>>
>>>> Resending as the first attempt is not showing up in the list archive.
>>>>
>>>> This patch converts several network drivers to use smp_rmb
>>>> rather than read_barrier_depends. The initial issue was
>>>> discovered with ixgbe on a Power machine which resulted
>>>> in skb list corruption due to fetching a stale skb pointer.
>>>> More details can be found in the ixgbe patch description.
>>>
>>> Thanks for the fix Brian, I bet it was a tough debug.
>>>
>>> The only users in the entire kernel of read_barrier_depends() (not
>>> smp_read_barrier_depends) are the Intel network drivers.
>>>
>>> Wouldn't it be better for power to just fix read_barrier_depends to do
>>> the right thing on power? The question I'm not sure of the answer to is:
>>> Is it really the wrong barrier to be using or is the implementation in
>>> the kernel powerpc wrong?
>>>
>>> So I think the right thing might actually to be to:
>>> Fix arch powerpc read_barrier_depends to not be a noop, as the
>>> semantics of the read_barrier_depends seems to be sufficient to solve
>>> this problem, but it seems not to work for powerpc?
>>
>> Jesse,
>>
>> Thanks for the quick response.
>>
>> Cc'ing linuxppc-dev as well. 
>>
>> I did think about changing the powerpc definition of read_barrier_depends,
>> but after reading up on that barrier, decided it was not the correct barrier
>> to be used in this context. Here is some good historical background on
>> read_barrier_depends that I found, along with an example.
>>
>> https://lwn.net/Articles/5159/
>>
>> Since there is no data-dependency in the code in question here, I think
>> the smp_rmb is the proper barrier to use.
> 
> Yes I agree.
> 
> The read_barrier_depends() is correct to order the load of eop_desc and
> then the dependent load of eop_desc->wb.status, but it's only required
> or does anything on Alpha.
> 
>> For background, the code in question looks like this:
>>
>> CPU 1                                   CPU2
>> ============================            ============================
>> 1: ixgbe_xmit_frame_ring                ixgbe_clean_tx_irq
>> 2:  first->skb = skb                     eop_desc = tx_buffer->next_to_watch
>>                                          if (!eop_desc)
>>                                              break;
>> 3:  ixgbe_tx_map                         read_barrier_depends()
>>                                          if (!(eop_desc->wb.status) ... )
>>                                              break;
>> 4:   wmb                                 
>> 5:   first->next_to_watch = tx_desc      napi_consume_skb(tx_buffer->skb ..);
>> 6:   writel(i, tx_ring->tail);
>>
>> What we see on powerpc is that tx_buffer->skb on CPU2 is getting loaded
>> prior to tx_buffer->next_to_watch. Changing the read_barrier_depends
>> to a smp_rmb solves this and prevents us from dereferencing old pointer.
> 
> Right. Given that read_barrier_depends() is a nop, there's nothing there
> to order the load of tx_buffer->skb vs anything else.
> 
> If it's actually the load of tx_buffer->skb that's the issue then the
> smp_rmb() should really be immediately prior to that, rather than where
> the read_barrier_depends() currently is.

Alex,

How would you like to proceed? read_barrier_depends is a noop on all archs
except alpha and blackfin. On those two archs, read_barrier_depends and
smp_rmb end up resulting in the same code. So, I can either:

1. Remove the setting of tx_buffer->skb to NULL to address your concern and proceed
with the rest of the patch set unchanged.

2. Leave the read_barrier_depends, as it is the right barrier to order the load
of eop_desc with respect to eop_desc->wb.status, and then *add* an smp_rmb in
the same code path to address the speculative load of the skb that I was running into.
This is arguably more pure from the perspective of the use of the different
barriers, but has the downside of additional overhead on alpha and blackfin.

Do you have a preference? 

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-wired-lan] [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends
  2017-11-17 16:16       ` Brian King
@ 2017-11-17 16:50         ` Duyck, Alexander H
  0 siblings, 0 replies; 15+ messages in thread
From: Duyck, Alexander H @ 2017-11-17 16:50 UTC (permalink / raw)
  To: Brandeburg, Jesse, brking@linux.vnet.ibm.com
  Cc: michaele@au1.ibm.com, dipankar@linux.vnet.ibm.com,
	intel-wired-lan@lists.osuosl.org, stable@vger.kernel.org,
	brking@pobox.com, linuxppc-dev@lists.ozlabs.org

On Fri, 2017-11-17 at 10:16 -0600, Brian King wrote:
> On 11/16/2017 04:57 PM, Michael Ellerman wrote:
> > Brian King <brking@linux.vnet.ibm.com> writes:
> > 
> > > On 11/16/2017 01:33 PM, Jesse Brandeburg wrote:
> > > > On Thu, 16 Nov 2017 09:37:48 -0600
> > > > Brian King <brking@linux.vnet.ibm.com> wrote:
> > > > 
> > > > > Resending as the first attempt is not showing up in the list archive.
> > > > > 
> > > > > This patch converts several network drivers to use smp_rmb
> > > > > rather than read_barrier_depends. The initial issue was
> > > > > discovered with ixgbe on a Power machine which resulted
> > > > > in skb list corruption due to fetching a stale skb pointer.
> > > > > More details can be found in the ixgbe patch description.
> > > > 
> > > > Thanks for the fix Brian, I bet it was a tough debug.
> > > > 
> > > > The only users in the entire kernel of read_barrier_depends() (not
> > > > smp_read_barrier_depends) are the Intel network drivers.
> > > > 
> > > > Wouldn't it be better for power to just fix read_barrier_depends to do
> > > > the right thing on power? The question I'm not sure of the answer to is:
> > > > Is it really the wrong barrier to be using or is the implementation in
> > > > the kernel powerpc wrong?
> > > > 
> > > > So I think the right thing might actually to be to:
> > > > Fix arch powerpc read_barrier_depends to not be a noop, as the
> > > > semantics of the read_barrier_depends seems to be sufficient to solve
> > > > this problem, but it seems not to work for powerpc?
> > > 
> > > Jesse,
> > > 
> > > Thanks for the quick response.
> > > 
> > > Cc'ing linuxppc-dev as well. 
> > > 
> > > I did think about changing the powerpc definition of read_barrier_depends,
> > > but after reading up on that barrier, decided it was not the correct barrier
> > > to be used in this context. Here is some good historical background on
> > > read_barrier_depends that I found, along with an example.
> > > 
> > > https://lwn.net/Articles/5159/
> > > 
> > > Since there is no data-dependency in the code in question here, I think
> > > the smp_rmb is the proper barrier to use.
> > 
> > Yes I agree.
> > 
> > The read_barrier_depends() is correct to order the load of eop_desc and
> > then the dependent load of eop_desc->wb.status, but it's only required
> > or does anything on Alpha.
> > 
> > > For background, the code in question looks like this:
> > > 
> > > CPU 1                                   CPU2
> > > ============================            ============================
> > > 1: ixgbe_xmit_frame_ring                ixgbe_clean_tx_irq
> > > 2:  first->skb = skb                     eop_desc = tx_buffer->next_to_watch
> > >                                          if (!eop_desc)
> > >                                              break;
> > > 3:  ixgbe_tx_map                         read_barrier_depends()
> > >                                          if (!(eop_desc->wb.status) ... )
> > >                                              break;
> > > 4:   wmb                                 
> > > 5:   first->next_to_watch = tx_desc      napi_consume_skb(tx_buffer->skb ..);
> > > 6:   writel(i, tx_ring->tail);
> > > 
> > > What we see on powerpc is that tx_buffer->skb on CPU2 is getting loaded
> > > prior to tx_buffer->next_to_watch. Changing the read_barrier_depends
> > > to a smp_rmb solves this and prevents us from dereferencing old pointer.
> > 
> > Right. Given that read_barrier_depends() is a nop, there's nothing there
> > to order the load of tx_buffer->skb vs anything else.
> > 
> > If it's actually the load of tx_buffer->skb that's the issue then the
> > smp_rmb() should really be immediately prior to that, rather than where
> > the read_barrier_depends() currently is.
> 
> Alex,
> 
> How would you like to proceed? read_barrier_depends is a noop on all archs
> except alpha and blackfin. On those two archs, read_barrier_depends and
> smp_rmb end up resulting in the same code. So, I can either:
> 
> 1. Remove the setting of tx_buffer->skb to NULL to address your concern and proceed
> with the rest of the patch set unchanged.

I am good with this option. We just need to be certain that it solves
the original issue you saw.

> 2. Leave the read_barrier_depends, as it is the right barrier to order the load
> of eop_desc with respect to eop_desc->wb.status, and then *add* an smp_rmb in
> the same code path to address the speculative load of the skb that I was running into.
> This is arguably more pure from the perspective of the use of the different
> barriers, but has the downside of additional overhead on alpha and blackfin.
> 
> Do you have a preference? 

If you have the smp_rmb there is no need for the read_barrier_depends
as having both barriers would be redundant anyway. It was there as more
of a mental place holder than anything else since I suspect these
drivers would never be run on an alpha architecture anyway.

> Thanks,
> 
> Brian

Thanks for finding this issue and taking the time to resolve it.

- Alex

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-11-17 16:50 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-16 15:37 [PATCH 0/7] [RESEND] [net] intel: Use smp_rmb rather than read_barrier_depends Brian King
2017-11-16 15:37 ` [PATCH 1/7] ixgbe: Fix skb list corruption on Power systems Brian King
2017-11-16 15:37 ` [PATCH 2/7] i40e: Use smp_rmb rather than read_barrier_depends Brian King
2017-11-16 15:37 ` [PATCH 3/7] ixgbevf: " Brian King
2017-11-16 15:37 ` [PATCH 4/7] igbvf: " Brian King
2017-11-16 15:37 ` [PATCH 5/7] igb: " Brian King
2017-11-16 15:37 ` [PATCH 6/7] fm10k: " Brian King
2017-11-16 15:37 ` [PATCH 7/7] i40evf: " Brian King
2017-11-16 19:33 ` [Intel-wired-lan] [PATCH 0/7] [RESEND] [net] intel: " Jesse Brandeburg
2017-11-16 20:03   ` Brian King
2017-11-16 21:09     ` Duyck, Alexander H
2017-11-16 22:01     ` Jesse Brandeburg
2017-11-16 22:57     ` Michael Ellerman
2017-11-17 16:16       ` Brian King
2017-11-17 16:50         ` Duyck, Alexander H

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox