public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4] i2c: designware: Fix corrupted memory seen in the ISR
@ 2023-11-09  3:19 Jan Bottorff
  2023-11-09 18:00 ` Andy Shevchenko
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jan Bottorff @ 2023-11-09  3:19 UTC (permalink / raw)
  To: Wolfram Sang, Jarkko Nikula, Andy Shevchenko, Mika Westerberg,
	Jan Dabros, Andi Shyti
  Cc: linux-i2c, linux-kernel, Jan Bottorff

When running on a many core ARM64 server, errors were
happening in the ISR that looked like corrupted memory. These
corruptions would fix themselves if small delays were inserted
in the ISR. Errors reported by the driver included "i2c_designware
APMC0D0F:00: i2c_dw_xfer_msg: invalid target address" and
"i2c_designware APMC0D0F:00:controller timed out" during
in-band IPMI SSIF stress tests.

The problem was determined to be memory writes in the driver were not
becoming visible to all cores when execution rapidly shifted between
cores, like when a register write immediately triggers an ISR.
Processors with weak memory ordering, like ARM64, make no
guarantees about the order normal memory writes become globally
visible, unless barrier instructions are used to control ordering.

To solve this, regmap accessor functions configured by this driver
were changed to use non-relaxed forms of the low-level register
access functions, which include a barrier on platforms that require
it. This assures memory writes before a controller register access are
visible to all cores. The community concluded defaulting to correct
operation outweighed defaulting to the small performance gains from
using relaxed access functions. Being a low speed device added weight to
this choice of default register access behavior.

Signed-off-by: Jan Bottorff <janb@os.amperecomputing.com>
---
ChangeLog
v3->v4: add missing changelog
v2->v3: regmap accessors use non-relaxed form instead of wmb barrier
v1->v2: Commit message improvements
v1: insert wmb barrier before enabling interrupts

 drivers/i2c/busses/i2c-designware-common.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/i2c/busses/i2c-designware-common.c b/drivers/i2c/busses/i2c-designware-common.c
index affcfb243f0f..35f762872b8a 100644
--- a/drivers/i2c/busses/i2c-designware-common.c
+++ b/drivers/i2c/busses/i2c-designware-common.c
@@ -63,7 +63,7 @@ static int dw_reg_read(void *context, unsigned int reg, unsigned int *val)
 {
 	struct dw_i2c_dev *dev = context;
 
-	*val = readl_relaxed(dev->base + reg);
+	*val = readl(dev->base + reg);
 
 	return 0;
 }
@@ -72,7 +72,7 @@ static int dw_reg_write(void *context, unsigned int reg, unsigned int val)
 {
 	struct dw_i2c_dev *dev = context;
 
-	writel_relaxed(val, dev->base + reg);
+	writel(val, dev->base + reg);
 
 	return 0;
 }
@@ -81,7 +81,7 @@ static int dw_reg_read_swab(void *context, unsigned int reg, unsigned int *val)
 {
 	struct dw_i2c_dev *dev = context;
 
-	*val = swab32(readl_relaxed(dev->base + reg));
+	*val = swab32(readl(dev->base + reg));
 
 	return 0;
 }
@@ -90,7 +90,7 @@ static int dw_reg_write_swab(void *context, unsigned int reg, unsigned int val)
 {
 	struct dw_i2c_dev *dev = context;
 
-	writel_relaxed(swab32(val), dev->base + reg);
+	writel(swab32(val), dev->base + reg);
 
 	return 0;
 }
@@ -99,8 +99,8 @@ static int dw_reg_read_word(void *context, unsigned int reg, unsigned int *val)
 {
 	struct dw_i2c_dev *dev = context;
 
-	*val = readw_relaxed(dev->base + reg) |
-		(readw_relaxed(dev->base + reg + 2) << 16);
+	*val = readw(dev->base + reg) |
+		(readw(dev->base + reg + 2) << 16);
 
 	return 0;
 }
@@ -109,8 +109,8 @@ static int dw_reg_write_word(void *context, unsigned int reg, unsigned int val)
 {
 	struct dw_i2c_dev *dev = context;
 
-	writew_relaxed(val, dev->base + reg);
-	writew_relaxed(val >> 16, dev->base + reg + 2);
+	writew(val, dev->base + reg);
+	writew(val >> 16, dev->base + reg + 2);
 
 	return 0;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] i2c: designware: Fix corrupted memory seen in the ISR
  2023-11-09  3:19 [PATCH v4] i2c: designware: Fix corrupted memory seen in the ISR Jan Bottorff
@ 2023-11-09 18:00 ` Andy Shevchenko
  2023-11-10 12:02   ` Jarkko Nikula
  2023-11-13  1:54 ` Wolfram Sang
  2023-11-13  9:48 ` Serge Semin
  2 siblings, 1 reply; 9+ messages in thread
From: Andy Shevchenko @ 2023-11-09 18:00 UTC (permalink / raw)
  To: Jan Bottorff
  Cc: Wolfram Sang, Jarkko Nikula, Mika Westerberg, Jan Dabros,
	Andi Shyti, linux-i2c, linux-kernel

On Thu, Nov 09, 2023 at 03:19:27AM +0000, Jan Bottorff wrote:
> When running on a many core ARM64 server, errors were
> happening in the ISR that looked like corrupted memory. These
> corruptions would fix themselves if small delays were inserted
> in the ISR. Errors reported by the driver included "i2c_designware
> APMC0D0F:00: i2c_dw_xfer_msg: invalid target address" and
> "i2c_designware APMC0D0F:00:controller timed out" during
> in-band IPMI SSIF stress tests.
> 
> The problem was determined to be memory writes in the driver were not
> becoming visible to all cores when execution rapidly shifted between
> cores, like when a register write immediately triggers an ISR.
> Processors with weak memory ordering, like ARM64, make no
> guarantees about the order normal memory writes become globally
> visible, unless barrier instructions are used to control ordering.
> 
> To solve this, regmap accessor functions configured by this driver
> were changed to use non-relaxed forms of the low-level register
> access functions, which include a barrier on platforms that require
> it. This assures memory writes before a controller register access are
> visible to all cores. The community concluded defaulting to correct
> operation outweighed defaulting to the small performance gains from
> using relaxed access functions. Being a low speed device added weight to
> this choice of default register access behavior.

...

> v3->v4: add missing changelog

Side note: Usually it's enough to just reply to the patch with the changelog.

...

> -	*val = swab32(readl_relaxed(dev->base + reg));
> +	*val = swab32(readl(dev->base + reg));

> -	writel_relaxed(swab32(val), dev->base + reg);
> +	writel(swab32(val), dev->base + reg);

I'm wondering why ioread32be() / iowrite32be() can't be used here...

Probably it would require to switch entire IO to use ioreadXX() /
iowriteXX() APIs and since we touch all of them (?) may be it makes
sense convert to use them at the same time. Dunno.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] i2c: designware: Fix corrupted memory seen in the ISR
  2023-11-09 18:00 ` Andy Shevchenko
@ 2023-11-10 12:02   ` Jarkko Nikula
  0 siblings, 0 replies; 9+ messages in thread
From: Jarkko Nikula @ 2023-11-10 12:02 UTC (permalink / raw)
  To: Andy Shevchenko, Jan Bottorff
  Cc: Wolfram Sang, Mika Westerberg, Jan Dabros, Andi Shyti, linux-i2c,
	linux-kernel

On 11/9/23 20:00, Andy Shevchenko wrote:
> On Thu, Nov 09, 2023 at 03:19:27AM +0000, Jan Bottorff wrote:
>> When running on a many core ARM64 server, errors were
>> happening in the ISR that looked like corrupted memory. These
>> corruptions would fix themselves if small delays were inserted
>> in the ISR. Errors reported by the driver included "i2c_designware
>> APMC0D0F:00: i2c_dw_xfer_msg: invalid target address" and
>> "i2c_designware APMC0D0F:00:controller timed out" during
>> in-band IPMI SSIF stress tests.
>>
>> The problem was determined to be memory writes in the driver were not
>> becoming visible to all cores when execution rapidly shifted between
>> cores, like when a register write immediately triggers an ISR.
>> Processors with weak memory ordering, like ARM64, make no
>> guarantees about the order normal memory writes become globally
>> visible, unless barrier instructions are used to control ordering.
>>
>> To solve this, regmap accessor functions configured by this driver
>> were changed to use non-relaxed forms of the low-level register
>> access functions, which include a barrier on platforms that require
>> it. This assures memory writes before a controller register access are
>> visible to all cores. The community concluded defaulting to correct
>> operation outweighed defaulting to the small performance gains from
>> using relaxed access functions. Being a low speed device added weight to
>> this choice of default register access behavior.
> 
> ...
> 
>> v3->v4: add missing changelog
> 
> Side note: Usually it's enough to just reply to the patch with the changelog.
> 
> ...
> 
>> -	*val = swab32(readl_relaxed(dev->base + reg));
>> +	*val = swab32(readl(dev->base + reg));
> 
>> -	writel_relaxed(swab32(val), dev->base + reg);
>> +	writel(swab32(val), dev->base + reg);
> 
> I'm wondering why ioread32be() / iowrite32be() can't be used here...
> 
> Probably it would require to switch entire IO to use ioreadXX() /
> iowriteXX() APIs and since we touch all of them (?) may be it makes
> sense convert to use them at the same time. Dunno.
> 
I would say cosmetic conversions are better to go to another patch.

For this patch:

Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] i2c: designware: Fix corrupted memory seen in the ISR
  2023-11-09  3:19 [PATCH v4] i2c: designware: Fix corrupted memory seen in the ISR Jan Bottorff
  2023-11-09 18:00 ` Andy Shevchenko
@ 2023-11-13  1:54 ` Wolfram Sang
  2023-11-13  9:48 ` Serge Semin
  2 siblings, 0 replies; 9+ messages in thread
From: Wolfram Sang @ 2023-11-13  1:54 UTC (permalink / raw)
  To: Jan Bottorff
  Cc: Jarkko Nikula, Andy Shevchenko, Mika Westerberg, Jan Dabros,
	Andi Shyti, linux-i2c, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1548 bytes --]

On Thu, Nov 09, 2023 at 03:19:27AM +0000, Jan Bottorff wrote:
> When running on a many core ARM64 server, errors were
> happening in the ISR that looked like corrupted memory. These
> corruptions would fix themselves if small delays were inserted
> in the ISR. Errors reported by the driver included "i2c_designware
> APMC0D0F:00: i2c_dw_xfer_msg: invalid target address" and
> "i2c_designware APMC0D0F:00:controller timed out" during
> in-band IPMI SSIF stress tests.
> 
> The problem was determined to be memory writes in the driver were not
> becoming visible to all cores when execution rapidly shifted between
> cores, like when a register write immediately triggers an ISR.
> Processors with weak memory ordering, like ARM64, make no
> guarantees about the order normal memory writes become globally
> visible, unless barrier instructions are used to control ordering.
> 
> To solve this, regmap accessor functions configured by this driver
> were changed to use non-relaxed forms of the low-level register
> access functions, which include a barrier on platforms that require
> it. This assures memory writes before a controller register access are
> visible to all cores. The community concluded defaulting to correct
> operation outweighed defaulting to the small performance gains from
> using relaxed access functions. Being a low speed device added weight to
> this choice of default register access behavior.
> 
> Signed-off-by: Jan Bottorff <janb@os.amperecomputing.com>

Applied to for-current, thanks!


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] i2c: designware: Fix corrupted memory seen in the ISR
  2023-11-09  3:19 [PATCH v4] i2c: designware: Fix corrupted memory seen in the ISR Jan Bottorff
  2023-11-09 18:00 ` Andy Shevchenko
  2023-11-13  1:54 ` Wolfram Sang
@ 2023-11-13  9:48 ` Serge Semin
  2023-11-13  9:51   ` Wolfram Sang
  2 siblings, 1 reply; 9+ messages in thread
From: Serge Semin @ 2023-11-13  9:48 UTC (permalink / raw)
  To: Jan Bottorff
  Cc: Wolfram Sang, Jarkko Nikula, Andy Shevchenko, Mika Westerberg,
	Jan Dabros, Andi Shyti, linux-i2c, linux-kernel

On Thu, Nov 09, 2023 at 03:19:27AM +0000, Jan Bottorff wrote:
> When running on a many core ARM64 server, errors were
> happening in the ISR that looked like corrupted memory. These
> corruptions would fix themselves if small delays were inserted
> in the ISR. Errors reported by the driver included "i2c_designware
> APMC0D0F:00: i2c_dw_xfer_msg: invalid target address" and
> "i2c_designware APMC0D0F:00:controller timed out" during
> in-band IPMI SSIF stress tests.
> 
> The problem was determined to be memory writes in the driver were not
> becoming visible to all cores when execution rapidly shifted between
> cores, like when a register write immediately triggers an ISR.
> Processors with weak memory ordering, like ARM64, make no
> guarantees about the order normal memory writes become globally
> visible, unless barrier instructions are used to control ordering.
> 
> To solve this, regmap accessor functions configured by this driver
> were changed to use non-relaxed forms of the low-level register
> access functions, which include a barrier on platforms that require
> it. This assures memory writes before a controller register access are
> visible to all cores. The community concluded defaulting to correct
> operation outweighed defaulting to the small performance gains from
> using relaxed access functions. Being a low speed device added weight to
> this choice of default register access behavior.
> 
> Signed-off-by: Jan Bottorff <janb@os.amperecomputing.com>

The patch has already been merged in, but in anyway:
Tested-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>

For the record. Next time don't forget to add all the reviewers to the
Cc-list on the patch(set) respin, otherwise the new version might get
to be missed amongst the other messages in their inbox. That in its
turn won't let them test it out and finish the review on time.

-Serge(y)

> ---
> ChangeLog
> v3->v4: add missing changelog
> v2->v3: regmap accessors use non-relaxed form instead of wmb barrier
> v1->v2: Commit message improvements
> v1: insert wmb barrier before enabling interrupts
> 
>  drivers/i2c/busses/i2c-designware-common.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-designware-common.c b/drivers/i2c/busses/i2c-designware-common.c
> index affcfb243f0f..35f762872b8a 100644
> --- a/drivers/i2c/busses/i2c-designware-common.c
> +++ b/drivers/i2c/busses/i2c-designware-common.c
> @@ -63,7 +63,7 @@ static int dw_reg_read(void *context, unsigned int reg, unsigned int *val)
>  {
>  	struct dw_i2c_dev *dev = context;
>  
> -	*val = readl_relaxed(dev->base + reg);
> +	*val = readl(dev->base + reg);
>  
>  	return 0;
>  }
> @@ -72,7 +72,7 @@ static int dw_reg_write(void *context, unsigned int reg, unsigned int val)
>  {
>  	struct dw_i2c_dev *dev = context;
>  
> -	writel_relaxed(val, dev->base + reg);
> +	writel(val, dev->base + reg);
>  
>  	return 0;
>  }
> @@ -81,7 +81,7 @@ static int dw_reg_read_swab(void *context, unsigned int reg, unsigned int *val)
>  {
>  	struct dw_i2c_dev *dev = context;
>  
> -	*val = swab32(readl_relaxed(dev->base + reg));
> +	*val = swab32(readl(dev->base + reg));
>  
>  	return 0;
>  }
> @@ -90,7 +90,7 @@ static int dw_reg_write_swab(void *context, unsigned int reg, unsigned int val)
>  {
>  	struct dw_i2c_dev *dev = context;
>  
> -	writel_relaxed(swab32(val), dev->base + reg);
> +	writel(swab32(val), dev->base + reg);
>  
>  	return 0;
>  }
> @@ -99,8 +99,8 @@ static int dw_reg_read_word(void *context, unsigned int reg, unsigned int *val)
>  {
>  	struct dw_i2c_dev *dev = context;
>  
> -	*val = readw_relaxed(dev->base + reg) |
> -		(readw_relaxed(dev->base + reg + 2) << 16);
> +	*val = readw(dev->base + reg) |
> +		(readw(dev->base + reg + 2) << 16);
>  
>  	return 0;
>  }
> @@ -109,8 +109,8 @@ static int dw_reg_write_word(void *context, unsigned int reg, unsigned int val)
>  {
>  	struct dw_i2c_dev *dev = context;
>  
> -	writew_relaxed(val, dev->base + reg);
> -	writew_relaxed(val >> 16, dev->base + reg + 2);
> +	writew(val, dev->base + reg);
> +	writew(val >> 16, dev->base + reg + 2);
>  
>  	return 0;
>  }
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] i2c: designware: Fix corrupted memory seen in the ISR
  2023-11-13  9:48 ` Serge Semin
@ 2023-11-13  9:51   ` Wolfram Sang
  2023-11-13 10:01     ` Serge Semin
  0 siblings, 1 reply; 9+ messages in thread
From: Wolfram Sang @ 2023-11-13  9:51 UTC (permalink / raw)
  To: Serge Semin
  Cc: Jan Bottorff, Jarkko Nikula, Andy Shevchenko, Mika Westerberg,
	Jan Dabros, Andi Shyti, linux-i2c, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 266 bytes --]


> The patch has already been merged in, but in anyway:
> Tested-by: Serge Semin <fancer.lancer@gmail.com>
> Reviewed-by: Serge Semin <fancer.lancer@gmail.com>

Thanks to a restrictive hotel network, I haven't pushed out yet, and
could still add your tags. Thanks!


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] i2c: designware: Fix corrupted memory seen in the ISR
  2023-11-13  9:51   ` Wolfram Sang
@ 2023-11-13 10:01     ` Serge Semin
  2023-11-15 18:25       ` Jan Bottorff
  0 siblings, 1 reply; 9+ messages in thread
From: Serge Semin @ 2023-11-13 10:01 UTC (permalink / raw)
  To: Wolfram Sang, Jan Bottorff, Jarkko Nikula, Andy Shevchenko,
	Mika Westerberg, Jan Dabros, Andi Shyti, linux-i2c, linux-kernel

On Mon, Nov 13, 2023 at 04:51:00AM -0500, Wolfram Sang wrote:
> 
> > The patch has already been merged in, but in anyway:
> > Tested-by: Serge Semin <fancer.lancer@gmail.com>
> > Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
> 
> Thanks to a restrictive hotel network, I haven't pushed out yet, and
> could still add your tags. Thanks!
> 

Awesome. Thanks!

-Serge(y)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] i2c: designware: Fix corrupted memory seen in the ISR
  2023-11-13 10:01     ` Serge Semin
@ 2023-11-15 18:25       ` Jan Bottorff
  2023-11-15 18:52         ` Wolfram Sang
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Bottorff @ 2023-11-15 18:25 UTC (permalink / raw)
  To: Serge Semin, Wolfram Sang, Jarkko Nikula, Andy Shevchenko,
	Mika Westerberg, Jan Dabros, Andi Shyti, linux-i2c, linux-kernel

> On Mon, Nov 13, 2023 at 04:51:00AM -0500, Wolfram Sang wrote:
>>
>> Thanks to a restrictive hotel network, I haven't pushed out yet, and
>> could still add your tags. Thanks!

Hi Wolfram,

Any chance we could get the "Cc: stable@vger.kernel.org" tag added to 
this patch? More than one large cloud company would like to see this in 
the stable kernel, as it significantly improves the reliability of IPMI 
transactions on platforms that use i2c for this communication.

Sorry for not including this tag in the submission.

Jan



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] i2c: designware: Fix corrupted memory seen in the ISR
  2023-11-15 18:25       ` Jan Bottorff
@ 2023-11-15 18:52         ` Wolfram Sang
  0 siblings, 0 replies; 9+ messages in thread
From: Wolfram Sang @ 2023-11-15 18:52 UTC (permalink / raw)
  To: Jan Bottorff
  Cc: Serge Semin, Jarkko Nikula, Andy Shevchenko, Mika Westerberg,
	Jan Dabros, Andi Shyti, linux-i2c, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 567 bytes --]


> Any chance we could get the "Cc: stable@vger.kernel.org" tag added to this
> patch? More than one large cloud company would like to see this in the

Well, no, it is already pushed out...

> stable kernel, as it significantly improves the reliability of IPMI
> transactions on platforms that use i2c for this communication.

... but with that commit desc it will surely be backported nonetheless.
If all fails, it can be manually requested to go to stable, but I am
sure this will not be needed.

> Sorry for not including this tag in the submission.

No worries.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-11-15 18:54 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-09  3:19 [PATCH v4] i2c: designware: Fix corrupted memory seen in the ISR Jan Bottorff
2023-11-09 18:00 ` Andy Shevchenko
2023-11-10 12:02   ` Jarkko Nikula
2023-11-13  1:54 ` Wolfram Sang
2023-11-13  9:48 ` Serge Semin
2023-11-13  9:51   ` Wolfram Sang
2023-11-13 10:01     ` Serge Semin
2023-11-15 18:25       ` Jan Bottorff
2023-11-15 18:52         ` Wolfram Sang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox