[Bug] x86/resctrl: unexpect mbm_local_bytes/mbm_total_bytes delta on AMD with multiple RMIDs in the same domain

linux-kernel.vger.kernel.org archive mirror

* [Bug] x86/resctrl: unexpect mbm_local_bytes/mbm_total_bytes delta on AMD with multiple RMIDs in the same domain
@ 2025-07-29  7:53 Hc Zheng
  2025-07-29 16:49 ` Reinette Chatre
  0 siblings, 1 reply; 3+ messages in thread
From: Hc Zheng @ 2025-07-29  7:53 UTC
  To: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen
  Cc: x86, H. Peter Anvin, linux-kernel

Hi All,

We have enabled resctrl on our container platform. We notice some unexpected
behavior when multiple containers run in the same L3 domain:
mbm_local_bytes/mbm_total_bytes for such mon_groups either return
Unavailable, or the delta between two consecutive reads is far outside the
normal range (e.g. 1000+ GB/s).

The AMD PQoS manual says:
"""
Potential causes of the “U” bit being set include
(but are not limited to):

• RMID is not currently tracked by the hardware.
• RMID was not tracked by the hardware at some time since it was last read.
• RMID has not been read since it started being tracked by the hardware.
"""

but it gives no explanation for the unexpectedly large delta between two
reads of the counters. After examining the kernel code, I suspect this is
more likely a hardware bug.

Here are the steps to reproduce it:

1. create mon_groups

$ for i in `seq 0 99`;do mkdir -p /sys/fs/resctrl/amdtest/mon_groups/test$i;done

2. run a stress command and assign its pid to each mon_group (I ran this
test on AMD Genoa; CPUs 16-23,208-215 are on CCD 8)

$ cat stress.sh
nohup numactl -C 16-23,208-215 stress -m  1 --vm-hang 1 > /dev/null &
lastPid=$!
echo $lastPid > /sys/fs/resctrl/amdtest/tasks
echo $lastPid > /sys/fs/resctrl/amdtest/mon_groups/test$1/tasks
$ for i in `seq 0 99`;do bash stress.sh $i ;done

3. watch the resctrl counter every 10 seconds

$ while true; do cat /sys/fs/resctrl/amdtest/mon_groups/test9/mon_data/mon_L3_08/mbm_local_bytes; sleep 10; done

...
Unavailable
Unavailable
Unavailable
61924495182825856
64176294690029568
Unavailable
Unavailable
Unavailable
...
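
A minimal sketch of how the samples above could be post-processed, assuming the counter file yields either a decimal number or a non-numeric status string such as "Unavailable". The function name mbm_rates is hypothetical and not part of resctrl. A non-numeric line resets the baseline, so no rate is computed across a gap.

```shell
# Print the bytes-per-second rate between consecutive numeric samples on stdin.
# $1 is the sampling interval in seconds. Any non-numeric line (for example
# "Unavailable") clears the previous sample instead of being counted as zero.
mbm_rates() {
    interval_s=$1
    prev=""
    while IFS= read -r line; do
        case $line in
            ''|*[!0-9]*)
                prev=""                     # not a counter value, drop baseline
                ;;
            *)
                if [ -n "$prev" ]; then
                    echo $(( ($line - $prev) / $interval_s ))
                fi
                prev=$line
                ;;
        esac
    done
}

# Replay of the samples shown above with a 10 second interval.
printf '%s\n' Unavailable 61924495182825856 64176294690029568 Unavailable \
    | mbm_rates 10                          # prints 225179950720371
```

Only one rate comes out of this replay, because the two numeric reads are the only adjacent valid pair.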

At some point the delta between two consecutive reads is far outside the
normal range:
(64176294690029568 - 61924495182825856) / 1024 / 1024 / 1024 / 10 = 209715 GB/s
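
The arithmetic can be checked with shell integer math. The sketch below also compares the delta with 2^51, which is only an observation about these two numbers and not something taken from the manual: the delta is within about 0.3 GiB of a power of two, which looks more like an artificial correction than real traffic.

```shell
first=61924495182825856
second=64176294690029568
interval_s=10

# Raw difference between the two reads, in bytes.
delta=$(( $second - $first ))

# Same integer division chain as in the report: bytes to GiB, then per second.
rate_gib_s=$(( $delta / 1024 / 1024 / 1024 / $interval_s ))

# Distance of the delta from 2^51 bytes.
gap_to_2_51=$(( (1 << 51) - $delta ))

echo "delta=$delta rate=$rate_gib_s GiB/s gap_to_2^51=$gap_to_2_51"
# prints: delta=2251799507203712 rate=209715 GiB/s gap_to_2^51=306481536
```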

If I lower the concurrency to 59 or lower, the delta stays in the normal
range and reads never return Unavailable. I also tested on an AMD Rome
CPU, and the problem exists there too.
I tried this on an Intel platform and it does not have this problem, even
with over 200 RMIDs being monitored concurrently.

I cannot find any documentation on the maximum number of RMIDs that AMD
hardware can track concurrently, or any explanation for this problem.
I believe it could become even more severe on AMD as thread counts grow,
since we will run more workloads on a single server.

Can someone help me solve this problem? Thanks.

Best Regards
Huaicheng Zheng


* Re: [Bug] x86/resctrl: unexpect mbm_local_bytes/mbm_total_bytes delta on AMD with multiple RMIDs in the same domain
  2025-07-29  7:53 [Bug] x86/resctrl: unexpect mbm_local_bytes/mbm_total_bytes delta on AMD with multiple RMIDs in the same domain Hc Zheng
@ 2025-07-29 16:49 ` Reinette Chatre
  2025-07-29 17:42   ` Moger, Babu
  0 siblings, 1 reply; 3+ messages in thread
From: Reinette Chatre @ 2025-07-29 16:49 UTC
  To: Hc Zheng, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Babu Moger
  Cc: x86, H. Peter Anvin, linux-kernel

+Babu

Hi Huaicheng Zheng,

It looks to me as though you are encountering the issue that is addressed with AMD's
Assignable Bandwidth Monitoring Counters (ABMC) feature that Babu is currently enabling
in resctrl [1]. The feature itself is well documented in that series and includes links to
the AMD spec where you can learn more.
You show that the "Unavailable" is encountered when reading these counters from user
space and I deduce from that that resctrl's internal MBM overflow handler (it runs once
per second) likely encounters the same error with the consequence that overflows of the
counter are not handled correctly.

If you do have access to the AMD hardware with this feature, please do take a look at
the resctrl support for it and try it out. We would all appreciate your feedback to ensure
resctrl supports it well.

Reinette 

[1] https://lore.kernel.org/lkml/cover.1753467772.git.babu.moger@amd.com/



* Re: [Bug] x86/resctrl: unexpect mbm_local_bytes/mbm_total_bytes delta on AMD with multiple RMIDs in the same domain
  2025-07-29 16:49 ` Reinette Chatre
@ 2025-07-29 17:42   ` Moger, Babu
  0 siblings, 0 replies; 3+ messages in thread
From: Moger, Babu @ 2025-07-29 17:42 UTC
  To: Reinette Chatre, Hc Zheng, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen
  Cc: x86, H. Peter Anvin, linux-kernel

Hi Hc Zheng,

On 7/29/25 11:49, Reinette Chatre wrote:
> +Babu
> 
> Hi Huaicheng Zheng,
> 
> On 7/29/25 12:53 AM, Hc Zheng wrote:
>> Hi All,
>>
>> We have enable resctrl on container platform. We notice some unexpect
>> behaviors when multiple containers running in the same L3 domain.
>> the  mbm_local_bytes/mbm_total_bytes for such mon_groups return
>> Unavailable or delta with two consecutive reads is out of normal range
>> (eg: 1000+GB/s)
>>
>> after reading the AMD pqos manual(), it says
>> """
>> Potential causes of the “U” bit being set include
>> (but are not limited to):
>>
>> • RMID is not currently tracked by the hardware.
>> • RMID was not tracked by the hardware at some time since it was last read.
>> • RMID has not been read since it started being tracked by the hardware.
>> """
>>
>> but no explanations for unexpect large delta between 2 reads of the
>> counters. After exam the kernel code, I suspect this would more likely
>> to be a hardware bugs
>>
>> here are the steps to reproduce it
>>
>> 1. create mon_groups
>>
>> $ for i in `seq 0 99`;do mkdir -p /sys/fs/resctrl/amdtest/mon_groups/test$i;done

Looks like you are creating 100 new groups here.

You can create more monitor groups, but the hardware cannot count more than
32 RMIDs (or 16 on some older hardware) at a time.
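
As a rough way to compare a setup against that limit, the sketch below counts the monitoring groups under one control group that currently have tasks assigned. The function name count_busy_mon_groups is hypothetical. The result is only a proxy: the parent control group and the default group use RMIDs too, and a group with tasks assigned is not necessarily generating traffic in a given domain.

```shell
# Count mon_groups under the given control group directory whose tasks file
# is non-empty. The file is read rather than tested with "-s", on the
# assumption that resctrl files may report a size of zero.
count_busy_mon_groups() {
    ctrl_dir=$1
    busy=0
    for tasks in "$ctrl_dir"/mon_groups/*/tasks; do
        [ -e "$tasks" ] || continue         # glob matched nothing
        if [ -n "$(cat "$tasks")" ]; then
            busy=$(( $busy + 1 ))
        fi
    done
    echo "$busy"
}

# Example call for the control group used in the report:
#   count_busy_mon_groups /sys/fs/resctrl/amdtest
```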


>>
>> 2. run stress command and assigned such pid to each mon_groups , (I
>> have run such test on AMD Genoa. cpu 16-23,208-215 is on CCD 8)
>>
>> $ cat stress.sh
>> nohup numactl -C 16-23,208-215 stress -m  1 --vm-hang 1 > /dev/null &
>> lastPid=$!
>> echo $lastPid > /sys/fs/resctrl/amdtest/tasks
>> echo $lastPid > /sys/fs/resctrl/amdtest/mon_groups/test$1/tasks
>> $ for i in `seq 0 99`;do bash stress.sh $i ;done
>>
>> 3. watch the resctrl counter every 10 seconds
>>
>> $ while true ;do cat
>> /sys/fs/resctrl/amdtest/mon_groups/test9/mon_data/mon_L3_08/mbm_local_bytes;sleep
>> 10;done
>>
>> ...
>> Unavailable
>> Unavailable
>> Unavailable
>> 61924495182825856
>> 64176294690029568
>> Unavailable
>> Unavailable
>> Unavailable
>> ...
>>
>> at some point the delta for 2 consecutive reads is out of normal
>> range,  (64176294690029568 - 61924495182825856) / 1024 / 1024 / 1024 /
>> 10 =  209715 Gb/s
>>
>> if I lower the concurrecy to like 59 or lower, the delta is in normal
>> range, and never return Unavailable. I have also tested on amd Rome
>> cpu, the problem still existed.
>> I have try this on intel platform, It does not have such problem, with
>> even over 200+ RMIDs concurrently being monitored.
>>
>> I can not find any documents about max RMID for AMD hardware can
>> concurrently holds, or a explanations for such problems.
>> I believe this could become even severe on AMD with more threads in
>> the future, as we will run more workloads on a single server
>>
>> Can some one help me to solve this problem, thanks
> 
> It looks to me as though you are encountering the issue that is addressed with AMD's
> Assignable Bandwidth Monitoring Counters (ABMC) feature that Babu is currently enabling
> in resctrl [1]. The feature itself is well documented in that series and includes links to
> the AMD spec where you can learn more.
> You show that the "Unavailable" is encountered when reading these counters from user
> space and I deduce from that that resctrl's internal MBM overflow handler (it runs once
> per second) likely encounters the same error with the consequence that overflows of the
> counter are not handled correctly.

Yes. The huge numbers are due to the overflow problem. The kernel assumes
there has been an overflow and adds a big number to account for it in a
subsequent read.
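
A simplified model of that effect, not the kernel's actual code: for a counter that is `width` bits wide, wraparound handling amounts to taking the difference modulo 2^width. The function name wrapped_delta and the 8-bit width are illustrative assumptions. If the hardware counter restarts from a low value because the RMID stopped being tracked, the same arithmetic reports a large bogus advance.

```shell
# Difference between two reads of a counter that is "width" bits wide,
# assuming it wrapped at most once between the reads.
wrapped_delta() {
    prev=$1
    cur=$2
    width=$3
    echo $(( ($cur - $prev) & ((1 << $width) - 1) ))
}

wrapped_delta 250 5 8     # genuine wrap of an 8-bit counter: prints 11
wrapped_delta 200 10 8    # counter restarted and counted 10: prints 66
```

In the second call the true traffic since the restart is 10, but the modular difference is 66, which is the same kind of inflation as the huge byte counts in the report.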

Yes. We are trying to address this in the new hardware, which is mentioned in [1].
> 
> If you do have access to the AMD hardware with this feature, please do take a look at
> the resctrl support for it and try it out. We would all appreciate your feedback to ensure
> resctrl supports it well.
> 
> Reinette 
> 
> [1] https://lore.kernel.org/lkml/cover.1753467772.git.babu.moger@amd.com/
> 
> 

-- 
Thanks
Babu Moger


