* amdgpu multi monitor - clock, heat and power problem
@ 2019-04-08 20:18 Rigo Reddig
2019-04-08 22:58 ` Alex Deucher
0 siblings, 1 reply; 2+ messages in thread
From: Rigo Reddig @ 2019-04-08 20:18 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
I have 2 Gigabyte RX 580s in my desktop workstation.
I'm running Arch Linux with KDE Plasma on the 5.0.6 kernel.
The cards themselves work fine, with one exception.
I have two 1080p HDMI monitors plugged into one of these cards:
one in a native HDMI port, one through a passive DVI->HDMI adapter.
This causes the following problems at idle:
1. Memory clock is effectively locked at 2000 MHz at all times
2. Core clock is constantly in a high-frequency P-state
3. Temperatures are increased
4. Power consumption is increased (significantly)
5. PCI bus is always at full speed
6. Forcing the core clock to 300 MHz still uses a higher-than-usual voltage
Below is an excerpt from the rocm-smi utility for the automatic defaults
(I have omitted overclock and powercap values for formatting purposes)
2 Monitors connected to GPU 0, No monitors connected to GPU 1
ROCm System Management Interface
===============================================================================
GPU  Temp   AvgPwr   SCLK     MCLK     PCLK          Fan    Perf  GPU%
0    44.0c  36.193W  1145Mhz  2000Mhz  8.0GT/s, x16  40.0%  auto  0%
1    37.0c  28.104W  300Mhz   300Mhz   2.5GT/s, x8   0.0%   auto  0%
===============================================================================
End of ROCm SMI Log
GPU 0 is idle and yet running SCLK and MCLK at unnecessary power levels.
GPU 1 is truly idle.
Regarding GPU 0 temperature, I have actually set up a daemon to run the fan at a
consistent rate to prevent it from constantly peaking.
-------------------------------------------------------------------------------
1 Monitor connected to GPU 0, No monitors connected to GPU 1
ROCm System Management Interface
===============================================================================
GPU  Temp   AvgPwr   SCLK    MCLK    PCLK         Fan   Perf  GPU%
0    36.0c  28.103W  300Mhz  300Mhz  2.5GT/s, x8  0.0%  auto  0%
1    37.0c  28.104W  300Mhz  300Mhz  2.5GT/s, x8  0.0%  auto  0%
===============================================================================
2 Monitors connected to GPU 0, No monitors connected to GPU 1
ROCm System Management Interface
===============================================================================
GPU  Temp   AvgPwr   SCLK    MCLK     PCLK         Fan    Perf  GPU%
0    44.0c  31.086W  300Mhz  2000Mhz  2.5GT/s, x8  40.0%  low   0%
1    37.0c  28.104W  300Mhz  300Mhz   2.5GT/s, x8  0.0%   low   0%
===============================================================================
Peculiarly, even with the low power state forced, GPU 0 runs at a voltage
(950 mV) in excess of what is required for 300 MHz (750 mV).
===============================================================================
cat /sys/kernel/debug/dri/0/amdgpu_pm_info
jupiter: Mon Apr 8 21:57:29 2019
Clock Gating Flags Mask: 0x3fbcf
Graphics Medium Grain Clock Gating: On
Graphics Medium Grain memory Light Sleep: On
Graphics Coarse Grain Clock Gating: On
Graphics Coarse Grain memory Light Sleep: On
Graphics Coarse Grain Tree Shader Clock Gating: Off
Graphics Coarse Grain Tree Shader Light Sleep: Off
Graphics Command Processor Light Sleep: On
Graphics Run List Controller Light Sleep: On
Graphics 3D Coarse Grain Clock Gating: Off
Graphics 3D Coarse Grain memory Light Sleep: Off
Memory Controller Light Sleep: On
Memory Controller Medium Grain Clock Gating: On
System Direct Memory Access Light Sleep: Off
System Direct Memory Access Medium Grain Clock Gating: On
Bus Interface Medium Grain Clock Gating: Off
Bus Interface Light Sleep: On
Unified Video Decoder Medium Grain Clock Gating: On
Video Compression Engine Medium Grain Clock Gating: On
Host Data Path Light Sleep: On
Host Data Path Medium Grain Clock Gating: On
Digital Right Management Medium Grain Clock Gating: Off
Digital Right Management Light Sleep: Off
Rom Medium Grain Clock Gating: On
Data Fabric Medium Grain Clock Gating: Off
GFX Clocks and Power:
2000 MHz (MCLK)
300 MHz (SCLK)
600 MHz (PSTATE_SCLK)
1000 MHz (PSTATE_MCLK)
950 mV (VDDGFX)
31.14 W (average GPU)
GPU Temperature: 43 C
GPU Load: 0 %
UVD: Disabled
VCE: Disabled
===============================================================================
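For anyone who wants to check for this condition in a script, here is a minimal Python sketch that parses the "GFX Clocks and Power" lines from the dump above. The sample text is copied from that dump. The function names and the 300 MHz idle threshold are assumptions made for illustration; they are not part of any amdgpu or ROCm tool.

```python
import re

# Sample text copied from the amdgpu_pm_info dump above.
SAMPLE = """\
GFX Clocks and Power:
2000 MHz (MCLK)
300 MHz (SCLK)
600 MHz (PSTATE_SCLK)
1000 MHz (PSTATE_MCLK)
950 mV (VDDGFX)
31.14 W (average GPU)
GPU Temperature: 43 C
GPU Load: 0 %
"""

# "<number> <unit> (<label>)", e.g. "2000 MHz (MCLK)"
_READING = re.compile(r"^\s*([\d.]+)\s*(MHz|mV|W)\s*\((.+?)\)\s*$")
# "GPU Load: <number> %"
_LOAD = re.compile(r"^\s*GPU Load:\s*(\d+)\s*%\s*$")


def parse_pm_info(text):
    """Return {label: (value, unit)} for the clock/power lines and GPU Load."""
    readings = {}
    for line in text.splitlines():
        m = _READING.match(line)
        if m:
            readings[m.group(3)] = (float(m.group(1)), m.group(2))
            continue
        m = _LOAD.match(line)
        if m:
            readings["GPU Load"] = (float(m.group(1)), "%")
    return readings


def idle_but_clocked_up(readings, idle_mclk_mhz=300.0):
    """True when the GPU reports 0 % load but MCLK is above the idle clock.

    idle_mclk_mhz is an assumed threshold taken from the idle rows of the
    rocm-smi tables in this message.
    """
    load = readings["GPU Load"][0]
    mclk = readings["MCLK"][0]
    return load == 0 and mclk > idle_mclk_mhz


if __name__ == "__main__":
    r = parse_pm_info(SAMPLE)
    print(r["MCLK"], r["VDDGFX"], idle_but_clocked_up(r))
```

To run it against a live system, the text could be read from /sys/kernel/debug/dri/0/amdgpu_pm_info instead of SAMPLE (this normally requires root).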
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
* Re: amdgpu multi monitor - clock, heat and power problem
2019-04-08 20:18 amdgpu multi monitor - clock, heat and power problem Rigo Reddig
@ 2019-04-08 22:58 ` Alex Deucher
0 siblings, 0 replies; 2+ messages in thread
From: Alex Deucher @ 2019-04-08 22:58 UTC (permalink / raw)
To: Rigo Reddig; +Cc: amd-gfx list
On Mon, Apr 8, 2019 at 6:50 PM Rigo Reddig <rigo.reddig@gmail.com> wrote:
> Using amdgpu.ppfeaturemask=0xffffffff I am able to work around all of the above issues, but it requires me to manually set idle and performance clock speeds as needed. A 300 MHz memory and core clock drives two 1080p HDMI displays just fine.
>
> But this leads to screen tearing and visible green artefacts when *changing* core clock speeds, as if there is a synchronization issue. When running at a fixed speed, all is well.
>
> The temperatures alone show that power is being wasted.
>
> I have a UPS that can measure system power consumption reasonably accurately (in 16 W steps). At idle with default settings, letting the kernel and GPUs manage things themselves, I sometimes read ~196 W idle power!
>
> 2 Monitors (auto) -> 196W Idle
> 2 Monitors (low) -> 160W Idle
> 2 Monitors (Force 300) -> 112-128W Idle
> 1 monitor -> 96-128W Idle
>
> Even if my UPS isn't giving the exact true values, that delta is concerning.
>
> This is a longstanding issue that has been bugging me for a while now.
> I'm not sure whether it has come up before or why it has persisted for so long.
> But it should really be fixed, as the issue carries quite a large thermal and power burden.
>
> I have tried poking through the source code to figure this out, but had no luck. Have I missed something? Is there a problem synchronizing display VSYNC on clock changes? Why is this happening? It's clearly not the right behaviour.
>
> What can be done to fix this? Can I help?
When multiple monitors are active, mclk dpm is disabled and the mclk
is set to the highest level. This is because mclk switching has to
happen during vblank to avoid artifacts and flickering on the display.
With multiple monitors, the vblank periods don't necessarily overlap,
so mclk cannot be switched without flickering or artifacts. Sclk dpm
should still work, however, and should go to the lowest sclk state
when the GPU is idle, even with multiple monitors.
Alex
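To illustrate the vblank constraint described above, here is a toy Python model. It is not the driver's logic. Each display is reduced to a frame period, a vblank length and a phase offset, and the function looks for an interval where every display is in vblank long enough for a clock switch. All names and all numbers (500 us vblank, 200 us switch time) are hypothetical values chosen only to make the point.

```python
from dataclasses import dataclass


@dataclass
class Display:
    period_us: int   # frame period, e.g. about 16667 us at 60 Hz
    vblank_us: int   # assumed length of the vblank interval per frame
    phase_us: int    # time at which this display's first vblank starts


def vblank_windows(d, horizon_us):
    """List the (start, end) vblank windows that begin before horizon_us."""
    windows = []
    start = d.phase_us
    while start < horizon_us:
        windows.append((start, start + d.vblank_us))
        start += d.period_us
    return windows


def intersect(a, b):
    """Intersect two sorted lists of (start, end) intervals."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def first_common_window(displays, switch_us, horizon_us=100_000):
    """First interval where all displays are in vblank for >= switch_us."""
    common = vblank_windows(displays[0], horizon_us)
    for d in displays[1:]:
        common = intersect(common, vblank_windows(d, horizon_us))
    for lo, hi in common:
        if hi - lo >= switch_us:
            return (lo, hi)
    return None


if __name__ == "__main__":
    one = [Display(16667, 500, 0)]
    two = [Display(16667, 500, 0), Display(16667, 500, 8000)]
    print(first_common_window(one, 200))
    print(first_common_window(two, 200))
```

In this model a single display always offers a usable window, while two displays whose vblanks are out of phase never do, which matches the reason given for pinning mclk at its highest level.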