From: "Christian König" <deathsimple@vodafone.de>
To: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: kexec@lists.infradead.org,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 0/3] drm/radeon kexec fixes
Date: Wed, 11 Sep 2013 11:10:40 +0200
Message-ID: <52303390.3060503@vodafone.de>
In-Reply-To: <20130911090135.GB359@x4>

On 11.09.2013 11:01, Markus Trippelsdorf wrote:
> On 2013.09.09 at 11:38 +0200, Christian König wrote:
>> On 09.09.2013 11:21, Markus Trippelsdorf wrote:
>>> On 2013.09.08 at 17:32 -0700, Eric W. Biederman wrote:
>>>> Markus Trippelsdorf <markus@trippelsdorf.de> writes:
>>>>
>>>>> Here are a couple of patches that get kexec working with radeon devices.
>>>>> I've tested this on my RS780.
>>>>> Comments or flames are welcome.
>>>>> Thanks.
>>>> A couple of high level comments.
>>>>
>>>> This looks promising for the usual case.
>>>>
>>>> Removing the printk at the end of the kexec path seems a little dubious,
>>>> what of other cpus, interrupt handlers, etc.  Basically establishing a
>>>> new rule on when printk is allowed seems a little dubious at this point,
>>>> even if it is a useful debugging trick.
>>> OK. I will drop this patch. It doesn't seem to be necessary, because I
>>> cannot reproduce the printk related hang anymore.
>>>
>>>> Having a clean shutdown of the radeon definitely seems worth doing,
>>>> because the cases where we care about video are when a person is in
>>>> front of the system.
>>> Yes. But please note that even with radeon_pci_shutdown implemented, I
>>> still get ring test failures on roughly every eighth kexec boot:
>>>
>>>    [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
>>>    radeon 0000:01:05.0: disabling GPU acceleration
>>>
>>> That's definitely better than the current state of affairs, with ring
>>> test failures on every second boot. But I haven't figured out the reason
>>> for these failures yet. It's curious that once a ring test failure
>>> occurs, it will reliably fail after each kexec invocation, no matter how
>>> often repeated. Only a reboot brings the machine back to normal.
>> The main problem here is that the AMD gfx hardware doesn't really
>> support being reinitialized once booted (for various reasons). It's a
>> (intended) limitation of the hardware design that you can only
>> initialize certain blocks once every power cycle, so the whole approach
>> actually will never work 100% reliably.
>>
>> All you can hope for is that stopping the hardware while shutting down
>> the old kernel and starting it again results in exactly the same
>> hardware parameters (offsets, clock etc...) otherwise starting the
>> blocks will just fail and you end up with disabled acceleration like above.
>>
>> Sorry, but there isn't much we can do about this,
> I've tested this further and it turned out that if I revert commit
> f5d9b7f0f9 on top of my "drm/radeon: Implement radeon_pci_shutdown"
> patch, the initialization failures seem to go away completely.
>
> Any idea what's going on?

Well, DPM is mostly Alex's domain, but if I had to guess I would say that
the SCLK is gated by the hardware when the driver is unloaded and, since
DPM is initialized only later, is not ungated when the driver loads again.

> Here's the patch:
>
> diff --git a/drivers/gpu/drm/radeon/r600_dpm.c b/drivers/gpu/drm/radeon/r600_dpm.c
> index fa0de46..4e8c1988 100644
> --- a/drivers/gpu/drm/radeon/r600_dpm.c
> +++ b/drivers/gpu/drm/radeon/r600_dpm.c
> @@ -296,9 +296,9 @@ bool r600_dynamicpm_enabled(struct radeon_device *rdev)
>   void r600_enable_sclk_control(struct radeon_device *rdev, bool enable)
>   {
>   	if (enable)
> -		WREG32_P(SCLK_PWRMGT_CNTL, 0, ~SCLK_PWRMGT_OFF);
> +		WREG32_P(GENERAL_PWRMGT, 0, ~SCLK_PWRMGT_OFF);
>   	else
> -		WREG32_P(SCLK_PWRMGT_CNTL, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF);
> +		WREG32_P(GENERAL_PWRMGT, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF);
>   }
>   
>   void r600_enable_mclk_control(struct radeon_device *rdev, bool enable)

The patch just breaks SCLK gating on R6xx again, so no gain here.
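
To make that concrete, here is a minimal user-space sketch of what the
masked write does in each variant. It is not driver code: the register
indices, the bit position of SCLK_PWRMGT_OFF, the fake_regs array and all
function names are assumptions made up for illustration, and wreg32_p()
only mimics the read-modify-write pattern of WREG32_P (keep the bits set
in the mask, take the rest from the value). It also assumes, as stated
above, that the bit that matters for SCLK gating lives in
SCLK_PWRMGT_CNTL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register indices and bit position, for illustration only. */
enum { GENERAL_PWRMGT, SCLK_PWRMGT_CNTL, NUM_REGS };
#define SCLK_PWRMGT_OFF (1u << 0)

/* Plain array standing in for the MMIO register space. */
uint32_t fake_regs[NUM_REGS];

/* Masked write: keep the bits set in mask, take the others from val. */
void wreg32_p(unsigned int reg, uint32_t val, uint32_t mask)
{
	uint32_t tmp = fake_regs[reg];

	tmp &= mask;
	tmp |= val & ~mask;
	fake_regs[reg] = tmp;
}

/* With the revert applied: the bit is toggled in GENERAL_PWRMGT. */
void sclk_control_reverted(int enable)
{
	wreg32_p(GENERAL_PWRMGT, enable ? 0 : SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF);
}

/* Without the revert: the bit is toggled in SCLK_PWRMGT_CNTL. */
void sclk_control_current(int enable)
{
	wreg32_p(SCLK_PWRMGT_CNTL, enable ? 0 : SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF);
}

/* Returns 1 if the OFF bit is still set in SCLK_PWRMGT_CNTL. */
int sclk_off_bit_set(void)
{
	return (fake_regs[SCLK_PWRMGT_CNTL] & SCLK_PWRMGT_OFF) != 0;
}

/* Walks through both variants starting from "SCLK control off". */
void self_check(void)
{
	fake_regs[GENERAL_PWRMGT] = 0;
	fake_regs[SCLK_PWRMGT_CNTL] = SCLK_PWRMGT_OFF | 0xf0;

	sclk_control_reverted(1);
	assert(sclk_off_bit_set());                  /* wrong register: no effect */
	assert(fake_regs[SCLK_PWRMGT_CNTL] == (SCLK_PWRMGT_OFF | 0xf0));

	sclk_control_current(1);
	assert(!sclk_off_bit_set());                 /* bit cleared */
	assert(fake_regs[SCLK_PWRMGT_CNTL] == 0xf0); /* other bits preserved */
}
```

Under those assumptions the reverted variant never touches the bit in
SCLK_PWRMGT_CNTL at all, which would fit with the failures going away
only because SCLK gating is effectively no longer being controlled.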

Christian.

