* [PATCH 0/3] drm/radeon kexec fixes
@ 2013-09-08 12:09 Markus Trippelsdorf
2013-09-08 12:10 ` [PATCH 1/3] kexec: get rid of late printk Markus Trippelsdorf
` (3 more replies)
0 siblings, 4 replies; 19+ messages in thread
From: Markus Trippelsdorf @ 2013-09-08 12:09 UTC (permalink / raw)
To: dri-devel; +Cc: Alex Deucher, kexec, Eric Biederman
Here are a couple of patches that get kexec working with radeon devices.
I've tested this on my RS780.
Comments or flames are welcome.
Thanks.
Markus Trippelsdorf (3):
kexec: get rid of late printk
drm/radeon: Implement radeon_pci_shutdown
drm/radeon: get rid of r100_restore_sanity hack
drivers/gpu/drm/radeon/r100.c | 27 ---------------------------
drivers/gpu/drm/radeon/r300.c | 2 --
drivers/gpu/drm/radeon/r420.c | 2 --
drivers/gpu/drm/radeon/r520.c | 2 --
drivers/gpu/drm/radeon/radeon_asic.h | 1 -
drivers/gpu/drm/radeon/radeon_drv.c | 10 ++++++++++
drivers/gpu/drm/radeon/rs400.c | 2 --
drivers/gpu/drm/radeon/rs600.c | 2 --
drivers/gpu/drm/radeon/rs690.c | 2 --
drivers/gpu/drm/radeon/rv515.c | 2 --
kernel/kexec.c | 1 -
11 files changed, 10 insertions(+), 43 deletions(-)
--
Markus
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
^ permalink raw reply [flat|nested] 19+ messages in thread* [PATCH 1/3] kexec: get rid of late printk 2013-09-08 12:09 [PATCH 0/3] drm/radeon kexec fixes Markus Trippelsdorf @ 2013-09-08 12:10 ` Markus Trippelsdorf 2013-09-08 20:11 ` Daniel Vetter 2013-09-08 12:10 ` [PATCH 2/3] drm/radeon: Implement radeon_pci_shutdown Markus Trippelsdorf ` (2 subsequent siblings) 3 siblings, 1 reply; 19+ messages in thread From: Markus Trippelsdorf @ 2013-09-08 12:10 UTC (permalink / raw) To: dri-devel; +Cc: Alex Deucher, kexec, Eric Biederman kexec calls: printk(KERN_EMERG "Starting new kernel\n"); late before calling machine_shutdown(). However at this point the underlying fb device may have already been shutdown. This causes the kernel to hang. Fix by simply getting rid of the printk call. Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> --- kernel/kexec.c | 1 - 1 file changed, 1 deletion(-) diff --git a/kernel/kexec.c b/kernel/kexec.c index 59f7b55..f33fa9f 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -1679,7 +1679,6 @@ int kernel_kexec(void) #endif { kernel_restart_prepare(NULL); - printk(KERN_EMERG "Starting new kernel\n"); machine_shutdown(); } -- Markus _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 1/3] kexec: get rid of late printk 2013-09-08 12:10 ` [PATCH 1/3] kexec: get rid of late printk Markus Trippelsdorf @ 2013-09-08 20:11 ` Daniel Vetter 2013-09-08 20:42 ` Bruno Prémont 0 siblings, 1 reply; 19+ messages in thread From: Daniel Vetter @ 2013-09-08 20:11 UTC (permalink / raw) To: Markus Trippelsdorf; +Cc: kexec, Eric Biederman, dri-devel On Sun, Sep 8, 2013 at 2:10 PM, Markus Trippelsdorf <markus@trippelsdorf.de> wrote: > kexec calls: > printk(KERN_EMERG "Starting new kernel\n"); > late before calling machine_shutdown(). > However at this point the underlying fb device may have already been > shutdown. This causes the kernel to hang. > Fix by simply getting rid of the printk call. > > Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Shouldn't this be taken care of with the suspend/resume_console calls? At least that's my understanding how it works in the suspend/hibernate code, maybe kexec needs similar treatment ... -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/3] kexec: get rid of late printk 2013-09-08 20:11 ` Daniel Vetter @ 2013-09-08 20:42 ` Bruno Prémont 0 siblings, 0 replies; 19+ messages in thread From: Bruno Prémont @ 2013-09-08 20:42 UTC (permalink / raw) To: Daniel Vetter; +Cc: dri-devel, kexec, Eric Biederman, Markus Trippelsdorf On Sun, 08 September 2013 Daniel Vetter <daniel@ffwll.ch> wrote: > On Sun, Sep 8, 2013 at 2:10 PM, Markus Trippelsdorf > <markus@trippelsdorf.de> wrote: > > kexec calls: > > printk(KERN_EMERG "Starting new kernel\n"); > > late before calling machine_shutdown(). > > However at this point the underlying fb device may have already been > > shutdown. This causes the kernel to hang. > > Fix by simply getting rid of the printk call. > > > > Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> > > Shouldn't this be taken care of with the suspend/resume_console calls? > At least that's my understanding how it works in the suspend/hibernate > code, maybe kexec needs similar treatment ... Is it suspend/resume_console? Shouldn't the fbcon be short-circuited or disabled once there is no more underlying fb? Serial console, if present, as well as netconsole if network device is still alive should continue working I would say. Bruno > -Daniel _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 2/3] drm/radeon: Implement radeon_pci_shutdown 2013-09-08 12:09 [PATCH 0/3] drm/radeon kexec fixes Markus Trippelsdorf 2013-09-08 12:10 ` [PATCH 1/3] kexec: get rid of late printk Markus Trippelsdorf @ 2013-09-08 12:10 ` Markus Trippelsdorf 2013-09-09 13:32 ` Konrad Rzeszutek Wilk 2013-09-08 12:11 ` [PATCH 3/3] drm/radeon: get rid of r100_restore_sanity hack Markus Trippelsdorf 2013-09-09 0:32 ` [PATCH 0/3] drm/radeon kexec fixes Eric W. Biederman 3 siblings, 1 reply; 19+ messages in thread From: Markus Trippelsdorf @ 2013-09-08 12:10 UTC (permalink / raw) To: dri-devel; +Cc: Alex Deucher, kexec, Eric Biederman Currently radeon devices are not properbly shutdown during kexec. This cases a varity of issues, e.g. dpm initialization failures. Fix this by implementing a radeon_pci_shutdown function, that unloads the driver cleanly. Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> --- drivers/gpu/drm/radeon/radeon_drv.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c index cb4445f..d71edee 100644 --- a/drivers/gpu/drm/radeon/radeon_drv.c +++ b/drivers/gpu/drm/radeon/radeon_drv.c @@ -380,6 +380,15 @@ static const struct file_operations radeon_driver_kms_fops = { #endif }; + +static void +radeon_pci_shutdown(struct pci_dev *pdev) +{ + struct drm_device *dev = pci_get_drvdata(pdev); + + radeon_driver_unload_kms(dev); +} + static struct drm_driver kms_driver = { .driver_features = DRIVER_USE_AGP | @@ -453,6 +462,7 @@ static struct pci_driver radeon_kms_pci_driver = { .remove = radeon_pci_remove, .suspend = radeon_pci_suspend, .resume = radeon_pci_resume, + .shutdown = radeon_pci_shutdown, }; static int __init radeon_init(void) -- Markus _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 2/3] drm/radeon: Implement radeon_pci_shutdown 2013-09-08 12:10 ` [PATCH 2/3] drm/radeon: Implement radeon_pci_shutdown Markus Trippelsdorf @ 2013-09-09 13:32 ` Konrad Rzeszutek Wilk 0 siblings, 0 replies; 19+ messages in thread From: Konrad Rzeszutek Wilk @ 2013-09-09 13:32 UTC (permalink / raw) To: Markus Trippelsdorf; +Cc: kexec, Eric Biederman, dri-devel On Sun, Sep 08, 2013 at 02:10:58PM +0200, Markus Trippelsdorf wrote: > Currently radeon devices are not properbly shutdown during kexec. This properly > cases a varity of issues, e.g. dpm initialization failures. > Fix this by implementing a radeon_pci_shutdown function, that unloads > the driver cleanly. > > Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> > --- > drivers/gpu/drm/radeon/radeon_drv.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c > index cb4445f..d71edee 100644 > --- a/drivers/gpu/drm/radeon/radeon_drv.c > +++ b/drivers/gpu/drm/radeon/radeon_drv.c > @@ -380,6 +380,15 @@ static const struct file_operations radeon_driver_kms_fops = { > #endif > }; > > + > +static void > +radeon_pci_shutdown(struct pci_dev *pdev) > +{ > + struct drm_device *dev = pci_get_drvdata(pdev); > + > + radeon_driver_unload_kms(dev); > +} > + > static struct drm_driver kms_driver = { > .driver_features = > DRIVER_USE_AGP | > @@ -453,6 +462,7 @@ static struct pci_driver radeon_kms_pci_driver = { > .remove = radeon_pci_remove, > .suspend = radeon_pci_suspend, > .resume = radeon_pci_resume, > + .shutdown = radeon_pci_shutdown, > }; > > static int __init radeon_init(void) > -- > Markus > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 3/3] drm/radeon: get rid of r100_restore_sanity hack 2013-09-08 12:09 [PATCH 0/3] drm/radeon kexec fixes Markus Trippelsdorf 2013-09-08 12:10 ` [PATCH 1/3] kexec: get rid of late printk Markus Trippelsdorf 2013-09-08 12:10 ` [PATCH 2/3] drm/radeon: Implement radeon_pci_shutdown Markus Trippelsdorf @ 2013-09-08 12:11 ` Markus Trippelsdorf 2013-09-09 0:32 ` [PATCH 0/3] drm/radeon kexec fixes Eric W. Biederman 3 siblings, 0 replies; 19+ messages in thread From: Markus Trippelsdorf @ 2013-09-08 12:11 UTC (permalink / raw) To: dri-devel; +Cc: Alex Deucher, kexec, Eric Biederman Now that radeon devices are properly shutdown during kexec, we can get rid of r100_restore_sanity. Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> --- drivers/gpu/drm/radeon/r100.c | 27 --------------------------- drivers/gpu/drm/radeon/r300.c | 2 -- drivers/gpu/drm/radeon/r420.c | 2 -- drivers/gpu/drm/radeon/r520.c | 2 -- drivers/gpu/drm/radeon/radeon_asic.h | 1 - drivers/gpu/drm/radeon/rs400.c | 2 -- drivers/gpu/drm/radeon/rs600.c | 2 -- drivers/gpu/drm/radeon/rs690.c | 2 -- drivers/gpu/drm/radeon/rv515.c | 2 -- 9 files changed, 42 deletions(-) diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index 9fc61dd..d53dcd8 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -3938,31 +3938,6 @@ void r100_fini(struct radeon_device *rdev) rdev->bios = NULL; } -/* - * Due to how kexec works, it can leave the hw fully initialised when it - * boots the new kernel. However doing our init sequence with the CP and - * WB stuff setup causes GPU hangs on the RN50 at least. So at startup - * do some quick sanity checks and restore sane values to avoid this - * problem. - */ -void r100_restore_sanity(struct radeon_device *rdev) -{ - u32 tmp; - - tmp = RREG32(RADEON_CP_CSQ_CNTL); - if (tmp) { - WREG32(RADEON_CP_CSQ_CNTL, 0); - } - tmp = RREG32(RADEON_CP_RB_CNTL); - if (tmp) { - WREG32(RADEON_CP_RB_CNTL, 0); - } - tmp = RREG32(RADEON_SCRATCH_UMSK); - if (tmp) { - WREG32(RADEON_SCRATCH_UMSK, 0); - } -} - int r100_init(struct radeon_device *rdev) { int r; @@ -3975,8 +3950,6 @@ int r100_init(struct radeon_device *rdev) radeon_scratch_init(rdev); /* Initialize surface registers */ radeon_surface_init(rdev); - /* sanity check some register to avoid hangs like after kexec */ - r100_restore_sanity(rdev); /* TODO: disable VGA need to use VGA request */ /* BIOS*/ if (!radeon_get_bios(rdev)) { diff --git a/drivers/gpu/drm/radeon/r300.c b/drivers/gpu/drm/radeon/r300.c index d8dd269..57ba534 100644 --- a/drivers/gpu/drm/radeon/r300.c +++ b/drivers/gpu/drm/radeon/r300.c @@ -1480,8 +1480,6 @@ int r300_init(struct radeon_device *rdev) /* Initialize surface registers */ radeon_surface_init(rdev); /* TODO: disable VGA need to use VGA request */ - /* restore some register to sane defaults */ - r100_restore_sanity(rdev); /* BIOS*/ if (!radeon_get_bios(rdev)) { if (ASIC_IS_AVIVO(rdev)) diff --git a/drivers/gpu/drm/radeon/r420.c b/drivers/gpu/drm/radeon/r420.c index 4e796ec..9ee3360 100644 --- a/drivers/gpu/drm/radeon/r420.c +++ b/drivers/gpu/drm/radeon/r420.c @@ -371,8 +371,6 @@ int r420_init(struct radeon_device *rdev) /* Initialize surface registers */ radeon_surface_init(rdev); /* TODO: disable VGA need to use VGA request */ - /* restore some register to sane defaults */ - r100_restore_sanity(rdev); /* BIOS*/ if (!radeon_get_bios(rdev)) { if (ASIC_IS_AVIVO(rdev)) diff --git a/drivers/gpu/drm/radeon/r520.c b/drivers/gpu/drm/radeon/r520.c index e1aece7..4709c10 100644 --- a/drivers/gpu/drm/radeon/r520.c +++ b/drivers/gpu/drm/radeon/r520.c @@ -256,8 +256,6 @@ int r520_init(struct radeon_device *rdev) radeon_scratch_init(rdev); /* Initialize surface registers */ radeon_surface_init(rdev); - /* restore some register to sane defaults */ - r100_restore_sanity(rdev); /* TODO: disable VGA need to use VGA request */ /* BIOS*/ if (!radeon_get_bios(rdev)) { diff --git a/drivers/gpu/drm/radeon/radeon_asic.h b/drivers/gpu/drm/radeon/radeon_asic.h index 818bbe6..6eee9e2 100644 --- a/drivers/gpu/drm/radeon/radeon_asic.h +++ b/drivers/gpu/drm/radeon/radeon_asic.h @@ -122,7 +122,6 @@ void r100_mc_resume(struct radeon_device *rdev, struct r100_mc_save *save); void r100_vram_init_sizes(struct radeon_device *rdev); int r100_cp_reset(struct radeon_device *rdev); void r100_vga_render_disable(struct radeon_device *rdev); -void r100_restore_sanity(struct radeon_device *rdev); int r100_cs_track_check_pkt3_indx_buffer(struct radeon_cs_parser *p, struct radeon_cs_packet *pkt, struct radeon_bo *robj); diff --git a/drivers/gpu/drm/radeon/rs400.c b/drivers/gpu/drm/radeon/rs400.c index b8074a8..23bbf89 100644 --- a/drivers/gpu/drm/radeon/rs400.c +++ b/drivers/gpu/drm/radeon/rs400.c @@ -510,8 +510,6 @@ int rs400_init(struct radeon_device *rdev) /* Initialize surface registers */ radeon_surface_init(rdev); /* TODO: disable VGA need to use VGA request */ - /* restore some register to sane defaults */ - r100_restore_sanity(rdev); /* BIOS*/ if (!radeon_get_bios(rdev)) { if (ASIC_IS_AVIVO(rdev)) diff --git a/drivers/gpu/drm/radeon/rs600.c b/drivers/gpu/drm/radeon/rs600.c index 670b555..cfe6f80 100644 --- a/drivers/gpu/drm/radeon/rs600.c +++ b/drivers/gpu/drm/radeon/rs600.c @@ -1018,8 +1018,6 @@ int rs600_init(struct radeon_device *rdev) radeon_scratch_init(rdev); /* Initialize surface registers */ radeon_surface_init(rdev); - /* restore some register to sane defaults */ - r100_restore_sanity(rdev); /* BIOS */ if (!radeon_get_bios(rdev)) { if (ASIC_IS_AVIVO(rdev)) diff --git a/drivers/gpu/drm/radeon/rs690.c b/drivers/gpu/drm/radeon/rs690.c index d8ddfb3..e1f4036 100644 --- a/drivers/gpu/drm/radeon/rs690.c +++ b/drivers/gpu/drm/radeon/rs690.c @@ -789,8 +789,6 @@ int rs690_init(struct radeon_device *rdev) radeon_scratch_init(rdev); /* Initialize surface registers */ radeon_surface_init(rdev); - /* restore some register to sane defaults */ - r100_restore_sanity(rdev); /* TODO: disable VGA need to use VGA request */ /* BIOS*/ if (!radeon_get_bios(rdev)) { diff --git a/drivers/gpu/drm/radeon/rv515.c b/drivers/gpu/drm/radeon/rv515.c index 8ea1573..f9692d7 100644 --- a/drivers/gpu/drm/radeon/rv515.c +++ b/drivers/gpu/drm/radeon/rv515.c @@ -627,8 +627,6 @@ int rv515_init(struct radeon_device *rdev) /* Initialize surface registers */ radeon_surface_init(rdev); /* TODO: disable VGA need to use VGA request */ - /* restore some register to sane defaults */ - r100_restore_sanity(rdev); /* BIOS*/ if (!radeon_get_bios(rdev)) { if (ASIC_IS_AVIVO(rdev)) -- Markus _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] drm/radeon kexec fixes 2013-09-08 12:09 [PATCH 0/3] drm/radeon kexec fixes Markus Trippelsdorf ` (2 preceding siblings ...) 2013-09-08 12:11 ` [PATCH 3/3] drm/radeon: get rid of r100_restore_sanity hack Markus Trippelsdorf @ 2013-09-09 0:32 ` Eric W. Biederman 2013-09-09 9:21 ` Markus Trippelsdorf 3 siblings, 1 reply; 19+ messages in thread From: Eric W. Biederman @ 2013-09-09 0:32 UTC (permalink / raw) To: Markus Trippelsdorf; +Cc: Alex Deucher, kexec, dri-devel Markus Trippelsdorf <markus@trippelsdorf.de> writes: > Here are a couple of patches that get kexec working with radeon devices. > I've tested this on my RS780. > Comments or flames are welcome. > Thanks. A couple of high level comments. This looks promising for the usual case. Removing the printk at the end of the kexec path seems a little dubious, what of other cpus, interrupt handlers, etc. Basically estabilishing a new rule on when printk is allowed seems a little dubious at this point, even if it is a useful debugging trick. Having a clean shutdown of the radeon definitely seems worth doing, because the cases where we care abouty video are when a person is in front of the system. I don't know if you want to remove the sanity checks. They seem cheap and safe regardless. Are they expensive or ineffective? Moreover if they work a reasonable amount of the time that means that the kexec on panic case (where we don't shut anything down) can actually use the video, and that in general the driver will be more robust. I don't expect anyone much cares as kexec on panic is mostly used to just write a core file to the network, or the local disk. But if it is easy to keep that case working most of the time, why not. Eric > Markus Trippelsdorf (3): > kexec: get rid of late printk > drm/radeon: Implement radeon_pci_shutdown > drm/radeon: get rid of r100_restore_sanity hack > > drivers/gpu/drm/radeon/r100.c | 27 --------------------------- > drivers/gpu/drm/radeon/r300.c | 2 -- > drivers/gpu/drm/radeon/r420.c | 2 -- > drivers/gpu/drm/radeon/r520.c | 2 -- > drivers/gpu/drm/radeon/radeon_asic.h | 1 - > drivers/gpu/drm/radeon/radeon_drv.c | 10 ++++++++++ > drivers/gpu/drm/radeon/rs400.c | 2 -- > drivers/gpu/drm/radeon/rs600.c | 2 -- > drivers/gpu/drm/radeon/rs690.c | 2 -- > drivers/gpu/drm/radeon/rv515.c | 2 -- > kernel/kexec.c | 1 - > 11 files changed, 10 insertions(+), 43 deletions(-) _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] drm/radeon kexec fixes 2013-09-09 0:32 ` [PATCH 0/3] drm/radeon kexec fixes Eric W. Biederman @ 2013-09-09 9:21 ` Markus Trippelsdorf 2013-09-09 9:38 ` Christian König 2013-09-09 13:04 ` Alex Deucher 0 siblings, 2 replies; 19+ messages in thread From: Markus Trippelsdorf @ 2013-09-09 9:21 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Alex Deucher, kexec, dri-devel On 2013.09.08 at 17:32 -0700, Eric W. Biederman wrote: > Markus Trippelsdorf <markus@trippelsdorf.de> writes: > > > Here are a couple of patches that get kexec working with radeon devices. > > I've tested this on my RS780. > > Comments or flames are welcome. > > Thanks. > > A couple of high level comments. > > This looks promising for the usual case. > > Removing the printk at the end of the kexec path seems a little dubious, > what of other cpus, interrupt handlers, etc. Basically estabilishing a > new rule on when printk is allowed seems a little dubious at this point, > even if it is a useful debugging trick. OK. I will drop this patch. It doesn't seem to be necessary, because I cannot reproduce the printk related hang anymore. > Having a clean shutdown of the radeon definitely seems worth doing, > because the cases where we care abouty video are when a person is in > front of the system. Yes. But please note that even with radeon_pci_shutdown implemented, I still get ring test failures on roughly every eighth kexec boot: [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD) radeon 0000:01:05.0: disabling GPU acceleration That's definitely better than the current state of affairs, with ring test failures on every second boot. But I haven't figured out the reason for these failures yet. It's curious that once a ring test failure occurs, it will reliably fail after each kexec invocation, no matter how often repeated. Only a reboot brings the machine back to normal. > I don't know if you want to remove the sanity checks. They seem cheap > and safe regardless. Are they expensive or ineffective? Moreover if > they work a reasonable amount of the time that means that the kexec on > panic case (where we don't shut anything down) can actually use the > video, and that in general the driver will be more robust. I don't > expect anyone much cares as kexec on panic is mostly used to just write > a core file to the network, or the local disk. But if it is easy to > keep that case working most of the time, why not. IIRC Alex said the sanity checks are expensive and boot-time could be improved by dropping them. Maybe he can chime in? -- Markus _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] drm/radeon kexec fixes 2013-09-09 9:21 ` Markus Trippelsdorf @ 2013-09-09 9:38 ` Christian König 2013-09-11 9:01 ` Markus Trippelsdorf 2013-09-09 13:04 ` Alex Deucher 1 sibling, 1 reply; 19+ messages in thread From: Christian König @ 2013-09-09 9:38 UTC (permalink / raw) To: Markus Trippelsdorf; +Cc: kexec, Eric W. Biederman, dri-devel Am 09.09.2013 11:21, schrieb Markus Trippelsdorf: > On 2013.09.08 at 17:32 -0700, Eric W. Biederman wrote: >> Markus Trippelsdorf <markus@trippelsdorf.de> writes: >> >>> Here are a couple of patches that get kexec working with radeon devices. >>> I've tested this on my RS780. >>> Comments or flames are welcome. >>> Thanks. >> A couple of high level comments. >> >> This looks promising for the usual case. >> >> Removing the printk at the end of the kexec path seems a little dubious, >> what of other cpus, interrupt handlers, etc. Basically estabilishing a >> new rule on when printk is allowed seems a little dubious at this point, >> even if it is a useful debugging trick. > OK. I will drop this patch. It doesn't seem to be necessary, because I > cannot reproduce the printk related hang anymore. > >> Having a clean shutdown of the radeon definitely seems worth doing, >> because the cases where we care abouty video are when a person is in >> front of the system. > Yes. But please note that even with radeon_pci_shutdown implemented, I > still get ring test failures on roughly every eighth kexec boot: > > [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD) > radeon 0000:01:05.0: disabling GPU acceleration > > That's definitely better than the current state of affairs, with ring > test failures on every second boot. But I haven't figured out the reason > for these failures yet. It's curious that once a ring test failure > occurs, it will reliably fail after each kexec invocation, no matter how > often repeated. Only a reboot brings the machine back to normal. The main problem here is that the AMD gfx hardware doesn't really support being reinitialized once booted (for various reasons). It's a (intended) limitation of the hardware design that you can only initialize certain blocks once every power cycle, so the whole approach actually will never work 100% reliable. All you can hope for is that stopping the hardware while shutting down the old kernel and starting it again results in exactly the same hardware parameters (offsets, clock etc...) otherwise starting the blocks will just fail and you end up with disabled acceleration like above. Sorry, but there isn't much we can do about this, Christian. >> I don't know if you want to remove the sanity checks. They seem cheap >> and safe regardless. Are they expensive or ineffective? Moreover if >> they work a reasonable amount of the time that means that the kexec on >> panic case (where we don't shut anything down) can actually use the >> video, and that in general the driver will be more robust. I don't >> expect anyone much cares as kexec on panic is mostly used to just write >> a core file to the network, or the local disk. But if it is easy to >> keep that case working most of the time, why not. > IIRC Alex said the sanity checks are expensive and boot-time could be > improved by dropping them. Maybe he can chime in? > _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] drm/radeon kexec fixes 2013-09-09 9:38 ` Christian König @ 2013-09-11 9:01 ` Markus Trippelsdorf 2013-09-11 9:10 ` Christian König 2013-09-11 13:30 ` Alex Deucher 0 siblings, 2 replies; 19+ messages in thread From: Markus Trippelsdorf @ 2013-09-11 9:01 UTC (permalink / raw) To: Christian König; +Cc: kexec, Eric W. Biederman, dri-devel On 2013.09.09 at 11:38 +0200, Christian König wrote: > Am 09.09.2013 11:21, schrieb Markus Trippelsdorf: > > On 2013.09.08 at 17:32 -0700, Eric W. Biederman wrote: > >> Markus Trippelsdorf <markus@trippelsdorf.de> writes: > >> > >>> Here are a couple of patches that get kexec working with radeon devices. > >>> I've tested this on my RS780. > >>> Comments or flames are welcome. > >>> Thanks. > >> A couple of high level comments. > >> > >> This looks promising for the usual case. > >> > >> Removing the printk at the end of the kexec path seems a little dubious, > >> what of other cpus, interrupt handlers, etc. Basically estabilishing a > >> new rule on when printk is allowed seems a little dubious at this point, > >> even if it is a useful debugging trick. > > OK. I will drop this patch. It doesn't seem to be necessary, because I > > cannot reproduce the printk related hang anymore. > > > >> Having a clean shutdown of the radeon definitely seems worth doing, > >> because the cases where we care abouty video are when a person is in > >> front of the system. > > Yes. But please note that even with radeon_pci_shutdown implemented, I > > still get ring test failures on roughly every eighth kexec boot: > > > > [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD) > > radeon 0000:01:05.0: disabling GPU acceleration > > > > That's definitely better than the current state of affairs, with ring > > test failures on every second boot. But I haven't figured out the reason > > for these failures yet. It's curious that once a ring test failure > > occurs, it will reliably fail after each kexec invocation, no matter how > > often repeated. Only a reboot brings the machine back to normal. > > The main problem here is that the AMD gfx hardware doesn't really > support being reinitialized once booted (for various reasons). It's a > (intended) limitation of the hardware design that you can only > initialize certain blocks once every power cycle, so the whole approach > actually will never work 100% reliable. > > All you can hope for is that stopping the hardware while shutting down > the old kernel and starting it again results in exactly the same > hardware parameters (offsets, clock etc...) otherwise starting the > blocks will just fail and you end up with disabled acceleration like above. > > Sorry, but there isn't much we can do about this, I've tested this further and it turned out that if I revert commit f5d9b7f0f9 on top of my "drm/radeon: Implement radeon_pci_shutdown" patch, the initialization failures seem to go away completely. Any idea what's going on? Here's the patch: diff --git a/drivers/gpu/drm/radeon/r600_dpm.c b/drivers/gpu/drm/radeon/r600_dpm.c index fa0de46..4e8c1988 100644 --- a/drivers/gpu/drm/radeon/r600_dpm.c +++ b/drivers/gpu/drm/radeon/r600_dpm.c @@ -296,9 +296,9 @@ bool r600_dynamicpm_enabled(struct radeon_device *rdev) void r600_enable_sclk_control(struct radeon_device *rdev, bool enable) { if (enable) - WREG32_P(SCLK_PWRMGT_CNTL, 0, ~SCLK_PWRMGT_OFF); + WREG32_P(GENERAL_PWRMGT, 0, ~SCLK_PWRMGT_OFF); else - WREG32_P(SCLK_PWRMGT_CNTL, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF); + WREG32_P(GENERAL_PWRMGT, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF); } void r600_enable_mclk_control(struct radeon_device *rdev, bool enable) -- Markus _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] drm/radeon kexec fixes 2013-09-11 9:01 ` Markus Trippelsdorf @ 2013-09-11 9:10 ` Christian König 2013-09-11 13:30 ` Alex Deucher 1 sibling, 0 replies; 19+ messages in thread From: Christian König @ 2013-09-11 9:10 UTC (permalink / raw) To: Markus Trippelsdorf; +Cc: kexec, Eric W. Biederman, dri-devel Am 11.09.2013 11:01, schrieb Markus Trippelsdorf: > On 2013.09.09 at 11:38 +0200, Christian König wrote: >> Am 09.09.2013 11:21, schrieb Markus Trippelsdorf: >>> On 2013.09.08 at 17:32 -0700, Eric W. Biederman wrote: >>>> Markus Trippelsdorf <markus@trippelsdorf.de> writes: >>>> >>>>> Here are a couple of patches that get kexec working with radeon devices. >>>>> I've tested this on my RS780. >>>>> Comments or flames are welcome. >>>>> Thanks. >>>> A couple of high level comments. >>>> >>>> This looks promising for the usual case. >>>> >>>> Removing the printk at the end of the kexec path seems a little dubious, >>>> what of other cpus, interrupt handlers, etc. Basically estabilishing a >>>> new rule on when printk is allowed seems a little dubious at this point, >>>> even if it is a useful debugging trick. >>> OK. I will drop this patch. It doesn't seem to be necessary, because I >>> cannot reproduce the printk related hang anymore. >>> >>>> Having a clean shutdown of the radeon definitely seems worth doing, >>>> because the cases where we care abouty video are when a person is in >>>> front of the system. >>> Yes. But please note that even with radeon_pci_shutdown implemented, I >>> still get ring test failures on roughly every eighth kexec boot: >>> >>> [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD) >>> radeon 0000:01:05.0: disabling GPU acceleration >>> >>> That's definitely better than the current state of affairs, with ring >>> test failures on every second boot. But I haven't figured out the reason >>> for these failures yet. It's curious that once a ring test failure >>> occurs, it will reliably fail after each kexec invocation, no matter how >>> often repeated. Only a reboot brings the machine back to normal. >> The main problem here is that the AMD gfx hardware doesn't really >> support being reinitialized once booted (for various reasons). It's a >> (intended) limitation of the hardware design that you can only >> initialize certain blocks once every power cycle, so the whole approach >> actually will never work 100% reliable. >> >> All you can hope for is that stopping the hardware while shutting down >> the old kernel and starting it again results in exactly the same >> hardware parameters (offsets, clock etc...) otherwise starting the >> blocks will just fail and you end up with disabled acceleration like above. >> >> Sorry, but there isn't much we can do about this, > I've tested this further and it turned out that if I revert commit > f5d9b7f0f9 on top of my "drm/radeon: Implement radeon_pci_shutdown" > patch, the initialization failures seem to go away completely. > > Any idea what's going on? Well DPM is mostly Alex domain, but if I have to guess I would say that the SCLK is gated by the hardware when the driver is unloaded and since DPM initialized only later not ungated when the driver loads again. > Here's the patch: > > diff --git a/drivers/gpu/drm/radeon/r600_dpm.c b/drivers/gpu/drm/radeon/r600_dpm.c > index fa0de46..4e8c1988 100644 > --- a/drivers/gpu/drm/radeon/r600_dpm.c > +++ b/drivers/gpu/drm/radeon/r600_dpm.c > @@ -296,9 +296,9 @@ bool r600_dynamicpm_enabled(struct radeon_device *rdev) > void r600_enable_sclk_control(struct radeon_device *rdev, bool enable) > { > if (enable) > - WREG32_P(SCLK_PWRMGT_CNTL, 0, ~SCLK_PWRMGT_OFF); > + WREG32_P(GENERAL_PWRMGT, 0, ~SCLK_PWRMGT_OFF); > else > - WREG32_P(SCLK_PWRMGT_CNTL, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF); > + WREG32_P(GENERAL_PWRMGT, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF); > } > > void r600_enable_mclk_control(struct radeon_device *rdev, bool enable) The patch just breaks SCLK gating on R6xx again, so no gain here. Christian. _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] drm/radeon kexec fixes 2013-09-11 9:01 ` Markus Trippelsdorf 2013-09-11 9:10 ` Christian König @ 2013-09-11 13:30 ` Alex Deucher 1 sibling, 0 replies; 19+ messages in thread From: Alex Deucher @ 2013-09-11 13:30 UTC (permalink / raw) To: Markus Trippelsdorf Cc: Christian König, kexec, Eric W. Biederman, Maling list - DRI developers On Wed, Sep 11, 2013 at 5:01 AM, Markus Trippelsdorf <markus@trippelsdorf.de> wrote: > On 2013.09.09 at 11:38 +0200, Christian König wrote: >> Am 09.09.2013 11:21, schrieb Markus Trippelsdorf: >> > On 2013.09.08 at 17:32 -0700, Eric W. Biederman wrote: >> >> Markus Trippelsdorf <markus@trippelsdorf.de> writes: >> >> >> >>> Here are a couple of patches that get kexec working with radeon devices. >> >>> I've tested this on my RS780. >> >>> Comments or flames are welcome. >> >>> Thanks. >> >> A couple of high level comments. >> >> >> >> This looks promising for the usual case. >> >> >> >> Removing the printk at the end of the kexec path seems a little dubious, >> >> what of other cpus, interrupt handlers, etc. Basically estabilishing a >> >> new rule on when printk is allowed seems a little dubious at this point, >> >> even if it is a useful debugging trick. >> > OK. I will drop this patch. It doesn't seem to be necessary, because I >> > cannot reproduce the printk related hang anymore. >> > >> >> Having a clean shutdown of the radeon definitely seems worth doing, >> >> because the cases where we care abouty video are when a person is in >> >> front of the system. >> > Yes. But please note that even with radeon_pci_shutdown implemented, I >> > still get ring test failures on roughly every eighth kexec boot: >> > >> > [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD) >> > radeon 0000:01:05.0: disabling GPU acceleration >> > >> > That's definitely better than the current state of affairs, with ring >> > test failures on every second boot. But I haven't figured out the reason >> > for these failures yet. It's curious that once a ring test failure >> > occurs, it will reliably fail after each kexec invocation, no matter how >> > often repeated. Only a reboot brings the machine back to normal. >> >> The main problem here is that the AMD gfx hardware doesn't really >> support being reinitialized once booted (for various reasons). It's a >> (intended) limitation of the hardware design that you can only >> initialize certain blocks once every power cycle, so the whole approach >> actually will never work 100% reliable. >> >> All you can hope for is that stopping the hardware while shutting down >> the old kernel and starting it again results in exactly the same >> hardware parameters (offsets, clock etc...) otherwise starting the >> blocks will just fail and you end up with disabled acceleration like above. >> >> Sorry, but there isn't much we can do about this, > > I've tested this further and it turned out that if I revert commit > f5d9b7f0f9 on top of my "drm/radeon: Implement radeon_pci_shutdown" > patch, the initialization failures seem to go away completely. > > Any idea what's going on? > You are disabling dynamic power management with that patch reverted. The patch fixed a copy paste typo in the register. Bit 0 (SCLK_PWRMGT_OFF) of register SCLK_PWRMGT_CNTL controls whether dynamic engine clock control is enabled. Bit 0 (GLOBAL_PWRMGT_EN) of register GENERAL_PWRMGT controls whether dynamic power management (dynamic engine/memory/voltage, controls etc.) is enabled at all. Alex > Here's the patch: > > diff --git a/drivers/gpu/drm/radeon/r600_dpm.c b/drivers/gpu/drm/radeon/r600_dpm.c > index fa0de46..4e8c1988 100644 > --- a/drivers/gpu/drm/radeon/r600_dpm.c > +++ b/drivers/gpu/drm/radeon/r600_dpm.c > @@ -296,9 +296,9 @@ bool r600_dynamicpm_enabled(struct radeon_device *rdev) > void r600_enable_sclk_control(struct radeon_device *rdev, bool enable) > { > if (enable) > - WREG32_P(SCLK_PWRMGT_CNTL, 0, ~SCLK_PWRMGT_OFF); > + WREG32_P(GENERAL_PWRMGT, 0, ~SCLK_PWRMGT_OFF); > else > - WREG32_P(SCLK_PWRMGT_CNTL, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF); > + WREG32_P(GENERAL_PWRMGT, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF); > } > > void r600_enable_mclk_control(struct radeon_device *rdev, bool enable) > > > -- > Markus > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] drm/radeon kexec fixes 2013-09-09 9:21 ` Markus Trippelsdorf 2013-09-09 9:38 ` Christian König @ 2013-09-09 13:04 ` Alex Deucher 2013-09-10 18:27 ` Eric W. Biederman 1 sibling, 1 reply; 19+ messages in thread From: Alex Deucher @ 2013-09-09 13:04 UTC (permalink / raw) To: Markus Trippelsdorf Cc: kexec, Eric W. Biederman, Maling list - DRI developers On Mon, Sep 9, 2013 at 5:21 AM, Markus Trippelsdorf <markus@trippelsdorf.de> wrote: > On 2013.09.08 at 17:32 -0700, Eric W. Biederman wrote: >> Markus Trippelsdorf <markus@trippelsdorf.de> writes: >> >> > Here are a couple of patches that get kexec working with radeon devices. >> > I've tested this on my RS780. >> > Comments or flames are welcome. >> > Thanks. >> >> A couple of high level comments. >> >> This looks promising for the usual case. >> >> Removing the printk at the end of the kexec path seems a little dubious, >> what of other cpus, interrupt handlers, etc. Basically estabilishing a >> new rule on when printk is allowed seems a little dubious at this point, >> even if it is a useful debugging trick. > > OK. I will drop this patch. It doesn't seem to be necessary, because I > cannot reproduce the printk related hang anymore. > >> Having a clean shutdown of the radeon definitely seems worth doing, >> because the cases where we care abouty video are when a person is in >> front of the system. > > Yes. But please note that even with radeon_pci_shutdown implemented, I > still get ring test failures on roughly every eighth kexec boot: > > [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD) > radeon 0000:01:05.0: disabling GPU acceleration > > That's definitely better than the current state of affairs, with ring > test failures on every second boot. But I haven't figured out the reason > for these failures yet. It's curious that once a ring test failure > occurs, it will reliably fail after each kexec invocation, no matter how > often repeated. Only a reboot brings the machine back to normal. > >> I don't know if you want to remove the sanity checks. They seem cheap >> and safe regardless. Are they expensive or ineffective? Moreover if >> they work a reasonable amount of the time that means that the kexec on >> panic case (where we don't shut anything down) can actually use the >> video, and that in general the driver will be more robust. I don't >> expect anyone much cares as kexec on panic is mostly used to just write >> a core file to the network, or the local disk. But if it is easy to >> keep that case working most of the time, why not. > > IIRC Alex said the sanity checks are expensive and boot-time could be > improved by dropping them. Maybe he can chime in? They shouldn't be necessary with a proper shutdown, but in this particular case, they are not very expensive. What is expensive is having a separate sanity check functions for all the various hw blocks to teardown everything on startup prior to starting it up in case kexec, etc. left the system in a bad state. It ends up amounting to a full tear down sequence followed by a full start up sequence every time you load the driver. I can't really comment on the first patch, but the rest seem fine. Alex _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] drm/radeon kexec fixes 2013-09-09 13:04 ` Alex Deucher @ 2013-09-10 18:27 ` Eric W. Biederman 2013-09-10 20:40 ` Alex Deucher 0 siblings, 1 reply; 19+ messages in thread From: Eric W. Biederman @ 2013-09-10 18:27 UTC (permalink / raw) To: Alex Deucher Cc: Christian König, kexec, Maling list - DRI developers, Markus Trippelsdorf Alex Deucher <alexdeucher@gmail.com> writes: > On Mon, Sep 9, 2013 at 5:21 AM, Markus Trippelsdorf > <markus@trippelsdorf.de> wrote: > >> IIRC Alex said the sanity checks are expensive and boot-time could be >> improved by dropping them. Maybe he can chime in? > > They shouldn't be necessary with a proper shutdown, but in this > particular case, they are not very expensive. What is expensive is > having a separate sanity check functions for all the various hw blocks > to teardown everything on startup prior to starting it up in case > kexec, etc. left the system in a bad state. It ends up amounting to a > full tear down sequence followed by a full start up sequence every > time you load the driver. > > I can't really comment on the first patch, but the rest seem fine. Let me reask the question just a little bit. Is it the sanity checks that are expensive? Or is it the reinitialization that is triggered by the sanity checks that is expensive? From what Christian said in the other reply it sounds like this is a game we will never completely win, but it would be nice to have half a chance in the kexec on panic case to have video. So I am curious to know if the checks are expensive when we are coming at hardware in a clean state. Eric _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] drm/radeon kexec fixes 2013-09-10 18:27 ` Eric W. Biederman @ 2013-09-10 20:40 ` Alex Deucher 2013-09-11 8:53 ` Markus Trippelsdorf 0 siblings, 1 reply; 19+ messages in thread From: Alex Deucher @ 2013-09-10 20:40 UTC (permalink / raw) To: Eric W. Biederman Cc: Christian König, kexec, Maling list - DRI developers, Markus Trippelsdorf On Tue, Sep 10, 2013 at 2:27 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Alex Deucher <alexdeucher@gmail.com> writes: > >> On Mon, Sep 9, 2013 at 5:21 AM, Markus Trippelsdorf >> <markus@trippelsdorf.de> wrote: >> >>> IIRC Alex said the sanity checks are expensive and boot-time could be >>> improved by dropping them. Maybe he can chime in? >> >> They shouldn't be necessary with a proper shutdown, but in this >> particular case, they are not very expensive. What is expensive is >> having a separate sanity check functions for all the various hw blocks >> to teardown everything on startup prior to starting it up in case >> kexec, etc. left the system in a bad state. It ends up amounting to a >> full tear down sequence followed by a full start up sequence every >> time you load the driver. >> >> I can't really comment on the first patch, but the rest seem fine. > > Let me reask the question just a little bit. > > Is it the sanity checks that are expensive? Or is it the > reinitialization that is triggered by the sanity checks that is > expensive? > > From what Christian said in the other reply it sounds like this is a > game we will never completely win, but it would be nice to have half a > chance in the kexec on panic case to have video. So I am curious to > know if the checks are expensive when we are coming at hardware in a > clean state. The particular sanity checks from this patch set are not expensive, but we had previously discussed more extensive sanity checks for other aspects of the chips in prior conversations. Prior to this patch set, the hw is not torn down properly (might have been in the middle of DMA for example) when kexec happens. That's why the sanity checks were added in the first place. With this patch set, the sanity checks shouldn't be necessary. Alex _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] drm/radeon kexec fixes 2013-09-10 20:40 ` Alex Deucher @ 2013-09-11 8:53 ` Markus Trippelsdorf 2013-09-11 9:21 ` Christian König 2013-09-11 13:40 ` Alex Deucher 0 siblings, 2 replies; 19+ messages in thread From: Markus Trippelsdorf @ 2013-09-11 8:53 UTC (permalink / raw) To: Alex Deucher Cc: Christian König, kexec, Eric W. Biederman, Maling list - DRI developers On 2013.09.10 at 16:40 -0400, Alex Deucher wrote: > On Tue, Sep 10, 2013 at 2:27 PM, Eric W. Biederman > <ebiederm@xmission.com> wrote: > > Alex Deucher <alexdeucher@gmail.com> writes: > > > >> On Mon, Sep 9, 2013 at 5:21 AM, Markus Trippelsdorf > >> <markus@trippelsdorf.de> wrote: > >> > >>> IIRC Alex said the sanity checks are expensive and boot-time could be > >>> improved by dropping them. Maybe he can chime in? > >> > >> They shouldn't be necessary with a proper shutdown, but in this > >> particular case, they are not very expensive. What is expensive is > >> having a separate sanity check functions for all the various hw blocks > >> to teardown everything on startup prior to starting it up in case > >> kexec, etc. left the system in a bad state. It ends up amounting to a > >> full tear down sequence followed by a full start up sequence every > >> time you load the driver. > >> > >> I can't really comment on the first patch, but the rest seem fine. > > > > Let me reask the question just a little bit. > > > > Is it the sanity checks that are expensive? Or is it the > > reinitialization that is triggered by the sanity checks that is > > expensive? > > > > From what Christian said in the other reply it sounds like this is a > > game we will never completely win, but it would be nice to have half a > > chance in the kexec on panic case to have video. So I am curious to > > know if the checks are expensive when we are coming at hardware in a > > clean state. > > The particular sanity checks from this patch set are not expensive, > but we had previously discussed more extensive sanity checks for other > aspects of the chips in prior conversations. Prior to this patch set, > the hw is not torn down properly (might have been in the middle of DMA > for example) when kexec happens. That's why the sanity checks were > added in the first place. With this patch set, the sanity checks > shouldn't be necessary. I think you're talking past each other. What Eric worries about is the »kexec on panic« case, where the shutdown method *isn't* called. In this case the sanity checks, that are only superfluous when the hardware was shutdown normally during kexec (the default case), may actually help. And because the checks aren't expensive, it wouldn't hurt to just leave them in place. -- Markus _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] drm/radeon kexec fixes 2013-09-11 8:53 ` Markus Trippelsdorf @ 2013-09-11 9:21 ` Christian König 2013-09-11 13:40 ` Alex Deucher 1 sibling, 0 replies; 19+ messages in thread From: Christian König @ 2013-09-11 9:21 UTC (permalink / raw) To: Markus Trippelsdorf Cc: Alex Deucher, kexec, Eric W. Biederman, Maling list - DRI developers Am 11.09.2013 10:53, schrieb Markus Trippelsdorf: > On 2013.09.10 at 16:40 -0400, Alex Deucher wrote: >> On Tue, Sep 10, 2013 at 2:27 PM, Eric W. Biederman >> <ebiederm@xmission.com> wrote: >>> Alex Deucher <alexdeucher@gmail.com> writes: >>> >>>> On Mon, Sep 9, 2013 at 5:21 AM, Markus Trippelsdorf >>>> <markus@trippelsdorf.de> wrote: >>>> >>>>> IIRC Alex said the sanity checks are expensive and boot-time could be >>>>> improved by dropping them. Maybe he can chime in? >>>> They shouldn't be necessary with a proper shutdown, but in this >>>> particular case, they are not very expensive. What is expensive is >>>> having a separate sanity check functions for all the various hw blocks >>>> to teardown everything on startup prior to starting it up in case >>>> kexec, etc. left the system in a bad state. It ends up amounting to a >>>> full tear down sequence followed by a full start up sequence every >>>> time you load the driver. >>>> >>>> I can't really comment on the first patch, but the rest seem fine. >>> Let me reask the question just a little bit. >>> >>> Is it the sanity checks that are expensive? Or is it the >>> reinitialization that is triggered by the sanity checks that is >>> expensive? >>> >>> From what Christian said in the other reply it sounds like this is a >>> game we will never completely win, but it would be nice to have half a >>> chance in the kexec on panic case to have video. So I am curious to >>> know if the checks are expensive when we are coming at hardware in a >>> clean state. >> The particular sanity checks from this patch set are not expensive, >> but we had previously discussed more extensive sanity checks for other >> aspects of the chips in prior conversations. Prior to this patch set, >> the hw is not torn down properly (might have been in the middle of DMA >> for example) when kexec happens. That's why the sanity checks were >> added in the first place. With this patch set, the sanity checks >> shouldn't be necessary. > I think you're talking past each other. > What Eric worries about is the »kexec on panic« case, where the shutdown > method *isn't* called. In this case the sanity checks, that are only > superfluous when the hardware was shutdown normally during kexec (the > default case), may actually help. And because the checks aren't > expensive, it wouldn't hurt to just leave them in place. When we don't know the state the hardware is in it is really hard to get into a working configuration again, even with the sanity checks in place. For example you can't just cut power or reset the UVD block without knowing it's state, cause then you have a good chance that you hit into the middle of a register or memory access of the VCPU. This usually results in the next few registers accesses only return either 0x0 or 0xffffffff and after that the system completely locks up. Isn't it possible to know that we are in a "kexec after panic" case and only try to initialize the display hardware and not all the other blocks? That would at least allow us to get an error message of what happened on the screen and otherwise advise the user to do a clean reset. Christian. _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 0/3] drm/radeon kexec fixes 2013-09-11 8:53 ` Markus Trippelsdorf 2013-09-11 9:21 ` Christian König @ 2013-09-11 13:40 ` Alex Deucher 1 sibling, 0 replies; 19+ messages in thread From: Alex Deucher @ 2013-09-11 13:40 UTC (permalink / raw) To: Markus Trippelsdorf Cc: Christian König, kexec, Eric W. Biederman, Maling list - DRI developers On Wed, Sep 11, 2013 at 4:53 AM, Markus Trippelsdorf <markus@trippelsdorf.de> wrote: > On 2013.09.10 at 16:40 -0400, Alex Deucher wrote: >> On Tue, Sep 10, 2013 at 2:27 PM, Eric W. Biederman >> <ebiederm@xmission.com> wrote: >> > Alex Deucher <alexdeucher@gmail.com> writes: >> > >> >> On Mon, Sep 9, 2013 at 5:21 AM, Markus Trippelsdorf >> >> <markus@trippelsdorf.de> wrote: >> >> >> >>> IIRC Alex said the sanity checks are expensive and boot-time could be >> >>> improved by dropping them. Maybe he can chime in? >> >> >> >> They shouldn't be necessary with a proper shutdown, but in this >> >> particular case, they are not very expensive. What is expensive is >> >> having a separate sanity check functions for all the various hw blocks >> >> to teardown everything on startup prior to starting it up in case >> >> kexec, etc. left the system in a bad state. It ends up amounting to a >> >> full tear down sequence followed by a full start up sequence every >> >> time you load the driver. >> >> >> >> I can't really comment on the first patch, but the rest seem fine. >> > >> > Let me reask the question just a little bit. >> > >> > Is it the sanity checks that are expensive? Or is it the >> > reinitialization that is triggered by the sanity checks that is >> > expensive? >> > >> > From what Christian said in the other reply it sounds like this is a >> > game we will never completely win, but it would be nice to have half a >> > chance in the kexec on panic case to have video. So I am curious to >> > know if the checks are expensive when we are coming at hardware in a >> > clean state. >> >> The particular sanity checks from this patch set are not expensive, >> but we had previously discussed more extensive sanity checks for other >> aspects of the chips in prior conversations. Prior to this patch set, >> the hw is not torn down properly (might have been in the middle of DMA >> for example) when kexec happens. That's why the sanity checks were >> added in the first place. With this patch set, the sanity checks >> shouldn't be necessary. > > I think you're talking past each other. > What Eric worries about is the »kexec on panic« case, where the shutdown > method *isn't* called. In this case the sanity checks, that are only > superfluous when the hardware was shutdown normally during kexec (the > default case), may actually help. And because the checks aren't > expensive, it wouldn't hurt to just leave them in place. For the particular sanity check in this patch set, that's fine. Even with that in place, it only covers a small subset of GPUs and it doesn't cover all possible problematic areas. In general kexec on panic seems like a recipe for trouble. You may end up in the middle of several dma transactions across multiple engines on the GPU. We could add code to tear down every block on the chip on start up in case such an event happened, but like Christian said, some blocks can't really be recovered depending on how things were ended, and then we've doubled the start up time for the driver. Additionally, most of the tear down is dependent on various driver state which we wouldn't have at start up, so most of code would need to re-architected so that it could be called independently of the relevant driver state. Alex _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2013-09-11 13:41 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-09-08 12:09 [PATCH 0/3] drm/radeon kexec fixes Markus Trippelsdorf 2013-09-08 12:10 ` [PATCH 1/3] kexec: get rid of late printk Markus Trippelsdorf 2013-09-08 20:11 ` Daniel Vetter 2013-09-08 20:42 ` Bruno Prémont 2013-09-08 12:10 ` [PATCH 2/3] drm/radeon: Implement radeon_pci_shutdown Markus Trippelsdorf 2013-09-09 13:32 ` Konrad Rzeszutek Wilk 2013-09-08 12:11 ` [PATCH 3/3] drm/radeon: get rid of r100_restore_sanity hack Markus Trippelsdorf 2013-09-09 0:32 ` [PATCH 0/3] drm/radeon kexec fixes Eric W. Biederman 2013-09-09 9:21 ` Markus Trippelsdorf 2013-09-09 9:38 ` Christian König 2013-09-11 9:01 ` Markus Trippelsdorf 2013-09-11 9:10 ` Christian König 2013-09-11 13:30 ` Alex Deucher 2013-09-09 13:04 ` Alex Deucher 2013-09-10 18:27 ` Eric W. Biederman 2013-09-10 20:40 ` Alex Deucher 2013-09-11 8:53 ` Markus Trippelsdorf 2013-09-11 9:21 ` Christian König 2013-09-11 13:40 ` Alex Deucher
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox