public inbox for linux-kernel@vger.kernel.org
* SB600 AHCI: Hard Disk Corruption
@ 2008-05-25 12:10 Patrick
  2008-05-25 12:16 ` Patrick
  2008-05-25 17:38 ` Pavel Machek
  0 siblings, 2 replies; 25+ messages in thread
From: Patrick @ 2008-05-25 12:10 UTC (permalink / raw)
  To: linux-kernel

Hello (Tejun Heo *)

I've got an annoying problem with my Athlon 64-bit, 4 GB RAM, ASUS M2A-VM
(-> SB600 AHCI controller), SAMSUNG HD501LJ SATA disk. I'm using kernel
2.6.26-rc3. Everything works fine, except for standby/suspend/hibernate.
Standby freezes; hibernate I actually haven't tested lately because I
want suspend to RAM to work first.

"echo mem > /sys/power/state; vbetool post;" (on text console)
successfully suspends the system and it resumes as well, BUT: After
resuming, things quickly turn bad: "file not found", kernel reports
ext2 errors on the root (LVM) partition. After a (hard) reboot the root
filesystem won't even be recognized by mount any more, and e2fsck can
hardly recover it (thousands of inodes go to lost+found; I have to restore
backups to make the system work again). This happened even when the
partition was mounted _readonly_, and it happens to ALL partitions
mounted during suspend. ** I'm testing now by appending break=init to
the kernel command line, getting to a busybox on the initramfs, and then
unmounting "root" before suspending. From there I can run dmesg to see
what's happening (though the dmesg buffer is quite small... can I
increase that in proc somewhere?). I'd be willing to test and send
whatever logs you need to get this fixed.

Some additional info: Upgrading from 2.6.24, I hoped the
AHCI_HFLAG_NO_MSI in drivers/ata/ahci.c might solve the issue - no luck.
All the other SB600 workarounds: obviously no luck as well.
irqpoll: slightly different behaviour when unloading the sd_mod and ahci
modules before suspending:
without irqpoll, the disk ([sda]) doesn't show up again after "modprobe
ahci; modprobe sd_mod" and I get "ata5.00: failed to IDENTIFY [...]
err_mask=0x80" "failed to restore some devices [...]" errors
with irqpoll, the disk shows up again with no errors, but "there is
different data" on each read (head -c10000) from /dev/sda. The disk itself
is not changed, though: after rebooting it contains the original data. I
just wonder how the data is "created" - it seems to be disk content from
different locations (not the beginning) on the disk - if I run
"dd if=/dev/sda of=/dev/null", I hear the disk reading data....
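
The "different data on each read" check can be made repeatable with a
small helper. This is only a sketch under assumptions: the names
read_head() and reads_are_stable() are made up for illustration, and on a
block device the page cache may serve the second read, so a real test
would probably have to bypass or drop the cache first.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read up to n bytes from the start of path; returns bytes read, -1 on error. */
static long read_head(const char *path, unsigned char *buf, size_t n)
{
	FILE *f = fopen(path, "rb");
	size_t got;

	if (!f)
		return -1;
	got = fread(buf, 1, n, f);
	fclose(f);
	return (long)got;
}

/* 1 if two consecutive reads match, 0 if they differ, -1 on error. */
int reads_are_stable(const char *path, size_t n)
{
	unsigned char *a = malloc(n);
	unsigned char *b = malloc(n);
	int ret = -1;

	if (a && b) {
		long la = read_head(path, a, n);
		long lb = read_head(path, b, n);

		if (la >= 0 && lb >= 0)
			ret = (la == lb && memcmp(a, b, (size_t)la) == 0);
	}
	free(a);
	free(b);
	return ret;
}
```

Calling reads_are_stable("/dev/sda", 10000) as root would correspond to
running "head -c10000 /dev/sda" twice and comparing the output by hand.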

Well - I hope you might be able to make some sense of that and tell me
what logs and dumps exactly you need to fix it...

Greets - Patrick



* I read many threads in which Tejun provided patches for the SB600 AHCI
controller, which seems to be seriously broken - if only I had known that in
advance... Maybe he can fix this issue as well - last resort. Otherwise
I'll burn that mobo!

** After my first install and a day of configuring the system, trying
out suspend to RAM smashed it with no backups. Since then I didn't learn
my lesson and smashed it again 2-3 times, this time with backups at hand
though, ...





* Re: SB600 AHCI: Hard Disk Corruption
  2008-05-25 12:10 SB600 AHCI: Hard Disk Corruption Patrick
@ 2008-05-25 12:16 ` Patrick
  2008-05-25 17:38 ` Pavel Machek
  1 sibling, 0 replies; 25+ messages in thread
From: Patrick @ 2008-05-25 12:16 UTC (permalink / raw)
  To: linux-kernel

Uh, d'oh, forgot: Could you please CC me on replies, since I'm not
subscribed - thanks



* Re: SB600 AHCI: Hard Disk Corruption
  2008-05-25 12:10 SB600 AHCI: Hard Disk Corruption Patrick
  2008-05-25 12:16 ` Patrick
@ 2008-05-25 17:38 ` Pavel Machek
  2008-05-25 20:08   ` Patrick
  1 sibling, 1 reply; 25+ messages in thread
From: Pavel Machek @ 2008-05-25 17:38 UTC (permalink / raw)
  To: Patrick; +Cc: linux-kernel

Hi!

> I've got an annoying problem with my Athlon 64-bit, 4 GB RAM, ASUS M2A-VM
> (-> SB600 AHCI controller), SAMSUNG HD501LJ SATA disk. I'm using kernel
> 2.6.26-rc3. Everything works fine, except for standby/suspend/hibernate.
> Standby freezes; hibernate I actually haven't tested lately because I
> want suspend to RAM to work first.
> 
> "echo mem > /sys/power/state; vbetool post;" (on text console)
> successfully suspends the system and it resumes as well, BUT: After
> resuming, things quickly turn bad: "file not found", kernel reports

iommu problem? Try it with mem=3G.

> * I read many threads in which Tejun provided patches for the SB600 AHCI
> controller, which seems to be seriously broken - if only I had known that in
> advance... Maybe he can fix this issue as well - last resort. Otherwise
> I'll burn that mobo!

I suspect all you need is to burn one dimm.. or send it to me so that
I can reproduce it ;-).

> ** After my first install and a day of configuring the system, trying
> out suspend to RAM smashed it with no backups. Since then I didn't learn
> my lesson and smashed it again 2-3 times, this time with backups at hand
> though, ...

Boot it from cd ;-).

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: SB600 AHCI: Hard Disk Corruption
  2008-05-25 17:38 ` Pavel Machek
@ 2008-05-25 20:08   ` Patrick
  2008-05-25 20:39     ` >3G => iommu => suspend problems -- was " Pavel Machek
  0 siblings, 1 reply; 25+ messages in thread
From: Patrick @ 2008-05-25 20:08 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel

> iommu problem? Try it with mem=3G.

YES! :-) How did you know?

Even mem=4G did the trick. I should have tried that a long time ago
(!), since it also used to make the fglrx driver problems I was dealing
with go away. These are solved now though, by a recent BIOS update
(Version 1705 2008/04/21). The fglrx driver used to work once but
stopped working after an upgrade one day. Now with the new BIOS, the
current (Ubuntu 8.04) version is working.

So now, using mem=4G, after successfully suspending once, my
ahci-scsi-libata-[sda] disk is still working. The error messages I
mentioned disappeared from the kernel messages. Some other USB driver
error messages that used to show up after resume have disappeared as well.

Anyway: using mem=4G is definitely no option! I get only 3G of usable
memory!!! *
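
Why mem=4G leaves only 3G can be shown with a bit of arithmetic. The
sketch below assumes that the limit acts as a cap on the highest physical
address used, and it uses an invented memory map in which 1G of the RAM
is remapped above the 4G mark to make room for the PCI hole; the real
map of this board may look different. struct ram_range and usable_below()
are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct ram_range {
	uint64_t start;	/* first byte of the range */
	uint64_t end;	/* one past the last byte */
};

/* Sum the RAM that lies below the given physical address limit. */
uint64_t usable_below(const struct ram_range *map, size_t n, uint64_t limit)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Clip each range at the limit before counting it. */
		uint64_t end = map[i].end < limit ? map[i].end : limit;

		if (end > map[i].start)
			total += end - map[i].start;
	}
	return total;
}
```

With ranges 0-3G and 4G-5G, a 4G limit counts only the first range (3G),
and nothing above 4G is left that a device with 32-bit DMA could not
reach directly.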

So how are we going to get this fixed???

Should the IOMMU be reinitialized after resume? Or should the bios do it
but doesn't? ** On the m2a-vm, the "GART" seems to be used as iommu,
so... :-) ???

I put some kernel message logs here:
http://zefir.freesitespace.net/dmesg/
One log without mem=4G will also be there shortly, where you will see
the messages concerning the iommu (which seems to be unused right now).

Greets

*
> I suspect all you need is to burn one dimm.. or send it to me so that
> I can reproduce it ;-).
;-) ...looking back - all the trouble the 4G caused me till now, I
would have saved lots of time mounting only one 2G dimm in the first
place, even if the system was a little slower, but be it...

> Boot it from cd ;-).
I'm impatient...

** Seems, the SB600 is not to blame - alas sorry for the subject.



* >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-05-25 20:08   ` Patrick
@ 2008-05-25 20:39     ` Pavel Machek
  2008-05-25 21:10       ` Pavel Machek
  0 siblings, 1 reply; 25+ messages in thread
From: Pavel Machek @ 2008-05-25 20:39 UTC (permalink / raw)
  To: Patrick; +Cc: linux-kernel

Hi!

> > iommu problem? Try it with mem=3G.
> 
> YES! :-) How did you know?

Guess how... I hit it myself.

> So how are we going to get this fixed???

Write a patch, submit it? ;-).

Okay, I guess I should do the patch, but I can't test it easily. If
you can do testing/some development, I guess I can try to cook up
something.

(But no, I'm not an IOMMU expert).

> Should the IOMMU be reinitialized after resume? Or should the bios do it
> but doesn't? ** On the m2a-vm, the "GART" seems to be used as iommu,
> so... :-) ???

It is a Linux bug. BIOS could be more helpful, but... this is a Linux
problem.

> I put some kernel message logs here:
> http://zefir.freesitespace.net/dmesg/
> One log without mem=4G will also be there shortly, where you will see
> the messages concerning the iommu (which seems to be unused right
> now).

Yep, they are similar to what I see.

> ** Seems, the SB600 is not to blame - alas sorry for the subject.

Subject is easy to change ;-).
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-05-25 20:39     ` >3G => iommu => suspend problems -- was " Pavel Machek
@ 2008-05-25 21:10       ` Pavel Machek
  2008-05-26 15:31         ` Patrick
  2008-05-27 10:23         ` Pavel Machek
  0 siblings, 2 replies; 25+ messages in thread
From: Pavel Machek @ 2008-05-25 21:10 UTC (permalink / raw)
  To: Patrick; +Cc: linux-kernel

Hi!

> > > iommu problem? Try it with mem=3G.
> > 
> > YES! :-) How did you know?
> 
> Guess how... I hit it myself.
> 
> > So how are we going to get this fixed???
> 
> Write a patch, submit it? ;-).

Can you try this one? It should prevent suspend in the broken cases,
but allow it in mem=4G config.

								Pavel

iommu/gart support misses suspend/resume code, which can do bad stuff,
including memory corruption on resume. Prevent system suspend in case
we would be unable to resume.

Signed-off-by: Pavel Machek <pavel@suse.cz>

---
commit 7724af033ea084f0b037ae8a2032da5e40255088
tree 06de661a106ba83a96cab2ee0e76e3f3c44823ab
parent e9f4353b46ec2b05f73e1a84085c305de211bd3e
author Pavel <pavel@amd.ucw.cz> Sun, 25 May 2008 23:08:17 +0200
committer Pavel <pavel@amd.ucw.cz> Sun, 25 May 2008 23:08:17 +0200

 arch/x86/kernel/pci-gart_64.c |   31 ++++++++++++++++++++++++++++++-
 1 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
index 620ec3a..926af9c 100644
--- a/arch/x86/kernel/pci-gart_64.c
+++ b/arch/x86/kernel/pci-gart_64.c
@@ -26,6 +26,7 @@ #include <linux/bitops.h>
 #include <linux/kdebug.h>
 #include <linux/scatterlist.h>
 #include <linux/iommu-helper.h>
+#include <linux/sysdev.h>
 #include <asm/atomic.h>
 #include <asm/io.h>
 #include <asm/mtrr.h>
@@ -548,6 +549,28 @@ static __init unsigned read_aperture(str
 	return aper_base;
 }
 
+static int gart_resume(struct sys_device *dev)
+{
+	return 0;
+}
+
+static int gart_suspend(struct sys_device *dev, pm_message_t state)
+{
+	return -EINVAL;
+}
+
+static struct sysdev_class gart_sysdev_class = {
+	.name = "gart",
+	.suspend = gart_suspend,
+	.resume = gart_resume,
+
+};
+
+static struct sys_device device_gart = {
+	.id	= 0,
+	.cls	= &gart_sysdev_class,
+};
+
 /*
  * Private Northbridge GATT initialization in case we cannot use the
  * AGP driver for some reason.
@@ -558,7 +581,7 @@ static __init int init_k8_gatt(struct ag
 	unsigned aper_base, new_aper_base;
 	struct pci_dev *dev;
 	void *gatt;
-	int i;
+	int i, error;
 
 	printk(KERN_INFO "PCI-DMA: Disabling AGP.\n");
 	aper_size = aper_base = info->aper_size = 0;
@@ -595,6 +618,12 @@ static __init int init_k8_gatt(struct ag
 		dev = k8_northbridges[i];
 		enable_gart_translation(dev, __pa(gatt));
 	}
+	
+	error = sysdev_class_register(&gart_sysdev_class);
+	if (!error)
+		error = sysdev_register(&device_gart);
+	if (error)
+		panic("Could not register gart_sysdev -- would corrupt data on next suspend");
 	flush_gart();
 
 	printk(KERN_INFO "PCI-DMA: aperture base @ %x size %u KB\n",


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-05-25 21:10       ` Pavel Machek
@ 2008-05-26 15:31         ` Patrick
  2008-05-27 11:22           ` Pavel Machek
  2008-05-27 10:23         ` Pavel Machek
  1 sibling, 1 reply; 25+ messages in thread
From: Patrick @ 2008-05-26 15:31 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel

> Can you try this one? It should prevent suspend in the broken cases,
> but allow it in mem=4G config.

Sure!

root@babar:/usr/src/linux-2.6.25# patch -p1 < /home/pat/patch-2.6.26-rc3.gart-suspend
patching file arch/x86/kernel/pci-gart_64.c
Hunk #4 succeeded at 629 with fuzz 2 (offset 11 lines).

.....make; cp bzImage /boot; reboot....

without mem=4G:
...
[17180761.682783] CPU1 is down
[17180761.682968] Class suspend failed for gart0
[17180761.683321] PM: Some devices failed to power down
[17180761.683326] Enabling non-boot CPUs ...

with mem=4G:
...
[no change]

-> perfect! Behaves as you predicted.
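
The "Class suspend failed for gart0" line above is the veto at work: one
suspend hook returns an error and the whole transition is aborted. A
user-space model of that behaviour, as a rough sketch only; this is not
the kernel's sysdev API, and struct fake_dev, suspend_all() and the hook
names are invented for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

struct fake_dev {
	const char *name;
	int (*suspend)(void);	/* 0 on success, negative errno to refuse */
};

int ok_suspend(void)
{
	return 0;
}

/* Models a hook that refuses to suspend because resume is not safe. */
int gart_veto_suspend(void)
{
	return -EINVAL;
}

/* Call each suspend hook in order; stop at and return the first error. */
int suspend_all(const struct fake_dev *devs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int err = devs[i].suspend();

		if (err) {
			fprintf(stderr, "suspend failed for %s (%d)\n",
				devs[i].name, err);
			return err;
		}
	}
	return 0;
}
```

A list containing a "gart0" entry with gart_veto_suspend makes
suspend_all() return -EINVAL; without that entry it returns 0, which
matches the "no change" result with mem=4G.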

logs are ... here: http://zefir.890m.com/dmesg/ *


Thanks for attacking the issue really quickly.

I'm ready to do further testing and will try to find enough
time to do one test a day during the week.

Cheers



* The other host's DNS entry is gone, I guess that's because it
had too many requests and was taken down...? Hope this one survives!




* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-05-25 21:10       ` Pavel Machek
  2008-05-26 15:31         ` Patrick
@ 2008-05-27 10:23         ` Pavel Machek
  1 sibling, 0 replies; 25+ messages in thread
From: Pavel Machek @ 2008-05-27 10:23 UTC (permalink / raw)
  To: Patrick, Rafael J. Wysocki; +Cc: linux-kernel

Hi!

> > > > iommu problem? Try it with mem=3G.
> > > 
> > > YES! :-) How did you know?
> > 
> > Guess how... I hit it myself.
> > 
> > > So how are we going to get this fixed???
> > 
> > Write a patch, submit it? ;-).
> 
> Can you try this one? It should prevent suspend in the broken cases,
> but allow it in mem=4G config.

Apply this on top of previous patch, and you may get working system
_and_ all the memory...

Add resume support to pci-gart_64.c. This is necessary for resume not
to corrupt disk on >3GB machines.

Signed-off-by: Pavel Machek <pavel@suse.cz>

---
commit cc8201de538dda6c17e03fe495146e7fc755f64d
tree 9f4ece8312b59e6e14eb3a38d489ff37070d6cf1
parent db95a81f7f2106655c6ceb05a38300fd26f6ea3f
author Pavel <pavel@amd.ucw.cz> Tue, 27 May 2008 12:23:45 +0200
committer Pavel <pavel@amd.ucw.cz> Tue, 27 May 2008 12:23:45 +0200

 arch/x86/kernel/aperture_64.c |   45 +++++++++++++++++++++++++++--------------
 arch/x86/kernel/pci-gart_64.c |   28 ++++++++++++++++++++------
 drivers/char/agp/generic.c    |    2 +-
 include/asm-x86/gart.h        |    1 +
 4 files changed, 54 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index 2088b6a..2571dc4 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -302,6 +302,32 @@ void __init early_gart_iommu_check(void)
 
 }
 
+u32 fix_aper_enabled, fix_aper_order, fix_aper_alloc;
+
+void fix_up_north_bridges(void)
+{
+	int num;
+	if (!fix_aper_enabled)
+		return;
+
+	/* Fix up the north bridges */
+	for (num = 24; num < 32; num++) {
+		if (!early_is_k8_nb(read_pci_config(0, num, 3, 0x00)))
+			continue;
+
+		/*
+		 * Don't enable translation yet. That is done later
+		 * by enable_gart_translation.
+		 *
+		 * Assume this BIOS didn't initialise the GART so
+		 * just overwrite all previous bits
+		 */
+		write_pci_config(0, num, 3, AMD64_GARTAPERTURECTL, fix_aper_order<<1);
+		write_pci_config(0, num, 3, AMD64_GARTAPERTUREBASE, fix_aper_alloc>>25);
+	}
+}
+
+
 void __init gart_iommu_hole_init(void)
 {
 	u32 aper_size, aper_alloc = 0, aper_order = 0, last_aper_order = 0;
@@ -393,19 +419,8 @@ void __init gart_iommu_hole_init(void)
 		return;
 	}
 
-	/* Fix up the north bridges */
-	for (num = 24; num < 32; num++) {
-		if (!early_is_k8_nb(read_pci_config(0, num, 3, 0x00)))
-			continue;
-
-		/*
-		 * Don't enable translation yet. That is done later
-		 * by enable_gart_translation.
-		 *
-		 * Assume this BIOS didn't initialise the GART so
-		 * just overwrite all previous bits
-		 */
-		write_pci_config(0, num, 3, AMD64_GARTAPERTURECTL, aper_order<<1);
-		write_pci_config(0, num, 3, AMD64_GARTAPERTUREBASE, aper_alloc>>25);
-	}
+	fix_aper_enabled = 1;
+	fix_aper_order = aper_order;
+	fix_aper_alloc = aper_alloc;
+	fix_up_north_bridges();
 }
diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
index 926af9c..dbd3000 100644
--- a/arch/x86/kernel/pci-gart_64.c
+++ b/arch/x86/kernel/pci-gart_64.c
@@ -549,14 +549,27 @@ static __init unsigned read_aperture(str
 	return aper_base;
 }
 
+static void enable_gart_translations(void)
+{
+	int i;
+	struct pci_dev *dev;
+
+	for (i = 0; i < num_k8_northbridges; i++) {
+		dev = k8_northbridges[i];
+		enable_gart_translation(dev, __pa(agp_gatt_table));
+	}
+}
+
 static int gart_resume(struct sys_device *dev)
 {
+	fix_up_north_bridges();
+	enable_gart_translations();
 	return 0;
 }
 
 static int gart_suspend(struct sys_device *dev, pm_message_t state)
 {
-	return -EINVAL;
+	return 0;
 }
 
 static struct sysdev_class gart_sysdev_class = {
@@ -571,6 +584,7 @@ static struct sys_device device_gart = {
 	.cls	= &gart_sysdev_class,
 };
 
+
 /*
  * Private Northbridge GATT initialization in case we cannot use the
  * AGP driver for some reason.
@@ -614,11 +628,8 @@ static __init int init_k8_gatt(struct ag
 	memset(gatt, 0, gatt_size);
 	agp_gatt_table = gatt;
 
-	for (i = 0; i < num_k8_northbridges; i++) {
-		dev = k8_northbridges[i];
-		enable_gart_translation(dev, __pa(gatt));
-	}
-	
+	enable_gart_translations();
+
 	error = sysdev_class_register(&gart_sysdev_class);
 	if (!error)
 		error = sysdev_register(&device_gart);
@@ -651,6 +662,11 @@ static const struct dma_mapping_ops gart
 	.unmap_sg			= gart_unmap_sg,
 };
 
+/* Called from native_machine_shutdown; should this use regular
+ * shutdown call from sysdev?  Why is this needed at all? Some broken
+ * BIOS can't cope with gart enabled during reboot?
+ */
+
 void gart_iommu_shutdown(void)
 {
 	struct pci_dev *dev;
diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
index 7fc0c99..7fb4d5b 100644
--- a/drivers/char/agp/generic.c
+++ b/drivers/char/agp/generic.c
@@ -43,7 +43,7 @@ #include <asm/cacheflush.h>
 #include <asm/pgtable.h>
 #include "agp.h"
 
-__u32 *agp_gatt_table;
+u32 *agp_gatt_table;
 int agp_memory_reserved;
 
 /*
diff --git a/include/asm-x86/gart.h b/include/asm-x86/gart.h
index f37d83b..6f27b14 100644
--- a/include/asm-x86/gart.h
+++ b/include/asm-x86/gart.h
@@ -93,5 +93,6 @@ static inline int __aperture_valid(u64 a
 	return 1;
 }
 
+extern void fix_up_north_bridges(void);
 
 #endif


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-05-26 15:31         ` Patrick
@ 2008-05-27 11:22           ` Pavel Machek
  2008-05-29 18:44             ` Patrick
  2008-06-03 22:33             ` Rafael J. Wysocki
  0 siblings, 2 replies; 25+ messages in thread
From: Pavel Machek @ 2008-05-27 11:22 UTC (permalink / raw)
  To: Patrick; +Cc: linux-kernel

Hi!

> > Can you try this one? It should prevent suspend in the broken cases,
> > but allow it in mem=4G config.
> 
> Sure!
> 
> root@babar:/usr/src/linux-2.6.25# patch -p1 < /home/pat/patch-2.6.26-rc3.gart-suspend
> patching file arch/x86/kernel/pci-gart_64.c
> Hunk #4 succeeded at 629 with fuzz 2 (offset 11 lines).
> 
> .....make; cp bzImage /boot; reboot....

Thanks!

This goes on top of the second patch... it makes it work.

								Pavel

For iommu suspend/resume code to work, functions it calls may not be
__init.

Signed-off-by: Pavel Machek <pavel@suse.cz>

---
commit 0ea376de01be797f9563c2c2464149f8f0af6329
tree 4b5179fe97fe045cc770091bce94f898f26e4499
parent 017834f8541b8ded8ef831e5fe2b5f9cead4f6b0
author Pavel <pavel@amd.ucw.cz> Tue, 27 May 2008 13:21:05 +0200
committer Pavel <pavel@amd.ucw.cz> Tue, 27 May 2008 13:21:05 +0200

 arch/x86/kernel/k8.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/k8.c b/arch/x86/kernel/k8.c
index 7377ccb..acf4770 100644
--- a/arch/x86/kernel/k8.c
+++ b/arch/x86/kernel/k8.c
@@ -76,7 +76,7 @@ EXPORT_SYMBOL_GPL(cache_k8_northbridges)
 
 /* Ignores subdevice/subvendor but as far as I can figure out
    they're useless anyways */
-int __init early_is_k8_nb(u32 device)
+int early_is_k8_nb(u32 device)
 {
 	struct pci_device_id *id;
 	u32 vendor = device & 0xffff;


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-05-27 11:22           ` Pavel Machek
@ 2008-05-29 18:44             ` Patrick
  2008-05-29 18:51               ` Patrick
  2008-05-29 21:05               ` Patrick
  2008-06-03 22:33             ` Rafael J. Wysocki
  1 sibling, 2 replies; 25+ messages in thread
From: Patrick @ 2008-05-29 18:44 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel

Hi!

> This goes on top of the second patch... it makes it work.
> 
> 								Pavel
> 

All right. It works! :-) Thank you very much!

I'll describe how I tested:
So far you sent 3 patches to the list. As I couldn't apply the second
one to my v2.6.26-rc4 tree, I had to get git and then your tree. Now I
have a git working tree with two branches, *master* and *pavel*,
corresponding to torvalds/linux-2.6 and pavel/work respectively.
I made *pavel* the current branch and issued the following command:

git diff v2.6.26-rc4 	arch/x86/kernel/aperture_64.c \
			arch/x86/kernel/k8.c \
			arch/x86/kernel/pci-gart_64.c \
			drivers/char/agp/generic.c \
			include/asm-x86/gart.h \
	> /home/pat/suspend-vs-iommu.patch

The result is on http://zefir.890m.com/dmesg/

I applied this patch to my old v2.6.26-rc4 tree and recompiled a new
kernel there *, put it in place and rebooted. I suspended several times
since then and put a kernel log of one normal boot process with suspend
from console while X is running (radeonhd) on the site mentioned before.

I'm running debian testing / ubuntu hardy mixed system and used
kernel .config from linux-headers-2.6.25-2-amd64.deb (debian unstable)
with "make oldconfig" and defaults on new options.

The diff is nearly 500 lines long, but it could be narrowed, I guess.
For me it works perfectly like this though. If you want me to do any
further tests, just say so. I'm trying to get to know git a bit better
and hope to be able to help again sometime. It's great fun!  :-)

Cheers



* I get 
"ACPI: Unable to turn cooling device [ffff81012fa5cdd0] 'off'" every two
seconds and "fancontrol" isn't working any more... going to send an
acpidump to the acpi group. Maybe I'll find out myself as well...
investigating. Not sure if it's caused by the patch.




* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-05-29 18:44             ` Patrick
@ 2008-05-29 18:51               ` Patrick
  2008-05-29 21:05               ` Patrick
  1 sibling, 0 replies; 25+ messages in thread
From: Patrick @ 2008-05-29 18:51 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel

_This_ was meant to be the first footnote, actually:

*
I guess these build warnings have nothing to do with the patch I
applied:

root@babar:/mnt/reiser/linux/linux-2.6.25# make -j3 CONFIG_DEBUG_SECTION_MISMATCH=y
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CALL    scripts/checksyscalls.sh
  CHK     include/linux/compile.h
  Building modules, stage 2.
Kernel: arch/x86/boot/bzImage is ready  (#2)
  MODPOST 1876 modules
WARNING: drivers/isdn/hisax/hisax.o(.text+0xbc7): Section mismatch in reference from the function HiSax_inithardware() to the function .devinit.text:hisax_cs_setup_card()
The function HiSax_inithardware() references
the function __devinit hisax_cs_setup_card().
This is often because HiSax_inithardware lacks a __devinit
annotation or the annotation of hisax_cs_setup_card is wrong.

WARNING: drivers/isdn/hisax/hisax.o(.text+0xccc): Section mismatch in reference from the function hisax_init_pcmcia() to the function .devinit.text:hisax_cs_setup_card()
The function hisax_init_pcmcia() references
the function __devinit hisax_cs_setup_card().
This is often because hisax_init_pcmcia lacks a __devinit
annotation or the annotation of hisax_cs_setup_card is wrong.

WARNING: drivers/isdn/hisax/hisax.o(.text+0x1198): Section mismatch in reference from the function hisax_register() to the function .devinit.text:hisax_cs_setup_card()
The function hisax_register() references
the function __devinit hisax_cs_setup_card().
This is often because hisax_register lacks a __devinit
annotation or the annotation of hisax_cs_setup_card is wrong.

WARNING: drivers/scsi/gdth.o(.text+0x3c35): Section mismatch in reference from the function gdth_pci_probe_one() to the function .init.text:gdth_search_drives()
The function gdth_pci_probe_one() references
the function __init gdth_search_drives().
This is often because gdth_pci_probe_one lacks a __init
annotation or the annotation of gdth_search_drives is wrong.

WARNING: drivers/scsi/gdth.o(.text+0x3d3a): Section mismatch in reference from the function gdth_pci_probe_one() to the function .init.text:gdth_enable_int()
The function gdth_pci_probe_one() references
the function __init gdth_enable_int().
This is often because gdth_pci_probe_one lacks a __init
annotation or the annotation of gdth_enable_int is wrong.





* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-05-29 18:44             ` Patrick
  2008-05-29 18:51               ` Patrick
@ 2008-05-29 21:05               ` Patrick
  1 sibling, 0 replies; 25+ messages in thread
From: Patrick @ 2008-05-29 21:05 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel

> * I get 
> "ACPI: Unable to turn cooling device [ffff81012fa5cdd0] 'off'" every two
> seconds and "fancontrol" isn't working any more... going to send a
> acpidump to the acpi group. Maybe i'll find out myself as well...
> investigating. Not sure if it's caused by the patch.

It's caused by the suspend/resume process, not the patch - it happens
with the unpatched -rc4 kernel and mem=4G as well after resume.

"fancontrol" needed a config file update only (new device, hwmon0).



* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-05-27 11:22           ` Pavel Machek
  2008-05-29 18:44             ` Patrick
@ 2008-06-03 22:33             ` Rafael J. Wysocki
  2008-06-06 13:20               ` Pavel Machek
  1 sibling, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2008-06-03 22:33 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Patrick, linux-kernel, Ingo Molnar, pm list

On Tuesday, 27 of May 2008, Pavel Machek wrote:
> Hi!
> 
> > > Can you try this one? It should prevent suspend in the broken cases,
> > > but allow it in mem=4G config.
> > 
> > Sure!
> > 
> > root@babar:/usr/src/linux-2.6.25# patch -p1 < /home/pat/patch-2.6.26-rc3.gart-suspend
> > patching file arch/x86/kernel/pci-gart_64.c
> > Hunk #4 succeeded at 629 with fuzz 2 (offset 11 lines).
> > 
> > .....make; cp bzImage /boot; reboot....
> 
> Thanks!
> 
> This goes on top of the second patch... it makes it work.
> 
> 								Pavel
> 
> For iommu suspend/resume code to work, functions it calls may not be
> __init.
> 
> Signed-off-by: Pavel Machek <pavel@suse.cz>

I consolidated some of your patches sent in this thread and made the result
apply to the current -git.  It hasn't been tested yet, but does it look good?

It's on top of the patch that adds the GART sysdev.

Thanks,
Rafael
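
For reference, the register values that the patch below writes on resume
come down to a few shifts and masks. The helpers here only mirror the
arithmetic visible in enable_gart_translations() and
fix_up_north_bridges(); their names are made up, and they run in user
space without touching PCI config space.

```c
#include <stdint.h>

/* Value for config register 0x98, derived from the GATT physical address. */
uint32_t gatt_base_reg(uint64_t gatt_phys)
{
	uint32_t reg = (uint32_t)(gatt_phys >> 12);	/* drop the page offset */

	return reg << 4;
}

/* New value for register 0x90: set bit 0, clear bits 4 and 5. */
uint32_t gart_ctl_enable(uint32_t ctl)
{
	ctl |= 1;
	ctl &= ~((1u << 4) | (1u << 5));
	return ctl;
}

/* Aperture setup written before translation is enabled. */
uint32_t aperture_ctl_reg(uint32_t aper_order)
{
	return aper_order << 1;		/* goes to register 0x90 */
}

uint32_t aperture_base_reg(uint32_t aper_alloc)
{
	return aper_alloc >> 25;	/* goes to register 0x94 */
}
```

For example, a table at physical address 0x12345000 gives 0x123450 for
register 0x98, and an aperture allocated at 0x20000000 gives 0x10 for
register 0x94.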

---
Handle GART IOMMU suspend and resume.

Not-yet-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/aperture_64.c |   34 ++++++++++++++++---------
 arch/x86/kernel/k8.c          |    2 -
 arch/x86/kernel/pci-gart_64.c |   55 +++++++++++++++++++++++++++++-------------
 include/asm-x86/gart.h        |    2 +
 4 files changed, 63 insertions(+), 30 deletions(-)

Index: linux-2.6/arch/x86/kernel/pci-gart_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/pci-gart_64.c
+++ linux-2.6/arch/x86/kernel/pci-gart_64.c
@@ -549,14 +549,50 @@ static __init unsigned read_aperture(str
 	return aper_base;
 }
 
+static void enable_gart_translations(void)
+{
+	int i;
+
+	for (i = 0; i < num_k8_northbridges; i++) {
+		struct pci_dev *dev;
+		u32 gatt_reg;
+		u32 ctl;
+
+		dev = k8_northbridges[i];
+		gatt_reg = __pa(agp_gatt_table) >> 12;
+		gatt_reg <<= 4;
+		pci_write_config_dword(dev, 0x98, gatt_reg);
+		pci_read_config_dword(dev, 0x90, &ctl);
+
+		ctl |= 1;
+		ctl &= ~((1<<4) | (1<<5));
+
+		pci_write_config_dword(dev, 0x90, ctl);
+	}
+}
+
+static bool fix_north_bridges;	/* call fix_up_north_bridges() on resume */
+static u32 aperture_order;	/* arguments for fix_up_north_bridges() */
+static u32 aperture_alloc;
+
+void set_gart_resume_data(u32 aper_order, u32 aper_alloc)
+{
+	fix_north_bridges = true;
+	aperture_order = aper_order;
+	aperture_alloc = aper_alloc;
+}
+
 static int gart_resume(struct sys_device *dev)
 {
+	if (fix_north_bridges)
+		fix_up_north_bridges(aperture_order, aperture_alloc);
+	enable_gart_translations();
 	return 0;
 }
 
 static int gart_suspend(struct sys_device *dev, pm_message_t state)
 {
-	return -EINVAL;
+	return 0;
 }
 
 static struct sysdev_class gart_sysdev_class = {
@@ -613,27 +649,14 @@ static __init int init_k8_gatt(struct ag
 	memset(gatt, 0, gatt_size);
 	agp_gatt_table = gatt;
 
-	for (i = 0; i < num_k8_northbridges; i++) {
-		u32 gatt_reg;
-		u32 ctl;
-
-		dev = k8_northbridges[i];
-		gatt_reg = __pa(gatt) >> 12;
-		gatt_reg <<= 4;
-		pci_write_config_dword(dev, 0x98, gatt_reg);
-		pci_read_config_dword(dev, 0x90, &ctl);
-
-		ctl |= 1;
-		ctl &= ~((1<<4) | (1<<5));
-
-		pci_write_config_dword(dev, 0x90, ctl);
-	}
+	enable_gart_translations();
 
 	error = sysdev_class_register(&gart_sysdev_class);
 	if (!error)
 		error = sysdev_register(&device_gart);
 	if (error)
 		panic("Could not register gart_sysdev -- would corrupt data on next suspend");
+
 	flush_gart();
 
 	printk(KERN_INFO "PCI-DMA: aperture base @ %x size %u KB\n",
Index: linux-2.6/arch/x86/kernel/k8.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/k8.c
+++ linux-2.6/arch/x86/kernel/k8.c
@@ -76,7 +76,7 @@ EXPORT_SYMBOL_GPL(cache_k8_northbridges)
 
 /* Ignores subdevice/subvendor but as far as I can figure out
    they're useless anyways */
-int __init early_is_k8_nb(u32 device)
+int early_is_k8_nb(u32 device)
 {
 	struct pci_device_id *id;
 	u32 vendor = device & 0xffff;
Index: linux-2.6/arch/x86/kernel/aperture_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/aperture_64.c
+++ linux-2.6/arch/x86/kernel/aperture_64.c
@@ -310,6 +310,25 @@ void __init early_gart_iommu_check(void)
 
 }
 
+void fix_up_north_bridges(u32 aper_order, u32 aper_alloc)
+{
+	int num;
+
+	/* Fix up the north bridges */
+	for (num = 24; num < 32; num++) {
+		if (!early_is_k8_nb(read_pci_config(0, num, 3, 0x00)))
+			continue;
+
+		/*
+		 * Don't enable translation yet. That is done later.
+		 * Assume this BIOS didn't initialise the GART so
+		 * just overwrite all previous bits
+		 */
+		write_pci_config(0, num, 3, 0x90, aper_order << 1);
+		write_pci_config(0, num, 3, 0x94, aper_alloc >> 25);
+	}
+}
+
 void __init gart_iommu_hole_init(void)
 {
 	u32 aper_size, aper_alloc = 0, aper_order = 0, last_aper_order = 0;
@@ -400,17 +419,6 @@ void __init gart_iommu_hole_init(void)
 		return;
 	}
 
-	/* Fix up the north bridges */
-	for (num = 24; num < 32; num++) {
-		if (!early_is_k8_nb(read_pci_config(0, num, 3, 0x00)))
-			continue;
-
-		/*
-		 * Don't enable translation yet. That is done later.
-		 * Assume this BIOS didn't initialise the GART so
-		 * just overwrite all previous bits
-		 */
-		write_pci_config(0, num, 3, 0x90, aper_order<<1);
-		write_pci_config(0, num, 3, 0x94, aper_alloc>>25);
-	}
+	fix_up_north_bridges(aper_order, aper_alloc);
+	set_gart_resume_data(aper_order, aper_alloc);
 }
Index: linux-2.6/include/asm-x86/gart.h
===================================================================
--- linux-2.6.orig/include/asm-x86/gart.h
+++ linux-2.6/include/asm-x86/gart.h
@@ -11,6 +11,8 @@ extern void gart_iommu_shutdown(void);
 extern void __init gart_parse_options(char *);
 extern void early_gart_iommu_check(void);
 extern void gart_iommu_hole_init(void);
+extern void set_gart_resume_data(u32, u32);
+extern void fix_up_north_bridges(u32, u32);
 extern int fallback_aper_order;
 extern int fallback_aper_force;
 extern int gart_iommu_aperture;


* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-06-03 22:33             ` Rafael J. Wysocki
@ 2008-06-06 13:20               ` Pavel Machek
  2008-06-08 22:36                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 25+ messages in thread
From: Pavel Machek @ 2008-06-06 13:20 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Patrick, linux-kernel, Ingo Molnar, pm list

Hi!

> > > > Can you try this one? It should prevent suspend in the broken cases,
> > > > but allow it in mem=4G config.
> > > 
> > > Sure!
> > > 
> > > root@babar:/usr/src/linux-2.6.25# patch -p1 < /home/pat/patch-2.6.26-rc3.gart-suspend
> > > patching file arch/x86/kernel/pci-gart_64.c
> > > Hunk #4 succeeded at 629 with fuzz 2 (offset 11 lines).
> > > 
> > > .....make; cp bzImage /boot; reboot....
> > 
> > Thanks!
> > 
> > This goes on top of the second patch... it makes it work.
> > 
> > 								Pavel
> > 
> > For iommu suspend/resume code to work, functions it calls may not be
> > __init.
> > 
> > Signed-off-by: Pavel Machek <pavel@suse.cz>
> 
> I consolidated some of your patches sent in this thread and made the result
> apply to the current -git.  It hasn't been tested yet, but does it look good?
> 
> It's on top of the patch that adds the GART sysdev.

Looks ok to me.

							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-06-06 13:20               ` Pavel Machek
@ 2008-06-08 22:36                 ` Rafael J. Wysocki
       [not found]                   ` <20080609124630.GA28799@elte.hu>
  2008-06-11 11:43                   ` >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption Patrick
  0 siblings, 2 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2008-06-08 22:36 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Patrick, linux-kernel, Ingo Molnar, pm list, Andi Kleen

On Friday, 6 of June 2008, Pavel Machek wrote:
> Hi!
> 
> > > > > Can you try this one? It should prevent suspend in the broken cases,
> > > > > but allow it in mem=4G config.
> > > > 
> > > > Sure!
> > > > 
> > > > root@babar:/usr/src/linux-2.6.25# patch -p1 < /home/pat/patch-2.6.26-rc3.gart-suspend
> > > > patching file arch/x86/kernel/pci-gart_64.c
> > > > Hunk #4 succeeded at 629 with fuzz 2 (offset 11 lines).
> > > > 
> > > > .....make; cp bzImage /boot; reboot....
> > > 
> > > Thanks!
> > > 
> > > This goes on top of the second patch... it makes it work.
> > > 
> > > 								Pavel
> > > 
> > > For iommu suspend/resume code to work, functions it calls may not be
> > > __init.
> > > 
> > > Signed-off-by: Pavel Machek <pavel@suse.cz>
> > 
> > I consolidated some of your patches sent in this thread and made the result
> > apply to the current -git.  It hasn't been tested yet, but does it look good?
> > 
> > It's on top of the patch that adds the GART sysdev.
> 
> Looks ok to me.

Still, it may be improved. :-)

First, we shouldn't mix the "early PCI config access" thing with the "normal"
method.  Second, we don't have to check for the K8 north bridges on resume,
because we already know where they are in the configuration space and we can
use this information.

Updated patch follows.  It has been tested a little on my new 4 GB test box on
which 2.6.26-rc4 failed miserably with severe consequences.  More testing
welcome, but please be careful.

Thanks,
Rafael

---
Add resume handling to GART IOMMU.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/aperture_64.c |    2 +
 arch/x86/kernel/pci-gart_64.c |   75 +++++++++++++++++++++++++++++++++---------
 include/asm-x86/gart.h        |    1 
 3 files changed, 62 insertions(+), 16 deletions(-)

Index: linux-2.6/arch/x86/kernel/pci-gart_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/pci-gart_64.c
+++ linux-2.6/arch/x86/kernel/pci-gart_64.c
@@ -549,14 +549,70 @@ static __init unsigned read_aperture(str
 	return aper_base;
 }
 
+static void enable_gart_translations(void)
+{
+	int i;
+
+	for (i = 0; i < num_k8_northbridges; i++) {
+		struct pci_dev *dev;
+		u32 gatt_reg;
+		u32 ctl;
+
+		dev = k8_northbridges[i];
+		gatt_reg = __pa(agp_gatt_table) >> 12;
+		gatt_reg <<= 4;
+		pci_write_config_dword(dev, 0x98, gatt_reg);
+		pci_read_config_dword(dev, 0x90, &ctl);
+
+		ctl |= 1;
+		ctl &= ~((1<<4) | (1<<5));
+
+		pci_write_config_dword(dev, 0x90, ctl);
+	}
+}
+
+/*
+ * If fix_up_north_bridges is set, the north bridges have to be fixed up on
+ * resume in the same way as they are handled in gart_iommu_hole_init().
+ */
+static bool fix_up_north_bridges;
+static u32 aperture_order;
+static u32 aperture_alloc;
+
+void set_up_gart_resume(u32 aper_order, u32 aper_alloc)
+{
+	fix_up_north_bridges = true;
+	aperture_order = aper_order;
+	aperture_alloc = aper_alloc;
+}
+
 static int gart_resume(struct sys_device *dev)
 {
+	printk(KERN_INFO "PCI-DMA: Resuming GART IOMMU\n");
+
+	if (fix_up_north_bridges) {
+		int i;
+
+		for (i = 0; i < num_k8_northbridges; i++) {
+			struct pci_dev *dev = k8_northbridges[i];
+
+			/*
+			 * Don't enable translations just yet.  That is the next
+			 * step.  Restore the pre-suspend aperture settings.
+			 */
+			pci_write_config_dword(dev, 0x90, aperture_order << 1);
+			pci_write_config_dword(dev, 0x94, aperture_alloc >> 25);
+		}
+	}
+
+	enable_gart_translations();
+
 	return 0;
 }
 
 static int gart_suspend(struct sys_device *dev, pm_message_t state)
 {
-	return -EINVAL;
+	return 0;
 }
 
 static struct sysdev_class gart_sysdev_class = {
@@ -614,27 +670,14 @@ static __init int init_k8_gatt(struct ag
 	memset(gatt, 0, gatt_size);
 	agp_gatt_table = gatt;
 
-	for (i = 0; i < num_k8_northbridges; i++) {
-		u32 gatt_reg;
-		u32 ctl;
-
-		dev = k8_northbridges[i];
-		gatt_reg = __pa(gatt) >> 12;
-		gatt_reg <<= 4;
-		pci_write_config_dword(dev, 0x98, gatt_reg);
-		pci_read_config_dword(dev, 0x90, &ctl);
-
-		ctl |= 1;
-		ctl &= ~((1<<4) | (1<<5));
-
-		pci_write_config_dword(dev, 0x90, ctl);
-	}
+	enable_gart_translations();
 
 	error = sysdev_class_register(&gart_sysdev_class);
 	if (!error)
 		error = sysdev_register(&device_gart);
 	if (error)
 		panic("Could not register gart_sysdev -- would corrupt data on next suspend");
+
 	flush_gart();
 
 	printk(KERN_INFO "PCI-DMA: aperture base @ %x size %u KB\n",
Index: linux-2.6/arch/x86/kernel/aperture_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/aperture_64.c
+++ linux-2.6/arch/x86/kernel/aperture_64.c
@@ -413,4 +413,6 @@ void __init gart_iommu_hole_init(void)
 		write_pci_config(0, num, 3, 0x90, aper_order<<1);
 		write_pci_config(0, num, 3, 0x94, aper_alloc>>25);
 	}
+
+	set_up_gart_resume(aper_order, aper_alloc);
 }
Index: linux-2.6/include/asm-x86/gart.h
===================================================================
--- linux-2.6.orig/include/asm-x86/gart.h
+++ linux-2.6/include/asm-x86/gart.h
@@ -11,6 +11,7 @@ extern void gart_iommu_shutdown(void);
 extern void __init gart_parse_options(char *);
 extern void early_gart_iommu_check(void);
 extern void gart_iommu_hole_init(void);
+extern void set_up_gart_resume(u32, u32);
 extern int fallback_aper_order;
 extern int fallback_aper_force;
 extern int gart_iommu_aperture;


* [PATCH] x86 GART: Add resume handling (was: Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption)
       [not found]                   ` <20080609124630.GA28799@elte.hu>
@ 2008-06-09 22:10                     ` Rafael J. Wysocki
  2008-06-10 10:03                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2008-06-09 22:10 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Pavel Machek, the arch/x86 maintainers, pm list, LKML

On Monday, 9 of June 2008, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > > Looks ok to me.
> > 
> > Still, it may be improved. :-)
> > 
> > First, we shouldn't mix the "early PCI config access" thing with the 
> > "normal" method.  Second, we don't have to check for the K8 north 
> > bridges on resume, because we already know where they are in the 
> > configuration space and we can use this information.
> > 
> > Updated patch follows.  It has been tested a little on my new 4 GB 
> > test box on which 2.6.26-rc4 failed miserably with severe 
> > consequences.  More testing welcome, but please be careful.
> 
> Rafael, could we try this against the tip/x86/gart tree perhaps? It 
> already has a couple of fixes from Pavel and your patch collides with 
> them in a non-obvious way.
> 
>   http://people.redhat.com/mingo/tip.git/README

Okay, appended is the patch rebased on tip/x86/gart with (mainline) commit
cd76374e9de4501acc74f833dc6cb5e7a5dca115 "suspend-vs-iommu: prevent suspend if
we could not resume" (which appears to be missing from tip/x86/gart) applied.

This version of the patch doesn't break compilation, but it hasn't been really
tested yet.

Thanks,
Rafael

---
Add resume handling to GART IOMMU.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/x86/kernel/aperture_64.c |    2 +
 arch/x86/kernel/pci-gart_64.c |   57 ++++++++++++++++++++++++++++++++++++++----
 include/asm-x86/gart.h        |    1 
 3 files changed, 55 insertions(+), 5 deletions(-)

Index: tip.git/arch/x86/kernel/aperture_64.c
===================================================================
--- tip.git.orig/arch/x86/kernel/aperture_64.c
+++ tip.git/arch/x86/kernel/aperture_64.c
@@ -496,4 +496,6 @@ out:
 			write_pci_config(bus, slot, 3, AMD64_GARTAPERTUREBASE, aper_alloc >> 25);
 		}
 	}
+
+	set_up_gart_resume(aper_order, aper_alloc);
 }
Index: tip.git/arch/x86/kernel/pci-gart_64.c
===================================================================
--- tip.git.orig/arch/x86/kernel/pci-gart_64.c
+++ tip.git/arch/x86/kernel/pci-gart_64.c
@@ -549,14 +549,63 @@ static __init unsigned read_aperture(str
 	return aper_base;
 }
 
+static void enable_gart_translations(void)
+{
+	int i;
+
+	for (i = 0; i < num_k8_northbridges; i++) {
+		struct pci_dev *dev = k8_northbridges[i];
+
+		enable_gart_translation(dev, __pa(agp_gatt_table));
+	}
+}
+
+/*
+ * If fix_up_north_bridges is set, the north bridges have to be fixed up on
+ * resume in the same way as they are handled in gart_iommu_hole_init().
+ */
+static bool fix_up_north_bridges;
+static u32 aperture_order;
+static u32 aperture_alloc;
+
+void set_up_gart_resume(u32 aper_order, u32 aper_alloc)
+{
+	fix_up_north_bridges = true;
+	aperture_order = aper_order;
+	aperture_alloc = aper_alloc;
+}
+
 static int gart_resume(struct sys_device *dev)
 {
+	printk(KERN_INFO "PCI-DMA: Resuming GART IOMMU\n");
+
+	if (fix_up_north_bridges) {
+		int i;
+
+		printk(KERN_INFO "PCI-DMA: Restoring GART aperture settings\n");
+
+		for (i = 0; i < num_k8_northbridges; i++) {
+			struct pci_dev *dev = k8_northbridges[i];
+
+			/*
+			 * Don't enable translations just yet.  That is the next
+			 * step.  Restore the pre-suspend aperture settings.
+			 */
+			pci_write_config_dword(dev, AMD64_GARTAPERTURECTL,
+						aperture_order << 1);
+			pci_write_config_dword(dev, AMD64_GARTAPERTUREBASE,
+						aperture_alloc >> 25);
+		}
+	}
+
+	enable_gart_translations();
+
 	return 0;
 }
 
 static int gart_suspend(struct sys_device *dev, pm_message_t state)
 {
-	return -EINVAL;
+	return 0;
 }
 
 static struct sysdev_class gart_sysdev_class = {
@@ -614,16 +663,14 @@ static __init int init_k8_gatt(struct ag
 	memset(gatt, 0, gatt_size);
 	agp_gatt_table = gatt;
 
-	for (i = 0; i < num_k8_northbridges; i++) {
-		dev = k8_northbridges[i];
-		enable_gart_translation(dev, __pa(gatt));
-	}
+	enable_gart_translations();
 
 	error = sysdev_class_register(&gart_sysdev_class);
 	if (!error)
 		error = sysdev_register(&device_gart);
 	if (error)
 		panic("Could not register gart_sysdev -- would corrupt data on next suspend");
+
 	flush_gart();
 
 	printk(KERN_INFO "PCI-DMA: aperture base @ %x size %u KB\n",
Index: tip.git/include/asm-x86/gart.h
===================================================================
--- tip.git.orig/include/asm-x86/gart.h
+++ tip.git/include/asm-x86/gart.h
@@ -14,6 +14,7 @@ extern void gart_iommu_shutdown(void);
 extern void __init gart_parse_options(char *);
 extern void early_gart_iommu_check(void);
 extern void gart_iommu_hole_init(void);
+extern void set_up_gart_resume(u32, u32);
 extern int fallback_aper_order;
 extern int fallback_aper_force;
 extern int gart_iommu_aperture;




* Re: [PATCH] x86 GART: Add resume handling (was: Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption)
  2008-06-09 22:10                     ` [PATCH] x86 GART: Add resume handling (was: Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption) Rafael J. Wysocki
@ 2008-06-10 10:03                       ` Rafael J. Wysocki
  2008-06-12  9:34                         ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2008-06-10 10:03 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Pavel Machek, the arch/x86 maintainers, pm list, LKML

On Tuesday, 10 of June 2008, Rafael J. Wysocki wrote:
> On Monday, 9 of June 2008, Ingo Molnar wrote:
> > 
> > * Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > 
> > > > Looks ok to me.
> > > 
> > > Still, it may be improved. :-)
> > > 
> > > First, we shouldn't mix the "early PCI config access" thing with the 
> > > "normal" method.  Second, we don't have to check for the K8 north 
> > > bridges on resume, because we already know where they are in the 
> > > configuration space and we can use this information.
> > > 
> > > Updated patch follows.  It has been tested a little on my new 4 GB 
> > > test box on which 2.6.26-rc4 failed miserably with severe 
> > > consequences.  More testing welcome, but please be careful.
> > 
> > Rafael, could we try this against the tip/x86/gart tree perhaps? It 
> > already has a couple of fixes from Pavel and your patch collides with 
> > them in a non-obvious way.
> > 
> >   http://people.redhat.com/mingo/tip.git/README
> 
> Okay, appended is the patch rebased on tip/x86/gart with (mainline) commit
> cd76374e9de4501acc74f833dc6cb5e7a5dca115 "suspend-vs-iommu: prevent suspend if
> we could not resume" (which appears to be missing from tip/x86/gart) applied.
> 
> This version of the patch doesn't break compilation, but it hasn't been really
> tested yet.

Now it has been (successfully) tested too. :-)

Thanks,
Rafael

 
> ---
> Add resume handling to GART IOMMU.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
>  arch/x86/kernel/aperture_64.c |    2 +
>  arch/x86/kernel/pci-gart_64.c |   57 ++++++++++++++++++++++++++++++++++++++----
>  include/asm-x86/gart.h        |    1 
>  3 files changed, 55 insertions(+), 5 deletions(-)
> 
> Index: tip.git/arch/x86/kernel/aperture_64.c
> ===================================================================
> --- tip.git.orig/arch/x86/kernel/aperture_64.c
> +++ tip.git/arch/x86/kernel/aperture_64.c
> @@ -496,4 +496,6 @@ out:
>  			write_pci_config(bus, slot, 3, AMD64_GARTAPERTUREBASE, aper_alloc >> 25);
>  		}
>  	}
> +
> +	set_up_gart_resume(aper_order, aper_alloc);
>  }
> Index: tip.git/arch/x86/kernel/pci-gart_64.c
> ===================================================================
> --- tip.git.orig/arch/x86/kernel/pci-gart_64.c
> +++ tip.git/arch/x86/kernel/pci-gart_64.c
> @@ -549,14 +549,63 @@ static __init unsigned read_aperture(str
>  	return aper_base;
>  }
>  
> +static void enable_gart_translations(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < num_k8_northbridges; i++) {
> +		struct pci_dev *dev = k8_northbridges[i];
> +
> +		enable_gart_translation(dev, __pa(agp_gatt_table));
> +	}
> +}
> +
> +/*
> + * If fix_up_north_bridges is set, the north bridges have to be fixed up on
> + * resume in the same way as they are handled in gart_iommu_hole_init().
> + */
> +static bool fix_up_north_bridges;
> +static u32 aperture_order;
> +static u32 aperture_alloc;
> +
> +void set_up_gart_resume(u32 aper_order, u32 aper_alloc)
> +{
> +	fix_up_north_bridges = true;
> +	aperture_order = aper_order;
> +	aperture_alloc = aper_alloc;
> +}
> +
>  static int gart_resume(struct sys_device *dev)
>  {
> +	printk(KERN_INFO "PCI-DMA: Resuming GART IOMMU\n");
> +
> +	if (fix_up_north_bridges) {
> +		int i;
> +
> +		printk(KERN_INFO "PCI-DMA: Restoring GART aperture settings\n");
> +
> +		for (i = 0; i < num_k8_northbridges; i++) {
> +			struct pci_dev *dev = k8_northbridges[i];
> +
> +			/*
> +			 * Don't enable translations just yet.  That is the next
> +			 * step.  Restore the pre-suspend aperture settings.
> +			 */
> +			pci_write_config_dword(dev, AMD64_GARTAPERTURECTL,
> +						aperture_order << 1);
> +			pci_write_config_dword(dev, AMD64_GARTAPERTUREBASE,
> +						aperture_alloc >> 25);
> +		}
> +	}
> +
> +	enable_gart_translations();
> +
>  	return 0;
>  }
>  
>  static int gart_suspend(struct sys_device *dev, pm_message_t state)
>  {
> -	return -EINVAL;
> +	return 0;
>  }
>  
>  static struct sysdev_class gart_sysdev_class = {
> @@ -614,16 +663,14 @@ static __init int init_k8_gatt(struct ag
>  	memset(gatt, 0, gatt_size);
>  	agp_gatt_table = gatt;
>  
> -	for (i = 0; i < num_k8_northbridges; i++) {
> -		dev = k8_northbridges[i];
> -		enable_gart_translation(dev, __pa(gatt));
> -	}
> +	enable_gart_translations();
>  
>  	error = sysdev_class_register(&gart_sysdev_class);
>  	if (!error)
>  		error = sysdev_register(&device_gart);
>  	if (error)
>  		panic("Could not register gart_sysdev -- would corrupt data on next suspend");
> +
>  	flush_gart();
>  
>  	printk(KERN_INFO "PCI-DMA: aperture base @ %x size %u KB\n",
> Index: tip.git/include/asm-x86/gart.h
> ===================================================================
> --- tip.git.orig/include/asm-x86/gart.h
> +++ tip.git/include/asm-x86/gart.h
> @@ -14,6 +14,7 @@ extern void gart_iommu_shutdown(void);
>  extern void __init gart_parse_options(char *);
>  extern void early_gart_iommu_check(void);
>  extern void gart_iommu_hole_init(void);
> +extern void set_up_gart_resume(u32, u32);
>  extern int fallback_aper_order;
>  extern int fallback_aper_force;
>  extern int gart_iommu_aperture;
> 
> 



-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-06-08 22:36                 ` Rafael J. Wysocki
       [not found]                   ` <20080609124630.GA28799@elte.hu>
@ 2008-06-11 11:43                   ` Patrick
  2008-06-11 14:38                     ` Rafael J. Wysocki
  1 sibling, 1 reply; 25+ messages in thread
From: Patrick @ 2008-06-11 11:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, linux-kernel, Ingo Molnar, pm list, Andi Kleen

Hello

On Mon, 2008-06-09 at 00:36 +0200, Rafael J. Wysocki wrote:

> > > It's on top of the patch that adds the GART sysdev.

> Updated patch follows.  It has been tested a little on my new 4 GB test box on
> which 2.6.26-rc4 failed miserably with severe consequences.  More testing
> welcome, but please be careful.

No risk no fun! :-)

As the suspend-vs-iommu-prevent-suspend-if-we-could-not-resume.patch,
which adds the GART sysdev, is now included in 2.6.26-rc5, I was
able to apply this one cleanly to the -rc5 tree. I have just
rebooted *, suspended to disk and to RAM once with all filesystems
(>400GB) mounted rw **, and everything is _working fine_.

As usual, a kernel log, the applied patch and a test report are
available at http://zefir.890m.com/kernel-testing/ .

pat@babar:~/tmp/dmesg$ grep -C3 GART dmesg.2.6.26-rc5-gart-suspend.txt
[    0.294419] PCI: Using ACPI for IRQ routing
[    0.314420] PCI-DMA: Disabling AGP.
[    0.314420] PCI-DMA: aperture base @ 4000000 size 65536 KB
[    0.314420] PCI-DMA: using GART IOMMU.
[    0.314420] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
[    0.314420] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0
[    0.314420] hpet0: 4 32-bit timers, 14318180 Hz
--
[  221.770810] CPU1 is down
[  221.770966] PM: Creating hibernation image:
[  221.772014] PM: Need to copy 220260 pages
[  221.772014] PCI-DMA: Resuming GART IOMMU
[  221.772014] Enabling non-boot CPUs ...
[  221.772014] CPU0 attaching NULL sched-domain.
[  221.781073] SMP alternatives: switching to SMP code
--
[  251.756678]   groups: 0
[  251.756984] CPU1 is down
[  251.756984] Back to C!
[  251.756988] PCI-DMA: Resuming GART IOMMU
[  251.757355] Enabling non-boot CPUs ...
[  251.757515] CPU0 attaching NULL sched-domain.
[  251.766920] SMP alternatives: switching to SMP code



* Rebooted for the first time since I compiled and booted -rc4 with
a similar "home-grown" patch on 31 May, with which I suspended
several times to disk and to RAM without any problems.

** This is also generally the same setup in which suspending has
disastrous effects with kernels older than 2.6.26-rc5.

> ---
> Add resume handling to GART IOMMU.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
>  arch/x86/kernel/aperture_64.c |    2 +
>  arch/x86/kernel/pci-gart_64.c |   75 +++++++++++++++++++++++++++++++++---------
>  include/asm-x86/gart.h        |    1 
>  3 files changed, 62 insertions(+), 16 deletions(-)
> 
> Index: linux-2.6/arch/x86/kernel/pci-gart_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/pci-gart_64.c
> +++ linux-2.6/arch/x86/kernel/pci-gart_64.c
> @@ -549,14 +549,70 @@ static __init unsigned read_aperture(str
>  	return aper_base;
>  }
>  
> +static void enable_gart_translations(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < num_k8_northbridges; i++) {
> +		struct pci_dev *dev;
> +		u32 gatt_reg;
> +		u32 ctl;
> +
> +		dev = k8_northbridges[i];
> +		gatt_reg = __pa(agp_gatt_table) >> 12;
> +		gatt_reg <<= 4;
> +		pci_write_config_dword(dev, 0x98, gatt_reg);
> +		pci_read_config_dword(dev, 0x90, &ctl);
> +
> +		ctl |= 1;
> +		ctl &= ~((1<<4) | (1<<5));
> +
> +		pci_write_config_dword(dev, 0x90, ctl);
> +	}
> +}
> +
> +/*
> + * If fix_up_north_bridges is set, the north bridges have to be fixed up on
> + * resume in the same way as they are handled in gart_iommu_hole_init().
> + */
> +static bool fix_up_north_bridges;
> +static u32 aperture_order;
> +static u32 aperture_alloc;
> +
> +void set_up_gart_resume(u32 aper_order, u32 aper_alloc)
> +{
> +	fix_up_north_bridges = true;
> +	aperture_order = aper_order;
> +	aperture_alloc = aper_alloc;
> +}
> +
>  static int gart_resume(struct sys_device *dev)
>  {
> +	printk(KERN_INFO "PCI-DMA: Resuming GART IOMMU\n");
> +
> +	if (fix_up_north_bridges) {
> +		int i;
> +
> +		for (i = 0; i < num_k8_northbridges; i++) {
> +			struct pci_dev *dev = k8_northbridges[i];
> +
> +			/*
> +			 * Don't enable translations just yet.  That is the next
> +			 * step.  Restore the pre-suspend aperture settings.
> +			 */
> +			pci_write_config_dword(dev, 0x90, aperture_order << 1);
> +			pci_write_config_dword(dev, 0x94, aperture_alloc >> 25);
> +		}
> +	}
> +
> +	enable_gart_translations();
> +
>  	return 0;
>  }
>  
>  static int gart_suspend(struct sys_device *dev, pm_message_t state)
>  {
> -	return -EINVAL;
> +	return 0;
>  }
>  
>  static struct sysdev_class gart_sysdev_class = {
> @@ -614,27 +670,14 @@ static __init int init_k8_gatt(struct ag
>  	memset(gatt, 0, gatt_size);
>  	agp_gatt_table = gatt;
>  
> -	for (i = 0; i < num_k8_northbridges; i++) {
> -		u32 gatt_reg;
> -		u32 ctl;
> -
> -		dev = k8_northbridges[i];
> -		gatt_reg = __pa(gatt) >> 12;
> -		gatt_reg <<= 4;
> -		pci_write_config_dword(dev, 0x98, gatt_reg);
> -		pci_read_config_dword(dev, 0x90, &ctl);
> -
> -		ctl |= 1;
> -		ctl &= ~((1<<4) | (1<<5));
> -
> -		pci_write_config_dword(dev, 0x90, ctl);
> -	}
> +	enable_gart_translations();
>  
>  	error = sysdev_class_register(&gart_sysdev_class);
>  	if (!error)
>  		error = sysdev_register(&device_gart);
>  	if (error)
>  		panic("Could not register gart_sysdev -- would corrupt data on next suspend");
> +
>  	flush_gart();
>  
>  	printk(KERN_INFO "PCI-DMA: aperture base @ %x size %u KB\n",
> Index: linux-2.6/arch/x86/kernel/aperture_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/aperture_64.c
> +++ linux-2.6/arch/x86/kernel/aperture_64.c
> @@ -413,4 +413,6 @@ void __init gart_iommu_hole_init(void)
>  		write_pci_config(0, num, 3, 0x90, aper_order<<1);
>  		write_pci_config(0, num, 3, 0x94, aper_alloc>>25);
>  	}
> +
> +	set_up_gart_resume(aper_order, aper_alloc);
>  }
> Index: linux-2.6/include/asm-x86/gart.h
> ===================================================================
> --- linux-2.6.orig/include/asm-x86/gart.h
> +++ linux-2.6/include/asm-x86/gart.h
> @@ -11,6 +11,7 @@ extern void gart_iommu_shutdown(void);
>  extern void __init gart_parse_options(char *);
>  extern void early_gart_iommu_check(void);
>  extern void gart_iommu_hole_init(void);
> +extern void set_up_gart_resume(u32, u32);
>  extern int fallback_aper_order;
>  extern int fallback_aper_force;
>  extern int gart_iommu_aperture;



* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-06-11 11:43                   ` >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption Patrick
@ 2008-06-11 14:38                     ` Rafael J. Wysocki
  2008-06-11 15:04                       ` Andi Kleen
  0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2008-06-11 14:38 UTC (permalink / raw)
  To: Patrick; +Cc: Pavel Machek, linux-kernel, Ingo Molnar, pm list, Andi Kleen

On Wednesday, 11 of June 2008, Patrick wrote:
> Hello

Hi,

> On Mon, 2008-06-09 at 00:36 +0200, Rafael J. Wysocki wrote:
> 
> > > > It's on top of the patch that adds the GART sysdev.
> 
> > Updated patch follows.  It has been tested a little on my new 4 GB test box on
> > which 2.6.26-rc4 failed miserably with severe consequences.  More testing
> > welcome, but please be careful.
> 
> No risk no fun! :-)
> 
> As the suspend-vs-iommu-prevent-suspend-if-we-could-not-resume.patch,
> where the GART sysdev is added, is now included in 2.6.26-rc5, I was
> able to apply this one seamlessly to it (-rc5 tree) and have just
> rebooted *, suspended to disk and ram once with all filesystems (>400GB)
> mounted rw ** and everything is _working fine_.
> 
> As usual, a kernel log, the applied patch and a test report are
> available at http://zefir.890m.com/kernel-testing/ .

Thanks for the testing!

Well, I was hoping to be able to get this patch into 2.6.26, as I don't really
like the temporary hack, preventing the affected systems from resuming
at all, that we have in there, but it seems to be too late. :-(

Hopefully, we'll get it into 2.6.27.

Thanks,
Rafael


> > ---
> > Add resume handling to GART IOMMU.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> >  arch/x86/kernel/aperture_64.c |    2 +
> >  arch/x86/kernel/pci-gart_64.c |   75 +++++++++++++++++++++++++++++++++---------
> >  include/asm-x86/gart.h        |    1 
> >  3 files changed, 62 insertions(+), 16 deletions(-)
> > 
> > Index: linux-2.6/arch/x86/kernel/pci-gart_64.c
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/kernel/pci-gart_64.c
> > +++ linux-2.6/arch/x86/kernel/pci-gart_64.c
> > @@ -549,14 +549,70 @@ static __init unsigned read_aperture(str
> >  	return aper_base;
> >  }
> >  
> > +static void enable_gart_translations(void)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < num_k8_northbridges; i++) {
> > +		struct pci_dev *dev;
> > +		u32 gatt_reg;
> > +		u32 ctl;
> > +
> > +		dev = k8_northbridges[i];
> > +		gatt_reg = __pa(agp_gatt_table) >> 12;
> > +		gatt_reg <<= 4;
> > +		pci_write_config_dword(dev, 0x98, gatt_reg);
> > +		pci_read_config_dword(dev, 0x90, &ctl);
> > +
> > +		ctl |= 1;
> > +		ctl &= ~((1<<4) | (1<<5));
> > +
> > +		pci_write_config_dword(dev, 0x90, ctl);
> > +	}
> > +}
> > +
> > +/*
> > + * If fix_up_north_bridges is set, the north bridges have to be fixed up on
> > + * resume in the same way as they are handled in gart_iommu_hole_init().
> > + */
> > +static bool fix_up_north_bridges;
> > +static u32 aperture_order;
> > +static u32 aperture_alloc;
> > +
> > +void set_up_gart_resume(u32 aper_order, u32 aper_alloc)
> > +{
> > +	fix_up_north_bridges = true;
> > +	aperture_order = aper_order;
> > +	aperture_alloc = aper_alloc;
> > +}
> > +
> >  static int gart_resume(struct sys_device *dev)
> >  {
> > +	printk(KERN_INFO "PCI-DMA: Resuming GART IOMMU\n");
> > +
> > +	if (fix_up_north_bridges) {
> > +		int i;
> > +
> > +		for (i = 0; i < num_k8_northbridges; i++) {
> > +			struct pci_dev *dev = k8_northbridges[i];
> > +
> > +			/*
> > +			 * Don't enable translations just yet.  That is the next
> > +			 * step.  Restore the pre-suspend aperture settings.
> > +			 */
> > +			pci_write_config_dword(dev, 0x90, aperture_order << 1);
> > +			pci_write_config_dword(dev, 0x94, aperture_alloc >> 25);
> > +		}
> > +	}
> > +
> > +	enable_gart_translations();
> > +
> >  	return 0;
> >  }
> >  
> >  static int gart_suspend(struct sys_device *dev, pm_message_t state)
> >  {
> > -	return -EINVAL;
> > +	return 0;
> >  }
> >  
> >  static struct sysdev_class gart_sysdev_class = {
> > @@ -614,27 +670,14 @@ static __init int init_k8_gatt(struct ag
> >  	memset(gatt, 0, gatt_size);
> >  	agp_gatt_table = gatt;
> >  
> > -	for (i = 0; i < num_k8_northbridges; i++) {
> > -		u32 gatt_reg;
> > -		u32 ctl;
> > -
> > -		dev = k8_northbridges[i];
> > -		gatt_reg = __pa(gatt) >> 12;
> > -		gatt_reg <<= 4;
> > -		pci_write_config_dword(dev, 0x98, gatt_reg);
> > -		pci_read_config_dword(dev, 0x90, &ctl);
> > -
> > -		ctl |= 1;
> > -		ctl &= ~((1<<4) | (1<<5));
> > -
> > -		pci_write_config_dword(dev, 0x90, ctl);
> > -	}
> > +	enable_gart_translations();
> >  
> >  	error = sysdev_class_register(&gart_sysdev_class);
> >  	if (!error)
> >  		error = sysdev_register(&device_gart);
> >  	if (error)
> >  		panic("Could not register gart_sysdev -- would corrupt data on next suspend");
> > +
> >  	flush_gart();
> >  
> >  	printk(KERN_INFO "PCI-DMA: aperture base @ %x size %u KB\n",
> > Index: linux-2.6/arch/x86/kernel/aperture_64.c
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/kernel/aperture_64.c
> > +++ linux-2.6/arch/x86/kernel/aperture_64.c
> > @@ -413,4 +413,6 @@ void __init gart_iommu_hole_init(void)
> >  		write_pci_config(0, num, 3, 0x90, aper_order<<1);
> >  		write_pci_config(0, num, 3, 0x94, aper_alloc>>25);
> >  	}
> > +
> > +	set_up_gart_resume(aper_order, aper_alloc);
> >  }
> > Index: linux-2.6/include/asm-x86/gart.h
> > ===================================================================
> > --- linux-2.6.orig/include/asm-x86/gart.h
> > +++ linux-2.6/include/asm-x86/gart.h
> > @@ -11,6 +11,7 @@ extern void gart_iommu_shutdown(void);
> >  extern void __init gart_parse_options(char *);
> >  extern void early_gart_iommu_check(void);
> >  extern void gart_iommu_hole_init(void);
> > +extern void set_up_gart_resume(u32, u32);
> >  extern int fallback_aper_order;
> >  extern int fallback_aper_force;
> >  extern int gart_iommu_aperture;
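
The register arithmetic in the quoted patch can be tried outside the kernel.
What follows is a minimal user-space sketch, not kernel code. The helper names
are made up for illustration; only the shifts and masks come from the patch
(0x90 gets aper_order << 1, 0x94 gets aper_alloc >> 25, 0x98 gets the GATT
page frame number shifted left by 4, and enabling translations sets bit 0 and
clears bits 4 and 5 of 0x90). The page-alignment check on the GATT address is
an added assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Value written to config register 0x90 on resume, before translations
 * are enabled: the aperture order sits just above bit 0. */
static uint32_t aperture_ctl_value(uint32_t aper_order)
{
	return aper_order << 1;
}

/* Value written to config register 0x94: the aperture base shifted right
 * by 25 bits, as in the patch. */
static uint32_t aperture_base_value(uint32_t aper_alloc)
{
	return aper_alloc >> 25;
}

/* Value written to config register 0x98: page frame number of the GATT,
 * shifted left by 4. Assumption: the table is 4 KB aligned. */
static uint32_t gatt_base_value(uint64_t gatt_phys)
{
	assert((gatt_phys & 0xfffu) == 0);
	return (uint32_t)((gatt_phys >> 12) << 4);
}

/* Mirrors the read-modify-write of register 0x90 in
 * enable_gart_translations(): set bit 0, clear bits 4 and 5. */
static uint32_t enable_translations(uint32_t ctl)
{
	ctl |= 1u;
	ctl &= ~((1u << 4) | (1u << 5));
	return ctl;
}
```

For instance, with an order of 1 and an aperture allocated at 0x20000000
(illustrative inputs), these helpers give 0x2 for register 0x90 before
enabling, 0x3 after enabling, and 0x10 for register 0x94.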


* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-06-11 14:38                     ` Rafael J. Wysocki
@ 2008-06-11 15:04                       ` Andi Kleen
  2008-07-03 17:35                         ` Patrick
  0 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2008-06-11 15:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Patrick, Pavel Machek, linux-kernel, Ingo Molnar, pm list


> Well, I was hoping to be able to get this patch into 2.6.26, as I don't really
> like the temporary hack, preventing the affected systems from resuming
> at all, that we have in there, but it seems to be too late. :-(

I agree with you that this patch would be far better than the hack.

-Andi


* Re: [PATCH] x86 GART: Add resume handling (was: Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption)
  2008-06-10 10:03                       ` Rafael J. Wysocki
@ 2008-06-12  9:34                         ` Ingo Molnar
  0 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2008-06-12  9:34 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Pavel Machek, the arch/x86 maintainers, pm list, LKML


* Rafael J. Wysocki <rjw@sisk.pl> wrote:

> > Okay, appended is the patch rebased on tip/x86/gart with (mainline) 
> > commit cd76374e9de4501acc74f833dc6cb5e7a5dca115 "suspend-vs-iommu: 
> > prevent suspend if we could not resume" (which appears to be missing 
> > from tip/x86/gart) applied.
> > 
> > This version of the patch doesn't break compilation, but it hasn't 
> > been really tested yet.
> 
> Now it has been (successfully) tested too. :-)

applied to tip/x86/gart, thanks Rafael.

	Ingo


* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-06-11 15:04                       ` Andi Kleen
@ 2008-07-03 17:35                         ` Patrick
  2008-08-07  8:17                           ` Pavel Machek
  0 siblings, 1 reply; 25+ messages in thread
From: Patrick @ 2008-07-03 17:35 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Rafael J. Wysocki, Pavel Machek, linux-kernel, Ingo Molnar,
	pm list

*bump*

Why "too late" ??? It (2.6.26) is still not out! Please (;-) oh please!)
reconsider / push it a bit! It's tested and working perfectly here ever
since. But I'd like for example vmware to work again as well, and it
doesn't yet even with 2.6.25... Please let this patch go into 2.6.26 so
there's a chance of having a stock kernel with working vmware, fglrx,
and such again soon!

Thx

Greets - Patrick

On Wed, 2008-06-11 at 17:04 +0200, Andi Kleen wrote:
> > Well, I was hoping to be able to get this patch into 2.6.26, as I don't really
> > like the temporary hack, preventing the affected systems from resuming
> > at all, that we have in there, but it seems to be too late. :-(
> 
> I agree with you that this patch would be far better than the hack.
> 
> -Andi



* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-07-03 17:35                         ` Patrick
@ 2008-08-07  8:17                           ` Pavel Machek
  2008-08-08 22:40                             ` Patrick
  0 siblings, 1 reply; 25+ messages in thread
From: Pavel Machek @ 2008-08-07  8:17 UTC (permalink / raw)
  To: Patrick; +Cc: Andi Kleen, Rafael J. Wysocki, linux-kernel, Ingo Molnar, pm list

Hi!

> *bump*
> 
> Why "too late" ??? It (2.6.26) is still not out! Please (;-) oh please!)
> reconsider / push it a bit! It's tested and working perfectly here ever
> since. But I'd like for example vmware to work again as well, and it
> doesn't yet even with 2.6.25... Please let this patch go into 2.6.26 so
> there's a chance of having a stock kernel with working vmware, fglrx,
> and such again soon!

Can you test 2.6.27-rc2? It should be all there...
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-08-07  8:17                           ` Pavel Machek
@ 2008-08-08 22:40                             ` Patrick
  2008-09-02  8:05                               ` Pavel Machek
  0 siblings, 1 reply; 25+ messages in thread
From: Patrick @ 2008-08-08 22:40 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andi Kleen, Rafael J. Wysocki, linux-kernel, Ingo Molnar, pm list

On Thu, 2008-08-07 at 10:17 +0200, Pavel Machek wrote:

> Can you test 2.6.27-rc2? It should be all there...

Yes, sure! I noticed that it's there already... thank you all!

So as you requested, I got myself a -rc2 tree, compiled and tested
#s2ram and #s2disk. It works!

pat@babar:~/tmp/dmesg$ grep -C3 GART dmesg.2.6.27-rc2-gart-suspend.txt

[    0.208013] PCI: Using ACPI for IRQ routing
[    0.228126] PCI-DMA: Disabling AGP.
[    0.228918] PCI-DMA: aperture base @ 20000000 size 65536 KB
[    0.228956] PCI-DMA: using GART IOMMU.
[    0.228993] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
[    0.229429] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0
[    0.229615] hpet0: 4 32-bit timers, 14318180 Hz
--
[   45.170754] [drm] Initialized drm 1.1.0 20060810
[   45.199305] pci 0000:01:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[   45.199447] [drm] Initialized radeon 1.29.0 20080528 on minor 0
[   46.127083] [drm] Setting GART location based on new memory map
[   46.128083] [drm] Loading RS690 Microcode
[   46.128116] [drm] Num pipes: 1
[   46.128123] [drm] writeback test succeeded in 1 usecs
--
[ 5341.902040]   groups: 0
[ 5341.902511] CPU1 is down
[ 5341.902511] Back to C!
[ 5341.902535] PCI-DMA: Resuming GART IOMMU
[ 5341.902537] PCI-DMA: Restoring GART aperture settings
[ 5341.902831] Enabling non-boot CPUs ...
[ 5341.903244] SMP alternatives: switching to SMP code
[ 5341.914231] Booting processor 1/1 ip 6000
--
[ 5778.246557] CPU1 is down
[ 5778.246726] PM: Creating hibernation image:
[ 5778.256008] PM: Need to copy 236714 pages
[ 5778.256008] PCI-DMA: Resuming GART IOMMU
[ 5778.256008] PCI-DMA: Restoring GART aperture settings
[ 5778.256008] Enabling non-boot CPUs ...
[ 5778.256008] SMP alternatives: switching to SMP code
[ 5778.267186] Booting processor 1/1 ip 6000
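
The "size 65536 KB" figure in the log above can be related to the aperture
order that the resume code restores. A minimal sketch, assuming the aperture
covers 32 MB << order (an assumption not shown in this thread) and that the
order is a 3-bit field, which is only inferred from the shift by one and the
flag at bit 4; the function names are hypothetical:

```c
#include <stdint.h>

/* Assumption: an aperture of the given order spans 32 MB << order.
 * The result is in KB, the unit used by the "aperture base" log line. */
static uint32_t aperture_size_kb(uint32_t order)
{
	return (32u * 1024u) << order;
}

/* Recover the order from a size in KB, or return -1 if the size does not
 * correspond to any order in the assumed 3-bit range 0..7. */
static int aperture_order_from_kb(uint32_t size_kb)
{
	for (int order = 0; order <= 7; order++) {
		if (aperture_size_kb((uint32_t)order) == size_kb)
			return order;
	}
	return -1;
}
```

Under these assumptions, 65536 KB corresponds to order 1.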

--------

full log and small test report is here:
http://zefir.890m.com/kernel-testing/
http://zefir.890m.com/kernel-testing/dmesg.2.6.27-rc2.txt

I have no clue what caused the oops (see dmesg), but I think it's
unrelated (look at the time stamps).

Maybe it's got something to do with the fact that I'm using
https://launchpad.net/~xorg-edgers xserver now to have 3d acceleration
working for my radeon X1200 (RS690 builtin) and the drm module versions
don't match. Xserver wants 1.30 but starts anyway. I can't fix this
because the script can't compile the
git://anongit.freedesktop.org/git/mesa/drm modules for 2.6.27-rc2 ....
This creates some unpredictable problems with the X server process when
I log into my account (can't find out what triggers it ... CPU usage
goes to 100%, have to reboot). As a workaround, I could log into a guest
account to do the tests, but I switched back to my patched 2.6.26-rc5
for now for everyday PC usage.

BTW, if somebody would like to explain to me why the simultaneous usage
of the GART by drm and as iommu doesn't make them bite each other, I'd be
quite interested to learn.

Patrick



* Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption
  2008-08-08 22:40                             ` Patrick
@ 2008-09-02  8:05                               ` Pavel Machek
  0 siblings, 0 replies; 25+ messages in thread
From: Pavel Machek @ 2008-09-02  8:05 UTC (permalink / raw)
  To: Patrick; +Cc: Andi Kleen, Rafael J. Wysocki, linux-kernel, Ingo Molnar, pm list

Hi!

> > Can you test 2.6.27-rc2? It should be all there...
> 
> Yes, sure! I noticed that it's there already... thank you all!
> 
> So as you requested, I got myself a -rc2 tree, compiled and tested
> #s2ram and #s2disk. It works!

Good.

> Maybe it's got something to do with the fact that I'm using
> https://launchpad.net/~xorg-edgers xserver now to have 3d acceleration
> working for my radeon X1200 (RS690 builtin) and the drm module versions
> don't match. Xserver wants 1.30 but starts anyway. I can't fix this
> because the script can't compile the
> git://anongit.freedesktop.org/git/mesa/drm modules for 2.6.27-rc2 ....
> This creates some unpredictable problems with the X server process when
> I log into my account (can't find out what triggers it ... CPU usage
> goes to 100%, have to reboot). As a workaround, I could log into a guest
> account to do the tests, but I switched back to my patched 2.6.26-rc5
> for now for everyday PC usage.

Yes, 3d has some problems.

> BTW, if somebody would like to explain to me why the simultaneous usage
> of the GART by drm and as iommu doesn't make them bite each other, I'd be
> quite interested to learn.

No idea, sorry.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


end of thread, other threads:[~2008-09-02  9:22 UTC | newest]

Thread overview: 25+ messages
2008-05-25 12:10 SB600 AHCI: Hard Disk Corruption Patrick
2008-05-25 12:16 ` Patrick
2008-05-25 17:38 ` Pavel Machek
2008-05-25 20:08   ` Patrick
2008-05-25 20:39     ` >3G => iommu => suspend problems -- was " Pavel Machek
2008-05-25 21:10       ` Pavel Machek
2008-05-26 15:31         ` Patrick
2008-05-27 11:22           ` Pavel Machek
2008-05-29 18:44             ` Patrick
2008-05-29 18:51               ` Patrick
2008-05-29 21:05               ` Patrick
2008-06-03 22:33             ` Rafael J. Wysocki
2008-06-06 13:20               ` Pavel Machek
2008-06-08 22:36                 ` Rafael J. Wysocki
     [not found]                   ` <20080609124630.GA28799@elte.hu>
2008-06-09 22:10                     ` [PATCH] x86 GART: Add resume handling (was: Re: >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption) Rafael J. Wysocki
2008-06-10 10:03                       ` Rafael J. Wysocki
2008-06-12  9:34                         ` Ingo Molnar
2008-06-11 11:43                   ` >3G => iommu => suspend problems -- was Re: SB600 AHCI: Hard Disk Corruption Patrick
2008-06-11 14:38                     ` Rafael J. Wysocki
2008-06-11 15:04                       ` Andi Kleen
2008-07-03 17:35                         ` Patrick
2008-08-07  8:17                           ` Pavel Machek
2008-08-08 22:40                             ` Patrick
2008-09-02  8:05                               ` Pavel Machek
2008-05-27 10:23         ` Pavel Machek
