* [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode
@ 2005-04-05 19:33 Ross Biro
2005-04-05 20:53 ` Randy.Dunlap
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Ross Biro @ 2005-04-05 19:33 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2430 bytes --]
Currently Linux 2.6 assumes the BIOS (or firmware) sets the master abort
mode flag on PCI bridge chips in a coherent fashion. This is not always
the case, and getting this flag wrong can cause hardware failures or
silent data corruption. This patch lets the user override the BIOS
master abort setting at boot time and lets the distro maintainer set a
default according to their target audience.
The comments in the patch are probably a bit too verbose, but I think it
is a good patch to start discussions around. If it is decided that
something should be done about this problem, this patch could be
included in a -mm release and migrate into Linus's kernel as appropriate.
This incarnation of the patch has had minimal testing. For our internal
kernels, we always force the master abort mode to 1 and then let the
device drivers for hardware we know can't handle target aborts switch
the master abort mode to 0. This does not seem appropriate for general
release.
Some background for those who do not spend most of their waking hours
exploring buses and what can go wrong.
The master abort flag tells a PCI bridge what to do when a bus master
behind the bridge requests the bus and the bridge is unable to get the
bus. With the flag clear, for master reads the bridge returns all
0xff's (hence silent data corruption) and for master writes, it throws
the data away. With the bit set, the bridge sends a target abort to the
master. This can only happen when the system is heavily loaded.
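To make the two behaviours concrete, here is a small user-space C model (not kernel code, and not part of the patch) of how a bridge might complete a read that ended in a master abort, depending on the Master Abort Mode bit in its Bridge Control word. The bit value 0x20 matches PCI_BRIDGE_CTL_MASTER_ABORT in the kernel's PCI register definitions. The enum and function names are hypothetical, and the model only covers reads, ignoring the separate handling of posted writes.

```c
#include <stdint.h>

/* Same value as the kernel's PCI_BRIDGE_CTL_MASTER_ABORT. */
#define PCI_BRIDGE_CTL_MASTER_ABORT 0x20

/* Hypothetical outcome of a forwarded read that ended in a master abort. */
enum read_outcome {
    READ_RETURNS_ALL_ONES, /* master sees 0xFFFFFFFF as if it were valid data */
    READ_TARGET_ABORT      /* master is told the transaction failed */
};

/*
 * Hypothetical model: how a bridge completes a master-aborted read given
 * its Bridge Control word. *data is written only in the all-ones case.
 */
static enum read_outcome bridge_complete_aborted_read(uint16_t bctl,
                                                      uint32_t *data)
{
    if (bctl & PCI_BRIDGE_CTL_MASTER_ABORT)
        return READ_TARGET_ABORT;      /* bit set: report the failure */

    *data = 0xffffffffu;               /* bit clear: silent all-0xff data */
    return READ_RETURNS_ALL_ONES;
}
```

With the bit clear, the caller cannot distinguish the returned 0xFFFFFFFF from real data, which is the silent-corruption case described above; with it set, the master at least learns that the read failed.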
The problem with always setting the bit is that some PCI hardware,
notably some Intel E-1000 chips (Ethernet controller: Intel Corporation:
Unknown device 1076) cannot properly handle a target abort. In
the case of the E-1000 chip, the driver must reset the chip to recover.
This usually leads to the machine being off the network for several
seconds, or sometimes even minutes, which can be bad for servers.
I even have a single motherboard with both a device that cannot handle
the target abort and an IDE controller that can handle the target abort
behind the same bridge. For this motherboard, I have to choose the
lesser of two evils, network hiccups or potential data corruption.
For the record, I have seen both occur. Other people may wish to
make a different choice than we did, hence this patch allows the user to
choose the mode at boot time.
Ross
[-- Attachment #2: master-abort.patch --]
[-- Type: text/plain, Size: 4486 bytes --]
diff -ur linux-2.6.11/drivers/pci/Kconfig linux-2.6.11-new/drivers/pci/Kconfig
--- linux-2.6.11/drivers/pci/Kconfig 2005-03-01 23:37:51.000000000 -0800
+++ linux-2.6.11-new/drivers/pci/Kconfig 2005-04-01 07:19:32.000000000 -0800
@@ -47,3 +47,38 @@
When in doubt, say Y.
+choice
+ prompt "Enable PCI Master Abort Mode"
+ depends on PCI
+ default PCI_MASTER_ABORT_DEFAULT
+ help
+ On PCI systems, when a bus is unavailable to a bus master, a
+ master abort occurs. Older bridges satisfy the master request
+ with all 0xFF's. This can lead to silent data corruption. Newer
+ bridges can send a target abort to the bus master. Some PCI
+ hardware cannot handle the target abort. Some x86 BIOSes configure
+ the buses in a suboptimal way. This option allows you to override
+ the BIOS setting. If unsure chose default. This choice can be
+ overridden at boot time with the pci_enable_master_abort={default,
+ enable, disable}
+
+config PCI_MASTER_ABORT_DEFAULT
+ bool "Default"
+ help
+ Choose this option if you are unsure, or believe your
+ firmware does the right thing.
+
+config PCI_MASTER_ABORT_ENABLE
+ bool "Enable"
+ help
+ Choose this option if it is more important for you to prevent
+ silent data loss than to have more hardware configurations work.
+
+
+config PCI_MASTER_ABORT_DISABLE
+ bool "Disable"
+ help
+ Choose this option if it is more important for you to have more
+ hardware configurations work than to prevent silent data loss.
+
+endchoice
diff -ur linux-2.6.11/drivers/pci/probe.c linux-2.6.11-new/drivers/pci/probe.c
--- linux-2.6.11/drivers/pci/probe.c 2005-03-01 23:38:13.000000000 -0800
+++ linux-2.6.11-new/drivers/pci/probe.c 2005-04-05 12:07:53.000000000 -0700
@@ -28,6 +28,15 @@
LIST_HEAD(pci_devices);
+/* used to force master abort mode on or off at runtime.
+ PCI_MASTER_ABORT_DEFAULT means leave alone, the BIOS got it correct.
+ PCI_MASTER_ABORT_ENABLE means turn it on everywhere.
+ PCI_MASTER_ABORT_DISABLE means turn it off everywhere.
+*/
+
+static int pci_enable_master_abort=PCI_MASTER_ABORT_VAL;
+
+
#ifdef HAVE_PCI_LEGACY
/**
* pci_create_legacy_files - create legacy I/O port and memory files
@@ -429,6 +438,20 @@
pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
+ /* Some BIOSes disable master abort mode, even though it's
+ usually a good thing (prevents silent data corruption).
+ Unfortunately some hardware (buggy e-1000 chips for
+ example) require Master Abort Mode to be off, or they will
+ not function properly. So we enable master abort mode
+ unless the user told us not to. The default value
+ for pci_enable_master_abort is set in the config file,
+ but can be overridden at setup time. */
+ if (pci_enable_master_abort == PCI_MASTER_ABORT_ENABLE) {
+ bctl |= PCI_BRIDGE_CTL_MASTER_ABORT;
+ } else if (pci_enable_master_abort == PCI_MASTER_ABORT_DISABLE) {
+ bctl &= ~PCI_BRIDGE_CTL_MASTER_ABORT;
+ }
+
pci_enable_crs(dev);
if ((buses & 0xffff00) && !pcibios_assign_all_busses() && !is_cardbus) {
@@ -932,6 +955,22 @@
kfree(b);
return NULL;
}
+
+static int __devinit pci_enable_master_abort_setup(char *str)
+{
+ if (strcmp(str, "enable") == 0) {
+ pci_enable_master_abort = PCI_MASTER_ABORT_ENABLE;
+ } else if (strcmp(str, "disable") == 0) {
+ pci_enable_master_abort = PCI_MASTER_ABORT_DISABLE;
+ } else if (strcmp(str, "default") == 0) {
+ pci_enable_master_abort = PCI_MASTER_ABORT_DEFAULT;
+ } else {
+ printk (KERN_ERR "PCI: Unknown Master Abort Mode (%s).", str);
+ }
+}
+
+__setup("pci_enable_master_abort=", pci_enable_master_abort_setup);
+
EXPORT_SYMBOL(pci_scan_bus_parented);
#ifdef CONFIG_HOTPLUG
diff -ur linux-2.6.11/include/linux/pci.h linux-2.6.11-new/include/linux/pci.h
--- linux-2.6.11/include/linux/pci.h 2005-03-01 23:38:08.000000000 -0800
+++ linux-2.6.11-new/include/linux/pci.h 2005-04-01 07:19:18.000000000 -0800
@@ -1064,5 +1064,17 @@
#define PCIPCI_VSFX 16
#define PCIPCI_ALIMAGIK 32
+#define PCI_MASTER_ABORT_DEFAULT 0
+#define PCI_MASTER_ABORT_ENABLE 1
+#define PCI_MASTER_ABORT_DISABLE 2
+
+#if defined(CONFIG_PCI_MASTER_ABORT_ENABLE)
+# define PCI_MASTER_ABORT_VAL PCI_MASTER_ABORT_ENABLE
+#elif defined(CONFIG_PCI_MASTER_ABORT_DISABLE)
+# define PCI_MASTER_ABORT_VAL PCI_MASTER_ABORT_DISABLE
+#else
+# define PCI_MASTER_ABORT_VAL PCI_MASTER_ABORT_DEFAULT
+#endif
+
#endif /* __KERNEL__ */
#endif /* LINUX_PCI_H */
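The heart of the patch is the three-way decision applied to bctl in probe.c. As far as I can tell from the 2.6.11 source, that code sits in pci_scan_bridge(), and bctl is written back to PCI_BRIDGE_CONTROL after the bus scan. The user-space sketch below mirrors the if/else so its effect on a BIOS-supplied value can be checked in isolation. The mode constants are the ones the patch adds to include/linux/pci.h; the function name master_abort_apply() is hypothetical.

```c
#include <stdint.h>

#define PCI_BRIDGE_CTL_MASTER_ABORT 0x20

/* Mode values as defined by the patch in include/linux/pci.h. */
#define PCI_MASTER_ABORT_DEFAULT 0  /* keep whatever the BIOS set */
#define PCI_MASTER_ABORT_ENABLE  1  /* force the bit on */
#define PCI_MASTER_ABORT_DISABLE 2  /* force the bit off */

/*
 * Hypothetical helper mirroring the patch: return the Bridge Control word
 * that would be written back once the bridge has been scanned.
 */
static uint16_t master_abort_apply(uint16_t bios_bctl, int mode)
{
    uint16_t bctl = bios_bctl;

    if (mode == PCI_MASTER_ABORT_ENABLE)
        bctl |= PCI_BRIDGE_CTL_MASTER_ABORT;
    else if (mode == PCI_MASTER_ABORT_DISABLE)
        bctl &= (uint16_t)~PCI_BRIDGE_CTL_MASTER_ABORT;
    /* PCI_MASTER_ABORT_DEFAULT: leave the BIOS value untouched. */

    return bctl;
}
```

Only bit 0x20 is affected. Every other bridge control bit the BIOS set passes through unchanged, which is why the patch edits bctl in place instead of writing a fixed value.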
* Re: [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode
2005-04-05 19:33 [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode Ross Biro
@ 2005-04-05 20:53 ` Randy.Dunlap
2005-04-06 12:47 ` Ross Biro
2005-04-06 20:44 ` Daniel Egger
2005-04-10 13:29 ` Andi Kleen
2 siblings, 1 reply; 13+ messages in thread
From: Randy.Dunlap @ 2005-04-05 20:53 UTC (permalink / raw)
To: Ross Biro; +Cc: linux-kernel
Ross Biro wrote:
>
> Currently Linux 2.6 assumes the BIOS (or firmware) sets the master abort
> mode flag on PCI bridge chips in a coherent fashion. This is not always
> the case, and getting this flag wrong can cause hardware failures or
> silent data corruption. This patch lets the user override the BIOS
> master abort setting at boot time and lets the distro maintainer set a
> default according to their target audience.
>
> The comments in the patch are probably a bit too verbose, but I think it
> is a good patch to start discussions around. If it is decided that
> something should be done about this problem, this patch could be
> included in a -mm release and migrate into Linus's kernel as appropriate.
The comments were helpful to me.
> This incarnation of the patch has had minimal testing. For our internal
> kernels, we always force the master abort mode to 1 and then let the
> device drivers for hardware we know can't handle target aborts switch
> the master abort mode to 0. This does not seem appropriate for general
> release.
>
> Some background for those who do not spend most of their waking hours
> exploring buses and what can go wrong.
Is this related (or could it be -- or should it be) at all to the
current discussion on the linux-pci mailing list
(linux-pci@atrey.karlin.mff.cuni.cz) about "PCI Error Recovery
API Proposal"?
> The master abort flag tells a PCI bridge what to do when a bus master
> behind the bridge requests the bus and the bridge is unable to get the
> bus. With the flag clear, for master reads the bridge returns all
> 0xff's (hence silent data corruption) and for master writes, it throws
> the data away. With the bit set, the bridge sends a target abort to the
> master. This can only happen when the system is heavily loaded.
or a PCI device isn't playing nicely?
> The problem with always setting the bit is that some PCI hardware,
> notably some Intel E-1000 chips (Ethernet controller: Intel Corporation:
> Unknown device 1076) cannot properly handle a target abort. In
> the case of the E-1000 chip, the driver must reset the chip to recover.
> This usually leads to the machine being off the network for several
> seconds, or sometimes even minutes, which can be bad for servers.
>
> I even have a single motherboard with both a device that cannot handle
> the target abort and an IDE controller that can handle the target abort
> behind the same bridge. For this motherboard, I have to choose the
> lesser of two evils, network hiccups or potential data corruption.
> For the record, I have seen both occur. Other people may wish to
> make a different choice than we did, hence this patch allows the user to
> choose the mode at boot time.
>
> Ross
>
> ------------------------------------------------------------------------
>
> diff -ur linux-2.6.11/drivers/pci/Kconfig linux-2.6.11-new/drivers/pci/Kconfig
> --- linux-2.6.11/drivers/pci/Kconfig 2005-03-01 23:37:51.000000000 -0800
> +++ linux-2.6.11-new/drivers/pci/Kconfig 2005-04-01 07:19:32.000000000 -0800
> @@ -47,3 +47,38 @@
>
> When in doubt, say Y.
>
> +choice
> + prompt "Enable PCI Master Abort Mode"
> + depends on PCI
> + default PCI_MASTER_ABORT_DEFAULT
> + help
> + On PCI systems, when a bus is unavailable to a bus master, a
> + master abort occurs. Older bridges satisfy the master request
> + with all 0xFF's. This can lead to silent data corruption. Newer
> + bridges can send a target abort to the bus master. Some PCI
> + hardware cannot handle the target abort. Some x86 BIOSes configure
> + the buses in a suboptimal way. This option allows you to override
^^^ extra spaces
> + the BIOS setting. If unsure chose default. This choice can be
choose
> + overridden at boot time with the pci_enable_master_abort={default,
> + enable, disable}
boot option.
> +
> +config PCI_MASTER_ABORT_DEFAULT
> + bool "Default"
> + help
> + Choose this option if you are unsure, or believe your
> + firmware does the right thing.
> +
> +config PCI_MASTER_ABORT_ENABLE
> + bool "Enable"
> + help
> + Choose this option if it is more important for you to prevent
> + silent data loss than to have more hardware configurations work.
^^^^ ??
> +
> +
> +config PCI_MASTER_ABORT_DISABLE
> + bool "Disable"
> + help
> + Choose this option if it is more important for you to have more
^^^^
The phrase "have more hardware configurations work" needs something...
Maybe add something like: "Some devices are known not to work with
PCI Master Aborts. If you have one of these devices, you probably
want to Disable this option."
> + hardware configurations work than to prevent silent data loss.
> +
> +endchoice
> diff -ur linux-2.6.11/drivers/pci/probe.c linux-2.6.11-new/drivers/pci/probe.c
> --- linux-2.6.11/drivers/pci/probe.c 2005-03-01 23:38:13.000000000 -0800
> +++ linux-2.6.11-new/drivers/pci/probe.c 2005-04-05 12:07:53.000000000 -0700
> @@ -28,6 +28,15 @@
>
> LIST_HEAD(pci_devices);
>
> +/* used to force master abort mode on or off at runtime.
> + PCI_MASTER_ABORT_DEFAULT means leave alone, the BIOS got it correct.
> + PCI_MASTER_ABORT_ENABLE means turn it on everywhere.
> + PCI_MASTER_ABORT_DISABLE means turn it off everywhere.
> +*/
> +
> +static int pci_enable_master_abort=PCI_MASTER_ABORT_VAL;
Nitpick: spaces around the '=' would enhance readability and be
appreciated.
> @@ -429,6 +438,20 @@
> pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
> bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
>
> + /* Some BIOSes disable master abort mode, even though it's
> + usually a good thing (prevents silent data corruption).
> + Unfortunately some hardware (buggy e-1000 chips for
> + example) require Master Abort Mode to be off, or they will
> + not function properly. So we enable master abort mode
> + unless the user told us not to. The default value
> + for pci_enable_master_abort is set in the config file,
> + but can be overridden at setup time. */
Nit #2: kernel long-comment style is:
/*
* line1
* line2
*/
> + if (pci_enable_master_abort == PCI_MASTER_ABORT_ENABLE) {
> + bctl |= PCI_BRIDGE_CTL_MASTER_ABORT;
> + } else if (pci_enable_master_abort == PCI_MASTER_ABORT_DISABLE) {
> + bctl &= ~PCI_BRIDGE_CTL_MASTER_ABORT;
> + }
> +
> pci_enable_crs(dev);
>
> if ((buses & 0xffff00) && !pcibios_assign_all_busses() && !is_cardbus) {
> @@ -932,6 +955,22 @@
> kfree(b);
> return NULL;
> }
> +
> +static int __devinit pci_enable_master_abort_setup(char *str)
Why __devinit? Looks to me like __init would be fine.
> +{
> + if (strcmp(str, "enable") == 0) {
> + pci_enable_master_abort = PCI_MASTER_ABORT_ENABLE;
> + } else if (strcmp(str, "disable") == 0) {
> + pci_enable_master_abort = PCI_MASTER_ABORT_DISABLE;
> + } else if (strcmp(str, "default") == 0) {
> + pci_enable_master_abort = PCI_MASTER_ABORT_DEFAULT;
> + } else {
> + printk (KERN_ERR "PCI: Unknown Master Abort Mode (%s).", str);
> + }
> +}
> +
> +__setup("pci_enable_master_abort=", pci_enable_master_abort_setup);
--
~Randy
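Folding Randy's comments into the boot-option handler might look like the sketch below. Two further fixes are also included. The posted version is declared int but returns nothing, whereas a __setup() handler should return 1 once it has consumed its option. The printk() also lacks a trailing newline. This is a user-space model: __init, __setup() and printk() are replaced by plain C and fprintf(). The function name and its int *mode out-parameter are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Mode values as defined by the patch in include/linux/pci.h. */
#define PCI_MASTER_ABORT_DEFAULT 0
#define PCI_MASTER_ABORT_ENABLE  1
#define PCI_MASTER_ABORT_DISABLE 2

/*
 * Hypothetical user-space model of pci_enable_master_abort_setup().
 * In the kernel it would be marked __init (it only runs while the command
 * line is parsed) and registered with __setup("pci_enable_master_abort=", ...).
 * Always returns 1 (option consumed); *mode is unchanged for unknown strings.
 */
static int pci_enable_master_abort_parse(const char *str, int *mode)
{
    if (strcmp(str, "enable") == 0)
        *mode = PCI_MASTER_ABORT_ENABLE;
    else if (strcmp(str, "disable") == 0)
        *mode = PCI_MASTER_ABORT_DISABLE;
    else if (strcmp(str, "default") == 0)
        *mode = PCI_MASTER_ABORT_DEFAULT;
    else
        fprintf(stderr, "PCI: Unknown Master Abort Mode (%s).\n", str);

    return 1;
}
```

Returning 1 even for an unrecognised value keeps a mistyped option from being passed on to init, while the error message still reports it.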
* Re: [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode
2005-04-05 20:53 ` Randy.Dunlap
@ 2005-04-06 12:47 ` Ross Biro
0 siblings, 0 replies; 13+ messages in thread
From: Ross Biro @ 2005-04-06 12:47 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: linux-kernel
Randy.Dunlap wrote:
>
>
> Is this related (or could it be -- or should it be) at all to the
> current discussion on the linux-pci mailing list
> (linux-pci@atrey.karlin.mff.cuni.cz) about "PCI Error Recovery
> API Proposal"?
I'm not familiar with the proposal, but this is not related to error
recovery, since master aborts are a way of life on the PCI bus and things
just need to deal with them. The only question is how.
>
>> the master. This can only happen when the system is heavily loaded.
>
>
> or a PCI device isn't playing nicely?
Yes, but at least in that case you could blame the device.
[ style and grammar comments noted ]
One thing I did fail to mention in my original post is that all of this
could be done by rc scripts from user space, but that seems unclean to me.
Ross
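Ross's remark that this could be done from user space can be illustrated with a rough sketch that flips the bit through a bridge's config-space file in sysfs, for example /sys/bus/pci/devices/0000:00:1e.0/config (the device address is only a placeholder). It assumes that file is writable by root on the running kernel. Offset 0x3e and bit 0x20 are PCI_BRIDGE_CONTROL and PCI_BRIDGE_CTL_MASTER_ABORT from the kernel's PCI register definitions. The function names are hypothetical, and nothing here coordinates with the kernel or other tools touching the same register.

```c
#define _XOPEN_SOURCE 700   /* for pread()/pwrite() */
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PCI_BRIDGE_CONTROL          0x3e  /* 16-bit register offset */
#define PCI_BRIDGE_CTL_MASTER_ABORT 0x20  /* Master Abort Mode bit */

/*
 * Hypothetical helper: read-modify-write the Master Abort Mode bit, given
 * an fd opened O_RDWR on a bridge's sysfs "config" file.
 * Returns 0 on success, -1 on a short read/write or I/O error.
 */
static int bridge_set_master_abort(int fd, int enable)
{
    unsigned char buf[2];
    uint16_t bctl;

    if (pread(fd, buf, sizeof(buf), PCI_BRIDGE_CONTROL) != (ssize_t)sizeof(buf))
        return -1;

    /* PCI config space is little-endian regardless of host byte order. */
    bctl = (uint16_t)(buf[0] | (buf[1] << 8));

    if (enable)
        bctl |= PCI_BRIDGE_CTL_MASTER_ABORT;
    else
        bctl &= (uint16_t)~PCI_BRIDGE_CTL_MASTER_ABORT;

    buf[0] = (unsigned char)(bctl & 0xff);
    buf[1] = (unsigned char)(bctl >> 8);

    if (pwrite(fd, buf, sizeof(buf), PCI_BRIDGE_CONTROL) != (ssize_t)sizeof(buf))
        return -1;
    return 0;
}

/* Hypothetical convenience wrapper: open the config file, update it, close it. */
static int bridge_set_master_abort_path(const char *config_path, int enable)
{
    int fd = open(config_path, O_RDWR);
    int ret;

    if (fd < 0)
        return -1;
    ret = bridge_set_master_abort(fd, enable);
    if (close(fd) != 0)
        ret = -1;
    return ret;
}
```

An rc script would call something like this once per affected bridge after boot. That step is exactly what Ross considers unclean compared with setting the bit while the kernel scans the bus.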
* Re: [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode
2005-04-05 19:33 [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode Ross Biro
2005-04-05 20:53 ` Randy.Dunlap
@ 2005-04-06 20:44 ` Daniel Egger
2005-04-10 13:29 ` Andi Kleen
2 siblings, 0 replies; 13+ messages in thread
From: Daniel Egger @ 2005-04-06 20:44 UTC (permalink / raw)
To: Ross Biro; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 732 bytes --]
On 05.04.2005, at 21:33, Ross Biro wrote:
> The problem with always setting the bit is that some PCI hardware,
> notably some Intel E-1000 chips (Ethernet controller: Intel
> Corporation: Unknown device 1076) cannot properly handle a target
> abort. In the case of the E-1000 chip, the driver must reset the
> chip to recover. This usually leads to the machine being off the
> network for several seconds, or sometimes even minutes, which can be
> bad for servers.
This sounds *exactly* like my problem since I swapped
motherboards. I'll see whether there's some option in
the BIOS that fixes it and if not bite the bullet and
compile a generic kernel....
Thanks a lot for investigating this.
Servus,
Daniel
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 186 bytes --]
* Re: [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode
2005-04-05 19:33 [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode Ross Biro
2005-04-05 20:53 ` Randy.Dunlap
2005-04-06 20:44 ` Daniel Egger
@ 2005-04-10 13:29 ` Andi Kleen
[not found] ` <8783be66050412075218b2b0b0@mail.gmail.com>
2 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2005-04-10 13:29 UTC (permalink / raw)
To: Ross Biro; +Cc: linux-kernel
Ross Biro <rossb@google.com> writes:
>
> I even have a single motherboard with both a device that cannot handle
> the target abort and an IDE controller that can handle the target
> abort behind the same bridge. For this motherboard, I have to choose
> the lesser of two evils, network hiccups or potential data corruption.
> For the record, I have seen both occur. Other people may wish to
> make a different choice than we did, hence this patch allows the user
> to choose the mode at boot time.
I think it is totally wrong to make these Config and boot options.
Nobody can do anything with such obscure boot configurations
and it is bad to require kernel recompiles for such things.
The right way to do this would be to have sysfs knobs that allow
changing these bits, and then let a user space tool change
them depending on PCI-ID. If the issue is critical enough
that it happens very often, then it should be added to the kernel
PCI quirks, but again be unconditional.
-Andi
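Andi's alternative leaves the policy to a user-space tool keyed by PCI ID. A minimal sketch of that decision step follows. The table layout and function names are hypothetical, and the only entry is the Intel (vendor 0x8086) device 1076 Ross reported. A real tool would also have to discover which devices sit behind which bridge; that part is omitted here.

```c
#include <stddef.h>
#include <stdint.h>

struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

/*
 * Hypothetical list of devices known to mishandle target aborts.
 * The single entry is the E-1000 variant mentioned in this thread.
 */
static const struct pci_id no_target_abort[] = {
    { 0x8086, 0x1076 },
};

/* Return 1 if id appears in the known-bad list, else 0. */
static int id_in_list(struct pci_id id)
{
    size_t i;

    for (i = 0; i < sizeof(no_target_abort) / sizeof(no_target_abort[0]); i++)
        if (no_target_abort[i].vendor == id.vendor &&
            no_target_abort[i].device == id.device)
            return 1;
    return 0;
}

/*
 * Hypothetical policy for one bridge: enable Master Abort Mode (return 1)
 * unless any device behind it is on the known-bad list (return 0).
 */
static int want_master_abort(const struct pci_id *behind, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (id_in_list(behind[i]))
            return 0;
    return 1;
}
```

This matches the policy Ross describes for his internal kernels (prefer the abort, and turn it off only where known-bad hardware is present). The difference is that the knowledge lives in a table a distro can update without rebuilding the kernel.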
Thread overview: 13+ messages
2005-04-05 19:33 [RFC/Patch 2.6.11] Take control of PCI Master Abort Mode Ross Biro
2005-04-05 20:53 ` Randy.Dunlap
2005-04-06 12:47 ` Ross Biro
2005-04-06 20:44 ` Daniel Egger
2005-04-10 13:29 ` Andi Kleen
[not found] ` <8783be66050412075218b2b0b0@mail.gmail.com>
2005-04-13 18:37 ` Andi Kleen
2005-04-13 23:00 ` Ross Biro
2005-04-13 23:28 ` Dave Jones
2005-04-14 17:25 ` Ross Biro
2005-04-14 17:34 ` Tim Hockin
2005-04-14 18:02 ` Andi Kleen
2005-04-14 18:33 ` Dave Jones
2005-04-14 19:14 ` Daniel Egger