* [ath9k-devel] [RFC v2] Serialization of IO
From: Luis R. Rodriguez @ 2009-02-11 8:07 UTC
To: ath9k-devel

So I've gone back to the drawing board and reviewed this issue as
thoroughly as I can. The issue is that PCI reads/writes can overlap with
each other (not just writes). This shouldn't generally be a problem, but
if some reads take a while, for example, another read/write could be on
its way from another CPU, and at least our PCI 11n devices get angry
when that happens. Some PCI hosts don't seem to do this, but others do.
Note that this issue is not present on our pre-802.11n devices or on our
new 11n PCI-Express devices.

So, with that clarified, here's a second attempt at serialization. The
first patch wasn't doing anything because we never initialized
ah->config.serialize_regmode. We now initialize it, but only on non-UP
(SMP) systems. The last patch in the series is perhaps overkill, but it
would deal with the rare case of a system that comes up UP and later has
a second CPU hotplugged. It may also help with suspend, but don't quote
me on that yet.

Anyway, here's the latest stab at it:

http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2009-02-11/serialization-v2.patch

This applies against today's wireless-testing/compat-wireless updates.

Please test and let me know whether it corrects the ath9k issues seen
with PCI devices on HT/multi-CPU systems.

Known issue: a ping flood in a terminal makes it painful to come back.

I've been trying to find a neater way to guarantee serialization, but so
far this is what I have. I do wonder, for example, whether some of the
atomic.h helpers (atomic_inc_and_test()) might let us somehow serialize
CPU entry into a read/write. Although they're not designed for that, it
may be worth considering.
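For readers wondering what "serializing CPU entry" with atomic_inc_and_test() could look like, here is a minimal userspace sketch in C11. It is not taken from the patch. model_atomic_inc_and_test(), io_gate and gate_stress() are hypothetical names, and C11 atomics stand in for the kernel's atomic_t.

The gate works like this:

* A counter starts at -1.
* Only the CPU whose increment lands on zero is admitted.
* Every other CPU undoes its increment and retries.

This does give mutual exclusion, but it is really a hand-rolled spinlock with no fairness, and waiting CPUs can starve. That supports the suspicion that atomic_inc_and_test() is not designed for this job.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's atomic_inc_and_test(): add 1
 * atomically and report whether the new value is zero. */
bool model_atomic_inc_and_test(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1 == 0;
}

/* Hypothetical gate: -1 means free; the CPU whose increment reaches 0
 * owns it. Invariant: io_gate == -1 + owners + CPUs mid-attempt, so the
 * increment only hits zero for a CPU that is alone. */
atomic_int io_gate = -1;

void io_gate_enter(void)
{
	while (!model_atomic_inc_and_test(&io_gate)) {
		atomic_fetch_sub(&io_gate, 1); /* lost: undo our increment */
		sched_yield();                 /* back off before retrying */
	}
}

void io_gate_exit(void)
{
	int prev = atomic_fetch_sub(&io_gate, 1);

	assert(prev >= 0); /* we must have owned the gate */
	(void)prev;
}

static long shared_counter; /* plain long, protected only by io_gate */

static void *gate_worker(void *arg)
{
	long iters = *(const long *)arg;

	for (long i = 0; i < iters; i++) {
		io_gate_enter();
		shared_counter++; /* non-atomic read-modify-write */
		io_gate_exit();
	}
	return NULL;
}

/* Returns the final counter. It equals nthreads * iters only if the gate
 * really excluded concurrent entry. nthreads is capped at 8. */
long gate_stress(int nthreads, long iters)
{
	pthread_t t[8];
	int started = 0;

	shared_counter = 0;
	if (nthreads > 8)
		nthreads = 8;
	for (int i = 0; i < nthreads; i++)
		if (pthread_create(&t[started], NULL, gate_worker, &iters) == 0)
			started++;
	for (int i = 0; i < started; i++)
		pthread_join(t[i], NULL);
	return shared_counter;
}
```

In the kernel itself, a spinlock_t taken with spin_lock_irqsave() gives the same exclusion. It also keeps interrupt handlers on the same CPU out of the critical section, which this model does not attempt.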
I also saw some of the most evil code I've seen lately in
drivers/pci/quirks.c, and wondered whether there was a fix there we could
re-use, but didn't see anything. If you have any other ideas, please let
me know.

Luis
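One common way to serialize register access is to take a lock around every register read and write, gated by a flag that plays the role of ah->config.serialize_regmode. The sketch below assumes that design. It is not the code in serialization-v2.patch.

The following names and stand-ins are hypothetical:

* struct hw_model, reg_access() and stress() are illustrative names.
* A pthread mutex stands in for a kernel spinlock_t.
* A plain array stands in for the device's mapped PCI registers.

The in_flight counter records how many accesses overlap, so a test can check that serialization caps the overlap at one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NREGS 16

/* Hypothetical userspace model of one device's register window. */
struct hw_model {
	volatile uint32_t regs[NREGS]; /* stands in for the mapped PCI BAR */
	bool serialize;                /* plays the role of serialize_regmode */
	pthread_mutex_t lock;          /* kernel: spinlock_t + spin_lock_irqsave() */
	atomic_int in_flight;          /* accesses currently "on the bus" */
	atomic_int max_in_flight;      /* worst overlap observed */
};

void hw_init(struct hw_model *hw, bool serialize)
{
	for (size_t i = 0; i < NREGS; i++)
		hw->regs[i] = 0;
	hw->serialize = serialize;
	pthread_mutex_init(&hw->lock, NULL);
	atomic_init(&hw->in_flight, 0);
	atomic_init(&hw->max_in_flight, 0);
}

/* Bookkeeping only: record how many accesses overlap right now. */
static void track_enter(struct hw_model *hw)
{
	int now = atomic_fetch_add(&hw->in_flight, 1) + 1;
	int seen = atomic_load(&hw->max_in_flight);

	while (now > seen &&
	       !atomic_compare_exchange_weak(&hw->max_in_flight, &seen, now))
		; /* on failure, seen is refreshed; retry while we are larger */
}

/* One register read (write == false) or write, serialized if enabled. */
uint32_t reg_access(struct hw_model *hw, size_t reg, bool write, uint32_t val)
{
	assert(reg < NREGS);
	if (hw->serialize)
		pthread_mutex_lock(&hw->lock);
	track_enter(hw);
	if (write)
		hw->regs[reg] = val;
	else
		val = hw->regs[reg];
	atomic_fetch_sub(&hw->in_flight, 1);
	if (hw->serialize)
		pthread_mutex_unlock(&hw->lock);
	return val;
}

static void *hammer(void *arg)
{
	struct hw_model *hw = arg;

	for (uint32_t i = 0; i < 20000; i++) {
		reg_access(hw, i % NREGS, true, i);
		(void)reg_access(hw, i % NREGS, false, 0);
	}
	return NULL;
}

/* Run up to 8 concurrent hammers and return the worst overlap seen.
 * Only call with serialize == true: unserialized concurrent access to
 * regs would be a data race in the C memory model. */
int stress(struct hw_model *hw, int nthreads)
{
	pthread_t t[8];
	int started = 0;

	if (nthreads > 8)
		nthreads = 8;
	for (int i = 0; i < nthreads; i++)
		if (pthread_create(&t[started], NULL, hammer, hw) == 0)
			started++;
	for (int i = 0; i < started; i++)
		pthread_join(t[i], NULL);
	return atomic_load(&hw->max_in_flight);
}
```

Leaving serialize false skips the lock entirely. That mirrors why, as the first message explains, the first patch had no effect while ah->config.serialize_regmode was never initialized.

Serializing each access individually does not make a read-modify-write sequence atomic. That would need the lock held across both accesses.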
* [ath9k-devel] [RFC v2] Serialization of IO
From: Luis R. Rodriguez @ 2009-02-15 1:29 UTC
To: ath9k-devel

On Wed, Feb 11, 2009 at 12:07 AM, Luis R. Rodriguez
<lrodriguez@atheros.com> wrote:
> Anyway, here's the latest stab at it:
>
> http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2009-02-11/serialization-v2.patch

Can someone who is able to reproduce the SMP issue please try these
patches?

Luis
* [ath9k-devel] [RFC v2] Serialization of IO
From: W. van den Akker @ 2009-02-16 10:18 UTC
To: ath9k-devel

> Can someone who is able to reproduce the SMP issue please try these
> patches?
>
> Luis

Hi Luis,

I am currently on holiday. I have patched the system, but ran into some
issues with udev, and my notebook also wouldn't connect to the AP
anymore. I haven't had time to dig into it yet; I will investigate it
further on Sunday.

Is it possible to create the patch against the mainstream RC instead of
the RC4-wl? Then I can test it faster.

Sorry for the delay.

greetings,
Willem

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.
* [ath9k-devel] [RFC v2] Serialization of IO
From: W. van den Akker @ 2009-02-22 21:51 UTC
To: ath9k-devel

On Monday 16 February 2009 11:18:28 W. van den Akker wrote:
> Is it possible to create the patch against the mainstream RC instead of
> the RC4-wl? Then I can test it faster.

I grabbed the latest wireless-testing git today and applied the patch.
The server has now been running for almost 2 hours with two CPUs. No
problems found yet. I have stressed the system, but there were no
hangups.

The previous problems with udev are also gone with this testing version.

So... it looks like the patch is working for SMP systems.

I will report tomorrow whether it's still up and running, but it looks
promising.

Greetings,
Willem
* [ath9k-devel] [RFC v2] Serialization of IO
From: Luis R. Rodriguez @ 2009-02-23 17:31 UTC
To: ath9k-devel

On Sun, Feb 22, 2009 at 01:51:38PM -0800, W. van den Akker wrote:
> So.... it looks like the patch is working for SMP systems

Willem,

thanks for the feedback so far; you're the first to report back that the
patches cured your issues. I believe you may have also enabled
maxcpus=1 before, so I just want to confirm that, if you did, you removed
it from your GRUB config for the shiny new kernel + serialization
patches.

Luis
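When reporting results, it helps to state both things asked about here: that maxcpus= is gone from the kernel command line, and how many CPUs are online. Below is a small C sketch that checks both on Linux. cmdline_has_param() and report_cpu_setup() are hypothetical helpers, and the parser deliberately ignores quoted parameter values that contain spaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* True if the kernel command line contains `name`, either bare ("name")
 * or with a value ("name=..."). Simplification: quoted values that
 * contain spaces are not handled. */
bool cmdline_has_param(const char *cmdline, const char *name)
{
	size_t n = strlen(name);
	const char *p = cmdline;

	assert(n > 0);
	while (*p) {
		p += strspn(p, " \t\n");          /* skip separators */
		if (!*p)
			break;
		size_t len = strcspn(p, " \t\n"); /* length of this token */
		if (len >= n && strncmp(p, name, n) == 0 &&
		    (len == n || p[n] == '='))
			return true;
		p += len;
	}
	return false;
}

/* Print the two facts worth stating in a test report. Returns -1 if
 * /proc/cmdline cannot be read (e.g. not running on Linux). */
int report_cpu_setup(void)
{
	char buf[4096] = "";
	FILE *f = fopen("/proc/cmdline", "r");

	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL)
		buf[0] = '\0';
	fclose(f);
	printf("maxcpus= on kernel command line: %s\n",
	       cmdline_has_param(buf, "maxcpus") ? "yes" : "no");
	printf("online CPUs: %ld\n", sysconf(_SC_NPROCESSORS_ONLN));
	return 0;
}
```

Calling report_cpu_setup() from a trivial main() prints two things: a yes/no for maxcpus=, and the online CPU count reported by sysconf(_SC_NPROCESSORS_ONLN). Matching whole tokens, rather than searching for the substring, avoids false positives on parameters that merely contain "maxcpus".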
* [ath9k-devel] [RFC v2] Serialization of IO
From: W. van den Akker @ 2009-02-23 17:45 UTC
To: ath9k-devel

On Monday 23 February 2009 18:31:46 Luis R. Rodriguez wrote:
> I believe you may have also enabled
> maxcpus=1 before

Correct. I downloaded the latest wireless-testing, applied the patch and
removed maxcpus=1 from the GRUB config. The server is showing 2 CPUs,
and it has now been up and running for almost 24 hours without any
problems.

gr,
Willem
* [ath9k-devel] [RFC v2] Serialization of IO
From: Luis R. Rodriguez @ 2009-02-23 17:48 UTC
To: ath9k-devel

On Mon, Feb 23, 2009 at 9:45 AM, W. van den Akker <listsrv@wilsoft.nl> wrote:
> The server is showing 2 CPUs.
> Now almost 24 hours up and running without any problems.

Thanks for the confirmation. Are you using an AR5416 PCI card? Just want
to confirm that as well.

Luis
* [ath9k-devel] [RFC v2] Serialization of IO
From: W. van den Akker @ 2009-02-23 18:01 UTC
To: ath9k-devel

On Monday 23 February 2009 18:48:09 Luis R. Rodriguez wrote:
> Thanks for the confirmation -- are you using AR5416 PCI? Just want to
> confirm as well.

Yes, it's a WMP300N v2 PCI card with an AR5416.
* [ath9k-devel] [RFC v2] Serialization of IO From: W. van den Akker @ 2009-02-23 21:08 UTC To: ath9k-devel
On Monday 23 February 2009 18:48:09 Luis R. Rodriguez wrote: > On Mon, Feb 23, 2009 at 9:45 AM, W. van den Akker <listsrv@wilsoft.nl> wrote: > > On Monday 23 February 2009 18:31:46 Luis R. Rodriguez wrote: > >> On Sun, Feb 22, 2009 at 01:51:38PM -0800, W. van den Akker wrote: > >> > On Monday 16 February 2009 11:18:28 W. van den Akker wrote: > >> > > > On Wed, Feb 11, 2009 at 12:07 AM, Luis R. Rodriguez <lrodriguez@atheros.com> wrote: > >> > > >> So I've gone back to the drawing board, and reviewed this issue > >> > > >> as thoroughly as I can. The issue is PCI reads/writes can overlap > >> > > >> with each other (not just writes). This shouldn't generally be an > >> > > >> issue but if some reads take a while, for example, there could be > >> > > >> another read/write on its way on another CPU and at least for our > >> > > >> PCI 11n devices that will make them angry. Some PCI hosts don't > >> > > >> seem to do this but some others do. It should be noted this issue > >> > > >> is not present on our pre-802.11n devices or our new 11n > >> > > >> PCI-express devices. > >> > > >> > >> > > >> So with that clarified, here's a second attempt at serialization. > >> > > >> The first patch wasn't doing anything because we never > >> > > >> initialized ah->config.serialize_regmode. We do that now only on > >> > > >> non-UP systems. The last patch in the series is perhaps overkill > >> > > >> -- but it would deal with the rare case of a UP system coming up and > >> > > >> you hotplugging a second CPU later. It may also help with > >> > > >> suspend, but don't quote me on that yet. > >> > > >> > >> > > >> Anyway, here's the latest stab at it: > >> > > >> > >> > > >> http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2009-02-11/serialization-v2.patch > >> > > >> > >> > > >> This applies against today's wireless-testing/compat-wireless > >> > > >> updates. > >> > > >> > >> > > >> Please test and let me know if ath9k with PCI devices on > >> > > >> HT/Multi-CPU issues are corrected by it. > >> > > >> > >> > > >> Known issue: ping flood in a terminal makes it painful to come > >> > > >> back. > >> > > >> > >> > > >> I've been trying to look for a neater way to guarantee > >> > > >> serialization, but so far this is what I have. I do wonder, for example, if some > >> > > >> of the atomic.h (atomic_inc_and_test()) stuff may let us use it > >> > > >> to somehow serialize CPU entry into a read/write. Although it's > >> > > >> not designed for it, it may be worth considering. I also saw some of the > >> > > >> most evil code I've seen lately in drivers/pci/quirks.c and did > >> > > >> wonder if there was a fix we can re-use through there but didn't > >> > > >> see anything. If you have any other ideas, please let me > >> > > >> know. > >> > > > > >> > > > Can someone who is able to reproduce the SMP issue please try > >> > > > these patches? > >> > > > > >> > > > Luis > >> > > > _______________________________________________ > >> > > > ath9k-devel mailing list > >> > > > ath9k-devel at lists.ath9k.org > >> > > > https://lists.ath9k.org/mailman/listinfo/ath9k-devel > >> > > > >> > > Hi Luis, > >> > > > >> > > I am currently on holiday. I have patched the system but had some > >> > > issues with UDEV, and also my notebook wouldn't connect > >> > > to the AP anymore. > >> > > I have not had the time to dig into it yet. > >> > > Sunday I will investigate it further. > >> > > > >> > > Is it possible to create the patch against the mainstream RC instead > >> > > of the RC4-wl? Then I can test it faster. > >> > > > >> > > Sorry for the delay. > >> > > >> > I grabbed the latest git-testing today and applied the patch. > >> > The server has been running for almost 2 hours now with a dual CPU. No > >> > problems found yet. I have stressed the system, but no hangups. > >> > > >> > Also, the previous problems with UDEV are gone with this testing > >> > version. > >> > > >> > So... it looks like the patch is working for SMP systems. > >> > > >> > I will report tomorrow if it's still up and running. But it looks > >> > promising. > >> > >> Willem, > >> > >> thanks for the feedback so far -- you're the first to report back > >> success on the patches curing your issues. I believe you may have also > >> enabled maxcpus=1 before, so I just want to confirm that if you did have > >> that, you removed it from your grub conf for the shiny new kernel > >> + serialization patches. > >> > >> Luis > > > > Correct, I downloaded the latest wireless-testing, applied the patch and > > removed the maxcpus=1 from the grub conf. > > The server is showing 2 CPUs. > > Now almost 24 hours up and running without any problems. > > Thanks for the confirmation -- are you using AR5416 PCI? Just want to > confirm as well. > > Luis
hmmm, After 23.5 hours of problem-free operation the server hung, also without any trace or oops. Close but no cigar... anyhow... I think we are almost there. gr, Willem
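The serialization described in the quoted RFC (every register read and write taken under one lock, armed through ah->config.serialize_regmode only on non-UP systems) can be illustrated with a small userspace C sketch. Everything in it is an assumption for illustration: struct hw, hw_init(), reg_read(), reg_write(), stress() and the fake register array are hypothetical stand-ins, not ath9k code; a pthread mutex plays the role a kernel spinlock would; sysconf(_SC_NPROCESSORS_ONLN) stands in for however the driver decides it is not on a UP system; and the in_flight counters exist only to show that no two accesses overlap once serialization is on.

```c
/* Hypothetical userspace sketch; none of these names are ath9k code. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

#define NUM_REGS 16

struct hw {
    bool serialize_regmode;          /* plays the role of ah->config.serialize_regmode */
    pthread_mutex_t reg_lock;        /* a kernel driver would use a spinlock here */
    _Atomic uint32_t regs[NUM_REGS]; /* fake register space */
    atomic_int in_flight;            /* register accesses currently in progress */
    atomic_int max_in_flight;        /* worst overlap observed so far */
};

static void hw_init(struct hw *hw, bool force_serialize)
{
    /* Arm serialization only when more than one CPU is online, or when forced. */
    hw->serialize_regmode = force_serialize || sysconf(_SC_NPROCESSORS_ONLN) > 1;
    pthread_mutex_init(&hw->reg_lock, NULL);
    for (int i = 0; i < NUM_REGS; i++)
        atomic_init(&hw->regs[i], 0);
    atomic_init(&hw->in_flight, 0);
    atomic_init(&hw->max_in_flight, 0);
}

/* Bookkeeping only: record how many accesses overlap right now. */
static void bus_enter(struct hw *hw)
{
    int now = atomic_fetch_add(&hw->in_flight, 1) + 1;
    int max = atomic_load(&hw->max_in_flight);
    /* On failure the CAS reloads max, so the loop ends once max >= now. */
    while (now > max && !atomic_compare_exchange_weak(&hw->max_in_flight, &max, now))
        ;
}

static uint32_t reg_read(struct hw *hw, unsigned int reg)
{
    bool ser = hw->serialize_regmode; /* sample once so lock and unlock pair up */
    uint32_t val;

    assert(reg < NUM_REGS);
    if (ser)
        pthread_mutex_lock(&hw->reg_lock);
    bus_enter(hw);
    val = atomic_load_explicit(&hw->regs[reg], memory_order_relaxed);
    atomic_fetch_sub(&hw->in_flight, 1);
    if (ser)
        pthread_mutex_unlock(&hw->reg_lock);
    return val;
}

static void reg_write(struct hw *hw, unsigned int reg, uint32_t val)
{
    bool ser = hw->serialize_regmode;

    assert(reg < NUM_REGS);
    if (ser)
        pthread_mutex_lock(&hw->reg_lock);
    bus_enter(hw);
    atomic_store_explicit(&hw->regs[reg], val, memory_order_relaxed);
    atomic_fetch_sub(&hw->in_flight, 1);
    if (ser)
        pthread_mutex_unlock(&hw->reg_lock);
}

struct worker_arg { struct hw *hw; unsigned int reg; int iters; };

/* Each thread read-modify-writes its own register, like a busy CPU would. */
static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (int i = 0; i < a->iters; i++)
        reg_write(a->hw, a->reg, reg_read(a->hw, a->reg) + 1);
    return NULL;
}

/* Hammer the fake device from several threads; return the worst overlap seen. */
static int stress(struct hw *hw, int nthreads, int iters)
{
    pthread_t tid[8];
    struct worker_arg args[8];

    assert(nthreads > 0 && nthreads <= 8);
    for (int t = 0; t < nthreads; t++) {
        args[t] = (struct worker_arg){ hw, (unsigned int)t, iters };
        pthread_create(&tid[t], NULL, worker, &args[t]);
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);
    return atomic_load(&hw->max_in_flight);
}
```

With hw_init(&hw, true), stress(&hw, 4, 10000) should report a worst-case overlap of 1, and because each thread only touches its own register, registers 0 to 3 each end at 10000. The flag is read once per access so a lock is never released without having been taken; actually flipping it at runtime (the CPU-hotplug case the RFC mentions) would also require making the flag itself atomic, which this sketch does not do.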
* [ath9k-devel] [RFC v2] Serialization of IO From: Luis R. Rodriguez @ 2009-02-23 23:58 UTC To: ath9k-devel
On Mon, Feb 23, 2009 at 1:08 PM, W. van den Akker <listsrv@wilsoft.nl> wrote: > On Monday 23 February 2009 18:48:09 Luis R. Rodriguez wrote: >> On Mon, Feb 23, 2009 at 9:45 AM, W. van den Akker <listsrv@wilsoft.nl> wrote: >> > On Monday 23 February 2009 18:31:46 Luis R. Rodriguez wrote: >> >> On Sun, Feb 22, 2009 at 01:51:38PM -0800, W. van den Akker wrote: >> >> > On Monday 16 February 2009 11:18:28 W. van den Akker wrote: >> >> > > > On Wed, Feb 11, 2009 at 12:07 AM, Luis R. Rodriguez <lrodriguez@atheros.com> wrote: >> >> > > >> So I've gone back to the drawing board, and reviewed this issue >> >> > > >> as thoroughly as I can. The issue is PCI reads/writes can overlap >> >> > > >> with each other (not just writes). This shouldn't generally be an >> >> > > >> issue but if some reads take a while, for example, there could be >> >> > > >> another read/write on its way on another CPU and at least for our >> >> > > >> PCI 11n devices that will make them angry. Some PCI hosts don't >> >> > > >> seem to do this but some others do. It should be noted this issue >> >> > > >> is not present on our pre-802.11n devices or our new 11n >> >> > > >> PCI-express devices. >> >> > > >> >> >> > > >> So with that clarified, here's a second attempt at serialization. >> >> > > >> The first patch wasn't doing anything because we never >> >> > > >> initialized ah->config.serialize_regmode. We do that now only on >> >> > > >> non-UP systems. The last patch in the series is perhaps overkill >> >> > > >> -- but it would deal with the rare case of a UP system coming up and >> >> > > >> you hotplugging a second CPU later. It may also help with >> >> > > >> suspend, but don't quote me on that yet. >> >> > > >> >> >> > > >> Anyway, here's the latest stab at it: >> >> > > >> >> >> > > >> http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2009-02-11/serialization-v2.patch >> >> > > >> >> >> > > >> This applies against today's wireless-testing/compat-wireless >> >> > > >> updates. >> >> > > >> >> >> > > >> Please test and let me know if ath9k with PCI devices on >> >> > > >> HT/Multi-CPU issues are corrected by it. >> >> > > >> >> >> > > >> Known issue: ping flood in a terminal makes it painful to come >> >> > > >> back. >> >> > > >> >> >> > > >> I've been trying to look for a neater way to guarantee >> >> > > >> serialization, but so far this is what I have. I do wonder, for example, if some >> >> > > >> of the atomic.h (atomic_inc_and_test()) stuff may let us use it >> >> > > >> to somehow serialize CPU entry into a read/write. Although it's >> >> > > >> not designed for it, it may be worth considering. I also saw some of the >> >> > > >> most evil code I've seen lately in drivers/pci/quirks.c and did >> >> > > >> wonder if there was a fix we can re-use through there but didn't >> >> > > >> see anything. If you have any other ideas, please let me >> >> > > >> know. >> >> > > > >> >> > > > Can someone who is able to reproduce the SMP issue please try >> >> > > > these patches? >> >> > > > >> >> > > > Luis >> >> > > >> >> > > Hi Luis, >> >> > > >> >> > > I am currently on holiday. I have patched the system but had some >> >> > > issues with UDEV, and also my notebook wouldn't connect >> >> > > to the AP anymore. >> >> > > I have not had the time to dig into it yet. >> >> > > Sunday I will investigate it further. >> >> > > >> >> > > Is it possible to create the patch against the mainstream RC instead >> >> > > of the RC4-wl? Then I can test it faster. >> >> > > >> >> > > Sorry for the delay. >> >> > >> >> > I grabbed the latest git-testing today and applied the patch. >> >> > The server has been running for almost 2 hours now with a dual CPU. No >> >> > problems found yet. I have stressed the system, but no hangups. >> >> > >> >> > Also, the previous problems with UDEV are gone with this testing >> >> > version. >> >> > >> >> > So... it looks like the patch is working for SMP systems. >> >> > >> >> > I will report tomorrow if it's still up and running. But it looks >> >> > promising. >> >> >> >> Willem, >> >> >> >> thanks for the feedback so far -- you're the first to report back >> >> success on the patches curing your issues. I believe you may have also >> >> enabled maxcpus=1 before, so I just want to confirm that if you did have >> >> that, you removed it from your grub conf for the shiny new kernel >> >> + serialization patches. >> >> >> >> Luis >> > >> > Correct, I downloaded the latest wireless-testing, applied the patch and >> > removed the maxcpus=1 from the grub conf. >> > The server is showing 2 CPUs. >> > Now almost 24 hours up and running without any problems. >> >> Thanks for the confirmation -- are you using AR5416 PCI? Just want to >> confirm as well. >> >> Luis > > hmmm, > > After 23.5 hours of problem-free operation the server hung, > also without any trace or oops. > Close but no cigar... anyhow... I think we are almost there.
We may be dealing with two separate issues here. Before the serialization patches (or if you revert them), how long before your box hangs? Luis
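The quoted aside about using atomic_inc_and_test() "to somehow serialize CPU entry into a read/write" can be made concrete with a userspace C11 sketch. This is an assumption about how the idea might be applied, not what the patch does: inc_and_test() is a hypothetical helper that mimics the kernel function's documented contract (increment, then return true if the result is zero), and gate, fake_reg, worker() and run() are illustrative names. A counter starting at -1 admits only the CPU that moves it to 0; every other CPU undoes its increment and retries.

```c
/* Hypothetical userspace sketch of the atomic_inc_and_test() idea; names are illustrative. */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Mimics the kernel helper's contract: increment, then report
 * whether the new value is zero. */
static bool inc_and_test(atomic_int *v)
{
    return atomic_fetch_add(v, 1) + 1 == 0;
}

static atomic_int gate = -1;      /* -1 means nobody is inside */
static atomic_int in_flight;      /* accesses currently inside the gate */
static atomic_int max_in_flight;  /* worst overlap observed */
static long fake_reg;             /* only touched while holding the gate */

static void gate_enter(void)
{
    /* Only the caller that moves the gate from -1 to 0 gets in; everyone
     * else undoes its increment, backs off, and retries. */
    while (!inc_and_test(&gate)) {
        atomic_fetch_sub(&gate, 1);
        sched_yield();
    }
}

static void gate_exit(void)
{
    atomic_fetch_sub(&gate, 1);   /* back to -1: free again */
}

static void *worker(void *arg)
{
    int iters = *(const int *)arg;
    for (int i = 0; i < iters; i++) {
        gate_enter();
        int now = atomic_fetch_add(&in_flight, 1) + 1;
        int max = atomic_load(&max_in_flight);
        while (now > max &&
               !atomic_compare_exchange_weak(&max_in_flight, &max, now))
            ;                     /* record the worst overlap seen */
        fake_reg++;               /* stand-in for one register access */
        atomic_fetch_sub(&in_flight, 1);
        gate_exit();
    }
    return NULL;
}

/* Run nthreads workers and return the final value of fake_reg. */
static long run(int nthreads, int iters)
{
    pthread_t tid[8];
    assert(nthreads > 0 && nthreads <= 8);
    for (int t = 0; t < nthreads; t++)
        pthread_create(&tid[t], NULL, worker, &iters);
    for (int t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);
    return fake_reg;
}
```

With four threads doing 2000 gated accesses each, run(4, 2000) should return 8000, the overlap counter should never exceed 1, and the gate should end back at -1. The result is effectively a hand-rolled spinlock with no fairness guarantee (one thread can keep losing the race while others win), which is one reason an ordinary spinlock is usually the more conventional choice for this kind of serialization.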
* [ath9k-devel] [RFC v2] Serialization of IO From: W. van den Akker @ 2009-02-24 6:19 UTC To: ath9k-devel
> On Mon, Feb 23, 2009 at 1:08 PM, W. van den Akker <listsrv@wilsoft.nl> wrote: >> On Monday 23 February 2009 18:48:09 Luis R. Rodriguez wrote: >>> On Mon, Feb 23, 2009 at 9:45 AM, W. van den Akker <listsrv@wilsoft.nl> wrote: >>> > On Monday 23 February 2009 18:31:46 Luis R. Rodriguez wrote: >>> >> On Sun, Feb 22, 2009 at 01:51:38PM -0800, W. van den Akker wrote: >>> >> > On Monday 16 February 2009 11:18:28 W. van den Akker wrote: >>> >> > > > On Wed, Feb 11, 2009 at 12:07 AM, Luis R. Rodriguez <lrodriguez@atheros.com> wrote: >>> >> > > >> So I've gone back to the drawing board, and reviewed this issue >>> >> > > >> as thoroughly as I can. The issue is PCI reads/writes can overlap >>> >> > > >> with each other (not just writes). This shouldn't generally be an >>> >> > > >> issue but if some reads take a while, for example, there could be >>> >> > > >> another read/write on its way on another CPU and at least for our >>> >> > > >> PCI 11n devices that will make them angry. Some PCI hosts don't >>> >> > > >> seem to do this but some others do. It should be noted this issue >>> >> > > >> is not present on our pre-802.11n devices or our new 11n >>> >> > > >> PCI-express devices. >>> >> > > >> >>> >> > > >> So with that clarified, here's a second attempt at serialization. >>> >> > > >> The first patch wasn't doing anything because we never >>> >> > > >> initialized ah->config.serialize_regmode. We do that now only on >>> >> > > >> non-UP systems. The last patch in the series is perhaps overkill >>> >> > > >> -- but it would deal with the rare case of a UP system coming up and >>> >> > > >> you hotplugging a second CPU later. It may also help with >>> >> > > >> suspend, but don't quote me on that yet. >>> >> > > >> >>> >> > > >> Anyway, here's the latest stab at it: >>> >> > > >> >>> >> > > >> http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2009-02-11/serialization-v2.patch >>> >> > > >> >>> >> > > >> This applies against today's wireless-testing/compat-wireless >>> >> > > >> updates. >>> >> > > >> >>> >> > > >> Please test and let me know if ath9k with PCI devices on >>> >> > > >> HT/Multi-CPU issues are corrected by it. >>> >> > > >> >>> >> > > >> Known issue: ping flood in a terminal makes it painful to come >>> >> > > >> back. >>> >> > > >> >>> >> > > >> I've been trying to look for a neater way to guarantee >>> >> > > >> serialization, but so far this is what I have. I do wonder, for example, if some >>> >> > > >> of the atomic.h (atomic_inc_and_test()) stuff may let us use it >>> >> > > >> to somehow serialize CPU entry into a read/write. Although it's >>> >> > > >> not designed for it, it may be worth considering. I also saw some of the >>> >> > > >> most evil code I've seen lately in drivers/pci/quirks.c and did >>> >> > > >> wonder if there was a fix we can re-use through there but didn't >>> >> > > >> see anything. If you have any other ideas, please let me >>> >> > > >> know. >>> >> > > > >>> >> > > > Can someone who is able to reproduce the SMP issue please try >>> >> > > > these patches? >>> >> > > > >>> >> > > > Luis >>> >> > > >>> >> > > Hi Luis, >>> >> > > >>> >> > > I am currently on holiday. I have patched the system but had some >>> >> > > issues with UDEV, and also my notebook wouldn't connect >>> >> > > to the AP anymore. >>> >> > > I have not had the time to dig into it yet. >>> >> > > Sunday I will investigate it further. >>> >> > > >>> >> > > Is it possible to create the patch against the mainstream RC instead >>> >> > > of the RC4-wl? Then I can test it faster. >>> >> > > >>> >> > > Sorry for the delay. >>> >> > >>> >> > I grabbed the latest git-testing today and applied the patch. >>> >> > The server has been running for almost 2 hours now with a dual CPU. No >>> >> > problems found yet. I have stressed the system, but no hangups. >>> >> > >>> >> > Also, the previous problems with UDEV are gone with this testing >>> >> > version. >>> >> > >>> >> > So... it looks like the patch is working for SMP systems. >>> >> > >>> >> > I will report tomorrow if it's still up and running. But it looks >>> >> > promising. >>> >> >>> >> Willem, >>> >> >>> >> thanks for the feedback so far -- you're the first to report back >>> >> success on the patches curing your issues. I believe you may have also >>> >> enabled maxcpus=1 before, so I just want to confirm that if you did have >>> >> that, you removed it from your grub conf for the shiny new kernel >>> >> + serialization patches. >>> >> >>> >> Luis >>> > >>> > Correct, I downloaded the latest wireless-testing, applied the patch and >>> > removed the maxcpus=1 from the grub conf. >>> > The server is showing 2 CPUs. >>> > Now almost 24 hours up and running without any problems. >>> >>> Thanks for the confirmation -- are you using AR5416 PCI? Just want to >>> confirm as well. >>> >>> Luis >> >> hmmm, >> >> After 23.5 hours of problem-free operation the server hung, >> also without any trace or oops. >> Close but no cigar... anyhow... I think we are almost there. > > We may be dealing with two separate issues here. Before the > serialization patches (or if you revert them), how long before your box > hangs? > > Luis
Before the patches were applied, the box hung within minutes. After the patch, the box operated for 23.5 hours without problems.
* [ath9k-devel] [RFC v2] Serialization of IO From: Luis R. Rodriguez @ 2009-02-24 6:27 UTC To: ath9k-devel
On Mon, Feb 23, 2009 at 10:19 PM, W. van den Akker <listsrv@wilsoft.nl> wrote: >> On Mon, Feb 23, 2009 at 1:08 PM, W. van den Akker <listsrv@wilsoft.nl> wrote: >>> On Monday 23 February 2009 18:48:09 Luis R. Rodriguez wrote: >>>> On Mon, Feb 23, 2009 at 9:45 AM, W. van den Akker <listsrv@wilsoft.nl> wrote: >>>> > On Monday 23 February 2009 18:31:46 Luis R. Rodriguez wrote: >>>> >> On Sun, Feb 22, 2009 at 01:51:38PM -0800, W. van den Akker wrote: >>>> >> > On Monday 16 February 2009 11:18:28 W. van den Akker wrote: >>>> >> > > > On Wed, Feb 11, 2009 at 12:07 AM, Luis R. Rodriguez <lrodriguez@atheros.com> wrote: >>>> >> > > >> So I've gone back to the drawing board, and reviewed this issue >>>> >> > > >> as thoroughly as I can. The issue is PCI reads/writes can overlap >>>> >> > > >> with each other (not just writes). This shouldn't generally be an >>>> >> > > >> issue but if some reads take a while, for example, there could be >>>> >> > > >> another read/write on its way on another CPU and at least for our >>>> >> > > >> PCI 11n devices that will make them angry. Some PCI hosts don't >>>> >> > > >> seem to do this but some others do. It should be noted this issue >>>> >> > > >> is not present on our pre-802.11n devices or our new 11n >>>> >> > > >> PCI-express devices. >>>> >> > > >> >>>> >> > > >> So with that clarified, here's a second attempt at serialization. >>>> >> > > >> The first patch wasn't doing anything because we never >>>> >> > > >> initialized ah->config.serialize_regmode. We do that now only on >>>> >> > > >> non-UP systems. The last patch in the series is perhaps overkill >>>> >> > > >> -- but it would deal with the rare case of a UP system coming up and >>>> >> > > >> you hotplugging a second CPU later. It may also help with >>>> >> > > >> suspend, but don't quote me on that yet. >>>> >> > > >> >>>> >> > > >> Anyway, here's the latest stab at it: >>>> >> > > >> >>>> >> > > >> http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2009-02-11/serialization-v2.patch >>>> >> > > >> >>>> >> > > >> This applies against today's wireless-testing/compat-wireless >>>> >> > > >> updates. >>>> >> > > >> >>>> >> > > >> Please test and let me know if ath9k with PCI devices on >>>> >> > > >> HT/Multi-CPU issues are corrected by it. >>>> >> > > >> >>>> >> > > >> Known issue: ping flood in a terminal makes it painful to come >>>> >> > > >> back. >>>> >> > > >> >>>> >> > > >> I've been trying to look for a neater way to guarantee >>>> >> > > >> serialization, but so far this is what I have. I do wonder, for example, if some >>>> >> > > >> of the atomic.h (atomic_inc_and_test()) stuff may let us use it >>>> >> > > >> to somehow serialize CPU entry into a read/write. Although it's >>>> >> > > >> not designed for it, it may be worth considering. I also saw some of the >>>> >> > > >> most evil code I've seen lately in drivers/pci/quirks.c and did >>>> >> > > >> wonder if there was a fix we can re-use through there but didn't >>>> >> > > >> see anything. If you have any other ideas, please let me >>>> >> > > >> know. >>>> >> > > > >>>> >> > > > Can someone who is able to reproduce the SMP issue please try >>>> >> > > > these patches? >>>> >> > > > >>>> >> > > > Luis >>>> >> > > >>>> >> > > Hi Luis, >>>> >> > > >>>> >> > > I am currently on holiday. I have patched the system but had some >>>> >> > > issues with UDEV, and also my notebook wouldn't connect >>>> >> > > to the AP anymore. >>>> >> > > I have not had the time to dig into it yet. >>>> >> > > Sunday I will investigate it further. >>>> >> > > >>>> >> > > Is it possible to create the patch against the mainstream RC instead >>>> >> > > of the RC4-wl? Then I can test it faster. >>>> >> > > >>>> >> > > Sorry for the delay. >>>> >> > >>>> >> > I grabbed the latest git-testing today and applied the patch. >>>> >> > The server has been running for almost 2 hours now with a dual CPU. No >>>> >> > problems found yet. I have stressed the system, but no hangups. >>>> >> > >>>> >> > Also, the previous problems with UDEV are gone with this testing >>>> >> > version. >>>> >> > >>>> >> > So... it looks like the patch is working for SMP systems. >>>> >> > >>>> >> > I will report tomorrow if it's still up and running. But it looks >>>> >> > promising. >>>> >> >>>> >> Willem, >>>> >> >>>> >> thanks for the feedback so far -- you're the first to report back >>>> >> success on the patches curing your issues. I believe you may have also >>>> >> enabled maxcpus=1 before, so I just want to confirm that if you did have >>>> >> that, you removed it from your grub conf for the shiny new kernel >>>> >> + serialization patches. >>>> >> >>>> >> Luis >>>> > >>>> > Correct, I downloaded the latest wireless-testing, applied the patch and >>>> > removed the maxcpus=1 from the grub conf. >>>> > The server is showing 2 CPUs. >>>> > Now almost 24 hours up and running without any problems. >>>> >>>> Thanks for the confirmation -- are you using AR5416 PCI? Just want to >>>> confirm as well. >>>> >>>> Luis >>> >>> hmmm, >>> >>> After 23.5 hours of problem-free operation the server hung, >>> also without any trace or oops. >>> Close but no cigar... anyhow... I think we are almost there. >> >> We may be dealing with two separate issues here. Before the >> serialization patches (or if you revert them), how long before your box >> hangs? >> >> Luis > > Before the patches were applied, the box hung within minutes. After the patch, > the box operated for 23.5 hours without problems.
Excellent, so we will push these in -- please give the new series I just posted a shot. Luis
end of thread
Thread overview: 12+ messages
2009-02-11 8:07 [ath9k-devel] [RFC v2] Serialization of IO Luis R. Rodriguez
2009-02-15 1:29 ` Luis R. Rodriguez
2009-02-16 10:18 ` W. van den Akker
2009-02-22 21:51 ` W. van den Akker
2009-02-23 17:31 ` Luis R. Rodriguez
2009-02-23 17:45 ` W. van den Akker
2009-02-23 17:48 ` Luis R. Rodriguez
2009-02-23 18:01 ` W. van den Akker
2009-02-23 21:08 ` W. van den Akker
2009-02-23 23:58 ` Luis R. Rodriguez
2009-02-24 6:19 ` W. van den Akker
2009-02-24 6:27 ` Luis R. Rodriguez