From mboxrd@z Thu Jan 1 00:00:00 1970
From: W. van den Akker
Date: Mon, 23 Feb 2009 22:08:31 +0100
Subject: [ath9k-devel] [RFC v2] Serialization of IO
In-Reply-To: <43e72e890902230948k6c73ae72i820c26febe7ce23a@mail.gmail.com>
References: <20090211080717.GN4248@tesla> <200902231845.41715.listsrv@wilsoft.nl> <43e72e890902230948k6c73ae72i820c26febe7ce23a@mail.gmail.com>
Message-ID: <200902232208.31275.listsrv@wilsoft.nl>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: ath9k-devel@lists.ath9k.org

On Monday 23 February 2009 18:48:09 Luis R. Rodriguez wrote:
> On Mon, Feb 23, 2009 at 9:45 AM, W. van den Akker wrote:
> > On Monday 23 February 2009 18:31:46 Luis R. Rodriguez wrote:
> >> On Sun, Feb 22, 2009 at 01:51:38PM -0800, W. van den Akker wrote:
> >> > On Monday 16 February 2009 11:18:28 W. van den Akker wrote:
> >> > > > On Wed, Feb 11, 2009 at 12:07 AM, Luis R. Rodriguez wrote:
> >> > > >> So I've gone back to the drawing board, and reviewed this issue
> >> > > >> as thoroughly as I can. The issue is PCI reads/writes can overlap
> >> > > >> with each other (not just writes). This shouldn't generally be an
> >> > > >> issue but if some reads take a while, for example, there could be
> >> > > >> another read/write on its way on another CPU and at least for our
> >> > > >> PCI 11n devices that will make them angry. Some PCI hosts don't
> >> > > >> seem to do this but some others do. It should be noted this issue
> >> > > >> is not present on our pre-802.11n devices or our new 11n
> >> > > >> PCI-express devices.
> >> > > >>
> >> > > >> So with that clarified, here's a second attempt at serialization.
> >> > > >> The first patch wasn't doing anything because we never
> >> > > >> initialized ah->config.serialize_regmode. We do that now only on
> >> > > >> non-UP systems.
> >> > > >> The last patch in the series is perhaps overkill
> >> > > >> -- but it would deal with the rare case of a UP system coming up
> >> > > >> and you hotplugging a second CPU later. It may also help with
> >> > > >> suspend, but don't quote me on that yet.
> >> > > >>
> >> > > >> Anyway, here's the latest stab at it:
> >> > > >>
> >> > > >> http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2009-02-11/serialization-v2.patch
> >> > > >>
> >> > > >> This applies against today's wireless-testing/compat-wireless
> >> > > >> updates.
> >> > > >>
> >> > > >> Please test and let me know if the issues with ath9k PCI devices
> >> > > >> on HT/multi-CPU systems are corrected by it.
> >> > > >>
> >> > > >> Known issue: ping flood in a terminal makes it painful to come
> >> > > >> back.
> >> > > >>
> >> > > >> I've been trying to look for a neater way to guarantee
> >> > > >> serialization
> >> > > >> but so far this is what I have. I do wonder, for example, if some
> >> > > >> of the atomic.h (atomic_inc_and_test()) stuff may let us use it
> >> > > >> to somehow serialize CPU entry into a read/write. Although it's
> >> > > >> not designed for that, it may be worth considering. I also saw
> >> > > >> some of the most evil code I've seen lately in
> >> > > >> drivers/pci/quirks.c and did wonder if there was a fix we can
> >> > > >> re-use through there but didn't see anything. If you have any
> >> > > >> other ideas please let me know.
> >> > > >
> >> > > > Can someone who is able to reproduce the SMP issue please try
> >> > > > these patches?
> >> > > >
> >> > > > Luis
> >> > > > _______________________________________________
> >> > > > ath9k-devel mailing list
> >> > > > ath9k-devel at lists.ath9k.org
> >> > > > https://lists.ath9k.org/mailman/listinfo/ath9k-devel
> >> > > >
> >> > > > --
> >> > > > This message has been scanned for viruses and
> >> > > > dangerous content by MailScanner, and is
> >> > > > believed to be clean.
> >> > >
> >> > > Hi Luis,
> >> > >
> >> > > I am currently on holiday. I have patched the system, but had some
> >> > > issues with UDEV, and my notebook also wouldn't connect
> >> > > to the AP anymore.
> >> > > I have not had the time to dig into it.
> >> > > Sunday I will investigate it further.
> >> > >
> >> > > Is it possible to create the patch against the mainstream RC instead
> >> > > of the RC4-wl? Then I can test it faster.
> >> > >
> >> > > Sorry for the delay.
> >> >
> >> > I grabbed the latest git-testing today and applied the patch.
> >> > The server has been running for almost 2 hours now with two CPUs. No
> >> > problems found yet. I have stressed the system, but no hangups.
> >> >
> >> > Also the previous problems with UDEV are gone with this testing
> >> > version.
> >> >
> >> > So.... it looks like the patch is working for SMP systems
> >> >
> >> > I will report tomorrow if it's still up and running. But it looks
> >> > promising.
> >>
> >> Willem,
> >>
> >> thanks for the feedback so far -- you're the first to report back
> >> success on the patches curing your issues. I believe you may have also
> >> enabled maxcpus=1 before, so I just want to confirm that if you did,
> >> you removed it from your grub conf for the shiny new kernel
> >> + serialization patches.
> >>
> >> Luis
> >
> > Correct, I downloaded the latest wireless-testing, applied the patch and
> > removed the maxcpus=1 from the grub conf.
> > The server is showing 2 CPUs.
> > Now almost 24 hours up and running without any problems.
>
> Thanks for the confirmation -- are you using AR5416 PCI? Just want to
> confirm as well.
>
> Luis

hmmm,

After 23.5 hours of trouble-free operation the server hangs, again without
any trace or oops.
Close but no cigar.... anyhow ... I think we are almost there.

Regards,
Willem