* [ath9k-devel] [RFC v2] Serialization of IO
@ 2009-02-11 8:07 Luis R. Rodriguez
2009-02-15 1:29 ` Luis R. Rodriguez
0 siblings, 1 reply; 12+ messages in thread
From: Luis R. Rodriguez @ 2009-02-11 8:07 UTC (permalink / raw)
To: ath9k-devel
So I've gone back to the drawing board, and reviewed this issue
as thoroughly as I can. The issue is PCI reads/writes can overlap
with each other (not just writes). This shouldn't generally be an
issue but if some reads take a while, for example, there could be
another read/write on its way on another CPU and at least for our
PCI 11n devices that will make them angry. Some PCI hosts don't seem
to do this but some others do. It should be noted this issue is not
present on our pre-802.11n devices or our new 11n PCI-express
devices.
So with that clarified, here's a second attempt at serialization.
The first patch wasn't doing anything because we never initialized
ah->config.serialize_regmode. We do that now, but only on non-UP systems.
The last patch in the series is perhaps overkill -- but it would deal
with the rare case of a UP system coming up and you hotplugging a second
CPU later. It may also help with suspend, but don't quote me on that
yet.
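For anyone following along without the patch open, here is a rough
user-space sketch of the idea, not the patch itself. It assumes a
made-up struct fake_hw with a serialize_regmode flag that is set only
when more than one CPU is present, and it funnels every register read
and write through one spinlock when that flag is on. All names here
(fake_hw, hw_read, hw_write, io_lock, io_unlock) are hypothetical, and
a real driver would also have to deal with interrupt context, which
this sketch ignores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_NUM_REGS 16

/* Hypothetical stand-in for the device state; not the ath9k structures. */
struct fake_hw {
    atomic_bool locked;             /* simple spinlock guarding register I/O */
    int serialize_regmode;          /* 0 = off (UP), 1 = on (more than one CPU) */
    uint32_t regs[FAKE_NUM_REGS];   /* plain memory standing in for registers */
};

static void fake_hw_init(struct fake_hw *hw, int num_cpus)
{
    atomic_init(&hw->locked, false);
    /* Only pay for serialization when another CPU could race with us. */
    hw->serialize_regmode = (num_cpus > 1);
    for (int i = 0; i < FAKE_NUM_REGS; i++)
        hw->regs[i] = 0;
}

static void io_lock(struct fake_hw *hw)
{
    if (!hw->serialize_regmode)
        return;
    /* Spin until we are the one who flipped the flag from false to true. */
    while (atomic_exchange_explicit(&hw->locked, true, memory_order_acquire))
        ;
}

static void io_unlock(struct fake_hw *hw)
{
    if (hw->serialize_regmode)
        atomic_store_explicit(&hw->locked, false, memory_order_release);
}

/* Reads take the same lock as writes, so no two accesses can overlap. */
static uint32_t hw_read(struct fake_hw *hw, unsigned int reg)
{
    assert(reg < FAKE_NUM_REGS);
    io_lock(hw);
    uint32_t val = hw->regs[reg];
    io_unlock(hw);
    return val;
}

static void hw_write(struct fake_hw *hw, unsigned int reg, uint32_t val)
{
    assert(reg < FAKE_NUM_REGS);
    io_lock(hw);
    hw->regs[reg] = val;
    io_unlock(hw);
}
```

The point of the sketch is only that reads and writes share a single
lock, and that the lock is skipped entirely on a UP system.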
Anyway, here's the latest stab at it:
http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2009-02-11/serialization-v2.patch
This applies against today's wireless-testing/compat-wireless updates.
Please test and let me know whether the issues seen with ath9k PCI
devices on HT/multi-CPU systems are corrected by it.
Known issue: ping flood in a terminal makes it painful to come back.
I've been trying to look for a neater way to guarantee serialization
but so far this is what I have. I do wonder, for example, if some of
the atomic.h (atomic_inc_and_test()) stuff may let us somehow
serialize CPU entry into a read/write. Although it's not designed for
that, it may be worth considering. I also saw some of the most evil code
I've seen lately in drivers/pci/quirks.c and did wonder if there was a fix
we can re-use from there, but didn't see anything. If you have any other
ideas please let me know.
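To make the atomic idea a bit more concrete, here is a user-space
sketch of what such a gate might look like, written with C11 atomics
rather than the kernel's atomic.h. The helper inc_and_test() only
mimics the documented behaviour of atomic_inc_and_test() (increment,
return true if the result is zero). The names io_gate, io_try_enter,
io_enter and io_exit are made up for illustration. This busy-waits and
makes no fairness guarantee, so treat it as a thought experiment and
not as a replacement for a proper spinlock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* The gate starts at -1, so the first increment brings it to zero. */
static atomic_int io_gate = -1;

/* User-space analogue of atomic_inc_and_test(): true if the result is 0. */
static bool inc_and_test(atomic_int *v)
{
    return atomic_fetch_add(v, 1) + 1 == 0;
}

/* Try to become the only CPU inside the read/write section. */
static bool io_try_enter(void)
{
    if (inc_and_test(&io_gate))
        return true;                /* we moved the gate from -1 to 0 */
    atomic_fetch_sub(&io_gate, 1);  /* someone else is inside: undo */
    return false;
}

/* Spin until the gate is free. */
static void io_enter(void)
{
    while (!io_try_enter())
        ;
}

/* Leave the section and put the gate back to -1. */
static void io_exit(void)
{
    atomic_fetch_sub(&io_gate, 1);
}
```

A caller would wrap each register access as io_enter(), do the access,
then io_exit().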
Luis
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ath9k-devel] [RFC v2] Serialization of IO
2009-02-11 8:07 [ath9k-devel] [RFC v2] Serialization of IO Luis R. Rodriguez
@ 2009-02-15 1:29 ` Luis R. Rodriguez
2009-02-16 10:18 ` W. van den Akker
0 siblings, 1 reply; 12+ messages in thread
From: Luis R. Rodriguez @ 2009-02-15 1:29 UTC (permalink / raw)
To: ath9k-devel
On Wed, Feb 11, 2009 at 12:07 AM, Luis R. Rodriguez
<lrodriguez@atheros.com> wrote:
> So I've gone back to the drawing board, and reviewed this issue
> as thoroughly as I can. The issue is PCI reads/writes can overlap
> with each other (not just writes). This shouldn't generally be an
> issue but if some reads take a while, for example, there could be
> another read/write on its way on another CPU and at least for our
> PCI 11n devices that will make them angry. Some PCI hosts don't seem
> to do this but some others do. It should be noted this issue is not
> present on our pre-802.11n devices or our new 11n PCI-express
> devices.
>
> So with that clarified, here's a second attempt at serialization.
> The first patch wasn't doing anything because we never initialized
> ah->config.serialize_regmode. We do that now, but only on non-UP systems.
> The last patch in the series is perhaps overkill -- but it would deal
> with the rare case of a UP system coming up and you hotplugging a second
> CPU later. It may also help with suspend, but don't quote me on that
> yet.
>
> Anyway, here's the latest stab at it:
>
> http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2009-02-11/serialization-v2.patch
>
> This applies against today's wireless-testing/compat-wireless updates.
>
> Please test and let me know whether the issues seen with ath9k PCI
> devices on HT/multi-CPU systems are corrected by it.
>
> Known issue: ping flood in a terminal makes it painful to come back.
>
> I've been trying to look for a neater way to guarantee serialization
> but so far this is what I have. I do wonder, for example, if some of
> the atomic.h (atomic_inc_and_test()) stuff may let us somehow
> serialize CPU entry into a read/write. Although it's not designed for
> that, it may be worth considering. I also saw some of the most evil code
> I've seen lately in drivers/pci/quirks.c and did wonder if there was a fix
> we can re-use from there, but didn't see anything. If you have any other
> ideas please let me know.
Can someone who is able to reproduce the SMP issue please try these patches?
Luis
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ath9k-devel] [RFC v2] Serialization of IO
2009-02-15 1:29 ` Luis R. Rodriguez
@ 2009-02-16 10:18 ` W. van den Akker
2009-02-22 21:51 ` W. van den Akker
0 siblings, 1 reply; 12+ messages in thread
From: W. van den Akker @ 2009-02-16 10:18 UTC (permalink / raw)
To: ath9k-devel
> Can someone who is able to reproduce the SMP issue please try these
> patches?
>
> Luis
> _______________________________________________
> ath9k-devel mailing list
> ath9k-devel at lists.ath9k.org
> https://lists.ath9k.org/mailman/listinfo/ath9k-devel
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>
>
Hi Luis,
I am currently on holiday. I have patched the system, but I had some issues
with UDEV, and my notebook also wouldn't connect to the AP anymore.
I have not had the time to dig into it.
On Sunday I will investigate it further.
Is it possible to create the patch against the mainstream RC instead of
the RC4-wl? Then I can test it faster.
Sorry for the delay.
greetings,
Willem
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ath9k-devel] [RFC v2] Serialization of IO
2009-02-16 10:18 ` W. van den Akker
@ 2009-02-22 21:51 ` W. van den Akker
2009-02-23 17:31 ` Luis R. Rodriguez
0 siblings, 1 reply; 12+ messages in thread
From: W. van den Akker @ 2009-02-22 21:51 UTC (permalink / raw)
To: ath9k-devel
On Monday 16 February 2009 11:18:28 W. van den Akker wrote:
> Hi Luis,
>
> I am currently on holiday. I have patched the system, but I had some issues
> with UDEV, and my notebook also wouldn't connect to the AP anymore.
> I have not had the time to dig into it.
> On Sunday I will investigate it further.
>
> Is it possible to create the patch against the mainstream RC instead of
> the RC4-wl? Then I can test it faster.
>
> Sorry for the delay.
>
I grabbed the latest git-testing today and applied the patch.
The server has been running for almost 2 hours now with two CPUs. No problems
found yet. I have stressed the system, but there were no hangups.
Also, the previous problems with UDEV are gone with this testing version.
So it looks like the patch is working for SMP systems.
I will report tomorrow whether it's still up and running, but it looks promising.
Greetings,
Willem
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ath9k-devel] [RFC v2] Serialization of IO
2009-02-22 21:51 ` W. van den Akker
@ 2009-02-23 17:31 ` Luis R. Rodriguez
2009-02-23 17:45 ` W. van den Akker
0 siblings, 1 reply; 12+ messages in thread
From: Luis R. Rodriguez @ 2009-02-23 17:31 UTC (permalink / raw)
To: ath9k-devel
On Sun, Feb 22, 2009 at 01:51:38PM -0800, W. van den Akker wrote:
> I grabbed the latest git-testing today and applied the patch.
> The server has been running for almost 2 hours now with two CPUs. No problems
> found yet. I have stressed the system, but there were no hangups.
>
> Also, the previous problems with UDEV are gone with this testing version.
>
> So it looks like the patch is working for SMP systems.
>
> I will report tomorrow whether it's still up and running, but it looks promising.
Willem,
thanks for the feedback so far -- you're the first to report back success
with the patches curing your issues. I believe you may also have booted
with maxcpus=1 before, so I just want to confirm that, if you did, you
removed it from your grub conf for the shiny new kernel + serialization
patches.
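One quick way to double-check that a second CPU really is online after
dropping maxcpus=1 is to ask the C library. The sketch below assumes a
Linux/glibc system, where sysconf() accepts _SC_NPROCESSORS_ONLN (a
common extension, not something every platform provides). The helper
names online_cpus and needs_serialization are made up for illustration.

```c
#include <unistd.h>

/* Number of CPUs currently online, or -1 if it cannot be determined. */
static long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Hypothetical helper: register serialization only matters with >1 CPU. */
static int needs_serialization(void)
{
    return online_cpus() > 1;
}
```

If online_cpus() still reports 1, the test run is not exercising the
SMP case the patches are meant to fix.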
Luis
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ath9k-devel] [RFC v2] Serialization of IO
2009-02-23 17:31 ` Luis R. Rodriguez
@ 2009-02-23 17:45 ` W. van den Akker
2009-02-23 17:48 ` Luis R. Rodriguez
0 siblings, 1 reply; 12+ messages in thread
From: W. van den Akker @ 2009-02-23 17:45 UTC (permalink / raw)
To: ath9k-devel
On Monday 23 February 2009 18:31:46 Luis R. Rodriguez wrote:
> Willem,
>
> thanks for the feedback so far -- you're the first to report back success
> with the patches curing your issues. I believe you may also have booted
> with maxcpus=1 before, so I just want to confirm that, if you did, you
> removed it from your grub conf for the shiny new kernel + serialization
> patches.
>
> Luis
Correct, I downloaded the latest wireless-testing, applied the patch and
removed the maxcpus=1 from the grub conf.
The server is showing 2 CPUs.
It has now been up and running for almost 24 hours without any problems.
gr,
Willem
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ath9k-devel] [RFC v2] Serialization of IO
2009-02-23 17:45 ` W. van den Akker
@ 2009-02-23 17:48 ` Luis R. Rodriguez
2009-02-23 18:01 ` W. van den Akker
2009-02-23 21:08 ` W. van den Akker
0 siblings, 2 replies; 12+ messages in thread
From: Luis R. Rodriguez @ 2009-02-23 17:48 UTC (permalink / raw)
To: ath9k-devel
On Mon, Feb 23, 2009 at 9:45 AM, W. van den Akker <listsrv@wilsoft.nl> wrote:
> Correct, I downloaded the latest wireless-testing, applied the patch and
> removed the maxcpus=1 from the grub conf.
> The server is showing 2 CPUs.
> It has now been up and running for almost 24 hours without any problems.
Thanks for the confirmation -- are you using AR5416 PCI? Just want to
confirm as well.
Luis
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ath9k-devel] [RFC v2] Serialization of IO
2009-02-23 17:48 ` Luis R. Rodriguez
@ 2009-02-23 18:01 ` W. van den Akker
2009-02-23 21:08 ` W. van den Akker
1 sibling, 0 replies; 12+ messages in thread
From: W. van den Akker @ 2009-02-23 18:01 UTC (permalink / raw)
To: ath9k-devel
On Monday 23 February 2009 18:48:09 Luis R. Rodriguez wrote:
> Thanks for the confirmation -- are you using AR5416 PCI? Just want to
> confirm as well.
>
> Luis
Yes, it's a WMP300N v2 PCI card with an AR5416.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [ath9k-devel] [RFC v2] Serialization of IO
2009-02-23 17:48 ` Luis R. Rodriguez
2009-02-23 18:01 ` W. van den Akker
@ 2009-02-23 21:08 ` W. van den Akker
2009-02-23 23:58 ` Luis R. Rodriguez
1 sibling, 1 reply; 12+ messages in thread
From: W. van den Akker @ 2009-02-23 21:08 UTC (permalink / raw)
To: ath9k-devel
On Monday 23 February 2009 18:48:09 Luis R. Rodriguez wrote:
> On Mon, Feb 23, 2009 at 9:45 AM, W. van den Akker <listsrv@wilsoft.nl>
wrote:
> > Correct, I downloaded the latest wireless-testing, applied the patch and
> > removed the maxcpus=1 from the grub conf.
> > The server is showing 2 CPUs.
> > Now almost 24 hours up and running without any problems.
>
> Thanks for the confirmation -- are you using AR5416 PCI? Just want to
> confirm as well.
>
> Luis
Hmm,
After 23.5 hours of trouble-free running the server hung,
again without any trace or oops.
Close, but no cigar... Anyhow, I think we are almost there.
Regards,
Willem
* [ath9k-devel] [RFC v2] Serialization of IO
2009-02-23 21:08 ` W. van den Akker
@ 2009-02-23 23:58 ` Luis R. Rodriguez
2009-02-24 6:19 ` W. van den Akker
0 siblings, 1 reply; 12+ messages in thread
From: Luis R. Rodriguez @ 2009-02-23 23:58 UTC (permalink / raw)
To: ath9k-devel
On Mon, Feb 23, 2009 at 1:08 PM, W. van den Akker <listsrv@wilsoft.nl> wrote:
> Hmm,
>
> After 23.5 hours of trouble-free running the server hung,
> again without any trace or oops.
> Close, but no cigar... Anyhow, I think we are almost there.
We may be dealing with two separate issues here. Before the
serialization patches (or if you revert them), how long does it take
before your box hangs?
Luis
* [ath9k-devel] [RFC v2] Serialization of IO
2009-02-23 23:58 ` Luis R. Rodriguez
@ 2009-02-24 6:19 ` W. van den Akker
2009-02-24 6:27 ` Luis R. Rodriguez
0 siblings, 1 reply; 12+ messages in thread
From: W. van den Akker @ 2009-02-24 6:19 UTC (permalink / raw)
To: ath9k-devel
> We may be dealing with two separate issues here. Before the
> serialization patches (or if you revert them), how long does it take
> before your box hangs?
>
> Luis
Before the patches were applied the box hung within minutes. With the
patch applied the box ran for 23.5 hours without problems.
* [ath9k-devel] [RFC v2] Serialization of IO
2009-02-24 6:19 ` W. van den Akker
@ 2009-02-24 6:27 ` Luis R. Rodriguez
0 siblings, 0 replies; 12+ messages in thread
From: Luis R. Rodriguez @ 2009-02-24 6:27 UTC (permalink / raw)
To: ath9k-devel
On Mon, Feb 23, 2009 at 10:19 PM, W. van den Akker <listsrv@wilsoft.nl> wrote:
> Before the patches were applied the box hung within minutes. With the
> patch applied the box ran for 23.5 hours without problems.
Excellent, so we will push these in -- please give the new series that
I just posted a shot.
Luis
end of thread (newest message: 2009-02-24 6:27 UTC)
Thread overview: 12+ messages
2009-02-11 8:07 [ath9k-devel] [RFC v2] Serialization of IO Luis R. Rodriguez
2009-02-15 1:29 ` Luis R. Rodriguez
2009-02-16 10:18 ` W. van den Akker
2009-02-22 21:51 ` W. van den Akker
2009-02-23 17:31 ` Luis R. Rodriguez
2009-02-23 17:45 ` W. van den Akker
2009-02-23 17:48 ` Luis R. Rodriguez
2009-02-23 18:01 ` W. van den Akker
2009-02-23 21:08 ` W. van den Akker
2009-02-23 23:58 ` Luis R. Rodriguez
2009-02-24 6:19 ` W. van den Akker
2009-02-24 6:27 ` Luis R. Rodriguez