* [PATCH] aic94xx: Hotplug ex_change_count race fix
@ 2006-09-26 22:05 Alexis Bruemmer
2006-10-03 14:19 ` James Bottomley
0 siblings, 1 reply; 8+ messages in thread
From: Alexis Bruemmer @ 2006-09-26 22:05 UTC (permalink / raw)
To: linux-scsi
In some cases while hotplugging disks on a system with an expander the
broadcast primitive will be posted and begin processing before the
expander change count is updated. This causes the device that triggered
the broadcast to not be found.
--Alexis
Signed-off-by: Alexis Bruemmer <alexisb@us.ibm.com>
-------
diff -pNura linux-2.6.18-git4-orig/drivers/scsi/libsas/sas_expander.c linux-2.6.18-git4/drivers/scsi/libsas/sas_expander.c
--- linux-2.6.18-git4-orig/drivers/scsi/libsas/sas_expander.c 2006-09-26 07:42:43.000000000 -0700
+++ linux-2.6.18-git4/drivers/scsi/libsas/sas_expander.c 2006-09-26 08:28:50.000000000 -0700
@@ -24,6 +24,7 @@
#include <linux/pci.h>
#include <linux/scatterlist.h>
+#include <linux/delay.h>
#include "sas_internal.h"
@@ -1760,23 +1761,34 @@ out:
int sas_ex_revalidate_domain(struct domain_device *port_dev)
{
int res;
+ int i;
struct domain_device *dev = NULL;
- res = sas_find_bcast_dev(port_dev, &dev);
- if (res)
- goto out;
- if (dev) {
- struct expander_device *ex = &dev->ex_dev;
- int i = 0, phy_id;
-
- do {
- phy_id = -1;
- res = sas_find_bcast_phy(dev, &phy_id, i);
- if (phy_id == -1)
- break;
- res = sas_rediscover(dev, phy_id);
- i = phy_id + 1;
- } while (i < ex->num_phys);
+
+ /* Some expanders seem to generate a BROADCAST primitive before
+ * they actually update their own "expander change count" field!
+ * If we didn't find a device that caused the BROADCAST
+ * primitive, let us wait and retry.
+ */
+ for (i = 0; i < 2; i++) {
+ res = sas_find_bcast_dev(port_dev, &dev);
+ if (res)
+ goto out;
+ if (dev) {
+ struct expander_device *ex = &dev->ex_dev;
+ int i = 0, phy_id;
+
+ do {
+ phy_id = -1;
+ res = sas_find_bcast_phy(dev, &phy_id, i);
+ if (phy_id == -1)
+ break;
+ res = sas_rediscover(dev, phy_id);
+ i = phy_id + 1;
+ } while (i < ex->num_phys);
+ break;
+ }
+ ssleep(1);
}
out:
return res;
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] aic94xx: Hotplug ex_change_count race fix
2006-09-26 22:05 [PATCH] aic94xx: Hotplug ex_change_count race fix Alexis Bruemmer
@ 2006-10-03 14:19 ` James Bottomley
2006-10-03 15:30 ` Alexis Bruemmer
2006-10-03 20:52 ` Luben Tuikov
0 siblings, 2 replies; 8+ messages in thread
From: James Bottomley @ 2006-10-03 14:19 UTC (permalink / raw)
To: Alexis Bruemmer; +Cc: linux-scsi
On Tue, 2006-09-26 at 15:05 -0700, Alexis Bruemmer wrote:
> In some cases while hotplugging disks on a system with an expander the
> broadcast primitive will be posted and begin processing before the
> expander change count is updated. This causes the device that triggered
> the broadcast to not be found.
Thanks; I'll stick this in.
However, it has always struck me that this broadcast code is fragile
because of the way event processing works. If we get two fairly close
together broadcast events, we'll amalgamate them into a single event and
then stop processing as soon as we find one expander that changed, if
you want to look at sorting that out ...
James
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] aic94xx: Hotplug ex_change_count race fix
2006-10-03 14:19 ` James Bottomley
@ 2006-10-03 15:30 ` Alexis Bruemmer
2006-10-04 23:52 ` malahal
2006-10-03 20:52 ` Luben Tuikov
1 sibling, 1 reply; 8+ messages in thread
From: Alexis Bruemmer @ 2006-10-03 15:30 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi
On Tue, 2006-10-03 at 09:19 -0500, James Bottomley wrote:
> On Tue, 2006-09-26 at 15:05 -0700, Alexis Bruemmer wrote:
> > In some cases while hotplugging disks on a system with an expander the
> > broadcast primitive will be posted and begin processing before the
> > expander change count is updated. This causes the device that triggered
> > the broadcast to not be found.
>
> Thanks; I'll stick this in.
>
> However, it has always struck me that this broadcast code is fragile
> because of the way event processing works. If we get two fairly close
> together broadcast events, we'll amalgamate them into a single event and
> then stop processing as soon as we find one expander that changed, if
> you want to look at sorting that out ...
I will look into it.
--Alexis
>
> James
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] aic94xx: Hotplug ex_change_count race fix
2006-10-03 15:30 ` Alexis Bruemmer
@ 2006-10-04 23:52 ` malahal
0 siblings, 0 replies; 8+ messages in thread
From: malahal @ 2006-10-04 23:52 UTC (permalink / raw)
To: Alexis Bruemmer; +Cc: James Bottomley, linux-scsi
Yes, I noticed that we stop processing as soon as we find a device that
has change count. We should go through all the devices that have change
counts. At least, in FCP world an RSCN (similar to BROADCAST in SAS)
may be delayed by the Fabric to collect few changes.
Thanks, Malahal.
Alexis Bruemmer [alexisb@us.ibm.com] wrote:
> On Tue, 2006-10-03 at 09:19 -0500, James Bottomley wrote:
> > On Tue, 2006-09-26 at 15:05 -0700, Alexis Bruemmer wrote:
> > > In some cases while hotplugging disks on a system with an expander the
> > > broadcast primitive will be posted and begin processing before the
> > > expander change count is updated. This causes the device that triggered
> > > the broadcast to not be found.
> >
> > Thanks; I'll stick this in.
> >
> > However, it has always struck me that this broadcast code is fragile
> > because of the way event processing works. If we get two fairly close
> > together broadcast events, we'll amalgamate them into a single event and
> > then stop processing as soon as we find one expander that changed, if
> > you want to look at sorting that out ...
>
> I will look into it.
>
> --Alexis
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] aic94xx: Hotplug ex_change_count race fix
2006-10-03 14:19 ` James Bottomley
2006-10-03 15:30 ` Alexis Bruemmer
@ 2006-10-03 20:52 ` Luben Tuikov
2006-10-03 21:18 ` Jeff Garzik
1 sibling, 1 reply; 8+ messages in thread
From: Luben Tuikov @ 2006-10-03 20:52 UTC (permalink / raw)
To: James Bottomley, Alexis Bruemmer; +Cc: linux-scsi
--- James Bottomley <James.Bottomley@SteelEye.com> wrote:
> On Tue, 2006-09-26 at 15:05 -0700, Alexis Bruemmer wrote:
> > In some cases while hotplugging disks on a system with an expander the
> > broadcast primitive will be posted and begin processing before the
> > expander change count is updated. This causes the device that triggered
> > the broadcast to not be found.
>
> Thanks; I'll stick this in.
>
> However, it has always struck me that this broadcast code is fragile
> because of the way event processing works. If we get two fairly close
> together broadcast events, we'll amalgamate them into a single event and
> then stop processing as soon as we find one expander that changed, if
> you want to look at sorting that out ...
You have to make a clarification: "the way event processing works" means
the way it works after you removed all of event processing from my code the
way I wrote it, and added a very naive "event processing" if it can be
called that.
Event processing seems to work just fine with my version of my code the
way I maintain it, and the way you had it originally on this list.
Just clarifying in case someone gets confused reading the code.
>
> James
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] aic94xx: Hotplug ex_change_count race fix
2006-10-03 20:52 ` Luben Tuikov
@ 2006-10-03 21:18 ` Jeff Garzik
2006-10-04 2:29 ` Douglas Gilbert
2006-10-04 7:39 ` Luben Tuikov
0 siblings, 2 replies; 8+ messages in thread
From: Jeff Garzik @ 2006-10-03 21:18 UTC (permalink / raw)
To: ltuikov; +Cc: James Bottomley, Alexis Bruemmer, linux-scsi
Luben Tuikov wrote:
> You have to make a clarification: "the way event processing works" means
> the way it works after you removed all of event processing from my code the
> way I wrote it, and added a very naive "event processing" if it can be
> called that.
>
> Event processing seems to work just fine with my version of my code the
> way I maintain it, and the way you had it originally on this list.
>
> Just clarifying in case someone gets confused reading the code.
And we were all waiting breathlessly for this clarification too, I
assure you.
In the dictionary under the phrase "sore loser", it says "see: Luben"
Jeff
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] aic94xx: Hotplug ex_change_count race fix
2006-10-03 21:18 ` Jeff Garzik
@ 2006-10-04 2:29 ` Douglas Gilbert
2006-10-04 7:39 ` Luben Tuikov
1 sibling, 0 replies; 8+ messages in thread
From: Douglas Gilbert @ 2006-10-04 2:29 UTC (permalink / raw)
To: Jeff Garzik; +Cc: ltuikov, James Bottomley, Alexis Bruemmer, linux-scsi
Jeff Garzik wrote:
> Luben Tuikov wrote:
>> You have to make a clarification: "the way event processing works" means
>> the way it works after you removed all of event processing from my
>> code the
>> way I wrote it, and added a very naive "event processing" if it can be
>> called that.
>>
>> Event processing seems to work just fine with my version of my code the
>> way I maintain it, and the way you had it originally on this list.
>>
>> Just clarifying in case someone gets confused reading the code.
>
> And we were all waiting breathlessly for this clarification too, I
> assure you.
>
> In the dictionary under the phrase "sore loser", it says "see: Luben"
I work with both drivers and provide feedback to their
maintainers. IMO the loser is "linux".
One handicap the official driver has is the SATL it has
decided to use.
... back to my redmond os for testing ...
Doug Gilbert
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] aic94xx: Hotplug ex_change_count race fix
2006-10-03 21:18 ` Jeff Garzik
2006-10-04 2:29 ` Douglas Gilbert
@ 2006-10-04 7:39 ` Luben Tuikov
1 sibling, 0 replies; 8+ messages in thread
From: Luben Tuikov @ 2006-10-04 7:39 UTC (permalink / raw)
To: Jeff Garzik; +Cc: James Bottomley, Alexis Bruemmer, linux-scsi
--- Jeff Garzik <jeff@garzik.org> wrote:
> Luben Tuikov wrote:
> > You have to make a clarification: "the way event processing works" means
> > the way it works after you removed all of event processing from my code the
> > way I wrote it, and added a very naive "event processing" if it can be
> > called that.
> >
> > Event processing seems to work just fine with my version of my code the
> > way I maintain it, and the way you had it originally on this list.
> >
> > Just clarifying in case someone gets confused reading the code.
>
> And we were all waiting breathlessly for this clarification too, I
> assure you.
>
> In the dictionary under the phrase "sore loser", it says "see: Luben"
Sorry that you feel this way Jeff and that your have to resort
to personal attacks. It makes me doubt your professionalism.
Merely on technical grounds and the fact that the code still
bears my copyright, I wanted to mention that all those bugs
being posted against aic94xx and the SAS Stack, and "fixed" by
Bottomley and LTC are not due to the reason that I wrote the
code with bugs, but due to the fact of Bottomley blocking the code
from going in, then him and LTC removing large portions of perfectly well
working and tested code (SAS analysed and complient) with their
own versions and interpretations, only later to "fix" the very code
they submitted as a substition of a perfectly working code I wrote
that they removed.
For example, event management is one such:
http://marc.theaimsgroup.com/?l=linux-scsi&m=115895097501842&w=2
Another example, commit 2908d778ab3e244900c310974e1fc1c69066e450,
http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2908d778ab3e244900c310974e1fc1c69066e450
has things in it like this:
...
o Preliminary expander support for aic94xx
...
All the while aic94xx and the SAS Stack had always supported
expanders. There is just a lot of misleading information,
and I don't want the casual reader to believe that aic94xx
and the SAS Stack as I posted them didn't support expanders
all the while they did.
And I'm sure you don't want such confusion too and as a member
of the open source community will do everything in your power
to disambiguate that. For example mentioning the announcement
emails of the very first posting of the SAS Stack and aic94xx:
http://marc.theaimsgroup.com/?l=linux-scsi&m=112629423714248&w=2
http://marc.theaimsgroup.com/?l=linux-scsi&m=112629509826900&w=2
http://marc.theaimsgroup.com/?l=linux-scsi&m=112629509318354&w=2
The tricky point is that aic94xx and the SAS Stack were
*NOT* committed into git as I had posted them, but were
first _edited_ and then "fixed" by what you see in
commit 2908d778ab3e244900c310974e1fc1c69066e450,
http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2908d778ab3e244900c310974e1fc1c69066e450
and _then_ commited into git as you'd see them starting
with the commit id quoted above.
Surely there are a lot of people just reading this mailing list,
interested in where storage is going with Linux. You don't want
to mislead them... Not to mention SATL support.
I should probably post my git trees somewhere? The SAS Stack
now also includes enterprise-grade per-device SATL support.
I'm sure you'd enjoy that the most. ;-)
>
> Jeff
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2006-10-04 23:53 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-09-26 22:05 [PATCH] aic94xx: Hotplug ex_change_count race fix Alexis Bruemmer
2006-10-03 14:19 ` James Bottomley
2006-10-03 15:30 ` Alexis Bruemmer
2006-10-04 23:52 ` malahal
2006-10-03 20:52 ` Luben Tuikov
2006-10-03 21:18 ` Jeff Garzik
2006-10-04 2:29 ` Douglas Gilbert
2006-10-04 7:39 ` Luben Tuikov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox