* [PATCH] sym53_8xx_2: fixes two bugs related to chip reset
@ 2008-01-09 22:59 Krzysztof Helt
2008-01-09 23:51 ` James Bottomley
0 siblings, 1 reply; 7+ messages in thread
From: Krzysztof Helt @ 2008-01-09 22:59 UTC (permalink / raw)
To: linux-scsi, James Bottomley; +Cc: matthew
From: Krzysztof Helt <krzysztof.h1@wp.pl>
This patch fixes two bugs pointed by James Bottomley:
1. the if (!sym_data->io_reset). That variable is only ever filled
by a stack based completion. If we find it non empty it means
this code has been entered twice and we have a severe problem,
so that should just become a BUG_ON(!sym_data->io_reset).
2. sym_data->io_reset should be set to NULL before the routine is
exited otherwise the PCI recovery code could end up completing
what will be a bogus pointer into the stack.
Big thanks to James Bottomley for help with the patch.
Signed-off-by: Krzysztof Helt <krzysztof.h1@w.pl>
---
I do not know if I understood correctly all James' tips.
diff -urp linux-ref/drivers/scsi/sym53c8xx_2/sym_glue.c linux-new/drivers/scsi/sym53c8xx_2/sym_glue.c
--- linux-ref/drivers/scsi/sym53c8xx_2/sym_glue.c 2007-12-23 20:39:44.000000000 +0100
+++ linux-new/drivers/scsi/sym53c8xx_2/sym_glue.c 2008-01-09 22:22:30.000000000 +0100
@@ -609,22 +609,22 @@ static int sym_eh_handler(int op, char *
*/
#define WAIT_FOR_PCI_RECOVERY 35
if (pci_channel_offline(pdev)) {
- struct completion *io_reset;
int finished_reset = 0;
init_completion(&eh_done);
spin_lock_irq(shost->host_lock);
/* Make sure we didn't race */
if (pci_channel_offline(pdev)) {
- if (!sym_data->io_reset)
- sym_data->io_reset = &eh_done;
- io_reset = sym_data->io_reset;
+ BUG_ON(!sym_data->io_reset);
+ sym_data->io_reset = &eh_done;
} else {
finished_reset = 1;
}
spin_unlock_irq(shost->host_lock);
if (!finished_reset)
- finished_reset = wait_for_completion_timeout(io_reset,
+ finished_reset = wait_for_completion_timeout
+ (sym_data->io_reset,
WAIT_FOR_PCI_RECOVERY*HZ);
+ sym_data->io_reset = NULL;
if (!finished_reset)
return SCSI_FAILED;
}
@@ -1879,7 +1879,6 @@ static void sym2_io_resume(struct pci_de
spin_lock_irq(shost->host_lock);
if (sym_data->io_reset)
complete_all(sym_data->io_reset);
- sym_data->io_reset = NULL;
spin_unlock_irq(shost->host_lock);
}
----------------------------------------------------------------------
Sprawdz, ktore komorki sa najmodniejsze!
Kliknij >>> http://link.interia.pl/f1cd4
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] sym53_8xx_2: fixes two bugs related to chip reset
2008-01-09 22:59 [PATCH] sym53_8xx_2: fixes two bugs related to chip reset Krzysztof Helt
@ 2008-01-09 23:51 ` James Bottomley
2008-01-10 22:31 ` Krzysztof Helt
0 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2008-01-09 23:51 UTC (permalink / raw)
To: Krzysztof Helt; +Cc: linux-scsi, matthew
On Wed, 2008-01-09 at 23:59 +0100, Krzysztof Helt wrote:
> From: Krzysztof Helt <krzysztof.h1@wp.pl>
>
> This patch fixes two bugs pointed by James Bottomley:
>
> 1. the if (!sym_data->io_reset). That variable is only ever filled
> by a stack based completion. If we find it non empty it means
> this code has been entered twice and we have a severe problem,
> so that should just become a BUG_ON(!sym_data->io_reset).
> 2. sym_data->io_reset should be set to NULL before the routine is
> exited otherwise the PCI recovery code could end up completing
> what will be a bogus pointer into the stack.
>
> Big thanks to James Bottomley for help with the patch.
>
> Signed-off-by: Krzysztof Helt <krzysztof.h1@w.pl>
Well done .. there's actually just one problem remaining:
> ---
> I do not know if I understood correctly all James' tips.
>
> diff -urp linux-ref/drivers/scsi/sym53c8xx_2/sym_glue.c linux-new/drivers/scsi/sym53c8xx_2/sym_glue.c
> --- linux-ref/drivers/scsi/sym53c8xx_2/sym_glue.c 2007-12-23 20:39:44.000000000 +0100
> +++ linux-new/drivers/scsi/sym53c8xx_2/sym_glue.c 2008-01-09 22:22:30.000000000 +0100
> @@ -609,22 +609,22 @@ static int sym_eh_handler(int op, char *
> */
> #define WAIT_FOR_PCI_RECOVERY 35
> if (pci_channel_offline(pdev)) {
> - struct completion *io_reset;
> int finished_reset = 0;
> init_completion(&eh_done);
> spin_lock_irq(shost->host_lock);
> /* Make sure we didn't race */
> if (pci_channel_offline(pdev)) {
> - if (!sym_data->io_reset)
> - sym_data->io_reset = &eh_done;
> - io_reset = sym_data->io_reset;
> + BUG_ON(!sym_data->io_reset);
> + sym_data->io_reset = &eh_done;
> } else {
> finished_reset = 1;
> }
> spin_unlock_irq(shost->host_lock);
> if (!finished_reset)
> - finished_reset = wait_for_completion_timeout(io_reset,
> + finished_reset = wait_for_completion_timeout
> + (sym_data->io_reset,
> WAIT_FOR_PCI_RECOVERY*HZ);
> + sym_data->io_reset = NULL;
This has to be cleared under the host_lock to forestall the (tiny) race
where the pci recovery code checks the value of sym_data->io_reset, we
change it to null and then the pci recovery code completes a NULL
pointer.
Other than this one problem, the code looks fine.
James
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] sym53_8xx_2: fixes two bugs related to chip reset
2008-01-09 23:51 ` James Bottomley
@ 2008-01-10 22:31 ` Krzysztof Helt
2008-01-10 22:47 ` James Bottomley
0 siblings, 1 reply; 7+ messages in thread
From: Krzysztof Helt @ 2008-01-10 22:31 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi, matthew
On Wed, 09 Jan 2008 17:51:43 -0600
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> > diff -urp linux-ref/drivers/scsi/sym53c8xx_2/sym_glue.c linux-new/drivers/scsi/sym53c8xx_2/sym_glue.c
> > --- linux-ref/drivers/scsi/sym53c8xx_2/sym_glue.c 2007-12-23 20:39:44.000000000 +0100
> > +++ linux-new/drivers/scsi/sym53c8xx_2/sym_glue.c 2008-01-09 22:22:30.000000000 +0100
> > @@ -609,22 +609,22 @@ static int sym_eh_handler(int op, char *
> > */
> > #define WAIT_FOR_PCI_RECOVERY 35
> > if (pci_channel_offline(pdev)) {
> > - struct completion *io_reset;
> > int finished_reset = 0;
> > init_completion(&eh_done);
> > spin_lock_irq(shost->host_lock);
> > /* Make sure we didn't race */
> > if (pci_channel_offline(pdev)) {
> > - if (!sym_data->io_reset)
> > - sym_data->io_reset = &eh_done;
> > - io_reset = sym_data->io_reset;
> > + BUG_ON(!sym_data->io_reset);
> > + sym_data->io_reset = &eh_done;
> > } else {
> > finished_reset = 1;
> > }
> > spin_unlock_irq(shost->host_lock);
> > if (!finished_reset)
> > - finished_reset = wait_for_completion_timeout(io_reset,
> > + finished_reset = wait_for_completion_timeout
> > + (sym_data->io_reset,
> > WAIT_FOR_PCI_RECOVERY*HZ);
> > + sym_data->io_reset = NULL;
>
> This has to be cleared under the host_lock to forestall the (tiny) race
> where the pci recovery code checks the value of sym_data->io_reset, we
> change it to null and then the pci recovery code completes a NULL
> pointer.
>
It is impossible as the io_reset value is not NULL before and during wait
completion. The case above would happen only if one thread checked the
io_reset value (under lock) and it was NULL and before setting it (inside
locked section) another thread checked the io_reset value (still NULL
and also inside locked section = impossible). Otherwise, the BUG_ON()
kicked in (the value is already not NULL).
Another case is if you consider changing the io_reset value after the
locked section but before wait_for_completion_timeout(). In this case,
putting spinlock around the io_reset clearing does not change anything.
There is still a chance of race if the io_reset is cleared after one thread
leaves the locked section then another one clearing the io_reset (under lock)
then completion by the first thread happens with NULL pointer.
Am I right? I understand that you asked for change like this:
+ spin_lock_irq(shost->host_lock);
+ sym_data->io_reset = NULL;
+ spin_unlock_irq(shost->host_lock);
Maybe, better solution is to clear the io_reset field inside the
sym2_io_resume() (as it was done)? It would be cleared only
after the completion and before it is cleared the BUG_ON() guards
against race. The 2nd race case described above is impossible
in such solution.
However, another version of the patch is needed as the BUG_ON condition
should be BUG_ON(sym_data->io_reset) and not
BUG_ON(!sym_data->io_reset).
So, I wait for your opinion to the lock issue. If you still consider version with
the change presented above better, I'll add it.
Kind regards,
Krzysztof
----------------------------------------------------------------------
Rozdajemy nagrody!
Sprawdz >> http://link.interia.pl/f1cbf
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] sym53_8xx_2: fixes two bugs related to chip reset
2008-01-10 22:31 ` Krzysztof Helt
@ 2008-01-10 22:47 ` James Bottomley
2008-01-11 20:50 ` Krzysztof Helt
0 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2008-01-10 22:47 UTC (permalink / raw)
To: Krzysztof Helt; +Cc: linux-scsi, matthew
On Thu, 2008-01-10 at 23:31 +0100, Krzysztof Helt wrote:
> On Wed, 09 Jan 2008 17:51:43 -0600
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> > > diff -urp linux-ref/drivers/scsi/sym53c8xx_2/sym_glue.c linux-new/drivers/scsi/sym53c8xx_2/sym_glue.c
> > > --- linux-ref/drivers/scsi/sym53c8xx_2/sym_glue.c 2007-12-23 20:39:44.000000000 +0100
> > > +++ linux-new/drivers/scsi/sym53c8xx_2/sym_glue.c 2008-01-09 22:22:30.000000000 +0100
> > > @@ -609,22 +609,22 @@ static int sym_eh_handler(int op, char *
> > > */
> > > #define WAIT_FOR_PCI_RECOVERY 35
> > > if (pci_channel_offline(pdev)) {
> > > - struct completion *io_reset;
> > > int finished_reset = 0;
> > > init_completion(&eh_done);
> > > spin_lock_irq(shost->host_lock);
> > > /* Make sure we didn't race */
> > > if (pci_channel_offline(pdev)) {
> > > - if (!sym_data->io_reset)
> > > - sym_data->io_reset = &eh_done;
> > > - io_reset = sym_data->io_reset;
> > > + BUG_ON(!sym_data->io_reset);
> > > + sym_data->io_reset = &eh_done;
> > > } else {
> > > finished_reset = 1;
> > > }
> > > spin_unlock_irq(shost->host_lock);
> > > if (!finished_reset)
> > > - finished_reset = wait_for_completion_timeout(io_reset,
> > > + finished_reset = wait_for_completion_timeout
> > > + (sym_data->io_reset,
> > > WAIT_FOR_PCI_RECOVERY*HZ);
> > > + sym_data->io_reset = NULL;
> >
> > This has to be cleared under the host_lock to forestall the (tiny) race
> > where the pci recovery code checks the value of sym_data->io_reset, we
> > change it to null and then the pci recovery code completes a NULL
> > pointer.
> >
>
> It is impossible as the io_reset value is not NULL before and during wait
> completion. The case above would happen only if one thread checked the
> io_reset value (under lock) and it was NULL and before setting it (inside
> locked section) another thread checked the io_reset value (still NULL
> and also inside locked section = impossible). Otherwise, the BUG_ON()
> kicked in (the value is already not NULL).
Er, no. Let me be clearer about the sequence
sym2_io_resume() is racing to complete sym_data->io_reset with
sym_eh_handler()s timeout. Because you don't set it to NULL under the
host_lock, you can get the sequence
sym2_io_resume() acquires the host lock and checks the value of
sym_data->io_reset and finds it to be not NULL.
sym_eh_handler() NULLs sym_data->io_resume *without* acquiring the host
lock. probably because the wait__for_completion_timeout() times out.
sym2_io_resume() calls complete() on sym_data->io_resume which is now a
NULL pointer and boom.
We never get multiple threads into sym_eh_handler() because it's single
threaded from the error handler thread.
James
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] sym53_8xx_2: fixes two bugs related to chip reset
2008-01-10 22:47 ` James Bottomley
@ 2008-01-11 20:50 ` Krzysztof Helt
2008-01-11 20:52 ` Matthew Wilcox
0 siblings, 1 reply; 7+ messages in thread
From: Krzysztof Helt @ 2008-01-11 20:50 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi, matthew
From: Krzysztof Helt <krzysztof.h1@wp.pl>
This patch fixes two bugs pointed by James Bottomley:
1. the if (!sym_data->io_reset). That variable is only ever filled
by a stack based completion. If we find it non empty it means
this code has been entered twice and we have a severe problem,
so that should just become a BUG_ON(!sym_data->io_reset).
2. sym_data->io_reset should be set to NULL before the routine is
exited otherwise the PCI recovery code could end up completing
what will be a bogus pointer into the stack.
Big thanks to James Bottomley for help with the patch.
Signed-off-by: Krzysztof Helt <krzysztof.h1@w.pl>
---
This is the second version with clearing io_reset value
under lock. I haven't got that it was a race between resume
and handler functions.
diff -urp linux-ref/drivers/scsi/sym53c8xx_2/sym_glue.c linux-new/drivers/scsi/sym53c8xx_2/sym_glue.c
--- linux-ref/drivers/scsi/sym53c8xx_2/sym_glue.c 2007-12-23 20:39:44.000000000 +0100
+++ linux-new/drivers/scsi/sym53c8xx_2/sym_glue.c 2008-01-10 21:26:32.000000000 +0100
@@ -609,22 +609,24 @@ static int sym_eh_handler(int op, char *
*/
#define WAIT_FOR_PCI_RECOVERY 35
if (pci_channel_offline(pdev)) {
- struct completion *io_reset;
int finished_reset = 0;
init_completion(&eh_done);
spin_lock_irq(shost->host_lock);
/* Make sure we didn't race */
if (pci_channel_offline(pdev)) {
- if (!sym_data->io_reset)
- sym_data->io_reset = &eh_done;
- io_reset = sym_data->io_reset;
+ BUG_ON(!sym_data->io_reset);
+ sym_data->io_reset = &eh_done;
} else {
finished_reset = 1;
}
spin_unlock_irq(shost->host_lock);
if (!finished_reset)
- finished_reset = wait_for_completion_timeout(io_reset,
+ finished_reset = wait_for_completion_timeout
+ (sym_data->io_reset,
WAIT_FOR_PCI_RECOVERY*HZ);
+ spin_lock_irq(shost->host_lock);
+ sym_data->io_reset = NULL;
+ spin_unlock_irq(shost->host_lock);
if (!finished_reset)
return SCSI_FAILED;
}
@@ -1879,7 +1881,6 @@ static void sym2_io_resume(struct pci_de
spin_lock_irq(shost->host_lock);
if (sym_data->io_reset)
complete_all(sym_data->io_reset);
- sym_data->io_reset = NULL;
spin_unlock_irq(shost->host_lock);
}
----------------------------------------------------------------------
Rozdajemy nagrody!
Sprawdz >> http://link.interia.pl/f1cbf
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] sym53_8xx_2: fixes two bugs related to chip reset
2008-01-11 20:50 ` Krzysztof Helt
@ 2008-01-11 20:52 ` Matthew Wilcox
2008-01-12 4:11 ` Krzysztof Helt
0 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2008-01-11 20:52 UTC (permalink / raw)
To: Krzysztof Helt; +Cc: James Bottomley, linux-scsi
On Fri, Jan 11, 2008 at 09:50:46PM +0100, Krzysztof Helt wrote:
> + BUG_ON(!sym_data->io_reset);
> + sym_data->io_reset = &eh_done;
Isn't that BUG_ON still the wrong sense?
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] sym53_8xx_2: fixes two bugs related to chip reset
2008-01-11 20:52 ` Matthew Wilcox
@ 2008-01-12 4:11 ` Krzysztof Helt
0 siblings, 0 replies; 7+ messages in thread
From: Krzysztof Helt @ 2008-01-12 4:11 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: James Bottomley, linux-scsi
On Fri, 11 Jan 2008 13:52:29 -0700
Matthew Wilcox <matthew@wil.cx> wrote:
> On Fri, Jan 11, 2008 at 09:50:46PM +0100, Krzysztof Helt wrote:
> > + BUG_ON(!sym_data->io_reset);
> > + sym_data->io_reset = &eh_done;
>
> Isn't that BUG_ON still the wrong sense?
>
Unfortunately yes.
Here is the 3rd version of the patch. Thank you for your patience.
---
From: Krzysztof Helt <krzysztof.h1@wp.pl>
This patch fixes two bugs pointed by James Bottomley:
1. the if (!sym_data->io_reset). That variable is only ever filled
by a stack based completion. If we find it non empty it means
this code has been entered twice and we have a severe problem,
so that should just become a BUG_ON(!sym_data->io_reset).
2. sym_data->io_reset should be set to NULL before the routine is
exited otherwise the PCI recovery code could end up completing
what will be a bogus pointer into the stack.
Big thanks to James Bottomley for help with the patch.
Signed-off-by: Krzysztof Helt <krzysztof.h1@w.pl>
---
diff -urp linux-ref/drivers/scsi/sym53c8xx_2/sym_glue.c linux-new/drivers/scsi/sym53c8xx_2/sym_glue.c
--- linux-ref/drivers/scsi/sym53c8xx_2/sym_glue.c 2007-12-23 20:39:44.000000000 +0100
+++ linux-new/drivers/scsi/sym53c8xx_2/sym_glue.c 2008-01-10 21:26:32.000000000 +0100
@@ -609,22 +609,24 @@ static int sym_eh_handler(int op, char *
*/
#define WAIT_FOR_PCI_RECOVERY 35
if (pci_channel_offline(pdev)) {
- struct completion *io_reset;
int finished_reset = 0;
init_completion(&eh_done);
spin_lock_irq(shost->host_lock);
/* Make sure we didn't race */
if (pci_channel_offline(pdev)) {
- if (!sym_data->io_reset)
- sym_data->io_reset = &eh_done;
- io_reset = sym_data->io_reset;
+ BUG_ON(sym_data->io_reset);
+ sym_data->io_reset = &eh_done;
} else {
finished_reset = 1;
}
spin_unlock_irq(shost->host_lock);
if (!finished_reset)
- finished_reset = wait_for_completion_timeout(io_reset,
+ finished_reset = wait_for_completion_timeout
+ (sym_data->io_reset,
WAIT_FOR_PCI_RECOVERY*HZ);
+ spin_lock_irq(shost->host_lock);
+ sym_data->io_reset = NULL;
+ spin_unlock_irq(shost->host_lock);
if (!finished_reset)
return SCSI_FAILED;
}
@@ -1879,7 +1881,6 @@ static void sym2_io_resume(struct pci_de
spin_lock_irq(shost->host_lock);
if (sym_data->io_reset)
complete_all(sym_data->io_reset);
- sym_data->io_reset = NULL;
spin_unlock_irq(shost->host_lock);
}
----------------------------------------------------------------------
Rozdajemy nagrody!
Sprawdz >> http://link.interia.pl/f1cbf
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-01-12 4:09 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-09 22:59 [PATCH] sym53_8xx_2: fixes two bugs related to chip reset Krzysztof Helt
2008-01-09 23:51 ` James Bottomley
2008-01-10 22:31 ` Krzysztof Helt
2008-01-10 22:47 ` James Bottomley
2008-01-11 20:50 ` Krzysztof Helt
2008-01-11 20:52 ` Matthew Wilcox
2008-01-12 4:11 ` Krzysztof Helt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox