linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 1/7] git-scsi-misc gdth fix
@ 2007-10-16 21:28 akpm
  2007-10-17 12:28 ` James Bottomley
  0 siblings, 1 reply; 4+ messages in thread
From: akpm @ 2007-10-16 21:28 UTC (permalink / raw)
  Cc: linux-scsi, akpm, James.Bottomley, davemilter

From: James Bottomley <James.Bottomley@SteelEye.com>

On Sun, 2007-10-14 at 12:21 -0700, Andrew Morton wrote:
> On Sun, 14 Oct 2007 22:45:47 +0400 "Dave Milter" <davemilter@gmail.com> wrote:
>
> > I build linux-2.6.23-mm1 and try to boot it using qemu,
> > and it crashed with trace like this:
> > do_page_fault
> > error_code
> > lock_acquire
> > _spin_lock_irqsave
> > gdth_timeout
> > run_timer_softirq
> > __do_softirq
> > do_softirq
> >
> > I have screenshot, but have no idea, is it legal to include it, if I
> > sent copy to lkml.
> > config of kernel in attachment,
> > I apply all three patches from hot-fixes.
> >
>
> The screenshot is here:  http://userweb.kernel.org/~akpm/crash.png
>
> It would appear that gdth_timeout() is passing a bad pointer into
> spin_lock_irqsave().

There's a bug in the gdth rework in that the instance can be deleted
from the list before the actual timer is stopped.  This can be worked
around I think by the following patch; although we really should be
stopping the timer from firing when the list goes empty.



Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/scsi/gdth.c |    3 +++
 1 file changed, 3 insertions(+)

diff -puN drivers/scsi/gdth.c~git-scsi-misc-gdth-fix drivers/scsi/gdth.c
--- a/drivers/scsi/gdth.c~git-scsi-misc-gdth-fix
+++ a/drivers/scsi/gdth.c
@@ -3793,6 +3793,9 @@ static void gdth_timeout(ulong data)
     gdth_ha_str *ha;
     ulong flags;
 
+    if (list_empty(&gdth_instances))
+	return;
+
     ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
     spin_lock_irqsave(&ha->smp_lock, flags);
 
_

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [patch 1/7] git-scsi-misc gdth fix
  2007-10-16 21:28 [patch 1/7] git-scsi-misc gdth fix akpm
@ 2007-10-17 12:28 ` James Bottomley
  2007-10-18 16:54   ` Boaz Harrosh
  2007-10-18 17:41   ` Boaz Harrosh
  0 siblings, 2 replies; 4+ messages in thread
From: James Bottomley @ 2007-10-17 12:28 UTC (permalink / raw)
  To: akpm, Boaz Harrosh; +Cc: linux-scsi, davemilter

On Tue, 2007-10-16 at 14:28 -0700, akpm@linux-foundation.org wrote:
> From: James Bottomley <James.Bottomley@SteelEye.com>
> 
> On Sun, 2007-10-14 at 12:21 -0700, Andrew Morton wrote:
> > On Sun, 14 Oct 2007 22:45:47 +0400 "Dave Milter" <davemilter@gmail.com> wrote:
> >
> > > I build linux-2.6.23-mm1 and try to boot it using qemu,
> > > and it crashed with trace like this:
> > > do_page_fault
> > > error_code
> > > lock_acquire
> > > _spin_lock_irqsave
> > > gdth_timeout
> > > run_timer_softirq
> > > __do_softirq
> > > do_softirq
> > >
> > > I have screenshot, but have no idea, is it legal to include it, if I
> > > sent copy to lkml.
> > > config of kernel in attachment,
> > > I apply all three patches from hot-fixes.
> > >
> >
> > The screenshot is here:  http://userweb.kernel.org/~akpm/crash.png
> >
> > It would appear that gdth_timeout() is passing a bad pointer into
> > spin_lock_irqsave().
> 
> There's a bug in the gdth rework in that the instance can be deleted
> from the list before the actual timer is stopped.  This can be worked
> around I think by the following patch; although we really should be
> stopping the timer from firing when the list goes empty.
> 
> 
> 
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  drivers/scsi/gdth.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff -puN drivers/scsi/gdth.c~git-scsi-misc-gdth-fix drivers/scsi/gdth.c
> --- a/drivers/scsi/gdth.c~git-scsi-misc-gdth-fix
> +++ a/drivers/scsi/gdth.c
> @@ -3793,6 +3793,9 @@ static void gdth_timeout(ulong data)
>      gdth_ha_str *ha;
>      ulong flags;
>  
> +    if (list_empty(&gdth_instances))
> +	return;
> +
>      ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
>      spin_lock_irqsave(&ha->smp_lock, flags);
>  

This is almost certainly the wrong fix for real hardware.  Although it
kills the timer when the list goes empty, nothing will ever restart it
when the list fills again.

Boaz, since you touched all of this, you get to fix it.  The correct fix
will be to control the timer along with the actual list instead of at
entry/exit time.  If you're not going to add this empty check to the
timer routine, make sure you use del_timer_sync() before removing the
last element from the list.

James



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [patch 1/7] git-scsi-misc gdth fix
  2007-10-17 12:28 ` James Bottomley
@ 2007-10-18 16:54   ` Boaz Harrosh
  2007-10-18 17:41   ` Boaz Harrosh
  1 sibling, 0 replies; 4+ messages in thread
From: Boaz Harrosh @ 2007-10-18 16:54 UTC (permalink / raw)
  To: James Bottomley; +Cc: akpm, linux-scsi, davemilter

On Wed, Oct 17 2007 at 14:28 +0200, James Bottomley <James.Bottomley@SteelEye.com> wrote:
> On Tue, 2007-10-16 at 14:28 -0700, akpm@linux-foundation.org wrote:
>> From: James Bottomley <James.Bottomley@SteelEye.com>
>>
>> On Sun, 2007-10-14 at 12:21 -0700, Andrew Morton wrote:
>>> On Sun, 14 Oct 2007 22:45:47 +0400 "Dave Milter" <davemilter@gmail.com> wrote:
>>>
>>>> I build linux-2.6.23-mm1 and try to boot it using qemu,
>>>> and it crashed with trace like this:
>>>> do_page_fault
>>>> error_code
>>>> lock_acquire
>>>> _spin_lock_irqsave
>>>> gdth_timeout
>>>> run_timer_softirq
>>>> __do_softirq
>>>> do_softirq
>>>>
>>>> I have screenshot, but have no idea, is it legal to include it, if I
>>>> sent copy to lkml.
>>>> config of kernel in attachment,
>>>> I apply all three patches from hot-fixes.
>>>>
>>> The screenshot is here:  http://userweb.kernel.org/~akpm/crash.png
>>>
>>> It would appear that gdth_timeout() is passing a bad pointer into
>>> spin_lock_irqsave().
>> There's a bug in the gdth rework in that the instance can be deleted
>> from the list before the actual timer is stopped.  This can be worked
>> around I think by the following patch; although we really should be
>> stopping the timer from firing when the list goes empty.
>>
>>
>>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> ---
>>
>>  drivers/scsi/gdth.c |    3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff -puN drivers/scsi/gdth.c~git-scsi-misc-gdth-fix drivers/scsi/gdth.c
>> --- a/drivers/scsi/gdth.c~git-scsi-misc-gdth-fix
>> +++ a/drivers/scsi/gdth.c
>> @@ -3793,6 +3793,9 @@ static void gdth_timeout(ulong data)
>>      gdth_ha_str *ha;
>>      ulong flags;
>>  
>> +    if (list_empty(&gdth_instances))
>> +	return;
>> +
>>      ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
>>      spin_lock_irqsave(&ha->smp_lock, flags);
>>  
> 
> This is almost certainly the wrong fix for real hardware.  Although it
> kills the timer when the list goes empty, nothing will ever restart it
> when the list fills again.
> 
> Boaz, since you touched all of this, you get to fix it.  The correct fix
> will be to control the timer along with the actual list instead of at
> entry/exit time.  If you're not going to add this empty check to the
> timer routine, make sure you use del_timer_sync() before removing the
> last element from the list.
> 
> James
> 
> 
OK I'm on it give me 24h

Boaz


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [patch 1/7] git-scsi-misc gdth fix
  2007-10-17 12:28 ` James Bottomley
  2007-10-18 16:54   ` Boaz Harrosh
@ 2007-10-18 17:41   ` Boaz Harrosh
  1 sibling, 0 replies; 4+ messages in thread
From: Boaz Harrosh @ 2007-10-18 17:41 UTC (permalink / raw)
  To: James Bottomley, davemilter; +Cc: akpm, linux-scsi

On Wed, Oct 17 2007 at 14:28 +0200, James Bottomley <James.Bottomley@SteelEye.com> wrote:
> On Tue, 2007-10-16 at 14:28 -0700, akpm@linux-foundation.org wrote:
>> From: James Bottomley <James.Bottomley@SteelEye.com>
>>
>> On Sun, 2007-10-14 at 12:21 -0700, Andrew Morton wrote:
>>> On Sun, 14 Oct 2007 22:45:47 +0400 "Dave Milter" <davemilter@gmail.com> wrote:
>>>
>>>> I build linux-2.6.23-mm1 and try to boot it using qemu,
>>>> and it crashed with trace like this:
>>>> do_page_fault
>>>> error_code
>>>> lock_acquire
>>>> _spin_lock_irqsave
>>>> gdth_timeout
>>>> run_timer_softirq
>>>> __do_softirq
>>>> do_softirq
>>>>
>>>> I have screenshot, but have no idea, is it legal to include it, if I
>>>> sent copy to lkml.
>>>> config of kernel in attachment,
>>>> I apply all three patches from hot-fixes.
>>>>
>>> The screenshot is here:  http://userweb.kernel.org/~akpm/crash.png
>>>
>>> It would appear that gdth_timeout() is passing a bad pointer into
>>> spin_lock_irqsave().
>> There's a bug in the gdth rework in that the instance can be deleted
>> from the list before the actual timer is stopped.  This can be worked
>> around I think by the following patch; although we really should be
>> stopping the timer from firing when the list goes empty.
>>
>>
>>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> ---
>>
>>  drivers/scsi/gdth.c |    3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff -puN drivers/scsi/gdth.c~git-scsi-misc-gdth-fix drivers/scsi/gdth.c
>> --- a/drivers/scsi/gdth.c~git-scsi-misc-gdth-fix
>> +++ a/drivers/scsi/gdth.c
>> @@ -3793,6 +3793,9 @@ static void gdth_timeout(ulong data)
>>      gdth_ha_str *ha;
>>      ulong flags;
>>  
>> +    if (list_empty(&gdth_instances))
>> +	return;
>> +
>>      ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
>>      spin_lock_irqsave(&ha->smp_lock, flags);
>>  
> 
> This is almost certainly the wrong fix for real hardware.  Although it
> kills the timer when the list goes empty, nothing will ever restart it
> when the list fills again.
> 
> Boaz, since you touched all of this, you get to fix it.  The correct fix
> will be to control the timer along with the actual list instead of at
> entry/exit time.  If you're not going to add this empty check to the
> timer routine, make sure you use del_timer_sync() before removing the
> last element from the list.
> 
> James
> 
> 
OK I can see the crash, but than I do not understand
why the fix above works. Let me explain

gdth is not yet hotplug. the only place that deletes
hosts is in the module_exit(gdth_exit) routine.
Now, the code does not do a list_del() anywhere it will
just deallocate the hosts and only after that delete the
timer. Now, suppose a timer gets through between the time
hosts are deallocated and the timer is removed. The list
is not empty it points to garbage memory, yes. But the
check above will not fix it. See my problem.

please try below patch, and report if the WARN_ON I put
triggers. If it works I will send a proper patch later
with a one that does a list_del() too.
(this is with out the patch above)

-------------
diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
index 3ac080e..b84a869 100644
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -3793,6 +3793,11 @@ static void gdth_timeout(ulong data)
     gdth_ha_str *ha;
     ulong flags;
 
+    if (list_empty(&gdth_instances)) {
+        WARN_ON(1);
+        return;
+    }
+
     ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
     spin_lock_irqsave(&ha->smp_lock, flags);
 
@@ -5236,12 +5241,13 @@ static void __exit gdth_exit(void)
 {
 	gdth_ha_str *ha;
 
+#ifdef GDTH_STATISTICS
+	del_timer_sync(&gdth_timer);
+#endif
+
 	list_for_each_entry(ha, &gdth_instances, list)
 		gdth_remove_one(ha);
 
-#ifdef GDTH_STATISTICS
-	del_timer(&gdth_timer);
-#endif
 	unregister_chrdev(major,"gdth");
 	unregister_reboot_notifier(&gdth_notifier);
 }




^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2007-10-18 17:41 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-16 21:28 [patch 1/7] git-scsi-misc gdth fix akpm
2007-10-17 12:28 ` James Bottomley
2007-10-18 16:54   ` Boaz Harrosh
2007-10-18 17:41   ` Boaz Harrosh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).