[OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII)

public inbox for linux-kernel@vger.kernel.org

* [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII)
@ 2002-08-14 11:54 Antti Salmela
  2002-08-14 12:37 ` Alan Cox
  0 siblings, 1 reply; 10+ messages in thread
From: Antti Salmela @ 2002-08-14 11:54 UTC (permalink / raw)
  To: linux-kernel

Oopsed soon after boot up. Stable with vanilla 2.4.19. The board is Intel
SDS2. dnetc was running.

ksymoops 2.4.5 on i686 2.4.19-rc5.  Options used
     -V (default)
     -K (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20-pre1-ac3 (specified)
     -m /boot/System.map-2.4.20-pre1-ac3 (specified)

No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel NULL pointer dereference at virtual address 0000002a
c0115d4c
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c0115d4c>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010003
eax: 0000008c   ebx: ffffffd6   ecx: c03268a4   edx: f721c000
esi: c0326880   edi: f721c02c   ebp: f721dfa4   esp: f721df88
ds: 0018   es: 0018   ss: 0018
Process distributed-net (pid: 521, stackpage=f721d000)
Stack: f721c000 00000000 f721c02c c0112f5f 00000000 f721c000 f721c000 f721dfbc 
       c0116cff f721c000 00000043 0003f7a0 c0326880 bffff944 c0106f4b 00000000 
       00000000 40026004 00000043 0003f7a0 bffff944 0000009e 0000002b 0000002b 
Call Trace:    [<c0112f5f>] [<c0116cff>] [<c0106f4b>]
Code: 8b 4b 54 89 4d f4 8b 72 58 85 c9 75 37 89 73 58 f0 ff 46 14 


>>EIP; c0115d4c <schedule+198/384>   <=====

>>ebx; ffffffd6 <END_OF_CODE+3fc5a89a/????>
>>ecx; c03268a4 <runqueues+24/14000>
>>edx; f721c000 <END_OF_CODE+36e768c4/????>
>>esi; c0326880 <runqueues+0/14000>
>>edi; f721c02c <END_OF_CODE+36e768f0/????>
>>ebp; f721dfa4 <END_OF_CODE+36e78868/????>
>>esp; f721df88 <END_OF_CODE+36e7884c/????>

Trace; c0112f5f <smp_apic_timer_interrupt+f3/114>
Trace; c0116cff <sys_sched_yield+113/11c>
Trace; c0106f4b <system_call+33/38>

Code;  c0115d4c <schedule+198/384>
00000000 <_EIP>:
Code;  c0115d4c <schedule+198/384>   <=====
   0:   8b 4b 54                  mov    0x54(%ebx),%ecx   <=====
Code;  c0115d4f <schedule+19b/384>
   3:   89 4d f4                  mov    %ecx,0xfffffff4(%ebp)
Code;  c0115d52 <schedule+19e/384>
   6:   8b 72 58                  mov    0x58(%edx),%esi
Code;  c0115d55 <schedule+1a1/384>
   9:   85 c9                     test   %ecx,%ecx
Code;  c0115d57 <schedule+1a3/384>
   b:   75 37                     jne    44 <_EIP+0x44> c0115d90 <schedule+1dc/384>
Code;  c0115d59 <schedule+1a5/384>
   d:   89 73 58                  mov    %esi,0x58(%ebx)
Code;  c0115d5c <schedule+1a8/384>
  10:   f0 ff 46 14               lock incl 0x14(%esi)

00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (rev 23)
00:00.1 Host bridge: ServerWorks CNB20HE Host Bridge (rev 01)
00:00.2 Host bridge: ServerWorks: Unknown device 0006 (rev 01)
00:00.3 Host bridge: ServerWorks: Unknown device 0006 (rev 01)
00:02.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:03.0 Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 0d)
00:04.0 Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 0d)
00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 92)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 92)
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 05)
00:0f.3 Host bridge: ServerWorks: Unknown device 0230
02:04.0 SCSI storage controller: Adaptec 7899P (rev 01)
02:04.1 SCSI storage controller: Adaptec 7899P (rev 01)

-- 
Antti Salmela

* Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII)
  2002-08-14 11:54 [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII) Antti Salmela
@ 2002-08-14 12:37 ` Alan Cox
  2002-08-14 13:10   ` Antti Salmela
  2002-08-14 13:16   ` pci-dma bug in pci_alloc_consistent on i386 ? Steffen Persvold
  0 siblings, 2 replies; 10+ messages in thread
From: Alan Cox @ 2002-08-14 12:37 UTC (permalink / raw)
  To: Antti Salmela; +Cc: linux-kernel

On Wed, 2002-08-14 at 12:54, Antti Salmela wrote:
> Oopsed soon after boot up. Stable with vanilla 2.4.19. The board is Intel
> SDS2. dnetc was running.

Does vanilla 2.4.20pre1 run ok ?


* Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII)
  2002-08-14 12:37 ` Alan Cox
@ 2002-08-14 13:10   ` Antti Salmela
  2002-08-14 13:27     ` Alan Cox
  2002-08-14 13:16   ` pci-dma bug in pci_alloc_consistent on i386 ? Steffen Persvold
  1 sibling, 1 reply; 10+ messages in thread
From: Antti Salmela @ 2002-08-14 13:10 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On Wed, Aug 14, 2002 at 01:37:10PM +0100, Alan Cox wrote:
> On Wed, 2002-08-14 at 12:54, Antti Salmela wrote:
> > Oopsed soon after boot up. Stable with vanilla 2.4.19. The board is Intel
> > SDS2. dnetc was running.
> 
> Does vanilla 2.4.20pre1 run ok ?

Seems to work just fine.

-- 
Antti Salmela

* Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII)
  2002-08-14 13:10   ` Antti Salmela
@ 2002-08-14 13:27     ` Alan Cox
  2002-08-14 15:55       ` Antti Salmela
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Cox @ 2002-08-14 13:27 UTC (permalink / raw)
  To: Antti Salmela; +Cc: linux-kernel

On Wed, 2002-08-14 at 14:10, Antti Salmela wrote:
> On Wed, Aug 14, 2002 at 01:37:10PM +0100, Alan Cox wrote:
> > On Wed, 2002-08-14 at 12:54, Antti Salmela wrote:
> > > Oopsed soon after boot up. Stable with vanilla 2.4.19. The board is Intel
> > > SDS2. dnetc was running.
> > 
> > Does vanilla 2.4.20pre1 run ok ?
> 
> Seems to work just fine.

Really we need to find which kernel the problem started with then. If
you've got the time to spend on this try 2.4.19-ac1


* Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII)
  2002-08-14 13:27     ` Alan Cox
@ 2002-08-14 15:55       ` Antti Salmela
  2002-08-14 17:30         ` Christian Ehrhardt
  0 siblings, 1 reply; 10+ messages in thread
From: Antti Salmela @ 2002-08-14 15:55 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel


On Wed, Aug 14, 2002 at 02:27:09PM +0100, Alan Cox wrote:
> On Wed, 2002-08-14 at 14:10, Antti Salmela wrote:
> > On Wed, Aug 14, 2002 at 01:37:10PM +0100, Alan Cox wrote:
> > > On Wed, 2002-08-14 at 12:54, Antti Salmela wrote:
> > > > Oopsed soon after boot up. Stable with vanilla 2.4.19. The board is Intel
> > > > SDS2. dnetc was running.
> > > 
> > > Does vanilla 2.4.20pre1 run ok ?
> > 
> > Seems to work just fine.
> 
> Really we need to find which kernel the problem started with then. If
> you've got the time to spend on this try 2.4.19-ac1

2.4.19-rc1-ac2 appears to be the first one that does not work.

ksymoops 2.4.5 on i686 2.4.19-rc1-ac1.  Options used
     -V (default)
     -K (specified)
     -L (specified)
     -o /lib/modules/2.4.19-rc1-ac2 (specified)
     -m /boot/System.map-2.4.19-rc1-ac2 (specified)

No modules in ksyms, skipping objects
Unable to handle kernel NULL pointer dereference at virtual address 0000002a
c0116f0c
*pde = 00000000
Oops: 0000
CPU:    1
EIP:    0010:[<c0116f0c>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010003
eax: 0000008c   ebx: c0327680   ecx: c03276a4   edx: f6760000
esi: ffffffd6   edi: f676002c   ebp: f6761fa4   esp: f6761f88
ds: 0018   es: 0018   ss: 0018
Process distributed-net (pid: 511, stackpage=f6761000)
Stack: f6760000 00000a00 f676002c 00000001 c011428f f6760000 f6760000 f6761fbc 
       c0117eef f6760000 000000b5 000b2390 c0327680 bffff934 c01088eb 00000000 
       00000000 40026004 000000b5 000b2390 bffff934 0000009e c010002b 0000002b 
Call Trace: [<c011428f>] [<c0117eef>] [<c01088eb>] 
Code: 8b 7e 54 8b 4a 58 89 4d f4 85 ff 75 37 89 4e 58 f0 ff 41 14 


>>EIP; c0116f0c <schedule+198/394>   <=====

>>ebx; c0327680 <runqueues+a00/14000>
>>ecx; c03276a4 <runqueues+a24/14000>
>>edx; f6760000 <END_OF_CODE+363b9844/????>
>>esi; ffffffd6 <END_OF_CODE+3fc5981a/????>
>>edi; f676002c <END_OF_CODE+363b9870/????>
>>ebp; f6761fa4 <END_OF_CODE+363bb7e8/????>
>>esp; f6761f88 <END_OF_CODE+363bb7cc/????>

Trace; c011428f <smp_apic_timer_interrupt+f3/114>
Trace; c0117eef <sys_sched_yield+113/11c>
Trace; c01088eb <system_call+33/38>

Code;  c0116f0c <schedule+198/394>
00000000 <_EIP>:
Code;  c0116f0c <schedule+198/394>   <=====
   0:   8b 7e 54                  mov    0x54(%esi),%edi   <=====
Code;  c0116f0f <schedule+19b/394>
   3:   8b 4a 58                  mov    0x58(%edx),%ecx
Code;  c0116f12 <schedule+19e/394>
   6:   89 4d f4                  mov    %ecx,0xfffffff4(%ebp)
Code;  c0116f15 <schedule+1a1/394>
   9:   85 ff                     test   %edi,%edi
Code;  c0116f17 <schedule+1a3/394>
   b:   75 37                     jne    44 <_EIP+0x44> c0116f50 <schedule+1dc/394>
Code;  c0116f19 <schedule+1a5/394>
   d:   89 4e 58                  mov    %ecx,0x58(%esi)
Code;  c0116f1c <schedule+1a8/394>
  10:   f0 ff 41 14               lock incl 0x14(%ecx)

-- 
Antti Salmela

* Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII)
  2002-08-14 15:55       ` Antti Salmela
@ 2002-08-14 17:30         ` Christian Ehrhardt
  2002-08-15  1:33           ` Alan Cox
  0 siblings, 1 reply; 10+ messages in thread
From: Christian Ehrhardt @ 2002-08-14 17:30 UTC (permalink / raw)
  To: Antti Salmela; +Cc: Alan Cox, linux-kernel


Hi,

I invested some time analyzing the Oops and thought I'd share
what I think I found out:

The code where it Oopses is line 451 in context_switch:
    449 static inline task_t * context_switch(task_t *prev, task_t *next)
    450 {
    451         struct mm_struct *mm = next->mm;

0x54 is the offset of task->mm.
At this point next is in %esi (%ebx in the earlier Oops posted). The
value of next is calculated by this code in schedule:
    867         idx = sched_find_first_bit(array->bitmap);
    868         queue = array->queue + idx;
    869         next = list_entry(queue->next, task_t, run_list);

At this point idx is in %eax, i.e. it has a value of 0x8c == 140
in both of the Oopsen. Investigating further on the value of next
(0xffffffd6) shows that this value is the result of
      list_entry (0x02, task_t, run_list),
i.e. queue->next == 0x02. Getting back to %eax shows that 140 (== MAX_PRIO)
is actually NOT a valid index for array->queue above, i.e. it seems that we
overrun this array by one. Putting a ``BUG_ON(idx >= MAX_PRIO);'' between
lines 867 and 868 above should prove this.
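
As a sanity check, the register values can be reproduced with plain 32-bit
arithmetic. The sketch below is user-space C, not kernel code. The run_list
offset of 0x2c is an assumption inferred from the dump (0x02 - 0xffffffd6,
which also matches %edi == %edx + 0x2c), and the helper names and the
assert-based BUG_ON are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PRIO        140u   /* 0x8c, the value seen in %eax */
#define RUN_LIST_OFFSET 0x2cu  /* assumed offsetof(task_t, run_list) */
#define MM_OFFSET       0x54u  /* offsetof(task_t, mm), per the disassembly */

/* User-space stand-in for the kernel's BUG_ON(). */
#define BUG_ON(cond) assert(!(cond))

/* 32-bit equivalent of list_entry(ptr, task_t, run_list). */
static uint32_t task_from_run_list(uint32_t list_ptr)
{
	return (uint32_t)(list_ptr - RUN_LIST_OFFSET);
}

/* Address that is read when evaluating next->mm. */
static uint32_t mm_field_address(uint32_t task)
{
	return (uint32_t)(task + MM_OFFSET);
}

/* The check suggested above: trap before array->queue is overrun. */
static unsigned int checked_queue_index(unsigned int idx)
{
	BUG_ON(idx >= MAX_PRIO);
	return idx;
}
```

With these offsets a queue->next of 0x02 gives next == 0xffffffd6, and
reading next->mm touches 0x0000002a, the faulting address in both reports.
An idx of 0x8c would trip the check in checked_queue_index().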

HTH, I have no more time to investigate this now.

      regards    Christian Ehrhardt

------  Oops preserved for reference --------------------
On Wed, Aug 14, 2002 at 06:55:05PM +0300, Antti Salmela wrote:
> ksymoops 2.4.5 on i686 2.4.19-rc1-ac1.  Options used
>      -V (default)
>      -K (specified)
>      -L (specified)
>      -o /lib/modules/2.4.19-rc1-ac2 (specified)
>      -m /boot/System.map-2.4.19-rc1-ac2 (specified)
> 
> No modules in ksyms, skipping objects
> Unable to handle kernel NULL pointer dereference at virtual address 0000002a
> c0116f0c
> *pde = 00000000
> Oops: 0000
> CPU:    1
> EIP:    0010:[<c0116f0c>]    Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010003
> eax: 0000008c   ebx: c0327680   ecx: c03276a4   edx: f6760000
> esi: ffffffd6   edi: f676002c   ebp: f6761fa4   esp: f6761f88
> ds: 0018   es: 0018   ss: 0018
> Process distributed-net (pid: 511, stackpage=f6761000)
> Stack: f6760000 00000a00 f676002c 00000001 c011428f f6760000 f6760000 f6761fbc 
>        c0117eef f6760000 000000b5 000b2390 c0327680 bffff934 c01088eb 00000000 
>        00000000 40026004 000000b5 000b2390 bffff934 0000009e c010002b 0000002b 
> Call Trace: [<c011428f>] [<c0117eef>] [<c01088eb>] 
> Code: 8b 7e 54 8b 4a 58 89 4d f4 85 ff 75 37 89 4e 58 f0 ff 41 14 
> 
> 
> >>EIP; c0116f0c <schedule+198/394>   <=====
> 
> >>ebx; c0327680 <runqueues+a00/14000>
> >>ecx; c03276a4 <runqueues+a24/14000>
> >>edx; f6760000 <END_OF_CODE+363b9844/????>
> >>esi; ffffffd6 <END_OF_CODE+3fc5981a/????>
> >>edi; f676002c <END_OF_CODE+363b9870/????>
> >>ebp; f6761fa4 <END_OF_CODE+363bb7e8/????>
> >>esp; f6761f88 <END_OF_CODE+363bb7cc/????>
> 
> Trace; c011428f <smp_apic_timer_interrupt+f3/114>
> Trace; c0117eef <sys_sched_yield+113/11c>
> Trace; c01088eb <system_call+33/38>
> 
> Code;  c0116f0c <schedule+198/394>
> 00000000 <_EIP>:
> Code;  c0116f0c <schedule+198/394>   <=====
>    0:   8b 7e 54                  mov    0x54(%esi),%edi   <=====
> Code;  c0116f0f <schedule+19b/394>
>    3:   8b 4a 58                  mov    0x58(%edx),%ecx
> Code;  c0116f12 <schedule+19e/394>
>    6:   89 4d f4                  mov    %ecx,0xfffffff4(%ebp)
> Code;  c0116f15 <schedule+1a1/394>
>    9:   85 ff                     test   %edi,%edi
> Code;  c0116f17 <schedule+1a3/394>
>    b:   75 37                     jne    44 <_EIP+0x44> c0116f50 <schedule+1dc/394>
> Code;  c0116f19 <schedule+1a5/394>
>    d:   89 4e 58                  mov    %ecx,0x58(%esi)
> Code;  c0116f1c <schedule+1a8/394>
>   10:   f0 ff 41 14               lock incl 0x14(%ecx)
> 
> -- 
> Antti Salmela
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
-- 
****************************************************************************
** Christian Ehrhardt  **  e-Mail: ehrhardt@mathematik.uni-ulm.de  *********
****************************************************************************

THAT'S ALL FOLKS!

* Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII)
  2002-08-14 17:30         ` Christian Ehrhardt
@ 2002-08-15  1:33           ` Alan Cox
  2002-08-16 14:17             ` Christian Ehrhardt
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Cox @ 2002-08-15  1:33 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: Antti Salmela, linux-kernel

Thanks - your analysis is informative to say the least. It looks like
the PIV load balancing code is the problem. 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII)
  2002-08-15  1:33           ` Alan Cox
@ 2002-08-16 14:17             ` Christian Ehrhardt
  2002-08-16 16:50               ` Antti Salmela
  0 siblings, 1 reply; 10+ messages in thread
From: Christian Ehrhardt @ 2002-08-16 14:17 UTC (permalink / raw)
  To: Alan Cox; +Cc: Antti Salmela, linux-kernel

On Thu, Aug 15, 2002 at 02:33:02AM +0100, Alan Cox wrote:
> Thanks - your analysis is informative to say the least. It looks like
> the PIV load balancing code is the problem. 

A few more observations and maybe a solution for the problem
(kernel is 2.4.20-pre1-ac3):

I ran Richard Gooch's scheduler benchmark[1] as a normal user with
num_running set to 50 (./a.out 50). The box locks up hard within a few
seconds. There is no Oops, and Magic-SysRq, console switching and
Ctrl-Alt-Delete all stop working. The NMI watchdog is enabled but failed
to reboot the box.

This suggests that sched_yield, nice or sched_setscheduler is involved,
with sched_yield being the #1 candidate.

Further investigation showed that adding BUG_ON(p->array != array);
in dequeue_task would have given some interesting results.
At least the following is quite possible and doesn't even require SMP:

      Task                          Interrupt
Calls sys_sched_yield
      ======> Timer Interrupt
				    Timer Interrupt decreases time slice,
				    the time slice expires and the task is
				    moved to the expired array.
Continues with yield.
Assume current->prio == MAX_PRIO-1,
current->time_slice <= 1 is satisfied
anyway, i.e. we do:
   dequeue_task(current, rq->active);
   enqueue_task(current, rq->expired);
However, the task has already been moved
from the active to the expired array
by the timer interrupt, i.e.
dequeue_task and enqueue_task will get
the nr_active counts and the bitmaps
wrong because they remove the task from
the wrong array.  --> BOOOM
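
The sequence above can be modelled in user space. This is only a sketch
under simplifying assumptions: struct prio_array is reduced to two
counters, nr_linked is a made-up field standing for "tasks really on this
array's lists", and yield_after_expiry() is a hypothetical helper that
replays the interleaving with and without passing the task's own array
to dequeue_task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the scheduler's prio_array (model only). */
struct prio_array {
	int nr_active;  /* bookkeeping counter kept by enqueue/dequeue */
	int nr_linked;  /* tasks really linked into this array's lists */
};

struct task {
	struct prio_array *array;  /* array the task is really queued on */
};

static void enqueue_task(struct task *p, struct prio_array *array)
{
	array->nr_linked++;
	array->nr_active++;
	p->array = array;
}

/*
 * Unlinking acts on whatever list the task is on, but the counter that
 * is decremented belongs to the array the caller passed in.
 */
static void dequeue_task(struct task *p, struct prio_array *array)
{
	assert(p->array != NULL);  /* task must be queued somewhere */
	array->nr_active--;
	p->array->nr_linked--;
	p->array = NULL;
}

/* Returns true if both counters still match reality after the race. */
static bool yield_after_expiry(bool use_fix)
{
	struct prio_array active = { 0, 0 }, expired = { 0, 0 };
	struct task t = { NULL };

	enqueue_task(&t, &active);

	/* Timer interrupt: slice expired, task moves to the expired array. */
	dequeue_task(&t, &active);
	enqueue_task(&t, &expired);

	/* sys_sched_yield continues, unaware of the move. */
	dequeue_task(&t, use_fix ? t.array : &active);
	enqueue_task(&t, &expired);

	return active.nr_active == active.nr_linked &&
	       expired.nr_active == expired.nr_linked;
}
```

In this model the unfixed path leaves the active array with a count of -1
and the expired array with a count of 2 for a single task, while
dequeuing from the task's own array keeps both counts consistent.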

The (untested) patch below should correct this problem along with
a locking oddity (last hunk) that IMHO either needs fixing or a BIG
comment. Be prepared for a few (up to 4) lines of fuzz due to additional
BUG_ONs in both versions of the file.

     regards   Christian Ehrhardt

[1] http://www.atnf.csiro.au/people/rgooch/benchmarks/linux-scheduler.html


--- /usr/src/linux-2.4.20-pre1-ac3/kernel/sched.c	Thu Aug 15 20:03:01 2002
+++ sched.c	Fri Aug 16 16:15:57 2002
@@ -769,7 +772,7 @@
 			set_tsk_need_resched(p);
 
 			/* put it at the end of the queue: */
-			dequeue_task(p, rq->active);
+			dequeue_task(p, p->array);
 			enqueue_task(p, rq->active);
 		}
 		goto out;
@@ -785,7 +788,7 @@
 	if (p->sleep_avg)
 		p->sleep_avg--;
 	if (!--p->time_slice) {
-		dequeue_task(p, rq->active);
+		dequeue_task(p, p->array);
 		set_tsk_need_resched(p);
 		p->prio = effective_prio(p);
 		p->time_slice = TASK_TIMESLICE(p);
@@ -1396,7 +1399,7 @@
 	 */
 	if (likely(current->prio == MAX_PRIO-1)) {
 		if (current->time_slice <= 1) {
-			dequeue_task(current, rq->active);
+			dequeue_task(current, array);
 			enqueue_task(current, rq->expired);
 		} else
 			current->time_slice--;
@@ -1411,7 +1414,7 @@
 		list_add_tail(&current->run_list, array->queue + current->prio);
 		__set_bit(current->prio, array->bitmap);
 	}
-	spin_unlock(&rq->lock);
+	rq_unlock (rq);
 
 	schedule();
 

-- 
THAT'S ALL FOLKS!

* Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII)
  2002-08-16 14:17             ` Christian Ehrhardt
@ 2002-08-16 16:50               ` Antti Salmela
  0 siblings, 0 replies; 10+ messages in thread
From: Antti Salmela @ 2002-08-16 16:50 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: Alan Cox, linux-kernel

On Fri, Aug 16, 2002 at 04:17:18PM +0200, Christian Ehrhardt wrote:
> On Thu, Aug 15, 2002 at 02:33:02AM +0100, Alan Cox wrote:
> > Thanks - your analysis is informative to say the least. It looks like
> > the PIV load balancing code is the problem. 
> 
> The (untested) patch below should correct this problem along with
> a locking oddity (last hunk) that IMHO either needs fixing or a BIG
> comment. Be prepared for a few (up to 4) lines of fuzz due to additional
> BUG_ONs in both versions of the file.

With this patch I could boot 2.4.20-pre2-ac3 and it has now run nearly an
hour without any problems.

>      regards   Christian Ehrhardt
> 
> [1] http://www.atnf.csiro.au/people/rgooch/benchmarks/linux-scheduler.html
> 
> 
> --- /usr/src/linux-2.4.20-pre1-ac3/kernel/sched.c	Thu Aug 15 20:03:01 2002
> +++ sched.c	Fri Aug 16 16:15:57 2002
> @@ -769,7 +772,7 @@
>  			set_tsk_need_resched(p);
>  
>  			/* put it at the end of the queue: */
> -			dequeue_task(p, rq->active);
> +			dequeue_task(p, p->array);
>  			enqueue_task(p, rq->active);
>  		}
>  		goto out;
> @@ -785,7 +788,7 @@
>  	if (p->sleep_avg)
>  		p->sleep_avg--;
>  	if (!--p->time_slice) {
> -		dequeue_task(p, rq->active);
> +		dequeue_task(p, p->array);
>  		set_tsk_need_resched(p);
>  		p->prio = effective_prio(p);
>  		p->time_slice = TASK_TIMESLICE(p);
> @@ -1396,7 +1399,7 @@
>  	 */
>  	if (likely(current->prio == MAX_PRIO-1)) {
>  		if (current->time_slice <= 1) {
> -			dequeue_task(current, rq->active);
> +			dequeue_task(current, array);
>  			enqueue_task(current, rq->expired);
>  		} else
>  			current->time_slice--;
> @@ -1411,7 +1414,7 @@
>  		list_add_tail(&current->run_list, array->queue + current->prio);
>  		__set_bit(current->prio, array->bitmap);
>  	}
> -	spin_unlock(&rq->lock);
> +	rq_unlock (rq);
>  
>  	schedule();
>  

-- 
Antti Salmela

* pci-dma bug in pci_alloc_consistent on i386 ?
  2002-08-14 12:37 ` Alan Cox
  2002-08-14 13:10   ` Antti Salmela
@ 2002-08-14 13:16   ` Steffen Persvold
  1 sibling, 0 replies; 10+ messages in thread
From: Steffen Persvold @ 2002-08-14 13:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: David S. Miller

Hi all,

I _think_ I found a little snag in the pci_alloc_consistent code for i386.
What I've discovered is that modern GbE drivers (such as the tg3 driver
written by David and the e1000 driver by Intel) do something like this
in setup:

	if (!pci_set_dma_mask(pdev, (u64) 0xffffffffffffffff)) {
		pci_using_dac = 1;
	} else {
		err = pci_set_dma_mask(pdev, (u64) 0xffffffff);
		if (err) {
			printk(KERN_ERR PFX "No usable DMA configuration, "
			       "aborting.\n");
			goto err_out_free_res;
		}
		pci_using_dac = 0;
	}

	if (pci_using_dac)
		dev->features |= NETIF_F_HIGHDMA;


On i386 the first pci_set_dma_mask will succeed, because :

(in include/asm-i386/pci.h)
static inline int pci_dma_supported(struct pci_dev *hwdev, u64 mask)
{
        /*
         * we fall back to GFP_DMA when the mask isn't all 1s,
         * so we can't guarantee allocations that must be
         * within a tighter range than GFP_DMA..
         */
        if(mask < 0x00ffffff)
                return 0;

	return 1;
}

(in drivers/pci/pci.c)
int
pci_set_dma_mask(struct pci_dev *dev, u64 mask)
{
	if (!pci_dma_supported(dev, mask))
		return -EIO;

	dev->dma_mask = mask;

	return 0;
}

And this is just fine, the ethernet adapter will be able to DMA directly 
to any memory (even highmem).


However, when they allocate RX and TX descriptors (and thus use
pci_alloc_consistent), they get GFP_DMA pages. This is why:

(in arch/i386/kernel/pci-dma.c)
void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
			   dma_addr_t *dma_handle)
{
	void *ret;
	int gfp = GFP_ATOMIC;

	if (hwdev == NULL || hwdev->dma_mask != 0xffffffff)
		gfp |= GFP_DMA;
	ret = (void *)__get_free_pages(gfp, get_order(size));

	if (ret != NULL) {
		memset(ret, 0, size);
		*dma_handle = virt_to_bus(ret);
	}
	return ret;
}


IMHO the criterion for selecting GFP_DMA pages is wrong; it should be:

        if (hwdev == NULL || hwdev->dma_mask < 0xffffffff)
                gfp |= GFP_DMA;
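
To make the difference concrete, here is a small user-space sketch of the
two tests. It is not the kernel code: struct fake_pci_dev and the
fake_/wants_ helpers are hypothetical stand-ins that only copy the mask
comparisons quoted above, and no real allocation takes place.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for struct pci_dev. */
struct fake_pci_dev {
	uint64_t dma_mask;
};

/* Mirrors the i386 pci_dma_supported()/pci_set_dma_mask() logic. */
static int fake_set_dma_mask(struct fake_pci_dev *dev, uint64_t mask)
{
	assert(dev != NULL);
	if (mask < 0x00ffffffULL)
		return -EIO;
	dev->dma_mask = mask;
	return 0;
}

/* Current test: any mask other than exactly 32 bits selects GFP_DMA. */
static bool wants_gfp_dma_current(const struct fake_pci_dev *dev)
{
	return dev == NULL || dev->dma_mask != 0xffffffffULL;
}

/* Proposed test: only masks narrower than 32 bits select GFP_DMA. */
static bool wants_gfp_dma_proposed(const struct fake_pci_dev *dev)
{
	return dev == NULL || dev->dma_mask < 0xffffffffULL;
}
```

With a 64-bit mask the current test asks for GFP_DMA and the proposed one
does not; for a 32-bit or a 24-bit mask the two tests agree.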

Here's a patch :

--- pci-dma.c.~1~       Wed Aug 14 15:06:49 2002
+++ pci-dma.c   Wed Aug 14 15:08:29 2002
@@ -19,7 +19,7 @@
 	void *ret;
 	int gfp = GFP_ATOMIC;
 
-	if (hwdev == NULL || hwdev->dma_mask != 0xffffffff)
+	if (hwdev == NULL || hwdev->dma_mask < 0xffffffff)
 		gfp |= GFP_DMA;
 	ret = (void *)__get_free_pages(gfp, get_order(size));
 

Regards,
 -- 
  Steffen Persvold   | Scalable Linux Systems |   Try out the world's best
 mailto:sp@scali.com |  http://www.scali.com  | performing MPI implementation:
Tel: (+47) 2262 8950 |   Olaf Helsets vei 6   |      - ScaMPI 1.13.8 -
Fax: (+47) 2262 8951 |   N0621 Oslo, NORWAY   | >320MBytes/s and <4uS latency


