* [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII)
@ 2002-08-14 11:54 Antti Salmela
2002-08-14 12:37 ` Alan Cox
0 siblings, 1 reply; 10+ messages in thread
From: Antti Salmela @ 2002-08-14 11:54 UTC (permalink / raw)
To: linux-kernel
Oopsed soon after boot up. Stable with vanilla 2.4.19. The board is Intel
SDS2. dnetc was running.
ksymoops 2.4.5 on i686 2.4.19-rc5. Options used
-V (default)
-K (specified)
-l /proc/modules (default)
-o /lib/modules/2.4.20-pre1-ac3 (specified)
-m /boot/System.map-2.4.20-pre1-ac3 (specified)
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel NULL pointer dereference at virtual address 0000002a
c0115d4c
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c0115d4c>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010003
eax: 0000008c ebx: ffffffd6 ecx: c03268a4 edx: f721c000
esi: c0326880 edi: f721c02c ebp: f721dfa4 esp: f721df88
ds: 0018 es: 0018 ss: 0018
Process distributed-net (pid: 521, stackpage=f721d000)
Stack: f721c000 00000000 f721c02c c0112f5f 00000000 f721c000 f721c000 f721dfbc
c0116cff f721c000 00000043 0003f7a0 c0326880 bffff944 c0106f4b 00000000
00000000 40026004 00000043 0003f7a0 bffff944 0000009e 0000002b 0000002b
Call Trace: [<c0112f5f>] [<c0116cff>] [<c0106f4b>]
Code: 8b 4b 54 89 4d f4 8b 72 58 85 c9 75 37 89 73 58 f0 ff 46 14
>>EIP; c0115d4c <schedule+198/384> <=====
>>ebx; ffffffd6 <END_OF_CODE+3fc5a89a/????>
>>ecx; c03268a4 <runqueues+24/14000>
>>edx; f721c000 <END_OF_CODE+36e768c4/????>
>>esi; c0326880 <runqueues+0/14000>
>>edi; f721c02c <END_OF_CODE+36e768f0/????>
>>ebp; f721dfa4 <END_OF_CODE+36e78868/????>
>>esp; f721df88 <END_OF_CODE+36e7884c/????>
Trace; c0112f5f <smp_apic_timer_interrupt+f3/114>
Trace; c0116cff <sys_sched_yield+113/11c>
Trace; c0106f4b <system_call+33/38>
Code; c0115d4c <schedule+198/384>
00000000 <_EIP>:
Code; c0115d4c <schedule+198/384> <=====
0: 8b 4b 54 mov 0x54(%ebx),%ecx <=====
Code; c0115d4f <schedule+19b/384>
3: 89 4d f4 mov %ecx,0xfffffff4(%ebp)
Code; c0115d52 <schedule+19e/384>
6: 8b 72 58 mov 0x58(%edx),%esi
Code; c0115d55 <schedule+1a1/384>
9: 85 c9 test %ecx,%ecx
Code; c0115d57 <schedule+1a3/384>
b: 75 37 jne 44 <_EIP+0x44> c0115d90 <schedule+1dc/384>
Code; c0115d59 <schedule+1a5/384>
d: 89 73 58 mov %esi,0x58(%ebx)
Code; c0115d5c <schedule+1a8/384>
10: f0 ff 46 14 lock incl 0x14(%esi)
00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (rev 23)
00:00.1 Host bridge: ServerWorks CNB20HE Host Bridge (rev 01)
00:00.2 Host bridge: ServerWorks: Unknown device 0006 (rev 01)
00:00.3 Host bridge: ServerWorks: Unknown device 0006 (rev 01)
00:02.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:03.0 Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 0d)
00:04.0 Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 0d)
00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 92)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 92)
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 05)
00:0f.3 Host bridge: ServerWorks: Unknown device 0230
02:04.0 SCSI storage controller: Adaptec 7899P (rev 01)
02:04.1 SCSI storage controller: Adaptec 7899P (rev 01)
--
Antti Salmela
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII) 2002-08-14 11:54 [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII) Antti Salmela @ 2002-08-14 12:37 ` Alan Cox 2002-08-14 13:10 ` Antti Salmela 2002-08-14 13:16 ` pci-dma bug in pci_alloc_consistent on i386 ? Steffen Persvold 0 siblings, 2 replies; 10+ messages in thread From: Alan Cox @ 2002-08-14 12:37 UTC (permalink / raw) To: Antti Salmela; +Cc: linux-kernel On Wed, 2002-08-14 at 12:54, Antti Salmela wrote: > Oopsed soon after boot up. Stable with vanilla 2.4.19. The board is Intel > SDS2. dnetc was running. Does vanilla 2.4.20pre1 run ok ? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII) 2002-08-14 12:37 ` Alan Cox @ 2002-08-14 13:10 ` Antti Salmela 2002-08-14 13:27 ` Alan Cox 2002-08-14 13:16 ` pci-dma bug in pci_alloc_consistent on i386 ? Steffen Persvold 1 sibling, 1 reply; 10+ messages in thread From: Antti Salmela @ 2002-08-14 13:10 UTC (permalink / raw) To: Alan Cox; +Cc: linux-kernel On Wed, Aug 14, 2002 at 01:37:10PM +0100, Alan Cox wrote: > On Wed, 2002-08-14 at 12:54, Antti Salmela wrote: > > Oopsed soon after boot up. Stable with vanilla 2.4.19. The board is Intel > > SDS2. dnetc was running. > > Does vanilla 2.4.20pre1 run ok ? Seems to work just fine. -- Antti Salmela ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII) 2002-08-14 13:10 ` Antti Salmela @ 2002-08-14 13:27 ` Alan Cox 2002-08-14 15:55 ` Antti Salmela 0 siblings, 1 reply; 10+ messages in thread From: Alan Cox @ 2002-08-14 13:27 UTC (permalink / raw) To: Antti Salmela; +Cc: linux-kernel On Wed, 2002-08-14 at 14:10, Antti Salmela wrote: > On Wed, Aug 14, 2002 at 01:37:10PM +0100, Alan Cox wrote: > > On Wed, 2002-08-14 at 12:54, Antti Salmela wrote: > > > Oopsed soon after boot up. Stable with vanilla 2.4.19. The board is Intel > > > SDS2. dnetc was running. > > > > Does vanilla 2.4.20pre1 run ok ? > > Seems to work just fine. Really we need to find which kernel the problem started with then. If you've got the time to spend on this try 2.4.19-ac1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII) 2002-08-14 13:27 ` Alan Cox @ 2002-08-14 15:55 ` Antti Salmela 2002-08-14 17:30 ` Christian Ehrhardt 0 siblings, 1 reply; 10+ messages in thread From: Antti Salmela @ 2002-08-14 15:55 UTC (permalink / raw) To: Alan Cox; +Cc: linux-kernel On Wed, Aug 14, 2002 at 02:27:09PM +0100, Alan Cox wrote: > On Wed, 2002-08-14 at 14:10, Antti Salmela wrote: > > On Wed, Aug 14, 2002 at 01:37:10PM +0100, Alan Cox wrote: > > > On Wed, 2002-08-14 at 12:54, Antti Salmela wrote: > > > > Oopsed soon after boot up. Stable with vanilla 2.4.19. The board is Intel > > > > SDS2. dnetc was running. > > > > > > Does vanilla 2.4.20pre1 run ok ? > > > > Seems to work just fine. > > Really we need to find which kernel the problem started with then. If > you've got the time to spend on this try 2.4.19-ac1 2.4.19-rc1-ac2 appears to be the first one that does not work. ksymoops 2.4.5 on i686 2.4.19-rc1-ac1. Options used -V (default) -K (specified) -L (specified) -o /lib/modules/2.4.19-rc1-ac2 (specified) -m /boot/System.map-2.4.19-rc1-ac2 (specified) No modules in ksyms, skipping objects Unable to handle kernel NULL pointer dereference at virtual address 0000002a c0116f0c *pde = 00000000 Oops: 0000 CPU: 1 EIP: 0010:[<c0116f0c>] Not tainted Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010003 eax: 0000008c ebx: c0327680 ecx: c03276a4 edx: f6760000 esi: ffffffd6 edi: f676002c ebp: f6761fa4 esp: f6761f88 ds: 0018 es: 0018 ss: 0018 Process distributed-net (pid: 511, stackpage=f6761000) Stack: f6760000 00000a00 f676002c 00000001 c011428f f6760000 f6760000 f6761fbc c0117eef f6760000 000000b5 000b2390 c0327680 bffff934 c01088eb 00000000 00000000 40026004 000000b5 000b2390 bffff934 0000009e c010002b 0000002b Call Trace: [<c011428f>] [<c0117eef>] [<c01088eb>] Code: 8b 7e 54 8b 4a 58 89 4d f4 85 ff 75 37 89 4e 58 f0 ff 41 14 >>EIP; c0116f0c <schedule+198/394> <===== >>ebx; c0327680 <runqueues+a00/14000> >>ecx; c03276a4 <runqueues+a24/14000> >>edx; f6760000 <END_OF_CODE+363b9844/????> >>esi; ffffffd6 <END_OF_CODE+3fc5981a/????> >>edi; f676002c <END_OF_CODE+363b9870/????> >>ebp; f6761fa4 <END_OF_CODE+363bb7e8/????> >>esp; f6761f88 <END_OF_CODE+363bb7cc/????> Trace; c011428f <smp_apic_timer_interrupt+f3/114> Trace; c0117eef <sys_sched_yield+113/11c> Trace; c01088eb <system_call+33/38> Code; c0116f0c <schedule+198/394> 00000000 <_EIP>: Code; c0116f0c <schedule+198/394> <===== 0: 8b 7e 54 mov 0x54(%esi),%edi <===== Code; c0116f0f <schedule+19b/394> 3: 8b 4a 58 mov 0x58(%edx),%ecx Code; c0116f12 <schedule+19e/394> 6: 89 4d f4 mov %ecx,0xfffffff4(%ebp) Code; c0116f15 <schedule+1a1/394> 9: 85 ff test %edi,%edi Code; c0116f17 <schedule+1a3/394> b: 75 37 jne 44 <_EIP+0x44> c0116f50 <schedule+1dc/394> Code; c0116f19 <schedule+1a5/394> d: 89 4e 58 mov %ecx,0x58(%esi) Code; c0116f1c <schedule+1a8/394> 10: f0 ff 41 14 lock incl 0x14(%ecx) -- Antti Salmela ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII) 2002-08-14 15:55 ` Antti Salmela @ 2002-08-14 17:30 ` Christian Ehrhardt 2002-08-15 1:33 ` Alan Cox 0 siblings, 1 reply; 10+ messages in thread From: Christian Ehrhardt @ 2002-08-14 17:30 UTC (permalink / raw) To: Antti Salmela; +Cc: Alan Cox, linux-kernel Hi, I invested some time analyzing the Ooops and thought I'd share what I think I found out: The code where it Oopses is line 451 in context_switch: 449 static inline task_t * context_switch(task_t *prev, task_t *next) 450 { 451 struct mm_struct *mm = next->mm; 0x54 is the offset of task->mm At this point next is in %esi (%ebx in the earlier Oops posted). The value of next is calculated by this code in schedule: 867 idx = sched_find_first_bit(array->bitmap); 868 queue = array->queue + idx; 869 next = list_entry(queue->next, task_t, run_list); At this point idx is in %eax, i.e. it has a value of 0x8c == 140 in both of the Oopsen. Investigating further on the value of next (0xffffffd6) shows that this value is the result of list_entry (0x02, task_t, run_list), i.e. queue->next == 0x02. Getting back to %eax shows that 140 (== MAX_PRIO is actually NOT a valid index for array->queue above, i.e. it seems that we overrun this array by one. Putting a ``BUG_ON(idx >= MAX_PRIO);'' between lines 867 and 868 above should proof this. HTH, I have no more time to investigate this now. regards Christian Ehrhardt ------ Oooops preserved for reference -------------------- On Wed, Aug 14, 2002 at 06:55:05PM +0300, Antti Salmela wrote: > ksymoops 2.4.5 on i686 2.4.19-rc1-ac1. Options used > -V (default) > -K (specified) > -L (specified) > -o /lib/modules/2.4.19-rc1-ac2 (specified) > -m /boot/System.map-2.4.19-rc1-ac2 (specified) > > No modules in ksyms, skipping objects > Unable to handle kernel NULL pointer dereference at virtual address 0000002a > c0116f0c > *pde = 00000000 > Oops: 0000 > CPU: 1 > EIP: 0010:[<c0116f0c>] Not tainted > Using defaults from ksymoops -t elf32-i386 -a i386 > EFLAGS: 00010003 > eax: 0000008c ebx: c0327680 ecx: c03276a4 edx: f6760000 > esi: ffffffd6 edi: f676002c ebp: f6761fa4 esp: f6761f88 > ds: 0018 es: 0018 ss: 0018 > Process distributed-net (pid: 511, stackpage=f6761000) > Stack: f6760000 00000a00 f676002c 00000001 c011428f f6760000 f6760000 f6761fbc > c0117eef f6760000 000000b5 000b2390 c0327680 bffff934 c01088eb 00000000 > 00000000 40026004 000000b5 000b2390 bffff934 0000009e c010002b 0000002b > Call Trace: [<c011428f>] [<c0117eef>] [<c01088eb>] > Code: 8b 7e 54 8b 4a 58 89 4d f4 85 ff 75 37 89 4e 58 f0 ff 41 14 > > > >>EIP; c0116f0c <schedule+198/394> <===== > > >>ebx; c0327680 <runqueues+a00/14000> > >>ecx; c03276a4 <runqueues+a24/14000> > >>edx; f6760000 <END_OF_CODE+363b9844/????> > >>esi; ffffffd6 <END_OF_CODE+3fc5981a/????> > >>edi; f676002c <END_OF_CODE+363b9870/????> > >>ebp; f6761fa4 <END_OF_CODE+363bb7e8/????> > >>esp; f6761f88 <END_OF_CODE+363bb7cc/????> > > Trace; c011428f <smp_apic_timer_interrupt+f3/114> > Trace; c0117eef <sys_sched_yield+113/11c> > Trace; c01088eb <system_call+33/38> > > Code; c0116f0c <schedule+198/394> > 00000000 <_EIP>: > Code; c0116f0c <schedule+198/394> <===== > 0: 8b 7e 54 mov 0x54(%esi),%edi <===== > Code; c0116f0f <schedule+19b/394> > 3: 8b 4a 58 mov 0x58(%edx),%ecx > Code; c0116f12 <schedule+19e/394> > 6: 89 4d f4 mov %ecx,0xfffffff4(%ebp) > Code; c0116f15 <schedule+1a1/394> > 9: 85 ff test %edi,%edi > Code; c0116f17 <schedule+1a3/394> > b: 75 37 jne 44 <_EIP+0x44> c0116f50 <schedule+1dc/394> > Code; c0116f19 <schedule+1a5/394> > d: 89 4e 58 mov %ecx,0x58(%esi) > Code; c0116f1c <schedule+1a8/394> > 10: f0 ff 41 14 lock incl 0x14(%ecx) > > -- > Antti Salmela > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- -- **************************************************************************** ** Christian Ehrhardt ** e-Mail: ehrhardt@mathematik.uni-ulm.de ********* **************************************************************************** THAT'S ALL FOLKS! ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII) 2002-08-14 17:30 ` Christian Ehrhardt @ 2002-08-15 1:33 ` Alan Cox 2002-08-16 14:17 ` Christian Ehrhardt 0 siblings, 1 reply; 10+ messages in thread From: Alan Cox @ 2002-08-15 1:33 UTC (permalink / raw) To: Christian Ehrhardt; +Cc: Antti Salmela, linux-kernel Thanks - your analysis is informative to say the least. It looks like the PIV load balancing code is the problem. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII) 2002-08-15 1:33 ` Alan Cox @ 2002-08-16 14:17 ` Christian Ehrhardt 2002-08-16 16:50 ` Antti Salmela 0 siblings, 1 reply; 10+ messages in thread From: Christian Ehrhardt @ 2002-08-16 14:17 UTC (permalink / raw) To: Alan Cox; +Cc: Antti Salmela, linux-kernel On Thu, Aug 15, 2002 at 02:33:02AM +0100, Alan Cox wrote: > Thanks - your analysis is informative to say the least. It looks like > the PIV load balancing code is the problem. A few more observations and maybe a solution for the problem (kernel is 2.4.20-pre1-ac3): I ran Richard Gooch's scheduler benchmark[1] as a normal user with num_running set to 50 (./a.out 50). The box locks up hard within a few seconds. There is no Ooops, Magic-SysRq doesn't work anymore, neither does console switching or Ctrl-Alt-Delete. NMI-Watchdog is enabled and failed to reboot the box. This suggests that sched_yield, nice or sched_setscheduler is involved with sched_yield beeing #1 candidate. Further investigation showed that adding BUG_ON(p->array != array); in dequeue_task would have given some interesting results. At least the following is quite possible and doesn't even require SMP: Task Interrupt Calls sys_sched_yield ======> Timer Interrupt Timer Interrupt decreases times lice, the time slice expires and the task is moved to the expired array. Continues with yield. Assume current->prio == MAX_PRIO-1, current->time_slice <= 1 is satisifed anyway, i.e. wie do: dequeue_task(current, rq->active); enqueue_task(current, rq->expired); However, the task has already been moved from the active to the expired array by the timer interrupt, i.e. dequeue_task and enqueue_task will get the nr_active counts and the bitmaps wrong because they remove the task from the wrong array. --> BOOOM The (untested) patch below should correct this problem along with a locking oddity (last hunk) that IMHO either needs fixing or a BIG comment. Be prepared for a few (up to 4) lines of fuzz due to additional BUG_ONs in both versions of the file. regards Christian Ehrhardt [1] http://www.atnf.csiro.au/people/rgooch/benchmarks/linux-scheduler.html --- /usr/src/linux-2.4.20-pre1-ac3/kernel/sched.c Thu Aug 15 20:03:01 2002 +++ sched.c Fri Aug 16 16:15:57 2002 @@ -769,7 +772,7 @@ set_tsk_need_resched(p); /* put it at the end of the queue: */ - dequeue_task(p, rq->active); + dequeue_task(p, p->array); enqueue_task(p, rq->active); } goto out; @@ -785,7 +788,7 @@ if (p->sleep_avg) p->sleep_avg--; if (!--p->time_slice) { - dequeue_task(p, rq->active); + dequeue_task(p, p->array); set_tsk_need_resched(p); p->prio = effective_prio(p); p->time_slice = TASK_TIMESLICE(p); @@ -1396,7 +1399,7 @@ */ if (likely(current->prio == MAX_PRIO-1)) { if (current->time_slice <= 1) { - dequeue_task(current, rq->active); + dequeue_task(current, array); enqueue_task(current, rq->expired); } else current->time_slice--; @@ -1411,7 +1414,7 @@ list_add_tail(¤t->run_list, array->queue + current->prio); __set_bit(current->prio, array->bitmap); } - spin_unlock(&rq->lock); + rq_unlock (rq); schedule(); -- THAT'S ALL FOLKS! ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII) 2002-08-16 14:17 ` Christian Ehrhardt @ 2002-08-16 16:50 ` Antti Salmela 0 siblings, 0 replies; 10+ messages in thread From: Antti Salmela @ 2002-08-16 16:50 UTC (permalink / raw) To: Christian Ehrhardt; +Cc: Alan Cox, linux-kernel On Fri, Aug 16, 2002 at 04:17:18PM +0200, Christian Ehrhardt wrote: > On Thu, Aug 15, 2002 at 02:33:02AM +0100, Alan Cox wrote: > > Thanks - your analysis is informative to say the least. It looks like > > the PIV load balancing code is the problem. > > The (untested) patch below should correct this problem along with > a locking oddity (last hunk) that IMHO either needs fixing or a BIG > comment. Be prepared for a few (up to 4) lines of fuzz due to additional > BUG_ONs in both versions of the file. With this patch I could boot 2.4.20-pre2-ac3 and it has now run nearly an hour without any problems. > regards Christian Ehrhardt > > [1] http://www.atnf.csiro.au/people/rgooch/benchmarks/linux-scheduler.html > > > --- /usr/src/linux-2.4.20-pre1-ac3/kernel/sched.c Thu Aug 15 20:03:01 2002 > +++ sched.c Fri Aug 16 16:15:57 2002 > @@ -769,7 +772,7 @@ > set_tsk_need_resched(p); > > /* put it at the end of the queue: */ > - dequeue_task(p, rq->active); > + dequeue_task(p, p->array); > enqueue_task(p, rq->active); > } > goto out; > @@ -785,7 +788,7 @@ > if (p->sleep_avg) > p->sleep_avg--; > if (!--p->time_slice) { > - dequeue_task(p, rq->active); > + dequeue_task(p, p->array); > set_tsk_need_resched(p); > p->prio = effective_prio(p); > p->time_slice = TASK_TIMESLICE(p); > @@ -1396,7 +1399,7 @@ > */ > if (likely(current->prio == MAX_PRIO-1)) { > if (current->time_slice <= 1) { > - dequeue_task(current, rq->active); > + dequeue_task(current, array); > enqueue_task(current, rq->expired); > } else > current->time_slice--; > @@ -1411,7 +1414,7 @@ > list_add_tail(¤t->run_list, array->queue + current->prio); > __set_bit(current->prio, array->bitmap); > } > - spin_unlock(&rq->lock); > + rq_unlock (rq); > > schedule(); > -- Antti Salmela ^ permalink raw reply [flat|nested] 10+ messages in thread
* pci-dma bug in pci_alloc_consistent on i386 ? 2002-08-14 12:37 ` Alan Cox 2002-08-14 13:10 ` Antti Salmela @ 2002-08-14 13:16 ` Steffen Persvold 1 sibling, 0 replies; 10+ messages in thread From: Steffen Persvold @ 2002-08-14 13:16 UTC (permalink / raw) To: linux-kernel; +Cc: David S. Miller Hi all, I _think_ I found a little snag in the pci_alloc_consistent code for i386. What I've discovered is that modern GbE drivers (such as the tg3 driver written by David and the e1000 driver by Intel), does something like this in setup : if (!pci_set_dma_mask(pdev, (u64) 0xffffffffffffffff)) { pci_using_dac = 1; } else { err = pci_set_dma_mask(pdev, (u64) 0xffffffff); if (err) { printk(KERN_ERR PFX "No usable DMA configuration, " "aborting.\n"); goto err_out_free_res; } pci_using_dac = 0; } if (pci_using_dac) dev->features |= NETIF_F_HIGHDMA; On i386 the first pci_set_dma_mask will succeed, because : (in include/asm-i386/pci.h) static inline int pci_dma_supported(struct pci_dev *hwdev, u64 mask) { /* * we fall back to GFP_DMA when the mask isn't all 1s, * so we can't guarantee allocations that must be * within a tighter range than GFP_DMA.. */ if(mask < 0x00ffffff) return 0; return 1; } (in drivers/pci/pci.c) int pci_set_dma_mask(struct pci_dev *dev, u64 mask) { if (!pci_dma_supported(dev, mask)) return -EIO; dev->dma_mask = mask; return 0; } And this is just fine, the ethernet adapter will be able to DMA directly to any memory (even highmem). However when they are allocating RX and TX descriptors (and thus using pci_alloc_consistent), they are getting GFP_DMA pages. This is why : (in arch/i386/kernel/pci-dma.c) void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size, dma_addr_t *dma_handle) { void *ret; int gfp = GFP_ATOMIC; if (hwdev == NULL || hwdev->dma_mask != 0xffffffff) gfp |= GFP_DMA; ret = (void *)__get_free_pages(gfp, get_order(size)); if (ret != NULL) { memset(ret, 0, size); *dma_handle = virt_to_bus(ret); } return ret; } IMHO the criteria for when to select GFP_DMA pages is wrong, it should be : if (hwdev == NULL || hwdev->dma_mask < 0xffffffff) gfp |= GFP_DMA; Here's a patch : --- pci-dma.c.~1~ Wed Aug 14 15:06:49 2002 +++ pci-dma.c Wed Aug 14 15:08:29 2002 @@ -19,7 +19,7 @@ void *ret; int gfp = GFP_ATOMIC; - if (hwdev == NULL || hwdev->dma_mask != 0xffffffff) + if (hwdev == NULL || hwdev->dma_mask < 0xffffffff) gfp |= GFP_DMA; ret = (void *)__get_free_pages(gfp, get_order(size)); Regards, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2002-08-16 16:47 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-08-14 11:54 [OOPS] 2.4.20-pre1-ac3, SMP (Dual PIII) Antti Salmela 2002-08-14 12:37 ` Alan Cox 2002-08-14 13:10 ` Antti Salmela 2002-08-14 13:27 ` Alan Cox 2002-08-14 15:55 ` Antti Salmela 2002-08-14 17:30 ` Christian Ehrhardt 2002-08-15 1:33 ` Alan Cox 2002-08-16 14:17 ` Christian Ehrhardt 2002-08-16 16:50 ` Antti Salmela 2002-08-14 13:16 ` pci-dma bug in pci_alloc_consistent on i386 ? Steffen Persvold
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox