From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760287AbXJXWW7 (ORCPT ); Wed, 24 Oct 2007 18:22:59 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1755020AbXJXWWw (ORCPT ); Wed, 24 Oct 2007 18:22:52 -0400
Received: from mo11.iij4u.or.jp ([210.138.174.79]:33214 "EHLO mo11.iij4u.or.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754782AbXJXWWv (ORCPT ); Wed, 24 Oct 2007 18:22:51 -0400
Date: Thu, 25 Oct 2007 07:09:06 +0900
To: kamalesh@linux.vnet.ibm.com, jens.axboe@oracle.com
Cc: fujita.tomonori@lab.ntt.co.jp, apw@shadowen.org, linux-kernel@vger.kernel.org, tomof@acm.org
Subject: Re: [BUG] 2.6.23-git18 Kernel oops in sg helpers
From: FUJITA Tomonori
In-Reply-To: <471F6DFE.3040304@linux.vnet.ibm.com>
References: <20071024115436.GT32058@shadowen.org> <20071024214014C.fujita.tomonori@lab.ntt.co.jp> <471F6DFE.3040304@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <20071025071043P.tomof@acm.org>
X-Dispatcher: imput version 20050308(IM148)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 24 Oct 2007 21:38:30 +0530
Kamalesh Babulal wrote:

> FUJITA Tomonori wrote:
> > On Wed, 24 Oct 2007 12:54:36 +0100
> > Andy Whitcroft wrote:
> >
> >> On Tue, Oct 23, 2007 at 08:44:20PM +0200, Jens Axboe wrote:
> >>> On Tue, Oct 23 2007, Kamalesh Babulal wrote:
> >>>> Hi,
> >>>>
> >>>> Kernel oops is triggered while running fsx-linux test, followed by cpu softlock
> >>>> over the AMD box
> >>>>
> >>>> Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP:
> >>>> [] gart_map_sg+0x26c/0x406
> >>>> PGD 10185b067 PUD 10075b067 PMD 0
> >>>> Oops: 0002 [1] SMP
> >>>> CPU 3
> >>>> Modules linked in:
> >>>> Pid: 18676, comm: fsx-linux Not tainted 2.6.23-git18-autokern1 #1
> >>>> RIP: 0010:[] [] gart_map_sg+0x26c/0x406
> >>>> RSP: 0000:ffff810181edf948 EFLAGS: 00010002
> >>> Can you check where gart_map_sg+0x26c is at? Make sure you have
> >>> CONFIG_DEBUG_INFO defined, then do:
> >>>
> >>> $ gdb vmlinux
> >>> $ l *gart_map_sg+0x26c
> >> Ok, this problem still seems to be about in 2.6.24-rc1. Here is the gdb
> >> output from that version, the panic (also below) seems the same:
> >>
> >> (gdb) l *gart_map_sg+0x26c
> >> 0xffffffff8022011e is in gart_map_sg (arch/x86/kernel/pci-gart_64.c:433).
> >> 428 goto error;
> >> 429 out++;
> >> 430 flush_gart();
> >> 431 if (out < nents) {
> >> 432 sgmap = sg_next(sgmap);
> >> 433 sgmap->dma_length = 0;
> >> 434 }
> >> 435 return out;
> >> 436
> >> 437 error:
> >>
> >> So it seems sg_next has returned 0.
> >
> > Have you tried this?
> >
> > http://marc.info/?l=linux-kernel&m=119317981406073&w=2
> -
> Hi,
> Thanks, this patch solves the kernel oops.

Thanks for testing!

Jens, here's the proper changelog.

-
From: FUJITA Tomonori
Subject: [PATCH] x86: pci-gart fix

map_sg could copy the last sg element to another position (if merging
some elements). It breaks sg chaining.

This copies only dma_address/length instead of the whole sg element.

Signed-off-by: FUJITA Tomonori
---
 arch/x86/kernel/pci-gart_64.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
index c56e9ee..ae7e016 100644
--- a/arch/x86/kernel/pci-gart_64.c
+++ b/arch/x86/kernel/pci-gart_64.c
@@ -338,7 +338,6 @@ static int __dma_map_cont(struct scatterlist *start, int nelems,
 
 		BUG_ON(s != start && s->offset);
 		if (s == start) {
-			*sout = *s;
 			sout->dma_address = iommu_bus_base;
 			sout->dma_address += iommu_page*PAGE_SIZE + s->offset;
 			sout->dma_length = s->length;
@@ -365,7 +364,7 @@ static inline int dma_map_cont(struct scatterlist *start, int nelems,
 {
 	if (!need) {
 		BUG_ON(nelems != 1);
-		*sout = *start;
+		sout->dma_address = start->dma_address;
 		sout->dma_length = start->length;
 		return 0;
 	}
-- 
1.5.2.4