* Re: [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps [not found] ` <20040105112800.7a9f240b.davem@redhat.com.suse.lists.linux.kernel> @ 2004-01-05 21:02 ` Andi Kleen 2004-01-05 21:01 ` David S. Miller 0 siblings, 1 reply; 8+ messages in thread From: Andi Kleen @ 2004-01-05 21:02 UTC (permalink / raw) To: David S. Miller; +Cc: linux-kernel, linux-scsi, gibbs "David S. Miller" <davem@redhat.com> writes: > On Mon, 5 Jan 2004 13:29:54 -0600 (CST) > Berkley Shands <berkley@cs.wustl.edu> wrote: > > > The pci layer is modifying the sg list, and then placing a zero > > in the length field. pci-gart.c at line 453 (2.6.0 sources) checks this length field > > after a retry, sees that it is zero, and bughalts. > > Oh that's a bug. It is allowed to modify the dma_length field but not > the physical length field. It sets length to zero to terminate the list when entries were merged. It doesn't have a dma_length. It tripping over remapped lists is an side effect, but an useful one because remapping is not supported (merging destroys information that cannot be reconstructed). If the bug didn't exist you would get data corruption. -Andi P.S.: The x86-64 IOMMU code in 2.6.0 was buggy. Use current -bk*. It will avoid the problem because merging is turned off by default, but it should be still fixed. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps 2004-01-05 21:02 ` [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps Andi Kleen @ 2004-01-05 21:01 ` David S. Miller 2004-01-05 21:31 ` Andi Kleen 0 siblings, 1 reply; 8+ messages in thread From: David S. Miller @ 2004-01-05 21:01 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel, linux-scsi, gibbs On 05 Jan 2004 22:02:19 +0100 Andi Kleen <ak@suse.de> wrote: > It sets length to zero to terminate the list when entries were merged. > It doesn't have a dma_length. I understand, and you are defining dma_length to just use the normal sg->length field, and I'm trying to explain to you that this is not allowed. If you want to modify the length field to zero terminate the DMA chunks, you must have a seperate dma_length field in your platforms scatterlist structure. Again, for the 3rd time, see what sparc64 is doing here. > It tripping over remapped lists is an side effect, but an useful one > because remapping is not supported (merging destroys information that > cannot be reconstructed). If the bug didn't exist you would get data > corruption. You should not be modifying any portion of the non-DMA fields. Therefore, if the SG is unmapped, then passed into your IOMMU code for a future map call, it should just work. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps 2004-01-05 21:01 ` David S. Miller @ 2004-01-05 21:31 ` Andi Kleen 2004-01-05 21:40 ` [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps II Andi Kleen 2004-01-06 0:05 ` [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps James Bottomley 0 siblings, 2 replies; 8+ messages in thread From: Andi Kleen @ 2004-01-05 21:31 UTC (permalink / raw) To: David S. Miller; +Cc: linux-kernel, linux-scsi, gibbs On Mon, 5 Jan 2004 13:01:18 -0800 "David S. Miller" <davem@redhat.com> wrote: > On 05 Jan 2004 22:02:19 +0100 > Andi Kleen <ak@suse.de> wrote: > > > It sets length to zero to terminate the list when entries were merged. > > It doesn't have a dma_length. > > I understand, and you are defining dma_length to just use the > normal sg->length field, and I'm trying to explain to you that this > is not allowed. If you want to modify the length field to zero terminate > the DMA chunks, you must have a seperate dma_length field in your > platforms scatterlist structure. DMA-mapping.txt does not ever mention a dma_length or that remapping is legal. > Again, for the 3rd time, see what sparc64 is doing here. Sorry Dave, I'm not going to reverse engineer your cryptic mess. If you have any more such undocumented requirements you should definitely add them to the Spec. For the sake of bug-to-bug compatibility to the SCSI layer this patch may work. I haven't tested it so no guarantees if it won't eat your file systems. Feedback welcome anyways. -Andi diff -u linux-2.6.1rc1-amd64/arch/x86_64/kernel/pci-gart.c-o linux-2.6.1rc1-amd64/arch/x86_64/kernel/pci-gart.c --- linux-2.6.1rc1-amd64/arch/x86_64/kernel/pci-gart.c-o 2004-01-01 06:40:28.000000000 +0100 +++ linux-2.6.1rc1-amd64/arch/x86_64/kernel/pci-gart.c 2004-01-05 22:17:49.000000000 +0100 @@ -384,6 +395,7 @@ } } s->dma_address = addr; + s->dma_length = s->length; } flush_gart(dev); return nents; @@ -410,8 +422,9 @@ *sout = *s; sout->dma_address = iommu_bus_base; sout->dma_address += iommu_page*PAGE_SIZE + s->offset; + sout->dma_length = s->length; } else { - sout->length += s->length; + sout->dma_length += s->length; } addr = phys_addr; @@ -538,9 +551,9 @@ int i; for (i = 0; i < nents; i++) { struct scatterlist *s = &sg[i]; - if (!s->length) + if (!s->dma_length || !s->length) break; - pci_unmap_single(dev, s->dma_address, s->length, dir); + pci_unmap_single(dev, s->dma_address, s->dma_length, dir); } } diff -u linux-2.6.1rc1-amd64/include/asm-x86_64/scatterlist.h-o linux-2.6.1rc1-amd64/include/asm-x86_64/scatterlist.h --- linux-2.6.1rc1-amd64/include/asm-x86_64/scatterlist.h-o 2003-07-18 02:40:01.000000000 +0200 +++ linux-2.6.1rc1-amd64/include/asm-x86_64/scatterlist.h 2004-01-05 22:10:15.000000000 +0100 @@ -4,8 +4,9 @@ struct scatterlist { struct page *page; unsigned int offset; - unsigned int length; + unsigned int length; dma_addr_t dma_address; + unsigned int dma_length; }; #define ISA_DMA_THRESHOLD (0x00ffffff) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps II 2004-01-05 21:31 ` Andi Kleen @ 2004-01-05 21:40 ` Andi Kleen 2004-01-06 0:05 ` [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps James Bottomley 1 sibling, 0 replies; 8+ messages in thread From: Andi Kleen @ 2004-01-05 21:40 UTC (permalink / raw) To: Andi Kleen; +Cc: davem, linux-kernel, linux-scsi, gibbs On Mon, 5 Jan 2004 22:31:58 +0100 Andi Kleen <ak@suse.de> wrote: > For the sake of bug-to-bug compatibility to the SCSI layer this patch may > work. I haven't tested it so no guarantees if it won't eat your file systems. > Feedback welcome anyways. [...] This version will likely work better (original still set ->length to zero) -Andi diff -u linux-2.6.1rc1-amd64/include/asm-x86_64/scatterlist.h-o linux-2.6.1rc1-amd64/include/asm-x86_64/scatterlist.h --- linux-2.6.1rc1-amd64/include/asm-x86_64/scatterlist.h-o 2003-07-18 02:40:01.000000000 +0200 +++ linux-2.6.1rc1-amd64/include/asm-x86_64/scatterlist.h 2004-01-05 22:10:15.000000000 +0100 @@ -4,8 +4,9 @@ struct scatterlist { struct page *page; unsigned int offset; - unsigned int length; + unsigned int length; dma_addr_t dma_address; + unsigned int dma_length; }; #define ISA_DMA_THRESHOLD (0x00ffffff) diff -u linux-2.6.1rc1-amd64/include/asm-x86_64/scatterlist.h-o linux-2.6.1rc1-amd64/include/asm-x86_64/scatterlist.h --- linux-2.6.1rc1-amd64/include/asm-x86_64/scatterlist.h-o 2003-07-18 02:40:01.000000000 +0200 +++ linux-2.6.1rc1-amd64/include/asm-x86_64/scatterlist.h 2004-01-05 22:10:15.000000000 +0100 @@ -4,8 +4,9 @@ struct scatterlist { struct page *page; unsigned int offset; - unsigned int length; + unsigned int length; dma_addr_t dma_address; + unsigned int dma_length; }; #define ISA_DMA_THRESHOLD (0x00ffffff) diff -u linux-2.6.1rc1-amd64/arch/x86_64/kernel/pci-gart.c-o linux-2.6.1rc1-amd64/arch/x86_64/kernel/pci-gart.c --- linux-2.6.1rc1-amd64/arch/x86_64/kernel/pci-gart.c-o 2004-01-01 06:40:28.000000000 +0100 +++ linux-2.6.1rc1-amd64/arch/x86_64/kernel/pci-gart.c 2004-01-05 22:37:59.000000000 +0100 @@ -384,6 +395,7 @@ } } s->dma_address = addr; + s->dma_length = s->length; } flush_gart(dev); return nents; @@ -410,8 +422,9 @@ *sout = *s; sout->dma_address = iommu_bus_base; sout->dma_address += iommu_page*PAGE_SIZE + s->offset; + sout->dma_length = s->length; } else { - sout->length += s->length; + sout->dma_length += s->length; } addr = phys_addr; @@ -490,8 +503,8 @@ goto error; out++; flush_gart(dev); - if (out < nents) - sg[out].length = 0; + if (out < nents) + sg[out].dma_length = 0; return out; error: @@ -538,9 +551,9 @@ int i; for (i = 0; i < nents; i++) { struct scatterlist *s = &sg[i]; - if (!s->length) + if (!s->dma_length || !s->length) break; - pci_unmap_single(dev, s->dma_address, s->length, dir); + pci_unmap_single(dev, s->dma_address, s->dma_length, dir); } } ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps 2004-01-05 21:31 ` Andi Kleen 2004-01-05 21:40 ` [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps II Andi Kleen @ 2004-01-06 0:05 ` James Bottomley 2004-01-06 3:06 ` Andi Kleen 1 sibling, 1 reply; 8+ messages in thread From: James Bottomley @ 2004-01-06 0:05 UTC (permalink / raw) To: Andi Kleen; +Cc: David S. Miller, Linux Kernel, SCSI Mailing List, gibbs On Mon, 2004-01-05 at 15:31, Andi Kleen wrote: > For the sake of bug-to-bug compatibility to the SCSI layer this patch may > work. I haven't tested it so no guarantees if it won't eat your file systems. > Feedback welcome anyways. This isn't a bug in SCSI, it's a deliberate design feature. SCSI has certain events, like QUEUE full that cause us to re-queue the pending I/O. Other block layer drivers that can get these EAGAIN type queueing problems from the device also follow this model. The reason for doing this is the prep/request model for block drivers (although the behaviour pre-dates the bio model). As part of the prep function, we prepare the mapping list that is later passed to dma_map_sg(). Requeueing is done from the request function; by design, we don't re-prepare the requests (and hence don't free and rebuild the sg list). Like Dave says, we rely on the sequence map/unmap/map to produce a correct SG list. This does give you slightly more leeway, since we never look at the sg list after the unmap, for SCSI it doesn't have to be returned to the pre-map state as long as re-mapping it is correct. As to the idempotence of map/unmap: I'm ambivalent. If it's going to be a performance hit to return the sg list to its prior state in unmap, then it does seem a waste given that for most of our I/O transactions we simply free the sg list after the unmap. It looks like we're down to a choice of three 1. Fix the x86_64 mapping layer as your patch proposes (how much of a performance hit on every transaction will this be)? 2. Change SCSI (and every other block driver) to re-prepare requeued I/O. This will incur a free/setup overhead penalty, but only in the requeue path. 3. Don't unmap the I/O in the requeue case. I like this least because the LLD is responsible for map/unmap and the mid-layer is responsible for requeueing, so it's a layering violation (as well as a waste of potentially valuable mapping resources). On the whole, I prefer 1. Partly because it doesn't involve extra work for me ;-) but also because idempotence is an appealing property from a layering point of view. If we're agreed on this, I can add it to the DMA docs. James ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps 2004-01-06 0:05 ` [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps James Bottomley @ 2004-01-06 3:06 ` Andi Kleen 2004-01-06 3:04 ` David S. Miller 0 siblings, 1 reply; 8+ messages in thread From: Andi Kleen @ 2004-01-06 3:06 UTC (permalink / raw) To: James Bottomley; +Cc: davem, linux-kernel, linux-scsi, gibbs On 05 Jan 2004 18:05:47 -0600 James Bottomley <James.Bottomley@steeleye.com> wrote: > On Mon, 2004-01-05 at 15:31, Andi Kleen wrote: > > For the sake of bug-to-bug compatibility to the SCSI layer this patch may > > work. I haven't tested it so no guarantees if it won't eat your file systems. > > Feedback welcome anyways. > > This isn't a bug in SCSI, it's a deliberate design feature. SCSI has > certain events, like QUEUE full that cause us to re-queue the pending > I/O. Other block layer drivers that can get these EAGAIN type queueing > problems from the device also follow this model. It's ok. I fixed the code now[1] If you have other undocumented requirements you should document them though, otherwise there may be more problems. Since merging is disabled by default now it won't trigger anyways. [1] will be in next merge after testing, the last patch i posted still had one bug. > As to the idempotence of map/unmap: I'm ambivalent. If it's going to be > a performance hit to return the sg list to its prior state in unmap, > then it does seem a waste given that for most of our I/O transactions we > simply free the sg list after the unmap. With the evil dma_length trick it is actually near zero cost. > 1. Fix the x86_64 mapping layer as your patch proposes (how much of a > performance hit on every transaction will this be)? I don't expect a significant performance hit. -Andi ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps 2004-01-06 3:06 ` Andi Kleen @ 2004-01-06 3:04 ` David S. Miller 2004-01-06 3:14 ` James Bottomley 0 siblings, 1 reply; 8+ messages in thread From: David S. Miller @ 2004-01-06 3:04 UTC (permalink / raw) To: Andi Kleen; +Cc: James.Bottomley, linux-kernel, linux-scsi, gibbs On Tue, 6 Jan 2004 04:06:40 +0100 Andi Kleen <ak@suse.de> wrote: > It's ok. I fixed the code now[1] If you have other undocumented requirements > you should document them though, otherwise there may be more problems. > Since merging is disabled by default now it won't trigger anyways. I totally agree with you Andi, not documenting this properly was an oversight and not intentional. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps 2004-01-06 3:04 ` David S. Miller @ 2004-01-06 3:14 ` James Bottomley 0 siblings, 0 replies; 8+ messages in thread From: James Bottomley @ 2004-01-06 3:14 UTC (permalink / raw) To: David S. Miller; +Cc: Andi Kleen, Linux Kernel, SCSI Mailing List, gibbs On Mon, 2004-01-05 at 21:04, David S. Miller wrote: > On Tue, 6 Jan 2004 04:06:40 +0100 > Andi Kleen <ak@suse.de> wrote: > > > It's ok. I fixed the code now[1] If you have other undocumented requirements > > you should document them though, otherwise there may be more problems. > > Since merging is disabled by default now it won't trigger anyways. > > I totally agree with you Andi, not documenting this properly was an > oversight and not intentional. I'll go over the SCSI code and see if I spot any other lacunae in the API. James ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2004-01-06 3:14 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200401051929.i05JTsM0000014248@mudpuddle.cs.wustl.edu.suse.lists.linux.kernel>
[not found] ` <20040105112800.7a9f240b.davem@redhat.com.suse.lists.linux.kernel>
2004-01-05 21:02 ` [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps Andi Kleen
2004-01-05 21:01 ` David S. Miller
2004-01-05 21:31 ` Andi Kleen
2004-01-05 21:40 ` [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps II Andi Kleen
2004-01-06 0:05 ` [BUG] x86_64 pci_map_sg modifies sg list - fails multiple map/unmaps James Bottomley
2004-01-06 3:06 ` Andi Kleen
2004-01-06 3:04 ` David S. Miller
2004-01-06 3:14 ` James Bottomley
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox