Linux PARISC architecture development
 help / color / mirror / Atom feed
* dma_addr_t: which comment is correct?
@ 2007-12-22 12:15 rubisher
  2007-12-23  9:39 ` Grant Grundler
  0 siblings, 1 reply; 8+ messages in thread
From: rubisher @ 2007-12-22 12:15 UTC (permalink / raw)
  To: linux-parisc

Hello *,

Continuing my blind investigation on ccio-dma stuff, I read those 2 different comments:
in include/asm-parisc/scatterlist.h, scartterlist structure is defined like this:
struct scatterlist {
#ifdef CONFIG_DEBUG_SG
         unsigned long sg_magic;
#endif
         unsigned long page_link;
         unsigned int offset;

         unsigned int length;

         /* an IOVA can be 64-bits on some PA-Risc platforms. */
         dma_addr_t iova;        /* I/O Virtual Address */
         __u32      iova_length; /* bytes mapped */
};

in absolute the comment "an IOVA can be 64-bits on some PA-Risc platforms." seems ok.

but otoh, include/asm-parisc/types.h, defined dma_addr_t like this:

/* Dma addresses are 32-bits wide.  */

typedef u32 dma_addr_t;
typedef u64 dma64_addr_t;

#endif /* __ASSEMBLY__ */

OK it's just a comment but imho there is interesting matter in x86:

typedef u64 dma64_addr_t;
#if defined(CONFIG_X86_64) || defined(CONFIG_HIGHMEM64G)
/* DMA addresses come in 32-bit and 64-bit flavours. */
typedef u64 dma_addr_t;
#else
typedef u32 dma_addr_t;
#endif

But I simply have no idea which "#if defined" would be the most relevant for parisc, any idea?

Cheers,
	r.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: dma_addr_t: which comment is correct?
  2007-12-22 12:15 dma_addr_t: which comment is correct? rubisher
@ 2007-12-23  9:39 ` Grant Grundler
  2007-12-23 22:50   ` rubisher
  0 siblings, 1 reply; 8+ messages in thread
From: Grant Grundler @ 2007-12-23  9:39 UTC (permalink / raw)
  To: rubisher; +Cc: linux-parisc

On Sat, Dec 22, 2007 at 12:15:31PM +0000, rubisher wrote:
> Hello *,
>
> Continuing my blind investigation on ccio-dma stuff, I read those 2 
> different comments:
> in include/asm-parisc/scatterlist.h, scartterlist structure is defined like 
> this:
> struct scatterlist {
> #ifdef CONFIG_DEBUG_SG
>         unsigned long sg_magic;
> #endif
>         unsigned long page_link;
>         unsigned int offset;
>
>         unsigned int length;
>
>         /* an IOVA can be 64-bits on some PA-Risc platforms. */
>         dma_addr_t iova;        /* I/O Virtual Address */
>         __u32      iova_length; /* bytes mapped */
> };
>
> in absolute the comment "an IOVA can be 64-bits on some PA-Risc platforms." 
> seems ok.

Yes, it's correct. pa8800 allows 64-bit capabale devices to bypass the IOMMU.
But I don't think we allow it yet since we would need to add code that stuffs
the Virtual Index (for DMA to be cache coherent) into the high bits of the IOVA.


> but otoh, include/asm-parisc/types.h, defined dma_addr_t like this:
>
> /* Dma addresses are 32-bits wide.  */
>
> typedef u32 dma_addr_t;
> typedef u64 dma64_addr_t;
>
> #endif /* __ASSEMBLY__ */

Yes, this matches the current implementation.
Adding support for "bypass" on zx1 chipsets (pa8800/pa8900 CPUs) will
require something more clever.

> OK it's just a comment but imho there is interesting matter in x86:
>
> typedef u64 dma64_addr_t;
> #if defined(CONFIG_X86_64) || defined(CONFIG_HIGHMEM64G)
> /* DMA addresses come in 32-bit and 64-bit flavours. */
> typedef u64 dma_addr_t;
> #else
> typedef u32 dma_addr_t;
> #endif
>
> But I simply have no idea which "#if defined" would be the most relevant 
> for parisc, any idea?

u32 is correct now. u64 could be used for 64-bit builds when someone decides
we should bypass the IOMMU to improve DMA mapping/unmapping performance.

3-4 years ago I saw about 3% better performance for Storage Devices
_with_ the IOMMU enabled. IIRC, it was because of coalescing the longer
SG lists into a single IOMMU entry was more efficient for the PCI device.
Smaller, non-contiguous IOs would benefit from IOMMU bypass - e.g. NIC
workloads.

hth,
grant

>
> Cheers,
> 	r.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: dma_addr_t: which comment is correct?
  2007-12-23  9:39 ` Grant Grundler
@ 2007-12-23 22:50   ` rubisher
  2007-12-24  8:51     ` Grant Grundler
  0 siblings, 1 reply; 8+ messages in thread
From: rubisher @ 2007-12-23 22:50 UTC (permalink / raw)
  To: Grant Grundler; +Cc: linux-parisc

Grant Grundler wrote:
> On Sat, Dec 22, 2007 at 12:15:31PM +0000, rubisher wrote:
>> Hello *,
>>
>> Continuing my blind investigation on ccio-dma stuff, I read those 2 
>> different comments:
>> in include/asm-parisc/scatterlist.h, scartterlist structure is defined like 
>> this:
>> struct scatterlist {
>> #ifdef CONFIG_DEBUG_SG
>>         unsigned long sg_magic;
>> #endif
>>         unsigned long page_link;
>>         unsigned int offset;
>>
>>         unsigned int length;
>>
>>         /* an IOVA can be 64-bits on some PA-Risc platforms. */
>>         dma_addr_t iova;        /* I/O Virtual Address */
>>         __u32      iova_length; /* bytes mapped */
>> };
>>
>> in absolute the comment "an IOVA can be 64-bits on some PA-Risc platforms." 
>> seems ok.
> 
> Yes, it's correct. pa8800 allows 64-bit capabale devices to bypass the IOMMU.
> But I don't think we allow it yet since we would need to add code that stuffs
> the Virtual Index (for DMA to be cache coherent) into the high bits of the IOVA.
> 
> 
>> but otoh, include/asm-parisc/types.h, defined dma_addr_t like this:
>>
>> /* Dma addresses are 32-bits wide.  */
>>
>> typedef u32 dma_addr_t;
>> typedef u64 dma64_addr_t;
>>
>> #endif /* __ASSEMBLY__ */
> 
> Yes, this matches the current implementation.
> Adding support for "bypass" on zx1 chipsets (pa8800/pa8900 CPUs) will
> require something more clever.
> 
>> OK it's just a comment but imho there is interesting matter in x86:
>>
>> typedef u64 dma64_addr_t;
>> #if defined(CONFIG_X86_64) || defined(CONFIG_HIGHMEM64G)
>> /* DMA addresses come in 32-bit and 64-bit flavours. */
>> typedef u64 dma_addr_t;
>> #else
>> typedef u32 dma_addr_t;
>> #endif
>>
>> But I simply have no idea which "#if defined" would be the most relevant 
>> for parisc, any idea?
> 
> u32 is correct now. u64 could be used for 64-bit builds when someone decides
> we should bypass the IOMMU to improve DMA mapping/unmapping performance.
> 
> 3-4 years ago I saw about 3% better performance for Storage Devices
> _with_ the IOMMU enabled. IIRC, it was because of coalescing the longer
> SG lists into a single IOMMU entry was more efficient for the PCI device.
> Smaller, non-contiguous IOs would benefit from IOMMU bypass - e.g. NIC
> workloads.
> 
> hth,
> grant
> 
Ok,

btw do you notice how much way are there (in so few lines) to use a unsigned 32bits integer: u32, __u32 but also uint32_t, 
u_int32_t, ...

Is there some advise om the best pratice?

Ah also related to ccio-dma patch test on my c110, it seems that removing DELAYED_RESOURCE_CNT code help a bit this system 
with few ram (64Mb only) to survive a bit longer (enough to debootstrap and download 800Mb to be installed) but that didn't 
make the drill: there are still fs corruption from time to time and some reset on the lasi disk ;-(.
This test just seems to confirm an issue of i/o coherency somewhere but where those rules could be broken???

Merry Xmas,
	r.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: dma_addr_t: which comment is correct?
  2007-12-23 22:50   ` rubisher
@ 2007-12-24  8:51     ` Grant Grundler
  2007-12-26 10:01       ` Thibaut VARENE
  2007-12-26 17:31       ` iommu_fill_pdir() and its /* Horrible hack. ... */ reading rubisher
  0 siblings, 2 replies; 8+ messages in thread
From: Grant Grundler @ 2007-12-24  8:51 UTC (permalink / raw)
  To: rubisher; +Cc: linux-parisc

On Sun, Dec 23, 2007 at 10:50:34PM +0000, rubisher wrote:
...
> btw do you notice how much way are there (in so few lines) to use a 
> unsigned 32bits integer: u32, __u32 but also uint32_t, u_int32_t, ...
>
> Is there some advise om the best pratice?

IIRC, Preferred is "u32". But use whatever is consistent with the
file you are editing.

>
> Ah also related to ccio-dma patch test on my c110, it seems that removing 
> DELAYED_RESOURCE_CNT code help a bit this system with few ram (64Mb only) 
> to survive a bit longer (enough to debootstrap and download 800Mb to be 
> installed) but that didn't make the drill: there are still fs corruption 
> from time to time and some reset on the lasi disk ;-(.
> This test just seems to confirm an issue of i/o coherency somewhere but 
> where those rules could be broken???

ISTR Joel Soete reporting problems for CCIO machines as well.
Offhand, I don't know what's wrong and we depend on Ryan Bradetich
and other volunteers to track that down. 

cheers,
grant

>
> Merry Xmas,
> 	r.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: dma_addr_t: which comment is correct?
  2007-12-24  8:51     ` Grant Grundler
@ 2007-12-26 10:01       ` Thibaut VARENE
  2007-12-26 17:31       ` iommu_fill_pdir() and its /* Horrible hack. ... */ reading rubisher
  1 sibling, 0 replies; 8+ messages in thread
From: Thibaut VARENE @ 2007-12-26 10:01 UTC (permalink / raw)
  To: Grant Grundler; +Cc: rubisher, linux-parisc

On Dec 24, 2007 9:51 AM, Grant Grundler <grundler@parisc-linux.org> wrote:
> On Sun, Dec 23, 2007 at 10:50:34PM +0000, rubisher wrote:
> ...
> > btw do you notice how much way are there (in so few lines) to use a
> > unsigned 32bits integer: u32, __u32 but also uint32_t, u_int32_t, ...
> >
> > Is there some advise om the best pratice?
>
> IIRC, Preferred is "u32". But use whatever is consistent with the
> file you are editing.

IIRC, the __xx variants are to be used for userspace exported headers.
See the comment in include/asm-parisc/types.h

HTH and merry Xmas

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* iommu_fill_pdir() and its /* Horrible hack. ... */ reading.
  2007-12-24  8:51     ` Grant Grundler
  2007-12-26 10:01       ` Thibaut VARENE
@ 2007-12-26 17:31       ` rubisher
  2007-12-28  8:27         ` Grant Grundler
  1 sibling, 1 reply; 8+ messages in thread
From: rubisher @ 2007-12-26 17:31 UTC (permalink / raw)
  To: Grant Grundler; +Cc: linux-parisc

Grant Grundler wrote:
> On Sun, Dec 23, 2007 at 10:50:34PM +0000, rubisher wrote:
> ...
>> btw do you notice how much way are there (in so few lines) to use a 
>> unsigned 32bits integer: u32, __u32 but also uint32_t, u_int32_t, ...
>>
>> Is there some advise om the best pratice?
> 
> IIRC, Preferred is "u32". But use whatever is consistent with the
> file you are editing.
> 
>> Ah also related to ccio-dma patch test on my c110, it seems that removing 
>> DELAYED_RESOURCE_CNT code help a bit this system with few ram (64Mb only) 
>> to survive a bit longer (enough to debootstrap and download 800Mb to be 
>> installed) but that didn't make the drill: there are still fs corruption 
>> from time to time and some reset on the lasi disk ;-(.
>> This test just seems to confirm an issue of i/o coherency somewhere but 
>> where those rules could be broken???
> 
> ISTR Joel Soete reporting problems for CCIO machines as well.
> Offhand, I don't know what's wrong and we depend on Ryan Bradetich
> and other volunteers to track that down. 
> 
> cheers,
> grant
> 
>> Merry Xmas,
>> 	r.
>> -
Hello Grant,

I suspecting a possible issue with this hack in your iommu_fill_pdir():

you initialized dma_sg with the adress of startsg (/* pointer to current DMA */)
then before the loop you dma_sg--;

(where and what dms_sg point to, I guess we have any idea?)

Now in the while (nents-- > 0), suppose the test "if (sg_dma_address(startsg) & PIDE_FLAG) {" failed,

so later in the loop the "sg_dma_len(dma_sg) += startsg->length" (which is actually "dma_sg->iova_length += startsg->length" 
) imo could corrupt something?

That said I tried to re-use the first implementation of jejb (what was in ccio-dma.c before this patch 
<http://cvs.parisc-linux.org/linux-2.6/drivers/parisc/ccio-dma.c?r1=1.12&r2=1.13> but that doesn't seems to fix the ccio-dma 
issue at all: I can still read those kind of message at the console while doing such copy
[snip]
scsi1: (4:0) phase mismatch at 01e8, phase IO CD MSG BSY REQ MSG IN
scsi1: Bus Reset detected, executing command 10953600, slot 109708a4, dsp 001301e8[01e8] 

  failing command because of reset, slot 10970778, cmnd 12644a00 

  failing command because of reset, slot 109708a4, cmnd 10953600 

scsi1: (4:0) phase mismatch at 01e8, phase IO CD MSG BSY REQ MSG IN 

scsi1: Bus Reset detected, executing command 12644c40, slot 10970520, dsp 001301e8[01e8] 

  failing command because of reset, slot 10970520, cmnd 12644c40 

scsi1: (4:0) phase mismatch at 01e8, phase IO CD MSG BSY REQ MSG IN 

scsi1: Bus Reset detected, executing command 109532a0, slot 1097064c, dsp 001301e8[01e8] 

  failing command because of reset, slot 1097064c, cmnd 109532a0 

end_request: I/O error, dev sdc, sector 197250
[snip]
while doing such copy:
root@hpalin:/boot# find . | cpio -mpduv /mnt/OldPA/boot/.
/mnt/OldPA/boot/././vmlinux-cp
/mnt/OldPA/boot/././vmlinux-2.6.23-c110-p2
/mnt/OldPA/boot/././vmlinux-cp13
/mnt/OldPA/boot/././vmlinux-2.6.23-c110
/mnt/OldPA/boot/././vmlinux-cp6
/mnt/OldPA/boot/././config-2.6.23-pa-b180
/mnt/OldPA/boot/././vmlinux-cp9
/mnt/OldPA/boot/././lost+found
/mnt/OldPA/boot/././vmlinux-2.6.23-c110-p4
/mnt/OldPA/boot/././vmlinux-2.6.22-3-parisc
/mnt/OldPA/boot/././config-2.6.23-c110-p15
/mnt/OldPA/boot/././vmlinux-2.6.23-c110-p5
/mnt/OldPA/boot/././vmlinux-2.6.23-c110-p15
/mnt/OldPA/boot/././vmlinux-2.6.23-c110-p10
/mnt/OldPA/boot/././System.map-2.6.22-3-parisc
/mnt/OldPA/boot/././vmlinux-2.6.23-c110-p13
/mnt/OldPA/boot/././vmlinux-2.6.23-c110-p6
/mnt/OldPA/boot/././config-2.6.22-3-parisc
/mnt/OldPA/boot/././vmlinux-cp14
/mnt/OldPA/boot/././initrd.img-2.6.22-3-parisc
/mnt/OldPA/boot/././vmlinux-2.4.17-32
/mnt/OldPA/boot/././vmlinux-cp11
/mnt/OldPA/boot/././vmlinux-cp10
/mnt/OldPA/boot/././vmlinux-cp5
/mnt/OldPA/boot/././vmlinux-cp15
/mnt/OldPA/boot/././System.map-2.4.17-32
/mnt/OldPA/boot/././config-2.6.23-c110-p2
/mnt/OldPA/boot/././vmlinux-2.6.23-pa-b180
/mnt/OldPA/boot/././vmlinux-2.6.23-c110-p14
/mnt/OldPA/boot/././vmlinux-cp4
/mnt/OldPA/boot/././vmlinux-2.6.23-c110-p
/mnt/OldPA/boot/././config-2.4.17-32
/mnt/OldPA/boot/././vmlinux-2.6.23-c110-p9
/mnt/OldPA/boot/././vmlinux
/mnt/OldPA/boot/././vmlinux-2.6.23-c110-p11
/mnt/OldPA/boot/././vmlinux-t
/mnt/OldPA/boot/././vmlinux-co
138420 blocks

(the scsi1 is the lasi scsi hba as sources and the target being the disks on ncr53c720 hba)

or experimenting fs issues on this target disks?


That said ok I will wait either U2/Uturn ers public doc or all volonteers feedback.

Cheers,
	r.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: iommu_fill_pdir() and its /* Horrible hack. ... */ reading.
  2007-12-26 17:31       ` iommu_fill_pdir() and its /* Horrible hack. ... */ reading rubisher
@ 2007-12-28  8:27         ` Grant Grundler
  2007-12-28 15:27           ` rubisher
  0 siblings, 1 reply; 8+ messages in thread
From: Grant Grundler @ 2007-12-28  8:27 UTC (permalink / raw)
  To: rubisher; +Cc: Grant Grundler, linux-parisc

Hi,

On Wed, Dec 26, 2007 at 05:31:51PM +0000, rubisher wrote:
> Hello Grant,
>
> I suspecting a possible issue with this hack in your iommu_fill_pdir():
>
> you initialized dma_sg with the adress of startsg (/* pointer to current 
> DMA */)
> then before the loop you dma_sg--;

Yes. The comment before that line explains why it does that.

...
> Now in the while (nents-- > 0), suppose the test "if 
> (sg_dma_address(startsg) & PIDE_FLAG) {" failed,

Do you have any evidence this test has failed when dma_sg is pointing
at garbage?

While possible, that would be a bug in iommu_coalesce_chunks()
for not setting PIDE_FLAG.

> so later in the loop the "sg_dma_len(dma_sg) += startsg->length" (which is 
> actually "dma_sg->iova_length += startsg->length" ) imo could corrupt 
> something?

Yes, that would be the result. Can you try a bug catcher to prove
that's something is actually getting corrupted?

Add something like the following around line 65 (before "sg_dma_len(dma_sg)"
is assigned):
	BUG_ON(dma_sg < startsg);


On the same note, line 44 is clearly wrong:
 41                 if (sg_dma_address(startsg) & PIDE_FLAG) {
 42                         u32 pide = sg_dma_address(startsg) & ~PIDE_FLAG;
 43 
 44                         BUG_ON(pdirp && (dma_len != sg_dma_len(dma_sg)));
 45 
 46                         dma_sg++;

The BUG_ON at line 44 might fail when it shouldn't (and vice versa).
My preference is to remove it or put "#ifdef DEBUG_IOMMU" around
that line of code (not literally, but effectively).


In general, I didn't like the "pre-decrement" but it seems to work and
makes the code a bit more efficient. Efficiency is extremely important
for this code since it gets called so often. Small changes can have
easily measured impact.

> That said I tried to re-use the first implementation of jejb (what was in 
> ccio-dma.c before this patch 
> <http://cvs.parisc-linux.org/linux-2.6/drivers/parisc/ccio-dma.c?r1=1.12&r2=1.13> 
> but that doesn't seems to fix the ccio-dma issue at all: I can still read 
> those kind of message at the console while doing such copy
> [snip]
> scsi1: (4:0) phase mismatch at 01e8, phase IO CD MSG BSY REQ MSG IN
> scsi1: Bus Reset detected, executing command 10953600, slot 109708a4, dsp 
> 001301e8[01e8] 

I'm thinking we really need SCSI bus traces to figure out if the SCSI driver
is doing the right thing and if not, exactly what is it doing.

If it is a CCIO bug, my guess is it's more likely to be problems with
setting magic bits.  We really need the ERS to review register settings.

..
> (the scsi1 is the lasi scsi hba as sources and the target being the disks 
> on ncr53c720 hba)
>
> or experimenting fs issues on this target disks?

I doubt this is a file system problem.

> That said ok I will wait either U2/Uturn ers public doc or all volonteers 
> feedback.

I'm skeptical for the former and hopeful for the latter.
There is a chance Linux Foundation could ask HP for those docs under NDA.
But you need to sign up with Linux Foundataion as a developer and
then request HP for those docs.

cheers,
grant

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: iommu_fill_pdir() and its /* Horrible hack. ... */ reading.
  2007-12-28  8:27         ` Grant Grundler
@ 2007-12-28 15:27           ` rubisher
  0 siblings, 0 replies; 8+ messages in thread
From: rubisher @ 2007-12-28 15:27 UTC (permalink / raw)
  To: Grant Grundler; +Cc: linux-parisc

Grant Grundler wrote:
> Hi,
> 
> On Wed, Dec 26, 2007 at 05:31:51PM +0000, rubisher wrote:
>> Hello Grant,
>>
>> I suspecting a possible issue with this hack in your iommu_fill_pdir():
>>
>> you initialized dma_sg with the adress of startsg (/* pointer to current 
>> DMA */)
>> then before the loop you dma_sg--;
> 
> Yes. The comment before that line explains why it does that.
> 
> ...
>> Now in the while (nents-- > 0), suppose the test "if 
>> (sg_dma_address(startsg) & PIDE_FLAG) {" failed,
> 
> Do you have any evidence this test has failed when dma_sg is pointing
> at garbage?
> 
> While possible, that would be a bug in iommu_coalesce_chunks()
> for not setting PIDE_FLAG.
> 
>> so later in the loop the "sg_dma_len(dma_sg) += startsg->length" (which is 
>> actually "dma_sg->iova_length += startsg->length" ) imo could corrupt 
>> something?
> 
> Yes, that would be the result. Can you try a bug catcher to prove
> that's something is actually getting corrupted?
> 
> Add something like the following around line 65 (before "sg_dma_len(dma_sg)"
> is assigned):
> 	BUG_ON(dma_sg < startsg);
> 
> 
> On the same note, line 44 is clearly wrong:
>  41                 if (sg_dma_address(startsg) & PIDE_FLAG) {
>  42                         u32 pide = sg_dma_address(startsg) & ~PIDE_FLAG;
>  43 
>  44                         BUG_ON(pdirp && (dma_len != sg_dma_len(dma_sg)));
>  45 
>  46                         dma_sg++;
> 
> The BUG_ON at line 44 might fail when it shouldn't (and vice versa).
> My preference is to remove it or put "#ifdef DEBUG_IOMMU" around
> that line of code (not literally, but effectively).
> 
Good idea:
here is the patch I used:
Index: linux-current/drivers/parisc/iommu-helpers.h
===================================================================
--- linux-current.orig/drivers/parisc/iommu-helpers.h   2007-12-28 12:59:35.000000000 +0000
+++ linux-current/drivers/parisc/iommu-helpers.h        2007-12-28 12:45:29.000000000 +0000
@@ -22,14 +22,14 @@
         /* Horrible hack.  For efficiency's sake, dma_sg starts one
          * entry below the true start (it is immediately incremented
          * in the loop) */
-        dma_sg--;
+       dma_sg--;

         while (nents-- > 0) {
                 unsigned long vaddr;
                 long size;

                 DBG_RUN_SG(" %d : %08lx/%05x %08lx/%05x\n", nents,
-                          (unsigned long)sg_dma_address(startsg), cnt,
+                          (unsigned long)sg_dma_address(startsg), sg_dma_len(startsg),
                            sg_virt_addr(startsg), startsg->length
                 );

@@ -41,7 +41,9 @@
                 if (sg_dma_address(startsg) & PIDE_FLAG) {
                         u32 pide = sg_dma_address(startsg) & ~PIDE_FLAG;

+#ifdef DEBUG_IOMMU
                         BUG_ON(pdirp && (dma_len != sg_dma_len(dma_sg)));
+#endif

                         dma_sg++;

@@ -62,6 +64,7 @@
                         prefetchw(pdirp);
                 }

+               BUG_ON(dma_sg < startsg);
                 BUG_ON(pdirp == NULL);

                 vaddr = sg_virt_addr(startsg);
=== <> ===

And :<(
------------[ cut here ]------------ 

kernel BUG at /CAD/linux-2.6.23-pa-git-20071022/drivers/parisc/iommu-helpers.h:67! 

 

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI 

PSW: 00000000000001001111011100001110 Not tainted 

r00-03  0004f70e fff0bdc0 10212064 107dd000 

r04-07  00000000 10880d48 00000001 00000006 

r08-11  108fc8a0 10440e10 00000001 108fc8a0 

r12-15  10000000 00008000 00000005 0000000e 

r16-19  1084c800 1081880c 10387964 007dc005 

r20-23  10491000 00001000 00000000 00000005 

r24-27  107dc000 07dc0000 10880d40 103dee10 

r28-31  00000000 00000000 10819080 108fc8b4 

sr00-03  00000000 00000000 00000000 00000000 

sr04-07  00000000 00000000 00000000 00000000 

 

IASQ: 00000000 00000000 IAOQ: 102120a8 102120ac 

  IIR: 03ffe01f    ISR: 00000000  IOR: 108fc8b4 

  CPU:        0   CR30: 10818000 CR31: f01043c0 

  ORIG_R28: aac6ca23 

  IAOQ[0]: ccio_map_sg+0x22c/0x3a8 

  IAOQ[1]: ccio_map_sg+0x230/0x3a8 

  RP(r2): ccio_map_sg+0x1e8/0x3a8 

Backtrace: 

  [<10107164>] die_if_kernel+0xa0/0x1b0 

  [<10107898>] handle_interruption+0x624/0x6b4 

  [<1010b078>] intr_check_sig+0x0/0x34 

  [<10121574>] enqueue_task+0x28/0x44 

  [<102120ac>] ccio_map_sg+0x230/0x3a8 

  [<10171428>] do_filp_open+0x54/0x68 

  [<10211fd8>] ccio_map_sg+0x15c/0x3a8 

  [<10266d2c>] scsi_dma_map+0x48/0x58 

  [<1027597c>] NCR_700_queuecommand+0x38c/0x504 

  [<10260368>] scsi_dispatch_cmd+0x118/0x288 

  [<1026664c>] scsi_request_fn+0x184/0x2e0 

  [<101fb440>] __generic_unplug_device+0x38/0x44 

  [<101fb7fc>] generic_unplug_device+0x14/0x24 

  [<101f841c>] blk_backing_dev_unplug+0x1c/0x28 

  [<10198b18>] sync_buffer+0x38/0x50 

  [<1019ab20>] __bread+0x90/0xec 

 

Kernel panic - not syncing: Attempted to kill init!

but there must be something wrong in this test because same change applied to iommu-helper without 'horrible hack' panic at 
the same place:
------------[ cut here ]------------
kernel BUG at /CAD/linux-2.6.23-pa-git-20071022/drivers/parisc/iommu-helpers.h:64!

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00000000000001001111011100001110 Not tainted
r00-03  0004f70e fff0bdc0 10212068 107dd000
r04-07  00000000 108fc8b4 108fc8a0 10880d48
r08-11  00000006 10440e10 00000001 00000001
r12-15  1084c800 108fc8a0 10000000 00008000
r16-19  00000005 0000000e 10387964 007dc005
r20-23  10491000 00000000 00000000 00000005
r24-27  107dc000 07dc0000 10880d40 103dee10
r28-31  000007dc 00002ee0 10819080 00000000
sr00-03  00000000 00000000 00000000 00000000
sr04-07  00000000 00000000 00000000 00000000

IASQ: 00000000 00000000 IAOQ: 102120b0 102120b4
  IIR: 03ffe01f    ISR: 00000000  IOR: 00000000
  CPU:        0   CR30: 10818000 CR31: f01043c0
  ORIG_R28: aac6ca23
  IAOQ[0]: ccio_map_sg+0x234/0x3ac
  IAOQ[1]: ccio_map_sg+0x238/0x3ac
  RP(r2): ccio_map_sg+0x1ec/0x3ac
Backtrace:
  [<10107164>] die_if_kernel+0xa0/0x1b0
  [<10107898>] handle_interruption+0x624/0x6b4
  [<1010b078>] intr_check_sig+0x0/0x34
  [<10121a84>] __wake_up_common+0x7c/0xcc
  [<101713c0>] nameidata_to_filp+0x44/0x58
  [<10171428>] do_filp_open+0x54/0x68
  [<10211fd8>] ccio_map_sg+0x15c/0x3ac
  [<10266d30>] scsi_dma_map+0x48/0x58
  [<10275980>] NCR_700_queuecommand+0x38c/0x504
  [<1026036c>] scsi_dispatch_cmd+0x118/0x288
  [<10266650>] scsi_request_fn+0x184/0x2e0
  [<101fb440>] __generic_unplug_device+0x38/0x44
  [<101fb7fc>] generic_unplug_device+0x14/0x24
  [<101f841c>] blk_backing_dev_unplug+0x1c/0x28
  [<10198b18>] sync_buffer+0x38/0x50
  [<1019ab20>] __bread+0x90/0xec

Kernel panic - not syncing: Attempted to kill init!
Rebooting in 10 seconds..

So we could only guess that the other BUG_ON(pdirp == NULL); do well its job and you have right I have no evidence that 
something is actually corrupted.

> 
> In general, I didn't like the "pre-decrement" but it seems to work and
> makes the code a bit more efficient. Efficiency is extremely important
> for this code since it gets called so often. Small changes can have
> easily measured impact.
> 
Understand, eventhought for my linux learning only I prefer a more robust code
>> That said I tried to re-use the first implementation of jejb (what was in 
>> ccio-dma.c before this patch 
>> <http://cvs.parisc-linux.org/linux-2.6/drivers/parisc/ccio-dma.c?r1=1.12&r2=1.13> 
>> but that doesn't seems to fix the ccio-dma issue at all: I can still read 
>> those kind of message at the console while doing such copy
>> [snip]
>> scsi1: (4:0) phase mismatch at 01e8, phase IO CD MSG BSY REQ MSG IN
>> scsi1: Bus Reset detected, executing command 10953600, slot 109708a4, dsp 
>> 001301e8[01e8] 
> 
> I'm thinking we really need SCSI bus traces to figure out if the SCSI driver
> is doing the right thing and if not, exactly what is it doing.
> 
Well, submitted some stress test (the loop of disk read/write with a tar -xf linux-2.6.11.tar) on the same disk but 
connected to a b180 (i.e. using same ncr53c710 driver for the same lasi hba but without ccio-dma driver) didn't showed any 
failures.
(for the ncr53c720 hba, i didn't have any other system to test it without ccio-dma ;<( )

> If it is a CCIO bug, my guess is it's more likely to be problems with
> setting magic bits.  We really need the ERS to review register settings.
> 
> ..
>> (the scsi1 is the lasi scsi hba as sources and the target being the disks 
>> on ncr53c720 hba)
>>
>> or experimenting fs issues on this target disks?
> 
> I doubt this is a file system problem.

No, it's a ext3 which I use on severall other hp model (b180, b2k so without ccio-dma) without any issues

> 
>> That said ok I will wait either U2/Uturn ers public doc or all volonteers 
>> feedback.
> 
> I'm skeptical for the former and hopeful for the latter.
> There is a chance Linux Foundation could ask HP for those docs under NDA.
> But you need to sign up with Linux Foundataion as a developer and
> then request HP for those docs.
> 
Well I am not actualy a developer, just an engineer (not in computer science) trying to help but may be is it enough to sign 
up with Linux Foundation?
If yes, can you send me a link to request papers?

tx again,
	r.

PS: This c110 boxe was more reliable (I mean I could easily update system frequently, build kernel, ...) when it was equiped 
with 512M RAM; it became not any more usable when I had to decrease this ram size to original one (i.e. 64M)?

> cheers,
> grant
> 
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-12-28 15:27 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-12-22 12:15 dma_addr_t: which comment is correct? rubisher
2007-12-23  9:39 ` Grant Grundler
2007-12-23 22:50   ` rubisher
2007-12-24  8:51     ` Grant Grundler
2007-12-26 10:01       ` Thibaut VARENE
2007-12-26 17:31       ` iommu_fill_pdir() and its /* Horrible hack. ... */ reading rubisher
2007-12-28  8:27         ` Grant Grundler
2007-12-28 15:27           ` rubisher

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox