Linux PARISC architecture development
* [parisc-linux] syncdma question (back to ccio drivers)
From: Joel Soete @ 2006-11-02 22:11 UTC (permalink / raw)
  To: parisc-linux, grundler

Hello Grant,

In one of my tests, I also activated CCIO_MAP_STATS and noticed that before the 53c700 problem occurred the ccio driver used 
very few entries: maybe 30 at most of the several hundred available?

This made me suspect a coherency problem and reminded me of another of your comments in sba:
         /* XXX REVISIT for 2.5 Linux - need syncdma for zero-copy support.
         ** For Astro based systems this isn't a big deal WRT performance.
         ** As long as 2.4 kernels copyin/copyout data from/to userspace,
         ** we don't need the syncdma. The issue here is I/O MMU cachelines
         ** are *not* coherent in all cases.  May be hwrev dependent.
         ** Need to investigate more.
         asm volatile("syncdma");
         */

Reading back pa11_acd text:
• Cache Coherent I/O
Two instructions (LOAD COHERENCE INDEX and SYNCHRONIZE DMA) have been added to enable cache coherent memory references by 
I/O modules. Previously, responsibility for cache coherence between the processor and I/O modules lay with software, which 
had to use a sequence of flush and purge operations to ensure coherence. While software cache coherence for I/O is still 
attractive in uniprocessor systems because of the lower hardware cost, hardware cache coherence for I/O has a relatively low 
incremental cost in multiprocessor systems.

• Uncacheable Memory
An optional U (Uncacheable) bit has been added to each data TLB entry which controls cache move-in for the corresponding 
page. When the U-bit is set, new lines must not be moved in to the data cache, although existing lines may remain resident 
in the cache. This forces all references to non-resident lines to cause transactions to memory and enables better support of 
industry standard I/O busses where byte and word transactions to memory are sometimes required to communicate with I/O devices.

Unfortunately later:

If implemented, the U (Uncacheable) bit is found in the data TLB entry associated with a page. Whether or not the U-bit is 
implemented, the state of this bit if implemented, whether the memory reference is virtual or absolute, and whether the 
reference is made from a page in the memory or I/O address spaces determine if the reference may be moved into the data 
cache. The detailed rules for moving references into the data cache are specified in "Data Cache Move-In" on page 3-21.

Software must set the U-bit associated with all pages in the I/O address space to 1. Referencing a page in the I/O address 
space for which the U-bit is 0 is an undefined operation.

Changing the state of the U-bit for a page has no effect on the data cache lines from that page which already exist in the 
cache. A page from the memory address space which has its U-bit set to 0 is called a cacheable page. Pages from the I/O 
address space and pages which have their U-bit set to 1 are called uncacheable pages. It is possible for data cache lines 
from an uncacheable page to exist in a data cache. This case may be caused by changing a cacheable page to uncacheable after 
references to this page were moved into the data cache.

So my first question is:
How/where could I find out whether the U-bit is implemented on my systems?

p-l pacache.S relies on its implementation (while hpux executes syncdma conditionally on a global var: dunno which one?)

TIA,
	Joel

PS: according to James' paper <http://www.linuxjournal.com/article/7104>, the mmu virtualizes physical memory addresses 
for the cpu and, otoh, the iommu virtualizes these same physical memory addresses for the io bus; so given a virtual page 
address for the cpu, is it impossible for lpa to tell me whether this page maps to a physical address in IO address space (I 
mean above 0xF0000000 for 32 bit kernels and above 0xF1000000 00000000 for 64bit kernel)?

PS2: is there any way to grab an [id]tlb entry for a given virtual address (maybe an undocumented feature like the "bit 
grabber" ;-) ?)
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux


* [parisc-linux] Re: syncdma question (back to ccio drivers)
From: Grant Grundler @ 2006-11-04 22:39 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

On Thu, Nov 02, 2006 at 10:11:02PM +0000, Joel Soete wrote:
> Hello Grant,
> 
> In one of my tests, I also activated CCIO_MAP_STATS and noticed that before 
> the 53c700 problem occurred the ccio driver used very few entries: maybe 
> 30 at most of the several hundred available?

ok. that's not too surprising given drivers are only supposed to map
memory for DMA just before sending the DMA request to HW.
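
A rough sketch of how an occupancy figure like the one Joel quotes could be
computed, assuming the IOMMU driver tracks its I/O pdir entries in a
bitmap with one bit per entry. The name res_map_used and the byte-array
layout are hypothetical, not taken from the ccio source.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: count the entries currently in use in a resource
 * bitmap where each set bit marks one mapped I/O pdir entry.
 */
static size_t res_map_used(const uint8_t *res_map, size_t nbytes)
{
    size_t used = 0;

    for (size_t i = 0; i < nbytes; i++) {
        uint8_t b = res_map[i];

        /* Clear the lowest set bit until none remain. */
        while (b) {
            b &= (uint8_t)(b - 1);
            used++;
        }
    }
    return used;
}
```

If mappings are created just before a request is issued and torn down when
it completes, this count only reflects the requests in flight at that
moment, which would be consistent with a low number.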

> This made me suspect a coherency problem and reminded me of another of 
> your comments in sba:
>         /* XXX REVISIT for 2.5 Linux - need syncdma for zero-copy support.
>         ** For Astro based systems this isn't a big deal WRT performance.
>         ** As long as 2.4 kernels copyin/copyout data from/to userspace,
>         ** we don't need the syncdma. The issue here is I/O MMU cachelines
>         ** are *not* coherent in all cases.  May be hwrev dependent.
>         ** Need to investigate more.
>         asm volatile("syncdma");
>         */

What makes you think this is a problem with IOMMU coherency?

> Reading back pa11_acd text:
...
> So my first question is:
> How/where could I find out whether the U-bit is implemented on my systems?

I'm certain all PA 2.0 systems support U-bit.
I believe all PA 1.1 systems do too.
arch/parisc/kernel/pci-dma.c depends on it I think.

PDC might also tell us but I haven't looked at the spec recently.

> p-l pacache.S relies on its implementation (while hpux executes syncdma 
> conditionally on a global var: dunno which one?)
> 
> TIA,
> 	Joel
> 
> PS: according to James' paper 
> <http://www.linuxjournal.com/article/7104>, the mmu virtualizes physical 
> memory addresses for the cpu and, otoh, the iommu virtualizes these same 
> physical memory addresses for the io bus; so given a virtual page address 
> for the cpu, is it impossible for lpa to tell me whether this page maps to 
> a physical address in IO address space (I mean above 0xF0000000 for 32 bit 
> kernels and above 0xF1000000 00000000 for 64bit kernel)?

lpa by itself won't work.
You also need to know the memory map of the system.
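
As a minimal sketch of the second half of that check: once a physical
address is in hand (however it was obtained), it can be compared against
the start of I/O space. The two base values are the ones given in the
question above and are taken on trust; the function name, the use of >=
at the boundary, and the "wide" flag are assumptions for illustration.
A real system's memory map may carve things up differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bases as quoted in the question; not verified against any spec. */
#define IO_BASE_NARROW UINT64_C(0x00000000F0000000) /* 32 bit kernels */
#define IO_BASE_WIDE   UINT64_C(0xF100000000000000) /* 64 bit kernels */

/*
 * Hypothetical check: does this physical address fall in I/O space?
 * "wide" selects the 64 bit layout.
 */
static bool phys_is_io_space(uint64_t phys, bool wide)
{
    return phys >= (wide ? IO_BASE_WIDE : IO_BASE_NARROW);
}
```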

> PS2: is there any way to grab an [id]tlb entry for a given virtual address 
> (maybe an undocumented feature like the "bit grabber" ;-) ?)

I don't know. Probably, but it requires knowledge of how addresses
map to either I or D cache.

grant

* [parisc-linux] Re: syncdma question (back to ccio drivers)
From: Joel Soete @ 2006-11-05 11:50 UTC (permalink / raw)
  To: Grant Grundler; +Cc: parisc-linux

Hello Grant,

Grant Grundler wrote:
 > On Thu, Nov 02, 2006 at 10:11:02PM +0000, Joel Soete wrote:
 >> Hello Grant,
 >>
 >> In one of my tests, I also activated CCIO_MAP_STATS and noticed that before the 53c700 problem occurred the ccio driver
 >> used very few entries: maybe 30 at most of the several hundred available?
 >
 > ok. that's not too surprising given drivers are only supposed to map
 > memory for DMA just before sending the DMA request to HW.
 >
 >> This made me suspect a coherency problem and reminded me of another of your comments in sba:
 >>         /* XXX REVISIT for 2.5 Linux - need syncdma for zero-copy support.
 >>         ** For Astro based systems this isn't a big deal WRT performance.
 >>         ** As long as 2.4 kernels copyin/copyout data from/to userspace,
 >>         ** we don't need the syncdma. The issue here is I/O MMU cachelines
 >>         ** are *not* coherent in all cases.  May be hwrev dependent.
 >>         ** Need to investigate more.
 >>         asm volatile("syncdma");
 >>         */
 >
 > What makes you think this is a problem with IOMMU coherency?
 >
Remember: the 53c700 driver on the b180 (which doesn't use any ccio) works fine but the same 53c700 driver on the c110/d380 failed sadly:
<http://lists.parisc-linux.org/pipermail/parisc-linux/2006-September/030202.html>

and according to James' comment:
<http://lists.parisc-linux.org/pipermail/parisc-linux/2006-September/030204.html>

this should be a problem in sg list management; not in 53c700 (because it works fine without ccio) but rather in ccio, to 
which this 53c700 driver has to address its io requests, right?

As a first step, I managed to backport all your sba work since the time those two drivers looked like siblings, from around:
<http://cvs.parisc-linux.org/linux-2.4/arch/parisc/kernel/sba_iommu.c?rev=1.26&view=markup>

This seems to help the ncr53c720 driver (not absolutely sure: I ran the untar/rm loop for only 1h, whereas it failed after a 
few minutes before; that is not yet enough for me, and moreover it seems to break dino on the d380's additional nic). But 
while it didn't seem to degrade the 53c700 driver, it didn't improve it at all either.

As a second step, I suspected something specific to ccio, and especially what exists in sba but doesn't seem to exist in ccio:
     /* flush purges */
     READ_REG32(ioc->ioc_hpa+IOC_PCOM);

but without documentation (not yet publicly available) I couldn't go further in this investigation.

So let's assume that's ok.

Anyway, something else could point to a synchronization problem: the driver performance, which can be improved a bit by 
disabling CCIO_SEARCH_TIME, as this comment says:
/*
  * CCIO_SEARCH_TIME can help measure how fast the bitmap search is.
  * impacts performance though - ditch it if you don't use it.
  */

If that make top stat completely false, that mainly made 53c700 behaviour even worse: (with the same driver code and same up 
config) not even one untar/rm ran to completion (iirc it didn't even finish the untar) while it could complete at least 
2 or 3 loops before.

This latest test made me think it could also be a synchronization problem somewhere between ccio and 53c700, and maybe a 
cache problem?

 >> Reading back pa11_acd text:
 > ...
 >> So my first question is:
 >> How/where could I find out whether the U-bit is implemented on my systems?
 >
 > I'm certain all PA 2.0 systems support U-bit.
 > I believe all PA 1.1 systems do too.
 > arch/parisc/kernel/pci-dma.c depends on it I think.
 >
 > PDC might also tell us but I haven't looked at the spec recently.
 >
ok, I will try to find out more.

 >> p-l pacache.S relies on its implementation (while hpux executes syncdma conditionally on a global var: dunno which one?)
 >>
 >> TIA,
 >>     Joel
 >>
 >> PS: according to James' paper <http://www.linuxjournal.com/article/7104>, the mmu virtualizes physical memory
 >> addresses for the cpu and, otoh, the iommu virtualizes these same physical memory addresses for the io bus; so given a
 >> virtual page address for the cpu, is it impossible for lpa to tell me whether this page maps to a physical address in
 >> IO address space (I mean above 0xF0000000 for 32 bit kernels and above 0xF1000000 00000000 for 64bit kernel)?
 >
 > lpa by itself won't work.
 > You also need to know the memory map of the system.
 >
 >> PS2: is there any way to grab an [id]tlb entry for a given virtual address (maybe an undocumented feature like the
 >> "bit grabber" ;-) ?)
 >
 > I don't know. Probably, but it requires knowledge of how addresses
 > map to either I or D cache.
 >
 > grant
 >
 >
Thanks,
     Joel


* [parisc-linux] Re: syncdma question (back to ccio drivers)
From: Grant Grundler @ 2006-11-06  7:11 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

On Sun, Nov 05, 2006 at 11:50:45AM +0000, Joel Soete wrote:
..
> > What makes you think this is a problem with IOMMU coherency?
> >
> Remember: the 53c700 driver on the b180 (which doesn't use any ccio) works 
> fine but the same 53c700 driver on the c110/d380 failed sadly:
> <http://lists.parisc-linux.org/pipermail/parisc-linux/2006-September/030202.html>
> 
> and according to James' comment:
> <http://lists.parisc-linux.org/pipermail/parisc-linux/2006-September/030204.html>
> 
> this should be a problem in sg list management; not in 53c700 (because it 
> works fine without ccio) but rather in ccio, to which this 53c700 driver 
> has to address its io requests, right?

Ah ok. It doesn't have to be the SG list handling.
So what is wrong? I have no clue.


This might also be a write-posting problem with MMIO register writes.
The CCIO chip might be introducing enough delay to expose the problem.
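
For readers unfamiliar with write posting: a write to a device register may
sit in a bridge's buffer for a while, and a common way to force it out is
to read back from the same device. The sketch below shows only that idiom
on a plain volatile pointer; mmio_write_flush is a hypothetical name, and
real kernel code would go through the proper MMIO accessors rather than a
raw dereference.

```c
#include <stdint.h>

/*
 * Hypothetical helper: write a register, then read it back so that any
 * posted write ahead of the read has to complete before we continue.
 */
static inline void mmio_write_flush(volatile uint32_t *reg, uint32_t val)
{
    *reg = val;   /* this write may be posted (buffered) on the way */
    (void)*reg;   /* the read cannot complete until the write has landed */
}
```

The "flush purges" read of IOC_PCOM in sba quoted elsewhere in this thread
looks like the same kind of read-back.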

My second guess is a coherency problem with "consistent" data.
Ie control data that is allocated with pci_alloc_consistent().
I find this unlikely but it's possible.

> As a first step, I managed to backport all your sba work since the time 
> those two drivers looked like siblings, from around:
> <http://cvs.parisc-linux.org/linux-2.4/arch/parisc/kernel/sba_iommu.c?rev=1.26&view=markup>

Those two drivers do have a lot in common. But the TLB replacement algorithms
are NOT the same. The IO Pdir has different coherency rules as well.
Unfortunately, I don't remember all the details.

> This seems to help the ncr53c720 driver (not absolutely sure: I ran the 
> untar/rm loop for only 1h, whereas it failed after a few minutes before; 
> that is not yet enough for me, and moreover it seems to break dino on the 
> d380's additional nic). But while it didn't seem to degrade the 53c700 
> driver, it didn't improve it at all either.
> 
> As a second step, I suspected something specific to ccio, and especially 
> what exists in sba but doesn't seem to exist in ccio:
>     /* flush purges */
>     READ_REG32(ioc->ioc_hpa+IOC_PCOM);
> 
> but without documentation (not yet publicly available) I couldn't go 
> further in this investigation.

Yes, that's another difference. IIRC, SBA can flush a _range_ of TLB entries
and CCIO (Uturn/U2) can not.
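
To give a feel for what that difference costs, here is a small sketch that
counts how many per-entry purges a mapping would need if the chip can only
purge one I/O page at a time. The 4 KB I/O page size and the name
purges_needed are assumptions for illustration; the real purge granularity
of Uturn/U2 is not stated in this thread.

```c
#include <stddef.h>
#include <stdint.h>

#define IOVP_SHIFT 12  /* assumed 4 KB I/O page size */

/*
 * Hypothetical helper: number of I/O pages touched by [iova, iova + len),
 * i.e. the number of single-entry purges needed without a range purge.
 */
static size_t purges_needed(uint32_t iova, size_t len)
{
    if (len == 0)
        return 0;

    uint64_t first = (uint64_t)iova >> IOVP_SHIFT;
    uint64_t last = ((uint64_t)iova + len - 1) >> IOVP_SHIFT;

    return (size_t)(last - first + 1);
}
```

A buffer that merely straddles a page boundary already needs two purges,
whereas a range-capable IOMMU could cover it with one command.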

> 
> So let's assume that's ok.
> 
> Anyway, something else could point to a synchronization problem: the driver 
> performance, which can be improved a bit by disabling CCIO_SEARCH_TIME, as 
> this comment says:
> /*
>  * CCIO_SEARCH_TIME can help measure how fast the bitmap search is.
>  * impacts performance though - ditch it if you don't use it.
>  */
> 
> If that make top stat completely false

Sorry -ENOPARSE.

> that mainly made 53c700 behaviour 
> even worse: (with the same driver code and same up config) not even one 
> untar/rm ran to completion (iirc it didn't even finish the untar) 
> while it could complete at least 2 or 3 loops before.

Sounds like there is a race condition between asking for a mapping
and its use. Enabling CCIO_SEARCH_TIME will just make that longer.
Maybe experiment with adding udelay(10) or udelay(100) in the
same code path to see what happens.
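
A userspace sketch of that experiment, under loud assumptions: udelay_stub
stands in for the kernel's udelay() and only records the requested time,
map_with_delay is a hypothetical placeholder for the real mapping routine,
and the identity "mapping" replaces the actual I/O pdir setup. Only the
placement of the delay, between obtaining the mapping and returning it to
the caller, is the point.

```c
#include <stdint.h>

/* Hypothetical debug knob: 0 disables the delay; try 10, then 100. */
static unsigned int map_debug_delay_us = 10;

/* Stand-in for the kernel's udelay(); here it only records the request. */
static unsigned long delayed_us_total;

static void udelay_stub(unsigned int usec)
{
    delayed_us_total += usec;
}

/* Hypothetical mapping routine with an adjustable stall before return. */
static uint32_t map_with_delay(uint32_t phys)
{
    uint32_t iova = phys & ~UINT32_C(0xfff); /* placeholder for pdir setup */

    if (map_debug_delay_us)
        udelay_stub(map_debug_delay_us);     /* widen the map-to-use gap */

    return iova | (phys & UINT32_C(0xfff));
}
```

If failures become rarer as the delay grows, that would point toward a
race between the mapping being set up and the device using it.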

> This latest test made me think it could also be a synchronization problem 
> somewhere between ccio and 53c700, and maybe a cache problem?
 
Maybe.

hth,
grant