Linux MIPS Architecture development
* VCEI/VCED handling
From: Thomas Bogendoerfer @ 1998-09-28 23:14 UTC
  To: linux

Yesterday I studied the MIPS user's manual to find out what has to be
done for the virtual cache coherency exceptions. Before I start writing
some code, I want to make sure that I got it right.

VCEI:
	Hit Set Virtual on BadVaddr

VCED: 
	Hit Invalidate BadVaddr
	TLB Lookup for BadVaddr
	Physical Address -> Index
	Index Load Tag
	Extract PIdx from TagLo
	Construct Vaddr from BadVaddr and PIdx
	Hit Write Back on created Vaddr
	Hit Set Virtual on BadVaddr
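The "Extract PIdx" and "Construct Vaddr" steps above are plain bit manipulation, so they can be sketched in C. The cache operations themselves (Hit Invalidate, Index Load Tag, Hit Write Back, Hit Set Virtual) have to be issued with the CACHE instruction from an assembly handler and are not modelled here. This is a minimal sketch assuming 4 KB pages, that PIdx holds virtual address bits 14:12, and that PIdx sits in TagLo bits 9:7 after a secondary-cache Index Load Tag; those field positions and the function names are assumptions to be checked against the R4000 user's manual.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K    12     /* assumption: 4 KB pages */
#define PIDX_MASK        0x7u   /* assumption: PIdx = VA[14:12], 3 bits */
#define TAGLO_PIDX_SHIFT 7      /* hypothetical: PIdx in TagLo[9:7] */

/* "Extract PIdx from TagLo" (TagLo as read after an S-cache Index Load Tag) */
unsigned taglo_to_pidx(uint32_t taglo)
{
	return (taglo >> TAGLO_PIDX_SHIFT) & PIDX_MASK;
}

/* "Construct Vaddr from BadVaddr and PIdx": keep every bit of BadVaddr
   except VA[14:12], which are replaced by the PIdx recorded in the
   secondary-cache tag, so the result's primary-cache index matches the
   one recorded in the tag. */
uint32_t pidx_vaddr(uint32_t badvaddr, unsigned pidx)
{
	const uint32_t field = PIDX_MASK << PAGE_SHIFT_4K;

	assert(pidx <= PIDX_MASK);
	return (badvaddr & ~field) | ((uint32_t)pidx << PAGE_SHIFT_4K);
}
```

For example, with BadVaddr 0x00403abc and a recorded PIdx of 5, pidx_vaddr() yields 0x00405abc.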

Comments ?

Thomas.

-- 
See, you not only have to be a good coder to create a system like Linux,
you have to be a sneaky bastard too ;-)
                   [Linus Torvalds in <4rikft$7g5@linux.cs.Helsinki.FI>]

* Re: VCEI/VCED handling
From: ralf @ 1998-09-28 23:50 UTC
  To: Thomas Bogendoerfer, linux

On Tue, Sep 29, 1998 at 01:14:14AM +0200, Thomas Bogendoerfer wrote:

> Yesterday I studied the MIPS user's manual to find out what has to be
> done for the virtual cache coherency exceptions. Before I start writing
> some code, I want to make sure that I got it right.
> 
> VCEI:
> 	Hit Set Virtual on BadVaddr
> 
> VCED: 
> 	Hit Invalidate BadVaddr
> 	TLB Lookup for BadVaddr
> 	Physical Address -> Index
> 	Index Load Tag
> 	Extract PIdx from TagLo
> 	Construct Vaddr from BadVaddr and PIdx
> 	Hit Write Back on created Vaddr
> 	Hit Set Virtual on BadVaddr
> 
> Comments ?

We've got code of which we're sure that it is correct.  Nevertheless
Linux is still fragile on SC machines.  I've been tracking this in
private emails with Ulf but so far only with limited success.  Aside from
the missing VCED / VCEI handlers there must be something else that is
broken.

  Ralf

* Re: VCEI/VCED handling
From: Thomas Bogendoerfer @ 1998-09-29 21:24 UTC
  To: ralf; +Cc: linux

On Tue, Sep 29, 1998 at 01:50:03AM +0200, ralf@uni-koblenz.de wrote:
> We've got code of which we're sure that it is correct.  Nevertheless
> Linux is still fragile on SC machines.  I've been tracking this in
> private emails with Ulf but so far only with limited success.  Aside from
> the missing VCED / VCEI handlers there must be something else that is
> broken.

Now that I understand the problem, I wrote the little test program below.
If I try it on an R4600PC Indy or an R4000PC Olivetti with Linux, I don't
get what I would expect. On IRIX, Linux/Alpha (I have to change the offset
between the two mappings to 0x2000 because of the bigger page size on Alphas)
and Linux/x86 the program works. IMHO this is a showstopper, as we don't handle
cache aliases correctly.

How does IRIX solve this problem ? Does it disable caching for shared 
writeable pages ?

Thomas.

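#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

unsigned char buf[1024];

int main (int argc, char *argv[])
{
	int fd;
	unsigned char *mem1,*mem2;

	if ((fd = open ("mmap_file",O_RDWR|O_CREAT,0644)) < 0) {
		perror ("open");
		exit (1);
	}
	/* give the file 1 KB of contents so the mappings have backing store */
	if (write (fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf)) {
		perror ("write");
		exit (1);
	}

	/* first mapping: let the kernel choose the address */
	if ((mem1 = mmap (NULL, 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0)) == MAP_FAILED) {
		perror ("mmap mem1");
		exit (2);
	}
	/* second mapping of the same file page, one 4 KB page higher; with 4 KB
	   pages and more than one cache color the two addresses differ in color
	   (use 0x2000 on Alpha) */
	if ((mem2 = mmap (mem1+0x1000, 1024, PROT_READ|PROT_WRITE,MAP_SHARED|MAP_FIXED, fd, 0)) == MAP_FAILED) {
		perror ("mmap mem2");
		exit (3);
	}
	printf ("mem1 %p, mem2 %p\n",(void *)mem1,(void *)mem2);

	/* on a cache-coherent system each line prints the same value twice */
	*mem1 = 0x55;
	printf ("*mem1 = %x, *mem2 = %x\n",*mem1,*mem2);

	*mem1 = 0xaa;
	printf ("*mem1 = %x, *mem2 = %x\n",*mem1,*mem2);

	*mem2 = 0x33;
	printf ("*mem2 = %x, *mem1 = %x\n",*mem2,*mem1);

	*mem2 = 0xcc;
	printf ("*mem2 = %x, *mem1 = %x\n",*mem2,*mem1);

	close (fd);
	return 0;
}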

* Re: VCEI/VCED handling
From: William J. Earl @ 1998-09-29 22:34 UTC
  To: Thomas Bogendoerfer; +Cc: ralf, linux

Thomas Bogendoerfer writes:
 > On Tue, Sep 29, 1998 at 01:50:03AM +0200, ralf@uni-koblenz.de wrote:
 > > We've got code of which we're sure that it is correct.  Nevertheless
 > > Linux is still fragile on SC machines.  I've been tracking this in
 > > private emails with Ulf but so far only with limited success.  Aside from
 > > the missing VCED / VCEI handlers there must be something else that is
 > > broken.
 > 
 > Now that I understand the problem, I wrote the little test program below.
 > If I try it on an R4600PC Indy or an R4000PC Olivetti with Linux, I don't
 > get what I would expect. On IRIX, Linux/Alpha (I have to change the offset
 > between the two mappings to 0x2000 because of the bigger page size on Alphas)
 > and Linux/x86 the program works. IMHO this is a showstopper, as we don't handle
 > cache aliases correctly.
 > 
 > How does IRIX solve this problem ? Does it disable caching for shared 
 > writeable pages ?

      No, IRIX does write ownership switching, using the TLB.  That
is, only one virtual cache color (virtual cache page index)
equivalence class of mappings can have the hardware PTE valid bit set
at any one time, if any class has the hardware PTE modify bit set in
any of its PTEs.  If no class has the modify bit set in any PTE, then
all classes may have the PTE valid bit set.  If you want to read
via class (color) 0, and class 1 is currently writing, all PTEs of
class 1 have the modify bit turned off, and the primary data cache
for class 1 is written back to memory (and is hence marked "clean"
instead of "dirty" in the cache).  Class 0 is then allowed to read
(the hardware valid bit is set in the faulting PTE).  If class 0 wants
to write (gets a modify fault), the valid bit is turned off for all
PTEs of other classes, and the data cache for those classes is invalidated
with respect to the page in question, and then the modify bit is turned
on for the faulting PTE.  
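The ownership switching described above can be modelled as a small state machine. The C sketch below is a hypothetical simulation, not IRIX code: each color_class stands for all PTEs of one color, owner is either the writing color or COLOR_SHARED, and the cache operations only count how often they would run. It assumes two colors, as on the R4000PC and R4600.

```c
#include <assert.h>
#include <stdbool.h>

#define NCOLORS      2      /* assumption: R4000PC / R4600, two colors */
#define COLOR_SHARED (-1)   /* no class writing; readers may all be valid */

struct color_class {        /* stands for all PTEs of one color */
	bool valid;         /* hardware PTE valid bit */
	bool modify;        /* hardware PTE modify bit */
};

struct page_state {
	int owner;                      /* writing color, or COLOR_SHARED */
	struct color_class cls[NCOLORS];
	int writebacks, invalidates;    /* D-cache operations performed */
};

/* Hypothetical stand-ins for hit-writeback / hit-invalidate of the page
   at one color; here they only count. */
static void dcache_writeback(struct page_state *p, int c) { (void)c; p->writebacks++; }
static void dcache_invalidate(struct page_state *p, int c) { (void)c; p->invalidates++; }

void page_init(struct page_state *p)
{
	p->owner = COLOR_SHARED;
	p->writebacks = p->invalidates = 0;
	for (int d = 0; d < NCOLORS; d++)
		p->cls[d].valid = p->cls[d].modify = false;
}

/* Read fault via color c: demote a writer of another color to read-only
   and write its dirty lines back, then let c read. */
void read_fault(struct page_state *p, int c)
{
	if (p->owner >= 0 && p->owner != c) {
		p->cls[p->owner].modify = false;
		dcache_writeback(p, p->owner);
		p->owner = COLOR_SHARED;
	}
	p->cls[c].valid = true;
}

/* Write (modify) fault via color c: invalidate every other color's PTEs
   and cache lines (writing back first if dirty), then let c write. */
void write_fault(struct page_state *p, int c)
{
	for (int d = 0; d < NCOLORS; d++) {
		if (d == c || !p->cls[d].valid)
			continue;
		if (p->cls[d].modify)
			dcache_writeback(p, d);
		dcache_invalidate(p, d);
		p->cls[d].valid = p->cls[d].modify = false;
	}
	p->cls[c].valid = p->cls[c].modify = true;
	p->owner = c;
}

/* Invariant: if any class has the modify bit, no other class is valid. */
bool page_state_ok(const struct page_state *p)
{
	for (int d = 0; d < NCOLORS; d++)
		if (p->cls[d].modify)
			for (int e = 0; e < NCOLORS; e++)
				if (e != d && p->cls[e].valid)
					return false;
	return true;
}
```

page_state_ok() checks the rule stated above: while one class holds the modify bit, no other class has the valid bit set.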

     Note that this problem applies to the R4000PC and to all R4600
and R5000 processors (PC and SC), because there is no hardware VCE
support in those processors.  Software must avoid allowing virtual
aliases of different colors to write concurrently, and must
writeback-invalidate the cache for the old color and invalidate the
PTEs for the old color, when allowing some other color to write.
Doing this efficiently requires some form of back pointer from the page
frame table entry for the page to all of the virtual references for
the page (the PTEs).  IRIX does not need to do this for anonymous
pages, since they cannot be double-mapped with different virtual
colors.  It uses the vnode pointer in the page frame table to reach the list
of mappings for the page; the analog in Linux is the inode field in
mem_map_t, which points to the mapped file, which in turn points to
the mappings via i_mmap and the vm_next_share/vm_pprev_share pointers.

    Note further that this problem is complicated by the MIPS K0SEG
addressing mode.  Since there are no PTEs for K0SEG, the kernel should
not use K0SEG addresses for pages which may be mapped into user space
with multiple colors.
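The K0SEG hazard is visible from the address arithmetic alone: a K0SEG address is the physical address plus 0x80000000, so its cache color is fixed by the physical page, while a user mapping's color depends on where it was mapped. A minimal C sketch, assuming 4 KB pages, a color mask such as pagevcolormask, and hypothetical function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KSEG0_BASE    0x80000000u   /* unmapped, cached kernel segment */
#define PAGE_SHIFT_4K 12            /* assumption: 4 KB pages */

/* K0SEG address of a physical address (only the low 512 MB is reachable). */
static uint32_t kseg0_addr(uint32_t paddr)
{
	assert(paddr < 0x20000000u);
	return KSEG0_BASE | paddr;
}

static unsigned vcolor(uint32_t vaddr, unsigned colormask)
{
	return (vaddr >> PAGE_SHIFT_4K) & colormask;
}

/* True if touching the page via K0SEG and via the user address would use
   two different virtual indexes in the primary D-cache. */
bool kseg0_aliases_user(uint32_t paddr, uint32_t uvaddr, unsigned colormask)
{
	return vcolor(kseg0_addr(paddr), colormask) != vcolor(uvaddr, colormask);
}
```

With two colors (mask 1), physical page 0x00123000 can safely be used through K0SEG while it is mapped at 0x00401000, but not while it is mapped at 0x00400000.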

    IRIX makes an effort to minimize conflicting mappings, by arranging
that the default address selected for mmap() is color-congruent to the
offset in the file being mapped.  This of course does not work for
MAP_FIXED, so MAP_FIXED requires the above write ownership switching.
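A minimal sketch of choosing a default mmap() address that is color-congruent to the file offset, assuming 4 KB pages and a power-of-two number of colors. The function name is hypothetical, and the sketch ignores whether the chosen range is actually free, which a real mmap() would have to check.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096u   /* assumption: 4 KB pages */

/* Lowest page-aligned address >= hint whose color equals the color of
   file_off, so the mapping and the file offset share a virtual index. */
uintptr_t congruent_mmap_addr(uintptr_t hint, uint64_t file_off, unsigned ncolors)
{
	assert(ncolors != 0 && (ncolors & (ncolors - 1)) == 0);

	uintptr_t span = (uintptr_t)ncolors * PAGE_SIZE_4K;  /* one full color cycle */
	uintptr_t want = (uintptr_t)(file_off % span) & ~(uintptr_t)(PAGE_SIZE_4K - 1);
	uintptr_t addr = (hint & ~(span - 1)) + want;        /* same cycle as hint */

	if (addr < hint)
		addr += span;                                /* move to next cycle */
	return addr;
}
```

For example, with four colors, a hint of 0x10003000 and file offset 0x5000 give 0x10005000, whose color (1) matches that of the offset.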

    The ownership can be recorded in a 5-bit field in mem_map_t:

	signed int	vcolor : 5;

Where we define:

#define	PAGE_VCOLOR_MIN 0
#define	PAGE_VCOLOR_MAX 7
#define PAGE_VCOLOR_NONE (-2)
#define PAGE_VCOLOR_SHARED (-1)

and

#define PAGE_IS_VCOLOR_EXCLUSIVE(mm) ((mm)->vcolor >= 0)
#define PAGE_IS_VCOLOR_SHARED(mm) ((mm)->vcolor == PAGE_VCOLOR_SHARED)

We also have

extern	int	pagevcolorsize;
extern	int	pagevcolormask;

#define vaddr_to_vcolor(va) ((((__psunsigned_t) (va)) / NBPP) & pagevcolormask)

We set pagevcolorsize to the size of one set of the cache divided by the
size of a page, and we set pagevcolormask to (pagevcolorsize - 1).
For the R4000PC and R4600, pagevcolorsize is 2; for the R5000, pagevcolorsize
is 4.  Note that the virtual color on the R4000SC and R4400SC has 8 values,
regardless of the primary cache size, for a page size of 4 KB, because
the secondary cache treats the 8 possible values of the PIdx field of the
secondary cache tag as distinct.
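The definitions above can be collected into a compilable C sketch. It assumes 4 KB pages in place of NBPP, uses unsigned long in place of __psunsigned_t, takes the color count as a parameter instead of the global pagevcolormask, and declares the bit-field as signed int so that the negative markers are guaranteed to fit. struct page_frame and vcolor_count() are hypothetical names.

```c
#include <assert.h>

#define PAGE_SIZE_4K       4096    /* assumption: stands in for NBPP */
#define PAGE_VCOLOR_MIN    0
#define PAGE_VCOLOR_MAX    7
#define PAGE_VCOLOR_NONE   (-2)
#define PAGE_VCOLOR_SHARED (-1)

/* Hypothetical stand-in for the proposed mem_map_t field. */
struct page_frame {
	signed int vcolor : 5;     /* range -16..15 covers -2..7 */
};

#define PAGE_IS_VCOLOR_EXCLUSIVE(pf) ((pf)->vcolor >= 0)
#define PAGE_IS_VCOLOR_SHARED(pf)    ((pf)->vcolor == PAGE_VCOLOR_SHARED)

/* Number of colors = size of one cache set (way) / page size. */
int vcolor_count(int cache_bytes, int ways)
{
	assert(ways > 0 && cache_bytes % ways == 0);
	return (cache_bytes / ways) / PAGE_SIZE_4K;
}

/* Function form of vaddr_to_vcolor(); ncolors must be a power of two. */
unsigned vaddr_to_vcolor(unsigned long va, int ncolors)
{
	return (unsigned)((va / PAGE_SIZE_4K) & (unsigned long)(ncolors - 1));
}
```

Assuming primary data caches of 8 KB direct-mapped (R4000PC), 16 KB two-way (R4600) and 32 KB two-way (R5000), vcolor_count() gives 2, 2 and 4, the values quoted above.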

* Re: VCEI/VCED handling
From: ralf @ 1998-09-29 23:58 UTC
  To: Thomas Bogendoerfer; +Cc: linux

On Tue, Sep 29, 1998 at 11:24:55PM +0200, Thomas Bogendoerfer wrote:

> On Tue, Sep 29, 1998 at 01:50:03AM +0200, ralf@uni-koblenz.de wrote:
> > We've got code of which we're sure that it is correct.  Nevertheless
> > Linux is still fragile on SC machines.  I've been tracking this in
> > private emails with Ulf but so far only with limited success.  Aside from
> > the missing VCED / VCEI handlers there must be something else that is
> > broken.
> 
> Now that I understand the problem, I wrote the little test program below.
> If I try it on an R4600PC Indy or an R4000PC Olivetti with Linux, I don't
> get what I would expect. On IRIX, Linux/Alpha (I have to change the offset
> between the two mappings to 0x2000 because of the bigger page size on Alphas)
> and Linux/x86 the program works. IMHO this is a showstopper, as we don't handle
> cache aliases correctly.

It's a known problem - and a nasty one.  Basically a good solution requires
a way to map a page's physical address to virtual addresses.  And Linux 2.2's
mm will not provide this feature.  Reverse mappings are being planned for
2.3 for other improvements.  It was my plan to ignore this bug for now and
deal with virtual coherency when reverse mappings are implemented in 2.3.

Oh, I've already implemented the solution for the special case of ZERO_PAGE.
CPUs which know about the virtual coherency exception have eight colours
for the zero page.  The change is basically to pass in the virtual address to
ZERO_PAGE such that we always do ``colourly correct'' allocation.  That was
the simple case.
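The coloured ZERO_PAGE amounts to indexing an array of zero pages by the colour of the user virtual address. The C sketch below shows only that selection and is not the actual Linux/MIPS code; the names are hypothetical, 4 KB pages are assumed, and arranging for entry i to sit at a KSEG0 address of colour i is assumed rather than shown.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K    12   /* assumption: 4 KB pages */
#define ZERO_PAGE_COLORS 8    /* eight colours, as described above */

/* Hypothetical: one zero-filled page per colour (static storage is zeroed). */
static unsigned char zero_pages[ZERO_PAGE_COLORS][1 << PAGE_SHIFT_4K];

/* Pick the zero page whose colour matches the faulting user address. */
const unsigned char *zero_page_for(uintptr_t uvaddr)
{
	unsigned colour = (uvaddr >> PAGE_SHIFT_4K) & (ZERO_PAGE_COLORS - 1);

	assert(colour < ZERO_PAGE_COLORS);
	return zero_pages[colour];
}
```

The cost is seven extra pages of memory, in exchange for never having to resolve a coherency conflict on the zero page.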

> How does IRIX solve this problem ? Does it disable caching for shared 
> writeable pages ?

Mapping shared writeable pages uncached is not the solution.  The virtual
coherency problem in Linux/MIPS may happen between multiple userspace
mappings, or between userspace and kernelspace (that is, KSEG0) mappings.
While we could disable caching for certain pages in the hope that we'll
only end up with a few uncached pages, making KSEG0 uncached is completely
unacceptable performance-wise.  However, if we don't, then we might end up
with aliases between userspace pages and KSEG0 pages.

Here are two postings from Wje, recovered from my never-ending mail archives:

> Date: Thu, 2 Apr 1998 15:15:03 -0800
> Message-Id: <199804022315.PAA01986@fir.engr.sgi.com>
> From: "William J. Earl" <wje@fir.engr.sgi.com>
> To: ralf@uni-koblenz.de
> Cc: "William J. Earl" <wje@fir.engr.sgi.com>, linux@cthulhu.engr.sgi.com
> Subject: Re: VCE exceptions
> 
> ralf@uni-koblenz.de writes:
>  > On Thu, Apr 02, 1998 at 01:41:02PM -0800, William J. Earl wrote:
>  > 
>  > >  > Another way to finally eliminate the virtual coherency problem from
>  > >  > KSEG0's landscape would be to actually use 8 pages as an array of
>  > >  > empty_zero_pages[], so we would be able to map one wherever we want
>  > >  > such that we never run into virtual coherency trouble.
>  > > 
>  > >       For an always-zero page, this is the best solution.  At a small
>  > > cost in memory, you get far less overhead.
>  > 
>  > Indeed, 16 ns on a 250 MHz machine for every exception that goes via the
>  > general exception vector _plus_ the actual vce / vci handling, that sucks.
>  > I just wonder why those exceptions have been implemented at all?
>  > 
>  > They may help somewhat in debugging operating systems, but in our situation
>  > they're nerve-racking by their mere existence.
> 
> In the R10000, the hardware does the VCE correction.  On the R4000PC, R4600,
> and R5000, we have to avoid the problem in software, since the hardware
> does not detect conflicts.   The motivation, and the reason that IRIX
> depends on VCEs on the R4000 and R4400, was to make it easier to port
> R3000 operating systems to the R4000.  If you don't have infrastructure
> to control virtual aliasing (where a single page is mapped read-write at
> two distinct virtual addresses with differing primary cache virtual indexes),
> you get wrong answers with VCE (whether handled in software or hardware).
> At MIPS, with the Magnum 4000PC under RISC/os, and at SGI, with the
> Indy R4000PC (and later R4600 and R5000), I modified RISC/os and IRIX to
> control virtual aliasing, but only for those platforms without hardware
> VCE detection (in order to minimize time to market).  
> 
>     Note that taking a K0SEG address for a physical page which is also mapped
> to user space can easily cause a VCE, since there is a good chance that
> the K0SEG virtual index differs from the user space virtual index, unless
> you match physical page color to virtual page color when allocating pages.
> Note that you have to do that for any pages which must be accessible in
> the general exception handler, since you cannot handle a VCE in the
> exception handler.
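Matching the physical colour of a page to the virtual colour it will be mapped at, as suggested above, can be sketched with per-colour free lists. This is a hypothetical C model assuming 4 KB pages and four colours; it relies on the K0SEG address of a page having the same colour as its physical address (the low PFN bits). A real allocator would presumably need a fallback when the wanted colour is exhausted, rather than failing as this sketch does.

```c
#include <assert.h>
#include <stddef.h>

#define NCOLORS  4     /* assumption: four colours */
#define MAX_FREE 64    /* capacity of each hypothetical free list */

struct color_freelists {
	unsigned long pfn[NCOLORS][MAX_FREE];
	size_t count[NCOLORS];
};

/* Return a frame to the list for its physical colour (low PFN bits). */
void free_page_frame(struct color_freelists *fl, unsigned long pfn)
{
	unsigned c = pfn & (NCOLORS - 1);

	assert(fl->count[c] < MAX_FREE);
	fl->pfn[c][fl->count[c]++] = pfn;
}

/* Take a frame whose physical colour equals the colour of uvaddr, so its
   K0SEG address and the user mapping share a virtual index.  Returns 1
   and stores the frame in *pfn, or 0 if no frame of that colour is free. */
int alloc_colored_frame(struct color_freelists *fl, unsigned long uvaddr,
			unsigned long *pfn)
{
	unsigned c = (uvaddr >> 12) & (NCOLORS - 1);

	if (fl->count[c] == 0)
		return 0;
	*pfn = fl->pfn[c][--fl->count[c]];
	return 1;
}
```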

> Date: Fri, 3 Apr 1998 11:11:15 -0800
> Message-Id: <199804031911.LAA21028@fir.engr.sgi.com>
> From: "William J. Earl" <wje@fir.engr.sgi.com>
> To: ralf@uni-koblenz.de
> Cc: linux@cthulhu.engr.sgi.com
> Subject: Re: VCE exceptions

[...]

>      Remember that there are two parts to the problem.  For the R4000SC and
> R4400SC, you can attack it by having the general exception handler avoid
> referencing any page which could get a VCE (probably by not using data in
> memory until after determining that the exception is a VCE exception) 
> and having the VCE exception handler, running in assembly code at SR_EXL
> level, fixup the cache (typically by doing hit-writeback-invalidate on
> the D cache, hit-invalidate on the I cache, and hit-set-virtual on
> the S cache, or else by doing hit-writeback-invalidate on the S-cache,
> which is simpler but slower, accounting for the multiple I and
> D cache entries per S cache entry).    For the other processors, the software
> has to avoid creating D cache aliases (mappings with different virtual
> indexes in the D cache) for writeable pages, to avoid data corruption
> via stale copies in the D cache.  Since there is no hardware detection
> of the aliases, the data corruption is silent.

[...]

>  > A small test program for the mmap/write problem is attached.  It may be
>  > necessary to start it several times in order to make it print the ``Big
>  > trouble, man ...'' message.
> 
>      As soon as I get a chance, I will look at the relevant linux
> code.  Note that physical color allocation can also make a big
> performance difference on direct mapped secondary caches, as on all of
> the Indy processors with secondary caches.  That is, you want to
> maximize the likelihood that the secondary cache indexes of the
> physical pages in a given application will be uniformly distributed
> across the secondary cache.  Excessive hot spots will lead to
> dramatically lower performance.  Allocation of a page where physical
> color matches intended virtual color matters only if you need
> to use a K0SEG address for the page to avoid TLB misses (as in 
> the general exception handler, unless the K2SEG address is wired).
> 
>      For the mmap/write problem, what I did in IRIX was to first try
> to assign mmap() virtual colors and buffer cache virtual colors
> (colors of the K2SEG address for the page, not necessarily physical
> color of the page, although having the physical color match means that
> a cheaper K0SEG reference can be used) congruent to the virtual color
> of the file offset for that page.  Then write() will see the same
> virtual color when accessing the page as will the user program when
> accessing the page using an address created using mmap().  When
> MAP_FIXED and MAP_SHARED are set, however, and the specified virtual
> color for the mapping is not congruent to the specified file offset,
> an extra mechanism is required, namely software ownership switching of
> the "current" virtual color.  For the page frame, we remember the
> current virtual color, and arrange that the pg_vr bit is set only for
> mappings which match that virtual color.  If we get a fault on a
> mapping of a different color, we writeback-invalidate the primary
> caches for the "current" color, invalidate the "current" mappings (by
> turning off pg_vr), record the new "current" color, and then validate
> the new "current" mappings (by turning on pg_vr).  In IRIX 6.3 and
> later versions, I also allow the possibility of a "shared read
> multiple color" state, where all mappings were allowed to be valid,
> but with pg_m off.  That is, the "current" color became a
> multiple-reader/single-writer lock on access to the page (where the
> "single-writer" was a color equivalence class, not a single mapping).
> In this case, the transition from "multiple-reader" mode to
> "single-writer" mode requires invalidating all colors of the primary
> cache for the given page.  Note that for MAP_FIXED with MAP_PRIVATE,
> we can simply copy the page, even when it has not yet been modified,
> if the mapped virtual color is not congruent to the file offset
> virtual color.
> 
>     In IRIX, we handle the instruction cache specially, and do not
> attempt to keep it coherent on the processors without hardware VCE
> detection, so the above description is a little more restrictive than
> what actually happens.  This approach is based on updates to instruction
> pages being relatively rare, compared to updates to other pages,
> so we wind up doing fewer I cache invalidates overall.
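The per-mapping choice described above (map directly when congruent, copy for MAP_PRIVATE, switch ownership for MAP_SHARED) reduces to comparing two colours. A hypothetical C helper, assuming 4 KB pages and a colour mask such as pagevcolormask:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12   /* assumption: 4 KB pages */

enum color_policy {
	POLICY_DIRECT,        /* colours congruent: map the page directly */
	POLICY_COPY,          /* private, non-congruent: give it a copy */
	POLICY_SWITCH_OWNER   /* shared, non-congruent: ownership switching */
};

enum color_policy choose_policy(uintptr_t vaddr, uint64_t file_off,
				bool shared, unsigned colormask)
{
	assert(((colormask + 1) & colormask) == 0);   /* mask = 2^n - 1 */

	unsigned vcol = (vaddr >> PAGE_SHIFT_4K) & colormask;
	unsigned fcol = (unsigned)(file_off >> PAGE_SHIFT_4K) & colormask;

	if (vcol == fcol)
		return POLICY_DIRECT;
	return shared ? POLICY_SWITCH_OWNER : POLICY_COPY;
}
```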

  Ralf

* Re: VCEI/VCED handling
From: William J. Earl @ 1998-09-30  1:21 UTC
  To: ralf; +Cc: Thomas Bogendoerfer, linux

ralf@uni-koblenz.de writes:
...
 > It's a known problem - and a nasty one.  Basically a good solution requires
 > a way to map a page's physical address to virtual addresses.  And Linux 2.2's
 > mm will not provide this feature.  Reverse mappings are being planned for
 > 2.3 for other improvements.  It was my plan to ignore this bug for now and
 > deal with virtual coherency when reverse mappings are implemented in 2.3.
 >
 > Oh, I've already implemented the solution for the special case of ZERO_PAGE.
 > CPUs which know about the virtual coherency exception have eight colours
 > for the zero page.  The change is basically to pass in the virtual address to
 > ZERO_PAGE such that we always do ``colourly correct'' allocation.  That was
 > the simple case.

       On the machines without VCE detection (R4000PC, R4600, R5000),
the zero page is safe, because it is read-only.  Anonymous pages
are not an issue, since they are not double-mapped.

       What is wrong with going from the mem_map_t.inode to the
inode.i_mmap list of mappings, and thence to the PTEs?  IRIX, at least
before IRIX 6.5, does the equivalent to solve this problem.

 > > How does IRIX solve this problem ? Does it disable caching for shared 
 > > writeable pages ?
 > 
 > Mapping shared writeable pages uncached is not the solution.  The virtual
 > coherency problem in Linux/MIPS may happen between multiple userspace
 > mappings, or between userspace and kernelspace (that is, KSEG0) mappings.
 > While we could disable caching for certain pages in the hope that we'll
 > only end up with a few uncached pages, making KSEG0 uncached is completely
 > unacceptable performance-wise.  However, if we don't, then we might end up
 > with aliases between userspace pages and KSEG0 pages.

      You can use KSEG2 instead of KSEG0 for all pages which might be
mapped into user space.  IRIX mostly does that, and keeps the KSEG2 mapping
around only as long as necessary, and then only with the current virtual
color (the color which currently has write ownership of the page) locked
(which means that references via other colors block until the kernel
gives up its mapping).  

* Re: VCEI/VCED handling
From: ralf @ 1998-10-03 16:00 UTC
  To: William J. Earl; +Cc: Thomas Bogendoerfer, linux

On Tue, Sep 29, 1998 at 06:21:03PM -0700, William J. Earl wrote:

>        On the machines without VCE detection (R4000PC, R4600, R5000),
> the zero page is safe, because it is read-only.  Anonymous pages
> are not an issue, since they are not double-mapped.
> 
>        What is wrong with going from the mem_map_t.inode to the
> inode.i_mmap list of mappings, and thence to the PTEs?  IRIX, at least
> before IRIX 6.5, does the equivalent to solve this problem.

Nothing - anymore.  On older kernels your suggestion wouldn't have worked
for SYSV IPC, so I forgot about that possibility ...

>  > Mapping shared writeable pages uncached is not the solution.  The virtual
>  > coherency problem in Linux/MIPS may happen between multiple userspace
>  > mappings, or between userspace and kernelspace (that is, KSEG0) mappings.
>  > While we could disable caching for certain pages in the hope that we'll
>  > only end up with a few uncached pages, making KSEG0 uncached is completely
>  > unacceptable performance-wise.  However, if we don't, then we might end up
>  > with aliases between userspace pages and KSEG0 pages.
> 
>       You can use KSEG2 instead of KSEG0 for all pages which might be
> mapped into user space.  IRIX mostly does that, and keeps the KSEG2 mapping
> around only as long as necessary, and then only with the current virtual
> color (the color which currently has write ownership of the page) locked
> (which means that references via other colors block until the kernel
> gives up its mapping).  

It is my understanding that read/write syscalls used on mmapped files are the
only instance of that problem we need to deal with.  I think we can do so by
modifying update_vm_cache to fit the special needs of certain cache
architectures and introducing something similar to be used when reading.

The KSEG2 approach is definitely much nicer for that than what we're using
right now for ptrace reading / writing to other processes' address space.

Btw, it looks like the MIPS NT HAL does the same thing.  At least the interfaces
provided by HAL.DLL strongly suggest so.

Oh, and somebody just promised to snail-mail me an R4000SC module, so I hope I can
tackle the SC problems real soon now and make some more people happy.

  Ralf
