* [parisc-linux] depi?
@ 1999-11-15 8:08 Alex deVries
1999-11-15 7:24 ` Jeffrey A Law
` (2 more replies)
0 siblings, 3 replies; 45+ messages in thread
From: Alex deVries @ 1999-11-15 8:08 UTC (permalink / raw)
To: parisc-linux
What does this actually do:
; Get ready for phys->virt transition
; First order of business is to adjust some pointers
depi 3,1,2,%arg0 ; phys->virt(free mem ptr)
depi 3,1,2, %sp ; phys->virt SP
depi 3,1,2, %dp ; p2v DP
in head.S?
I don't have a 'depi' in the index of my PA 2.0 assembler book. I have a
depwi and a depdi though.
- Alex "I shot the sheriff, but I did not shoot the depdi" deVries
--
Alex deVries <adevries@thepuffingroup.com>
Vice President Engineering
The Puffin Group
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi?
1999-11-15 8:08 [parisc-linux] depi? Alex deVries
@ 1999-11-15 7:24 ` Jeffrey A Law
1999-11-15 7:36 ` Stan Sieler
1999-11-15 8:19 ` Philipp Rumpf
2 siblings, 0 replies; 45+ messages in thread
From: Jeffrey A Law @ 1999-11-15 7:24 UTC (permalink / raw)
To: Alex deVries; +Cc: parisc-linux
In message <Pine.LNX.4.10.9911150307300.7996-100000@vodka.thepuffingroup.com>
you write:
> What does this actually do:
>
> ; Get ready for phys->virt transition
> ; First order of business is to adjust some pointers
> depi 3,1,2,%arg0 ; phys->virt(free mem ptr)
> depi 3,1,2, %sp ; phys->virt SP
> depi 3,1,2, %dp ; p2v DP
>
> in head.S?
>
> I don't have a 'depi' in the index of my PA 2.0 assembler book. I have a
> depwi and a depdi though.
HP changed a large amount of their assembly syntax for PA2.0.
depi == depwi
Basically they made the extract/deposit instructions explicitly mention
their size [word vs double word].
jeff
* Re: [parisc-linux] depi?
1999-11-15 8:08 [parisc-linux] depi? Alex deVries
1999-11-15 7:24 ` Jeffrey A Law
@ 1999-11-15 7:36 ` Stan Sieler
1999-11-15 8:25 ` Philipp Rumpf
1999-11-15 8:19 ` Philipp Rumpf
2 siblings, 1 reply; 45+ messages in thread
From: Stan Sieler @ 1999-11-15 7:36 UTC (permalink / raw)
To: Alex deVries; +Cc: parisc-linux
Re:
> ; Get ready for phys->virt transition
> ; First order of business is to adjust some pointers
> depi 3,1,2,%arg0 ; phys->virt(free mem ptr)
> depi 3,1,2, %sp ; phys->virt SP
> depi 3,1,2, %dp ; p2v DP
DEPI is "Deposit Immediate". depi 3,1,2, %arg0
drops the value 3 into the upper 2 bits of register arg0.
IIRC, it's: DEPI immediate_value, right_most_bit#, #bits, target_register
But...strange code. It's setting the upper 2 bits of R26, R30, and R27.
> in head.S?
>
> I don't have a 'depi' in the index of my PA 2.0 assembler book. I have a
> depwi and a depdi though.
--
Stan Sieler sieler@allegro.com
www.allegro.com/sieler/wanted/index.html www.allegro.com/sieler
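The operand order described above can be sketched in a few lines of Python. This is only an illustrative model, not the architected definition: the function name depi and the constant MASK32 are made up for the example, bits are assumed to be numbered PA-RISC style (bit 0 is the most significant bit of a 32-bit register), and the immediate is assumed to be a small non-negative value that fits in the field.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit general register


def depi(imm, pos, length, reg):
    """Hypothetical model of a 32-bit deposit-immediate.

    Assumption: bit 0 is the most significant bit and bit 31 the least
    significant; `pos` names the rightmost bit of the target field and
    `length` its width in bits.
    """
    shift = 31 - pos                       # distance of the field from the LSB
    field = ((1 << length) - 1) << shift   # mask covering the target field
    # Clear the field in the register, then drop the immediate into it.
    return ((reg & ~field) | ((imm << shift) & field)) & MASK32


# depi 3,1,2,%arg0 with a sample register value: the top two bits become 11.
print(hex(depi(3, 1, 2, 0x00123456)))
```

Under these assumptions, depositing 3 into the two-bit field ending at bit 1 is the same as OR-ing the register with 0xc000 0000, which matches the "upper 2 bits" description.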
* Re: [parisc-linux] depi?
1999-11-15 7:36 ` Stan Sieler
@ 1999-11-15 8:25 ` Philipp Rumpf
1999-11-15 23:14 ` Frank Rowand
0 siblings, 1 reply; 45+ messages in thread
From: Philipp Rumpf @ 1999-11-15 8:25 UTC (permalink / raw)
To: Stan Sieler; +Cc: Alex deVries, parisc-linux
> > ; First order of business is to adjust some pointers
> > depi 3,1,2,%arg0 ; phys->virt(free mem ptr)
> > depi 3,1,2, %sp ; phys->virt SP
> > depi 3,1,2, %dp ; p2v DP
>
> DEPI is "Deposit Immediate". depi 3,1,2, %arg0
> drops the value 3 into the upper 2 bits of register arg0.
>
> IIRC, it's: DEPI immediate_value, right_most_bit#, #bits, target_register
>
> But...strange code. It's setting the upper 2 bits of R26, R30, and R27.
The way physical memory is mapped to kernel virtual memory is (with
exceptions):
physical address P is mapped at virtual address P + PAGE_OFFSET.
PAGE_OFFSET currently is 0xc000 0000, which was a bad value and will be
changed to either 0x8000 0000 or 0xe000 0000 in the near future. This is one
of the reasons you should use tophys and tovirt instead of doing the depi by
hand.
Probably most of the depis are my code, and I think there are several places
where I am still hard-coding 0xc000 0000 as PAGE_OFFSET explicitly. If you
find a place you think depends on a certain PAGE_OFFSET after I have cleaned
up the code I can find, please tell us.
Philipp Rumpf
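A small Python sketch may help show why a hand-written depi is fragile compared with going through a conversion helper. The names to_virt, to_phys and depi_3_1_2 are hypothetical stand-ins invented for this example (the real tophys/tovirt definitions are not shown in this thread), and the sketch assumes 32-bit addresses and physical addresses below 1 GB.

```python
PAGE_OFFSET = 0xC0000000  # value quoted in the message; expected to change


def to_virt(phys, page_offset=PAGE_OFFSET):
    """Hypothetical stand-in for tovirt: P -> P + PAGE_OFFSET (mod 2^32)."""
    return (phys + page_offset) & 0xFFFFFFFF


def to_phys(virt, page_offset=PAGE_OFFSET):
    """Hypothetical stand-in for tophys: V -> V - PAGE_OFFSET (mod 2^32)."""
    return (virt - page_offset) & 0xFFFFFFFF


def depi_3_1_2(reg):
    """Effect of 'depi 3,1,2,reg' as described earlier: set the top two bits."""
    return reg | 0xC0000000


phys = 0x00123456
# With PAGE_OFFSET == 0xc000 0000 the shortcut and the helper agree ...
print(hex(to_virt(phys)), hex(depi_3_1_2(phys)))
# ... but after a change to 0x8000 0000 the hard-coded depi is silently wrong.
print(hex(to_virt(phys, 0x80000000)), hex(depi_3_1_2(phys)))
```

The point of the sketch is only that the helper follows PAGE_OFFSET while the hard-coded deposit does not.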
* Re: [parisc-linux] depi?
1999-11-15 8:25 ` Philipp Rumpf
@ 1999-11-15 23:14 ` Frank Rowand
1999-11-16 0:26 ` John David Anglin
` (2 more replies)
0 siblings, 3 replies; 45+ messages in thread
From: Frank Rowand @ 1999-11-15 23:14 UTC (permalink / raw)
To: Philipp Rumpf, parisc-linux; +Cc: Alex deVries
Philipp Rumpf wrote:
>
> > > ; First order of business is to adjust some pointers
> > > depi 3,1,2,%arg0 ; phys->virt(free mem ptr)
> > > depi 3,1,2, %sp ; phys->virt SP
> > > depi 3,1,2, %dp ; p2v DP
> >
> > DEPI is "Deposit Immediate". depi 3,1,2, %arg0
> > drops the value 3 into the upper 2 bits of register arg0.
> >
> > IIRC, it's: DEPI immediate_value, right_most_bit#, #bits, target_register
> >
> > But...strange code. It's setting the upper 2 bits of R26, R30, and R27.
>
> The way physical memory is mapped to kernel virtual memory is (with exceptions):
>
> physical address P is mapped at virtual address P + PAGE_OFFSET.
>
> PAGE_OFFSET currently is 0xc000 0000 which was a bad value and will be changed
> to either 0x8000 0000 or 0xe000 0000 in the near future. This is one of the
> reasons you should use tophys and tovirt instead of doing the depi by hand.
< stuff deleted >
This is just one of several recent messages dealing with the issues caused by
locating the kernel at virtual address 0xc0000000 instead of 0x00000000. I
still don't understand why the kernel can't be at zero, even though several
people have tried to explain it to me. Can anyone provide a clear
explanation?
Thanks!
-Frank
* Re: [parisc-linux] depi?
1999-11-15 23:14 ` Frank Rowand
@ 1999-11-16 0:26 ` John David Anglin
1999-11-16 12:39 ` Matthew Wilcox
1999-11-16 8:26 ` Philippe Benard
1999-11-16 16:08 ` Philipp Rumpf
2 siblings, 1 reply; 45+ messages in thread
From: John David Anglin @ 1999-11-16 0:26 UTC (permalink / raw)
To: frowand; +Cc: Philipp Rumpf, parisc-linux, Alex deVries
Frank Rowand wrote:
> This is just one of several recent messages dealing with the issues caused by
> locating the kernel at virtual address 0xc0000000 instead of 0x00000000. I
> still don't understand why the kernel can't be at zero, even though several
> people have tried to explain it to me. Can anyone provide a clear
> explanation?
I must admit I don't understand it either. One consequence is that the
kernel no longer boots using the hpux ipl command without running
som_relocate on the kernel. This messes up the object file so debuggers
will have to be specially modified for this configuration. It also means
that virtual and physical addresses are different for most of the kernel.
I think it was done to simplify the syscall interface. However, maybe
instead of different real and virtual code, the kernel should have different
syscall and real-virtual code. I am not an expert but I think this is how
it is done with hpux. It is linked at ~0 (0x11000 for 10.20).
* Re: [parisc-linux] depi?
1999-11-16 0:26 ` John David Anglin
@ 1999-11-16 12:39 ` Matthew Wilcox
1999-11-16 17:17 ` Philipp Rumpf
0 siblings, 1 reply; 45+ messages in thread
From: Matthew Wilcox @ 1999-11-16 12:39 UTC (permalink / raw)
To: John David Anglin; +Cc: frowand, Philipp Rumpf, parisc-linux, Alex deVries
On Mon, Nov 15, 1999 at 07:26:49PM -0500, John David Anglin wrote:
> I think it was done to simplify the syscall interface.
Nope. Maybe we'll need a slightly different syscall entry path depending
on where the kernel ends up, but choosing where the kernel lives will have
no material effect on syscalls. About the only place which is undesirable
for the kernel to live is 0xc000'0000 :-)
--
Matthew Wilcox <willy@bofh.ai>
"Windows and MacOS are products, contrived by engineers in the service of
specific companies. Unix, by contrast, is not so much a product as it is a
painstakingly compiled oral history of the hacker subculture." - N Stephenson
* Re: [parisc-linux] depi?
1999-11-16 12:39 ` Matthew Wilcox
@ 1999-11-16 17:17 ` Philipp Rumpf
0 siblings, 0 replies; 45+ messages in thread
From: Philipp Rumpf @ 1999-11-16 17:17 UTC (permalink / raw)
To: Matthew Wilcox
Cc: John David Anglin, frowand, Philipp Rumpf, parisc-linux, Alex deVries
> > I think it was done to simplify the syscall interface.
> Nope. Maybe we'll need a slightly different syscall entry path depending
> on where the kernel ends up, but choosing where the kernel lives will have
> no material effect on syscalls. About the only place which is undesirable
> for the kernel to live is 0xc000'0000 :-)
Actually, that's not even true because of our space register usage. IIRC the
syscall page is at SR7 / 0xc000 0000 (SR7 may be written to in PL 0 only) so
there is nothing preventing us from setting the space register to something
different from 0x0000 0000 and managing a TLB entry for it - nothing but
laziness, that is.
Philipp Rumpf
* Re: [parisc-linux] depi?
1999-11-15 23:14 ` Frank Rowand
1999-11-16 0:26 ` John David Anglin
@ 1999-11-16 8:26 ` Philippe Benard
1999-11-16 12:20 ` Alan Cox
1999-11-16 12:35 ` Matthew Wilcox
1999-11-16 16:08 ` Philipp Rumpf
2 siblings, 2 replies; 45+ messages in thread
From: Philippe Benard @ 1999-11-16 8:26 UTC (permalink / raw)
To: frowand; +Cc: Philipp Rumpf, parisc-linux, Alex deVries
Frank Rowand wrote:
>
> This is just one of several recent messages dealing with the issues caused by
> locating the kernel at virtual address 0xc0000000 instead of 0x00000000. I
> still don't understand why the kernel can't be at zero, even though several
> people have tried to explain it to me. Can anyone provide a clear
> explanation?
>
I asked this question (and provided examples) 4 or 6 months ago. I never got
an answer. I asked for a kernel VAS description for 32 and 64 bits, and I
asked for the user process VAS too, and yet got no answer.
If I understood correctly, the kernel is located there because it is where it
is on PC.
Sounds like linux is not more portable than any other proprietary OS since
confined to PC architecture, a port being a mimic of PC if the target arch
can do it.
I'm not an HP-PA architecture fan (not orthogonal enough), but the
architecture is what it is. IMHO, due to our cache design (+tlb), which is a
virtual cache, the OS runs in virtual mode most of the time, except
sometimes, or except for a given set of data structures. For that exceptional
code and data, the equivalent mapping is the easiest and cleanest solution
(again due to hppa, not in absolute terms).
I think getting away from the equiv map, and moving the kernel text far up in
the kernel VAS, is a source of a lot of problems.
On the other hand, keeping the equiv map means kernel text in low phys mem
addresses, to be runnable on anything from a tiny WS machine like a C3000
with a few megs of main memory up to an 8 node server with 256 Gb of main
memory.
(oops kidding we are talking 712 16 Meg here :-)
Now if the linux design can't survive the kernel TEXT relocation to a low
addr, we definitely are in a deadlock.
I mean, since I got no answer about the kernel VAS description, the few
things I see here don't look like they will run on hppa2.0 in wide mode, so
all this effort of so-called 'porting' has to be redone again for pa2.0 (or
abandoned in favor of IA64?).
The number of PA2.0 machines out there goes up and up, and being able to run
wide on those machines sounds legit.
All this is pure ignorance from me; 'maybe' the current linux design with the
kernel at 0xc0000000 makes sense in a WIDE kernel, but it is unclear to
me....
As I said in a previous mail we have two choices here: mimic closely the PC
up to the endianness for instance, to be able to grab as much stuff as
possible from the PC world, but ultimately we will not be able to run
PC-linux executables, so being that close doesn't look that important. On
the other hand being as close as HPUX (and HP dependent stuff like PDC) could
allow us to dual boot vmunix/vmlinux, to run hpux a.out on vmlinux to
leverage current hp software (if any) etc... Being close to hp design is
somewhat having the kernel in low phys addr.
Not sure what I'm saying is completely valid, this is just a feeling...
Phi
* Re: [parisc-linux] depi?
1999-11-16 8:26 ` Philippe Benard
@ 1999-11-16 12:20 ` Alan Cox
1999-11-16 11:53 ` Philippe Benard
1999-11-16 12:35 ` Matthew Wilcox
1 sibling, 1 reply; 45+ messages in thread
From: Alan Cox @ 1999-11-16 12:20 UTC (permalink / raw)
To: Philippe Benard; +Cc: frowand, Philipp.H.Rumpf, parisc-linux, adevries
> If I understood correctly, the kernel is located there because it is where it
> is on PC.
That's why someone pulled that number out of a hat. You can pull any other
page aligned number out of a hat and that will be fine too.
> Sounds like linux is not more portable than any other proprietary OS since
> confined to PC architecture, a port being a mimic of PC if the target arch
> can do it.
Check the mm code. We don't care where you put the kernel. The rules we go by
are simple:
1. The kernel must be able to access all of the current task's user space
efficiently. This is done via macros/functions/inlines.
2. The kernel must be able to access all physical ram and also mappings of
MMIO space. The mappings can be 1:1 - for example on the ultrasparc we use
the override bits for this. I/O space maps are accessed via
macros/functions. If those functions resolve to no actual extra code this
is good.
3. Access to other tasks' virtual address space has to be possible. It can be
slow and suck however as its main use is ptrace().
The mappings we use vary. M68K for example maps the kernel about 3.5Gig up
and uses space registers to access user space. The x86, because of the weak
page table flipping facilities, keeps both user and kernel in a single map.
That makes for nice fast x86 code. That is x86 specific.
Alan
* Re: [parisc-linux] depi?
1999-11-16 12:20 ` Alan Cox
@ 1999-11-16 11:53 ` Philippe Benard
1999-11-16 12:58 ` Alan Cox
` (2 more replies)
0 siblings, 3 replies; 45+ messages in thread
From: Philippe Benard @ 1999-11-16 11:53 UTC (permalink / raw)
To: Alan Cox; +Cc: frowand, Philipp.H.Rumpf, parisc-linux, adevries
Alan Cox wrote:
>
> That's why someone pulled that number out of a hat. You can pull any other
> page aligned number out of a hat and that will be fine too.
>
Hum interesting, too bad the hat didn't give 0x1000 or 0x10000, i.e.
something that could be mapped to 0.0x1000 or 0.0x10000; this would greatly
simplify writing the kernel code that needs to run translation off. I bet
they used a red hat :-)
>
> Alan
Anyway I'm sure that after a significant effort and marco's vmlinux will
finally boot with all those funny things, 2 os in 1 file etc... :-)
I'm still in the dark regarding spaces usage (that is hppa dependent) in
vmlinux; since this defines the VAS usage, I'm back to the initial question:
how the VAS is used/designed for the kernel and for user processes.
For instance how big can a process be under linux on pa1.1?
If your answer is linear 4Gb, I would say whoa, they must have a real good
design.
If you say 4x1Gb, I would say hum, they are using spaces
If you say nx1Gb, I would say, looks interesting, they must have compiler
support for long pointers
Etc....
For now I have the feeling (hoping I'm completely wrong) that the user space
is confined into the low 2Gb and the kernel space is located into the high
2Gb, well I bet I'm wrong here, I will try to find this mm.c code you spoke
about, I was more hoping for a design document, even very thin, there is no
need for a big book to describe how a VAS is implemented.
Phi
* Re: [parisc-linux] depi?
1999-11-16 11:53 ` Philippe Benard
@ 1999-11-16 12:58 ` Alan Cox
1999-11-16 15:55 ` John David Anglin
1999-11-17 13:00 ` Philipp Rumpf
2 siblings, 0 replies; 45+ messages in thread
From: Alan Cox @ 1999-11-16 12:58 UTC (permalink / raw)
To: Philippe Benard; +Cc: alan, frowand, Philipp.H.Rumpf, parisc-linux, adevries
> For now I have the feeling (hoping I'm completely wrong) that the user space
> is confined into the low 2Gb and the kernel space is located into the high
> 2Gb, well I bet I'm wrong here, I will try to find this mm.c code you spoke
> about, I was more hoping for a design document, even very thin, there is no
> need for a big book to describe how a VAS is implemented.
How it works now on Linux/hppa32 and how it ends up looking are two
different questions altogether.
* Re: [parisc-linux] depi?
1999-11-16 11:53 ` Philippe Benard
1999-11-16 12:58 ` Alan Cox
@ 1999-11-16 15:55 ` John David Anglin
1999-11-17 13:00 ` Philipp Rumpf
2 siblings, 0 replies; 45+ messages in thread
From: John David Anglin @ 1999-11-16 15:55 UTC (permalink / raw)
To: Philippe Benard; +Cc: alan, frowand, Philipp.H.Rumpf, parisc-linux, adevries
> For instance how big can a process be under linux on pa1.1?
Unless you change gcc, the address space is 4GB for pa 1.x. This is the
short (32 bit) pointer mode. Space register sr0 is used for loads and
stores. When sr0 is used, the pa hardware uses bits 0 and 1 of the memory
offset to select one of the space registers sr4 to sr7. The selected
register is then used with the offset to generate the full virtual address
for the operation. Thus, only 1GB of the 4GB addressable using a given space
register is actually used in this mode. The standard usage for sr4 to sr7
in a user process is:
sr4: text
sr5: data
sr6: shared data (shared libs & mmap)
sr7: shared data (upper 256 MB is reserved for system use).
This model was selected because it is large enough for most applications and
efficient.
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
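The quadrant selection described above can be modelled with a short Python sketch. It is a simplified illustration only: the function name and the USAGE table are invented for the example, offsets are assumed to be 32 bits wide, and "bits 0 and 1" are taken to be the two most significant bits (PA-RISC numbers bits from the most significant end).

```python
# Standard user-process usage of sr4..sr7 as listed in the message above.
USAGE = {
    4: "text",
    5: "data",
    6: "shared data (shared libs & mmap)",
    7: "shared data (upper 256 MB reserved for system use)",
}


def short_pointer_space_register(offset):
    """Pick sr4..sr7 from the top two bits of a 32-bit offset (sketch)."""
    quadrant = (offset & 0xFFFFFFFF) >> 30   # bits 0 and 1, MSB-first numbering
    return 4 + quadrant


for offset in (0x00001000, 0x40000000, 0x80000000, 0xC0000000):
    sr = short_pointer_space_register(offset)
    print(f"offset {offset:#010x} -> sr{sr}: {USAGE[sr]}")
```

Each space register therefore only ever sees offsets from its own 1 GB quadrant in this mode, which is the "only 1GB of the 4GB" observation above.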
* Re: [parisc-linux] depi?
1999-11-16 11:53 ` Philippe Benard
1999-11-16 12:58 ` Alan Cox
1999-11-16 15:55 ` John David Anglin
@ 1999-11-17 13:00 ` Philipp Rumpf
2 siblings, 0 replies; 45+ messages in thread
From: Philipp Rumpf @ 1999-11-17 13:00 UTC (permalink / raw)
To: Philippe Benard
Cc: Alan Cox, frowand, Philipp Heinrich Rumpf, parisc-linux, adevries
> > That's why someone pulled that number out of a hat. You can pull any other
> > page aligned number out of a hat and that will be fine too.
Fine with the MM subsystem. If you fix all the code stolen to work with the
new PAGE_OFFSET. And fix it again when we merge with 2.3.
> Hum interesting, too bad the hat didn't give 0x1000 or 0x10000, i.e.
> something that could be mapped to 0.0x1000 or 0.0x10000; this would greatly
> simplify writing the kernel code that needs to run translation off. I bet
> they used a red hat :-)
While any other page-aligned value for PAGE_OFFSET is fine with the Linux MM
code, it isn't with parisc hardware. For that we need 1 MB-aligned addresses
(theoretically) or 512 KB-aligned addresses (according to the rumors about
undocumented hardware I heard).
> Anyway I'm sure that after a significant effort and marco's vmlinux will
> finally boot with all those funny things, 2 os in 1 file etc... :-)
It boots quite fine now, and it never did that for PAGE_OFFSET == 0x0000 0000
(as it was back in the dark time when we didn't use VM at all). The problem
right now is we cannot run C code while PAGE_OFFSET is 0, and I would like
to do that (if I were good at parisc assembly, we wouldn't need to do that.
Unfortunately, I'm quite bad, and I happen to prefer reading debug messages
over staring at PIM dumps for hours (which I've done, too)).
> I'm still in the dark regarding spaces usage (that is hppa dependent) in
> vmlinux; since this defines the VAS usage, I'm back to the initial question:
> how the VAS is used/designed for the kernel and for user processes.
> For instance how big can a process be under linux on pa1.1?
Right now TASK_SIZE, which is #defined to PAGE_OFFSET, which is defined to
be 0xc000 0000. Soon, 0x8000 0000 (without changing anything else).
> If your answer is linear 4Gb, I would say whoa, they must have a real good
> design.
Eventually, we might have a flat 3 GB address space. (4 GB is more difficult
because the syscall pages are at 0xc000 0000 / 0xc000 1000 currently).
Flat 4 GB is more difficult because HP/UX's ABI puts a syscall page after
3 GB but otherwise there shouldn't be a problem (of course the pages at
0xc000 0000 and 0xc000 1000 always will be special).
> If you say 4x1Gb, I would say hum, they are using spaces
No need to do that, is there ? (It looks okay as long as 1 GB looks huge and
you don't think you'll ever reach it - 80's literature can be so amusing).
On the other hand, 64 bits is really huge and we'll never run out of address
space on 64-bit machines. This is here so it can be quoted.
> For now I have the feeling (hoping I'm completely wrong) that the user space
> is confined into the low 2Gb and the kernel space is located into the high
> 2Gb, well I bet I'm wrong here, I will try to find this mm.c code you spoke
> about, I was more hoping for a design document, even very thin, there is no
> need for a big book to describe how a VAS is implemented.
Right now you're right, we don't implement anything fancy and have to flush
the TLB on context switch (strictly speaking maybe we don't and could survive
by flushing TLB entries after we got a protection id mismatch).
In the near future we'll be using spaces in the obvious way - i.e. a process
has all space registers set to a unique value (unique per view of the memory,
so similar to the traditional process id (on Linux threads have pids too so
it isn't all that easy)). My current impression is the easiest way to
implement the unique value is using the same value for protection id and the
space registers.
Userspace is expected never to change the values in the space registers and
doing so will result in a segmentation fault. Is this consistent with what
HP/UX binaries expect ?
In the far future, we might consider doing fancy things using more than one
protection identifier at a time, allowing more than 32768 processes (not
counting all threads per process), directly mapping a file using SR1-SR3 (I
can see some applications that would like the performance improvement of
mmapping a complete disk) and other, even sicker, things.
Philipp Rumpf
* Re: [parisc-linux] depi?
1999-11-16 8:26 ` Philippe Benard
1999-11-16 12:20 ` Alan Cox
@ 1999-11-16 12:35 ` Matthew Wilcox
1 sibling, 0 replies; 45+ messages in thread
From: Matthew Wilcox @ 1999-11-16 12:35 UTC (permalink / raw)
To: Philippe Benard; +Cc: frowand, Philipp Rumpf, parisc-linux, Alex deVries
On Tue, Nov 16, 1999 at 09:26:32AM +0100, Philippe Benard wrote:
> Sounds like linux is not more portable than any other proprietary OS since
> confined to PC architecture, a port being a mimic of PC if the target arch
> can do it.
I think Linux is actually more ported than *BSD these days...
> As I said in a previous mail we have two choices here: mimic closely the PC
> up to the endianness for instance, to be able to grab as much stuff as
> possible from the PC world, but ultimately we will not be able to run
> PC-linux executables, so being that close doesn't look that important. On
> the other hand being as close as HPUX (and HP dependent stuff like PDC) could
> allow us to dual boot vmunix/vmlinux, to run hpux a.out on vmlinux to
> leverage current hp software (if any) etc... Being close to hp design is
> somewhat having the kernel in low phys addr.
The sash executable we have been using as our only piece of userspace runs
under HPUX using HPUX syscalls. It will make no difference to userspace
where the kernel is mapped.
--
Matthew Wilcox <willy@bofh.ai>
"Windows and MacOS are products, contrived by engineers in the service of
specific companies. Unix, by contrast, is not so much a product as it is a
painstakingly compiled oral history of the hacker subculture." - N Stephenson
* Re: [parisc-linux] depi?
1999-11-15 23:14 ` Frank Rowand
1999-11-16 0:26 ` John David Anglin
1999-11-16 8:26 ` Philippe Benard
@ 1999-11-16 16:08 ` Philipp Rumpf
1999-11-16 17:14 ` Alan Cox
` (2 more replies)
2 siblings, 3 replies; 45+ messages in thread
From: Philipp Rumpf @ 1999-11-16 16:08 UTC (permalink / raw)
To: frowand; +Cc: Philipp Rumpf, parisc-linux, Alex deVries
> This is just one of several recent messages dealing with the issues caused by
> locating the kernel at virtual address 0xc0000000 instead of 0x00000000. I
> still don't understand why the kernel can't be at zero, even though several
> people have tried to explain it to me. Can anyone provide a clear
> explanation?
Advantages of mapping the kernel at 0xc000 0000 (or 0x8000 0000/0xe000 0000)
- consistent with what the other ports do / what hardware does for the other
ports (MIPS)
- clear difference between user pointers and kernel pointers (this matters
for debugging, not for a running system)
- clear difference between physical addresses and kernel pointers. Think
about vmalloc for one reason this is a good thing. Also I dislike the
idea of staring at HPMCs for hours just to find out what really happened
is someone forgot to set the D bit and we didn't notice until we happened
to hit a vmalloc area.
- allows us not to use space registers at all (this might be a nice option
to have, though it will give us a performance hit)
- allows us to catch NULL pointers and use large pages to map the physical
memory [1]
- is already implemented
Disadvantages
- conflicts with a rather obscure restriction of PA1.1 cache aliases. As
Frank pointed out to me, it is only guaranteed you access the same data
using a physical address and a virtual address mapped to the physical if
the physical address is equal to the virtual address (or you are flushing
the cache lines in question). I do not believe this to be a real problem
with any existing hardware and if it is, the performance hit of the
additional cache flushes may be seen as proper punishment for brain-dead
hardware.
- It is not what existing OSes on PA-RISC do. Whether this really is a
disadvantage I'm not sure (it might serve HP engineers as a reminder that
this is not HP/UX and keep them from doing things the HP/UX way even where
it is inferior).
- It adds some depi instructions, some of them in important code paths.
- it adds some human depi instructions during debugging (i.e. you see an oops
for virtual address 0xc200 XXXX and it takes you some time to figure out
this really is because you only have 32 MB and something disagreed (this
is a real world example I saw several times while debugging interruption
handlers that got called recursively))
- it causes some additional complexity during booting (though Paul Bame has
sent a simple way to work around those to the list some time ago).
Basically this situation is very similar to your typical twice-a-year
endianness debate (surprisingly so. compare "clear difference between
kernel/user pointers" "clear difference between integers of different sizes",
"depi instructions" "byte-swapping", "human depi" "reading data dumped as
bytes the wrong way around").
Unless it turns out that there are CPUs around which rely on the obscure
condition mentioned above (if there is any HP guy around who can confirm/deny
this it would be of great help), or other unexpected advantages of mapping
the physical memory starting at 0x0000 0000, I don't see this as being a good
idea.
If you have additional points to make in favour of / against mapping the
memory at 0x0000 0000, please reply as soon as possible.
Philipp Rumpf
[1] PA2.0 supports arbitrary power-of-four page sizes between 4 KB and,
depending on the CPU, up to 1 GB. Of course, those pages have to be aligned.
So to map 1 GB of physical memory, but leaving the first page unmapped (as
you would do to catch NULL pointers), you would need:
3 4 KB-pages
3 16 KB-pages
3 64 KB-pages
3 256 KB-pages
3 1 MB-pages
...
3 256 MB-pages.
I.e. 27 pages compared to one 1 GB page for the 0xc000 0000 case. It's not
as terrible as it sounds but it is added complexity.
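The page count in footnote [1] can be checked with a small Python sketch. It assumes a simple greedy strategy (at each address, take the largest naturally aligned power-of-four page that still fits), 4 KB-aligned start addresses, and page sizes from 4 KB to 1 GB; the names PAGE_SIZES and map_range are made up for the illustration and do not come from any kernel source.

```python
from collections import Counter

KB = 1024
GB = 1024 ** 3
# Power-of-four page sizes: 4 KB, 16 KB, 64 KB, ..., 256 MB, 1 GB.
PAGE_SIZES = [4 * KB * 4 ** k for k in range(10)]


def map_range(start, end):
    """Greedily cover [start, end) with naturally aligned pages (sketch)."""
    pages = []
    addr = start
    while addr < end:
        # Largest page size that is aligned at addr and does not overshoot.
        size = max(s for s in PAGE_SIZES if addr % s == 0 and addr + s <= end)
        pages.append((addr, size))
        addr += size
    return pages


skip_null = map_range(4 * KB, GB)   # leave the first 4 KB page unmapped
whole = map_range(0, GB)            # map the full gigabyte

print(len(skip_null), "pages when the first page is left out")
print(len(whole), "page when the whole 1 GB is mapped")
print(sorted(Counter(size for _, size in skip_null).items()))
```

With these assumptions the first call yields three pages of each size from 4 KB through 256 MB, i.e. 27 pages, while mapping the full gigabyte needs a single 1 GB page.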
* Re: [parisc-linux] depi?
1999-11-16 16:08 ` Philipp Rumpf
@ 1999-11-16 17:14 ` Alan Cox
1999-11-16 16:47 ` Philipp Rumpf
1999-11-16 21:43 ` Frank Rowand
1999-11-17 8:14 ` Philippe Benard
2 siblings, 1 reply; 45+ messages in thread
From: Alan Cox @ 1999-11-16 17:14 UTC (permalink / raw)
To: Philipp Rumpf; +Cc: frowand, Philipp.H.Rumpf, parisc-linux, adevries
> Advantages of mapping the kernel at 0xc000 0000 (or 0x8000 0000/0xe000 0000)
>
> - consistent with what the other ports do / what hardware does for the other
> ports (MIPS)
Not consistent with what some other ports do however. That one is a non
valid argument.
> - clear difference between user pointers and kernel pointers (this matters
> for debugging, not for a running system)
You can stick your elf binaries at 0x80000000 for debugging
> - clear difference between physical addresses and kernel pointers. Think
> about vmalloc for one reason this is a good thing. Also I dislike the
> idea of staring at HPMCs for hours just to find out what really happened
> is someone forgot to set the D bit and we didn't notice until we happened
> to hit a vmalloc area.
Phys = Virt can help you. Ultrasparc makes good use of tlb bypass bits for
this.
> - allows us to catch NULL pointers and use large pages to map the physical
> memory [1]
Only relevant for debug. Indeed on x86 with 4Mb maps enabled we don't do
null catches in the same way
> - is already implemented
True
> - conflicts with a rather obscure restriction of PA1.1 cache aliases. As
> Frank pointed out to me, it is only guaranteed you access the same data
> using a physical address and a virtual address mapped to the physical if
> the physical address is equal to the virtual address (or you are flushing
> the cache lines in question). I do not believe this to be a real problem
> with any existing hardware and if it is, the performance hit of the
> additional cache flushes may be seen as proper punishment for brain-dead
> hardware.
The goal is to make it work well.
X86 is brain dead, that's why we do the 3gig/1gig game with it.
Alan
* Re: [parisc-linux] depi?
1999-11-16 17:14 ` Alan Cox
@ 1999-11-16 16:47 ` Philipp Rumpf
1999-11-16 17:50 ` Alan Cox
1999-11-17 0:06 ` Grant Grundler
0 siblings, 2 replies; 45+ messages in thread
From: Philipp Rumpf @ 1999-11-16 16:47 UTC (permalink / raw)
To: Alan Cox; +Cc: Philipp Heinrich Rumpf, frowand, parisc-linux, adevries
>> - consistent with what the other ports do / what hardware does for the other
>> ports (MIPS)
> Not consistent with what some other ports do however. That one is a non
> valid argument.
Consistent with what the widely-used ports do in 2.2 (as far as I can see).
> > - is already implemented
> True
This actually belongs here (i.e. most of the code is stolen from other
architectures while the mapping at 0x0000 0000 one would have to be written
AFAIK).
> > - clear difference between user pointers and kernel pointers (this matters
> > for debugging, not for a running system)
> You can stick your elf binaries at 0x80000000 for debugging
Sure I could. It is going to be work though while we get it for free when
mapped at 0x8000 0000. Furthermore this is likely to cause some
meta-debugging.
> > - clear difference between physical addresses and kernel pointers. Think
> > about vmalloc for one reason this is a good thing. Also I dislike the
> > idea of staring at HPMCs for hours just to find out what really happened
> > is someone forgot to set the D bit and we didn't notice until we happened
> > to hit a vmalloc area.
>
> Phys = Virt can help you.
It sure can help from a performance pov.
> Ultrasparc makes good use of tlb bypass bits for this.
We do have load/store word bypassing the TLB instructions, if that's what you
mean. We also can fix one of SR1-SR3 in kernel mode to be the identical map
if some phys->virt instructions show up heavily in profiles.
> > - allows us to catch NULL pointers and use large pages to map the physical
> > memory [1]
>
> Only relevant for debug.
> Indeed on x86 with 4Mb maps enabled we don't do
> null catches in the same way
The day the parisc port reaches the usage counts we can stop caring about
getting Oops/Panic messages (especially if they were unexpected and happened
on a system that seemed stable a long time before) I might agree to the
"only".
> > - conflicts with a rather obscure restriction of PA1.1 cache aliases. As
> > Frank pointed out to me, it is only guaranteed you access the same data
> > using a physical address and a virtual address mapped to the physical if
> > the physical address is equal to the virtual address (or you are flushing
> > the cache lines in question). I do not believe this to be a real problem
> > with any existing hardware and if it is, the performance hit of the
> > additional cache flushes may be seen as proper punishment for brain-dead
> > hardware.
>
> The goal is to make it work well.
> X86 is brain dead, that's why we do the 3gig/1gig game with it.
The one point I see is how difficult it's going to be to get both mappings to
the point we have 3.75 GB physical mem / 3.75 GB virtual mem (assuming we
want to do it directly and don't want to do the x86 highmem stuff).
Extending a mapping at 0x8000 0000 to do this is non-trivial, and I would
consider an independent mapping using one of SR[1-3] as the easiest way to do
it. A mapping at 0x0000 0000 would allow us to do this without any problems,
so this is definitely a disadvantage I missed.
Note that 3.75 GB userspace isn't a problem with either. My A class looks
pretty packed with 768 MB RAM so I doubt you could fit more than 1.75 GB in
many pa1.1 boxes.
Philipp Rumpf
* Re: [parisc-linux] depi? 1999-11-16 16:47 ` Philipp Rumpf @ 1999-11-16 17:50 ` Alan Cox 1999-11-17 0:06 ` Grant Grundler 1 sibling, 0 replies; 45+ messages in thread From: Alan Cox @ 1999-11-16 17:50 UTC (permalink / raw) To: Philipp Rumpf; +Cc: alan, Philipp.H.Rumpf, frowand, parisc-linux, adevries > Consistent with what the widely-used ports do in 2.2 (as far as I can see). You mean x86 > It sure can help from a performance pov. > > > Ultrasparc makes good use of tlb bypass bits for this. > > We do have load/store word bypassing the TLB instructions, if that's what you > mean. We also can fix one of SR1-SR3 in kernel mode to be the identical map > if some phys->virt instructions shows up heavily in profiles. On the ultrasparc it pays off in part by avoiding TLB miss/reloads > The day the parisc port reaches the usage counts we can stop caring about > getting Oops/Panic messages (especially if they were unexpected and happened > on a system that seemed stable a long time before) I might agree to the "only". And until then you map in 20 odd pages. BFD > Note that 3.75 GB userspace isn't a problem with either. My A class looks > pretty packed with 768 MB RAM so I doubt you could fit more than 1.75 GB in > many pa1.1 boxes. 3.75Gig _virtual_ 3.75Gig _virtual_ 3.75Gig _virtual_ ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi? 1999-11-16 16:47 ` Philipp Rumpf 1999-11-16 17:50 ` Alan Cox @ 1999-11-17 0:06 ` Grant Grundler 1999-11-17 6:21 ` Philipp Rumpf 1 sibling, 1 reply; 45+ messages in thread From: Grant Grundler @ 1999-11-17 0:06 UTC (permalink / raw) To: Philipp Rumpf; +Cc: parisc-linux Philipp Rumpf wrote: > Note that 3.75 GB userspace isn't a problem with either. My A class looks > pretty packed with 768 MB RAM so I doubt you could fit more than 1.75 GB in > many pa1.1 boxes. a180.pdf says A-class supports 2GB. And that's the "smallest" box HP ships today and it won't be on the pricelist much longer. 712/100 only supports 192MB. I don't know about 715's and older B/C-class systems. I'm pretty sure all new workstations will support more than 4GB (and run PA2.0). If we can do something now to port linux to those boxes easier later, then I think we should. grant Grant Grundler Unix Developement Lab +1.408.447.7253 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi? 1999-11-17 0:06 ` Grant Grundler @ 1999-11-17 6:21 ` Philipp Rumpf 1999-11-17 18:57 ` Stan Sieler 1999-11-17 19:29 ` Philipp Rumpf 0 siblings, 2 replies; 45+ messages in thread From: Philipp Rumpf @ 1999-11-17 6:21 UTC (permalink / raw) To: Grant Grundler; +Cc: Philipp Rumpf, parisc-linux > > Note that 3.75 GB userspace isn't a problem with either. My A class looks > > pretty packed with 768 MB RAM so I doubt you could fit more than 1.75 GB in > > many pa1.1 boxes. > > a180.pdf says A-class supports 2GB. And that's the "smallest" box HP To support 2 GB (instead of 1.75 GB) we indeed need to do some additional tricks (such as mapping the I/O range somewhere else, ugh). Any PA1.0 box around on which we waste more than 256 MB ? Philipp Rumpf ^ permalink raw reply [flat|nested] 45+ messages in thread
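[Editorial sketch, not part of the original message.] The 1.75 GB and 3.75 GB figures in this sub-thread can be reproduced with simple arithmetic, under two assumptions that the messages only hint at: the kernel's direct mapping starts at 0x8000 0000, and the I/O range occupies the top 256 MB of the 32-bit address space, starting at 0xF000 0000. The names below are illustrative only.

```python
GIB = 1 << 30

KERNEL_BASE = 0x8000_0000  # assumption: start of the kernel's direct mapping
IO_BASE = 0xF000_0000      # assumption: I/O range is the top 256 MB


def directly_mappable_bytes(kernel_base, io_base=IO_BASE):
    """Bytes between the start of the kernel mapping and the I/O range."""
    return io_base - kernel_base


# Compare a mapping at 0x8000 0000 with a mapping at 0x0000 0000.
for base in (KERNEL_BASE, 0x0000_0000):
    size = directly_mappable_bytes(base)
    print(f"mapping at {base:#010x}: {size / GIB:.2f} GB")
```

Under these assumptions, going from 1.75 GB to a full 2 GB with the mapping at 0x8000 0000 would indeed require moving the I/O range, which matches the "additional tricks" mentioned above.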
* Re: [parisc-linux] depi? 1999-11-17 6:21 ` Philipp Rumpf @ 1999-11-17 18:57 ` Stan Sieler 1999-11-17 19:29 ` Philipp Rumpf 1 sibling, 0 replies; 45+ messages in thread From: Stan Sieler @ 1999-11-17 18:57 UTC (permalink / raw) To: Philipp Rumpf; +Cc: parisc-linux Re: > To support 2 GB (instead of 1.75 GB) we indeed need to do some additional > tricks (such as mapping the I/O range somewhere else, ugh). > > Any PA1.0 box around on which we waste more than 256 MB ? I'm not sure exactly what you mean, but IIRC, the 8x2 can have up to about 720 MB of memory on it (been there, seen it running), although that's more than the HP marketing blessed maximum. The 832/822 is PA-RISC 1.0. (sched.models) (HP-PB machine) -- Stan Sieler sieler@allegro.com www.allegro.com/sieler/wanted/index.html www.allegro.com/sieler ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi? 1999-11-17 6:21 ` Philipp Rumpf 1999-11-17 18:57 ` Stan Sieler @ 1999-11-17 19:29 ` Philipp Rumpf 1999-11-17 20:01 ` Stan Sieler 1 sibling, 1 reply; 45+ messages in thread From: Philipp Rumpf @ 1999-11-17 19:29 UTC (permalink / raw) To: Philipp Rumpf; +Cc: Grant Grundler, parisc-linux On Wed, Nov 17, 1999 at 07:21:41AM +0100, Philipp Rumpf wrote: > Any PA1.0 box around on which we waste more than 256 MB ? Eek, of course I meant PA1.1. Note the 7:21:41AM part as well. Philipp Rumpf ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi? 1999-11-17 19:29 ` Philipp Rumpf @ 1999-11-17 20:01 ` Stan Sieler 1999-11-17 20:33 ` Philipp Rumpf 0 siblings, 1 reply; 45+ messages in thread From: Stan Sieler @ 1999-11-17 20:01 UTC (permalink / raw) To: Philipp Rumpf; +Cc: parisc-linux > > Any PA1.0 box around on which we waste more than 256 MB ? > > Eek, of course I meant PA1.1. Note the 7:21:41AM part as well. Yes, many. 8x7 can go up to 1.5 GB (768 MB according to HP marketing). Others, too. -- Stan Sieler sieler@allegro.com www.allegro.com/sieler/wanted/index.html www.allegro.com/sieler ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi? 1999-11-17 20:01 ` Stan Sieler @ 1999-11-17 20:33 ` Philipp Rumpf 0 siblings, 0 replies; 45+ messages in thread From: Philipp Rumpf @ 1999-11-17 20:33 UTC (permalink / raw) To: Stan Sieler; +Cc: Philipp Heinrich Rumpf, parisc-linux > > > Any PA1.0 box around on which we waste more than 256 MB ? > > Eek, of course I meant PA1.1. Note the 7:21:41AM part as well. > Yes, many. 8x7 can go up to 1.5 GB (768 MB according to HP marketing). > Others, too. Note the "waste". 1.75 GB we can map without any problem. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi? 1999-11-16 16:08 ` Philipp Rumpf 1999-11-16 17:14 ` Alan Cox @ 1999-11-16 21:43 ` Frank Rowand 1999-11-17 6:12 ` Philipp Rumpf 1999-11-17 8:14 ` Philippe Benard 2 siblings, 1 reply; 45+ messages in thread From: Frank Rowand @ 1999-11-16 21:43 UTC (permalink / raw) To: parisc-linux Philipp Rumpf wrote: > > > This is just one of several recent messages dealing with the issues caused by > > locating the kernel at virtual address 0xc0000000 instead of 0x00000000. I > > still don't understand why the kernel can't be at zero, even though several > > people have tried to explain it to me. Can anyone provide a clear > > explanation? > > Advantages of mapping the kernel at 0xc000 0000 (or 0x8000 0000/0xe000 0000) > < stuff deleted > > > Disadvantages > - C code that needs to run in real mode seems to be rather fragile, given Paul's recent experience with head.c. (I think it was Paul.) - A single instance of code can't (easily) be coded to be able to run both in real mode and in virtual mode. - It's real easy to mis-code real mode assembly (eg. use the PA() macro when it shouldn't be used or don't use it when it should be used). - Issues with coherent I/O. < stuff deleted > -Frank ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi? 1999-11-16 21:43 ` Frank Rowand @ 1999-11-17 6:12 ` Philipp Rumpf 1999-11-17 18:56 ` Frank Rowand 0 siblings, 1 reply; 45+ messages in thread From: Philipp Rumpf @ 1999-11-17 6:12 UTC (permalink / raw) To: frowand; +Cc: parisc-linux > - C code that needs to run in real mode seems to be rather fragile, given > Paul's recent experience with head.c. (I think it was Paul.) We do need some tricks to get C code to run in real-mode, and several people are working on them. When the kernel is mapped, we need to be extremely careful not to hit a vmalloced area or something when we call C code that should run in virtual mode in real mode. It's letting the code fail obviously, for all cases, or letting easy code work now and wait for subtle bugs to occur. Which one do you prefer ? > - A single instance of code can't (easily) be coded to be able to run both > in real mode and in virtual mode. That's true when we map the kernel at 0x0000 0000 as well. Again, it's an obvious bug or a subtle one. > - It's real easy to mis-code real mode assembly (eg. use the PA() macro when > it shouldn't be used or don't use it when it should be used). Is there any situation where you should not use PA() for a symbol in real-mode assembly ? I agree branches are a bit difficult because the right way to code them is .+(PA(symbol)-PA(.)) but symbol right now happens to work (this is good luck). I agree the PA() macro isn't a nice thing to use and if we have a chance to get rid of it, we should have a serious look. > - Issues with coherent I/O. I don't see them. I see issues with the architecture specification, but I do not believe there is any actual hardware around that depends on these issues. 
As far as I understood the implementation of cache-coherent I/O, what basically
happens is:

- the I/O controller wants to access physical address 0x1234 5678
- the I/O controller puts a special cycle on the bus it shares with the CPU(s)
  that basically says "hey, if you have address 0x1234 5678 in your cache and
  need to write it back, do it now"
- all CPUs have a look at the cache lines corresponding to 0x1234 5678, which
  are the same cache lines as the ones used for 0x9234 5678 (very very likely).
- all CPUs have a look at the tags of these cache lines, which happen to be
  physical so 0x1234 5678 and 0x9234 5678 result in the same thing again.
- the guilty CPU gives back the cache line, or none does if the line doesn't
  happen to be in the cache.

Furthermore, there is no publicly documented coherent I/O system from HP yet,
so we just have to assume that hardware we have only heard rumours about will
be sane once we get to see it.

Philipp Rumpf
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi? 1999-11-17 6:12 ` Philipp Rumpf @ 1999-11-17 18:56 ` Frank Rowand 1999-11-17 22:05 ` Philipp Rumpf 0 siblings, 1 reply; 45+ messages in thread From: Frank Rowand @ 1999-11-17 18:56 UTC (permalink / raw) To: Philipp Rumpf; +Cc: parisc-linux Philipp Rumpf wrote: > > > - C code that needs to run in real mode seems to be rather fragile, given > > Paul's recent experience with head.c. (I think it was Paul.) > > We do need some tricks to get C code to run in real-mode, and several people > are working on them. > > When the kernel is mapped, we need to be extremely careful not to hit a > vmalloced area or something when we call C code that should run in virtual > mode in real mode. > > It's letting the code fail obviously, for all cases, or letting easy code work > now and wait for subtle bugs to occur. > > Which one do you prefer ? > > > - A single instance of code can't (easily) be coded to be able to run both > > in real mode and in virtual mode. > > That's true when we map the kernel at 0x0000 0000 as well. Again, it's > an obvious bug or a subtle one. You missed my point. An example: the os_hpmc() that I wrote only works in real mode. If I wanted it to be able to run in virtual mode, I would either have to write a parallel version that uses virtual addresses (instead of physical) or every time I used an address I would have to choose whether to use the physical or virtual address, based on whether address translation was turned on or not. If the kernel is equivalently mapped, this problem goes away. > > - It's real easy to mis-code real mode assembly (eg. use the PA() macro when > > it shouldn't be used or don't use it when it should be used). > > Is there any situation where you should not use PA() for a symbol in real-mode > assembly ? I agree branches are a bit difficult because the right way to code > them is > > .+(PA(symbol)-PA(.)) > > but > > symbol > > right now happens to work (this is good luck). 
Look at os_hpmc() and see where it was not appropriate to use the PA() macro. > I agree the PA() macro isn't a nice thing to use and if we have a chance to get > rid of it, we should have a serious look. > > > - Issues with coherent I/O. > > I don't see them. I see issues with the architecture specification, but I do > not believe there is any actual hardware around that depends on this issues. > > As far as I understood the implementation of cache-coherent I/O, what basically > happens is: > > - I/O controller wants to access physical adress 0x1234 5678 > - the I/O controller puts a special cycle on the bus it shares with the CPU(s) > that basically says "hey, if you have address 0x1234 5678 in your cache and > need to write it back, do it now" > - all CPUs have a look at the cache lines corresponding to 0x1234 5678, which > is the same cache lines as the one used for 0x9234 5678 (very very likely). > - all CPUs have a look at the tags of these cache lines, which happen to be > physical so 0x1234 5678 and 0x9234 5678 result in the same thing again. > - the guilty CPU gives back the cache line, or none does if the line doesn't > happen to be in the cache. > > Furthermore, there is no publically documented coherent I/O system from HP yet > so we have just to assume hardware we only heard rumours about will be sane > once we get to see it. So listen to the people who have read that documentation (like Grant). > Philipp Rumpf > > --------------------------------------------------------------------------- > To unsubscribe: send e-mail to parisc-linux-request@thepuffingroup.com with > `unsubscribe' as the subject. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi? 1999-11-17 18:56 ` Frank Rowand @ 1999-11-17 22:05 ` Philipp Rumpf 1999-11-17 22:39 ` John David Anglin 1999-11-17 23:02 ` Frank Rowand 0 siblings, 2 replies; 45+ messages in thread From: Philipp Rumpf @ 1999-11-17 22:05 UTC (permalink / raw) To: frowand; +Cc: Philipp Rumpf, parisc-linux > You missed my point. An example: the os_hpmc() that I wrote only works in > real mode. If I wanted it to be able to run in virtual mode, Why would you want to run any code which needs to be run in real-mode in the first place in virtual mode ? Any examples ? > I would either > have to write a parallel version that uses virtual addresses (instead of > physical) or every time I used an address I would have to choose whether to > use the physical or virtual address, based on whether address translation was > turned on or not. Or you would have to disable address translation before calling the code. Disabling address translation is what happens for every interrupt(ion), including TLB insert handlers which are very common. So it shouldn't be a big deal. > If the kernel is equivalently mapped, this problem goes away. If it exists in the first place. > Look at os_hpmc() and see where it was not appropriate to use the PA() macro. Just what I said. You use symbol where the "right" thing to do would be .+(PA(symbol)-PA(.)) But we're working on that (so you can just use "symbol" for code references, as it is real-mode code. You also will be able to use "symbol" if it's a real-mode data reference. Sounds right to me). >>Furthermore, there is no publically documented coherent I/O system from HP yet >> so we have just to assume hardware we only heard rumours about will be sane >> once we get to see it. > So listen to the people who have read that documentation (like Grant). What I really want to know is which algorithm recent CPUs use to get the cache bank index / cache tag. 
The answer is either "Just the obvious", "just the obvious but we XOR some bits
from the space registers in" or "we're doing tricks to special-case those
accesses where the virtual address matches the physical, for no apparent
reason". I don't believe the latter to be the case, but we just can't be sure
until we get documentation (and it would violate the architecture specification
if the proof below turns out to be valid).

I would agree it would be best to change the design to work according to the
architecture specification if the restriction behind the "cache coherency
issues" (which might exist) looked sane at all. It doesn't.

[note: what follows is what I think is a proof we can do what I want to do.
It is very likely to contain formal/grammatical mistakes but I think it should
be valid nonetheless].

The Rules

1. A physical and a virtual address refer to the same physical address (and
   caches work) if they are equal (and obviously the virtual address translates
   to the physical). If a virtual address and its mapping satisfy this rule,
   the address is said to be _equivalently mapped_.
2. Two virtual addresses refer to the same physical address (and caches work)
   if they are equal modulo 2^20 (and obviously translate to the same physical
   address). If two virtual addresses satisfy this rule, they are said to be
   _equivalent aliases_.

p0 contains a physical address, is used as a physical address
p1 contains a phys. addr, is used as a virtual address, equivalently mapped
p2 is an equivalent alias of p1, but not necessarily equivalently mapped.

p0 and p1 satisfy Rule 1. p1 and p2 satisfy Rule 2. Therefore, p1 and p2 refer
to the same physical address, and p0 and p1 refer to the same physical address.

p0 and p2 refer to the same physical address if p1 is ever used to access
memory.

There is no requirement for the access using p1 to occur before the access
using p0 or the access using p2.
There is no requirement for the access using p1 to occur within a certain time
or number of instructions after the access using p0 or the access using p2.

Since delaying the access using p1 indefinitely is possible and there is no
feasible way for the cache system to verify the access won't occur (except in
certain cases which never occur during normal operation), there is no need for
the access using p1 ever to happen.

Therefore, a physical address and a virtual address are guaranteed to refer to
the same physical address if the virtual address is an equivalent alias of the
equivalent mapping of the physical address.

Any mistakes to point out? Misunderstandings of the specification? Otherwise,
could the specification please get fixed?

Philipp Rumpf
^ permalink raw reply [flat|nested] 45+ messages in thread
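[Editorial sketch, not part of the original message.] The two rules used in the proof above can be written down as predicates and checked against the example addresses from earlier in the thread (0x1234 5678 and 0x9234 5678). This only encodes the rules as stated in the message; it says nothing about what real cache hardware does. The function names are illustrative, and the choice of 0x9234 5678 as the kernel alias assumes a mapping at 0x8000 0000.

```python
ALIAS_MODULUS = 1 << 20  # Rule 2: virtual addresses equal modulo 2^20


def equivalently_mapped(virtual, physical):
    """Rule 1: the virtual address equals the physical address it maps to."""
    return virtual == physical


def equivalent_aliases(va1, va2):
    """Rule 2: two virtual addresses are equal modulo 2^20."""
    return va1 % ALIAS_MODULUS == va2 % ALIAS_MODULUS


p0 = 0x1234_5678  # physical address, used as a physical address
p1 = p0           # same value used as a virtual address (equivalent mapping)
p2 = 0x9234_5678  # assumed kernel alias, i.e. p0 + 0x8000 0000

print(equivalently_mapped(p1, p0))   # p0 and p1 satisfy Rule 1
print(equivalent_aliases(p1, p2))    # p1 and p2 satisfy Rule 2
```

Any kernel base that is a multiple of 2^20 (0x8000 0000 and 0xc000 0000 both are) makes every kernel virtual address an equivalent alias of the corresponding equivalent mapping, which is the case the argument relies on.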
* Re: [parisc-linux] depi?
1999-11-17 22:05 ` Philipp Rumpf
@ 1999-11-17 22:39 ` John David Anglin
1999-11-17 22:52 ` Philipp Rumpf
1999-11-17 23:02 ` Frank Rowand
1 sibling, 1 reply; 45+ messages in thread
From: John David Anglin @ 1999-11-17 22:39 UTC (permalink / raw)
To: Philipp Rumpf; +Cc: frowand, Philipp.H.Rumpf, parisc-linux
I have been trying to figure out why the kernel that I built with the
default configuration dies after going virtual. It seems to me that some
aspects of PA compilers and linkers haven't been taken into account. One
particular thing to note is that long branches are done via stubs which
use interspace branches (i.e., they use the space registers). There is
a built-in assumption that the four-quadrant model is being used. Since we
are linking to 0xc0000000, sr7 is being used for long branches.

Here is an example called from sys_pipe:

0xc0015538 <pdc_console_init+88>: ldil -3ff72800,r1
0xc001553c <pdc_console_init+92>: be,n 3c0(sr7,r1)

I can't find where sr7 is initialized.

Dave
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi? 1999-11-17 22:39 ` John David Anglin @ 1999-11-17 22:52 ` Philipp Rumpf 1999-11-17 23:37 ` Stan Sieler 0 siblings, 1 reply; 45+ messages in thread From: Philipp Rumpf @ 1999-11-17 22:52 UTC (permalink / raw) To: John David Anglin; +Cc: Philipp Heinrich Rumpf, frowand, parisc-linux > Here is an example called from sys_pipe: > > 0xc0015538 <pdc_console_init+88>: ldil -3ff72800,r1 > 0xc001553c <pdc_console_init+92>: be,n 3c0(sr7,r1) > > I can't find where sr7 is initialized. It isn't, we don't need to. This branches to -0x3ff72800 + 0x3c0 = 0xc008dbc0. Philipp Rumpf ^ permalink raw reply [flat|nested] 45+ messages in thread
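[Editorial sketch, not part of the original message.] The branch-target arithmetic above can be checked by treating the disassembler's signed hex value as a 32-bit two's-complement quantity. The helper name to_u32 is illustrative only, and the sketch assumes the value shown by the disassembler is the full 32-bit value that ldil leaves in r1.

```python
def to_u32(value):
    """Wrap a Python integer to an unsigned 32-bit register value."""
    return value & 0xFFFF_FFFF


r1 = to_u32(-0x3FF72800)      # value assumed to be loaded by: ldil -3ff72800,r1
target = to_u32(r1 + 0x3C0)   # offset used by: be,n 3c0(sr7,r1)

print(hex(r1))
print(hex(target))
```

The offset part of the target comes out as 0xc008dbc0, as stated in the message; which space that offset is interpreted in is the question taken up in the replies.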
* Re: [parisc-linux] depi? 1999-11-17 22:52 ` Philipp Rumpf @ 1999-11-17 23:37 ` Stan Sieler 1999-11-18 0:09 ` Philipp Rumpf 0 siblings, 1 reply; 45+ messages in thread From: Stan Sieler @ 1999-11-17 23:37 UTC (permalink / raw) To: Philipp Rumpf; +Cc: parisc-linux Re: > > 0xc0015538 <pdc_console_init+88>: ldil -3ff72800,r1 > > 0xc001553c <pdc_console_init+92>: be,n 3c0(sr7,r1) > > > > I can't find where sr7 is initialized. > > It isn't, we don't need to. This branches to > -0x3ff72800 + 0x3c0 = 0xc008dbc0. I must be missing something...the above should branch to: sr7.0xc008dbc0, not to "0xc008dbc0". I.e., you specified SR7 in the BE instruction, so it gets used. Of course, if you'd said: BE,N 3c0(0,r1) the result would effectively be the same (because it's a short address, we grab the upper two bits of $c008dbc0, add 4, and therefore use SR7 as the space register). So, SR7 indeed needs to be set correctly...but since I haven't looked at the surrounding code... -- Stan Sieler sieler@allegro.com www.allegro.com/sieler/wanted/index.html www.allegro.com/sieler ^ permalink raw reply [flat|nested] 45+ messages in thread
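[Editorial sketch, not part of the original message.] The short-address rule described above (take the upper two bits of the 32-bit offset and add 4 to get the space register number) can be expressed in a few lines. This models only the rule as stated in the message; whether it applies to a BE instruction is exactly what the following replies dispute. The function name is illustrative.

```python
def implicit_space_register(offset):
    """Space register picked by the upper two bits of a 32-bit offset, plus 4."""
    return 4 + ((offset & 0xFFFF_FFFF) >> 30)


# One sample offset per quadrant, ending with the branch target from the thread.
for offset in (0x0000_1000, 0x4000_0000, 0x8000_0000, 0xC008_DBC0):
    print(f"{offset:#010x} -> SR{implicit_space_register(offset)}")
```

Each 1 GB quadrant of the 4 GB offset range maps to one of SR4 to SR7, so an offset in the fourth quadrant such as 0xc008dbc0 selects SR7.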
* Re: [parisc-linux] depi? 1999-11-17 23:37 ` Stan Sieler @ 1999-11-18 0:09 ` Philipp Rumpf 1999-11-18 0:43 ` Frank Rowand 0 siblings, 1 reply; 45+ messages in thread From: Philipp Rumpf @ 1999-11-18 0:09 UTC (permalink / raw) To: Stan Sieler; +Cc: Philipp Heinrich Rumpf, parisc-linux > I must be missing something...the above should branch > to: sr7.0xc008dbc0, not to "0xc008dbc0". I.e., you specified SR7 > in the BE instruction, so it gets used. No, we didn't. We set the space register selection field to '00' which according to the documentation means you select SR7 but in reality means you don't want to have anything to do with space registers. Just think of it as flat address 0xc008dbc0 and of SR[123]:0xc008dbc0 as "real" segmented addresses. > So, SR7 indeed needs to be set correctly...but since I haven't > looked at the surrounding code... SR0, SR4, SR5, SR6, SR7 shouldn't ever need to be set to different values for flat 4 gig code. SR1, SR2, SR3 you use only for "real" segmented code. (This is based on what the C compiler does, and what changed with PA2.0) Philipp Rumpf ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi? 1999-11-18 0:09 ` Philipp Rumpf @ 1999-11-18 0:43 ` Frank Rowand 1999-11-18 1:35 ` Frank Rowand 1999-11-18 5:33 ` John David Anglin 0 siblings, 2 replies; 45+ messages in thread From: Frank Rowand @ 1999-11-18 0:43 UTC (permalink / raw) To: Philipp Rumpf; +Cc: Stan Sieler, parisc-linux Philipp Rumpf wrote: > > > I must be missing something...the above should branch > > to: sr7.0xc008dbc0, not to "0xc008dbc0". I.e., you specified SR7 > > in the BE instruction, so it gets used. > > No, we didn't. We set the space register selection field to '00' which > according to the documentation means you select SR7 but in reality means > you don't want to have anything to do with space registers. Just think > of it as flat address 0xc008dbc0 and of SR[123]:0xc008dbc0 as "real" > segmented addresses. When address translation is enabled, a space register is *always* used. You cannot turn that off. Specifying '00' in the space register select bits really does mean that you use space register 7 to calculate the 64 bit virtual address. (Humor me, and pretend that all implementations of space registers are 32 bits, even though they aren't.) If you "don't want to have anything to do with space registers", you can put the same value (such as zero) in all of the space registers. > > So, SR7 indeed needs to be set correctly...but since I haven't > > looked at the surrounding code... > > SR0, SR4, SR5, SR6, SR7 shouldn't ever need to be set to different values > for flat 4 gig code. SR1, SR2, SR3 you use only for "real" segmented code. > > (This is based on what the C compiler does, and what changed with PA2.0) > > Philipp Rumpf I don't understand. Are you saying that there is a single 4gByte virtual address range that is shared by the kernel and all user processes? Or do you plan to provide a separate 4gByte virtual address range to each process/task/thread/whatever? -Frank ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi? 1999-11-18 0:43 ` Frank Rowand @ 1999-11-18 1:35 ` Frank Rowand 1999-11-18 5:33 ` John David Anglin 1 sibling, 0 replies; 45+ messages in thread From: Frank Rowand @ 1999-11-18 1:35 UTC (permalink / raw) To: parisc-linux Frank Rowand wrote: > > When address translation is enabled, a space register is *always* used. Ok, ok, slight overstatement. Of course the various load absolute instructions don't use space registers... -Frank ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi? 1999-11-18 0:43 ` Frank Rowand 1999-11-18 1:35 ` Frank Rowand @ 1999-11-18 5:33 ` John David Anglin 1999-11-18 8:02 ` Philippe Benard 1 sibling, 1 reply; 45+ messages in thread From: John David Anglin @ 1999-11-18 5:33 UTC (permalink / raw) To: frowand; +Cc: Philipp.H.Rumpf, sieler, parisc-linux > > Philipp Rumpf wrote: > > > > > I must be missing something...the above should branch > > > to: sr7.0xc008dbc0, not to "0xc008dbc0". I.e., you specified SR7 > > > in the BE instruction, so it gets used. I agree. The contents of SR7 is loaded into the instruction space id queue. The example was a stub. Stubs are generated by the compiler/linker when a relative branch is too far. The BE instruction was used (as opposed to a BL instruction) to allow the branch to go anywhere within the assumed 4GB address space model. The linker used SR7 because it determined that the branch was to an address in the 4th quadrant of the 4GB address space. > > > > No, we didn't. We set the space register selection field to '00' which > > according to the documentation means you select SR7 but in reality means > > you don't want to have anything to do with space registers. Just think > > of it as flat address 0xc008dbc0 and of SR[123]:0xc008dbc0 as "real" > > segmented addresses. This is only for data accesses. Instructions use the full space register specifications. When an interspace branch is taken with sr7, for example, the contents of sr7 becomes the new contents of the space id queue. > When address translation is enabled, a space register is *always* used. > You cannot turn that off. Specifying '00' in the space register > select bits really does mean that you use space register 7 to calculate > the 64 bit virtual address. (Humor me, and pretend that all > implementations of space registers are 32 bits, even though they aren't.) 
> > If you "don't want to have anything to do with space registers", you > can put the same value (such as zero) in all of the space registers. > > > > > So, SR7 indeed needs to be set correctly...but since I haven't > > > looked at the surrounding code... > > > > SR0, SR4, SR5, SR6, SR7 shouldn't ever need to be set to different values > > for flat 4 gig code. SR1, SR2, SR3 you use only for "real" segmented code. > > > > (This is based on what the C compiler does, and what changed with PA2.0) > > > > Philipp Rumpf > > I don't understand. Are you saying that there is a single 4gByte virtual > address range that is shared by the kernel and all user processes? Or > do you plan to provide a separate 4gByte virtual address range to each > process/task/thread/whatever? I think what is being suggested is to run with the space registers all zero and swap the TLB contents on context switches in order to change the mapping from virtual to physical. However, I doubt this is efficient. On the other hand, the space registers could be swapped on context switches, and the kernel and each process could have their own 4GB virtual space. The mapping of program virtual addresses to hardware virtual addresses would still be flat. I can't see any reason to segment the address space since the hardware virtual address space is large enough to accommodate more processes than would ever be needed. I assume here that we aren't concerned with running on level 0 machines without space registers. In this scenario, cache usage would reflect the scheduling priorities of the OS. I stress that, because the PA architecture has space registers, a program's virtual address space (0 - 4GB) doesn't have to be the same as its hardware virtual address space. My original point was that I couldn't see where the space registers were initialized prior to the transition to virtual operation. They may in fact be initialized to zero by the boot loader prior to transfer to the kernel (cf., head.S). 
Does the above suggestion regarding the usage of space registers make any sense? Dave -- J. David Anglin dave.anglin@nrc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6605) ^ permalink raw reply [flat|nested] 45+ messages in thread
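[Editorial sketch, not part of the original message.] The suggestion above, giving the kernel and each process its own space ID so that the same 32-bit program address lands at different hardware virtual addresses, can be illustrated as follows. Following the simplification used earlier in the thread, space IDs are treated as 32 bits wide, so the hardware virtual address is modelled as a 64-bit value. The specific space IDs (0 for the kernel, 1 and 2 for two processes) and the function name are hypothetical.

```python
def global_virtual_address(space_id, offset):
    """Concatenate a 32-bit space ID and a 32-bit offset into one 64-bit value."""
    return ((space_id & 0xFFFF_FFFF) << 32) | (offset & 0xFFFF_FFFF)


KERNEL_SPACE = 0                 # hypothetical: kernel runs in space 0
PROCESS_A, PROCESS_B = 1, 2      # hypothetical per-process space IDs

same_offset = 0x0001_0000
print(hex(global_virtual_address(PROCESS_A, same_offset)))
print(hex(global_virtual_address(PROCESS_B, same_offset)))
print(hex(global_virtual_address(KERNEL_SPACE, 0xC008_DBC0)))
```

In this model a context switch only has to change the space IDs rather than swap TLB contents, which is the efficiency argument made in the message; with every space register set to zero, the hardware virtual address degenerates to the plain 32-bit offset.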
* Re: [parisc-linux] depi?
1999-11-18 5:33 ` John David Anglin
@ 1999-11-18 8:02 ` Philippe Benard
1999-11-18 20:37 ` John David Anglin
0 siblings, 1 reply; 45+ messages in thread
From: Philippe Benard @ 1999-11-18 8:02 UTC (permalink / raw)
To: John David Anglin; +Cc: frowand, Philipp.H.Rumpf, sieler, parisc-linux
John David Anglin wrote:
>
> I think what is being suggested is to run with the space registers all
> zero and swap the TLB contents on context switches in order to changing
> the mapping from virtual to physical. However, I doubt this is efficient.

This is what I understand now too. At first I thought it was a flat 4GB shared
between the OS and the processes (a la 2GB each), then the space flip entered
into action with a flat 4GB per thread (pardon, process for now), and then a
flat 4GB for the kernel itself. I don't really see the implementation from a
previous mail of someone saying the user VAS quad usage would be
TEXT/DATA/SHARED1/SHARED2 if all the SRs are the same. I still have some
ProtID problem: I don't see how sharing is implemented efficiently
(TEXT/SHARED1/SHARED2). I bet this is addressed, I just don't have the design
document.

And more than that, context switching is a pain on all machines: register
save/restore, priv promotion, etc. cost a lot. I don't see a global TLB purge
on context switch as a booster (again assuming that is what happens).

And my last point, with no proof.

1) If processes and the OS now each have a flat 4GB in front of them, then the
   OS can be located anywhere (besides some things like 0 for NULL deref
   trapping), i.e. any quad.
2) If a concept like equivalent mapping exists, it may or may not provide a
   gain. This is a degree of liberty.
3) Choosing to locate the kernel text in the second, third or fourth quad
   prevents the use of equiv mem, which removes 1 degree of liberty.
4) For now it is claimed that equiv mem address non-existing problem (sic), then is useless, then not considered, then any quad could be choosen and the one choosen is the one that remove the equiv map potential, actually since there are 4 quad and the quad came from a red hat, we got 3 chances over 4 to get one that will remove us 1 degree of liberty. My feeling is since equiv map doesn't seems to be needed in this implementation, and since it was written that any quad could have been choosen, I would have choosen the one that still allow me to have quiv mem should a boo boo happen and made it necessary, in other words I like to keep my degree of liberty even if I'm not using it (for now). ------------------------------------------ Designing a ASL (addr space layout) implementation onto an architecture (specially this one) is something tough, and not uniq. For instance designing the ASL for 64bit wide for HP-UX (with the constraint of being able to run narrow process un-recompiled) was pretty interesting and did provide several options. I admit I didn't browse the web that much but will accept any pointer, for now I desesperatly looking at the puffin/doc page and see nothing but pure HP doc, no linux design options. What I'd like is a document that gives the design orientation, all this discovery about space usage is pure guessing from mail to mail, boucing from 1 flat 4Gb for all to 4Gb per process/kernel. Some may say, linux is not doc, it is hack n run, but on the long run I'm affraid that hack n run will type more text (try and fail code) than writing the design options. For instance the ASL document for HP-UX wide is 22 pages total, with TOC and figures, and pseudo-code algorithm. 
Remember Djikstra (well the old timer may, the bambinos may take Kurt Cobain as an example :-) "You should pay the programmers a very good salary, don't hesitate to bump their salary, BUT make them pay any puch in their punched card" The idea behind, think before code after, Kurt tried it the other way, he choose to shoot first and think after oops :-) Actually I did find a book on linux internals (some years ago) that did speak a bit about VAS, but the arch indep part speak a lot about design option based on x86 and MMU) which definilty doesn't apply here then not so indep... Phi ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi?
1999-11-18 8:02 ` Philippe Benard
@ 1999-11-18 20:37 ` John David Anglin
1999-11-18 22:38 ` Frank Rowand
` (2 more replies)
0 siblings, 3 replies; 45+ messages in thread
From: John David Anglin @ 1999-11-18 20:37 UTC (permalink / raw)
To: Philippe Benard; +Cc: frowand, Philipp.H.Rumpf, sieler, parisc-linux
>
> John David Anglin wrote:
> >
> > I think what is being suggested is to run with the space registers all
> > zero and swap the TLB contents on context switches in order to change
> > the mapping from virtual to physical. However, I doubt this is efficient.
>
> This is what I understand now too. At first I thought it was a flat 4GB shared
> between the OS and the processes (a la 2GB each); then space flipping entered
> the picture, with a flat 4GB per thread (pardon, per process for now), and
> then a flat 4GB for the kernel itself.

The architecture that I proposed was suggested by figure 3-1 of the
Precision Architecture and Instruction Set Manual (1989). On a level 1
machine there are 2**16-1 virtual 4GB spaces. Thus, each process can
have its own virtual space. The physical page directory contains
the mapping for the kernel and all processes. The algorithm for
updating the TLB(s) from the page directory is very important for
efficiency and also for security.

I assume threads will run in a process's virtual space for efficiency.

> I don't really see the implementation described in a previous mail, where
> someone said the user VAS quadrant usage would be TEXT/DATA/SHARED1/SHARED2,
> if all the SRs are the same. I still have a ProtID problem: I don't see how
> sharing is implemented efficiently (TEXT/SHARED1/SHARED2). I bet this is
> addressed; I just don't have the design document.

This was me again. The TEXT/DATA/SHARED1/SHARED2 architecture is described
in the 32-bit PA-RISC Runtime Architecture Document. It is how hpux 10.20
does it. Take a look at Table 1, "Space Register Usage". However, it doesn't
really tell you anything about the OS implementation details. I do know
that hpux uses different spaces for text and data. As a result, branches
that cross quadrants must be interspace branches. The space of the caller
must be saved and restored on return. If the same space id is used in SR4-SR7
for any given process, then I don't think it would be necessary to save
and restore the instruction space register across calls except for system
calls where the virtual space changes.

The one advantage to reserving the fourth quadrant for the OS is that in
a system call the OS has direct access to the first three quadrants
of the process's address space as well as its own space. What I am
suggesting here is that the TLB page table mapping for the fourth quadrant
would be more or less the same except for access rights for the kernel
and user processes. Of course, hardware registers wouldn't be mapped
in process space. There could be a gateway page that messes with the
mapping to allow the kernel to run at a different location. But then
I think the system would have a more difficult job in accessing user
virtual space. Possibly, some special provisions would have to be made
for PDC calls in virtual mode.

I know that there are restrictions using mmap with hpux 10.X. More or
less, the problem is that you can't put a shared data area where you want
and this causes problems with many apps that assume a more general
implementation. The system chooses where shared data goes. This needs
to be looked at in more detail.

SR0 to SR4 can be changed by a non-privileged process. Thus, access
identities have to be set up properly to prevent a process from flipping
through the virtual address space and doing bad stuff. The OS must also
be prepared for a process that messes with SR4. A nasty process might
be able to change the space id in SR4 to that of another process or the
kernel, and do a system call that writes to this region.

For PA-RISC 2.0, the architecture is changed for 64-bit operation. I quote

"The OS will use a different address space layout for 64-bit processes,
so we will not be able to specify that main program text is at the
low end of Quadrant 0, nor will we be able to use absolute addressing
at all. This will affect millicode calls, long calls, plabel
materialization, and non-PIC literal references.

Compilers must avoid any explicit reference to space registers, so
there is no need to specify any association between particular
segments and quadrants of the address space."

Further, the 64-bit architecture defines a single PIC compilation model.
I have to say that the 64-bit document as of Version 3.3 (1997) is vague.

Probably, we should ignore the 64-bit issues and use an architecture
that is good for 32-bit level 1 machines and above.

> ------------------------------------------
> Some may say Linux is not about docs, it is hack n run, but in the long run
> I'm afraid that hack n run will mean typing more text (try-and-fail code)
> than writing down the design options. For instance, the ASL document for
> HP-UX wide is 22 pages in total, with TOC, figures, and pseudo-code
> algorithms.
> Remember Dijkstra (well, the old-timers may; the bambinos may take Kurt
> Cobain as an example :-) "You should pay the programmers a very good salary,
> don't hesitate to bump their salary, BUT make them pay for every punch in
> their punched cards." The idea behind it: think first, code after. Kurt tried
> it the other way, he chose to shoot first and think after, oops :-)

Agree totally.

In summary, I think that each process slot should be assigned a unique
space id. At least initially, the same id would be used for SR4-SR7 in
each process. Link the OS at 0xc0000000 and go with a 3GB process/1GB OS
virtual address model. All the physical memory in the machine can
therefore be used, up to the ~ 4GB limit in the PA 1.X architecture.
I think this model is compatible with the PA 1.1 architecture and hopefully
also the x86 architecture.

Dave
--
J. David Anglin dave.anglin@nrc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
^ permalink raw reply [flat|nested] 45+ messages in thread
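For readers following the 3GB process / 1GB OS proposal in the message above, here is a minimal C sketch of the address arithmetic it implies. It assumes only what the message states (a 32-bit offset split into four 1GB quadrants, kernel linked at 0xc0000000); the names `KERNEL_BASE`, `quadrant_of`, `quadrant_base` and `is_kernel_address` are made up for illustration and are not taken from any kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Proposed link address from the message above: the start of the
 * fourth quadrant (quadrant 3 when counting from 0). */
#define KERNEL_BASE 0xc0000000u

/* The quadrant is the top two bits of a 32-bit offset, so each
 * quadrant covers 1GB. */
static unsigned quadrant_of(uint32_t addr)
{
    return addr >> 30;
}

/* First address of quadrant q (q = 0..3). */
static uint32_t quadrant_base(unsigned q)
{
    assert(q < 4);
    return (uint32_t)q << 30;
}

/* Under the 3GB/1GB split, everything at or above KERNEL_BASE
 * belongs to the OS and everything below it to the process. */
static int is_kernel_address(uint32_t addr)
{
    return addr >= KERNEL_BASE;
}
```

With this split a process sees quadrants 0 to 2 (3GB) and the kernel owns quadrant 3, which is exactly the property debated in the replies that follow.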
* Re: [parisc-linux] depi? 1999-11-18 20:37 ` John David Anglin @ 1999-11-18 22:38 ` Frank Rowand 1999-11-19 4:12 ` Philipp Rumpf 1999-11-19 9:08 ` Philippe Benard 2 siblings, 0 replies; 45+ messages in thread From: Frank Rowand @ 1999-11-18 22:38 UTC (permalink / raw) To: parisc-linux John David Anglin wrote: > > > > > John David Anglin wrote: > > > > > > I think what is being suggested is to run with the space registers all > > > zero and swap the TLB contents on context switches in order to changing > > > the mapping from virtual to physical. However, I doubt this is efficient. > > > > This is what I undesrtand now too, at first I though it was flat 4Gb shared > > between the OS and the processes (ala 2Gb each), then space flip enter in > > action with flat 4Gb per threads (pardon proceses for now), and then a flat > > 4Gb for the kernel itself. > > The architecture that I proposed was suggested by figure 3-1 of the > Precision Architecture and Instruction Set Manual (1989). On a level 1 > machine there 2**16-1 virtual 4GB spaces. Thus, each process can > have its own virtual space. The physical page directory contains > the mapping for the kernel and all processes. The algorithm for > updating the TLB(s) from the page directory is very important for > efficiency and also for security. > > I assume threads will run in a processes virtual space for efficiency. > > > I don't really see the implementation of a previous mail of someone saying the > > user VAS quad usage would be TEXT/DATA/SHARED1/SHARE2 if all the SR's are the > > same, I still have some ProtID problem I don't see how efficiently sharing is > > implemented (TEXT/SHARED1/SHARED2). I bet this is adressed, I just don't have > > the design document. > > This was me again. The TEXT/DATA/SHARED1/SHARE2 architecture is described > in the 32-bit PA-RISC Runtime Architecture Document. It is how hpux 10.20 > does it. Take a look at Table 1, "Space Register Usage". 
However, it doesn't > really tell you anything about the OS implementation details. I do know > that hpux uses different spaces for text and data. As a result, branches > that cross quadrants must be interspace branches. The space of the caller > must be saved and restored on return. If the same space id is used in SR4-SR7 > for any given process, then I don't think it would be necessary to save > and restore the instruction space register across calls except for system > calls where the virtual space changes. > > The one advantage to reserving the fourth quadrant for the OS is that in > a system call the OS has direct access to the first three quadrants > of the processes address space as well as its own space. What I am > suggesting here is that the TLB page table mapping for the fourth quadrant > would be more or less the same except for access rights for the kernel > and user processes. Of course, hardware registers wouldn't be mapped > in process space. There could be a gateway page that messes with the > mapping to allow the kernel to run at a different location. But then > I think the system would have a more difficult job in accessing user > virtual space. Possibly, some special provisions would have to be made > for PDC calls in virtual mode. > > I know that there are restrictions using mmap with hpux 10.X. More or > less, the problem is that you can't put a shared data area where you want > and this causes problems with many apps that assume a more general > implementation. The system chooses where shared data goes. This needs > to be looked at in more detail. > > SR0 to SR4 can be changed by a non-privileged process. Thus, access > identities have to be setup properly to prevent a process from flipping > through the virtual address space and doing bad stuff. The OS must also > be prepared for a process that messes with SR4. 
A nasty process might > be able change the space id in SR4 to that of another process or the > kernel, and do a system call that writes to this region. > > For PA_RISC 2.0, the architecture is changed for 64-bit operation. I quote Just to avoid confusion, the following quote is from the 64-Bit Runtime Architecture, and applies to *** 'user-mode applications running in "Wide" mode' ***. > "The OS will use a different address space layout for 64-bit processes, > so we will not be able to specify that main program text is at the > low end of Quadrant 0, nor will we be able to use absolute addressing > at all. This will affect millicode calls, long calls, plabel > materialization, and non-PIC literal references. > > Compilers must avoid any explicit reference to space registers, so > there is no need to specify any association between particular > segments and quadrants of the address space." > > Further, the 64-bit architecture defines a single PIC compilation model. > I have to say that the 64-bit document as of Version 3.3 (1997) is vague. > > Probably, we should ignore the 64 bit issues and use an architecture > that is good for 32 bit level 1 machines and above. > > > ------------------------------------------ > > > Some may say, linux is not doc, it is hack n run, but on the long run I'm > > affraid that hack n run will type more text (try and fail code) than writing > > the design options. For instance the ASL document for HP-UX wide is 22 pages > > total, with TOC and figures, and pseudo-code algorithm. > > Remember Djikstra (well the old timer may, the bambinos may take Kurt Cobain > > as an example :-) "You should pay the programmers a very good salary, don't > > hesitate to bump their salary, BUT make them pay any puch in their punched > > card" The idea behind, think before code after, Kurt tried it the other way, > > he choose to shoot first and think after oops :-) > > Agree totally. 
> > In summary, I think that each process slot should be assigned a unique > space id. At least initially, the same id would be used for SR4-SR7 in > each process. Link the OS at 0xc0000000 and go with a 3GB process/1GB OS > virtual address model. All the physical memory in the machine can > therefore be used, up to the ~ 4GB limit in the PA 1.X architecture. > I think this model is compatible with the PA 1.1 architecture and hopefully > also the x86 architecture. > > Dave > -- > J. David Anglin dave.anglin@nrc.ca > National Research Council of Canada (613) 990-0752 (FAX: 952-6605) ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi?
1999-11-18 20:37 ` John David Anglin
1999-11-18 22:38 ` Frank Rowand
@ 1999-11-19 4:12 ` Philipp Rumpf
1999-11-19 9:08 ` Philippe Benard
2 siblings, 0 replies; 45+ messages in thread
From: Philipp Rumpf @ 1999-11-19 4:12 UTC (permalink / raw)
To: John David Anglin
Cc: Philippe Benard, frowand, Philipp Heinrich Rumpf, sieler, parisc-linux
> > This is what I understand now too. At first I thought it was a flat 4GB
> > shared between the OS and the processes (a la 2GB each); then space
> > flipping entered the picture, with a flat 4GB per thread (pardon, per
> > process for now), and then a flat 4GB for the kernel itself.

> The architecture that I proposed was suggested by figure 3-1 of the
> Precision Architecture and Instruction Set Manual (1989). On a level 1
> machine there are 2**16-1 virtual 4GB spaces. Thus, each process can
> have its own virtual space. The physical page directory contains
> the mapping for the kernel and all processes. The algorithm for
> updating the TLB(s) from the page directory is very important for
> efficiency and also for security.

I agree with the description.

> I assume threads will run in a process's virtual space for efficiency.

Is there any reason we shouldn't do it? It's just not a per-pid unique id we
put into the space registers, it's a per-process (Unix sense) one.

> > I don't really see the implementation described in a previous mail, where
> > someone said the user VAS quadrant usage would be
> > TEXT/DATA/SHARED1/SHARED2, if all the SRs are the same. I still have a
> > ProtID problem: I don't see how sharing is implemented efficiently
> > (TEXT/SHARED1/SHARED2). I bet this is addressed; I just don't have the
> > design document.
>
> This was me again. The TEXT/DATA/SHARED1/SHARED2 architecture is described
> in the 32-bit PA-RISC Runtime Architecture Document. It is how hpux 10.20
> does it. Take a look at Table 1, "Space Register Usage". However, it doesn't
> really tell you anything about the OS implementation details. I do know
> that hpux uses different spaces for text and data. As a result, branches
> that cross quadrants must be interspace branches. The space of the caller
> must be saved and restored on return. If the same space id is used in SR4-SR7
> for any given process, then I don't think it would be necessary to save
> and restore the instruction space register across calls except for system
> calls where the virtual space changes.

Using the same value in SR0, SR4-SR7 sounds like it will be the least
surprising. Throwing away the whole fourth quadrant when you only need two
pages out of it doesn't sound like an especially good idea (though it might be
more efficient in very limited circumstances).

> The one advantage to reserving the fourth quadrant for the OS is that in
> a system call the OS has direct access to the first three quadrants
> of the process's address space as well as its own space.

It still has (though I think this might be what you meant by "direct"). Load
SR3 with the content of SR[04567]. Explicitly use SR3 for all references to
user data.

> SR0 to SR4 can be changed by a non-privileged process. Thus, access
> identities have to be set up properly to prevent a process from flipping
> through the virtual address space and doing bad stuff.

This is an important point as soon as we have to consider system security in
more detail.

> The OS must also be prepared for a process that messes with SR4. A nasty
> process might be able to change the space id in SR4 to that of another
> process or the kernel, and do a system call that writes to this region.

> For PA-RISC 2.0, the architecture is changed for 64-bit operation. I quote
>
> "The OS will use a different address space layout for 64-bit processes,
> so we will not be able to specify that main program text is at the
> low end of Quadrant 0, nor will we be able to use absolute addressing
> at all. This will affect millicode calls, long calls, plabel
> materialization, and non-PIC literal references.
>
> Compilers must avoid any explicit reference to space registers, so
> there is no need to specify any association between particular
> segments and quadrants of the address space."
>
> Further, the 64-bit architecture defines a single PIC compilation model.
> I have to say that the 64-bit document as of Version 3.3 (1997) is vague.

The PA1.1 is vague/redundant as well (or at least I'll assume it is until
someone points out my mistake).

> Probably, we should ignore the 64-bit issues and use an architecture
> that is good for 32-bit level 1 machines and above.

Basically, the question with PA2.0 is "how do we access user memory out of
kernel space". OTOH, as PA2.0 virtual address offsets are 62 bits only, this
shouldn't be a problem (are they really 62 bits btw?).

> > ------------------------------------------
> > Some may say Linux is not about docs, it is hack n run, but in the long
> > run I'm afraid that hack n run will mean typing more text (try-and-fail
> > code) than writing down the design options. For instance, the ASL document
> > for HP-UX wide is 22 pages in total, with TOC, figures, and pseudo-code
> > algorithms.
> > Remember Dijkstra (well, the old-timers may; the bambinos may take Kurt
> > Cobain as an example :-) "You should pay the programmers a very good
> > salary, don't hesitate to bump their salary, BUT make them pay for every
> > punch in their punched cards." The idea behind it: think first, code
> > after. Kurt tried it the other way, he chose to shoot first and think
> > after, oops :-)

> Agree totally.

Who is Kurt and why do we have to go around on parisc-linux and insult him?

> In summary, I think that each process slot should be assigned a unique
> space id.

I agree, and have said so before (Date: Wed, 17 Nov 1999 14:00:19 +0100).

> At least initially, the same id would be used for SR4-SR7 in
> each process.

I don't see the need to ever change this _for PA1.1_. For PA2.0 (wide), our
largest flat address space is 62 bits and using the two bits to select space
registers is a good idea.

> Link the OS at 0xc0000000 and go with a 3GB process/1GB OS
> virtual address model.

We don't want that. We might want to link the OS at 0x8000 0000 and go with a
4GB process / 2GB OS model. We might want to link the OS at 0x0000 0000 and go
with a 4GB process / 4GB OS model.

> All the physical memory in the machine can therefore be used, up to the ~ 4GB
> limit in the PA 1.X architecture. I think this model is compatible with the
> PA 1.1 architecture and hopefully also the x86 architecture.

Why does it have to be compatible with the x86 architecture?

Philipp Rumpf
^ permalink raw reply [flat|nested] 45+ messages in thread
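The "load SR3 with the user's space id and use SR3 explicitly" suggestion in the message above can be illustrated with a toy C model of space-register selection. It assumes the usual PA-RISC 1.x rule for loads and stores with a 2-bit space field: a field of 0 selects SR4-SR7 from the top two bits of the base address, while 1 to 3 name SR1-SR3 directly. The `struct space_regs`, `select_space` and `enter_kernel` names are hypothetical, and the space ids in the usage note are arbitrary; this is a sketch of the idea, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the eight space registers SR0..SR7. */
struct space_regs {
    uint32_t sr[8];
};

/* Pick the space id used by a load/store.
 *   s == 0    : implicit selection, SR(4 + top two bits of base)
 *   s == 1..3 : the instruction names SR1..SR3 explicitly        */
static uint32_t select_space(const struct space_regs *r, unsigned s,
                             uint32_t base)
{
    assert(s <= 3);
    return (s == 0) ? r->sr[4 + (base >> 30)] : r->sr[s];
}

/* Hypothetical kernel entry: the kernel's own space goes into
 * SR4-SR7 and the interrupted process's space id is parked in SR3,
 * so user data is reached only through explicit SR3 references. */
static void enter_kernel(struct space_regs *r, uint32_t kernel_space,
                         uint32_t user_space)
{
    for (int q = 4; q < 8; q++)
        r->sr[q] = kernel_space;
    r->sr[3] = user_space;
}
```

After `enter_kernel(&r, 0, 42)`, an implicit reference to any offset resolves to kernel space 0, while the same offset accessed through SR3 resolves to user space 42. In this model that holds in every quadrant, which is why no quadrant needs to be carved out of the process's 4GB.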
* Re: [parisc-linux] depi?
1999-11-18 20:37 ` John David Anglin
1999-11-18 22:38 ` Frank Rowand
1999-11-19 4:12 ` Philipp Rumpf
@ 1999-11-19 9:08 ` Philippe Benard
2 siblings, 0 replies; 45+ messages in thread
From: Philippe Benard @ 1999-11-19 9:08 UTC (permalink / raw)
To: John David Anglin; +Cc: frowand, Philipp.H.Rumpf, sieler, parisc-linux
Nice answer, at least it clarifies some design options.

John David Anglin wrote:
>
> The architecture that I proposed was suggested by figure 3-1 of the
> Precision Architecture and Instruction Set Manual (1989). On a level 1
> machine there are 2**16-1 virtual 4GB spaces. Thus, each process can
> have its own virtual space. The physical page directory contains
> the mapping for the kernel and all processes. The algorithm for
> updating the TLB(s) from the page directory is very important for
> efficiency and also for security.
>
> I assume threads will run in a process's virtual space for efficiency.

IMHO efficiency is a side effect. By definition user threads are part of a
process, i.e. they share the process VAS, so your assumption is more than
valid.

>
> This was me again. The TEXT/DATA/SHARED1/SHARED2 architecture is described
> in the 32-bit PA-RISC Runtime Architecture Document. It is how hpux 10.20
> does it. Take a look at Table 1, "Space Register Usage". However, it doesn't
> really tell you anything about the OS implementation details. I do know
> that hpux uses different spaces for text and data.

Not always. hpux has a lot of chatr(1) tricks to manage the quadrant usage
(again, don't beat me, this is a consequence of the segmented architecture).
For a regular executable (non-sharable text) the magic number is EXEC_MAGIC,
and for that kind of executable quad1 and quad2 (i.e. the quadrants whose
index, taken from the 2 high bits of the 32-bit address, is 0 and 1) are the
same space. This allows the DATA/BSS to be placed right after the TEXT, giving
a linear 2GB (whoa) space, while leaving quad3 and quad4 for sharable data
(shm and mmap).

On the other hand, a shared executable (SHARE_MAGIC) has a quad1 (for TEXT)
that is shared among several processes and therefore has its own unique space;
all the processes using this area share it with the same space AND the same
offset (more comments below). Quad2 is process-private DATA, so indeed each
SHARE_MAGIC process has its own private space (a different space id).

> As a result, branches
> that cross quadrants must be interspace branches. The space of the caller
> must be saved and restored on return. If the same space id is used in SR4-SR7
> for any given process, then I don't think it would be necessary to save
> and restore the instruction space register across calls except for system
> calls where the virtual space changes.

In my EXEC_MAGIC example there is a common space id for quad1 and quad2, and
as in your example, where you generalize to all four quadrants being equal, it
is true that there is no SR jazz to do for calls.

This brings me to another question: how is sharing done with Linux on hppa?
Are you planning to use 'limited' aliasing? For now I don't see how sharing
and copy-on-write could be accomplished. Even though those concepts are
high-level concerns, they are driven by the capabilities of the architecture,
and virtual address aliasing is one of the weak areas of hppa, IMHO.

>
> The one advantage to reserving the fourth quadrant for the OS is that in
> a system call the OS has direct access to the first three quadrants
> of the process's address space as well as its own space.

This is an interesting point: must the OS share a lot with a process?
Personally I think the UAREAs of the set of threads look like enough to me; I
don't see the point of the OS being able to read/write the 3GB of the process.

However, something that doesn't exist on Unix, and that I think would be nice
to have, is a shared area between the process and the kernel, which the
process (and its threads) can only read and which the kernel can read and
write. The idea is to put some credentials there, even things that usually
live in proc.h and kthread.h like pid, tid, lastrun, various times, etc., to
allow a process/thread to get this data with a pointer dereference. I know I
needed this for process tracing: the tracing lib would read CR16 and manage
the PA1.1 roll-over by watching for a change in the elapsed time in the uarea,
since CR16 rolls over after multiple seconds, while the OS does an update on
each schedule, which is sure to happen at least every 10ms.

Today the UAREA is part of the user VAS but is not readable and of course not
writable. I think that a sharable part would be helpful...

So I'm still curious about sharing on Linux + hppa. If someone has info on
this...

Phi
--
mailto:phi@hpfrcu81.france.hp.com
WTEC Project. Kernel debugging tools
^ permalink raw reply [flat|nested] 45+ messages in thread
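The CR16 roll-over problem mentioned in the message above can be sketched in C. Note that this is not the scheme Phi describes (which cross-checks against a kernel-maintained elapsed time in a shared area); it is the simpler delta-accumulation variant, which only works if the counter is sampled more often than it wraps. It assumes a 32-bit free-running counter, and `struct tick_clock` and `tick_clock_update` are hypothetical names invented for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Software extension of a wrapping 32-bit tick counter to 64 bits. */
struct tick_clock {
    uint32_t last;   /* most recent raw 32-bit sample       */
    uint64_t total;  /* ticks accumulated since the start   */
};

/* Feed in a new raw sample and return the 64-bit running total.
 * The subtraction is done modulo 2^32, so a single wrap between two
 * samples still yields the right delta. Two or more wraps between
 * samples would go unnoticed, which is the gap a kernel-updated
 * elapsed time would close. */
static uint64_t tick_clock_update(struct tick_clock *c, uint32_t raw)
{
    assert(c);
    c->total += (uint32_t)(raw - c->last);
    c->last = raw;
    return c->total;
}
```

If the kernel refreshed a read-only shared page on every schedule, as proposed above, a tracing library could use that timestamp to detect the multi-wrap case that this sketch cannot see.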
* Re: [parisc-linux] depi?
1999-11-17 22:05 ` Philipp Rumpf
1999-11-17 22:39 ` John David Anglin
@ 1999-11-17 23:02 ` Frank Rowand
1999-11-17 23:25 ` Philipp Rumpf
1 sibling, 1 reply; 45+ messages in thread
From: Frank Rowand @ 1999-11-17 23:02 UTC (permalink / raw)
To: parisc-linux
Philipp Rumpf wrote:

< stuff deleted - I don't need to argue, my face is already turning blue... >

> What I really want to know is which algorithm do recent CPUs use to get the
> cache bank index / cache tag.

< stuff deleted >

> [note: what follows is what I think is a proof we can do what I want to do.
> It is very likely to contain formal/grammatical mistakes but I think it should
> be valid nonetheless].
>
> The Rules

< stuff deleted >

> Any mistakes to point out? Misunderstandings in the specification? Otherwise
> could the specification please get fixed?
>
> Philipp Rumpf

I didn't bother reading the proof, so no arguments about it. I already thought
through what is probably a similar proof for myself.

The problem is that no matter how obvious it is to us software folks that it
would be braindead, illogical, or nearly impossible for the hardware to behave
in strange ways, the hardware folks are incredibly devious at making the
hardware more effective, within the constraints of the architecture (and, on
occasion, outside the constraints). This results in behaviour that may seem
unreasonable to a software person.

***** I attempt to code within the ARCHITECTURE, not to implement what
specific hardware implementations let me get away with. That way I don't get
burnt by the creative hardware engineers, who might be pushing the envelope.
*****

-Frank
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi?
1999-11-17 23:02 ` Frank Rowand
@ 1999-11-17 23:25 ` Philipp Rumpf
0 siblings, 0 replies; 45+ messages in thread
From: Philipp Rumpf @ 1999-11-17 23:25 UTC (permalink / raw)
To: frowand; +Cc: parisc-linux
> The problem is that no matter how obvious it is to us software folks that it
> would be braindead, illogical, or nearly impossible for the hardware to
> behave in strange ways, the hardware folks are incredibly devious at making
> the hardware more effective, within the constraints of the architecture
> (and, on occasion, outside the constraints). This results in behaviour that
> may seem unreasonable to a software person.

If the hardware gives us cache problems, it is not in compliance with the
architecture specification (OR my proof is wrong, but you "didn't bother to
read it"). If it doesn't, we're just fine. (OR ...).

> ***** I attempt to code within the ARCHITECTURE, not to implement what
> specific hardware implementations let me get away with. That way I don't
> get burnt by the creative hardware engineers, who might be pushing the
> envelope. *****

Shrug. Both versions are in compliance with the architecture. (OR ...)

Note especially that on PA2.0 in wide mode, there is an explicit exception as
well, so mapping the memory at 0x0000 0000 0000 0000, 0x4000 0000 0000 0000,
0x8000 0000 0000 0000 or 0xc000 0000 0000 0000 is all okay.

Philipp Rumpf
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [parisc-linux] depi? 1999-11-16 16:08 ` Philipp Rumpf 1999-11-16 17:14 ` Alan Cox 1999-11-16 21:43 ` Frank Rowand @ 1999-11-17 8:14 ` Philippe Benard 2 siblings, 0 replies; 45+ messages in thread From: Philippe Benard @ 1999-11-17 8:14 UTC (permalink / raw) To: Philipp Rumpf; +Cc: frowand, parisc-linux, Alex deVries Philipp Rumpf wrote: > > > This is just one of several recent messages dealing with the issues caused by > > locating the kernel at virtual address 0xc0000000 instead of 0x00000000. I Hi All, Sorry for beeing a little out of phase with you this is the magic of the timezone :-) On the other I can read all the trail before reply. First of all I may have started the 'trouble' by saying low virtual addr (that can be mapped to low physical addr generally know as equiv map) I didn't meant necessary starting at 0x0, I think I even say start mapping at anything after page 0, it seems obvious that catch NULL deref in kernel mode is essential. Regarding factoring kernel text pages, it could be accomplished by either block tlb or super page depending on arch capability. There where a reference about 'HP way of thinking the OS' versus rest of the world, I would like to mention here that based on the french proverb 'who loves well flame well' I love hp-ux and I'm the first to blame a lot of its weaknesse. When I writing here I put my 'HP view' on the side, yet I'm thinking the architecture we want to write an OS on will influence the OS design. Here we have a 1Gb segmented machine, with virtual cache, and TLB, this is definitly different from a 68K linear 4Gb with MMU. I think mastering hppa is tough, and a laverage what the hpux designer did well or even goofed if any would save time. There is a reference about recognising a user pointer and a kernel pointer, that something I don't understand due to lack of knowledge of the underlying design. 
This sounds like (hope I'm wrong) user pointer and kernel pointer are recognisable by their hi-bits (i.e quad selector), this would mean that 0xCafeCafe is recognised as kernel addr while 0x000F0FF is recognised as a user addr, this imply the 0xc....... is not part of the user virtual adresse space, this imply a user process is not 4 Gb capable (well it never will on PA1.1 since we indeed need a UAREA, IO space, gateway page) This fear about user proces VAS (virtual adresse space) seem confirmed by another reference to 'avoiding to use the space register at all) using space register is the way to design multiple VAS, i.e one for user process for instance, and one completly different for the kernel space, i.e how we use those quadran in a given mode, I saw a reference about the user space layout (text/data/shared data) but yet nothing abot the kernel, (kernel text/ data/ buffer cache/ other stuff) I think the user virtual adress space must be design to allow a somewhat linear view, allowing big chunck of vm, for instance a 2.5 Gb malloc (or mmap). HP design is not good at that, while a new OS can learn what was wrong with HP-UX and try to do better, on the other hand if the 0xcxxxxxxx is simply removed from the user VAS, I think it is worst than HP-UX (process limited to 3 Gb by lack of last quad) Now an hacker need. Dunno how it fit with current implementation of vmlinux, but in case someday someone want to design a kernel tracer, that can be install/desinstalled on the fly, a common hppa dependent implementation is to divert the kernel code flow, i.e patch an instruction with a branch somewhere, this could be assimilated to patching with a break instruction, as any debuger the original instruct before patch would be saved for differed execution. For break kind of implementation, this would mean debugger kind of ptrace, i.e get the trap, restore instruction , single step, re-install the break, blah blah. 
This is costly for tracing. Another implementation is simply to stick in a branch to a tracer stub, and the tracer will one way or another execute the original saved instruction. This limits patchable instructions to ones that can be deferred (i.e. non-pc-relative and not a branch or delay slot kind); let's simplify with ld/st only. Then you will discover the only possible instruction to use for branching is the BLE within low virtual addr, i.e. using no base register:

    BLE trace_stub(0,0)

The tracer gets the caller rp in r31 and can then identify the tracepoint and manage it. A single instruction patch is the easiest way to do it, especially on MP: no spinlock is needed to patch the trace point. This is what ktracer is doing on hpux, and this is what I'm doing in my own kernel tracer; removing this possibility somewhat reduces the supportability of the target OS. I understand though that linux comes with sources, so it is just a matter of turning on some compile flags to get any trace you want, but it happens that on some critical sites, being able to trace on the fly a kernel that was not compiled for this on purpose is an ass saver (sometimes) (well, the tracer may panic :-)

So the current 0xc0000000 may still allow this kind of tracer, assuming we can get kernel pages on the fly in those low virtual addrs.

I reckon it is not easy to see one's old habits going away; there were interesting things with hpux :-) Well, long live vmlinux though. I bet my kids are laughing (they are linux fans, so we get animated meals :-)

Phi
--
mailto:phi@hpfrcu81.france.hp.com
WTEC Project. Kernel debugging tools

^ permalink raw reply [flat|nested] 45+ messages in thread
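The concern raised in the message above, that a pointer might be classified as user or kernel purely by its two high bits (the quadrant selector), can be illustrated with a small sketch. This only models the guess stated in the message; it is not taken from the parisc-linux sources, and the names `quadrant` and `looks_like_kernel_pointer` as well as the choice of quadrant 3 for the kernel are assumptions made for illustration.

```python
def quadrant(addr):
    """Return the 1 GB quadrant (0..3) selected by the top two bits
    of a 32-bit address."""
    return (addr & 0xFFFFFFFF) >> 30


def looks_like_kernel_pointer(addr, kernel_quadrant=3):
    """Hypothetical check: treat an address as a kernel pointer when it
    falls in the quadrant assumed to hold the kernel (0xc0000000 and up)."""
    return quadrant(addr) == kernel_quadrant


# The two addresses used as examples in the message.
print(hex(0xCAFECAFE), looks_like_kernel_pointer(0xCAFECAFE))
print(hex(0x000F0FF), looks_like_kernel_pointer(0x000F0FF))
```

Under such a scheme every address from 0xc0000000 upward is unavailable to user processes, which is the 3 Gb limit the message worries about.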
* Re: [parisc-linux] depi?
1999-11-15 8:08 [parisc-linux] depi? Alex deVries
1999-11-15 7:24 ` Jeffrey A Law
1999-11-15 7:36 ` Stan Sieler
@ 1999-11-15 8:19 ` Philipp Rumpf
2 siblings, 0 replies; 45+ messages in thread
From: Philipp Rumpf @ 1999-11-15 8:19 UTC (permalink / raw)
To: Alex deVries; +Cc: parisc-linux

> ; Get ready for phys->virt transition
> ; First order of business is to adjust some pointers
> depi 3,1,2,%arg0 ; phys->virt(free mem ptr)
> depi 3,1,2, %sp ; phys->virt SP
> depi 3,1,2, %dp ; p2v DP
>
> in head.S?
>
> I don't have a 'depi' in the index of my PA 2.0 assembler book. I have a
> depwi and a depdi though.

This should be equivalent to what depwi is in PA 2.0 (some mnemonics changed).

More importantly, those instructions shouldn't be here. There are standard macros tophys and tovirt defined to do just the right thing to translate logical addresses from/to physical ones. Someone should go around and clean it up.

If your file uses depi to do what tophys / tovirt should do, you have 24 hours to clean it up. If you don't, I'll clean up the rest of your file as well. You have been warned.

Philipp Rumpf

^ permalink raw reply [flat|nested] 45+ messages in thread
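For readers following the question, the effect of `depi 3,1,2,%reg` can be modelled in a few lines. The sketch below is a plain Python model of the 32-bit deposit-immediate as described earlier in the thread (immediate, rightmost bit position, field length, target), using the PA-RISC convention that bit 0 is the most significant bit. The function name `depi` and the range check are choices made for this illustration; it says nothing about how the tophys / tovirt macros are actually written.

```python
MASK32 = 0xFFFFFFFF


def depi(imm, pos, length, target):
    """Model of a 32-bit deposit-immediate.

    Places the low `length` bits of `imm` into `target` so that the
    field's rightmost bit sits at bit `pos`, where bit 0 is the most
    significant bit and bit 31 the least significant. All other bits
    of `target` are left unchanged.
    """
    # Modelling choice: reject fields that would not fit in the word.
    if not (0 <= pos <= 31 and 1 <= length <= pos + 1):
        raise ValueError("field does not fit in a 32-bit word")
    shift = 31 - pos                      # distance of the field from the LSB
    low_bits = (1 << length) - 1
    field_mask = low_bits << shift
    value = (imm & low_bits) << shift
    return ((target & ~field_mask) | value) & MASK32


# depi 3,1,2 sets the two most significant bits of the register.
print(hex(depi(3, 1, 2, 0x00123450)))
```

For a physical address whose top two bits are clear, setting them this way gives the same result as adding 0xc0000000, which is why the snippet in head.S works as a phys->virt adjustment.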
end of thread, other threads:[~1999-11-19 9:04 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
1999-11-15 8:08 [parisc-linux] depi? Alex deVries
1999-11-15 7:24 ` Jeffrey A Law
1999-11-15 7:36 ` Stan Sieler
1999-11-15 8:25 ` Philipp Rumpf
1999-11-15 23:14 ` Frank Rowand
1999-11-16 0:26 ` John David Anglin
1999-11-16 12:39 ` Matthew Wilcox
1999-11-16 17:17 ` Philipp Rumpf
1999-11-16 8:26 ` Philippe Benard
1999-11-16 12:20 ` Alan Cox
1999-11-16 11:53 ` Philippe Benard
1999-11-16 12:58 ` Alan Cox
1999-11-16 15:55 ` John David Anglin
1999-11-17 13:00 ` Philipp Rumpf
1999-11-16 12:35 ` Matthew Wilcox
1999-11-16 16:08 ` Philipp Rumpf
1999-11-16 17:14 ` Alan Cox
1999-11-16 16:47 ` Philipp Rumpf
1999-11-16 17:50 ` Alan Cox
1999-11-17 0:06 ` Grant Grundler
1999-11-17 6:21 ` Philipp Rumpf
1999-11-17 18:57 ` Stan Sieler
1999-11-17 19:29 ` Philipp Rumpf
1999-11-17 20:01 ` Stan Sieler
1999-11-17 20:33 ` Philipp Rumpf
1999-11-16 21:43 ` Frank Rowand
1999-11-17 6:12 ` Philipp Rumpf
1999-11-17 18:56 ` Frank Rowand
1999-11-17 22:05 ` Philipp Rumpf
1999-11-17 22:39 ` John David Anglin
1999-11-17 22:52 ` Philipp Rumpf
1999-11-17 23:37 ` Stan Sieler
1999-11-18 0:09 ` Philipp Rumpf
1999-11-18 0:43 ` Frank Rowand
1999-11-18 1:35 ` Frank Rowand
1999-11-18 5:33 ` John David Anglin
1999-11-18 8:02 ` Philippe Benard
1999-11-18 20:37 ` John David Anglin
1999-11-18 22:38 ` Frank Rowand
1999-11-19 4:12 ` Philipp Rumpf
1999-11-19 9:08 ` Philippe Benard
1999-11-17 23:02 ` Frank Rowand
1999-11-17 23:25 ` Philipp Rumpf
1999-11-17 8:14 ` Philippe Benard
1999-11-15 8:19 ` Philipp Rumpf