* Re: [parisc-linux] Linux syscall ABI
@ 2000-02-16 13:57 John Marvin
From: John Marvin @ 2000-02-16 13:57 UTC
  To: parisc-linux


> This sounds to me like a typical case of doing a static optimization (is
> this a memcpy() to I/O space, from I/O space, to and from I/O space) at
> runtime.

I believe there are some cases in the graphics libraries where it is
not known until runtime whether the destination will be IO (framebuffer)
or memory. But I also tend to agree with you. 99.9% of the uses of memcpy
will not be for IO, so it probably would have made more sense for the
graphics libraries, and any other code that might be handed a pointer
to IO space, to handle that case themselves rather than having the test
in memcpy.


> > But perhaps we can have a 16 Mb offset instead.
>
> I think not mapping the first 64 KB and making a copy of page 0 somewhere
> else would make sense.  Then we could use the first 64 KB of the virtual
> address space to implement gateway pages.

We can probably use a smaller offset than 16 Mb, but 64 Kb won't work.  We
have to make sure that kernel virtual addresses are equivalently aliased
with their physical addresses (i.e. a virtual address and its physical
address must select the same lines in the cache). 64 Kb would work on a
712, but it won't work on a C3000.  Currently PCXU supports a maximum
external direct-mapped cache size of 4 Mb, and I don't think that has been
increased for PCXW.  I'm not sure what the largest direct-mapped cache
actually implemented is, but I know it is at least 2 Mb.  Of course, to
take full advantage of large pages, it might make sense to use a larger
offset, i.e. 64 Mb.
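A minimal sketch of that constraint, under a simplifying assumption: the cache is direct mapped and indexed by the low bits of the address, so a kernel virtual address pa + offset lands on the same cache line as pa only when offset is a multiple of the cache size. The helper name is hypothetical; the 4 Mb size is the PCXU maximum mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KB(x) ((uint32_t)(x) << 10)
#define MB(x) ((uint32_t)(x) << 20)

/* Hypothetical helper.  Simplified model: a direct-mapped cache indexed
 * by the low bits of the address.  Mapping physical address pa at virtual
 * address pa + offset keeps both on the same cache line only if offset is
 * a multiple of the cache size. */
static bool offset_is_equivalently_aliased(uint32_t offset, uint32_t cache_size)
{
    /* cache sizes are assumed to be powers of two */
    assert(cache_size != 0 && (cache_size & (cache_size - 1)) == 0);
    return (offset & (cache_size - 1)) == 0;
}
```

With a 4 Mb cache, a 64 Kb offset fails this check, while 16 Mb and 64 Mb offsets (and an offset of 0) pass.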

Rereading what you said above made me realize that you probably were not
talking about a 64 Kb offset. If so, then you are talking about
still using an offset of 0, but just not mapping the first 64 Kb of memory,
i.e. throwing those pages "away" (actually we can probably find ways
to use them). The only problem with this is that we would be prevented
from using maximally large tlb mappings to map the first 64 Mb of memory,
since a 64 Mb mapping has to start on a 64 Mb boundary.
If we moved the offset to 64 Mb we could use 64 Mb page size mappings
to map the kernel address space. The cost of this is that it reduces
the amount of physical memory we can support.  But we can't support 4 Gb
anyway (at least not easily), since we need virtual space for the vmalloc
area, so I'm not sure losing 64 Mb of virtual space at the bottom end is
that much of an issue.
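To make the cost of the 0 offset concrete, here is a sketch that counts how many naturally aligned mappings it takes to cover a virtual range, greedily using the largest page that fits at each step. The page-size list (powers of 4 from 4 Kb to 64 Mb) is an assumption for illustration, and count_mappings is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page sizes: powers of 4 from 64 Mb down to 4 Kb, largest first.
 * Each mapping must start on a multiple of its own size. */
static const uint64_t page_sizes[] = {
    64ull << 20, 16ull << 20, 4ull << 20, 1ull << 20,
    256ull << 10, 64ull << 10, 16ull << 10, 4ull << 10,
};

/* Hypothetical helper: number of mappings needed to cover [start, end),
 * taking at each step the largest page that is aligned at the current
 * address and does not extend past end. */
static unsigned count_mappings(uint64_t start, uint64_t end)
{
    assert(((start | end) & 0xfff) == 0); /* 4 Kb granularity */
    unsigned n = 0;
    uint64_t addr = start;
    while (addr < end) {
        for (size_t i = 0; i < sizeof page_sizes / sizeof page_sizes[0]; i++) {
            uint64_t sz = page_sizes[i];
            if (addr % sz == 0 && addr + sz <= end) {
                addr += sz;
                n++;
                break;
            }
        }
    }
    return n;
}
```

Covering virtual [64 Kb, 64 Mb), the 0-offset case with the first 64 Kb unmapped, takes 15 mappings (three each of 64 Kb, 256 Kb, 1 Mb, 4 Mb and 16 Mb), while the same 64 Mb of memory placed at a 64 Mb offset takes one.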

What is the largest amount of physical memory we want to support for the
32 bit implementation?  How hard do we want to work to achieve it?  We
can't support more than 4 Gb.  It would take some work to support 4 Gb.
My feeling is that if we supported 3.5 Gb max that would be more than
adequate.  We could use a 64 Mb offset and use 64 Mb page size mappings to
cover the kernel address space.  This should leave enough space for the
vmalloc area.
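The arithmetic behind the 3.5 Gb figure, as a small sketch: out of a 4 Gb kernel virtual space, a 64 Mb offset plus 3.5 Gb of directly mapped physical memory leaves 448 Mb for vmalloc, and 3.5 Gb is exactly 56 mappings of 64 Mb. This assumes, as the text implies, that the bottom offset, the direct mapping and vmalloc are the only consumers of kernel virtual space; the helper name is hypothetical.

```c
#include <stdint.h>

#define MB(x) ((uint64_t)(x) << 20)
#define GB(x) ((uint64_t)(x) << 30)

/* Kernel virtual space (space 0) on the 32-bit kernel. */
#define KERNEL_VA_SIZE GB(4)

/* Hypothetical helper: virtual space left for vmalloc after reserving
 * `offset` bytes at the bottom and directly mapping `phys_mem` bytes.
 * Returns 0 if nothing is left. */
static uint64_t vmalloc_space(uint64_t offset, uint64_t phys_mem)
{
    if (offset + phys_mem >= KERNEL_VA_SIZE)
        return 0;
    return KERNEL_VA_SIZE - offset - phys_mem;
}
```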

> >
> > I like this idea.  The only disadvantage is that if the user modifies sr2
> > by mistake, all of a sudden all of the syscalls stop working (for that
> > process only).
>
> I don't see a real problem with that.  Modifying SR2 requires either direct
> modification (the only code I could see doing that is HP/UX code, which isn't
> supposed to execute with PER_LINUX anytime soon) or executing random bytes,
> which will always break in unexpected ways.
>

I agree that it is not a significant enough problem to stop us from doing
this. So, I propose the following:

    1) When we move the kernel virtual mappings we will leave room at
    the bottom to a) properly trap on null pointer dereferences, and
    b) provide room for a Linux syscall gateway page in the kernel
    address space (space 0). This gateway page will be located at an
    offset within the positive offset range of a ble instruction.

    2) We will set sr2 to zero for each process.

    3) We will only map an HP-UX syscall gateway page into HP-UX
    processes, i.e. we will not map any gateway page into the user
    address space for PER_LINUX processes.

    4) Linux syscalls will use the following two-instruction sequence
    to reach the gateway page:

	ble <gateway offset>(%sr2,%r0)
	ldi <syscall #>,%r20
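A rough C model of what this two-instruction stub does, to make the role of sr2 concrete. It assumes the usual PA-RISC semantics: ble takes its target space from the space register named in the instruction and its offset from base + displacement, executes the next instruction (the ldi) in its delay slot, and leaves the return offset (the address after the delay slot) in %r31 and the return space in %sr0. The struct and function names are hypothetical, and privilege-level bits in the return offset are ignored.

```c
#include <stdint.h>

/* Hypothetical model of the relevant machine state. */
struct cpu {
    uint32_t sr[8];   /* space registers */
    uint32_t gr[32];  /* general registers; gr[0] always reads as 0 */
    uint32_t pc_space, pc_offset;
};

/* Model "ble disp(%srN,%rB)" with "ldi imm,%r20" in the delay slot. */
static void ble_ldi(struct cpu *c, int32_t disp, unsigned srn, unsigned rb,
                    int32_t imm)
{
    uint32_t ret_space = c->pc_space;
    uint32_t ret_offset = c->pc_offset + 8;      /* skip ble + delay slot */
    uint32_t target = (rb ? c->gr[rb] : 0) + (uint32_t)disp;
    uint32_t target_space = c->sr[srn];          /* read before sr0 is written */

    c->gr[20] = (uint32_t)imm;   /* delay slot: syscall number into %r20 */
    c->gr[31] = ret_offset;      /* return link */
    c->sr[0] = ret_space;        /* return space */
    c->pc_space = target_space;
    c->pc_offset = target;
}
```

With sr2 = 0 the branch lands in space 0 at the gateway offset; if a program loads its own space id into sr2, the same stub lands at that offset in its own address space instead.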

So, if anyone has a significant problem with this proposal, speak up.

John Marvin
jsm@fc.hp.com

* Re: [parisc-linux] Linux syscall ABI
@ 2000-02-17 14:17 John Marvin
From: John Marvin @ 2000-02-17 14:17 UTC
  To: parisc-linux

>
> I don't think I care.  Note that this is for PA1.1 anyway, so we don't have
> large pages architecturally.

But we do have block tlbs, an even scarcer resource, which can also
map a maximum of 64 Mb each and which also need to be aligned to
the same size that they map. For this reason it might be worth
considering a 64 Mb offset rather than a 0 offset with the first 64 Kb
left unmapped.

>
> I'd prefer to fix the gateway offset now - it's a pretty arbitrary decision,
> but it might break binary compatibility later on.  My proposal is 0x100.

That's fine with me. We can put break instructions in 0x00-0xfc to catch
anyone branching through a null function pointer.
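A sketch of that gateway page layout: the 64 words from 0x00 to 0xfc hold break instructions and the syscall entry starts at 0x100. The all-zero word used for "break 0,0" is an assumed encoding, and the helper and macro names are hypothetical.

```c
#include <stdint.h>

#define GATEWAY_PAGE_WORDS (4096u / 4)  /* one 4 Kb page of instructions */
#define SYSCALL_ENTRY      0x100u       /* gateway offset agreed above */
#define INSN_BREAK_0_0     0x00000000u  /* assumed encoding of "break 0,0" */

/* Hypothetical helper: fill every word below the syscall entry point with
 * a break instruction so a branch through a null (or small) function
 * pointer traps.  Returns the number of words filled. */
static unsigned fill_null_trap_area(uint32_t page[GATEWAY_PAGE_WORDS])
{
    unsigned n = SYSCALL_ENTRY / 4;   /* offsets 0x00-0xfc: 64 words */
    for (unsigned i = 0; i < n; i++)
        page[i] = INSN_BREAK_0_0;
    return n;
}
```

The entry code itself would then be placed starting at word 64 (offset 0x100), which the fill leaves untouched.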

John Marvin
jsm@fc.hp.com

* Re: [parisc-linux] Linux syscall ABI
@ 2000-02-15  5:36 John Marvin
From: John Marvin @ 2000-02-15  5:36 UTC
  To: parisc-linux

> I don't think reserving 0xFXXX XXXX for I/O in userspace is a good idea.
> There is no problem with doing userspace I/O using the normal mmap /dev/mem
> approach.  (Except maybe HPUX compatibility, which doesn't concern linux-
> only processes).

...

> kernel memcpy() shouldn't ever be called with either an IO or a user address

I was referring to user space memcpy, not kernel memcpy.  The HP-UX user
space memcpy supports use with IO mapped addresses; however, it has to
differentiate those addresses in order to avoid optimizations that don't
work with IO mapped addresses. Having a dedicated range allows for an
easy test. But if this is not desirable, perhaps we can just say that
Linux glibc memcpy is not supported for IO mapped addresses (assuming it
is optimized).
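A sketch of the "easy test" a user-space memcpy could make if 0xF0000000-0xFFFFFFFF were reserved for IO mappings. Because that range sits at the top of the 32-bit space, a non-empty range touches IO space exactly when its last byte is at or above 0xF0000000. The function names are hypothetical, and what the IO-safe path would actually do is left open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IO_BASE 0xF0000000u   /* proposed IO range: 0xF0000000-0xFFFFFFFF */

/* Hypothetical: true if [addr, addr + len) overlaps the IO range.
 * The sum is computed in 64 bits so it cannot wrap. */
static bool range_touches_io(uint32_t addr, size_t len)
{
    return len != 0 && (uint64_t)addr + len - 1 >= IO_BASE;
}

/* Hypothetical: an optimized memcpy would check this once up front and
 * fall back to an IO-safe copy loop when it returns true. */
static bool memcpy_needs_io_path(uint32_t dst, uint32_t src, size_t len)
{
    return range_touches_io(dst, len) || range_touches_io(src, len);
}
```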

> > One disadvantage of this proposal is that we could not support the
> > System V personality null pointer dereference behaviour. This maps
> > a page of zeros at location 0 so that null pointer dereferences will
> > return 0 for buggy software. Do we really still need to maintain this
> > ancient hack?
>
> No, we don't.  We're talking about PER_LINUX binaries here, and those
> never expected to be able to dereference NULL pointers.

I don't know much about PER_SVR4 or why it exists.  Willy pointed it
out to me.  From the kernel source it looks like it may only be there
for sparc.  If parisc-linux doesn't need to support it, then
there is no issue. If it does, I had assumed that PER_SVR4
binaries would use the same gateway page as PER_LINUX binaries.

> Of course every page in the region 0xfffc0000 - 0x0003fffc (it's a 17-bit
> signed immediate shifted left 2 bits, so that should be -2^18 to 2^18-4)
> can be used, so we just need a page within the first 256 KB.
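That range can be checked directly: with %r0 as the base, the ble target is just the sign-extended displacement, a 17-bit signed word count, so byte displacements run from -2^18 to 2^18-4 in steps of 4. A sketch with hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte displacement range of ble: 17-bit signed immediate, shifted left 2. */
#define BLE_DISP_MIN (-(INT32_C(1) << 18))      /* -262144 */
#define BLE_DISP_MAX ((INT32_C(1) << 18) - 4)   /*  262140 */

/* Hypothetical: can "ble disp(%srN,%r0)" encode this displacement? */
static bool ble_can_reach_from_r0(int32_t disp)
{
    return disp >= BLE_DISP_MIN && disp <= BLE_DISP_MAX && disp % 4 == 0;
}
```

Viewed as 32-bit addresses, the negative half wraps to 0xfffc0000 and the positive half ends at 0x0003fffc, i.e. the first 256 KB.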
 
This is true for user space. For kernel space, I don't think we can
use anything in F space, unless we map the real IO addresses somewhere
else in virtual space. I'm not sure what assumptions are being made
right now regarding that mapping in the drivers.

I was also thinking that we may want to eventually map physical addresses
directly (with no offset) to virtual addresses, in order to support the
maximum amount of physical memory. But perhaps we can have a 16 Mb offset
instead.

> a variety of reasons why it might not be available long term) the
> > sequence could be shortened to:
> >
> >       mtsp %r0,%sr0
> >       ble  <gateway offset>(%sr0,%r0)
> >       ldi <syscall #>,%r20
>
> In fact, what's wrong with shortening _this_ sequence to
>
>       ble <gateway offset>(%sr2,%r0)
>       ldi <syscall #>,%r20
>
> and teaching userspace to not modify sr2 ?

I like this idea.  The only disadvantage is that if the user modifies sr2
by mistake, all of a sudden all of the syscalls stop working (for that
process only).  It might be hard to debug.  But, as long as we make sure
that gcc never touches sr2, there should be almost no legitimate reason to
play with space registers in the user address space for Linux processes,
since we are going to have sr4=sr5=sr6=sr7.  In fact, gcc should be
modified to stop using $$dyncall for indirect function pointer calls.  So,
a C programmer will never run into this problem by mistake.  Only people
doing assembly language programming could run into the error.

Now, I am assuming we would set sr2 to 0 and locate the gateway page in
the kernel address space if we chose this proposal.  But this idea has the
flexibility of allowing us to move the gateway page into another space
completely if we ever need to (would require modifications to the tlb miss
handler).  It also has the interesting feature that a programmer could set
sr2 to point into the user address space, and if we choose an offset for
the gateway page in the kernel address space and make that offset also
available for mmap in the user address space, the user could place their
own page at the gateway offset in user space and intercept all syscalls
(there are other ways of doing this, but I just thought it was
interesting).

John Marvin
jsm@fc.hp.com

* [parisc-linux] Linux syscall ABI
@ 2000-02-14  9:30 John Marvin
From: John Marvin @ 2000-02-14  9:30 UTC
  To: parisc-linux


I've been talking with willy about the Linux syscall ABI, and now I'd
like to get some input from the rest of you regarding how it should
be handled.

As most of you are aware, HP-UX uses some parisc specific features,
namely the gate instruction used on a page mapped with privilege
promotion access rights (i.e. a gateway page), to implement HP-UX
syscalls. HP-UX puts this gateway page at 0xC0000000 in the user's
address space (which on HP-UX is in a shared quadrant, so only one
tlb entry is needed for all user processes).
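For 32-bit (short pointer) accesses, PA-RISC picks the space register from the top two bits of the address: the first quadrant uses %sr4 and the fourth uses %sr7. So 0xC0000000, and the 0xC0010000 Linux gateway mentioned below, are reached through %sr7, which is how a quadrant can be shared when every process has the same %sr7 value. A minimal sketch (function name hypothetical):

```c
#include <stdint.h>

/* Space register used for a short-pointer access to addr: sr4..sr7,
 * selected by the two most significant address bits. */
static unsigned short_pointer_sr(uint32_t addr)
{
    return 4u + (addr >> 30);
}
```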

Currently I've implemented a Linux syscall gateway page at 0xC0010000,
but since we don't have anything to be binary compatible with for
parisc linux applications, we can do things differently. I'd like
to throw out a few proposals and see what you all think. Feel free
to suggest other ideas.

Proposal #1:

Don't use a gateway page. Use a more "traditional" trapping instruction,
and handle syscalls in the fault path. We could use a subset of the
available break instructions, or we could "dedicate" a trap (the break
instruction trap handler will have to be shared with debugger support),
like the privileged register trap, or any of a few other traps that
a user program should not run into in the normal course of execution.

The disadvantage with this method is that I don't believe it can be made
to perform as well.  Even if we dedicate a particular trap for handling
syscalls, we still need to do at least 4 mtctl instructions (which on many
parisc processors take 2 states each, and don't bundle for multiple issue)
to reload the space queue and offset queue, plus an rfi instruction, in
order to return to virtual mode in the kernel.  This method will also
defeat any advantages from branch prediction.

All of the other proposals below deal with using a gateway page. I
personally believe that using a gateway page is the better choice.
However, on parisc linux we are capable of supporting a ~4 Gb linear
address space for user processes. I don't think locating the gateway
page at the ~3 Gb mark is a good idea, since it prevents heap expansion
beyond that point (this is a problem I am currently trying to work around
on HP-UX for customers who need this kind of large address space and
are not yet willing to port to 64 bit). I can think of no good reason
to put the gateway page in the middle of the user address space somewhere.
The remaining proposals have to do with where the Linux gateway page
should be located.

I should mention here that we do not currently plan on having any globally
shared quadrants in the user address space for parisc linux. Therefore
whether or not an HP-UX gateway page is mapped into the address space
can be determined on a per process basis. I can see no reason to map
an HP-UX gateway page into the address space for native parisc linux
processes (as opposed to HP-UX processes running on parisc linux).

Proposal #2:

Map the Linux syscall gateway page at the top end of the user address space.
What this top end address would be has yet to be determined. Depending
on how we support mapping I/O devices into the user address space, we
may want to reserve the 0xF0000000-0xFFFFFFFF range for IO (keeping the
device mapped at its equivalent address in the kernel address space).
This may also be necessary for routines like memcpy (so they can easily
determine whether an address is IO mapped), which have to do things
differently for IO addresses, assuming that memcpy is optimized
for performance.


Proposal #3:

Map the Linux syscall gateway page near the bottom end of the user's
address space.  We could define the default text start for parisc linux
processes such that it leaves room for a gateway page below it.

Proposal #4:

Map the Linux syscall gateway page at the very bottom end of the user's
address space, i.e. 0x00000000! Note that gateway pages are execute only,
so processes would still fault on a data null pointer dereference. We
could put some trapping code at the beginning of the gateway page to
catch anyone branching through a null function pointer.

One disadvantage of this proposal is that we could not support the
System V personality null pointer dereference behaviour. This maps
a page of zeros at location 0 so that null pointer dereferences will
return 0 for buggy software. Do we really still need to maintain this
ancient hack?

A slight advantage of this proposal is that it eliminates one instruction
(yes, one whole instruction!) from the syscall path. The general syscall
stub for a user space gateway page looks something like this:

	ldil L%<gateway address>,%r1
	ble  R%<gateway address>(%sr?,%r1)
	ldi <syscall #>,%r20

With the gateway page at 0 we don't need the ldil and can do just:

	ble <gateway page offset>(%sr4,%r0)
	ldi <syscall #>,%r20
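The first stub splits the gateway address with the PA-RISC field selectors: L% keeps the upper 21 bits (what ldil leaves in %r1) and R% the lower 11 bits (the ble displacement). With the gateway at 0 the L% part is zero, which is why the ldil can be dropped. A sketch of the split, with hypothetical function names:

```c
#include <stdint.h>

/* PA-RISC field selectors: L% = left 21 bits, R% = right 11 bits.
 * ldil L%addr,%r1 leaves field_L(addr) in %r1, and ble R%addr(...,%r1)
 * adds field_R(addr), so the branch offset is the full address. */
static uint32_t field_L(uint32_t addr) { return addr & ~UINT32_C(0x7ff); }
static uint32_t field_R(uint32_t addr) { return addr & UINT32_C(0x7ff); }
```

For the current 0xC0010000 gateway, R% is 0 and L% is the whole address; for a gateway entry at 0x100, L% is 0.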

Proposal #5:

Locate the gateway page in the kernel address space (space 0).  This will
be more efficient with respect to tlb usage.  It will add an instruction
to the syscall stub (perhaps an instruction or two can be reclaimed
on the gateway page in return, see below).

It is more efficient re: tlb usage for two reasons.  The first reason is
that since there is only one kernel address space, we only need one entry
in the tlb to map the page.  For user space gateway pages every process
will have its own mapping (aliased to the same page).  I should mention
here that every process will have its own unique space value, and we will
not need to flush the tlb on context switches. The second reason is
that we could locate the syscall return path on the gateway page, so
the syscall path will not need to run through another address range
(the syscall return code) that it could miss on. The kernel system
calls are written in C, and therefore cannot do a long branch back onto
the gateway page, which would be necessary if the gateway page is not
located in the kernel address space. If the gateway page is located in
the kernel address space the system calls can return there for the
syscall return path (check pending signals, rescheds, etc.) before
doing a long branch back to user space. We may also be able to save
a few instructions in the syscall path if the return point is the
natural return point for where the branch to the syscall was taken.

The disadvantage is that we would have to load a space register in
the syscall stub. The sequence would be something like this:

	mtsp %r0,%sr0
	ldil L%<gateway address>,%r1
	ble  R%<gateway address>(%sr0,%r1)
	ldi <syscall #>,%r20

If address 0 is available in the kernel address space (and there are
a variety of reasons why it might not be available long term) the
sequence could be shortened to:

	mtsp %r0,%sr0
	ble  <gateway offset>(%sr0,%r0)
	ldi <syscall #>,%r20

Proposal #6:

Locate the gateway page in a space dedicated purely for the gateway
page. This has the advantage of having one global mapping, similar
to proposal #5 above. It also is completely flexible in terms of
where in the address space it could be located, i.e. 0 would be
available. Compared to #5, it has the disadvantage of not being
able to locate the syscall return path on the gateway page. Also
it would take yet another instruction to load a non-zero space value
into a space register, e.g. (assuming the gateway is at address 0):

    ldi <gateway space value>,%r1
    mtsp    %r1,%sr0
    ble  <gateway offset>(%sr0,%r0)
    ldi <syscall #>,%r20

I only mention this possibility to be complete. I personally do not
think it has much going for it.


I haven't proposed more flexible solutions, including what HP-UX
does for 64 bit syscalls, i.e. they pass a pointer to an array of
syscall pointers into the application at startup. This means that
you have to load them from memory.  My opinion is that we don't
need to be that flexible,  but I'm sure some of you will disagree.
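For contrast, a C sketch of the table-based scheme described for HP-UX 64 bit syscalls: the program is handed a table of entry points at startup, and every syscall first loads its target from that table before branching. Everything here (the entry-point type, the names, the init function, the demo entry) is hypothetical; it only illustrates the extra memory load.

```c
#include <stddef.h>

/* Hypothetical entry-point type: two arguments for brevity. */
typedef long (*syscall_entry_fn)(long a0, long a1);

/* Table handed to the program at startup (hypothetical). */
static const syscall_entry_fn *syscall_table;
static size_t syscall_table_len;

static void syscall_table_init(const syscall_entry_fn *table, size_t len)
{
    syscall_table = table;
    syscall_table_len = len;
}

/* Every call loads its target from memory before the indirect branch. */
static long do_syscall(size_t nr, long a0, long a1)
{
    if (nr >= syscall_table_len || syscall_table[nr] == NULL)
        return -1;   /* hypothetical "no such syscall" result */
    return syscall_table[nr](a0, a1);
}

/* Demo entry standing in for a real entry point. */
static long demo_entry(long a0, long a1)
{
    return a0 + a1;
}
```

The fixed gateway offset avoids that load entirely: the branch target is an immediate in the stub.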

So, what do you all think?

John Marvin
jsm@fc.hp.com



Thread overview: 13+ messages
2000-02-16 13:57 [parisc-linux] Linux syscall ABI John Marvin
2000-02-16 17:41 ` Philipp Rumpf
  -- strict thread matches above, loose matches on Subject: below --
2000-02-17 14:17 John Marvin
2000-02-15  5:36 John Marvin
2000-02-15  6:15 ` willy
2000-02-15 12:50 ` Philipp Rumpf
2000-02-15 17:25 ` Grant Grundler
2000-02-15 18:18   ` Philipp Rumpf
2000-02-15 19:15     ` Frank Rowand
2000-02-16  2:34     ` Grant Grundler
2000-02-16  9:33       ` Kirk Bresniker
2000-02-14  9:30 John Marvin
2000-02-14 13:34 ` Philipp Rumpf
