public inbox for linux-kernel@vger.kernel.org
* zap_page_range in a module
@ 2001-12-14 21:26 Sottek, Matthew J
  2001-12-14 22:30 ` Benjamin LaHaise
  0 siblings, 1 reply; 7+ messages in thread
From: Sottek, Matthew J @ 2001-12-14 21:26 UTC (permalink / raw)
  To: 'linux-kernel@vger.kernel.org'


  I have a driver memory-mapping issue that I'm unsure how to
handle. Basically I've written an i810 framebuffer driver that uses
only stolen memory modes (mostly for embedded customers). This driver
currently can only work when compiled into the kernel because I need
zap_page_range(). Is there an acceptable way for me to get equivalent
functionality in a module so that this will be more useful to the
general public?

Some backup info:
The "stolen memory" is the 1MB that the BIOS takes from the system
before the OS loads. The i810 maps this in 64k banks at 0xa0000. I can
use any video mode <1MB in size by accessing the memory via these
64k banks and switching banks when needed.

For the fb driver I allow memory mapping of a 1MB area on the fb device
file and install a zero_page fault handler. When a page is faulted, I
map the 64k region that contains the page the client needs with
remap_page_range() and switch the memory bank. I then need to drop
any old 64k ranges so that I will get another zero_page fault when
they are accessed. This way the client sees 1MB of linear memory and
I bank-flip behind the scenes.

So I'm using zap_page_range() to drop the pages for the "old" memory
bank. This, of course, is not exported to modules. Is there some
existing way to get this functionality in a module? Is there any
chance of exporting zap_page_range()?

please cc this address in replies

 -Matt


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: zap_page_range in a module
  2001-12-14 21:26 Sottek, Matthew J
@ 2001-12-14 22:30 ` Benjamin LaHaise
  2001-12-17  9:32   ` Martin Diehl
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin LaHaise @ 2001-12-14 22:30 UTC (permalink / raw)
  To: Sottek, Matthew J; +Cc: 'linux-kernel@vger.kernel.org'

On Fri, Dec 14, 2001 at 01:26:29PM -0800, Sottek, Matthew J wrote:
> currently can only work when compiled into the kernel because I need 
> zap_page_range(). Is there an acceptable way for me to get equivalent
> functionality in a module so that this will be more useful to the
> general public?

The vm does zap_page_range for you if you're implementing an mmap operation, 
otherwise vmalloc/vfree/vremap will take care of the details for you.  How 
is your code using zap_page_range?  It really shouldn't be.

		-ben

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: zap_page_range in a module
@ 2001-12-15  2:10 Sottek, Matthew J
  2001-12-15  2:31 ` Benjamin LaHaise
  0 siblings, 1 reply; 7+ messages in thread
From: Sottek, Matthew J @ 2001-12-15  2:10 UTC (permalink / raw)
  To: 'Benjamin LaHaise'; +Cc: 'linux-kernel@vger.kernel.org'

>On Fri, Dec 14, 2001 at 01:26:29PM -0800, Sottek, Matthew J wrote:
>> currently can only work when compiled into the kernel because I need 
>> zap_page_range(). Is there an acceptable way for me to get equivalent
>> functionality in a module so that this will be more useful to the
>> general public?

>The vm does zap_page_range for you if you're implementing an
>mmap operation, 

It only does zap_page_range() when the memory map is being
removed, right?

>otherwise vmalloc/vfree/vremap will take care of the details for
>you.  How is your code using zap_page_range?  It really shouldn't be.

I will try to explain it again in another way.

I have a 64k sliding "window" into a 1MB region. You can only access
64k at a time; then you have to switch the "bank" to access the next
64k. Addresses 0xa0000-0xaffff are the 64k window. The actual 1MB of
memory is above the top of memory and not directly addressable by the
CPU; you have to go through the banks.

My driver implements the mmap file operation and does NOT do a
remap_page_range(). I also install a zero_page fault handler.

The client application then memory maps a 1MB region on the device
file. When the client tries to access the first page, my fault
handler is called and I remap_page_range() the 64k window
and set the hardware such that the first 64k of memory is what
can be viewed through the window.

When the client gets to 64k + 1, my fault handler is triggered again.
At this time I change the window to view the second 64k and do
another remap_page_range() of the window to the second 64k in the
vma.  HERE is the problem: I need to get rid of the old mapping so
that when the client reads from the first page my fault handler is
triggered again. zap_page_range() works, but only from within the
kernel.

This seems like something that would have lots of uses, so I assume
there is a way to do it that I just haven't discovered.
Is there no driver doing something like this to give mutual exclusion
to a memory mapped resource?

-Matt


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: zap_page_range in a module
  2001-12-15  2:10 zap_page_range in a module Sottek, Matthew J
@ 2001-12-15  2:31 ` Benjamin LaHaise
  2001-12-17  9:32   ` Martin Diehl
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin LaHaise @ 2001-12-15  2:31 UTC (permalink / raw)
  To: Sottek, Matthew J; +Cc: 'linux-kernel@vger.kernel.org'

On Fri, Dec 14, 2001 at 06:10:52PM -0800, Sottek, Matthew J wrote:
> >The vm does zap_page_range for you if you're implementing an
> >mmap operation, 
> 
> It only does zap_page_range() when the memory map is being
> removed right?

Right.

> I have a 64k sliding "window" into a 1MB region. You can only access
> 64k at a time; then you have to switch the "bank" to access the next
> 64k. Addresses 0xa0000-0xaffff are the 64k window. The actual 1MB of
> memory is above the top of memory and not directly addressable by the
> CPU; you have to go through the banks.

Stop right there.  You can't do that.  The code will deadlock on page 
faults for certain usage patterns.  It's slow, inefficient and a waste 
of effort.

		-ben
-- 
Fish.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: zap_page_range in a module
  2001-12-14 22:30 ` Benjamin LaHaise
@ 2001-12-17  9:32   ` Martin Diehl
  0 siblings, 0 replies; 7+ messages in thread
From: Martin Diehl @ 2001-12-17  9:32 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Sottek, Matthew J, 'linux-kernel@vger.kernel.org'

On Fri, 14 Dec 2001, Benjamin LaHaise wrote:

> On Fri, Dec 14, 2001 at 01:26:29PM -0800, Sottek, Matthew J wrote:
> > currently can only work when compiled into the kernel because I need 
> > zap_page_range(). Is there an acceptable way for me to get equivalent
> > functionality in a module so that this will be more useful to the
> > general public?
> 
> The vm does zap_page_range for you if you're implementing an mmap operation, 
> otherwise vmalloc/vfree/vremap will take care of the details for you.  How 
> is your code using zap_page_range?  It really shouldn't be.

True, but IMHO only for standard mmap semantics.

Well, the background is slightly different here, but it's very much the
same problem: I'd like to get rid of some page(s) which are mapped to a
userland vma. At certain points I need to force a page fault on them, so
the overloaded vma->nopage() gets called and can do the right thing.

zap_page_range() does exactly what I want. IMHO zap_page_range() is a
kind of symmetric buddy of remap_page_range() - it's somewhat surprising
to find one exported but not the other. And, AFAICS, there is no
technical reason not to use it either - at least for me it's working
perfectly fine. Of course it needs proper mm serialization, provided by
down_write(&mm->mmap_sem).

Martin


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: zap_page_range in a module
  2001-12-15  2:31 ` Benjamin LaHaise
@ 2001-12-17  9:32   ` Martin Diehl
  2001-12-18 13:04     ` Helge Hafting
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Diehl @ 2001-12-17  9:32 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Sottek, Matthew J, 'linux-kernel@vger.kernel.org'

On Fri, 14 Dec 2001, Benjamin LaHaise wrote:

> > I have a 64k sliding "window" into a 1MB region. You can only access
> > 64k at a time; then you have to switch the "bank" to access the next
> > 64k. Addresses 0xa0000-0xaffff are the 64k window. The actual 1MB of
> > memory is above the top of memory and not directly addressable by the
> > CPU; you have to go through the banks.
> 
> Stop right there.  You can't do that.  The code will deadlock on page 
> faults for certain usage patterns.  It's slow, inefficient and a waste 
> of effort.

Would you mind giving a hint as to what the predicted deadlock path
or the triggering usage pattern would look like, please?

I'm asking because I'm happily doing something very similar to what
Matthew describes without ever running into trouble - and this operates
at major page fault rates of up to 1000/sec here. What I'm doing is:

in fops->mmap(vma), serialized with other file operations:
	drv->vaddr = vma->vm_start;
	drv->vlen = vma->vm_end - vma->vm_start;
	vma->vm_flags |= VM_RESERVED;
	vma->vm_ops = &my_vm_ops;

vma->vm_ops->nopage() is my overloaded page fault handler which maps
_selectable_ kmalloc'ed kernel pages to the userland vma.

in fops->ioctl(), again serialized with other file operations:
	down_write(&current->mm->mmap_sem);
	zap_page_range(current->mm, drv->vaddr, drv->vlen);
	up_write(&current->mm->mmap_sem);

note that this is pretty much the same as what sys_munmap() does - with
one important difference: the mmap'ed vma isn't freed; it just remains
unchanged and a major page fault is triggered on the next access.

Finally, let me point out that performance is not an issue here - and
IMHO simple creation and destruction of ptes pointing to pages
kmalloc'ed in advance shouldn't be that slow anyway. OTOH, the ability
to use the page fault handler to control which page gets mapped to this
vma (including none, i.e. forcing SIGBUS) is an issue.

Regards,
Martin


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: zap_page_range in a module
  2001-12-17  9:32   ` Martin Diehl
@ 2001-12-18 13:04     ` Helge Hafting
  0 siblings, 0 replies; 7+ messages in thread
From: Helge Hafting @ 2001-12-18 13:04 UTC (permalink / raw)
  To: Martin Diehl, linux-kernel

Martin Diehl wrote:
> 
> On Fri, 14 Dec 2001, Benjamin LaHaise wrote:
> 
> > > I have a 64k sliding "window" into a 1MB region. You can only access
> > > 64k at a time; then you have to switch the "bank" to access the next
> > > 64k. Addresses 0xa0000-0xaffff are the 64k window. The actual 1MB of
> > > memory is above the top of memory and not directly addressable by the
> > > CPU; you have to go through the banks.
> >
> > Stop right there.  You can't do that.  The code will deadlock on page
> > faults for certain usage patterns.  It's slow, inefficient and a waste
> > of effort.
> 
> Would you mind giving a hint how the predicted deadlock path would look
> like or what the usage pattern might be, please?
> 
> I'm asking because I'm happily doing something very similar to what
> Matthew describes without ever running into trouble - and this operates
> at major page fault rates up to 1000/sec here. What I'm doings is:

Some processors have instructions that require 2 or more pages to be
present simultaneously to execute.  That _will_ fail
spectacularly if the two pages belong to different banks
in the above scenario, as only one bank can be present at a time.

Some examples for x86 processors:

1. The string move/compare instructions.  Fine for copying blocks of
   memory around.  The above case is a framebuffer; using
   "movsd" to copy from one location to another isn't
   all that uncommon.  The two locations might be in different banks.

2. An unaligned read or write, such as writing a 32-bit quantity
   to the last even address in the first bank.  Then the rest hits
   the first part of the next bank.  (A 16-bit quantity written to
   the last odd address does the same thing.)

3. An instruction that crosses a bank boundary, or lives in one
   bank and accesses data in another.  Of course you don't usually
   store instructions in a frame buffer. :-)

4. Processor-specific structures (page tables, interrupt
   vectors...) stored so they cross a bank boundary.  Not applicable
   to framebuffers, but there might be strange machines with
   bank-switched main memory around.

In any of these cases, the following happens:
1. You get a page fault for the page in the missing bank.
2. The page fault handler switches banks.
3. The instruction is restarted as the page fault handler returns.
4. You get a page fault for the now-missing page in the bank
   that was switched off.
5. The page fault handler switches banks.
6. The instruction is restarted.  Repeat from 1 in
   an endless loop.  Your machine is now deadlocked.  Perhaps
   you're so lucky that some other processes still get
   scheduled - let's hope none of them need the bank-switched
   memory _at all_.

Helge Hafting

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2001-12-18 13:05 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-12-15  2:10 zap_page_range in a module Sottek, Matthew J
2001-12-15  2:31 ` Benjamin LaHaise
2001-12-17  9:32   ` Martin Diehl
2001-12-18 13:04     ` Helge Hafting
  -- strict thread matches above, loose matches on Subject: below --
2001-12-14 21:26 Sottek, Matthew J
2001-12-14 22:30 ` Benjamin LaHaise
2001-12-17  9:32   ` Martin Diehl
