linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* question on symbol exports
@ 2005-02-01  0:15 Chris Friesen
  2005-02-01  7:36 ` Arjan van de Ven
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Friesen @ 2005-02-01  0:15 UTC (permalink / raw)
  To: linuxppc-dev, Linux kernel

It appears that in 2.6.9 the ppc64 version of flush_tlb_page() depends 
on two symbols which are not currently exported: the function 
__flush_tlb_pending(), and the per-cpu variable ppc64_tlb_batch.

Is there any particular reason why modules should not be allowed to 
flush the tlb, or is this an oversight?

Chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: question on symbol exports
  2005-02-01  0:15 question on symbol exports Chris Friesen
@ 2005-02-01  7:36 ` Arjan van de Ven
  2005-02-01 15:37   ` Chris Friesen
  0 siblings, 1 reply; 12+ messages in thread
From: Arjan van de Ven @ 2005-02-01  7:36 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linuxppc-dev, Linux kernel

On Mon, 2005-01-31 at 18:15 -0600, Chris Friesen wrote:
> It appears that in 2.6.9 the ppc64 version of flush_tlb_page() depends 
> on two symbols which are not currently exported: the function 
> __flush_tlb_pending(), and the per-cpu variable ppc64_tlb_batch.
> 
> Is there any particular reason why modules should not be allowed to 
> flush the tlb, or is this an oversight?

can you point at the url to your module source? I suspect modules doing
tlb flushes is the wrong thing, but without seeing the source it's hard
to tell.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: question on symbol exports
  2005-02-01  7:36 ` Arjan van de Ven
@ 2005-02-01 15:37   ` Chris Friesen
  2005-02-01 15:50     ` Arjan van de Ven
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Friesen @ 2005-02-01 15:37 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linuxppc-dev, Linux kernel

Arjan van de Ven wrote:
> On Mon, 2005-01-31 at 18:15 -0600, Chris Friesen wrote:

>>Is there any particular reason why modules should not be allowed to 
>>flush the tlb, or is this an oversight?
> 
> can you point at the url to your module source? I suspect modules doing
> tlb flushes is the wrong thing, but without seeing the source it's hard
> to tell.

I've included the relevent code at the bottom.  The module will be 
released under the GPL.

I've got a module that I'm porting forward from 2.4.  The basic idea is 
that we want to be able to track pages dirtied by an application.  The 
system has no swap, so we use the dirty bit to get this information.  On 
demand we walk the page tables belonging to the process, store the 
addresses of any dirty ones, flush the tlb, and mark them clean.

I (obviously) don't have a good understanding of how the tlb interacts 
with the software page tables.  If we don't need to flush the tlb I'd 
love to hear it.  If there's an easier way than walking the tables 
manually please let me know.

If it matters, some of the dirty pages may be code (it's used by an 
emulator for a system that can handle on-the-fly binary patching).

Thanks,

Chris







Note: this code is run while holding &mm->mmap_sem and &mm->page_table_lock.

	/* scan through the entire address space given */
	dirty_count = 0;
	for(addr=start&PAGE_MASK; addr<=end; addr+=PAGE_SIZE) {
		pgd_t *pgd;
		pmd_t *pmd;
		pte_t *ptep, pte;
		
		/* Page table walking code stolen from follow_page() except
		 * that this version does not support huge tlbs.
		 */
		pgd = pgd_offset(mm, addr);
		if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
			continue;

		pmd = pmd_offset(pgd, addr);
		if (pmd_none(*pmd))
			continue;
		if (unlikely(pmd_bad(*pmd)))
			continue;

		ptep = pte_offset_map(pmd, addr);
		if (!ptep)
			continue;

		pte = *ptep;
		pte_unmap(ptep);
		if (!pte_present(pte))
			continue;

		if (!pte_dirty(pte))
			continue;

		if (!pte_read(pte))
			continue;

		/* We have a user readable dirty page.  Count it.*/
		dirty_count++;

		if (dirty_count > entries) {
			continue;
		} else {
			__put_user(addr, buf);
			buf++;
		}

		flush_tlb_page(find_vma(mm,addr), addr);
		pte = pte_mkclean(pte);
	}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: question on symbol exports
  2005-02-01 15:37   ` Chris Friesen
@ 2005-02-01 15:50     ` Arjan van de Ven
  2005-02-01 17:00       ` Chris Friesen
       [not found]       ` <20050204203050.GA5889@dmt.cnet>
  0 siblings, 2 replies; 12+ messages in thread
From: Arjan van de Ven @ 2005-02-01 15:50 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linuxppc-dev, Linux kernel

On Tue, 2005-02-01 at 09:37 -0600, Chris Friesen wrote:
> Arjan van de Ven wrote:
> > On Mon, 2005-01-31 at 18:15 -0600, Chris Friesen wrote:
> 
> >>Is there any particular reason why modules should not be allowed to 
> >>flush the tlb, or is this an oversight?
> > 
> > can you point at the url to your module source? I suspect modules doing
> > tlb flushes is the wrong thing, but without seeing the source it's hard
> > to tell.
> 
> I've included the relevent code at the bottom.  The module will be 
> released under the GPL.
> 
> I've got a module that I'm porting forward from 2.4.  The basic idea is 
> that we want to be able to track pages dirtied by an application.  The 
> system has no swap, so we use the dirty bit to get this information.  On 
> demand we walk the page tables belonging to the process, store the 
> addresses of any dirty ones, flush the tlb, and mark them clean.

afaik one doesn't need to do a tlb flush in code that clears the dirty
bit, as long as you use the proper vm functions to do so. 
(if those need a tlb flush, those are supposed to do that for you
afaik).

Also note that your code isn't dealing with 4 level pagetables.... And
pagetable walking in drivers is basically almost always a mistake and a
sign that something is wrong.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: question on symbol exports
  2005-02-01 15:50     ` Arjan van de Ven
@ 2005-02-01 17:00       ` Chris Friesen
       [not found]       ` <20050204203050.GA5889@dmt.cnet>
  1 sibling, 0 replies; 12+ messages in thread
From: Chris Friesen @ 2005-02-01 17:00 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linuxppc-dev, Linux kernel

Arjan van de Ven wrote:
> On Tue, 2005-02-01 at 09:37 -0600, Chris Friesen wrote:

>>I've got a module that I'm porting forward from 2.4.  The basic idea is 
>>that we want to be able to track pages dirtied by an application.  The 
>>system has no swap, so we use the dirty bit to get this information.  On 
>>demand we walk the page tables belonging to the process, store the 
>>addresses of any dirty ones, flush the tlb, and mark them clean.
> 
> 
> afaik one doesn't need to do a tlb flush in code that clears the dirty
> bit, as long as you use the proper vm functions to do so. 
> (if those need a tlb flush, those are supposed to do that for you
> afaik).

I've been in contact with one of the developers of the code.  The reason 
we flush the tlb is so that on the next write the cpu has to fault it in 
and set the dirty bit.  Does that make sense?  I should try 
experimenting.....

> Also note that your code isn't dealing with 4 level pagetables.... 

2.6.9 doesn't have 4 level pagetables.  We'll have to port it forward 
when we eventually upgrade.

 > And
> pagetable walking in drivers is basically almost always a mistake and a
> sign that something is wrong.

I'd rather not be doing it, but there's no nice generic va_to_pte() 
function.  There have been patches, and ppc has one, but as a whole I 
don't know of any nice way to do this.

Chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: question on symbol exports
       [not found]       ` <20050204203050.GA5889@dmt.cnet>
@ 2005-02-04 20:14         ` Chris Friesen
  2005-02-05  9:19           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Friesen @ 2005-02-04 20:14 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linuxppc64-dev, linuxppc-dev, Linux kernel, Arjan van de Ven

I've added the ppc64 list to the addressees, in case they are interested.


Marcelo Tosatti wrote:
> On Tue, Feb 01, 2005 at 04:50:16PM +0100, Arjan van de Ven wrote:

>>afaik one doesn't need to do a tlb flush in code that clears the dirty
>>bit, as long as you use the proper vm functions to do so. 
>>(if those need a tlb flush, those are supposed to do that for you
>>afaik).

> Yep, and "proper VM function" is include/asm-generic/pgtable.h::ptep_clear_flush_dirty(),
> which on PPC flushes the TLB.

It turns out that to call ptep_clear_flush_dirty() on ppc64 from a 
module I needed to export the following symbols:

__flush_tlb_pending
ppc64_tlb_batch
hpte_update

>>Also note that your code isn't dealing with 4 level pagetables.... And
>>pagetable walking in drivers is basically almost always a mistake and a
>>sign that something is wrong.

> Or a sign that the core kernel lacks helper functions :) 

Absolutely.  It'd be so nice if there was a simple va_to_ptep() helper 
function available.

Chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: question on symbol exports
  2005-02-04 20:14         ` Chris Friesen
@ 2005-02-05  9:19           ` Benjamin Herrenschmidt
  2005-02-07 14:44             ` Chris Friesen
  0 siblings, 1 reply; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2005-02-05  9:19 UTC (permalink / raw)
  To: Chris Friesen
  Cc: linuxppc64-dev, Arjan van de Ven, Linux Kernel list,
	linuxppc-dev list


> It turns out that to call ptep_clear_flush_dirty() on ppc64 from a 
> module I needed to export the following symbols:
> 
> __flush_tlb_pending
> ppc64_tlb_batch
> hpte_update

Any reason why you need to call that from a module ? Is the module
GPL'd ?

Ben.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: question on symbol exports
  2005-02-05  9:19           ` Benjamin Herrenschmidt
@ 2005-02-07 14:44             ` Chris Friesen
  2005-02-07 21:35               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Friesen @ 2005-02-07 14:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc64-dev, Arjan van de Ven, Linux Kernel list,
	linuxppc-dev list

Benjamin Herrenschmidt wrote:
>>It turns out that to call ptep_clear_flush_dirty() on ppc64 from a 
>>module I needed to export the following symbols:
>>
>>__flush_tlb_pending
>>ppc64_tlb_batch
>>hpte_update
> 
> 
> Any reason why you need to call that from a module ? Is the module
> GPL'd ?

I explained this at the beginning of the thread, but I'll do so again. 
The module will be released under the GPL.

The basic idea is that we want to be able to track pages dirtied by a 
userspace process.  The system has no swap, so we use the dirty bit for 
this.  On demand we look up the page tables for an address range 
specified by the caller, store the addresses of any dirty pages, then 
mark them clean so that the next write causes them to get marked dirty 
again.  It is this act of marking them clean that requires the 
additional exports.

I've included the current code below.  If there is any way to accomplish 
this without the additional exports, I'd love to hear about it.

Chris








Note: this code is run while holding &mm->mmap_sem and &mm->page_table_lock.


for(addr=start&PAGE_MASK; addr<=end; addr+=PAGE_SIZE) {
	pte_t *ptep=0;

	ptep = va_to_ptep_map(mm, addr);
	if (!ptep)
		goto unmap_continue;

	if (!pte_dirty(*ptep))
		goto unmap_continue;

	/* We have a user readable dirty page.  Count it.*/
	dirty_count++;

	if (dirty_count <= entries) {
		__put_user(addr, buf);
		buf++;
		ptep_clear_flush_dirty(find_vma(mm, addr), addr, ptep);

		/* Handle option to stop early. */
		if ((dirty_count == entries) &&
			(options & STOP_WHEN_BUF_FULL))
			addr=end+1;
	}

unmap_continue:
	if (ptep)
		pte_unmap(ptep);		
}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: question on symbol exports
  2005-02-07 14:44             ` Chris Friesen
@ 2005-02-07 21:35               ` Benjamin Herrenschmidt
  2005-02-07 23:02                 ` Chris Friesen
  2005-02-07 23:42                 ` Dan Malek
  0 siblings, 2 replies; 12+ messages in thread
From: Benjamin Herrenschmidt @ 2005-02-07 21:35 UTC (permalink / raw)
  To: Chris Friesen
  Cc: linuxppc64-dev, Arjan van de Ven, Linux Kernel list,
	linuxppc-dev list

On Mon, 2005-02-07 at 08:44 -0600, Chris Friesen wrote:
> Benjamin Herrenschmidt wrote:
> >>It turns out that to call ptep_clear_flush_dirty() on ppc64 from a 
> >>module I needed to export the following symbols:
> >>
> >>__flush_tlb_pending
> >>ppc64_tlb_batch
> >>hpte_update
> > 
> > 
> > Any reason why you need to call that from a module ? Is the module
> > GPL'd ?
> 
> I explained this at the beginning of the thread, but I'll do so again. 
> The module will be released under the GPL.
> 
> The basic idea is that we want to be able to track pages dirtied by a 
> userspace process.  The system has no swap, so we use the dirty bit for 
> this.  On demand we look up the page tables for an address range 
> specified by the caller, store the addresses of any dirty pages, then 
> mark them clean so that the next write causes them to get marked dirty 
> again.  It is this act of marking them clean that requires the 
> additional exports.
> 
> I've included the current code below.  If there is any way to accomplish 
> this without the additional exports, I'd love to hear about it.

Interesting... more than no swap, you must also make sure you have no
r/w mmap'ed file (which are technically equivalent to swap).

I'm not too fan about exporting those symbols, but I'll talk to paulus,
it should be possible at least to EXPORT_SYMBOL_GPL them...

Ben.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: question on symbol exports
  2005-02-07 21:35               ` Benjamin Herrenschmidt
@ 2005-02-07 23:02                 ` Chris Friesen
  2005-02-07 23:42                 ` Dan Malek
  1 sibling, 0 replies; 12+ messages in thread
From: Chris Friesen @ 2005-02-07 23:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc64-dev, Arjan van de Ven, Linux Kernel list,
	linuxppc-dev list

Benjamin Herrenschmidt wrote:

> Interesting... more than no swap, you must also make sure you have no
> r/w mmap'ed file (which are technically equivalent to swap).

Ah...thanks for the warning.

We want to eventually make it work with swap as well, but that's 
substantially more complicated.

> I'm not too fan about exporting those symbols, but I'll talk to paulus,
> it should be possible at least to EXPORT_SYMBOL_GPL them...

I understand the reluctance.  I'm perfectly willing to export it GPL in 
my private branch as long as you guys don't consider it evil--the module 
is going to be GPL anyways.

The alternative would be for me to build my code directly in to the 
kernel...just makes it harder for me to debug.

Chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: question on symbol exports
  2005-02-07 21:35               ` Benjamin Herrenschmidt
  2005-02-07 23:02                 ` Chris Friesen
@ 2005-02-07 23:42                 ` Dan Malek
  2005-02-08 15:36                   ` Chris Friesen
  1 sibling, 1 reply; 12+ messages in thread
From: Dan Malek @ 2005-02-07 23:42 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev list, linuxppc64-dev, Chris Friesen,
	Linux Kernel list, Arjan van de Ven


On Feb 7, 2005, at 4:35 PM, Benjamin Herrenschmidt wrote:

> Interesting... more than no swap, you must also make sure you have no
> r/w mmap'ed file (which are technically equivalent to swap).

Yeah, I kinda had a similar thought.  Just because you aren't
swapping doesn't mean the VM subsystem isn't looking at dirty bits,
too.  It could potentially steal a page that it thinks can be replaced
from either a zero-fill or reading again from persistent storage.


	-- Dan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: question on symbol exports
  2005-02-07 23:42                 ` Dan Malek
@ 2005-02-08 15:36                   ` Chris Friesen
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Friesen @ 2005-02-08 15:36 UTC (permalink / raw)
  To: Dan Malek
  Cc: linuxppc-dev list, Arjan van de Ven, Linux Kernel list,
	linuxppc64-dev

Dan Malek wrote:
> 
> On Feb 7, 2005, at 4:35 PM, Benjamin Herrenschmidt wrote:
> 
>> Interesting... more than no swap, you must also make sure you have no
>> r/w mmap'ed file (which are technically equivalent to swap).
> 
> 
> Yeah, I kinda had a similar thought.  Just because you aren't
> swapping doesn't mean the VM subsystem isn't looking at dirty bits,
> too.  It could potentially steal a page that it thinks can be replaced
> from either a zero-fill or reading again from persistent storage.

In our existing case, the app also mlock()s the pages in question.  This 
should get around these two possible sources of inaccuracy.

Chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2005-02-08 15:36 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-02-01  0:15 question on symbol exports Chris Friesen
2005-02-01  7:36 ` Arjan van de Ven
2005-02-01 15:37   ` Chris Friesen
2005-02-01 15:50     ` Arjan van de Ven
2005-02-01 17:00       ` Chris Friesen
     [not found]       ` <20050204203050.GA5889@dmt.cnet>
2005-02-04 20:14         ` Chris Friesen
2005-02-05  9:19           ` Benjamin Herrenschmidt
2005-02-07 14:44             ` Chris Friesen
2005-02-07 21:35               ` Benjamin Herrenschmidt
2005-02-07 23:02                 ` Chris Friesen
2005-02-07 23:42                 ` Dan Malek
2005-02-08 15:36                   ` Chris Friesen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).