linuxppc-dev.lists.ozlabs.org archive mirror
* RFC: Deprecating io_block_mapping
@ 2005-05-25  1:30 Benjamin Herrenschmidt
  2005-05-25  2:17 ` Kumar Gala
  2005-05-25  4:45 ` Dan Malek
  0 siblings, 2 replies; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2005-05-25  1:30 UTC (permalink / raw)
  To: linuxppc-embedded; +Cc: linuxppc-dev list

As the subject says ... it's the source of endless headaches, is used in
a way that often prevents moving TASK_SIZE freely, etc etc etc...

What are the good and unavoidable uses of it currently that cannot be
replaced by some sort of ioremap ?

(Note that if the answer to the above is: page tables exist too late, I
already have a reply: our initialisations happen too early, let's move
things around so that ioremap is useable... pretty much everything
needed to setup kernel page tables & have working ioremap can be done
without any HW device access so ...)

Ben.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: RFC: Deprecating io_block_mapping
  2005-05-25  1:30 RFC: Deprecating io_block_mapping Benjamin Herrenschmidt
@ 2005-05-25  2:17 ` Kumar Gala
  2005-05-25  2:21   ` Benjamin Herrenschmidt
  2005-05-25  4:48   ` Dan Malek
  2005-05-25  4:45 ` Dan Malek
  1 sibling, 2 replies; 29+ messages in thread
From: Kumar Gala @ 2005-05-25  2:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, linuxppc-embedded

On May 24, 2005, at 8:30 PM, Benjamin Herrenschmidt wrote:

> As the subject says ... it's the source of endless headaches, is used in
> a way that often prevents moving TASK_SIZE freely, etc etc etc...
>
> What are the good and unavoidable uses of it currently that cannot be
> replaced by some sort of ioremap ?

Do you propose to fixup ioremap to allocate large page resources (BATs 
and CAMs) going forward?

> (Note that if the answer to the above is: page tables exist too late, I
> already have a reply: our initialisations happen too early, let's move
> things around so that ioremap is useable... pretty much everything
> needed to setup kernel page tables & have working ioremap can be done
> without any HW device access so ...)

Do you have any proposed solution for early console access?  I'm 
guessing that most of the need for early access is for some sort of 
console (serial) for early debug output.

Also does this mean we would drop ppc_md.setup_io_mappings() completely?

- kumar


* Re: RFC: Deprecating io_block_mapping
  2005-05-25  2:17 ` Kumar Gala
@ 2005-05-25  2:21   ` Benjamin Herrenschmidt
  2005-05-25  2:30     ` Kumar Gala
  2005-05-25  5:14     ` Dan Malek
  2005-05-25  4:48   ` Dan Malek
  1 sibling, 2 replies; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2005-05-25  2:21 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev list, linuxppc-embedded

On Tue, 2005-05-24 at 21:17 -0500, Kumar Gala wrote:
> On May 24, 2005, at 8:30 PM, Benjamin Herrenschmidt wrote:
> 
> > As the subject says ... it's the source of endless headaches, is used in
> > a way that often prevents moving TASK_SIZE freely, etc etc etc...
> >
> > What are the good and unavoidable uses of it currently that cannot be
> > replaced by some sort of ioremap ?
> 
> Do you propose to fixup ioremap to allocate large page resources (BATs 
> and CAMs) going forward?

Do we really ever need them for anything but RAM mapping ?

> > (Note that if the answer to the above is: page tables exist too late, I
> > already have a reply: our initialisations happen too early, let's move
> > things around so that ioremap is useable... pretty much everything
> > needed to setup kernel page tables & have working ioremap can be done
> > without any HW device access so ...)
> 
> Do you have any proposed solution for early console access?  I'm 
> guessing that most of the need for early access is for some sort of 
> console (serial) for early debug output.

How do we implement io_block_mapping() on CPUs without a hash table ? We
need page tables for these so we can have ioremap working. On CPUs with
a hash, we could just shove entries in the hash... though we may need a
mechanism to bolt them or convert those mappings to page tables once
those are available.

> Also does this mean we would drop ppc_md.setup_io_mappings() completely?

Does it really make sense ? I always disliked it.

Ben.


* Re: RFC: Deprecating io_block_mapping
  2005-05-25  2:21   ` Benjamin Herrenschmidt
@ 2005-05-25  2:30     ` Kumar Gala
  2005-05-25  5:00       ` Dan Malek
  2005-05-25  5:14     ` Dan Malek
  1 sibling, 1 reply; 29+ messages in thread
From: Kumar Gala @ 2005-05-25  2:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, linuxppc-embedded


On May 24, 2005, at 9:21 PM, Benjamin Herrenschmidt wrote:

> On Tue, 2005-05-24 at 21:17 -0500, Kumar Gala wrote:
>> On May 24, 2005, at 8:30 PM, Benjamin Herrenschmidt wrote:
>>
>>> As the subject says ... it's the source of endless headaches, is used in
>>> a way that often prevents moving TASK_SIZE freely, etc etc etc...
>>>
>>> What are the good and unavoidable uses of it currently that cannot be
>>> replaced by some sort of ioremap ?
>>
>> Do you propose to fixup ioremap to allocate large page resources (BATs
>> and CAMs) going forward?
>
> Do we really ever need them for anything but RAM mapping ?

The only case I can think of is embedded frame buffers across some
bus like PCI.  An example would be an image buffer that a printing
application may use.

>>> (Note that if the answer to the above is: page tables exist too late, I
>>> already have a reply: our initialisations happen too early, let's move
>>> things around so that ioremap is useable... pretty much everything
>>> needed to setup kernel page tables & have working ioremap can be done
>>> without any HW device access so ...)
>>
>> Do you have any proposed solution for early console access?  I'm
>> guessing that most of the need for early access is for some sort of
>> console (serial) for early debug output.
>
> How do we implement io_block_mapping() on CPUs without a hash table ? We
> need page tables for these so we can have ioremap working. On CPUs with
> a hash, we could just shove entries in the hash... though we may need a
> mechanism to bolt them or convert those mappings to page tables once
> those are available.

I know what I've done in the past is either steal a BAT (83xx) or CAM 
(85xx) entry and then free it up when a proper ioremap can be done 
later.

>> Also does this mean we would drop ppc_md.setup_io_mappings() completely?
>
> Does it really make sense ? I always disliked it.

No, as far as I can tell from a quick glance, if we drop
io_block_mapping then we can drop setup_io_mappings().

- kumar


* Re: RFC: Deprecating io_block_mapping
  2005-05-25  1:30 RFC: Deprecating io_block_mapping Benjamin Herrenschmidt
  2005-05-25  2:17 ` Kumar Gala
@ 2005-05-25  4:45 ` Dan Malek
  2005-05-25  5:15   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 29+ messages in thread
From: Dan Malek @ 2005-05-25  4:45 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, linuxppc-embedded


On May 24, 2005, at 9:30 PM, Benjamin Herrenschmidt wrote:

> As the subject says ... it's the source of endless headaches, is used in
> a way that often prevents moving TASK_SIZE freely, etc etc etc...

Why are you so obsessed about this? :-)  We all know what it does
and the limitations.  If someone wants to use it in addition to other
kernel configuration options, their particular start up code will have
to be modified to accept this.

> What are the good and unavoidable uses of it currently that cannot be
> replaced by some sort of ioremap ?
>
> (Note that if the answer to the above is: page tables exist too late,

That's one reason.  The other is to pin BATs or large page table entries
for more efficient access.

> ....  I
> already have a reply: our initialisations happen too early, let's move
> things around so that ioremap is useable

In most cases you can't do this.  There are boards that have to map
serial ports for kgdb or early console debugging.  There are also
boards that need access to local hardware registers to set up that
early.  You may need to map a rom or some other non-volatile storage
to get some system parameters.

Thanks.


	-- Dan


* Re: RFC: Deprecating io_block_mapping
  2005-05-25  2:17 ` Kumar Gala
  2005-05-25  2:21   ` Benjamin Herrenschmidt
@ 2005-05-25  4:48   ` Dan Malek
  1 sibling, 0 replies; 29+ messages in thread
From: Dan Malek @ 2005-05-25  4:48 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev list, linuxppc-embedded


On May 24, 2005, at 10:17 PM, Kumar Gala wrote:

> Do you propose to fixup ioremap to allocate large page resources (BATs 
> and CAMs) going forward?

That would have to be absolutely necessary, plus we need to ensure we
accommodate the processor cores with more or fewer BATs (or a variable
number of CAMs).  We can't afford to continue to work with the least
common denominator configurations and waste those resources.

Thanks.


	-- Dan


* Re: RFC: Deprecating io_block_mapping
  2005-05-25  2:30     ` Kumar Gala
@ 2005-05-25  5:00       ` Dan Malek
  2005-05-25  6:07         ` Pantelis Antoniou
  0 siblings, 1 reply; 29+ messages in thread
From: Dan Malek @ 2005-05-25  5:00 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev list, linuxppc-embedded


On May 24, 2005, at 10:30 PM, Kumar Gala wrote:

> I know what I've done in the past is either steal a BAT (83xx) or CAM 
> (85xx) entry and then free it up when a proper ioremap can be done 
> later.

This is even more of a hack than io_block_mapping() because it is often
obscure and not documented.  Several boards have done this in the past
as well.  It's "magic" that occurs; what seem to be minor code changes
often cause this to break, which makes debugging more complex :-)

> No, as far as I can tell from a quick glance, if we drop
> io_block_mapping then we can drop setup_io_mappings().

We've got to have something to address the board unique requirements
that are currently satisfied by this.

There is a real problem that we have to solve.  Some boards just need
access to mapped hardware before the VM is set up.  You can't just
remove a feature or tell them their design is wrong.  I don't think
obscure mapping tricks are the solution, either.

The only solution is to make ioremap() smart enough to properly use
BATs and CAMs that are available to a processor.   I suspect this is
going to lead to a bunch of also undesirable configuration options to
address the customizations necessary.

Thanks.


	-- Dan


* Re: RFC: Deprecating io_block_mapping
  2005-05-25  2:21   ` Benjamin Herrenschmidt
  2005-05-25  2:30     ` Kumar Gala
@ 2005-05-25  5:14     ` Dan Malek
  2005-05-25  5:20       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 29+ messages in thread
From: Dan Malek @ 2005-05-25  5:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, linuxppc-embedded


On May 24, 2005, at 10:21 PM, Benjamin Herrenschmidt wrote:

> Do we really ever need them for anything but RAM mapping ?

Of course.  We use BATs and CAMs to address a variety of
mapping options.

> How do we implement io_block_mapping() on CPUs without a hash table ?

What does a hash table have to do with anything?  The io_block_mapping()
is most important on processors that don't have them, like 8xx, 82xx,
83xx, and 85xx.   The hash table in Linux is just an unfortunate staging
area for PTEs.

The io_block_mapping() simply loads BATs, CAMs, or init's page tables.

> .....  We
> need page tables for these so we can have ioremap working.

The problem is you need more than page tables.  We have kernel page
tables very early in the initialization.  The problem with ioremap() is
that if you haven't done something (like io_block_mapping()) to set up
BATs or CAMs, it will call vmalloc() to get some VM space.  This is where
the problem lies.

> ... On CPUs with
> a hash, we could just shove entries in the hash... though we may need a
> mechanism to bolt them or convert those mappings to page tables once
> those are available.

We don't need anything that complicated.  As I said, this already works
on all processors that don't have hash tables.

We need to make ioremap() much, much smarter, plus we need some kind
of board specific function to set up the BATs or CAMs, like
io_block_mapping() does today.


Thanks.


	-- Dan


* Re: RFC: Deprecating io_block_mapping
  2005-05-25  4:45 ` Dan Malek
@ 2005-05-25  5:15   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2005-05-25  5:15 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-dev list, linuxppc-embedded

On Wed, 2005-05-25 at 00:45 -0400, Dan Malek wrote:
> On May 24, 2005, at 9:30 PM, Benjamin Herrenschmidt wrote:
> 
> > As the subject says ... it's the source of endless headaches, is used in
> > a way that often prevents moving TASK_SIZE freely, etc etc etc...
> 
> Why are you so obsessed about this? :-)  We all know what it does
> and the limitations.  If someone wants to use it in addition to other
> kernel configuration options, their particular start up code will have
> to be modified to accept this.

We don't "all" know :) It's very easily misused... 

> > What are the good and unavoidable uses of it currently that cannot be
> > replaced by some sort of ioremap ?
> >
> > (Note that if the answer to the above is: page tables exist too late,
> 
> That's one reason.  The other is to pin BATs or large page table entries
> for more efficient access.

True.

> > ....  I
> > already have a reply: our initialisations happen too early, let's move
> > things around so that ioremap is useable
> 
> In most cases you can't do this.  There are boards that have to map
> serial ports for kgdb or early console debugging.

How do they map ? pinning TLBs ? ioremap can do that... ioremap can be
made to work very very early ...
 
> There are also
> boards that need access to local hardware registers to set up that
> early.  You may need to map a rom or some other non-volatile storage
> to get some system parameters.

Ok, well, I'm really just requesting comments here. I'm not saying I
will kill it, just wondering ...

Ben.


* Re: RFC: Deprecating io_block_mapping
  2005-05-25  5:14     ` Dan Malek
@ 2005-05-25  5:20       ` Benjamin Herrenschmidt
  2005-05-25  5:49         ` Dan Malek
  0 siblings, 1 reply; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2005-05-25  5:20 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-dev list, linuxppc-embedded

On Wed, 2005-05-25 at 01:14 -0400, Dan Malek wrote:
> On May 24, 2005, at 10:21 PM, Benjamin Herrenschmidt wrote:
> 
> > Do we really ever need them for anything but RAM mapping ?
> 
> Of course.  We use BATs and CAMs to address a variety of
> mapping options.
> 
> > How do we implement io_block_mapping() on CPUs without a hash table ?
> 
> What does a hash table have to do with anything?  The io_block_mapping()
> is most important on processors that don't have them, like 8xx, 82xx, 
> 83xx,
> and 85xx.   The hash table in Linux is just an unfortunate staging area 
> for PTEs.

Sorry, I meant BATs or equivalent... (see POWER4 for example, which has
none of these)
> 
> The io_block_mapping() simply loads BATs, CAMs, or init's page tables.

Ok, that was my point... since init page tables can be loaded by it, why
not make ioremap work that early and do the same ? The problem is of
course allocating the pte pages, but what does io_block_mapping() do on
CPUs without BATs/CAMs/whatever ?

> > .....  We
> > need page tables for these so we can have ioremap working.
> 
> The problem is you need more than page tables.  We have kernel page
> tables very early in the initialization.  

We have the pgdir, but not the PTE pages...

> The problem with ioremap() is if you
> haven't done something (like io_block_mapping()) to set up BATs or CAMs,
> it will call vmalloc() to get some VM space.  This is where the problem 
> lies.

No, we have a trick with ioremap_bot, we don't need to get vmalloc space
for ioremap to work early. In fact, it would be nice to just have
io_block_mapping be able to "dynamically" allocate virtual space using
the same mechanism instead of being passed a virtual address. That would
fix most of the problems with hard coded 1:1 mappings.
> 
> > ... On CPUs with
> > a hash, we could just shove entries in the hash... though we may need a
> > mechanism to bolt them or convert those mappings to page tables once
> > those are available.
> 
> We don't need anything that complicated.  As I said, this already works 
> on
> all processors that don't have hash tables.
> 
> We need to make ioremap() much, much smarter, plus we need some kind
> of board specific function to set up the BATs or CAMs, like 
> io_block_mapping()
> does today.

Well, my problem is with hard-coded v:p mappings... If we can simply
have io_block_mapping take, for example, 0 for v (or -1) and use the
ioremap_bot trick to "allocate" virtual space, that would make me happy
(it needs to return the allocated address too).

Ben.


* Re: RFC: Deprecating io_block_mapping
  2005-05-25  5:20       ` Benjamin Herrenschmidt
@ 2005-05-25  5:49         ` Dan Malek
  2005-05-25  6:00           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Dan Malek @ 2005-05-25  5:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, linuxppc-embedded


On May 25, 2005, at 1:20 AM, Benjamin Herrenschmidt wrote:

> Ok, that was my point... since init page tables can be loaded by it, why
> not make ioremap work that early and do the same ?

Because you lose the efficiency of mapping with BATs or CAMs.

> ... The problem is of
> course allocating the pte pages, but what does io_block_mapping() do on
> CPUs without BATs/CAMs/whatever ?

It just loads init's page tables directly.

> We have the pgdir, but not the PTE pages...

The PTE pages are allocated as they are needed.  The PTE pages
are there :-)

> No, we have a trick with ioremap_bot, we don't need to get vmalloc space
> for ioremap to work early.

That's only if you have already done something to allocate BATs
or CAMs.  On processors that don't have these, ioremap() would fail
because it would think it has to allocate VM space.

>  ....  In fact, it would be nice to just have
> io_block_mapping be able to "dynamically" allocate virtual space using
> the same mechanism instead of being passed a virtual address. That
> would fix most of the problems with hard coded 1:1 mappings.

I think we should make ioremap() smarter and have some board
initialization that helps it by setting up BATs, CAMs, or unique page
table mappings.  There is also an interdependence between ioremap() and
other IO initialization.  In the past some of the fixed addressing was
necessary due to assumptions built into IO setup, mapping functions, or
macros.  I don't know how much of this is still present.

> Well, my problem is with hard-coded v:p mappings... If we can simply
> have io_block_mapping take, for example, 0 for v (or -1) and use the
> ioremap_bot trick to "allocate" virtual space, that would make me happy
> (it needs to return the allocated address too).

Somewhere, at some point, prior to VM setup, we need to forcibly map
virtual to physical addresses.  These are going to be "hard coded"
mappings, that's exactly how ioremap_bot is set.  This is why
io_block_mapping was created in the first place.  Somehow you have
to specify this mapping before you have a VM allocator to give it to
you. :-)
Even if you don't call it io_block_mapping(), you are going to need
a function that does exactly this.

Thanks.

	-- Dan


* Re: RFC: Deprecating io_block_mapping
  2005-05-25  5:49         ` Dan Malek
@ 2005-05-25  6:00           ` Benjamin Herrenschmidt
  2005-05-25  6:08             ` Kumar Gala
  0 siblings, 1 reply; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2005-05-25  6:00 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-dev list, linuxppc-embedded

On Wed, 2005-05-25 at 01:49 -0400, Dan Malek wrote:
> On May 25, 2005, at 1:20 AM, Benjamin Herrenschmidt wrote:
> 
> > Ok, that was my point... since init page tables can be loaded by it, why
> > not make ioremap work that early and do the same ?
> 
> Because you lose the efficiency of mapping with BATs or CAMs.
> 
> > ... The problem is of
> > course allocating the pte pages, but what does io_block_mapping() do on
> > CPUs without BATs/CAMs/whatever ?
> 
> It just loads init's page tables directly.
>
> > We have the pgdir, but not the PTE pages...
> 
> The PTE pages are allocated as they are needed.  The PTE pages
> are there :-)

Well, you say "loads init's page tables directly"... well, init starts
with a pgdir, it needs to allocate PTE pages in order to "load" the page
tables with PTEs, thus my question: how does io_block_mapping do
this ? Does it use the bootmem allocator ? If yes, then that means we can
have ioremap working as well. I'm not completely arguing against
io_block_mapping() but I think it's abused, and thus I'm looking into
encouraging different approaches.

> > No, we have a trick with ioremap_bot, we don't need to get vmalloc space
> > for ioremap to work early.
> 
> That's only if you have already done something to already allocate BATs
> or CAMs.  On processors that don't  have these, ioremap() would fail 
> because it would think it has to allocate VM space.

The ioremap_bot trick works in whatever case provided MMU_init has been
called so ioremap_bot & ioremap_base are initialized. It would be fairly
easy to turn that into a static init though.

> >  ....  In fact, it would be nice to just have
> > io_block_mapping be able to "dynamically" allocate virtual space using
> > the same mechanism instead of being passed a virtual address. That
> > would fix most of the problems with hard coded 1:1 mappings.
> 
> I think we should make ioremap() smarter and have some board 
> initialization that helps it by setting up BATs, CAMs, or unique page table
> mappings. There is also an interdependence between ioremap() and other IO
> initialization.  In the past some of the fixed addressing was necessary 
> due to assumptions built into IO setup, mapping functions, or macros.  I 
> don't know how much of this is still present.

It depends on the platform I suppose. There are still a few bits in PReP
that I can fix, I'm not sure about embedded.

> > Well, my problem is with hard-coded v:p mappings... If we can simply
> > have io_block_mapping take, for example, 0 for v (or -1) and use the
> > ioremap_bot trick to "allocate" virtual space, that would make me happy
> > (it needs to return the allocated address too).
> 
> Somewhere, at some point, prior to VM setup, we need to forcibly map
> virtual to physical addresses.  These are going to be "hard coded"
> mappings, that's exactly how ioremap_bot is set.  This is why
> io_block_mapping was created in the first place.  Somehow you have
> to specify this mapping before you have a VM allocator to give it to 
> you. :-)

No, you just need to have ioremap_bot (which is in fact "top" not
"bottom", bad naming) initialized to something sane. This is currently
done in MMU_init() but could probably be initialized statically instead.
I do just that on ppc64 and thus can ioremap at any time without needing
to allocate vmalloc space. The vmalloc space is automatically "cap'd" by
ioremap_bot anyway.

> Even if you don't call it io_block_mapping(), you are going to need
> a function that does exactly this.


* Re: RFC: Deprecating io_block_mapping
  2005-05-25  5:00       ` Dan Malek
@ 2005-05-25  6:07         ` Pantelis Antoniou
  0 siblings, 0 replies; 29+ messages in thread
From: Pantelis Antoniou @ 2005-05-25  6:07 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-dev list, linuxppc-embedded

Dan Malek wrote:
> 
> On May 24, 2005, at 10:30 PM, Kumar Gala wrote:
> 
>> I know what I've done in the past is either steal a BAT (83xx) or CAM 
>> (85xx) entry and then free it up when a proper ioremap can be done later.
> 
> 
> This is even more of a hack than io_block_mapping() because it is often
> obscure and not documented.  Several boards have done this in the past
> as well.  It's "magic" that occurs, what seems to be minor code changes
> often cause this to break and makes debugging more complex :-)
> 
>> No, as far as I can tell from a quick glance, if we drop
>> io_block_mapping then we can drop setup_io_mappings().
> 
> 
> We've got to have something to address the board unique requirements
> that are currently satisfied by this.
> 
> There is a real problem that we have to solve.  Some boards just need
> access to mapped hardware before the VM is set up.  You can't just
> remove a feature or tell them their design is wrong.  I don't think obscure
> mapping tricks are the solution, either.
> 
> The only solution is to make ioremap() smart enough to properly use
> BATs and CAMs that are available to a processor.   I suspect this is going
> to lead to a bunch of also undesirable configuration options to address
> the customizations necessary.

/me nods

> 
> Thanks.
> 
> 
>     -- Dan
> 

Regards

Pantelis


* Re: RFC: Deprecating io_block_mapping
  2005-05-25  6:00           ` Benjamin Herrenschmidt
@ 2005-05-25  6:08             ` Kumar Gala
  2005-05-25  7:04               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Kumar Gala @ 2005-05-25  6:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, linuxppc-embedded

>> Somewhere, at some point, prior to VM setup, we need to forcibly map
>> virtual to physical addresses.  These are going to be "hard coded"
>> mappings, that's exactly how ioremap_bot is set.  This is why
>> io_block_mapping was created in the first place.  Somehow you have
>> to specify this mapping before you have a VM allocator to give it to
>> you. :-)
>
> No, you just need to have ioremap_bot (which is in fact "top" not
> "bottom", bad naming) initialized to something sane. This is currently
> done in MMU_init() but could probably be initialized statically instead.
> I do just that on ppc64 and thus can ioremap at any time without needing
> to allocate vmalloc space. The vmalloc space is automatically "cap'd" by
> ioremap_bot anyway.

Can one of you explain why this is necessary?  I believe it, I just don't
understand.  I think this is one of the abuses of io_block_mapping().
People, myself included, realize some of the caveats implied by calling
io_block_mapping().

- kumar


* Re: RFC: Deprecating io_block_mapping
  2005-05-25  6:08             ` Kumar Gala
@ 2005-05-25  7:04               ` Benjamin Herrenschmidt
  2005-05-25 16:36                 ` Dan Malek
  0 siblings, 1 reply; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2005-05-25  7:04 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev list, linuxppc-embedded


> Can one of you explain why this is necessary?  I believe it, I just don't
> understand.  I think this is one of the abuses of io_block_mapping().  
> People, myself included, realize some of the caveats implied by calling 
> io_block_mapping().

Well, there are 2 different things here. io_block_mapping "moving"
ioremap_bot, and my idea of having io_block_mapping "using" it...

So basically, the vmalloc/ioremap space starts at the end of the memory
linear mapping, and ends at ... ioremap_bot :)

This value is initially set to the "top" of the space useable for
vmalloc/ioremap. It's possible however to do "early" ioremap's (that is
before the vmalloc subsystem is initialized, and thus before we can
dynamically allocate virtual regions). In this case, ioremap just moves
ioremap_bot down and uses the space between the new and the previous
value.
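
As a rough illustration of that "early" case, here is a user-space sketch
of the bump-down behaviour.  It is not the kernel code: sim_ioremap_bot,
sim_early_ioremap(), the 4 KiB page size and the 0xfe000000 starting value
are all made up for the example, and the physical side of the mapping is
left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_PAGE_SIZE 0x1000u	/* assumed 4 KiB pages */

/* Top of the vmalloc/ioremap window; 0xfe000000 is an arbitrary example. */
static uint32_t sim_ioremap_bot = 0xfe000000u;

/*
 * Hypothetical early ioremap: reserve `size` bytes of virtual space by
 * moving sim_ioremap_bot down, and return the new (lower) value.
 * Nothing is actually mapped here.
 */
static uint32_t sim_early_ioremap(uint32_t size)
{
	/* Round the request up to whole pages. */
	uint32_t len = (size + SIM_PAGE_SIZE - 1) & ~(SIM_PAGE_SIZE - 1);

	assert(len != 0 && len <= sim_ioremap_bot);
	sim_ioremap_bot -= len;
	return sim_ioremap_bot;
}
```

Each call hands out the range between the new and the previous value of
sim_ioremap_bot, so successive callers get lower and lower addresses.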

In order to avoid having "block" mappings done by io_block_mapping()
collide with ioremap/vmalloc space, io_block_mapping() also has this bit
of code:

	if (virt > KERNELBASE && virt < ioremap_bot)
		ioremap_bot = ioremap_base = virt;

Which will "move down" ioremap_bot as well if a block mapping ends up in
the kernel area.
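
To make the effect of that test concrete, here is the same check pulled
out into a small user-space sketch.  The sim_* names and the starting
value 0xfe000000 are invented for illustration; only the comparison itself
is taken from the snippet above, and 0xc0000000 is assumed for KERNELBASE.

```c
#include <stdint.h>

#define SIM_KERNELBASE 0xc0000000u	/* assumed usual value */

static uint32_t sim_bot  = 0xfe000000u;	/* stand-in for ioremap_bot  */
static uint32_t sim_base = 0xfe000000u;	/* stand-in for ioremap_base */

/* Apply the quoted check for a block mapping placed at `virt`. */
static void sim_note_block_mapping(uint32_t virt)
{
	if (virt > SIM_KERNELBASE && virt < sim_bot)
		sim_bot = sim_base = virt;
}
```

A block mapping at 0x80000000 leaves sim_bot alone (it is below
KERNELBASE), one at 0xf0000000 pulls sim_bot down to 0xf0000000, and a
later one at 0xf8000000 changes nothing since it is already above sim_bot.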

Now, my idea is that I dislike the io_block_mapping() interface because
we have to provide the virtual address. Which means, it forces us to
create hard coded v->p mappings, and I consider hard coding virtual
addresses a bad thing (for lots of reasons, including the TASK_SIZE
one).

Thus, I think we could "extend" io_block_mapping() to be able to take
"0" for virt, and return a virtual address. That would be 100%
compatible with existing code. When taking "0" for virt,
io_block_mapping would just allocate virtual space like early
ioremap does, by moving ioremap_bot downward (with appropriate
alignment restrictions). By returning the actual virtual address used, it
makes it possible for the caller to know it :)

That way, io_block_mapping() _can_ be used without hard coding virtual
addresses, which would then be documented as the "preferred" thing to
do, and would avoid some of the headaches.
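
Something along these lines, as a user-space sketch only.  The function
name, its signature, the choice of "size is a power of two and doubles as
the alignment", and the starting value of sim_ioremap_bot are assumptions
for the example; no BAT, CAM or page table is touched.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_KERNELBASE 0xc0000000u	/* assumed usual value */

static uint32_t sim_ioremap_bot  = 0xfe000000u;
static uint32_t sim_ioremap_base = 0xfe000000u;

/*
 * Hypothetical extended interface: virt == 0 means "pick an address".
 * Returns the virtual address actually used.
 */
static uint32_t sim_io_block_mapping(uint32_t virt, uint32_t phys,
				     uint32_t size)
{
	(void)phys;	/* the physical side is ignored in this sketch */
	assert(size != 0 && (size & (size - 1)) == 0);

	if (virt == 0) {
		/* Allocate downward from ioremap_bot, aligned to size. */
		assert(size <= sim_ioremap_bot);
		virt = (sim_ioremap_bot - size) & ~(size - 1);
		sim_ioremap_bot = sim_ioremap_base = virt;
		return virt;
	}

	/* Existing behaviour: caller-supplied address, same check as today. */
	if (virt > SIM_KERNELBASE && virt < sim_ioremap_bot)
		sim_ioremap_bot = sim_ioremap_base = virt;
	return virt;
}
```

Callers that pass a non-zero virt see no change, which is the
compatibility argument; callers that pass 0 get back whatever address was
picked and never need to hard code one.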

Now, there may be a slight issue with when ioremap_bot is initialized...

It is in bss, so it is 0 by default (which isn't really suitable). It's
only initialized in MMU_init(). Thus there is a problem using it before
MMU_init(). Does that ever happen ? If it does, things are broken, since
the test "virt < ioremap_bot" will always be false anyway, and thus
io_block_mapping() will "fail" to move down ioremap_bot, thus
potentially letting the kernel allocate vmalloc/ioremap space that
overlaps the block mapping.
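
The failure mode is easy to see if the check is written as a pure
function (a sketch with an invented name, assuming KERNELBASE is
0xc0000000):

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_KERNELBASE 0xc0000000u	/* assumed usual value */

/* True when the existing check would lower ioremap_bot to `virt`. */
static bool sim_would_move_bot(uint32_t ioremap_bot, uint32_t virt)
{
	return virt > SIM_KERNELBASE && virt < ioremap_bot;
}
```

With ioremap_bot still 0, sim_would_move_bot(0, virt) is false for every
virt, since no unsigned address is below 0.  Once it holds a sane value
such as 0xfe000000, a block mapping at 0xf0000000 moves it as intended.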

Dan's point about io_block_mapping() supposedly "initializing"
ioremap_bot is bogus, unless I misunderstood him.

Ben.


* Re: RFC: Deprecating io_block_mapping
  2005-05-25  7:04               ` Benjamin Herrenschmidt
@ 2005-05-25 16:36                 ` Dan Malek
  2005-05-25 21:44                   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Dan Malek @ 2005-05-25 16:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, linuxppc-embedded


On May 25, 2005, at 3:04 AM, Benjamin Herrenschmidt wrote:

>> Can one of you explain why this is necessary?  I believe it, I just don't
>> understand.  I think this is one of the abuses of io_block_mapping().
>> People, myself included, realize some of the caveats implied by calling
>> io_block_mapping().
>
> Well, there are 2 different things here. io_block_mapping "moving"
> ioremap_bot, and my idea of having io_block_mapping "using" it...

It's more complicated than that.  The basic Linux kernel VM map is
kernel_base (usually 0xc0000000), kernel text, kernel data, VM guard,
VM alloc space, then ioremap space.

However, there are "holes" in the VM space that are completely
unused, and this is a precious resource.  The io_block_mapping()
gives us the ability to stick things into those holes.  Usually, we
would configure a system with a 2G user space, then use
io_block_mapping() to allocate the space between 0x80000000 and
0xc0000000.
The ioremap() isn't going to do this, unless we really make this
smarter.  On many systems, this was also the mapping for the
PCI space, so things like virt_to_xxx() were based on the assumptions
of this mapping.  So, if a board port wanted to use the option of
user task space configuration, it would have to also manage these
fixed address spaces accordingly.
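
To make the "hole" concrete, here is a minimal user-space sketch (not
kernel code) using the example numbers above.  The names
EXAMPLE_TASK_SIZE, EXAMPLE_KERNELBASE, vm_hole_size() and in_vm_hole()
are made up for illustration, and uint32_t stands in for a 32-bit
unsigned long.

```c
#include <stdint.h>

/* Illustrative values from the example in the text; a real port takes
 * these from its kernel configuration. */
#define EXAMPLE_TASK_SIZE   0x80000000u  /* top of a 2G user space */
#define EXAMPLE_KERNELBASE  0xc0000000u  /* usual kernel base */

/* Size in bytes of the unused region between user space and the kernel. */
static uint32_t vm_hole_size(uint32_t task_size, uint32_t kernelbase)
{
    return kernelbase > task_size ? kernelbase - task_size : 0;
}

/* Nonzero if [virt, virt + size) lies entirely inside that region. */
static int in_vm_hole(uint32_t virt, uint32_t size,
                      uint32_t task_size, uint32_t kernelbase)
{
    return virt >= task_size &&
           virt < kernelbase &&
           size <= kernelbase - virt;
}
```

With these numbers the hole is 0x40000000 bytes (1G), which is the
space a hard-coded block mapping would be placed in.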

This is not as simple as making io_block_mapping() use ioremap VM
space.  We have to find a way of managing all of the free kernel VM
space and ensuring all of the mapping APIs for IO know about and
utilize all of this space.

> Now, my idea is that I dislike the io_block_mapping() interface because
> we have to provide the virtual address. Which means, it forces us to
> create hard coded v->p mappings, and I consider hard coding virtual
> addresses a bad thing (for lots of reasons, including the TASK_SIZE
> one).

Then, you better get in line behind me for arguing for much better
VM space management in general :-)  Linux is horrible in this regard,
and the replies I get are " ...  for efficiency you have to know the use
of the spaces and the proper APIs to manage them ..."


> Thus, I think we could "extend" io_block_mapping() to be able to take
> "0" for virt, and return a virtual address.

But, no one would use that because it doesn't have the proper effect.
If this could be done, we would already be using ioremap().

> Dan's point about io_block_mapping() supposedly "initializing"
> ioremap_bot is bogus, unless I misunderstood him.

I never said that, but if you look at the code, it's exactly what it 
does :-)
Any mappings done between the top of user space and bottom of
the kernel are simply forced and ignored by any Linux VM.  The
io_block_mapping() is used to allocate BATs and CAMs and make
them available for ioremap() of devices.  It allows us to map various
devices into the ioremap space, take advantage of the efficiency of
BATs or large page mappings, and still have devices use the ioremap()
to find them.

As I keep saying, somehow you have to lay out the virtual to physical
mapping of devices using the efficiency of BATs and CAMs, and still
make the ioremap() interface work.  The device driver just calls 
ioremap(),
but if you have a smart board set up function, it can set up an 
efficient
mapping using BATs or CAMs rather than 4k pages requiring TLB
exceptions.

We can either make ioremap() really complex with knowledge of all of
these board configuration options so it can set up the BATs and CAMs,
or we set it all up using some functions (like io_block_mapping) in the
board set up and keep ioremap() a simple function.

The current implementation of io_block_mapping does two very important
functions.  One is this set up of efficient mapping for ioremap(), and 
the
other is to utilize the kernel VM space that isn't managed by Linux.
We are currently moving lots of the code to make use of ioremap() rather
than assuming prior mapping, which is a nice thing, but it's costing us
in terms of performance and resource utilization.

Thanks.


	-- Dan


* Re: RFC: Deprecating io_block_mapping
  2005-05-25 16:36                 ` Dan Malek
@ 2005-05-25 21:44                   ` Benjamin Herrenschmidt
  2005-05-26  6:00                     ` Dan Malek
  0 siblings, 1 reply; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2005-05-25 21:44 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-dev list, linuxppc-embedded


> However, there are "holes" in the VM space that are completely
> unused, and this is a precious resource.  The io_block_mapping()
> gives us the ability to stick things into those holes.  Usually, we
> would configure a system with a 2G user space, then use 
> io_block_mapping()
> to allocate the space between 0x80000000 and 0xc0000000.

This is a VERY BAD habit. Just set KERNELBASE to 0x80000000 if you do
that, and use io_block_mapping() dynamically the way I explained to alloc
from the top of the address space.

> The ioremap() isn't going to do this, unless we really make this
> smarter.  On many systems, this was also the mapping for the
> PCI space, so things like virt_to_xxx() were based on the assumptions
> of this mapping.  So, if a board port wanted to use the option of
> user task space configuration, it would have to also manage these
> fixed address spaces accordingly.

Well, the PCI IO space base just needs to be in a global that is
referenced by _IO_BASE; it works fine, no need to hard code a mapping.
PCI memory space doesn't rely on any of this unless your platform code
is really screwy.

> This is not as simple as making io_block_mapping() use ioremap VM
> space.  We have to find a way of managing all of the free kernel VM
> space and ensuring all of the mapping APIs for IO know about and
> utilize all of this space.

I'm not sure I understand your last sentence. As I explained, all we
need is to add to io_block_mapping() the _ability_ to allocate via
ioremap_bot. (I agree that it's a bit early to _deprecate it
completely_). That way, I don't break any existing setup. Then, we can
start adapting platforms (and making sure new ones do too) to use a
proper mechanism instead of hard coding v->p mappings.

> Then, you better get in line behind me for arguing for much better
> VM space management in general :-)  Linux is horrible in this regard,
> and the replies I get are " ...  for efficiency you have to know the use
> of the spaces and the proper APIs to manage them ..."

That is off topic. I'm talking about a specific issue, you are making
vague generalities.

> But, no one would use that because it doesn't have the proper effect.
> If this could be done, we would already be using ioremap().

Ugh ? Can you explain why it "doesn't have the proper effect" ?

> > Dan's point about io_block_mapping() supposedly "initializing"
> > ioremap_bot is bogus, unless I misunderstood him.
> 
> I never said that, but if you look at the code, it's exactly what it 
> does :-)

No. If you read properly, you'll see that it will _not_ initialize it if
it is 0, because the test virt < ioremap_bot will never be true (both
are unsigned long) before MMU_init() is called.
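
A small user-space model of that test may help.  It is only a sketch:
push_down_ioremap_bot() and the EXAMPLE_* values are made up for
illustration, uint32_t stands in for a 32-bit unsigned long, and only
the comparison is modelled, not the mapping itself.

```c
#include <stdint.h>

#define EXAMPLE_KERNELBASE   0xc0000000u  /* illustrative value */
#define EXAMPLE_IOREMAP_BOT  0xfe000000u  /* illustrative post-MMU_init() value */

/* Stand-in for the kernel variable: it lives in bss, so it starts at 0. */
static uint32_t ioremap_bot;

/* Models only the "push down" test: returns 1 if ioremap_bot was moved. */
static int push_down_ioremap_bot(uint32_t virt)
{
    if (virt > EXAMPLE_KERNELBASE && virt < ioremap_bot) {
        ioremap_bot = virt;
        return 1;
    }
    return 0;  /* always taken while ioremap_bot == 0: no unsigned value is < 0 */
}
```

Called while ioremap_bot is still 0, the function leaves it at 0 for
any virt.  Once ioremap_bot holds a real value, a lower virt moves it
down as intended.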

> Any mappings done between the top of user space and bottom of
> the kernel are simply forced and ignored by any Linux VM.  The
> io_block_mapping() is used to allocate BATs and CAMs and make
> them available for ioremap() of devices.  It allows us to map various
> devices into the ioremap space, take advantage of the efficiency of
> BATs or large page mappings, and still have devices use the ioremap()
> to find them.

Damn. What I am saying is that it's plain wrong to mess around with the
space between TASK_SIZE and KERNELBASE and we should tie them together.
I still don't see any reason why we couldn't have io_block_mapping()
use the ioremap_bot technique to "allocate" virtual space dynamically at
the top of the address space. So far, none of your arguments contradicts
that.
 
> As I keep saying, somehow you have to lay out the virtual to physical
> mapping of devices using the efficiency of BATs and CAMs, and still
> make the ioremap() interface work.  The device driver just calls 
> ioremap(),
> but if you have a smart board set up function, it can set up an 
> efficient
> mapping using BATs or CAMs rather than 4k pages requiring TLB
> exceptions.

I don't see how that would be changed/affected in any way by making
io_block_mapping() capable of dynamically allocating its virtual
space...

> We can either make ioremap() really complex with knowledge of all of
> these board configuration options so it can set up the BATs and CAMs,
> or we set it all up using some functions (like io_block_mapping) in the
> board set up and keep ioremap() a simple function.

I am not arguing that. Did you actually read my last mail ? I've
effectively given up on deprecating io_block_mapping() completely at
this point, but aim to change it so that it allocates its virtual space
using the ioremap_bot technique instead of hard-coding it. That
technique can work at any time during boot, as early as you want.

> The current implementation of io_block_mapping does two very important
> functions.  One is this set up of efficient mapping for ioremap(), and 
> the
> other is to utilize the kernel VM space that isn't managed by Linux.

It isn't managed by Linux not because linux is bad, but because the
platforms do a stupid setup. Just move down KERNELBASE and linux will
happily manage that space for ioremap. 

> We are currently moving lots of the code to make use of ioremap() rather
> than assuming prior mapping, which is a nice thing, but it's costing us
> in terms of performance and resource utilization.

You can still eventually use an io_block_mapping() then to "optimize"
the mappings to some critical HW resources (PIC ?), I just don't want
these v->p mappings to be hard coded.

Ben.


* Re: RFC: Deprecating io_block_mapping
  2005-05-25 21:44                   ` Benjamin Herrenschmidt
@ 2005-05-26  6:00                     ` Dan Malek
  2005-05-26  6:20                       ` Eugene Surovegin
                                         ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Dan Malek @ 2005-05-26  6:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, linuxppc-embedded


On May 25, 2005, at 5:44 PM, Benjamin Herrenschmidt wrote:

> This is a VERY BAD habit. Just set KERNELBASE to 0x80000000 if you do
> that, and use io_block_mapping() dynamically the way I explained to 
> alloc
> from the top of the address space.

Well, back when kernelbase was assumed in too many places
to be 0xc0000000, it was the only option.

> Well, the PCI IO space base just needs to be in a global that is
> referenced by _IO_BASE, it works fine, no need to hard code a mapping.

Sure you do.  No one ioremap()s PCI IO space.  It has to be hard wired
somewhere.

> PCI memory space doesn't rely on any of this unless your platform code
> is really screwy

To take advantage of BAT or CAM mapping, you need to wire these
entries in conjunction with the way you have configured your PCI
bridges.  You also have to set up the arrays so ioremap() will find 
them.

> Ugh ? Can you explain why it "doesn't have the proper effect" ?

Because you need a way to wire a virtual address mapping to
a physical space before you have any way of allocating virtual space.
It's a chicken/egg problem, since ioremap_bot doesn't have a value
until someone has set it, and you don't know where to set it until
the board set up has determined the mapping from configuration
options.

> No. If you read properly, you'll see that it will _not_ initialize it 
> if
> it is 0, because the test virt < ioremap_bot will never be true (both
> are unsigned long) before MMU_init() is called.

I think we are talking past each other.  The reason for that code
in io_block_mapping() is so you can set multiple BAT/CAM entries,
and the lowest (smallest) wired virtual address becomes the base
of that space.  This way, you can use io_block_mapping() to set
up BATs and CAMs, and get ioremap() to use them if set.

The io_block_mapping() has never been used to request
virtual space; its purpose is to wire virtual address spaces so
others can use them.  If all you are doing is requesting an
arbitrary virtual address to be allocated, just use ioremap().

> Damn. What I am saying is that it's plain wrong to mess around with the
> space between TASK_SIZE and KERNELBASE and we should tie them together.

Can we do that now?  The reason it wasn't done in the past was because
of the Prep memory map, our PCI configuration, and assumptions of the
macros/functions that managed those spaces.

> I still don't see any reason why we couldn't have io_block_mapping()
> use the ioremap_bot technique to "allocate" virtual space dynamically 
> at
> the top of the address space. So far, none of your arguments 
> contradicts
> that.

You are missing the point.  The reason for io_block_mapping() isn't
to allocate virtual space for someone, it's to _wire_ a space using an
efficient mapping method so someone else can call ioremap() and
get that wired access.  Based upon various configuration options,
the board set up functions call io_block_mapping() to set up these
spaces.  Then, ioremap() just finds them in the BAT or CAM array
and says, "oh, it's a wired entry, I'll just compute the virtual address
and return that."  Unless someone tells ioremap() there are BATs or
CAMs, it won't use them.
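
As a rough illustration of that lookup, here is a user-space sketch.
The table, record_wired() and lookup_wired() are hypothetical names and
do not reproduce the kernel's own BAT/CAM bookkeeping; uint32_t stands
in for a 32-bit unsigned long.

```c
#include <stddef.h>
#include <stdint.h>

/* One wired (BAT/CAM style) block: a contiguous virt -> phys window. */
struct wired_block {
    uint32_t virt;
    uint32_t phys;
    uint32_t size;
};

#define MAX_WIRED 8  /* arbitrary table size for the sketch */

static struct wired_block wired[MAX_WIRED];
static size_t nr_wired;

/* What the board set up would do: record a wired block. 0 on success. */
static int record_wired(uint32_t virt, uint32_t phys, uint32_t size)
{
    if (nr_wired == MAX_WIRED)
        return -1;
    wired[nr_wired++] = (struct wired_block){ virt, phys, size };
    return 0;
}

/* What the ioremap() side would do: if [phys, phys + len) is fully
 * covered by a wired block, compute the virtual address; otherwise
 * return 0 so the caller falls back to ordinary page mappings. */
static uint32_t lookup_wired(uint32_t phys, uint32_t len)
{
    for (size_t i = 0; i < nr_wired; i++) {
        const struct wired_block *b = &wired[i];

        if (phys >= b->phys && len <= b->size &&
            phys - b->phys <= b->size - len)
            return b->virt + (phys - b->phys);
    }
    return 0;
}
```

For example, after record_wired(0xf0000000, 0x80000000, 0x10000000), a
lookup of physical 0x80001000 yields virtual 0xf0001000, and a lookup
outside the block yields 0.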


> I don't see how that would be changed/affected in any way by making
> io_block_mapping() capable of dynamically allocating its virtual
> space...

Ugh.  How do you know how much space is available?  How do you
know what to wire using BATs or CAMs?  Someone has to do that.
The io_block_mapping isn't a replacement to ioremap(), nor is
ioremap() a replacement for io_block_mapping().  They work together
to provide wired virtual address mapping.  In essence, 
io_block_mapping()
tells ioremap() about wired entries.

> ....  but aim to change it so that it allocates its virtual space

But, it can't.  That's the whole purpose of the function, to determine
how much IO space can be mapped with BATs or CAMs, then to
remove that space from the vmalloc pool and wire it into a contiguous
space that can be covered by large mapping.

This is why it makes no sense to call io_block_mapping() with a zero
for a virtual address and ask it to allocate some arbitrary space.

> You can still eventually use an io_block_mapping() then to "optimize"
> the mappings to some critical HW resources (PIC ?), I just don't want
> these v->p mappings to be hard coded.

What do you mean by "optimize"?  The whole purpose here is to
force a mapping using a BAT or CAM.  You can't do that with arbitrary
pages from vmalloc space.  You have to force the alignment and
size of the space, and the only way to do that is by simply removing
from the top of the vmalloc pool and giving it to ioremap().

Thanks.

	-- Dan


* Re: RFC: Deprecating io_block_mapping
  2005-05-26  6:00                     ` Dan Malek
@ 2005-05-26  6:20                       ` Eugene Surovegin
  2005-05-26 19:00                         ` Dan Malek
  2005-05-26  6:41                       ` Benjamin Herrenschmidt
  2005-05-26 16:31                       ` Matt Porter
  2 siblings, 1 reply; 29+ messages in thread
From: Eugene Surovegin @ 2005-05-26  6:20 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-embedded, linuxppc-dev list

On Thu, May 26, 2005 at 02:00:11AM -0400, Dan Malek wrote:
> 
> >Well, the PCI IO space base just needs to be in a global that is
> >referenced by _IO_BASE, it works fine, no need to hard code a mapping.
> 
> Sure you do.  No one ioremap()s PCI IO space.  It has to be hard wired
> somewhere.

Dan, you must be kidding. 44x ioremaps PCI IO space, 40x uses 
io_block_mapping for that, but this is just brain-damaged and should 
be fixed.

What is so special about PCI IO space that it must be "wired" ?

[snip]

> You are missing the point.  The reason for io_block_mapping() isn't
> to allocate virtual space for someone, it's to _wire_ a space using an
> efficient mapping method so someone else can call ioremap() and
> get that wired access.  

Wow, this is something new for me. So you are saying that 
io_block_mapping() was supposed to be used with ioremap()?

Could you point me to the port which actually does this?

So far I only saw io_block_mapping() used as a short-cut way _NOT_ to 
ioremap and get hard-coded v:p mapping and then use this knowledge to 
access physical address directly without ever calling ioremap(). And 
this is major source of problems.

Also, by this logic, if platform doesn't have BAT or CAMs or whatever, 
which effectively prevents creating this "efficient mapping" and 
hence stated purpose of io_block_mapping cannot be achieved, 
io_block_mapping() should be eliminated on this platform, right?

-- 
Eugene


* Re: RFC: Deprecating io_block_mapping
  2005-05-26  6:00                     ` Dan Malek
  2005-05-26  6:20                       ` Eugene Surovegin
@ 2005-05-26  6:41                       ` Benjamin Herrenschmidt
  2005-05-26 19:32                         ` Dan Malek
  2005-05-26 20:30                         ` Mark A. Greer
  2005-05-26 16:31                       ` Matt Porter
  2 siblings, 2 replies; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2005-05-26  6:41 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-dev list, linuxppc-embedded


> > Well, the PCI IO space base just needs to be in a global that is
> > referenced by _IO_BASE, it works fine, no need to hard code a mapping.
> 
> Sure you do.  No one ioremap()s PCI IO space.  It has to be hard wired
> somewhere.

And how does pmac do it ? And chrp ? It's ioremap'ed by the host bridge
init code, and _IO_BASE is set to that value. That even works with
multiple domains using pointer arithmetic.
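
The idea can be sketched in user space as follows.  Everything here is
illustrative: pci_io_base, set_pci_io_base(), sketch_inb(),
sketch_outb() and domain_port() are made-up names, and a plain buffer
plays the role of the ioremap'ed window.

```c
#include <stdint.h>

/* Base of the (already mapped) PCI IO window, set at bridge init time.
 * A variable, not a compile-time constant. */
static volatile uint8_t *pci_io_base;

/* What the host bridge init code would do with the ioremap() result. */
static void set_pci_io_base(volatile uint8_t *mapped)
{
    pci_io_base = mapped;
}

/* Port accessors: the port number is an offset from the base. */
static uint8_t sketch_inb(unsigned long port)
{
    return pci_io_base[port];
}

static void sketch_outb(uint8_t val, unsigned long port)
{
    pci_io_base[port] = val;
}

/* A second domain's window is reached by folding the distance between
 * the two mappings into the port number. */
static unsigned long domain_port(volatile uint8_t *window, unsigned long port)
{
    return (unsigned long)(window - pci_io_base) + port;
}
```

Drivers keep using plain port numbers; only the base, and for other
domains the offset added to the port, depends on where the mapping
ended up.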

> To take advantage of BAT or CAM mapping, you need to wire these
> entries in conjunction with the way you have configured your PCI
> bridges.  You also have to set up the arrays so ioremap() will find 
> them.

io_block_mapping() as a way to set a BAT/CAM to speed up access to some
PCI resources makes sense, and ioremap() can, I think, already find out
whether this is BAT-mapped or not. I don't see any way that can prevent
what I proposed.

> > Ugh ? Can you explain why it "doesn't have the proper effect" ?
> 
> Because you need a way to wire a virtual address mapping to
> a physical space before you have any way of allocating virtual space.

No, no, no.

> It's a chicken/egg problem, since ioremap_bot doesn't have a value
> until someone has set it, and you don't know where to set it until
> the board set up has determined the mapping from configuration
> options.

No.

ioremap_bot is set by MMU_init() and nowhere else, and to a constant
value (depending on HIGHMEM) and thus can perfectly be initialized
statically instead. It is _NOT_ initialized by io_block_mapping() as I
explained already.


> > No. If you read properly, you'll see that it will _not_ initialize it 
> > if
> > it is 0, because the test virt < ioremap_bot will never be true (both
> > are unsigned long) before MMU_init() is called.
> 
> I think we are talking past each other.  The reason for that code
> in io_block_mapping() is so you can set multiple BAT/CAM entries,
> and the lowest (smallest) wired virtual address becomes the base
> of that space.  This way, you can use io_block_mapping() to set
> up BATs and CAMs, and get ioremap() to use them if set.

Ugh ??? Damn, you don't seem to understand that code at all.

But first, let's clear things up since you are mixing several different
things at least here. Let's make a list of _facts_ :

 - ioremap_bot. This is the "top" of the vmalloc space, and it can be
"pushed down" to make room for early allocations of virtual space (for
things like early ioremap) before full initialisation. This is
initialized by MMU_init(), which is called _after_ machine_init(), but
as I noted, it could/should be initialized statically instead. It is
initialized before ppc_md.setup_io_mappings() is called as well.

 - io_block_mapping() will push-down ioremap_bot as well if the mapping
requests a virtual address above KERNELBASE and below the current
ioremap_bot value. This is obviously necessary or further unrelated
vmalloc/ioremap's may be "allocated" to virtual addresses overlapping
the io_block_mapping() which is wrong. 

So there is nothing, I mean absolutely _nothing_ that prevents us from
"improving" io_block_mapping() so that it can request virtual space by
pushing ioremap_bot down the way ioremap() does when called early.

 - Both io_block_mapping() and ioremap() can/will call map_page(). The
former on CPUs without BATs or CAMs (or running out of them), the latter
if the mapping requested is for a physical address that wasn't already
covered by a BAT or CAM. The only requirement for map_page() to work is
to be able to do pte_alloc_kernel(), which is implemented such that
when !mem_init_done, it does an early_get_page() which works using the
mem_pieces mechanism when bootmem hasn't been initialized yet ... That
means that io_block_mapping() and ioremap() both _WORK_ at any time
after MMU_init(), or more specifically, from the time
ppc_md.setup_io_mappings() has been called. ioremap will simply allocate
virtual space by pushing ioremap_bot down, and I'm simply proposing to
add this ability to io_block_mapping() too. That will _not_ change
anything to the fact that io_block_mapping's done on BATs/CAMs will be
"recorded" in the array and thus further ioremap's will be able to use
that etc etc etc...

 - There is _one_ important point to keep in mind, but that has always
been true: None of this works before MMU_init(), so we may want to add some
BUG_ON() in there. BUG_ON(ioremap_bot == 0) would do the trick. Just in
case somebody tries to call these from platform_init().
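
Putting the facts above together, the proposed behaviour could look
roughly like this user-space sketch.  It is not the kernel code:
alloc_io_virt() and sketch_block_mapping() are hypothetical names,
assert() stands in for BUG_ON(), uint32_t stands in for a 32-bit
unsigned long, and only the virtual address bookkeeping is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel variable; must be set before any call. */
static uint32_t ioremap_bot;

/* Carve size bytes, aligned to align (a power of two), off the top of
 * the vmalloc space by pushing ioremap_bot down. */
static uint32_t alloc_io_virt(uint32_t size, uint32_t align)
{
    assert(ioremap_bot != 0);   /* i.e. BUG_ON(ioremap_bot == 0) */
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(size != 0 && size <= ioremap_bot);

    ioremap_bot = (ioremap_bot - size) & ~(align - 1);
    return ioremap_bot;
}

/* virt == 0 means "pick an address for me"; otherwise keep the legacy
 * behaviour and push ioremap_bot below the hard-coded address. */
static uint32_t sketch_block_mapping(uint32_t virt, uint32_t size,
                                     uint32_t align)
{
    if (virt == 0)
        return alloc_io_virt(size, align);

    assert(ioremap_bot != 0);
    if (virt < ioremap_bot)
        ioremap_bot = virt;
    return virt;
}
```

Starting from an illustrative ioremap_bot of 0xfe000000, a 256M request
aligned to 256M comes back at 0xe0000000, and later requests land below
it.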

So now, putting all the above together, what do we see ?

That the only real difference between io_block_mapping() and ioremap()
are:

 - The former allows you to set up hard-coded v->p mappings (but I've
shown several times now that it shouldn't be necessary anymore)

 - The former can use a "BAT" or "CAM" instead of page tables which can
be of some use for performance.

Now, what about:

 1) Adding to io_block_mapping() the ability to alloc dynamically the
virtual space. That would have 0 impact on drivers using ioremap (they
would still "benefit" from the speed up if they happen to map something
already covered by a BAT/CAM). _IO_BASE can already be a variable. Only
a few platforms hard-coding other things may need to be changed (but
that is step 1, no worry...)

 2) Getting rid of remaining hard coded mappings (heh, slowly, not all
at once, but that shouldn't be terribly difficult) and finally deprecate
the usage of io_block_mapping() with a hard coded virtual address.

 3) Heh, no more of these ? Cool ... Now the only remaining use of
io_block_mapping() is to setup those "fast" blocks to speed up some IOs.
Can we think about something like ... ahem... A special flag to pass to
__ioremap() that would make it use BATs/CAMs (the first one who says
that is complicated goes back to school, please !)

 4) ppc_md.setup_io_mappings is just a few instructions & 2 printk's
away from setup_arch(). (The common setup_arch). ppc_md.setup_arch() is
not that far away either: ocp_init is done before, and xmon/kgdb stuff.
Do we really need that early callback _that_ early still ? can't
setup_arch() be early enough ?

> The io_block_mapping() has never been used to request
> virtual space, it's purpose is to wire virtual address spaces so
> others can use them.  If all you are doing is requesting an
> arbitrary virtual address to be allocated, just use ioremap().

No, this is not the _purpose_ of io_block_mapping(), though it's what it
does. Its purpose was lost in history :) There should be no reason ever
to want to wire virtual space. Which means the only purpose left here is
to setup mappings that are "faster" than normal page tables for critical
things.

> > Damn. What I am saying is that it's plain wrong to mess around with the
> > space between TASK_SIZE and KERNELBASE and we should tie them together.
> 
> Can we do that now?  The reason it wasn't done in the past was because
> of the Prep memory map, our PCI configuration, and assumptions of the
> macros/functions that managed those spaces.

PowerMac did that kind of cruft too, I fixed it in about 15 minutes a
few years ago. It's just a matter of killing those constants that
contain a virtual address and use variables instead. It's not like PCI
IO space or config space was a performance sensitive thing anyway :)

> You are missing the point.  The reason for io_block_mapping() isn't
> to allocate virtual space for someone, it's to _wire_ a space using an
> efficient mapping method so someone else can call ioremap() and
> get that wired access.  

But it can do that without wiring the virtual space ! That is my point.
That someone else will call ioremap with a physical address, ioremap
will figure out that it matches one of the BATs/CAMs and will return the
appropriate virtual address where this BAT/CAM is virtually mapped.
Whether that actual virtual address was hard-coded when
io_block_mapping() was called, or allocated dynamically by moving
ioremap_bot is totally irrelevant and doesn't change anything to the
effect on the driver calling ioremap.

> Based upon various configuration options,
> the board set up functions call io_block_mapping() to set up these
> spaces.  Then, ioremap() just finds them in the BAT or CAM array
> and says, "oh, it's a wired entry, I'll just compute the virtual address
> and return that."  Unless someone tells ioremap() there are BATs or
> CAMs, it won't use them.

And ? How does that invalidate any of my previous statements ?
> 
> > I don't see how that would be changed/affected in any way by making
> > io_block_mapping() capable of dynamically allocating its virtual
> > space...
> 
> Ugh.  How do you know how much space is available?  How do you
> know what to wire using BATs or CAMs?  Someone has to do that.

Good, good, I'm not totally dismissing the usefulness of wiring some
memory using BATs and CAMs, just the fact that the virtual address is
hard coded. I don't see what the problem is with available space. That
space will be taken anyway. Since io_block_mapping() tends to be the
first thing done, it will get nicely aligned chunks of memory from
ioremap_bot, and there will be no wastage. Doesn't need to be hard coded
to obtain that result.
 
> The io_block_mapping isn't a replacement to ioremap(), nor is
> ioremap() a replacement for io_block_mapping().  They work together
> to provide wired virtual address mapping.  In essence, 
> io_block_mapping()
> tells ioremap() about wired entries.

Oh well, I've already explained what I think above, no need to repeat
myself again.

> > ....  but aim to change it so that it allocates its virtual space
> 
> But, it can't.

But it CAN.

> That's the whole purpose of the function, to determine
> how much IO space can be mapped with BATs or CAMs, then to
> remove that space from the vmalloc pool and wire it into a contiguous
> space that can be covered by large mapping.

And what prevents the virtual address of that space from being
calculated instead of hard coded ?

> This is why it makes no sense to call io_block_mapping() with a zero
> for a virtual address and ask it to allocate some arbitrary space.

No, you don't make sense.

> What do you mean by "optimize"?  The whole purpose here is to
> force a mapping using a BAT or CAM.  You can't do that with arbitrary
> pages from vmalloc space.  You have to force the alignment and
> size of the space, and the only way to do that is by simply removing
> from the top of the vmalloc pool and giving it to ioremap().

Just having io_block_mapping() moving down ioremap_bot (with appropriate
alignment of course) would just do the trick :) Again, no need for the
caller to hard code that address.

Ben.


* Re: RFC: Deprecating io_block_mapping
  2005-05-26  6:00                     ` Dan Malek
  2005-05-26  6:20                       ` Eugene Surovegin
  2005-05-26  6:41                       ` Benjamin Herrenschmidt
@ 2005-05-26 16:31                       ` Matt Porter
  2005-05-26 16:54                         ` Eugene Surovegin
  2 siblings, 1 reply; 29+ messages in thread
From: Matt Porter @ 2005-05-26 16:31 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-embedded, linuxppc-dev list

On Thu, May 26, 2005 at 02:00:11AM -0400, Dan Malek wrote:
> 
> On May 25, 2005, at 5:44 PM, Benjamin Herrenschmidt wrote:
> 
> > This is a VERY BAD habit. Just set KERNELBASE to 0x80000000 if you do
> > that, and use io_block_mapping() dynamically the way I explained to 
> > alloc
> > from the top of the address space.
> 
> Well, back when kernelbase was assumed in too many places
> to be 0xc0000000, it was the only option.
> 
> > Well, the PCI IO space base just needs to be in a global that is
> > referenced by _IO_BASE, it works fine, no need to hard code a mapping.
> 
> Sure you do.  No one ioremap()s PCI IO space.  It has to be hard wired
> somewhere.

As BenH and Eugene already pointed out, this is simply not true. Most
maintained ports have completely stomped out use of io_block_mapping()
where it's not necessary. Which is everywhere on 4xx and 8xx. The
only place it serves any purpose at all is on green book and motbooke
processors.
 
<big snip>

> You are missing the point.  The reason for io_block_mapping() isn't
> to allocate virtual space for someone, it's to _wire_ a space using an
> efficient mapping method so someone else can call ioremap() and
> get that wired access.  Based upon various configuration options,
> the board set up functions call io_block_mapping() to set up these
> spaces.  Then, ioremap() just finds them in the BAT or CAM array
> and says, "oh, it's a wired entry, I'll just compute the virtual address
> and return that."  Unless someone tells ioremap() there are BATs or
> CAMs, it won't use them.

Why don't we try a different approach to the problem? The problem is
that io_block_mapping() is causing a ton of problems with people
abusing it. Just check the archives for all the ways people break
their ports by passing it arbitrary values.  The other issue is
that although it's dangerous, the call still serves a purpose on
those processors with BATs and CAMs. So, let's kill io_block_mapping().
i.e. the version that allows virt->phys translations to be set up
without use of BATs and CAMs. Let's add a new mmu_block_mapping()
call that will ONLY map using a BAT or CAM and is only available
on platforms with those facilities. If a free BAT or CAM is not available
or alignment/size is invalid, the call fails. I would hope that would
make everybody happy.

We still end up with a call that will help people shoot themselves
in the foot, but at least we limit it to a specific task.
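
A possible shape for such a call, as a user-space sketch only.
mmu_block_mapping() does not exist yet, so the name
sketch_mmu_block_mapping(), the slot count and the minimum size are all
assumptions; a real version would program the BAT or CAM registers
where this one just fills a table.

```c
#include <stdint.h>

#define NR_BLOCK_SLOTS  4         /* assumed number of free BATs/CAMs */
#define MIN_BLOCK_SIZE  0x20000u  /* assumed minimum block size (128K) */

struct block_slot {
    uint32_t virt;
    uint32_t phys;
    uint32_t size;
    int used;
};

static struct block_slot slots[NR_BLOCK_SLOTS];

static int is_pow2(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Map only via a block slot.  Returns the slot index on success, -1 if
 * the size or alignment is invalid or no slot is free.  There is no
 * fallback to page tables. */
static int sketch_mmu_block_mapping(uint32_t virt, uint32_t phys,
                                    uint32_t size)
{
    if (!is_pow2(size) || size < MIN_BLOCK_SIZE)
        return -1;
    if ((virt & (size - 1)) != 0 || (phys & (size - 1)) != 0)
        return -1;

    for (int i = 0; i < NR_BLOCK_SLOTS; i++) {
        if (!slots[i].used) {
            slots[i] = (struct block_slot){ virt, phys, size, 1 };
            return i;
        }
    }
    return -1;
}
```

The point of the design is the failure path: a bad size, a misaligned
address or a lack of free slots is reported to the caller instead of
silently becoming a page-table mapping.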

-Matt


* Re: RFC: Deprecating io_block_mapping
  2005-05-26 16:31                       ` Matt Porter
@ 2005-05-26 16:54                         ` Eugene Surovegin
  0 siblings, 0 replies; 29+ messages in thread
From: Eugene Surovegin @ 2005-05-26 16:54 UTC (permalink / raw)
  To: Matt Porter; +Cc: linuxppc-dev list, linuxppc-embedded

On Thu, May 26, 2005 at 09:31:19AM -0700, Matt Porter wrote:
> Why don't we try a different approach to the problem? The problem is
> that io_block_mapping() is causing a ton of problems with people
> abusing it. Just check the archives for all the ways people break
> their ports by passing it arbitrary values.  The other issue is
> that although it's dangerous, the call still serves a purpose on
> those processors with BATs and CAMs. So, let's kill io_block_mapping().
> i.e. the version that allows virt->phys translations to be set up
> without use of BATs and CAMs. Let's add a new mmu_block_mapping()
> call that will ONLY map using a BAT or CAM and is only available
> on platforms with those facilities. If a free BAT or CAM is not available
> or alignment/size is invalid, the call fails. I would hope that would
> make everybody happy.
> 
> We still end up with a call that will help people shoot themselves
> in the foot, but at least we limit it to a specific task.

I second that.

-- 
Eugene


* Re: RFC: Deprecating io_block_mapping
  2005-05-26  6:20                       ` Eugene Surovegin
@ 2005-05-26 19:00                         ` Dan Malek
  2005-05-26 21:54                           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Dan Malek @ 2005-05-26 19:00 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-dev list, linuxppc-embedded


On May 26, 2005, at 2:20 AM, Eugene Surovegin wrote:

> Dan, you must be kidding. 44x ioremaps PCI IO space, 40x uses
> io_block_mapping for that, but this is just brain-damaged and should
> be fixed.

Anyone that uses assumptions of mapping from io_block_mapping()
should be fixed, unless it is absolutely necessary, like to get an
early serial port for debugging.  I've been guilty of this as well,
but everyone should be calling ioremap().

> What is so special about PCI IO space that it must be "wired" ?

My point was someone needs to set up BATs or CAMs for that
space and further configure it so the in/out macros work.  Drivers
don't call ioremap for such space (although I think they should).
Assuming mappings exist, whether by calling ioremap() or
otherwise, is Linux legacy and shouldn't be done.

> Wow, this is something new for me. So you are saying that
> io_block_mapping() was supposed to be used with ioremap()?

Except in rare circumstances, it's the way I've always used
it to set up IO mapping on embedded boards.

> Could you point me to the port which actually does this?

I don't see any of them in the 2.6 right now, but the EP and STx ports
I did should do this.  I certainly plan to use it on all of the 85xx
ports so we get the advantage of CAM mapping for various IO spaces.

> So far I only saw io_block_mapping() used as a short-cut way _NOT_ to
> ioremap and get hard-coded v:p mapping and then use this knowledge to
> access physical address directly without ever calling ioremap(). And
> this is major source of problems.

That's just wrong.  The only "acceptable" io_block_mapping without
an ioremap has usually been the PCI IO space.  It gets abused for
early serial ports, but that's a hack no matter how you look at it :-)

> Also, by this logic, if platform doesn't have BAT or CAMs or whatever,
> which effectively prevents creating this "efficient mapping" and
> hence stated purpose of io_block_mapping cannot be achieved,
> io_block_mapping() should be eliminated on this platform, right?

Depends.  If you need early ioremap() prior to VM initialized so
a vmalloc will work, you have to do this even if you do call ioremap().
The 8xx ports usually do this, and ioremap() for these boards should
know to not try to allocate VM space if this was done.  Most of the 8xx
ports still skip the ioremap() because the associated code changes
to ioremap() never stuck in the code.

The problem with ioremap() if you don't wire some spaces is it will try
to allocate VM space prior to the Linux VM being initialized.  We all want
to write drivers so they use the proper ioremap() interfaces, but many
drivers call this too early if you haven't made the provisions for
ioremap() to find the "efficient" map and just return the associated address.
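
The "find the efficient map and return the associated address" step can be pictured with a small lookup. This is only a sketch under stated assumptions: the `wired` table, its two entries, and the function name `lookup_wired_mapping` are hypothetical, and a real ioremap() would consult whatever BAT/CAM records the platform keeps.

```c
#include <stdint.h>
#include <stddef.h>

/* One wired (block-mapped) region: virt and phys bases plus length. */
struct block_map {
    uint32_t virt;
    uint32_t phys;
    uint32_t size;
};

/* Hypothetical table filled in earlier by board setup code. */
static const struct block_map wired[] = {
    { 0xf0000000u, 0xf0000000u, 0x01000000u },
    { 0xe0000000u, 0x80000000u, 0x10000000u },
};

/*
 * If [phys, phys + size) lies entirely inside a wired region, return the
 * matching virtual address; otherwise return 0 so the caller can fall
 * back to a normal page-table mapping.
 */
uint32_t lookup_wired_mapping(uint32_t phys, uint32_t size)
{
    for (size_t i = 0; i < sizeof(wired) / sizeof(wired[0]); i++) {
        const struct block_map *m = &wired[i];

        if (size == 0 || size > m->size)
            continue;
        /* Written this way to avoid overflow in phys + size. */
        if (phys >= m->phys && phys - m->phys <= m->size - size)
            return m->virt + (phys - m->phys);
    }
    return 0;
}
```

A driver that calls ioremap() early would then get back an address inside the wired region without any new VM space being allocated.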

Thanks.


	-- Dan

* Re: RFC: Deprecating io_block_mapping
  2005-05-26  6:41                       ` Benjamin Herrenschmidt
@ 2005-05-26 19:32                         ` Dan Malek
  2005-05-26 22:10                           ` Benjamin Herrenschmidt
  2005-05-26 20:30                         ` Mark A. Greer
  1 sibling, 1 reply; 29+ messages in thread
From: Dan Malek @ 2005-05-26 19:32 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, linuxppc-embedded


On May 26, 2005, at 2:41 AM, Benjamin Herrenschmidt wrote:

> io_block_mapping() as a way to set a BAT/CAM to speed up access to some
> PCI resources make sense, and ioremap() can I think already find out
> whether this is BAT-mapped or not. I don't see any way that can prevent
> what I proposed.

And that's the only way it should be used.  Calling io_block_mapping()
with a zero virtual address makes no sense.

> ioremap_bot is set by MMU_init() and nowhere else, and to a constant
> value (depending on HIGHMEM) and thus can perfectly be initialized
> statically instead. It is _NOT_ initialized by io_block_mapping() as I
> explained already.

I'm only going to say this once more with an example.  A call to
io_block_mapping() may change ioremap_bot.   If ioremap_bot
is set to say 0xf0000000, and someone says ".. I need to allocate
more VM and phys space to my IO to make sure my devices are
covered "  will do an io_block_mapping(0xe0000000, 0xe0000000, size, flags).
This will then push ioremap_bot down to 0xe0000000 to ensure
calls made to ioremap() won't multiply map this space if the Linux
VM has not been initialized.
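
The push-down behaviour in this example can be modelled in a few lines of user-space C. This is a sketch, not the kernel code: the starting value 0xf0000000 comes from the example above, while the `KERNELBASE` value, the 4 KB page rounding, and the function names are assumptions made for illustration.

```c
#include <stdint.h>

#define KERNELBASE 0xc0000000u   /* assumed value for this sketch */
#define PAGE_MASK_SKETCH 0xfffu  /* assumed 4 KB pages */

/* Bottom of the early I/O mapping area; grows downward. */
static uint32_t ioremap_bot = 0xf0000000u;

/*
 * Model of the side effect of a fixed block mapping: if the requested
 * virtual address is above KERNELBASE and below the current bottom,
 * the bottom moves down so later allocations cannot overlap it.
 */
void note_block_mapping(uint32_t virt)
{
    if (virt > KERNELBASE && virt < ioremap_bot)
        ioremap_bot = virt;
}

/*
 * Model of an early dynamic allocation: round the size up to a page
 * and take the space just below the current bottom.
 */
uint32_t early_ioremap_alloc(uint32_t size)
{
    size = (size + PAGE_MASK_SKETCH) & ~PAGE_MASK_SKETCH;
    ioremap_bot -= size;
    return ioremap_bot;
}
```

With the values from the example, a block mapping at 0xe0000000 moves the bottom there, and the next early allocation of one page lands at 0xdffff000, below the block rather than inside it.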


>  - io_block_mapping() will push-down ioremap_bot as well if the mapping
> requests a virtual address above KERNELBASE and below the current
> ioremap_bot value. This is obviously necessary or further unrelated
> vmalloc/ioremap's may be "allocated" to virtual addresses overlapping
> the io_block_mapping() which is wrong.

So, why are you telling me I don't understand the code when this is
what I've been trying to tell you for the last several days? :-)

> So there is nothing, I mean absolutely _nothing_ that prevents us from
> "improving" io_block_mapping() so that it can request virtual space by
> pushing ioremap_bot down the way ioremap() does when called early.

The io_block_mapping() is used in conjunction with setting up BATs or
CAMs that have size and alignment restrictions.  The io_block_mapping
is not a memory allocator and isn't intended to be used as a substitute
for ioremap().  If code is doing that, fix that and stop blaming a
useful set up function.

Somewhere, someone has to know all of these alignment concerns.
I think it's just easier to use a simple call to io_block_mapping with
the values you want for a particular processor/board port than to make up
some complex scheme for computing these values that is going to
vary among the different processors.

So, just leave ppc_md.setup_io_mappings, and if a board port chooses
to modify the mappings as an extension of MMU_init, then fine.  You
can achieve the same results by calling setbat() or settlbcam() and
managing those resources yourself, or you can get the advantage of
using io_block_mapping() to do it.  In the end, you have to allow
this to be done, so instead of calling io_block_mapping() I'll just
make all of the board set up functions call the appropriate functions
and update ioremap_bot, just like io_block_mapping() does.

>  - There is _one_ important point to keep in mind, but that has always
> been true: None of this work before MMU_init(), we may want to add some
> BUG_ON() in there. BUG_ON(ioremap_bot == 0) would do the trick. Just in
> case somebody tries to call these from platform_init().

There are various other horrible hacks we do to accommodate this, too :-)

> That the only real difference between io_block_mapping() and ioremap()
> are:
>
>  - The former allows you to setup hard code v->p mappings (but I've
> showed several times now that it shouldn't be necessary anymore)

As I have said, it is necessary for the proper alignment and allocation
of VM space so BATs/CAMs work and someone else doesn't multiply
map the space.

>  - The former can use a "BAT" or "CAM" instead of page tables which can
> be of some use for performances.

This is extremely important and something we have always done.
We already have too many performance issues with 2.6 to continually
disregard these features.

>  1) Adding to io_block_mapping() the ability to alloc dynamically the
> virtual space. That would have 0 impact on drivers using ioremap

Yes, it would have a big impact because you can't map BATs/CAMs
to arbitrary addresses.

> ... A special flag to pass to
> __ioremap() that would make it use BATs/CAMs (the first one who says
> that is complicated goes back to school, please !)

It is complicated, and I've spent more time in school than your age :-)
How do you know how many BATs are available?  How big? How
much to allocate?  In real-time embedded systems you need limits
and known resource allocation areas.  Often, these embedded systems
need careful tuning to make everything fit in the address spaces,
something that isn't going to be known or likely to be done correctly.

You need to work on a real production system that has resource
limitations.  Yes, we do count bytes of space used by the kernel, IO,
applications, flash, ram, everything, and they try to make it fit.
Functions like these are critical to make it happen.

If you don't want to use them, fine, but please don't be taking away
important features for embedded systems just because you don't
see a use for them or know how to use them correctly.

Anyway, I'm done.  If you want to remove it, then please fix up and test
all boards that use it.  Be prepared to see it emerge again when I
need this feature in embedded systems.

Thanks.


	-- Dan

* Re: RFC: Deprecating io_block_mapping
  2005-05-26  6:41                       ` Benjamin Herrenschmidt
  2005-05-26 19:32                         ` Dan Malek
@ 2005-05-26 20:30                         ` Mark A. Greer
  2005-05-26 22:13                           ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 29+ messages in thread
From: Mark A. Greer @ 2005-05-26 20:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, linuxppc-embedded

Benjamin Herrenschmidt wrote:

> - There is _one_ important point to keep in mind, but that has always
>been true: None of this work before MMU_init(), 
>  
>

This is very true and raises a couple issues that we should fix while 
we're at it:

1) There are progress calls in MMU_init which will try to access the 
uart before it's possible to create a mapping to the uart's regs 
(assuming you don't make a hack to map them and that you set up 
ppc_md.progress in your platform_init routine).  We should either get 
rid of those calls in MMU_init, provide an acceptable way to make 
temporary pre-MMU_init mappings, or make sure nobody sets up 
ppc_md.progress until ioremap is working (and also get rid of the calls 
in MMU_init b/c they're never used).

2) Some firmwares don't provide any info on how much memory is in the 
system but MMU_init needs to know that.  So the platform code has to 
read the SPD from the mem sticks via i2c, read the mem ctlr, or read a 
board reg that has the info.  All of those require access to hw regs 
before or during MMU_init.  I should be able to get rid of this one by 
figuring out the amount of memory in the bootwrapper and passing it in 
to the kernel.  I am assuming that all the boards with this problem use 
the bootwrapper.  I think that's a safe assumption but I'll have to verify.

BTW, these are the reasons that I made that set_bat hack that Dan is so 
fond of.  :)  I'll get rid of that hack but I need an answer to 1) first.

Mark

* Re: RFC: Deprecating io_block_mapping
  2005-05-26 19:00                         ` Dan Malek
@ 2005-05-26 21:54                           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2005-05-26 21:54 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-dev list, linuxppc-embedded

On Thu, 2005-05-26 at 15:00 -0400, Dan Malek wrote:

> The problem with ioremap() if you don't wire some spaces is it will try
> to allocate VM space prior to the Linux VM being initialized.

No it won't. It will just push ioremap_bot down, and has always done
that.

Ben.

* Re: RFC: Deprecating io_block_mapping
  2005-05-26 19:32                         ` Dan Malek
@ 2005-05-26 22:10                           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2005-05-26 22:10 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-dev list, linuxppc-embedded


> > ioremap_bot is set by MMU_init() and nowhere else, and to a constant
> > value (depending on HIGHMEM) and thus can perfectly be initialized
> > statically instead. It is _NOT_ initialized by io_block_mapping() as I
> > explained already.
> 
> I'm only going to say this once more with an example.  A call to
> io_block_mapping() may change ioremap_bot.   If ioremap_bot
> is set to say 0xf0000000, and someone says ".. I need to allocate
> more VM and phys space to my IO to make sure my devices are
> covered "  will do an io_block_mapping(0xe0000000, 0xe0000000, size, flags).
> This will then push ioremap_bot down to 0xe0000000 to ensure
> calls made to ioremap() won't multiply map this space if the Linux
> VM has not been initialized.

Damn, thanks for repeating what I've been explaining for 3 mails.
(Sorry, I admit, my wording above shouldn't have been "ioremap_bot is
set" but "ioremap_bot is initialized").

io_block_mapping() does _nothing_ much different than ioremap in that
regard, it just "pushes it down" to avoid further conflicts. That is
100% compatible with dynamically allocating.

> >  - io_block_mapping() will push-down ioremap_bot as well if the mapping
> > requests a virtual address above KERNELBASE and below the current
> > ioremap_bot value. This is obviously necessary or further unrelated
> > vmalloc/ioremap's may be "allocated" to virtual addresses overlapping
> > the io_block_mapping() which is wrong.
> 
> So, why are you telling me I don't understand the code when this is
> what I've been trying to tell you for the last several days? :-)

That is what _I_ have been trying to tell you, dammit ! That and the
fact that ioremap does the exact same thing :)

> The io_block_mapping() is used in conjunction with setting up BATs or
> CAMs that have size and alignment restrictions.  The io_block_mapping
> is not a memory allocator and isn't intended to be used as a substitute
> for ioremap().  If code is doing that, fix that and stop blaming a 
> useful set up function.

Ugh ? This is off topic. Dynamically allocating the virtual space
doesn't in any way prevent setting up BAT or CAMs ...

> Somewhere, someone has to know all of these alignment concerns.

Yup, and io_block_mapping() does know, and that doesn't prevent it from
allocating virtual addresses dynamically. How many times should I repeat
the same thing before you get it ? In French maybe ?

> I think it's just easier to use a simple call to io_block_mapping with 
> the values you want for a particular processor/board port than to make up
> some complex scheme for computing these values that is going to
> vary among the different processors.

More complex scheme ? Ugh ... A mask ! that is complex ?!?! In fact, you
move the complexity to the writer of the support code :)

> So, just leave ppc_md.setup_io_mappings, and if a board port chooses
> to modify the mappings as an extension of MMU_init, then fine.  You
> can achieve the same results by calling setbat() or settlbcam() and
> managing those resources yourself, or you can get the advantage of
> using io_block_mapping() to do it.  In the end, you have to allow
> this to be done, so instead of calling io_block_mapping() I'll just
> make all of the board set up functions call the appropriate functions
> and update ioremap_bot, just like io_block_mapping() does.

Ugh ? I can't make any sense of the above. It looks like you are trying
hard not to understand what I'm saying and proposing.

> >  - There is _one_ important point to keep in mind, but that has always
> > been true: None of this work before MMU_init(), we may want to add some
> > BUG_ON() in there. BUG_ON(ioremap_bot == 0) would do the trick. Just in
> > case somebody tries to call these from platform_init().
> 
> There are various other horrible hacks we do to accommodate this, too :-)

No, there are not. It doesn't work. Calling io_block_mapping or ioremap
before MMU_init() will screw you up. Period. 

> > That the only real difference between io_block_mapping() and ioremap()
> > are:
> >
> >  - The former allows you to setup hard code v->p mappings (but I've
> > showed several times now that it shouldn't be necessary anymore)
> 
> As I have said, it is necessary for the proper alignment and allocation
> of VM space so BATs/CAMs work and someone else doesn't multiply
> map the space.
> 
No it's not. First you are AGAIN mixing two different things.
Alignment, and multiple allocation of the virtual space.

 - Alignment can be dealt with very easily. First, top-align the size (we
have to do that anyway), and then, do ioremap_bot -= size; and finally,
down-align ioremap_bot, and miracle ! you get your new virtual address !

 - Multiple allocation of virtual space: that is a non issue since we
are moving ioremap_bot down. That's also what ioremap does. There is NO
problem here, unless you try calling them before MMU_init() of course.
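
The three alignment steps (round the size up, subtract it from ioremap_bot, then align ioremap_bot down) can be sketched as follows. This is a user-space model under stated assumptions: block sizes are taken to be powers of two between 128 KB and 256 MB, the starting ioremap_bot value is arbitrary, and `alloc_block_virt` is a hypothetical name.

```c
#include <stdint.h>

#define MIN_BLOCK_SIZE (128u * 1024u)          /* assumed smallest block */
#define MAX_BLOCK_SIZE (256u * 1024u * 1024u)  /* assumed largest block */

/* Bottom of the early I/O mapping area (arbitrary start for the sketch). */
static uint32_t ioremap_bot = 0xf0000000u;

/* Round size up to the next usable block size, or 0 if it cannot fit. */
static uint32_t round_up_block_size(uint32_t size)
{
    uint32_t block = MIN_BLOCK_SIZE;

    if (size > MAX_BLOCK_SIZE)
        return 0;
    while (block < size)
        block <<= 1;
    return block;
}

/*
 * Allocate virtual space for one block mapping:
 *   1. top-align (round up) the size,
 *   2. move ioremap_bot down by that size,
 *   3. align ioremap_bot down to the block size.
 * Returns the new virtual address, or 0 on failure.
 */
uint32_t alloc_block_virt(uint32_t size)
{
    uint32_t block = round_up_block_size(size);

    if (block == 0)
        return 0;
    ioremap_bot -= block;
    ioremap_bot &= ~(block - 1);
    return ioremap_bot;
}
```

Starting from 0xf0000000, a 1 MB request yields 0xeff00000, and a following 144 MB request is rounded to 256 MB and lands at 0xd0000000. The gap left by the second step shows why allocating the largest blocks first, while ioremap_bot is still nicely aligned, wastes less address space.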

> >  - The former can use a "BAT" or "CAM" instead of page tables which can
> > be of some use for performances.
> 
> This is extremely important and something we have always done.
> We already have too many performance issues with 2.6 to continually
> disregard these features.

Nobody is disregarding that feature. You are again trying very hard not
to understand what I'm saying.

> >  1) Adding to io_block_mapping() the ability to alloc dynamically the
> > virtual space. That would have 0 impact on drivers using ioremap
> 
> Yes, it would have a big impact because you can't map BATs/CAMs
> to arbitrary addresses.

Who is talking about arbitrary addresses ? It's just a matter of
aligning down properly ioremap_bot.

> > ... A special flag to pass to
> > __ioremap() that would make it use BATs/CAMs (the first one who says
> > that is complicated goes back to school, please !)
> 
> It is complicated, and I've spent more time in school than your age :-)

Maybe too much ? :)

> How do you know how many BATs are available?

We do, we have an index in the array, and we can even scan the array for
valid bits if we want to.

> How big? How much to allocate?  

The first size that fits the requested argument to ioremap, again, very
easy.
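
Both answers can be put into code. The sketch below is only an illustration with assumed details: the `struct bat_entry` layout, the valid-bit mask, and the 128 KB to 256 MB size range are assumptions modelled on classic BAT registers, and both function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define BAT_VALID_MASK 0x3u                    /* assumed: valid bits in the upper word */
#define MIN_BLOCK_SIZE (128u * 1024u)          /* assumed smallest block */
#define MAX_BLOCK_SIZE (256u * 1024u * 1024u)  /* assumed largest block */

/* Hypothetical software copy of one BAT register pair. */
struct bat_entry {
    uint32_t upper;
    uint32_t lower;
};

/* Count entries whose valid bits are clear, i.e. still available. */
size_t count_free_bats(const struct bat_entry *bats, size_t n)
{
    size_t free_slots = 0;

    for (size_t i = 0; i < n; i++) {
        if ((bats[i].upper & BAT_VALID_MASK) == 0)
            free_slots++;
    }
    return free_slots;
}

/* Return the first (smallest) block size that covers len, or 0 if none does. */
uint32_t first_fitting_block_size(uint32_t len)
{
    if (len == 0)
        return 0;
    for (uint32_t s = MIN_BLOCK_SIZE; s != 0 && s <= MAX_BLOCK_SIZE; s <<= 1) {
        if (s >= len)
            return s;
    }
    return 0;
}
```

A caller would check `count_free_bats()` first and fall back to page tables when it returns 0 or when no block size fits.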

> In real-time embedded systems you need limits and known resource allocation areas.

Ugh ?

> Often, these embedded systems need careful tuning to make everything fit in the address
> spaces, something that isn't going to be known or likely to be done correctly.

Heh, again, ioremap_bot starts nicely aligned, so if you do your BAT
allocation (with either io_block_mapping() as I suggested or with a
modified ioremap) first thing first in setup_io_mappings(), they'll get
nicely aligned near the top of your address space and you won't "lose"
anything.

> You need to work on a real production system that has resource
> limitations.  Yes, we do count bytes of space used by the kernel, IO,
> applications, flash, ram, everything, and they try to make it fit.
> Functions like these are critical to make it happen.

Bla bla bla bla... I've heard that way too much. It's the magical excuse
against anything.

> If you don't want to use them, fine, but please don't be taking away
> important features for embedded systems just because you don't
> see a use for them or know how to use them correctly.

No, it's not an "important" feature, and it can be safely removed
without problem. 

> Anyway, I'm done.  If you want to remove it, then please fix up and test
> all boards that use it.  Be prepared to see it emerge again when I
> need this feature in embedded systems.

No, you won't need it unless you do things wrong.

Ben.

* Re: RFC: Deprecating io_block_mapping
  2005-05-26 20:30                         ` Mark A. Greer
@ 2005-05-26 22:13                           ` Benjamin Herrenschmidt
  2005-05-26 22:16                             ` Mark A. Greer
  0 siblings, 1 reply; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2005-05-26 22:13 UTC (permalink / raw)
  To: Mark A. Greer; +Cc: linuxppc-dev list, linuxppc-embedded

On Thu, 2005-05-26 at 13:30 -0700, Mark A. Greer wrote:
> Benjamin Herrenschmidt wrote:
> 
> > - There is _one_ important point to keep in mind, but that has always
> >been true: None of this work before MMU_init(), 
> >  
> >
> 
> This is very true and raises a couple issues that we should fix while 
> we're at it:
> 
> 1) There are progress calls in MMU_init which will try to access the 
> uart before it's possible to create a mapping to the uart's regs 
> (assuming you don't make a hack to map them and that you set up 
> ppc_md.progress in your platform_init routine).  We should either get 
> rid of those calls in MMU_init, provide an acceptable way to make 
> temporary pre-MMU_init mappings, or make sure nobody sets up 
> ppc_md.progress until ioremap is working (and also get rid of the calls 
> in MMU_init b/c they're never used).

Or have the implementation of progress() check if the mapping was done
or not ... In any case, I always disliked ppc_md.progress deeply. It's
ugly and clutters the code. It has never proven very useful to me vs.
having an early console.
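
For what it's worth, the "check if the mapping was done" variant would look roughly like this. It is a user-space sketch with hypothetical names (`sketch_uart_regs`, `sketch_set_uart`, `sketch_progress`); a real progress routine would write to the actual UART transmit register and wait for it to be ready.

```c
#include <stddef.h>

/* NULL until the UART registers have been mapped (hypothetical variable). */
static volatile unsigned char *sketch_uart_regs;

/* Called by platform code once a mapping for the UART exists. */
void sketch_set_uart(volatile unsigned char *regs)
{
    sketch_uart_regs = regs;
}

/*
 * Progress routine that stays silent before the mapping exists.
 * Returns 0 if the message was written, -1 if it was dropped.
 */
int sketch_progress(const char *msg)
{
    if (sketch_uart_regs == NULL)
        return -1;                 /* too early: no mapping yet */

    while (*msg != '\0')
        *sketch_uart_regs = (unsigned char)*msg++;
    return 0;
}
```

Calls made from MMU_init before the mapping exists are then dropped instead of faulting.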

> 2) Some firmwares don't provide any info on how much memory is in the 
> system but MMU_init needs to know that.  So the platform code has to 
> read the SPD from the mem sticks via i2c, read the mem ctlr, or read a 
> board reg that has the info.  All of those require access to hw regs 
> before or during MMU_init.  I should be able to get rid of this one by 
> figuring out the amount of memory in the bootwrapper and passing it in 
> to the kernel.  I am assuming that all the boards with this problem use 
> the bootwrapper.  I think that's a safe assumption but I'll have to verify.

Yes, the boot wrapper is the way to go here.

> BTW, these are the reasons that I made that set_bat hack that Dan is so 
> fond of.  :)  I'll get rid of that hack but I need an answer to 1) first.
> 
> Mark

* Re: RFC: Deprecating io_block_mapping
  2005-05-26 22:13                           ` Benjamin Herrenschmidt
@ 2005-05-26 22:16                             ` Mark A. Greer
  0 siblings, 0 replies; 29+ messages in thread
From: Mark A. Greer @ 2005-05-26 22:16 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, linuxppc-embedded

Benjamin Herrenschmidt wrote:

>On Thu, 2005-05-26 at 13:30 -0700, Mark A. Greer wrote:
>  
>
>>Benjamin Herrenschmidt wrote:
>>
>>    
>>
>>>- There is _one_ important point to keep in mind, but that has always
>>>been true: None of this work before MMU_init(), 
>>> 
>>>
>>>      
>>>
>>This is very true and raises a couple issues that we should fix while 
>>we're at it:
>>
>>1) There are progress calls in MMU_init which will try to access the 
>>uart before it's possible to create a mapping to the uart's regs 
>>(assuming you don't make a hack to map them and that you set up 
>>ppc_md.progress in your platform_init routine).  We should either get 
>>rid of those calls in MMU_init, provide an acceptable way to make 
>>temporary pre-MMU_init mappings, or make sure nobody sets up 
>>ppc_md.progress until ioremap is working (and also get rid of the calls 
>>in MMU_init b/c they're never used).
>>    
>>
>
>Or have the implementation of progress() check if the mapping was done
>or not ...
>

Doesn't seem worth it to me.

> In any case, I always disliked ppc_md.progress deeply. It's
>ugly and clutters the code. It has never proven very useful to me vs.
>having an early console.
>

Okay, let's rip it out of MMU_init then.  Anyone have a problem with that?

Mark

Thread overview: 29+ messages
2005-05-25  1:30 RFC: Deprecating io_block_mapping Benjamin Herrenschmidt
2005-05-25  2:17 ` Kumar Gala
2005-05-25  2:21   ` Benjamin Herrenschmidt
2005-05-25  2:30     ` Kumar Gala
2005-05-25  5:00       ` Dan Malek
2005-05-25  6:07         ` Pantelis Antoniou
2005-05-25  5:14     ` Dan Malek
2005-05-25  5:20       ` Benjamin Herrenschmidt
2005-05-25  5:49         ` Dan Malek
2005-05-25  6:00           ` Benjamin Herrenschmidt
2005-05-25  6:08             ` Kumar Gala
2005-05-25  7:04               ` Benjamin Herrenschmidt
2005-05-25 16:36                 ` Dan Malek
2005-05-25 21:44                   ` Benjamin Herrenschmidt
2005-05-26  6:00                     ` Dan Malek
2005-05-26  6:20                       ` Eugene Surovegin
2005-05-26 19:00                         ` Dan Malek
2005-05-26 21:54                           ` Benjamin Herrenschmidt
2005-05-26  6:41                       ` Benjamin Herrenschmidt
2005-05-26 19:32                         ` Dan Malek
2005-05-26 22:10                           ` Benjamin Herrenschmidt
2005-05-26 20:30                         ` Mark A. Greer
2005-05-26 22:13                           ` Benjamin Herrenschmidt
2005-05-26 22:16                             ` Mark A. Greer
2005-05-26 16:31                       ` Matt Porter
2005-05-26 16:54                         ` Eugene Surovegin
2005-05-25  4:48   ` Dan Malek
2005-05-25  4:45 ` Dan Malek
2005-05-25  5:15   ` Benjamin Herrenschmidt
