On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/."

public inbox for linux-sh@vger.kernel.org

* On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/."
@ 2009-03-17  7:04 Francesco VIRLINZI
  2009-03-17  7:15 ` Paul Mundt
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Francesco VIRLINZI @ 2009-03-17  7:04 UTC (permalink / raw)
  To: linux-sh

Hi Paul
I see you moved the sh-mobile suspend code into a dedicated directory.

What do you think about my proposal on a generic sh4 suspend interpreter?

Regards
 Francesco


* Re: On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/."
  2009-03-17  7:04 On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/." Francesco VIRLINZI
@ 2009-03-17  7:15 ` Paul Mundt
  2009-03-17 10:35 ` Francesco VIRLINZI
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Paul Mundt @ 2009-03-17  7:15 UTC (permalink / raw)
  To: linux-sh

On Tue, Mar 17, 2009 at 08:04:28AM +0100, Francesco VIRLINZI wrote:
> Hi Paul
> I see you moved the sh-mobile suspend code into a dedicated directory.
> 
At the moment it is SH-Mobile specific, so that is the best place for it.

> What do you think about my proposal on a generic sh4 suspend interpreter?
> 
While I don't have any real objections against the interpreter idea, this
is something you and Magnus will have to sort out. My expectation is that
this is something that will be gradually implemented and changed over
time, and we can certainly move code around whenever there is a need. For
now there are no non-shmobile parts that can use this code, so this is
the best place for it.

As we move to a more generic implementation, the SH-Mobile bits will
likely be simplified quite a bit, but the extra abstraction still makes
sense, as it is a common characteristic for this family of parts.


* Re: On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/."
  2009-03-17  7:04 On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/." Francesco VIRLINZI
  2009-03-17  7:15 ` Paul Mundt
@ 2009-03-17 10:35 ` Francesco VIRLINZI
  2009-03-17 11:31 ` Magnus Damm
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Francesco VIRLINZI @ 2009-03-17 10:35 UTC (permalink / raw)
  To: linux-sh

Hi Paul, Magnus
>>     
> While I don't have any real objections against the interpreter idea, this
> is something you and Magnus will have to sort out. My expectation is that
> this is something that will be gradually implemented and changed over
> time, and we can certainly move code around whenever there is a need. For
> now there are no non-shmobile parts that can use this code, so this is
> the best place for it.
>
> As we move to a more generic implementation, the SH-Mobile bits will
> likely be simplified quite a bit, but the extra abstraction still makes
> sense, as it is a common characteristic for this family of parts.
>   
What do you think about beginning with a collection of all the shared
requirements and constraints, and then writing the required code on top
of that?

Regards
 Francesco


* Re: On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/."
  2009-03-17  7:04 On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/." Francesco VIRLINZI
  2009-03-17  7:15 ` Paul Mundt
  2009-03-17 10:35 ` Francesco VIRLINZI
@ 2009-03-17 11:31 ` Magnus Damm
  2009-03-17 14:43 ` Francesco VIRLINZI
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Magnus Damm @ 2009-03-17 11:31 UTC (permalink / raw)
  To: linux-sh

Hi Francesco,

Sorry for not getting back to you earlier!

On Tue, Mar 17, 2009 at 7:35 PM, Francesco VIRLINZI
<francesco.virlinzi@st.com> wrote:
> What do you think about beginning with a collection of all the shared
> requirements and constraints, and then writing the required code on top
> of that?

Sure. Let's do it step by step. I like the idea of some kind of
interpreter, but I was hoping that we can agree on something a bit
simpler.

My main question is: Do you really need to be able to load and save
data from the interpreter?

Right now your implementation has multiple tables. I'd prefer to just
generate a bunch of inline operations that we either load into the
cache or put in on-chip ram depending on processor type. Each
operation would be self-contained.

What are your requirements?

Cheers,

/ magnus


* Re: On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/."
  2009-03-17  7:04 On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/." Francesco VIRLINZI
                   ` (2 preceding siblings ...)
  2009-03-17 11:31 ` Magnus Damm
@ 2009-03-17 14:43 ` Francesco VIRLINZI
  2009-03-18  9:53 ` Magnus Damm
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Francesco VIRLINZI @ 2009-03-17 14:43 UTC (permalink / raw)
  To: linux-sh

Hi Magnus
>
> Sure. Let's do it step by step. I like the idea of some kind of
> interpreter, but I was hoping that we can agree on something a bit
> simpler.
If possible, why not.
>
> My main question is: Do you really need to be able to load and save
> data from the interpreter?
This isn't really a problem or an issue.
The current code works mainly with only 2 tables:
 - an instruction table
 - a resource table

The resource table is an array of base addresses of usable resources. It
doesn't matter what the resource is: each resource can be hw or memory.
The writable table (in my convention) is simply the first element of the
ioresource table.

For this reason the interpreter has no real 'data'-oriented
instructions.
Each interpreter instruction works with the i-th element of the k-th
ioresource.

In the code I only have to take care that the 'writable' table is in the
Dcache; after that, from the interpreter's point of view the writable
table and a hw resource are perfectly equivalent.
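
As an illustration of the two-table scheme described above, here is a minimal sketch in C. It is not the code from the patches under discussion: the opcode names, the `pm_insn` layout and the `pm_run()`/`pm_demo()` helpers are all hypothetical, and plain arrays stand in for the hardware blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; the real instruction set is not shown in this thread. */
enum pm_opcode { PM_OP_WRITE, PM_OP_OR, PM_OP_AND, PM_OP_END };

/* One entry of the instruction table: it names the i-th element of the
 * k-th resource instead of carrying an absolute address. */
struct pm_insn {
    enum pm_opcode op;
    unsigned int   res;    /* k: index into the resource table */
    unsigned int   idx;    /* i: 32-bit word index inside that resource */
    uint32_t       value;  /* operand */
};

/* The resource table is an array of base addresses. Entry 0 plays the role
 * of the writable table; the other entries would be hardware blocks. */
static void pm_run(const struct pm_insn *prog,
                   volatile uint32_t *const *resources, size_t nres)
{
    for (; prog->op != PM_OP_END; prog++) {
        assert(prog->res < nres);
        volatile uint32_t *reg = resources[prog->res] + prog->idx;

        switch (prog->op) {
        case PM_OP_WRITE: *reg = prog->value;  break;
        case PM_OP_OR:    *reg |= prog->value; break;
        case PM_OP_AND:   *reg &= prog->value; break;
        case PM_OP_END:   break;
        }
    }
}

/* Example: resource 0 is scratch memory, resource 1 is a fake clock block. */
uint32_t pm_demo(void)
{
    static uint32_t scratch[4];
    static uint32_t fake_clk[4];
    volatile uint32_t *const resources[] = { scratch, fake_clk };
    static const struct pm_insn prog[] = {
        { PM_OP_WRITE, 1, 2, 0x0000ff00u },
        { PM_OP_OR,    1, 2, 0x00000001u },
        { PM_OP_AND,   1, 2, 0x00000f01u },
        { PM_OP_WRITE, 0, 0, 0x12345678u },
        { PM_OP_END,   0, 0, 0 },
    };

    pm_run(prog, resources, 2);
    return scratch[0] == 0x12345678u ? fake_clk[2] : 0;
}
```

The interpreter loop never needs to know whether `reg` points at memory or at a register, which is the equivalence mentioned above.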

>
> Right now your implementation has multiple tables.

>  I'd prefer to just
> generate a bunch of inline operations that we either load into the
> cache or put in on-chip ram depending on processor type. Each
> operation would be self-contained.
I agree, and for this reason I wrote several macros on top of the
instructions to simplify the table creation. But I'd like to maintain the
current granularity, because 'ad-hoc' instructions could be too hw
dependent and we could need to add other asm code to support any new hw.
With this instruction granularity, a new hw means only some other
macros to build the right instructions the new hw needs.
>
> What are your requirements?
In the ST40 we have no ipref instruction, and in the latest version the
RAM mode cache support was removed.
Therefore the code has to be preloaded into the Icache with the jumper
mechanism I wrote (any other suggestion?).
Moreover I don't want to set a constraint on how many resources the
interpreter can use.
This means that the interpreter should be able to manage (theoretically)
any hw resource, not only the DRAM; for example, I strongly need access
to the Clocks-IP.

Regards
 Francesco


* Re: On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/."
  2009-03-17  7:04 On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/." Francesco VIRLINZI
                   ` (3 preceding siblings ...)
  2009-03-17 14:43 ` Francesco VIRLINZI
@ 2009-03-18  9:53 ` Magnus Damm
  2009-03-18 14:17 ` Francesco VIRLINZI
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Magnus Damm @ 2009-03-18  9:53 UTC (permalink / raw)
  To: linux-sh

Hi Francesco,

Thanks for your email and all your patches. Sorry for slow response.

On Tue, Mar 17, 2009 at 11:43 PM, Francesco VIRLINZI
<francesco.virlinzi@st.com> wrote:
>> My main question is: Do you really need to be able to load and save
>> data from the interpreter?
>
> This isn't really a problem or an issue.

From a functional point of view I'm sure your interpreter is just
fine. We just need to discuss how to do it in a simple and powerful
way. Maybe your patch is all that already, but if so I need time to
understand it.

Things like this take time. I must have made at least 5 different
intc prototypes before I got something that was simple and still
powerful enough. I hope we can come up with something a bit quicker
this time though. =)

> The current code works mainly with only 2 tables:
> - an instruction table
> - a resource table
>
> The resource table is an array of base addresses of usable resources. It
> doesn't matter what the resource is: each resource can be hw or memory.
> The writable table (in my convention) is simply the first element of the
> ioresource table.
>
> For this reason the interpreter has no real 'data'-oriented
> instructions.
> Each interpreter instruction works with the i-th element of the k-th
> ioresource.

So you may pass the base address of some hardware block as a resource,
and then use offset from the base address to get to registers inside
the hardware block. Or did I misunderstand?

>>  I'd prefer to just
>> generate a bunch of inline operations that we either load into the
>> cache or put in on-chip ram depending on processor type. Each
>> operation would be self-contained.
>
> I agree, and for this reason I wrote several macros on top of the
> instructions to simplify the table creation. But I'd like to maintain the
> current granularity, because 'ad-hoc' instructions could be too hw
> dependent and we could need to add other asm code to support any new hw.
> With this instruction granularity, a new hw means only some other
> macros to build the right instructions the new hw needs.

I feel these multiple levels of macros and tables are a bit confusing.
Can't we just create a set of operations where each operation is fixed
size? No tables needed, everything is just put in instruction cache.

If it were up to me, then I'd go with a set of simple operations that
read and write data in memory at some address. The address should be
immediate for the operation. Those operations can point to either a
hardware register or some memory.
Maybe loading immediate and some bit manipulation operations would be
good to have as well. And you had some wait operation as well.
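
A minimal sketch of what such a set of simple operations could look like in C. No code existed at this point in the thread, so everything here is an assumption: the `sop` names, the particular set of operations, and the bounded polling loop used for the wait operation. Each operation is fixed size and carries its target address as an immediate, so the same operation can point at a hardware register or at ordinary memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation set: write, bit set/clear, wait, end marker. */
enum sop_kind { SOP_WRITE32, SOP_SET_BITS, SOP_CLEAR_BITS, SOP_WAIT_BITS, SOP_END };

/* Fixed-size, self-contained operation with an immediate address. */
struct sop {
    enum sop_kind      kind;
    volatile uint32_t *addr;   /* hardware register or memory */
    uint32_t           mask;
    uint32_t           value;
};

#define SOP_WAIT_LIMIT 1000u   /* arbitrary bound so the sketch cannot hang */

/* Runs operations until SOP_END; returns 0 on success, -1 on wait timeout. */
static int sop_run(const struct sop *op)
{
    for (; op->kind != SOP_END; op++) {
        assert(op->addr != NULL);

        switch (op->kind) {
        case SOP_WRITE32:    *op->addr = op->value;  break;
        case SOP_SET_BITS:   *op->addr |= op->mask;  break;
        case SOP_CLEAR_BITS: *op->addr &= ~op->mask; break;
        case SOP_WAIT_BITS: {
            unsigned int tries = 0;

            /* Poll until the masked bits match the expected value. */
            while ((*op->addr & op->mask) != op->value)
                if (++tries >= SOP_WAIT_LIMIT)
                    return -1;
            break;
        }
        case SOP_END:
            break;
        }
    }
    return 0;
}

/* Example: a static variable stands in for a hardware register. */
uint32_t sop_demo(void)
{
    static volatile uint32_t fake_reg;
    const struct sop prog[] = {
        { SOP_WRITE32,    &fake_reg, 0,     0x10u },
        { SOP_SET_BITS,   &fake_reg, 0x03u, 0     },
        { SOP_CLEAR_BITS, &fake_reg, 0x10u, 0     },
        { SOP_WAIT_BITS,  &fake_reg, 0x03u, 0x03u },
        { SOP_END,        NULL,      0,     0     },
    };

    return sop_run(prog) == 0 ? fake_reg : 0xffffffffu;
}
```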

>> What are your requirements?
>
> In the ST40 we have no ipref instruction, and in the latest version the
> RAM mode cache support was removed.
> Therefore the code has to be preloaded into the Icache with the jumper
> mechanism I wrote (any other suggestion?).

I'd go with fixed sized operations where one operation is maximum one
cache line. And the first part of each operation has an instruction
that just does a jump to the beginning of the next operation (next or
same cache line). This jumping will go on until a magic end operation
is reached. That's how I would preload the operations.

Each operation has two entry points, one for preloading and another
one for running the actual code.
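
The preload idea can be modelled in plain C, purely as an illustration. On real hardware the first pass would be a chain of jump instructions whose fetches pull each line into the instruction cache; in this sketch a data read of each slot stands in for that fetch. The 32-byte line size, the `op_slot` layout and all function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u   /* assumed cache line size in bytes */

enum { SLOT_END = 0, SLOT_OP = 1 };

/* One operation per slot; a slot is at most one cache line. */
struct op_slot {
    uint8_t kind;                     /* SLOT_OP, or the magic end operation */
    uint8_t payload[CACHE_LINE - 1];  /* stands in for code and data */
};

_Static_assert(sizeof(struct op_slot) == CACHE_LINE,
               "one operation per cache line");

/* First entry point: touch every slot up to and including the end slot,
 * without doing any work. Returns the number of lines touched. */
static size_t preload_pass(const struct op_slot *slot, size_t max_slots)
{
    size_t lines = 1;

    while (slot->kind != SLOT_END) {
        assert(lines < max_slots);    /* guard against a missing end slot */
        slot++;
        lines++;
    }
    return lines;
}

/* Second entry point: actually execute each operation. Here an operation
 * just adds its first payload byte to an accumulator. */
static unsigned int run_pass(const struct op_slot *slot)
{
    unsigned int acc = 0;

    for (; slot->kind != SLOT_END; slot++)
        acc += slot->payload[0];
    return acc;
}

/* Example: three operations followed by the end marker. */
unsigned int preload_demo(void)
{
    static const struct op_slot prog[] = {
        { SLOT_OP,  { 1 } },
        { SLOT_OP,  { 2 } },
        { SLOT_OP,  { 3 } },
        { SLOT_END, { 0 } },
    };
    size_t lines = preload_pass(prog, sizeof prog / sizeof prog[0]);

    return (unsigned int)lines * 100u + run_pass(prog);
}
```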

Together with this we need code that generates the operations for us.
A bit like a relocator or linker. I imagine that each operation is
implemented in assembly, and data needed by the operation is located
in the end at a fixed offset from the beginning of the operation. Then
we have C functions that just copy the code into a destination
address and fills in the data with whatever needed.

And for each processor or board we have some code that calls the C
functions that generate the operations for us. The arguments to these
functions can be anything.

A bit like QEMU. =)
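
A rough sketch of the generator side, assuming a 32-byte operation whose data lives at a fixed offset near the end. The template bytes are placeholders rather than real SH machine code, and `emit_write32()`, the offsets and the register addresses in the example are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_SIZE        32u   /* assumed fixed operation size */
#define OP_DATA_OFFSET 24u   /* assumed fixed offset of the data words */

/* Placeholder for the assembled code of one operation. In a real
 * implementation this would come from an assembly file. */
static const uint8_t op_write32_template[OP_SIZE] = { 0xAA, 0xAA };

/* Copy the template to dst and fill in its data, a bit like a relocator.
 * Returns the address where the next operation should be emitted. */
static uint8_t *emit_write32(uint8_t *dst, uint32_t addr, uint32_t value)
{
    assert(dst != NULL);
    memcpy(dst, op_write32_template, OP_SIZE);
    memcpy(dst + OP_DATA_OFFSET, &addr, sizeof addr);
    memcpy(dst + OP_DATA_OFFSET + sizeof addr, &value, sizeof value);
    return dst + OP_SIZE;
}

/* Example of per-board code: emit two operations, then read back the
 * data of the second one. The addresses are made up for illustration. */
uint32_t gen_demo(void)
{
    uint8_t buf[2 * OP_SIZE];
    uint8_t *end = buf;
    uint32_t addr, value;

    end = emit_write32(end, 0xa4150000u, 0x00000001u);
    end = emit_write32(end, 0xa4150004u, 0x00000080u);

    memcpy(&addr, buf + OP_SIZE + OP_DATA_OFFSET, sizeof addr);
    memcpy(&value, buf + OP_SIZE + OP_DATA_OFFSET + sizeof addr, sizeof value);

    if (end != buf + sizeof buf || buf[OP_SIZE] != 0xAA || addr != 0xa4150004u)
        return 0;
    return value;
}
```

Because the data sits at a fixed offset, the generator needs no knowledge of the operation's code beyond its size.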

> Moreover I don't want to set a constraint on how many resources the
> interpreter can use.
> This means that the interpreter should be able to manage (theoretically)
> any hw resource, not only the DRAM; for example, I strongly need access
> to the Clocks-IP.

I agree it's best not to limit. But I think regular compiled C will
always be more efficient than interpreting, so I would go with as few
operations as possible.

Wouldn't a 32-bit pointer work well for you?

So basically I'd like to get rid of your resource + offset
abstraction and go with regular addresses. What do you think?

Cheers,

/ magnus


* Re: On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/."
  2009-03-17  7:04 On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/." Francesco VIRLINZI
                   ` (4 preceding siblings ...)
  2009-03-18  9:53 ` Magnus Damm
@ 2009-03-18 14:17 ` Francesco VIRLINZI
  2009-03-19  5:39 ` Magnus Damm
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Francesco VIRLINZI @ 2009-03-18 14:17 UTC (permalink / raw)
  To: linux-sh

Hi Magnus
>> For this reason in the interpreter there is no real instruction 'data'
>> oriented...
>> Each interpreter instruction works with the i-th element of an k-th
>> ioresources.
>>     
>
> So you may pass the base address of some hardware block as a resource,
> and then use offset from the base address to get to registers inside
> the hardware block. Or did I misunderstand?
>   
No, you didn't. You are right: the interpreter can manage registers inside
the hardware block.
> I'd go with fixed sized operations where one operation is maximum one
> cache line. And the first part of each operation has an instruction
> that just does a jump to the beginning of the next operation (next or
> same cache line). This jumping will go on until a magic end operation
> is reached. That's how I would preload the operations.
>   
Ok.
> Each operation has two entry points, one for preloading and another
> one for running the actual code.
>
> Together with this we need code that generates the operations for us.
> A bit like a relocator or linker. I imagine that each operation is
> implemented in assembly, and data needed by the operation is located
> in the end at a fixed offset from the beginning of the operation. Then
> we have C functions that just copy the code into a destination
> address and fills in the data with whatever needed.
>
> And for each processor or board we have some code that calls the C
> functions that generate the operations for us. The arguments to these
> functions can be anything.
>
> A bit like QEMU. =)
>
>
> Wouldn't a 32-bit pointer work well for you?
>   
Yes
> So basically I'd like to get rid of your resource + offset
> abstraction and go with regular addresses. What do you think?
>   
The main reason for 'resource + offset' was a different PMB usage
model in our kernel.
But most probably, with your "relocator - linker" at runtime, I should be
able to close every issue.


Do you already have some stuff/prototype?
Regards
 Francesco


* Re: On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/."
  2009-03-17  7:04 On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/." Francesco VIRLINZI
                   ` (5 preceding siblings ...)
  2009-03-18 14:17 ` Francesco VIRLINZI
@ 2009-03-19  5:39 ` Magnus Damm
  2009-03-19  7:17 ` Francesco VIRLINZI
  2009-03-23 10:29 ` Magnus Damm
  8 siblings, 0 replies; 10+ messages in thread
From: Magnus Damm @ 2009-03-19  5:39 UTC (permalink / raw)
  To: linux-sh

Hi Francesco!

On Wed, Mar 18, 2009 at 11:17 PM, Francesco VIRLINZI
<francesco.virlinzi@st.com> wrote:
> Hi Magnus
>>>
>>> For this reason in the interpreter there is no real instruction 'data'
>>> oriented...
>>> Each interpreter instruction works with the i-th element of an k-th
>>> ioresources.
>>>
>>
>> So you may pass the base address of some hardware block as a resource,
>> and then use offset from the base address to get to registers inside
>> the hardware block. Or did I misunderstand?
>>
>
> No, you didn't. You are right: the interpreter can manage registers inside the
> hardware block.

Good. Then we're on the same page!

>> I'd go with fixed sized operations where one operation is maximum one
>> cache line. And the first part of each operation has an instruction
>> that just does a jump to the beginning of the next operation (next or
>> same cache line). This jumping will go on until a magic end operation
>> is reached. That's how I would preload the operations.
>>
> Ok.

Do you think the above would work in your case?

>> So basically I'd like to get rid of your resource + offset
>> abstraction and go with regular addresses. What do you think?
>>
>
> The main reason for 'resource + offset' was a different PMB usage
> model in our kernel.
> But most probably, with your "relocator - linker" at runtime, I should be
> able to close every issue.
>
>
> Do you already have some stuff/prototype?

No, no code exists at this point. I hope to find some time to hack up
a prototype next week. We probably need a couple of iterations before
we both are happy so I suspect the interpreter stuff would be 2.6.31
material. Is that ok, or do you have any special requirements from
your side?

I also need to focus on fixing up the clock framework and cpuidle. And
cpufreq of course. Many things. =)

Cheers,

/ magnus


* Re: On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/."
  2009-03-17  7:04 On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/." Francesco VIRLINZI
                   ` (6 preceding siblings ...)
  2009-03-19  5:39 ` Magnus Damm
@ 2009-03-19  7:17 ` Francesco VIRLINZI
  2009-03-23 10:29 ` Magnus Damm
  8 siblings, 0 replies; 10+ messages in thread
From: Francesco VIRLINZI @ 2009-03-19  7:17 UTC (permalink / raw)
  To: linux-sh

Hi Magnus
>> No, you didn't. You are right: the interpreter can manage registers inside the
>> hardware block.
>>     
>
> Good. Then we're on the same page!
>   
What do you mean by 'page'? I don't see a 'page' issue as long as we
don't use the TLB.

>   
>>> I'd go with fixed sized operations where one operation is maximum one
>>> cache line. And the first part of each operation has an instruction
>>> that just does a jump to the beginning of the next operation (next or
>>> same cache line). This jumping will go on until a magic end operation
>>> is reached. That's how I would preload the operations.
>>>       
> Do you think the above would work in your case?
>   
Yes, I think so.
> No, no code exists at this point. I hope to find some time to hack up
> a prototype next week. We probably need a couple of iterations before
> we both are happy so I suspect the interpreter stuff would be 2.6.31
> material. Is that ok, or do you have any special requirements from
> your side?
>   
It isn't a real problem for me.
In February we did our first kernel release with pm support, and I can
maintain this code until the official pm support is integrated in the
kernel; after that I will (slowly...) move our SoCs to the new pm
infrastructure.
Let me say that I have my 'recovery' solution, therefore the time
required for investigation/design/prototype isn't a problem.

> I also need to focus on fixing up the clock framework
What about the clock framework?
I'd like integrated support for standby (as I did for hibernation...
even though I know that standby is more difficult than hibernation).

Regards
 Francesco


* Re: On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/."
  2009-03-17  7:04 On "sh: Consolidate SH-Mobile CPU code in arch/sh/kernel/cpu/shmobile/." Francesco VIRLINZI
                   ` (7 preceding siblings ...)
  2009-03-19  7:17 ` Francesco VIRLINZI
@ 2009-03-23 10:29 ` Magnus Damm
  8 siblings, 0 replies; 10+ messages in thread
From: Magnus Damm @ 2009-03-23 10:29 UTC (permalink / raw)
  To: linux-sh

On Thu, Mar 19, 2009 at 4:17 PM, Francesco VIRLINZI
<francesco.virlinzi@st.com> wrote:
> What do you mean by 'page'? I don't see a 'page' issue as long as we
> don't use the TLB.

Hehe, sorry for my unclear language. I didn't mean page as memory
management unit. =)

>> No, no code exists at this point. I hope to find some time to hack up
>> a prototype next week. We probably need a couple of iterations before
>> we both are happy so I suspect the interpreter stuff would be 2.6.31
>> material. Is that ok, or do you have any special requirements from
>> your side?
>>
>
> It isn't a real problem for me.
> In February we did our first kernel release with pm support, and I can
> maintain this code until the official pm support is integrated in the
> kernel; after that I will (slowly...) move our SoCs to the new pm
> infrastructure.
> Let me say that I have my 'recovery' solution, therefore the time
> required for investigation/design/prototype isn't a problem.

Good. Sounds like you will be able to spend some time in the future.

>> I also need to focus on fixing up the clock framework
>
> What about the clock framework?
> I'd like integrated support for standby (as I did for hibernation...
> even though I know that standby is more difficult than hibernation).

I agree. What more is needed for standby in your opinion?

Cheers,

/ magnus

PS. If you have time, please fix the clock framework warning in sh-2.6
introduced by the suspend code (I think).
