LinuxPPC-Dev Archive on lore.kernel.org
* Re: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
From: Roland Dreier @ 2006-05-09 18:36 UTC (permalink / raw)
  To: Shirley Ma
  Cc: linux-kernel, openib-general, linuxppc-dev, Christoph Raisch,
	Hoang-Nam Nguyen, Marcus Eder, openib-general-bounces,
	Michael S. Tsirkin
In-Reply-To: <OF22D08323.20D303C1-ON87257169.0063C980-88257169.006A41AB@us.ibm.com>

    Shirley> I have done some patch like that on top of splitting
    Shirley> CQ. The problem I found that hardware interrupt favors
    Shirley> one CPU. Most of the time these two threads are running
    Shirley> on the same cpu according to my debug output. You can
    Shirley> easily find out by cat /proc/interrupts and
    Shirley> /proc/irq/XXX/smp_affinity.  ehca has distributed
    Shirley> interrupts evenly on SMP, so it gets the benefits of two
    Shirley> threads, and gains much better throughputs.

Yes, an interrupt will likely be delivered to one CPU.

But there's no reason why the two threads can't be pinned to different
CPUs or given exclusive CPU masks, exactly the same way that ehca
implements it.

 - R.


* IMAP_ADDR on PPC 8xx
From: Kenneth Poole @ 2006-05-09 18:38 UTC (permalink / raw)
  To: linuxppc-embedded


In our build, (currently based on 2.6.14.3) we define IMAP_ADDR as
follows:

#define	IMAP_ADDR	(((bd_t *)__res)->bi_immr_base)

With very few exceptions, nearly all driver code that dereferences
IMAP_ADDR can be used unchanged and the IMMR value is always the value
passed up from the bootloader. We build one image that runs on multiple
platforms and some platforms place the IMMR address space at different
addresses than others. It's not a constant.

Regardless, I see little reason to ioremap() the IMMR address. The MMU
is set up in such a way that IMMR based locations can be accessed
directly.
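
A minimal user-space sketch of the run-time IMAP_ADDR idea described above. The cut-down bd_t, the boot_res variable (standing in for the kernel's __res buffer), and the helpers save_board_info() and immr_reg() are assumptions made for illustration; they are not taken from the actual patch.

```c
/* Hypothetical, cut-down board-info structure; the real bd_t passed up by
 * the bootloader carries many more fields. */
typedef struct bd_info {
    unsigned long bi_memstart;   /* start of DRAM */
    unsigned long bi_memsize;    /* size of DRAM in bytes */
    unsigned long bi_immr_base;  /* base address of the IMMR space */
} bd_t;

/* Stand-in for the kernel's __res copy of the bootloader data. */
bd_t boot_res;

/* IMAP_ADDR becomes a run-time lookup instead of a compile-time constant. */
#define IMAP_ADDR (boot_res.bi_immr_base)

/* Save whatever the bootloader reported for this particular board. */
void save_board_info(const bd_t *from_bootloader)
{
    boot_res = *from_bootloader;
}

/* Address of a register at `offset` inside the IMMR space
 * (hypothetical helper, for illustration only). */
unsigned long immr_reg(unsigned long offset)
{
    return IMAP_ADDR + offset;
}
```

With this arrangement the same image yields different register addresses on boards whose bootloaders report different IMMR bases, which is the behaviour described above.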

Ken Poole, MRV Communications, Inc.

> In a related vein, as I alluded to in my previous email, there has been
> previous discussion on this list about the fact that it is frowned upon
> for device drivers to directly dereference IMAP_ADDR.  Instead, I've
> seen a recommendation that each individual driver perform an io_remap()
> operation on IMAP_ADDR and save the resulting kernel virtual address in
> a driver-specific data structure.  Is this a universally-accepted
> viewpoint?  Is this something that the community agrees "should be
> fixed" and we're just waiting for someone (like me) to volunteer to fix
> all the drivers?
>
> Or are there arguments in favor of keeping the direct IMAP_ADDR
> dereferences?  For example, if each driver performs its own separate
> io_remap(), doesn't that have potentially negative consequences on the
> VM system for some PPC implementations (e.g. increased contention for
> TLB entries)?
>
> Are these issues addressed by or otherwise impacted by other ongoing PPC
> Linux work such as the "ppc" + "ppc64" --> "powerpc" effort / "flat
> device tree" stuff???



* Re: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
From: Shirley Ma @ 2006-05-09 18:44 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-kernel, openib-general, linuxppc-dev, Christoph Raisch,
	Hoang-Nam Nguyen, Marcus Eder, openib-general-bounces,
	Michael S. Tsirkin
In-Reply-To: <adar733avvs.fsf@cisco.com>


Roland Dreier <rdreier@cisco.com> wrote on 05/09/2006 11:36:07 AM:

> But there's no reason why the two threads can't be pinned to different
> CPUs or given exclusive CPU masks, exactly the same way that ehca
> implements it.
> 
>  - R.

I could try this. Let's see how much the latency increases.

Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638


* Re: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
From: Michael S. Tsirkin @ 2006-05-09 18:44 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Roland Dreier, linux-kernel, openib-general, linuxppc-dev,
	Christoph Raisch, Hoang-Nam Nguyen, Marcus Eder,
	openib-general-bounces
In-Reply-To: <OF22D08323.20D303C1-ON87257169.0063C980-88257169.006A41AB@us.ibm.com>

Quoting r. Shirley Ma <xma@us.ibm.com>:
> According to some results from different resources, NAPI only gives 3%-10% performance improvement on single CQ.

When you say performance you mean bandwidth.
But I think it should improve the CPU utilization on RX side significantly.
If it does, that's an important metric as well.

> I am trying a simple NAPI patch on splitting CQ now to see how much performance there.

What are you using for a benchmark?

-- 
MST


* Re: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
From: Shirley Ma @ 2006-05-09 18:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Roland Dreier, linux-kernel, openib-general, linuxppc-dev,
	Christoph Raisch, Hoang-Nam Nguyen, Marcus Eder,
	openib-general-bounces
In-Reply-To: <20060509184451.GF22825@mellanox.co.il>


"Michael S. Tsirkin" <mst@mellanox.co.il> wrote on 05/09/2006 11:44:52 AM:

> Quoting r. Shirley Ma <xma@us.ibm.com>:
> > According to some results from different resources, NAPI only gives
> > 3%-10% performance improvement on single CQ.
>
> When you say performance you mean bandwidth.
> But I think it should improve the CPU utilization on RX side
> significantly.
> If it does, that's an important metric as well.

No, CPU utilization wasn't reduced. When you use single CQ, NAPI polls on 
both RX/TX.

> > I am trying a simple NAPI patch on splitting CQ now to see how
> > much performance there.
> 
> What are you using for a benchmark?
> 
> -- 
> MST

netperf, iperf, mpstat, netpipe, oprofiling, what's your suggestion?

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638






* Re: Re: [PATCH 07/16] ehca: interrupt handling routines
From: Michael S. Tsirkin @ 2006-05-09 18:55 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Roland Dreier, linux-kernel, openib-general, linuxppc-dev,
	Christoph Raisch, Hoang-Nam Nguyen, Marcus Eder,
	openib-general-bounces
In-Reply-To: <OF9332FF11.38007290-ON87257169.00673F71-88257169.006C7F6E@us.ibm.com>

Quoting r. Shirley Ma <xma@us.ibm.com>:
> No, CPU utilization wasn't reduced. When you use single CQ, NAPI polls on both RX/TX.

I think NAPI's point is to reduce the interrupt rate.
Wouldn't this reduce CPU load?

> netperf, iperf, mpstat, netpipe, oprofiling, what's your suggestion?

netperf has -C which gives CPU load, which is handy.
Running vmstat in another window also works reasonably well.

-- 
MST


* Re: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
From: Heiko J Schick @ 2006-05-09 18:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Roland Dreier, linux-kernel, openib-general, linuxppc-dev,
	Christoph Raisch, Hoang-Nam Nguyen, Marcus Eder
In-Reply-To: <20060509164919.GC5063@mellanox.co.il>

On 09.05.2006, at 18:49, Michael S. Tsirkin wrote:

>> The trivial way to do it would be to use the same idea as the current
>> ehca driver: just create a thread for receive CQ events and a thread
>> for send CQ events, and defer CQ polling into those two threads.
>
> For RX, isn't this basically what NAPI is doing?
> Only NAPI seems better, avoiding interrupts completely and avoiding
> latency hit by only getting triggered on high load ...

Does NAPI schedule CQ callbacks on different CPUs, or does the callback
(handling of data, etc.) stay on the same CPU where the interrupt came in?

Regards,
	Heiko


* Re: [PATCH] powerpc: whitespace cleanup in reg.h
From: Michael Neuling @ 2006-05-09 19:01 UTC (permalink / raw)
  To: jschopp; +Cc: linuxppc-dev, paulus
In-Reply-To: <4460E0BC.4050908@austin.ibm.com>

> But then you seem to be trying to do indenting with 3 spaces instead of tabs.

I disagree.  These used to be #define<tab><space><space>.  I just
changed them to #define<space><space><space>.

> And your values don't line up, and your comments don't line up.

Try applying the patch and looking at reg.h.  It looks much different
there than in the patch.  

> I'm just saying, either fix the formatting right or don't fix it at
> all.  Moving it from one ugly to another ugly is not worth the trouble.

With 8-character tabs, I've not changed the look at all.

Mikey


* Re: [PATCH] powerpc: whitespace cleanup in reg.h
From: Hollis Blanchard @ 2006-05-09 18:56 UTC (permalink / raw)
  To: jschopp; +Cc: linuxppc-dev, Michael Neuling, paulus
In-Reply-To: <4460E0BC.4050908@austin.ibm.com>

On Tue, 2006-05-09 at 13:34 -0500, jschopp wrote:
> 
> > +#define SPRN_HID6	0x3F9	/* BE HID 6 */
> > +#define   HID6_LB	(0x0F<<12) /* Concurrent Large Page Modes */
> > +#define   HID6_DLP	(1<<20)	/* Disable all large page modes (4K only) */
> > +#define SPRN_TSC_CELL	0x399	/* Thread switch control on Cell */
> > +#define   TSC_CELL_DEC_ENABLE_0	0x400000 /* Decrementer Interrupt */
> > +#define   TSC_CELL_DEC_ENABLE_1	0x200000 /* Decrementer Interrupt */
> > +#define   TSC_CELL_EE_ENABLE	0x100000 /* External Interrupt */
> > +#define   TSC_CELL_EE_BOOST	0x080000 /* External Interrupt Boost */
> > +#define SPRN_TSC 	0x3FD	/* Thread switch control on others */
> > +#define SPRN_TST 	0x3FC	/* Thread switch timeout on others */
> 
> OK, the tab to space for lines like SPRN_HID6 I understand.  But then you seem to be 
> trying to do indenting with 3 spaces instead of tabs.  And your values don't line up, and 
> your comments don't line up.

The SPR numbers are indented one space. The values for each SPR follow
the SPR definition, and are indented two spaces past that. It's not
unreasonable.

I don't really care about the values or comments, but if other people do
then please use spaces for formatting (and tabs only for indenting).

-Hollis


* Re: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
From: Shirley Ma @ 2006-05-09 19:46 UTC (permalink / raw)
  To: Heiko J Schick
  Cc: Roland Dreier, linux-kernel, openib-general, linuxppc-dev,
	Christoph Raisch, Hoang-Nam Nguyen, Marcus Eder,
	openib-general-bounces, Michael S. Tsirkin
In-Reply-To: <40FCD6B6-9135-43C1-8974-E9070475DB78@schihei.de>


openib-general-bounces@openib.org wrote on 05/09/2006 11:57:01 AM:

> On 09.05.2006, at 18:49, Michael S. Tsirkin wrote:
> 
> >> The trivial way to do it would be to use the same idea as the current
> >> ehca driver: just create a thread for receive CQ events and a thread
> >> for send CQ events, and defer CQ polling into those two threads.
> >
> > For RX, isn't this basically what NAPI is doing?
> > Only NAPI seems better, avoiding interrupts completely and avoiding
> > latency hit by only getting triggered on high load ...
> 
> Does NAPI schedule CQ callbacks on different CPUs, or does the callback
> (handling of data, etc.) stay on the same CPU where the interrupt came in?
> 
> Regards,
>    Heiko

My understanding is that NAPI handles interrupt CQ callbacks on the same
CPU. You could implement NAPI differently, but then it wouldn't follow the
native NAPI implementation.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638


* Re: IMAP_ADDR on PPC 8xx
From: Walter L. Wimer III @ 2006-05-09 19:52 UTC (permalink / raw)
  To: linuxppc-embedded
In-Reply-To: <4D8794260B62C940BBA7150CC5EB3BD43D23CA@bosmail.BOS.int.mrv.com>

On Tue, 2006-05-09 at 14:38 -0400, Kenneth Poole wrote:
> In our build, (currently based on 2.6.14.3) we define IMAP_ADDR as
> follows:
>
> #define IMAP_ADDR       (((bd_t *)__res)->bi_immr_base)

Yes, this is (part of) what our 2.6.11.7-based patch does.


> With very few exceptions, nearly all driver code that dereferences
> IMAP_ADDR can be used unchanged and the IMMR value is always the value
> passed up from the bootloader. We build one image that runs on
> multiple platforms and some platforms place the IMMR address space at
> different addresses than others. It's not a constant.

Exactly.  I think this kind of "automatic adaption" to the particular
platform is what should be in the vanilla kernel.


> Regardless, I see little reason to ioremap() the IMMR address.

This was the second major part of our 2.6.11.7-based patch.  It
performed a single ioremap(), stored the result in a global pointer, and
then used that pointer in all the drivers instead of using IMAP_ADDR
directly.  Personally, I don't have a strong opinion yet as to whether
this is desirable or not.


> The MMU is set up in such a way that IMMR based locations can be
> accessed directly.

I'm still rather fuzzy on whether one can count on this always being the
case on all PPC variants.  (????)



> Ken Poole, MRV Communications, Inc.


Thanks!

Walt


* Re: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
From: Michael S. Tsirkin @ 2006-05-09 20:20 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Roland Dreier, linux-kernel, openib-general, linuxppc-dev,
	Christoph Raisch, Hoang-Nam Nguyen, Marcus Eder,
	openib-general-bounces
In-Reply-To: <OF6CAB9865.804CAFBB-ON87257169.006C3DBC-88257169.00718277@us.ibm.com>

Quoting r. Shirley Ma <xma@us.ibm.com>:
> My understanding is that NAPI handles interrupt CQ callbacks on the same CPU.

My understanding is NAPI disables interrupts under high RX load. No?

-- 
MST


* Re: IMAP_ADDR on PPC 8xx
From: Wolfgang Denk @ 2006-05-09 20:22 UTC (permalink / raw)
  To: Walter L. Wimer III; +Cc: linuxppc-embedded
In-Reply-To: <1147194879.2200.41.camel@excalibur.timesys.com>

In message <1147194879.2200.41.camel@excalibur.timesys.com> you wrote:
> 
> Thanks again for the advice.  Interestingly, I gave the wrong address
> above.  It wasn't 0x22000000, it was 0x02200000 (i.e. even lower!).  And
> yet with the "io_remap()'ed global variable" patch, 2.6.11.7 does indeed
> work on this board with this U-Boot....  Perhaps this works because this
> particular board only has 8MiB of RAM....

It does not work. It will certainly crash as soon as you start a  few
user space applications.

> Bottom line: I'm wondering what the Linux PPC community thinks is the
> correct long term solution to these discrepancies.  Should we the
> community declare "Freescale U-Boots are considered harmful; never use
> them; always use the official U-Boot sources" ???

Indeed it would be nice if Freescale worked more  directly  with  the
community.

> Or should we create a kernel mechanism to automatically adapt to the
> different U-Boot flavors?

No, of course not. U-Boot is just one boot  loader,  there  are  many
others, and the kernel has to stay independent.

And it is definitely not the kernel's fault if the boot  loader  sets
up a braindamaged memory map.

Best regards,

Wolfgang Denk

-- 
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
As in certain cults it is possible to kill a process if you know  its
true name.                      -- Ken Thompson and Dennis M. Ritchie


* might_sleep() called in die()
From: David Wilder @ 2006-05-09 21:28 UTC (permalink / raw)
  To: linuxppc-dev

Paul-
Can you advise me?  In the die() function, might_sleep() is called while
holding the die_lock (see call flow below).  If voluntary preemption is
set, this can cause a deadlock when multiple Oopses occur.  I am seeing
this problem when issuing a soft-reset, as all CPUs call die() at roughly
the same time.

die()->>show_regs()->>show_instructions()->>__get_user_nocheck()->>might_sleep()

My question is basically: should die() ever call might_sleep()?  If so, why?
I am currently working around the problem by calling clear_need_resched()
at the top of die().

-- 
David Wilder
IBM Linux Technology Center
Beaverton, Oregon, USA 
dwilder@us.ibm.com
(503)578-3789


* Re: IMAP_ADDR on PPC 8xx
From: Walter L. Wimer III @ 2006-05-09 20:46 UTC (permalink / raw)
  To: linuxppc-embedded
In-Reply-To: <20060509202257.551DF352B2A@atlas.denx.de>

On Tue, 2006-05-09 at 22:22 +0200, Wolfgang Denk wrote:
> In message <1147194879.2200.41.camel@excalibur.timesys.com> you wrote:
> > 
> > Thanks again for the advice.  Interestingly, I gave the wrong
> > address above.  It wasn't 0x22000000, it was 0x02200000 (i.e.
> > even lower!).  And yet with the "io_remap()'ed global variable"
> > patch, 2.6.11.7 does indeed work on this board with this U-Boot....
> > Perhaps this works because this particular board only has 8MiB of
> > RAM....
> 
> It does not work. It will certainly crash as soon as you start a  few
> user space applications.


Well, something "interesting" is certainly going on because our 2.6.11.7
kernel *does* work and *does not* crash when running user space
applications.  It runs BusyBox quite happily with multiple processes
(e.g. 3 incoming telnet sessions, a console shell, etc.).

I can only conclude that there is something more to our 2.6.11.7-based
patch than I currently understand.



Cheers!

Walt


* When is it safe to start using ioremap?
From: Chris Dumoulin @ 2006-05-09 21:13 UTC (permalink / raw)
  To: linuxppc-embedded

At what point in the linux boot sequence can/should you start using 
ioremap to get a virtual address to hardware? Early on (in head_4xx.S), 
I'm setting up a TLB entry to access my hardware, but eventually my TLB 
entry will be overwritten, and at this point I would like to call 
ioremap to get an address for accessing my hardware. I'm having trouble 
figuring out when the original TLB entry will be overwritten (can that 
even be determined?), and at what point I can start calling ioremap.

Any help is appreciated.

Cheers,
Chris Dumoulin
-- 
*--Christopher Dumoulin--*
Software Team Leader

<http://ics-ltd.com/>

Interactive Circuits and Systems Ltd.
5430 Canotek Road
Ottawa, ON
K1J 9G2
(613)749-9241
1-800-267-9794 (USA only)



* Re: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
From: Shirley Ma @ 2006-05-09 21:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Roland Dreier, linux-kernel, openib-general, linuxppc-dev,
	Christoph Raisch, Hoang-Nam Nguyen, Marcus Eder,
	openib-general-bounces
In-Reply-To: <20060509202041.GB24713@mellanox.co.il>


"Michael S. Tsirkin" <mst@mellanox.co.il> wrote on 05/09/2006 01:20:41 PM:

> Quoting r. Shirley Ma <xma@us.ibm.com>:
> > My understanding is that NAPI handles interrupt CQ callbacks on the
> > same CPU.
> 
> My understanding is NAPI disables interrupts under high RX load. No?
> 
> -- 
> MST

Yes, NAPI disables the interrupts based on the weight. In the IPoIB case,
it doesn't send out the next completion notification under heavy load.
Similar CQ polling still happens in NAPI on the same CPU, but it's not a
callback anymore.

What I find is that send completions and recv completions are not that
fast, which means the RX load is not that heavy in IPoIB. That might be
the reason NAPI does not do well compared to the multiple-threads
implementation.
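
The weight-based behaviour described above can be modelled with a toy user-space sketch. This is not the kernel NAPI API: struct toy_cq, toy_irq() and toy_poll() are hypothetical names, and the model ignores the race where a completion arrives just as the interrupt is re-armed.

```c
#include <stdbool.h>

/* Toy model of a completion queue whose interrupt can be masked. */
struct toy_cq {
    int  pending;    /* completions waiting to be polled */
    bool irq_armed;  /* true when the next completion raises an interrupt */
};

/* Interrupt handler: mask further interrupts and leave the work to the
 * poller. */
void toy_irq(struct toy_cq *cq)
{
    cq->irq_armed = false;
}

/* Poll at most `budget` completions (the weight) and return how many were
 * handled. The interrupt is re-armed only when the queue ran dry before
 * the budget was used up; under heavy load it stays masked and polling
 * continues. */
int toy_poll(struct toy_cq *cq, int budget)
{
    int done = 0;

    while (done < budget && cq->pending > 0) {
        cq->pending--;
        done++;
    }
    if (done < budget)
        cq->irq_armed = true;  /* request the next completion notification */
    return done;
}
```

If completions arrive slowly, nearly every poll finishes under budget and re-arms the interrupt, so the interrupt rate barely drops. That is consistent with the observation above that the RX load in IPoIB may not be heavy enough for NAPI to pay off.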

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638


* Re: IMAP_ADDR on PPC 8xx
From: Dan Malek @ 2006-05-09 21:51 UTC (permalink / raw)
  To: Walter L. Wimer III; +Cc: linuxppc-embedded
In-Reply-To: <1147204345.3139.11.camel@excalibur.timesys.com>


On May 9, 2006, at 3:52 PM, Walter L. Wimer III wrote:

> Exactly.  I think this kind of "automatic adaption" to the particular
> platform is what should be in the vanilla kernel.

This does not mean you can choose some arbitrary value.
There is a small range of high memory addresses that will
work successfully for IMMR.  You may not see any problems
right away, but depending upon drivers selected and the
software features used, some problems will crop up.
There are also MMU performance enhancements that may
be used with certain values, and guaranteed kernel crashes
at some point in the future when abused.

With Linux, the IMMR should always have a value of
0xf0000000 or 0xff000000 for best results.

> This was the second major part of our 2.6.11.7-based patch.  It
> performed a single ioremap(), stored the result in a global pointer,
> and then used that pointer in all the drivers instead of using
> IMAP_ADDR directly.

This is not an acceptable practice.  We are removing all
global pointers like this, and any driver that needs access to
some or all of the IMMR space should be individually mapping
those regions it needs.  Under the covers of ioremap() we are
performing various alignment and reuse of address spaces
in order to support things like performance enhancements
and cache coherent regions.
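
A rough user-space sketch of the per-driver mapping pattern recommended above. Everything here is an assumption made for illustration: mock_ioremap() only imitates the shape of ioremap() over a fake register window, and FAKE_IMMR_BASE, struct toy_uart and toy_uart_probe() are hypothetical names, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Fake register window so the sketch can run outside the kernel. */
#define FAKE_IMMR_BASE 0xff000000UL
static uint8_t fake_immr[0x4000];

/* Mock of ioremap(): maps a "physical" range to a usable pointer, or
 * returns NULL when the range falls outside the fake window. */
void *mock_ioremap(unsigned long phys, size_t size)
{
    unsigned long off;

    if (phys < FAKE_IMMR_BASE)
        return NULL;
    off = phys - FAKE_IMMR_BASE;
    if (off > sizeof(fake_immr) || size > sizeof(fake_immr) - off)
        return NULL;
    return &fake_immr[off];
}

/* Hypothetical driver state: the mapping lives in the driver's own
 * structure, not in a global pointer shared by all drivers. */
struct toy_uart {
    volatile uint8_t *regs;      /* this driver's mapped registers */
    size_t            regs_len;  /* length of the mapped region */
};

/* Map only the region this driver needs and remember it locally. */
int toy_uart_probe(struct toy_uart *u, unsigned long reg_phys, size_t len)
{
    void *p = mock_ioremap(reg_phys, len);

    if (!p)
        return -1;
    u->regs = p;
    u->regs_len = len;
    return 0;
}
```

Each driver asks for the small range it uses, so whatever alignment or reuse policy sits behind the mapping call can be changed without touching the drivers.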

Thanks.

	-- Dan


* Re: Viable PPC platform?
From: Wolfgang Denk @ 2006-05-09 22:31 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-embedded
In-Reply-To: <20060509171520.GA10886@gate.ebshome.net>

In message <20060509171520.GA10886@gate.ebshome.net> you wrote:
>
> After many years of doing embedded Linux stuff I still don't 
> understand why people are so fond of initrd.
> 
> For temporary stuff - tmpfs is much better and more flexible. For r/o
> stuff - just make a separate MTD partition (cramfs, squashfs) and mount
> it directly as root. Both options will waste significantly less
> memory.

Agreed.

And if somebody wants to see facts and numbers, please see
http://www.denx.de/wiki/view/DULG/RootFileSystemSelection

Best regards,

Wolfgang Denk

-- 
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Ninety-Ninety Rule of Project Schedules:
        The first ninety percent of the task takes ninety percent of
the time, and the last ten percent takes the other ninety percent.


* RE: Viable PPC platform?
From: Howard, Marc @ 2006-05-09 22:52 UTC (permalink / raw)
  To: Wolfgang Denk, Eugene Surovegin; +Cc: linuxppc-embedded

> -----Original Message-----
> From: linuxppc-embedded-bounces+marc.howard=kla-tencor.com@ozlabs.org
> [mailto:linuxppc-embedded-bounces+marc.howard=kla-tencor.com@ozlabs.org]
> On Behalf Of Wolfgang Denk
> Sent: Tuesday, May 09, 2006 3:31 PM
> To: Eugene Surovegin
> Cc: linuxppc-embedded@ozlabs.org
> Subject: Re: Viable PPC platform?
>
> In message <20060509171520.GA10886@gate.ebshome.net> you wrote:
> >
> > After many years of doing embedded Linux stuff I still don't
> > understand why people are so fond of initrd.
> >
> > For temporary stuff - tempfs is much better and flexible. For r/o
> > stuff - just make separate MTD partition (cramfs, squashfs) and mount
> > it directly as root. Both options will waste significantly less
> > memory.
>
> Agreed.
>
> And if somebody wants to see facts and numbers, please see
> http://www.denx.de/wiki/view/DULG/RootFileSystemSelection
>

One size does not fit all.  We have an application with a very large
file system.  It can't fit in the available flash, however we do have a
ton of RAM (512MB).  NFS is not an option nor is it desirable (latency
and availability issues).  Boot time is not an issue either in this case
as it takes the equipment many minutes to calibrate and initialize.

initrd also solves another problem.  The combined uBoot multi-image
although huge (>32 MB) represents a complete system firmware snapshot in
a single (huge) file.  By selecting the appropriate uImage the host can
guarantee the linux build, device drivers, application version and FPGA
firmware revs (the embedded board is rebooted to guarantee a repeatable
starting state).  This makes revision control for the overall system
much easier, especially since the host system is running windoze.

I agree with your general conclusion but there are specific cases where
it is not optimal.

Marc W. Howard


* Re: Viable PPC platform?
From: Eugene Surovegin @ 2006-05-09 23:00 UTC (permalink / raw)
  To: Howard, Marc; +Cc: linuxppc-embedded
In-Reply-To: <91B22F93A880FA48879475E134D6F0BE028A43D2@CA1EXCLV02.adcorp.kla-tencor.com>

On Tue, May 09, 2006 at 03:52:20PM -0700, Howard, Marc wrote:
> > 
> > In message <20060509171520.GA10886@gate.ebshome.net> you wrote:
> > >
> > > After many years of doing embedded Linux stuff I still don't 
> > > understand why people are so fond of initrd.
> > > 
> > > For temporary stuff - tempfs is much better and flexible. For r/o
> > > stuff - just make separate MTD partition (cramfs, squashfs) and mount
> > > it directly as root. Both options will waste significantly less
> > > memory.
> > 
> > Agreed.
> > 
> > And if somebody wants to see facts and numbers, please see
> > http://www.denx.de/wiki/view/DULG/RootFileSystemSelection
> > 
> 
> One size does not fit all.  We have an application with a very large
> file system.  It can't fit in the available flash, however we do have a
> ton of RAM (512MB).  NFS is not an option nor is it desirable (latency
> and availability issues).  Boot time is not an issue either in this case
> as it takes the equipment many minutes to calibrate and initialize.
> 
> initrd also solves another problem.  The combined uBoot multi-image
> although huge (>32 MB) represents a complete system firmware snapshot in
> a single (huge) file.  By selecting the appropriate uImage the host can
> guarantee the linux build, device drivers, application version and FPGA
> firmware revs (the embedded board is rebooted to guarantee a repeatable
> starting state).  This makes revision control for the overall system
> much easier, especially since the host system is running windoze.

This all is nice provided you use network for boot. IMHO this is quite 
_rare_ setup (especially Windows host!!!). For 99% of embedded designs 
this is obviously not a viable option.

-- 
Eugene


* RE: Viable PPC platform?
From: Howard, Marc @ 2006-05-09 23:11 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-embedded

> -----Original Message-----
> From: Eugene Surovegin [mailto:ebs@ebshome.net]

> > > In message <20060509171520.GA10886@gate.ebshome.net> you wrote:
> > > >
> > > > After many years of doing embedded Linux stuff I still don't
> > > > understand why people are so fond of initrd.
> > > >
> > One size does not fit all.  We have an application with a very large
> > file system.  It can't fit in the available flash, however we do have a
> > ton of RAM (512MB).  NFS is not an option nor is it desirable (latency
> > and availability issues).  Boot time is not an issue either in this case
> > as it takes the equipment many minutes to calibrate and initialize.
> >
> > initrd also solves another problem.  The combined uBoot multi-image
> > although huge (>32 MB) represents a complete system firmware snapshot in
> > a single (huge) file.  By selecting the appropriate uImage the host can
> > guarantee the linux build, device drivers, application version and FPGA
> > firmware revs (the embedded board is rebooted to guarantee a repeatable
> > starting state).  This makes revision control for the overall system
> > much easier, especially since the host system is running windoze.
>
> This all is nice provided you use network for boot. IMHO this is quite
> _rare_ setup (especially Windows host!!!). For 99% of embedded designs
> this is obviously not a viable option.
>
> --
> Eugene

Again, I agree.  I just wanted to show you at least one case where
initrd is the best solution, IMHO.

As for a linux board booting off of a windoze host I prefer to think of
it as an island of sanity in a sea of chaos.

Marc W. Howard


* RE: Information for setting up SMT related parameters on linux 2.6.16 on POWER5
From: Meswani, Mitesh @ 2006-05-09 23:17 UTC (permalink / raw)
  To: Segher Boessenkool, will_schmidt; +Cc: linuxppc-dev, Arnd Bergmann
In-Reply-To: <18583972-9E29-4B52-BF2E-53102F1794EB@kernel.crashing.org>


Thanks guys 
 
That answered so many of my questions. 
 
If I were to use these macros from user space, would they remain set until the next reboot, or change? POWER5 allows priorities 2 through 4 for user apps. Considering this, and the fact that the normal priority is level 4, if a user app sets it to, say, 2 and then finishes without changing it back to 4, would all subsequent user apps run at the new level 2? I wonder whether what I am saying even makes sense, because the kernel internally throttles the priority for various sections of kernel code and may even overwrite it.
 
 
On a slightly unrelated note, I appended some boot parameters like smt-enabled=on/off to /etc/lilo.conf, and unfortunately I am not able to see any effect; it boots the same way. I am switching from the AIX world, so I may be doing something dumb; please point it out if I am! This seems to affect the bind-processor calls using sys_setaffinity: when there are 4 logical processors 0-3 on two physical processors, bind only allows me to set affinity to either cpu 0 or 2. This seems weird to me, because my system boots with two logical cpus and then I set the online bit to 1 to turn the remaining ones on; thereafter I try binding and haven't been very successful.
 
 
Thanks for all your replies. 
 
 
 

Mitesh R. Meswani 
Ph.D. Candidate 
Research Associate, PLS2 Group
Room 106 F, Department of Computer Science
The University of Texas at El Paso, 
El Paso, Texas 79968
Tel: 915 747 8012 (O)
Email: mmeswani@utep.edu

________________________________

From: Segher Boessenkool [mailto:segher@kernel.crashing.org]
Sent: Mon 5/8/2006 5:04 PM
To: will_schmidt@vnet.ibm.com
Cc: Meswani, Mitesh; linuxppc-dev@ozlabs.org; Arnd Bergmann; linux-kernel@vger.kernel.org; cbe-oss-dev@ozlabs.org
Subject: Re: Information for setting up SMT related parameters on linux 2.6.16 on POWER5



> the HMT_* macros are telling firmware that "this processor thread should
> run at this priority".  Typically used when we're waiting on a spinlock.
> I.e. When we are waiting on a spinlock, we hit the HMT_low macro to drop
> our threads priority, allowing the other thread to use those extra
> cycles finish it's stuff quicker, and maybe even release the lock we're
> waiting for.  HMT_* is all within the kernel though, no exposure
> to userspace apps.

Actually, those macros translate straight into a single machine insn.
No firmware is involved.  See include/asm-powerpc/processor.h.  For
example:

#define HMT_very_low()   asm volatile("or 31,31,31   # very low priority")

You can use those same macros from user space, although it is CPU
implementation dependent which priorities you can actually set (you
probably can do low and medium priority).
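
A minimal sketch of using such priority hints from user space while spinning on a flag. The encodings "or 1,1,1" (low) and "or 2,2,2" (medium) are given here as I recall them from the same header, so treat them as assumptions and check processor.h; the smt_prio_*() macros and wait_for_flag() are hypothetical names, and on non-PowerPC builds the hints compile to nothing.

```c
#include <stdatomic.h>

#if defined(__powerpc__) || defined(__powerpc64__)
/* Priority hints are special forms of the "or" no-op instruction. */
#define smt_prio_low()    __asm__ __volatile__("or 1,1,1")
#define smt_prio_medium() __asm__ __volatile__("or 2,2,2")
#else
/* No-ops elsewhere so the sketch still builds and runs. */
#define smt_prio_low()    ((void)0)
#define smt_prio_medium() ((void)0)
#endif

/* Spin until *flag becomes non-zero or max_spins iterations have passed.
 * The thread lowers its own priority while it has nothing useful to do
 * and restores medium priority before returning. Returns the number of
 * iterations spent waiting. */
unsigned long wait_for_flag(atomic_int *flag, unsigned long max_spins)
{
    unsigned long spins = 0;

    smt_prio_low();
    while (!atomic_load_explicit(flag, memory_order_acquire) &&
           spins < max_spins)
        spins++;
    smt_prio_medium();
    return spins;
}
```

Restoring the medium priority before returning keeps the lowered setting from leaking into whatever the thread does next.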


Segher





* Re: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
From: Shirley Ma @ 2006-05-09 18:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Roland Dreier, linux-kernel, openib-general, linuxppc-dev,
	Christoph Raisch, Hoang-Nam Nguyen, Marcus Eder,
	openib-general-bounces
In-Reply-To: <20060509164919.GC5063@mellanox.co.il>


openib-general-bounces@openib.org wrote on 05/09/2006 09:49:19 AM:

> Quoting r. Roland Dreier <rdreier@cisco.com>:
> > The trivial way to do it would be to use the same idea as the current
> > ehca driver: just create a thread for receive CQ events and a thread
> > for send CQ events, and defer CQ polling into those two threads.

I have done a patch like that on top of splitting the CQ. The problem I
found is that the hardware interrupt favors one CPU. Most of the time these
two threads run on the same CPU according to my debug output. You can
easily check this with cat /proc/interrupts and /proc/irq/XXX/smp_affinity.
ehca distributes interrupts evenly on SMP, so it gets the benefit of
two threads and gains much better throughput.
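
The smp_affinity file holds a hexadecimal CPU bitmask, so a small decoder makes the check above easier to read. This is a sketch with a hypothetical helper name, affinity_mask_to_cpus(); it assumes the usual layout in which the last hex digit covers CPUs 0-3 and commas separate 32-bit groups, most significant group first.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Decode a hex CPU mask such as "00000003" or "00000001,00000000" into a
 * list of CPU numbers in ascending order. Returns the number of CPUs
 * written to cpus[], or -1 if the mask is malformed or cpus[] is too
 * small. */
int affinity_mask_to_cpus(const char *mask, int *cpus, size_t max_cpus)
{
    size_t len = strlen(mask);
    size_t count = 0;
    int base = 0;    /* CPU number of bit 0 of the current hex digit */
    int digits = 0;

    /* Ignore the trailing newline that a read from /proc would include. */
    while (len > 0 && isspace((unsigned char)mask[len - 1]))
        len--;

    /* Walk right to left: the last hex digit holds CPUs 0-3. */
    for (size_t i = len; i-- > 0; ) {
        char c = mask[i];
        int nibble;

        if (c == ',')
            continue;            /* separator between 32-bit groups */
        if (!isxdigit((unsigned char)c))
            return -1;
        nibble = isdigit((unsigned char)c)
                     ? c - '0'
                     : tolower((unsigned char)c) - 'a' + 10;
        for (int bit = 0; bit < 4; bit++) {
            if (nibble & (1 << bit)) {
                if (count == max_cpus)
                    return -1;   /* caller's array is too small */
                cpus[count++] = base + bit;
            }
        }
        base += 4;
        digits++;
    }
    return digits ? (int)count : -1;
}
```

A mask of "00000003" decodes to CPUs 0 and 1, meaning the interrupt may be delivered to either of them.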

The interesting thing is the UP results are much better than SMP results 
with this approach on mthca.

> For RX, isn't this basically what NAPI is doing?
> Only NAPI seems better, avoiding interrupts completely and avoiding
> latency hit by only getting triggered on high load ...
> 
> -- 
> MST

According to some results from different resources, NAPI only gives 3%-10%
performance improvement on single CQ.
I am trying a simple NAPI patch on splitting CQ now to see how much
performance there.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638


* Re: [openib-general] [PATCH 07/16] ehca: interrupt handling routines
From: Segher Boessenkool @ 2006-05-09 23:35 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-kernel, openib-general, linuxppc-dev, Christoph Raisch,
	Hoang-Nam Nguyen, Marcus Eder
In-Reply-To: <adalktbcgl1.fsf@cisco.com>

>     Heiko> Yes, I agree. It would not be an optimal solution, because
>     Heiko> other upper level protocols (e.g. SDP, SRP, etc.) or
>     Heiko> userspace verbs would not be affected by this
>     Heiko> changes. Nevertheless, how can an improved "scaling" or
>     Heiko> "SMP" version of IPoIB look like. How could it be
>     Heiko> implemented?
>
> The trivial way to do it would be to use the same idea as the current
> ehca driver: just create a thread for receive CQ events and a thread
> for send CQ events, and defer CQ polling into those two threads.
>
> Something even better may be possible by specializing to IPoIB of  
> course.

The hardware IRQ should go to some CPU close to the hardware itself.  The
softirq (or whatever else) should go to the same CPU that is handling the
user-level task for that message.  Or a CPU close to it, at least.


Segher
