LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCH] powerpc: whitespace cleanup in reg.h
From: Hollis Blanchard @ 2006-05-09 18:56 UTC (permalink / raw)
  To: jschopp; +Cc: linuxppc-dev, Michael Neuling, paulus
In-Reply-To: <4460E0BC.4050908@austin.ibm.com>

On Tue, 2006-05-09 at 13:34 -0500, jschopp wrote:
> 
> > +#define SPRN_HID6	0x3F9	/* BE HID 6 */
> > +#define   HID6_LB	(0x0F<<12) /* Concurrent Large Page Modes */
> > +#define   HID6_DLP	(1<<20)	/* Disable all large page modes (4K only) */
> > +#define SPRN_TSC_CELL	0x399	/* Thread switch control on Cell */
> > +#define   TSC_CELL_DEC_ENABLE_0	0x400000 /* Decrementer Interrupt */
> > +#define   TSC_CELL_DEC_ENABLE_1	0x200000 /* Decrementer Interrupt */
> > +#define   TSC_CELL_EE_ENABLE	0x100000 /* External Interrupt */
> > +#define   TSC_CELL_EE_BOOST	0x080000 /* External Interrupt Boost */
> > +#define SPRN_TSC 	0x3FD	/* Thread switch control on others */
> > +#define SPRN_TST 	0x3FC	/* Thread switch timeout on others */
> 
> OK, the tab to space for lines like SPRN_HID6 I understand.  But then you seem to be 
> trying to do indenting with 3 spaces instead of tabs.  And your values don't line up, and 
> your comments don't line up.

The SPR numbers are indented one space. The values for each SPR follow
the SPR definition, and are indented two spaces past that. It's not
unreasonable.

I don't really care about the values or comments, but if other people do
then please use spaces for formatting (and tabs only for indenting).

-Hollis

^ permalink raw reply

* Re: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
From: Shirley Ma @ 2006-05-09 19:46 UTC (permalink / raw)
  To: Heiko J Schick
  Cc: Roland Dreier, linux-kernel, openib-general, linuxppc-dev,
	Christoph Raisch, Hoang-Nam Nguyen, Marcus Eder,
	openib-general-bounces, Michael S. Tsirkin
In-Reply-To: <40FCD6B6-9135-43C1-8974-E9070475DB78@schihei.de>

[-- Attachment #1: Type: text/plain, Size: 1005 bytes --]

openib-general-bounces@openib.org wrote on 05/09/2006 11:57:01 AM:

> On 09.05.2006, at 18:49, Michael S. Tsirkin wrote:
> 
> >> The trivial way to do it would be to use the same idea as the current
> >> ehca driver: just create a thread for receive CQ events and a thread
> >> for send CQ events, and defer CQ polling into those two threads.
> >
> > For RX, isn't this basically what NAPI is doing?
> > Only NAPI seems better, avoiding interrupts completely and avoiding 
> > latency hit
> > by only getting triggered on high load ...
> 
> Does NAPI schedule CQ callbacks to different CPUs, or does the callback
> (handling of data, etc.) stay on the same CPU where the interrupt came in?
> 
> Regards,
>    Heiko

My understanding is NAPI handles interrupt CQ callbacks on the same CPU.
But you could implement NAPI differently; it then would not follow the
native NAPI implementation.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

[-- Attachment #2: Type: text/html, Size: 1399 bytes --]

^ permalink raw reply

* Re: IMAP_ADDR on PPC 8xx
From: Walter L. Wimer III @ 2006-05-09 19:52 UTC (permalink / raw)
  To: linuxppc-embedded
In-Reply-To: <4D8794260B62C940BBA7150CC5EB3BD43D23CA@bosmail.BOS.int.mrv.com>

On Tue, 2006-05-09 at 14:38 -0400, Kenneth Poole wrote:
> In our build, (currently based on 2.6.14.3) we define IMAP_ADDR as
> follows:
>
> #define IMAP_ADDR       (((bd_t *)__res)->bi_immr_base)

Yes, this is (part of) what our 2.6.11.7-based patch does.

> With very few exceptions, nearly all driver code that dereferences
> IMAP_ADDR can be used unchanged and the IMMR value is always the value
> passed up from the bootloader. We build one image that runs on
> multiple platforms and some platforms place the IMMR address space at
different addresses than others. It's not a constant.

Exactly.  I think this kind of "automatic adaption" to the particular
platform is what should be in the vanilla kernel.
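
To make the idea concrete, here is a minimal userspace sketch of the macro quoted above. It is only an illustration under stated assumptions: the cut-down `bd_t`, the `board_res` pointer (standing in for the kernel's `__res`, renamed because double-underscore names are reserved in userspace C), and the helpers `fake_bootloader_handoff()` and `immr_reg_addr()` are all hypothetical and do not come from any kernel tree.

```c
#include <string.h>

/* Hypothetical, cut-down board-info record; the real bd_t has many more fields. */
typedef struct {
    unsigned long bi_memsize;    /* illustrative field only */
    unsigned long bi_immr_base;  /* IMMR base reported by the boot loader */
} bd_t;

/* Stand-in for the kernel's __res buffer that holds the boot loader's data. */
static bd_t board_storage;
static unsigned char *board_res = (unsigned char *)&board_storage;

/* Same shape as the quoted macro: the IMMR base is read at run time. */
#define IMAP_ADDR (((bd_t *)board_res)->bi_immr_base)

/* Simulates the boot loader handing over its board information. */
static void fake_bootloader_handoff(unsigned long immr_base)
{
    bd_t info = { .bi_memsize = 8UL << 20, .bi_immr_base = immr_base };
    memcpy(board_res, &info, sizeof(info));
}

/* Driver-style code: computes a register address relative to IMAP_ADDR. */
static unsigned long immr_reg_addr(unsigned long offset)
{
    return IMAP_ADDR + offset;
}
```

With this shape, the same driver code yields a different register address on each board, depending only on what the boot loader reported.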

> Regardless, I see little reason to ioremap() the IMMR address.

This was the second major part of our 2.6.11.7-based patch.  It
performed a single ioremap(), stored the result in a global pointer, and
then used that pointer in all the drivers instead of using IMAP_ADDR
directly.  Personally, I don't have a strong opinion yet as to whether
this is desirable or not.

> The MMU is set up in such a way that IMMR based locations can be
> accessed directly.

I'm still rather fuzzy on whether one can count on this always being the
case on all PPC variants.  (????)

> Ken Poole, MRV Communications, Inc.

Thanks!

Walt

^ permalink raw reply

* Re: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
From: Michael S. Tsirkin @ 2006-05-09 20:20 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Roland Dreier, linux-kernel, openib-general, linuxppc-dev,
	Christoph Raisch, Hoang-Nam Nguyen, Marcus Eder,
	openib-general-bounces
In-Reply-To: <OF6CAB9865.804CAFBB-ON87257169.006C3DBC-88257169.00718277@us.ibm.com>

Quoting r. Shirley Ma <xma@us.ibm.com>:
> My understanding is NAPI handles interrupt CQ callbacks on the same CPU.

My understanding is NAPI disables interrupts under high RX load. No?

-- 
MST

^ permalink raw reply

* Re: IMAP_ADDR on PPC 8xx
From: Wolfgang Denk @ 2006-05-09 20:22 UTC (permalink / raw)
  To: Walter L. Wimer III; +Cc: linuxppc-embedded
In-Reply-To: <1147194879.2200.41.camel@excalibur.timesys.com>

In message <1147194879.2200.41.camel@excalibur.timesys.com> you wrote:
> 
> Thanks again for the advice.  Interestingly, I gave the wrong address
> above.  It wasn't 0x22000000, it was 0x02200000 (i.e. even lower!).  And
> yet with the "io_remap()'ed global variable" patch, 2.6.11.7 does indeed
> work on this board with this U-Boot....  Perhaps this works because this
> particular board only has 8MiB of RAM....

It does not work. It will certainly crash as soon as you start a  few
user space applications.

> Bottom line: I'm wondering what the Linux PPC community thinks is the
> correct long term solution to these discrepancies.  Should we the
> community declare "Freescale U-Boots are considered harmful; never use
> them; always use the official U-Boot sources" ???

Indeed it would be nice if Freescale worked more  directly  with  the
community.

> Or should we create a kernel mechanism to automatically adapt to the
> different U-Boot flavors?

No, of course not. U-Boot is just one boot  loader,  there  are  many
others, and the kernel has to stay independent.

And it is definitely not the kernel's fault if the boot  loader  sets
up a braindamaged memory map.

Best regards,

Wolfgang Denk

-- 
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
As in certain cults it is possible to kill a process if you know  its
true name.                      -- Ken Thompson and Dennis M. Ritchie

^ permalink raw reply

* might_sleep() called in die()
From: David Wilder @ 2006-05-09 21:28 UTC (permalink / raw)
  To: linuxppc-dev

Paul-
Can you advise me?  In the die() function, might_sleep() is called while
holding the die_lock (see call flow below).
If voluntary preemption is enabled, this can cause a deadlock when multiple
oopses occur.  I am seeing this problem when issuing a soft-reset, as all
CPUs call die() at roughly the same time.

die() 
->>show_regs()->>show_instructions()->>__get_user_nocheck()->>might_sleep()

My question is basically: should die() ever call might_sleep()?  If so, why?
I am currently working around the problem by calling clear_need_resched()
at the top of die().

-- 
David Wilder
IBM Linux Technology Center
Beaverton, Oregon, USA 
dwilder@us.ibm.com
(503)578-3789

^ permalink raw reply

* Re: IMAP_ADDR on PPC 8xx
From: Walter L. Wimer III @ 2006-05-09 20:46 UTC (permalink / raw)
  To: linuxppc-embedded
In-Reply-To: <20060509202257.551DF352B2A@atlas.denx.de>

On Tue, 2006-05-09 at 22:22 +0200, Wolfgang Denk wrote:
> In message <1147194879.2200.41.camel@excalibur.timesys.com> you wrote:
> > 
> > Thanks again for the advice.  Interestingly, I gave the wrong
> > address above.  It wasn't 0x22000000, it was 0x02200000 (i.e.
> > even lower!).  And yet with the "io_remap()'ed global variable"
> > patch, 2.6.11.7 does indeed work on this board with this U-Boot....
> > Perhaps this works because this particular board only has 8MiB of
> > RAM....
> 
> It does not work. It will certainly crash as soon as you start a  few
> user space applications.

Well, something "interesting" is certainly going on because our 2.6.11.7
kernel *does* work and *does not* crash when running user space
applications.  It runs BusyBox quite happily with multiple processes
(e.g. 3 incoming telnet sessions, a console shell, etc.).

I can only conclude that there is something more to our 2.6.11.7-based
patch than I currently understand.

Cheers!

Walt

^ permalink raw reply

* When is it safe to start using ioremap?
From: Chris Dumoulin @ 2006-05-09 21:13 UTC (permalink / raw)
  To: linuxppc-embedded

At what point in the linux boot sequence can/should you start using 
ioremap to get a virtual address to hardware? Early on (in head_4xx.S), 
I'm setting up a TLB entry to access my hardware, but eventually my TLB 
entry will be overwritten, and at this point I would like to call 
ioremap to get an address for accessing my hardware. I'm having trouble 
figuring out when the original TLB entry will be overwritten (can that 
even be determined?), and at what point I can start calling ioremap.

Any help is appreciated.

Cheers,
Chris Dumoulin
-- 
*--Christopher Dumoulin--*
Software Team Leader

<http://ics-ltd.com/>

Interactive Circuits and Systems Ltd.
5430 Canotek Road
Ottawa, ON
K1J 9G2
(613)749-9241
1-800-267-9794 (USA only)

^ permalink raw reply

* Re: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
From: Shirley Ma @ 2006-05-09 21:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Roland Dreier, linux-kernel, openib-general, linuxppc-dev,
	Christoph Raisch, Hoang-Nam Nguyen, Marcus Eder,
	openib-general-bounces
In-Reply-To: <20060509202041.GB24713@mellanox.co.il>

[-- Attachment #1: Type: text/plain, Size: 855 bytes --]

"Michael S. Tsirkin" <mst@mellanox.co.il> wrote on 05/09/2006 01:20:41 PM:

> Quoting r. Shirley Ma <xma@us.ibm.com>:
> > My understanding is NAPI handles interrupt CQ callbacks on the same CPU.
> 
> My understanding is NAPI disables interrupts under high RX load. No?
> 
> -- 
> MST

Yes, NAPI disables the interrupts based on the weight. In the IPoIB case, it
doesn't send out the next completion notification under heavy load.
Similar CQ polling is still done in NAPI on the same CPU, but it's not a
callback anymore.
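
The weight-based behaviour described above can be sketched as a small userspace simulation. This is not the kernel NAPI API; `struct sim_cq` and `sim_poll()` are hypothetical names used only to show the idea that notification is re-armed only when a poll finishes under its budget.

```c
#include <stdbool.h>

/* Hypothetical completion queue: just a count of pending completions. */
struct sim_cq {
    int pending;        /* completions waiting to be processed */
    bool notify_armed;  /* true when the next completion would raise an interrupt */
};

/*
 * Process at most `budget` completions and return how many were handled.
 * If fewer than `budget` were handled, the queue is drained, so re-arm
 * completion notification (back to interrupt-driven mode).  Otherwise
 * leave notification off and expect to be polled again.
 */
static int sim_poll(struct sim_cq *cq, int budget)
{
    int done = 0;

    while (done < budget && cq->pending > 0) {
        cq->pending--;
        done++;
    }
    cq->notify_armed = (done < budget);
    return done;
}
```

Under light load every poll drains the queue and re-arms notification, so the polling path gains little over plain interrupts.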

What I find is that send completions and recv completions are not
that fast, which means the RX load is not that heavy in IPoIB. That might be
the reason NAPI does not do well compared to the multiple-threads implementation.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

[-- Attachment #2: Type: text/html, Size: 1307 bytes --]

^ permalink raw reply

* Re: IMAP_ADDR on PPC 8xx
From: Dan Malek @ 2006-05-09 21:51 UTC (permalink / raw)
  To: Walter L. Wimer III; +Cc: linuxppc-embedded
In-Reply-To: <1147204345.3139.11.camel@excalibur.timesys.com>

On May 9, 2006, at 3:52 PM, Walter L. Wimer III wrote:

> Exactly.  I think this kind of "automatic adaption" to the particular
> platform is what should be in the vanilla kernel.

This does not mean you can choose some arbitrary value.
There is a small range of high memory addresses that will
work successfully for IMMR.  You may not see any problems
right away, but depending upon drivers selected and the
software features used, some problems will crop up.
There are also MMU performance enhancements that may
be used with certain values, and guaranteed kernel crashes
at some point in the future when abused.

With Linux, the IMMR should always have a value of
0xf0000000 or 0xff000000 for best results.

> This was the second major part of our 2.6.11.7-based patch.  It
> performed a single ioremap(), stored the result in a global  
> pointer, and
> then used that pointer in all the drivers instead of using IMAP_ADDR
> directly.

This is not an acceptable practice.  We are removing all
global pointers like this, and any driver that needs access to
some or all of the IMMR space should be individually mapping
those regions it needs.  Under the covers of ioremap() we are
performing various alignment and reuse of address spaces
in order to support things like performance enhancements
and cache coherent regions.

Thanks.

	-- Dan

^ permalink raw reply

* Re: Viable PPC platform?
From: Wolfgang Denk @ 2006-05-09 22:31 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-embedded
In-Reply-To: <20060509171520.GA10886@gate.ebshome.net>

In message <20060509171520.GA10886@gate.ebshome.net> you wrote:
>
> After many years of doing embedded Linux stuff I still don't 
> understand why people are so fond of initrd.
> 
> For temporary stuff - tmpfs is much better and flexible. For r/o 
> stuff - just make separate MTD partition (cramfs, squashfs) and mount 
> it directly as root. Both options will waste significantly less 
> memory.

Agreed.

And if somebody wants to see facts and numbers, please see
http://www.denx.de/wiki/view/DULG/RootFileSystemSelection

Best regards,

Wolfgang Denk

-- 
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Ninety-Ninety Rule of Project Schedules:
        The first ninety percent of the task takes ninety percent of
the time, and the last ten percent takes the other ninety percent.

^ permalink raw reply

* RE: Viable PPC platform?
From: Howard, Marc @ 2006-05-09 22:52 UTC (permalink / raw)
  To: Wolfgang Denk, Eugene Surovegin; +Cc: linuxppc-embedded

> -----Original Message-----
> From:=20
> linuxppc-embedded-bounces+marc.howard=3Dkla-tencor.com@ozlabs.or
g [mailto:linuxppc-embedded-bounces+marc.howard=3Dkla->
tencor.com@ozlabs.org] On Behalf Of Wolfgang Denk
> Sent: Tuesday, May 09, 2006 3:31 PM
> To: Eugene Surovegin
> Cc: linuxppc-embedded@ozlabs.org
> Subject: Re: Viable PPC platform?=20
>=20
> In message <20060509171520.GA10886@gate.ebshome.net> you wrote:
> >
> > After many years of doing embedded Linux stuff I still don't=20
> > understand why people are so fond of initrd.
> >=20
> > For temporary stuff - tempfs is much better and flexible. For r/o=20
> > stuff - just make separate MTD partition (cramfs, squashfs)=20
> and mount=20
> > it directly as root. Both options will waste significantly less=20
> > memory.
>=20
> Agreed.
>=20
> And if somebody wants to see facts and numbers, please see
> http://www.denx.de/wiki/view/DULG/RootFileSystemSelection
>=20

One size does not fit all.  We have an application with a very large
file system.  It can't fit in the available flash, however we do have a
ton of RAM (512MB).  NFS is not an option nor is it desirable (latency
and availability issues).  Boot time is not an issue either in this case
as it takes the equipment many minutes to calibrate and initialize.

initrd also solves another problem.  The combined uBoot multi-image
although huge (>32 MB) represents a complete system firmware snapshot in
a single (huge) file.  By selecting the appropriate uImage the host can
guarantee the linux build, device drivers, application version and FPGA
firmware revs (the embedded board is rebooted to guarantee a repeatable
starting state).  This makes revision control for the overall system
much easier, especially since the host system is running windoze.

I agree with your general conclusion but there are specific cases where
it is not optimal.

Marc W. Howard

^ permalink raw reply

* Re: Viable PPC platform?
From: Eugene Surovegin @ 2006-05-09 23:00 UTC (permalink / raw)
  To: Howard, Marc; +Cc: linuxppc-embedded
In-Reply-To: <91B22F93A880FA48879475E134D6F0BE028A43D2@CA1EXCLV02.adcorp.kla-tencor.com>

On Tue, May 09, 2006 at 03:52:20PM -0700, Howard, Marc wrote:
> > 
> > In message <20060509171520.GA10886@gate.ebshome.net> you wrote:
> > >
> > > After many years of doing embedded Linux stuff I still don't 
> > > understand why people are so fond of initrd.
> > > 
> > > For temporary stuff - tmpfs is much better and flexible. For r/o 
> > > stuff - just make separate MTD partition (cramfs, squashfs) 
> > and mount 
> > > it directly as root. Both options will waste significantly less 
> > > memory.
> > 
> > Agreed.
> > 
> > And if somebody wants to see facts and numbers, please see
> > http://www.denx.de/wiki/view/DULG/RootFileSystemSelection
> > 
> 
> One size does not fit all.  We have an application with a very large
> file system.  It can't fit in the available flash, however we do have a
> ton of RAM (512MB).  NFS is not an option nor is it desirable (latency
> and availability issues).  Boot time is not an issue either in this case
> as it takes the equipment many minutes to calibrate and initialize.
> 
> initrd also solves another problem.  The combined uBoot multi-image
> although huge (>32 MB) represents a complete system firmware snapshot in
> a single (huge) file.  By selecting the appropriate uImage the host can
> guarantee the linux build, device drivers, application version and FPGA
> firmware revs (the embedded board is rebooted to guarantee a repeatable
> starting state).  This makes revision control for the overall system
> much easier, especially since the host system is running windoze.

This all is nice provided you use network for boot. IMHO this is quite 
_rare_ setup (especially Windows host!!!). For 99% of embedded designs 
this is obviously not a viable option.

-- 
Eugene

^ permalink raw reply

* RE: Viable PPC platform?
From: Howard, Marc @ 2006-05-09 23:11 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-embedded

> -----Original Message-----
> From: Eugene Surovegin [mailto:ebs@ebshome.net]

> > > In message <20060509171520.GA10886@gate.ebshome.net> you wrote:
> > > >
> > > > After many years of doing embedded Linux stuff I still don't
> > > > understand why people are so fond of initrd.
> > > >
> > One size does not fit all.  We have an application with a very large
> > file system.  It can't fit in the available flash, however we do have a
> > ton of RAM (512MB).  NFS is not an option nor is it desirable (latency
> > and availability issues).  Boot time is not an issue either in this case
> > as it takes the equipment many minutes to calibrate and initialize.
> >
> > initrd also solves another problem.  The combined uBoot multi-image
> > although huge (>32 MB) represents a complete system firmware snapshot in
> > a single (huge) file.  By selecting the appropriate uImage the host can
> > guarantee the linux build, device drivers, application version and FPGA
> > firmware revs (the embedded board is rebooted to guarantee a repeatable
> > starting state).  This makes revision control for the overall system
> > much easier, especially since the host system is running windoze.
>
> This all is nice provided you use network for boot. IMHO this is quite
> _rare_ setup (especially Windows host!!!). For 99% of embedded designs
> this is obviously not a viable option.
>
> --
> Eugene

Again, I agree.  I just wanted to show you at least one case where
initrd is the best solution, IMHO.

As for a linux board booting off of a windoze host I prefer to think of
it as an island of sanity in a sea of chaos.

Marc W. Howard

^ permalink raw reply

* RE: Information for setting up SMT related parameters on linux 2.6.16 on POWER5
From: Meswani, Mitesh @ 2006-05-09 23:17 UTC (permalink / raw)
  To: Segher Boessenkool, will_schmidt; +Cc: linuxppc-dev, Arnd Bergmann
In-Reply-To: <18583972-9E29-4B52-BF2E-53102F1794EB@kernel.crashing.org>

[-- Attachment #1: Type: text/plain, Size: 2833 bytes --]

Thanks guys 

That answered so many of my questions. 

If I were to use these macros from user space, would they remain set until the next reboot or change? POWER5 allows priorities 2 through 4 for user apps, so considering this, and the fact that the normal priority is level 4, if a user app resets it to, say, 2 and then finishes without changing it back to 4, would all the subsequent user apps run at the new level 2? I wonder whether what I am saying even makes sense, because the kernel internally throttles the priority for various sections of the kernel code and it may even overwrite it.

On a slightly unrelated note, I appended some boot parameters like smt-enabled=on/off to /etc/lilo.conf, and unfortunately I am not able to see any effect: it boots the same way. I am switching from the AIX world so I may be doing something dumb; please point it out if I am!  This seems to affect the bind-processor calls using sys_setaffinity: when there are 4 logical processors 0-3 on two physical processors, bind only allows me to set affinity to either cpu 0 or 2. This seems weird to me because my system boots with two logical cpus, and then I set the online bit to 1 to turn the remaining ones on; thereafter I try binding and haven't been very successful.
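
For reference, a minimal sketch of binding the calling thread to one logical CPU from user space. It assumes the glibc wrapper `sched_setaffinity()` is what the "bind" call above ends up using; the helper names `bind_to_cpu()` and `first_allowed_cpu()` are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for cpu_set_t, CPU_SET and sched_setaffinity */
#endif
#include <sched.h>

/* Pin the calling thread to one logical CPU; returns 0 on success, -1 on error. */
static int bind_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Return the lowest-numbered CPU the calling thread may run on, or -1. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

If `bind_to_cpu()` fails for a logical CPU that was brought online later, checking `errno` and the mask returned by `sched_getaffinity()` should show whether that CPU is in the allowed set at all.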

Thanks for all your replies. 

Mitesh R. Meswani 
Ph.D. Candidate 
Research Associate, PLS2 Group
Room 106 F, Department of Computer Science
The University of Texas at El Paso, 
El Paso, Texas 79968
Tel: 915 747 8012 (O)
Email: mmeswani@utep.edu

________________________________

From: Segher Boessenkool [mailto:segher@kernel.crashing.org]
Sent: Mon 5/8/2006 5:04 PM
To: will_schmidt@vnet.ibm.com
Cc: Meswani, Mitesh; linuxppc-dev@ozlabs.org; Arnd Bergmann; linux-kernel@vger.kernel.org; cbe-oss-dev@ozlabs.org
Subject: Re: Information for setting up SMT related parameters on linux 2.6.16 on POWER5

> the HMT_* macros are telling firmware that "this processor thread should
> run at this priority".  Typically used when we're waiting on a spinlock.
> I.e. When we are waiting on a spinlock, we hit the HMT_low macro to drop
> our thread's priority, allowing the other thread to use those extra
> cycles to finish its stuff quicker, and maybe even release the lock we're
> waiting for.          HMT_* is all within the kernel though, no exposure
> to userspace apps.

Actually, those macros translate straight into a single machine insn.
No firmware is involved.  See include/asm-powerpc/processor.h.  For
example:

#define HMT_very_low()   asm volatile("or 31,31,31   # very low priority")

You can use those same macros from user space, although it is CPU
implementation dependent which priorities you can actually set (you
probably can do low and medium priority).
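
A rough sketch of what using these from user space might look like. Only the `or 31,31,31` encoding is taken from the line quoted above; the `or 1,1,1` (low) and `or 2,2,2` (medium) encodings are an assumption to be checked against the same header, and `wait_for_flag()` is a hypothetical helper. On non-PowerPC builds the macros expand to nothing so the sketch still compiles.

```c
/*
 * On PowerPC these expand to priority-setting "or rN,rN,rN" no-ops.
 * Elsewhere they are empty so the example still builds.
 */
#if defined(__powerpc__) || defined(__powerpc64__)
#define HMT_very_low()  __asm__ volatile("or 31,31,31   # very low priority")
#define HMT_low()       __asm__ volatile("or 1,1,1      # low priority (assumed encoding)")
#define HMT_medium()    __asm__ volatile("or 2,2,2      # medium priority (assumed encoding)")
#else
#define HMT_very_low()  ((void)0)
#define HMT_low()       ((void)0)
#define HMT_medium()    ((void)0)
#endif

/*
 * Spin until *flag becomes non-zero or max_spins iterations have passed,
 * lowering the hardware thread priority while waiting and restoring
 * medium priority afterwards.  Returns 1 if the flag was seen set.
 */
static int wait_for_flag(volatile int *flag, long max_spins)
{
    long spins = 0;

    HMT_low();
    while (!*flag && spins < max_spins)
        spins++;
    HMT_medium();
    return *flag != 0;
}
```

Restoring medium priority before returning avoids leaving the thread at a lowered priority by accident.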

Segher

[-- Attachment #2: Type: text/html, Size: 4525 bytes --]

^ permalink raw reply

* Re: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
From: Shirley Ma @ 2006-05-09 18:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Roland Dreier, linux-kernel, openib-general, linuxppc-dev,
	Christoph Raisch, Hoang-Nam Nguyen, Marcus Eder,
	openib-general-bounces
In-Reply-To: <20060509164919.GC5063@mellanox.co.il>

[-- Attachment #1: Type: text/plain, Size: 1373 bytes --]

openib-general-bounces@openib.org wrote on 05/09/2006 09:49:19 AM:

> Quoting r. Roland Dreier <rdreier@cisco.com>:
> > The trivial way to do it would be to use the same idea as the current
> > ehca driver: just create a thread for receive CQ events and a thread
> > for send CQ events, and defer CQ polling into those two threads.

I have done a patch like that on top of splitting the CQ. The problem I
found is that the hardware interrupt favors one CPU. Most of the time these two
threads run on the same CPU according to my debug output. You can
easily check this with cat /proc/interrupts and /proc/irq/XXX/smp_affinity.
ehca distributes interrupts evenly on SMP, so it gets the benefit of
two threads and gains much better throughput.
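
As a small aid for reading /proc/irq/XXX/smp_affinity, the sketch below decodes the hex CPU mask that file contains. It handles only the low 64 CPUs, and the helper names `parse_affinity_mask()` and `count_cpus()` are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>

/*
 * Parse an smp_affinity style hex mask into a 64-bit CPU bitmask.
 * Commas between 32-bit groups and trailing whitespace are skipped;
 * parsing stops at the first other non-hex character.
 */
static uint64_t parse_affinity_mask(const char *s)
{
    uint64_t mask = 0;

    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        unsigned digit;

        if (c == ',' || isspace(c))
            continue;
        if (!isxdigit(c))
            break;
        digit = isdigit(c) ? (unsigned)(c - '0')
                           : (unsigned)(tolower(c) - 'a' + 10);
        mask = (mask << 4) | digit;
    }
    return mask;
}

/* Count how many CPUs are set in the mask. */
static int count_cpus(uint64_t mask)
{
    int n = 0;

    for (; mask; mask &= mask - 1)
        n++;
    return n;
}
```

A mask with a single bit set means the interrupt is routed to one CPU only, which would match the behaviour described above.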

The interesting thing is the UP results are much better than SMP results 
with this approach on mthca.

> For RX, isn't this basically what NAPI is doing?
> Only NAPI seems better, avoiding interrupts completely and avoiding 
> latency hit
> by only getting triggered on high load ...
> 
> -- 
> MST

According to some results from different sources, NAPI only gives a 3%-10%
performance improvement on a single CQ.
I am trying a simple NAPI patch on the split CQ now to see how much
performance it gains.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

[-- Attachment #2: Type: text/html, Size: 1767 bytes --]

^ permalink raw reply

* Re: [openib-general] [PATCH 07/16] ehca: interrupt handling routines
From: Segher Boessenkool @ 2006-05-09 23:35 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-kernel, openib-general, linuxppc-dev, Christoph Raisch,
	Hoang-Nam Nguyen, Marcus Eder
In-Reply-To: <adalktbcgl1.fsf@cisco.com>

>     Heiko> Yes, I agree. It would not be an optimal solution, because
>     Heiko> other upper level protocols (e.g. SDP, SRP, etc.) or
>     Heiko> userspace verbs would not be affected by this
>     Heiko> changes. Nevertheless, how can an improved "scaling" or
>     Heiko> "SMP" version of IPoIB look like. How could it be
>     Heiko> implemented?
>
> The trivial way to do it would be to use the same idea as the current
> ehca driver: just create a thread for receive CQ events and a thread
> for send CQ events, and defer CQ polling into those two threads.
>
> Something even better may be possible by specializing to IPoIB of  
> course.

The hardware IRQ should go to some CPU close to the hardware itself.  The
softirq (or whatever else) should go to the same CPU that is handling the
user-level task for that message.  Or a CPU close to it, at least.


Segher

^ permalink raw reply

* pci_resource_end problem revisited
From: Geoff Levand @ 2006-05-10  1:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

Ben,

I still have this problem of OF reporting the serial port bar
size as 16 instead of 8 on my G5.  Where would be a proper
place to fix this?  BTW, I verified that it is OF that reports
the size as 16.

-Geoff

-------- Original Message --------
Subject: Re: pci_resource_end() changed problem with 2.6.14
Date: Fri, 04 Nov 2005 10:36:58 -0800
From: Geoff Levand <geoffrey.levand@am.sony.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: linuxppc64-dev@ozlabs.org
References: <436ADBA7.7030706@am.sony.com> <1131087370.4680.238.camel@gaston>

Benjamin Herrenschmidt wrote:
> On Thu, 2005-11-03 at 19:55 -0800, Geoff Levand wrote:
> 
>>I found that the serial port probe code in drivers/serial/8250_pci.c 
>>no longer works properly for ppc64 in 2.6.14.  It seems the value 
>>returned by pci_resource_len() on ppc64 changed from 8 to 16 since 
>>2.6.13.  I tested on a PC and pci_resource_len() returns 8 as 
>>expected.
>>
> Interesting... What does lspci -vv show for the BARs of the PCI
> card ? Also, what do you have in /proc/device-tree  ? What is the
> machine precisely ?
> 
> 2.6.14 now uses the OF device-tree to generate the linux PCI tree
> instead of going directly to PCI probing. It's possible that this is
> causing your problem if for some reason, the BAR sizing done by OF ends
> up being different than what the kernel does ...
> 

Sorry, I should have mentioned it, this is on my PowerMac G5 with a
generic 8250 serial PCI card (StarTech PCI4S550N).  Here's what lspci
gives me:

0001:05:03.0 Serial controller: NetMos Technology PCI 9845 Multi-I/O Controller (rev 01) (prog-if 02 [16550])
        Subsystem: LSI Logic / Symbios Logic 0P4S (4 port 16550A serial card)
        Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin A routed to IRQ 53
        Region 0: I/O ports at f4000050 [size=16]
        Region 1: I/O ports at f4000040 [size=16]
        Region 2: I/O ports at f4000030 [size=16]
        Region 3: I/O ports at f4000020 [size=16]
        Region 4: I/O ports at f4000010 [size=16]
        Region 5: I/O ports at f4000000 [size=16]

It could be the change to using the OF device-tree.  What's an easy way to
see the size OF has used?
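
For anyone following along, the reported size is just arithmetic on the start and end of the resource. The sketch below assumes the usual inclusive-end convention (length = end - start + 1); `struct sim_resource` and `sim_resource_len()` are hypothetical stand-ins, not the kernel's definitions.

```c
/* Hypothetical stand-in for the start/end pair kept for each BAR. */
struct sim_resource {
    unsigned long start;
    unsigned long end;   /* inclusive */
};

/* Length of a resource whose end address is inclusive. */
static unsigned long sim_resource_len(const struct sim_resource *r)
{
    return r->end - r->start + 1;
}
```

With Region 0 from the listing above, a size of 16 corresponds to 0xf4000050 through 0xf400005f, while the 8 that the serial probe expects would end at 0xf4000057.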

-Geoff

^ permalink raw reply

* [PATCH] powerpc: fix LED progress on pseries boxes
From: Anton Blanchard @ 2006-05-10  3:05 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: paulus


It looks like we are printing the wrong thing on the op panel.

Signed-off-by: Anton Blanchard <anton@samba.org>
---

diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 5eb55ef..5f79f01 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -255,7 +255,7 @@ static int __init pSeries_init_panel(voi
 {
 	/* Manually leave the kernel version on the panel. */
 	ppc_md.progress("Linux ppc64\n", 0);
-	ppc_md.progress(system_utsname.version, 0);
+	ppc_md.progress(system_utsname.release, 0);
 
 	return 0;
 }
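
The difference between the two fields is easy to see from user space with uname(2), which exposes fields of the same names. This is only a sketch; the helper name `kernel_release_and_version()` is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Copy the kernel release (the "2.6.x" style string) and the kernel
 * version (the build number and date string) into the caller's buffers.
 * Returns 0 on success, -1 if uname() fails.
 */
static int kernel_release_and_version(char *rel, size_t rel_len,
                                      char *ver, size_t ver_len)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    snprintf(rel, rel_len, "%s", u.release);
    snprintf(ver, ver_len, "%s", u.version);
    return 0;
}
```

Printing both strings side by side shows why the release is the more useful one to leave on a small panel.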

^ permalink raw reply related

* Re: [PATCH] powerpc: whitespace cleanup in reg.h
From: Olof Johansson @ 2006-05-10  3:14 UTC (permalink / raw)
  To: jschopp; +Cc: linuxppc-dev, Michael Neuling, paulus
In-Reply-To: <4460E0BC.4050908@austin.ibm.com>

On Tue, May 09, 2006 at 01:34:36PM -0500, jschopp wrote:

> > +#define SPRN_HID6	0x3F9	/* BE HID 6 */
> > +#define   HID6_LB	(0x0F<<12) /* Concurrent Large Page Modes */
> > +#define   HID6_DLP	(1<<20)	/* Disable all large page modes (4K only) */
> > +#define SPRN_TSC_CELL	0x399	/* Thread switch control on Cell */
> > +#define   TSC_CELL_DEC_ENABLE_0	0x400000 /* Decrementer Interrupt */
> > +#define   TSC_CELL_DEC_ENABLE_1	0x200000 /* Decrementer Interrupt */
> > +#define   TSC_CELL_EE_ENABLE	0x100000 /* External Interrupt */
> > +#define   TSC_CELL_EE_BOOST	0x080000 /* External Interrupt Boost */
> > +#define SPRN_TSC 	0x3FD	/* Thread switch control on others */
> > +#define SPRN_TST 	0x3FC	/* Thread switch timeout on others */
> 
> OK, the tab to space for lines like SPRN_HID6 I understand.  But then you seem to be 
> trying to do indenting with 3 spaces instead of tabs.

It's what the rest of the file uses. It might not correspond to
CodingStyle, but it makes it easy to read.

(Now, I'm not sure it's a good idea to define the meanings of HID bits
in the global register include, but that's unrelated to the whitespace
cleanup Mikey did.)


-Olof

^ permalink raw reply

* [RFC/PATCH] Make powerpc64 use __thread for per-cpu variables
From: Paul Mackerras @ 2006-05-10  4:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-arch, linuxppc-dev

With this patch, 64-bit powerpc uses __thread for per-cpu variables.

The motivation for doing this is that getting the address of a per-cpu
variable currently requires two loads (one to get our per-cpu offset
and one to get the address of the variable in the .data.percpu
section) plus an add.  With __thread we can get the address of our
copy of a per-cpu variable with just an add (r13 plus a constant).

This means that r13 now has to hold the per-cpu base address + 0x7000
(the 0x7000 is to allow us to address 60k of per-cpu data with a
16-bit signed offset, and is dictated by the toolchain).  In
particular that means that the r13 can't hold the pointer to the
paca.  Instead we can get the paca pointer from the SPRG3 register.
We use r13 for the paca pointer for the early exception entry code,
and load the thread pointer into r13 before calling C code.

With this there is an incentive to move things that are currently
stored in the paca into per-cpu variables, and eventually to get rid
of the paca altogether.  I'll address that in future patches.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index ed5b26a..95a7480 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -58,12 +58,13 @@ override LD	+= -m elf$(SZ)ppc
 override CC	+= -m$(SZ)
 endif
 
-LDFLAGS_vmlinux	:= -Bstatic
+LDFLAGS_vmlinux	:= -Bstatic --no-tls-optimize
 
 # The -Iarch/$(ARCH)/include is temporary while we are merging
 CPPFLAGS-$(CONFIG_PPC32) := -Iarch/$(ARCH) -Iarch/$(ARCH)/include
 AFLAGS-$(CONFIG_PPC32)	:= -Iarch/$(ARCH)
-CFLAGS-$(CONFIG_PPC64)	:= -mminimal-toc -mtraceback=none  -mcall-aixdesc
+CFLAGS-$(CONFIG_PPC64)	:= -mminimal-toc -mtraceback=none -mcall-aixdesc \
+			   -ftls-model=local-exec -mtls-size=16
 CFLAGS-$(CONFIG_PPC32)	:= -Iarch/$(ARCH) -ffixed-r2 -mmultiple
 CPPFLAGS	+= $(CPPFLAGS-y)
 AFLAGS		+= $(AFLAGS-y)
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 8f85c5e..1cd54a6 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -112,6 +112,7 @@ #ifdef CONFIG_PPC64
 	DEFINE(PACAPROCSTART, offsetof(struct paca_struct, cpu_start));
 	DEFINE(PACAKSAVE, offsetof(struct paca_struct, kstack));
 	DEFINE(PACACURRENT, offsetof(struct paca_struct, __current));
+	DEFINE(PACATHREADPTR, offsetof(struct paca_struct, thread_ptr));
 	DEFINE(PACASAVEDMSR, offsetof(struct paca_struct, saved_msr));
 	DEFINE(PACASTABREAL, offsetof(struct paca_struct, stab_real));
 	DEFINE(PACASTABVIRT, offsetof(struct paca_struct, stab_addr));
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 19ad5c6..455443e 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -92,14 +92,15 @@ system_call_common:
 	ld	r11,exception_marker@toc(r2)
 	std	r11,-16(r9)		/* "regshere" marker */
 #ifdef CONFIG_PPC_ISERIES
+	lbz	r10,PACAPROCENABLED(r13)
+	std	r10,SOFTE(r1)
 	/* Hack for handling interrupts when soft-enabling on iSeries */
 	cmpdi	cr1,r0,0x5555		/* syscall 0x5555 */
 	andi.	r10,r12,MSR_PR		/* from kernel */
 	crand	4*cr0+eq,4*cr1+eq,4*cr0+eq
 	beq	hardware_interrupt_entry
-	lbz	r10,PACAPROCENABLED(r13)
-	std	r10,SOFTE(r1)
 #endif
+	ld	r13,PACATHREADPTR(r13)
 	mfmsr	r11
 	ori	r11,r11,MSR_EE
 	mtmsrd	r11,1
@@ -170,6 +171,7 @@ syscall_error_cont:
 	andi.	r6,r8,MSR_PR
 	ld	r4,_LINK(r1)
 	beq-	1f
+	mfspr	r13,SPRN_SPRG3
 	ACCOUNT_CPU_USER_EXIT(r11, r12)
 	ld	r13,GPR13(r1)	/* only restore r13 if returning to usermode */
 1:	ld	r2,GPR2(r1)
@@ -361,7 +363,8 @@ #ifdef CONFIG_SMP
 #endif /* CONFIG_SMP */
 
 	addi	r6,r4,-THREAD	/* Convert THREAD to 'current' */
-	std	r6,PACACURRENT(r13)	/* Set new 'current' */
+	mfspr	r10,SPRN_SPRG3
+	std	r6,PACACURRENT(r10)	/* Set new 'current' */
 
 	ld	r8,KSP(r4)	/* new stack pointer */
 BEGIN_FTR_SECTION
@@ -390,7 +393,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_SLB)
 	addi	r7,r7,THREAD_SIZE-SWITCH_FRAME_SIZE
 
 	mr	r1,r8		/* start using new stack pointer */
-	std	r7,PACAKSAVE(r13)
+	std	r7,PACAKSAVE(r10)
 
 	ld	r6,_CCR(r1)
 	mtcrf	0xFF,r6
@@ -457,22 +460,23 @@ restore:
 #ifdef CONFIG_PPC_ISERIES
 	ld	r5,SOFTE(r1)
 	cmpdi	0,r5,0
+	mfspr	r11,SPRN_SPRG3
 	beq	4f
 	/* Check for pending interrupts (iSeries) */
-	ld	r3,PACALPPACAPTR(r13)
+	ld	r3,PACALPPACAPTR(r11)
 	ld	r3,LPPACAANYINT(r3)
 	cmpdi	r3,0
 	beq+	4f			/* skip do_IRQ if no interrupts */
 
 	li	r3,0
-	stb	r3,PACAPROCENABLED(r13)	/* ensure we are soft-disabled */
+	stb	r3,PACAPROCENABLED(r11)	/* ensure we are soft-disabled */
 	ori	r10,r10,MSR_EE
 	mtmsrd	r10			/* hard-enable again */
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	.do_IRQ
 	b	.ret_from_except_lite		/* loop back and handle more */
 
-4:	stb	r5,PACAPROCENABLED(r13)
+4:	stb	r5,PACAPROCENABLED(r11)
 #endif
 
 	ld	r3,_MSR(r1)
@@ -486,6 +490,7 @@ #endif
 	 * userspace
 	 */
 	beq	1f
+	mfspr	r13,SPRN_SPRG3
 	ACCOUNT_CPU_USER_EXIT(r3, r4)
 	REST_GPR(13, r1)
 1:
@@ -541,8 +546,9 @@ #endif
 	/* here we are preempting the current task */
 1:
 #ifdef CONFIG_PPC_ISERIES
+	mfspr	r11,SPRN_SPRG3
 	li	r0,1
-	stb	r0,PACAPROCENABLED(r13)
+	stb	r0,PACAPROCENABLED(r11)
 #endif
 	ori	r10,r10,MSR_EE
 	mtmsrd	r10,1		/* reenable interrupts */
@@ -641,8 +647,9 @@ _GLOBAL(enter_rtas)
 	 * so they are saved in the PACA which allows us to restore
 	 * our original state after RTAS returns.
          */
-	std	r1,PACAR1(r13)
-        std	r6,PACASAVEDMSR(r13)
+	mfspr	r5,SPRN_SPRG3
+	std	r1,PACAR1(r5)
+	std	r6,PACASAVEDMSR(r5)
 
 	/* Setup our real return addr */	
 	LOAD_REG_ADDR(r4,.rtas_return_loc)
@@ -698,6 +705,7 @@ _STATIC(rtas_restore_regs)
 	REST_10GPRS(22, r1)		/* ditto */
 
 	mfspr	r13,SPRN_SPRG3
+	ld	r13,PACATHREADPTR(r13)
 
 	ld	r4,_CCR(r1)
 	mtcr	r4
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index b7d1404..80d95b4 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -298,6 +298,7 @@ #define EXCEPTION_PROLOG_COMMON(n, area)
 	std	r10,_CTR(r1);						   \
 	mfspr	r11,SPRN_XER;		/* save XER in stackframe	*/ \
 	std	r11,_XER(r1);						   \
+	SAVE_INT_ENABLE(r10);		/* save soft irq disable state	*/ \
 	li	r9,(n)+1;						   \
 	std	r9,_TRAP(r1);		/* set trap number		*/ \
 	li	r10,0;							   \
@@ -338,27 +339,27 @@ label##_iSeries:							\
 	b	label##_common;						\
 
 #ifdef DO_SOFT_DISABLE
+#define SAVE_INT_ENABLE(rn)			\
+	lbz	rn,PACAPROCENABLED(r13);	\
+	std	rn,SOFTE(r1)
+
 #define DISABLE_INTS				\
-	lbz	r10,PACAPROCENABLED(r13);	\
 	li	r11,0;				\
-	std	r10,SOFTE(r1);			\
 	mfmsr	r10;				\
 	stb	r11,PACAPROCENABLED(r13);	\
 	ori	r10,r10,MSR_EE;			\
 	mtmsrd	r10,1
 
 #define ENABLE_INTS				\
-	lbz	r10,PACAPROCENABLED(r13);	\
 	mfmsr	r11;				\
-	std	r10,SOFTE(r1);			\
 	ori	r11,r11,MSR_EE;			\
 	mtmsrd	r11,1
 
 #else	/* hard enable/disable interrupts */
+#define SAVE_INT_ENABLE(rn)
 #define DISABLE_INTS
 
 #define ENABLE_INTS				\
-	ld	r12,_MSR(r1);			\
 	mfmsr	r11;				\
 	rlwimi	r11,r12,0,MSR_EE;		\
 	mtmsrd	r11,1
@@ -371,6 +372,7 @@ #define STD_EXCEPTION_COMMON(trap, label
 label##_common:						\
 	EXCEPTION_PROLOG_COMMON(trap, PACA_EXGEN);	\
 	DISABLE_INTS;					\
+	ld	r13,PACATHREADPTR(r13);		\
 	bl	.save_nvgprs;				\
 	addi	r3,r1,STACK_FRAME_OVERHEAD;		\
 	bl	hdlr;					\
@@ -387,6 +389,7 @@ label##_common:						\
 	EXCEPTION_PROLOG_COMMON(trap, PACA_EXGEN);	\
 	FINISH_NAP;					\
 	DISABLE_INTS;					\
+	ld	r13,PACATHREADPTR(r13);		\
 	bl	.save_nvgprs;				\
 	addi	r3,r1,STACK_FRAME_OVERHEAD;		\
 	bl	hdlr;					\
@@ -399,6 +402,7 @@ label##_common:						\
 	EXCEPTION_PROLOG_COMMON(trap, PACA_EXGEN);	\
 	FINISH_NAP;					\
 	DISABLE_INTS;					\
+	ld	r13,PACATHREADPTR(r13);		\
 	bl	.ppc64_runlatch_on;			\
 	addi	r3,r1,STACK_FRAME_OVERHEAD;		\
 	bl	hdlr;					\
@@ -810,6 +814,7 @@ machine_check_common:
 	EXCEPTION_PROLOG_COMMON(0x200, PACA_EXMC)
 	FINISH_NAP
 	DISABLE_INTS
+	ld	r13,PACATHREADPTR(r13)
 	bl	.save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	.machine_check_exception
@@ -864,6 +869,7 @@ bad_stack:
 	li	r12,0
 	std	r12,0(r11)
 	ld	r2,PACATOC(r13)
+	ld	r13,PACATHREADPTR(r13)
 1:	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	.kernel_bad_stack
 	b	1b
@@ -886,6 +892,7 @@ fast_exception_return:
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
 	andi.	r3,r12,MSR_PR
 	beq	2f
+	mfspr	r13,SPRN_SPRG3
 	ACCOUNT_CPU_USER_EXIT(r3, r4)
 2:
 #endif
@@ -913,6 +920,8 @@ #endif
 	b	.	/* prevent speculative execution */
 
 unrecov_fer:
+	mfspr	r13,SPRN_SPRG3
+	ld	r13,PACATHREADPTR(r13)
 	bl	.save_nvgprs
 1:	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	.unrecoverable_exception
@@ -933,16 +942,20 @@ data_access_common:
 	EXCEPTION_PROLOG_COMMON(0x300, PACA_EXGEN)
 	ld	r3,PACA_EXGEN+EX_DAR(r13)
 	lwz	r4,PACA_EXGEN+EX_DSISR(r13)
+	DISABLE_INTS
 	li	r5,0x300
+	ld	r13,PACATHREADPTR(r13)
 	b	.do_hash_page	 	/* Try to handle as hpte fault */
 
 	.align	7
 	.globl instruction_access_common
 instruction_access_common:
 	EXCEPTION_PROLOG_COMMON(0x400, PACA_EXGEN)
+	DISABLE_INTS
 	ld	r3,_NIP(r1)
 	andis.	r4,r12,0x5820
 	li	r5,0x400
+	ld	r13,PACATHREADPTR(r13)
 	b	.do_hash_page		/* Try to handle as hpte fault */
 
 /*
@@ -958,7 +971,7 @@ slb_miss_user_common:
 	stw	r9,PACA_EXGEN+EX_CCR(r13)
 	std	r10,PACA_EXGEN+EX_LR(r13)
 	std	r11,PACA_EXGEN+EX_SRR0(r13)
-	bl	.slb_allocate_user
+	bl	..slb_allocate_user
 
 	ld	r10,PACA_EXGEN+EX_LR(r13)
 	ld	r3,PACA_EXGEN+EX_R3(r13)
@@ -996,11 +1009,14 @@ slb_miss_fault:
 	li	r5,0
 	std	r4,_DAR(r1)
 	std	r5,_DSISR(r1)
+	ld	r13,PACATHREADPTR(r13)
+	ENABLE_INTS
 	b	.handle_page_fault
 
 unrecov_user_slb:
 	EXCEPTION_PROLOG_COMMON(0x4200, PACA_EXGEN)
 	DISABLE_INTS
+	ld	r13,PACATHREADPTR(r13)
 	bl	.save_nvgprs
 1:	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	.unrecoverable_exception
@@ -1023,7 +1039,7 @@ _GLOBAL(slb_miss_realmode)
 	stw	r9,PACA_EXSLB+EX_CCR(r13)	/* save CR in exc. frame */
 	std	r10,PACA_EXSLB+EX_LR(r13)	/* save LR */
 
-	bl	.slb_allocate_realmode
+	bl	..slb_allocate_realmode
 
 	/* All done -- return from exception. */
 
@@ -1061,6 +1077,7 @@ #endif /* CONFIG_PPC_ISERIES */
 unrecov_slb:
 	EXCEPTION_PROLOG_COMMON(0x4100, PACA_EXSLB)
 	DISABLE_INTS
+	ld	r13,PACATHREADPTR(r13)
 	bl	.save_nvgprs
 1:	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	.unrecoverable_exception
@@ -1074,6 +1091,7 @@ hardware_interrupt_common:
 	FINISH_NAP
 hardware_interrupt_entry:
 	DISABLE_INTS
+	ld	r13,PACATHREADPTR(r13)
 	bl	.ppc64_runlatch_on
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	.do_IRQ
@@ -1100,9 +1118,10 @@ alignment_common:
 	lwz	r4,PACA_EXGEN+EX_DSISR(r13)
 	std	r3,_DAR(r1)
 	std	r4,_DSISR(r1)
+	ld	r13,PACATHREADPTR(r13)
+	ENABLE_INTS
 	bl	.save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
-	ENABLE_INTS
 	bl	.alignment_exception
 	b	.ret_from_except
 
@@ -1110,9 +1129,10 @@ alignment_common:
 	.globl program_check_common
 program_check_common:
 	EXCEPTION_PROLOG_COMMON(0x700, PACA_EXGEN)
+	ld	r13,PACATHREADPTR(r13)
+	ENABLE_INTS
 	bl	.save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
-	ENABLE_INTS
 	bl	.program_check_exception
 	b	.ret_from_except
 
@@ -1121,9 +1141,10 @@ program_check_common:
 fp_unavailable_common:
 	EXCEPTION_PROLOG_COMMON(0x800, PACA_EXGEN)
 	bne	.load_up_fpu		/* if from user, just load it up */
+	ld	r13,PACATHREADPTR(r13)
+	ENABLE_INTS
 	bl	.save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
-	ENABLE_INTS
 	bl	.kernel_fp_unavailable_exception
 	BUG_OPCODE
 
@@ -1136,9 +1157,10 @@ BEGIN_FTR_SECTION
 	bne	.load_up_altivec	/* if from user, just load it up */
 END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 #endif
+	ld	r13,PACATHREADPTR(r13)
+	ENABLE_INTS
 	bl	.save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
-	ENABLE_INTS
 	bl	.altivec_unavailable_exception
 	b	.ret_from_except
 
@@ -1242,13 +1264,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_SLB)
 	rlwimi	r4,r0,32-13,30,30	/* becomes _PAGE_USER access bit */
 	ori	r4,r4,1			/* add _PAGE_PRESENT */
 	rlwimi	r4,r5,22+2,31-2,31-2	/* Set _PAGE_EXEC if trap is 0x400 */
-
-	/*
-	 * On iSeries, we soft-disable interrupts here, then
-	 * hard-enable interrupts so that the hash_page code can spin on
-	 * the hash_table_lock without problems on a shared processor.
-	 */
-	DISABLE_INTS
 
 	/*
 	 * r3 contains the faulting address
@@ -1258,6 +1273,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_SLB)
 	 * at return r3 = 0 for success
 	 */
 	bl	.hash_page		/* build HPTE if possible */
+11:					/* re-enter here from do_ste_alloc */
 	cmpdi	r3,0			/* see if hash_page succeeded */
 
 #ifdef DO_SOFT_DISABLE
@@ -1280,18 +1296,18 @@ #ifdef DO_SOFT_DISABLE
 	 */
 	ld	r3,SOFTE(r1)
 	bl	.local_irq_restore
-	b	11f
 #else
 	beq	fast_exception_return   /* Return from exception on success */
 	ble-	12f			/* Failure return from hash_page */
 
-	/* fall through */
+	ld	r12,_MSR(r1)		/* Reenable interrupts if they */
+	ENABLE_INTS			/* were enabled when trap occurred */
 #endif
+	/* fall through */
 
 /* Here we have a page fault that hash_page can't handle. */
 _GLOBAL(handle_page_fault)
-	ENABLE_INTS
-11:	ld	r4,_DAR(r1)
+	ld	r4,_DAR(r1)
 	ld	r5,_DSISR(r1)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	.do_page_fault
@@ -1316,9 +1332,7 @@ _GLOBAL(handle_page_fault)
 	/* here we have a segment miss */
 _GLOBAL(do_ste_alloc)
 	bl	.ste_allocate		/* try to insert stab entry */
-	cmpdi	r3,0
-	beq+	fast_exception_return
-	b	.handle_page_fault
+	b	11b
 
 /*
  * r13 points to the PACA, r9 contains the saved CR,
@@ -1796,6 +1810,9 @@ _GLOBAL(__secondary_start)
 	/* Clear backchain so we get nice backtraces */
 	li	r7,0
 	mtlr	r7
+
+	/* load per-cpu data area pointer */
+	ld	r13,PACATHREADPTR(r13)
 
 	/* enable MMU and jump to start_secondary */
 	LOAD_REG_ADDR(r3, .start_secondary_prolog)
@@ -1808,9 +1825,11 @@ #endif
 	rfid
 	b	.	/* prevent speculative execution */
 
-/* 
+/*
  * Running with relocation on at this point.  All we want to do is
  * zero the stack back-chain pointer before going into C code.
+ * We can't do this in __secondary_start because the stack isn't
+ * necessarily in the RMA, so it might not be accessible in real mode.
  */
 _GLOBAL(start_secondary_prolog)
 	li	r3,0
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 2778cce..f1899b0 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -27,6 +27,7 @@ #include <asm/ppc_asm.h>
 #include <asm/asm-offsets.h>
 #include <asm/cputable.h>
 #include <asm/thread_info.h>
+#include <asm/reg.h>
 
 	.text
 
@@ -820,6 +821,7 @@ #ifdef CONFIG_KEXEC
  * join other cpus in kexec_wait(phys_id)
  */
 _GLOBAL(kexec_smp_wait)
+	mfspr	r13,SPRN_SPRG3
 	lhz	r3,PACAHWCPUID(r13)
 	li	r4,-1
 	sth	r4,PACAHWCPUID(r13)	/* let others know we left */
@@ -885,6 +887,7 @@ _GLOBAL(kexec_sequence)
 	mr	r28,r6			/* control, unused */
 	mr	r27,r7			/* clear_all() fn desc */
 	mr	r26,r8			/* spare */
+	mfspr	r13,SPRN_SPRG3
 	lhz	r25,PACAHWCPUID(r13)	/* get our phys cpu from paca */
 
 	/* disable interrupts, we are overwriting kernel data next */
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index ba34001..8140cbe 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -357,9 +357,7 @@ int apply_relocate_add(Elf64_Shdr *sechd
 				       me->name, value);
 				return -ENOEXEC;
 			}
-			*((uint16_t *) location)
-				= (*((uint16_t *) location) & ~0xffff)
-				| (value & 0xffff);
+			*(u16 *)location = value;
 			break;
 
 		case R_PPC64_TOC16_DS:
@@ -398,6 +396,32 @@ int apply_relocate_add(Elf64_Shdr *sechd
 			*(uint32_t *)location 
 				= (*(uint32_t *)location & ~0x03fffffc)
 				| (value & 0x03fffffc);
+			break;
+
+		case R_PPC64_TPREL16:
+			if (value > 0xffff) {
+				printk(KERN_ERR "%s: TPREL16 relocation "
+				       "too large (%lu)\n", me->name, value - 0x8000);
+				return -ENOEXEC;
+			}
+			*(u16 *)location = value - 0x8000;
+			break;
+
+		case R_PPC64_TPREL16_LO:
+			*(u16 *)location = PPC_LO(value - 0x8000);
+			break;
+
+		case R_PPC64_TPREL16_LO_DS:
+			*(u16 *)location = ((*(u16 *)location) & ~0xfffc)
+				| ((value - 0x8000) & 0xfffc);
+			break;
+
+		case R_PPC64_TPREL16_HA:
+			*(u16 *)location = PPC_HA(value - 0x8000);
+			break;
+
+		case R_PPC64_TPREL64:
+			*(u64 *)location = value - 0x8000;
 			break;
 
 		default:
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 4467c49..7fe7c7d 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -605,6 +605,7 @@ void __init setup_per_cpu_areas(void)
 {
 	int i;
 	unsigned long size;
+	unsigned long initsize;
 	char *ptr;
 
 	/* Copy section for each CPU (we discard the original) */
@@ -613,14 +614,19 @@ #ifdef CONFIG_MODULES
 	if (size < PERCPU_ENOUGH_ROOM)
 		size = PERCPU_ENOUGH_ROOM;
 #endif
+	initsize = __end_tdata - __start_tdata;
 
 	for_each_possible_cpu(i) {
 		ptr = alloc_bootmem_node(NODE_DATA(cpu_to_node(i)), size);
 		if (!ptr)
 			panic("Cannot allocate cpu data for CPU %d\n", i);
 
-		paca[i].data_offset = ptr - __per_cpu_start;
-		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+		paca[i].thread_ptr = (unsigned long)ptr + 0x7000;
+		memcpy(ptr, __start_tdata, initsize);
+		if (initsize < size)
+			memset(ptr + initsize, 0, size - initsize);
 	}
+	/* Set our percpu area pointer register */
+	asm volatile("mr 13,%0" : : "r" (paca[boot_cpuid].thread_ptr));
 }
 #endif
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index fe79c25..c83ff6a 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -141,11 +141,12 @@ #ifdef CONFIG_PPC32
 #else
 	. = ALIGN(128);
 #endif
-	.data.percpu : {
-		__per_cpu_start = .;
-		*(.data.percpu)
-		__per_cpu_end = .;
-	}
+	__start_tdata = .;
+	.tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) }
+	__end_tdata = .;
+	.tbss  : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) }
+	__per_cpu_start = 0x1000;
+	__per_cpu_end = 0x1000 + ALIGN(SIZEOF(.tdata), 128) + SIZEOF(.tbss);
 
 	. = ALIGN(8);
 	.machine.desc : {
diff --git a/arch/powerpc/mm/slb_low.S b/arch/powerpc/mm/slb_low.S
index abfaabf..92f11cd 100644
--- a/arch/powerpc/mm/slb_low.S
+++ b/arch/powerpc/mm/slb_low.S
@@ -23,14 +23,30 @@ #include <asm/page.h>
 #include <asm/mmu.h>
 #include <asm/pgtable.h>
 
-/* void slb_allocate_realmode(unsigned long ea);
+/*
+ * void slb_allocate_realmode(unsigned long ea);
  *
+ * This version is callable from C; the version with two dots at the
+ * start of the name assumes r13 points to the PACA and thus isn't.
+ */
+_GLOBAL(slb_allocate_realmode)
+	mflr	r0
+	std	r0,16(r1)
+	mr	r8,r13
+	mfspr	r13,SPRN_SPRG3
+	bl	..slb_allocate_realmode
+	mr	r13,r8
+	mtlr	r0
+	blr
+
+/*
  * Create an SLB entry for the given EA (user or kernel).
  * 	r3 = faulting address, r13 = PACA
  *	r9, r10, r11 are clobbered by this function
  * No other registers are examined or changed.
  */
-_GLOBAL(slb_allocate_realmode)
+	.globl	..slb_allocate_realmode
+..slb_allocate_realmode:
 	/* r3 = faulting address */
 
 	srdi	r9,r3,60		/* get region */
@@ -121,7 +137,8 @@ #ifdef __DISABLED__
  * It is called with translation enabled in order to be able to walk the
  * page tables. This is not currently used.
  */
-_GLOBAL(slb_allocate_user)
+	.globl	..slb_allocate_user
+..slb_allocate_user:
 	/* r3 = faulting address */
 	srdi	r10,r3,28		/* get esid */
 
diff --git a/arch/powerpc/platforms/iseries/misc.S b/arch/powerpc/platforms/iseries/misc.S
index 7641fc7..d8a3ab5 100644
--- a/arch/powerpc/platforms/iseries/misc.S
+++ b/arch/powerpc/platforms/iseries/misc.S
@@ -21,30 +21,33 @@ #include <asm/ppc_asm.h>
 
 /* unsigned long local_save_flags(void) */
 _GLOBAL(local_get_flags)
-	lbz	r3,PACAPROCENABLED(r13)
+	mfspr	r3,SPRG3
+	lbz	r3,PACAPROCENABLED(r3)
 	blr
 
 /* unsigned long local_irq_disable(void) */
 _GLOBAL(local_irq_disable)
-	lbz	r3,PACAPROCENABLED(r13)
+	mfspr	r5,SPRG3
+	lbz	r3,PACAPROCENABLED(r5)
 	li	r4,0
-	stb	r4,PACAPROCENABLED(r13)
+	stb	r4,PACAPROCENABLED(r5)
 	blr			/* Done */
 
 /* void local_irq_restore(unsigned long flags) */
 _GLOBAL(local_irq_restore)
-	lbz	r5,PACAPROCENABLED(r13)
+	mfspr	r6,SPRG3
+	lbz	r5,PACAPROCENABLED(r6)
 	 /* Check if things are setup the way we want _already_. */
 	cmpw	0,r3,r5
 	beqlr
 	/* are we enabling interrupts? */
 	cmpdi	0,r3,0
-	stb	r3,PACAPROCENABLED(r13)
+	stb	r3,PACAPROCENABLED(r6)
 	beqlr
 	/* Check pending interrupts */
 	/*   A decrementer, IPI or PMC interrupt may have occurred
 	 *   while we were in the hypervisor (which enables) */
-	ld	r4,PACALPPACAPTR(r13)
+	ld	r4,PACALPPACAPTR(r6)
 	ld	r4,LPPACAANYINT(r4)
 	cmpdi	r4,0
 	beqlr
diff --git a/include/asm-powerpc/paca.h b/include/asm-powerpc/paca.h
index 706325f..afbfb5c 100644
--- a/include/asm-powerpc/paca.h
+++ b/include/asm-powerpc/paca.h
@@ -21,8 +21,14 @@ #include	<asm/types.h>
 #include	<asm/lppaca.h>
 #include	<asm/mmu.h>
 
-register struct paca_struct *local_paca asm("r13");
-#define get_paca()	local_paca
+static inline struct paca_struct *get_paca(void)
+{
+	struct paca_struct *p;
+
+	asm volatile("mfsprg3 %0" : "=r" (p));
+	return p;
+}
+
 #define get_lppaca()	(get_paca()->lppaca_ptr)
 
 struct task_struct;
@@ -66,7 +72,7 @@ #endif /* CONFIG_PPC_ISERIES */
 	u64 stab_real;			/* Absolute address of segment table */
 	u64 stab_addr;			/* Virtual address of segment table */
 	void *emergency_sp;		/* pointer to emergency stack */
-	u64 data_offset;		/* per cpu data offset */
+	u64 thread_ptr;			/* per cpu data pointer + 0x7000 */
 	s16 hw_cpu_id;			/* Physical processor number */
 	u8 cpu_start;			/* At startup, processor spins until */
 					/* this becomes non-zero. */
diff --git a/include/asm-powerpc/percpu.h b/include/asm-powerpc/percpu.h
index 5d603ff..dcd9aa0 100644
--- a/include/asm-powerpc/percpu.h
+++ b/include/asm-powerpc/percpu.h
@@ -2,40 +2,76 @@ #ifndef _ASM_POWERPC_PERCPU_H_
 #define _ASM_POWERPC_PERCPU_H_
 #ifdef __powerpc64__
 #include <linux/compiler.h>
-
-/*
- * Same as asm-generic/percpu.h, except that we store the per cpu offset
- * in the paca. Based on the x86-64 implementation.
- */
-
-#ifdef CONFIG_SMP
-
 #include <asm/paca.h>
 
-#define __per_cpu_offset(cpu) (paca[cpu].data_offset)
-#define __my_cpu_offset() get_paca()->data_offset
+#ifdef CONFIG_SMP
 
 /* Separate out the type, so (int[3], foo) works. */
 #define DEFINE_PER_CPU(type, name) \
-    __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
+	__thread __typeof__(type) per_cpu__##name __attribute__((__used__))
+
+#define __get_cpu_var(var)	per_cpu__##var
+#define __raw_get_cpu_var(var)	per_cpu__##var
 
-/* var is in discarded region: offset to particular copy we want */
-#define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset(cpu)))
-#define __get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, __my_cpu_offset()))
-#define __raw_get_cpu_var(var) (*RELOC_HIDE(&per_cpu__##var, __my_cpu_offset()))
+#define per_cpu(var, cpu)					\
+	(*(__typeof__(&per_cpu__##var))({			\
+		void *__ptr;					\
+		asm("addi %0,%1,per_cpu__"#var"@tprel"		\
+		    : "=b" (__ptr)				\
+		    : "b" (paca[(cpu)].thread_ptr));		\
+		__ptr;						\
+	}))
 
 /* A macro to avoid #include hell... */
-#define percpu_modcopy(pcpudst, src, size, zero_size)		\
-do {								\
-	unsigned int __i;					\
-	BUG_ON(zero_size != 0);					\
-	for_each_possible_cpu(__i)				\
-		memcpy((pcpudst)+__per_cpu_offset(__i),		\
-		       (src), (size));				\
+#define percpu_modcopy(pcpudst, src, size, total_size)			    \
+do {									    \
+	unsigned int __i;						    \
+	extern char __per_cpu_start[];					    \
+	unsigned long offset = (unsigned long)(pcpudst) - 0x8000;	    \
+	for_each_possible_cpu(__i) {					    \
+		memcpy((void *)(offset + paca[__i].thread_ptr),		    \
+		       (src), (size));					    \
+		if ((size) < (total_size))				    \
+			memset((void *)(offset + (size) + paca[__i].thread_ptr), \
+			       0, (total_size) - (size));		    \
+	}								    \
 } while (0)
 
 extern void setup_per_cpu_areas(void);
+
+#define DECLARE_PER_CPU(type, name) \
+	extern __thread __typeof__(type) per_cpu__##name
+
+#ifndef __GENKSYMS__
+#define __EXPORT_PER_CPU_SYMBOL(sym, sec)				\
+	extern __thread typeof(sym) sym;				\
+	__CRC_SYMBOL(sym, sec)						\
+	static const char __kstrtab_##sym[]				\
+	__attribute__((used, section("__ksymtab_strings"))) = #sym;	\
+	asm(".section	__ksymtab"sec",\"aw\",@progbits\n"		\
+	    "	.align 3\n"						\
+	    "	.type	__ksymtab_"#sym", @object\n"			\
+	    "	.size	__ksymtab_"#sym", 16\n"				\
+	    "__ksymtab_"#sym":\n"					\
+	    "	.quad	0x8000+"#sym"@tprel\n"				\
+	    "	.quad	__kstrtab_"#sym)
+
+#define EXPORT_PER_CPU_SYMBOL(var) \
+	__EXPORT_PER_CPU_SYMBOL(per_cpu__##var, "")
+#define EXPORT_PER_CPU_SYMBOL_GPL(var) \
+	__EXPORT_PER_CPU_SYMBOL(per_cpu__##var, "_gpl")
+
+#else
+/* for genksyms's sake... */
+#define __thread
+#define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var)
+#define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)
+#endif
 
+/* Actual kernel address of .tdata section contents */
+extern char __start_tdata[];
+extern char __end_tdata[];
+
 #else /* ! SMP */
 
 #define DEFINE_PER_CPU(type, name) \
@@ -45,12 +81,12 @@ #define per_cpu(var, cpu)			(*((void)(cp
 #define __get_cpu_var(var)			per_cpu__##var
 #define __raw_get_cpu_var(var)			per_cpu__##var
 
-#endif	/* SMP */
-
 #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
 
 #define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var)
 #define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)
+
+#endif	/* SMP */
 
 #else
 #include <asm-generic/percpu.h>
diff --git a/kernel/printk.c b/kernel/printk.c


* Help Needed: input overrun(s)
From: s.maiti @ 2006-05-10  4:33 UTC (permalink / raw)
  To: linuxppc-embedded

Hi all,

I am currently involved in the development of a Multi-Channel Controller (MCC)
driver for the MPC8260 processor. Whenever we load the driver, the
console shows the message "ttyS: 1 input overrun(s)" mixed in with
the driver's other messages, which scrambles the output.
Can anyone suggest why this is happening? Is the driver affecting the UART
driver? We have checked the memory map thoroughly and found no
memory conflict. I would be grateful for any help in this regard.

Thanks and regards,
Souvik Maiti
Tata Consultancy Services Limited
Mailto: s.maiti@tcs.com
Website: http://www.tcs.com


* Re: [RFC/PATCH] Make powerpc64 use __thread for per-cpu variables
From: Olof Johansson @ 2006-05-10  5:16 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linux-arch, linuxppc-dev, linux-kernel
In-Reply-To: <17505.26159.807484.477212@cargo.ozlabs.ibm.com>

On Wed, May 10, 2006 at 02:03:59PM +1000, Paul Mackerras wrote:
> With this patch, 64-bit powerpc uses __thread for per-cpu variables.

Nice! I like the way you hid the slb functions so they can't ever be
called by mistake from C code. :-)

This patch grows a ppc64_defconfig vmlinux a bit (with the other two percpu
patches):

olof@quad:~/work/linux/powerpc $ ls -l vmlinux.pre vmlinux
-rwxr-xr-x 1 olof olof 10290928 2006-05-09 23:48 vmlinux.pre
-rwxr-xr-x 1 olof olof 10307499 2006-05-09 23:50 vmlinux
olof@quad:~/work/linux/powerpc $ size vmlinux.pre vmlinux
   text    data     bss     dec     hex filename
5554034 2404256  480472 8438762  80c3ea vmlinux.pre
5578866 2384944  498848 8462658  812142 vmlinux

Looks like a lot of the text growth is from the added mfsprg3 instructions:

$ objdump -d vmlinux.pre | egrep mfsprg.\*,3\$ | wc -l
26
$ objdump -d vmlinux | egrep mfsprg.\*,3\$ | wc -l
5134

... so, as the PACA gets deprecated, the bloat will go away again.

> The motivation for doing this is that getting the address of a per-cpu
> variable currently requires two loads (one to get our per-cpu offset
> and one to get the address of the variable in the .data.percpu
> section) plus an add.  With __thread we can get the address of our
> copy of a per-cpu variable with just an add (r13 plus a constant).

It would be interesting to see benchmarks of how much it improves
things. I guess it doesn't really get interesting until after the paca
gets removed though, due to the added mfsprg's.


-Olof


* Re: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
From: Michael S. Tsirkin @ 2006-05-10  5:33 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Roland Dreier, linux-kernel, openib-general, linuxppc-dev,
	Christoph Raisch, Hoang-Nam Nguyen, Marcus Eder,
	openib-general-bounces
In-Reply-To: <OFD1053717.15B4F49A-ON87257169.007531E1-88257169.007AD29E@us.ibm.com>

Quoting r. Shirley Ma <xma@us.ibm.com>:
> Subject: Re: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
> 
> 
> "Michael S. Tsirkin" <mst@mellanox.co.il> wrote on 05/09/2006 01:20:41 PM:
> 
> > Quoting r. Shirley Ma <xma@us.ibm.com>:
> > > My understanding is NAPI handles interrupt CQ callbacks on the same CPU.
> >
> > My understanding is NAPI disables interrupts under high RX load. No?
> 
> Yes, NAPI disables the interrupts based on the weight. In the IPoIB case, it doesn't
> send out the next completion notification under heavy load.
> Similar CQ polling is still done in NAPI on the same CPU, but it's not a callback
> anymore.

Sorry, same CPU as what?

-- 
MST


* Re: [RFC/PATCH] Make powerpc64 use __thread for per-cpu variables
From: Alan Modra @ 2006-05-10  5:35 UTC (permalink / raw)
  To: Olof Johansson; +Cc: linux-arch, linuxppc-dev, Paul Mackerras, linux-kernel
In-Reply-To: <20060510051649.GD1794@lixom.net>

On Wed, May 10, 2006 at 12:16:50AM -0500, Olof Johansson wrote:
> ... so, as the PACA gets deprecated, the bloat will go away again.

We can also lose one instruction per tls access, if I can manage to
teach gcc a trick or two.

-- 
Alan Modra
IBM OzLabs - Linux Technology Centre

