LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH 1/2]: powerpc/cell spidernet bottom half
From: David Miller @ 2006-08-16 22:29 UTC (permalink / raw)
  To: arnd
  Cc: akpm, jeff, netdev, jklewis, linux-kernel, linuxppc-dev,
	Jens.Osterkamp
In-Reply-To: <200608170016.47072.arnd@arndb.de>

From: Arnd Bergmann <arnd@arndb.de>
Date: Thu, 17 Aug 2006 00:16:46 +0200

> Am Wednesday 16 August 2006 23:32 schrieb David Miller:
> > Can spidernet be told these kinds of parameters?  "N packets or
> > X usecs"?
>

> It can not do exactly this but probably we can get close to it by

Oh, you can only control TX packet counts using bits in the TX ring
entries :(

Tigon3 can even be told to use different interrupt mitigation
parameters when the cpu is actively servicing an interrupt for
the chip.

Didn't you say spidernet's facilities were sophisticated? :)
This Tigon3 stuff is like 5+ year old technology.
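For readers unfamiliar with the "N frames or X usecs" scheme, here is a minimal sketch of the decision logic in plain C. It assumes a hypothetical per-ring state struct and uses the 53-frame / 150-usec Tigon3 figures quoted in this thread only as example values. It is not tg3 driver code, and on real chips this logic lives in hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical coalescing state for one TX ring. */
struct tx_coalesce {
	unsigned int max_frames;   /* interrupt after this many reclaimable frames */
	uint64_t max_usecs;        /* ...or this long after the first one */
	unsigned int pending;      /* frames completed but not yet reclaimed */
	uint64_t first_done_usecs; /* completion time of the first pending frame */
};

/* Record one completed TX frame at time now_usecs. */
static void tx_frame_done(struct tx_coalesce *c, uint64_t now_usecs)
{
	if (c->pending == 0)
		c->first_done_usecs = now_usecs;
	c->pending++;
}

/* Fire when either the frame budget or the time budget is exhausted. */
static bool tx_should_interrupt(const struct tx_coalesce *c, uint64_t now_usecs)
{
	assert(c->max_frames > 0);
	if (c->pending == 0)
		return false;
	return c->pending >= c->max_frames ||
	       now_usecs - c->first_done_usecs >= c->max_usecs;
}

/* Called once the driver has reclaimed every pending frame. */
static void tx_reclaimed(struct tx_coalesce *c)
{
	c->pending = 0;
}
```

A single completed frame therefore still gets reclaimed within 150 usecs, while a burst of 53 frames triggers the interrupt immediately.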

^ permalink raw reply

* Re: [PATCH 1/2]: powerpc/cell spidernet bottom half
From: Arnd Bergmann @ 2006-08-16 22:47 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: akpm, jeff, netdev, jklewis, linux-kernel, Jens.Osterkamp,
	David Miller
In-Reply-To: <20060816.152919.88472383.davem@davemloft.net>

Am Thursday 17 August 2006 00:29 schrieb David Miller:
> Didn't you say spidernet's facilities were sophisticated? :)
> This Tigon3 stuff is like 5+ year old technology.

I was rather overwhelmed by the 34 different interrupts that
the chip can create, but that does not mean they chose the right
events for generating them.
Interestingly, spidernet has five different counters you can
set up to generate interrupts after a number of received frames,
but none for transmit...

	Arnd <><

^ permalink raw reply

* Re: [PATCH 1/2]: powerpc/cell spidernet bottom half
From: Linas Vepstas @ 2006-08-16 22:55 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: akpm, jeff, netdev, jklewis, linux-kernel, linuxppc-dev,
	Jens.Osterkamp, David Miller
In-Reply-To: <200608162324.47235.arnd@arndb.de>

On Wed, Aug 16, 2006 at 11:24:46PM +0200, Arnd Bergmann wrote:
> 
> it only
> seems to be hard to make it go fast using any of them. 

Last round of measurements seemed linear for packet sizes between
60 and 600 bytes, suggesting that the hardware can handle a 
maximum of 120K descriptors/second, independent of packet size.
I don't know why this is.
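If the hardware really is capped at a fixed descriptor rate, throughput grows linearly with packet size, which matches the measurement. A small C helper makes the arithmetic explicit; the 120K figure is the one measured here, one descriptor per packet is assumed, and the rest is unit conversion.

```c
#include <assert.h>

/* Throughput in Mbit/s implied by a fixed descriptor rate,
 * assuming one descriptor per packet. */
static double mbps_at_rate(double descr_per_sec, double packet_bytes)
{
	assert(descr_per_sec >= 0 && packet_bytes >= 0);
	return descr_per_sec * packet_bytes * 8.0 / 1e6;
}
```

At 60 bytes this gives about 57.6 Mbit/s and at 600 bytes about 576 Mbit/s. For 1500-byte frames it would be 1440 Mbit/s, above gigabit line rate, so this ceiling alone would not be the bottleneck for full-size frames.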

> That may
> be the fault of strange locking rules 

My fault; a few months ago, we were in panic mode trying to get
the thing functioning reliably, and I put locks around anything
and everything.  This last patch removes those locks, and protects
only a few pointers (the incrementing of the head and the tail 
pointers, and the location of the low watermark) that actually 
needed protection. They need protection because the code can 
get called in various different ways.

> Cleaning up the TX queue only from ->poll() like all the others

I'll try this ... 

> sounds like the right approach to simplify the code.

It's not a big driver. 'wc' says it's 2.3K lines, which
is 1/3 or 1/5 the size of tg3.c or the e1000*c files.

--linas

^ permalink raw reply

* Re: [PATCH 1/2]: powerpc/cell spidernet bottom half
From: Arnd Bergmann @ 2006-08-16 23:03 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: akpm, jeff, netdev, jklewis, linux-kernel, Jens.Osterkamp,
	David Miller
In-Reply-To: <20060816225558.GM20551@austin.ibm.com>

Am Thursday 17 August 2006 00:55 schrieb Linas Vepstas:
> > it only
> > seems to be hard to make it go fast using any of them.
>
> Last round of measurements seemed linear for packet sizes between
> 60 and 600 bytes, suggesting that the hardware can handle a
> maximum of 120K descriptors/second, independent of packet size.
> I don't know why this is.

Could well be related to latencies when going to the remote
node for descriptor DMAs. Have you tried whether hch's NUMA
patch or numactl makes a difference here?

> > sounds like the right approach to simplify the code.
>
> It's not a big driver. 'wc' says it's 2.3K lines, which
> is 1/3 or 1/5 the size of tg3.c or the e1000*c files.

Right, I was thinking of removing a lock or two, not
throwing out half of the driver ;-)

	Arnd <><

^ permalink raw reply

* Re: SMP in 32-bit arch/powerpc
From: Benjamin Herrenschmidt @ 2006-08-16 23:12 UTC (permalink / raw)
  To: Jon Loeliger; +Cc: linuxppc-dev@ozlabs.org
In-Reply-To: <E1G8HvV-0007o7-0C@jdl.com>

On Wed, 2006-08-02 at 09:42 -0500, Jon Loeliger wrote:
> So, like, the other day Adrian Cox mumbled:
> > Is anybody else having problems with 32-bit SMP support in arch/powerpc?
> > I'm using 2.6.17 as my current base, because I've not yet merged the
> > latest mpic changes.
> > 
> > I'm currently bringing up a dual-7448 board, and when I build the kernel
> > with CONFIG_SMP, the bootmem allocator corrupts the device tree. The
> > strange thing is, this still happens when I don't start the second CPU.
> > Kernels built without CONFIG_SMP run flawlessly on the same hardware.
> 
> As a point of reference, the 32-bit 8641 HPCN seems to be working fine
> with both CONFIG_SMP and the device tree mechanism in place.  It was working
> both before and after the IRQ changes.
> 
> Can you nail down any more specifics on how it is failing or where
> the corruption happens?

For completeness, there is a known bug with 32-bit SMP regarding
icache coherency. If you have random SIGILL/SEGV under load, that's
probably what you are hitting. The problem is due to the way we do the
coherency and isn't trivial to fix unfortunately, though it's also
fairly rare.

Ben.

^ permalink raw reply

* Re: [PATCH] Convert to mac-address for ethernet MAC address data.
From: Benjamin Herrenschmidt @ 2006-08-16 23:15 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev@ozlabs.org
In-Reply-To: <0D9BEF4A-F8A5-42C5-9962-8C88322C9F50@kernel.crashing.org>

On Thu, 2006-08-03 at 16:49 -0500, Kumar Gala wrote:
> On Aug 3, 2006, at 4:25 PM, Jon Loeliger wrote:
> 
> > Convert to mac-address for ethernet MAC address data.
> >
> > Also accept "local-mac-address".  The old "address" property
> > is now obsolete, but still accepted for backwards compatibility.
> > It should be removed after all device trees have been
> > converted to use "mac-address".

Why not stick to local-mac-address as this is the OF standard ?
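The property fallback the patch describes can be sketched as follows. The order ("mac-address", then "local-mac-address", then the obsolete "address") is an assumption based on the patch text, and `get_prop()` plus `struct prop` are hypothetical stand-ins for a device-tree property lookup, not the kernel's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a device-tree node's properties. */
struct prop {
	const char *name;
	const unsigned char *value;
	size_t len;
};

static const unsigned char *get_prop(const struct prop *props, size_t n,
				     const char *name, size_t *lenp)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(props[i].name, name) == 0) {
			*lenp = props[i].len;
			return props[i].value;
		}
	}
	return NULL;
}

/* Return a 6-byte MAC address, trying property names in order;
 * "address" is only kept for old device trees. */
static const unsigned char *find_mac(const struct prop *props, size_t n)
{
	static const char *const names[] = {
		"mac-address", "local-mac-address", "address",
	};

	assert(props != NULL || n == 0);
	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		size_t len = 0;
		const unsigned char *mac = get_prop(props, n, names[i], &len);
		if (mac && len == 6)
			return mac;
	}
	return NULL;
}
```

Keeping the names in one table makes it a one-line change to drop "address" once all device trees are converted.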

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 1/2]: powerpc/cell spidernet bottom half
From: Rick Jones @ 2006-08-16 23:08 UTC (permalink / raw)
  To: Linas Vepstas
  Cc: akpm, Arnd Bergmann, jeff, netdev, jklewis, linux-kernel,
	linuxppc-dev, Jens.Osterkamp, David Miller
In-Reply-To: <20060816225558.GM20551@austin.ibm.com>

Linas Vepstas wrote:
> On Wed, Aug 16, 2006 at 11:24:46PM +0200, Arnd Bergmann wrote:
> 
>>it only
>>seems to be hard to make it go fast using any of them. 
> 
> 
> Last round of measurements seemed linear for packet sizes between
> 60 and 600 bytes, suggesting that the hardware can handle a 
> maximum of 120K descriptors/second, independent of packet size.
> I don't know why this is.

DMA overhead perhaps?  If it takes so many micro/nanoseconds to get a 
DMA going....  That used to be a reason the Tigon2 had such low PPS 
rates and issues with multiple buffer packets and a 1500 byte MTU - it 
had rather high DMA setup latency, and then if you put it into a system 
with highish DMA read/write latency... well that didn't make it any 
better :)

rick jones

^ permalink raw reply

* Re: [PATCH 1/2]: powerpc/cell spidernet bottom half
From: Linas Vepstas @ 2006-08-16 23:24 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, arnd, jeff, netdev, jklewis, linux-kernel, linuxppc-dev,
	Jens.Osterkamp
In-Reply-To: <20060816.143203.11626235.davem@davemloft.net>

On Wed, Aug 16, 2006 at 02:32:03PM -0700, David Miller wrote:
> 
> The best schemes seem to be to interrupt mitigate using a combination
> of time and number of TX entries pending to be purged.  This is what
> most gigabit chips seem to offer.

I seem to be having a multi-hour delay for email delivery, so maybe
we've crossed emails.

A "low watermark interrupt" is an interrupt that is generated when
some queue is "almost empty". This last set of patches implement this
for the TX queue. The interrupt pops when 3/4ths of the packets 
in the queue have been processed.  Playing with this setting
(3/4ths or some other number) seemed to make little difference.

> On Tigon3, for example, we tell the chip to interrupt if either 53
> frames or 150usecs have passed since the first TX packet has become
> available for reclaim.

The nature of a low-watermark interrupt is that it NEVER pops, as long
as the kernel keeps putting more stuff into the queue, so as to keep 
the queue at least 1/4 full. I don't know how to mitigate interrupts 
more than that.
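To illustrate why a well-fed queue never raises the interrupt, here is a toy simulation. The driver moves a single "interrupt me" flag to about 3/4 of the way into the pending descriptors on every enqueue, and the consumer (standing in for the hardware) interrupts only when it transmits the flagged descriptor. The struct, the placement rule, and the absence of ring wrap-around are assumptions for illustration; this is not spidernet's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: counters grow monotonically; ring wrap-around is ignored. */
struct txq {
	unsigned long head;      /* next descriptor the "hardware" sends */
	unsigned long tail;      /* next free slot for the driver */
	unsigned long watermark; /* descriptor carrying the interrupt flag */
	bool has_flag;
	unsigned long interrupts;
};

/* Driver side: queue one packet, then move the flag ~3/4 of the way in. */
static void txq_enqueue(struct txq *q)
{
	q->tail++;
	unsigned long pending = q->tail - q->head;
	q->watermark = q->head + (pending * 3) / 4;
	q->has_flag = true;
}

/* Hardware side: send one descriptor, interrupting if it is flagged. */
static bool txq_transmit_one(struct txq *q)
{
	assert(q->head <= q->tail);
	if (q->head == q->tail)
		return false;            /* queue empty */
	if (q->has_flag && q->head == q->watermark) {
		q->interrupts++;
		q->has_flag = false;     /* flag consumed with its descriptor */
	}
	q->head++;
	return true;
}
```

With 8 packets queued and one new packet enqueued per packet sent, the flag always sits 6 descriptors ahead of the hardware and no interrupt fires. Once the producer stops, the queue drains past the flag and exactly one interrupt is raised.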

--linas

^ permalink raw reply

* Re: Linux hanging on Xilinx SystemACE
From: Grant Likely @ 2006-08-16 23:28 UTC (permalink / raw)
  To: Jeff Angielski; +Cc: linuxppc-embedded
In-Reply-To: <1155766078.10357.16.camel@sumo-jaa>

On 8/16/06, Jeff Angielski <jangiels@speakeasy.net> wrote:
> And like somebody else mentioned, if you are really going to use this
> for an embedded system, you are going to want to rethink your
> partitioning scheme.
>
> Maybe something like:
>
> p1 fat12 - kernel and binary image
> p2 ext2 - read only rootfs
> p3 ext3 - non volatile, slow rate data rootfs
> p4 tmpfs - volatile, high rate data rootfs

Or, if you have enough ram (and a small enough rootfs footprint); put
the rootfs into an initramfs and leave the CF alone entirely after
boot.  On small systems, I store all config parameters in a flat file
on the fat partition, and only write it out when it needs to save a
new configuration.  That way the entire system consists of three
files; a kernel, a rootfs image and a config file.  Makes managing
updates very easy.  :)

g.

-- 
Grant Likely, B.Sc. P.Eng.
Secret Lab Technologies Ltd.
grant.likely@secretlab.ca
(403) 399-0195

^ permalink raw reply

* Re: [PATCH 1/2]: powerpc/cell spidernet bottom half
From: Linas Vepstas @ 2006-08-16 23:30 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: akpm, jeff, netdev, jklewis, linux-kernel, linuxppc-dev,
	Jens.Osterkamp, David Miller
In-Reply-To: <200608170016.47072.arnd@arndb.de>

On Thu, Aug 17, 2006 at 12:16:46AM +0200, Arnd Bergmann wrote:
> Am Wednesday 16 August 2006 23:32 schrieb David Miller:
> > Can spidernet be told these kinds of parameters?  "N packets or
> > X usecs"?
> 
> It can not do exactly this but probably we can get close to it by

Why would you want to do this? It seems like a cruddier strategy 
than what we can already do  (which is to never get a transmit
interrupt, as long as the kernel can shove data into the device fast
enough to keep the queue from going empty.)  The whole *point* of a 
low-watermark interrupt is to never have to actually get the interrupt, 
if the rest of the system is on its toes and is supplying data fast
enough.

--linas

^ permalink raw reply

* Re: [PATCH 1/2]: powerpc/cell spidernet bottom half
From: David Miller @ 2006-08-16 23:32 UTC (permalink / raw)
  To: linas
  Cc: akpm, arnd, jeff, netdev, jklewis, linux-kernel, linuxppc-dev,
	Jens.Osterkamp
In-Reply-To: <20060816233028.GO20551@austin.ibm.com>

From: linas@austin.ibm.com (Linas Vepstas)
Date: Wed, 16 Aug 2006 18:30:28 -0500

> Why would you want to do this? It seems like a cruddier strategy 
> than what we can already do  (which is to never get a transmit
> interrupt, as long as the kernel can shove data into the device fast
> enough to keep the queue from going empty.)  The whole *point* of a 
> low-watermark interrupt is to never have to actually get the interrupt, 
> if the rest of the system is on its toes and is supplying data fast
> enough.

As long as TX packets get freed within a certain latency
boundary, this kind of scheme should be fine.

^ permalink raw reply

* Re: [RFC PATCH 1/4] powerpc 2.6.16-rt17: to build on powerpc w/ RT
From: Benjamin Herrenschmidt @ 2006-08-16 23:38 UTC (permalink / raw)
  To: john stultz; +Cc: linux-kernel, linuxppc-dev, Paul Mackerras, mingo, tglx
In-Reply-To: <1155318983.5337.2.camel@localhost.localdomain>


> You might take a peek at the patch set here:
> http://sr71.net/~jstultz/tod/ for a somewhat rough powerpc conversion to
> CONFIG_GENERIC_TIME.

Afaik, as-is, this patch will remove updating of the various bits used
by the vDSO for userland gettimeofday without actually removing the vdso
itself. Thus, with a recent glibc, it will break gettimeofday,
clock_gettime, .... Pretty bad :)

Ben.

^ permalink raw reply

* Re: [PATCH 2/4]: powerpc/cell spidernet low watermark patch.
From: Benjamin Herrenschmidt @ 2006-08-16 23:43 UTC (permalink / raw)
  To: Linas Vepstas
  Cc: Arnd Bergmann, netdev, James K Lewis, linux-kernel, linuxppc-dev,
	Jens Osterkamp
In-Reply-To: <20060811170813.GJ10638@austin.ibm.com>

On Fri, 2006-08-11 at 12:08 -0500, Linas Vepstas wrote:
> 
> Implement basic low-watermark support for the transmit queue.
> 
> The basic idea of a low-watermark interrupt is as follows.
> The device driver queues up a bunch of packets for the hardware
> to transmit, and then kicks the hardware to get it started.
> As the hardware drains the queue of pending, untransmitted 
> packets, the device driver will want to know when the queue
> is almost empty, so that it can queue some more packets.
> 
> This is accomplished by setting the DESCR_TXDESFLG flag in
> one of the packets. When the hardware sees this flag, it will 
> interrupt the device driver. Because this flag is on a fixed
> packet, rather than at a fixed location in the queue, the
> code below needs to move the flag as more packets are
> queued up. This implementation attempts to keep the flag
> at about 3/4 of the way into the queue.
> 
> This patch boosts driver performance from about 
> 300-400Mbps for 1500 byte packets, to about 710-740Mbps.

Sounds good (without actually looking at the code though :), that was a
long required improvement to that driver. Also, we should probably look
into using NAPI polling for tx completion queue as well, no ?

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 1/2]: powerpc/cell spidernet bottom half
From: Linas Vepstas @ 2006-08-16 23:47 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: akpm, jeff, netdev, jklewis, linux-kernel, linuxppc-dev,
	Jens.Osterkamp, David Miller
In-Reply-To: <200608170103.21097.arnd@arndb.de>

On Thu, Aug 17, 2006 at 01:03:20AM +0200, Arnd Bergmann wrote:
> 
> Could well be related to latencies when going to the remote
> node for descriptor DMAs. Have you tried whether hch's NUMA
> patch or numactl makes a difference here?

No. I guess I should try.

> > > sounds like the right approach to simplify the code.
> >
> > It's not a big driver. 'wc' says it's 2.3K lines, which
> > is 1/3 or 1/5 the size of tg3.c or the e1000*c files.
> 
> Right, I was thinking of removing a lock or two, not
> throwing out half of the driver ;-)

There are only four lock points, grand total:
-- One on the receive side,
-- one to protect the transmit head pointer, 
-- one to protect the transmit tail pointer, 
-- one to protect the location of the transmit low watermark.

The last three share the same lock. I tried using distinct
locks, but this worsened things, probably due to cache-line 
thrashing. I tried removing the head pointer lock, but this
failed. I don't know why, and was surprised by this. I thought
hard_start_xmit() was serialized.

--linas

^ permalink raw reply

* Re: [PATCH] fix gettimeofday vs. update_gtod race
From: Benjamin Herrenschmidt @ 2006-08-16 23:48 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: linuxppc-dev, Paul Mackerras
In-Reply-To: <20060811204105.GK3233@localdomain>

On Fri, 2006-08-11 at 15:41 -0500, Nathan Lynch wrote:

> +	/* Sampling the time base must be done after loading
> +	 * do_gtod.varp in order to avoid racing with update_gtod.
> +	 */
> +	rmb();
> +	tb_ticks = get_tb() - temp_varp->tb_orig_stamp;

The barrier isn't necessary and the race not completely closed imho... I
need to think about it a bit more closely but what about instead just
check if tb_ticks goes negative, and if yes, just do get_tb() again ?
That might be faster than having a sync in there and should still be
correct.

>  	temp_tb_to_xs = temp_varp->tb_to_xs;
>  	temp_stamp_xsec = temp_varp->stamp_xsec;
>  	xsec = temp_stamp_xsec + mulhdu(tb_ticks, temp_tb_to_xs);
> @@ -464,7 +469,7 @@ void do_gettimeofday(struct timeval *tv)
>  		tv->tv_usec = usec;
>  		return;
>  	}
> -	__do_gettimeofday(tv, get_tb());
> +	__do_gettimeofday(tv);
>  }
>  
>  EXPORT_SYMBOL(do_gettimeofday);

^ permalink raw reply

* Re: [PATCH] no-execute -- please test
From: Benjamin Herrenschmidt @ 2006-08-16 23:55 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Albert Cahalan, linuxppc-dev, debian-powerpc
In-Reply-To: <17633.2179.582261.162544@cargo.ozlabs.ibm.com>


> We have a bit per page that says if the page is icache dirty or not.
> On machines with no-execute support, we already avoid flushing the
> page until some process first tries to execute from it.  If we
> extended that to this scheme, when we made a segment executable, we
> would have to find and flush all icache-dirty pages in the segment (up
> to 65536 pages).  We wouldn't want to do that every time we made a
> segment executable - it would need to be optimized (e.g. keep a count
> per segment of icache-dirty pages in the segment).

Note that we need to change the icache flush mechanism anyway as it's
always been racy on ppc32 SMP (though very few people noticed so far :)
and ppc64 SMP with < POWER3 CPUs (without N bit).
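Paul's idea of keeping a per-segment count of icache-dirty pages can be sketched as a small data structure: one dirty bit per page plus a counter per segment, so that making a clean segment executable costs nothing. The sizes (256MB segments of 4K pages, i.e. the 65536 pages mentioned above), the names, and the optional `flush_page` callback are assumptions for illustration, not kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGES_PER_SEG 65536u  /* 256MB segment / 4K pages */

struct seg_icache {
	uint8_t dirty[PAGES_PER_SEG / 8]; /* one icache-dirty bit per page */
	unsigned int ndirty;              /* number of bits set in dirty[] */
};

static void seg_init(struct seg_icache *s)
{
	memset(s, 0, sizeof(*s));
}

/* A page was written: mark it icache-dirty and keep the count exact. */
static void seg_mark_dirty(struct seg_icache *s, unsigned int page)
{
	assert(page < PAGES_PER_SEG);
	uint8_t bit = (uint8_t)(1u << (page % 8));
	if (!(s->dirty[page / 8] & bit)) {
		s->dirty[page / 8] |= bit;
		s->ndirty++;
	}
}

/* The segment becomes executable: flush only the dirty pages, and skip
 * the scan entirely when the count is zero. Returns pages flushed. */
static unsigned int seg_make_exec(struct seg_icache *s,
				  void (*flush_page)(unsigned int page))
{
	unsigned int flushed = 0;

	if (s->ndirty == 0)
		return 0;             /* common fast path */
	for (unsigned int i = 0; i < PAGES_PER_SEG && s->ndirty; i++) {
		uint8_t bit = (uint8_t)(1u << (i % 8));
		if (s->dirty[i / 8] & bit) {
			if (flush_page)
				flush_page(i);
			s->dirty[i / 8] &= (uint8_t)~bit;
			s->ndirty--;
			flushed++;
		}
	}
	return flushed;
}
```

The scan stops as soon as the counter reaches zero, so a segment with a few dirty pages near the start is cheap as well.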

Ben.

^ permalink raw reply

* Re: [RFC PATCH 1/4] powerpc 2.6.16-rt17: to build on powerpc w/ RT
From: john stultz @ 2006-08-17  0:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linux-kernel, linuxppc-dev, Paul Mackerras, mingo, tglx
In-Reply-To: <1155771487.11312.114.camel@localhost.localdomain>

On Thu, 2006-08-17 at 01:38 +0200, Benjamin Herrenschmidt wrote:
> > You might take a peek at the patch set here:
> > http://sr71.net/~jstultz/tod/ for a somewhat rough powerpc conversion to
> > CONFIG_GENERIC_TIME.
> 
> Afaik, as-is, this patch will remove updating of the various bits used
> by the vDSO for userland gettimeofday without actually removing the vdso
> itself. Thus, with a recent glibc, it will break gettimeofday,
> clock_gettime, .... Pretty bad :)

Hey Ben,
	I appreciate your looking over my patch. You are correct, the
conversion is a bit rough and I've not yet been able to work on the
powerpc vDSO, although I'd like to get it working so any help or
suggestions would be appreciated (is there a reason the vDSO is written
in ASM?).

If you have any other concerns w/ that patch, or the generic timekeeping
code, please let me know and I'll do what I can to address them.

thanks
-john

^ permalink raw reply

* Re: Trouble with 85xx CDS cascade irq (i8259)
From: Benjamin Herrenschmidt @ 2006-08-17  0:04 UTC (permalink / raw)
  To: Andy Fleming; +Cc: linuxppc-dev@ozlabs.org list
In-Reply-To: <7B853E2B-4156-4601-A272-9A9098310240@freescale.com>

> I'm currently working on getting the 85xx platform working with the  
> new IRQ code.  Specifically, I'm working on the 8555 CDS right now.   
> The 85xx CDS consists of a PCI carrier card (with the 8555 processor,  
> serial, and networking interfaces), placed in one of 4 PCI slots on a  
> custom motherboard (the Arcadia).  The PCI interrupts are routed into  
> the carrier card through the standard PCI interrupt pins, which  
> requires some mucking around with interrupt assignments depending on  
> which slot the card is in.  There is also a VIA chipset on the  
> Arcadia motherboard, which provides IDE (among other things), and the  
> i8259 PIC in there is routed in through PCI interrupt A.

 <bell rings>  VIA chipsets are known to do very strange things with
interrupt routing depending on magic bits in some PCI config space
registers </bell rings>. No specifics in mind right now though...

One of the issues with some of those legacy chipsets is that they tend to
route IDE IRQs to 14 and 15 regardless of the PCI interrupt assigned to
the controller (that is, they don't completely follow native mode).
You might need a specific hack to set the right IRQs for your IDE
channels, as the driver might not have a clue.

As you can see, init_hwif_via82cxxx() already has ugly hacks. A cleaner
way to do that would be to use the hook I added for such things:
pci_get_legacy_ide_irq(). It's currently used by the amd74xx driver but
via82cxxx could be fixed too. On powerpc, it will call into a ppc_md
hook that you can use to "fix up" your IDE interrupts manually from the
platform code.
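The hook pattern described here looks roughly like this: a generic helper defers to an optional per-platform function pointer and otherwise falls back to the legacy 14/15 mapping. This is a self-contained user-space model; `struct machdep`, `struct fake_pci_dev` and `cds_legacy_ide_irq()` are hypothetical, and the real kernel signatures should be checked in the source tree.

```c
#include <assert.h>
#include <stddef.h>

struct fake_pci_dev { int devfn; };  /* hypothetical stand-in for pci_dev */

/* Hypothetical machine-description struct with an optional hook. */
struct machdep {
	int (*pci_get_legacy_ide_irq)(struct fake_pci_dev *dev, int channel);
};

static struct machdep md;

/* Generic helper: ask the platform first, else use legacy IRQs 14/15. */
static int get_legacy_ide_irq(struct fake_pci_dev *dev, int channel)
{
	assert(channel == 0 || channel == 1);
	if (md.pci_get_legacy_ide_irq)
		return md.pci_get_legacy_ide_irq(dev, channel);
	return channel ? 15 : 14;
}

/* Example platform fixup mirroring the quoted comment: only the primary
 * channel works, so force the primary IDE interrupt. */
static int cds_legacy_ide_irq(struct fake_pci_dev *dev, int channel)
{
	(void)dev;
	(void)channel;
	return 14;
}
```

The IDE driver then only ever calls the generic helper, and board quirks stay in platform code.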

 .../...

> At this point, I have determined that I am caught in a never-ending  
> cycle of interrupts on the i8259 cascade (that is, the mpic interrupt  
> that the i8259 is cascaded through).  i8259_irq() returns that it's  
> irq 7.

7 doesn't seem like a proper i8259 irq in that setup ... 
>
>                  /*
>                   * Since only primary interface works, force the
>                   * IDE function to standard primary IDE interrupt
>                   * w/ 8259 offset
>                   */
>                  dev->irq = 14;
>                  pci_write_config_byte(dev, PCI_INTERRUPT_LINE, dev->irq);
>                  pci_dev_put(dev);
>          }

The above looks ok, but make sure that's really what the driver is
using ... Since it's a legacy irq, it should indeed not need any virtual
mapping (irqs below 16 are reserved for i8259)

So overall, sounds good to me; there is something lurking around but I
can't see it. (BTW, why do you need to fix up all those IRQs? Can't you
just have a proper interrupt-map?)

Ben.

^ permalink raw reply

* Re: [PATCH] fix gettimeofday vs. update_gtod race
From: Nathan Lynch @ 2006-08-17  0:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Paul Mackerras
In-Reply-To: <1155772134.11312.119.camel@localhost.localdomain>

Benjamin Herrenschmidt wrote:
> On Fri, 2006-08-11 at 15:41 -0500, Nathan Lynch wrote:
> 
> > +	/* Sampling the time base must be done after loading
> > +	 * do_gtod.varp in order to avoid racing with update_gtod.
> > +	 */
> > +	rmb();
> > +	tb_ticks = get_tb() - temp_varp->tb_orig_stamp;
> 
> The barrier isn't necessary

No?  I didn't find anything about mftb having synchronizing
behavior.  How should we ensure that temp_varp is assigned before
reading the timebase?

Surely at least a compiler barrier is needed?

> and the race not completely closed imho... 

How so?  I could've missed something, but I've hammered the patch
pretty hard, fwiw.

> I need to think about it a bit more closely but what about instead
> just check if tb_ticks goes negative, and if yes, just do get_tb()
> again ?  That might be faster than having a sync in there and should
> still be correct.

I did try something like that but found that a loop (i.e. multiple
get_tb's to "catch up") was necessary.
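Ben's alternative (retry if the delta comes out negative) and Nathan's finding that a loop is needed can be combined into the sketch below. `read_tb()` is a hypothetical stand-in for `get_tb()` that replays canned values so the logic runs in user space, and the barrier is the GCC/Clang `asm volatile("" ::: "memory")` compiler-barrier idiom. It does not settle the hardware ordering question (sync vs isync before mftb) discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for get_tb(): replays a fixed sequence so the
 * retry logic can be exercised outside the kernel. */
static const uint64_t fake_tb[] = { 990, 995, 1003 };
static unsigned int fake_tb_idx;

static uint64_t read_tb(void)
{
	uint64_t v = fake_tb[fake_tb_idx];
	if (fake_tb_idx + 1 < sizeof(fake_tb) / sizeof(fake_tb[0]))
		fake_tb_idx++;
	return v;
}

/* Ticks since tb_orig_stamp, re-reading the timebase while the sample
 * appears to predate the stamp (i.e. the delta would be negative). */
static uint64_t ticks_since(uint64_t tb_orig_stamp)
{
	int64_t delta;

	__asm__ __volatile__("" ::: "memory"); /* compiler barrier only */
	do {
		delta = (int64_t)(read_tb() - tb_orig_stamp);
	} while (delta < 0);
	return (uint64_t)delta;
}
```

With a stamp of 1000, the first two samples (990, 995) are rejected and the third yields 3 ticks. The loop relies on the timebase catching up with the stamp quickly, which is the assumption behind the "catch up" approach.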

^ permalink raw reply

* Re: [PATCH 1/2]: powerpc/cell spidernet bottom half
From: Linas Vepstas @ 2006-08-17  0:23 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, arnd, jeff, netdev, jklewis, linux-kernel, linuxppc-dev,
	Jens.Osterkamp
In-Reply-To: <20060816.163252.64000941.davem@davemloft.net>

On Wed, Aug 16, 2006 at 04:32:52PM -0700, David Miller wrote:
> From: linas@austin.ibm.com (Linas Vepstas)
> > Why would you want to do this? It seems like a cruddier strategy 
> > than what we can already do  (which is to never get a transmit
> > interrupt, as long as the kernel can shove data into the device fast
> > enough to keep the queue from going empty.)  The whole *point* of a 
> > low-watermark interrupt is to never have to actually get the interrupt, 
> > if the rest of the system is on its toes and is supplying data fast
> > enough.
> 
> As long as TX packets get freed within a certain latency
> boundary, this kind of scheme should be fine.

I just had some fun making sure I wasn't making a liar out of myself.
So far, I'm good.

Did

echo 768111 > /proc/sys/net/core/wmem_max
echo 768111 > /proc/sys/net/core/wmem_default

to make sure that the app never blocked on a full socket.

(If the socket buffer is small, then the app blocks,
the transmit queue drains, and we do get an interrupt --
about 1K/sec, which is what is expected.)

Ran 'vmstat 10' to watch the interrupts in real time. 
Ran netperf, got 904 Mbits/sec, and *no* interrupts. 
Yahoo!

Ran oprofile to see where the time went:

CPU: ppc64 Cell Broadband Engine, speed 3200 MHz (estimated)
Counted CYCLES events (Processor Cycles) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               app name                 symbol name
13748742 77.6620  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .cbe_idle
936172    5.2881  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .__copy_tofrom_user
569353    3.2161  spidernet.ko             spidernet                .spider_net_xmit
450826    2.5466  spidernet.ko             spidernet                .spider_net_release_tx_chain
220374    1.2448  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       ._spin_unlock_irqrestore
112432    0.6351  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       ._spin_lock
91328     0.5159  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .__qdisc_run
84804     0.4790  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .packet_sendmsg
76167     0.4302  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .kfree
74321     0.4198  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .sock_alloc_send_skb
65323     0.3690  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .kmem_cache_free
60334     0.3408  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       ._read_lock
60071     0.3393  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .dev_queue_xmit
56900     0.3214  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .kmem_cache_alloc_node
55281     0.3123  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .sock_wfree
51242     0.2894  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .dev_get_by_index
50438     0.2849  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .compat_sys_socketcall
49247     0.2782  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .fput
48589     0.2745  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .sync_buffer
46055     0.2601  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       system_call_common
40607     0.2294  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .local_bh_enable
40273     0.2275  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .__might_sleep
38757     0.2189  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .fget_light
38219     0.2159  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .__kfree_skb
36804     0.2079  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .sock_def_write_space
36443     0.2059  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .skb_release_data
32174     0.1817  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .sys_sendto
31828     0.1798  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .sock_sendmsg
30676     0.1733  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .pfifo_fast_dequeue
29607     0.1672  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .add_event_entry
25870     0.1461  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       syscall_exit
25329     0.1431  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .__alloc_skb
23885     0.1349  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .__do_softirq
22046     0.1245  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .kmem_find_general_cachep
21610     0.1221  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .__dev_get_by_index
21059     0.1190  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .dma_map_single
21044     0.1189  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       ._spin_lock_irqsave
20105     0.1136  vmlinux-2.6.18-rc2       vmlinux-2.6.18-rc2       .memset

.__copy_tofrom_user          -- ouch spider does not currently do scatter-gather
.spider_net_xmit             -- hmmm ?? why is it this large ??
.spider_net_release_tx_chain --  ?? a lot of time being spent cleaning up tx queues.
._spin_unlock_irqrestore     -- hmm ? why so high? lock contention?

I presume the rest is normal.

^ permalink raw reply

* Re: [RFC PATCH 1/4] powerpc 2.6.16-rt17: to build on powerpc w/ RT
From: Benjamin Herrenschmidt @ 2006-08-17  0:26 UTC (permalink / raw)
  To: john stultz; +Cc: linux-kernel, linuxppc-dev, Paul Mackerras, mingo, tglx
In-Reply-To: <1155772859.15360.12.camel@localhost.localdomain>


> Hey Ben,
> 	I appreciate your looking over my patch. You are correct, the
> conversion is a bit rough and I've not yet been able to work on the
> powerpc vDSO, although I'd like to get it working so any help or
> suggestions would be appreciated (is there a reason the vDSO is written
> in ASM?).
> 
> If you have any other concerns w/ that patch, or the generic timekeeping
> code, please let me know and I'll do what I can to address them.

Well, I've been wanting to look at your stuff and possibly do the
conversion for some time, provided we don't lose performance ... Our
current implementation is very optimized to avoid even memory barriers
in most cases and I doubt we'll be able to be as fine tuned using your
generic code, thus it's a tradeoff decision that we have to make. But
then, I need to look into the details before making any final
statement :)

As for why the vDSO is in assembly, well... because it's kewl ? :) More
seriously, because it's much simpler that way (and it's hand
optimized in a couple of places, though that would probably benefit from
going through a proper scheduling analysis). The vDSO code has "special"
calling conventions (like the need to tweak cr.so, the non-use of the
TOC, the lack of procedure descriptors, symbols are offsets to the
functions, etc...) that make it awkward to write in C.

Ben

^ permalink raw reply

* Re: [PATCH] fix gettimeofday vs. update_gtod race
From: Benjamin Herrenschmidt @ 2006-08-17  0:27 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: linuxppc-dev, Paul Mackerras
In-Reply-To: <20060817001807.GB354@localdomain>

On Wed, 2006-08-16 at 19:18 -0500, Nathan Lynch wrote:

> No?  I didn't find anything about mftb having synchronizing
> behavior.  How should we ensure that temp_varp is assigned before
> reading the timebase?

I think an isync would be enough.

> Surely at least a compiler barrier is needed?

Yeah.

> > and the race not completely closed imho... 
> 
> How so?  I could've missed something, but I've hammered the patch
> pretty hard, fwiw.

Nah, you are right, but you may be using too big a hammer.

> > I need to think about it a bit more closely but what about instead
> > just check if tb_ticks goes negative, and if yes, just do get_tb()
> > again ?  That might be faster than having a sync in there and should
> > still be correct.
> 
> I did try something like that but found that a loop (i.e. multiple
> get_tb's to "catch up") was necessary.

Hrm... even with an isync ?

Ben.

^ permalink raw reply

* Re: [PATCH] powerpc: Make RTAS console init generic
From: Michael Ellerman @ 2006-08-17  0:44 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, paulus, anton
In-Reply-To: <20060816152303.7AF4967B5A@ozlabs.org>

On Wed, 2006-08-16 at 10:22 -0500, Michael Neuling wrote:
> In message <1155705695.12715.7.camel@localhost.localdomain> you wrote:
> > 
> > On Tue, 2006-08-15 at 23:00 -0500, Michael Neuling wrote:
> > > The RTAS console doesn't have to be Cell specific.  If we have both
> > > the put and get char RTAS functions, init the rtas console.
> > >
> > > Index: linux-2.6-ozlabs/arch/powerpc/kernel/rtas.c
> > > ===================================================================
> > > --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/rtas.c
> > > +++ linux-2.6-ozlabs/arch/powerpc/kernel/rtas.c
> > > @@ -910,6 +910,11 @@ int __init early_init_dt_scan_rtas(unsig
> > >  	basep = of_get_flat_dt_prop(node, "get-term-char", NULL);
> > >  	if (basep)
> > >  		rtas_getchar_token =3D *basep;
> > > +
> > > +	if (rtas_putchar_token != RTAS_UNKNOWN_SERVICE &&
> > > +	    rtas_getchar_token != RTAS_UNKNOWN_SERVICE)
> > > +		udbg_init_rtas_console();
> > > +
> > >  #endif
> > >
> > >  	/* break now */
> > > Index: linux-2.6-ozlabs/arch/powerpc/platforms/cell/setup.c
> > > ===================================================================
> > > --- linux-2.6-ozlabs.orig/arch/powerpc/platforms/cell/setup.c
> > > +++ linux-2.6-ozlabs/arch/powerpc/platforms/cell/setup.c
> > > @@ -150,10 +150,6 @@ static int __init cell_probe(void)
> > >  	    !of_flat_dt_is_compatible(root, "IBM,CPBW-1.0"))
> > >  		return 0;
> > >
> > > -#ifdef CONFIG_UDBG_RTAS_CONSOLE
> > > -	udbg_init_rtas_console();
> > > -#endif
> > > -
> > 
> > I'd like to see it still guarded by UDBG_RTAS_CONSOLE, otherwise there's
> > no way to select a different type of early console on a machine which
> > has those tokens in the device tree.
> 
> Agreed but that section in rtas.c is already guarded by
> UDBG_RTAS_CONSOLE.  After applying the patch, it looks like: 
> 
> #ifdef CONFIG_UDBG_RTAS_CONSOLE
> 	basep = of_get_flat_dt_prop(node, "put-term-char", NULL);
> 	if (basep)
> 		rtas_putchar_token = *basep;
> 
> 	basep = of_get_flat_dt_prop(node, "get-term-char", NULL);
> 	if (basep)
> 		rtas_getchar_token = *basep;
> 
> 	if (rtas_putchar_token != RTAS_UNKNOWN_SERVICE &&
> 	    rtas_getchar_token != RTAS_UNKNOWN_SERVICE)
> 		udbg_init_rtas_console();
> 
> #endif

Ah sorry, I just looked at the patch.

cheers

-- 
Michael Ellerman
IBM OzLabs

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 191 bytes --]
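A user-space sketch of the guard logic in the patch, assuming the device-tree property lookups have already been done: the early console is only set up when the Kconfig option is enabled and both the put-term-char and get-term-char tokens were found. `struct rtas_node_props`, `struct rtas_console_tokens` and `scan_rtas_console()` are hypothetical, and `RTAS_UNKNOWN_SERVICE` is defined locally as -1 for the sketch. The patched kernel code calls udbg_init_rtas_console() instead of returning a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pretend the Kconfig option is enabled for this sketch. */
#define CONFIG_UDBG_RTAS_CONSOLE 1

/* Sketch-local "token not present" sentinel. */
#define RTAS_UNKNOWN_SERVICE (-1)

/* Hypothetical stand-in for the flat device-tree "rtas" node: each pointer
 * is NULL when the property is absent (endianness is ignored here). */
struct rtas_node_props {
    const uint32_t *put_term_char;
    const uint32_t *get_term_char;
};

struct rtas_console_tokens {
    int putchar_token;
    int getchar_token;
};

/* Returns true when the early RTAS console should be initialised. */
static bool scan_rtas_console(const struct rtas_node_props *node,
                              struct rtas_console_tokens *t)
{
    assert(node != NULL && t != NULL);
    t->putchar_token = RTAS_UNKNOWN_SERVICE;
    t->getchar_token = RTAS_UNKNOWN_SERVICE;

#ifdef CONFIG_UDBG_RTAS_CONSOLE
    if (node->put_term_char)
        t->putchar_token = (int)*node->put_term_char;
    if (node->get_term_char)
        t->getchar_token = (int)*node->get_term_char;

    /* Only use the console when both the put and get tokens exist. */
    return t->putchar_token != RTAS_UNKNOWN_SERVICE &&
           t->getchar_token != RTAS_UNKNOWN_SERVICE;
#else
    return false;   /* option disabled: another early console can be chosen */
#endif
}
```

Removing the CONFIG_UDBG_RTAS_CONSOLE define makes the function always return false, which is the behaviour Michael Ellerman wanted preserved so that a different early console can still be selected.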

^ permalink raw reply

* Re: Linux hanging on Xilinx SystemACE
From: Jeff Angielski @ 2006-08-17  0:51 UTC (permalink / raw)
  To: Grant Likely; +Cc: linuxppc-embedded
In-Reply-To: <528646bc0608161448y26e13398v3b818ed5f1124295@mail.gmail.com>

On Wed, 2006-08-16 at 15:48 -0600, Grant Likely wrote:
> On 8/16/06, Clint Thomas <cthomas@soneticom.com> wrote:
> >
> >
> > Hey,
> >
> > Using the powerpc development tree of Linux 2.4, I am trying to boot my
> > system from CompactFlash using Xilinx SystemACE. My compact flash card has
> > two partitions, a 16MB FAT16 that holds the combination FPGA image / Linux
> > Kernel ELF file, and an Ext2 partition that holds the root file system. The
> > system starts the boot process, uncompresses the Linux kernel and begins
> > loading drivers. Part way into this process, it conducts a partition check
> > of the drive being reported to it by SystemACE, however, it hangs at that
> > point. No kernel panic, no error message, it simply hangs. Here is the
> > output at that point...
> >
> > Partition check:
> >  xsysacea:
> >
> > What I am trying to find out is whether this problem has been seen (and fixed)
> > in the past, or whether I formatted the CF card incorrectly.

I forgot to mention that we used to see this problem when the identify
command that is sent during initialization fails.  The driver is written
in such a way that if any of this fails, the system hangs because it
sits in a polling loop waiting for the correct response.  There are no
timeouts... :(  In our case we saw this error because we forgot
to put a CF card into the system [usually during development with an NFS
rootfs].

It is fairly easy to add printk() calls to the driver's init code to find
out which step is stuck in the polling loop.
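A sketch of what a bounded version of such a polling loop could look like, assuming the wait is limited by a poll count; a real driver would more likely bound it by time (for example with jiffies) and return -ETIMEDOUT. `poll_with_limit()` and the simulated `struct fake_dev` are hypothetical and not part of the SystemACE driver. The step name in the message plays the role of the printk() debugging suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical "is the device ready?" check, e.g. a status register read. */
typedef bool (*ready_fn)(void *ctx);

/* Poll until ready() succeeds or max_polls attempts are used up.
 * Returns 0 on success and -1 on timeout instead of spinning forever. */
static int poll_with_limit(const char *step, ready_fn ready, void *ctx,
                           unsigned long max_polls)
{
    assert(step != NULL && ready != NULL);
    for (unsigned long i = 0; i < max_polls; i++) {
        if (ready(ctx))
            return 0;
    }
    /* In the driver this would be a printk(); naming the step shows
     * which part of initialisation stalled. */
    fprintf(stderr, "init step '%s' timed out after %lu polls\n", step, max_polls);
    return -1;
}

/* Hypothetical simulated device that becomes ready after ready_after polls. */
struct fake_dev {
    unsigned polls;
    unsigned ready_after;
};

static bool fake_dev_ready(void *ctx)
{
    struct fake_dev *d = ctx;
    return ++d->polls >= d->ready_after;
}
```

With no card present, the "identify" step would now report a timeout and return an error instead of hanging silently at the partition check.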

> Checking partitions is a user-space activity (fsck).  Remove it from
> your init scripts.  Besides, unless you're using a microdrive, your ext2
> rootfs should be mounted read-only (because flash will wear out after
> too many writes), which greatly reduces the need for fsck.

The partition check he is referring to is part of the block device driver
initialization.  It is not fsck.  If only he were lucky enough to be that
far in the startup sequence... :)



Jeff Angielski
The PTR Group

^ permalink raw reply

* Re: [RFC PATCH 1/4] powerpc 2.6.16-rt17: to build on powerpc w/ RT
From: john stultz @ 2006-08-17  0:56 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linux-kernel, linuxppc-dev, Paul Mackerras, mingo, tglx
In-Reply-To: <1155774368.11312.135.camel@localhost.localdomain>

On Thu, 2006-08-17 at 02:26 +0200, Benjamin Herrenschmidt wrote:
> > Hey Ben,
> > 	I appreciate your looking over my patch. You are correct, the
> > conversion is a bit rough and I've not yet been able to work on the
> > powerpc vDSO, although I'd like to get it working so any help or
> > suggestions would be appreciated (is there a reason the vDSO is written
> > in ASM?).
> > 
> > If you have any other concerns w/ that patch, or the generic timekeeping
> > code, please let me know and I'll do what I can to address them.
> 
> Well, I've been wanting to look at your stuff and possibly do the
> conversion for some time, provided we don't lose performance... Our
> current implementation is very optimized to avoid even memory barriers
> in most cases and I doubt we'll be able to be as fine-tuned using your
> generic code, thus it's a tradeoff decision that we have to make. But
> then, I need to look into the details before making any final
> statement :)

Ok, although do let me know if you see places where the generic code
could use any of the optimizations used in powerpc.

thanks
-john
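For comparison, a minimal single-writer sequence-count (seqlock-style) writer and reader in C11 atomics, the general pattern behind the kernel's seqlocks that generic timekeeping used to protect its state. The struct and field names are hypothetical. The point is the acquire load and acquire fence that every read pays for; on weakly ordered CPUs such as powerpc these typically become real barrier instructions, which is the kind of cost Ben says the powerpc code mostly avoids.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, trimmed-down timekeeping state guarded by a sequence count. */
struct time_snapshot {
    atomic_uint seq;              /* odd while an update is in progress */
    _Atomic uint64_t stamp_tb;    /* timebase at the last update */
    _Atomic uint64_t stamp_xsec;  /* wall time at the last update */
};

static void snapshot_init(struct time_snapshot *s)
{
    atomic_init(&s->seq, 0u);
    atomic_init(&s->stamp_tb, 0);
    atomic_init(&s->stamp_xsec, 0);
}

/* Single writer assumed (a lock would serialise writers in real code). */
static void snapshot_write(struct time_snapshot *s, uint64_t tb, uint64_t xsec)
{
    unsigned seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

    assert((seq & 1u) == 0);  /* no other update may be in progress */
    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
    /* A reader that sees the new data will also see the odd count. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->stamp_tb, tb, memory_order_relaxed);
    atomic_store_explicit(&s->stamp_xsec, xsec, memory_order_relaxed);
    /* Publish: data stores are ordered before the even count. */
    atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

static void snapshot_read(struct time_snapshot *s, uint64_t *tb, uint64_t *xsec)
{
    unsigned begin, end;

    do {
        begin = atomic_load_explicit(&s->seq, memory_order_acquire);
        *tb = atomic_load_explicit(&s->stamp_tb, memory_order_relaxed);
        *xsec = atomic_load_explicit(&s->stamp_xsec, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);  /* data loads before re-check */
        end = atomic_load_explicit(&s->seq, memory_order_relaxed);
    } while ((begin & 1u) || begin != end);         /* retry if an update overlapped */
}
```

The powerpc code in this thread instead publishes a pointer to one of two data copies (the temp_varp discussed in the gettimeofday race thread), which is how it can skip most of these barriers on the read side.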

^ permalink raw reply
