PATCH: improved processor config for G3s

linuxppc-dev.lists.ozlabs.org archive mirror

* PATCH: improved processor config for G3s
@ 2000-09-03 13:03 Michel Lanners
  2000-09-03 13:43 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 10+ messages in thread
From: Michel Lanners @ 2000-09-03 13:03 UTC (permalink / raw)
  To: paulus; +Cc: linuxppc-dev

[-- Attachment #1: Type: TEXT/plain, Size: 1566 bytes --]

Hi Paul,

It's patch day today ;-)

Attached is my try at improving the processor configuration for G3/G4
processors in head.S. Instead of leaving some config bits as the
firmware set them, I've tried to set all available features to
their 'optimum' value. The patch is against your 2.4 rsync tree as of today.
This patch, as well as the same thing against a 2.2 source, is also
posted on SourceForge.

Those where I see no problem are:

 * Branch History Table (BHTE), Branch Target ICache (BTIC), Dynamic
   Power Management (DPM) and Store Gathering (SGE) all on,
 * Instruction Cache Throttling Control (ICTC) off.

These were not touched before, but might be worth a discussion:

 * Speculative Cache Access Disable (SPD) cleared, Address Broadcast
   (ABE) enabled

Clearing SPD makes sense since some processor upgrade cards come with an
OF patch that sets this bit (due to ROM problems... we don't care once
Linux is up).
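
The net effect of the settings listed above on HID0 can be sketched in C.
This is only an illustration under assumptions: the bit positions are
assumed from the 2.4-era PowerPC kernel headers and should be checked
against processor.h, and g3_hid0_config() is a hypothetical helper, not a
function in head.S. ICTC is a separate register and is not modelled here.

```c
#include <stdint.h>

/* Assumed HID0 bit positions (verify against your kernel headers). */
#define HID0_DPM  (1u << 20)  /* Dynamic Power Management */
#define HID0_SPD  (1u << 9)   /* Speculative Cache Access Disable */
#define HID0_SGE  (1u << 7)   /* Store Gathering Enable */
#define HID0_BTIC (1u << 5)   /* Branch Target ICache enable */
#define HID0_ABE  (1u << 3)   /* Address Broadcast Enable */
#define HID0_BHTE (1u << 2)   /* Branch History Table Enable */

/* Hypothetical helper: returns the HID0 value the patch would write,
 * given the value left behind by the firmware. */
static uint32_t g3_hid0_config(uint32_t hid0)
{
    hid0 |= HID0_SGE | HID0_ABE | HID0_BHTE | HID0_BTIC; /* the ori  */
    hid0 |= HID0_DPM;                                    /* the oris */
    hid0 &= ~HID0_SPD;                                   /* the andc */
    return hid0;
}
```

Starting from an all-zero register, this yields 0x001000AC; a firmware
value with SPD set comes out with SPD cleared and the other bits as above.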

I've been running these processor settings without a problem for some
time now. If all the processor config in head.S doesn't look clean to
some, I agree, and I propose we change that setup a bit for 2.5,
separating the cache enable stuff from the processor configuration.

Cheers

Michel

-------------------------------------------------------------------------
Michel Lanners                 |  " Read Philosophy.  Study Art.
23, Rue Paul Henkes            |    Ask Questions.  Make Mistakes.
L-1710 Luxembourg              |
email   mlan@cpu.lu            |
http://www.cpu.lu/~mlan        |                     Learn Always. "

[-- Attachment #2: processor.diff --]
[-- Type: TEXT/plain, Size: 934 bytes --]

diff -uNr linux-2.4.paul/arch/ppc/kernel/head.S linux/arch/ppc/kernel/head.S
--- linux-2.4.paul/arch/ppc/kernel/head.S	Sun Aug 27 12:33:12 2000
+++ linux/arch/ppc/kernel/head.S	Sat Sep  2 11:26:12 2000
@@ -1347,13 +1347,17 @@
 4:
 	cror	14,14,18
 	bne	3,6f
-	/* We should add ABE here if we want to use Store Gathering
-	 * and other nifty bridge features
+	/* for G3/G4:
+	 * enable Store Gathering (SGE), Address Brodcast (ABE),
+	 * Branch History Table (BHTE), Branch Target ICache (BTIC)
 	 */
-	ori	r11,r11,HID0_SGE|HID0_BHTE|HID0_BTIC /* for g3/g4, enable */
+	ori	r11,r11,HID0_SGE | HID0_ABE | HID0_BHTE | HID0_BTIC
+	oris	r11,r11,HID0_DPM>>16	/* enable dynamic power mgmt */
+	li	r3,HID0_SPD
+	andc	r11,r11,r3		/* clear SPD: enable speculative */
  	li	r3,0
- 	mtspr	ICTC,r3
-5:	mtspr	HID0,r11		/* superscalar exec & br history tbl */
+ 	mtspr	ICTC,r3			/* Instruction Cache Throttling off */
+5:	mtspr	HID0,r11
 6:	blr

 /*


* Re: PATCH: improved processor config for G3s
  2000-09-03 13:03 PATCH: improved processor config for G3s Michel Lanners
@ 2000-09-03 13:43 ` Benjamin Herrenschmidt
  2000-09-04  9:48   ` Gabriel Paubert
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2000-09-03 13:43 UTC (permalink / raw)
  To: mlan; +Cc: paulus, linuxppc-dev

>
> * Branch History Table (BHTE), Branch Target ICache (BTIC), Dynamic
>   Power Management (DPM) and Store Gathering (SGE) all on,
> * Instruction Cache Throttling Control (ICTC) off.
>
>These were not touched before, but might be worth a discussion:
>
> * Speculative Cache Access Disable (SPD) cleared, Address Broadcast
>   (ABE) enabled
>
>Clearing SPD makes sense since some processor upgrade cards come with an
>OF patch that sets this bit (due to ROM problems... we don't care once
>Linux is up).

There's a bit of history about ABE:

Originally, it was not set. At that time, I tried to turn ON the "Store
Gathering" option of the MPC 106 (grackle) host bridge. I left it a few
weeks in my kernel and began getting reports of problems, mostly with
Adaptec cards.

At that time, some Apple person told me enabling Store Gathering on the
bridge was wrong, they had problems with it (various compatibility
issues), probably due to grackle bugs, but without telling me exactly
what was going on.

Later on, a Moto person told me there was no known grackle bug about
store gathering, but that the CPU ABE was needed for proper operations. I
also suspect that store gathering increases the effect of PCI write
posting, and older versions of the Adaptec drivers were not correctly
taking this into account on some timing critical accesses.

I finally ended up setting ABE in head.S (along with a few others), but
never uncommented the call to grackle_set_stg() in pmac_pci.c. If someone
wants to test whether it behaves properly now (the problems reported at
that time were mostly with Adaptec cards in B&W G3s not initializing
properly), please do; I don't have a Grackle-based machine here any more.

Ben.

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/


* Re: PATCH: improved processor config for G3s
  2000-09-03 13:43 ` Benjamin Herrenschmidt
@ 2000-09-04  9:48   ` Gabriel Paubert
  2000-09-04 10:34     ` Adrian Cox
  2000-09-04 17:51     ` Michel Lanners
  0 siblings, 2 replies; 10+ messages in thread
From: Gabriel Paubert @ 2000-09-04  9:48 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: mlan, paulus, linuxppc-dev

On Sun, 3 Sep 2000, Benjamin Herrenschmidt wrote:

> There's a bit of history about ABE:
>
> Originally, it was not set. At that time, I tried to turn ON the "Store
> Gathering" option of the MPC 106 (grackle) host bridge. I left it a few
> weeks in my kernel and began getting reports of problems, mostly with
> Adaptec cards.
>
> At that time, some Apple person told me enabling Store Gathering on the
> bridge was wrong, they had problems with it (various compatibility
> issues), probably due to grackle bugs, but without telling me exactly
> what was going on.
>
> Later on, a Moto person told me there was no known grackle bug about
> store gathering, but that the CPU ABE was needed for proper operations. I
> also suspect that store gathering increases the effect of PCI write
> posting, and older versions of the Adaptec drivers were not correctly
> taking this into account on some timing critical accesses.

I'd rather take the Moto version. The problem is that store gathering
should be inhibited if there is an eieio instruction between two
successive stores which may be gathered. However, the eieio instruction
is not propagated to the bus by all processors, it is by the 604 and 7400
(and 601 IIRC), not by the 603/603e/750. I can't remember offhand if ABE
enables eieio broadcasts on the 750, but it seems so from what Motorola
claims.

A rule of thumb is the following: fully SMP capable processors broadcast
eieio (and tlbie for that matter), others do not at least by default. On
an UP 750 (SMP 750 are an aberration in any case because of TLB issues),
I'd bet that it is more efficient to let the processor perform store
gathering when it can (an eieio between both stores will prevent it) and
to disable both ABE in the processor and store gathering in the bridge.
This will result in lower processor bus utilization.

	Regards,
	Gabriel


* Re: PATCH: improved processor config for G3s
  2000-09-04  9:48   ` Gabriel Paubert
@ 2000-09-04 10:34     ` Adrian Cox
  2000-09-04 10:54       ` Benjamin Herrenschmidt
  2000-09-04 17:51     ` Michel Lanners
  1 sibling, 1 reply; 10+ messages in thread
From: Adrian Cox @ 2000-09-04 10:34 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: mlan, linuxppc-dev

Gabriel Paubert wrote:

> A rule of thumb is the following: fully SMP capable processors broadcast
> eieio (and tlbie for that matter), others do not at least by default. On
> an UP 750 (SMP 750 are an aberration in any case because of TLB issues),
> I'd bet that it is more efficient to let the processor perform store
> gathering when it can (an eieio between both stores will prevent it) and
> to disable both ABE in the processor and store gathering in the bridge.
> This will result in lower processor bus utilization.

Remember that the processor store gathering is only capable of turning
two 32-bit writes to uncached, nonguarded space into one 64-bit write.
The bridge store gathering converts an arbitrary sequence of sequential
writes into a PCI burst.
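
A toy model may make the processor side concrete. The sketch below counts
bus writes for a stream of 32-bit stores, pairing two adjacent stores into
one 64-bit write unless an eieio sits between them. The pairing rule used
here (the first store must be doubleword aligned and the second must
follow it directly) and all names are assumptions for illustration, not
taken from the 750 manual.

```c
#include <stddef.h>
#include <stdint.h>

enum op_kind { OP_STORE32, OP_EIEIO };

struct op {
    enum op_kind kind;
    uint32_t addr;      /* ignored for OP_EIEIO */
};

/* Hypothetical model: number of bus writes the CPU would issue. */
static size_t cpu_bus_writes(const struct op *ops, size_t n)
{
    size_t writes = 0;
    int have_pending = 0;   /* an unpaired store is waiting for a partner */
    uint32_t pending = 0;   /* address of that store */

    for (size_t i = 0; i < n; i++) {
        if (ops[i].kind == OP_EIEIO) {
            have_pending = 0;       /* eieio forbids pairing across it */
            continue;
        }
        if (have_pending && (pending % 8u) == 0u &&
            ops[i].addr == pending + 4u) {
            have_pending = 0;       /* merged into the write already counted */
            continue;
        }
        writes++;
        pending = ops[i].addr;
        have_pending = 1;
    }
    return writes;
}
```

In this model two stores to 0x100 and 0x104 cost one bus write, while the
same two stores separated by an eieio cost two. At best the write count is
halved, which is the limit described above.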

The bridge store gathering should be able to produce far more IO
improvement, and still works if the guard bit is set on the address
space.

I should have done a set of MPC107 experiments by the start of October,
and I'll know for sure then.

- Adrian Cox, AG Electronics


* Re: PATCH: improved processor config for G3s
  2000-09-04 10:34     ` Adrian Cox
@ 2000-09-04 10:54       ` Benjamin Herrenschmidt
  2000-09-05  9:49         ` Gabriel Paubert
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2000-09-04 10:54 UTC (permalink / raw)
  To: Adrian Cox, Gabriel Paubert, linuxppc-dev


>> A rule of thumb is the following: fully SMP capable processors broadcast
>> eieio (and tlbie for that matter), others do not at least by default. On
>> an UP 750 (SMP 750 are an aberration in any case because of TLB issues),
>> I'd bet that it is more efficient to let the processor perform store
>> gathering when it can (an eieio between both stores will prevent it) and
>> to disable both ABE in the processor and store gathering in the bridge.
>> This will result in lower processor bus utilization.
>
>Remember that the processor store gathering is only capable of turning
>two 32-bit writes to uncached, nonguarded space into one 64-bit write.
>The bridge store gathering converts an arbitrary sequence of sequential
>writes into a PCI burst.
>
>The bridge store gathering should be able to produce far more IO
>improvement, and still works if the guard bit is set on the address
>space.
>
>I should have done a set of MPC107 experiments by the start of October,
>and I'll know for sure then.

Also, are you sure, Gabriel, that eieio() not being broadcast to the
bridge would harm ? The bridge is not allowed to do any re-ordering.
Maybe there are issues with devices not supporting burst access to
registers, but shouldn't those devices abort the burst after the first
access ?

Drivers sensitive to timing constraints must already do a read to flush
the bridge buffer, so...

Ben.



* Re: PATCH: improved processor config for G3s
  2000-09-04  9:48   ` Gabriel Paubert
  2000-09-04 10:34     ` Adrian Cox
@ 2000-09-04 17:51     ` Michel Lanners
  1 sibling, 0 replies; 10+ messages in thread
From: Michel Lanners @ 2000-09-04 17:51 UTC (permalink / raw)
  To: paubert; +Cc: bh40, paulus, linuxppc-dev


On   4 Sep, this message from Gabriel Paubert echoed through cyberspace:
>> Later on, a Moto person told me there was no known grackle bug about
>> store gathering, but that the CPU ABE was needed for proper operations. I
>> also suspect that store gathering increases the effect of PCI write
>> posting, and older versions of the Adaptec drivers were not correctly
>> taking this into account on some timing critical accesses.
>
> I'd rather take the Moto version. The problem is that store gathering
> should be inhibited if there is an eieio instruction between two
> successive stores which may be gathered. However, the eieio instruction
> is not propagated to the bus by all processors, it is by the 604 and 7400
> (and 601 IIRC), not by the 603/603e/750. I can't remember offhand if ABE
> enables eieio broadcasts on the 750, but it seems so from what Motorola
> claims.

IBM's 740/750 User's Manual states on page 2-12, in the table about ABE:

 "Affected instructions are eieio, sync, dcbi, dcbf and dcbst. A sync
 instruction completes only after a successful broadcast. Execution of
 eieio causes a broadcast that may be used to prevent any external
 devices, such as a bus bridge chip, from store gathering."

So it seems that to prevent store gathering in bus bridges across eieio
on the 750, you need to set ABE.

Michel

-------------------------------------------------------------------------
Michel Lanners                 |  " Read Philosophy.  Study Art.
23, Rue Paul Henkes            |    Ask Questions.  Make Mistakes.
L-1710 Luxembourg              |
email   mlan@cpu.lu            |
http://www.cpu.lu/~mlan        |                     Learn Always. "



* Re: PATCH: improved processor config for G3s
  2000-09-04 10:54       ` Benjamin Herrenschmidt
@ 2000-09-05  9:49         ` Gabriel Paubert
  2000-09-05 10:50           ` Benjamin Herrenschmidt
  2000-09-05 11:32           ` Adrian Cox
  0 siblings, 2 replies; 10+ messages in thread
From: Gabriel Paubert @ 2000-09-05  9:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Adrian Cox, linuxppc-dev

On Mon, 4 Sep 2000, Benjamin Herrenschmidt wrote:

> >> A rule of thumb is the following: fully SMP capable processors broadcast
> >> eieio (and tlbie for that matter), others do not at least by default. On
> >> an UP 750 (SMP 750 are an aberration in any case because of TLB issues),
> >> I'd bet that it is more efficient to let the processor perform store
> >> gathering when it can (an eieio between both stores will prevent it) and
> >> to disable both ABE in the processor and store gathering in the bridge.
> >> This will result in lower processor bus utilization.
> >
> >Remember that the processor store gathering is only capable of turning
> >two 32-bit writes to uncached, nonguarded space into one 64-bit write.
> >The bridge store gathering converts an arbitrary sequence of sequential
> >writes into a PCI burst.

Within some limits (16 bytes within a 16 byte aligned area for the
Grackle).

> >The bridge store gathering should be able to produce far more IO
> >improvement, and still works if the guard bit is set on the address
> >space.

Access to guarded space is limited to device registers, which is not that
frequent with modern PCI devices. Quite often you are not even able to
access them in increasing address order for other reasons. But read
further...

> >I should have done a set of MPC107 experiments by the start of October,
> >and I'll know for sure then.
>
> Also, are you sure, Gabriel, that eieio() not being broadcast to the
> bridge would harm ? The bridge is not allowed to do any re-ordering.

The absence of broadcast might definitely harm, but first the support for
store gathering in the MPC106 (Grackle) is quite poor:

"For a stream of single-beat writes, the data for the first transaction is
latched in the first buffer and the MPC106 initiates the transaction on
the PCI bus. The second single-beat write is then stored in the second
buffer. For subsequent single-beat writes, store gathering is possible if
the incoming write is to sequential bytes in the same half cache line as
the previously latched data. Store gathering is only used for writes to
PCI memory space, not for writes to PCI I/O space. The store gathering
continues until the buffer is scheduled to be flushed or until the
processor issues a synchronizing transaction.

For example, if both PRPWBs are empty and the 60x processor issues a
single-beat write to PCI, the data is latched in the first buffer and the
PCI interface of the MPC106 attempts to acquire the PCI bus for the
transfer. The data for the next 60x-to-PCI write transaction is latched in
the second buffer, even if the second transaction's address falls within
the same half cache line as the first transaction. While the PCI interface
is busy with the first transfer, any sequential processor single-beat
writes within the same half cache line as the second transfer are gathered
in the second buffer until the PCI bus becomes available. "

So you need at least 3 writes, or to have one store buffer busy with a
previous write, to trigger the store gathering mechanism. This makes it
impossible to predict whether it will be used or not, and IMO not worth
the potential trouble since it will actually happen quite infrequently.
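
The quoted behaviour can be approximated with a small counting model. The
sketch below is an assumption-laden toy, not a description of the real
chip: it supposes that the PCI side stays busy with the first transfer for
the whole stream, that every write is a 32-bit single-beat write, and that
a write which cannot be gathered simply starts a new PCI transaction. The
function name and the 16-byte half cache line constant are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define HALF_LINE 16u   /* half of a 32-byte cache line */

/* Hypothetical model: PCI transactions needed for n 32-bit writes. */
static size_t grackle_pci_writes(const uint32_t *addr, size_t n)
{
    size_t txns = 0;
    uint32_t base = 0;  /* half-line base of the data being gathered */
    uint32_t next = 0;  /* next sequential address that may be gathered */

    for (size_t i = 0; i < n; i++) {
        /* Writes 0 and 1 always take a buffer of their own; from the
         * third write on, sequential writes in the same half cache
         * line are gathered into the second buffer. */
        if (i >= 2 && addr[i] == next &&
            (addr[i] & ~(HALF_LINE - 1u)) == base) {
            next += 4u;
            continue;
        }
        txns++;
        base = addr[i] & ~(HALF_LINE - 1u);
        next = addr[i] + 4u;
    }
    return txns;
}
```

Under these assumptions eight sequential word writes starting at a
32-byte boundary need three PCI transactions instead of eight, while two
writes still need two, which matches the point that nothing is gained
before the third write.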

The only case where store gathering in the processor or in the bridge
may have a significant performance impact is when accessing a frame
buffer, which should never be mapped as guarded to start with.

Note that store gathering only affects memory space, not I/O space; I
don't know whether the Adaptec drivers are affected or not.

> Maybe there are issues with devices not supporting burst access to
> registers, but shouldn't those devices abort the burst after the first
> access ?

They should, and in this case store gathering in the bridge does not
bring you any significant performance benefit.

Just a question: were the devices exhibiting the problem 64 bit devices
behind the PCI<->PCI bridge on the Macs which have 64 bit PCI slots ?

> Drivers sensitive to timing constraints must already do a read to flush
> the bridge buffer, so...

Indeed, but the problem here is completely different and I would not call
buffer flushing a timing constraint, it is rather a coherency issue.

	Gabriel.


* Re: PATCH: improved processor config for G3s
  2000-09-05  9:49         ` Gabriel Paubert
@ 2000-09-05 10:50           ` Benjamin Herrenschmidt
  2000-09-05 11:06             ` Gabriel Paubert
  2000-09-05 11:32           ` Adrian Cox
  1 sibling, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2000-09-05 10:50 UTC (permalink / raw)
  To: Gabriel Paubert, linuxppc-dev


>
>The absence of broadcast might definitely harm, but first the support for
>store gathering in the MPC106 (Grackle) is quite poor:
>
>"For a stream of single-beat writes, the data for the first transaction is
>latched in the first buffer and the MPC106 initiates the transaction on
>the PCI bus. The second single-beat write is then stored in the second
>buffer. For subsequent single-beat writes, store gathering is possible if
>the incoming write is to sequential bytes in the same half cache line as
>the previously latched data. Store gathering is only used for writes to
>PCI memory space, not for writes to PCI I/O space. The store gathering
>continues until the buffer is scheduled to be flushed or until the
>processor issues a synchronizing transaction.

Well, that would help unaccelerated frame buffers (but are there any on
Grackle-based machines ?)



* Re: PATCH: improved processor config for G3s
  2000-09-05 10:50           ` Benjamin Herrenschmidt
@ 2000-09-05 11:06             ` Gabriel Paubert
  0 siblings, 0 replies; 10+ messages in thread
From: Gabriel Paubert @ 2000-09-05 11:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

On Tue, 5 Sep 2000, Benjamin Herrenschmidt wrote:

> Well, that would help unaccelerated frame buffers (but are there any on
> Grackle-based machines ?)

Don't ask me this. I live in a basically Apple-free country (TM), and I
don't even use X or frame buffers on the couple of PPC machines that have
some graphics adapter (VGA text mode on these). All the others have a
serial console, the only true interface as everybody knows...

	Gabriel.


* Re: PATCH: improved processor config for G3s
  2000-09-05  9:49         ` Gabriel Paubert
  2000-09-05 10:50           ` Benjamin Herrenschmidt
@ 2000-09-05 11:32           ` Adrian Cox
  1 sibling, 0 replies; 10+ messages in thread
From: Adrian Cox @ 2000-09-05 11:32 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev

Gabriel Paubert wrote:
> So you need at least 3 writes, or to have one store buffer busy with a
> previous write, to trigger the store gathering mechanism. This makes it
> impossible to predict whether it will be used or not, and IMO not worth
> the potential trouble since it will actually happen quite infrequently.

This condition should be satisfied by memcpy_toio(). It won't be
satisfied by fb_memmove(), which is the function that makes large
virtual consoles on PowerPC so slow.
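
The kind of access pattern meant here is a run of word stores to ascending
addresses. A minimal sketch of such a loop follows; copy_words_toio() is a
hypothetical stand-in written for illustration and is not the kernel's
memcpy_toio(), whose actual store size and ordering should be checked in
the source for the kernel in question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: copy nwords 32-bit words to a device window,
 * one store per word, strictly in ascending address order. */
static void copy_words_toio(volatile uint32_t *dst, const uint32_t *src,
                            size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];    /* volatile keeps each store separate */
}
```

With three or more words this produces the stream of sequential
single-beat writes that the gathering condition asks for.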

It might be more interesting to try an implementation of memcpy_toio()
using the DMA engines of the 106/7.

- Adrian Cox, AG Electronics
