* RE: MPC8641D PEX: programming OWBAR in Endpoint mode?
From: Chen, Tiejun @ 2010-09-24 5:09 UTC (permalink / raw)
To: david.hagood; +Cc: linuxppc-dev
In-Reply-To: <409c3f6d508ece73c18f6ce750513b22.squirrel@localhost>
> -----Original Message-----
> From: david.hagood@gmail.com [mailto:david.hagood@gmail.com]
> Sent: Thursday, September 23, 2010 10:44 PM
> To: Chen, Tiejun
> Cc: David Hagood; linuxppc-dev@ozlabs.org
> Subject: RE: MPC8641D PEX: programming OWBAR in Endpoint mode?
>
> >> -----Original Message-----
> via the BARs.
> >
> > I read your email again and something gave me a hint. I notice you
> > clarified that you already configured InBound successfully.
>
> I am programming BOTH the inbound ATMUs to make PPC memory
> available to the root complex, AND programming outbound ATMUs
> to enable the PPC to bus master to the root complex's memory
> space on PCIe.
>
Right but this should be done for RC mode, not for EP mode we're
discussing.
Tiejun
> I am NOT attempting to program the IWBARs - as you noted,
> they get programmed by the root complex via PCI config operations.
>
> >
> > And as my above comment I'm afraid you mix up InBound and
> > OutBound on EP mode?
>
> No, I am NOT confusing the two - that is why I am being VERY
> EXPLICIT about accessing the OUTBOUND ATMUs.
>
> The only reason I mention the inbound ATMUs is to demonstrate
> that the physical layer is working.
>
^ permalink raw reply
* Re: [PATCH] irqbalance, powerpc: add IRQs without settable SMP affinity to banned list
From: Michael Neuling @ 2010-09-24 6:56 UTC (permalink / raw)
To: michael
Cc: Neil Horman, linuxppc-dev, linux-kernel, Neil Horman,
Arjen Van De Ven, arjan
In-Reply-To: <1285304622.7637.55.camel@concordia>
> > > size_t size = 0;
> > > FILE *file;
> > > sprintf(buf, "/proc/irq/%i/smp_affinity", number);
> > > - file = fopen(buf, "r");
> > > + file = fopen(buf, "r+");
> > > if (!file)
> > > continue;
> > > if (getline(&line, &size, file)==0) {
> > > @@ -89,7 +89,14 @@
> > > continue;
> > > }
> > > cpumask_parse_user(line, strlen(line), irq->mask);
> > > - fclose(file);
> > > + /*
> > > + * Check that we can write the affinity, if
> > > + * not take it out of the list.
> > > + */
> > > + if (fputs(line, file) == EOF)
> > > + can_set = 0;
>
> > This is maybe a nit, but writing to the affinity file can fail for a few
> > different reasons, some of them permanent, some transient. For instance, if
> > we're in a memory constrained condition temporarily irq_affinity_proc_write
> > might return -ENOMEM.
>
> Yeah true, usually followed shortly by your kernel going so far into
> swap you never get it back, or OOMing, but I guess it's possible.
>
> > Might it be better to modify this code so that, instead
> > of using fputs to merge the various errors into an EOF, we use some other write
> > method that lets us better determine the error and selectively ban the interrupt
> > only for those errors which we consider permanent?
>
> Yep. It seems fputs() gives you no way to get the actual error from
> write(), so it looks like we'll need to switch to open/write, but that's
> probably not so terrible.
fclose inherits the error from fputs and sets errno correctly. The patch
below uses this to catch only EIO errors and mark them for the banned list.
Mikey
irqbalance, powerpc: add IRQs without settable SMP affinity to banned list
On pseries powerpc, IPIs are registered with an IRQ number so
/proc/interrupts looks like this on a 2 core/2 thread machine:
          CPU0       CPU1       CPU2       CPU3
 16:    3164282    3290514    1138794     983121   XICS      Level     IPI
 18:    2605674          0     304994          0   XICS      Level     lan0
 30:     400057          0     169209          0   XICS      Level     ibmvscsi
LOC:     133734      77250     106425      91951   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
CNT:          0          0          0          0   Performance monitoring interrupts
MCE:          0          0          0          0   Machine check exceptions
Unfortunately this means irqbalance attempts to set the affinity of IPIs
which is not possible. So in the above case, when irqbalance is in
performance mode due to heavy IPI, lan0 and ibmvscsi activity, it
sometimes attempts to put the IPIs on one core (CPU0&1) and lan0 and
ibmvscsi on the other core (CPU2&3). This is suboptimal as we want lan0
and ibmvscsi to be on separate cores and IPIs to be ignored.
When irqbalance attempts writes to the IPI smp_affinity (i.e.
/proc/irq/16/smp_affinity in the above example) it fails with an EIO but
irqbalance currently ignores this.
This patch catches these write failures and in this case adds that IRQ
number to the banned IRQ list. This will catch the above IPI case and
any other IRQ where the SMP affinity can't be set.
Tested on POWER6, POWER7 and x86.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Index: irqbalance/irqlist.c
===================================================================
--- irqbalance.orig/irqlist.c
+++ irqbalance/irqlist.c
@@ -28,6 +28,7 @@
#include <unistd.h>
#include <sys/types.h>
#include <dirent.h>
+#include <errno.h>
#include "types.h"
#include "irqbalance.h"
@@ -67,7 +68,7 @@
DIR *dir;
struct dirent *entry;
char *c, *c2;
- int nr , count = 0;
+ int nr , count = 0, can_set = 1;
char buf[PATH_MAX];
sprintf(buf, "/proc/irq/%i", number);
dir = opendir(buf);
@@ -80,7 +81,7 @@
size_t size = 0;
FILE *file;
sprintf(buf, "/proc/irq/%i/smp_affinity", number);
- file = fopen(buf, "r");
+ file = fopen(buf, "r+");
if (!file)
continue;
if (getline(&line, &size, file)==0) {
@@ -89,7 +90,13 @@
continue;
}
cpumask_parse_user(line, strlen(line), irq->mask);
- fclose(file);
+ /*
+ * Check that we can write the affinity, if
+ * not take it out of the list.
+ */
+ fputs(line, file);
+ if (fclose(file) && errno == EIO)
+ can_set = 0;
free(line);
} else if (strcmp(entry->d_name,"allowed_affinity")==0) {
char *line = NULL;
@@ -122,7 +129,7 @@
count++;
/* if there is no choice in the allowed mask, don't bother to balance */
- if (count<2)
+ if ((count<2) || (can_set == 0))
irq->balance_level = BALANCE_NONE;
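
For comparison, the open/write approach mentioned earlier in the thread can
be sketched as below. This is only an illustration under stated assumptions,
not code from irqbalance: the helper names probe_affinity_write() and
affinity_is_settable() are hypothetical, and treating EIO alone as permanent
simply mirrors the policy of the patch above. Using write() directly means
the errno of the failing call is available immediately instead of being
deferred to the flush in fclose().

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of irqbalance): read the current contents
 * of an smp_affinity-style file and write the same bytes back.
 * Returns 0 if the write-back succeeded, otherwise the errno value
 * reported by the failing open(), read(), lseek() or write() call.
 */
int probe_affinity_write(const char *path)
{
    char mask[4096];
    ssize_t len;
    int fd, err = 0;

    fd = open(path, O_RDWR);
    if (fd < 0)
        return errno;

    len = read(fd, mask, sizeof(mask));
    if (len < 0) {
        err = errno;
    } else if (len > 0) {
        /* Rewind so a regular file is overwritten rather than extended. */
        if (lseek(fd, 0, SEEK_SET) < 0)
            err = errno;
        else if (write(fd, mask, (size_t)len) < 0)
            err = errno;
    }

    close(fd);
    return err;
}

/* Assumed policy: only EIO counts as a permanent failure, as in the patch. */
int affinity_is_settable(const char *path)
{
    return probe_affinity_write(path) != EIO;
}
```

A caller could then ban the IRQ only when affinity_is_settable() returns 0,
leaving transient errors such as ENOMEM to be retried on a later pass.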
^ permalink raw reply
* Re: [PATCH 1/2] PPC4xx: Generelizing drivers/dma/ppc4xx/adma.c
From: Stefan Roese @ 2010-09-24 6:58 UTC (permalink / raw)
To: linuxppc-dev
Cc: herbert, Tirumala Marri, yur, linux-raid, neilb, linux-crypto,
Dan Williams
In-Reply-To: <d2d7c3068b5582bde7529ccc65d71e52@mail.gmail.com>
On Friday 24 September 2010 00:39:47 Tirumala Marri wrote:
> > Will both versions of this driver exist in the same kernel build? For
> > example the iop-adma driver supports iop13xx and iop3xx, but we select
> > the architecture at build time? Or, as I assume in this case, will the
> > two (maybe more?) ppc4xx adma drivers all be built in the same image,
> > more like ioatdma?
>
> [Marri] We select the architecture at build time.
It would be really preferable to support all those platforms in a single Linux
image. If technically possible, please try to move in this direction.
Thanks.
Cheers,
Stefan
^ permalink raw reply
* Re: [PATCH] spi_mpc8xxx: issue with using definition of pram in Device Tree
From: Grant Likely @ 2010-09-24 7:10 UTC (permalink / raw)
To: christophe leroy
Cc: David Brownell, linux-kernel, spi-devel-general, Anton Vorontsov,
linuxppc-dev
In-Reply-To: <20100916070503.10046C7391@messagerie.si.c-s.fr>
On Thu, Sep 16, 2010 at 09:05:03AM +0200, christophe leroy wrote:
> This patch applies to 2.6.34.7 and 2.6.35.4
> It fixes an issue during the probe for CPM1 with definition of parameter ram from DTS
>
> Signed-off-by: christophe leroy <christophe.leroy@c-s.fr>
I'm sorry, I don't understand the fix from the given description.
What is the problem, and why is cpm_muram_alloc_fixed() the wrong
thing to call on CPM1? Does CPM2 still need it?
g.
>
> diff -urN b/drivers/spi/spi_mpc8xxx.c c/drivers/spi/spi_mpc8xxx.c
> --- b/drivers/spi/spi_mpc8xxx.c 2010-09-08 16:43:50.000000000 +0200
> +++ c/drivers/spi/spi_mpc8xxx.c 2010-09-08 16:44:03.000000000 +0200
> @@ -822,7 +822,7 @@
> if (!iprop || size != sizeof(*iprop) * 4)
> return -ENOMEM;
>
> - spi_base_ofs = cpm_muram_alloc_fixed(iprop[2], 2);
> + spi_base_ofs = iprop[2];
> if (IS_ERR_VALUE(spi_base_ofs))
> return -ENOMEM;
>
> @@ -844,7 +844,6 @@
> return spi_base_ofs;
> }
>
> - cpm_muram_free(spi_base_ofs);
> return pram_ofs;
> }
^ permalink raw reply
* Re: [PATCH] spi_mpc8xxx: issue with using definition of pram in Device Tree
From: LEROY Christophe @ 2010-09-24 7:20 UTC (permalink / raw)
To: Grant Likely
Cc: David Brownell, linux-kernel, spi-devel-general, Anton Vorontsov,
linuxppc-dev
In-Reply-To: <20100924071006.GA21318@angua.secretlab.ca>
[-- Attachment #1: Type: text/plain, Size: 1618 bytes --]
Hello,
The issue is that cpm_muram_alloc_fixed() allocates memory from the
general purpose muram area (from 0x0 to 0x1bff).
Here we need to return a pointer to the parameter RAM, which is located
somewhere starting at 0x1c00. What is required here is not a dynamic
allocation but only a pointer to the correct location in the parameter
RAM.
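
The address arithmetic behind this can be sketched in plain C. The figures
come from this thread and the attached discussion (muram@2000, general
purpose area 0x0 to 0x1bff, pram at 0x3d80 versus 0x1d80); the helper names
are hypothetical, and the total dual-port RAM size of 0x2000 is an
assumption inferred from the reg = <0x0 0x2000> value quoted in the
attachment. It is an illustration only, not driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in this thread for the MPC866 (CPM1). */
#define MURAM_BASE      0x2000u  /* muram@2000 in the parent address space */
#define MURAM_DATA_SIZE 0x1c00u  /* general purpose area: 0x0000..0x1bff */
#define MURAM_TOTAL     0x2000u  /* assumption: total dual-port RAM size */

/* Hypothetical helper: may the dynamic allocator hand out this offset? */
bool muram_offset_is_allocatable(uint32_t ofs)
{
    return ofs < MURAM_DATA_SIZE;
}

/* Hypothetical helper: does this offset lie in the fixed parameter RAM? */
bool muram_offset_is_pram(uint32_t ofs)
{
    return ofs >= MURAM_DATA_SIZE && ofs < MURAM_TOTAL;
}

/* Convert an address in the parent address space to a muram offset. */
uint32_t cpm_to_muram_offset(uint32_t cpm_addr)
{
    return cpm_addr - MURAM_BASE;
}
```

With these numbers the SPI parameter block at 0x3d80 maps to muram offset
0x1d80, which lies in the parameter RAM and outside the area the allocator
should manage, so a fixed pointer is what is needed rather than an
allocation.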
For the CPM2, I don't know. I'm working with a MPC866.
Attached is a previous discussion on the subject where I explain the
issue in a bit more detail.
Regards
C. Leroy
On 24/09/2010 09:10, Grant Likely wrote:
> On Thu, Sep 16, 2010 at 09:05:03AM +0200, christophe leroy wrote:
>> This patch applies to 2.6.34.7 and 2.6.35.4
>> It fixes an issue during the probe for CPM1 with definition of parameter ram from DTS
>>
>> Signed-off-by: christophe leroy<christophe.leroy@c-s.fr>
> I'm sorry, I don't understand the fix from the given description.
> What is the problem, and why is cpm_muram_alloc_fixed() the wrong
> thing to call on CPM1? Does CPM2 still need it?
>
> g.
>
>> diff -urN b/drivers/spi/spi_mpc8xxx.c c/drivers/spi/spi_mpc8xxx.c
>> --- b/drivers/spi/spi_mpc8xxx.c 2010-09-08 16:43:50.000000000 +0200
>> +++ c/drivers/spi/spi_mpc8xxx.c 2010-09-08 16:44:03.000000000 +0200
>> @@ -822,7 +822,7 @@
>> if (!iprop || size != sizeof(*iprop) * 4)
>> return -ENOMEM;
>>
>> - spi_base_ofs = cpm_muram_alloc_fixed(iprop[2], 2);
>> + spi_base_ofs = iprop[2];
>> if (IS_ERR_VALUE(spi_base_ofs))
>> return -ENOMEM;
>>
>> @@ -844,7 +844,6 @@
>> return spi_base_ofs;
>> }
>>
>> - cpm_muram_free(spi_base_ofs);
>> return pram_ofs;
>> }
[-- Attachment #2: Attached message --]
[-- Type: message/rfc822, Size: 6373 bytes --]
From: Scott Wood <scottwood@freescale.com>
To: LEROY Christophe <christophe.leroy@c-s.fr>
Cc: Kumar Gala <kumar.gala@freescale.com>, LinuxPPC-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: Small issue at init with spi_mpc8xxx.c with CPM1
Date: Tue, 7 Sep 2010 15:00:38 -0500
Message-ID: <20100907150038.57a7b065@schlenkerla.am.freescale.net>
On Tue, 7 Sep 2010 11:17:17 +0200
LEROY Christophe <christophe.leroy@c-s.fr> wrote:
>
> Dear Kumar,
>
> I have a small issue in the init of spi_mpc8xxx.c with MPC866 (CPM1)
>
> Unlike cpm_uart that maps the parameter ram directly using
> of_iomap(np,1), spi_mpc8xxx.c uses cpm_muram_alloc_fixed().
>
> This has two impacts in the .dts file:
> * The driver must be declared with pram at 1d80 instead of 3d80 whereas
> it is not a child of muram@2000 but a child of cpm@9c0
> * muram@2000/data@0 must be declared with reg = <0x0 0x2000> whereas
> it should be reg=<0x0 0x1c00> to prevent cpm_muram_alloc() from
> allocating space from the parameter RAM.
>
> Maybe I misunderstood something ?
Don't make the device tree lie, fix the driver instead.
The allocator should not be given any chunks of muram that are
dedicated to a fixed purpose -- it might hand it out to something else
before you reserve it. I don't think that cpm_muram_alloc_fixed() has
any legitimate use at all.
-Scott
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
^ permalink raw reply
* Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.
From: Richard Cochran @ 2010-09-24 7:29 UTC (permalink / raw)
To: john stultz
Cc: Rodolfo Giometti, Arnd Bergmann, Peter Zijlstra, linux-api,
devicetree-discuss, linux-kernel, Thomas Gleixner, netdev,
Christoph Lameter, linuxppc-dev, David Miller, linux-arm-kernel,
Krzysztof Halasa
In-Reply-To: <1285271331.2587.56.camel@localhost.localdomain>
On Thu, Sep 23, 2010 at 12:48:51PM -0700, john stultz wrote:
> On Thu, 2010-09-23 at 19:31 +0200, Richard Cochran wrote:
> > A new syscall is introduced that allows tuning of a POSIX clock. The
> > syscall is implemented for four architectures: arm, blackfin, powerpc,
> > and x86.
> >
> > The new syscall, clock_adjtime, takes two parameters, the clock ID,
> > and a pointer to a struct timex. The semantics of the timex struct
> > have been expanded by one additional mode flag, which allows an
> > absolute offset correction. When specified, the clock offset is
> > immediately corrected by adding the given time value to the current
> > time value.
>
>
> So I'd still split this patch up a little bit more.
>
> 1) Patch that implements the ADJ_SETOFFSET (*and its implementation*)
> in do_adjtimex.
>
> 2) Patch that adds the new syscall and clock_id multiplexing.
>
> 3) Patches that wire it up to the rest of the architectures (there's
> still a bunch missing here).
I was not sure what the policy is about adding syscalls. Is it the
syscall author's responsibility to add it into every arch?
The last time (see a2e2725541fad7) the commit only added half of some
archs, and ignored others. In my patch, the syscall *really* works on
the archs that are present in the patch.
(Actually, I did not test blackfin, since I don't have one, but I
included it since I know they have a PTP hardware clock.)
> > +static inline int common_clock_adj(const clockid_t which_clock, struct timex *t)
> > +{
> > + if (CLOCK_REALTIME == which_clock)
> > + return do_adjtimex(t);
> > + else
> > + return -EOPNOTSUPP;
> > +}
>
>
> Would it make sense to point to the do_adjtimex() in the k_clock
> definition for CLOCK_REALTIME rather than conditionalizing it here?
But what about CLOCK_MONOTONIC_RAW, for example?
Does it make sense to allow it to be adjusted?
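
The two options under discussion can be illustrated with a small user-space
mock. This is a sketch only: every name below (demo_k_clock,
demo_do_adjtimex, demo_clock_adjtime and the demo_clock_id values) is
hypothetical and stands in for the kernel structures mentioned above. Each
clock gets an optional adjust hook, and a clock that does not provide one,
such as the raw monotonic clock here, yields -EOPNOTSUPP without any
per-clock conditional in the common path.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the kernel. */
enum demo_clock_id {
    DEMO_REALTIME,
    DEMO_MONOTONIC,
    DEMO_MONOTONIC_RAW,
    DEMO_NR_CLOCKS
};

struct demo_timex {
    long offset;
};

struct demo_k_clock {
    /* NULL means the clock cannot be tuned. */
    int (*clock_adj)(struct demo_timex *t);
};

/* Accumulated offset of the mock realtime clock. */
long demo_realtime_offset;

int demo_do_adjtimex(struct demo_timex *t)
{
    demo_realtime_offset += t->offset;
    return 0;
}

/* Only the realtime clock registers an adjust hook. */
static const struct demo_k_clock demo_clocks[DEMO_NR_CLOCKS] = {
    [DEMO_REALTIME] = { .clock_adj = demo_do_adjtimex },
};

int demo_clock_adjtime(enum demo_clock_id id, struct demo_timex *t)
{
    if ((unsigned int)id >= DEMO_NR_CLOCKS)
        return -EINVAL;
    if (demo_clocks[id].clock_adj == NULL)
        return -EOPNOTSUPP;
    return demo_clocks[id].clock_adj(t);
}
```

Whether a given clock should be adjustable then becomes a question of which
table entries get a hook, which keeps the policy in one place.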
Thanks,
Richard
^ permalink raw reply
* Re: [PATCH 1/8] posix clocks: introduce a syscall for clock tuning.
From: Richard Cochran @ 2010-09-24 7:55 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Peter Zijlstra, John Stultz, devicetree-discuss, linuxppc-dev,
linux-kernel, David Miller, netdev, linux-api, Thomas Gleixner,
Rodolfo Giometti, Christoph Lameter, linux-arm-kernel,
Krzysztof Halasa
In-Reply-To: <1285279423.5158.20.camel@pasglop>
On Fri, Sep 24, 2010 at 08:03:43AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2010-09-23 at 19:31 +0200, Richard Cochran wrote:
> > A new syscall is introduced that allows tuning of a POSIX clock. The
> > syscall is implemented for four architectures: arm, blackfin, powerpc,
> > and x86.
> >
> > The new syscall, clock_adjtime, takes two parameters, the clock ID,
> > and a pointer to a struct timex. The semantics of the timex struct
> > have been expanded by one additional mode flag, which allows an
> > absolute offset correction. When specified, the clock offset is
> > immediately corrected by adding the given time value to the current
> > time value.
>
> Any reason why you CC'ed device-tree discuss ?
>
> This list is getting way too much unrelated stuff, which I find
> annoying, it would be nice if we were all a bit more careful here with
> our CC lists.
Sorry, I only added device-tree because someone asked me to do so.
http://marc.info/?l=linux-netdev&m=127273157912358
I'll leave it off next time.
Thanks,
Richard
^ permalink raw reply
* Re: [PATCH] spi_mpc8xxx: issue with using definition of pram in Device Tree
From: Anton Vorontsov @ 2010-09-24 7:57 UTC (permalink / raw)
To: LEROY Christophe
Cc: David Brownell, linux-kernel, spi-devel-general, linuxppc-dev
In-Reply-To: <4C9C513B.40501@c-s.fr>
Hello,
On Fri, Sep 24, 2010 at 09:20:27AM +0200, LEROY Christophe wrote:
> The issue is that cpm_muram_alloc_fixed() allocates memory from the
> general purpose muram area (from 0x0 to 0x1bff).
> Here we need to return a pointer to the parameter RAM, which is
> located somewhere starting at 0x1c00. It is not a dynamic allocation
> that is required here but only to point on the correct location in
> the parameter RAM.
>
> For the CPM2, I don't know. I'm working with a MPC866.
>
> Attached is a previous discussion on the subject where I explain a
> bit more in details the issue.
The patch looks OK, I think.
Doesn't explain why that worked on MPC8272 (CPM2) and MPC8560
(also CPM2) machines though. But here's my guess (I no longer
have these boards to test it):
On 8272 I used this node:
+ spi@4c0 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "fsl,cpm2-spi", "fsl,spi";
+ reg = <0x11a80 0x40 0x89fc 0x2>;
On that SOC there are two muram data regions 0x0..0x2000 and
0x9000..0x9100. Note that we actually don't want "data" regions,
and the only reason why that worked is that sysdev/cpm_common.c
maps muram(0)..muram(max).
Thanks,
--
Anton Vorontsov
email: cbouatmailru@gmail.com
irc://irc.freenode.net/bd2
^ permalink raw reply
* Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
From: Richard Cochran @ 2010-09-24 8:33 UTC (permalink / raw)
To: Christoph Lameter
Cc: John Stultz, Rodolfo Giometti, Arnd Bergmann, Peter Zijlstra,
linux-api, devicetree-discuss, linux-kernel, netdev,
Thomas Gleixner, linuxppc-dev, David Miller, linux-arm-kernel,
Krzysztof Halasa
In-Reply-To: <alpine.DEB.2.00.1009231238170.2962@router.home>
On Thu, Sep 23, 2010 at 12:53:20PM -0500, Christoph Lameter wrote:
> On Thu, 23 Sep 2010, Richard Cochran wrote:
> > 3.3 Synchronizing the Linux System Time
> > ========================================
> >
> > One could offer a PHC as a combined clock source and clock event
> > device. The advantage of this approach would be that it obviates
> > the need for synchronization when the PHC is selected as the system
> > timer. However, some PHCs, namely the PHY based clocks, cannot be
> > used in this way.
>
> Why not? Do PHY based clocks not at least provide a counter that increments
> in synchronized intervals throughout the network?
The counter in the PHY is accessed via the MDIO bus. One 16 bit read
takes anywhere from 25 to 40 microseconds. Reading the 64 bit time
value requires four reads, so we're talking about 100 to 160
microseconds, just for a single time reading.
In addition to that, reading the MDIO bus can sleep. So we can't (in
general) offer PHCs as clock sources.
> > Instead, the patch set provides a way to offer a Pulse Per Second
> > (PPS) event from the PHC to the Linux PPS subsystem. A user space
> > application can read the PPS events and tune the system clock, just
> > like when using other external time sources like radio clocks or
> > GPS.
>
> User space is subject to various latencies created by the OS etc. I would
> think that in order to have fine grained (read microsecond) accuracy we would
> have to run the portions that are relevant to obtaining the desired
> accuracy in the kernel.
The time-critical operations are all performed in hardware (packet
timestamp), or in kernel space (input PPS timestamp). User space only
runs the servo (using hardware or kernel timestamps as input) and
performs the clock correction. With a sample rate of 1 PPS, the small
user space induced delay (a few dozen microseconds) between sample
time and clock correction is not an issue.
Thanks,
Richard
^ permalink raw reply
* Re: [PATCH 6/8] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
From: Richard Cochran @ 2010-09-24 8:49 UTC (permalink / raw)
To: Christoph Lameter
Cc: John Stultz, Rodolfo Giometti, Arnd Bergmann, Peter Zijlstra,
linux-api, devicetree-discuss, linux-kernel, netdev,
Thomas Gleixner, linuxppc-dev, David Miller, linux-arm-kernel,
Krzysztof Halasa
In-Reply-To: <alpine.DEB.2.00.1009231348150.2962@router.home>
On Thu, Sep 23, 2010 at 02:17:36PM -0500, Christoph Lameter wrote:
> On Thu, 23 Sep 2010, Richard Cochran wrote:
> > + These properties set the operational parameters for the PTP
> > + clock. You must choose these carefully for the clock to work right.
> > + Here is how to figure good values:
> > +
> > + TimerOsc = system clock MHz
> > + tclk_period = desired clock period nanoseconds
> > + NominalFreq = 1000 / tclk_period MHz
> > + FreqDivRatio = TimerOsc / NominalFreq (must be greater than 1.0)
> > + tmr_add = ceil(2^32 / FreqDivRatio)
> > + OutputClock = NominalFreq / tmr_prsc MHz
> > + PulseWidth = 1 / OutputClock microseconds
> > + FiperFreq1 = desired frequency in Hz
> > + FiperDiv1 = 1000000 * OutputClock / FiperFreq1
> > + tmr_fiper1 = tmr_prsc * tclk_period * FiperDiv1 - tclk_period
> > + max_adj = 1000000000 * (FreqDivRatio - 1.0) - 1
>
> Great stuff for clock synchronization...
>
> > + The calculation for tmr_fiper2 is the same as for tmr_fiper1. The
> > + driver expects that tmr_fiper1 will be correctly set to produce a 1
> > + Pulse Per Second (PPS) signal, since this will be offered to the PPS
> > + subsystem to synchronize the Linux clock.
>
> Argh. And conceptually completely screwed up. Why go through the PPS
> subsystem if you can directly tune the system clock based on a number of
> the cool periodic clock features that you have above? See how the other
> clocks do that easily? Look into drivers/clocksource. Add it there.
>
> Please do not introduce useless additional layers for clock sync. Load
> these ptp clocks like the other regular clock modules and make them sync
> system time like any other clock.
>
> Really guys: I want a PTP solution! Now! And not some idiotic additional
> kernel layers that just pass bits around because it's so much fun and
> screw up clock accuracy due to the latency noise introduced while
> having so much fun with the bits.
(Sorry if this message comes twice. Mutt/Gmail flaked out again.)
I think you misunderstood this particular patch. The device tree
parameters are really just internal driver stuff. When you use the
eTSEC, you must make some design choices at the same time as you plan
your board. The proper values for some of the eTSEC registers are
based on these design choices. Since the Freescale documentation is a
bit thin on this, I added a few notes to help my fellow board
designers.
Because these values are closely related to the board itself, I think
that it is nicer to configure them via the device tree than using
either CONFIG_ variables or platform data.
Richard
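
The formulas quoted above can be worked through in a few lines of C. This is
a sketch under stated assumptions rather than code from the patch: the
function and struct names are hypothetical, all frequencies are taken in Hz
instead of MHz so that integer arithmetic can be used, and NominalFreq is
truncated when tclk_period does not divide 10^9 evenly. The ratio
FreqDivRatio is never formed explicitly; it is substituted as
TimerOsc / NominalFreq in each expression.

```c
#include <stdint.h>

/* Hypothetical container for the derived register values. */
struct etsec_ptp_params {
    uint64_t tmr_add;
    uint64_t tmr_fiper1;
    uint64_t max_adj;
};

/*
 * timer_osc_hz:   TimerOsc in Hz
 * tclk_period_ns: desired clock period in nanoseconds
 * tmr_prsc:       output clock prescaler
 * fiper_hz:       desired FIPER1 frequency in Hz (1 for a PPS signal)
 * Returns 0 on success, -1 if the inputs are unusable
 * (zero values, or FreqDivRatio not greater than 1.0).
 */
int etsec_ptp_calc(uint64_t timer_osc_hz, uint64_t tclk_period_ns,
                   uint64_t tmr_prsc, uint64_t fiper_hz,
                   struct etsec_ptp_params *out)
{
    uint64_t nominal_hz, fiper_div;

    if (!tclk_period_ns || !tmr_prsc || !fiper_hz)
        return -1;

    /* NominalFreq = 1000 / tclk_period MHz, expressed here in Hz. */
    nominal_hz = 1000000000ULL / tclk_period_ns;
    if (!nominal_hz || timer_osc_hz <= nominal_hz)
        return -1;

    /* tmr_add = ceil(2^32 / FreqDivRatio) */
    out->tmr_add = ((nominal_hz << 32) + timer_osc_hz - 1) / timer_osc_hz;

    /* FiperDiv1 = 1000000 * OutputClock / FiperFreq1, with OutputClock in MHz */
    fiper_div = nominal_hz / (tmr_prsc * fiper_hz);
    if (!fiper_div)
        return -1;

    /* tmr_fiper1 = tmr_prsc * tclk_period * FiperDiv1 - tclk_period */
    out->tmr_fiper1 = tmr_prsc * tclk_period_ns * fiper_div - tclk_period_ns;

    /* max_adj = 1000000000 * (FreqDivRatio - 1.0) - 1 */
    out->max_adj = 1000000000ULL * (timer_osc_hz - nominal_hz) / nominal_hz - 1;
    return 0;
}
```

As an example with made-up board values, a 200 MHz timer oscillator, a 10 ns
clock period, a prescaler of 100 and a 1 Hz FIPER give FreqDivRatio = 2.0,
so tmr_add is 2^31, tmr_fiper1 is 999999990 and max_adj is 999999999.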
^ permalink raw reply
* Re: [BUG 2.6.36-rc5] of_i2c.ko <-> i2c-core.ko dependency loop
From: Jean Delvare @ 2010-09-24 9:12 UTC (permalink / raw)
To: Randy Dunlap, Grant Likely
Cc: Mikael Pettersson, linuxppc-dev, linux-kernel, linux-i2c
In-Reply-To: <20100923150559.9c67da11.rdunlap@xenotime.net>
On Thu, 23 Sep 2010 15:05:59 -0700, Randy Dunlap wrote:
> On Thu, 23 Sep 2010 22:16:32 +0200 Mikael Pettersson wrote:
> > Randy Dunlap writes:
> > > No kconfig warnings?
> >
> > Not that I recall. I can check tomorrow if necessary.
>
> No kconfig warnings. I checked with your .config file.
>
> > > Please post your full .config file.
>
> Just a matter of module i2c-core calls of_ functions and module of_i2c calls
> i2c_ functions. Hmph. Something for Grant, Jean, and Ben to work out.
As far as I can see this is caused by this commit from Grant:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5
Mikael, can you please try reverting this patch and see if it solves
your problem?
--
Jean Delvare
^ permalink raw reply
* Re: ppc44x - how do i optimize driver for tlb hits
From: Josh Boyer @ 2010-09-24 10:30 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Ayman El-Khashab
In-Reply-To: <1285303432.14081.28.camel@pasglop>
On Fri, Sep 24, 2010 at 02:43:52PM +1000, Benjamin Herrenschmidt wrote:
>> The DMA is what I use in the "real world case" to get data into and out
>> of these buffers. However, I can disable the DMA completely and do only
>> the kmalloc. In this case I still see the same poor performance. My
>> prefetching is part of my algo using the dcbt instructions. I know the
>> instructions are effective b/c without them the algo is much less
>> performant. So yes, my prefetches are explicit.
>
>Could be some "effect" of the cache structure, L2 cache, cache geometry
>(number of ways etc...). You might be able to alleviate that by changing
>the "stride" of your prefetch.
>
>Unfortunately, I'm not familiar enough with the 440 micro architecture
>and its caches to be able to help you much here.
Also, doesn't kmalloc have a limit to the size of the request it will
let you allocate? I know in the distant past you could allocate 128K
with kmalloc, and 2M with an explicit call to get_free_pages. Anything
larger than that had to use vmalloc. The limit might indeed be higher
now, but a 4MB kmalloc buffer sounds very large, given that it would be
contiguous pages. Two of them even less so.
>> Ok, I will give that a try ... in addition, is there an easy way to use
>> any sort of gprof like tool to see the system performance? What about
>> looking at the 44x performance counters in some meaningful way? All
>> the experiments point to the fetching being slower in the full program
>> as opposed to the algo in a testbench, so I want to determine what it is
>> that could cause that.
>
>Does it have any useful performance counters ? I didn't think it did but
>I may be mistaken.
No, it doesn't.
josh
^ permalink raw reply
* Re: [PATCH] irqbalance, powerpc: add IRQs without settable SMP affinity to banned list
From: Neil Horman @ 2010-09-24 10:37 UTC (permalink / raw)
To: Michael Neuling
Cc: linuxppc-dev, linux-kernel, Neil Horman, Arjen Van De Ven, arjan
In-Reply-To: <25264.1285311394@neuling.org>
On Fri, Sep 24, 2010 at 04:56:34PM +1000, Michael Neuling wrote:
>
> > > > size_t size = 0;
> > > > FILE *file;
> > > > sprintf(buf, "/proc/irq/%i/smp_affinity", number);
> > > > - file = fopen(buf, "r");
> > > > + file = fopen(buf, "r+");
> > > > if (!file)
> > > > continue;
> > > > if (getline(&line, &size, file)==0) {
> > > > @@ -89,7 +89,14 @@
> > > > continue;
> > > > }
> > > > cpumask_parse_user(line, strlen(line), irq->mask);
> > > > - fclose(file);
> > > > + /*
> > > > + * Check that we can write the affinity, if
> > > > + * not take it out of the list.
> > > > + */
> > > > + if (fputs(line, file) == EOF)
> > > > + can_set = 0;
> >
> > > This is maybe a nit, but writing to the affinity file can fail for a few
> > > different reasons, some of them permanent, some transient. For instance, if
> > > we're in a memory constrained condition temporarily irq_affinity_proc_write
> > > might return -ENOMEM.
> >
> > Yeah true, usually followed shortly by your kernel going so far into
> > swap you never get it back, or OOMing, but I guess it's possible.
> >
> > > Might it be better to modify this code so that, instead
> > > of using fputs to merge the various errors into an EOF, we use some other write
> > > method that lets us better determine the error and selectively ban the interrupt
> > > only for those errors which we consider permanent?
> >
> > Yep. It seems fputs() gives you no way to get the actual error from
> > write(), so it looks like we'll need to switch to open/write, but that's
> > probably not so terrible.
>
> fclose inherits the error from fputs and it sets errno correctly. Below
> uses this to catch only EIO errors and mark them for the banned list.
>
> Mikey
>
> irqbalance, powerpc: add IRQs without settable SMP affinity to banned list
>
> On pseries powerpc, IPIs are registered with an IRQ number so
> /proc/interrupts looks like this on a 2 core/2 thread machine:
>
>           CPU0       CPU1       CPU2       CPU3
>  16:    3164282    3290514    1138794     983121   XICS      Level     IPI
>  18:    2605674          0     304994          0   XICS      Level     lan0
>  30:     400057          0     169209          0   XICS      Level     ibmvscsi
> LOC:     133734      77250     106425      91951   Local timer interrupts
> SPU:          0          0          0          0   Spurious interrupts
> CNT:          0          0          0          0   Performance monitoring interrupts
> MCE:          0          0          0          0   Machine check exceptions
>
> Unfortunately this means irqbalance attempts to set the affinity of IPIs
> which is not possible. So in the above case, when irqbalance is in
> performance mode due to heavy IPI, lan0 and ibmvscsi activity, it
> sometimes attempts to put the IPIs on one core (CPU0&1) and lan0 and
> ibmvscsi on the other core (CPU2&3). This is suboptimal as we want lan0
> and ibmvscsi to be on separate cores and IPIs to be ignored.
>
> When irqbalance attempts writes to the IPI smp_affinity (i.e.
> /proc/irq/16/smp_affinity in the above example) it fails with an EIO but
> irqbalance currently ignores this.
>
> This patch catches these write failures and in this case adds that IRQ
> number to the banned IRQ list. This will catch the above IPI case and
> any other IRQ where the SMP affinity can't be set.
>
> Tested on POWER6, POWER7 and x86.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
>
> Index: irqbalance/irqlist.c
> ===================================================================
> --- irqbalance.orig/irqlist.c
> +++ irqbalance/irqlist.c
> @@ -28,6 +28,7 @@
> #include <unistd.h>
> #include <sys/types.h>
> #include <dirent.h>
> +#include <errno.h>
>
> #include "types.h"
> #include "irqbalance.h"
> @@ -67,7 +68,7 @@
> DIR *dir;
> struct dirent *entry;
> char *c, *c2;
> - int nr , count = 0;
> + int nr , count = 0, can_set = 1;
> char buf[PATH_MAX];
> sprintf(buf, "/proc/irq/%i", number);
> dir = opendir(buf);
> @@ -80,7 +81,7 @@
> size_t size = 0;
> FILE *file;
> sprintf(buf, "/proc/irq/%i/smp_affinity", number);
> - file = fopen(buf, "r");
> + file = fopen(buf, "r+");
> if (!file)
> continue;
> if (getline(&line, &size, file)==0) {
> @@ -89,7 +90,13 @@
> continue;
> }
> cpumask_parse_user(line, strlen(line), irq->mask);
> - fclose(file);
> + /*
> + * Check that we can write the affinity, if
> + * not take it out of the list.
> + */
> + fputs(line, file);
> + if (fclose(file) && errno == EIO)
> + can_set = 0;
> free(line);
> } else if (strcmp(entry->d_name,"allowed_affinity")==0) {
> char *line = NULL;
> @@ -122,7 +129,7 @@
> count++;
>
> /* if there is no choice in the allowed mask, don't bother to balance */
> - if (count<2)
> + if ((count<2) || (can_set == 0))
> irq->balance_level = BALANCE_NONE;
>
>
>
Thank you, this looks good to me, I'll integrate this shortly.
Neil
^ permalink raw reply
* Re: [BUG 2.6.36-rc5] of_i2c.ko <-> i2c-core.ko dependency loop
From: Mikael Pettersson @ 2010-09-24 10:50 UTC (permalink / raw)
To: Jean Delvare
Cc: Mikael Pettersson, linux-kernel, Randy Dunlap, linux-i2c,
linuxppc-dev
In-Reply-To: <20100924111209.3c071da3@endymion.delvare>
Jean Delvare writes:
> On Thu, 23 Sep 2010 15:05:59 -0700, Randy Dunlap wrote:
> > On Thu, 23 Sep 2010 22:16:32 +0200 Mikael Pettersson wrote:
> > > Randy Dunlap writes:
> > > > No kconfig warnings?
> > >
> > > Not that I recall. I can check tomorrow if necessary.
> >
> > No kconfig warnings. I checked with your .config file.
> >
> > > > Please post your full .config file.
> >
> > Just a matter of module i2c-core calls of_ functions and module of_i2c calls
> > i2c_ functions. Hmph. Something for Grant, Jean, and Ben to work out.
>
> As far as I can see this is caused by this commit from Grant:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5
>
> Mikael, can you please try reverting this patch and see if it solves
> your problem?
Yes, reverting the above commit from 2.6.36-rc5 eliminated the warnings,
and I was able to insmod the i2c-{core,dev,powermac}.ko modules.
/Mikael
^ permalink raw reply
* RE: MPC8641D PEX: programming OWBAR in Endpoint mode?
From: David Hagood @ 2010-09-24 10:50 UTC (permalink / raw)
To: Chen, Tiejun; +Cc: linuxppc-dev
In-Reply-To: <52CF90264091A14888078A031D780F4306C8C176@ism-mail03.corp.ad.wrs.com>
On Fri, 2010-09-24 at 07:09 +0200, Chen, Tiejun wrote:
>
> Right but this should be done for RC mode, not for EP mode we're
> discussing.
>
> Tiejun
According to the Freescale documentation, outbound is just as valid for
endpoint as for root complex - indeed, to generate MSIs from software
REQUIRES programming an outbound ATMU to access the host's APIC.
Moreover, ANY PCI endpoint SHOULD be able to do bus master access, and
that is done by the outbound ATMUs.
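
As a rough illustration of what programming one outbound window involves, here is a sketch that only computes the three register values and touches no hardware. The encoding (addresses stored shifted right by 12, window size stored as log2(size) - 1, enable in the top bit, memory read/write type bits 0x00044000) is my understanding of how the Linux fsl_pci setup code fills these registers. Treat it as an assumption and check it against the MPC8641D reference manual. The struct and function names are invented, and the addresses in the usage note are just an example.

```c
#include <stdint.h>

/* Values destined for one outbound window's OTAR/OWBAR/OWAR registers. */
struct atmu_window {
    uint32_t otar;   /* translated (PCIe side) address >> 12 */
    uint32_t owbar;  /* local (PPC side) base address >> 12  */
    uint32_t owar;   /* enable, transaction types, size      */
};

#define OWAR_EN      0x80000000u  /* assumed: window enable            */
#define OWAR_MEM_RW  0x00044000u  /* assumed: memory read/write types  */

/* log2 of a power-of-two value */
static unsigned int log2_u64(uint64_t v)
{
    unsigned int n = 0;

    while (v >>= 1)
        n++;
    return n;
}

/* Encode a window mapping [local_phys, local_phys + size) to pci_addr.
 * Only 32-bit PCI addresses are handled (no extended address register).
 * Returns 0 on success, -1 if size or alignment is unusable. */
static int atmu_encode_outbound(uint64_t local_phys, uint64_t pci_addr,
                                uint64_t size, struct atmu_window *w)
{
    if (size < 4096 || (size & (size - 1)))
        return -1;                      /* must be a power of two >= 4 KB */
    if ((local_phys | pci_addr) & (size - 1))
        return -1;                      /* both ends must be size-aligned */
    if (pci_addr >> 32)
        return -1;

    w->otar  = (uint32_t)(pci_addr >> 12);
    w->owbar = (uint32_t)(local_phys >> 12);
    w->owar  = OWAR_EN | OWAR_MEM_RW | (log2_u64(size) - 1);
    return 0;
}
```

For a 4 KB window at local 0xC0000000 aimed at PCI address 0xFEE00000 (the usual x86 MSI target, used here purely as an example), this yields OTAR 0x000FEE00, OWBAR 0x000C0000 and OWAR 0x8004400B.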
^ permalink raw reply
* Re: [BUG 2.6.36-rc5] of_i2c.ko <-> i2c-core.ko dependency loop
From: Jean Delvare @ 2010-09-24 11:39 UTC (permalink / raw)
To: Mikael Pettersson, Grant Likely
Cc: Randy Dunlap, linuxppc-dev, linux-kernel, linux-i2c
In-Reply-To: <19612.33369.175358.879889@pilspetsen.it.uu.se>
Hi Mikael,
On Fri, 24 Sep 2010 12:50:01 +0200, Mikael Pettersson wrote:
> Jean Delvare writes:
> > As far as I can see this is caused by this commit from Grant:
> >
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5
> >
> > Mikael, can you please try reverting this patch and see if it solves
> > your problem?
>
> Yes, reverting the above commit from 2.6.36-rc5 eliminated the warnings,
> and I was able to insmod the i2c-{core,dev,powermac}.ko modules.
Thanks for testing and reporting. Grant, unless you come up with a fix
very quickly, I'll have to revert
959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5 for 2.6.36.
--
Jean Delvare
^ permalink raw reply
* Re: [PATCH 6/8] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
From: Alan Cox @ 2010-09-24 11:52 UTC (permalink / raw)
To: Christian Riesch
Cc: John Stultz, Rodolfo Giometti, Arnd Bergmann, Peter Zijlstra,
linux-api, devicetree-discuss, linux-kernel, David Miller,
Thomas Gleixner, netdev, Christoph Lameter, linuxppc-dev,
Richard Cochran, linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <4C9BC620.3@riesch.at>
> However, if the clock selected by the BMC is switched off, loses its
> network connection..., the second best clock is selected by the BMC and
> becomes master. This clock may be less accurate and thus our slave clock
> has to switch from one notion of time to another. Is that the conflict
> you mentioned?
No you get situations where you have policy reasons for trusting
particular clocks for particular things.
So you may have a PTP or NTP clock providing basic system time but also
have other PTP clocks that are actually being used for synchronization
work.
With NTP it hasn't so far been a big issue - NTP isn't used for industrial
high precision control, and in the cases where we end up with multiple NTP
clocks it's on virtualised systems where it is isolated.
With high precision clocks you sometimes want to honour a specific PTP
time source and use it rather than try and merge it with your other time
sources (which may differ from the equipment elsewhere). What matters is
things like all the parts of a several mile long conveyor belt of hot
steel slab stopping at the same moment [1].
In lots of control applications you've got assorted different time planes
which wish to talk their own time and you have to accept it, so we need
to support that kind of use.
I agree entirely the normal boring 'I installed my distro and..' case for
PTP or for NTP is merging all the sources, running the algorithm and using
the system time for it. Likewise almost all "normal" application code
will be watching system time.
Alan
[1] Which was my first encounter with writing Vax/VMS assembly language
^ permalink raw reply
* Re: ppc44x - how do i optimize driver for tlb hits
From: Ayman El-Khashab @ 2010-09-24 13:08 UTC (permalink / raw)
To: Josh Boyer; +Cc: linuxppc-dev
In-Reply-To: <20100924103034.GA27958@zod.rchland.ibm.com>
On Fri, Sep 24, 2010 at 06:30:34AM -0400, Josh Boyer wrote:
> On Fri, Sep 24, 2010 at 02:43:52PM +1000, Benjamin Herrenschmidt wrote:
> >> The DMA is what I use in the "real world case" to get data into and out
> >> of these buffers. However, I can disable the DMA completely and do only
> >> the kmalloc. In this case I still see the same poor performance. My
> >> prefetching is part of my algo using the dcbt instructions. I know the
> >> instructions are effective b/c without them the algo is much less
> >> performant. So yes, my prefetches are explicit.
> >
> >Could be some "effect" of the cache structure, L2 cache, cache geometry
> >(number of ways etc...). You might be able to alleviate that by changing
> >the "stride" of your prefetch.
My original theory was that it was having lots of cache misses. But since
the algorithm works standalone fast and uses large enough buffers (4MB),
much of the cache is flushed and replaced with my data. The cache is 32K,
8-way, 32 bytes/line. I've crafted the algorithm to use those parameters.
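
To make the stride remark concrete, here is the arithmetic for the geometry stated above (32 KB, 8-way, 32-byte lines), taken at face value and assuming a plain modulo-indexed cache. That gives 32768 / (8 * 32) = 128 sets, so any two addresses that differ by a multiple of 128 * 32 = 4096 bytes compete for the same set. The macro and function names are only for illustration.

```c
#include <stdint.h>

#define CACHE_BYTES (32u * 1024u)
#define CACHE_WAYS  8u
#define LINE_BYTES  32u
#define CACHE_SETS  (CACHE_BYTES / (CACHE_WAYS * LINE_BYTES))  /* 128 */

/* Set that a given byte address falls into, assuming simple
 * (address / line size) modulo (number of sets) indexing. */
static unsigned int cache_set_index(uintptr_t addr)
{
    return (unsigned int)((addr / LINE_BYTES) % CACHE_SETS);
}

/* Two addresses conflict when they index the same set; more than
 * CACHE_WAYS live lines in one set forces an eviction. */
static int same_cache_set(uintptr_t a, uintptr_t b)
{
    return cache_set_index(a) == cache_set_index(b);
}
```

So a prefetch stride that is a multiple of 4096 bytes keeps landing in one set, while a stride of one line (32 bytes) walks through all 128 sets before wrapping.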
> >
> >Unfortunately, I'm not familiar enough with the 440 micro architecture
> >and its caches to be able to help you much here.
>
> Also, doesn't kmalloc have a limit to the size of the request it will
> let you allocate? I know in the distant past you could allocate 128K
> with kmalloc, and 2M with an explicit call to get_free_pages. Anything
> larger than that had to use vmalloc. The limit might indeed be higher
> now, but a 4MB kmalloc buffer sounds very large, given that it would be
> contiguous pages. Two of them even less so.
I thought so too, but at least in the current implementation we found
empirically that we could kmalloc up to but no more than 4MB. We have
also tried an approach in user memory and then using "get_user_pages"
and building a scatter-gather. We found that the compare code doesn't
perform any better.
I suppose another option is to use the kernel profiling option I
always see but have never used. Is that a viable option to figure out
what is happening here?
ayman
^ permalink raw reply
* Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
From: Richard Cochran @ 2010-09-24 13:14 UTC (permalink / raw)
To: Alan Cox
Cc: John Stultz, Rodolfo Giometti, Arnd Bergmann, Peter Zijlstra,
linux-api, devicetree-discuss, linux-kernel, Thomas Gleixner,
netdev, Christoph Lameter, linuxppc-dev, David Miller,
linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <20100923213654.0c64b047@lxorguk.ukuu.org.uk>
On Thu, Sep 23, 2010 at 09:36:54PM +0100, Alan Cox wrote:
> Drop the clockid_t and swap it for a file handle like a proper Unix or
> Linux interface. The rest is much the same
>
> fd = open /sys/class/timesource/[whatever]
>
> various queries you may want to do to check the name etc
>
> fclock_adjtime(fd, ...)
Okay, but let's extend the story:
clock_gettime(fd, ...);
clock_settime(fd, ...);
timer_create(fd, ...);
Can you agree to that as well?
(We would need to ensure that 'fd' avoids the range 0 to MAX_CLOCKS).
Richard
^ permalink raw reply
* Re: [BUG 2.6.36-rc5] of_i2c.ko <-> i2c-core.ko dependency loop
From: Grant Likely @ 2010-09-24 13:48 UTC (permalink / raw)
To: Jean Delvare, Mikael Pettersson
Cc: Randy Dunlap, linuxppc-dev, linux-kernel, linux-i2c
In-Reply-To: <20100924133932.4320b993@endymion.delvare>
"Jean Delvare" <khali@linux-fr.org> wrote:
>Hi Mikael,
>
>On Fri, 24 Sep 2010 12:50:01 +0200, Mikael Pettersson wrote:
>> Jean Delvare writes:
>> > As far as I can see this is caused by this commit from Grant:
>> >
>> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5
>> >
>> > Mikael, can you please try reverting this patch and see if it solves
>> > your problem?
>>
>> Yes, reverting the above commit from 2.6.36-rc5 eliminated the warnings,
>> and I was able to insmod the i2c-{core,dev,powermac}.ko modules.
>
>Thanks for testing and reporting. Grant, unless you come up with a fix
>very quickly, I'll have to revert
>959e85f7751c33d1a2dabc5cc3fe2ed0db7052e5 for 2.6.36.
I'll get a fix out today.
g.
>
>--
>Jean Delvare
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
^ permalink raw reply
* Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
From: Richard Cochran @ 2010-09-24 13:50 UTC (permalink / raw)
To: john stultz
Cc: Rodolfo Giometti, Arnd Bergmann, Peter Zijlstra, linux-api,
devicetree-discuss, linux-kernel, Thomas Gleixner, netdev,
Christoph Lameter, linuxppc-dev, David Miller, linux-arm-kernel,
Krzysztof Halasa
In-Reply-To: <1285270733.2587.46.camel@localhost.localdomain>
On Thu, Sep 23, 2010 at 12:38:53PM -0700, john stultz wrote:
> On Thu, 2010-09-23 at 19:30 +0200, Richard Cochran wrote:
> > /sys/class/timesource/<name>/id
> > /sys/class/ptp/ptp_clock_X/id
> >
> So yea, I'm not a fan of the "timesource" sysfs interface. One, I think
> the name is poor (posix_clocks or something a little more specific would
> be an improvement), and second, I don't like the dictionary interface,
> where one looks up the clock by name.
>
> Instead, I think having the id hanging off the class driver is much
> better, as it allows mapping the actual hardware to the id more clearly.
>
> So I'd drop the "timesource" listing. And maybe change "id" to
> "clock_id" so its a little more clear what the id is for.
Okay, I will drop /sys/class/timesource (hope Alan Cox agrees :)
I threw it out there mostly for the sake of discussion. I imagined
that there could be other properties in that directory, like time
scale (TAI, UTC, etc). But it seems like we don't really need anything
in that direction.
> > 3.3 Synchronizing the Linux System Time
> > ========================================
> >
> > One could offer a PHC as a combined clock source and clock event
> > device. The advantage of this approach would be that it obviates
> > the need for synchronization when the PHC is selected as the system
> > timer. However, some PHCs, namely the PHY based clocks, cannot be
> > used in this way.
>
> Again, I'd scratch this.
Okay, I only wanted to preempt the question which people are asking
all the time: why can't it work with the system clock transparently?
> > Instead, the patch set provides a way to offer a Pulse Per Second
> > (PPS) event from the PHC to the Linux PPS subsystem. A user space
> > application can read the PPS events and tune the system clock, just
> > like when using other external time sources like radio clocks or
> > GPS.
>
> Forgive me for a bit of a tangent here:
> So while I think this PPS method is a neat idea, I'm a little curious
> how much of a difference the PPS method for syncing the clock would be
> over just a simple reading of the two clocks and correcting the offset.
>
> It seems much of it depends on the read latency of the PTP hardware vs
> the interrupt latency. Also the PTP clock granularity would affect the
> read accuracy (like on the RTC, you don't really know how close to the
> second boundary you are).
>
> Have you done any such measurements between the two methods?
I have not yet tested how well the PPS method works, but I expect at
least as good results as when using a GPS.
> I just
> wonder if it would actually be something noticeable, and if it's not, how
> much lighter this patch-set would be without the PPS connection.
As you say, the problem with just reading two clocks at nearly the
same time is that you have two uncertain operations. If you use a PPS,
then there is only one clock to read, and that clock is the system
clock, which hopefully is not too slow to read!
In addition, PHY reads can sleep, and that surely won't work. Even with
MAC PHCs, reading outside of interrupt context makes you vulnerable to
other interrupts.
> Again, this isn't super critical, just trying to make sure we don't end
> up adding a bunch of code that doesn't end up being used.
The PPS hooks are really only just a few lines of code.
The great advantage of a PPS approach over an ad-hoc "read two clocks
and compare", is that, with a steady, known sample rate, you can
analyze and predict your control loop behavior. There is lots of
literature available on how to do it. IMHO, that is the big weakness
of the timecompare.c stuff used in the current IGB driver.
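
To illustrate the point about a fixed sample rate, here is a minimal sketch of a proportional-integral (PI) servo fed once per PPS edge. It is not taken from the patch set or from any existing PTP stack; the struct, the function names and the gains are all assumptions. With one sample per second, an offset in nanoseconds maps directly to a correction in parts per billion, which is what makes the loop easy to analyze.

```c
/* State of a textbook PI controller; gains are dimensionless here
 * because the sample interval is fixed at one second. */
struct pi_servo {
    double kp;        /* proportional gain */
    double ki;        /* integral gain     */
    double integral;  /* accumulated integral term, in ppb */
};

static void pi_servo_init(struct pi_servo *s, double kp, double ki)
{
    s->kp = kp;
    s->ki = ki;
    s->integral = 0.0;
}

/* Feed one PPS sample. offset_ns is (reference - system clock) at the
 * pulse. Returns the frequency correction in ppb; a positive value
 * means the system clock should run faster. */
static double pi_servo_sample(struct pi_servo *s, double offset_ns)
{
    s->integral += s->ki * offset_ns;
    return s->kp * offset_ns + s->integral;
}
```

With kp = 0.5 and ki = 0.25, an initial offset of 100 ns produces a 75 ppb correction, and a following sample of -40 ns produces -5 ppb.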
> Also PPS
> interrupts are awfully frequent, so systems concerned with power-saving
> and deep idles probably would like something that could be done at a
> more coarse interval.
We could always make the pulse rate programmable, for power-saving
applications.
> > 4.1 Supported Hardware Clocks
> > ==============================
> >
> > + Standard Linux system timer
> > This driver exports the standard Linux timer as a PTP clock.
> > Although this duplicates CLOCK_REALTIME, the code serves as a
> > simple example for driver development and lets people without
> > special hardware try the new API.
>
> Still not a fan of this one, figure the app should handle the special
> case where there are no PTP clocks and just use CLOCK_REALTIME rather
> than funneling CLOCK_REALTIME through the PTP interface.
It is really just an example, and for people who want to test drive
the API. It can surely be removed before the final version...
Thanks for your comments,
Richard
^ permalink raw reply
* Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
From: Alan Cox @ 2010-09-24 14:02 UTC (permalink / raw)
To: Richard Cochran
Cc: John Stultz, Rodolfo Giometti, Arnd Bergmann, Peter Zijlstra,
linux-api, devicetree-discuss, linux-kernel, Thomas Gleixner,
netdev, Christoph Lameter, linuxppc-dev, David Miller,
linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <20100924131407.GA3113@riccoc20.at.omicron.at>
On Fri, 24 Sep 2010 15:14:07 +0200
Richard Cochran <richardcochran@gmail.com> wrote:
> On Thu, Sep 23, 2010 at 09:36:54PM +0100, Alan Cox wrote:
> > Drop the clockid_t and swap it for a file handle like a proper Unix or
> > Linux interface. The rest is much the same
> >
> > fd = open /sys/class/timesource/[whatever]
> >
> > various queries you may want to do to check the name etc
> >
> > fclock_adjtime(fd, ...)
>
> Okay, but let's extend the story:
>
> clock_gettime(fd, ...);
>
> clock_settime(fd, ...);
>
> timer_create(fd, ...);
>
> Can you agree to that as well?
>
> (We would need to ensure that 'fd' avoids the range 0 to MAX_CLOCKS).
You can't avoid that range as you might like, because the behaviour of
file handle numbering is defined by the standards. Hence the "f*"
versions of the calls (and of lots of other stuff).
Whether you add new syscalls or do the fd passing using flags and hide
the ugly bits in glibc is another question.
^ permalink raw reply
* Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
From: Alan Cox @ 2010-09-24 14:07 UTC (permalink / raw)
To: Alan Cox
Cc: John Stultz, Rodolfo Giometti, Arnd Bergmann, Peter Zijlstra,
linux-api, devicetree-discuss, linux-kernel, David Miller,
Thomas Gleixner, netdev, Christoph Lameter, linuxppc-dev,
Richard Cochran, linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <20100924150246.0e6064b6@lxorguk.ukuu.org.uk>
> You can't avoid that range as you might like, because the behaviour of
> file handle numbering is defined by the standards. Hence the "f*"
> versions of the calls (and of lots of other stuff).
>
> Whether you add new syscalls or do the fd passing using flags and hide
> the ugly bits in glibc is another question.
To add an example of what I mean you might end up defining "CLOCK_FD" to
indicate to use the fd in the struct, but given syscalls are trivial
codewise and would end up as
fclock_foo(int fd, blah)
{
    clock = fd_to_clock(fd);
    if (error)
        return error;
    clock_do_foo(clock, blah);
    clock_put(clock);
}
and
clock_foo(int posixid, blah)
{
    clock = posix_to_clock(posixid);
    ...
    rest same
}
as wrappers it seems hardly worth adding ugly hacks
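
Spelled out as compilable C, the two-wrapper shape above might look like the following userspace mock. Everything here is hypothetical: the lookup tables stand in for the real file table and POSIX clock array, the "gettime" operation stands in for "foo", and the error codes are my choice. The point it shows is that fd 0 and POSIX clock id 0 live in separate namespaces, so no number range has to be reserved.

```c
#include <errno.h>
#include <stddef.h>

struct clock_dev {
    int refcount;      /* held while a call is in flight */
    long long now_ns;  /* stand-in for the clock's time  */
};

#define MAX_POSIX_CLOCKS 4
#define MAX_FDS          8

static struct clock_dev posix_clocks[MAX_POSIX_CLOCKS];
static struct clock_dev *fd_table[MAX_FDS];  /* NULL: fd is not a clock */

static struct clock_dev *clock_get(struct clock_dev *c)
{
    if (c)
        c->refcount++;
    return c;
}

static void clock_put(struct clock_dev *c)
{
    c->refcount--;
}

static struct clock_dev *fd_to_clock(int fd)
{
    if (fd < 0 || fd >= MAX_FDS)
        return NULL;
    return clock_get(fd_table[fd]);
}

static struct clock_dev *posix_to_clock(int id)
{
    if (id < 0 || id >= MAX_POSIX_CLOCKS)
        return NULL;
    return clock_get(&posix_clocks[id]);
}

/* The shared back end both wrappers funnel into. */
static int clock_do_gettime(struct clock_dev *c, long long *ns)
{
    *ns = c->now_ns;
    return 0;
}

/* fd-based entry point */
static int fclock_gettime(int fd, long long *ns)
{
    struct clock_dev *c = fd_to_clock(fd);
    int ret;

    if (!c)
        return -EBADF;
    ret = clock_do_gettime(c, ns);
    clock_put(c);
    return ret;
}

/* POSIX-id-based entry point; same body apart from the lookup */
static int clock_gettime_by_id(int id, long long *ns)
{
    struct clock_dev *c = posix_to_clock(id);
    int ret;

    if (!c)
        return -EINVAL;
    ret = clock_do_gettime(c, ns);
    clock_put(c);
    return ret;
}
```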
^ permalink raw reply
* Re: [PATCH 0/8] De-couple sysfs memory directories from memory sections
From: Nathan Fontenot @ 2010-09-24 14:35 UTC (permalink / raw)
To: balbir
Cc: linuxppc-dev, Greg KH, linux-kernel, Dave Hansen, linux-mm,
KAMEZAWA Hiroyuki
In-Reply-To: <20100923184002.GM3952@balbir.in.ibm.com>
On 09/23/2010 01:40 PM, Balbir Singh wrote:
> * Nathan Fontenot <nfont@austin.ibm.com> [2010-09-22 09:15:43]:
>
>> This set of patches decouples the concept that a single memory
>> section corresponds to a single directory in
>> /sys/devices/system/memory/. On systems
>> with large amounts of memory (1+ TB) there are performance issues
>> related to creating the large number of sysfs directories. For
>> a powerpc machine with 1 TB of memory we are creating 63,000+
>> directories. This is resulting in boot times of around 45-50
>> minutes for systems with 1 TB of memory and 8 hours for systems
>> with 2 TB of memory. With this patch set applied I am now seeing
>> boot times of 5 minutes or less.
>>
>> The root of this issue is in sysfs directory creation. Every time
>> a directory is created a string compare is done against all sibling
>> directories to ensure we do not create duplicates. The list of
>> directory nodes in sysfs is kept as an unsorted list, so each
>> creation costs time linear in the number of existing siblings and
>> the total cost grows quadratically as directories are added.
>>
>> The solution provided by this patch set is to allow a single
>> directory in sysfs to span multiple memory sections. This is
>> controlled by an optional architecturally defined function
>> memory_block_size_bytes(). The default definition of this
>> routine returns a memory block size equal to the memory section
>> size. This keeps the layout of sysfs memory directories, as it
>> appears to userspace, the same as it is today.
>>
>> For architectures that define their own version of this routine,
>> as is done for powerpc in this patchset, the view in userspace
>> would change such that each memoryXXX directory would span
>> multiple memory sections. The number of sections spanned would
>> depend on the value reported by memory_block_size_bytes.
>>
>> In both cases a new file 'end_phys_index' is created in each
>> memoryXXX directory. This file will contain the physical id
>> of the last memory section covered by the sysfs directory. For
>> the default case, the value in 'end_phys_index' will be the same
>> as in the existing 'phys_index' file.
>>
>
> What does this mean for memory hotplug or hotunplug?
>
Memory hotplug will function on a memory block size basis. For
architectures that do not define their own memory_block_size_bytes()
routine, they will get the default size and everything will work
the same as it does today.
For architectures that define their own memory_block_size_bytes()
routine and have multiple memory sections per memory block, hotplug
operations will add or remove all of the memory sections in the
memory block.
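
A small sketch of the block/section arithmetic described in the quoted cover letter, in case it helps. The struct and helper names are made up, and the 16 MB section and 256 MB block sizes in the usage note are example values only, not taken from the patches.

```c
/* What one memoryXXX directory would report under this scheme. */
struct mem_block {
    unsigned long phys_index;      /* first section in the block */
    unsigned long end_phys_index;  /* last section in the block  */
};

/* Number of memory sections covered by one sysfs memory block. */
static unsigned long sections_per_block(unsigned long long block_bytes,
                                        unsigned long long section_bytes)
{
    return (unsigned long)(block_bytes / section_bytes);
}

/* Find the block containing section_nr. A hotplug operation on that
 * block acts on every section from phys_index to end_phys_index. */
static struct mem_block block_for_section(unsigned long section_nr,
                                          unsigned long per_block)
{
    struct mem_block b;

    b.phys_index = (section_nr / per_block) * per_block;
    b.end_phys_index = b.phys_index + per_block - 1;
    return b;
}
```

With 16 MB sections and a 256 MB block there are 16 sections per block, so section 37 lands in the block covering sections 32 through 47. In the default case (block size equal to section size) per_block is 1 and end_phys_index equals phys_index.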
-Nathan
^ permalink raw reply
* Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
From: Alan Cox @ 2010-09-24 14:57 UTC (permalink / raw)
To: Richard Cochran
Cc: Rodolfo Giometti, Arnd Bergmann, Peter Zijlstra, john stultz,
devicetree-discuss, linux-kernel, netdev, Thomas Gleixner,
linux-api, Christoph Lameter, linuxppc-dev, David Miller,
linux-arm-kernel, Krzysztof Halasa
In-Reply-To: <20100924135001.GB3113@riccoc20.at.omicron.at>
> > Instead, I think having the id hanging off the class driver is much
> > better, as it allows mapping the actual hardware to the id more clearly.
> >
> > So I'd drop the "timesource" listing. And maybe change "id" to
> > "clock_id" so its a little more clear what the id is for.
>
> Okay, I will drop /sys/class/timesource (hope Alan Cox agrees :)
It makes sense to hang anything off the physical id
> I threw it out there mostly for the sake of discussion. I imagined
> that there could be other properties in that directory, like time
> scale (TAI, UTC, etc). But it seems like we don't really need anything
> in that direction.
They can still hang off the physical device. That's really a detail.
> > interrupts are awfully frequent, so systems concerned with power-saving
> > and deep idles probably would like something that could be done at a
> > more coarse interval.
>
> We could always make the pulse rate programmable, for power-saving
> applications.
I would expect the kernel drivers to be responsible for
- Turning off when they can
- Picking rates that are power optimal for the requirement
The latter is a bit interesting as I don't see anything in any of the
timer APIs to express accuracy (a problem we have in kernel too).
Historically it simply hasn't mattered.
^ permalink raw reply