* RE: [gianfar]bandwidth management problem on mpc8313 based board
From: David Laight @ 2011-06-08 11:25 UTC (permalink / raw)
To: Vijay Nikam, linuxppc-dev
In-Reply-To: <BANLkTi=TVsXiWR1qS5upnM31kVk9qqXTmw@mail.gmail.com>
> Subject: [gianfar]bandwidth management problem on mpc8313 based board
...
> I have mpc8313 powerpc based board with silicon revision 2.1. the
> processor has two ETH ports (eTsec1 and eTsec2) i.e. eth0 and eth1.
> eth0 is 1Gbps port and eth1 is 100Mbps port. On board there is L2
> switch from TANTOS2G (psb6972) supports one port 1Gbps,
> and from switch there are 4 more eth ports derived which are 100Mbps
> ports and port based VLAN is configured for this purpose.
>
> The interface between switch and eth0 (port of processor) is RGMII. So
> the processor port and switch port are connected on 1Gbps Link.
...
> After this I started to perform bandwidth test using iperf tool.
> When I performed this test on one port out of 4 derived ports I am
> getting bandwidth in the range of 80-85Mbps
> but when the same test is performed on 2 ports simultaneously then the
> per port bandwidth is reduced to 40-45Mbps.
To summarise, you have a Ge port connected by RGMII (cross over) to
an on-board switch that is configured to use VLAN tagging to drive
four external 100M ports?
I see three likely reasons for the aggregate throughput being constant:
1) The switch has limited throughput/buffering
2) The host really is 100% busy
3) The remote system has limited throughput
I'd vote for the system being busy and 'top' (or whatever you are
using) lying about the cpu usage. Measuring free cpu time by counting
it in a low priority process is much more accurate than relying on
the 'code interrupted by timer tick' scheme.
(Clearly the scheduler could use a high-res timestamp on entry/exit
to the idle loop and/or process switch - but, to my knowledge, the
linux kernel only uses the timer interrupt.)
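A minimal sketch of the counting approach described above, assuming a
POSIX system with clock_gettime(). The names now_ns, count_spins and
idle_percent are made up for illustration. The idea is to take a baseline
count on an otherwise idle board, then repeat the count at the lowest
priority while iperf is running; the ratio approximates the CPU time that
was really left over.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Spin for window_ns of wall-clock time and count loop iterations.
 * When run at low priority, the count shrinks as other work takes the CPU.
 */
uint64_t count_spins(uint64_t window_ns)
{
	uint64_t end = now_ns() + window_ns;
	uint64_t n = 0;

	while (now_ns() < end)
		n++;
	return n;
}

/* Share of the CPU left over, given the idle baseline and the loaded count. */
unsigned idle_percent(uint64_t baseline, uint64_t loaded)
{
	if (baseline == 0)
		return 0;
	if (loaded >= baseline)
		return 100;
	return (unsigned)(loaded * 100 / baseline);
}
```

The caller would lower its own priority first (for example with nice(19))
and use the same window length for both measurements. On older C libraries
clock_gettime() may need linking with -lrt.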
David
^ permalink raw reply
* Re: [gianfar]bandwidth management problem on mpc8313 based board
From: Vijay Nikam @ 2011-06-08 10:51 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <20110607142213.2851f92e@schlenkerla.am.freescale.net>
Hello Scott,
Thanks for the prompt reply.
> What's your CPU utilization? The CPU may just not be able to keep up with
> that much traffic, with the software you're running.
The software I am using to check bandwidth is 'iperf'. Without running iperf the
CPU utilization varies around 30-50% and with iperf running it shoots
up to 99.9%.
> What packet size are you using?
The packet size is 1518 + VLAN_Tag (4 bytes) = 1522 bytes.
Another point I would like to clarify: the mpc8313 has a 1Gbps eth0 (eTsec1).
If more than 50% of CPU time is available, why should the total bandwidth
be limited to less than 100Mbps? At least 400Mbps should be expected; please
correct me if I am wrong!
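As a rough worked example of the frame rates behind these figures, the
arithmetic can be sketched in C. This assumes the reported throughput
counts whole 1522-byte frames, which is only approximately true for iperf
(it reports payload), and the helper names are made up for illustration.

```c
#include <stdint.h>

/*
 * Bytes each frame occupies on the wire beyond the frame itself:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
 */
#define ETH_WIRE_OVERHEAD 20u

/* Frames per second needed to move bits_per_sec using frame_bytes frames. */
uint64_t frames_per_sec(uint64_t bits_per_sec, unsigned frame_bytes)
{
	return bits_per_sec / ((uint64_t)frame_bytes * 8u);
}

/* Upper bound on frames per second for a link, including wire overhead. */
uint64_t line_rate_fps(uint64_t link_bps, unsigned frame_bytes)
{
	return link_bps / ((uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD) * 8u);
}
```

With 1522-byte frames, 85Mbps works out to roughly 7,000 frames per second
and 400Mbps to roughly 32,900, so the per-frame CPU cost has to drop by
more than a factor of four to reach the higher figure.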
Please acknowledge, thanks.
Kind Regards,
Vijay Nikam
On Wed, Jun 8, 2011 at 12:52 AM, Scott Wood <scottwood@freescale.com> wrote:
> On Tue, 7 Jun 2011 18:32:37 +0530
> Vijay Nikam <vijay.t.nikam@gmail.com> wrote:
>
>> Dear All,
>>
>> I have mpc8313 powerpc based board with silicon revision 2.1. the
>> processor has two ETH ports (eTsec1 and eTsec2) i.e. eth0 and eth1.
>> eth0 is 1Gbps port and eth1 is 100Mbps port. On board there is L2
>> switch from TANTOS2G (psb6972) supports one port 1Gbps,
>> and from switch there are 4 more eth ports derived which are 100Mbps
>> ports and port based VLAN is configured for this purpose.
>>
>> The interface between switch and eth0 (port of processor) is RGMII. So
>> the processor port and switch port are connected on 1Gbps Link.
>> The other 4 derived ports (100Mbps) are used to connect to external world.
>> On this board Embedded Linux is running of kernel version 2.6.23 with HRT patch.
>
> That's rather old.
>
>> The ethernet controller driver in use is "gianfar" version 1.3
>> The driver is configured properly as it determines both links 1000Mbps
>> (eth0) and 100Mbps (eth1) also verified with ethtool.
>>
>> After this I started to perform bandwidth test using iperf tool.
>> When I performed this test on one port out of 4 derived ports I am
>> getting bandwidth in the range of 80-85Mbps
>> but when the same test is performed on 2 ports simultaneously then the
>> per port bandwidth is reduced to 40-45Mbps.
>>
>> But my understanding is all of the 4 ports should support 100Mbps
>> bandwidth simultaneously (as base port is 1Gbps).
>> Then why bandwidth gets reduced when more than one port are
>> communicating simultaneously?
>> Any reason or suggestion I should check for this problem?
>
> What's your CPU utilization? The CPU may just not be able to keep up with
> that much traffic, with the software you're running.
>
> What packet size are you using?
>
> -Scott
>
>
^ permalink raw reply
* [PATCH] gianfar:localized filer table
From: Jiajun Wu @ 2011-06-08 7:46 UTC (permalink / raw)
To: netdev, davem; +Cc: Jiajun Wu, linuxppc-dev
Each eTSEC device should own a localized filer table instead of sharing
the global ftp_rqfpr/ftp_rqfcr arrays.
Signed-off-by: Jiajun Wu <b06378@freescale.com>
---
drivers/net/gianfar.c | 29 ++++++++----------
drivers/net/gianfar.h | 8 +++--
drivers/net/gianfar_ethtool.c | 64 +++++++++++++++++++++--------------------
3 files changed, 51 insertions(+), 50 deletions(-)
diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index ff60b23..2dfcc80 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -10,7 +10,7 @@
* Maintainer: Kumar Gala
* Modifier: Sandeep Gopalpet <sandeep.kumar@freescale.com>
*
- * Copyright 2002-2009 Freescale Semiconductor, Inc.
+ * Copyright 2002-2009, 2011 Freescale Semiconductor, Inc.
* Copyright 2007 MontaVista Software, Inc.
*
* This program is free software; you can redistribute it and/or modify it
@@ -476,9 +476,6 @@ static const struct net_device_ops gfar_netdev_ops = {
#endif
};
-unsigned int ftp_rqfpr[MAX_FILER_IDX + 1];
-unsigned int ftp_rqfcr[MAX_FILER_IDX + 1];
-
void lock_rx_qs(struct gfar_private *priv)
{
int i = 0x0;
@@ -868,28 +865,28 @@ static u32 cluster_entry_per_class(struct gfar_private *priv, u32 rqfar,
rqfar--;
rqfcr = RQFCR_CLE | RQFCR_PID_MASK | RQFCR_CMP_EXACT;
- ftp_rqfpr[rqfar] = rqfpr;
- ftp_rqfcr[rqfar] = rqfcr;
+ priv->ftp_rqfpr[rqfar] = rqfpr;
+ priv->ftp_rqfcr[rqfar] = rqfcr;
gfar_write_filer(priv, rqfar, rqfcr, rqfpr);
rqfar--;
rqfcr = RQFCR_CMP_NOMATCH;
- ftp_rqfpr[rqfar] = rqfpr;
- ftp_rqfcr[rqfar] = rqfcr;
+ priv->ftp_rqfpr[rqfar] = rqfpr;
+ priv->ftp_rqfcr[rqfar] = rqfcr;
gfar_write_filer(priv, rqfar, rqfcr, rqfpr);
rqfar--;
rqfcr = RQFCR_CMP_EXACT | RQFCR_PID_PARSE | RQFCR_CLE | RQFCR_AND;
rqfpr = class;
- ftp_rqfcr[rqfar] = rqfcr;
- ftp_rqfpr[rqfar] = rqfpr;
+ priv->ftp_rqfcr[rqfar] = rqfcr;
+ priv->ftp_rqfpr[rqfar] = rqfpr;
gfar_write_filer(priv, rqfar, rqfcr, rqfpr);
rqfar--;
rqfcr = RQFCR_CMP_EXACT | RQFCR_PID_MASK | RQFCR_AND;
rqfpr = class;
- ftp_rqfcr[rqfar] = rqfcr;
- ftp_rqfpr[rqfar] = rqfpr;
+ priv->ftp_rqfcr[rqfar] = rqfcr;
+ priv->ftp_rqfpr[rqfar] = rqfpr;
gfar_write_filer(priv, rqfar, rqfcr, rqfpr);
return rqfar;
@@ -904,8 +901,8 @@ static void gfar_init_filer_table(struct gfar_private *priv)
/* Default rule */
rqfcr = RQFCR_CMP_MATCH;
- ftp_rqfcr[rqfar] = rqfcr;
- ftp_rqfpr[rqfar] = rqfpr;
+ priv->ftp_rqfcr[rqfar] = rqfcr;
+ priv->ftp_rqfpr[rqfar] = rqfpr;
gfar_write_filer(priv, rqfar, rqfcr, rqfpr);
rqfar = cluster_entry_per_class(priv, rqfar, RQFPR_IPV6);
@@ -921,8 +918,8 @@ static void gfar_init_filer_table(struct gfar_private *priv)
/* Rest are masked rules */
rqfcr = RQFCR_CMP_NOMATCH;
for (i = 0; i < rqfar; i++) {
- ftp_rqfcr[i] = rqfcr;
- ftp_rqfpr[i] = rqfpr;
+ priv->ftp_rqfcr[i] = rqfcr;
+ priv->ftp_rqfpr[i] = rqfpr;
gfar_write_filer(priv, i, rqfcr, rqfpr);
}
}
diff --git a/drivers/net/gianfar.h b/drivers/net/gianfar.h
index fc86f51..ba36dc7 100644
--- a/drivers/net/gianfar.h
+++ b/drivers/net/gianfar.h
@@ -9,7 +9,7 @@
* Maintainer: Kumar Gala
* Modifier: Sandeep Gopalpet <sandeep.kumar@freescale.com>
*
- * Copyright 2002-2009 Freescale Semiconductor, Inc.
+ * Copyright 2002-2009, 2011 Freescale Semiconductor, Inc.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the
@@ -1107,10 +1107,12 @@ struct gfar_private {
/* HW time stamping enabled flag */
int hwts_rx_en;
int hwts_tx_en;
+
+ /*Filer table*/
+ unsigned int ftp_rqfpr[MAX_FILER_IDX + 1];
+ unsigned int ftp_rqfcr[MAX_FILER_IDX + 1];
};
-extern unsigned int ftp_rqfpr[MAX_FILER_IDX + 1];
-extern unsigned int ftp_rqfcr[MAX_FILER_IDX + 1];
static inline int gfar_has_errata(struct gfar_private *priv,
enum gfar_errata err)
diff --git a/drivers/net/gianfar_ethtool.c b/drivers/net/gianfar_ethtool.c
index 493d743..239e333 100644
--- a/drivers/net/gianfar_ethtool.c
+++ b/drivers/net/gianfar_ethtool.c
@@ -9,7 +9,7 @@
* Maintainer: Kumar Gala
* Modifier: Sandeep Gopalpet <sandeep.kumar@freescale.com>
*
- * Copyright 2003-2006, 2008-2009 Freescale Semiconductor, Inc.
+ * Copyright 2003-2006, 2008-2009, 2011 Freescale Semiconductor, Inc.
*
* This software may be used and distributed according to
* the terms of the GNU Public License, Version 2, incorporated herein
@@ -609,15 +609,15 @@ static void ethflow_to_filer_rules (struct gfar_private *priv, u64 ethflow)
if (ethflow & RXH_L2DA) {
fcr = RQFCR_PID_DAH |RQFCR_CMP_NOMATCH |
RQFCR_HASH | RQFCR_AND | RQFCR_HASHTBL_0;
- ftp_rqfpr[priv->cur_filer_idx] = fpr;
- ftp_rqfcr[priv->cur_filer_idx] = fcr;
+ priv->ftp_rqfpr[priv->cur_filer_idx] = fpr;
+ priv->ftp_rqfcr[priv->cur_filer_idx] = fcr;
gfar_write_filer(priv, priv->cur_filer_idx, fcr, fpr);
priv->cur_filer_idx = priv->cur_filer_idx - 1;
fcr = RQFCR_PID_DAL | RQFCR_AND | RQFCR_CMP_NOMATCH |
RQFCR_HASH | RQFCR_AND | RQFCR_HASHTBL_0;
- ftp_rqfpr[priv->cur_filer_idx] = fpr;
- ftp_rqfcr[priv->cur_filer_idx] = fcr;
+ priv->ftp_rqfpr[priv->cur_filer_idx] = fpr;
+ priv->ftp_rqfcr[priv->cur_filer_idx] = fcr;
gfar_write_filer(priv, priv->cur_filer_idx, fcr, fpr);
priv->cur_filer_idx = priv->cur_filer_idx - 1;
}
@@ -626,16 +626,16 @@ static void ethflow_to_filer_rules (struct gfar_private *priv, u64 ethflow)
fcr = RQFCR_PID_VID | RQFCR_CMP_NOMATCH | RQFCR_HASH |
RQFCR_AND | RQFCR_HASHTBL_0;
gfar_write_filer(priv, priv->cur_filer_idx, fcr, fpr);
- ftp_rqfpr[priv->cur_filer_idx] = fpr;
- ftp_rqfcr[priv->cur_filer_idx] = fcr;
+ priv->ftp_rqfpr[priv->cur_filer_idx] = fpr;
+ priv->ftp_rqfcr[priv->cur_filer_idx] = fcr;
priv->cur_filer_idx = priv->cur_filer_idx - 1;
}
if (ethflow & RXH_IP_SRC) {
fcr = RQFCR_PID_SIA | RQFCR_CMP_NOMATCH | RQFCR_HASH |
RQFCR_AND | RQFCR_HASHTBL_0;
- ftp_rqfpr[priv->cur_filer_idx] = fpr;
- ftp_rqfcr[priv->cur_filer_idx] = fcr;
+ priv->ftp_rqfpr[priv->cur_filer_idx] = fpr;
+ priv->ftp_rqfcr[priv->cur_filer_idx] = fcr;
gfar_write_filer(priv, priv->cur_filer_idx, fcr, fpr);
priv->cur_filer_idx = priv->cur_filer_idx - 1;
}
@@ -643,8 +643,8 @@ static void ethflow_to_filer_rules (struct gfar_private *priv, u64 ethflow)
if (ethflow & (RXH_IP_DST)) {
fcr = RQFCR_PID_DIA | RQFCR_CMP_NOMATCH | RQFCR_HASH |
RQFCR_AND | RQFCR_HASHTBL_0;
- ftp_rqfpr[priv->cur_filer_idx] = fpr;
- ftp_rqfcr[priv->cur_filer_idx] = fcr;
+ priv->ftp_rqfpr[priv->cur_filer_idx] = fpr;
+ priv->ftp_rqfcr[priv->cur_filer_idx] = fcr;
gfar_write_filer(priv, priv->cur_filer_idx, fcr, fpr);
priv->cur_filer_idx = priv->cur_filer_idx - 1;
}
@@ -652,8 +652,8 @@ static void ethflow_to_filer_rules (struct gfar_private *priv, u64 ethflow)
if (ethflow & RXH_L3_PROTO) {
fcr = RQFCR_PID_L4P | RQFCR_CMP_NOMATCH | RQFCR_HASH |
RQFCR_AND | RQFCR_HASHTBL_0;
- ftp_rqfpr[priv->cur_filer_idx] = fpr;
- ftp_rqfcr[priv->cur_filer_idx] = fcr;
+ priv->ftp_rqfpr[priv->cur_filer_idx] = fpr;
+ priv->ftp_rqfcr[priv->cur_filer_idx] = fcr;
gfar_write_filer(priv, priv->cur_filer_idx, fcr, fpr);
priv->cur_filer_idx = priv->cur_filer_idx - 1;
}
@@ -661,8 +661,8 @@ static void ethflow_to_filer_rules (struct gfar_private *priv, u64 ethflow)
if (ethflow & RXH_L4_B_0_1) {
fcr = RQFCR_PID_SPT | RQFCR_CMP_NOMATCH | RQFCR_HASH |
RQFCR_AND | RQFCR_HASHTBL_0;
- ftp_rqfpr[priv->cur_filer_idx] = fpr;
- ftp_rqfcr[priv->cur_filer_idx] = fcr;
+ priv->ftp_rqfpr[priv->cur_filer_idx] = fpr;
+ priv->ftp_rqfcr[priv->cur_filer_idx] = fcr;
gfar_write_filer(priv, priv->cur_filer_idx, fcr, fpr);
priv->cur_filer_idx = priv->cur_filer_idx - 1;
}
@@ -670,8 +670,8 @@ static void ethflow_to_filer_rules (struct gfar_private *priv, u64 ethflow)
if (ethflow & RXH_L4_B_2_3) {
fcr = RQFCR_PID_DPT | RQFCR_CMP_NOMATCH | RQFCR_HASH |
RQFCR_AND | RQFCR_HASHTBL_0;
- ftp_rqfpr[priv->cur_filer_idx] = fpr;
- ftp_rqfcr[priv->cur_filer_idx] = fcr;
+ priv->ftp_rqfpr[priv->cur_filer_idx] = fpr;
+ priv->ftp_rqfcr[priv->cur_filer_idx] = fcr;
gfar_write_filer(priv, priv->cur_filer_idx, fcr, fpr);
priv->cur_filer_idx = priv->cur_filer_idx - 1;
}
@@ -705,12 +705,12 @@ static int gfar_ethflow_to_filer_table(struct gfar_private *priv, u64 ethflow, u
}
for (i = 0; i < MAX_FILER_IDX + 1; i++) {
- local_rqfpr[j] = ftp_rqfpr[i];
- local_rqfcr[j] = ftp_rqfcr[i];
+ local_rqfpr[j] = priv->ftp_rqfpr[i];
+ local_rqfcr[j] = priv->ftp_rqfcr[i];
j--;
- if ((ftp_rqfcr[i] == (RQFCR_PID_PARSE |
+ if ((priv->ftp_rqfcr[i] == (RQFCR_PID_PARSE |
RQFCR_CLE |RQFCR_AND)) &&
- (ftp_rqfpr[i] == cmp_rqfpr))
+ (priv->ftp_rqfpr[i] == cmp_rqfpr))
break;
}
@@ -724,20 +724,22 @@ static int gfar_ethflow_to_filer_table(struct gfar_private *priv, u64 ethflow, u
* if it was already programmed, we need to overwrite these rules
*/
for (l = i+1; l < MAX_FILER_IDX; l++) {
- if ((ftp_rqfcr[l] & RQFCR_CLE) &&
- !(ftp_rqfcr[l] & RQFCR_AND)) {
- ftp_rqfcr[l] = RQFCR_CLE | RQFCR_CMP_EXACT |
+ if ((priv->ftp_rqfcr[l] & RQFCR_CLE) &&
+ !(priv->ftp_rqfcr[l] & RQFCR_AND)) {
+ priv->ftp_rqfcr[l] = RQFCR_CLE | RQFCR_CMP_EXACT |
RQFCR_HASHTBL_0 | RQFCR_PID_MASK;
- ftp_rqfpr[l] = FPR_FILER_MASK;
- gfar_write_filer(priv, l, ftp_rqfcr[l], ftp_rqfpr[l]);
+ priv->ftp_rqfpr[l] = FPR_FILER_MASK;
+ gfar_write_filer(priv, l, priv->ftp_rqfcr[l],
+ priv->ftp_rqfpr[l]);
break;
}
- if (!(ftp_rqfcr[l] & RQFCR_CLE) && (ftp_rqfcr[l] & RQFCR_AND))
+ if (!(priv->ftp_rqfcr[l] & RQFCR_CLE) &&
+ (priv->ftp_rqfcr[l] & RQFCR_AND))
continue;
else {
- local_rqfpr[j] = ftp_rqfpr[l];
- local_rqfcr[j] = ftp_rqfcr[l];
+ local_rqfpr[j] = priv->ftp_rqfpr[l];
+ local_rqfcr[j] = priv->ftp_rqfcr[l];
j--;
}
}
@@ -750,8 +752,8 @@ static int gfar_ethflow_to_filer_table(struct gfar_private *priv, u64 ethflow, u
/* Write back the popped out rules again */
for (k = j+1; k < MAX_FILER_IDX; k++) {
- ftp_rqfpr[priv->cur_filer_idx] = local_rqfpr[k];
- ftp_rqfcr[priv->cur_filer_idx] = local_rqfcr[k];
+ priv->ftp_rqfpr[priv->cur_filer_idx] = local_rqfpr[k];
+ priv->ftp_rqfcr[priv->cur_filer_idx] = local_rqfcr[k];
gfar_write_filer(priv, priv->cur_filer_idx,
local_rqfcr[k], local_rqfpr[k]);
if (!priv->cur_filer_idx)
--
1.5.6.5
^ permalink raw reply related
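The idea behind the gianfar patch above can be sketched outside the driver.
This is a simplified illustration, not driver code: struct dev_priv is a
hypothetical, trimmed-down stand-in for struct gfar_private, shadow_write is
a made-up helper, and the value of MAX_FILER_IDX is an assumption. With the
shadow tables held per device, programming a rule on one eTSEC no longer
overwrites the software copy belonging to another.

```c
/* Assumed table size for this sketch; the driver defines its own value. */
#define MAX_FILER_IDX 0xFF

/* Hypothetical, trimmed-down stand-in for struct gfar_private. */
struct dev_priv {
	unsigned int cur_filer_idx;
	/* Per-device shadow copy of the filer table */
	unsigned int ftp_rqfpr[MAX_FILER_IDX + 1];
	unsigned int ftp_rqfcr[MAX_FILER_IDX + 1];
};

/*
 * Record a rule in this device's shadow table only. The real driver
 * also programs the hardware through gfar_write_filer() at this point.
 */
void shadow_write(struct dev_priv *priv, unsigned int idx,
		  unsigned int fcr, unsigned int fpr)
{
	priv->ftp_rqfcr[idx] = fcr;
	priv->ftp_rqfpr[idx] = fpr;
}
```

With a single global pair of arrays, the same call made for eth0 and eth1
would land in one shared table, so a later read-back for one port could
return rules written for the other.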
* [RFC][PATCH] kexec-tools: powerpc: Use the #address-cells information to parse memory/reg - V2
From: Suzuki Poulose @ 2011-06-08 6:38 UTC (permalink / raw)
To: Simon Horman
Cc: linux ppc dev, Sebastian Andrzej Siewior, David Laight,
kexec@lists.infradead.org
Hi,
This is version 2 of the patch
Changes from Version 1 :
: Changed the interface for read_memory_region_limits to use 'int fd'
instead of FILE*.
: Use sizeof(variable) for read(, instead of sizeof(type).
---
Fix parsing of the memory region information from the device-tree.
The format of memory/reg is based on #address-cells and #size-cells. Currently,
kexec-tools doesn't use these values when parsing the memory/reg contents.
Hence kexec cannot handle cases where #address-cells and #size-cells
differ (e.g. PPC440X).
This patch introduces a read_memory_region_limits(), which parses the
memory/reg contents based on the values of #address-cells and #size-cells.
Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
---
kexec/arch/ppc/crashdump-powerpc.c | 33 +------
kexec/arch/ppc/fs2dt.c | 14 ---
kexec/arch/ppc/kexec-ppc.c | 158 ++++++++++++++++++++++++++-----------
kexec/arch/ppc/kexec-ppc.h | 6 +
4 files changed, 129 insertions(+), 82 deletions(-)
Index: kexec-tools-2.0.4/kexec/arch/ppc/kexec-ppc.c
===================================================================
--- kexec-tools-2.0.4.orig/kexec/arch/ppc/kexec-ppc.c
+++ kexec-tools-2.0.4/kexec/arch/ppc/kexec-ppc.c
@@ -16,6 +16,7 @@
#include <dirent.h>
#include <stdlib.h>
#include <sys/stat.h>
+#include <fcntl.h>
#include <unistd.h>
#include "../../kexec.h"
@@ -26,6 +27,7 @@
#include "config.h"
+unsigned long dt_address_cells = 0, dt_size_cells = 0;
uint64_t rmo_top;
unsigned long long crash_base = 0, crash_size = 0;
unsigned long long initrd_base = 0, initrd_size = 0;
@@ -34,6 +36,98 @@ unsigned int rtas_base, rtas_size;
int max_memory_ranges;
const char *ramdisk;
+/*
+ * Reads the #address-cells and #size-cells on this platform.
+ * This is used to parse the memory/reg info from the device-tree
+ */
+int init_memory_region_info()
+{
+ size_t res = 0;
+ int fd;
+ char *file;
+
+ file = "/proc/device-tree/#address-cells";
+ fd = open(file, O_RDONLY);
+ if (fd < 0) {
+ fprintf(stderr, "Unable to open %s\n", file);
+ return -1;
+ }
+
+ res = read(fd, &dt_address_cells, sizeof(dt_address_cells));
+ if (res != sizeof(dt_address_cells)) {
+ fprintf(stderr, "Error reading %s\n", file);
+ return -1;
+ }
+ close(fd);
+
+ file = "/proc/device-tree/#size-cells";
+ fd = open(file, O_RDONLY);
+ if (fd < 0) {
+ fprintf(stderr, "Unable to open %s\n", file);
+ return -1;
+ }
+
+ res = read(fd, &dt_size_cells, sizeof(dt_size_cells));
+ if (res != sizeof(dt_size_cells)) {
+ fprintf(stderr, "Error reading %s\n", file);
+ return -1;
+ }
+ close(fd);
+
+ /* Convert the sizes into bytes */
+ dt_size_cells *= sizeof(unsigned long);
+ dt_address_cells *= sizeof(unsigned long);
+
+ return 0;
+}
+
+#define MAXBYTES 128
+/*
+ * Reads the memory region info from the device-tree node pointed
+ * by @fd and fills the *start, *end with the boundaries of the region
+ */
+int read_memory_region_limits(int fd, unsigned long long *start,
+ unsigned long long *end)
+{
+ char buf[MAXBYTES];
+ unsigned long *p;
+ unsigned long nbytes = dt_address_cells + dt_size_cells;
+
+ if (lseek(fd, 0, SEEK_SET) == -1) {
+ fprintf(stderr, "Error in file seek\n");
+ return -1;
+ }
+ if (read(fd, buf, nbytes) != nbytes) {
+ fprintf(stderr, "Error reading the memory region info\n");
+ return -1;
+ }
+
+ p = (unsigned long*)buf;
+ if (dt_address_cells == sizeof(unsigned long)) {
+ *start = p[0];
+ p++;
+ } else if (dt_address_cells == sizeof(unsigned long long)) {
+ *start = ((unsigned long long *)p)[0];
+ p = (unsigned long long *)p + 1;
+ } else {
+ fprintf(stderr, "Unsupported value for #address-cells : %ld\n",
+ dt_address_cells);
+ return -1;
+ }
+
+ if (dt_size_cells == sizeof(unsigned long))
+ *end = *start + p[0];
+ else if (dt_size_cells == sizeof(unsigned long long))
+ *end = *start + ((unsigned long long *)p)[0];
+ else {
+ fprintf(stderr, "Unsupported value for #size-cells : %ld\n",
+ dt_size_cells);
+ return -1;
+ }
+
+ return 0;
+}
+
void arch_reuse_initrd(void)
{
reuse_initrd = 1;
@@ -182,9 +276,6 @@ static int sort_base_ranges(void)
return 0;
}
-
-#define MAXBYTES 128
-
static int realloc_memory_ranges(void)
{
size_t memory_range_len;
@@ -228,9 +319,8 @@ static int get_base_ranges(void)
char fname[256];
char buf[MAXBYTES];
DIR *dir, *dmem;
- FILE *file;
struct dirent *dentry, *mentry;
- int n;
+ int n, fd;
if ((dir = opendir(device_tree)) == NULL) {
perror(device_tree);
@@ -248,54 +338,39 @@ static int get_base_ranges(void)
return -1;
}
while ((mentry = readdir(dmem)) != NULL) {
+ unsigned long long start, end;
+
if (strcmp(mentry->d_name, "reg"))
continue;
strcat(fname, "/reg");
- if ((file = fopen(fname, "r")) == NULL) {
+ if ((fd = open(fname, O_RDONLY)) < 0) {
perror(fname);
closedir(dmem);
closedir(dir);
return -1;
}
- if ((n = fread(buf, 1, MAXBYTES, file)) < 0) {
- perror(fname);
- fclose(file);
+ if (read_memory_region_limits(fd, &start, &end) != 0) {
+ close(fd);
closedir(dmem);
closedir(dir);
return -1;
}
if (local_memory_ranges >= max_memory_ranges) {
if (realloc_memory_ranges() < 0){
- fclose(file);
+ close(fd);
break;
}
}
- if (n == sizeof(uint32_t) * 2) {
- base_memory_range[local_memory_ranges].start =
- ((uint32_t *)buf)[0];
- base_memory_range[local_memory_ranges].end =
- base_memory_range[local_memory_ranges].start +
- ((uint32_t *)buf)[1];
- }
- else if (n == sizeof(uint64_t) * 2) {
- base_memory_range[local_memory_ranges].start =
- ((uint64_t *)buf)[0];
- base_memory_range[local_memory_ranges].end =
- base_memory_range[local_memory_ranges].start +
- ((uint64_t *)buf)[1];
- }
- else {
- fprintf(stderr, "Mem node has invalid size: %d\n", n);
- return -1;
- }
+ base_memory_range[local_memory_ranges].start = start;
+ base_memory_range[local_memory_ranges].end = end;
base_memory_range[local_memory_ranges].type = RANGE_RAM;
local_memory_ranges++;
dbgprintf("%016llx-%016llx : %x\n",
base_memory_range[local_memory_ranges-1].start,
base_memory_range[local_memory_ranges-1].end,
base_memory_range[local_memory_ranges-1].type);
- fclose(file);
+ close(fd);
}
closedir(dmem);
}
@@ -572,29 +647,19 @@ static int get_devtree_details(unsigned
if (!strncmp(dentry->d_name, "memory@", 7) ||
!strcmp(dentry->d_name, "memory")) {
+ int fd;
strcat(fname, "/reg");
- if ((file = fopen(fname, "r")) == NULL) {
+ if ((fd = open(fname, O_RDONLY)) < 0) {
perror(fname);
goto error_opencdir;
}
- if ((n = fread(buf, 1, MAXBYTES, file)) < 0) {
- perror(fname);
- goto error_openfile;
- }
- if (n == sizeof(uint64_t)) {
- rmo_base = ((uint32_t *)buf)[0];
- rmo_top = rmo_base + ((uint32_t *)buf)[1];
- } else if (n == 16) {
- rmo_base = ((uint64_t *)buf)[0];
- rmo_top = rmo_base + ((uint64_t *)buf)[1];
- } else {
- fprintf(stderr, "Mem node has invalid size: %d\n", n);
+ if (read_memory_region_limits(fd, &rmo_base, &rmo_top) != 0)
goto error_openfile;
- }
+
if (rmo_top > 0x30000000UL)
rmo_top = 0x30000000UL;
- fclose(file);
+ close(fd);
closedir(cdir);
} /* memory */
@@ -778,6 +843,11 @@ int get_memory_ranges_dt(struct memory_r
int get_memory_ranges(struct memory_range **range, int *ranges,
unsigned long kexec_flags)
{
+ int res = 0;
+
+ res = init_memory_region_info();
+ if (res != 0)
+ return res;
#ifdef WITH_GAMECUBE
return get_memory_ranges_gc(range, ranges, kexec_flags);
#else
Index: kexec-tools-2.0.4/kexec/arch/ppc/kexec-ppc.h
===================================================================
--- kexec-tools-2.0.4.orig/kexec/arch/ppc/kexec-ppc.h
+++ kexec-tools-2.0.4/kexec/arch/ppc/kexec-ppc.h
@@ -69,6 +69,12 @@ extern unsigned long long initrd_base, i
extern unsigned long long ramdisk_base, ramdisk_size;
extern unsigned char reuse_initrd;
extern const char *ramdisk;
+
+/* Method to parse the memory/reg nodes in device-tree */
+extern unsigned long dt_address_cells, dt_size_cells;
+extern int init_memory_region_info(void);
+extern int read_memory_region_limits(int fd, unsigned long long *start,
+ unsigned long long *end);
#define COMMAND_LINE_SIZE 512 /* from kernel */
/*fs2dt*/
void reserve(unsigned long long where, unsigned long long length);
Index: kexec-tools-2.0.4/kexec/arch/ppc/crashdump-powerpc.c
===================================================================
--- kexec-tools-2.0.4.orig/kexec/arch/ppc/crashdump-powerpc.c
+++ kexec-tools-2.0.4/kexec/arch/ppc/crashdump-powerpc.c
@@ -81,7 +81,7 @@ static int get_crash_memory_ranges(struc
char fname[256];
char buf[MAXBYTES];
DIR *dir, *dmem;
- FILE *file;
+ int fd;
struct dirent *dentry, *mentry;
int i, n, crash_rng_len = 0;
unsigned long long start, end, cstart, cend;
@@ -123,17 +123,16 @@ static int get_crash_memory_ranges(struc
if (strcmp(mentry->d_name, "reg"))
continue;
strcat(fname, "/reg");
- file = fopen(fname, "r");
- if (!file) {
+ fd = open(fname, O_RDONLY);
+ if (fd < 0) {
perror(fname);
closedir(dmem);
closedir(dir);
goto err;
}
- n = fread(buf, 1, MAXBYTES, file);
- if (n < 0) {
- perror(fname);
- fclose(file);
+ n = read_memory_region_limits(fd, &start, &end);
+ if (n != 0) {
+ close(fd);
closedir(dmem);
closedir(dir);
goto err;
@@ -146,24 +145,6 @@ static int get_crash_memory_ranges(struc
goto err;
}
- /*
- * FIXME: This code fails on platforms that
- * have more than one memory range specified
- * in the device-tree's /memory/reg property.
- * or where the #address-cells and #size-cells
- * are not identical.
- *
- * We should interpret the /memory/reg property
- * based on the values of the #address-cells and
- * #size-cells properites.
- */
- if (n == (sizeof(unsigned long) * 2)) {
- start = ((unsigned long *)buf)[0];
- end = start + ((unsigned long *)buf)[1];
- } else {
- start = ((unsigned long long *)buf)[0];
- end = start + ((unsigned long long *)buf)[1];
- }
if (start == 0 && end >= (BACKUP_SRC_END + 1))
start = BACKUP_SRC_END + 1;
@@ -212,7 +193,7 @@ static int get_crash_memory_ranges(struc
= RANGE_RAM;
memory_ranges++;
}
- fclose(file);
+ close(fd);
}
closedir(dmem);
}
Index: kexec-tools-2.0.4/kexec/arch/ppc/fs2dt.c
===================================================================
--- kexec-tools-2.0.4.orig/kexec/arch/ppc/fs2dt.c
+++ kexec-tools-2.0.4/kexec/arch/ppc/fs2dt.c
@@ -137,21 +137,11 @@ static void add_usable_mem_property(int
if (strncmp(bname, "/memory@", 8) && strcmp(bname, "/memory"))
return;
- if (len < 2 * sizeof(unsigned long))
- die("unrecoverable error: not enough data for mem property\n");
- len = 2 * sizeof(unsigned long);
-
if (lseek(fd, 0, SEEK_SET) < 0)
die("unrecoverable error: error seeking in \"%s\": %s\n",
pathname, strerror(errno));
- if (read(fd, buf, len) != len)
- die("unrecoverable error: error reading \"%s\": %s\n",
- pathname, strerror(errno));
-
- if (~0ULL - buf[0] < buf[1])
- die("unrecoverable error: mem property overflow\n");
- base = buf[0];
- end = base + buf[1];
+ if (read_memory_region_limits(fd, &base, &end) != 0)
+ die("unrecoverable error: error parsing memory/reg limits\n");
for (range = 0; range < usablemem_rgns.size; range++) {
loc_base = usablemem_rgns.ranges[range].start;
^ permalink raw reply
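The parsing rule that the kexec-tools patch above relies on can be sketched
independently of the patch. This is not the patch's code: it assumes the
usual device-tree encoding, where each cell is a 32-bit big-endian value,
and it decodes byte by byte instead of casting the buffer. The names
read_cells and parse_reg are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine `cells` consecutive 32-bit big-endian cells into one value. */
static uint64_t read_cells(const unsigned char *p, unsigned cells)
{
	uint64_t v = 0;
	unsigned i;

	for (i = 0; i < cells * 4; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Decode the first (address, size) pair of a memory/reg property.
 * addr_cells and size_cells are the raw #address-cells and #size-cells
 * values (1 or 2). Returns 0 on success, -1 on unsupported input.
 */
int parse_reg(const unsigned char *buf, size_t len,
	      unsigned addr_cells, unsigned size_cells,
	      uint64_t *start, uint64_t *end)
{
	if (addr_cells < 1 || addr_cells > 2 ||
	    size_cells < 1 || size_cells > 2)
		return -1;
	if (len < (size_t)(addr_cells + size_cells) * 4)
		return -1;

	*start = read_cells(buf, addr_cells);
	*end = *start + read_cells(buf + addr_cells * 4, size_cells);
	return 0;
}
```

Because the two cell counts are handled separately, a layout such as two
address cells with one size cell decodes without guessing from the total
length of the property.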
* Re: [PATCH] [RFC][V3] bluegene: add entry to cpu table
From: Michael Neuling @ 2011-06-08 5:37 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, linux-kernel, bg-linux
In-Reply-To: <1307494181.2874.214.camel@pasglop>
> > Create an entry for the BG/P chips, include bits to accommodate
> > the double fp2 fpu and the special MMU considerations like L1
> > writethrough.
> >
> > RFC Note: this patch fails scripts/checkpatch.pl because I
> > matched coding style of the surrounding existing code. Would
> > you rather have something checkpatch.pl clean or something
> > which is consistent with the surrounding code style?
>
> Stay consistent. You're welcome to my next checkpatch burning
> ceremony :-)
I'll make t-shirts for us!
> > #define PPC_FEATURE_POWER6_EXT 0x00000200
> > #define PPC_FEATURE_ARCH_2_06 0x00000100
> > #define PPC_FEATURE_HAS_VSX 0x00000080
> > +#define PPC_FEATURE_HAS_FPU_FP2 0x00000040
>
> Any chance for a better name ?
I've painted this particular bike shed with Eric already :-)
It's an extension to the FPU called FP2
(https://wiki.alcf.anl.gov/images/d/d9/PPC440_FP2_arch.pdf).
So PPC_FEATURE_HAS_FPU -> PPC_FEATURE_HAS_FPU_FP2.
I think the name is right as he has it.
Mikey
^ permalink raw reply
* Re: Hooking up SM501 on TQM5200 (MPC5200) board via device tree?
From: Josh Triplett @ 2011-06-08 3:09 UTC (permalink / raw)
To: Grant Likely; +Cc: devicetree-discuss, linuxppc-dev, linux-kernel, Jamey Sharp
In-Reply-To: <20110603205138.GI17972@ponder.secretlab.ca>
On Fri, Jun 03, 2011 at 02:51:38PM -0600, Grant Likely wrote:
> On Tue, May 31, 2011 at 10:07:01PM -0700, Josh Triplett wrote:
> > We have a TQM5200 board, which has GPIO lines hooked up to an SM501.
> > I've managed to come up with the following patch to the tqm5200 device
> > tree, which manages to convince the sm501 driver to attach an sm501-fb:
> >
> > --- a/arch/powerpc/boot/dts/tqm5200.dts 2009-11-23 03:13:27.000000000 -0800
> > +++ b/arch/powerpc/boot/dts/tqm5200.dts 2011-05-31 22:00:28.000580627 -0700
> > @@ -177,7 +177,8 @@
> > compatible = "fsl,mpc5200-lpb","simple-bus";
> > #address-cells = <2>;
> > #size-cells = <1>;
> > - ranges = <0 0 0xfc000000 0x02000000>;
> > + ranges = <0 0 0xfc000000 0x02000000
> > + 1 0 0xe0000000 0x04000000>;
> >
> > flash@0,0 {
> > compatible = "cfi-flash";
> > @@ -187,6 +188,13 @@
> > #size-cells = <1>;
> > #address-cells = <1>;
> > };
> > +
> > + display@1,0 {
> > + compatible = "smi,sm501";
> > + reg = <1 0x00000000 0x00800000
> > + 1 0x03e00000 0x00200000>;
> > + interrupts = <1 1 3>;
> > + };
> > };
> >
> > pci@f0000d00 {
> >
> >
> > However, this doesn't hook up the sm501-gpio bits. Reading the sm501
> > driver carefully, it looks like it only hooks up sm501-gpio if it has
> > platform_data available which sets some flags and other information.
> > So, if I understand correctly, hooking up sm501-gpio would require
> > adding functionality to the driver to get the GPIO information from the
> > device tree in preference to the platform_data, if available, and fall
> > back to the platform_data for existing users?
> >
> > What should the necessary device tree properties look like to replace
> > sm501_initdata?
> >
> > - Josh Triplett
>
> You need to look at Documentation/devicetree/bindings/gpio/gpio.txt.
>
> Also, you need to add bits to the sm501-gpio driver to register a
> dynamically allocated range of gpio pins and to populate the
> gpiochip->of_node pointer. If that points at a device tree node, then
> the core code will take care of setting up translation for you.
What would the resulting device tree look like, given that the sm501
driver handles several different types of devices? sm501 doesn't have a
separate gpio driver; it just has a flag to enable GPIO. The GPIO
controller also doesn't have independent resources; as far as I can
tell, the sm501 knows everything it needs to know in order to drive the
gpio, except the boolean presence or absence of gpio.
Currently, to get the sm501 hooked up at all, I wrote this:
display@1,0 {
compatible = "smi,sm501";
reg = <1 0x00000000 0x00800000
1 0x03e00000 0x00200000>;
interrupts = <1 1 3>;
};
To hook up the gpio, would I nest an entire gpio@... { ...;
gpio-controller; } stanza inside that or next to that, or just add the
gpio-controller; and #gpio-cells lines inside the existing stanza for
the sm501?
Does the gpio@ bit have any semantic significance, or just the
gpio-controller line and the compatible line?
- Josh Triplett
^ permalink raw reply
* [PATCH] powerpc/85xx:DTS: Fix tbi node location for Px020RDB
From: Prabhakar Kushwaha @ 2011-06-08 2:49 UTC (permalink / raw)
To: linuxppc-dev, devicetree-discuss; +Cc: meet2prabhu, Prabhakar Kushwaha
The ten-bit interface (TBI) module is part of the SoC, not the board.
Move the tbi entries from the board-related dts files to the Si dtsi files.
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
---
Based upon http://git.kernel.org/pub/scm/linux/kernel/git/galak/powerpc.git (branch next)
arch/powerpc/boot/dts/p1020rdb.dts | 9 ---------
arch/powerpc/boot/dts/p1020rdb_camp_core0.dts | 8 --------
arch/powerpc/boot/dts/p1020si.dtsi | 6 +++++-
arch/powerpc/boot/dts/p2020rdb.dts | 8 --------
arch/powerpc/boot/dts/p2020rdb_camp_core0.dts | 8 --------
arch/powerpc/boot/dts/p2020si.dtsi | 6 +++++-
6 files changed, 10 insertions(+), 35 deletions(-)
diff --git a/arch/powerpc/boot/dts/p1020rdb.dts b/arch/powerpc/boot/dts/p1020rdb.dts
index d6a8ae4..a4e5d6c 100644
--- a/arch/powerpc/boot/dts/p1020rdb.dts
+++ b/arch/powerpc/boot/dts/p1020rdb.dts
@@ -211,14 +211,6 @@
};
};
- mdio@25000 {
-
- tbi0: tbi-phy@11 {
- reg = <0x11>;
- device_type = "tbi-phy";
- };
- };
-
enet0: ethernet@b0000 {
fixed-link = <1 1 1000 0 0>;
phy-connection-type = "rgmii-id";
@@ -227,7 +219,6 @@
enet1: ethernet@b1000 {
phy-handle = <&phy0>;
- tbi-handle = <&tbi0>;
phy-connection-type = "sgmii";
};
diff --git a/arch/powerpc/boot/dts/p1020rdb_camp_core0.dts b/arch/powerpc/boot/dts/p1020rdb_camp_core0.dts
index f0bf7f4..abab234 100644
--- a/arch/powerpc/boot/dts/p1020rdb_camp_core0.dts
+++ b/arch/powerpc/boot/dts/p1020rdb_camp_core0.dts
@@ -114,20 +114,12 @@
};
};
- mdio@25000 {
- tbi0: tbi-phy@11 {
- reg = <0x11>;
- device_type = "tbi-phy";
- };
- };
-
enet0: ethernet@b0000 {
status = "disabled";
};
enet1: ethernet@b1000 {
phy-handle = <&phy0>;
- tbi-handle = <&tbi0>;
phy-connection-type = "sgmii";
};
diff --git a/arch/powerpc/boot/dts/p1020si.dtsi b/arch/powerpc/boot/dts/p1020si.dtsi
index 5c5acb6..7844d2e 100644
--- a/arch/powerpc/boot/dts/p1020si.dtsi
+++ b/arch/powerpc/boot/dts/p1020si.dtsi
@@ -190,7 +190,10 @@
#size-cells = <0>;
compatible = "fsl,etsec2-tbi";
reg = <0x25000 0x1000 0xb1030 0x4>;
-
+ tbi0: tbi-phy@11 {
+ reg = <0x11>;
+ device_type = "tbi-phy";
+ };
};
enet0: ethernet@b0000 {
@@ -229,6 +232,7 @@
fsl,num_tx_queues = <0x8>;
local-mac-address = [ 00 00 00 00 00 00 ];
interrupt-parent = <&mpic>;
+ tbi-handle = <&tbi0>;
queue-group@0 {
#address-cells = <1>;
diff --git a/arch/powerpc/boot/dts/p2020rdb.dts b/arch/powerpc/boot/dts/p2020rdb.dts
index 1d7a05f..2941cbb 100644
--- a/arch/powerpc/boot/dts/p2020rdb.dts
+++ b/arch/powerpc/boot/dts/p2020rdb.dts
@@ -213,13 +213,6 @@
};
};
- mdio@25520 {
- tbi0: tbi-phy@11 {
- reg = <0x11>;
- device_type = "tbi-phy";
- };
- };
-
mdio@26520 {
status = "disabled";
};
@@ -243,7 +236,6 @@
};
enet1: ethernet@25000 {
- tbi-handle = <&tbi0>;
phy-handle = <&phy0>;
phy-connection-type = "sgmii";
};
diff --git a/arch/powerpc/boot/dts/p2020rdb_camp_core0.dts b/arch/powerpc/boot/dts/p2020rdb_camp_core0.dts
index fc8dddd..4641f3d 100644
--- a/arch/powerpc/boot/dts/p2020rdb_camp_core0.dts
+++ b/arch/powerpc/boot/dts/p2020rdb_camp_core0.dts
@@ -124,13 +124,6 @@
};
};
- mdio@25520 {
- tbi0: tbi-phy@11 {
- reg = <0x11>;
- device_type = "tbi-phy";
- };
- };
-
mdio@26520 {
status = "disabled";
};
@@ -140,7 +133,6 @@
};
enet1: ethernet@25000 {
- tbi-handle = <&tbi0>;
phy-handle = <&phy0>;
phy-connection-type = "sgmii";
diff --git a/arch/powerpc/boot/dts/p2020si.dtsi b/arch/powerpc/boot/dts/p2020si.dtsi
index 6def17f..9317075 100644
--- a/arch/powerpc/boot/dts/p2020si.dtsi
+++ b/arch/powerpc/boot/dts/p2020si.dtsi
@@ -235,6 +235,10 @@
#size-cells = <0>;
compatible = "fsl,gianfar-tbi";
reg = <0x26520 0x20>;
+ tbi0: tbi-phy@11 {
+ reg = <0x11>;
+ device_type = "tbi-phy";
+ };
};
mdio@26520 {
@@ -270,7 +274,7 @@
local-mac-address = [ 00 00 00 00 00 00 ];
interrupts = <35 2 36 2 40 2>;
interrupt-parent = <&mpic>;
-
+ tbi-handle = <&tbi0>;
};
enet2: ethernet@26000 {
--
1.7.3
^ permalink raw reply related
* Re: [PATCH] [RFC][V3] bluegene: add entry to cpu table
From: Eric Van Hensbergen @ 2011-06-08 2:10 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, linux-kernel, bg-linux
In-Reply-To: <1307494181.2874.214.camel@pasglop>
>> +#define PPC_FEATURE_HAS_FPU_FP2              0x00000040
>
> Any chance for a better name ?
>
That's the official external name, and it sucks. I'm happy to use
PPC_FEATURE_DOUBLE_HUMMER if you'd prefer, otherwise I'm not feeling
too creative, but am open to artistic suggestions.
-eric
^ permalink raw reply
* Re: [PATCH] [RFC][V3] bluegene: add entry to cpu table
From: Benjamin Herrenschmidt @ 2011-06-08 0:49 UTC (permalink / raw)
To: Eric Van Hensbergen; +Cc: linuxppc-dev, linux-kernel, bg-linux
In-Reply-To: <1307472447-1656-1-git-send-email-ericvh@gmail.com>
On Tue, 2011-06-07 at 13:47 -0500, Eric Van Hensbergen wrote:
> Create an entry for the BG/P chips, include bits to accommodate
> the double fp2 fpu and the special MMU considerations like L1
> writethrough.
>
> RFC Note: this patch fails scripts/checkpatch.pl because I
> matched coding style of the surrounding existing code. Would
> you rather have something checkpatch.pl clean or something
> which is consistent with the surrounding code style?
Stay consistent. You're welcome to my next checkpatch burning
ceremony :-)
> The three I got were:
> ERROR: Macros with complex values should be enclosed in parenthesis
> ERROR: "foo* bar" should be "foo *bar"
> WARNING: externs should be avoided in .c files
> and I got these by copying other code as an example.
>
> Thanks for any feedback.
>
> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
> ---
> arch/powerpc/include/asm/cputable.h | 1 +
> arch/powerpc/include/asm/mmu.h | 9 +++++++++
> arch/powerpc/kernel/cpu_setup_44x.S | 1 +
> arch/powerpc/kernel/cputable.c | 16 ++++++++++++++++
> 4 files changed, 27 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
> index c0d842c..ce709b5 100644
> --- a/arch/powerpc/include/asm/cputable.h
> +++ b/arch/powerpc/include/asm/cputable.h
> @@ -26,6 +26,7 @@
> #define PPC_FEATURE_POWER6_EXT 0x00000200
> #define PPC_FEATURE_ARCH_2_06 0x00000100
> #define PPC_FEATURE_HAS_VSX 0x00000080
> +#define PPC_FEATURE_HAS_FPU_FP2 0x00000040
Any chance for a better name ?
> #define PPC_FEATURE_PSERIES_PERFMON_COMPAT \
> 0x00000040
> diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
> index 4138b21..97f9502 100644
> --- a/arch/powerpc/include/asm/mmu.h
> +++ b/arch/powerpc/include/asm/mmu.h
> @@ -56,6 +56,13 @@
> */
> #define MMU_FTR_NEED_DTLB_SW_LRU ASM_CONST(0x00200000)
>
> +/* This indicates that the processor must use writethrough with
> + * the L1 in order to maintain SMP coherence on systems like the
> + * IBM BlueGene/L and IBM BlueGene/P.
> + */
> +
> +#define MMU_FTR_NEED_L1_WRITETHROUGH ASM_CONST(0x00400000)
> +
> /* Enable use of TLB reservation. Processor should support tlbsrx.
> * instruction and MAS0[WQ].
> */
> @@ -112,6 +119,8 @@
> MMU_FTR_USE_PAIRED_MAS | \
> MMU_FTR_TLBIEL | \
> MMU_FTR_16M_PAGE
> +#define MMU_FTRS_BGP MMU_FTR_TYPE_44x | MMU_FTR_16M_PAGE | \
> + MMU_FTR_NEED_L1_WRITETHROUGH
Time to add the ALWAYS/NEVER trick to the MMU features, like the CPU
features already have.
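For reference, a minimal user-space sketch of what that trick could look
like for MMU features. This is not the kernel's code: the sketch_ names,
the MMU_FTRS_44X set and the values of MMU_FTR_TYPE_44x and
MMU_FTR_16M_PAGE are assumptions made up for illustration; only the
MMU_FTR_NEED_L1_WRITETHROUGH value comes from the patch. The idea is that
a feature present in every platform built into the kernel, or in none of
them, can be decided at compile time, and only the rest needs a runtime
lookup.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits. Only NEED_L1_WRITETHROUGH is the value from the patch;
 * the other two are placeholder values for this sketch. */
#define MMU_FTR_TYPE_44x             0x00000008u
#define MMU_FTR_16M_PAGE             0x00000800u
#define MMU_FTR_NEED_L1_WRITETHROUGH 0x00400000u

/* Per-platform feature sets, parenthesized so they are safe in expressions. */
#define MMU_FTRS_44X (MMU_FTR_TYPE_44x)
#define MMU_FTRS_BGP (MMU_FTR_TYPE_44x | MMU_FTR_16M_PAGE | \
                      MMU_FTR_NEED_L1_WRITETHROUGH)

/* Hypothetical kernel config that supports both plain 44x and BG/P:
 * POSSIBLE is the union of all enabled sets, ALWAYS their intersection. */
#define MMU_FTRS_POSSIBLE (MMU_FTRS_44X | MMU_FTRS_BGP)
#define MMU_FTRS_ALWAYS   (MMU_FTRS_44X & MMU_FTRS_BGP)

/* Stand-in for the mmu_features field of the detected cpu_spec. */
static uint32_t cur_mmu_features;

/* With a constant argument the compiler can fold this to true (bit is in
 * ALWAYS) or false (bit is not in POSSIBLE) without touching memory. */
static inline bool sketch_mmu_has_feature(uint32_t feature)
{
    return (MMU_FTRS_ALWAYS & feature) ||
           (MMU_FTRS_POSSIBLE & cur_mmu_features & feature);
}
```

With only non-BG/P platforms configured, MMU_FTR_NEED_L1_WRITETHROUGH
would drop out of MMU_FTRS_POSSIBLE and every test on it would compile
away.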
> #ifndef __ASSEMBLY__
> #include <asm/cputable.h>
>
> diff --git a/arch/powerpc/kernel/cpu_setup_44x.S b/arch/powerpc/kernel/cpu_setup_44x.S
> index e32b4a9..920aed6 100644
> --- a/arch/powerpc/kernel/cpu_setup_44x.S
> +++ b/arch/powerpc/kernel/cpu_setup_44x.S
> @@ -35,6 +35,7 @@ _GLOBAL(__setup_cpu_440grx)
> _GLOBAL(__setup_cpu_460ex)
> _GLOBAL(__setup_cpu_460gt)
> _GLOBAL(__setup_cpu_460sx)
> +_GLOBAL(__setup_cpu_bgp)
> _GLOBAL(__setup_cpu_apm821xx)
> mflr r4
> bl __init_fpu_44x
> diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
> index 34d2722..550a078 100644
> --- a/arch/powerpc/kernel/cputable.c
> +++ b/arch/powerpc/kernel/cputable.c
> @@ -57,6 +57,7 @@ extern void __setup_cpu_750fx(unsigned long offset, struct cpu_spec* spec);
> extern void __setup_cpu_7400(unsigned long offset, struct cpu_spec* spec);
> extern void __setup_cpu_7410(unsigned long offset, struct cpu_spec* spec);
> extern void __setup_cpu_745x(unsigned long offset, struct cpu_spec* spec);
> +extern void __setup_cpu_bgp(unsigned long offset, struct cpu_spec* spec);
> #endif /* CONFIG_PPC32 */
> #ifdef CONFIG_PPC64
> extern void __setup_cpu_ppc970(unsigned long offset, struct cpu_spec* spec);
> @@ -1737,6 +1738,21 @@ static struct cpu_spec __initdata cpu_specs[] = {
> .machine_check = machine_check_440A,
> .platform = "ppc440",
> },
> + { /* Blue Gene/P */
> + .pvr_mask = 0xfffffff0,
> + .pvr_value = 0x52131880,
> + .cpu_name = "450 Blue Gene/P",
> + .cpu_features = CPU_FTRS_440x6,
> + .cpu_user_features = COMMON_USER_BOOKE |
> + PPC_FEATURE_HAS_FPU |
> + PPC_FEATURE_HAS_FPU_FP2,
> + .mmu_features = MMU_FTRS_BGP,
> + .icache_bsize = 32,
> + .dcache_bsize = 32,
> + .cpu_setup = __setup_cpu_bgp,
> + .machine_check = machine_check_440A,
> + .platform = "ppc440",
> + },
> { /* 460EX */
> .pvr_mask = 0xffff0006,
> .pvr_value = 0x13020002,
Cheers,
Ben.
> --
> 1.7.4.1
>
^ permalink raw reply
* Re: [PATCH] [RFC][V3] bluegene: use MMU feature flag to conditionalize L1 writethrough code
From: Benjamin Herrenschmidt @ 2011-06-08 0:47 UTC (permalink / raw)
To: Eric Van Hensbergen; +Cc: linuxppc-dev, linux-kernel, bg-linux
In-Reply-To: <1307482573-25440-1-git-send-email-ericvh@gmail.com>
On Tue, 2011-06-07 at 16:36 -0500, Eric Van Hensbergen wrote:
> BG/P nodes need to be configured for writethrough to work in SMP
> configurations. This patch adds the right hooks in the MMU code
> to make sure BGP_L1_WRITETHROUGH configurations are setup for BG/P.
Ok so getting better, some comments tho :-)
> RFC note: this essentially just changes the ifdefs to use the
> BEGIN_MMU_FTR_SECTION macros. A couple of things that I really didn't
> like about this:
> a) we introduced at least one extra op that isn't needed to get around
> otherwise having multiple labels
Not 100% sure which one you are talking about but that's fixable, I'll put
comments inline in the code. Note that you can nest feature sections
using the _NESTED variants and you can use "alternates" to replace
sequences of code.
> b) we are introducing a bunch of no-ops in places that could be critical
> paths and jimix says this may not be the best thing for multiple reasons
> including having no-ops around the DCBZs is a bad thing
Right, best to use code sequence replacement with the "alternate"
variants, see below
> c) the ELSE_MMU_FTR_SECTION stuff appears to be broken (or I don't know
> how to use it); it gave me the error:
> Error: non-constant expression in ".if" statement
> so I switched out the else clauses with redundant FTR_SECTIONS
> (one for IFSET and one for IFCLR). Please someone throw me a clue
> as to what I was doing wrong. I'm running gcc 4.3.2 from crosstools-ng
> if it has some sort of impact.
Hrm, not sure how you tried to use it, it should work, but you need to
use the "ALT" variants.
> Jimix has thrown me some code to try and do a better job by branching
> to stub code inside of the MMU_FTRs so I don't have so many no-ops. I'm
> open to alternatives. jimix also suggested changing NEED_L1_WRITETHROUGH
> to DCBZ_BROKEN, which I'm open to if you think appropriate, or maybe
> DCBZ_BROKEN_DAMNIT would be more apt.
:-)
I think NEED_L1_WRITETHROUGH isn't great since we are dealing with more
than just that here. Let's call it 44x_SMP since afaik, all
implementations, whether it's BG or other variants of the same hack
(AMCC/APM has one too) need the same stuff here, no ?
Let's not use more than one feature bit, it's not needed in practice, a
better name is all we need. Might even call it MMU_FTR_BLUEGENE_44x_SMP
if you want.
I'll add comments inline:
> #define PPC44x_MMUCR_TID 0x000000ff
> #define PPC44x_MMUCR_STS 0x00010000
> +#define PPC44x_MMUCR_U2 0x00200000
Please document in a comment what is the effect of U2 on the BG/P ASIC
caches.
> #define PPC44x_TLB_PAGEID 0
> #define PPC44x_TLB_XLAT 1
> @@ -32,6 +33,7 @@
>
> /* Storage attribute and access control fields */
> #define PPC44x_TLB_ATTR_MASK 0x0000ff80
> +#define PPC44x_TLB_WL1 0x00100000 /* Write-through L1 */
> #define PPC44x_TLB_U0 0x00008000 /* User 0 */
> #define PPC44x_TLB_U1 0x00004000 /* User 1 */
> #define PPC44x_TLB_U2 0x00002000 /* User 2 */
> diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
> index 5e12b74..9a9a4ee 100644
> --- a/arch/powerpc/kernel/head_44x.S
> +++ b/arch/powerpc/kernel/head_44x.S
> @@ -429,7 +429,17 @@ finish_tlb_load_44x:
> andi. r10,r12,_PAGE_USER /* User page ? */
> beq 1f /* nope, leave U bits empty */
> rlwimi r11,r11,3,26,28 /* yes, copy S bits to U */
> -1: tlbwe r11,r13,PPC44x_TLB_ATTRIB /* Write ATTRIB */
> +1:
> +BEGIN_MMU_FTR_SECTION
> + andi. r10, r11, PPC44x_TLB_I
> + bne 2f
> + oris r11,r11,PPC44x_TLB_WL1@h /* Add coherency for */
> + /* non-inhibited */
> + ori r11,r11,PPC44x_TLB_U2|PPC44x_TLB_M
> +END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_L1_WRITETHROUGH)
That will do for now, tho it does add 4 nops in the normal case, we can
look at putting it out of line etc.. later. However, it would be
generally better if we could instead have those bits in the PTE.
We can look at doing that as a later optimisation. For example, we could
define _PAGE_COHERENT as a variable when BGP support is enabled, and
initialize it to contain the U2 and WL1 bits.
Something like this in pte-44x.h
#ifdef CONFIG_PPC_BLUEGENE_SMP
#ifndef __ASSEMBLY__
extern unsigned int _page_coherent;
#define _PAGE_COHERENT _page_coherent
#define __PAGE_COHERENT 0x200
#define __PAGE_U2 0x2000
#endif
#else
#define _PAGE_COHERENT 0x200
#endif
And somewhere in 44x-mmu.c:MMU_init_hw()
if (mmu_has_feature(....))
_page_coherent = __PAGE_COHERENT | __PAGE_U2;
You still need -one- single instruction in the TLB code to
rlwimi either bit into WL1 (why the heck did the HW need that many bits
for the same thing ?) because WL1 falls outside of the current attrib
mask and it would be more work to actually sort that out, but that's
a single instruction, and it can still be inside the feature section.
That has the advantage of removing the conditional on I from the hot
path as well.
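To make the idea concrete, here is a small user-space sketch of the
runtime-initialized coherency mask. All names in it (PAGE_COHERENT_BASE,
PAGE_U2_BIT, page_coherent, the sketch_ functions) are hypothetical
stand-ins for the __PAGE_COHERENT / __PAGE_U2 / _page_coherent names
above, and the boolean argument stands in for the mmu_has_feature() test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the pte-44x.h snippet above. */
#define PAGE_COHERENT_BASE 0x200u
#define PAGE_U2_BIT        0x2000u

/* Default for parts that do not need the BG/P hack. */
static uint32_t page_coherent = PAGE_COHERENT_BASE;

/* Stand-in for the MMU_init_hw() hook: widen the mask once at boot. */
static void sketch_mmu_init_hw(bool needs_l1_writethrough)
{
    if (needs_l1_writethrough)
        page_coherent = PAGE_COHERENT_BASE | PAGE_U2_BIT;
}

/* Every PTE built afterwards picks up the extra bit without any
 * per-fault conditional in the TLB miss path. */
static uint32_t sketch_make_pte_flags(uint32_t base_flags)
{
    return base_flags | page_coherent;
}
```

The point of the design is that the decision is taken once at init time
and carried by the PTE, so the TLB miss handler only has to move bits.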
> +2:
> + tlbwe r11,r13,PPC44x_TLB_ATTRIB /* Write ATTRIB */
>
> /* Done...restore registers and get out of here.
> */
> @@ -799,7 +809,12 @@ skpinv: addi r4,r4,1 /* Increment */
> sync
>
> /* Initialize MMUCR */
> +BEGIN_MMU_FTR_SECTION
> + lis r5, PPC44x_MMUCR_U2@h
> +END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_L1_WRITETHROUGH)
> +BEGIN_MMU_FTR_SECTION
> li r5,0
> +END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH)
> mtspr SPRN_MMUCR,r5
> sync
Use an alternate, something like:
BEGIN_MMU_FTR_SECTION
lis r5, PPC44x_MMUCR_U2@h
MMU_FTR_SECTION_ELSE
li r5,0
ALT_END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_L1_WRITETHROUGH)
BTW. Care to explain to me why you have U2 -both- in the arguments to
tlbwe and in MMUCR ? That doesn't look right to me... which one is used
where and when ?
> @@ -814,7 +829,15 @@ skpinv: addi r4,r4,1 /* Increment */
> /* attrib fields */
> /* Added guarded bit to protect against speculative loads/stores */
> li r5,0
> - ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G)
> +BEGIN_MMU_FTR_SECTION
> + ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
> + PPC44x_TLB_G | PPC44x_TLB_U2)
> + oris r5,r5,PPC44x_TLB_WL1@h
> +END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_L1_WRITETHROUGH)
> +BEGIN_MMU_FTR_SECTION
> + ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
> + PPC44x_TLB_G)
> +END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH)
>
> li r0,63 /* TLB slot 63 */
This isn't going to work. This happens before the CPU feature bits are
established.
I see two ways out of that dilemma:
- One is you find a way to identify the BG case at runtime from that
very early asm code. It's a bit tricky since we never added the MMU type
information to the device-tree blob header (but we're adding it to ePAPR
via a register iirc, so we could hijack that), or maybe via inspecting
what the FW left behind in the TLB...
- Another one is to leave the stuff as-is, and "fixup" the TLB entry
from MMU_init_hw(). At that point, we haven't started the secondary CPU
yet anyways, tho we'd have to make sure we flush the cache before we
"switch" the TLB entry to make sure all changes we made are visible
before we change the cache setting.
> diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
> index 998a100..b54e2e8 100644
> --- a/arch/powerpc/kernel/misc_32.S
> +++ b/arch/powerpc/kernel/misc_32.S
> @@ -506,7 +506,27 @@ _GLOBAL(clear_pages)
> li r0,PAGE_SIZE/L1_CACHE_BYTES
> slw r0,r0,r4
> mtctr r0
> -1: dcbz 0,r3
> + li r4, 0
> +1:
> +BEGIN_MMU_FTR_SECTION
> + /* assuming 32 byte cacheline */
> + stw r4, 0(r3)
> + stw r4, 4(r3)
> + stw r4, 8(r3)
> + stw r4, 12(r3)
> + stw r4, 16(r3)
> + stw r4, 20(r3)
> + stw r4, 24(r3)
> + stw r4, 28(r3)
> +END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_L1_WRITETHROUGH)
> +/*
> + * would have used an ELSE_MMU_FTR_SECTION here but it
> + * broke the code with Error: non-constant expression in ".if" statement
> + *
> + */
> +BEGIN_MMU_FTR_SECTION
> + dcbz 0,r3
> +END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH)
> addi r3,r3,L1_CACHE_BYTES
> bdnz 1b
> blr
> @@ -550,7 +570,9 @@ _GLOBAL(copy_page)
> mtctr r0
> 1:
> dcbt r11,r4
> +BEGIN_MMU_FTR_SECTION
> dcbz r5,r3
> +END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH)
> COPY_16_BYTES
> #if L1_CACHE_BYTES >= 32
> COPY_16_BYTES
Instead here I would just do a single feature section as the first
instruction of clear_pages() that covers a branch out of line to an
alternate implementation of the whole function.
_GLOBAL(clear_pages)
BEGIN_MMU_FTR_SECTION
b clear_pages_no_dcbz
END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_L1_WRITETHROUGH)
.../...
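As a rough illustration of what the out-of-line variant has to do, here
is a C sketch of a dcbz-free page clear. The real thing would be
assembly in misc_32.S; clear_pages_no_dcbz is only the label suggested
above, and SKETCH_PAGE_SIZE and SKETCH_L1_CACHE_BYTES are assumed values
(4K pages, 32-byte lines as in the patch comment).

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE      4096u  /* assumed 4K pages */
#define SKETCH_L1_CACHE_BYTES 32u    /* assumed 32-byte cache line */

/* Clear (1 << order) pages with plain word stores, one cache line per
 * loop iteration, mirroring the eight stw instructions in the patch. */
static void clear_pages_no_dcbz(void *page, unsigned int order)
{
    uint32_t *p = page;
    size_t lines = ((size_t)SKETCH_PAGE_SIZE << order) / SKETCH_L1_CACHE_BYTES;

    while (lines--) {
        p[0] = 0; p[1] = 0; p[2] = 0; p[3] = 0;
        p[4] = 0; p[5] = 0; p[6] = 0; p[7] = 0;
        p += SKETCH_L1_CACHE_BYTES / sizeof(*p);
    }
}
```

Keeping the whole loop in a separate function means the normal
clear_pages() path pays for one patched branch slot instead of eight
nops per cache line.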
> diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
> index 55f19f9..2646838 100644
> --- a/arch/powerpc/lib/copy_32.S
> +++ b/arch/powerpc/lib/copy_32.S
> @@ -12,6 +12,7 @@
> #include <asm/cache.h>
> #include <asm/errno.h>
> #include <asm/ppc_asm.h>
> +#include <asm/mmu.h>
This is a bit more nasty. At some point we'll have to butcher that code
in order to deal with 440 and 476 that have different cache line sizes
and which we still want in the same kernel binary. In the meantime
I'd do like previously and just duplicate the whole lot with just a
single branch out.
Note that I fail to see how your cachable_memzero can be correct since
you don't replace dcbz with anything. On the other hand, the only user
of it in the entire tree is ... clearing the hash table in ppc_mmu_32.c
which we don't use on 440.
So why don't you just make a separate patch that just completely gets
rid of cachable_memzero() and use a memset in ppc_mmu_32.c ? I don't
think anybody will notice the difference....
For cachable_memcpy and copy_tofrom_user, just removing the dcbz's will
do for now, though I wonder whether we could just remove cachable_memcpy
as well. The only user is the EMAC driver and I doubt anybody will be
able to measure the difference. It will make everybody's life easier in the
long term to remove those.
> #define COPY_16_BYTES \
> lwz r7,4(r4); \
> @@ -98,7 +99,10 @@ _GLOBAL(cacheable_memzero)
> bdnz 4b
> 3: mtctr r9
> li r7,4
> -10: dcbz r7,r6
> +10:
> +BEGIN_MMU_FTR_SECTION
> + dcbz r7,r6
> +END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH)
> addi r6,r6,CACHELINE_BYTES
> bdnz 10b
> clrlwi r5,r8,32-LG_CACHELINE_BYTES
> @@ -187,7 +191,9 @@ _GLOBAL(cacheable_memcpy)
> mtctr r0
> beq 63f
> 53:
> +BEGIN_MMU_FTR_SECTION
> dcbz r11,r6
> +END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH)
> COPY_16_BYTES
> #if L1_CACHE_BYTES >= 32
> COPY_16_BYTES
> @@ -368,7 +374,10 @@ _GLOBAL(__copy_tofrom_user)
> mtctr r8
>
> 53: dcbt r3,r4
> -54: dcbz r11,r6
> +54:
> +BEGIN_MMU_FTR_SECTION
> + dcbz r11,r6
> +END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH)
> .section __ex_table,"a"
> .align 2
> .long 54b,105f
> diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
> index 024acab..f5c60b3 100644
> --- a/arch/powerpc/mm/44x_mmu.c
> +++ b/arch/powerpc/mm/44x_mmu.c
> @@ -80,9 +80,12 @@ static void __init ppc44x_pin_tlb(unsigned int virt, unsigned int phys)
> :
> #ifdef CONFIG_PPC47x
> : "r" (PPC47x_TLB2_S_RWX),
> -#else
> +#elif defined(CONFIG_BGP_L1_WRITETHROUGH)
> + : "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_WL1 \
> + | PPC44x_TLB_U2 | PPC44x_TLB_M),
> +#else /* neither CONFIG_PPC47x or CONFIG_BGP_L1_WRITETHROUGH */
> : "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G),
> -#endif
> +#endif /* CONFIG_PPC47x */
> "r" (phys),
> "r" (virt | PPC44x_TLB_VALID | PPC44x_TLB_256M),
> "r" (entry),
Make this conditional at runtime.
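Something along these lines, sketched as plain C so it can be tried
outside the kernel. The WL1 and U2 values are the ones from the patch;
the M, G, SX, SW and SR values are quoted from memory of mmu-44x.h and
should be treated as assumptions, as should the sketch_pin_tlb_attrib
name and its boolean argument, which stands in for the
mmu_has_feature() test.

```c
#include <stdbool.h>
#include <stdint.h>

#define PPC44x_TLB_WL1 0x00100000u /* from the patch: write-through L1 */
#define PPC44x_TLB_U2  0x00002000u /* from the patch: User 2 */
#define PPC44x_TLB_M   0x00000200u /* assumed value: memory coherent */
#define PPC44x_TLB_G   0x00000100u /* assumed value: guarded */
#define PPC44x_TLB_SX  0x00000004u /* assumed value: supervisor exec */
#define PPC44x_TLB_SW  0x00000002u /* assumed value: supervisor write */
#define PPC44x_TLB_SR  0x00000001u /* assumed value: supervisor read */

/* Pick the attribute word for the pinned TLB entry at runtime instead
 * of with a preprocessor conditional. */
static uint32_t sketch_pin_tlb_attrib(bool needs_l1_writethrough)
{
    uint32_t attrib = PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX;

    if (needs_l1_writethrough)
        attrib |= PPC44x_TLB_WL1 | PPC44x_TLB_U2 | PPC44x_TLB_M;
    else
        attrib |= PPC44x_TLB_G;

    return attrib;
}
```

The result would then be passed as the "r" operand of the existing
inline asm, so one binary covers both cases.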
Cheers,
Ben.
^ permalink raw reply
* [PATCH] [RFC][V3] bluegene: use MMU feature flag to conditionalize L1 writethrough code
From: Eric Van Hensbergen @ 2011-06-07 21:36 UTC (permalink / raw)
To: linux-kernel; +Cc: linuxppc-dev, bg-linux
BG/P nodes need to be configured for writethrough to work in SMP
configurations. This patch adds the right hooks in the MMU code
to make sure BGP_L1_WRITETHROUGH configurations are setup for BG/P.
RFC note: this essentially just changes the ifdefs to use the
BEGIN_MMU_FTR_SECTION macros. A couple of things that I really didn't
like about this:
a) we introduced at least one extra op that isn't needed to get around
otherwise having multiple labels
b) we are introducing a bunch of no-ops in places that could be critical
paths and jimix says this may not be the best thing for multiple reasons
including having no-ops around the DCBZs is a bad thing
c) the ELSE_MMU_FTR_SECTION stuff appears to be broken (or I don't know
how to use it); it gave me the error:
Error: non-constant expression in ".if" statement
so I switched out the else clauses with redundant FTR_SECTIONS
(one for IFSET and one for IFCLR). Please someone throw me a clue
as to what I was doing wrong. I'm running gcc 4.3.2 from crosstools-ng
if it has some sort of impact.
Jimix has thrown me some code to try and do a better job by branching
to stub code inside of the MMU_FTRs so I don't have so many no-ops. I'm
open to alternatives. jimix also suggested changing NEED_L1_WRITETHROUGH
to DCBZ_BROKEN, which I'm open to if you think appropriate, or maybe
DCBZ_BROKEN_DAMNIT would be more apt.
Thanks for any help.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
---
arch/powerpc/include/asm/mmu-44x.h | 2 ++
arch/powerpc/kernel/head_44x.S | 27 +++++++++++++++++++++++++--
arch/powerpc/kernel/misc_32.S | 24 +++++++++++++++++++++++-
arch/powerpc/lib/copy_32.S | 13 +++++++++++--
arch/powerpc/mm/44x_mmu.c | 7 +++++--
5 files changed, 66 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu-44x.h b/arch/powerpc/include/asm/mmu-44x.h
index bf52d70..ca1b90c 100644
--- a/arch/powerpc/include/asm/mmu-44x.h
+++ b/arch/powerpc/include/asm/mmu-44x.h
@@ -8,6 +8,7 @@
#define PPC44x_MMUCR_TID 0x000000ff
#define PPC44x_MMUCR_STS 0x00010000
+#define PPC44x_MMUCR_U2 0x00200000
#define PPC44x_TLB_PAGEID 0
#define PPC44x_TLB_XLAT 1
@@ -32,6 +33,7 @@
/* Storage attribute and access control fields */
#define PPC44x_TLB_ATTR_MASK 0x0000ff80
+#define PPC44x_TLB_WL1 0x00100000 /* Write-through L1 */
#define PPC44x_TLB_U0 0x00008000 /* User 0 */
#define PPC44x_TLB_U1 0x00004000 /* User 1 */
#define PPC44x_TLB_U2 0x00002000 /* User 2 */
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 5e12b74..9a9a4ee 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -429,7 +429,17 @@ finish_tlb_load_44x:
andi. r10,r12,_PAGE_USER /* User page ? */
beq 1f /* nope, leave U bits empty */
rlwimi r11,r11,3,26,28 /* yes, copy S bits to U */
-1: tlbwe r11,r13,PPC44x_TLB_ATTRIB /* Write ATTRIB */
+1:
+BEGIN_MMU_FTR_SECTION
+ andi. r10, r11, PPC44x_TLB_I
+ bne 2f
+ oris r11,r11,PPC44x_TLB_WL1@h /* Add coherency for */
+ /* non-inhibited */
+ ori r11,r11,PPC44x_TLB_U2|PPC44x_TLB_M
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_L1_WRITETHROUGH)
+
+2:
+ tlbwe r11,r13,PPC44x_TLB_ATTRIB /* Write ATTRIB */
/* Done...restore registers and get out of here.
*/
@@ -799,7 +809,12 @@ skpinv: addi r4,r4,1 /* Increment */
sync
/* Initialize MMUCR */
+BEGIN_MMU_FTR_SECTION
+ lis r5, PPC44x_MMUCR_U2@h
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_L1_WRITETHROUGH)
+BEGIN_MMU_FTR_SECTION
li r5,0
+END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH)
mtspr SPRN_MMUCR,r5
sync
@@ -814,7 +829,15 @@ skpinv: addi r4,r4,1 /* Increment */
/* attrib fields */
/* Added guarded bit to protect against speculative loads/stores */
li r5,0
- ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G)
+BEGIN_MMU_FTR_SECTION
+ ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
+ PPC44x_TLB_G | PPC44x_TLB_U2)
+ oris r5,r5,PPC44x_TLB_WL1@h
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_L1_WRITETHROUGH)
+BEGIN_MMU_FTR_SECTION
+ ori r5,r5,(PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | \
+ PPC44x_TLB_G)
+END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH)
li r0,63 /* TLB slot 63 */
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 998a100..b54e2e8 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -506,7 +506,27 @@ _GLOBAL(clear_pages)
li r0,PAGE_SIZE/L1_CACHE_BYTES
slw r0,r0,r4
mtctr r0
-1: dcbz 0,r3
+ li r4, 0
+1:
+BEGIN_MMU_FTR_SECTION
+ /* assuming 32 byte cacheline */
+ stw r4, 0(r3)
+ stw r4, 4(r3)
+ stw r4, 8(r3)
+ stw r4, 12(r3)
+ stw r4, 16(r3)
+ stw r4, 20(r3)
+ stw r4, 24(r3)
+ stw r4, 28(r3)
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_L1_WRITETHROUGH)
+/*
+ * would have used an ELSE_MMU_FTR_SECTION here but it
+ * broke the code with Error: non-constant expression in ".if" statement
+ *
+ */
+BEGIN_MMU_FTR_SECTION
+ dcbz 0,r3
+END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH)
addi r3,r3,L1_CACHE_BYTES
bdnz 1b
blr
@@ -550,7 +570,9 @@ _GLOBAL(copy_page)
mtctr r0
1:
dcbt r11,r4
+BEGIN_MMU_FTR_SECTION
dcbz r5,r3
+END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH)
COPY_16_BYTES
#if L1_CACHE_BYTES >= 32
COPY_16_BYTES
diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
index 55f19f9..2646838 100644
--- a/arch/powerpc/lib/copy_32.S
+++ b/arch/powerpc/lib/copy_32.S
@@ -12,6 +12,7 @@
#include <asm/cache.h>
#include <asm/errno.h>
#include <asm/ppc_asm.h>
+#include <asm/mmu.h>
#define COPY_16_BYTES \
lwz r7,4(r4); \
@@ -98,7 +99,10 @@ _GLOBAL(cacheable_memzero)
bdnz 4b
3: mtctr r9
li r7,4
-10: dcbz r7,r6
+10:
+BEGIN_MMU_FTR_SECTION
+ dcbz r7,r6
+END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH)
addi r6,r6,CACHELINE_BYTES
bdnz 10b
clrlwi r5,r8,32-LG_CACHELINE_BYTES
@@ -187,7 +191,9 @@ _GLOBAL(cacheable_memcpy)
mtctr r0
beq 63f
53:
+BEGIN_MMU_FTR_SECTION
dcbz r11,r6
+END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH)
COPY_16_BYTES
#if L1_CACHE_BYTES >= 32
COPY_16_BYTES
@@ -368,7 +374,10 @@ _GLOBAL(__copy_tofrom_user)
mtctr r8
53: dcbt r3,r4
-54: dcbz r11,r6
+54:
+BEGIN_MMU_FTR_SECTION
+ dcbz r11,r6
+END_MMU_FTR_SECTION_IFCLR(MMU_FTR_NEED_L1_WRITETHROUGH)
.section __ex_table,"a"
.align 2
.long 54b,105f
diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
index 024acab..f5c60b3 100644
--- a/arch/powerpc/mm/44x_mmu.c
+++ b/arch/powerpc/mm/44x_mmu.c
@@ -80,9 +80,12 @@ static void __init ppc44x_pin_tlb(unsigned int virt, unsigned int phys)
:
#ifdef CONFIG_PPC47x
: "r" (PPC47x_TLB2_S_RWX),
-#else
+#elif defined(CONFIG_BGP_L1_WRITETHROUGH)
+ : "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_WL1 \
+ | PPC44x_TLB_U2 | PPC44x_TLB_M),
+#else /* neither CONFIG_PPC47x or CONFIG_BGP_L1_WRITETHROUGH */
: "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G),
-#endif
+#endif /* CONFIG_PPC47x */
"r" (phys),
"r" (virt | PPC44x_TLB_VALID | PPC44x_TLB_256M),
"r" (entry),
--
1.7.4.1
^ permalink raw reply related
* Re: [PATCH 7/7] [v2] drivers/misc: introduce Freescale hypervisor management driver
From: Arnd Bergmann @ 2011-06-07 19:34 UTC (permalink / raw)
To: Timur Tabi
Cc: Konrad Rzeszutek Wilk, greg, kumar.gala, linux-kernel,
Chris Metcalf, akpm, Deepak Saxena, linux-console, linuxppc-dev
In-Reply-To: <4DEE7A12.80302@freescale.com>
On Tuesday 07 June 2011 21:20:50 Timur Tabi wrote:
> Arnd Bergmann wrote:
> > For the spi flash driver that goes through the hypervisor abstraction,
> > I think drivers/virt/tile would be better than drivers/platform/tile,
> > but we should really have a new "abstract flash character driver" subsystem
> > for that.
>
> Why should it matter that the SPI flash driver goes through the hypervisor
> abstraction? One of the patches in this patchset is a TTY driver that goes
> through the Freescale hypervisor. I put the drivers in drivers/tty.
The driver in question is for a hypervisor that abstracts the flash memory
using a read/write interface. There is no way you can represent that as
a SPI host driver.
Arnd
^ permalink raw reply
* Re: [gianfar]bandwidth management problem on mpc8313 based board
From: Scott Wood @ 2011-06-07 19:22 UTC (permalink / raw)
To: Vijay Nikam; +Cc: linuxppc-dev
In-Reply-To: <BANLkTi=TVsXiWR1qS5upnM31kVk9qqXTmw@mail.gmail.com>
On Tue, 7 Jun 2011 18:32:37 +0530
Vijay Nikam <vijay.t.nikam@gmail.com> wrote:
> Dear All,
>
> I have mpc8313 powerpc based board with silicon revision 2.1. the
> processor has two ETH ports (eTsec1 and eTsec2) i.e. eth0 and eth1.
> eth0 is 1Gbps port and eth1 is 100Mbps port. On board there is L2
> switch from TANTOS2G (psb6972) supports one port 1Gbps,
> and from switch there are 4 more eth ports derived which are 100Mbps
> ports and port based VLAN is configured for this purpose.
>
> The interface between switch and eth0 (port of processor) is RGMII. So
> the processor port and switch port are connected on 1Gbps Link.
> The other 4 derived ports (100Mbps) are used to connect to external world.
> On this board Embedded Linux is running of kernel version 2.6.23 with HRT patch.
That's rather old.
> The ethernet controller driver in use is "gianfar" version 1.3
> The driver is configured properly as it determines both links 1000Mbps
> (eth0) and 100Mbps (eth1) also verified with ethtool.
>
> After this I started to perform bandwidth test using iperf tool.
> When I performed this test on one port out of 4 derived ports I am
> getting bandwidth in the range of 80-85Mbps
> but when the same test is performed on 2 ports simultaneously then the
> per port bandwidth is reduced to 40-45Mbps.
>
> But my understanding is all of the 4 ports should support 100Mbps
> bandwidth simultaneously (as base port is 1Gbps).
> Then why bandwidth gets reduced when more than one port are
> communicating simultaneously?
> Any reason or suggestion I should check for this problem?
What's your CPU utilization? The CPU may just not be able to keep up with
that much traffic, with the software you're running.
What packet size are you using?
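The reason packet size matters: the per-packet cost on the CPU is
roughly fixed, so the same bit rate means far more work with small
frames. A quick back-of-the-envelope helper, purely illustrative
(frames_per_second is a made-up name, and it ignores preamble,
inter-frame gap and other on-wire overhead):

```c
#include <stdint.h>

/* Approximate frame rate for a given payload bit rate and frame size. */
static uint64_t frames_per_second(uint64_t bits_per_second,
                                  uint32_t frame_bytes)
{
    return bits_per_second / (8ull * frame_bytes);
}
```

At the 85Mbps you measured, 1500-byte frames work out to about 7,000
frames per second, while 64-byte frames would be about 166,000 per
second.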
-Scott
^ permalink raw reply
* Re: [PATCH 7/7] [v2] drivers/misc: introduce Freescale hypervisor management driver
From: Arnd Bergmann @ 2011-06-07 19:16 UTC (permalink / raw)
To: Chris Metcalf
Cc: Konrad Rzeszutek Wilk, greg, kumar.gala, linux-kernel, akpm,
Deepak Saxena, linux-console, linuxppc-dev, Timur Tabi
In-Reply-To: <4DEE567E.7080102@tilera.com>
On Tuesday 07 June 2011 18:49:02 Chris Metcalf wrote:
> > You can probably argue that the tile drivers do fit in here as long as
> > they are specific to the hypervisor and not to some SOC specific hardware.
>
> Can you clarify that? I think you're contrasting something like an ARM
> core that was licensed and put in a SoC by some random vendor, and you
> could have an endless stream of drivers for that case. The Tilera core
> isn't being licensed; it's sold more like an Intel chip with a fixed set of
> interfaces available only from Tilera. The particular interface in
> question here is SPI, and the core itself knows how to boot the chip over
> SPI by finding an SPI ROM and reading the boot stream out of it directly
> after power-up.
>
> So does that match with your model of "drivers/platform/tile"? Maybe we
> have a winner! :-)
I'm not really against drivers/platform/tile for this, the only potential
problem that I see with this is that having more stuff in drivers/platform
might lead to having even more other stuff in there that should really
go into another place.
Obviously, if the device is a raw SPI host, the driver should actually go
into drivers/spi/spi_tile.c rather than drivers/platform/tile/spi.c.
For the spi flash driver that goes through the hypervisor abstraction,
I think drivers/virt/tile would be better than drivers/platform/tile,
but we should really have a new "abstract flash character driver" subsystem
for that.
Arnd
^ permalink raw reply
* Re: [PATCH 7/7] [v2] drivers/misc: introduce Freescale hypervisor management driver
From: Timur Tabi @ 2011-06-07 19:20 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Konrad Rzeszutek Wilk, greg, kumar.gala, linux-kernel,
Chris Metcalf, akpm, Deepak Saxena, linux-console, linuxppc-dev
In-Reply-To: <201106072116.21469.arnd@arndb.de>
Arnd Bergmann wrote:
> For the spi flash driver that goes through the hypervisor abstraction,
> I think drivers/virt/tile would be better than drivers/platform/tile,
> but we should really have a new "abstract flash character driver" subsystem
> for that.
Why should it matter that the SPI flash driver goes through the hypervisor
abstraction? One of the patches in this patchset is a TTY driver that goes
through the Freescale hypervisor. I put the drivers in drivers/tty.
--
Timur Tabi
Linux kernel developer at Freescale
* Re: [PATCH -v2] Audit: push audit success and retcode into arch ptrace.h
From: Eric Paris @ 2011-06-07 18:53 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-mips, linux-ia64, linux-sh, heiko.carstens, paulus, hpa,
sparclinux, linux-s390, richard, x86, mingo, fenghua.yu,
user-mode-linux-devel, microblaze-uclinux, jdike, viro, tglx,
monstr, tony.luck, linux-kernel, ralf, lethal, schwidefsky,
linux390, akpm, linuxppc-dev, davem
In-Reply-To: <20110607171952.GA25729@redhat.com>
On Tue, 2011-06-07 at 19:19 +0200, Oleg Nesterov wrote:
> On 06/03, Eric Paris wrote:
> >
> > The audit system previously expected arches calling to audit_syscall_exit to
> > supply as arguments if the syscall was a success and what the return code was.
> > Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things
> > by converting from negative retcodes to an audit internal magic value stating
> > success or failure. This helper was wrong and could indicate that a valid
> > pointer returned to userspace was a failed syscall. The fix is to fix the
> > layering foolishness. We now pass audit_syscall_exit a struct pt_reg and it
> > in turns calls back into arch code to collect the return value and to
> > determine if the syscall was a success or failure. We also define a generic
> > is_syscall_success() macro which determines success/failure based on if the
> > value is < -MAX_ERRNO. This works for arches like x86 which do not use a
> > separate mechanism to indicate syscall failure.
>
> I know nothing about audit, but the patch looks fine to me.
>
>
> But I have a slightly off-topic question:
>
> > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> > index 8a445a0..b7b1f88 100644
> > --- a/arch/x86/kernel/entry_64.S
> > +++ b/arch/x86/kernel/entry_64.S
> > @@ -53,6 +53,7 @@
> > #include <asm/paravirt.h>
> > #include <asm/ftrace.h>
> > #include <asm/percpu.h>
> > +#include <linux/err.h>
> >
> > /* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
> > #include <linux/elf-em.h>
> > @@ -564,17 +565,16 @@ auditsys:
> > jmp system_call_fastpath
> >
> > /*
> > - * Return fast path for syscall audit. Call audit_syscall_exit()
> > + * Return fast path for syscall audit. Call __audit_syscall_exit()
> > * directly and then jump back to the fast path with TIF_SYSCALL_AUDIT
> > * masked off.
> > */
> > sysret_audit:
> > movq RAX-ARGOFFSET(%rsp),%rsi /* second arg, syscall return value */
> > - cmpq $0,%rsi /* is it < 0? */
> > - setl %al /* 1 if so, 0 if not */
> > + cmpq $-MAX_ERRNO,%rsi /* is it < -MAX_ERRNO? */
> > + setbe %al /* 1 if so, 0 if not */
> > movzbl %al,%edi /* zero-extend that into %edi */
> > - inc %edi /* first arg, 0->1(AUDITSC_SUCCESS), 1->2(AUDITSC_FAILURE) */
> > - call audit_syscall_exit
> > + call __audit_syscall_exit
>
> With or without this patch, can't we call audit_syscall_exit() twice
> if there is something else in _TIF_WORK_SYSCALL_EXIT mask apart from
> SYSCALL_AUDIT ? First time it is called from asm, then from
> syscall_trace_leave(), no?
>
> For example. The task has TIF_SYSCALL_AUDIT and nothing else, it does
> system_call->auditsys->system_call_fastpath. What if it gets, say,
> TIF_SYSCALL_TRACE before ret_from_sys_call?
No harm is done calling twice. The first call will do the real work and
cleanup. It will set a flag in the audit data that the work has been
done (in_syscall == 0) thus the second call will then not do any real
work and won't have anything to clean up.
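The guard described above can be sketched roughly as follows. This is only
an illustration of the idea, not the audit code itself: struct
audit_ctx_sketch, audit_entry_sketch(), audit_exit_sketch() and the
work_done counter are hypothetical names, and only the in_syscall flag is
taken from the description.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-task audit data. */
struct audit_ctx_sketch {
	int in_syscall;   /* nonzero between syscall entry and the first exit call */
	int work_done;    /* illustration only: how often the real exit work ran */
};

/* Hypothetical entry hook: marks the context as being inside a syscall. */
static void audit_entry_sketch(struct audit_ctx_sketch *ctx)
{
	ctx->in_syscall = 1;
}

/* Hypothetical exit hook: safe to call more than once per syscall. */
static void audit_exit_sketch(struct audit_ctx_sketch *ctx)
{
	if (ctx == NULL || !ctx->in_syscall)
		return;            /* repeated call: nothing left to do */

	ctx->work_done++;          /* the real work and cleanup would happen here */
	ctx->in_syscall = 0;       /* record that the exit work is finished */
}
```

Calling audit_exit_sketch() twice after one audit_entry_sketch() runs the
work once; the second call returns at the in_syscall check.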
-Eric
* [PATCH] [RFC][V3] bluegene: add entry to cpu table
From: Eric Van Hensbergen @ 2011-06-07 18:47 UTC (permalink / raw)
To: linux-kernel; +Cc: linuxppc-dev, bg-linux
Create an entry for the BG/P chips and include bits to accommodate
the double fp2 fpu and the special MMU considerations like L1
writethrough.
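For readers unfamiliar with the cpu table, the sketch below shows how a
pvr_mask/pvr_value pair selects an entry, assuming the usual
(pvr & pvr_mask) == pvr_value comparison. The mask and value are the ones
from the new entry; pvr_matches_sketch() is a hypothetical helper name, not
a kernel function.

```c
#include <stdint.h>

/* Values taken from the Blue Gene/P entry in this patch. */
#define BGP_PVR_MASK  0xfffffff0u
#define BGP_PVR_VALUE 0x52131880u

/* Hypothetical helper: returns 1 if the PVR selects the table entry. */
static int pvr_matches_sketch(uint32_t pvr, uint32_t mask, uint32_t value)
{
	return (pvr & mask) == value;
}
```

With this mask only the low four bits of the PVR are ignored, so 0x52131880
through 0x5213188f would all select the entry.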
RFC Note: this patch fails scripts/checkpatch.pl because I
matched coding style of the surrounding existing code. Would
you rather have something checkpatch.pl clean or something
which is consistent with the surrounding code style?
The three I got were:
ERROR: Macros with complex values should be enclosed in parenthesis
ERROR: "foo* bar" should be "foo *bar"
WARNING: externs should be avoided in .c files
and I got these by copying other code as an example.
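As background on the first error, the sketch below shows why an
unparenthesized multi-term macro can change meaning at the point of use.
The flag names and values are hypothetical and chosen only to make the
effect visible; they are not the MMU_FTR_* definitions.

```c
/* Hypothetical flag values, chosen only to make the effect visible. */
#define FLAG_A 0x1u
#define FLAG_B 0x2u
#define FLAG_C 0x4u

/* Unparenthesized, in the style checkpatch.pl complains about. */
#define FLAGS_BARE    FLAG_A | FLAG_B
/* Parenthesized, as checkpatch.pl asks for. */
#define FLAGS_PAREN   (FLAG_A | FLAG_B)

/* Expands to FLAG_A | (FLAG_B & FLAG_C) because & binds tighter than |. */
static unsigned int test_bare(void)  { return FLAGS_BARE & FLAG_C; }

/* Expands to (FLAG_A | FLAG_B) & FLAG_C, the intended meaning. */
static unsigned int test_paren(void) { return FLAGS_PAREN & FLAG_C; }
```

The two functions return different values for the same flags, which is the
hazard the checkpatch.pl message is about. Whether any current user of such
a macro is affected depends on how the macro is used.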
Thanks for any feedback.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
---
arch/powerpc/include/asm/cputable.h | 1 +
arch/powerpc/include/asm/mmu.h | 9 +++++++++
arch/powerpc/kernel/cpu_setup_44x.S | 1 +
arch/powerpc/kernel/cputable.c | 16 ++++++++++++++++
4 files changed, 27 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index c0d842c..ce709b5 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -26,6 +26,7 @@
#define PPC_FEATURE_POWER6_EXT 0x00000200
#define PPC_FEATURE_ARCH_2_06 0x00000100
#define PPC_FEATURE_HAS_VSX 0x00000080
+#define PPC_FEATURE_HAS_FPU_FP2 0x00000040
#define PPC_FEATURE_PSERIES_PERFMON_COMPAT \
0x00000040
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 4138b21..97f9502 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -56,6 +56,13 @@
*/
#define MMU_FTR_NEED_DTLB_SW_LRU ASM_CONST(0x00200000)
+/* This indicates that the processor must use writethrough with
+ * the L1 in order to maintain SMP coherence on systems like the
+ * IBM BlueGene/L and IBM BlueGene/P.
+ */
+
+#define MMU_FTR_NEED_L1_WRITETHROUGH ASM_CONST(0x00400000)
+
/* Enable use of TLB reservation. Processor should support tlbsrx.
* instruction and MAS0[WQ].
*/
@@ -112,6 +119,8 @@
MMU_FTR_USE_PAIRED_MAS | \
MMU_FTR_TLBIEL | \
MMU_FTR_16M_PAGE
+#define MMU_FTRS_BGP MMU_FTR_TYPE_44x | MMU_FTR_16M_PAGE | \
+ MMU_FTR_NEED_L1_WRITETHROUGH
#ifndef __ASSEMBLY__
#include <asm/cputable.h>
diff --git a/arch/powerpc/kernel/cpu_setup_44x.S b/arch/powerpc/kernel/cpu_setup_44x.S
index e32b4a9..920aed6 100644
--- a/arch/powerpc/kernel/cpu_setup_44x.S
+++ b/arch/powerpc/kernel/cpu_setup_44x.S
@@ -35,6 +35,7 @@ _GLOBAL(__setup_cpu_440grx)
_GLOBAL(__setup_cpu_460ex)
_GLOBAL(__setup_cpu_460gt)
_GLOBAL(__setup_cpu_460sx)
+_GLOBAL(__setup_cpu_bgp)
_GLOBAL(__setup_cpu_apm821xx)
mflr r4
bl __init_fpu_44x
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 34d2722..550a078 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -57,6 +57,7 @@ extern void __setup_cpu_750fx(unsigned long offset, struct cpu_spec* spec);
extern void __setup_cpu_7400(unsigned long offset, struct cpu_spec* spec);
extern void __setup_cpu_7410(unsigned long offset, struct cpu_spec* spec);
extern void __setup_cpu_745x(unsigned long offset, struct cpu_spec* spec);
+extern void __setup_cpu_bgp(unsigned long offset, struct cpu_spec* spec);
#endif /* CONFIG_PPC32 */
#ifdef CONFIG_PPC64
extern void __setup_cpu_ppc970(unsigned long offset, struct cpu_spec* spec);
@@ -1737,6 +1738,21 @@ static struct cpu_spec __initdata cpu_specs[] = {
.machine_check = machine_check_440A,
.platform = "ppc440",
},
+ { /* Blue Gene/P */
+ .pvr_mask = 0xfffffff0,
+ .pvr_value = 0x52131880,
+ .cpu_name = "450 Blue Gene/P",
+ .cpu_features = CPU_FTRS_440x6,
+ .cpu_user_features = COMMON_USER_BOOKE |
+ PPC_FEATURE_HAS_FPU |
+ PPC_FEATURE_HAS_FPU_FP2,
+ .mmu_features = MMU_FTRS_BGP,
+ .icache_bsize = 32,
+ .dcache_bsize = 32,
+ .cpu_setup = __setup_cpu_bgp,
+ .machine_check = machine_check_440A,
+ .platform = "ppc440",
+ },
{ /* 460EX */
.pvr_mask = 0xffff0006,
.pvr_value = 0x13020002,
--
1.7.4.1
* Re: [PATCH -v2] Audit: push audit success and retcode into arch ptrace.h
From: Oleg Nesterov @ 2011-06-07 17:19 UTC (permalink / raw)
To: Eric Paris
Cc: linux-mips, linux-ia64, linux-sh, heiko.carstens, paulus, hpa,
sparclinux, linux-s390, richard, x86, mingo, fenghua.yu,
user-mode-linux-devel, microblaze-uclinux, jdike, viro, tglx,
monstr, tony.luck, linux-kernel, ralf, lethal, schwidefsky,
linux390, akpm, linuxppc-dev, davem
In-Reply-To: <20110603220451.23134.47368.stgit@paris.rdu.redhat.com>
On 06/03, Eric Paris wrote:
>
> The audit system previously expected arches calling to audit_syscall_exit to
> supply as arguments if the syscall was a success and what the return code was.
> Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things
> by converting from negative retcodes to an audit internal magic value stating
> success or failure. This helper was wrong and could indicate that a valid
> pointer returned to userspace was a failed syscall. The fix is to fix the
> layering foolishness. We now pass audit_syscall_exit a struct pt_reg and it
> in turns calls back into arch code to collect the return value and to
> determine if the syscall was a success or failure. We also define a generic
> is_syscall_success() macro which determines success/failure based on if the
> value is < -MAX_ERRNO. This works for arches like x86 which do not use a
> separate mechanism to indicate syscall failure.
I know nothing about audit, but the patch looks fine to me.
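For reference, the generic success check described in the changelog can be
sketched as below. This assumes the same convention as the kernel's
IS_ERR_VALUE(): the top MAX_ERRNO values of the unsigned range are error
codes and everything else, including large pointer values, is a success.
syscall_success_sketch() is a hypothetical name, not the macro from the
patch.

```c
#include <limits.h>

#define MAX_ERRNO 4095   /* same value as in the kernel's <linux/err.h> */

/* Hypothetical helper: nonzero if a raw syscall return value is not an errno. */
static int syscall_success_sketch(long ret)
{
	return (unsigned long)ret < (unsigned long)-MAX_ERRNO;
}
```

A plain "ret < 0" test would treat a value such as LONG_MIN as a failure,
while this comparison does not.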
But I have a slightly off-topic question:
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 8a445a0..b7b1f88 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -53,6 +53,7 @@
> #include <asm/paravirt.h>
> #include <asm/ftrace.h>
> #include <asm/percpu.h>
> +#include <linux/err.h>
>
> /* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
> #include <linux/elf-em.h>
> @@ -564,17 +565,16 @@ auditsys:
> jmp system_call_fastpath
>
> /*
> - * Return fast path for syscall audit. Call audit_syscall_exit()
> + * Return fast path for syscall audit. Call __audit_syscall_exit()
> * directly and then jump back to the fast path with TIF_SYSCALL_AUDIT
> * masked off.
> */
> sysret_audit:
> movq RAX-ARGOFFSET(%rsp),%rsi /* second arg, syscall return value */
> - cmpq $0,%rsi /* is it < 0? */
> - setl %al /* 1 if so, 0 if not */
> + cmpq $-MAX_ERRNO,%rsi /* is it < -MAX_ERRNO? */
> + setbe %al /* 1 if so, 0 if not */
> movzbl %al,%edi /* zero-extend that into %edi */
> - inc %edi /* first arg, 0->1(AUDITSC_SUCCESS), 1->2(AUDITSC_FAILURE) */
> - call audit_syscall_exit
> + call __audit_syscall_exit
With or without this patch, can't we call audit_syscall_exit() twice
if there is something else in _TIF_WORK_SYSCALL_EXIT mask apart from
SYSCALL_AUDIT ? First time it is called from asm, then from
syscall_trace_leave(), no?
For example. The task has TIF_SYSCALL_AUDIT and nothing else, it does
system_call->auditsys->system_call_fastpath. What if it gets, say,
TIF_SYSCALL_TRACE before ret_from_sys_call?
Oleg.
* Re: [PATCH 7/7] [v2] drivers/misc: introduce Freescale hypervisor management driver
From: Chris Metcalf @ 2011-06-07 16:49 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Konrad Rzeszutek Wilk, greg, kumar.gala, linux-kernel, akpm,
Deepak Saxena, linux-console, linuxppc-dev, Timur Tabi
In-Reply-To: <201106070908.16301.arnd@arndb.de>
On 6/7/2011 3:08 AM, Arnd Bergmann wrote:
> On Tuesday 07 June 2011 01:04:40 Chris Metcalf wrote:
>> There is certainly precedent for drivers that don't fit cleanly into an
>> existing category to go in drivers/<arch>, e.g. drivers/s390,
>> drivers/parisc, etc. There is also drivers/platform/x86, though that seems
>> to be for the bus "platform drivers" rather than just a random character
>> driver like the one in question.
> The drivers/s390 and drivers/parisc directories are from a distant past,
> we should not add new ones like them. drivers/platform is controversial,
> but I think it's ok for stuff that manages platform specific quirks.
> The main problem with that is that it doesn't work for embedded systems,
> by extension every ARM specific driver could go into drivers/platform/...
> and we don't want that.
>
> You can probably argue that the tile drivers do fit in here as long as
> they are specific to the hypervisor and not to some SOC specific hardware.
Can you clarify that? I think you're contrasting something like an ARM
core that was licensed and put in a SoC by some random vendor, and you
could have an endless stream of drivers for that case. The Tilera core
isn't being licensed; it's sold more like an Intel chip with a fixed set of
interfaces available only from Tilera. The particular interface in
question here is SPI, and the core itself knows how to boot the chip over
SPI by finding an SPI ROM and reading the boot stream out of it directly
after power-up.
So does that match with your model of "drivers/platform/tile"? Maybe we
have a winner! :-)
--
Chris Metcalf, Tilera Corp.
http://www.tilera.com
* [RFC PATCH V1 7/7] cpuidle: (POWER) Handle power_save=off
From: Trinabh Gupta @ 2011-06-07 16:30 UTC (permalink / raw)
To: linux-pm, linuxppc-dev; +Cc: linux-kernel
In-Reply-To: <20110607162847.6848.44707.stgit@tringupt.in.ibm.com>
This patch prevents pseries_idle_driver from being registered when the
powersave=off kernel boot option is specified. To do this,
boot_option_idle_override is used in the same way as it is on x86.
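The gating can be sketched in isolation as follows. The enum and the
variable name follow the patch; powersave_off_sketch() and
idle_probe_sketch() are hypothetical stand-ins for the boot option handler
and the driver probe, reduced to the one check that matters here.

```c
#include <errno.h>

/* Mirrors the enum added to asm/processor.h by this patch. */
enum idle_boot_override { IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF };

static unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;

/* Hypothetical stand-in for the powersave=off boot option handler. */
static void powersave_off_sketch(void)
{
	boot_option_idle_override = IDLE_POWERSAVE_OFF;
}

/* Hypothetical stand-in for the driver probe: refuse to load when overridden. */
static int idle_probe_sketch(void)
{
	if (boot_option_idle_override != IDLE_NO_OVERRIDE)
		return -ENODEV;
	return 0;
}
```

Once the handler has run, every later probe fails with -ENODEV, so the
driver is never registered.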
Signed-off-by: Trinabh Gupta <trinabh@linux.vnet.ibm.com>
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/processor.h | 3 +++
arch/powerpc/kernel/idle.c | 4 ++++
arch/powerpc/platforms/pseries/processor_idle.c | 4 ++++
3 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index d50c2b6..0ce167e 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -377,6 +377,9 @@ static inline unsigned long get_clean_sp(struct pt_regs *regs, int is_32)
}
#endif
+extern unsigned long boot_option_idle_override;
+enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
+
#endif /* __KERNEL__ */
#endif /* __ASSEMBLY__ */
#endif /* _ASM_POWERPC_PROCESSOR_H */
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 932392b..61515f4 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -39,9 +39,13 @@
#define cpu_should_die() 0
#endif
+unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
+EXPORT_SYMBOL(boot_option_idle_override);
+
static int __init powersave_off(char *arg)
{
ppc_md.power_save = NULL;
+ boot_option_idle_override = IDLE_POWERSAVE_OFF;
return 0;
}
__setup("powersave=off", powersave_off);
diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
index ff44b49..c4c3383 100644
--- a/arch/powerpc/platforms/pseries/processor_idle.c
+++ b/arch/powerpc/platforms/pseries/processor_idle.c
@@ -288,6 +288,10 @@ static int pseries_idle_probe(void)
return -EPERM;
}
+ if (boot_option_idle_override != IDLE_NO_OVERRIDE) {
+ return -ENODEV;
+ }
+
if (!firmware_has_feature(FW_FEATURE_SPLPAR)) {
printk(KERN_DEBUG "Using default idle\n");
return -ENODEV;
* [RFC PATCH V1 6/7] cpuidle: (POWER) Enable cpuidle and directly call cpuidle_idle_call() for pSeries
From: Trinabh Gupta @ 2011-06-07 16:30 UTC (permalink / raw)
To: linux-pm, linuxppc-dev; +Cc: linux-kernel
In-Reply-To: <20110607162847.6848.44707.stgit@tringupt.in.ibm.com>
This patch enables cpuidle for pSeries and calls cpuidle_idle_call()
directly from the idle loop. As a result, the pseries_idle cpuidle
driver registered with the cpuidle subsystem takes effect. This patch
also removes the routines pseries_shared_idle_sleep and
pseries_dedicated_idle_sleep, as they are now implemented as part of
the pseries_idle cpuidle driver.
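The hook indirection can be pictured with the sketch below. All names
ending in _sketch are hypothetical stand-ins, and the hook type is assumed
to be void (*)(void). Note one deliberate substitution: the patch installs
cpuidle_idle_call through a (void *) cast, whereas the sketch uses a small
adapter with the hook's own signature.

```c
/* Hypothetical stand-in for the machine-description hook table. */
struct machdep_sketch {
	void (*power_save)(void);
};

static int idle_calls;   /* illustration only: counts idle entries */

/* Hypothetical stand-in for cpuidle_idle_call(), which returns int. */
static int cpuidle_idle_call_sketch(void)
{
	idle_calls++;
	return 0;
}

/* Adapter with the hook's own signature, so no function-pointer cast is needed. */
static void idle_hook_sketch(void)
{
	(void)cpuidle_idle_call_sketch();
}

/* Hypothetical idle-loop step: use the hook if one is installed. */
static void idle_once_sketch(const struct machdep_sketch *md)
{
	if (md->power_save)
		md->power_save();
}
```

With power_save left NULL, as powersave=off does, the idle step simply
skips the call.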
Signed-off-by: Trinabh Gupta <trinabh@linux.vnet.ibm.com>
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
arch/powerpc/platforms/Kconfig | 6 ++
arch/powerpc/platforms/pseries/Kconfig | 2 -
arch/powerpc/platforms/pseries/setup.c | 86 +-------------------------------
include/linux/cpuidle.h | 2 -
4 files changed, 9 insertions(+), 87 deletions(-)
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index f970ca2..80e3592 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -206,6 +206,12 @@ config PPC_PASEMI_CPUFREQ
endmenu
+menu "CPUIdle driver"
+
+source "drivers/cpuidle/Kconfig"
+
+endmenu
+
config PPC601_SYNC_FIX
bool "Workarounds for PPC601 bugs"
depends on 6xx && (PPC_PREP || PPC_PMAC)
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 877bac6..9729086 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -121,7 +121,7 @@ config DTL
config PSERIES_IDLE
tristate "Cpuidle driver for pSeries platforms"
- depends on CPU_IDLE
+ select CPU_IDLE
depends on PPC_PSERIES
default y
help
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 6893a0c..75d024b 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -39,6 +39,7 @@
#include <linux/irq.h>
#include <linux/seq_file.h>
#include <linux/root_dev.h>
+#include <linux/cpuidle.h>
#include <asm/mmu.h>
#include <asm/processor.h>
@@ -74,9 +75,6 @@ EXPORT_SYMBOL(CMO_PageSize);
int fwnmi_active; /* TRUE if an FWNMI handler is present */
-static void pseries_shared_idle_sleep(void);
-static void pseries_dedicated_idle_sleep(void);
-
static struct device_node *pSeries_mpic_node;
static void pSeries_show_cpuinfo(struct seq_file *m)
@@ -373,18 +371,9 @@ static void __init pSeries_setup_arch(void)
pSeries_nvram_init();
- /* Choose an idle loop */
if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
vpa_init(boot_cpuid);
- if (get_lppaca()->shared_proc) {
- printk(KERN_DEBUG "Using shared processor idle loop\n");
- ppc_md.power_save = pseries_shared_idle_sleep;
- } else {
- printk(KERN_DEBUG "Using dedicated idle loop\n");
- ppc_md.power_save = pseries_dedicated_idle_sleep;
- }
- } else {
- printk(KERN_DEBUG "Using default idle loop\n");
+ ppc_md.power_save = (void *)cpuidle_idle_call;
}
if (firmware_has_feature(FW_FEATURE_LPAR))
@@ -584,77 +573,6 @@ static int __init pSeries_probe(void)
return 1;
}
-static void pseries_dedicated_idle_sleep(void)
-{
- unsigned int cpu = smp_processor_id();
- unsigned long start_snooze;
- unsigned long in_purr, out_purr;
- long snooze = __get_cpu_var(smt_snooze_delay);
-
- /*
- * Indicate to the HV that we are idle. Now would be
- * a good time to find other work to dispatch.
- */
- get_lppaca()->idle = 1;
- get_lppaca()->donate_dedicated_cpu = 1;
- in_purr = mfspr(SPRN_PURR);
-
- /*
- * We come in with interrupts disabled, and need_resched()
- * has been checked recently. If we should poll for a little
- * while, do so.
- */
- if (snooze) {
- start_snooze = get_tb() + snooze * tb_ticks_per_usec;
- local_irq_enable();
- set_thread_flag(TIF_POLLING_NRFLAG);
-
- while ((snooze < 0) || (get_tb() < start_snooze)) {
- if (need_resched() || cpu_is_offline(cpu))
- goto out;
- ppc64_runlatch_off();
- HMT_low();
- HMT_very_low();
- }
-
- HMT_medium();
- clear_thread_flag(TIF_POLLING_NRFLAG);
- smp_mb();
- local_irq_disable();
- if (need_resched() || cpu_is_offline(cpu))
- goto out;
- }
-
- cede_processor();
-
-out:
- HMT_medium();
- out_purr = mfspr(SPRN_PURR);
- get_lppaca()->wait_state_cycles += out_purr - in_purr;
- get_lppaca()->donate_dedicated_cpu = 0;
- get_lppaca()->idle = 0;
-}
-
-static void pseries_shared_idle_sleep(void)
-{
- /*
- * Indicate to the HV that we are idle. Now would be
- * a good time to find other work to dispatch.
- */
- get_lppaca()->idle = 1;
-
- /*
- * Yield the processor to the hypervisor. We return if
- * an external interrupt occurs (which are driven prior
- * to returning here) or if a prod occurs from another
- * processor. When returning here, external interrupts
- * are enabled.
- */
- cede_processor();
-
- get_lppaca()->idle = 0;
-}
-
static int pSeries_pci_probe_mode(struct pci_bus *bus)
{
if (firmware_has_feature(FW_FEATURE_LPAR))
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index c904188..701bc9b 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -129,7 +129,6 @@ struct cpuidle_driver {
#ifdef CONFIG_CPU_IDLE
extern void disable_cpuidle(void);
extern int cpuidle_idle_call(void);
-
extern int cpuidle_register_driver(struct cpuidle_driver *drv);
struct cpuidle_driver *cpuidle_get_driver(void);
extern void cpuidle_unregister_driver(struct cpuidle_driver *drv);
@@ -144,7 +143,6 @@ extern void cpuidle_disable_device(struct cpuidle_device *dev);
#else
static inline void disable_cpuidle(void) { }
static inline int cpuidle_idle_call(void) { return -ENODEV; }
-
static inline int cpuidle_register_driver(struct cpuidle_driver *drv)
{return -ENODEV; }
static inline struct cpuidle_driver *cpuidle_get_driver(void) {return NULL; }
* [RFC PATCH V1 5/7] cpuidle: (POWER) cpuidle driver for pSeries
From: Trinabh Gupta @ 2011-06-07 16:30 UTC (permalink / raw)
To: linux-pm, linuxppc-dev; +Cc: linux-kernel
In-Reply-To: <20110607162847.6848.44707.stgit@tringupt.in.ibm.com>
This patch implements a cpuidle driver for pSeries based on the
routines pseries_dedicated_idle_sleep and pseries_shared_idle_sleep.
The driver is built only if CONFIG_CPU_IDLE is set. This
cpuidle driver uses global registration of idle states rather
than per-cpu registration.
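The way the driver fills its global state list from one of the two tables
can be sketched as below. The table names and MAX_IDLE_STATE_COUNT follow
the patch; struct idle_state_sketch, enter_stub() and fill_states_sketch()
are hypothetical, reduced versions of the real structures and init routine.

```c
#include <stddef.h>

#define MAX_IDLE_STATE_COUNT 2

/* Reduced, hypothetical version of struct cpuidle_state. */
struct idle_state_sketch {
	const char *name;
	int (*enter)(void);      /* NULL marks an unused table slot */
};

static int enter_stub(void) { return 0; }

/* Same shape as the tables in the patch: two dedicated states, one shared. */
static const struct idle_state_sketch dedicated_states[MAX_IDLE_STATE_COUNT] = {
	{ "snooze", enter_stub },
	{ "CEDE",   enter_stub },
};
static const struct idle_state_sketch shared_states[MAX_IDLE_STATE_COUNT] = {
	{ "Shared Cede", enter_stub },
};

/* Copy the enabled states of one table into out[]; returns how many were copied. */
static int fill_states_sketch(const struct idle_state_sketch *table,
			      int max_cstate,
			      struct idle_state_sketch *out)
{
	int cstate, count = 0;

	for (cstate = 0; cstate < MAX_IDLE_STATE_COUNT; cstate++) {
		if (cstate > max_cstate)
			break;
		if (table[cstate].enter == NULL)
			continue;
		out[count++] = table[cstate];   /* structure copy */
	}
	return count;
}
```

Because the shared table leaves its second slot empty, the shared case ends
up with a single registered state, while the dedicated case gets both.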
Signed-off-by: Trinabh Gupta <trinabh@linux.vnet.ibm.com>
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/system.h | 8 +
arch/powerpc/kernel/sysfs.c | 2
arch/powerpc/platforms/pseries/Kconfig | 9 +
arch/powerpc/platforms/pseries/Makefile | 1
arch/powerpc/platforms/pseries/processor_idle.c | 331 +++++++++++++++++++++++
arch/powerpc/platforms/pseries/pseries.h | 3
arch/powerpc/platforms/pseries/setup.c | 3
arch/powerpc/platforms/pseries/smp.c | 1
8 files changed, 355 insertions(+), 3 deletions(-)
create mode 100644 arch/powerpc/platforms/pseries/processor_idle.c
diff --git a/arch/powerpc/include/asm/system.h b/arch/powerpc/include/asm/system.h
index 811cdf1..b5b4fc4 100644
--- a/arch/powerpc/include/asm/system.h
+++ b/arch/powerpc/include/asm/system.h
@@ -224,6 +224,14 @@ extern void *zalloc_maybe_bootmem(size_t size, gfp_t mask);
extern int powersave_nap; /* set if nap mode can be used in idle loop */
void cpu_idle_wait(void);
+#ifdef CONFIG_PSERIES_IDLE
+extern void update_smt_snooze_delay(int snooze);
+extern int pseries_notify_cpuidle_add_cpu(int cpu);
+#else
+static inline void update_smt_snooze_delay(int snooze) {}
+static inline int pseries_notify_cpuidle_add_cpu(int cpu) { return 0; }
+#endif
+
/*
* Atomic exchange
*
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index f0f2199..fbb666f 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -18,6 +18,7 @@
#include <asm/machdep.h>
#include <asm/smp.h>
#include <asm/pmc.h>
+#include <asm/system.h>
#include "cacheinfo.h"
@@ -51,6 +52,7 @@ static ssize_t store_smt_snooze_delay(struct sys_device *dev,
return -EINVAL;
per_cpu(smt_snooze_delay, cpu->sysdev.id) = snooze;
+ update_smt_snooze_delay(snooze);
return count;
}
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 71af4c5..877bac6 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -118,3 +118,12 @@ config DTL
which are accessible through a debugfs file.
Say N if you are unsure.
+
+config PSERIES_IDLE
+ tristate "Cpuidle driver for pSeries platforms"
+ depends on CPU_IDLE
+ depends on PPC_PSERIES
+ default y
+ help
+ Select this option to enable processor idle state management
+ through cpuidle subsystem.
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 3556e40..236db46 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o
obj-$(CONFIG_CMM) += cmm.o
obj-$(CONFIG_DTL) += dtl.o
obj-$(CONFIG_IO_EVENT_IRQ) += io_event_irq.o
+obj-$(CONFIG_PSERIES_IDLE) += processor_idle.o
ifeq ($(CONFIG_PPC_PSERIES),y)
obj-$(CONFIG_SUSPEND) += suspend.o
diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
new file mode 100644
index 0000000..ff44b49
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/processor_idle.c
@@ -0,0 +1,331 @@
+/*
+ * processor_idle - idle state cpuidle driver.
+ * Adapted from drivers/idle/intel_idle.c and
+ * drivers/acpi/processor_idle.c
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/cpuidle.h>
+#include <linux/cpu.h>
+
+#include <asm/paca.h>
+#include <asm/reg.h>
+#include <asm/system.h>
+#include <asm/machdep.h>
+#include <asm/firmware.h>
+
+#include "plpar_wrappers.h"
+#include "pseries.h"
+
+struct cpuidle_driver pseries_idle_driver = {
+ .name = "pseries_idle",
+ .owner = THIS_MODULE,
+};
+
+#define MAX_IDLE_STATE_COUNT 2
+
+static int max_cstate = MAX_IDLE_STATE_COUNT - 1;
+static struct cpuidle_device __percpu *pseries_idle_cpuidle_devices;
+static struct cpuidle_state *cpuidle_state_table;
+
+void update_smt_snooze_delay(int snooze)
+{
+ struct cpuidle_driver *drv = cpuidle_get_driver();
+ if (drv)
+ drv->states[0].target_residency = snooze;
+}
+
+static int snooze_loop(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
+ int index)
+{
+ unsigned long in_purr, out_purr;
+ ktime_t kt_before, kt_after;
+ s64 usec_delta;
+
+ /*
+ * Indicate to the HV that we are idle. Now would be
+ * a good time to find other work to dispatch.
+ */
+ get_lppaca()->idle = 1;
+ get_lppaca()->donate_dedicated_cpu = 1;
+ in_purr = mfspr(SPRN_PURR);
+
+ kt_before = ktime_get_real();
+
+ local_irq_enable();
+ set_thread_flag(TIF_POLLING_NRFLAG);
+ while (!need_resched()) {
+ ppc64_runlatch_off();
+ HMT_low();
+ HMT_very_low();
+ }
+ HMT_medium();
+ clear_thread_flag(TIF_POLLING_NRFLAG);
+ smp_mb();
+ local_irq_disable();
+
+ kt_after = ktime_get_real();
+ usec_delta = ktime_to_us(ktime_sub(kt_after, kt_before));
+
+ out_purr = mfspr(SPRN_PURR);
+ get_lppaca()->wait_state_cycles += out_purr - in_purr;
+ get_lppaca()->donate_dedicated_cpu = 0;
+ get_lppaca()->idle = 0;
+
+ dev->last_residency = (int)usec_delta;
+
+ return index;
+}
+
+static int dedicated_cede_loop(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
+ int index)
+{
+ unsigned long in_purr, out_purr;
+ ktime_t kt_before, kt_after;
+ s64 usec_delta;
+
+ /*
+ * Indicate to the HV that we are idle. Now would be
+ * a good time to find other work to dispatch.
+ */
+ get_lppaca()->idle = 1;
+ get_lppaca()->donate_dedicated_cpu = 1;
+ in_purr = mfspr(SPRN_PURR);
+
+ kt_before = ktime_get_real();
+
+ ppc64_runlatch_off();
+ HMT_medium();
+ cede_processor();
+
+ kt_after = ktime_get_real();
+ usec_delta = ktime_to_us(ktime_sub(kt_after, kt_before));
+
+ out_purr = mfspr(SPRN_PURR);
+ get_lppaca()->wait_state_cycles += out_purr - in_purr;
+ get_lppaca()->donate_dedicated_cpu = 0;
+ get_lppaca()->idle = 0;
+
+ dev->last_residency = (int)usec_delta;
+
+ return index;
+}
+
+static int shared_cede_loop(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
+ int index)
+{
+ unsigned long in_purr, out_purr;
+ ktime_t kt_before, kt_after;
+ s64 usec_delta;
+
+ /*
+ * Indicate to the HV that we are idle. Now would be
+ * a good time to find other work to dispatch.
+ */
+ get_lppaca()->idle = 1;
+ get_lppaca()->donate_dedicated_cpu = 1;
+ in_purr = mfspr(SPRN_PURR);
+
+ kt_before = ktime_get_real();
+ /*
+ * Yield the processor to the hypervisor. We return if
+ * an external interrupt occurs (which are driven prior
+ * to returning here) or if a prod occurs from another
+ * processor. When returning here, external interrupts
+ * are enabled.
+ */
+ cede_processor();
+
+ kt_after = ktime_get_real();
+
+ usec_delta = ktime_to_us(ktime_sub(kt_after, kt_before));
+
+ out_purr = mfspr(SPRN_PURR);
+ get_lppaca()->wait_state_cycles += out_purr - in_purr;
+ get_lppaca()->donate_dedicated_cpu = 0;
+ get_lppaca()->idle = 0;
+
+ dev->last_residency = (int)usec_delta;
+
+ return index;
+}
+
+/*
+ * States for dedicated partition case.
+ */
+static struct cpuidle_state dedicated_states[MAX_IDLE_STATE_COUNT] = {
+ { /* Snooze */
+ .name = "snooze",
+ .desc = "snooze",
+ .flags = CPUIDLE_FLAG_TIME_VALID,
+ .exit_latency = 0,
+ .target_residency = 0,
+ .enter = &snooze_loop },
+ { /* CEDE */
+ .name = "CEDE",
+ .desc = "CEDE",
+ .flags = CPUIDLE_FLAG_TIME_VALID,
+ .exit_latency = 1,
+ .target_residency = 10,
+ .enter = &dedicated_cede_loop },
+};
+
+/*
+ * States for shared partition case.
+ */
+static struct cpuidle_state shared_states[MAX_IDLE_STATE_COUNT] = {
+ { /* Shared Cede */
+ .name = "Shared Cede",
+ .desc = "Shared Cede",
+ .flags = CPUIDLE_FLAG_TIME_VALID,
+ .exit_latency = 0,
+ .target_residency = 0,
+ .enter = &shared_cede_loop },
+};
+
+int pseries_notify_cpuidle_add_cpu(int cpu)
+{
+ struct cpuidle_device *dev =
+ per_cpu_ptr(pseries_idle_cpuidle_devices, cpu);
+ if (dev && cpuidle_get_driver()) {
+ cpuidle_disable_device(dev);
+ cpuidle_enable_device(dev);
+ }
+ return 0;
+}
+
+/*
+ * pseries_idle_cpuidle_driver_init()
+ */
+static int pseries_idle_cpuidle_driver_init(void)
+{
+ int cstate;
+ struct cpuidle_driver *drv = &pseries_idle_driver;
+
+ drv->state_count = 0;
+
+ for (cstate = 0; cstate < MAX_IDLE_STATE_COUNT; ++cstate) {
+
+ if (cstate > max_cstate)
+ break;
+
+ /* is the state not enabled? */
+ if (cpuidle_state_table[cstate].enter == NULL)
+ continue;
+
+ drv->states[drv->state_count] = /* structure copy */
+ cpuidle_state_table[cstate];
+
+ if (cpuidle_state_table == dedicated_states)
+ drv->states[drv->state_count].target_residency =
+ __get_cpu_var(smt_snooze_delay);
+
+ drv->state_count += 1;
+ }
+
+ return 0;
+}
+
+/* pseries_idle_devices_uninit(void)
+ * unregister cpuidle devices and de-allocate memory
+ */
+static void pseries_idle_devices_uninit(void)
+{
+ int i;
+ struct cpuidle_device *dev;
+
+ for_each_possible_cpu(i) {
+ dev = per_cpu_ptr(pseries_idle_cpuidle_devices, i);
+ cpuidle_unregister_device(dev);
+ }
+
+ free_percpu(pseries_idle_cpuidle_devices);
+ return;
+}
+
+/* pseries_idle_devices_init()
+ * allocate, initialize and register cpuidle device
+ */
+static int pseries_idle_devices_init(void)
+{
+ int i;
+ struct cpuidle_driver *drv = &pseries_idle_driver;
+ struct cpuidle_device *dev;
+
+ pseries_idle_cpuidle_devices = alloc_percpu(struct cpuidle_device);
+ if (pseries_idle_cpuidle_devices == NULL)
+ return -ENOMEM;
+
+ for_each_possible_cpu(i) {
+ dev = per_cpu_ptr(pseries_idle_cpuidle_devices, i);
+ dev->state_count = drv->state_count;
+ dev->cpu = i;
+ if (cpuidle_register_device(dev)) {
+ printk(KERN_DEBUG "cpuidle_register_device %d failed!\n",
+ i);
+ return -EIO;
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * pseries_idle_probe()
+ * Choose state table for shared versus dedicated partition
+ */
+static int pseries_idle_probe(void)
+{
+ if (max_cstate == 0) {
+ printk(KERN_DEBUG "pseries processor idle disabled.\n");
+ return -EPERM;
+ }
+
+ if (!firmware_has_feature(FW_FEATURE_SPLPAR)) {
+ printk(KERN_DEBUG "Using default idle\n");
+ return -ENODEV;
+ }
+
+ if (get_lppaca()->shared_proc)
+ cpuidle_state_table = shared_states;
+ else
+ cpuidle_state_table = dedicated_states;
+
+ return 0;
+}
+
+static int __init pseries_processor_idle_init(void)
+{
+ int retval;
+
+ retval = pseries_idle_probe();
+ if (retval)
+ return retval;
+
+ pseries_idle_cpuidle_driver_init();
+ retval = cpuidle_register_driver(&pseries_idle_driver);
+ if (retval) {
+ printk(KERN_DEBUG "Registration of pseries driver failed.\n");
+ return retval;
+ }
+
+ retval = pseries_idle_devices_init();
+ if (retval) {
+ pseries_idle_devices_uninit();
+ cpuidle_unregister_driver(&pseries_idle_driver);
+ return retval;
+ }
+
+ printk(KERN_DEBUG "pseries_idle_driver registered\n");
+
+ return 0;
+}
+
+device_initcall(pseries_processor_idle_init);
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index e9f6d28..7c60380 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -56,4 +56,7 @@ extern struct device_node *dlpar_configure_connector(u32);
extern int dlpar_attach_node(struct device_node *);
extern int dlpar_detach_node(struct device_node *);
+/* Snooze Delay, pseries_idle */
+DECLARE_PER_CPU(long, smt_snooze_delay);
+
#endif /* _PSERIES_PSERIES_H */
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 593acce..6893a0c 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -584,9 +584,6 @@ static int __init pSeries_probe(void)
return 1;
}
-
-DECLARE_PER_CPU(long, smt_snooze_delay);
-
static void pseries_dedicated_idle_sleep(void)
{
unsigned int cpu = smp_processor_id();
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index fbffd7e..2e46883 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -150,6 +150,7 @@ static void __devinit smp_xics_setup_cpu(int cpu)
set_cpu_current_state(cpu, CPU_STATE_ONLINE);
set_default_offline_state(cpu);
#endif
+ pseries_notify_cpuidle_add_cpu(cpu);
}
static int __devinit smp_pSeries_kick_cpu(int nr)
^ permalink raw reply related
* [RFC PATCH V1 4/7] cpuidle: (powerpc) Add cpu_idle_wait() to allow switching idle routines
From: Trinabh Gupta @ 2011-06-07 16:30 UTC (permalink / raw)
To: linux-pm, linuxppc-dev; +Cc: linux-kernel
In-Reply-To: <20110607162847.6848.44707.stgit@tringupt.in.ibm.com>
This patch provides the cpu_idle_wait() routine required
by the cpuidle subsystem. Almost all of the code is borrowed
from x86.
Signed-off-by: Trinabh Gupta <trinabh@linux.vnet.ibm.com>
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
arch/powerpc/Kconfig | 4 ++++
arch/powerpc/include/asm/system.h | 1 +
arch/powerpc/kernel/idle.c | 18 ++++++++++++++++++
3 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2729c66..518beda 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -87,6 +87,10 @@ config ARCH_HAS_ILOG2_U64
bool
default y if 64BIT
+config ARCH_HAS_CPU_IDLE_WAIT
+ bool
+ default y
+
config GENERIC_HWEIGHT
bool
default y
diff --git a/arch/powerpc/include/asm/system.h b/arch/powerpc/include/asm/system.h
index 2dc595d..811cdf1 100644
--- a/arch/powerpc/include/asm/system.h
+++ b/arch/powerpc/include/asm/system.h
@@ -222,6 +222,7 @@ extern unsigned long klimit;
extern void *zalloc_maybe_bootmem(size_t size, gfp_t mask);
extern int powersave_nap; /* set if nap mode can be used in idle loop */
+void cpu_idle_wait(void);
/*
* Atomic exchange
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 39a2baa..932392b 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -102,6 +102,24 @@ void cpu_idle(void)
}
}
+static void do_nothing(void *unused)
+{
+}
+
+/*
+ * cpu_idle_wait - Used to ensure that all the CPUs come out of the old
+ * idle loop and start using the new idle loop.
+ * Required while changing idle handler on SMP systems.
+ * Caller must have changed idle handler to the new value before the call.
+ */
+void cpu_idle_wait(void)
+{
+ smp_mb();
+ /* kick all the CPUs so that they exit out of old idle routine */
+ smp_call_function(do_nothing, NULL, 1);
+}
+EXPORT_SYMBOL_GPL(cpu_idle_wait);
+
int powersave_nap;
#ifdef CONFIG_SYSCTL
^ permalink raw reply related
* [RFC PATCH V1 3/7] cpuidle: stop using pm_idle
From: Trinabh Gupta @ 2011-06-07 16:29 UTC (permalink / raw)
To: linux-pm, linuxppc-dev; +Cc: linux-kernel
In-Reply-To: <20110607162847.6848.44707.stgit@tringupt.in.ibm.com>
From: Len Brown <len.brown@intel.com>
pm_idle does not scale as an idle handler registration mechanism.
Don't use it for cpuidle. Instead, call cpuidle directly, and
allow architectures to use pm_idle as an arch-specific default
if they need it, i.e.:
cpu_idle()
...
if (cpuidle_idle_call())
pm_idle();
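For illustration only (not part of the patch): a minimal userspace sketch of the return-code contract described above, assuming the function is named cpuidle_idle_call() as declared in include/linux/cpuidle.h by this patch. Every `model_*` name is hypothetical and exists only in this sketch; the flags loosely mirror the `off`, `initialized` and `dev->enabled` checks in drivers/cpuidle/cpuidle.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of this is kernel code. */
static bool model_off;           /* mirrors "off": cpuidle disabled */
static bool model_initialized;   /* mirrors "initialized": handler installed */
static bool model_dev_enabled;   /* mirrors "dev && dev->enabled" */
static int model_fallback_calls; /* counts calls to the arch default idle */

/* Returns 0 if cpuidle handled the idle period, a negative errno otherwise. */
static int model_cpuidle_idle_call(void)
{
	if (model_off || !model_initialized)
		return -ENODEV;
	if (!model_dev_enabled)
		return -EBUSY;
	/* A real implementation would select and enter an idle state here. */
	return 0;
}

/* Stand-in for the arch-specific default idle routine. */
static void model_pm_idle(void)
{
	model_fallback_calls++;
}

/* One pass through the idle loop body: fall back only on failure. */
static void model_idle_once(void)
{
	if (model_cpuidle_idle_call())
		model_pm_idle();
}
```

In this model the arch default runs only while cpuidle is disabled, not yet installed, or has no enabled device for the CPU; once all three checks pass, the fallback is never reached.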
cc: x86@kernel.org
cc: Kevin Hilman <khilman@deeprootsystems.com>
cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Len Brown <len.brown@intel.com>
---
arch/arm/kernel/process.c | 4 +++-
arch/sh/kernel/idle.c | 6 ++++--
arch/x86/kernel/process_32.c | 4 +++-
arch/x86/kernel/process_64.c | 4 +++-
drivers/cpuidle/cpuidle.c | 39 ++++++++++++++++++---------------------
include/linux/cpuidle.h | 2 ++
6 files changed, 33 insertions(+), 26 deletions(-)
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 5e1e541..d7ee0d4 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -30,6 +30,7 @@
#include <linux/uaccess.h>
#include <linux/random.h>
#include <linux/hw_breakpoint.h>
+#include <linux/cpuidle.h>
#include <asm/cacheflush.h>
#include <asm/leds.h>
@@ -196,7 +197,8 @@ void cpu_idle(void)
cpu_relax();
} else {
stop_critical_timings();
- pm_idle();
+ if (cpuidle_idle_call())
+ pm_idle();
start_critical_timings();
/*
* This will eventually be removed - pm_idle
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index 425d604..9c7099e 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -16,12 +16,13 @@
#include <linux/thread_info.h>
#include <linux/irqflags.h>
#include <linux/smp.h>
+#include <linux/cpuidle.h>
#include <asm/pgalloc.h>
#include <asm/system.h>
#include <asm/atomic.h>
#include <asm/smp.h>
-void (*pm_idle)(void) = NULL;
+static void (*pm_idle)(void);
static int hlt_counter;
@@ -100,7 +101,8 @@ void cpu_idle(void)
local_irq_disable();
/* Don't trace irqs off for idle */
stop_critical_timings();
- pm_idle();
+ if (cpuidle_idle_call())
+ pm_idle();
/*
* Sanity check to ensure that pm_idle() returns
* with IRQs enabled
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8d12878..61fadbe 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -38,6 +38,7 @@
#include <linux/uaccess.h>
#include <linux/io.h>
#include <linux/kdebug.h>
+#include <linux/cpuidle.h>
#include <asm/pgtable.h>
#include <asm/system.h>
@@ -109,7 +110,8 @@ void cpu_idle(void)
local_irq_disable();
/* Don't trace irqs off for idle */
stop_critical_timings();
- pm_idle();
+ if (cpuidle_idle_call())
+ pm_idle();
start_critical_timings();
}
tick_nohz_restart_sched_tick();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6c9dd92..62c219a 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -37,6 +37,7 @@
#include <linux/uaccess.h>
#include <linux/io.h>
#include <linux/ftrace.h>
+#include <linux/cpuidle.h>
#include <asm/pgtable.h>
#include <asm/system.h>
@@ -136,7 +137,8 @@ void cpu_idle(void)
enter_idle();
/* Don't trace irqs off for idle */
stop_critical_timings();
- pm_idle();
+ if (cpuidle_idle_call())
+ pm_idle();
start_critical_timings();
/* In many cases the interrupt that ended idle
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 8d7303b..304e378 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -25,10 +25,10 @@ DEFINE_PER_CPU(struct cpuidle_device *, cpuidle_devices);
DEFINE_MUTEX(cpuidle_lock);
LIST_HEAD(cpuidle_detected_devices);
-static void (*pm_idle_old)(void);
static int enabled_devices;
static int off __read_mostly;
+static int initialized __read_mostly;
int cpuidle_disabled(void)
{
@@ -56,27 +56,24 @@ static int __cpuidle_register_device(struct cpuidle_device *dev);
* cpuidle_idle_call - the main idle loop
*
* NOTE: no locks or semaphores should be used here
+ * return non-zero on failure
*/
-static void cpuidle_idle_call(void)
+int cpuidle_idle_call(void)
{
struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
struct cpuidle_driver *drv = cpuidle_get_driver();
struct cpuidle_state *target_state;
int next_state, entered_state;
- /* check if the device is ready */
- if (!dev || !dev->enabled) {
- if (pm_idle_old)
- pm_idle_old();
- else
-#if defined(CONFIG_ARCH_HAS_DEFAULT_IDLE)
- default_idle();
-#else
- local_irq_enable();
-#endif
- return;
- }
+ if (off)
+ return -ENODEV;
+
+ if (!initialized)
+ return -ENODEV;
+ /* check if the device is ready */
+ if (!dev || !dev->enabled)
+ return -EBUSY;
#if 0
/* shows regressions, re-enable for 2.6.29 */
/*
@@ -90,7 +87,7 @@ static void cpuidle_idle_call(void)
next_state = cpuidle_curr_governor->select(drv, dev);
if (need_resched()) {
local_irq_enable();
- return;
+ return 0;
}
target_state = &drv->states[next_state];
@@ -116,6 +113,8 @@ static void cpuidle_idle_call(void)
/* give the governor an opportunity to reflect on the outcome */
if (cpuidle_curr_governor->reflect)
cpuidle_curr_governor->reflect(dev, entered_state);
+
+ return 0;
}
/**
@@ -123,10 +122,10 @@ static void cpuidle_idle_call(void)
*/
void cpuidle_install_idle_handler(void)
{
- if (enabled_devices && (pm_idle != cpuidle_idle_call)) {
+ if (enabled_devices) {
/* Make sure all changes finished before we switch to new idle */
smp_wmb();
- pm_idle = cpuidle_idle_call;
+ initialized = 1;
}
}
@@ -135,8 +134,8 @@ void cpuidle_install_idle_handler(void)
*/
void cpuidle_uninstall_idle_handler(void)
{
- if (enabled_devices && pm_idle_old && (pm_idle != pm_idle_old)) {
- pm_idle = pm_idle_old;
+ if (enabled_devices) {
+ initialized = 0;
cpuidle_kick_cpus();
}
}
@@ -410,8 +409,6 @@ static int __init cpuidle_init(void)
if (cpuidle_disabled())
return -ENODEV;
- pm_idle_old = pm_idle;
-
ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
if (ret)
return ret;
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 2786787..c904188 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -128,6 +128,7 @@ struct cpuidle_driver {
#ifdef CONFIG_CPU_IDLE
extern void disable_cpuidle(void);
+extern int cpuidle_idle_call(void);
extern int cpuidle_register_driver(struct cpuidle_driver *drv);
struct cpuidle_driver *cpuidle_get_driver(void);
@@ -142,6 +143,7 @@ extern void cpuidle_disable_device(struct cpuidle_device *dev);
#else
static inline void disable_cpuidle(void) { }
+static inline int cpuidle_idle_call(void) { return -ENODEV; }
static inline int cpuidle_register_driver(struct cpuidle_driver *drv)
{return -ENODEV; }
^ permalink raw reply related
* [RFC PATCH V1 2/7] cpuidle: replace xen access to x86 pm_idle and default_idle
From: Trinabh Gupta @ 2011-06-07 16:29 UTC (permalink / raw)
To: linux-pm, linuxppc-dev; +Cc: linux-kernel
In-Reply-To: <20110607162847.6848.44707.stgit@tringupt.in.ibm.com>
From: Len Brown <len.brown@intel.com>
When a Xen Dom0 kernel boots on a hypervisor, it gets access
to the raw-hardware ACPI tables. While it parses the idle tables
for the hypervisor's benefit, it uses HLT for its own idle.
Rather than have xen scribble on pm_idle and access default_idle,
have it simply disable_cpuidle() so acpi_idle will not load and
architecture default HLT will be used.
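For illustration only (not part of the patch): a minimal userspace sketch of how a one-way disable flag can keep the subsystem from initializing, assuming the init path checks the flag and returns -ENODEV as cpuidle_init() does. The `sketch_*` names are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the "off" flag in drivers/cpuidle/cpuidle.c. */
static int sketch_off;

/* Platform setup code (e.g. a hypervisor guest) calls this early in boot. */
static void sketch_disable_cpuidle(void)
{
	sketch_off = 1;
}

static int sketch_cpuidle_disabled(void)
{
	return sketch_off;
}

/* Stand-in for subsystem init: refuses to come up once disabled. */
static int sketch_cpuidle_init(void)
{
	if (sketch_cpuidle_disabled())
		return -ENODEV;
	return 0;
}
```

The flag is only ever set, never cleared, so the caller does not need to touch pm_idle or default_idle directly.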
cc: xen-devel@lists.xensource.com
Signed-off-by: Len Brown <len.brown@intel.com>
---
arch/x86/xen/setup.c | 3 ++-
drivers/cpuidle/cpuidle.c | 4 ++++
include/linux/cpuidle.h | 2 ++
3 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index be1a464..ab1a916 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -9,6 +9,7 @@
#include <linux/mm.h>
#include <linux/pm.h>
#include <linux/memblock.h>
+#include <linux/cpuidle.h>
#include <asm/elf.h>
#include <asm/vdso.h>
@@ -424,7 +425,7 @@ void __init xen_arch_setup(void)
#ifdef CONFIG_X86_32
boot_cpu_data.hlt_works_ok = 1;
#endif
- pm_idle = default_idle;
+ disable_cpuidle();
boot_option_idle_override = IDLE_HALT;
fiddle_vdso();
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a171b9e..8d7303b 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -34,6 +34,10 @@ int cpuidle_disabled(void)
{
return off;
}
+void disable_cpuidle(void)
+{
+ off = 1;
+}
#if defined(CONFIG_ARCH_HAS_CPU_IDLE_WAIT)
static void cpuidle_kick_cpus(void)
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 1e85538..2786787 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -127,6 +127,7 @@ struct cpuidle_driver {
};
#ifdef CONFIG_CPU_IDLE
+extern void disable_cpuidle(void);
extern int cpuidle_register_driver(struct cpuidle_driver *drv);
struct cpuidle_driver *cpuidle_get_driver(void);
@@ -140,6 +141,7 @@ extern int cpuidle_enable_device(struct cpuidle_device *dev);
extern void cpuidle_disable_device(struct cpuidle_device *dev);
#else
+static inline void disable_cpuidle(void) { }
static inline int cpuidle_register_driver(struct cpuidle_driver *drv)
{return -ENODEV; }
^ permalink raw reply related