* Re: [PATCH v3 8/8] powerpc/pseries: Read common partition via pstore
From: Benjamin Herrenschmidt @ 2013-06-01 4:52 UTC (permalink / raw)
To: Aruna Balakrishnaiah
Cc: jkenisto, tony.luck, mahesh, cbouatmailru, linux-kernel,
linuxppc-dev, paulus, anton, ccross, keescook
In-Reply-To: <20130425101908.21017.32553.stgit@aruna-ThinkPad-T420>
On Thu, 2013-04-25 at 15:49 +0530, Aruna Balakrishnaiah wrote:
> diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
> index 8d4fb65..88cc050 100644
> --- a/fs/pstore/inode.c
> +++ b/fs/pstore/inode.c
> @@ -330,6 +330,9 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, u64 id, int count,
> case PSTORE_TYPE_PPC_OF:
> sprintf(name, "of-%s-%lld", psname, id);
> break;
Call this powerpc-ofw-... Does it even contain something we use in Linux
at all ? Last I looked we only used the common one right ? Also it's
format afaik is defined in the CHRP bindings so it's not generic OFW
stuff, hence the powerpc prefix.
> + case PSTORE_TYPE_PPC_COMMON:
> + sprintf(name, "common-%s-%lld", psname, id);
> + break;
Same deal, call that powerpc-common
> case PSTORE_TYPE_UNKNOWN:
> sprintf(name, "unknown-%s-%lld", psname, id);
> break;
> diff --git a/include/linux/pstore.h b/include/linux/pstore.h
> index 615dc18..656699f 100644
> --- a/include/linux/pstore.h
> +++ b/include/linux/pstore.h
> @@ -38,6 +38,7 @@ enum pstore_type_id {
> /* PPC64 partition types */
> PSTORE_TYPE_PPC_RTAS = 4,
> PSTORE_TYPE_PPC_OF = 5,
> + PSTORE_TYPE_PPC_COMMON = 6,
> PSTORE_TYPE_UNKNOWN = 255
> };
Do we expose anything else or keep it hidden ?
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH 3/3] powerpc/pseries: Support compression of oops text via pstore
From: Benjamin Herrenschmidt @ 2013-06-01 4:54 UTC (permalink / raw)
To: Aruna Balakrishnaiah
Cc: jkenisto, tony.luck, mahesh, cbouatmailru, linux-kernel,
linuxppc-dev, paulus, anton, ccross, keescook
In-Reply-To: <20130426095620.14323.34124.stgit@aruna-ThinkPad-T420>
On Fri, 2013-04-26 at 15:26 +0530, Aruna Balakrishnaiah wrote:
> The patch set supports compression of oops messages while writing to NVRAM,
> this helps in capturing more of oops data to lnx,oops-log. The pstore file
> for oops messages will be in decompressed format making it readable.
>
> In case compression fails, the patch takes care of copying the header added
> by pstore and last oops_data_sz bytes of big_oops_buf to NVRAM so that we
> have recent oops messages in lnx,oops-log.
>
> In case decompression fails, it will result in absence of oops file but still
> have files (in /dev/pstore) for other partitions.
Any reason that isn't handled by pstore itself rather than here ? Ie
make a flag indicating that the partition supports compression and have
pstore do it (so we don't compress everything such as ofw common etc...)
Cheers,
Ben.
>
> Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
> ---
> arch/powerpc/platforms/pseries/nvram.c | 132 +++++++++++++++++++++++++++++---
> 1 file changed, 118 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/nvram.c b/arch/powerpc/platforms/pseries/nvram.c
> index 0159d74..b5ba5e2 100644
> --- a/arch/powerpc/platforms/pseries/nvram.c
> +++ b/arch/powerpc/platforms/pseries/nvram.c
> @@ -539,6 +539,65 @@ static int zip_oops(size_t text_len)
> }
>
> #ifdef CONFIG_PSTORE
> +/* Derived from logfs_uncompress */
> +int nvram_decompress(void *in, void *out, size_t inlen, size_t outlen)
> +{
> + int err, ret;
> +
> + ret = -EIO;
> + err = zlib_inflateInit(&stream);
> + if (err != Z_OK)
> + goto error;
> +
> + stream.next_in = in;
> + stream.avail_in = inlen;
> + stream.total_in = 0;
> + stream.next_out = out;
> + stream.avail_out = outlen;
> + stream.total_out = 0;
> +
> + err = zlib_inflate(&stream, Z_FINISH);
> + if (err != Z_STREAM_END)
> + goto error;
> +
> + err = zlib_inflateEnd(&stream);
> + if (err != Z_OK)
> + goto error;
> +
> + ret = stream.total_out;
> +error:
> + return ret;
> +}
> +
> +static int unzip_oops(char *oops_buf, char *big_buf)
> +{
> + struct oops_log_info *oops_hdr = (struct oops_log_info *)oops_buf;
> + u64 timestamp = oops_hdr->timestamp;
> + char *big_oops_data = NULL;
> + char *oops_data_buf = NULL;
> + size_t big_oops_data_sz;
> + int unzipped_len;
> +
> + big_oops_data = big_buf + sizeof(struct oops_log_info);
> + big_oops_data_sz = big_oops_buf_sz - sizeof(struct oops_log_info);
> + oops_data_buf = oops_buf + sizeof(struct oops_log_info);
> +
> + unzipped_len = nvram_decompress(oops_data_buf, big_oops_data,
> + oops_hdr->report_length,
> + big_oops_data_sz);
> +
> + if (unzipped_len < 0) {
> + pr_err("nvram: decompression failed; returned %d\n",
> + unzipped_len);
> + return -1;
> + }
> + oops_hdr = (struct oops_log_info *)big_buf;
> + oops_hdr->version = OOPS_HDR_VERSION;
> + oops_hdr->report_length = (u16) unzipped_len;
> + oops_hdr->timestamp = timestamp;
> + return 0;
> +}
> +
> static int nvram_pstore_open(struct pstore_info *psi)
> {
> /* Reset the iterator to start reading partitions again */
> @@ -567,6 +626,7 @@ static int nvram_pstore_write(enum pstore_type_id type,
> size_t size, struct pstore_info *psi)
> {
> int rc;
> + unsigned int err_type = ERR_TYPE_KERNEL_PANIC;
> struct oops_log_info *oops_hdr = (struct oops_log_info *) oops_buf;
>
> /* part 1 has the recent messages from printk buffer */
> @@ -577,8 +637,31 @@ static int nvram_pstore_write(enum pstore_type_id type,
> oops_hdr->version = OOPS_HDR_VERSION;
> oops_hdr->report_length = (u16) size;
> oops_hdr->timestamp = get_seconds();
> +
> + if (big_oops_buf) {
> + rc = zip_oops(size);
> + /*
> + * If compression fails copy recent log messages from
> + * big_oops_buf to oops_data.
> + */
> + if (rc != 0) {
> + int hsize = pstore_get_header_size();
> + size_t diff = size - oops_data_sz + hsize;
> +
> + if (size > oops_data_sz) {
> + memcpy(oops_data, big_oops_buf, hsize);
> + memcpy(oops_data + hsize, big_oops_buf + diff,
> + oops_data_sz - hsize);
> +
> + oops_hdr->report_length = (u16) oops_data_sz;
> + } else
> + memcpy(oops_data, big_oops_buf, size);
> + } else
> + err_type = ERR_TYPE_KERNEL_PANIC_GZ;
> + }
> +
> rc = nvram_write_os_partition(&oops_log_partition, oops_buf,
> - (int) (sizeof(*oops_hdr) + size), ERR_TYPE_KERNEL_PANIC,
> + (int) (sizeof(*oops_hdr) + oops_hdr->report_length), err_type,
> count);
>
> if (rc != 0)
> @@ -600,10 +683,11 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
> struct oops_log_info *oops_hdr;
> unsigned int err_type, id_no, size = 0;
> struct nvram_os_partition *part = NULL;
> - char *buff = NULL;
> - int sig = 0;
> + char *buff = NULL, *big_buff = NULL;
> + int rc, sig = 0;
> loff_t p;
>
> +read_partition:
> read_type++;
>
> switch (nvram_type_ids[read_type]) {
> @@ -666,6 +750,25 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
> if (nvram_type_ids[read_type] == PSTORE_TYPE_DMESG) {
> oops_hdr = (struct oops_log_info *)buff;
> *buf = buff + sizeof(*oops_hdr);
> +
> + if (err_type == ERR_TYPE_KERNEL_PANIC_GZ) {
> + big_buff = kmalloc(big_oops_buf_sz, GFP_KERNEL);
> + if (!big_buff)
> + return -ENOMEM;
> +
> + rc = unzip_oops(buff, big_buff);
> +
> + if (rc != 0) {
> + kfree(buff);
> + kfree(big_buff);
> + goto read_partition;
> + }
> +
> + oops_hdr = (struct oops_log_info *)big_buff;
> + *buf = big_buff + sizeof(*oops_hdr);
> + kfree(buff);
> + }
> +
> time->tv_sec = oops_hdr->timestamp;
> time->tv_nsec = 0;
> return oops_hdr->report_length;
> @@ -687,17 +790,18 @@ static int nvram_pstore_init(void)
> {
> int rc = 0;
>
> - nvram_pstore_info.buf = oops_data;
> - nvram_pstore_info.bufsize = oops_data_sz;
> + if (big_oops_buf) {
> + nvram_pstore_info.buf = big_oops_buf;
> + nvram_pstore_info.bufsize = big_oops_buf_sz;
> + } else {
> + nvram_pstore_info.buf = oops_data;
> + nvram_pstore_info.bufsize = oops_data_sz;
> + }
>
> rc = pstore_register(&nvram_pstore_info);
> if (rc != 0)
> pr_err("nvram: pstore_register() failed, defaults to "
> "kmsg_dump; returned %d\n", rc);
> - else
> - /*TODO: Support compression when pstore is configured */
> - pr_info("nvram: Compression of oops text supported only when "
> - "pstore is not configured");
>
> return rc;
> }
> @@ -731,11 +835,6 @@ static void __init nvram_init_oops_partition(int rtas_partition_exists)
> oops_data = oops_buf + sizeof(struct oops_log_info);
> oops_data_sz = oops_log_partition.size - sizeof(struct oops_log_info);
>
> - rc = nvram_pstore_init();
> -
> - if (!rc)
> - return;
> -
> /*
> * Figure compression (preceded by elimination of each line's <n>
> * severity prefix) will reduce the oops/panic report to at most
> @@ -759,6 +858,11 @@ static void __init nvram_init_oops_partition(int rtas_partition_exists)
> stream.workspace = NULL;
> }
>
> + rc = nvram_pstore_init();
> +
> + if (!rc)
> + return;
> +
> rc = kmsg_dump_register(&nvram_kmsg_dumper);
> if (rc != 0) {
> pr_err("nvram: kmsg_dump_register() failed; returned %d\n", rc);
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply
* Re: [PATCH v3 0/8] Nvram-to-pstore
From: Benjamin Herrenschmidt @ 2013-06-01 5:25 UTC (permalink / raw)
To: Aruna Balakrishnaiah
Cc: jkenisto, tony.luck, mahesh, cbouatmailru, linux-kernel,
linuxppc-dev, paulus, anton, ccross, keescook
In-Reply-To: <20130425100952.21017.51799.stgit@aruna-ThinkPad-T420>
Another question...
Should the core pstore fail to unlink partitions that don't have
an ->erase callback ? IE. Why would you let anyone erase the OFW
common partition for example ? That means that userspace tools
can no longer manipulate it but we certainly don't want to remove
it from the nvram itself.
That leads to a deeper concern. Looking at how efi-pstore works,
it looks like they create a file for each var.
This looks like something valuable we could do for something like
the common partition since typically it's made of name,value pairs.
However, pstore is a flat space, while we have patitions which
themselves can be organized in name,value pairs (some at least)
I wonder if it's time to introduce pstore directories... Or do
we stick to our special tools to interpret/change the name,value
pairs ?
Also do we want to add an ability to resize partitions ? Possibly
based on how much is written to them ?
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH V2] dtc: ensure #line directives don't consume data from the next line
From: David Gibson @ 2013-06-01 5:30 UTC (permalink / raw)
To: Stephen Warren
Cc: mmarek, jdl, Stephen Warren, linux-kbuild, devicetree-discuss,
linux-kernel, rob.herring, linuxppc-dev
In-Reply-To: <1370025184-23808-1-git-send-email-swarren@wwwdotorg.org>
[-- Attachment #1: Type: text/plain, Size: 1477 bytes --]
On Fri, May 31, 2013 at 12:33:04PM -0600, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
>
> Previously, the #line parsing regex ended with ({WS}+[0-9]+)?. The {WS}
> could match line-break characters. If the #line directive did not contain
> the optional flags field at the end, this could cause any integer data on
> the next line to be consumed as part of the #line directive parsing. This
> could cause syntax errors (i.e. #line parsing consuming the leading 0
> from a hex literal 0x1234, leaving x1234 to be parsed as cell data,
> which is a syntax error), or invalid compilation results (i.e. simply
> consuming literal 1234 as part of the #line processing, thus removing it
> from the cell data).
>
> Fix this by replacing {WS} with [ \t] so that it can't match line-breaks.
>
> Convert all instances of {WS}, even though the other instances should be
> irrelevant for any well-formed #line directive. This is done for
> consistency and ultimate safety.
>
> Reported-by: Ian Campbell <Ian.Campbell@citrix.com>
> Signed-off-by: Stephen Warren <swarren@nvidia.com>
Nice catch.
Acked-by: David Gibson <david@gibson.dropbear.id.au>
I'll pull it into my github tree. Jon, please either apply directly
or pull from my tree.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply
* Re: [PATCH v2] can: flexcan: remove HAVE_CAN_FLEXCAN Kconfig symbol
From: Marc Kleine-Budde @ 2013-06-01 8:32 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Arnd Bergmann, Sascha Hauer, U Bhaskar-B22300, linux-kernel,
linux-can, Shawn Guo, linuxppc-dev, linux-arm-kernel
In-Reply-To: <1370043630.3928.166.camel@pasglop>
[-- Attachment #1: Type: text/plain, Size: 1494 bytes --]
On 06/01/2013 01:40 AM, Benjamin Herrenschmidt wrote:
> On Mon, 2013-05-20 at 18:06 +0200, Marc Kleine-Budde wrote:
>> On 05/17/2013 11:09 AM, Shawn Guo wrote:
>>> On Fri, May 17, 2013 at 10:59:17AM +0200, Marc Kleine-Budde wrote:
>>>> This patch removes the Kconfig symbol HAVE_CAN_FLEXCAN from arch/{arm,powerpc}
>>>> and allowing compilation unconditionally on all arm and powerpc platforms.
>>>>
>>>> This brings a bigger compile time coverage and removes the following dependency
>>>> warning found by Arnd Bergmann:
>>>>
>>>> warning: (SOC_IMX28 && SOC_IMX25 && SOC_IMX35 && IMX_HAVE_PLATFORM_FLEXCAN &&
>>>> SOC_IMX53 && SOC_IMX6Q) selects HAVE_CAN_FLEXCAN
>>>> which has unmet direct dependencies (NET && CAN && CAN_DEV)
>>>>
>>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>>> Cc: Shawn Guo <shawn.guo@linaro.org>
>>>
>>> Acked-by: Shawn Guo <shawn.guo@linaro.org>
>>
>> Thanks.
>>
>> An Acked-by by the powerpc people would be fine. However, if nobody
>> doesn't object, I'm sending this patch via linux-can and net-next upstream.
>
> Sorry, missed it, if it's still out there, add my
>
> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Thanks,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]
^ permalink raw reply
* Re: [PATCH v2] can: flexcan: remove HAVE_CAN_FLEXCAN Kconfig symbol
From: Arnd Bergmann @ 2013-06-01 8:59 UTC (permalink / raw)
To: Marc Kleine-Budde
Cc: Sascha Hauer, U Bhaskar-B22300, linux-kernel, linux-arm-kernel,
Shawn Guo, linuxppc-dev, linux-can
In-Reply-To: <1368781157-4646-1-git-send-email-mkl@pengutronix.de>
On Friday 17 May 2013 10:59:17 Marc Kleine-Budde wrote:
> This patch removes the Kconfig symbol HAVE_CAN_FLEXCAN from arch/{arm,powerpc}
> and allowing compilation unconditionally on all arm and powerpc platforms.
>
> This brings a bigger compile time coverage and removes the following dependency
> warning found by Arnd Bergmann:
>
> warning: (SOC_IMX28 && SOC_IMX25 && SOC_IMX35 && IMX_HAVE_PLATFORM_FLEXCAN &&
> SOC_IMX53 && SOC_IMX6Q) selects HAVE_CAN_FLEXCAN
> which has unmet direct dependencies (NET && CAN && CAN_DEV)
>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Shawn Guo <shawn.guo@linaro.org>
> Cc: Sascha Hauer <s.hauer@pengutronix.de>
> Cc: Kumar Gala <galak@kernel.crashing.org>
> Cc: U Bhaskar-B22300 <B22300@freescale.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Thanks for addressing this!
^ permalink raw reply
* Re: [PATCH v2] can: flexcan: remove HAVE_CAN_FLEXCAN Kconfig symbol
From: Marc Kleine-Budde @ 2013-06-01 9:56 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Sascha Hauer, U Bhaskar-B22300, linux-kernel, linux-arm-kernel,
Shawn Guo, linuxppc-dev, linux-can
In-Reply-To: <4297748.x0TtCUl2dJ@wuerfel>
[-- Attachment #1: Type: text/plain, Size: 1416 bytes --]
On 06/01/2013 10:59 AM, Arnd Bergmann wrote:
> On Friday 17 May 2013 10:59:17 Marc Kleine-Budde wrote:
>> This patch removes the Kconfig symbol HAVE_CAN_FLEXCAN from arch/{arm,powerpc}
>> and allowing compilation unconditionally on all arm and powerpc platforms.
>>
>> This brings a bigger compile time coverage and removes the following dependency
>> warning found by Arnd Bergmann:
>>
>> warning: (SOC_IMX28 && SOC_IMX25 && SOC_IMX35 && IMX_HAVE_PLATFORM_FLEXCAN &&
>> SOC_IMX53 && SOC_IMX6Q) selects HAVE_CAN_FLEXCAN
>> which has unmet direct dependencies (NET && CAN && CAN_DEV)
>>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: Shawn Guo <shawn.guo@linaro.org>
>> Cc: Sascha Hauer <s.hauer@pengutronix.de>
>> Cc: Kumar Gala <galak@kernel.crashing.org>
>> Cc: U Bhaskar-B22300 <B22300@freescale.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
>
> Thanks for addressing this!
I'll include this patch in my next pull request to David Miller.
Tnx,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]
^ permalink raw reply
* Re: [PATCH 1/3] powerpc/mpc85xx: remove the unneeded pci init functions for corenet ds board
From: Kevin Hao @ 2013-06-01 10:59 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc
In-Reply-To: <1369995080.3928.140.camel@pasglop>
[-- Attachment #1: Type: text/plain, Size: 3024 bytes --]
On Fri, May 31, 2013 at 08:11:20PM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2013-05-31 at 14:41 +0800, Kevin Hao wrote:
> > Hi Ben,
> >
> > Could you shed some light on this issue? Do we really has the restriction
> > that we have to pick one bus controller as primary even there is no
> > ISA bus on the board? I did check the current code and found no code
> > has a requirement for this. I also searched the archives and but found
> > nothing useful. :-(
>
> You can just pick the first one as primary... The reason we somewhat need
> a primary is related to how we handle IO space.
>
> We ioremap the IO space of all busses and assign the base of the primary
> one to a global "_IO_BASE". Then any "port" access is an offset from that
> which means that non-primary can end up having negative offsets. We fix
> up all resources, which works fine ... unless drivers do stupid casts
> and the wrap-around fails.
>
> The main reason we did that originally is because we still had a slew of
> x86 originated HW that would access hard wired IO ports, especially on things
> like CHRP machines, looking for things like 8259 PIC, legacy kbd controllers,
> UARTs, etc... at fixed IO port numbers.
>
> We still support some of these boxes (though I do wonder how long since
> somebody last booted a Pegasos) so I'm not quite yet keen on getting rid
> of that stuff...
Thanks for the detailed explanation. But I don't mean to drop the support
of the primary bus. As you said, this is definitely needed now in order to
make some legacy device drivers work. What I want to do is not to pick a
primary bus if there is no such ISA devices at all. For example, on a fsl
p4080ds board, we would do the following in the current kernel:
/* pick up a random host bridge as primary bus */
for_each_matching_node(np, pci_ids) {
...
}
/* for host bridge 0 */
fsl_add_bridge(pdev, primary = 1) {
...
pci_process_bridge_OF_ranges(hose, dev, is_primary = 1);
}
/* for host bridge 1*/
fsl_add_bridge(pdev, primary = 0) {
...
pci_process_bridge_OF_ranges(hose, dev, is_primary = 0);
}
But there is no ISA bus on this board and we also don't need to support any
fixed IO port numbers. So it seems redundant to iterate the device list to
pick a host bridge as primary bus. So we can simply change the above to:
/* for host bridge 0 */
fsl_add_bridge(pdev, primary = 0) {
...
pci_process_bridge_OF_ranges(hose, dev, is_primary = 0);
}
/* for host bridge 1*/
fsl_add_bridge(pdev, primary = 0) {
...
pci_process_bridge_OF_ranges(hose, dev, is_primary = 0);
}
The effect of this change is that the isa_io_base will be 0 and the IO
resource are equal to the virtual address of the IO space. But the IO
functions such as outx/inx should work as well. This is why I ask the
above question. What do you think about this? Are there any subtle bugs
that will be triggered by this?
Thanks,
Kevin
>
> Cheers,
> Ben.
>
>
[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]
^ permalink raw reply
* Re: [PATCH 1/3] powerpc/mpc85xx: remove the unneeded pci init functions for corenet ds board
From: Kevin Hao @ 2013-06-01 11:13 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc
In-Reply-To: <1370042841.13614.19@snotra>
[-- Attachment #1: Type: text/plain, Size: 2884 bytes --]
On Fri, May 31, 2013 at 06:27:21PM -0500, Scott Wood wrote:
> On 05/31/2013 01:43:49 AM, Kevin Hao wrote:
> >On Thu, May 30, 2013 at 01:54:59PM -0500, Scott Wood wrote:
> >> On 05/30/2013 05:20:34 AM, Kevin Hao wrote:
> >> >On Tue, May 28, 2013 at 05:52:09PM -0500, Scott Wood wrote:
> >> >> On 05/21/2013 07:04:58 AM, Kevin Hao wrote:
> >> >> >It also seems that we don't support ISA on all the current
> >> >corenet ds
> >> >> >boards. So picking a primary bus seems useless, remove that
> >> >function
> >> >> >too.
> >> >>
> >> >> IIRC that was due to some bugs in the PPC PCI code in the
> >absence of
> >> >> any primary bus.
> >> >
> >> >Do you know more about these bugs?
> >>
> >> Not off the top of my head -- either search the archives or ask Ben.
> >>
> >> >> fsl_pci_assign_primary() will arbitrarily pick one
> >> >> to be primary if there's no ISA. Have the bugs been fixed?
> >> >
> >> >I know there should be some reason that we put the
> >> >fsl_pci_assign_primary()
> >> >here. But frankly I am not sure what bugs this workaround try to
> >> >fix. For these
> >> >corenet boards picking one to be primary has no effect to the
> >> >64bit kernel.
> >> >And for 32bit kernel, the only effect of this is that isa_io_base
> >> >is set to the
> >> >io virtual base of the primary bus. But the isa_io_base only make
> >> >sense when
> >> >we do have a isa bus, so that we can access some well-known io
> >> >ports directly
> >> >by using outx/inx. But if we don't have isa bus on the board, the
> >> >value of
> >> >isa_io_base should make no sense at all. So we really don't need
> >> >to pick a
> >> >fake primary bus. Of course I may miss something, correct me if I
> >> >am wrong. :-)
> >>
> >> outx/inx can also be used for PCI I/O BARs.
> >
> >Yes, I know there is also PIO. But the value of isa_io_base doesn't
> >have any effect for this.
>
> See this e-mail for some of the issues I was referring to with
> isa_io_base being zero:
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2012-June/098586.html
Thanks for the reference.
>
> Reading it again I'm not so sure that the problem is so much that we
> need a primary, as that somewhat bad things happen on non-primary
> buses, such as the possibility of assigning a zero BAR. Some
> hardware (including QEMU's PCI emulation) cares about this, though
> most doesn't. We only have one PCI bus under QEMU, so when we
> started picking an arbitrary bus to be primary, the problem went
> away because there was only one bus and therefore there was no
> non-primary bus.
Sorry, I am not sure what you mean. Do you mean that it will affect
the resources assigned to a bridge when it is marked as a primary bus?
I searched the code and didn't found anything related to this? Could you
give me some hint? :-)
Thanks,
Kevin
>
> -Scott
[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]
^ permalink raw reply
* Re: [PATCH] powerpc/mm: Always invalidate tlb on hpte invalidate and update
From: Michael Ellerman @ 2013-06-01 11:19 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linuxppc-dev
In-Reply-To: <1369998204-31490-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
On Fri, May 31, 2013 at 04:33:24PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> If a hash bucket gets full, we "evict" a more/less random entry from it.
> When we do that we don't invalidate the TLB (hpte_remove) because we assume
> the old translation is still technically "valid". This implies that when
> we are invalidating or updating pte, even if HPTE entry is not valid
> we should do a tlb invalidate.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Has this always been a bug? I assume not.
I'm asking because I have a kernel that's crashing and I'm wondering if
I might need this commit.
cheers
^ permalink raw reply
* Re: [PATCH] powerpc/mm: Always invalidate tlb on hpte invalidate and update
From: Benjamin Herrenschmidt @ 2013-06-01 11:44 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <20130601111908.GA16571@concordia>
On Sat, 2013-06-01 at 21:19 +1000, Michael Ellerman wrote:
> On Fri, May 31, 2013 at 04:33:24PM +0530, Aneesh Kumar K.V wrote:
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >
> > If a hash bucket gets full, we "evict" a more/less random entry from it.
> > When we do that we don't invalidate the TLB (hpte_remove) because we assume
> > the old translation is still technically "valid". This implies that when
> > we are invalidating or updating pte, even if HPTE entry is not valid
> > we should do a tlb invalidate.
> >
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> Has this always been a bug? I assume not.
>
> I'm asking because I have a kernel that's crashing and I'm wondering if
> I might need this commit.
Bug got introduced in either
b1022fbd293564de91596b8775340cf41ad5214c or
7e74c3921ad9610c0b49f28b8fc69f7480505841, the jury is still out on
that one :-)
It's unlikely to crash the kernel however (it *can*, it's just unlikely).
Patch is good to have regardless...
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH 1/3] powerpc/mpc85xx: remove the unneeded pci init functions for corenet ds board
From: Benjamin Herrenschmidt @ 2013-06-01 11:47 UTC (permalink / raw)
To: Kevin Hao; +Cc: Scott Wood, linuxppc
In-Reply-To: <20130601105936.GA1850@pek-khao-d1.corp.ad.wrs.com>
On Sat, 2013-06-01 at 18:59 +0800, Kevin Hao wrote:
> The effect of this change is that the isa_io_base will be 0 and the IO
> resource are equal to the virtual address of the IO space. But the IO
> functions such as outx/inx should work as well. This is why I ask the
> above question. What do you think about this? Are there any subtle bugs
> that will be triggered by this?
I don't see any obvious reason why that wouldn't work but like anything
in that area, it needs a bit of testing & hammering to be sure ;-)
In fact it would work on pmac32 as well since those generally don't have
legacy crap either.
So I have no fundamental objection, it just needs testing. My worry is
that we need to make sure we don't break old chrp and I don't have any
to test with. I'm happy to drop support for stuff nobody uses anymore
(we did drop PReP after all and I'm *that* close to drop power3) but as
long as somebody is still using a CHRP RS6K or a Pegasos I can't quite
drop those... And they do have legacy ISA crap to deal with.
Cheers,
Ben.
^ permalink raw reply
* [PATCH v2] powerpc/mpc85xx: match with the pci bus address used by u-boot for all p1_p2_rdb_pc boards
From: Kevin Hao @ 2013-06-01 12:12 UTC (permalink / raw)
To: Kumar Gala; +Cc: Scott Wood, linuxppc
In-Reply-To: <20130531075349.GA22164@pek-khao-d1.corp.ad.wrs.com>
All these boards use the same configuration file p1_p2_rdb_pc.h in
u-boot. So they have the same pci bus address set by the u-boot.
But in some of these boards the bus address set in dtb don't match
the one used by u-boot. And this will trigger a kernel bug in 32bit
kernel and cause the pci device malfunction. For example, on a
p2020rdb-pc board the u-boot use the 0xa0000000 as both bus address
and cpu address for one pci controller and then assign bus address
such as 0xa00004000 to some pci device. But in the kernel, the dtb
set the bus address to 0xe0000000 and the cpu address to 0xa0000000.
The kernel assumes mistakenly the assigned bus address 0xa0004000
in pci device is correct and keep it unchanged. This will definitely
cause the pci device malfunction. I have made a patch to fix this
in the pci subsystem.
https://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?h=next&id=cf4d1cf5ac5e7d2b886af6ed906ea0dcdc5b6855
Since it changed the general code and will need more time to merge
into upstream. So I would like to tweak the 32bit dts of these boards
first and make it available to kernel ASAP.
The cpu address for the pci controller seems also not right in
p1025rdb_32b/36b.dts. So fix it at the same time.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Cc: stable@vger.kernel.org
---
Hi Kumar,
I would like this to be merged into 3.10 since some boards which has
an integrated PCIe device (such as p2020rdb-pca) is not bootable at all
without this fix.
v2:
* Remove the parts for the 36bit dts.
* Add stable to the CC.
arch/powerpc/boot/dts/p1020mbg-pc_32b.dts | 4 ++--
arch/powerpc/boot/dts/p1020utm-pc_32b.dts | 4 ++--
arch/powerpc/boot/dts/p1024rdb_32b.dts | 4 ++--
arch/powerpc/boot/dts/p1025rdb_32b.dts | 4 ++--
arch/powerpc/boot/dts/p1025rdb_36b.dts | 2 +-
arch/powerpc/boot/dts/p2020rdb-pc_32b.dts | 4 ++--
6 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/boot/dts/p1020mbg-pc_32b.dts b/arch/powerpc/boot/dts/p1020mbg-pc_32b.dts
index ab8f076..042bda5 100644
--- a/arch/powerpc/boot/dts/p1020mbg-pc_32b.dts
+++ b/arch/powerpc/boot/dts/p1020mbg-pc_32b.dts
@@ -56,7 +56,7 @@
pci0: pcie@ffe09000 {
reg = <0x0 0xffe09000 0x0 0x1000>;
- ranges = <0x2000000 0x0 0xe0000000 0x0 0xa0000000 0x0 0x20000000
+ ranges = <0x2000000 0x0 0xa0000000 0x0 0xa0000000 0x0 0x20000000
0x1000000 0x0 0x00000000 0x0 0xffc10000 0x0 0x10000>;
pcie@0 {
ranges = <0x2000000 0x0 0xe0000000
@@ -71,7 +71,7 @@
pci1: pcie@ffe0a000 {
reg = <0x0 0xffe0a000 0x0 0x1000>;
- ranges = <0x2000000 0x0 0xe0000000 0x0 0x80000000 0x0 0x20000000
+ ranges = <0x2000000 0x0 0x80000000 0x0 0x80000000 0x0 0x20000000
0x1000000 0x0 0x00000000 0x0 0xffc00000 0x0 0x10000>;
pcie@0 {
ranges = <0x2000000 0x0 0xe0000000
diff --git a/arch/powerpc/boot/dts/p1020utm-pc_32b.dts b/arch/powerpc/boot/dts/p1020utm-pc_32b.dts
index 4bfdd89..27576eb 100644
--- a/arch/powerpc/boot/dts/p1020utm-pc_32b.dts
+++ b/arch/powerpc/boot/dts/p1020utm-pc_32b.dts
@@ -56,7 +56,7 @@
pci0: pcie@ffe09000 {
reg = <0x0 0xffe09000 0x0 0x1000>;
- ranges = <0x2000000 0x0 0xe0000000 0x0 0xa0000000 0x0 0x20000000
+ ranges = <0x2000000 0x0 0xa0000000 0x0 0xa0000000 0x0 0x20000000
0x1000000 0x0 0x00000000 0x0 0xffc10000 0x0 0x10000>;
pcie@0 {
ranges = <0x2000000 0x0 0xe0000000
@@ -71,7 +71,7 @@
pci1: pcie@ffe0a000 {
reg = <0x0 0xffe0a000 0x0 0x1000>;
- ranges = <0x2000000 0x0 0xe0000000 0x0 0x80000000 0x0 0x20000000
+ ranges = <0x2000000 0x0 0x80000000 0x0 0x80000000 0x0 0x20000000
0x1000000 0x0 0x00000000 0x0 0xffc00000 0x0 0x10000>;
pcie@0 {
ranges = <0x2000000 0x0 0xe0000000
diff --git a/arch/powerpc/boot/dts/p1024rdb_32b.dts b/arch/powerpc/boot/dts/p1024rdb_32b.dts
index 90e803e..ede1af6 100644
--- a/arch/powerpc/boot/dts/p1024rdb_32b.dts
+++ b/arch/powerpc/boot/dts/p1024rdb_32b.dts
@@ -53,7 +53,7 @@
pci0: pcie@ffe09000 {
reg = <0x0 0xffe09000 0 0x1000>;
- ranges = <0x2000000 0x0 0xe0000000 0x0 0xa0000000 0x0 0x20000000
+ ranges = <0x2000000 0x0 0xa0000000 0x0 0xa0000000 0x0 0x20000000
0x1000000 0x0 0x00000000 0x0 0xffc10000 0x0 0x10000>;
pcie@0 {
ranges = <0x2000000 0x0 0xe0000000
@@ -68,7 +68,7 @@
pci1: pcie@ffe0a000 {
reg = <0x0 0xffe0a000 0 0x1000>;
- ranges = <0x2000000 0x0 0xe0000000 0x0 0x80000000 0x0 0x20000000
+ ranges = <0x2000000 0x0 0x80000000 0x0 0x80000000 0x0 0x20000000
0x1000000 0x0 0x00000000 0x0 0xffc00000 0x0 0x10000>;
pcie@0 {
reg = <0x0 0x0 0x0 0x0 0x0>;
diff --git a/arch/powerpc/boot/dts/p1025rdb_32b.dts b/arch/powerpc/boot/dts/p1025rdb_32b.dts
index ac5729c..6ba7ddd 100644
--- a/arch/powerpc/boot/dts/p1025rdb_32b.dts
+++ b/arch/powerpc/boot/dts/p1025rdb_32b.dts
@@ -54,7 +54,7 @@
};
pci0: pcie@ffe09000 {
- ranges = <0x2000000 0x0 0xe0000000 0 0xe0000000 0x0 0x20000000
+ ranges = <0x2000000 0x0 0xa0000000 0 0xa0000000 0x0 0x20000000
0x1000000 0x0 0x00000000 0 0xffc10000 0x0 0x10000>;
reg = <0 0xffe09000 0 0x1000>;
pcie@0 {
@@ -70,7 +70,7 @@
pci1: pcie@ffe0a000 {
reg = <0 0xffe0a000 0 0x1000>;
- ranges = <0x2000000 0x0 0xe0000000 0 0xe0000000 0x0 0x20000000
+ ranges = <0x2000000 0x0 0x80000000 0 0x80000000 0x0 0x20000000
0x1000000 0x0 0x00000000 0 0xffc00000 0x0 0x10000>;
pcie@0 {
ranges = <0x2000000 0x0 0xe0000000
diff --git a/arch/powerpc/boot/dts/p1025rdb_36b.dts b/arch/powerpc/boot/dts/p1025rdb_36b.dts
index 06deb6f..f7a9cf8 100644
--- a/arch/powerpc/boot/dts/p1025rdb_36b.dts
+++ b/arch/powerpc/boot/dts/p1025rdb_36b.dts
@@ -55,7 +55,7 @@
pci0: pcie@fffe09000 {
reg = <0xf 0xffe09000 0 0x1000>;
- ranges = <0x2000000 0x0 0xe0000000 0xe 0x20000000 0x0 0x20000000
+ ranges = <0x2000000 0x0 0xe0000000 0xc 0x20000000 0x0 0x20000000
0x1000000 0x0 0x00000000 0xf 0xffc10000 0x0 0x10000>;
pcie@0 {
ranges = <0x2000000 0x0 0xe0000000
diff --git a/arch/powerpc/boot/dts/p2020rdb-pc_32b.dts b/arch/powerpc/boot/dts/p2020rdb-pc_32b.dts
index 57573bd..4ab21f8 100644
--- a/arch/powerpc/boot/dts/p2020rdb-pc_32b.dts
+++ b/arch/powerpc/boot/dts/p2020rdb-pc_32b.dts
@@ -63,7 +63,7 @@
pci1: pcie@ffe09000 {
reg = <0 0xffe09000 0 0x1000>;
- ranges = <0x2000000 0x0 0xe0000000 0 0xa0000000 0x0 0x20000000
+ ranges = <0x2000000 0x0 0xa0000000 0 0xa0000000 0x0 0x20000000
0x1000000 0x0 0x00000000 0 0xffc10000 0x0 0x10000>;
pcie@0 {
ranges = <0x2000000 0x0 0xe0000000
@@ -78,7 +78,7 @@
pci0: pcie@ffe0a000 {
reg = <0 0xffe0a000 0 0x1000>;
- ranges = <0x2000000 0x0 0xe0000000 0 0x80000000 0x0 0x20000000
+ ranges = <0x2000000 0x0 0x80000000 0 0x80000000 0x0 0x20000000
0x1000000 0x0 0x00000000 0 0xffc00000 0x0 0x10000>;
pcie@0 {
ranges = <0x2000000 0x0 0xe0000000
--
1.8.1.4
^ permalink raw reply related
* Re: [PATCH] powerpc/pci: Improve device hotplug initialization
From: Guenter Roeck @ 2013-06-01 13:58 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Yuanquan Chen, Hiroo Matsumoto, linux-kernel, Paul Mackerras,
Chen Yuanquan-B41889, linuxppc-dev
In-Reply-To: <1369979047.3928.112.camel@pasglop>
On Fri, May 31, 2013 at 03:44:07PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2013-05-30 at 22:14 -0700, Guenter Roeck wrote:
> > On Wed, May 29, 2013 at 05:30:41PM +0800, Chen Yuanquan-B41889 wrote:
> > > On 05/29/2013 01:35 AM, Guenter Roeck wrote:
> > > >bios_add_device(). Drop explicit calls to pcibios_setup_device();
> > > >this makes pcibios_setup_bus_devices() a noop function which could
> > > >eve
> > >
> > > Yeah, it's more reasonable to do the irq and DMA related initialization
> > > in one code path for all devices.
> > >
> > Any comments / feedback on the code itself ?
>
> Sorry, I haven't had a chance to review it yet, I'm fairly bogged down
> at the moment. I want to tread carefully because the previous iteration
> of changing that stuff did break a few platforms in the end.
>
Hi Ben,
the comment was actuially directed towards Yuanquan.
No problem, take your time. I did my best to test it, but I agree that this is a
critical area of the code, and it would be desirable to get additional scrutiny
and test feedback.
The code has been running in our system (P2020 and P5040) for several months.
I was preparing a patch for upstream submission when I noticed commit 37f02195b.
After testing ithis commit, I noticed the problems with it and wrote this patch,
which aligns the code with our initial patch. I tested it as good as I could on
our systems as well as with a P5040 evaluation board and an Intel GE PCIe
card.
Thanks,
Guenter
^ permalink raw reply
* Re: [PATCH] powerpc/pseries: Kill all prefetch streams on context switch
From: Michael Ellerman @ 2013-06-01 15:13 UTC (permalink / raw)
To: Michael Neuling; +Cc: anton, miltonm, Linux PPC dev
In-Reply-To: <17925.1369892067@ale.ozlabs.ibm.com>
On Thu, May 30, 2013 at 03:34:27PM +1000, Michael Neuling wrote:
> On context switch, we should have no prefetch streams leak from one
> userspace process to another. This frees up prefetch resources for the
> next process.
>
> Based on patch from Milton Miller.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
>
> diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
> index cea8496..2f1b6c5 100644
> --- a/arch/powerpc/include/asm/ppc_asm.h
> +++ b/arch/powerpc/include/asm/ppc_asm.h
> @@ -523,6 +523,17 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,946)
> #define PPC440EP_ERR42
> #endif
>
> +/* The following stops all load and store data streams associated with stream
> + * ID (ie. streams created explicitly). The embedded and server mnemonics for
> + * dcbt are different so we use machine "power4" here explicitly.
> + */
> +#define DCBT_STOP_ALL_STREAM_IDS(scratch) \
> +.machine push ; \
> +.machine "power4" ; \
> + lis scratch,0x60000000@h; \
> + dcbt r0,scratch,0b01010; \
> +.machine pop
I don't see why we need the macro, ie. just stick this code in
entry_64.S directly.
cheers
^ permalink raw reply
* Re: [PATCH 1/3] powerpc/mpc85xx: remove the unneeded pci init functions for corenet ds board
From: Kevin Hao @ 2013-06-02 0:07 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc
In-Reply-To: <1370087236.3766.43.camel@pasglop>
[-- Attachment #1: Type: text/plain, Size: 1694 bytes --]
On Sat, Jun 01, 2013 at 09:47:16PM +1000, Benjamin Herrenschmidt wrote:
> On Sat, 2013-06-01 at 18:59 +0800, Kevin Hao wrote:
>
> > The effect of this change is that the isa_io_base will be 0 and the IO
> > resource are equal to the virtual address of the IO space. But the IO
> > functions such as outx/inx should work as well. This is why I ask the
> > above question. What do you think about this? Are there any subtle bugs
> > that will be triggered by this?
>
> I don't see any obvious reason why that wouldn't work but like anything
> in that area, it needs a bit of testing & hammering to be sure ;-)
Agreed.
>
> In fact it would work on pmac32 as well since those generally don't have
> legacy crap either.
>
> So I have no fundamental objection, it just needs testing. My worry is
> that we need to make sure we don't break old chrp and I don't have any
> to test with.
Don't worry, my patch just drop the picking of primary bus for several
fsl boards. All these changes are in board specific file, so it should
have no affect to other boards at all.
For the other boards which have the same issue, since this is not emergency,
we don't want handle them in a rush. So I would like to merge this patch
first and then wait one or two kernel release cycle before handling the others
to make sure that we don't break anything.
Thanks,
Kevin
> I'm happy to drop support for stuff nobody uses anymore
> (we did drop PReP after all and I'm *that* close to drop power3) but as
> long as somebody is still using a CHRP RS6K or a Pegasos I can't quite
> drop those... And they do have legacy ISA crap to deal with.
>
> Cheers,
> Ben.
>
>
[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]
^ permalink raw reply
* Re: 3.10-rc ppc64 corrupts usermem when swapping
From: Aneesh Kumar K.V @ 2013-06-02 7:22 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Paul Mackerras, Anton Blanchard, Hugh Dickins, linuxppc-dev,
David Gibson
In-Reply-To: <1370038982.3928.147.camel@pasglop>
Benjamin Herrenschmidt <benh@au1.ibm.com> writes:
> On Fri, 2013-05-31 at 14:45 +0530, Aneesh Kumar K.V wrote:
>
>> > The patch you are running on is what I'll send to Linus for 3.10 (+/-
>> > cosmetics). Aneesh second patch is a much larger rework which will be
>> > needed for THP but that will wait for 3.11. I'm happy for you to test it
>> > but I first want to make sure it's solid with the 3.10 fix :-)
>
> BTW. One concern I still have is that Hugh identified the bad commit
> to be:
>
> 7e74c3921ad9610c0b49f28b8fc69f7480505841
> "powerpc: Fix hpte_decode to use the correct decoding for page sizes".
>
> However, you introduce the return on HPTE not found earlier, in
>
> b1022fbd293564de91596b8775340cf41ad5214c
> "powerpc: Decode the pte-lp-encoding bits correctly."
>
> So while I'm still happy with the current band-aid for 3.10 and am
> about to send it to Linus, the above *does* seem to indicate that
> there is also something wrong with the "Fix hpte_decode..." commit,
> which might not actually get the page size right...
>
> Can you investigate ?
7e74c3921ad9610c0b49f28b8fc69f7480505841
"powerpc: Fix hpte_decode to use the correct decoding for page sizes"
changes should only impact hpte_decode. We don't change the details
of hpte_actual_psize at all in this patch. That means we should see a
difference only with kexec right ?.
Hugh,
Will you be able to double check whether
7e74c3921ad9610c0b49f28b8fc69f7480505841 is the bad commit. The one
before that is what we changed in the patch that fixed your problem.
-aneesh
^ permalink raw reply
* Re: [PATCH] powerpc/mm: Always invalidate tlb on hpte invalidate and update
From: Aneesh Kumar K.V @ 2013-06-02 7:26 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <20130601111908.GA16571@concordia>
Michael Ellerman <michael@ellerman.id.au> writes:
> On Fri, May 31, 2013 at 04:33:24PM +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>
>> If a hash bucket gets full, we "evict" a more/less random entry from it.
>> When we do that we don't invalidate the TLB (hpte_remove) because we assume
>> the old translation is still technically "valid". This implies that when
>> we are invalidating or updating pte, even if HPTE entry is not valid
>> we should do a tlb invalidate.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> Has this always been a bug? I assume not.
>
> I'm asking because I have a kernel that's crashing and I'm wondering if
> I might need this commit.
Which config are you seeing the issue ? The changes should not impact
lpar. Can you share more info on crashes. There is a high chance that any crashes
that we are seeing in ppc64 can be the result of THP related changes,
because that did touch some subtle areas like tlb flushing,page table format etc.
-aneesh
^ permalink raw reply
* Re: 3.10-rc ppc64 corrupts usermem when swapping
From: Hugh Dickins @ 2013-06-02 18:19 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linuxppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
Anton Blanchard, David Gibson
In-Reply-To: <8738t1c6y7.fsf@linux.vnet.ibm.com>
On Sun, 2 Jun 2013, Aneesh Kumar K.V wrote:
> Benjamin Herrenschmidt <benh@au1.ibm.com> writes:
> > On Fri, 2013-05-31 at 14:45 +0530, Aneesh Kumar K.V wrote:
> >
> >> > The patch you are running on is what I'll send to Linus for 3.10 (+/-
> >> > cosmetics). Aneesh second patch is a much larger rework which will be
> >> > needed for THP but that will wait for 3.11. I'm happy for you to test it
> >> > but I first want to make sure it's solid with the 3.10 fix :-)
> >
> > BTW. One concern I still have is that Hugh identified the bad commit
> > to be:
> >
> > 7e74c3921ad9610c0b49f28b8fc69f7480505841
> > "powerpc: Fix hpte_decode to use the correct decoding for page sizes".
> >
> > However, you introduce the return on HPTE not found earlier, in
> >
> > b1022fbd293564de91596b8775340cf41ad5214c
> > "powerpc: Decode the pte-lp-encoding bits correctly."
> >
> > So while I'm still happy with the current band-aid for 3.10 and am
> > about to send it to Linus, the above *does* seem to indicate that
> > there is also something wrong with the "Fix hpte_decode..." commit,
> > which might not actually get the page size right...
> >
> > Can you investigate ?
>
> 7e74c3921ad9610c0b49f28b8fc69f7480505841
> "powerpc: Fix hpte_decode to use the correct decoding for page sizes"
> changes should only impact hpte_decode. We don't change the details
> of hpte_actual_psize at all in this patch. That means we should see a
> difference only with kexec right ?.
>
> Hugh,
>
> Will you be able to double check whether
> 7e74c3921ad9610c0b49f28b8fc69f7480505841 is the bad commit. The one
> before that is what we changed in the patch that fixed your problem.
You are absolutely right. I just set b1022fbd29 going, expecting
to answer you tomorrow: but got a Segmentation fault in 20 minutes
(quicker than ever seen before). It looks as if I was running some
other kernel for the last stage of my bisection: I can't see how that
came about, but it's not very interesting now - you got it right.
Prior to trying that, I had been running your second patch, 9f70fd8cfe,
and that tested out successfully for 50 hours before I stopped it.
Hugh
^ permalink raw reply
* Re: [PATCH] powerpc/pseries: Kill all prefetch streams on context switch
From: Michael Neuling @ 2013-06-03 0:42 UTC (permalink / raw)
To: Michael Ellerman; +Cc: anton, miltonm, Linux PPC dev
In-Reply-To: <20130601151327.GA27644@concordia>
Michael Ellerman <michael@ellerman.id.au> wrote:
> On Thu, May 30, 2013 at 03:34:27PM +1000, Michael Neuling wrote:
> > On context switch, we should have no prefetch streams leak from one
> > userspace process to another. This frees up prefetch resources for the
> > next process.
> >
> > Based on patch from Milton Miller.
> >
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> >
> > diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
> > index cea8496..2f1b6c5 100644
> > --- a/arch/powerpc/include/asm/ppc_asm.h
> > +++ b/arch/powerpc/include/asm/ppc_asm.h
> > @@ -523,6 +523,17 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,946)
> > #define PPC440EP_ERR42
> > #endif
> >
> > +/* The following stops all load and store data streams associated with stream
> > + * ID (ie. streams created explicitly). The embedded and server mnemonics for
> > + * dcbt are different so we use machine "power4" here explicitly.
> > + */
> > +#define DCBT_STOP_ALL_STREAM_IDS(scratch) \
> > +.machine push ; \
> > +.machine "power4" ; \
> > + lis scratch,0x60000000@h; \
> > + dcbt r0,scratch,0b01010; \
> > +.machine pop
>
> I don't see why we need the macro, ie. just stick this code in
> entry_64.S directly.
There's a decent chance we'll want to use this again at some point. We
also most stuck it in the error path of the power7 copy user loop but
after consulting Anton and heavily reviewing the code, we decoded it
shouldn't be needed.
It's in Linus tree now anyway.
Mikey
^ permalink raw reply
* Re: [PATCH 02/23] powerpc/eeh: Function to tranverse PCI devices
From: Gavin Shan @ 2013-06-03 1:00 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan
In-Reply-To: <1370059983.3766.1.camel@pasglop>
On Sat, Jun 01, 2013 at 02:13:03PM +1000, Benjamin Herrenschmidt wrote:
>On Thu, 2013-05-30 at 16:23 +0800, Gavin Shan wrote:
>> For EEH on PowerNV platform, the PCI devices will be probed to
>> check if they support EEH functionality. Different from the case
>> of EEH for pSeries platform, we will probe real PCI device instead
>> of device tree node for EEH capability on PowerNV platform.
>>
>> The patch introduces function eeh_pci_dev_traverse() to traverse
>> PCI devices for the indicated PCI bus from top to bottom.
>
>This seems racy vs. hotplug etc... Any reason you can't use
>pci_walk_bus() from drivers/pci/bus.c ?
>
Thanks, Ben. I'll use pci_walk_bus() in next version.
Thanks,
Gavin
^ permalink raw reply
* Re: [PATCH 03/23] powerpc/eeh: Make eeh_phb_pe_get() public
From: Gavin Shan @ 2013-06-03 1:02 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan
In-Reply-To: <1370060064.3766.2.camel@pasglop>
On Sat, Jun 01, 2013 at 02:14:24PM +1000, Benjamin Herrenschmidt wrote:
>On Thu, 2013-05-30 at 16:23 +0800, Gavin Shan wrote:
>> While processing EEH event interrupt from P7IOC, we need function
>> to retrieve the PE according to the indicated PCI host controller
>> (struct pci_controller). The patch makes function eeh_phb_pe_get()
>> public so that other source files can call it for that purpose.
>
>Just to make things clear to me... You always have the concept of a
>"controller PE" ? What does it actually correspond to in terms of HW
>setting ? Bus 0 ? Bus 0..255 ? IE, A "catch all" fallback ?
>
Ben, each PHB has its corresponding PE and we call it as "PHB PE", and
it can be regarded as domain to bus 0 ... 255. I'll update the change
log to make that more clear.
Thanks,
Gavin
>> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
>> ---
>> arch/powerpc/include/asm/eeh.h | 1 +
>> arch/powerpc/platforms/pseries/eeh_pe.c | 2 +-
>> 2 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>> index eeaeab6..4b48178 100644
>> --- a/arch/powerpc/include/asm/eeh.h
>> +++ b/arch/powerpc/include/asm/eeh.h
>> @@ -185,6 +185,7 @@ static inline void eeh_unlock(void)
>> typedef void *(*eeh_traverse_func)(void *data, void *flag);
>> typedef void *(*eeh_pci_traverse_func)(struct pci_dev *dev, void *flag);
>> int eeh_phb_pe_create(struct pci_controller *phb);
>> +struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
>> int eeh_add_to_parent_pe(struct eeh_dev *edev);
>> int eeh_rmv_from_parent_pe(struct eeh_dev *edev, int purge_pe);
>> void *eeh_pe_dev_traverse(struct eeh_pe *root,
>> diff --git a/arch/powerpc/platforms/pseries/eeh_pe.c b/arch/powerpc/platforms/pseries/eeh_pe.c
>> index fe43d1a..6e3eb43 100644
>> --- a/arch/powerpc/platforms/pseries/eeh_pe.c
>> +++ b/arch/powerpc/platforms/pseries/eeh_pe.c
>> @@ -95,7 +95,7 @@ int eeh_phb_pe_create(struct pci_controller *phb)
>> * hierarchy tree is composed of PHB PEs. The function is used
>> * to retrieve the corresponding PHB PE according to the given PHB.
>> */
>> -static struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb)
>> +struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb)
>> {
>> struct eeh_pe *pe;
>>
>
>
^ permalink raw reply
* Re: [PATCH 08/23] powerpc/eeh: Refactor eeh_reset_pe_once()
From: Gavin Shan @ 2013-06-03 1:03 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan
In-Reply-To: <1370060330.3766.5.camel@pasglop>
On Sat, Jun 01, 2013 at 02:18:50PM +1000, Benjamin Herrenschmidt wrote:
>On Thu, 2013-05-30 at 16:23 +0800, Gavin Shan wrote:
>> The patch changes the criteria used to judge if the PE has been
>> resetted successfully. We needn't the PE status is exactly equal
>> to the combo: (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE).
>
>The comment above doesn't mean anything :-)
>
>I assume you meant to write "we shouldn't check that the
>returned PE status is exactly equal to the two flags X and Y,
>but instead only check that they are both set".
>
Thanks, Ben. I'll update it accordingly in next version :-)
>> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
>> ---
>> arch/powerpc/platforms/pseries/eeh.c | 4 ++--
>> 1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
>> index 39d2ea6..ffe34c4 100644
>> --- a/arch/powerpc/platforms/pseries/eeh.c
>> +++ b/arch/powerpc/platforms/pseries/eeh.c
>> @@ -527,7 +527,6 @@ static void eeh_reset_pe_once(struct eeh_pe *pe)
>> * Partitionable Endpoint trumps hot-reset.
>> */
>> eeh_pe_dev_traverse(pe, eeh_set_dev_freset, &freset);
>> -
>
>Unrelated churn
>
Ok. Will remove it in next version.
>> if (freset)
>> eeh_ops->reset(pe, EEH_RESET_FUNDAMENTAL);
>> else
>> @@ -565,6 +564,7 @@ static void eeh_reset_pe_once(struct eeh_pe *pe)
>> */
>> int eeh_reset_pe(struct eeh_pe *pe)
>> {
>> + int flags = (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE);
>> int i, rc;
>>
>> /* Take three shots at resetting the bus */
>> @@ -572,7 +572,7 @@ int eeh_reset_pe(struct eeh_pe *pe)
>> eeh_reset_pe_once(pe);
>>
>> rc = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
>> - if (rc == (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE))
>> + if ((rc & flags) == flags)
>> return 0;
>>
>> if (rc < 0) {
Thanks,
Gavin
^ permalink raw reply
* Re: [PATCH 16/23] powerpc/eeh: I/O chip PE reset
From: Gavin Shan @ 2013-06-03 1:09 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan
In-Reply-To: <1370060670.3766.8.camel@pasglop>
On Sat, Jun 01, 2013 at 02:24:30PM +1000, Benjamin Herrenschmidt wrote:
>On Thu, 2013-05-30 at 16:23 +0800, Gavin Shan wrote:
>> The patch adds the I/O chip backend to do PE reset. For now, we
>> focus on PCI bus dependent PE. If PHB PE has been put into error
>> state, the PHB will take complete reset. Besides, the root bridge
>> will take fundamental or hot reset accordingly if the indicated
>> PE locates at the toppest of PCI hierarchy tree. Otherwise, the
>> upstream p2p bridge will take hot reset.
>>
>> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
>
> .../...
>
>> +static s64 ioda_eeh_phb_poll(struct pnv_phb *phb)
>> +{
>> + s64 rc = OPAL_HARDWARE;
>> +
>> + while (1) {
>> + rc = opal_pci_poll(phb->opal_id);
>> + if (rc <= 0)
>> + break;
>> + }
>> +
>> + return rc;
>> +}
>
>This is rude. It should at the very least cond_resched, but better,
>isn't the firmware supposed to return a positive value to indicate
>how long to wait for before calling again ? In that case it should
>msleep() ...
>
Yeah. Thanks for pointing it out. I will use msleep() if possible.
>> +static int ioda_eeh_phb_reset(struct pci_controller *hose, int option)
>> +{
>> + struct pnv_phb *phb = hose->private_data;
>> + s64 rc = OPAL_HARDWARE;
>> +
>> + pr_debug("%s: Reset PHB#%x, option=%d\n",
>> + __func__, hose->global_number, option);
>> +
>> + /* Issue PHB complete reset request */
>> + if (option == EEH_RESET_FUNDAMENTAL ||
>> + option == EEH_RESET_HOT)
>> + rc = opal_pci_reset(phb->opal_id,
>> + OPAL_PHB_COMPLETE,
>> + OPAL_ASSERT_RESET);
>> + else if (option == EEH_RESET_DEACTIVATE)
>> + rc = opal_pci_reset(phb->opal_id,
>> + OPAL_PHB_COMPLETE,
>> + OPAL_DEASSERT_RESET);
>> + if (rc < 0)
>> + goto out;
>> +
>> + /*
>> + * Poll state of the PHB until the request is done
>> + * successfully.
>> + */
>> + rc = ioda_eeh_phb_poll(phb);
>> +out:
>> + if (rc != OPAL_SUCCESS)
>> + return -EIO;
>> +
>> + return 0;
>> +}
>> +
>> +static int ioda_eeh_root_reset(struct pci_controller *hose, int option)
>> +{
>> + struct pnv_phb *phb = hose->private_data;
>> + s64 rc = OPAL_SUCCESS;
>> +
>> + pr_debug("%s: Reset PHB#%x, option=%d\n",
>> + __func__, hose->global_number, option);
>> +
>> + /*
>> + * During the reset deassert time, we needn't care
>> + * the reset scope because the firmware does nothing
>> + * for fundamental or hot reset during deassert phase.
>> + */
>> + if (option == EEH_RESET_FUNDAMENTAL)
>> + rc = opal_pci_reset(phb->opal_id,
>> + OPAL_PCI_FUNDAMENTAL_RESET,
>> + OPAL_ASSERT_RESET);
>> + else if (option == EEH_RESET_HOT)
>> + rc = opal_pci_reset(phb->opal_id,
>> + OPAL_PCI_HOT_RESET,
>> + OPAL_ASSERT_RESET);
>> + else if (option == EEH_RESET_DEACTIVATE)
>> + rc = opal_pci_reset(phb->opal_id,
>> + OPAL_PCI_HOT_RESET,
>> + OPAL_DEASSERT_RESET);
>> + if (rc < 0)
>> + goto out;
>> +
>> + /* Poll state of the PHB until the request is done */
>> + rc = ioda_eeh_phb_poll(phb);
>> +out:
>> + if (rc != OPAL_SUCCESS)
>> + return -EIO;
>> +
>> + return 0;
>> +}
>> +
>> +static int ioda_eeh_bridge_reset(struct pci_controller *hose,
>> + struct pci_dev *dev, int option)
>> +{
>> + u16 ctrl;
>> +
>> + pr_debug("%s: Reset device %04x:%02x:%02x.%01x with option %d\n",
>> + __func__, hose->global_number, dev->bus->number,
>> + PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), option);
>> +
>> + switch (option) {
>> + case EEH_RESET_FUNDAMENTAL:
>> + case EEH_RESET_HOT:
>> + pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl);
>> + ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
>> + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
>> + break;
>> + case EEH_RESET_DEACTIVATE:
>> + pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl);
>> + ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
>> + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
>> + break;
>> + }
>> +
>> + return 0;
>> +}
>
>Is there any minimum delay by spec here ? And is there a delay to
>respect after the rest or is that handled at the upper level ?
>
Yes, the spec requires 100ms between assert/deassert point and 1.5s
for the link to be stable after reset. The upper layer has 250ms and
1.8s for them separately.
>Also, it looks like nothing here checks if the link is coming back
>up below the bridge, at what speed etc... this might be something
>to add in the future.
>
Ok. I've put it into TODO list.
>> +/**
>> + * ioda_eeh_reset - Reset the indicated PE
>> + * @pe: EEH PE
>> + * @option: reset option
>> + *
>> + * Do reset on the indicated PE. For PCI bus sensitive PE,
>> + * we need to reset the parent p2p bridge. The PHB has to
>> + * be reinitialized if the p2p bridge is root bridge. For
>> + * PCI device sensitive PE, we will try to reset the device
>> + * through FLR. For now, we don't have OPAL APIs to do HARD
>> + * reset yet, so all reset would be SOFT (HOT) reset.
>> + */
>> +static int ioda_eeh_reset(struct eeh_pe *pe, int option)
>> +{
>> + struct pci_controller *hose = pe->phb;
>> + struct eeh_dev *edev;
>> + struct pci_dev *dev;
>> + int ret;
>> +
>> + /*
>> + * Anyway, we have to clear the problematic state for the
>> + * corresponding PE. However, we needn't do it if the PE
>> + * is PHB associated. That means the PHB is having fatal
>> + * errors and it needs reset. Further more, the AIB interface
>> + * isn't reliable any more.
>> + */
>> + if (!(pe->type & EEH_PE_PHB) &&
>> + (option == EEH_RESET_HOT ||
>> + option == EEH_RESET_FUNDAMENTAL)) {
>> + ret = ioda_eeh_pe_clear(pe);
>> + if (ret)
>> + return -EIO;
>> + }
>> +
>> + /*
>> + * The rules applied to reset, either fundamental or hot reset:
>> + *
>> + * We always reset the direct upstream bridge of the PE. If the
>> + * direct upstream bridge isn't root bridge, we always take hot
>> + * reset no matter what option (fundamental or hot) is. Otherwise,
>> + * we should do the reset according to the required option.
>> + */
>> + if (pe->type & EEH_PE_PHB) {
>> + ret = ioda_eeh_phb_reset(hose, option);
>> + } else {
>> + if (pe->type & EEH_PE_DEVICE) {
>> + /*
>> + * If it's device PE, we didn't refer to the parent
>> + * PCI bus yet. So we have to figure it out indirectly.
>> + */
>> + edev = list_first_entry(&pe->edevs,
>> + struct eeh_dev, list);
>> + dev = eeh_dev_to_pci_dev(edev);
>> + dev = dev->bus->self;
>> + } else {
>> + /*
>> + * If it's bus PE, the parent PCI bus is already there
>> + * and just pick it up.
>> + */
>> + dev = pe->bus->self;
>> + }
>> +
>> + /*
>> + * Do reset based on the fact that the direct upstream bridge
>> + * is root bridge (port) or not.
>> + */
>> + if (dev->bus->number == 0)
>> + ret = ioda_eeh_root_reset(hose, option);
>> + else
>> + ret = ioda_eeh_bridge_reset(hose, dev, option);
>> + }
>> +
>> + return ret;
>> +}
>> +
>> struct pnv_eeh_ops ioda_eeh_ops = {
>> .post_init = ioda_eeh_post_init,
>> .set_option = ioda_eeh_set_option,
>> .get_state = ioda_eeh_get_state,
>> - .reset = NULL,
>> + .reset = ioda_eeh_reset,
>> .get_log = NULL,
>> .configure_bridge = NULL
>> };
>
Thanks,
Gavin
^ permalink raw reply
* Re: [PATCH 18/23] powerpc/eeh: PowerNV EEH backends
From: Gavin Shan @ 2013-06-03 1:10 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Gavin Shan
In-Reply-To: <1370060971.3766.11.camel@pasglop>
On Sat, Jun 01, 2013 at 02:29:31PM +1000, Benjamin Herrenschmidt wrote:
>On Thu, 2013-05-30 at 16:24 +0800, Gavin Shan wrote:
>> The patch adds EEH backends for PowerNV platform. It's notable that
>> part of those EEH backends call to the I/O chip dependent backends.
>
>Add a check for my new OPALv3 flag before registering since you rely
>on a whole bunch of new APIs that the current versions of OPAL don't
>have.
>
Ok. I'll do it in next version :-)
Thanks,
Gavin
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox