All of lore.kernel.org
 help / color / mirror / Atom feed
From: Valentin Longchamp <valentin.longchamp@keymile.com>
To: linux-mtd@lists.infradead.org
Cc: "u-boot@lists.denx.de" <u-boot@lists.denx.de>,
	Holger Brunck <holger.brunck@keymile.com>,
	Stefan Bigler <stefan.bigler@keymile.com>
Subject: Possible race condition when accessing SPI NOR Flash ?
Date: Thu, 01 May 2014 13:52:55 +0200	[thread overview]
Message-ID: <53623597.8040909@keymile.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 5925 bytes --]

Hello,

I am currently seeing a very strange behavior on one our PowerPC based systems
(P2041 from Freescale). We have u-boot as a bootloader, currently using the
2013.10 release. The u-boot environment variables are stored in a SPI NOR Flash
(from which the CPU also boots u-boot), in two dedicated partitions in a
Spansion s25fl256s1 NOR Flash:

> 	spi@110000 {
> 			flash@0 {
> 				#address-cells = <1>;
> 				#size-cells = <1>;
> 				compatible = "spansion,s25fl256s1";
> 				reg = <0>;
> 				spi-max-frequency = <20000000>; /* input clock */
> 				partition@u-boot {
> 					label = "u-boot";
> 					reg = <0x00000000 0x00100000>;
> 					read-only;
> 				};
> 				partition@env {
> 					label = "env";
> 					reg = <0x00100000 0x00010000>;
> 				};
> 				partition@envred {
> 					label = "envred";
> 					reg = <0x00110000 0x00010000>;
> 				};
> 				partition@fman {
> 					label = "fman-ucode";
> 					reg = <0x00120000 0x00010000>;
> 				};
> 			};
> 

I have never noticed a problem when accessing the environment variables from
u-boot itself, nor the CPU not booting from the NOR Flash, so I assume the
hardware is OK (and I am able to reproduce it on Freescale's dev board).

Now when fw_setenv ([1], [2]) updates the environment variables, it sometimes
(approximately 1 out of 100 runs) triggers such kernel errors:

> 
> Unable to handle kernel paging request for data at address 0x54cc2178
> Faulting instruction address: 0xc01187e8
> Oops: Kernel access of bad area, sig: 11 [#2]
> SMP NR_CPUS=8 Keymile kmp204x
> Modules linked in:
> CPU: 2 PID: 1628 Comm: loop_setenv Tainted: G    B D      3.10.36-7.0.9-00435-g228e7e3 #21
> task: db720cb0 ti: db92e000 task.ti: db92e000
> NIP: c01187e8 LR: c011866c CTR: 00000000
> REGS: db92fd80 TRAP: 0300   Tainted: G    B D       (3.10.36-7.0.9-00435-g228e7e3)
> MSR: 00021002 <CE,ME>  CR: 24002424  XER: 20000000
> DEAR: 54cc2178, ESR: 00800000
> 
> GPR00: c011978c db92fe30 db720cb0 db8400c0 000000d0 00000000 db66f5bc 7f7f7f7f 
> GPR08: db0b4e90 db008a44 54cc2174 00000fec 24002422 1001c6ec 00000000 c0890000 
> GPR16: db008a54 db008a4c c0890000 00100100 00200200 00000001 db008a44 db008a40 
> GPR24: c0893444 00000000 000000d0 db92e000 db003dc0 db0b4e00 0000001b db840000 
> NIP [c01187e8] cache_alloc_refill+0x208/0x710
> LR [c011866c] cache_alloc_refill+0x8c/0x710
> Call Trace:
> [db92fe30] [c000ede8] handle_page_fault+0xc/0x80 (unreliable)
> [db92fe80] [c011978c] kmem_cache_alloc+0xec/0x180
> [db92fea0] [c00580a4] prepare_creds+0x24/0x1c0
> [db92feb0] [c0058964] prepare_exec_creds+0x14/0x50
> [db92fec0] [c012635c] prepare_bprm_creds+0x2c/0x70
> [db92fed0] [c01264bc] do_execve+0xbc/0x520
> [db92ff20] [c0126b6c] SyS_execve+0x3c/0x70
> [db92ff40] [c000e944] ret_from_syscall+0x0/0x3c
> --- Exception: c01 at 0xfed2d24
>     LR = 0xfe6aef8
> Instruction dump:
> 39090004 39290001 5508103a 7d1d4214 90680004 913d0000 813f0010 815c0018 
> 7f095040 41f8ff94 813f0004 815f0000 <912a0004> 91490000 927f0000 929f0004 
> ---[ end trace 85ac86b7cd18b604 ]---

OR:

> Unable to handle kernel paging request for data at address 0x54cc2178
> Faulting instruction address: 0xc0117c5c
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=8 Keymile kmp204x
> Modules linked in:
> CPU: 2 PID: 29 Comm: kworker/2:1 Not tainted 3.10.36-7.0.9-00435-g228e7e3 #21
> Workqueue: events cache_reap
> task: db2c12f0 ti: db2c6000 task.ti: db2c6000
> NIP: c0117c5c LR: c011816c CTR: 00000000
> REGS: db2c7d50 TRAP: 0300   Not tainted  (3.10.36-7.0.9-00435-g228e7e3)
> MSR: 00021002 <CE,ME>  CR: 24002424  XER: 20000000
> DEAR: 54cc2178, ESR: 00800000
> 
> GPR00: c011816c db2c7e00 db2c12f0 db003cc0 db800000 00000000 00000080 54cc2174 
> GPR08: 00005000 db008980 db800040 00021002 24000422 00000000 c0050300 db0d3de8 
> GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 c1167740 
> GPR24: 00000000 c0893260 00100100 00200200 00000002 db003cc0 db0afdd4 00000000 
> NIP [c0117c5c] free_block+0xbc/0x180
> LR [c011816c] drain_array+0xfc/0x150
> Call Trace:
> [db2c7e30] [c011816c] drain_array+0xfc/0x150
> [db2c7e60] [c01183cc] cache_reap+0xac/0x170
> [db2c7e90] [c0046ce0] process_one_work+0x150/0x3d0
> [db2c7ec0] [c00473ac] worker_thread+0x12c/0x390
> [db2c7ef0] [c00503a8] kthread+0xa8/0xb0
> [db2c7f40] [c000ea7c] ret_from_kernel_thread+0x5c/0x64
> Instruction dump:
> 3d0a4000 5508c9f4 81390000 7cc9402e 7ce94214 70c58000 40c200c4 80870014 
> 813d0050 80e40000 81040004 7d29c02e <91070004> 90e80000 38c90004 93440000 
> ---[ end trace 721e12de4b707566 ]---

Notice that this does not necessarily happen in fw_setenv itself, but rather in
the next "task" that tries to allocate/free some virtual memory.

I see the same behavior with both the 2013.10 and the 2014.04 releases of
u-boot/fw_env. The kernel we are using is 3.10.36.

I suspect that the problem is related to SPI NOR/m25p80 driver: on the system we
have a NAND Flash device with UBI volumes. If I create 2 ubi volumes on the NAND
Flash and configure fw_setenv (/etc/fw_env.config) to use them instead of the
the mtd devices targetting the s25fl256s1, I am not able to reproduce the
problem, even over more than 10'000 runs.

I suspect that it is a race condition because it happens ~1 out of 100 times
and if I disable SMP in the kernel I am never able to reproduce the problem.

Finally, after having analyzed the fw_env behavior, I have tried to write a very
silly program that mimics the fw_env mtd accesses in my use case (attached with
this email) and I am able to reproduce the problem with it.

One other possible culprit that I see is the fsl_espi.c driver for the
underlying hardware connection from the CPU to the NOR Flash, but I wanted to
ask here if someone had an idea about what's going wrong.

Valentin

[1] http://elinux.org/U-boot_environment_variables_in_linux
[2]
http://git.denx.de/?p=u-boot.git;a=tree;f=tools/env;h=99706ce668db90bd4a476a12e6c07fe2e4b85fbb;hb=master


[-- Attachment #2: loop_mtd.c --]
[-- Type: text/plain, Size: 2105 bytes --]

#include <sys/types.h>
#include <sys/ioctl.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>

#include <mtd/mtd-user.h>

int mtd_open(const char *file, mode_t mode) {
	int fd = open(file, mode);
	if (fd == -1) {
		printf("Error opening file %s: %s\n", file, strerror(errno));
		return -1;
	}

	return fd;
}

int mtd_read(int fd, void *buf, size_t count) {
	int sz = read(fd, buf, count);
	if (sz == -1) {
		printf("Error reading fd %d: %s\n", fd, strerror(errno));
	}

	return sz;
}

int mtd_write(int fd, void *buf, size_t count) {
	int sz = write(fd, buf, count);
	if (sz == -1) {
		printf("Error reading fd %d: %s\n", fd, strerror(errno));
	}

	return sz;
}

int mtd_seek(int fd, off_t offset) {
	int ret = lseek(fd, offset, SEEK_SET);
	if (ret == -1) {
		printf("Error closing fd %d: %s\n", fd, strerror(errno));
	}

	return ret;
}

int mtd_erase(int fd, off_t offset) {
	int ret = ioctl(fd, MEMERASE, &offset);
	if (ret == -1) {
		printf("Error erasing fd %d: %s\n", fd, strerror(errno));
	}

	return ret;
}

int mtd_close(int fd) {
	int ret = close(fd);
	if (ret) {
		printf("Error closing fd %d: %s\n", fd, strerror(errno));
	}

	return ret;
}

int main(int argc, char *argv[])
{
	int	retValue;
	int	i;
	int	readSz1 = 0x4000;
	int	readSz2 = 0x10000;
	char	buffer1[readSz1];
	char	buffer2[readSz2];
	char *	file1 = "/dev/mtd6";
	char *	file2 = "/dev/mtd7";

	for (i=0; i < 1000; i++) {
		int ret, sz1, sz2, fd1, fd2;

		fd1 = mtd_open(file1, O_RDONLY);
		ret = mtd_seek(fd1, 0);
		sz1 = mtd_read(fd1, buffer1, readSz1);
		ret = mtd_close(fd1);
		fd2 = mtd_open(file2, O_RDONLY);
		ret = mtd_seek(fd2, 0);
		sz2 = mtd_read(fd2, buffer2, readSz1);
		ret = mtd_close(fd2);

		fd1 = mtd_open(file1, O_RDWR);
		fd2 = mtd_open(file2, O_RDWR);
		sz1 = mtd_read(fd2, buffer2, readSz2);
		ret = mtd_seek(fd2, 0);
		ret = mtd_erase(fd2, 0);
		sz1 = mtd_write(fd2, buffer2, readSz2);
		ret = mtd_seek(fd1, 4);
		sz2 = mtd_write(fd1, "\0", 1);
		ret = mtd_close(fd1);
		ret = mtd_close(fd2);

		if (!(i%50))
			printf("mtd, loop i=%d \n", i);
	}

	return 0;
}


             reply	other threads:[~2014-05-01 11:53 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-01 11:52 Valentin Longchamp [this message]
2014-05-01 22:31 ` Possible race condition when accessing SPI NOR Flash ? Wolfgang Denk
2014-05-01 22:31   ` [U-Boot] " Wolfgang Denk
2014-05-02  6:18   ` Valentin Longchamp
2014-05-02  6:18     ` [U-Boot] " Valentin Longchamp
2014-05-02 11:40 ` Mark Marshall
2014-05-02 11:40   ` [U-Boot] " Mark Marshall
2014-05-02 14:52   ` Valentin Longchamp
2014-05-02 14:52     ` Valentin Longchamp

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53623597.8040909@keymile.com \
    --to=valentin.longchamp@keymile.com \
    --cc=holger.brunck@keymile.com \
    --cc=linux-mtd@lists.infradead.org \
    --cc=stefan.bigler@keymile.com \
    --cc=u-boot@lists.denx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.