Linux 2.4.17 bug, mmap of /dev/mem

linuxppc-dev.lists.ozlabs.org archive mirror

* Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-08 16:07 David Ashley
  0 siblings, 0 replies; 33+ messages in thread
From: David Ashley @ 2002-02-08 16:07 UTC (permalink / raw)
  To: linuxppc-embedded


I use mmap of /dev/mem to access some pci devices in a user space program.
It can be any memory at all however, such as local bus memory. If the
program below is compiled, and the executable repeatedly run, there is
corruption of the system eventually. All the program does is read a longword
then write it back. If the write is removed, there is no problem. The read
is irrelevant. What I suspect is happening is perhaps the write causes a
page fault, which causes interrupt processing, and in that processing some
global structures are accessed which should be protected by semaphores.
Frequently I was getting a kernel Oops in the kupdated thread,
always inside the function get_hash_table. That function has some spinlocks
which get #defined away (no CONFIG_SMP, only 1 cpu). If I enable CONFIG_SMP
I don't get the corruption problems, but the kernel will eventually freeze
and if I stop it with the debugger it is in one of the spinlock loops.

This problem doesn't exist in x86 as far as I can tell.

/dev/mem is handled by drivers/char/mem.c. There is a special case there:
#elif defined(__powerpc__)
	prot |= _PAGE_NO_CACHE | _PAGE_GUARDED;
#elif defined(__mc68000__)


Here is a simple program to demonstrate this bug. Just run this program
repeatedly and stuff starts happening (bad stuff).

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define ADDR 0xf2300000
#define SIZE 0x00001000

int main(void)
{
void *where;
int fd;
unsigned long t;
volatile unsigned long *base;

	fd=open("/dev/mem",O_RDWR);
	if(fd<0) {printf("Failed to open\n");return 1;}

	where=mmap(0,SIZE,PROT_READ|PROT_WRITE,MAP_SHARED,fd,ADDR);
	if(where==MAP_FAILED)	/* mmap reports failure as MAP_FAILED */
	{
		close(fd);
		printf("mmap failed\n");
		return 1;
	}

	base=where;

	t=base[0];	/* read one longword ... */
	base[0]=t;	/* ... and write the same value back */
	printf("%p,%lx\n",where,t);
	munmap(where,SIZE);
	close(fd);
	return 0;
}

Any thoughts on this are welcome. I include my config file below.
This is running on a ppc 8260, linux version 2.4.17.

Thanks!
Dave

#
# Automatically generated make config: don't edit
#
# CONFIG_UID16 is not set
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_HAVE_DEC_LOCK=y

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y

#
# Loadable module support
#
CONFIG_MODULES=y
# CONFIG_MODVERSIONS is not set
CONFIG_KMOD=y

#
# Platform support
#
CONFIG_PPC=y
CONFIG_PPC32=y
CONFIG_6xx=y
# CONFIG_4xx is not set
# CONFIG_POWER3 is not set
# CONFIG_POWER4 is not set
# CONFIG_8xx is not set
CONFIG_8260=y
CONFIG_PPC_STD_MMU=y
CONFIG_SERIAL_CONSOLE=y
CONFIG_EST8260=y
# CONFIG_SMP is not set

#
# General setup
#
# CONFIG_HIGHMEM is not set
# CONFIG_ISA is not set
# CONFIG_EISA is not set
# CONFIG_SBUS is not set
# CONFIG_MCA is not set
CONFIG_PCI=y
CONFIG_NET=y
CONFIG_SYSCTL=y
CONFIG_SYSVIPC=y
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_KCORE_ELF=y
CONFIG_BINFMT_ELF=y
CONFIG_KERNEL_ELF=y
# CONFIG_BINFMT_MISC is not set
# CONFIG_PCI_NAMES is not set
# CONFIG_HOTPLUG is not set
# CONFIG_PCMCIA is not set

#
# Parallel port support
#
# CONFIG_PARPORT is not set
# CONFIG_PPC_RTC is not set
# CONFIG_PPC601_SYNC_FIX is not set
# CONFIG_PROC_DEVICETREE is not set
# CONFIG_PPC_RTAS is not set
# CONFIG_BOOTX_TEXT is not set
# CONFIG_PREP_RESIDUAL is not set
# CONFIG_CMDLINE_BOOL is not set

#
# Memory Technology Devices (MTD)
#
CONFIG_MTD=y
# CONFIG_MTD_DEBUG is not set
# CONFIG_MTD_PARTITIONS is not set

#
# User Modules And Translation Layers
#
# CONFIG_MTD_CHAR is not set
# CONFIG_MTD_BLOCK is not set
# CONFIG_MTD_BLOCK_RO is not set
# CONFIG_FTL is not set
# CONFIG_NFTL is not set

#
# RAM/ROM/Flash chip drivers
#
# CONFIG_MTD_CFI is not set
# CONFIG_MTD_JEDECPROBE is not set
# CONFIG_MTD_GEN_PROBE is not set
# CONFIG_MTD_RAM is not set
# CONFIG_MTD_ROM is not set
# CONFIG_MTD_ABSENT is not set
# CONFIG_MTD_OBSOLETE_CHIPS is not set

#
# Mapping drivers for chip access
#

#
# Self-contained MTD device drivers
#
# CONFIG_MTD_PMC551 is not set
# CONFIG_MTD_SLRAM is not set
# CONFIG_MTD_MTDRAM is not set
# CONFIG_MTD_BLKMTD is not set

#
# Disk-On-Chip Device Drivers
#
# CONFIG_MTD_DOC1000 is not set
# CONFIG_MTD_DOC2000 is not set
# CONFIG_MTD_DOC2001 is not set
# CONFIG_MTD_DOCPROBE is not set

#
# NAND Flash Device Drivers
#
# CONFIG_MTD_NAND is not set

#
# Plug and Play configuration
#
# CONFIG_PNP is not set

#
# Block devices
#
# CONFIG_BLK_DEV_FD is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_NBD=y
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=32768
CONFIG_BLK_DEV_INITRD=y

#
# Multi-device support (RAID and LVM)
#
# CONFIG_MD is not set

#
# Networking options
#
CONFIG_PACKET=y
# CONFIG_PACKET_MMAP is not set
# CONFIG_NETLINK_DEV is not set
# CONFIG_NETFILTER is not set
# CONFIG_FILTER is not set
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
# CONFIG_INET_ECN is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_IPV6 is not set
# CONFIG_KHTTPD is not set
# CONFIG_ATM is not set
# CONFIG_VLAN_8021Q is not set

#
#
#
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_DECNET is not set
CONFIG_BRIDGE=y
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_LLC is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_FASTROUTE is not set
# CONFIG_NET_HW_FLOWCONTROL is not set

#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set

#
# ATA/IDE/MFM/RLL support
#
# CONFIG_IDE is not set
# CONFIG_BLK_DEV_IDE_MODES is not set
# CONFIG_BLK_DEV_HD is not set

#
# SCSI support
#
# CONFIG_SCSI is not set

#
# IEEE 1394 (FireWire) support (EXPERIMENTAL)
#
# CONFIG_IEEE1394 is not set

#
# Network device support
#
CONFIG_NETDEVICES=y

#
# ARCnet devices
#
# CONFIG_ARCNET is not set
# CONFIG_DUMMY is not set
# CONFIG_BONDING is not set
# CONFIG_EQUALIZER is not set
# CONFIG_TUN is not set
# CONFIG_ETHERTAP is not set

#
# Ethernet (10 or 100Mbit)
#
CONFIG_NET_ETHERNET=y
# CONFIG_MACE is not set
# CONFIG_BMAC is not set
# CONFIG_GMAC is not set
# CONFIG_OAKNET is not set
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_NET_VENDOR_3COM is not set
# CONFIG_NET_VENDOR_SMC is not set
# CONFIG_NET_VENDOR_RACAL is not set
# CONFIG_HP100 is not set
# CONFIG_NET_PCI is not set
# CONFIG_NET_POCKET is not set

#
# Ethernet (1000 Mbit)
#
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_SK98LIN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set

#
# Wireless LAN (non-hamradio)
#
# CONFIG_NET_RADIO is not set

#
# Token Ring devices
#
# CONFIG_TR is not set
# CONFIG_NET_FC is not set
# CONFIG_RCPCI is not set
# CONFIG_SHAPER is not set

#
# Wan interfaces
#
# CONFIG_WAN is not set

#
# Amateur Radio support
#
# CONFIG_HAMRADIO is not set

#
# IrDA (infrared) support
#
# CONFIG_IRDA is not set

#
# ISDN subsystem
#
# CONFIG_ISDN is not set

#
# Old CD-ROM drivers (not SCSI, not IDE)
#
# CONFIG_CD_NO_IDESCSI is not set

#
# Console drivers
#
# CONFIG_VGA_CONSOLE is not set

#
# Frame-buffer support
#
CONFIG_FB=y
CONFIG_DUMMY_CONSOLE=y
# CONFIG_FB_RIVA is not set
# CONFIG_FB_CLGEN is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_OF is not set
# CONFIG_FB_CONTROL is not set
# CONFIG_FB_PLATINUM is not set
# CONFIG_FB_VALKYRIE is not set
# CONFIG_FB_CT65550 is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_S3TRIO is not set
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VIRTUAL is not set
CONFIG_FBCON_ADVANCED=y
# CONFIG_FBCON_MFB is not set
# CONFIG_FBCON_CFB2 is not set
# CONFIG_FBCON_CFB4 is not set
CONFIG_FBCON_CFB8=y
CONFIG_FBCON_CFB16=y
# CONFIG_FBCON_CFB24 is not set
# CONFIG_FBCON_CFB32 is not set
# CONFIG_FBCON_AFB is not set
# CONFIG_FBCON_ILBM is not set
# CONFIG_FBCON_IPLAN2P2 is not set
# CONFIG_FBCON_IPLAN2P4 is not set
# CONFIG_FBCON_IPLAN2P8 is not set
# CONFIG_FBCON_MAC is not set
# CONFIG_FBCON_VGA_PLANES is not set
# CONFIG_FBCON_VGA is not set
# CONFIG_FBCON_HGA is not set
# CONFIG_FBCON_FONTWIDTH8_ONLY is not set
# CONFIG_FBCON_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
# CONFIG_FB_COMPAT_XPMAC is not set

#
# Input core support
#
CONFIG_INPUT=y
CONFIG_INPUT_KEYBDEV=y
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_JOYDEV=y
CONFIG_INPUT_EVDEV=y

#
# Macintosh device drivers
#
# CONFIG_ADB_CUDA is not set
# CONFIG_ADB_PMU is not set
# CONFIG_MAC_FLOPPY is not set
# CONFIG_MAC_SERIAL is not set
# CONFIG_ADB is not set
CONFIG_MAC_HID=y

#
# Character devices
#
CONFIG_VT=y
# CONFIG_VT_CONSOLE is not set
# CONFIG_SERIAL is not set
# CONFIG_SERIAL_NONSTANDARD is not set
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=256

#
# I2C support
#
# CONFIG_I2C is not set

#
# Mice
#
# CONFIG_BUSMOUSE is not set
# CONFIG_MOUSE is not set

#
# Joysticks
#
# CONFIG_INPUT_GAMEPORT is not set
# CONFIG_INPUT_SERIO is not set

#
# Joysticks
#
# CONFIG_INPUT_IFORCE_USB is not set
# CONFIG_QIC02_TAPE is not set

#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
# CONFIG_INTEL_RNG is not set
# CONFIG_NVRAM is not set
# CONFIG_RTC is not set
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set

#
# Ftape, the floppy tape device driver
#
# CONFIG_FTAPE is not set
# CONFIG_AGP is not set
# CONFIG_DRM is not set

#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set

#
# File systems
#
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EXT3_FS is not set
# CONFIG_JBD is not set
# CONFIG_FAT_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_JFFS_FS is not set
# CONFIG_JFFS2_FS is not set
# CONFIG_CRAMFS is not set
CONFIG_TMPFS=y
# CONFIG_RAMFS is not set
# CONFIG_ISO9660_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_NTFS_FS is not set
# CONFIG_HPFS_FS is not set
CONFIG_PROC_FS=y
# CONFIG_DEVFS_FS is not set
CONFIG_DEVPTS_FS=y
# CONFIG_QNX4FS_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_EXT2_FS=y
# CONFIG_SYSV_FS is not set
# CONFIG_UDF_FS is not set
# CONFIG_UFS_FS is not set

#
# Network File Systems
#
# CONFIG_CODA_FS is not set
# CONFIG_INTERMEZZO_FS is not set
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
# CONFIG_NFSD is not set
CONFIG_SUNRPC=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
# CONFIG_SMB_FS is not set
# CONFIG_NCP_FS is not set
# CONFIG_ZISOFS_FS is not set
# CONFIG_ZLIB_FS_INFLATE is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
# CONFIG_SMB_NLS is not set
# CONFIG_NLS is not set

#
# Sound
#
# CONFIG_SOUND is not set

#
# MPC8260 Communication Options
#
# CONFIG_SCC_ENET is not set
CONFIG_FEC_ENET=y
CONFIG_FCC1_ENET=y
CONFIG_FCC2_ENET=y
CONFIG_FCC3_ENET=y

#
# USB support
#
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set

#
# Miscellaneous USB options
#
# CONFIG_USB_DEVICEFS is not set
# CONFIG_USB_BANDWIDTH is not set
# CONFIG_USB_LONG_TIMEOUT is not set

#
# USB Controllers
#
# CONFIG_USB_UHCI is not set
# CONFIG_USB_UHCI_ALT is not set
CONFIG_USB_OHCI=y

#
# USB Device Class drivers
#
# CONFIG_USB_BLUETOOTH is not set
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set

#
# USB Human Interface Devices (HID)
#
CONFIG_USB_HID=y
# CONFIG_USB_HIDDEV is not set
# CONFIG_USB_WACOM is not set

#
# USB Imaging devices
#
# CONFIG_USB_DC2XX is not set
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_SCANNER is not set

#
# USB Multimedia devices
#

#
#   Video4Linux support is needed for USB Multimedia device support
#

#
# USB Network adaptors
#
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_CATC is not set
# CONFIG_USB_CDCETHER is not set
# CONFIG_USB_USBNET is not set

#
# USB port drivers
#

#
# USB Serial Converter support
#
# CONFIG_USB_SERIAL is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_RIO500 is not set

#
# Bluetooth support
#
# CONFIG_BLUEZ is not set

#
# Kernel hacking
#
# CONFIG_MAGIC_SYSRQ is not set
# CONFIG_KGDB is not set
# CONFIG_XMON is not set

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/

* Re: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-12  0:36 David Ashley
  0 siblings, 0 replies; 33+ messages in thread
From: David Ashley @ 2002-02-12  0:36 UTC (permalink / raw)
  To: linuxppc-embedded

I found out more about the problem. If I mmap 2 regions that are distinct,
even though they map to the same physical address in non-cacheable memory,
if I only use one for reading and the other for writing, there is no
corruption of linux and the system works. The problem happens when I both
read and write to the same mmap'd area.

Easy enough workaround. Now maybe some kernel expert can figure it out???????

Later--
Dave

* RE: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-14  9:22 Goddeeris Frederic
  0 siblings, 0 replies; 33+ messages in thread
From: Goddeeris Frederic @ 2002-02-14  9:22 UTC (permalink / raw)
  To: 'David Ashley ',
	'linuxppc-embedded@lists.linuxppc.org '

Hi David,

I tested your application on my board (Embedded Planet CLLF (mpc860), Linux
2.4.2) and made a script that executes it as fast as possible.
I let it run for 10 minutes on several telnet sessions simultaneously but I
do not see anything strange... The memory I write to is NVRAM.

Frederic

* RE: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-14 17:06 David Ashley
  0 siblings, 0 replies; 33+ messages in thread
From: David Ashley @ 2002-02-14 17:06 UTC (permalink / raw)
  To: linuxppc-embedded

Hmmm. Too many differences, the biggest being 2.4.17 vs 2.4.2.

I forgot to mention to the mailing list that the order of reads/writes to
a page matters.

In every mmap'd page, if the access order is:
reads only    = no problem
writes only   = no problem
write, read, then anything  = no problem
read, write   = trouble

Each page is a separate entity. Meaning if I mmap 2 pages of /dev/mem,
then write to page 0, I'll not have any trouble from page 0 from then on.
But if I read from page 1, then write to page 1, there is a chance of
corrupting linux.
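
The four orderings can be exercised with a small helper like the one
below. It is a sketch only: the enum names and touch_page() are invented for
illustration, and the caller is assumed to pass a pointer obtained from
mmap() of /dev/mem plus the system page size.

```c
#include <stddef.h>
#include <stdint.h>

/* Labels for the four per-page access orders (names are illustrative). */
enum access_order {
	READS_ONLY,
	WRITES_ONLY,
	WRITE_THEN_READ,
	READ_THEN_WRITE		/* the order reported to cause trouble */
};

/*
 * Apply one access order to the first word of page number `page` of a
 * mapping that starts at `base`.  `page_size` is in bytes; `value` is what
 * the write-first orders store.  Returns the last value read, or 0 if the
 * order performs no read.
 */
uint32_t touch_page(volatile uint32_t *base, size_t page_size, size_t page,
		    enum access_order order, uint32_t value)
{
	volatile uint32_t *p = base + page * (page_size / sizeof(uint32_t));
	uint32_t v = 0;

	switch (order) {
	case READS_ONLY:
		v = *p;
		break;
	case WRITES_ONLY:
		*p = value;
		break;
	case WRITE_THEN_READ:
		*p = value;
		v = *p;
		break;
	case READ_THEN_WRITE:
		v = *p;		/* read first ... */
		*p = v;		/* ... then write the same value back */
		break;
	}
	return v;
}
```

Running each order in a fresh process, one page at a time, would keep the
pages independent in the way described above.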

-Dave

>Hi David,
>
>I tested your application on my board (Embedded Planet CLLF (mpc860), Linux
>2.4.2) and made a script that executes it as fast as possible.
>I let it run for 10 minutes on several telnet sessions simultaneously but I
>do not see anything strange... The memory I write to is NVRAM.
>
>Frederic

* RE: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-15  7:17 Goddeeris Frederic
  0 siblings, 0 replies; 33+ messages in thread
From: Goddeeris Frederic @ 2002-02-15  7:17 UTC (permalink / raw)
  To: 'David Ashley ',
	'linuxppc-embedded@lists.linuxppc.org '

Hi David,

Yes, there are a lot of differences.

Did you try other protection flags like VM_IO (I think this is selected
automatically in mem.c) and VM_RESERVED?

Fred

* RE: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-20 15:54 David Ashley
  0 siblings, 0 replies; 33+ messages in thread
From: David Ashley @ 2002-02-20 15:54 UTC (permalink / raw)
  To: linuxppc-embedded

It appears that drivers/char/mem.c in the mmap_mem function already
has the VM_RESERVED:
	/* Don't try to swap out physical pages.. */
	vma->vm_flags |= VM_RESERVED;

I tried adding | VM_IO there as well, and the behaviour changed (not in
a good way though). I could run my script a few times without getting the
kernel errors I had been seeing (panic, kernel access of bad area, and
the like). But after running it several times, first the shell froze
up, then linux itself.

In this case my script launched the mt program 64 times. The mt program
mmaps, then reads once, writes once, then exits.

-Dave

>Yes, there are a lot of differences.
>
>Did you try other protection flags like VM_IO (I think this is selected
>automatically in mem.c) and VM_RESERVED?
>
>Fred

* RE: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-25 18:16 David Ashley
  2002-02-25 18:51 ` Dan Malek
  0 siblings, 1 reply; 33+ messages in thread
From: David Ashley @ 2002-02-25 18:16 UTC (permalink / raw)
  To: linuxppc-embedded

I went back to an earlier version of linux, the 2.4.2_hhl20, and it
doesn't have the mmap problem.

I suppose I can do a binary search on kernel versions, to find out exactly
when the problem appears.
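
Such a search could look roughly like this. It is a sketch with made-up
names: versions are identified by patch level (2 for 2.4.2, 17 for 2.4.17),
and it assumes the bug, once introduced, is present in every later version.
In practice is_bad() would mean "build and boot that kernel, run the test
program, see whether it crashes"; demo_is_bad() below is an arbitrary
stand-in so the code can run, and its threshold says nothing about where the
bug really appeared.

```c
#include <stddef.h>

/*
 * Return the first patch level for which is_bad() is nonzero, given that
 * `good` is known not to show the problem and `bad` is known to show it.
 */
size_t first_bad(size_t good, size_t bad, int (*is_bad)(size_t patch))
{
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* problem already present: look earlier */
		else
			good = mid;	/* still fine: look later */
	}
	return bad;
}

/* Stand-in predicate for illustration only; the threshold is arbitrary. */
int demo_is_bad(size_t patch)
{
	return patch >= 9;
}
```

With 2.4.2 good and 2.4.17 bad, first_bad(2, 17, ...) needs at most four
kernel builds to narrow it down to a single version.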

Can anyone tell me who "the guy" is for the powerpc virtual memory system?
I think maybe I'm not asking for help in the right place.

Thanks--
Dave

* Re: Linux 2.4.17 bug, mmap of /dev/mem
  2002-02-25 18:16 David Ashley
@ 2002-02-25 18:51 ` Dan Malek
  0 siblings, 0 replies; 33+ messages in thread
From: Dan Malek @ 2002-02-25 18:51 UTC (permalink / raw)
  To: David Ashley; +Cc: linuxppc-embedded

David Ashley wrote:

> Can anyone tell me who "the guy" is for the powerpc virtual memory system?
> I think maybe I'm not asking for help in the right place.

You are asking in the right place, but some of us that should be looking
at it are swamped on other projects at the moment.  I've been using
mmap() without trouble to do similar things, and when I get a moment
I'll peruse your messages in the archives.

Thanks.

	-- Dan

* Re: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-25 20:27 David Ashley
  2002-02-25 20:54 ` Dan Malek
  0 siblings, 1 reply; 33+ messages in thread
From: David Ashley @ 2002-02-25 20:27 UTC (permalink / raw)
  To: linuxppc-embedded

In the meantime I managed to make some minimal changes to 2.4.14 to
get it partially up on our box. The modified files are
Makefile
  Setting arch: to ppc, and setting up CROSS_COMPILE

arch/ppc/8260_io/uart.c
arch/ppc/8260_io/fcc_enet.c
  Slight changes to reflect our port assignments

fs/proc/proc_misc.c
include/asm-ppc/est8260.h
  These work together to pass some information from ppcboot to linux,
  so I can configure networking and whatnot.

Basically I'm making no major changes to the linux kernel, and the problem
is still there. I am trying to get 2.4.8 working but it looks like things
have moved around quite a bit, so I'm not sure how successful I'll be...

So: 2.4.2_hhl20 doesn't have the problem
2.4.8 = still trying to get it up on our box
2.4.14 has the problem
2.4.17 has the problem

Here is the current program to demonstrate the problem:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>

#define ADDR 0xf0010000
#define SIZE 0x00002000

int main(void)
{
void *where;
int fd;
unsigned long t;
volatile unsigned long *base;
int i;

	fd=open("/dev/mem",O_RDWR);
	if(fd<0) {printf("Failed to open\n");return 1;}

	for(i=0;i<500;++i)
	{
		int status;
		/* parent: wait for the child, then start the next iteration */
		if(fork()) {wait(&status);continue;}

		/* child: map, read once, write once, exit */
		where=mmap(0,SIZE,PROT_READ|PROT_WRITE,MAP_SHARED,fd,ADDR);
		if(where==MAP_FAILED)
		{
			close(fd);
			printf("mmap failed\n");
			return 1;
		}

		base=where;

		t=base[0];
		base[0]=t;
		printf("%4d,%lx\n",i,t);
		munmap(where,SIZE);
		exit(0);
	}
	close(fd);
	printf("done---mt\n");
	return 0;
}
---cut---
The above program fails at about iteration 228 on linux 2.4.17. On 2.4.14
it fails at an unpredictable iteration, from maybe 180 to 350. The number
of other seemingly harmless shell commands executed, like "ls", before running
the above program has an effect, meaning the above program will fail sooner
the more times a program has been executed by the shell.

In the case of 2.4.17, if I power up the box, wait until login prompt, login
as root, then run the above program, it very reliably fails at iteration
228. If I do 5 'ls' commands before, then it fails at iteration 223. Strange,
huh?

BTW the mmap is mapping the IMM area at 0xf0xxxxxx, but since the program
does a read then write back of the same value nothing should be affected.
The register I'm setting is the SIUMCR on the 8260.

Thanks--
Dave

>You are asking in the right place, but some of us that should be looking
>at it are swamped on other projects at the moment.  I've been using
>mmap() without trouble to do similar things, and when I get a moment
>I'll peruse your messages in the archives.
>
>Thanks.
>
>        -- Dan

* Re: Linux 2.4.17 bug, mmap of /dev/mem
  2002-02-25 20:27 David Ashley
@ 2002-02-25 20:54 ` Dan Malek
  2002-02-25 21:06   ` Dan Malek
  2002-02-25 22:36   ` Wolfgang Denk
  0 siblings, 2 replies; 33+ messages in thread
From: Dan Malek @ 2002-02-25 20:54 UTC (permalink / raw)
  To: David Ashley; +Cc: linuxppc-embedded

David Ashley wrote:

> #define ADDR 0xf0010000
> #define SIZE 0x00002000

Oh, now I remember......I found it amusing someone could think they
could just map the CPM memory and start reading and writing it.
You can't do stuff like that and expect the system to keep running
correctly.  The first 128 bytes of the DPRAM are initialized for
the SMC (whether you use it or not).  You have to be really, really
careful when you map anything like this, and you have to understand
the interaction of everything else that may also have access to these
memory spaces.  A common mistake is people map things like GPIO into
application space, and then think they can atomically update the
registers.  This doesn't work because there may be drivers that
also do the same thing.
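
The lost update behind this can be replayed with plain memory. The sketch
below is illustrative only: fake_gpio is an ordinary variable standing in for
a shared GPIO data register, and the "driver" step is written inline where an
interrupt or another context might run between the application's load and
store.

```c
#include <stdint.h>

/* Plain-memory stand-in for a shared GPIO data register (illustrative). */
static volatile uint32_t fake_gpio;

/*
 * Replay one unlucky interleaving of `reg |= app_bit` in the application
 * with a driver that sets drv_bit in the same register.
 * Returns the final register value.
 */
uint32_t lost_update_demo(uint32_t app_bit, uint32_t drv_bit)
{
	uint32_t app_copy;

	fake_gpio = 0;
	app_copy = fake_gpio;	/* application: load */
	fake_gpio |= drv_bit;	/* driver runs in between and sets its bit */
	app_copy |= app_bit;	/* application: modify its stale copy */
	fake_gpio = app_copy;	/* application: store, wiping out drv_bit */
	return fake_gpio;
}
```

lost_update_demo(0x1, 0x2) ends with only bit 0 set: the driver's bit is
gone even though neither side ever cleared it.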

> The above program fails at about iteration 228 on linux 2.4.17. On 2.4.14
> it fails at an unpredictable iteration, from maybe 180 to 350. The number
> of other seemingly harmless shell commands executed, like "ls",

How does it fail?  If you are actually using the SMC as a console device
I'm surprised it runs that long.

There isn't anything wrong with mmap()......

	-- Dan

* Re: Linux 2.4.17 bug, mmap of /dev/mem
  2002-02-25 20:54 ` Dan Malek
@ 2002-02-25 21:06   ` Dan Malek
  2002-02-25 22:36   ` Wolfgang Denk
  1 sibling, 0 replies; 33+ messages in thread
From: Dan Malek @ 2002-02-25 21:06 UTC (permalink / raw)
  To: Dan Malek; +Cc: David Ashley, linuxppc-embedded


Dan Malek wrote:

> Oh, now I remember......I found it amusing someone could think they
> could just map the CPM memory and start reading and writing it.

...Or the memory controller or whatever else is there.


	-- Dan


* Re: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-25 22:29 David Ashley
  2002-02-25 22:41 ` Wolfgang Denk
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: David Ashley @ 2002-02-25 22:29 UTC (permalink / raw)
  To: linuxppc-embedded

There is an issue here where I'm trying to give you or whoever is interested
in this thread a test program to run that will demonstrate the problem.
I don't actually bang the SMC or CPM or whatever, I am trying to do
perfectly valid stuff with a user level program accessing io space of
pci devices. There is no kernel level code accessing the device I'm trying
to work with. It makes absolutely no difference where the mmap goes to, as
long as it is not normal system ram.

Since my hardware isn't the same as anyone else's, I just bang the IMM
in a harmless way. This demonstrates the problem just as well on my box.
So I assume since you've got 60x hardware with an IMM you can try the same
thing on your hardware and see the same failure I'm seeing.

When you say mmap() works, I agree it works, mostly. But if you do things
like in my program enough times, the system ends up being corrupted.
Various kernel threads like kupdated have a kernel panic. Stuff just starts
failing, like system memory is getting corrupted. All this is described in
detail in this thread.

There is a bug in linux PPC, and that's what I'm trying to resolve. Here
is the last part of the printout of the execution of that program:

---cut--- (this is kernel 2.4.17 btw)
 224,42040000
 225,42040000
 226,42040000
 227,42040000
 228,42040000
Oops: kernel access of bad area, sig: 11
NIP: C0011D80 XER: 00000000 LR: C0011CB0 SP: C2A9DEF0 REGS: c2a9de40 TRAP: 0300d
MSR: 00001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: A3EBDA34, DSISR: 22000000
TASK = c2a9c000[108] 'mt' Last syscall: 2
last math 00000000 last altivec 00000000
GPR00: C2A780B0 C2A9DEF0 C2A9C000 00000001 C2A9C384 00000000 C2A78384 00000000
GPR08: C019E900 A3EBD980 C019E3B4 0000054C 24000242 10018940 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000 00009032 02A9DF40 00000000 00000000
GPR24: C2A9DF50 7FFFFD10 00000152 00000011 C3E7D160 C2A780A8 C3E7D1A0 C2A78000
Call backtrace:
C0011C3C C0006A2C C0003D7C 10000698 0FEDA188 00000000
Segmentation fault
---cut---

The system is corrupted. If I try to do an 'ls', I get this:
---cut---
kernel BUG at memory.c:375!
Oops: Exception in kernel mode, sig: 4
NIP: C00220FC XER: 00000000 LR: C00220FC SP: C2A9DDE0 REGS: c2a9dd30 TRAP: 0700d
MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c2a9c000[339] 'bash' Last syscall: 30583
last math 00000000 last altivec 00000000
GPR00: C00220FC C2A9DDE0 C2A9C000 0000001C 00001032 00000001 C02F6160 C01A112A
GPR08: 00000000 00000000 0000001F C2A9DD00 0000000D 100AC3EC 00000000 00000000
GPR16: 00000000 00000000 00000000 C01961C0 00000000 02A9DF40 00000000 C0003FB4
GPR24: C0003D20 100C2870 C3DAEB8C 00000000 00000000 00000000 C3E53000 C3DDC240
Call backtrace:
C00220FC C0025098 C0011020 C00165A8 C00082FC C0003FE8
Illegal instruction
---cut---

This is a very real bug in linux ppc, I'm convinced of that. The crucial
thing is:
  mmap some region of io space
  read from a page
  write to the same page
Repeat and rinse, something will corrupt the system.

reads alone are ok.
writes alone are ok.
write followed by any combination of reads or writes is ok
read followed by write = trouble

-Dave

>David Ashley wrote:
>
>
>> #define ADDR 0xf0010000
>> #define SIZE 0x00002000
>
>Oh, now I remember......I found it amusing someone could think they
>could just map the CPM memory and start reading and writing it.
>You can't do stuff like that and expect the system to keep running
>correctly.  The first 128 bytes of the DPRAM are initialized for
>the SMC (whether you use it or not).  You have to be really, really
>careful when you map anything like this, and you have to understand
>the interaction of everything else that may also have access to these
>memory spaces.  A common mistake is people map things like GPIO into
>application space, and then think they can atomically update the
>registers.  This doesn't work because there may be drivers that
>also do the same thing.
>
>> The above program fails at about iteration 228 on linux 2.4.17. On 2.4.14
>> it fails at an unpredictable iteration, from maybe 180 to 350. The number
>> of other seemingly harmless shell commands executed, like "ls",
>
>How does it fail?  If you are actually using the SMC as a console device
>I'm surprised it runs that long.
>
>There isn't anything wrong with mmap()......
>
>
>        -- Dan

* Re: Linux 2.4.17 bug, mmap of /dev/mem
  2002-02-25 20:54 ` Dan Malek
  2002-02-25 21:06   ` Dan Malek
@ 2002-02-25 22:36   ` Wolfgang Denk
  1 sibling, 0 replies; 33+ messages in thread
From: Wolfgang Denk @ 2002-02-25 22:36 UTC (permalink / raw)
  To: Dan Malek; +Cc: David Ashley, linuxppc-embedded


In message <3C7AA49D.8020809@embeddededge.com> you wrote:
>
> memory spaces.  A common mistake is people map things like GPIO into
> application space, and then think they can atomically update the
> registers.  This doesn't work because there may be drivers that
> also do the same thing.

...and things get even worse when you happen to use the RISC timers
for one or more PWM channels: you'll see completely asynchronous
updates of PBDAT :-(

Wolfgang Denk

--
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-4596-87  Fax: (+49)-8142-4596-88  Email: wd@denx.de
"Engineering without management is art."               - Jeff Johnson

* Re: Linux 2.4.17 bug, mmap of /dev/mem
  2002-02-25 22:29 David Ashley
@ 2002-02-25 22:41 ` Wolfgang Denk
  2002-02-26  0:57 ` Greg Griffes
  2002-02-26  1:34 ` Dan Malek
  2 siblings, 0 replies; 33+ messages in thread
From: Wolfgang Denk @ 2002-02-25 22:41 UTC (permalink / raw)
  To: David Ashley; +Cc: linuxppc-embedded


In message <200202252229.g1PMTdQ02395@xdr.com> you wrote:
>
> There is an issue here where I'm trying to give you or whoever is interested
> in this thread a test program to run that will demonstrate the problem.
> I don't actually bang the SMC or CPM or whatever, I am trying to do
> perfectly valid stuff with a user level program accessing io space of
> pci devices. There is no kernel level code accessing the device I'm trying
> to work with. It makes absolutely no difference where the mmap goes to, as
> long as it is not normal system ram.

Did you ever check the address returned from the mmap() call?  Is  it
always the same, or in the same range, or does it actually look as if
you were leaking mmap()ed memory even though you unmapped it?

Which errno is returned when mmap() fails?
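
For anyone wanting to answer these two questions on their own board, a
minimal sketch of the checks might look like the following. The helper
names map_and_report and unmap_and_report are hypothetical, and the
logging format is only illustrative. The one firm point is that mmap()
reports failure by returning MAP_FAILED, not NULL, with the reason in
errno.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `len` bytes of `path` at `offset`, logging the returned address or
 * the errno on failure. Returns the mapping, or NULL on any error.
 * (Hypothetical helper; name and log format are illustrative.) */
static void *map_and_report(const char *path, off_t offset, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        fprintf(stderr, "open(%s): %s\n", path, strerror(errno));
        return NULL;
    }

    void *where = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                       fd, offset);
    int saved = errno;          /* close() below may overwrite errno */
    close(fd);                  /* the mapping stays valid after close */

    if (where == MAP_FAILED) {  /* failure is MAP_FAILED, not NULL */
        fprintf(stderr, "mmap: %s (errno %d)\n", strerror(saved), saved);
        return NULL;
    }
    fprintf(stderr, "mmap returned %p\n", where);
    return where;
}

/* Unmap and report whether munmap() itself failed. Returns 0 on success. */
static int unmap_and_report(void *where, size_t len)
{
    if (munmap(where, len) != 0) {
        fprintf(stderr, "munmap: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

Logging the address on every iteration shows at a glance whether the
mapping lands in the same place each time or creeps upward, which would
hint at a leak.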



Wolfgang Denk

--
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-4596-87  Fax: (+49)-8142-4596-88  Email: wd@denx.de
How many seconds are there in a year? If I tell you there are 3.155 x
10^7, you won't even try to remember it. On the other hand, who could
forget that, to within half a percent, pi seconds is  a  nanocentury.
                                               -- Tom Duff, Bell Labs

* Re: Linux 2.4.17 bug, mmap of /dev/mem
       [not found] <3C7AC345.301@embeddededge.com>
@ 2002-02-25 23:26 ` Wolfgang Denk
  0 siblings, 0 replies; 33+ messages in thread
From: Wolfgang Denk @ 2002-02-25 23:26 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-embedded


In message <3C7AC345.301@embeddededge.com> Dan Malek wrote:
>
> > ....and things get even worse when you happen to use the  RISC  timers
> > for  one  or  more  PWM  channels: you'll see completely asynchronous
> > updates of PBDAT :-(
>
> Yeah, that one sucks.  Did you get a chance to try my GP Timer suggestion
> or didn't they fit the requirements?  They have dedicated I/O that doesn't

I know it would have worked, but in this case I had to  avoid  a  new
layout of the board :-(

Wolfgang Denk

--
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-4596-87  Fax: (+49)-8142-4596-88  Email: wd@denx.de
Minds are like parachutes - they only function when open.

* Re: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-25 23:43 David Ashley
  0 siblings, 0 replies; 33+ messages in thread
From: David Ashley @ 2002-02-25 23:43 UTC (permalink / raw)
  To: linuxppc-embedded

mmap doesn't fail, it works, but running the program causes the system
to fail. It is something deeper, in the virtual memory system, that is
causing the trouble.

mmap is always returning the same address, something like
c0017000
for example.

-Dave

>Did you ever check the address returned from the mmap() call?  Is  it
>always the same, or in the same range, or does it actually look as if
>you were leaking mmap()ed memory even though you unmapped it?
>
>Which errno is returned when mmap() fails?
>
>
>
>Wolfgang Denk

* Re: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-26  0:06 David Ashley
  0 siblings, 0 replies; 33+ messages in thread
From: David Ashley @ 2002-02-26  0:06 UTC (permalink / raw)
  To: linuxppc-embedded

I have been trying other kernels, with differing results:
2.4.17 = has mmap problem
2.4.14 = has mmap problem
2.4.13 = has mmap problem
2.4.12 = has mmap problem
2.4.10 = has mmap problem but takes more iterations to happen
2.4.8  = not quite done getting it to boot.

The only files I change to get a kernel to boot on our box are:
Makefile
  set arch to ppc and set the CROSS_COMPILE
arch/ppc/8260_io/fcc_enet.c
arch/ppc/8260_io/uart.c
arch/ppc/8260_io/commproc.c
  Slight changes to deal with our board's io port assignments
include/asm-ppc/est8260.c
  changes in the boardinfo structure passed from ppcboot
fs/proc/proc_misc.c
  added a /proc file to view some of the boardinfo structure

I'm not initializing usb or pci in the non-2.4.17 kernels.

-Dave

* Re: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-26  0:18 David Ashley
  0 siblings, 0 replies; 33+ messages in thread
From: David Ashley @ 2002-02-26  0:18 UTC (permalink / raw)
  To: linuxppc-embedded


I got 2.4.8 working, and it doesn't exhibit the mmap problem.

2.4.8 has other issues, like it doesn't report bogomips properly:
processor       : 0
cpu             : 82xx
core clock      : 166002750 MHz
CPM  clock      : 132802200 MHz
bus  clock      : 66401100 MHz
revision        : 1.1 (pvr 0081 0101)
bogomips        : 1507.32
zero pages      : total: 0 (0Kb) current: 0 (0Kb) hits: 0/0 (0%)

So: 2.4.8 does not have the mmap problem, 2.4.10 does have the mmap
problem. I'll try 2.4.9 now...

-Dave

* Re: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-26  0:36 David Ashley
  0 siblings, 0 replies; 33+ messages in thread
From: David Ashley @ 2002-02-26  0:36 UTC (permalink / raw)
  To: linuxppc-embedded

I got 2.4.9 working, and it also has no problem with the repeated mmaps.

So something changed between 2.4.9 and 2.4.10 that causes this.
What a day! Bringing up 6 different versions of linux on our box...

Thanks for everyone's patience.

Hopefully this will be of some use in tracking this down.

-Dave

* Re: Linux 2.4.17 bug, mmap of /dev/mem
  2002-02-25 22:29 David Ashley
  2002-02-25 22:41 ` Wolfgang Denk
@ 2002-02-26  0:57 ` Greg Griffes
  2002-02-26  1:34 ` Dan Malek
  2 siblings, 0 replies; 33+ messages in thread
From: Greg Griffes @ 2002-02-26  0:57 UTC (permalink / raw)
  To: linuxppc-embedded


(snip)
> The crucial thing is mmap some region of io space
> read from a page write to the same page
> Repeat and rinse, something will corrupt the system.
>
> reads alone are ok.
> writes alone are ok.
> write followed by any combination of reads or writes is ok
> read followed by write = trouble

I am an embedded PPC Linux novice, so, tell me if I'm way off base.
This sounds like a pipeline problem; out of order I/O execution.
Could there be an "eieio" missing somewhere?
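
As a sketch of what such a barrier would look like around the
read-then-write pattern: the helpers io_barrier and read_then_write below
are hypothetical names. On PowerPC the barrier emits eieio; on other
targets it falls back to the GCC/Clang builtin __sync_synchronize() so
the sketch still builds. Nothing in the thread so far establishes that a
missing barrier is involved, so this only illustrates the suggestion.

```c
/* Order earlier I/O accesses before later ones. On PowerPC this emits
 * eieio; elsewhere it falls back to a full barrier so the sketch builds. */
static inline void io_barrier(void)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ __volatile__("eieio" : : : "memory");
#else
    __sync_synchronize();
#endif
}

/* Read one longword and write it straight back, with a barrier between
 * the two accesses. Returns the value that was read. */
static unsigned long read_then_write(volatile unsigned long *reg)
{
    unsigned long v = *reg;     /* the read */
    io_barrier();               /* keep the read ordered before the write */
    *reg = v;                   /* the write-back */
    io_barrier();
    return v;
}
```

In a test program the pointer would come from the mmap() of the I/O
region; here any volatile unsigned long will do for trying it out.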

Greg Griffes


* Re: Linux 2.4.17 bug, mmap of /dev/mem
  2002-02-25 22:29 David Ashley
  2002-02-25 22:41 ` Wolfgang Denk
  2002-02-26  0:57 ` Greg Griffes
@ 2002-02-26  1:34 ` Dan Malek
  2 siblings, 0 replies; 33+ messages in thread
From: Dan Malek @ 2002-02-26  1:34 UTC (permalink / raw)
  To: David Ashley; +Cc: linuxppc-embedded

David Ashley wrote:

> There is a bug in linux PPC, and that's what I'm trying to resolve. Here
> is the last part of the printout of the execution of that program:

You must have some unique hardware :-).  I've run 10,000 iterations on a 8260
and 860 with the 2.4.18-pre7 kernel, and on my 7410 PowerBook with 2.4.11
without any failures........Let us know what you fix to make it work for you.

Thanks.

	-- Dan

* Re: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-26  3:15 David Ashley
  2002-02-26  3:50 ` Dan Malek
  0 siblings, 1 reply; 33+ messages in thread
From: David Ashley @ 2002-02-26  3:15 UTC (permalink / raw)
  To: dan, dash; +Cc: linuxppc-embedded

>David Ashley wrote:
>
>
>> There is a bug in linux PPC, and that's what I'm trying to resolve. Here
>> is the last part of the printout of the execution of that program:
>
>You must have some unique hardware :-).  I've run 10,000 iterations on a 8260
>and 860 with the 2.4.18-pre7 kernel, and on my 7410 PowerBook with 2.4.11
>without any failures........Let us know what you fix to make it work for you.
>
>Thanks.
>
>
>	-- Dan
>
>

Maybe you can point me to some discussion of how linux operates? I mean,
once the memory is mapped with the page tables, what happens once the
process does a read to a page? Does that generate a page fault? The crucial
thing is read followed by write. I suspect the problem isn't in the setting
up of the page, but in when it gets accessed and the resultant exceptions
or interrupts that produces. It seems like all discussions on this are
outdated and only apply to older kernels...

-Dave

* Re: Linux 2.4.17 bug, mmap of /dev/mem
  2002-02-26  3:15 Linux 2.4.17 bug, mmap of /dev/mem David Ashley
@ 2002-02-26  3:50 ` Dan Malek
  2002-02-26 14:43   ` John W. Linville
  0 siblings, 1 reply; 33+ messages in thread
From: Dan Malek @ 2002-02-26  3:50 UTC (permalink / raw)
  To: David Ashley; +Cc: linuxppc-embedded

David Ashley wrote:

> Maybe you can point me to some discussion of how linux operates? I mean,
> once the memory is mapped with the page tables, what happens once the
> process does a read to a page? Does that generate a page fault?

It isn't really unique to Linux.  Yes, the access can generate a page
fault, which will cause a kernel exception to load the TLB.  This can
generate some weird looking, early terminated bus timing, which is
perfectly within the specifications of the hardware but isn't something
the designers always consider.  I've seen this quite often on the 8xx,
but fortunately have never had to attach a logic analyzer to a 60x bus.

So, I doubt it is any Linux or software problem, but more likely something
wrong with the timing on the bus that is resulting in incorrect data
returned to a memory access.

> ....... It seems like all discussions on this are
> outdated and only apply to older kernels...

The basic concepts of how all of this works haven't changed much.  There
have been lots of detailed updates to make it more efficient or flexible.
IIRC, somewhere around the 2.4.7 timeframe was a major VM change,
we were also making changes for tracking changed attributes and Paulus
made some other instruction page invalidate enhancements.  Except at
the lowest level of processor specific MMU details, all of the PowerPC
and Linux VM is the same.  A bug in the lower level functions is usually
quite obvious and quickly addressed.

	-- Dan

* Re: Linux 2.4.17 bug, mmap of /dev/mem
  2002-02-26  3:50 ` Dan Malek
@ 2002-02-26 14:43   ` John W. Linville
  2002-02-26 15:18     ` Wolfgang Denk
  2002-02-26 17:06     ` Dan Malek
  0 siblings, 2 replies; 33+ messages in thread
From: John W. Linville @ 2002-02-26 14:43 UTC (permalink / raw)
  To: Dan Malek; +Cc: David Ashley, linuxppc-embedded

Dan,

Could you elaborate on the problems associated w/ bus timings that
you've seen on the 8xx?  We've been seeing a lot of unexplained Oops
messages (and even crashes) on one of our hardware platforms.  The only
common thread seems to be dereferencing bad pointer values, but they
occur in so many different places...

I've asked our hardware guys to take a look at the settings for the UPM
we are using to control SDRAM.  Do you think we are on the right track?
Can you provide any guidance?

Thanks in advance for any help you can provide!  I'll buy you a beer and
some maple candy the next time I'm up your way! :-)

John

Dan Malek wrote:
>
> David Ashley wrote:
>
> > Maybe you can point me to some discussion of how linux operates? I mean,
> > once the memory is mapped with the page tables, what happens once the
> > process does a read to a page? Does that generate a page fault?
>
> It isn't really unique to Linux.  Yes, the access can generate a page
> fault, which will cause a kernel exception to load the TLB.  This can
> generate some weird looking, early terminated bus timing, which is
> perfectly within the specifications of the hardware but isn't something
> the designers always consider.  I've seen this quite often on the 8xx,
> but fortunately have never had to attach a logic analyzer to a 60x bus.
>
> So, I doubt it is any Linux or software problem, but more likely something
> wrong with the timing on the bus that is resulting in incorrect data
> returned to a memory access.

--
John W. Linville
LVL7 Systems, Inc.

* Re: Linux 2.4.17 bug, mmap of /dev/mem
  2002-02-26 14:43   ` John W. Linville
@ 2002-02-26 15:18     ` Wolfgang Denk
  2002-02-26 17:06     ` Dan Malek
  1 sibling, 0 replies; 33+ messages in thread
From: Wolfgang Denk @ 2002-02-26 15:18 UTC (permalink / raw)
  To: John W. Linville; +Cc: linuxppc-embedded

John,

in message <3C7B9F2E.40BC7C46@lvl7.com> you wrote:
>
> I've asked our hardware guys to take a look at the settings for the UPM
> we are using to control SDRAM.  Do you think we are on the right track?
> Can you provide any guidance?

Please be aware that UPM settings alone  are  NOT  sufficient  for  a
correct   initialization   of   most  SDRAM  chips.  Check  with  the
documentation for your SDRAM chips,  and  follow  the  initialization
sequence  TO THE LETTER. Pay special attention to minimum and maximum
delays, required dummy accesses, etc.

Wolfgang Denk

--
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-4596-87  Fax: (+49)-8142-4596-88  Email: wd@denx.de
A committee is a life form with six or more legs and no brain.
                              -- Lazarus Long, "Time Enough For Love"

* Re: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-26 16:00 David Ashley
  0 siblings, 0 replies; 33+ messages in thread
From: David Ashley @ 2002-02-26 16:00 UTC (permalink / raw)
  To: linuxppc-embedded; +Cc: dan, linville

I have seen similar behaviour with our box, which uses the 8260. The problem
was traced down to the CPM and the external device (a pci bus master)
both accessing the 60x bus. I believe the fault was in the CPM itself.
The solution was to keep the CPM off the 60x bus, and instead use some
local bus ram for all the BD's and buffers. The GBL bit also had to be set
to 0, or there would be trouble. Our local bus ram was non-cacheable.

Our symptoms were strange crashing, strange addresses appearing on the bus,
just general flaky behaviour. If we had no external bus master, the system
was rock solid. Once an external bus master took control of the 60x bus
there would be trouble. I posted to this list more detailed information, so
it will be in the archives.

-Dave

>Dan,
>
>Could you elaborate on the problems associated w/ bus timings that
>you've seen on the 8xx?  We've been seeing a lot of unexplained Oops
>messages (and even crashes) on one of our hardware platforms.  The only
>common thread seems to be dereferencing bad pointer values, but they
>occur in so many different places...
>
>I've asked our hardware guys to take a look at the settings for the UPM
>we are using to control SDRAM.  Do you think we are on the right track?
>Can you provide any guidance?
>
>Thanks in advance for any help you can provide!  I'll buy you a beer and
>some maple candy the next time I'm up your way! :-)
>
>John

* Re: Linux 2.4.17 bug, mmap of /dev/mem
  2002-02-26 14:43   ` John W. Linville
  2002-02-26 15:18     ` Wolfgang Denk
@ 2002-02-26 17:06     ` Dan Malek
  1 sibling, 0 replies; 33+ messages in thread
From: Dan Malek @ 2002-02-26 17:06 UTC (permalink / raw)
  To: John W. Linville; +Cc: David Ashley, linuxppc-embedded

John W. Linville wrote:

> Could you elaborate on the problems associated w/ bus timings that
> you've seen on the 8xx?

I don't remember the details anymore.  I just remember looking at logic
analyzer traces of the system bus around MMU faults and CPM DMA and
thinking "....so that's what they mean in this timing diagram...."
As I recall, the bus lines would start to change state, but never really
start a cycle.

> .... We've been seeing a lot of unexplained Oops
> messages (and even crashes) on one of our hardware platforms.

This could be for many reasons.

> ....  The only
> common thread seems to be dereferencing bad pointer values,

Well, you aren't going to crash if you don't have bad pointer values,
so that is what you are going to see :-).

> I've asked our hardware guys to take a look at the settings for the UPM
> we are using to control SDRAM.  Do you think we are on the right track?
> Can you provide any guidance?

It could be incorrect timing.  That is one of the common problems.  You
don't get worst case bus timing until you fire up copyback caches, enable
the MMU, and start up the CPM DMA.  The core itself can't generate back to
back DRAM cycles.  Marginal timing or parts are going to show up on only
some boards.

If the DRAM timing is correct, these "weird" bus cycles don't bother DRAMs.
It usually messes up other devices glued on the 8xx bus that are trying to
decode their own address space.  It isn't like this isn't documented, but
it is easy to ignore or not understand until you see it in action.

> Thanks in advance for any help you can provide!  I'll buy you a beer and
> some maple candy the next time I'm up your way! :-)

Anything associated with UPM and SDRAM is going to cost you lots more
than a beer :-).

Good Luck.

	-- Dan

* Re: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-26 20:17 David Ashley
  0 siblings, 0 replies; 33+ messages in thread
From: David Ashley @ 2002-02-26 20:17 UTC (permalink / raw)
  To: linuxppc-embedded

The problem is related to CONFIG_VT. If CONFIG_VT is set, the problem happens.
If CONFIG_VT isn't set, the problem doesn't happen, or doesn't happen as
often.
2.4.10 CONFIG_VT = off, no problem
2.4.14 CONFIG_VT = off, no problem
in both cases if I turn CONFIG_VT on, the mmap problem manifests itself.

2.4.17 CONFIG_VT = on, CONFIG_FB = off, CONFIG_INPUT = off, mmap program
   fails after 12 iterations
2.4.17 CONFIG_VT = off, CONFIG_FB = off, CONFIG_INPUT = off, mmap program
   fails after 6000+ iterations.
2.4.17 CONFIG_VT = on, CONFIG_FB = on, CONFIG_INPUT = on, mmap program
   fails after 228 iterations.

This is really a puzzler to me.
-Dave

* Re: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-27 21:04 David Ashley
  2002-02-27 21:06 ` Dan Malek
  0 siblings, 1 reply; 33+ messages in thread
From: David Ashley @ 2002-02-27 21:04 UTC (permalink / raw)
  To: dan; +Cc: linuxppc-embedded

I've traced the problem down to arch/ppc/mm/hashtable.S. When
there is a page fault, the function hash_page gets called. This does
some hashing and writes the hash values into a table located at
0xc0180000. That is the default value, before patching. These writes are
what is corrupting the linux kernel, because they are on top of linux
itself.

In arch/ppc/mm/ppc_mmu.c the function MMU_init_hw is called, but
since the 8260 doesn't have the CPU_FTR_HPTE_TABLE feature, the
hash table is never allocated and the hash_page_patch_* never get updated.

Turning on the CPU_FTR_HPTE_TABLE for the 8260 doesn't fix the problem.

I'm out of my depth here. This is a bug in linux, I know that much. But I
don't know what is supposed to happen during a page fault. So I need help
in resolving this.

Thanks--
Dave

* Re: Linux 2.4.17 bug, mmap of /dev/mem
  2002-02-27 21:04 David Ashley
@ 2002-02-27 21:06 ` Dan Malek
  0 siblings, 0 replies; 33+ messages in thread
From: Dan Malek @ 2002-02-27 21:06 UTC (permalink / raw)
  To: David Ashley; +Cc: linuxppc-embedded

David Ashley wrote:

> I've traced the problem down to arch/ppc/mm/hashtable.S. When
> there is a page fault, the function hash_page gets called.

In the case of a 603 core, hash_page is called for DSI (Data Access)
faults.  However, if the feature indicates there is no HPTE, the
hash_page function is patched to simply return.  You can't look at
the code in hashtable.S and know how it is going to work for a particular
implementation because it is patched at initialization to change
its behavior.

> .....This does
> some hashing and writes the hash values into a table located at
> 0xc0180000.

When your kernel boots, does it print a message to indicate it has allocated
a hash table?

> In arch/ppc/mm/ppc_mmu.c the function MMU_init_hw is called, but
> since the 8260 doesn't have the CPU_FTR_HPTE_TABLE feature, the
> hash table is never allocated and the hash_page_patch_* never get updated.

Oh, I just looked at a variety of different versions back to 2.4.11, and it
is patched just as I described above.

	-- Dan

* Re: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-27 21:36 David Ashley
  0 siblings, 0 replies; 33+ messages in thread
From: David Ashley @ 2002-02-27 21:36 UTC (permalink / raw)
  To: dan; +Cc: linuxppc-embedded


Let's agree on one thing: This isn't my theory of what is
happening. This *is* what is happening. I can single step the cpu with
the BDI2000 and see the writes taking place.

So when you say hash_page simply returns, well it isn't that way in
actuality. Where is it supposed to be patched? What isn't working correctly
to have hash_page just return? How can I fix this?

There is no message about allocating the hashtable.

-Dave


>David Ashley wrote:
>
>> I've traced the problem down to arch/ppc/mm/hashtable.S. When
>> there is a page fault, the function hash_page gets called.
>
>In the case of a 603 core, hash_page is called for DSI (Data Access)
>faults.  However, if the feature indicates there is no HPTE, the
>hash_page function is patched to simply return.  You can't look at
>the code in hashtable.S and know how it is going to work for a particular
>implementation because it is patched at initialization to change
>its behavior.
>
>> .....This does
>> some hashing and writes the hash values into a table located at
>> 0xc0180000.
>
>When your kernel boots, does it print a message to indicate it has allocated
>a hash table?
>
>> In arch/ppc/mm/ppc_mmu.c the function MMU_init_hw is called, but
>> since the 8260 doesn't have the CPU_FTR_HPTE_TABLE feature, the
>> hash table is never allocated and the hash_page_patch_* never get updated.
>
>
>Oh, I just looked at a variety of different versions back to 2.4.11, and it
>is patched just as I described above.
>
>
>        -- Dan

* Re: Linux 2.4.17 bug, mmap of /dev/mem
@ 2002-02-27 21:48 David Ashley
  2002-02-27 22:05 ` Wolfgang Grandegger
  0 siblings, 1 reply; 33+ messages in thread
From: David Ashley @ 2002-02-27 21:48 UTC (permalink / raw)
  To: dan; +Cc: linuxppc-embedded

I can see the problem right in arch/ppc/mm/ppc_mmu.c, in the MMU_init_hw
function. The code goes:
	if ((cur_cpu_spec[0]->cpu_features & CPU_FTR_HPTE_TABLE) == 0)
		return;

But later on there is code to set hash_page[0] = 0x4e800020. Instead
of returning it should execute that else clause, and make hash_page return.

After making that change the problem appears fixed, at long last. It wasn't
buggy hardware at all, it *was* a bug in the linux kernel.
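
The difference in control flow can be sketched in user-space C as
follows. This is not the kernel source: hash_page_sim, FTR_HPTE_TABLE_SIM
and both function names are hypothetical stand-ins, and the feature bit
value is made up. The only values taken as fact are 0x4e800020, the
PowerPC encoding of blr (return to caller), and 0x60000000, the encoding
of nop, used here as placeholder instruction words.

```c
#include <stdint.h>

#define FTR_HPTE_TABLE_SIM 0x1u         /* hypothetical bit, not the kernel's */
#define PPC_NOP            0x60000000u  /* PowerPC "nop" */
#define PPC_BLR            0x4e800020u  /* PowerPC "blr": return to caller */

/* Stand-in for the first instruction words of hash_page. */
static uint32_t hash_page_sim[2] = { PPC_NOP, PPC_NOP };

/* Behaviour as described above: return early without patching, which
 * leaves hash_page live even though no hash table was allocated. */
static void mmu_init_hw_early_return(uint32_t cpu_features)
{
    if ((cpu_features & FTR_HPTE_TABLE_SIM) == 0)
        return;
    /* ... hash table allocation and hash_page_patch_* updates ... */
}

/* Behaviour after the change: with no hash table, the first instruction
 * of hash_page becomes an immediate return. */
static void mmu_init_hw_patched(uint32_t cpu_features)
{
    if ((cpu_features & FTR_HPTE_TABLE_SIM) == 0) {
        hash_page_sim[0] = PPC_BLR;
        return;
    }
    /* ... hash table allocation and hash_page_patch_* updates ... */
}
```

With the feature bit clear, the first variant leaves the stand-in
untouched, while the second overwrites its first word with blr so any
call returns at once.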

-Dave

* Re: Linux 2.4.17 bug, mmap of /dev/mem
  2002-02-27 21:48 David Ashley
@ 2002-02-27 22:05 ` Wolfgang Grandegger
  0 siblings, 0 replies; 33+ messages in thread
From: Wolfgang Grandegger @ 2002-02-27 22:05 UTC (permalink / raw)
  To: David Ashley; +Cc: dan, linuxppc-embedded


David Ashley wrote:

>I can see the problem right in arch/ppc/mm/ppc_mmu.c, in the MMU_init_hw
>function. The code goes:
>	if ((cur_cpu_spec[0]->cpu_features & CPU_FTR_HPTE_TABLE) == 0)
>		return;
>
>But later on there is code to set hash_page[0] = 0x4e800020. Instead
>of returning it should execute that else clause, and make hash_page return.
>
>After making that change the problem appears fixed, at long last. It wasn't
>buggy hardware at all, it *was* a bug in the linux kernel.
>
Hmm, it seems to be ok in the linux_2_4_devel tree:
http://ppc.bkbits.net:8080/linuxppc_2_4_devel/anno/arch/ppc/mm/ppc_mmu.c@1.4?nav=index.html|src/.|src/arch|src/arch/ppc|src/arch/ppc/mm

Does this mean that the linux_2_4 tree is less up-to-date?

Wolfgang.

