* Unresponiveness of 2.4.16
From: Nathan G. Grennan @ 2001-11-26 22:02 UTC
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1543 bytes --]
2.4.16 becomes very unresponsive for 30 seconds or so at a time while
unpacking large tarballs, e.g. tar -zxf mozilla-src.tar.gz. The file is
about 36 MB. I run top in one window, run free repeatedly in another,
and run the tar -zxf in a third. I had many suspects, but I am still not
sure what it is. I have tried:
ext2 vs ext3
preemptive vs non-preemptive
tainted vs non-tainted
Nothing seems to help 2.4.16.
I tried switching to Red Hat's 2.4.9-13 kernel and it behaves a lot better.
Not only does 2.4.9-13 avoid the 30-second delay, it also seems to take
advantage of caching. 2.4.16 takes the same amount of time on every run,
even though it should have cached everything in memory the first time.
2.4.9-13 takes a while the first time (without the 30-second freeze of
new processes), but almost no time on later runs. One interesting thing
I noticed is that, with and without preemption, an MP3 that was already
playing was not disrupted, even during the 30-second windows in which
any new command would get stuck under 2.4.16. I am not
using custom
I plan to do more testing to see how, say, 2.4.9, 2.4.13ac7, etc. behave.
Any ideas of how to fix this for 2.4.16?
I have attached my .config.
My system:
Red Hat 7.2 with all updates
Athlon Thunderbird 1.33 GHz
768 MB (512 MB + 256 MB) PC133 SDRAM
Abit KT7A-RAID v1.0 (KT133A chipset)
BIOS 64
HPT370 (BIOS v1.2.0604)
  Primary Master: Quantum Fireball AS40.0
  Secondary Master: IBM-DTLA-307045
VIA686B
  Primary Master: CREATIVE DVD-ROM DVD6240E
  Secondary Master: CR-2801TE
[-- Attachment #2: .config --]
[-- Type: text/plain, Size: 18291 bytes --]
#
# Automatically generated make config: don't edit
#
CONFIG_X86=y
CONFIG_ISA=y
# CONFIG_SBUS is not set
CONFIG_UID16=y
#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_KMOD=y
#
# Processor type and features
#
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
CONFIG_MK7=y
# CONFIG_MCRUSOE is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MCYRIXIII is not set
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_USE_3DNOW=y
CONFIG_X86_PGE=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
# CONFIG_SMP is not set
# CONFIG_X86_UP_APIC is not set
#
# General setup
#
CONFIG_NET=y
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
# CONFIG_EISA is not set
# CONFIG_MCA is not set
# CONFIG_HOTPLUG is not set
# CONFIG_PCMCIA is not set
# CONFIG_HOTPLUG_PCI is not set
CONFIG_SYSVIPC=y
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
# CONFIG_KCORE_AOUT is not set
# CONFIG_BINFMT_AOUT is not set
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_PM=y
CONFIG_ACPI=y
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_BUSMGR=y
CONFIG_ACPI_SYS=y
CONFIG_ACPI_CPU=y
CONFIG_ACPI_BUTTON=y
# CONFIG_ACPI_AC is not set
# CONFIG_ACPI_EC is not set
# CONFIG_APM is not set
#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set
#
# Parallel port support
#
# CONFIG_PARPORT is not set
#
# Plug and Play configuration
#
# CONFIG_PNP is not set
#
# Block devices
#
CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_DEV_XD is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
CONFIG_BLK_DEV_LOOP=m
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_RAM is not set
#
# Multi-device support (RAID and LVM)
#
# CONFIG_MD is not set
#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_NETLINK=y
CONFIG_RTNETLINK=y
# CONFIG_NETLINK_DEV is not set
# CONFIG_NETFILTER is not set
# CONFIG_FILTER is not set
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
# CONFIG_INET_ECN is not set
CONFIG_SYN_COOKIES=y
# CONFIG_IPV6 is not set
# CONFIG_KHTTPD is not set
# CONFIG_ATM is not set
# CONFIG_VLAN_8021Q is not set
#
#
#
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_DECNET is not set
# CONFIG_BRIDGE is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_LLC is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_FASTROUTE is not set
# CONFIG_NET_HW_FLOWCONTROL is not set
#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set
#
# Telephony Support
#
# CONFIG_PHONE is not set
#
# ATA/IDE/MFM/RLL support
#
CONFIG_IDE=y
#
# IDE, ATA and ATAPI Block devices
#
CONFIG_BLK_DEV_IDE=y
#
# Please see Documentation/ide.txt for help/info on IDE drives
#
# CONFIG_BLK_DEV_HD_IDE is not set
# CONFIG_BLK_DEV_HD is not set
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
# CONFIG_BLK_DEV_IDEDISK_VENDOR is not set
# CONFIG_BLK_DEV_COMMERIAL is not set
CONFIG_BLK_DEV_IDECD=y
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEFLOPPY is not set
CONFIG_BLK_DEV_IDESCSI=y
#
# IDE chipset support/bugfixes
#
# CONFIG_BLK_DEV_CMD640 is not set
# CONFIG_BLK_DEV_RZ1000 is not set
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_BLK_DEV_ADMA=y
# CONFIG_BLK_DEV_OFFBOARD is not set
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_IDEDMA_PCI_WIP is not set
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT34X is not set
CONFIG_BLK_DEV_HPT366=y
# CONFIG_BLK_DEV_PIIX is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_PDC202XX is not set
# CONFIG_BLK_DEV_SVWKS is not set
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
CONFIG_BLK_DEV_VIA82CXXX=y
# CONFIG_IDE_CHIPSETS is not set
CONFIG_IDEDMA_AUTO=y
# CONFIG_IDEDMA_IVB is not set
# CONFIG_DMA_NONPCI is not set
CONFIG_BLK_DEV_IDE_MODES=y
# CONFIG_BLK_DEV_ATARAID is not set
#
# SCSI support
#
CONFIG_SCSI=y
#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_SD_EXTRA_DEVS=40
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=y
# CONFIG_BLK_DEV_SR_VENDOR is not set
CONFIG_SR_EXTRA_DEVS=2
CONFIG_CHR_DEV_SG=y
#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
# CONFIG_SCSI_DEBUG_QUEUES is not set
# CONFIG_SCSI_MULTI_LUN is not set
# CONFIG_SCSI_CONSTANTS is not set
# CONFIG_SCSI_LOGGING is not set
#
# SCSI low-level drivers
#
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_7000FASST is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AHA152X is not set
# CONFIG_SCSI_AHA1542 is not set
# CONFIG_SCSI_AHA1740 is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_IN2000 is not set
# CONFIG_SCSI_AM53C974 is not set
# CONFIG_SCSI_MEGARAID is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_CPQFCTS is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_DTC3280 is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_EATA_DMA is not set
# CONFIG_SCSI_EATA_PIO is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_GENERIC_NCR5380 is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_NCR53C406A is not set
# CONFIG_SCSI_NCR53C7xx is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_NCR53C8XX is not set
# CONFIG_SCSI_SYM53C8XX is not set
# CONFIG_SCSI_PAS16 is not set
# CONFIG_SCSI_PCI2000 is not set
# CONFIG_SCSI_PCI2220I is not set
# CONFIG_SCSI_PSI240I is not set
# CONFIG_SCSI_QLOGIC_FAS is not set
# CONFIG_SCSI_QLOGIC_ISP is not set
# CONFIG_SCSI_QLOGIC_FC is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_SEAGATE is not set
# CONFIG_SCSI_SIM710 is not set
# CONFIG_SCSI_SYM53C416 is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_T128 is not set
# CONFIG_SCSI_U14_34F is not set
# CONFIG_SCSI_ULTRASTOR is not set
# CONFIG_SCSI_DEBUG is not set
#
# Fusion MPT device support
#
# CONFIG_FUSION is not set
# CONFIG_FUSION_BOOT is not set
# CONFIG_FUSION_ISENSE is not set
# CONFIG_FUSION_CTL is not set
# CONFIG_FUSION_LAN is not set
#
# IEEE 1394 (FireWire) support (EXPERIMENTAL)
#
# CONFIG_IEEE1394 is not set
#
# I2O device support
#
# CONFIG_I2O is not set
#
# Network device support
#
CONFIG_NETDEVICES=y
#
# ARCnet devices
#
# CONFIG_ARCNET is not set
# CONFIG_DUMMY is not set
# CONFIG_BONDING is not set
# CONFIG_EQUALIZER is not set
# CONFIG_TUN is not set
# CONFIG_ETHERTAP is not set
#
# Ethernet (10 or 100Mbit)
#
CONFIG_NET_ETHERNET=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
CONFIG_NET_VENDOR_3COM=y
# CONFIG_EL1 is not set
# CONFIG_EL2 is not set
# CONFIG_ELPLUS is not set
# CONFIG_EL16 is not set
# CONFIG_EL3 is not set
# CONFIG_3C515 is not set
CONFIG_VORTEX=m
# CONFIG_LANCE is not set
# CONFIG_NET_VENDOR_SMC is not set
# CONFIG_NET_VENDOR_RACAL is not set
# CONFIG_AT1700 is not set
# CONFIG_DEPCA is not set
# CONFIG_HP100 is not set
# CONFIG_NET_ISA is not set
CONFIG_NET_PCI=y
# CONFIG_PCNET32 is not set
# CONFIG_ADAPTEC_STARFIRE is not set
# CONFIG_AC3200 is not set
# CONFIG_APRICOT is not set
# CONFIG_CS89x0 is not set
CONFIG_TULIP=m
# CONFIG_TULIP_MWI is not set
# CONFIG_TULIP_MMIO is not set
# CONFIG_DE4X5 is not set
# CONFIG_DGRS is not set
# CONFIG_DM9102 is not set
# CONFIG_EEPRO100 is not set
# CONFIG_FEALNX is not set
# CONFIG_NATSEMI is not set
# CONFIG_NE2K_PCI is not set
# CONFIG_8139CP is not set
# CONFIG_8139TOO is not set
# CONFIG_SIS900 is not set
# CONFIG_EPIC100 is not set
# CONFIG_SUNDANCE is not set
# CONFIG_TLAN is not set
# CONFIG_VIA_RHINE is not set
# CONFIG_WINBOND_840 is not set
# CONFIG_NET_POCKET is not set
#
# Ethernet (1000 Mbit)
#
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_SK98LIN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
#
# Wireless LAN (non-hamradio)
#
# CONFIG_NET_RADIO is not set
#
# Token Ring devices
#
# CONFIG_TR is not set
# CONFIG_NET_FC is not set
# CONFIG_RCPCI is not set
# CONFIG_SHAPER is not set
#
# Wan interfaces
#
# CONFIG_WAN is not set
#
# Amateur Radio support
#
# CONFIG_HAMRADIO is not set
#
# IrDA (infrared) support
#
# CONFIG_IRDA is not set
#
# ISDN subsystem
#
# CONFIG_ISDN is not set
#
# Old CD-ROM drivers (not SCSI, not IDE)
#
# CONFIG_CD_NO_IDESCSI is not set
#
# Input core support
#
CONFIG_INPUT=y
# CONFIG_INPUT_KEYBDEV is not set
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1400
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=1050
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_EVDEV is not set
#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_SERIAL=y
# CONFIG_SERIAL_CONSOLE is not set
# CONFIG_SERIAL_ACPI is not set
# CONFIG_SERIAL_EXTENDED is not set
# CONFIG_SERIAL_NONSTANDARD is not set
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=256
#
# I2C support
#
CONFIG_I2C=m
CONFIG_I2C_ALGOBIT=m
# CONFIG_I2C_ELV is not set
# CONFIG_I2C_VELLEMAN is not set
# CONFIG_I2C_ALGOPCF is not set
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_PROC=m
#
# Mice
#
# CONFIG_BUSMOUSE is not set
# CONFIG_MOUSE is not set
#
# Joysticks
#
# CONFIG_INPUT_GAMEPORT is not set
# CONFIG_INPUT_SERIO is not set
#
# Joysticks
#
# CONFIG_INPUT_IFORCE_USB is not set
# CONFIG_QIC02_TAPE is not set
#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
# CONFIG_INTEL_RNG is not set
# CONFIG_NVRAM is not set
CONFIG_RTC=y
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_SONYPI is not set
#
# Ftape, the floppy tape device driver
#
# CONFIG_FTAPE is not set
CONFIG_AGP=y
# CONFIG_AGP_INTEL is not set
# CONFIG_AGP_I810 is not set
CONFIG_AGP_VIA=y
# CONFIG_AGP_AMD is not set
# CONFIG_AGP_SIS is not set
# CONFIG_AGP_ALI is not set
# CONFIG_AGP_SWORKS is not set
# CONFIG_DRM is not set
# CONFIG_MWAVE is not set
#
# Multimedia devices
#
CONFIG_VIDEO_DEV=m
#
# Video For Linux
#
# CONFIG_VIDEO_PROC_FS is not set
#
# Video Adapters
#
CONFIG_VIDEO_BT848=m
# CONFIG_VIDEO_PMS is not set
# CONFIG_VIDEO_CPIA is not set
# CONFIG_VIDEO_SAA5249 is not set
# CONFIG_TUNER_3036 is not set
# CONFIG_VIDEO_STRADIS is not set
# CONFIG_VIDEO_ZORAN is not set
# CONFIG_VIDEO_ZR36120 is not set
#
# Radio Adapters
#
# CONFIG_RADIO_CADET is not set
# CONFIG_RADIO_RTRACK is not set
# CONFIG_RADIO_RTRACK2 is not set
# CONFIG_RADIO_AZTECH is not set
# CONFIG_RADIO_GEMTEK is not set
# CONFIG_RADIO_GEMTEK_PCI is not set
# CONFIG_RADIO_MAXIRADIO is not set
# CONFIG_RADIO_MAESTRO is not set
# CONFIG_RADIO_SF16FMI is not set
# CONFIG_RADIO_TERRATEC is not set
# CONFIG_RADIO_TRUST is not set
# CONFIG_RADIO_TYPHOON is not set
# CONFIG_RADIO_ZOLTRIX is not set
#
# File systems
#
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
CONFIG_REISERFS_FS=m
# CONFIG_REISERFS_CHECK is not set
# CONFIG_REISERFS_PROC_INFO is not set
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_BFS_FS is not set
CONFIG_EXT3_FS=y
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=m
# CONFIG_UMSDOS_FS is not set
CONFIG_VFAT_FS=y
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
CONFIG_TMPFS=y
# CONFIG_RAMFS is not set
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
# CONFIG_ZISOFS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_NTFS_FS is not set
# CONFIG_HPFS_FS is not set
CONFIG_PROC_FS=y
# CONFIG_DEVFS_FS is not set
CONFIG_DEVPTS_FS=y
# CONFIG_QNX4FS_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_EXT2_FS=y
# CONFIG_SYSV_FS is not set
CONFIG_UDF_FS=m
# CONFIG_UDF_RW is not set
# CONFIG_UFS_FS is not set
#
# Network File Systems
#
# CONFIG_CODA_FS is not set
# CONFIG_INTERMEZZO_FS is not set
# CONFIG_NFS_FS is not set
# CONFIG_NFSD is not set
# CONFIG_SUNRPC is not set
# CONFIG_LOCKD is not set
CONFIG_SMB_FS=m
CONFIG_SMB_NLS_DEFAULT=y
CONFIG_SMB_NLS_REMOTE="cp437"
# CONFIG_NCP_FS is not set
# CONFIG_ZISOFS_FS is not set
# CONFIG_ZLIB_FS_INFLATE is not set
#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_SMB_NLS=y
CONFIG_NLS=y
#
# Native Language Support
#
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_UTF8 is not set
#
# Console drivers
#
CONFIG_VGA_CONSOLE=y
# CONFIG_VIDEO_SELECT is not set
# CONFIG_MDA_CONSOLE is not set
#
# Frame-buffer support
#
# CONFIG_FB is not set
#
# Sound
#
CONFIG_SOUND=y
# CONFIG_SOUND_BT878 is not set
# CONFIG_SOUND_CMPCI is not set
CONFIG_SOUND_EMU10K1=m
# CONFIG_MIDI_EMU10K1 is not set
# CONFIG_SOUND_FUSION is not set
# CONFIG_SOUND_CS4281 is not set
# CONFIG_SOUND_ES1370 is not set
CONFIG_SOUND_ES1371=y
# CONFIG_SOUND_ESSSOLO1 is not set
# CONFIG_SOUND_MAESTRO is not set
# CONFIG_SOUND_MAESTRO3 is not set
# CONFIG_SOUND_ICH is not set
# CONFIG_SOUND_RME96XX is not set
# CONFIG_SOUND_SONICVIBES is not set
# CONFIG_SOUND_TRIDENT is not set
# CONFIG_SOUND_MSNDCLAS is not set
# CONFIG_SOUND_MSNDPIN is not set
# CONFIG_SOUND_VIA82CXXX is not set
# CONFIG_SOUND_OSS is not set
# CONFIG_SOUND_TVMIXER is not set
#
# USB support
#
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set
#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
# CONFIG_USB_BANDWIDTH is not set
# CONFIG_USB_LONG_TIMEOUT is not set
#
# USB Controllers
#
CONFIG_USB_UHCI_ALT=y
# CONFIG_USB_OHCI is not set
#
# USB Device Class drivers
#
# CONFIG_USB_AUDIO is not set
# CONFIG_USB_BLUETOOTH is not set
CONFIG_USB_STORAGE=y
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_USB_STORAGE_DATAFAB is not set
# CONFIG_USB_STORAGE_FREECOM is not set
# CONFIG_USB_STORAGE_ISD200 is not set
# CONFIG_USB_STORAGE_DPCM is not set
# CONFIG_USB_STORAGE_HP8200e is not set
# CONFIG_USB_STORAGE_SDDR09 is not set
# CONFIG_USB_STORAGE_JUMPSHOT is not set
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
#
# USB Human Interface Devices (HID)
#
CONFIG_USB_HID=y
# CONFIG_USB_HIDDEV is not set
# CONFIG_USB_WACOM is not set
#
# USB Imaging devices
#
# CONFIG_USB_DC2XX is not set
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_SCANNER is not set
# CONFIG_USB_MICROTEK is not set
# CONFIG_USB_HPUSBSCSI is not set
#
# USB Multimedia devices
#
# CONFIG_USB_IBMCAM is not set
# CONFIG_USB_OV511 is not set
# CONFIG_USB_PWC is not set
# CONFIG_USB_SE401 is not set
# CONFIG_USB_DSBR is not set
# CONFIG_USB_DABUSB is not set
#
# USB Network adaptors
#
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_CATC is not set
# CONFIG_USB_CDCETHER is not set
# CONFIG_USB_USBNET is not set
#
# USB port drivers
#
#
# USB Serial Converter support
#
# CONFIG_USB_SERIAL is not set
#
# USB Miscellaneous drivers
#
# CONFIG_USB_RIO500 is not set
#
# Bluetooth support
#
# CONFIG_BLUEZ is not set
#
# Kernel hacking
#
# CONFIG_DEBUG_KERNEL is not set
* Re: Unresponiveness of 2.4.16
From: Alan Cox @ 2001-11-26 22:17 UTC
To: Nathan G. Grennan; +Cc: linux-kernel
> 2.4.16 becomes very unresponsive for 30 seconds or so at a time during
> large unarchiving of tarballs, like tar -zxf mozilla-src.tar.gz. The
> file is about 36mb. I run top in one window, run free repeatedly in
This seems to be one of the small, as yet unresolved problems with the newer
VM code in 2.4.16. I've not managed to prove whether it's the VM or the
differing I/O scheduling rules, however.
> Any ideas of how to fix this for 2.4.16?
If it is the VM, then watch for a patch from Rik for 2.4.16 + RielVM. If
that helps, then we know it's VM-related; if not, then we know to look at
other suspects.
Alan
* Re: Unresponiveness of 2.4.16
From: Andrew Morton @ 2001-11-26 22:21 UTC
To: Nathan G. Grennan; +Cc: linux-kernel
"Nathan G. Grennan" wrote:
>
> 2.4.16 becomes very unresponsive for 30 seconds or so at a time during
> large unarchiving of tarballs, like tar -zxf mozilla-src.tar.gz. The
> file is about 36mb. I run top in one window, run free repeatedly in
> another window and run the tar -zxf in a third window. I had many
> suspects, but still not sure what it is. I have tried
Yes. I'm doing quite a bit of work in this area at present. There
are a couple of things which may help here.
1: The current code which is designed to throttle heavy writers
basically doesn't work under some workloads. It's designed to
block the writer when there are too many dirty buffers in the
machine. But in fact, all dirty data writeout occurs in the
context of shrink_cache(), so all tasks are penalised and if
the writing task doesn't happen to run shrink_cache(), it gets
to merrily continue stuffing the machine full of write data.
The fix is to account for locked buffers as well as dirty ones
in balance_dirty_state().
2: The current elevator design is downright cruel to humans in
the presence of heavy write traffic.
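Point 1 can be illustrated with a small user-space model of the dirty-buffer check. This is a sketch loosely modelled on balance_dirty_state() as changed by the patch. The struct, the enum, the function name dirty_state() and the soft/hard percentage parameters are hypothetical stand-ins, not kernel APIs, and the kernel's PF_NOIO special case is left out. The model shows how a writer whose data is mostly already under I/O (locked) escapes throttling unless locked buffers are counted too.

```c
#include <assert.h>

/* Hypothetical model, not kernel code: what the writer should do next. */
enum throttle {
    THROTTLE_NONE,   /* below the soft limit: keep writing */
    THROTTLE_WAKEUP, /* above the soft limit: kick off writeback */
    THROTTLE_BLOCK   /* above the hard limit: make the writer wait */
};

struct buf_counts {
    unsigned long dirty_pages;  /* written by tasks, not yet submitted */
    unsigned long locked_pages; /* submitted, I/O still in flight */
};

/* soft_pct/hard_pct are percentages of total_pages (hypothetical knobs).
 * count_locked selects the patched behaviour (dirty + locked). */
enum throttle dirty_state(const struct buf_counts *c, unsigned long total_pages,
                          unsigned long soft_pct, unsigned long hard_pct,
                          int count_locked)
{
    unsigned long dirty = c->dirty_pages;

    assert(total_pages > 0 && soft_pct <= hard_pct);
    if (count_locked)
        dirty += c->locked_pages;
    dirty *= 100; /* compare as percentages without floating point */
    if (dirty > total_pages * hard_pct)
        return THROTTLE_BLOCK;
    if (dirty > total_pages * soft_pct)
        return THROTTLE_WAKEUP;
    return THROTTLE_NONE;
}
```

Take 10 dirty and 500 locked pages out of 1000, with limits of 30% and 60%. The unpatched check (count_locked = 0) returns THROTTLE_NONE, so the writer carries on filling memory. Counting locked pages as well returns THROTTLE_WAKEUP.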
Please try this lot:
--- linux-2.4.16-pre1/fs/buffer.c Thu Nov 22 23:02:58 2001
+++ linux-akpm/fs/buffer.c Sun Nov 25 00:07:47 2001
@@ -1036,6 +1036,7 @@ static int balance_dirty_state(void)
unsigned long dirty, tot, hard_dirty_limit, soft_dirty_limit;
dirty = size_buffers_type[BUF_DIRTY] >> PAGE_SHIFT;
+ dirty += size_buffers_type[BUF_LOCKED] >> PAGE_SHIFT;
tot = nr_free_buffer_pages();
dirty *= 100;
--- linux-2.4.16-pre1/mm/filemap.c Sat Nov 24 13:14:52 2001
+++ linux-akpm/mm/filemap.c Sun Nov 25 00:07:47 2001
@@ -3023,7 +3023,18 @@ generic_file_write(struct file *file,con
unlock:
kunmap(page);
/* Mark it unlocked again and drop the page.. */
- SetPageReferenced(page);
+// SetPageReferenced(page);
+ ClearPageReferenced(page);
+#if 0
+ {
+ lru_cache_del(page);
+ TestSetPageLRU(page);
+ spin_lock(&pagemap_lru_lock);
+ list_add_tail(&(page)->lru, &inactive_list);
+ nr_inactive_pages++;
+ spin_unlock(&pagemap_lru_lock);
+ }
+#endif
UnlockPage(page);
page_cache_release(page);
--- linux-2.4.16-pre1/mm/vmscan.c Thu Nov 22 23:02:59 2001
+++ linux-akpm/mm/vmscan.c Sun Nov 25 00:08:03 2001
@@ -573,6 +573,9 @@ static int shrink_caches(zone_t * classz
nr_pages = shrink_cache(nr_pages, classzone, gfp_mask, priority);
if (nr_pages <= 0)
return 0;
+ nr_pages = shrink_cache(nr_pages, classzone, gfp_mask, priority);
+ if (nr_pages <= 0)
+ return 0;
shrink_dcache_memory(priority, gfp_mask);
shrink_icache_memory(priority, gfp_mask);
@@ -585,7 +588,7 @@ static int shrink_caches(zone_t * classz
int try_to_free_pages(zone_t *classzone, unsigned int gfp_mask, unsigned int order)
{
- int priority = DEF_PRIORITY;
+ int priority = DEF_PRIORITY - 2;
int nr_pages = SWAP_CLUSTER_MAX;
do {
--- linux-2.4.16-pre1/include/linux/elevator.h Thu Feb 15 16:58:34 2001
+++ linux-akpm/include/linux/elevator.h Sat Nov 24 19:58:43 2001
@@ -5,8 +5,9 @@ typedef void (elevator_fn) (struct reque
struct list_head *,
struct list_head *, int);
-typedef int (elevator_merge_fn) (request_queue_t *, struct request **, struct list_head *,
- struct buffer_head *, int, int);
+typedef int (elevator_merge_fn)(request_queue_t *, struct request **,
+ struct list_head *, struct buffer_head *bh,
+ int rw, int max_sectors, int max_bomb_segments);
typedef void (elevator_merge_cleanup_fn) (request_queue_t *, struct request *, int);
@@ -16,6 +17,7 @@ struct elevator_s
{
int read_latency;
int write_latency;
+ int max_bomb_segments;
elevator_merge_fn *elevator_merge_fn;
elevator_merge_cleanup_fn *elevator_merge_cleanup_fn;
@@ -24,13 +26,13 @@ struct elevator_s
unsigned int queue_ID;
};
-int elevator_noop_merge(request_queue_t *, struct request **, struct list_head *, struct buffer_head *, int, int);
-void elevator_noop_merge_cleanup(request_queue_t *, struct request *, int);
-void elevator_noop_merge_req(struct request *, struct request *);
-
-int elevator_linus_merge(request_queue_t *, struct request **, struct list_head *, struct buffer_head *, int, int);
-void elevator_linus_merge_cleanup(request_queue_t *, struct request *, int);
-void elevator_linus_merge_req(struct request *, struct request *);
+elevator_merge_fn elevator_noop_merge;
+elevator_merge_cleanup_fn elevator_noop_merge_cleanup;
+elevator_merge_req_fn elevator_noop_merge_req;
+
+elevator_merge_fn elevator_linus_merge;
+elevator_merge_cleanup_fn elevator_linus_merge_cleanup;
+elevator_merge_req_fn elevator_linus_merge_req;
typedef struct blkelv_ioctl_arg_s {
int queue_ID;
@@ -54,22 +56,6 @@ extern void elevator_init(elevator_t *,
#define ELEVATOR_FRONT_MERGE 1
#define ELEVATOR_BACK_MERGE 2
-/*
- * This is used in the elevator algorithm. We don't prioritise reads
- * over writes any more --- although reads are more time-critical than
- * writes, by treating them equally we increase filesystem throughput.
- * This turns out to give better overall performance. -- sct
- */
-#define IN_ORDER(s1,s2) \
- ((((s1)->rq_dev == (s2)->rq_dev && \
- (s1)->sector < (s2)->sector)) || \
- (s1)->rq_dev < (s2)->rq_dev)
-
-#define BHRQ_IN_ORDER(bh, rq) \
- ((((bh)->b_rdev == (rq)->rq_dev && \
- (bh)->b_rsector < (rq)->sector)) || \
- (bh)->b_rdev < (rq)->rq_dev)
-
static inline int elevator_request_latency(elevator_t * elevator, int rw)
{
int latency;
@@ -85,7 +71,7 @@ static inline int elevator_request_laten
((elevator_t) { \
0, /* read_latency */ \
0, /* write_latency */ \
- \
+ 0, /* max_bomb_segments */ \
elevator_noop_merge, /* elevator_merge_fn */ \
elevator_noop_merge_cleanup, /* elevator_merge_cleanup_fn */ \
elevator_noop_merge_req, /* elevator_merge_req_fn */ \
@@ -95,7 +81,7 @@ static inline int elevator_request_laten
((elevator_t) { \
8192, /* read passovers */ \
16384, /* write passovers */ \
- \
+ 6, /* max_bomb_segments */ \
elevator_linus_merge, /* elevator_merge_fn */ \
elevator_linus_merge_cleanup, /* elevator_merge_cleanup_fn */ \
elevator_linus_merge_req, /* elevator_merge_req_fn */ \
--- linux-2.4.16-pre1/drivers/block/elevator.c Thu Jul 19 20:59:41 2001
+++ linux-akpm/drivers/block/elevator.c Sat Nov 24 20:51:29 2001
@@ -74,36 +74,52 @@ inline int bh_rq_in_between(struct buffe
return 0;
}
+struct akpm_elv_stats {
+ int zapme;
+ int nr_read_sectors;
+ int nr_write_sectors;
+ int nr_read_requests;
+ int nr_write_requests;
+} akpm_elv_stats;
int elevator_linus_merge(request_queue_t *q, struct request **req,
struct list_head * head,
struct buffer_head *bh, int rw,
- int max_sectors)
+ int max_sectors, int max_bomb_segments)
{
- struct list_head *entry = &q->queue_head;
- unsigned int count = bh->b_size >> 9, ret = ELEVATOR_NO_MERGE;
+ struct list_head *entry;
+ unsigned int count = bh->b_size >> 9;
+ unsigned int ret = ELEVATOR_NO_MERGE;
+ int no_in_between = 0;
+ if (akpm_elv_stats.zapme)
+ memset(&akpm_elv_stats, 0, sizeof(akpm_elv_stats));
+
+ entry = &q->queue_head;
while ((entry = entry->prev) != head) {
struct request *__rq = blkdev_entry_to_request(entry);
-
- /*
- * simply "aging" of requests in queue
- */
- if (__rq->elevator_sequence-- <= 0)
- break;
-
+ if (__rq->elevator_sequence-- <= 0) {
+ /*
+ * OK, we've exceeded someone's latency limit.
+ * But we still continue to look for merges,
+ * because they're so much better than seeks.
+ */
+ no_in_between = 1;
+ }
if (__rq->waiting)
continue;
if (__rq->rq_dev != bh->b_rdev)
continue;
- if (!*req && bh_rq_in_between(bh, __rq, &q->queue_head))
+ if (!*req && !no_in_between &&
+ bh_rq_in_between(bh, __rq, &q->queue_head)) {
*req = __rq;
+ }
if (__rq->cmd != rw)
continue;
if (__rq->nr_sectors + count > max_sectors)
continue;
if (__rq->elevator_sequence < count)
- break;
+ no_in_between = 1;
if (__rq->sector + __rq->nr_sectors == bh->b_rsector) {
ret = ELEVATOR_BACK_MERGE;
*req = __rq;
@@ -116,6 +132,66 @@ int elevator_linus_merge(request_queue_t
}
}
+ /*
+ * If we failed to merge a read anywhere in the request
+ * queue, we really don't want to place it at the end
+ * of the list, behind lots of writes. So place it near
+ * the front.
+ *
+ * We don't want to place it in front of _all_ writes: that
+ * would create lots of seeking, and isn't tunable.
+ * We try to avoid promoting this read in front of existing
+ * reads.
+ *
+ * max_bomb_sectors becomes the maximum number of write
+ * requests which we allow to remain in place in front of
+ * a newly introduced read. We weight things a little bit,
+ * so large writes are more expensive than small ones, but it's
+ * requests which count, not sectors.
+ */
+ if (rw == READ && ret == ELEVATOR_NO_MERGE) {
+ int cur_latency = 0;
+ struct request * const cur_request = *req;
+
+ entry = head->next;
+ while (entry != &q->queue_head) {
+ struct request *__rq;
+
+ if (entry == &q->queue_head)
+ BUG();
+ if (entry == q->queue_head.next &&
+ q->head_active && !q->plugged)
+ BUG();
+ __rq = blkdev_entry_to_request(entry);
+
+ if (__rq == cur_request) {
+ /*
+ * This is where the old algorithm placed it.
+ * There's no point pushing it further back,
+ * so leave it here, in sorted order.
+ */
+ break;
+ }
+ if (__rq->cmd == WRITE) {
+ cur_latency += 1 + __rq->nr_sectors / 64;
+ if (cur_latency >= max_bomb_segments) {
+ *req = __rq;
+ break;
+ }
+ }
+ entry = entry->next;
+ }
+ }
+ if (ret == ELEVATOR_NO_MERGE) {
+ if (rw == READ)
+ akpm_elv_stats.nr_read_requests++;
+ else
+ akpm_elv_stats.nr_write_requests++;
+ }
+ if (rw == READ)
+ akpm_elv_stats.nr_read_sectors += count;
+ else
+ akpm_elv_stats.nr_write_sectors += count;
return ret;
}
@@ -144,7 +220,7 @@ void elevator_linus_merge_req(struct req
int elevator_noop_merge(request_queue_t *q, struct request **req,
struct list_head * head,
struct buffer_head *bh, int rw,
- int max_sectors)
+ int max_sectors, int max_bomb_segments)
{
struct list_head *entry;
unsigned int count = bh->b_size >> 9;
@@ -188,7 +264,7 @@ int blkelvget_ioctl(elevator_t * elevato
output.queue_ID = elevator->queue_ID;
output.read_latency = elevator->read_latency;
output.write_latency = elevator->write_latency;
- output.max_bomb_segments = 0;
+ output.max_bomb_segments = elevator->max_bomb_segments;
if (copy_to_user(arg, &output, sizeof(blkelv_ioctl_arg_t)))
return -EFAULT;
@@ -207,9 +283,12 @@ int blkelvset_ioctl(elevator_t * elevato
return -EINVAL;
if (input.write_latency < 0)
return -EINVAL;
+ if (input.max_bomb_segments < 0)
+ return -EINVAL;
elevator->read_latency = input.read_latency;
elevator->write_latency = input.write_latency;
+ elevator->max_bomb_segments = input.max_bomb_segments;
return 0;
}
--- linux-2.4.16-pre1/drivers/block/ll_rw_blk.c Mon Nov 5 21:01:11 2001
+++ linux-akpm/drivers/block/ll_rw_blk.c Sat Nov 24 22:25:47 2001
@@ -690,7 +690,8 @@ again:
} else if (q->head_active && !q->plugged)
head = head->next;
- el_ret = elevator->elevator_merge_fn(q, &req, head, bh, rw,max_sectors);
+ el_ret = elevator->elevator_merge_fn(q, &req, head, bh,
+ rw, max_sectors, elevator->max_bomb_segments);
switch (el_ret) {
case ELEVATOR_BACK_MERGE:
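The read-placement rule described in the patch's elevator.c comment can be tried out in isolation. The sketch below is a hypothetical user-space model, not the kernel's request queue. struct req and read_slot() are made-up names, the queue is a plain array, and the head_active/plugged details are ignored.

It walks the queue from the front and charges each write 1 + nr_sectors/64, as the patch does. The new read goes in just behind the write that uses up max_bomb_segments (6 in the patch's elevator defaults). If the budget is never used up, the read keeps the position the old elevator would have given it.

```c
#include <assert.h>
#include <stddef.h>

struct req {
    int is_write;        /* nonzero for a WRITE request */
    unsigned nr_sectors; /* request size in 512-byte sectors */
};

/* Returns how many queued requests stay ahead of a new, unmerged read.
 * old_ahead is that count under the old elevator (qlen = tail of queue). */
size_t read_slot(const struct req *q, size_t qlen, size_t old_ahead,
                 int max_bomb_segments)
{
    int cur_latency = 0;

    assert(old_ahead <= qlen);
    for (size_t i = 0; i < old_ahead; i++) {
        if (!q[i].is_write)
            continue; /* reads cost nothing; only writes use up the budget */
        cur_latency += 1 + (int)(q[i].nr_sectors / 64);
        if (cur_latency >= max_bomb_segments)
            return i + 1; /* place the read right after this write */
    }
    return old_ahead; /* budget never exhausted: keep the sorted position */
}
```

Suppose ten 8-sector writes are queued and the read was headed for the tail. read_slot() then returns 6, so the read waits behind six writes instead of ten. With 128-sector writes each one costs 3, so it returns 2.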
* Re: Unresponiveness of 2.4.16
From: Lincoln Dale @ 2001-11-26 22:44 UTC (permalink / raw)
To: Alan Cox; +Cc: Nathan G. Grennan, linux-kernel
At 10:17 PM 26/11/2001 +0000, Alan Cox wrote:
> > 2.4.16 becomes very unresponsive for 30 seconds or so at a time during
> > large unarchiving of tarballs, like tar -zxf mozilla-src.tar.gz. The
> > file is about 36mb. I run top in one window, run free repeatedly in
>
>This seems to be one of the small as yet unresolved problems with the newer
>VM code in 2.4.16. I've not managed to prove its the VM or the differing
>I/O scheduling rules however.
it is I/O scheduling.
i have a system with a large amount of RAM.
it has both 15K RPM SCSI disks (off a symbios controller) and some bog-slow
IDE/ATA disks which the system decides to use PIO for rather than DMA. (i
don't use them for anything other than bootup so don't really care about it
deciding to use PIO..).
a copy to/from the 15K RPM SCSI disks doesn't show any performance problems.
a copy to/from the PIO-based IDE disks has the same effect -- 20/30 seconds
of no interactiveness -- even a "vmstat 1" *stops* for 20-30 seconds while
200+MB of buffer-cache data gets written out to disk.
i'm guessing that either:
(a) the i/o scheduler isn't taking "disk speed" into account, so slow
disks show the problem much more than fast disks, or
(b) it's isolated to somewhere in the IDE drivers.
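Guess (a) can be checked with simple arithmetic. If a new request has to wait for the whole dirty backlog to reach the disk, the stall lasts roughly backlog / write rate. Only the ~200 MB backlog comes from the report above. The transfer rates used below are assumptions for illustration, not measurements.

```c
#include <assert.h>

/* Seconds needed to flush `backlog_mb` megabytes of dirty data at a
 * sustained write rate of `mb_per_sec`. Deliberately crude: it assumes
 * a constant transfer rate and no competing I/O, i.e. the wait seen by
 * a request queued behind the entire backlog in FIFO order. */
double drain_seconds(double backlog_mb, double mb_per_sec)
{
	assert(backlog_mb >= 0.0 && mb_per_sec > 0.0);
	return backlog_mb / mb_per_sec;
}
```

This sketch assumes 8 MB/s for the PIO disk and 40 MB/s for the 15K RPM SCSI disk. Under those assumptions, drain_seconds(200.0, 8.0) is 25 s and drain_seconds(200.0, 40.0) is 5 s. The same backlog therefore stalls the slow disk five times longer, which matches the pattern reported.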
cheers,
lincoln.
* Re: Unresponiveness of 2.4.16
2001-11-26 22:17 ` Alan Cox
@ 2001-11-26 23:34 ` Nicolas Pitre
2001-11-27 0:05 ` Steve Lion
2001-11-27 9:12 ` Ahmed Masud
2001-11-26 23:59 ` Rik van Riel
2001-11-27 1:45 ` Andrea Arcangeli
2 siblings, 2 replies; 53+ messages in thread
From: Nicolas Pitre @ 2001-11-26 23:34 UTC (permalink / raw)
To: Alan Cox; +Cc: Nathan G. Grennan, lkml
On Mon, 26 Nov 2001, Alan Cox wrote:
> > 2.4.16 becomes very unresponsive for 30 seconds or so at a time during
> > large unarchiving of tarballs, like tar -zxf mozilla-src.tar.gz. The
> > file is about 36mb. I run top in one window, run free repeatedly in
>
> This seems to be one of the small as yet unresolved problems with the newer
> VM code in 2.4.16. I've not managed to prove its the VM or the differing
> I/O scheduling rules however.
FWIW...
I've experienced much the same unresponsiveness, but more on the order of 4-5
seconds, ever since I started using ext3 with RH 7.2 (i.e. kernel 2.4.7 based).
I'm currently running 2.4.15-pre7 and the same momentary stalls are there,
just like with 2.4.7. It is much more visible when applying large patches to
a kernel source tree, as the patch output stops scrolling from time to time
for about 5 secs. I never saw anything like this while previously using
reiserfs. I've yet to try reiserfs on a 2.4.16 tree to see if this is actually
an ext3 problem.
Nicolas
* Re: Unresponiveness of 2.4.16
2001-11-26 22:17 ` Alan Cox
2001-11-26 23:34 ` Nicolas Pitre
@ 2001-11-26 23:59 ` Rik van Riel
2001-11-27 0:36 ` Andrew Morton
2001-11-27 1:45 ` Andrea Arcangeli
2 siblings, 1 reply; 53+ messages in thread
From: Rik van Riel @ 2001-11-26 23:59 UTC (permalink / raw)
To: Alan Cox; +Cc: Nathan G. Grennan, linux-kernel
On Mon, 26 Nov 2001, Alan Cox wrote:
> > Any ideas of how to fix this for 2.4.16?
>
> If it is the VM then watch for a patch from Rik for 2.4.16 + RielVM.
> If that helps then we know its VM related , if not then we know to
> look at other suspects
The patch to 2.4.16 + rielvm (well, a merge between my VM and
Andrea's VM) is available on my home page and seems stable now.
FYI, my 64MB dual pentium test box seems to "happily" survive
a 'make -j bzImage' over NFS...
However, I suspect this unresponsiveness issue is related to
either IO scheduling or write throttling, and that code is
the same in both VMs. I'll take a look at smoothing out writes
so we can get this thing fixed in both VMs.
The patch is on http://www.surriel.com/patches/
regards,
Rik
--
Shortwave goes a long way: irc.starchat.net #swl
http://www.surriel.com/ http://distro.conectiva.com/
* Re: Unresponiveness of 2.4.16
2001-11-26 23:34 ` Nicolas Pitre
@ 2001-11-27 0:05 ` Steve Lion
2001-11-27 9:12 ` Ahmed Masud
1 sibling, 0 replies; 53+ messages in thread
From: Steve Lion @ 2001-11-27 0:05 UTC (permalink / raw)
To: lkml
I'm running 2.4.13-ac7 with the preempt patch and ext3 on this box. I don't seem
to be encountering any unresponsiveness at all while untarring a kernel src.
Just some info for you guys.
-Steve
* Re: Unresponiveness of 2.4.16
2001-11-26 23:59 ` Rik van Riel
@ 2001-11-27 0:36 ` Andrew Morton
2001-11-27 0:46 ` Rik van Riel
2001-11-27 4:38 ` Mike Fedyk
0 siblings, 2 replies; 53+ messages in thread
From: Andrew Morton @ 2001-11-27 0:36 UTC (permalink / raw)
To: Rik van Riel; +Cc: Alan Cox, Nathan G. Grennan, linux-kernel
Rik van Riel wrote:
>
> However, I suspect this unresponsiveness issue is related to
> either IO scheduling or write throttling, and that code is
> the same in both VMs. I'll take a look at smoothing out writes
> so we can get this thing fixed in both VMs.
>
umm... What I said.
balance_dirty_state() is allowing writes to flood the machine
with locked buffers.
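To make the "flood the machine with locked buffers" point concrete, here is a hypothetical threshold check in the spirit of a dirty-buffer throttle. It is not the 2.4 `balance_dirty_state()` code. The function name, the soft/hard percentages and the `count_locked` switch are all assumptions. The sketch shows that a check counting only dirty buffers cannot see buffers already submitted for I/O (locked). As a result, a writer is not slowed down even while the request queue is full of its data.

```c
#include <assert.h>

enum throttle { NO_THROTTLE, START_FLUSH, WAIT_FOR_IO };

/* Hypothetical dirty-throttling check. `dirty` and `locked` are buffer
 * counts, `total` is the number of usable buffers, and the limits are
 * percentages of `total`. With count_locked == 0, buffers already under
 * I/O (locked) are invisible to the check. */
enum throttle dirty_check(unsigned long dirty, unsigned long locked,
			  unsigned long total, unsigned soft_pct,
			  unsigned hard_pct, int count_locked)
{
	unsigned long pending = dirty + (count_locked ? locked : 0);

	assert(total > 0 && soft_pct <= hard_pct);
	if (pending * 100 > total * hard_pct)
		return WAIT_FOR_IO;   /* writer should block */
	if (pending * 100 > total * soft_pct)
		return START_FLUSH;   /* kick writeback, keep going */
	return NO_THROTTLE;
}
```

Take 1000 buffers, limits of 30%/60%, 10 dirty buffers and 500 locked ones. The dirty-only check returns NO_THROTTLE, while the variant that counts locked buffers returns START_FLUSH.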
elevator is penalising reads horridly. Try this on your
64 megabyte box:
dd if=/dev/zero of=foo bs=1024k count=8000
and then try to log in to it. Be patient. Very patient. Five
minutes pass. Still being patient? In fact with this test I've
never been able to get a login prompt until the write stops: the
filesystem which holds `foo' is only 8 gigs, so it eventually fills
up, and only then does the login happen.
What happens is this: sshd gets paged out. It wakes up, faults
and tries to read a page. That read gets stuck on the request
queue behind about 50 megabytes of write data. Eventually, it
gets read. Then sshd faults in another page. That gets stuck
on the request queue behind about 50 megabytes of data. By the time
this one gets read, the first page is probably paged out again. See
how this isn't getting us very far?
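The sshd scenario above can be put into numbers with a crude model. Each page fault issues one read, and that read completes only after the queued write backlog has drained. The process makes progress only if its working set can be faulted in before its pages are evicted again. The 50 MB backlog is from the description above. The page count, the write rate and the eviction interval are assumptions, and the model ignores seeks, read service time and merging.

```c
#include <assert.h>

/* Seconds to fault in `pages` pages one after another, when every
 * read waits behind `backlog_mb` of writes draining at `mb_per_sec`. */
double fault_in_seconds(int pages, double backlog_mb, double mb_per_sec)
{
	assert(pages >= 0 && backlog_mb >= 0.0 && mb_per_sec > 0.0);
	return pages * (backlog_mb / mb_per_sec);
}

/* 1 if the whole working set can be read in before its pages survive
 * in memory for `evict_after_s` seconds (an assumed figure), else 0. */
int can_make_progress(int pages, double backlog_mb, double mb_per_sec,
		      double evict_after_s)
{
	return fault_in_seconds(pages, backlog_mb, mb_per_sec) < evict_after_s;
}
```

For example, 10 faults behind 50 MB of writes at an assumed 10 MB/s take 50 s. If pages survive only an assumed 30 s, the process never gets its working set in.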
The patch I sent puts read requests near the head of the request
queue, and to hell with aggregate throughput. It's tunable with
`elvtune -b'. And it fixes it.
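Here is a hypothetical illustration of putting reads "near the head" of the queue with a tunable bound. Writes are appended at the tail. A read is placed so that at most `max_ahead` requests are in front of it, but never ahead of reads already queued. This is not the algorithm in the patch, of which only the `max_bomb_segments` plumbing is quoted in this thread. The queue layout, `rq_add()` and `max_ahead` are assumptions. `max_ahead` merely plays the role a value set with `elvtune -b` might play.

```c
#include <assert.h>
#include <string.h>

#define QMAX 64

enum rq_dir { RQ_READ, RQ_WRITE };

/* Hypothetical request queue: index 0 is the next request serviced. */
struct rq_queue {
	enum rq_dir dir[QMAX];
	int len;
};

/* Writes go to the tail. A read goes in front of queued writes so that
 * at most `max_ahead` requests precede it, but stays behind earlier
 * reads (reads remain FIFO among themselves).
 * Returns the slot used, or -1 if the queue is full. */
int rq_add(struct rq_queue *q, enum rq_dir dir, int max_ahead)
{
	int pos;

	assert(q != NULL && max_ahead >= 0);
	if (q->len >= QMAX)
		return -1;

	pos = q->len;                         /* default: tail */
	if (dir == RQ_READ) {
		int last_read = -1;
		for (int i = 0; i < q->len; i++)
			if (q->dir[i] == RQ_READ)
				last_read = i;
		if (pos > max_ahead)
			pos = max_ahead;      /* jump ahead of writes */
		if (pos < last_read + 1)
			pos = last_read + 1;  /* but not ahead of reads */
	}

	/* open a slot at pos by shifting the tail down by one */
	memmove(&q->dir[pos + 1], &q->dir[pos],
		(size_t)(q->len - pos) * sizeof(q->dir[0]));
	q->dir[pos] = dir;
	q->len++;
	return pos;
}
```

With ten writes queued and max_ahead = 2, the first read lands at index 2 and the next one at index 3, behind the first. A very large max_ahead degenerates to plain tail insertion, trading read latency back for aggregate throughput.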
-
* Re: Unresponiveness of 2.4.16
2001-11-26 22:02 Nathan G. Grennan
` (2 preceding siblings ...)
2001-11-26 22:44 ` Lincoln Dale
@ 2001-11-27 0:44 ` Lost Logic
2001-11-27 0:57 ` Lost Logic
2001-11-27 3:49 ` Sean Elble
4 siblings, 1 reply; 53+ messages in thread
From: Lost Logic @ 2001-11-27 0:44 UTC (permalink / raw)
To: Nathan G. Grennan; +Cc: linux-kernel
I'm running 2.4.16 with 2 IDE UDMA mode 4 drives, and I have experienced
no such pausing no matter what I do. (which usually includes patching,
extracting, and generally messing with kernels from Eterm with XMMS
playing, and a couple mozillas open)
* Re: Unresponiveness of 2.4.16
2001-11-27 0:36 ` Andrew Morton
@ 2001-11-27 0:46 ` Rik van Riel
2001-11-27 4:38 ` Mike Fedyk
1 sibling, 0 replies; 53+ messages in thread
From: Rik van Riel @ 2001-11-27 0:46 UTC (permalink / raw)
To: Andrew Morton; +Cc: Alan Cox, Nathan G. Grennan, linux-kernel
On Mon, 26 Nov 2001, Andrew Morton wrote:
> umm... What I said.
>
> balance_dirty_state() is allowing writes to flood the machine
> with locked buffers.
Saw your patch, it's neat. I'm going to try it
first thing in the morning...
Rik
--
Shortwave goes a long way: irc.starchat.net #swl
http://www.surriel.com/ http://distro.conectiva.com/
* Re: Unresponiveness of 2.4.16
2001-11-27 0:44 ` Lost Logic
@ 2001-11-27 0:57 ` Lost Logic
0 siblings, 0 replies; 53+ messages in thread
From: Lost Logic @ 2001-11-27 0:57 UTC (permalink / raw)
To: linux-kernel
Lost Logic wrote:
> I'm running 2.4.16 with 2 IDE UDMA mode 4 drives, and I have
> experienced no such pausing no matter what I do. (which usually
> includes patching, extracting, and generally messing with kernels from
> Eterm with XMMS playing, and a couple mozillas open)
Ignore that. I know why I have no problems: I can extract kernels, build
kernels, etc. without any paging...
* Re: Unresponiveness of 2.4.16
2001-11-26 22:17 ` Alan Cox
2001-11-26 23:34 ` Nicolas Pitre
2001-11-26 23:59 ` Rik van Riel
@ 2001-11-27 1:45 ` Andrea Arcangeli
2 siblings, 0 replies; 53+ messages in thread
From: Andrea Arcangeli @ 2001-11-27 1:45 UTC (permalink / raw)
To: Alan Cox; +Cc: Nathan G. Grennan, linux-kernel
On Mon, Nov 26, 2001 at 10:17:06PM +0000, Alan Cox wrote:
> > 2.4.16 becomes very unresponsive for 30 seconds or so at a time during
> > large unarchiving of tarballs, like tar -zxf mozilla-src.tar.gz. The
> > file is about 36mb. I run top in one window, run free repeatedly in
>
> This seems to be one of the small as yet unresolved problems with the newer
> VM code in 2.4.16. I've not managed to prove its the VM or the differing
can you reproduce on 2.4.15aa1?
Andrea
* Re: Unresponiveness of 2.4.16
2001-11-26 22:02 Nathan G. Grennan
` (3 preceding siblings ...)
2001-11-27 0:44 ` Lost Logic
@ 2001-11-27 3:49 ` Sean Elble
2001-11-27 3:56 ` Doug Ledford
4 siblings, 1 reply; 53+ messages in thread
From: Sean Elble @ 2001-11-27 3:49 UTC (permalink / raw)
To: Nathan G. Grennan, linux-kernel
> I tried switching to Redhat's 2.4.9-13 kernel and it acts Alot better.
> Not only does 2.4.9-13 not get the 30 second delay, but it also seems to
> take advantage of caching. 2.4.16 takes the same moment of time each
> time, even tho it should have cached it all into memory the first time.
Unless Red Hat has specifically added Andrea's new VM code to the 2.4.9
kernel, that kernel is still using the old VM. The 2.4.10 (?) and later
kernels all use Andrea's new VM, and this includes 2.4.16 (obviously :-). My
guess is that it is a small VM-related problem, but I am certainly not a
programmer; I did see other replies to this problem, but I accidentally
deleted them before I could fully read them. :-( My suggestion would
definitely be to try other kernels; I would personally try 2.4.10,
2.4.12 and 2.4.14, since you have already tried 2.4.16. This would at the very
least tell you where the problem was introduced, and hopefully some of the
brilliant kernel people (not me) could take over from there. Hope that
helps.
-----------------------------------------------
Sean P. Elble
Editor, Writer, Co-Webmaster
ReactiveLinux.com (Formerly MaximumLinux.org)
http://www.reactivelinux.com/
elbles@reactivelinux.com
-----------------------------------------------
* Re: Unresponiveness of 2.4.16
2001-11-27 3:49 ` Sean Elble
@ 2001-11-27 3:56 ` Doug Ledford
2001-11-27 4:00 ` Sean Elble
0 siblings, 1 reply; 53+ messages in thread
From: Doug Ledford @ 2001-11-27 3:56 UTC (permalink / raw)
To: Sean Elble; +Cc: Nathan G. Grennan, linux-kernel
Sean Elble wrote:
>>I tried switching to Redhat's 2.4.9-13 kernel and it acts Alot better.
>>Not only does 2.4.9-13 not get the 30 second delay, but it also seems to
>>take advantage of caching. 2.4.16 takes the same moment of time each
>>time, even tho it should have cached it all into memory the first time.
>>
>
> Unless Red Hat has specifically added Andrea's new VM code to the 2.4.9
> kernel, then that kernel is still using the old VM.
Not exactly. That kernel is -ac based (plus lots of other patches, some
of them VM tweaks) and is a Van Riel VM.
--
Doug Ledford <dledford@redhat.com> http://people.redhat.com/dledford
Please check my web site for aic7xxx updates/answers before
e-mailing me about problems
* Re: Unresponiveness of 2.4.16
2001-11-27 3:56 ` Doug Ledford
@ 2001-11-27 4:00 ` Sean Elble
0 siblings, 0 replies; 53+ messages in thread
From: Sean Elble @ 2001-11-27 4:00 UTC (permalink / raw)
To: Doug Ledford; +Cc: Nathan G. Grennan, linux-kernel
> Not exactly. That kernel is -ac based (plus lots of other patches, some
> of them VM tweaks) and is a Van Riel VM.
Right; it's not the "stock" 2.4.9 VM, but it isn't Andrea's either . . . one
of those gray area things. :-) I guess we just have to wait until he posts
the results with the "stock" 2.4.9 kernel to see if Red Hat fixed the
problem or not. Have a good one!
* Re: Unresponiveness of 2.4.16
2001-11-26 22:44 ` Lincoln Dale
@ 2001-11-27 4:34 ` GOTO Masanori
0 siblings, 0 replies; 53+ messages in thread
From: GOTO Masanori @ 2001-11-27 4:34 UTC (permalink / raw)
To: ltd; +Cc: alan, ngrennan, linux-kernel
At Mon, 26 Nov 2001 14:44:19 -0800,
Lincoln Dale <ltd@cisco.com> wrote:
> At 10:17 PM 26/11/2001 +0000, Alan Cox wrote:
> > > 2.4.16 becomes very unresponsive for 30 seconds or so at a time during
> > > large unarchiving of tarballs, like tar -zxf mozilla-src.tar.gz. The
> > > file is about 36mb. I run top in one window, run free repeatedly in
> >
> >This seems to be one of the small as yet unresolved problems with the newer
> >VM code in 2.4.16. I've not managed to prove its the VM or the differing
> >I/O scheduling rules however.
>
> it is I/O scheduling.
>
> i have a system with a large amount of RAM.
> it has both 15K RPM SCSI disks (off a symbios controller) and some bog-slow
> IDE/ATA disks which the system decides to use PIO for rather than DMA. (i
> don't use them for anything other than bootup so don't really care about it
> deciding to use PIO..).
>
> a copy to/from the 15K RPM SCSI disks doesn't show any performance problems.
> a copy to/from the PIO-based IDE disks has the same effect -- 20/30 seconds
> of no interactiveness -- even a "vmstat 1" *stops* for 20-30 seconds while
> 200+MB of buffer-cache data gets written out to disk.
It seems this problem has been reported repeatedly...
Is it related to the IDE chip or the chipset code?
I use an Athlon on a KT133A with 2 IDE disks, and I'm also experiencing
this problem with only 1 disk. But I don't know whether it's PIO-based or not.
-- gotom
* Re: Unresponiveness of 2.4.16
2001-11-27 0:36 ` Andrew Morton
2001-11-27 0:46 ` Rik van Riel
@ 2001-11-27 4:38 ` Mike Fedyk
2001-11-27 4:45 ` Andrew Morton
1 sibling, 1 reply; 53+ messages in thread
From: Mike Fedyk @ 2001-11-27 4:38 UTC (permalink / raw)
To: Andrew Morton; +Cc: Rik van Riel, Alan Cox, Nathan G. Grennan, linux-kernel
On Mon, Nov 26, 2001 at 04:36:25PM -0800, Andrew Morton wrote:
> The patch I sent puts read requests near the head of the request
> queue, and to hell with aggregate throughput. It's tunable with
> `elvtune -b'. And it fixes it.
for i in `seq 9`; do elvtune -b $i /dev/hda; done
-b doesn't seem to change the "max_bomb_segments". Does your patch fix this?
Tested on 2.4.15-pre1.
MF
* Re: Unresponiveness of 2.4.16
2001-11-27 4:38 ` Mike Fedyk
@ 2001-11-27 4:45 ` Andrew Morton
0 siblings, 0 replies; 53+ messages in thread
From: Andrew Morton @ 2001-11-27 4:45 UTC (permalink / raw)
To: Mike Fedyk; +Cc: Rik van Riel, Alan Cox, Nathan G. Grennan, linux-kernel
Mike Fedyk wrote:
>
> On Mon, Nov 26, 2001 at 04:36:25PM -0800, Andrew Morton wrote:
> > The patch I sent puts read requests near the head of the request
> > queue, and to hell with aggregate throughput. It's tunable with
> > `elvtune -b'. And it fixes it.
>
> for i in `seq 9`; do elvtune -b $i /dev/hda; done
>
> -b doesn't seem to change the "max_bomb_segments". Does your patch fix this?
>
Yes, it does.
Presumably, once upon a time, max_bomb_segments actually did
something. But it's a complete no-op at present, so I co-opted it.
Nice name, but I'd prefer max_cluster_bombs.
-
* Re: Unresponiveness of 2.4.16
2001-11-26 22:21 ` Andrew Morton
@ 2001-11-27 7:42 ` Jens Axboe
2001-11-27 7:58 ` Mike Fedyk
2001-11-27 8:31 ` Andrew Morton
0 siblings, 2 replies; 53+ messages in thread
From: Jens Axboe @ 2001-11-27 7:42 UTC (permalink / raw)
To: Andrew Morton; +Cc: Nathan G. Grennan, linux-kernel
On Mon, Nov 26 2001, Andrew Morton wrote:
> 2: The current elevator design is downright cruel to humans in
> the presence of heavy write traffic.
max_bomb_segments logic was established to help absolutely _nothing_ a
long time ago.
I agree that the current i/o scheduler has really bad interactive
performance -- at first sight your changes look mostly like add-on
hacks though. Arjan's priority based scheme is more promising.
--
Jens Axboe
* Re: Unresponiveness of 2.4.16
2001-11-27 7:42 ` Jens Axboe
@ 2001-11-27 7:58 ` Mike Fedyk
2001-11-27 8:01 ` Jens Axboe
2001-11-27 8:31 ` Andrew Morton
1 sibling, 1 reply; 53+ messages in thread
From: Mike Fedyk @ 2001-11-27 7:58 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, Nathan G. Grennan, linux-kernel
On Tue, Nov 27, 2001 at 08:42:34AM +0100, Jens Axboe wrote:
> On Mon, Nov 26 2001, Andrew Morton wrote:
> > 2: The current elevator design is downright cruel to humans in
> > the presence of heavy write traffic.
>
> max_bomb_segments logic was established to help absolutely _nothing_ a
> long time ago.
>
> I agree that the current i/o scheduler has really bad interactive
> performance -- at first sight your changes looks mostly like add-on
> hacks though. Arjan's priority based scheme is more promising.
>
Based on pid priority or niceness?
* Re: Unresponiveness of 2.4.16
2001-11-27 7:58 ` Mike Fedyk
@ 2001-11-27 8:01 ` Jens Axboe
0 siblings, 0 replies; 53+ messages in thread
From: Jens Axboe @ 2001-11-27 8:01 UTC (permalink / raw)
To: Andrew Morton, Nathan G. Grennan, linux-kernel
On Mon, Nov 26 2001, Mike Fedyk wrote:
> On Tue, Nov 27, 2001 at 08:42:34AM +0100, Jens Axboe wrote:
> > On Mon, Nov 26 2001, Andrew Morton wrote:
> > > 2: The current elevator design is downright cruel to humans in
> > > the presence of heavy write traffic.
> >
> > max_bomb_segments logic was established to help absolutely _nothing_ a
> > long time ago.
> >
> > I agree that the current i/o scheduler has really bad interactive
> > performance -- at first sight your changes looks mostly like add-on
> > hacks though. Arjan's priority based scheme is more promising.
> >
>
> Based on pid priority or niceness?
None of the above yet. It isn't hard to add process I/O priority and
inherit that once the support is there in the i/o scheduler / block
layer, though.
--
Jens Axboe
* Re: Unresponiveness of 2.4.16
2001-11-27 7:42 ` Jens Axboe
2001-11-27 7:58 ` Mike Fedyk
@ 2001-11-27 8:31 ` Andrew Morton
2001-11-27 8:38 ` Jens Axboe
1 sibling, 1 reply; 53+ messages in thread
From: Andrew Morton @ 2001-11-27 8:31 UTC (permalink / raw)
To: Jens Axboe; +Cc: Nathan G. Grennan, linux-kernel
Jens Axboe wrote:
>
> I agree that the current i/o scheduler has really bad interactive
> performance -- at first sight your changes looks mostly like add-on
> hacks though.
Good hacks, or bad ones?
It keeps things localised. It works. It's tunable. It's the best
IO scheduler presently available.
> Arjan's priority based scheme is more promising.
If the IO priority becomes an attribute of the calling process
then an approach like that has value. For writes, the priority
should be driven by VM pressure and it's probably simpler just
to stick the priority into struct buffer_head -> struct request.
For reads, the priority could just be scooped out of *current.
If we're not going to push the IO priority all the way down from
userspace then you may as well keep the logic inside the elevator
and just say reads-go-here and writes-go-there.
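Here is a minimal sketch of the priority-ordered insertion being discussed, assuming each request carries an integer priority (higher is more urgent). Where the number comes from is left open, as in the discussion. A read might take it from the calling process, and a write might derive it from VM pressure. `struct prio_rq` and `prio_insert()` are hypothetical and are not kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request carrying an I/O priority (higher = more urgent). */
struct prio_rq {
	int prio;
	struct prio_rq *next;
};

/* Insert `rq` after every request whose priority is >= rq->prio, so
 * higher priorities are serviced first and equal priorities stay FIFO. */
void prio_insert(struct prio_rq **head, struct prio_rq *rq)
{
	struct prio_rq **pp = head;

	assert(head != NULL && rq != NULL);
	while (*pp != NULL && (*pp)->prio >= rq->prio)
		pp = &(*pp)->next;
	rq->next = *pp;
	*pp = rq;
}
```

Two writes queued at priority 1 followed by a read at priority 5 leave the read at the head, with the writes behind it in their original order.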
But this has potential to turn into a great designfest. Are
we going to leave 2.4 as-is? Please say no.
-
* Re: Unresponiveness of 2.4.16
2001-11-27 8:31 ` Andrew Morton
@ 2001-11-27 8:38 ` Jens Axboe
0 siblings, 0 replies; 53+ messages in thread
From: Jens Axboe @ 2001-11-27 8:38 UTC (permalink / raw)
To: Andrew Morton; +Cc: Nathan G. Grennan, linux-kernel
On Tue, Nov 27 2001, Andrew Morton wrote:
> Jens Axboe wrote:
> >
> > I agree that the current i/o scheduler has really bad interactive
> > performance -- at first sight your changes looks mostly like add-on
> > hacks though.
>
> Good hacks, or bad ones?
>
> It keeps things localised. It works. It's tunable. It's the best
> IO scheduler presently available.
Hacks look ok on cursory glances :-)
> > Arjan's priority based scheme is more promising.
>
> If the IO priority becomes an attribute of the calling process
> then an approach like that has value. For writes, the priority
> should be driven by VM pressure and it's probably simpler just
> to stick the priority into struct buffer_head -> struct request.
> For reads, the priority could just be scooped out of *current.
>
> If we're not going to push the IO priority all the way down from
> userspace then you may as well keep the logic inside the elevator
> and just say reads-go-here and writes-go-there.
Priority will be passed down for reads as you suggest, at least that is
the intention I had as well. I've only worked on 2.5 with this, but I
guess we can find some space in the buffer_head to squeeze in some
priority bits.
> But this has potential to turn into a great designfest. Are
Oh yeah
> we going to leave 2.4 as-is? Please say no.
I'd be happy to review anything you come up with -- or, in other words,
feel free to knock yourself out, I'm busy with other stuff currently :)
--
Jens Axboe
* RE: Unresponiveness of 2.4.16
2001-11-26 23:34 ` Nicolas Pitre
2001-11-27 0:05 ` Steve Lion
@ 2001-11-27 9:12 ` Ahmed Masud
2001-11-27 17:12 ` Andrew Morton
1 sibling, 1 reply; 53+ messages in thread
From: Ahmed Masud @ 2001-11-27 9:12 UTC (permalink / raw)
To: 'Nicolas Pitre', 'Alan Cox'
Cc: 'Nathan G. Grennan', 'lkml'
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Just to add to Nicolas's report, here is something I've experienced:
With 2.4.12 through 2.4.14 on a number of AMD Athlon 900 machines with
256 MB RAM doing serial I/O, data on the serial ports would be lost
whenever disk writes occurred. Reads were fine, but any significant
write, such as untarring a relatively large tarball (> 10 MB), triggered it.
Turning on UDMA for the PROMISE PDC20265 chipset reduced the sluggishness
significantly (by an order of magnitude), but the problem would still crop
up whenever more than three processes were writing to disk.
CPU: AMD Athlon 900
Chipset: VIA
IDE Controller: PROMISE PDC20265
Disks: IBM ATA100 IC35L020AVER07-0
I tried the same operations on ReiserFS, ext2 and ext3; on plain
partitions, on software RAID 1 devices and on LVM (1.0.1-rc4 patches from
Sistina). All combinations with all kernels from 2.4.12 through 2.4.14
yield identical results: loss of data while selecting on serial ports
during heavy writes to the filesystem.
Doing the same operation on the same hardware with 2.2.16 loses no data.
If I can get some guidance on what else to try to determine whether this
is a VM problem or an I/O subsystem problem, I'll be more than happy to
experiment and report the results.
Ahmed
-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 6.5.3 for non-commercial use <http://www.pgp.com>
iQA/AwUBPANZG+A+WVFT6/r4EQL7PgCg3dWSrBDxsxqCF6OY1YiKDiEd34sAnA4W
S6Zb2wfzBj6bXETTFNoYzTlW
=HFWs
-----END PGP SIGNATURE-----
* Re: Unresponiveness of 2.4.16
@ 2001-11-27 9:56 willy tarreau
2001-11-27 10:57 ` Heinz Diehl
0 siblings, 1 reply; 53+ messages in thread
From: willy tarreau @ 2001-11-27 9:56 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
> Please try this lot:
Hi Andrew,
I just tried 2.4.16 with and without your patch. During the
test, I wrote a 640 MB file on an IDE disk at an average speed
of 10 MB/s. Without your patch, I could easily reproduce the
sluggishness other people report, mostly at the login prompt.
But when I applied your patch, I could log in immediately, so
yes, I can say that your patch improves things dramatically.
I can't say yet if there are side effects, but I'm still
testing.
Regards,
Willy
* Re: Unresponiveness of 2.4.16
2001-11-27 9:56 willy tarreau
@ 2001-11-27 10:57 ` Heinz Diehl
0 siblings, 0 replies; 53+ messages in thread
From: Heinz Diehl @ 2001-11-27 10:57 UTC (permalink / raw)
To: linux-kernel
On Tue Nov 27 2001, willy tarreau wrote:
> I just tried 2.4.16 with and without your patch.
I applied Andrew's patch to 2.5.1-pre1.
> Without your patch, I could easily
> reproduce the sluggishness other people report, mostly
> at the login prompt. But when I applied your patch, I can
> log in immediately, so yes, I can say that your patch
> improves things dramatically.
The same thing here: with the patch applied, things improved; without
it, I can easily reproduce the unresponsiveness. It definitely fixes
the problem....
--
# Heinz Diehl, 68259 Mannheim, Germany
* Re: Unresponiveness of 2.4.16
2001-11-27 9:12 ` Ahmed Masud
@ 2001-11-27 17:12 ` Andrew Morton
2001-11-27 20:31 ` Mike Fedyk
0 siblings, 1 reply; 53+ messages in thread
From: Andrew Morton @ 2001-11-27 17:12 UTC (permalink / raw)
To: Ahmed Masud; +Cc: 'lkml'
Ahmed Masud wrote:
>
> Just to add to the above something I've experienced:
>
> 2.4.12 - 2.4.14 on a number of AMD Athlon 900 with 256 MB
> RAM doing serial I/O would miss data while any DISK writes would
> occur.
Two possibilities suggest themselves:
- Interrupt latency. Last time I checked (a year ago), the worst-case
interrupt latency of the IDE drivers was 80 microseconds on a 500MHz PII.
That was with `hdparm -u 1'. That's pretty good.
Could you please confirm that you're using `hdparm -u 1' against the
relevant disk?
- The serial port is working OK, but the application which is handling
serial IO is blocked on a disk read (something got paged out), and
that disk read fails to complete by the time the serial port buffer
fills up.
I'll send you a patch which makes the VM less inclined to page things
out in the presence of heavy writes, and which decreases read
latencies.
Thanks.
* Re: Unresponiveness of 2.4.16
2001-11-27 17:12 ` Andrew Morton
@ 2001-11-27 20:31 ` Mike Fedyk
2001-11-27 20:57 ` Andrew Morton
0 siblings, 1 reply; 53+ messages in thread
From: Mike Fedyk @ 2001-11-27 20:31 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ahmed Masud, 'lkml'
On Tue, Nov 27, 2001 at 09:12:13AM -0800, Andrew Morton wrote:
> Ahmed Masud wrote:
> >
> > Just to add to the above something I've experienced:
> >
> > 2.4.12 - 2.4.14 on a number of AMD Athlon 900 with 256 MB
> > RAM doing serial I/O would miss data while any DISK writes would
> > occur.
>
> Two possibilities suggest themselves:
>
> - Interrupt latency. Last time I checked (a year ago), the worst-case
> interrupt latency of the IDE drivers was 80 microseconds on a 500MHz PII.
> That was with `hdparm -u 1'. That's pretty good.
>
> Could you please confirm that you're using `hdparm -u 1' against the
> relevant disk?
>
> - The serial port is working OK, but the application which is handling
> serial IO is blocked on a disk read (something got paged out), and
> that disk read fails to complete by the time the serial port buffer
> fills up.
>
> I'll send you a patch which makes the VM less inclined to page things
> out in the presence of heavy writes, and which decreases read
> latencies.
>
Is this patch posted anywhere?
* Re: Unresponiveness of 2.4.16
2001-11-27 20:31 ` Mike Fedyk
@ 2001-11-27 20:57 ` Andrew Morton
2001-11-27 21:19 ` Martin Eriksson
` (2 more replies)
0 siblings, 3 replies; 53+ messages in thread
From: Andrew Morton @ 2001-11-27 20:57 UTC (permalink / raw)
To: Mike Fedyk; +Cc: Ahmed Masud, 'lkml'
Mike Fedyk wrote:
>
> > I'll send you a patch which makes the VM less inclined to page things
> > out in the presence of heavy writes, and which decreases read
> > latencies.
> >
> Is this patch posted anywhere?
I sent it yesterday, in this thread. Here it is again.
Description:
- Account for locked as well as dirty buffers when deciding
to throttle writers.
- Tweak VM to make it work the inactive list harder, before starting
to evict pages or swap.
- Change the elevator so that once a request's latency has
expired, we can still perform merges in front of that
request. But we no longer will insert new requests in
front of that request.
- Modify elevator so that new read requests do not have
more than N write requests placed in front of them, where
N is tunable per-device with `elvtune -b'.
Theoretically, the last change needs significant alterations
to the readahead code. But a rewrite of readahead made negligible
difference (I wasn't able to trigger the failure scenario).
Still crunching on this.
--- linux-2.4.16-pre1/fs/buffer.c Thu Nov 22 23:02:58 2001
+++ linux-akpm/fs/buffer.c Sun Nov 25 00:07:47 2001
@@ -1036,6 +1036,7 @@ static int balance_dirty_state(void)
unsigned long dirty, tot, hard_dirty_limit, soft_dirty_limit;
dirty = size_buffers_type[BUF_DIRTY] >> PAGE_SHIFT;
+ dirty += size_buffers_type[BUF_LOCKED] >> PAGE_SHIFT;
tot = nr_free_buffer_pages();
dirty *= 100;
--- linux-2.4.16-pre1/mm/filemap.c Sat Nov 24 13:14:52 2001
+++ linux-akpm/mm/filemap.c Sun Nov 25 00:07:47 2001
@@ -3023,7 +3023,18 @@ generic_file_write(struct file *file,con
unlock:
kunmap(page);
/* Mark it unlocked again and drop the page.. */
- SetPageReferenced(page);
+// SetPageReferenced(page);
+ ClearPageReferenced(page);
+#if 0
+ {
+ lru_cache_del(page);
+ TestSetPageLRU(page);
+ spin_lock(&pagemap_lru_lock);
+ list_add_tail(&(page)->lru, &inactive_list);
+ nr_inactive_pages++;
+ spin_unlock(&pagemap_lru_lock);
+ }
+#endif
UnlockPage(page);
page_cache_release(page);
--- linux-2.4.16-pre1/mm/vmscan.c Thu Nov 22 23:02:59 2001
+++ linux-akpm/mm/vmscan.c Sun Nov 25 00:08:03 2001
@@ -573,6 +573,9 @@ static int shrink_caches(zone_t * classz
nr_pages = shrink_cache(nr_pages, classzone, gfp_mask, priority);
if (nr_pages <= 0)
return 0;
+ nr_pages = shrink_cache(nr_pages, classzone, gfp_mask, priority);
+ if (nr_pages <= 0)
+ return 0;
shrink_dcache_memory(priority, gfp_mask);
shrink_icache_memory(priority, gfp_mask);
@@ -585,7 +588,7 @@ static int shrink_caches(zone_t * classz
int try_to_free_pages(zone_t *classzone, unsigned int gfp_mask, unsigned int order)
{
- int priority = DEF_PRIORITY;
+ int priority = DEF_PRIORITY - 2;
int nr_pages = SWAP_CLUSTER_MAX;
do {
--- linux-2.4.16/include/linux/elevator.h Thu Feb 15 16:58:34 2001
+++ linux-akpm/include/linux/elevator.h Tue Nov 27 12:34:59 2001
@@ -5,8 +5,9 @@ typedef void (elevator_fn) (struct reque
struct list_head *,
struct list_head *, int);
-typedef int (elevator_merge_fn) (request_queue_t *, struct request **, struct list_head *,
- struct buffer_head *, int, int);
+typedef int (elevator_merge_fn)(request_queue_t *, struct request **,
+ struct list_head *, struct buffer_head *bh,
+ int rw, int max_sectors, int max_bomb_segments);
typedef void (elevator_merge_cleanup_fn) (request_queue_t *, struct request *, int);
@@ -16,6 +17,7 @@ struct elevator_s
{
int read_latency;
int write_latency;
+ int max_bomb_segments;
elevator_merge_fn *elevator_merge_fn;
elevator_merge_cleanup_fn *elevator_merge_cleanup_fn;
@@ -24,13 +26,13 @@ struct elevator_s
unsigned int queue_ID;
};
-int elevator_noop_merge(request_queue_t *, struct request **, struct list_head *, struct buffer_head *, int, int);
-void elevator_noop_merge_cleanup(request_queue_t *, struct request *, int);
-void elevator_noop_merge_req(struct request *, struct request *);
-
-int elevator_linus_merge(request_queue_t *, struct request **, struct list_head *, struct buffer_head *, int, int);
-void elevator_linus_merge_cleanup(request_queue_t *, struct request *, int);
-void elevator_linus_merge_req(struct request *, struct request *);
+elevator_merge_fn elevator_noop_merge;
+elevator_merge_cleanup_fn elevator_noop_merge_cleanup;
+elevator_merge_req_fn elevator_noop_merge_req;
+
+elevator_merge_fn elevator_linus_merge;
+elevator_merge_cleanup_fn elevator_linus_merge_cleanup;
+elevator_merge_req_fn elevator_linus_merge_req;
typedef struct blkelv_ioctl_arg_s {
int queue_ID;
@@ -54,22 +56,6 @@ extern void elevator_init(elevator_t *,
#define ELEVATOR_FRONT_MERGE 1
#define ELEVATOR_BACK_MERGE 2
-/*
- * This is used in the elevator algorithm. We don't prioritise reads
- * over writes any more --- although reads are more time-critical than
- * writes, by treating them equally we increase filesystem throughput.
- * This turns out to give better overall performance. -- sct
- */
-#define IN_ORDER(s1,s2) \
- ((((s1)->rq_dev == (s2)->rq_dev && \
- (s1)->sector < (s2)->sector)) || \
- (s1)->rq_dev < (s2)->rq_dev)
-
-#define BHRQ_IN_ORDER(bh, rq) \
- ((((bh)->b_rdev == (rq)->rq_dev && \
- (bh)->b_rsector < (rq)->sector)) || \
- (bh)->b_rdev < (rq)->rq_dev)
-
static inline int elevator_request_latency(elevator_t * elevator, int rw)
{
int latency;
@@ -85,7 +71,7 @@ static inline int elevator_request_laten
((elevator_t) { \
0, /* read_latency */ \
0, /* write_latency */ \
- \
+ 0, /* max_bomb_segments */ \
elevator_noop_merge, /* elevator_merge_fn */ \
elevator_noop_merge_cleanup, /* elevator_merge_cleanup_fn */ \
elevator_noop_merge_req, /* elevator_merge_req_fn */ \
@@ -95,7 +81,7 @@ static inline int elevator_request_laten
((elevator_t) { \
8192, /* read passovers */ \
16384, /* write passovers */ \
- \
+ 6, /* max_bomb_segments */ \
elevator_linus_merge, /* elevator_merge_fn */ \
elevator_linus_merge_cleanup, /* elevator_merge_cleanup_fn */ \
elevator_linus_merge_req, /* elevator_merge_req_fn */ \
--- linux-2.4.16/drivers/block/elevator.c Thu Jul 19 20:59:41 2001
+++ linux-akpm/drivers/block/elevator.c Tue Nov 27 12:35:20 2001
@@ -74,36 +74,41 @@ inline int bh_rq_in_between(struct buffe
return 0;
}
-
int elevator_linus_merge(request_queue_t *q, struct request **req,
struct list_head * head,
struct buffer_head *bh, int rw,
- int max_sectors)
+ int max_sectors, int max_bomb_segments)
{
- struct list_head *entry = &q->queue_head;
- unsigned int count = bh->b_size >> 9, ret = ELEVATOR_NO_MERGE;
+ struct list_head *entry;
+ unsigned int count = bh->b_size >> 9;
+ unsigned int ret = ELEVATOR_NO_MERGE;
+ int no_in_between = 0;
+ entry = &q->queue_head;
while ((entry = entry->prev) != head) {
struct request *__rq = blkdev_entry_to_request(entry);
-
- /*
- * simply "aging" of requests in queue
- */
- if (__rq->elevator_sequence-- <= 0)
- break;
-
+ if (__rq->elevator_sequence-- <= 0) {
+ /*
+ * OK, we've exceeded someone's latency limit.
+ * But we still continue to look for merges,
+ * because they're so much better than seeks.
+ */
+ no_in_between = 1;
+ }
if (__rq->waiting)
continue;
if (__rq->rq_dev != bh->b_rdev)
continue;
- if (!*req && bh_rq_in_between(bh, __rq, &q->queue_head))
+ if (!*req && !no_in_between &&
+ bh_rq_in_between(bh, __rq, &q->queue_head)) {
*req = __rq;
+ }
if (__rq->cmd != rw)
continue;
if (__rq->nr_sectors + count > max_sectors)
continue;
if (__rq->elevator_sequence < count)
- break;
+ no_in_between = 1;
if (__rq->sector + __rq->nr_sectors == bh->b_rsector) {
ret = ELEVATOR_BACK_MERGE;
*req = __rq;
@@ -116,6 +121,56 @@ int elevator_linus_merge(request_queue_t
}
}
+ /*
+ * If we failed to merge a read anywhere in the request
+ * queue, we really don't want to place it at the end
+ * of the list, behind lots of writes. So place it near
+ * the front.
+ *
+ * We don't want to place it in front of _all_ writes: that
+ * would create lots of seeking, and isn't tunable.
+ * We try to avoid promoting this read in front of existing
+ * reads.
+ *
+ * max_bomb_segments becomes the maximum number of write
+ * requests which we allow to remain in place in front of
+ * a newly introduced read. We weight things a little bit,
+ * so large writes are more expensive than small ones, but it's
+ * requests which count, not sectors.
+ */
+ if (rw == READ && ret == ELEVATOR_NO_MERGE) {
+ int cur_latency = 0;
+ struct request * const cur_request = *req;
+
+ entry = head->next;
+ while (entry != &q->queue_head) {
+ struct request *__rq;
+
+ if (entry == &q->queue_head)
+ BUG();
+ if (entry == q->queue_head.next &&
+ q->head_active && !q->plugged)
+ BUG();
+ __rq = blkdev_entry_to_request(entry);
+
+ if (__rq == cur_request) {
+ /*
+ * This is where the old algorithm placed it.
+ * There's no point pushing it further back,
+ * so leave it here, in sorted order.
+ */
+ break;
+ }
+ if (__rq->cmd == WRITE) {
+ cur_latency += 1 + __rq->nr_sectors / 64;
+ if (cur_latency >= max_bomb_segments) {
+ *req = __rq;
+ break;
+ }
+ }
+ entry = entry->next;
+ }
+ }
return ret;
}
@@ -144,7 +199,7 @@ void elevator_linus_merge_req(struct req
int elevator_noop_merge(request_queue_t *q, struct request **req,
struct list_head * head,
struct buffer_head *bh, int rw,
- int max_sectors)
+ int max_sectors, int max_bomb_segments)
{
struct list_head *entry;
unsigned int count = bh->b_size >> 9;
@@ -188,7 +243,7 @@ int blkelvget_ioctl(elevator_t * elevato
output.queue_ID = elevator->queue_ID;
output.read_latency = elevator->read_latency;
output.write_latency = elevator->write_latency;
- output.max_bomb_segments = 0;
+ output.max_bomb_segments = elevator->max_bomb_segments;
if (copy_to_user(arg, &output, sizeof(blkelv_ioctl_arg_t)))
return -EFAULT;
@@ -207,9 +262,12 @@ int blkelvset_ioctl(elevator_t * elevato
return -EINVAL;
if (input.write_latency < 0)
return -EINVAL;
+ if (input.max_bomb_segments < 0)
+ return -EINVAL;
elevator->read_latency = input.read_latency;
elevator->write_latency = input.write_latency;
+ elevator->max_bomb_segments = input.max_bomb_segments;
return 0;
}
--- linux-2.4.16/drivers/block/ll_rw_blk.c Mon Nov 5 21:01:11 2001
+++ linux-akpm/drivers/block/ll_rw_blk.c Tue Nov 27 12:34:59 2001
@@ -690,7 +690,8 @@ again:
} else if (q->head_active && !q->plugged)
head = head->next;
- el_ret = elevator->elevator_merge_fn(q, &req, head, bh, rw,max_sectors);
+ el_ret = elevator->elevator_merge_fn(q, &req, head, bh,
+ rw, max_sectors, elevator->max_bomb_segments);
switch (el_ret) {
case ELEVATOR_BACK_MERGE:
* Re: Unresponiveness of 2.4.16
2001-11-27 20:57 ` Andrew Morton
@ 2001-11-27 21:19 ` Martin Eriksson
2001-11-27 21:24 ` Mike Fedyk
2001-11-28 18:24 ` Marcelo Tosatti
2 siblings, 0 replies; 53+ messages in thread
From: Martin Eriksson @ 2001-11-27 21:19 UTC (permalink / raw)
To: Andrew Morton, Mike Fedyk; +Cc: Ahmed Masud, 'lkml'
----- Original Message -----
From: "Andrew Morton" <akpm@zip.com.au>
To: "Mike Fedyk" <mfedyk@matchmail.com>
Cc: "Ahmed Masud" <masud@googgun.com>; "'lkml'"
<linux-kernel@vger.kernel.org>
Sent: Tuesday, November 27, 2001 9:57 PM
Subject: Re: Unresponiveness of 2.4.16
> Mike Fedyk wrote:
> >
> > > I'll send you a patch which makes the VM less inclined to page things
> > > out in the presence of heavy writes, and which decreases read
> > > latencies.
> > >
> > Is this patch posted anywhere?
>
> I sent it yesterday, in this thread. Here it is again.
<snip>
I have made it available at
http://www.cs.umu.se/~c97men/linux/am-response-2.4.16.patch
because I personally like a link or attachment, as that doesn't mess up the
whitespace...(goddamn OE)
I hope you don't mind?
Btw, I'm happily running your patch with
2.4.16 (final)
preempt-kernel-rml-2.4.16-1
ide.2.4.16-p1.11242001
/Martin
* Re: Unresponiveness of 2.4.16
2001-11-27 20:57 ` Andrew Morton
2001-11-27 21:19 ` Martin Eriksson
@ 2001-11-27 21:24 ` Mike Fedyk
2001-11-28 18:24 ` Marcelo Tosatti
2 siblings, 0 replies; 53+ messages in thread
From: Mike Fedyk @ 2001-11-27 21:24 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ahmed Masud, 'lkml'
On Tue, Nov 27, 2001 at 12:57:19PM -0800, Andrew Morton wrote:
> Mike Fedyk wrote:
> >
> > > I'll send you a patch which makes the VM less inclined to page things
> > > out in the presence of heavy writes, and which decreases read
> > > latencies.
> > >
> > Is this patch posted anywhere?
>
> I sent it yesterday, in this thread. Here it is again.
>
Yep, saw it. I didn't realize (didn't read patch) that it modified the VM
swapping.
> Description:
>
> - Account for locked as well as dirty buffers when deciding
> to throttle writers.
>
> - Tweak VM to make it work the inactive list harder, before starting
> to evict pages or swap.
>
> - Change the elevator so that once a request's latency has
> expired, we can still perform merges in front of that
> request. But we no longer will insert new requests in
> front of that request.
>
> - Modify elevator so that new read requests do not have
> more than N write requests placed in front of them, where
> N is tunable per-device with `elvtune -b'.
>
> Theoretically, the last change needs significant alterations
> to the readahead code. But a rewrite of readahead made negligible
> difference (I wasn't able to trigger the failure scenario).
> Still crunching on this.
>
Sounds great.
I'll test it out.
MF
* RE: Unresponiveness of 2.4.16
@ 2001-11-28 0:33 Torrey Hoffman
2001-11-28 0:48 ` Andrew Morton
0 siblings, 1 reply; 53+ messages in thread
From: Torrey Hoffman @ 2001-11-28 0:33 UTC (permalink / raw)
To: 'Andrew Morton'; +Cc: 'lkml'
I'm running 2.4.16 with this VM patch combined with your
2.4.15-pre7-low-latency patch from www.zip.com.au. (it applied with a
little fuzz, no rejects). Is this a combination that you would feel
comfortable with?
So far it hasn't blown up on me, and in fact seems very quick and
responsive.
Unless I hear a "No, don't do that!", I'm going to push this kernel into
testing for our video applications...
Thanks!
Torrey Hoffman
torrey.hoffman@myrio.com
Andrew Morton wrote:
[...]
> Description:
>
> - Account for locked as well as dirty buffers when deciding
> to throttle writers.
>
> - Tweak VM to make it work the inactive list harder, before starting
> to evict pages or swap.
>
> - Change the elevator so that once a request's latency has
> expired, we can still perform merges in front of that
> request. But we no longer will insert new requests in
> front of that request.
>
> - Modify elevator so that new read requests do not have
> more than N write requests placed in front of them, where
> N is tunable per-device with `elvtune -b'.
>
> Theoretically, the last change needs significant alterations
> to the readahead code. But a rewrite of readahead made negligible
> difference (I wasn't able to trigger the failure scenario).
> Still crunching on this.
* Re: Unresponiveness of 2.4.16
2001-11-28 0:33 Unresponiveness of 2.4.16 Torrey Hoffman
@ 2001-11-28 0:48 ` Andrew Morton
2001-11-28 18:09 ` Marcelo Tosatti
0 siblings, 1 reply; 53+ messages in thread
From: Andrew Morton @ 2001-11-28 0:48 UTC (permalink / raw)
To: Torrey Hoffman; +Cc: 'lkml'
Torrey Hoffman wrote:
>
> I'm running 2.4.16 with this VM patch combined with your
> 2.4.15-pre7-low-latency patch from www.zip.com.au. (it applied with a
> little fuzz, no rejects). Is this a combination that you would feel
> comfortable with?
Should be OK. There is a possibility of livelock when you have
a lot of dirty buffers against multiple devices. It may
be a good idea to pick up the 2.4.16 low-latency patch.
http://www.zip.com.au/~akpm/linux/2.4.16-low-latency.patch.gz
> So far it hasn't blown up on me, and in fact seems very quick and
> responsive.
>
> Unless I hear a "No, don't do that!", I'm going to push this kernel into
> testing for our video applications...
If any quantitative results become available, please share...
-
* RE: Unresponiveness of 2.4.16
@ 2001-11-28 1:31 Dieter Nützel
2001-11-28 2:13 ` Andrew Morton
0 siblings, 1 reply; 53+ messages in thread
From: Dieter Nützel @ 2001-11-28 1:31 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux Kernel List
Andrew Morton wrote:
> Jens Axboe wrote:
> >
> > I agree that the current i/o scheduler has really bad interactive
> > performance -- at first sight your changes looks mostly like add-on
> > hacks though.
>
> Good hacks, or bad ones?
As far as I can "see", not so good.
I've tried "dbench 32" while playing an MP3 with Noatun (KDE-2.2.2) and "saw"
the hiccup I have been reporting since 2.4.7-ac4, as always.
Noatun stops after 9-10 seconds of the "dbench 32" run and then every few
seconds, again and again. The hiccups take place more often, but for shorter
times than without your patch.
System was:
2.4.16 +
preempt +
lock-break-rml-2.4.16-1.patch +
all ReiserFS patches for 2.4.16
1 GHz Athlon II
MSI MS-6167 Rev 1.0B (AMD Irongate C4, without bypass)
640 MB PC100-2-2-2 SDRAM
U160 IBM 18 GB disk
AHA-2940 UW
> It keeps things localised. It works. It's tunable. It's the best
> IO scheduler presently available.
Throughput was a little lower ;-)
Don't forget to tune max-readahead.
I've used 127 and that gave me 4 MB/s (at the end) to 6 MB/s (at the
beginning of the disk) more transfer rate.
Write caching is off by default on all of my disks, and it didn't offer
much gain with dbench and bonnie++.
> > Arjan's priority based scheme is more promising.
>
> If the IO priority becomes an attribute of the calling process
> then an approach like that has value. For writes, the priority
> should be driven by VM pressure and it's probably simpler just
> to stick the priority into struct buffer_head -> struct request.
> For reads, the priority could just be scooped out of *current.
Yes, please. I think, too, that we need IO priority even for "little"
IO-consuming (weak) RT tasks (MP3, DVD, etc).
> If we're not going to push the IO priority all the way down from
> userspace then you may as well keep the logic inside the elevator
> and just say reads-go-here and writes-go-there.
>
> But this has potential to turn into a great designfest. Are
> we going to leave 2.4 as-is? Please say no.
I'll second that.
Thank you for your work, Andrew!
-Dieter
--
Dieter Nützel
Graduate Student, Computer Science
University of Hamburg
Department of Computer Science
@home: Dieter.Nuetzel@hamburg.de
* Re: Unresponiveness of 2.4.16
2001-11-28 1:31 Dieter Nützel
@ 2001-11-28 2:13 ` Andrew Morton
2001-11-28 2:34 ` Mike Fedyk
0 siblings, 1 reply; 53+ messages in thread
From: Andrew Morton @ 2001-11-28 2:13 UTC (permalink / raw)
To: Dieter Nützel; +Cc: Linux Kernel List
Dieter Nützel wrote:
>
> Andrew Morton wrote:
> > Jens Axboe wrote:
> > >
> > > I agree that the current i/o scheduler has really bad interactive
> > > performance -- at first sight your changes looks mostly like add-on
> > > hacks though.
> >
> > Good hacks, or bad ones?
>
> As far as I can "see", not so good.
> I've tried "dbench 32" while playing an MP3 with Noatun (KDE-2.2.2) and "saw"
> the hiccup I have been reporting since 2.4.7-ac4, as always.
Ah. dbench. The change to balance_dirty_state() absolutely
cripples dbench throughput. And that really doesn't matter,
unless you want to run dbench for a living.
You can get the dbench throughput back by increasing the
async and sync dirty buffer writeback thresholds:
echo 70 64 64 256 30000 3000 80 0 0 > /proc/sys/vm/bdflush
> Noatun stops after 9-10 seconds of the "dbench 32" run and then every few
> seconds, again and again. The hiccups take place more often, but for shorter
> times than without your patch.
Probably Noatun needs larger buffers if it is to survive concurrent
dbench. You may see improvement with
elvtune -b N /dev/hdaX
where 8 >= N >= 1.
> System was:
>
> 2.4.16 +
> preempt +
> lock-break-rml-2.4.16-1.patch +
> all ReiserFS patches for 2.4.16
>
> 1 GHz Athlon II
> MSI MS-6167 Rev 1.0B (AMD Irongate C4, without bypass)
> 640 MB PC100-2-2-2 SDRAM
> U160 IBM 18 GB disk
> AHA-2940 UW
>
> > It keeps things localised. It works. It's tunable. It's the best
> > IO scheduler presently available.
>
> Throughput was a little lower ;-)
dbench? Throughput seems to scale with the fourth power of the
amount of RAM you chuck at it :)
> Don't forget to tune max-readahead.
Yes. Readahead is fairly critical and there may be additional fixes
needed in this area.
Someone recently added the /proc/sys/vm/max_readahead (?) tunable.
Beware of this. It only works for device drivers which do not
populate their own readahead table. For IDE, it *looks* like
it works, but it doesn't. For IDE, the only way to alter VM
readahead is via
echo file_readahead:N > /proc/ide/ide0/hda/settings
where N is in kilobytes in 2.4.16 kernels. In earlier kernels
it's kilopages (!).
-
* Re: Unresponiveness of 2.4.16
2001-11-28 2:13 ` Andrew Morton
@ 2001-11-28 2:34 ` Mike Fedyk
2001-11-28 2:48 ` Andrew Morton
` (2 more replies)
0 siblings, 3 replies; 53+ messages in thread
From: Mike Fedyk @ 2001-11-28 2:34 UTC (permalink / raw)
To: Andrew Morton; +Cc: Dieter Nützel, Linux Kernel List
On Tue, Nov 27, 2001 at 06:13:41PM -0800, Andrew Morton wrote:
> Dieter Nützel wrote:
> > Don't forget to tune max-readahead.
>
> Yes. Readahead is fairly critical and there may be additional fixes
> needed in this area.
>
> Someone recently added the /proc/sys/vm/max_readahead (?) tunable.
> Beware of this. It only works for device drivers which do not
> populate their own readahead table. For IDE, it *looks* like
> it works, but it doesn't. For IDE, the only way to alter VM
> readahead is via
>
> echo file_readahead:N > /proc/ide/ide0/hda/settings
>
> where N is in kilobytes in 2.4.16 kernels.
Any idea which drivers it will/won't work on? i.e., "almost all ide" or
"almost none of the ide drivers"?
> In earlier kernels
> it's kilopages (!).
Isn't this part of the max-readahead patch?
Does /proc/sys/vm/max_readahead affect scsi in any way?
What layer does /proc/sys/vm/max_readahead affect? Block? FS?
MF
* Re: Unresponiveness of 2.4.16
2001-11-28 2:34 ` Mike Fedyk
@ 2001-11-28 2:48 ` Andrew Morton
2001-11-28 20:21 ` Roger Larsson
2001-11-28 3:53 ` Dieter Nützel
[not found] ` <200111280353.fAS3rEB05638@zero.tech9.net>
2 siblings, 1 reply; 53+ messages in thread
From: Andrew Morton @ 2001-11-28 2:48 UTC (permalink / raw)
To: Mike Fedyk; +Cc: Dieter Nützel, Linux Kernel List
Mike Fedyk wrote:
>
> > echo file_readahead:N > /proc/ide/ide0/hda/settings
> >
> > where N is in kilobytes in 2.4.16 kernels.
>
> Any idea which drivers it will/won't work on? i.e., "almost all ide" or
> "almost none of the ide drivers"?
It appears that all IDE is controlled with /proc/ide/ide0/hda/settings
> > In earlier kernels
> > it's kilopages (!).
>
> Isn't this part of the max-readahead patch?
No, that fix went in separately. Roger Larsson created it, then
I hit the same problem and forwarded Roger's patch to the relevant
parties.
> Does /proc/sys/vm/max_readahead affect scsi in any way?
Well, `grep -r max_readahead drivers/scsi' comes up blank,
so it looks like the scsi drivers don't implement the
driver-specific readahead tunable, and so they will fall back
to the /proc/sys/vm/max_readahead global. I guess.
> What layer does /proc/sys/vm/max_readahead affect? Block? FS?
The generic filesystem library code. The bit which sits
on top of the block layer and gets its block mappings from the
filesystem and does generic_file_readahead(). Variously
referred to as VFS or VM. It's neither, and both, really.
* Re: Unresponiveness of 2.4.16
2001-11-28 2:34 ` Mike Fedyk
2001-11-28 2:48 ` Andrew Morton
@ 2001-11-28 3:53 ` Dieter Nützel
[not found] ` <200111280353.fAS3rEB05638@zero.tech9.net>
2 siblings, 0 replies; 53+ messages in thread
From: Dieter Nützel @ 2001-11-28 3:53 UTC (permalink / raw)
To: Mike Fedyk, Andrew Morton; +Cc: Linux Kernel List, Robert Love
Am Mittwoch, 28. November 2001 03:34 schrieb Mike Fedyk:
> On Tue, Nov 27, 2001 at 06:13:41PM -0800, Andrew Morton wrote:
> > Dieter Nützel wrote:
> > > Don't forget to tune max-readahead.
> >
> > Yes. Readahead is fairly critical and there may be additional fixes
> > needed in this area.
> >
> > Someone recently added the /proc/sys/vm/max_readahead (?) tunable.
-mt (Marcelo Tosatti) our _new_ 2.4.x maintainer did it.
> Isn't this part of the max-readahead patch?
>
> Does /proc/sys/vm/max_readahead affect scsi in any way?
Hello people, can you read?
I've reported U160 (SCSI) IBM DDYS (Ultrastar 36LZX) 18 GB 10k results...;-)
Kernel default:
SunWave1 src/linux# cat /proc/sys/vm/min-readahead
3
SunWave1 src/linux# cat /proc/sys/vm/max-readahead
31
SunWave1 src/linux# hdparm -tT /dev/sda1
/dev/sda1:
Timing buffer-cache reads: 128 MB in 0.80 seconds =160.00 MB/sec
Timing buffered disk reads: 64 MB in 2.28 seconds = 28.07 MB/sec
SunWave1 src/linux# cat /proc/sys/vm/max-readahead
127
SunWave1 src/linux# hdparm -tT /dev/sda1
/dev/sda1:
Timing buffer-cache reads: 128 MB in 0.80 seconds =160.00 MB/sec
Timing buffered disk reads: 64 MB in 1.87 seconds = 34.22 MB/sec
So it improved hdparm by 0.5 MB/s at the inner and 6 MB/s at the outer cylinders.
max-readahead=31    max-readahead=127
26-28 MB/s          26.5-34 MB/s
max-readahead=63 is nearly the same
max-readahead=255 a little slower
max-readahead=511 even slower
Here is a snippet of the IBM specs:
Performance
  Data buffer               4 MB
  Rotational speed          10,000 RPM
  Latency (average)         2.99 ms
  Media transfer rate       280-452 Mbits/sec
  Interface transfer rate   160 MB/sec
  Sustained data rate       21.7-36.1 MB/sec
Seek time
  Average                   4.9 ms
  Track to track            0.5 ms
  Full track                10.5 ms
To Robert Love:
I get the following in dmesg:
lock-break-rml-2.4.16-1.patch
date: busy buffer
lock_break: buffer.c:681: count was 2 not 551
invalidate: busy buffer
lock_break: buffer.c:681: count was 2 not 551
invalidate: busy buffer
[-]
lock-break-rml-2.4.16-2.patch
validate: busy buffer
invalidate: busy buffer
invalidate: busy buffer
invalidate: busy buffer
[-]
Now my dbench numbers.
First without Noatun playing Ogg-Vorbis:
dbench/dbench> time ./dbench 32
32 clients started
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................+................+..............................................................................+++.+.+.......+.....++++....++.....+++++.++++.++.+++++++********************************
Throughput 43.8254 MB/sec (NB=54.7818 MB/sec 438.254 MBit/sec)
14.490u 53.230s 1:37.40 69.5% 0+0k 0+0io 937pf+0w
system load: 23.52
Second, with Noatun playing Ogg-Vorbis (with a hiccup):
dbench/dbench> time ./dbench 32
32 clients started
...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................+....+........................+.......++++...+.+...........+.+.....+++++.+.++..++..+++.++.++++.++********************************
Throughput 42.1212 MB/sec (NB=52.6515 MB/sec 421.212 MBit/sec)
14.710u 53.940s 1:41.29 67.7% 0+0k 0+0io 937pf+0w
system load: 26.30
Not bad, I think.
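The three throughput figures dbench prints appear to be fixed multiples of each other; judging only from these two runs, NB = 1.25 x Throughput and the MBit figure is NB x 8. A short C sketch of that arithmetic and of the cost of audio playback, using only the numbers reported above (the ratios are inferred from this output, not taken from dbench's source):

```c
/* Ratios inferred from the two dbench runs above (an assumption, not
 * dbench's documented behaviour): NB = 1.25 * Throughput and
 * MBit/sec = NB * 8. */
static double nb_mb_per_sec(double throughput)
{
    return throughput * 1.25;
}

static double nb_mbit_per_sec(double throughput)
{
    return nb_mb_per_sec(throughput) * 8.0;
}

/* Percentage change going from `from` to `to` (negative = decrease). */
static double pct_change(double from, double to)
{
    return 100.0 * (to - from) / from;
}
```

For the quiet run this gives NB of about 54.78 MB/sec and about 438.25 MBit/sec; with Ogg-Vorbis playing, throughput changes by about -3.9% (43.8254 to 42.1212 MB/sec) and wall-clock time by about +4.0% (1:37.40 to 1:41.29).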
Andrew, your patch follows tomorrow.
Regards,
Dieter
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Unresponiveness of 2.4.16
[not found] ` <200111280353.fAS3rEB05638@zero.tech9.net>
@ 2001-11-28 4:14 ` Robert Love
0 siblings, 0 replies; 53+ messages in thread
From: Robert Love @ 2001-11-28 4:14 UTC (permalink / raw)
To: Dieter Nützel; +Cc: Mike Fedyk, Andrew Morton, Linux Kernel List
On Tue, 2001-11-27 at 22:53, Dieter Nützel wrote:
> To Robert Love:
> I get the following in dmesg:
> lock-break-rml-2.4.16-1.patch
>
> date: busy buffer
> lock_break: buffer.c:681: count was 2 not 551
> invalidate: busy buffer
> lock_break: buffer.c:681: count was 2 not 551
> invalidate: busy buffer
Thanks for the feedback, Dieter.
Robert Love
* Re: Unresponiveness of 2.4.16
2001-11-28 0:48 ` Andrew Morton
@ 2001-11-28 18:09 ` Marcelo Tosatti
2001-11-28 19:38 ` Andrew Morton
0 siblings, 1 reply; 53+ messages in thread
From: Marcelo Tosatti @ 2001-11-28 18:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: Torrey Hoffman, 'lkml'
On Tue, 27 Nov 2001, Andrew Morton wrote:
> Torrey Hoffman wrote:
> >
> > I'm running 2.4.16 with this VM patch combined with your
> > 2.4.15-pre7-low-latency patch from www.zip.com.au. (it applied with a
> > little fuzz, no rejects). Is this a combination that you would feel
> > comfortable with?
>
> Should be OK. There is a possibility of livelock when you have
> a lot of dirty buffers against multiple devices.
Could you please describe this one?
* Re: Unresponiveness of 2.4.16
2001-11-27 20:57 ` Andrew Morton
2001-11-27 21:19 ` Martin Eriksson
2001-11-27 21:24 ` Mike Fedyk
@ 2001-11-28 18:24 ` Marcelo Tosatti
2001-11-28 18:57 ` Marcelo Tosatti
2001-11-28 20:31 ` Andrew Morton
2 siblings, 2 replies; 53+ messages in thread
From: Marcelo Tosatti @ 2001-11-28 18:24 UTC (permalink / raw)
To: Andrew Morton; +Cc: Mike Fedyk, Ahmed Masud, 'lkml'
On Tue, 27 Nov 2001, Andrew Morton wrote:
> Mike Fedyk wrote:
> >
> > > I'll send you a patch which makes the VM less inclined to page things
> > > out in the presence of heavy writes, and which decreases read
> > > latencies.
> > >
> > Is this patch posted anywhere?
>
> I sent it yesterday, in this thread. Here it is again.
>
> Description:
>
> - Account for locked as well as dirty buffers when deciding
> to throttle writers.
Just one thing: if we have lots of locked buffers due to reads, we may
unnecessarily block writes, and that's not good.
But I would rather fix interactivity than worry about that one kind
of workload, so I'm OK with it.
> - Tweak VM to make it work the inactive list harder, before starting
> to evict pages or swap.
I would like to see the interactivity problems fixed on the block layer
side first: it's not a VM issue initially. Actually, the thing is that if
you tweak the VM this way you're going to break some workloads.
> - Change the elevator so that once a request's latency has
> expired, we can still perform merges in front of that
> request. But we no longer will insert new requests in
> front of that request.
Sounds fine... I've received quite a few success reports already, right?
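The elevator rule quoted above (merges still allowed around an expired request, but no new request inserted ahead of it) can be modelled in a few lines. This is a hypothetical sketch assuming a simple array-based queue and a per-request latency counter; `struct req`, `struct queue` and `elv_add` are our names, not the 2.4 elevator code:

```c
/* A request covers sectors [sector, sector + nr). `latency` counts how
 * many more times a newer request may be inserted in front of it; at 0
 * the request has "expired". */
struct req {
    long sector;
    int nr;
    int latency;
};

#define MAX_REQS 64

struct queue {
    struct req r[MAX_REQS];
    int count;
};

/* Returns the index of the newly inserted request, -1 if the I/O was
 * merged into an existing request, or -2 if the queue is full. */
static int elv_add(struct queue *q, long sector, int nr, int init_latency)
{
    int i, barrier = 0, pos;

    /* 1. Merging is allowed against any request, expired or not. */
    for (i = 0; i < q->count; i++) {
        struct req *rq = &q->r[i];
        if (rq->sector + rq->nr == sector) {    /* back merge */
            rq->nr += nr;
            return -1;
        }
        if (sector + nr == rq->sector) {        /* front merge */
            rq->sector = sector;
            rq->nr += nr;
            return -1;
        }
    }
    if (q->count == MAX_REQS)
        return -2;

    /* 2. A new request may not be placed ahead of an expired one. */
    for (i = 0; i < q->count; i++)
        if (q->r[i].latency <= 0)
            barrier = i + 1;

    /* 3. Behind that barrier, keep ascending sector order. */
    for (pos = barrier; pos < q->count; pos++)
        if (q->r[pos].sector > sector)
            break;

    /* 4. Every request we jump in front of loses one unit of latency. */
    for (i = q->count; i > pos; i--) {
        q->r[i] = q->r[i - 1];
        q->r[i].latency--;
    }
    q->r[pos] = (struct req){ sector, nr, init_latency };
    q->count++;
    return pos;
}
```

For example, with an expired request at sector 100 at the head of the queue, a new request for sector 50 is placed behind it, while a request ending exactly at sector 100 is still front-merged into it.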
* RE: Unresponiveness of 2.4.16
@ 2001-11-28 18:56 Torrey Hoffman
2001-11-28 19:31 ` Andrew Morton
0 siblings, 1 reply; 53+ messages in thread
From: Torrey Hoffman @ 2001-11-28 18:56 UTC (permalink / raw)
To: 'Andrew Morton', Dieter Nützel; +Cc: Linux Kernel List
Hmm. Speaking of dbench, I tried the combination of 2.4.16,
your 2.4.16 low latency patch, and the IO scheduling patch
on my dual PIII.
After starting it up I did a dbench 32 on a 180 GB reiserfs
running on software RAID 5, just to see if it would
fall over, and during the run I got the following error/
warning message printed about 20 times on the console
and in the kernel log:
vs-4150: reiserfs_new_blocknrs, block not free<4>
Took it to single user mode after that and ran reiserfsck,
which printed a lot of stuff but I don't think it found any
problems.
Went back to 2.4.15-pre5 and could not reproduce the problem
on that kernel.
Torrey
* Re: Unresponiveness of 2.4.16
2001-11-28 18:24 ` Marcelo Tosatti
@ 2001-11-28 18:57 ` Marcelo Tosatti
2001-11-28 20:31 ` Andrew Morton
1 sibling, 0 replies; 53+ messages in thread
From: Marcelo Tosatti @ 2001-11-28 18:57 UTC (permalink / raw)
To: Andrew Morton; +Cc: Mike Fedyk, Ahmed Masud, 'lkml'
On Wed, 28 Nov 2001, Marcelo Tosatti wrote:
>
>
> On Tue, 27 Nov 2001, Andrew Morton wrote:
>
> > Mike Fedyk wrote:
> > >
> > > > I'll send you a patch which makes the VM less inclined to page things
> > > > out in the presence of heavy writes, and which decreases read
> > > > latencies.
> > > >
> > > Is this patch posted anywhere?
> >
> > I sent it yesterday, in this thread. Here it is again.
> >
> > Description:
> >
> > - Account for locked as well as dirty buffers when deciding
> > to throttle writers.
>
> Just one thing: if we have lots of locked buffers due to reads, we may
> unnecessarily block writes, and that's not good.
>
> But I would rather fix interactivity than worry about that one kind
> of workload, so I'm OK with it.
>
> > - Tweak VM to make it work the inactive list harder, before starting
> > to evict pages or swap.
>
> I would like to see the interactivity problems fixed on the block layer
> side first: it's not a VM issue initially. Actually, the thing is that if
> you tweak the VM this way you're going to break some workloads.
>
> > - Change the elevator so that once a request's latency has
> > expired, we can still perform merges in front of that
> > request. But we no longer will insert new requests in
> > front of that request.
>
> Sounds fine... I've received quite a few success reports already, right?
Err...
s/I/you/
* Re: Unresponiveness of 2.4.16
2001-11-28 18:56 Torrey Hoffman
@ 2001-11-28 19:31 ` Andrew Morton
0 siblings, 0 replies; 53+ messages in thread
From: Andrew Morton @ 2001-11-28 19:31 UTC (permalink / raw)
To: Torrey Hoffman; +Cc: Dieter Nützel, Linux Kernel List
Torrey Hoffman wrote:
>
> Hmm. Speaking of dbench, I tried the combination of 2.4.16,
> your 2.4.16 low latency patch, and the IO scheduling patch
> on my dual PIII.
>
> After starting it up I did a dbench 32 on a 180 GB reiserfs
> running on software RAID 5, just to see if it would
> fall over, and during the run I got the following error/
> warning message printed about 20 times on the console
> and in the kernel log:
>
> vs-4150: reiserfs_new_blocknrs, block not free<4>
>
uh-oh. I probably broke reiserfs in the low-latency patch.
It's fairly harmless - we drop the big kernel lock, schedule
away. Upon resumption, the block we had decided to allocate
has been allocated by someone else. The filesystem emits a
warning and goes off to find a different block.
Will fix.
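What Andrew describes is the classic "re-validate after re-acquiring the lock" pattern. A minimal single-threaded C sketch, assuming a byte-per-block bitmap and a callback standing in for the point where the big kernel lock is dropped; all names here are hypothetical and this is not reiserfs code:

```c
#include <stdio.h>

/* Hypothetical stand-in for "drop the big kernel lock and schedule":
 * whatever runs while we are away may modify the bitmap. */
typedef void (*sched_hook)(unsigned char *bitmap, int nblocks);

/* First free block, or -1. bitmap[i] != 0 means the block is in use. */
static int find_free(const unsigned char *bitmap, int nblocks)
{
    for (int i = 0; i < nblocks; i++)
        if (!bitmap[i])
            return i;
    return -1;
}

/* Pick a free block, possibly reschedule, then re-check before claiming.
 * Returns the allocated block number, or -1 if no block is free. */
static int alloc_block(unsigned char *bitmap, int nblocks, sched_hook resched)
{
    for (;;) {
        int b = find_free(bitmap, nblocks);
        if (b < 0)
            return -1;

        if (resched)
            resched(bitmap, nblocks);   /* lock dropped here */

        if (bitmap[b]) {
            /* Someone else took it while we slept: warn and search again. */
            fprintf(stderr, "warning: block %d not free, retrying\n", b);
            continue;
        }
        bitmap[b] = 1;                  /* still free: claim it */
        return b;
    }
}

/* Demo hook: pretend another task grabs the first free block, once. */
static int demo_steals = 1;

static void demo_steal_once(unsigned char *bitmap, int nblocks)
{
    int b = find_free(bitmap, nblocks);
    if (demo_steals > 0 && b >= 0) {
        bitmap[b] = 1;
        demo_steals--;
    }
}
```

With blocks {1, 0, 0, 0} and demo_steal_once as the hook, the allocator picks block 1, finds it taken after "rescheduling", warns, searches again and returns block 2.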
-
* Re: Unresponiveness of 2.4.16
2001-11-28 18:09 ` Marcelo Tosatti
@ 2001-11-28 19:38 ` Andrew Morton
0 siblings, 0 replies; 53+ messages in thread
From: Andrew Morton @ 2001-11-28 19:38 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Torrey Hoffman, 'lkml'
Marcelo Tosatti wrote:
>
> On Tue, 27 Nov 2001, Andrew Morton wrote:
>
> > Torrey Hoffman wrote:
> > >
> > > I'm running 2.4.16 with this VM patch combined with your
> > > 2.4.15-pre7-low-latency patch from www.zip.com.au. (it applied with a
> > > little fuzz, no rejects). Is this a combination that you would feel
> > > comfortable with?
> >
> > Should be OK. There is a possibility of livelock when you have
> > a lot of dirty buffers against multiple devices.
>
> Could you please describe this one?
It's a recurring problem with the low-latency patch. Basically:
restart:
	spin_lock(some_lock);
	for (lots of data) {
		if (current->need_resched) {
			spin_unlock(some_lock);
			schedule();
			goto restart;
		}
		if (something_which_is_often_true)
			continue;
		other_stuff();
	}
If there is a realtime task which wants to be scheduled at,
say, one kilohertz, and the execution of that loop takes
more than one millisecond before it actually hits other_stuff()
and does any actual work, we make no progress at all, and we lock
up until the 1 kHz scheduling pressure is stopped.
In the 2.4.15-pre low-latency patch this can happen if we're
running fsync_dev(devA) and there are heaps of buffers for
devB on a list.
It's not a problem in your kernel ;)
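To see why the loop above can make no progress at all, here is a toy, tick-based C model of it. Everything in it is hypothetical (the function name, the one-tick cost per step, the list layout standing in for devB buffers in front of devA buffers); it is not kernel code. The `restart` flag mimics the `goto restart`; keeping one's place across schedule() is shown only for contrast, and in real code that is only safe if the list cannot change while the lock is dropped, which this model ignores.

```c
#include <stdbool.h>

/* The list holds `skip_prefix` entries that hit the `continue` branch,
 * followed by `n_work` entries that need other_stuff(); a processed
 * entry leaves the list. Each loop step costs one tick, and need_resched
 * fires after every `resched_interval` steps. If `restart` is true we
 * rescan from the head after scheduling; otherwise we keep our place.
 * Returns how many work entries were processed within `max_ticks`. */
static int simulate(int skip_prefix, int n_work, int resched_interval,
                    int max_ticks, bool restart)
{
    int remaining = n_work;
    int pos = 0;            /* next list entry to examine */
    int since_resched = 0;  /* steps since we last scheduled */

    for (int tick = 0; tick < max_ticks && remaining > 0; tick++) {
        if (since_resched == resched_interval) {
            since_resched = 0;          /* spin_unlock(); schedule(); */
            if (restart)
                pos = 0;                /* goto restart */
            continue;
        }
        since_resched++;
        if (pos < skip_prefix) {
            pos++;                      /* something_which_is_often_true */
            continue;
        }
        remaining--;                    /* other_stuff(): entry leaves list */
    }
    return n_work - remaining;
}
```

With 50 skippable entries and need_resched firing every 40 steps, the restarting loop processes nothing, exactly the "takes longer than one scheduling period before it hits other_stuff()" case; a 60-step interval, or not restarting, lets all 100 work entries through.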
-
* RE: Unresponiveness of 2.4.16
@ 2001-11-28 19:42 Torrey Hoffman
2001-11-28 20:51 ` Dieter Nützel
0 siblings, 1 reply; 53+ messages in thread
From: Torrey Hoffman @ 2001-11-28 19:42 UTC (permalink / raw)
To: 'Andrew Morton', Torrey Hoffman
Cc: Dieter Nützel, Linux Kernel List
Yes, I just looked at the code in /fs/reiserfs/bitmap.c and
the comment block above the warning message specifically mentions
the low-latency patches.
I feel better now, looks like my filesystem is safe...
Torrey
Andrew Morton wrote:
[...]
> > fall over, and during the run I got the following error/
> > warning message printed about 20 times on the console
> > and in the kernel log:
> >
> > vs-4150: reiserfs_new_blocknrs, block not free<4>
> >
>
> uh-oh. I probably broke reiserfs in the low-latency patch.
>
> It's fairly harmless - we drop the big kernel lock, schedule
> away. Upon resumption, the block we had decided to allocate
> has been allocated by someone else. The filesystem emits a
> warning and goes off to find a different block.
>
> Will fix.
* Re: Unresponiveness of 2.4.16
2001-11-28 21:12 ` Andrew Morton
@ 2001-11-28 20:04 ` Marcelo Tosatti
2001-11-28 21:26 ` Andrew Morton
0 siblings, 1 reply; 53+ messages in thread
From: Marcelo Tosatti @ 2001-11-28 20:04 UTC (permalink / raw)
To: Andrew Morton; +Cc: Andreas Dilger, 'lkml'
On Wed, 28 Nov 2001, Andrew Morton wrote:
> Andreas Dilger wrote:
> >
> > On Nov 28, 2001 12:31 -0800, Andrew Morton wrote:
> > > write-cluster.patch
> > > ext2 metadata prereading and various other hacks which
> > > prevent writes from stumbling over reads, and thus ruining
> > > write clustering. This patch is in the early prototype stage
> >
> > Shouldn't the ext2_inode_preread() code use "ll_rw_block(READ_AHEAD,...)"
> > just to be proper?
> >
>
> Yes, especially now the request queues are shorter than they have
> historically been. READA also needs to be propagated through the
> pagecache readhead, which may prove tricky.
>
> But so little code is actually using READA at this stage that I didn't
> bother - I first need to go through those paths and make sure that they
> are in fact complete, working and useful...
I've done some experiments in the past which have shown that doing this
will cause us to almost _never_ do readahead on IO intensive workloads,
which ended up decreasing performance instead of increasing it.
Please make sure to extensively test the propagation of READA through the
pagecache when you do so...
* Re: Unresponiveness of 2.4.16
2001-11-28 2:48 ` Andrew Morton
@ 2001-11-28 20:21 ` Roger Larsson
0 siblings, 0 replies; 53+ messages in thread
From: Roger Larsson @ 2001-11-28 20:21 UTC (permalink / raw)
To: Andrew Morton, Andre Hedrick; +Cc: Linux Kernel List
On Wednesday 28 November 2001 03:48, Andrew Morton wrote:
> Mike Fedyk wrote:
> > > echo file_readahead:N > /proc/ide/ide0/hda/settings
> > >
> > > where N is in kilobytes in 2.4.16 kernels.
> >
> > Any idea which drivers it will/won't work on? ie, "almost all ide" or
> > "almost none of the ide drivers"?
>
> It appears that all IDE is controlled with /proc/ide/ide0/hda/settings
>
> > >In earlier kernels
> > > it's kilopages (!).
> >
> > Isn't this part of the max-readahead patch?
>
> No, that fix went in separately. Roger Larsson created it, then
> I hit the same problem and forwarded Roger's patch to the relevant
> parties.
>
The reason I did not send it directly, but sent it to Andre for a proper
fix, is that the error is all over the place. This is from ide-cd.c:
static void ide_cdrom_add_settings(ide_drive_t *drive)
{
	int major = HWIF(drive)->major;
	int minor = drive->select.b.unit << PARTN_BITS;

	ide_add_setting(drive, "breada_readahead", SETTING_RW, BLKRAGET, BLKRASET,
			TYPE_INT, 0, 255, 1, 2, &read_ahead[major], NULL);
	ide_add_setting(drive, "file_readahead", SETTING_RW, BLKFRAGET, BLKFRASET,
			TYPE_INTA, 0, INT_MAX, 1, 1024, &max_readahead[major][minor], NULL);
	ide_add_setting(drive, "max_kb_per_request", SETTING_RW, BLKSECTGET,
			BLKSECTSET, TYPE_INTA, 1, 255, 1, 2, &max_sectors[major][minor], NULL);
	ide_add_setting(drive, "dsc_overlap", SETTING_RW, -1, -1, TYPE_BYTE, 0, 1,
			1, 1, &drive->dsc_overlap, NULL);
}
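To make the units problem concrete, the sketch below assumes (not checked against ide.c) that a setting's /proc value is shown as stored * mul_factor / div_factor, that a written value is stored as written * div_factor / mul_factor, and that max_readahead is kept in 4 KB pages. Under those assumptions the (1, 1024) factors on file_readahead make the /proc value kilopages, while (4, 1) would make it kilobytes. The struct and function names are hypothetical.

```c
/* Hypothetical model of an IDE /proc setting's scaling (integer math). */
struct setting_scale {
    int mul_factor;
    int div_factor;
};

/* Value a user would see when reading the setting. */
static long shown_value(long stored, struct setting_scale s)
{
    return stored * s.mul_factor / s.div_factor;
}

/* Value that ends up stored when the user writes `written`. */
static long stored_value(long written, struct setting_scale s)
{
    return written * s.div_factor / s.mul_factor;
}
```

With (1, 1024), a stored 31 pages reads back as 0, and echoing file_readahead:128 stores 131072 pages; with (4, 1), the same 31 pages read back as 124 (KB) and writing 128 stores 32 pages.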
/RogerL
--
Roger Larsson
Skellefteå
Sweden
* Re: Unresponiveness of 2.4.16
2001-11-28 18:24 ` Marcelo Tosatti
2001-11-28 18:57 ` Marcelo Tosatti
@ 2001-11-28 20:31 ` Andrew Morton
2001-11-28 20:56 ` Andreas Dilger
1 sibling, 1 reply; 53+ messages in thread
From: Andrew Morton @ 2001-11-28 20:31 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Mike Fedyk, Ahmed Masud, 'lkml'
Marcelo Tosatti wrote:
>
> On Tue, 27 Nov 2001, Andrew Morton wrote:
>
> > Mike Fedyk wrote:
> > >
> > > > I'll send you a patch which makes the VM less inclined to page things
> > > > out in the presence of heavy writes, and which decreases read
> > > > latencies.
> > > >
> > > Is this patch posted anywhere?
> >
> > I sent it yesterday, in this thread. Here it is again.
> >
> > Description:
> >
> > - Account for locked as well as dirty buffers when deciding
> > to throttle writers.
>
> Just one thing: if we have lots of locked buffers due to reads, we may
> unnecessarily block writes, and that's not good.
True. I believe this change makes balance_dirty() work as it was
originally intended to work. But in so doing, lots of things change.
Various places which have been tuned for the broken balance_dirty()
behaviour may need to be retuned. It needs testing, thought, and
a comment from Linus would be helpful.
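A minimal sketch of the throttling decision under discussion, counting locked as well as dirty buffers. The function name and the percentage threshold are illustrative assumptions, not the kernel's balance_dirty() code or its tunables; the sketch also shows Marcelo's concern, since buffers locked for reads count against writers too.

```c
#include <stdbool.h>

/* Throttle writers when dirty *plus locked* buffers exceed
 * `threshold_pct` percent of all buffers. Locked buffers may be
 * locked for reads as well as for writes. */
static bool should_throttle(long nr_dirty, long nr_locked, long nr_buffers,
                            int threshold_pct)
{
    return (nr_dirty + nr_locked) * 100 > (long)threshold_pct * nr_buffers;
}
```

With 1000 buffers and a 40% limit, 300 dirty buffers alone do not throttle the writer, but 300 dirty plus 200 locked for reads do.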
> But I would rather fix interactivity than worry about that one kind
> of workload, so I'm OK with it.
>
> > - Tweak VM to make it work the inactive list harder, before starting
> > to evict pages or swap.
>
> I would like to see the interactivity problems fixed on the block layer
> side first: it's not a VM issue initially. Actually, the thing is that if
> you tweak the VM this way you're going to break some workloads.
Possibly. I have a feeling that the VM is a bit too swap-happy,
especially in the presence of heavy write() loads. I'd rather
see more aggressive dropbehind on the write() data, than see
useful cache data dropped. But I'm not sure yet.
> > - Change the elevator so that once a request's latency has
> > expired, we can still perform merges in front of that
> > request. But we no longer will insert new requests in
> > front of that request.
>
> Sounds fine... I've received quite a few success reports already, right?
A few people have reported success. Nathan Grennan didn't.
The elevator change also needs more testing and review.
There's a possibility that it could cause a seek-storm collapse
when interacting with readahead. Currently, readahead does this:
	for (some pages) {
		alloc_page()
		page_cache_read()
	}
See the potential here for the alloc_page() to get abducted
by shrink_cache(), to perform IO, and to not return until after
the previous page_cache_read() has been submitted to the device?
Ouch. Putting reads nearer the elevator head exposes this possibility.
It seems to not happen, due to the vagaries of the VM-of-the-minute,
and the workload. But it could.
So the obvious change is to allocate all the readahead pages up-front
before issuing the reads. I rewrote the readahead code to do this
(and dropped about 300 lines from filemap.c in the process), but given
that the condition doesn't trigger, it doesn't make much difference.
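The "allocate first, then read" ordering can be sketched as follows. The two stubs stand in for alloc_page() and page_cache_read() and only record the order of events; the fallback of reading ahead fewer pages when an allocation fails is our assumption, not something stated in the mail.

```c
#include <stdlib.h>

/* Event log so the ordering can be inspected: 'A' = alloc, 'S' = submit. */
static char events[64];
static int n_events;

static void log_event(char e)
{
    if (n_events < (int)sizeof(events) - 1)
        events[n_events++] = e;
    events[n_events] = '\0';
}

/* Stand-in for alloc_page(); in the kernel this is where reclaim (and
 * its I/O) can hijack the caller. */
static void *alloc_page_stub(void)
{
    log_event('A');
    return malloc(4096);
}

/* Stand-in for page_cache_read(); the model does no real I/O. */
static void submit_read_stub(void *page)
{
    log_event('S');
    free(page);
}

/* Two-phase readahead: allocate every page first, then issue all reads
 * back to back so no allocation can stall between two submissions.
 * Returns the number of reads issued (fewer than nr if memory ran out). */
static int readahead_two_phase(int nr)
{
    void *pages[64];
    int got = 0;

    if (nr > 64)
        nr = 64;
    while (got < nr) {
        void *p = alloc_page_stub();
        if (!p)
            break;              /* read ahead less rather than wait */
        pages[got++] = p;
    }
    for (int i = 0; i < got; i++)
        submit_read_stub(pages[i]);
    return got;
}
```

For four pages the event log reads AAAASSSS: no allocation, and hence no reclaim, can slip in between two submitted reads.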
I've spent a week so far looking closely at various performance
and usability problems with 2.4. It's still a work-in-progress.
I don't feel ready to start offering anything for merging yet,
really. Some of these things interact, and I'd prefer to get
more off-stream testing done, as well as code review.
Current patchset is at http://www.zip.com.au/~akpm/linux/2.4/2.4.17-pre1/
The list so far is:
vm-fixes.patch
    The balance_dirty() and less swap-happy changes.
write-cluster.patch
    ext2 metadata prereading and various other hacks which
    prevent writes from stumbling over reads, and thus ruining
    write clustering. This patch is in the early prototype stage.
readhead.patch
    VM readahead rewrite. Designed to avoid the above
    problem, and to make readahead growth more aggressive,
    and to make readahead shrinkth less aggressive. I
    don't see why we should drop the readahead window on the
    floor if someone has read a few megs from a file and then
    seeks elsewhere within it. Also uses common code for
    mmap readahead. The madvise explicit dropbehind code
    accidentally died. Oh well.
    Testing with paging-intensive workloads (start X11, staroffice6)
    indicates that we indeed do more IO, in fewer requests. But
    wall-clock time doesn't change. I may not proceed with this.
mini-ll.patch
    A kinder, gentler low-latency patch, based on the one which
    Andrea is maintaining. Doesn't drop any locks. As far as
    I'm concerned, this can be merged today (six months ago, in
    fact). It gives practically all the perceived benefit of
    the preemptive kernel patch and is clearly safe.
    A number of vendors are shipping kernels which are patched
    to add rescheduling points to copy_*_user(), which is
    much less effective than this patch. They shouldn't
    be doing this.
elevator.patch
    The previously-described elevator changes.
inline.patch
    Drops a large number of ill-chosen `inline' qualifiers
    from the kernel. Removes a total of about 12,000 bytes
    of instructions, almost all from the very hottest parts of
    the kernel. Should prove useful for computers which
    have an L1 cache which is faster than main memory.
block-alloc.patch
    My nemesis. Fixing the long- and short-term fragmentation
    of ext2/ext3 blocks would be a more significant performance
    boost than anything else in the 2.4 series. But it's just
    proving intractable. I'll probably have to drop most of
    this, and look at online defrag. There's potential for
    a 3x to 5x speedup here.
Also need to do something about the stalls which Nathan Grennan
has reported. On ext3 it seems to be due to atime updates.
Not sure about ext2 yet.
* Re: Unresponiveness of 2.4.16
2001-11-28 19:42 Torrey Hoffman
@ 2001-11-28 20:51 ` Dieter Nützel
0 siblings, 0 replies; 53+ messages in thread
From: Dieter Nützel @ 2001-11-28 20:51 UTC (permalink / raw)
To: Torrey Hoffman, 'Andrew Morton'; +Cc: Linux Kernel List, Robert Love
On Wednesday, 28 November 2001 20:42, Torrey Hoffman wrote:
> Yes, I just looked at the code in /fs/reiserfs/bitmap.c and
> the comment block above the warning message specifically mentions
> the low-latency patches.
>
> I feel better now, looks like my filesystem is safe...
>
> Torrey
So may I ask you to give 2.4.16 + preempt + lock-break (an additional
patch which does the same as Andrew's low-latency patch) a try?
Please play an MP3 or Ogg-Vorbis file while running dbench. Since you have
a dual PIII, I am very interested: I will buy a dual Athlon XP/MP soon.
Thanks,
Dieter
> Andrew Morton wrote:
> [...]
>
> > > fall over, and during the run I got the following error/
> > > warning message printed about 20 times on the console
> > > and in the kernel log:
> > >
> > > vs-4150: reiserfs_new_blocknrs, block not free<4>
> >
> > uh-oh. I probably broke reiserfs in the low-latency patch.
> >
> > It's fairly harmless - we drop the big kernel lock, schedule
> > away. Upon resumption, the block we had decided to allocate
> > has been allocated by someone else. The filesystem emits a
> > warning and goes off to find a different block.
> >
> > Will fix.
* Re: Unresponiveness of 2.4.16
2001-11-28 20:31 ` Andrew Morton
@ 2001-11-28 20:56 ` Andreas Dilger
2001-11-28 21:12 ` Andrew Morton
0 siblings, 1 reply; 53+ messages in thread
From: Andreas Dilger @ 2001-11-28 20:56 UTC (permalink / raw)
To: Andrew Morton; +Cc: 'lkml'
On Nov 28, 2001 12:31 -0800, Andrew Morton wrote:
> write-cluster.patch
> ext2 metadata prereading and various other hacks which
> prevent writes from stumbling over reads, and thus ruining
> write clustering. This patch is in the early prototype stage
Shouldn't the ext2_inode_preread() code use "ll_rw_block(READ_AHEAD,...)"
just to be proper?
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
* Re: Unresponiveness of 2.4.16
2001-11-28 20:56 ` Andreas Dilger
@ 2001-11-28 21:12 ` Andrew Morton
2001-11-28 20:04 ` Marcelo Tosatti
0 siblings, 1 reply; 53+ messages in thread
From: Andrew Morton @ 2001-11-28 21:12 UTC (permalink / raw)
To: Andreas Dilger; +Cc: 'lkml'
Andreas Dilger wrote:
>
> On Nov 28, 2001 12:31 -0800, Andrew Morton wrote:
> > write-cluster.patch
> > ext2 metadata prereading and various other hacks which
> > prevent writes from stumbling over reads, and thus ruining
> > write clustering. This patch is in the early prototype stage
>
> Shouldn't the ext2_inode_preread() code use "ll_rw_block(READ_AHEAD,...)"
> just to be proper?
>
Yes, especially now the request queues are shorter than they have
historically been. READA also needs to be propagated through the
pagecache readahead, which may prove tricky.
But so little code is actually using READA at this stage that I didn't
bother - I first need to go through those paths and make sure that they
are in fact complete, working and useful...
-
* Re: Unresponiveness of 2.4.16
2001-11-28 20:04 ` Marcelo Tosatti
@ 2001-11-28 21:26 ` Andrew Morton
0 siblings, 0 replies; 53+ messages in thread
From: Andrew Morton @ 2001-11-28 21:26 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Andreas Dilger, 'lkml'
Marcelo Tosatti wrote:
>
> > ...
> > But so little code is actually using READA at this stage that I didn't
> > bother - I first need to go through those paths and make sure that they
> > are in fact complete, working and useful...
>
> I've done some experiments in the past which have shown that doing this
> will cause us to almost _never_ do readahead on IO intensive workloads,
> which ended up decreasing performance instead of increasing it.
Interesting. Thanks.
One _could_ make the first readahead page non-READA, and then
make the rest READA. That way, all block-contiguous requests
will be merged, and any non-contiguous requests will be dropped on
the floor if the request queue is full. Which is probably what
we want to happen anyway.
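The first-page-READ, rest-READA idea can be modelled with a tiny fixed-size request queue. This is a hypothetical sketch (it only does back merges and ignores elevator ordering), not the 2.4 ll_rw_block() path: a READ always gets a slot, while a READA is merged if block-contiguous and otherwise dropped when the queue is full.

```c
#include <stdbool.h>
#include <string.h>

#define QCAP_MAX 16

struct io_req {
    long start;
    int nr;
};

/* Hypothetical request queue with a small, fixed number of slots. */
struct rq_queue {
    struct io_req reqs[QCAP_MAX];
    int count;
    int cap;                /* must be between 1 and QCAP_MAX */
};

/* Submit one block. Returns true if it was merged or queued, false if
 * it was a READA dropped because the queue was full. */
static bool submit_block(struct rq_queue *q, long block, bool reada)
{
    for (int i = 0; i < q->count; i++) {
        if (q->reqs[i].start + q->reqs[i].nr == block) {
            q->reqs[i].nr++;            /* block-contiguous: back merge */
            return true;
        }
    }
    if (q->count == q->cap) {
        if (reada)
            return false;               /* READA: drop it on the floor */
        /* A plain READ would sleep until a request completes; the model
         * simply retires the oldest request to free a slot. */
        memmove(&q->reqs[0], &q->reqs[1],
                (size_t)(q->count - 1) * sizeof(q->reqs[0]));
        q->count--;
    }
    q->reqs[q->count++] = (struct io_req){ block, 1 };
    return true;
}

/* First page as a normal READ, the rest as READA. */
static int readahead_blocks(struct rq_queue *q, const long *blocks, int n)
{
    int accepted = 0;

    for (int i = 0; i < n; i++)
        if (submit_block(q, blocks[i], i > 0))
            accepted++;
    return accepted;
}
```

Starting from a full two-slot queue, eight contiguous blocks all get through (one READ plus seven merges), whereas three scattered blocks yield a single read and two dropped READAs.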
Of course the alternative is to slot a little bmap() call into
the readahead logic :)
> Please make sure to extensively test the propagation of READA through the
> pagecache when you do so...
Extensivelytest is my middle name.
-
end of thread
Thread overview: 53+ messages
2001-11-28 0:33 Unresponiveness of 2.4.16 Torrey Hoffman
2001-11-28 0:48 ` Andrew Morton
2001-11-28 18:09 ` Marcelo Tosatti
2001-11-28 19:38 ` Andrew Morton
-- strict thread matches above, loose matches on Subject: below --
2001-11-28 19:42 Torrey Hoffman
2001-11-28 20:51 ` Dieter Nützel
2001-11-28 18:56 Torrey Hoffman
2001-11-28 19:31 ` Andrew Morton
2001-11-28 1:31 Dieter Nützel
2001-11-28 2:13 ` Andrew Morton
2001-11-28 2:34 ` Mike Fedyk
2001-11-28 2:48 ` Andrew Morton
2001-11-28 20:21 ` Roger Larsson
2001-11-28 3:53 ` Dieter Nützel
[not found] ` <200111280353.fAS3rEB05638@zero.tech9.net>
2001-11-28 4:14 ` Robert Love
2001-11-27 9:56 willy tarreau
2001-11-27 10:57 ` Heinz Diehl
2001-11-26 22:02 Nathan G. Grennan
2001-11-26 22:17 ` Alan Cox
2001-11-26 23:34 ` Nicolas Pitre
2001-11-27 0:05 ` Steve Lion
2001-11-27 9:12 ` Ahmed Masud
2001-11-27 17:12 ` Andrew Morton
2001-11-27 20:31 ` Mike Fedyk
2001-11-27 20:57 ` Andrew Morton
2001-11-27 21:19 ` Martin Eriksson
2001-11-27 21:24 ` Mike Fedyk
2001-11-28 18:24 ` Marcelo Tosatti
2001-11-28 18:57 ` Marcelo Tosatti
2001-11-28 20:31 ` Andrew Morton
2001-11-28 20:56 ` Andreas Dilger
2001-11-28 21:12 ` Andrew Morton
2001-11-28 20:04 ` Marcelo Tosatti
2001-11-28 21:26 ` Andrew Morton
2001-11-26 23:59 ` Rik van Riel
2001-11-27 0:36 ` Andrew Morton
2001-11-27 0:46 ` Rik van Riel
2001-11-27 4:38 ` Mike Fedyk
2001-11-27 4:45 ` Andrew Morton
2001-11-27 1:45 ` Andrea Arcangeli
2001-11-26 22:21 ` Andrew Morton
2001-11-27 7:42 ` Jens Axboe
2001-11-27 7:58 ` Mike Fedyk
2001-11-27 8:01 ` Jens Axboe
2001-11-27 8:31 ` Andrew Morton
2001-11-27 8:38 ` Jens Axboe
2001-11-26 22:44 ` Lincoln Dale
2001-11-27 4:34 ` GOTO Masanori
2001-11-27 0:44 ` Lost Logic
2001-11-27 0:57 ` Lost Logic
2001-11-27 3:49 ` Sean Elble
2001-11-27 3:56 ` Doug Ledford
2001-11-27 4:00 ` Sean Elble