* 2.6.6 breaks kmail (nfs related?)
@ 2004-05-13 12:11 Andreas Amann
  2004-05-16 4:46 ` Linus Torvalds
  2004-05-17 6:35 ` Norberto Bensa
  0 siblings, 2 replies; 20+ messages in thread
From: Andreas Amann @ 2004-05-13 12:11 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1214 bytes --]

Hi,

I upgraded from vanilla 2.6.4 to vanilla 2.6.6, using the same compiler
(gcc-3.3.1) and .config file (shortened version in attachment) for both. Now
I cannot send messages with kmail and I get the following error messages:

...
kmail: Error: Could not add message to folder (No space left on device?)
kmail: WARNING: KMail encountered a fatal error and will terminate now.
The error was:
KMFolderMaildir::addMsg: abnormally terminating to prevent data loss.
...

Apparently kmail thinks that my /home device is full, but it is not (still
~37GB free). Other programs have no problem writing into my home. Maybe kmail
uses some kind of lock before writing to a folder? (I have fam enabled.) Is it
possible that this broke recently?

My home directory is mounted via udp-nfs from a server running vanilla 2.4.25
with a reiserfs on a hardware raid. The mount options on the client are

hservnlds:/home /net/hservnlds/home nfs rw,nosuid,nodev,v3,rsize=8192,wsize=8192,hard,intr,udp,lock,addr=sservnlds 0 0

Any hints? This looks like a reproducible regression between 2.6.4 and 2.6.6
to me. I can do more tests on request.

Andreas -- Andreas Amann Institut für Theoretische Physik, TU Berlin [-- Attachment #2: config-2.6.6 --] [-- Type: text/plain, Size: 5356 bytes --] CONFIG_X86=y CONFIG_MMU=y CONFIG_UID16=y CONFIG_GENERIC_ISA_DMA=y CONFIG_EXPERIMENTAL=y CONFIG_CLEAN_COMPILE=y CONFIG_STANDALONE=y CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSCTL=y CONFIG_LOG_BUF_SHIFT=15 CONFIG_HOTPLUG=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_KALLSYMS=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y CONFIG_OBSOLETE_MODPARM=y CONFIG_KMOD=y CONFIG_STOP_MACHINE=y CONFIG_X86_PC=y CONFIG_M586=y CONFIG_X86_GENERIC=y CONFIG_X86_CMPXCHG=y CONFIG_X86_XADD=y CONFIG_X86_L1_CACHE_SHIFT=7 CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_X86_PPRO_FENCE=y CONFIG_X86_F00F_BUG=y CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y CONFIG_X86_ALIGNMENT_16=y CONFIG_X86_INTEL_USERCOPY=y CONFIG_SMP=y CONFIG_NR_CPUS=32 CONFIG_PREEMPT=y CONFIG_X86_LOCAL_APIC=y CONFIG_X86_IO_APIC=y CONFIG_HIGHMEM4G=y CONFIG_HIGHMEM=y CONFIG_MTRR=y CONFIG_IRQBALANCE=y CONFIG_HAVE_DEC_LOCK=y CONFIG_PM=y CONFIG_ACPI_BOOT=y CONFIG_APM=m CONFIG_APM_RTC_IS_GMT=y CONFIG_PCI=y CONFIG_PCI_GOANY=y CONFIG_PCI_BIOS=y CONFIG_PCI_DIRECT=y CONFIG_PCI_MMCONFIG=y CONFIG_PCI_LEGACY_PROC=y CONFIG_PCI_NAMES=y CONFIG_ISA=y CONFIG_PCMCIA_PROBE=y CONFIG_BINFMT_ELF=y CONFIG_BINFMT_AOUT=y CONFIG_PARPORT=m CONFIG_PARPORT_PC=m CONFIG_PARPORT_PC_CML1=m CONFIG_PARPORT_1284=y CONFIG_BLK_DEV_FD=m CONFIG_BLK_DEV_LOOP=m CONFIG_BLK_DEV_CRYPTOLOOP=m CONFIG_BLK_DEV_RAM=m CONFIG_BLK_DEV_RAM_SIZE=4096 CONFIG_IDE=y CONFIG_BLK_DEV_IDE=y CONFIG_BLK_DEV_IDEDISK=y CONFIG_BLK_DEV_IDECD=m CONFIG_BLK_DEV_IDESCSI=m CONFIG_IDE_GENERIC=y CONFIG_BLK_DEV_IDEPCI=y CONFIG_IDEPCI_SHARE_IRQ=y CONFIG_BLK_DEV_IDEDMA_PCI=y CONFIG_IDEDMA_PCI_AUTO=y CONFIG_BLK_DEV_ADMA=y CONFIG_BLK_DEV_AMD74XX=y CONFIG_BLK_DEV_PIIX=y 
CONFIG_BLK_DEV_VIA82CXXX=y CONFIG_BLK_DEV_IDEDMA=y CONFIG_IDEDMA_AUTO=y CONFIG_SCSI=m CONFIG_SCSI_PROC_FS=y CONFIG_BLK_DEV_SD=m CONFIG_BLK_DEV_SR=m CONFIG_CHR_DEV_SG=m CONFIG_SCSI_MULTI_LUN=y CONFIG_SCSI_CONSTANTS=y CONFIG_SCSI_AIC7XXX=m CONFIG_AIC7XXX_CMDS_PER_DEVICE=253 CONFIG_AIC7XXX_RESET_DELAY_MS=15000 CONFIG_AIC7XXX_DEBUG_MASK=0 CONFIG_SCSI_QLA2XXX=m CONFIG_NET=y CONFIG_PACKET=y CONFIG_UNIX=y CONFIG_NET_KEY=y CONFIG_INET=y CONFIG_INET_AH=y CONFIG_INET_ESP=y CONFIG_INET_IPCOMP=y CONFIG_IPV6=m CONFIG_NETFILTER=y CONFIG_IP_NF_IPTABLES=m CONFIG_IP_NF_FILTER=m CONFIG_XFRM=y CONFIG_NETDEVICES=y CONFIG_DUMMY=m CONFIG_NET_ETHERNET=y CONFIG_MII=y CONFIG_NET_VENDOR_3COM=y CONFIG_VORTEX=y CONFIG_NET_TULIP=y CONFIG_TULIP=y CONFIG_NET_PCI=y CONFIG_E100=y CONFIG_INPUT=y CONFIG_INPUT_MOUSEDEV=y CONFIG_INPUT_MOUSEDEV_PSAUX=y CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 CONFIG_SOUND_GAMEPORT=y CONFIG_SERIO=y CONFIG_SERIO_I8042=y CONFIG_SERIO_SERPORT=y CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y CONFIG_INPUT_MOUSE=y CONFIG_MOUSE_PS2=y CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_8250_NR_UARTS=4 CONFIG_SERIAL_CORE=y CONFIG_SERIAL_CORE_CONSOLE=y CONFIG_UNIX98_PTYS=y CONFIG_LEGACY_PTYS=y CONFIG_LEGACY_PTY_COUNT=256 CONFIG_PRINTER=m CONFIG_AGP=y CONFIG_AGP_INTEL=y CONFIG_AGP_VIA=y CONFIG_I2C=m CONFIG_I2C_CHARDEV=m CONFIG_I2C_ALGOBIT=m CONFIG_I2C_ALGOPCF=m CONFIG_I2C_AMD756=m CONFIG_I2C_ISA=m CONFIG_I2C_PIIX4=m CONFIG_I2C_VIA=m CONFIG_I2C_VIAPRO=m CONFIG_I2C_SENSOR=m CONFIG_SENSORS_VIA686A=m CONFIG_SENSORS_W83781D=m CONFIG_SENSORS_W83L785TS=m CONFIG_VGA_CONSOLE=y CONFIG_DUMMY_CONSOLE=y CONFIG_SOUND=m CONFIG_SND=m CONFIG_SND_TIMER=m CONFIG_SND_PCM=m CONFIG_SND_HWDEP=m CONFIG_SND_RAWMIDI=m CONFIG_SND_OSSEMUL=y CONFIG_SND_MIXER_OSS=m CONFIG_SND_PCM_OSS=m CONFIG_SND_MPU401_UART=m CONFIG_SND_OPL3_LIB=m CONFIG_SND_DUMMY=m CONFIG_SND_MPU401=m CONFIG_SND_AC97_CODEC=m CONFIG_SND_CMIPCI=m 
CONFIG_SND_ENS1371=m CONFIG_USB=m CONFIG_USB_DEVICEFS=y CONFIG_USB_EHCI_HCD=m CONFIG_USB_OHCI_HCD=m CONFIG_USB_UHCI_HCD=m CONFIG_USB_AUDIO=m CONFIG_USB_PRINTER=m CONFIG_USB_STORAGE=m CONFIG_USB_HID=m CONFIG_USB_HIDINPUT=y CONFIG_USB_HIDDEV=y CONFIG_USB_USBNET=m CONFIG_USB_ALI_M5632=y CONFIG_USB_ARMLINUX=y CONFIG_EXT2_FS=y CONFIG_EXT3_FS=y CONFIG_JBD=y CONFIG_REISERFS_FS=m CONFIG_MINIX_FS=m CONFIG_AUTOFS4_FS=y CONFIG_ISO9660_FS=m CONFIG_JOLIET=y CONFIG_ZISOFS=y CONFIG_ZISOFS_FS=m CONFIG_UDF_FS=m CONFIG_FAT_FS=m CONFIG_MSDOS_FS=m CONFIG_VFAT_FS=m CONFIG_NTFS_FS=m CONFIG_NTFS_RW=y CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_SYSFS=y CONFIG_TMPFS=y CONFIG_RAMFS=y CONFIG_CRAMFS=m CONFIG_NFS_FS=y CONFIG_NFS_V3=y CONFIG_NFSD=y CONFIG_NFSD_V3=y CONFIG_LOCKD=y CONFIG_LOCKD_V4=y CONFIG_EXPORTFS=y CONFIG_SUNRPC=y CONFIG_SMB_FS=m CONFIG_MSDOS_PARTITION=y CONFIG_NLS=y CONFIG_NLS_DEFAULT="iso8859-1" CONFIG_NLS_CODEPAGE_437=m CONFIG_NLS_ISO8859_1=m CONFIG_NLS_ISO8859_15=m CONFIG_EARLY_PRINTK=y CONFIG_X86_FIND_SMP_CONFIG=y CONFIG_X86_MPPARSE=y CONFIG_CRYPTO=y CONFIG_CRYPTO_HMAC=y CONFIG_CRYPTO_NULL=m CONFIG_CRYPTO_MD4=m CONFIG_CRYPTO_MD5=y CONFIG_CRYPTO_SHA1=y CONFIG_CRYPTO_SHA256=m CONFIG_CRYPTO_SHA512=m CONFIG_CRYPTO_DES=y CONFIG_CRYPTO_BLOWFISH=m CONFIG_CRYPTO_TWOFISH=m CONFIG_CRYPTO_SERPENT=m CONFIG_CRYPTO_AES=m CONFIG_CRYPTO_DEFLATE=y CONFIG_CRC32=y CONFIG_ZLIB_INFLATE=y CONFIG_ZLIB_DEFLATE=y CONFIG_X86_SMP=y CONFIG_X86_HT=y CONFIG_X86_BIOS_REBOOT=y CONFIG_X86_TRAMPOLINE=y CONFIG_X86_STD_RESOURCES=y CONFIG_PC=y ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-13 12:11 2.6.6 breaks kmail (nfs related?) Andreas Amann
@ 2004-05-16 4:46 ` Linus Torvalds
  2004-05-16 17:59 ` Trond Myklebust
  2004-05-17 6:35 ` Norberto Bensa
  1 sibling, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2004-05-16 4:46 UTC (permalink / raw)
  To: Andreas Amann, Trond Myklebust; +Cc: Kernel Mailing List

On Thu, 13 May 2004, Andreas Amann wrote:
>
> I upgraded from vanilla 2.6.4 to vanilla 2.6.6, using the same compiler
> (gcc-3.3.1) and .config file (shortened version in attachment) for both. Now
> I cannot send messages with kmail and I get the following error messages:
>
> ...
> kmail: Error: Could not add message to folder (No space left on device?)
> kmail: WARNING: KMail encountered a fatal error and will terminate now.
> The error was:
> KMFolderMaildir::addMsg: abnormally terminating to prevent data loss.

Can you strace it to see what the failing system call was? Especially if
you can compare the traces between 2.6.4 and 2.6.6 some way..

Trond, any idea?

		Linus

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-16 4:46 ` Linus Torvalds
@ 2004-05-16 17:59 ` Trond Myklebust
  2004-05-16 18:10 ` Trond Myklebust
  0 siblings, 1 reply; 20+ messages in thread
From: Trond Myklebust @ 2004-05-16 17:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andreas Amann, Kernel Mailing List

On Sun, 16/05/2004 at 00:46, Linus Torvalds wrote:
> Can you strace it to see what the failing system call was? Especially if
> you can compare the traces between 2.6.4 and 2.6.6 some way..
>
> Trond, any idea?

Not really: there isn't anything in the NFS filesystem code that can
generate an ENOSPC. I agree that the "strace" output will help.

Andreas, are both the server and the client running 2.6.6? If so, which
do you have to downgrade to 2.6.4 in order to get rid of the error?

Cheers,
  Trond

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-16 17:59 ` Trond Myklebust
@ 2004-05-16 18:10 ` Trond Myklebust
  2004-05-16 18:19 ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Trond Myklebust @ 2004-05-16 18:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andreas Amann, Kernel Mailing List

On Sun, 16/05/2004 at 13:59, Trond Myklebust wrote:
> Andreas, are both the server and the client running 2.6.6? If so, which
> do you have to downgrade to 2.6.4 in order to get rid of the error?

Oh... Another thing that would be useful: mount options please...

Cheers,
  Trond

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-16 18:10 ` Trond Myklebust
@ 2004-05-16 18:19 ` Linus Torvalds
  2004-05-16 18:47 ` Trond Myklebust
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2004-05-16 18:19 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Andreas Amann, Kernel Mailing List

On Sun, 16 May 2004, Trond Myklebust wrote:
>
> Oh... Another thing that would be useful: mount options please...

They were in the original email on the kernel mailing list:

  hservnlds:/home /net/hservnlds/home nfs rw,nosuid,nodev,v3,rsize=8192,wsize=8192,hard,intr,udp,lock,addr=sservnlds 0

The only thing there is that "intr". Maybe something has broken so that
non-lethal signals also trigger errors? That could explain it (partial
reads or writes when a timer goes off, or something).

		Linus

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-16 18:19 ` Linus Torvalds
@ 2004-05-16 18:47 ` Trond Myklebust
  2004-05-16 18:50 ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Trond Myklebust @ 2004-05-16 18:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andreas Amann, Kernel Mailing List

On Sun, 16/05/2004 at 14:19, Linus Torvalds wrote:
> They were in the original email on the kernel mailing list:

Sorry. I was in Malaysia last week so that email probably drowned in the
1600 other mails I found in my backlog when I returned on Friday. I've
found it now in the archives...

> hservnlds:/home /net/hservnlds/home nfs rw,nosuid,nodev,v3,rsize=8192,wsize=8192,hard,intr,udp,lock,addr=sservnlds 0
>
> The only thing there is that "intr". Maybe something has broken so that
> non-lethal signals also trigger errors? That could explain it (partial
> reads or writes when a timer goes off, or something).

I haven't touched rpc_clnt_sigmask() in many years, so that would have
to be some change to the generic signal handling code.

If kmail really is reporting an ENOSPC, though, then it's hard to see
how a signal could produce that particular error.

Cheers,
  Trond

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-16 18:47 ` Trond Myklebust
@ 2004-05-16 18:50 ` Linus Torvalds
  2004-05-16 19:10 ` Trond Myklebust
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2004-05-16 18:50 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Andreas Amann, Kernel Mailing List

On Sun, 16 May 2004, Trond Myklebust wrote:
>
> If kmail really is reporting an ENOSPC, though, then it's hard to see
> how a signal could produce that particular error.

Agreed. But the kmail message is apparently "(No space left on device?)",
which may be just kmail itself reacting to a truncated write rather than
any actual ENOSPC error. A "strace" would help clarify exactly what goes
wrong..

		Linus

^ permalink raw reply [flat|nested] 20+ messages in thread
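The distinction drawn above, between a short write and a write that fails with a specific errno, can be sketched in a few lines. This is only an illustration under stated assumptions: `write_all` and `describe_failure` are hypothetical helper names, not kmail code, and nothing here claims to reproduce what kmail actually does.

```python
import errno
import os


def write_all(fd: int, data: bytes) -> int:
    """Write every byte of data to fd, looping over short writes."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        # os.write() may accept fewer bytes than requested. That is a
        # short write, not an error, so continue from where it stopped.
        written += os.write(fd, view[written:])
    return written


def describe_failure(exc: OSError) -> str:
    """Name the errno that was actually returned instead of guessing."""
    name = errno.errorcode.get(exc.errno, "errno %s" % exc.errno)
    return "%s: %s" % (name, os.strerror(exc.errno))
```

With helpers like these, a caller that catches `OSError` around `write_all()` would report ESTALE or ENOSPC by name, which is the same information strace shows for the failing system call.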
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-16 18:50 ` Linus Torvalds
@ 2004-05-16 19:10 ` Trond Myklebust
  2004-05-17 11:31 ` Andreas Amann
  0 siblings, 1 reply; 20+ messages in thread
From: Trond Myklebust @ 2004-05-16 19:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andreas Amann, Kernel Mailing List

On Sun, 16/05/2004 at 14:50, Linus Torvalds wrote:
> Agreed. But the kmail message is apparently "(No space left on device?)",
> which may be just kmail itself reacting to a truncated write rather than
> any actual ENOSPC error. A "strace" would help clarify exactly what goes
> wrong..

Right...

One possible suspect might be open(O_EXCL) since, AFAICS, Andreas is
using maildir-style mailboxes. Perhaps that SETATTR call in
nfs3_proc_create() is failing? We recently fixed it so that it always
sets MTIME/ATIME...

Andreas: when you do the "strace" could you first run

  echo "16" >/proc/sys/sunrpc/nfs_debug

and then record the output from "dmesg" immediately after the kmail
crash?

Cheers,
  Trond

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-16 19:10 ` Trond Myklebust
@ 2004-05-17 11:31 ` Andreas Amann
  2004-05-17 15:55 ` Trond Myklebust
  2004-05-17 21:35 ` Matthias Urlichs
  0 siblings, 2 replies; 20+ messages in thread
From: Andreas Amann @ 2004-05-17 11:31 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linus Torvalds, Kernel Mailing List

On Sunday 16 May 2004 21:10, Trond Myklebust wrote:
> On Sun, 16/05/2004 at 14:50, Linus Torvalds wrote:
> > Agreed. But the kmail message is apparently "(No space left on device?)",
> > which may be just kmail itself reacting to a truncated write rather than
> > any actual ENOSPC error. A "strace" would help clarify exactly what goes
> > wrong..
>
> Right...
>
> One possible suspect might be open(O_EXCL) since, AFAICS, Andreas is
> using maildir-style mailboxes. Perhaps that SETATTR call in
> nfs3_proc_create() is failing? We recently fixed so that it always sets
> MTIME/ATIME...
>
> Andreas: when you do the "strace" could you first run
>
> echo "16" >/proc/sys/sunrpc/nfs_debug
>
> and then record the output from "dmesg" immediately after the kmail
> crash?

Ok, I produced the "strace"s and "dmesg"s for the kernels 2.6.4, 2.6.5 and
2.6.6 and made them available at

http://wwwnlds.physik.tu-berlin.de/~amann/kmail_bug/dmesg_kmail_2.6.4
http://wwwnlds.physik.tu-berlin.de/~amann/kmail_bug/dmesg_kmail_2.6.5
http://wwwnlds.physik.tu-berlin.de/~amann/kmail_bug/dmesg_kmail_2.6.6
http://wwwnlds.physik.tu-berlin.de/~amann/kmail_bug/kmail_trace_2.6.4
http://wwwnlds.physik.tu-berlin.de/~amann/kmail_bug/kmail_trace_2.6.5
http://wwwnlds.physik.tu-berlin.de/~amann/kmail_bug/kmail_trace_2.6.6

Some further information:

My problem already occurs with kernel 2.6.5, and it is indeed NFS related (it
does not appear on a local home partition). I reproduced the crash with a
server exporting an ext2 partition and one which exports a reiserfs
partition. So far I only tested servers running on vanilla 2.4.25. Should I
check others?

The mount options according to /proc/mounts are

viola:/tmp /net/viola/tmp nfs rw,nosuid,nodev,v3,rsize=8192,wsize=8192,hard,intr,udp,lock,addr=viola 0 0

The traces were produced by the command lines

  strace 2>/tmp/kmail_trace_2.6.x /usr/linux-local/kde/bin/kmail --nofork -s test --msg test_mail amann@physik.tu-berlin.de
  dmesg > /tmp/dmesg_kmail_2.6.x

(I also tried "strace -f", but apparently exim does not like to be traced?)

From my (limited) point of view, the problem is the ESTALE of an fstat64 call
in the 2.6.5 trace:

  access("/net/viola/tmp/amann/home_tmp/Mail/outbox/cur/1084784925.736.utEix:2,S", F_OK) = -1 ENOENT (No such file or directory)
  rename("/net/viola/tmp/amann/home_tmp/Mail/outbox/tmp/1084784925.736.utEix:2,S", "/net/viola/tmp/amann/home_tmp/Mail/outbox/cur/1084784925.736.utEix:2,S") = 0
  fstat64(8, 0xbfffe650) = -1 ESTALE (Stale NFS file handle)
  _llseek(8, 0, [373], SEEK_END) = 0
  write(8, "X\1\0\0", 4) = -1 ESTALE (Stale NFS file handle)
  write(8, "\5\0\0\0,\0\0a\0z\0/\0w\0t\0z\0t\0+\0N\0S\0+\0001\0s"..., 344) = -1 ESTALE (Stale NFS file handle)

This succeeds in the 2.6.4 trace:

  access("/net/viola/tmp/amann/home_tmp/Mail/outbox/cur/1084785768.460.mTWwO:2,S", F_OK) = -1 ENOENT (No such file or directory)
  rename("/net/viola/tmp/amann/home_tmp/Mail/outbox/tmp/1084785768.460.mTWwO:2,S", "/net/viola/tmp/amann/home_tmp/Mail/outbox/cur/1084785768.460.mTWwO:2,S") = 0
  fstat64(8, {st_mode=S_IFREG|0600, st_size=373, ...}) = 0
  _llseek(8, 0, [0], SEEK_SET) = 0
  read(8, "# KMail-Index V1506\n\0\10\0\0\0xV4\22\4\0\0"..., 373) = 373
  write(8, "X\1\0\0", 4) = 4
  write(8, "\5\0\0\0,\0\0j\0w\0N\0O\0S\0g\0p\0x\0j\0003\0o\0l\0S"..., 344) = 344

In both cases file descriptor 8 was created earlier by

  open("/net/viola/tmp/amann/home_tmp/Mail/.outbox.index", O_RDWR|O_LARGEFILE) = 8

No idea what causes the difference. I hope this is the information you
expected. Please let me know what further checks I can do.

Andreas

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-17 11:31 ` Andreas Amann
@ 2004-05-17 15:55 ` Trond Myklebust
  2004-05-21 15:27 ` Andreas Amann
  2004-05-17 21:35 ` Matthias Urlichs
  1 sibling, 1 reply; 20+ messages in thread
From: Trond Myklebust @ 2004-05-17 15:55 UTC (permalink / raw)
  To: Andreas Amann; +Cc: Linus Torvalds, Kernel Mailing List

On Mon, 17/05/2004 at 07:31, Andreas Amann wrote:
> fstat64(8, 0xbfffe650) = -1 ESTALE (Stale NFS file handle)
> _llseek(8, 0, [373], SEEK_END) = 0
> write(8, "X\1\0\0", 4) = -1 ESTALE (Stale NFS file handle)
> write(8, "\5\0\0\0,\0\0a\0z\0/\0w\0t\0z\0t\0+\0N\0S\0+\0001\0s"..., 344) = -1
> ESTALE (Stale NFS file handle)

That's weird... Where could that be coming from? The client is *never*
supposed to generate that on its own. If an ESTALE turns up, it should
always be generated from the server.

Does that same ESTALE show up on a tcpdump/ethereal dump? If so, could
you please check that the filehandle that is contained in the reply to
LOOKUP(".outbox.index") is the same as that which is sent on the
offending GETATTR call?

Cheers,
  Trond

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-17 15:55 ` Trond Myklebust
@ 2004-05-21 15:27 ` Andreas Amann
  2004-05-21 16:40 ` Trond Myklebust
  0 siblings, 1 reply; 20+ messages in thread
From: Andreas Amann @ 2004-05-21 15:27 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linus Torvalds, Kernel Mailing List

On Monday 17 May 2004 17:55, Trond Myklebust wrote:
> On Mon, 17/05/2004 at 07:31, Andreas Amann wrote:
> > fstat64(8, 0xbfffe650) = -1 ESTALE (Stale NFS file handle)
> > _llseek(8, 0, [373], SEEK_END) = 0
> > write(8, "X\1\0\0", 4) = -1 ESTALE (Stale NFS file handle)
> > write(8, "\5\0\0\0,\0\0a\0z\0/\0w\0t\0z\0t\0+\0N\0S\0+\0001\0s"..., 344) = -1 ESTALE (Stale NFS file handle)
>
> That's weird... Where could that be coming from? The client is *never*
> supposed to generate that on its own. If an ESTALE turns up, it should
> always be generated from the server.
>
> Does that same ESTALE show up on a tcpdump/ethereal dump? If so, could
> you please check that the filehandle that is contained from the reply to
> LOOKUP(".outbox.index") is the same as that which is sent on the
> offending GETATTR call?

I have now produced the "ethereal" dumps and put them at:

http://wwwnlds.physik.tu-berlin.de/~amann/kmail_bug/kmail_etheral_cut_2.6.4
http://wwwnlds.physik.tu-berlin.de/~amann/kmail_bug/kmail_etheral_cut_2.6.6

together with the pertinent "strace"s at:

http://wwwnlds.physik.tu-berlin.de/~amann/kmail_bug/kmail_trace_2.6.4_new
http://wwwnlds.physik.tu-berlin.de/~amann/kmail_bug/kmail_trace_2.6.6_new

I interpret the dump in the failing case with the 2.6.6 client as follows:

First the filehandle for ".outbox.index" (0xdc36f60a) is delivered by a
READDIRPLUS Reply (Frame 4 in kmail_etheral_cut_2.6.6). Then the client does
GETATTR, ACCESS, SETATTR, READ (Frames 100-114) without any problems. The
client subsequently issues a WRITE and a COMMIT (Frame 741 + 751) command,
which are still successful. But the immediately following GETATTR (Frame 743)
fails with ERR_STALE.

In the case where the client is 2.6.4, the dumps look very similar, except
that now a lot of the GETATTR Calls are missing. In particular the GETATTR
which failed in the 2.6.6 case is not present, and therefore cannot fail.

In both cases the server was 2.4.25. Who is now wrong in this case, the client
or the server? To me it now looks as if the server needs to be fixed, but I
am no expert.

Andreas

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-21 15:27 ` Andreas Amann
@ 2004-05-21 16:40 ` Trond Myklebust
  2004-05-21 23:05 ` Andreas Amann
  0 siblings, 1 reply; 20+ messages in thread
From: Trond Myklebust @ 2004-05-21 16:40 UTC (permalink / raw)
  To: Andreas Amann; +Cc: Linus Torvalds, Kernel Mailing List

On Fri, 21/05/2004 at 11:27, Andreas Amann wrote:
> In both cases the server was 2.4.25. Who is now wrong in this case, the client
> or the server? To me it looks now, as if the server needs to be fixed, but I
> am no expert.

Yep. This is a server side bug. I just checked the dump, and the client
is indeed sending the correct filehandle (exactly the same one as in the
COMMIT just before it).

Hmm... It looks to me as if you are exporting that filesystem with the
"subtree_check" option enabled. Could you try to set "no_subtree_check"?

The subtree checking stuff breaks NFS in various subtle ways (including
renames etc), and is one of the more common sources of ESTALE errors.

Cheers,
  Trond

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-21 16:40 ` Trond Myklebust
@ 2004-05-21 23:05 ` Andreas Amann
  2004-05-22 3:40 ` J. Bruce Fields
  0 siblings, 1 reply; 20+ messages in thread
From: Andreas Amann @ 2004-05-21 23:05 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linus Torvalds, Kernel Mailing List

On Fri, May 21, 2004 at 12:40:02PM -0400, Trond Myklebust wrote:
>
> Hmm... It looks to me as if you are exporting that filesystem with the
> "subtree_check" option enabled. Could you try to set "no_subtree_check"?

Thanks for that one, with "no_subtree_check" the problem disappears!
What is the disadvantage of this option?

Andreas

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-21 23:05 ` Andreas Amann
@ 2004-05-22 3:40 ` J. Bruce Fields
  0 siblings, 0 replies; 20+ messages in thread
From: J. Bruce Fields @ 2004-05-22 3:40 UTC (permalink / raw)
  To: Andreas Amann; +Cc: Trond Myklebust, Linus Torvalds, Kernel Mailing List

On Sat, May 22, 2004 at 01:05:45AM +0200, Andreas Amann wrote:
> On Fri, May 21, 2004 at 12:40:02PM -0400, Trond Myklebust wrote:
> >
> > Hmm... It looks to me as if you are exporting that filesystem with the
> > "subtree_check" option enabled. Could you try to set "no_subtree_check"?
>
> Thanks for that one, with "no_subtree_check" the problem disappears!
> What is the disadvantage of this option?

With "no_subtree_check" the server will not attempt to verify that a
given filehandle points to a file that is beneath an exported directory;
thus an attacker can guess filehandles of files not beneath any exported
directory and use those guessed filehandles to access files you didn't
mean to export.

Even with "no_subtree_check", the server can still recognize which
filesystem a filehandle belongs to; so you're only in trouble if you
have files you don't want exported on the same partition as files you do
want exported.

See "man exports" for more.

--Bruce Fields

^ permalink raw reply [flat|nested] 20+ messages in thread
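For anyone wanting to check which exports already carry the option discussed above, here is a minimal sketch. It assumes the simple exports(5) layout of a path followed by whitespace-separated `client(options)` entries; quoted paths and line continuations are not handled. The helper names, the example path, and the client names are all hypothetical.

```python
def parse_export(line):
    """Split one exports(5)-style line into (path, {client: set(options)})."""
    line = line.split("#", 1)[0].strip()  # drop a trailing comment
    if not line:
        return None
    path, *specs = line.split()
    clients = {}
    for spec in specs:
        client, paren, rest = spec.partition("(")
        options = set(rest.rstrip(")").split(",")) if paren else set()
        options.discard("")
        # An entry with no host name is recorded under a placeholder key.
        clients[client or "<any>"] = options
    return path, clients


def clients_without_subtree_check(line):
    """List the clients on this line that have no_subtree_check set."""
    parsed = parse_export(line)
    if parsed is None:
        return []
    _, clients = parsed
    return sorted(c for c, opts in clients.items() if "no_subtree_check" in opts)
```

Given a hypothetical line such as `/home 192.168.0.0/24(rw,sync,no_subtree_check) backup.example.org(ro)`, only the first client would be listed. Whether a client without the option gets subtree checking depends on the server's default, so see "man exports" for the installed version.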
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-17 11:31 ` Andreas Amann
  2004-05-17 15:55 ` Trond Myklebust
@ 2004-05-17 21:35 ` Matthias Urlichs
  1 sibling, 0 replies; 20+ messages in thread
From: Matthias Urlichs @ 2004-05-17 21:35 UTC (permalink / raw)
  To: linux-kernel

Hi, Andreas Amann wrote:

> (I also tried "strace -f", but apparently exim does not like to be traced?)

Exim's setuid. Tracing setuid programs generally is fraught with peril,
especially if that program changes uids, drops privileges and then
re-execs itself (as exim does, IIRC). :-/

-- 
Matthias Urlichs

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-13 12:11 2.6.6 breaks kmail (nfs related?) Andreas Amann
  2004-05-16 4:46 ` Linus Torvalds
@ 2004-05-17 6:35 ` Norberto Bensa
  2004-05-17 7:14 ` Andrew Morton
  2004-05-17 16:17 ` Frank van Maarseveen
  1 sibling, 2 replies; 20+ messages in thread
From: Norberto Bensa @ 2004-05-17 6:35 UTC (permalink / raw)
  To: linux-kernel

Andreas Amann wrote:
> kmail: Error: Could not add message to folder (No space left on device?)
> kmail: WARNING: KMail encountered a fatal error and will terminate now.
> The error was:
> KMFolderMaildir::addMsg: abnormally terminating to prevent data loss.
> ...

Well, I'm getting this with kcalc after upgrading to 2.6.6-mm3:

$ kcalc
KCrash: Application 'kcalc' crashing...

strace shows lots of
...
close(1002) = -1 EBADF (Bad file descriptor)
close(1003) = -1 EBADF (Bad file descriptor)
close(1004) = -1 EBADF (Bad file descriptor)
close(1005) = -1 EBADF (Bad file descriptor)
...

Now it's late. More tests and info tomorrow (unless there's a new -mm kernel
which fixes this :-) )

Regards,
Norberto

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-17 6:35 ` Norberto Bensa
@ 2004-05-17 7:14 ` Andrew Morton
  2004-05-17 17:35 ` Andrew Morton
  2004-05-17 16:17 ` Frank van Maarseveen
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2004-05-17 7:14 UTC (permalink / raw)
  To: Norberto Bensa; +Cc: linux-kernel

Norberto Bensa <norberto+linux-kernel@bensa.ath.cx> wrote:
>
> Well, I'm getting this with kcalc after upgrading to 2.6.6-mm3:
>
> $ kcalc
> KCrash: Application 'kcalc' crashing...
>
> strace shows lots of
> ...
> close(1002) = -1 EBADF (Bad file descriptor)
> close(1003) = -1 EBADF (Bad file descriptor)
> close(1004) = -1 EBADF (Bad file descriptor)
> close(1005) = -1 EBADF (Bad file descriptor)
> ...

Send the whole thing, please: `strace -f -o log kcalc', and send `log'. If
it's too big to post please mail it to me direct and I'll stick it on a
public server.

Thanks.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-17 7:14 ` Andrew Morton
@ 2004-05-17 17:35 ` Andrew Morton
  2004-05-17 18:01 ` Trond Myklebust
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2004-05-17 17:35 UTC (permalink / raw)
  To: norberto+linux-kernel, linux-kernel; +Cc: Trond Myklebust

Andrew Morton <akpm@osdl.org> wrote:
>
> Norberto Bensa <norberto+linux-kernel@bensa.ath.cx> wrote:
> >
> > Well, I'm getting this with kcalc after upgrading to 2.6.6-mm3:
> >
> > $ kcalc
> > KCrash: Application 'kcalc' crashing...
> >
> > strace shows lots of
> > ...
> > close(1002) = -1 EBADF (Bad file descriptor)
> > close(1003) = -1 EBADF (Bad file descriptor)
> > close(1004) = -1 EBADF (Bad file descriptor)
> > close(1005) = -1 EBADF (Bad file descriptor)
> > ...
>
> Send the whole thing, please: `strace -f -o log kcalc', and send `log'. If
> it's too big to post please mail it to me direct and I'll stick it on a
> public server.
>

Norberto's strace log is at http://www.zip.com.au/~akpm/linux/patches/stuff/log.txt

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-17 17:35 ` Andrew Morton
@ 2004-05-17 18:01 ` Trond Myklebust
  0 siblings, 0 replies; 20+ messages in thread
From: Trond Myklebust @ 2004-05-17 18:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: norberto+linux-kernel, linux-kernel

On Mon, 17/05/2004 at 13:35, Andrew Morton wrote:
> Norberto's strace log is at
> http://www.zip.com.au/~akpm/linux/patches/stuff/log.txt

A priori, it looks very different from Andreas' problem. This beast is
crashing due to a SIGSEGV.

The EBADF here appear to be correct: the application or glibc or
whatever appears to be trying to close files more than once. Duh...

Cheers,
  Trond

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: 2.6.6 breaks kmail (nfs related?)
  2004-05-17 6:35 ` Norberto Bensa
  2004-05-17 7:14 ` Andrew Morton
@ 2004-05-17 16:17 ` Frank van Maarseveen
  1 sibling, 0 replies; 20+ messages in thread
From: Frank van Maarseveen @ 2004-05-17 16:17 UTC (permalink / raw)
  To: linux-kernel

On Mon, May 17, 2004 at 03:35:42AM -0300, Norberto Bensa wrote:
>
> Well, I'm getting this with kcalc after upgrading to 2.6.6-mm3:
>
> $ kcalc
> KCrash: Application 'kcalc' crashing...
>
> strace shows lots of
> ...
> close(1002) = -1 EBADF (Bad file descriptor)
> close(1003) = -1 EBADF (Bad file descriptor)
> close(1004) = -1 EBADF (Bad file descriptor)
> close(1005) = -1 EBADF (Bad file descriptor)

Looks like daemonizing code to me, getting rid of open fds.

-- 
Frank

^ permalink raw reply [flat|nested] 20+ messages in thread
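The pattern Frank describes is easy to reproduce in isolation. The sketch below is an assumption-laden illustration, not code from kcalc, KDE, or glibc: `close_all` is a hypothetical helper that closes a list of descriptor numbers and records what each close() reported. Any number that is not currently open comes back as EBADF, which matches the harmless lines in the strace excerpt.

```python
import errno
import os


def close_all(fds):
    """Try to close each descriptor and record what close() reported."""
    report = {}
    for fd in fds:
        try:
            os.close(fd)
            report[fd] = "closed"
        except OSError as exc:
            # EBADF here only means the number was not an open descriptor.
            report[fd] = errno.errorcode.get(exc.errno, str(exc.errno))
    return report
```

Calling it twice on the same pair of pipe descriptors shows both cases: the first pass closes them, the second pass reports EBADF for each.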