linux-nfs.vger.kernel.org archive mirror
* Issues with a rather unusual configured NFS server
@ 2013-08-11  9:48 Toralf Förster
  2013-08-12 14:36 ` Jan Kara
  0 siblings, 1 reply; 23+ messages in thread
From: Toralf Förster @ 2013-08-11  9:48 UTC
  To: Linux NFS mailing list
  Cc: user-mode-linux-devel@lists.sourceforge.net, linux-ext4,
	Linux Kernel

The server either crashes (if it is a user-mode Linux image) or at least its reboot functionality breaks
if the NFS server is hammered with odd NFS calls generated by a fuzzing tool running on a remote NFS client under a non-privileged user ID.

It can be reproduced if:
	- the NFS share is an ext3 or ext4 directory
	- and that filesystem is created in an image file located on tmpfs and mounted via a loop device
	- and the NFS server is forced to unmount the NFS share
	- and the server is forced to restart the NFS service afterwards
	- and trinity is used

I found a scenario suitable for an automated bisect. Twice, it identified this commit
commit 68a3396178e6688ad7367202cdf0af8ed03c8727
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Thu Mar 21 11:21:50 2013 -0400

    nfsd4: shut down more of delegation earlier


as the one after which the user-mode Linux server crashes with a backtrace like this:


$ cat /mnt/ramdisk/bt.v3.11-rc4-172-g8ae3f1d
[New LWP 14025]
Core was generated by `/home/tfoerste/devel/linux/linux earlyprintk ubda=/home/tfoerste/virtual/uml/tr'.
Program terminated with signal 6, Aborted.
#0  0xb77ef424 in __kernel_vsyscall ()
#0  0xb77ef424 in __kernel_vsyscall ()
#1  0x083a33c5 in kill ()
#2  0x0807163d in uml_abort () at arch/um/os-Linux/util.c:93
#3  0x08071925 in os_dump_core () at arch/um/os-Linux/util.c:138
#4  0x080613a7 in panic_exit (self=0x85a1518 <panic_exit_notifier>, unused1=0, unused2=0x85d6ce0 <buf.15904>) at arch/um/kernel/um_arch.c:240
#5  0x0809a3b8 in notifier_call_chain (nl=0x0, val=0, v=0x85d6ce0 <buf.15904>, nr_to_call=-2, nr_calls=0x0) at kernel/notifier.c:93
#6  0x0809a503 in __atomic_notifier_call_chain (nr_calls=<optimized out>, nr_to_call=<optimized out>, v=<optimized out>, val=<optimized out>, nh=<optimized out>) at kernel/notifier.c:182
#7  atomic_notifier_call_chain (nh=0x85d6cc4 <panic_notifier_list>, val=0, v=0x85d6ce0 <buf.15904>) at kernel/notifier.c:191
#8  0x08400ba8 in panic (fmt=0x0) at kernel/panic.c:128
#9  0x0818edf4 in ext4_put_super (sb=0x4a042690) at fs/ext4/super.c:818
#10 0x081010d2 in generic_shutdown_super (sb=0x4a042690) at fs/super.c:418
#11 0x0810209a in kill_block_super (sb=0x0) at fs/super.c:1028
#12 0x08100f6a in deactivate_locked_super (s=0x4a042690) at fs/super.c:299
#13 0x08101001 in deactivate_super (s=0x4a042690) at fs/super.c:324
#14 0x08118e0c in mntfree (mnt=<optimized out>) at fs/namespace.c:891
#15 mntput_no_expire (mnt=0x0) at fs/namespace.c:929
#16 0x0811a2f5 in SYSC_umount (flags=<optimized out>, name=<optimized out>) at fs/namespace.c:1335
#17 SyS_umount (name=134541632, flags=0) at fs/namespace.c:1305
#18 0x0811a369 in SYSC_oldumount (name=<optimized out>) at fs/namespace.c:1347
#19 SyS_oldumount (name=134541632) at fs/namespace.c:1345
#20 0x080618e2 in handle_syscall (r=0x49e919d4) at arch/um/kernel/skas/syscall.c:35
#21 0x08073c0d in handle_trap (local_using_sysemu=<optimized out>, regs=<optimized out>, pid=<optimized out>) at arch/um/os-Linux/skas/process.c:198
#22 userspace (regs=0x49e919d4) at arch/um/os-Linux/skas/process.c:431
#23 0x0805e65c in fork_handler () at arch/um/kernel/process.c:160
#24 0x00000000 in ?? ()



A real system, however, would not crash but would hit a kernel BUG, as reported here:
http://article.gmane.org/gmane.comp.file-systems.ext4/38915
Furthermore, the server can then no longer reboot; it hangs indefinitely in the reboot phase.
Only the magic SysRq keys still work at that point.



Steps to reproduce on two 32-bit Gentoo Linux user-mode Linux images:

1. Prepare the server:
	mount -t tmpfs tmpfs /mnt/ramdisk    # must have room for the 257 MB image below
	mkdir /mnt/ramdisk/victims
	dd if=/dev/zero of=/mnt/ramdisk/disk1 bs=1M count=257 2>/dev/null
	yes | mkfs.ext4 -q /mnt/ramdisk/disk1 1>/dev/null
	mount -o loop /mnt/ramdisk/disk1 /mnt/ramdisk/victims
	chmod 777 /mnt/ramdisk/victims
	/etc/init.d/nfs restart

2. Prepare the client:
	mount the NFS share onto the local mount point /mnt/ramdisk/victims/ using NFSv4
	
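3. Run the fuzzing tool trinity on the client:
	while [[ : ]]; do
		# (re-)create v1/v2 with 100 empty files and 100 empty directories (f$i/d$i names are arbitrary)
		rm -rf /mnt/ramdisk/victims/v1
		mkdir -p /mnt/ramdisk/victims/v1/v2
		for i in $(seq 1 100); do touch /mnt/ramdisk/victims/v1/v2/f$i; mkdir /mnt/ramdisk/victims/v1/v2/d$i; done
		trinity -V /mnt/ramdisk/victims/v1/v2 -C 1 -N 10000 -q
		sleep 3
	done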

4. After 15 minutes, kill the user-mode Linux client with kill -9.

5. Now run on the server:
	umount /mnt/ramdisk/victims || { /etc/init.d/nfs restart && umount /mnt/ramdisk/victims; } && echo 'no issue so far'


You might also need this patch from Oleg Nesterov <oleg@redhat.com> (not currently in mainline):

--- x/kernel/exit.c
+++ x/kernel/exit.c
@@ -783,8 +783,8 @@ void do_exit(long code)
        exit_shm(tsk);
        exit_files(tsk);
        exit_fs(tsk);
-       exit_task_namespaces(tsk);
        exit_task_work(tsk);
+       exit_task_namespaces(tsk);
        check_stack_usage();
        exit_thread();


-- 
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

* Re: Issues with a rather unusual configured NFS server
@ 2013-09-28 15:37 Marc Meledandri
  2013-10-01 14:34 ` J. Bruce Fields
  0 siblings, 1 reply; 23+ messages in thread
From: Marc Meledandri @ 2013-09-28 15:37 UTC
  To: linux-nfs

On 08/27/2013 08:06 PM, J. Bruce Fields wrote:
> On Tue, Aug 13, 2013 at 05:53:14PM -0400, bfields wrote:
>> On Mon, Aug 12, 2013 at 04:36:40PM +0200, Jan Kara wrote:
>>> On Sun 11-08-13 11:48:49, Toralf Förster wrote:
>>>> so that the server either crashes (if it is a user mode linux image) or at least its reboot functionality got broken
>>>> - if the NFS server is hammered with scary NFS calls using a fuzzy tool running at a remote NFS client under a non-privileged user id.
>>>>
>>>> It can be reproduced, if
>>>> - the NFS share is an EXT3 or EXT4 directory
>>>> - and it is created on a file located on tmpfs and mounted via loop device
>>>> - and the NFS server is forced to umount the NFS share
>>>> - and the server forced to restart the NFS service afterwards
>>>> - and trinity is used
>>>>
>>>> I could find a scenario for an automated bisect. 2 times it brought this commit
>>>> commit 68a3396178e6688ad7367202cdf0af8ed03c8727
>>>> Author: J. Bruce Fields <bfields@xxxxxxxxxx>
>>>> Date:   Thu Mar 21 11:21:50 2013 -0400
>>>>
>>>>     nfsd4: shut down more of delegation earlier
>>
>> Thanks for the report.  I think I see the problem--after this commit
>> nfs4_set_delegation() failures result in nfs4_put_delegation being
>> called, but nfs4_put_delegation doesn't free the nfs4_file that has
>> already been set by alloc_init_deleg().
>>
>> Let me think about how to fix that....
>
> Sorry for the slow response--can you check whether this fixes the
> problem?
>
Yes.

With the attached patch the problem can't be reproduced any longer with
the prepared test case and current git kernels.

> --b.
>
> commit 624a0ee0375940ce4aa36330b0b5a70af6d2b6f5
> Author: J. Bruce Fields <bfields@xxxxxxxxxx>
> Date:   Thu Aug 15 16:55:26 2013 -0400
>
>     nfsd4: fix leak of inode reference on delegation failure
>
>     This fixes a regression from 68a3396178e6688ad7367202cdf0af8ed03c8727
>     "nfsd4: shut down more of delegation earlier".
>
>     After that commit, nfs4_set_delegation() failures result in
>     nfs4_put_delegation being called, but nfs4_put_delegation doesn't free
>     the nfs4_file that has already been set by alloc_init_deleg().
>
>     This can result in an oops on later unmounting the exported filesystem.
>
>     Note also that by delaying the fi_had_conflict check we're able to return a
>     better error (hence give 4.1 clients a better idea why the delegation
>     failed; though note CONFLICT isn't an exact match here, as that's
>     supposed to indicate a current conflict, but all we know here is that
>     there was one recently).
>
>     Reported-by: Toralf Förster <toralf.foerster@xxxxxx>
>     Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index eb9cf81..0874998 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -368,11 +368,8 @@ static struct nfs4_delegation *
>  alloc_init_deleg(struct nfs4_client *clp, struct nfs4_ol_stateid *stp, struct svc_fh *current_fh)
>  {
>  	struct nfs4_delegation *dp;
> -	struct nfs4_file *fp = stp->st_file;
>
>  	dprintk("NFSD alloc_init_deleg\n");
> -	if (fp->fi_had_conflict)
> -		return NULL;
>  	if (num_delegations > max_delegations)
>  		return NULL;
>  	dp = delegstateid(nfs4_alloc_stid(clp, deleg_slab));
> @@ -389,8 +386,7 @@ alloc_init_deleg(struct nfs4_client *clp, struct nfs4_ol_stateid *stp, struct sv
>  	INIT_LIST_HEAD(&dp->dl_perfile);
>  	INIT_LIST_HEAD(&dp->dl_perclnt);
>  	INIT_LIST_HEAD(&dp->dl_recall_lru);
> -	get_nfs4_file(fp);
> -	dp->dl_file = fp;
> +	dp->dl_file = NULL;
>  	dp->dl_type = NFS4_OPEN_DELEGATE_READ;
>  	fh_copy_shallow(&dp->dl_fh, &current_fh->fh_handle);
>  	dp->dl_time = 0;
> @@ -3044,22 +3040,35 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
>  	return 0;
>  }
>
> -static int nfs4_set_delegation(struct nfs4_delegation *dp)
> +static int nfs4_set_delegation(struct nfs4_delegation *dp, struct nfs4_file *fp)
>  {
> -	struct nfs4_file *fp = dp->dl_file;
> +	int status;
>
> -	if (!fp->fi_lease)
> -		return nfs4_setlease(dp);
> +	if (fp->fi_had_conflict)
> +		return -EAGAIN;
> +	get_nfs4_file(fp);
> +	dp->dl_file = fp;
> +	if (!fp->fi_lease) {
> +		status = nfs4_setlease(dp);
> +		if (status)
> +			goto out_free;
> +		return 0;
> +	}
>  	spin_lock(&recall_lock);
>  	if (fp->fi_had_conflict) {
>  		spin_unlock(&recall_lock);
> -		return -EAGAIN;
> +		status = -EAGAIN;
> +		goto out_free;
>  	}
>  	atomic_inc(&fp->fi_delegees);
>  	list_add(&dp->dl_perfile, &fp->fi_delegations);
>  	spin_unlock(&recall_lock);
>  	list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
>  	return 0;
> +out_free:
> +	put_nfs4_file(fp);
> +	dp->dl_file = fp;
> +	return status;
>  }
>
>  static void nfsd4_open_deleg_none_ext(struct nfsd4_open *open, int status)
> @@ -3134,7 +3143,7 @@ nfs4_open_delegation(struct net *net, struct svc_fh *fh,
>  	dp = alloc_init_deleg(oo->oo_owner.so_client, stp, fh);
>  	if (dp == NULL)
>  		goto out_no_deleg;
> -	status = nfs4_set_delegation(dp);
> +	status = nfs4_set_delegation(dp, stp->st_file);
>  	if (status)
>  		goto out_free;
>

I was pointed at this thread by the linux-ext4 folks as relevant to my
issue on kernels in the 3.10.x series. I see this commit was tagged for
3.12-rc2 in git, and I am wondering whether it will be backported to
earlier kernels. Or is my issue (an oops at shutdown) caused by
something else entirely? Thanks!

[183727.974779] EXT4-fs (dm-0): sb orphan head is 47193630
[183727.974864] sb_info orphan list:
[183727.974932]   inode dm-0:47193630 at ffff8802b98950f0: mode 100644, nlink 0, next 0
[183727.975039] ------------[ cut here ]------------
[183727.975108] kernel BUG at fs/ext4/super.c:804!
[183727.975177] invalid opcode: 0000 [#1] SMP
[183727.975341] Modules linked in: btrfs zlib_deflate ufs qnx4 hfsplus
hfs minix ntfs vfat msdos fat jfs xfs libcrc32c reiserfs ext3 jbd ext2
efivars cpuid fuse ecb pci_stub parport_pc ppdev lp parport
cpufreq_userspace cpufreq_stats cpufreq_conservative cpufreq_powersave
binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache
sunrpc usblp hid_microsoft dm_crypt dm_mod loop ecryptfs joydev
nvidia(PO) snd_hda_codec_realtek snd_hda_intel iTCO_wdt
iTCO_vendor_support snd_hda_codec mxm_wmi evdev snd_hwdep snd_pcm
snd_page_alloc coretemp snd_seq snd_timer snd_seq_device psmouse wmi
serio_raw snd i2c_i801 lpc_ich soundcore mfd_core i2c_core ehci_pci
ehci_hcd acpi_cpufreq mperf processor button thermal_sys ext4 crc16
jbd2 mbcache raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq md_mod sg sr_mod cdrom sd_mod crc_t10dif
hid_generic usbhid hid crc32c_intel ghash_clmulni_intel ahci libahci
libata scsi_mod aesni_intel xhci_hcd aes_x86_64 ablk_helper cryptd lrw
gf128mul glue_helper microcode usbcore usb_common e1000e ptp pps_core
[last unloaded: vboxdrv]
[183727.981527] CPU: 2 PID: 24609 Comm: umount Tainted: P           O 3.10.10+mfm #1
[183727.981614] Hardware name:                  /DZ68BC, BIOS BCZ6810H.86A.0027.2011.1013.1636 10/13/2011
[183727.981703] task: ffff8803e7b06810 ti: ffff8803fffe0000 task.ti: ffff8803fffe0000
[183727.981790] RIP: 0010:[<ffffffffa0209b62>]  [<ffffffffa0209b62>] ext4_put_super+0x256/0x310 [ext4]
[183727.981933] RSP: 0018:ffff8803fffe1e78  EFLAGS: 00010287
[183727.982003] RAX: 0000000000000047 RBX: ffff88040de47000 RCX: 00000000d2a7d2a7
[183727.982088] RDX: 000000000000508c RSI: 0000000000000046 RDI: ffffffff817a94a4
[183727.982174] RBP: ffff88040beb0800 R08: 0000000000000000 R09: 0000000000000100
[183727.982260] R10: 0000000000000100 R11: 0000000000000100 R12: ffff88040de47200
[183727.982345] R13: ffff88040de47200 R14: ffff88040de47190 R15: ffff8803fffe1f38
[183727.982432] FS:  00007fb64bfc17e0(0000) GS:ffff88041f500000(0000) knlGS:0000000000000000
[183727.982519] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[183727.982590] CR2: 00007f3fd0af4f80 CR3: 00000002e46b5000 CR4: 00000000000407e0
[183727.982676] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[183727.982762] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[183727.982847] Stack:
[183727.982911]  ffff880300000000 ffff8803fffe1e78 ffff88040beb0800 ffff88040beb08a0
[183727.983190]  ffffffffa0228fb0 ffff88040ea052a0 ffff88040ea05280 ffffffff810f9415
[183727.983468]  ffff88041e59dbc0 0000000000000083 ffff88040ea05280 ffffffff810f94ab
[183727.983745] Call Trace:
[183727.983813]  [<ffffffff810f9415>] ? generic_shutdown_super+0x4d/0xc5
[183727.983885]  [<ffffffff810f94ab>] ? kill_block_super+0x1e/0x5f
[183727.983958]  [<ffffffff810f97b2>] ? deactivate_locked_super+0x1b/0x46
[183727.984030]  [<ffffffff8110e8b0>] ? SyS_umount+0x2d0/0x2f1
[183727.984102]  [<ffffffff8136b912>] ? system_call_fastpath+0x16/0x1b
[183727.984173] Code: c7 c7 04 1b 23 a0 49 8b 54 24 78 48 81 c6 20 03 00 00 89 04 24 31 c0 e8 de 72 15 e1 4d 8b 24 24 4d 39 ec 0f 84 6e ff ff ff eb b7 <0f> 0b 48 8b bd 20 01 00 00 e8 b5 65 f1 e0 48 8b bb 50 02 00 00
[183727.987403] RIP  [<ffffffffa0209b62>] ext4_put_super+0x256/0x310 [ext4]
[183727.987525]  RSP <ffff8803fffe1e78>
[183727.987597] ---[ end trace eb19380900af1108 ]---
[183728.094179] EXT4-fs (sda2): re-mounted. Opts: (null)
[184000.112039] SysRq : Keyboard mode set to system default
[184001.631989] SysRq : Terminate All Tasks
[184269.335155] EXT4-fs (sda2): re-mounted. Opts: discard,errors=remount-ro


end of thread

Thread overview: 23+ messages
2013-08-11  9:48 Issues with a rather unusual configured NFS server Toralf Förster
2013-08-12 14:36 ` Jan Kara
2013-08-13 21:53   ` J. Bruce Fields
2013-08-14 16:44     ` Toralf Förster
2013-08-27 18:06     ` J. Bruce Fields
2013-08-28 17:21       ` Toralf Förster
2013-08-29  9:57         ` [uml-devel] " richard -rw- weinberger
2013-08-29 13:30           ` J. Bruce Fields
2013-08-30 14:10             ` Toralf Förster
2013-08-30 14:25               ` J. Bruce Fields
2013-08-30 14:36               ` Richard Weinberger
2013-08-30 18:27           ` Michael Richardson
2013-09-07 20:44         ` Toralf Förster
2013-09-07 20:51           ` [uml-devel] " richard -rw- weinberger
2013-09-10 14:09           ` J. Bruce Fields
2013-09-10 15:51             ` Toralf Förster
2013-09-22 16:58             ` Toralf Förster
2013-09-23 17:41               ` J. Bruce Fields
2013-10-02 20:29                 ` Toralf Förster
  -- strict thread matches above, loose matches on Subject: below --
2013-09-28 15:37 Marc Meledandri
2013-10-01 14:34 ` J. Bruce Fields
2013-10-02  1:24   ` Marc Meledandri
2013-10-02 14:04     ` J. Bruce Fields
