* More autofs problems
@ 2008-06-30  8:08 Carsten Aulbert
  2008-06-30  8:47 ` Ian Kent
  0 siblings, 1 reply; 20+ messages in thread
From: Carsten Aulbert @ 2008-06-30  8:08 UTC
  To: autofs

Hi again,

last weekend went quite badly for our nodes and autofs seems to die quite often.
Caveat: We had some networking problems during the weekend as well so this might
be triggered by failing network connections:

Jun 28 01:05:17 n1035 mountd[32688]: authenticated mount request from 10.10.11.47:603 for /local (/local)
Jun 28 01:05:24 n1035 mountd[32665]: authenticated mount request from 10.10.10.33:644 for /local (/local)
Jun 28 01:05:32 n1035 kernel: [1956924.510009] Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
Jun 28 01:05:32 n1035 kernel: [1956924.510027] [<ffffffff802710c4>] fput+0x0/0x11
Jun 28 01:05:32 n1035 kernel: [1956924.510466] PGD d0a52067 PUD d0841067 PMD 0
Jun 28 01:05:32 n1035 kernel: [1956924.510621] Oops: 0002 [1] SMP
Jun 28 01:05:32 n1035 kernel: [1956924.510773] CPU 3
Jun 28 01:05:32 n1035 kernel: [1956924.510919] Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 ipmi_si ipmi_devintf ipmi_watchdog ipmi_poweroff ipmi_msghandler i2c_i801 8250_pnp 8250 i2c_core e1000 serial_core
Jun 28 01:05:32 n1035 kernel: [1956924.511407] Pid: 25000, comm: get_file Not tainted 2.6.24.4-nodes #1
Jun 28 01:05:32 n1035 kernel: [1956924.511567] RIP: 0010:[<ffffffff802710c4>] [<ffffffff802710c4>] fput+0x0/0x11
Jun 28 01:05:32 n1035 kernel: [1956924.511862] RSP: 0018:ffff8100e792b950 EFLAGS: 00010246
Jun 28 01:05:32 n1035 kernel: [1956924.512014] RAX: 0000000000000001 RBX: ffff81020f26d180 RCX: ffff810216941408
Jun 28 01:05:32 n1035 kernel: [1956924.512307] RDX: fffffffffffffe00 RSI: 0000000000000206 RDI: 0000000000000000
Jun 28 01:05:32 n1035 kernel: [1956924.512608] RBP: 0000000000000000 R08: ffff8100e792a000 R09: 0000000000000000
Jun 28 01:05:32 n1035 kernel: [1956924.512902] R10: 0000000000000000 R11: ffff810216ae2e80 R12: ffff81020b2a8a80
Jun 28 01:05:32 n1035 kernel: [1956924.513197] R13: ffff8100e792b9e8 R14: ffff810000000000 R15: 0000000000000000
Jun 28 01:05:32 n1035 kernel: [1956924.513492] FS: 0000000040401960(0063) GS:ffff810217c771c0(0000) knlGS:0000000000000000
Jun 28 01:05:32 n1035 kernel: [1956924.513791] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 28 01:05:32 n1035 kernel: [1956924.513945] CR2: 0000000000000028 CR3: 00000000e4c85000 CR4: 00000000000006e0
Jun 28 01:05:32 n1035 kernel: [1956924.514239] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 28 01:05:32 n1035 kernel: [1956924.514535] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 28 01:05:32 n1035 kernel: [1956924.514831] Process get_file (pid: 25000, threadinfo ffff8100e792a000, task ffff8100d3005850)
Jun 28 01:05:32 n1035 kernel: [1956924.515128] Stack: ffffffff8806dc26 ffff81020f26d180 ffff81012d1db5c0 0000000000000110
Jun 28 01:05:32 n1035 kernel: [1956924.515433] ffffffff8806e222 000000011781cb40 ffff81020b2a8a80 ffff81021781cb40
Jun 28 01:05:32 n1035 kernel: [1956924.515734] ffff810217c2d680 ffff8100e792ba18 0000000000000000 0000000000000000
Jun 28 01:05:32 n1035 kernel: [1956924.515895] Call Trace:
Jun 28 01:05:32 n1035 kernel: [1956924.516186] [<ffffffff8806dc26>] :autofs4:autofs4_catatonic_mode+0x5e/0x6c
Jun 28 01:05:32 n1035 kernel: [1956924.516346] [<ffffffff8806e222>] :autofs4:autofs4_wait+0x5a2/0x76d
Jun 28 01:05:32 n1035 kernel: [1956924.516509] [<ffffffff8027ec71>] dput+0x1c/0x10b
Jun 28 01:05:32 n1035 kernel: [1956924.516667] [<ffffffff8806ccfd>] :autofs4:try_to_fill_dentry+0x6d/0xf1
Jun 28 01:05:32 n1035 kernel: [1956924.516826] [<ffffffff8806cf7b>] :autofs4:autofs4_revalidate+0xa6/0x17e
Jun 28 01:05:32 n1035 kernel: [1956924.516983] [<ffffffff8031ef47>] xfs_vn_permission+0x0/0x17
Jun 28 01:05:32 n1035 kernel: [1956924.517139] [<ffffffff8806d35f>] :autofs4:autofs4_lookup+0x289/0x33e
Jun 28 01:05:32 n1035 kernel: [1956924.517297] [<ffffffff80276cc6>] do_lookup+0xc2/0x1b8
Jun 28 01:05:32 n1035 kernel: [1956924.517452] [<ffffffff8027725e>] __link_path_walk+0x365/0xd0a
Jun 28 01:05:32 n1035 kernel: [1956924.517627] [<ffffffff8811abc6>] :nfs:nfs_writepages_callback+0x0/0x1e
Jun 28 01:05:32 n1035 kernel: [1956924.517785] [<ffffffff80277c52>] link_path_walk+0x4f/0xcb
Jun 28 01:05:32 n1035 kernel: [1956924.517941] [<ffffffff8026f7cc>] get_unused_fd_flags+0x77/0x112
Jun 28 01:05:32 n1035 kernel: [1956924.518097] [<ffffffff80277e6f>] do_path_lookup+0x1a1/0x1c5
Jun 28 01:05:32 n1035 kernel: [1956924.518252] [<ffffffff80277fa8>] __path_lookup_intent_open+0x50/0x8d
Jun 28 01:05:32 n1035 kernel: [1956924.518409] [<ffffffff802787e2>] open_namei+0x86/0x606
Jun 28 01:05:32 n1035 kernel: [1956924.518568] [<ffffffff8026f652>] do_filp_open+0x1c/0x3d
Jun 28 01:05:32 n1035 kernel: [1956924.518723] [<ffffffff8026f7cc>] get_unused_fd_flags+0x77/0x112
Jun 28 01:05:32 n1035 kernel: [1956924.518880] [<ffffffff8026f979>] do_sys_open+0x46/0xca
Jun 28 01:05:32 n1035 kernel: [1956924.519034] [<ffffffff8020be0e>] system_call+0x7e/0x83
Jun 28 01:05:32 n1035 kernel: [1956924.519187]
Jun 28 01:05:32 n1035 kernel: [1956924.519331]
Jun 28 01:05:32 n1035 kernel: [1956924.519332] Code: f0 ff 4f 28 0f 94 c0 84 c0 74 05 e9 92 fe ff ff c3 65 48 8b
Jun 28 01:05:32 n1035 kernel: [1956924.519813] RIP [<ffffffff802710c4>] fput+0x0/0x11
Jun 28 01:05:32 n1035 kernel: [1956924.519967] RSP <ffff8100e792b950>
Jun 28 01:05:32 n1035 kernel: [1956924.520115] CR2: 0000000000000028
Jun 28 01:05:32 n1035 kernel: [1956924.521139] ---[ end trace 436b46a0e9d6c8a6 ]---

We get several of these one after another. A little bit later we get:

Jun 28 01:05:32 n1035 rpc.idmapd[2133]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clnt32d3/idmap): Too many open files
Jun 28 01:05:32 n1035 last message repeated 2 times
Jun 28 01:05:35 n1035 mountd[32665]: authenticated unmount request from 10.10.8.70:867 for /local (/local)
Jun 28 01:05:50 n1035 mountd[32627]: authenticated unmount request from 10.10.13.37:600 for /local (/local)
Jun 28 01:05:52 n1035 rpc.idmapd[2133]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clnt32d3/idmap): Too many open files
Jun 28 01:05:52 n1035 last message repeated 2 times
Jun 28 01:05:52 n1035 automount[18299]: AUTOFS_IOC_READY: Invalid argument
Jun 28 01:05:52 n1035 last message repeated 5 times
Jun 28 01:05:52 n1035 rpc.idmapd[2133]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clnt32d3/idmap): Too many open files
Jun 28 01:05:53 n1035 last message repeated 10 times
Jun 28 01:05:53 n1035 automount[31737]: >> mount: n1037:/local failed, reason given by server: Permission denied
Jun 28 01:05:53 n1035 automount[31756]: >> mount: n0197:/local failed, reason given by server: Permission denied
Jun 28 01:05:53 n1035 automount[31743]: >> nfs bindresvport: Address already in use
Jun 28 01:05:53 n1035 automount[31766]: >> mount: n0365:/local: can't read superblock
Jun 28 01:05:53 n1035 automount[31754]: >> mount: n0717:/local: can't read superblock
Jun 28 01:05:53 n1035 automount[31778]: >> mount: n0986:/local: can't read superblock
Jun 28 01:05:53 n1035 automount[31739]: >> mount: n1315:/local: can't read superblock
Jun 28 01:05:53 n1035 automount[31774]: >> mount: n0919:/local: can't read superblock
Jun 28 01:05:53 n1035 automount[31781]: >> mount: n0006:/local: can't read superblock
Jun 28 01:05:53 n1035 automount[31741]: >> mount: n0592:/local: can't read superblock
Jun 28 01:05:53 n1035 automount[31753]: >> mount: n1080:/local: can't read superblock
Jun 28 01:05:53 n1035 automount[31757]: >> mount: n0168:/local: can't read superblock
Jun 28 01:05:53 n1035 automount[18299]: AUTOFS_IOC_READY: Invalid argument
Jun 28 01:05:53 n1035 automount[31787]: >> mount: n0677:/local: can't read superblock
Jun 28 01:05:53 n1035 automount[31785]: >> mount: n0806:/local: can't read superblock
Jun 28 01:05:53 n1035 automount[31760]: >> mount: n0868:/local: can't read superblock
Jun 28 01:05:53 n1035 automount[31756]: mount(nfs): nfs: mount failure n0197:/local on /atlas/node/n0197
Jun 28 01:05:53 n1035 automount[31737]: mount(nfs): nfs: mount failure n1037:/local on /atlas/node/n1037
Jun 28 01:05:53 n1035 automount[31745]: >> mount: n1105:/local: can't read superblock
Jun 28 01:05:53 n1035 automount[31743]: mount(nfs): nfs: mount failure n0937:/local on /atlas/node/n0937
Jun 28 01:05:53 n1035 rpc.idmapd[2133]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clnt32d3/idmap): Too many open files
Jun 28 01:05:53 n1035 automount[31753]: mount(nfs): nfs: mount failure n1080:/local on /atlas/node/n1080


I guess since the mounts are not umounted anymore that should explain why we run out of ports.

Any suggestions? Do you know this error?

Again these boxes are all Debian Etch with 2.6.24.4 kernel (vanilla) and autofs 4.1.4-13 on amd64.

TIA

Carsten

* Re: More autofs problems
  2008-06-30  8:08 More autofs problems Carsten Aulbert
@ 2008-06-30  8:47 ` Ian Kent
  2008-06-30 10:08 ` Carsten Aulbert
  0 siblings, 1 reply; 20+ messages in thread
From: Ian Kent @ 2008-06-30  8:47 UTC
  To: Carsten Aulbert; +Cc: autofs

On Mon, 2008-06-30 at 10:08 +0200, Carsten Aulbert wrote:
> Hi again,
>
> last weekend went quite badly for our nodes and autofs seems to die quite often.
> Caveat: We had some networking problems during the weekend as well so this might
> be triggered by failing network connections:

I have been made aware of this bug recently.

>
> Jun 28 01:05:17 n1035 mountd[32688]: authenticated mount request from 10.10.11.47:603 for /local (/local)
> Jun 28 01:05:24 n1035 mountd[32665]: authenticated mount request from 10.10.10.33:644 for /local (/local)
> Jun 28 01:05:32 n1035 kernel: [1956924.510009] Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
> Jun 28 01:05:32 n1035 kernel: [1956924.510027] [<ffffffff802710c4>] fput+0x0/0x11
> Jun 28 01:05:32 n1035 kernel: [1956924.510466] PGD d0a52067 PUD d0841067 PMD 0
> Jun 28 01:05:32 n1035 kernel: [1956924.510621] Oops: 0002 [1] SMP
> Jun 28 01:05:32 n1035 kernel: [1956924.510773] CPU 3
> Jun 28 01:05:32 n1035 kernel: [1956924.510919] Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 ipmi_si ipmi_devintf ipmi_watchdog ipmi_poweroff ipmi_msghandler i2c_i801 8250_pnp 8250 i2c_core e1000 serial_core
> Jun 28 01:05:32 n1035 kernel: [1956924.511407] Pid: 25000, comm: get_file Not tainted 2.6.24.4-nodes #1
> Jun 28 01:05:32 n1035 kernel: [1956924.511567] RIP: 0010:[<ffffffff802710c4>] [<ffffffff802710c4>] fput+0x0/0x11
> Jun 28 01:05:32 n1035 kernel: [1956924.511862] RSP: 0018:ffff8100e792b950 EFLAGS: 00010246
> Jun 28 01:05:32 n1035 kernel: [1956924.512014] RAX: 0000000000000001 RBX: ffff81020f26d180 RCX: ffff810216941408
> Jun 28 01:05:32 n1035 kernel: [1956924.512307] RDX: fffffffffffffe00 RSI: 0000000000000206 RDI: 0000000000000000
> Jun 28 01:05:32 n1035 kernel: [1956924.512608] RBP: 0000000000000000 R08: ffff8100e792a000 R09: 0000000000000000
> Jun 28 01:05:32 n1035 kernel: [1956924.512902] R10: 0000000000000000 R11: ffff810216ae2e80 R12: ffff81020b2a8a80
> Jun 28 01:05:32 n1035 kernel: [1956924.513197] R13: ffff8100e792b9e8 R14: ffff810000000000 R15: 0000000000000000
> Jun 28 01:05:32 n1035 kernel: [1956924.513492] FS: 0000000040401960(0063) GS:ffff810217c771c0(0000) knlGS:0000000000000000
> Jun 28 01:05:32 n1035 kernel: [1956924.513791] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Jun 28 01:05:32 n1035 kernel: [1956924.513945] CR2: 0000000000000028 CR3: 00000000e4c85000 CR4: 00000000000006e0
> Jun 28 01:05:32 n1035 kernel: [1956924.514239] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Jun 28 01:05:32 n1035 kernel: [1956924.514535] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Jun 28 01:05:32 n1035 kernel: [1956924.514831] Process get_file (pid: 25000, threadinfo ffff8100e792a000, task ffff8100d3005850)
> Jun 28 01:05:32 n1035 kernel: [1956924.515128] Stack: ffffffff8806dc26 ffff81020f26d180 ffff81012d1db5c0 0000000000000110
> Jun 28 01:05:32 n1035 kernel: [1956924.515433] ffffffff8806e222 000000011781cb40 ffff81020b2a8a80 ffff81021781cb40
> Jun 28 01:05:32 n1035 kernel: [1956924.515734] ffff810217c2d680 ffff8100e792ba18 0000000000000000 0000000000000000
> Jun 28 01:05:32 n1035 kernel: [1956924.515895] Call Trace:
> Jun 28 01:05:32 n1035 kernel: [1956924.516186] [<ffffffff8806dc26>] :autofs4:autofs4_catatonic_mode+0x5e/0x6c
> Jun 28 01:05:32 n1035 kernel: [1956924.516346] [<ffffffff8806e222>] :autofs4:autofs4_wait+0x5a2/0x76d
> Jun 28 01:05:32 n1035 kernel: [1956924.516509] [<ffffffff8027ec71>] dput+0x1c/0x10b
> Jun 28 01:05:32 n1035 kernel: [1956924.516667] [<ffffffff8806ccfd>] :autofs4:try_to_fill_dentry+0x6d/0xf1
> Jun 28 01:05:32 n1035 kernel: [1956924.516826] [<ffffffff8806cf7b>] :autofs4:autofs4_revalidate+0xa6/0x17e
> Jun 28 01:05:32 n1035 kernel: [1956924.516983] [<ffffffff8031ef47>] xfs_vn_permission+0x0/0x17
> Jun 28 01:05:32 n1035 kernel: [1956924.517139] [<ffffffff8806d35f>] :autofs4:autofs4_lookup+0x289/0x33e
> Jun 28 01:05:32 n1035 kernel: [1956924.517297] [<ffffffff80276cc6>] do_lookup+0xc2/0x1b8
> Jun 28 01:05:32 n1035 kernel: [1956924.517452] [<ffffffff8027725e>] __link_path_walk+0x365/0xd0a
> Jun 28 01:05:32 n1035 kernel: [1956924.517627] [<ffffffff8811abc6>] :nfs:nfs_writepages_callback+0x0/0x1e
> Jun 28 01:05:32 n1035 kernel: [1956924.517785] [<ffffffff80277c52>] link_path_walk+0x4f/0xcb
> Jun 28 01:05:32 n1035 kernel: [1956924.517941] [<ffffffff8026f7cc>] get_unused_fd_flags+0x77/0x112
> Jun 28 01:05:32 n1035 kernel: [1956924.518097] [<ffffffff80277e6f>] do_path_lookup+0x1a1/0x1c5
> Jun 28 01:05:32 n1035 kernel: [1956924.518252] [<ffffffff80277fa8>] __path_lookup_intent_open+0x50/0x8d
> Jun 28 01:05:32 n1035 kernel: [1956924.518409] [<ffffffff802787e2>] open_namei+0x86/0x606
> Jun 28 01:05:32 n1035 kernel: [1956924.518568] [<ffffffff8026f652>] do_filp_open+0x1c/0x3d
> Jun 28 01:05:32 n1035 kernel: [1956924.518723] [<ffffffff8026f7cc>] get_unused_fd_flags+0x77/0x112
> Jun 28 01:05:32 n1035 kernel: [1956924.518880] [<ffffffff8026f979>] do_sys_open+0x46/0xca
> Jun 28 01:05:32 n1035 kernel: [1956924.519034] [<ffffffff8020be0e>] system_call+0x7e/0x83
> Jun 28 01:05:32 n1035 kernel: [1956924.519187]
> Jun 28 01:05:32 n1035 kernel: [1956924.519331]
> Jun 28 01:05:32 n1035 kernel: [1956924.519332] Code: f0 ff 4f 28 0f 94 c0 84 c0 74 05 e9 92 fe ff ff c3 65 48 8b
> Jun 28 01:05:32 n1035 kernel: [1956924.519813] RIP [<ffffffff802710c4>] fput+0x0/0x11
> Jun 28 01:05:32 n1035 kernel: [1956924.519967] RSP <ffff8100e792b950>
> Jun 28 01:05:32 n1035 kernel: [1956924.520115] CR2: 0000000000000028
> Jun 28 01:05:32 n1035 kernel: [1956924.521139] ---[ end trace 436b46a0e9d6c8a6 ]---
>
> We get several of these one after another. A little bit later we get:

I'm not sure we really want to try to go further with the stuff below
until the breakage in the kernel is fixed.

>
> Jun 28 01:05:32 n1035 rpc.idmapd[2133]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clnt32d3/idmap): Too many open files
> Jun 28 01:05:32 n1035 last message repeated 2 times
> Jun 28 01:05:35 n1035 mountd[32665]: authenticated unmount request from 10.10.8.70:867 for /local (/local)
> Jun 28 01:05:50 n1035 mountd[32627]: authenticated unmount request from 10.10.13.37:600 for /local (/local)
> Jun 28 01:05:52 n1035 rpc.idmapd[2133]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clnt32d3/idmap): Too many open files
> Jun 28 01:05:52 n1035 last message repeated 2 times
> Jun 28 01:05:52 n1035 automount[18299]: AUTOFS_IOC_READY: Invalid argument
> Jun 28 01:05:52 n1035 last message repeated 5 times
> Jun 28 01:05:52 n1035 rpc.idmapd[2133]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clnt32d3/idmap): Too many open files
> Jun 28 01:05:53 n1035 last message repeated 10 times
> Jun 28 01:05:53 n1035 automount[31737]: >> mount: n1037:/local failed, reason given by server: Permission denied
> Jun 28 01:05:53 n1035 automount[31756]: >> mount: n0197:/local failed, reason given by server: Permission denied
> Jun 28 01:05:53 n1035 automount[31743]: >> nfs bindresvport: Address already in use
> Jun 28 01:05:53 n1035 automount[31766]: >> mount: n0365:/local: can't read superblock
> Jun 28 01:05:53 n1035 automount[31754]: >> mount: n0717:/local: can't read superblock
> Jun 28 01:05:53 n1035 automount[31778]: >> mount: n0986:/local: can't read superblock
> Jun 28 01:05:53 n1035 automount[31739]: >> mount: n1315:/local: can't read superblock
> Jun 28 01:05:53 n1035 automount[31774]: >> mount: n0919:/local: can't read superblock
> Jun 28 01:05:53 n1035 automount[31781]: >> mount: n0006:/local: can't read superblock
> Jun 28 01:05:53 n1035 automount[31741]: >> mount: n0592:/local: can't read superblock
> Jun 28 01:05:53 n1035 automount[31753]: >> mount: n1080:/local: can't read superblock
> Jun 28 01:05:53 n1035 automount[31757]: >> mount: n0168:/local: can't read superblock
> Jun 28 01:05:53 n1035 automount[18299]: AUTOFS_IOC_READY: Invalid argument
> Jun 28 01:05:53 n1035 automount[31787]: >> mount: n0677:/local: can't read superblock
> Jun 28 01:05:53 n1035 automount[31785]: >> mount: n0806:/local: can't read superblock
> Jun 28 01:05:53 n1035 automount[31760]: >> mount: n0868:/local: can't read superblock
> Jun 28 01:05:53 n1035 automount[31756]: mount(nfs): nfs: mount failure n0197:/local on /atlas/node/n0197
> Jun 28 01:05:53 n1035 automount[31737]: mount(nfs): nfs: mount failure n1037:/local on /atlas/node/n1037
> Jun 28 01:05:53 n1035 automount[31745]: >> mount: n1105:/local: can't read superblock
> Jun 28 01:05:53 n1035 automount[31743]: mount(nfs): nfs: mount failure n0937:/local on /atlas/node/n0937
> Jun 28 01:05:53 n1035 rpc.idmapd[2133]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clnt32d3/idmap): Too many open files
> Jun 28 01:05:53 n1035 automount[31753]: mount(nfs): nfs: mount failure n1080:/local on /atlas/node/n1080
>
>
> I guess since the mounts are not umounted anymore that should explain why we run out of ports.
>
> Any suggestions? Do you know this error?
>
> Again these boxes are all Debian Etch with 2.6.24.4 kernel (vanilla) and autofs 4.1.4-13 on amd64.

I can produce patches for 2.6.24.4 if you would like to test them.
They are quite recent and haven't been posted for inclusion in any
kernel yet. I would also include some other patches, that I hope will
get into mainline, in the series.

Ian

* Re: More autofs problems
  2008-06-30  8:47 ` Ian Kent
@ 2008-06-30 10:08 ` Carsten Aulbert
  2008-06-30 10:35 ` Ian Kent
  0 siblings, 1 reply; 20+ messages in thread
From: Carsten Aulbert @ 2008-06-30 10:08 UTC
  To: autofs

Hi Ian,

Ian Kent wrote:
f 4.1.4-13 on amd64.
>
> I can produce patches for 2.6.24.4 if you would like to test them.
> They are quite recent and haven't been posted for inclusion in any
> kernel yet. I would also include some other patches, that I hope will
> get into mainline, in the series.

I think that would be a good option. I do hope there are not that many
side effects with the other patches, but I definitely would like to get
away from this bug.

Thanks a lot

Carsten

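A note on reading the oops quoted earlier in the thread, offered as a rough sketch rather than as the kernel code itself. The faulting bytes "f0 ff 4f 28" decode to "lock decl 0x28(%rdi)", and with RDI = 0 and CR2 = 0x28 that looks like fput() being handed a NULL file pointer from autofs4_catatonic_mode(). One way this can happen is catatonic mode being entered a second time, after sbi->pipe has already been set to NULL. Ian's patch further down guards against that by taking sbi->wq_mutex and returning early if sbi->catatonic is already set. The userspace model below illustrates the same guard. The names fake_file, fake_sbi, fake_fput, enter_catatonic and demo_double_entry are hypothetical stand-ins, and a pthread mutex is assumed in place of the kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct file: just a refcount. */
struct fake_file {
	int refcount;
};

/* Hypothetical stand-in for struct autofs_sb_info. */
struct fake_sbi {
	pthread_mutex_t wq_mutex;
	int catatonic;
	struct fake_file *pipe;
	int pipefd;
};

/* Counts how often the pipe reference was dropped. */
int put_calls;

/* Models fput(): it must never be called with NULL. */
void fake_fput(struct fake_file *f)
{
	assert(f != NULL);	/* the oops is the kernel's version of this failing */
	put_calls++;
	f->refcount--;
}

/* Teardown that is safe to call more than once. */
void enter_catatonic(struct fake_sbi *sbi)
{
	pthread_mutex_lock(&sbi->wq_mutex);
	if (sbi->catatonic) {
		/* Already torn down: do not touch sbi->pipe again. */
		pthread_mutex_unlock(&sbi->wq_mutex);
		return;
	}
	sbi->catatonic = 1;
	fake_fput(sbi->pipe);	/* drop the pipe exactly once */
	sbi->pipe = NULL;
	sbi->pipefd = -1;
	pthread_mutex_unlock(&sbi->wq_mutex);
}

/* Enters catatonic mode twice and reports how many puts happened. */
int demo_double_entry(void)
{
	struct fake_file pipe = { .refcount = 1 };
	struct fake_sbi sbi = { .catatonic = 0, .pipe = &pipe, .pipefd = 3 };

	pthread_mutex_init(&sbi.wq_mutex, NULL);
	put_calls = 0;
	enter_catatonic(&sbi);
	enter_catatonic(&sbi);	/* second call returns early */
	pthread_mutex_destroy(&sbi.wq_mutex);
	return put_calls;
}
```

Without the early return, the second call in demo_double_entry() would pass NULL to fake_fput() and trip the assertion, which is the userspace analogue of the NULL dereference in the log.
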
* Re: More autofs problems
  2008-06-30 10:08 ` Carsten Aulbert
@ 2008-06-30 10:35 ` Ian Kent
  2008-06-30 13:18 ` Carsten Aulbert
  0 siblings, 1 reply; 20+ messages in thread
From: Ian Kent @ 2008-06-30 10:35 UTC
  To: Carsten Aulbert; +Cc: autofs

On Mon, 2008-06-30 at 12:08 +0200, Carsten Aulbert wrote:
> Hi Ian,
>
> Ian Kent wrote:
> f 4.1.4-13 on amd64.
> >
> > I can produce patches for 2.6.24.4 if you would like to test them.
> > They are quite recent and haven't been posted for inclusion in any
> > kernel yet. I would also include some other patches, that I hope will
> > get into mainline, in the series.
>
> I think that would be a good option. I do hope there are not that many
> side effects with the other patches, but I definitely would like to get
> away from this bug.

This patch is a combined diff of the current bug fixes for autofs4,
against 2.6.24 (it applied OK to vanilla 2.6.24.4).

See how it goes.

Ian

--- linux-2.6.24.orig/fs/autofs4/waitq.c
+++ linux-2.6.24/fs/autofs4/waitq.c
@@ -28,6 +28,12 @@ void autofs4_catatonic_mode(struct autof
 {
 	struct autofs_wait_queue *wq, *nwq;
 
+	mutex_lock(&sbi->wq_mutex);
+	if (sbi->catatonic) {
+		mutex_unlock(&sbi->wq_mutex);
+		return;
+	}
+
 	DPRINTK("entering catatonic mode");
 
 	sbi->catatonic = 1;
@@ -36,13 +42,18 @@ void autofs4_catatonic_mode(struct autof
 	while (wq) {
 		nwq = wq->next;
 		wq->status = -ENOENT; /* Magic is gone - report failure */
-		kfree(wq->name);
-		wq->name = NULL;
+		if (wq->name.name) {
+			kfree(wq->name.name);
+			wq->name.name = NULL;
+		}
+		wq->wait_ctr--;
 		wake_up_interruptible(&wq->queue);
 		wq = nwq;
 	}
 	fput(sbi->pipe); /* Close the pipe */
 	sbi->pipe = NULL;
+	sbi->pipefd = -1;
+	mutex_unlock(&sbi->wq_mutex);
 }
 
 static int autofs4_write(struct file *file, const void *addr, int bytes)
@@ -89,10 +100,11 @@ static void autofs4_notify_daemon(struct
 		union autofs_packet_union v4_pkt;
 		union autofs_v5_packet_union v5_pkt;
 	} pkt;
+	struct file *pipe = NULL;
 	size_t pktsz;
 
 	DPRINTK("wait id = 0x%08lx, name = %.*s, type=%d",
-		wq->wait_queue_token, wq->len, wq->name, type);
+		wq->wait_queue_token, wq->name.len, wq->name.name, type);
 
 	memset(&pkt,0,sizeof pkt); /* For security reasons */
 
@@ -107,9 +119,9 @@ static void autofs4_notify_daemon(struct
 		pktsz = sizeof(*mp);
 
 		mp->wait_queue_token = wq->wait_queue_token;
-		mp->len = wq->len;
-		memcpy(mp->name, wq->name, wq->len);
-		mp->name[wq->len] = '\0';
+		mp->len = wq->name.len;
+		memcpy(mp->name, wq->name.name, wq->name.len);
+		mp->name[wq->name.len] = '\0';
 		break;
 	}
 	case autofs_ptype_expire_multi:
@@ -119,9 +131,9 @@ static void autofs4_notify_daemon(struct
 		pktsz = sizeof(*ep);
 
 		ep->wait_queue_token = wq->wait_queue_token;
-		ep->len = wq->len;
-		memcpy(ep->name, wq->name, wq->len);
-		ep->name[wq->len] = '\0';
+		ep->len = wq->name.len;
+		memcpy(ep->name, wq->name.name, wq->name.len);
+		ep->name[wq->name.len] = '\0';
 		break;
 	}
 	/*
@@ -138,9 +150,9 @@ static void autofs4_notify_daemon(struct
 		pktsz = sizeof(*packet);
 
 		packet->wait_queue_token = wq->wait_queue_token;
-		packet->len = wq->len;
-		memcpy(packet->name, wq->name, wq->len);
-		packet->name[wq->len] = '\0';
+		packet->len = wq->name.len;
+		memcpy(packet->name, wq->name.name, wq->name.len);
+		packet->name[wq->name.len] = '\0';
 		packet->dev = wq->dev;
 		packet->ino = wq->ino;
 		packet->uid = wq->uid;
@@ -154,8 +166,19 @@ static void autofs4_notify_daemon(struct
 		return;
 	}
 
-	if (autofs4_write(sbi->pipe, &pkt, pktsz))
-		autofs4_catatonic_mode(sbi);
+	/* Check if we have become catatonic */
+	mutex_lock(&sbi->wq_mutex);
+	if (!sbi->catatonic) {
+		pipe = sbi->pipe;
+		get_file(pipe);
+	}
+	mutex_unlock(&sbi->wq_mutex);
+
+	if (pipe) {
+		if (autofs4_write(pipe, &pkt, pktsz))
+			autofs4_catatonic_mode(sbi);
+		fput(pipe);
+	}
 }
 
 static int autofs4_getpath(struct autofs_sb_info *sbi,
@@ -171,7 +194,7 @@ static int autofs4_getpath(struct autofs
 	for (tmp = dentry ; tmp != root ; tmp = tmp->d_parent)
 		len += tmp->d_name.len + 1;
 
-	if (--len > NAME_MAX) {
+	if (!len || --len > NAME_MAX) {
 		spin_unlock(&dcache_lock);
 		return 0;
 	}
@@ -191,58 +214,55 @@ static int autofs4_getpath(struct autofs
 }
 
 static struct autofs_wait_queue *
-autofs4_find_wait(struct autofs_sb_info *sbi,
-		  char *name, unsigned int hash, unsigned int len)
+autofs4_find_wait(struct autofs_sb_info *sbi, struct qstr *qstr)
 {
 	struct autofs_wait_queue *wq;
 
 	for (wq = sbi->queues; wq; wq = wq->next) {
-		if (wq->hash == hash &&
-		    wq->len == len &&
-		    wq->name && !memcmp(wq->name, name, len))
+		if (wq->name.hash == qstr->hash &&
+		    wq->name.len == qstr->len &&
+		    wq->name.name &&
+			 !memcmp(wq->name.name, qstr->name, qstr->len))
 			break;
 	}
 	return wq;
 }
 
-int autofs4_wait(struct autofs_sb_info *sbi, struct dentry *dentry,
-		enum autofs_notify notify)
+/*
+ * Check if we have a valid request.
+ * Returns
+ * 1 if the request should continue.
+ *   In this case we can return an autofs_wait_queue entry if one is
+ *   found or NULL to idicate a new wait needs to be created.
+ * 0 or a negative errno if the request shouldn't continue.
+ */
+static int validate_request(struct autofs_wait_queue **wait,
+			    struct autofs_sb_info *sbi,
+			    struct qstr *qstr,
+			    struct dentry*dentry, enum autofs_notify notify)
 {
-	struct autofs_info *ino;
 	struct autofs_wait_queue *wq;
-	char *name;
-	unsigned int len = 0;
-	unsigned int hash = 0;
-	int status, type;
-
-	/* In catatonic mode, we don't wait for nobody */
-	if (sbi->catatonic)
-		return -ENOENT;
-
-	name = kmalloc(NAME_MAX + 1, GFP_KERNEL);
-	if (!name)
-		return -ENOMEM;
+	struct autofs_info *ino;
 
-	/* If this is a direct mount request create a dummy name */
-	if (IS_ROOT(dentry) && (sbi->type & AUTOFS_TYPE_DIRECT))
-		len = sprintf(name, "%p", dentry);
-	else {
-		len = autofs4_getpath(sbi, dentry, &name);
-		if (!len) {
-			kfree(name);
-			return -ENOENT;
-		}
+	/* Wait in progress, continue; */
+	wq = autofs4_find_wait(sbi, qstr);
+	if (wq) {
+		*wait = wq;
+		return 1;
 	}
-	hash = full_name_hash(name, len);
 
-	if (mutex_lock_interruptible(&sbi->wq_mutex)) {
-		kfree(name);
-		return -EINTR;
-	}
+	*wait = NULL;
 
-	wq = autofs4_find_wait(sbi, name, hash, len);
+	/* If we don't yet have any info this is a new request */
 	ino = autofs4_dentry_ino(dentry);
-	if (!wq && ino && notify == NFY_NONE) {
+	if (!ino)
+		return 1;
+
+	/*
+	 * If we've been asked to wait on an existing expire (NFY_NONE)
+	 * but there is no wait in the queue ...
+	 */
+	if (notify == NFY_NONE) {
 		/*
 		 * Either we've betean the pending expire to post it's
 		 * wait or it finished while we waited on the mutex.
@@ -253,13 +273,14 @@ int autofs4_wait(struct autofs_sb_info *
 		while (ino->flags & AUTOFS_INF_EXPIRING) {
 			mutex_unlock(&sbi->wq_mutex);
 			schedule_timeout_interruptible(HZ/10);
-			if (mutex_lock_interruptible(&sbi->wq_mutex)) {
-				kfree(name);
+			if (mutex_lock_interruptible(&sbi->wq_mutex))
 				return -EINTR;
+
+			wq = autofs4_find_wait(sbi, qstr);
+			if (wq) {
+				*wait = wq;
+				return 1;
 			}
-			wq = autofs4_find_wait(sbi, name, hash, len);
-			if (wq)
-				break;
 		}
 
 		/*
@@ -267,18 +288,85 @@ int autofs4_wait(struct autofs_sb_info *
 		 * cases where we wait on NFY_NONE neither depend on the
 		 * return status of the wait.
 		 */
-		if (!wq) {
+		return 0;
+	}
+
+	/*
+	 * If we've been asked to trigger a mount and the request
+	 * completed while we waited on the mutex ...
+	 */
+	if (notify == NFY_MOUNT) {
+		/*
+		 * If the dentry isn't hashed just go ahead and try the
+		 * mount again with a new wait (not much else we can do).
+		 */
+		if (!d_unhashed(dentry)) {
+			/*
+			 * But if the dentry is hashed, that means that we
+			 * got here through the revalidate path. Thus, we
+			 * need to check if the dentry has been mounted
+			 * while we waited on the wq_mutex. If it has,
+			 * simply return success.
+			 */
+			if (d_mountpoint(dentry))
+				return 0;
+		}
+	}
+
+	return 1;
+}
+
+int autofs4_wait(struct autofs_sb_info *sbi, struct dentry *dentry,
+		enum autofs_notify notify)
+{
+	struct autofs_wait_queue *wq;
+	struct qstr qstr;
+	char *name;
+	int status, ret, type;
+
+	/* In catatonic mode, we don't wait for nobody */
+	if (sbi->catatonic)
+		return -ENOENT;
+
+	if (!dentry->d_inode &&
+	    (sbi->type & (AUTOFS_TYPE_DIRECT | AUTOFS_TYPE_OFFSET)))
+		return -ENOENT;
+
+	name = kmalloc(NAME_MAX + 1, GFP_KERNEL);
+	if (!name)
+		return -ENOMEM;
+
+	/* If this is a direct mount request create a dummy name */
+	if (IS_ROOT(dentry) && (sbi->type & AUTOFS_TYPE_DIRECT))
+		qstr.len = sprintf(name, "%p", dentry);
+	else {
+		qstr.len = autofs4_getpath(sbi, dentry, &name);
+		if (!qstr.len) {
 			kfree(name);
-			mutex_unlock(&sbi->wq_mutex);
-			return 0;
+			return -ENOENT;
 		}
 	}
+	qstr.name = name;
+	qstr.hash = full_name_hash(name, qstr.len);
+
+	if (mutex_lock_interruptible(&sbi->wq_mutex)) {
+		kfree(qstr.name);
+		return -EINTR;
+	}
+
+	ret = validate_request(&wq, sbi, &qstr, dentry, notify);
+	if (ret <= 0) {
+		if (ret == 0)
+			mutex_unlock(&sbi->wq_mutex);
+		kfree(qstr.name);
+		return ret;
+	}
 
 	if (!wq) {
 		/* Create a new wait queue */
 		wq = kmalloc(sizeof(struct autofs_wait_queue),GFP_KERNEL);
 		if (!wq) {
-			kfree(name);
+			kfree(qstr.name);
 			mutex_unlock(&sbi->wq_mutex);
 			return -ENOMEM;
 		}
@@ -289,9 +377,7 @@ int autofs4_wait(struct autofs_sb_info *
 		wq->next = sbi->queues;
 		sbi->queues = wq;
 		init_waitqueue_head(&wq->queue);
-		wq->hash = hash;
-		wq->name = name;
-		wq->len = len;
+		memcpy(&wq->name, &qstr, sizeof(struct qstr));
 		wq->dev = autofs4_get_dev(sbi);
 		wq->ino = autofs4_get_ino(sbi);
 		wq->uid = current->uid;
@@ -299,7 +385,7 @@ int autofs4_wait(struct autofs_sb_info *
 		wq->pid = current->pid;
 		wq->tgid = current->tgid;
 		wq->status = -EINTR; /* Status return if interrupted */
-		atomic_set(&wq->wait_ctr, 2);
+		wq->wait_ctr = 2;
 		mutex_unlock(&sbi->wq_mutex);
 
 		if (sbi->version < 5) {
@@ -319,28 +405,25 @@ int autofs4_wait(struct autofs_sb_info *
 		}
 
 		DPRINTK("new wait id = 0x%08lx, name = %.*s, nfy=%d\n",
-			(unsigned long) wq->wait_queue_token, wq->len, wq->name, notify);
+			(unsigned long) wq->wait_queue_token, wq->name.len,
+			wq->name.name, notify);
 
 		/* autofs4_notify_daemon() may block */
 		autofs4_notify_daemon(sbi, wq, type);
 	} else {
-		atomic_inc(&wq->wait_ctr);
+		wq->wait_ctr++;
 		mutex_unlock(&sbi->wq_mutex);
-		kfree(name);
+		kfree(qstr.name);
 		DPRINTK("existing wait id = 0x%08lx, name = %.*s, nfy=%d",
-			(unsigned long) wq->wait_queue_token, wq->len, wq->name, notify);
+			(unsigned long) wq->wait_queue_token, wq->name.len,
+			wq->name.name, notify);
 	}
 
-	/* wq->name is NULL if and only if the lock is already released */
-
-	if (sbi->catatonic) {
-		/* We might have slept, so check again for catatonic mode */
-		wq->status = -ENOENT;
-		kfree(wq->name);
-		wq->name = NULL;
-	}
-
-	if (wq->name) {
+	/*
+	 * wq->name.name is NULL iff the lock is already released
+	 * or the mount has been made catatonic.
+	 */
+	if (wq->name.name) {
 		/* Block all but "shutdown" signals while waiting */
 		sigset_t oldset;
 		unsigned long irqflags;
@@ -351,7 +434,7 @@ int autofs4_wait(struct autofs_sb_info *
 		recalc_sigpending();
 		spin_unlock_irqrestore(&current->sighand->siglock, irqflags);
 
-		wait_event_interruptible(wq->queue, wq->name == NULL);
+		wait_event_interruptible(wq->queue, wq->name.name == NULL);
 
 		spin_lock_irqsave(&current->sighand->siglock, irqflags);
 		current->blocked = oldset;
@@ -364,8 +447,10 @@ int autofs4_wait(struct autofs_sb_info *
 	status = wq->status;
 
 	/* Are we the last process to need status? */
-	if (atomic_dec_and_test(&wq->wait_ctr))
+	mutex_lock(&sbi->wq_mutex);
+	if (!--wq->wait_ctr)
 		kfree(wq);
+	mutex_unlock(&sbi->wq_mutex);
 
 	return status;
 }
@@ -387,16 +472,13 @@ int autofs4_wait_release(struct autofs_s
 	}
 
 	*wql = wq->next; /* Unlink from chain */
-	mutex_unlock(&sbi->wq_mutex);
-	kfree(wq->name);
-	wq->name = NULL; /* Do not wait on this queue */
-
+	kfree(wq->name.name);
+	wq->name.name = NULL; /* Do not wait on this queue */
 	wq->status = status;
-
-	if (atomic_dec_and_test(&wq->wait_ctr)) /* Is anyone still waiting for this guy? */
+	wake_up_interruptible(&wq->queue);
+	if (!--wq->wait_ctr)
 		kfree(wq);
-	else
-		wake_up_interruptible(&wq->queue);
+	mutex_unlock(&sbi->wq_mutex);
 
 	return 0;
 }
--- linux-2.6.24.orig/fs/autofs4/expire.c
+++ linux-2.6.24/fs/autofs4/expire.c
@@ -73,8 +73,8 @@ static int autofs4_mount_busy(struct vfs
 	status = 0;
 done:
 	DPRINTK("returning = %d", status);
-	mntput(mnt);
 	dput(dentry);
+	mntput(mnt);
 	return status;
 }
 
@@ -333,7 +333,7 @@ static struct dentry *autofs4_expire_ind
 			/* Can we expire this guy */
 			if (autofs4_can_expire(dentry, timeout, do_now)) {
 				expired = dentry;
-				break;
+				goto found;
 			}
 			goto next;
 		}
@@ -352,7 +352,7 @@ static struct dentry *autofs4_expire_ind
 				inf->flags |= AUTOFS_INF_EXPIRING;
 				spin_unlock(&sbi->fs_lock);
 				expired = dentry;
-				break;
+				goto found;
 			}
 			spin_unlock(&sbi->fs_lock);
 		/*
@@ -363,7 +363,7 @@ static struct dentry *autofs4_expire_ind
 			expired = autofs4_check_leaves(mnt, dentry, timeout, do_now);
 			if (expired) {
 				dput(dentry);
-				break;
+				goto found;
 			}
 		}
 next:
@@ -371,18 +371,16 @@ next:
 		spin_lock(&dcache_lock);
 		next = next->next;
 	}
-
-	if (expired) {
-		DPRINTK("returning %p %.*s",
-			expired, (int)expired->d_name.len, expired->d_name.name);
-		spin_lock(&dcache_lock);
-		list_move(&expired->d_parent->d_subdirs, &expired->d_u.d_child);
-		spin_unlock(&dcache_lock);
-		return expired;
-	}
 	spin_unlock(&dcache_lock);
-
 	return NULL;
+
+found:
+	DPRINTK("returning %p %.*s",
+		expired, (int)expired->d_name.len, expired->d_name.name);
+	spin_lock(&dcache_lock);
+	list_move(&expired->d_parent->d_subdirs, &expired->d_u.d_child);
+	spin_unlock(&dcache_lock);
+	return expired;
 }
 
 /* Perform an expiry operation */
--- linux-2.6.24.orig/fs/autofs4/root.c
+++ linux-2.6.24/fs/autofs4/root.c
@@ -25,25 +25,25 @@ static int autofs4_dir_rmdir(struct inod
 static int autofs4_dir_mkdir(struct inode *,struct dentry *,int);
 static int autofs4_root_ioctl(struct inode *, struct file *,unsigned int,unsigned long);
 static int autofs4_dir_open(struct inode *inode, struct file *file);
-static int autofs4_dir_close(struct inode *inode, struct file *file);
-static int autofs4_dir_readdir(struct file * filp, void * dirent, filldir_t filldir);
-static int autofs4_root_readdir(struct file * filp, void * dirent, filldir_t filldir);
 static struct dentry *autofs4_lookup(struct inode *,struct dentry *, struct nameidata *);
 static void *autofs4_follow_link(struct dentry *, struct nameidata *);
 
+#define TRIGGER_FLAGS (LOOKUP_CONTINUE | LOOKUP_DIRECTORY)
+#define TRIGGER_INTENTS (LOOKUP_OPEN | LOOKUP_ACCESS | LOOKUP_CREATE)
+
 const struct file_operations autofs4_root_operations = {
 	.open = dcache_dir_open,
 	.release = dcache_dir_close,
 	.read = generic_read_dir,
-	.readdir = autofs4_root_readdir,
+	.readdir = dcache_readdir,
 	.ioctl = autofs4_root_ioctl,
 };
 
 const struct file_operations autofs4_dir_operations = {
 	.open = autofs4_dir_open,
-	.release = autofs4_dir_close,
+	.release = dcache_dir_close,
 	.read = generic_read_dir,
-	.readdir = autofs4_dir_readdir,
+	.readdir = dcache_readdir,
 };
 
 const struct inode_operations autofs4_indirect_root_inode_operations = {
@@ -70,42 +70,10 @@ const struct inode_operations autofs4_di
 	.rmdir = autofs4_dir_rmdir,
 };
 
-static int autofs4_root_readdir(struct file *file, void *dirent,
-				filldir_t filldir)
-{
-	struct autofs_sb_info *sbi = autofs4_sbi(file->f_path.dentry->d_sb);
-	int oz_mode = autofs4_oz_mode(sbi);
-
-	DPRINTK("called, filp->f_pos = %lld", file->f_pos);
-
-	/*
-
* Don't set reghost flag if: - * 1) f_pos is larger than zero -- we've already been here. - * 2) we haven't even enabled reghosting in the 1st place. - * 3) this is the daemon doing a readdir - */ - if (oz_mode && file->f_pos == 0 && sbi->reghost_enabled) - sbi->needs_reghost = 1; - - DPRINTK("needs_reghost = %d", sbi->needs_reghost); - - return dcache_readdir(file, dirent, filldir); -} - static int autofs4_dir_open(struct inode *inode, struct file *file) { struct dentry *dentry = file->f_path.dentry; - struct vfsmount *mnt = file->f_path.mnt; struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb); - struct dentry *cursor; - int status; - - status = dcache_dir_open(inode, file); - if (status) - goto out; - - cursor = file->private_data; - cursor->d_fsdata = NULL; DPRINTK("file=%p dentry=%p %.*s", file, dentry, dentry->d_name.len, dentry->d_name.name); @@ -113,136 +81,27 @@ static int autofs4_dir_open(struct inode if (autofs4_oz_mode(sbi)) goto out; - if (autofs4_ispending(dentry)) { - DPRINTK("dentry busy"); - dcache_dir_close(inode, file); - status = -EBUSY; - goto out; - } - - status = -ENOENT; - if (!d_mountpoint(dentry) && dentry->d_op && dentry->d_op->d_revalidate) { - struct nameidata nd; - int empty, ret; - - /* In case there are stale directory dentrys from a failed mount */ - spin_lock(&dcache_lock); - empty = list_empty(&dentry->d_subdirs); - spin_unlock(&dcache_lock); - - if (!empty) - d_invalidate(dentry); - - nd.flags = LOOKUP_DIRECTORY; - ret = (dentry->d_op->d_revalidate)(dentry, &nd); - - if (ret <= 0) { - if (ret < 0) - status = ret; - dcache_dir_close(inode, file); - goto out; - } - } - - if (d_mountpoint(dentry)) { - struct file *fp = NULL; - struct vfsmount *fp_mnt = mntget(mnt); - struct dentry *fp_dentry = dget(dentry); - - if (!autofs4_follow_mount(&fp_mnt, &fp_dentry)) { - dput(fp_dentry); - mntput(fp_mnt); - dcache_dir_close(inode, file); - goto out; - } - - fp = dentry_open(fp_dentry, fp_mnt, file->f_flags); - status = PTR_ERR(fp); - if 
(IS_ERR(fp)) { - dcache_dir_close(inode, file); - goto out; - } - cursor->d_fsdata = fp; - } - return 0; -out: - return status; -} - -static int autofs4_dir_close(struct inode *inode, struct file *file) -{ - struct dentry *dentry = file->f_path.dentry; - struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb); - struct dentry *cursor = file->private_data; - int status = 0; - - DPRINTK("file=%p dentry=%p %.*s", - file, dentry, dentry->d_name.len, dentry->d_name.name); - - if (autofs4_oz_mode(sbi)) - goto out; - - if (autofs4_ispending(dentry)) { - DPRINTK("dentry busy"); - status = -EBUSY; - goto out; - } - - if (d_mountpoint(dentry)) { - struct file *fp = cursor->d_fsdata; - if (!fp) { - status = -ENOENT; - goto out; - } - filp_close(fp, current->files); - } -out: - dcache_dir_close(inode, file); - return status; -} - -static int autofs4_dir_readdir(struct file *file, void *dirent, filldir_t filldir) -{ - struct dentry *dentry = file->f_path.dentry; - struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb); - struct dentry *cursor = file->private_data; - int status; - - DPRINTK("file=%p dentry=%p %.*s", - file, dentry, dentry->d_name.len, dentry->d_name.name); - - if (autofs4_oz_mode(sbi)) - goto out; - - if (autofs4_ispending(dentry)) { - DPRINTK("dentry busy"); - return -EBUSY; - } - - if (d_mountpoint(dentry)) { - struct file *fp = cursor->d_fsdata; - - if (!fp) - return -ENOENT; - - if (!fp->f_op || !fp->f_op->readdir) - goto out; + /* + * An empty directory in an autofs file system is always a + * mount point. The daemon must have failed to mount this + * during lookup so it doesn't exist. This can happen, for + * example, if user space returns an incorrect status for a + * mount request. Otherwise we're doing a readdir on the + * autofs file system so just let the libfs routines handle + * it. 
+ */ + if (!d_mountpoint(dentry) && __simple_empty(dentry)) + return -ENOENT; - status = vfs_readdir(fp, filldir, dirent); - file->f_pos = fp->f_pos; - if (status) - autofs4_copy_atime(file, fp); - return status; - } out: - return dcache_readdir(file, dirent, filldir); + return dcache_dir_open(inode, file); } static int try_to_fill_dentry(struct dentry *dentry, int flags) { struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb); struct autofs_info *ino = autofs4_dentry_ino(dentry); - int status = 0; + int status; /* Block on any pending expiry here; invalidate the dentry when expiration is done to trigger mount request with a new @@ -291,7 +150,7 @@ static int try_to_fill_dentry(struct den return status; } /* Trigger mount for path component or follow link */ - } else if (flags & (LOOKUP_CONTINUE | LOOKUP_DIRECTORY) || + } else if (flags & (TRIGGER_FLAGS | TRIGGER_INTENTS) || current->link_count) { DPRINTK("waiting for mount name=%.*s", dentry->d_name.len, dentry->d_name.name); @@ -318,7 +177,8 @@ static int try_to_fill_dentry(struct den spin_lock(&dentry->d_lock); dentry->d_flags &= ~DCACHE_AUTOFS_PENDING; spin_unlock(&dentry->d_lock); - return status; + + return 0; } /* For autofs direct mounts the follow link triggers the mount */ @@ -335,7 +195,7 @@ static void *autofs4_follow_link(struct nd->flags); /* If it's our master or we shouldn't trigger a mount we're done */ - lookup_type = nd->flags & (LOOKUP_CONTINUE | LOOKUP_DIRECTORY); + lookup_type = nd->flags & (TRIGGER_FLAGS | TRIGGER_INTENTS); if (oz_mode || !lookup_type) goto done; @@ -470,10 +330,12 @@ void autofs4_dentry_release(struct dentr struct autofs_sb_info *sbi = autofs4_sbi(de->d_sb); if (sbi) { - spin_lock(&sbi->rehash_lock); - if (!list_empty(&inf->rehash)) - list_del(&inf->rehash); - spin_unlock(&sbi->rehash_lock); + spin_lock(&sbi->lookup_lock); + if (!list_empty(&inf->active)) + list_del(&inf->active); + if (!list_empty(&inf->expiring)) + list_del(&inf->expiring); + 
spin_unlock(&sbi->lookup_lock); } inf->dentry = NULL; @@ -495,7 +357,7 @@ static struct dentry_operations autofs4_ .d_release = autofs4_dentry_release, }; -static struct dentry *autofs4_lookup_unhashed(struct autofs_sb_info *sbi, struct dentry *parent, struct qstr *name) +static struct dentry *autofs4_lookup_active(struct autofs_sb_info *sbi, struct dentry *parent, struct qstr *name) { unsigned int len = name->len; unsigned int hash = name->hash; @@ -503,14 +365,66 @@ static struct dentry *autofs4_lookup_unh struct list_head *p, *head; spin_lock(&dcache_lock); - spin_lock(&sbi->rehash_lock); - head = &sbi->rehash_list; + spin_lock(&sbi->lookup_lock); + head = &sbi->active_list; list_for_each(p, head) { struct autofs_info *ino; struct dentry *dentry; struct qstr *qstr; - ino = list_entry(p, struct autofs_info, rehash); + ino = list_entry(p, struct autofs_info, active); + dentry = ino->dentry; + + spin_lock(&dentry->d_lock); + + /* Already gone? */ + if (atomic_read(&dentry->d_count) == 0) + goto next; + + qstr = &dentry->d_name; + + if (dentry->d_name.hash != hash) + goto next; + if (dentry->d_parent != parent) + goto next; + + if (qstr->len != len) + goto next; + if (memcmp(qstr->name, str, len)) + goto next; + + if (d_unhashed(dentry)) { + dget(dentry); + spin_unlock(&dentry->d_lock); + spin_unlock(&sbi->lookup_lock); + spin_unlock(&dcache_lock); + return dentry; + } +next: + spin_unlock(&dentry->d_lock); + } + spin_unlock(&sbi->lookup_lock); + spin_unlock(&dcache_lock); + + return NULL; +} + +static struct dentry *autofs4_lookup_expiring(struct autofs_sb_info *sbi, struct dentry *parent, struct qstr *name) +{ + unsigned int len = name->len; + unsigned int hash = name->hash; + const unsigned char *str = name->name; + struct list_head *p, *head; + + spin_lock(&dcache_lock); + spin_lock(&sbi->lookup_lock); + head = &sbi->expiring_list; + list_for_each(p, head) { + struct autofs_info *ino; + struct dentry *dentry; + struct qstr *qstr; + + ino = list_entry(p, struct 
autofs_info, expiring); dentry = ino->dentry; spin_lock(&dentry->d_lock); @@ -532,33 +446,17 @@ static struct dentry *autofs4_lookup_unh goto next; if (d_unhashed(dentry)) { - struct autofs_info *ino = autofs4_dentry_ino(dentry); - struct inode *inode = dentry->d_inode; - - list_del_init(&ino->rehash); + list_del_init(&ino->expiring); dget(dentry); - /* - * Make the rehashed dentry negative so the VFS - * behaves as it should. - */ - if (inode) { - dentry->d_inode = NULL; - list_del_init(&dentry->d_alias); - spin_unlock(&dentry->d_lock); - spin_unlock(&sbi->rehash_lock); - spin_unlock(&dcache_lock); - iput(inode); - return dentry; - } spin_unlock(&dentry->d_lock); - spin_unlock(&sbi->rehash_lock); + spin_unlock(&sbi->lookup_lock); spin_unlock(&dcache_lock); return dentry; } next: spin_unlock(&dentry->d_lock); } - spin_unlock(&sbi->rehash_lock); + spin_unlock(&sbi->lookup_lock); spin_unlock(&dcache_lock); return NULL; @@ -568,7 +466,8 @@ next: static struct dentry *autofs4_lookup(struct inode *dir, struct dentry *dentry, struct nameidata *nd) { struct autofs_sb_info *sbi; - struct dentry *unhashed; + struct autofs_info *ino; + struct dentry *expiring, *unhashed; int oz_mode; DPRINTK("name = %.*s", @@ -584,8 +483,28 @@ static struct dentry *autofs4_lookup(str DPRINTK("pid = %u, pgrp = %u, catatonic = %d, oz_mode = %d", current->pid, task_pgrp_nr(current), sbi->catatonic, oz_mode); - unhashed = autofs4_lookup_unhashed(sbi, dentry->d_parent, &dentry->d_name); - if (!unhashed) { + expiring = autofs4_lookup_expiring(sbi, dentry->d_parent, &dentry->d_name); + if (expiring) { + /* + * If we are racing with expire the request might not + * be quite complete but the directory has been removed + * so it must have been successful, so just wait for it. 
+ */ + ino = autofs4_dentry_ino(expiring); + while (ino && (ino->flags & AUTOFS_INF_EXPIRING)) { + DPRINTK("wait for incomplete expire %p name=%.*s", + expiring, expiring->d_name.len, + expiring->d_name.name); + autofs4_wait(sbi, expiring, NFY_NONE); + DPRINTK("request completed"); + } + dput(expiring); + } + + unhashed = autofs4_lookup_active(sbi, dentry->d_parent, &dentry->d_name); + if (unhashed) + dentry = unhashed; + else { /* * Mark the dentry incomplete but don't hash it. We do this * to serialize our inode creation operations (symlink and @@ -599,39 +518,34 @@ static struct dentry *autofs4_lookup(str */ dentry->d_op = &autofs4_root_dentry_operations; - dentry->d_fsdata = NULL; - d_instantiate(dentry, NULL); - } else { - struct autofs_info *ino = autofs4_dentry_ino(unhashed); - DPRINTK("rehash %p with %p", dentry, unhashed); /* - * If we are racing with expire the request might not - * be quite complete but the directory has been removed - * so it must have been successful, so just wait for it. - * We need to ensure the AUTOFS_INF_EXPIRING flag is clear - * before continuing as revalidate may fail when calling - * try_to_fill_dentry (returning EAGAIN) if we don't. + * And we need to ensure that the same dentry is used for + * all following lookup calls until it is hashed so that + * the dentry flags are persistent throughout the request. 
*/ - while (ino && (ino->flags & AUTOFS_INF_EXPIRING)) { - DPRINTK("wait for incomplete expire %p name=%.*s", - unhashed, unhashed->d_name.len, - unhashed->d_name.name); - autofs4_wait(sbi, unhashed, NFY_NONE); - DPRINTK("request completed"); - } - dentry = unhashed; + ino = autofs4_init_ino(NULL, sbi, 0555); + if (!ino) + return ERR_PTR(-ENOMEM); + + dentry->d_fsdata = ino; + ino->dentry = dentry; + + spin_lock(&sbi->lookup_lock); + list_add(&ino->active, &sbi->active_list); + spin_unlock(&sbi->lookup_lock); + + d_instantiate(dentry, NULL); } if (!oz_mode) { spin_lock(&dentry->d_lock); dentry->d_flags |= DCACHE_AUTOFS_PENDING; spin_unlock(&dentry->d_lock); - } - - if (dentry->d_op && dentry->d_op->d_revalidate) { - mutex_unlock(&dir->i_mutex); - (dentry->d_op->d_revalidate)(dentry, nd); - mutex_lock(&dir->i_mutex); + if (dentry->d_op && dentry->d_op->d_revalidate) { + mutex_unlock(&dir->i_mutex); + (dentry->d_op->d_revalidate)(dentry, nd); + mutex_lock(&dir->i_mutex); + } } /* @@ -650,9 +564,11 @@ static struct dentry *autofs4_lookup(str return ERR_PTR(-ERESTARTNOINTR); } } - spin_lock(&dentry->d_lock); - dentry->d_flags &= ~DCACHE_AUTOFS_PENDING; - spin_unlock(&dentry->d_lock); + if (!oz_mode) { + spin_lock(&dentry->d_lock); + dentry->d_flags &= ~DCACHE_AUTOFS_PENDING; + spin_unlock(&dentry->d_lock); + } } /* @@ -683,7 +599,7 @@ static struct dentry *autofs4_lookup(str } if (unhashed) - return dentry; + return unhashed; return NULL; } @@ -705,20 +621,30 @@ static int autofs4_dir_symlink(struct in return -EACCES; ino = autofs4_init_ino(ino, sbi, S_IFLNK | 0555); - if (ino == NULL) - return -ENOSPC; + if (!ino) + return -ENOMEM; - ino->size = strlen(symname); - ino->u.symlink = cp = kmalloc(ino->size + 1, GFP_KERNEL); - - if (cp == NULL) { - kfree(ino); - return -ENOSPC; + spin_lock(&sbi->lookup_lock); + if (!list_empty(&ino->active)) + list_del_init(&ino->active); + spin_unlock(&sbi->lookup_lock); + + cp = kmalloc(ino->size + 1, GFP_KERNEL); + if (!cp) { + if 
(!dentry->d_fsdata) + kfree(ino); + return -ENOMEM; } strcpy(cp, symname); inode = autofs4_get_inode(dir->i_sb, ino); + if (!inode) { + kfree(cp); + if (!dentry->d_fsdata) + kfree(ino); + return -ENOMEM; + } d_add(dentry, inode); if (dir == dir->i_sb->s_root->d_inode) @@ -734,6 +660,8 @@ static int autofs4_dir_symlink(struct in atomic_inc(&p_ino->count); ino->inode = inode; + ino->size = strlen(symname); + ino->u.symlink = cp; dir->i_mtime = CURRENT_TIME; return 0; @@ -746,9 +674,8 @@ static int autofs4_dir_symlink(struct in * that the file no longer exists. However, doing that means that the * VFS layer can turn the dentry into a negative dentry. We don't want * this, because the unlink is probably the result of an expire. - * We simply d_drop it and add it to a rehash candidates list in the - * super block, which allows the dentry lookup to reuse it retaining - * the flags, such as expire in progress, in case we're racing with expire. + * We simply d_drop it and add it to a expiring list in the super block, + * which allows the dentry lookup to check for an incomplete expire. * * If a process is blocked on the dentry waiting for the expire to finish, * it will invalidate the dentry and try to mount with a new one. 
@@ -778,9 +705,10 @@ static int autofs4_dir_unlink(struct ino dir->i_mtime = CURRENT_TIME; spin_lock(&dcache_lock); - spin_lock(&sbi->rehash_lock); - list_add(&ino->rehash, &sbi->rehash_list); - spin_unlock(&sbi->rehash_lock); + spin_lock(&sbi->lookup_lock); + if (list_empty(&ino->expiring)) + list_add(&ino->expiring, &sbi->expiring_list); + spin_unlock(&sbi->lookup_lock); spin_lock(&dentry->d_lock); __d_drop(dentry); spin_unlock(&dentry->d_lock); @@ -806,9 +734,10 @@ static int autofs4_dir_rmdir(struct inod spin_unlock(&dcache_lock); return -ENOTEMPTY; } - spin_lock(&sbi->rehash_lock); - list_add(&ino->rehash, &sbi->rehash_list); - spin_unlock(&sbi->rehash_lock); + spin_lock(&sbi->lookup_lock); + if (list_empty(&ino->expiring)) + list_add(&ino->expiring, &sbi->expiring_list); + spin_unlock(&sbi->lookup_lock); spin_lock(&dentry->d_lock); __d_drop(dentry); spin_unlock(&dentry->d_lock); @@ -843,10 +772,20 @@ static int autofs4_dir_mkdir(struct inod dentry, dentry->d_name.len, dentry->d_name.name); ino = autofs4_init_ino(ino, sbi, S_IFDIR | 0555); - if (ino == NULL) - return -ENOSPC; + if (!ino) + return -ENOMEM; + + spin_lock(&sbi->lookup_lock); + if (!list_empty(&ino->active)) + list_del_init(&ino->active); + spin_unlock(&sbi->lookup_lock); inode = autofs4_get_inode(dir->i_sb, ino); + if (!inode) { + if (!dentry->d_fsdata) + kfree(ino); + return -ENOMEM; + } d_add(dentry, inode); if (dir == dir->i_sb->s_root->d_inode) @@ -899,44 +838,6 @@ static inline int autofs4_get_protosubve } /* - * Tells the daemon whether we need to reghost or not. Also, clears - * the reghost_needed flag. 
- */ -static inline int autofs4_ask_reghost(struct autofs_sb_info *sbi, int __user *p) -{ - int status; - - DPRINTK("returning %d", sbi->needs_reghost); - - status = put_user(sbi->needs_reghost, p); - if (status) - return status; - - sbi->needs_reghost = 0; - return 0; -} - -/* - * Enable / Disable reghosting ioctl() operation - */ -static inline int autofs4_toggle_reghost(struct autofs_sb_info *sbi, int __user *p) -{ - int status; - int val; - - status = get_user(val, p); - - DPRINTK("reghost = %d", val); - - if (status) - return status; - - /* turn on/off reghosting, with the val */ - sbi->reghost_enabled = val; - return 0; -} - -/* * Tells the daemon whether it can umount the autofs mount. */ static inline int autofs4_ask_umount(struct vfsmount *mnt, int __user *p) @@ -1000,11 +901,6 @@ static int autofs4_root_ioctl(struct ino case AUTOFS_IOC_SETTIMEOUT: return autofs4_get_set_timeout(sbi, p); - case AUTOFS_IOC_TOGGLEREGHOST: - return autofs4_toggle_reghost(sbi, p); - case AUTOFS_IOC_ASKREGHOST: - return autofs4_ask_reghost(sbi, p); - case AUTOFS_IOC_ASKUMOUNT: return autofs4_ask_umount(filp->f_path.mnt, p); --- linux-2.6.24.orig/fs/autofs4/autofs_i.h +++ linux-2.6.24/fs/autofs4/autofs_i.h @@ -52,7 +52,8 @@ struct autofs_info { int flags; - struct list_head rehash; + struct list_head active; + struct list_head expiring; struct autofs_sb_info *sbi; unsigned long last_used; @@ -74,9 +75,7 @@ struct autofs_wait_queue { struct autofs_wait_queue *next; autofs_wqt_t wait_queue_token; /* We use the following to see what we are waiting for */ - unsigned int hash; - unsigned int len; - char *name; + struct qstr name; u32 dev; u64 ino; uid_t uid; @@ -85,7 +84,7 @@ struct autofs_wait_queue { pid_t tgid; /* This is for status reporting upon return */ int status; - atomic_t wait_ctr; + unsigned int wait_ctr; }; #define AUTOFS_SBI_MAGIC 0x6d4a556d @@ -112,8 +111,9 @@ struct autofs_sb_info { struct mutex wq_mutex; spinlock_t fs_lock; struct autofs_wait_queue *queues; /* Wait 
queue pointer */ - spinlock_t rehash_lock; - struct list_head rehash_list; + spinlock_t lookup_lock; + struct list_head active_list; + struct list_head expiring_list; }; static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb) --- linux-2.6.24.orig/fs/autofs4/inode.c +++ linux-2.6.24/fs/autofs4/inode.c @@ -24,8 +24,10 @@ static void ino_lnkfree(struct autofs_info *ino) { - kfree(ino->u.symlink); - ino->u.symlink = NULL; + if (ino->u.symlink) { + kfree(ino->u.symlink); + ino->u.symlink = NULL; + } } struct autofs_info *autofs4_init_ino(struct autofs_info *ino, @@ -41,16 +43,18 @@ struct autofs_info *autofs4_init_ino(str if (ino == NULL) return NULL; - ino->flags = 0; - ino->mode = mode; - ino->inode = NULL; - ino->dentry = NULL; - ino->size = 0; - - INIT_LIST_HEAD(&ino->rehash); + if (!reinit) { + ino->flags = 0; + ino->inode = NULL; + ino->dentry = NULL; + ino->size = 0; + INIT_LIST_HEAD(&ino->active); + INIT_LIST_HEAD(&ino->expiring); + atomic_set(&ino->count, 0); + } + ino->mode = mode; ino->last_used = jiffies; - atomic_set(&ino->count, 0); ino->sbi = sbi; @@ -159,8 +163,8 @@ void autofs4_kill_sb(struct super_block if (!sbi) goto out_kill_sb; - if (!sbi->catatonic) - autofs4_catatonic_mode(sbi); /* Free wait queues, close pipe */ + /* Free wait queues, close pipe */ + autofs4_catatonic_mode(sbi); /* Clean up and release dangling references */ autofs4_force_release(sbi); @@ -333,8 +337,9 @@ int autofs4_fill_super(struct super_bloc mutex_init(&sbi->wq_mutex); spin_lock_init(&sbi->fs_lock); sbi->queues = NULL; - spin_lock_init(&sbi->rehash_lock); - INIT_LIST_HEAD(&sbi->rehash_list); + spin_lock_init(&sbi->lookup_lock); + INIT_LIST_HEAD(&sbi->active_list); + INIT_LIST_HEAD(&sbi->expiring_list); s->s_blocksize = 1024; s->s_blocksize_bits = 10; s->s_magic = AUTOFS_SUPER_MAGIC; --- linux-2.6.24.orig/fs/compat_ioctl.c +++ linux-2.6.24/fs/compat_ioctl.c @@ -2384,8 +2384,6 @@ COMPATIBLE_IOCTL(AUTOFS_IOC_PROTOVER) COMPATIBLE_IOCTL(AUTOFS_IOC_EXPIRE) 
COMPATIBLE_IOCTL(AUTOFS_IOC_EXPIRE_MULTI) COMPATIBLE_IOCTL(AUTOFS_IOC_PROTOSUBVER) -COMPATIBLE_IOCTL(AUTOFS_IOC_ASKREGHOST) -COMPATIBLE_IOCTL(AUTOFS_IOC_TOGGLEREGHOST) COMPATIBLE_IOCTL(AUTOFS_IOC_ASKUMOUNT) /* Raw devices */ COMPATIBLE_IOCTL(RAW_SETBIND) --- linux-2.6.24.orig/include/linux/auto_fs4.h +++ linux-2.6.24/include/linux/auto_fs4.h @@ -98,8 +98,6 @@ union autofs_v5_packet_union { #define AUTOFS_IOC_EXPIRE_INDIRECT AUTOFS_IOC_EXPIRE_MULTI #define AUTOFS_IOC_EXPIRE_DIRECT AUTOFS_IOC_EXPIRE_MULTI #define AUTOFS_IOC_PROTOSUBVER _IOR(0x93,0x67,int) -#define AUTOFS_IOC_ASKREGHOST _IOR(0x93,0x68,int) -#define AUTOFS_IOC_TOGGLEREGHOST _IOR(0x93,0x69,int) #define AUTOFS_IOC_ASKUMOUNT _IOR(0x93,0x70,int) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: More autofs problems
  2008-06-30 10:35 ` Ian Kent
@ 2008-06-30 13:18 ` Carsten Aulbert
  2008-06-30 13:26 ` Ian Kent
  2008-06-30 13:45 ` Steffen Grunewald
  0 siblings, 2 replies; 20+ messages in thread
From: Carsten Aulbert @ 2008-06-30 13:18 UTC
To: Ian Kent; +Cc: autofs

Hi Ian,

Ian Kent wrote:
>
> This patch is a combined diff of the current bug fixes for autofs4,
> against 2.6.24 (it applied OK to vanilla 2.6.24.4).

It applies fine and we are about to test it on a single test node now.
If that succeeds we will push it onto the cluster.

Keeping my fingers crossed

Thanks!

Carsten

* Re: More autofs problems
  2008-06-30 13:18 ` Carsten Aulbert
@ 2008-06-30 13:26 ` Ian Kent
  2008-06-30 14:15 ` Carsten Aulbert
  2008-06-30 13:45 ` Steffen Grunewald
  1 sibling, 1 reply; 20+ messages in thread
From: Ian Kent @ 2008-06-30 13:26 UTC
To: Carsten Aulbert; +Cc: autofs

On Mon, 2008-06-30 at 15:18 +0200, Carsten Aulbert wrote:
> Hi Ian,
>
> Ian Kent wrote:
> >
> > This patch is a combined diff of the current bug fixes for autofs4,
> > against 2.6.24 (it applied OK to vanilla 2.6.24.4).
>
> It applies fine and we are about to test it on a single test node now.
> If that succeeds we will push it onto the cluster.

Make sure you test thoroughly.

Ian

* Re: More autofs problems
  2008-06-30 13:26 ` Ian Kent
@ 2008-06-30 14:15 ` Carsten Aulbert
  2008-06-30 14:20 ` Ian Kent
  0 siblings, 1 reply; 20+ messages in thread
From: Carsten Aulbert @ 2008-06-30 14:15 UTC
To: Ian Kent; +Cc: autofs

Ian Kent wrote:

> Make sure you test thoroughly.

Sure! Hopefully we can stress it hard enough. Right now we are running
out of files for nfsopen *without* hitting a kernel trace again so far!
(good for you, somewhat bad for us ;)).

Can we do anything to help you and the autofs package? Give you a box
for testing, offer you a plane ticket + hotel to come here and do
massive debugging on our system,...?

Cheers

Carsten

-- 
Dr. Carsten Aulbert - Max Planck Institut für Gravitationsphysik
Callinstraße 38, 30167 Hannover, Germany
Fon: +49 511 762 17185, Fax: +49 511 762 17193
http://www.top500.org/system/9234 | http://www.top500.org/connfam/6/list/31

_______________________________________________
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs

* Re: More autofs problems
  2008-06-30 14:15 ` Carsten Aulbert
@ 2008-06-30 14:20 ` Ian Kent
  2008-06-30 14:32 ` Carsten Aulbert
  0 siblings, 1 reply; 20+ messages in thread
From: Ian Kent @ 2008-06-30 14:20 UTC
To: Carsten Aulbert; +Cc: autofs

On Mon, 2008-06-30 at 16:15 +0200, Carsten Aulbert wrote:
>
> Ian Kent wrote:
>
> > Make sure you test thoroughly.
>
> Sure! Hopefully we can stress it hard enough. Right now we are running
> out of files for nfsopen *without* hitting a kernel trace again so far!
> (good for you, somewhat bad for us ;)).
>
> Can we do anything to help you and the autofs package? Give you a box
> for testing, offer you a plane ticket + hotel to come here and do
> massive debugging on our system,...?

Hahahaha!
Nothing is that simple, especially for me, enough said.

But you're using version 4 which is a big minus for me as all my effort
has been going into version 5 for over 3 years now (admittedly it's only
been released for about 18 months).

Encouraging the Debian folks to qa version 5 would be probably the best
thing you could do for everyone.

Ian

* Re: More autofs problems
  2008-06-30 14:20 ` Ian Kent
@ 2008-06-30 14:32 ` Carsten Aulbert
  2008-06-30 14:41 ` Ian Kent
  2008-06-30 14:57 ` Steffen Grunewald
  0 siblings, 2 replies; 20+ messages in thread
From: Carsten Aulbert @ 2008-06-30 14:32 UTC
To: Ian Kent; +Cc: autofs

Ian Kent wrote:

> Encouraging the Debian folks to qa version 5 would be probably the best
> thing you could do for everyone.

It's already in Debian, but only for the next stable version aka lenny:

http://packages.debian.org/search?keywords=autofs5&searchon=names&suite=all&section=all

So, I don't know how much effort it would be to backport them to glibc
2.3 instead of 2.7 and so on...

Might be worth a try, if you say it's stable enough ;)

Cheers

Carsten

* Re: More autofs problems
  2008-06-30 14:32 ` Carsten Aulbert
@ 2008-06-30 14:41 ` Ian Kent
  2008-07-01 5:49 ` Carsten Aulbert
  2008-06-30 14:57 ` Steffen Grunewald
  1 sibling, 1 reply; 20+ messages in thread
From: Ian Kent @ 2008-06-30 14:41 UTC
To: Carsten Aulbert; +Cc: autofs

On Mon, 2008-06-30 at 16:32 +0200, Carsten Aulbert wrote:
>
> Ian Kent wrote:
>
> > Encouraging the Debian folks to qa version 5 would be probably the best
> > thing you could do for everyone.
>
> It's already in Debian, but only for the next stable version aka lenny:
>
> http://packages.debian.org/search?keywords=autofs5&searchon=names&suite=all&section=all
>
> So, I don't know how much effort it would be to backport them to glibc
> 2.3 instead of 2.7 and so on...

Should be OK, as long as the pthreads library is stable (mmm ... when
did nptl make it into glibc?) and you're running a recent (perhaps
patched) kernel.

>
> Might be worth a try, if you say it's stable enough ;)

Yeah, it's always hard to say "it's stable enough" because all I see
most of the time are bug reports and there are still some annoying
problems. But a lot of work has gone into v5 and reports are quite
positive. As well, bug fixes and enhancements tend to introduce new and
interesting bugs so it's an ongoing effort.

But then it's been included in RHEL-5 from initial release and it's
holding up fine.

Ian

* Re: More autofs problems
  2008-06-30 14:41 ` Ian Kent
@ 2008-07-01 5:49 ` Carsten Aulbert
  2008-07-01 6:03 ` Ian Kent
  0 siblings, 1 reply; 20+ messages in thread
From: Carsten Aulbert @ 2008-07-01 5:49 UTC
To: Ian Kent; +Cc: autofs

Hi again,

Ian Kent wrote:
> But then it's been included in RHEL-5 from initial release and it's
> holding up fine.

We'll try autofs5 on at least one node possibly later today, either Jan,
Steffen or I should succeed in backporting it (and getting around the
LDAP problem).

However I still have a v4-related question.

We're merciless and run this on one of the nodes:

$ cat test_mount
#!/bin/sh

n_node=1000

for i in `seq 1 $n_node`;do
        n=`echo $RANDOM%1342+10001 | bc| sed -e "s/1/n/"`
        $HOME/bin/mount.sh $n&
        echo -n .
done

$ cat mount.sh
#!/bin/sh

dir="/distributed/spray/data/EatH/S5R1"

ping -c1 -w1 $1 > /dev/null&& file="/atlas/node/$1$dir/"`ls -f /atlas/node/$1$dir/|head -n 50 | tail -n 1`
md5sum ${file}

Running this gives this in syslog:
Jul 1 07:37:19 n1312 rpc.idmapd[2309]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clntaa58/idmap): Too many open files
Jul 1 07:37:19 n1312 rpc.idmapd[2309]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clntaa58/idmap): Too many open files
Jul 1 07:37:19 n1312 rpc.idmapd[2309]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clntaa5e/idmap): Too many open files
Jul 1 07:37:19 n1312 rpc.idmapd[2309]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clntaa5e/idmap): Too many open files
Jul 1 07:37:19 n1312 rpc.idmapd[2309]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clntaa9c/idmap): Too many open files

Which is not surprising to me. However, there are a few things I'm
wondering about.

(1) Shall I try the nfs list at sourceforge or is that list only full of
spam?
(2) All our mounts use nfsvers=3 why is rpc.idmapd involved at all?
(3) Why is this daemon growing so extremely large?
# ps aux|grep rpc.idmapd
root 2309 0.1 16.2 2037152 1326944 ? Ss Jun30 1:24 /usr/sbin/rpc.idmapd
(4) The script maxes out at about 340 concurrent mounts, any idea how to
increase this number?
(5) Finally autofs related again: After running the script, /proc/mounts
has these leftovers:

n0765:/local /atlas/node/n0765 nfs rw,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.10.7.65 0 0
n1058:/local /atlas/node/n1058 nfs rw,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.10.10.58 0 0
n0232:/local /atlas/node/n0232 nfs rw,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.10.2.32 0 0
n0409:/local /atlas/node/n0409 nfs rw,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.10.4.9 0 0
n0022:/local /atlas/node/n0022 nfs rw,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.10.0.22 0 0
n0549:/local /atlas/node/n0549 nfs rw,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.10.5.49 0 0
n0016:/local /atlas/node/n0016 nfs rw,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.10.0.16 0 0
n0975:/local /atlas/node/n0975 nfs rw,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.10.9.75 0 0

Which I need to umount manually now, remove the empty directories under
/atlas/node before I could restart autofs.

Any idea or did we set-up our systems somewhat flawed?

Cheers

Carsten

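The "Too many open files" messages from rpc.idmapd suggest comparing the number of descriptors the daemon holds with its limit. The following is only a minimal sketch: it assumes the running kernel provides /proc/<pid>/limits and /proc/<pid>/fd, and the helper name fd_usage is hypothetical (it is not part of nfs-utils or autofs).

```shell
#!/bin/sh
# Hypothetical helper: report open descriptors versus the soft limit
# for the process whose /proc directory is given as $1.
fd_usage() {
    dir=$1
    # Each entry under <dir>/fd is one open file descriptor.
    open=$(ls "$dir/fd" 2>/dev/null | wc -l | tr -d ' ')
    # In the limits file the "Max open files" row carries the soft
    # limit in the 4th whitespace-separated field.
    limit=$(awk '/^Max open files/ { print $4 }' "$dir/limits")
    echo "open=$open soft_limit=$limit"
}
```

Run as root, `fd_usage "/proc/$(pidof rpc.idmapd)"` would print the figures for the running daemon. Whether this limit is what caps the test at about 340 concurrent mounts is not established in this thread.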
* Re: More autofs problems
  2008-07-01 5:49 ` Carsten Aulbert
@ 2008-07-01 6:03 ` Ian Kent
  2008-07-01 6:09 ` Carsten Aulbert
  2008-07-15 14:23 ` Steffen Grunewald
  0 siblings, 2 replies; 20+ messages in thread
From: Ian Kent @ 2008-07-01 6:03 UTC
To: Carsten Aulbert; +Cc: autofs

On Tue, 2008-07-01 at 07:49 +0200, Carsten Aulbert wrote:
> Hi again,
>
> Ian Kent wrote:
> > But then it's been included in RHEL-5 from initial release and it's
> > holding up fine.
>
> We'll try autofs5 on at least one node possibly later today, either Jan,
> Steffen or I should succeed in backporting it (and getting around the
> LDAP problem).

btw, I discovered a mistake in one of the patches included in the
previous patch. I'll post an updated patch later today. Sorry for the
trouble.

>
> However I still have a v4 related question.
>
> We're merciless and run this on one of the nodes:
>
> $ cat test_mount
> #!/bin/sh
>
> n_node=1000
>
> for i in `seq 1 $n_node`;do
>     n=`echo $RANDOM%1342+10001 | bc| sed -e "s/1/n/"`
>     $HOME/bin/mount.sh $n&
>     echo -n .
> done
>
> $ cat mount.sh
> #!/bin/sh
>
> dir="/distributed/spray/data/EatH/S5R1"
>
> ping -c1 -w1 $1 > /dev/null&& file="/atlas/node/$1$dir/"`ls -f /atlas/node/$1$dir/|head -n 50 | tail -n 1`
> md5sum ${file}
>
>
> Running this gives this in syslog:
> Jul 1 07:37:19 n1312 rpc.idmapd[2309]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clntaa58/idmap): Too many open files
> Jul 1 07:37:19 n1312 rpc.idmapd[2309]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clntaa58/idmap): Too many open files
> Jul 1 07:37:19 n1312 rpc.idmapd[2309]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clntaa5e/idmap): Too many open files
> Jul 1 07:37:19 n1312 rpc.idmapd[2309]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clntaa5e/idmap): Too many open files
> Jul 1 07:37:19 n1312 rpc.idmapd[2309]: nfsopen: open(/var/lib/nfs/rpc_pipefs/nfs/clntaa9c/idmap): Too many open files
>
> Which is not surprising to me. However, there are a few things I'm
> wondering about.
>
> (1) Shall I try the nfs list at sourceforge or is that list only full of
> spam?

Yep, for sure, the NFS maintainer is present on that list, hopefully he
will be able to help.

> (2) All our mounts use nfsvers=3 why is rpc.idmapd involved at all?

Not sure, I really must find time to get up to speed on that stuff.

> (3) Why is this daemon growing so extremely large?
> # ps aux|grep rpc.idmapd
> root 2309 0.1 16.2 2037152 1326944 ? Ss Jun30 1:24 /usr/sbin/rpc.idmapd

Ditto.

> (4) The script maxes out at about 340 concurrent mounts, any idea how to
> increase this number?

Complicated question.
We can go into that further in a separate thread if you see "bind to
reserved port" failure messages in the log; otherwise I'm not sure, so we
would need to investigate.

> (5) Finally autofs related again: After running the script, /proc/mounts
> has these leftovers:
>
> n0765:/local /atlas/node/n0765 nfs rw,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.10.7.65 0 0
> n1058:/local /atlas/node/n1058 nfs rw,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.10.10.58 0 0
> n0232:/local /atlas/node/n0232 nfs rw,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.10.2.32 0 0
> n0409:/local /atlas/node/n0409 nfs rw,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.10.4.9 0 0
> n0022:/local /atlas/node/n0022 nfs rw,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.10.0.22 0 0
> n0549:/local /atlas/node/n0549 nfs rw,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.10.5.49 0 0
> n0016:/local /atlas/node/n0016 nfs rw,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.10.0.16 0 0
> n0975:/local /atlas/node/n0975 nfs rw,vers=3,rsize=32768,wsize=32768,soft,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.10.9.75 0 0
>
>
> Which I need to umount manually now, remove the empty directories under
> /atlas/node before I could restart autofs.

Check if /etc/mtab is out of sync with /proc/mounts when you see this.

If so, then your mount(8) mtab locking is broken; otherwise it's something
else and, rather than try to dig up v4 patches, I'd recommend v5. I
haven't been able to completely resolve this in v5 yet, but it is much
better, so you will need to see how it goes.

Ian
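Editorial sketch of the /etc/mtab versus /proc/mounts check suggested above. It lists NFS entries that appear in one table but not the other. The function names nfs_entries and missing_from are invented for this example, and only the device and mount point columns of type "nfs" are compared; options are ignored. On systems where /etc/mtab is a symlink into /proc, both lists will simply be empty.

```shell
#!/bin/sh
# Sketch: find NFS mounts present in one mount table but absent from another.

# Print "device mountpoint" for every nfs entry in the table file $1, sorted.
nfs_entries() {
    awk '$3 == "nfs" { print $1, $2 }' "$1" | LC_ALL=C sort -u
}

# Print the nfs entries of table $1 that are missing from table $2.
missing_from() {
    a=$(mktemp) b=$(mktemp)
    nfs_entries "$1" > "$a"
    nfs_entries "$2" > "$b"
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

if [ -r /proc/mounts ] && [ -r /etc/mtab ]; then
    echo "In /proc/mounts but not in /etc/mtab:"
    missing_from /proc/mounts /etc/mtab
    echo "In /etc/mtab but not in /proc/mounts:"
    missing_from /etc/mtab /proc/mounts
fi
```

Entries in the first list are mounts the kernel still holds that mount(8) has lost track of, which would match the leftover symptom described in the quoted message.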
* Re: More autofs problems
  2008-07-01 6:03 ` Ian Kent
@ 2008-07-01 6:09 ` Carsten Aulbert
  2008-07-15 14:23 ` Steffen Grunewald
  1 sibling, 0 replies; 20+ messages in thread
From: Carsten Aulbert @ 2008-07-01 6:09 UTC
To: Ian Kent; +Cc: autofs

Ian Kent wrote:
> btw, I discovered a mistake in one of the patches included in the
> previous patch. I'll post an updated patch later today. Sorry for the
> trouble.
>

No problem, it has not killed the node and it's just one node right now.

>> (1) Shall I try the nfs list at sourceforge or is that list only full of
>> spam?
>
> Yep, for sure, the NFS maintainer is present on that list, hopefully he
> will be able to help.
>

OK

>> Which I need to umount manually now, remove the empty directories under
>> /atlas/node before I could restart autofs.
>
> Check if /etc/mtab is out of sync with /proc/mounts when you see this.
>

Yes, it is.

> If so then your mount(8) mtab locking is broken otherwise it's something
> else and, rather than try and dig up v4 patches, I'd recommend v5. I
> haven't been able to completely resolve this in v5 yet but it is much
> better so you will need to see how it goes.

OK, thanks!

Carsten

--
Dr. Carsten Aulbert - Max Planck Institut für Gravitationsphysik
Callinstraße 38, 30167 Hannover, Germany
Fon: +49 511 762 17185, Fax: +49 511 762 17193
http://www.top500.org/system/9234 | http://www.top500.org/connfam/6/list/31

_______________________________________________
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs
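Editorial sketch of the manual cleanup discussed in this exchange (umount the leftovers, then remove the empty directories under /atlas/node). It is a dry run: it only prints the commands it would suggest, taken from /proc/mounts rather than the out-of-sync /etc/mtab. The function name cleanup_plan is hypothetical, and the sketch assumes mount points without spaces.

```shell
#!/bin/sh
# Sketch: print (do not execute) cleanup commands for leftover NFS mounts.

# For every nfs mount below directory $2 listed in mount table $1, print a
# command that unmounts it and removes the directory only if umount succeeds.
cleanup_plan() {
    awk -v base="$2/" '$3 == "nfs" && index($2, base) == 1 {
        print "umount " $2 " && rmdir " $2
    }' "$1"
}

if [ -r /proc/mounts ]; then
    cleanup_plan /proc/mounts /atlas/node
fi
```

Printing rather than running the commands leaves room to review the list first; rmdir refuses to remove a non-empty directory, so a mount point that still has content is left alone.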
* Re: More autofs problems
  2008-07-01 6:03 ` Ian Kent
  2008-07-01 6:09 ` Carsten Aulbert
@ 2008-07-15 14:23 ` Steffen Grunewald
  2008-07-17 3:03 ` Ian Kent
  1 sibling, 1 reply; 20+ messages in thread
From: Steffen Grunewald @ 2008-07-15 14:23 UTC
To: Ian Kent; +Cc: autofs

On Tue, Jul 01, 2008 at 02:03:23PM +0800, Ian Kent wrote:
>
> btw, I discovered a mistake in one of the patches included in the
> previous patch. I'll post an updated patch later today. Sorry for the
> trouble.

Now that 2.6.26 is out, how many of your patches are left over (and would
still be recommended)?

Cheers
Steffen

--
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html
* Re: More autofs problems
  2008-07-15 14:23 ` Steffen Grunewald
@ 2008-07-17 3:03 ` Ian Kent
  0 siblings, 0 replies; 20+ messages in thread
From: Ian Kent @ 2008-07-17 3:03 UTC
To: Steffen Grunewald; +Cc: autofs

On Tue, 2008-07-15 at 16:23 +0200, Steffen Grunewald wrote:
> On Tue, Jul 01, 2008 at 02:03:23PM +0800, Ian Kent wrote:
> >
> > btw, I discovered a mistake in one of the patches included in the
> > previous patch. I'll post an updated patch later today. Sorry for the
> > trouble.
>
> Now that 2.6.26 is out, how many of your patches are left over (and
> would still be recommended) ?

Most of the patches I've recently posted are still in the mm tree as
they were way too late for the 2.6.26 merge window.

I also have another 5 or so maintenance/bug fix patches I'm testing.
Hopefully, this will be done soon and I can release 5.0.4 with updated
patches. I can put patches for point release kernels on kernel.org if
that is needed.

Version 5.0.4 is quite important to me because, even though I think this
every time I do a release, this one fixes some fairly important bugs.

Ian
* Re: More autofs problems
  2008-06-30 14:32 ` Carsten Aulbert
  2008-06-30 14:41 ` Ian Kent
@ 2008-06-30 14:57 ` Steffen Grunewald
  2008-06-30 15:01 ` Ian Kent
  2008-06-30 15:33 ` Jan Christoph Nordholz
  1 sibling, 2 replies; 20+ messages in thread
From: Steffen Grunewald @ 2008-06-30 14:57 UTC
To: Carsten Aulbert; +Cc: autofs, Ian Kent

On Mon, Jun 30, 2008 at 04:32:17PM +0200, Carsten Aulbert wrote:
>
> http://packages.debian.org/search?keywords=autofs5&searchon=names&suite=all&section=all
>
> So, I don't know how much effort it would be to backport them to glibc
> 2.3 instead of 2.7 and so on...

Jan Christoph Nordholz <hesso@pool.math.tu-berlin.de> (who is listed as
maintainer of the package) might be able to tell.

At first glance, the package build-depends on debhelper >=6 - something
that isn't easily ignored. OTOH, since autofs5 has been there for so long,
there might exist some debian/ stuff that would match Etch. Should I ask him?

> Might be worth a try, if you say it's stable enough ;)

Let's hope so.
Except for quite a few patches named *UPSTREAM*, there isn't too much in the
.diff.gz ...

Steffen
* Re: More autofs problems
  2008-06-30 14:57 ` Steffen Grunewald
@ 2008-06-30 15:01 ` Ian Kent
  2008-06-30 15:33 ` Jan Christoph Nordholz
  1 sibling, 0 replies; 20+ messages in thread
From: Ian Kent @ 2008-06-30 15:01 UTC
To: Steffen Grunewald; +Cc: autofs

On Mon, 2008-06-30 at 16:57 +0200, Steffen Grunewald wrote:
> On Mon, Jun 30, 2008 at 04:32:17PM +0200, Carsten Aulbert wrote:
> >
> > http://packages.debian.org/search?keywords=autofs5&searchon=names&suite=all&section=all

5.0.3 ... that's good, it's quite new.
There were very few patches for it until recently.

> >
> > So, I don't know how much effort it would be to backport them to glibc
> > 2.3 instead of 2.7 and so on...
>
> Jan Christoph Nordholz <hesso@pool.math.tu-berlin.de> (who is listed as
> maintainer of the package) might be able to tell.
>
> From a first glance, the package build-depends on debhelper >=6 - something
> that isn't easily ignored. OTOH, since autofs5 has been there for so long,
> there might exist some debian/ stuff that would match Etch. Should I ask him?
>
> > Might be worth a try, if you say it's stable enough ;)
>
> Let's hope so.
> Except for quite some patches named *UPSTREAM*, there isn't too much in the
> .diff.gz ...
>
> Steffen
* Re: More autofs problems
  2008-06-30 14:57 ` Steffen Grunewald
  2008-06-30 15:01 ` Ian Kent
@ 2008-06-30 15:33 ` Jan Christoph Nordholz
  2008-06-30 16:57 ` Lukas Kolbe
  1 sibling, 1 reply; 20+ messages in thread
From: Jan Christoph Nordholz @ 2008-06-30 15:33 UTC
To: autofs

Hi,

> > http://packages.debian.org/search?keywords=autofs5&searchon=names&suite=all&section=all
> >
> > So, I don't know how much effort it would be to backport them to glibc
> > 2.3 instead of 2.7 and so on...
>
> Jan Christoph Nordholz <hesso@pool.math.tu-berlin.de> (who is listed as
> maintainer of the package) might be able to tell.

I don't think that it's very problematic to backport the autofs5 package to Etch.
I'm working on it. (Thanks to Steffen for mailing me directly - I follow this
mailinglist, but sometimes I lag behind a bit.)

Regards,

Jan
* Re: More autofs problems
  2008-06-30 15:33 ` Jan Christoph Nordholz
@ 2008-06-30 16:57 ` Lukas Kolbe
  0 siblings, 0 replies; 20+ messages in thread
From: Lukas Kolbe @ 2008-06-30 16:57 UTC
To: Jan Christoph Nordholz; +Cc: autofs

Hi!

> > > http://packages.debian.org/search?keywords=autofs5&searchon=names&suite=all&section=all
> > >
> > > So, I don't know how much effort it would be to backport them to glibc
> > > 2.3 instead of 2.7 and so on...
> >
> > Jan Christoph Nordholz <hesso@pool.math.tu-berlin.de> (who is listed as
> > maintainer of the package) might be able to tell.
>
> I don't think that it's very problematic to backport the autofs5 package to Etch.
> I'm working on it. (Thanks to Steffen for mailing me directly - I follow this
> mailinglist, but sometimes I lag behind a bit.)

I don't know if this has changed with the latest patches, but last time I
tried, autofs5 used a symbol from openldap that was not in etch's openldap
(LDAP_CONTROL_PAGEDRESULTS). That could be fixed by patching the symbol
into the autofs source, but since then it has been crashing on us every now
and then. Sorry to be so vague, but I didn't have time to debug the crash
any further.

Other than that, autofs5 is orders of magnitude better and more reliable
than 4 ever was. Seriously.

> Regards,
>
> Jan

--
Lukas
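Editorial sketch related to the missing LDAP_CONTROL_PAGEDRESULTS symbol mentioned above: before attempting a backport, one could check whether the installed OpenLDAP header defines the symbol at all. The function name header_defines and the header path /usr/include/ldap.h are assumptions made for this example; the header may live elsewhere on a given system, and this check says nothing about the crashes described in the message.

```shell
#!/bin/sh
# Sketch: test whether a C header contains a "#define SYMBOL" line.

# Return 0 if header file $2 defines preprocessor symbol $1, non-zero otherwise.
header_defines() {
    grep -Eq "^[[:space:]]*#[[:space:]]*define[[:space:]]+$1([[:space:]]|\$)" "$2"
}

hdr=/usr/include/ldap.h   # assumed location; adjust for the build host
if [ -r "$hdr" ]; then
    if header_defines LDAP_CONTROL_PAGEDRESULTS "$hdr"; then
        echo "LDAP_CONTROL_PAGEDRESULTS is defined in $hdr"
    else
        echo "LDAP_CONTROL_PAGEDRESULTS is missing from $hdr"
    fi
fi
```

A "missing" result on the target release would indicate that the source needs the kind of patch described above, or a newer OpenLDAP, before it will build.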
* Re: More autofs problems
  2008-06-30 13:18 ` Carsten Aulbert
  2008-06-30 13:26 ` Ian Kent
@ 2008-06-30 13:45 ` Steffen Grunewald
  1 sibling, 0 replies; 20+ messages in thread
From: Steffen Grunewald @ 2008-06-30 13:45 UTC
To: Carsten Aulbert; +Cc: autofs, Ian Kent

On Mon, Jun 30, 2008 at 03:18:52PM +0200, Carsten Aulbert wrote:
> Hi Ian,
>
> Ian Kent wrote:
> >
> > This patch is a combined diff of the current bug fixes for autofs4,
> > against 2.6.24 (it applied OK to vanilla 2.6.24.4).
>
> It applies fine and we are about to test it on a single test node now.
> If that succeeds we will push it onto the cluster.

It will also apply to 2.6.25.9 (with minor offsets).
Any estimate when the patch will make it into the mainstream kernel?

> Keeping my fingers crossed

Good luck.
(If you need the 25.9 kernel package, you know where to get it :)

Steffen
end of thread

Thread overview: 20+ messages
2008-06-30 8:08 More autofs problems Carsten Aulbert
2008-06-30 8:47 ` Ian Kent
2008-06-30 10:08 ` Carsten Aulbert
2008-06-30 10:35 ` Ian Kent
2008-06-30 13:18 ` Carsten Aulbert
2008-06-30 13:26 ` Ian Kent
2008-06-30 14:15 ` Carsten Aulbert
2008-06-30 14:20 ` Ian Kent
2008-06-30 14:32 ` Carsten Aulbert
2008-06-30 14:41 ` Ian Kent
2008-07-01 5:49 ` Carsten Aulbert
2008-07-01 6:03 ` Ian Kent
2008-07-01 6:09 ` Carsten Aulbert
2008-07-15 14:23 ` Steffen Grunewald
2008-07-17 3:03 ` Ian Kent
2008-06-30 14:57 ` Steffen Grunewald
2008-06-30 15:01 ` Ian Kent
2008-06-30 15:33 ` Jan Christoph Nordholz
2008-06-30 16:57 ` Lukas Kolbe
2008-06-30 13:45 ` Steffen Grunewald