* Connectathon locking test fails over NFSv3 with EBUSY @ 2010-06-22 19:03 Chuck Lever 2010-06-22 19:17 ` Trond Myklebust 0 siblings, 1 reply; 8+ messages in thread From: Chuck Lever @ 2010-06-22 19:03 UTC (permalink / raw) To: Trond Myklebust; +Cc: NFSv3 list It looks like the connectathon tests race with the removal of deleted files. The actual lock test is successful, but when the scripts attempt to reset the test directory for another pass, the RMDIR fails because the directory is full of ".nfsxxx" files. Seems like RMDIR should wait for those silly deletes before trying to remove the parent directory. I've seen this with both 2.6.34 and 2.6.35-rc3 clients, and it happens nearly every time. Test #15 - Test 2nd open and I/O after lock and close. Parent: Second open succeeded. Parent: 15.0 - F_LOCK [ 0, ENDING] PASSED. Parent: 15.1 - F_ULOCK [ 0, ENDING] PASSED. Parent: Closed testfile. Parent: Wrote 'abcdefghij' to testfile [ 0, 11 ]. Parent: Read 'abcdefghij' from testfile [ 0, 11 ]. Parent: 15.2 - COMPARE [ 0, b] PASSED. ** PARENT pass 1 results: 49/49 pass, 1/1 warn, 0/0 fail (pass/total). ** CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total). Congratulations, you passed the locking tests! ... Pass 2 ... 
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000d8e00000041': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000df100000050': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000dfb0000004a': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000dec00000047': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000df90000004b': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000dfa0000004e': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000df80000004f': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000df20000004c': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000deb00000051': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000def00000048': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000dea0000004d': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000de900000049': Device or resource busy Starting BASIC tests: test directory /mnt/klimt/ellison.test (arg: -t) mkdir: cannot create directory `/mnt/klimt/ellison.test': File exists ./test1: File and directory creation test rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000d8e00000041': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000df100000050': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000dfb0000004a': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000dec00000047': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000df90000004b': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000dfa0000004e': Device or resource busy rm: cannot remove 
`/mnt/klimt/ellison.test/.nfs0000000000000df80000004f': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000df20000004c': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000deb00000051': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000def00000048': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000dea0000004d': Device or resource busy rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000de900000049': Device or resource busy ./test1: (/home/cel/src/cthon04/basic) can't remove old test directory /mnt/klimt/ellison.test basic tests failed Tests failed, leaving /mnt/klimt mounted [cel@ellison cthon04]$ -- Chuck Lever ^ permalink raw reply [flat|nested] 8+ messages in thread
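For anyone trying to reproduce this, here is a minimal sketch of a pre-check that lists leftover silly-renamed entries before the test directory is removed. It is not part of the cthon04 scripts. The function name `silly_leftovers` is hypothetical, and matching on a leading `.nfs` is an assumption based on the file names in the output above.

```python
import os


def silly_leftovers(dirpath):
    """Return the sorted names in dirpath that look like NFS silly-rename files.

    Assumption: silly-renamed files are recognised purely by a leading
    ".nfs" in the name, as in the rm errors quoted above.
    """
    return sorted(name for name in os.listdir(dirpath) if name.startswith(".nfs"))
```

Calling `silly_leftovers("/mnt/klimt/ellison.test")` right before the cleanup step would show whether the later "Device or resource busy" errors are about to happen.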
* Re: Connectathon locking test fails over NFSv3 with EBUSY 2010-06-22 19:03 Connectathon locking test fails over NFSv3 with EBUSY Chuck Lever @ 2010-06-22 19:17 ` Trond Myklebust 2010-06-23 17:51 ` Chuck Lever 0 siblings, 1 reply; 8+ messages in thread From: Trond Myklebust @ 2010-06-22 19:17 UTC (permalink / raw) To: Chuck Lever; +Cc: NFSv3 list On Tue, 2010-06-22 at 15:03 -0400, Chuck Lever wrote: > It looks like the connectathon tests race with the removal of deleted > files. The actual lock test is successful, but when the scripts attempt > to reset the test directory for another pass, the RMDIR fails because > the directory is full of ".nfsxxx" files. > > Seems like RMDIR should wait for those silly deletes before trying to > remove the parent directory. > > I've seen this with both 2.6.34 and 2.6.35-rc3 clients, and it happens > nearly every time. > > > Test #15 - Test 2nd open and I/O after lock and close. > Parent: Second open succeeded. > Parent: 15.0 - F_LOCK [ 0, ENDING] PASSED. > Parent: 15.1 - F_ULOCK [ 0, ENDING] PASSED. > Parent: Closed testfile. > Parent: Wrote 'abcdefghij' to testfile [ 0, 11 ]. > Parent: Read 'abcdefghij' from testfile [ 0, 11 ]. > Parent: 15.2 - COMPARE [ 0, b] PASSED. > > ** PARENT pass 1 results: 49/49 pass, 1/1 warn, 0/0 fail (pass/total). > > ** CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total). > Congratulations, you passed the locking tests! > ... Pass 2 ... Err... Any idea what kind of operations are causing the sillyrename to happen? The locking tests in particular should _never_ have any outstanding operations post-ULOCK. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Connectathon locking test fails over NFSv3 with EBUSY 2010-06-22 19:17 ` Trond Myklebust @ 2010-06-23 17:51 ` Chuck Lever 2010-06-23 18:06 ` Staubach_Peter 0 siblings, 1 reply; 8+ messages in thread From: Chuck Lever @ 2010-06-23 17:51 UTC (permalink / raw) To: Trond Myklebust; +Cc: NFSv3 list On 06/22/10 03:17 PM, Trond Myklebust wrote: > On Tue, 2010-06-22 at 15:03 -0400, Chuck Lever wrote: >> It looks like the connectathon tests race with the removal of deleted >> files. The actual lock test is successful, but when the scripts attempt >> to reset the test directory for another pass, the RMDIR fails because >> the directory is full of ".nfsxxx" files. >> >> Seems like RMDIR should wait for those silly deletes before trying to >> remove the parent directory. >> >> I've seen this with both 2.6.34 and 2.6.35-rc3 clients, and it happens >> nearly every time. >> >> >> Test #15 - Test 2nd open and I/O after lock and close. >> Parent: Second open succeeded. >> Parent: 15.0 - F_LOCK [ 0, ENDING] PASSED. >> Parent: 15.1 - F_ULOCK [ 0, ENDING] PASSED. >> Parent: Closed testfile. >> Parent: Wrote 'abcdefghij' to testfile [ 0, 11 ]. >> Parent: Read 'abcdefghij' from testfile [ 0, 11 ]. >> Parent: 15.2 - COMPARE [ 0, b] PASSED. >> >> ** PARENT pass 1 results: 49/49 pass, 1/1 warn, 0/0 fail (pass/total). >> >> ** CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total). >> Congratulations, you passed the locking tests! >> ... Pass 2 ... > > Err... Any idea what kind of operations are causing the sillyrename to > happen? The locking tests in particular should _never_ have any > outstanding operations post-ULOCK. I've reproduced this by running several passes of all of the tests ("./server -a -N10") while oprofile is running. Without oprofile running this seems to be nearly impossible to reproduce. When a pass finishes, the RMDIR of the test directory fails because there are .nfsxxx files left in the directory. 
These .nfsxxx files are not eventually removed, they stay after the test fails. Looking at the network trace, I see the RENAME that creates the files but no REMOVE is issued for these files. Somehow, the client is forgetting to remove them. There are plenty of proper RENAME/REMOVE pairs in the trace, so maybe this is a race condition. I found the RENAMEs in the network trace for all the remaining .nfsxxx files. The names are: op_unlk, stat, op_ren, op_chmod, dupreq, excltest, negseek, rename, holey, truncate, nfsidem, rewind, telldir, bigfile, bigfile2, freesp These look like files created during the special tests. ^ permalink raw reply [flat|nested] 8+ messages in thread
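The RENAME-without-REMOVE check described above can be sketched roughly as follows. This is an illustration only: the `(procedure, name)` tuple format is hypothetical and is not the output of any real capture tool, and `unmatched_silly_renames` is a made-up name. For RENAME the name is taken to be the rename target.

```python
def unmatched_silly_renames(ops):
    """Return silly-rename targets that never got a matching REMOVE.

    ops is an iterable of (procedure, name) tuples in trace order.
    This tuple layout is an assumption made for the sketch; a real
    network trace would have to be reduced to this form first.
    """
    pending = []
    for proc, name in ops:
        if proc == "RENAME" and name.startswith(".nfs"):
            # The client renamed an in-use file out of the way.
            pending.append(name)
        elif proc == "REMOVE" and name in pending:
            # The deferred delete was eventually issued.
            pending.remove(name)
    return pending
```

An empty result means every silly rename in the trace was paired with a REMOVE. Any names that remain correspond to the .nfsxxx files left in the directory.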
* RE: Connectathon locking test fails over NFSv3 with EBUSY 2010-06-23 17:51 ` Chuck Lever @ 2010-06-23 18:06 ` Staubach_Peter [not found] ` <BF3BB6D12298F54B89C8DCC1E4073D80017545B4-1Zg0zMUlrbd9m/dOYFj4Yjjd7nCn89gW@public.gmane.org> 0 siblings, 1 reply; 8+ messages in thread From: Staubach_Peter @ 2010-06-23 18:06 UTC (permalink / raw) To: chuck.lever, Trond.Myklebust; +Cc: linux-nfs Perhaps the oprofile support is retaining an additional reference to the in-core inode which is causing the .nfsXXXX files to get created and is also delaying their removal? ps -----Original Message----- From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org] On Behalf Of Chuck Lever Sent: Wednesday, June 23, 2010 1:51 PM To: Trond Myklebust Cc: NFSv3 list Subject: Re: Connectathon locking test fails over NFSv3 with EBUSY On 06/22/10 03:17 PM, Trond Myklebust wrote: > On Tue, 2010-06-22 at 15:03 -0400, Chuck Lever wrote: >> It looks like the connectathon tests race with the removal of deleted >> files. The actual lock test is successful, but when the scripts attempt >> to reset the test directory for another pass, the RMDIR fails because >> the directory is full of ".nfsxxx" files. >> >> Seems like RMDIR should wait for those silly deletes before trying to >> remove the parent directory. >> >> I've seen this with both 2.6.34 and 2.6.35-rc3 clients, and it happens >> nearly every time. >> >> >> Test #15 - Test 2nd open and I/O after lock and close. >> Parent: Second open succeeded. >> Parent: 15.0 - F_LOCK [ 0, ENDING] PASSED. >> Parent: 15.1 - F_ULOCK [ 0, ENDING] PASSED. >> Parent: Closed testfile. >> Parent: Wrote 'abcdefghij' to testfile [ 0, 11 ]. >> Parent: Read 'abcdefghij' from testfile [ 0, 11 ]. >> Parent: 15.2 - COMPARE [ 0, b] PASSED. >> >> ** PARENT pass 1 results: 49/49 pass, 1/1 warn, 0/0 fail (pass/total). >> >> ** CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total). >> Congratulations, you passed the locking tests! >> ... Pass 2 ... > > Err... Any idea what kind of operations are causing the sillyrename to > happen? The locking tests in particular should _never_ have any > outstanding operations post-ULOCK. I've reproduced this by running several passes of all of the tests ("./server -a -N10") while oprofile is running. Without oprofile running this seems to be nearly impossible to reproduce. When a pass finishes, the RMDIR of the test directory fails because there are .nfsxxx files left in the directory. These .nfsxxx files are not eventually removed, they stay after the test fails. Looking at the network trace, I see the RENAME that creates the files but no REMOVE is issued for these files. Somehow, the client is forgetting to remove them. There are plenty of proper RENAME/REMOVE pairs in the trace, so maybe this is a race condition. I found the RENAMEs in the network trace for all the remaining .nfsxxx files. The names are: op_unlk, stat, op_ren, op_chmod, dupreq, excltest, negseek, rename, holey, truncate, nfsidem, rewind, telldir, bigfile, bigfile2, freesp These look like files created during the special tests. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <BF3BB6D12298F54B89C8DCC1E4073D80017545B4-1Zg0zMUlrbd9m/dOYFj4Yjjd7nCn89gW@public.gmane.org>]
* RE: Connectathon locking test fails over NFSv3 with EBUSY [not found] ` <BF3BB6D12298F54B89C8DCC1E4073D80017545B4-1Zg0zMUlrbd9m/dOYFj4Yjjd7nCn89gW@public.gmane.org> @ 2010-06-23 18:43 ` Trond Myklebust 2010-06-23 19:17 ` Chuck Lever 1 sibling, 0 replies; 8+ messages in thread From: Trond Myklebust @ 2010-06-23 18:43 UTC (permalink / raw) To: Staubach_Peter; +Cc: chuck.lever, linux-nfs On Wed, 2010-06-23 at 14:06 -0400, Staubach_Peter@emc.com wrote: > Perhaps the oprofile support is retaining an additional reference to the in-core > inode which is causing the .nfsXXXX files to get created and is also delaying their > removal? Could the files actually be temporary files that are being created by oprofile itself? I must admit that I have little experience with running oprofile... Cheers Trond > -----Original Message----- > From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org] On Behalf Of Chuck Lever > Sent: Wednesday, June 23, 2010 1:51 PM > To: Trond Myklebust > Cc: NFSv3 list > Subject: Re: Connectathon locking test fails over NFSv3 with EBUSY > > On 06/22/10 03:17 PM, Trond Myklebust wrote: > > On Tue, 2010-06-22 at 15:03 -0400, Chuck Lever wrote: > >> It looks like the connectathon tests race with the removal of deleted > >> files. The actual lock test is successful, but when the scripts attempt > >> to reset the test directory for another pass, the RMDIR fails because > >> the directory is full of ".nfsxxx" files. > >> > >> Seems like RMDIR should wait for those silly deletes before trying to > >> remove the parent directory. > >> > >> I've seen this with both 2.6.34 and 2.6.35-rc3 clients, and it happens > >> nearly every time. > >> > >> > >> Test #15 - Test 2nd open and I/O after lock and close. > >> Parent: Second open succeeded. > >> Parent: 15.0 - F_LOCK [ 0, ENDING] PASSED. > >> Parent: 15.1 - F_ULOCK [ 0, ENDING] PASSED. > >> Parent: Closed testfile. > >> Parent: Wrote 'abcdefghij' to testfile [ 0, 11 ]. 
> >> Parent: Read 'abcdefghij' from testfile [ 0, 11 ]. > >> Parent: 15.2 - COMPARE [ 0, b] PASSED. > >> > >> ** PARENT pass 1 results: 49/49 pass, 1/1 warn, 0/0 fail (pass/total). > >> > >> ** CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total). > >> Congratulations, you passed the locking tests! > >> ... Pass 2 ... > > > > Err... Any idea what kind of operations are causing the sillyrename to > > happen? The locking tests in particular should _never_ have any > > outstanding operations post-ULOCK. > > I've reproduced this by running several passes of all of the tests > ("./server -a -N10") while oprofile is running. Without oprofile > running this seems to be nearly impossible to reproduce. > > When a pass finishes, the RMDIR of the test directory fails because > there are .nfsxxx files left in the directory. These .nfsxxx files are > not eventually removed, they stay after the test fails. > > Looking at the network trace, I see the RENAME that creates the files > but no REMOVE is issued for these files. Somehow, the client is > forgetting to remove them. There are plenty of proper RENAME/REMOVE > pairs in the trace, so maybe this is a race condition. > > I found the RENAMEs in the network trace for all the remaining .nfsxxx > files. The names are: > > op_unlk, stat, op_ren, op_chmod, dupreq, excltest, negseek, rename, > holey, truncate, nfsidem, rewind, telldir, bigfile, bigfile2, freesp > > These look like files created during the special tests. > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Connectathon locking test fails over NFSv3 with EBUSY [not found] ` <BF3BB6D12298F54B89C8DCC1E4073D80017545B4-1Zg0zMUlrbd9m/dOYFj4Yjjd7nCn89gW@public.gmane.org> 2010-06-23 18:43 ` Trond Myklebust @ 2010-06-23 19:17 ` Chuck Lever 2010-06-23 19:26 ` Trond Myklebust 1 sibling, 1 reply; 8+ messages in thread From: Chuck Lever @ 2010-06-23 19:17 UTC (permalink / raw) To: Staubach_Peter; +Cc: Trond.Myklebust, linux-nfs On 06/23/10 02:06 PM, Staubach_Peter@emc.com wrote: > Perhaps the oprofile support is retaining an additional reference to the in-core > inode which is causing the .nfsXXXX files to get created and is also delaying their > removal? The files do not appear in oprofiled's fd list (in /proc). Killing the oprofiled process after the test finishes does make those files go away. Just shutting down the profiler leaves oprofiled, so additionally killing the daemon appears to be necessary to finish the silly removal process. These files are all executables (part of the connectathon suite), but I don't have the "profile user space binaries" checkbox selected. > -----Original Message----- > From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org] On Behalf Of Chuck Lever > Sent: Wednesday, June 23, 2010 1:51 PM > To: Trond Myklebust > Cc: NFSv3 list > Subject: Re: Connectathon locking test fails over NFSv3 with EBUSY > > On 06/22/10 03:17 PM, Trond Myklebust wrote: >> On Tue, 2010-06-22 at 15:03 -0400, Chuck Lever wrote: >>> It looks like the connectathon tests race with the removal of deleted >>> files. The actual lock test is successful, but when the scripts attempt >>> to reset the test directory for another pass, the RMDIR fails because >>> the directory is full of ".nfsxxx" files. >>> >>> Seems like RMDIR should wait for those silly deletes before trying to >>> remove the parent directory. >>> >>> I've seen this with both 2.6.34 and 2.6.35-rc3 clients, and it happens >>> nearly every time. 
>>> >>> >>> Test #15 - Test 2nd open and I/O after lock and close. >>> Parent: Second open succeeded. >>> Parent: 15.0 - F_LOCK [ 0, ENDING] PASSED. >>> Parent: 15.1 - F_ULOCK [ 0, ENDING] PASSED. >>> Parent: Closed testfile. >>> Parent: Wrote 'abcdefghij' to testfile [ 0, 11 ]. >>> Parent: Read 'abcdefghij' from testfile [ 0, 11 ]. >>> Parent: 15.2 - COMPARE [ 0, b] PASSED. >>> >>> ** PARENT pass 1 results: 49/49 pass, 1/1 warn, 0/0 fail (pass/total). >>> >>> ** CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total). >>> Congratulations, you passed the locking tests! >>> ... Pass 2 ... >> >> Err... Any idea what kind of operations are causing the sillyrename to >> happen? The locking tests in particular should _never_ have any >> outstanding operations post-ULOCK. > > I've reproduced this by running several passes of all of the tests > ("./server -a -N10") while oprofile is running. Without oprofile > running this seems to be nearly impossible to reproduce. > > When a pass finishes, the RMDIR of the test directory fails because > there are .nfsxxx files left in the directory. These .nfsxxx files are > not eventually removed, they stay after the test fails. > > Looking at the network trace, I see the RENAME that creates the files > but no REMOVE is issued for these files. Somehow, the client is > forgetting to remove them. There are plenty of proper RENAME/REMOVE > pairs in the trace, so maybe this is a race condition. > > I found the RENAMEs in the network trace for all the remaining .nfsxxx > files. The names are: > > op_unlk, stat, op_ren, op_chmod, dupreq, excltest, negseek, rename, > holey, truncate, nfsidem, rewind, telldir, bigfile, bigfile2, freesp > > These look like files created during the special tests. 
> -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Connectathon locking test fails over NFSv3 with EBUSY 2010-06-23 19:17 ` Chuck Lever @ 2010-06-23 19:26 ` Trond Myklebust [not found] ` <1277321217.4991.64.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> 0 siblings, 1 reply; 8+ messages in thread From: Trond Myklebust @ 2010-06-23 19:26 UTC (permalink / raw) To: Chuck Lever; +Cc: Staubach_Peter, linux-nfs On Wed, 2010-06-23 at 15:17 -0400, Chuck Lever wrote: > On 06/23/10 02:06 PM, Staubach_Peter@emc.com wrote: > > Perhaps the oprofile support is retaining an additional reference to the in-core > > inode which is causing the .nfsXXXX files to get created and is also delaying their > > removal? > > The files do not appear in oprofiled's fd list (in /proc). Killing the > oprofiled process after the test finishes does make those files go away. > Just shutting down the profiler leaves oprofiled, so additionally > killing the daemon appears to be necessary to finish the silly removal > process. > > These files are all executables (part of the connectathon suite), but I > don't have the "profile user space binaries" checkbox selected. OK. That makes more sense... Do these files perhaps appear in the /proc/<pid>/maps and/or /proc/<pid>/smaps pseudofile for oprofiled? Cheers Trond ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <1277321217.4991.64.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>]
* Re: Connectathon locking test fails over NFSv3 with EBUSY [not found] ` <1277321217.4991.64.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> @ 2010-06-23 19:55 ` Chuck Lever 0 siblings, 0 replies; 8+ messages in thread From: Chuck Lever @ 2010-06-23 19:55 UTC (permalink / raw) To: Trond Myklebust; +Cc: Staubach_Peter, linux-nfs On 06/23/10 03:26 PM, Trond Myklebust wrote: > On Wed, 2010-06-23 at 15:17 -0400, Chuck Lever wrote: >> On 06/23/10 02:06 PM, Staubach_Peter@emc.com wrote: >>> Perhaps the oprofile support is retaining an additional reference to the in-core >>> inode which is causing the .nfsXXXX files to get created and is also delaying their >>> removal? >> >> The files do not appear in oprofiled's fd list (in /proc). Killing the >> oprofiled process after the test finishes does make those files go away. >> Just shutting down the profiler leaves oprofiled, so additionally >> killing the daemon appears to be necessary to finish the silly removal >> process. >> >> These files are all executables (part of the connectathon suite), but I >> don't have the "profile user space binaries" checkbox selected. > > OK. That makes more sense... Do these files perhaps appear in > the /proc/<pid>/maps and/or /proc/<pid>/smaps pseudofile for oprofiled? I don't see anything suspicious in those files. ^ permalink raw reply [flat|nested] 8+ messages in thread
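The two /proc checks discussed in this thread (the fd list and the maps pseudofile) can be combined into one scan over every process. The following is a rough sketch under stated assumptions: `find_holders` and its parameters are hypothetical names, the `/.nfs` substring match is a simplification, and reading other users' `/proc/<pid>/fd` entries normally requires root. References held only inside the kernel would not show up in either place, which would be consistent with nothing suspicious being found.

```python
import os


def find_holders(proc_root="/proc", needle="/.nfs"):
    """List (pid, source, path) for processes referencing silly-renamed files.

    source is "fd" for an open file descriptor and "maps" for a memory
    mapping. proc_root is a parameter only so the scan can be pointed at
    a test tree; on a live system it would be "/proc".
    """
    hits = set()
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip non-process entries such as "meminfo"
        base = os.path.join(proc_root, pid)

        # Open file descriptors: each entry is a symlink to the open file.
        fd_dir = os.path.join(base, "fd")
        try:
            for fd in os.listdir(fd_dir):
                try:
                    target = os.readlink(os.path.join(fd_dir, fd))
                except OSError:
                    continue  # descriptor closed while we were looking
                if needle in target:
                    hits.add((int(pid), "fd", target))
        except OSError:
            pass  # process exited or permission denied

        # Memory mappings: the pathname is the sixth whitespace-separated field.
        try:
            with open(os.path.join(base, "maps"), errors="replace") as f:
                for line in f:
                    fields = line.split(None, 5)
                    if len(fields) == 6 and needle in fields[5]:
                        hits.add((int(pid), "maps", fields[5].strip()))
        except OSError:
            pass

    return sorted(hits)
```

Running `find_holders()` as root after a failed pass would show whether any user-space process, oprofiled or otherwise, still has one of the .nfsxxx files open or mapped.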
Thread overview: 8+ messages
2010-06-22 19:03 Connectathon locking test fails over NFSv3 with EBUSY Chuck Lever
2010-06-22 19:17 ` Trond Myklebust
2010-06-23 17:51 ` Chuck Lever
2010-06-23 18:06 ` Staubach_Peter
[not found] ` <BF3BB6D12298F54B89C8DCC1E4073D80017545B4-1Zg0zMUlrbd9m/dOYFj4Yjjd7nCn89gW@public.gmane.org>
2010-06-23 18:43 ` Trond Myklebust
2010-06-23 19:17 ` Chuck Lever
2010-06-23 19:26 ` Trond Myklebust
[not found] ` <1277321217.4991.64.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2010-06-23 19:55 ` Chuck Lever