* CB_LAYOUTRECALL "deadlock" with in-kernel flexfiles server and XFS
@ 2016-08-11 15:23 Jeff Layton
  2016-08-11 15:55 ` Trond Myklebust
  2018-01-27 15:39 ` Benjamin Coddington
  0 siblings, 2 replies; 10+ messages in thread
From: Jeff Layton @ 2016-08-11 15:23 UTC (permalink / raw)
  To: open list:NFS, SUNRPC, AND...; +Cc: Tom Haynes, Christoph Hellwig, Bruce Fields

I was playing around with the in-kernel flexfiles server today, and I
seem to be hitting a deadlock when using it on an XFS-exported
filesystem. Here's the stack trace of how the CB_LAYOUTRECALL occurs:

[  928.736139] CPU: 0 PID: 846 Comm: nfsd Tainted: G           OE   4.8.0-rc1+ #3
[  928.737040] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
[  928.738009]  0000000000000286 000000006125f50e ffff91153845b878 ffffffff8f463853
[  928.738906]  ffff91152ec194d0 ffff91152d31d9c0 ffff91153845b8a8 ffffffffc045936f
[  928.739788]  ffff91152c051980 ffff91152d31d9c0 ffff91152c051540 ffff9115361b8a58
[  928.740697] Call Trace:
[  928.740998]  [<ffffffff8f463853>] dump_stack+0x86/0xc3
[  928.741570]  [<ffffffffc045936f>] nfsd4_recall_file_layout+0x17f/0x190 [nfsd]
[  928.742380]  [<ffffffffc045939d>] nfsd4_layout_lm_break+0x1d/0x30 [nfsd]
[  928.743115]  [<ffffffff8f3056d8>] __break_lease+0x118/0x6a0
[  928.743759]  [<ffffffffc02dea69>] xfs_break_layouts+0x79/0x120 [xfs]
[  928.744462]  [<ffffffffc029ea04>] xfs_file_aio_write_checks+0x94/0x1f0 [xfs]
[  928.745251]  [<ffffffffc029f36b>] xfs_file_buffered_aio_write+0x7b/0x330 [xfs]
[  928.746063]  [<ffffffffc029f70c>] xfs_file_write_iter+0xec/0x140 [xfs]
[  928.746803]  [<ffffffff8f2a0599>] do_iter_readv_writev+0xb9/0x140
[  928.747478]  [<ffffffff8f2a126b>] do_readv_writev+0x19b/0x240
[  928.748146]  [<ffffffffc029f620>] ? xfs_file_buffered_aio_write+0x330/0x330 [xfs]
[  928.748956]  [<ffffffff8f29e02b>] ? do_dentry_open+0x28b/0x310
[  928.749614]  [<ffffffffc029c800>] ? xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
[  928.750367]  [<ffffffff8f2a156f>] vfs_writev+0x3f/0x50
[  928.750934]  [<ffffffffc04276ca>] nfsd_vfs_write+0xca/0x3a0 [nfsd]
[  928.751608]  [<ffffffffc0429ec5>] nfsd_write+0x485/0x780 [nfsd]
[  928.752263]  [<ffffffffc043144c>] nfsd3_proc_write+0xbc/0x150 [nfsd]
[  928.752973]  [<ffffffffc0421388>] nfsd_dispatch+0xb8/0x1f0 [nfsd]
[  928.753642]  [<ffffffffc036d78f>] svc_process_common+0x42f/0x690 [sunrpc]
[  928.754395]  [<ffffffffc036e8e8>] svc_process+0x118/0x330 [sunrpc]
[  928.755080]  [<ffffffffc04208ac>] nfsd+0x19c/0x2b0 [nfsd]
[  928.755681]  [<ffffffffc0420715>] ? nfsd+0x5/0x2b0 [nfsd]
[  928.756274]  [<ffffffffc0420710>] ? nfsd_destroy+0x190/0x190 [nfsd]
[  928.756991]  [<ffffffff8f0d5891>] kthread+0x101/0x120
[  928.757563]  [<ffffffff8f10dcc5>] ? trace_hardirqs_on_caller+0xf5/0x1b0
[  928.758282]  [<ffffffff8f8f2fef>] ret_from_fork+0x1f/0x40
[  928.758875]  [<ffffffff8f0d5790>] ? kthread_create_on_node+0x250/0x250


So the client gets a flexfiles layout, and then tries to issue a v3
WRITE against the file. XFS then recalls the layout, but the client
can't return the layout until the v3 WRITE completes. Eventually this
should resolve itself after 2 lease periods, but that's quite a long
time.

I guess XFS requires recalling block and SCSI layouts when the server
wants to issue a write (or someone writes to it locally), but that
seems like it shouldn't be happening when the layout is a flexfiles
layout.

Any thoughts on what the right fix is here?

On a related note, knfsd will spam the heck out of the client with
CB_LAYOUTRECALLs during this time. I think we ought to consider fixing
the server not to treat an NFS_OK return from the client like
NFS4ERR_DELAY there, but that would mean a different mechanism for
timing out a CB_LAYOUTRECALL.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: CB_LAYOUTRECALL "deadlock" with in-kernel flexfiles server and XFS
  2016-08-11 15:23 CB_LAYOUTRECALL "deadlock" with in-kernel flexfiles server and XFS Jeff Layton
@ 2016-08-11 15:55 ` Trond Myklebust
  2016-08-11 16:06 ` Jeff Layton
  2018-01-27 15:39 ` Benjamin Coddington
  1 sibling, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2016-08-11 15:55 UTC (permalink / raw)
  To: Jeff Layton
  Cc: List Linux NFS Mailing, Thomas Haynes, hch, Fields Bruce James

> On Aug 11, 2016, at 11:23, Jeff Layton <jlayton@redhat.com> wrote:
>
> I was playing around with the in-kernel flexfiles server today, and I
> seem to be hitting a deadlock when using it on an XFS-exported
> filesystem. Here's the stack trace of how the CB_LAYOUTRECALL occurs:
>
> [  928.736139] CPU: 0 PID: 846 Comm: nfsd Tainted: G           OE   4.8.0-rc1+ #3
> [  928.737040] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
> [  928.738009]  0000000000000286 000000006125f50e ffff91153845b878 ffffffff8f463853
> [  928.738906]  ffff91152ec194d0 ffff91152d31d9c0 ffff91153845b8a8 ffffffffc045936f
> [  928.739788]  ffff91152c051980 ffff91152d31d9c0 ffff91152c051540 ffff9115361b8a58
> [  928.740697] Call Trace:
> [  928.740998]  [<ffffffff8f463853>] dump_stack+0x86/0xc3
> [  928.741570]  [<ffffffffc045936f>] nfsd4_recall_file_layout+0x17f/0x190 [nfsd]
> [  928.742380]  [<ffffffffc045939d>] nfsd4_layout_lm_break+0x1d/0x30 [nfsd]
> [  928.743115]  [<ffffffff8f3056d8>] __break_lease+0x118/0x6a0
> [  928.743759]  [<ffffffffc02dea69>] xfs_break_layouts+0x79/0x120 [xfs]
> [  928.744462]  [<ffffffffc029ea04>] xfs_file_aio_write_checks+0x94/0x1f0 [xfs]
> [  928.745251]  [<ffffffffc029f36b>] xfs_file_buffered_aio_write+0x7b/0x330 [xfs]
> [  928.746063]  [<ffffffffc029f70c>] xfs_file_write_iter+0xec/0x140 [xfs]
> [  928.746803]  [<ffffffff8f2a0599>] do_iter_readv_writev+0xb9/0x140
> [  928.747478]  [<ffffffff8f2a126b>] do_readv_writev+0x19b/0x240
> [  928.748146]  [<ffffffffc029f620>] ? xfs_file_buffered_aio_write+0x330/0x330 [xfs]
> [  928.748956]  [<ffffffff8f29e02b>] ? do_dentry_open+0x28b/0x310
> [  928.749614]  [<ffffffffc029c800>] ? xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
> [  928.750367]  [<ffffffff8f2a156f>] vfs_writev+0x3f/0x50
> [  928.750934]  [<ffffffffc04276ca>] nfsd_vfs_write+0xca/0x3a0 [nfsd]
> [  928.751608]  [<ffffffffc0429ec5>] nfsd_write+0x485/0x780 [nfsd]
> [  928.752263]  [<ffffffffc043144c>] nfsd3_proc_write+0xbc/0x150 [nfsd]
> [  928.752973]  [<ffffffffc0421388>] nfsd_dispatch+0xb8/0x1f0 [nfsd]
> [  928.753642]  [<ffffffffc036d78f>] svc_process_common+0x42f/0x690 [sunrpc]
> [  928.754395]  [<ffffffffc036e8e8>] svc_process+0x118/0x330 [sunrpc]
> [  928.755080]  [<ffffffffc04208ac>] nfsd+0x19c/0x2b0 [nfsd]
> [  928.755681]  [<ffffffffc0420715>] ? nfsd+0x5/0x2b0 [nfsd]
> [  928.756274]  [<ffffffffc0420710>] ? nfsd_destroy+0x190/0x190 [nfsd]
> [  928.756991]  [<ffffffff8f0d5891>] kthread+0x101/0x120
> [  928.757563]  [<ffffffff8f10dcc5>] ? trace_hardirqs_on_caller+0xf5/0x1b0
> [  928.758282]  [<ffffffff8f8f2fef>] ret_from_fork+0x1f/0x40
> [  928.758875]  [<ffffffff8f0d5790>] ? kthread_create_on_node+0x250/0x250
>
>
> So the client gets a flexfiles layout, and then tries to issue a v3
> WRITE against the file. XFS then recalls the layout, but the client
> can't return the layout until the v3 WRITE completes. Eventually this
> should resolve itself after 2 lease periods, but that's quite a long
> time.

What’s the sequence of operations here? If the client has outstanding I/O, I should now be returning NFS_OK, and then completing the recall with a LAYOUTRETURN as soon as the outstanding I/O (and layoutcommit, if one is due) is done.

The server is expected to return NFS4ERR_RECALLCONFLICT to any LAYOUTGET attempts that occur before the LAYOUTRETURN.

>
> I guess XFS requires recalling block and SCSI layouts when the server
> wants to issue a write (or someone writes to it locally), but that
> seems like it shouldn't be happening when the layout is a flexfiles
> layout.
>
> Any thoughts on what the right fix is here?
>
> On a related note, knfsd will spam the heck out of the client with
> CB_LAYOUTRECALLs during this time. I think we ought to consider fixing
> the server not to treat an NFS_OK return from the client like
> NFS4ERR_DELAY there, but that would mean a different mechanism for
> timing out a CB_LAYOUTRECALL.

There is a big difference between NFS_OK and NFS4ERR_DELAY as far as the server is concerned:

- NFS_OK means that the client has now seen the stateid with the updated sequence id that was sent in CB_LAYOUTRECALL, and is processing it. No resend of the CB_LAYOUTRECALL is required.
- OTOH, NFS4ERR_DELAY means the same thing in the back channel as it does in the forward channel: I’m busy and cannot process your request, please resend it later.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: CB_LAYOUTRECALL "deadlock" with in-kernel flexfiles server and XFS
  2016-08-11 15:55 ` Trond Myklebust
@ 2016-08-11 16:06 ` Jeff Layton
  2016-08-11 16:20 ` Trond Myklebust
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Layton @ 2016-08-11 16:06 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: List Linux NFS Mailing, Thomas Haynes, hch, Fields Bruce James

On Thu, 2016-08-11 at 15:55 +0000, Trond Myklebust wrote:
> >
> > On Aug 11, 2016, at 11:23, Jeff Layton <jlayton@redhat.com> wrote:
> >
> > I was playing around with the in-kernel flexfiles server today, and
> > I
> > seem to be hitting a deadlock when using it on an XFS-exported
> > filesystem. Here's the stack trace of how the CB_LAYOUTRECALL
> > occurs:
> >
> > [  928.736139] CPU: 0 PID: 846 Comm: nfsd Tainted:
> > G           OE   4.8.0-rc1+ #3
> > [  928.737040] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > 1996), BIOS 1.9.1-1.fc24 04/01/2014
> > [  928.738009]  0000000000000286 000000006125f50e ffff91153845b878
> > ffffffff8f463853
> > [  928.738906]  ffff91152ec194d0 ffff91152d31d9c0 ffff91153845b8a8
> > ffffffffc045936f
> > [  928.739788]  ffff91152c051980 ffff91152d31d9c0 ffff91152c051540
> > ffff9115361b8a58
> > [  928.740697] Call Trace:
> > [  928.740998]  [<ffffffff8f463853>] dump_stack+0x86/0xc3
> > [  928.741570]  [<ffffffffc045936f>]
> > nfsd4_recall_file_layout+0x17f/0x190 [nfsd]
> > [  928.742380]  [<ffffffffc045939d>]
> > nfsd4_layout_lm_break+0x1d/0x30 [nfsd]
> > [  928.743115]  [<ffffffff8f3056d8>] __break_lease+0x118/0x6a0
> > [  928.743759]  [<ffffffffc02dea69>] xfs_break_layouts+0x79/0x120
> > [xfs]
> > [  928.744462]  [<ffffffffc029ea04>]
> > xfs_file_aio_write_checks+0x94/0x1f0 [xfs]
> > [  928.745251]  [<ffffffffc029f36b>]
> > xfs_file_buffered_aio_write+0x7b/0x330 [xfs]
> > [  928.746063]  [<ffffffffc029f70c>] xfs_file_write_iter+0xec/0x140
> > [xfs]
> > [  928.746803]  [<ffffffff8f2a0599>]
> > do_iter_readv_writev+0xb9/0x140
> > [  928.747478]  [<ffffffff8f2a126b>] do_readv_writev+0x19b/0x240
> > [  928.748146]  [<ffffffffc029f620>] ?
> > xfs_file_buffered_aio_write+0x330/0x330 [xfs]
> > [  928.748956]  [<ffffffff8f29e02b>] ? do_dentry_open+0x28b/0x310
> > [  928.749614]  [<ffffffffc029c800>] ?
> > xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
> > [  928.750367]  [<ffffffff8f2a156f>] vfs_writev+0x3f/0x50
> > [  928.750934]  [<ffffffffc04276ca>] nfsd_vfs_write+0xca/0x3a0
> > [nfsd]
> > [  928.751608]  [<ffffffffc0429ec5>] nfsd_write+0x485/0x780 [nfsd]
> > [  928.752263]  [<ffffffffc043144c>] nfsd3_proc_write+0xbc/0x150
> > [nfsd]
> > [  928.752973]  [<ffffffffc0421388>] nfsd_dispatch+0xb8/0x1f0
> > [nfsd]
> > [  928.753642]  [<ffffffffc036d78f>] svc_process_common+0x42f/0x690
> > [sunrpc]
> > [  928.754395]  [<ffffffffc036e8e8>] svc_process+0x118/0x330
> > [sunrpc]
> > [  928.755080]  [<ffffffffc04208ac>] nfsd+0x19c/0x2b0 [nfsd]
> > [  928.755681]  [<ffffffffc0420715>] ? nfsd+0x5/0x2b0 [nfsd]
> > [  928.756274]  [<ffffffffc0420710>] ? nfsd_destroy+0x190/0x190
> > [nfsd]
> > [  928.756991]  [<ffffffff8f0d5891>] kthread+0x101/0x120
> > [  928.757563]  [<ffffffff8f10dcc5>] ?
> > trace_hardirqs_on_caller+0xf5/0x1b0
> > [  928.758282]  [<ffffffff8f8f2fef>] ret_from_fork+0x1f/0x40
> > [  928.758875]  [<ffffffff8f0d5790>] ?
> > kthread_create_on_node+0x250/0x250
> >
> >
> > So the client gets a flexfiles layout, and then tries to issue a v3
> > WRITE against the file. XFS then recalls the layout, but the client
> > can't return the layout until the v3 WRITE completes. Eventually
> > this
> > should resolve itself after 2 lease periods, but that's quite a
> > long
> > time.
>
> What’s the sequence of operations here? If the client has outstanding
> I/O, I should now be returning NFS_OK, and then completing the recall
> with a LAYOUTRETURN as soon as the outstanding I/O (and layoutcommit,
> if one is due) is done.
>
> The server is expected to return NFS4ERR_RECALLCONFLICT to any
> LAYOUTGET attempts that occur before the LAYOUTRETURN.
>

Basically, I'm just doing this on the client:

    $ echo "foo" > /mnt/knfsdsrv/testfile


The client does:

OPEN
LAYOUTGET (for RW)
GETDEVICEINFO

...and then a v3 WRITE under the aegis of the layout it got.

The server then issues a CB_LAYOUTRECALL (because XFS wants to do that
whenever there is a local write, apparently). The client returns
NFS_OK, but it can't return the layout until the v3 WRITE completes.
The v3 write is hung though because it's waiting for the layout to be
returned.

> >
> >
> > I guess XFS requires recalling block and SCSI layouts when the
> > server
> > wants to issue a write (or someone writes to it locally), but that
> > seems like it shouldn't be happening when the layout is a flexfiles
> > layout.
> >
> > Any thoughts on what the right fix is here?
> >
> > On a related note, knfsd will spam the heck out of the client with
> > CB_LAYOUTRECALLs during this time. I think we ought to consider
> > fixing
> > the server not to treat an NFS_OK return from the client like
> > NFS4ERR_DELAY there, but that would mean a different mechanism for
> > timing out a CB_LAYOUTRECALL.
>
> There is a big difference between NFS_OK and NFS4ERR_DELAY as far as
> the server is concerned:
>
> - NFS_OK means that the client has now seen the stateid with the
> updated sequence id that was sent in CB_LAYOUTRECALL, and is
> processing it. No resend of the CB_LAYOUTRECALL is required.
> - OTOH, NFS4ERR_DELAY means the same thing in the back channel as it
> does in the forward channel: I’m busy and cannot process your
> request, please resend it later.

Right. The current code basically just treats them the same as a
mechanism to handle eventually timing out the layoutrecall. The extra
CB_LAYOUTRECALLs are entirely superfluous. It's probably not too hard
to fix, but we'd need to come up with some other mechanism for timing
out the layoutrecall.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread
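For readers following along, the difference between the two callback replies discussed above can be modelled in a few lines of userspace C. This is only a sketch, not nfsd code: the enum, struct and function names are made up for illustration, and the only values taken from the protocol are NFS4_OK (0) and NFS4ERR_DELAY (10008). It assumes the server remembers when the first CB_LAYOUTRECALL was sent and gives up after two lease periods regardless of how the client answered.

```c
#include <assert.h>

/* Status values as defined by the NFSv4 protocol. */
#define NFS4_OK        0
#define NFS4ERR_DELAY  10008

/* Hypothetical names, for illustration only. */
enum recall_action {
	RECALL_WAIT_FOR_LAYOUTRETURN,	/* client saw the recall; do not resend */
	RECALL_RESEND_LATER,		/* client was busy; resend the callback */
	RECALL_GIVE_UP,			/* two lease periods elapsed */
};

struct recall_state {
	long first_sent;	/* time the first CB_LAYOUTRECALL was sent (seconds) */
	long lease_secs;	/* server lease period (seconds) */
};

/*
 * Decide what to do after the client replies to CB_LAYOUTRECALL.
 * Only NFS4_OK and NFS4ERR_DELAY are modelled; other errors are out
 * of scope for this sketch.
 */
enum recall_action recall_next_action(const struct recall_state *rs,
				      int cb_status, long now)
{
	assert(rs->lease_secs > 0);
	assert(cb_status == NFS4_OK || cb_status == NFS4ERR_DELAY);

	/* The overall deadline does not depend on how the client replied. */
	if (now - rs->first_sent >= 2 * rs->lease_secs)
		return RECALL_GIVE_UP;

	/* "I'm busy, please resend it later." */
	if (cb_status == NFS4ERR_DELAY)
		return RECALL_RESEND_LATER;

	/* NFS4_OK: the recall was received, so just wait for LAYOUTRETURN. */
	return RECALL_WAIT_FOR_LAYOUTRETURN;
}
```

The point of keeping the deadline check separate from the status check is that an NFS_OK reply no longer needs to trigger a resend just to keep a timer running.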
* Re: CB_LAYOUTRECALL "deadlock" with in-kernel flexfiles server and XFS
  2016-08-11 16:06 ` Jeff Layton
@ 2016-08-11 16:20 ` Trond Myklebust
  2016-08-11 16:25 ` hch
  0 siblings, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2016-08-11 16:20 UTC (permalink / raw)
  To: Jeff Layton
  Cc: List Linux NFS Mailing, Thomas Haynes, hch, Fields Bruce James

> On Aug 11, 2016, at 12:06, Jeff Layton <jlayton@redhat.com> wrote:
>
> On Thu, 2016-08-11 at 15:55 +0000, Trond Myklebust wrote:
>>>
>>> On Aug 11, 2016, at 11:23, Jeff Layton <jlayton@redhat.com> wrote:
>>>
>>> I was playing around with the in-kernel flexfiles server today, and
>>> I
>>> seem to be hitting a deadlock when using it on an XFS-exported
>>> filesystem. Here's the stack trace of how the CB_LAYOUTRECALL
>>> occurs:
>>>
>>> [  928.736139] CPU: 0 PID: 846 Comm: nfsd Tainted:
>>> G           OE   4.8.0-rc1+ #3
>>> [  928.737040] Hardware name: QEMU Standard PC (i440FX + PIIX,
>>> 1996), BIOS 1.9.1-1.fc24 04/01/2014
>>> [  928.738009]  0000000000000286 000000006125f50e ffff91153845b878
>>> ffffffff8f463853
>>> [  928.738906]  ffff91152ec194d0 ffff91152d31d9c0 ffff91153845b8a8
>>> ffffffffc045936f
>>> [  928.739788]  ffff91152c051980 ffff91152d31d9c0 ffff91152c051540
>>> ffff9115361b8a58
>>> [  928.740697] Call Trace:
>>> [  928.740998]  [<ffffffff8f463853>] dump_stack+0x86/0xc3
>>> [  928.741570]  [<ffffffffc045936f>]
>>> nfsd4_recall_file_layout+0x17f/0x190 [nfsd]
>>> [  928.742380]  [<ffffffffc045939d>]
>>> nfsd4_layout_lm_break+0x1d/0x30 [nfsd]
>>> [  928.743115]  [<ffffffff8f3056d8>] __break_lease+0x118/0x6a0
>>> [  928.743759]  [<ffffffffc02dea69>] xfs_break_layouts+0x79/0x120
>>> [xfs]
>>> [  928.744462]  [<ffffffffc029ea04>]
>>> xfs_file_aio_write_checks+0x94/0x1f0 [xfs]
>>> [  928.745251]  [<ffffffffc029f36b>]
>>> xfs_file_buffered_aio_write+0x7b/0x330 [xfs]
>>> [  928.746063]  [<ffffffffc029f70c>] xfs_file_write_iter+0xec/0x140
>>> [xfs]
>>> [  928.746803]  [<ffffffff8f2a0599>]
>>> do_iter_readv_writev+0xb9/0x140
>>> [  928.747478]  [<ffffffff8f2a126b>] do_readv_writev+0x19b/0x240
>>> [  928.748146]  [<ffffffffc029f620>] ?
>>> xfs_file_buffered_aio_write+0x330/0x330 [xfs]
>>> [  928.748956]  [<ffffffff8f29e02b>] ? do_dentry_open+0x28b/0x310
>>> [  928.749614]  [<ffffffffc029c800>] ?
>>> xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
>>> [  928.750367]  [<ffffffff8f2a156f>] vfs_writev+0x3f/0x50
>>> [  928.750934]  [<ffffffffc04276ca>] nfsd_vfs_write+0xca/0x3a0
>>> [nfsd]
>>> [  928.751608]  [<ffffffffc0429ec5>] nfsd_write+0x485/0x780 [nfsd]
>>> [  928.752263]  [<ffffffffc043144c>] nfsd3_proc_write+0xbc/0x150
>>> [nfsd]
>>> [  928.752973]  [<ffffffffc0421388>] nfsd_dispatch+0xb8/0x1f0
>>> [nfsd]
>>> [  928.753642]  [<ffffffffc036d78f>] svc_process_common+0x42f/0x690
>>> [sunrpc]
>>> [  928.754395]  [<ffffffffc036e8e8>] svc_process+0x118/0x330
>>> [sunrpc]
>>> [  928.755080]  [<ffffffffc04208ac>] nfsd+0x19c/0x2b0 [nfsd]
>>> [  928.755681]  [<ffffffffc0420715>] ? nfsd+0x5/0x2b0 [nfsd]
>>> [  928.756274]  [<ffffffffc0420710>] ? nfsd_destroy+0x190/0x190
>>> [nfsd]
>>> [  928.756991]  [<ffffffff8f0d5891>] kthread+0x101/0x120
>>> [  928.757563]  [<ffffffff8f10dcc5>] ?
>>> trace_hardirqs_on_caller+0xf5/0x1b0
>>> [  928.758282]  [<ffffffff8f8f2fef>] ret_from_fork+0x1f/0x40
>>> [  928.758875]  [<ffffffff8f0d5790>] ?
>>> kthread_create_on_node+0x250/0x250
>>>
>>>
>>> So the client gets a flexfiles layout, and then tries to issue a v3
>>> WRITE against the file. XFS then recalls the layout, but the client
>>> can't return the layout until the v3 WRITE completes. Eventually
>>> this
>>> should resolve itself after 2 lease periods, but that's quite a
>>> long
>>> time.
>>
>> What’s the sequence of operations here? If the client has outstanding
>> I/O, I should now be returning NFS_OK, and then completing the recall
>> with a LAYOUTRETURN as soon as the outstanding I/O (and layoutcommit,
>> if one is due) is done.
>>
>> The server is expected to return NFS4ERR_RECALLCONFLICT to any
>> LAYOUTGET attempts that occur before the LAYOUTRETURN.
>>
>
> Basically, I'm just doing this on the client:
>
>     $ echo "foo" > /mnt/knfsdsrv/testfile
>
>
> The client does:
>
> OPEN
> LAYOUTGET (for RW)
> GETDEVICEINFO
>
> ...and then a v3 WRITE under the aegis of the layout it got.
>
> The server then issues a CB_LAYOUTRECALL (because XFS wants to do that
> whenever there is a local write, apparently). The client returns
> NFS_OK, but it can't return the layout until the v3 WRITE completes.
> The v3 write is hung though because it's waiting for the layout to be
> returned.

Oh… So this is an artifact of the write being local, and XFS having a path to recall the layout that it really shouldn’t have in the flexfiles case?

>
>>>
>>>
>>> I guess XFS requires recalling block and SCSI layouts when the
>>> server
>>> wants to issue a write (or someone writes to it locally), but that
>>> seems like it shouldn't be happening when the layout is a flexfiles
>>> layout.
>>>
>>> Any thoughts on what the right fix is here?
>>>
>>> On a related note, knfsd will spam the heck out of the client with
>>> CB_LAYOUTRECALLs during this time. I think we ought to consider
>>> fixing
>>> the server not to treat an NFS_OK return from the client like
>>> NFS4ERR_DELAY there, but that would mean a different mechanism for
>>> timing out a CB_LAYOUTRECALL.
>>
>> There is a big difference between NFS_OK and NFS4ERR_DELAY as far as
>> the server is concerned:
>>
>> - NFS_OK means that the client has now seen the stateid with the
>> updated sequence id that was sent in CB_LAYOUTRECALL, and is
>> processing it. No resend of the CB_LAYOUTRECALL is required.
>> - OTOH, NFS4ERR_DELAY means the same thing in the back channel as it
>> does in the forward channel: I’m busy and cannot process your
>> request, please resend it later.
>
> Right. The current code basically just treats them the same as a
> mechanism to handle eventually timing out the layoutrecall. The extra
> CB_LAYOUTRECALLs are entirely superfluous. It's probably not too hard
> to fix, but we'd need to come up with some other mechanism for timing
> out the layoutrecall.
>
> -- 
> Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: CB_LAYOUTRECALL "deadlock" with in-kernel flexfiles server and XFS
  2016-08-11 16:20 ` Trond Myklebust
@ 2016-08-11 16:25 ` hch
  2016-08-11 16:33 ` Jeff Layton
  0 siblings, 1 reply; 10+ messages in thread
From: hch @ 2016-08-11 16:25 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Jeff Layton, List Linux NFS Mailing, Thomas Haynes, hch, Fields Bruce James

Yeah, for file-like layouts there should be a flag in
struct nfsd4_layout_ops to disable recalls.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: CB_LAYOUTRECALL "deadlock" with in-kernel flexfiles server and XFS
  2016-08-11 16:25 ` hch
@ 2016-08-11 16:33 ` Jeff Layton
  2016-08-11 16:59 ` hch
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Layton @ 2016-08-11 16:33 UTC (permalink / raw)
  To: hch, Trond Myklebust
  Cc: List Linux NFS Mailing, Thomas Haynes, Fields Bruce James

On Thu, 2016-08-11 at 18:25 +0200, hch wrote:
> Yeah, for file-like layouts there should be a flag in
> struct nfsd4_layout_ops to disable recalls.

I don't think disabling recalls would be enough, would it? XFS still
wants to break_layout and won't proceed until the layout list is empty,
AFAICT. We need some way to indicate to the lower filesystem not to
call break_layout in this case.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: CB_LAYOUTRECALL "deadlock" with in-kernel flexfiles server and XFS
  2016-08-11 16:33 ` Jeff Layton
@ 2016-08-11 16:59 ` hch
  2016-08-11 17:10 ` Jeff Layton
  0 siblings, 1 reply; 10+ messages in thread
From: hch @ 2016-08-11 16:59 UTC (permalink / raw)
  To: Jeff Layton
  Cc: hch, Trond Myklebust, List Linux NFS Mailing, Thomas Haynes, Fields Bruce James

On Thu, Aug 11, 2016 at 12:33:47PM -0400, Jeff Layton wrote:
> On Thu, 2016-08-11 at 18:25 +0200, hch wrote:
> > Yeah, for file-like layouts there should be a flag in
> > struct nfsd4_layout_ops to disable recalls.
>
> I don't think disabling recalls would be enough, would it? XFS still
> wants to break_layout and won't proceed until the layout list is empty,
> AFAICT. We need some way to indicate to the lower filesystem not to
> call break_layout in this case.

XFS only cares about block-like layouts where the client has direct
access to the file blocks. I'd need to look at how to propagate the
flag into break_layout, but in principle we don't need to do any
recalls on truncate ever for file and flexfile layouts.

>
> -- 
> Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: CB_LAYOUTRECALL "deadlock" with in-kernel flexfiles server and XFS
  2016-08-11 16:59 ` hch
@ 2016-08-11 17:10 ` Jeff Layton
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Layton @ 2016-08-11 17:10 UTC (permalink / raw)
  To: hch
  Cc: Trond Myklebust, List Linux NFS Mailing, Thomas Haynes, Fields Bruce James

On Thu, 2016-08-11 at 18:59 +0200, hch wrote:
> On Thu, Aug 11, 2016 at 12:33:47PM -0400, Jeff Layton wrote:
> >
> > On Thu, 2016-08-11 at 18:25 +0200, hch wrote:
> > >
> > > Yeah, for file-like layouts there should be a flag in
> > > struct nfsd4_layout_ops to disable recalls.
> >
> > I don't think disabling recalls would be enough, would it? XFS
> > still
> > wants to break_layout and won't proceed until the layout list is
> > empty,
> > AFAICT. We need some way to indicate to the lower filesystem not to
> > call break_layout in this case.
>
> XFS only cares about block-like layouts where the client has direct
> access to the file blocks. I'd need to look at how to propagate the
> flag into break_layout, but in principle we don't need to do any
> recalls on truncate ever for file and flexfile layouts.
>

Hmm...if we aren't ever going to recall files and flexfiles layouts,
then do we even need to set a FL_LAYOUT lease for them at all? I think
I'll try hacking something up that takes that approach and see if that
might be a reasonable fix.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread
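The approach floated in this part of the thread (a per-layout-type flag, and no FL_LAYOUT lease for types that are never recalled) might look roughly like the userspace sketch below. Everything in it is an assumption made for illustration: the enum, the ops table, the `disable_recalls` field name and the `layout_needs_lease()` helper are not taken from the nfsd sources, and the real change would live wherever the layout stateid is set up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout types and ops table; not the nfsd definitions. */
enum layout_type {
	LAYOUT_BLOCK,
	LAYOUT_SCSI,
	LAYOUT_FLEXFILES,
	LAYOUT_TYPE_COUNT,
};

struct layout_ops {
	const char *name;
	bool disable_recalls;	/* true for file-like layouts */
};

static const struct layout_ops layout_ops_table[LAYOUT_TYPE_COUNT] = {
	[LAYOUT_BLOCK]     = { "block",     false },
	[LAYOUT_SCSI]      = { "scsi",      false },
	[LAYOUT_FLEXFILES] = { "flexfiles", true  },
};

/*
 * Only block-like layouts give the client direct access to file blocks,
 * so only those need a lease that the filesystem can break.
 */
bool layout_needs_lease(enum layout_type type)
{
	assert((int)type >= 0 && type < LAYOUT_TYPE_COUNT);
	return !layout_ops_table[type].disable_recalls;
}
```

If no lease is ever set for a flexfiles layout, a local write has nothing to break, so the filesystem never waits on a layout that the client cannot return until its own WRITE completes.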
* Re: CB_LAYOUTRECALL "deadlock" with in-kernel flexfiles server and XFS
  2016-08-11 15:23 CB_LAYOUTRECALL "deadlock" with in-kernel flexfiles server and XFS Jeff Layton
  2016-08-11 15:55 ` Trond Myklebust
@ 2018-01-27 15:39 ` Benjamin Coddington
  2018-01-27 21:41 ` Jeff Layton
  1 sibling, 1 reply; 10+ messages in thread
From: Benjamin Coddington @ 2018-01-27 15:39 UTC
To: Jeff Layton
Cc: open list:NFS, SUNRPC, AND..., Tom Haynes, Christoph Hellwig, Bruce Fields

On 11 Aug 2016, at 11:23, Jeff Layton wrote:

> I was playing around with the in-kernel flexfiles server today, and I
> seem to be hitting a deadlock when using it on an XFS-exported
> filesystem. Here's the stack trace of how the CB_LAYOUTRECALL occurs:
>
> [ 928.736139] CPU: 0 PID: 846 Comm: nfsd Tainted: G OE 4.8.0-rc1+ #3
> [ 928.737040] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
> [ 928.738009] 0000000000000286 000000006125f50e ffff91153845b878 ffffffff8f463853
> [ 928.738906] ffff91152ec194d0 ffff91152d31d9c0 ffff91153845b8a8 ffffffffc045936f
> [ 928.739788] ffff91152c051980 ffff91152d31d9c0 ffff91152c051540 ffff9115361b8a58
> [ 928.740697] Call Trace:
> [ 928.740998] [<ffffffff8f463853>] dump_stack+0x86/0xc3
> [ 928.741570] [<ffffffffc045936f>] nfsd4_recall_file_layout+0x17f/0x190 [nfsd]
> [ 928.742380] [<ffffffffc045939d>] nfsd4_layout_lm_break+0x1d/0x30 [nfsd]
> [ 928.743115] [<ffffffff8f3056d8>] __break_lease+0x118/0x6a0
> [ 928.743759] [<ffffffffc02dea69>] xfs_break_layouts+0x79/0x120 [xfs]
> [ 928.744462] [<ffffffffc029ea04>] xfs_file_aio_write_checks+0x94/0x1f0 [xfs]
> [ 928.745251] [<ffffffffc029f36b>] xfs_file_buffered_aio_write+0x7b/0x330 [xfs]
> [ 928.746063] [<ffffffffc029f70c>] xfs_file_write_iter+0xec/0x140 [xfs]
> [ 928.746803] [<ffffffff8f2a0599>] do_iter_readv_writev+0xb9/0x140
> [ 928.747478] [<ffffffff8f2a126b>] do_readv_writev+0x19b/0x240
> [ 928.748146] [<ffffffffc029f620>] ? xfs_file_buffered_aio_write+0x330/0x330 [xfs]
> [ 928.748956] [<ffffffff8f29e02b>] ? do_dentry_open+0x28b/0x310
> [ 928.749614] [<ffffffffc029c800>] ? xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
> [ 928.750367] [<ffffffff8f2a156f>] vfs_writev+0x3f/0x50
> [ 928.750934] [<ffffffffc04276ca>] nfsd_vfs_write+0xca/0x3a0 [nfsd]
> [ 928.751608] [<ffffffffc0429ec5>] nfsd_write+0x485/0x780 [nfsd]
> [ 928.752263] [<ffffffffc043144c>] nfsd3_proc_write+0xbc/0x150 [nfsd]
> [ 928.752973] [<ffffffffc0421388>] nfsd_dispatch+0xb8/0x1f0 [nfsd]
> [ 928.753642] [<ffffffffc036d78f>] svc_process_common+0x42f/0x690 [sunrpc]
> [ 928.754395] [<ffffffffc036e8e8>] svc_process+0x118/0x330 [sunrpc]
> [ 928.755080] [<ffffffffc04208ac>] nfsd+0x19c/0x2b0 [nfsd]
> [ 928.755681] [<ffffffffc0420715>] ? nfsd+0x5/0x2b0 [nfsd]
> [ 928.756274] [<ffffffffc0420710>] ? nfsd_destroy+0x190/0x190 [nfsd]
> [ 928.756991] [<ffffffff8f0d5891>] kthread+0x101/0x120
> [ 928.757563] [<ffffffff8f10dcc5>] ? trace_hardirqs_on_caller+0xf5/0x1b0
> [ 928.758282] [<ffffffff8f8f2fef>] ret_from_fork+0x1f/0x40
> [ 928.758875] [<ffffffff8f0d5790>] ? kthread_create_on_node+0x250/0x250
>
>
> So the client gets a flexfiles layout, and then tries to issue a v3
> WRITE against the file. XFS then recalls the layout, but the client
> can't return the layout until the v3 WRITE completes. Eventually this
> should resolve itself after 2 lease periods, but that's quite a long
> time.
>
> I guess XFS requires recalling block and SCSI layouts when the server
> wants to issue a write (or someone writes to it locally), but that
> seems like it shouldn't be happening when the layout is a flexfiles
> layout.
>
> Any thoughts on what the right fix is here?
>
> On a related note, knfsd will spam the heck out of the client with
> CB_LAYOUTRECALLs during this time. I think we ought to consider fixing
> the server not to treat an NFS_OK return from the client like
> NFS4ERR_DELAY there, but that would mean a different mechanism for
> timing out a CB_LAYOUTRECALL.

I'm getting into similar trouble with SCSI layouts when the client ends up
submitting a WRITE because the IO is not page aligned, but it already holds
a layout for that range. It looks like the server sends a CB_LAYOUTRECALL,
but the client has to answer NFS4ERR_DELAY because it is still holding the
layout.

Probably, the client should return any layouts it holds for that range before
doing IO through the MDS.

Alternatively, shouldn't the MDS accept IO from the same client that holds a
layout for that range, rather than recall that layout? RFC 5661 Section
20.3.4 talks about the client submitting WRITEs before responding to
CB_LAYOUTRECALL: "As always, the client may write the data through the
metadata server."

I'm trying to find the discussion that resulted in this commit:

commit 6b9b21073d3b250e17812cd562fffc9006962b39
Author: Jeff Layton <jlayton@poochiereds.net>
Date: Tue Dec 8 07:23:48 2015 -0500

    nfsd: give up on CB_LAYOUTRECALLs after two lease periods

Why should we poll the client if the client answers with NFS4ERR_DELAY? Can
we instead just wait for the layout to be returned?

Also, I think the 2*lease period timeout is currently broken because we reset
tk_start after every call.. but that's not really causing any trouble.

Ben
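Ben's last point, that a two-lease-period cutoff stops working if the reference timestamp is reset on every retry, can be illustrated with a toy model. This is a minimal sketch, not the nfsd code: the function name, the 90-second lease, and the 5-second retry interval are all assumptions made for illustration, and `start` only stands in for the role that Ben attributes to tk_start.

```python
LEASE_SECONDS = 90.0  # assumed lease period; illustrative value only


def gave_up_at(reset_start_each_call, retry_interval=5.0, horizon=1000.0):
    """Return the time at which the recall is abandoned, or None if never.

    Toy model: the client answers every CB_LAYOUTRECALL with NFS4ERR_DELAY
    and the server retries after retry_interval seconds. `start` is the
    timestamp that the give-up check is measured from.
    """
    cutoff = 2 * LEASE_SECONDS
    now = 0.0
    start = 0.0
    while now < horizon:
        now += retry_interval      # the reply to the current call arrives
        if now - start > cutoff:   # the "two lease periods" check
            return now
        if reset_start_each_call:
            start = now            # each retry restarts the clock
    return None


# Measured from the first call, the recall is abandoned shortly after 180 s.
print(gave_up_at(reset_start_each_call=False))
# Measured from the latest call, the elapsed time never exceeds one interval.
print(gave_up_at(reset_start_each_call=True))
```

In this model the cutoff only fires when the elapsed time is measured from the first call. If the real code behaves as Ben suspects, keeping a separate timestamp for the first attempt would be one way to restore the intended bound.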
* Re: CB_LAYOUTRECALL "deadlock" with in-kernel flexfiles server and XFS
  2018-01-27 15:39 ` Benjamin Coddington
@ 2018-01-27 21:41 ` Jeff Layton
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Layton @ 2018-01-27 21:41 UTC
To: Benjamin Coddington
Cc: open list:NFS, SUNRPC, AND..., Tom Haynes, Christoph Hellwig, Bruce Fields

On Sat, 2018-01-27 at 10:39 -0500, Benjamin Coddington wrote:
> On 11 Aug 2016, at 11:23, Jeff Layton wrote:
>
> > I was playing around with the in-kernel flexfiles server today, and I
> > seem to be hitting a deadlock when using it on an XFS-exported
> > filesystem. Here's the stack trace of how the CB_LAYOUTRECALL occurs:
> >
> > [ 928.736139] CPU: 0 PID: 846 Comm: nfsd Tainted: G OE 4.8.0-rc1+ #3
> > [ 928.737040] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
> > [ 928.738009] 0000000000000286 000000006125f50e ffff91153845b878 ffffffff8f463853
> > [ 928.738906] ffff91152ec194d0 ffff91152d31d9c0 ffff91153845b8a8 ffffffffc045936f
> > [ 928.739788] ffff91152c051980 ffff91152d31d9c0 ffff91152c051540 ffff9115361b8a58
> > [ 928.740697] Call Trace:
> > [ 928.740998] [<ffffffff8f463853>] dump_stack+0x86/0xc3
> > [ 928.741570] [<ffffffffc045936f>] nfsd4_recall_file_layout+0x17f/0x190 [nfsd]
> > [ 928.742380] [<ffffffffc045939d>] nfsd4_layout_lm_break+0x1d/0x30 [nfsd]
> > [ 928.743115] [<ffffffff8f3056d8>] __break_lease+0x118/0x6a0
> > [ 928.743759] [<ffffffffc02dea69>] xfs_break_layouts+0x79/0x120 [xfs]
> > [ 928.744462] [<ffffffffc029ea04>] xfs_file_aio_write_checks+0x94/0x1f0 [xfs]
> > [ 928.745251] [<ffffffffc029f36b>] xfs_file_buffered_aio_write+0x7b/0x330 [xfs]
> > [ 928.746063] [<ffffffffc029f70c>] xfs_file_write_iter+0xec/0x140 [xfs]
> > [ 928.746803] [<ffffffff8f2a0599>] do_iter_readv_writev+0xb9/0x140
> > [ 928.747478] [<ffffffff8f2a126b>] do_readv_writev+0x19b/0x240
> > [ 928.748146] [<ffffffffc029f620>] ? xfs_file_buffered_aio_write+0x330/0x330 [xfs]
> > [ 928.748956] [<ffffffff8f29e02b>] ? do_dentry_open+0x28b/0x310
> > [ 928.749614] [<ffffffffc029c800>] ? xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
> > [ 928.750367] [<ffffffff8f2a156f>] vfs_writev+0x3f/0x50
> > [ 928.750934] [<ffffffffc04276ca>] nfsd_vfs_write+0xca/0x3a0 [nfsd]
> > [ 928.751608] [<ffffffffc0429ec5>] nfsd_write+0x485/0x780 [nfsd]
> > [ 928.752263] [<ffffffffc043144c>] nfsd3_proc_write+0xbc/0x150 [nfsd]
> > [ 928.752973] [<ffffffffc0421388>] nfsd_dispatch+0xb8/0x1f0 [nfsd]
> > [ 928.753642] [<ffffffffc036d78f>] svc_process_common+0x42f/0x690 [sunrpc]
> > [ 928.754395] [<ffffffffc036e8e8>] svc_process+0x118/0x330 [sunrpc]
> > [ 928.755080] [<ffffffffc04208ac>] nfsd+0x19c/0x2b0 [nfsd]
> > [ 928.755681] [<ffffffffc0420715>] ? nfsd+0x5/0x2b0 [nfsd]
> > [ 928.756274] [<ffffffffc0420710>] ? nfsd_destroy+0x190/0x190 [nfsd]
> > [ 928.756991] [<ffffffff8f0d5891>] kthread+0x101/0x120
> > [ 928.757563] [<ffffffff8f10dcc5>] ? trace_hardirqs_on_caller+0xf5/0x1b0
> > [ 928.758282] [<ffffffff8f8f2fef>] ret_from_fork+0x1f/0x40
> > [ 928.758875] [<ffffffff8f0d5790>] ? kthread_create_on_node+0x250/0x250
> >
> >
> > So the client gets a flexfiles layout, and then tries to issue a v3
> > WRITE against the file. XFS then recalls the layout, but the client
> > can't return the layout until the v3 WRITE completes. Eventually this
> > should resolve itself after 2 lease periods, but that's quite a long
> > time.
> >
> > I guess XFS requires recalling block and SCSI layouts when the server
> > wants to issue a write (or someone writes to it locally), but that
> > seems like it shouldn't be happening when the layout is a flexfiles
> > layout.
> >
> > Any thoughts on what the right fix is here?
> >
> > On a related note, knfsd will spam the heck out of the client with
> > CB_LAYOUTRECALLs during this time. I think we ought to consider fixing
> > the server not to treat an NFS_OK return from the client like
> > NFS4ERR_DELAY there, but that would mean a different mechanism for
> > timing out a CB_LAYOUTRECALL.
>
> I'm getting into similar trouble with SCSI layouts when the client ends up
> submitting a WRITE because the IO is not page aligned, but it already holds
> a layout for that range. It looks like the server sends a CB_LAYOUTRECALL,
> but the client has to answer NFS4ERR_DELAY because it is still holding the
> layout.
>
> Probably, the client should return any layouts it holds for that range before
> doing IO through the MDS.
>

Yes, that might be good. Could even prefix the WRITE compound with a
LAYOUTRETURN if you want to get fancy. :)

> Alternatively, shouldn't the MDS accept IO from the same client that holds a
> layout for that range, rather than recall that layout? RFC 5661 Section
> 20.3.4 talks about the client submitting WRITEs before responding to
> CB_LAYOUTRECALL: "As always, the client may write the data through the
> metadata server."
>

Agreed. That seems reasonable too.

> I'm trying to find the discussion that resulted in this commit:
>
> commit 6b9b21073d3b250e17812cd562fffc9006962b39
> Author: Jeff Layton <jlayton@poochiereds.net>
> Date: Tue Dec 8 07:23:48 2015 -0500
>
>     nfsd: give up on CB_LAYOUTRECALLs after two lease periods
>
> Why should we poll the client if the client answers with NFS4ERR_DELAY? Can
> we instead just wait for the layout to be returned?
>

No. NFS4ERR_DELAY just means "I'm too busy to answer right now, please
call again later". You can't infer that the client has made any note of
the CB_LAYOUTRECALL at all since it didn't succeed.

Returning NFS4_OK on a CB_LAYOUTRECALL just means that you acknowledge
that it has been recalled and will eventually send a LAYOUTRETURN. It
doesn't mean that you are immediately returning it.

Probably what the client should do in this situation is mark the layout
as having been recalled and return NFS4_OK instead of NFS4ERR_DELAY. It
seems like that ought to be possible, but I haven't looked at the code
to see why that isn't occurring.

> Also, I think the 2*lease period timeout is currently broken because we reset
> tk_start after every call.. but that's not really causing any trouble.
>

It'd be good to fix that too, since you're in there...

-- 
Jeff Layton <jlayton@redhat.com>
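The client behaviour Jeff suggests (note the recall, answer NFS4_OK straight away, and send the LAYOUTRETURN later) can be sketched as a small state machine. This is a toy model under stated assumptions, not the Linux pNFS client: the class and method names are hypothetical, the status codes are represented by name only, and "sending a LAYOUTRETURN" is reduced to a counter. The NFS4ERR_NOMATCHING_LAYOUT answer for a layout that is no longer held follows RFC 5661.

```python
from enum import Enum


class CbStatus(Enum):
    """Callback reply codes used by this model, by name only."""
    NFS4_OK = "NFS4_OK"
    NFS4ERR_NOMATCHING_LAYOUT = "NFS4ERR_NOMATCHING_LAYOUT"


class ClientLayout:
    """Hypothetical client-side view of one layout."""

    def __init__(self):
        self.held = True
        self.recalled = False
        self.io_in_flight = 0
        self.layoutreturns_sent = 0

    def start_io(self):
        self.io_in_flight += 1

    def finish_io(self):
        self.io_in_flight -= 1
        self._return_if_idle()

    def on_cb_layoutrecall(self):
        if not self.held:
            return CbStatus.NFS4ERR_NOMATCHING_LAYOUT
        # Acknowledge the recall now; the return itself may happen later.
        self.recalled = True
        self._return_if_idle()
        return CbStatus.NFS4_OK

    def _return_if_idle(self):
        # Stand-in for sending LAYOUTRETURN once no I/O uses the layout.
        if self.held and self.recalled and self.io_in_flight == 0:
            self.held = False
            self.layoutreturns_sent += 1


layout = ClientLayout()
layout.start_io()
print(layout.on_cb_layoutrecall(), layout.held)   # acknowledged, still held
layout.finish_io()
print(layout.held, layout.layoutreturns_sent)     # returned after I/O drains
```

The model only captures the acknowledge-now, return-later semantics. It does not by itself break the circular wait described at the start of the thread, where the in-flight WRITE is blocked on the very layout being recalled; it would only stop the server from having to poll.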
end of thread, other threads:[~2018-01-27 21:41 UTC | newest]

Thread overview: 10+ messages
2016-08-11 15:23 CB_LAYOUTRECALL "deadlock" with in-kernel flexfiles server and XFS Jeff Layton
2016-08-11 15:55 ` Trond Myklebust
2016-08-11 16:06 ` Jeff Layton
2016-08-11 16:20 ` Trond Myklebust
2016-08-11 16:25 ` hch
2016-08-11 16:33 ` Jeff Layton
2016-08-11 16:59 ` hch
2016-08-11 17:10 ` Jeff Layton
2018-01-27 15:39 ` Benjamin Coddington
2018-01-27 21:41 ` Jeff Layton