* [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release @ 2011-08-28 6:22 Vitaliy Gusev 2011-09-06 19:29 ` Trond Myklebust 0 siblings, 1 reply; 29+ messages in thread From: Vitaliy Gusev @ 2011-08-28 6:22 UTC (permalink / raw) To: Trond.Myklebust; +Cc: peng_tao, linux-nfs, Vitaliy Gusev pnfs_layout_segment can be already under handling LAYOUTCOMMIT, so adding list twice causes hang: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:0:4] Call Trace: nfs4_layoutcommit_release+0x5a/0x9c [nfs] rpc_release_calldata+0x17/0x19 [sunrpc] rpc_free_task+0x5e/0x66 [sunrpc] __rpc_execute+0x29e/0x2ad [sunrpc] rpc_async_schedule+0x15/0x17 [sunrpc] process_one_work+0x213/0x3ba process_one_work+0x142/0x3ba __rpc_execute+0x2ad/0x2ad [sunrpc] worker_thread+0xfd/0x181 Signed-off-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com> --- fs/nfs/pnfs.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c index e550e88..1465f44 100644 --- a/fs/nfs/pnfs.c +++ b/fs/nfs/pnfs.c @@ -1376,7 +1376,8 @@ static void pnfs_list_write_lseg(struct inode *inode, struct list_head *listp) list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list) { if (lseg->pls_range.iomode == IOMODE_RW && - test_bit(NFS_LSEG_LAYOUTCOMMIT, &lseg->pls_flags)) + test_bit(NFS_LSEG_LAYOUTCOMMIT, &lseg->pls_flags) && + list_empty(&lseg->pls_lc_list)) list_add(&lseg->pls_lc_list, listp); } } -- 1.7.4.1 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-08-28 6:22 [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release Vitaliy Gusev @ 2011-09-06 19:29 ` Trond Myklebust 2011-09-06 22:13 ` Vitaliy Gusev 0 siblings, 1 reply; 29+ messages in thread From: Trond Myklebust @ 2011-09-06 19:29 UTC (permalink / raw) To: Vitaliy Gusev; +Cc: peng_tao, linux-nfs, Vitaliy Gusev On Sun, 2011-08-28 at 10:22 +0400, Vitaliy Gusev wrote: > pnfs_layout_segment can be already under handling LAYOUTCOMMIT, > so adding list twice causes hang: > > BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:0:4] > Call Trace: > > nfs4_layoutcommit_release+0x5a/0x9c [nfs] > rpc_release_calldata+0x17/0x19 [sunrpc] > rpc_free_task+0x5e/0x66 [sunrpc] > __rpc_execute+0x29e/0x2ad [sunrpc] > rpc_async_schedule+0x15/0x17 [sunrpc] > process_one_work+0x213/0x3ba > process_one_work+0x142/0x3ba > __rpc_execute+0x2ad/0x2ad [sunrpc] > worker_thread+0xfd/0x181 > > Signed-off-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com> > --- > fs/nfs/pnfs.c | 3 ++- > 1 files changed, 2 insertions(+), 1 deletions(-) > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c > index e550e88..1465f44 100644 > --- a/fs/nfs/pnfs.c > +++ b/fs/nfs/pnfs.c > @@ -1376,7 +1376,8 @@ static void pnfs_list_write_lseg(struct inode *inode, struct list_head *listp) > > list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list) { > if (lseg->pls_range.iomode == IOMODE_RW && > - test_bit(NFS_LSEG_LAYOUTCOMMIT, &lseg->pls_flags)) > + test_bit(NFS_LSEG_LAYOUTCOMMIT, &lseg->pls_flags) && > + list_empty(&lseg->pls_lc_list)) > list_add(&lseg->pls_lc_list, listp); > } > } If the lseg is already part of one layoutcommit, but we're sending a second one for the same range (presumably because we wrote more data in the same region), then the above causes the lseg to be excluded. I agree that the current code causes list corruption, but before applying something like the above patch, I'd like to understand why it is correct. Trond -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-06 19:29 ` Trond Myklebust @ 2011-09-06 22:13 ` Vitaliy Gusev 2011-09-06 22:32 ` Trond Myklebust 2011-09-08 10:00 ` tao.peng 0 siblings, 2 replies; 29+ messages in thread From: Vitaliy Gusev @ 2011-09-06 22:13 UTC (permalink / raw) To: Trond Myklebust; +Cc: Vitaliy Gusev, peng_tao, linux-nfs >> @@ -1376,7 +1376,8 @@ static void pnfs_list_write_lseg(struct inode *inode, struct list_head *listp) >> >> list_for_each_entry(lseg,&NFS_I(inode)->layout->plh_segs, pls_list) { >> if (lseg->pls_range.iomode == IOMODE_RW&& >> - test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags)) >> + test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags)&& >> + list_empty(&lseg->pls_lc_list)) >> list_add(&lseg->pls_lc_list, listp); >> } >> } > > If the lseg is already part of one layoutcommit, but we're sending a > second one for the same range (presumably because we wrote more data in > the same region), then the above causes the lseg to be excluded. Yes, lseg is excluded, This patch does exactly only exclusion of lseg. lseg is used here only to get/put reference on this lseg, so skipping is correct. However, checking on list_empty can occur (on other CPU) in the middle: list_del_init(&lseg->pls_lc_list); Here >>>>>> if (test_and_clear_bit(NFS_LSEG_LAYOUTCOMMIT, &lseg->pls_flags)) put_lseg(lseg); So list_del_init must be executed under the same lock as pnfs_list_write_lseg, i.e. inode->i_lock. > > I agree that the current code causes list corruption, but before > applying something like the above patch, I'd like to understand why it > is correct. > > Trond > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-06 22:13 ` Vitaliy Gusev @ 2011-09-06 22:32 ` Trond Myklebust 2011-09-08 10:21 ` tao.peng 2011-09-08 10:00 ` tao.peng 1 sibling, 1 reply; 29+ messages in thread From: Trond Myklebust @ 2011-09-06 22:32 UTC (permalink / raw) To: Vitaliy Gusev; +Cc: Vitaliy Gusev, peng_tao, linux-nfs On Wed, 2011-09-07 at 02:13 +0400, Vitaliy Gusev wrote: > >> @@ -1376,7 +1376,8 @@ static void pnfs_list_write_lseg(struct inode *inode, struct list_head *listp) > >> > >> list_for_each_entry(lseg,&NFS_I(inode)->layout->plh_segs, pls_list) { > >> if (lseg->pls_range.iomode == IOMODE_RW&& > >> - test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags)) > >> + test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags)&& > >> + list_empty(&lseg->pls_lc_list)) > >> list_add(&lseg->pls_lc_list, listp); > >> } > >> } > > > > If the lseg is already part of one layoutcommit, but we're sending a > > second one for the same range (presumably because we wrote more data in > > the same region), then the above causes the lseg to be excluded. > > > Yes, lseg is excluded, This patch does exactly only exclusion of lseg. > lseg is used here only to get/put reference on this lseg, so skipping is > correct. Are you sure? As far as I can see, pnfs_list_write_lseg() actually assigns the lseg to a particular layoutcommit call. My questions are: if I write to an area described by that lseg _after_ it has been assigned to that layoutcommit RPC call (and the latter has already been dispatched to the metadata server), then A. shouldn't it be assigned to a second layoutcommit call too? B. If not, what are we doing to ensure mutual exclusion between layoutcommit RPC calls and pNFS writes to the data server? > However, checking on list_empty can occur (on other CPU) in the middle: > > list_del_init(&lseg->pls_lc_list); > Here >>>>>> > if (test_and_clear_bit(NFS_LSEG_LAYOUTCOMMIT, > &lseg->pls_flags)) > put_lseg(lseg); > > > So list_del_init must be executed under the same lock as > pnfs_list_write_lseg, i.e. inode->i_lock. Agreed if my questions above make no sense. Cheers Trond -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-06 22:32 ` Trond Myklebust @ 2011-09-08 10:21 ` tao.peng 2011-09-08 12:01 ` Myklebust, Trond 0 siblings, 1 reply; 29+ messages in thread From: tao.peng @ 2011-09-08 10:21 UTC (permalink / raw) To: Trond.Myklebust, gusev.vitaliy; +Cc: gusev.vitaliy, linux-nfs SGksIFRyb25kLA0KDQpTb3JyeSBmb3IgdGhlIGxhdGUgcmVzcG9uc2UuDQoNCj4gLS0tLS1Pcmln aW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogVHJvbmQgTXlrbGVidXN0IFttYWlsdG86VHJvbmQu TXlrbGVidXN0QG5ldGFwcC5jb21dDQo+IFNlbnQ6IFdlZG5lc2RheSwgU2VwdGVtYmVyIDA3LCAy MDExIDY6MzMgQU0NCj4gVG86IFZpdGFsaXkgR3VzZXYNCj4gQ2M6IFZpdGFsaXkgR3VzZXY7IFBl bmcsIFRhbzsgbGludXgtbmZzQHZnZXIua2VybmVsLm9yZw0KPiBTdWJqZWN0OiBSZTogW1BBVENI XSBuZnM6IGZpeCBpbmlmaW5pdGUgbG9vcCBhdCBuZnM0X2xheW91dGNvbW1pdF9yZWxlYXNlDQo+ IA0KPiBPbiBXZWQsIDIwMTEtMDktMDcgYXQgMDI6MTMgKzA0MDAsIFZpdGFsaXkgR3VzZXYgd3Jv dGU6DQo+ID4gPj4gQEAgLTEzNzYsNyArMTM3Niw4IEBAIHN0YXRpYyB2b2lkIHBuZnNfbGlzdF93 cml0ZV9sc2VnKHN0cnVjdCBpbm9kZSAqaW5vZGUsDQo+IHN0cnVjdCBsaXN0X2hlYWQgKmxpc3Rw KQ0KPiA+ID4+DQo+ID4gPj4gICAJbGlzdF9mb3JfZWFjaF9lbnRyeShsc2VnLCZORlNfSShpbm9k ZSktPmxheW91dC0+cGxoX3NlZ3MsIHBsc19saXN0KSB7DQo+ID4gPj4gICAJCWlmIChsc2VnLT5w bHNfcmFuZ2UuaW9tb2RlID09IElPTU9ERV9SVyYmDQo+ID4gPj4gLQkJICAgIHRlc3RfYml0KE5G U19MU0VHX0xBWU9VVENPTU1JVCwmbHNlZy0+cGxzX2ZsYWdzKSkNCj4gPiA+PiArCQkgICAgdGVz dF9iaXQoTkZTX0xTRUdfTEFZT1VUQ09NTUlULCZsc2VnLT5wbHNfZmxhZ3MpJiYNCj4gPiA+PiAr CQkgICAgbGlzdF9lbXB0eSgmbHNlZy0+cGxzX2xjX2xpc3QpKQ0KPiA+ID4+ICAgCQkJbGlzdF9h ZGQoJmxzZWctPnBsc19sY19saXN0LCBsaXN0cCk7DQo+ID4gPj4gICAJfQ0KPiA+ID4+ICAgfQ0K PiA+ID4NCj4gPiA+IElmIHRoZSBsc2VnIGlzIGFscmVhZHkgcGFydCBvZiBvbmUgbGF5b3V0Y29t bWl0LCBidXQgd2UncmUgc2VuZGluZyBhDQo+ID4gPiBzZWNvbmQgb25lIGZvciB0aGUgc2FtZSBy YW5nZSAocHJlc3VtYWJseSBiZWNhdXNlIHdlIHdyb3RlIG1vcmUgZGF0YSBpbg0KPiA+ID4gdGhl IHNhbWUgcmVnaW9uKSwgdGhlbiB0aGUgYWJvdmUgY2F1c2VzIHRoZSBsc2VnIHRvIGJlIGV4Y2x1 ZGVkLg0KPiA+DQo+ID4NCj4gPiBZZXMsIGxzZWcgaXMgZXhjbHVkZWQsIFRoaXMgcGF0Y2ggZG9l cyBleGFjdGx5IG9ubHkgZXhjbHVzaW9uIG9mIGxzZWcuDQo+ID4gbHNlZyBpcyB1c2VkIGhlcmUg b25seSB0byBnZXQvcHV0IHJlZmVyZW5jZSBvbiB0aGlzIGxzZWcsIHNvIHNraXBwaW5nIGlzDQo+ ID4gY29ycmVjdC4NCj4gDQo+IEFyZSB5b3Ugc3VyZT8gQXMgZmFyIGFzIEkgY2FuIHNlZSwgcG5m c19saXN0X3dyaXRlX2xzZWcoKSBhY3R1YWxseQ0KPiBhc3NpZ25zIHRoZSBsc2VnIHRvIGEgcGFy dGljdWxhciBsYXlvdXRjb21taXQgY2FsbC4NCj4gDQo+IE15IHF1ZXN0aW9ucyBhcmU6IGlmIEkg d3JpdGUgdG8gYW4gYXJlYSBkZXNjcmliZWQgYnkgdGhhdCBsc2VnIF9hZnRlcl8NCj4gaXQgaGFz IGJlZW4gYXNzaWduZWQgdG8gdGhhdCBsYXlvdXRjb21taXQgUlBDIGNhbGwgKGFuZCB0aGUgbGF0 dGVyIGhhcw0KPiBhbHJlYWR5IGJlZW4gZGlzcGF0Y2hlZCB0byB0aGUgbWV0YWRhdGEgc2VydmVy KSwgdGhlbg0KPiAgICAgIEEuIHNob3VsZG4ndCBpdCBiZSBhc3NpZ25lZCB0byBhIHNlY29uZCBs YXlvdXRjb21taXQgY2FsbCB0b28/DQo+ICAgICAgQi4gSWYgbm90LCB3aGF0IGFyZSB3ZSBkb2lu ZyB0byBlbnN1cmUgbXV0dWFsIGV4Y2x1c2lvbiBiZXR3ZWVuDQo+ICAgICAgICAgbGF5b3V0Y29t bWl0IFJQQyBjYWxscyBhbmQgcE5GUyB3cml0ZXMgdG8gdGhlIGRhdGEgc2VydmVyPw0KSSB0aGlu ayBpdCBkZXBlbmRzIG9uIHRoZSBwdXJwb3NlIG9mIGxheW91dGNvbW1pdC4NCjEuIGZvciB1cGRh dGluZyBhdGltZS9tdGltZS4gSW4gdGhpcyBjYXNlLCB3ZSBkb24ndCBuZWVkIG11dHVhbCBleGNs dXNpb24NCmJldHdlZW4gbGF5b3V0Y29tbWl0IFJQQyBhbmQgcE5GUyB3cml0ZXMgdG8gZGF0YSBz ZXJ2ZXIuDQoyLiBmb3IgdXBkYXRpbmcgTEQgc3BlY2lmaWMgc3RhdGUuIEluIHRoaXMgY2FzZSwg TEQgc2hvdWxkIGRlY2lkZSB3aGF0IHRvIGNvbW1pdA0KKGFuZCBhY3R1YWxseSBpZ25vcmVzIHRo ZSByYW5nZSBvZiBsc2VnKS4gVGFrZSBibG9jayBsYXlvdXQgZm9yIGV4YW1wbGUuIEl0IG1haW50 YWlucw0KYmxvY2tzaXplZCBleHRlbnQgc3RhdGUgaW5zaWRlIExEIGFuZCBkZXRlcm1pbmVzIHdo YXQgdG8gZW5jb2RlIGluc2lkZSBsYXlvdXRjb21taXQNClJQQyB3aGVuZXZlciB0aGVyZSBpcyBh IGdlbmVyaWMgbGF5ZXIgbGF5b3V0Y29tbWl0IGNhbGwuIFNvIHdoZW4gYW4gZXh0ZW50IGlzIGJl aW5nDQpsYXlvdXRjb21taXR0ZWQsIGNsaWVudCBjYW4gc3RpbGwgd3JpdGUgdG8gdGhlIHNhbWUg cmFuZ2UsIGFzIGxvbmcgYXMgdGhlIGxzZWcgaXMgaGVsZCBieSBjbGllbnQuDQoNCkkgYW0gbm90 IGZhbWlsaWFyIGVub3VnaCB3aXRoIGZpbGUvb2JqZWN0IGxheW91dCBjYXNlIHRob3VnaC4gRG9l cyBjdXJyZW50IGltcGxlbWVudGF0aW9uDQpicmVhayBhbnkgcmVxdWlyZW1lbnRzIGZvciBmaWxl L29iamVjdCBsYXlvdXQ/DQoNClRoYW5rcywNClRhbw0KPiANCj4gPiBIb3dldmVyLCBjaGVja2lu ZyBvbiBsaXN0X2VtcHR5IGNhbiBvY2N1ciAob24gb3RoZXIgQ1BVKSBpbiB0aGUgbWlkZGxlOg0K PiA+DQo+ID4gCWxpc3RfZGVsX2luaXQoJmxzZWctPnBsc19sY19saXN0KTsNCj4gPiBIZXJlID4+ Pj4+Pg0KPiA+IAlpZiAodGVzdF9hbmRfY2xlYXJfYml0KE5GU19MU0VHX0xBWU9VVENPTU1JVCwN Cj4gPiAJCQkgICAgICAgJmxzZWctPnBsc19mbGFncykpDQo+ID4gCQlwdXRfbHNlZyhsc2VnKTsN Cj4gPg0KPiA+DQo+ID4gU28gbGlzdF9kZWxfaW5pdCBtdXN0IGJlIGV4ZWN1dGVkIHVuZGVyIHRo ZSBzYW1lIGxvY2sgYXMNCj4gPiBwbmZzX2xpc3Rfd3JpdGVfbHNlZywgaS5lLiBpbm9kZS0+aV9s b2NrLg0KPiANCj4gQWdyZWVkIGlmIG15IHF1ZXN0aW9ucyBhYm92ZSBtYWtlIG5vIHNlbnNlLg0K PiANCj4gQ2hlZXJzDQo+ICAgVHJvbmQNCj4gDQo+IC0tDQo+IFRyb25kIE15a2xlYnVzdA0KPiBM aW51eCBORlMgY2xpZW50IG1haW50YWluZXINCj4gDQo+IE5ldEFwcA0KPiBUcm9uZC5NeWtsZWJ1 c3RAbmV0YXBwLmNvbQ0KPiB3d3cubmV0YXBwLmNvbQ0KPiANCg0KTu+/ve+/ve+/ve+/ve+/vXLv v73vv71577+977+977+9Yu+/vVjvv73vv73Hp3bvv71e77+9Kd66ey5u77+9K++/ve+/ve+/ve+/ vXvvv73vv73vv70i77+977+9Xm7vv71y77+977+977+9eu+/vRrvv73vv71o77+977+977+977+9 Ju+/ve+/vR7vv71H77+977+977+9aO+/vQMo77+96ZqO77+93aJqIu+/ve+/vRrvv70bbe+/ve+/ ve+/ve+/ve+/vXrvv73elu+/ve+/ve+/vWbvv73vv73vv71o77+977+977+9fu+/vW3vv70= ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-08 10:21 ` tao.peng @ 2011-09-08 12:01 ` Myklebust, Trond 2011-09-08 15:00 ` Peng Tao 0 siblings, 1 reply; 29+ messages in thread From: Myklebust, Trond @ 2011-09-08 12:01 UTC (permalink / raw) To: tao.peng, gusev.vitaliy; +Cc: gusev.vitaliy, linux-nfs PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiB0YW8ucGVuZ0BlbWMuY29tIFtt YWlsdG86dGFvLnBlbmdAZW1jLmNvbV0NCj4gU2VudDogVGh1cnNkYXksIFNlcHRlbWJlciAwOCwg MjAxMSA2OjIxIEFNDQo+IFRvOiBNeWtsZWJ1c3QsIFRyb25kOyBndXNldi52aXRhbGl5QG5leGVu dGEuY29tDQo+IENjOiBndXNldi52aXRhbGl5QGdtYWlsLmNvbTsgbGludXgtbmZzQHZnZXIua2Vy bmVsLm9yZw0KPiBTdWJqZWN0OiBSRTogW1BBVENIXSBuZnM6IGZpeCBpbmlmaW5pdGUgbG9vcCBh dA0KPiBuZnM0X2xheW91dGNvbW1pdF9yZWxlYXNlDQo+IA0KPiBIaSwgVHJvbmQsDQo+IA0KPiBT b3JyeSBmb3IgdGhlIGxhdGUgcmVzcG9uc2UuDQo+IA0KPiA+IC0tLS0tT3JpZ2luYWwgTWVzc2Fn ZS0tLS0tDQo+ID4gRnJvbTogVHJvbmQgTXlrbGVidXN0IFttYWlsdG86VHJvbmQuTXlrbGVidXN0 QG5ldGFwcC5jb21dDQo+ID4gU2VudDogV2VkbmVzZGF5LCBTZXB0ZW1iZXIgMDcsIDIwMTEgNjoz MyBBTQ0KPiA+IFRvOiBWaXRhbGl5IEd1c2V2DQo+ID4gQ2M6IFZpdGFsaXkgR3VzZXY7IFBlbmcs IFRhbzsgbGludXgtbmZzQHZnZXIua2VybmVsLm9yZw0KPiA+IFN1YmplY3Q6IFJlOiBbUEFUQ0hd IG5mczogZml4IGluaWZpbml0ZSBsb29wIGF0DQo+IG5mczRfbGF5b3V0Y29tbWl0X3JlbGVhc2UN Cj4gPg0KPiA+IE9uIFdlZCwgMjAxMS0wOS0wNyBhdCAwMjoxMyArMDQwMCwgVml0YWxpeSBHdXNl diB3cm90ZToNCj4gPiA+ID4+IEBAIC0xMzc2LDcgKzEzNzYsOCBAQCBzdGF0aWMgdm9pZCBwbmZz X2xpc3Rfd3JpdGVfbHNlZyhzdHJ1Y3QNCj4gaW5vZGUgKmlub2RlLA0KPiA+IHN0cnVjdCBsaXN0 X2hlYWQgKmxpc3RwKQ0KPiA+ID4gPj4NCj4gPiA+ID4+ICAgCWxpc3RfZm9yX2VhY2hfZW50cnko bHNlZywmTkZTX0koaW5vZGUpLT5sYXlvdXQtPnBsaF9zZWdzLA0KPiBwbHNfbGlzdCkgew0KPiA+ ID4gPj4gICAJCWlmIChsc2VnLT5wbHNfcmFuZ2UuaW9tb2RlID09IElPTU9ERV9SVyYmDQo+ID4g PiA+PiAtCQkgICAgdGVzdF9iaXQoTkZTX0xTRUdfTEFZT1VUQ09NTUlULCZsc2VnLT5wbHNfZmxh Z3MpKQ0KPiA+ID4gPj4gKwkJICAgIHRlc3RfYml0KE5GU19MU0VHX0xBWU9VVENPTU1JVCwmbHNl Zy0NCj4gPnBsc19mbGFncykmJg0KPiA+ID4gPj4gKwkJICAgIGxpc3RfZW1wdHkoJmxzZWctPnBs c19sY19saXN0KSkNCj4gPiA+ID4+ICAgCQkJbGlzdF9hZGQoJmxzZWctPnBsc19sY19saXN0LCBs aXN0cCk7DQo+ID4gPiA+PiAgIAl9DQo+ID4gPiA+PiAgIH0NCj4gPiA+ID4NCj4gPiA+ID4gSWYg dGhlIGxzZWcgaXMgYWxyZWFkeSBwYXJ0IG9mIG9uZSBsYXlvdXRjb21taXQsIGJ1dCB3ZSdyZQ0K PiBzZW5kaW5nIGENCj4gPiA+ID4gc2Vjb25kIG9uZSBmb3IgdGhlIHNhbWUgcmFuZ2UgKHByZXN1 bWFibHkgYmVjYXVzZSB3ZSB3cm90ZSBtb3JlDQo+IGRhdGEgaW4NCj4gPiA+ID4gdGhlIHNhbWUg cmVnaW9uKSwgdGhlbiB0aGUgYWJvdmUgY2F1c2VzIHRoZSBsc2VnIHRvIGJlIGV4Y2x1ZGVkLg0K PiA+ID4NCj4gPiA+DQo+ID4gPiBZZXMsIGxzZWcgaXMgZXhjbHVkZWQsIFRoaXMgcGF0Y2ggZG9l cyBleGFjdGx5IG9ubHkgZXhjbHVzaW9uIG9mDQo+IGxzZWcuDQo+ID4gPiBsc2VnIGlzIHVzZWQg aGVyZSBvbmx5IHRvIGdldC9wdXQgcmVmZXJlbmNlIG9uIHRoaXMgbHNlZywgc28NCj4gc2tpcHBp bmcgaXMNCj4gPiA+IGNvcnJlY3QuDQo+ID4NCj4gPiBBcmUgeW91IHN1cmU/IEFzIGZhciBhcyBJ IGNhbiBzZWUsIHBuZnNfbGlzdF93cml0ZV9sc2VnKCkgYWN0dWFsbHkNCj4gPiBhc3NpZ25zIHRo ZSBsc2VnIHRvIGEgcGFydGljdWxhciBsYXlvdXRjb21taXQgY2FsbC4NCj4gPg0KPiA+IE15IHF1 ZXN0aW9ucyBhcmU6IGlmIEkgd3JpdGUgdG8gYW4gYXJlYSBkZXNjcmliZWQgYnkgdGhhdCBsc2Vn DQo+IF9hZnRlcl8NCj4gPiBpdCBoYXMgYmVlbiBhc3NpZ25lZCB0byB0aGF0IGxheW91dGNvbW1p dCBSUEMgY2FsbCAoYW5kIHRoZSBsYXR0ZXINCj4gaGFzDQo+ID4gYWxyZWFkeSBiZWVuIGRpc3Bh dGNoZWQgdG8gdGhlIG1ldGFkYXRhIHNlcnZlciksIHRoZW4NCj4gPiAgICAgIEEuIHNob3VsZG4n dCBpdCBiZSBhc3NpZ25lZCB0byBhIHNlY29uZCBsYXlvdXRjb21taXQgY2FsbCB0b28/DQo+ID4g ICAgICBCLiBJZiBub3QsIHdoYXQgYXJlIHdlIGRvaW5nIHRvIGVuc3VyZSBtdXR1YWwgZXhjbHVz aW9uIGJldHdlZW4NCj4gPiAgICAgICAgIGxheW91dGNvbW1pdCBSUEMgY2FsbHMgYW5kIHBORlMg d3JpdGVzIHRvIHRoZSBkYXRhIHNlcnZlcj8NCj4gSSB0aGluayBpdCBkZXBlbmRzIG9uIHRoZSBw dXJwb3NlIG9mIGxheW91dGNvbW1pdC4NCj4gMS4gZm9yIHVwZGF0aW5nIGF0aW1lL210aW1lLiBJ biB0aGlzIGNhc2UsIHdlIGRvbid0IG5lZWQgbXV0dWFsDQo+IGV4Y2x1c2lvbg0KDQpBZ3JlZWQu DQoNCj4gYmV0d2VlbiBsYXlvdXRjb21taXQgUlBDIGFuZCBwTkZTIHdyaXRlcyB0byBkYXRhIHNl cnZlci4NCj4gMi4gZm9yIHVwZGF0aW5nIExEIHNwZWNpZmljIHN0YXRlLiBJbiB0aGlzIGNhc2Us IExEIHNob3VsZCBkZWNpZGUgd2hhdA0KPiB0byBjb21taXQNCj4gKGFuZCBhY3R1YWxseSBpZ25v cmVzIHRoZSByYW5nZSBvZiBsc2VnKS4gVGFrZSBibG9jayBsYXlvdXQgZm9yDQo+IGV4YW1wbGUu IEl0IG1haW50YWlucw0KPiBibG9ja3NpemVkIGV4dGVudCBzdGF0ZSBpbnNpZGUgTEQgYW5kIGRl dGVybWluZXMgd2hhdCB0byBlbmNvZGUgaW5zaWRlDQo+IGxheW91dGNvbW1pdA0KPiBSUEMgd2hl bmV2ZXIgdGhlcmUgaXMgYSBnZW5lcmljIGxheWVyIGxheW91dGNvbW1pdCBjYWxsLiBTbyB3aGVu IGFuDQo+IGV4dGVudCBpcyBiZWluZw0KPiBsYXlvdXRjb21taXR0ZWQsIGNsaWVudCBjYW4gc3Rp bGwgd3JpdGUgdG8gdGhlIHNhbWUgcmFuZ2UsIGFzIGxvbmcgYXMNCj4gdGhlIGxzZWcgaXMgaGVs ZCBieSBjbGllbnQuDQoNClllcywgYnV0IGFzIGZhciBhcyBJIGNhbiBzZWUsIGV2ZW4gaW4gdGhl IGJsb2NrcyBjYXNlIHRoZXJlIGNhbiBiZSBtdWx0aXBsZSBleHRlbnRzIHBlciBsYXlvdXQgc2Vn bWVudC4gV2hhdCBpZiBJIHdyaXRlIHRvIG9uZSB1bmluaXRpYWxpc2VkIGV4dGVudCwgbGF5b3V0 Y29tbWl0LCB0aGVuIHdyaXRlIHRvIGFub3RoZXIgdW5pbml0aWFsaXplZCBleHRlbnQgaW4gdGhl IHNhbWUgbGF5b3V0IHNlZ21lbnQgYW5kIGxheW91dGNvbW1pdD8gSW4gbXkgcmVhZGluZyBvZiB0 aGUgY29kZSwgdGhlcmUgaXMgYSBjaGFuY2UgdGhhdCB0aGUgc2Vjb25kIGxheW91dGNvbW1pdCB3 aWxsIGZhaWwgdG8gcGljayB1cCB0aGUgbGF5b3V0IHNlZ21lbnQsIGFuZCBzbyB3aWxsIGZhaWwg dG8gbm90aWZ5IHRoZSBNRFMgdGhhdCB0aGUgc2Vjb25kIGV4dGVudCBub3cgY29udGFpbnMgZGF0 YS4NCg0KVHJvbmQNCgTvv717Lm7vv70r77+977+977+977+977+977+977+9KyXvv73vv71sendt 77+977+9Yu+/veunsu+/ve+/vXLvv73vv716WO+/ve+/vRnfsinvv73vv73vv713Kh9qZ++/ve+/ ve+/vR7vv73vv73vv73vv73vv73domov77+977+977+9eu+/vd6W77+977+9Mu+/vd6Z77+977+9 77+9Ju+/vSnfoe+/vWHvv73vv71/77+977+9Hu+/vUfvv73vv73vv71o77+9D++/vWo6K3bvv73v v73vv71377+92aU= ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-08 12:01 ` Myklebust, Trond @ 2011-09-08 15:00 ` Peng Tao 2011-09-08 17:05 ` Myklebust, Trond 0 siblings, 1 reply; 29+ messages in thread From: Peng Tao @ 2011-09-08 15:00 UTC (permalink / raw) To: Myklebust, Trond; +Cc: tao.peng, gusev.vitaliy, gusev.vitaliy, linux-nfs On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: >> -----Original Message----- >> From: tao.peng@emc.com [mailto:tao.peng@emc.com] >> Sent: Thursday, September 08, 2011 6:21 AM >> To: Myklebust, Trond; gusev.vitaliy@nexenta.com >> Cc: gusev.vitaliy@gmail.com; linux-nfs@vger.kernel.org >> Subject: RE: [PATCH] nfs: fix inifinite loop at >> nfs4_layoutcommit_release >> >> Hi, Trond, >> >> Sorry for the late response. >> >> > -----Original Message----- >> > From: Trond Myklebust [mailto:Trond.Myklebust@netapp.com] >> > Sent: Wednesday, September 07, 2011 6:33 AM >> > To: Vitaliy Gusev >> > Cc: Vitaliy Gusev; Peng, Tao; linux-nfs@vger.kernel.org >> > Subject: Re: [PATCH] nfs: fix inifinite loop at >> nfs4_layoutcommit_release >> > >> > On Wed, 2011-09-07 at 02:13 +0400, Vitaliy Gusev wrote: >> > > >> @@ -1376,7 +1376,8 @@ static void pnfs_list_write_lseg(struct >> inode *inode, >> > struct list_head *listp) >> > > >> >> > > >> list_for_each_entry(lseg,&NFS_I(inode)->layout->plh_segs, >> pls_list) { >> > > >> if (lseg->pls_range.iomode == IOMODE_RW&& >> > > >> - test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags)) >> > > >> + test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg- >> >pls_flags)&& >> > > >> + list_empty(&lseg->pls_lc_list)) >> > > >> list_add(&lseg->pls_lc_list, listp); >> > > >> } >> > > >> } >> > > > >> > > > If the lseg is already part of one layoutcommit, but we're >> sending a >> > > > second one for the same range (presumably because we wrote more >> data in >> > > > the same region), then the above causes the lseg to be excluded. >> > > >> > > >> > > Yes, lseg is excluded, This patch does exactly only exclusion of >> lseg. >> > > lseg is used here only to get/put reference on this lseg, so >> skipping is >> > > correct. >> > >> > Are you sure? As far as I can see, pnfs_list_write_lseg() actually >> > assigns the lseg to a particular layoutcommit call. >> > >> > My questions are: if I write to an area described by that lseg >> _after_ >> > it has been assigned to that layoutcommit RPC call (and the latter >> has >> > already been dispatched to the metadata server), then >> > A. shouldn't it be assigned to a second layoutcommit call too? >> > B. If not, what are we doing to ensure mutual exclusion between >> > layoutcommit RPC calls and pNFS writes to the data server? >> I think it depends on the purpose of layoutcommit. >> 1. for updating atime/mtime. In this case, we don't need mutual >> exclusion > > Agreed. > >> between layoutcommit RPC and pNFS writes to data server. >> 2. for updating LD specific state. In this case, LD should decide what >> to commit >> (and actually ignores the range of lseg). Take block layout for >> example. It maintains >> blocksized extent state inside LD and determines what to encode inside >> layoutcommit >> RPC whenever there is a generic layer layoutcommit call. So when an >> extent is being >> layoutcommitted, client can still write to the same range, as long as >> the lseg is held by client. > > Yes, but as far as I can see, even in the blocks case there can be multiple extents per layout segment. What if I write to one uninitialised extent, layoutcommit, then write to another uninitialized extent in the same layout segment and layoutcommit? In my reading of the code, there is a chance that the second layoutcommit will fail to pick up the layout segment, and so will fail to notify the MDS that the second extent now contains data. blocklayout does not decide what to layoutcommit according to the lseg list. Instead, it keeps track of each extent's state at the granularity of blocksize, and encode whatever needs layoutcommitted in the layoutcommit call. So in your above case, as long as the second layoutcommit is issued, blocklayout will encode the newly written extent in the second layoutcommit call, even if the lseg is not attached to the second layoutcommit. But that leads to another two question: 1. How do we protect against layoutrecall if lseg is not linked to layoutcommit? For this one, can we just reject layoutrecall if there is inflight layoutcommit? It will be less parallel but can guarantee current implementation correctness. 2. blocklayout ONLY: bl_committing may be overloaded by several layoutcommit calls and we don't have information in cleanup_layoutcommit() on how many entry should be removed from bl_committing. Maybe we can add a (void*) to struct nfs4_layoutcommit_data, so that LD can pass some private information between encode_layoutcommit() and cleanup_layoutcommit()? Thanks, Tao ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-08 15:00 ` Peng Tao @ 2011-09-08 17:05 ` Myklebust, Trond 2011-09-09 3:11 ` tao.peng 0 siblings, 1 reply; 29+ messages in thread From: Myklebust, Trond @ 2011-09-08 17:05 UTC (permalink / raw) To: Peng Tao; +Cc: tao.peng, gusev.vitaliy, gusev.vitaliy, linux-nfs PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBQZW5nIFRhbyBbbWFpbHRvOmJl cmd3b2xmQGdtYWlsLmNvbV0NCj4gU2VudDogVGh1cnNkYXksIFNlcHRlbWJlciAwOCwgMjAxMSAx MTowMCBBTQ0KPiBUbzogTXlrbGVidXN0LCBUcm9uZA0KPiBDYzogdGFvLnBlbmdAZW1jLmNvbTsg Z3VzZXYudml0YWxpeUBuZXhlbnRhLmNvbTsNCj4gZ3VzZXYudml0YWxpeUBnbWFpbC5jb207IGxp bnV4LW5mc0B2Z2VyLmtlcm5lbC5vcmcNCj4gU3ViamVjdDogUmU6IFtQQVRDSF0gbmZzOiBmaXgg aW5pZmluaXRlIGxvb3AgYXQNCj4gbmZzNF9sYXlvdXRjb21taXRfcmVsZWFzZQ0KPiANCj4gT24g VGh1LCBTZXAgOCwgMjAxMSBhdCA4OjAxIFBNLCBNeWtsZWJ1c3QsIFRyb25kDQo+IDxUcm9uZC5N eWtsZWJ1c3RAbmV0YXBwLmNvbT4gd3JvdGU6DQo+ID4+IC0tLS0tT3JpZ2luYWwgTWVzc2FnZS0t LS0tDQo+DQo+ID4NCj4gPiBZZXMsIGJ1dCBhcyBmYXIgYXMgSSBjYW4gc2VlLCBldmVuIGluIHRo ZSBibG9ja3MgY2FzZSB0aGVyZSBjYW4gYmUNCj4gbXVsdGlwbGUgZXh0ZW50cyBwZXIgbGF5b3V0 IHNlZ21lbnQuIFdoYXQgaWYgSSB3cml0ZSB0byBvbmUNCj4gdW5pbml0aWFsaXNlZCBleHRlbnQs IGxheW91dGNvbW1pdCwgdGhlbiB3cml0ZSB0byBhbm90aGVyIHVuaW5pdGlhbGl6ZWQNCj4gZXh0 ZW50IGluIHRoZSBzYW1lIGxheW91dCBzZWdtZW50IGFuZCBsYXlvdXRjb21taXQ/IEluIG15IHJl YWRpbmcgb2YNCj4gdGhlIGNvZGUsIHRoZXJlIGlzIGEgY2hhbmNlIHRoYXQgdGhlIHNlY29uZCBs YXlvdXRjb21taXQgd2lsbCBmYWlsIHRvDQo+IHBpY2sgdXAgdGhlIGxheW91dCBzZWdtZW50LCBh bmQgc28gd2lsbCBmYWlsIHRvIG5vdGlmeSB0aGUgTURTIHRoYXQgdGhlDQo+IHNlY29uZCBleHRl bnQgbm93IGNvbnRhaW5zIGRhdGEuDQo+IA0KPiBibG9ja2xheW91dCBkb2VzIG5vdCBkZWNpZGUg d2hhdCB0byBsYXlvdXRjb21taXQgYWNjb3JkaW5nIHRvIHRoZSBsc2VnDQo+IGxpc3QuIEluc3Rl YWQsIGl0IGtlZXBzIHRyYWNrIG9mIGVhY2ggZXh0ZW50J3Mgc3RhdGUgYXQgdGhlDQo+IGdyYW51 bGFyaXR5IG9mIGJsb2Nrc2l6ZSwgYW5kIGVuY29kZSB3aGF0ZXZlciBuZWVkcyBsYXlvdXRjb21t aXR0ZWQgaW4NCj4gdGhlIGxheW91dGNvbW1pdCBjYWxsLiBTbyBpbiB5b3VyIGFib3ZlIGNhc2Us IGFzIGxvbmcgYXMgdGhlIHNlY29uZA0KPiBsYXlvdXRjb21taXQgaXMgaXNzdWVkLCBibG9ja2xh eW91dCB3aWxsIGVuY29kZSB0aGUgbmV3bHkgd3JpdHRlbg0KPiBleHRlbnQgaW4gdGhlIHNlY29u ZCBsYXlvdXRjb21taXQgY2FsbCwgZXZlbiBpZiB0aGUgbHNlZyBpcyBub3QNCj4gYXR0YWNoZWQg dG8gdGhlIHNlY29uZCBsYXlvdXRjb21taXQuDQo+IA0KPiBCdXQgdGhhdCBsZWFkcyB0byBhbm90 aGVyIHR3byBxdWVzdGlvbjoNCj4gMS4gSG93IGRvIHdlIHByb3RlY3QgYWdhaW5zdCBsYXlvdXRy ZWNhbGwgaWYgbHNlZyBpcyBub3QgbGlua2VkIHRvDQo+IGxheW91dGNvbW1pdD8gRm9yIHRoaXMg b25lLCBjYW4gd2UganVzdCByZWplY3QgbGF5b3V0cmVjYWxsIGlmIHRoZXJlDQo+IGlzIGluZmxp Z2h0IGxheW91dGNvbW1pdD8gSXQgd2lsbCBiZSBsZXNzIHBhcmFsbGVsIGJ1dCBjYW4gZ3VhcmFu dGVlDQo+IGN1cnJlbnQgaW1wbGVtZW50YXRpb24gY29ycmVjdG5lc3MuDQo+IDIuIGJsb2NrbGF5 b3V0IE9OTFk6IGJsX2NvbW1pdHRpbmcgbWF5IGJlIG92ZXJsb2FkZWQgYnkgc2V2ZXJhbA0KPiBs YXlvdXRjb21taXQgY2FsbHMgYW5kIHdlIGRvbid0IGhhdmUgaW5mb3JtYXRpb24gaW4NCj4gY2xl YW51cF9sYXlvdXRjb21taXQoKSBvbiBob3cgbWFueSBlbnRyeSBzaG91bGQgYmUgcmVtb3ZlZCBm cm9tDQo+IGJsX2NvbW1pdHRpbmcuIE1heWJlIHdlIGNhbiBhZGQgYSAodm9pZCopIHRvIHN0cnVj dA0KPiBuZnM0X2xheW91dGNvbW1pdF9kYXRhLCBzbyB0aGF0IExEIGNhbiBwYXNzIHNvbWUgcHJp dmF0ZSBpbmZvcm1hdGlvbg0KPiBiZXR3ZWVuIGVuY29kZV9sYXlvdXRjb21taXQoKSBhbmQgY2xl YW51cF9sYXlvdXRjb21taXQoKT8NCg0KMy4gV2hhdCBpcyB0aGUgcHVycG9zZSBvZiBwaW5uaW5n IHRoZSBsYXlvdXQgc2VnbWVudCBhdCBhbGwgaWYgbmVpdGhlciBibG9ja3MsIG5vciBvYmplY3Rz IG5vciBmaWxlcyBjYXJlcz8NCklPVzogaWYgYm90aCBvYmplY3RzIGFuZCBibG9ja3MgdHJhY2sg dGhlIGluZm9ybWF0aW9uIHRoYXQgdGhleSBuZWVkIGZvciBsYXlvdXRjb21taXQgb3V0c2lkZSB0 aGUgbGF5b3V0IHNlZ21lbnRzLCB0aGVuIHdoeSBkbyB3ZSBuZWVkIHRoZSBORlNfTFNFR19MQVlP VVRDT01NSVQgYml0IGFuZCBwbHNfbGNfbGlzdCBhdCBhbGw/DQoNCkNoZWVycw0KICBUcm9uZA0K DQoT77+977+97Lm7HO+/vSbvv71+77+9Ju+/vRjvv73vv70rLe+/ve+/vd22F++/ve+/vXfvv73v v73Lm++/ve+/ve+/vW3vv71i77+977+9Z37Ip++/vRfvv73vv73cqH3vv73vv73vv73GoHrvv70m ajordu+/ve+/ve+/vQfvv73vv73vv73vv716Wivvv73vv70rembvv73vv73vv71o77+977+977+9 fu+/ve+/ve+/ve+/vWnvv73vv73vv71677+9Hu+/vXfvv73vv73vv70/77+977+977+977+9Ju+/ vSnfohtm ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-08 17:05 ` Myklebust, Trond @ 2011-09-09 3:11 ` tao.peng 2011-09-09 18:20 ` Trond Myklebust 0 siblings, 1 reply; 29+ messages in thread From: tao.peng @ 2011-09-09 3:11 UTC (permalink / raw) To: Trond.Myklebust; +Cc: gusev.vitaliy, gusev.vitaliy, linux-nfs, bergwolf SEksIFRyb25kLA0KDQo+IC0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0tDQo+IEZyb206IE15a2xl YnVzdCwgVHJvbmQgW21haWx0bzpUcm9uZC5NeWtsZWJ1c3RAbmV0YXBwLmNvbV0NCj4gU2VudDog RnJpZGF5LCBTZXB0ZW1iZXIgMDksIDIwMTEgMTowNSBBTQ0KPiBUbzogUGVuZyBUYW8NCj4gQ2M6 IFBlbmcsIFRhbzsgZ3VzZXYudml0YWxpeUBuZXhlbnRhLmNvbTsgZ3VzZXYudml0YWxpeUBnbWFp bC5jb207DQo+IGxpbnV4LW5mc0B2Z2VyLmtlcm5lbC5vcmcNCj4gU3ViamVjdDogUkU6IFtQQVRD SF0gbmZzOiBmaXggaW5pZmluaXRlIGxvb3AgYXQgbmZzNF9sYXlvdXRjb21taXRfcmVsZWFzZQ0K PiANCj4gPiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiA+IEZyb206IFBlbmcgVGFvIFtt YWlsdG86YmVyZ3dvbGZAZ21haWwuY29tXQ0KPiA+IFNlbnQ6IFRodXJzZGF5LCBTZXB0ZW1iZXIg MDgsIDIwMTEgMTE6MDAgQU0NCj4gPiBUbzogTXlrbGVidXN0LCBUcm9uZA0KPiA+IENjOiB0YW8u cGVuZ0BlbWMuY29tOyBndXNldi52aXRhbGl5QG5leGVudGEuY29tOw0KPiA+IGd1c2V2LnZpdGFs aXlAZ21haWwuY29tOyBsaW51eC1uZnNAdmdlci5rZXJuZWwub3JnDQo+ID4gU3ViamVjdDogUmU6 IFtQQVRDSF0gbmZzOiBmaXggaW5pZmluaXRlIGxvb3AgYXQNCj4gPiBuZnM0X2xheW91dGNvbW1p dF9yZWxlYXNlDQo+ID4NCj4gPiBPbiBUaHUsIFNlcCA4LCAyMDExIGF0IDg6MDEgUE0sIE15a2xl YnVzdCwgVHJvbmQNCj4gPiA8VHJvbmQuTXlrbGVidXN0QG5ldGFwcC5jb20+IHdyb3RlOg0KPiA+ ID4+IC0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0tDQo+ID4NCj4gPiA+DQo+ID4gPiBZZXMsIGJ1 dCBhcyBmYXIgYXMgSSBjYW4gc2VlLCBldmVuIGluIHRoZSBibG9ja3MgY2FzZSB0aGVyZSBjYW4g YmUNCj4gPiBtdWx0aXBsZSBleHRlbnRzIHBlciBsYXlvdXQgc2VnbWVudC4gV2hhdCBpZiBJIHdy aXRlIHRvIG9uZQ0KPiA+IHVuaW5pdGlhbGlzZWQgZXh0ZW50LCBsYXlvdXRjb21taXQsIHRoZW4g d3JpdGUgdG8gYW5vdGhlciB1bmluaXRpYWxpemVkDQo+ID4gZXh0ZW50IGluIHRoZSBzYW1lIGxh eW91dCBzZWdtZW50IGFuZCBsYXlvdXRjb21taXQ/IEluIG15IHJlYWRpbmcgb2YNCj4gPiB0aGUg Y29kZSwgdGhlcmUgaXMgYSBjaGFuY2UgdGhhdCB0aGUgc2Vjb25kIGxheW91dGNvbW1pdCB3aWxs IGZhaWwgdG8NCj4gPiBwaWNrIHVwIHRoZSBsYXlvdXQgc2VnbWVudCwgYW5kIHNvIHdpbGwgZmFp bCB0byBub3RpZnkgdGhlIE1EUyB0aGF0IHRoZQ0KPiA+IHNlY29uZCBleHRlbnQgbm93IGNvbnRh aW5zIGRhdGEuDQo+ID4NCj4gPiBibG9ja2xheW91dCBkb2VzIG5vdCBkZWNpZGUgd2hhdCB0byBs YXlvdXRjb21taXQgYWNjb3JkaW5nIHRvIHRoZSBsc2VnDQo+ID4gbGlzdC4gSW5zdGVhZCwgaXQg a2VlcHMgdHJhY2sgb2YgZWFjaCBleHRlbnQncyBzdGF0ZSBhdCB0aGUNCj4gPiBncmFudWxhcml0 eSBvZiBibG9ja3NpemUsIGFuZCBlbmNvZGUgd2hhdGV2ZXIgbmVlZHMgbGF5b3V0Y29tbWl0dGVk IGluDQo+ID4gdGhlIGxheW91dGNvbW1pdCBjYWxsLiBTbyBpbiB5b3VyIGFib3ZlIGNhc2UsIGFz IGxvbmcgYXMgdGhlIHNlY29uZA0KPiA+IGxheW91dGNvbW1pdCBpcyBpc3N1ZWQsIGJsb2NrbGF5 b3V0IHdpbGwgZW5jb2RlIHRoZSBuZXdseSB3cml0dGVuDQo+ID4gZXh0ZW50IGluIHRoZSBzZWNv bmQgbGF5b3V0Y29tbWl0IGNhbGwsIGV2ZW4gaWYgdGhlIGxzZWcgaXMgbm90DQo+ID4gYXR0YWNo ZWQgdG8gdGhlIHNlY29uZCBsYXlvdXRjb21taXQuDQo+ID4NCj4gPiBCdXQgdGhhdCBsZWFkcyB0 byBhbm90aGVyIHR3byBxdWVzdGlvbjoNCj4gPiAxLiBIb3cgZG8gd2UgcHJvdGVjdCBhZ2FpbnN0 IGxheW91dHJlY2FsbCBpZiBsc2VnIGlzIG5vdCBsaW5rZWQgdG8NCj4gPiBsYXlvdXRjb21taXQ/ IEZvciB0aGlzIG9uZSwgY2FuIHdlIGp1c3QgcmVqZWN0IGxheW91dHJlY2FsbCBpZiB0aGVyZQ0K PiA+IGlzIGluZmxpZ2h0IGxheW91dGNvbW1pdD8gSXQgd2lsbCBiZSBsZXNzIHBhcmFsbGVsIGJ1 dCBjYW4gZ3VhcmFudGVlDQo+ID4gY3VycmVudCBpbXBsZW1lbnRhdGlvbiBjb3JyZWN0bmVzcy4N Cj4gPiAyLiBibG9ja2xheW91dCBPTkxZOiBibF9jb21taXR0aW5nIG1heSBiZSBvdmVybG9hZGVk IGJ5IHNldmVyYWwNCj4gPiBsYXlvdXRjb21taXQgY2FsbHMgYW5kIHdlIGRvbid0IGhhdmUgaW5m b3JtYXRpb24gaW4NCj4gPiBjbGVhbnVwX2xheW91dGNvbW1pdCgpIG9uIGhvdyBtYW55IGVudHJ5 IHNob3VsZCBiZSByZW1vdmVkIGZyb20NCj4gPiBibF9jb21taXR0aW5nLiBNYXliZSB3ZSBjYW4g YWRkIGEgKHZvaWQqKSB0byBzdHJ1Y3QNCj4gPiBuZnM0X2xheW91dGNvbW1pdF9kYXRhLCBzbyB0 aGF0IExEIGNhbiBwYXNzIHNvbWUgcHJpdmF0ZSBpbmZvcm1hdGlvbg0KPiA+IGJldHdlZW4gZW5j b2RlX2xheW91dGNvbW1pdCgpIGFuZCBjbGVhbnVwX2xheW91dGNvbW1pdCgpPw0KPiANCj4gMy4g V2hhdCBpcyB0aGUgcHVycG9zZSBvZiBwaW5uaW5nIHRoZSBsYXlvdXQgc2VnbWVudCBhdCBhbGwg aWYgbmVpdGhlciBibG9ja3MsIG5vcg0KPiBvYmplY3RzIG5vciBmaWxlcyBjYXJlcz8NCkkgYmVs aWV2ZSBpdCBpcyBmb3IgcHJvdGVjdGluZyBhZ2FpbnN0IGxheW91dHJlY2FsbC4gQnV0IHNpbmNl IHdlIGFyZSBzZXBlcmF0aW5nIGxzZWcgYW5kIExEIHNwZWNpZmljIGxheW91dCBpbmZvcm1hdGlv biBtYW5hZ2VtZW50LCBpdCBpcyBhY3R1YWxseSBub3Qgd29ya2luZyBhcyBleHBlY3RlZC4NCg0K PiBJT1c6IGlmIGJvdGggb2JqZWN0cyBhbmQgYmxvY2tzIHRyYWNrIHRoZSBpbmZvcm1hdGlvbiB0 aGF0IHRoZXkgbmVlZCBmb3INCj4gbGF5b3V0Y29tbWl0IG91dHNpZGUgdGhlIGxheW91dCBzZWdt ZW50cywgdGhlbiB3aHkgZG8gd2UgbmVlZCB0aGUNCj4gTkZTX0xTRUdfTEFZT1VUQ09NTUlUIGJp dCBhbmQgcGxzX2xjX2xpc3QgYXQgYWxsPw0KSWYgd2UgcmVtb3ZlIE5GU19MU0VHX0xBWU9VVENP TU1JVCBiaXQgYW5kIHBsc19sY19saXN0LCB3ZSBtdXN0IGZpbmQgb3RoZXIgbWV0aG9kIHRvIHBy b3RlY3QgYWdhaW5zIGZyZWVpbmcgbHNlZyB3aGlsZSBsYXlvdXRjb21taXQgaXMgbmVlZGVkIG9y IGlzIGdvaW5nIG9uLiBlLmcuLCBjaGVjayBmb3IgTkZTX0lOT19MQVlPVVRDT01NSVQgYml0IGFu ZCBpbmZsaWdodCBsYXlvdXRjb21taXQgaW4gaW5pdGlhdGVfZmlsZV9kcmFpbmluZygpLg0KDQpU aGFua3MsDQpUYW8NCg0KDQoT77+977+97Lm7HO+/vSbvv71+77+9Ju+/vRjvv73vv70rLe+/ve+/ vd22F++/ve+/vXfvv73vv73Lm++/ve+/ve+/vW3vv71i77+977+9Z37Ip++/vRfvv73vv73cqH3v v73vv73vv73GoHrvv70majordu+/ve+/ve+/vQfvv73vv73vv73vv716Wivvv73vv70rembvv73v v73vv71o77+977+977+9fu+/ve+/ve+/ve+/vWnvv73vv73vv71677+9Hu+/vXfvv73vv73vv70/ 77+977+977+977+9Ju+/vSnfohtm ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-09 3:11 ` tao.peng @ 2011-09-09 18:20 ` Trond Myklebust 2011-09-10 7:14 ` Benny Halevy 2011-09-12 14:48 ` Peng Tao 0 siblings, 2 replies; 29+ messages in thread From: Trond Myklebust @ 2011-09-09 18:20 UTC (permalink / raw) To: tao.peng; +Cc: gusev.vitaliy, gusev.vitaliy, linux-nfs, bergwolf On Thu, 2011-09-08 at 23:11 -0400, tao.peng@emc.com wrote: > HI, Trond, > > > -----Original Message----- > > From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com] > > Sent: Friday, September 09, 2011 1:05 AM > > To: Peng Tao > > Cc: Peng, Tao; gusev.vitaliy@nexenta.com; gusev.vitaliy@gmail.com; > > linux-nfs@vger.kernel.org > > Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release > > > > > -----Original Message----- > > > From: Peng Tao [mailto:bergwolf@gmail.com] > > > Sent: Thursday, September 08, 2011 11:00 AM > > > To: Myklebust, Trond > > > Cc: tao.peng@emc.com; gusev.vitaliy@nexenta.com; > > > gusev.vitaliy@gmail.com; linux-nfs@vger.kernel.org > > > Subject: Re: [PATCH] nfs: fix inifinite loop at > > > nfs4_layoutcommit_release > > > > > > On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond > > > <Trond.Myklebust@netapp.com> wrote: > > > >> -----Original Message----- > > > > > > > > > > > Yes, but as far as I can see, even in the blocks case there can be > > > multiple extents per layout segment. What if I write to one > > > uninitialised extent, layoutcommit, then write to another uninitialized > > > extent in the same layout segment and layoutcommit? In my reading of > > > the code, there is a chance that the second layoutcommit will fail to > > > pick up the layout segment, and so will fail to notify the MDS that the > > > second extent now contains data. > > > > > > blocklayout does not decide what to layoutcommit according to the lseg > > > list. Instead, it keeps track of each extent's state at the > > > granularity of blocksize, and encode whatever needs layoutcommitted in > > > the layoutcommit call. So in your above case, as long as the second > > > layoutcommit is issued, blocklayout will encode the newly written > > > extent in the second layoutcommit call, even if the lseg is not > > > attached to the second layoutcommit. > > > > > > But that leads to another two question: > > > 1. How do we protect against layoutrecall if lseg is not linked to > > > layoutcommit? For this one, can we just reject layoutrecall if there > > > is inflight layoutcommit? It will be less parallel but can guarantee > > > current implementation correctness. > > > 2. blocklayout ONLY: bl_committing may be overloaded by several > > > layoutcommit calls and we don't have information in > > > cleanup_layoutcommit() on how many entry should be removed from > > > bl_committing. Maybe we can add a (void*) to struct > > > nfs4_layoutcommit_data, so that LD can pass some private information > > > between encode_layoutcommit() and cleanup_layoutcommit()? > > > > 3. What is the purpose of pinning the layout segment at all if neither blocks, nor > > objects nor files cares? > I believe it is for protecting against layoutrecall. But since we are seperating lseg and LD specific layout information management, it is actually not working as expected. > > > IOW: if both objects and blocks track the information that they need for > > layoutcommit outside the layout segments, then why do we need the > > NFS_LSEG_LAYOUTCOMMIT bit and pls_lc_list at all? > If we remove NFS_LSEG_LAYOUTCOMMIT bit and pls_lc_list, we must find other method to protect agains freeing lseg while layoutcommit is needed or is going on. e.g., check for NFS_INO_LAYOUTCOMMIT bit and inflight layoutcommit in initiate_file_draining(). The easiest solution is to ensure we have only _one_ LAYOUTCOMMIT on the wire at a time. Otherwise, you are looking at a many-to-many mapping between layoutcommits and lsegs. We should not expect to need multiple layoutcommits on the wire if pNFS is working smoothly (i.e. no layout recalls), so optimising for that case is wrong. I'd also think that we want to order layoutcommit and ordinary commits (for those NFS files servers that require both) so that we don't end up writing a file size change to disk before the actual data is committed. So why not just protect the layoutcommit call using the existing nfs_commit_set_lock/nfs_commit_clear_lock? -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-09 18:20 ` Trond Myklebust @ 2011-09-10 7:14 ` Benny Halevy 2011-09-12 14:56 ` Peng Tao 2011-09-12 14:48 ` Peng Tao 1 sibling, 1 reply; 29+ messages in thread From: Benny Halevy @ 2011-09-10 7:14 UTC (permalink / raw) To: Trond Myklebust Cc: tao.peng, gusev.vitaliy, gusev.vitaliy, linux-nfs, bergwolf On 2011-09-09 11:20, Trond Myklebust wrote: > On Thu, 2011-09-08 at 23:11 -0400, tao.peng@emc.com wrote: >> HI, Trond, >> >>> -----Original Message----- >>> From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com] >>> Sent: Friday, September 09, 2011 1:05 AM >>> To: Peng Tao >>> Cc: Peng, Tao; gusev.vitaliy@nexenta.com; gusev.vitaliy@gmail.com; >>> linux-nfs@vger.kernel.org >>> Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release >>> >>>> -----Original Message----- >>>> From: Peng Tao [mailto:bergwolf@gmail.com] >>>> Sent: Thursday, September 08, 2011 11:00 AM >>>> To: Myklebust, Trond >>>> Cc: tao.peng@emc.com; gusev.vitaliy@nexenta.com; >>>> gusev.vitaliy@gmail.com; linux-nfs@vger.kernel.org >>>> Subject: Re: [PATCH] nfs: fix inifinite loop at >>>> nfs4_layoutcommit_release >>>> >>>> On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond >>>> <Trond.Myklebust@netapp.com> wrote: >>>>>> -----Original Message----- >>>> >>>>> >>>>> Yes, but as far as I can see, even in the blocks case there can be >>>> multiple extents per layout segment. What if I write to one >>>> uninitialised extent, layoutcommit, then write to another uninitialized >>>> extent in the same layout segment and layoutcommit? In my reading of >>>> the code, there is a chance that the second layoutcommit will fail to >>>> pick up the layout segment, and so will fail to notify the MDS that the >>>> second extent now contains data. >>>> >>>> blocklayout does not decide what to layoutcommit according to the lseg >>>> list. Instead, it keeps track of each extent's state at the >>>> granularity of blocksize, and encode whatever needs layoutcommitted in >>>> the layoutcommit call. So in your above case, as long as the second >>>> layoutcommit is issued, blocklayout will encode the newly written >>>> extent in the second layoutcommit call, even if the lseg is not >>>> attached to the second layoutcommit. >>>> >>>> But that leads to another two question: >>>> 1. How do we protect against layoutrecall if lseg is not linked to >>>> layoutcommit? For this one, can we just reject layoutrecall if there >>>> is inflight layoutcommit? It will be less parallel but can guarantee >>>> current implementation correctness. >>>> 2. blocklayout ONLY: bl_committing may be overloaded by several >>>> layoutcommit calls and we don't have information in >>>> cleanup_layoutcommit() on how many entry should be removed from >>>> bl_committing. Maybe we can add a (void*) to struct >>>> nfs4_layoutcommit_data, so that LD can pass some private information >>>> between encode_layoutcommit() and cleanup_layoutcommit()? >>> >>> 3. What is the purpose of pinning the layout segment at all if neither blocks, nor >>> objects nor files cares? >> I believe it is for protecting against layoutrecall. But since we are seperating lseg and LD specific layout information management, it is actually not working as expected. >> The layout segments are not really in use while in LAYOUTCOMMIT. We only need to get the stateid right with respect to concurrent layout recalls. >>> IOW: if both objects and blocks track the information that they need for >>> layoutcommit outside the layout segments, then why do we need the >>> NFS_LSEG_LAYOUTCOMMIT bit and pls_lc_list at all? >> If we remove NFS_LSEG_LAYOUTCOMMIT bit and pls_lc_list, we must find other method to protect agains freeing lseg while layoutcommit is needed or is going on. e.g., check for NFS_INO_LAYOUTCOMMIT bit and inflight layoutcommit in initiate_file_draining(). > > The easiest solution is to ensure we have only _one_ LAYOUTCOMMIT on the > wire at a time. Otherwise, you are looking at a many-to-many mapping > between layoutcommits and lsegs. > > We should not expect to need multiple layoutcommits on the wire if pNFS > is working smoothly (i.e. no layout recalls), so optimising for that > case is wrong. > I'd also think that we want to order layoutcommit and ordinary commits > (for those NFS files servers that require both) so that we don't end up > writing a file size change to disk before the actual data is committed. > > So why not just protect the layoutcommit call using the existing > nfs_commit_set_lock/nfs_commit_clear_lock? > Sounds good to me, Benny ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-10 7:14 ` Benny Halevy @ 2011-09-12 14:56 ` Peng Tao 2011-09-12 20:31 ` Benny Halevy 0 siblings, 1 reply; 29+ messages in thread From: Peng Tao @ 2011-09-12 14:56 UTC (permalink / raw) To: Benny Halevy Cc: Trond Myklebust, tao.peng, gusev.vitaliy, gusev.vitaliy, linux-nfs On Sat, Sep 10, 2011 at 3:14 PM, Benny Halevy <bhalevy@tonian.com> wrote: > On 2011-09-09 11:20, Trond Myklebust wrote: >> On Thu, 2011-09-08 at 23:11 -0400, tao.peng@emc.com wrote: >>> HI, Trond, >>> >>>> -----Original Message----- >>>> From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com] >>>> Sent: Friday, September 09, 2011 1:05 AM >>>> To: Peng Tao >>>> Cc: Peng, Tao; gusev.vitaliy@nexenta.com; gusev.vitaliy@gmail.com; >>>> linux-nfs@vger.kernel.org >>>> Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release >>>> >>>>> -----Original Message----- >>>>> From: Peng Tao [mailto:bergwolf@gmail.com] >>>>> Sent: Thursday, September 08, 2011 11:00 AM >>>>> To: Myklebust, Trond >>>>> Cc: tao.peng@emc.com; gusev.vitaliy@nexenta.com; >>>>> gusev.vitaliy@gmail.com; linux-nfs@vger.kernel.org >>>>> Subject: Re: [PATCH] nfs: fix inifinite loop at >>>>> nfs4_layoutcommit_release >>>>> >>>>> On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond >>>>> <Trond.Myklebust@netapp.com> wrote: >>>>>>> -----Original Message----- >>>>> >>>>>> >>>>>> Yes, but as far as I can see, even in the blocks case there can be >>>>> multiple extents per layout segment. What if I write to one >>>>> uninitialised extent, layoutcommit, then write to another uninitialized >>>>> extent in the same layout segment and layoutcommit? In my reading of >>>>> the code, there is a chance that the second layoutcommit will fail to >>>>> pick up the layout segment, and so will fail to notify the MDS that the >>>>> second extent now contains data. >>>>> >>>>> blocklayout does not decide what to layoutcommit according to the lseg >>>>> list. Instead, it keeps track of each extent's state at the >>>>> granularity of blocksize, and encode whatever needs layoutcommitted in >>>>> the layoutcommit call. So in your above case, as long as the second >>>>> layoutcommit is issued, blocklayout will encode the newly written >>>>> extent in the second layoutcommit call, even if the lseg is not >>>>> attached to the second layoutcommit. >>>>> >>>>> But that leads to another two question: >>>>> 1. How do we protect against layoutrecall if lseg is not linked to >>>>> layoutcommit? For this one, can we just reject layoutrecall if there >>>>> is inflight layoutcommit? It will be less parallel but can guarantee >>>>> current implementation correctness. >>>>> 2. blocklayout ONLY: bl_committing may be overloaded by several >>>>> layoutcommit calls and we don't have information in >>>>> cleanup_layoutcommit() on how many entry should be removed from >>>>> bl_committing. Maybe we can add a (void*) to struct >>>>> nfs4_layoutcommit_data, so that LD can pass some private information >>>>> between encode_layoutcommit() and cleanup_layoutcommit()? >>>> >>>> 3. What is the purpose of pinning the layout segment at all if neither blocks, nor >>>> objects nor files cares? >>> I believe it is for protecting against layoutrecall. But since we are seperating lseg and LD specific layout information management, it is actually not working as expected. >>> > > The layout segments are not really in use while in LAYOUTCOMMIT. > We only need to get the stateid right with respect to concurrent layout recalls. LAYOUTCOMMIT takes lseg reference to mark them as in use so that layoutrecall cannot free them. -- Thanks, Tao ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-12 14:56 ` Peng Tao @ 2011-09-12 20:31 ` Benny Halevy 2011-09-12 21:10 ` Trond Myklebust 2011-09-13 7:02 ` tao.peng 0 siblings, 2 replies; 29+ messages in thread From: Benny Halevy @ 2011-09-12 20:31 UTC (permalink / raw) To: Peng Tao; +Cc: Trond Myklebust, tao.peng, gusev.vitaliy, gusev.vitaliy, linux-nfs On 2011-09-12 07:56, Peng Tao wrote: > On Sat, Sep 10, 2011 at 3:14 PM, Benny Halevy <bhalevy@tonian.com> wrote: >> On 2011-09-09 11:20, Trond Myklebust wrote: >>> On Thu, 2011-09-08 at 23:11 -0400, tao.peng@emc.com wrote: >>>> HI, Trond, >>>> >>>>> -----Original Message----- >>>>> From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com] >>>>> Sent: Friday, September 09, 2011 1:05 AM >>>>> To: Peng Tao >>>>> Cc: Peng, Tao; gusev.vitaliy@nexenta.com; gusev.vitaliy@gmail.com; >>>>> linux-nfs@vger.kernel.org >>>>> Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release >>>>> >>>>>> -----Original Message----- >>>>>> From: Peng Tao [mailto:bergwolf@gmail.com] >>>>>> Sent: Thursday, September 08, 2011 11:00 AM >>>>>> To: Myklebust, Trond >>>>>> Cc: tao.peng@emc.com; gusev.vitaliy@nexenta.com; >>>>>> gusev.vitaliy@gmail.com; linux-nfs@vger.kernel.org >>>>>> Subject: Re: [PATCH] nfs: fix inifinite loop at >>>>>> nfs4_layoutcommit_release >>>>>> >>>>>> On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond >>>>>> <Trond.Myklebust@netapp.com> wrote: >>>>>>>> -----Original Message----- >>>>>> >>>>>>> >>>>>>> Yes, but as far as I can see, even in the blocks case there can be >>>>>> multiple extents per layout segment. What if I write to one >>>>>> uninitialised extent, layoutcommit, then write to another uninitialized >>>>>> extent in the same layout segment and layoutcommit? In my reading of >>>>>> the code, there is a chance that the second layoutcommit will fail to >>>>>> pick up the layout segment, and so will fail to notify the MDS that the >>>>>> second extent now contains data. >>>>>> >>>>>> blocklayout does not decide what to layoutcommit according to the lseg >>>>>> list. Instead, it keeps track of each extent's state at the >>>>>> granularity of blocksize, and encode whatever needs layoutcommitted in >>>>>> the layoutcommit call. So in your above case, as long as the second >>>>>> layoutcommit is issued, blocklayout will encode the newly written >>>>>> extent in the second layoutcommit call, even if the lseg is not >>>>>> attached to the second layoutcommit. >>>>>> >>>>>> But that leads to another two question: >>>>>> 1. How do we protect against layoutrecall if lseg is not linked to >>>>>> layoutcommit? For this one, can we just reject layoutrecall if there >>>>>> is inflight layoutcommit? It will be less parallel but can guarantee >>>>>> current implementation correctness. >>>>>> 2. blocklayout ONLY: bl_committing may be overloaded by several >>>>>> layoutcommit calls and we don't have information in >>>>>> cleanup_layoutcommit() on how many entry should be removed from >>>>>> bl_committing. Maybe we can add a (void*) to struct >>>>>> nfs4_layoutcommit_data, so that LD can pass some private information >>>>>> between encode_layoutcommit() and cleanup_layoutcommit()? >>>>> >>>>> 3. What is the purpose of pinning the layout segment at all if neither blocks, nor >>>>> objects nor files cares? >>>> I believe it is for protecting against layoutrecall. But since we are seperating lseg and LD specific layout information management, it is actually not working as expected. >>>> >> >> The layout segments are not really in use while in LAYOUTCOMMIT. >> We only need to get the stateid right with respect to concurrent layout recalls. > LAYOUTCOMMIT takes lseg reference to mark them as in use so that > layoutrecall cannot free them. > And if layoutrecall would have freed layout segments during layoutcommit, what is your specific concern? Benny ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-12 20:31 ` Benny Halevy @ 2011-09-12 21:10 ` Trond Myklebust 2011-09-13 7:50 ` Benny Halevy 2011-09-13 8:09 ` tao.peng 2011-09-13 7:02 ` tao.peng 1 sibling, 2 replies; 29+ messages in thread From: Trond Myklebust @ 2011-09-12 21:10 UTC (permalink / raw) To: Benny Halevy; +Cc: Peng Tao, tao.peng, gusev.vitaliy, gusev.vitaliy, linux-nfs On Mon, 2011-09-12 at 13:31 -0700, Benny Halevy wrote: > On 2011-09-12 07:56, Peng Tao wrote: > >> The layout segments are not really in use while in LAYOUTCOMMIT. > >> We only need to get the stateid right with respect to concurrent layout recalls. > > LAYOUTCOMMIT takes lseg reference to mark them as in use so that > > layoutrecall cannot free them. > > > > And if layoutrecall would have freed layout segments during layoutcommit, > what is your specific concern? That layoutcommit is supposed to return NFS4ERR_BAD_LAYOUT in that case according to section 18.42.3 of RFC5661. I can't find anything in the errata that changes that requirement. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-12 21:10 ` Trond Myklebust @ 2011-09-13 7:50 ` Benny Halevy 2011-09-13 8:32 ` tao.peng 2011-09-13 8:09 ` tao.peng 1 sibling, 1 reply; 29+ messages in thread From: Benny Halevy @ 2011-09-13 7:50 UTC (permalink / raw) To: Trond Myklebust Cc: Peng Tao, tao.peng, gusev.vitaliy, gusev.vitaliy, linux-nfs On 2011-09-12 14:10, Trond Myklebust wrote: > On Mon, 2011-09-12 at 13:31 -0700, Benny Halevy wrote: >> On 2011-09-12 07:56, Peng Tao wrote: >>>> The layout segments are not really in use while in LAYOUTCOMMIT. >>>> We only need to get the stateid right with respect to concurrent layout recalls. >>> LAYOUTCOMMIT takes lseg reference to mark them as in use so that >>> layoutrecall cannot free them. >>> >> >> And if layoutrecall would have freed layout segments during layoutcommit, >> what is your specific concern? > > That layoutcommit is supposed to return NFS4ERR_BAD_LAYOUT in that case > according to section 18.42.3 of RFC5661. I can't find anything in the > errata that changes that requirement. > Right. That tells me there no need to strictly serialize LAYOUTCOMMITs with CB_LAYOUTRECALL, as long as the layout stateid sent with LAYOUTCOMMIT atomically represents the state when the operation was prepared. That said, since we do want the LAYOUTCOMMIT to succeed, it would be beneficial for the client to reply to a CB_LAYOUTRECALL received while a conflicting LAYOUTCOMMIT is in progress with NFS4ERR_DELAY. The server, on its side, should prevent a distributed deadlock by avoiding blocking of a LAYOUTCOMMIT on an outstanding CB_LAYOUTRECALL for the same client that sent the LAYOUTCOMMIT. I'm not sure what error would be best to return. Maybe NFS4ERR_RECALL_CONFLICT if it would be allowed (it isn't listed for LAYOUTCOMMIT at the moment). Just returning NFS4ER_DELAY might lead to a live lock situation where neither the LAYOUTCOMMIT not the CB_LAYOUTRECALL complete. Benny ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-13 7:50 ` Benny Halevy @ 2011-09-13 8:32 ` tao.peng 2011-09-14 6:43 ` Benny Halevy 0 siblings, 1 reply; 29+ messages in thread From: tao.peng @ 2011-09-13 8:32 UTC (permalink / raw) To: bhalevy, Trond.Myklebust Cc: bergwolf, gusev.vitaliy, gusev.vitaliy, linux-nfs > -----Original Message----- > From: Benny Halevy [mailto:bhalevy@tonian.com] > Sent: Tuesday, September 13, 2011 3:51 PM > To: Trond Myklebust > Cc: Peng Tao; Peng, Tao; gusev.vitaliy@nexenta.com; gusev.vitaliy@gmail.com; > linux-nfs@vger.kernel.org > Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release > > On 2011-09-12 14:10, Trond Myklebust wrote: > > On Mon, 2011-09-12 at 13:31 -0700, Benny Halevy wrote: > >> On 2011-09-12 07:56, Peng Tao wrote: > >>>> The layout segments are not really in use while in LAYOUTCOMMIT. > >>>> We only need to get the stateid right with respect to concurrent layout recalls. > >>> LAYOUTCOMMIT takes lseg reference to mark them as in use so that > >>> layoutrecall cannot free them. > >>> > >> > >> And if layoutrecall would have freed layout segments during layoutcommit, > >> what is your specific concern? > > > > That layoutcommit is supposed to return NFS4ERR_BAD_LAYOUT in that case > > according to section 18.42.3 of RFC5661. I can't find anything in the > > errata that changes that requirement. > > > > Right. That tells me there no need to strictly serialize LAYOUTCOMMITs > with CB_LAYOUTRECALL, as long as the layout stateid sent with LAYOUTCOMMIT > atomically represents the state when the operation was prepared. > > That said, since we do want the LAYOUTCOMMIT to succeed, it would be beneficial > for the client to reply to a CB_LAYOUTRECALL received while a conflicting > LAYOUTCOMMIT is in progress with NFS4ERR_DELAY. I agree. How about adding a new flag to nfsi->flags for this? We can use the same flag on to ensure serialization of multiple layoutcommit. nfs_commit_set_lock/nfs_commit_clear_lock may not fit for this. > > The server, on its side, should prevent a distributed deadlock by avoiding > blocking of a LAYOUTCOMMIT on an outstanding CB_LAYOUTRECALL for the same > client that sent the LAYOUTCOMMIT. I'm not sure what error would be best to > return. Maybe NFS4ERR_RECALL_CONFLICT if it would be allowed (it isn't listed > for LAYOUTCOMMIT at the moment). Just returning NFS4ER_DELAY might lead to > a live lock situation where neither the LAYOUTCOMMIT not the CB_LAYOUTRECALL > complete. > > Benny ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-13 8:32 ` tao.peng @ 2011-09-14 6:43 ` Benny Halevy 2011-09-14 7:53 ` tao.peng 0 siblings, 1 reply; 29+ messages in thread From: Benny Halevy @ 2011-09-14 6:43 UTC (permalink / raw) To: tao.peng; +Cc: Trond.Myklebust, bergwolf, gusev.vitaliy, gusev.vitaliy, linux-nfs On 2011-09-13 10:32, tao.peng@emc.com wrote: >> -----Original Message----- >> From: Benny Halevy [mailto:bhalevy@tonian.com] >> Sent: Tuesday, September 13, 2011 3:51 PM >> To: Trond Myklebust >> Cc: Peng Tao; Peng, Tao; gusev.vitaliy@nexenta.com; gusev.vitaliy@gmail.com; >> linux-nfs@vger.kernel.org >> Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release >> >> On 2011-09-12 14:10, Trond Myklebust wrote: >>> On Mon, 2011-09-12 at 13:31 -0700, Benny Halevy wrote: >>>> On 2011-09-12 07:56, Peng Tao wrote: >>>>>> The layout segments are not really in use while in LAYOUTCOMMIT. >>>>>> We only need to get the stateid right with respect to concurrent layout recalls. >>>>> LAYOUTCOMMIT takes lseg reference to mark them as in use so that >>>>> layoutrecall cannot free them. >>>>> >>>> >>>> And if layoutrecall would have freed layout segments during layoutcommit, >>>> what is your specific concern? >>> >>> That layoutcommit is supposed to return NFS4ERR_BAD_LAYOUT in that case >>> according to section 18.42.3 of RFC5661. I can't find anything in the >>> errata that changes that requirement. >>> >> >> Right. That tells me there no need to strictly serialize LAYOUTCOMMITs >> with CB_LAYOUTRECALL, as long as the layout stateid sent with LAYOUTCOMMIT >> atomically represents the state when the operation was prepared. >> >> That said, since we do want the LAYOUTCOMMIT to succeed, it would be beneficial >> for the client to reply to a CB_LAYOUTRECALL received while a conflicting >> LAYOUTCOMMIT is in progress with NFS4ERR_DELAY. > I agree. How about adding a new flag to nfsi->flags for this? We can use the same flag on to ensure serialization of multiple layoutcommit. nfs_commit_set_lock/nfs_commit_clear_lock may not fit for this. > Sounds good in principle. Can you take a stab at a patch that does this? Benny >> >> The server, on its side, should prevent a distributed deadlock by avoiding >> blocking of a LAYOUTCOMMIT on an outstanding CB_LAYOUTRECALL for the same >> client that sent the LAYOUTCOMMIT. I'm not sure what error would be best to >> return. Maybe NFS4ERR_RECALL_CONFLICT if it would be allowed (it isn't listed >> for LAYOUTCOMMIT at the moment). Just returning NFS4ER_DELAY might lead to >> a live lock situation where neither the LAYOUTCOMMIT not the CB_LAYOUTRECALL >> complete. >> >> Benny > ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-14 6:43 ` Benny Halevy @ 2011-09-14 7:53 ` tao.peng [not found] ` <F19688880B763E40B28B2B462677FBF805C3F7A911-AYrsSIZi/B2B3McK65YKY9BPR1lH4CV8@public.gmane.org> 0 siblings, 1 reply; 29+ messages in thread From: tao.peng @ 2011-09-14 7:53 UTC (permalink / raw) To: bhalevy; +Cc: Trond.Myklebust, bergwolf, gusev.vitaliy, gusev.vitaliy, linux-nfs > -----Original Message----- > From: Benny Halevy [mailto:bhalevy@tonian.com] > Sent: Wednesday, September 14, 2011 2:43 PM > To: Peng, Tao > Cc: Trond.Myklebust@netapp.com; bergwolf@gmail.com; > gusev.vitaliy@nexenta.com; gusev.vitaliy@gmail.com; linux-nfs@vger.kernel.org > Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release > > On 2011-09-13 10:32, tao.peng@emc.com wrote: > >> -----Original Message----- > >> From: Benny Halevy [mailto:bhalevy@tonian.com] > >> Sent: Tuesday, September 13, 2011 3:51 PM > >> To: Trond Myklebust > >> Cc: Peng Tao; Peng, Tao; gusev.vitaliy@nexenta.com; gusev.vitaliy@gmail.com; > >> linux-nfs@vger.kernel.org > >> Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release > >> > >> On 2011-09-12 14:10, Trond Myklebust wrote: > >>> On Mon, 2011-09-12 at 13:31 -0700, Benny Halevy wrote: > >>>> On 2011-09-12 07:56, Peng Tao wrote: > >>>>>> The layout segments are not really in use while in LAYOUTCOMMIT. > >>>>>> We only need to get the stateid right with respect to concurrent layout > recalls. > >>>>> LAYOUTCOMMIT takes lseg reference to mark them as in use so that > >>>>> layoutrecall cannot free them. > >>>>> > >>>> > >>>> And if layoutrecall would have freed layout segments during layoutcommit, > >>>> what is your specific concern? > >>> > >>> That layoutcommit is supposed to return NFS4ERR_BAD_LAYOUT in that case > >>> according to section 18.42.3 of RFC5661. I can't find anything in the > >>> errata that changes that requirement. > >>> > >> > >> Right. That tells me there no need to strictly serialize LAYOUTCOMMITs > >> with CB_LAYOUTRECALL, as long as the layout stateid sent with LAYOUTCOMMIT > >> atomically represents the state when the operation was prepared. > >> > >> That said, since we do want the LAYOUTCOMMIT to succeed, it would be > beneficial > >> for the client to reply to a CB_LAYOUTRECALL received while a conflicting > >> LAYOUTCOMMIT is in progress with NFS4ERR_DELAY. > > I agree. How about adding a new flag to nfsi->flags for this? We can use the same > flag on to ensure serialization of multiple layoutcommit. > nfs_commit_set_lock/nfs_commit_clear_lock may not fit for this. > > > > Sounds good in principle. > Can you take a stab at a patch that does this? OK, I will spend some time on it this week. Thanks, Tao > > Benny > > >> > >> The server, on its side, should prevent a distributed deadlock by avoiding > >> blocking of a LAYOUTCOMMIT on an outstanding CB_LAYOUTRECALL for the > same > >> client that sent the LAYOUTCOMMIT. I'm not sure what error would be best to > >> return. Maybe NFS4ERR_RECALL_CONFLICT if it would be allowed (it isn't listed > >> for LAYOUTCOMMIT at the moment). Just returning NFS4ER_DELAY might lead > to > >> a live lock situation where neither the LAYOUTCOMMIT not the > CB_LAYOUTRECALL > >> complete. > >> > >> Benny > > ^ permalink raw reply [flat|nested] 29+ messages in thread
[parent not found: <F19688880B763E40B28B2B462677FBF805C3F7A911-AYrsSIZi/B2B3McK65YKY9BPR1lH4CV8@public.gmane.org>]
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release [not found] ` <F19688880B763E40B28B2B462677FBF805C3F7A911-AYrsSIZi/B2B3McK65YKY9BPR1lH4CV8@public.gmane.org> @ 2011-09-14 13:20 ` Benny Halevy 0 siblings, 0 replies; 29+ messages in thread From: Benny Halevy @ 2011-09-14 13:20 UTC (permalink / raw) To: tao.peng; +Cc: Trond.Myklebust, bergwolf, gusev.vitaliy, gusev.vitaliy, linux-nfs VGhhbmtzIQ0KLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCkZyb206IDx0YW8ucGVuZ0BlbWMu Y29tPg0KRGF0ZTogV2VkLCAxNCBTZXAgMjAxMSAwMzo1MzozMCANClRvOiA8YmhhbGV2eUB0b25p YW4uY29tPg0KQ2M6IDxUcm9uZC5NeWtsZWJ1c3RAbmV0YXBwLmNvbT47IDxiZXJnd29sZkBnbWFp bC5jb20+OyA8Z3VzZXYudml0YWxpeUBuZXhlbnRhLmNvbT47IDxndXNldi52aXRhbGl5QGdtYWls LmNvbT47IDxsaW51eC1uZnNAdmdlci5rZXJuZWwub3JnPg0KU3ViamVjdDogUkU6IFtQQVRDSF0g bmZzOiBmaXggaW5pZmluaXRlIGxvb3AgYXQgbmZzNF9sYXlvdXRjb21taXRfcmVsZWFzZQ0KDQo+ IC0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0tDQo+IEZyb206IEJlbm55IEhhbGV2eSBbbWFpbHRv OmJoYWxldnlAdG9uaWFuLmNvbV0NCj4gU2VudDogV2VkbmVzZGF5LCBTZXB0ZW1iZXIgMTQsIDIw MTEgMjo0MyBQTQ0KPiBUbzogUGVuZywgVGFvDQo+IENjOiBUcm9uZC5NeWtsZWJ1c3RAbmV0YXBw LmNvbTsgYmVyZ3dvbGZAZ21haWwuY29tOw0KPiBndXNldi52aXRhbGl5QG5leGVudGEuY29tOyBn dXNldi52aXRhbGl5QGdtYWlsLmNvbTsgbGludXgtbmZzQHZnZXIua2VybmVsLm9yZw0KPiBTdWJq ZWN0OiBSZTogW1BBVENIXSBuZnM6IGZpeCBpbmlmaW5pdGUgbG9vcCBhdCBuZnM0X2xheW91dGNv bW1pdF9yZWxlYXNlDQo+IA0KPiBPbiAyMDExLTA5LTEzIDEwOjMyLCB0YW8ucGVuZ0BlbWMuY29t IHdyb3RlOg0KPiA+PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiA+PiBGcm9tOiBCZW5u eSBIYWxldnkgW21haWx0bzpiaGFsZXZ5QHRvbmlhbi5jb21dDQo+ID4+IFNlbnQ6IFR1ZXNkYXks IFNlcHRlbWJlciAxMywgMjAxMSAzOjUxIFBNDQo+ID4+IFRvOiBUcm9uZCBNeWtsZWJ1c3QNCj4g Pj4gQ2M6IFBlbmcgVGFvOyBQZW5nLCBUYW87IGd1c2V2LnZpdGFsaXlAbmV4ZW50YS5jb207IGd1 c2V2LnZpdGFsaXlAZ21haWwuY29tOw0KPiA+PiBsaW51eC1uZnNAdmdlci5rZXJuZWwub3JnDQo+ ID4+IFN1YmplY3Q6IFJlOiBbUEFUQ0hdIG5mczogZml4IGluaWZpbml0ZSBsb29wIGF0IG5mczRf bGF5b3V0Y29tbWl0X3JlbGVhc2UNCj4gPj4NCj4gPj4gT24gMjAxMS0wOS0xMiAxNDoxMCwgVHJv bmQgTXlrbGVidXN0IHdyb3RlOg0KPiA+Pj4gT24gTW9uLCAyMDExLTA5LTEyIGF0IDEzOjMxIC0w NzAwLCBCZW5ueSBIYWxldnkgd3JvdGU6DQo+ID4+Pj4gT24gMjAxMS0wOS0xMiAwNzo1NiwgUGVu ZyBUYW8gd3JvdGU6DQo+ID4+Pj4+PiBUaGUgbGF5b3V0IHNlZ21lbnRzIGFyZSBub3QgcmVhbGx5 IGluIHVzZSB3aGlsZSBpbiBMQVlPVVRDT01NSVQuDQo+ID4+Pj4+PiBXZSBvbmx5IG5lZWQgdG8g Z2V0IHRoZSBzdGF0ZWlkIHJpZ2h0IHdpdGggcmVzcGVjdCB0byBjb25jdXJyZW50IGxheW91dA0K PiByZWNhbGxzLg0KPiA+Pj4+PiBMQVlPVVRDT01NSVQgdGFrZXMgbHNlZyByZWZlcmVuY2UgdG8g bWFyayB0aGVtIGFzIGluIHVzZSBzbyB0aGF0DQo+ID4+Pj4+IGxheW91dHJlY2FsbCBjYW5ub3Qg ZnJlZSB0aGVtLg0KPiA+Pj4+Pg0KPiA+Pj4+DQo+ID4+Pj4gQW5kIGlmIGxheW91dHJlY2FsbCB3 b3VsZCBoYXZlIGZyZWVkIGxheW91dCBzZWdtZW50cyBkdXJpbmcgbGF5b3V0Y29tbWl0LA0KPiA+ Pj4+IHdoYXQgaXMgeW91ciBzcGVjaWZpYyBjb25jZXJuPw0KPiA+Pj4NCj4gPj4+IFRoYXQgbGF5 b3V0Y29tbWl0IGlzIHN1cHBvc2VkIHRvIHJldHVybiBORlM0RVJSX0JBRF9MQVlPVVQgaW4gdGhh dCBjYXNlDQo+ID4+PiBhY2NvcmRpbmcgdG8gc2VjdGlvbiAxOC40Mi4zIG9mIFJGQzU2NjEuIEkg Y2FuJ3QgZmluZCBhbnl0aGluZyBpbiB0aGUNCj4gPj4+IGVycmF0YSB0aGF0IGNoYW5nZXMgdGhh dCByZXF1aXJlbWVudC4NCj4gPj4+DQo+ID4+DQo+ID4+IFJpZ2h0LiBUaGF0IHRlbGxzIG1lIHRo ZXJlIG5vIG5lZWQgdG8gc3RyaWN0bHkgc2VyaWFsaXplIExBWU9VVENPTU1JVHMNCj4gPj4gd2l0 aCBDQl9MQVlPVVRSRUNBTEwsIGFzIGxvbmcgYXMgdGhlIGxheW91dCBzdGF0ZWlkIHNlbnQgd2l0 aCBMQVlPVVRDT01NSVQNCj4gPj4gYXRvbWljYWxseSByZXByZXNlbnRzIHRoZSBzdGF0ZSB3aGVu IHRoZSBvcGVyYXRpb24gd2FzIHByZXBhcmVkLg0KPiA+Pg0KPiA+PiBUaGF0IHNhaWQsIHNpbmNl IHdlIGRvIHdhbnQgdGhlIExBWU9VVENPTU1JVCB0byBzdWNjZWVkLCBpdCB3b3VsZCBiZQ0KPiBi ZW5lZmljaWFsDQo+ID4+IGZvciB0aGUgY2xpZW50IHRvIHJlcGx5IHRvIGEgQ0JfTEFZT1VUUkVD QUxMIHJlY2VpdmVkIHdoaWxlIGEgY29uZmxpY3RpbmcNCj4gPj4gTEFZT1VUQ09NTUlUIGlzIGlu IHByb2dyZXNzIHdpdGggTkZTNEVSUl9ERUxBWS4NCj4gPiBJIGFncmVlLiBIb3cgYWJvdXQgYWRk aW5nIGEgbmV3IGZsYWcgdG8gbmZzaS0+ZmxhZ3MgZm9yIHRoaXM/IFdlIGNhbiB1c2UgdGhlIHNh bWUNCj4gZmxhZyBvbiB0byBlbnN1cmUgc2VyaWFsaXphdGlvbiBvZiBtdWx0aXBsZSBsYXlvdXRj b21taXQuDQo+IG5mc19jb21taXRfc2V0X2xvY2svbmZzX2NvbW1pdF9jbGVhcl9sb2NrIG1heSBu b3QgZml0IGZvciB0aGlzLg0KPiA+DQo+IA0KPiBTb3VuZHMgZ29vZCBpbiBwcmluY2lwbGUuDQo+ IENhbiB5b3UgdGFrZSBhIHN0YWIgYXQgYSBwYXRjaCB0aGF0IGRvZXMgdGhpcz8NCk9LLCBJIHdp bGwgc3BlbmQgc29tZSB0aW1lIG9uIGl0IHRoaXMgd2Vlay4NCg0KVGhhbmtzLA0KVGFvDQo+IA0K PiBCZW5ueQ0KPiANCj4gPj4NCj4gPj4gVGhlIHNlcnZlciwgb24gaXRzIHNpZGUsIHNob3VsZCBw cmV2ZW50IGEgZGlzdHJpYnV0ZWQgZGVhZGxvY2sgYnkgYXZvaWRpbmcNCj4gPj4gYmxvY2tpbmcg b2YgYSBMQVlPVVRDT01NSVQgb24gYW4gb3V0c3RhbmRpbmcgQ0JfTEFZT1VUUkVDQUxMIGZvciB0 aGUNCj4gc2FtZQ0KPiA+PiBjbGllbnQgdGhhdCBzZW50IHRoZSBMQVlPVVRDT01NSVQuICBJJ20g bm90IHN1cmUgd2hhdCBlcnJvciB3b3VsZCBiZSBiZXN0IHRvDQo+ID4+IHJldHVybi4gIE1heWJl IE5GUzRFUlJfUkVDQUxMX0NPTkZMSUNUIGlmIGl0IHdvdWxkIGJlIGFsbG93ZWQgKGl0IGlzbid0 IGxpc3RlZA0KPiA+PiBmb3IgTEFZT1VUQ09NTUlUIGF0IHRoZSBtb21lbnQpLiAgSnVzdCByZXR1 cm5pbmcgTkZTNEVSX0RFTEFZIG1pZ2h0IGxlYWQNCj4gdG8NCj4gPj4gYSBsaXZlIGxvY2sgc2l0 dWF0aW9uIHdoZXJlIG5laXRoZXIgdGhlIExBWU9VVENPTU1JVCBub3QgdGhlDQo+IENCX0xBWU9V VFJFQ0FMTA0KPiA+PiBjb21wbGV0ZS4NCj4gPj4NCj4gPj4gQmVubnkNCj4gPg0KDQo= ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-12 21:10 ` Trond Myklebust 2011-09-13 7:50 ` Benny Halevy @ 2011-09-13 8:09 ` tao.peng 2011-09-14 6:46 ` Benny Halevy 1 sibling, 1 reply; 29+ messages in thread From: tao.peng @ 2011-09-13 8:09 UTC (permalink / raw) To: Trond.Myklebust, bhalevy Cc: bergwolf, gusev.vitaliy, gusev.vitaliy, linux-nfs SGksIFRyb25kLA0KPiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBsaW51eC1u ZnMtb3duZXJAdmdlci5rZXJuZWwub3JnIFttYWlsdG86bGludXgtbmZzLW93bmVyQHZnZXIua2Vy bmVsLm9yZ10NCj4gT24gQmVoYWxmIE9mIFRyb25kIE15a2xlYnVzdA0KPiBTZW50OiBUdWVzZGF5 LCBTZXB0ZW1iZXIgMTMsIDIwMTEgNToxMSBBTQ0KPiBUbzogQmVubnkgSGFsZXZ5DQo+IENjOiBQ ZW5nIFRhbzsgUGVuZywgVGFvOyBndXNldi52aXRhbGl5QG5leGVudGEuY29tOyBndXNldi52aXRh bGl5QGdtYWlsLmNvbTsNCj4gbGludXgtbmZzQHZnZXIua2VybmVsLm9yZw0KPiBTdWJqZWN0OiBS ZTogW1BBVENIXSBuZnM6IGZpeCBpbmlmaW5pdGUgbG9vcCBhdCBuZnM0X2xheW91dGNvbW1pdF9y ZWxlYXNlDQo+IA0KPiBPbiBNb24sIDIwMTEtMDktMTIgYXQgMTM6MzEgLTA3MDAsIEJlbm55IEhh bGV2eSB3cm90ZToNCj4gPiBPbiAyMDExLTA5LTEyIDA3OjU2LCBQZW5nIFRhbyB3cm90ZToNCj4g PiA+PiBUaGUgbGF5b3V0IHNlZ21lbnRzIGFyZSBub3QgcmVhbGx5IGluIHVzZSB3aGlsZSBpbiBM QVlPVVRDT01NSVQuDQo+ID4gPj4gV2Ugb25seSBuZWVkIHRvIGdldCB0aGUgc3RhdGVpZCByaWdo dCB3aXRoIHJlc3BlY3QgdG8gY29uY3VycmVudCBsYXlvdXQgcmVjYWxscy4NCj4gPiA+IExBWU9V VENPTU1JVCB0YWtlcyBsc2VnIHJlZmVyZW5jZSB0byBtYXJrIHRoZW0gYXMgaW4gdXNlIHNvIHRo YXQNCj4gPiA+IGxheW91dHJlY2FsbCBjYW5ub3QgZnJlZSB0aGVtLg0KPiA+ID4NCj4gPg0KPiA+ IEFuZCBpZiBsYXlvdXRyZWNhbGwgd291bGQgaGF2ZSBmcmVlZCBsYXlvdXQgc2VnbWVudHMgZHVy aW5nIGxheW91dGNvbW1pdCwNCj4gPiB3aGF0IGlzIHlvdXIgc3BlY2lmaWMgY29uY2Vybj8NCj4g DQo+IFRoYXQgbGF5b3V0Y29tbWl0IGlzIHN1cHBvc2VkIHRvIHJldHVybiBORlM0RVJSX0JBRF9M QVlPVVQgaW4gdGhhdCBjYXNlDQo+IGFjY29yZGluZyB0byBzZWN0aW9uIDE4LjQyLjMgb2YgUkZD NTY2MS4gSSBjYW4ndCBmaW5kIGFueXRoaW5nIGluIHRoZQ0KPiBlcnJhdGEgdGhhdCBjaGFuZ2Vz IHRoYXQgcmVxdWlyZW1lbnQuDQpEbyB5b3UgbWVhbiB0aGF0IGlmIGNsaWVudCBzZW5kcyBsYXlv dXRjb21taXQgYXQgdGhlIHNhbWUgdGltZSBhcyBzZXJ2ZXIgc2VuZHMgbGF5b3V0cmVjYWxsIGZv ciBvdmVybGFwcGVkIHJhbmdlLCB0aGVuDQpzZXJ2ZXIgbXVzdCByZWplY3QgbGF5b3V0Y29tbWl0 IHdpdGggTkZTNEVSUl9CQURMQVlPVVQgd2hlbiByZWNlaXZpbmcgbGF5b3V0Y29tbWl0Pw0KDQpS RkM1NjYxIHNlY3Rpb24gMTguNDIuMyBzYXlzOg0KICAgSWYgdGhlIGxheW91dCBpZGVudGlmaWVk IGluIHRoZSBhcmd1bWVudHMgZG9lcyBub3QgZXhpc3QsIHRoZSBlcnJvcg0KICAgTkZTNEVSUl9C QURMQVlPVVQgaXMgcmV0dXJuZWQuICBUaGUgbGF5b3V0IGJlaW5nIGNvbW1pdHRlZCBtYXkgYWxz bw0KICAgYmUgcmVqZWN0ZWQgaWYgaXQgZG9lcyBub3QgY29ycmVzcG9uZCB0byBhbiBleGlzdGlu ZyBsYXlvdXQgd2l0aCBhbg0KICAgaW9tb2RlIG9mIExBWU9VVElPTU9ERTRfUlcuDQoNCklNSE8s IGluIHRoZSBhYm92ZSBjYXNlLCBzZXJ2ZXIgY2Fubm90IGRlY2lkZSBpZiB0aGUgbGF5b3V0IGV4 aXN0cyB1bnRpbCBjbGllbnQgcmV0dXJucyByZXNwb25zZSB0byBsYXlvdXRyZWNhbGwuIEJ1dCBJ IGNhbid0IGZpbmQNCmFueSByZXF1aXJlbWVudCBpbiBSRkM1NjYxIHRoYXQgc2VydmVyIG11c3Qg d2FpdCBmb3IgbGF5b3V0cmVjYWxsJ3MgcmVzcG9uc2UgYmVmb3JlIHByb2Nlc3NpbmcgbGF5b3V0 Y29tbWl0Lg0KDQpUaGFua3MsDQpUYW8NCg0KTu+/ve+/ve+/ve+/ve+/vXLvv73vv71577+977+9 77+9Yu+/vVjvv73vv73Hp3bvv71e77+9Kd66ey5u77+9K++/ve+/ve+/ve+/vXvvv73vv73vv70i 77+977+9Xm7vv71y77+977+977+9eu+/vRrvv73vv71o77+977+977+977+9Ju+/ve+/vR7vv71H 77+977+977+9aO+/vQMo77+96ZqO77+93aJqIu+/ve+/vRrvv70bbe+/ve+/ve+/ve+/ve+/vXrv v73elu+/ve+/ve+/vWbvv73vv73vv71o77+977+977+9fu+/vW3vv70= ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-13 8:09 ` tao.peng @ 2011-09-14 6:46 ` Benny Halevy 0 siblings, 0 replies; 29+ messages in thread From: Benny Halevy @ 2011-09-14 6:46 UTC (permalink / raw) To: tao.peng; +Cc: Trond.Myklebust, bergwolf, gusev.vitaliy, gusev.vitaliy, linux-nfs On 2011-09-13 10:09, tao.peng@emc.com wrote: > Hi, Trond, >> -----Original Message----- >> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org] >> On Behalf Of Trond Myklebust >> Sent: Tuesday, September 13, 2011 5:11 AM >> To: Benny Halevy >> Cc: Peng Tao; Peng, Tao; gusev.vitaliy@nexenta.com; gusev.vitaliy@gmail.com; >> linux-nfs@vger.kernel.org >> Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release >> >> On Mon, 2011-09-12 at 13:31 -0700, Benny Halevy wrote: >>> On 2011-09-12 07:56, Peng Tao wrote: >>>>> The layout segments are not really in use while in LAYOUTCOMMIT. >>>>> We only need to get the stateid right with respect to concurrent layout recalls. >>>> LAYOUTCOMMIT takes lseg reference to mark them as in use so that >>>> layoutrecall cannot free them. >>>> >>> >>> And if layoutrecall would have freed layout segments during layoutcommit, >>> what is your specific concern? >> >> That layoutcommit is supposed to return NFS4ERR_BAD_LAYOUT in that case >> according to section 18.42.3 of RFC5661. I can't find anything in the >> errata that changes that requirement. > Do you mean that if client sends layoutcommit at the same time as server sends layoutrecall for overlapped range, then > server must reject layoutcommit with NFS4ERR_BADLAYOUT when receiving layoutcommit? > > RFC5661 section 18.42.3 says: > If the layout identified in the arguments does not exist, the error > NFS4ERR_BADLAYOUT is returned. The layout being committed may also > be rejected if it does not correspond to an existing layout with an > iomode of LAYOUTIOMODE4_RW. > > IMHO, in the above case, server cannot decide if the layout exists until client returns response to layoutrecall. Right, unless the server revokes the layout, but then there are also other consequences like SEQ4_STATUS_RECALLABLE_STATE_REVOKED. Benny > But I can't find > any requirement in RFC5661 that server must wait for layoutrecall's response before processing layoutcommit. > > Thanks, > Tao ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-12 20:31 ` Benny Halevy 2011-09-12 21:10 ` Trond Myklebust @ 2011-09-13 7:02 ` tao.peng 2011-09-13 7:54 ` Benny Halevy 1 sibling, 1 reply; 29+ messages in thread From: tao.peng @ 2011-09-13 7:02 UTC (permalink / raw) To: bhalevy, bergwolf Cc: Trond.Myklebust, gusev.vitaliy, gusev.vitaliy, linux-nfs > -----Original Message----- > From: Benny Halevy [mailto:bhalevy@tonian.com] > Sent: Tuesday, September 13, 2011 4:32 AM > To: Peng Tao > Cc: Trond Myklebust; Peng, Tao; gusev.vitaliy@nexenta.com; > gusev.vitaliy@gmail.com; linux-nfs@vger.kernel.org > Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release > > On 2011-09-12 07:56, Peng Tao wrote: > > On Sat, Sep 10, 2011 at 3:14 PM, Benny Halevy <bhalevy@tonian.com> wrote: > >> On 2011-09-09 11:20, Trond Myklebust wrote: > >>> On Thu, 2011-09-08 at 23:11 -0400, tao.peng@emc.com wrote: > >>>> HI, Trond, > >>>> > >>>>> -----Original Message----- > >>>>> From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com] > >>>>> Sent: Friday, September 09, 2011 1:05 AM > >>>>> To: Peng Tao > >>>>> Cc: Peng, Tao; gusev.vitaliy@nexenta.com; gusev.vitaliy@gmail.com; > >>>>> linux-nfs@vger.kernel.org > >>>>> Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release > >>>>> > >>>>>> -----Original Message----- > >>>>>> From: Peng Tao [mailto:bergwolf@gmail.com] > >>>>>> Sent: Thursday, September 08, 2011 11:00 AM > >>>>>> To: Myklebust, Trond > >>>>>> Cc: tao.peng@emc.com; gusev.vitaliy@nexenta.com; > >>>>>> gusev.vitaliy@gmail.com; linux-nfs@vger.kernel.org > >>>>>> Subject: Re: [PATCH] nfs: fix inifinite loop at > >>>>>> nfs4_layoutcommit_release > >>>>>> > >>>>>> On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond > >>>>>> <Trond.Myklebust@netapp.com> wrote: > >>>>>>>> -----Original Message----- > >>>>>> > >>>>>>> > >>>>>>> Yes, but as far as I can see, even in the blocks case there can be > >>>>>> multiple extents per layout segment. What if I write to one > >>>>>> uninitialised extent, layoutcommit, then write to another uninitialized > >>>>>> extent in the same layout segment and layoutcommit? In my reading of > >>>>>> the code, there is a chance that the second layoutcommit will fail to > >>>>>> pick up the layout segment, and so will fail to notify the MDS that the > >>>>>> second extent now contains data. > >>>>>> > >>>>>> blocklayout does not decide what to layoutcommit according to the lseg > >>>>>> list. Instead, it keeps track of each extent's state at the > >>>>>> granularity of blocksize, and encode whatever needs layoutcommitted in > >>>>>> the layoutcommit call. So in your above case, as long as the second > >>>>>> layoutcommit is issued, blocklayout will encode the newly written > >>>>>> extent in the second layoutcommit call, even if the lseg is not > >>>>>> attached to the second layoutcommit. > >>>>>> > >>>>>> But that leads to another two question: > >>>>>> 1. How do we protect against layoutrecall if lseg is not linked to > >>>>>> layoutcommit? For this one, can we just reject layoutrecall if there > >>>>>> is inflight layoutcommit? It will be less parallel but can guarantee > >>>>>> current implementation correctness. > >>>>>> 2. blocklayout ONLY: bl_committing may be overloaded by several > >>>>>> layoutcommit calls and we don't have information in > >>>>>> cleanup_layoutcommit() on how many entry should be removed from > >>>>>> bl_committing. Maybe we can add a (void*) to struct > >>>>>> nfs4_layoutcommit_data, so that LD can pass some private information > >>>>>> between encode_layoutcommit() and cleanup_layoutcommit()? > >>>>> > >>>>> 3. What is the purpose of pinning the layout segment at all if neither blocks, > nor > >>>>> objects nor files cares? > >>>> I believe it is for protecting against layoutrecall. But since we are seperating > lseg and LD specific layout information management, it is actually not working as > expected. > >>>> > >> > >> The layout segments are not really in use while in LAYOUTCOMMIT. > >> We only need to get the stateid right with respect to concurrent layout recalls. > > LAYOUTCOMMIT takes lseg reference to mark them as in use so that > > layoutrecall cannot free them. > > > > And if layoutrecall would have freed layout segments during layoutcommit, > what is your specific concern? In layoutcommit_release, blocklayout need to access the corresponding extents to convert their states. If the layout segments are freed by layoutrecall, it can cause problems. > > Benny ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-13 7:02 ` tao.peng @ 2011-09-13 7:54 ` Benny Halevy 0 siblings, 0 replies; 29+ messages in thread From: Benny Halevy @ 2011-09-13 7:54 UTC (permalink / raw) To: tao.peng; +Cc: bergwolf, Trond.Myklebust, gusev.vitaliy, gusev.vitaliy, linux-nfs On 2011-09-13 00:02, tao.peng@emc.com wrote: > >> -----Original Message----- >> From: Benny Halevy [mailto:bhalevy@tonian.com] >> Sent: Tuesday, September 13, 2011 4:32 AM >> To: Peng Tao >> Cc: Trond Myklebust; Peng, Tao; gusev.vitaliy@nexenta.com; >> gusev.vitaliy@gmail.com; linux-nfs@vger.kernel.org >> Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release >> >> On 2011-09-12 07:56, Peng Tao wrote: >>> On Sat, Sep 10, 2011 at 3:14 PM, Benny Halevy <bhalevy@tonian.com> wrote: >>>> On 2011-09-09 11:20, Trond Myklebust wrote: >>>>> On Thu, 2011-09-08 at 23:11 -0400, tao.peng@emc.com wrote: >>>>>> HI, Trond, >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com] >>>>>>> Sent: Friday, September 09, 2011 1:05 AM >>>>>>> To: Peng Tao >>>>>>> Cc: Peng, Tao; gusev.vitaliy@nexenta.com; gusev.vitaliy@gmail.com; >>>>>>> linux-nfs@vger.kernel.org >>>>>>> Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Peng Tao [mailto:bergwolf@gmail.com] >>>>>>>> Sent: Thursday, September 08, 2011 11:00 AM >>>>>>>> To: Myklebust, Trond >>>>>>>> Cc: tao.peng@emc.com; gusev.vitaliy@nexenta.com; >>>>>>>> gusev.vitaliy@gmail.com; linux-nfs@vger.kernel.org >>>>>>>> Subject: Re: [PATCH] nfs: fix inifinite loop at >>>>>>>> nfs4_layoutcommit_release >>>>>>>> >>>>>>>> On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond >>>>>>>> <Trond.Myklebust@netapp.com> wrote: >>>>>>>>>> -----Original Message----- >>>>>>>> >>>>>>>>> >>>>>>>>> Yes, but as far as I can see, even in the blocks case there can be >>>>>>>> multiple extents per layout segment. What if I write to one >>>>>>>> uninitialised extent, layoutcommit, then write to another uninitialized >>>>>>>> extent in the same layout segment and layoutcommit? In my reading of >>>>>>>> the code, there is a chance that the second layoutcommit will fail to >>>>>>>> pick up the layout segment, and so will fail to notify the MDS that the >>>>>>>> second extent now contains data. >>>>>>>> >>>>>>>> blocklayout does not decide what to layoutcommit according to the lseg >>>>>>>> list. Instead, it keeps track of each extent's state at the >>>>>>>> granularity of blocksize, and encode whatever needs layoutcommitted in >>>>>>>> the layoutcommit call. So in your above case, as long as the second >>>>>>>> layoutcommit is issued, blocklayout will encode the newly written >>>>>>>> extent in the second layoutcommit call, even if the lseg is not >>>>>>>> attached to the second layoutcommit. >>>>>>>> >>>>>>>> But that leads to another two question: >>>>>>>> 1. How do we protect against layoutrecall if lseg is not linked to >>>>>>>> layoutcommit? For this one, can we just reject layoutrecall if there >>>>>>>> is inflight layoutcommit? It will be less parallel but can guarantee >>>>>>>> current implementation correctness. >>>>>>>> 2. blocklayout ONLY: bl_committing may be overloaded by several >>>>>>>> layoutcommit calls and we don't have information in >>>>>>>> cleanup_layoutcommit() on how many entry should be removed from >>>>>>>> bl_committing. Maybe we can add a (void*) to struct >>>>>>>> nfs4_layoutcommit_data, so that LD can pass some private information >>>>>>>> between encode_layoutcommit() and cleanup_layoutcommit()? >>>>>>> >>>>>>> 3. What is the purpose of pinning the layout segment at all if neither blocks, >> nor >>>>>>> objects nor files cares? >>>>>> I believe it is for protecting against layoutrecall. But since we are seperating >> lseg and LD specific layout information management, it is actually not working as >> expected. >>>>>> >>>> >>>> The layout segments are not really in use while in LAYOUTCOMMIT. >>>> We only need to get the stateid right with respect to concurrent layout recalls. >>> LAYOUTCOMMIT takes lseg reference to mark them as in use so that >>> layoutrecall cannot free them. >>> >> >> And if layoutrecall would have freed layout segments during layoutcommit, >> what is your specific concern? > In layoutcommit_release, blocklayout need to access the corresponding extents to convert their states. If the layout segments are freed by layoutrecall, it can cause problems. > See my response to Trond on his previous message. I think the best thing to do is to return NFS4ERR_DELAY if the gets a conflicting CB_LAYOUTRECALL while the LAYOUTCOMMIT is in progress. The server may need to reject the LAYOUTCOMMIT in this case to prevent a distributed deadlock so the client should be prepared to retry. Benny >> >> Benny ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-09 18:20 ` Trond Myklebust 2011-09-10 7:14 ` Benny Halevy @ 2011-09-12 14:48 ` Peng Tao 1 sibling, 0 replies; 29+ messages in thread From: Peng Tao @ 2011-09-12 14:48 UTC (permalink / raw) To: Trond Myklebust; +Cc: tao.peng, gusev.vitaliy, gusev.vitaliy, linux-nfs Hi, Trond, On Sat, Sep 10, 2011 at 2:20 AM, Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > On Thu, 2011-09-08 at 23:11 -0400, tao.peng@emc.com wrote: >> HI, Trond, >> >> > -----Original Message----- >> > From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com] >> > Sent: Friday, September 09, 2011 1:05 AM >> > To: Peng Tao >> > Cc: Peng, Tao; gusev.vitaliy@nexenta.com; gusev.vitaliy@gmail.com; >> > linux-nfs@vger.kernel.org >> > Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release >> > >> > > -----Original Message----- >> > > From: Peng Tao [mailto:bergwolf@gmail.com] >> > > Sent: Thursday, September 08, 2011 11:00 AM >> > > To: Myklebust, Trond >> > > Cc: tao.peng@emc.com; gusev.vitaliy@nexenta.com; >> > > gusev.vitaliy@gmail.com; linux-nfs@vger.kernel.org >> > > Subject: Re: [PATCH] nfs: fix inifinite loop at >> > > nfs4_layoutcommit_release >> > > >> > > On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond >> > > <Trond.Myklebust@netapp.com> wrote: >> > > >> -----Original Message----- >> > > >> > > > >> > > > Yes, but as far as I can see, even in the blocks case there can be >> > > multiple extents per layout segment. What if I write to one >> > > uninitialised extent, layoutcommit, then write to another uninitialized >> > > extent in the same layout segment and layoutcommit? In my reading of >> > > the code, there is a chance that the second layoutcommit will fail to >> > > pick up the layout segment, and so will fail to notify the MDS that the >> > > second extent now contains data. >> > > >> > > blocklayout does not decide what to layoutcommit according to the lseg >> > > list. Instead, it keeps track of each extent's state at the >> > > granularity of blocksize, and encode whatever needs layoutcommitted in >> > > the layoutcommit call. So in your above case, as long as the second >> > > layoutcommit is issued, blocklayout will encode the newly written >> > > extent in the second layoutcommit call, even if the lseg is not >> > > attached to the second layoutcommit. >> > > >> > > But that leads to another two question: >> > > 1. How do we protect against layoutrecall if lseg is not linked to >> > > layoutcommit? For this one, can we just reject layoutrecall if there >> > > is inflight layoutcommit? It will be less parallel but can guarantee >> > > current implementation correctness. >> > > 2. blocklayout ONLY: bl_committing may be overloaded by several >> > > layoutcommit calls and we don't have information in >> > > cleanup_layoutcommit() on how many entry should be removed from >> > > bl_committing. Maybe we can add a (void*) to struct >> > > nfs4_layoutcommit_data, so that LD can pass some private information >> > > between encode_layoutcommit() and cleanup_layoutcommit()? >> > >> > 3. What is the purpose of pinning the layout segment at all if neither blocks, nor >> > objects nor files cares? >> I believe it is for protecting against layoutrecall. But since we are seperating lseg and LD specific layout information management, it is actually not working as expected. >> >> > IOW: if both objects and blocks track the information that they need for >> > layoutcommit outside the layout segments, then why do we need the >> > NFS_LSEG_LAYOUTCOMMIT bit and pls_lc_list at all? >> If we remove NFS_LSEG_LAYOUTCOMMIT bit and pls_lc_list, we must find other method to protect agains freeing lseg while layoutcommit is needed or is going on. e.g., check for NFS_INO_LAYOUTCOMMIT bit and inflight layoutcommit in initiate_file_draining(). > > The easiest solution is to ensure we have only _one_ LAYOUTCOMMIT on the > wire at a time. Otherwise, you are looking at a many-to-many mapping > between layoutcommits and lsegs. > > We should not expect to need multiple layoutcommits on the wire if pNFS > is working smoothly (i.e. no layout recalls), so optimising for that > case is wrong. > I'd also think that we want to order layoutcommit and ordinary commits > (for those NFS files servers that require both) so that we don't end up > writing a file size change to disk before the actual data is committed. > > So why not just protect the layoutcommit call using the existing > nfs_commit_set_lock/nfs_commit_clear_lock? Completely agree. We should serialize layoutcommit instead of trying to handle multiple of them concurrently. -- Thanks, -Bergwolf ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-06 22:13 ` Vitaliy Gusev 2011-09-06 22:32 ` Trond Myklebust @ 2011-09-08 10:00 ` tao.peng 2011-09-08 13:02 ` Vitaliy Gusev 1 sibling, 1 reply; 29+ messages in thread From: tao.peng @ 2011-09-08 10:00 UTC (permalink / raw) To: gusev.vitaliy, Trond.Myklebust; +Cc: gusev.vitaliy, linux-nfs Hi, Gusev, > -----Original Message----- > From: Vitaliy Gusev [mailto:gusev.vitaliy@nexenta.com] > Sent: Wednesday, September 07, 2011 6:14 AM > To: Trond Myklebust > Cc: Vitaliy Gusev; Peng, Tao; linux-nfs@vger.kernel.org > Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release > > >> @@ -1376,7 +1376,8 @@ static void pnfs_list_write_lseg(struct inode *inode, > struct list_head *listp) > >> > >> list_for_each_entry(lseg,&NFS_I(inode)->layout->plh_segs, pls_list) { > >> if (lseg->pls_range.iomode == IOMODE_RW&& > >> - test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags)) > >> + test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags)&& > >> + list_empty(&lseg->pls_lc_list)) > >> list_add(&lseg->pls_lc_list, listp); > >> } > >> } > > > > If the lseg is already part of one layoutcommit, but we're sending a > > second one for the same range (presumably because we wrote more data in > > the same region), then the above causes the lseg to be excluded. > > > Yes, lseg is excluded, This patch does exactly only exclusion of lseg. > lseg is used here only to get/put reference on this lseg, so skipping is > correct. > > > However, checking on list_empty can occur (on other CPU) in the middle: > > list_del_init(&lseg->pls_lc_list); > Here >>>>>> > if (test_and_clear_bit(NFS_LSEG_LAYOUTCOMMIT, > &lseg->pls_flags)) > put_lseg(lseg); > > > So list_del_init must be executed under the same lock as > pnfs_list_write_lseg, i.e. inode->i_lock. Yes, you are right. How about following patch? >From 14c6da67565fb31c2d2775ccefd93251f348910d Mon Sep 17 00:00:00 2001 From: Peng Tao <bergwolf@gmail.com> Date: Thu, 8 Sep 2011 00:57:02 -0400 Subject: [PATCH] nfsv4: fix race in layoutcommit lseg list create/free Since there can be more than one layoutcommit proc happen the same time, lseg list create/free should be protected. Otherwise lseg list may get corrupted. Reported-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com> Signed-off-by: Peng Tao <peng_tao@emc.com> --- fs/nfs/nfs4proc.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 8c77039..da7c20c 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -5964,6 +5964,7 @@ static void nfs4_layoutcommit_release(void *calldata) struct pnfs_layout_segment *lseg, *tmp; pnfs_cleanup_layoutcommit(data); + spin_lock(&data->args.inode->i_lock); /* Matched by references in pnfs_set_layoutcommit */ list_for_each_entry_safe(lseg, tmp, &data->lseg_list, pls_lc_list) { list_del_init(&lseg->pls_lc_list); @@ -5971,6 +5972,7 @@ static void nfs4_layoutcommit_release(void *calldata) &lseg->pls_flags)) put_lseg(lseg); } + spin_unlock(&data->args.inode->i_lock); put_rpccred(data->cred); kfree(data); } -- 1.7.4.2 > > > > > > I agree that the current code causes list corruption, but before > > applying something like the above patch, I'd like to understand why it > > is correct. > > > > Trond > > > ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-08 10:00 ` tao.peng @ 2011-09-08 13:02 ` Vitaliy Gusev 2011-09-08 15:09 ` Peng Tao 0 siblings, 1 reply; 29+ messages in thread From: Vitaliy Gusev @ 2011-09-08 13:02 UTC (permalink / raw) To: tao.peng; +Cc: Trond.Myklebust, gusev.vitaliy, linux-nfs On 09/08/2011 02:00 PM, tao.peng@emc.com wrote: > Hi, Gusev, Hello Tao! > Yes, you are right. How about following patch? > > From 14c6da67565fb31c2d2775ccefd93251f348910d Mon Sep 17 00:00:00 2001 > From: Peng Tao<bergwolf@gmail.com> > Date: Thu, 8 Sep 2011 00:57:02 -0400 > Subject: [PATCH] nfsv4: fix race in layoutcommit lseg list create/free > > Since there can be more than one layoutcommit proc happen the same time, > lseg list create/free should be protected. Otherwise lseg list > may get corrupted. > > Reported-by: Vitaliy Gusev<gusev.vitaliy@nexenta.com> > Signed-off-by: Peng Tao<peng_tao@emc.com> > --- > fs/nfs/nfs4proc.c | 2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > index 8c77039..da7c20c 100644 > --- a/fs/nfs/nfs4proc.c > +++ b/fs/nfs/nfs4proc.c > @@ -5964,6 +5964,7 @@ static void nfs4_layoutcommit_release(void *calldata) > struct pnfs_layout_segment *lseg, *tmp; > > pnfs_cleanup_layoutcommit(data); > + spin_lock(&data->args.inode->i_lock); I think lock over list_del_init(&lseg->pls_lc_list) is enough, because 1. here lseg is deleted from unique (placed in stack) data and there is no any race during deletion. 2. Only one thing must be protected - list_empty status at pnfs_list_write_lseg (after my patch). The problem is that list_del_init is placed before test_and_clear_bit and spinlock can be some kind of barrier for protection against adding lseg to new data->lseg_list at pnfs_list_write_lseg. Do reordering list_del_init with test_and_clear_bit is not good, because lseg can be invalid after put_lseg. > /* Matched by references in pnfs_set_layoutcommit */ > list_for_each_entry_safe(lseg, tmp,&data->lseg_list, pls_lc_list) { > list_del_init(&lseg->pls_lc_list); > @@ -5971,6 +5972,7 @@ static void nfs4_layoutcommit_release(void *calldata) > &lseg->pls_flags)) > put_lseg(lseg); > } > + spin_unlock(&data->args.inode->i_lock); > put_rpccred(data->cred); > kfree(data); > } ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-08 13:02 ` Vitaliy Gusev @ 2011-09-08 15:09 ` Peng Tao 2011-09-09 17:46 ` Vitaliy Gusev 0 siblings, 1 reply; 29+ messages in thread From: Peng Tao @ 2011-09-08 15:09 UTC (permalink / raw) To: Vitaliy Gusev; +Cc: tao.peng, Trond.Myklebust, gusev.vitaliy, linux-nfs Hi, Gusev, On Thu, Sep 8, 2011 at 9:02 PM, Vitaliy Gusev <gusev.vitaliy@nexenta.com> wrote: > On 09/08/2011 02:00 PM, tao.peng@emc.com wrote: >> >> Hi, Gusev, > > Hello Tao! > >> Yes, you are right. How about following patch? >> >> From 14c6da67565fb31c2d2775ccefd93251f348910d Mon Sep 17 00:00:00 2001 >> From: Peng Tao<bergwolf@gmail.com> >> Date: Thu, 8 Sep 2011 00:57:02 -0400 >> Subject: [PATCH] nfsv4: fix race in layoutcommit lseg list create/free >> >> Since there can be more than one layoutcommit proc happen the same time, >> lseg list create/free should be protected. Otherwise lseg list >> may get corrupted. >> >> Reported-by: Vitaliy Gusev<gusev.vitaliy@nexenta.com> >> Signed-off-by: Peng Tao<peng_tao@emc.com> >> --- >> fs/nfs/nfs4proc.c | 2 ++ >> 1 files changed, 2 insertions(+), 0 deletions(-) >> >> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c >> index 8c77039..da7c20c 100644 >> --- a/fs/nfs/nfs4proc.c >> +++ b/fs/nfs/nfs4proc.c >> @@ -5964,6 +5964,7 @@ static void nfs4_layoutcommit_release(void >> *calldata) >> struct pnfs_layout_segment *lseg, *tmp; >> >> pnfs_cleanup_layoutcommit(data); >> + spin_lock(&data->args.inode->i_lock); > > I think lock over list_del_init(&lseg->pls_lc_list) is enough, because I put the spinlock outside of the loop because the critical section is short enough and it should be faster than grabbing/dropping the inode lock for every entry in the list, agree? Thanks, Tao > > 1. here lseg is deleted from unique (placed in stack) data and there is no > any race during deletion. > > > 2. Only one thing must be protected - list_empty status at > pnfs_list_write_lseg (after my patch). > > The problem is that list_del_init is placed before test_and_clear_bit and > spinlock can be some kind of barrier for protection against adding lseg to > new data->lseg_list at > pnfs_list_write_lseg. > > Do reordering list_del_init with test_and_clear_bit is not good, because > lseg can be invalid after put_lseg. > > >> /* Matched by references in pnfs_set_layoutcommit */ >> list_for_each_entry_safe(lseg, tmp,&data->lseg_list, pls_lc_list) { >> list_del_init(&lseg->pls_lc_list); >> @@ -5971,6 +5972,7 @@ static void nfs4_layoutcommit_release(void >> *calldata) >> &lseg->pls_flags)) >> put_lseg(lseg); >> } >> + spin_unlock(&data->args.inode->i_lock); >> put_rpccred(data->cred); >> kfree(data); >> } > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Thanks, -Bergwolf ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-08 15:09 ` Peng Tao @ 2011-09-09 17:46 ` Vitaliy Gusev 2011-09-12 14:42 ` Peng Tao 0 siblings, 1 reply; 29+ messages in thread From: Vitaliy Gusev @ 2011-09-09 17:46 UTC (permalink / raw) To: Peng Tao; +Cc: tao.peng, Trond.Myklebust, gusev.vitaliy, linux-nfs >>> Reported-by: Vitaliy Gusev<gusev.vitaliy@nexenta.com> >>> Signed-off-by: Peng Tao<peng_tao@emc.com> >>> --- >>> fs/nfs/nfs4proc.c | 2 ++ >>> 1 files changed, 2 insertions(+), 0 deletions(-) >>> >>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c >>> index 8c77039..da7c20c 100644 >>> --- a/fs/nfs/nfs4proc.c >>> +++ b/fs/nfs/nfs4proc.c >>> @@ -5964,6 +5964,7 @@ static void nfs4_layoutcommit_release(void >>> *calldata) >>> struct pnfs_layout_segment *lseg, *tmp; >>> >>> pnfs_cleanup_layoutcommit(data); >>> + spin_lock(&data->args.inode->i_lock); >> >> I think lock over list_del_init(&lseg->pls_lc_list) is enough, because > I put the spinlock outside of the loop because the critical section is > short enough and it should be faster than grabbing/dropping the inode > lock for every entry in the list, agree? Yes, you are right. Looked again I saw that issue is not so bad as seems. Really, what is result of this issue? Only that lseg is in data->lseg_list, but without set NFS_LSEG_LAYOUTCOMMIT. The put_lseg is called correctly at nfs4_layoutcommit_release. So there is no any bug. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release 2011-09-09 17:46 ` Vitaliy Gusev @ 2011-09-12 14:42 ` Peng Tao 0 siblings, 0 replies; 29+ messages in thread From: Peng Tao @ 2011-09-12 14:42 UTC (permalink / raw) To: Vitaliy Gusev; +Cc: tao.peng, Trond.Myklebust, gusev.vitaliy, linux-nfs On Sat, Sep 10, 2011 at 1:46 AM, Vitaliy Gusev <gusev.vitaliy@nexenta.com> wrote: >>>> Reported-by: Vitaliy Gusev<gusev.vitaliy@nexenta.com> >>>> Signed-off-by: Peng Tao<peng_tao@emc.com> >>>> --- >>>> fs/nfs/nfs4proc.c | 2 ++ >>>> 1 files changed, 2 insertions(+), 0 deletions(-) >>>> >>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c >>>> index 8c77039..da7c20c 100644 >>>> --- a/fs/nfs/nfs4proc.c >>>> +++ b/fs/nfs/nfs4proc.c >>>> @@ -5964,6 +5964,7 @@ static void nfs4_layoutcommit_release(void >>>> *calldata) >>>> struct pnfs_layout_segment *lseg, *tmp; >>>> >>>> pnfs_cleanup_layoutcommit(data); >>>> + spin_lock(&data->args.inode->i_lock); >>> >>> I think lock over list_del_init(&lseg->pls_lc_list) is enough, because >> >> I put the spinlock outside of the loop because the critical section is >> short enough and it should be faster than grabbing/dropping the inode >> lock for every entry in the list, agree? > > Yes, you are right. > > Looked again I saw that issue is not so bad as seems. > Really, what is result of this issue? Only that lseg is in data->lseg_list, > but without set NFS_LSEG_LAYOUTCOMMIT. The put_lseg is called correctly at > nfs4_layoutcommit_release. So there is no any bug. Yes, you are right. In that case, we will have one lseg in the lc_list list w/o NFS_LSEG_LAYOUTCOMMIT set but it doesn't hurt anyone. The above patch of mine is not really necessary. Thanks, Tao ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2011-09-14 13:20 UTC | newest] Thread overview: 29+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-08-28 6:22 [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release Vitaliy Gusev 2011-09-06 19:29 ` Trond Myklebust 2011-09-06 22:13 ` Vitaliy Gusev 2011-09-06 22:32 ` Trond Myklebust 2011-09-08 10:21 ` tao.peng 2011-09-08 12:01 ` Myklebust, Trond 2011-09-08 15:00 ` Peng Tao 2011-09-08 17:05 ` Myklebust, Trond 2011-09-09 3:11 ` tao.peng 2011-09-09 18:20 ` Trond Myklebust 2011-09-10 7:14 ` Benny Halevy 2011-09-12 14:56 ` Peng Tao 2011-09-12 20:31 ` Benny Halevy 2011-09-12 21:10 ` Trond Myklebust 2011-09-13 7:50 ` Benny Halevy 2011-09-13 8:32 ` tao.peng 2011-09-14 6:43 ` Benny Halevy 2011-09-14 7:53 ` tao.peng [not found] ` <F19688880B763E40B28B2B462677FBF805C3F7A911-AYrsSIZi/B2B3McK65YKY9BPR1lH4CV8@public.gmane.org> 2011-09-14 13:20 ` Benny Halevy 2011-09-13 8:09 ` tao.peng 2011-09-14 6:46 ` Benny Halevy 2011-09-13 7:02 ` tao.peng 2011-09-13 7:54 ` Benny Halevy 2011-09-12 14:48 ` Peng Tao 2011-09-08 10:00 ` tao.peng 2011-09-08 13:02 ` Vitaliy Gusev 2011-09-08 15:09 ` Peng Tao 2011-09-09 17:46 ` Vitaliy Gusev 2011-09-12 14:42 ` Peng Tao
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).