* [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.
@ 2016-10-21 0:01 NeilBrown
2016-11-18 5:01 ` NeilBrown
0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2016-10-21 0:01 UTC (permalink / raw)
To: Andy Adamson, Trond Myklebust, Anna Schumaker; +Cc: Linux NFS Mailing List
[-- Attachment #1: Type: text/plain, Size: 1203 bytes --]
Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL
'ds', then ds->ds_clp will also be non-NULL.
This is not necessasrily true in the case when the process received a
fatal signal while nfs4_pnfs_ds_connect is waiting in
nfs4_wait_ds_connect(). In that case ->ds_clp may not be set, and the
devid may not recently have been marked unavailable.
So add a test for ->ds_clp == NULL and return NULL in that case.
Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race")
Signed-off-by: NeilBrown <neilb@suse.com>
---
fs/nfs/filelayout/filelayoutdev.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
index 4946ef40ba87..85ef38f9765f 100644
--- a/fs/nfs/filelayout/filelayoutdev.c
+++ b/fs/nfs/filelayout/filelayoutdev.c
@@ -283,7 +283,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
s->nfs_client->cl_rpcclient->cl_auth->au_flavor);
out_test_devid:
- if (filelayout_test_devid_unavailable(devid))
+ if (ret->ds_clp == NULL ||
+ filelayout_test_devid_unavailable(devid))
ret = NULL;
out:
return ret;
--
2.10.1
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success. 2016-10-21 0:01 [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success NeilBrown @ 2016-11-18 5:01 ` NeilBrown 2016-11-18 14:32 ` Trond Myklebust 0 siblings, 1 reply; 9+ messages in thread From: NeilBrown @ 2016-11-18 5:01 UTC (permalink / raw) To: Andy Adamson, Trond Myklebust, Anna Schumaker; +Cc: Linux NFS Mailing List [-- Attachment #1: Type: text/plain, Size: 1389 bytes --] Ping? No sign of this in linux-next, and no replies.... Thanks, NeilBrown On Fri, Oct 21 2016, NeilBrown wrote: > > Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL > 'ds', then ds->ds_clp will also be non-NULL. > > This is not necessasrily true in the case when the process received a > fatal signal while nfs4_pnfs_ds_connect is waiting in > nfs4_wait_ds_connect(). In that case ->ds_clp may not be set, and the > devid may not recently have been marked unavailable. > > So add a test for ->ds_clp == NULL and return NULL in that case. > > Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race") > Signed-off-by: NeilBrown <neilb@suse.com> > --- > fs/nfs/filelayout/filelayoutdev.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c > index 4946ef40ba87..85ef38f9765f 100644 > --- a/fs/nfs/filelayout/filelayoutdev.c > +++ b/fs/nfs/filelayout/filelayoutdev.c > @@ -283,7 +283,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx) > s->nfs_client->cl_rpcclient->cl_auth->au_flavor); > > out_test_devid: > - if (filelayout_test_devid_unavailable(devid)) > + if (ret->ds_clp == NULL || > + filelayout_test_devid_unavailable(devid)) > ret = NULL; > out: > return ret; > -- > 2.10.1 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 800 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success. 2016-11-18 5:01 ` NeilBrown @ 2016-11-18 14:32 ` Trond Myklebust 2016-11-18 17:34 ` Olga Kornievskaia 0 siblings, 1 reply; 9+ messages in thread From: Trond Myklebust @ 2016-11-18 14:32 UTC (permalink / raw) To: NeilBrown; +Cc: Adamson William, Schumaker Anna, List Linux NFS Mailing DQo+IE9uIE5vdiAxOCwgMjAxNiwgYXQgMDA6MDEsIE5laWxCcm93biA8bmVpbGJAc3VzZS5jb20+ IHdyb3RlOg0KPiANCj4gDQo+IFBpbmc/DQo+IE5vIHNpZ24gb2YgdGhpcyBpbiBsaW51eC1uZXh0 LCBhbmQgbm8gcmVwbGllc+KApi4NCg0KSeKAmWQgbGlrZSBzb21lIGNvbmZpcm1hdGlvbiBmcm9t IHRoZSBOZXRBcHAgZm9sa3Mgb24gdGhpcy4gVGhleSBhcmUgdGhlIG1haW4gc3Rha2Vob2xkZXJz IGZvciB0aGUgcE5GUyBmaWxlcyBpbXBsZW1lbnRhdGlvbi4gSSBkb27igJl0IHRoaW5rIHdlIG5l ZWQgYW4gZXF1aXZhbGVudCBmb3IgZmxleGZpbGVzLg0KDQo+IA0KPiBUaGFua3MsDQo+IE5laWxC cm93bg0KPiANCj4gDQo+IE9uIEZyaSwgT2N0IDIxIDIwMTYsIE5laWxCcm93biB3cm90ZToNCj4g DQo+PiANCj4+IFZhcmlvdXMgcGxhY2VzIGFzc3VtZSB0aGF0IGlmIG5mczRfZmxfcHJlcGFyZV9k cygpIHR1cm5zIGEgbm9uLU5VTEwNCj4+ICdkcycsIHRoZW4gZHMtPmRzX2NscCB3aWxsIGFsc28g YmUgbm9uLU5VTEwuDQo+PiANCj4+IFRoaXMgaXMgbm90IG5lY2Vzc2FzcmlseSB0cnVlIGluIHRo ZSBjYXNlIHdoZW4gdGhlIHByb2Nlc3MgcmVjZWl2ZWQgYQ0KPj4gZmF0YWwgc2lnbmFsIHdoaWxl IG5mczRfcG5mc19kc19jb25uZWN0IGlzIHdhaXRpbmcgaW4NCj4+IG5mczRfd2FpdF9kc19jb25u ZWN0KCkuICBJbiB0aGF0IGNhc2UgLT5kc19jbHAgbWF5IG5vdCBiZSBzZXQsIGFuZCB0aGUNCj4+ IGRldmlkIG1heSBub3QgcmVjZW50bHkgaGF2ZSBiZWVuIG1hcmtlZCB1bmF2YWlsYWJsZS4NCj4+ IA0KPj4gU28gYWRkIGEgdGVzdCBmb3IgLT5kc19jbHAgPT0gTlVMTCBhbmQgcmV0dXJuIE5VTEwg aW4gdGhhdCBjYXNlLg0KPj4gDQo+PiBGaXhlczogYzIzMjY2ZDUzMmI0ICgiTkZTNC4xIEZpeCBk YXRhIHNlcnZlciBjb25uZWN0aW9uIHJhY2UiKQ0KPj4gU2lnbmVkLW9mZi1ieTogTmVpbEJyb3du IDxuZWlsYkBzdXNlLmNvbT4NCj4+IC0tLQ0KPj4gZnMvbmZzL2ZpbGVsYXlvdXQvZmlsZWxheW91 dGRldi5jIHwgMyArKy0NCj4+IDEgZmlsZSBjaGFuZ2VkLCAyIGluc2VydGlvbnMoKyksIDEgZGVs ZXRpb24oLSkNCj4+IA0KPj4gZGlmZiAtLWdpdCBhL2ZzL25mcy9maWxlbGF5b3V0L2ZpbGVsYXlv dXRkZXYuYyBiL2ZzL25mcy9maWxlbGF5b3V0L2ZpbGVsYXlvdXRkZXYuYw0KPj4gaW5kZXggNDk0 NmVmNDBiYTg3Li44NWVmMzhmOTc2NWYgMTAwNjQ0DQo+PiAtLS0gYS9mcy9uZnMvZmlsZWxheW91 dC9maWxlbGF5b3V0ZGV2LmMNCj4+ICsrKyBiL2ZzL25mcy9maWxlbGF5b3V0L2ZpbGVsYXlvdXRk ZXYuYw0KPj4gQEAgLTI4Myw3ICsyODMsOCBAQCBuZnM0X2ZsX3ByZXBhcmVfZHMoc3RydWN0IHBu ZnNfbGF5b3V0X3NlZ21lbnQgKmxzZWcsIHUzMiBkc19pZHgpDQo+PiAJCQkgICAgIHMtPm5mc19j bGllbnQtPmNsX3JwY2NsaWVudC0+Y2xfYXV0aC0+YXVfZmxhdm9yKTsNCj4+IA0KPj4gb3V0X3Rl c3RfZGV2aWQ6DQo+PiAtCWlmIChmaWxlbGF5b3V0X3Rlc3RfZGV2aWRfdW5hdmFpbGFibGUoZGV2 aWQpKQ0KPj4gKwlpZiAocmV0LT5kc19jbHAgPT0gTlVMTCB8fA0KPj4gKwkgICAgZmlsZWxheW91 dF90ZXN0X2RldmlkX3VuYXZhaWxhYmxlKGRldmlkKSkNCj4+IAkJcmV0ID0gTlVMTDsNCj4+IG91 dDoNCj4+IAlyZXR1cm4gcmV0Ow0KPj4gLS0gDQo+PiAyLjEwLjENCg0K ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success. 2016-11-18 14:32 ` Trond Myklebust @ 2016-11-18 17:34 ` Olga Kornievskaia 2016-11-19 6:33 ` NeilBrown 0 siblings, 1 reply; 9+ messages in thread From: Olga Kornievskaia @ 2016-11-18 17:34 UTC (permalink / raw) To: Trond Myklebust Cc: NeilBrown, Adamson William, Schumaker Anna, List Linux NFS Mailing On Fri, Nov 18, 2016 at 9:32 AM, Trond Myklebust <trondmy@primarydata.com> wrote: > >> On Nov 18, 2016, at 00:01, NeilBrown <neilb@suse.com> wrote: >> >> >> Ping? >> No sign of this in linux-next, and no replies=E2=80=A6. > > I=E2=80=99d like some confirmation from the NetApp folks on this. They ar= e the main stakeholders for the pNFS files implementation. I don=E2=80=99t = think we need an equivalent for flex files. Just to see if I understand this patch. If "ds" isn't NULL, then the client has an RPC client with the ds. But it doesn't have a valid NFS client? Looking at the code I really don't see how that can happen. Maybe there is a bug elsewhere? If we return here, is there a chance we are going to have a zombie RPC client with the ds? If that's not a problem then I don't think there an issue to assume that if there is no valid NFS client then we would want to return a NULL from nfs4_fl_prepare_ds(). > >> >> Thanks, >> NeilBrown >> >> >> On Fri, Oct 21 2016, NeilBrown wrote: >> >>> >>> Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL >>> 'ds', then ds->ds_clp will also be non-NULL. >>> >>> This is not necessasrily true in the case when the process received a >>> fatal signal while nfs4_pnfs_ds_connect is waiting in >>> nfs4_wait_ds_connect(). In that case ->ds_clp may not be set, and the >>> devid may not recently have been marked unavailable. >>> >>> So add a test for ->ds_clp =3D=3D NULL and return NULL in that case. >>> >>> Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race") >>> Signed-off-by: NeilBrown <neilb@suse.com> >>> --- >>> fs/nfs/filelayout/filelayoutdev.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/file= layoutdev.c >>> index 4946ef40ba87..85ef38f9765f 100644 >>> --- a/fs/nfs/filelayout/filelayoutdev.c >>> +++ b/fs/nfs/filelayout/filelayoutdev.c >>> @@ -283,7 +283,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg= , u32 ds_idx) >>> s->nfs_client->cl_rpcclient->cl_auth->au_flav= or); >>> >>> out_test_devid: >>> - if (filelayout_test_devid_unavailable(devid)) >>> + if (ret->ds_clp =3D=3D NULL || >>> + filelayout_test_devid_unavailable(devid)) >>> ret =3D NULL; >>> out: >>> return ret; >>> -- >>> 2.10.1 > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success. 2016-11-18 17:34 ` Olga Kornievskaia @ 2016-11-19 6:33 ` NeilBrown 2016-11-22 20:32 ` Adamson, Andy 2016-11-23 17:37 ` Olga Kornievskaia 0 siblings, 2 replies; 9+ messages in thread From: NeilBrown @ 2016-11-19 6:33 UTC (permalink / raw) To: Olga Kornievskaia, Trond Myklebust Cc: Adamson William, Schumaker Anna, List Linux NFS Mailing [-- Attachment #1: Type: text/plain, Size: 3744 bytes --] On Sat, Nov 19 2016, Olga Kornievskaia wrote: > On Fri, Nov 18, 2016 at 9:32 AM, Trond Myklebust > <trondmy@primarydata.com> wrote: >> >>> On Nov 18, 2016, at 00:01, NeilBrown <neilb@suse.com> wrote: >>> >>> >>> Ping? >>> No sign of this in linux-next, and no replies…. >> >> I’d like some confirmation from the NetApp folks on this. They are the main stakeholders for the pNFS files implementation. I don’t think we need an equivalent for flex files. > > Just to see if I understand this patch. If "ds" isn't NULL, then the > client has an RPC client with the ds. But it doesn't have a valid NFS > client? Looking at the code I really don't see how that can happen. I thought I explained that, but I was rather brief. nfs4_fl_prepare_ds() calls nfs4_pnfs_ds_connect() in order to create the NFS client. The first time it is called it will set NFS4DS_CONNECTING and then call _nfs4_pnfs_v?_ds_connect() which will establish the connection and set ds->ds_clp. If the server is unresponsive, this an block indefinitely. If a second thread then tries the same time, it will enter nfs4_pnfs_ds_connect() and will discovery that NFS4DS_CONNECTING is already set, so it will call nfs4_wait_ds_connect(). As nfs4_wait_ds_connect() waits with TASK_KILLABLE, it will abort if the process is kill by an uncaught signal (such as SIGKILL). In this case it will return even though NFS4DS_CONNECTING is still set, and ds->ds_clp is still NULL. i.e. the first thread hasn't established the connection yet. As the process has been killed, the 'ds' that it holds that doesn't have an NFS client handle will never be used. But we need to be sure that the process will exit cleanly without trying to de-reference that NULL ds_clp. That is what the patch ensures. > Maybe there is a bug elsewhere? If we return here, is there a chance > we are going to have a zombie RPC client with the ds? If that's not a > problem then I don't think there an issue to assume that if there is > no valid NFS client then we would want to return a NULL from > nfs4_fl_prepare_ds(). It isn't a "zombie RPC client", but rather an unborn RPC client, which still has NFS4DS_CONNECTING set. Thanks, NeilBrown > > >> >>> >>> Thanks, >>> NeilBrown >>> >>> >>> On Fri, Oct 21 2016, NeilBrown wrote: >>> >>>> >>>> Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL >>>> 'ds', then ds->ds_clp will also be non-NULL. >>>> >>>> This is not necessasrily true in the case when the process received a >>>> fatal signal while nfs4_pnfs_ds_connect is waiting in >>>> nfs4_wait_ds_connect(). In that case ->ds_clp may not be set, and the >>>> devid may not recently have been marked unavailable. >>>> >>>> So add a test for ->ds_clp == NULL and return NULL in that case. >>>> >>>> Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race") >>>> Signed-off-by: NeilBrown <neilb@suse.com> >>>> --- >>>> fs/nfs/filelayout/filelayoutdev.c | 3 ++- >>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c >>>> index 4946ef40ba87..85ef38f9765f 100644 >>>> --- a/fs/nfs/filelayout/filelayoutdev.c >>>> +++ b/fs/nfs/filelayout/filelayoutdev.c >>>> @@ -283,7 +283,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx) >>>> s->nfs_client->cl_rpcclient->cl_auth->au_flavor); >>>> >>>> out_test_devid: >>>> - if (filelayout_test_devid_unavailable(devid)) >>>> + if (ret->ds_clp == NULL || >>>> + filelayout_test_devid_unavailable(devid)) >>>> ret = NULL; >>>> out: >>>> return ret; >>>> -- >>>> 2.10.1 >> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 800 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success. 2016-11-19 6:33 ` NeilBrown @ 2016-11-22 20:32 ` Adamson, Andy 2016-11-23 17:37 ` Olga Kornievskaia 1 sibling, 0 replies; 9+ messages in thread From: Adamson, Andy @ 2016-11-22 20:32 UTC (permalink / raw) To: NeilBrown, Olga Kornievskaia, Trond Myklebust Cc: Schumaker, Anna, List Linux NFS Mailing VGhlIHBhdGNoIGxvb2tzIGdvb2QgdG8gbWUuIFdlIGRvIG5lZWQgdG8gZW5zdXJlIHRoYXQgZHNf Y2xwIGlzIHNldCBiZWZvcmUgcmV0dXJuaW5nIGEgbm9uLW51bGwgbmZzNF9wbmZzX2RzIGZyb20g bmZzNF9mbF9wcmVwYXJlX2RzLg0KQXMgVHJvbmQgcG9pbnRlZCBvdXQsIG5mczRfZmZfbGF5b3V0 X3ByZXBhcmVfZHMgYWxyZWFkeSBkb2VzIHRoaXMuDQoNCuKGkkFuZHkNCg0KT24gMTEvMTkvMTYs IDE6MzMgQU0sICJOZWlsQnJvd24iIDxuZWlsYkBzdXNlLmNvbT4gd3JvdGU6DQoNCk9uIFNhdCwg Tm92IDE5IDIwMTYsIE9sZ2EgS29ybmlldnNrYWlhIHdyb3RlOg0KDQo+IE9uIEZyaSwgTm92IDE4 LCAyMDE2IGF0IDk6MzIgQU0sIFRyb25kIE15a2xlYnVzdA0KPiA8dHJvbmRteUBwcmltYXJ5ZGF0 YS5jb20+IHdyb3RlOg0KPj4NCj4+PiBPbiBOb3YgMTgsIDIwMTYsIGF0IDAwOjAxLCBOZWlsQnJv d24gPG5laWxiQHN1c2UuY29tPiB3cm90ZToNCj4+Pg0KPj4+DQo+Pj4gUGluZz8NCj4+PiBObyBz aWduIG9mIHRoaXMgaW4gbGludXgtbmV4dCwgYW5kIG5vIHJlcGxpZXPigKYuDQo+Pg0KPj4gSeKA mWQgbGlrZSBzb21lIGNvbmZpcm1hdGlvbiBmcm9tIHRoZSBOZXRBcHAgZm9sa3Mgb24gdGhpcy4g VGhleSBhcmUgdGhlIG1haW4gc3Rha2Vob2xkZXJzIGZvciB0aGUgcE5GUyBmaWxlcyBpbXBsZW1l bnRhdGlvbi4gSSBkb27igJl0IHRoaW5rIHdlIG5lZWQgYW4gZXF1aXZhbGVudCBmb3IgZmxleCBm aWxlcy4NCj4NCj4gSnVzdCB0byBzZWUgaWYgSSB1bmRlcnN0YW5kIHRoaXMgcGF0Y2guIElmICJk cyIgaXNuJ3QgTlVMTCwgdGhlbiB0aGUNCj4gY2xpZW50IGhhcyBhbiBSUEMgY2xpZW50IHdpdGgg dGhlIGRzLiBCdXQgaXQgZG9lc24ndCBoYXZlIGEgdmFsaWQgTkZTDQo+IGNsaWVudD8gTG9va2lu ZyBhdCB0aGUgY29kZSBJIHJlYWxseSBkb24ndCBzZWUgaG93IHRoYXQgY2FuIGhhcHBlbi4NCg0K SSB0aG91Z2h0IEkgZXhwbGFpbmVkIHRoYXQsIGJ1dCBJIHdhcyByYXRoZXIgYnJpZWYuDQoNCm5m czRfZmxfcHJlcGFyZV9kcygpIGNhbGxzIG5mczRfcG5mc19kc19jb25uZWN0KCkgaW4gb3JkZXIg dG8gY3JlYXRlIHRoZQ0KTkZTIGNsaWVudC4gIFRoZSBmaXJzdCB0aW1lIGl0IGlzIGNhbGxlZCBp dCB3aWxsIHNldCBORlM0RFNfQ09OTkVDVElORw0KYW5kIHRoZW4gY2FsbCBfbmZzNF9wbmZzX3Y/ X2RzX2Nvbm5lY3QoKSB3aGljaCB3aWxsIGVzdGFibGlzaCB0aGUNCmNvbm5lY3Rpb24gYW5kIHNl dCBkcy0+ZHNfY2xwLiAgSWYgdGhlIHNlcnZlciBpcyB1bnJlc3BvbnNpdmUsIHRoaXMgYW4NCmJs b2NrIGluZGVmaW5pdGVseS4NCg0KSWYgYSBzZWNvbmQgdGhyZWFkIHRoZW4gdHJpZXMgdGhlIHNh bWUgdGltZSwgaXQgd2lsbCBlbnRlcg0KbmZzNF9wbmZzX2RzX2Nvbm5lY3QoKSBhbmQgd2lsbCBk aXNjb3ZlcnkgdGhhdCBORlM0RFNfQ09OTkVDVElORyBpcw0KYWxyZWFkeSBzZXQsIHNvIGl0IHdp bGwgY2FsbCBuZnM0X3dhaXRfZHNfY29ubmVjdCgpLg0KDQpBcyBuZnM0X3dhaXRfZHNfY29ubmVj dCgpIHdhaXRzIHdpdGggVEFTS19LSUxMQUJMRSwgaXQgd2lsbCBhYm9ydCBpZiB0aGUNCnByb2Nl c3MgaXMga2lsbCBieSBhbiB1bmNhdWdodCBzaWduYWwgKHN1Y2ggYXMgU0lHS0lMTCkuDQpJbiB0 aGlzIGNhc2UgaXQgd2lsbCByZXR1cm4gZXZlbiB0aG91Z2ggTkZTNERTX0NPTk5FQ1RJTkcgaXMg c3RpbGwgc2V0LA0KYW5kIGRzLT5kc19jbHAgaXMgc3RpbGwgTlVMTC4gIGkuZS4gdGhlIGZpcnN0 IHRocmVhZCBoYXNuJ3QgZXN0YWJsaXNoZWQNCnRoZSBjb25uZWN0aW9uIHlldC4NCg0KQXMgdGhl IHByb2Nlc3MgaGFzIGJlZW4ga2lsbGVkLCB0aGUgJ2RzJyB0aGF0IGl0IGhvbGRzIHRoYXQgZG9l c24ndCBoYXZlDQphbiBORlMgY2xpZW50IGhhbmRsZSB3aWxsIG5ldmVyIGJlIHVzZWQuICBCdXQg d2UgbmVlZCB0byBiZSBzdXJlIHRoYXQNCnRoZSBwcm9jZXNzIHdpbGwgZXhpdCBjbGVhbmx5IHdp dGhvdXQgdHJ5aW5nIHRvIGRlLXJlZmVyZW5jZSB0aGF0IE5VTEwNCmRzX2NscC4NCg0KVGhhdCBp cyB3aGF0IHRoZSBwYXRjaCBlbnN1cmVzLg0KDQoNCj4gTWF5YmUgdGhlcmUgaXMgYSBidWcgZWxz ZXdoZXJlPyBJZiB3ZSByZXR1cm4gaGVyZSwgaXMgdGhlcmUgYSBjaGFuY2UNCj4gd2UgYXJlIGdv aW5nIHRvIGhhdmUgYSB6b21iaWUgUlBDIGNsaWVudCB3aXRoIHRoZSBkcz8gSWYgdGhhdCdzIG5v dCBhDQo+IHByb2JsZW0gdGhlbiBJIGRvbid0IHRoaW5rIHRoZXJlIGFuIGlzc3VlIHRvIGFzc3Vt ZSB0aGF0IGlmIHRoZXJlIGlzDQo+IG5vIHZhbGlkIE5GUyBjbGllbnQgdGhlbiB3ZSB3b3VsZCB3 YW50IHRvIHJldHVybiBhIE5VTEwgZnJvbQ0KPiBuZnM0X2ZsX3ByZXBhcmVfZHMoKS4NCg0KSXQg aXNuJ3QgYSAiem9tYmllIFJQQyBjbGllbnQiLCBidXQgcmF0aGVyIGFuIHVuYm9ybiBSUEMgY2xp ZW50LCB3aGljaA0Kc3RpbGwgaGFzIE5GUzREU19DT05ORUNUSU5HIHNldC4NCg0KVGhhbmtzLA0K TmVpbEJyb3duDQoNCg0KPg0KPg0KPj4NCj4+Pg0KPj4+IFRoYW5rcywNCj4+PiBOZWlsQnJvd24N Cj4+Pg0KPj4+DQo+Pj4gT24gRnJpLCBPY3QgMjEgMjAxNiwgTmVpbEJyb3duIHdyb3RlOg0KPj4+ DQo+Pj4+DQo+Pj4+IFZhcmlvdXMgcGxhY2VzIGFzc3VtZSB0aGF0IGlmIG5mczRfZmxfcHJlcGFy ZV9kcygpIHR1cm5zIGEgbm9uLU5VTEwNCj4+Pj4gJ2RzJywgdGhlbiBkcy0+ZHNfY2xwIHdpbGwg YWxzbyBiZSBub24tTlVMTC4NCj4+Pj4NCj4+Pj4gVGhpcyBpcyBub3QgbmVjZXNzYXNyaWx5IHRy dWUgaW4gdGhlIGNhc2Ugd2hlbiB0aGUgcHJvY2VzcyByZWNlaXZlZCBhDQo+Pj4+IGZhdGFsIHNp Z25hbCB3aGlsZSBuZnM0X3BuZnNfZHNfY29ubmVjdCBpcyB3YWl0aW5nIGluDQo+Pj4+IG5mczRf d2FpdF9kc19jb25uZWN0KCkuICBJbiB0aGF0IGNhc2UgLT5kc19jbHAgbWF5IG5vdCBiZSBzZXQs IGFuZCB0aGUNCj4+Pj4gZGV2aWQgbWF5IG5vdCByZWNlbnRseSBoYXZlIGJlZW4gbWFya2VkIHVu YXZhaWxhYmxlLg0KPj4+Pg0KPj4+PiBTbyBhZGQgYSB0ZXN0IGZvciAtPmRzX2NscCA9PSBOVUxM IGFuZCByZXR1cm4gTlVMTCBpbiB0aGF0IGNhc2UuDQo+Pj4+DQo+Pj4+IEZpeGVzOiBjMjMyNjZk NTMyYjQgKCJORlM0LjEgRml4IGRhdGEgc2VydmVyIGNvbm5lY3Rpb24gcmFjZSIpDQo+Pj4+IFNp Z25lZC1vZmYtYnk6IE5laWxCcm93biA8bmVpbGJAc3VzZS5jb20+DQo+Pj4+IC0tLQ0KPj4+PiBm cy9uZnMvZmlsZWxheW91dC9maWxlbGF5b3V0ZGV2LmMgfCAzICsrLQ0KPj4+PiAxIGZpbGUgY2hh bmdlZCwgMiBpbnNlcnRpb25zKCspLCAxIGRlbGV0aW9uKC0pDQo+Pj4+DQo+Pj4+IGRpZmYgLS1n aXQgYS9mcy9uZnMvZmlsZWxheW91dC9maWxlbGF5b3V0ZGV2LmMgYi9mcy9uZnMvZmlsZWxheW91 dC9maWxlbGF5b3V0ZGV2LmMNCj4+Pj4gaW5kZXggNDk0NmVmNDBiYTg3Li44NWVmMzhmOTc2NWYg MTAwNjQ0DQo+Pj4+IC0tLSBhL2ZzL25mcy9maWxlbGF5b3V0L2ZpbGVsYXlvdXRkZXYuYw0KPj4+ PiArKysgYi9mcy9uZnMvZmlsZWxheW91dC9maWxlbGF5b3V0ZGV2LmMNCj4+Pj4gQEAgLTI4Myw3 ICsyODMsOCBAQCBuZnM0X2ZsX3ByZXBhcmVfZHMoc3RydWN0IHBuZnNfbGF5b3V0X3NlZ21lbnQg KmxzZWcsIHUzMiBkc19pZHgpDQo+Pj4+ICAgICAgICAgICAgICAgICAgICAgICAgICAgcy0+bmZz X2NsaWVudC0+Y2xfcnBjY2xpZW50LT5jbF9hdXRoLT5hdV9mbGF2b3IpOw0KPj4+Pg0KPj4+PiBv dXRfdGVzdF9kZXZpZDoNCj4+Pj4gLSAgICBpZiAoZmlsZWxheW91dF90ZXN0X2RldmlkX3VuYXZh aWxhYmxlKGRldmlkKSkNCj4+Pj4gKyAgICBpZiAocmV0LT5kc19jbHAgPT0gTlVMTCB8fA0KPj4+ PiArICAgICAgICBmaWxlbGF5b3V0X3Rlc3RfZGV2aWRfdW5hdmFpbGFibGUoZGV2aWQpKQ0KPj4+ PiAgICAgICAgICAgICAgcmV0ID0gTlVMTDsNCj4+Pj4gb3V0Og0KPj4+PiAgICAgIHJldHVybiBy ZXQ7DQo+Pj4+IC0tDQo+Pj4+IDIuMTAuMQ0KPj4NCg0KDQo= ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success. 2016-11-19 6:33 ` NeilBrown 2016-11-22 20:32 ` Adamson, Andy @ 2016-11-23 17:37 ` Olga Kornievskaia 2016-12-19 0:19 ` [PATCH resend] " NeilBrown 1 sibling, 1 reply; 9+ messages in thread From: Olga Kornievskaia @ 2016-11-23 17:37 UTC (permalink / raw) To: NeilBrown Cc: Trond Myklebust, Adamson William, Schumaker Anna, List Linux NFS Mailing On Sat, Nov 19, 2016 at 1:33 AM, NeilBrown <neilb@suse.com> wrote: > On Sat, Nov 19 2016, Olga Kornievskaia wrote: > >> On Fri, Nov 18, 2016 at 9:32 AM, Trond Myklebust >> <trondmy@primarydata.com> wrote: >>> >>>> On Nov 18, 2016, at 00:01, NeilBrown <neilb@suse.com> wrote: >>>> >>>> >>>> Ping? >>>> No sign of this in linux-next, and no replies=E2=80=A6. >>> >>> I=E2=80=99d like some confirmation from the NetApp folks on this. They = are the main stakeholders for the pNFS files implementation. I don=E2=80=99= t think we need an equivalent for flex files. >> >> Just to see if I understand this patch. If "ds" isn't NULL, then the >> client has an RPC client with the ds. But it doesn't have a valid NFS >> client? Looking at the code I really don't see how that can happen. > > I thought I explained that, but I was rather brief. > > nfs4_fl_prepare_ds() calls nfs4_pnfs_ds_connect() in order to create the > NFS client. The first time it is called it will set NFS4DS_CONNECTING > and then call _nfs4_pnfs_v?_ds_connect() which will establish the > connection and set ds->ds_clp. If the server is unresponsive, this an > block indefinitely. > > If a second thread then tries the same time, it will enter > nfs4_pnfs_ds_connect() and will discovery that NFS4DS_CONNECTING is > already set, so it will call nfs4_wait_ds_connect(). > > As nfs4_wait_ds_connect() waits with TASK_KILLABLE, it will abort if the > process is kill by an uncaught signal (such as SIGKILL). > In this case it will return even though NFS4DS_CONNECTING is still set, > and ds->ds_clp is still NULL. i.e. the first thread hasn't established > the connection yet. > > As the process has been killed, the 'ds' that it holds that doesn't have > an NFS client handle will never be used. But we need to be sure that > the process will exit cleanly without trying to de-reference that NULL > ds_clp. > > That is what the patch ensures. Thanks for the explanation. Makes sense. >> Maybe there is a bug elsewhere? If we return here, is there a chance >> we are going to have a zombie RPC client with the ds? If that's not a >> problem then I don't think there an issue to assume that if there is >> no valid NFS client then we would want to return a NULL from >> nfs4_fl_prepare_ds(). > > It isn't a "zombie RPC client", but rather an unborn RPC client, which > still has NFS4DS_CONNECTING set. > > Thanks, > NeilBrown > > >> >> >>> >>>> >>>> Thanks, >>>> NeilBrown >>>> >>>> >>>> On Fri, Oct 21 2016, NeilBrown wrote: >>>> >>>>> >>>>> Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL >>>>> 'ds', then ds->ds_clp will also be non-NULL. >>>>> >>>>> This is not necessasrily true in the case when the process received a >>>>> fatal signal while nfs4_pnfs_ds_connect is waiting in >>>>> nfs4_wait_ds_connect(). In that case ->ds_clp may not be set, and th= e >>>>> devid may not recently have been marked unavailable. >>>>> >>>>> So add a test for ->ds_clp =3D=3D NULL and return NULL in that case. >>>>> >>>>> Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race") >>>>> Signed-off-by: NeilBrown <neilb@suse.com> >>>>> --- >>>>> fs/nfs/filelayout/filelayoutdev.c | 3 ++- >>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/fi= lelayoutdev.c >>>>> index 4946ef40ba87..85ef38f9765f 100644 >>>>> --- a/fs/nfs/filelayout/filelayoutdev.c >>>>> +++ b/fs/nfs/filelayout/filelayoutdev.c >>>>> @@ -283,7 +283,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *ls= eg, u32 ds_idx) >>>>> s->nfs_client->cl_rpcclient->cl_auth->au_fl= avor); >>>>> >>>>> out_test_devid: >>>>> - if (filelayout_test_devid_unavailable(devid)) >>>>> + if (ret->ds_clp =3D=3D NULL || >>>>> + filelayout_test_devid_unavailable(devid)) >>>>> ret =3D NULL; >>>>> out: >>>>> return ret; >>>>> -- >>>>> 2.10.1 >>> ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH resend] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success. 2016-11-23 17:37 ` Olga Kornievskaia @ 2016-12-19 0:19 ` NeilBrown 2016-12-19 0:28 ` Trond Myklebust 0 siblings, 1 reply; 9+ messages in thread From: NeilBrown @ 2016-12-19 0:19 UTC (permalink / raw) To: Trond Myklebust Cc: Olga Kornievskaia, Adamson William, Schumaker Anna, List Linux NFS Mailing, Adamson, Andy [-- Attachment #1: Type: text/plain, Size: 1447 bytes --] Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL 'ds', then ds->ds_clp will also be non-NULL. This is not necessasrily true in the case when the process received a fatal signal while nfs4_pnfs_ds_connect is waiting in nfs4_wait_ds_connect(). In that case ->ds_clp may not be set, and the devid may not recently have been marked unavailable. So add a test for ds_clp == NULL and return NULL in that case. Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race") Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: Olga Kornievskaia <aglo@umich.edu> Acked-by: Adamson, Andy <William.Adamson@netapp.com> --- Hi Trond, I just noticed that this wasn't in your 4.10 pull request. So I've added Acked-bys from Andy and Olga and am resending. Thanks, NeilBrown fs/nfs/filelayout/filelayoutdev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c index a5589b791439..f956ca20a8a3 100644 --- a/fs/nfs/filelayout/filelayoutdev.c +++ b/fs/nfs/filelayout/filelayoutdev.c @@ -282,7 +282,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx) s->nfs_client->cl_minorversion); out_test_devid: - if (filelayout_test_devid_unavailable(devid)) + if (ret->ds_clp == NULL || + filelayout_test_devid_unavailable(devid)) ret = NULL; out: return ret; -- 2.11.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH resend] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success. 2016-12-19 0:19 ` [PATCH resend] " NeilBrown @ 2016-12-19 0:28 ` Trond Myklebust 0 siblings, 0 replies; 9+ messages in thread From: Trond Myklebust @ 2016-12-19 0:28 UTC (permalink / raw) To: NeilBrown Cc: Olga Kornievskaia, William Andros Adamson, Anna Schumaker, Linux NFS Mailing List, Adamson@suse.com, William Andros Adamson DQo+IE9uIERlYyAxOCwgMjAxNiwgYXQgMTk6MTksIE5laWxCcm93biA8bmVpbGJAc3VzZS5jb20+ IHdyb3RlOg0KPiANCj4gDQo+IFZhcmlvdXMgcGxhY2VzIGFzc3VtZSB0aGF0IGlmIG5mczRfZmxf cHJlcGFyZV9kcygpIHR1cm5zIGEgbm9uLU5VTEwgJ2RzJywNCj4gdGhlbiBkcy0+ZHNfY2xwIHdp bGwgYWxzbyBiZSBub24tTlVMTC4NCj4gDQo+IFRoaXMgaXMgbm90IG5lY2Vzc2FzcmlseSB0cnVl IGluIHRoZSBjYXNlIHdoZW4gdGhlIHByb2Nlc3MgcmVjZWl2ZWQgYSBmYXRhbCBzaWduYWwNCj4g d2hpbGUgbmZzNF9wbmZzX2RzX2Nvbm5lY3QgaXMgd2FpdGluZyBpbiBuZnM0X3dhaXRfZHNfY29u bmVjdCgpLg0KPiBJbiB0aGF0IGNhc2UgLT5kc19jbHAgbWF5IG5vdCBiZSBzZXQsIGFuZCB0aGUg ZGV2aWQgbWF5IG5vdCByZWNlbnRseSBoYXZlIGJlZW4gbWFya2VkDQo+IHVuYXZhaWxhYmxlLg0K PiANCj4gU28gYWRkIGEgdGVzdCBmb3IgZHNfY2xwID09IE5VTEwgYW5kIHJldHVybiBOVUxMIGlu IHRoYXQgY2FzZS4NCj4gDQo+IEZpeGVzOiBjMjMyNjZkNTMyYjQgKCJORlM0LjEgRml4IGRhdGEg c2VydmVyIGNvbm5lY3Rpb24gcmFjZSIpDQo+IFNpZ25lZC1vZmYtYnk6IE5laWxCcm93biA8bmVp bGJAc3VzZS5jb20+DQo+IEFja2VkLWJ5OiBPbGdhIEtvcm5pZXZza2FpYSA8YWdsb0B1bWljaC5l ZHU+DQo+IEFja2VkLWJ5OiBBZGFtc29uLCBBbmR5IDxXaWxsaWFtLkFkYW1zb25AbmV0YXBwLmNv bT4NCj4gLS0tDQo+IA0KPiBIaSBUcm9uZCwNCj4gSSBqdXN0IG5vdGljZWQgdGhhdCB0aGlzIHdh c24ndCBpbiB5b3VyIDQuMTAgcHVsbCByZXF1ZXN0LiAgU28gSSd2ZQ0KPiBhZGRlZCBBY2tlZC1i eXMgZnJvbSBBbmR5IGFuZCBPbGdhIGFuZCBhbSByZXNlbmRpbmcuDQo+IA0KDQpTb3JyeSBmb3Ig aGF2aW5nIG1pc3NlZCB0aGVpciBBY2tlZC1ieXMgZWFybGllcuKApiBJ4oCZbGwgc2VuZCB0aGlz IHRvZ2V0aGVyIHdpdGggdGhlIG90aGVyIDEwIHBhdGNoZXMgdG9tb3Jyb3cuDQoNCkNoZWVycw0K ICBUcm9uZA0KDQo= ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2016-12-19 0:28 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-10-21 0:01 [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success NeilBrown 2016-11-18 5:01 ` NeilBrown 2016-11-18 14:32 ` Trond Myklebust 2016-11-18 17:34 ` Olga Kornievskaia 2016-11-19 6:33 ` NeilBrown 2016-11-22 20:32 ` Adamson, Andy 2016-11-23 17:37 ` Olga Kornievskaia 2016-12-19 0:19 ` [PATCH resend] " NeilBrown 2016-12-19 0:28 ` Trond Myklebust
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).