* questions about the linux NFS 4.1 client and persistent sessions
From: guy keren @ 2020-10-10 20:39 UTC
To: linux-nfs

hi,

my name is guy keren, and the company i work at is looking at
implementing an NFS 4.1 server for our existing storage product.

during the design, we encountered some issues with high-availability
and persistent sessions handling by the linux NFS client, and i would
like to understand a few things about the linux NFS client - i read all
relevant material on www.linux-nfs.org, and spent a while reading the
relevant recovery code in the nfs4.1 client kernel sources, but i am
missing some things (a pointer to the relevant part in the recovery code
would be appreciated as well):

1. suppose there is a persistent session that got disconnected (because
of a server restart, for example). i see that the client is re-sending
all the in-flight commands as part of the recovery. however, suppose
that one of the commands was a compound command containing 2 requests,
and the reply to the first of them was NFS4_OK, and to the 2nd it was
NFS4ERR_DELAY - will the client's code know that, after it finishes
recovery of the session and creates a new session, it needs to re-send
the 2nd request in this compound command? the broader question is about
a compound with N commands, where the first X have an NFS4_OK reply and
the last N-X have NFS4ERR_DELAY - will the client re-send a new compound
with the last N-X commands after establishing a new session?

2. if there is a non-persistent session, on which the client sent a
non-idempotent request (e.g. rename of a file into a different
directory), and the server restarted before the client received the
response - will the client just blindly re-send the same request again
after establishing a new session, or will it take some measures to
attempt to understand whether the command was already executed? i.e. if
the server already executed the rename, then re-sending it will return a
failure to locate the source file handle (because it moved to a new
directory). does the linux NFS client attempt to recover from this, or
will it simply return an error to the application layer?

3. what NFS server with persistent sessions is used (or was used) when
testing the persistent sessions support in the linux NFS client? the
linux NFS server, as far as i understood, cannot support persistent
sessions (due to lack of assured persistent memory).

thanks in advance,
--guy keren
Vast Data
* Re: questions about the linux NFS 4.1 client and persistent sessions
From: J. Bruce Fields @ 2020-10-14 19:26 UTC
To: guy keren
Cc: linux-nfs

On Sat, Oct 10, 2020 at 11:39:30PM +0300, guy keren wrote:
> during the design, we encountered some issues with high-availability
> and persistent sessions handling by the linux NFS client, and i
> would like to understand a few things about the linux NFS client - i
> read all relevant material on www.linux-nfs.org, and spent a while
> reading the relevant recovery code in the nfs4.1 client kernel
> sources, but i am missing some things (a pointer to the relevant
> part in the recovery code will be appreciated as well):
>
> 1. suppose there is a persistent session that got disconnected
> (because of a server restart, for example). i see that the client is
> re-sending all the in-flight commands as part of the recovery.
> however, suppose that one of the commands was a compound command
> containing 2 requests, and the reply to the first of them was
> NFS4_OK, and to the 2nd it was NFS4ERR_DELAY - will the client's code
> know that after it finishes recovery of the session - then when it
> creates a new session, it needs to re-send the 2nd request in this
> compound command?

If the client received the reply, it shouldn't have to resend the
compound at all.

If the client didn't see the reply, it will resend the whole compound.
Its behavior won't be affected by how the compound failed, since it
can't know that.

> the broader question is about a
> compound with N commands, where the first X have an NFS4_OK reply
> and the last N-X have NFS4ERR_DELAY

The server always stops processing a compound at the first failure, so
N-X is always <= 1.

> - will the client re-send a new
> compound with the last N-X commands after establishing a new
> session?

A resend by definition is a resend of exactly the same compound. The
client won't break it into pieces in that way.

(And typical compounds can't be broken up that way anyway--often earlier
ops in the compound are things like PUTFH's that supply required
information to later ops.)

> 2. if there is a non-persistent session, on which the client sent a
> non-idempotent request (e.g. rename of a file into a different
> directory), and the server restarted before the client received the
> response - will the client just blindly re-send the same request
> again after establishing a new session, or will it take some
> measures to attempt to understand whether the command was already
> executed? i.e. if the server already executed the rename, then
> re-sending it will return a failure to locate the source file handle
> (because it moved to a new directory).

In a rename of A/X to B/Y, the source filehandle refers to the directory
"A", so that filehandle will still work. You might get an NFS4ERR_NOENT
if there's nothing at A/X any more, and you could guess that meant the
rename succeeded. But it could equally well be that your rename was
never executed, and it's somebody else's rename or unlink that caused
A/X to no longer exist. Similarly, the rename of A/X might have executed
but another operation might have immediately created something else at
A/X.

> does the linux NFS client
> attempt to recover from this, or will it simply return an error to
> the application layer?

I suspect that's all any client does. You can imagine all sorts of
complicated heuristics, but none of them will be 100% right. Persistent
sessions are what you really need to fix this kind of bug.

> 3. what NFS server with persistent sessions is used (or was used)
> when testing the persistent sessions support in the linux NFS
> client? the linux NFS server, as far as i understood, cannot support
> persistent sessions (due to lack of assured persistent memory).

I don't think any special hardware is necessary. Or if it is, we could
just disable the feature in the absence of that hardware. Mainly what
we need is some cooperation from the filesystem--some way they can ID
particular operations so the server can ask the filesystem if a
particular operation was committed to disk. I talked to the XFS
developers about it informally and they seemed open to the idea, but
they need some sort of explanation of the requirements and I haven't
gotten around to it....

--b.
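The compound semantics described above (ops are evaluated in order, processing stops at the first non-OK status, and a resend repeats the whole compound) can be illustrated with a toy model. This is a minimal sketch, not server code: each op is assumed to be a callable that does its work and returns a status name, and the status names are plain strings rather than the protocol's numeric codes.

```python
from typing import Callable, List, Tuple

NFS4_OK = "NFS4_OK"

# Toy model (assumption): an op is any callable that performs its work and
# returns a status name such as "NFS4_OK" or "NFS4ERR_DELAY".
Op = Callable[[], str]


def process_compound(ops: List[Op]) -> Tuple[str, List[str]]:
    """Evaluate ops in order and stop at the first failure.

    Returns the overall status (that of the last op evaluated) and the
    per-op statuses. Ops after a failing op are never evaluated, so at
    most one entry in the result list can be non-OK, and only the last.
    """
    results: List[str] = []
    for op in ops:
        status = op()
        results.append(status)
        if status != NFS4_OK:
            break  # the remaining ops are skipped and get no result
    overall = results[-1] if results else NFS4_OK
    return overall, results
```

With a reply such as ["NFS4_OK", "NFS4_OK", "NFS4ERR_DELAY"], the side effects of the first two ops have already happened on the server, which is why the question of retrying only part of a compound comes up at all.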
* Re: questions about the linux NFS 4.1 client and persistent sessions
From: Guy Keren @ 2020-10-17 20:40 UTC
To: J. Bruce Fields
Cc: linux-nfs

hi Bruce,

thanks for the response. this opens up a few questions about things i
thought i understood initially, so i re-read parts of the NFS 4.1 RFC
(RFC 5661), and i would like to clarify some things further. see
answers below:

On Wed, Oct 14, 2020 at 10:27 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Sat, Oct 10, 2020 at 11:39:30PM +0300, guy keren wrote:
> > during the design, we encountered some issues with high-availability
> > and persistent sessions handling by the linux NFS client, and i
> > would like to understand a few things about the linux NFS client - i
> > read all relevant material on www.linux-nfs.org, and spent a while
> > reading the relevant recovery code in the nfs4.1 client kernel
> > sources, but i am missing some things (a pointer to the relevant
> > part in the recovery code will be appreciated as well):
> >
> > 1. suppose there is a persistent session that got disconnected
> > (because of a server restart, for example). i see that the client is
> > re-sending all the in-flight commands as part of the recovery.
> > however, suppose that one of the commands was a compound command
> > containing 2 requests, and the reply to the first of them was
> > NFS4_OK, and to the 2nd it was NFS4ERR_DELAY - will the client's code
> > know that after it finishes recovery of the session - then when it
> > creates a new session, it needs to re-send the 2nd request in this
> > compound command?
>
> If the client received the reply, it shouldn't have to resend the
> compound at all.
>
> If the client didn't see the reply, it will resend the whole compound.
> Its behavior won't be affected by how the compound failed, since it
> can't know that.

according to what you wrote here, an NFS4ERR_DELAY response is something
that needs to be sent at the level of the entire compound request - i.e.
the server is not allowed to send a compound response where the first
few requests have a status of NFS4_OK, while the last has a status of
NFS4ERR_DELAY.

i tried looking for exactly where the spec specifies the possibility of
the server sending an NFS4ERR_DELAY, and one example is on delegation
recall. i am quoting a paragraph from section 10.2 of the spec:

===================
On recall, the client holding the delegation needs to flush modified
state (such as modified data) to the server and return the
delegation. The conflicting request will not be acted on until the
recall is complete. The recall is considered complete when the client
returns the delegation or the server times its wait for the delegation
to be returned and revokes the delegation as a result of the timeout.
In the interim, the server will either delay responding to conflicting
requests or respond to them with NFS4ERR_DELAY. Following the
resolution of the recall, the server has the information necessary to
grant or deny the second client's request.
===========================

according to what you say, if the OPEN request is in the middle of the
compound request, and is preceded by state-modifying requests (e.g.
creation of other files, writes into other open handles, renames,
etc.), then the server must avoid processing them until it has recalled
the delegation to the file. i.e. before it starts processing, it must
examine the entire compound to make sure it won't need to send an
NFS4ERR_DELAY response for any of the requests inside it, and it must
also lock the state of all files involved in the request, to avoid
another client acquiring a delegation on any of the files in the request
that have an OPEN request in the same compound. alternatively, it must
not send an NFS4ERR_DELAY response, and instead just keep the request
pending until the delegation recall has completed.

do i understand you correctly here?

> > the broader question is about a
> > compound with N commands, where the first X have an NFS4_OK reply
> > and the last N-X have NFS4ERR_DELAY
>
> The server always stops processing a compound at the first failure, so
> N-X is always <= 1.

granted.

> > - will the client re-send a new
> > compound with the last N-X commands after establishing a new
> > session?
>
> A resend by definition is a resend of exactly the same compound. The
> client won't break it into pieces in that way.
>
> (And typical compounds can't be broken up that way anyway--often earlier
> ops in the compound are things like PUTFH's that supply required
> information to later ops.)

i would assume that the same mechanism used to create the compound
request in the first place (adding the PUTFH in front, etc.) could be
used to re-build a smaller compound request - provided that the client
knows which requests from the compound were already completed and which
were not.

but i understand that there's no such mechanism today in the linux NFS
client kernel - which is what i initially asked - so that clarifies
things.

> > 2. if there is a non-persistent session, on which the client sent a
> > non-idempotent request (e.g. rename of a file into a different
> > directory), and the server restarted before the client received the
> > response - will the client just blindly re-send the same request
> > again after establishing a new session, or will it take some
> > measures to attempt to understand whether the command was already
> > executed? i.e. if the server already executed the rename, then
> > re-sending it will return a failure to locate the source file handle
> > (because it moved to a new directory).
>
> In a rename of A/X to B/Y, the source filehandle refers to the directory
> "A", so that filehandle will still work. You might get an NFS4ERR_NOENT
> if there's nothing at A/X any more, and you could guess that meant the
> rename succeeded. But it could equally well be that your rename was
> never executed, and it's somebody else's rename or unlink that caused
> A/X to no longer exist. Similarly, the rename of A/X might have executed
> but another operation might have immediately created something else at
> A/X.

i see. understood.

> > does the linux NFS client
> > attempt to recover from this, or will it simply return an error to
> > the application layer?
>
> I suspect that's all any client does. You can imagine all sorts of
> complicated heuristics, but none of them will be 100% right. Persistent
> sessions are what you really need to fix this kind of bug.

what about a situation in which, instead of a server restart event, the
client just disconnected before receiving a rename response, and
re-connected with the same session to the same server? in that case, i
presume that the Linux NFS client will re-send the compound request, and
get the results from the server's Duplicate Request Cache, without
returning errors to the application. correct?

> > 3. what NFS server with persistent sessions is used (or was used)
> > when testing the persistent sessions support in the linux NFS
> > client? the linux NFS server, as far as i understood, cannot support
> > persistent sessions (due to lack of assured persistent memory).
>
> I don't think any special hardware is necessary. Or if it is, we could
> just disable the feature in the absence of that hardware. Mainly what
> we need is some cooperation from the filesystem--some way they can ID
> particular operations so the server can ask the filesystem if a
> particular operation was committed to disk. I talked to the XFS
> developers about it informally and they seemed open to the idea, but
> they need some sort of explanation of the requirements and I haven't
> gotten around to it....

you might also need the file system to be aware of delegations at some
level, in order to break delegations held by NFS4 clients when a local
application attempts to open a file in a conflicting manner.

and this doesn't answer the original question: how was the "persistent
sessions" support in the linux NFS 4.1 client tested?

when i tried to find an NFS 4.1 server that supports "persistent
sessions", i first went to NetApp - and doing a "node takeover"
operation on it revealed that the session is unknown on the 2nd node -
making it practically irrelevant for such scenarios (unless there is
some way to change the behaviour of this feature to behave more like
SMB3 CA volumes).

>
> --b.

on an aside - i see that you are also the maintainer of the pynfs test
suite. would you be interested in patches fixing its install operation,
and if yes - should we send them to this mailing list, or directly to
you? i failed to find a mailing list dedicated to pynfs development.

thanks,
--guy keren
Vast Data
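The section 10.2 behaviour quoted above (a conflicting request is either held or answered with NFS4ERR_DELAY until the delegation is returned or revoked) can be modelled roughly as follows. This is a hypothetical sketch: the class and method names are invented for illustration, a single delegation per file is assumed, and the CB_RECALL callback and the revoke-on-timeout path are reduced to a comment and a plain method.

```python
from typing import Dict, Set


class DelegationTable:
    """Hypothetical server-side table: file name -> client holding a delegation."""

    def __init__(self) -> None:
        self.holders: Dict[str, str] = {}
        self.recalls_in_flight: Set[str] = set()

    def grant(self, client: str, name: str) -> None:
        self.holders[name] = client

    def open(self, client: str, name: str) -> str:
        holder = self.holders.get(name)
        if holder is not None and holder != client:
            if name not in self.recalls_in_flight:
                # A real server would send CB_RECALL to `holder` here.
                self.recalls_in_flight.add(name)
            # This sketch takes the NFS4ERR_DELAY option instead of
            # holding the request until the recall completes.
            return "NFS4ERR_DELAY"
        return "NFS4_OK"

    def delegreturn(self, client: str, name: str) -> None:
        if self.holders.get(name) == client:
            del self.holders[name]
            self.recalls_in_flight.discard(name)

    def revoke_on_timeout(self, name: str) -> None:
        # Called if the holder does not return the delegation in time.
        self.holders.pop(name, None)
        self.recalls_in_flight.discard(name)
```

If such an open() sits in the middle of a compound, the ops before it in the same compound have already executed by the time NFS4ERR_DELAY is returned.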
* Re: questions about the linux NFS 4.1 client and persistent sessions
From: J. Bruce Fields @ 2020-10-17 21:14 UTC
To: Guy Keren
Cc: linux-nfs

On Sat, Oct 17, 2020 at 11:40:09PM +0300, Guy Keren wrote:
> according to what you wrote here, an NFS4ERR_DELAY response is
> something that needs to be sent at the level of the entire compound
> request - i.e. the server is not allowed to send a compound response
> where the first few requests have a status of NFS4_OK, while the last
> has a status of NFS4ERR_DELAY.

Oh, no, it's absolutely fine for a server to do that.

Sorry, you mentioned persistent sessions, so I assumed somehow this was
about retries after crashes or reboots, where the client may not have
received the reply and doesn't know whether it executed.

> according to what you say, if the OPEN request is in the middle of the
> compound request, and is preceded by state-modifying requests (e.g.
> creation of other files, writes into other open handles, renames,
> etc.), then the server must avoid processing them until it has recalled
> the delegation to the file. i.e. before it starts processing, it must
> examine the entire compound to make sure it won't need to send an
> NFS4ERR_DELAY response for any of the requests inside it, and it must
> also lock the state of all files involved in the request, to avoid
> another client acquiring a delegation on any of the files in the request
> that have an OPEN request in the same compound. alternatively, it must
> not send an NFS4ERR_DELAY response, and instead just keep the request
> pending until the delegation recall has completed.

No, sorry for the confusion, you're correct: if the client had a bunch
of non-idempotent ops all in one compound, and got a DELAY partway
through, then, yes, it would have to deal with retrying only the part
that didn't execute.

I don't know of any client that actually does that, for what it's worth.
The Linux client, for example, doesn't send any compounds that I can
think of that have more than one non-idempotent op.

> i would assume that the same mechanism used to create the compound
> request in the first place (adding the PUTFH in front, etc.) could be
> used to re-build a smaller compound request - provided that the client
> knows which requests from the compound were already completed and which
> were not.
>
> but i understand that there's no such mechanism today in the linux NFS
> client kernel - which is what i initially asked - so that clarifies
> things.

Right, in theory you could imagine clients doing very general things
with compounds. In practice I don't know of any that do.

(Not that that allows a spec-compliant server to assume they won't.)

> what about a situation in which, instead of a server restart event, the
> client just disconnected before receiving a rename response, and
> re-connected with the same session to the same server? in that case, i
> presume that the Linux NFS client will re-send the compound request, and
> get the results from the server's Duplicate Request Cache, without
> returning errors to the application. correct?

Right, assuming the client managed to hang on to its lease.

> and this doesn't answer the original question: how was the "persistent
> sessions" support in the linux NFS 4.1 client tested?

I don't know, sorry.

> on an aside - i see that you are also the maintainer of the pynfs test
> suite. would you be interested in patches fixing its install operation,
> and if yes - should we send them to this mailing list, or directly to
> you? i failed to find a mailing list dedicated to pynfs development.

Just send them to me, cc'd to this list. Thanks!

--b.
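The disconnect-and-reconnect answer above rests on the session's reply cache: each request carries a slot ID and a sequence ID in its SEQUENCE op, and a retransmission with the same slot and sequence ID is answered from the cache instead of being executed again (RFC 5661, section 2.10.6). The sketch below is a simplified, hypothetical model of that check. It assumes every reply is cached (the real protocol lets the client request caching per call through sa_cachethis), uses string status names, and ignores slot-table sizing and the uncached-retry case.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

SEQID_MOD = 2 ** 32  # sequence IDs are 32-bit and wrap around


@dataclass
class Slot:
    seqid: int = 0  # so the first request on a slot must use seqid 1
    cached_reply: Optional[str] = None


class SessionReplayCache:
    """Hypothetical per-session slot table with a reply cache."""

    def __init__(self, nslots: int) -> None:
        self.slots: List[Slot] = [Slot() for _ in range(nslots)]

    def sequence(self, slotid: int, seqid: int, execute: Callable[[], str]) -> str:
        slot = self.slots[slotid]
        if seqid == (slot.seqid + 1) % SEQID_MOD:
            # New request: execute it once and remember the reply.
            reply = execute()
            slot.seqid = seqid
            slot.cached_reply = reply
            return reply
        if seqid == slot.seqid and slot.cached_reply is not None:
            # Retransmission: replay the cached reply, do not re-execute.
            return slot.cached_reply
        return "NFS4ERR_SEQ_MISORDERED"
```

The replay only works while the server still holds the session and its slot table, which is why the answer is conditional on the client keeping its lease. Note also that a cached NFS4ERR_DELAY reply is replayed as NFS4ERR_DELAY; getting past it takes a new request with the next sequence ID.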
* Re: questions about the linux NFS 4.1 client and persistent sessions
From: Guy Keren @ 2020-10-18 11:18 UTC
To: J. Bruce Fields
Cc: linux-nfs

On Sun, Oct 18, 2020 at 12:14 AM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Sat, Oct 17, 2020 at 11:40:09PM +0300, Guy Keren wrote:
> > according to what you wrote here, an NFS4ERR_DELAY response is
> > something that needs to be sent at the level of the entire compound
> > request - i.e. the server is not allowed to send a compound response
> > where the first few requests have a status of NFS4_OK, while the last
> > has a status of NFS4ERR_DELAY.
>
> Oh, no, it's absolutely fine for a server to do that.
>
> Sorry, you mentioned persistent sessions, so I assumed somehow this was
> about retries after crashes or reboots, where the client may not have
> received the reply and doesn't know whether it executed.
>
> > according to what you say, if the OPEN request is in the middle of the
> > compound request, and is preceded by state-modifying requests (e.g.
> > creation of other files, writes into other open handles, renames,
> > etc.), then the server must avoid processing them until it has recalled
> > the delegation to the file. i.e. before it starts processing, it must
> > examine the entire compound to make sure it won't need to send an
> > NFS4ERR_DELAY response for any of the requests inside it, and it must
> > also lock the state of all files involved in the request, to avoid
> > another client acquiring a delegation on any of the files in the request
> > that have an OPEN request in the same compound. alternatively, it must
> > not send an NFS4ERR_DELAY response, and instead just keep the request
> > pending until the delegation recall has completed.
>
> No, sorry for the confusion, you're correct: if the client had a bunch
> of non-idempotent ops all in one compound, and got a DELAY partway
> through, then, yes, it would have to deal with retrying only the part
> that didn't execute.

actually, it is my understanding that, with persistent sessions, the
client has no way to distinguish between a temporary network connection
loss and a server restart, if the server stores the client state
(client_id and all stateids) in a persistent store.

so suppose that the client sent two 'Open' requests in one compound. the
server finished processing the first, but then hit a conflicting
delegation on the 2nd one, so it is supposed to return NFS4_OK for the
first Open and NFS4ERR_DELAY for the 2nd Open (and this is also the
compound response that the server will store in its Duplicate Request
Cache). if the server had a temporary network disconnection, or a server
restart, then when the client re-connects and re-sends this compound
request, it receives the response from the server's Duplicate Request
Cache (with OK for the first Open and DELAY for the 2nd). then, i
presume that the client needs to accept that the first Open already
succeeded, and when creating a new session, re-send only the 2nd Open
request. does this make sense?

>
> I don't know of any client that actually does that, for what it's worth.
> The Linux client, for example, doesn't send any compounds that I can
> think of that have more than one non-idempotent op.

does it mean that the linux NFS 4.1 client will also never send two
Write requests in the same compound? and never send an Open request
which might create a file together with a Write request in the same
compound? because, even though these requests are not necessarily
non-idempotent, it could be that one of them was executed while the next
one was not (at least according to the spec, the server might return
NFS4ERR_DELAY for all of the NFS 4.1 request types).

>
> > i would assume that the same mechanism used to create the compound
> > request in the first place (adding the PUTFH in front, etc.) could be
> > used to re-build a smaller compound request - provided that the client
> > knows which requests from the compound were already completed and which
> > were not.
> >
> > but i understand that there's no such mechanism today in the linux NFS
> > client kernel - which is what i initially asked - so that clarifies
> > things.
>
> Right, in theory you could imagine clients doing very general things
> with compounds. In practice I don't know of any that do.
>
> (Not that that allows a spec-compliant server to assume they won't.)
>
> > what about a situation in which, instead of a server restart event, the
> > client just disconnected before receiving a rename response, and
> > re-connected with the same session to the same server? in that case, i
> > presume that the Linux NFS client will re-send the compound request, and
> > get the results from the server's Duplicate Request Cache, without
> > returning errors to the application. correct?
>
> Right, assuming the client managed to hang on to its lease.

right. which will be the case if the server doesn't revoke state
immediately upon lease expiration, and no other client performed
conflicting requests.

>
> > and this doesn't answer the original question: how was the "persistent
> > sessions" support in the linux NFS 4.1 client tested?
>
> I don't know, sorry.

ok, thanks.

>
> > on an aside - i see that you are also the maintainer of the pynfs test
> > suite. would you be interested in patches fixing its install operation,
> > and if yes - should we send them to this mailing list, or directly to
> > you? i failed to find a mailing list dedicated to pynfs development.
>
> Just send them to me, cc'd to this list. Thanks!

ok. we'll clean up what we have and send it within a few days.

thanks.

>
> --b.

--guy keren
Vast Data
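The condition above for the replay to work (the server does not revoke state as soon as the lease expires, and no other client has made a conflicting request) describes a server that drops expired state lazily, only when someone else needs it. Below is a hypothetical sketch of that policy; the class, its method names and the single conflict hook are illustrative assumptions, and times are plain numbers in seconds.

```python
from typing import Dict, Set


class LazyLeaseServer:
    """Hypothetical server that keeps a client's expired state until it conflicts."""

    def __init__(self, lease_time: float) -> None:
        self.lease_time = lease_time
        self.last_renew: Dict[str, float] = {}
        self.revoked: Set[str] = set()

    def renew(self, client: str, now: float) -> bool:
        """A request from the client renews its lease unless its state was revoked."""
        if client in self.revoked:
            return False
        self.last_renew[client] = now
        return True

    def expired(self, client: str, now: float) -> bool:
        return now - self.last_renew[client] > self.lease_time

    def conflict(self, holder: str, now: float) -> str:
        """Another client wants state that `holder` has."""
        if self.expired(holder, now):
            self.revoked.add(holder)  # only now is the expired state dropped
            return "revoked"
        return "wait"  # the holder's lease is still valid: recall or delay instead
```

In this model a client that reconnects after its lease has run out still finds its state (and so its session's cached replies) intact, as long as conflict() was never called against it in the meantime.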
* Re: questions about the linux NFS 4.1 client and persistent sessions
From: J. Bruce Fields @ 2020-10-19 18:14 UTC
To: Guy Keren
Cc: linux-nfs

On Sun, Oct 18, 2020 at 02:18:55PM +0300, Guy Keren wrote:
> so suppose that the client sent two 'Open' requests in one compound. the
> server finished processing the first, but then hit a conflicting
> delegation on the 2nd one, so it is supposed to return NFS4_OK for the
> first Open and NFS4ERR_DELAY for the 2nd Open (and this is also the
> compound response that the server will store in its Duplicate Request
> Cache). if the server had a temporary network disconnection, or a server
> restart, then when the client re-connects and re-sends this compound
> request, it receives the response from the server's Duplicate Request
> Cache (with OK for the first Open and DELAY for the 2nd). then, i
> presume that the client needs to accept that the first Open already
> succeeded, and when creating a new session, re-send only the 2nd Open
> request. does this make sense?

Sounds right.

> > I don't know of any client that actually does that, for what it's worth.
> > The Linux client, for example, doesn't send any compounds that I can
> > think of that have more than one non-idempotent op.
>
> does it mean that the linux NFS 4.1 client will also never send two
> Write requests in the same compound? and never send an Open request
> which might create a file together with a Write request in the same
> compound?

"Will never" might be a little strong--maybe there'll be a reason to do
it some day. A server should be prepared to handle it. But the client
doesn't currently do either of those things.

--b.
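The client-side step agreed on above (accept the ops that already succeeded and resend only the one that got NFS4ERR_DELAY) requires the client to rebuild a smaller compound that still supplies the current filehandle. The sketch below is hypothetical and relies on a strong assumption: the original compound was built from segments of the form [PUTFH fh, OP arg], so each op takes its filehandle from the nearest PUTFH before it. Ops are plain (name, argument) tuples, and the SEQUENCE op is left out; the caller would prepend a fresh SEQUENCE with a new sequence ID, since reusing the old one would only replay the cached reply.

```python
from typing import List, Tuple

# Hypothetical encoding of a compound op: (operation name, argument).
Op = Tuple[str, str]


def ops_to_resend(ops: List[Op], results: List[str]) -> List[Op]:
    """Return the ops a client would resend after a partial failure.

    `results` holds the per-op statuses from the reply. Because the server
    stops at the first failure, only the last entry can be non-OK.
    Assumes the failing op takes its filehandle from the nearest PUTFH
    before it, and that nothing between that PUTFH and the failing op is
    non-idempotent, so resending from that PUTFH is safe.
    """
    if not results or results[-1] == "NFS4_OK":
        return []  # everything that was evaluated succeeded
    start = len(results) - 1  # index of the failing op
    while start > 0 and ops[start][0] != "PUTFH":
        start -= 1  # walk back to the PUTFH that sets its filehandle
    return ops[start:]
```

For the two-OPEN example in the thread, a reply of [OK, OK, OK, DELAY] against [PUTFH dirA, OPEN x, PUTFH dirB, OPEN y] yields [PUTFH dirB, OPEN y]. A real client would also back off before retrying, since NFS4ERR_DELAY means the server cannot complete the op yet.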