* Re: state lock elimination
[not found] ` <20131016064243.GA28758@infradead.org>
@ 2013-10-16 8:21 ` Benny Halevy
2013-10-16 18:22 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Benny Halevy @ 2013-10-16 8:21 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: bfields, NFS list
On 2013-10-16 09:42, Christoph Hellwig wrote:
>> A lot of crap ended up under the client_mutex that we need to untangle.
>> My general direction is:
>> - use a spin lock for all non-blocking sections (rename recall_lock to state_lock for that)
>
> Any chance we can avoid global locks for most of this? Global locks
> really suck for scalability, and in nfsd we should be able to do most
> things per-export at least.
>
I tried reducing the lock scope, but the problem is that we hash state
on objects with different scopes of access, particularly net/client vs. file,
and the hashing needs to be consistent and atomic.
>> - for state mutating nfs4 ops like open/close/lock/... use wait_on_bit to serialize access
>> *per open owner* rather than globally.
>
> per open owner sounds good. Not a huge fan of bit locks as they make
> debugging very hard, though.
I'll try using a refcount and mutex first as per Bruce's current direction with stateowners.
>
> I'll look over your changes soon, btw.
>
Thanks!
The WIP version is in git://linux-nfs/~bhalevy/linux-pnfs.git state-lock-elim
(if anyone else is interested in previewing the patchset in its current drafty state.)
Benny
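The refcount-plus-mutex direction mentioned above might look roughly like the following userspace sketch (pthread-based; the structure and function names such as openowner_get/openowner_put are hypothetical, not the actual nfsd code):

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace analogue of a refcounted open owner with a
 * per-owner mutex serializing open/close/lock ops for that owner.
 * In nfsd the refcount would be protected by the global state lock;
 * here a pthread mutex stands in for it. */
struct openowner {
	int so_ref;              /* protected by state_lock */
	pthread_mutex_t so_lock; /* serializes state-mutating ops per owner */
};

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

struct openowner *openowner_alloc(void)
{
	struct openowner *oo = calloc(1, sizeof(*oo));

	if (!oo)
		return NULL;
	oo->so_ref = 1;
	pthread_mutex_init(&oo->so_lock, NULL);
	return oo;
}

struct openowner *openowner_get(struct openowner *oo)
{
	pthread_mutex_lock(&state_lock);
	oo->so_ref++;
	pthread_mutex_unlock(&state_lock);
	return oo;
}

void openowner_put(struct openowner *oo)
{
	int last;

	pthread_mutex_lock(&state_lock);
	last = (--oo->so_ref == 0);
	pthread_mutex_unlock(&state_lock);
	if (last) {
		pthread_mutex_destroy(&oo->so_lock);
		free(oo);
	}
}
```

A caller would take a reference with openowner_get(), grab so_lock around the open/close/lock processing, then drop both; contention is then per open owner instead of global.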
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: nfs4_file usage
[not found] ` <20131016064649.GB28758@infradead.org>
@ 2013-10-16 8:45 ` Benny Halevy
2013-10-16 8:51 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Benny Halevy @ 2013-10-16 8:45 UTC (permalink / raw)
To: Christoph Hellwig, bfields; +Cc: NFS list
[adding linux-nfs to the Cc]
On 2013-10-16 09:46, Christoph Hellwig wrote:
> On Wed, Oct 16, 2013 at 09:04:12AM +0300, Benny Halevy wrote:
>> On 2013-10-16 00:14, Christoph Hellwig wrote:
>>> it seems like the pnfs tree uses struct nfs4_file for something entirely
>>> different than the current nfsd usage of it - in fact none of the actual
>>> fields except for the refcount, hashing and inode reference seems to be
>>> shared.
>>
>> The dlm patchset adds these members:
>> + struct list_head fi_lo_states;
>>
>> The fi_lo_states (coupled with nfs4_client.cl_lo_states) use case is equivalent
>> to fi_delegations.
>> But that said, it might make sense to move the respective layout state handling
>> to nfs4state.c.
>>
>> + struct nfs4_fsid fi_fsid;
>> fi_fsid is a shorthand for fh_export->ex_fsid and it would be expensive to look
>> it up on demand (return and recall by fsid)
>>
>> Other uses in the full pnfsd patchset add:
>> + struct mutex fi_lo_lock;
>> for inter-locking layout changing ops (layout get/commit/return) across
>> down calls to the file system.
>> I need to see if that can/should be moved to the layout state structure instead.
>>
>> + u32 fi_fhlen;
>> + u8 fi_fhval[NFS4_FHSIZE];
>>
>> Similar to fi_fsid, a shorthand for layout recall by file (common case).
>> I don't see a better place to put it.
>
> Maybe I wasn't clear enough in the first mail. pnfs still needs a
> structure that shadows struct inode with all the fields it adds. I just
> don't think overlaying it over nfs4_file makes sense, but rather have
> something separate a la:
>
> struct pnfsd_inode {
> 	atomic_t		pi_ref;
> 	struct hlist_node	pi_hash;
> 	struct list_head	pi_lo_states;
I'm reluctant about the layout stateids, as they share hashing data structures,
locking infrastructure, and code with all other nfsd4 stateids.
I hope that the pnfs layer can just carry a pointer to the nfs4_file
to be passed to functions living in nfs4state.c that control the layout state.
> 	struct mutex		pi_lo_lock;
> 	struct nfs4_fsid	pi_fsid;
> 	u32			pi_fhlen;
> 	u8			pi_fhval[NFS4_FHSIZE];
> };
>
We can have a hash table for pnfsd_inode similar to the nfs4_file one.
Note that nfs4_file is also per inode, not per file as its name might suggest.
Maybe moving it to nfs4state.c also warrants renaming it to something more accurate.
And while there, change file_hashval to hash on i_ino rather than the inode pointer.
Benny
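A per-inode hash table of the kind proposed for pnfsd_inode might be sketched like this in userspace C (the bucket count, field names, and find_or_alloc_pnfsd_inode helper are all made up for illustration, and locking is omitted):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace sketch of a pnfsd_inode hash keyed by the
 * inode pointer, in the spirit of the existing nfs4_file hash. */
#define PI_HASH_BITS 8
#define PI_HASH_SIZE (1u << PI_HASH_BITS)

struct pnfsd_inode {
	const void *pi_inode;        /* stands in for struct inode * */
	struct pnfsd_inode *pi_next; /* hash chain */
	int pi_ref;
};

static struct pnfsd_inode *pi_hashtbl[PI_HASH_SIZE];

/* Mix the pointer bits before bucketing, in the spirit of hash_ptr(). */
static unsigned int pi_hashval(const void *inode)
{
	uintptr_t v = (uintptr_t)inode;

	v ^= v >> PI_HASH_BITS;
	v *= 0x9e3779b9u; /* golden-ratio multiplier */
	return (unsigned int)(v >> (sizeof(v) * 8 - PI_HASH_BITS)) &
	       (PI_HASH_SIZE - 1);
}

/* Look up the per-inode structure, allocating it on first use. */
struct pnfsd_inode *find_or_alloc_pnfsd_inode(const void *inode)
{
	unsigned int b = pi_hashval(inode);
	struct pnfsd_inode *pi;

	for (pi = pi_hashtbl[b]; pi; pi = pi->pi_next) {
		if (pi->pi_inode == inode) {
			pi->pi_ref++;
			return pi;
		}
	}
	pi = calloc(1, sizeof(*pi));
	if (!pi)
		return NULL;
	pi->pi_inode = inode;
	pi->pi_ref = 1;
	pi->pi_next = pi_hashtbl[b];
	pi_hashtbl[b] = pi;
	return pi;
}
```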
* Re: nfs4_file usage
2013-10-16 8:45 ` nfs4_file usage Benny Halevy
@ 2013-10-16 8:51 ` Christoph Hellwig
2013-10-16 12:18 ` Benny Halevy
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2013-10-16 8:51 UTC (permalink / raw)
To: Benny Halevy; +Cc: Christoph Hellwig, bfields, NFS list
On Wed, Oct 16, 2013 at 11:45:28AM +0300, Benny Halevy wrote:
> I'm reluctant about the layout stateids, as they share hashing data structures,
> locking infrastructure, and code with all other nfsd4 stateids.
The layout stateids are another thing that confuses me in the current
pnfs code. Depending on what we find, the generic stateid might be
embedded into the layout stateid or they might be separate, but there
don't seem to be clear rules on how to deal with freeing things in the
latter case. I haven't brought this up because I didn't dig deeply
enough into it, but it's certainly more than confusing.
> We can have a hash table for pnfsd_inode similar to the nfs4_file one.
Yes, that was the plan.
> Note that nfs4_file is also per inode, not per file as its name might suggest.
> Maybe moving it to nfs4state.c also warrants renaming it to something more accurate.
Yes, it should have inode instead of file in the name, and it really
should have an nfsd4_ prefix instead of using the client namespace. Given
that it's open/locking specific, I'm not even sure the name should be
that generic.
> And while there, change file_hashval to hash on i_ino rather than the inode pointer.
That's a bad idea. i_ino is 32-bit only, while many NFS-exported
filesystems have 64-bit inode numbers. Hashing on kernel pointers
generally is a fairly good idea, and we do it for the inode in other
places, too.
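The point about hashing pointers can be illustrated in userspace: same-size heap objects are aligned, so the low address bits carry little entropy and must not be used directly as a bucket index. A golden-ratio multiply followed by taking the high bits, mimicking the kernel's hash_ptr()/hash_long() helpers (the function below is an illustrative sketch, not the kernel implementation), spreads them out:

```c
#include <stdint.h>

/* Sketch of hash_ptr()-style pointer hashing: multiply by the 32-bit
 * golden-ratio constant and keep the top `bits` bits, so that pointers
 * differing only in a few middle bits land in different buckets. */
static unsigned int hash_ptr_sketch(const void *p, unsigned int bits)
{
	uint32_t v = (uint32_t)(uintptr_t)p;

	return (v * 0x9e3779b9u) >> (32 - bits);
}
```

Two 4 KiB-aligned addresses such as 0x1000 and 0x2000 collide in every bucket under a naive low-bit mask, but hash into different buckets after mixing.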
* Re: nfs4_file usage
2013-10-16 8:51 ` Christoph Hellwig
@ 2013-10-16 12:18 ` Benny Halevy
2013-10-16 12:33 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Benny Halevy @ 2013-10-16 12:18 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: bfields, NFS list
On 2013-10-16 11:51, Christoph Hellwig wrote:
> On Wed, Oct 16, 2013 at 11:45:28AM +0300, Benny Halevy wrote:
>> I'm reluctant about the layout stateids, as they share hashing data structures,
>> locking infrastructure, and code with all other nfsd4 stateids.
>
> The layout stateids are another thing that confuses me in the current
> pnfs code. Depending on what we find, the generic stateid might be
> embedded into the layout stateid or they might be separate, but there
> don't seem to be clear rules on how to deal with freeing things in the
> latter case. I haven't brought this up because I didn't dig deeply
> enough into it, but it's certainly more than confusing.
>
The nfs4.1 layout stateid is like any other stateid (open/lock/deleg),
but its lifetime is a bit different: it does not descend from the
open stateid (though the protocol requires the client to use the
open stateid to acquire the first layout for the file), and it is released
via either LAYOUTRETURN or CLOSE (in the "return_on_close" case).
The rules here are different enough from open/lock/deleg stateids that
pnfsd needs special code to manage the actual layout state
under the layout stateid and manipulate it accordingly.
>> We can have a hash table for pnfsd_inode similar to the nfs4_file one.
>
> Yes, that was the plan.
>
>> Note that nfs4_file is also per inode, not per file as its name might suggest.
>> Maybe moving it to nfs4state.c also warrants renaming it to something more accurate.
>
> Yes, it should have inode instead of file in the name, and it really
> should have an nfsd4_ prefix instead of using the client namespace. Given
> that it's open/locking specific, I'm not even sure the name should be
> that generic.
Yeah, the server naming conventions are confusing and in many cases
conflict with client names (file names, data types, functions, etc.)
>
>> And while there, change file_hashval to hash on i_ino rather than the inode pointer.
>
> That's a bad idea. i_ino is 32-bit only, while many NFS-exported
> filesystems have 64-bit inode numbers. Hashing on kernel pointers
> generally is a fairly good idea, and we do it for the inode in other
> places, too.
>
How many bits in the kernel address space are actually unique?
I am under the impression that pointers differ mostly in their least significant bits anyway.
Regardless, if it works well now, there's no reason to change it.
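The layout stateid lifetime described earlier in this message can be condensed into a small state sketch in userspace C (the enum, struct, and function names are purely illustrative, not the pnfsd code): the stateid becomes active on the first LAYOUTGET and is torn down by LAYOUTRETURN, or by CLOSE when return_on_close was set.

```c
#include <stdbool.h>

/* Minimal lifetime model for a layout stateid. */
enum lo_state { LO_NONE, LO_ACTIVE };

struct layout_stateid {
	enum lo_state ls_state;
	bool ls_return_on_close;
};

/* First LAYOUTGET creates the layout state. */
void lo_layoutget(struct layout_stateid *ls, bool return_on_close)
{
	if (ls->ls_state == LO_NONE) {
		ls->ls_state = LO_ACTIVE;
		ls->ls_return_on_close = return_on_close;
	}
}

/* LAYOUTRETURN always releases the layout state. */
void lo_layoutreturn(struct layout_stateid *ls)
{
	ls->ls_state = LO_NONE;
}

/* CLOSE releases it only in the return_on_close case. */
void lo_close(struct layout_stateid *ls)
{
	if (ls->ls_return_on_close)
		ls->ls_state = LO_NONE;
}
```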
* Re: nfs4_file usage
2013-10-16 12:18 ` Benny Halevy
@ 2013-10-16 12:33 ` Christoph Hellwig
2013-10-16 12:48 ` Benny Halevy
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2013-10-16 12:33 UTC (permalink / raw)
To: Benny Halevy; +Cc: Christoph Hellwig, bfields, NFS list
On Wed, Oct 16, 2013 at 03:18:21PM +0300, Benny Halevy wrote:
> The nfs4.1 layout stateid is like any other stateid (open/lock/deleg),
> but its lifetime is a bit different: it does not descend from the
> open stateid (though the protocol requires the client to use the
> open stateid to acquire the first layout for the file), and it is released
> via either LAYOUTRETURN or CLOSE (in the "return_on_close" case).
> The rules here are different enough from open/lock/deleg stateids that
> pnfsd needs special code to manage the actual layout state
> under the layout stateid and manipulate it accordingly.
I'm not primarily confused about the protocol, even if that isn't all
that clear either. I'm more worried about the code in and around
nfs4_find_create_layout_stateid. I have looked a bit deeper into it
and I think the main problem is that struct nfs4_stid isn't refcounted
by itself, which really confused me, as I expected nfsd4_lookup_stateid
to return a reference to it.
> How many bits in the kernel address space are unique?
> I am under the impression that they'd differ mostly in their least significant bits anyway.
> Regardless, if it works well now then no reason to change it.
In theory the hash_ptr helper is there to take care of exactly that sort
of issue; in practice I can't see it used anywhere there right now.
Before that becomes an issue we'll have to replace that dumb hash with a
more scalable data structure anyway.
* Re: nfs4_file usage
2013-10-16 12:33 ` Christoph Hellwig
@ 2013-10-16 12:48 ` Benny Halevy
0 siblings, 0 replies; 7+ messages in thread
From: Benny Halevy @ 2013-10-16 12:48 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: bfields, NFS list
On 2013-10-16 15:33, Christoph Hellwig wrote:
> On Wed, Oct 16, 2013 at 03:18:21PM +0300, Benny Halevy wrote:
>> The nfs4.1 layout stateid is like any other stateid (open/lock/deleg),
>> but its lifetime is a bit different: it does not descend from the
>> open stateid (though the protocol requires the client to use the
>> open stateid to acquire the first layout for the file), and it is released
>> via either LAYOUTRETURN or CLOSE (in the "return_on_close" case).
>> The rules here are different enough from open/lock/deleg stateids that
>> pnfsd needs special code to manage the actual layout state
>> under the layout stateid and manipulate it accordingly.
>
> I'm not primarily confused about the protocol, even if that isn't all
> that clear either. I'm more worried about the code in and around
> nfs4_find_create_layout_stateid. I have looked a bit deeper into it
> and I think the main problem is that struct nfs4_stid isn't refcounted
> by itself, which really confused me, as I expected nfsd4_lookup_stateid
> to return a reference to it.
>
True. That is likely to change with the state lock elimination project,
where a stateid would get refcounted once found and its pointer
returned to the caller.
* Re: state lock elimination
2013-10-16 8:21 ` state lock elimination Benny Halevy
@ 2013-10-16 18:22 ` Christoph Hellwig
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2013-10-16 18:22 UTC (permalink / raw)
To: Benny Halevy; +Cc: Christoph Hellwig, bfields, NFS list
On Wed, Oct 16, 2013 at 11:21:28AM +0300, Benny Halevy wrote:
> I tried reducing the lock scope but the problem is that we hash state
> on objects with different scopes of access, particularly net/client vs. file
> and the hashing needs to be consistent and atomic.
Lists, hashes, or similar lookup data structures are generally not the
hard part, as it's enough to lock the data structure that the hash or
list hangs off. No need to involve locks in the structure pointed to.
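That lookup pattern, combined with the refcount-once-found idea from earlier in the thread, might look like this userspace sketch (pthread-based; struct stid's fields and the lookup_stateid name are illustrative, not the nfsd code): only the list head is locked, and a reference is taken on the found object before the lock is dropped, so the pointed-to structure needs no lock of its own for lookup.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical refcounted stateid on a lock-protected list. */
struct stid {
	int sc_id;
	int sc_ref;           /* protected by hash_lock */
	struct stid *sc_next; /* chain protected by hash_lock */
};

static pthread_mutex_t hash_lock = PTHREAD_MUTEX_INITIALIZER;
static struct stid *stid_list;

/* Find a stateid by id; on success the caller owns a reference. */
struct stid *lookup_stateid(int id)
{
	struct stid *s;

	pthread_mutex_lock(&hash_lock);
	for (s = stid_list; s; s = s->sc_next) {
		if (s->sc_id == id) {
			s->sc_ref++; /* pin before unlocking */
			break;
		}
	}
	pthread_mutex_unlock(&hash_lock);
	return s;
}
```

Because the reference is taken while hash_lock is still held, the object cannot be freed between lookup and use; the caller drops the reference when done.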
[not found] <20131015211445.GA23636@infradead.org>
[not found] ` <525DB943.3010707@primarydata.com>
[not found] ` <20131016064243.GA28758@infradead.org>
2013-10-16 8:21 ` state lock elimination Benny Halevy
2013-10-16 18:22 ` Christoph Hellwig
[not found] ` <525E2C5C.4020104@primarydata.com>
[not found] ` <20131016064649.GB28758@infradead.org>
2013-10-16 8:45 ` nfs4_file usage Benny Halevy
2013-10-16 8:51 ` Christoph Hellwig
2013-10-16 12:18 ` Benny Halevy
2013-10-16 12:33 ` Christoph Hellwig
2013-10-16 12:48 ` Benny Halevy