From: Boaz Harrosh <bharrosh@panasas.com>
To: Andy Adamson <andros@netapp.com>
Cc: Benny Halevy <bhalevy@panasas.com>,
"linux-nfs@vger.kernel.org Mailing list"
<linux-nfs@vger.kernel.org>, Fred Isaman <iisaman@netapp.com>
Subject: Re: [PATCH 3/5] SQUASHME: pnfs-submit: move pnfs_layout_suspend back to nfs_inode
Date: Wed, 30 Jun 2010 20:17:48 +0300 [thread overview]
Message-ID: <4C2B7C3C.20408@panasas.com> (raw)
In-Reply-To: <85483B53-9B0E-416E-A346-47FEED313F14@netapp.com>
On 06/30/2010 07:38 PM, Andy Adamson wrote:
>
> On Jun 30, 2010, at 11:56 AM, Boaz Harrosh wrote:
>
>> On 06/30/2010 06:13 PM, Andy Adamson wrote:
>>>
>>> On Jun 30, 2010, at 10:49 AM, Boaz Harrosh wrote:
>>>
>>>> On 06/29/2010 07:42 PM, andros@netapp.com wrote:
>>>>> From: Fred Isaman <iisaman@netapp.com>
>>>>>
>>>>> Prepare to embed struct pnfs_layout_tupe into layout driver private
>>>>> structs
>>>>> and to make layout a pointer. Since a temp error on first LAYOUTGET
>>>>> erases the layout, the suspend timer needs to be moved.
>>>>>
>>>>
>>>> OK Fred, Andy. (I'm also referring to the 1/5 and 2/5 issues here)
>>>>
>>>> There is a total mess up. LAYOUTGET has nothing to do with layout
>>>> (.eg struct pnfs_layout_type) and the layout must not be deallocated
>>>> if there is any kind of error and/or if IO is actually to be done
>>>> with
>>>> MDS or not. LAYOUTGET corresponds to layout_seg held on a list in
>>>> layout.
>>>> The list might be empty and so it's flags and state.
>>>
>>> pnfs_layout_type is the layout cache head structure which has no
>
> Hi Boaz
>
>>
>> Why are you calling this "cache" head? Are you referring to the
>> segments list .ie layout->lo_layouts. For me it's a list head
>> why is it a cache ?
>
> It's a cache of layout segments implemented as a list. We may at some
> point use a b-tree - whatever.
>
First of all It's not my day. layout->lo_layouts is that client list
head which keeps the global list of nfsi's that have layouts.
So we are talking about pnfs_layout_type->segs (Why don't you fix me ;-))
cache for me is a bag-of-objects that is kept around for later use even
though I was done with it. Well it might be that. We could theoretically
call layout_get for every IO and immediately return it after commit, so
we keep it until close or destroy, so it's a cache. (I never looked at
it this way. I always look at it as a file's state. now I'm open, now I'm
active IO...)
OK got you
>>
>>> business in nfsi unless there are layout segments populating it.
>>>
>>
>> This is where we disagree. You say layout is lo_layouts. I say
>> it is more, it is that plus the state and additional information.
>> An empty list is not a none-existent list. There are two states
>> here.
>
> The pnfs_layout_state is in the nfsi. For file layout, there is no
> need for pnfs_layout_type when there are no layout segments.
>
There is in the generic layer. The before-layout-get state and the
after-layout-return state. The layout-segments "potential" is part
of the game. see your own patches. why did you have to leak some of
these layout members to outside?
> What does the object layout driver need in pnfs_layout_type (and/or in
> private data) when there are no layout segments?
>
Nothing
>>
>>>>
>>>> Half of the problem was before and this here made it worse. The
>>>> life time of nfsi->layout should be the same as nfsi itself just
>>>> as it was before.
>>>
>>> No, the nfs_inode exists with no regard to pnfs_layout_type (horrible
>>> name).
>>>
>>
>> Yes
>>
>>>>
>>>> Once the life_time of nfsi && nfsi->layout are unified then all your
>>>> problems go away because you are not checking for existence of nfsi
>>>> anywhere right? that's because there is no such problem.
>>>
>>> No, the problems don't go away simply because you statically allocate
>>> a cache head struct. We have problems with races and reference
>>> counting that exist independently of how a structure is allocated.
>>>
>>> We don't need to check for the existence of nfsi, The inode is
>>> guaranteed to exist until nfs_destroy_inode is called by the VFS.
>>>
>>
>> And so if you restrict the life time of layout with nfsi your problem
>> will go away as well, no?
>
>>
>>>>
>>>> Look at it this way. If a layout_driver is in the game it gets to
>>>> allocate it's own area in NFS_I. part of that area is for generic
>>>> pnfs.c use, i.e struct pnfs_layout_type, the rest is for private
>>>> use.
>>>> once existing it is part of the NFS_I life up till death.
>>>
>>> No. It should only exists when needed - just like nfsi->nfs_acl and
>>> nfsi->delegation etc.
>>>
>>
>> It is the same problem with what we had with commit. We had it all
>> over
>> the place and had races refcounting and shit for ever, until we
>> simplified
>> it and moved it to the proper NFS checkpoint.
>>
>> Here too we can just do much better and drop lots of accounting by
>> saying:
>>
>> If this nfsi is eligible for pnfs_io and layouts. We allocate a
>> layout
>> governing structure for it. That will be freed at the end together
>> with
>> nfsi. These that never need it like directories links and so on
>> will not
>> have one.
>
> What accounting are you removing with your scheme? If at
> nfs_destroy_inode you have LAYOUTRETURNS (or any LAYOUTXXX) on the
> wire, you have to wait until they are resolved. Exactly the same
> accounting as with my patch - which by the way, did not change any
> refcounting so I'm confused as to your complaint. Note that prior to
> my patches, pnfs_layout_type->lo_data was freed at the last reference!
> All I did was include the pnfs_layout_type because just like the
> lo_data, it is no longer needed.
>
Yes the pnfs_layout_type->lo_data was half used as a state variable.
We don't have to do that. We can just keep it around up until nfsi
destructor. This one is surly properly ref-counted, right. So we can
just drop our private ref-counting.
>>
>> Now if that first layout_get() never gets through, well nu, we only
>> used
>> it to say "Don't have any".
>
> A NULL nfsi->layout and/or a state in nfsi->pnfs_layout_state can
> indicate "don't have any layout segments in the cache".
>
> If pNFS is never used, why have the pnfs_layout_type? Should
> NFSv3/4.0/4.1 mounts have all these extra fields in the inode (which
> is a precious resource)?
>
What? I thought it is understood that these patches was to remove that,
sure, and I agree.
For all modes other then pnfs+regular-rw-files this structure pointer is NULL.
For pnfs-IO-potential files this structure is allocated, together with all
it's members and glory.
so:
* nfsi->layout == NULL - This file will not need pnfs (v234 other)
* nfsi->layout->segs == EMPTY - Before layout_get or after layout_return
* nfsi->layout->XXX - other layout state global to any segments.
>>
>> Don't you think that is a much simpler model. Why should one shoot
>> one's foot?
>
> No, I don't think it's any simpler. Allocating the structure when it's
> needed and freeing it when it's not is the normal way to do this -
> just like delegations.
>
I don't know about delegations. Just here. What I'm saying is lets collapse
some of the extra stuff, by using existing life-time rules.
You know what, it looks like this is a big jump for you right now. Leave it
like that. Lets run and stabilize this stage. I will put it on my todo list
to try and sanitize it farther before we submit.
> I do agree that we need to review the refcounting.
>
> -->Andy
>
>>
Thanks, this all is moving in the right directions, eventually we will
get there.
Boaz
next prev parent reply other threads:[~2010-06-30 17:17 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-06-29 16:42 [PATCH 0/5] pnfs-submit: embed pnfs_layout_type in per layout structure andros
2010-06-29 16:42 ` [PATCH 1/5] SQUASHME pnfs-submit: add state flag for layoutcommit_needed andros
2010-06-29 16:42 ` [PATCH 2/5] SQUASHME: pnfs_submit: move pnfs_layout_state back to nfs_inode andros
2010-06-29 16:42 ` [PATCH 3/5] SQUASHME: pnfs-submit: move pnfs_layout_suspend " andros
2010-06-29 16:42 ` [PATCH 4/5] SQUASHME pnfs-submit embed pnfs_layout_type andros
2010-06-29 16:42 ` [PATCH 5/5] SQUASHME pnfs-submit: filelayout: use new alloc_layout API andros
2010-06-30 9:06 ` Boaz Harrosh
2010-06-30 13:42 ` Andy Adamson
2010-06-30 10:02 ` [PATCH 4/5] SQUASHME pnfs-submit embed pnfs_layout_type Boaz Harrosh
2010-06-30 19:43 ` Andy Adamson
2010-06-30 14:49 ` [PATCH 3/5] SQUASHME: pnfs-submit: move pnfs_layout_suspend back to nfs_inode Boaz Harrosh
2010-06-30 15:13 ` Andy Adamson
2010-06-30 15:56 ` Boaz Harrosh
2010-06-30 16:38 ` Andy Adamson
2010-06-30 17:17 ` Boaz Harrosh [this message]
2010-06-30 9:05 ` [PATCH 2/5] SQUASHME: pnfs_submit: move pnfs_layout_state " Boaz Harrosh
2010-06-30 9:31 ` Boaz Harrosh
2010-06-30 13:48 ` Fred Isaman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4C2B7C3C.20408@panasas.com \
--to=bharrosh@panasas.com \
--cc=andros@netapp.com \
--cc=bhalevy@panasas.com \
--cc=iisaman@netapp.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.