From: Boaz Harrosh <bharrosh@panasas.com>
To: andros@netapp.com
Cc: bhalevy@panasas.com, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 0/16] pnfs-submit fix layout allocation and reference counting
Date: Thu, 08 Jul 2010 13:16:40 +0300
Message-ID: <4C35A588.7040103@panasas.com>
In-Reply-To: <1278542063-4009-1-git-send-email-andros@netapp.com>

On 07/08/2010 01:34 AM, andros@netapp.com wrote:

Hi Andy, 

> The current nfs_inode has an embedded pnfs_layout_type structure, with per
> layout type private data allocated. Change nfs_inode->layout to be a pointer
> to a pnfs_layout_type structure, embed the pnfs_layout_type in the per
> layout type structure, and allocate both.
> 

Amen
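
A minimal user-space sketch of the embedding described above: the
generic layout header sits inside the per-layout-type structure, one
allocation covers both, and the inode keeps only a pointer to the
header. Every toy_* name and field here is hypothetical and stands in
for the real pnfs structures; this is an illustration, not the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the generic pnfs_layout_type header. */
struct toy_layout_hdr {
	int refcount;
};

/* Hypothetical per-layout-type structure embedding the generic header. */
struct toy_filelayout {
	struct toy_layout_hdr hdr;	/* embedded, not pointed to */
	unsigned int stripe_unit;	/* stand-in for private data */
};

/* Hypothetical inode: holds only a pointer to the generic header. */
struct toy_nfs_inode {
	struct toy_layout_hdr *layout;
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* One allocation covers both the generic header and the private data. */
static struct toy_layout_hdr *toy_alloc_layout(unsigned int stripe_unit)
{
	struct toy_filelayout *fl = calloc(1, sizeof(*fl));

	if (!fl)
		return NULL;
	fl->hdr.refcount = 1;
	fl->stripe_unit = stripe_unit;
	return &fl->hdr;
}

/* Layout-type code gets its private structure back from the header. */
static struct toy_filelayout *toy_filelayout_of(struct toy_layout_hdr *lo)
{
	return toy_container_of(lo, struct toy_filelayout, hdr);
}

/* Freeing the enclosing structure frees the header with it. */
static void toy_free_layout(struct toy_layout_hdr *lo)
{
	assert(lo != NULL);
	free(toy_filelayout_of(lo));
}
```

Generic code only ever sees the header pointer; the layout-type code
converts it back when it needs its private fields.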

> The current pnfs_layout_type allocation waits on a bit lock to handle
> concurrent allocation attempts. Replace this with the normal form.
> 

Why don't we allocate this once, at inode allocation or at inode
iget(), and be done with it? Why the fights, races, and error handling?
On a normal pnfs-able mount, five-nines percent of I/O operations need
a layout_type structure. Let's optimize the fast path. In the rare
"error" cases where we could not get any lsegs and never ended up
using nfsi->layout, who cares that we allocated an extra 20 bytes?

> The current pnfs_layout_type reference counting is very unclear, and one
> instance of put_layout was called outside the i_lock, which probably was
> causing the intermittent pnfs_layout_type refcount bug we've been seeing.
> 
> Replace the nfs_inode->layout reference counting with the following scheme:
> 
> As in the current code, the pnfs_layout_type reference counting is always done
> with the inode->i_lock held.
> 
> The nfs_inode->layout comes into existence when the first layout_segment is
> cached and stays until inode is destroyed.
> 

I see that you have thought about my proposal. I have not looked at the
patches yet, will do soon.

So I have a brave question. If nfs_inode->layout is only freed at
inode-destroy, why do we need to ref-count it? Refcounting is for
holding things so they don't go away. But since the nfsi->layout now
stays until the very end, what's the point?

If you really think it is possible that the layout is held longer than
the inode itself then, 1- this is surely a bug, I'm not sure you can do
much with a layout with a dangling inode pointer. 2- Fine, then just
take the inode reference. If you equate the lifetime of these two
objects, why not use the ref that is already there?
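
To make the point concrete, here is a rough user-space sketch in which
the layout is allocated together with the inode and has no counter of
its own; anyone who needs the layout to stay around pins the inode
instead. The toy_* names are hypothetical, the counter is not atomic,
and no locking is modeled.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_layout {
	int nsegs;			/* stand-in for the cached segments */
};

struct toy_inode {
	int i_count;			/* stand-in for the inode's own refcount */
	struct toy_layout *layout;	/* lives exactly as long as the inode */
};

/* Allocate the layout up front, together with the inode. */
static struct toy_inode *toy_inode_alloc(void)
{
	struct toy_inode *inode = calloc(1, sizeof(*inode));

	if (!inode)
		return NULL;
	inode->layout = calloc(1, sizeof(*inode->layout));
	if (!inode->layout) {
		free(inode);
		return NULL;
	}
	inode->i_count = 1;
	return inode;
}

/* Pin the inode; the layout is pinned along with it. */
static struct toy_inode *toy_igrab(struct toy_inode *inode)
{
	assert(inode->i_count > 0);
	inode->i_count++;
	return inode;
}

/* Drop a reference. Returns 1 if this destroyed the inode and its layout. */
static int toy_iput(struct toy_inode *inode)
{
	assert(inode->i_count > 0);
	if (--inode->i_count > 0)
		return 0;
	free(inode->layout);
	free(inode);
	return 1;
}
```

With this shape there is no separate layout lifetime to get wrong: the
layout pointer is valid for as long as the caller holds the inode.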

> 1) alloc nfs_inode->layout:
>         - Initialized to 1. This holds it around for the clp->cl_layouts
>         (layout->lo_layouts) list.
> 
> 2) Each layoutget
>    layoutget    GET
>    layoutget release PUT
> 
> 3) insert lseg into nfs_inode->layout->segs GET
>    remove lseg from nfs_inode->layout->segs  PUT
> 
> I/O - no reference (except the lseg is used and referenced in 3)
> 
> 4) Each layoutcommit references the layout which keeps it around while in use
>    by the call which could race with layoutreturn
> 
>    layoutcommit GET
>    layoutcommit release PUT
> 
> 5) Each layoutreturn references the layout which keeps it around while in use
>    by the call.
> 
>    layoutreturn  GET
>    layoutreturn release  PUT
> 

2, 4, 5 - surely the inode ref-count is taken during the RPC?
3 - inode gone during I/O? I don't think so.
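
For reference, the quoted GET/PUT scheme boils down to something like
the following user-space sketch. The toy_* names are hypothetical, and
the i_lock that the scheme requires around every GET and PUT is not
modeled here.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_layout {
	int refcount;
	int on_client_list;	/* stand-in for membership in clp->cl_layouts */
};

/* Step 1: allocate with refcount 1, held on behalf of the client list. */
static struct toy_layout *toy_layout_alloc(void)
{
	struct toy_layout *lo = calloc(1, sizeof(*lo));

	if (!lo)
		return NULL;
	lo->refcount = 1;
	lo->on_client_list = 1;
	return lo;
}

/* GET: layoutget, lseg insertion, layoutcommit, layoutreturn. */
static void toy_get_layout(struct toy_layout *lo)
{
	assert(lo->refcount > 0);
	lo->refcount++;
}

/*
 * PUT: the matching release paths, plus inode destruction for the
 * initial reference. Returns 1 if the count went 1->0 and the layout
 * was unlisted and freed.
 */
static int toy_put_layout(struct toy_layout *lo)
{
	assert(lo->refcount > 0);
	if (--lo->refcount > 0)
		return 0;
	lo->on_client_list = 0;	/* would be the list removal */
	free(lo);
	return 1;
}
```

Every GET here has to be matched by exactly one PUT on every exit
path, which is the bookkeeping I am asking whether we need at all.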

> 6) inode destruction (usually umount)
> 
>    Destroy_layout PUT to balance initial allocation where it is set to 1.
> 

Inodes are destroyed when they are evicted from the cache and/or at the
last reference drop. Also at umount, but you should start testing like
I do: run a "git clone" and you'll see inodes start to be destroyed
27 seconds into the operation.

> When the reference moves from 1->0 the layout is removed from the nfs_client
> cl_layouts list and freed.
> 

The nfs_client->cl_layouts operations can be moved to the first/last lseg
insertion/removal, as an optimization step.
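
Roughly what I mean, as a user-space sketch: the layout joins the
client's list when its first lseg is inserted and leaves it when the
last one is removed. The toy_* names are hypothetical, a counter
stands in for the real cl_layouts list, and locking is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_client {
	int nr_layouts;		/* stand-in for the length of cl_layouts */
};

struct toy_layout {
	struct toy_client *clp;
	int nsegs;		/* number of cached layout segments */
	bool on_client_list;
};

/* Insert a segment; the first one puts the layout on the client list. */
static void toy_insert_lseg(struct toy_layout *lo)
{
	if (lo->nsegs++ == 0) {
		lo->on_client_list = true;
		lo->clp->nr_layouts++;
	}
}

/* Remove a segment; the last one takes the layout off the client list. */
static void toy_remove_lseg(struct toy_layout *lo)
{
	assert(lo->nsegs > 0);
	if (--lo->nsegs == 0) {
		lo->on_client_list = false;
		lo->clp->nr_layouts--;
	}
}
```

A layout with no segments then never shows up on the client list, so
anything walking that list only sees layouts that actually hold lsegs.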

> 
> Change nfs_inode->layout to be a pointer to a pnfs_layout_type structure.
> 
> 0001-SQUASHME-pnfs-submit-add-state-flag-for-layoutcommit.patch
> 0002-SQUASHME-pnfs-submit-move-pnfs_layout_suspend-back-t.patch
> 0003-SQUASHME-pnfs-submit-embed-pnfs_layout_type.patch
> 0004-SQUASHME-pnfs-submit-filelayout-use-new-alloc-free_l.patch
> 
> Rewrite the pnfs_layout_type allocation and reference counting
> 
> 0005-SQUASHME-pnfs-submit-rewrite-layout-allocation.patch
> 0006-SQUASHME-pnfs-submit-fix-pnfs_update_layout-referenc.patch
> 0007-SQUASHME-pnfs_submit-don-t-get-a-reference-on-bounda.patch
> 0008-SQUASHME-pnfs-submit-don-t-reference-the-layout-in-i.patch
> 0009-SQUASHME-pnfs-submit-pnfs_update_layout-always-refer.patch
> 0010-SQUASHME-pnfs-submit-reference-the-layout-when-inser.patch
> 0011-SQUASHME-pnfs-submit-rename-put_layout-to-put_layout.patch
> 0012-SQUASHME-pnfs-submit-reference-layout-across-layoutc.patch
> 0013-SQUASHME-pnfs-submit-reference-layout-for-layoutretu.patch
> 0014-SQUASHME-pnfs-submit-remove-put_layout-from-pnfs_fre.patch
> 0015-SQUASHME-pnfs-submit-do-not-reference-a-layout-in-de.patch
> 0016-SQUASHME-pnfs-submit-remove-grab_current_layout.patch
> 
> Testing:
> 
> CONFIG_NFS_V4_1 set:
> Connectathon tests pass against GFS2/pNFS and pyNFS file layout server. Tested
> with layout return-on-close off and on.
> 
> CONFIG_NFS_V4_1 not set:
> NFSv4.0 mount passes Connectathon tests.
> 
> -->Andy

Thanks, Andy, for this work. The code is finally getting clearer.

Boaz
