All of lore.kernel.org
 help / color / mirror / Atom feed
From: Benny Halevy <bhalevy@panasas.com>
To: Fred Isaman <iisaman@netapp.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH 07/12] RFC: pnfs: full mount/umount infrastructure
Date: Mon, 20 Sep 2010 19:43:11 +0200	[thread overview]
Message-ID: <4C979D2F.7050405@panasas.com> (raw)
In-Reply-To: <AANLkTingsdMNbEdUNqA72QJvhsgqd+gXoCeZ=e5wZEqt@mail.gmail.com>

On 2010-09-20 18:21, Fred Isaman wrote:
> On Mon, Sep 20, 2010 at 10:24 AM, Benny Halevy <bhalevy@panasas.com> wrote:
>> On 2010-09-18 05:17, Fred Isaman wrote:
>>> From: The pNFS Team <linux-nfs@vger.kernel.org>
>>>
>>> Allow a module implementing a layout type to register, and
>>> have its mount/umount routines called for filesystems that
>>> the server declares support it.
>>>
>>> Signed-off-by: TBD - melding/reorganization of several patches
>>> ---
>>>  Documentation/filesystems/nfs/00-INDEX |    2 +
>>>  Documentation/filesystems/nfs/pnfs.txt |   48 ++++++++++++++++++
>>>  fs/nfs/Kconfig                         |    2 +-
>>>  fs/nfs/pnfs.c                          |   82 +++++++++++++++++++++++++++++++-
>>>  fs/nfs/pnfs.h                          |    9 ++++
>>>  5 files changed, 140 insertions(+), 3 deletions(-)
>>>  create mode 100644 Documentation/filesystems/nfs/pnfs.txt
>>>
>>> diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX
>>> index 2f68cd6..8d930b9 100644
>>> --- a/Documentation/filesystems/nfs/00-INDEX
>>> +++ b/Documentation/filesystems/nfs/00-INDEX
>>> @@ -12,5 +12,7 @@ nfs-rdma.txt
>>>       - how to install and setup the Linux NFS/RDMA client and server software
>>>  nfsroot.txt
>>>       - short guide on setting up a diskless box with NFS root filesystem.
>>> +pnfs.txt
>>> +     - short explanation of some of the internals of the pnfs code
>>
>> that is, pnfs _client_ code...
>>
> 
> OK
> 
> 
>>>  rpc-cache.txt
>>>       - introduction to the caching mechanisms in the sunrpc layer.
>>> diff --git a/Documentation/filesystems/nfs/pnfs.txt b/Documentation/filesystems/nfs/pnfs.txt
>>> new file mode 100644
>>> index 0000000..bc0b9cf
>>> --- /dev/null
>>> +++ b/Documentation/filesystems/nfs/pnfs.txt
>>> @@ -0,0 +1,48 @@
>>> +Reference counting in pnfs:
>>> +==========================
>>> +
>>> +The are several inter-related caches.  We have layouts which can
>>> +reference multiple devices, each of which can reference multiple data servers.
>>> +Each data server can be referenced by multiple devices.  Each device
>>> +can be referenced by multiple layouts.  To keep all of this straight,
>>> +we need to reference count.
>>> +
>>> +
>>> +struct pnfs_layout_hdr
>>> +----------------------
>>> +The on-the-wire command LAYOUTGET corresponds to struct
>>> +pnfs_layout_segment, usually referred to by the variable name lseg.
>>> +Each nfs_inode may hold a pointer to a cache of of these layout
>>> +segments in nfsi->layout, of type struct pnfs_layout_hdr.
>>> +
>>> +We reference the header for the inode pointing to it, across each
>>> +outstanding RPC call that references it (LAYOUTGET, fs/nfs/LAYOUTRETURN,
>>> +LAYOUTCOMMIT), and for each lseg held within.
>>> +
>>> +Each header is also (when non-empty) put on a list associated with
>>> +struct nfs_client (cl_layouts).  Being put on this list does not bump
>>> +the reference count, as the layout is kept around by the lseg that
>>> +keeps it in the list.
>>> +
>>> +deviceid_cache
>>> +--------------
>>> +lsegs reference device ids, which are resolved per nfs_client and
>>> +layout driver type.  The device ids are held in a RCU cache (struct
>>> +nfs4_deviceid_cache).  The cache itself is referenced across each
>>> +mount.  The entries (struct nfs4_deviceid) themselves are held across
>>> +the lifetime of each lseg referencing them.
>>> +
>>> +RCU is used because the deviceid is basically a write once, read many
>>> +data structure.  The hlist size of 32 buckets needs better
>>> +justification, but seems reasonable given that we can have multiple
>>> +deviceid's per filesystem, and multiple filesystems per nfs_client.
>>> +
>>> +The hash code is copied from the nfsd code base.  A discussion of
>>> +hashing and variations of this algorithm can be found at:
>>> +http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809
>>> +
>>> +data server cache
>>> +-----------------
>>> +file driver devices refer to data servers, which are kept in a module
>>> +level cache.  Its reference is held over the lifetime of the deviceid
>>> +pointing to it.
>>> diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
>>> index 6c2aad4..5f1b936 100644
>>> --- a/fs/nfs/Kconfig
>>> +++ b/fs/nfs/Kconfig
>>> @@ -78,7 +78,7 @@ config NFS_V4_1
>>>       depends on NFS_V4 && EXPERIMENTAL
>>>       help
>>>         This option enables support for minor version 1 of the NFSv4 protocol
>>> -       (draft-ietf-nfsv4-minorversion1) in the kernel's NFS client.
>>> +       (RFC 5661) in the kernel's NFS client.
>>>
>>>         If unsure, say N.
>>>
>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>> index 2e5dba1..5a8a676 100644
>>> --- a/fs/nfs/pnfs.c
>>> +++ b/fs/nfs/pnfs.c
>>> @@ -32,16 +32,53 @@
>>>
>>>  #define NFSDBG_FACILITY              NFSDBG_PNFS
>>>
>>> -/* STUB that returns the equivalent of "no module found" */
>>> +/* Locking:
>>> + *
>>> + * pnfs_spinlock:
>>> + *      protects pnfs_modules_tbl.
>>> + */
>>> +static DEFINE_SPINLOCK(pnfs_spinlock);
>>> +
>>> +/*
>>> + * pnfs_modules_tbl holds all pnfs modules
>>> + */
>>> +static LIST_HEAD(pnfs_modules_tbl);
>>> +
>>> +/* Return the registered pnfs layout driver module matching given id */
>>> +static struct pnfs_layoutdriver_type *
>>> +find_pnfs_driver_locked(u32 id) {
>>
>> nit: the curly brace should be moved down a line
> 
> OK
> 
> 
>>
>>> +     struct  pnfs_layoutdriver_type *local;
>>> +
>>> +     dprintk("PNFS: %s: Searching for %u\n", __func__, id);
>>
>> I'd move this printk down, before returning and print
>> the result as well.
>>
> 
> OK
> 
> 
>>> +     list_for_each_entry(local, &pnfs_modules_tbl, pnfs_tblid)
>>> +             if (local->id == id) {
>>> +                     if (!try_module_get(local->owner))
>>> +                             local = NULL;
>>
>> If this happens (e.g. in the case the module exited without
>> calling pnfs_unregister_layoutdriver) another (or a different instance)
>> layout driver might have registered on the same id so we need to keep
>> looking.
> 
> There can only be a single driver registered with a particular id.  So
> continuing the search is pointless.
> 
> 

After this conditions happens,
find_pnfs_driver_locked will not find the current one
so next time pnfs_register_layoutdriver is called, it may
register a new layout driver with the same id
and then there will be two entries with the same id.
One, for which try_module_get returns zero and another one
which you will never find if you don't keep looking for it.

Benny

>>
>>> +                     goto out;
>>> +             }
>>> +     local = NULL;
>>> +out:
>>> +     return local;
>>> +}
>>
>> how about the following?
>>
>> static struct pnfs_layoutdriver_type *
>> find_pnfs_driver_locked(u32 id)
>> {
>>        struct pnfs_layoutdriver_type *local, *found = NULL;
>>
>>        list_for_each_entry (local, &pnfs_modules_tbl, pnfs_tblid)
>>                if (local->id == id) {
>>                        if (!try_module_get(local->owner))
>>                                continue;
>>                        found = local;
>>                        break;
>>                }
>> out:
>>        dprintk("%s: layout_type %u found %p\n", __func__, id, found);
>>        return found;
>> }
>>
>>> +
>>>  static struct pnfs_layoutdriver_type *
>>>  find_pnfs_driver(u32 id) {
>>> -     return NULL;
>>> +     struct  pnfs_layoutdriver_type *local;
>>> +
>>> +     spin_lock(&pnfs_spinlock);
>>> +     local = find_pnfs_driver_locked(id);
>>> +     spin_unlock(&pnfs_spinlock);
>>> +     return local;
>>>  }
>>>
>>>  /* Unitialize a mountpoint in a layout driver */
>>>  void
>>>  unset_pnfs_layoutdriver(struct nfs_server *nfss)
>>>  {
>>> +     if (nfss->pnfs_curr_ld) {
>>> +             nfss->pnfs_curr_ld->uninitialize_mountpoint(nfss);
>>> +             module_put(nfss->pnfs_curr_ld->owner);
>>> +     }
>>>       nfss->pnfs_curr_ld = NULL;
>>>  }
>>>
>>> @@ -68,6 +105,13 @@ set_pnfs_layoutdriver(struct nfs_server *server, u32 id)
>>>                       goto out_no_driver;
>>>               }
>>>       }
>>> +     if (ld_type->initialize_mountpoint(server)) {
>>> +             printk(KERN_ERR
>>> +                    "%s: Error initializing mount point for layout driver %u.\n",
>>> +                    __func__, id);
>>> +             module_put(ld_type->owner);
>>> +             goto out_no_driver;
>>> +     }
>>>       server->pnfs_curr_ld = ld_type;
>>>       dprintk("%s: pNFS module for %u set\n", __func__, id);
>>>       return;
>>> @@ -76,3 +120,37 @@ out_no_driver:
>>>       dprintk("%s: Using NFSv4 I/O\n", __func__);
>>>       server->pnfs_curr_ld = NULL;
>>>  }
>>> +
>>> +int
>>> +pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
>>> +{
>>> +     int status = -EINVAL;
>>> +     struct pnfs_layoutdriver_type *tmp;
>>> +
>>
>> Since we're relying on the fact the ld_type->id != 0
>> let's add
>>        BUG_ON(ld_type->id == 0);
>>
>> Benny
> 
> OK.
> 
> Fred
> 
>>
>>> +     spin_lock(&pnfs_spinlock);
>>> +     tmp = find_pnfs_driver_locked(ld_type->id);
>>> +     if (!tmp) {
>>> +             list_add(&ld_type->pnfs_tblid, &pnfs_modules_tbl);
>>> +             status = 0;
>>> +             dprintk("%s Registering id:%u name:%s\n", __func__, ld_type->id,
>>> +                     ld_type->name);
>>> +     } else {
>>> +             module_put(tmp->owner);
>>> +             printk(KERN_ERR "%s Module with id %d already loaded!\n",
>>> +                     __func__, ld_type->id);
>>> +     }
>>> +     spin_unlock(&pnfs_spinlock);
>>> +
>>> +     return status;
>>> +}
>>> +EXPORT_SYMBOL_GPL(pnfs_register_layoutdriver);
>>> +
>>> +void
>>> +pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
>>> +{
>>> +     dprintk("%s Deregistering id:%u\n", __func__, ld_type->id);
>>> +     spin_lock(&pnfs_spinlock);
>>> +     list_del(&ld_type->pnfs_tblid);
>>> +     spin_unlock(&pnfs_spinlock);
>>> +}
>>> +EXPORT_SYMBOL_GPL(pnfs_unregister_layoutdriver);
>>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>>> index c628ef1..61531f3 100644
>>> --- a/fs/nfs/pnfs.h
>>> +++ b/fs/nfs/pnfs.h
>>> @@ -36,8 +36,17 @@
>>>
>>>  /* Per-layout driver specific registration structure */
>>>  struct pnfs_layoutdriver_type {
>>> +     struct list_head pnfs_tblid;
>>> +     const u32 id;
>>> +     const char *name;
>>> +     struct module *owner;
>>> +     int (*initialize_mountpoint) (struct nfs_server *);
>>> +     int (*uninitialize_mountpoint) (struct nfs_server *);
>>>  };
>>>
>>> +extern int pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *);
>>> +extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);
>>> +
>>>  void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
>>>  void unset_pnfs_layoutdriver(struct nfs_server *);
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

  reply	other threads:[~2010-09-20 17:42 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-18  3:17 [PATCH 00/12] RFC: pnfs: LAYOUTGET/DEVINFO submission, v2 Fred Isaman
2010-09-18  3:17 ` [PATCH 01/12] NFSD: remove duplicate NFS4_STATEID_SIZE Fred Isaman
2010-09-18  3:17 ` [PATCH 02/12] SUNRPC: define xdr_decode_opaque_fixed Fred Isaman
2010-09-18  3:17 ` [PATCH 03/12] RFC: pnfsd, pnfs: protocol level pnfs constants Fred Isaman
2010-09-18  3:17 ` [PATCH 04/12] RFC: nfs: change stateid to be a union Fred Isaman
2010-09-18  3:17 ` [PATCH 05/12] RFC: nfs: ask for layouttypes during fsinfo call Fred Isaman
2010-09-20 10:29   ` Benny Halevy
2010-09-20 13:46     ` Fred Isaman
2010-09-18  3:17 ` [PATCH 06/12] RFC: nfs: set layout driver Fred Isaman
2010-09-20 10:42   ` Benny Halevy
2010-09-20 14:06     ` Fred Isaman
2010-09-20 14:21       ` Benny Halevy
2010-09-20 15:24         ` Fred Isaman
2010-09-20 14:24       ` Benny Halevy
2010-09-20 15:17         ` Fred Isaman
2010-09-20 13:14   ` Benny Halevy
2010-09-20 14:07     ` Fred Isaman
2010-09-18  3:17 ` [PATCH 07/12] RFC: pnfs: full mount/umount infrastructure Fred Isaman
2010-09-20 14:24   ` Benny Halevy
2010-09-20 16:21     ` Fred Isaman
2010-09-20 17:43       ` Benny Halevy [this message]
2010-09-18  3:17 ` [PATCH 08/12] RFC: pnfs: filelayout: introduce minimal file layout driver Fred Isaman
2010-09-18  3:17 ` [PATCH 09/12] RFC: nfs: create and destroy inode's layout cache Fred Isaman
2010-09-18  3:17 ` [PATCH 10/12] RFC: nfs: client needs to maintain list of inodes with active layouts Fred Isaman
2010-09-18  3:17 ` [PATCH 11/12] RFC: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure Fred Isaman
2010-09-19 19:07   ` Boaz Harrosh
2010-09-20 14:56     ` Fred Isaman
2010-09-20 16:20       ` Boaz Harrosh
2010-09-20 18:40         ` Benny Halevy
2010-09-20 19:10           ` Fred Isaman
2010-09-18  3:17 ` [PATCH 12/12] RFC: pnfs: filelayout: add driver's " Fred Isaman
  -- strict thread matches above, loose matches on Subject: below --
2010-09-22 22:04 [PATCH 00/12] RFC: pnfs: LAYOUTGRT/DEVINFO submission, v3 Fred Isaman
2010-09-22 22:05 ` [PATCH 07/12] RFC: pnfs: full mount/umount infrastructure Fred Isaman
2010-10-10 15:22 [PATCH 00/12] RFC: pnfs: LAYOUTGET/DEVINFO submission, try 4 Fred Isaman
2010-10-10 15:22 ` [PATCH 07/12] RFC: pnfs: full mount/umount infrastructure Fred Isaman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C979D2F.7050405@panasas.com \
    --to=bhalevy@panasas.com \
    --cc=iisaman@netapp.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.