linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Benny Halevy <bhalevy@panasas.com>
To: Boaz Harrosh <bharrosh@panasas.com>
Cc: Fred Isaman <iisaman@netapp.com>, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 11/12] RFC: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure
Date: Mon, 20 Sep 2010 20:40:38 +0200	[thread overview]
Message-ID: <4C97AAA6.30501@panasas.com> (raw)
In-Reply-To: <4C9789BF.4090303@panasas.com>

On 2010-09-20 18:20, Boaz Harrosh wrote:
> On 09/20/2010 04:56 PM, Fred Isaman wrote:
>> On Sun, Sep 19, 2010 at 3:07 PM, Boaz Harrosh <bharrosh@panasas.com> wrote:
>>> On 09/18/2010 05:17 AM, Fred Isaman wrote:
>>
>> I agree.  The original deviceid code had an issue where nothing was
>> ever deallocated until shutdown.  The code as given here ties the
>> deviceid lifetime to existing references.  This is not ideal, because
>> a brief loss of reference to a deviceid will cause an unnecessary
>> GETDEVICEINFO.  However, for the current submission, it has the
>> advantage that it is simple and correct.  Adding the machinery to do
>> as you suggest above is indeed a (fairly high) priority goal, but was
>> intentionally deferred.
>>
> 
> I would like to please return it. That is deallocate it only at shut down.
> Otherwise we are for a lot of trouble. We are not yet in a situation of
> 1000nd of devices where it might matter. Even up to 100rd it is still fine
> to never de-allocate. (Currently on the planet Panasas is the only one with
> 1000nd of devices in a single installation)
> 
> Look at it this way. Before we mounted all the NFS servers in our domain
> prior to usage, and/or as part of auto-mount which never got unmounted
> until shotdown/unmount. So we do the same with pNFS only we have the
> privilege of doing a single mount and have everything dynamically logged
> in for us. So we are just as before.
> 
> For all pNFS filesystems today the first IOs will GETDEVICEINFO for the
> 10s of devices deploid and keep them allocated. If we free them then
> what will happen if we need to GETDEVICEINFO when all memory is dirty
> and we need it for cleaning that dirty memory. We don't have any
> forward progress provision for that yet.

Devices are no different than layouts for this matter.
To flush your cache under low memory conditions using pnfs you'll need
both a layout and the corresponding devices. So why do you want to
keep the devices forever but not the layouts?

> 
> The simple thing to do is keep them allocated for ever (until unmount)
> by adding a single did_get() call in the did_add() function.
> (And did_put() in cache deallocation). That's more simple then hardening
> the code.
> 
> And If we don't the pNFS performance will suck big time specially for
> that humongous files-layout get_device_info. Because it will be done
> for every get_layout. A regular bash script opens a file then closes it.
> Most of the time we are not parallel as we think.
> 

I'd like to see the actual numbers for a given benchmark.
Keep in mind that typically for the files layout the server won't
ask for return_on_close so the layout will actually be kept around
(and the associated devices) while the inode is resident, right?

>>>
>>> So in effect if [optional] code above is not yet implemented then a
>>> getdeviceinfo is never preformed twice for the same deviceid
>>> (Assuming no device recall). This is what I had in osd, and is cardinal
>>> for it's performance. Is that what we are aiming at?
>>
>> Yes, that is the goal.  Right now, if a layout is still around that
>> references a deviceid, a second GETDEVICEINFO will not be sent.
>> However, if all layouts referencing a specific deviceid have been
>> returned, then yes, a second GETDEVICEINFO will be called.
>>
> 
> Then sure when we come to the situation that we need proper support
> for more then 100 machines in a cluster, then we can add the clean
> on add stage where if we are higher then x devices we start to replace
> entries.
> 
> What you guys think?

It's hard to tell without the actual numbers.

Keeping a reference count on the device from the layout makes sure
you'll have the devices to use the layout and that you don't issue
multiple GETDEVICEINFOs in case a device is shared between more than
one layout.

Ideally, a laundromat service could clean up unused client state once
in a while so to keep it around for a short while so it can be reused
if possible.

Benny

> 
> Boaz
> 

  reply	other threads:[~2010-09-20 18:40 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-18  3:17 [PATCH 00/12] RFC: pnfs: LAYOUTGET/DEVINFO submission, v2 Fred Isaman
2010-09-18  3:17 ` [PATCH 01/12] NFSD: remove duplicate NFS4_STATEID_SIZE Fred Isaman
2010-09-18  3:17 ` [PATCH 02/12] SUNRPC: define xdr_decode_opaque_fixed Fred Isaman
2010-09-18  3:17 ` [PATCH 03/12] RFC: pnfsd, pnfs: protocol level pnfs constants Fred Isaman
2010-09-18  3:17 ` [PATCH 04/12] RFC: nfs: change stateid to be a union Fred Isaman
2010-09-18  3:17 ` [PATCH 05/12] RFC: nfs: ask for layouttypes during fsinfo call Fred Isaman
2010-09-20 10:29   ` Benny Halevy
2010-09-20 13:46     ` Fred Isaman
2010-09-18  3:17 ` [PATCH 06/12] RFC: nfs: set layout driver Fred Isaman
2010-09-20 10:42   ` Benny Halevy
2010-09-20 14:06     ` Fred Isaman
2010-09-20 14:21       ` Benny Halevy
2010-09-20 15:24         ` Fred Isaman
2010-09-20 14:24       ` Benny Halevy
2010-09-20 15:17         ` Fred Isaman
2010-09-20 13:14   ` Benny Halevy
2010-09-20 14:07     ` Fred Isaman
2010-09-18  3:17 ` [PATCH 07/12] RFC: pnfs: full mount/umount infrastructure Fred Isaman
2010-09-20 14:24   ` Benny Halevy
2010-09-20 16:21     ` Fred Isaman
2010-09-20 17:43       ` Benny Halevy
2010-09-18  3:17 ` [PATCH 08/12] RFC: pnfs: filelayout: introduce minimal file layout driver Fred Isaman
2010-09-18  3:17 ` [PATCH 09/12] RFC: nfs: create and destroy inode's layout cache Fred Isaman
2010-09-18  3:17 ` [PATCH 10/12] RFC: nfs: client needs to maintain list of inodes with active layouts Fred Isaman
2010-09-18  3:17 ` [PATCH 11/12] RFC: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure Fred Isaman
2010-09-19 19:07   ` Boaz Harrosh
2010-09-20 14:56     ` Fred Isaman
2010-09-20 16:20       ` Boaz Harrosh
2010-09-20 18:40         ` Benny Halevy [this message]
2010-09-20 19:10           ` Fred Isaman
2010-09-18  3:17 ` [PATCH 12/12] RFC: pnfs: filelayout: add driver's " Fred Isaman
  -- strict thread matches above, loose matches on Subject: below --
2010-09-22 22:04 [PATCH 00/12] RFC: pnfs: LAYOUTGRT/DEVINFO submission, v3 Fred Isaman
2010-09-22 22:05 ` [PATCH 11/12] RFC: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure Fred Isaman
2010-10-10 15:22 [PATCH 00/12] RFC: pnfs: LAYOUTGET/DEVINFO submission, try 4 Fred Isaman
2010-10-10 15:22 ` [PATCH 11/12] RFC: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure Fred Isaman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C97AAA6.30501@panasas.com \
    --to=bhalevy@panasas.com \
    --cc=bharrosh@panasas.com \
    --cc=iisaman@netapp.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).