From: Benny Halevy <bhalevy@panasas.com>
To: Boaz Harrosh <bharrosh@panasas.com>
Cc: Fred Isaman <iisaman@netapp.com>, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 11/12] RFC: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure
Date: Mon, 20 Sep 2010 20:40:38 +0200 [thread overview]
Message-ID: <4C97AAA6.30501@panasas.com> (raw)
In-Reply-To: <4C9789BF.4090303@panasas.com>
On 2010-09-20 18:20, Boaz Harrosh wrote:
> On 09/20/2010 04:56 PM, Fred Isaman wrote:
>> On Sun, Sep 19, 2010 at 3:07 PM, Boaz Harrosh <bharrosh@panasas.com> wrote:
>>> On 09/18/2010 05:17 AM, Fred Isaman wrote:
>>
>> I agree. The original deviceid code had an issue where nothing was
>> ever deallocated until shutdown. The code as given here ties the
>> deviceid lifetime to existing references. This is not ideal, because
>> a brief loss of reference to a deviceid will cause an unnecessary
>> GETDEVICEINFO. However, for the current submission, it has the
>> advantage that it is simple and correct. Adding the machinery to do
>> as you suggest above is indeed a (fairly high) priority goal, but was
>> intentionally deferred.
>>
>
> I would like to please return it. That is deallocate it only at shut down.
> Otherwise we are for a lot of trouble. We are not yet in a situation of
> 1000nd of devices where it might matter. Even up to 100rd it is still fine
> to never de-allocate. (Currently on the planet Panasas is the only one with
> 1000nd of devices in a single installation)
>
> Look at it this way. Before we mounted all the NFS servers in our domain
> prior to usage, and/or as part of auto-mount which never got unmounted
> until shotdown/unmount. So we do the same with pNFS only we have the
> privilege of doing a single mount and have everything dynamically logged
> in for us. So we are just as before.
>
> For all pNFS filesystems today the first IOs will GETDEVICEINFO for the
> 10s of devices deploid and keep them allocated. If we free them then
> what will happen if we need to GETDEVICEINFO when all memory is dirty
> and we need it for cleaning that dirty memory. We don't have any
> forward progress provision for that yet.
Devices are no different than layouts for this matter.
To flush your cache under low memory conditions using pnfs you'll need
both a layout and the corresponding devices. So why do you want to
keep the devices forever but not the layouts?
>
> The simple thing to do is keep them allocated for ever (until unmount)
> by adding a single did_get() call in the did_add() function.
> (And did_put() in cache deallocation). That's more simple then hardening
> the code.
>
> And If we don't the pNFS performance will suck big time specially for
> that humongous files-layout get_device_info. Because it will be done
> for every get_layout. A regular bash script opens a file then closes it.
> Most of the time we are not parallel as we think.
>
I'd like to see the actual numbers for a given benchmark.
Keep in mind that typically for the files layout the server won't
ask for return_on_close so the layout will actually be kept around
(and the associated devices) while the inode is resident, right?
>>>
>>> So in effect if [optional] code above is not yet implemented then a
>>> getdeviceinfo is never preformed twice for the same deviceid
>>> (Assuming no device recall). This is what I had in osd, and is cardinal
>>> for it's performance. Is that what we are aiming at?
>>
>> Yes, that is the goal. Right now, if a layout is still around that
>> references a deviceid, a second GETDEVICEINFO will not be sent.
>> However, if all layouts referencing a specific deviceid have been
>> returned, then yes, a second GETDEVICEINFO will be called.
>>
>
> Then sure when we come to the situation that we need proper support
> for more then 100 machines in a cluster, then we can add the clean
> on add stage where if we are higher then x devices we start to replace
> entries.
>
> What you guys think?
It's hard to tell without the actual numbers.
Keeping a reference count on the device from the layout makes sure
you'll have the devices to use the layout and that you don't issue
multiple GETDEVICEINFOs in case a device is shared between more than
one layout.
Ideally, a laundromat service could clean up unused client state once
in a while so to keep it around for a short while so it can be reused
if possible.
Benny
>
> Boaz
>
next prev parent reply other threads:[~2010-09-20 18:40 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-09-18 3:17 [PATCH 00/12] RFC: pnfs: LAYOUTGET/DEVINFO submission, v2 Fred Isaman
2010-09-18 3:17 ` [PATCH 01/12] NFSD: remove duplicate NFS4_STATEID_SIZE Fred Isaman
2010-09-18 3:17 ` [PATCH 02/12] SUNRPC: define xdr_decode_opaque_fixed Fred Isaman
2010-09-18 3:17 ` [PATCH 03/12] RFC: pnfsd, pnfs: protocol level pnfs constants Fred Isaman
2010-09-18 3:17 ` [PATCH 04/12] RFC: nfs: change stateid to be a union Fred Isaman
2010-09-18 3:17 ` [PATCH 05/12] RFC: nfs: ask for layouttypes during fsinfo call Fred Isaman
2010-09-20 10:29 ` Benny Halevy
2010-09-20 13:46 ` Fred Isaman
2010-09-18 3:17 ` [PATCH 06/12] RFC: nfs: set layout driver Fred Isaman
2010-09-20 10:42 ` Benny Halevy
2010-09-20 14:06 ` Fred Isaman
2010-09-20 14:21 ` Benny Halevy
2010-09-20 15:24 ` Fred Isaman
2010-09-20 14:24 ` Benny Halevy
2010-09-20 15:17 ` Fred Isaman
2010-09-20 13:14 ` Benny Halevy
2010-09-20 14:07 ` Fred Isaman
2010-09-18 3:17 ` [PATCH 07/12] RFC: pnfs: full mount/umount infrastructure Fred Isaman
2010-09-20 14:24 ` Benny Halevy
2010-09-20 16:21 ` Fred Isaman
2010-09-20 17:43 ` Benny Halevy
2010-09-18 3:17 ` [PATCH 08/12] RFC: pnfs: filelayout: introduce minimal file layout driver Fred Isaman
2010-09-18 3:17 ` [PATCH 09/12] RFC: nfs: create and destroy inode's layout cache Fred Isaman
2010-09-18 3:17 ` [PATCH 10/12] RFC: nfs: client needs to maintain list of inodes with active layouts Fred Isaman
2010-09-18 3:17 ` [PATCH 11/12] RFC: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure Fred Isaman
2010-09-19 19:07 ` Boaz Harrosh
2010-09-20 14:56 ` Fred Isaman
2010-09-20 16:20 ` Boaz Harrosh
2010-09-20 18:40 ` Benny Halevy [this message]
2010-09-20 19:10 ` Fred Isaman
2010-09-18 3:17 ` [PATCH 12/12] RFC: pnfs: filelayout: add driver's " Fred Isaman
-- strict thread matches above, loose matches on Subject: below --
2010-09-22 22:04 [PATCH 00/12] RFC: pnfs: LAYOUTGRT/DEVINFO submission, v3 Fred Isaman
2010-09-22 22:05 ` [PATCH 11/12] RFC: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure Fred Isaman
2010-10-10 15:22 [PATCH 00/12] RFC: pnfs: LAYOUTGET/DEVINFO submission, try 4 Fred Isaman
2010-10-10 15:22 ` [PATCH 11/12] RFC: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure Fred Isaman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4C97AAA6.30501@panasas.com \
--to=bhalevy@panasas.com \
--cc=bharrosh@panasas.com \
--cc=iisaman@netapp.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.