From: Boaz Harrosh <bharrosh@panasas.com>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Peng Tao <bergwolf@gmail.com>, <linux-nfs@vger.kernel.org>,
<bhalevy@tonian.com>, Garth Gibson <garth@panasas.com>,
Matt Benjamin <matt@linuxbox.com>,
Marc Eshel <eshel@almaden.ibm.com>,
Fred Isaman <iisaman@netapp.com>
Subject: Re: [PATCH 0/4] nfs41: allow layoutget at pnfs_do_multiple_writes
Date: Tue, 29 Nov 2011 17:46:06 -0800
Message-ID: <4ED58ADE.8010809@panasas.com>
In-Reply-To: <1322614718.11286.104.camel@lade.trondhjem.org>
On 11/29/2011 04:58 PM, Trond Myklebust wrote:
> On Tue, 2011-11-29 at 16:24 -0800, Boaz Harrosh wrote:
>> On 11/29/2011 03:30 PM, Trond Myklebust wrote:
>>> On Tue, 2011-11-29 at 14:58 -0800, Boaz Harrosh wrote:
>>
>> That I don't understand. What "spec worms that the pNFS layout segments open"
>> Are you seeing. Because it works pretty simple for me. And I don't see the
>> big difference for files. One thing I learned for the past is that when you
>> have concerns I should understand them and start to address them. Because
>> your insights are usually on the Money. If you are concerned then there is
>> something I should fix.
>
> I'm saying that if I need to manage layouts that deal with >1000 DSes,
> then I presumably need a strategy for ensuring that I return/forget
> segments that are no longer needed, and I need a strategy for ensuring
> that I always hold the segments that I do need; otherwise, I could just
> ask for a full-file layout and deal with the 1000 DSes (which is what we
> do today)...
>
Thanks for asking, because now I can answer you, and you will find that I'm
one step ahead on some of the issues.
1. The 1000 DSes problem is separate from the segments problem. The devices
solution is on the way. The device cache is all but ready for a
periodic scan that throws out devices with zero users. We never got to it
because currently everyone is testing with up to 10 devices, and I'm using
up to 128 devices, which is just fine. The load is marginal so far.
But I promise you it is right here on my to-do list, after some more
pressing problems.
Let's say one thing: this subsystem is the same regardless of whether the
1000 devices are referenced by 1 segment or by 10 segments. Actually, if
by 10, then I might get rid of some segments and free their devices.
2. The many segments problem. There are not that many. It's more or less
a segment for every 2GB, so an lo_seg struct for that much IO is not
noticeable.
At the upper bound we do not have any problem, because once the system is
out of memory it will start to evict inodes, and on evict we just return
them. Also, with ROC servers we forget them on close. So far, all our combined
testing has not shown any real memory pressure caused by that. If it does show
up, we can start discarding segs in an LRU fashion. All the mechanics
to do that are there; we only need to see the need.
3. The current situation is fine and working and showing great performance
for objects and blocks. And it is all in the generic part, so it should be
just the same for files. I do not see any difference.
The only BUG I see is the COMMIT, and I think we know how to fix that.
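To make the device part of point 1 concrete, here is a rough user-space sketch of a ref-counted device cache with a scan that drops devices nobody references. It is only an illustration under stated assumptions, not the kernel code: struct dev_entry, dev_get(), dev_put(), dev_cache_scan() and the slot count are all hypothetical names chosen for this example, and the periodic timer that would call the scan is left out.

```c
#include <assert.h>
#include <stddef.h>

#define DEV_CACHE_SLOTS 8	/* hypothetical; tiny on purpose */

/* One cached device. All names here are illustrative only. */
struct dev_entry {
	int in_use;		/* slot currently holds a device */
	unsigned int id;	/* hypothetical device id */
	unsigned int refs;	/* layout segments referencing this device */
};

struct dev_cache {
	struct dev_entry slot[DEV_CACHE_SLOTS];
};

/* Take a reference on device 'id', caching it on first use.
 * Returns NULL when the device is not cached and no slot is free. */
static struct dev_entry *dev_get(struct dev_cache *c, unsigned int id)
{
	struct dev_entry *free_slot = NULL;
	size_t i;

	for (i = 0; i < DEV_CACHE_SLOTS; i++) {
		struct dev_entry *e = &c->slot[i];

		if (e->in_use && e->id == id) {
			e->refs++;
			return e;
		}
		if (!e->in_use && !free_slot)
			free_slot = e;
	}
	if (!free_slot)
		return NULL;
	free_slot->in_use = 1;
	free_slot->id = id;
	free_slot->refs = 1;
	return free_slot;
}

/* Drop one reference. The entry stays cached until the next scan. */
static void dev_put(struct dev_entry *e)
{
	assert(e->in_use && e->refs > 0);
	e->refs--;
}

/* The scan a periodic timer would run: throw out every device with
 * zero users. Returns the number of entries evicted. */
static size_t dev_cache_scan(struct dev_cache *c)
{
	size_t evicted = 0;
	size_t i;

	for (i = 0; i < DEV_CACHE_SLOTS; i++) {
		struct dev_entry *e = &c->slot[i];

		if (e->in_use && e->refs == 0) {
			e->in_use = 0;
			evicted++;
		}
	}
	return evicted;
}
```

Note that the scan does not care whether a device's references come from 1 segment or from 10; it only looks at the count. Dropping a segment just means a dev_put() per device it used, and the next scan frees whatever fell to zero.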
> My problem is that the spec certainly doesn't give me any guidance as to
> such a strategy, and I haven't seen anybody else step up to the plate.
> In fact, I strongly suspect that such a strategy is going to be very
> application specific.
>
You never asked. I'm thinking about these things all the time. Currently
we are far below the limits of a running system. I think I'm going to
get to these limits before anyone else.
My strategy is stated above: LRU for devices is almost all there, ref-counting
and all; only the periodic timer needs to be added.
LRU for segments is more work, but is doable. But the segment counts are
so low that we will not hit that problem for a long time. Before I ship
a system that will break that barrier I'll send a fix, I promise.
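For the segments side, an LRU could look roughly like the user-space sketch below. Again this is an assumption-laden illustration and not what is in the tree: struct lseg, struct lseg_lru and the lru_*() helpers are made-up names, and a real version would hang off the layout header and take the proper locks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a layout segment; only what the LRU needs. */
struct lseg {
	unsigned long long offset;	/* start of the byte range covered */
	struct lseg *prev, *next;	/* LRU linkage */
};

/* Circular list with a sentinel: head.next is the most recently used
 * segment, head.prev the least recently used one. */
struct lseg_lru {
	struct lseg head;
	size_t count;
};

static void lru_init(struct lseg_lru *l)
{
	l->head.prev = l->head.next = &l->head;
	l->count = 0;
}

static void lru_unlink(struct lseg *s)
{
	s->prev->next = s->next;
	s->next->prev = s->prev;
}

static void lru_push_front(struct lseg_lru *l, struct lseg *s)
{
	s->next = l->head.next;
	s->prev = &l->head;
	l->head.next->prev = s;
	l->head.next = s;
}

/* Start tracking a newly obtained segment as most recently used. */
static void lru_add(struct lseg_lru *l, struct lseg *s)
{
	lru_push_front(l, s);
	l->count++;
}

/* Call on every IO that uses the segment. */
static void lru_touch(struct lseg_lru *l, struct lseg *s)
{
	assert(l->count > 0);
	lru_unlink(s);
	lru_push_front(l, s);
}

/* Pick the least recently used segment to return or forget.
 * Returns NULL when nothing is tracked. */
static struct lseg *lru_evict(struct lseg_lru *l)
{
	struct lseg *victim = l->head.prev;

	if (victim == &l->head)
		return NULL;
	lru_unlink(victim);
	l->count--;
	return victim;
}
```

With roughly one segment per 2GB, the list stays at a handful of entries per inode, so even this naive list is cheap; the eviction would only run once the client is actually under resource pressure.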
> IOW: I don't accept that a layout-segment based solution is useful
> without some form of strategy for telling me which segments to keep and
> which to throw out when I start hitting client resource limits.
LRU. Again, there are no more than a few segments per inode. It's not
1000, like devices.
> I also
> haven't seen any strategy out there for setting loga_length (as opposed
> to loga_minlength) in the LAYOUTGET requests: as far as I know that is
> going to be heavily application-dependent in the 1000-DS world.
>
The current situation is working for me. But we are also actively working to
improve it. What we want is for files-LO to enjoy the same privileges that
objects and blocks already have, in exactly the same simple, stupid, but working
way.
All your above concerns are true and interesting. I call them a rich man's problems.
But they are not specific to files-LO; they are generic to all of us. The current
situation satisfies us for blocks and objects. The file guys out there are jealous.
Thanks
Boaz