From: Joe Thornber <thornber@redhat.com>
To: Mike Snitzer <snitzer@redhat.com>
Cc: Heinz Mauelshagen <heinzm@redhat.com>
Subject: Re: call for slideware ;)
Date: Wed, 23 Feb 2011 12:24:48 +0000
Message-ID: <1298463888.19562.138.camel@ubuntu>
In-Reply-To: <20110223012159.GA13983@redhat.com>
Mike,
On Tue, 2011-02-22 at 20:22 -0500, Mike Snitzer wrote:
> I just had a look at the latest content and have some questions (way
> more than I'd imagine you'd like to see... means I'm clearly missing a
> lot):
Thanks a lot for taking the time to go through this. I'm updating the
document as I answer your questions. I'll put the git commit hashes in
square brackets to make it easier for you to pick out the changes for
each question.
> 1) from "Solution" slide:
> "Space comes from a preallocated ‘pool’, which is itself just another
> logical volume, thus can be resized on demand."
> ...
> "Separate metadata device simplifies extension, this is hidden by the
> LVM system so sys admin unlikely to be aware of it."
> Q: Can you elaborate on the role of the metadata? It maps between
> physical "area" (allocated from pool) for all writes to the
> logical address space?
[0127dd9]
> Q: can thinp and snapshot metadata coexist in the same pool? -- ask
> similar question below.
I've added a new introduction section at the start of the document that
tries to explain that the thinp target is just a simple thin
provisioning solution, whereas multisnap will provide both thinp and
snapshots. [70e448f]
>
> 2) from "Block size choice" slide:
> The larger the block size:
> - the less chance there is of fragmentation (describe this)
> Q: can you please "describe this"? :)
[a6306c8]
> - the less frequently we need the expensive mapping operation
> Q: "expensive" is all relative, seems you contradict the expense of
> the mapping operation in the "Performance" slide?
[938422d] You still want to minimise it. The performance at small
block sizes is better than I expected.
> - the smaller the metadata tables are, so more of them can be held in core
> at a time. Leading to faster access to the provisioned blocks by
> minimizing reading in mapping information
> Q: "more of them" -- "them" being metadata tables? So the take
> away is more thinp devices available on the same host?
No, it means fewer reads to load the bits of the mapping table that
aren't in the cache. [9ba3ae3]
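A rough back-of-the-envelope sketch of why larger blocks mean fewer metadata reads: a fully provisioned device needs one logical-to-physical entry per block, so the entry count falls in proportion as the block size grows. The 8-byte entry size is a hypothetical figure for illustration only and ignores btree node overhead; the real on-disk format may differ.

```python
# Rough estimate of mapping-table size versus block size.
# ASSUMPTION: one mapping entry per provisioned block, 8 bytes per entry
# (hypothetical; ignores btree node overhead and caching details).

KiB = 1024
GiB = 1024 ** 3
ENTRY_BYTES = 8  # hypothetical per-entry size


def mapping_entries(device_bytes: int, block_bytes: int) -> int:
    """Entries needed to map every block of a fully provisioned device."""
    return -(-device_bytes // block_bytes)  # ceiling division


def mapping_bytes(device_bytes: int, block_bytes: int) -> int:
    """Approximate mapping footprint under the ENTRY_BYTES assumption."""
    return mapping_entries(device_bytes, block_bytes) * ENTRY_BYTES


if __name__ == "__main__":
    for block in (64 * KiB, 512 * KiB):
        n = mapping_entries(10 * GiB, block)
        print(f"{block // KiB} KiB blocks: {n} entries, "
              f"~{mapping_bytes(10 * GiB, block) // KiB} KiB of mappings")
```

For a 10 GiB device this prints 163840 entries (~1280 KiB) at 64 KiB blocks and 20480 entries (~160 KiB) at 512 KiB blocks: eight times the block size, one eighth of the table, so a larger share of it stays in core.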
>
> 3) from "Performance" slide:
> "Expensive operation is mapping in a new ‘area’"
> Q: is area the same as a block in the pool? Why not call block size:
> "area size"? "Block size" is familiar to people? Original snapshot
> had "chunk size".
I switched from 'chunk' to 'block' because we seem to be the only people
who use the term chunk (my fault) and I was reading lots of filesystem
papers in preparation for this work where block is more ubiquitous.
I've changed 'area' and 'region' to block [1c6a5352]. If you think it's
still confusing I'll change everything to 'chunk' (the LVM2 tools are
still going to use --chunksize etc.).
> 4) Q: what did you decide to run with for reads to logical address space
> that weren't previously mapped? Just return zeroes like was
> discussed on lvm-team?
[49c8490]
I've added a 'target parameter' section [8332c43].
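To illustrate the role of the metadata and the zero-on-read behaviour, here is a toy model: the metadata is just a logical-to-physical block map, writes to unmapped blocks take a free block from the pool, and reads of never-written blocks return zeroes. The class and its names are hypothetical and say nothing about how the kernel target is actually structured; it also only handles whole-block writes.

```python
# Toy model of thin provisioning: a logical->physical block map backed by a pool.
# HYPOTHETICAL: names and structure are illustrative, not dm-thinp code.


class ThinDevice:
    def __init__(self, pool_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free = list(range(pool_blocks))   # unallocated physical blocks
        self.mapping: dict[int, int] = {}      # the "metadata": logical -> physical
        self.data: dict[int, bytes] = {}       # contents of the data (pool) device

    def read(self, lblock: int) -> bytes:
        pblock = self.mapping.get(lblock)
        if pblock is None:
            # Never written, so no physical block is mapped: return zeroes.
            return bytes(self.block_size)
        return self.data[pblock]

    def write(self, lblock: int, buf: bytes) -> None:
        if len(buf) != self.block_size:
            raise ValueError("this sketch only handles whole-block writes")
        if lblock not in self.mapping:
            if not self.free:
                raise RuntimeError("pool exhausted")
            # The expensive step: map a new block in from the pool.
            self.mapping[lblock] = self.free.pop(0)
        self.data[self.mapping[lblock]] = buf


if __name__ == "__main__":
    dev = ThinDevice(pool_blocks=2, block_size=4)
    print(dev.read(1000))   # zeroes: block 1000 was never written
    dev.write(1000, b"abcd")
    print(dev.read(1000))   # the written data, now backed by pool block 0
```

Note that the logical address space (block 1000 here) can be far larger than the pool; space is only consumed when a block is first written.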
> The "Metadata object" section is where you lose me:
I've added some more background stuff [c8e1685].
>
> 5) I'm not clear on the notion of "external" vs "internal" snapshots.
> Q: can you elaborate on their characteristics?
See above commit.
> 6) I'm not clear on how you're going to clone the metadata tree for
> userspace to walk (for snapshot merge, etc). Is that "clone" really
> a snapshot of the metadata device? -- seems unlikely as you'd need a
> metadata device for your metadata device's snapshots?
No.
> - you said: "Userland will be given the location of an alternative
> superblock for the metadata device. This is the root of a tree of
> blocks referring to other blocks in a variety of data structures
> (btrees, space maps etc.). Blocks will be shared with the ‘live’
> version of the metadata, their reference counts may change as
> sharing is broken, but we know the blocks will never be updated."
> - Q: is this describing an "internal snapshot"?
No. I don't really want to go into how the persistent-data library
works. I should start a separate document for that. If you think I'm
just confusing people by adding these issues then I can take this
section out?
> 7) from the 'thin' target section:
> "All devices stored within a metadata object are instanced with this
> target. Be they fully mapped devices, thin provisioned devices, internal
> snapshots or external snapshots."
> Q: what is a fully mapped device?
A thinp device that's fully mapped; I'll take it out [831c136].
>
> 8) "The target line:
>
> thin <pool object> <internal device id>"
> Q: so by <pool object>, that is the _id_ of a pool object that was
> returned from the 'create virtual device' message?
Yep, or rather the id that was passed in to that call. Userland is in
charge of allocating these numbers.
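Since userland owns the device ids, the bookkeeping might look something like the sketch below. The lowest-free-id policy, the helper names and the pool name "pool0" are all hypothetical; only the argument order comes from the "thin <pool object> <internal device id>" target line quoted above.

```python
# Sketch of userland owning device-id allocation for the thin target.
# HYPOTHETICAL: the allocation policy, helper names and "pool0" are illustrative.


def allocate_dev_id(used_ids: set[int]) -> int:
    """Pick the lowest non-negative id not already used in this pool."""
    dev_id = 0
    while dev_id in used_ids:
        dev_id += 1
    return dev_id


def thin_target_args(pool: str, dev_id: int) -> str:
    """Format the arguments as: thin <pool object> <internal device id>."""
    return f"thin {pool} {dev_id}"


if __name__ == "__main__":
    used = {0, 1, 3}
    new_id = allocate_dev_id(used)
    # The same id would be passed to the 'create virtual device' message
    # and then reused in the target line.
    print(thin_target_args("pool0", new_id))
```

Because the kernel never picks the id, userland can record it in its own metadata (e.g. LVM's) before the device exists, which avoids having to read a kernel-chosen id back after the fact.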
> In general my understanding of all this shared store infrastructure is
> muddled. I need the audience to take away big concepts, not get tripped
> up (or trip me up!) on the minutiae.
Agreed, let's try and restrict this document to high level stuff. I'll
do a separate persistent-data doc with the detail in.
>
> Subtle inconsistencies and/or opaque explanation aren't helping, e.g.:
> 1) the detail of "Configuration/Use" for thinp volume
> - "Allocate (empty) logical volume for the thin provisioning pool"
> Q: how can it be "empty"? Isn't it the data volume you hand to
> the pool target?
Changed to 'possibly empty' [3ce2226]. I think this scenario will occur
quite often; for example, a VM hosting service might create a new VM for
a client with a bunch of thinp devices, but not want to commit any space
to the VM until the client actually starts using the devices.
> - "Allocate small logical volume for the thin provisioning metadata"
> Q: before in "Solution" slide you said "Separate metadata device
> simplifies extension", can the metadata volume be extended too?
That's the plan. A userland library will make the necessary tweaks to the
metadata while the device is suspended.
> - "Set up thin provisioning mapped device on aforementioned 2 LVs"
> Q: so there is no distinct step for creating a pool?
For the thinp target, the data device that you pass in to the target is
the 'pool'. I hope the 'target parameters' section I've added helps
explain this?
> Q: pool is implicitly created at the time the thinp device is
> created? (doubtful but how you enumerated the steps makes it
> misleading/confusing).
The LVM tools will implicitly create the data/backing device and the
metadata device. agk is envisioning a command line like:
lvcreate --target-type=thinp --chunksize=512k --low-water-mark=4 -L10G
> Q: can snapshot and thinp volumes share the same pool?
> (if possible I could see it being brittle?)
> (but expressing such capability will help the audience "get"
> the fact that the pool is nicely abstracted/sound design,
> etc).
I'm not sure if you're talking about the thinp target or multisnap here.
Why 'brittle'?
> p.s. I was going to hold off sending this and take another pass of your
> slides but decided your feedback to all my Q:s would likely be much more
> helpful than me trying to parse the slides again.
You definitely did right to send these, it gives me a kick to keep
improving it. Have a read through it now and see if it's any better.
I'm quite happy to keep revising it for you.
- Joe
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel