dm-userspace memory consumption in remap cache

* dm-userspace memory consumption in remap cache
@ 2006-08-25 18:13 Benjamin Gilbert
  2006-08-25 18:43 ` Dan Smith
  0 siblings, 1 reply; 4+ messages in thread
From: Benjamin Gilbert @ 2006-08-25 18:13 UTC
  To: danms; +Cc: dm-devel, Niraj Tolia

Hi Dan,

I've been playing with a program which uses the libdmu/libdevmapper 
interface to map a block device through dm-userspace.  (I haven't been 
using cowd; I'm looking to integrate dmu support into an existing program.)

I noticed that after I wrote 1 GB of data to a dmu device with a 4 KB 
blocksize, the dm-userspace-remaps slab cache consumed about 39 MB of 
memory.  Looking at alloc_remap_atomic(), dmu makes no attempt to reuse 
dmu_maps until a memory allocation fails, so that potentially dmu could 
force a large amount of data out of the page cache to make room for its map.
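
For a rough sense of scale, the figures above can be turned into a per-remap cost. This is only a back-of-the-envelope sketch: it assumes binary units (1 GB = 2^30 bytes, 39 MB = 39 * 2^20 bytes) and that every 4 KB block written ends up with exactly one cached remap. The variable names are illustrative.

```python
# Estimate of the average slab cost per cached remap.
# Figures are taken from the message; binary units are an assumption.
DATA_WRITTEN = 1 * 1024**3    # 1 GB written to the dmu device
BLOCK_SIZE = 4 * 1024         # 4 KB dmu block size
SLAB_USAGE = 39 * 1024**2     # ~39 MB held by dm-userspace-remaps

remaps = DATA_WRITTEN // BLOCK_SIZE      # assumed: one remap per block written
bytes_per_remap = SLAB_USAGE / remaps    # average slab bytes per remap
overhead = SLAB_USAGE / DATA_WRITTEN     # cache size relative to data mapped

print(remaps, bytes_per_remap, f"{overhead:.1%}")
```

Under those assumptions the cache holds 262144 remaps at roughly 156 bytes each, or a little under 4% of the data mapped, and it keeps growing with the amount of data touched.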

I've considered some workarounds from the userspace side, but they all 
seem fairly suboptimal:

1. Periodically invalidate the entire table.  When cowd does this right 
now (on SIGHUP), it invalidates each page individually, which is not 
very pleasant.  I suppose this could be done by loading a new dm table.

2. Periodically trigger block invalidations from userspace, fired by 
either the completion notification mechanism or a periodic timer. 
Userspace couldn't do this in an LRU fashion, since it doesn't see remap 
cache hits.

(As an aside, I haven't been able to figure out the semantics of the 
completion notification mechanism.  Could you provide an example of how 
you expect it to be used from the userspace side?)

3. Map in dm-linear when there are large consecutive ranges, to try to 
keep the table size down.  Some of the early dm-cow design notes 
mentioned this approach*, but I notice that the current cowd doesn't use 
it.  Is this still a recommended procedure?
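
To illustrate what option 3 would involve, here is a minimal sketch of coalescing per-block remaps into contiguous extents, each of which could in principle become one dm-linear table line. The `coalesce` function and the dict-based `block_map` are hypothetical; this is not how cowd or dm-userspace store their mappings.

```python
def coalesce(block_map):
    """Merge per-block remaps into (src_start, dst_start, length) extents.

    block_map: dict of source block -> destination block (a hypothetical
    representation chosen for this sketch).
    """
    extents = []
    for src in sorted(block_map):
        dst = block_map[src]
        if extents:
            s, d, n = extents[-1]
            # Extend the last extent only if both source and destination
            # continue without a gap.
            if src == s + n and dst == d + n:
                extents[-1] = (s, d, n + 1)
                continue
        extents.append((src, dst, 1))
    return extents


# Six per-block remaps collapse into three extents.
print(coalesce({0: 100, 1: 101, 2: 102, 5: 200, 6: 201, 7: 50}))
```

The table shrinks only when writes happen to land in consecutive runs on both sides of the mapping; a scattered workload still produces one extent per block.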

From the kernel side -- if the remap cache in the kernel is expected to 
be a subset of the mapping information maintained by userspace, it seems 
as though it should be possible to more aggressively reuse the LRU 
dmu_maps.  That would impose a performance penalty for the extra map 
requests to userspace, but I wonder how that balances against having a 
larger page cache.

Thoughts?

Thanks
--Benjamin Gilbert

* http://www.redhat.com/archives/dm-devel/2006-March/msg00013.html

* Re: dm-userspace memory consumption in remap cache
  2006-08-25 18:13 dm-userspace memory consumption in remap cache Benjamin Gilbert
@ 2006-08-25 18:43 ` Dan Smith
  2006-08-25 19:32   ` Benjamin Gilbert
  0 siblings, 1 reply; 4+ messages in thread
From: Dan Smith @ 2006-08-25 18:43 UTC
  To: Benjamin Gilbert; +Cc: dm-devel, Niraj Tolia

BG> I've been playing with a program which uses the
BG> libdmu/libdevmapper interface to map a block device through
BG> dm-userspace.  (I haven't been using cowd; I'm looking to
BG> integrate dmu support into an existing program.)

Very cool!

BG> I noticed that after I wrote 1 GB of data to a dmu device with a 4
BG> KB blocksize, the dm-userspace-remaps slab cache consumed about 39
BG> MB of memory.  

Ah, right.

BG> Looking at alloc_remap_atomic(), dmu makes no attempt to reuse
BG> dmu_maps until a memory allocation fails, so that potentially dmu
BG> could force a large amount of data out of the page cache to make
BG> room for its map.

That's true.  Good point.

BG> 1. Periodically invalidate the entire table.  When cowd does this
BG> right now (on SIGHUP), it invalidates each page individually,
BG> which is not very pleasant.  I suppose this could be done by
BG> loading a new dm table.

Right, invalidating the entire table, one remap at a time, would be a
bad thing.  The current SIGHUP behavior was just intended to be a
mechanism for me to test the invalidation process.

BG> 2. Periodically trigger block invalidations from userspace, fired
BG> by either the completion notification mechanism or a periodic
BG> timer. Userspace couldn't do this in an LRU fashion, since it
BG> doesn't see remap cache hits.

Right.  We could push statistics back to cowd when there
was nothing else to do.  That might be interesting, but probably not
the best way to solve this particular issue.

BG> (As an aside, I haven't been able to figure out the semantics of
BG> the completion notification mechanism.  Could you provide an
BG> example of how you expect it to be used from the userspace side?)

Recent versions of cowd use this to prevent the completion (endio)
from firing until it has flushed its internal metadata mapping to
disk, to prevent the data from being written and the completion event
sent, when the data isn't really on the disk (well, it's on the disk,
but if we crash before we write our metadata, we can't tell that it's
really there during recovery).
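
The ordering described here can be sketched as follows. This is an illustration of the idea only, not cowd's code: `CompletionGate`, `data_written`, `flush_metadata`, and the `persist` callback are all hypothetical names.

```python
class CompletionGate:
    """Hold write completions until the metadata describing them is durable."""

    def __init__(self):
        self.pending = []     # blocks whose data is written but metadata is not
        self.completed = []   # blocks whose completion has been released

    def data_written(self, block):
        # The data is on disk, but recovery could not find it yet,
        # so the completion (endio) is deferred.
        self.pending.append(block)

    def flush_metadata(self, persist):
        # persist() is a hypothetical callback that durably records
        # the mappings for the given blocks.
        persist(list(self.pending))
        # Only after the metadata is durable may the completions fire.
        self.completed.extend(self.pending)
        self.pending.clear()


gate = CompletionGate()
gate.data_written(7)
journal = []
gate.flush_metadata(journal.append)
print(gate.completed, journal)
```

A crash between `data_written` and `flush_metadata` then loses only writes that were never acknowledged as complete.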

BG> 3. Map in dm-linear when there are large consecutive ranges, to
BG> try to keep the table size down.  Some of the early dm-cow design
BG> notes mentioned this approach*, but I notice that the current cowd
BG> doesn't use it.  Is this still a recommended procedure?

I don't think this is the best approach, because if you want to
invalidate a mapping, you'd have to split the dm-linear back up,
suspend/resume the device, etc.

Initially, I was planning to take a cow-centric approach, where
dm-linear could be used to map the sections that were already mapped.
Now that I'm focusing on a more generic approach, we want it to be
more flexible, which is why I implemented a hash table for the remaps
(my initial plan was to remap with dm-linear for performance reasons).

BG> From the kernel side -- if the remap cache in the kernel is
BG> expected to be a subset of the mapping information maintained by
BG> userspace, it seems as though it should be possible to more
BG> aggressively reuse the LRU dmu_maps.  

Yes.

BG> That would impose a performance penalty for the extra map requests
BG> to userspace, but I wonder how that balances against having a
BG> larger page cache.

Correct.

BG> Thoughts?

So, my preference would be to put a limit on the number of remaps that
we maintain "cached" in the kernel.  The existing MRU list (which is
an LRU list if you traverse it backwards) would allow us to more
aggressively re-use remaps as we approached the limit.  Setting a
higher limit at device creation time would allow for more memory
usage, but better performance.
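
A minimal sketch of that idea, in Python rather than kernel C: a remap cache with a fixed limit that reuses the least recently used entry once the limit is reached. `RemapCache`, `limit`, and `lookup` are hypothetical names for illustration, not dm-userspace identifiers; `lookup` stands in for a map request to userspace.

```python
from collections import OrderedDict


class RemapCache:
    """Bounded remap cache that evicts the least recently used entry."""

    def __init__(self, limit, lookup):
        self.limit = limit          # maximum number of cached remaps (>= 1)
        self.lookup = lookup        # hypothetical stand-in for a userspace map request
        self.maps = OrderedDict()   # ordered oldest (LRU) first, newest (MRU) last
        self.misses = 0             # number of requests that went to userspace

    def get(self, block):
        if block in self.maps:
            self.maps.move_to_end(block)     # hit: mark as most recently used
            return self.maps[block]
        self.misses += 1
        if len(self.maps) >= self.limit:
            self.maps.popitem(last=False)    # at the limit: drop the LRU entry
        self.maps[block] = self.lookup(block)
        return self.maps[block]


cache = RemapCache(limit=2, lookup=lambda block: block + 1000)
for block in (1, 2, 1, 3):
    cache.get(block)
print(list(cache.maps), cache.misses)
```

With a limit of 2, the access sequence 1, 2, 1, 3 evicts block 2 and leaves blocks 1 and 3 cached after three misses. Raising `limit` trades memory for fewer round trips to userspace.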

My testing shows that communication with userspace (i.e. to refresh a
mapping that we expired to make room for another) is not as much of a
performance hit as I would have initially imagined.  Thus, I think the
above would be a good way to limit a full-scale memory takeover by
dm-userspace :)

Now that I know at least someone is paying attention, I'll try to get
my latest dm-userspace and cowd versions out on this list.  A small
fix has been made to dm-userspace, and several improvements and fixes
have been made to cowd.  After I post my current code, I'll implement
the memory limit/aggressive reuse functionality and post that as well.

Thanks!

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com

* Re: dm-userspace memory consumption in remap cache
  2006-08-25 18:43 ` Dan Smith
@ 2006-08-25 19:32   ` Benjamin Gilbert
  2006-08-25 19:40     ` Dan Smith
  0 siblings, 1 reply; 4+ messages in thread
From: Benjamin Gilbert @ 2006-08-25 19:32 UTC
  To: Dan Smith; +Cc: Niraj Tolia, dm-devel

Dan Smith wrote:
> BG> I noticed that after I wrote 1 GB of data to a dmu device with a 4
> BG> KB blocksize, the dm-userspace-remaps slab cache consumed about 39
> BG> MB of memory.  
> Ah, right.

In fact, when the dmu device is unmapped, destroy_dmu_device() moves all 
of the dmu_maps to the end of the MRU list but does not free them, so 
that memory stays around.  If a new device is created, its dmu_maps will 
still be obtained from kmem_cache_alloc even though there are unused 
dmu_maps.

> We could push statistics back to cowd when there
> was nothing else to do.  That might be interesting, but probably not
> the best way to solve this particular issue.

Come to think of it, that would be very interesting data in its own 
right.  Hmm... you could push statistics on a block whenever its mapping 
expires, but that doesn't help the MFU blocks.  You could provide a 
query-and-reset-counters request for individual blocks still in the 
cache, and since userspace could watch the statistics pushes to see what 
blocks had been removed, it would know which blocks to query on.  That 
would allow userspace to maintain statistics on whatever level of 
time-granularity it wanted, without requiring the kernel to do periodic 
sweeps or to do large dumps to userspace.
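
Roughly, the scheme could look like the sketch below. It is written in Python for brevity and every name in it (`RemapStats`, `push`, `hit`, `expire`, `query_and_reset`) is hypothetical; nothing here exists in dm-userspace.

```python
class RemapStats:
    """Per-block hit counters, pushed on expiry and queryable on demand."""

    def __init__(self, push):
        self.push = push   # hypothetical channel that delivers (block, hits) to userspace
        self.hits = {}     # block -> hits since the last query or since caching

    def hit(self, block):
        # Called on each remap cache hit for a cached block.
        self.hits[block] = self.hits.get(block, 0) + 1

    def expire(self, block):
        # The mapping leaves the cache: push its final count to userspace.
        self.push(block, self.hits.pop(block, 0))

    def query_and_reset(self, block):
        # Userspace asks about a block still in the cache; the counter restarts.
        count = self.hits.get(block, 0)
        if block in self.hits:
            self.hits[block] = 0
        return count


pushed = []
stats = RemapStats(push=lambda block, hits: pushed.append((block, hits)))
for block in (5, 5, 9):
    stats.hit(block)
print(stats.query_and_reset(5), stats.query_and_reset(5))
stats.expire(9)
print(pushed)
```

Userspace would learn about evicted blocks from the pushes and would have to poll with `query_and_reset` for blocks that never leave the cache.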

...I'm probably missing an obvious reason that that won't work.

> BG> (As an aside, I haven't been able to figure out the semantics of
> BG> the completion notification mechanism.  Could you provide an
> BG> example of how you expect it to be used from the userspace side?)
> Recent versions of cowd use this to prevent the completion (endio)
> from firing until it has flushed its internal metadata mapping to
> disk, to prevent the data from being written and the completion event
> sent, when the data isn't really on the disk (well, it's on the disk,
> but if we crash before we write our metadata, we can't tell that it's
> really there during recovery).

Okay, I see.

> BG> 3. Map in dm-linear when there are large consecutive ranges, to
> BG> try to keep the table size down.
> I don't think this is the best approach, because if you want to
> invalidate a mapping, you'd have to split the dm-linear back up,
> suspend/resume the device, etc.

Oh, good point.

> Now that I know at least someone is paying attention, I'll try to get
> my latest dm-userspace and cowd versions out on this list.  A small
> fix has been made to dm-userspace, and several improvements and fixes
> have been made to cowd.  After I post my current code, I'll implement
> the memory limit/aggressive reuse functionality and post that as well.

Great.  Thanks!

Thanks
--Benjamin Gilbert

* Re: dm-userspace memory consumption in remap cache
  2006-08-25 19:32   ` Benjamin Gilbert
@ 2006-08-25 19:40     ` Dan Smith
  0 siblings, 0 replies; 4+ messages in thread
From: Dan Smith @ 2006-08-25 19:40 UTC
  To: Benjamin Gilbert; +Cc: Niraj Tolia, dm-devel

BG> In fact, when the dmu device is unmapped, destroy_dmu_device()
BG> moves all of the dmu_maps to the end of the MRU list but does not
BG> free them, so that memory stays around.  If a new device is
BG> created, its dmu_maps will still be obtained from kmem_cache_alloc
BG> even though there are unused dmu_maps.

Ah, yet another good point :)

BG> Come to think of it, that would be very interesting data in its
BG> own right.

I agree.

BG> ...I'm probably missing an obvious reason that that won't work.

I'm not sure why it wouldn't work.  I think it's likely to result in a
lot of polling, but it could definitely be useful in certain
situations.

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com
