* dm-userspace memory consumption in remap cache
From: Benjamin Gilbert @ 2006-08-25 18:13 UTC
To: danms; +Cc: dm-devel, Niraj Tolia

Hi Dan,

I've been playing with a program which uses the libdmu/libdevmapper
interface to map a block device through dm-userspace. (I haven't been
using cowd; I'm looking to integrate dmu support into an existing
program.)

I noticed that after I wrote 1 GB of data to a dmu device with a 4 KB
blocksize, the dm-userspace-remaps slab cache consumed about 39 MB of
memory. Looking at alloc_remap_atomic(), dmu makes no attempt to reuse
dmu_maps until a memory allocation fails, so that potentially dmu could
force a large amount of data out of the page cache to make room for its
map.

I've considered some workarounds from the userspace side, but they all
seem fairly suboptimal:

1. Periodically invalidate the entire table. When cowd does this right
   now (on SIGHUP), it invalidates each page individually, which is not
   very pleasant. I suppose this could be done by loading a new dm
   table.

2. Periodically trigger block invalidations from userspace, fired by
   either the completion notification mechanism or a periodic timer.
   Userspace couldn't do this in an LRU fashion, since it doesn't see
   remap cache hits.

   (As an aside, I haven't been able to figure out the semantics of the
   completion notification mechanism. Could you provide an example of
   how you expect it to be used from the userspace side?)

3. Map in dm-linear when there are large consecutive ranges, to try to
   keep the table size down. Some of the early dm-cow design notes
   mentioned this approach*, but I notice that the current cowd doesn't
   use it. Is this still a recommended procedure?

From the kernel side -- if the remap cache in the kernel is expected to
be a subset of the mapping information maintained by userspace, it
seems as though it should be possible to more aggressively reuse the
LRU dmu_maps. That would impose a performance penalty for the extra map
requests to userspace, but I wonder how that balances against having a
larger page cache.

Thoughts?

Thanks
--Benjamin Gilbert

* http://www.redhat.com/archives/dm-devel/2006-March/msg00013.html

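For scale, the figures in the message above work out to a fixed cost per
remapped block. A minimal sketch of that arithmetic follows, assuming
binary units (1 GB = 2^30 bytes, 39 MB = 39 * 2^20 bytes) and exactly one
remap entry per 4 KB block written; neither assumption is stated in the
thread.

```python
# Back-of-the-envelope check of the figures quoted in the message above.
# Assumptions (not stated in the thread): binary units, and exactly one
# remap entry per 4 KB block written.
GIB = 2**30
MIB = 2**20

data_written = 1 * GIB      # "1 GB of data"
block_size = 4 * 1024       # "4 KB blocksize"
slab_usage = 39 * MIB       # "about 39 MB" in dm-userspace-remaps

blocks = data_written // block_size     # number of remapped blocks
bytes_per_remap = slab_usage / blocks   # slab memory per cached remap
overhead = slab_usage / data_written    # cache size relative to data written

print(f"blocks remapped: {blocks}")
print(f"bytes per remap: {bytes_per_remap:.0f}")
print(f"cache overhead:  {overhead:.1%}")
```

Under those assumptions the cache costs on the order of 150 bytes per
block, a little under 4% of the data written, and it grows linearly with
the amount of data touched because nothing is reused until an allocation
fails.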
* Re: dm-userspace memory consumption in remap cache
From: Dan Smith @ 2006-08-25 18:43 UTC
To: Benjamin Gilbert; +Cc: dm-devel, Niraj Tolia

BG> I've been playing with a program which uses the
BG> libdmu/libdevmapper interface to map a block device through
BG> dm-userspace. (I haven't been using cowd; I'm looking to
BG> integrate dmu support into an existing program.)

Very cool!

BG> I noticed that after I wrote 1 GB of data to a dmu device with a 4
BG> KB blocksize, the dm-userspace-remaps slab cache consumed about 39
BG> MB of memory.

Ah, right.

BG> Looking at alloc_remap_atomic(), dmu makes no attempt to reuse
BG> dmu_maps until a memory allocation fails, so that potentially dmu
BG> could force a large amount of data out of the page cache to make
BG> room for its map.

That's true. Good point.

BG> 1. Periodically invalidate the entire table. When cowd does this
BG> right now (on SIGHUP), it invalidates each page individually,
BG> which is not very pleasant. I suppose this could be done by
BG> loading a new dm table.

Right, invalidating the entire table, one remap at a time, would be a
bad thing. The current SIGHUP behavior was just intended to be a
mechanism for me to test the invalidation process.

BG> 2. Periodically trigger block invalidations from userspace, fired
BG> by either the completion notification mechanism or a periodic
BG> timer. Userspace couldn't do this in an LRU fashion, since it
BG> doesn't see remap cache hits.

Right. We could push statistic information back to cowd when there
was nothing else to do. That might be interesting, but probably not
the best way to solve this particular issue.

BG> (As an aside, I haven't been able to figure out the semantics of
BG> the completion notification mechanism. Could you provide an
BG> example of how you expect it to be used from the userspace side?)

Recent versions of cowd use this to prevent the completion (endio)
from firing until it has flushed its internal metadata mapping to
disk, to prevent the data from being written and the completion event
sent, when the data isn't really on the disk (well, it's on the disk,
but if we crash before we write our metadata, we can't tell that it's
really there during recovery).

BG> 3. Map in dm-linear when there are large consecutive ranges, to
BG> try to keep the table size down. Some of the early dm-cow design
BG> notes mentioned this approach*, but I notice that the current cowd
BG> doesn't use it. Is this still a recommended procedure?

I don't think this is the best approach, because if you want to
invalidate a mapping, you'd have to split the dm-linear back up,
suspend/resume the device, etc.

Initially, I was planning to take a cow-centric approach, where
dm-linear could be used to map the sections that were already mapped.
Now that I'm focusing on a more generic approach, we want it to be
more flexible, which is why I implemented a hash table for the remaps
(my initial plan was to remap with dm-linear for performance reasons).

BG> From the kernel side -- if the remap cache in the kernel is
BG> expected to be a subset of the mapping information maintained by
BG> userspace, it seems as though it should be possible to more
BG> aggressively reuse the LRU dmu_maps.

Yes.

BG> That would impose a performance penalty for the extra map requests
BG> to userspace, but I wonder how that balances against having a
BG> larger page cache.

Correct.

BG> Thoughts?

So, my preference would be to put a limit on the number of remaps that
we maintain "cached" in the kernel.
The existing MRU list (which is an LRU list if you traverse it
backwards) would allow us to more aggressively re-use remaps as we
approached the limit. Setting a higher limit at device creation time
would allow for more memory usage, but better performance.

My testing shows that communication with userspace (i.e. to refresh a
mapping that we expired to make room for another) is not as much of a
performance hit as I would have initially imagined. Thus, I think the
above would be a good way to limit a full-scale memory takeover by
dm-userspace :)

Now that I know at least someone is paying attention, I'll try to get
my latest dm-userspace and cowd versions out on this list. A small
fix has been made to dm-userspace, and several improvements and fixes
have been made to cowd. After I post my current code, I'll implement
the memory limit/aggressive reuse functionality and post that as well.

Thanks!

--
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com

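The capped cache proposed in the reply above (a limit on cached remaps,
with the least recently used entry reused once the limit is reached) can
be modelled in a few lines. This is only a toy sketch in Python, not
dm-userspace code: `RemapCache`, `limit`, and `ask_userspace` are
hypothetical names, and the real kernel code uses a hash table plus an MRU
list rather than an ordered dictionary.

```python
from collections import OrderedDict


class RemapCache:
    """Toy model of a capped remap cache; not dm-userspace code."""

    def __init__(self, limit, ask_userspace):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit                  # hypothetical per-device cap
        self.ask_userspace = ask_userspace  # stand-in for a map request
        self.remaps = OrderedDict()         # oldest entry first, newest last
        self.userspace_requests = 0

    def lookup(self, block):
        if block in self.remaps:
            self.remaps.move_to_end(block)  # hit: mark most recently used
            return self.remaps[block]
        # Miss: drop the least recently used remap if we are at the cap.
        if len(self.remaps) >= self.limit:
            self.remaps.popitem(last=False)
        self.userspace_requests += 1
        target = self.ask_userspace(block)
        self.remaps[block] = target
        return target


# The mapping function here is arbitrary and only for illustration.
cache = RemapCache(limit=2, ask_userspace=lambda block: block + 1000)
for block in (1, 2, 1, 3, 2):
    cache.lookup(block)
print(sorted(cache.remaps), cache.userspace_requests)
```

In this run block 2 is evicted to make room for block 3 and then has to be
fetched from userspace again, which is the extra round trip the thread
weighs against a larger page cache. A higher `limit` trades memory for
fewer such requests.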
* Re: dm-userspace memory consumption in remap cache
From: Benjamin Gilbert @ 2006-08-25 19:32 UTC
To: Dan Smith; +Cc: Niraj Tolia, dm-devel

Dan Smith wrote:
> BG> I noticed that after I wrote 1 GB of data to a dmu device with a 4
> BG> KB blocksize, the dm-userspace-remaps slab cache consumed about 39
> BG> MB of memory.
>
> Ah, right.

In fact, when the dmu device is unmapped, destroy_dmu_device() moves
all of the dmu_maps to the end of the MRU list but does not free them,
so that memory stays around. If a new device is created, its dmu_maps
will still be obtained from kmem_cache_alloc even though there are
unused dmu_maps.

> We could push statistic information back to cowd when there
> was nothing else to do. That might be interesting, but probably not
> the best way to solve this particular issue.

Come to think of it, that would be very interesting data in its own
right.

Hmm... you could push statistics on a block whenever its mapping
expires, but that doesn't help the MFU blocks. You could provide a
query-and-reset-counters request for individual blocks still in the
cache, and since userspace could watch the statistics pushes to see
what blocks had been removed, it would know which blocks to query on.
That would allow userspace to maintain statistics on whatever level of
time-granularity it wanted, without requiring the kernel to do periodic
sweeps or to do large dumps to userspace.

...I'm probably missing an obvious reason that that won't work.

> BG> (As an aside, I haven't been able to figure out the semantics of
> BG> the completion notification mechanism. Could you provide an
> BG> example of how you expect it to be used from the userspace side?)
>
> Recent versions of cowd use this to prevent the completion (endio)
> from firing until it has flushed its internal metadata mapping to
> disk, to prevent the data from being written and the completion event
> sent, when the data isn't really on the disk (well, it's on the disk,
> but if we crash before we write our metadata, we can't tell that it's
> really there during recovery).

Okay, I see.

> BG> 3. Map in dm-linear when there are large consecutive ranges, to
> BG> try to keep the table size down.
>
> I don't think this is the best approach, because if you want to
> invalidate a mapping, you'd have to split the dm-linear back up,
> suspend/resume the device, etc.

Oh, good point.

> Now that I know at least someone is paying attention, I'll try to get
> my latest dm-userspace and cowd versions out on this list. A small
> fix has been made to dm-userspace, and several improvements and fixes
> have been made to cowd. After I post my current code, I'll implement
> the memory limit/aggressive reuse functionality and post that as well.

Great. Thanks!

Thanks
--Benjamin Gilbert

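The completion-notification ordering described in the quoted reply above
(hold the endio until the metadata flush has happened) can be sketched as
follows. This is a toy model in Python under stated assumptions, not cowd
or libdmu code: `CompletionGate` and its method names are hypothetical,
and plain lists stand in for the disk and for the kernel's completion
path.

```python
class CompletionGate:
    """Toy model: hold write completions until metadata is flushed."""

    def __init__(self):
        self.dirty_metadata = []    # mappings recorded in memory only
        self.held = []              # data written, completion withheld
        self.durable_metadata = []  # stand-in for on-disk metadata
        self.completed = []         # completions released to the issuer

    def data_written(self, block, target):
        # The data is on disk, but recovery could not find it yet,
        # so the completion is held back rather than fired.
        self.dirty_metadata.append((block, target))
        self.held.append(block)

    def flush_metadata(self):
        # Persist the mappings first, then let the completions fire.
        self.durable_metadata.extend(self.dirty_metadata)
        self.dirty_metadata.clear()
        self.completed.extend(self.held)
        self.held.clear()


gate = CompletionGate()
gate.data_written(7, 42)
before = list(gate.completed)   # nothing has completed yet
gate.flush_metadata()
after = list(gate.completed)    # block 7 completes only after the flush
print(before, after)
```

The point of the ordering is that a crash between the data write and the
metadata flush never leaves a write that was reported complete but cannot
be located during recovery.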
* Re: dm-userspace memory consumption in remap cache
From: Dan Smith @ 2006-08-25 19:40 UTC
To: Benjamin Gilbert; +Cc: Niraj Tolia, dm-devel

BG> In fact, when the dmu device is unmapped, destroy_dmu_device()
BG> moves all of the dmu_maps to the end of the MRU list but does not
BG> free them, so that memory stays around. If a new device is
BG> created, its dmu_maps will still be obtained from kmem_cache_alloc
BG> even though there are unused dmu_maps.

Ah, yet another good point :)

BG> Come to think of it, that would be very interesting data in its
BG> own right.

I agree.

BG> ...I'm probably missing an obvious reason that that won't work.

I'm not sure why it wouldn't work. I think it's likely to result in a
lot of polling, but it could definitely be useful in certain
situations.

--
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com

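The statistics idea discussed in the last two messages (push a block's
counters when its mapping expires, and let userspace query-and-reset the
counters of blocks that are still cached) can be sketched like this. It
is a toy model in Python, not an existing dm-userspace interface; the
class `RemapStats` and all of its method names are hypothetical.

```python
from collections import Counter


class RemapStats:
    """Toy model of per-block hit counters; not a dm-userspace interface."""

    def __init__(self):
        self.hits = Counter()   # counters for blocks still cached
        self.pushed = []        # (block, hits) reports sent on expiry

    def add(self, block):
        self.hits[block] = 0

    def hit(self, block):
        self.hits[block] += 1

    def expire(self, block):
        # The final count is pushed when a mapping leaves the cache.
        self.pushed.append((block, self.hits.pop(block)))

    def query_and_reset(self, block):
        # Userspace polls a block that is still cached.
        count = self.hits[block]
        self.hits[block] = 0
        return count


stats = RemapStats()
for block in (1, 2):
    stats.add(block)
for block in (1, 1, 2, 1):
    stats.hit(block)
stats.expire(2)

# Userspace knows what it mapped and sees the pushes, so it can work out
# which blocks are still cached and poll only those.
expired = {block for block, _ in stats.pushed}
still_cached = {1, 2} - expired
polled = {block: stats.query_and_reset(block) for block in still_cached}
print(stats.pushed, polled)
```

Because counters are only reset on request, userspace chooses the time
granularity, at the cost of one poll per cached block per interval, which
is the polling load noted in the reply above.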
end of thread

Thread overview: 4+ messages
2006-08-25 18:13 dm-userspace memory consumption in remap cache Benjamin Gilbert
2006-08-25 18:43 ` Dan Smith
2006-08-25 19:32   ` Benjamin Gilbert
2006-08-25 19:40     ` Dan Smith