From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Vrable
Subject: Re: LVM Snapshot Troubles
Date: Tue, 28 Sep 2004 11:09:05 -0700
Sender: xen-devel-admin@lists.sourceforge.net
Message-ID: <20040928110905.A18973@cs.ucsd.edu>
References: <41597B4B.6020009@thegreen.co.uk>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="PmA2V3Z32TCmWXqI"
Content-Disposition: inline
In-Reply-To: ; from Ian.Pratt@cl.cam.ac.uk on Tue, Sep 28, 2004 at 04:43:25PM +0100
Errors-To: xen-devel-admin@lists.sourceforge.net
To: xen-devel@lists.sourceforge.net
List-Id: xen-devel@lists.xenproject.org

--PmA2V3Z32TCmWXqI
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Sep 28, 2004 at 04:43:25PM +0100, Ian Pratt wrote:
> There's nothing in slabinfo that looks crazy. I wander where all
> your memory is gone? BTW: how big is your dom0?
>
> It's possible that dm-io or kcopyd is chewing up pages (which
> won't show up in the slab allocator). I'm surprised they're not
> just transient, though.

When I've run into memory trouble with snapshots, I've always seen a
stack backtrace that points me at kcopyd_client_create. Following the
code: when creating a snapshot, a new kcopyd client is created with 256
(SNAPSHOT_PAGES in dm-snap.c) pages (= 1 MB) dedicated to that snapshot.
I think I managed to dig up the logs from one of the failures I've seen;
I've attached them to this message.

The problem seems to be made worse by the fact that all 256 pages are
allocated in a fairly short span of time, and (at least this is my
guess) the allocation fails even if it would be possible for the kernel
to free up the necessary memory with a bit more work.
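
To put the per-snapshot reservation into numbers, here is a minimal
sketch in Python. It only does arithmetic and does not query the kernel.
The 256-page figure is SNAPSHOT_PAGES as described above; the 4 KiB page
size is an assumption (it is what makes 256 pages come out to 1 MB), and
the function names are hypothetical, chosen just for this illustration.

```python
# Back-of-the-envelope estimate of memory reserved by kcopyd clients
# for dm-snapshot targets. Illustrative only; not taken from the kernel.

SNAPSHOT_PAGES = 256     # pages per snapshot, per dm-snap.c as described above
PAGE_SIZE_BYTES = 4096   # ASSUMPTION: 4 KiB pages (typical on x86)


def kcopyd_reserved_bytes(num_snapshots,
                          pages_per_snapshot=SNAPSHOT_PAGES,
                          page_size=PAGE_SIZE_BYTES):
    """Return the total bytes reserved if every snapshot gets its own client."""
    if num_snapshots < 0:
        raise ValueError("num_snapshots must be non-negative")
    return num_snapshots * pages_per_snapshot * page_size


def kcopyd_reserved_mib(num_snapshots):
    """Same estimate as kcopyd_reserved_bytes, expressed in MiB."""
    return kcopyd_reserved_bytes(num_snapshots) / (1024 * 1024)


if __name__ == "__main__":
    for count in (1, 10, 50):
        print(f"{count:3d} snapshot(s): {kcopyd_reserved_mib(count):.0f} MiB reserved")
```

Under these assumptions the reservation grows linearly, one megabyte per
snapshot, which would be a noticeable share of a small dom0.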
(I've been able to create many more snapshots before running into
trouble if I try to make sure the kernel has a bit of extra free memory
before each lvcreate call--using dd to create a several megabyte file,
then deleting it to free up that space in the page cache.)

As has been noted, LVM doesn't have a very graceful failure mode when
this memory allocation problem is hit--I lose access to all the
snapshots when that happens.

I have also found that I can use dmsetup to create the COW devices
myself, which did at least (if I'm remembering correctly--this was a
little bit ago) have the benefit that if one snapshot failed, the others
were still available. Basically, I used the same setup that LVM
normally would, except that I didn't create a snapshot-origin device
layered over the original device (this is what intercepts writes to the
source device and propagates a copy of the original data to each
snapshot, if needed).

Doing this manually isn't ideal, however. Improvements that I think
could be made:
  - Change the dm-snapshot driver in the kernel to (optionally?)
    allocate less memory for each snapshot, and fail more gracefully if
    unable to allocate the memory.
  - Adjust the LVM userspace tool to fail more gracefully if the device
    mapper driver gives an out-of-memory error.
  - Add an option to LVM for snapshots with a read-only origin (as I was
    doing manually with dmsetup).

--Michael Vrable

--PmA2V3Z32TCmWXqI
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=lvm_error_log

Sep 3 18:33:51 localhost kernel: dmsetup: page allocation failure. order:0, mode:0xd0
Sep 3 18:33:51 localhost kernel: [__alloc_pages+727/839] __alloc_pages+0x2d7/0x347
Sep 3 18:33:51 localhost kernel: [kmem_cache_alloc+108/112] kmem_cache_alloc+0x6c/0x70
Sep 3 18:33:51 localhost kernel: [alloc_pl+51/82] alloc_pl+0x33/0x52
Sep 3 18:33:51 localhost kernel: [client_alloc_pages+28/87] client_alloc_pages+0x1c/0x57
Sep 3 18:33:51 localhost kernel: [vmalloc+32/36] vmalloc+0x20/0x24
Sep 3 18:33:51 localhost kernel: [kcopyd_client_create+104/185] kcopyd_client_create+0x68/0xb9
Sep 3 18:33:51 localhost kernel: [dm_create_persistent+199/305] dm_create_persistent+0xc7/0x131
Sep 3 18:33:51 localhost kernel: [snapshot_ctr+676/874] snapshot_ctr+0x2a4/0x36a
Sep 3 18:33:51 localhost kernel: [dm_table_add_target+262/422] dm_table_add_target+0x106/0x1a6
Sep 3 18:33:51 localhost kernel: [populate_table+125/210] populate_table+0x7d/0xd2
Sep 3 18:33:51 localhost kernel: [table_load+103/298] table_load+0x67/0x12a
Sep 3 18:33:51 localhost kernel: [ctl_ioctl+242/336] ctl_ioctl+0xf2/0x150
Sep 3 18:33:51 localhost kernel: [table_load+0/298] table_load+0x0/0x12a
Sep 3 18:33:51 localhost kernel: [ctl_ioctl+0/336] ctl_ioctl+0x0/0x150
Sep 3 18:33:51 localhost kernel: [sys_ioctl+247/588] sys_ioctl+0xf7/0x24c
Sep 3 18:33:51 localhost kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Sep 3 18:33:51 localhost kernel:
Sep 3 18:33:51 localhost kernel: device-mapper: : Could not create kcopyd client
Sep 3 18:33:51 localhost kernel: device-mapper: error adding target to table

--PmA2V3Z32TCmWXqI--