From mboxrd@z Thu Jan 1 00:00:00 1970
From: Oren Laadan
Subject: Re: bugs with ckpt-v15-dev
Date: Wed, 20 May 2009 01:28:49 -0400
Message-ID: <4A139511.7060900@cs.columbia.edu>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: Nathan Lynch
Cc: Containers
List-Id: containers.vger.kernel.org

Nathan,

Thanks for insisting on this ... I believe it's now fixed in the
ckpt-v15-dev branch. In particular, error reporting works better,
and there is a new utility "ckptinfo" which can do basic parsing
of the checkpoint image. If given the switch '-e' it will display
error strings found in the image.

The checkpoint image format has changed so you need to pull both
linux-cr and user-cr.

Oren.

Nathan Lynch wrote:
> Last commit is ed3b275 "allow error string during checkpoint while
> holding a spinlock".
>
> # bash -c 'exec <&- >&- 2>&- ; while : ; do : ; done' &
> [1] 2269
> # ckpt $! > /tmp/bash.ckpt
>
> BUG: sleeping function called from invalid context at mm/slub.c:1595
> in_atomic(): 1, irqs_disabled(): 0, pid: 2270, name: ckpt
> 1 lock held by ckpt/2270:
> #0: (tasklist_lock){.+.+.+}, at: [] tree_count_tasks+0x2a/0x2a2
> Pid: 2270, comm: ckpt Not tainted 2.6.30-rc3-00074-ged3b275 #30
> Call Trace:
> [] ? __debug_show_held_locks+0x1e/0x20
> [] __might_sleep+0x100/0x107
> [] kmem_cache_alloc+0x35/0x11f
> [] ? __ckpt_generate_err+0x25/0x12b
> [] ? put_lock_stats+0x1e/0x29
> [] __ckpt_generate_err+0x25/0x12b
> [] ? ftrace_call+0x5/0x8
> [] __ckpt_write_err+0x16/0x18
> [] tree_count_tasks+0xf2/0x2a2
> [] do_checkpoint+0x150/0x5f2
> [] ? kzalloc+0x10/0x12
> [] ? ckpt_obj_hash_alloc+0x35/0x60
> [] ? ckpt_ctx_alloc+0x77/0x99
> [] sys_checkpoint+0x6c/0x82
> [] syscall_call+0x7/0xb
> ------------[ cut here ]------------
> kernel BUG at checkpoint/checkpoint.c:136!
> invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> last sysfs file: /sys/block/sda/size
> Modules linked in:
>
> Pid: 2270, comm: ckpt Not tainted (2.6.30-rc3-00074-ged3b275 #30)
> EIP: 0060:[] EFLAGS: 00010246 CPU: 0
> EIP is at __ckpt_generate_err+0xf2/0x12b
> EAX: df051300 EBX: deb72f30 ECX: df051530 EDX: 0000001c
> ESI: df051430 EDI: deb72f28 EBP: deb72f10 ESP: deb72ef8
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process ckpt (pid: 2270, ti=deb72000 task=df9adf60 task.ti=deb72000)
> Stack:
> c072ce85 df051300 0000001c deb75600 df9ad1c0 00000000 deb72f18 c03911ba
> deb72f50 c03912ae df051300 c072ce85 000008dd df9ad4ec df051300 df9ad1c0
> 00000000 00000000 00000000 deb75600 deb75604 df051300 deb72f98 c03915ae
> Call Trace:
> [] ? __ckpt_write_err+0x16/0x18
> [] ? tree_count_tasks+0xf2/0x2a2
> [] ? do_checkpoint+0x150/0x5f2
> [] ? kzalloc+0x10/0x12
> [] ? ckpt_obj_hash_alloc+0x35/0x60
> [] ? ckpt_ctx_alloc+0x77/0x99
> [] ? sys_checkpoint+0x6c/0x82
> [] ? syscall_call+0x7/0xb
> Code: 08 0c 8b c0 03 74 1b f6 05 c2 8f ff c0 20 74 12 f6 05 c9 8f ff c0 10 74 09 80 3d 47 94 83 c0 00 75 1d 8b 45 ec 83 78 2c 00 75 04 <0f> 0b eb fe 8b 55 ec 31 c0 89 72 2c 8d 65 f4 5b 5e 5f 5d c3 31
> EIP: [] __ckpt_generate_err+0xf2/0x12b SS:ESP 0068:deb72ef8
> ---[ end trace d54433b47f0c4829 ]---
> note: ckpt[2270] exited with preempt_count 1
> BUG: scheduling while atomic: ckpt/2270/0x10000002
> INFO: lockdep is turned off.
> Modules linked in:
> Pid: 2270, comm: ckpt Tainted: G D 2.6.30-rc3-00074-ged3b275 #30
> Call Trace:
> [] __schedule_bug+0x63/0x6a
> [] __schedule+0x8f/0x7ac
> [] ? print_lock_contention_bug+0x14/0xd7
> [] ? unmap_vmas+0x1e1/0x518
> [] ? ftrace_call+0x5/0x8
> [] ? ftrace_call+0x5/0x8
> [] schedule+0x17/0x38
> [] __cond_resched+0x26/0x3b
> [] _cond_resched+0x2c/0x37
> [] unmap_vmas+0x4c7/0x518
> [] exit_mmap+0x6c/0xb7
> [] mmput+0x3c/0x8f
> [] exit_mm+0xe3/0xeb
> [] do_exit+0x188/0x64b
> [] ? printk+0x14/0x16
> [] ? oops_exit+0x28/0x2d
> [] oops_end+0x92/0x9a
> [] die+0x59/0x5f
> [] do_trap+0x89/0xa2
> [] ? do_invalid_op+0x0/0x80
> [] do_invalid_op+0x76/0x80
> [] ? __ckpt_generate_err+0xf2/0x12b
> [] ? ftrace_call+0x5/0x8
> [] ? strnlen+0x8/0x1f
> [] ? string+0x34/0x82
> [] ? vsnprintf+0x173/0x311
> [] ? vsnprintf+0x83/0x311
> [] ? trace_hardirqs_off_thunk+0xc/0x10
> [] error_code+0x72/0x78
> [] ? do_invalid_op+0x0/0x80
> [] ? __ckpt_generate_err+0xf2/0x12b
> [] __ckpt_write_err+0x16/0x18
> [] tree_count_tasks+0xf2/0x2a2
> [] do_checkpoint+0x150/0x5f2
> [] ? kzalloc+0x10/0x12
> [] ? ckpt_obj_hash_alloc+0x35/0x60
> [] ? ckpt_ctx_alloc+0x77/0x99
> [] sys_checkpoint+0x6c/0x82
> [] syscall_call+0x7/0xb
>