From: Oren Laadan
Subject: Re: [C/R] sleepers don't wake up on restart
Date: Thu, 02 Apr 2009 18:18:29 -0400
Message-ID: <49D539B5.7060305@cs.columbia.edu>
References: <20090402002005.GA22375@us.ibm.com>
In-Reply-To: <20090402002005.GA22375-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
To: Sukadev Bhattiprolu
Cc: Containers
List-Id: containers.vger.kernel.org

Suka,

Can you please post the entire test program so I can try to reproduce
it here?

Thanks,

Oren.

Sukadev Bhattiprolu wrote:
> Tried this with v14-rc2. This is probably not implemented yet...
>
> I created a simple 1-level process tree (parent with 10 children). The
> parent just waits for the children to exit. The children do (error
> checks removed):
>
> do_child():
> 	sprintf(cfile, "child-log-%d", cnum);
>
> 	i = 0;
> 	while (!test_done()) {
> 		cfp = fopen(cfile, "a");
> 		fprintf(cfp, "i = %d\n", i++);
> 		fflush(cfp);
> 		fclose(cfp);
> 		sleep(1);
> 	}
>
> test_done():
>
> 	rc = access("test-done", F_OK);
> 	return rc == 0;
>
> After freezing and restarting (using mktree), only one of 10 children makes
> progress.
> Others are stuck in:
>
> ptree1 S f63ef170 0 8272 1
>  f5f17780 00000082 c2b1d5c8 f63ef170 f63ef2dc c2b20260 f5e0ff44 1c18de95
>  00000282 c0233594 c2b1d5c8 00000000 00000282 0000c350 00000001 0000c350
>  00000001 f5e0ff44 0000c350 c051b229 00000000 00000001 0000c350 bff32f44
> Call Trace:
>  [] hrtimer_start_range_ns+0x105/0x111
>  [] do_nanosleep+0x54/0x8c
>  [] hrtimer_nanosleep+0x8f/0xee
>  [] hrtimer_wakeup+0x0/0x18
>  [] do_nanosleep+0x3a/0x8c
>  [] sys_nanosleep+0x41/0x51
>  [] sysenter_do_call+0x12/0x25
>
> One of them (the 8th child) makes progress and continues to write to
> its file:
>
> ptree1 D f6bb7170 0 8280 1
>  f5de8b00 00000086 00000000 f6bb7170 f6bb72dc c2b17260 c0288ead 000165a5
>  c05322de c02dde8a ffffffff f6b34800 00000000 00000000 00000000 f6255d94
>  f5dd0700 c2b02314 f552d208 c02de045 00000000 f6e21478 00000000 f6b14600
> Call Trace:
>  [] __find_get_block+0x121/0x12b
>  [] journal_stop+0x231/0x23d
>  [] do_get_write_access+0x1af/0x33b
>  [] wake_bit_function+0x0/0x3c
>  [] journal_get_write_access+0x18/0x26
>  [] __ext3_journal_get_write_access+0x13/0x32
>  [] ext3_reserve_inode_write+0x2d/0x5d
>  [] ext3_mark_inode_dirty+0x1a/0x30
>  [] ext3_dirty_inode+0x50/0x63
>  [] __mark_inode_dirty+0x21/0x129
>  [] file_update_time+0x7e/0xa7
>  [] __generic_file_aio_write_nolock+0x32a/0x4a1
>  [] mntput_no_expire+0x13/0xd7
>  [] generic_file_aio_write+0x4f/0xa6
>  [] ext3_file_write+0x19/0x83
>  [] do_sync_write+0xbf/0x100
>  [] autoremove_wake_function+0x0/0x2d
>  [] page_add_new_anon_rmap+0x20/0x3b
>  [] handle_mm_fault+0x1df/0x50c
>  [] security_file_permission+0xc/0xd
>  [] do_sync_write+0x0/0x100
>  [] vfs_write+0x83/0xf6
>  [] sys_write+0x3c/0x63
>
> When I create the 'test-done' file, this one (the 8th child) exits but
> the others are still sleeping, which confirms they never woke up from
> the sleep(1).
>
> Sukadev
>
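
For reference, here is a minimal sketch of what the complete test
program might look like, reconstructed only from the description quoted
above. The original program was not posted, so this is an assumption
and not Suka's code: the helper name run_tree(), the 64-byte file name
buffer, and the error handling (which the report says was removed) are
all hypothetical.

```c
/*
 * Sketch of the test described in the report (assumed, not the original):
 * a parent forks children; each child appends a counter to its own log
 * once per second until a file named "test-done" appears.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Return 1 once the "test-done" marker file exists, 0 otherwise. */
static int test_done(void)
{
	return access("test-done", F_OK) == 0;
}

/* Child body: append "i = N" to child-log-<cnum> once per second. */
static int do_child(int cnum)
{
	char cfile[64];		/* buffer size is an assumption */
	FILE *cfp;
	int i = 0;

	snprintf(cfile, sizeof(cfile), "child-log-%d", cnum);

	while (!test_done()) {
		cfp = fopen(cfile, "a");
		if (!cfp) {
			perror("fopen");
			return 1;
		}
		fprintf(cfp, "i = %d\n", i++);
		fflush(cfp);
		fclose(cfp);
		sleep(1);	/* the call the stuck children never return from */
	}
	return 0;
}

/*
 * Hypothetical helper: fork nchildren children, wait for all of them,
 * and return how many failed (fork error or non-zero exit).
 */
static int run_tree(int nchildren)
{
	int cnum, status, failed = 0;
	pid_t pid;

	for (cnum = 0; cnum < nchildren; cnum++) {
		pid = fork();
		if (pid < 0) {
			perror("fork");
			failed++;
			break;
		}
		if (pid == 0)
			_exit(do_child(cnum));
	}

	/* The parent just waits for the children to exit. */
	while ((pid = wait(&status)) > 0) {
		if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
			failed++;
	}
	return failed;
}
```

A main() that simply returns run_tree(10) would give the 1-level tree
with 10 children from the report. Creating the test-done file in the
working directory ends the run, and each child-log-N file shows how far
that child got.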