From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([209.51.188.92]:45941) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gzf8y-000101-EZ for qemu-devel@nongnu.org; Fri, 01 Mar 2019 05:11:53 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gzf8x-0006YR-5L for qemu-devel@nongnu.org; Fri, 01 Mar 2019 05:11:52 -0500 Received: from mx1.redhat.com ([209.132.183.28]:49896) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1gzf8w-0006Xj-SE for qemu-devel@nongnu.org; Fri, 01 Mar 2019 05:11:51 -0500 Date: Fri, 1 Mar 2019 10:11:42 +0000 From: "Dr. David Alan Gilbert" Message-ID: <20190301101141.GB2851@work-vm> References: <20190227164900.16378-1-dgilbert@redhat.com> <20190228062803.GA7471@xz-x1> <20190228114019.GB4970@work-vm> <20190228120103.GD9217@redhat.com> <20190228122822.GD4970@work-vm> <20190301032718.GB7661@xz-x1> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <20190301032718.GB7661@xz-x1> Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH] migration: Cleanup during exit List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Peter Xu Cc: Daniel =?iso-8859-1?Q?P=2E_Berrang=E9?= , alex.bennee@linaro.org, qemu-devel@nongnu.org, quintela@redhat.com * Peter Xu (peterx@redhat.com) wrote: > On Thu, Feb 28, 2019 at 12:28:22PM +0000, Dr. David Alan Gilbert wrote: > > * Daniel P. Berrang=E9 (berrange@redhat.com) wrote: > > > On Thu, Feb 28, 2019 at 11:40:19AM +0000, Dr. David Alan Gilbert wr= ote: > > > > * Peter Xu (peterx@redhat.com) wrote: > > > > > On Wed, Feb 27, 2019 at 04:49:00PM +0000, Dr. David Alan Gilber= t (git) wrote: > > > > > > From: "Dr. David Alan Gilbert" > > > > > >=20 > > > > > > Currently we cleanup the migration object as we exit main aft= er the > > > > > > main_loop finishes; however if there's a migration running th= ings > > > > > > get messy and we can end up with the migration thread still t= rying > > > > > > to access freed structures. > > > > > >=20 > > > > > > We now take a ref to the object around the migration thread i= tself, > > > > > > so the act of dropping the ref during exit doesn't cause us t= o lose > > > > > > the state until the thread quits. > > > > > >=20 > > > > > > Cancelling the migration during migration also tries to get t= he thread > > > > > > to quit. > > > > > >=20 > > > > > > We do this a bit earlier; so hopefully migration gets out of = the way > > > > > > before all the devices etc are freed. > > > > >=20 > > > > > So does it mean that even with the patch it's still possible th= e > > > > > migration thread will be accessing device structs that have alr= eady > > > > > been freed which can still crash QEMU? > > > >=20 > > > > Possibly yes; I'm not sure how to go to the next stage and stop t= hat > > > > case; the consensus seems to be we don't want to explicitly block > > > > during the exit process, so doing a join on the migration thread = doesn't > > > > seem to be wanted. > > >=20 > > > Essentially the many *_cleanup() calls at the end of main() in vl.= c > > > are only ever safe if all background threads have stopped using the > > > resources that are being freed. This isn't the case with migration > > > currently. I also worry about other threads that might be running > > > in QEMU, SPICE in particular as it touchs many device backends. > > >=20 > > > Aborting the migration & joining the migration threads is the natur= al > > > way to address this. Cancelling appears to require the main loop to > > > still be running, so would require main_loop_should_exit() to issue > > > the cancel & return false unless it has completed. This would delay > > > shutdown for as long as it takes migration to abort. > >=20 > > ish - cancelling performs a shutdown(2) on the fd, that should be eno= ugh > > in most cases to kick the migration thread into falling out without > > main loop interaction; I think... >=20 > Dave, could you hint me on when shutdown() will not suffice (say, > we'll hang death if we do join() upon the migration thread)? I think I mostly worry about when we hang in a device's migration code or where that interacts with an external program (e.g. vhost-user). > >=20 > > > FWIW, even the bdrv_close_all() call during shutdown can delay thin= gs > > > for a considerable time if draining the backends isn't a quick op, > > > which could be the case if there are storage problems (blocked loca= l > > > I/O, or interrupted network - rbd/ceph/nfs clients). So I'm not sur= e > > > that stopping migration is inherantly worse than what's already > > > possible with block cleanup, in terms of delaying things. > > >=20 > > > A slightly more hacky approach would be to pthread_cancel() all the > > > migration threads. Normally we'd never use pthread_cancel() as it > > > is incredibly hard to ensure all memory used by the threads is > > > freed. Since we're about to exit the process though, this cleanup > > > does not matter. The risk, however, is that one of the threads is > > > holding a mutex which then blocks the rest of the *cleanup function= s, > > > so this still feels dangerous. > > >=20 > > > Overall to me a clean cancel & join of the migration threads feels > > > like the only safe option, unless we just remove all notion of > > > cleanup from QEMU & delete all the _cleanup functions in main() > > > and let the OS free all memory. > >=20 > > That's actually my preference; I think trying to do clean tidy ups > > here is very racy and doesn't gain much. However, for things like > > storage there may be locks that have to be properly dropped and > > bitmaps and things to be stored/sync'd - so just exiting probably > > isn't safe either. >=20 > I'm unsure about whether I caught the whole idea but I feel like we > can't drop all the cleanup hooks since what if we want to do something > else than "freeing memories"? Or anything that the OS can't do for us > but we want to try to do before the process quits. If that operation > hangs we'll probably face a similar hang issue. Right; things like where the block code wants to ensure that bitmaps/data structtures/buffers are saved out - you need to give them a chance. > Regarding pthread_cancel() - it will only work if the thread reaches > cancellation points, right? Then does it mean that it still cannot > guarantee the thread will quit very soon and we still have chance that > we'll wait forever when join()? One major reason that the thread will > be waiting should be the migration streams but AFAIU shutdown() (as > Dave has mentioned) should solve this problem properly, then I'm > unsuer how pthread_cancel() can help much comparing to shutdown()... If I understand correctly, threads are always cancelable by default? > My understanding (probably not correct, but I'll just speak loud...) > is that we will never have a way to guarantee a process to quit > cleanly all the time because there're really many things that can hang > (I'm assuming we've already solved the MigrationState refcounting > issue by this patch, so I'm assuming we're having a "hang QEMU" rather > than crashing issue). Then IMHO we could simply do whatever we can do > to cleanup everything assuming no hang will happen, and if it really > happens we use SIGKILL which will be the last thing we can do by which > we might lost many things (e.g., unflushed caches in block layer) but > we've tried our best. For migration, it'll be (1) cancel (not > pthread_cancel, but cancel the migration to trigger shutdown() on fds > or whatever) (2) join thread (3) finalize/cleanup data structures. Right so the question I think is just whether it's worth adding the join at the moment which turns us from a small-risk of crash to small-risk of hang. > IMHO SIGKILL is the real de-facto solution for all these issues to me. > And even if with SIGKILL we can still hang.... but I'll assume those > hangs will be kernel problems not QEMU since SIGKILL should be > designed to kill a process without a question. Dave > Thanks, >=20 > --=20 > Peter Xu -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK