From mboxrd@z Thu Jan 1 00:00:00 1970 From: Zachary Amsden Subject: Re: 2.6.35-rc1 regression with pvclock and smp guests Date: Fri, 30 Jul 2010 16:39:10 -1000 Message-ID: <4C538CCE.1010104@redhat.com> References: <4C4D4B8B.80006@amd.com> <4C4DDB00.50203@xutrox.com> <4C4F48D0.8090609@xutrox.com> <4C500872.1020809@redhat.com> <4C536F80.5090205@xutrox.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: kvm@vger.kernel.org, Avi Kivity , Glauber Costa To: Arjan Koers <0h61vkll2ly8@xutrox.com> Return-path: Received: from mx1.redhat.com ([209.132.183.28]:30926 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754477Ab0GaCjO (ORCPT ); Fri, 30 Jul 2010 22:39:14 -0400 In-Reply-To: <4C536F80.5090205@xutrox.com> Sender: kvm-owner@vger.kernel.org List-ID: On 07/30/2010 02:34 PM, Arjan Koers wrote: > On 2010-07-28 12:37, Avi Kivity wrote: > >> On 07/28/2010 12:00 AM, Arjan Koers wrote: >> >>> On 2010-07-26 20:59, Arjan Koers wrote: >>> >>> >>>> I ran into the same problem. 2.6.34.1 and 2.6.35-rc6 SMP guest >>>> kernels hang during boot. >>>> >>> It appears that last is way ahead of ret twice. >>> The kernel boots with this debug patch that makes the clock go >>> backwards if the difference is big: >>> >>> last = atomic64_read(&last_value); >>> do { >>> - if (ret< last) >>> - return last; >>> + if (ret< last) { >>> + if ( last - ret< 25000000 ) >>> + return last; >>> + else >>> + printk("pvclock backwards: ret = %llx; last = >>> %llx\n", ret, last); >>> + } >>> last = atomic64_cmpxchg(&last_value, last, ret); >>> } while (unlikely(last != ret)); >>> >>> >>> >>> [ 0.037122] Total of 2 processors activated (11198.08 BogoMIPS). >>> [ 0.037118] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106 >>> [ 0.040000] pvclock backwards: ret = 108373705fe2; last = 210aff61470a >>> >> Zaaaacchhhh?! >> >> > > The lists below show some debug data of the first 99 calls to > pvclock_clocksource_read since the kernel booted. The situation > after the 'do ... while (version != src->version)' loop is > displayed. > > Meaning of the columns: > - src pointer > - shadow.tsc_timestamp > - shadow.system_timestamp > - shadow.version > - native_read_tsc() > - delta = native_read_tsc() - shadow.tsc_timestamp > - offset = scale_delta(delta, shadow.tsc_to_nsec_mul, shadow.tsc_shift) > - ret = shadow.system_timestamp + offset > > Fields left out, because they were the same for all rows: > - shadow.tsc_to_nsec_mul: b6dc43b6 > - shadow.tsc_shift: ffffffff > - shadow.flags: 0 > > Debug log of guest after cold boot of virtual machine: > 1: ffff880001411c00 2107d5a4e b42c01d704c6 8294 210d8d4b5 5b7a67 20abdc b42c01f7b0a2 > 2: ffff880001411c00 2107d5a4e b42c01d704c6 8294 210dc2b61 5ed113 21dd1b b42c01f8e1e1 > 3: ffff880001411c00 21cb0d4a8 b42c0632768f bb70 21cb10a00 3558 130d b42c0632899c > 4: ffff880001411c00 21cb0d4a8 b42c0632768f bb70 21cb11f17 4a6f 1a95 b42c06329124 > 5: ffff880001411c00 21cceaad2 b42c063d1e45 bbd8 21ccec522 1a50 965 b42c063d27aa > 6: ffff880001411c00 21cde0644 b42c06429a42 bc10 21ce25457 44e13 1899a b42c064423dc > 7: ffff880001411c00 21cf905c1 b42c064c3e76 bc46 21cfa182b 1126a 6201 b42c064ca077 > 8: ffff880001411c00 21d088194 b42c0651c601 bc7a 21d089592 13fe 723 b42c0651cd24 > 9: ffff880001411c00 21d1ad073 b42c06584fc3 bcde 21d1b135d 42ea 17e5 b42c065867a8 > 10: ffff880001411c00 21d2a3837 b42c065dd039 bd10 21d2a4825 fee 5b0 b42c065dd5e9 > 11: ffff880001411c00 21d38bab3 b42c0662fea6 bd42 21d38caa1 fee 5b0 b42c06630456 > 12: ffff880001411c00 21d47459b b42c06683029 bd78 21d475517 f7c 587 b42c066835b0 > 13: ffff880001411c00 21d578ce7 b42c066e005f bdb2 21d57d70c 4a25 1a7a b42c066e1ad9 > 14: ffff880001411c00 21d578ce7 b42c066e005f bdb2 21d57d8d6 4bef 1b1e b42c066e1b7d > 15: ffff880001411c00 21d578ce7 b42c066e005f bdb2 21d57da22 4d3b 1b94 b42c066e1bf3 > 16: ffff880001411c00 21d578ce7 b42c066e005f bdb2 21d57fc5e 6f77 27ce b42c066e282d > 17: ffff880001411c00 21d67c77c b42c0673cc0a bde4 21d67d685 f09 55e b42c0673d168 > 18: ffff880001411c00 21d7625b2 b42c0678ed96 be16 21d763488 ed6 54c b42c0678f2e2 > 19: ffff880001411c00 21df3db36 b42c06a5d222 be54 21dfa78b9 69d83 25cd5 b42c06a82ef7 > 20: ffff880001411c00 21df3db36 b42c06a5d222 be54 21dfa7a3f 69f09 25d61 b42c06a82f83 > 21: ffff880001411c00 21df3db36 b42c06a5d222 be54 21dfa7f8b 6a455 25f45 b42c06a83167 > 22: ffff880001411c00 21e3a50ea b42c06befbb1 be58 21e3c1750 1c666 a249 b42c06bf9dfa > 23: ffff880001411c00 21e4bfe47 b42c06c54bc5 be92 21e4c4c61 4e1a 1be4 b42c06c567a9 > 24: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea4b224 20cb6 bb66 b42c06e4f922 > 25: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea52748 281da e53c b42c06e522f8 > 26: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea52907 28399 e5db b42c06e52397 > 27: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea52a76 28508 e65f b42c06e5241b > 28: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea5c86a 322fc 11ec9 b42c06e55c85 > 29: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea60e3a 368cc 137b7 b42c06e57573 > 30: ffff880001411c00 21ea2a56e b42c06e43dbc beca 21ea64dc8 3a85a 14e6a b42c06e58c26 > 31: ffff880001411c00 21ed8a003 b42c06f78496 bf02 21efda28b 250288 d37d2 b42c0704bc68 > 32: ffff880001411c00 21f0e9488 b42c070ac93f bf38 21f0eacdb 1853 8af b42c070ad1ee > 33: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230aeeac6 3e60 1646 b42c0d5636ed > 34: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230af06d0 5a6a 204a b42c0d5640f1 > 35: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b03f25 192bf 8fd6 b42c0d56b07d > 36: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b043c8 19762 917f b42c0d56b226 > 37: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b0526b 1a605 96b8 b42c0d56b75f > 38: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b05632 1a9cc 9812 b42c0d56b8b9 > 39: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b07eaa 1d244 a686 b42c0d56c72d > 40: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b094e9 1e883 ae78 b42c0d56cf1f > 41: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b09962 1ecfc b011 b42c0d56d0b8 > 42: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b10590 2592a d6b4 b42c0d56f75b > 43: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b1090d 25ca7 d7f3 b42c0d56f89a > 44: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b10f99 26333 da49 b42c0d56faf0 > 45: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b11204 2659e db27 b42c0d56fbce > 46: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b1217c 27516 e0ad b42c0d570154 > 47: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b1483f 29bd9 ee85 b42c0d570f2c > 48: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b14ba6 29f40 efbc b42c0d571063 > 49: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b15569 2a903 f338 b42c0d5713df > 50: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b250b3 3a44d 14cf8 b42c0d576d9f > 51: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b254a0 3a83a 14e5f b42c0d576f06 > 52: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b25bd8 3af72 150f3 b42c0d57719a > 53: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b25ec3 3b25d 151fd b42c0d5772a4 > 54: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b5fcab 75045 29cad b42c0d58bd54 > 55: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6013b 754d5 29e4e b42c0d58bef5 > 56: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6b86c 80c06 2dfbc b42c0d590063 > 57: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6bc41 80fdb 2e11a b42c0d5901c1 > 58: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6c4e5 8187f 2e430 b42c0d5904d7 > 59: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b6c776 81b10 2e51b b42c0d5905c2 > 60: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b7f97b 94d15 35266 b42c0d59730d > 61: ffff880001411c00 230aeac66 b42c0d5620a7 c100 230b864af 9b849 378b0 b42c0d599957 > 62: ffff880001411c00 23132e49d b42c0d855884 c16e 231599c3a 26b79d dd3ec b42c0d932c70 > 63: ffff880001411c00 23132e49d b42c0d855884 c16e 231599dbc 26b91f dd476 b42c0d932cfa > 64: ffff880001411c00 23132e49d b42c0d855884 c16e 231599f5f 26bac2 dd50c b42c0d932d90 > 65: ffff880001411c00 231fdf357 b42c0dcddc47 c176 232046a74 6771d 24f1e b42c0dd02b65 > 66: ffff880001411c00 231fdf357 b42c0dcddc47 c176 232046c53 678fc 24fca b42c0dd02c11 > 67: ffff880001411c00 231fdf357 b42c0dcddc47 c176 232046da0 67a49 25040 b42c0dd02c87 > 68: ffff880001411c00 232f4a54e b42c0e25f5e7 c17c 232f62a2d 184df 8ae2 b42c0e2680c9 > 69: ffff880001411c00 232f4a54e b42c0e25f5e7 c17c 232f63478 18f2a 8e8f b42c0e268476 > 70: ffff880001411c00 232f4a54e b42c0e25f5e7 c17c 232f63f61 19a13 9274 b42c0e26885b > 71: ffff880001511c00 20afec946 b42bffe0b604 130 1f890681eacdf 1f88e5d1fe399 b433ab005565 1685faae10b69 > Okay, I think I know what's going on and why Glauber's patch causes problems for you. It looks like your kernel is reading the kvmclock on the AP before it is initialized. Looking at the guest side of things, it seems entirely plausible this could happen. You did mention printk timing causes the bug to appear? Perhaps it is not just coincidental. Printk getting the time might very well call back into the timer code before the clock is initialized, and you've got tons of stuff in cpu_init and friends that are likely to want to printk all kinds of bootup messages. If this were in fact the case, the cmpxchg that was added by Glauber's patch could leap your clock forward to some very uninitialized random value and then you could end up stuck in a timeout loop for days, as you are seeing. Can you try very simply disabling printk timing to see if that might be the source of the bug? In the meantime, what kernel version do you have in the guests? Zach