From mboxrd@z Thu Jan 1 00:00:00 1970 From: Rusty Russell Subject: Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine Date: Wed, 12 Nov 2008 14:09:49 +1030 Message-ID: <200811112256.58467.rusty@rustcorp.com.au> References: <200811102355.42389.rjw@sisk.pl> <20081111105214.GA15645@elte.hu> Mime-Version: 1.0 Content-Transfer-Encoding: base64 Return-path: In-Reply-To: <20081111105214.GA15645-X9Un+BFzKDI@public.gmane.org> Content-Disposition: inline Sender: kernel-testers-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Ingo Molnar Cc: "Rafael J. Wysocki" , Heiko Carstens , Linux Kernel Mailing List , Kernel Testers List , Vegard Nossum , Peter Zijlstra , Oleg Nesterov , Dmitry Adamushko , Andrew Morton T24gVHVlc2RheSAxMSBOb3ZlbWJlciAyMDA4IDIxOjIyOjE0IEluZ28gTW9sbmFyIHdyb3RlOgo+ ICogUmFmYWVsIEouIFd5c29ja2kgPHJqd0BzaXNrLnBsPiB3cm90ZToKPiA+IFNvLCBpdCBldmlk ZW50bHkgZmFpbHMgd2hpbGUgcmUtZW5hYmxpbmcgdGhlIG5vbi1ib290IENQVSBhbmQgbm90Cj4g PiBkdXJpbmcgZGlzYWJsaW5nIGl0IGFzIEkgdGhvdWdodCBiZWZvcmUuCgooUmVzZW5kLCBkdWUg dG8gSFRNTCB2ZXJzaW9uIHByZXZpb3VzbHkpCgpCdXQgd2hhdCBpcyBjYWxsaW5nIHN0b3BfbWFj aGluZSBpbiB0aGF0IHBhdGg/CgpUaGVyZSAqaXMqIGEgcmFjZSwgYnV0IEkgZG9uJ3QgdGhpbmsg aXQgY291bGQgY2F1c2UgdGhpcyAod2Ugc2hvdWxkIG1ha2UgYQpjb3B5IG9mIGFjdGl2ZS5mbnJl dCBpbnNpZGUgdGhlIGxvY2sgYmVmb3JlIHJldHVybmluZyBpdCkuCgpUd28gcGF0Y2hlczogb25l IGZpeGVzIHRoYXQgcmFjZSwgdGhlIG5leHQgYWRkcyBkZWJ1Z2dpbmcgc3Bldy4KCnN0b3BfbWFj aGluZTogZml4IHJhY2Ugd2l0aCByZXR1cm4gdmFsdWUKCldlIHNob3VsZCBub3QgYWNjZXNzIGFj dGl2ZS5mbnJldCBvdXRzaWRlIHRoZSBsb2NrOyBpbiB0aGVvcnkgdGhlIG5leHQKc3RvcF9tYWNo aW5lIGNvdWxkIG92ZXJ3cml0ZSBpdC4KClNpZ25lZC1vZmYtYnk6IFJ1c3R5IFJ1c3NlbGwgPHJ1 c3R5QHJ1c3Rjb3JwLmNvbS5hdT4KLS0tCiBrZXJuZWwvc3RvcF9tYWNoaW5lLmMgfCAgICA1ICsr Ky0tCiAxIGZpbGUgY2hhbmdlZCwgMyBpbnNlcnRpb25zKCspLCAyIGRlbGV0aW9ucygtKQoKZGlm ZiAtciBkN2M5YTE1ZGE2MTUga2VybmVsL3N0b3BfbWFjaGluZS5jCi0tLSBhL2tlcm5lbC9zdG9w X21hY2hpbmUuYwlNb24gTm92IDEwIDA5OjQ3OjQ1IDIwMDggKzExMDAKKysrIGIva2VybmVsL3N0 b3BfbWFjaGluZS5jCVR1ZSBOb3YgMTEgMjM6MTk6NDcgMjAwOCArMTAzMApAQCAtMTEyLDcgKzEx Miw3IEBACiBpbnQgX19zdG9wX21hY2hpbmUoaW50ICgqZm4pKHZvaWQgKiksIHZvaWQgKmRhdGEs IGNvbnN0IGNwdW1hc2tfdCAqY3B1cykKIHsKIAlzdHJ1Y3Qgd29ya19zdHJ1Y3QgKnNtX3dvcms7 Ci0JaW50IGk7CisJaW50IGksIHJldDsKIAogCS8qIFNldCB1cCBpbml0aWFsIHN0YXRlLiAqLwog CW11dGV4X2xvY2soJmxvY2spOwpAQCAtMTM3LDggKzEzNyw5IEBACiAJLyogVGhpcyB3aWxsIHJl bGVhc2UgdGhlIHRocmVhZCBvbiBvdXIgQ1BVLiAqLwogCXB1dF9jcHUoKTsKIAlmbHVzaF93b3Jr cXVldWUoc3RvcF9tYWNoaW5lX3dxKTsKKwlyZXQgPSBhY3RpdmUuZm5yZXQ7CiAJbXV0ZXhfdW5s b2NrKCZsb2NrKTsKLQlyZXR1cm4gYWN0aXZlLmZucmV0OworCXJldHVybiByZXQ7CiB9CiAKIGlu dCBzdG9wX21hY2hpbmUoaW50ICgqZm4pKHZvaWQgKiksIHZvaWQgKmRhdGEsIGNvbnN0IGNwdW1h c2tfdCAqY3B1cykKPT09CmRpZmYgLXIgZmU3ZGQzOWIxY2ZmIGtlcm5lbC9zdG9wX21hY2hpbmUu YwotLS0gYS9rZXJuZWwvc3RvcF9tYWNoaW5lLmMJV2VkIE5vdiAxMiAxNDowNzoxOCAyMDA4ICsx MDMwCisrKyBiL2tlcm5lbC9zdG9wX21hY2hpbmUuYwlXZWQgTm92IDEyIDE0OjA5OjA4IDIwMDgg KzEwMzAKQEAgLTg5LDYgKzg5LDggQEAKIAkJCWNhc2UgU1RPUE1BQ0hJTkVfUlVOOgogCQkJCS8q IE9uIG11bHRpcGxlIENQVXMgb25seSBhIHNpbmdsZSBlcnJvciBjb2RlCiAJCQkJICogaXMgbmVl ZGVkIHRvIHRlbGwgdGhhdCBzb21ldGhpbmcgZmFpbGVkLiAqLworCQkJCXByaW50aygic3RvcF9t YWNoaW5lOiAlaSBydW5uaW5nICVwXG4iLAorCQkJCSAgICAgICBzbXBfcHJvY2Vzc29yX2lkKCks IHNtZGF0YS0+Zm4pOwogCQkJCWVyciA9IHNtZGF0YS0+Zm4oc21kYXRhLT5kYXRhKTsKIAkJCQlp ZiAoZXJyKQogCQkJCQlzbWRhdGEtPmZucmV0ID0gZXJyOwpAQCAtMTA2LDYgKzEwOCw3IEBACiAv KiBDYWxsYmFjayBmb3IgQ1BVcyB3aGljaCBhcmVuJ3Qgc3VwcG9zZWQgdG8gZG8gYW55dGhpbmcu ICovCiBzdGF0aWMgaW50IGNoaWxsKHZvaWQgKnVudXNlZCkKIHsKKwlwcmludGsoInN0b3BfbWFj aGluZTogJWkgY2hpbGxpbmdcbiIsIHNtcF9wcm9jZXNzb3JfaWQoKSk7CiAJcmV0dXJuIDA7CiB9 CiAKQEAgLTEyNiwxNyArMTI5LDIzIEBACiAKIAlzZXRfc3RhdGUoU1RPUE1BQ0hJTkVfUFJFUEFS RSk7CiAKKwlwcmludGsoInN0b3BfbWFjaGluZTogcnVubmluZyBvbiAlaSBjcHVzOlxuIiwgbnVt X3RocmVhZHMpOworCWR1bXBfc3RhY2soKTsKKwogCS8qIFNjaGVkdWxlIHRoZSBzdG9wX2NwdSB3 b3JrIG9uIGFsbCBjcHVzOiBob2xkIHRoaXMgQ1BVIHNvIG9uZQogCSAqIGRvZXNuJ3QgaGl0IHRo aXMgQ1BVIHVudGlsIHdlJ3JlIHJlYWR5LiAqLwogCWdldF9jcHUoKTsKIAlmb3JfZWFjaF9vbmxp bmVfY3B1KGkpIHsKKwkJcHJpbnRrKCJzdG9wX21hY2hpbmU6IHNldHRpbmcgdXAgY3B1ICVpXG4i LCBpKTsKIAkJc21fd29yayA9IHBlcmNwdV9wdHIoc3RvcF9tYWNoaW5lX3dvcmssIGkpOwogCQlJ TklUX1dPUksoc21fd29yaywgc3RvcF9jcHUpOwogCQlxdWV1ZV93b3JrX29uKGksIHN0b3BfbWFj aGluZV93cSwgc21fd29yayk7CiAJfQogCS8qIFRoaXMgd2lsbCByZWxlYXNlIHRoZSB0aHJlYWQg b24gb3VyIENQVS4gKi8KKwlwcmludGsoInN0b3BfbWFjaGluZTogcmVsZWFzaW5nIENQVSAlaVxu Iiwgc21wX3Byb2Nlc3Nvcl9pZCgpKTsKIAlwdXRfY3B1KCk7CiAJZmx1c2hfd29ya3F1ZXVlKHN0 b3BfbWFjaGluZV93cSk7CisJcHJpbnRrKCJzdG9wX21hY2hpbmU6IGRvbmVcbiIpOwogCXJldCA9 IGFjdGl2ZS5mbnJldDsKIAltdXRleF91bmxvY2soJmxvY2spOwogCXJldHVybiByZXQ7CgAK From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752770AbYKLDkP (ORCPT ); Tue, 11 Nov 2008 22:40:15 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751468AbYKLDkB (ORCPT ); Tue, 11 Nov 2008 22:40:01 -0500 Received: from ozlabs.org ([203.10.76.45]:44930 "EHLO ozlabs.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751129AbYKLDkA (ORCPT ); Tue, 11 Nov 2008 22:40:00 -0500 From: Rusty Russell To: Ingo Molnar Subject: Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine Date: Wed, 12 Nov 2008 14:09:49 +1030 User-Agent: KMail/1.10.1 (Linux/2.6.27-7-generic; KDE/4.1.2; i686; ; ) Cc: "Rafael J. Wysocki" , Heiko Carstens , Linux Kernel Mailing List , Kernel Testers List , Vegard Nossum , Peter Zijlstra , Oleg Nesterov , Dmitry Adamushko , Andrew Morton References: <200811102355.42389.rjw@sisk.pl> <20081111105214.GA15645@elte.hu> In-Reply-To: <20081111105214.GA15645@elte.hu> MIME-Version: 1.0 Content-Disposition: inline Message-Id: <200811112256.58467.rusty@rustcorp.com.au> Content-Type: text/plain; charset="iso-8859-1" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by alpha id mAC3eadq024367 On Tuesday 11 November 2008 21:22:14 Ingo Molnar wrote:> * Rafael J. Wysocki wrote:> > So, it evidently fails while re-enabling the non-boot CPU and not> > during disabling it as I thought before. (Resend, due to HTML version previously) But what is calling stop_machine in that path? There *is* a race, but I don't think it could cause this (we should make acopy of active.fnret inside the lock before returning it). Two patches: one fixes that race, the next adds debugging spew. stop_machine: fix race with return value We should not access active.fnret outside the lock; in theory the nextstop_machine could overwrite it. Signed-off-by: Rusty Russell --- kernel/stop_machine.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff -r d7c9a15da615 kernel/stop_machine.c--- a/kernel/stop_machine.c Mon Nov 10 09:47:45 2008 +1100+++ b/kernel/stop_machine.c Tue Nov 11 23:19:47 2008 +1030@@ -112,7 +112,7 @@ int __stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus) { struct work_struct *sm_work;- int i;+ int i, ret; /* Set up initial state. */ mutex_lock(&lock);@@ -137,8 +137,9 @@ /* This will release the thread on our CPU. */ put_cpu(); flush_workqueue(stop_machine_wq);+ ret = active.fnret; mutex_unlock(&lock);- return active.fnret;+ return ret; } int stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus)===diff -r fe7dd39b1cff kernel/stop_machine.c--- a/kernel/stop_machine.c Wed Nov 12 14:07:18 2008 +1030+++ b/kernel/stop_machine.c Wed Nov 12 14:09:08 2008 +1030@@ -89,6 +89,8 @@ case STOPMACHINE_RUN: /* On multiple CPUs only a single error code * is needed to tell that something failed. */+ printk("stop_machine: %i running %p\n",+ smp_processor_id(), smdata->fn); err = smdata->fn(smdata->data); if (err) smdata->fnret = err;@@ -106,6 +108,7 @@ /* Callback for CPUs which aren't supposed to do anything. */ static int chill(void *unused) {+ printk("stop_machine: %i chilling\n", smp_processor_id()); return 0; } @@ -126,17 +129,23 @@ set_state(STOPMACHINE_PREPARE); + printk("stop_machine: running on %i cpus:\n", num_threads);+ dump_stack();+ /* Schedule the stop_cpu work on all cpus: hold this CPU so one * doesn't hit this CPU until we're ready. */ get_cpu(); for_each_online_cpu(i) {+ printk("stop_machine: setting up cpu %i\n", i); sm_work = percpu_ptr(stop_machine_work, i); INIT_WORK(sm_work, stop_cpu); queue_work_on(i, stop_machine_wq, sm_work); } /* This will release the thread on our CPU. */+ printk("stop_machine: releasing CPU %i\n", smp_processor_id()); put_cpu(); flush_workqueue(stop_machine_wq);+ printk("stop_machine: done\n"); ret = active.fnret; mutex_unlock(&lock); return ret;{.n++%ݶw{.n+{G{ayʇڙ,jfhz_(階ݢj"mG?&~iOzv^m ?I