public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	syzbot <syzbot+7937ba6a50bdd00fffdf@syzkaller.appspotmail.com>,
	syzkaller-bugs@googlegroups.com
Cc: linux-kernel@vger.kernel.org, Stephen Boyd <swboyd@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	wuchi <wuchi.zero@gmail.com>
Subject: Re: [PATCH] debugobjects: turn off debug_objects_enabled from debug_objects_oom()
Date: Wed, 07 Jun 2023 00:35:41 +0200	[thread overview]
Message-ID: <871qiokyma.ffs@tglx> (raw)
In-Reply-To: <1af29817-4698-c5ac-cf63-0dad289e740f@I-love.SAKURA.ne.jp>

Tetsuo!

On Mon, May 29 2023 at 23:39, Tetsuo Handa wrote:
> syzbot is reporting false positive ODEBUG message immediately after
> ODEBUG was disabled due to OOM.
>
>   [ 1062.309646][T22911] ODEBUG: Out of memory. ODEBUG disabled
>   [ 1062.886755][ T5171] ------------[ cut here ]------------
>   [ 1062.892770][ T5171] ODEBUG: assert_init not available (active state 0) object: ffffc900056afb20 object type: timer_list hint: process_timeout+0x0/0x40
>
> This race happened because debug_objects_oom() emitted OOM message but did
> not turn off debug_objects_enabled, and debug_print_object() did not check
> debug_objects_enabled when calling WARN().
>
>   CPU 0 [ T5171]                CPU 1 [T22911]
>   --------------                --------------
>   debug_object_assert_init() {
>     if (!debug_objects_enabled)
>       return;
>     db = get_bucket((unsigned long) addr); // Finds a bucket, but...
>                                 debug_objects_oom() {
>                                   pr_warn("Out of memory. ODEBUG disabled\n");
>                                   // all buckets get emptied here, and...
>                                   hlist_move_list(&db->list, &freelist);
>                                 }
>     lookup_object_or_alloc(addr, db, descr, false, true) {
>       lookup_object(addr, b) {
>         return NULL; // this bucket is already empty.
>       }
>       if (!descr->is_static_object || !descr->is_static_object(addr))
>         return ERR_PTR(-ENOENT);
>     }
>     if (!obj) { // obj == ERR_PTR(-ENOENT) because non-static object.
>        debug_objects_oom();
>        return;
>     }
>     debug_print_object(&o, "assert_init") {
>       // False positive due to not checking debug_objects_enabled.
>       WARN(1, KERN_ERR "ODEBUG: %s %s (active state %u) "
>            "object: %p object type: %s hint: %pS\n", ...);
>     }
>   }

The above is undecodable gibberish.

Something like this is completely sufficient:

  CPU 0 [ T5171]			CPU 1 [T22911]
  --------------                	--------------
  debug_object_assert_init() {
    db = get_bucket(addr);
					debug_objects_oom() {
                                  	  pr_warn("Out of memory. ODEBUG disabled\n");
                                  	  // all buckets get emptied here
                                	}
    lookup_object_or_alloc(addr, db, ...)
      // Due to OOM:
      return ERR_PTR(-ENOENT);
    ...

    // Emits assert_init message and warning
    debug_print_object(&o, "assert_init");
  }

And this:

> This race happened because debug_objects_oom() emitted OOM message but did
> not turn off debug_objects_enabled

is completely wrong. Why?

The place where debug_objects_enabled is set to 0 is way before
debug_objects_oom() is invoked. That place _cannot_ invoke
debug_objects_oom() because it holds a hash bucket lock.

There are exactly three places which invoke debug_objects_oom() and for
all three places the pattern is exactly the same:

      lock_bucket();
      obj = lookup_object_or_alloc();
      unlock_bucket();
      if (!obj)
         debug_objects_oom();

The place which clears debug_objects_enabled is unsurprisingly
lookup_object_or_alloc() itself, which _cannot_ invoke
debug_objects_oom() because it is invoked with the hash bucket lock
held. There is even a comment to that effect:

	/* Out of memory. Do the cleanup outside of the locked region */
	debug_objects_enabled = 0;
	return NULL;

So at the point where debug_objects_oom() is invoked
@debug_objects_enabled is already 0.

But you claim that this is required, right?

> @@ -466,6 +466,7 @@ static void debug_objects_oom(void)
>  	unsigned long flags;
>  	int i;
>  
> +	debug_objects_enabled = 0;
>  	pr_warn("Out of memory. ODEBUG disabled\n");

Q: What is setting a variable which is already 0 to 0 solving?
A: Absolutely nothing

Now this:

> @@ -502,10 +503,10 @@ static void debug_print_object(struct de
>  		void *hint = descr->debug_hint ?
>  			descr->debug_hint(obj->object) : NULL;
>  		limit++;
> -		WARN(1, KERN_ERR "ODEBUG: %s %s (active state %u) "
> -				 "object: %p object type: %s hint: %pS\n",
> -			msg, obj_states[obj->state], obj->astate,
> -			obj->object, descr->name, hint);
> +		WARN(debug_objects_enabled, KERN_ERR
> +		     "ODEBUG: %s %s (active state %u) object: %p object type: %s hint: %pS\n",
> +		     msg, obj_states[obj->state], obj->astate,
> +		     obj->object, descr->name, hint);
>  	}

Q: Why is this related to the WARN() itself?
A: It's not related at all

The obvious fix is:

--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -498,6 +498,14 @@ static void debug_print_object(struct de
 	const struct debug_obj_descr *descr = obj->descr;
 	static int limit;
 
+	/*
+	 * OOM handling is asynchronous for performance reasons. So the
+	 * call site might have raced with a concurrent OOM which cleared
+	 * the hash buckets.
+	 */
+	if (!debug_objects_enabled)
+		return;
+
 	if (limit < 5 && descr != descr_test) {
 		void *hint = descr->debug_hint ?
 			descr->debug_hint(obj->object) : NULL;

Along with a understandable changelog, no?

Thanks,

        tglx

  reply	other threads:[~2023-06-06 22:37 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-29 14:31 [syzbot] [kernel?] WARNING: ODEBUG bug in __mod_timer syzbot
2023-05-29 14:39 ` [PATCH] debugobjects: turn off debug_objects_enabled from debug_objects_oom() Tetsuo Handa
2023-06-06 22:35   ` Thomas Gleixner [this message]
2023-06-07 10:19     ` [PATCH v2] debugobjects: recheck debug_objects_enabled before reporting Tetsuo Handa
2023-06-07 12:20       ` [tip: core/debugobjects] debugobjects: Recheck " tip-bot2 for Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871qiokyma.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=peterz@infradead.org \
    --cc=swboyd@chromium.org \
    --cc=syzbot+7937ba6a50bdd00fffdf@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=wuchi.zero@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox