All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bill Pringlemeir <bpringlemeir@nbsps.com>
To: "Wiedemer, Thorsten (Lawo AG)" <Thorsten.Wiedemer@lawo.com>
Cc: Richard Weinberger <richard@nod.at>,
	"linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>
Subject: Re: AW: AW: AW: UBI leb_write_unlock NULL pointer Oops (continuation)
Date: Fri, 21 Feb 2014 12:53:28 -0500	[thread overview]
Message-ID: <877g8ojqsn.fsf@nbsps.com> (raw)
In-Reply-To: <D7B1B5F4F3F27A4CB073BF422331203F2A250B54E6@Exchange1.lawo.de> (Thorsten Wiedemer's message of "Fri, 21 Feb 2014 09:55:28 +0100")

On 21 Feb 2014, Thorsten.Wiedemer@lawo.com wrote:

> Here I have the last parts of the two hopefully "valuable" traces. The
> pieces show what happens between the leb_write_lock() und the
> leb_write unlock() of the process which triggers the oops.  If this is
> not enough, I can provide also more ...  Take care of the PIDs, there
> are several processes with the same name running.  Boths traces did
> not result from identical test cases, so there are some different
> processes running.

> The part of the first trace fits in one paste:

> http://pastebin.com/TL3yNVcw

I am not sure if this is a complete trace?  I don't understand why we
would start off with 'leb_write_unlock'...  However,

$ grep -E 'leb|ltree' ubi.crash.raw # slight edits
  sync-7493    0.... 1348758715us : leb_write_unlock <-ubi_eba_write_leb
  sync-7493    0.... 1348758719us : add_preempt_count <-leb_write_unlock
  sync-7493    0...1 1348758724us : up_write <-leb_write_unlock
  sync-7493    0...1 1348758739us : kfree <-leb_write_unlock
  sync-7493    0...1 1348758746us : sub_preempt_count <-leb_write_unlock
  sync-7493    0.... 1348758954us : ubifs_leb_write <-ubifs_wbuf_sync_nolock
  sync-7493    0.... 1348758960us : ubi_leb_write <-ubifs_leb_write
  sync-7493    0.... 1348758965us : ubi_eba_write_leb <-ubi_leb_write
  sync-7493    0.... 1348758969us : leb_write_lock <-ubi_eba_write_leb
  sync-7493    0.... 1348758973us : ltree_add_entry <-leb_write_lock
  sync-7493    0.... 1348758977us : kmem_cache_alloc_trace <-ltree_add_entry
  sync-7493    0.... 1348758983us : add_preempt_count <-ltree_add_entry
  sync-7493    0...1 1348758989us : sub_preempt_count <-ltree_add_entry
  sync-7493    0.... 1348758994us : kfree <-ltree_add_entry
  sync-7493    0.... 1348758998us : down_write <-leb_write_lock
  sync-7493    0.... 1348759010us : ubi_io_write <-ubi_eba_write_leb
< many reschedules, but  sync-7493 still active.>
   sync-7492    0.... 1348761397us : leb_write_unlock <-ubi_eba_write_leb
   sync-7492    0.... 1348761400us : add_preempt_count <-leb_write_unlock
   sync-7492    0...1 1348761406us : up_write <-leb_write_unlock
   sync-7492    0...1 1348761419us : kfree <-leb_write_unlock
   sync-7492    0...1 1348761425us : sub_preempt_count <-leb_write_unlock
   sync-7492    0.... 1348761580us : ubifs_leb_write <-ubifs_write_node
   sync-7492    0.... 1348761585us : ubi_leb_write <-ubifs_leb_write
   sync-7492    0.... 1348761591us : ubi_eba_write_leb <-ubi_leb_write
   sync-7492    0.... 1348761595us : leb_write_lock <-ubi_eba_write_leb
   sync-7492    0.... 1348761599us : ltree_add_entry <-leb_write_lock
   sync-7492    0.... 1348761603us : kmem_cache_alloc_trace <-ltree_add_entry
   sync-7492    0.... 1348761609us : add_preempt_count <-ltree_add_entry
   sync-7492    0...1 1348761615us : sub_preempt_count <-ltree_add_entry
   sync-7492    0.... 1348761619us : kfree <-ltree_add_entry
   sync-7492    0.... 1348761622us : down_write <-leb_write_lock
   sync-7492    0.... 1348761635us : ubi_io_write <-ubi_eba_write_leb
   sync-7492    0.... 1348771081us : leb_write_unlock <-ubi_eba_write_leb
   sync-7492    0.... 1348771084us : add_preempt_count <-leb_write_unlock
   sync-7492    0...1 1348771090us : up_write <-leb_write_unlock
   sync-7492    0...1 1348771102us : kfree <-leb_write_unlock
   sync-7492    0...1 1348771109us : sub_preempt_count <-leb_write_unlock
   sync-7492    0.... 1348771269us : ubifs_leb_write <-ubifs_write_node
   sync-7492    0.... 1348771275us : ubi_leb_write <-ubifs_leb_write
   sync-7492    0.... 1348771280us : ubi_eba_write_leb <-ubi_leb_write
   sync-7492    0.... 1348771285us : leb_write_lock <-ubi_eba_write_leb
   sync-7492    0.... 1348771289us : ltree_add_entry <-leb_write_lock
   sync-7492    0.... 1348771292us : kmem_cache_alloc_trace <-ltree_add_entry
   sync-7492    0.... 1348771299us : add_preempt_count <-ltree_add_entry
   sync-7492    0...1 1348771304us : sub_preempt_count <-ltree_add_entry
   sync-7492    0.... 1348771308us : kfree <-ltree_add_entry
   sync-7492    0.... 1348771311us : down_write <-leb_write_lock
   sync-7492    0.... 1348771324us : ubi_io_write <-ubi_eba_write_leb
< many reschedules and sync-7493 still in ubi_eba_write_leb>
  sync-7493    0.... 1348781410us : leb_write_unlock <-ubi_eba_write_leb
  sync-7493    0.... 1348781413us : add_preempt_count <-leb_write_unlock
  sync-7493    0...1 1348781418us : up_write <-leb_write_unlock
...

I am not sure that we can know which LEBs are involved.  However, I see
cases for a double free and other issues.  I think we need
'atomic_dec_and_test()' on the leb->users and to check this in the tree
lookup.  For instance, in 'leb_write_unlock()', the call to 'up_write()'
can cause a reschedule.  Say we enter 'leb_write_unlock()' with 'users =
2'.  The count will decrement to one and then we may reschedule in
'up_write'.  The 2nd UBI task may decrement users and inspect the count
and call 'kfree'.  Then we return to the original 'leb_write_unlock()',
we will inspect 'le->users' and it will be zero.

static void leb_write_unlock(struct ubi_device *ubi, int vol_id, int lnum)
{
	struct ubi_ltree_entry *le;

	spin_lock(&ubi->ltree_lock);
	le = ltree_lookup(ubi, vol_id, lnum);
	le->users -= 1;
	ubi_assert(le->users >= 0);
	up_write(&le->mutex);         /* can cause reschedule */
	if (le->users == 0) {
		rb_erase(&le->rb, &ubi->ltree);
		kfree(le);
	}
	spin_unlock(&ubi->ltree_lock);
}

Maybe there are better kernel design's/APIs that can wrap this.
However, I don't think the 'ubi_ltree_entry' is perfectly race free.

On the IMX series, the MTD driver does a 'read-modify-write' to support
sub-pages.  The driver can not write to a subpage at a time.  The MTD
driver is also interrupt driven and several reschedules happened during
ubi_io_write().

I don't think the spin_lock() will do anything on a UP system like the
ARM926's that have encountered this issue.

My compiler is reloading 'le->users' in the 'if' path.  It seems odd
that the 'wait_list' in the rwsem would always be NULL in this case.
However, the wake may have rescheduled the blocked task and the stack
was re-used and set to NULL/zero.  I think races with 'le->users' are an
issue; I don't know that it is this issue.  I think it is also possible
for the 'ltree_lookup()' to return an entry with 'le->user = 0', which
would indicate someone is about to use 'kfree()'.  I guess this is what
Artem suspected?

Fwiw,
Bill Pringlemeir.

  parent reply	other threads:[~2014-02-21 18:01 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-03  8:51 UBI leb_write_unlock NULL pointer Oops (continuation) Wiedemer, Thorsten (Lawo AG)
2014-02-03  9:38 ` Richard Weinberger
2014-02-03 10:31   ` AW: " Wiedemer, Thorsten (Lawo AG)
2014-02-03 11:02     ` Richard Weinberger
2014-02-03 12:51       ` AW: " Wiedemer, Thorsten (Lawo AG)
2014-02-03 13:56         ` Richard Weinberger
2014-02-04  7:22           ` Artem Bityutskiy
2014-02-04  7:46             ` Richard Weinberger
2014-02-04  7:54               ` Artem Bityutskiy
2014-02-04 15:45                 ` UBI leb_write_unlock NULL pointer Oops (continuation) on ARM926 Bill Pringlemeir
2014-02-04 15:45                   ` Bill Pringlemeir
2014-02-04 17:05                   ` Bill Pringlemeir
2014-02-04 17:05                     ` Bill Pringlemeir
2014-02-04 19:57                     ` Bill Pringlemeir
2014-02-04 19:57                       ` Bill Pringlemeir
2014-02-04 20:07                       ` Richard Weinberger
2014-02-04 20:07                         ` Richard Weinberger
2014-02-04 17:01           ` AW: UBI leb_write_unlock NULL pointer Oops (continuation) Wiedemer, Thorsten (Lawo AG)
2014-02-04 17:52             ` Wiedemer, Thorsten (Lawo AG)
2014-02-05  8:29             ` Richard Weinberger
2014-02-05 21:45               ` Bill Pringlemeir
2014-02-05 22:13                 ` Richard Weinberger
2014-02-05 22:23                   ` Bill Pringlemeir
2014-02-06 13:05                     ` AW: " Wiedemer, Thorsten (Lawo AG)
2014-02-06 16:00                       ` Bill Pringlemeir
2014-02-11  8:01               ` Wiedemer, Thorsten (Lawo AG)
2014-02-11 15:25                 ` Bill Pringlemeir
2014-02-12 15:18                   ` AW: " Wiedemer, Thorsten (Lawo AG)
2014-02-12 17:46                     ` Richard Weinberger
2014-02-12 18:11                     ` AW: AW: " Bill Pringlemeir
2014-02-12 18:21                       ` Bill Pringlemeir
2014-02-12 20:48                         ` Richard Weinberger
2014-02-14 17:11                           ` Bill Pringlemeir
2014-02-18  8:25                           ` Ziegler, Emanuel (Lawo AG)
2014-02-19 11:09                             ` Ziegler, Emanuel (Lawo AG)
2014-02-20 15:21                       ` AW: AW: AW: " Wiedemer, Thorsten (Lawo AG)
2014-02-20 17:26                         ` Bill Pringlemeir
2014-02-20 17:38                           ` Bill Pringlemeir
2014-02-21  8:55                         ` AW: AW: AW: " Wiedemer, Thorsten (Lawo AG)
2014-02-21  9:28                           ` Quiniou, Benoit (Lawo AG)
2014-02-21 17:53                           ` Bill Pringlemeir [this message]
2014-02-21 18:12                             ` Richard Weinberger
2014-02-21 19:45                               ` Bill Pringlemeir
2014-02-22  0:49                                 ` Bill Pringlemeir
2014-02-22  8:32                                   ` Richard Weinberger
2014-02-24 15:09                                     ` Bill Pringlemeir
2014-02-24 15:36                                       ` Richard Weinberger
2014-02-24 15:45                                         ` Bill Pringlemeir
2014-02-24 15:48                                           ` Bill Pringlemeir
2014-03-05 20:57                                             ` Richard Weinberger
2014-03-05 21:30                                               ` Bill Pringlemeir
2014-03-05 21:42                                                 ` Bill Pringlemeir
2014-03-05 23:11                                                   ` Richard Weinberger
2014-03-05 23:12                                                   ` Richard Weinberger
2014-02-04 19:49     ` Andrew Ruder
2014-02-05  8:39       ` AW: " Wiedemer, Thorsten (Lawo AG)
2014-02-05 20:13         ` Andrew Ruder
2015-10-16 12:17 ` Wojciech Nizinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877g8ojqsn.fsf@nbsps.com \
    --to=bpringlemeir@nbsps.com \
    --cc=Thorsten.Wiedemer@lawo.com \
    --cc=linux-mtd@lists.infradead.org \
    --cc=richard@nod.at \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.