All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dimitar Dimitrov <dinuxbg@gmail.com>
To: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-omap@vger.kernel.org, linux-fbdev@vger.kernel.org
Subject: Re: [RFC PATCH] OMAPDSS: DISPC: Fix IRQ unregister race
Date: Fri, 07 Sep 2012 14:42:11 +0000	[thread overview]
Message-ID: <201209071742.11735.dinuxbg@gmail.com> (raw)
In-Reply-To: <1346938614.2737.68.camel@deskari>

Hi,

On Thursday 06 September 2012 16:36:54 Tomi Valkeinen wrote:
> Hi,
> 
> On Sun, 2012-09-02 at 22:12 +0300, Dimitar Dimitrov wrote:
> > Very rare kernel crashes are reported on a custom OMAP4 board. Kernel
> > panics due to corrupted completion structure while executing
> > 
> > dispc_irq_wait_handler(). Excerpt from kernel log:
> >   Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP
> >   Unable to handle kernel paging request at virtual address 00400130
> >   ...
> >   PC is at 0xebf205bc
> >   LR is at __wake_up_common+0x54/0x94
> >   ...
> >   (__wake_up_common+0x0/0x94)
> >   (complete+0x0/0x60)
> >   (dispc_irq_wait_handler.36902+0x0/0x14)
> >   (omap_dispc_irq_handler+0x0/0x354)
> >   (handle_irq_event_percpu+0x0/0x188)
> >   (handle_irq_event+0x0/0x64)
> >   (handle_fasteoi_irq+0x0/0x10c)
> >   (generic_handle_irq+0x0/0x48)
> >   (asm_do_IRQ+0x0/0xc0)
> > 
> > DISPC IRQ executes callbacks with dispc.irq_lock released. Hence
> > unregister_isr() and DISPC IRQ might be running in parallel on different
> > CPUs. So there is a chance that a callback is executed even though it
> > has been unregistered. As omap_dispc_wait_for_irq_timeout() declares a
> > completion on stack, the dispc_irq_wait_handler() callback might try to
> > access a completion structure that is invalid. This leads to crashes and
> > hangs.
> > 
> > Solution is to divide unregister calls into two sets:
> >   1. Non-strict unregistering of callbacks. Callbacks could safely be
> >   
> >      executed after unregistering them. This is the case with unregister
> >      calls from the IRQ handler itself.
> >   
> >   2. Strict (synchronized) unregistering. Callbacks are not allowed
> >   
> >      after unregistering. This is the case with completion waiting.
> > 
> > The above solution should satisfy one of the original intentions of the
> > driver: callbacks should be able to unregister themselves.
> 
> I think it'd be better to create a new function for the nosync version,
> and keep the old name for the sync version. The reason for this is to
> minimize the amount of changes, as I think this one needs to be applied
> to stable kernel trees also.
My intention was to force all callers to pick sides. In case of rebase issues 
we get link errors instead of rare and subtle run-time races. Still, if you 
think we should leave the old name untouched then I'll change my 
patch.

> 
> Also, I think we need similar one for dsi.c, as it has the same kind of
> irq handling. But with a quick glance only sync version is needed there.
Thanks, I missed that. I'll try to fix and send again.

> 
> However, I'm not quite sure about this approach. The fix makes sense,
> but it makes me think if the irq handling is designed the wrong way.
> 
> While debugging and fixing this, did you think some other irq handling
> approach would be saner?
I tried but could not come up with better approach. The main difficulty is 
that there are two contradicting requirements:
 1. Some callbacks unregister other callbacks, including themselves, from 
DISPC IRQ context.
 2. Some functions expect that once a callback is unregistered it is never 
executed.

Hence it is natural to split callers into two sets. I'm open for suggestions.
> 
>  Tomi

Thanks,
Dimitar

WARNING: multiple messages have this Message-ID (diff)
From: Dimitar Dimitrov <dinuxbg@gmail.com>
To: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-omap@vger.kernel.org, linux-fbdev@vger.kernel.org
Subject: Re: [RFC PATCH] OMAPDSS: DISPC: Fix IRQ unregister race
Date: Fri, 7 Sep 2012 17:42:11 +0300	[thread overview]
Message-ID: <201209071742.11735.dinuxbg@gmail.com> (raw)
In-Reply-To: <1346938614.2737.68.camel@deskari>

Hi,

On Thursday 06 September 2012 16:36:54 Tomi Valkeinen wrote:
> Hi,
> 
> On Sun, 2012-09-02 at 22:12 +0300, Dimitar Dimitrov wrote:
> > Very rare kernel crashes are reported on a custom OMAP4 board. Kernel
> > panics due to corrupted completion structure while executing
> > 
> > dispc_irq_wait_handler(). Excerpt from kernel log:
> >   Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP
> >   Unable to handle kernel paging request at virtual address 00400130
> >   ...
> >   PC is at 0xebf205bc
> >   LR is at __wake_up_common+0x54/0x94
> >   ...
> >   (__wake_up_common+0x0/0x94)
> >   (complete+0x0/0x60)
> >   (dispc_irq_wait_handler.36902+0x0/0x14)
> >   (omap_dispc_irq_handler+0x0/0x354)
> >   (handle_irq_event_percpu+0x0/0x188)
> >   (handle_irq_event+0x0/0x64)
> >   (handle_fasteoi_irq+0x0/0x10c)
> >   (generic_handle_irq+0x0/0x48)
> >   (asm_do_IRQ+0x0/0xc0)
> > 
> > DISPC IRQ executes callbacks with dispc.irq_lock released. Hence
> > unregister_isr() and DISPC IRQ might be running in parallel on different
> > CPUs. So there is a chance that a callback is executed even though it
> > has been unregistered. As omap_dispc_wait_for_irq_timeout() declares a
> > completion on stack, the dispc_irq_wait_handler() callback might try to
> > access a completion structure that is invalid. This leads to crashes and
> > hangs.
> > 
> > Solution is to divide unregister calls into two sets:
> >   1. Non-strict unregistering of callbacks. Callbacks could safely be
> >   
> >      executed after unregistering them. This is the case with unregister
> >      calls from the IRQ handler itself.
> >   
> >   2. Strict (synchronized) unregistering. Callbacks are not allowed
> >   
> >      after unregistering. This is the case with completion waiting.
> > 
> > The above solution should satisfy one of the original intentions of the
> > driver: callbacks should be able to unregister themselves.
> 
> I think it'd be better to create a new function for the nosync version,
> and keep the old name for the sync version. The reason for this is to
> minimize the amount of changes, as I think this one needs to be applied
> to stable kernel trees also.
My intention was to force all callers to pick sides. In case of rebase issues 
we get link errors instead of rare and subtle run-time races. Still, if you 
think we should leave the old name untouched then I'll change my 
patch.

> 
> Also, I think we need similar one for dsi.c, as it has the same kind of
> irq handling. But with a quick glance only sync version is needed there.
Thanks, I missed that. I'll try to fix and send again.

> 
> However, I'm not quite sure about this approach. The fix makes sense,
> but it makes me think if the irq handling is designed the wrong way.
> 
> While debugging and fixing this, did you think some other irq handling
> approach would be saner?
I tried but could not come up with better approach. The main difficulty is 
that there are two contradicting requirements:
 1. Some callbacks unregister other callbacks, including themselves, from 
DISPC IRQ context.
 2. Some functions expect that once a callback is unregistered it is never 
executed.

Hence it is natural to split callers into two sets. I'm open for suggestions.
> 
>  Tomi

Thanks,
Dimitar

  reply	other threads:[~2012-09-07 14:42 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-02 19:12 [RFC PATCH] OMAPDSS: DISPC: Fix IRQ unregister race Dimitar Dimitrov
2012-09-02 19:12 ` Dimitar Dimitrov
2012-09-06 13:36 ` Tomi Valkeinen
2012-09-06 13:36   ` Tomi Valkeinen
2012-09-07 14:42   ` Dimitar Dimitrov [this message]
2012-09-07 14:42     ` Dimitar Dimitrov
2012-09-08 15:05     ` [RFC PATCH v2] OMAPDSS: " Dimitar Dimitrov
2012-09-08 15:05       ` Dimitar Dimitrov
2012-09-10  7:49       ` Tomi Valkeinen
2012-09-10  7:49         ` Tomi Valkeinen
2012-09-10 19:19         ` Dimitar Dimitrov
2012-09-10 19:19           ` Dimitar Dimitrov
2012-09-10 19:34           ` [PATCH v3] " Dimitar Dimitrov
2012-09-10 19:34             ` Dimitar Dimitrov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201209071742.11735.dinuxbg@gmail.com \
    --to=dinuxbg@gmail.com \
    --cc=linux-fbdev@vger.kernel.org \
    --cc=linux-omap@vger.kernel.org \
    --cc=tomi.valkeinen@ti.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.