linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Mack <zonque@gmail.com>
To: Takashi Iwai <tiwai@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>,
	Christof Meerwald <cmeerw@cmeerw.org>,
	"Artem S. Tashkinov" <t.artem@lycos.com>,
	Kernel development list <linux-kernel@vger.kernel.org>,
	USB list <linux-usb@vger.kernel.org>
Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website
Date: Thu, 08 Nov 2012 01:42:59 +0100	[thread overview]
Message-ID: <509B0013.6080508@gmail.com> (raw)
In-Reply-To: <s5hzk2tfcg8.wl%tiwai@suse.de>

On 07.11.2012 20:19, Takashi Iwai wrote:
> At Wed, 7 Nov 2012 12:34:43 -0500 (EST),
> Alan Stern wrote:
>>
>> On Mon, 5 Nov 2012, Christof Meerwald wrote:
>>
>>> BTW, I have been able to reproduce the problem on a completely
>>> different machine (also running Ubuntu 12.10, but different hardware).
>>> The important thing appears to be that the USB audio device is
>>> connected via a USB 2.0 hub (and then using the test code posted in
>>> http://pastebin.com/aHGe1S1X specifying the audio device as
>>> "plughw:Set" (or whatever it's called) seems to trigger the freeze).
>>
>> Christof: Thank you for that reference, it was a big help.  After
>> crashing my system many times I have tracked the problem, at least in
>> part.  The patch below should prevent your system from freezing.
>>
>>
>> Takashi: It turns out the the problem is triggered when the audio
>> subsystem calls snd_usb_endpoint_stop() with wait == 0 and then calls
>> snd_usb_endpoint_start().  Since the driver doesn't wait for the
>> outstanding URBs to finish, it tries to submit them again while they
>> are still active.
>>
>> Normally the USB core would realize this and fail the submission, but a
>> bug in ehci-hcd prevented this from happening.  (That bug is what the
>> patch below fixes.)  The URB gets added to the active list twice,
>> resulting in list corruption and an oops in interrupt context, which
>> freezes the system.
>>
>> The user program that triggers the problem basically looks like this:
>>
>>   snd_pcm_prepare(rec_pcm);
>>   snd_pcm_start(rec_pcm);
>>   snd_pcm_drop(rec_pcm);
>>
>>   snd_pcm_prepare(rec_pcm);
>>   snd_pcm_start(rec_pcm);
>>
>> The snd_pcm_drop call unlinks the URBs but does not wait for them to
>> finish.  Then the second snd_pcm_start call submits the URBs before
>> they have finished.


Thanks for investigating on this and to everyone who so quickyl tested
the provided patch. Seems like we got the right idea where the problem
really is.

However, the proposed patch seems wrong to me (see below).

>> What is the right solution for this problem?
> 
> How about the patch below?  (It's for 3.6, and won't be applied cleanly
> to 3.7, but easy to adapt.)
> 
> 
> Takashi
> 
> ---
> diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
> index d9de667..38830e2 100644
> --- a/sound/usb/endpoint.c
> +++ b/sound/usb/endpoint.c
> @@ -35,6 +35,7 @@
>  
>  #define EP_FLAG_ACTIVATED	0
>  #define EP_FLAG_RUNNING		1
> +#define EP_FLAG_STOPPING	2
>  
>  /*
>   * snd_usb_endpoint is a model that abstracts everything related to an
> @@ -502,10 +503,19 @@ static int wait_clear_urbs(struct snd_usb_endpoint *ep)
>  	if (alive)
>  		snd_printk(KERN_ERR "timeout: still %d active urbs on EP #%x\n",
>  					alive, ep->ep_num);
> +	clear_bit(EP_FLAG_STOPPING, &ep->flags);
>  
>  	return 0;
>  }
>  
> +/* wait until urbs are really dropped */
> +void snd_usb_endpoint_sync_stop(struct snd_usb_endpoint *ep)
> +{
> +	if (test_bit(EP_FLAG_STOPPING, &ep->flags))
> +		wait_clear_urbs(ep);
> +}
> +
> +
>  /*
>   * unlink active urbs.
>   */
> @@ -913,6 +923,8 @@ void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep,
>  
>  		if (wait)
>  			wait_clear_urbs(ep);
> +		else
> +			set_bit(EP_FLAG_STOPPING, &ep->flags);
>  	}
>  }
>  
> diff --git a/sound/usb/endpoint.h b/sound/usb/endpoint.h
> index cbbbdf2..c1540a4 100644
> --- a/sound/usb/endpoint.h
> +++ b/sound/usb/endpoint.h
> @@ -16,6 +16,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
>  int  snd_usb_endpoint_start(struct snd_usb_endpoint *ep, int can_sleep);
>  void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep,
>  			   int force, int can_sleep, int wait);
> +void snd_usb_endpoint_sync_stop(struct snd_usb_endpoint *ep);
>  int  snd_usb_endpoint_activate(struct snd_usb_endpoint *ep);
>  int  snd_usb_endpoint_deactivate(struct snd_usb_endpoint *ep);
>  void snd_usb_endpoint_free(struct list_head *head);
> diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
> index f782ce1..aee3ab0 100644
> --- a/sound/usb/pcm.c
> +++ b/sound/usb/pcm.c
> @@ -546,6 +546,11 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
>  	if (snd_BUG_ON(!subs->data_endpoint))
>  		return -EIO;
>  
> +	if (subs->sync_endpoint)
> +		snd_usb_endpoint_sync_stop(subs->sync_endpoint);
> +	if (subs->data_endpoint)
> +		snd_usb_endpoint_sync_stop(subs->data_endpoint);

We can't simply stop both endpoints in the prepare callback. The essence
of the new reference-counting system is that we can use endpoints from
multiple contexts, and the logic inside endpoint.c will care about when
to start up and take down the urbs. The idea here is that endoints can
be run for many purposes, and the new implementation that was added
allows capture endpoints to run purely as timing reference for playback.

This bug needs to be fixed in the ehci controller, or we need some other
solution in the snd-usb-audio driver. I'll do some test once I'm back
from ELC.


Daniel

  parent reply	other threads:[~2012-11-08  0:43 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-20 12:06 A reliable kernel panic (3.6.2) and system crash when visiting a particular website Artem S. Tashkinov
2012-10-20 16:27 ` Borislav Petkov
2012-10-20 17:41   ` Artem S. Tashkinov
2012-10-20 18:04     ` Borislav Petkov
2012-10-20 20:57       ` Artem S. Tashkinov
2012-10-20 22:00       ` Artem S. Tashkinov
2012-10-20 20:32     ` Pavel Machek
2012-10-20 22:58       ` Borislav Petkov
2012-10-20 23:15         ` Artem S. Tashkinov
2012-10-21  0:24           ` Borislav Petkov
2012-10-21  1:57             ` Artem S. Tashkinov
2012-10-21 11:08               ` Borislav Petkov
2012-10-21 11:59                 ` Artem S. Tashkinov
2012-10-21 12:03                   ` Daniel Mack
2012-10-21 12:30                     ` Artem S. Tashkinov
2012-10-21 14:21                       ` was: " Daniel Mack
2012-10-21 14:57                         ` Artem S. Tashkinov
2012-10-21 15:22                           ` Daniel Mack
2012-10-21 15:28                             ` Alan Stern
2012-10-21 12:12                   ` Daniel Mack
2012-10-21 15:23                   ` Re: Re: Re: " Alan Stern
2012-10-21 22:20                     ` A strange Linux 3.6 bug: corrupted page table Artem S. Tashkinov
2012-10-21 17:03                   ` Re: Re: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website Borislav Petkov
2012-10-21 19:49                     ` Artem S. Tashkinov
2012-10-21 19:54                       ` Daniel Mack
2012-10-21 20:43                         ` Artem S. Tashkinov
2012-10-21 21:00                           ` Daniel Mack
2012-10-21 20:36                       ` Re: Re: Re: Re: " Borislav Petkov
2012-10-22 15:17                       ` Alan Stern
2012-10-22 15:30                         ` Daniel Mack
2012-10-22 15:54                           ` Alan Stern
2012-10-22 17:30                             ` Artem S. Tashkinov
2012-10-22 18:01                               ` Alan Stern
2012-10-21  2:19           ` Alan Stern
2012-10-21 10:34           ` Daniel Mack
2012-10-21 11:59             ` Daniel Mack
2012-11-03 14:10           ` Christof Meerwald
2012-11-03 14:16             ` Daniel Mack
2012-11-03 14:28               ` Sven-Haegar Koch
2012-11-05 19:13               ` Christof Meerwald
2012-11-07 17:34                 ` Alan Stern
2012-11-07 19:19                   ` Takashi Iwai
2012-11-07 20:37                     ` Alan Stern
2012-11-08  6:48                       ` Takashi Iwai
2012-11-07 20:46                     ` Christof Meerwald
2012-11-07 20:59                     ` Artem S. Tashkinov
2012-11-08  0:42                     ` Daniel Mack [this message]
2012-11-08  6:43                       ` Takashi Iwai
2012-11-08  7:31                         ` Daniel Mack
2012-11-08  8:09                           ` Takashi Iwai
2012-11-08 15:55                             ` Alan Stern
2012-11-08 15:58                               ` Takashi Iwai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=509B0013.6080508@gmail.com \
    --to=zonque@gmail.com \
    --cc=cmeerw@cmeerw.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=stern@rowland.harvard.edu \
    --cc=t.artem@lycos.com \
    --cc=tiwai@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).