From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <akpm@linux-foundation.org>
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
 [140.211.169.12]) by ozlabs.org (Postfix) with ESMTP id B20F22C031A
 for <linuxppc-dev@lists.ozlabs.org>; Thu,  4 Oct 2012 08:29:55 +1000 (EST)
Date: Wed, 3 Oct 2012 15:29:53 -0700
From: Andrew Morton <akpm@linux-foundation.org>
To: Alexandre Bounine <alexandre.bounine@idt.com>
Subject: Re: [PATCH 3/5] rapidio: run discovery as an asynchronous process
Message-Id: <20121003152953.79aecece.akpm@linux-foundation.org>
In-Reply-To: <1349291923-22860-4-git-send-email-alexandre.bounine@idt.com>
References: <1349291923-22860-1-git-send-email-alexandre.bounine@idt.com>
 <1349291923-22860-4-git-send-email-alexandre.bounine@idt.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org

On Wed,  3 Oct 2012 15:18:41 -0400
Alexandre Bounine <alexandre.bounine@idt.com> wrote:

> Modify the mport initialization routine to run the RapidIO discovery process
> asynchronously. This allows an arbitrary order of enumerating and
> discovering ports in systems with multiple RapidIO controllers without
> creating a deadlock if an enumerator port is registered after a
> discovering one.
> 
> Making the netID match the mportID ensures consistent net ID assignment in
> multiport RapidIO systems with an asynchronous discovery process (the global
> counter implementation is affected by a race between threads).
> 
>
> ...
>
> +static void __devinit disc_work_handler(struct work_struct *_work)
> +{
> +	struct rio_disc_work *work = container_of(_work,
> +						  struct rio_disc_work, work);

There's a nice simple way to avoid such ugliness:

--- a/drivers/rapidio/rio.c~rapidio-run-discovery-as-an-asynchronous-process-fix
+++ a/drivers/rapidio/rio.c
@@ -1269,9 +1269,9 @@ struct rio_disc_work {
 
 static void __devinit disc_work_handler(struct work_struct *_work)
 {
-	struct rio_disc_work *work = container_of(_work,
-						  struct rio_disc_work, work);
+	struct rio_disc_work *work;
 
+	work = container_of(_work, struct rio_disc_work, work);
 	pr_debug("RIO: discovery work for mport %d %s\n",
 		 work->mport->id, work->mport->name);
 	rio_disc_mport(work->mport);
_
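
For anyone unfamiliar with what container_of() does in that handler, here
is a minimal userspace sketch.  The macro is a simplified stand-in for the
kernel's version (which adds type checking), and the struct layouts are
assumptions made for illustration, not copied from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the kernel types used in the patch. */
struct work_struct { int pending; };
struct rio_mport { int id; };

/* Assumed layout: the work item embeds a work_struct and carries the port. */
struct rio_disc_work {
	struct work_struct work;
	struct rio_mport *mport;
};

/* Same shape as disc_work_handler(): declare first, assign on its own line. */
static int disc_work_mport_id(struct work_struct *_work)
{
	struct rio_disc_work *work;

	/* Step back from the embedded member to the enclosing structure. */
	work = container_of(_work, struct rio_disc_work, work);
	return work->mport->id;
}
```

The handler only ever receives a pointer to the embedded work_struct, so
container_of() is what gets it back to the rio_disc_work and its mport.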

> +	pr_debug("RIO: discovery work for mport %d %s\n",
> +		 work->mport->id, work->mport->name);
> +	rio_disc_mport(work->mport);
> +
> +	kfree(work);
> +}
> +
>  int __devinit rio_init_mports(void)
>  {
>  	struct rio_mport *port;
> +	struct rio_disc_work *work;
> +	int no_disc = 0;
>  
>  	list_for_each_entry(port, &rio_mports, node) {
>  		if (port->host_deviceid >= 0)
>  			rio_enum_mport(port);
> -		else
> -			rio_disc_mport(port);
> +		else if (!no_disc) {
> +			if (!rio_wq) {
> +				rio_wq = alloc_workqueue("riodisc", 0, 0);
> +				if (!rio_wq) {
> +					pr_err("RIO: unable allocate rio_wq\n");
> +					no_disc = 1;
> +					continue;
> +				}
> +			}
> +
> +			work = kzalloc(sizeof *work, GFP_KERNEL);
> +			if (!work) {
> +				pr_err("RIO: no memory for work struct\n");
> +				no_disc = 1;
> +				continue;
> +			}
> +
> +			work->mport = port;
> +			INIT_WORK(&work->work, disc_work_handler);
> +			queue_work(rio_wq, &work->work);
> +		}
> +	}

I'm having a lot of trouble with `no_disc'.  AFAICT what it does is
cease running async discovery for any remaining devices if the workqueue
allocation failed (vaguely reasonable) or if the allocation of a single
work item failed (incomprehensible).

But if we don't run discovery, the subsystem is permanently busted for
at least some devices, isn't it?

And this code is basically untestable unless the programmer does
deliberate fault injection, which makes it pretty much unmaintainable.

So...  if I haven't totally misunderstood, I suggest a rethink is in
order?
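
One possible direction, sketched in userspace C under stated assumptions:
count the discovering ports first and allocate all of their work items in
a single call, so there is exactly one failure point and either every port
gets discovery set up or none does.  The struct layouts and
prepare_disc_work() are hypothetical names for illustration only; in the
kernel the calloc() would presumably be a kcalloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct rio_mport {
	int id;
	int host_deviceid;	/* >= 0: enumerating port, < 0: discovering port */
};

struct rio_disc_work {
	struct rio_mport *mport;
};

/*
 * Prepare one work item per discovering port with a single allocation.
 * Returns the number of items prepared (0 if no port needs discovery),
 * or -1 if the allocation failed, in which case nothing was set up.
 */
static int prepare_disc_work(struct rio_mport *ports, size_t nports,
			     struct rio_disc_work **out)
{
	struct rio_disc_work *work;
	size_t i, n = 0;

	*out = NULL;

	/* First pass: count the ports that need discovery. */
	for (i = 0; i < nports; i++)
		if (ports[i].host_deviceid < 0)
			n++;
	if (!n)
		return 0;

	/* The only allocation, hence the only failure path. */
	work = calloc(n, sizeof(*work));
	if (!work)
		return -1;

	/* Second pass: bind each work item to its port. */
	n = 0;
	for (i = 0; i < nports; i++)
		if (ports[i].host_deviceid < 0)
			work[n++].mport = &ports[i];

	*out = work;
	return (int)n;
}
```

With this shape the caller needs no flag carried across loop iterations: it
checks one return value, and on failure it reports once and queues nothing.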

> +	if (rio_wq) {
> +		pr_debug("RIO: flush discovery workqueue\n");
> +		flush_workqueue(rio_wq);
> +		pr_debug("RIO: flush discovery workqueue finished\n");
> +		destroy_workqueue(rio_wq);
>  	}