From mboxrd@z Thu Jan 1 00:00:00 1970 From: Shreyansh Jain Subject: Re: [PATCH] eal: bus scan and probe never fail Date: Thu, 12 Oct 2017 11:09:20 +0530 Message-ID: References: <20170812102220.27773-1-shreyansh.jain@nxp.com> <2075457.Vvey9mxHue@xps> <10403057.Ll0Xg1E4J1@xps> <83422f57-4c0e-5806-c741-ed5ce10891b0@nxp.com> <83a3c6c6-8d50-8106-7c7f-9b5c8263ce96@nxp.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit Cc: Don Provan , Jan Blunck , Thomas Monjalon , dev , Hemant Agrawal To: Aaron Conole Return-path: Received: from NAM02-BL2-obe.outbound.protection.outlook.com (mail-bl2nam02on0049.outbound.protection.outlook.com [104.47.38.49]) by dpdk.org (Postfix) with ESMTP id 2E49E23A for ; Thu, 12 Oct 2017 07:27:13 +0200 (CEST) In-Reply-To: Content-Language: en-US List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hello Aaron, On Tuesday 10 October 2017 09:30 PM, Aaron Conole wrote: > Shreyansh Jain writes: > >> Hello Don, >> [snip] >>> >>> These practical problems confirm to me that the failure of a bus >>> scan is more of a strategic issue: when asking "which devices can >>> I use?", "none" is a perfectly valid answer that does not seem >>> like an error to me even when a failed bus scan is the reason for >>> that answer. >> >> I agree with this. >> >>> >>> From the application's point of view, the potential error here >>> is that the device it wants to use isn't available. I don't see that >>> either the init function or the probe function will have enough >>> information to understand that application-level problem, so >>> they should leave it to the application to detect it. >> >> I think I understand you comment but just want to cross check again: >> Scan or probe error should simply be ignored by EAL layer and let the >> application take stance when it detects that the device it was looking >> for is missing. Is my understanding correct? >> >> I am trying to come a conclusion so that this patch can either be >> modified or pushed as it is. If the above understanding is correct, I >> don't see any changes required in the patch. > > Does it make sense to introduce a way to query the results of the > various bus types for their status? That way we can give the relevant > information to the application if it wants, and make the bus scanning > code *always* succeed? This version shouldn't be an ABI breakage, > either (confirm?). > > half-baked below (not tested or suitable - just an example): > > --- > diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c > index a30a898..cd1ef1e 100644 > --- a/lib/librte_eal/common/eal_common_bus.c > +++ b/lib/librte_eal/common/eal_common_bus.c > @@ -38,9 +38,23 @@ > > #include "eal_private.h" > > +struct rte_bus_failure { > + struct rte_bus *bus; > + int err; > +}; > + > struct rte_bus_list rte_bus_list = > TAILQ_HEAD_INITIALIZER(rte_bus_list); > > +TAILQ_HEAD(rte_bus_scan_failure_list, rte_bus_failure); > +struct rte_bus_scan_failure_list rte_bus_scan_failure_list = > + TAILQ_HEAD_INITIALIZER(rte_bus_failure); > + > +TAILQ_HEAD(rte_bus_probe_failure_list, rte_bus_failure); > +struct rte_bus_probe_failure_list rte_bus_probe_failure_list = > + TAILQ_HEAD_INITIALIZER(rte_bus_failure); > + > + > void > rte_bus_register(struct rte_bus *bus) > { > @@ -64,6 +78,26 @@ rte_bus_unregister(struct rte_bus *bus) > RTE_LOG(DEBUG, EAL, "Unregistered [%s] bus.\n", bus->name); > } > > +static void > +rte_bus_append_failed_scan(struct rte_bus *bus, int ret) > +{ > + struct rte_bus_failure *f = malloc(sizeof(struct rte_bus_failure)); > + if (!f) abort(); > + f->bus = bus; > + f->ret = ret; > + TAILQ_INSERT_TAIL(&rte_bus_scan_failure_list, f, next); > +} > + > +static void > +rte_bus_append_failed_scan(struct rte_bus *bus, int ret) > +{ > + struct rte_bus_failure *f = malloc(sizeof(struct rte_bus_failure)); > + if (!f) abort(); > + f->bus = bus; > + f->ret = ret; > + TAILQ_INSERT_TAIL(&rte_bus_probe_failure_list, f, next); > +} > + > /* Scan all the buses for registered devices */ > int > rte_bus_scan(void) > @@ -76,13 +110,33 @@ rte_bus_scan(void) > if (ret) { > RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n", > bus->name); > - return ret; > + rte_bus_append_failed_scan(bus, ret); > } > } > > return 0; > } > > +/* Seek through scan failures */ > +void > +rte_bus_scan_errors(rte_bus_error_callback cb) > +{ > + struct rte_bus_failure *f = NULL; > + TAILQ_FOREACH(f, &rte_bus_scan_failure_list, next) { > + cb(f->bus, f->ret); > + } > +} > + > +/* Seek through probe failures */ > +void > +rte_bus_probe_errors(rte_bus_error_callback cb) > +{ > + struct rte_bus_failure *f = NULL; > + TAILQ_FOREACH(f, &rte_bus_probe_failure_list, next) { > + cb(f->bus, f->ret); > + } > +} > + > /* Probe all devices of all buses */ > int > rte_bus_probe(void) > @@ -100,7 +154,7 @@ rte_bus_probe(void) > if (ret) { > RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n", > bus->name); > - return ret; > + rte_bus_append_failed_probe(bus, ret); > } > } > > @@ -109,7 +163,7 @@ rte_bus_probe(void) > if (ret) { > RTE_LOG(ERR, EAL, "Bus (%s) probe failed.\n", > vbus->name); > - return ret; > + rte_bus_append_failed_probe(bus, ret); > } > } > > diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h > index 6fb0834..daddb28 100644 > --- a/lib/librte_eal/common/include/rte_bus.h > +++ b/lib/librte_eal/common/include/rte_bus.h > @@ -231,6 +231,20 @@ void rte_bus_register(struct rte_bus *bus); > */ > void rte_bus_unregister(struct rte_bus *bus); > > +typedef void (*rte_bus_error_callback)(struct rte_bus *bus, int err); > + > +/** > + * Search through all buses, invoking cb for each bus which reports scan > + * error. > + */ > +void rte_bus_scan_errors(rte_bus_error_callback cb); > + > +/** > + * Search through all buses, invoking cb for each bus which reports scan > + * error. > + */ > +void rte_bus_probe_errors(rte_bus_error_callback cb); > + > /** > * Scan all the buses. > * > I am assuming that that aim of this is to have a way so that application can query whether its device of interest is there or not. But, I think this (creating a list of scan errrors) would be overkill. Even if we were to create a list of errors from scan/probe, how would that help an application? Is there some specific use-case that you are hinting at? Application should worry about devices rather than how they are being detected (scan/probe etc). Application can use API like rte_eth_dev_get_port_by_name to query its specific device of interest. If the scan has failed, this API would be sufficient for the application to take counter-measures. Isn't that enough from a DPDK application perspective to move from init to I/O? I am not discounting that there might be some higher use-cases where this list might come of us - but I can't think of one right now and I can't comment on this proposal in absence of that understanding - sorry.