From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752911AbdLENnw (ORCPT ); Tue, 5 Dec 2017 08:43:52 -0500 Received: from mga03.intel.com ([134.134.136.65]:33233 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752126AbdLENnu (ORCPT ); Tue, 5 Dec 2017 08:43:50 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.45,364,1508828400"; d="scan'208";a="184222558" Subject: Re: [alsa-devel] [PATCH v4 06/15] soundwire: Add IO transfer To: Vinod Koul Cc: ALSA , Charles Keepax , Sudheer Papothi , Takashi , Greg Kroah-Hartman , plai@codeaurora.org, LKML , patches.audio@intel.com, Mark , srinivas.kandagatla@linaro.org, Sagar Dharia , alan@linux.intel.com References: <1512122177-2889-1-git-send-email-vinod.koul@intel.com> <1512122177-2889-7-git-send-email-vinod.koul@intel.com> <40d00228-2891-6e25-4c6f-789518a0e2c2@linux.intel.com> <20171203170418.GN32417@localhost> <20171205063114.GL32417@localhost> From: Pierre-Louis Bossart Message-ID: Date: Tue, 5 Dec 2017 07:43:46 -0600 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:52.0) Gecko/20100101 Thunderbird/52.5.0 MIME-Version: 1.0 In-Reply-To: <20171205063114.GL32417@localhost> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 12/5/17 12:31 AM, Vinod Koul wrote: > On Sun, Dec 03, 2017 at 09:01:41PM -0600, Pierre-Louis Bossart wrote: >> On 12/3/17 11:04 AM, Vinod Koul wrote: >>> On Fri, Dec 01, 2017 at 05:27:31PM -0600, Pierre-Louis Bossart wrote: > > Sorry looks like I missed replying to this one earlier. > >>>>> +static inline int find_response_code(enum sdw_command_response resp) >>>>> +{ >>>>> + switch (resp) { >>>>> + case SDW_CMD_OK: >>>>> + return 0; >>>>> + >>>>> + case SDW_CMD_IGNORED: >>>>> + return -ENODATA; >>>>> + >>>>> + case SDW_CMD_TIMEOUT: >>>>> + return -ETIMEDOUT; >>>>> + >>>>> + default: >>>>> + return -EIO; >>>> >>>> the 'default' case will handle both SDW_CMD_FAIL (which is a bus event >>>> usually due to bus clash or parity issues) and SDW_CMD_FAIL_OTHER (which is >>>> an imp-def IP event). >>>> >>>> Do they really belong in the same basket? From a debug perspective there is >>>> quite a bit of information lost. >>> >>> at higher level the error handling is same. the information is not lost as >>> it is expected that you would log it at error source. >> >> I don't understand this. It's certainly not the same for me if you detect an >> electric problem or if the IP is in the weeds. Logging at the source is fine >> but this filtering prevents higher levels from doing anything different. > > The point is higher levels like here cant do much than bail out and complain. > > Can you point out what would be different behaviour in each of these cases? > >>>>> +static inline int do_transfer(struct sdw_bus *bus, struct sdw_msg *msg) >>>>> +{ >>>>> + int retry = bus->prop.err_threshold; >>>>> + enum sdw_command_response resp; >>>>> + int ret = 0, i; >>>>> + >>>>> + for (i = 0; i <= retry; i++) { >>>>> + resp = bus->ops->xfer_msg(bus, msg); >>>>> + ret = find_response_code(resp); >>>>> + >>>>> + /* if cmd is ok or ignored return */ >>>>> + if (ret == 0 || ret == -ENODATA) >>>> >>>> Can you document why you don't retry on a CMD_IGNORED? I know there was a >>>> reason, I just can't remember it. >>> >>> CMD_IGNORED can be okay on broadcast. User of this API can retry all they >>> want! >> >> So you retry if this is a CMD_FAILED but let higher levels retry for >> CMD_IGNORED, sorry I don't see the logic. > > Yes that is right. > > If I am doing a broadcast read, lets say for Device Id registers, why in the > world would I want to retry? CMD_IGNORED is a valid response and required to > stop enumeration cycle in that case. > > But if I am not expecting a CMD_IGNORED response, I can very well go ahead > and retry from caller. The context is with caller and they can choose to do > appropriate handling. > > And I have clarified this couple of times to you already, not sure how many > more times I would have to do that. Until you clarify what you are doing. There is ONE case where IGNORED is a valid answer (reading the Prepare not finished bits), and it should not only be documented but analyzed in more details. For a write an IGNORED is never OK. > >>>> Now that I think of it, the retry on TIMEOUT makes no sense to me. The retry >>>> was intended for bus-level issues, where maybe a single bit error causes an >>>> issue without consequences, but the TIMEOUT is a completely different beast, >>>> it's the master IP that doesn't answer really, a completely different case. >>> >>> well in those cases where you have blue wires, it actually helps :) >> >> Blue wires are not supposed to change electrical behavior. TIMEOUT is only >> an internal SOC level issue, so no I don't get how this helps. >> >> You have a retry count that is provided in the BIOS/firmware through disco >> properties and it's meant to bus errors. You are abusing the definitions. A >> command failed is supposed to be detected at the frame rate, which is >> typically 20us. a timeout is likely a 100s of ms value, so if you retry on >> top it's going to lock up the bus. > > The world is not perfect! A guy debugging setups needs all the help. I do > not see any reason for not to retry. Bus is anyway locked up while a > transfer is ongoing (we serialize transfers). > > Now if you feel this should be abhorred, I can change this for timeout. This TIMEOUT thing is your own definition, it's not part of the spec, so I don't see how it can be lumped together with spec-related parts. It's fine to keep a retry but please document what the expectations are for the TIMEOUT case. > >>>>> +enum sdw_command_response { >>>>> + SDW_CMD_OK = 0, >>>>> + SDW_CMD_IGNORED = 1, >>>>> + SDW_CMD_FAIL = 2, >>>>> + SDW_CMD_TIMEOUT = 4, >>>>> + SDW_CMD_FAIL_OTHER = 8, >>>> >>>> Humm, I can't recall if/why this is a mask? does it need to be? >>> >>> mask, not following! >>> >>> Taking a wild guess that you are asking about last error, which is for SW >>> errors like malloc fail etc... >> >> no, I was asking why this is declared as if it was used for a bitmask, why >> not 0,1,2,3,4? > > Oh okay, I think it was something to do with bits for errors, but don see it > helping so I can change it either way... Unless you use bit-wise operators and combined responses there is no reason to keep the current definitions.