From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A90B6E81E1F for ; Fri, 6 Oct 2023 17:57:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233079AbjJFR53 (ORCPT ); Fri, 6 Oct 2023 13:57:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39932 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233048AbjJFR52 (ORCPT ); Fri, 6 Oct 2023 13:57:28 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5872ACE for ; Fri, 6 Oct 2023 10:56:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1696615001; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cqRG8+3axgVFCIH20i/Xs4I4NkkMZLq9UXCr/6OJE2Q=; b=iz0fKxIE7EmaV1U+iRltCVn1tJ8gbIECqIuGHnRmP7QsPUXbM44O26kzJjdqOq5iahokvM FD1GtwG5Ca3x2auJ0LTWFIcHbBWUrVvjXC3rEDFHbpqIVaxvs3b9Yff+wxPLfHYMiJ2yS5 8FWsgq/ua1rjGvaSpwk2ErKzxzDIwzQ= Received: from mail-ej1-f72.google.com (mail-ej1-f72.google.com [209.85.218.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-446-m5U-XtViM0qiDN0EgKq0Ww-1; Fri, 06 Oct 2023 13:56:40 -0400 X-MC-Unique: m5U-XtViM0qiDN0EgKq0Ww-1 Received: by mail-ej1-f72.google.com with SMTP id a640c23a62f3a-997c891a88dso205882866b.3 for ; Fri, 06 Oct 2023 10:56:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696614999; x=1697219799; h=content-transfer-encoding:in-reply-to:references:cc:to:from :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=cqRG8+3axgVFCIH20i/Xs4I4NkkMZLq9UXCr/6OJE2Q=; b=GOf8wWMfpnGTUNrGfho8ihJvqnBdmCN4SAeSw5KSp0WrmmE7wx//EQnarq5bu1d9qI IlO2A7SM0NKpJqMjgARu0GC5gNHvWMxgQAw2CyL23YWDJg3ZJbk7HpGHJ3/hOqjCfgR0 JU2QKVXBpsdQjamWuAJbkO7//a0zj3DluQvpBaD44OF9MZGUDCFngaxIuqcBQTWMcIe1 DfodhsJ4SpRBouwSJBuJyJInpkdwrxpxzpNcNOuMBfvNGb6lLXrpAF8BixJkqViV4Ekg CUVtGcrpzRpKWhtpl0GRxeohAz/HKjYlLeFsfkZ4YNT1S4QgigIvgRrbxOp6NOqOUBpM RZwQ== X-Gm-Message-State: AOJu0YzK5NvnwX5iqIXlDhsASM4s6CAYM6j1Vtn9Ja+CkMnSgTjnGpU+ +t3EpMRvxgSH6Hu0+rb9I5MGJ3eg7BkCaUEL3treCtdwDnb+1RZqUltc+vECuo1kVwtOx7Mv9W8 zcaRPI+6OEha4nKR5nkHN7ts= X-Received: by 2002:a17:906:c801:b0:9b2:8df4:f2cd with SMTP id cx1-20020a170906c80100b009b28df4f2cdmr7977120ejb.18.1696614998984; Fri, 06 Oct 2023 10:56:38 -0700 (PDT) X-Google-Smtp-Source: AGHT+IG1doNqYhov/kWbF55hQnoYHix7L4MHbSxLXjIUL6uNynueeBrlYLKRfPxN9uK7JtMpxZahlQ== X-Received: by 2002:a17:906:c801:b0:9b2:8df4:f2cd with SMTP id cx1-20020a170906c80100b009b28df4f2cdmr7977106ejb.18.1696614998663; Fri, 06 Oct 2023 10:56:38 -0700 (PDT) Received: from ?IPV6:2001:1c00:c32:7800:5bfa:a036:83f0:f9ec? (2001-1c00-0c32-7800-5bfa-a036-83f0-f9ec.cable.dynamic.v6.ziggo.nl. [2001:1c00:c32:7800:5bfa:a036:83f0:f9ec]) by smtp.gmail.com with ESMTPSA id v24-20020a1709067d9800b009b2c5363ebasm3195377ejo.26.2023.10.06.10.56.37 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 06 Oct 2023 10:56:38 -0700 (PDT) Message-ID: Date: Fri, 6 Oct 2023 19:56:37 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Subject: Re: [PATCH 1/2] HID: logitech-hidpp: Avoid hidpp_connect_event() running while probe() restarts IO Content-Language: en-US, nl From: Hans de Goede To: Benjamin Tissoires Cc: =?UTF-8?Q?Filipe_La=c3=adns?= , Bastien Nocera , Jiri Kosina , Benjamin Tissoires , linux-input@vger.kernel.org, stable@vger.kernel.org References: <20231006081858.17677-1-hdegoede@redhat.com> <20231006081858.17677-2-hdegoede@redhat.com> <686e8973-613b-2fb3-efd6-26f3dd21ed9d@redhat.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-input@vger.kernel.org Hi Benjamin, On 10/6/23 19:24, Hans de Goede wrote: > Hi, > > On 10/6/23 18:28, Benjamin Tissoires wrote: >> On Oct 06 2023, Hans de Goede wrote: > > > >>>>> @@ -4207,36 +4208,39 @@ static void hidpp_connect_event(struct hidpp_device *hidpp) >>>>> return; >>>>> } >>>>> >>>>> + /* Avoid probe() restarting IO */ >>>>> + mutex_lock(&hidpp->io_mutex); >>>> >>>> I'd put a `__must_hold(&hidpp->io_mutex);` here, not changing any return >>>> path and forcing any caller to `hidpp_connect_event()` (which will >>>> eventually only be the work struct) to take the lock. >>>> >>>> This should simplify the patch by a lot and also ensure someone doesn't >>>> forget the `goto out_unlock`. >>> >>> Ok, I can add the __must_hold() here and make >>> delayed_Work_cb take the lock, but that would make it >>> impossible to implement patch 2/2 in a clean manner and >>> I do like patch 2/2 since it makes it clear that >>> hidpp_connect_event must only run from the workqueue >>> but I guess we could just add a comment for that >>> instead. >> >> In 2/2, just rename this function to __do_hidpp_connect_event(), and >> have hidpp_connect_event() being the worker, which takes the lock, and >> calls __do_hidpp_connect_event(). > > Ok, will do for v2. > > > >>>>> @@ -4519,6 +4526,9 @@ static int hidpp_probe(struct hid_device *hdev, const struct hid_device_id *id) >>>>> flush_work(&hidpp->work); >>>>> >>>>> if (will_restart) { >>>>> + /* Avoid hidpp_connect_event() running while restarting */ >>>>> + mutex_lock(&hidpp->io_mutex); >>>>> + >>>>> /* Reset the HID node state */ >>>>> hid_device_io_stop(hdev); >>>> >>>> That's the part that makes me raise an eyebrow. Because we lock, then >>>> release the semaphore to get it back later. Can this induce a dead lock? >>>> >>>> Can't we solve that same scenario without a mutex, but forcing either >>>> the workqueue to not run or to be finished at this point? >>> >>> I'm not sure what you are worried about after the mutex_lock >>> the line above we are 100% guaranteed that hidpp_connect_event() >>> is not running and since it is not running it will also not >>> be holding any other locks, so it can not cause any problems. >> >> Agree, but my point is that you are not entirely solving the issue: >> if now, between hid_device_io_stop() and hid_hw_close() we receive a >> connect notification from the device, hid_input_report() will return >> -EBUSY, and we will lose it (it will not be stacked in the workqueue). >> >> I was thinking at adding a flush_work(&hidpp->work) here, instead of >> the mutex solution, but yours ensures that any connect event already >> started will be handled properly, which is a plus. >> >> Still if between the mutex lock here we receive a connect event from the >> device, we still get -EBUSY at the hid-core layer, and so we will lose >> it. Maybe that's OK because we might re-ask for the device later (I >> don't remember exactly the code), but my point is that because we add a >> mutex doesn't mean we will solve all multi-thread problems. So finding a >> non-mutex solution sometimes is better :) >> >> And the fact that we need to think through every preemption case often >> means that there is something wrong *elsewhere*. > > Right, I did consider seeing if we can get rid of the restart > altogether, as the whole restarting thing is actually the problem > here. AFAICT this is only really necessary in the WTP path since > there where we need to know resolution before instantiating the > input device. > > But atm this is also done for all unifying devices, which seems > unnecessary. > > Buy we still need the restart anyways for the WTP case, > so we need to make it work reliable anyways. > > Now that I understand your concern about the missed connect > packet, which I agree is a real concern I think I know how to > fix this. I'll prepare a new version of this series tomorrow. > > Hmm, thinking more about this, if we normally just create > the input device right away even for unifying devices and > we already always delay the creation for WTP even during > the restart: > > if (hidpp->quirks & HIDPP_QUIRK_DELAYED_INIT) > connect_mask &= ~HID_CONNECT_HIDINPUT; > > Then I wonder why do we even bother to do the restart > thing for unifying devices. Do you know what this is based on ? > > I guess this might have to do with ensuring the configure > commands are send before creating the input + hidraw > devices, but if the connect event comes later on then > the configuration is already done later on after > the input device has already been created ? > > So maybe we should indeed just remove the whole restart > thing entirely and also always rely on hidpp_connect_event > to send the configuration commands, because currently > those are send twice if the device is already connnected > at probe() time. The whole restart thing seems to have been added by you in: 91cf9a98ae4127 ("HID: logitech-hidpp: make .probe usbhid capable") The commit message says that not starting all the HID subdrivers right from the start is necessary because: "It means that any configuration retrieved after the initial hid_hw_start would not be exposed to user space without races." I believe this refers to the filling of hdev->name and hdev->uniq by hidpp_unifying_init() (or other similar code paths for non unifying). It seems that has been broken though by: 498ba20690357 ("HID: logitech-hidpp: Don't restart communication if not necessary") Which causes input-dev registration to happen before hidpp_unifying_init(). TL;DR: 1. There seems to be a good reason for the restart dance so we need to fix it rather then remove it. 2. 498ba20690357 ("HID: logitech-hidpp: Don't restart communication if not necessary") (from Bastien) re-introduces the race which the restart tries to fix and should be reverted. Regards, Hans