From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 7 Jun 2019 15:26:52 +0200
From: Kevin Wolf
To: Vladimir Sementsov-Ogievskiy
Cc: Denis Lunev, qemu-block@nongnu.org, armbru@redhat.com,
    qemu-devel@nongnu.org, mreitz@redhat.com, stefanha@redhat.com
Subject: Re: [Qemu-devel] [PATCH v6 6/7] block/nbd-client: nbd reconnect
Message-ID: <20190607132652.GD5055@dhcp-200-226.str.redhat.com>
References: <20190411172709.205032-1-vsementsov@virtuozzo.com>
 <20190411172709.205032-7-vsementsov@virtuozzo.com>
 <0b64cff5-33fa-0945-504c-b1bdd004c42a@redhat.com>
 <20190607080635.GB5055@dhcp-200-226.str.redhat.com>
User-Agent: Mutt/1.11.3 (2019-02-01)

Sender: "Qemu-devel" Am 07.06.2019 um 14:02 hat Vladimir Sementsov-Ogievskiy geschrieben: > 07.06.2019 11:06, Kevin Wolf wrote: > > Am 07.06.2019 um 05:17 hat Eric Blake geschrieben: > >> On 4/11/19 12:27 PM, Vladimir Sementsov-Ogievskiy wrote: > >>> +static coroutine_fn void nbd_reconnect_loop(NBDConnection *con) > >>> +{ > >>> + NBDClientSession *s = nbd_get_client_session(con->bs); > >>> + uint64_t start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); > >>> + uint64_t delay_ns = s->reconnect_delay * 1000000000UL; > >> > >> Do we have a #define constant for nanoseconds in a second to make this > >> more legible than counting '0's? > >> > >>> + uint64_t timeout = 1000000000UL; /* 1 sec */ > >>> + uint64_t max_timeout = 16000000000UL; /* 16 sec */ > >> > >> 1 * constant, 16 * constant > >> > >>> + > >>> + nbd_reconnect_attempt(con); > >>> + > >>> + while (nbd_client_connecting(s)) { > >>> + if (s->state == NBD_CLIENT_CONNECTING_WAIT && > >>> + qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns) > >>> + { > >>> + s->state = NBD_CLIENT_CONNECTING_NOWAIT; > >>> + qemu_co_queue_restart_all(&s->free_sema); > >>> + } > >>> + > >>> + bdrv_dec_in_flight(con->bs); > >>> + qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, timeout); > >> > >> Another place where I'd like someone more familiar with coroutines to > >> also have a look. > > > > What's the exact question you'd like me to answer? > > > > But anyway, bdrv_dec/inc_in_flight() around the sleep looks wrong to me. > > Either the operation must be waited for in drain, then you can't > > decrease the counter even during the sleep. Or drain doesn't have to > > consider it, then why is the counter even increased in the first place? > > > > The way it currently is, drain can return assuming that there are no > > requests, but after the timeout the request automatically comes back > > while the drain caller assumes that there is no more activity. This > > doesn't look right. > > Hmm. > > This ind/dec around all lifetime of connection coroutine you added in > > commit 5ad81b4946baf948c65cf7e1436d9b74803c1280 > Author: Kevin Wolf > Date: Fri Feb 15 16:53:51 2019 +0100 > > nbd: Restrict connection_co reentrance > > > And now I try to connect in endless loop, with qemu_co_sleep_ns() inside. > I need to get a change to bdrv_drain to complete, so I have to sometimes > drop this in_flight request. But I understand your point. Ah, right, I see. I think it's fine to add a second point where we decrease the counter (though I would add a comment telling why we do this) as long as the right conditions are met. The right conditions are probably something like: Once drained, we guarantee that we don't call any callbacks until the drained section ends. In nbd_read_eof() this is true because we can't get an answer if we had no request running. Or actually... This assumes a nice compliant server that doesn't just arbitrarily send unexpected messages. Is the existing code broken if we connect to a malicious server? > Will the following work better? > > bdrv_dec_in_flight(con->bs); > qemu_co_sleep_ns(...); > while (s->drained) { > s->wait_drained_end = true; > qemu_coroutine_yield(); > } > bdrv_inc_in_flight(con->bs); > > ... > .drained_begin() { > s->drained = true; > } > > .drained_end() { > if (s->wait_drained_end) { > s->wait_drained_end = false; > aio_co_wake(s->connection_co); > } > } This should make sure that we don't call any callbacks before the drain section has ended. 
We probably still have a problem because nbd_client_attach_aio_context()
reenters the coroutine with qemu_aio_coroutine_enter(), which will cause
an assertion failure if co->scheduled is set. So this needs a sleep
variant that can be cancelled.

I see that your patch currently simply ignores attaching a new context,
but then the coroutine stays in the old AioContext. Did I miss some
additional code that moves it to the new context somehow, or will it
just stay where it was if the coroutine happens to be reconnecting when
the AioContext was supposed to change?

Kevin