From: "Dr. David Alan Gilbert"
To: Peter Maydell
Cc: Thomas Huth, Christian Borntraeger, Daniel P. Berrangé, Ilya Leoshkevich, Juan Quintela, s.reiter@proxmox.com, QEMU Developers, Peter Xu, "open list:S390 general arch...", Philippe Mathieu-Daudé, Hanna Reitz, f.ebner@proxmox.com, Jinpu Wang
Date: Tue, 15 Mar 2022 16:14:52 +0000
Subject: Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)

* Peter Maydell (peter.maydell@linaro.org) wrote:
> On Tue, 15 Mar 2022 at 14:39, Peter Maydell wrote:
> >
> > On Mon, 14 Mar 2022 at 19:44, Peter Maydell wrote:
> > > On Mon, 14 Mar 2022 at 18:58, Peter Maydell wrote:
> > > > I just hit the abort case, narrowing it down to the
> > > > /i386/migration/multifd/tcp/zlib case, which can hit this without
> > > > any other tests being run:
> > > >
> > > This test seems to fail fairly frequently. I'll try a bisect...
> > >
> > > On this s390 machine, this test has been intermittent since
> > > it was first added in commit 7ec2c2b3c1 ("multifd: Add zlib compression
> > > multifd support") in 2019.
> >
> > I have tried (on current master) runs of various of the other
> > migration tests, and:
> > * /i386/migration/multifd/tcp/zstd completed 1170 iterations without
> >   failing
> > * /i386/migration/precopy/tcp completed 4669 iterations without
> >   failing
> > * /i386/migration/multifd/tcp/zlib fails usually within the first
> >   10 iterations (the most I ever saw it manage was 32)
> >
> > So whatever this is, it seems like it might be specific to the
> > zlib code somehow?
>
> Maybe we're running into this bug
> https://bugs.launchpad.net/ubuntu/+source/zlib/+bug/1961427
> ("zlib: compressBound() returns an incorrect result on z15")?

The initial description of compressBound() being wrong doesn't feel like
it would cause this abort; it claims that would trigger an error (I'm not
sure how good we are at spotting that!). But later in the description it
says:

  'Mistakes in dfltcc_free_window OF and especially DEFLATE_BOUND_COMPLEN,
  (incl. the bit definitions), may cause various and unforseen defects'

That certainly looks like a 'various and unforseen defect'.

Dave

> That bug report claims it doesn't affect focal, though, which
> is what we're running on this box (specifically, the zlib1g
> package is version 1:1.2.11.dfsg-2ubuntu1.2).
>
> A run with DFLTCC=0 has made it past 60 iterations so far, which
> suggests that that does serve as a workaround for the bug.
>
> thanks
> -- PMM
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
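The question of whether an undersized compressBound() result would actually be spotted can be made concrete with a small check. The sketch below is in Python purely for brevity (QEMU's multifd code is C). It hand-copies the generic compressBound() formula from stock zlib 1.2.11 into a hypothetical helper, generic_compress_bound(), then verifies that each compressed page both fits within that bound and round-trips. The helper names, the 4 KiB page size and the 1000-page loop are illustrative assumptions; this does not model the DFLTCC-specific bound used on s390, and it is not how QEMU itself checks its buffers.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumption: 4 KiB pages, chosen for illustration


def generic_compress_bound(source_len: int) -> int:
    """Worst-case output size per stock zlib 1.2.11's compressBound().

    Generic formula only: a DFLTCC-enabled build on s390 computes its
    own bound, so this hypothetical helper does not model that path.
    """
    return (source_len + (source_len >> 12) + (source_len >> 14)
            + (source_len >> 25) + 13)


def check_page(page: bytes) -> int:
    """Compress one page and return the compressed size.

    Raises RuntimeError if the output is larger than the bound a caller
    would have sized its buffer with, or if the data does not round-trip.
    """
    compressed = zlib.compress(page)  # default level, zlib-wrapped stream
    bound = generic_compress_bound(len(page))
    if len(compressed) > bound:
        raise RuntimeError(
            f"compressed size {len(compressed)} exceeds bound {bound}")
    if zlib.decompress(compressed) != page:
        raise RuntimeError("decompressed data does not match the input")
    return len(compressed)


if __name__ == "__main__":
    # Random bytes are essentially incompressible, so they approximate
    # the worst case that the bound has to cover.
    for _ in range(1000):
        check_page(os.urandom(PAGE_SIZE))
    print("1000 random pages fit within the generic bound")
```

Checking against the bound explicitly, rather than trusting the library call, is what would turn a silently wrong bound into a visible error; on an unaffected build the loop should simply complete.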