Message-ID: <13ba3fc4-eea3-48b1-8076-6089aaa978fb@kernel.dk>
Date: Fri, 24 Jan 2025 13:51:33 -0700
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: Bug#1093243: Upgrade to 6.1.123 kernel causes mariadb hangs
To: Salvatore Bonaccorso, Pavel Begunkov
Cc: Xan Charbonnet, 1093243@bugs.debian.org, Bernhard Schmidt,
 io-uring@vger.kernel.org, linux-kernel@vger.kernel.org,
 regressions@lists.linux.dev
References: <173706089225.4380.9492796104667651797.reportbug@backup22.biblionix.com>
 <8af1733b-95a8-4ac9-b931-6a403f5b1652@gmail.com>
From: Jens Axboe
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 1/24/25 1:33 PM, Salvatore Bonaccorso wrote:
> Hi Pavel,
>
> On Fri, Jan 24, 2025 at 06:40:51PM +0000, Pavel Begunkov wrote:
>> On 1/24/25 16:30, Xan Charbonnet wrote:
>>> On 1/24/25 04:33, Pavel Begunkov wrote:
>>>> Thanks for narrowing it down. Xan, can you try this change please?
>>>> Waiters can miss wake ups without it, seems to match the description.
>>>>
>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>> index 9b58ba4616d40..e5a8ee944ef59 100644
>>>> --- a/io_uring/io_uring.c
>>>> +++ b/io_uring/io_uring.c
>>>> @@ -592,8 +592,10 @@ static inline void __io_cq_unlock_post_flush(struct io_ring_ctx *ctx)
>>>>  	io_commit_cqring(ctx);
>>>>  	spin_unlock(&ctx->completion_lock);
>>>>  	io_commit_cqring_flush(ctx);
>>>> -	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN))
>>>> +	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN)) {
>>>> +		smp_mb();
>>>>  		__io_cqring_wake(ctx);
>>>> +	}
>>>>  }
>>>>  void io_cq_unlock_post(struct io_ring_ctx *ctx)
>>>>
>>>
>>>
>>> Thanks Pavel! Early results look very good for this change. I'm now
>>> running 6.1.120 with your added smp_mb() call. The backup process
>>> which had been quickly triggering the issue has been running longer
>>> than it ever did when it would ultimately fail. So that's great!
>>>
>>> One sour note: overnight, replication hung on this machine, which is
>>> another failure that started happening with the jump from 6.1.119 to
>>> 6.1.123. The machine was running 6.1.124 with the
>>> __io_cq_unlock_post_flush function removed completely. That's the
>>> kernel we had celebrated yesterday for running the backup process
>>> successfully.
>>>
>>> So, we might have two separate issues to deal with, unfortunately.
>>
>> Possible, but it could also be a side effect of reverting the patch.
>> As usual, in most cases patches are ported either because they're
>> fixing sth or other fixes depend on it, and it's not yet apparent
>> to me what happened with this one.
>
> I researched bit the lists, and there was the inclusion request on the
> stable list itself. Looking into the io-uring list I found
> https://lore.kernel.org/io-uring/CADZouDRFJ9jtXHqkX-PTKeT=GxSwdMC42zEsAKR34psuG9tUMQ@mail.gmail.com/
> which I think was the trigger to later on include in fact the commit
> in 6.1.120.

Yep indeed, was just looking for the backstory and that is why it got
backported. Just missed the fact that it should've been an
io_cqring_wake() rather than __io_cqring_wake()...

-- 
Jens Axboe
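
For readers following the thread, here is a rough userspace illustration of
the ordering problem being discussed. It is not the io_uring code. A waker
that publishes a completion and then checks whether anyone is waiting needs a
full barrier between those two steps, and the waiter needs the matching
barrier between announcing itself and re-checking for new completions. The
sketch uses C11 atomics. The names cq_tail, waiter_present, wakeups_sent,
post_completion(), prepare_to_wait_for() and demo() are hypothetical and were
invented for this example, and atomic_thread_fence(memory_order_seq_cst) only
stands in for the kernel's smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a completion counter, a "waiter is queued" flag,
 * and a counter of wakeups the waker decided to send. */
static atomic_uint cq_tail;
static atomic_bool waiter_present;
static atomic_uint wakeups_sent;

/* Waker side: publish the completion, full fence, then look for waiters.
 * The fence plays the role that smp_mb() plays in the quoted patch. */
static void post_completion(void)
{
    atomic_fetch_add_explicit(&cq_tail, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&waiter_present, memory_order_relaxed))
        atomic_fetch_add_explicit(&wakeups_sent, 1, memory_order_relaxed);
}

/* Waiter side: announce ourselves, full fence, then re-check the tail.
 * Returns true if nothing new arrived and the caller would go to sleep. */
static bool prepare_to_wait_for(unsigned int seen_tail)
{
    atomic_store_explicit(&waiter_present, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&cq_tail, memory_order_relaxed) == seen_tail;
}

/* Single-threaded walk through the intended sequence. */
int demo(void)
{
    unsigned int seen = atomic_load(&cq_tail);
    unsigned int before = atomic_load(&wakeups_sent);

    bool must_sleep = prepare_to_wait_for(seen); /* nothing new yet */
    post_completion();                           /* sees the waiter */

    assert(must_sleep);
    assert(atomic_load(&wakeups_sent) == before + 1);
    assert(!prepare_to_wait_for(seen));          /* new completion visible */
    return 0;
}
```

demo() only exercises the single-threaded path. The fences matter when the
two sides run concurrently on different CPUs: without them, both loads can
observe stale values, so the waiter goes to sleep and the waker sends no
wakeup, which is the kind of missed wakeup described above.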