Date: Mon, 29 Jul 2024 09:55:53 -0700
From: Breno Leitao
To: Thomas Gleixner
Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, "H. Peter Anvin", leit@meta.com, "Peter Zijlstra (Intel)", Wei Liu, Marc Zyngier, Adrian Huang, "open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
Subject: Re: [PATCH] x86/apic: Add retry mechanism to add_pin_to_irq_node()
References: <20240729140604.2814597-1-leitao@debian.org> <874j8889ch.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org
In-Reply-To: <874j8889ch.ffs@tglx>

Hello Thomas,

On Mon, Jul 29, 2024 at 06:13:34PM +0200, Thomas Gleixner wrote:
> On Mon, Jul 29 2024 at 07:06, Breno Leitao wrote:
> > I've been running some experiments with failslab fault injector running
> > to detect a different problem, and the machine always crash with the
> > following stack:
> >
> > can not alloc irq_pin_list (-1,0,20)
> > Kernel panic - not syncing: IO-APIC: failed to add irq-pin. Can not proceed
> >
> > Call Trace:
> > panic
> > _printk
> > panic_smp_self_stop
> > rcu_is_watching
> > intel_irq_remapping_free
>
> This completely lacks context. When does this happen? What's the system
> state? What has intel_irq_remapping_free() to do with the allocation path?

Sorry, let me clarify it a bit better:

1) This happens after the machine has booted up, while it is under stress.
2) This happens when I have failslab fault injection enabled.
3) The machine crashes after hitting this error.
4) This is reproducible with `stress-ng` using the `--aggressive` parameter.
5) This is the full stack trace (sorry for not decoding it; if you need a
   decoded stack, I am more than happy to provide one):

    04:12:34 can not alloc irq_pin_list (-1,0,20)
    Kernel panic - not syncing: IO-APIC: failed to add irq-pin. Can not proceed
    CPU: 11 UID: 0 PID: 335023 Comm: stress-ng-dev Kdump: loaded Tainted: G S E N 6.10.0-12563-gdb0610128a16 #48
    Call Trace:
     panic+0x4e9/0x590
     ? _printk+0xb3/0xe0
     ? panic_smp_self_stop+0x70/0x70
     ? rcu_is_watching+0xe/0xb0
     ? intel_irq_remapping_free+0x30/0x30
     ? __add_pin_to_irq_node+0xf4/0x2d0
     ? rcu_is_watching+0xe/0xb0
     mp_irqdomain_alloc+0x9ab/0xa80
     ? IO_APIC_get_PCI_irq_vector+0x850/0x850
     ? __kmalloc_cache_node_noprof+0x1e0/0x360
     ? mutex_lock_io_nested+0x1420/0x1420
     irq_domain_alloc_irqs_locked+0x25d/0x8d0
     __irq_domain_alloc_irqs+0x80/0x110
     mp_map_pin_to_irq+0x645/0x890
     ? __acpi_get_override_irq+0x350/0x350
     ? mutex_lock_io_nested+0x1420/0x1420
     ? lockdep_hardirqs_on_prepare+0x400/0x400
     ? mp_map_gsi_to_irq+0xe6/0x1b0
     acpi_register_gsi_ioapic+0xe6/0x150
     ? acpi_unregister_gsi_ioapic+0x40/0x40
     ? mark_held_locks+0x9f/0xe0
     ? _raw_spin_unlock_irq+0x24/0x50
     hpet_open+0x313/0x480
     misc_open+0x306/0x420
     chrdev_open+0x218/0x660
     ? __unregister_chrdev+0xe0/0xe0
     ? security_file_open+0x3d4/0x740
     do_dentry_open+0x4a1/0x1300
     ? __unregister_chrdev+0xe0/0xe0
     vfs_open+0x7e/0x350
     path_openat+0xb46/0x2740
     ? kernel_tmpfile_open+0x60/0x60
     ? lock_acquire+0x1e4/0x650
     do_filp_open+0x1af/0x3e0
     ? path_openat+0x2740/0x2740
     ? do_raw_spin_lock+0x12d/0x270
     ? spin_bug+0x1d0/0x1d0
     ? _raw_spin_unlock+0x29/0x40
     ? alloc_fd+0x1e6/0x640
     do_sys_openat2+0x117/0x150
     ? build_open_flags+0x450/0x450
     ? lock_downgrade+0x690/0x690
     __x64_sys_openat+0x11f/0x1d0
     ? __x64_sys_open+0x1a0/0x1a0
     ? do_syscall_64+0x36/0x190
     do_syscall_64+0x6e/0x190
     entry_SYSCALL_64_after_hwframe+0x4b/0x53
    RIP: 0033:0x7f6c406fd784
    Code: 24 20 eb 8f 66 90 44 89 54 24 0c e8 d6 88 f8 ff 44 8b 54 24 0c 44 89 e2 48 89 ee 41 89 c0 bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 89 44 24 0c e8 28 89 f8 ff 8b 44
    RSP: 002b:00007fff72413a70 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
    RAX: ffffffffffffffda RBX: 00007f6c408c43a8 RCX: 00007f6c406fd784
    RDX: 0000000000000800 RSI: 000055759a5fc910 RDI: 00000000ffffff9c
    RBP: 000055759a5fc910 R08: 0000000000000000 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000800
    R13: 00007fff72413c90 R14: 000055759a5fc910 R15: 00007f6c408c43a8

> > This happens because add_pin_to_irq_node() function would panic if
> > adding a pin to an IRQ failed due to -ENOMEM (which was injected by
> > failslab fault injector). I've been running with this patch in my test
> > cases in order to be able to pick real bugs, and I thought it might be a
> > good idea to have it upstream also, so, other people trying to find real
> > bugs don't stumble upon this one. Also, this makes sense in a real
> > world(?), when retrying a few times might be better than just
> > panicking.
>
> While it seems to make sense, the reality is that this is mostly early
> boot code. If there is a real world memory allocation failure during
> early boot then retries will not help at all.

This is not happening during early boot; it is reproducible when running
stress-ng in this aggressive mode. Since failslab is injecting a kmalloc()
failure, __add_pin_to_irq_node() returns -ENOMEM, which causes the
undesired panic().

> > Introduce a retry mechanism that attempts to add the pin up to 3 times
> > before giving up and panicking. This should improve the robustness of
> > the IO-APIC code in the face of transient errors.
>
> I'm absolutely not convinced by this loop heuristic. That's just a bad
> hack.
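For illustration only, the bounded retry described in the quoted patch text ("up to 3 times before giving up and panicking") can be modelled in userspace C as below. This is a minimal sketch, not the patch itself and not kernel code: struct pin_entry, faulty_alloc(), inject_failures, try_add_pin(), add_pin_with_retry() and MAX_ADD_PIN_ATTEMPTS are all hypothetical names, and faulty_alloc() is a deterministic stand-in for failslab (which decides failures according to its own configuration rather than failing a fixed count).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical bound mirroring the "up to 3 times" in the patch text. */
#define MAX_ADD_PIN_ATTEMPTS 3

/* Hypothetical stand-in for an irq_pin_list entry. */
struct pin_entry {
	int apic;
	int pin;
};

/*
 * Hypothetical deterministic fault injector (stand-in for failslab):
 * the next `inject_failures` allocations return NULL.
 */
int inject_failures;

void *faulty_alloc(size_t size)
{
	if (inject_failures > 0) {
		inject_failures--;
		return NULL;
	}
	return malloc(size);
}

/* Models one attempt: 0 on success, -ENOMEM if the allocation fails. */
int try_add_pin(struct pin_entry **out, int apic, int pin)
{
	struct pin_entry *entry = faulty_alloc(sizeof(*entry));

	if (!entry)
		return -ENOMEM;
	entry->apic = apic;
	entry->pin = pin;
	*out = entry;
	return 0;
}

/*
 * Bounded retry: call try_add_pin() at most max_attempts times and
 * return the result of the last call. *attempts reports how many calls
 * were made. On a final -ENOMEM the caller decides what to do (the
 * kernel path discussed in this thread panics).
 */
int add_pin_with_retry(struct pin_entry **out, int apic, int pin,
		       int max_attempts, int *attempts)
{
	int ret = -ENOMEM;
	int n = 0;

	assert(max_attempts > 0);
	while (n < max_attempts) {
		n++;
		ret = try_add_pin(out, apic, pin);
		if (ret == 0)
			break;
	}
	*attempts = n;
	return ret;
}
```

With inject_failures = 2, the third attempt succeeds. With a failure that never goes away (inject_failures set very high), all three attempts return -ENOMEM and the caller is back at the panic, which is the situation Thomas describes for a genuine early-boot allocation failure.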
I will not disagree with you here, but I need this patch to keep the
system stable, instead of panicking, while injecting slab faults and
trying to reproduce a real bug in the network stack.

Thanks for the review,
--breno
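As a rough way to compare the two positions in this thread, the sketch below computes how likely a 3-attempt retry is to still fail. It assumes (this is not established in the thread) that each injected slab failure is independent with a fixed per-allocation probability p, which is only an approximation of a probability-based injector such as failslab. The function name prob_all_attempts_fail() is hypothetical.

```c
#include <assert.h>

/*
 * Probability that all `attempts` tries fail, assuming each try fails
 * independently with probability p (0.0 to 1.0). The independence
 * assumption roughly fits a probability-based fault injector; it does
 * not hold for a persistent out-of-memory condition, which is p == 1.0.
 */
double prob_all_attempts_fail(double p, int attempts)
{
	double result = 1.0;
	int i;

	assert(p >= 0.0 && p <= 1.0);
	assert(attempts > 0);
	for (i = 0; i < attempts; i++)
		result *= p;
	return result;
}
```

For example, a 10% per-allocation failure rate leaves about a 0.1% chance (0.1^3) that all three attempts fail, while p = 1.0, a failure that never clears, fails every time. Under this model the retry mostly helps with injected, transient failures and not with a real early-boot allocation failure.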