linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hans de Goede <hdegoede@redhat.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>,
	"Mika Westerberg" <mika.westerberg@linux.intel.com>,
	"Krzysztof Wilczyński" <kw@linux.com>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Myron Stowe" <myron.stowe@redhat.com>,
	"Juha-Pekka Heikkila" <juhapekka.heikkila@gmail.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"H . Peter Anvin" <hpa@zytor.com>,
	linux-acpi@vger.kernel.org, linux-pci@vger.kernel.org,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	"Benoit Grégoire" <benoitg@coeus.ca>,
	"Hui Wang" <hui.wang@canonical.com>,
	stable@vger.kernel.org,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Subject: Re: [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems
Date: Tue, 7 Dec 2021 17:52:40 +0100	[thread overview]
Message-ID: <8cfab16c-5edc-dbff-b73e-65419a649ac2@redhat.com> (raw)
In-Reply-To: <1a001812-1f18-1999-44b7-30fe3a19f460@redhat.com>

Hi Bjorn,

On 11/10/21 14:05, Hans de Goede wrote:
> Hi Bjorn,
> 
> On 11/10/21 09:45, Hans de Goede wrote:
>> Hi Bjorn,
>>
>> On 11/9/21 23:07, Bjorn Helgaas wrote:
>>> On Sat, Nov 06, 2021 at 11:15:07AM +0100, Hans de Goede wrote:
>>>> On 10/20/21 23:14, Bjorn Helgaas wrote:
>>>>> On Wed, Oct 20, 2021 at 12:23:26PM +0200, Hans de Goede wrote:
>>>>>> On 10/19/21 23:52, Bjorn Helgaas wrote:
>>>>>>> On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
>>>>>>>> Some BIOS-es contain a bug where they add addresses which map to system
>>>>>>>> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
>>>>>>>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
>>>>>>>> space").
>>>>>>>>
>>>>>>>> To work around this bug Linux excludes E820 reserved addresses when
>>>>>>>> allocating addresses from the PCI host bridge window since 2010.
>>>>>>>> ...
>>>>>
>>>>>>> I haven't seen anybody else eager to merge this, so I guess I'll stick
>>>>>>> my neck out here.
>>>>>>>
>>>>>>> I applied this to my for-linus branch for v5.15.
>>>>>>
>>>>>> Thank you, and sorry about the build-errors which the lkp
>>>>>> kernel-test-robot found.
>>>>>>
>>>>>> I've just send out a patch which fixes these build-errors
>>>>>> (verified with both .config-s from the lkp reports).
>>>>>> Feel free to squash this into the original patch (or keep
>>>>>> them separate, whatever works for you).
>>>>>
>>>>> Thanks, I squashed the fix in.
>>>>>
>>>>> HOWEVER, I think it would be fairly risky to push this into v5.15.
>>>>> We would be relying on the assumption that current machines have all
>>>>> fixed the BIOS defect that 4dc2287c1805 addressed, and we have little
>>>>> evidence for that.
>>>>>
>>>>> I'm not sure there's significant benefit to having this in v5.15.
>>>>> Yes, the mainline v5.15 kernel would work on the affected machines,
>>>>> but I suspect most people with those machines are running distro
>>>>> kernels, not mainline kernels.
>>>>
>>>> I understand that you were reluctant to add this to 5.15 so close
>>>> near the end of the 5.15 cycle, but can we please get this into
>>>> 5.16 now ?
>>>>
>>>> I know you ultimately want to see if there is a better fix,
>>>> but this is hitting a *lot* of users right now and if we come up
>>>> with a better fix we can always use that to replace this one
>>>> later.
>>>
>>> I don't know whether there's a "better" fix, but I do know that if we
>>> merge what we have right now, nobody will be looking for a better
>>> one.
>>>
>>> We're in the middle of the merge window, so the v5.16 development
>>> cycle is over.  The v5.17 cycle is just starting, so we have time to
>>> hit that.  Obviously a fix can be backported to older kernels as
>>> needed.
>>>
>>>> So can we please just go with this fix now, so that we can
>>>> fix the issues a lot of users are seeing caused by the current
>>>> *wrong* behavior of taking the e820 reservations into account ?
>>>
>>> I think the fix on the table is "ignore E820 for BIOS date >= 2018"
>>> plus the obvious parameters to force it both ways.
>>
>> Correct.
>>
>>> The thing I don't like is that this isn't connected at all to the
>>> actual BIOS defect.  We have no indication that current BIOSes have
>>> fixed the defect,
>>
>> We also have no indication that that defect from 10 years ago, from
>> pre UEFI firmware is still present in modern day UEFI firmware which
>> is basically an entire different code-base.
>>
>> And even 10 years ago the problem was only happening to a single
>> family of laptop models (Dell Precision laptops) so this clearly
>> was a bug in that specific implementation and not some generic
>> issue which is likely to be carried forward.
>>
>>> and we have no assurance that future ones will not
>>> have the defect.  It would be better if we had some algorithmic way of
>>> figuring out what to do.
>>
>> You yourself have said that in hindsight taking E820 reservations
>> into account for PCI bridge host windows was a mistake. So what
>> the "ignore E820 for BIOS date >= 2018" is doing is letting the
>> past be the past (without regressing on older models) while fixing
>> that mistake on any hardware going forward.
>>
>> In the unlikely case that we hit that BIOS bug again on 1 or 2 models,
>> we can simply DMI quirk those models, as we do for countless other
>> BIOS issues.
>>
>>> Thank you very much for chasing down the dmesg log archive
>>> (https://github.com/linuxhw/Dmesg; see
>>> https://lore.kernel.org/r/82035130-d810-9f0b-259e-61280de1d81f@redhat.com).
>>> Unfortunately I haven't had time to look through it myself, and I
>>> haven't heard of anybody else doing it either.
>>
>> Right, I'm afraid that I already have spend way too much time on this
>> myself. Note that I've been working with users on this bug on and off
>> for over a year now.
>>
>> This is hitting many users and now that we have a viable fix, this
>> really needs to be fixed now.
>>
>> I believe that the "ignore E820 for BIOS date >= 2018" fix is good
>> enough and that you are letting perfect be the enemy of good here.
>>
>> As an upstream kernel maintainer myself, I'm sorry to say this,
>> but if we don't get some fix for this merged soon you are leaving
>> my no choice but to add my fix to the Fedora kernels as a downstream
>> patch (and to advise other distros to do the same).
>>
>> Note that if you are still afraid of regressions going the downstream
>> route is also an opportunity, Fedora will start testing moving users
>> to 5.15.y soon, so I could add the patch to Fedora's 5.15.y builds and
>> see how that goes ?
> 
> So I've discussed this with the Fedora kernel maintainers and they have
> agreed to add the patch to the Fedora 5.15 kernels, which we will ask
> our users to start testing soon (we first run some voluntary testing
> before eventually moving all users over).
> 
> This will provide us with valuable feedback wrt this patch causing
> regressions as you are worried about, or not.
> 
> Assuming no regressions show up I hope that this will give you
> some assurance that there the patch causes no regressions and that
> you will then be willing to pick this up later during the 5.16
> cycle so that Fedora only deviates from upstream for 1 cycle.

5.15.y kernels with this patch added have been in Fedora's
stable updates repo for a while now without any reports of the
regressions you feared this may cause.

Bjorn, I hope that you are willing to merge this patch now that it has
seen some more wide spread testing ?

Regards,

Hans



  parent reply	other threads:[~2021-12-07 16:52 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <bfbac749-7434-1497-039b-3b8bc4dc5499@redhat.com>
2021-10-20 21:14 ` [PATCH v5 1/2] x86/PCI: Ignore E820 reservations for bridge windows on newer systems Bjorn Helgaas
2021-10-21 17:15   ` Hans de Goede
2021-10-22  1:20     ` Bjorn Helgaas
2021-10-22  9:53       ` Hans de Goede
2021-10-29  8:10         ` Hans de Goede
2021-11-06 10:15   ` Hans de Goede
2021-11-09 22:07     ` Bjorn Helgaas
2021-11-10  8:45       ` Hans de Goede
2021-11-10 13:05         ` Hans de Goede
2021-11-10 21:51           ` Thomas Backlund
2021-12-07 16:52           ` Hans de Goede [this message]
2021-12-15 16:01             ` Bjorn Helgaas
2021-12-15 16:33               ` Hans de Goede
2021-10-14 18:39 [PATCH v5 0/2] " Hans de Goede
2021-10-14 18:39 ` [PATCH v5 1/2] " Hans de Goede
2021-10-15 20:03   ` Bjorn Helgaas
2021-10-15 20:15     ` Hans de Goede
2021-10-19 21:52   ` Bjorn Helgaas
2021-10-19 21:55     ` Bjorn Helgaas
2021-10-21 11:30     ` Hans de Goede

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8cfab16c-5edc-dbff-b73e-65419a649ac2@redhat.com \
    --to=hdegoede@redhat.com \
    --cc=benoitg@coeus.ca \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=helgaas@kernel.org \
    --cc=hpa@zytor.com \
    --cc=hui.wang@canonical.com \
    --cc=juhapekka.heikkila@gmail.com \
    --cc=kw@linux.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mika.westerberg@linux.intel.com \
    --cc=mingo@redhat.com \
    --cc=myron.stowe@redhat.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=rjw@rjwysocki.net \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).