linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <bhelgaas@google.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux-mm@kvack.org, linuxppc-dev@ozlabs.org,
	Gavin Shan <gwshan@linux.vnet.ibm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>
Subject: Re: [PATCH RESEND v2 7/7] PCI/hotplug: PowerPC PowerNV PCI hotplug driver
Date: Wed, 18 Feb 2015 08:30:28 -0600	[thread overview]
Message-ID: <CAErSpo6vMyy2SoqZ4ca2D_BUxd3J4W5jASsQu0NM9opFzR5mfg@mail.gmail.com> (raw)
In-Reply-To: <1424219432.21410.113.camel@kernel.crashing.org>

[+cc linux-mm, linux-kernel]

For context, the start of this discussion was here:
http://lkml.kernel.org/r/1424157203-691-8-git-send-email-gwshan@linux.vnet.ibm.com
where Gavin is adding a new PCI hotplug driver for PowerNV.  That new
driver calls vm_unmap_aliases() the same way we do in the existing RPA
hotplug driver here:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/hotplug/rpaphp_core.c#n432

I'm trying to figure out whether it's correct to use
vm_unmap_aliases() here, but I'm not an mm person so all I have is my
gut feeling that something doesn't smell right.

On Tue, Feb 17, 2015 at 6:30 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Wed, 2015-02-18 at 11:16 +1100, Gavin Shan wrote:
>> >What is vm_unmap_aliases() for?  I see this is probably copied from
>> >rpaphp_core.c, where it was added by b4a26be9f6f8 ("powerpc/pseries:
>> Flush
>> >lazy kernel mappings after unplug operations").
>> >
>> >But I don't know whether:
>> >
>> >  - this is something specific to powerpc,
>> >  - the lack of vm_unmap_aliases() in other hotplug paths is a bug,
>> >  - the fact that we only do this on powerpc is covering up a
>> >    powerpc bug somewhere
>>
>> Yes, I copied this piece of code from rpaphp_core.c. I think Ben might
>> help to answer the questions as he added the patch. I had very quick
>> check on mm/vmalloc.c and it's reasonable to have vm_unmap_aliases()
>> here to flush TLB entries for ioremap() regions, which were unmapped
>> previously. if I'm correct. I don't think it's powerpc specific.
>
> It's specific to running under the PowerVM hypervisor, and thus doesn't
> affect PowerNV, just don't copy it over.
>
> It comes from the fact that the generic ioremap code nowadays delays
> TLB flushing on unmap. The TLB flushing code is what, on powerpc,
> ensures that we remove the translations from the MMU hash table (the
> hash table is essentially treated as an extended in-memory TLB), which
> on pseries turns into hypervisor calls.
>
> When running under that hypervisor, the HV ensures that no translation
> still exists in the hash before allowing a device to be removed from
> a partition. If translations still exist, the removal fails.
>
> So we need to force the generic ioremap code to perform all the TLB
> flushes for iounmap'ed regions before we "complete" the unplug operation
> from a kernel perspective so that the device can be re-assigned to
> another partition.
>
> This is thus useless on platforms like powernv which do not run under
> such a hypervisor.

So the hypervisor call that removes the device from the partition will
fail if there are any translations that reference the memory of the
device.

Let me go through this in excruciating detail to see if I understand
what's going on:

  - PCI core enumerates device D1
  - PCI core sets device D1 BAR 0 = 0x1000
  - driver claims D1
  - driver ioremaps 0x1000 at virtual address V
  - translation V -> 0x1000 is in TLB
  - driver iounmaps V (but V -> 0x1000 translation may remain in TLB)
  - driver releases D1
  - hot-remove D1 (without vm_unmap_aliases(), hypervisor would fail this)
  - it would be a bug to reference V here, but if we did, the
virt-to-phys translation would succeed and we'd have a Master Abort or
Unsupported Request on PCI/PCIe
  - hot-add D2
  - PCI core enumerates device D2
  - PCI core sets device D2 BAR 0 = 0x1000
  - it would be a bug to reference V here (before ioremapping), but if
we did, the reference would reach D2

I don't see anything hypervisor-specific here except for the fact that
the hypervisor checks for existing translations and most other
platforms don't.  But it seems like the unexpected PCI aborts could
happen on any platform.

Are we saying that those PCI aborts are OK, since it's a bug to make
those references in the first place?  Or would we rather take a TLB
miss fault instead so the references never make it to PCI?

I would think there would be similar issues when unmapping and
re-mapping plain old physical memory.  But I don't see
vm_unmap_aliases() calls there, so those issues must be handled
differently.  Should we handle this PCI hotplug issue the same way we
handle RAM?

Bjorn

  reply	other threads:[~2015-02-18 14:30 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-17  7:13 [PATCH RESEND v2 0/7] powerpc/powernv: Unified PCI slot reset and hotplug Gavin Shan
2015-02-17  7:13 ` [PATCH RESEND v2 1/7] powerpc/powernv: Use PCI slot reset infrastructure Gavin Shan
2015-02-17  7:13 ` [PATCH RESEND v2 2/7] powerpc/powernv: Issue fundamental reset if required Gavin Shan
2015-02-17  7:13 ` [PATCH RESEND v2 3/7] powerpc/pci: Move pcibios_find_pci_bus() around Gavin Shan
2015-02-17  7:13 ` [PATCH RESEND v2 4/7] powerpc/pci: Don't scan empty slot Gavin Shan
2015-02-17  7:13 ` [PATCH RESEND v2 5/7] powerpc/powernv: Introduce pnv_pci_poll() Gavin Shan
2015-02-17  7:13 ` [PATCH RESEND v2 6/7] powerpc/powernv: Functions to retrieve PCI slot status Gavin Shan
2015-02-17  7:13 ` [PATCH RESEND v2 7/7] PCI/hotplug: PowerPC PowerNV PCI hotplug driver Gavin Shan
2015-02-17 22:09   ` Bjorn Helgaas
2015-02-18  0:16     ` Gavin Shan
2015-02-18  0:30       ` Benjamin Herrenschmidt
2015-02-18 14:30         ` Bjorn Helgaas [this message]
2015-02-18 21:03           ` Benjamin Herrenschmidt
2015-02-17 21:44 ` [PATCH RESEND v2 0/7] powerpc/powernv: Unified PCI slot reset and hotplug Stewart Smith
2015-02-17 22:26   ` Gavin Shan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAErSpo6vMyy2SoqZ4ca2D_BUxd3J4W5jASsQu0NM9opFzR5mfg@mail.gmail.com \
    --to=bhelgaas@google.com \
    --cc=benh@kernel.crashing.org \
    --cc=gwshan@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxppc-dev@ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).