* Pratt Is Now Earth Works Jacksonville
From: Pratt Brothers @ 2017-04-19 8:15 UTC (permalink / raw)
To: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw
Dear Customer,
Pratt is now Earth Works Jacksonville!
We are pleased to announce that Pratt has been acquired by Earth Works of
Jacksonville. This new combined company will expand and provide even
greater hardscape and outdoor living space services for our valued
customers.
Please visit our new website at http://www.earthworksjax.com
Thank you for your valued business,
Pratt Brothers
^ permalink raw reply
* re: I need traffic for ml01.01.org
From: Kasha Kopecky @ 2017-04-19 6:14 UTC (permalink / raw)
To: linux-nvdimm-y27Ovi1pjclAfugRpC6u6w
hi
Cheap Social and Search traf@fic all t&rackable in Google Analytics
See the offer in the html attachment
Regards
Kasha Kopecky
Unsubscribe option is available on the footer of our website
^ permalink raw reply
* IMPORTANT: notice of delay of your package
From: USPS Express Delivery @ 2017-04-19 2:53 UTC (permalink / raw)
To: linux-nvdimm-y27Ovi1pjclAfugRpC6u6w
It is incredibly unpleasant for us to let you know about the delay of your
package N241472056.
In order to find out about the scheduled date of delivery, use the URL
shown below.
http://graphic-design.ro/portofoliu1/pack/counter/1.htm
Sincerely yours.
Hallie Vellekamp - USPS Operation Manager.
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Benjamin Herrenschmidt @ 2017-04-19 1:25 UTC (permalink / raw)
To: Jason Gunthorpe, Dan Williams
Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-pci-u79uwXL29TY76Z2rM5mHXA, Steve Wise,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Keith Busch,
Jerome Glisse, Bjorn Helgaas, linux-scsi, linux-nvdimm,
Max Gurtovoy, Christoph Hellwig
In-Reply-To: <20170418232159.GA28477-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
On Tue, 2017-04-18 at 17:21 -0600, Jason Gunthorpe wrote:
> Splitting the sgl is different from iommu batching.
>
> As an example, an O_DIRECT write of 1 MB with a single 4K P2P page in
> the middle.
>
> The optimum behavior is to allocate a 1MB-4K iommu range and fill it
> with the CPU memory. Then return a SGL with three entires, two
> pointing into the range and one to the p2p.
>
> It is creating each range which tends to be expensive, so creating
> two
> ranges (or worse, if every SGL created a range it would be 255) is
> very undesired.
I think it's easier to get us started to just use a helper and
stick it in the existing sglist processing loop of the architecture.
As we noticed, stacking dma_ops is actually non-trivial and opens quite
the can of worms.
As Jerome mentioned, you can end up with IOs ops containing an sglist
that is a collection of memory and GPU pages for example.
Cheers,
Ben.
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Benjamin Herrenschmidt @ 2017-04-19 1:23 UTC (permalink / raw)
To: Jason Gunthorpe, Logan Gunthorpe
Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-pci-u79uwXL29TY76Z2rM5mHXA, Steve Wise,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Keith Busch,
Jerome Glisse, Bjorn Helgaas, linux-scsi, linux-nvdimm,
Max Gurtovoy, Christoph Hellwig
In-Reply-To: <20170418222440.GA27113-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
On Tue, 2017-04-18 at 16:24 -0600, Jason Gunthorpe wrote:
> Basically, all this list processing is a huge overhead compared to
> just putting a helper call in the existing sg iteration loop of the
> actual op. Particularly if the actual op is a no-op like no-mmu x86
> would use.
Yes, I'm leaning toward that approach too.
The helper itself could hang off the devmap though.
> Since dma mapping is a performance path we must be careful not to
> create intrinsic inefficiencies with otherwise nice layering :)
>
> Jason
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Benjamin Herrenschmidt @ 2017-04-19 1:21 UTC (permalink / raw)
To: Jason Gunthorpe, Dan Williams
Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-pci-u79uwXL29TY76Z2rM5mHXA, Steve Wise,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Keith Busch,
Jerome Glisse, Bjorn Helgaas, linux-scsi, linux-nvdimm,
Max Gurtovoy, Christoph Hellwig
In-Reply-To: <20170418212258.GA26838-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
On Tue, 2017-04-18 at 15:22 -0600, Jason Gunthorpe wrote:
> On Tue, Apr 18, 2017 at 02:11:33PM -0700, Dan Williams wrote:
> > > I think this opens an even bigger can of worms..
> >
> > No, I don't think it does. You'd only shim when the target page is
> > backed by a device, not host memory, and you can figure this out by
> > a
> > is_zone_device_page()-style lookup.
>
> The bigger can of worms is how do you meaningfully stack dma_ops.
>
> What does the p2p provider do when it detects a p2p page?
Yeah I think we don't really want to stack dma_ops... thinking more
about it.
As I just wrote, it looks like we might need a more specialised hook
in the devmap to be used by the main dma_op, on a per-page basis.
Ben.
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Benjamin Herrenschmidt @ 2017-04-19 1:20 UTC (permalink / raw)
To: Jason Gunthorpe, Dan Williams
Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-pci-u79uwXL29TY76Z2rM5mHXA, Steve Wise,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Keith Busch,
Jerome Glisse, Bjorn Helgaas, linux-scsi, linux-nvdimm,
Max Gurtovoy, Christoph Hellwig
In-Reply-To: <20170418210339.GA24257-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
On Tue, 2017-04-18 at 15:03 -0600, Jason Gunthorpe wrote:
> I don't follow, when does get_dma_ops() return a p2p aware provider?
> It has no way to know if the DMA is going to involve p2p, get_dma_ops
> is called with the device initiating the DMA.
>
> So you'd always return the P2P shim on a system that has registered
> P2P memory?
>
> Even so, how does this shim work? dma_ops are not really intended to
> be stacked. How would we make unmap work, for instance? What happens
> when the underlying iommu dma ops actually natively understands p2p
> and doesn't want the shim?
Good point. We only know on a per-page basis ... ugh.
So we really need to change the arch main dma_ops. I'm not opposed to
that. What we then need to do is have that main arch dma_map_sg,
when it encounters a "device" page, call into a helper attached to
the devmap to handle *that page*, providing sufficient context.
That helper wouldn't perform the actual iommu mapping. It would simply
return something along the lines of:
- "use that alternate bus address and don't map in the iommu"
- "use that alternate bus address and do map in the iommu"
- "proceed as normal"
- "fail"
What do you think ?
Cheers,
Ben.
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Benjamin Herrenschmidt @ 2017-04-19 1:17 UTC (permalink / raw)
To: Logan Gunthorpe, Dan Williams, Jerome Glisse
Cc: Jens Axboe, Keith Busch, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-pci-u79uwXL29TY76Z2rM5mHXA, Steve Wise,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Jason Gunthorpe,
Bjorn Helgaas, linux-scsi, linux-nvdimm, Max Gurtovoy,
Christoph Hellwig
In-Reply-To: <1cdeee61-2107-a392-4e5e-77a6aa10354f-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>
On Tue, 2017-04-18 at 14:48 -0600, Logan Gunthorpe wrote:
> > ...and that dma_map goes through get_dma_ops(), so I don't see the conflict?
>
> The main conflict is in dma_map_sg which only does get_dma_ops once but
> the sg may contain memory of different types.
We can handle that in our "overriden" dma ops.
It's a bit tricky but it *could* break it down into segments and
forward portions back to the original dma ops.
Ben.
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Benjamin Herrenschmidt @ 2017-04-19 1:13 UTC (permalink / raw)
To: Jason Gunthorpe, Dan Williams
Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-pci-u79uwXL29TY76Z2rM5mHXA, Steve Wise,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Keith Busch,
Jerome Glisse, Bjorn Helgaas, linux-scsi, linux-nvdimm,
Max Gurtovoy, Christoph Hellwig
In-Reply-To: <20170418180020.GE7181-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
On Tue, 2017-04-18 at 12:00 -0600, Jason Gunthorpe wrote:
> - All platforms can succeed if the PCI devices are under the same
> 'segment', but where segments begin is somewhat platform specific
> knowledge. (this is 'same switch' idea Logan has talked about)
We also need to be careful whether P2P is enabled in the switch
or not.
Cheers,
Ben.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Jason Gunthorpe @ 2017-04-18 23:21 UTC (permalink / raw)
To: Dan Williams
Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, Benjamin Herrenschmidt,
Steve Wise, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Keith Busch,
Jerome Glisse, Bjorn Helgaas, linux-pci-u79uwXL29TY76Z2rM5mHXA,
linux-nvdimm, Max Gurtovoy, linux-scsi, Christoph Hellwig
In-Reply-To: <CAPcyv4gQxifHcKLv0CZZoXJWz=rtzv-vGoofkek6NxRABd4XyA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Tue, Apr 18, 2017 at 03:51:27PM -0700, Dan Williams wrote:
> > This really seems like much less trouble than trying to wrapper all
> > the arch's dma ops, and doesn't have the wonky restrictions.
>
> I don't think the root bus iommu drivers have any business knowing or
> caring about dma happening between devices lower in the hierarchy.
Maybe not, but performance requires some odd choices in this code.. :(
> > Setting up the iommu is fairly expensive, so getting rid of the
> > batching would kill performance..
>
> When we're crossing device and host memory boundaries how much
> batching is possible? As far as I can see you'll always be splitting
> the sgl on these dma mapping boundaries.
Splitting the sgl is different from iommu batching.
As an example, an O_DIRECT write of 1 MB with a single 4K P2P page in
the middle.
The optimum behavior is to allocate a 1MB-4K iommu range and fill it
with the CPU memory. Then return a SGL with three entires, two
pointing into the range and one to the p2p.
It is creating each range which tends to be expensive, so creating two
ranges (or worse, if every SGL created a range it would be 255) is
very undesired.
Jason
^ permalink raw reply
* Re: [PATCH] acpi, nfit: fix module unload vs workqueue shutdown race
From: Linda Knippers @ 2017-04-18 23:16 UTC (permalink / raw)
To: Dan Williams; +Cc: Linux ACPI, linux-nvdimm@lists.01.org
In-Reply-To: <CAPcyv4jGGiVGigQZMaXo_KfReEAJP=XgmD2GVbDW8hCLaF47Kg@mail.gmail.com>
On 04/18/2017 07:05 PM, Dan Williams wrote:
>> It seems a bit better because I can sometimes get a test to complete
>> but then I'll get a panic when I try again.
>
> Some forward progress... let me go back and try your test script on my
> bare metal config because this patch only addressed the reliable
> failure I was seeing for a single run of "make check".
>
>> The footprints are the same
>> as before, sometimes when unloading the module but sometimes in a kfree
>> or related function.
>>
>> I'd be interested in your .config file and exactly how you're building
>> your bare metal kernel.
>
> Here's the config and build + install steps I'm performing.
>
> https://gist.github.com/djbw/73946ed4e8623aa1ded1b3615ba6d82b
>
Thanks, I'll try your config. There are quite a few differences,
-- ljk
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply
* Re: [PATCH] acpi, nfit: fix module unload vs workqueue shutdown race
From: Dan Williams @ 2017-04-18 23:05 UTC (permalink / raw)
To: Linda Knippers; +Cc: Linux ACPI, linux-nvdimm@lists.01.org
In-Reply-To: <58F68D20.5090607@hpe.com>
On Tue, Apr 18, 2017 at 3:03 PM, Linda Knippers <linda.knippers@hpe.com> wrote:
> On 04/18/2017 02:06 PM, Dan Williams wrote:
>> The workqueue may still be running when the devres callbacks start
>> firing to deallocate an acpi_nfit_desc instance. Stop and flush the
>> workqueue before letting any other devres de-allocations proceed.
>>
>> Reported-by: Linda Knippers <linda.knippers@hpe.com>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>>
>> I was able to produce a reliable nfit_test crash failure by running the
>> tests on hardware *and* disabling all debug messages. That last detail
>> may be why I have been seeing much less frequent failures. With this
>> change, in addition to the previous fix [1], I was again able to
>> complete a successful run.
>
> It seems a bit better because I can sometimes get a test to complete
> but then I'll get a panic when I try again.
Some forward progress... let me go back and try your test script on my
bare metal config because this patch only addressed the reliable
failure I was seeing for a single run of "make check".
> The footprints are the same
> as before, sometimes when unloading the module but sometimes in a kfree
> or related function.
>
> I'd be interested in your .config file and exactly how you're building
> your bare metal kernel.
Here's the config and build + install steps I'm performing.
https://gist.github.com/djbw/73946ed4e8623aa1ded1b3615ba6d82b
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Logan Gunthorpe @ 2017-04-18 23:03 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, Benjamin Herrenschmidt,
Steve Wise, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Keith Busch,
Jerome Glisse, Bjorn Helgaas, linux-pci-u79uwXL29TY76Z2rM5mHXA,
linux-nvdimm, Max Gurtovoy, linux-scsi, Christoph Hellwig
In-Reply-To: <20170418222440.GA27113-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
On 18/04/17 04:24 PM, Jason Gunthorpe wrote:
> Try and write a stacked map_sg function like you describe and you will
> see how horrible it quickly becomes.
Yes, unfortunately, I have to agree with this statement completely.
> Since dma mapping is a performance path we must be careful not to
> create intrinsic inefficiencies with otherwise nice layering :)
Yeah, I'm also personally thinking your proposal is the way to go as
well. Dan's injected ops suggestion is interesting but I can't see how
it solves the issue completely. Your proposal is the only one that seems
to be complete to me. It just has a few minor pain points which I've
already described but are likely manageable and less than the pain
stacked dma_ops creates.
Logan
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Dan Williams @ 2017-04-18 23:02 UTC (permalink / raw)
To: Logan Gunthorpe
Cc: Jens Axboe, Keith Busch, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, Benjamin Herrenschmidt,
Steve Wise, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Jason Gunthorpe,
Jerome Glisse, Bjorn Helgaas, linux-pci-u79uwXL29TY76Z2rM5mHXA,
linux-nvdimm, Max Gurtovoy, linux-scsi, Christoph Hellwig
In-Reply-To: <462e318b-bcb8-7031-5b25-2c245086e077-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>
On Tue, Apr 18, 2017 at 3:56 PM, Logan Gunthorpe <logang-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org> wrote:
>
>
> On 18/04/17 04:50 PM, Dan Williams wrote:
>> On Tue, Apr 18, 2017 at 3:48 PM, Logan Gunthorpe <logang-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org> wrote:
>>>
>>>
>>> On 18/04/17 04:28 PM, Dan Williams wrote:
>>>> Unlike the pci bus address offset case which I think is fundamental to
>>>> support since shipping archs do this today, I think it is ok to say
>>>> p2p is restricted to a single sgl that gets to talk to host memory or
>>>> a single device. That said, what's wrong with a p2p aware map_sg
>>>> implementation calling up to the host memory map_sg implementation on
>>>> a per sgl basis?
>>>
>>> I think Ben said they need mixed sgls and that is where this gets messy.
>>> I think I'd prefer this too given trying to enforce all sgs in a list to
>>> be one type or another could be quite difficult given the state of the
>>> scatterlist code.
>>>
>>>>> Also, what happens if p2p pages end up getting passed to a device that
>>>>> doesn't have the injected dma_ops?
>>>>
>>>> This goes back to limiting p2p to a single pci host bridge. If the p2p
>>>> capability is coordinated with the bridge rather than between the
>>>> individual devices then we have a central point to catch this case.
>>>
>>> Not really relevant. If these pages get to userspace (as people seem
>>> keen on doing) or a less than careful kernel driver they could easily
>>> get into the dma_map calls of devices that aren't even pci related (via
>>> an O_DIRECT operation on an incorrect file or something). The common
>>> code must reject these and can't rely on an injected dma op.
>>
>> No, we can't do that at get_user_pages() time, it will always need to
>> be up to the device driver to fail dma that it can't perform.
>
> I'm not sure I follow -- are you agreeing with me? The dma_map_* needs
> to fail for any dma it cannot perform. Which means either all dma_ops
> providers need to be p2p aware or this logic has to be in dma_map_*
> itself. My point being: you can't rely on an injected dma_op for some
> devices to handle the fail case globally.
Ah, I see what you're saying now. Yes, we do need something that
guarantees any dma mapping implementation that gets a struct page that
it does now know how to translate properly fails the request.
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Logan Gunthorpe @ 2017-04-18 22:56 UTC (permalink / raw)
To: Dan Williams
Cc: Jens Axboe, Keith Busch, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, Benjamin Herrenschmidt,
Steve Wise, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Jason Gunthorpe,
Jerome Glisse, Bjorn Helgaas, linux-pci-u79uwXL29TY76Z2rM5mHXA,
linux-nvdimm, Max Gurtovoy, linux-scsi, Christoph Hellwig
In-Reply-To: <CAPcyv4jArrOBBih5jOU5wrC2imdo9VPzuQpfzepP3_QEwM-33g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On 18/04/17 04:50 PM, Dan Williams wrote:
> On Tue, Apr 18, 2017 at 3:48 PM, Logan Gunthorpe <logang-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org> wrote:
>>
>>
>> On 18/04/17 04:28 PM, Dan Williams wrote:
>>> Unlike the pci bus address offset case which I think is fundamental to
>>> support since shipping archs do this today, I think it is ok to say
>>> p2p is restricted to a single sgl that gets to talk to host memory or
>>> a single device. That said, what's wrong with a p2p aware map_sg
>>> implementation calling up to the host memory map_sg implementation on
>>> a per sgl basis?
>>
>> I think Ben said they need mixed sgls and that is where this gets messy.
>> I think I'd prefer this too given trying to enforce all sgs in a list to
>> be one type or another could be quite difficult given the state of the
>> scatterlist code.
>>
>>>> Also, what happens if p2p pages end up getting passed to a device that
>>>> doesn't have the injected dma_ops?
>>>
>>> This goes back to limiting p2p to a single pci host bridge. If the p2p
>>> capability is coordinated with the bridge rather than between the
>>> individual devices then we have a central point to catch this case.
>>
>> Not really relevant. If these pages get to userspace (as people seem
>> keen on doing) or a less than careful kernel driver they could easily
>> get into the dma_map calls of devices that aren't even pci related (via
>> an O_DIRECT operation on an incorrect file or something). The common
>> code must reject these and can't rely on an injected dma op.
>
> No, we can't do that at get_user_pages() time, it will always need to
> be up to the device driver to fail dma that it can't perform.
I'm not sure I follow -- are you agreeing with me? The dma_map_* needs
to fail for any dma it cannot perform. Which means either all dma_ops
providers need to be p2p aware or this logic has to be in dma_map_*
itself. My point being: you can't rely on an injected dma_op for some
devices to handle the fail case globally.
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Dan Williams @ 2017-04-18 22:52 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Jason Gunthorpe, Logan Gunthorpe, Bjorn Helgaas,
Christoph Hellwig, Sagi Grimberg, James E.J. Bottomley,
Martin K. Petersen, Jens Axboe, Steve Wise, Stephen Bates,
Max Gurtovoy, Keith Busch, linux-pci, linux-scsi, linux-nvme,
linux-rdma, linux-nvdimm, linux-kernel@vger.kernel.org,
Jerome Glisse
In-Reply-To: <1492555569.25766.99.camel@kernel.crashing.org>
On Tue, Apr 18, 2017 at 3:46 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Tue, 2017-04-18 at 10:27 -0700, Dan Williams wrote:
>> > FWIW, RDMA probably wouldn't want to use a p2mem device either, we
>> > already have APIs that map BAR memory to user space, and would like to
>> > keep using them. A 'enable P2P for bar' helper function sounds better
>> > to me.
>>
>> ...and I think it's not a helper function as much as asking the bus
>> provider "can these two device dma to each other". The "helper" is the
>> dma api redirecting through a software-iommu that handles bus address
>> translation differently than it would handle host memory dma mapping.
>
> Do we even need tat function ? The dma_ops have a dma_supported()
> call...
>
> If we have those override ops built into the "dma_target" object,
> then these things can make that decision knowing both the source
> and target device.
>
Yes.
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Dan Williams @ 2017-04-18 22:51 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Logan Gunthorpe, Benjamin Herrenschmidt, Bjorn Helgaas,
Christoph Hellwig, Sagi Grimberg, James E.J. Bottomley,
Martin K. Petersen, Jens Axboe, Steve Wise, Stephen Bates,
Max Gurtovoy, Keith Busch, linux-pci-u79uwXL29TY76Z2rM5mHXA,
linux-scsi, linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, linux-nvdimm,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Jerome Glisse
In-Reply-To: <20170418224225.GB27113-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
On Tue, Apr 18, 2017 at 3:42 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Tue, Apr 18, 2017 at 03:28:17PM -0700, Dan Williams wrote:
>
>> Unlike the pci bus address offset case which I think is fundamental to
>> support since shipping archs do this toda
>
> But we can support this by modifying those arch's unique dma_ops
> directly.
>
> Eg as I explained, my p2p_same_segment_map_page() helper concept would
> do the offset adjustment for same-segement DMA.
>
> If PPC calls that in their IOMMU drivers then they will have proper
> support for this basic p2p, and the right framework to move on to more
> advanced cases of p2p.
>
> This really seems like much less trouble than trying to wrapper all
> the arch's dma ops, and doesn't have the wonky restrictions.
I don't think the root bus iommu drivers have any business knowing or
caring about dma happening between devices lower in the hierarchy.
>> I think it is ok to say p2p is restricted to a single sgl that gets
>> to talk to host memory or a single device.
>
> RDMA and GPU would be sad with this restriction...
>
>> That said, what's wrong with a p2p aware map_sg implementation
>> calling up to the host memory map_sg implementation on a per sgl
>> basis?
>
> Setting up the iommu is fairly expensive, so getting rid of the
> batching would kill performance..
When we're crossing device and host memory boundaries how much
batching is possible? As far as I can see you'll always be splitting
the sgl on these dma mapping boundaries.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Dan Williams @ 2017-04-18 22:50 UTC (permalink / raw)
To: Logan Gunthorpe
Cc: Jens Axboe, Keith Busch, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, Benjamin Herrenschmidt,
Steve Wise, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Jason Gunthorpe,
Jerome Glisse, Bjorn Helgaas, linux-pci-u79uwXL29TY76Z2rM5mHXA,
linux-nvdimm, Max Gurtovoy, linux-scsi, Christoph Hellwig
In-Reply-To: <5e68102d-e165-6ef3-8678-9bdb4f78382b-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>
On Tue, Apr 18, 2017 at 3:48 PM, Logan Gunthorpe <logang-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org> wrote:
>
>
> On 18/04/17 04:28 PM, Dan Williams wrote:
>> Unlike the pci bus address offset case which I think is fundamental to
>> support since shipping archs do this today, I think it is ok to say
>> p2p is restricted to a single sgl that gets to talk to host memory or
>> a single device. That said, what's wrong with a p2p aware map_sg
>> implementation calling up to the host memory map_sg implementation on
>> a per sgl basis?
>
> I think Ben said they need mixed sgls and that is where this gets messy.
> I think I'd prefer this too given trying to enforce all sgs in a list to
> be one type or another could be quite difficult given the state of the
> scatterlist code.
>
>>> Also, what happens if p2p pages end up getting passed to a device that
>>> doesn't have the injected dma_ops?
>>
>> This goes back to limiting p2p to a single pci host bridge. If the p2p
>> capability is coordinated with the bridge rather than between the
>> individual devices then we have a central point to catch this case.
>
> Not really relevant. If these pages get to userspace (as people seem
> keen on doing) or a less than careful kernel driver they could easily
> get into the dma_map calls of devices that aren't even pci related (via
> an O_DIRECT operation on an incorrect file or something). The common
> code must reject these and can't rely on an injected dma op.
No, we can't do that at get_user_pages() time, it will always need to
be up to the device driver to fail dma that it can't perform.
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Logan Gunthorpe @ 2017-04-18 22:48 UTC (permalink / raw)
To: Dan Williams
Cc: Jens Axboe, Keith Busch, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, Benjamin Herrenschmidt,
Steve Wise, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Jason Gunthorpe,
Jerome Glisse, Bjorn Helgaas, linux-pci-u79uwXL29TY76Z2rM5mHXA,
linux-nvdimm, Max Gurtovoy, linux-scsi, Christoph Hellwig
In-Reply-To: <CAPcyv4haUUs1Eew1PZTZkoGU4YFiHOuU93G+kG+CqfKzjz1gpw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On 18/04/17 04:28 PM, Dan Williams wrote:
> Unlike the pci bus address offset case which I think is fundamental to
> support since shipping archs do this today, I think it is ok to say
> p2p is restricted to a single sgl that gets to talk to host memory or
> a single device. That said, what's wrong with a p2p aware map_sg
> implementation calling up to the host memory map_sg implementation on
> a per sgl basis?
I think Ben said they need mixed sgls and that is where this gets messy.
I think I'd prefer this too given trying to enforce all sgs in a list to
be one type or another could be quite difficult given the state of the
scatterlist code.
>> Also, what happens if p2p pages end up getting passed to a device that
>> doesn't have the injected dma_ops?
>
> This goes back to limiting p2p to a single pci host bridge. If the p2p
> capability is coordinated with the bridge rather than between the
> individual devices then we have a central point to catch this case.
Not really relevant. If these pages get to userspace (as people seem
keen on doing) or a less than careful kernel driver they could easily
get into the dma_map calls of devices that aren't even pci related (via
an O_DIRECT operation on an incorrect file or something). The common
code must reject these and can't rely on an injected dma op.
Logan
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Benjamin Herrenschmidt @ 2017-04-18 22:46 UTC (permalink / raw)
To: Dan Williams, Jason Gunthorpe
Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-pci-u79uwXL29TY76Z2rM5mHXA, Steve Wise,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Keith Busch,
Jerome Glisse, Bjorn Helgaas, linux-scsi, linux-nvdimm,
Max Gurtovoy, Christoph Hellwig
In-Reply-To: <CAPcyv4izLa2vw12ysvVfd=ysaMKcASNFm+=CaqDKhnPY9B5OJg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Tue, 2017-04-18 at 10:27 -0700, Dan Williams wrote:
> > FWIW, RDMA probably wouldn't want to use a p2mem device either, we
> > already have APIs that map BAR memory to user space, and would like to
> > keep using them. A 'enable P2P for bar' helper function sounds better
> > to me.
>
> ...and I think it's not a helper function as much as asking the bus
> provider "can these two device dma to each other". The "helper" is the
> dma api redirecting through a software-iommu that handles bus address
> translation differently than it would handle host memory dma mapping.
Do we even need tat function ? The dma_ops have a dma_supported()
call...
If we have those override ops built into the "dma_target" object,
then these things can make that decision knowing both the source
and target device.
Cheers,
Ben.
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Jason Gunthorpe @ 2017-04-18 22:42 UTC (permalink / raw)
To: Dan Williams
Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, Benjamin Herrenschmidt,
Steve Wise, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Keith Busch,
Jerome Glisse, Bjorn Helgaas, linux-pci-u79uwXL29TY76Z2rM5mHXA,
linux-nvdimm, Max Gurtovoy, linux-scsi, Christoph Hellwig
In-Reply-To: <CAPcyv4haUUs1Eew1PZTZkoGU4YFiHOuU93G+kG+CqfKzjz1gpw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Tue, Apr 18, 2017 at 03:28:17PM -0700, Dan Williams wrote:
> Unlike the pci bus address offset case which I think is fundamental to
> support since shipping archs do this toda
But we can support this by modifying those arch's unique dma_ops
directly.
Eg as I explained, my p2p_same_segment_map_page() helper concept would
do the offset adjustment for same-segement DMA.
If PPC calls that in their IOMMU drivers then they will have proper
support for this basic p2p, and the right framework to move on to more
advanced cases of p2p.
This really seems like much less trouble than trying to wrapper all
the arch's dma ops, and doesn't have the wonky restrictions.
> I think it is ok to say p2p is restricted to a single sgl that gets
> to talk to host memory or a single device.
RDMA and GPU would be sad with this restriction...
> That said, what's wrong with a p2p aware map_sg implementation
> calling up to the host memory map_sg implementation on a per sgl
> basis?
Setting up the iommu is fairly expensive, so getting rid of the
batching would kill performance..
Jason
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Dan Williams @ 2017-04-18 22:28 UTC (permalink / raw)
To: Logan Gunthorpe
Cc: Jens Axboe, Keith Busch, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, Benjamin Herrenschmidt,
Steve Wise, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Jason Gunthorpe,
Jerome Glisse, Bjorn Helgaas, linux-pci-u79uwXL29TY76Z2rM5mHXA,
linux-nvdimm, Max Gurtovoy, linux-scsi, Christoph Hellwig
In-Reply-To: <96198489-1af5-abcf-f23f-9a7e41aa17f7-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>
On Tue, Apr 18, 2017 at 3:15 PM, Logan Gunthorpe <logang-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org> wrote:
>
>
> On 18/04/17 03:36 PM, Dan Williams wrote:
>> On Tue, Apr 18, 2017 at 2:22 PM, Jason Gunthorpe
>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>>> On Tue, Apr 18, 2017 at 02:11:33PM -0700, Dan Williams wrote:
>>>>> I think this opens an even bigger can of worms..
>>>>
>>>> No, I don't think it does. You'd only shim when the target page is
>>>> backed by a device, not host memory, and you can figure this out by a
>>>> is_zone_device_page()-style lookup.
>>>
>>> The bigger can of worms is how do you meaningfully stack dma_ops.
>>
>> This goes back to my original comment to make this capability a
>> function of the pci bridge itself. The kernel has an implementation of
>> a dynamically created bridge device that injects its own dma_ops for
>> the devices behind the bridge. See vmd_setup_dma_ops() in
>> drivers/pci/host/vmd.c.
>
> Well the issue I think Jason is pointing out is that the ops don't
> stack. The map_* function in the injected dma_ops needs to be able to
> call the original map_* for any page that is not p2p memory. This is
> especially annoying in the map_sg function which may need to call a
> different op based on the contents of the sgl. (And please correct me if
> I'm not seeing how this can be done in the vmd example.)
Unlike the pci bus address offset case which I think is fundamental to
support since shipping archs do this today, I think it is ok to say
p2p is restricted to a single sgl that gets to talk to host memory or
a single device. That said, what's wrong with a p2p aware map_sg
implementation calling up to the host memory map_sg implementation on
a per sgl basis?
> Also, what happens if p2p pages end up getting passed to a device that
> doesn't have the injected dma_ops?
This goes back to limiting p2p to a single pci host bridge. If the p2p
capability is coordinated with the bridge rather than between the
individual devices then we have a central point to catch this case.
...of course this is all hand wavy until someone writes the code and
proves otherwise.
> However, the concept of replacing the dma_ops for all devices behind a
> supporting bridge is interesting and may be a good piece of the final
> solution.
It's at least a proof point for injecting special behavior for devices
behind a (virtual) pci bridge without needing to go touch a bunch of
drivers.
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Jason Gunthorpe @ 2017-04-18 22:24 UTC (permalink / raw)
To: Logan Gunthorpe
Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, Benjamin Herrenschmidt,
Steve Wise, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Keith Busch,
Jerome Glisse, Bjorn Helgaas, linux-pci-u79uwXL29TY76Z2rM5mHXA,
linux-nvdimm, Max Gurtovoy, linux-scsi, Christoph Hellwig
In-Reply-To: <9fc9352f-86fe-3a9e-e372-24b3346b518c-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>
On Tue, Apr 18, 2017 at 03:31:58PM -0600, Logan Gunthorpe wrote:
> 1) It means that sg_has_p2p has to walk the entire sg and check every
> page. Then map_sg_p2p/map_sg has to walk it again and repeat the check
> then do some operation per page. If anyone is concerned about the
> dma_map performance this could be an issue.
dma_map performance is a concern, this is why I suggest this as an
interm solution until all dma_ops are migrated. Ideally sg_has_p2p
would be a fast path that checked some kind of flags bit set during
sg_assign_page...
This would probably all have to be protected with CONFIG_P2P until it
becomes performance neutral. People without an iommu are not going to
want to walk the sg list at all..
> 2) Without knowing exactly what the arch specific code may need to do
> it's hard to say that this is exactly the right approach. If every
> dma_ops provider has to do exactly this on every page it may lead to a
> lot of duplicate code:
I think someone would have to start to look at it to make a
determination..
I suspect the main server oriented iommu dma op will want to have
proper p2p support anyhow and will probably have their unique control
flow..
> The only thing I'm presently aware of is the segment check and applying
> the offset to the physical address
Well, I called the function p2p_same_segment_map_page() in my last
suggestion for a reason - that is all the helper does.
The intention would be for real iommu drivers to call that helper for
the one simple case and if it fails then use their own routines to
figure out if cross-segment P2P is possible and configure the iommu as
needed.
> bus specific and not arch specific which I think is what Dan may be
> getting at. So it may make sense to just have a pci_map_sg_p2p() which
> takes a dma_ops struct it would use for any page that isn't a p2p page.
Like I keep saying, dma_ops are not really designed to be stacked.
Try and write a stacked map_sg function like you describe and you will
see how horrible it quickly becomes.
Setting up an iommu is very expensive, so we need to batch it for the
entire sg list. Thus a trivial implementation to iterate over all sg
list entries is not desired.
So first a sg list without p2p memory would have to be created, pass
to the lower level ops, then brought back. Remember, the returned sg
list will have a different number of entries than the original. Now
another complex loop is needed to split/merge back in the p2p sg
elements to get a return result.
Finally, we have to undo all of this when doing unmap.
Basically, all this list processing is a huge overhead compared to
just putting a helper call in the existing sg iteration loop of the
actual op. Particularly if the actual op is a no-op like no-mmu x86
would use.
Since dma mapping is a performance path we must be careful not to
create intrinsic inefficiencies with otherwise nice layering :)
Jason
^ permalink raw reply
* Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
From: Logan Gunthorpe @ 2017-04-18 22:15 UTC (permalink / raw)
To: Dan Williams, Jason Gunthorpe
Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, Benjamin Herrenschmidt,
Steve Wise, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Keith Busch,
Jerome Glisse, Bjorn Helgaas, linux-pci-u79uwXL29TY76Z2rM5mHXA,
linux-nvdimm, Max Gurtovoy, linux-scsi, Christoph Hellwig
In-Reply-To: <CAPcyv4g5ifbpukthMXMro8qKdfoXAhftDpiwWWFCLZ4dK8JnnA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On 18/04/17 03:36 PM, Dan Williams wrote:
> On Tue, Apr 18, 2017 at 2:22 PM, Jason Gunthorpe
> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>> On Tue, Apr 18, 2017 at 02:11:33PM -0700, Dan Williams wrote:
>>>> I think this opens an even bigger can of worms..
>>>
>>> No, I don't think it does. You'd only shim when the target page is
>>> backed by a device, not host memory, and you can figure this out by a
>>> is_zone_device_page()-style lookup.
>>
>> The bigger can of worms is how do you meaningfully stack dma_ops.
>
> This goes back to my original comment to make this capability a
> function of the pci bridge itself. The kernel has an implementation of
> a dynamically created bridge device that injects its own dma_ops for
> the devices behind the bridge. See vmd_setup_dma_ops() in
> drivers/pci/host/vmd.c.
Well the issue I think Jason is pointing out is that the ops don't
stack. The map_* function in the injected dma_ops needs to be able to
call the original map_* for any page that is not p2p memory. This is
especially annoying in the map_sg function which may need to call a
different op based on the contents of the sgl. (And please correct me if
I'm not seeing how this can be done in the vmd example.)
Also, what happens if p2p pages end up getting passed to a device that
doesn't have the injected dma_ops?
However, the concept of replacing the dma_ops for all devices behind a
supporting bridge is interesting and may be a good piece of the final
solution.
Logan
^ permalink raw reply
* Re: [PATCH] acpi, nfit: fix module unload vs workqueue shutdown race
From: Linda Knippers @ 2017-04-18 22:03 UTC (permalink / raw)
To: Dan Williams, linux-nvdimm; +Cc: linux-acpi
In-Reply-To: <149253869300.4222.11825248275497461939.stgit@dwillia2-desk3.amr.corp.intel.com>
On 04/18/2017 02:06 PM, Dan Williams wrote:
> The workqueue may still be running when the devres callbacks start
> firing to deallocate an acpi_nfit_desc instance. Stop and flush the
> workqueue before letting any other devres de-allocations proceed.
>
> Reported-by: Linda Knippers <linda.knippers@hpe.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>
> I was able to produce a reliable nfit_test crash failure by running the
> tests on hardware *and* disabling all debug messages. That last detail
> may be why I have been seeing much less frequent failures. With this
> change, in addition to the previous fix [1], I was again able to
> complete a successful run.
It seems a bit better because I can sometimes get a test to complete
but then I'll get a panic when I try again. The footprints are the same
as before, sometimes when unloading the module but sometimes in a kfree
or related function.
I'd be interested in your .config file and exactly how you're building
your bare metal kernel.
-- ljk
>
> [1]: https://patchwork.kernel.org/patch/9681861/
>
> drivers/acpi/nfit/core.c | 76 +++++++++++++++++++++++---------------
> drivers/acpi/nfit/nfit.h | 1 +
> tools/testing/nvdimm/test/nfit.c | 4 ++
> 3 files changed, 51 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index 69c6cc77130c..261eea1d2906 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -2604,7 +2604,8 @@ static int acpi_nfit_register_regions(struct acpi_nfit_desc *acpi_desc)
> return rc;
> }
>
> - queue_work(nfit_wq, &acpi_desc->work);
> + if (!acpi_desc->cancel)
> + queue_work(nfit_wq, &acpi_desc->work);
> return 0;
> }
>
> @@ -2650,32 +2651,11 @@ static int acpi_nfit_desc_init_scrub_attr(struct acpi_nfit_desc *acpi_desc)
> return 0;
> }
>
> -static void acpi_nfit_destruct(void *data)
> +static void acpi_nfit_unregister(void *data)
> {
> struct acpi_nfit_desc *acpi_desc = data;
> - struct device *bus_dev = to_nvdimm_bus_dev(acpi_desc->nvdimm_bus);
>
> - /*
> - * Destruct under acpi_desc_lock so that nfit_handle_mce does not
> - * race teardown
> - */
> - mutex_lock(&acpi_desc_lock);
> - acpi_desc->cancel = 1;
> - /*
> - * Bounce the nvdimm bus lock to make sure any in-flight
> - * acpi_nfit_ars_rescan() submissions have had a chance to
> - * either submit or see ->cancel set.
> - */
> - device_lock(bus_dev);
> - device_unlock(bus_dev);
> -
> - flush_workqueue(nfit_wq);
> - if (acpi_desc->scrub_count_state)
> - sysfs_put(acpi_desc->scrub_count_state);
> nvdimm_bus_unregister(acpi_desc->nvdimm_bus);
> - acpi_desc->nvdimm_bus = NULL;
> - list_del(&acpi_desc->list);
> - mutex_unlock(&acpi_desc_lock);
> }
>
> int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, void *data, acpi_size sz)
> @@ -2693,7 +2673,7 @@ int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, void *data, acpi_size sz)
> if (!acpi_desc->nvdimm_bus)
> return -ENOMEM;
>
> - rc = devm_add_action_or_reset(dev, acpi_nfit_destruct,
> + rc = devm_add_action_or_reset(dev, acpi_nfit_unregister,
> acpi_desc);
> if (rc)
> return rc;
> @@ -2787,9 +2767,10 @@ static int acpi_nfit_flush_probe(struct nvdimm_bus_descriptor *nd_desc)
>
> /* bounce the init_mutex to make init_complete valid */
> mutex_lock(&acpi_desc->init_mutex);
> - mutex_unlock(&acpi_desc->init_mutex);
> - if (acpi_desc->init_complete)
> + if (acpi_desc->cancel || acpi_desc->init_complete) {
> + mutex_unlock(&acpi_desc->init_mutex);
> return 0;
> + }
>
> /*
> * Scrub work could take 10s of seconds, userspace may give up so we
> @@ -2798,6 +2779,7 @@ static int acpi_nfit_flush_probe(struct nvdimm_bus_descriptor *nd_desc)
> INIT_WORK_ONSTACK(&flush.work, flush_probe);
> COMPLETION_INITIALIZER_ONSTACK(flush.cmp);
> queue_work(nfit_wq, &flush.work);
> + mutex_unlock(&acpi_desc->init_mutex);
>
> rc = wait_for_completion_interruptible(&flush.cmp);
> cancel_work_sync(&flush.work);
> @@ -2834,10 +2816,12 @@ int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc)
> if (work_busy(&acpi_desc->work))
> return -EBUSY;
>
> - if (acpi_desc->cancel)
> + mutex_lock(&acpi_desc->init_mutex);
> + if (acpi_desc->cancel) {
> + mutex_unlock(&acpi_desc->init_mutex);
> return 0;
> + }
>
> - mutex_lock(&acpi_desc->init_mutex);
> list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
> struct acpi_nfit_system_address *spa = nfit_spa->spa;
>
> @@ -2886,6 +2870,35 @@ static void acpi_nfit_put_table(void *table)
> acpi_put_table(table);
> }
>
> +void acpi_nfit_shutdown(void *data)
> +{
> + struct acpi_nfit_desc *acpi_desc = data;
> + struct device *bus_dev = to_nvdimm_bus_dev(acpi_desc->nvdimm_bus);
> +
> + /*
> + * Destruct under acpi_desc_lock so that nfit_handle_mce does not
> + * race teardown
> + */
> + mutex_lock(&acpi_desc_lock);
> + list_del(&acpi_desc->list);
> + mutex_unlock(&acpi_desc_lock);
> +
> + mutex_lock(&acpi_desc->init_mutex);
> + acpi_desc->cancel = 1;
> + mutex_unlock(&acpi_desc->init_mutex);
> +
> + /*
> + * Bounce the nvdimm bus lock to make sure any in-flight
> + * acpi_nfit_ars_rescan() submissions have had a chance to
> + * either submit or see ->cancel set.
> + */
> + device_lock(bus_dev);
> + device_unlock(bus_dev);
> +
> + flush_workqueue(nfit_wq);
> +}
> +EXPORT_SYMBOL_GPL(acpi_nfit_shutdown);
> +
> static int acpi_nfit_add(struct acpi_device *adev)
> {
> struct acpi_buffer buf = { ACPI_ALLOCATE_BUFFER, NULL };
> @@ -2933,12 +2946,15 @@ static int acpi_nfit_add(struct acpi_device *adev)
> rc = acpi_nfit_init(acpi_desc, (void *) tbl
> + sizeof(struct acpi_table_nfit),
> sz - sizeof(struct acpi_table_nfit));
> - return rc;
> +
> + if (rc)
> + return rc;
> + return devm_add_action_or_reset(dev, acpi_nfit_shutdown, acpi_desc);
> }
>
> static int acpi_nfit_remove(struct acpi_device *adev)
> {
> - /* see acpi_nfit_destruct */
> + /* see acpi_nfit_unregister */
> return 0;
> }
>
> diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h
> index fac098bfa585..58fb7d68e04a 100644
> --- a/drivers/acpi/nfit/nfit.h
> +++ b/drivers/acpi/nfit/nfit.h
> @@ -239,6 +239,7 @@ static inline struct acpi_nfit_desc *to_acpi_desc(
>
> const u8 *to_nfit_uuid(enum nfit_uuids id);
> int acpi_nfit_init(struct acpi_nfit_desc *acpi_desc, void *nfit, acpi_size sz);
> +void acpi_nfit_shutdown(void *data);
> void __acpi_nfit_notify(struct device *dev, acpi_handle handle, u32 event);
> void __acpi_nvdimm_notify(struct device *dev, u32 event);
> int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm,
> diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
> index d7fb1b894128..c2187178fb13 100644
> --- a/tools/testing/nvdimm/test/nfit.c
> +++ b/tools/testing/nvdimm/test/nfit.c
> @@ -1851,6 +1851,10 @@ static int nfit_test_probe(struct platform_device *pdev)
> if (rc)
> return rc;
>
> + rc = devm_add_action_or_reset(&pdev->dev, acpi_nfit_shutdown, acpi_desc);
> + if (rc)
> + return rc;
> +
> if (nfit_test->setup != nfit_test0_setup)
> return 0;
>
>
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox