Linux virtualization list
 help / color / mirror / Atom feed
* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Michael S. Tsirkin @ 2018-06-13 14:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: robh, pawel.moll, Tom Lendacky, cohuck, Ram Pai, linux-kernel,
	virtualization, Christoph Hellwig, joe, Rustad, Mark D,
	Anshuman Khandual, linuxppc-dev, elfring, david
In-Reply-To: <59e60715f27b10bc6816193eaf324824eff69c46.camel@kernel.crashing.org>

On Mon, Jun 11, 2018 at 01:34:50PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-06-11 at 06:28 +0300, Michael S. Tsirkin wrote:
> > 
> > > However if the administrator
> > > ignores/forgets/deliberatey-decides/is-constrained to NOT enable the
> > > flag, virtio will not be able to pass control to the DMA ops associated
> > > with the virtio devices. Which means, we have no opportunity to share
> > > the I/O buffers with the hypervisor/qemu.
> > > 
> > > How do you suggest, we handle this case?
> > 
> > As step 1, ignore it as a user error.
> 
> Ugh ... not again. Ram, don't bring that subject back we ALREADY
> addressed it, and requiring the *user* to do special things is just
> utterly and completely wrong.
> 
> The *user* has no bloody idea what that stuff is, will never know to
> set whatver magic qemu flag etc... The user will just start a a VM
> normally and expect things to work. Requiring the *user* to know things
> like that iommu virtio flag is complete nonsense.

You should consider teaching QEMU that the platform has support for the
ultravisor thing so it can set flags for you.

That's true even if something else is done for virtio - if you don't
have the capability right now you will come to regret it, host userspace
is just fundamentally in charge of control-path enumeration in the KVM
stack, I understand you want to take some of it away for security but
trying to hide things from QEMU completely is a mistake IMHO.

Just my $.02.

> If by "user" you mean libvirt, then you are now requesting about 4 or 5
> different projects to be patched to add speical cases for something
> they know nothing about and is completely irrelevant, while it can be
> entirely addressed with a 1-liner in virtio kernel side to allow the
> arch to plumb alternate DMA ops.
> 
> So for some reason you seem to be dead set on a path that leads to
> mountain of user pain, changes to many different projects and overall
> havok while there is a much much simpler and elegant solution at hand
> which I described (again) in the response to Ram I sent about 5mn ago.

What you sent to Ram sounds OK to me - I only hope we do all mean
the same thing because the 3 paragraphs above are all
about the libvirt/QEMU split mess which linux or virtio
should not really care about.

And I'd like to stress that direct path isn't merely legacy.

It's a common sense approach that when you have a hypervisor doing
things like taking care of cache coherency, you do not want to do them
in the guest as well. And the same guest can use a pass-through device
where it does need to do all the coherency mess, so while it's quite
possible to build a platform-specific way to tell guests which devices
need which kind of treatment, people understandably chose to do it in a
portable way through the virtio device.

I understand you guys are working on a new project that is going to use
bounce buffers anyway so what do you care about optimizations but we
can't just slow everyone else down for your benefit.

> > Further you can for example add per-device quirks in virtio so it can be
> > switched to dma api. make extra decisions in platform code then.

Isn't above exactly what you sent to Ram?

-- 
MST

^ permalink raw reply

* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Michael S. Tsirkin @ 2018-06-13 14:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: robh, pawel.moll, Tom Lendacky, cohuck, Ram Pai, linux-kernel,
	virtualization, Christoph Hellwig, joe, Rustad, Mark D,
	Anshuman Khandual, linuxppc-dev, elfring, david
In-Reply-To: <07b804fccd7373c650be79ac9fa77ae7f2375ced.camel@kernel.crashing.org>

On Mon, Jun 11, 2018 at 01:29:18PM +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2018-06-10 at 19:39 -0700, Ram Pai wrote:
> > 
> > However if the administrator
> > ignores/forgets/deliberatey-decides/is-constrained to NOT enable the
> > flag, virtio will not be able to pass control to the DMA ops associated
> > with the virtio devices. Which means, we have no opportunity to share
> > the I/O buffers with the hypervisor/qemu.
> > 
> > How do you suggest, we handle this case?
> 
> At the risk of repeating myself, let's just do the first pass which is
> to switch virtio over to always using the DMA API in the actual data
> flow code, with a hook at initialization time that replaces the DMA ops
> with some home cooked "direct" ops in the case where the IOMMU flag
> isn't set.

I'm not sure I understand all of the details, will have to
see the patch, but superficially it sounds good to me.

> This will be equivalent to what we have today but avoids having 2
> separate code path all over the driver.
> 
> Then a second stage, I think, is to replace this "hook" so that the
> architecture gets a say in the matter.
> 
> Basically, something like:
> 
> 	arch_virtio_update_dma_ops(pci_dev, qemu_direct_mode).
> 
> IE, virtio would tell the arch whether the "other side" is in fact QEMU
> in a mode that bypasses the IOMMU and is cache coherent with the guest.
> This is our legacy "qemu special" mode. If the performance is
> sufficient we may want to deprecate it over time and have qemu enable
> the iommu by default but we still need it.
> 
> A weak implementation of the above will be provied that just puts in
> the direct ops when qemu_direct_mode is set, and thus provides today's
> behaviour on any arch that doesn't override it. If the flag is not set,
> the ops are left to whatever the arch PCI layer already set.
> 
> This will provide the opportunity for architectures that want to do
> something special, such as in our case, when we want to force even the
> "qemu_direct_mode" to go via bounce buffers, to put our own ops in
> there, while retaining the legacy behaviour otherwise.
> 
> It also means that the "gunk" is entirely localized in that one
> function, the rest of virtio just uses the DMA API normally.
> 
> Christoph, are you actually hacking "stage 1" above already or should
> we produce patches ?
> 
> Cheers,
> Ben.

^ permalink raw reply

* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Michael S. Tsirkin @ 2018-06-13 13:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tom Lendacky, pawel.moll, robh, Benjamin Herrenschmidt, cohuck,
	Ram Pai, linux-kernel, virtualization, joe, Rustad, Mark D,
	Anshuman Khandual, linuxppc-dev, elfring, david
In-Reply-To: <20180613074141.GA12033@infradead.org>

On Wed, Jun 13, 2018 at 12:41:41AM -0700, Christoph Hellwig wrote:
> On Mon, Jun 11, 2018 at 01:29:18PM +1000, Benjamin Herrenschmidt wrote:
> > At the risk of repeating myself, let's just do the first pass which is
> > to switch virtio over to always using the DMA API in the actual data
> > flow code, with a hook at initialization time that replaces the DMA ops
> > with some home cooked "direct" ops in the case where the IOMMU flag
> > isn't set.
> > 
> > This will be equivalent to what we have today but avoids having 2
> > separate code path all over the driver.
> > 
> > Then a second stage, I think, is to replace this "hook" so that the
> > architecture gets a say in the matter.
> 
> I don't think we can actually use dma_direct_ops.  It still allows
> architectures to override parts of the dma setup, which virtio seems
> to blindly assume phys == dma and not cache flushing.
> 
> I think the right way forward is to either add a new
> VIRTIO_F_IS_PCI_DEVICE (or redefine the existing iommu flag if deemed
> possible).

Given this is exactly what happens now, this seems possible, but maybe
we want a non-PCI specific name.

>  And then make sure recent qemu always sets it.

I don't think that part is going to happen, sorry.

Hypervisors can set it when they *actually have* a real PCI device.
People emulate systems which have a bunch of overhead in the DMA API
which is required for real DMA. Your proposal would double that overhead
by first doing it in guest then re-doing it in host.

I don't think it's justified when 99% of the world doesn't need it.
-- 
MST

^ permalink raw reply

* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Michael S. Tsirkin @ 2018-06-13 13:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: robh, pawel.moll, Tom Lendacky, cohuck, Ram Pai, linux-kernel,
	virtualization, joe, Rustad, Mark D, david, linuxppc-dev, elfring,
	Anshuman Khandual
In-Reply-To: <20180608063655.GA32080@infradead.org>

On Thu, Jun 07, 2018 at 11:36:55PM -0700, Christoph Hellwig wrote:
> > This seems to be what was being asked for in this thread,
> > with comments claiming IOMMU flag adds too much overhead.
> 
> Right now it means implementing a virtual iommu, which I agree is
> way too much overhead.

Well not really. The flag in question will have a desired effect without
a virtual iommu.

> > SEV guys report that they just set the iommu flag and then it all works.
> > I guess if there's translation we can think of this as a kind of iommu.
> > Maybe we should rename PLATFORM_IOMMU to PLARTFORM_TRANSLATION?
> 
> VIRTIO_F_BEHAVES_LIKE_A_REAL_PCI_DEVICE_DONT_TRY_TO_OUTSMART_ME
> 
> as said it's not just translations, it is cache coherence as well.

Well it's only for DMA. So maybe PLATFORM_DMA.

I suspect people will then come and complain that they
do *not* want cache coherence tricks because virtio is
running on a CPU, but we'll see.

> > And apparently some people complain that just setting that flag makes
> > qemu check translation on each access with an unacceptable performance
> > overhead.  Forcing same behaviour for everyone on general principles
> > even without the flag is unlikely to make them happy.
> 
> That sounds like a qemu implementation bug.  If qemu knowns that
> guest physiscall == guest dma space there is no need to check.

Possibly. Or it's possible it's all just theoretical, no one
posted any numbers.

-- 
MST

^ permalink raw reply

* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Benjamin Herrenschmidt @ 2018-06-13 13:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: robh, cohuck, pawel.moll, Tom Lendacky, Michael S. Tsirkin,
	Ram Pai, linux-kernel, virtualization, joe, Rustad, Mark D,
	Anshuman Khandual, linuxppc-dev, elfring, david
In-Reply-To: <5dbcafa73b065bc619fd6adc9ef47eb6367b8378.camel@kernel.crashing.org>

On Wed, 2018-06-13 at 22:25 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2018-06-13 at 00:41 -0700, Christoph Hellwig wrote:
> > On Mon, Jun 11, 2018 at 01:29:18PM +1000, Benjamin Herrenschmidt wrote:
> > > At the risk of repeating myself, let's just do the first pass which is
> > > to switch virtio over to always using the DMA API in the actual data
> > > flow code, with a hook at initialization time that replaces the DMA ops
> > > with some home cooked "direct" ops in the case where the IOMMU flag
> > > isn't set.
> > > 
> > > This will be equivalent to what we have today but avoids having 2
> > > separate code path all over the driver.
> > > 
> > > Then a second stage, I think, is to replace this "hook" so that the
> > > architecture gets a say in the matter.
> > 
> > I don't think we can actually use dma_direct_ops.  It still allows
> > architectures to override parts of the dma setup, which virtio seems
> > to blindly assume phys == dma and not cache flushing.
> 
> By direct ops I didn't mean *the* dma_direct_ops but a virtio-local
> variants that effectively reproduces the existing expectations (ie,
> virtio-dma-legacy-ops or something).

Actually ... the stuff in lib/dma-direct.c seems to be just it, no ?

There's no cache flushing and there's no architecture hooks that I can
see other than the AMD security stuff which is probably fine.

Or am I missing something ?

Cheers,
Ben.
> 
> > I think the right way forward is to either add a new
> > VIRTIO_F_IS_PCI_DEVICE (or redefine the existing iommu flag if deemed
> > possible).  And then make sure recent qemu always sets it.
> 
> Ben.

^ permalink raw reply

* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Benjamin Herrenschmidt @ 2018-06-13 12:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: robh, cohuck, pawel.moll, Tom Lendacky, Michael S. Tsirkin,
	Ram Pai, linux-kernel, virtualization, joe, Rustad, Mark D,
	Anshuman Khandual, linuxppc-dev, elfring, david
In-Reply-To: <20180613074141.GA12033@infradead.org>

On Wed, 2018-06-13 at 00:41 -0700, Christoph Hellwig wrote:
> On Mon, Jun 11, 2018 at 01:29:18PM +1000, Benjamin Herrenschmidt wrote:
> > At the risk of repeating myself, let's just do the first pass which is
> > to switch virtio over to always using the DMA API in the actual data
> > flow code, with a hook at initialization time that replaces the DMA ops
> > with some home cooked "direct" ops in the case where the IOMMU flag
> > isn't set.
> > 
> > This will be equivalent to what we have today but avoids having 2
> > separate code path all over the driver.
> > 
> > Then a second stage, I think, is to replace this "hook" so that the
> > architecture gets a say in the matter.
> 
> I don't think we can actually use dma_direct_ops.  It still allows
> architectures to override parts of the dma setup, which virtio seems
> to blindly assume phys == dma and not cache flushing.

By direct ops I didn't mean *the* dma_direct_ops but a virtio-local
variants that effectively reproduces the existing expectations (ie,
virtio-dma-legacy-ops or something).

> I think the right way forward is to either add a new
> VIRTIO_F_IS_PCI_DEVICE (or redefine the existing iommu flag if deemed
> possible).  And then make sure recent qemu always sets it.

Ben.

^ permalink raw reply

* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Christoph Hellwig @ 2018-06-13  7:41 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: robh, cohuck, pawel.moll, Tom Lendacky, Michael S. Tsirkin,
	Ram Pai, linux-kernel, virtualization, Christoph Hellwig, joe,
	Rustad, Mark D, Anshuman Khandual, linuxppc-dev, elfring, david
In-Reply-To: <07b804fccd7373c650be79ac9fa77ae7f2375ced.camel@kernel.crashing.org>

On Mon, Jun 11, 2018 at 01:29:18PM +1000, Benjamin Herrenschmidt wrote:
> At the risk of repeating myself, let's just do the first pass which is
> to switch virtio over to always using the DMA API in the actual data
> flow code, with a hook at initialization time that replaces the DMA ops
> with some home cooked "direct" ops in the case where the IOMMU flag
> isn't set.
> 
> This will be equivalent to what we have today but avoids having 2
> separate code path all over the driver.
> 
> Then a second stage, I think, is to replace this "hook" so that the
> architecture gets a say in the matter.

I don't think we can actually use dma_direct_ops.  It still allows
architectures to override parts of the dma setup, which virtio seems
to blindly assume phys == dma and not cache flushing.

I think the right way forward is to either add a new
VIRTIO_F_IS_PCI_DEVICE (or redefine the existing iommu flag if deemed
possible).  And then make sure recent qemu always sets it.

^ permalink raw reply

* Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Jason Wang @ 2018-06-13  5:40 UTC (permalink / raw)
  To: Samudrala, Sridhar, Michael S. Tsirkin
  Cc: alexander.h.duyck, virtio-dev, qemu-devel, virtualization
In-Reply-To: <c29a5ac9-cfb5-0a19-f28c-30d41bd5efcf@intel.com>



On 2018年06月13日 12:24, Samudrala, Sridhar wrote:
> On 6/12/2018 7:38 PM, Jason Wang wrote:
>>
>>
>> On 2018年06月12日 19:54, Michael S. Tsirkin wrote:
>>> On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote:
>>>>
>>>> On 2018年06月05日 20:33, Michael S. Tsirkin wrote:
>>>>> I don't think this is sufficient.
>>>>>
>>>>> If both primary and standby devices are present, a legacy guest 
>>>>> without
>>>>> support for the feature might see two devices with same mac and get
>>>>> confused.
>>>>>
>>>>> I think that we should only make primary visible after guest acked 
>>>>> the
>>>>> backup feature bit.
>>>> I think we want exactly the reverse? E.g fail the negotiation when 
>>>> guest
>>>> does not ack backup feature.
>>>>
>>>> Otherwise legacy guest won't even have the chance to see primary 
>>>> device in
>>>> the guest.
>>> That's by design.
>>
>> So management needs to know the capability of guest to set the backup 
>> feature. This looks a chicken or egg problem to me.
>
> I don't think so. If the tenant requests 'accelerated datapath 
> feature', the management
> will set 'standby' feature bit on virtio-net interface and if the 
> guest virtio-net driver
> supports this feature, then the tenant VM will get that capability via 
> a hot-plugged
> primary device.

Ok, I thought exactly the reverse because of the commit title is "enable 
virtio_net to act as a standby for a passthru device". But re-read the 
commit log content, I understand the case a little bit. Btw, VF is not 
necessarily faster than virtio-net, especially consider virtio-net may 
have a lot of queues.

>
>
>>
>>>
>>>>> And on reset or when backup is cleared in some other way, unplug the
>>>>> primary.
>>>> What if guest does not support hotplug?
>>> It shouldn't ack the backup feature then and it's a good point.
>>> We should both document this and check kernel config has
>>> hotplug support. Sridhar could you take a look pls?
>>>
>>>>> Something like the below will do the job:
>>>>>
>>>>> Primary device is added with a special "primary-failover" flag.
>>>>> A virtual machine is then initialized with just a standby virtio
>>>>> device. Primary is not yet added.
>>>> A question is how to do the matching? Qemu knows nothing about e.g mac
>>>> address of a pass-through device I believe?
>>> Supply a flag to the VFIO when it's added, this way QEMU will know.
>>>
>>>>> Later QEMU detects that guest driver device set DRIVER_OK.
>>>>> It then exposes the primary device to the guest, and triggers
>>>>> a device addition event (hot-plug event) for it.
>>>> Do you mean we won't have primary device in the initial qemu cli?
>>> No, that's not what I mean.
>>>
>>> I mean management will supply a flag to VFIO and then
>>>
>>>
>>> - VFIO defers exposing
>>> primary to guest until guest acks the feature bit.
>>> - When we see guest ack, initiate hotplug.
>>> - On reboot, hide it again.
>>> - On reset without reboot, request hot-unplug and on eject hide it 
>>> again.
>>
>> This sounds much like a kind of bonding in qemu.
>>
>>>
>>>>> If QEMU detects guest driver removal, it initiates a hot-unplug 
>>>>> sequence
>>>>> to remove the primary driver.  In particular, if QEMU detects guest
>>>>> re-initialization (e.g. by detecting guest reset) it immediately 
>>>>> removes
>>>>> the primary device.
>>>> I believe guest failover module should handle this gracefully?
>>> It can't control enumeration order, if primary is enumerated before
>>> standby then guest will load its driver and it's too late
>>> when failover driver is loaded.
>>
>> Well, even if we can delay the visibility of primary until DRIVER_OK, 
>> there still be a race I think? And it looks to me it's still a bug of 
>> guest:
>>
>> E.g primary could be probed before failover_register() in guest. Then 
>> we will miss the enslaving of primary forever.
>
> That is not an issue. Even if the primary is probed before failover 
> driver, it will
> enslave the primary via the call to failover_existing_slave_register() 
> as part of
> failover_register() routine.

Aha I get it. So the enumeration order is not an issue.

Consider primary may still be seen by guest kernel even if we delay its 
visibility, I wonder whether we can control the lifecycle of primary 
through driver but not qemu. This can simplify a lot of things.

Thanks

>
>>
>> Thanks
>>
>>>
>>>> Thanks
>>>>
>>>>> We can move some of this code to management as well, 
>>>>> architecturally it
>>>>> does not make too much sense but it might be easier 
>>>>> implementation-wise.
>>>>>
>>>>> HTH
>>>>>
>>>>> On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote:
>>>>>> Ping on this patch now that the kernel patches are accepted into 
>>>>>> davem's net-next tree.
>>>>>> https://patchwork.ozlabs.org/cover/920005/
>>>>>>
>>>>>>
>>>>>> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
>>>>>>> This feature bit can be used by hypervisor to indicate 
>>>>>>> virtio_net device to
>>>>>>> act as a standby for another device with the same MAC address.
>>>>>>>
>>>>>>> I tested this with a small change to the patch to mark the 
>>>>>>> STANDBY feature 'true'
>>>>>>> by default as i am using libvirt to start the VMs.
>>>>>>> Is there a way to pass the newly added feature bit 'standby' to 
>>>>>>> qemu via libvirt
>>>>>>> XML file?
>>>>>>>
>>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>>>> ---
>>>>>>>     hw/net/virtio-net.c                         | 2 ++
>>>>>>>     include/standard-headers/linux/virtio_net.h | 3 +++
>>>>>>>     2 files changed, 5 insertions(+)
>>>>>>>
>>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>>> index 90502fca7c..38b3140670 100644
>>>>>>> --- a/hw/net/virtio-net.c
>>>>>>> +++ b/hw/net/virtio-net.c
>>>>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>>>>>>>                          true),
>>>>>>>         DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, 
>>>>>>> SPEED_UNKNOWN),
>>>>>>>         DEFINE_PROP_STRING("duplex", VirtIONet, 
>>>>>>> net_conf.duplex_str),
>>>>>>> +    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, 
>>>>>>> VIRTIO_NET_F_STANDBY,
>>>>>>> +                      false),
>>>>>>>         DEFINE_PROP_END_OF_LIST(),
>>>>>>>     };
>>>>>>> diff --git a/include/standard-headers/linux/virtio_net.h 
>>>>>>> b/include/standard-headers/linux/virtio_net.h
>>>>>>> index e9f255ea3f..01ec09684c 100644
>>>>>>> --- a/include/standard-headers/linux/virtio_net.h
>>>>>>> +++ b/include/standard-headers/linux/virtio_net.h
>>>>>>> @@ -57,6 +57,9 @@
>>>>>>>                          * Steering */
>>>>>>>     #define VIRTIO_NET_F_CTRL_MAC_ADDR 23    /* Set MAC address */
>>>>>>> +#define VIRTIO_NET_F_STANDBY      62    /* Act as standby for 
>>>>>>> another device
>>>>>>> +                                         * with the same MAC.
>>>>>>> +                                         */
>>>>>>>     #define VIRTIO_NET_F_SPEED_DUPLEX 63    /* Device set 
>>>>>>> linkspeed and duplex */
>>>>>>>     #ifndef VIRTIO_NET_NO_LEGACY
>>
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Samudrala, Sridhar @ 2018-06-13  4:24 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: alexander.h.duyck, virtio-dev, qemu-devel, virtualization
In-Reply-To: <5718de66-74c9-1362-a87b-2eba02475b48@redhat.com>

On 6/12/2018 7:38 PM, Jason Wang wrote:
>
>
> On 2018年06月12日 19:54, Michael S. Tsirkin wrote:
>> On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote:
>>>
>>> On 2018年06月05日 20:33, Michael S. Tsirkin wrote:
>>>> I don't think this is sufficient.
>>>>
>>>> If both primary and standby devices are present, a legacy guest 
>>>> without
>>>> support for the feature might see two devices with same mac and get
>>>> confused.
>>>>
>>>> I think that we should only make primary visible after guest acked the
>>>> backup feature bit.
>>> I think we want exactly the reverse? E.g fail the negotiation when 
>>> guest
>>> does not ack backup feature.
>>>
>>> Otherwise legacy guest won't even have the chance to see primary 
>>> device in
>>> the guest.
>> That's by design.
>
> So management needs to know the capability of guest to set the backup 
> feature. This looks a chicken or egg problem to me.

I don't think so. If the tenant requests 'accelerated datapath feature', the management
will set 'standby' feature bit on virtio-net interface and if the guest virtio-net driver
supports this feature, then the tenant VM will get that capability via a hot-plugged
primary device.


>
>>
>>>> And on reset or when backup is cleared in some other way, unplug the
>>>> primary.
>>> What if guest does not support hotplug?
>> It shouldn't ack the backup feature then and it's a good point.
>> We should both document this and check kernel config has
>> hotplug support. Sridhar could you take a look pls?
>>
>>>> Something like the below will do the job:
>>>>
>>>> Primary device is added with a special "primary-failover" flag.
>>>> A virtual machine is then initialized with just a standby virtio
>>>> device. Primary is not yet added.
>>> A question is how to do the matching? Qemu knows nothing about e.g mac
>>> address of a pass-through device I believe?
>> Supply a flag to the VFIO when it's added, this way QEMU will know.
>>
>>>> Later QEMU detects that guest driver device set DRIVER_OK.
>>>> It then exposes the primary device to the guest, and triggers
>>>> a device addition event (hot-plug event) for it.
>>> Do you mean we won't have primary device in the initial qemu cli?
>> No, that's not what I mean.
>>
>> I mean management will supply a flag to VFIO and then
>>
>>
>> - VFIO defers exposing
>> primary to guest until guest acks the feature bit.
>> - When we see guest ack, initiate hotplug.
>> - On reboot, hide it again.
>> - On reset without reboot, request hot-unplug and on eject hide it 
>> again.
>
> This sounds much like a kind of bonding in qemu.
>
>>
>>>> If QEMU detects guest driver removal, it initiates a hot-unplug 
>>>> sequence
>>>> to remove the primary driver.  In particular, if QEMU detects guest
>>>> re-initialization (e.g. by detecting guest reset) it immediately 
>>>> removes
>>>> the primary device.
>>> I believe guest failover module should handle this gracefully?
>> It can't control enumeration order, if primary is enumerated before
>> standby then guest will load its driver and it's too late
>> when failover driver is loaded.
>
> Well, even if we can delay the visibility of primary until DRIVER_OK, 
> there still be a race I think? And it looks to me it's still a bug of 
> guest:
>
> E.g primary could be probed before failover_register() in guest. Then 
> we will miss the enslaving of primary forever.

That is not an issue. Even if the primary is probed before failover driver, it will
enslave the primary via the call to failover_existing_slave_register() as part of
failover_register() routine.

>
> Thanks
>
>>
>>> Thanks
>>>
>>>> We can move some of this code to management as well, 
>>>> architecturally it
>>>> does not make too much sense but it might be easier 
>>>> implementation-wise.
>>>>
>>>> HTH
>>>>
>>>> On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote:
>>>>> Ping on this patch now that the kernel patches are accepted into 
>>>>> davem's net-next tree.
>>>>> https://patchwork.ozlabs.org/cover/920005/
>>>>>
>>>>>
>>>>> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
>>>>>> This feature bit can be used by hypervisor to indicate virtio_net 
>>>>>> device to
>>>>>> act as a standby for another device with the same MAC address.
>>>>>>
>>>>>> I tested this with a small change to the patch to mark the 
>>>>>> STANDBY feature 'true'
>>>>>> by default as i am using libvirt to start the VMs.
>>>>>> Is there a way to pass the newly added feature bit 'standby' to 
>>>>>> qemu via libvirt
>>>>>> XML file?
>>>>>>
>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>>> ---
>>>>>>     hw/net/virtio-net.c                         | 2 ++
>>>>>>     include/standard-headers/linux/virtio_net.h | 3 +++
>>>>>>     2 files changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>> index 90502fca7c..38b3140670 100644
>>>>>> --- a/hw/net/virtio-net.c
>>>>>> +++ b/hw/net/virtio-net.c
>>>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>>>>>>                          true),
>>>>>>         DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, 
>>>>>> SPEED_UNKNOWN),
>>>>>>         DEFINE_PROP_STRING("duplex", VirtIONet, 
>>>>>> net_conf.duplex_str),
>>>>>> +    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, 
>>>>>> VIRTIO_NET_F_STANDBY,
>>>>>> +                      false),
>>>>>>         DEFINE_PROP_END_OF_LIST(),
>>>>>>     };
>>>>>> diff --git a/include/standard-headers/linux/virtio_net.h 
>>>>>> b/include/standard-headers/linux/virtio_net.h
>>>>>> index e9f255ea3f..01ec09684c 100644
>>>>>> --- a/include/standard-headers/linux/virtio_net.h
>>>>>> +++ b/include/standard-headers/linux/virtio_net.h
>>>>>> @@ -57,6 +57,9 @@
>>>>>>                          * Steering */
>>>>>>     #define VIRTIO_NET_F_CTRL_MAC_ADDR 23    /* Set MAC address */
>>>>>> +#define VIRTIO_NET_F_STANDBY      62    /* Act as standby for 
>>>>>> another device
>>>>>> +                                         * with the same MAC.
>>>>>> +                                         */
>>>>>>     #define VIRTIO_NET_F_SPEED_DUPLEX 63    /* Device set 
>>>>>> linkspeed and duplex */
>>>>>>     #ifndef VIRTIO_NET_NO_LEGACY
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Jason Wang @ 2018-06-13  2:41 UTC (permalink / raw)
  To: Samudrala, Sridhar, Michael S. Tsirkin
  Cc: alexander.h.duyck, virtio-dev, virtualization, qemu-devel
In-Reply-To: <7aa6dbe7-bac2-96f6-d96c-f51e5173c556@intel.com>



On 2018年06月13日 08:20, Samudrala, Sridhar wrote:
> On 6/12/2018 4:54 AM, Michael S. Tsirkin wrote:
>> On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote:
>>>
>>> On 2018年06月05日 20:33, Michael S. Tsirkin wrote:
>>>> I don't think this is sufficient.
>>>>
>>>> If both primary and standby devices are present, a legacy guest 
>>>> without
>>>> support for the feature might see two devices with same mac and get
>>>> confused.
>>>>
>>>> I think that we should only make primary visible after guest acked the
>>>> backup feature bit.
>>> I think we want exactly the reverse? E.g fail the negotiation when 
>>> guest
>>> does not ack backup feature.
>>>
>>> Otherwise legacy guest won't even have the chance to see primary 
>>> device in
>>> the guest.
>> That's by design.
>>
>>>> And on reset or when backup is cleared in some other way, unplug the
>>>> primary.
>>> What if guest does not support hotplug?
>> It shouldn't ack the backup feature then and it's a good point.
>> We should both document this and check kernel config has
>> hotplug support. Sridhar could you take a look pls?
>
> Right now we select NET_FAILOVER when virtio-net is enabled, Should we 
> select
> PCI_HOTPLUG when NET_FAILOVER is enabled? 

Maybe I was wrong but it looks to me PCI_HOTPLUG is no sufficient since 
it depends on other to work e.g HOTPLUG_PCI_ACPI.

Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Jason Wang @ 2018-06-13  2:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, virtio-dev, Samudrala, Sridhar, qemu-devel,
	virtualization
In-Reply-To: <20180612144743-mutt-send-email-mst@kernel.org>



On 2018年06月12日 19:54, Michael S. Tsirkin wrote:
> On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote:
>>
>> On 2018年06月05日 20:33, Michael S. Tsirkin wrote:
>>> I don't think this is sufficient.
>>>
>>> If both primary and standby devices are present, a legacy guest without
>>> support for the feature might see two devices with same mac and get
>>> confused.
>>>
>>> I think that we should only make primary visible after guest acked the
>>> backup feature bit.
>> I think we want exactly the reverse? E.g fail the negotiation when guest
>> does not ack backup feature.
>>
>> Otherwise legacy guest won't even have the chance to see primary device in
>> the guest.
> That's by design.

So management needs to know the capability of guest to set the backup 
feature. This looks a chicken or egg problem to me.

>
>>> And on reset or when backup is cleared in some other way, unplug the
>>> primary.
>> What if guest does not support hotplug?
> It shouldn't ack the backup feature then and it's a good point.
> We should both document this and check kernel config has
> hotplug support. Sridhar could you take a look pls?
>
>>> Something like the below will do the job:
>>>
>>> Primary device is added with a special "primary-failover" flag.
>>> A virtual machine is then initialized with just a standby virtio
>>> device. Primary is not yet added.
>> A question is how to do the matching? Qemu knows nothing about e.g mac
>> address of a pass-through device I believe?
> Supply a flag to the VFIO when it's added, this way QEMU will know.
>
>>> Later QEMU detects that guest driver device set DRIVER_OK.
>>> It then exposes the primary device to the guest, and triggers
>>> a device addition event (hot-plug event) for it.
>> Do you mean we won't have primary device in the initial qemu cli?
> No, that's not what I mean.
>
> I mean management will supply a flag to VFIO and then
>
>
> - VFIO defers exposing
> primary to guest until guest acks the feature bit.
> - When we see guest ack, initiate hotplug.
> - On reboot, hide it again.
> - On reset without reboot, request hot-unplug and on eject hide it again.

This sounds much like a kind of bonding in qemu.

>
>>> If QEMU detects guest driver removal, it initiates a hot-unplug sequence
>>> to remove the primary driver.  In particular, if QEMU detects guest
>>> re-initialization (e.g. by detecting guest reset) it immediately removes
>>> the primary device.
>> I believe guest failover module should handle this gracefully?
> It can't control enumeration order, if primary is enumerated before
> standby then guest will load its driver and it's too late
> when failover driver is loaded.

Well, even if we can delay the visibility of primary until DRIVER_OK, 
there still be a race I think? And it looks to me it's still a bug of guest:

E.g primary could be probed before failover_register() in guest. Then we 
will miss the enslaving of primary forever.

Thanks

>
>> Thanks
>>
>>> We can move some of this code to management as well, architecturally it
>>> does not make too much sense but it might be easier implementation-wise.
>>>
>>> HTH
>>>
>>> On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote:
>>>> Ping on this patch now that the kernel patches are accepted into davem's net-next tree.
>>>> https://patchwork.ozlabs.org/cover/920005/
>>>>
>>>>
>>>> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
>>>>> This feature bit can be used by hypervisor to indicate virtio_net device to
>>>>> act as a standby for another device with the same MAC address.
>>>>>
>>>>> I tested this with a small change to the patch to mark the STANDBY feature 'true'
>>>>> by default as i am using libvirt to start the VMs.
>>>>> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
>>>>> XML file?
>>>>>
>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>> ---
>>>>>     hw/net/virtio-net.c                         | 2 ++
>>>>>     include/standard-headers/linux/virtio_net.h | 3 +++
>>>>>     2 files changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>> index 90502fca7c..38b3140670 100644
>>>>> --- a/hw/net/virtio-net.c
>>>>> +++ b/hw/net/virtio-net.c
>>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>>>>>                          true),
>>>>>         DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>>>>>         DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>>>>> +    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
>>>>> +                      false),
>>>>>         DEFINE_PROP_END_OF_LIST(),
>>>>>     };
>>>>> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
>>>>> index e9f255ea3f..01ec09684c 100644
>>>>> --- a/include/standard-headers/linux/virtio_net.h
>>>>> +++ b/include/standard-headers/linux/virtio_net.h
>>>>> @@ -57,6 +57,9 @@
>>>>>     					 * Steering */
>>>>>     #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
>>>>> +#define VIRTIO_NET_F_STANDBY      62    /* Act as standby for another device
>>>>> +                                         * with the same MAC.
>>>>> +                                         */
>>>>>     #define VIRTIO_NET_F_SPEED_DUPLEX 63	/* Device set linkspeed and duplex */
>>>>>     #ifndef VIRTIO_NET_NO_LEGACY

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Samudrala, Sridhar @ 2018-06-13  0:20 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: alexander.h.duyck, virtio-dev, qemu-devel, virtualization
In-Reply-To: <20180612144743-mutt-send-email-mst@kernel.org>

On 6/12/2018 4:54 AM, Michael S. Tsirkin wrote:
> On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote:
>>
>> On 2018年06月05日 20:33, Michael S. Tsirkin wrote:
>>> I don't think this is sufficient.
>>>
>>> If both primary and standby devices are present, a legacy guest without
>>> support for the feature might see two devices with same mac and get
>>> confused.
>>>
>>> I think that we should only make primary visible after guest acked the
>>> backup feature bit.
>> I think we want exactly the reverse? E.g fail the negotiation when guest
>> does not ack backup feature.
>>
>> Otherwise legacy guest won't even have the chance to see primary device in
>> the guest.
> That's by design.
>
>>> And on reset or when backup is cleared in some other way, unplug the
>>> primary.
>> What if guest does not support hotplug?
> It shouldn't ack the backup feature then and it's a good point.
> We should both document this and check kernel config has
> hotplug support. Sridhar could you take a look pls?

Right now we select NET_FAILOVER when virtio-net is enabled,  Should we select
PCI_HOTPLUG when NET_FAILOVER is enabled?

>
>>> Something like the below will do the job:
>>>
>>> Primary device is added with a special "primary-failover" flag.
>>> A virtual machine is then initialized with just a standby virtio
>>> device. Primary is not yet added.
>> A question is how to do the matching? Qemu knows nothing about e.g mac
>> address of a pass-through device I believe?
> Supply a flag to the VFIO when it's added, this way QEMU will know.
>
>>> Later QEMU detects that guest driver device set DRIVER_OK.
>>> It then exposes the primary device to the guest, and triggers
>>> a device addition event (hot-plug event) for it.
>> Do you mean we won't have primary device in the initial qemu cli?
> No, that's not what I mean.
>
> I mean management will supply a flag to VFIO and then
>
>
> - VFIO defers exposing
> primary to guest until guest acks the feature bit.
> - When we see guest ack, initiate hotplug.
> - On reboot, hide it again.
> - On reset without reboot, request hot-unplug and on eject hide it again.
>
>
>>> If QEMU detects guest driver removal, it initiates a hot-unplug sequence
>>> to remove the primary driver.  In particular, if QEMU detects guest
>>> re-initialization (e.g. by detecting guest reset) it immediately removes
>>> the primary device.
>> I believe guest failover module should handle this gracefully?
> It can't control enumeration order, if primary is enumerated before
> standby then guest will load its driver and it's too late
> when failover driver is loaded.

Does it mean that if guest unloads virtio-net driver, Qemu should automatically
unplug the associated primary device?

What about guest unloading/re-loading primary device driver?


>
>> Thanks
>>
>>> We can move some of this code to management as well, architecturally it
>>> does not make too much sense but it might be easier implementation-wise.
>>>
>>> HTH
>>>
>>> On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote:
>>>> Ping on this patch now that the kernel patches are accepted into davem's net-next tree.
>>>> https://patchwork.ozlabs.org/cover/920005/
>>>>
>>>>
>>>> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
>>>>> This feature bit can be used by hypervisor to indicate virtio_net device to
>>>>> act as a standby for another device with the same MAC address.
>>>>>
>>>>> I tested this with a small change to the patch to mark the STANDBY feature 'true'
>>>>> by default as i am using libvirt to start the VMs.
>>>>> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
>>>>> XML file?
>>>>>
>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>> ---
>>>>>     hw/net/virtio-net.c                         | 2 ++
>>>>>     include/standard-headers/linux/virtio_net.h | 3 +++
>>>>>     2 files changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>> index 90502fca7c..38b3140670 100644
>>>>> --- a/hw/net/virtio-net.c
>>>>> +++ b/hw/net/virtio-net.c
>>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>>>>>                          true),
>>>>>         DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>>>>>         DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>>>>> +    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
>>>>> +                      false),
>>>>>         DEFINE_PROP_END_OF_LIST(),
>>>>>     };
>>>>> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
>>>>> index e9f255ea3f..01ec09684c 100644
>>>>> --- a/include/standard-headers/linux/virtio_net.h
>>>>> +++ b/include/standard-headers/linux/virtio_net.h
>>>>> @@ -57,6 +57,9 @@
>>>>>     					 * Steering */
>>>>>     #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
>>>>> +#define VIRTIO_NET_F_STANDBY      62    /* Act as standby for another device
>>>>> +                                         * with the same MAC.
>>>>> +                                         */
>>>>>     #define VIRTIO_NET_F_SPEED_DUPLEX 63	/* Device set linkspeed and duplex */
>>>>>     #ifndef VIRTIO_NET_NO_LEGACY

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Samudrala, Sridhar @ 2018-06-13  0:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, virtio-dev, aaron.f.brown, jiri, kubakici,
	netdev, qemu-devel, loseweigh, virtualization
In-Reply-To: <20180612142557-mutt-send-email-mst@kernel.org>

On 6/12/2018 4:34 AM, Michael S. Tsirkin wrote:
> On Mon, Jun 11, 2018 at 10:02:45PM -0700, Samudrala, Sridhar wrote:
>> On 6/11/2018 7:17 PM, Michael S. Tsirkin wrote:
>>> On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote:
>>>> On 2018年06月12日 01:26, Michael S. Tsirkin wrote:
>>>>> On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote:
>>>>>> This feature bit can be used by hypervisor to indicate virtio_net device to
>>>>>> act as a standby for another device with the same MAC address.
>>>>>>
>>>>>> I tested this with a small change to the patch to mark the STANDBY feature 'true'
>>>>>> by default as i am using libvirt to start the VMs.
>>>>>> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
>>>>>> XML file?
>>>>>>
>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>> So I do not think we can commit to this interface: we
>>>>> really need to control visibility of the primary device.
>>>> The problem is legacy guest won't use primary device at all if we do this.
>>> And that's by design - I think it's the only way to ensure the
>>> legacy guest isn't confused.
>> Yes. I think so. But i am not sure if Qemu is the right place to control the visibility
>> of the primary device. The primary device may not be specified as an argument to Qemu. It
>> may be plugged in later.
>> The cloud service provider is providing a feature that enables low latency datapath and live
>> migration capability.
>> A tenant can use this feature only if he is running a VM that has virtio-net with failover support.
> Well live migration is there already. The new feature is low latency
> data path.

we get live migration with just virtio.  But I meant live migration with VF as
primary device.

>
> And it's the guest that needs failover support not the VM.

Isn't guest and VM synonymous?


>
>
>> I think Qemu should check if guest virtio-net supports this feature and provide a mechanism for
>> an upper layer indicating if the STANDBY feature is successfully negotiated or not.
>> The upper layer can then decide if it should hot plug a VF with the same MAC and manage the 2 links.
>> If VF is successfully hot plugged, virtio-net link should be disabled.
> Did you even talk to upper layer management about it?
> Just list the steps they need to do and you will see
> that's a lot of machinery to manage by the upper layer.
>
> What do we gain in flexibility? As far as I can see the
> only gain is some resources saved for legacy VMs.
>
> That's not a lot as tenant of the upper layer probably already has
> at least a hunch that it's a new guest otherwise
> why bother specifying the feature at all - you
> save even more resources without it.
>

I am not all that familiar with how Qemu manages network devices. If we can do all the
required management of the primary/standby devices within Qemu, that is definitely a better
approach without upper layer involvement.


>
>
>>>> How about control the visibility of standby device?
>>>>
>>>> Thanks
>>> standy the always there to guarantee no downtime.
>>>
>>>>> However just for testing purposes, we could add a non-stable
>>>>> interface "x-standby" with the understanding that as any
>>>>> x- prefix it's unstable and will be changed down the road,
>>>>> likely in the next release.
>>>>>
>>>>>
>>>>>> ---
>>>>>>     hw/net/virtio-net.c                         | 2 ++
>>>>>>     include/standard-headers/linux/virtio_net.h | 3 +++
>>>>>>     2 files changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>> index 90502fca7c..38b3140670 100644
>>>>>> --- a/hw/net/virtio-net.c
>>>>>> +++ b/hw/net/virtio-net.c
>>>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>>>>>>                          true),
>>>>>>         DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>>>>>>         DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>>>>>> +    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
>>>>>> +                      false),
>>>>>>         DEFINE_PROP_END_OF_LIST(),
>>>>>>     };
>>>>>> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
>>>>>> index e9f255ea3f..01ec09684c 100644
>>>>>> --- a/include/standard-headers/linux/virtio_net.h
>>>>>> +++ b/include/standard-headers/linux/virtio_net.h
>>>>>> @@ -57,6 +57,9 @@
>>>>>>     					 * Steering */
>>>>>>     #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
>>>>>> +#define VIRTIO_NET_F_STANDBY      62    /* Act as standby for another device
>>>>>> +                                         * with the same MAC.
>>>>>> +                                         */
>>>>>>     #define VIRTIO_NET_F_SPEED_DUPLEX 63	/* Device set linkspeed and duplex */
>>>>>>     #ifndef VIRTIO_NET_NO_LEGACY
>>>>>> -- 
>>>>>> 2.14.3
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v4 1/3] compiler-gcc.h: add gnu_inline to all inline declarations
From: H. Peter Anvin @ 2018-06-12 21:31 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Kate Stewart, linux-efi, brijesh.singh, J. Kiszka, Will Deacon,
	virtualization, Masahiro Yamada, Manoj Gupta, boris.ostrovsky,
	mawilcox, x86, akataria, Greg Hackmann, mingo, Alistair Strachan,
	David Rientjes, geert, Arnd Bergmann, Linux Kbuild mailing list,
	Philippe Ombredanne, rostedt, acme, Andrey Ryabinin, sedat.dilek,
	Thomas Gleixner, Michal Marek, tstellar
In-Reply-To: <CAKwvOdkeEqb0HnHoggpwmHiKtDo06VN0N_r9ffv0rzigNLvXUg@mail.gmail.com>

On 06/12/18 13:19, Nick Desaulniers wrote:
> 
> Oh, by [0] __GCC_STDC_INLINE__ is indeed actually the correct symbol
> to check for.  Clang does support that, so nothing to fix there.
> 
>> By the way, you should check clang against gcc's predefined macros by doing:
>>
>> gcc [options] -x c -Wp,-dM -E /dev/null | sort
>>
>> Options can change the predefined macros substantially, especially the, -std=, arch and -O options. -x c can be replaced with e.g. -x c++, objective-c, assembler-with-cpp etc.
> 
> Neat, I'll have to bookmark that incantation.  I can s/gcc/clang/ to
> get a similar list (which is how I know it supports
> __GCC_STDC_INLINE__).>

I bet that if you add -fgnu89-inlines then it *does* define
__GNUC_GNU_INLINE__.

> 
> Patch now becomes something like:
> 
> #ifdef __GNUC_GNU_INLINE__
> #define __gnu_inline __attribute__((gnu_inline))
> #else
> #define __gnu_inline
> #endif
> 
> #define inline inline __attribute__((always_inline, unused)) notrace
> __gnu_inline
> ...
> 
> Issues with that approach?
> 

s/__GNUC_GNU_INLINE__/__GNUC_STDC_INLINE__/

	-hpa

^ permalink raw reply

* Re: [PATCH 0/3] Use sbitmap instead of percpu_ida
From: Jens Axboe @ 2018-06-12 20:40 UTC (permalink / raw)
  To: Matthew Wilcox, linux-kernel, linux-scsi, target-devel,
	linux1394-devel, linux-usb, kvm, virtualization, netdev,
	Juergen Gross, qla2xxx-upstream, Kent Overstreet
In-Reply-To: <20180612190545.10781-1-willy@infradead.org>

On 6/12/18 1:05 PM, Matthew Wilcox wrote:
> Removing the percpu_ida code nets over 400 lines of removal.  It's not
> as spectacular as deleting an entire architecture, but it's still a
> worthy reduction in lines of code.
> 
> Untested due to lack of hardware and not understanding how to set up a
> target platform.
> 
> Changes from v1:
>  - Fixed bugs pointed out by Jens in iscsit_wait_for_tag()
>  - Abstracted out tag freeing as requested by Bart
>  - Made iscsit_wait_for_tag static as pointed out by 0day

Reviewed-by: Jens Axboe <axboe@kernel.dk>

-- 
Jens Axboe

^ permalink raw reply

* Re: [PATCH 0/3] Use sbitmap instead of percpu_ida
From: Bart Van Assche @ 2018-06-12 20:06 UTC (permalink / raw)
  To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-usb@vger.kernel.org, willy@infradead.org,
	virtualization@lists.linux-foundation.org,
	kent.overstreet@gmail.com, linux1394-devel@lists.sourceforge.net,
	jgross@suse.com, axboe@kernel.dk, linux-scsi@vger.kernel.org,
	qla2xxx-upstream@qlogic.com, target-devel@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20180612190545.10781-1-willy@infradead.org>

On Tue, 2018-06-12 at 12:05 -0700, Matthew Wilcox wrote:
> Removing the percpu_ida code nets over 400 lines of removal.  It's not
> as spectacular as deleting an entire architecture, but it's still a
> worthy reduction in lines of code.
> 
> Untested due to lack of hardware and not understanding how to set up a
> target platform.
> 
> Changes from v1:
>  - Fixed bugs pointed out by Jens in iscsit_wait_for_tag()
>  - Abstracted out tag freeing as requested by Bart
>  - Made iscsit_wait_for_tag static as pointed out by 0day

For the whole series:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>

Thanks,

Bart.

^ permalink raw reply

* [PATCH 3/3] Remove percpu_ida
From: Matthew Wilcox @ 2018-06-12 19:05 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, target-devel, linux1394-devel,
	linux-usb, kvm, virtualization, netdev, Juergen Gross,
	qla2xxx-upstream, Kent Overstreet, Jens Axboe
  Cc: Matthew Wilcox
In-Reply-To: <20180612190545.10781-1-willy@infradead.org>

With its one user gone, remove the library code.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
---
 include/linux/percpu_ida.h |  83 ---------
 lib/Makefile               |   2 +-
 lib/percpu_ida.c           | 370 -------------------------------------
 3 files changed, 1 insertion(+), 454 deletions(-)
 delete mode 100644 include/linux/percpu_ida.h
 delete mode 100644 lib/percpu_ida.c

diff --git a/include/linux/percpu_ida.h b/include/linux/percpu_ida.h
deleted file mode 100644
index 07d78e4653bc..000000000000
--- a/include/linux/percpu_ida.h
+++ /dev/null
@@ -1,83 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __PERCPU_IDA_H__
-#define __PERCPU_IDA_H__
-
-#include <linux/types.h>
-#include <linux/bitops.h>
-#include <linux/init.h>
-#include <linux/sched.h>
-#include <linux/spinlock_types.h>
-#include <linux/wait.h>
-#include <linux/cpumask.h>
-
-struct percpu_ida_cpu;
-
-struct percpu_ida {
-	/*
-	 * number of tags available to be allocated, as passed to
-	 * percpu_ida_init()
-	 */
-	unsigned			nr_tags;
-	unsigned			percpu_max_size;
-	unsigned			percpu_batch_size;
-
-	struct percpu_ida_cpu __percpu	*tag_cpu;
-
-	/*
-	 * Bitmap of cpus that (may) have tags on their percpu freelists:
-	 * steal_tags() uses this to decide when to steal tags, and which cpus
-	 * to try stealing from.
-	 *
-	 * It's ok for a freelist to be empty when its bit is set - steal_tags()
-	 * will just keep looking - but the bitmap _must_ be set whenever a
-	 * percpu freelist does have tags.
-	 */
-	cpumask_t			cpus_have_tags;
-
-	struct {
-		spinlock_t		lock;
-		/*
-		 * When we go to steal tags from another cpu (see steal_tags()),
-		 * we want to pick a cpu at random. Cycling through them every
-		 * time we steal is a bit easier and more or less equivalent:
-		 */
-		unsigned		cpu_last_stolen;
-
-		/* For sleeping on allocation failure */
-		wait_queue_head_t	wait;
-
-		/*
-		 * Global freelist - it's a stack where nr_free points to the
-		 * top
-		 */
-		unsigned		nr_free;
-		unsigned		*freelist;
-	} ____cacheline_aligned_in_smp;
-};
-
-/*
- * Number of tags we move between the percpu freelist and the global freelist at
- * a time
- */
-#define IDA_DEFAULT_PCPU_BATCH_MOVE	32U
-/* Max size of percpu freelist, */
-#define IDA_DEFAULT_PCPU_SIZE	((IDA_DEFAULT_PCPU_BATCH_MOVE * 3) / 2)
-
-int percpu_ida_alloc(struct percpu_ida *pool, int state);
-void percpu_ida_free(struct percpu_ida *pool, unsigned tag);
-
-void percpu_ida_destroy(struct percpu_ida *pool);
-int __percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags,
-	unsigned long max_size, unsigned long batch_size);
-static inline int percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags)
-{
-	return __percpu_ida_init(pool, nr_tags, IDA_DEFAULT_PCPU_SIZE,
-		IDA_DEFAULT_PCPU_BATCH_MOVE);
-}
-
-typedef int (*percpu_ida_cb)(unsigned, void *);
-int percpu_ida_for_each_free(struct percpu_ida *pool, percpu_ida_cb fn,
-	void *data);
-
-unsigned percpu_ida_free_tags(struct percpu_ida *pool, int cpu);
-#endif /* __PERCPU_IDA_H__ */
diff --git a/lib/Makefile b/lib/Makefile
index 84c6dcb31fbb..f4722a7fa62c 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -40,7 +40,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
 	 bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
 	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
-	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
+	 percpu-refcount.o rhashtable.o reciprocal_div.o \
 	 once.o refcount.o usercopy.o errseq.o bucket_locks.o
 obj-$(CONFIG_STRING_SELFTEST) += test_string.o
 obj-y += string_helpers.o
diff --git a/lib/percpu_ida.c b/lib/percpu_ida.c
deleted file mode 100644
index 9bbd9c5d375a..000000000000
--- a/lib/percpu_ida.c
+++ /dev/null
@@ -1,370 +0,0 @@
-/*
- * Percpu IDA library
- *
- * Copyright (C) 2013 Datera, Inc. Kent Overstreet
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2, or (at
- * your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- */
-
-#include <linux/mm.h>
-#include <linux/bitmap.h>
-#include <linux/bitops.h>
-#include <linux/bug.h>
-#include <linux/err.h>
-#include <linux/export.h>
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/percpu.h>
-#include <linux/sched/signal.h>
-#include <linux/string.h>
-#include <linux/spinlock.h>
-#include <linux/percpu_ida.h>
-
-struct percpu_ida_cpu {
-	/*
-	 * Even though this is percpu, we need a lock for tag stealing by remote
-	 * CPUs:
-	 */
-	spinlock_t			lock;
-
-	/* nr_free/freelist form a stack of free IDs */
-	unsigned			nr_free;
-	unsigned			freelist[];
-};
-
-static inline void move_tags(unsigned *dst, unsigned *dst_nr,
-			     unsigned *src, unsigned *src_nr,
-			     unsigned nr)
-{
-	*src_nr -= nr;
-	memcpy(dst + *dst_nr, src + *src_nr, sizeof(unsigned) * nr);
-	*dst_nr += nr;
-}
-
-/*
- * Try to steal tags from a remote cpu's percpu freelist.
- *
- * We first check how many percpu freelists have tags
- *
- * Then we iterate through the cpus until we find some tags - we don't attempt
- * to find the "best" cpu to steal from, to keep cacheline bouncing to a
- * minimum.
- */
-static inline void steal_tags(struct percpu_ida *pool,
-			      struct percpu_ida_cpu *tags)
-{
-	unsigned cpus_have_tags, cpu = pool->cpu_last_stolen;
-	struct percpu_ida_cpu *remote;
-
-	for (cpus_have_tags = cpumask_weight(&pool->cpus_have_tags);
-	     cpus_have_tags; cpus_have_tags--) {
-		cpu = cpumask_next(cpu, &pool->cpus_have_tags);
-
-		if (cpu >= nr_cpu_ids) {
-			cpu = cpumask_first(&pool->cpus_have_tags);
-			if (cpu >= nr_cpu_ids)
-				BUG();
-		}
-
-		pool->cpu_last_stolen = cpu;
-		remote = per_cpu_ptr(pool->tag_cpu, cpu);
-
-		cpumask_clear_cpu(cpu, &pool->cpus_have_tags);
-
-		if (remote == tags)
-			continue;
-
-		spin_lock(&remote->lock);
-
-		if (remote->nr_free) {
-			memcpy(tags->freelist,
-			       remote->freelist,
-			       sizeof(unsigned) * remote->nr_free);
-
-			tags->nr_free = remote->nr_free;
-			remote->nr_free = 0;
-		}
-
-		spin_unlock(&remote->lock);
-
-		if (tags->nr_free)
-			break;
-	}
-}
-
-/*
- * Pop up to IDA_PCPU_BATCH_MOVE IDs off the global freelist, and push them onto
- * our percpu freelist:
- */
-static inline void alloc_global_tags(struct percpu_ida *pool,
-				     struct percpu_ida_cpu *tags)
-{
-	move_tags(tags->freelist, &tags->nr_free,
-		  pool->freelist, &pool->nr_free,
-		  min(pool->nr_free, pool->percpu_batch_size));
-}
-
-/**
- * percpu_ida_alloc - allocate a tag
- * @pool: pool to allocate from
- * @state: task state for prepare_to_wait
- *
- * Returns a tag - an integer in the range [0..nr_tags) (passed to
- * tag_pool_init()), or otherwise -ENOSPC on allocation failure.
- *
- * Safe to be called from interrupt context (assuming it isn't passed
- * TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, of course).
- *
- * @gfp indicates whether or not to wait until a free id is available (it's not
- * used for internal memory allocations); thus if passed __GFP_RECLAIM we may sleep
- * however long it takes until another thread frees an id (same semantics as a
- * mempool).
- *
- * Will not fail if passed TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE.
- */
-int percpu_ida_alloc(struct percpu_ida *pool, int state)
-{
-	DEFINE_WAIT(wait);
-	struct percpu_ida_cpu *tags;
-	unsigned long flags;
-	int tag = -ENOSPC;
-
-	tags = raw_cpu_ptr(pool->tag_cpu);
-	spin_lock_irqsave(&tags->lock, flags);
-
-	/* Fastpath */
-	if (likely(tags->nr_free >= 0)) {
-		tag = tags->freelist[--tags->nr_free];
-		spin_unlock_irqrestore(&tags->lock, flags);
-		return tag;
-	}
-	spin_unlock_irqrestore(&tags->lock, flags);
-
-	while (1) {
-		spin_lock_irqsave(&pool->lock, flags);
-		tags = this_cpu_ptr(pool->tag_cpu);
-
-		/*
-		 * prepare_to_wait() must come before steal_tags(), in case
-		 * percpu_ida_free() on another cpu flips a bit in
-		 * cpus_have_tags
-		 *
-		 * global lock held and irqs disabled, don't need percpu lock
-		 */
-		if (state != TASK_RUNNING)
-			prepare_to_wait(&pool->wait, &wait, state);
-
-		if (!tags->nr_free)
-			alloc_global_tags(pool, tags);
-		if (!tags->nr_free)
-			steal_tags(pool, tags);
-
-		if (tags->nr_free) {
-			tag = tags->freelist[--tags->nr_free];
-			if (tags->nr_free)
-				cpumask_set_cpu(smp_processor_id(),
-						&pool->cpus_have_tags);
-		}
-
-		spin_unlock_irqrestore(&pool->lock, flags);
-
-		if (tag >= 0 || state == TASK_RUNNING)
-			break;
-
-		if (signal_pending_state(state, current)) {
-			tag = -ERESTARTSYS;
-			break;
-		}
-
-		schedule();
-	}
-	if (state != TASK_RUNNING)
-		finish_wait(&pool->wait, &wait);
-
-	return tag;
-}
-EXPORT_SYMBOL_GPL(percpu_ida_alloc);
-
-/**
- * percpu_ida_free - free a tag
- * @pool: pool @tag was allocated from
- * @tag: a tag previously allocated with percpu_ida_alloc()
- *
- * Safe to be called from interrupt context.
- */
-void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
-{
-	struct percpu_ida_cpu *tags;
-	unsigned long flags;
-	unsigned nr_free;
-
-	BUG_ON(tag >= pool->nr_tags);
-
-	tags = raw_cpu_ptr(pool->tag_cpu);
-
-	spin_lock_irqsave(&tags->lock, flags);
-	tags->freelist[tags->nr_free++] = tag;
-
-	nr_free = tags->nr_free;
-
-	if (nr_free == 1) {
-		cpumask_set_cpu(smp_processor_id(),
-				&pool->cpus_have_tags);
-		wake_up(&pool->wait);
-	}
-	spin_unlock_irqrestore(&tags->lock, flags);
-
-	if (nr_free == pool->percpu_max_size) {
-		spin_lock_irqsave(&pool->lock, flags);
-		spin_lock(&tags->lock);
-
-		if (tags->nr_free == pool->percpu_max_size) {
-			move_tags(pool->freelist, &pool->nr_free,
-				  tags->freelist, &tags->nr_free,
-				  pool->percpu_batch_size);
-
-			wake_up(&pool->wait);
-		}
-		spin_unlock(&tags->lock);
-		spin_unlock_irqrestore(&pool->lock, flags);
-	}
-}
-EXPORT_SYMBOL_GPL(percpu_ida_free);
-
-/**
- * percpu_ida_destroy - release a tag pool's resources
- * @pool: pool to free
- *
- * Frees the resources allocated by percpu_ida_init().
- */
-void percpu_ida_destroy(struct percpu_ida *pool)
-{
-	free_percpu(pool->tag_cpu);
-	free_pages((unsigned long) pool->freelist,
-		   get_order(pool->nr_tags * sizeof(unsigned)));
-}
-EXPORT_SYMBOL_GPL(percpu_ida_destroy);
-
-/**
- * percpu_ida_init - initialize a percpu tag pool
- * @pool: pool to initialize
- * @nr_tags: number of tags that will be available for allocation
- *
- * Initializes @pool so that it can be used to allocate tags - integers in the
- * range [0, nr_tags). Typically, they'll be used by driver code to refer to a
- * preallocated array of tag structures.
- *
- * Allocation is percpu, but sharding is limited by nr_tags - for best
- * performance, the workload should not span more cpus than nr_tags / 128.
- */
-int __percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags,
-	unsigned long max_size, unsigned long batch_size)
-{
-	unsigned i, cpu, order;
-
-	memset(pool, 0, sizeof(*pool));
-
-	init_waitqueue_head(&pool->wait);
-	spin_lock_init(&pool->lock);
-	pool->nr_tags = nr_tags;
-	pool->percpu_max_size = max_size;
-	pool->percpu_batch_size = batch_size;
-
-	/* Guard against overflow */
-	if (nr_tags > (unsigned) INT_MAX + 1) {
-		pr_err("percpu_ida_init(): nr_tags too large\n");
-		return -EINVAL;
-	}
-
-	order = get_order(nr_tags * sizeof(unsigned));
-	pool->freelist = (void *) __get_free_pages(GFP_KERNEL, order);
-	if (!pool->freelist)
-		return -ENOMEM;
-
-	for (i = 0; i < nr_tags; i++)
-		pool->freelist[i] = i;
-
-	pool->nr_free = nr_tags;
-
-	pool->tag_cpu = __alloc_percpu(sizeof(struct percpu_ida_cpu) +
-				       pool->percpu_max_size * sizeof(unsigned),
-				       sizeof(unsigned));
-	if (!pool->tag_cpu)
-		goto err;
-
-	for_each_possible_cpu(cpu)
-		spin_lock_init(&per_cpu_ptr(pool->tag_cpu, cpu)->lock);
-
-	return 0;
-err:
-	percpu_ida_destroy(pool);
-	return -ENOMEM;
-}
-EXPORT_SYMBOL_GPL(__percpu_ida_init);
-
-/**
- * percpu_ida_for_each_free - iterate free ids of a pool
- * @pool: pool to iterate
- * @fn: interate callback function
- * @data: parameter for @fn
- *
- * Note, this doesn't guarantee to iterate all free ids restrictly. Some free
- * ids might be missed, some might be iterated duplicated, and some might
- * be iterated and not free soon.
- */
-int percpu_ida_for_each_free(struct percpu_ida *pool, percpu_ida_cb fn,
-	void *data)
-{
-	unsigned long flags;
-	struct percpu_ida_cpu *remote;
-	unsigned cpu, i, err = 0;
-
-	for_each_possible_cpu(cpu) {
-		remote = per_cpu_ptr(pool->tag_cpu, cpu);
-		spin_lock_irqsave(&remote->lock, flags);
-		for (i = 0; i < remote->nr_free; i++) {
-			err = fn(remote->freelist[i], data);
-			if (err)
-				break;
-		}
-		spin_unlock_irqrestore(&remote->lock, flags);
-		if (err)
-			goto out;
-	}
-
-	spin_lock_irqsave(&pool->lock, flags);
-	for (i = 0; i < pool->nr_free; i++) {
-		err = fn(pool->freelist[i], data);
-		if (err)
-			break;
-	}
-	spin_unlock_irqrestore(&pool->lock, flags);
-out:
-	return err;
-}
-EXPORT_SYMBOL_GPL(percpu_ida_for_each_free);
-
-/**
- * percpu_ida_free_tags - return free tags number of a specific cpu or global pool
- * @pool: pool related
- * @cpu: specific cpu or global pool if @cpu == nr_cpu_ids
- *
- * Note: this just returns a snapshot of free tags number.
- */
-unsigned percpu_ida_free_tags(struct percpu_ida *pool, int cpu)
-{
-	struct percpu_ida_cpu *remote;
-	if (cpu == nr_cpu_ids)
-		return pool->nr_free;
-	remote = per_cpu_ptr(pool->tag_cpu, cpu);
-	return remote->nr_free;
-}
-EXPORT_SYMBOL_GPL(percpu_ida_free_tags);
-- 
2.17.1

^ permalink raw reply related

* [PATCH 2/3] Convert target drivers to use sbitmap
From: Matthew Wilcox @ 2018-06-12 19:05 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, target-devel, linux1394-devel,
	linux-usb, kvm, virtualization, netdev, Juergen Gross,
	qla2xxx-upstream, Kent Overstreet, Jens Axboe
  Cc: Matthew Wilcox
In-Reply-To: <20180612190545.10781-1-willy@infradead.org>

The sbitmap and the percpu_ida perform essentially the same task,
allocating tags for commands.  The sbitmap outperforms the percpu_ida
as documented here: https://lkml.org/lkml/2014/4/22/553

The sbitmap interface is a little harder to use, but being able to
remove the percpu_ida code and getting better performance justifies the
additional complexity.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com>	# f_tcm
---
 drivers/scsi/qla2xxx/qla_target.c        | 10 ++++---
 drivers/target/iscsi/iscsi_target_util.c | 33 +++++++++++++++++++++---
 drivers/target/sbp/sbp_target.c          |  5 ++--
 drivers/target/target_core_transport.c   |  5 ++--
 drivers/target/tcm_fc/tfc_cmd.c          |  6 ++---
 drivers/usb/gadget/function/f_tcm.c      |  5 ++--
 drivers/vhost/scsi.c                     |  6 ++---
 drivers/xen/xen-scsiback.c               |  5 ++--
 include/target/iscsi/iscsi_target_core.h |  1 +
 include/target/target_core_base.h        |  7 ++---
 10 files changed, 59 insertions(+), 24 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 05290966e630..a1725a054749 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -4277,9 +4277,9 @@ static struct qla_tgt_cmd *qlt_get_tag(scsi_qla_host_t *vha,
 {
 	struct se_session *se_sess = sess->se_sess;
 	struct qla_tgt_cmd *cmd;
-	int tag;
+	int tag, cpu;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
 	if (tag < 0)
 		return NULL;
 
@@ -4292,6 +4292,7 @@ static struct qla_tgt_cmd *qlt_get_tag(scsi_qla_host_t *vha,
 	qlt_incr_num_pend_cmds(vha);
 	cmd->vha = vha;
 	cmd->se_cmd.map_tag = tag;
+	cmd->se_cmd.map_cpu = cpu;
 	cmd->sess = sess;
 	cmd->loop_id = sess->loop_id;
 	cmd->conf_compl_supported = sess->conf_compl_supported;
@@ -5294,7 +5295,7 @@ qlt_alloc_qfull_cmd(struct scsi_qla_host *vha,
 	struct fc_port *sess;
 	struct se_session *se_sess;
 	struct qla_tgt_cmd *cmd;
-	int tag;
+	int tag, cpu;
 	unsigned long flags;
 
 	if (unlikely(tgt->tgt_stop)) {
@@ -5326,7 +5327,7 @@ qlt_alloc_qfull_cmd(struct scsi_qla_host *vha,
 
 	se_sess = sess->se_sess;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
 	if (tag < 0)
 		return;
 
@@ -5357,6 +5358,7 @@ qlt_alloc_qfull_cmd(struct scsi_qla_host *vha,
 	cmd->reset_count = ha->base_qpair->chip_reset;
 	cmd->q_full = 1;
 	cmd->qpair = ha->base_qpair;
+	cmd->se_cmd.map_cpu = cpu;
 
 	if (qfull) {
 		cmd->q_full = 1;
diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 7e98697cfb8e..8cfcf9033507 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -17,7 +17,7 @@
  ******************************************************************************/
 
 #include <linux/list.h>
-#include <linux/percpu_ida.h>
+#include <linux/sched/signal.h>
 #include <net/ipv6.h>         /* ipv6_addr_equal() */
 #include <scsi/scsi_tcq.h>
 #include <scsi/iscsi_proto.h>
@@ -147,6 +147,30 @@ void iscsit_free_r2ts_from_list(struct iscsi_cmd *cmd)
 	spin_unlock_bh(&cmd->r2t_lock);
 }
 
+static int iscsit_wait_for_tag(struct se_session *se_sess, int state, int *cpup)
+{
+	int tag = -1;
+	DEFINE_WAIT(wait);
+	struct sbq_wait_state *ws;
+
+	if (state == TASK_RUNNING)
+		return tag;
+
+	ws = &se_sess->sess_tag_pool.ws[0];
+	for (;;) {
+		prepare_to_wait_exclusive(&ws->wait, &wait, state);
+		if (signal_pending_state(state, current))
+			break;
+		tag = sbitmap_queue_get(&se_sess->sess_tag_pool, cpup);
+		if (tag >= 0)
+			break;
+		schedule();
+	}
+
+	finish_wait(&ws->wait, &wait);
+	return tag;
+}
+
 /*
  * May be called from software interrupt (timer) context for allocating
  * iSCSI NopINs.
@@ -155,9 +179,11 @@ struct iscsi_cmd *iscsit_allocate_cmd(struct iscsi_conn *conn, int state)
 {
 	struct iscsi_cmd *cmd;
 	struct se_session *se_sess = conn->sess->se_sess;
-	int size, tag;
+	int size, tag, cpu;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, state);
+	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
+	if (tag < 0)
+		tag = iscsit_wait_for_tag(se_sess, state, &cpu);
 	if (tag < 0)
 		return NULL;
 
@@ -166,6 +192,7 @@ struct iscsi_cmd *iscsit_allocate_cmd(struct iscsi_conn *conn, int state)
 	memset(cmd, 0, size);
 
 	cmd->se_cmd.map_tag = tag;
+	cmd->se_cmd.map_cpu = cpu;
 	cmd->conn = conn;
 	cmd->data_direction = DMA_NONE;
 	INIT_LIST_HEAD(&cmd->i_conn_node);
diff --git a/drivers/target/sbp/sbp_target.c b/drivers/target/sbp/sbp_target.c
index 679ae29d25ab..42b21f2ac8b0 100644
--- a/drivers/target/sbp/sbp_target.c
+++ b/drivers/target/sbp/sbp_target.c
@@ -926,15 +926,16 @@ static struct sbp_target_request *sbp_mgt_get_req(struct sbp_session *sess,
 {
 	struct se_session *se_sess = sess->se_sess;
 	struct sbp_target_request *req;
-	int tag;
+	int tag, cpu;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
 	if (tag < 0)
 		return ERR_PTR(-ENOMEM);
 
 	req = &((struct sbp_target_request *)se_sess->sess_cmd_map)[tag];
 	memset(req, 0, sizeof(*req));
 	req->se_cmd.map_tag = tag;
+	req->se_cmd.map_cpu = cpu;
 	req->se_cmd.tag = next_orb;
 
 	return req;
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index f0e8f0f4ccb4..18c53c5cdd3d 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -260,7 +260,8 @@ int transport_alloc_session_tags(struct se_session *se_sess,
 		}
 	}
 
-	rc = percpu_ida_init(&se_sess->sess_tag_pool, tag_num);
+	rc = sbitmap_queue_init_node(&se_sess->sess_tag_pool, tag_num, -1,
+			false, GFP_KERNEL, NUMA_NO_NODE);
 	if (rc < 0) {
 		pr_err("Unable to init se_sess->sess_tag_pool,"
 			" tag_num: %u\n", tag_num);
@@ -547,7 +548,7 @@ void transport_free_session(struct se_session *se_sess)
 		target_put_nacl(se_nacl);
 	}
 	if (se_sess->sess_cmd_map) {
-		percpu_ida_destroy(&se_sess->sess_tag_pool);
+		sbitmap_queue_free(&se_sess->sess_tag_pool);
 		kvfree(se_sess->sess_cmd_map);
 	}
 	kmem_cache_free(se_sess_cache, se_sess);
diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c
index 13e4efbe1ce7..a183d4da7db2 100644
--- a/drivers/target/tcm_fc/tfc_cmd.c
+++ b/drivers/target/tcm_fc/tfc_cmd.c
@@ -28,7 +28,6 @@
 #include <linux/configfs.h>
 #include <linux/ctype.h>
 #include <linux/hash.h>
-#include <linux/percpu_ida.h>
 #include <asm/unaligned.h>
 #include <scsi/scsi_tcq.h>
 #include <scsi/libfc.h>
@@ -448,9 +447,9 @@ static void ft_recv_cmd(struct ft_sess *sess, struct fc_frame *fp)
 	struct ft_cmd *cmd;
 	struct fc_lport *lport = sess->tport->lport;
 	struct se_session *se_sess = sess->se_sess;
-	int tag;
+	int tag, cpu;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
 	if (tag < 0)
 		goto busy;
 
@@ -458,6 +457,7 @@ static void ft_recv_cmd(struct ft_sess *sess, struct fc_frame *fp)
 	memset(cmd, 0, sizeof(struct ft_cmd));
 
 	cmd->se_cmd.map_tag = tag;
+	cmd->se_cmd.map_cpu = cpu;
 	cmd->sess = sess;
 	cmd->seq = fc_seq_assign(lport, fp);
 	if (!cmd->seq) {
diff --git a/drivers/usb/gadget/function/f_tcm.c b/drivers/usb/gadget/function/f_tcm.c
index 9f670d9224b9..5003e857dce7 100644
--- a/drivers/usb/gadget/function/f_tcm.c
+++ b/drivers/usb/gadget/function/f_tcm.c
@@ -1071,15 +1071,16 @@ static struct usbg_cmd *usbg_get_cmd(struct f_uas *fu,
 {
 	struct se_session *se_sess = tv_nexus->tvn_se_sess;
 	struct usbg_cmd *cmd;
-	int tag;
+	int tag, cpu;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
 	if (tag < 0)
 		return ERR_PTR(-ENOMEM);
 
 	cmd = &((struct usbg_cmd *)se_sess->sess_cmd_map)[tag];
 	memset(cmd, 0, sizeof(*cmd));
 	cmd->se_cmd.map_tag = tag;
+	cmd->se_cmd.map_cpu = cpu;
 	cmd->se_cmd.tag = cmd->tag = scsi_tag;
 	cmd->fu = fu;
 
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 70d35e696533..c9c5d6b291cc 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -46,7 +46,6 @@
 #include <linux/virtio_scsi.h>
 #include <linux/llist.h>
 #include <linux/bitmap.h>
-#include <linux/percpu_ida.h>
 
 #include "vhost.h"
 
@@ -567,7 +566,7 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
 	struct se_session *se_sess;
 	struct scatterlist *sg, *prot_sg;
 	struct page **pages;
-	int tag;
+	int tag, cpu;
 
 	tv_nexus = tpg->tpg_nexus;
 	if (!tv_nexus) {
@@ -576,7 +575,7 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
 	}
 	se_sess = tv_nexus->tvn_se_sess;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
 	if (tag < 0) {
 		pr_err("Unable to obtain tag for vhost_scsi_cmd\n");
 		return ERR_PTR(-ENOMEM);
@@ -591,6 +590,7 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
 	cmd->tvc_prot_sgl = prot_sg;
 	cmd->tvc_upages = pages;
 	cmd->tvc_se_cmd.map_tag = tag;
+	cmd->tvc_se_cmd.map_cpu = cpu;
 	cmd->tvc_tag = scsi_tag;
 	cmd->tvc_lun = lun;
 	cmd->tvc_task_attr = task_attr;
diff --git a/drivers/xen/xen-scsiback.c b/drivers/xen/xen-scsiback.c
index ec6635258ed8..764dd9aa0131 100644
--- a/drivers/xen/xen-scsiback.c
+++ b/drivers/xen/xen-scsiback.c
@@ -654,9 +654,9 @@ static struct vscsibk_pend *scsiback_get_pend_req(struct vscsiif_back_ring *ring
 	struct scsiback_nexus *nexus = tpg->tpg_nexus;
 	struct se_session *se_sess = nexus->tvn_se_sess;
 	struct vscsibk_pend *req;
-	int tag, i;
+	int tag, cpu, i;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+	tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
 	if (tag < 0) {
 		pr_err("Unable to obtain tag for vscsiif_request\n");
 		return ERR_PTR(-ENOMEM);
@@ -665,6 +665,7 @@ static struct vscsibk_pend *scsiback_get_pend_req(struct vscsiif_back_ring *ring
 	req = &((struct vscsibk_pend *)se_sess->sess_cmd_map)[tag];
 	memset(req, 0, sizeof(*req));
 	req->se_cmd.map_tag = tag;
+	req->se_cmd.map_cpu = cpu;
 
 	for (i = 0; i < VSCSI_MAX_GRANTS; i++)
 		req->grant_handles[i] = SCSIBACK_INVALID_HANDLE;
diff --git a/include/target/iscsi/iscsi_target_core.h b/include/target/iscsi/iscsi_target_core.h
index cf5f3fff1f1a..f2e6abea8490 100644
--- a/include/target/iscsi/iscsi_target_core.h
+++ b/include/target/iscsi/iscsi_target_core.h
@@ -4,6 +4,7 @@
 
 #include <linux/dma-direction.h>     /* enum dma_data_direction */
 #include <linux/list.h>              /* struct list_head */
+#include <linux/sched.h>
 #include <linux/socket.h>            /* struct sockaddr_storage */
 #include <linux/types.h>             /* u8 */
 #include <scsi/iscsi_proto.h>        /* itt_t */
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 260c2f3e9460..448f291125c2 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -4,7 +4,7 @@
 
 #include <linux/configfs.h>      /* struct config_group */
 #include <linux/dma-direction.h> /* enum dma_data_direction */
-#include <linux/percpu_ida.h>    /* struct percpu_ida */
+#include <linux/sbitmap.h>
 #include <linux/percpu-refcount.h>
 #include <linux/semaphore.h>     /* struct semaphore */
 #include <linux/completion.h>
@@ -455,6 +455,7 @@ struct se_cmd {
 	int			sam_task_attr;
 	/* Used for se_sess->sess_tag_pool */
 	unsigned int		map_tag;
+	int			map_cpu;
 	/* Transport protocol dependent state, see transport_state_table */
 	enum transport_state_table t_state;
 	/* See se_cmd_flags_table */
@@ -608,7 +609,7 @@ struct se_session {
 	struct list_head	sess_wait_list;
 	spinlock_t		sess_cmd_lock;
 	void			*sess_cmd_map;
-	struct percpu_ida	sess_tag_pool;
+	struct sbitmap_queue	sess_tag_pool;
 };
 
 struct se_device;
@@ -936,7 +937,7 @@ static inline void atomic_dec_mb(atomic_t *v)
 
 static inline void target_free_tag(struct se_session *sess, struct se_cmd *cmd)
 {
-	percpu_ida_free(&sess->sess_tag_pool, cmd->map_tag);
+	sbitmap_queue_clear(&sess->sess_tag_pool, cmd->map_tag, cmd->map_cpu);
 }
 
 #endif /* TARGET_CORE_BASE_H */
-- 
2.17.1

^ permalink raw reply related

* [PATCH 1/3] target: Abstract tag freeing
From: Matthew Wilcox @ 2018-06-12 19:05 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, target-devel, linux1394-devel,
	linux-usb, kvm, virtualization, netdev, Juergen Gross,
	qla2xxx-upstream, Kent Overstreet, Jens Axboe
  Cc: Matthew Wilcox
In-Reply-To: <20180612190545.10781-1-willy@infradead.org>

Introduce target_free_tag() and convert all drivers to use it.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
---
 drivers/scsi/qla2xxx/qla_target.c        | 4 ++--
 drivers/target/iscsi/iscsi_target_util.c | 2 +-
 drivers/target/sbp/sbp_target.c          | 2 +-
 drivers/target/tcm_fc/tfc_cmd.c          | 4 ++--
 drivers/usb/gadget/function/f_tcm.c      | 2 +-
 drivers/vhost/scsi.c                     | 2 +-
 drivers/xen/xen-scsiback.c               | 4 +---
 include/target/target_core_base.h        | 5 +++++
 8 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index b85c833099ff..05290966e630 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -3783,7 +3783,7 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
 		return;
 	}
 	cmd->jiffies_at_free = get_jiffies_64();
-	percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+	target_free_tag(sess->se_sess, &cmd->se_cmd);
 }
 EXPORT_SYMBOL(qlt_free_cmd);
 
@@ -4146,7 +4146,7 @@ static void __qlt_do_work(struct qla_tgt_cmd *cmd)
 	qlt_send_term_exchange(qpair, NULL, &cmd->atio, 1, 0);
 
 	qlt_decr_num_pend_cmds(vha);
-	percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+	target_free_tag(sess->se_sess, &cmd->se_cmd);
 	spin_unlock_irqrestore(qpair->qp_lock_ptr, flags);
 
 	spin_lock_irqsave(&ha->tgt.sess_lock, flags);
diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 4435bf374d2d..7e98697cfb8e 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -711,7 +711,7 @@ void iscsit_release_cmd(struct iscsi_cmd *cmd)
 	kfree(cmd->iov_data);
 	kfree(cmd->text_in_ptr);
 
-	percpu_ida_free(&sess->se_sess->sess_tag_pool, se_cmd->map_tag);
+	target_free_tag(sess->se_sess, se_cmd);
 }
 EXPORT_SYMBOL(iscsit_release_cmd);
 
diff --git a/drivers/target/sbp/sbp_target.c b/drivers/target/sbp/sbp_target.c
index fb1003921d85..679ae29d25ab 100644
--- a/drivers/target/sbp/sbp_target.c
+++ b/drivers/target/sbp/sbp_target.c
@@ -1460,7 +1460,7 @@ static void sbp_free_request(struct sbp_target_request *req)
 	kfree(req->pg_tbl);
 	kfree(req->cmd_buf);
 
-	percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+	target_free_tag(se_sess, se_cmd);
 }
 
 static void sbp_mgt_agent_process(struct work_struct *work)
diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c
index ec372860106f..13e4efbe1ce7 100644
--- a/drivers/target/tcm_fc/tfc_cmd.c
+++ b/drivers/target/tcm_fc/tfc_cmd.c
@@ -92,7 +92,7 @@ static void ft_free_cmd(struct ft_cmd *cmd)
 	if (fr_seq(fp))
 		fc_seq_release(fr_seq(fp));
 	fc_frame_free(fp);
-	percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+	target_free_tag(sess->se_sess, &cmd->se_cmd);
 	ft_sess_put(sess);	/* undo get from lookup at recv */
 }
 
@@ -461,7 +461,7 @@ static void ft_recv_cmd(struct ft_sess *sess, struct fc_frame *fp)
 	cmd->sess = sess;
 	cmd->seq = fc_seq_assign(lport, fp);
 	if (!cmd->seq) {
-		percpu_ida_free(&se_sess->sess_tag_pool, tag);
+		target_free_tag(se_sess, &cmd->se_cmd);
 		goto busy;
 	}
 	cmd->req_frame = fp;		/* hold frame during cmd */
diff --git a/drivers/usb/gadget/function/f_tcm.c b/drivers/usb/gadget/function/f_tcm.c
index d78dbb73bde8..9f670d9224b9 100644
--- a/drivers/usb/gadget/function/f_tcm.c
+++ b/drivers/usb/gadget/function/f_tcm.c
@@ -1288,7 +1288,7 @@ static void usbg_release_cmd(struct se_cmd *se_cmd)
 	struct se_session *se_sess = se_cmd->se_sess;
 
 	kfree(cmd->data_buf);
-	percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+	target_free_tag(se_sess, se_cmd);
 }
 
 static u32 usbg_sess_get_index(struct se_session *se_sess)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 7ad57094d736..70d35e696533 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -324,7 +324,7 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
 	}
 
 	vhost_scsi_put_inflight(tv_cmd->inflight);
-	percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+	target_free_tag(se_sess, se_cmd);
 }
 
 static u32 vhost_scsi_sess_get_index(struct se_session *se_sess)
diff --git a/drivers/xen/xen-scsiback.c b/drivers/xen/xen-scsiback.c
index 7bc88fd43cfc..ec6635258ed8 100644
--- a/drivers/xen/xen-scsiback.c
+++ b/drivers/xen/xen-scsiback.c
@@ -1377,9 +1377,7 @@ static int scsiback_check_stop_free(struct se_cmd *se_cmd)
 
 static void scsiback_release_cmd(struct se_cmd *se_cmd)
 {
-	struct se_session *se_sess = se_cmd->se_sess;
-
-	percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+	target_free_tag(se_cmd->se_sess, se_cmd);
 }
 
 static u32 scsiback_sess_get_index(struct se_session *se_sess)
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 922a39f45abc..260c2f3e9460 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -934,4 +934,9 @@ static inline void atomic_dec_mb(atomic_t *v)
 	smp_mb__after_atomic();
 }
 
+static inline void target_free_tag(struct se_session *sess, struct se_cmd *cmd)
+{
+	percpu_ida_free(&sess->sess_tag_pool, cmd->map_tag);
+}
+
 #endif /* TARGET_CORE_BASE_H */
-- 
2.17.1

^ permalink raw reply related

* [PATCH 0/3] Use sbitmap instead of percpu_ida
From: Matthew Wilcox @ 2018-06-12 19:05 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, target-devel, linux1394-devel,
	linux-usb, kvm, virtualization, netdev, Juergen Gross,
	qla2xxx-upstream, Kent Overstreet, Jens Axboe
  Cc: Matthew Wilcox

Removing the percpu_ida code nets over 400 lines of removal.  It's not
as spectacular as deleting an entire architecture, but it's still a
worthy reduction in lines of code.

Untested due to lack of hardware and not understanding how to set up a
target platform.

Changes from v1:
 - Fixed bugs pointed out by Jens in iscsit_wait_for_tag()
 - Abstracted out tag freeing as requested by Bart
 - Made iscsit_wait_for_tag static as pointed out by 0day

Matthew Wilcox (3):
  target: Abstract tag freeing
  Convert target drivers to use sbitmap
  Remove percpu_ida

 drivers/scsi/qla2xxx/qla_target.c        |  14 +-
 drivers/target/iscsi/iscsi_target_util.c |  35 ++-
 drivers/target/sbp/sbp_target.c          |   7 +-
 drivers/target/target_core_transport.c   |   5 +-
 drivers/target/tcm_fc/tfc_cmd.c          |  10 +-
 drivers/usb/gadget/function/f_tcm.c      |   7 +-
 drivers/vhost/scsi.c                     |   8 +-
 drivers/xen/xen-scsiback.c               |   9 +-
 include/linux/percpu_ida.h               |  83 -----
 include/target/iscsi/iscsi_target_core.h |   1 +
 include/target/target_core_base.h        |  10 +-
 lib/Makefile                             |   2 +-
 lib/percpu_ida.c                         | 370 -----------------------
 13 files changed, 73 insertions(+), 488 deletions(-)
 delete mode 100644 include/linux/percpu_ida.h
 delete mode 100644 lib/percpu_ida.c

-- 
2.17.1

^ permalink raw reply

* Re: [PATCH v4 1/3] compiler-gcc.h: add gnu_inline to all inline declarations
From: H. Peter Anvin @ 2018-06-12 18:51 UTC (permalink / raw)
  To: Nick Desaulniers, sedat.dilek, Arnd Bergmann
  Cc: Kate Stewart, linux-efi, brijesh.singh, J. Kiszka, Will Deacon,
	virtualization, Masahiro Yamada, Manoj Gupta, boris.ostrovsky,
	mawilcox, x86, akataria, Greg Hackmann, Cao.jin, mingo, geert,
	David Rientjes, Andrey Ryabinin, Linux Kbuild mailing list,
	Kees Cook, rostedt, acme, Michal Marek, Thomas Gleixner,
	Alistair Strachan, tstellar, Ard Biesheuvel
In-Reply-To: <CAKwvOdmxFHfkj81YAXa8fE86fC+k+KvqMvgk4=e62pNCcXbKOg@mail.gmail.com>

<caoj.fnst@cn.fujitsu.com>,Greg KH <gregkh@linuxfoundation.org>,jarkko.sakkinen@linux.intel.com,jgross@suse.com,Josh Poimboeuf <jpoimboe@redhat.com>,Matthias Kaehlcke <mka@chromium.org>,thomas.lendacky@amd.com,Thiebaud Weksteen <tweek@google.com>,mjg59@google.com,joe@perches.com
From: hpa@zytor.com
Message-ID: <191E4EBE-4CB2-4C8B-AB61-689A91FFE7A8@zytor.com>

On June 12, 2018 11:33:14 AM PDT, Nick Desaulniers <ndesaulniers@google.com> wrote:
>On Fri, Jun 8, 2018 at 3:04 AM Sedat Dilek <sedat.dilek@gmail.com>
>wrote:
>>
>> On Fri, Jun 8, 2018 at 9:59 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Thu, Jun 7, 2018 at 10:49 PM, Nick Desaulniers
>> > <ndesaulniers@google.com> wrote:
>> >> Functions marked extern inline do not emit an externally visible
>> >> function when the gnu89 C standard is used. Some KBUILD Makefiles
>> >> overwrite KBUILD_CFLAGS. This is an issue for GCC 5.1+ users as
>without
>> >> an explicit C standard specified, the default is gnu11. Since c99,
>the
>> >> semantics of extern inline have changed such that an externally
>visible
>> >> function is always emitted. This can lead to multiple definition
>errors
>> >> of extern inline functions at link time of compilation units whose
>build
>> >> files have removed an explicit C standard compiler flag for users
>of GCC
>> >> 5.1+ or Clang.
>> >>
>> >> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
>> >> Suggested-by: H. Peter Anvin <hpa@zytor.com>
>> >> Suggested-by: Joe Perches <joe@perches.com>
>> >
>> > I suspect this will break Geert's gcc-4.1.2, which I think doesn't
>have that
>> > attribute yet (4.1.3 or higher have it according to the
>documentation.
>> >
>> > It wouldn't be hard to work around that if we want to keep that
>version
>> > working, or we could decide that it's time to officially stop
>supporting
>> > that version, but we should probably decide on one or the other.
>
>Heh, so earlier we decided against compiler flags (-std=gnu89 or
>-fgnu89-inline) in preference to function attributes.  The function
>attribute is preferable as some of the Makefiles [accidentally?]
>overwrite KBUILD_CFLAGS, which is problematic for gcc 5.1 users as the
>implicit c standard used was changed to gnu11 from gnu89.  What's nice
>is that to support gcc 4.1 users, we simply don't need to add any
>attribute, as their implicit c standard is gnu89 which has the
>semantics for extern inline that we want.  I have a simple change to
>this patch that can support users of various gcc versions, see below:
>
>> Good point.
>> What is the minimum requirement of GCC version currently?
>> AFAICS x86/asm-goto support requires GCC >= 4.5?
>
>Yes, but that's only for x86, IIUC.  It seems the kernel may have
>different minimum required versions of GCC based on arch then?  That
>may be ok, but I'm not sure that's easy to keep track of without
>having it explicitly stated somewhere like the docs perhaps?
>
>> Just FYI...
>> ...saw the last days in upstream commits that kbuild/kconfig for
>> 4.18-rc1 offers possibilities to check for cc-version dependencies.
>
>Those will be helpful.  If we want to pursue compiler flags, which get
>set some Makefiles, then yes.  But I think a simpler change to my
>patch would be as below.
>
>It seems gcc did not get __has_attribute [0] until 5.1, but will
>define __GNUC_GNU_INLINE__ if supported. [1]  Unfortunately, Clang
>does not define __GNUC_GNU_INLINE__ [2].  So a proper feature test
>might look like:
>
>```
>#ifndef __has_attribute
>#define __has_attribute(x) 0
>#endif
>
>#if defined(__GNUC_GNU_INLINE__) || __has_attribute(gnu_inline)
>#define __gnu_inline __attribute__(gnu_inline)
>#endif
>
>#define inline inline __attribute__((always_inline, unused)) notrace
>__gnu_inline
>```
>
>Thoughts on this approach? I can send a v5 tomorrow if there's no
>major issues.  Feedback appreciated, as always.
>
>[0] https://clang.llvm.org/docs/LanguageExtensions.html#has-attribute
>[1]
>https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
>[2] https://bugs.llvm.org/show_bug.cgi?id=37784

Please fix clang. It isn't all that hard to fix.

However, __GCC_GNU_INLINE__ means you are in GNU mode by default, on gcc's new enough to have multiple misses.

The right thing to look for is __GCC_STDC_INLINE__ in which case you need the attribute.

By the way, you should check clang against gcc's predefined macros by doing:

gcc [options] -x c -Wp,-dM -E /dev/null | sort

Options can change the predefined macros substantially, especially the, -std=, arch and -O options. -x c can be replaced with e.g. -x c++, objective-c, assembler-with-cpp etc.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply

* Re: [PATCH 1/2] Convert target drivers to use sbitmap
From: Matthew Wilcox @ 2018-06-12 18:08 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jgross@suse.com, axboe@kernel.dk, linux-scsi@vger.kernel.org,
	kvm@vger.kernel.org, netdev@vger.kernel.org,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	target-devel@vger.kernel.org, qla2xxx-upstream@qlogic.com,
	linux1394-devel@lists.sourceforge.net, kent.overstreet@gmail.com
In-Reply-To: <0c93c72a3a339f3479f82de04223315671e07863.camel@wdc.com>

On Tue, Jun 12, 2018 at 04:32:03PM +0000, Bart Van Assche wrote:
> On Tue, 2018-06-12 at 09:15 -0700, Matthew Wilcox wrote:
> > On Tue, Jun 12, 2018 at 03:22:42PM +0000, Bart Van Assche wrote:
> > > Please introduce functions in the target core for allocating and freeing a tag
> > > instead of spreading the knowledge of how to allocate and free tags over all
> > > target drivers.
> > 
> > I can't without doing an unreasonably large amount of work on drivers that
> > I have no way to test.  Some of the drivers have the se_cmd already; some
> > of them don't.  I'd be happy to introduce a common function for freeing
> > a tag.
> 
> Which target drivers are you referring to? If you are referring to the sbp driver:
> I think that driver is dead and can be removed from the kernel tree. I even don't
> know whether that driver ever has had any users other than the developer of that
> driver.

For example tcm_fc:

        tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
        if (tag < 0)
                goto busy;

        cmd = &((struct ft_cmd *)se_sess->sess_cmd_map)[tag];

or qla2xxx:

        tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
        if (tag < 0)
                return NULL;

        cmd = &((struct qla_tgt_cmd *)se_sess->sess_cmd_map)[tag];

The core doesn't know at what offset from the pointer to store the tag
& cpu.  Only the individual drivers know their cmd layout.

^ permalink raw reply

* Re: [PATCH 1/2] Convert target drivers to use sbitmap
From: Bart Van Assche @ 2018-06-12 16:32 UTC (permalink / raw)
  To: willy@infradead.org
  Cc: jgross@suse.com, axboe@kernel.dk, linux-scsi@vger.kernel.org,
	kvm@vger.kernel.org, mawilcox@microsoft.com,
	netdev@vger.kernel.org, linux-usb@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	target-devel@vger.kernel.org, qla2xxx-upstream@qlogic.com,
	linux1394-devel@lists.sourceforge.net, kent.overstreet@gmail.com
In-Reply-To: <20180612161526.GE19433@bombadil.infradead.org>

On Tue, 2018-06-12 at 09:15 -0700, Matthew Wilcox wrote:
> On Tue, Jun 12, 2018 at 03:22:42PM +0000, Bart Van Assche wrote:
> > On Tue, 2018-05-15 at 09:00 -0700, Matthew Wilcox wrote:
> > > diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
> > > index 025dc2d3f3de..cdf671c2af61 100644
> > > --- a/drivers/scsi/qla2xxx/qla_target.c
> > > +++ b/drivers/scsi/qla2xxx/qla_target.c
> > > @@ -3719,7 +3719,8 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
> > >  		return;
> > >  	}
> > >  	cmd->jiffies_at_free = get_jiffies_64();
> > > -	percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
> > > +	sbitmap_queue_clear(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag,
> > > +			cmd->se_cmd.map_cpu);
> > >  }
> > >  EXPORT_SYMBOL(qlt_free_cmd);
> > 
> > Please introduce functions in the target core for allocating and freeing a tag
> > instead of spreading the knowledge of how to allocate and free tags over all
> > target drivers.
> 
> I can't without doing an unreasonably large amount of work on drivers that
> I have no way to test.  Some of the drivers have the se_cmd already; some
> of them don't.  I'd be happy to introduce a common function for freeing
> a tag.

Which target drivers are you referring to? If you are referring to the sbp driver:
I think that driver is dead and can be removed from the kernel tree. I even don't
know whether that driver ever has had any users other than the developer of that
driver.

> > This looks weird to me. Shouldn't target code ignore signals instead of causing
> > tag allocation to fail if a signal is received?
> 
> It's what the current code did:
> 
> -               if (signal_pending_state(state, current)) {
> -                       tag = -ERESTARTSYS;
> -                       break;
> -               }
> 
> and the current callers literally indicate that they want signals:
> 
> drivers/infiniband/ulp/isert/ib_isert.c:        cmd = iscsit_allocate_cmd(conn, TASK_INTERRUPTIBLE);
> drivers/target/iscsi/iscsi_target.c:    cmd = iscsit_allocate_cmd(conn, TASK_INTERRUPTIBLE);

Right, the iSCSI target driver uses signals to wake up threads (see also the
send_sig() calls in the iSCSI target code).

Bart.

^ permalink raw reply

* Re: [PATCH 1/2] Convert target drivers to use sbitmap
From: Matthew Wilcox @ 2018-06-12 16:15 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jgross@suse.com, axboe@kernel.dk, linux-scsi@vger.kernel.org,
	kvm@vger.kernel.org, mawilcox@microsoft.com,
	netdev@vger.kernel.org, linux-usb@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	target-devel@vger.kernel.org, qla2xxx-upstream@qlogic.com,
	linux1394-devel@lists.sourceforge.net, kent.overstreet@gmail.com
In-Reply-To: <da5220a5ed4bed210c31a7517389e787a3b1a01f.camel@wdc.com>

On Tue, Jun 12, 2018 at 03:22:42PM +0000, Bart Van Assche wrote:
> On Tue, 2018-05-15 at 09:00 -0700, Matthew Wilcox wrote:
> > diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
> > index 025dc2d3f3de..cdf671c2af61 100644
> > --- a/drivers/scsi/qla2xxx/qla_target.c
> > +++ b/drivers/scsi/qla2xxx/qla_target.c
> > @@ -3719,7 +3719,8 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
> >  		return;
> >  	}
> >  	cmd->jiffies_at_free = get_jiffies_64();
> > -	percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
> > +	sbitmap_queue_clear(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag,
> > +			cmd->se_cmd.map_cpu);
> >  }
> >  EXPORT_SYMBOL(qlt_free_cmd);
> 
> Please introduce functions in the target core for allocating and freeing a tag
> instead of spreading the knowledge of how to allocate and free tags over all
> target drivers.

I can't without doing an unreasonably large amount of work on drivers that
I have no way to test.  Some of the drivers have the se_cmd already; some
of them don't.  I'd be happy to introduce a common function for freeing
a tag.

> > +int iscsit_wait_for_tag(struct se_session *se_sess, int state, int *cpup)
> > +{
> > +	int tag = -1;
> > +	DEFINE_WAIT(wait);
> > +	struct sbq_wait_state *ws;
> > +
> > +	if (state == TASK_RUNNING)
> > +		return tag;
> > +
> > +	ws = &se_sess->sess_tag_pool.ws[0];
> > +	for (;;) {
> > +		prepare_to_wait_exclusive(&ws->wait, &wait, state);
> > +		if (signal_pending_state(state, current))
> > +			break;
> 
> This looks weird to me. Shouldn't target code ignore signals instead of causing
> tag allocation to fail if a signal is received?

It's what the current code did:

-               if (signal_pending_state(state, current)) {
-                       tag = -ERESTARTSYS;
-                       break;
-               }

and the current callers literally indicate that they want signals:

drivers/infiniband/ulp/isert/ib_isert.c:        cmd = iscsit_allocate_cmd(conn, TASK_INTERRUPTIBLE);
drivers/target/iscsi/iscsi_target.c:    cmd = iscsit_allocate_cmd(conn, TASK_INTERRUPTIBLE);

(etc)

^ permalink raw reply

* Re: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands support
From: Stefan Hajnoczi @ 2018-06-12 16:05 UTC (permalink / raw)
  To: Liu, Changpeng
  Cc: pbonzini@redhat.com, cavery@redhat.com,
	virtualization@lists.linux-foundation.org
In-Reply-To: <FF7FC980937D6342B9D289F5F3C7C2625B6DF62C@SHSMSX103.ccr.corp.intel.com>


[-- Attachment #1.1: Type: text/plain, Size: 6316 bytes --]

On Mon, Jun 11, 2018 at 03:37:00AM +0000, Liu, Changpeng wrote:
> 
> 
> > -----Original Message-----
> > From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> > Sent: Friday, June 8, 2018 6:20 PM
> > To: Liu, Changpeng <changpeng.liu@intel.com>
> > Cc: virtualization@lists.linux-foundation.org; cavery@redhat.com;
> > jasowang@redhat.com; pbonzini@redhat.com; Wang, Wei W
> > <wei.w.wang@intel.com>
> > Subject: Re: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands
> > support
> > 
> > On Thu, Jun 07, 2018 at 11:07:06PM +0000, Liu, Changpeng wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> > > > Sent: Thursday, June 7, 2018 9:10 PM
> > > > To: Liu, Changpeng <changpeng.liu@intel.com>
> > > > Cc: virtualization@lists.linux-foundation.org; cavery@redhat.com;
> > > > jasowang@redhat.com; pbonzini@redhat.com; Wang, Wei W
> > > > <wei.w.wang@intel.com>
> > > > Subject: Re: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands
> > > > support
> > > >
> > > > On Wed, Jun 06, 2018 at 12:19:00PM +0800, Changpeng Liu wrote:
> > > > > Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES commands
> > > > > support, this will impact the performance when using SSD backend over
> > > > > file systems.
> > > > >
> > > > > Commit 88c85538 "virtio-blk: add discard and write zeroes features to
> > > > > specification"(see https://github.com/oasis-tcs/virtio-spec) extended
> > > > > existing virtio-blk protocol, adding extra DISCARD and WRITE ZEROES
> > > > > commands support.
> > > > >
> > > > > While here, using 16 bytes descriptor to describe one segment of DISCARD
> > > > > or WRITE ZEROES commands, each command may contain one or more
> > > > decriptors.
> > > > >
> > > > > The following data structure shows the definition of one descriptor:
> > > > >
> > > > > struct virtio_blk_discard_write_zeroes {
> > > > >         le64 sector;
> > > > >         le32 num_sectors;
> > > > >         le32 unmap;
> > > > > };
> > > > >
> > > > > Field 'sector' means the start sector for DISCARD and WRITE ZEROES,
> > > > > filed 'num_sectors' means the number of sectors for DISCARD and WRITE
> > > > > ZEROES, if both DISCARD and WRITE ZEROES are supported, field 'unmap'
> > > > > maybe used for WRITE ZEROES command with DISCARD enabled.
> > > > >
> > > > > We also extended the virtio-blk configuration space to let backend
> > > > > device put DISCARD and WRITE ZEROES configuration parameters.
> > > > >
> > > > > struct virtio_blk_config {
> > > > >         [...]
> > > > >
> > > > >         le32 max_discard_sectors;
> > > > >         le32 max_discard_seg;
> > > > >         le32 discard_sector_alignment;
> > > > >         le32 max_write_zeroes_sectors;
> > > > >         le32 max_write_zeroes_seg;
> > > > >         u8 write_zeroes_may_unmap;
> > > > > }
> > > > >
> > > > > New feature bit [VIRTIO_BLK_F_DISCARD (13)]: Device can support discard
> > > > > command, maximum discard sectors size in field 'max_discard_sectors' and
> > > > > maximum discard segment number in field 'max_discard_seg'.
> > > > >
> > > > > New feature [VIRTIO_BLK_F_WRITE_ZEROES (14)]: Device can support write
> > > > > zeroes command, maximum write zeroes sectors size in field
> > > > > 'max_write_zeroes_sectors' and maximum write zeroes segment number in
> > > > > field 'max_write_zeroes_seg'.
> > > > >
> > > > > The parameters in the configuration space of the device field
> > > > > 'max_discard_sectors' and field 'discard_sector_alignment' are expressed in
> > > > > 512-byte units if the VIRTIO_BLK_F_DISCARD feature bit is negotiated. The
> > > > > field 'max_write_zeroes_sectors' is expressed in 512-byte units if the
> > > > > VIRTIO_BLK_F_WRITE_ZEROES feature bit is negotiated.
> > > > >
> > > > > Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
> > > > > ---
> > > > > CHANGELOG:
> > > > > v6: don't set T_OUT bit to discard and write zeroes commands.
> > > >
> > > > I don't see this in the patch...
> > > Yeah, do noting with DISCARD/WRITE ZEROES means no need to OR BLK_T_OUT
> > again.
> > > >
> > > > > @@ -225,6 +260,7 @@ static blk_status_t virtio_queue_rq(struct
> > > > blk_mq_hw_ctx *hctx,
> > > > >  	int qid = hctx->queue_num;
> > > > >  	int err;
> > > > >  	bool notify = false;
> > > > > +	bool unmap = false;
> > > > >  	u32 type;
> > > > >
> > > > >  	BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
> > > > > @@ -237,6 +273,13 @@ static blk_status_t virtio_queue_rq(struct
> > > > blk_mq_hw_ctx *hctx,
> > > > >  	case REQ_OP_FLUSH:
> > > > >  		type = VIRTIO_BLK_T_FLUSH;
> > > > >  		break;
> > > > > +	case REQ_OP_DISCARD:
> > > > > +		type = VIRTIO_BLK_T_DISCARD;
> > > > > +		break;
> > > > > +	case REQ_OP_WRITE_ZEROES:
> > > > > +		type = VIRTIO_BLK_T_WRITE_ZEROES;
> > > > > +		unmap = !(req->cmd_flags & REQ_NOUNMAP);
> > > > > +		break;
> > > > >  	case REQ_OP_SCSI_IN:
> > > > >  	case REQ_OP_SCSI_OUT:
> > > > >  		type = VIRTIO_BLK_T_SCSI_CMD;
> > > > > @@ -256,6 +299,12 @@ static blk_status_t virtio_queue_rq(struct
> > > > blk_mq_hw_ctx *hctx,
> > > > >
> > > > >  	blk_mq_start_request(req);
> > > > >
> > > > > +	if (type == VIRTIO_BLK_T_DISCARD || type ==
> > > > VIRTIO_BLK_T_WRITE_ZEROES) {
> > > > > +		err = virtblk_setup_discard_write_zeroes(req, unmap);
> > > > > +		if (err)
> > > > > +			return BLK_STS_RESOURCE;
> > > > > +	}
> > > > > +
> > > > >  	num = blk_rq_map_sg(hctx->queue, req, vbr->sg);
> > > > >  	if (num) {
> > > > >  		if (rq_data_dir(req) == WRITE)
> > > >
> > > > ...since we still do blk_rq_map_sg() here and num should be != 0.
> > > No, while here, we should keep the original logic for READ/WRITE commands.
> > 
> > My question is: why does the changelog say "don't set T_OUT" but the
> > code *will* set it because blk_rq_map_sg() returns != 0 and
> > rq_data_dir(req) == WRITE?
> Since the last bit of DISCARD/WRITE ZEROES commands are already 1, so I said we don't need to set
> T_OUT bit to DISCARD/WRITE ZEROES commands again. But the original logic for WRITE, T_OUT is still
> needed, so just keep the original code here is fine.

Okay, I understand what you meant now.  Thanks!

Stefan

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox