* RE: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands support
From: Liu, Changpeng @ 2018-06-11 3:37 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: pbonzini@redhat.com, cavery@redhat.com,
virtualization@lists.linux-foundation.org
In-Reply-To: <20180608102027.GB31164@stefanha-x1.localdomain>
> -----Original Message-----
> From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> Sent: Friday, June 8, 2018 6:20 PM
> To: Liu, Changpeng <changpeng.liu@intel.com>
> Cc: virtualization@lists.linux-foundation.org; cavery@redhat.com;
> jasowang@redhat.com; pbonzini@redhat.com; Wang, Wei W
> <wei.w.wang@intel.com>
> Subject: Re: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands
> support
>
> On Thu, Jun 07, 2018 at 11:07:06PM +0000, Liu, Changpeng wrote:
> >
> >
> > > -----Original Message-----
> > > From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> > > Sent: Thursday, June 7, 2018 9:10 PM
> > > To: Liu, Changpeng <changpeng.liu@intel.com>
> > > Cc: virtualization@lists.linux-foundation.org; cavery@redhat.com;
> > > jasowang@redhat.com; pbonzini@redhat.com; Wang, Wei W
> > > <wei.w.wang@intel.com>
> > > Subject: Re: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands
> > > support
> > >
> > > On Wed, Jun 06, 2018 at 12:19:00PM +0800, Changpeng Liu wrote:
> > > > Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES commands
> > > > support, this will impact the performance when using SSD backend over
> > > > file systems.
> > > >
> > > > Commit 88c85538 "virtio-blk: add discard and write zeroes features to
> > > > specification"(see https://github.com/oasis-tcs/virtio-spec) extended
> > > > existing virtio-blk protocol, adding extra DISCARD and WRITE ZEROES
> > > > commands support.
> > > >
> > > > While here, using 16 bytes descriptor to describe one segment of DISCARD
> > > > or WRITE ZEROES commands, each command may contain one or more
> > > decriptors.
> > > >
> > > > The following data structure shows the definition of one descriptor:
> > > >
> > > > struct virtio_blk_discard_write_zeroes {
> > > > le64 sector;
> > > > le32 num_sectors;
> > > > le32 unmap;
> > > > };
> > > >
> > > > Field 'sector' means the start sector for DISCARD and WRITE ZEROES,
> > > > filed 'num_sectors' means the number of sectors for DISCARD and WRITE
> > > > ZEROES, if both DISCARD and WRITE ZEROES are supported, field 'unmap'
> > > > maybe used for WRITE ZEROES command with DISCARD enabled.
> > > >
> > > > We also extended the virtio-blk configuration space to let backend
> > > > device put DISCARD and WRITE ZEROES configuration parameters.
> > > >
> > > > struct virtio_blk_config {
> > > > [...]
> > > >
> > > > le32 max_discard_sectors;
> > > > le32 max_discard_seg;
> > > > le32 discard_sector_alignment;
> > > > le32 max_write_zeroes_sectors;
> > > > le32 max_write_zeroes_seg;
> > > > u8 write_zeroes_may_unmap;
> > > > }
> > > >
> > > > New feature bit [VIRTIO_BLK_F_DISCARD (13)]: Device can support discard
> > > > command, maximum discard sectors size in field 'max_discard_sectors' and
> > > > maximum discard segment number in field 'max_discard_seg'.
> > > >
> > > > New feature [VIRTIO_BLK_F_WRITE_ZEROES (14)]: Device can support write
> > > > zeroes command, maximum write zeroes sectors size in field
> > > > 'max_write_zeroes_sectors' and maximum write zeroes segment number in
> > > > field 'max_write_zeroes_seg'.
> > > >
> > > > The parameters in the configuration space of the device field
> > > > 'max_discard_sectors' and field 'discard_sector_alignment' are expressed in
> > > > 512-byte units if the VIRTIO_BLK_F_DISCARD feature bit is negotiated. The
> > > > field 'max_write_zeroes_sectors' is expressed in 512-byte units if the
> > > > VIRTIO_BLK_F_WRITE_ZEROES feature bit is negotiated.
> > > >
> > > > Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
> > > > ---
> > > > CHANGELOG:
> > > > v6: don't set T_OUT bit to discard and write zeroes commands.
> > >
> > > I don't see this in the patch...
> > Yeah, do noting with DISCARD/WRITE ZEROES means no need to OR BLK_T_OUT
> again.
> > >
> > > > @@ -225,6 +260,7 @@ static blk_status_t virtio_queue_rq(struct
> > > blk_mq_hw_ctx *hctx,
> > > > int qid = hctx->queue_num;
> > > > int err;
> > > > bool notify = false;
> > > > + bool unmap = false;
> > > > u32 type;
> > > >
> > > > BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
> > > > @@ -237,6 +273,13 @@ static blk_status_t virtio_queue_rq(struct
> > > blk_mq_hw_ctx *hctx,
> > > > case REQ_OP_FLUSH:
> > > > type = VIRTIO_BLK_T_FLUSH;
> > > > break;
> > > > + case REQ_OP_DISCARD:
> > > > + type = VIRTIO_BLK_T_DISCARD;
> > > > + break;
> > > > + case REQ_OP_WRITE_ZEROES:
> > > > + type = VIRTIO_BLK_T_WRITE_ZEROES;
> > > > + unmap = !(req->cmd_flags & REQ_NOUNMAP);
> > > > + break;
> > > > case REQ_OP_SCSI_IN:
> > > > case REQ_OP_SCSI_OUT:
> > > > type = VIRTIO_BLK_T_SCSI_CMD;
> > > > @@ -256,6 +299,12 @@ static blk_status_t virtio_queue_rq(struct
> > > blk_mq_hw_ctx *hctx,
> > > >
> > > > blk_mq_start_request(req);
> > > >
> > > > + if (type == VIRTIO_BLK_T_DISCARD || type ==
> > > VIRTIO_BLK_T_WRITE_ZEROES) {
> > > > + err = virtblk_setup_discard_write_zeroes(req, unmap);
> > > > + if (err)
> > > > + return BLK_STS_RESOURCE;
> > > > + }
> > > > +
> > > > num = blk_rq_map_sg(hctx->queue, req, vbr->sg);
> > > > if (num) {
> > > > if (rq_data_dir(req) == WRITE)
> > >
> > > ...since we still do blk_rq_map_sg() here and num should be != 0.
> > No, while here, we should keep the original logic for READ/WRITE commands.
>
> My question is: why does the changelog say "don't set T_OUT" but the
> code *will* set it because blk_rq_map_sg() returns != 0 and
> rq_data_dir(req) == WRITE?
Since the last bit of DISCARD/WRITE ZEROES commands are already 1, so I said we don't need to set
T_OUT bit to DISCARD/WRITE ZEROES commands again. But the original logic for WRITE, T_OUT is still
needed, so just keep the original code here is fine.
>
> Maybe I'm misreading the code, but it looks to me like this patch
> does the opposite of what the changelog says.
>
> Stefan
^ permalink raw reply
* Re: [PATCH v3 6/9] x86: prevent inline distortion by paravirt ops
From: Peter Zijlstra @ 2018-06-11 7:45 UTC (permalink / raw)
To: Nadav Amit
Cc: Juergen Gross, x86, linux-kernel, virtualization, Ingo Molnar,
H. Peter Anvin, Alok Kataria, Thomas Gleixner
In-Reply-To: <20180610141911.52948-7-namit@vmware.com>
On Sun, Jun 10, 2018 at 07:19:08AM -0700, Nadav Amit wrote:
> +/*
> + * This generates an indirect call based on the operation type number.
> + * The type number, computed in PARAVIRT_PATCH, is derived from the
> + * offset into the paravirt_patch_template structure, and can therefore be
> + * freely converted back into a structure offset.
> + */
> +.macro PARAVIRT_ALT type:req clobber:req pv_opptr:req
Unlike the marcro maze you replaced, this has the CALL hardcoded in. So
maybe name this PARAVIRT_CALL instead of PARAVIRT_ALT ?
> +771: ANNOTATE_RETPOLINE_SAFE
> + call *\pv_opptr
> +772: .pushsection .parainstructions,"a"
> + _ASM_ALIGN
> + _ASM_PTR 771b
> + .byte \type
> + .byte 772b-771b
> + .short \clobber
> + .popsection
> +.endm
^ permalink raw reply
* [PULL] vhost: cleanups and fixes
From: Michael S. Tsirkin @ 2018-06-11 16:23 UTC (permalink / raw)
To: Linus Torvalds
Cc: ohad, kevin, kvm, mst, netdev, liang.z.li, linux-remoteproc,
linux-kernel, stable, bjorn.andersson, mhocko, mhocko,
syzbot+87cfa083e727a224754b, akpm, virtualization
The following changes since commit 29dcea88779c856c7dc92040a0c01233263101d4:
Linux 4.17 (2018-06-03 14:15:21 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to aa15783ee62d57d69433101ede3e3ed11e48161d:
virtio: update the comments for transport features (2018-06-07 22:17:40 +0300)
----------------------------------------------------------------
virtio, vhost: features, fixes
VF support for virtio.
Free page hint request support for VM migration.
DMA barriers for virtio strong barriers.
Bugfixes.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
----------------------------------------------------------------
Michael S. Tsirkin (2):
virtio_ring: switch to dma_XX barriers for rpmsg
vhost: fix info leak due to uninitialized memory
Tiwei Bie (2):
virtio_pci: support enabling VFs
virtio: update the comments for transport features
Wei Wang (4):
mm: support reporting free page blocks
virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
mm/page_poison: expose page_poisoning_enabled to kernel modules
virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
drivers/vhost/vhost.c | 3 +
drivers/virtio/virtio_balloon.c | 298 +++++++++++++++++++++++++++++++-----
drivers/virtio/virtio_pci_common.c | 30 ++++
drivers/virtio/virtio_pci_modern.c | 14 ++
include/linux/mm.h | 6 +
include/linux/virtio_ring.h | 4 +-
include/uapi/linux/virtio_balloon.h | 7 +
include/uapi/linux/virtio_config.h | 16 +-
mm/page_alloc.c | 97 ++++++++++++
mm/page_poison.c | 6 +
10 files changed, 439 insertions(+), 42 deletions(-)
^ permalink raw reply
* Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-11 17:26 UTC (permalink / raw)
To: Sridhar Samudrala
Cc: alexander.h.duyck, virtio-dev, qemu-devel, jiri, kubakici, netdev,
virtualization, loseweigh, aaron.f.brown
In-Reply-To: <1525734594-11134-1-git-send-email-sridhar.samudrala@intel.com>
On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote:
> This feature bit can be used by hypervisor to indicate virtio_net device to
> act as a standby for another device with the same MAC address.
>
> I tested this with a small change to the patch to mark the STANDBY feature 'true'
> by default as i am using libvirt to start the VMs.
> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
> XML file?
>
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
So I do not think we can commit to this interface: we
really need to control visibility of the primary device.
However just for testing purposes, we could add a non-stable
interface "x-standby" with the understanding that as any
x- prefix it's unstable and will be changed down the road,
likely in the next release.
> ---
> hw/net/virtio-net.c | 2 ++
> include/standard-headers/linux/virtio_net.h | 3 +++
> 2 files changed, 5 insertions(+)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 90502fca7c..38b3140670 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
> true),
> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
> + false),
> DEFINE_PROP_END_OF_LIST(),
> };
>
> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
> index e9f255ea3f..01ec09684c 100644
> --- a/include/standard-headers/linux/virtio_net.h
> +++ b/include/standard-headers/linux/virtio_net.h
> @@ -57,6 +57,9 @@
> * Steering */
> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
>
> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device
> + * with the same MAC.
> + */
> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
>
> #ifndef VIRTIO_NET_NO_LEGACY
> --
> 2.14.3
^ permalink raw reply
* Re: [PULL] vhost: cleanups and fixes
From: Linus Torvalds @ 2018-06-11 18:32 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: KVM list, Network Development, Linux Kernel Mailing List,
Bjorn Andersson, Andrew Morton, virtualization
In-Reply-To: <20180611192353-mutt-send-email-mst@kernel.org>
On Mon, Jun 11, 2018 at 9:24 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
Is this really a good idea?
Plus it seems entirely broken.
The report_pfn_range() callback is done under the zone lock, but
virtio_balloon_send_free_pages() (which is the only callback used that
I can find) does add_one_sg(), which does virtqueue_add_inbuf(vq, &sg,
1, vq, GFP_KERNEL);
So now we apparently do a GFP_KERNEL allocation insider the mm zone
lock, which is broken on just _so_ many levels.
Pulled and then unpulled again.
Either somebody needs to explain why I'm wrong and you can re-submit
this, or this kind of garbage needs to go away.
I do *not* want to be in the situation where I pull stuff from the
virtio people that adds completely broken core VM functionality.
Because if I'm in that situation, I will stop pulling from you guys.
Seriously. You have *no* place sending me broken shit that is outside
the virtio layer.
Subsystems that break code MM will get shunned. You just aren't
important enough to allow you breaking code VM.
Linus
^ permalink raw reply
* Re: [PULL] vhost: cleanups and fixes
From: Linus Torvalds @ 2018-06-11 18:44 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: KVM list, Network Development, Linux Kernel Mailing List,
Bjorn Andersson, Andrew Morton, virtualization
In-Reply-To: <CA+55aFzrPgnd7hRPrkeV+jX-MSwOZf7T4wKxz66Lk4oub3PZsw@mail.gmail.com>
On Mon, Jun 11, 2018 at 11:32 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So now we apparently do a GFP_KERNEL allocation insider the mm zone
> lock, which is broken on just _so_ many levels.
Oh, I see the comment about how it doesn't actually do an allocation
at all because it's a single-entry.
Still too damn ugly to live, and much too fragile. No way in hell do
we even _hint_ at a GFP_KERNEL when we're inside a context that can't
do any memory allocation at all.
Plus I'm not convinced it's a "no allocation" path even despite that
comment, because it also does a "dma_map_page()" etc, which can cause
allocations to do the dma mapping thing afaik. No?
Maybe there's some reason why that doesn't happen either, but
basically this whole callchain looks *way* to complicated to be used
under a core VM spinlock.
Linus
^ permalink raw reply
* Re: [PATCH 1/2] Convert target drivers to use sbitmap
From: Jens Axboe @ 2018-06-12 1:18 UTC (permalink / raw)
To: Matthew Wilcox, linux-kernel, linux-scsi, target-devel,
linux1394-devel, linux-usb, kvm, virtualization, netdev,
Juergen Gross, qla2xxx-upstream, Kent Overstreet
Cc: Matthew Wilcox
In-Reply-To: <3a56027b-47bc-dcb8-a465-3670031572f1@kernel.dk>
On 5/15/18 10:11 AM, Jens Axboe wrote:
> On 5/15/18 10:00 AM, Matthew Wilcox wrote:
>> From: Matthew Wilcox <mawilcox@microsoft.com>
>>
>> The sbitmap and the percpu_ida perform essentially the same task,
>> allocating tags for commands. Since the sbitmap is more used than
>> the percpu_ida, convert the percpu_ida users to the sbitmap API.
>
> It should also be the same performance as percpu_ida in light use, and
> performs much better at > 50% utilization of the tag space. I think
> that's better justification than "more used than".
>
>> diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
>> index 4435bf374d2d..28bcffae609f 100644
>> --- a/drivers/target/iscsi/iscsi_target_util.c
>> +++ b/drivers/target/iscsi/iscsi_target_util.c
>> @@ -17,7 +17,7 @@
>> ******************************************************************************/
>>
>> #include <linux/list.h>
>> -#include <linux/percpu_ida.h>
>> +#include <linux/sched/signal.h>
>> #include <net/ipv6.h> /* ipv6_addr_equal() */
>> #include <scsi/scsi_tcq.h>
>> #include <scsi/iscsi_proto.h>
>> @@ -147,6 +147,28 @@ void iscsit_free_r2ts_from_list(struct iscsi_cmd *cmd)
>> spin_unlock_bh(&cmd->r2t_lock);
>> }
>>
>> +int iscsit_wait_for_tag(struct se_session *se_sess, int state, int *cpup)
>> +{
>> + int tag = -1;
>> + DEFINE_WAIT(wait);
>> + struct sbq_wait_state *ws;
>> +
>> + if (state == TASK_RUNNING)
>> + return tag;
>> +
>> + ws = &se_sess->sess_tag_pool.ws[0];
>> + for (;;) {
>> + prepare_to_wait_exclusive(&ws->wait, &wait, state);
>> + if (signal_pending_state(state, current))
>> + break;
>> + schedule();
>> + tag = sbitmap_queue_get(&se_sess->sess_tag_pool, cpup);
>> + }
>> +
>> + finish_wait(&ws->wait, &wait);
>> + return tag;
>> +}
>
> Seems like that should be:
>
>
> ws = &se_sess->sess_tag_pool.ws[0];
> for (;;) {
> prepare_to_wait_exclusive(&ws->wait, &wait, state);
> if (signal_pending_state(state, current))
> break;
> tag = sbitmap_queue_get(&se_sess->sess_tag_pool, cpup);
> if (tag != -1)
> break;
> schedule();
> }
>
> finish_wait(&ws->wait, &wait);
> return tag;
>
>> /*
>> * May be called from software interrupt (timer) context for allocating
>> * iSCSI NopINs.
>> @@ -155,9 +177,11 @@ struct iscsi_cmd *iscsit_allocate_cmd(struct iscsi_conn *conn, int state)
>> {
>> struct iscsi_cmd *cmd;
>> struct se_session *se_sess = conn->sess->se_sess;
>> - int size, tag;
>> + int size, tag, cpu;
>>
>> - tag = percpu_ida_alloc(&se_sess->sess_tag_pool, state);
>> + tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
>> + if (tag < 0)
>> + tag = iscsit_wait_for_tag(se_sess, state, &cpu);
>> if (tag < 0)
>> return NULL;
>
> Might make sense to just roll the whole thing into iscsi_get_tag(), that
> would be cleaner.
>
> sbitmap should provide a helper for that, but we can do that cleanup
> later. That would encapsulate things like the per-cpu caching hint too,
> for instance.
>
> Rest looks fine to me.
Are you going to push this further? I really think we should.
--
Jens Axboe
^ permalink raw reply
* Re: [PULL] vhost: cleanups and fixes
From: Michael S. Tsirkin @ 2018-06-12 1:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: KVM list, Network Development, Linux Kernel Mailing List,
Bjorn Andersson, Andrew Morton, virtualization
In-Reply-To: <CA+55aFz4uBHaWnBarVUEG90s2ucyVtoLmwYpYVwDV+XQESNRqw@mail.gmail.com>
On Mon, Jun 11, 2018 at 11:44:05AM -0700, Linus Torvalds wrote:
> On Mon, Jun 11, 2018 at 11:32 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So now we apparently do a GFP_KERNEL allocation insider the mm zone
> > lock, which is broken on just _so_ many levels.
>
> Oh, I see the comment about how it doesn't actually do an allocation
> at all because it's a single-entry.
>
> Still too damn ugly to live, and much too fragile. No way in hell do
> we even _hint_ at a GFP_KERNEL when we're inside a context that can't
> do any memory allocation at all.
>
> Plus I'm not convinced it's a "no allocation" path even despite that
> comment, because it also does a "dma_map_page()" etc, which can cause
> allocations to do the dma mapping thing afaik. No?
Well no because DMA is triggered by the IOMMU flag and
that is always off for the balloon. But I hear what you
are saying about it being fragile.
> Maybe there's some reason why that doesn't happen either, but
> basically this whole callchain looks *way* to complicated to be used
> under a core VM spinlock.
>
> Linus
Maybe it will help to have GFP_NONE which will make any allocation
fail if attempted. Linus, would this address your comment?
--
MST
^ permalink raw reply
* Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Jason Wang @ 2018-06-12 1:54 UTC (permalink / raw)
To: Michael S. Tsirkin, Sridhar Samudrala
Cc: alexander.h.duyck, virtio-dev, aaron.f.brown, jiri, kubakici,
netdev, qemu-devel, loseweigh, virtualization
In-Reply-To: <20180611202207-mutt-send-email-mst@kernel.org>
On 2018年06月12日 01:26, Michael S. Tsirkin wrote:
> On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote:
>> This feature bit can be used by hypervisor to indicate virtio_net device to
>> act as a standby for another device with the same MAC address.
>>
>> I tested this with a small change to the patch to mark the STANDBY feature 'true'
>> by default as i am using libvirt to start the VMs.
>> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
>> XML file?
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> So I do not think we can commit to this interface: we
> really need to control visibility of the primary device.
The problem is legacy guest won't use primary device at all if we do this.
How about control the visibility of standby device?
Thanks
> However just for testing purposes, we could add a non-stable
> interface "x-standby" with the understanding that as any
> x- prefix it's unstable and will be changed down the road,
> likely in the next release.
>
>
>> ---
>> hw/net/virtio-net.c | 2 ++
>> include/standard-headers/linux/virtio_net.h | 3 +++
>> 2 files changed, 5 insertions(+)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 90502fca7c..38b3140670 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>> true),
>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
>> + false),
>> DEFINE_PROP_END_OF_LIST(),
>> };
>>
>> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
>> index e9f255ea3f..01ec09684c 100644
>> --- a/include/standard-headers/linux/virtio_net.h
>> +++ b/include/standard-headers/linux/virtio_net.h
>> @@ -57,6 +57,9 @@
>> * Steering */
>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
>>
>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device
>> + * with the same MAC.
>> + */
>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
>>
>> #ifndef VIRTIO_NET_NO_LEGACY
>> --
>> 2.14.3
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PULL] vhost: cleanups and fixes
From: Michael S. Tsirkin @ 2018-06-12 1:57 UTC (permalink / raw)
To: Linus Torvalds
Cc: nilal, KVM list, Network Development, Linux Kernel Mailing List,
Bjorn Andersson, Andrew Morton, virtualization
In-Reply-To: <CA+55aFzrPgnd7hRPrkeV+jX-MSwOZf7T4wKxz66Lk4oub3PZsw@mail.gmail.com>
On Mon, Jun 11, 2018 at 11:32:41AM -0700, Linus Torvalds wrote:
> On Mon, Jun 11, 2018 at 9:24 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
>
> Is this really a good idea?
Well knowing which pages are unused does seem to be useful. Do you
refer to this generally or to the idea of scanning the free list,
or to the specific implementation issues you listed below?
> Plus it seems entirely broken.
>
> The report_pfn_range() callback is done under the zone lock, but
> virtio_balloon_send_free_pages() (which is the only callback used that
> I can find) does add_one_sg(), which does virtqueue_add_inbuf(vq, &sg,
> 1, vq, GFP_KERNEL);
>
> So now we apparently do a GFP_KERNEL allocation insider the mm zone
> lock, which is broken on just _so_ many levels.
>
> Pulled and then unpulled again.
>
> Either somebody needs to explain why I'm wrong and you can re-submit
> this, or this kind of garbage needs to go away.
>
> I do *not* want to be in the situation where I pull stuff from the
> virtio people that adds completely broken core VM functionality.
>
> Because if I'm in that situation, I will stop pulling from you guys.
> Seriously. You have *no* place sending me broken shit that is outside
> the virtio layer.
>
> Subsystems that break code MM will get shunned. You just aren't
> important enough to allow you breaking code VM.
>
> Linus
So no, it doesn't do any allocations - GFP_KERNEL or otherwise, but I
agree how it might be confusing.
So I'll drop this for now, but it would be nice if there is a bit more
direction on this feature since I know several people are trying to work
on this.
--
MST
^ permalink raw reply
* Re: [PULL] vhost: cleanups and fixes
From: Linus Torvalds @ 2018-06-12 1:59 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: KVM list, Network Development, Linux Kernel Mailing List,
Bjorn Andersson, Andrew Morton, virtualization
In-Reply-To: <20180612042600-mutt-send-email-mst@kernel.org>
On Mon, Jun 11, 2018 at 6:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> Maybe it will help to have GFP_NONE which will make any allocation
> fail if attempted. Linus, would this address your comment?
It would definitely have helped me initially overlook that call chain.
But then when I started looking at the whole dma_map_page() thing, it
just raised my hackles again.
I would seriously suggest having a much simpler version for the "no
allocation, no dma mapping" case, so that it's *obvious* that that
never happens.
So instead of having virtio_balloon_send_free_pages() call a really
generic complex chain of functions that in _some_ cases can do memory
allocation, why isn't there a short-circuited "vitruque_add_datum()"
that is guaranteed to never do anything like that?
Honestly, I look at "add_one_sg()" and it really doesn't make me
happy. It looks hacky as hell. If I read the code right, you're really
trying to just queue up a simple tuple of <pfn,len>, except you encode
it as a page pointer in order to play games with the SG logic, and
then you hmap that to the ring, except in this case it's all a fake
ring that just adds the cpu-physical address instead.
And to figuer that out, it's like five layers of indirection through
different helper functions that *can* do more generic things but in
this case don't.
And you do all of this from a core VM callback function with some
_really_ core VM locks held.
That makes no sense to me.
How about this:
- get rid of all that code
- make the core VM callback save the "these are the free memory
regions" in a fixed and limited array. One that DOES JUST THAT. No
crazy "SG IO dma-mapping function crap". Just a plain array of a fixed
size, pre-allocated for that virtio instance.
- make it obvious that what you do in that sequence is ten
instructions and no allocations ("Look ma, I wrote a value to an array
and incremented the array idex, and I'M DONE")
- then in that workqueue entry that you start *anyway*, you empty the
array and do all the crazy virtio stuff.
In fact, while at it, just simplify the VM interface too. Instead of
traversing a random number of buddy lists, just trraverse *one* - the
top-level one. Are you seriously ever going to shrink or mark
read-only anythin *but* something big enough to be in the maximum
order?
MAX_ORDER is what, 11? So we're talking 8MB blocks. Do you *really*
want the balloon code to work on smaller things, particularly since
the whole interface is fundamentally racy and opportunistic to begin
with?
The whole sequence of events really looks "this is too much
complexity, and way too fragile" to me at so many levels.
Linus
^ permalink raw reply
* Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-12 2:17 UTC (permalink / raw)
To: Jason Wang
Cc: alexander.h.duyck, virtio-dev, aaron.f.brown, jiri, kubakici,
Sridhar Samudrala, qemu-devel, loseweigh, netdev, virtualization
In-Reply-To: <dc8b1669-b8b6-fb59-94e2-6e4de6b96572@redhat.com>
On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote:
>
>
> On 2018年06月12日 01:26, Michael S. Tsirkin wrote:
> > On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote:
> > > This feature bit can be used by hypervisor to indicate virtio_net device to
> > > act as a standby for another device with the same MAC address.
> > >
> > > I tested this with a small change to the patch to mark the STANDBY feature 'true'
> > > by default as i am using libvirt to start the VMs.
> > > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
> > > XML file?
> > >
> > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > So I do not think we can commit to this interface: we
> > really need to control visibility of the primary device.
>
> The problem is legacy guest won't use primary device at all if we do this.
And that's by design - I think it's the only way to ensure the
legacy guest isn't confused.
> How about control the visibility of standby device?
>
> Thanks
standy the always there to guarantee no downtime.
> > However just for testing purposes, we could add a non-stable
> > interface "x-standby" with the understanding that as any
> > x- prefix it's unstable and will be changed down the road,
> > likely in the next release.
> >
> >
> > > ---
> > > hw/net/virtio-net.c | 2 ++
> > > include/standard-headers/linux/virtio_net.h | 3 +++
> > > 2 files changed, 5 insertions(+)
> > >
> > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > index 90502fca7c..38b3140670 100644
> > > --- a/hw/net/virtio-net.c
> > > +++ b/hw/net/virtio-net.c
> > > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
> > > true),
> > > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
> > > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> > > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
> > > + false),
> > > DEFINE_PROP_END_OF_LIST(),
> > > };
> > > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
> > > index e9f255ea3f..01ec09684c 100644
> > > --- a/include/standard-headers/linux/virtio_net.h
> > > +++ b/include/standard-headers/linux/virtio_net.h
> > > @@ -57,6 +57,9 @@
> > > * Steering */
> > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
> > > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device
> > > + * with the same MAC.
> > > + */
> > > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
> > > #ifndef VIRTIO_NET_NO_LEGACY
> > > --
> > > 2.14.3
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH 1/2] Convert target drivers to use sbitmap
From: Matthew Wilcox @ 2018-06-12 3:06 UTC (permalink / raw)
To: Jens Axboe
Cc: Juergen Gross, kvm, linux-scsi, Matthew Wilcox, netdev, linux-usb,
linux-kernel, virtualization, target-devel, qla2xxx-upstream,
linux1394-devel, Kent Overstreet
In-Reply-To: <02326395-3241-c94b-ad70-3de27a6f5a8c@kernel.dk>
On Mon, Jun 11, 2018 at 07:18:55PM -0600, Jens Axboe wrote:
> Are you going to push this further? I really think we should.
Yes, I'll resubmit it tomorrow. Sorry for dropping the ball on this one.
^ permalink raw reply
* Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Samudrala, Sridhar @ 2018-06-12 5:02 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang
Cc: alexander.h.duyck, virtio-dev, aaron.f.brown, jiri, kubakici,
netdev, qemu-devel, loseweigh, virtualization
In-Reply-To: <20180612051432-mutt-send-email-mst@kernel.org>
On 6/11/2018 7:17 PM, Michael S. Tsirkin wrote:
> On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote:
>>
>> On 2018年06月12日 01:26, Michael S. Tsirkin wrote:
>>> On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote:
>>>> This feature bit can be used by hypervisor to indicate virtio_net device to
>>>> act as a standby for another device with the same MAC address.
>>>>
>>>> I tested this with a small change to the patch to mark the STANDBY feature 'true'
>>>> by default as i am using libvirt to start the VMs.
>>>> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
>>>> XML file?
>>>>
>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>> So I do not think we can commit to this interface: we
>>> really need to control visibility of the primary device.
>> The problem is legacy guest won't use primary device at all if we do this.
> And that's by design - I think it's the only way to ensure the
> legacy guest isn't confused.
Yes. I think so. But i am not sure if Qemu is the right place to control the visibility
of the primary device. The primary device may not be specified as an argument to Qemu. It
may be plugged in later.
The cloud service provider is providing a feature that enables low latency datapath and live
migration capability.
A tenant can use this feature only if he is running a VM that has virtio-net with failover support.
I think Qemu should check if guest virtio-net supports this feature and provide a mechanism for
an upper layer indicating if the STANDBY feature is successfully negotiated or not.
The upper layer can then decide if it should hot plug a VF with the same MAC and manage the 2 links.
If VF is successfully hot plugged, virtio-net link should be disabled.
>
>> How about control the visibility of standby device?
>>
>> Thanks
> standy the always there to guarantee no downtime.
>
>>> However just for testing purposes, we could add a non-stable
>>> interface "x-standby" with the understanding that as any
>>> x- prefix it's unstable and will be changed down the road,
>>> likely in the next release.
>>>
>>>
>>>> ---
>>>> hw/net/virtio-net.c | 2 ++
>>>> include/standard-headers/linux/virtio_net.h | 3 +++
>>>> 2 files changed, 5 insertions(+)
>>>>
>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>> index 90502fca7c..38b3140670 100644
>>>> --- a/hw/net/virtio-net.c
>>>> +++ b/hw/net/virtio-net.c
>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>>>> true),
>>>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>>>> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>>>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
>>>> + false),
>>>> DEFINE_PROP_END_OF_LIST(),
>>>> };
>>>> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
>>>> index e9f255ea3f..01ec09684c 100644
>>>> --- a/include/standard-headers/linux/virtio_net.h
>>>> +++ b/include/standard-headers/linux/virtio_net.h
>>>> @@ -57,6 +57,9 @@
>>>> * Steering */
>>>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
>>>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device
>>>> + * with the same MAC.
>>>> + */
>>>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
>>>> #ifndef VIRTIO_NET_NO_LEGACY
>>>> --
>>>> 2.14.3
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PULL] vhost: cleanups and fixes
From: Wei Wang @ 2018-06-12 11:05 UTC (permalink / raw)
To: Linus Torvalds, Michael S. Tsirkin
Cc: KVM list, Network Development, Linux Kernel Mailing List,
Bjorn Andersson, Andrew Morton, virtualization
In-Reply-To: <CA+55aFyNhEzzufw0XP9DcqZNS1CH+jDGdN4CVnazb3ssFxFbzQ@mail.gmail.com>
On 06/12/2018 09:59 AM, Linus Torvalds wrote:
> On Mon, Jun 11, 2018 at 6:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>> Maybe it will help to have GFP_NONE which will make any allocation
>> fail if attempted. Linus, would this address your comment?
> It would definitely have helped me initially overlook that call chain.
>
> But then when I started looking at the whole dma_map_page() thing, it
> just raised my hackles again.
>
> I would seriously suggest having a much simpler version for the "no
> allocation, no dma mapping" case, so that it's *obvious* that that
> never happens.
>
> So instead of having virtio_balloon_send_free_pages() call a really
> generic complex chain of functions that in _some_ cases can do memory
> allocation, why isn't there a short-circuited "vitruque_add_datum()"
> that is guaranteed to never do anything like that?
>
> Honestly, I look at "add_one_sg()" and it really doesn't make me
> happy. It looks hacky as hell. If I read the code right, you're really
> trying to just queue up a simple tuple of <pfn,len>, except you encode
> it as a page pointer in order to play games with the SG logic, and
> then you hmap that to the ring, except in this case it's all a fake
> ring that just adds the cpu-physical address instead.
>
> And to figuer that out, it's like five layers of indirection through
> different helper functions that *can* do more generic things but in
> this case don't.
>
> And you do all of this from a core VM callback function with some
> _really_ core VM locks held.
>
> That makes no sense to me.
>
> How about this:
>
> - get rid of all that code
>
> - make the core VM callback save the "these are the free memory
> regions" in a fixed and limited array. One that DOES JUST THAT. No
> crazy "SG IO dma-mapping function crap". Just a plain array of a fixed
> size, pre-allocated for that virtio instance.
>
> - make it obvious that what you do in that sequence is ten
> instructions and no allocations ("Look ma, I wrote a value to an array
> and incremented the array idex, and I'M DONE")
>
> - then in that workqueue entry that you start *anyway*, you empty the
> array and do all the crazy virtio stuff.
>
> In fact, while at it, just simplify the VM interface too. Instead of
> traversing a random number of buddy lists, just trraverse *one* - the
> top-level one. Are you seriously ever going to shrink or mark
> read-only anythin *but* something big enough to be in the maximum
> order?
>
> MAX_ORDER is what, 11? So we're talking 8MB blocks. Do you *really*
> want the balloon code to work on smaller things, particularly since
> the whole interface is fundamentally racy and opportunistic to begin
> with?
OK, I will implement a new version based on the suggestions. Thanks.
Best,
Wei
^ permalink raw reply
* Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-12 11:34 UTC (permalink / raw)
To: Samudrala, Sridhar
Cc: alexander.h.duyck, virtio-dev, aaron.f.brown, jiri, kubakici,
netdev, qemu-devel, loseweigh, virtualization
In-Reply-To: <23fc4aa4-ec41-d6e2-3354-10cbfc13b7ec@intel.com>
On Mon, Jun 11, 2018 at 10:02:45PM -0700, Samudrala, Sridhar wrote:
> On 6/11/2018 7:17 PM, Michael S. Tsirkin wrote:
> > On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote:
> > >
> > > On 2018年06月12日 01:26, Michael S. Tsirkin wrote:
> > > > On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote:
> > > > > This feature bit can be used by hypervisor to indicate virtio_net device to
> > > > > act as a standby for another device with the same MAC address.
> > > > >
> > > > > I tested this with a small change to the patch to mark the STANDBY feature 'true'
> > > > > by default as i am using libvirt to start the VMs.
> > > > > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
> > > > > XML file?
> > > > >
> > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > > > So I do not think we can commit to this interface: we
> > > > really need to control visibility of the primary device.
> > > The problem is legacy guest won't use primary device at all if we do this.
> > And that's by design - I think it's the only way to ensure the
> > legacy guest isn't confused.
>
> Yes. I think so. But i am not sure if Qemu is the right place to control the visibility
> of the primary device. The primary device may not be specified as an argument to Qemu. It
> may be plugged in later.
> The cloud service provider is providing a feature that enables low latency datapath and live
> migration capability.
> A tenant can use this feature only if he is running a VM that has virtio-net with failover support.
Well live migration is there already. The new feature is low latency
data path.
And it's the guest that needs failover support not the VM.
> I think Qemu should check if guest virtio-net supports this feature and provide a mechanism for
> an upper layer indicating if the STANDBY feature is successfully negotiated or not.
> The upper layer can then decide if it should hot plug a VF with the same MAC and manage the 2 links.
> If VF is successfully hot plugged, virtio-net link should be disabled.
Did you even talk to upper layer management about it?
Just list the steps they need to do and you will see
that's a lot of machinery to manage by the upper layer.
What do we gain in flexibility? As far as I can see the
only gain is some resources saved for legacy VMs.
That's not a lot as tenant of the upper layer probably already has
at least a hunch that it's a new guest otherwise
why bother specifying the feature at all - you
save even more resources without it.
>
> >
> > > How about control the visibility of standby device?
> > >
> > > Thanks
> > standy the always there to guarantee no downtime.
> >
> > > > However just for testing purposes, we could add a non-stable
> > > > interface "x-standby" with the understanding that as any
> > > > x- prefix it's unstable and will be changed down the road,
> > > > likely in the next release.
> > > >
> > > >
> > > > > ---
> > > > > hw/net/virtio-net.c | 2 ++
> > > > > include/standard-headers/linux/virtio_net.h | 3 +++
> > > > > 2 files changed, 5 insertions(+)
> > > > >
> > > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > > > index 90502fca7c..38b3140670 100644
> > > > > --- a/hw/net/virtio-net.c
> > > > > +++ b/hw/net/virtio-net.c
> > > > > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
> > > > > true),
> > > > > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
> > > > > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> > > > > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
> > > > > + false),
> > > > > DEFINE_PROP_END_OF_LIST(),
> > > > > };
> > > > > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
> > > > > index e9f255ea3f..01ec09684c 100644
> > > > > --- a/include/standard-headers/linux/virtio_net.h
> > > > > +++ b/include/standard-headers/linux/virtio_net.h
> > > > > @@ -57,6 +57,9 @@
> > > > > * Steering */
> > > > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
> > > > > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device
> > > > > + * with the same MAC.
> > > > > + */
> > > > > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
> > > > > #ifndef VIRTIO_NET_NO_LEGACY
> > > > > --
> > > > > 2.14.3
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-12 11:47 UTC (permalink / raw)
To: Siwei Liu
Cc: Alexander Duyck, virtio-dev, qemu-devel, Samudrala, Sridhar,
virtualization
In-Reply-To: <CADGSJ21AMp02vV+qxa+DuyvAUCCsXUFwV7wjzQ1Vy1jKV=n1HA@mail.gmail.com>
On Tue, Jun 05, 2018 at 03:09:26PM -0700, Siwei Liu wrote:
> The thing is cloud service provider might prefer sticking to the same
> level of service agreement (SLA) of keeping VF over migration,
That requirement is trivially satisfied by just a single VF :) If your
SLA does not require live migration, you should do just that.
--
MST
^ permalink raw reply
* Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-12 11:54 UTC (permalink / raw)
To: Jason Wang
Cc: alexander.h.duyck, virtio-dev, Samudrala, Sridhar, qemu-devel,
virtualization
In-Reply-To: <a45625c9-d2cd-f8f7-ce31-37c558e70b88@redhat.com>
On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote:
>
>
> On 2018年06月05日 20:33, Michael S. Tsirkin wrote:
> > I don't think this is sufficient.
> >
> > If both primary and standby devices are present, a legacy guest without
> > support for the feature might see two devices with same mac and get
> > confused.
> >
> > I think that we should only make primary visible after guest acked the
> > backup feature bit.
>
> I think we want exactly the reverse? E.g fail the negotiation when guest
> does not ack backup feature.
>
> Otherwise legacy guest won't even have the chance to see primary device in
> the guest.
That's by design.
> >
> > And on reset or when backup is cleared in some other way, unplug the
> > primary.
>
> What if guest does not support hotplug?
It shouldn't ack the backup feature then and it's a good point.
We should both document this and check kernel config has
hotplug support. Sridhar could you take a look pls?
> >
> > Something like the below will do the job:
> >
> > Primary device is added with a special "primary-failover" flag.
> > A virtual machine is then initialized with just a standby virtio
> > device. Primary is not yet added.
>
> A question is how to do the matching? Qemu knows nothing about e.g mac
> address of a pass-through device I believe?
Supply a flag to the VFIO when it's added, this way QEMU will know.
> >
> > Later QEMU detects that guest driver device set DRIVER_OK.
> > It then exposes the primary device to the guest, and triggers
> > a device addition event (hot-plug event) for it.
>
> Do you mean we won't have primary device in the initial qemu cli?
No, that's not what I mean.
I mean management will supply a flag to VFIO and then
- VFIO defers exposing
primary to guest until guest acks the feature bit.
- When we see guest ack, initiate hotplug.
- On reboot, hide it again.
- On reset without reboot, request hot-unplug and on eject hide it again.
> >
> > If QEMU detects guest driver removal, it initiates a hot-unplug sequence
> > to remove the primary driver. In particular, if QEMU detects guest
> > re-initialization (e.g. by detecting guest reset) it immediately removes
> > the primary device.
>
> I believe guest failover module should handle this gracefully?
It can't control enumeration order, if primary is enumerated before
standby then guest will load its driver and it's too late
when failover driver is loaded.
> Thanks
>
> >
> > We can move some of this code to management as well, architecturally it
> > does not make too much sense but it might be easier implementation-wise.
> >
> > HTH
> >
> > On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote:
> > > Ping on this patch now that the kernel patches are accepted into davem's net-next tree.
> > > https://patchwork.ozlabs.org/cover/920005/
> > >
> > >
> > > On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
> > > > This feature bit can be used by hypervisor to indicate virtio_net device to
> > > > act as a standby for another device with the same MAC address.
> > > >
> > > > I tested this with a small change to the patch to mark the STANDBY feature 'true'
> > > > by default as i am using libvirt to start the VMs.
> > > > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
> > > > XML file?
> > > >
> > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > > > ---
> > > > hw/net/virtio-net.c | 2 ++
> > > > include/standard-headers/linux/virtio_net.h | 3 +++
> > > > 2 files changed, 5 insertions(+)
> > > >
> > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > > index 90502fca7c..38b3140670 100644
> > > > --- a/hw/net/virtio-net.c
> > > > +++ b/hw/net/virtio-net.c
> > > > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
> > > > true),
> > > > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
> > > > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> > > > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
> > > > + false),
> > > > DEFINE_PROP_END_OF_LIST(),
> > > > };
> > > > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
> > > > index e9f255ea3f..01ec09684c 100644
> > > > --- a/include/standard-headers/linux/virtio_net.h
> > > > +++ b/include/standard-headers/linux/virtio_net.h
> > > > @@ -57,6 +57,9 @@
> > > > * Steering */
> > > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
> > > > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device
> > > > + * with the same MAC.
> > > > + */
> > > > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
> > > > #ifndef VIRTIO_NET_NO_LEGACY
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH 1/2] Convert target drivers to use sbitmap
From: Bart Van Assche @ 2018-06-12 15:22 UTC (permalink / raw)
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-usb@vger.kernel.org, willy@infradead.org,
virtualization@lists.linux-foundation.org,
kent.overstreet@gmail.com, linux1394-devel@lists.sourceforge.net,
jgross@suse.com, axboe@kernel.dk, linux-scsi@vger.kernel.org,
qla2xxx-upstream@qlogic.com, target-devel@vger.kernel.org,
netdev@vger.kernel.org
Cc: mawilcox@microsoft.com
In-Reply-To: <20180515160043.27044-2-willy@infradead.org>
On Tue, 2018-05-15 at 09:00 -0700, Matthew Wilcox wrote:
> diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
> index 025dc2d3f3de..cdf671c2af61 100644
> --- a/drivers/scsi/qla2xxx/qla_target.c
> +++ b/drivers/scsi/qla2xxx/qla_target.c
> @@ -3719,7 +3719,8 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
> return;
> }
> cmd->jiffies_at_free = get_jiffies_64();
> - percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
> + sbitmap_queue_clear(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag,
> + cmd->se_cmd.map_cpu);
> }
> EXPORT_SYMBOL(qlt_free_cmd);
Please introduce functions in the target core for allocating and freeing a tag
instead of spreading the knowledge of how to allocate and free tags over all
target drivers.
> +int iscsit_wait_for_tag(struct se_session *se_sess, int state, int *cpup)
> +{
> + int tag = -1;
> + DEFINE_WAIT(wait);
> + struct sbq_wait_state *ws;
> +
> + if (state == TASK_RUNNING)
> + return tag;
> +
> + ws = &se_sess->sess_tag_pool.ws[0];
> + for (;;) {
> + prepare_to_wait_exclusive(&ws->wait, &wait, state);
> + if (signal_pending_state(state, current))
> + break;
This looks weird to me. Shouldn't target code ignore signals instead of causing
tag allocation to fail if a signal is received?
> + schedule();
> + tag = sbitmap_queue_get(&se_sess->sess_tag_pool, cpup);
> + }
> +
> + finish_wait(&ws->wait, &wait);
> + return tag;
> +}
Thanks,
Bart.
^ permalink raw reply
* Re: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands support
From: Stefan Hajnoczi @ 2018-06-12 16:05 UTC (permalink / raw)
To: Liu, Changpeng
Cc: pbonzini@redhat.com, cavery@redhat.com,
virtualization@lists.linux-foundation.org
In-Reply-To: <FF7FC980937D6342B9D289F5F3C7C2625B6DF62C@SHSMSX103.ccr.corp.intel.com>
[-- Attachment #1.1: Type: text/plain, Size: 6316 bytes --]
On Mon, Jun 11, 2018 at 03:37:00AM +0000, Liu, Changpeng wrote:
>
>
> > -----Original Message-----
> > From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> > Sent: Friday, June 8, 2018 6:20 PM
> > To: Liu, Changpeng <changpeng.liu@intel.com>
> > Cc: virtualization@lists.linux-foundation.org; cavery@redhat.com;
> > jasowang@redhat.com; pbonzini@redhat.com; Wang, Wei W
> > <wei.w.wang@intel.com>
> > Subject: Re: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands
> > support
> >
> > On Thu, Jun 07, 2018 at 11:07:06PM +0000, Liu, Changpeng wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> > > > Sent: Thursday, June 7, 2018 9:10 PM
> > > > To: Liu, Changpeng <changpeng.liu@intel.com>
> > > > Cc: virtualization@lists.linux-foundation.org; cavery@redhat.com;
> > > > jasowang@redhat.com; pbonzini@redhat.com; Wang, Wei W
> > > > <wei.w.wang@intel.com>
> > > > Subject: Re: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands
> > > > support
> > > >
> > > > On Wed, Jun 06, 2018 at 12:19:00PM +0800, Changpeng Liu wrote:
> > > > > Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES commands
> > > > > support, this will impact the performance when using SSD backend over
> > > > > file systems.
> > > > >
> > > > > Commit 88c85538 "virtio-blk: add discard and write zeroes features to
> > > > > specification"(see https://github.com/oasis-tcs/virtio-spec) extended
> > > > > existing virtio-blk protocol, adding extra DISCARD and WRITE ZEROES
> > > > > commands support.
> > > > >
> > > > > While here, using 16 bytes descriptor to describe one segment of DISCARD
> > > > > or WRITE ZEROES commands, each command may contain one or more
> > > > decriptors.
> > > > >
> > > > > The following data structure shows the definition of one descriptor:
> > > > >
> > > > > struct virtio_blk_discard_write_zeroes {
> > > > > le64 sector;
> > > > > le32 num_sectors;
> > > > > le32 unmap;
> > > > > };
> > > > >
> > > > > Field 'sector' means the start sector for DISCARD and WRITE ZEROES,
> > > > > filed 'num_sectors' means the number of sectors for DISCARD and WRITE
> > > > > ZEROES, if both DISCARD and WRITE ZEROES are supported, field 'unmap'
> > > > > maybe used for WRITE ZEROES command with DISCARD enabled.
> > > > >
> > > > > We also extended the virtio-blk configuration space to let backend
> > > > > device put DISCARD and WRITE ZEROES configuration parameters.
> > > > >
> > > > > struct virtio_blk_config {
> > > > > [...]
> > > > >
> > > > > le32 max_discard_sectors;
> > > > > le32 max_discard_seg;
> > > > > le32 discard_sector_alignment;
> > > > > le32 max_write_zeroes_sectors;
> > > > > le32 max_write_zeroes_seg;
> > > > > u8 write_zeroes_may_unmap;
> > > > > }
> > > > >
> > > > > New feature bit [VIRTIO_BLK_F_DISCARD (13)]: Device can support discard
> > > > > command, maximum discard sectors size in field 'max_discard_sectors' and
> > > > > maximum discard segment number in field 'max_discard_seg'.
> > > > >
> > > > > New feature [VIRTIO_BLK_F_WRITE_ZEROES (14)]: Device can support write
> > > > > zeroes command, maximum write zeroes sectors size in field
> > > > > 'max_write_zeroes_sectors' and maximum write zeroes segment number in
> > > > > field 'max_write_zeroes_seg'.
> > > > >
> > > > > The parameters in the configuration space of the device field
> > > > > 'max_discard_sectors' and field 'discard_sector_alignment' are expressed in
> > > > > 512-byte units if the VIRTIO_BLK_F_DISCARD feature bit is negotiated. The
> > > > > field 'max_write_zeroes_sectors' is expressed in 512-byte units if the
> > > > > VIRTIO_BLK_F_WRITE_ZEROES feature bit is negotiated.
> > > > >
> > > > > Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
> > > > > ---
> > > > > CHANGELOG:
> > > > > v6: don't set T_OUT bit to discard and write zeroes commands.
> > > >
> > > > I don't see this in the patch...
> > > Yeah, do noting with DISCARD/WRITE ZEROES means no need to OR BLK_T_OUT
> > again.
> > > >
> > > > > @@ -225,6 +260,7 @@ static blk_status_t virtio_queue_rq(struct
> > > > blk_mq_hw_ctx *hctx,
> > > > > int qid = hctx->queue_num;
> > > > > int err;
> > > > > bool notify = false;
> > > > > + bool unmap = false;
> > > > > u32 type;
> > > > >
> > > > > BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
> > > > > @@ -237,6 +273,13 @@ static blk_status_t virtio_queue_rq(struct
> > > > blk_mq_hw_ctx *hctx,
> > > > > case REQ_OP_FLUSH:
> > > > > type = VIRTIO_BLK_T_FLUSH;
> > > > > break;
> > > > > + case REQ_OP_DISCARD:
> > > > > + type = VIRTIO_BLK_T_DISCARD;
> > > > > + break;
> > > > > + case REQ_OP_WRITE_ZEROES:
> > > > > + type = VIRTIO_BLK_T_WRITE_ZEROES;
> > > > > + unmap = !(req->cmd_flags & REQ_NOUNMAP);
> > > > > + break;
> > > > > case REQ_OP_SCSI_IN:
> > > > > case REQ_OP_SCSI_OUT:
> > > > > type = VIRTIO_BLK_T_SCSI_CMD;
> > > > > @@ -256,6 +299,12 @@ static blk_status_t virtio_queue_rq(struct
> > > > blk_mq_hw_ctx *hctx,
> > > > >
> > > > > blk_mq_start_request(req);
> > > > >
> > > > > + if (type == VIRTIO_BLK_T_DISCARD || type ==
> > > > VIRTIO_BLK_T_WRITE_ZEROES) {
> > > > > + err = virtblk_setup_discard_write_zeroes(req, unmap);
> > > > > + if (err)
> > > > > + return BLK_STS_RESOURCE;
> > > > > + }
> > > > > +
> > > > > num = blk_rq_map_sg(hctx->queue, req, vbr->sg);
> > > > > if (num) {
> > > > > if (rq_data_dir(req) == WRITE)
> > > >
> > > > ...since we still do blk_rq_map_sg() here and num should be != 0.
> > > No, while here, we should keep the original logic for READ/WRITE commands.
> >
> > My question is: why does the changelog say "don't set T_OUT" but the
> > code *will* set it because blk_rq_map_sg() returns != 0 and
> > rq_data_dir(req) == WRITE?
> Since the last bit of DISCARD/WRITE ZEROES commands are already 1, so I said we don't need to set
> T_OUT bit to DISCARD/WRITE ZEROES commands again. But the original logic for WRITE, T_OUT is still
> needed, so just keep the original code here is fine.
Okay, I understand what you meant now. Thanks!
Stefan
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH 1/2] Convert target drivers to use sbitmap
From: Matthew Wilcox @ 2018-06-12 16:15 UTC (permalink / raw)
To: Bart Van Assche
Cc: jgross@suse.com, axboe@kernel.dk, linux-scsi@vger.kernel.org,
kvm@vger.kernel.org, mawilcox@microsoft.com,
netdev@vger.kernel.org, linux-usb@vger.kernel.org,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
target-devel@vger.kernel.org, qla2xxx-upstream@qlogic.com,
linux1394-devel@lists.sourceforge.net, kent.overstreet@gmail.com
In-Reply-To: <da5220a5ed4bed210c31a7517389e787a3b1a01f.camel@wdc.com>
On Tue, Jun 12, 2018 at 03:22:42PM +0000, Bart Van Assche wrote:
> On Tue, 2018-05-15 at 09:00 -0700, Matthew Wilcox wrote:
> > diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
> > index 025dc2d3f3de..cdf671c2af61 100644
> > --- a/drivers/scsi/qla2xxx/qla_target.c
> > +++ b/drivers/scsi/qla2xxx/qla_target.c
> > @@ -3719,7 +3719,8 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
> > return;
> > }
> > cmd->jiffies_at_free = get_jiffies_64();
> > - percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
> > + sbitmap_queue_clear(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag,
> > + cmd->se_cmd.map_cpu);
> > }
> > EXPORT_SYMBOL(qlt_free_cmd);
>
> Please introduce functions in the target core for allocating and freeing a tag
> instead of spreading the knowledge of how to allocate and free tags over all
> target drivers.
I can't without doing an unreasonably large amount of work on drivers that
I have no way to test. Some of the drivers have the se_cmd already; some
of them don't. I'd be happy to introduce a common function for freeing
a tag.
> > +int iscsit_wait_for_tag(struct se_session *se_sess, int state, int *cpup)
> > +{
> > + int tag = -1;
> > + DEFINE_WAIT(wait);
> > + struct sbq_wait_state *ws;
> > +
> > + if (state == TASK_RUNNING)
> > + return tag;
> > +
> > + ws = &se_sess->sess_tag_pool.ws[0];
> > + for (;;) {
> > + prepare_to_wait_exclusive(&ws->wait, &wait, state);
> > + if (signal_pending_state(state, current))
> > + break;
>
> This looks weird to me. Shouldn't target code ignore signals instead of causing
> tag allocation to fail if a signal is received?
It's what the current code did:
- if (signal_pending_state(state, current)) {
- tag = -ERESTARTSYS;
- break;
- }
and the current callers literally indicate that they want signals:
drivers/infiniband/ulp/isert/ib_isert.c: cmd = iscsit_allocate_cmd(conn, TASK_INTERRUPTIBLE);
drivers/target/iscsi/iscsi_target.c: cmd = iscsit_allocate_cmd(conn, TASK_INTERRUPTIBLE);
(etc)
^ permalink raw reply
* Re: [PATCH 1/2] Convert target drivers to use sbitmap
From: Bart Van Assche @ 2018-06-12 16:32 UTC (permalink / raw)
To: willy@infradead.org
Cc: jgross@suse.com, axboe@kernel.dk, linux-scsi@vger.kernel.org,
kvm@vger.kernel.org, mawilcox@microsoft.com,
netdev@vger.kernel.org, linux-usb@vger.kernel.org,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
target-devel@vger.kernel.org, qla2xxx-upstream@qlogic.com,
linux1394-devel@lists.sourceforge.net, kent.overstreet@gmail.com
In-Reply-To: <20180612161526.GE19433@bombadil.infradead.org>
On Tue, 2018-06-12 at 09:15 -0700, Matthew Wilcox wrote:
> On Tue, Jun 12, 2018 at 03:22:42PM +0000, Bart Van Assche wrote:
> > On Tue, 2018-05-15 at 09:00 -0700, Matthew Wilcox wrote:
> > > diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
> > > index 025dc2d3f3de..cdf671c2af61 100644
> > > --- a/drivers/scsi/qla2xxx/qla_target.c
> > > +++ b/drivers/scsi/qla2xxx/qla_target.c
> > > @@ -3719,7 +3719,8 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
> > > return;
> > > }
> > > cmd->jiffies_at_free = get_jiffies_64();
> > > - percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
> > > + sbitmap_queue_clear(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag,
> > > + cmd->se_cmd.map_cpu);
> > > }
> > > EXPORT_SYMBOL(qlt_free_cmd);
> >
> > Please introduce functions in the target core for allocating and freeing a tag
> > instead of spreading the knowledge of how to allocate and free tags over all
> > target drivers.
>
> I can't without doing an unreasonably large amount of work on drivers that
> I have no way to test. Some of the drivers have the se_cmd already; some
> of them don't. I'd be happy to introduce a common function for freeing
> a tag.
Which target drivers are you referring to? If you are referring to the sbp driver:
I think that driver is dead and can be removed from the kernel tree. I even don't
know whether that driver ever has had any users other than the developer of that
driver.
> > This looks weird to me. Shouldn't target code ignore signals instead of causing
> > tag allocation to fail if a signal is received?
>
> It's what the current code did:
>
> - if (signal_pending_state(state, current)) {
> - tag = -ERESTARTSYS;
> - break;
> - }
>
> and the current callers literally indicate that they want signals:
>
> drivers/infiniband/ulp/isert/ib_isert.c: cmd = iscsit_allocate_cmd(conn, TASK_INTERRUPTIBLE);
> drivers/target/iscsi/iscsi_target.c: cmd = iscsit_allocate_cmd(conn, TASK_INTERRUPTIBLE);
Right, the iSCSI target driver uses signals to wake up threads (see also the
send_sig() calls in the iSCSI target code).
Bart.
^ permalink raw reply
* Re: [PATCH 1/2] Convert target drivers to use sbitmap
From: Matthew Wilcox @ 2018-06-12 18:08 UTC (permalink / raw)
To: Bart Van Assche
Cc: jgross@suse.com, axboe@kernel.dk, linux-scsi@vger.kernel.org,
kvm@vger.kernel.org, netdev@vger.kernel.org,
linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
target-devel@vger.kernel.org, qla2xxx-upstream@qlogic.com,
linux1394-devel@lists.sourceforge.net, kent.overstreet@gmail.com
In-Reply-To: <0c93c72a3a339f3479f82de04223315671e07863.camel@wdc.com>
On Tue, Jun 12, 2018 at 04:32:03PM +0000, Bart Van Assche wrote:
> On Tue, 2018-06-12 at 09:15 -0700, Matthew Wilcox wrote:
> > On Tue, Jun 12, 2018 at 03:22:42PM +0000, Bart Van Assche wrote:
> > > Please introduce functions in the target core for allocating and freeing a tag
> > > instead of spreading the knowledge of how to allocate and free tags over all
> > > target drivers.
> >
> > I can't without doing an unreasonably large amount of work on drivers that
> > I have no way to test. Some of the drivers have the se_cmd already; some
> > of them don't. I'd be happy to introduce a common function for freeing
> > a tag.
>
> Which target drivers are you referring to? If you are referring to the sbp driver:
> I think that driver is dead and can be removed from the kernel tree. I even don't
> know whether that driver ever has had any users other than the developer of that
> driver.
For example tcm_fc:
tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
if (tag < 0)
goto busy;
cmd = &((struct ft_cmd *)se_sess->sess_cmd_map)[tag];
or qla2xxx:
tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
if (tag < 0)
return NULL;
cmd = &((struct qla_tgt_cmd *)se_sess->sess_cmd_map)[tag];
The core doesn't know at what offset from the pointer to store the tag
& cpu. Only the individual drivers know their cmd layout.
^ permalink raw reply
* Re: [PATCH v4 1/3] compiler-gcc.h: add gnu_inline to all inline declarations
From: H. Peter Anvin @ 2018-06-12 18:51 UTC (permalink / raw)
To: Nick Desaulniers, sedat.dilek, Arnd Bergmann
Cc: Kate Stewart, linux-efi, brijesh.singh, J. Kiszka, Will Deacon,
virtualization, Masahiro Yamada, Manoj Gupta, boris.ostrovsky,
mawilcox, x86, akataria, Greg Hackmann, Cao.jin, mingo, geert,
David Rientjes, Andrey Ryabinin, Linux Kbuild mailing list,
Kees Cook, rostedt, acme, Michal Marek, Thomas Gleixner,
Alistair Strachan, tstellar, Ard Biesheuvel
In-Reply-To: <CAKwvOdmxFHfkj81YAXa8fE86fC+k+KvqMvgk4=e62pNCcXbKOg@mail.gmail.com>
<caoj.fnst@cn.fujitsu.com>,Greg KH <gregkh@linuxfoundation.org>,jarkko.sakkinen@linux.intel.com,jgross@suse.com,Josh Poimboeuf <jpoimboe@redhat.com>,Matthias Kaehlcke <mka@chromium.org>,thomas.lendacky@amd.com,Thiebaud Weksteen <tweek@google.com>,mjg59@google.com,joe@perches.com
From: hpa@zytor.com
Message-ID: <191E4EBE-4CB2-4C8B-AB61-689A91FFE7A8@zytor.com>
On June 12, 2018 11:33:14 AM PDT, Nick Desaulniers <ndesaulniers@google.com> wrote:
>On Fri, Jun 8, 2018 at 3:04 AM Sedat Dilek <sedat.dilek@gmail.com>
>wrote:
>>
>> On Fri, Jun 8, 2018 at 9:59 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Thu, Jun 7, 2018 at 10:49 PM, Nick Desaulniers
>> > <ndesaulniers@google.com> wrote:
>> >> Functions marked extern inline do not emit an externally visible
>> >> function when the gnu89 C standard is used. Some KBUILD Makefiles
>> >> overwrite KBUILD_CFLAGS. This is an issue for GCC 5.1+ users as
>without
>> >> an explicit C standard specified, the default is gnu11. Since c99,
>the
>> >> semantics of extern inline have changed such that an externally
>visible
>> >> function is always emitted. This can lead to multiple definition
>errors
>> >> of extern inline functions at link time of compilation units whose
>build
>> >> files have removed an explicit C standard compiler flag for users
>of GCC
>> >> 5.1+ or Clang.
>> >>
>> >> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
>> >> Suggested-by: H. Peter Anvin <hpa@zytor.com>
>> >> Suggested-by: Joe Perches <joe@perches.com>
>> >
>> > I suspect this will break Geert's gcc-4.1.2, which I think doesn't
>have that
>> > attribute yet (4.1.3 or higher have it according to the
>documentation.
>> >
>> > It wouldn't be hard to work around that if we want to keep that
>version
>> > working, or we could decide that it's time to officially stop
>supporting
>> > that version, but we should probably decide on one or the other.
>
>Heh, so earlier we decided against compiler flags (-std=gnu89 or
>-fgnu89-inline) in preference to function attributes. The function
>attribute is preferable as some of the Makefiles [accidentally?]
>overwrite KBUILD_CFLAGS, which is problematic for gcc 5.1 users as the
>implicit c standard used was changed to gnu11 from gnu89. What's nice
>is that to support gcc 4.1 users, we simply don't need to add any
>attribute, as their implicit c standard is gnu89 which has the
>semantics for extern inline that we want. I have a simple change to
>this patch that can support users of various gcc versions, see below:
>
>> Good point.
>> What is the minimum requirement of GCC version currently?
>> AFAICS x86/asm-goto support requires GCC >= 4.5?
>
>Yes, but that's only for x86, IIUC. It seems the kernel may have
>different minimum required versions of GCC based on arch then? That
>may be ok, but I'm not sure that's easy to keep track of without
>having it explicitly stated somewhere like the docs perhaps?
>
>> Just FYI...
>> ...saw the last days in upstream commits that kbuild/kconfig for
>> 4.18-rc1 offers possibilities to check for cc-version dependencies.
>
>Those will be helpful. If we want to pursue compiler flags, which get
>set some Makefiles, then yes. But I think a simpler change to my
>patch would be as below.
>
>It seems gcc did not get __has_attribute [0] until 5.1, but will
>define __GNUC_GNU_INLINE__ if supported. [1] Unfortunately, Clang
>does not define __GNUC_GNU_INLINE__ [2]. So a proper feature test
>might look like:
>
>```
>#ifndef __has_attribute
>#define __has_attribute(x) 0
>#endif
>
>#if defined(__GNUC_GNU_INLINE__) || __has_attribute(gnu_inline)
>#define __gnu_inline __attribute__(gnu_inline)
>#endif
>
>#define inline inline __attribute__((always_inline, unused)) notrace
>__gnu_inline
>```
>
>Thoughts on this approach? I can send a v5 tomorrow if there's no
>major issues. Feedback appreciated, as always.
>
>[0] https://clang.llvm.org/docs/LanguageExtensions.html#has-attribute
>[1]
>https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
>[2] https://bugs.llvm.org/show_bug.cgi?id=37784
Please fix clang. It isn't all that hard to fix.
However, __GCC_GNU_INLINE__ means you are in GNU mode by default, on gcc's new enough to have multiple misses.
The right thing to look for is __GCC_STDC_INLINE__ in which case you need the attribute.
By the way, you should check clang against gcc's predefined macros by doing:
gcc [options] -x c -Wp,-dM -E /dev/null | sort
Options can change the predefined macros substantially, especially the, -std=, arch and -O options. -x c can be replaced with e.g. -x c++, objective-c, assembler-with-cpp etc.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
^ permalink raw reply
* [PATCH 0/3] Use sbitmap instead of percpu_ida
From: Matthew Wilcox @ 2018-06-12 19:05 UTC (permalink / raw)
To: linux-kernel, linux-scsi, target-devel, linux1394-devel,
linux-usb, kvm, virtualization, netdev, Juergen Gross,
qla2xxx-upstream, Kent Overstreet, Jens Axboe
Cc: Matthew Wilcox
Removing the percpu_ida code nets over 400 lines of removal. It's not
as spectacular as deleting an entire architecture, but it's still a
worthy reduction in lines of code.
Untested due to lack of hardware and not understanding how to set up a
target platform.
Changes from v1:
- Fixed bugs pointed out by Jens in iscsit_wait_for_tag()
- Abstracted out tag freeing as requested by Bart
- Made iscsit_wait_for_tag static as pointed out by 0day
Matthew Wilcox (3):
target: Abstract tag freeing
Convert target drivers to use sbitmap
Remove percpu_ida
drivers/scsi/qla2xxx/qla_target.c | 14 +-
drivers/target/iscsi/iscsi_target_util.c | 35 ++-
drivers/target/sbp/sbp_target.c | 7 +-
drivers/target/target_core_transport.c | 5 +-
drivers/target/tcm_fc/tfc_cmd.c | 10 +-
drivers/usb/gadget/function/f_tcm.c | 7 +-
drivers/vhost/scsi.c | 8 +-
drivers/xen/xen-scsiback.c | 9 +-
include/linux/percpu_ida.h | 83 -----
include/target/iscsi/iscsi_target_core.h | 1 +
include/target/target_core_base.h | 10 +-
lib/Makefile | 2 +-
lib/percpu_ida.c | 370 -----------------------
13 files changed, 73 insertions(+), 488 deletions(-)
delete mode 100644 include/linux/percpu_ida.h
delete mode 100644 lib/percpu_ida.c
--
2.17.1
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox