* Re: [RFC v2] virtio: support packed ring
From: Tiwei Bie @ 2018-04-13 7:15 UTC (permalink / raw)
To: Jason Wang; +Cc: mst, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <ebfbfc3e-9e52-9331-61b6-258e131f822d@redhat.com>
On Fri, Apr 13, 2018 at 12:30:24PM +0800, Jason Wang wrote:
> On 2018年04月01日 22:12, Tiwei Bie wrote:
> > Hello everyone,
> >
> > This RFC implements packed ring support for virtio driver.
> >
> > The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
> > by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
> > Minor changes are needed for the vhost code, e.g. to kick the guest.
> >
> > TODO:
> > - Refinements and bug fixes;
> > - Split into small patches;
> > - Test indirect descriptor support;
> > - Test/fix event suppression support;
> > - Test devices other than net;
> >
> > RFC v1 -> RFC v2:
> > - Add indirect descriptor support - compile test only;
> > - Add event suppression supprt - compile test only;
> > - Move vring_packed_init() out of uapi (Jason, MST);
> > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > - Avoid using '%' operator (Jason);
> > - Rename free_head -> next_avail_idx (Jason);
> > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > - Some other refinements and bug fixes;
> >
> > Thanks!
> >
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> > drivers/virtio/virtio_ring.c | 1094 +++++++++++++++++++++++++++++-------
> > include/linux/virtio_ring.h | 8 +-
> > include/uapi/linux/virtio_config.h | 12 +-
> > include/uapi/linux/virtio_ring.h | 61 ++
> > 4 files changed, 980 insertions(+), 195 deletions(-)
[...]
> > +static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
> > + unsigned int total_sg,
> > + gfp_t gfp)
> > +{
> > + struct vring_packed_desc *desc;
> > +
> > + /*
> > + * We require lowmem mappings for the descriptors because
> > + * otherwise virt_to_phys will give us bogus addresses in the
> > + * virtqueue.
> > + */
> > + gfp &= ~__GFP_HIGHMEM;
> > +
> > + desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
>
> Can we simply check vq->packed here to avoid duplicating helpers?
Then it would be something like this:
static void *alloc_indirect(struct virtqueue *_vq, unsigned int total_sg,
gfp_t gfp)
{
struct vring_virtqueue *vq = to_vvq(_vq);
void *data;
/*
* We require lowmem mappings for the descriptors because
* otherwise virt_to_phys will give us bogus addresses in the
* virtqueue.
*/
gfp &= ~__GFP_HIGHMEM;
if (vq->packed) {
data = kmalloc(total_sg * sizeof(struct vring_packed_desc),
gfp);
if (!data)
return NULL;
} else {
struct vring_desc *desc;
unsigned int i;
desc = kmalloc(total_sg * sizeof(struct vring_desc), gfp);
if (!desc)
return NULL;
for (i = 0; i < total_sg; i++)
desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
data = desc;
}
return data;
}
I would prefer to have two simpler helpers (and to the callers,
it's already very clear about which one they should call), i.e.
the current implementation:
static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
unsigned int total_sg,
gfp_t gfp)
{
struct vring_packed_desc *desc;
/*
* We require lowmem mappings for the descriptors because
* otherwise virt_to_phys will give us bogus addresses in the
* virtqueue.
*/
gfp &= ~__GFP_HIGHMEM;
desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
return desc;
}
static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
unsigned int total_sg,
gfp_t gfp)
{
struct vring_desc *desc;
unsigned int i;
/*
* We require lowmem mappings for the descriptors because
* otherwise virt_to_phys will give us bogus addresses in the
* virtqueue.
*/
gfp &= ~__GFP_HIGHMEM;
desc = kmalloc(total_sg * sizeof(struct vring_desc), gfp);
if (!desc)
return NULL;
for (i = 0; i < total_sg; i++)
desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
return desc;
}
>
> > +
> > + return desc;
> > +}
[...]
> > +static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > + struct scatterlist *sgs[],
> > + unsigned int total_sg,
> > + unsigned int out_sgs,
> > + unsigned int in_sgs,
> > + void *data,
> > + void *ctx,
> > + gfp_t gfp)
> > +{
> > + struct vring_virtqueue *vq = to_vvq(_vq);
> > + struct vring_packed_desc *desc;
> > + struct scatterlist *sg;
> > + unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
> > + __virtio16 uninitialized_var(head_flags), flags;
> > + int head, wrap_counter;
> > + bool indirect;
> > +
> > + START_USE(vq);
> > +
> > + BUG_ON(data == NULL);
> > + BUG_ON(ctx && vq->indirect);
> > +
> > + if (unlikely(vq->broken)) {
> > + END_USE(vq);
> > + return -EIO;
> > + }
> > +
> > +#ifdef DEBUG
> > + {
> > + ktime_t now = ktime_get();
> > +
> > + /* No kick or get, with .1 second between? Warn. */
> > + if (vq->last_add_time_valid)
> > + WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
> > + > 100);
> > + vq->last_add_time = now;
> > + vq->last_add_time_valid = true;
> > + }
> > +#endif
> > +
> > + BUG_ON(total_sg == 0);
> > +
> > + head = vq->next_avail_idx;
> > + wrap_counter = vq->wrap_counter;
> > +
> > + /* If the host supports indirect descriptor tables, and we have multiple
> > + * buffers, then go indirect. FIXME: tune this threshold */
> > + if (vq->indirect && total_sg > 1 && vq->vq.num_free)
>
> Let's introduce a helper like virtqueue_need_indirect() to avoid duplicating
> codes and FIXME.
Okay.
>
> > + desc = alloc_indirect_packed(_vq, total_sg, gfp);
> > + else {
> > + desc = NULL;
> > + WARN_ON_ONCE(total_sg > vq->vring_packed.num && !vq->indirect);
> > + }
> > +
> > + if (desc) {
> > + /* Use a single buffer which doesn't continue */
> > + indirect = true;
> > + /* Set up rest to use this indirect table. */
> > + i = 0;
> > + descs_used = 1;
> > + } else {
> > + indirect = false;
> > + desc = vq->vring_packed.desc;
> > + i = head;
> > + descs_used = total_sg;
> > + }
> > +
> > + if (vq->vq.num_free < descs_used) {
> > + pr_debug("Can't add buf len %i - avail = %i\n",
> > + descs_used, vq->vq.num_free);
> > + /* FIXME: for historical reasons, we force a notify here if
> > + * there are outgoing parts to the buffer. Presumably the
> > + * host should service the ring ASAP. */
> > + if (out_sgs)
> > + vq->notify(&vq->vq);
> > + if (indirect)
> > + kfree(desc);
> > + END_USE(vq);
> > + return -ENOSPC;
> > + }
> > +
> > + for (n = 0; n < out_sgs + in_sgs; n++) {
> > + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > + dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
> > + DMA_TO_DEVICE : DMA_FROM_DEVICE);
> > + if (vring_mapping_error(vq, addr))
> > + goto unmap_release;
> > +
> > + flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> > + (n < out_sgs ? 0 : VRING_DESC_F_WRITE) |
> > + VRING_DESC_F_AVAIL(vq->wrap_counter) |
> > + VRING_DESC_F_USED(!vq->wrap_counter));
> > + if (!indirect && i == head)
> > + head_flags = flags;
> > + else
> > + desc[i].flags = flags;
> > +
> > + desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> > + desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> > + desc[i].id = cpu_to_virtio32(_vq->vdev, head);
>
> Similar to V1, we only need this for the last descriptor.
Okay, will just set it for the last desc.
>
> > + prev = i;
>
> It looks to me there's no need to track prev inside the loop here.
>
> > + i++;
> > + if (!indirect && i >= vq->vring_packed.num) {
> > + i = 0;
> > + vq->wrap_counter ^= 1;
> > + }
> > + }
> > + }
> > + /* Last one doesn't continue. */
> > + if (total_sg == 1)
> > + head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > + else
> > + desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
>
> The only case when prev != i - 1 is i == 0, we can add a if here.
It's just a mirror of the existing implementation in split ring.
It seems that split ring implementation needs this just because
it's much harder for it to find the prev, which is not true for
packed ring. So I'll take your suggestion. Thanks!
>
[...]
> > +static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > +{
> > + struct vring_virtqueue *vq = to_vvq(_vq);
> > + u16 new, old, off_wrap;
> > + bool needs_kick;
> > +
> > + START_USE(vq);
> > + /* We need to expose the new flags value before checking notification
> > + * suppressions. */
> > + virtio_mb(vq->weak_barriers);
> > +
> > + old = vq->next_avail_idx - vq->num_added;
> > + new = vq->next_avail_idx;
> > + vq->num_added = 0;
> > +
> > +#ifdef DEBUG
> > + if (vq->last_add_time_valid) {
> > + WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
> > + vq->last_add_time)) > 100);
> > + }
> > + vq->last_add_time_valid = false;
> > +#endif
> > +
> > + off_wrap = virtio16_to_cpu(_vq->vdev, vq->vring_packed.device->off_wrap);
> > +
> > + if (vq->event) {
>
> It looks to me we should examine RING_EVENT_FLAGS_DESC in desc_event_flags
> instead of vq->event here. Spec does not forces to use evenf_off and
> event_wrap if event index is enabled.
>
> > + // FIXME: fix this!
> > + needs_kick = ((off_wrap >> 15) == vq->wrap_counter) &&
> > + vring_need_event(off_wrap & ~(1<<15), new, old);
>
> Why need a & here?
Because wrap_counter (the most significant bit in off_wrap)
isn't part of the index.
>
> > + } else {
>
> Need a smp_rmb() to make sure desc_event_flags was checked before flags.
I don't get your point, if my understanding is correct,
desc_event_flags is vq->vring_packed.device->flags. So
what's the "flags"?
>
> > + needs_kick = (vq->vring_packed.device->flags !=
> > + cpu_to_virtio16(_vq->vdev, VRING_EVENT_F_DISABLE));
> > + }
> > + END_USE(vq);
> > + return needs_kick;
> > +}
[...]
> > +static int detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
> > + void **ctx)
> > +{
> > + struct vring_packed_desc *desc;
> > + unsigned int i, j;
> > +
> > + /* Clear data ptr. */
> > + vq->desc_state[head].data = NULL;
> > +
> > + i = head;
> > +
> > + for (j = 0; j < vq->desc_state[head].num; j++) {
> > + desc = &vq->vring_packed.desc[i];
> > + vring_unmap_one_packed(vq, desc);
> > + desc->flags = 0x0;
>
> Looks like this is unnecessary.
It's safer to zero it. If we don't zero it, after we
call virtqueue_detach_unused_buf_packed() which calls
this function, the desc is still available to the
device.
>
> > + i++;
> > + if (i >= vq->vring_packed.num)
> > + i = 0;
> > + }
[...]
> > +static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> > +{
> > + struct vring_virtqueue *vq = to_vvq(_vq);
> > + u16 last_used_idx, wrap_counter, off_wrap;
> > +
> > + START_USE(vq);
> > +
> > + last_used_idx = vq->last_used_idx;
> > + wrap_counter = vq->wrap_counter;
> > +
> > + if (last_used_idx > vq->next_avail_idx)
> > + wrap_counter ^= 1;
> > +
> > + off_wrap = last_used_idx | (wrap_counter << 15);
> > +
> > + /* We optimistically turn back on interrupts, then check if there was
> > + * more to do. */
> > + /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> > + * either clear the flags bit or point the event index at the next
> > + * entry. Always do both to keep code simple. */
> > + if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC:
> > + VRING_EVENT_F_ENABLE;
> > + vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > + vq->event_flags_shadow);
> > + }
>
> A smp_wmb() is missed here?
>
> > + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev, off_wrap);
>
> And according to the spec, it looks to me write a VRING_EVENT_F_ENABLE is
> sufficient here.
I didn't think much when implementing the event suppression
for packed ring previously. After I saw your comments, I found
something new. Indeed, unlike the split ring, for the packed
ring, spec doesn't say we must use VRING_EVENT_F_DESC when
EVENT_IDX is negotiated. So do you think below thought is
right or makes sense?
- For virtqueue_enable_cb_prepare(), we just need to enable
the ring by setting flags to VRING_EVENT_F_ENABLE in any
case.
- We will try to use VRING_EVENT_F_DESC (if EVENT_IDX is
negotiated) only when we want to delay the interrupts
virtqueue_enable_cb_delayed().
>
> > + END_USE(vq);
> > + return last_used_idx;
> > +}
> > +
[...]
> > @@ -1157,14 +1852,18 @@ void vring_transport_features(struct virtio_device *vdev)
> > for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
> > switch (i) {
> > - case VIRTIO_RING_F_INDIRECT_DESC:
> > +#if 0
> > + case VIRTIO_RING_F_INDIRECT_DESC: // FIXME not tested yet.
> > break;
> > - case VIRTIO_RING_F_EVENT_IDX:
> > + case VIRTIO_RING_F_EVENT_IDX: // FIXME probably not work.
> > break;
> > +#endif
>
> It would be better if you can split EVENT_IDX and INDIRECT_DESC into
> separate patches too.
Sure. Will do it in the next version.
Thanks for the review!
>
> Thanks
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* [PATCH] net: stmmac: fix missing support for 802.1AD tag on reception
From: Elad Nachman @ 2018-04-13 7:03 UTC (permalink / raw)
To: netdev, davem
stmmac reception handler calls stmmac_rx_vlan() to strip the vlan before
calling napi_gro_receive().
The function assumes VLAN tagged frames are always tagged with 802.1Q
protocol,
and assigns ETH_P_8021Q to the skb by hard-coding the parameter on call
to __vlan_hwaccel_put_tag() .
This causes packets not to be passed to the VLAN slave if it was created
with 802.1AD protocol
(ip link add link eth0 eth0.100 type vlan proto 802.1ad id 100).
This fix passes the protocol from the VLAN header into
__vlan_hwaccel_put_tag()
instead of using the hard-coded value of ETH_P_8021Q.
Signed-off-by: Elad Nachman <eladn@gilat.com>
---
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 2018-04-11
17:04:00.586057300 +0300
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 2018-04-11
17:05:33.601992400 +0300
@@ -3293,17 +3293,19 @@ dma_map_err:
static void stmmac_rx_vlan(struct net_device *dev, struct sk_buff *skb)
{
- struct ethhdr *ehdr;
+ struct vlan_ethhdr *veth;
u16 vlanid;
+ __be16 vlan_proto;
if ((dev->features & NETIF_F_HW_VLAN_CTAG_RX) ==
NETIF_F_HW_VLAN_CTAG_RX &&
!__vlan_get_tag(skb, &vlanid)) {
/* pop the vlan tag */
- ehdr = (struct ethhdr *)skb->data;
- memmove(skb->data + VLAN_HLEN, ehdr, ETH_ALEN * 2);
+ veth = (struct vlan_ethhdr *)skb->data;
+ vlan_proto = veth->h_vlan_proto;
+ memmove(skb->data + VLAN_HLEN, veth, ETH_ALEN * 2);
skb_pull(skb, VLAN_HLEN);
- __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlanid);
+ __vlan_hwaccel_put_tag(skb, vlan_proto, vlanid);
}
}
^ permalink raw reply
* [PATCH net] virtio-net: add missing virtqueue kick when flushing packets
From: Jason Wang @ 2018-04-13 6:58 UTC (permalink / raw)
To: mst, jasowang, virtualization, netdev, linux-kernel
Cc: ktaka, Daniel Borkmann
We tends to batch submitting packets during XDP_TX. This requires to
kick virtqueue after a batch, we tried to do it through
xdp_do_flush_map() which only makes sense for devmap not XDP_TX. So
explicitly kick the virtqueue in this case.
Reported-by: Kimitoshi Takahashi <ktaka@nii.ac.jp>
Tested-by: Kimitoshi Takahashi <ktaka@nii.ac.jp>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2337460..d8e1aea 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1269,7 +1269,9 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
{
struct receive_queue *rq =
container_of(napi, struct receive_queue, napi);
- unsigned int received;
+ struct virtnet_info *vi = rq->vq->vdev->priv;
+ struct send_queue *sq;
+ unsigned int received, qp;
bool xdp_xmit = false;
virtnet_poll_cleantx(rq);
@@ -1280,8 +1282,13 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
if (received < budget)
virtqueue_napi_complete(napi, rq->vq, received);
- if (xdp_xmit)
+ if (xdp_xmit) {
+ qp = vi->curr_queue_pairs - vi->xdp_queue_pairs +
+ smp_processor_id();
+ sq = &vi->sq[qp];
+ virtqueue_kick(sq->vq);
xdp_do_flush_map();
+ }
return received;
}
--
2.7.4
^ permalink raw reply related
* [PATCH] net: stmmac: fix missing support for 802.1AD tag on reception
From: Elad Nachman @ 2018-04-13 6:45 UTC (permalink / raw)
To: netdev, davem
stmmac reception handler calls stmmac_rx_vlan() to strip the vlan before
calling napi_gro_receive().
The function assumes VLAN tagged frames are always tagged with 802.1Q
protocol,
and assigns ETH_P_8021Q to the skb by hard-coding the parameter on call
to __vlan_hwaccel_put_tag() .
This causes packets not to be passed to the VLAN slave if it was created
with 802.1AD protocol
(ip link add link eth0 eth0.100 type vlan proto 802.1ad id 100).
This fix passes the protocol from the VLAN header into
__vlan_hwaccel_put_tag()
instead of using the hard-coded value of ETH_P_8021Q.
Signed-off-by: Elad Nachman <eladn@gilat.com>
---
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 2018-04-11
17:04:00.586057300 +0300
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 2018-04-11
17:05:33.601992400 +0300
@@ -3293,17 +3293,19 @@ dma_map_err:
static void stmmac_rx_vlan(struct net_device *dev, struct sk_buff *skb)
{
- struct ethhdr *ehdr;
+ struct vlan_ethhdr *veth;
u16 vlanid;
+ __be16 vlan_proto;
if ((dev->features & NETIF_F_HW_VLAN_CTAG_RX) ==
NETIF_F_HW_VLAN_CTAG_RX &&
!__vlan_get_tag(skb, &vlanid)) {
/* pop the vlan tag */
- ehdr = (struct ethhdr *)skb->data;
- memmove(skb->data + VLAN_HLEN, ehdr, ETH_ALEN * 2);
+ veth = (struct vlan_ethhdr *)skb->data;
+ vlan_proto = veth->h_vlan_proto;
+ memmove(skb->data + VLAN_HLEN, veth, ETH_ALEN * 2);
skb_pull(skb, VLAN_HLEN);
- __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlanid);
+ __vlan_hwaccel_put_tag(skb, vlan_proto, vlanid);
}
}
^ permalink raw reply
* Re: INFO: task hung in do_ip_vs_set_ctl (2)
From: syzbot @ 2018-04-13 6:23 UTC (permalink / raw)
To: coreteam, davem, fw, horms, ja, kadlec, linux-kernel, lvs-devel,
netdev, netfilter-devel, pablo, syzkaller-bugs, wensong
In-Reply-To: <94eb2c059ce0bca273056940d77d@google.com>
syzbot has found reproducer for the following crash on net-next commit
17dec0a949153d9ac00760ba2f5b78cb583e995f (Wed Apr 4 02:15:32 2018 +0000)
Merge branch 'userns-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
syzbot dashboard link:
https://syzkaller.appspot.com/bug?extid=7810ed2e0cb359580c17
So far this crash happened 2 times on net-next, upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5922062967242752
syzkaller reproducer:
https://syzkaller.appspot.com/x/repro.syz?id=5359824032235520
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=6352399027404800
Kernel config:
https://syzkaller.appspot.com/x/.config?id=-2735707888269579554
compiler: gcc (GCC) 8.0.1 20180301 (experimental)
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+7810ed2e0cb359580c17@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed.
INFO: task syzkaller402106:4498 blocked for more than 120 seconds.
Not tainted 4.16.0+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syzkaller402106 D22184 4498 4494 0x00000000
Call Trace:
context_switch kernel/sched/core.c:2848 [inline]
__schedule+0x807/0x1e40 kernel/sched/core.c:3490
schedule+0xef/0x430 kernel/sched/core.c:3549
schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3607
__mutex_lock_common kernel/locking/mutex.c:833 [inline]
__mutex_lock+0xe38/0x17f0 kernel/locking/mutex.c:893
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
nf_setsockopt+0x7d/0xd0 net/netfilter/nf_sockopt.c:115
ip_setsockopt+0xd8/0xf0 net/ipv4/ip_sockglue.c:1253
tcp_setsockopt+0x93/0xe0 net/ipv4/tcp.c:2888
sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3039
smc_setsockopt+0xc7/0x120 net/smc/af_smc.c:1289
__sys_setsockopt+0x1bd/0x390 net/socket.c:1903
SYSC_setsockopt net/socket.c:1914 [inline]
SyS_setsockopt+0x34/0x50 net/socket.c:1911
do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x445959
RSP: 002b:00007f2770618db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00000000006dac24 RCX: 0000000000445959
RDX: 000000000000048c RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00000000006dac20 R08: 0000000000000018 R09: 0000000000000000
R10: 0000000020000140 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd81ae8f6f R14: 00007f27706199c0 R15: 0000000000000001
Showing all locks held in the system:
3 locks held by kworker/0:0/4:
#0: 000000007346131c ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
__write_once_size include/linux/compiler.h:215 [inline]
#0: 000000007346131c ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: 000000007346131c ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
atomic64_set include/asm-generic/atomic-instrumented.h:40 [inline]
#0: 000000007346131c ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
atomic_long_set include/asm-generic/atomic-long.h:57 [inline]
#0: 000000007346131c ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
set_work_data kernel/workqueue.c:617 [inline]
#0: 000000007346131c ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
#0: 000000007346131c ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at:
process_one_work+0xaef/0x1b50 kernel/workqueue.c:2116
#1: 00000000894403a3 ((addr_chk_work).work){+.+.}, at:
process_one_work+0xb46/0x1b50 kernel/workqueue.c:2120
#2: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
2 locks held by khungtaskd/877:
#0: 00000000706bfe1c (rcu_read_lock){....}, at:
check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
#0: 00000000706bfe1c (rcu_read_lock){....}, at: watchdog+0x1ff/0xf60
kernel/hung_task.c:249
#1: 00000000761e40d2 (tasklist_lock){.+.+}, at:
debug_show_all_locks+0xde/0x34a kernel/locking/lockdep.c:4470
2 locks held by getty/4464:
#0: 00000000f90a9320 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: 000000005dd151b8 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4465:
#0: 00000000737b5b26 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: 0000000017bb1ae5 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4466:
#0: 00000000badd071e (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: 00000000a46de9fa (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4467:
#0: 00000000112731eb (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: 00000000166cdec3 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4468:
#0: 00000000d9915e26 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: 00000000abf02ab1 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4469:
#0: 0000000041b9a2d2 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: 000000003dbb7393 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4470:
#0: 00000000dd811911 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: 00000000c9586a34 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
1 lock held by syzkaller402106/4498:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4517:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4523:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4535:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4537:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4548:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4553:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4564:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4571:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4581:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4588:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4598:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4603:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4615:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4620:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4631:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4511:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4514:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4526:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4530:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4542:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4546:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4558:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4562:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4574:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4578:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4590:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4594:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4606:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4610:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4622:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4626:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4501:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4518:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4522:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4533:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4539:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4550:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4556:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4567:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4572:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4582:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4586:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4597:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4602:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4612:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4617:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4628:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4506:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4515:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4527:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4531:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4543:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4547:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4559:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4563:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4575:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4579:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4591:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4595:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4607:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4611:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4623:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4627:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4503:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4519:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4524:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4534:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4538:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4549:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4554:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4566:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4569:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4580:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4585:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4596:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4601:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4613:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4619:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4630:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
2 locks held by syzkaller402106/4508:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
#1: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x562/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2388
1 lock held by syzkaller402106/4513:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4525:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4529:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4541:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4545:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4557:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4561:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4573:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4577:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4589:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4593:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4605:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4609:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4621:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4625:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4507:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4516:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4521:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4532:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4540:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4551:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4555:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4565:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4570:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4583:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4587:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4599:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4604:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4614:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4618:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4629:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4509:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4512:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4520:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4528:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4536:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4544:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4552:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4560:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4568:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4576:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4584:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4592:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4600:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4608:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by syzkaller402106/4616:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
1 lock held by syzkaller402106/4624:
#0: 00000000bfa7d99e (ipvs->sync_mutex){+.+.}, at:
do_ip_vs_set_ctl+0x339/0x1d30 net/netfilter/ipvs/ip_vs_ctl.c:2393
1 lock held by ipvs-b:0:0/4510:
#0: 00000000ddc85278 (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74
=============================================
NMI backtrace for cpu 1
CPU: 1 PID: 877 Comm: khungtaskd Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x1b9/0x29f lib/dump_stack.c:53
nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
check_hung_task kernel/hung_task.c:132 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
watchdog+0xc10/0xf60 kernel/hung_task.c:249
kthread+0x345/0x410 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:timekeeping_delta_to_ns kernel/time/timekeeping.c:351 [inline]
RIP: 0010:timekeeping_get_ns kernel/time/timekeeping.c:362 [inline]
RIP: 0010:ktime_get+0x308/0x430 kernel/time/timekeeping.c:764
RSP: 0018:ffff8801db007b98 EFLAGS: 00000002
RAX: 000000002930e468 RBX: 0000023e54000000 RCX: 0000000000000017
RDX: ffffed003b600f79 RSI: ffffffff8166cd99 RDI: 0000000000000004
RBP: ffff8801db007c90 R08: ffffffff88a752c0 R09: 0000000000000000
R10: ffffed0043fff001 R11: ffff88021fff8017 R12: dffffc0000000000
R13: 000000000006509e R14: ffff8801db007c68 R15: ffffffff88b172a0
FS: 0000000000000000(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000090e000 CR3: 00000001b130f000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
tick_nohz_irq_enter kernel/time/tick-sched.c:1124 [inline]
tick_irq_enter+0xbc/0x3f0 kernel/time/tick-sched.c:1145
irq_enter+0xb6/0xd0 kernel/softirq.c:346
scheduler_ipi+0x39b/0xa30 kernel/sched/core.c:1768
smp_reschedule_interrupt+0xed/0x660 arch/x86/kernel/smp.c:277
reschedule_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:886
</IRQ>
RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54
RSP: 0018:ffffffff88a07c38 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff02
RAX: dffffc0000000000 RBX: 1ffffffff1140f8a RCX: 0000000000000000
RDX: 1ffffffff1162e58 RSI: 0000000000000001 RDI: ffffffff88b172c0
RBP: ffffffff88a07c38 R08: ffffffff88a752c0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff88a07cf0 R14: ffffffff88b172b0 R15: 0000000000000000
arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
default_idle+0xc2/0x440 arch/x86/kernel/process.c:354
arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:345
default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
cpuidle_idle_call kernel/sched/idle.c:151 [inline]
do_idle+0x2f4/0x3c0 kernel/sched/idle.c:241
cpu_startup_entry+0x104/0x120 kernel/sched/idle.c:346
rest_init+0xe1/0xe4 init/main.c:437
start_kernel+0x887/0x8ae init/main.c:717
x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:443
x86_64_start_kernel+0x76/0x79 arch/x86/kernel/head64.c:424
secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242
Code: ff ff 0f b6 8d 18 ff ff ff 48 ba 00 00 00 00 00 fc ff df 48 03 95 08
ff ff ff 48 0f af d8 48 8b 85 10 ff ff ff 48 01 d8 48 d3 e8 <48> 03 85 20
ff ff ff 48 c7 02 00 00 00 00 48 c7 42 08 00 00 00
^ permalink raw reply
* [RFC PATCH] net: stmmac: dwmac-sun8i: single_reg_field can be static
From: kbuild test robot @ 2018-04-13 6:03 UTC (permalink / raw)
To: Icenowy Zheng
Cc: kbuild-all-JC7UmRfGjtg, Rob Herring, Maxime Ripard, Chen-Yu Tsai,
Giuseppe Cavallaro, Corentin Labbe, netdev-u79uwXL29TY76Z2rM5mHXA,
devicetree-u79uwXL29TY76Z2rM5mHXA,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-sunxi-/JYPxA39Uh5TLH3MbocFFw, Icenowy Zheng
In-Reply-To: <20180411141641.14675-4-icenowy-h8G6r0blFSE@public.gmane.org>
Fixes: 529123418105 ("net: stmmac: dwmac-sun8i: Allow getting syscon regmap from device")
Signed-off-by: Fengguang Wu <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
dwmac-sun8i.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
index b61210c..842315a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
@@ -86,7 +86,7 @@ const struct reg_field old_syscon_reg_field = {
};
/* Specially exported regmap which contains only EMAC register */
-const struct reg_field single_reg_field = {
+static const struct reg_field single_reg_field = {
.reg = 0,
.lsb = 0,
.msb = 31,
^ permalink raw reply related
* Re: [PATCH 3/5] net: stmmac: dwmac-sun8i: Allow getting syscon regmap from device
From: kbuild test robot @ 2018-04-13 6:03 UTC (permalink / raw)
To: Icenowy Zheng
Cc: kbuild-all-JC7UmRfGjtg, Rob Herring, Maxime Ripard, Chen-Yu Tsai,
Giuseppe Cavallaro, Corentin Labbe, netdev-u79uwXL29TY76Z2rM5mHXA,
devicetree-u79uwXL29TY76Z2rM5mHXA,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-sunxi-/JYPxA39Uh5TLH3MbocFFw, Icenowy Zheng
In-Reply-To: <20180411141641.14675-4-icenowy-h8G6r0blFSE@public.gmane.org>
Hi Chen-Yu,
I love your patch! Perhaps something to improve:
[auto build test WARNING on robh/for-next]
[also build test WARNING on v4.16 next-20180412]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Icenowy-Zheng/Add-support-in-dwmac-sun8i-for-accessing-EMAC-clock/20180413-004816
base: https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git for-next
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__
sparse warnings: (new ones prefixed by >>)
drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c:82:24: sparse: symbol 'old_syscon_reg_field' was not declared. Should it be static?
>> drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c:89:24: sparse: symbol 'single_reg_field' was not declared. Should it be static?
Please review and possibly fold the followup patch.
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
^ permalink raw reply
* Re: KMSAN: uninit-value in __netif_receive_skb_core
From: syzbot @ 2018-04-13 5:10 UTC (permalink / raw)
To: bpoirier, davem, dvyukov, edumazet, elena.reshetova, ishkamiel,
keescook, linux-kernel, makita.toshiaki, maloney, netdev,
rami.rosen, syzkaller-bugs, willemb
In-Reply-To: <94eb2c059ce01f643c0569a228ee@google.com>
syzbot has found reproducer for the following crash on
https://github.com/google/kmsan.git/master commit
35ff515e4bda2646f6c881d33951c306ea9c282a (Tue Apr 10 08:59:43 2018 +0000)
Merge pull request #11 from parkerduckworth/readme
syzbot dashboard link:
https://syzkaller.appspot.com/bug?extid=b202b7208664142954fa
So far this crash happened 3 times on
https://github.com/google/kmsan.git/master.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=4559916236800000
syzkaller reproducer:
https://syzkaller.appspot.com/x/repro.syz?id=4590273065648128
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=4631921027973120
Kernel config:
https://syzkaller.appspot.com/x/.config?id=6627248707860932248
compiler: clang version 7.0.0 (trunk 329391)
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+b202b7208664142954fa@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed.
==================================================================
BUG: KMSAN: uninit-value in __read_once_size include/linux/compiler.h:197
[inline]
BUG: KMSAN: uninit-value in deliver_ptype_list_skb net/core/dev.c:1908
[inline]
BUG: KMSAN: uninit-value in __netif_receive_skb_core+0x4630/0x4a80
net/core/dev.c:4545
CPU: 0 PID: 3514 Comm: syzkaller031167 Not tainted 4.16.0+ #83
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:53
kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
__msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
__read_once_size include/linux/compiler.h:197 [inline]
deliver_ptype_list_skb net/core/dev.c:1908 [inline]
__netif_receive_skb_core+0x4630/0x4a80 net/core/dev.c:4545
__netif_receive_skb net/core/dev.c:4627 [inline]
process_backlog+0x62d/0xe20 net/core/dev.c:5307
napi_poll net/core/dev.c:5705 [inline]
net_rx_action+0x7c1/0x1a70 net/core/dev.c:5771
__do_softirq+0x56d/0x93d kernel/softirq.c:285
do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1040
</IRQ>
do_softirq kernel/softirq.c:329 [inline]
__local_bh_enable_ip+0x114/0x140 kernel/softirq.c:182
local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
rcu_read_unlock_bh include/linux/rcupdate.h:726 [inline]
__dev_queue_xmit+0x2a31/0x2b60 net/core/dev.c:3584
dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
packet_snd net/packet/af_packet.c:2944 [inline]
packet_sendmsg+0x7c57/0x8a10 net/packet/af_packet.c:2969
sock_sendmsg_nosec net/socket.c:630 [inline]
sock_sendmsg net/socket.c:640 [inline]
sock_write_iter+0x3b9/0x470 net/socket.c:909
do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
do_iter_write+0x30d/0xd40 fs/read_write.c:932
vfs_writev fs/read_write.c:977 [inline]
do_writev+0x3c9/0x830 fs/read_write.c:1012
SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
SyS_writev+0x56/0x80 fs/read_write.c:1082
do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x43ffb9
RSP: 002b:00007ffd42187708 EFLAGS: 00000217 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043ffb9
RDX: 0000000000000001 RSI: 00000000200010c0 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
R10: 00000000004002c8 R11: 0000000000000217 R12: 00000000004018e0
R13: 0000000000401970 R14: 0000000000000000 R15: 0000000000000000
Uninit was stored to memory at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
kmsan_save_stack mm/kmsan/kmsan.c:293 [inline]
kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:684
__msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:521
skb_vlan_untag+0x950/0xee0 include/linux/if_vlan.h:597
__netif_receive_skb_core+0x70a/0x4a80 net/core/dev.c:4460
__netif_receive_skb net/core/dev.c:4627 [inline]
process_backlog+0x62d/0xe20 net/core/dev.c:5307
napi_poll net/core/dev.c:5705 [inline]
net_rx_action+0x7c1/0x1a70 net/core/dev.c:5771
__do_softirq+0x56d/0x93d kernel/softirq.c:285
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
slab_post_alloc_hook mm/slab.h:445 [inline]
slab_alloc_node mm/slub.c:2737 [inline]
__kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
__kmalloc_reserve net/core/skbuff.c:138 [inline]
__alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
alloc_skb include/linux/skbuff.h:984 [inline]
alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
packet_alloc_skb net/packet/af_packet.c:2803 [inline]
packet_snd net/packet/af_packet.c:2894 [inline]
packet_sendmsg+0x6444/0x8a10 net/packet/af_packet.c:2969
sock_sendmsg_nosec net/socket.c:630 [inline]
sock_sendmsg net/socket.c:640 [inline]
sock_write_iter+0x3b9/0x470 net/socket.c:909
do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
do_iter_write+0x30d/0xd40 fs/read_write.c:932
vfs_writev fs/read_write.c:977 [inline]
do_writev+0x3c9/0x830 fs/read_write.c:1012
SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
SyS_writev+0x56/0x80 fs/read_write.c:1082
do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
==================================================================
^ permalink raw reply
* Re: iproute2-4.16.0 no longer accepts routes via fe80::1
From: David Ahern @ 2018-04-13 4:39 UTC (permalink / raw)
To: Thomas Deutschmann, netdev; +Cc: serhe.popovych
In-Reply-To: <9d8beae3-ff87-6e89-c4ac-941e3c890d71@gentoo.org>
On 4/12/18 6:41 AM, Thomas Deutschmann wrote:
> Hi,
>
> well, it isn't just "fe80::1", it is any IPv6 address which
> will be rejected if not called with "-6". I run bisect:
>
>> git bisect start
>> # good: [50b8a842e8c098cddb213f5b3076526df88826e8] v4.15.0
>> git bisect good 50b8a842e8c098cddb213f5b3076526df88826e8
>> # bad: [4b6c4177ee66421770f0bbcc765c29135e44d921] v4.16.0
>> git bisect bad 4b6c4177ee66421770f0bbcc765c29135e44d921
>> # bad: [5f4892e2c8d4fb22118713e0c83290b352fe0e34] rdma: Make visible the number of arguments
>> git bisect bad 5f4892e2c8d4fb22118713e0c83290b352fe0e34
>> # good: [8c75f69411bc8c3affe5d173afcf981d15f5da15] Merge branch 'master' into net-next
>> git bisect good 8c75f69411bc8c3affe5d173afcf981d15f5da15
>> # bad: [27c523e209ab956ff269afec68c6e744e7f5edb6] utils: Introduce get_addr_rta() and inet_addr_match_rta()
>> git bisect bad 27c523e209ab956ff269afec68c6e744e7f5edb6
>> # bad: [d0bcedd549566a87354aa804df3be6be80681ee9] tc: introduce tc_qdisc_block_exists helper
>> git bisect bad d0bcedd549566a87354aa804df3be6be80681ee9
>> # bad: [6c4b672738acf680ee98c10e79a52a8dede5f9a6] iplink_geneve: Get rid of inet_get_addr()
>> git bisect bad 6c4b672738acf680ee98c10e79a52a8dede5f9a6
>> # bad: [93fa12418dc6f5943692250244be303bb162175b] utils: Always specify family and ->bytelen in get_prefix_1()
>> git bisect bad 93fa12418dc6f5943692250244be303bb162175b
>> # good: [f2522007d8fee924cb098b4afc8af16f2b25829f] utils: Always specify family for address in get_addr_1()
>> git bisect good f2522007d8fee924cb098b4afc8af16f2b25829f
>> # first bad commit: [93fa12418dc6f5943692250244be303bb162175b] utils: Always specify family and ->bytelen in get_prefix_1()
>
>> From 93fa12418dc6f5943692250244be303bb162175b Mon Sep 17 00:00:00 2001
>> From: Serhey Popovych
>> Date: Thu, 18 Jan 2018 20:13:43 +0200
>> Subject: utils: Always specify family and ->bytelen in get_prefix_1()
>>
>> Handle default/all/any special case in get_addr_1() to setup
>> ->family and ->bytelen correctly.
>>
>> Make get_addr_1() return ->bitlen == -2 instead of -1 to
>> distinguish default/all/any special case from the rest:
>> it is safe because all callers check ->bitlen < 0, not
>> explicit value -1.
>>
>> Reduce intendation by one level and get rid of goto/label
>> to make code more readable.
>>
>> Signed-off-by: Serhey Popovych
>> Signed-off-by: David Ahern
>
> https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=93fa12418dc6f5943692250244be303bb162175b
>
> So was this an intended behavior change? I.e. this will require
> updates for various user space tools/network configuration scripts
> which are relying on ip utilities feature to auto-detect inet family
> which was "supported" (at least working) until 4.16.0...
>
>
Not intentional. Serhey please take a look
^ permalink raw reply
* Re: [PATCH v5 05/14] PCI: Add pcie_print_link_status() to log link speed and whether it's limited
From: Jakub Kicinski @ 2018-04-13 4:32 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Tal Gilboa, Tariq Toukan, Jacob Keller, Ariel Elior,
Ganesh Goudar, Jeff Kirsher, everest-linux-l2, intel-wired-lan,
netdev, linux-kernel, linux-pci
In-Reply-To: <152244391852.135666.14903825998610307052.stgit@bhelgaas-glaptop.roam.corp.google.com>
On Fri, 30 Mar 2018 16:05:18 -0500, Bjorn Helgaas wrote:
> + if (bw_avail >= bw_cap)
> + pci_info(dev, "%d Mb/s available bandwidth (%s x%d link)\n",
> + bw_cap, PCIE_SPEED2STR(speed_cap), width_cap);
> + else
> + pci_info(dev, "%d Mb/s available bandwidth, limited by %s x%d link at %s (capable of %d Mb/s with %s x%d link)\n",
> + bw_avail, PCIE_SPEED2STR(speed), width,
> + limiting_dev ? pci_name(limiting_dev) : "<unknown>",
> + bw_cap, PCIE_SPEED2STR(speed_cap), width_cap);
I was just looking at using this new function to print PCIe BW for a
NIC, but I'm slightly worried that there is nothing in the message that
says PCIe... For a NIC some people may interpret the bandwidth as NIC
bandwidth:
[ 39.839989] nfp 0000:04:00.0: Netronome Flow Processor NFP4000/NFP6000 PCIe Card Probe
[ 39.848943] nfp 0000:04:00.0: 63.008 Gb/s available bandwidth (8 GT/s x8 link)
[ 39.857146] nfp 0000:04:00.0: RESERVED BARs: 0.0: General/MSI-X SRAM, 0.1: PCIe XPB/MSI-X PBA, 0.4: Explicit0, 0.5: Explicit1, fre4
It's not a 63Gbps NIC... I'm sorry if this was discussed before and I
didn't find it. Would it make sense to add the "PCIe: " prefix to the
message like bnx2x used to do? Like:
nfp 0000:04:00.0: PCIe: 63.008 Gb/s available bandwidth (8 GT/s x8 link)
Sorry for a very late comment.
^ permalink raw reply
* Re: [RFC v2] virtio: support packed ring
From: Jason Wang @ 2018-04-13 4:30 UTC (permalink / raw)
To: Tiwei Bie, mst, wexu, virtualization, linux-kernel, netdev; +Cc: jfreimann
In-Reply-To: <20180401141216.8969-1-tiwei.bie@intel.com>
On 2018年04月01日 22:12, Tiwei Bie wrote:
> Hello everyone,
>
> This RFC implements packed ring support for virtio driver.
>
> The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
> by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
> Minor changes are needed for the vhost code, e.g. to kick the guest.
>
> TODO:
> - Refinements and bug fixes;
> - Split into small patches;
> - Test indirect descriptor support;
> - Test/fix event suppression support;
> - Test devices other than net;
>
> RFC v1 -> RFC v2:
> - Add indirect descriptor support - compile test only;
> - Add event suppression supprt - compile test only;
> - Move vring_packed_init() out of uapi (Jason, MST);
> - Merge two loops into one in virtqueue_add_packed() (Jason);
> - Split vring_unmap_one() for packed ring and split ring (Jason);
> - Avoid using '%' operator (Jason);
> - Rename free_head -> next_avail_idx (Jason);
> - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> - Some other refinements and bug fixes;
>
> Thanks!
>
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
> drivers/virtio/virtio_ring.c | 1094 +++++++++++++++++++++++++++++-------
> include/linux/virtio_ring.h | 8 +-
> include/uapi/linux/virtio_config.h | 12 +-
> include/uapi/linux/virtio_ring.h | 61 ++
> 4 files changed, 980 insertions(+), 195 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 71458f493cf8..0515dca34d77 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -58,14 +58,15 @@
>
> struct vring_desc_state {
> void *data; /* Data for callback. */
> - struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> + void *indir_desc; /* Indirect descriptor, if any. */
> + int num; /* Descriptor list length. */
> };
>
> struct vring_virtqueue {
> struct virtqueue vq;
>
> - /* Actual memory layout for this queue */
> - struct vring vring;
> + /* Is this a packed ring? */
> + bool packed;
>
> /* Can we use weak barriers? */
> bool weak_barriers;
> @@ -79,19 +80,45 @@ struct vring_virtqueue {
> /* Host publishes avail event idx */
> bool event;
>
> - /* Head of free buffer list. */
> - unsigned int free_head;
> /* Number we've added since last sync. */
> unsigned int num_added;
>
> /* Last used index we've seen. */
> u16 last_used_idx;
>
> - /* Last written value to avail->flags */
> - u16 avail_flags_shadow;
> + union {
> + /* Available for split ring */
> + struct {
> + /* Actual memory layout for this queue. */
> + struct vring vring;
>
> - /* Last written value to avail->idx in guest byte order */
> - u16 avail_idx_shadow;
> + /* Head of free buffer list. */
> + unsigned int free_head;
> +
> + /* Last written value to avail->flags */
> + u16 avail_flags_shadow;
> +
> + /* Last written value to avail->idx in
> + * guest byte order. */
> + u16 avail_idx_shadow;
> + };
> +
> + /* Available for packed ring */
> + struct {
> + /* Actual memory layout for this queue. */
> + struct vring_packed vring_packed;
> +
> + /* Driver ring wrap counter. */
> + u8 wrap_counter;
> +
> + /* Index of the next avail descriptor. */
> + unsigned int next_avail_idx;
> +
> + /* Last written value to driver->flags in
> + * guest byte order. */
> + u16 event_flags_shadow;
> + };
> + };
>
> /* How to notify other side. FIXME: commonalize hcalls! */
> bool (*notify)(struct virtqueue *vq);
> @@ -201,8 +228,33 @@ static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
> cpu_addr, size, direction);
> }
>
> -static void vring_unmap_one(const struct vring_virtqueue *vq,
> - struct vring_desc *desc)
> +static void vring_unmap_one_split(const struct vring_virtqueue *vq,
> + struct vring_desc *desc)
> +{
> + u16 flags;
> +
> + if (!vring_use_dma_api(vq->vq.vdev))
> + return;
> +
> + flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> +
> + if (flags & VRING_DESC_F_INDIRECT) {
> + dma_unmap_single(vring_dma_dev(vq),
> + virtio64_to_cpu(vq->vq.vdev, desc->addr),
> + virtio32_to_cpu(vq->vq.vdev, desc->len),
> + (flags & VRING_DESC_F_WRITE) ?
> + DMA_FROM_DEVICE : DMA_TO_DEVICE);
> + } else {
> + dma_unmap_page(vring_dma_dev(vq),
> + virtio64_to_cpu(vq->vq.vdev, desc->addr),
> + virtio32_to_cpu(vq->vq.vdev, desc->len),
> + (flags & VRING_DESC_F_WRITE) ?
> + DMA_FROM_DEVICE : DMA_TO_DEVICE);
> + }
> +}
> +
> +static void vring_unmap_one_packed(const struct vring_virtqueue *vq,
> + struct vring_packed_desc *desc)
> {
> u16 flags;
>
> @@ -235,8 +287,9 @@ static int vring_mapping_error(const struct vring_virtqueue *vq,
> return dma_mapping_error(vring_dma_dev(vq), addr);
> }
>
> -static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
> - unsigned int total_sg, gfp_t gfp)
> +static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> + unsigned int total_sg,
> + gfp_t gfp)
> {
> struct vring_desc *desc;
> unsigned int i;
> @@ -257,14 +310,32 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
> return desc;
> }
>
> -static inline int virtqueue_add(struct virtqueue *_vq,
> - struct scatterlist *sgs[],
> - unsigned int total_sg,
> - unsigned int out_sgs,
> - unsigned int in_sgs,
> - void *data,
> - void *ctx,
> - gfp_t gfp)
> +static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
> + unsigned int total_sg,
> + gfp_t gfp)
> +{
> + struct vring_packed_desc *desc;
> +
> + /*
> + * We require lowmem mappings for the descriptors because
> + * otherwise virt_to_phys will give us bogus addresses in the
> + * virtqueue.
> + */
> + gfp &= ~__GFP_HIGHMEM;
> +
> + desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
Can we simply check vq->packed here to avoid duplicating helpers?
> +
> + return desc;
> +}
> +
> +static inline int virtqueue_add_split(struct virtqueue *_vq,
> + struct scatterlist *sgs[],
> + unsigned int total_sg,
> + unsigned int out_sgs,
> + unsigned int in_sgs,
> + void *data,
> + void *ctx,
> + gfp_t gfp)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> struct scatterlist *sg;
> @@ -303,7 +374,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> /* If the host supports indirect descriptor tables, and we have multiple
> * buffers, then go indirect. FIXME: tune this threshold */
> if (vq->indirect && total_sg > 1 && vq->vq.num_free)
> - desc = alloc_indirect(_vq, total_sg, gfp);
> + desc = alloc_indirect_split(_vq, total_sg, gfp);
> else {
> desc = NULL;
> WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
> @@ -424,7 +495,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> for (n = 0; n < total_sg; n++) {
> if (i == err_idx)
> break;
> - vring_unmap_one(vq, &desc[i]);
> + vring_unmap_one_split(vq, &desc[i]);
> i = virtio16_to_cpu(_vq->vdev, vq->vring.desc[i].next);
> }
>
> @@ -435,6 +506,210 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> return -EIO;
> }
>
> +static inline int virtqueue_add_packed(struct virtqueue *_vq,
> + struct scatterlist *sgs[],
> + unsigned int total_sg,
> + unsigned int out_sgs,
> + unsigned int in_sgs,
> + void *data,
> + void *ctx,
> + gfp_t gfp)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + struct vring_packed_desc *desc;
> + struct scatterlist *sg;
> + unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
> + __virtio16 uninitialized_var(head_flags), flags;
> + int head, wrap_counter;
> + bool indirect;
> +
> + START_USE(vq);
> +
> + BUG_ON(data == NULL);
> + BUG_ON(ctx && vq->indirect);
> +
> + if (unlikely(vq->broken)) {
> + END_USE(vq);
> + return -EIO;
> + }
> +
> +#ifdef DEBUG
> + {
> + ktime_t now = ktime_get();
> +
> + /* No kick or get, with .1 second between? Warn. */
> + if (vq->last_add_time_valid)
> + WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
> + > 100);
> + vq->last_add_time = now;
> + vq->last_add_time_valid = true;
> + }
> +#endif
> +
> + BUG_ON(total_sg == 0);
> +
> + head = vq->next_avail_idx;
> + wrap_counter = vq->wrap_counter;
> +
> + /* If the host supports indirect descriptor tables, and we have multiple
> + * buffers, then go indirect. FIXME: tune this threshold */
> + if (vq->indirect && total_sg > 1 && vq->vq.num_free)
Let's introduce a helper like virtqueue_need_indirect() to avoid
duplicating codes and FIXME.
> + desc = alloc_indirect_packed(_vq, total_sg, gfp);
> + else {
> + desc = NULL;
> + WARN_ON_ONCE(total_sg > vq->vring_packed.num && !vq->indirect);
> + }
> +
> + if (desc) {
> + /* Use a single buffer which doesn't continue */
> + indirect = true;
> + /* Set up rest to use this indirect table. */
> + i = 0;
> + descs_used = 1;
> + } else {
> + indirect = false;
> + desc = vq->vring_packed.desc;
> + i = head;
> + descs_used = total_sg;
> + }
> +
> + if (vq->vq.num_free < descs_used) {
> + pr_debug("Can't add buf len %i - avail = %i\n",
> + descs_used, vq->vq.num_free);
> + /* FIXME: for historical reasons, we force a notify here if
> + * there are outgoing parts to the buffer. Presumably the
> + * host should service the ring ASAP. */
> + if (out_sgs)
> + vq->notify(&vq->vq);
> + if (indirect)
> + kfree(desc);
> + END_USE(vq);
> + return -ENOSPC;
> + }
> +
> + for (n = 0; n < out_sgs + in_sgs; n++) {
> + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> + dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
> + DMA_TO_DEVICE : DMA_FROM_DEVICE);
> + if (vring_mapping_error(vq, addr))
> + goto unmap_release;
> +
> + flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> + (n < out_sgs ? 0 : VRING_DESC_F_WRITE) |
> + VRING_DESC_F_AVAIL(vq->wrap_counter) |
> + VRING_DESC_F_USED(!vq->wrap_counter));
> + if (!indirect && i == head)
> + head_flags = flags;
> + else
> + desc[i].flags = flags;
> +
> + desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> + desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> + desc[i].id = cpu_to_virtio32(_vq->vdev, head);
Similar to V1, we only need this for the last descriptor.
> + prev = i;
It looks to me there's no need to track prev inside the loop here.
> + i++;
> + if (!indirect && i >= vq->vring_packed.num) {
> + i = 0;
> + vq->wrap_counter ^= 1;
> + }
> + }
> + }
> + /* Last one doesn't continue. */
> + if (total_sg == 1)
> + head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> + else
> + desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
The only case when prev != i - 1 is i == 0, we can add a if here.
> +
> + if (indirect) {
> + /* Now that the indirect table is filled in, map it. */
> + dma_addr_t addr = vring_map_single(
> + vq, desc, total_sg * sizeof(struct vring_packed_desc),
> + DMA_TO_DEVICE);
> + if (vring_mapping_error(vq, addr))
> + goto unmap_release;
> +
> + head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
> + VRING_DESC_F_AVAIL(wrap_counter) |
> + VRING_DESC_F_USED(!wrap_counter));
> + vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr);
> + vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
> + total_sg * sizeof(struct vring_packed_desc));
> + vq->vring_packed.desc[head].id = cpu_to_virtio32(_vq->vdev, head);
> + }
> +
> + /* We're using some buffers from the free list. */
> + vq->vq.num_free -= descs_used;
> +
> + /* Update free pointer */
> + if (indirect) {
> + n = head + 1;
> + if (n >= vq->vring_packed.num) {
> + n = 0;
> + vq->wrap_counter ^= 1;
> + }
> + vq->next_avail_idx = n;
> + } else
> + vq->next_avail_idx = i;
> +
> + /* Store token and indirect buffer state. */
> + vq->desc_state[head].num = descs_used;
> + vq->desc_state[head].data = data;
> + if (indirect)
> + vq->desc_state[head].indir_desc = desc;
> + else
> + vq->desc_state[head].indir_desc = ctx;
> +
> + /* A driver MUST NOT make the first descriptor in the list
> + * available before all subsequent descriptors comprising
> + * the list are made available. */
> + virtio_wmb(vq->weak_barriers);
> + vq->vring_packed.desc[head].flags = head_flags;
> + vq->num_added++;
> +
> + pr_debug("Added buffer head %i to %p\n", head, vq);
> + END_USE(vq);
> +
> + return 0;
> +
> +unmap_release:
> + err_idx = i;
> + i = head;
> +
> + for (n = 0; n < total_sg; n++) {
> + if (i == err_idx)
> + break;
> + vring_unmap_one_packed(vq, &desc[i]);
> + i++;
> + if (!indirect && i >= vq->vring_packed.num)
> + i = 0;
> + }
> +
> + vq->wrap_counter = wrap_counter;
> +
> + if (indirect)
> + kfree(desc);
> +
> + END_USE(vq);
> + return -EIO;
> +}
> +
> +static inline int virtqueue_add(struct virtqueue *_vq,
> + struct scatterlist *sgs[],
> + unsigned int total_sg,
> + unsigned int out_sgs,
> + unsigned int in_sgs,
> + void *data,
> + void *ctx,
> + gfp_t gfp)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> +
> + return vq->packed ? virtqueue_add_packed(_vq, sgs, total_sg, out_sgs,
> + in_sgs, data, ctx, gfp) :
> + virtqueue_add_split(_vq, sgs, total_sg, out_sgs,
> + in_sgs, data, ctx, gfp);
> +}
> +
> /**
> * virtqueue_add_sgs - expose buffers to other end
> * @vq: the struct virtqueue we're talking about.
> @@ -537,18 +812,7 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
> }
> EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
>
> -/**
> - * virtqueue_kick_prepare - first half of split virtqueue_kick call.
> - * @vq: the struct virtqueue
> - *
> - * Instead of virtqueue_kick(), you can do:
> - * if (virtqueue_kick_prepare(vq))
> - * virtqueue_notify(vq);
> - *
> - * This is sometimes useful because the virtqueue_kick_prepare() needs
> - * to be serialized, but the actual virtqueue_notify() call does not.
> - */
> -bool virtqueue_kick_prepare(struct virtqueue *_vq)
> +static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> u16 new, old;
> @@ -580,6 +844,62 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
> END_USE(vq);
> return needs_kick;
> }
> +
> +static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + u16 new, old, off_wrap;
> + bool needs_kick;
> +
> + START_USE(vq);
> + /* We need to expose the new flags value before checking notification
> + * suppressions. */
> + virtio_mb(vq->weak_barriers);
> +
> + old = vq->next_avail_idx - vq->num_added;
> + new = vq->next_avail_idx;
> + vq->num_added = 0;
> +
> +#ifdef DEBUG
> + if (vq->last_add_time_valid) {
> + WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
> + vq->last_add_time)) > 100);
> + }
> + vq->last_add_time_valid = false;
> +#endif
> +
> + off_wrap = virtio16_to_cpu(_vq->vdev, vq->vring_packed.device->off_wrap);
> +
> + if (vq->event) {
It looks to me we should examine RING_EVENT_FLAGS_DESC in
desc_event_flags instead of vq->event here. Spec does not forces to use
evenf_off and event_wrap if event index is enabled.
> + // FIXME: fix this!
> + needs_kick = ((off_wrap >> 15) == vq->wrap_counter) &&
> + vring_need_event(off_wrap & ~(1<<15), new, old);
Why need a & here?
> + } else {
Need a smp_rmb() to make sure desc_event_flags was checked before flags.
> + needs_kick = (vq->vring_packed.device->flags !=
> + cpu_to_virtio16(_vq->vdev, VRING_EVENT_F_DISABLE));
> + }
> + END_USE(vq);
> + return needs_kick;
> +}
> +
> +/**
> + * virtqueue_kick_prepare - first half of split virtqueue_kick call.
> + * @vq: the struct virtqueue
> + *
> + * Instead of virtqueue_kick(), you can do:
> + * if (virtqueue_kick_prepare(vq))
> + * virtqueue_notify(vq);
> + *
> + * This is sometimes useful because the virtqueue_kick_prepare() needs
> + * to be serialized, but the actual virtqueue_notify() call does not.
> + */
> +bool virtqueue_kick_prepare(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> +
> + return vq->packed ? virtqueue_kick_prepare_packed(_vq) :
> + virtqueue_kick_prepare_split(_vq);
> +}
> EXPORT_SYMBOL_GPL(virtqueue_kick_prepare);
>
> /**
> @@ -626,8 +946,8 @@ bool virtqueue_kick(struct virtqueue *vq)
> }
> EXPORT_SYMBOL_GPL(virtqueue_kick);
>
> -static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
> - void **ctx)
> +static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> + void **ctx)
> {
> unsigned int i, j;
> __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> @@ -639,12 +959,12 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
> i = head;
>
> while (vq->vring.desc[i].flags & nextflag) {
> - vring_unmap_one(vq, &vq->vring.desc[i]);
> + vring_unmap_one_split(vq, &vq->vring.desc[i]);
> i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next);
> vq->vq.num_free++;
> }
>
> - vring_unmap_one(vq, &vq->vring.desc[i]);
> + vring_unmap_one_split(vq, &vq->vring.desc[i]);
> vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
> vq->free_head = head;
>
> @@ -666,7 +986,7 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
> BUG_ON(len == 0 || len % sizeof(struct vring_desc));
>
> for (j = 0; j < len / sizeof(struct vring_desc); j++)
> - vring_unmap_one(vq, &indir_desc[j]);
> + vring_unmap_one_split(vq, &indir_desc[j]);
>
> kfree(indir_desc);
> vq->desc_state[head].indir_desc = NULL;
> @@ -675,11 +995,207 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
> }
> }
>
> -static inline bool more_used(const struct vring_virtqueue *vq)
> +static int detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
> + void **ctx)
> +{
> + struct vring_packed_desc *desc;
> + unsigned int i, j;
> +
> + /* Clear data ptr. */
> + vq->desc_state[head].data = NULL;
> +
> + i = head;
> +
> + for (j = 0; j < vq->desc_state[head].num; j++) {
> + desc = &vq->vring_packed.desc[i];
> + vring_unmap_one_packed(vq, desc);
> + desc->flags = 0x0;
Looks like this is unnecessary.
> + i++;
> + if (i >= vq->vring_packed.num)
> + i = 0;
> + }
> +
> + vq->vq.num_free += vq->desc_state[head].num;
> +
> + if (vq->indirect) {
> + u32 len;
> +
> + desc = vq->desc_state[head].indir_desc;
> + /* Free the indirect table, if any, now that it's unmapped. */
> + if (!desc)
> + goto out;
> +
> + len = virtio32_to_cpu(vq->vq.vdev,
> + vq->vring_packed.desc[head].len);
> +
> + BUG_ON(!(vq->vring_packed.desc[head].flags &
> + cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
> + BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
> +
> + for (j = 0; j < len / sizeof(struct vring_packed_desc); j++)
> + vring_unmap_one_packed(vq, &desc[j]);
> +
> + kfree(desc);
> + vq->desc_state[head].indir_desc = NULL;
> + } else if (ctx) {
> + *ctx = vq->desc_state[head].indir_desc;
> + }
> +
> +out:
> + return vq->desc_state[head].num;
> +}
> +
> +static inline bool more_used_split(const struct vring_virtqueue *vq)
> {
> return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
> }
>
> +static inline bool more_used_packed(const struct vring_virtqueue *vq)
> +{
> + u16 last_used, flags;
> + bool avail, used;
> +
> + if (vq->vq.num_free == vq->vring_packed.num)
> + return false;
> +
> + last_used = vq->last_used_idx;
> + flags = virtio16_to_cpu(vq->vq.vdev,
> + vq->vring_packed.desc[last_used].flags);
> + avail = flags & VRING_DESC_F_AVAIL(1);
> + used = flags & VRING_DESC_F_USED(1);
> +
> + return avail == used;
> +}
> +
> +static inline bool more_used(const struct vring_virtqueue *vq)
> +{
> + return vq->packed ? more_used_packed(vq) : more_used_split(vq);
> +}
> +
> +void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, unsigned int *len,
> + void **ctx)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + void *ret;
> + unsigned int i;
> + u16 last_used;
> +
> + START_USE(vq);
> +
> + if (unlikely(vq->broken)) {
> + END_USE(vq);
> + return NULL;
> + }
> +
> + if (!more_used(vq)) {
> + pr_debug("No more buffers in queue\n");
> + END_USE(vq);
> + return NULL;
> + }
> +
> + /* Only get used array entries after they have been exposed by host. */
> + virtio_rmb(vq->weak_barriers);
> +
> + last_used = (vq->last_used_idx & (vq->vring.num - 1));
> + i = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].id);
> + *len = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].len);
> +
> + if (unlikely(i >= vq->vring.num)) {
> + BAD_RING(vq, "id %u out of range\n", i);
> + return NULL;
> + }
> + if (unlikely(!vq->desc_state[i].data)) {
> + BAD_RING(vq, "id %u is not a head!\n", i);
> + return NULL;
> + }
> +
> + /* detach_buf_split clears data, so grab it now. */
> + ret = vq->desc_state[i].data;
> + detach_buf_split(vq, i, ctx);
> + vq->last_used_idx++;
> + /* If we expect an interrupt for the next entry, tell host
> + * by writing event index and flush out the write before
> + * the read in the next get_buf call. */
> + if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
> + virtio_store_mb(vq->weak_barriers,
> + &vring_used_event(&vq->vring),
> + cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
> +
> +#ifdef DEBUG
> + vq->last_add_time_valid = false;
> +#endif
> +
> + END_USE(vq);
> + return ret;
> +}
> +
> +void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq, unsigned int *len,
> + void **ctx)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + uint16_t wrap_counter;
> + void *ret;
> + unsigned int i;
> + u16 last_used;
> +
> + START_USE(vq);
> +
> + if (unlikely(vq->broken)) {
> + END_USE(vq);
> + return NULL;
> + }
> +
> + if (!more_used(vq)) {
> + pr_debug("No more buffers in queue\n");
> + END_USE(vq);
> + return NULL;
> + }
> +
> + /* Only get used elements after they have been exposed by host. */
> + virtio_rmb(vq->weak_barriers);
> +
> + last_used = vq->last_used_idx;
> + i = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].id);
> + *len = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].len);
> +
> + if (unlikely(i >= vq->vring_packed.num)) {
> + BAD_RING(vq, "id %u out of range\n", i);
> + return NULL;
> + }
> + if (unlikely(!vq->desc_state[i].data)) {
> + BAD_RING(vq, "id %u is not a head!\n", i);
> + return NULL;
> + }
> +
> + /* detach_buf_packed clears data, so grab it now. */
> + ret = vq->desc_state[i].data;
> + detach_buf_packed(vq, i, ctx);
> +
> + vq->last_used_idx += vq->desc_state[i].num;
> + if (vq->last_used_idx >= vq->vring_packed.num)
> + vq->last_used_idx -= vq->vring_packed.num;
> +
> + wrap_counter = vq->wrap_counter;
> + if (vq->last_used_idx > vq->next_avail_idx)
> + wrap_counter ^= 1;
> +
> + /* If we expect an interrupt for the next entry, tell host
> + * by writing event index and flush out the write before
> + * the read in the next get_buf call. */
> + if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
> + virtio_store_mb(vq->weak_barriers,
> + &vq->vring_packed.driver->off_wrap,
> + cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
> + wrap_counter << 15));
> +
> +#ifdef DEBUG
> + vq->last_add_time_valid = false;
> +#endif
> +
> + END_USE(vq);
> + return ret;
> +}
> +
> /**
> * virtqueue_get_buf - get the next used buffer
> * @vq: the struct virtqueue we're talking about.
> @@ -700,57 +1216,9 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> void **ctx)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> - void *ret;
> - unsigned int i;
> - u16 last_used;
>
> - START_USE(vq);
> -
> - if (unlikely(vq->broken)) {
> - END_USE(vq);
> - return NULL;
> - }
> -
> - if (!more_used(vq)) {
> - pr_debug("No more buffers in queue\n");
> - END_USE(vq);
> - return NULL;
> - }
> -
> - /* Only get used array entries after they have been exposed by host. */
> - virtio_rmb(vq->weak_barriers);
> -
> - last_used = (vq->last_used_idx & (vq->vring.num - 1));
> - i = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].id);
> - *len = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].len);
> -
> - if (unlikely(i >= vq->vring.num)) {
> - BAD_RING(vq, "id %u out of range\n", i);
> - return NULL;
> - }
> - if (unlikely(!vq->desc_state[i].data)) {
> - BAD_RING(vq, "id %u is not a head!\n", i);
> - return NULL;
> - }
> -
> - /* detach_buf clears data, so grab it now. */
> - ret = vq->desc_state[i].data;
> - detach_buf(vq, i, ctx);
> - vq->last_used_idx++;
> - /* If we expect an interrupt for the next entry, tell host
> - * by writing event index and flush out the write before
> - * the read in the next get_buf call. */
> - if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
> - virtio_store_mb(vq->weak_barriers,
> - &vring_used_event(&vq->vring),
> - cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
> -
> -#ifdef DEBUG
> - vq->last_add_time_valid = false;
> -#endif
> -
> - END_USE(vq);
> - return ret;
> + return vq->packed ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
> + virtqueue_get_buf_ctx_split(_vq, len, ctx);
> }
> EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
>
> @@ -759,6 +1227,29 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
> return virtqueue_get_buf_ctx(_vq, len, NULL);
> }
> EXPORT_SYMBOL_GPL(virtqueue_get_buf);
> +
> +static void virtqueue_disable_cb_split(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> +
> + if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> + vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> + if (!vq->event)
> + vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> + }
> +}
> +
> +static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> +
> + if (vq->event_flags_shadow != VRING_EVENT_F_DISABLE) {
> + vq->event_flags_shadow = VRING_EVENT_F_DISABLE;
> + vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> + vq->event_flags_shadow);
> + }
> +}
> +
> /**
> * virtqueue_disable_cb - disable callbacks
> * @vq: the struct virtqueue we're talking about.
> @@ -772,15 +1263,66 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
>
> - if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> - vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> - if (!vq->event)
> - vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> - }
> -
> + if (vq->packed)
> + virtqueue_disable_cb_packed(_vq);
> + else
> + virtqueue_disable_cb_split(_vq);
> }
> EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
>
> +static unsigned virtqueue_enable_cb_prepare_split(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + u16 last_used_idx;
> +
> + START_USE(vq);
> +
> + /* We optimistically turn back on interrupts, then check if there was
> + * more to do. */
> + /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> + * either clear the flags bit or point the event index at the next
> + * entry. Always do both to keep code simple. */
> + if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
> + vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
> + if (!vq->event)
> + vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> + }
> + vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
> + END_USE(vq);
> + return last_used_idx;
> +}
> +
> +static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + u16 last_used_idx, wrap_counter, off_wrap;
> +
> + START_USE(vq);
> +
> + last_used_idx = vq->last_used_idx;
> + wrap_counter = vq->wrap_counter;
> +
> + if (last_used_idx > vq->next_avail_idx)
> + wrap_counter ^= 1;
> +
> + off_wrap = last_used_idx | (wrap_counter << 15);
> +
> + /* We optimistically turn back on interrupts, then check if there was
> + * more to do. */
> + /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> + * either clear the flags bit or point the event index at the next
> + * entry. Always do both to keep code simple. */
> + if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC:
> + VRING_EVENT_F_ENABLE;
> + vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> + vq->event_flags_shadow);
> + }
A smp_wmb() is missed here?
> + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev, off_wrap);
And according to the spec, it looks to me write a VRING_EVENT_F_ENABLE
is sufficient here.
> + END_USE(vq);
> + return last_used_idx;
> +}
> +
> /**
> * virtqueue_enable_cb_prepare - restart callbacks after disable_cb
> * @vq: the struct virtqueue we're talking about.
> @@ -796,26 +1338,34 @@ EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
> unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> - u16 last_used_idx;
>
> - START_USE(vq);
> -
> - /* We optimistically turn back on interrupts, then check if there was
> - * more to do. */
> - /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> - * either clear the flags bit or point the event index at the next
> - * entry. Always do both to keep code simple. */
> - if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
> - vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
> - if (!vq->event)
> - vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> - }
> - vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
> - END_USE(vq);
> - return last_used_idx;
> + return vq->packed ? virtqueue_enable_cb_prepare_packed(_vq) :
> + virtqueue_enable_cb_prepare_split(_vq);
> }
> EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
>
> +static bool virtqueue_poll_split(struct virtqueue *_vq, unsigned last_used_idx)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> +
> + virtio_mb(vq->weak_barriers);
> + return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
> +}
> +
> +static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + bool avail, used;
> + u16 flags;
> +
> + virtio_mb(vq->weak_barriers);
> + flags = virtio16_to_cpu(vq->vq.vdev,
> + vq->vring_packed.desc[last_used_idx].flags);
> + avail = flags & VRING_DESC_F_AVAIL(1);
> + used = flags & VRING_DESC_F_USED(1);
> + return avail == used;
> +}
> +
> /**
> * virtqueue_poll - query pending used buffers
> * @vq: the struct virtqueue we're talking about.
> @@ -829,8 +1379,8 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
>
> - virtio_mb(vq->weak_barriers);
> - return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
> + return vq->packed ? virtqueue_poll_packed(_vq, last_used_idx) :
> + virtqueue_poll_split(_vq, last_used_idx);
> }
> EXPORT_SYMBOL_GPL(virtqueue_poll);
>
> @@ -852,6 +1402,83 @@ bool virtqueue_enable_cb(struct virtqueue *_vq)
> }
> EXPORT_SYMBOL_GPL(virtqueue_enable_cb);
>
> +static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + u16 bufs;
> +
> + START_USE(vq);
> +
> + /* We optimistically turn back on interrupts, then check if there was
> + * more to do. */
> + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> + * either clear the flags bit or point the event index at the next
> + * entry. Always update the event index to keep code simple. */
> + if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
> + vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
> + if (!vq->event)
> + vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> + }
> + /* TODO: tune this threshold */
> + bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4;
> +
> + virtio_store_mb(vq->weak_barriers,
> + &vring_used_event(&vq->vring),
> + cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
> +
> + if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
> + END_USE(vq);
> + return false;
> + }
> +
> + END_USE(vq);
> + return true;
> +}
> +
> +static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + u16 bufs, off_wrap, used_idx, wrap_counter;
> +
> + START_USE(vq);
> +
> + /* We optimistically turn back on interrupts, then check if there was
> + * more to do. */
> + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> + * either clear the flags bit or point the event index at the next
> + * entry. Always update the event index to keep code simple. */
> + if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC:
> + VRING_EVENT_F_ENABLE;
> + vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> + vq->event_flags_shadow);
> + }
> +
> + /* TODO: tune this threshold */
> + bufs = (u16)(vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
> +
> + used_idx = vq->last_used_idx + bufs;
> + if (used_idx >= vq->vring_packed.num)
> + used_idx -= vq->vring_packed.num;
> +
> + wrap_counter = vq->wrap_counter;
> + if (used_idx > vq->next_avail_idx)
> + wrap_counter ^= 1;
> +
> + off_wrap = used_idx | (wrap_counter << 15);
> +
> + virtio_store_mb(vq->weak_barriers, &vq->vring_packed.driver->off_wrap,
> + cpu_to_virtio16(_vq->vdev, off_wrap));
> +
> + if (more_used_packed(vq)) {
> + END_USE(vq);
> + return false;
> + }
> +
> + END_USE(vq);
> + return true;
> +}
> +
> /**
> * virtqueue_enable_cb_delayed - restart callbacks after disable_cb.
> * @vq: the struct virtqueue we're talking about.
> @@ -868,37 +1495,69 @@ EXPORT_SYMBOL_GPL(virtqueue_enable_cb);
> bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> - u16 bufs;
>
> - START_USE(vq);
> -
> - /* We optimistically turn back on interrupts, then check if there was
> - * more to do. */
> - /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> - * either clear the flags bit or point the event index at the next
> - * entry. Always update the event index to keep code simple. */
> - if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
> - vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
> - if (!vq->event)
> - vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> - }
> - /* TODO: tune this threshold */
> - bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4;
> -
> - virtio_store_mb(vq->weak_barriers,
> - &vring_used_event(&vq->vring),
> - cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
> -
> - if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
> - END_USE(vq);
> - return false;
> - }
> -
> - END_USE(vq);
> - return true;
> + return vq->packed ? virtqueue_enable_cb_delayed_packed(_vq) :
> + virtqueue_enable_cb_delayed_split(_vq);
> }
> EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
>
> +static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + unsigned int i;
> + void *buf;
> +
> + START_USE(vq);
> +
> + for (i = 0; i < vq->vring.num; i++) {
> + if (!vq->desc_state[i].data)
> + continue;
> + /* detach_buf clears data, so grab it now. */
> + buf = vq->desc_state[i].data;
> + detach_buf_split(vq, i, NULL);
> + vq->avail_idx_shadow--;
> + vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
> + END_USE(vq);
> + return buf;
> + }
> + /* That should have freed everything. */
> + BUG_ON(vq->vq.num_free != vq->vring.num);
> +
> + END_USE(vq);
> + return NULL;
> +}
> +
> +static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + unsigned int i, num;
> + void *buf;
> +
> + START_USE(vq);
> +
> + for (i = 0; i < vq->vring_packed.num; i++) {
> + if (!vq->desc_state[i].data)
> + continue;
> + /* detach_buf clears data, so grab it now. */
> + buf = vq->desc_state[i].data;
> + num = detach_buf_packed(vq, i, NULL);
> + if (vq->next_avail_idx < num) {
> + vq->next_avail_idx = vq->vring_packed.num -
> + (num - vq->next_avail_idx);
> + vq->wrap_counter ^= 1;
> + } else {
> + vq->next_avail_idx -= num;
> + }
> + END_USE(vq);
> + return buf;
> + }
> + /* That should have freed everything. */
> + BUG_ON(vq->vq.num_free != vq->vring_packed.num);
> +
> + END_USE(vq);
> + return NULL;
> +}
> +
> /**
> * virtqueue_detach_unused_buf - detach first unused buffer
> * @vq: the struct virtqueue we're talking about.
> @@ -910,27 +1569,9 @@ EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
> void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> - unsigned int i;
> - void *buf;
>
> - START_USE(vq);
> -
> - for (i = 0; i < vq->vring.num; i++) {
> - if (!vq->desc_state[i].data)
> - continue;
> - /* detach_buf clears data, so grab it now. */
> - buf = vq->desc_state[i].data;
> - detach_buf(vq, i, NULL);
> - vq->avail_idx_shadow--;
> - vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
> - END_USE(vq);
> - return buf;
> - }
> - /* That should have freed everything. */
> - BUG_ON(vq->vq.num_free != vq->vring.num);
> -
> - END_USE(vq);
> - return NULL;
> + return vq->packed ? virtqueue_detach_unused_buf_packed(_vq) :
> + virtqueue_detach_unused_buf_split(_vq);
> }
> EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
>
> @@ -955,7 +1596,8 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> EXPORT_SYMBOL_GPL(vring_interrupt);
>
> struct virtqueue *__vring_new_virtqueue(unsigned int index,
> - struct vring vring,
> + union vring_union vring,
> + bool packed,
> struct virtio_device *vdev,
> bool weak_barriers,
> bool context,
> @@ -963,19 +1605,20 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> void (*callback)(struct virtqueue *),
> const char *name)
> {
> - unsigned int i;
> + unsigned int num, i;
> struct vring_virtqueue *vq;
>
> - vq = kmalloc(sizeof(*vq) + vring.num * sizeof(struct vring_desc_state),
> + num = packed ? vring.vring_packed.num : vring.vring_split.num;
> +
> + vq = kmalloc(sizeof(*vq) + num * sizeof(struct vring_desc_state),
> GFP_KERNEL);
> if (!vq)
> return NULL;
>
> - vq->vring = vring;
> vq->vq.callback = callback;
> vq->vq.vdev = vdev;
> vq->vq.name = name;
> - vq->vq.num_free = vring.num;
> + vq->vq.num_free = num;
> vq->vq.index = index;
> vq->we_own_ring = false;
> vq->queue_dma_addr = 0;
> @@ -984,9 +1627,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> vq->weak_barriers = weak_barriers;
> vq->broken = false;
> vq->last_used_idx = 0;
> - vq->avail_flags_shadow = 0;
> - vq->avail_idx_shadow = 0;
> vq->num_added = 0;
> + vq->packed = packed;
> list_add_tail(&vq->vq.list, &vdev->vqs);
> #ifdef DEBUG
> vq->in_use = false;
> @@ -997,18 +1639,37 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> !context;
> vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
>
> + if (vq->packed) {
> + vq->vring_packed = vring.vring_packed;
> + vq->next_avail_idx = 0;
> + vq->wrap_counter = 1;
> + vq->event_flags_shadow = 0;
> + } else {
> + vq->vring = vring.vring_split;
> + vq->avail_flags_shadow = 0;
> + vq->avail_idx_shadow = 0;
> +
> + /* Put everything in free lists. */
> + vq->free_head = 0;
> + for (i = 0; i < num-1; i++)
> + vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
> + }
> +
> /* No callback? Tell other side not to bother us. */
> if (!callback) {
> - vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> - if (!vq->event)
> - vq->vring.avail->flags = cpu_to_virtio16(vdev, vq->avail_flags_shadow);
> + if (packed) {
> + vq->event_flags_shadow = VRING_EVENT_F_DISABLE;
> + vq->vring_packed.driver->flags = cpu_to_virtio16(vdev,
> + vq->event_flags_shadow);
> + } else {
> + vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> + if (!vq->event)
> + vq->vring.avail->flags = cpu_to_virtio16(vdev,
> + vq->avail_flags_shadow);
> + }
> }
>
> - /* Put everything in free lists. */
> - vq->free_head = 0;
> - for (i = 0; i < vring.num-1; i++)
> - vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
> - memset(vq->desc_state, 0, vring.num * sizeof(struct vring_desc_state));
> + memset(vq->desc_state, 0, num * sizeof(struct vring_desc_state));
>
> return &vq->vq;
> }
> @@ -1056,6 +1717,22 @@ static void vring_free_queue(struct virtio_device *vdev, size_t size,
> }
> }
>
> +static inline int
> +__vring_size(unsigned int num, unsigned long align, bool packed)
> +{
> + return packed ? vring_packed_size(num, align) : vring_size(num, align);
> +}
> +
> +static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
> + void *p, unsigned long align)
> +{
> + vr->num = num;
> + vr->desc = p;
> + vr->driver = (void *)(((uintptr_t)p + sizeof(struct vring_packed_desc)
> + * num + align - 1) & ~(align - 1));
> + vr->device = vr->driver + 1;
> +}
> +
> struct virtqueue *vring_create_virtqueue(
> unsigned int index,
> unsigned int num,
> @@ -1072,7 +1749,8 @@ struct virtqueue *vring_create_virtqueue(
> void *queue = NULL;
> dma_addr_t dma_addr;
> size_t queue_size_in_bytes;
> - struct vring vring;
> + union vring_union vring;
> + bool packed;
>
> /* We assume num is a power of 2. */
> if (num & (num - 1)) {
> @@ -1080,9 +1758,13 @@ struct virtqueue *vring_create_virtqueue(
> return NULL;
> }
>
> + packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> +
> /* TODO: allocate each queue chunk individually */
> - for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) {
> - queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> + for (; num && __vring_size(num, vring_align, packed) > PAGE_SIZE;
> + num /= 2) {
> + queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> + packed),
> &dma_addr,
> GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
> if (queue)
> @@ -1094,17 +1776,21 @@ struct virtqueue *vring_create_virtqueue(
>
> if (!queue) {
> /* Try to get a single page. You are my only hope! */
> - queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> + queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> + packed),
> &dma_addr, GFP_KERNEL|__GFP_ZERO);
> }
> if (!queue)
> return NULL;
>
> - queue_size_in_bytes = vring_size(num, vring_align);
> - vring_init(&vring, num, queue, vring_align);
> + queue_size_in_bytes = __vring_size(num, vring_align, packed);
> + if (packed)
> + vring_packed_init(&vring.vring_packed, num, queue, vring_align);
> + else
> + vring_init(&vring.vring_split, num, queue, vring_align);
>
> - vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> - notify, callback, name);
> + vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> + context, notify, callback, name);
> if (!vq) {
> vring_free_queue(vdev, queue_size_in_bytes, queue,
> dma_addr);
> @@ -1130,10 +1816,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
> void (*callback)(struct virtqueue *vq),
> const char *name)
> {
> - struct vring vring;
> - vring_init(&vring, num, pages, vring_align);
> - return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> - notify, callback, name);
> + union vring_union vring;
> + bool packed;
> +
> + packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> + if (packed)
> + vring_packed_init(&vring.vring_packed, num, pages, vring_align);
> + else
> + vring_init(&vring.vring_split, num, pages, vring_align);
> +
> + return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> + context, notify, callback, name);
> }
> EXPORT_SYMBOL_GPL(vring_new_virtqueue);
>
> @@ -1143,7 +1836,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
>
> if (vq->we_own_ring) {
> vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
> - vq->vring.desc, vq->queue_dma_addr);
> + vq->packed ? (void *)vq->vring_packed.desc :
> + (void *)vq->vring.desc,
> + vq->queue_dma_addr);
> }
> list_del(&_vq->list);
> kfree(vq);
> @@ -1157,14 +1852,18 @@ void vring_transport_features(struct virtio_device *vdev)
>
> for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
> switch (i) {
> - case VIRTIO_RING_F_INDIRECT_DESC:
> +#if 0
> + case VIRTIO_RING_F_INDIRECT_DESC: // FIXME not tested yet.
> break;
> - case VIRTIO_RING_F_EVENT_IDX:
> + case VIRTIO_RING_F_EVENT_IDX: // FIXME probably not work.
> break;
> +#endif
It would be better if you can split EVENT_IDX and INDIRECT_DESC into
separate patches too.
Thanks
> case VIRTIO_F_VERSION_1:
> break;
> case VIRTIO_F_IOMMU_PLATFORM:
> break;
> + case VIRTIO_F_RING_PACKED:
> + break;
> default:
> /* We don't understand this bit. */
> __virtio_clear_bit(vdev, i);
> @@ -1185,7 +1884,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
>
> struct vring_virtqueue *vq = to_vvq(_vq);
>
> - return vq->vring.num;
> + return vq->packed ? vq->vring_packed.num : vq->vring.num;
> }
> EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
>
> @@ -1228,6 +1927,10 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
>
> BUG_ON(!vq->we_own_ring);
>
> + if (vq->packed)
> + return vq->queue_dma_addr + ((char *)vq->vring_packed.driver -
> + (char *)vq->vring_packed.desc);
> +
> return vq->queue_dma_addr +
> ((char *)vq->vring.avail - (char *)vq->vring.desc);
> }
> @@ -1239,11 +1942,16 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
>
> BUG_ON(!vq->we_own_ring);
>
> + if (vq->packed)
> + return vq->queue_dma_addr + ((char *)vq->vring_packed.device -
> + (char *)vq->vring_packed.desc);
> +
> return vq->queue_dma_addr +
> ((char *)vq->vring.used - (char *)vq->vring.desc);
> }
> EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
>
> +/* Only available for split ring */
> const struct vring *virtqueue_get_vring(struct virtqueue *vq)
> {
> return &to_vvq(vq)->vring;
> diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> index bbf32524ab27..a0075894ad16 100644
> --- a/include/linux/virtio_ring.h
> +++ b/include/linux/virtio_ring.h
> @@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
> struct virtio_device;
> struct virtqueue;
>
> +union vring_union {
> + struct vring vring_split;
> + struct vring_packed vring_packed;
> +};
> +
> /*
> * Creates a virtqueue and allocates the descriptor ring. If
> * may_reduce_num is set, then this may allocate a smaller ring than
> @@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
>
> /* Creates a virtqueue with a custom layout. */
> struct virtqueue *__vring_new_virtqueue(unsigned int index,
> - struct vring vring,
> + union vring_union vring,
> + bool packed,
> struct virtio_device *vdev,
> bool weak_barriers,
> bool ctx,
> diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> index 308e2096291f..a6e392325e3a 100644
> --- a/include/uapi/linux/virtio_config.h
> +++ b/include/uapi/linux/virtio_config.h
> @@ -49,7 +49,7 @@
> * transport being used (eg. virtio_ring), the rest are per-device feature
> * bits. */
> #define VIRTIO_TRANSPORT_F_START 28
> -#define VIRTIO_TRANSPORT_F_END 34
> +#define VIRTIO_TRANSPORT_F_END 36
>
> #ifndef VIRTIO_CONFIG_NO_LEGACY
> /* Do we get callbacks when the ring is completely used, even if we've
> @@ -71,4 +71,14 @@
> * this is for compatibility with legacy systems.
> */
> #define VIRTIO_F_IOMMU_PLATFORM 33
> +
> +/* This feature indicates support for the packed virtqueue layout. */
> +#define VIRTIO_F_RING_PACKED 34
> +
> +/*
> + * This feature indicates that all buffers are used by the device
> + * in the same order in which they have been made available.
> + */
> +#define VIRTIO_F_IN_ORDER 35
> +
> #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
> index 6d5d5faa989b..735d4207c988 100644
> --- a/include/uapi/linux/virtio_ring.h
> +++ b/include/uapi/linux/virtio_ring.h
> @@ -44,6 +44,9 @@
> /* This means the buffer contains a list of buffer descriptors. */
> #define VRING_DESC_F_INDIRECT 4
>
> +#define VRING_DESC_F_AVAIL(b) ((b) << 7)
> +#define VRING_DESC_F_USED(b) ((b) << 15)
> +
> /* The Host uses this in used->flags to advise the Guest: don't kick me when
> * you add a buffer. It's unreliable, so it's simply an optimization. Guest
> * will still kick if it's out of buffers. */
> @@ -53,6 +56,10 @@
> * optimization. */
> #define VRING_AVAIL_F_NO_INTERRUPT 1
>
> +#define VRING_EVENT_F_ENABLE 0x0
> +#define VRING_EVENT_F_DISABLE 0x1
> +#define VRING_EVENT_F_DESC 0x2
> +
> /* We support indirect buffer descriptors */
> #define VIRTIO_RING_F_INDIRECT_DESC 28
>
> @@ -171,4 +178,58 @@ static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
> return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
> }
>
> +struct vring_packed_desc_event {
> + /* __virtio16 off : 15; // Descriptor Event Offset
> + * __virtio16 wrap : 1; // Descriptor Event Wrap Counter */
> + __virtio16 off_wrap;
> + /* __virtio16 flags : 2; // Descriptor Event Flags */
> + __virtio16 flags;
> +};
> +
> +struct vring_packed_desc {
> + /* Buffer Address. */
> + __virtio64 addr;
> + /* Buffer Length. */
> + __virtio32 len;
> + /* Buffer ID. */
> + __virtio16 id;
> + /* The flags depending on descriptor type. */
> + __virtio16 flags;
> +};
> +
> +struct vring_packed {
> + unsigned int num;
> +
> + struct vring_packed_desc *desc;
> +
> + struct vring_packed_desc_event *driver;
> +
> + struct vring_packed_desc_event *device;
> +};
> +
> +/* The standard layout for the packed ring is a continuous chunk of memory
> + * which looks like this.
> + *
> + * struct vring_packed
> + * {
> + * // The actual descriptors (16 bytes each)
> + * struct vring_packed_desc desc[num];
> + *
> + * // Padding to the next align boundary.
> + * char pad[];
> + *
> + * // Driver Event Suppression
> + * struct vring_packed_desc_event driver;
> + *
> + * // Device Event Suppression
> + * struct vring_packed_desc_event device;
> + * };
> + */
> +
> +static inline unsigned vring_packed_size(unsigned int num, unsigned long align)
> +{
> + return ((sizeof(struct vring_packed_desc) * num + align - 1)
> + & ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
> +}
> +
> #endif /* _UAPI_LINUX_VIRTIO_RING_H */
^ permalink raw reply
* Re: [PATCH] iscsi: respond to netlink with unicast when appropriate
From: Martin K. Petersen @ 2018-04-13 2:35 UTC (permalink / raw)
To: Chris Leech
Cc: linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
Lee Duncan, open-iscsi-/JYPxA39Uh5TLH3MbocFFw
In-Reply-To: <20180409221528.67880-1-cleech-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Chris,
> Instead of always multicasting responses, send a unicast netlink message
> directed at the correct pid. This will be needed if we ever want to
> support multiple userspace processes interacting with the kernel over
> iSCSI netlink simultaneously. Limitations can currently be seen if you
> attempt to run multiple iscsistart commands in parallel.
>
> We've fixed up the userspace issues in iscsistart that prevented
> multiple instances from running, so now attempts to speed up booting by
> bringing up multiple iscsi sessions at once in the initramfs are just
> running into misrouted responses that this fixes.
Applied to 4.17/scsi-fixes. Thanks!
--
Martin K. Petersen Oracle Linux Engineering
--
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.
^ permalink raw reply
* Re: v6/sit tunnels and VRFs
From: David Ahern @ 2018-04-13 2:25 UTC (permalink / raw)
To: Jeff Barnhill; +Cc: netdev
In-Reply-To: <CAL6e_peeeGNhXr1rxF5k+N_THGMyEOa8pt7JCLB_PyFfK2PXOg@mail.gmail.com>
On 4/12/18 10:54 AM, Jeff Barnhill wrote:
> Hi David,
>
> In the slides referenced, you recommend adding an "unreachable
> default" route to the end of each VRF route table. In my testing (for
> v4) this results in a change to fib lookup failures such that instead
> of ENETUNREACH being returned, EHOSTUNREACH is returned since the fib
> finds the unreachable route, versus failing to find a route
> altogether.
>
> Have the implications of this been considered? I don't see a
> clean/easy way to achieve the old behavior without affecting non-VRF
> routing (eg. remove the unreachable route and delete the non-VRF
> rules). I'm guessing that programmatically, it may not make much
> difference, ie. lookup fails, but for debugging or to a user looking
> at it, the difference matters. Do you (or anyone else) have any
> thoughts on this?
We have recommended moving the local table down in the FIB rules:
# ip ru ls
1000: from all lookup [l3mdev-table]
32765: from all lookup local
32766: from all lookup main
32767: from all lookup default
and adding a default route to VRF tables:
# ip ro ls vrf red
unreachable default metric 4278198272
172.16.2.0/24 proto bgp metric 20
nexthop via 169.254.0.1 dev swp3 weight 1 onlink
nexthop via 169.254.0.1 dev swp4 weight 1 onlink
# ip -6 ro ls vrf red
2001:db8:2::/64 proto bgp metric 20
nexthop via fe80::202:ff:fe00:e dev swp3 weight 1
nexthop via fe80::202:ff:fe00:f dev swp4 weight 1
anycast fe80:: dev lo proto kernel metric 0 pref medium
anycast fe80:: dev lo proto kernel metric 0 pref medium
fe80::/64 dev swp3 proto kernel metric 256 pref medium
fe80::/64 dev swp4 proto kernel metric 256 pref medium
ff00::/8 dev swp3 metric 256 pref medium
ff00::/8 dev swp4 metric 256 pref medium
unreachable default dev lo metric 4278198272 error -101 pref medium
Over the last 2 years we have not seen any negative side effects to this
and is what you want to have proper VRF separation.
Without a default route lookups will proceed to the next fib rule which
means a lookup in the next table and barring other PBR rules will be the
main table. It will lead to wrong lookups.
Here is an example:
ip netns add foo
ip netns exec foo bash
ip li set lo up
ip li add red type vrf table 123
ip li set red up
ip li add dummy1 type dummy
ip addr add 10.100.1.1/24 dev dummy1
ip li set dummy1 master red
ip li set dummy1 up
ip li add dummy2 type dummy
ip addr add 10.100.1.1/24 dev dummy2
ip li set dummy2 up
ip ro get 10.100.2.2
ip ro get 10.100.2.2 vrf red
# ip ru ls
0: from all lookup local
1000: from all lookup [l3mdev-table]
32766: from all lookup main
32767: from all lookup default
# ip ro ls
10.100.1.0/24 dev dummy2 proto kernel scope link src 10.100.1.1
10.100.2.0/24 via 10.100.1.2 dev dummy2
# ip ro ls vrf red
10.100.1.0/24 dev dummy1 proto kernel scope link src 10.100.1.1
That's the setup. What happens on route lookups?
# ip ro get vrf red 10.100.2.1
10.100.2.1 via 10.100.1.2 dev dummy2 src 10.100.1.1 uid 0
cache
which is clearly wrong. Let's look at the lookup sequence
# perf record -e fib:* ip ro get vrf red 10.100.2.1
10.100.2.1 via 10.100.1.2 dev dummy2 src 10.100.1.1 uid 0
cache
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data (4 samples) ]
# perf script --fields trace:trace
table 255 oif 2 iif 1 src 0.0.0.0 dst 10.100.2.1 tos 0 scope 0 flags 4
table 123 oif 2 iif 1 src 0.0.0.0 dst 10.100.2.1 tos 0 scope 0 flags 4
table 254 oif 2 iif 1 src 0.0.0.0 dst 10.100.2.1 tos 0 scope 0 flags 4
nexthop dev dummy2 oif 4 src 10.100.1.1
The first one is because I did not move the local table down.
The second one is the correct vrf lookup
The third one is the continuation to the next table - the main table.
Adding a default route:
# ip ro add vrf red unreachable default
And the lookup is proper:
# ip ro get vrf red 10.100.2.1
RTNETLINK answers: No route to host
^ permalink raw reply
* Re: [PATCH net] net: dsa: mv88e6xxx: Fix receive time stamp race condition.
From: David Miller @ 2018-04-13 2:06 UTC (permalink / raw)
To: richardcochran; +Cc: netdev, linux-kernel, andrew, f.fainelli, vivien.didelot
In-Reply-To: <20180412173540.rrjx5n62guchety2@localhost>
From: Richard Cochran <richardcochran@gmail.com>
Date: Thu, 12 Apr 2018 10:35:40 -0700
> On Mon, Apr 09, 2018 at 07:19:31AM -0700, Richard Cochran wrote:
>> Dave, please hold off on this patch. I am seeing new problems in my
>> testing with this applied. I still need to get to the bottom of
>> this.
>
> Looks like the new problems are a HW/board glitch.
>
> The patch is good to go.
Ok, applied and queued up for -stable.
Thanks.
^ permalink raw reply
* Re: [PATCH v2 net] net: fix deadlock while clearing neighbor proxy table
From: David Miller @ 2018-04-13 2:02 UTC (permalink / raw)
To: w.bumiller; +Cc: netdev, yoshfuji
In-Reply-To: <20180412084655.12607-1-w.bumiller@proxmox.com>
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
Date: Thu, 12 Apr 2018 10:46:55 +0200
> When coming from ndisc_netdev_event() in net/ipv6/ndisc.c,
> neigh_ifdown() is called with &nd_tbl, locking this while
> clearing the proxy neighbor entries when eg. deleting an
> interface. Calling the table's pndisc_destructor() with the
> lock still held, however, can cause a deadlock: When a
> multicast listener is available an IGMP packet of type
> ICMPV6_MGM_REDUCTION may be sent out. When reaching
> ip6_finish_output2(), if no neighbor entry for the target
> address is found, __neigh_create() is called with &nd_tbl,
> which it'll want to lock.
>
> Move the elements into their own list, then unlock the table
> and perform the destruction.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199289
> Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> ---
> Changes to v1:
> * Renamed 'pneigh_ifdown' to 'pneigh_ifdown_and_unlock'.
Applied and queued up for -stable.
> Btw. I'm not actually sure how much sense the Fixes tag makes as the
> commit itself isn't wrong, it just happens to be the most easily
> triggerable code path there (and I can't definitively rule out others,
> given that the "sending something over the network with the lock held
> will deadlock" comment at the top of the file is also from the initial
> commit I'd expect other backtraces to be possible from out of that
> function) and the other affected lines are mostly the initial git
> commit...
Understood.
^ permalink raw reply
* Re: [PATCHv2 net] sctp: do not check port in sctp_inet6_cmp_addr
From: David Miller @ 2018-04-13 2:01 UTC (permalink / raw)
To: lucien.xin; +Cc: netdev, linux-sctp, marcelo.leitner, nhorman
In-Reply-To: <7220f314b70726ba4afee91df11e9ae8d1962e01.1523514271.git.lucien.xin@gmail.com>
From: Xin Long <lucien.xin@gmail.com>
Date: Thu, 12 Apr 2018 14:24:31 +0800
> pf->cmp_addr() is called before binding a v6 address to the sock. It
> should not check ports, like in sctp_inet_cmp_addr.
>
> But sctp_inet6_cmp_addr checks the addr by invoking af(6)->cmp_addr,
> sctp_v6_cmp_addr where it also compares the ports.
>
> This would cause that setsockopt(SCTP_SOCKOPT_BINDX_ADD) could bind
> multiple duplicated IPv6 addresses after Commit 40b4f0fd74e4 ("sctp:
> lack the check for ports in sctp_v6_cmp_addr").
>
> This patch is to remove af->cmp_addr called in sctp_inet6_cmp_addr,
> but do the proper check for both v6 addrs and v4mapped addrs.
>
> v1->v2:
> - define __sctp_v6_cmp_addr to do the common address comparison
> used for both pf and af v6 cmp_addr.
>
> Fixes: 40b4f0fd74e4 ("sctp: lack the check for ports in sctp_v6_cmp_addr")
> Reported-by: Jianwen Ji <jiji@redhat.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
Applied and queued up for -stable.
^ permalink raw reply
* Re: [PATCH net 0/4] nfp: improve signal handing on FW waits and flower control message processing
From: David Miller @ 2018-04-13 2:00 UTC (permalink / raw)
To: jakub.kicinski; +Cc: netdev, oss-drivers
In-Reply-To: <20180411234738.6766-1-jakub.kicinski@netronome.com>
From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Wed, 11 Apr 2018 16:47:34 -0700
> The first part of this set aims to improve handling of interrupted
> waits. Patch 1 makes waiting for management FW responses
> uninterruptible while patch 2 adds a message when signal arrives
> while waiting for an NFP mutex. We can't interrupt execution of
> FW commands so uninterruptible sleep seems reasonable there.
> Exiting a wait for a mutex should be clean and have no side affects
> so we are allowing to abort it. Note that both waits have rather
> large timeouts (tens of seconds).
>
> Patches 3 and 4 improve flower offload operation under heavy load.
> Currently there is no cap on the number of queued FW notifications.
> Some of the notifications have to be processed from a workqueue
> which may lead to very large number of messages getting queued
> if workqueue never gets a chance to run. Pieter puts a limit
> on number of queued messages, tries to drop some messages we ignore
> without queuing and process more important messages first.
Series applied, thanks Jakub.
^ permalink raw reply
* Re: [net 1/1] tipc: fix missing initializer in tipc_sendmsg()
From: David Miller @ 2018-04-13 1:56 UTC (permalink / raw)
To: jon.maloy
Cc: netdev, mohan.krishna.ghanta.krishnamurthy, tung.q.nguyen,
hoang.h.le, canh.d.luu, ying.xue, tipc-discussion
In-Reply-To: <1523488548-28520-1-git-send-email-jon.maloy@ericsson.com>
From: Jon Maloy <jon.maloy@ericsson.com>
Date: Thu, 12 Apr 2018 01:15:48 +0200
> The stack variable 'dnode' in __tipc_sendmsg() may theoretically
> end up tipc_node_get_mtu() as an unitilalized variable.
>
> We fix this by intializing the variable at declaration. We also add
> a default else clause to the two conditional ones already there, so
> that we never end up in the named function if the given address
> type is illegal.
>
> Reported-by: syzbot+b0975ce9355b347c1546@syzkaller.appspotmail.com
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Applied, thanks Jon.
^ permalink raw reply
* Re: [PATCH net] strparser: Fix incorrect strp->need_bytes value.
From: David Miller @ 2018-04-13 1:56 UTC (permalink / raw)
To: doronrk; +Cc: netdev
In-Reply-To: <20180411220516.3062277-1-doronrk@fb.com>
From: Doron Roberts-Kedes <doronrk@fb.com>
Date: Wed, 11 Apr 2018 15:05:16 -0700
> strp_data_ready resets strp->need_bytes to 0 if strp_peek_len indicates
> that the remainder of the message has been received. However,
> do_strp_work does not reset strp->need_bytes to 0. If do_strp_work
> completes a partial message, the value of strp->need_bytes will continue
> to reflect the needed bytes of the previous message, causing
> future invocations of strp_data_ready to return early if
> strp->need_bytes is less than strp_peek_len. Resetting strp->need_bytes
> to 0 in __strp_recv on handing a full message to the upper layer solves
> this problem.
>
> __strp_recv also calculates strp->need_bytes using stm->accum_len before
> stm->accum_len has been incremented by cand_len. This can cause
> strp->need_bytes to be equal to the full length of the message instead
> of the full length minus the accumulated length. This, in turn, causes
> strp_data_ready to return early, even when there is sufficient data to
> complete the partial message. Incrementing stm->accum_len before using
> it to calculate strp->need_bytes solves this problem.
>
> Found while testing net/tls_sw recv path.
>
> Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages")
> Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
Applied and queued up for -stable.
^ permalink raw reply
* Re: [PATCH] selftests: net: add in_netns.sh to TEST_PROGS
From: David Miller @ 2018-04-13 1:53 UTC (permalink / raw)
To: anders.roxell; +Cc: shuah, netdev, linux-kselftest, linux-kernel
In-Reply-To: <20180411151734.15507-1-anders.roxell@linaro.org>
From: Anders Roxell <anders.roxell@linaro.org>
Date: Wed, 11 Apr 2018 17:17:34 +0200
> Script in_netns.sh isn't installed.
> --------------------
> running psock_fanout test
> --------------------
> ./run_afpackettests: line 12: ./in_netns.sh: No such file or directory
> [FAIL]
> --------------------
> running psock_tpacket test
> --------------------
> ./run_afpackettests: line 22: ./in_netns.sh: No such file or directory
> [FAIL]
>
> In current code added in_netns.sh to be installed.
>
> Fixes: cc30c93fa020 ("selftests/net: ignore background traffic in psock_fanout")
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH v2 net 0/2] ibmvnic: Fix parameter change request handling
From: David Miller @ 2018-04-13 1:52 UTC (permalink / raw)
To: nfont; +Cc: netdev, jallen, tlfalcon
In-Reply-To: <152345930946.41899.8546547036273488202.stgit@ltcalpine2-lp14.aus.stglabs.ibm.com>
From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Date: Wed, 11 Apr 2018 10:09:26 -0500
> When updating parameters for the ibmvnic driver there is a possibility
> of entering an infinite loop if a return value other that a partial
> success is received from sending the login CRQ.
>
> Also, a deadlock can occur on the rtnl lock if netdev_notify_peers()
> is called during driver reset for a parameter change reset.
>
> This patch set corrects both of these issues by updating the return
> code handling in ibmvnic_login() nand gaurding against calling
> netdev_notify_peers() for parameter change requests.
>
> -Nathan
>
> Updates for V2: Correct spelling mistakes in commit messages.
Series applied, thanks.
^ permalink raw reply
* Re: [PATCH net] net: validate attribute sizes in neigh_dump_table()
From: David Miller @ 2018-04-13 1:50 UTC (permalink / raw)
To: edumazet; +Cc: netdev, eric.dumazet, dsa
In-Reply-To: <20180411214600.203361-1-edumazet@google.com>
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 11 Apr 2018 14:46:00 -0700
> Since neigh_dump_table() calls nlmsg_parse() without giving policy
> constraints, attributes can have arbirary size that we must validate
>
> Reported by syzbot/KMSAN :
...
> Fixes: 21fdd092acc7 ("net: Add support for filtering neigh dump by master device")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: David Ahern <dsa@cumulusnetworks.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>
Applied and queued up for -stable.
^ permalink raw reply
* Re: [PATCH net] tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets
From: David Miller @ 2018-04-13 1:50 UTC (permalink / raw)
To: edumazet; +Cc: netdev, eric.dumazet
In-Reply-To: <20180411213628.194344-1-edumazet@google.com>
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 11 Apr 2018 14:36:28 -0700
> syzbot/KMSAN reported an uninit-value in tcp_parse_options() [1]
>
> I believe this was caused by a TCP_MD5SIG being set on live
> flow.
>
> This is highly unexpected, since TCP option space is limited.
>
> For instance, presence of TCP MD5 option automatically disables
> TCP TimeStamp option at SYN/SYNACK time, which we can not do
> once flow has been established.
>
> Really, adding/deleting an MD5 key only makes sense on sockets
> in CLOSE or LISTEN state.
...
> Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>
Applied and queued up for -stable.
^ permalink raw reply
* Re: [net 1/1] tipc: fix unbalanced reference counter
From: David Miller @ 2018-04-13 1:49 UTC (permalink / raw)
To: jon.maloy
Cc: netdev, mohan.krishna.ghanta.krishnamurthy, tung.q.nguyen,
hoang.h.le, canh.d.luu, ying.xue, tipc-discussion
In-Reply-To: <1523479929-28161-1-git-send-email-jon.maloy@ericsson.com>
From: Jon Maloy <jon.maloy@ericsson.com>
Date: Wed, 11 Apr 2018 22:52:09 +0200
> When a topology subscription is created, we may encounter (or KASAN
> may provoke) a failure to create a corresponding service instance in
> the binding table. Instead of letting the tipc_nametbl_subscribe()
> report the failure back to the caller, the function just makes a warning
> printout and returns, without incrementing the subscription reference
> counter as expected by the caller.
>
> This makes the caller believe that the subscription was successful, so
> it will at a later moment try to unsubscribe the item. This involves
> a sub_put() call. Since the reference counter never was incremented
> in the first place, we get a premature delete of the subscription item,
> followed by a "use-after-free" warning.
>
> We fix this by adding a return value to tipc_nametbl_subscribe() and
> make the caller aware of the failure to subscribe.
>
> This bug seems to always have been around, but this fix only applies
> back to the commit shown below. Given the low risk of this happening
> we believe this to be sufficient.
>
> Fixes: commit 218527fe27ad ("tipc: replace name table service range
> array with rb tree")
> Reported-by: syzbot+aa245f26d42b8305d157@syzkaller.appspotmail.com
>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Applied and queued up for -stable.
^ permalink raw reply
* Re: [PATCH] net: stmmac: fix missing support for 802.1AD tag on reception
From: David Miller @ 2018-04-13 1:48 UTC (permalink / raw)
To: EladN; +Cc: netdev
In-Reply-To: <DB5PR07MB15447E832914A7FD15E06A5FA0BD0@DB5PR07MB1544.eurprd07.prod.outlook.com>
From: Elad Nachman <EladN@gilat.com>
Date: Wed, 11 Apr 2018 15:07:40 +0000
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c2018-04-11 17:04:00.586057300 +0300
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c2018-04-11 17:05:33.601992400 +0300
> @@ -3293,17 +3293,19 @@ dma_map_err:
>
> static void stmmac_rx_vlan(struct net_device *dev, struct sk_buff *skb)
> {
> -struct ethhdr *ehdr;
> +struct vlan_ethhdr *veth;
> u16 vlanid;
> +__be16 vlan_proto;
This patch has been mangled by your email client.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox