Linux virtualization list
* Re: remove function pointer casts and constify function tables
       [not found]               ` <20170526150839.GA4593@fieldses.org>
@ 2017-05-26 15:09                 ` bfields
       [not found]                 ` <20170526150956.GB4593@fieldses.org>
  1 sibling, 0 replies; 9+ messages in thread
From: bfields @ 2017-05-26 15:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-nfs@vger.kernel.org, virtualization,
	anna.schumaker@netapp.com, Trond Myklebust,
	jlayton@poochiereds.net, hch

Probably should have cc'd virtualization@lists.linux-foundation.org too.

On Fri, May 26, 2017 at 11:08:39AM -0400, bfields@fieldses.org wrote:
> On Tue, May 23, 2017 at 08:23:34AM -0400, bfields@fieldses.org wrote:
> > Unfortunately I can't get anything through testing.  It's not your
> > patches, it's something in -rc1.  My server VM stops responding to
> > any network traffic randomly in the middle of a run.  If I log in from a
> > serial console, I see the interface is up and everything looks OK.  I
> > haven't had the chance to do much more, and I'm not sure where to
> > start....  I started a git-bisect attempt, but there are several
> > unrelated problems, and I'm not sure this one is 100% reproducible.
> 
> It looks like it may be due to something pulled in with virtio updates.
> I've reproduced the problem on c8b0d7290657 "s390/virtio: change
> maintainership" but not on v4.11.  Are there any known issues with those
> commits?
> 
> I've just been doing this long-running bisect while working on other
> stuff.  My reproducer (basically just running a bunch of NFS
> connectathon tests over a variety of protocol versions and security
> flavors) doesn't hit the bug reliably, and I've had to restart a couple
> times probably due to false negatives.  But this looks pretty promising,
> and there's only 17 commits in that range, so I'll keep bisecting.
> 
> --b.


* Re: remove function pointer casts and constify function tables
       [not found]                 ` <20170526150956.GB4593@fieldses.org>
@ 2017-05-26 19:31                   ` bfields
       [not found]                   ` <20170526193133.GA9874@fieldses.org>
  1 sibling, 0 replies; 9+ messages in thread
From: bfields @ 2017-05-26 19:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-nfs@vger.kernel.org, virtualization,
	anna.schumaker@netapp.com, Trond Myklebust,
	jlayton@poochiereds.net, hch

Looks like the culprit is very likely d85b758f72b0 "virtio_net: fix
support for small rings".

After that patch, my NFS server VM stops responding to packets after a
few minutes of testing.  Before that patch, my server keeps working.

--b.


* Re: remove function pointer casts and constify function tables
       [not found]                   ` <20170526193133.GA9874@fieldses.org>
@ 2017-05-30 16:26                     ` Michael S. Tsirkin
  2017-05-30 16:58                       ` Michael S. Tsirkin
                                         ` (3 more replies)
  2017-05-30 17:03                     ` Michael S. Tsirkin
       [not found]                     ` <20170530200109-mutt-send-email-mst@kernel.org>
  2 siblings, 4 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2017-05-30 16:26 UTC (permalink / raw)
  To: bfields@fieldses.org
  Cc: linux-nfs@vger.kernel.org, virtualization,
	anna.schumaker@netapp.com, Trond Myklebust,
	jlayton@poochiereds.net, hch

On Fri, May 26, 2017 at 03:31:33PM -0400, bfields@fieldses.org wrote:
> Looks like the culprit is very likely d85b758f72b0 "virtio_net: fix
> support for small rings".
> 
> After that patch, my NFS server VM stops responding to packets after a
> few minutes of testing.  Before that patch, my server keeps working.
> 
> --b.

Others complained about that too.
I'm still trying to reproduce though.

Meanwhile, could you please locate this line of code:
+               vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);

and add something like
        printk(KERN_ERR, "min buf = 0x%x expected 0x%x size 0x%x big %d\n",
               vi->rq[i].min_buf_len, GOOD_PACKET_LEN,
               virtqueue_get_vring_size(vi->rq[i].vq),
               (int)vi->big_packets);

after it?
Then boot and capture the output.

Thanks!


-- 
MST


* Re: remove function pointer casts and constify function tables
  2017-05-30 16:26                     ` Michael S. Tsirkin
@ 2017-05-30 16:58                       ` Michael S. Tsirkin
       [not found]                       ` <20170530195716-mutt-send-email-mst@kernel.org>
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2017-05-30 16:58 UTC (permalink / raw)
  To: bfields@fieldses.org
  Cc: linux-nfs@vger.kernel.org, virtualization,
	anna.schumaker@netapp.com, Trond Myklebust,
	jlayton@poochiereds.net, hch

On Tue, May 30, 2017 at 07:26:37PM +0300, Michael S. Tsirkin wrote:
> On Fri, May 26, 2017 at 03:31:33PM -0400, bfields@fieldses.org wrote:
> > Looks like the culprit is very likely d85b758f72b0 "virtio_net: fix
> > support for small rings".
> > 
> > After that patch, my NFS server VM stops responding to packets after a
> > few minutes of testing.  Before that patch, my server keeps working.
> > 
> > --b.
> 
> Others complained about that too.
> I'm still trying to reproduce though.
> 
> Meanwhile, could you please locate this line of code:
> +               vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
> 
> and add something like
>         printk(KERN_ERR, "min buf = 0x%x expected 0x%x size 0x%x big %d\n",
>                vi->rq[i].min_buf_len, GOOD_PACKET_LEN,
>                virtqueue_get_vring_size(vi->rq[i].vq),
>                (int)vi->big_packets);
> 
> after it?
> Then boot and capture the output.
> 
> Thanks!
> 

Also, can you pls print the mergeable_rx_buffer_size attribute from
sysfs for this device?


> -- 
> MST


* Re: remove function pointer casts and constify function tables
       [not found]                   ` <20170526193133.GA9874@fieldses.org>
  2017-05-30 16:26                     ` Michael S. Tsirkin
@ 2017-05-30 17:03                     ` Michael S. Tsirkin
       [not found]                     ` <20170530200109-mutt-send-email-mst@kernel.org>
  2 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2017-05-30 17:03 UTC (permalink / raw)
  To: bfields@fieldses.org
  Cc: linux-nfs@vger.kernel.org, virtualization,
	anna.schumaker@netapp.com, Trond Myklebust,
	jlayton@poochiereds.net, hch

On Fri, May 26, 2017 at 03:31:33PM -0400, bfields@fieldses.org wrote:
> Looks like the culprit is very likely d85b758f72b0 "virtio_net: fix
> support for small rings".
> 
> After that patch, my NFS server VM stops responding to packets after a
> few minutes of testing.  Before that patch, my server keeps working.
> 
> --b.


So I think I know what caused this: looks like some hypervisors
aren't prepared to deal with a situation where packet size
becomes very small.

But which hypervisors exactly? I'd like to know in order to detect these
and decide whether I blacklist bad ones or whitelist known-good ones.

Thanks!

-- 
MST


* Re: remove function pointer casts and constify function tables
       [not found]                       ` <20170530195716-mutt-send-email-mst@kernel.org>
@ 2017-05-31 20:57                         ` bfields
  0 siblings, 0 replies; 9+ messages in thread
From: bfields @ 2017-05-31 20:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-nfs@vger.kernel.org, virtualization,
	anna.schumaker@netapp.com, Trond Myklebust,
	jlayton@poochiereds.net, hch

On Tue, May 30, 2017 at 07:58:06PM +0300, Michael S. Tsirkin wrote:
> On Tue, May 30, 2017 at 07:26:37PM +0300, Michael S. Tsirkin wrote:
> > On Fri, May 26, 2017 at 03:31:33PM -0400, bfields@fieldses.org wrote:
> > > Looks like the culprit is very likely d85b758f72b0 "virtio_net: fix
> > > support for small rings".
> > > 
> > > After that patch, my NFS server VM stops responding to packets after a
> > > few minutes of testing.  Before that patch, my server keeps working.
> > > 
> > > --b.
> > 
> > Others complained about that too.
> > I'm still trying to reproduce though.
> > 
> > Meanwhile, could you please locate this line of code:
> > +               vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
> > 
> > and add something like
> >         printk(KERN_ERR, "min buf = 0x%x expected 0x%x size 0x%x big %d\n",
> >                vi->rq[i].min_buf_len, GOOD_PACKET_LEN,
> >                virtqueue_get_vring_size(vi->rq[i].vq),
> >                (int)vi->big_packets);
> > 
> > after it?
> > Then boot and capture the output.
> > 
> > Thanks!
> > 
> 
> Also, can you pls print the mergeable_rx_buffer_size attribute from
> sysfs for this device?

On 4.12-rc1:

	# cat /sys/devices/pci0000:00/0000:00:03.0/virtio0/net/eth0/queues/rx-0/virtio_net/mergeable_rx_buffer_size
	320

--b.
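
For readers following along, the value 320 is consistent with a buffer
length built from the per-queue minimum reported later in this thread
(min buf = 0x101).  The sketch below is an assumption about how the driver
sizes mergeable buffers, not a copy of it: the 12-byte header, the 64-byte
cache line, the 4096-byte page and the helper names are all hypothetical
stand-ins chosen for illustration.

```c
#include <assert.h>

/* All constants below are assumptions for illustration only. */
#define MRG_HDR_LEN 12u   /* assumed size of the mergeable rx header */
#define CACHE_LINE  64u   /* assumed cache-line alignment */
#define PAGE_SZ   4096u   /* assumed page size */

/* Round x up to the next multiple of a (a must be a power of two). */
static unsigned int align_up(unsigned int x, unsigned int a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Clamp v into the closed range [lo, hi]. */
static unsigned int clamp_u(unsigned int v, unsigned int lo, unsigned int hi)
{
    assert(lo <= hi);
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

/*
 * Hypothetical model: header plus the average packet length, where the
 * average is clamped between the queue minimum and what fits in a page,
 * then rounded up to a cache line.
 */
static unsigned int mergeable_buf_len(unsigned int avg_pkt_len,
                                      unsigned int min_buf_len)
{
    unsigned int payload = clamp_u(avg_pkt_len, min_buf_len,
                                   PAGE_SZ - MRG_HDR_LEN);
    return align_up(MRG_HDR_LEN + payload, CACHE_LINE);
}
```

Under these assumptions, an idle queue (average packet length near zero)
with a minimum of 0x101 gives 12 + 257 = 269, which rounds up to 320.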


* Re: remove function pointer casts and constify function tables
       [not found]                     ` <20170530200109-mutt-send-email-mst@kernel.org>
@ 2017-05-31 21:00                       ` bfields
  0 siblings, 0 replies; 9+ messages in thread
From: bfields @ 2017-05-31 21:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-nfs@vger.kernel.org, virtualization,
	anna.schumaker@netapp.com, Trond Myklebust,
	jlayton@poochiereds.net, hch

On Tue, May 30, 2017 at 08:03:12PM +0300, Michael S. Tsirkin wrote:
> On Fri, May 26, 2017 at 03:31:33PM -0400, bfields@fieldses.org wrote:
> > Looks like the culprit is very likely d85b758f72b0 "virtio_net: fix
> > support for small rings".
> > 
> > After that patch, my NFS server VM stops responding to packets after a
> > few minutes of testing.  Before that patch, my server keeps working.
> > 
> > --b.
> 
> 
> So I think I know what caused this: looks like some hypervisors
> aren't prepared to deal with a situation where packet size
> becomes very small.
> 
> But which hypervisors exactly? I'd like to know in order to detect these
> and decide whether I blacklist bad ones or whitelist known-good ones.

I'm running this under KVM on a Fedora 25 host
(4.10.15-200.fc25.x86_64).  Let me know if any more details about the
setup would be useful.

--b.


* Re: remove function pointer casts and constify function tables
  2017-05-30 16:26                     ` Michael S. Tsirkin
  2017-05-30 16:58                       ` Michael S. Tsirkin
       [not found]                       ` <20170530195716-mutt-send-email-mst@kernel.org>
@ 2017-05-31 21:09                       ` bfields
       [not found]                       ` <20170531210923.GF23526@fieldses.org>
  3 siblings, 0 replies; 9+ messages in thread
From: bfields @ 2017-05-31 21:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-nfs@vger.kernel.org, virtualization,
	anna.schumaker@netapp.com, Trond Myklebust,
	jlayton@poochiereds.net, hch

On Tue, May 30, 2017 at 07:26:37PM +0300, Michael S. Tsirkin wrote:
> On Fri, May 26, 2017 at 03:31:33PM -0400, bfields@fieldses.org wrote:
> > Looks like the culprit is very likely d85b758f72b0 "virtio_net: fix
> > support for small rings".
> > 
> > After that patch, my NFS server VM stops responding to packets after a
> > few minutes of testing.  Before that patch, my server keeps working.
> > 
> > --b.
> 
> Others complained about that too.
> I'm still trying to reproduce though.
> 
> Meanwhile, could you please locate this line of code:
> +               vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
> 
> and add something like
>         printk(KERN_ERR, "min buf = 0x%x expected 0x%x size 0x%x big %d\n",
>                vi->rq[i].min_buf_len, GOOD_PACKET_LEN,
>                virtqueue_get_vring_size(vi->rq[i].vq),
>                (int)vi->big_packets);
> 
> after it?
> Then boot and capture the output.

Doesn't look like that code's run on boot; apply the below, boot, and:

 $ dmesg|grep expected

gives no output.

--b.

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9320d96a1632..b10014f7b480 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2212,6 +2212,10 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
        for (i = 0; i < vi->max_queue_pairs; i++) {
                vi->rq[i].vq = vqs[rxq2vq(i)];
                vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
+               printk(KERN_ERR, "min buf = 0x%x expected 0x%x size 0x%x big %d\n",
+                               vi->rq[i].min_buf_len, GOOD_PACKET_LEN,
+                               virtqueue_get_vring_size(vi->rq[i].vq),
+                               (int)vi->big_packets);
                vi->sq[i].vq = vqs[txq2vq(i)];
        }
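
A note on why the grep comes up empty: the follow-up in this thread puts it
down to a typo in the printk.  The thread does not say which typo, but the
comma after KERN_ERR is the likely candidate.  KERN_ERR is a string literal
meant to be concatenated with the format, so with the comma it becomes the
whole format string and the intended message is passed as an ignored
argument.  The user-space sketch below mimics that; fake_printk is a
hypothetical stand-in, not the kernel's printk.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the kernel's level prefix: start-of-header byte plus '3'. */
#define KERN_ERR "\001" "3"

/*
 * Hypothetical stand-in for printk: formats into 'out', strips a leading
 * two-byte level prefix, and returns the length of the visible message.
 */
static int fake_printk(char *out, size_t outlen, const char *fmt, ...)
{
    char tmp[256];
    const char *msg = tmp;
    va_list ap;

    assert(out != NULL && outlen > 0);

    va_start(ap, fmt);
    vsnprintf(tmp, sizeof(tmp), fmt, ap);
    va_end(ap);

    /* Skip the level prefix if one is present. */
    if (msg[0] == '\001' && msg[1] != '\0')
        msg += 2;

    snprintf(out, outlen, "%s", msg);
    return (int)strlen(out);
}
```

Calling fake_printk(buf, sizeof(buf), KERN_ERR, "min buf = 0x%x\n", 0x101)
leaves buf empty, while dropping the comma so that KERN_ERR and the format
are adjacent string literals produces the expected text.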


* Re: remove function pointer casts and constify function tables
       [not found]                       ` <20170531210923.GF23526@fieldses.org>
@ 2017-05-31 21:15                         ` bfields
  0 siblings, 0 replies; 9+ messages in thread
From: bfields @ 2017-05-31 21:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-nfs@vger.kernel.org, virtualization,
	anna.schumaker@netapp.com, Trond Myklebust,
	jlayton@poochiereds.net, hch

On Wed, May 31, 2017 at 05:09:23PM -0400, bfields@fieldses.org wrote:
> On Tue, May 30, 2017 at 07:26:37PM +0300, Michael S. Tsirkin wrote:
> > On Fri, May 26, 2017 at 03:31:33PM -0400, bfields@fieldses.org wrote:
> > > Looks like the culprit is very likely d85b758f72b0 "virtio_net: fix
> > > support for small rings".
> > > 
> > > After that patch, my NFS server VM stops responding to packets after a
> > > few minutes of testing.  Before that patch, my server keeps working.
> > > 
> > > --b.
> > 
> > Others complained about that too.
> > I'm still trying to reproduce though.
> > 
> > Meanwhile, could you please locate this line of code:
> > +               vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
> > 
> > and add something like
> >         printk(KERN_ERR, "min buf = 0x%x expected 0x%x size 0x%x big %d\n",
> >                vi->rq[i].min_buf_len, GOOD_PACKET_LEN,
> >                virtqueue_get_vring_size(vi->rq[i].vq),
> >                (int)vi->big_packets);
> > 
> > after it?
> > Then boot and capture the output.
> 
> Doesn't look like that code's run on boot; apply the below, boot, and:

Whoops, no, just a typo in the printk.  Here you go:

	min buf = 0x101 expected 0x5ee size 0x100 big 1

--b.
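
To put the numbers above in context: 0x5ee is 1518 bytes, a full Ethernet
frame with a VLAN tag, while 0x101 is only 257 bytes per buffer on a
256-entry (0x100) ring.  The sketch below shows one way those two values
could arise.  It is an assumption, not the driver source: the header sizes,
the 65535-byte maximum packet for the "big" case, and the function names
are hypothetical choices that happen to reproduce the reported figures.

```c
#include <assert.h>

/* Assumed constants; none of these values are stated in the thread. */
#define ETH_HLEN       14u      /* Ethernet header */
#define VLAN_HLEN       4u      /* VLAN tag */
#define ETH_DATA_LEN 1500u      /* standard Ethernet payload */
#define MRG_HDR_LEN    12u      /* assumed mergeable rx header size */
#define IP_MAX_MTU 0xFFFFu      /* assumed largest packet when big == 1 */

/* Length of one full-sized frame: the "expected" value. */
static unsigned int good_packet_len(void)
{
    return ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN;
}

/*
 * Hypothetical minimum buffer length: spread the largest possible packet
 * (plus headers) across every slot in the ring, rounding up, and never
 * go below the header size.
 */
static unsigned int min_buf_len(unsigned int ring_size,
                                unsigned int max_packet)
{
    unsigned int buf_len, per_slot;

    assert(ring_size > 0);

    buf_len = MRG_HDR_LEN + ETH_HLEN + VLAN_HLEN + max_packet;
    per_slot = (buf_len + ring_size - 1) / ring_size;
    return per_slot > MRG_HDR_LEN ? per_slot : MRG_HDR_LEN;
}
```

With these assumed constants, good_packet_len() is 1518 (0x5ee) and
min_buf_len(0x100, IP_MAX_MTU) is 65565 / 256 rounded up, i.e. 257 (0x101),
roughly a sixth of a full frame.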

