* netfront/netback multiqueue exhausting grants
@ 2016-01-20 12:23 Ian Campbell
From: Ian Campbell @ 2016-01-20 12:23 UTC
To: xen-devel; +Cc: Boris Ostrovsky, Wei Liu, David Vrabel
There have been a few reports recently[0] which relate to a failure of
netfront to allocate sufficient grant refs for all the queues:
[ 0.533589] xen_netfront: can't alloc rx grant refs
[ 0.533612] net eth0: only created 31 queues
Which can be worked around by increasing the number of grants on the
hypervisor command line or by limiting the number of queues permitted by
either back or front using a module param (which was broken but is now
fixed on both sides, but I'm not sure it has been backported everywhere
such that it is a reliable thing to always tell users as a workaround).
Is there any plan to do anything about the default/out of the box
experience? Either limiting the number of queues or making both ends cope
more gracefully with failure to create some queues (or both) might be
sufficient?
I think the crash after the above in the first link at [0] is fixed? I
think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
to real created" which was in 4.3.
Ian.
[0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00100.html
http://lists.xen.org/archives/html/xen-users/2016-01/msg00072.html
some before the xmas break too IIRC
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
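The failure in the log above, where netfront asks for one queue per vCPU and gives up partway through, can be modelled with a small simulation. This is a hypothetical sketch, not the xen-netfront code. It assumes version-1 grant entries (8 bytes, so 512 per 4 KiB frame), a 32-frame grant table, 8 reserved entries, and 256-slot Tx and Rx rings whose refs are all preallocated per queue; ring-page grants, blkfront and other grant users are ignored. `GrantPool`, `setup_queues` and the constants are illustrative names. Under those assumptions a guest requesting 32 queues ends up with 31 and fails on the Rx batch, and the sketch reports the number actually created, which is roughly what ca88ea1247df "xen-netfront: update num_queues to real created" is described as doing.

```python
from typing import Optional, Tuple

# Hypothetical model of netfront-style per-queue grant preallocation.
# Every constant below is an assumption for illustration, not a value read
# from a real hypervisor or kernel.
ENTRIES_PER_FRAME = 4096 // 8  # v1 grant entries are 8 bytes: 512 per 4 KiB frame
RESERVED_ENTRIES = 8           # entries assumed reserved at the start of the table
RING_SLOTS = 256               # assumed Tx and Rx ring size (one ring page each)


class GrantPool:
    """Counter standing in for the guest's pool of free grant references."""

    def __init__(self, frames: int) -> None:
        self.free = frames * ENTRIES_PER_FRAME - RESERVED_ENTRIES

    def alloc(self, count: int) -> bool:
        # All-or-nothing: either the whole batch is reserved or nothing is.
        if count > self.free:
            return False
        self.free -= count
        return True

    def release(self, count: int) -> None:
        self.free += count


def setup_queues(pool: GrantPool, requested: int) -> Tuple[int, Optional[str]]:
    """Try to create `requested` queues; return (queues created, error)."""
    for created in range(requested):
        if not pool.alloc(RING_SLOTS):   # one ref per Tx slot
            return created, "can't alloc tx grant refs"
        if not pool.alloc(RING_SLOTS):   # one ref per Rx slot
            pool.release(RING_SLOTS)     # hand back this queue's Tx batch
            return created, "can't alloc rx grant refs"
    return requested, None


if __name__ == "__main__":
    num_queues, err = setup_queues(GrantPool(frames=32), requested=32)
    if err is not None:
        print(err)
        # Keep the number actually created rather than the number requested.
        print(f"only created {num_queues} queues")
```

In this model, `GrantPool(frames=64)` lets all 32 queues succeed (the "increase the number of grants" workaround), and passing a smaller `requested` stands in for the `max_queues` module-param workaround.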

* Re: netfront/netback multiqueue exhausting grants
From: Boris Ostrovsky @ 2016-01-20 14:40 UTC
To: Ian Campbell, xen-devel; +Cc: Wei Liu, David Vrabel

On 01/20/2016 07:23 AM, Ian Campbell wrote:
> [...]
> I think the crash after the above in the first link at [0] is fixed? I
> think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
> to real created" which was in 4.3.

I think ca88ea1247df is the solution --- it will limit the number of
queues. And apparently it's not in stable trees. At least not in 4.1.15,
which is what the first reporter is running:

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/log/drivers/net/xen-netfront.c?id=refs/tags/v4.1.15

-boris

* Re: netfront/netback multiqueue exhausting grants
From: Ian Campbell @ 2016-01-20 14:52 UTC
To: Boris Ostrovsky, xen-devel; +Cc: Wei Liu, David Vrabel

On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
> On 01/20/2016 07:23 AM, Ian Campbell wrote:
> > [...]
> > I think the crash after the above in the first link at [0] is fixed? I
> > think that was the purpose of ca88ea1247df "xen-netfront: update
> > num_queues to real created" which was in 4.3.
>
> I think ca88ea1247df is the solution --- it will limit the number of
> queues.

That's in 4.4, which the first link at [0] claimed to have tested. I can
see this fixing the crash, but does it really fix the "actually works with
less queues than it tried to get" issue?

In any case having exhausted the grant entries creating queues there aren't
any left to shuffle actual data around, is there? (or are those
preallocated too?)

Ian.

* Re: netfront/netback multiqueue exhausting grants
From: David Vrabel @ 2016-01-20 15:02 UTC
To: Ian Campbell, Boris Ostrovsky, xen-devel; +Cc: Wei Liu, David Vrabel

On 20/01/16 14:52, Ian Campbell wrote:
> [...]
> In any case having exhausted the grant entries creating queues there aren't
> any left to shuffle actual data around, is there? (or are those
> preallocated too?)

All grant refs for Tx and Rx are preallocated (this is the allocation
that is failing above).

David

* Re: netfront/netback multiqueue exhausting grants
From: Boris Ostrovsky @ 2016-01-20 15:10 UTC
To: David Vrabel, Ian Campbell, xen-devel; +Cc: Wei Liu

On 01/20/2016 10:02 AM, David Vrabel wrote:
> On 20/01/16 14:52, Ian Campbell wrote:
>> On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
>>> [...]
>>> I think ca88ea1247df is the solution --- it will limit the number of
>>> queues.
>> That's in 4.4, which the first link at [0] claimed to have tested. I can
>> see this fixing the crash, but does it really fix the "actually works with
>> less queues than it tried to get" issue?

That's what I thought it does too. I didn't notice that 4.4 was tested
as well, so maybe not.

-boris

* Re: netfront/netback multiqueue exhausting grants
From: Ian Campbell @ 2016-01-20 15:16 UTC
To: Boris Ostrovsky, David Vrabel, xen-devel; +Cc: Wei Liu

On Wed, 2016-01-20 at 10:10 -0500, Boris Ostrovsky wrote:
> On 01/20/2016 10:02 AM, David Vrabel wrote:
> > On 20/01/16 14:52, Ian Campbell wrote:
> > > [...]
> > > That's in 4.4, which the first link at [0] claimed to have tested. I
> > > can see this fixing the crash, but does it really fix the "actually
> > > works with less queues than it tried to get" issue?
>
> That's what I thought it does too. I didn't notice that 4.4 was tested
> as well, so maybe not.

I've asked the reporter to send logs for the 4.4 case to xen-devel.

Ian.

* Re: netfront/netback multiqueue exhausting grants
From: Ian Campbell @ 2016-01-21 10:12 UTC
To: Boris Ostrovsky, David Vrabel, xen-devel; +Cc: Wei Liu

On Wed, 2016-01-20 at 15:16 +0000, Ian Campbell wrote:
> [...]
> I've asked the reporter to send logs for the 4.4 case to xen-devel.

User confirmed[0] that 4.4 is actually OK.

Did someone request stable backports yet, or shall I do so?

Ian.

[0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00110.html

* Re: netfront/netback multiqueue exhausting grants
From: Wei Liu @ 2016-01-21 10:25 UTC
To: Ian Campbell; +Cc: Boris Ostrovsky, Wei Liu, David Vrabel, xen-devel

On Thu, Jan 21, 2016 at 10:12:27AM +0000, Ian Campbell wrote:
[...]
> > I've asked the reporter to send logs for the 4.4 case to xen-devel.
>
> User confirmed[0] that 4.4 is actually OK.
>
> Did someone request stable backports yet, or shall I do so?
>

I vaguely remember we requested backports for the relevant patches a long
time ago, but I admit I have lost track. So it wouldn't hurt if you do it
again.

Wei.

> Ian.
>
> [0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00110.html

* Re: netfront/netback multiqueue exhausting grants
From: Ian Campbell @ 2016-01-21 10:37 UTC
To: Wei Liu; +Cc: Boris Ostrovsky, David Vrabel, xen-devel

On Thu, 2016-01-21 at 10:25 +0000, Wei Liu wrote:
> On Thu, Jan 21, 2016 at 10:12:27AM +0000, Ian Campbell wrote:
> [...]
> > Did someone request stable backports yet, or shall I do so?
> >
>
> I vaguely remember we requested backports for the relevant patches a long
> time ago, but I admit I have lost track. So it wouldn't hurt if you do it
> again.

So I think we'd be looking for:

32a8440 xen-netfront: respect user provided max_queues
4c82ac3 xen-netback: respect user provided max_queues
ca88ea1 xen-netfront: update num_queues to real created

which certainly resolves things such that the workarounds work, and I think
will also fix the default case such that it works with up to 32 vcpus
(although it will consume all the grants and only get 31/32 queues).

Does that sound correct?

As Annie said, we may still want to consider what a sensible default max
queues would be.

Ian.

* Re: netfront/netback multiqueue exhausting grants
From: Wei Liu @ 2016-01-21 10:52 UTC
To: Ian Campbell; +Cc: Boris Ostrovsky, Wei Liu, David Vrabel, xen-devel

On Thu, Jan 21, 2016 at 10:37:51AM +0000, Ian Campbell wrote:
> [...]
> So I think we'd be looking for:
>
> 32a8440 xen-netfront: respect user provided max_queues
> 4c82ac3 xen-netback: respect user provided max_queues
> ca88ea1 xen-netfront: update num_queues to real created
>
> which certainly resolves things such that the workarounds work, and I think
> will also fix the default case such that it works with up to 32 vcpus
> (although it will consume all the grants and only get 31/32 queues).
>
> Does that sound correct?
>

Yes, it does.

> As Annie said, we may still want to consider what a sensible default max
> queues would be.
>

Maybe we should set a cap of 8 or 16 by default.

Wei.
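A default cap like the one Wei suggests can be illustrated with a short sketch of how a queue count might be chosen. This is an assumption-laden illustration, not the drivers' actual negotiation code: it assumes each end's limit defaults to its online CPU count unless its `max_queues` module param (the knob fixed by 32a8440 and 4c82ac3) is set, that the frontend uses the smaller of the two limits, and it adds a hypothetical `default_cap` argument to model the proposed 8 or 16 ceiling. `side_limit` and `negotiate_queues` are illustrative names.

```python
from typing import Optional


def side_limit(online_cpus: int, max_queues_param: Optional[int] = None,
               default_cap: Optional[int] = None) -> int:
    """Queue limit for one end (frontend or backend) of the vif."""
    if max_queues_param is not None:   # an explicit user setting wins
        return max_queues_param
    limit = online_cpus                # assumed default: one queue per CPU
    if default_cap is not None:        # hypothetical default ceiling
        limit = min(limit, default_cap)
    return limit


def negotiate_queues(backend_cpus: int, frontend_cpus: int,
                     backend_param: Optional[int] = None,
                     frontend_param: Optional[int] = None,
                     default_cap: Optional[int] = None) -> int:
    """Number of queues the frontend would try to create."""
    backend_max = side_limit(backend_cpus, backend_param, default_cap)
    frontend_max = side_limit(frontend_cpus, frontend_param, default_cap)
    # Assumed rule: the frontend never asks for more than the backend offers.
    return min(backend_max, frontend_max)


if __name__ == "__main__":
    print(negotiate_queues(20, 20))                   # no cap
    print(negotiate_queues(20, 20, default_cap=8))    # with an 8-queue cap
    print(negotiate_queues(20, 20, backend_param=4))  # admin override
```

The design choice shown here is that an explicit `max_queues` setting always overrides the cap, so users who have measured their own workload keep full control while the out-of-the-box default stays conservative.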

* Re: netfront/netback multiqueue exhausting grants
From: annie li @ 2016-01-20 16:18 UTC
To: Ian Campbell, xen-devel; +Cc: Boris Ostrovsky, Wei Liu, David Vrabel

On 2016/1/20 7:23, Ian Campbell wrote:
> There have been a few reports recently[0] which relate to a failure of
> netfront to allocate sufficient grant refs for all the queues:
>
> [ 0.533589] xen_netfront: can't alloc rx grant refs
> [ 0.533612] net eth0: only created 31 queues
>
> Which can be worked around by increasing the number of grants on the
> hypervisor command line or by limiting the number of queues permitted by
> either back or front using a module param (which was broken but is now
> fixed on both sides, but I'm not sure it has been backported everywhere
> such that it is a reliable thing to always tell users as a workaround).

The following patches fix the module param; they have been in since v4.3.

xen-netfront: respect user provided max_queues
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=32a844056fd43dda647e1c3c6b9983bdfa04d17d

xen-netback: respect user provided max_queues
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4c82ac3c37363e8c4ded6a5fe1ec5fa756b34df3

>
> Is there any plan to do anything about the default/out of the box
> experience? Either limiting the number of queues or making both ends cope
> more gracefully with failure to create some queues (or both) might be
> sufficient?

We ran into a similar issue recently, and I guess it is better to suggest
that users set the netback module parameter to 8 as the default value?
See this link:
http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing

Probably more testing is needed to find the default number that gives the
best experience.

>
> I think the crash after the above in the first link at [0] is fixed? I
> think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
> to real created" which was in 4.3.

Correct.

Thanks
Annie

* Re: netfront/netback multiqueue exhausting grants
From: David Vrabel @ 2016-01-21 10:56 UTC
To: Ian Campbell, xen-devel; +Cc: Boris Ostrovsky, Wei Liu, David Vrabel

On 20/01/16 12:23, Ian Campbell wrote:
> [...]
> Is there any plan to do anything about the default/out of the box
> experience? Either limiting the number of queues or making both ends cope
> more gracefully with failure to create some queues (or both) might be
> sufficient?
>
> I think the crash after the above in the first link at [0] is fixed? I
> think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
> to real created" which was in 4.3.

I think the correct solution is to increase the default maximum grant
table size.

Although, unless you're using the not-yet-applied per-cpu rwlock patches,
multiqueue is terrible on many (multisocket) systems and the number of
queues should be limited in netback to 4 or even just 2.

David

* Re: netfront/netback multiqueue exhausting grants
From: Ian Campbell @ 2016-01-21 12:19 UTC
To: David Vrabel, xen-devel; +Cc: Boris Ostrovsky, Wei Liu

On Thu, 2016-01-21 at 10:56 +0000, David Vrabel wrote:
> On 20/01/16 12:23, Ian Campbell wrote:
> > [...]
>
> I think the correct solution is to increase the default maximum grant
> table size.

That could well make sense, but then there will just be another higher
limit, so we should perhaps do both.

i.e. factoring in:
 * performance, i.e. the ability for N queues to saturate whatever sort of
   link contemporary Linux can saturate these days, plus some headroom (or
   whatever other ceiling seems sensible)
 * grant table resource consumption, i.e. (sensible max number of blks *
   nr gnts per blk + sensible max number of vifs * nr gnts per vif + other
   devs' needs) < per-guest grant limit
to pick both the default gnttab size and the default max queues.

(or s/sensible/supportable/g etc).

> Although, unless you're using the not-yet-applied per-cpu rwlock patches,
> multiqueue is terrible on many (multisocket) systems and the number of
> queues should be limited in netback to 4 or even just 2.

Presumably the guest can't tell, so it can't do this.

I think when you say "terrible" you don't mean "worse than without mq" but
rather "not realising the expected gains from a larger number of queues",
right?

Ian.
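Ian's budget condition can be written as a small checker that also answers the inverse question of how large the default grant table would need to be. This is a minimal sketch under stated assumptions: v1 grant entries (512 per 4 KiB frame), 8 reserved entries, and one preallocated Tx and one Rx ref per 256-slot ring per queue, with ring-page grants ignored. The per-disk grant count is left as a parameter because the thread gives no figure for it; the 1000 used below is purely illustrative. All function names are hypothetical.

```python
# Assumed table geometry: v1 grant entries (8 bytes) in 4 KiB frames.
ENTRIES_PER_FRAME = 512
RESERVED_ENTRIES = 8  # assumed reserved entries at the start of the table


def grants_per_vif(queues: int, ring_slots: int = 256) -> int:
    # Tx + Rx refs preallocated for every queue; ring-page grants ignored.
    return queues * 2 * ring_slots


def grants_needed(n_vifs: int, queues_per_vif: int, n_blks: int,
                  gnts_per_blk: int, other: int = 0) -> int:
    """Left-hand side of the budget: blks + vifs + other devices."""
    return (n_blks * gnts_per_blk
            + n_vifs * grants_per_vif(queues_per_vif)
            + other)


def fits(frames: int, needed: int) -> bool:
    """True if `needed` grants stay strictly below the per-guest limit."""
    return needed < frames * ENTRIES_PER_FRAME - RESERVED_ENTRIES


def min_frames(needed: int) -> int:
    """Smallest table size, in frames, for which `fits` holds."""
    # ceil((needed + RESERVED_ENTRIES + 1) / ENTRIES_PER_FRAME), in integers.
    return (needed + RESERVED_ENTRIES + ENTRIES_PER_FRAME) // ENTRIES_PER_FRAME


if __name__ == "__main__":
    # Hypothetical guest: 2 vifs capped at 8 queues, 4 disks at an
    # illustrative 1000 grants each (not a measured blkfront figure).
    need = grants_needed(n_vifs=2, queues_per_vif=8, n_blks=4, gnts_per_blk=1000)
    print(need, fits(32, need), min_frames(need))
```

Because both knobs appear in the same inequality, lowering the default max queues and raising the default table size can be traded against each other, which is the "do both" point above.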

* Re: netfront/netback multiqueue exhausting grants
From: David Vrabel @ 2016-01-21 14:17 UTC
To: Ian Campbell, David Vrabel, xen-devel; +Cc: Boris Ostrovsky, Wei Liu

On 21/01/16 12:19, Ian Campbell wrote:
> On Thu, 2016-01-21 at 10:56 +0000, David Vrabel wrote:
>> [...]
>> I think the correct solution is to increase the default maximum grant
>> table size.
>
> That could well make sense, but then there will just be another higher
> limit, so we should perhaps do both.
>
> i.e. factoring in:
>  * performance, i.e. the ability for N queues to saturate whatever sort of
>    link contemporary Linux can saturate these days, plus some headroom (or
>    whatever other ceiling seems sensible)
>  * grant table resource consumption, i.e. (sensible max number of blks *
>    nr gnts per blk + sensible max number of vifs * nr gnts per vif + other
>    devs' needs) < per-guest grant limit
> to pick both the default gnttab size and the default max queues.

Yes.

>> Although, unless you're using the not-yet-applied per-cpu rwlock patches,
>> multiqueue is terrible on many (multisocket) systems and the number of
>> queues should be limited in netback to 4 or even just 2.
>
> Presumably the guest can't tell, so it can't do this.
>
> I think when you say "terrible" you don't mean "worse than without mq" but
> rather "not realising the expected gains from a larger number of queues",
> right?

Malcolm did the analysis but, if I remember correctly, 8 queues performed
about the same as 1 queue and 16 were worse than 1 queue.

David

* Re: netfront/netback multiqueue exhausting grants
From: annie li @ 2016-01-21 15:11 UTC
To: David Vrabel, Ian Campbell, xen-devel; +Cc: Boris Ostrovsky, Wei Liu

On 2016/1/21 9:17, David Vrabel wrote:
> On 21/01/16 12:19, Ian Campbell wrote:
>> On Thu, 2016-01-21 at 10:56 +0000, David Vrabel wrote:
>>> [...]
>>> I think the correct solution is to increase the default maximum grant
>>> table size.
>> That could well make sense, but then there will just be another higher
>> limit, so we should perhaps do both.
>>
>> i.e. factoring in:
>>  * performance, i.e. the ability for N queues to saturate whatever sort of
>>    link contemporary Linux can saturate these days, plus some headroom (or
>>    whatever other ceiling seems sensible)
>>  * grant table resource consumption, i.e. (sensible max number of blks *
>>    nr gnts per blk + sensible max number of vifs * nr gnts per vif + other
>>    devs' needs) < per-guest grant limit
>> to pick both the default gnttab size and the default max queues.
> Yes.

Would it waste lots of resources in the case where a guest vif has lots of
queues but no network load? Here is an example of the gntrefs consumed by a
vif: with a 20-vcpu Dom0 and a 20-vcpu domU, one vif would consume
20*256*2=10240 gntrefs. If the maximum grant table size is set to 64 pages
(the default value in Xen is 32 pages now?), then only 3 vifs are supported
in the guest. And that does not even take blk into account, nor the blk
multi-page ring feature.

Thanks
Annie
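Annie's figures can be checked with a few lines of arithmetic. This sketch takes her per-queue count of 256 Tx plus 256 Rx refs and additionally assumes v1 grant entries (512 per 4 KiB frame), 8 reserved entries, no other grant users in the guest, and no ring-page grants. Under those assumptions one 20-queue vif needs 10240 refs, 64 frames hold 3 such vifs and 32 frames hold 1, while capping each vif at 8 queues would allow 7 vifs in 64 frames. `vif_grants` and `max_vifs` are illustrative names.

```python
# Assumed geometry: v1 grant entries (8 bytes each) in 4 KiB frames.
ENTRIES_PER_FRAME = 512
RESERVED_ENTRIES = 8  # assumed reserved entries


def vif_grants(queues: int, ring_slots: int = 256) -> int:
    """Refs one vif preallocates: a Tx and an Rx ref per slot, per queue."""
    return queues * ring_slots * 2


def max_vifs(frames: int, queues: int) -> int:
    """How many such vifs fit if nothing else in the guest uses grants."""
    usable = frames * ENTRIES_PER_FRAME - RESERVED_ENTRIES
    return usable // vif_grants(queues)


if __name__ == "__main__":
    print(vif_grants(20))  # the 20-vcpu example from the message above
    for queues in (1, 2, 4, 8, 16, 20):
        print(f"{queues:2d} queues/vif: {max_vifs(32, queues):2d} vifs at 32 frames, "
              f"{max_vifs(64, queues):2d} vifs at 64 frames")
```

The loop prints the same trade-off for other queue counts, which is one way to compare a default queue cap of 8 or 16 against a larger default grant table.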
* Re: netfront/netback multiqueue exhausting grants
2016-01-21 12:19 ` Ian Campbell
2016-01-21 14:17 ` David Vrabel
@ 2016-01-22 3:36 ` Bob Liu
2016-01-22 7:53 ` Jan Beulich
1 sibling, 1 reply; 21+ messages in thread
From: Bob Liu @ 2016-01-22 3:36 UTC
To: Ian Campbell; +Cc: Boris Ostrovsky, Wei Liu, David Vrabel, xen-devel
On 01/21/2016 08:19 PM, Ian Campbell wrote:
> On Thu, 2016-01-21 at 10:56 +0000, David Vrabel wrote:
>> On 20/01/16 12:23, Ian Campbell wrote:
>>> There have been a few reports recently[0] which relate to a failure of
>>> netfront to allocate sufficient grant refs for all the queues:
>>>
>>> [ 0.533589] xen_netfront: can't alloc rx grant refs
>>> [ 0.533612] net eth0: only created 31 queues
>>>
>>> Which can be worked around by increasing the number of grants on the
>>> hypervisor command line or by limiting the number of queues permitted by
>>> either back or front using a module param (which was broken but is now
>>> fixed on both sides, but I'm not sure it has been backported everywhere
>>> such that it is a reliable thing to always tell users as a workaround).
>>>
>>> Is there any plan to do anything about the default/out of the box
>>> experience? Either limiting the number of queues or making both ends cope
>>> more gracefully with failure to create some queues (or both) might be
>>> sufficient?
>>>
>>> I think the crash after the above in the first link at [0] is fixed? I
>>> think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
>>> to real created" which was in 4.3.
>>
>> I think the correct solution is to increase the default maximum grant
>> table size.
>
> That could well make sense, but then there will just be another higher
> limit, so we should perhaps do both.
>
> i.e. factoring in:
> * performance i.e. ability for N queues to saturate whatever sort of link
> contemporary Linux can saturate these days, plus some headroom, or
> whatever other ceiling seems sensible)
> * grant table resource consumption i.e. (sensible max number of blks * nr
> gnts per blk + sensible max number of vifs * nr gnts per vif + other
> devs needs) < per guest grant limit) to pick both the default gnttab
> size and the default max queues.
>
Agree.
By the way, do you think it's possible to make the grant table support bigger
pages, e.g. 64K?
One grant-ref per 64KB instead of 4KB should be able to reduce the grant
entry consumption significantly.
Bob
> (or s/sensible/supportable/g etc).
>
>> Although, unless you're using the not-yet-applied per-cpu rwlock patches
>> multiqueue is terrible on many (multisocket) systems and the number of
>> queues should be limited in netback to 4 or even just 2.
>
> Presumably the guest can't tell, so it can't do this.
>
> I think when you say "terrible" you don't mean "worse than without mq" but
> rather "not realising the expected gains from a larger number of queues",
> right?
>
> Ian.
* Re: netfront/netback multiqueue exhausting grants
2016-01-22 3:36 ` Bob Liu
@ 2016-01-22 7:53 ` Jan Beulich
2016-01-22 10:40 ` Bob Liu
0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2016-01-22 7:53 UTC
To: Bob Liu; +Cc: Boris Ostrovsky, xen-devel, Wei Liu, David Vrabel, Ian Campbell
>>> On 22.01.16 at 04:36, <bob.liu@oracle.com> wrote:
> By the way, do you think it's possible to make the grant table support bigger
> pages, e.g. 64K?
> One grant-ref per 64KB instead of 4KB should be able to reduce the grant
> entry consumption significantly.
How would that work with an underlying page size of 4k, and pages
potentially being non-contiguous in machine address space? Besides
that the grant table hypercall interface isn't prepared to support
64k page size, due to its use of uint16_t for the length of copy ops.
Jan
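Jan's second objection comes down to integer width: a uint16_t holds at most 65535, one less than 64 KiB, so a 64 KiB copy length cannot be represented and would wrap to 0. The Python sketch below only models C's conversion to an unsigned 16-bit type with a mask; as_uint16 and fits_copy_len are hypothetical helpers, not part of any Xen interface.

```python
UINT16_MAX = 0xFFFF  # largest value a uint16_t can hold (65535)


def as_uint16(value: int) -> int:
    """Model C's conversion of an integer to uint16_t (reduction modulo 2**16)."""
    return value & UINT16_MAX


def fits_copy_len(nbytes: int) -> bool:
    """True if nbytes survives being stored in a 16-bit length field unchanged."""
    return as_uint16(nbytes) == nbytes


if __name__ == "__main__":
    for size in (4 * 1024, 64 * 1024 - 1, 64 * 1024):
        status = "fits" if fits_copy_len(size) else "truncated"
        print(f"{size} -> {as_uint16(size)} ({status})")
```

Run as a script, the last line printed is `65536 -> 0 (truncated)`: a full 64 KiB length collapses to zero, which is why a 64 KiB grant would need an interface change and not just a bigger page.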
* Re: netfront/netback multiqueue exhausting grants
2016-01-22 7:53 ` Jan Beulich
@ 2016-01-22 10:40 ` Bob Liu
2016-01-22 11:02 ` Jan Beulich
0 siblings, 1 reply; 21+ messages in thread
From: Bob Liu @ 2016-01-22 10:40 UTC
To: Jan Beulich
Cc: Boris Ostrovsky, xen-devel, Wei Liu, David Vrabel, Ian Campbell
On 01/22/2016 03:53 PM, Jan Beulich wrote:
>>>> On 22.01.16 at 04:36, <bob.liu@oracle.com> wrote:
>> By the way, do you think it's possible to make the grant table support bigger
>> pages, e.g. 64K?
>> One grant-ref per 64KB instead of 4KB should be able to reduce the grant
>> entry consumption significantly.
>
> How would that work with an underlying page size of 4k, and pages
> potentially being non-contiguous in machine address space? Besides
> that the grant table hypercall interface isn't prepared to support
> 64k page size, due to its use of uint16_t for the length of copy ops.
>
Right, and I mean whether we should consider addressing all the places you
mentioned.
With multi-queue xen-block and xen-network, we got more reports that the
grants were exhausted.
--
Regards,
-Bob
* Re: netfront/netback multiqueue exhausting grants
2016-01-22 10:40 ` Bob Liu
@ 2016-01-22 11:02 ` Jan Beulich
2016-01-23 0:29 ` Bob Liu
0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2016-01-22 11:02 UTC
To: Bob Liu; +Cc: Boris Ostrovsky, xen-devel, Wei Liu, David Vrabel, Ian Campbell
>>> On 22.01.16 at 11:40, <bob.liu@oracle.com> wrote:
> On 01/22/2016 03:53 PM, Jan Beulich wrote:
>>>>> On 22.01.16 at 04:36, <bob.liu@oracle.com> wrote:
>>> By the way, do you think it's possible to make the grant table support bigger
>>> pages, e.g. 64K?
>>> One grant-ref per 64KB instead of 4KB should be able to reduce the grant
>>> entry consumption significantly.
>>
>> How would that work with an underlying page size of 4k, and pages
>> potentially being non-contiguous in machine address space? Besides
>> that the grant table hypercall interface isn't prepared to support
>> 64k page size, due to its use of uint16_t for the length of copy ops.
>
> Right, and I mean whether we should consider addressing all the places you
> mentioned.
Just from an abstract perspective: How would you envision avoiding
machine address discontiguity? Or would you want to limit such an
improvement to only HVM/PVH/HVMlite guests?
Jan
* Re: netfront/netback multiqueue exhausting grants
2016-01-22 11:02 ` Jan Beulich
@ 2016-01-23 0:29 ` Bob Liu
2016-01-25 9:53 ` Jan Beulich
0 siblings, 1 reply; 21+ messages in thread
From: Bob Liu @ 2016-01-23 0:29 UTC
To: Jan Beulich
Cc: Boris Ostrovsky, xen-devel, Wei Liu, David Vrabel, Ian Campbell
On 01/22/2016 07:02 PM, Jan Beulich wrote:
>>>> On 22.01.16 at 11:40, <bob.liu@oracle.com> wrote:
>> On 01/22/2016 03:53 PM, Jan Beulich wrote:
>>>>>> On 22.01.16 at 04:36, <bob.liu@oracle.com> wrote:
>>>> By the way, do you think it's possible to make the grant table support bigger
>>>> pages, e.g. 64K?
>>>> One grant-ref per 64KB instead of 4KB should be able to reduce the grant
>>>> entry consumption significantly.
>>>
>>> How would that work with an underlying page size of 4k, and pages
>>> potentially being non-contiguous in machine address space? Besides
>>> that the grant table hypercall interface isn't prepared to support
>>> 64k page size, due to its use of uint16_t for the length of copy ops.
>>
>> Right, and I mean whether we should consider addressing all the places you
>> mentioned.
>
> Just from an abstract perspective: How would you envision avoiding
> machine address discontiguity? Or would you want to limit such an
E.g. reserve a page pool of contiguous 64KB pages, or make grant-map support
huge pages (2MB)?
To be honest, I haven't thought much about the details.
Do you think that's unlikely to be implemented? If so, we have to limit the
number of queues, VMs and vdisks/vifs in a proper way to make sure guests
won't enter a grant-exhausted state.
> improvement to only HVM/PVH/HVMlite guests?
>
> Jan
>
--
Regards,
-Bob
* Re: netfront/netback multiqueue exhausting grants
2016-01-23 0:29 ` Bob Liu
@ 2016-01-25 9:53 ` Jan Beulich
0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2016-01-25 9:53 UTC
To: Bob Liu; +Cc: Boris Ostrovsky, xen-devel, Wei Liu, David Vrabel, Ian Campbell
>>> On 23.01.16 at 01:29, <bob.liu@oracle.com> wrote:
> On 01/22/2016 07:02 PM, Jan Beulich wrote:
>>>>> On 22.01.16 at 11:40, <bob.liu@oracle.com> wrote:
>>> On 01/22/2016 03:53 PM, Jan Beulich wrote:
>>>>>>> On 22.01.16 at 04:36, <bob.liu@oracle.com> wrote:
>>>>> By the way, do you think it's possible to make the grant table support bigger
>>>>> pages, e.g. 64K?
>>>>> One grant-ref per 64KB instead of 4KB should be able to reduce the grant
>>>>> entry consumption significantly.
>>>>
>>>> How would that work with an underlying page size of 4k, and pages
>>>> potentially being non-contiguous in machine address space? Besides
>>>> that the grant table hypercall interface isn't prepared to support
>>>> 64k page size, due to its use of uint16_t for the length of copy ops.
>>>
>>> Right, and I mean whether we should consider addressing all the places you
>>> mentioned.
>>
>> Just from an abstract perspective: How would you envision avoiding
>> machine address discontiguity? Or would you want to limit such an
>
> E.g. reserve a page pool of contiguous 64KB pages, or make grant-map support
> huge pages (2MB)?
> To be honest, I haven't thought much about the details.
>
> Do you think that's unlikely to be implemented?
Contiguous memory (of whatever granularity above 4k) is quite difficult
to _guarantee_ in PV guests, so yes, without you or someone else having
a fantastic new idea on how to achieve this I indeed see this as a pretty
unlikely thing to come true.
Jan
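Jan's contiguity concern can be made concrete. In a PV guest each 4 KiB guest page frame (PFN) is backed by a machine frame (MFN), and consecutive PFNs need not be backed by consecutive MFNs. Under that reading, a single 64 KiB grant would presumably need 16 consecutive PFNs backed by 16 consecutive MFNs. The sketch below checks that condition against a hypothetical PFN-to-MFN table held in a Python list; it illustrates the constraint only and is not how Xen or Linux actually track the p2m.

```python
PAGE_SIZE = 4096
PAGES_PER_64K = (64 * 1024) // PAGE_SIZE  # 16 frames of 4 KiB


def is_machine_contiguous(p2m, first_pfn, nr_pages=PAGES_PER_64K):
    """True if PFNs first_pfn .. first_pfn + nr_pages - 1 are backed by consecutive MFNs.

    p2m is a hypothetical list mapping PFN (the index) -> MFN (the value).
    """
    base_mfn = p2m[first_pfn]
    return all(p2m[first_pfn + i] == base_mfn + i for i in range(nr_pages))


if __name__ == "__main__":
    p2m = list(range(1000, 1016))         # 16 PFNs backed by MFNs 1000..1015
    print(is_machine_contiguous(p2m, 0))  # True
    p2m[7] = 5000                         # one frame backed elsewhere
    print(is_machine_contiguous(p2m, 0))  # False
```

A single out-of-place frame is enough to break the run, and nothing in a PV guest's normal memory allocation rules it out; that is the guarantee Jan says is hard to provide.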
end of thread
Thread overview: 21+ messages
2016-01-20 12:23 netfront/netback multiqueue exhausting grants Ian Campbell
2016-01-20 14:40 ` Boris Ostrovsky
2016-01-20 14:52 ` Ian Campbell
2016-01-20 15:02 ` David Vrabel
2016-01-20 15:10 ` Boris Ostrovsky
2016-01-20 15:16 ` Ian Campbell
2016-01-21 10:12 ` Ian Campbell
2016-01-21 10:25 ` Wei Liu
2016-01-21 10:37 ` Ian Campbell
2016-01-21 10:52 ` Wei Liu
2016-01-20 16:18 ` annie li
2016-01-21 10:56 ` David Vrabel
2016-01-21 12:19 ` Ian Campbell
2016-01-21 14:17 ` David Vrabel
2016-01-21 15:11 ` annie li
2016-01-22 3:36 ` Bob Liu
2016-01-22 7:53 ` Jan Beulich
2016-01-22 10:40 ` Bob Liu
2016-01-22 11:02 ` Jan Beulich
2016-01-23 0:29 ` Bob Liu
2016-01-25 9:53 ` Jan Beulich