netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* DomU's network interface will hung when Dom0 running 32bit
@ 2013-10-12  8:53 jianhai luan
  2013-10-14 11:19 ` Wei Liu
  0 siblings, 1 reply; 26+ messages in thread
From: jianhai luan @ 2013-10-12  8:53 UTC (permalink / raw)
  To: Ian Campbell, Wei Liu; +Cc: xen-devel, netdev

Hi Ian,
   I meet the DomU's network interface hung issue recently, and have 
been working on the issue from that time. I find that DomU's network 
interface, which send lesser package, will hung if Dom0 running 32bit 
and DomU's up-time is very long.  I think that one jiffies overflow bug 
exist in the function tx_credit_exceeded().
   I know the inline function time_after_eq(a,b) will process jiffies 
overflow, but the function have one limit a should little that (b + 
MAX_SIGNAL_LONG). If a large than the value, time_after_eq will return 
false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit machine.
   If DomU's network interface send lesser package (<0.5k/s if 
jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out 
(credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now, 
next_credit) will failure (should be true). So one timer which will not 
be trigger in short time, and later process will be aborted when 
timer_pending(&vif->credit_timeout) is true. The result will be DomU's 
network interface will be hung in long time (> 40days).
   Please think about the below scenario:
   Condition:
     Dom0 running 32-bit and HZ = 1000
     vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit = 
0xffffffff, vif->credit_usec=0 jiffies=0
     vif receive lesser package (DomU send lesser package). If the value 
is litter than 2K/s, consume 4G(0xffffffff) will need 582.55 hours. 
jiffies will large than 0x7ffffff. we guess jiffies = 0x800000ff, 
time_after_eq(0x800000ff, 0xffffffff) will failure, and one time which 
expire is 0xfffffff will be pended into system. So the interface will 
hung until jiffies recount 0xffffffff (that will need very long time).

   If some error exist in above explain, please help me point it out.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-12  8:53 DomU's network interface will hung when Dom0 running 32bit jianhai luan
@ 2013-10-14 11:19 ` Wei Liu
  2013-10-15  2:44   ` jianhai luan
  0 siblings, 1 reply; 26+ messages in thread
From: Wei Liu @ 2013-10-14 11:19 UTC (permalink / raw)
  To: jianhai luan; +Cc: Ian Campbell, Wei Liu, xen-devel, netdev

On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote:
> Hi Ian,
>   I meet the DomU's network interface hung issue recently, and have
> been working on the issue from that time. I find that DomU's network
> interface, which send lesser package, will hung if Dom0 running
> 32bit and DomU's up-time is very long.  I think that one jiffies
> overflow bug exist in the function tx_credit_exceeded().
>   I know the inline function time_after_eq(a,b) will process jiffies
> overflow, but the function have one limit a should little that (b +
> MAX_SIGNAL_LONG). If a large than the value, time_after_eq will
> return false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit
> machine.
>   If DomU's network interface send lesser package (<0.5k/s if
> jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out
> (credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now,
> next_credit) will failure (should be true). So one timer which will
> not be trigger in short time, and later process will be aborted when
> timer_pending(&vif->credit_timeout) is true. The result will be
> DomU's network interface will be hung in long time (> 40days).
>   Please think about the below scenario:
>   Condition:
>     Dom0 running 32-bit and HZ = 1000
>     vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit
> = 0xffffffff, vif->credit_usec=0 jiffies=0
>     vif receive lesser package (DomU send lesser package). If the
> value is litter than 2K/s, consume 4G(0xffffffff) will need 582.55
> hours. jiffies will large than 0x7ffffff. we guess jiffies =
> 0x800000ff, time_after_eq(0x800000ff, 0xffffffff) will failure, and
> one time which expire is 0xfffffff will be pended into system. So
> the interface will hung until jiffies recount 0xffffffff (that will
> need very long time).

If I'm not mistaken you meant time_after_eq(now, next_credit) in
netback. How does next_credit become 0xffffffff?

Wei.

> 
>   If some error exist in above explain, please help me point it out.
> 
> Thanks,
> Jason

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-14 11:19 ` Wei Liu
@ 2013-10-15  2:44   ` jianhai luan
  2013-10-15  8:43     ` Ian Campbell
  0 siblings, 1 reply; 26+ messages in thread
From: jianhai luan @ 2013-10-15  2:44 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, xen-devel, netdev


On 2013-10-14 19:19, Wei Liu wrote:
> On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote:
>> Hi Ian,
>>    I meet the DomU's network interface hung issue recently, and have
>> been working on the issue from that time. I find that DomU's network
>> interface, which send lesser package, will hung if Dom0 running
>> 32bit and DomU's up-time is very long.  I think that one jiffies
>> overflow bug exist in the function tx_credit_exceeded().
>>    I know the inline function time_after_eq(a,b) will process jiffies
>> overflow, but the function have one limit a should little that (b +
>> MAX_SIGNAL_LONG). If a large than the value, time_after_eq will
>> return false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit
>> machine.
>>    If DomU's network interface send lesser package (<0.5k/s if
>> jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out
>> (credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now,
>> next_credit) will failure (should be true). So one timer which will
>> not be trigger in short time, and later process will be aborted when
>> timer_pending(&vif->credit_timeout) is true. The result will be
>> DomU's network interface will be hung in long time (> 40days).
>>    Please think about the below scenario:
>>    Condition:
>>      Dom0 running 32-bit and HZ = 1000
>>      vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit
>> = 0xffffffff, vif->credit_usec=0 jiffies=0
>>      vif receive lesser package (DomU send lesser package). If the
>> value is litter than 2K/s, consume 4G(0xffffffff) will need 582.55
>> hours. jiffies will large than 0x7ffffff. we guess jiffies =
>> 0x800000ff, time_after_eq(0x800000ff, 0xffffffff) will failure, and
>> one time which expire is 0xfffffff will be pended into system. So
>> the interface will hung until jiffies recount 0xffffffff (that will
>> need very long time).
> If I'm not mistaken you meant time_after_eq(now, next_credit) in
> netback. How does next_credit become 0xffffffff?

I only assume the value is 0xfffffff, and the value of next_credit 
isn't  point. If the delta between now and next_credit larger than 
ULONG_MAX, time_after_eq will do wrong judge.
>
> Wei.
>
>>    If some error exist in above explain, please help me point it out.
>>
>> Thanks,
>> Jason

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-15  2:44   ` jianhai luan
@ 2013-10-15  8:43     ` Ian Campbell
  2013-10-15  9:34       ` jianhai luan
  2013-10-16  7:10       ` annie li
  0 siblings, 2 replies; 26+ messages in thread
From: Ian Campbell @ 2013-10-15  8:43 UTC (permalink / raw)
  To: jianhai luan; +Cc: Wei Liu, xen-devel, netdev

On Tue, 2013-10-15 at 10:44 +0800, jianhai luan wrote:
> On 2013-10-14 19:19, Wei Liu wrote:
> > On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote:
> >> Hi Ian,
> >>    I meet the DomU's network interface hung issue recently, and have
> >> been working on the issue from that time. I find that DomU's network
> >> interface, which send lesser package, will hung if Dom0 running
> >> 32bit and DomU's up-time is very long.  I think that one jiffies
> >> overflow bug exist in the function tx_credit_exceeded().
> >>    I know the inline function time_after_eq(a,b) will process jiffies
> >> overflow, but the function have one limit a should little that (b +
> >> MAX_SIGNAL_LONG). If a large than the value, time_after_eq will
> >> return false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit
> >> machine.
> >>    If DomU's network interface send lesser package (<0.5k/s if
> >> jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out
> >> (credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now,
> >> next_credit) will failure (should be true). So one timer which will
> >> not be trigger in short time, and later process will be aborted when
> >> timer_pending(&vif->credit_timeout) is true. The result will be
> >> DomU's network interface will be hung in long time (> 40days).
> >>    Please think about the below scenario:
> >>    Condition:
> >>      Dom0 running 32-bit and HZ = 1000
> >>      vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit
> >> = 0xffffffff, vif->credit_usec=0 jiffies=0
> >>      vif receive lesser package (DomU send lesser package). If the
> >> value is litter than 2K/s, consume 4G(0xffffffff) will need 582.55
> >> hours. jiffies will large than 0x7ffffff. we guess jiffies =
> >> 0x800000ff, time_after_eq(0x800000ff, 0xffffffff) will failure, and
> >> one time which expire is 0xfffffff will be pended into system. So
> >> the interface will hung until jiffies recount 0xffffffff (that will
> >> need very long time).
> > If I'm not mistaken you meant time_after_eq(now, next_credit) in
> > netback. How does next_credit become 0xffffffff?
> 
> I only assume the value is 0xfffffff, and the value of next_credit 
> isn't  point. If the delta between now and next_credit larger than 
> ULONG_MAX, time_after_eq will do wrong judge.

So it sounds like we need a timer which is independent of the traffic
being sent to keep credit_timeout.expires rolling over.

Can you propose a patch?

Ian.

> >
> > Wei.
> >
> >>    If some error exist in above explain, please help me point it out.
> >>
> >> Thanks,
> >> Jason
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-15  8:43     ` Ian Campbell
@ 2013-10-15  9:34       ` jianhai luan
  2013-10-15 10:06         ` Wei Liu
  2013-10-16  7:10       ` annie li
  1 sibling, 1 reply; 26+ messages in thread
From: jianhai luan @ 2013-10-15  9:34 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Wei Liu, xen-devel, netdev, ANNIE LI

[-- Attachment #1: Type: text/plain, Size: 2838 bytes --]


On 2013-10-15 16:43, Ian Campbell wrote:
> On Tue, 2013-10-15 at 10:44 +0800, jianhai luan wrote:
>> On 2013-10-14 19:19, Wei Liu wrote:
>>> On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote:
>>>> Hi Ian,
>>>>     I meet the DomU's network interface hung issue recently, and have
>>>> been working on the issue from that time. I find that DomU's network
>>>> interface, which send lesser package, will hung if Dom0 running
>>>> 32bit and DomU's up-time is very long.  I think that one jiffies
>>>> overflow bug exist in the function tx_credit_exceeded().
>>>>     I know the inline function time_after_eq(a,b) will process jiffies
>>>> overflow, but the function have one limit a should little that (b +
>>>> MAX_SIGNAL_LONG). If a large than the value, time_after_eq will
>>>> return false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit
>>>> machine.
>>>>     If DomU's network interface send lesser package (<0.5k/s if
>>>> jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out
>>>> (credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now,
>>>> next_credit) will failure (should be true). So one timer which will
>>>> not be trigger in short time, and later process will be aborted when
>>>> timer_pending(&vif->credit_timeout) is true. The result will be
>>>> DomU's network interface will be hung in long time (> 40days).
>>>>     Please think about the below scenario:
>>>>     Condition:
>>>>       Dom0 running 32-bit and HZ = 1000
>>>>       vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit
>>>> = 0xffffffff, vif->credit_usec=0 jiffies=0
>>>>       vif receive lesser package (DomU send lesser package). If the
>>>> value is litter than 2K/s, consume 4G(0xffffffff) will need 582.55
>>>> hours. jiffies will large than 0x7ffffff. we guess jiffies =
>>>> 0x800000ff, time_after_eq(0x800000ff, 0xffffffff) will failure, and
>>>> one time which expire is 0xfffffff will be pended into system. So
>>>> the interface will hung until jiffies recount 0xffffffff (that will
>>>> need very long time).
>>> If I'm not mistaken you meant time_after_eq(now, next_credit) in
>>> netback. How does next_credit become 0xffffffff?
>> I only assume the value is 0xfffffff, and the value of next_credit
>> isn't  point. If the delta between now and next_credit larger than
>> ULONG_MAX, time_after_eq will do wrong judge.
> So it sounds like we need a timer which is independent of the traffic
> being sent to keep credit_timeout.expires rolling over.
>
> Can you propose a patch?

Because credit_timeout.expire always after jiffies, i judge the value 
over the range of time_after_eq() by time_before(now, 
vif->credit_timeout.expires). please check the patch.
>
> Ian.
>
>>> Wei.
>>>
>>>>     If some error exist in above explain, please help me point it out.
>>>>
>>>> Thanks,
>>>> Jason
>


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Process-the-wrong-judge-of-time_after_eq.patch --]
[-- Type: text/plain; charset=gb18030; name="0001-Process-the-wrong-judge-of-time_after_eq.patch", Size: 1206 bytes --]

From f08c584ca1f393f6559b58b6b4c9e259c313259e Mon Sep 17 00:00:00 2001
From: Jason Luan <jianhai.luan@oracle.com>
Date: Tue, 15 Oct 2013 17:07:49 +0800
Subject: [PATCH] Process the wrong judge of time_after_eq().

If netfront send lesser package, the delta between now and next_credit will be out range of time_after_qe() and the function will do wrong judge. Because the expires always after jiffies, we judge the condition by time_before(now, vif->credit_timeout.expires).

Signed-off-by: Jason Luan <jianhai.luan@oracle.com>
---
 drivers/net/xen-netback/netback.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index f3e591c..8036ce6 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1195,7 +1195,8 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 		return true;
 
 	/* Passed the point where we can replenish credit? */
-	if (time_after_eq(now, next_credit)) {
+	if (time_after_eq(now, next_credit) ||
+		unlikely(time_before(now, vif->credit_timeout.expires))) {
 		vif->credit_timeout.expires = now;
 		tx_add_credit(vif);
 	}
-- 
1.7.6.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-15  9:34       ` jianhai luan
@ 2013-10-15 10:06         ` Wei Liu
  2013-10-15 11:26           ` jianhai luan
  0 siblings, 1 reply; 26+ messages in thread
From: Wei Liu @ 2013-10-15 10:06 UTC (permalink / raw)
  To: jianhai luan; +Cc: Ian Campbell, Wei Liu, xen-devel, netdev, ANNIE LI

On Tue, Oct 15, 2013 at 05:34:57PM +0800, jianhai luan wrote:
> 
> On 2013-10-15 16:43, Ian Campbell wrote:
> >On Tue, 2013-10-15 at 10:44 +0800, jianhai luan wrote:
> >>On 2013-10-14 19:19, Wei Liu wrote:
> >>>On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote:
> >>>>Hi Ian,
> >>>>    I meet the DomU's network interface hung issue recently, and have
> >>>>been working on the issue from that time. I find that DomU's network
> >>>>interface, which send lesser package, will hung if Dom0 running
> >>>>32bit and DomU's up-time is very long.  I think that one jiffies
> >>>>overflow bug exist in the function tx_credit_exceeded().
> >>>>    I know the inline function time_after_eq(a,b) will process jiffies
> >>>>overflow, but the function have one limit a should little that (b +
> >>>>MAX_SIGNAL_LONG). If a large than the value, time_after_eq will
> >>>>return false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit
> >>>>machine.
> >>>>    If DomU's network interface send lesser package (<0.5k/s if
> >>>>jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out
> >>>>(credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now,
> >>>>next_credit) will failure (should be true). So one timer which will
> >>>>not be trigger in short time, and later process will be aborted when
> >>>>timer_pending(&vif->credit_timeout) is true. The result will be
> >>>>DomU's network interface will be hung in long time (> 40days).
> >>>>    Please think about the below scenario:
> >>>>    Condition:
> >>>>      Dom0 running 32-bit and HZ = 1000
> >>>>      vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit
> >>>>= 0xffffffff, vif->credit_usec=0 jiffies=0
> >>>>      vif receive lesser package (DomU send lesser package). If the
> >>>>value is litter than 2K/s, consume 4G(0xffffffff) will need 582.55
> >>>>hours. jiffies will large than 0x7ffffff. we guess jiffies =
> >>>>0x800000ff, time_after_eq(0x800000ff, 0xffffffff) will failure, and
> >>>>one time which expire is 0xfffffff will be pended into system. So
> >>>>the interface will hung until jiffies recount 0xffffffff (that will
> >>>>need very long time).
> >>>If I'm not mistaken you meant time_after_eq(now, next_credit) in
> >>>netback. How does next_credit become 0xffffffff?
> >>I only assume the value is 0xfffffff, and the value of next_credit
> >>isn't  point. If the delta between now and next_credit larger than
> >>ULONG_MAX, time_after_eq will do wrong judge.
> >So it sounds like we need a timer which is independent of the traffic
> >being sent to keep credit_timeout.expires rolling over.
> >
> >Can you propose a patch?
> 
> Because credit_timeout.expire always after jiffies, i judge the
> value over the range of time_after_eq() by time_before(now,
> vif->credit_timeout.expires). please check the patch.

I don't think this really fix the issue for you. You still have chance
that now wraps around and falls between expires and next_credit. In that
case it's stalled again.

Wei.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-15 10:06         ` Wei Liu
@ 2013-10-15 11:26           ` jianhai luan
  2013-10-15 12:58             ` Wei Liu
  0 siblings, 1 reply; 26+ messages in thread
From: jianhai luan @ 2013-10-15 11:26 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, xen-devel, netdev, ANNIE LI


On 2013-10-15 18:06, Wei Liu wrote:
> On Tue, Oct 15, 2013 at 05:34:57PM +0800, jianhai luan wrote:
>> On 2013-10-15 16:43, Ian Campbell wrote:
>>> On Tue, 2013-10-15 at 10:44 +0800, jianhai luan wrote:
>>>> On 2013-10-14 19:19, Wei Liu wrote:
>>>>> On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote:
>>>>>> Hi Ian,
>>>>>>     I meet the DomU's network interface hung issue recently, and have
>>>>>> been working on the issue from that time. I find that DomU's network
>>>>>> interface, which send lesser package, will hung if Dom0 running
>>>>>> 32bit and DomU's up-time is very long.  I think that one jiffies
>>>>>> overflow bug exist in the function tx_credit_exceeded().
>>>>>>     I know the inline function time_after_eq(a,b) will process jiffies
>>>>>> overflow, but the function have one limit a should little that (b +
>>>>>> MAX_SIGNAL_LONG). If a large than the value, time_after_eq will
>>>>>> return false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit
>>>>>> machine.
>>>>>>     If DomU's network interface send lesser package (<0.5k/s if
>>>>>> jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out
>>>>>> (credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now,
>>>>>> next_credit) will failure (should be true). So one timer which will
>>>>>> not be trigger in short time, and later process will be aborted when
>>>>>> timer_pending(&vif->credit_timeout) is true. The result will be
>>>>>> DomU's network interface will be hung in long time (> 40days).
>>>>>>     Please think about the below scenario:
>>>>>>     Condition:
>>>>>>       Dom0 running 32-bit and HZ = 1000
>>>>>>       vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit
>>>>>> = 0xffffffff, vif->credit_usec=0 jiffies=0
>>>>>>       vif receive lesser package (DomU send lesser package). If the
>>>>>> value is litter than 2K/s, consume 4G(0xffffffff) will need 582.55
>>>>>> hours. jiffies will large than 0x7ffffff. we guess jiffies =
>>>>>> 0x800000ff, time_after_eq(0x800000ff, 0xffffffff) will failure, and
>>>>>> one time which expire is 0xfffffff will be pended into system. So
>>>>>> the interface will hung until jiffies recount 0xffffffff (that will
>>>>>> need very long time).
>>>>> If I'm not mistaken you meant time_after_eq(now, next_credit) in
>>>>> netback. How does next_credit become 0xffffffff?
>>>> I only assume the value is 0xfffffff, and the value of next_credit
>>>> isn't  point. If the delta between now and next_credit larger than
>>>> ULONG_MAX, time_after_eq will do wrong judge.
>>> So it sounds like we need a timer which is independent of the traffic
>>> being sent to keep credit_timeout.expires rolling over.
>>>
>>> Can you propose a patch?
>> Because credit_timeout.expire always after jiffies, i judge the
>> value over the range of time_after_eq() by time_before(now,
>> vif->credit_timeout.expires). please check the patch.
> I don't think this really fix the issue for you. You still have chance
> that now wraps around and falls between expires and next_credit. In that
> case it's stalled again.

if time_before(now, vif->credit_timeout.expires) is true, time wrap and 
do operation. Otherwise time_before(now, vif->credit_timeout.expires) 
isn't true, now - vif->credit_timeout.expires should be letter than 
ULONG_MAX/2. Because next_credit large than vif->credit_timeout.expires 
(next_crdit = vif->credit_timeout.expires + 
msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and 
next_credit should be in range of time_after_eq().  So time_after_eq() 
do correctly judge.

Jason
>
> Wei.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-15 11:26           ` jianhai luan
@ 2013-10-15 12:58             ` Wei Liu
  2013-10-15 14:29               ` jianhai luan
  2013-10-16  7:35               ` jianhai luan
  0 siblings, 2 replies; 26+ messages in thread
From: Wei Liu @ 2013-10-15 12:58 UTC (permalink / raw)
  To: jianhai luan; +Cc: Wei Liu, Ian Campbell, xen-devel, netdev, ANNIE LI

On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote:
[...]
> >>>Can you propose a patch?
> >>Because credit_timeout.expire always after jiffies, i judge the
> >>value over the range of time_after_eq() by time_before(now,
> >>vif->credit_timeout.expires). please check the patch.
> >I don't think this really fix the issue for you. You still have chance
> >that now wraps around and falls between expires and next_credit. In that
> >case it's stalled again.
> 
> if time_before(now, vif->credit_timeout.expires) is true, time wrap
> and do operation. Otherwise time_before(now,
> vif->credit_timeout.expires) isn't true, now -
> vif->credit_timeout.expires should be letter than ULONG_MAX/2.
> Because next_credit large than vif->credit_timeout.expires
> (next_crdit = vif->credit_timeout.expires +
> msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and
> next_credit should be in range of time_after_eq().  So
> time_after_eq() do correctly judge.
> 

Not sure I understand you. Consider "now" is placed like this:

   expires   now   next_credit
   ----time increases this direction--->

* time_after_eq(now, next_credit) -> false
* time_before(now, expires) -> false

Then it's stuck again. You're merely narrowing the window, not fixing
the real problem.

Wei.

> Jason
> >
> >Wei.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-15 12:58             ` Wei Liu
@ 2013-10-15 14:29               ` jianhai luan
  2013-10-15 14:49                 ` Wei Liu
  2013-10-16  7:35               ` jianhai luan
  1 sibling, 1 reply; 26+ messages in thread
From: jianhai luan @ 2013-10-15 14:29 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, xen-devel, netdev, ANNIE LI


On 2013-10-15 20:58, Wei Liu wrote:
> On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote:
> [...]
>>>>> Can you propose a patch?
>>>> Because credit_timeout.expire always after jiffies, i judge the
>>>> value over the range of time_after_eq() by time_before(now,
>>>> vif->credit_timeout.expires). please check the patch.
>>> I don't think this really fix the issue for you. You still have chance
>>> that now wraps around and falls between expires and next_credit. In that
>>> case it's stalled again.
>> if time_before(now, vif->credit_timeout.expires) is true, time wrap
>> and do operation. Otherwise time_before(now,
>> vif->credit_timeout.expires) isn't true, now -
>> vif->credit_timeout.expires should be letter than ULONG_MAX/2.
>> Because next_credit large than vif->credit_timeout.expires
>> (next_crdit = vif->credit_timeout.expires +
>> msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and
>> next_credit should be in range of time_after_eq().  So
>> time_after_eq() do correctly judge.
>>
> Not sure I understand you. Consider "now" is placed like this:
>
>     expires   now   next_credit
>     ----time increases this direction--->
>
> * time_after_eq(now, next_credit) -> false
> * time_before(now, expires) -> false

If now is placed in above environment, the result will be correct 
(Sending package will be not allowed until next_credit).
* time_after_eq(now, next_credit)  --> false will include two environment:
   expires     now   next_credit
   -----------time increases this direction ---->

Or
   expires      next_credit             next_credit + MAX_LONG/2 now
   -----------time increases this direction ---->


the first environment should be correct to control transmit. the second 
environment is our included environment.

Jason
>
> Then it's stuck again. You're merely narrowing the window, not fixing
> the real problem.
>
> Wei.
>
>> Jason
>>> Wei.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-15 14:29               ` jianhai luan
@ 2013-10-15 14:49                 ` Wei Liu
  2013-10-15 14:50                   ` Ian Campbell
  0 siblings, 1 reply; 26+ messages in thread
From: Wei Liu @ 2013-10-15 14:49 UTC (permalink / raw)
  To: jianhai luan; +Cc: Wei Liu, Ian Campbell, xen-devel, netdev, ANNIE LI

On Tue, Oct 15, 2013 at 10:29:15PM +0800, jianhai luan wrote:
> 
> On 2013-10-15 20:58, Wei Liu wrote:
> >On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote:
> >[...]
> >>>>>Can you propose a patch?
> >>>>Because credit_timeout.expire always after jiffies, i judge the
> >>>>value over the range of time_after_eq() by time_before(now,
> >>>>vif->credit_timeout.expires). please check the patch.
> >>>I don't think this really fix the issue for you. You still have chance
> >>>that now wraps around and falls between expires and next_credit. In that
> >>>case it's stalled again.
> >>if time_before(now, vif->credit_timeout.expires) is true, time wrap
> >>and do operation. Otherwise time_before(now,
> >>vif->credit_timeout.expires) isn't true, now -
> >>vif->credit_timeout.expires should be letter than ULONG_MAX/2.
> >>Because next_credit large than vif->credit_timeout.expires
> >>(next_crdit = vif->credit_timeout.expires +
> >>msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and
> >>next_credit should be in range of time_after_eq().  So
> >>time_after_eq() do correctly judge.
> >>
> >Not sure I understand you. Consider "now" is placed like this:
> >
> >    expires   now   next_credit
> >    ----time increases this direction--->
> >
> >* time_after_eq(now, next_credit) -> false
> >* time_before(now, expires) -> false
> 
> If now is placed in above environment, the result will be correct
> (Sending package will be not allowed until next_credit).

No, it is not necessarily correct. Keep in mind that "now" wraps around,
which is the issue you try to fix. You still have a window to stall your
frontend.

> * time_after_eq(now, next_credit)  --> false will include two environment:
>   expires     now   next_credit
>   -----------time increases this direction ---->
> 
> Or
>   expires      next_credit             next_credit + MAX_LONG/2 now
>   -----------time increases this direction ---->
> 
> 
> the first environment should be correct to control transmit. the
> second environment is our included environment.
> 
> Jason
> >
> >Then it's stuck again. You're merely narrowing the window, not fixing
> >the real problem.
> >
> >Wei.
> >
> >>Jason
> >>>Wei.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-15 14:49                 ` Wei Liu
@ 2013-10-15 14:50                   ` Ian Campbell
  2013-10-15 15:19                     ` jianhai luan
  0 siblings, 1 reply; 26+ messages in thread
From: Ian Campbell @ 2013-10-15 14:50 UTC (permalink / raw)
  To: Wei Liu; +Cc: jianhai luan, xen-devel, netdev, ANNIE LI

On Tue, 2013-10-15 at 15:49 +0100, Wei Liu wrote:
> On Tue, Oct 15, 2013 at 10:29:15PM +0800, jianhai luan wrote:
> > 
> > On 2013-10-15 20:58, Wei Liu wrote:
> > >On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote:
> > >[...]
> > >>>>>Can you propose a patch?
> > >>>>Because credit_timeout.expire always after jiffies, i judge the
> > >>>>value over the range of time_after_eq() by time_before(now,
> > >>>>vif->credit_timeout.expires). please check the patch.
> > >>>I don't think this really fix the issue for you. You still have chance
> > >>>that now wraps around and falls between expires and next_credit. In that
> > >>>case it's stalled again.
> > >>if time_before(now, vif->credit_timeout.expires) is true, time wrap
> > >>and do operation. Otherwise time_before(now,
> > >>vif->credit_timeout.expires) isn't true, now -
> > >>vif->credit_timeout.expires should be letter than ULONG_MAX/2.
> > >>Because next_credit large than vif->credit_timeout.expires
> > >>(next_crdit = vif->credit_timeout.expires +
> > >>msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and
> > >>next_credit should be in range of time_after_eq().  So
> > >>time_after_eq() do correctly judge.
> > >>
> > >Not sure I understand you. Consider "now" is placed like this:
> > >
> > >    expires   now   next_credit
> > >    ----time increases this direction--->
> > >
> > >* time_after_eq(now, next_credit) -> false
> > >* time_before(now, expires) -> false
> > 
> > If now is placed in above environment, the result will be correct
> > (Sending package will be not allowed until next_credit).
> 
> No, it is not necessarily correct. Keep in mind that "now" wraps around,
> which is the issue you try to fix. You still have a window to stall your
> frontend.

Remember that time_after_eq is supposed to work even with wraparound
occurring, so long as the two times are less than MAX_LONG/2 apart.

> 
> > * time_after_eq(now, next_credit)  --> false will include two environment:
> >   expires     now   next_credit
> >   -----------time increases this direction ---->
> > 
> > Or
> >   expires      next_credit             next_credit + MAX_LONG/2 now
> >   -----------time increases this direction ---->
> > 
> > 
> > the first environment should be correct to control transmit. the
> > second environment is our included environment.
> > 
> > Jason
> > >
> > >Then it's stuck again. You're merely narrowing the window, not fixing
> > >the real problem.
> > >
> > >Wei.
> > >
> > >>Jason
> > >>>Wei.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-15 14:50                   ` Ian Campbell
@ 2013-10-15 15:19                     ` jianhai luan
  2013-10-15 16:03                       ` Wei Liu
  0 siblings, 1 reply; 26+ messages in thread
From: jianhai luan @ 2013-10-15 15:19 UTC (permalink / raw)
  To: Ian Campbell, Wei Liu; +Cc: xen-devel, netdev, ANNIE LI


On 2013-10-15 22:50, Ian Campbell wrote:
> On Tue, 2013-10-15 at 15:49 +0100, Wei Liu wrote:
>> On Tue, Oct 15, 2013 at 10:29:15PM +0800, jianhai luan wrote:
>>> On 2013-10-15 20:58, Wei Liu wrote:
>>>> On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote:
>>>> [...]
>>>>>>>> Can you propose a patch?
>>>>>>> Because credit_timeout.expire always after jiffies, i judge the
>>>>>>> value over the range of time_after_eq() by time_before(now,
>>>>>>> vif->credit_timeout.expires). please check the patch.
>>>>>> I don't think this really fix the issue for you. You still have chance
>>>>>> that now wraps around and falls between expires and next_credit. In that
>>>>>> case it's stalled again.
>>>>> if time_before(now, vif->credit_timeout.expires) is true, time wrap
>>>>> and do operation. Otherwise time_before(now,
>>>>> vif->credit_timeout.expires) isn't true, now -
>>>>> vif->credit_timeout.expires should be letter than ULONG_MAX/2.
>>>>> Because next_credit large than vif->credit_timeout.expires
>>>>> (next_crdit = vif->credit_timeout.expires +
>>>>> msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and
>>>>> next_credit should be in range of time_after_eq().  So
>>>>> time_after_eq() do correctly judge.
>>>>>
>>>> Not sure I understand you. Consider "now" is placed like this:
>>>>
>>>>     expires   now   next_credit
>>>>     ----time increases this direction--->
>>>>
>>>> * time_after_eq(now, next_credit) -> false
>>>> * time_before(now, expires) -> false
>>> If now is placed in above environment, the result will be correct
>>> (Sending package will be not allowed until next_credit).
>> No, it is not necessarily correct. Keep in mind that "now" wraps around,
>> which is the issue you try to fix. You still have a window to stall your
>> frontend.
> Remember that time_after_eq is supposed to work even with wraparound
> occurring, so long as the two times are less than MAX_LONG/2 apart.

Sorry for my misunderstand explanation. I mean that
   * time_after_eq()/time_before_eq() fix the jiffies wraparound, so 
please think about  jiffies in line increasing.
   * time_after_eq()/time_before_eq() have the range (0, MAX_LONG/2), 
the judge will be wrong if out of the range.

So please think about three kind environment
   -  expires        now        next_credit
      --------time increases this direction ---------->

   -  expires        [next_credit        now next_credit+MAX_LONG/2
      --------time increase this direction ----------->

   - expires        next_credit        next_credit+MAX_LONG/2 now
      --------time increadse this direction ---------->

The first environment should be netfront consume all credit_byte before 
next_credit, So we should pending one timer to calculator the new 
credit_byte, and don't transmit until next_credit.

the second environment should be calculator the credit_byte because 
netfront don't consume all credit_byte before next_credit, and 
time_after_eq() do correct judge.

the third environment should be calculator in time because netfront 
don't consume all credit_byte until next_credit.But time_after_eq do 
error judge (time_after_eq(now, next_credit) is false), so the 
remaining_byte isn't be increased.

and I work on the third environment.  You know now > 
next_credit+MAX_LONG/2, time_before(now, expire) should be 
true(time_before(now, expire) is false in first environment)
>
>>> * time_after_eq(now, next_credit)  --> false will include two environment:
>>>    expires     now   next_credit
>>>    -----------time increases this direction ---->
>>>
>>> Or
>>>    expires      next_credit             next_credit + MAX_LONG/2 now
>>>    -----------time increases this direction ---->
>>>
>>>
>>> the first environment should be correct to control transmit. the
>>> second environment is our included environment.
>>>
>>> Jason
>>>> Then it's stuck again. You're merely narrowing the window, not fixing
>>>> the real problem.
>>>>
>>>> Wei.
>>>>
>>>>> Jason
>>>>>> Wei.
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-15 15:19                     ` jianhai luan
@ 2013-10-15 16:03                       ` Wei Liu
  2013-10-15 16:23                         ` jianhai luan
  0 siblings, 1 reply; 26+ messages in thread
From: Wei Liu @ 2013-10-15 16:03 UTC (permalink / raw)
  To: jianhai luan; +Cc: Ian Campbell, Wei Liu, xen-devel, netdev, ANNIE LI

On Tue, Oct 15, 2013 at 11:19:42PM +0800, jianhai luan wrote:
[...]
> >>>>
> >>>>* time_after_eq(now, next_credit) -> false
> >>>>* time_before(now, expires) -> false
> >>>If now is placed in above environment, the result will be correct
> >>>(Sending package will be not allowed until next_credit).
> >>No, it is not necessarily correct. Keep in mind that "now" wraps around,
> >>which is the issue you try to fix. You still have a window to stall your
> >>frontend.
> >Remember that time_after_eq is supposed to work even with wraparound
> >occurring, so long as the two times are less than MAX_LONG/2 apart.
> 
> Sorry for my misunderstand explanation. I mean that
>   * time_after_eq()/time_before_eq() fix the jiffies wraparound, so
> please think about  jiffies in line increasing.
>   * time_after_eq()/time_before_eq() have the range (0, MAX_LONG/2),
> the judge will be wrong if out of the range.
> 
> So please think about three kind environment
>   -  expires        now        next_credit
>      --------time increases this direction ---------->
> 
>   -  expires        [next_credit        now next_credit+MAX_LONG/2
>      --------time increase this direction ----------->
> 
>   - expires        next_credit        next_credit+MAX_LONG/2 now
>      --------time increadse this direction ---------->
> 
> The first environment should be netfront consume all credit_byte
> before next_credit, So we should pending one timer to calculator the
> new credit_byte, and don't transmit until next_credit.
> 
> the second environment should be calculator the credit_byte because
> netfront don't consume all credit_byte before next_credit, and
> time_after_eq() do correct judge.
> 
> the third environment should be calculator in time because netfront
> don't consume all credit_byte until next_credit.But time_after_eq do
> error judge (time_after_eq(now, next_credit) is false), so the
> remaining_byte isn't be increased.
> 
> and I work on the third environment.  You know now >
> next_credit+MAX_LONG/2, time_before(now, expire) should be
> true(time_before(now, expire) is false in first environment)
> >

Thanks for staighten this out for me. I'm just too dumb for this, please
be patient with me. :-)

Could you prove that time_before(now, expire) is always true in third
case? That's where my main cencern lies. Is it because msecs_to_jiffies
always returns MAX_JIFFY_OFFSET (which is ((LONG_MAX >> 1)-1) ) at most?

Wei.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-15 16:03                       ` Wei Liu
@ 2013-10-15 16:23                         ` jianhai luan
  2013-10-16  0:15                           ` jianhai luan
  0 siblings, 1 reply; 26+ messages in thread
From: jianhai luan @ 2013-10-15 16:23 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, xen-devel, netdev, ANNIE LI


On 2013-10-16 0:03, Wei Liu wrote:
> On Tue, Oct 15, 2013 at 11:19:42PM +0800, jianhai luan wrote:
> [...]
>>>>>> * time_after_eq(now, next_credit) -> false
>>>>>> * time_before(now, expires) -> false
>>>>> If now is placed in above environment, the result will be correct
>>>>> (Sending package will be not allowed until next_credit).
>>>> No, it is not necessarily correct. Keep in mind that "now" wraps around,
>>>> which is the issue you try to fix. You still have a window to stall your
>>>> frontend.
>>> Remember that time_after_eq is supposed to work even with wraparound
>>> occurring, so long as the two times are less than MAX_LONG/2 apart.
>> Sorry for my misunderstand explanation. I mean that
>>    * time_after_eq()/time_before_eq() fix the jiffies wraparound, so
>> please think about  jiffies in line increasing.
>>    * time_after_eq()/time_before_eq() have the range (0, MAX_LONG/2),
>> the judge will be wrong if out of the range.
>>
>> So please think about three kind environment
>>    -  expires        now        next_credit
>>       --------time increases this direction ---------->
>>
>>    -  expires        [next_credit        now next_credit+MAX_LONG/2
>>       --------time increase this direction ----------->
>>
>>    - expires        next_credit        next_credit+MAX_LONG/2 now
>>       --------time increadse this direction ---------->
>>
>> The first environment should be netfront consume all credit_byte
>> before next_credit, So we should pending one timer to calculator the
>> new credit_byte, and don't transmit until next_credit.
>>
>> the second environment should be calculator the credit_byte because
>> netfront don't consume all credit_byte before next_credit, and
>> time_after_eq() do correct judge.
>>
>> the third environment should be calculator in time because netfront
>> don't consume all credit_byte until next_credit.But time_after_eq do
>> error judge (time_after_eq(now, next_credit) is false), so the
>> remaining_byte isn't be increased.
>>
>> and I work on the third environment.  You know now >
>> next_credit+MAX_LONG/2, time_before(now, expire) should be
>> true(time_before(now, expire) is false in first environment)
> Thanks for staighten this out for me. I'm just too dumb for this, please
> be patient with me. :-)
>
> Could you prove that time_before(now, expire) is always true in third
> case? That's where my main cencern lies. Is it because msecs_to_jiffies
> always returns MAX_JIFFY_OFFSET (which is ((LONG_MAX >> 1)-1) ) at most?

I have wrong judge in third environment. If now large than expires + 
MAX_UNLONG, time_before(now, expires) will be false.
   expires    next_credit    next_credit+MAX_UNLONG/2    expires + 
MAX_UNLONG    now    next_credit+MAX_UNLONG
   --------------------------------------------------------- time 
increadse this direction  ---------------------------------->

   In the above environment, time_before(now, expires) will return 
false. But the jiffies elapsed more time and next_credit will be 
reachable in soon(time_after_eq(now, next_credit) will be true).
>
> Wei.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-15 16:23                         ` jianhai luan
@ 2013-10-16  0:15                           ` jianhai luan
  0 siblings, 0 replies; 26+ messages in thread
From: jianhai luan @ 2013-10-16  0:15 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, xen-devel, netdev, ANNIE LI


On 2013-10-16 0:23, jianhai luan wrote:
>
> On 2013-10-16 0:03, Wei Liu wrote:
>> On Tue, Oct 15, 2013 at 11:19:42PM +0800, jianhai luan wrote:
>> [...]
>>>>>>> * time_after_eq(now, next_credit) -> false
>>>>>>> * time_before(now, expires) -> false
>>>>>> If now is placed in above environment, the result will be correct
>>>>>> (Sending package will be not allowed until next_credit).
>>>>> No, it is not necessarily correct. Keep in mind that "now" wraps 
>>>>> around,
>>>>> which is the issue you try to fix. You still have a window to 
>>>>> stall your
>>>>> frontend.
>>>> Remember that time_after_eq is supposed to work even with wraparound
>>>> occurring, so long as the two times are less than MAX_LONG/2 apart.
>>> Sorry for my misunderstand explanation. I mean that
>>>    * time_after_eq()/time_before_eq() fix the jiffies wraparound, so
>>> please think about  jiffies in line increasing.
>>>    * time_after_eq()/time_before_eq() have the range (0, MAX_LONG/2),
>>> the judge will be wrong if out of the range.
>>>
>>> So please think about three kind environment
>>>    -  expires        now        next_credit
>>>       --------time increases this direction ---------->
>>>
>>>    -  expires        [next_credit        now next_credit+MAX_LONG/2
>>>       --------time increase this direction ----------->
>>>
>>>    - expires        next_credit        next_credit+MAX_LONG/2 now
>>>       --------time increadse this direction ---------->
>>>
>>> The first environment should be netfront consume all credit_byte
>>> before next_credit, So we should pending one timer to calculator the
>>> new credit_byte, and don't transmit until next_credit.
>>>
>>> the second environment should be calculator the credit_byte because
>>> netfront don't consume all credit_byte before next_credit, and
>>> time_after_eq() do correct judge.
>>>
>>> the third environment should be calculator in time because netfront
>>> don't consume all credit_byte until next_credit.But time_after_eq do
>>> error judge (time_after_eq(now, next_credit) is false), so the
>>> remaining_byte isn't be increased.
>>>
>>> and I work on the third environment.  You know now >
>>> next_credit+MAX_LONG/2, time_before(now, expire) should be
>>> true(time_before(now, expire) is false in first environment)
>> Thanks for staighten this out for me. I'm just too dumb for this, please
>> be patient with me. :-)
>>
>> Could you prove that time_before(now, expire) is always true in third
>> case? That's where my main cencern lies. Is it because msecs_to_jiffies
>> always returns MAX_JIFFY_OFFSET (which is ((LONG_MAX >> 1)-1) ) at most?
>
> I have wrong judge in third environment. If now large than expires + 
> MAX_UNLONG, time_before(now, expires) will be false.
>   expires    next_credit    next_credit+MAX_UNLONG/2    expires + 
> MAX_UNLONG    now    next_credit+MAX_UNLONG
>   --------------------------------------------------------- time 
> increadse this direction  ---------------------------------->
>
>   In the above environment, time_before(now, expires) will return 
> false. But the jiffies elapsed more time and next_credit will be 
> reachable in soon(time_after_eq(now, next_credit) will be true).

After above talk, the window should be exist (expire+MAX_ULONG 
next_credit+MAX_ULONG, expire + 2MAX_ULONF next_credit+MAX_ULONG 
....,expire+<n>ULONG next_credit+<n>MAX_ULONG). Other time window should 
be not exist (maybe i don't think about).
  If so, please think about the below:
   * If no speed control, vif->credit_usec should be zero. expire = 
next_credit and the window is zero
   * If we have done speed control, the scenario should be very likely 
than first environment except the value is larger (the delta is 
<n>MAX_UNLONG)
      - If do speed control. the window should have been think about
                      speed            worse time
                      100M/s          40s
                      1000M/s       4s
      - the time should be acceptable.
>>
>> Wei.
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-15  8:43     ` Ian Campbell
  2013-10-15  9:34       ` jianhai luan
@ 2013-10-16  7:10       ` annie li
  2013-10-16  8:46         ` Ian Campbell
  1 sibling, 1 reply; 26+ messages in thread
From: annie li @ 2013-10-16  7:10 UTC (permalink / raw)
  To: Ian Campbell; +Cc: jianhai luan, Wei Liu, xen-devel, netdev


On 2013-10-15 16:43, Ian Campbell wrote:
> On Tue, 2013-10-15 at 10:44 +0800, jianhai luan wrote:
>> On 2013-10-14 19:19, Wei Liu wrote:
>>> On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote:
>>>> Hi Ian,
>>>>     I meet the DomU's network interface hung issue recently, and have
>>>> been working on the issue from that time. I find that DomU's network
>>>> interface, which send lesser package, will hung if Dom0 running
>>>> 32bit and DomU's up-time is very long.  I think that one jiffies
>>>> overflow bug exist in the function tx_credit_exceeded().
>>>>     I know the inline function time_after_eq(a,b) will process jiffies
>>>> overflow, but the function have one limit a should little that (b +
>>>> MAX_SIGNAL_LONG). If a large than the value, time_after_eq will
>>>> return false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit
>>>> machine.
>>>>     If DomU's network interface send lesser package (<0.5k/s if
>>>> jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out
>>>> (credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now,
>>>> next_credit) will failure (should be true). So one timer which will
>>>> not be trigger in short time, and later process will be aborted when
>>>> timer_pending(&vif->credit_timeout) is true. The result will be
>>>> DomU's network interface will be hung in long time (> 40days).
>>>>     Please think about the below scenario:
>>>>     Condition:
>>>>       Dom0 running 32-bit and HZ = 1000
>>>>       vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit
>>>> = 0xffffffff, vif->credit_usec=0 jiffies=0
>>>>       vif receive lesser package (DomU send lesser package). If the
>>>> value is litter than 2K/s, consume 4G(0xffffffff) will need 582.55
>>>> hours. jiffies will large than 0x7ffffff. we guess jiffies =
>>>> 0x800000ff, time_after_eq(0x800000ff, 0xffffffff) will failure, and
>>>> one time which expire is 0xfffffff will be pended into system. So
>>>> the interface will hung until jiffies recount 0xffffffff (that will
>>>> need very long time).
>>> If I'm not mistaken you meant time_after_eq(now, next_credit) in
>>> netback. How does next_credit become 0xffffffff?
>> I only assume the value is 0xfffffff, and the value of next_credit
>> isn't  point. If the delta between now and next_credit larger than
>> ULONG_MAX, time_after_eq will do wrong judge.
> So it sounds like we need a timer which is independent of the traffic
> being sent to keep credit_timeout.expires rolling over.
Resending it because of the wrong format in last email...

Is it a timer to be set as less than ULONG_MAX/2 to avoid credit_timeout.expires rolling over? But the problem is that we can not assure where jiffies start from, and this probably results into current issue again.
I assume Jason's patch fix this issue and this patch only uses __mod_timer to add a timer with next_credit when the netback fails to send out current
available credits.

Thanks
Annie


>
> Can you propose a patch?
>
> Ian.
>
>>> Wei.
>>>
>>>>     If some error exist in above explain, please help me point it out.
>>>>
>>>> Thanks,
>>>> Jason
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-15 12:58             ` Wei Liu
  2013-10-15 14:29               ` jianhai luan
@ 2013-10-16  7:35               ` jianhai luan
  2013-10-16  9:39                 ` [Xen-devel] " annie li
  1 sibling, 1 reply; 26+ messages in thread
From: jianhai luan @ 2013-10-16  7:35 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, xen-devel, netdev, ANNIE LI

[-- Attachment #1: Type: text/plain, Size: 1755 bytes --]


On 2013-10-15 20:58, Wei Liu wrote:
> On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote:
> [...]
>>>>> Can you propose a patch?
>>>> Because credit_timeout.expire always after jiffies, i judge the
>>>> value over the range of time_after_eq() by time_before(now,
>>>> vif->credit_timeout.expires). please check the patch.
>>> I don't think this really fix the issue for you. You still have chance
>>> that now wraps around and falls between expires and next_credit. In that
>>> case it's stalled again.
>> if time_before(now, vif->credit_timeout.expires) is true, time wrap
>> and do operation. Otherwise time_before(now,
>> vif->credit_timeout.expires) isn't true, now -
>> vif->credit_timeout.expires should be letter than ULONG_MAX/2.
>> Because next_credit large than vif->credit_timeout.expires
>> (next_crdit = vif->credit_timeout.expires +
>> msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and
>> next_credit should be in range of time_after_eq().  So
>> time_after_eq() do correctly judge.
>>
> Not sure I understand you. Consider "now" is placed like this:
>
>     expires   now   next_credit
>     ----time increases this direction--->
>
> * time_after_eq(now, next_credit) -> false
> * time_before(now, expires) -> false
>
> Then it's stuck again. You're merely narrowing the window, not fixing
> the real problem.

The above environment isn't stack again. The netback will pending one 
timer to process the environment.

The attachment program will prove if !(time_after_eq(now, next_credit) 
|| time_before(now, vif->credit_timeout.expires)), now will only be 
placed in  above environment [ expires next_credit),  and the above 
environment will be processed by timer in soon.
>
> Wei.
>
>> Jason
>>> Wei.
Jason.

[-- Attachment #2: main.c --]
[-- Type: text/plain, Size: 961 bytes --]

#include <stdio.h>

#define typecheck(type,x) \
({	type __dummy; \
	typeof(x) __dummy2; \
	(void)(&__dummy == &__dummy2); \
	1; \
})

#define time_after(a, b)		\
	(typecheck(unsigned char, a) && \
	 typecheck(unsigned char, b) && \
	 ((char)((b) - (a)) < 0))
#define time_before(a,b)	time_after(b,a)

#define time_after_eq(a,b)		\
	(typecheck(unsigned char, a) && \
	 typecheck(unsigned char, b) && \
	 ((char)((a) -(b)) >= 0))
#define time_before_eq(a, b) time_after_eq(b,a)

void do_nothing()
{
	return;
}

int main()
{
	unsigned char expire, now, next;
	unsigned char delta = 10;
	int i, j;

	for(i = 0; i < 256; i++) {
		expire = i;
		next = expire + delta;

		printf("\n\n\n[%u ... %u]\n", expire, next);
		now = expire;
		for(j=0; j < 1024; j++, now++) {	
			if(j%256 == 0) printf("\n");

			if (time_after_eq(now, next) ||
				time_before(now, expire)) {
				do_nothing();
			}
			else {
				printf("    now=%d\n", (char)now);
			}
		}
	}
	
	return 0;
}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DomU's network interface will hung when Dom0 running 32bit
  2013-10-16  7:10       ` annie li
@ 2013-10-16  8:46         ` Ian Campbell
  0 siblings, 0 replies; 26+ messages in thread
From: Ian Campbell @ 2013-10-16  8:46 UTC (permalink / raw)
  To: annie li; +Cc: jianhai luan, Wei Liu, xen-devel, netdev

On Wed, 2013-10-16 at 15:10 +0800, annie li wrote:
> Is it a timer to be set as less than ULONG_MAX/2 to avoid
> credit_timeout.expires rolling over? But the problem is that we can
> not assure where jiffies start from, and this probably results into
> current issue again.

Not less than the absolute value ULONG_MAX/2, just less than ULONG_MAX/2
offset from "now". The time_eq macros deal with wraparound.

Looks like Jianhai and Wei are discussing a better fix than an
additional timer though.

Ian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Xen-devel] DomU's network interface will hung when Dom0 running 32bit
  2013-10-16  7:35               ` jianhai luan
@ 2013-10-16  9:39                 ` annie li
  2013-10-16 13:08                   ` jianhai luan
  0 siblings, 1 reply; 26+ messages in thread
From: annie li @ 2013-10-16  9:39 UTC (permalink / raw)
  To: jianhai luan; +Cc: Wei Liu, xen-devel, Ian Campbell, netdev


On 2013-10-16 15:35, jianhai luan wrote:
>
> On 2013-10-15 20:58, Wei Liu wrote:
>> On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote:
>> [...]
>>>>>> Can you propose a patch?
>>>>> Because credit_timeout.expire always after jiffies, i judge the
>>>>> value over the range of time_after_eq() by time_before(now,
>>>>> vif->credit_timeout.expires). please check the patch.
>>>> I don't think this really fix the issue for you. You still have chance
>>>> that now wraps around and falls between expires and next_credit. In 
>>>> that
>>>> case it's stalled again.
>>> if time_before(now, vif->credit_timeout.expires) is true, time wrap
>>> and do operation. Otherwise time_before(now,
>>> vif->credit_timeout.expires) isn't true, now -
>>> vif->credit_timeout.expires should be letter than ULONG_MAX/2.
>>> Because next_credit large than vif->credit_timeout.expires
>>> (next_crdit = vif->credit_timeout.expires +
>>> msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and
>>> next_credit should be in range of time_after_eq().  So
>>> time_after_eq() do correctly judge.
>>>
>> Not sure I understand you. Consider "now" is placed like this:
>>
>>     expires   now   next_credit
>>     ----time increases this direction--->
>>
>> * time_after_eq(now, next_credit) -> false
>> * time_before(now, expires) -> false
>>
>> Then it's stuck again. You're merely narrowing the window, not fixing
>> the real problem.
>
> The above environment isn't stack again. The netback will pending one 
> timer to process the environment.
>
> The attachment program will prove if !(time_after_eq(now, next_credit) 
> || time_before(now, vif->credit_timeout.expires)), now will only be 
> placed in  above environment [ expires next_credit),  and the above 
> environment will be processed by timer in soon.

Or check following to see what the if condition really do,

----------expires-------now-------credit----------    is the only case 
where we need to add a timer.

Other cases like following would match the if condition above, then no 
timer is added.
----------expires----------credit------now------
-----now-----expires----------credit----------

Or we can consider the extreme condition, when the rate control does not 
exist, "credit_usec" is zero, and "next_credit" is equal to "expires". 
The above if condition would cover all conditions, and no rate control 
really happens. If credit_usec is not zero, the "if condition" would 
cover the range outside of that from expires to next_credit.

Even if "now" is wrapped again into the range from "expires" to 
"next_credit", the "next_credit" that is set in __mod_timer is 
reasonable value(this can be gotten from credit_usec), and the timer 
would be hit soon.

Thanks
Annie
>>
>> Wei.
>>
>>> Jason
>>>> Wei.
> Jason.
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Xen-devel] DomU's network interface will hung when Dom0 running 32bit
  2013-10-16  9:39                 ` [Xen-devel] " annie li
@ 2013-10-16 13:08                   ` jianhai luan
  2013-10-16 13:47                     ` Wei Liu
  0 siblings, 1 reply; 26+ messages in thread
From: jianhai luan @ 2013-10-16 13:08 UTC (permalink / raw)
  To: annie li; +Cc: Wei Liu, xen-devel, Ian Campbell, netdev


On 2013-10-16 17:39, annie li wrote:
>
> On 2013-10-16 15:35, jianhai luan wrote:
>>
>> On 2013-10-15 20:58, Wei Liu wrote:
>>> On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote:
>>> [...]
>>>>>>> Can you propose a patch?
>>>>>> Because credit_timeout.expire always after jiffies, i judge the
>>>>>> value over the range of time_after_eq() by time_before(now,
>>>>>> vif->credit_timeout.expires). please check the patch.
>>>>> I don't think this really fix the issue for you. You still have 
>>>>> chance
>>>>> that now wraps around and falls between expires and next_credit. 
>>>>> In that
>>>>> case it's stalled again.
>>>> if time_before(now, vif->credit_timeout.expires) is true, time wrap
>>>> and do operation. Otherwise time_before(now,
>>>> vif->credit_timeout.expires) isn't true, now -
>>>> vif->credit_timeout.expires should be letter than ULONG_MAX/2.
>>>> Because next_credit large than vif->credit_timeout.expires
>>>> (next_crdit = vif->credit_timeout.expires +
>>>> msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and
>>>> next_credit should be in range of time_after_eq().  So
>>>> time_after_eq() do correctly judge.
>>>>
>>> Not sure I understand you. Consider "now" is placed like this:
>>>
>>>     expires   now   next_credit
>>>     ----time increases this direction--->
>>>
>>> * time_after_eq(now, next_credit) -> false
>>> * time_before(now, expires) -> false
>>>
>>> Then it's stuck again. You're merely narrowing the window, not fixing
>>> the real problem.
>>
>> The above environment isn't stack again. The netback will pending one 
>> timer to process the environment.
>>
>> The attachment program will prove if !(time_after_eq(now, 
>> next_credit) || time_before(now, vif->credit_timeout.expires)), now 
>> will only be placed in above environment [ expires next_credit),  and 
>> the above environment will be processed by timer in soon.
>
> Or check following to see what the if condition really do,
>
> ----------expires-------now-------credit----------    is the only case 
> where we need to add a timer.
>
> Other cases like following would match the if condition above, then no 
> timer is added.
> ----------expires----------credit------now------
> -----now-----expires----------credit----------
>
> Or we can consider the extreme condition, when the rate control does 
> not exist, "credit_usec" is zero, and "next_credit" is equal to 
> "expires". The above if condition would cover all conditions, and no 
> rate control really happens. If credit_usec is not zero, the "if 
> condition" would cover the range outside of that from expires to 
> next_credit.
>
> Even if "now" is wrapped again into the range from "expires" to 
> "next_credit", the "next_credit" that is set in __mod_timer is 
> reasonable value(this can be gotten from credit_usec), and the timer 
> would be hit soon.

Thanks Annie's express, my option is consistent with Annie.

Jason
>
> Thanks
> Annie
>>>
>>> Wei.
>>>
>>>> Jason
>>>>> Wei.
>> Jason.
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Xen-devel] DomU's network interface will hung when Dom0 running 32bit
  2013-10-16 13:08                   ` jianhai luan
@ 2013-10-16 13:47                     ` Wei Liu
  2013-10-16 15:04                       ` jianhai luan
  0 siblings, 1 reply; 26+ messages in thread
From: Wei Liu @ 2013-10-16 13:47 UTC (permalink / raw)
  To: jianhai luan; +Cc: annie li, Wei Liu, xen-devel, Ian Campbell, netdev

On Wed, Oct 16, 2013 at 09:08:03PM +0800, jianhai luan wrote:
[...]
> >>>
> >>>    expires   now   next_credit
> >>>    ----time increases this direction--->
> >>>
> >>>* time_after_eq(now, next_credit) -> false
> >>>* time_before(now, expires) -> false
> >>>
> >>>Then it's stuck again. You're merely narrowing the window, not fixing
> >>>the real problem.
> >>
> >>The above environment isn't stack again. The netback will
> >>pending one timer to process the environment.
> >>
> >>The attachment program will prove if !(time_after_eq(now,
> >>next_credit) || time_before(now, vif->credit_timeout.expires)),
> >>now will only be placed in above environment [ expires
> >>next_credit),  and the above environment will be processed by
> >>timer in soon.
> >
> >Or check following to see what the if condition really do,
> >
> >----------expires-------now-------credit----------    is the only
> >case where we need to add a timer.
> >
> >Other cases like following would match the if condition above,
> >then no timer is added.
> >----------expires----------credit------now------
> >-----now-----expires----------credit----------
> >
> >Or we can consider the extreme condition, when the rate control
> >does not exist, "credit_usec" is zero, and "next_credit" is equal
> >to "expires". The above if condition would cover all conditions,
> >and no rate control really happens. If credit_usec is not zero,
> >the "if condition" would cover the range outside of that from
> >expires to next_credit.
> >
> >Even if "now" is wrapped again into the range from "expires" to
> >"next_credit", the "next_credit" that is set in __mod_timer is
> >reasonable value(this can be gotten from credit_usec), and the
> >timer would be hit soon.
> 
> Thanks Annie's express, my option is consistent with Annie.
> 

OK, thanks for the explanation. I think I get the idea.

In any case, could you use !time_in_range / !time_in_range_open instead
of open coded one-liner? Though I presume one-liner open coding would
not hurt.

Wei.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Xen-devel] DomU's network interface will hung when Dom0 running 32bit
  2013-10-16 13:47                     ` Wei Liu
@ 2013-10-16 15:04                       ` jianhai luan
  2013-10-16 15:17                         ` Wei Liu
  2013-10-16 15:26                         ` annie li
  0 siblings, 2 replies; 26+ messages in thread
From: jianhai luan @ 2013-10-16 15:04 UTC (permalink / raw)
  To: Wei Liu; +Cc: annie li, xen-devel, Ian Campbell, netdev

[-- Attachment #1: Type: text/plain, Size: 2170 bytes --]


On 2013-10-16 21:47, Wei Liu wrote:
> On Wed, Oct 16, 2013 at 09:08:03PM +0800, jianhai luan wrote:
> [...]
>>>>>     expires   now   next_credit
>>>>>     ----time increases this direction--->
>>>>>
>>>>> * time_after_eq(now, next_credit) -> false
>>>>> * time_before(now, expires) -> false
>>>>>
>>>>> Then it's stuck again. You're merely narrowing the window, not fixing
>>>>> the real problem.
>>>> The above environment isn't stack again. The netback will
>>>> pending one timer to process the environment.
>>>>
>>>> The attachment program will prove if !(time_after_eq(now,
>>>> next_credit) || time_before(now, vif->credit_timeout.expires)),
>>>> now will only be placed in above environment [ expires
>>>> next_credit),  and the above environment will be processed by
>>>> timer in soon.
>>> Or check following to see what the if condition really do,
>>>
>>> ----------expires-------now-------credit----------    is the only
>>> case where we need to add a timer.
>>>
>>> Other cases like following would match the if condition above,
>>> then no timer is added.
>>> ----------expires----------credit------now------
>>> -----now-----expires----------credit----------
>>>
>>> Or we can consider the extreme condition, when the rate control
>>> does not exist, "credit_usec" is zero, and "next_credit" is equal
>>> to "expires". The above if condition would cover all conditions,
>>> and no rate control really happens. If credit_usec is not zero,
>>> the "if condition" would cover the range outside of that from
>>> expires to next_credit.
>>>
>>> Even if "now" is wrapped again into the range from "expires" to
>>> "next_credit", the "next_credit" that is set in __mod_timer is
>>> reasonable value(this can be gotten from credit_usec), and the
>>> timer would be hit soon.
>> Thanks Annie's express, my option is consistent with Annie.
>>
> OK, thanks for the explanation. I think I get the idea.
>
> In any case, could you use !time_in_range / !time_in_range_open instead
> of open coded one-liner? Though I presume one-liner open coding would
> not hurt.

I agree above the suggest. The attachment patch may be better than 
previous patch.

Jason
>
> Wei.


[-- Attachment #2: 0001-xen-netback-pending-timer-only-in-the-range-expire-n.patch --]
[-- Type: text/plain, Size: 1627 bytes --]

>From ef02403a10173896c5c102f768741d0700b8a3a2 Mon Sep 17 00:00:00 2001
From: Jason Luan <jianhai.luan@oracle.com>
Date: Tue, 15 Oct 2013 17:07:49 +0800
Subject: [PATCH] xen-netback: pending timer only in the range [expire,
 next_credit)

The function time_after_eq() do correct judge in range of MAX_UNLONG/2.
If net-front send lesser package, the delta between now and next_credit
will out of the range and time_after_eq() will do wrong judge in result
to net-front hung.  For example:
    expire    next_credit    ....    next_credit+MAX_UNLONG/2    now
    -----------------time increases this direction----------------->

We should be add the environment which now beyond next_credit+MAX_UNLONG/2.
Because the fact now mustn't before expire, time_before(now, expire) == true
will show the environment.
    time_after_eq(now, next_credit) || time_before (now, expire)
    ==
    !time_in_range_open(now, expire, next_credit)

Signed-off-by: Jason Luan <jianhai.luan@oracle.com>
---
 drivers/net/xen-netback/netback.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index f3e591c..62492f0 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1195,7 +1195,7 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 		return true;
 
 	/* Passed the point where we can replenish credit? */
-	if (time_after_eq(now, next_credit)) {
+	if (!time_in_range(now, vif->credit_timeout.expires, next_credit)) {
 		vif->credit_timeout.expires = now;
 		tx_add_credit(vif);
 	}
-- 
1.7.6.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [Xen-devel] DomU's network interface will hung when Dom0 running 32bit
  2013-10-16 15:04                       ` jianhai luan
@ 2013-10-16 15:17                         ` Wei Liu
  2013-10-16 16:11                           ` David Vrabel
  2013-10-16 15:26                         ` annie li
  1 sibling, 1 reply; 26+ messages in thread
From: Wei Liu @ 2013-10-16 15:17 UTC (permalink / raw)
  To: jianhai luan; +Cc: Wei Liu, annie li, xen-devel, Ian Campbell, netdev

On Wed, Oct 16, 2013 at 11:04:34PM +0800, jianhai luan wrote:
[...]
> >From ef02403a10173896c5c102f768741d0700b8a3a2 Mon Sep 17 00:00:00 2001
> From: Jason Luan <jianhai.luan@oracle.com>
> Date: Tue, 15 Oct 2013 17:07:49 +0800
> Subject: [PATCH] xen-netback: pending timer only in the range [expire,
>  next_credit)
> 
> The function time_after_eq() do correct judge in range of MAX_UNLONG/2.
> If net-front send lesser package, the delta between now and next_credit
> will out of the range and time_after_eq() will do wrong judge in result
> to net-front hung.  For example:
>     expire    next_credit    ....    next_credit+MAX_UNLONG/2    now
>     -----------------time increases this direction----------------->
> 
> We should be add the environment which now beyond next_credit+MAX_UNLONG/2.
> Because the fact now mustn't before expire, time_before(now, expire) == true
> will show the environment.
>     time_after_eq(now, next_credit) || time_before (now, expire)
>     ==
>     !time_in_range_open(now, expire, next_credit)
> 

OK, you say !time_in_range_open here but the code below has
!time_in_range. I suppose it is a typo?

Your patch should go against "net" tree so you need to have
"[PATCH net]" as prefix.

Also don't send your patch as attachment, send it inline. You can use
git format-patch, git send-email to do this.

Could you please fix the above issues and resend the patch to netdev,
with Ian and me CC'ed.

You can find out how netdev works in
Documentation/networking/netdev-FAQ.txt.

Last but not least, thanks for your contribution!

Wei.

> Signed-off-by: Jason Luan <jianhai.luan@oracle.com>
> ---
>  drivers/net/xen-netback/netback.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index f3e591c..62492f0 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1195,7 +1195,7 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
>  		return true;
>  
>  	/* Passed the point where we can replenish credit? */
> -	if (time_after_eq(now, next_credit)) {
> +	if (!time_in_range(now, vif->credit_timeout.expires, next_credit)) {
>  		vif->credit_timeout.expires = now;
>  		tx_add_credit(vif);
>  	}
> -- 
> 1.7.6.5
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Xen-devel] DomU's network interface will hung when Dom0 running 32bit
  2013-10-16 15:04                       ` jianhai luan
  2013-10-16 15:17                         ` Wei Liu
@ 2013-10-16 15:26                         ` annie li
  1 sibling, 0 replies; 26+ messages in thread
From: annie li @ 2013-10-16 15:26 UTC (permalink / raw)
  To: jianhai luan; +Cc: Wei Liu, xen-devel, Ian Campbell, netdev


On 2013-10-16 23:04, jianhai luan wrote:
>
>  From ef02403a10173896c5c102f768741d0700b8a3a2 Mon Sep 17 00:00:00 2001
> From: Jason Luan<jianhai.luan@oracle.com>
> Date: Tue, 15 Oct 2013 17:07:49 +0800
> Subject: [PATCH] xen-netback: pending timer only in the range [expire,
>   next_credit)
>
> The function time_after_eq() do correct judge in range of MAX_UNLONG/2.
> If net-front send lesser package, the delta between now and next_credit
> will out of the range and time_after_eq() will do wrong judge in result
> to net-front hung.  For example:
>      expire    next_credit    ....    next_credit+MAX_UNLONG/2    now
>      -----------------time increases this direction----------------->
>
> We should be add the environment which now beyond next_credit+MAX_UNLONG/2.
> Because the fact now mustn't before expire, time_before(now, expire) == true
> will show the environment.
>      time_after_eq(now, next_credit) || time_before (now, expire)
>      ==
>      !time_in_range_open(now, expire, next_credit)
>
> Signed-off-by: Jason Luan<jianhai.luan@oracle.com>
> ---
>   drivers/net/xen-netback/netback.c |    2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index f3e591c..62492f0 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1195,7 +1195,7 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
>   		return true;
>   
>   	/* Passed the point where we can replenish credit? */

I think the comments above can be removed, and it is better to explain 
the if condition here briefly.

Thanks
Annie

> -	if (time_after_eq(now, next_credit)) {
> +	if (!time_in_range(now, vif->credit_timeout.expires, next_credit)) {
>   		vif->credit_timeout.expires = now;
>   		tx_add_credit(vif);
>   	}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Xen-devel] DomU's network interface will hung when Dom0 running 32bit
  2013-10-16 15:17                         ` Wei Liu
@ 2013-10-16 16:11                           ` David Vrabel
  2013-10-16 16:44                             ` jianhai luan
  0 siblings, 1 reply; 26+ messages in thread
From: David Vrabel @ 2013-10-16 16:11 UTC (permalink / raw)
  To: Wei Liu; +Cc: jianhai luan, xen-devel, annie li, Ian Campbell, netdev

On 16/10/13 16:17, Wei Liu wrote:
> On Wed, Oct 16, 2013 at 11:04:34PM +0800, jianhai luan wrote:
> [...]
>> >From ef02403a10173896c5c102f768741d0700b8a3a2 Mon Sep 17 00:00:00 2001
>> From: Jason Luan <jianhai.luan@oracle.com>
>> Date: Tue, 15 Oct 2013 17:07:49 +0800
>> Subject: [PATCH] xen-netback: pending timer only in the range [expire,
>>  next_credit)
>>
>> The function time_after_eq() do correct judge in range of MAX_UNLONG/2.
>> If net-front send lesser package, the delta between now and next_credit
>> will out of the range and time_after_eq() will do wrong judge in result
>> to net-front hung.  For example:
>>     expire    next_credit    ....    next_credit+MAX_UNLONG/2    now
>>     -----------------time increases this direction----------------->
>>
>> We should be add the environment which now beyond next_credit+MAX_UNLONG/2.
>> Because the fact now mustn't before expire, time_before(now, expire) == true
>> will show the environment.
>>     time_after_eq(now, next_credit) || time_before (now, expire)
>>     ==
>>     !time_in_range_open(now, expire, next_credit)
>>

I would like the description improved because it's too hard to understand.

How about something like:

"time_after_eq() only works if the delta is < MAX_ULONG/2.

If netfront sends at a very low rate, the time between subsequent calls
to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
timer_after_eq() will be incorrect.  Credit will not be replenished and
the guest may become unable to send (e.g., if prior to the long gap, all
credit was exhausted)."

But that's as far as I get because I can't see how the fix is correct.
The time_in_range() test might still return the wrong value if now has
advanced even further and wrapped so it is between expire and
next_credit again.

I think the credit timeout should be always armed to expire in
MAX_ULONG/4 jiffies (or some other large value).  If credit is exceeded,
this timer is then adjusted to fire earlier (at next_credit as it does
already).

David

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Xen-devel] DomU's network interface will hung when Dom0 running 32bit
  2013-10-16 16:11                           ` David Vrabel
@ 2013-10-16 16:44                             ` jianhai luan
  0 siblings, 0 replies; 26+ messages in thread
From: jianhai luan @ 2013-10-16 16:44 UTC (permalink / raw)
  To: David Vrabel, Wei Liu; +Cc: xen-devel, annie li, Ian Campbell, netdev


On 2013-10-17 0:11, David Vrabel wrote:
> On 16/10/13 16:17, Wei Liu wrote:
>> On Wed, Oct 16, 2013 at 11:04:34PM +0800, jianhai luan wrote:
>> [...]
>>> >From ef02403a10173896c5c102f768741d0700b8a3a2 Mon Sep 17 00:00:00 2001
>>> From: Jason Luan <jianhai.luan@oracle.com>
>>> Date: Tue, 15 Oct 2013 17:07:49 +0800
>>> Subject: [PATCH] xen-netback: pending timer only in the range [expire,
>>>   next_credit)
>>>
>>> The function time_after_eq() do correct judge in range of MAX_UNLONG/2.
>>> If net-front send lesser package, the delta between now and next_credit
>>> will out of the range and time_after_eq() will do wrong judge in result
>>> to net-front hung.  For example:
>>>      expire    next_credit    ....    next_credit+MAX_UNLONG/2    now
>>>      -----------------time increases this direction----------------->
>>>
>>> We should be add the environment which now beyond next_credit+MAX_UNLONG/2.
>>> Because the fact now mustn't before expire, time_before(now, expire) == true
>>> will show the environment.
>>>      time_after_eq(now, next_credit) || time_before (now, expire)
>>>      ==
>>>      !time_in_range_open(now, expire, next_credit)
>>>
> I would like the description improved because it's too hard to understand.
>
> How about something like:
>
> "time_after_eq() only works if the delta is < MAX_ULONG/2.
>
> If netfront sends at a very low rate, the time between subsequent calls
> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
> timer_after_eq() will be incorrect.  Credit will not be replenished and
> the guest may become unable to send (e.g., if prior to the long gap, all
> credit was exhausted)."

Thanks your description, i will accept it. :)
>
> But that's as far as I get because I can't see how the fix is correct.
> The time_in_range() test might still return the wrong value if now has
> advanced even further and wrapped so it is between expire and
> next_credit again.

typo, time_in_range() should be time_in_range_open().
Yes, if now have advanced even further and wrapped, it will always fall 
in [ expire, next_credit). In the range, please think two scenario:
    * No transmit limit: expire == next_credit, the range will be zero, 
replenish will always be done.
    * Transmit limit: Because guest may be consume all credit_bytes in 
very short time, other time in [expire, next_credit) will don't send any 
package. So the time which don't send package should be think about when 
we set the rate parameter.
       So if now fall in the range, the hung time should be acceptable. 
(if rate=10000M/s, the worse time will be 4s).
>
> I think the credit timeout should be always armed to expire in
> MAX_ULONG/4 jiffies (or some other large value).  If credit is exceeded,
> this timer is then adjusted to fire earlier (at next_credit as it does
> already).

Setting timer may be fixed the issue. But i don't think how to verify 
the fixed expect waiting 180 days. I verified the above patch only 
change expire's value to emulator the scenario.
>
> David

Jason.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2013-10-16 16:45 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-12  8:53 DomU's network interface will hung when Dom0 running 32bit jianhai luan
2013-10-14 11:19 ` Wei Liu
2013-10-15  2:44   ` jianhai luan
2013-10-15  8:43     ` Ian Campbell
2013-10-15  9:34       ` jianhai luan
2013-10-15 10:06         ` Wei Liu
2013-10-15 11:26           ` jianhai luan
2013-10-15 12:58             ` Wei Liu
2013-10-15 14:29               ` jianhai luan
2013-10-15 14:49                 ` Wei Liu
2013-10-15 14:50                   ` Ian Campbell
2013-10-15 15:19                     ` jianhai luan
2013-10-15 16:03                       ` Wei Liu
2013-10-15 16:23                         ` jianhai luan
2013-10-16  0:15                           ` jianhai luan
2013-10-16  7:35               ` jianhai luan
2013-10-16  9:39                 ` [Xen-devel] " annie li
2013-10-16 13:08                   ` jianhai luan
2013-10-16 13:47                     ` Wei Liu
2013-10-16 15:04                       ` jianhai luan
2013-10-16 15:17                         ` Wei Liu
2013-10-16 16:11                           ` David Vrabel
2013-10-16 16:44                             ` jianhai luan
2013-10-16 15:26                         ` annie li
2013-10-16  7:10       ` annie li
2013-10-16  8:46         ` Ian Campbell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).