* Re: [PATCH net] xen-netback: disable rogue vif in kthread context
2014-03-24 12:13 [PATCH net] xen-netback: disable rogue vif in kthread context Wei Liu
@ 2014-03-24 12:22 ` Ian Campbell
2014-03-24 12:33 ` David Laight
2014-03-24 12:33 ` David Laight
2014-03-24 12:22 ` Ian Campbell
` (4 subsequent siblings)
5 siblings, 2 replies; 15+ messages in thread
From: Ian Campbell @ 2014-03-24 12:22 UTC (permalink / raw)
To: Wei Liu; +Cc: netdev, xen-devel, paul.durrant, zoltan.kiss, edwin
On Mon, 2014-03-24 at 12:13 +0000, Wei Liu wrote:
> When netback discovers frontend is sending malformed packet it will
> disables the interface which serves that frontend.
>
> However disabling a network interface involving taking a mutex which
> cannot be done in softirq context, so we need to defer this process to
> kthread context.
>
> This patch does the following:
> 1. introduce a flag to indicate the interface is disabled.
> 2. check that flag in TX path, don't do any work if it's true.
> 3. check that flag in RX path, turn off that interface if it's true.
>
> The reason to disable it in RX path is because RX uses kthread. After
> this change the behavior of netback is still consistent -- it won't do
> any TX work for a rogue frontend, and the interface will be eventually
> turned off.
>
> Also change a "continue" to "break" after xenvif_fatal_tx_err, as it
> doesn't make sense to continue processing packets if frontend is rogue.
>
> This is a fix for XSA-90.
>
> Reported-by: Török Edwin <edwin@etorok.net>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
> + if (unlikely(vif->disabled) && netif_carrier_ok(vif->dev))
Perhaps consider extending the scope of the unlikely over the entire
expression? (not that I expect it will matter much)
Ian.
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net] xen-netback: disable rogue vif in kthread context
2014-03-24 12:22 ` Ian Campbell
@ 2014-03-24 12:33 ` David Laight
2014-03-24 12:33 ` David Laight
1 sibling, 0 replies; 15+ messages in thread
From: David Laight @ 2014-03-24 12:33 UTC (permalink / raw)
To: 'Ian Campbell', Wei Liu
Cc: netdev@vger.kernel.org, paul.durrant@citrix.com, edwin@etorok.net,
zoltan.kiss@citrix.com, xen-devel@lists.xen.org
From: Ian Campbell
...
> > + if (unlikely(vif->disabled) && netif_carrier_ok(vif->dev))
>
> Perhaps consider extending the scope of the unlikely over the entire
> expression? (not that I expect it will matter much)
A lot of these 'unlikely' are as much a hint to the person
reading the code than to the compiler!
I found (with a much older gcc) that an overall 'unlikely' didn't
have the same effect as one on each part.
I also found that gcc ignored 'unlikely' if the 'else' branch is empty.
'if (unlikely(...)) continue' generated a backwards (predicted true)
conditional branch. I added an asm("comment " ## __LINE__) before
the continue to force a forwards condition branch to an unconditional
branch to the loop top.
(Yes - I did care about every clock in that code.)
David
^ permalink raw reply [flat|nested] 15+ messages in thread* RE: [PATCH net] xen-netback: disable rogue vif in kthread context
2014-03-24 12:22 ` Ian Campbell
2014-03-24 12:33 ` David Laight
@ 2014-03-24 12:33 ` David Laight
1 sibling, 0 replies; 15+ messages in thread
From: David Laight @ 2014-03-24 12:33 UTC (permalink / raw)
To: 'Ian Campbell', Wei Liu
Cc: netdev@vger.kernel.org, xen-devel@lists.xen.org,
paul.durrant@citrix.com, zoltan.kiss@citrix.com, edwin@etorok.net
From: Ian Campbell
...
> > + if (unlikely(vif->disabled) && netif_carrier_ok(vif->dev))
>
> Perhaps consider extending the scope of the unlikely over the entire
> expression? (not that I expect it will matter much)
A lot of these 'unlikely' are as much a hint to the person
reading the code than to the compiler!
I found (with a much older gcc) that an overall 'unlikely' didn't
have the same effect as one on each part.
I also found that gcc ignored 'unlikely' if the 'else' branch is empty.
'if (unlikely(...)) continue' generated a backwards (predicted true)
conditional branch. I added an asm("comment " ## __LINE__) before
the continue to force a forwards condition branch to an unconditional
branch to the loop top.
(Yes - I did care about every clock in that code.)
David
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] xen-netback: disable rogue vif in kthread context
2014-03-24 12:13 [PATCH net] xen-netback: disable rogue vif in kthread context Wei Liu
2014-03-24 12:22 ` Ian Campbell
@ 2014-03-24 12:22 ` Ian Campbell
2014-03-24 12:49 ` David Vrabel
` (3 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Ian Campbell @ 2014-03-24 12:22 UTC (permalink / raw)
To: Wei Liu; +Cc: netdev, paul.durrant, edwin, zoltan.kiss, xen-devel
On Mon, 2014-03-24 at 12:13 +0000, Wei Liu wrote:
> When netback discovers frontend is sending malformed packet it will
> disables the interface which serves that frontend.
>
> However disabling a network interface involving taking a mutex which
> cannot be done in softirq context, so we need to defer this process to
> kthread context.
>
> This patch does the following:
> 1. introduce a flag to indicate the interface is disabled.
> 2. check that flag in TX path, don't do any work if it's true.
> 3. check that flag in RX path, turn off that interface if it's true.
>
> The reason to disable it in RX path is because RX uses kthread. After
> this change the behavior of netback is still consistent -- it won't do
> any TX work for a rogue frontend, and the interface will be eventually
> turned off.
>
> Also change a "continue" to "break" after xenvif_fatal_tx_err, as it
> doesn't make sense to continue processing packets if frontend is rogue.
>
> This is a fix for XSA-90.
>
> Reported-by: Török Edwin <edwin@etorok.net>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
> + if (unlikely(vif->disabled) && netif_carrier_ok(vif->dev))
Perhaps consider extending the scope of the unlikely over the entire
expression? (not that I expect it will matter much)
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net] xen-netback: disable rogue vif in kthread context
2014-03-24 12:13 [PATCH net] xen-netback: disable rogue vif in kthread context Wei Liu
2014-03-24 12:22 ` Ian Campbell
2014-03-24 12:22 ` Ian Campbell
@ 2014-03-24 12:49 ` David Vrabel
2014-03-24 12:49 ` [Xen-devel] " David Vrabel
` (2 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: David Vrabel @ 2014-03-24 12:49 UTC (permalink / raw)
To: Wei Liu; +Cc: Ian Campbell, netdev, edwin, xen-devel, paul.durrant, zoltan.kiss
On 24/03/14 12:13, Wei Liu wrote:
> When netback discovers frontend is sending malformed packet it will
> disables the interface which serves that frontend.
>
> However disabling a network interface involving taking a mutex which
> cannot be done in softirq context, so we need to defer this process to
> kthread context.
>
> This patch does the following:
> 1. introduce a flag to indicate the interface is disabled.
> 2. check that flag in TX path, don't do any work if it's true.
> 3. check that flag in RX path, turn off that interface if it's true.
>
> The reason to disable it in RX path is because RX uses kthread. After
> this change the behavior of netback is still consistent -- it won't do
> any TX work for a rogue frontend, and the interface will be eventually
> turned off.
>
> Also change a "continue" to "break" after xenvif_fatal_tx_err, as it
> doesn't make sense to continue processing packets if frontend is rogue.
[...]
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -62,6 +62,13 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
> struct xenvif *vif = container_of(napi, struct xenvif, napi);
> int work_done;
>
> + /* This vif is rogue, we pretend we've used up all budget to
> + * deschedule it from NAPI. But this interface will be turned
> + * off in thread context later.
> + */
> + if (unlikely(vif->disabled))
> + return budget;
Shouldn't you call __napi_complete() and return 0? Returning budget
will make NAPI poll repeatedly (since you're pretending to do work).
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 438d0c0..94e7261 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -655,7 +655,7 @@ static void xenvif_tx_err(struct xenvif *vif,
> static void xenvif_fatal_tx_err(struct xenvif *vif)
> {
> netdev_err(vif->dev, "fatal error; disabling device\n");
> - xenvif_carrier_off(vif);
> + vif->disabled = true;
Do you need to wake the thread here?
> @@ -1549,6 +1549,16 @@ int xenvif_kthread(void *data)
> wait_event_interruptible(vif->wq,
> rx_work_todo(vif) ||
> kthread_should_stop());
|| vif->disabled ?
> +
> + /* This frontend is found to be rogue, disable it in
> + * kthread context. Currently this is only set when
> + * netback finds out frontend sends malformed packet,
> + * but we cannot disable the interface in softirq
> + * context so we defer it here.
> + */
> + if (unlikely(vif->disabled) && netif_carrier_ok(vif->dev))
> + xenvif_carrier_off(vif);
> +
> if (kthread_should_stop())
> break;
>
As an aside, since I happened to be looking at xenvif_poll(), disabling
local irqs to avoid problems with concurrent events looks unsafe as the
event may occur on another VCPU.
__napi_complete(napi);
RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
if (more_to_do)
napi_schedule(napi);
Would work I think.
David
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [Xen-devel] [PATCH net] xen-netback: disable rogue vif in kthread context
2014-03-24 12:13 [PATCH net] xen-netback: disable rogue vif in kthread context Wei Liu
` (2 preceding siblings ...)
2014-03-24 12:49 ` David Vrabel
@ 2014-03-24 12:49 ` David Vrabel
2014-03-24 13:17 ` Zoltan Kiss
` (3 more replies)
2014-03-24 19:29 ` David Miller
2014-03-24 19:29 ` David Miller
5 siblings, 4 replies; 15+ messages in thread
From: David Vrabel @ 2014-03-24 12:49 UTC (permalink / raw)
To: Wei Liu; +Cc: netdev, xen-devel, paul.durrant, edwin, Ian Campbell, zoltan.kiss
On 24/03/14 12:13, Wei Liu wrote:
> When netback discovers frontend is sending malformed packet it will
> disables the interface which serves that frontend.
>
> However disabling a network interface involving taking a mutex which
> cannot be done in softirq context, so we need to defer this process to
> kthread context.
>
> This patch does the following:
> 1. introduce a flag to indicate the interface is disabled.
> 2. check that flag in TX path, don't do any work if it's true.
> 3. check that flag in RX path, turn off that interface if it's true.
>
> The reason to disable it in RX path is because RX uses kthread. After
> this change the behavior of netback is still consistent -- it won't do
> any TX work for a rogue frontend, and the interface will be eventually
> turned off.
>
> Also change a "continue" to "break" after xenvif_fatal_tx_err, as it
> doesn't make sense to continue processing packets if frontend is rogue.
[...]
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -62,6 +62,13 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
> struct xenvif *vif = container_of(napi, struct xenvif, napi);
> int work_done;
>
> + /* This vif is rogue, we pretend we've used up all budget to
> + * deschedule it from NAPI. But this interface will be turned
> + * off in thread context later.
> + */
> + if (unlikely(vif->disabled))
> + return budget;
Shouldn't you call __napi_complete() and return 0? Returning budget
will make NAPI poll repeatedly (since you're pretending to do work).
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 438d0c0..94e7261 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -655,7 +655,7 @@ static void xenvif_tx_err(struct xenvif *vif,
> static void xenvif_fatal_tx_err(struct xenvif *vif)
> {
> netdev_err(vif->dev, "fatal error; disabling device\n");
> - xenvif_carrier_off(vif);
> + vif->disabled = true;
Do you need to wake the thread here?
> @@ -1549,6 +1549,16 @@ int xenvif_kthread(void *data)
> wait_event_interruptible(vif->wq,
> rx_work_todo(vif) ||
> kthread_should_stop());
|| vif->disabled ?
> +
> + /* This frontend is found to be rogue, disable it in
> + * kthread context. Currently this is only set when
> + * netback finds out frontend sends malformed packet,
> + * but we cannot disable the interface in softirq
> + * context so we defer it here.
> + */
> + if (unlikely(vif->disabled) && netif_carrier_ok(vif->dev))
> + xenvif_carrier_off(vif);
> +
> if (kthread_should_stop())
> break;
>
As an aside, since I happened to be looking at xenvif_poll(), disabling
local irqs to avoid problems with concurrent events looks unsafe as the
event may occur on another VCPU.
__napi_complete(napi);
RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
if (more_to_do)
napi_schedule(napi);
Would work I think.
David
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [Xen-devel] [PATCH net] xen-netback: disable rogue vif in kthread context
2014-03-24 12:49 ` [Xen-devel] " David Vrabel
@ 2014-03-24 13:17 ` Zoltan Kiss
2014-03-24 13:17 ` Zoltan Kiss
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: Zoltan Kiss @ 2014-03-24 13:17 UTC (permalink / raw)
To: David Vrabel, Wei Liu
Cc: netdev, xen-devel, paul.durrant, edwin, Ian Campbell
On 24/03/14 12:49, David Vrabel wrote:
> On 24/03/14 12:13, Wei Liu wrote:
>> --- a/drivers/net/xen-netback/interface.c
>> +++ b/drivers/net/xen-netback/interface.c
>> @@ -62,6 +62,13 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
>> struct xenvif *vif = container_of(napi, struct xenvif, napi);
>> int work_done;
>>
>> + /* This vif is rogue, we pretend we've used up all budget to
>> + * deschedule it from NAPI. But this interface will be turned
>> + * off in thread context later.
>> + */
>> + if (unlikely(vif->disabled))
>> + return budget;
>
> Shouldn't you call __napi_complete() and return 0? Returning budget
> will make NAPI poll repeatedly (since you're pretending to do work).
The comment in net_rx_action:
/* Drivers must not modify the NAPI state if they
* consume the entire weight. In such cases this code
* still "owns" the NAPI instance and therefore can
* move the instance around on the list at-will.
*/
>
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>> index 438d0c0..94e7261 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -655,7 +655,7 @@ static void xenvif_tx_err(struct xenvif *vif,
>> static void xenvif_fatal_tx_err(struct xenvif *vif)
>> {
>> netdev_err(vif->dev, "fatal error; disabling device\n");
>> - xenvif_carrier_off(vif);
>> + vif->disabled = true;
>
> Do you need to wake the thread here?
>
>> @@ -1549,6 +1549,16 @@ int xenvif_kthread(void *data)
>> wait_event_interruptible(vif->wq,
>> rx_work_todo(vif) ||
>> kthread_should_stop());
>
> || vif->disabled ?
>
>> +
>> + /* This frontend is found to be rogue, disable it in
>> + * kthread context. Currently this is only set when
>> + * netback finds out frontend sends malformed packet,
>> + * but we cannot disable the interface in softirq
>> + * context so we defer it here.
>> + */
>> + if (unlikely(vif->disabled) && netif_carrier_ok(vif->dev))
>> + xenvif_carrier_off(vif);
>> +
>> if (kthread_should_stop())
>> break;
>>
>
> As an aside, since I happened to be looking at xenvif_poll(), disabling
> local irqs to avoid problems with concurrent events looks unsafe as the
> event may occur on another VCPU.
>
> __napi_complete(napi);
> RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
> if (more_to_do)
> napi_schedule(napi);
>
> Would work I think.
>
> David
>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net] xen-netback: disable rogue vif in kthread context
2014-03-24 12:49 ` [Xen-devel] " David Vrabel
2014-03-24 13:17 ` Zoltan Kiss
@ 2014-03-24 13:17 ` Zoltan Kiss
2014-03-25 0:55 ` [Xen-devel] " Wei Liu
2014-03-25 0:55 ` Wei Liu
3 siblings, 0 replies; 15+ messages in thread
From: Zoltan Kiss @ 2014-03-24 13:17 UTC (permalink / raw)
To: David Vrabel, Wei Liu
Cc: netdev, paul.durrant, edwin, Ian Campbell, xen-devel
On 24/03/14 12:49, David Vrabel wrote:
> On 24/03/14 12:13, Wei Liu wrote:
>> --- a/drivers/net/xen-netback/interface.c
>> +++ b/drivers/net/xen-netback/interface.c
>> @@ -62,6 +62,13 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
>> struct xenvif *vif = container_of(napi, struct xenvif, napi);
>> int work_done;
>>
>> + /* This vif is rogue, we pretend we've used up all budget to
>> + * deschedule it from NAPI. But this interface will be turned
>> + * off in thread context later.
>> + */
>> + if (unlikely(vif->disabled))
>> + return budget;
>
> Shouldn't you call __napi_complete() and return 0? Returning budget
> will make NAPI poll repeatedly (since you're pretending to do work).
The comment in net_rx_action:
/* Drivers must not modify the NAPI state if they
* consume the entire weight. In such cases this code
* still "owns" the NAPI instance and therefore can
* move the instance around on the list at-will.
*/
>
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>> index 438d0c0..94e7261 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -655,7 +655,7 @@ static void xenvif_tx_err(struct xenvif *vif,
>> static void xenvif_fatal_tx_err(struct xenvif *vif)
>> {
>> netdev_err(vif->dev, "fatal error; disabling device\n");
>> - xenvif_carrier_off(vif);
>> + vif->disabled = true;
>
> Do you need to wake the thread here?
>
>> @@ -1549,6 +1549,16 @@ int xenvif_kthread(void *data)
>> wait_event_interruptible(vif->wq,
>> rx_work_todo(vif) ||
>> kthread_should_stop());
>
> || vif->disabled ?
>
>> +
>> + /* This frontend is found to be rogue, disable it in
>> + * kthread context. Currently this is only set when
>> + * netback finds out frontend sends malformed packet,
>> + * but we cannot disable the interface in softirq
>> + * context so we defer it here.
>> + */
>> + if (unlikely(vif->disabled) && netif_carrier_ok(vif->dev))
>> + xenvif_carrier_off(vif);
>> +
>> if (kthread_should_stop())
>> break;
>>
>
> As an aside, since I happened to be looking at xenvif_poll(), disabling
> local irqs to avoid problems with concurrent events looks unsafe as the
> event may occur on another VCPU.
>
> __napi_complete(napi);
> RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
> if (more_to_do)
> napi_schedule(napi);
>
> Would work I think.
>
> David
>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [Xen-devel] [PATCH net] xen-netback: disable rogue vif in kthread context
2014-03-24 12:49 ` [Xen-devel] " David Vrabel
2014-03-24 13:17 ` Zoltan Kiss
2014-03-24 13:17 ` Zoltan Kiss
@ 2014-03-25 0:55 ` Wei Liu
2014-03-25 0:55 ` Wei Liu
3 siblings, 0 replies; 15+ messages in thread
From: Wei Liu @ 2014-03-25 0:55 UTC (permalink / raw)
To: David Vrabel
Cc: Wei Liu, netdev, xen-devel, paul.durrant, edwin, Ian Campbell,
zoltan.kiss
On Mon, Mar 24, 2014 at 12:49:11PM +0000, David Vrabel wrote:
> On 24/03/14 12:13, Wei Liu wrote:
> > When netback discovers frontend is sending malformed packet it will
> > disables the interface which serves that frontend.
> >
> > However disabling a network interface involving taking a mutex which
> > cannot be done in softirq context, so we need to defer this process to
> > kthread context.
> >
> > This patch does the following:
> > 1. introduce a flag to indicate the interface is disabled.
> > 2. check that flag in TX path, don't do any work if it's true.
> > 3. check that flag in RX path, turn off that interface if it's true.
> >
> > The reason to disable it in RX path is because RX uses kthread. After
> > this change the behavior of netback is still consistent -- it won't do
> > any TX work for a rogue frontend, and the interface will be eventually
> > turned off.
> >
> > Also change a "continue" to "break" after xenvif_fatal_tx_err, as it
> > doesn't make sense to continue processing packets if frontend is rogue.
> [...]
> > --- a/drivers/net/xen-netback/interface.c
> > +++ b/drivers/net/xen-netback/interface.c
> > @@ -62,6 +62,13 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
> > struct xenvif *vif = container_of(napi, struct xenvif, napi);
> > int work_done;
> >
> > + /* This vif is rogue, we pretend we've used up all budget to
> > + * deschedule it from NAPI. But this interface will be turned
> > + * off in thread context later.
> > + */
> > + if (unlikely(vif->disabled))
> > + return budget;
>
> Shouldn't you call __napi_complete() and return 0? Returning budget
> will make NAPI poll repeatedly (since you're pretending to do work).
>
Yes. You're right.
> > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> > index 438d0c0..94e7261 100644
> > --- a/drivers/net/xen-netback/netback.c
> > +++ b/drivers/net/xen-netback/netback.c
> > @@ -655,7 +655,7 @@ static void xenvif_tx_err(struct xenvif *vif,
> > static void xenvif_fatal_tx_err(struct xenvif *vif)
> > {
> > netdev_err(vif->dev, "fatal error; disabling device\n");
> > - xenvif_carrier_off(vif);
> > + vif->disabled = true;
>
> Do you need to wake the thread here?
>
That's a better approach.
> > @@ -1549,6 +1549,16 @@ int xenvif_kthread(void *data)
> > wait_event_interruptible(vif->wq,
> > rx_work_todo(vif) ||
> > kthread_should_stop());
>
> || vif->disabled ?
>
> > +
> > + /* This frontend is found to be rogue, disable it in
> > + * kthread context. Currently this is only set when
> > + * netback finds out frontend sends malformed packet,
> > + * but we cannot disable the interface in softirq
> > + * context so we defer it here.
> > + */
> > + if (unlikely(vif->disabled) && netif_carrier_ok(vif->dev))
> > + xenvif_carrier_off(vif);
> > +
> > if (kthread_should_stop())
> > break;
> >
>
> As an aside, since I happened to be looking at xenvif_poll(), disabling
> local irqs to avoid problems with concurrent events looks unsafe as the
> event may occur on another VCPU.
>
Are you seeing any problem? If this is not related to this fix we should
probably discuss this in another thread.
> __napi_complete(napi);
> RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
> if (more_to_do)
> napi_schedule(napi);
>
> Would work I think.
>
Not sure I get your suggestion. Sorry. If you're talking about the code
in xenvif_poll, there's comment up there describing a race. Again, this
should be discussed in separate thread.
Wei.
> David
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net] xen-netback: disable rogue vif in kthread context
2014-03-24 12:49 ` [Xen-devel] " David Vrabel
` (2 preceding siblings ...)
2014-03-25 0:55 ` [Xen-devel] " Wei Liu
@ 2014-03-25 0:55 ` Wei Liu
3 siblings, 0 replies; 15+ messages in thread
From: Wei Liu @ 2014-03-25 0:55 UTC (permalink / raw)
To: David Vrabel
Cc: Wei Liu, Ian Campbell, netdev, edwin, xen-devel, paul.durrant,
zoltan.kiss
On Mon, Mar 24, 2014 at 12:49:11PM +0000, David Vrabel wrote:
> On 24/03/14 12:13, Wei Liu wrote:
> > When netback discovers frontend is sending malformed packet it will
> > disables the interface which serves that frontend.
> >
> > However disabling a network interface involving taking a mutex which
> > cannot be done in softirq context, so we need to defer this process to
> > kthread context.
> >
> > This patch does the following:
> > 1. introduce a flag to indicate the interface is disabled.
> > 2. check that flag in TX path, don't do any work if it's true.
> > 3. check that flag in RX path, turn off that interface if it's true.
> >
> > The reason to disable it in RX path is because RX uses kthread. After
> > this change the behavior of netback is still consistent -- it won't do
> > any TX work for a rogue frontend, and the interface will be eventually
> > turned off.
> >
> > Also change a "continue" to "break" after xenvif_fatal_tx_err, as it
> > doesn't make sense to continue processing packets if frontend is rogue.
> [...]
> > --- a/drivers/net/xen-netback/interface.c
> > +++ b/drivers/net/xen-netback/interface.c
> > @@ -62,6 +62,13 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
> > struct xenvif *vif = container_of(napi, struct xenvif, napi);
> > int work_done;
> >
> > + /* This vif is rogue, we pretend we've used up all budget to
> > + * deschedule it from NAPI. But this interface will be turned
> > + * off in thread context later.
> > + */
> > + if (unlikely(vif->disabled))
> > + return budget;
>
> Shouldn't you call __napi_complete() and return 0? Returning budget
> will make NAPI poll repeatedly (since you're pretending to do work).
>
Yes. You're right.
> > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> > index 438d0c0..94e7261 100644
> > --- a/drivers/net/xen-netback/netback.c
> > +++ b/drivers/net/xen-netback/netback.c
> > @@ -655,7 +655,7 @@ static void xenvif_tx_err(struct xenvif *vif,
> > static void xenvif_fatal_tx_err(struct xenvif *vif)
> > {
> > netdev_err(vif->dev, "fatal error; disabling device\n");
> > - xenvif_carrier_off(vif);
> > + vif->disabled = true;
>
> Do you need to wake the thread here?
>
That's a better approach.
> > @@ -1549,6 +1549,16 @@ int xenvif_kthread(void *data)
> > wait_event_interruptible(vif->wq,
> > rx_work_todo(vif) ||
> > kthread_should_stop());
>
> || vif->disabled ?
>
> > +
> > + /* This frontend is found to be rogue, disable it in
> > + * kthread context. Currently this is only set when
> > + * netback finds out frontend sends malformed packet,
> > + * but we cannot disable the interface in softirq
> > + * context so we defer it here.
> > + */
> > + if (unlikely(vif->disabled) && netif_carrier_ok(vif->dev))
> > + xenvif_carrier_off(vif);
> > +
> > if (kthread_should_stop())
> > break;
> >
>
> As an aside, since I happened to be looking at xenvif_poll(), disabling
> local irqs to avoid problems with concurrent events looks unsafe as the
> event may occur on another VCPU.
>
Are you seeing any problem? If this is not related to this fix we should
probably discuss this in another thread.
> __napi_complete(napi);
> RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
> if (more_to_do)
> napi_schedule(napi);
>
> Would work I think.
>
Not sure I get your suggestion. Sorry. If you're talking about the code
in xenvif_poll, there's comment up there describing a race. Again, this
should be discussed in separate thread.
Wei.
> David
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] xen-netback: disable rogue vif in kthread context
2014-03-24 12:13 [PATCH net] xen-netback: disable rogue vif in kthread context Wei Liu
` (3 preceding siblings ...)
2014-03-24 12:49 ` [Xen-devel] " David Vrabel
@ 2014-03-24 19:29 ` David Miller
2014-03-25 0:44 ` [Xen-devel] " Wei Liu
2014-03-25 0:44 ` Wei Liu
2014-03-24 19:29 ` David Miller
5 siblings, 2 replies; 15+ messages in thread
From: David Miller @ 2014-03-24 19:29 UTC (permalink / raw)
To: wei.liu2
Cc: netdev, xen-devel, paul.durrant, zoltan.kiss, edwin, ian.campbell
From: Wei Liu <wei.liu2@citrix.com>
Date: Mon, 24 Mar 2014 12:13:34 +0000
> @@ -62,6 +62,13 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
> struct xenvif *vif = container_of(napi, struct xenvif, napi);
> int work_done;
>
> + /* This vif is rogue, we pretend we've used up all budget to
> + * deschedule it from NAPI. But this interface will be turned
> + * off in thread context later.
> + */
> + if (unlikely(vif->disabled))
> + return budget;
> +
As mentioned by others, this makes NAPI poll forever.
The following comment was referenced:
/* Drivers must not modify the NAPI state if they
* consume the entire weight. In such cases this code
* still "owns" the NAPI instance and therefore can
* move the instance around on the list at-will.
*/
if (unlikely(work == weight)) {
WHICH MEANS, if you return that you used the full budget, the NAPI instance
is still owned by the core.
Why? Becuase if you used the entire budget, it's going to put you back onto
the polling list and invoke you again some time soon.
If the interface is disabled, you should return zero from the poll and do a
NAPI completion.
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [Xen-devel] [PATCH net] xen-netback: disable rogue vif in kthread context
2014-03-24 19:29 ` David Miller
@ 2014-03-25 0:44 ` Wei Liu
2014-03-25 0:44 ` Wei Liu
1 sibling, 0 replies; 15+ messages in thread
From: Wei Liu @ 2014-03-25 0:44 UTC (permalink / raw)
To: David Miller
Cc: Wei Liu, Ian Campbell, netdev@vger.kernel.org, edwin,
xen-devel@lists.xen.org, Paul Durrant, zoltan.kiss
On Mon, Mar 24, 2014 at 7:29 PM, David Miller <davem@davemloft.net> wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> Date: Mon, 24 Mar 2014 12:13:34 +0000
>
>> @@ -62,6 +62,13 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
>> struct xenvif *vif = container_of(napi, struct xenvif, napi);
>> int work_done;
>>
>> + /* This vif is rogue, we pretend we've used up all budget to
>> + * deschedule it from NAPI. But this interface will be turned
>> + * off in thread context later.
>> + */
>> + if (unlikely(vif->disabled))
>> + return budget;
>> +
>
> As mentioned by others, this makes NAPI poll forever.
>
> The following comment was referenced:
>
> /* Drivers must not modify the NAPI state if they
> * consume the entire weight. In such cases this code
> * still "owns" the NAPI instance and therefore can
> * move the instance around on the list at-will.
> */
> if (unlikely(work == weight)) {
>
> WHICH MEANS, if you return that you used the full budget, the NAPI instance
> is still owned by the core.
>
> Why? Becuase if you used the entire budget, it's going to put you back onto
> the polling list and invoke you again some time soon.
>
> If the interface is disabled, you should return zero from the poll and do a
> NAPI completion.
>
Yes, you're right. I made a mistake in using the budget the other way around.
Wei.
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net] xen-netback: disable rogue vif in kthread context
2014-03-24 19:29 ` David Miller
2014-03-25 0:44 ` [Xen-devel] " Wei Liu
@ 2014-03-25 0:44 ` Wei Liu
1 sibling, 0 replies; 15+ messages in thread
From: Wei Liu @ 2014-03-25 0:44 UTC (permalink / raw)
To: David Miller
Cc: Wei Liu, Ian Campbell, netdev@vger.kernel.org, edwin,
xen-devel@lists.xen.org, Paul Durrant, zoltan.kiss
On Mon, Mar 24, 2014 at 7:29 PM, David Miller <davem@davemloft.net> wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> Date: Mon, 24 Mar 2014 12:13:34 +0000
>
>> @@ -62,6 +62,13 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
>> struct xenvif *vif = container_of(napi, struct xenvif, napi);
>> int work_done;
>>
>> + /* This vif is rogue, we pretend we've used up all budget to
>> + * deschedule it from NAPI. But this interface will be turned
>> + * off in thread context later.
>> + */
>> + if (unlikely(vif->disabled))
>> + return budget;
>> +
>
> As mentioned by others, this makes NAPI poll forever.
>
> The following comment was referenced:
>
> /* Drivers must not modify the NAPI state if they
> * consume the entire weight. In such cases this code
> * still "owns" the NAPI instance and therefore can
> * move the instance around on the list at-will.
> */
> if (unlikely(work == weight)) {
>
> WHICH MEANS, if you return that you used the full budget, the NAPI instance
> is still owned by the core.
>
> Why? Becuase if you used the entire budget, it's going to put you back onto
> the polling list and invoke you again some time soon.
>
> If the interface is disabled, you should return zero from the poll and do a
> NAPI completion.
>
Yes, you're right. I made a mistake in using the budget the other way around.
Wei.
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net] xen-netback: disable rogue vif in kthread context
2014-03-24 12:13 [PATCH net] xen-netback: disable rogue vif in kthread context Wei Liu
` (4 preceding siblings ...)
2014-03-24 19:29 ` David Miller
@ 2014-03-24 19:29 ` David Miller
5 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2014-03-24 19:29 UTC (permalink / raw)
To: wei.liu2
Cc: ian.campbell, netdev, edwin, xen-devel, paul.durrant, zoltan.kiss
From: Wei Liu <wei.liu2@citrix.com>
Date: Mon, 24 Mar 2014 12:13:34 +0000
> @@ -62,6 +62,13 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
> struct xenvif *vif = container_of(napi, struct xenvif, napi);
> int work_done;
>
> + /* This vif is rogue, we pretend we've used up all budget to
> + * deschedule it from NAPI. But this interface will be turned
> + * off in thread context later.
> + */
> + if (unlikely(vif->disabled))
> + return budget;
> +
As mentioned by others, this makes NAPI poll forever.
The following comment was referenced:
/* Drivers must not modify the NAPI state if they
* consume the entire weight. In such cases this code
* still "owns" the NAPI instance and therefore can
* move the instance around on the list at-will.
*/
if (unlikely(work == weight)) {
WHICH MEANS, if you return that you used the full budget, the NAPI instance
is still owned by the core.
Why? Becuase if you used the entire budget, it's going to put you back onto
the polling list and invoke you again some time soon.
If the interface is disabled, you should return zero from the poll and do a
NAPI completion.
^ permalink raw reply [flat|nested] 15+ messages in thread