* [PATCH] ixp4xx: Support the all multicast flag on the NPE devices.
From: Richard Cochran @ 2010-05-31 13:11 UTC (permalink / raw)
To: Krzysztof Halasa; +Cc: netdev, linux-arm-kernel
This patch adds support for the IFF_ALLMULTI flag. Previously only the
IFF_PROMISC flag was supported.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
---
drivers/net/arm/ixp4xx_eth.c | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/drivers/net/arm/ixp4xx_eth.c b/drivers/net/arm/ixp4xx_eth.c
index 6be8b09..5d35064 100644
--- a/drivers/net/arm/ixp4xx_eth.c
+++ b/drivers/net/arm/ixp4xx_eth.c
@@ -739,6 +739,17 @@ static void eth_set_mcast_list(struct net_device *dev)
struct dev_mc_list *mclist;
u8 diffs[ETH_ALEN], *addr;
int i;
+ u8 allmulti[] = { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00 };
+
+ if (dev->flags & IFF_ALLMULTI) {
+ for (i = 0; i < ETH_ALEN; i++) {
+ __raw_writel(allmulti[i], &port->regs->mcast_addr[i]);
+ __raw_writel(allmulti[i], &port->regs->mcast_mask[i]);
+ }
+ __raw_writel(DEFAULT_RX_CNTRL0 | RX_CNTRL0_ADDR_FLTR_EN,
+ &port->regs->rx_control[0]);
+ return;
+ }
if ((dev->flags & IFF_PROMISC) || netdev_mc_empty(dev)) {
__raw_writel(DEFAULT_RX_CNTRL0 & ~RX_CNTRL0_ADDR_FLTR_EN,
--
1.6.3.3
^ permalink raw reply related
* [PATCH] netfilter: xtables: stackptr should be percpu
From: Eric Dumazet @ 2010-05-31 13:13 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Xiaotian Feng, netfilter-devel, netfilter, coreteam, linux-kernel,
netdev, Patrick McHardy, David S. Miller, Andrew Morton,
Rusty Russell, Alexey Dobriyan
In-Reply-To: <alpine.LSU.2.01.1005311350580.31799@obet.zrqbmnf.qr>
Le lundi 31 mai 2010 à 13:51 +0200, Jan Engelhardt a écrit :
> On Monday 2010-05-31 13:06, Xiaotian Feng wrote:
>
> >In xt_register_table, xt_jumpstack_alloc is called first, later
> >xt_replace_table is used. But in xt_replace_table, xt_jumpstack_alloc
> >will be used again. Then the memory allocated by previous xt_jumpstack_alloc
> >will be leaked. We can simply remove the previous xt_jumpstack_alloc because
> >there aren't any users of newinfo between xt_jumpstack_alloc and
> >xt_replace_table.
>
> Indeed that seems to be so.
An official "Acked-by: ..." would be fine Jan :)
BTW I noticed a _big_ slowdown of iptables lately, and located the
reason.
All cpus share a single cache line for their 'stackptr' storage,
introduced in commit f3c5c1bfd4
This is a stable candidate (2.6.34)
Note : We also should use alloc_percpu() for jumpstack but this is not a
critical thing and can be a net-next patch.
[PATCH] netfilter: xtables: stackptr should be percpu
commit f3c5c1bfd4 (netfilter: xtables: make ip_tables reentrant)
introduced a performance regression, because stackptr array is shared by
all cpus, adding cache line ping pongs. (16 cpus share a 64 bytes cache
line)
Fix this using alloc_percpu()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
include/linux/netfilter/x_tables.h | 2 +-
net/ipv4/netfilter/ip_tables.c | 2 +-
net/ipv6/netfilter/ip6_tables.c | 2 +-
net/netfilter/x_tables.c | 13 +++----------
4 files changed, 6 insertions(+), 13 deletions(-)
diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index c00cc0c..24e5d01 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -397,7 +397,7 @@ struct xt_table_info {
* @stacksize jumps (number of user chains) can possibly be made.
*/
unsigned int stacksize;
- unsigned int *stackptr;
+ unsigned int __percpu *stackptr;
void ***jumpstack;
/* ipt_entry tables: one per CPU */
/* Note : this field MUST be the last one, see XT_TABLE_INFO_SZ */
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 63958f3..4b6c5ca 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -336,7 +336,7 @@ ipt_do_table(struct sk_buff *skb,
cpu = smp_processor_id();
table_base = private->entries[cpu];
jumpstack = (struct ipt_entry **)private->jumpstack[cpu];
- stackptr = &private->stackptr[cpu];
+ stackptr = per_cpu_ptr(private->stackptr, cpu);
origptr = *stackptr;
e = get_entry(table_base, private->hook_entry[hook]);
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 6f517bd..9d2d68f 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -363,7 +363,7 @@ ip6t_do_table(struct sk_buff *skb,
cpu = smp_processor_id();
table_base = private->entries[cpu];
jumpstack = (struct ip6t_entry **)private->jumpstack[cpu];
- stackptr = &private->stackptr[cpu];
+ stackptr = per_cpu_ptr(private->stackptr, cpu);
origptr = *stackptr;
e = get_entry(table_base, private->hook_entry[hook]);
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 445de70..7e8a93d 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -699,10 +699,8 @@ void xt_free_table_info(struct xt_table_info *info)
vfree(info->jumpstack);
else
kfree(info->jumpstack);
- if (sizeof(unsigned int) * nr_cpu_ids > PAGE_SIZE)
- vfree(info->stackptr);
- else
- kfree(info->stackptr);
+
+ free_percpu(info->stackptr);
kfree(info);
}
@@ -753,14 +751,9 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
unsigned int size;
int cpu;
- size = sizeof(unsigned int) * nr_cpu_ids;
- if (size > PAGE_SIZE)
- i->stackptr = vmalloc(size);
- else
- i->stackptr = kmalloc(size, GFP_KERNEL);
+ i->stackptr = alloc_percpu(unsigned int);
if (i->stackptr == NULL)
return -ENOMEM;
- memset(i->stackptr, 0, size);
size = sizeof(void **) * nr_cpu_ids;
if (size > PAGE_SIZE)
^ permalink raw reply related
* Re: [PATCH] netfilter: don't xt_jumpstack_alloc twice in xt_register_table
From: Jan Engelhardt @ 2010-05-31 13:19 UTC (permalink / raw)
To: Xiaotian Feng
Cc: netfilter-devel, netfilter, coreteam, linux-kernel, netdev,
Patrick McHardy, David S. Miller, Andrew Morton, Rusty Russell,
Alexey Dobriyan
In-Reply-To: <alpine.LSU.2.01.1005311350580.31799@obet.zrqbmnf.qr>
On Monday 2010-05-31 13:51, Jan Engelhardt wrote:
>On Monday 2010-05-31 13:06, Xiaotian Feng wrote:
>
>>In xt_register_table, xt_jumpstack_alloc is called first, later
>>xt_replace_table is used. But in xt_replace_table, xt_jumpstack_alloc
>>will be used again. Then the memory allocated by previous xt_jumpstack_alloc
>>will be leaked. We can simply remove the previous xt_jumpstack_alloc because
>>there aren't any users of newinfo between xt_jumpstack_alloc and
>>xt_replace_table.
>
>Indeed that seems to be so.
Acked-By: Jan Engelhardt <jengelh@medozas.de>
>
>>diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
>>index 445de70..47b1e79 100644
>>--- a/net/netfilter/x_tables.c
>>+++ b/net/netfilter/x_tables.c
>>@@ -844,10 +844,6 @@ struct xt_table *xt_register_table(struct net *net,
>> struct xt_table_info *private;
>> struct xt_table *t, *table;
>>
>>- ret = xt_jumpstack_alloc(newinfo);
>>- if (ret < 0)
>>- return ERR_PTR(ret);
>>-
>> /* Don't add one object to multiple lists. */
>> table = kmemdup(input_table, sizeof(struct xt_table), GFP_KERNEL);
>> if (!table) {
>--
>To unsubscribe from this list: send the line "unsubscribe netfilter" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply
* Re: [PATCH] netfilter: xtables: stackptr should be percpu
From: Jan Engelhardt @ 2010-05-31 13:22 UTC (permalink / raw)
To: Eric Dumazet
Cc: Xiaotian Feng, netfilter-devel, netfilter, coreteam, linux-kernel,
netdev, Patrick McHardy, David S. Miller, Andrew Morton,
Rusty Russell, Alexey Dobriyan
In-Reply-To: <1275311580.3291.44.camel@edumazet-laptop>
On Monday 2010-05-31 15:13, Eric Dumazet wrote:
>
>All cpus share a single cache line for their 'stackptr' storage,
>introduced in commit f3c5c1bfd4
>
>This is a stable candidate (2.6.34)
Stackptr was first introduced for 2.6.35-rcX.
>+ i->stackptr = alloc_percpu(unsigned int);
> if (i->stackptr == NULL)
> return -ENOMEM;
>- memset(i->stackptr, 0, size);
>
> size = sizeof(void **) * nr_cpu_ids;
> if (size > PAGE_SIZE)
Are alloc_percpu areas cleared?
Acked-By: Jan Engelhardt <jengelh@medozas.de>
^ permalink raw reply
* Re: [Patch] fix packet loss and massive ping spikes with PPP multi-link
From: Richard Hartmann @ 2010-05-31 13:39 UTC (permalink / raw)
To: Paul Mackerras
Cc: Ben McKeegan, netdev, linux-ppp, Alan Cox, Alexander E. Patrakov,
linux-kernel
In-Reply-To: <20100529021624.GA2538@brick.ozlabs.ibm.com>
On Sat, May 29, 2010 at 04:16, Paul Mackerras <paulus@samba.org> wrote:
> If you fix up the indentation issues (2-space indent in some of the
> added code -- if you're using emacs, set c-basic-offset to 8), I'll
> ack it and hopefully DaveM will pick it up.
If no one submits it by tonight, I will probably send a cleaned-up
version of the alternative patch. Not that I want to steal anyone's
laurels, mind. But having it in mainline is better than not having
it in mainline.
Richard
^ permalink raw reply
* Re: [PATCH] netfilter: xtables: stackptr should be percpu
From: Eric Dumazet @ 2010-05-31 13:44 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Xiaotian Feng, netfilter-devel, netfilter, coreteam, linux-kernel,
netdev, Patrick McHardy, David S. Miller, Andrew Morton,
Rusty Russell, Alexey Dobriyan
In-Reply-To: <alpine.LSU.2.01.1005311519340.25402@obet.zrqbmnf.qr>
Le lundi 31 mai 2010 à 15:22 +0200, Jan Engelhardt a écrit :
> On Monday 2010-05-31 15:13, Eric Dumazet wrote:
> >
> >All cpus share a single cache line for their 'stackptr' storage,
> >introduced in commit f3c5c1bfd4
> >
> >This is a stable candidate (2.6.34)
>
> Stackptr was first introduced for 2.6.35-rcX.
>
Indeed, I was fooled by 'git describe'
> >+ i->stackptr = alloc_percpu(unsigned int);
> > if (i->stackptr == NULL)
> > return -ENOMEM;
> >- memset(i->stackptr, 0, size);
> >
> > size = sizeof(void **) * nr_cpu_ids;
> > if (size > PAGE_SIZE)
>
> Are alloc_percpu areas cleared?
>
Yes, allocated chunks are cleared.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 1/2] Export firmware assigned labels of network devices to sysfs
From: Michael Ellerman @ 2010-05-31 14:07 UTC (permalink / raw)
To: K, Narendra
Cc: netdev@vger.kernel.org, linux-hotplug@vger.kernel.org,
linux-pci@vger.kernel.org, Domsch, Matt, Hargrave, Jordan,
Rose, Charles, Nijhawan, Vijay
In-Reply-To: <20100528115520.GA24114@littleblue.us.dell.com>
[-- Attachment #1: Type: text/plain, Size: 781 bytes --]
On Fri, 2010-05-28 at 06:55 -0500, K, Narendra wrote:
> Hello,
>
> This patch is in continuation of an earlier discussion -
>
> http://marc.info/?l=linux-netdev&m=126712978908314&w=3
>
> The patch has the following review suggestions from the community incorporated -
>
> 1. The name of the attribute has been changed from "smbiosname" to "label" to hide
> the implementation details.
> 2. The implementation has been moved to a new file drivers/pci/pci-label.c
You've changed the name, which is good, but the implementation is still
100% dependant on ACPI or DMI AFAICS.
So it seems to me until it's supported on another platform it may as
well go in pci-acpi.c, or at least only be compiled if (ACPI || DMI).
Otherwise it's just dead code.
cheers
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply
* Re: [PATCH] ixp4xx: Support the all multicast flag on the NPE devices.
From: Krzysztof Halasa @ 2010-05-31 14:09 UTC (permalink / raw)
To: Richard Cochran; +Cc: netdev, linux-arm-kernel
In-Reply-To: <20100531131102.GA15870@riccoc20.at.omicron.at>
Richard Cochran <richardcochran@gmail.com> writes:
> This patch adds support for the IFF_ALLMULTI flag. Previously only the
> IFF_PROMISC flag was supported.
>
> Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
> ---
> drivers/net/arm/ixp4xx_eth.c | 11 +++++++++++
> 1 files changed, 11 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/arm/ixp4xx_eth.c b/drivers/net/arm/ixp4xx_eth.c
> index 6be8b09..5d35064 100644
> --- a/drivers/net/arm/ixp4xx_eth.c
> +++ b/drivers/net/arm/ixp4xx_eth.c
> @@ -739,6 +739,17 @@ static void eth_set_mcast_list(struct net_device *dev)
> struct dev_mc_list *mclist;
> u8 diffs[ETH_ALEN], *addr;
> int i;
> + u8 allmulti[] = { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00 };
> +
> + if (dev->flags & IFF_ALLMULTI) {
> + for (i = 0; i < ETH_ALEN; i++) {
> + __raw_writel(allmulti[i], &port->regs->mcast_addr[i]);
> + __raw_writel(allmulti[i], &port->regs->mcast_mask[i]);
> + }
> + __raw_writel(DEFAULT_RX_CNTRL0 | RX_CNTRL0_ADDR_FLTR_EN,
> + &port->regs->rx_control[0]);
> + return;
> + }
Looks good, though I'd prefer a bit of "const" and "static" in the
allmulti[] declaration. Would you please repost? TIA.
--
Krzysztof Halasa
^ permalink raw reply
* Re: [PATCH] netfilter: xtables: stackptr should be percpu
From: Jan Engelhardt @ 2010-05-31 14:09 UTC (permalink / raw)
To: Eric Dumazet
Cc: Xiaotian Feng, netfilter-devel, netfilter, coreteam, linux-kernel,
netdev, Patrick McHardy, David S. Miller, Andrew Morton,
Rusty Russell, Alexey Dobriyan
In-Reply-To: <1275313448.3291.49.camel@edumazet-laptop>
On Monday 2010-05-31 15:44, Eric Dumazet wrote:
>Le lundi 31 mai 2010 à 15:22 +0200, Jan Engelhardt a écrit :
>> On Monday 2010-05-31 15:13, Eric Dumazet wrote:
>> >
>> >All cpus share a single cache line for their 'stackptr' storage,
>> >introduced in commit f3c5c1bfd4
>> >
>> >This is a stable candidate (2.6.34)
>>
>> Stackptr was first introduced for 2.6.35-rcX.
>
>Indeed, I was fooled by 'git describe'
Keep your friends close, and your enemies closer ;-)
git describe --contains f3c5c1bfd
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH] netfilter: xtables: stackptr should be percpu
From: Eric Dumazet @ 2010-05-31 14:16 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Xiaotian Feng, netfilter-devel, netfilter, coreteam, linux-kernel,
netdev, Patrick McHardy, David S. Miller, Andrew Morton,
Rusty Russell, Alexey Dobriyan
In-Reply-To: <alpine.LSU.2.01.1005311608570.20623@obet.zrqbmnf.qr>
Le lundi 31 mai 2010 à 16:09 +0200, Jan Engelhardt a écrit :
> Keep your friends close, and your enemies closer ;-)
>
> git describe --contains f3c5c1bfd
Yes, --contains should be the default, and --predates the option :)
This is a bit OT anyway :)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH] netfilter: don't xt_jumpstack_alloc twice in xt_register_table
From: Patrick McHardy @ 2010-05-31 14:34 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Xiaotian Feng, netfilter-devel, netfilter, coreteam, linux-kernel,
netdev, David S. Miller, Andrew Morton, Rusty Russell,
Alexey Dobriyan
In-Reply-To: <alpine.LSU.2.01.1005311519160.25402@obet.zrqbmnf.qr>
Jan Engelhardt wrote:
> On Monday 2010-05-31 13:51, Jan Engelhardt wrote:
>> On Monday 2010-05-31 13:06, Xiaotian Feng wrote:
>>
>>> In xt_register_table, xt_jumpstack_alloc is called first, later
>>> xt_replace_table is used. But in xt_replace_table, xt_jumpstack_alloc
>>> will be used again. Then the memory allocated by previous xt_jumpstack_alloc
>>> will be leaked. We can simply remove the previous xt_jumpstack_alloc because
>>> there aren't any users of newinfo between xt_jumpstack_alloc and
>>> xt_replace_table.
>> Indeed that seems to be so.
>
> Acked-By: Jan Engelhardt <jengelh@medozas.de>
Applied, thanks everyone.
^ permalink raw reply
* Re: [PATCH] netfilter: don't xt_jumpstack_alloc twice in xt_register_table
From: Patrick McHardy @ 2010-05-31 14:37 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Xiaotian Feng, netfilter-devel, netfilter, coreteam, linux-kernel,
netdev, David S. Miller, Andrew Morton, Rusty Russell,
Alexey Dobriyan
In-Reply-To: <alpine.LSU.2.01.1005311519160.25402@obet.zrqbmnf.qr>
Jan Engelhardt wrote:
> On Monday 2010-05-31 13:51, Jan Engelhardt wrote:
>> On Monday 2010-05-31 13:06, Xiaotian Feng wrote:
>>
>>> In xt_register_table, xt_jumpstack_alloc is called first, later
>>> xt_replace_table is used. But in xt_replace_table, xt_jumpstack_alloc
>>> will be used again. Then the memory allocated by previous xt_jumpstack_alloc
>>> will be leaked. We can simply remove the previous xt_jumpstack_alloc because
>>> there aren't any users of newinfo between xt_jumpstack_alloc and
>>> xt_replace_table.
>> Indeed that seems to be so.
>
> Acked-By: Jan Engelhardt <jengelh@medozas.de>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH] netfilter: xtables: stackptr should be percpu
From: Patrick McHardy @ 2010-05-31 14:37 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Eric Dumazet, Xiaotian Feng, netfilter-devel, netfilter, coreteam,
linux-kernel, netdev, David S. Miller, Andrew Morton,
Rusty Russell, Alexey Dobriyan
In-Reply-To: <alpine.LSU.2.01.1005311519340.25402@obet.zrqbmnf.qr>
Jan Engelhardt wrote:
> On Monday 2010-05-31 15:13, Eric Dumazet wrote:
>> All cpus share a single cache line for their 'stackptr' storage,
>> introduced in commit f3c5c1bfd4
>>
>> This is a stable candidate (2.6.34)
>
> Stackptr was first introduced for 2.6.35-rcX.
>
>> + i->stackptr = alloc_percpu(unsigned int);
>> if (i->stackptr == NULL)
>> return -ENOMEM;
>> - memset(i->stackptr, 0, size);
>>
>> size = sizeof(void **) * nr_cpu_ids;
>> if (size > PAGE_SIZE)
>
> Are alloc_percpu areas cleared?
>
> Acked-By: Jan Engelhardt <jengelh@medozas.de>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Oleg Nesterov @ 2010-05-31 14:39 UTC (permalink / raw)
To: Tejun Heo
Cc: Michael S. Tsirkin, Sridhar Samudrala, netdev, lkml,
kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C02C961.9050606@kernel.org>
On 05/30, Tejun Heo wrote:
>
> This conversion is to make each vhost use a dedicated kthread so that
> resource control via cgroup can be applied.
Personally, I agree. I think This is better than play with workqueue thread.
A couple of simple questions after the quick glance at the unapplied patch...
> void vhost_poll_flush(struct vhost_poll *poll)
> {
> - flush_work(&poll->work);
> + int seq = poll->queue_seq;
> +
> + if (seq - poll->done_seq > 0)
> + wait_event(poll->done, seq - poll->done_seq <= 0);
The check before wait_event() is not needed, please note that wait_event()
checks the condition before __wait_event().
What I can't understand is why we do have ->queue_seq and ->done_seq.
Isn't the single "bool poll->active" enough? vhost_poll_queue() sets
->active == T, vhost_poller() clears it before wake_up_all(poll->done).
> +static int vhost_poller(void *data)
> +{
> + struct vhost_dev *dev = data;
> + struct vhost_poll *poll;
> +
> +repeat:
> + set_current_state(TASK_INTERRUPTIBLE); /* mb paired w/ kthread_stop */
I don't understand the comment... why do we need this barrier?
> + if (kthread_should_stop()) {
> + __set_current_state(TASK_RUNNING);
> + return 0;
> + }
> +
> + poll = NULL;
> + spin_lock(&dev->poller_lock);
> + if (!list_empty(&dev->poll_list)) {
> + poll = list_first_entry(&dev->poll_list,
> + struct vhost_poll, node);
> + list_del_init(&poll->node);
> + }
> + spin_unlock(&dev->poller_lock);
> +
> + if (poll) {
> + __set_current_state(TASK_RUNNING);
> + poll->fn(poll);
> + smp_wmb(); /* paired with rmb in vhost_poll_flush() */
> + poll->done_seq = poll->queue_seq;
> + wake_up_all(&poll->done);
> + } else
> + schedule();
> +
> + goto repeat;
> +}
Given that vhost_poll_queue() does list_add() and wake_up_process() under
->poller_lock, I don't think we need any barriers to change ->state.
IOW, can't vhost_poller() simply do
while(!kthread_should_stop()) {
poll = NULL;
spin_lock(&dev->poller_lock);
if (!list_empty(&dev->poll_list)) {
...
} else
__set_current_state(TASK_INTERRUPTIBLE);
spin_unlock(&dev->poller_lock);
if (poll) {
...
} else
schedule();
}
?
Oleg.
^ permalink raw reply
* Re: [PATCH 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Tejun Heo @ 2010-05-31 15:07 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Michael S. Tsirkin, Sridhar Samudrala, netdev, lkml,
kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <20100531143945.GA6497@redhat.com>
Hello,
On 05/31/2010 04:39 PM, Oleg Nesterov wrote:
> On 05/30, Tejun Heo wrote:
>>
>> This conversion is to make each vhost use a dedicated kthread so that
>> resource control via cgroup can be applied.
>
> Personally, I agree. I think This is better than play with workqueue thread.
Yeap, I think so too. In vhost's case tho, as it exports a lot of
workqueue characteristics to vhost users, it's a bit more complex than
I wish it were. It can probably be simplified more if someone who
knows the code better takes a look or maybe we need to make this kind
of things easier by providing a generic helpers if more cases like
this spring up, but if that happens probably the RTTD would be somehow
teaching workqueue how to deal with cgroups. As this is the first
case, I guess open coding is okay for now.
> A couple of simple questions after the quick glance at the unapplied patch...
>
>> void vhost_poll_flush(struct vhost_poll *poll)
>> {
>> - flush_work(&poll->work);
>> + int seq = poll->queue_seq;
>> +
>> + if (seq - poll->done_seq > 0)
>> + wait_event(poll->done, seq - poll->done_seq <= 0);
>
> The check before wait_event() is not needed, please note that wait_event()
> checks the condition before __wait_event().
Heh... right, I was looking at __wait_event() and thinking "ooh... we
can skip lock in the fast path". :-)
> What I can't understand is why we do have ->queue_seq and ->done_seq.
>
> Isn't the single "bool poll->active" enough? vhost_poll_queue() sets
> ->active == T, vhost_poller() clears it before wake_up_all(poll->done).
I might have slightly over engineered this part not knowing the
expected workload. ->queue_seq/->done_seq pair is to guarantee that
flushers never get starved. Without sequencing queueings and
executions, flushers should wait for !pending && !active which can
take some time to come if the poll in question is very busy.
>> +static int vhost_poller(void *data)
>> +{
>> + struct vhost_dev *dev = data;
>> + struct vhost_poll *poll;
>> +
>> +repeat:
>> + set_current_state(TASK_INTERRUPTIBLE); /* mb paired w/ kthread_stop */
>
> I don't understand the comment... why do we need this barrier?
So that either kthread_stop()'s should_stop = 1 in kthread_stop() is
visible to kthread_should_stop() or task state is set to RUNNING.
>> + if (kthread_should_stop()) {
>> + __set_current_state(TASK_RUNNING);
>> + return 0;
>> + }
>> +
>> + poll = NULL;
>> + spin_lock(&dev->poller_lock);
>> + if (!list_empty(&dev->poll_list)) {
>> + poll = list_first_entry(&dev->poll_list,
>> + struct vhost_poll, node);
>> + list_del_init(&poll->node);
>> + }
>> + spin_unlock(&dev->poller_lock);
>> +
>> + if (poll) {
>> + __set_current_state(TASK_RUNNING);
>> + poll->fn(poll);
>> + smp_wmb(); /* paired with rmb in vhost_poll_flush() */
>> + poll->done_seq = poll->queue_seq;
>> + wake_up_all(&poll->done);
>> + } else
>> + schedule();
>> +
>> + goto repeat;
>> +}
>
> Given that vhost_poll_queue() does list_add() and wake_up_process() under
> ->poller_lock, I don't think we need any barriers to change ->state.
>
> IOW, can't vhost_poller() simply do
>
> while(!kthread_should_stop()) {
>
> poll = NULL;
> spin_lock(&dev->poller_lock);
> if (!list_empty(&dev->poll_list)) {
> ...
> } else
> __set_current_state(TASK_INTERRUPTIBLE);
> spin_unlock(&dev->poller_lock);
>
> if (poll) {
> ...
> } else
> schedule();
> }
> ?
But then kthread_stop() can happen between kthread_should_stop() and
__set_current_state(TASK_INTERRUPTIBLE) and poller can just sleep in
schedule() not knowing that.
Thank you.
--
tejun
^ permalink raw reply
* Re: [PATCH] mac8390: change an error return code and some cleanup, take 4
From: Joe Perches @ 2010-05-31 15:08 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: fthain, David Miller, p_gortmaker, netdev, linux-kernel,
linux-m68k
In-Reply-To: <AANLkTilkooBglbk33yDHmk8c_ZdlamOW8FVvuvjQaNsF@mail.gmail.com>
On Mon, 2010-05-31 at 11:58 +0200, Geert Uytterhoeven wrote:
> > To make it plain: there are 25 files or so that use ei_debug. Three of
> > those that now have the KERN_DEBUG printk's suppresed by the DEBUG macro
> > only do so as an apparently unintended side effect of a commit that claims
> > to "implement dynmic debug infrastructure". (Go figure.)
> >
> > http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=dd0fab5b940c0b65f26ac5b01485bac1f690ace6
> >
> > Your suggestion to use pr_debug is invoking compile time infrastructure
> > (the DEBUG macro), so it is not in the spirit of this commit, and it is
> > not relevant to any criticism from you or Joe of the earlier submissions.
> >
> > Please apply the patch.
>
> `pr_debug()' indeed now may generate code if DEBUG is not defined,
> i.e. if CONFIG_DYNAMIC_DEBUG is enabled.
> This is intented for debug infrastructure the user may want to enable later.
>
> If you want the old behavior, you can use `pr_devel()' instead, which
> only generates code if DEBUG is defined.
> This is intended for debug infrastructure for developers only.
>
> However, you used `printk(KERN_DEBUG pr_fmt()...)`, which always generates code.
> I'm still not 100% sure that was intentional?
There are many uses of KERN_DEBUG that are reasonable to have
always enabled.
There is no pr_<level> macro/function that is always enabled.
David, would you accept a new pr_<level> in kernel.h
for that purpose?
If so, do you have an opinion what it should be named?
I think pr_dbg is not ideal as dev_dbg is already in use
and can get optimized away.
Maybe one of:
pr_always_dbg
pr_dbg_always
pr_dbg_noopt
pr_tdbg
or something better? Anyone else?
http://lkml.org/lkml/2009/10/1/399
^ permalink raw reply
* Re: [PATCH] mac8390: change an error return code and some cleanup, take 4
From: David Miller @ 2010-05-31 15:14 UTC (permalink / raw)
To: joe; +Cc: geert, fthain, p_gortmaker, netdev, linux-kernel, linux-m68k
In-Reply-To: <1275318493.6503.206.camel@Joe-Laptop.home>
From: Joe Perches <joe@perches.com>
Date: Mon, 31 May 2010 08:08:13 -0700
> There are many uses of KERN_DEBUG that are reasonable to have
> always enabled.
Doubtful.
pr_debug() makes a ton of sense as currently implemented.
It's for messages that we want both compile time and
run-time control over.
The case we have here in mac8390 seems like it should stay
as pr_info(). Because 1) it's already controlled by a
run-time knob so controlling it even further is confusing
and 2) the message is "informative", it lets the user know
a feature cannot be enabled, thus pr_info().
^ permalink raw reply
* Re: [PATCH] mac8390: change an error return code and some cleanup, take 4
From: Finn Thain @ 2010-05-31 15:19 UTC (permalink / raw)
To: David Miller; +Cc: joe, geert, p_gortmaker, netdev, linux-kernel, linux-m68k
In-Reply-To: <20100531.081412.27799681.davem@davemloft.net>
On Mon, 31 May 2010, David Miller wrote:
> From: Joe Perches <joe@perches.com>
> Date: Mon, 31 May 2010 08:08:13 -0700
>
> > There are many uses of KERN_DEBUG that are reasonable to have always
> > enabled.
>
> Doubtful.
>
> pr_debug() makes a ton of sense as currently implemented. It's for
> messages that we want both compile time and run-time control over.
>
> The case we have here in mac8390 seems like it should stay as pr_info().
> Because 1) it's already controlled by a run-time knob so controlling it
> even further is confusing and 2) the message is "informative", it lets
> the user know a feature cannot be enabled, thus pr_info().
If that is true in general, then ei_debug itself becomes questionable.
In the case of mac8390 at least, I'm certainly leaning toward changing the
pr_info (conditional on ei_debug) to pr_debug (unconditional).
Finn
^ permalink raw reply
* Re: [PATCH 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Michael S. Tsirkin @ 2010-05-31 15:22 UTC (permalink / raw)
To: Tejun Heo
Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C02C961.9050606@kernel.org>
On Sun, May 30, 2010 at 10:24:01PM +0200, Tejun Heo wrote:
> Replace vhost_workqueue with per-vhost kthread. Other than callback
> argument change from struct work_struct * to struct vhost_poll *,
> there's no visible change to vhost_poll_*() interface.
I would prefer a substructure vhost_work, even just to make
the code easier to review and compare to workqueue.c.
> This conversion is to make each vhost use a dedicated kthread so that
> resource control via cgroup can be applied.
>
> Partially based on Sridhar Samudrala's patch.
>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Sridhar Samudrala <samudrala.sridhar@gmail.com>
> ---
> Okay, here is three patch series to convert vhost to use per-vhost
> kthread, add cgroup_attach_task_current_cg() and apply it to the vhost
> kthreads. The conversion is mostly straight forward although flush is
> slightly tricky.
>
> The problem is that I have no idea how to test this.
It's a 3 step process:
1.
Install qemu-kvm under fc13, or build recent one from source,
get it from here:
git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
2. install guest under it:
qemu-img create -f qcow2 disk.qcow2 100G
Now get some image (e.g. fedora 13 in fc13.iso)
and install guest:
qemu-kvm -enable-kvm -m 1G -cdrom fc13.iso -drive file=disk.qcow2
3. set up networking. I usually simply do host to guest
on a special subnet for testing purposes:
Set up a bridge named mstbr0:
ifconfig mstbr0 down
brctl delbr mstbr0
brctl addbr mstbr0
brctl setfd mstbr0 0
ifconfig mstbr0 11.0.0.1
cat > ifup << EOF
#!/bin/sh -x
/sbin/ifconfig msttap0 0.0.0.0 up
brctl addif mstbr0 msttap0
EOF
qemu-kvm -enable-kvm -m 1G -cdrom fc13.iso -drive file=disk.qcow2
-net nic,model=virtio,netdev=foo -netdev
tap,id=foo,ifname=msttap0,script=/home/mst/ifup,downscript=no,vhost=on
after you set up the guest, log into it and
ifconfig eth0 11.0.0.2
You should now be able to ping guest to host and back.
Use something like netperf to stress the connection.
Close qemu with kill -9 and unload module to test flushing code.
> Index: work/drivers/vhost/vhost.c
> ===================================================================
> --- work.orig/drivers/vhost/vhost.c
> +++ work/drivers/vhost/vhost.c
...
> @@ -125,10 +139,50 @@ static void vhost_vq_reset(struct vhost_
> vq->log_ctx = NULL;
> }
>
> +static int vhost_poller(void *data)
> +{
> + struct vhost_dev *dev = data;
> + struct vhost_poll *poll;
> +
> +repeat:
> + set_current_state(TASK_INTERRUPTIBLE); /* mb paired w/ kthread_stop */
> +
> + if (kthread_should_stop()) {
> + __set_current_state(TASK_RUNNING);
> + return 0;
> + }
> +
> + poll = NULL;
> + spin_lock(&dev->poller_lock);
> + if (!list_empty(&dev->poll_list)) {
> + poll = list_first_entry(&dev->poll_list,
> + struct vhost_poll, node);
> + list_del_init(&poll->node);
> + }
> + spin_unlock(&dev->poller_lock);
> +
> + if (poll) {
> + __set_current_state(TASK_RUNNING);
> + poll->fn(poll);
> + smp_wmb(); /* paired with rmb in vhost_poll_flush() */
> + poll->done_seq = poll->queue_seq;
> + wake_up_all(&poll->done);
This seems to add wakeups on data path, which uses spinlocks etc.
OTOH workqueue.c adds a special barrier
entry which only does a wakeup when needed.
Right?
> + } else
> + schedule();
> +
> + goto repeat;
> +}
> +
> long vhost_dev_init(struct vhost_dev *dev,
> struct vhost_virtqueue *vqs, int nvqs)
> {
> + struct task_struct *poller;
> int i;
> +
> + poller = kthread_create(vhost_poller, dev, "vhost-%d", current->pid);
> + if (IS_ERR(poller))
> + return PTR_ERR(poller);
> +
> dev->vqs = vqs;
> dev->nvqs = nvqs;
> mutex_init(&dev->mutex);
> @@ -136,6 +190,9 @@ long vhost_dev_init(struct vhost_dev *de
> dev->log_file = NULL;
> dev->memory = NULL;
> dev->mm = NULL;
> + spin_lock_init(&dev->poller_lock);
> + INIT_LIST_HEAD(&dev->poll_list);
> + dev->poller = poller;
>
> for (i = 0; i < dev->nvqs; ++i) {
> dev->vqs[i].dev = dev;
> @@ -143,8 +200,7 @@ long vhost_dev_init(struct vhost_dev *de
> vhost_vq_reset(dev, dev->vqs + i);
> if (dev->vqs[i].handle_kick)
> vhost_poll_init(&dev->vqs[i].poll,
> - dev->vqs[i].handle_kick,
> - POLLIN);
> + dev->vqs[i].handle_kick, POLLIN, dev);
> }
> return 0;
> }
> @@ -217,6 +273,8 @@ void vhost_dev_cleanup(struct vhost_dev
> if (dev->mm)
> mmput(dev->mm);
> dev->mm = NULL;
> +
> + kthread_stop(dev->poller);
> }
>
> static int log_access_ok(void __user *log_base, u64 addr, unsigned long sz)
> @@ -1113,16 +1171,3 @@ void vhost_disable_notify(struct vhost_v
> vq_err(vq, "Failed to enable notification at %p: %d\n",
> &vq->used->flags, r);
> }
> -
> -int vhost_init(void)
> -{
> - vhost_workqueue = create_singlethread_workqueue("vhost");
> - if (!vhost_workqueue)
> - return -ENOMEM;
> - return 0;
> -}
> -
> -void vhost_cleanup(void)
> -{
> - destroy_workqueue(vhost_workqueue);
I note that destroy_workqueue does a flush, kthread_stop
doesn't. Right? Sure we don't need to check nothing is in one of
the lists? Maybe add a BUG_ON?
> -}
> Index: work/drivers/vhost/vhost.h
> ===================================================================
> --- work.orig/drivers/vhost/vhost.h
> +++ work/drivers/vhost/vhost.h
> @@ -5,7 +5,6 @@
> #include <linux/vhost.h>
> #include <linux/mm.h>
> #include <linux/mutex.h>
> -#include <linux/workqueue.h>
> #include <linux/poll.h>
> #include <linux/file.h>
> #include <linux/skbuff.h>
> @@ -20,19 +19,26 @@ enum {
> VHOST_NET_MAX_SG = MAX_SKB_FRAGS + 2,
> };
>
> +struct vhost_poll;
> +typedef void (*vhost_poll_fn_t)(struct vhost_poll *poll);
> +
> /* Poll a file (eventfd or socket) */
> /* Note: there's nothing vhost specific about this structure. */
> struct vhost_poll {
> + vhost_poll_fn_t fn;
> poll_table table;
> wait_queue_head_t *wqh;
> wait_queue_t wait;
> - /* struct which will handle all actual work. */
> - struct work_struct work;
> + struct list_head node;
> + wait_queue_head_t done;
> unsigned long mask;
> + struct vhost_dev *dev;
> + int queue_seq;
> + int done_seq;
> };
>
> -void vhost_poll_init(struct vhost_poll *poll, work_func_t func,
> - unsigned long mask);
> +void vhost_poll_init(struct vhost_poll *poll, vhost_poll_fn_t fn,
> + unsigned long mask, struct vhost_dev *dev);
> void vhost_poll_start(struct vhost_poll *poll, struct file *file);
> void vhost_poll_stop(struct vhost_poll *poll);
> void vhost_poll_flush(struct vhost_poll *poll);
> @@ -63,7 +69,7 @@ struct vhost_virtqueue {
> struct vhost_poll poll;
>
> /* The routine to call when the Guest pings us, or timeout. */
> - work_func_t handle_kick;
> + vhost_poll_fn_t handle_kick;
>
> /* Last available index we saw. */
> u16 last_avail_idx;
> @@ -86,11 +92,11 @@ struct vhost_virtqueue {
> struct iovec hdr[VHOST_NET_MAX_SG];
> size_t hdr_size;
> /* We use a kind of RCU to access private pointer.
> - * All readers access it from workqueue, which makes it possible to
> - * flush the workqueue instead of synchronize_rcu. Therefore readers do
> + * All readers access it from poller, which makes it possible to
> + * flush the vhost_poll instead of synchronize_rcu. Therefore readers do
> * not need to call rcu_read_lock/rcu_read_unlock: the beginning of
> - * work item execution acts instead of rcu_read_lock() and the end of
> - * work item execution acts instead of rcu_read_lock().
> + * vhost_poll execution acts instead of rcu_read_lock() and the end of
> + * vhost_poll execution acts instead of rcu_read_lock().
> * Writers use virtqueue mutex. */
> void *private_data;
> /* Log write descriptors */
> @@ -110,6 +116,9 @@ struct vhost_dev {
> int nvqs;
> struct file *log_file;
> struct eventfd_ctx *log_ctx;
> + spinlock_t poller_lock;
> + struct list_head poll_list;
> + struct task_struct *poller;
> };
>
> long vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue *vqs, int nvqs);
> @@ -136,9 +145,6 @@ bool vhost_enable_notify(struct vhost_vi
> int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
> unsigned int log_num, u64 len);
>
> -int vhost_init(void);
> -void vhost_cleanup(void);
> -
> #define vq_err(vq, fmt, ...) do { \
> pr_debug(pr_fmt(fmt), ##__VA_ARGS__); \
> if ((vq)->error_ctx) \
^ permalink raw reply
* Re: [PATCH 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Oleg Nesterov @ 2010-05-31 15:31 UTC (permalink / raw)
To: Tejun Heo
Cc: Michael S. Tsirkin, Sridhar Samudrala, netdev, lkml,
kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C03D0BE.1040601@kernel.org>
On 05/31, Tejun Heo wrote:
>
> On 05/31/2010 04:39 PM, Oleg Nesterov wrote:
>
> > What I can't understand is why we do have ->queue_seq and ->done_seq.
> >
> > Isn't the single "bool poll->active" enough? vhost_poll_queue() sets
> > ->active == T, vhost_poller() clears it before wake_up_all(poll->done).
>
> I might have slightly over engineered this part not knowing the
> expected workload. ->queue_seq/->done_seq pair is to guarantee that
> flushers never get starved.
Ah, indeed.
Well, afaics we do not need 2 counters anyway, both vhost_poll_queue()
and vhost_poller() could increment the single counter and the flusher
can take bit 0 into account. But I agree 2 counters are much more clean.
> >> +static int vhost_poller(void *data)
> >> +{
> >> + struct vhost_dev *dev = data;
> >> + struct vhost_poll *poll;
> >> +
> >> +repeat:
> >> + set_current_state(TASK_INTERRUPTIBLE); /* mb paired w/ kthread_stop */
> >
> > I don't understand the comment... why do we need this barrier?
>
> So that either kthread_stop()'s should_stop = 1 in kthread_stop() is
> visible to kthread_should_stop() or task state is set to RUNNING.
Of course, you are right. I am really surprized I asked this question ;)
Thanks,
Oleg.
^ permalink raw reply
* Re: [PATCH] ixp4xx: Support the all multicast flag on the NPE devices.
From: Richard Cochran @ 2010-05-31 15:34 UTC (permalink / raw)
To: Krzysztof Halasa; +Cc: netdev, linux-arm-kernel
In-Reply-To: <m3pr0c17ue.fsf@intrepid.localdomain>
On Mon, May 31, 2010 at 04:09:13PM +0200, Krzysztof Halasa wrote:
> Looks good, though I'd prefer a bit of "const" and "static" in the
> allmulti[] declaration. Would you please repost? TIA.
Okay, here it is.
Thanks,
Richard
This patch adds support for the IFF_ALLMULTI flag. Previously only the
IFF_PROMISC flag was supported.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
---
drivers/net/arm/ixp4xx_eth.c | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/drivers/net/arm/ixp4xx_eth.c b/drivers/net/arm/ixp4xx_eth.c
index 6be8b09..42b4361 100644
--- a/drivers/net/arm/ixp4xx_eth.c
+++ b/drivers/net/arm/ixp4xx_eth.c
@@ -739,6 +739,17 @@ static void eth_set_mcast_list(struct net_device *dev)
struct dev_mc_list *mclist;
u8 diffs[ETH_ALEN], *addr;
int i;
+ static const u8 allmulti[] = { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00 };
+
+ if (dev->flags & IFF_ALLMULTI) {
+ for (i = 0; i < ETH_ALEN; i++) {
+ __raw_writel(allmulti[i], &port->regs->mcast_addr[i]);
+ __raw_writel(allmulti[i], &port->regs->mcast_mask[i]);
+ }
+ __raw_writel(DEFAULT_RX_CNTRL0 | RX_CNTRL0_ADDR_FLTR_EN,
+ &port->regs->rx_control[0]);
+ return;
+ }
if ((dev->flags & IFF_PROMISC) || netdev_mc_empty(dev)) {
__raw_writel(DEFAULT_RX_CNTRL0 & ~RX_CNTRL0_ADDR_FLTR_EN,
--
1.6.3.3
^ permalink raw reply related
* Re: [PATCH 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Tejun Heo @ 2010-05-31 15:38 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Michael S. Tsirkin, Sridhar Samudrala, netdev, lkml,
kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <20100531153104.GA9452@redhat.com>
Hello,
On 05/31/2010 05:31 PM, Oleg Nesterov wrote:
>> I might have slightly over engineered this part not knowing the
>> expected workload. ->queue_seq/->done_seq pair is to guarantee that
>> flushers never get starved.
>
> Ah, indeed.
>
> Well, afaics we do not need 2 counters anyway, both vhost_poll_queue()
> and vhost_poller() could increment the single counter and the flusher
> can take bit 0 into account. But I agree 2 counters are much more clean.
Right, we can do that too. Cool. :-)
>>>> +static int vhost_poller(void *data)
>>>> +{
>>>> + struct vhost_dev *dev = data;
>>>> + struct vhost_poll *poll;
>>>> +
>>>> +repeat:
>>>> + set_current_state(TASK_INTERRUPTIBLE); /* mb paired w/ kthread_stop */
>>>
>>> I don't understand the comment... why do we need this barrier?
>>
>> So that either kthread_stop()'s should_stop = 1 in kthread_stop() is
>> visible to kthread_should_stop() or task state is set to RUNNING.
>
> Of course, you are right. I am really surprized I asked this question ;)
This part is always a bit tricky tho. Maybe it would be a good idea
to make kthread_stop() do periodic wakeups. It's already injecting
one rather unexpected wake up into the mix anyway so there isn't much
point in avoiding multiple and it would make designing kthread loops
easier. Or maybe what we need is something like kthread_idle() which
encapsulates the check and sleep part.
Thanks.
--
tejun
^ permalink raw reply
* Re: [PATCH 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Tejun Heo @ 2010-05-31 15:45 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <20100531152221.GB2987@redhat.com>
Hello,
On 05/31/2010 05:22 PM, Michael S. Tsirkin wrote:
> On Sun, May 30, 2010 at 10:24:01PM +0200, Tejun Heo wrote:
>> Replace vhost_workqueue with per-vhost kthread. Other than callback
>> argument change from struct work_struct * to struct vhost_poll *,
>> there's no visible change to vhost_poll_*() interface.
>
> I would prefer a substructure vhost_work, even just to make
> the code easier to review and compare to workqueue.c.
Yeap, sure.
>> The problem is that I have no idea how to test this.
>
> It's a 3 step process:
...
> You should now be able to ping guest to host and back.
> Use something like netperf to stress the connection.
> Close qemu with kill -9 and unload module to test flushing code.
Thanks for the instruction. I'll see if there's a way to do it
without building qemu myself on opensuse. But please feel free to go
ahead and test it. It might just work! :-)
>> + if (poll) {
>> + __set_current_state(TASK_RUNNING);
>> + poll->fn(poll);
>> + smp_wmb(); /* paired with rmb in vhost_poll_flush() */
>> + poll->done_seq = poll->queue_seq;
>> + wake_up_all(&poll->done);
>
> This seems to add wakeups on data path, which uses spinlocks etc.
> OTOH workqueue.c adds a special barrier entry which only does a
> wakeup when needed. Right?
Yeah, well, if it's a really hot path sure we can avoid wake_up_all()
in most cases. Do you think that would be necessary?
>> -void vhost_cleanup(void)
>> -{
>> - destroy_workqueue(vhost_workqueue);
>
> I note that destroy_workqueue does a flush, kthread_stop
> doesn't. Right? Sure we don't need to check nothing is in one of
> the lists? Maybe add a BUG_ON?
There were a bunch of flushes before kthread_stop() and they seemed to
stop and flush everything. Aren't they enough? We can definitely add
BUG_ON() after kthread_should_stop() check succeeds either way tho.
Thanks.
--
tejun
^ permalink raw reply
* Re: [RFC] netfilter: WIP: Xtables idletimer target implementation
From: Patrick McHardy @ 2010-05-31 15:59 UTC (permalink / raw)
To: Luciano Coelho
Cc: ext Jan Engelhardt, netfilter-devel@vger.kernel.org,
netdev@vger.kernel.org, Timo Teras
In-Reply-To: <1275040724.24490.121.camel@chilepepper>
Luciano Coelho wrote:
> On Fri, 2010-05-28 at 10:05 +0200, ext Jan Engelhardt wrote:
>> On Friday 2010-05-28 07:25, Luciano Coelho wrote:
>>> Do you have any other suggestion on how I can associate the rules to
>>> specific interfaces?
>> -A INPUT -i foo -j do
>> -A do -j idletimer
>>
>> A little funny, but actually this would allow me to keep a timer
>> for a group of interfaces rather than just per-if.
>
> Yes, this is what our userspace apps are doing. I've formulated my
> question in an unclear way. If you check the rest of the code, I create
> sysfs files under the interface's directory and use it as an attribute
> to notify the userspace when the timer has expired.
>
> In short, I need to figure out a way to associate each rule with an
> interface in sysfs, so I can notify the userspace when the timer has
> expired. I couldn't figure out another way to do it. Any suggestions?
How about just using an arbitrary user-supplied name? People can
name them after interfaces, or anything else.
>>>>> +static int xt_idletimer_checkentry(const struct xt_tgchk_param *par)
>>>>> +{
>>>>> + const struct xt_idletimer_info *info = par->targinfo;
>>>>> + const struct ipt_entry *entryinfo = par->entryinfo;
>>>>> + const struct ipt_ip *ip = &entryinfo->ip;
>>>> I'm not sure spying on ipt_ip is a long-term viable solution.
>>> Do you have any other suggestions on how I could get an interface
>>> associated with the rule? I thought about having the userspace pass the
>>> interface as an option to the rule (like I already do for the timeout
>>> value), but that looked ugly to me, since the interface can already be
>>> defined as part of the ruleset.
>> I have patches ready since a while that decouple ipt_ip
>> from a rule, so there is no guarantee that such will exist.
>
> Okay, if that's the case, then I don't know how to associate the rule
> with a specific net object in the kobject tree. Maybe I have to figure
> out a different way to notify the userspace, unless I add the target
> option I mentioned above. :/
>
>
^ permalink raw reply
* Re: [PATCH 1/3] vhost: replace vhost_workqueue with per-vhost kthread
From: Michael S. Tsirkin @ 2010-05-31 16:00 UTC (permalink / raw)
To: Tejun Heo
Cc: Oleg Nesterov, Sridhar Samudrala, netdev, lkml,
kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
Thomas Gleixner, Ingo Molnar, Andi Kleen
In-Reply-To: <4C03D983.9010905@kernel.org>
On Mon, May 31, 2010 at 05:45:07PM +0200, Tejun Heo wrote:
> Hello,
>
> On 05/31/2010 05:22 PM, Michael S. Tsirkin wrote:
> > On Sun, May 30, 2010 at 10:24:01PM +0200, Tejun Heo wrote:
> >> Replace vhost_workqueue with per-vhost kthread. Other than callback
> >> argument change from struct work_struct * to struct vhost_poll *,
> >> there's no visible change to vhost_poll_*() interface.
> >
> > I would prefer a substructure vhost_work, even just to make
> > the code easier to review and compare to workqueue.c.
>
> Yeap, sure.
>
> >> The problem is that I have no idea how to test this.
> >
> > It's a 3 step process:
> ...
> > You should now be able to ping guest to host and back.
> > Use something like netperf to stress the connection.
> > Close qemu with kill -9 and unload module to test flushing code.
>
> Thanks for the instruction. I'll see if there's a way to do it
> without building qemu myself on opensuse.
My guess is no, there was no stable qemu release with vhost net support
yet. Building it is mostly configure/make/make install,
as far as I remember you only need devel versions of X libraries,
SDL and curses installed.
> But please feel free to go
> ahead and test it. It might just work! :-)
>
> >> + if (poll) {
> >> + __set_current_state(TASK_RUNNING);
> >> + poll->fn(poll);
> >> + smp_wmb(); /* paired with rmb in vhost_poll_flush() */
> >> + poll->done_seq = poll->queue_seq;
> >> + wake_up_all(&poll->done);
> >
>
> > This seems to add wakeups on data path, which uses spinlocks etc.
> > OTOH workqueue.c adds a special barrier entry which only does a
> > wakeup when needed. Right?
>
> Yeah, well, if it's a really hot path sure we can avoid wake_up_all()
> in most cases. Do you think that would be necessary?
My guess is yes. This is really very hot path in code, and we are
close to 100% CPU in some benchmarks.
> >> -void vhost_cleanup(void)
> >> -{
> >> - destroy_workqueue(vhost_workqueue);
> >
> > I note that destroy_workqueue does a flush, kthread_stop
> > doesn't. Right? Sure we don't need to check nothing is in one of
> > the lists? Maybe add a BUG_ON?
>
> There were a bunch of flushes before kthread_stop() and they seemed to
> stop and flush everything. Aren't they enough?
I was just asking, I'll need to review the code in depth.
> We can definitely add
> BUG_ON() after kthread_should_stop() check succeeds either way tho.
>
> Thanks.
>
> --
> tejun
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox