* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Eric Dumazet @ 2011-07-19 20:41 UTC (permalink / raw)
To: Jiri Pirko; +Cc: Neil Horman, netdev, Alexey Dobriyan, David S. Miller
In-Reply-To: <20110719202922.GA2352@minipsycho>
Le mardi 19 juillet 2011 à 22:29 +0200, Jiri Pirko a écrit :
> You are right, but it may not cause panic, right? In case this patch
> would cause significant performance regression, how about to just forbid
> pktgen to run on soft-netdevs ?
>
Please do
Note : a sysadmin has other ways to make a machine panic or reboot or
halt...
^ permalink raw reply
* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Jiri Pirko @ 2011-07-19 20:29 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Neil Horman, netdev, Alexey Dobriyan, David S. Miller
In-Reply-To: <1311105738.3113.11.camel@edumazet-laptop>
Tue, Jul 19, 2011 at 10:02:18PM CEST, eric.dumazet@gmail.com wrote:
>Le mardi 19 juillet 2011 à 15:52 -0400, Neil Horman a écrit :
>> This oops was reported recently when running a pktgen script that called for a
>> transmitted skb to be cloned and sent 100 times using the clone_skb command to a
>> bridge device with several attached interface:
>>
>...
>
>> Turns out the pktgen driver doesn't actually clone skbs, but rather shares them,
>> increasing the reference count of the skb for each send operation. This works
>> for most drivers because most drivers don't store or care about any state in the
>> skb itself, but several do. For instance, the above tun/tap driver and other
>> soft drivers (vlans, bonding/bridging), all requeue frames to a physical device,
>> meaning the skb next and prev pointers will be set. Other drivers also care
>> about skb state. The virtio_net driver for instance uses the skb->cb space to
>> store a vnet header and several converged adapters adjust the data pointer of an
>> skb to prepend a device control header to the skb. Drivers expect skbs
>> submitted for i/o to be in their control and unshared with other users, an
>> assumption which pktgen is violating, the result being multiple skb users
>> corrupting one antohers state and producing oopses like the one above. The
>> solution is to make pktgen clone the skb for each transmit so as to ensure the
>> drivers assumptions about private exclusive access to the skb is maintained.
>>
>> Tested successfully by myself
>> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>> Reported-by: Jiri Pirko <jpirko@redhat.com>
>> CC: Eric Dumazet <eric.dumazet@gmail.com>
>> CC: Alexey Dobriyan <adobriyan@gmail.com>
>> CC: David S. Miller <davem@davemloft.net>
>> ---
>
>This will kill pktgen performance ?
>
>pktgen is only for sysadmins, and very skilled ones :)
You are right, but it may not cause panic, right? In case this patch
would cause significant performance regression, how about to just forbid
pktgen to run on soft-netdevs ?
Jirka
>
>BTW you forgot to CC pktgen author, Robert Olsson
>
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Joe Perches @ 2011-07-19 20:17 UTC (permalink / raw)
To: Eric Dumazet, Robert Olsson
Cc: Neil Horman, netdev, Alexey Dobriyan, David S. Miller
In-Reply-To: <1311105738.3113.11.camel@edumazet-laptop>
On Tue, 2011-07-19 at 22:02 +0200, Eric Dumazet wrote:
> Le mardi 19 juillet 2011 à 15:52 -0400, Neil Horman a écrit :
> > This oops was reported recently when running a pktgen script that called for a
> > transmitted skb to be cloned and sent 100 times using the clone_skb command to a
> > bridge device with several attached interface:
> BTW you forgot to CC pktgen author, Robert Olsson
Robert, you're not getting cc's on pktgen when
people use get_maintainer.
Are you still interested in getting cc'd on patches
to pktgen?
If so, perhaps you want to add a MAINTAINERS entry
for it.
Maybe something like:
diff --git a/MAINTAINERS b/MAINTAINERS
index 378fccf..5a6c16a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4999,6 +4999,11 @@ S: Maintained
F: drivers/block/pktcdvd.c
F: include/linux/pktcdvd.h
+PKTGEN
+M: Robert Olsson <robert.olsson@its.uu.se>
+S: Odd Fixes
+F: net/core/pktgen.c
+
PKUNITY SOC DRIVERS
M: Guan Xuetao <gxt@mprc.pku.edu.cn>
W: http://mprc.pku.edu.cn/~guanxuetao/linux
^ permalink raw reply related
* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Eric Dumazet @ 2011-07-19 20:02 UTC (permalink / raw)
To: Neil Horman; +Cc: netdev, Alexey Dobriyan, David S. Miller
In-Reply-To: <1311105179-26408-1-git-send-email-nhorman@tuxdriver.com>
Le mardi 19 juillet 2011 à 15:52 -0400, Neil Horman a écrit :
> This oops was reported recently when running a pktgen script that called for a
> transmitted skb to be cloned and sent 100 times using the clone_skb command to a
> bridge device with several attached interface:
>
...
> Turns out the pktgen driver doesn't actually clone skbs, but rather shares them,
> increasing the reference count of the skb for each send operation. This works
> for most drivers because most drivers don't store or care about any state in the
> skb itself, but several do. For instance, the above tun/tap driver and other
> soft drivers (vlans, bonding/bridging), all requeue frames to a physical device,
> meaning the skb next and prev pointers will be set. Other drivers also care
> about skb state. The virtio_net driver for instance uses the skb->cb space to
> store a vnet header and several converged adapters adjust the data pointer of an
> skb to prepend a device control header to the skb. Drivers expect skbs
> submitted for i/o to be in their control and unshared with other users, an
> assumption which pktgen is violating, the result being multiple skb users
> corrupting one antohers state and producing oopses like the one above. The
> solution is to make pktgen clone the skb for each transmit so as to ensure the
> drivers assumptions about private exclusive access to the skb is maintained.
>
> Tested successfully by myself
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> Reported-by: Jiri Pirko <jpirko@redhat.com>
> CC: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Alexey Dobriyan <adobriyan@gmail.com>
> CC: David S. Miller <davem@davemloft.net>
> ---
This will kill pktgen performance ?
pktgen is only for sysadmins, and very skilled ones :)
BTW you forgot to CC pktgen author, Robert Olsson
^ permalink raw reply
* [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Neil Horman @ 2011-07-19 19:52 UTC (permalink / raw)
To: netdev; +Cc: Neil Horman, Eric Dumazet, Alexey Dobriyan, David S. Miller
This oops was reported recently when running a pktgen script that called for a
transmitted skb to be cloned and sent 100 times using the clone_skb command to a
bridge device with several attached interface:
BUG: unable to handle kernel paging request at ffff880136994528
RIP: 0010:[<ffff880136994528>] [<ffff880136994528>] 0xffff880136994528
RSP: 0018:ffff8801384e5b78 EFLAGS: 00010286
RAX: ffff880136994528 RBX: ffff880137e52800 RCX: 0000000000000000
RDX: ffffffffa0529ba0 RSI: ffff880137e52800 RDI: ffff8801369944b0
RBP: ffff8801384e5ba0 R08: ffff8801379cb49c R09: ffff8801384e5c78
R10: 0000000000000000 R11: 0000000000000400 R12: ffff880137e52ec0
R13: ffff8801369944b0 R14: ffff880136ed9480 R15: ffff880137e52800
FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffff880136994528 CR3: 00000001395f2000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kpktgend_0 (pid: 2601, threadinfo ffff8801384e4000, task
ffff880137fe5560)
Stack:
ffffffffa0527425 0000000000000000 ffff8801369944b0 ffff880137e52800
<0> ffff880136ed9480 ffff8801384e5bf0 ffffffff8141047a ffffffffa0529ba0
<0> ffff880134813e40 0000000080000000 ffff8801379cb400 ffff880137e52800
Call Trace:
[<ffffffffa0527425>] ? tun_net_xmit+0x135/0x200 [tun]
[<ffffffff8141047a>] dev_hard_start_xmit+0x20a/0x370
[<ffffffff814288aa>] sch_direct_xmit+0x15a/0x1c0
[<ffffffff81413ab8>] dev_queue_xmit+0x378/0x4a0
[<ffffffffa0488400>] ? br_dev_queue_push_xmit+0x0/0xa0 [bridge]
[<ffffffffa048846c>] br_dev_queue_push_xmit+0x6c/0xa0 [bridge]
[<ffffffffa04884f8>] br_forward_finish+0x58/0x60 [bridge]
[<ffffffffa0488690>] __br_deliver+0x60/0x70 [bridge]
[<ffffffffa04883aa>] br_flood+0xca/0xe0 [bridge]
[<ffffffffa0488630>] ? __br_deliver+0x0/0x70 [bridge]
[<ffffffffa04883f7>] br_flood_deliver+0x17/0x20 [bridge]
[<ffffffffa048748b>] br_dev_xmit+0xfb/0x100 [bridge]
[<ffffffffa053790e>] pktgen_thread_worker+0x83e/0x1bc8 [pktgen]
[<ffffffff81059d12>] ? finish_task_switch+0x42/0xd0
[<ffffffffa0487390>] ? br_dev_xmit+0x0/0x100 [bridge]
[<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa05370d0>] ? pktgen_thread_worker+0x0/0x1bc8 [pktgen]
[<ffffffff81091936>] kthread+0x96/0xa0
[<ffffffff810141ca>] child_rip+0xa/0x20
[<ffffffff810918a0>] ? kthread+0x0/0xa0
[<ffffffff810141c0>] ? child_rip+0x0/0x20
Turns out the pktgen driver doesn't actually clone skbs, but rather shares them,
increasing the reference count of the skb for each send operation. This works
for most drivers because most drivers don't store or care about any state in the
skb itself, but several do. For instance, the above tun/tap driver and other
soft drivers (vlans, bonding/bridging), all requeue frames to a physical device,
meaning the skb next and prev pointers will be set. Other drivers also care
about skb state. The virtio_net driver for instance uses the skb->cb space to
store a vnet header and several converged adapters adjust the data pointer of an
skb to prepend a device control header to the skb. Drivers expect skbs
submitted for i/o to be in their control and unshared with other users, an
assumption which pktgen is violating, the result being multiple skb users
corrupting one antohers state and producing oopses like the one above. The
solution is to make pktgen clone the skb for each transmit so as to ensure the
drivers assumptions about private exclusive access to the skb is maintained.
Tested successfully by myself
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Jiri Pirko <jpirko@redhat.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: David S. Miller <davem@davemloft.net>
---
net/core/pktgen.c | 21 +++++++++++++--------
1 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index f76079c..c8e0e08 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -3297,6 +3297,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
netdev_tx_t (*xmit)(struct sk_buff *, struct net_device *)
= odev->netdev_ops->ndo_start_xmit;
struct netdev_queue *txq;
+ struct sk_buff *tx_skb = NULL;
u16 queue_map;
int ret;
@@ -3316,9 +3317,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
/* If no skb or clone count exhausted then get new one */
if (!pkt_dev->skb || (pkt_dev->last_ok &&
- ++pkt_dev->clone_count >= pkt_dev->clone_skb)) {
- /* build a new pkt */
- kfree_skb(pkt_dev->skb);
+ ++pkt_dev->clone_count > pkt_dev->clone_skb)) {
pkt_dev->skb = fill_packet(odev, pkt_dev);
if (pkt_dev->skb == NULL) {
@@ -3332,10 +3331,13 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
pkt_dev->clone_count = 0; /* reset counter */
}
+ tx_skb = (pkt_dev->clone_count == pkt_dev->clone_skb) ?
+ pkt_dev->skb : skb_clone(pkt_dev->skb, GFP_KERNEL);
+
if (pkt_dev->delay && pkt_dev->last_ok)
spin(pkt_dev, pkt_dev->next_tx);
- queue_map = skb_get_queue_mapping(pkt_dev->skb);
+ queue_map = skb_get_queue_mapping(tx_skb);
txq = netdev_get_tx_queue(odev, queue_map);
__netif_tx_lock_bh(txq);
@@ -3345,8 +3347,8 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
pkt_dev->last_ok = 0;
goto unlock;
}
- atomic_inc(&(pkt_dev->skb->users));
- ret = (*xmit)(pkt_dev->skb, odev);
+
+ ret = (*xmit)(tx_skb, odev);
switch (ret) {
case NETDEV_TX_OK:
@@ -3369,8 +3371,11 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
/* fallthru */
case NETDEV_TX_LOCKED:
case NETDEV_TX_BUSY:
- /* Retry it next time */
- atomic_dec(&(pkt_dev->skb->users));
+ /*
+ * Only free it if its not the last in the clone series
+ */
+ if (tx_skb != pkt_dev->skb)
+ kfree_skb(tx_skb);
pkt_dev->last_ok = 0;
}
unlock:
--
1.7.6
^ permalink raw reply related
* Re: [PATCH] vhost: clean up outstanding buffers before setting vring
From: Michael S. Tsirkin @ 2011-07-19 19:49 UTC (permalink / raw)
To: Shirley Ma; +Cc: David Miller, netdev, jasowang
In-Reply-To: <1311098546.8573.13.camel@localhost.localdomain>
On Tue, Jul 19, 2011 at 11:02:26AM -0700, Shirley Ma wrote:
> The outstanding DMA buffers need to be clean up before setting vring in
> vhost. Otherwise the vring would be out of sync.
>
> Signed-off-by: Shirley Ma<xma@us.ibm.com>
I suspect what is missing is calling
vhost_zerocopy_signal_used then?
If yes we probably should do it after
changing the backend, not on vring set.
> ---
>
> drivers/vhost/vhost.c | 11 +++++++++--
> 1 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index c14c42b..d6315b4 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -445,8 +445,10 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> vhost_poll_flush(&dev->vqs[i].poll);
> }
> /* Wait for all lower device DMAs done. */
> - if (dev->vqs[i].ubufs)
> + if (dev->vqs[i].ubufs) {
> vhost_ubuf_put_and_wait(dev->vqs[i].ubufs);
> + kfree(dev->vqs[i].ubufs);
> + }
>
> /* Signal guest as appropriate. */
> vhost_zerocopy_signal_used(&dev->vqs[i]);
> @@ -651,6 +653,12 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
> vq = d->vqs + idx;
>
> mutex_lock(&vq->mutex);
> + /* Wait for all lower device DMAs done. */
> + if (vq->ubufs)
> + vhost_ubuf_put_and_wait(vq->ubufs);
Could you elaborate on the problem you observe please?
At least in theory, existing code flushes outstanding
requests when backend is changed.
And since vring set verifies no backend is active,
we should be fine?
> +
> + /* Signal guest as appropriate. */
> + vhost_zerocopy_signal_used(vq);
>
> switch (ioctl) {
> case VHOST_SET_VRING_NUM:
> @@ -1592,7 +1600,6 @@ void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *ubufs)
> {
> kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
> wait_event(ubufs->wait, !atomic_read(&ubufs->kref.refcount));
> - kfree(ubufs);
Won't this leak memory when ubufs are switched in vhost_net_set_backend?
> }
>
> void vhost_zerocopy_callback(void *arg)
>
>
>
^ permalink raw reply
* Re: [PATCH net-next]vhost: fix condition check for # of outstanding dma buffers
From: Michael S. Tsirkin @ 2011-07-19 19:09 UTC (permalink / raw)
To: Shirley Ma; +Cc: David Miller, netdev, jasowang
In-Reply-To: <1311100678.8573.16.camel@localhost.localdomain>
On Tue, Jul 19, 2011 at 11:37:58AM -0700, Shirley Ma wrote:
> Signed-off-by: Shirley Ma <xma@us.ibm.com>
> ---
>
> drivers/vhost/net.c | 6 ++++--
> 1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 70ac604..83cb738 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -189,8 +189,10 @@ static void handle_tx(struct vhost_net *net)
> break;
> }
> /* If more outstanding DMAs, queue the work */
> - if (unlikely(vq->upend_idx - vq->done_idx >
> - VHOST_MAX_PEND)) {
> + if (unlikely((vq->upend_idx - vq->done_idx >
> + VHOST_MAX_PEND) ||
> + (vq->upend_idx - vq->done_idx >
> + VHOST_MAX_PEND - UIO_MAXIOV))) {
Could you please explain why this makes sense please?
VHOST_MAX_PEND is 128 UIO_MAXIOV is 1024 so
the result is negative?
I thought upend_idx - done_idx is exactly the number
of buffers, so once we get too many we stop until
one gets freed?
> tx_poll_start(net, sock);
> set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
> break;
>
^ permalink raw reply
* Re: [PULL net-next] vhost-net fixes for 3.1
From: Michael S. Tsirkin @ 2011-07-19 19:05 UTC (permalink / raw)
To: David Miller; +Cc: netdev, jasowang, mashirle
In-Reply-To: <20110719.120007.1487641546240582444.davem@davemloft.net>
On Tue, Jul 19, 2011 at 12:00:07PM -0700, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Tue, 19 Jul 2011 21:58:13 +0300
>
> > I have snuck in another one:
> > vhost: optimize interrupt enable/disable
> >
> > please don't be alarmed :)
>
> I already pulled so send another pull request if you want me
> to get this.
You got the new one actually, so nothing needs to be done,
I think.
Thanks!
--
MST
^ permalink raw reply
* Re: [PULL net-next] vhost-net fixes for 3.1
From: David Miller @ 2011-07-19 19:00 UTC (permalink / raw)
To: mst; +Cc: netdev, jasowang, mashirle
In-Reply-To: <20110719185813.GA8632@redhat.com>
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 19 Jul 2011 21:58:13 +0300
> I have snuck in another one:
> vhost: optimize interrupt enable/disable
>
> please don't be alarmed :)
I already pulled so send another pull request if you want me
to get this.
^ permalink raw reply
* Re: [BUG] ipv6: all routes share same inetpeer
From: David Miller @ 2011-07-19 18:59 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1311101870.3113.6.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 19 Jul 2011 20:57:50 +0200
> Le mardi 19 juillet 2011 à 20:20 +0200, Eric Dumazet a écrit :
>> Le mardi 19 juillet 2011 à 10:37 -0700, David Miller a écrit :
>> > From: Eric Dumazet <eric.dumazet@gmail.com>
>> > Date: Tue, 19 Jul 2011 19:23:49 +0200
>> >
>> > > Maybe you can find the bug before me ?
>> >
>> > I think when we add the route we cow the metrics almost immediately.
>> > The daddr is, unfortunately, fully prefixed at that point.
>>
>> Yes, we shall provide a second ip6_rt_copy() argument, with the
>> destination address.
>>
>
> Hmm, or maybe just change the dst_copy_metrics(&rt->dst, &ort->dst);
> call done from ip6_rt_copy(), to avoid doing the COW if not really
> needed ?
This is ok if it handles the case where ort's metrics point to
writable inetpeer memory.
^ permalink raw reply
* Re: [BUG] ipv6: all routes share same inetpeer
From: Eric Dumazet @ 2011-07-19 18:57 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <1311099638.3113.2.camel@edumazet-laptop>
Le mardi 19 juillet 2011 à 20:20 +0200, Eric Dumazet a écrit :
> Le mardi 19 juillet 2011 à 10:37 -0700, David Miller a écrit :
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Tue, 19 Jul 2011 19:23:49 +0200
> >
> > > Maybe you can find the bug before me ?
> >
> > I think when we add the route we cow the metrics almost immediately.
> > The daddr is, unfortunately, fully prefixed at that point.
>
> Yes, we shall provide a second ip6_rt_copy() argument, with the
> destination address.
>
Hmm, or maybe just change the dst_copy_metrics(&rt->dst, &ort->dst);
call done from ip6_rt_copy(), to avoid doing the COW if not really
needed ?
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index ddef80f..5403cea 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1740,7 +1740,7 @@ static struct rt6_info * ip6_rt_copy(struct rt6_info *ort)
rt->dst.input = ort->dst.input;
rt->dst.output = ort->dst.output;
- dst_copy_metrics(&rt->dst, &ort->dst);
+ rt->dst._metrics = ort->dst._metrics;
rt->dst.error = ort->dst.error;
rt->rt6i_idev = ort->rt6i_idev;
if (rt->rt6i_idev)
^ permalink raw reply related
* Re: [PULL net-next] vhost-net fixes for 3.1
From: Michael S. Tsirkin @ 2011-07-19 18:58 UTC (permalink / raw)
To: netdev; +Cc: jasowang, mashirle
In-Reply-To: <20110719114741.GA5827@redhat.com>
On Tue, Jul 19, 2011 at 02:47:41PM +0300, Michael S. Tsirkin wrote:
> Here's Jason's fix for a theoretical event loss
> on migration (tweaked by me to update the test
> module as well), and a refcounting fix by me
> for a bug in the new zerocopy mode.
>
> I thought hard about merging Jason's patches for 3.0
> as well, but in the end I think it's not worth
> the risk of breaking something at this stage.
>
> Please pull for 3.1, thanks a lot!
>
> The following changes since commit 81fc70d86527a1450560709500ca5f52e661da1f:
>
> net: can: remove custom hex_to_bin() (2011-07-18 11:31:43 -0700)
>
> are available in the git repository at:
> git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net-next
>
> Jason Wang (2):
> vhost: init used ring after backend was set
> vhost: set log when updating used flags or avail event
>
> Michael S. Tsirkin (1):
> vhost: fix zcopy reference counting
I have snuck in another one:
vhost: optimize interrupt enable/disable
please don't be alarmed :)
> drivers/vhost/net.c | 4 ++
> drivers/vhost/test.c | 5 +++
> drivers/vhost/vhost.c | 87 ++++++++++++++++++++++++++++++------------------
> drivers/vhost/vhost.h | 1 +
> 4 files changed, 64 insertions(+), 33 deletions(-)
^ permalink raw reply
* Re: [PATCH 1/3] stmmac: unify MAC and PHY configuration parameters
From: David Miller @ 2011-07-19 18:57 UTC (permalink / raw)
To: peppe.cavallaro; +Cc: netdev, stuart.menefy
In-Reply-To: <1311064684-20246-1-git-send-email-peppe.cavallaro@st.com>
From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Tue, 19 Jul 2011 10:38:02 +0200
> Prior to this change, most PHY configuration parameters were passed
> into the STMMAC device as a separate PHY device. As well as being
> unusual, this made it difficult to make changes to the MAC/PHY
> relationship.
>
> This patch moves all the PHY parameters into the MAC configuration
> structure, mainly as a separate structure. This allows us to completly
> ignore the MDIO bus attached to a stmmac if desired, and not create
> the PHY bus. It also allows the stmmac driver to use a different PHY
> from the one it is connected to, for example a fixed PHY or bit banging
> PHY.
>
> Also derive the stmmac/PHY connection type (MII/RMII etc) from the
> mode can be passed into <platf>_configure_ethernet.
> STLinux kernel at git://git.stlinux.com/stm/linux-sh4-2.6.32.y.git
> provides several examples how to use this new infrastructure (that
> actually is easier to maintain and clearer).
>
> Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
I find these changes confusing, because I can't see where these
platform data objects are created that end up being used by
the stmmac driver.
I'm concerned about this because if you're changing these data
structures, you'll need to update also the code that creates
these platform data objects.
Finally, this patch needs to update Documentation/networking/stmmac.txt
^ permalink raw reply
* [PATCH net-next]vhost: fix condition check for # of outstanding dma buffers
From: Shirley Ma @ 2011-07-19 18:37 UTC (permalink / raw)
To: David Miller, mst; +Cc: netdev, jasowang
Signed-off-by: Shirley Ma <xma@us.ibm.com>
---
drivers/vhost/net.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 70ac604..83cb738 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -189,8 +189,10 @@ static void handle_tx(struct vhost_net *net)
break;
}
/* If more outstanding DMAs, queue the work */
- if (unlikely(vq->upend_idx - vq->done_idx >
- VHOST_MAX_PEND)) {
+ if (unlikely((vq->upend_idx - vq->done_idx >
+ VHOST_MAX_PEND) ||
+ (vq->upend_idx - vq->done_idx >
+ VHOST_MAX_PEND - UIO_MAXIOV))) {
tx_poll_start(net, sock);
set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
break;
^ permalink raw reply related
* Re: [BUG] ipv6: all routes share same inetpeer
From: Eric Dumazet @ 2011-07-19 18:20 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20110719.103724.1461298517132188126.davem@davemloft.net>
Le mardi 19 juillet 2011 à 10:37 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 19 Jul 2011 19:23:49 +0200
>
> > Maybe you can find the bug before me ?
>
> I think when we add the route we cow the metrics almost immediately.
> The daddr is, unfortunately, fully prefixed at that point.
Yes, we shall provide a second ip6_rt_copy() argument, with the
destination address.
I am testing :
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index ddef80f..2a6d70a 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -72,7 +72,7 @@
#define RT6_TRACE(x...) do { ; } while (0)
#endif
-static struct rt6_info * ip6_rt_copy(struct rt6_info *ort);
+static struct rt6_info * ip6_rt_copy(struct rt6_info *ort, const struct in6_addr *dest);
static struct dst_entry *ip6_dst_check(struct dst_entry *dst, u32 cookie);
static unsigned int ip6_default_advmss(const struct dst_entry *dst);
static unsigned int ip6_default_mtu(const struct dst_entry *dst);
@@ -699,7 +699,7 @@ static struct rt6_info *rt6_alloc_cow(struct rt6_info *ort, const struct in6_add
* Clone the route.
*/
- rt = ip6_rt_copy(ort);
+ rt = ip6_rt_copy(ort, daddr);
if (rt) {
struct neighbour *neigh;
@@ -712,7 +712,6 @@ static struct rt6_info *rt6_alloc_cow(struct rt6_info *ort, const struct in6_add
ipv6_addr_copy(&rt->rt6i_gateway, daddr);
}
- ipv6_addr_copy(&rt->rt6i_dst.addr, daddr);
rt->rt6i_dst.plen = 128;
rt->rt6i_flags |= RTF_CACHE;
rt->dst.flags |= DST_HOST;
@@ -761,9 +760,9 @@ static struct rt6_info *rt6_alloc_cow(struct rt6_info *ort, const struct in6_add
static struct rt6_info *rt6_alloc_clone(struct rt6_info *ort, const struct in6_addr *daddr)
{
- struct rt6_info *rt = ip6_rt_copy(ort);
+ struct rt6_info *rt = ip6_rt_copy(ort, daddr);
+
if (rt) {
- ipv6_addr_copy(&rt->rt6i_dst.addr, daddr);
rt->rt6i_dst.plen = 128;
rt->rt6i_flags |= RTF_CACHE;
rt->dst.flags |= DST_HOST;
@@ -1584,7 +1583,7 @@ void rt6_redirect(const struct in6_addr *dest, const struct in6_addr *src,
if (neigh == dst_get_neighbour(&rt->dst))
goto out;
- nrt = ip6_rt_copy(rt);
+ nrt = ip6_rt_copy(rt, dest);
if (nrt == NULL)
goto out;
@@ -1592,7 +1591,6 @@ void rt6_redirect(const struct in6_addr *dest, const struct in6_addr *src,
if (on_link)
nrt->rt6i_flags &= ~RTF_GATEWAY;
- ipv6_addr_copy(&nrt->rt6i_dst.addr, dest);
nrt->rt6i_dst.plen = 128;
nrt->dst.flags |= DST_HOST;
@@ -1730,7 +1728,7 @@ void rt6_pmtu_discovery(const struct in6_addr *daddr, const struct in6_addr *sad
* Misc support functions
*/
-static struct rt6_info * ip6_rt_copy(struct rt6_info *ort)
+static struct rt6_info * ip6_rt_copy(struct rt6_info *ort, const struct in6_addr *dest)
{
struct net *net = dev_net(ort->rt6i_dev);
struct rt6_info *rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops,
@@ -1740,6 +1738,8 @@ static struct rt6_info * ip6_rt_copy(struct rt6_info *ort)
rt->dst.input = ort->dst.input;
rt->dst.output = ort->dst.output;
+ ipv6_addr_copy(&rt->rt6i_dst.addr, dest);
+ rt->rt6i_dst.plen = ort->rt6i_dst.plen;
dst_copy_metrics(&rt->dst, &ort->dst);
rt->dst.error = ort->dst.error;
rt->rt6i_idev = ort->rt6i_idev;
@@ -1752,7 +1752,6 @@ static struct rt6_info * ip6_rt_copy(struct rt6_info *ort)
rt->rt6i_flags = ort->rt6i_flags & ~RTF_EXPIRES;
rt->rt6i_metric = 0;
- memcpy(&rt->rt6i_dst, &ort->rt6i_dst, sizeof(struct rt6key));
#ifdef CONFIG_IPV6_SUBTREES
memcpy(&rt->rt6i_src, &ort->rt6i_src, sizeof(struct rt6key));
#endif
^ permalink raw reply related
* IPv6: autoconfiguration and suspend/resume or link down/up
From: Jiri Bohac @ 2011-07-19 18:02 UTC (permalink / raw)
To: netdev; +Cc: Herbert Xu, David S. Miller, stephen hemminger
Hi,
I came over a surprising behaviour with IPv6 autoconfiguration,
which I think is a bug, but I would first like to hear other
people's opinions before trying to fix this:
Problem 1: all the address/route lifetimes are kept in jiffies
and jiffies don't get incremented on resume. So when a
route/address lifetime is 30 minutes and the system resumes after
1 hour, the route/address should be considered expired, but it is
not.
Problem 2: when a system is moved to a new network a RS is not
sent. Thus, IPv6 does not autoconfigure until the router sends a
periodic RA. This can occur both while the system is alive and
while it is suspended. I think the autoconfigured state should be
discarded when the kernel suspects the system could have been
moved to a different network.
When the cable is unplugged and plugged in again, we already get
notified through linkwatch -> netdev_state_change ->
-> call_netdevice_notifiers(NETDEV_CHANGE, ...)
However, if the device has already been autoconfigured,
addrconf_notify() only handles this event by printing a
message.
So my idea was to:
- handle link up/down in addrconf_notify() similarly to
NETDEV_UP/NETDEV_DOWN
- on suspend, faking a link down event; on resume, faking a link up event
(or better, having a special event type for suspend/resume)
This would cause autoconfiguration to be restarted on resume as
well as cable plug/unplug, solving both the above problems.
Or do we want to completely rely on userspace tools
(networkmanager/ifplug) and expect them to do NETDEV_DOWN on
unplug/suspend and NETDEV_UP on plug/resume?
Any thoughts?
--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ
^ permalink raw reply
* [PATCH] vhost: clean up outstanding buffers before setting vring
From: Shirley Ma @ 2011-07-19 18:02 UTC (permalink / raw)
To: David Miller, mst; +Cc: netdev, jasowang
The outstanding DMA buffers need to be clean up before setting vring in
vhost. Otherwise the vring would be out of sync.
Signed-off-by: Shirley Ma<xma@us.ibm.com>
---
drivers/vhost/vhost.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c14c42b..d6315b4 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -445,8 +445,10 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
vhost_poll_flush(&dev->vqs[i].poll);
}
/* Wait for all lower device DMAs done. */
- if (dev->vqs[i].ubufs)
+ if (dev->vqs[i].ubufs) {
vhost_ubuf_put_and_wait(dev->vqs[i].ubufs);
+ kfree(dev->vqs[i].ubufs);
+ }
/* Signal guest as appropriate. */
vhost_zerocopy_signal_used(&dev->vqs[i]);
@@ -651,6 +653,12 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
vq = d->vqs + idx;
mutex_lock(&vq->mutex);
+ /* Wait for all lower device DMAs done. */
+ if (vq->ubufs)
+ vhost_ubuf_put_and_wait(vq->ubufs);
+
+ /* Signal guest as appropriate. */
+ vhost_zerocopy_signal_used(vq);
switch (ioctl) {
case VHOST_SET_VRING_NUM:
@@ -1592,7 +1600,6 @@ void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *ubufs)
{
kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
wait_event(ubufs->wait, !atomic_read(&ubufs->kref.refcount));
- kfree(ubufs);
}
void vhost_zerocopy_callback(void *arg)
^ permalink raw reply related
* Re: [BUG] ipv6: all routes share same inetpeer
From: David Miller @ 2011-07-19 17:37 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1311096229.2375.49.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 19 Jul 2011 19:23:49 +0200
> Maybe you can find the bug before me ?
I think when we add the route we cow the metrics almost immediately.
The daddr is, unfortunately, fully prefixed at that point.
^ permalink raw reply
* [BUG] ipv6: all routes share same inetpeer
From: Eric Dumazet @ 2011-07-19 17:23 UTC (permalink / raw)
To: David Miller; +Cc: netdev
Hi David
While polishing a patch and testing it, I found that all ipv6 routes
shared the same inetpeer ! Oh well...
Apparently we call rt6_bind_peer() at wrong time, providing NULL
addresses.
Maybe you can find the bug before me ?
With following quick/dirty/debugging patch :
diff --git a/include/net/inetpeer.h b/include/net/inetpeer.h
index 39d1230..f24391c 100644
--- a/include/net/inetpeer.h
+++ b/include/net/inetpeer.h
@@ -88,6 +88,7 @@ static inline struct inet_peer *inet_getpeer_v6(const struct in6_addr *v6daddr,
ipv6_addr_copy((struct in6_addr *)daddr.addr.a6, v6daddr);
daddr.family = AF_INET6;
+ WARN_ON(daddr.addr.a6[0] == 0 && daddr.addr.a6[1] == 0 && daddr.addr.a6[2] == 0 && daddr.addr.a6[3] == 0);
return inet_getpeer(&daddr, create);
}
I get for example :
[ 299.024117] ------------[ cut here ]------------
[ 299.024176] WARNING: at include/net/inetpeer.h:91 rt6_bind_peer+0x84/0xb0()
[ 299.024234] Hardware name: ProLiant BL460c G1
[ 299.024287] Modules linked in: xt_hashlimit ipmi_devintf ipmi_si ipmi_msghandler tg3 bonding
[ 299.024583] Pid: 7119, comm: ping6 Tainted: G W 3.0.0-rc7-03555-ge798b6e-dirty #1048
[ 299.024657] Call Trace:
[ 299.024709] [<c1042b4d>] warn_slowpath_common+0x6d/0xa0
[ 299.024765] [<c1373104>] ? rt6_bind_peer+0x84/0xb0
[ 299.024820] [<c1373104>] ? rt6_bind_peer+0x84/0xb0
[ 299.024875] [<c1042b9d>] warn_slowpath_null+0x1d/0x20
[ 299.024931] [<c1373104>] rt6_bind_peer+0x84/0xb0
[ 299.024985] [<c13731ec>] ipv6_cow_metrics+0xbc/0xe0
[ 299.025046] [<c13722a8>] ip6_rt_copy+0x1e8/0x210
[ 299.025101] [<c1372a70>] rt6_alloc_cow.isra.32+0x10/0x1d0
[ 299.025158] [<c1048fb9>] ? local_bh_enable_ip+0x59/0xc0
[ 299.025213] [<c137356b>] ip6_pol_route.isra.37+0x29b/0x2a0
[ 299.025270] [<c13735a1>] ip6_pol_route_output+0x31/0x40
[ 299.025325] [<c1376277>] fib6_rule_lookup+0x17/0x20
[ 299.025380] [<c137238c>] ip6_route_output+0x5c/0xa0
[ 299.025436] [<c1373570>] ? ip6_pol_route.isra.37+0x2a0/0x2a0
[ 299.025492] [<c1365004>] ip6_dst_lookup_tail+0xd4/0xe0
[ 299.025548] [<c136519f>] ip6_dst_lookup_flow+0x2f/0x90
[ 299.025604] [<c1048fb9>] ? local_bh_enable_ip+0x59/0xc0
[ 299.025660] [<c1390ef4>] ip6_datagram_connect+0x174/0x490
[ 299.025717] [<c12bdd42>] ? release_sock+0xf2/0x150
[ 299.025772] [<c137d9a7>] ? udp_v6_get_port+0x47/0x60
[ 299.025829] [<c132f838>] inet_dgram_connect+0x28/0x70
[ 299.025884] [<c12bc5c0>] sys_connect+0x60/0xa0
[ 299.025939] [<c10d239e>] ? might_fault+0x2e/0x80
[ 299.026001] [<c13c438d>] ? _raw_spin_unlock+0x1d/0x20
[ 299.026057] [<c10d239e>] ? might_fault+0x2e/0x80
[ 299.026117] [<c10d23e4>] ? might_fault+0x74/0x80
[ 299.026172] [<c12bcfeb>] sys_socketcall+0xbb/0x2e0
[ 299.026227] [<c13c4fc3>] ? sysenter_exit+0xf/0x18
[ 299.026282] [<c11a3ec0>] ? trace_hardirqs_on_thunk+0xc/0x10
[ 299.026338] [<c13c4f90>] sysenter_do_call+0x12/0x36
[ 299.026393] ---[ end trace 53d11c892332cf99 ]---
or :
[ 299.032017] ------------[ cut here ]------------
[ 299.032072] WARNING: at include/net/inetpeer.h:91 rt6_bind_peer+0x84/0xb0()
[ 299.032130] Hardware name: ProLiant BL460c G1
[ 299.032183] Modules linked in: xt_hashlimit ipmi_devintf ipmi_si ipmi_msghandler tg3 bonding
[ 299.032482] Pid: 0, comm: kworker/0:1 Tainted: G W 3.0.0-rc7-03555-ge798b6e-dirty #1048
[ 299.032557] Call Trace:
[ 299.032614] [<c1042b4d>] warn_slowpath_common+0x6d/0xa0
[ 299.032671] [<c1373104>] ? rt6_bind_peer+0x84/0xb0
[ 299.032725] [<c1373104>] ? rt6_bind_peer+0x84/0xb0
[ 299.032780] [<c1042b9d>] warn_slowpath_null+0x1d/0x20
[ 299.032835] [<c1373104>] rt6_bind_peer+0x84/0xb0
[ 299.032890] [<c13731ec>] ipv6_cow_metrics+0xbc/0xe0
[ 299.032945] [<c1373a90>] icmp6_dst_alloc+0x1a0/0x2a0
[ 299.033001] [<c13738f0>] ? ip6_blackhole_route+0x240/0x240
[ 299.033058] [<c137a0bf>] ndisc_send_skb+0x4f/0x310
[ 299.033113] [<c137957b>] ? ndisc_fill_addr_option+0x5b/0x90
[ 299.033169] [<c137a3d2>] __ndisc_send+0x52/0x60
[ 299.033224] [<c137ad5d>] ndisc_send_ns+0x5d/0x90
[ 299.033279] [<c136b559>] ? ipv6_chk_addr+0x119/0x130
[ 299.033335] [<c137ae2f>] ndisc_solicit+0x9f/0x130
[ 299.033391] [<c12d8b8e>] neigh_timer_handler+0x10e/0x2a0
[ 299.033447] [<c105168a>] run_timer_softirq+0x13a/0x370
[ 299.033503] [<c1051608>] ? run_timer_softirq+0xb8/0x370
[ 299.033558] [<c12d8a80>] ? neigh_update+0x4c0/0x4c0
[ 299.033614] [<c1049577>] __do_softirq+0x97/0x1f0
[ 299.033674] [<c10494e0>] ? remote_softirq_receive+0x60/0x60
^ permalink raw reply related
* Re: [PATCH] slcan: remove unused 'leased', 'line' and 'pid' fields from the 'slcan' structure
From: Matvejchikov Ilya @ 2011-07-19 17:18 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: netdev
In-Reply-To: <4E25A267.4030005@hartkopp.net>
Well, it seems that it is.
2011/7/19 Oliver Hartkopp <socketcan@hartkopp.net>:
> On 19.07.2011 09:58, Matvejchikov Ilya wrote:
>> Signed-off-by: Matvejchikov Ilya <matvejchikov@gmail.com>
>> ---
>
> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
>
> Thanks Ilya!
>
> Your cleanup is obviously fixing some fallout from this patch
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=5342b77c4123ba39f911d92a813295fb3bb21f69
>
> which emerged in 2.6.32 for slip.c .
>
> Best regards,
> Oliver
>
>
>> drivers/net/can/slcan.c | 10 +---------
>> 1 files changed, 1 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/net/can/slcan.c b/drivers/net/can/slcan.c
>> index 1b49df6..805ee29 100644
>> --- a/drivers/net/can/slcan.c
>> +++ b/drivers/net/can/slcan.c
>> @@ -95,10 +95,6 @@ struct slcan {
>> unsigned long flags; /* Flag values/ mode etc */
>> #define SLF_INUSE 0 /* Channel in use */
>> #define SLF_ERROR 1 /* Parity, etc. error */
>> -
>> - unsigned char leased;
>> - dev_t line;
>> - pid_t pid;
>> };
>>
>> static struct net_device **slcan_devs;
>> @@ -462,7 +458,7 @@ static void slc_sync(void)
>> break;
>>
>> sl = netdev_priv(dev);
>> - if (sl->tty || sl->leased)
>> + if (sl->tty)
>> continue;
>> if (dev->flags & IFF_UP)
>> dev_close(dev);
>> @@ -565,8 +561,6 @@ static int slcan_open(struct tty_struct *tty)
>>
>> sl->tty = tty;
>> tty->disc_data = sl;
>> - sl->line = tty_devnum(tty);
>> - sl->pid = current->pid;
>>
>> if (!test_bit(SLF_INUSE, &sl->flags)) {
>> /* Perform the low-level SLCAN initialization. */
>> @@ -617,8 +611,6 @@ static void slcan_close(struct tty_struct *tty)
>>
>> tty->disc_data = NULL;
>> sl->tty = NULL;
>> - if (!sl->leased)
>> - sl->line = 0;
>>
>> /* Flush network side */
>> unregister_netdev(sl->dev);
>
>
^ permalink raw reply
* Re: [PULL net-next] vhost-net fixes for 3.1
From: David Miller @ 2011-07-19 17:15 UTC (permalink / raw)
To: mst; +Cc: netdev, jasowang, mashirle
In-Reply-To: <20110719114741.GA5827@redhat.com>
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 19 Jul 2011 14:47:41 +0300
> Here's Jason's fix for a theoretical event loss
> on migration (tweaked by me to update the test
> module as well), and a refcounting fix by me
> for a bug in the new zerocopy mode.
>
> I thought hard about merging Jason's patches for 3.0
> as well, but in the end I think it's not worth
> the risk of breaking something at this stage.
>
> Please pull for 3.1, thanks a lot!
Pulled, thanks!
^ permalink raw reply
* Re: [PATCH] r8169: fix sticky accepts packet bits in RxConfig.
From: David Miller @ 2011-07-19 17:11 UTC (permalink / raw)
To: romieu; +Cc: netdev, hayeswang
In-Reply-To: <20110719154025.GA7667@electric-eye.fr.zoreil.com>
From: Francois Romieu <romieu@fr.zoreil.com>
Date: Tue, 19 Jul 2011 17:40:25 +0200
> Please pull from branch 'davem-next.r8169' in repository
>
> git://git.kernel.org/pub/scm/linux/kernel/git/romieu/netdev-2.6.git davem-next.r8169
>
> to get the change below.
Pulled, thanks!
^ permalink raw reply
* Re: [RFC PATCH net-next-2.6 0/2] Automatic XPS mapping
From: Tom Herbert @ 2011-07-19 17:07 UTC (permalink / raw)
To: Ben Hutchings; +Cc: netdev, sf-linux-drivers
In-Reply-To: <1298045607.2570.17.camel@bwh-desktop>
Hi Ben,
I've finally gotten around to looking at how XPS interacts with the HW
priority queues...
On Fri, Feb 18, 2011 at 8:13 AM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> In the same way that we maintain a mapping CPUs to RX queues for RFS
> acceleration based on current IRQ affinity and the CPU topology, we can
> maintain a mapping of CPUs to TX queues for queue selection in XPS. (In
> fact this may be the same mapping.)
>
Any more progress on this? It seem like a good way to provide default
configuration for XPS that is usable.
> Questions:
> - Does this make a real difference to performance?
XPS seems to when configured correctly :-)
> (I've only barely tested this.)
> - Should there be a way to disable it?
> - Should the automatic mapping be made visible?
Yes, would be nice for this to be readable in the tx-<n> directory for
the queue. Same thing for rmap in RFS acceleration.
> (This applies RFS acceleration too.)
> - Should different mappings be allowed for different traffic classes,
> in case they have separate sets of TX interrupts with different
> affinity?
> (This applies to manual XPS configuration too.)
>
Yes. Looking at XPS and the HW traffic class support, I realized that
they don't seem to play together at all. If XPS is enabled, we don't
do the skb_tx_hash which is where the priority is taken into account.
It's probably worse than that, AFAICT XPS could be configured so that
packets are inadvertently sent on arbitrary priority queues.
I think the correct approach is to first choose a set of queues by
priority, and then among those queues perform XPS. Probably requires
some new configuration to make this hierarchy visible.
Tom
> Ben Hutchings (2):
> net: XPS: Allow driver to provide a default mapping of CPUs to TX
> queues
> sfc: Add CPU queue mapping for XPS
>
> drivers/net/sfc/efx.c | 59 +++++++++++++++++++++++++++++++-------------
> include/linux/netdevice.h | 10 +++++--
> net/Kconfig | 17 +++++++++----
> net/core/dev.c | 41 +++++++++++++++++-------------
> 4 files changed, 83 insertions(+), 44 deletions(-)
>
> --
> 1.7.3.4
>
>
> --
> Ben Hutchings, Senior Software Engineer, Solarflare Communications
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
>
>
^ permalink raw reply
* Re: [PATCH 00/10] bnx2x series
From: David Miller @ 2011-07-19 17:05 UTC (permalink / raw)
To: dmitry; +Cc: eilong, vladz, netdev
In-Reply-To: <1311076147.30287.16.camel@lb-tlvb-dmitry>
From: "Dmitry Kravkov" <dmitry@broadcom.com>
Date: Tue, 19 Jul 2011 14:49:07 +0300
> Please consider applying following series to the net-next.
All applied, thanks.
^ permalink raw reply
* Re: ath: Unable to reset channel (2412 MHz), reset status -5
From: Justin P. Mattock @ 2011-07-19 16:01 UTC (permalink / raw)
To: Mohammed Shafi
Cc: netdev@vger.kernel.org, linux-wireless,
linux-kernel@vger.kernel.org
In-Reply-To: <CAD2nsn1xEB-e33_wrGqLETZ4MwFy12gtyawtg5XHLH0=-9vs2w@mail.gmail.com>
On 07/19/2011 08:49 AM, Mohammed Shafi wrote:
> On Tue, Jul 19, 2011 at 8:55 PM, Justin P. Mattock
> <justinmattock@gmail.com> wrote:
>> On 07/18/2011 11:46 PM, Mohammed Shafi wrote:
>>>
>>> On Tue, Jul 19, 2011 at 12:05 PM, Justin P. Mattock
>>> <justinmattock@gmail.com> wrote:
>>>>
>>>> On 07/18/2011 11:25 PM, Mohammed Shafi wrote:
>>>>>
>>>>> On Tue, Jul 19, 2011 at 11:43 AM, Justin P. Mattock
>>>>> <justinmattock@gmail.com> wrote:
>>>>>>
>>>>>> seems with the latest Mainline I am getting the ath9k carpping out
>>>>>> after
>>>>>> a
>>>>>> while of streaming(dmesg below)..:
>>>>>> http://fpaste.org/D7wM/
>>>>>>
>>>>>> will try a bisect if I have the time..
>>>>>
>>>>> I will try to recreate here with 3.0.0-rc7-wl, we can get more
>>>>> information by ath9k debug=0xffffffff.
>>>>> thanks.
>>>>>
>>>>
>>>> cool!
>>>> seems to fire off randomly over here.
>>>
>>> I have got AR5416, will try with that.
>>>
>>
>> I can try the bisect, but might take a while, since external
>> activities(job). Seems this is just starting i.g. with
>> 3.0.0-rc1 I never saw this, but with 3.0.0-rc7-00125-gf560f66
>> its making noise.
>
> I am unable to get this problem in my AR5416 :( will also try
> tomorrow. I am using the card for browsing etc.
> recently there were patches for hardware code changes.
> just a guess, please check does this patch helps(or is it already
> there). i think this will be called via
> ath_set_channel->ath9k_hw_reset->ath9k_hw_init_bb and may be
> https://patchwork.kernel.org/patch/957902/
I am not seeing this in: ar5008_hw_init_bb
will apply this to the current Mainline when I get the chance and run
the system to see if I get this firing off.
(as for what I have, and was doing:(OS fedora 15)simply streaming radio
with clients running in the background twitter, facebook, thunderbird
Justin P. Mattock
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox