* Re: linux-next: some merging notes
From: Benjamin Herrenschmidt @ 2012-12-11 23:19 UTC (permalink / raw)
To: Stephen Rothwell
Cc: Linus, linux-next, LKML, Paul Mackerras, linuxppc-dev,
David Miller, netdev, Rusty Russell, Greg KH, Steven Rostedt,
Olof Johansson, Arnd Bergmann, linux-arm-kernel, Tomi Valkeinen,
N, Mugunthan V, Nathan Fontenot, Bill Pemberton
In-Reply-To: <20121212091552.02c72c8926f9f9147b080d68@canb.auug.org.au>
On Wed, 2012-12-12 at 09:15 +1100, Stephen Rothwell wrote:
> The powerpc tree
> (git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git#next)
> contains a commit that breaks the building of
> lib/pSeries-reconfig-notifier-error-inject.c. I applied a patch to
> linux-next to disable CONFIG_PSERIES_RECONFIG_NOTIFIER_ERROR_INJECT.
I will put a fix in before I send the pull request.
Cheers,
Ben.
* Re: pull request: wireless-next 2012-12-11
From: David Miller @ 2012-12-11 23:27 UTC (permalink / raw)
To: linville; +Cc: linux-wireless, netdev
In-Reply-To: <20121211215546.GA3566@tuxdriver.com>
From: "John W. Linville" <linville@tuxdriver.com>
Date: Tue, 11 Dec 2012 16:55:46 -0500
> The following changes since commit 75be437230b06fca87908a787f70de0ce7fbab8c:
>
> net: gro: avoid double copy in skb_gro_receive() (2012-12-11 13:44:09 -0500)
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next.git for-davem
Pulled, thanks John.
* Re: [net-next 0/7] bna: Driver Version Updated to 3.1.2.1
From: David Miller @ 2012-12-11 23:28 UTC (permalink / raw)
To: rmody; +Cc: netdev, bhutchings, David.Laight, adapter_linux_open_src_team
In-Reply-To: <1355264696-8927-1-git-send-email-rmody@brocade.com>
From: Rasesh Mody <rmody@brocade.com>
Date: Tue, 11 Dec 2012 14:24:49 -0800
> Hello Dave,
>
> Resubmitting the patch set with review feedback addressed.
>
> The following patch-set includes Tx Rx changes, bug fixes, firmware
> update, code cleanup and enhancements.
>
> This also updates the BNA driver to v3.1.2.1.
>
> The patches have been compiled and tested against 3.7.0-rc3.
Series applied, thanks.
* [PATCH 4/5] net: sfc: fix return value check in efx_ptp_probe_channel().
From: Cyril Roelandt @ 2012-12-12 0:24 UTC (permalink / raw)
To: linux-kernel
Cc: kernel-janitors, Cyril Roelandt, linux-net-drivers, bhutchings,
netdev
In-Reply-To: <1355271894-5284-1-git-send-email-tipecaml@gmail.com>
ptp_clock_register() returns an ERR_PTR() on failure and never returns NULL.
Replace the NULL check with a call to IS_ERR().
Signed-off-by: Cyril Roelandt <tipecaml@gmail.com>
---
drivers/net/ethernet/sfc/ptp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/sfc/ptp.c b/drivers/net/ethernet/sfc/ptp.c
index 0767043f..9bcc38c 100644
--- a/drivers/net/ethernet/sfc/ptp.c
+++ b/drivers/net/ethernet/sfc/ptp.c
@@ -930,7 +930,7 @@ static int efx_ptp_probe_channel(struct efx_channel *channel)
ptp->phc_clock = ptp_clock_register(&ptp->phc_clock_info,
&efx->pci_dev->dev);
- if (!ptp->phc_clock)
+ if (IS_ERR(ptp->phc_clock))
goto fail3;
INIT_WORK(&ptp->pps_work, efx_ptp_pps_worker);
--
1.7.10.4
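For readers less familiar with the kernel's error-pointer convention, here is a minimal userspace sketch of why a NULL test misses this kind of failure. The macros are simplified re-creations of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers, and fake_clock_register() and probe() are hypothetical stand-ins; none of this is the sfc driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace re-creations of the kernel helpers (assumption:
 * error codes occupy the top MAX_ERRNO values of the address space). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for a registration call that reports failure
 * through an encoded pointer rather than NULL. */
static void *fake_clock_register(int fail)
{
	static int clock_obj;

	return fail ? ERR_PTR(-ENOMEM) : &clock_obj;
}

/* Returns 0 on success or a negative errno, checking the way the patch does. */
static int probe(int fail)
{
	void *clk = fake_clock_register(fail);

	if (IS_ERR(clk))	/* "if (!clk)" would not catch this */
		return (int)PTR_ERR(clk);
	return 0;
}
```

The encoded error is a non-NULL pointer, so only the IS_ERR() form routes the failure to the error path.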
* Re: vlan tagged packets and libpcap breakage
From: Ani Sinha @ 2012-12-12 0:46 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev, Francesco Ruggeri, Eric W. Biederman, tcpdump-workers,
Michael Richardson
In-Reply-To: <1355267060.27891.139.camel@edumazet-glaptop>
On Tue, Dec 11, 2012 at 3:04 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2012-12-11 at 14:36 -0800, Ani Sinha wrote:
>> >
>> > It is possible to test for the presence of support of the new vlan bpf
>> > extensions by attempting to load a filter that uses them. As only valid
>> > filters can be loaded, old kernels that do not support filtering of vlan
>> > tags will fail to load a test filter which uses them.
>>
>> Unfortunately I do not see this. The sk_chk_filter() does not have a
>> default in the case statement and the check will not detect an unknown
>> instruction. It will fail when the filter is run and as far as I can see,
>> the packet will be dropped. Something like this might help?
>>
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index c23543c..96338aa 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -548,6 +548,8 @@ int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
>> return -EINVAL;
>> /* Some instructions need special checks */
>> switch (code) {
>> + /* for unknown instruction, return EINVAL */
>> + default : return -EINVAL;
>> case BPF_S_ALU_DIV_K:
>> /* check for division by zero */
>> if (ftest->k == 0)
>
> This patch is wrong.
yes I generated this patch wrong.
>
> Check lines 546, 547, 548 where we do the check for unknown instructions
>
> code = codes[code];
> if (!code)
> return -EINVAL;
yepph it's OK here.
>
> If you want to test ANCILLARY possible values, it's already too late, as
> old kernels won't use any patch anyway.
yepph, I was looking at possible ancillary values. Basically this
case statement :
#define ANCILLARY(CODE) case SKF_AD_OFF + SKF_AD_##CODE: \
code = BPF_S_ANC_##CODE; \
break
switch (ftest->k) {
ANCILLARY(PROTOCOL);
ANCILLARY(PKTTYPE);
ANCILLARY(IFINDEX);
ANCILLARY(NLATTR);
ANCILLARY(NLATTR_NEST);
ANCILLARY(MARK);
ANCILLARY(QUEUE);
ANCILLARY(HATYPE);
ANCILLARY(RXHASH);
ANCILLARY(CPU);
ANCILLARY(ALU_XOR_X);
ANCILLARY(VLAN_TAG);
ANCILLARY(VLAN_TAG_PRESENT);
}
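To make the two checks discussed in this thread concrete, here is a small userspace model. The opcode numbers, the AD_* offsets and the function names are all made up for illustration; only the shape (a lookup table where 0 means "unknown", followed by a switch with no default case) mirrors the kernel code quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical opcode numbers and internal codes, for illustration only. */
enum { OP_LD_ABS = 1, OP_RET = 2, OP_MAX = 8 };
enum { INT_LD_ABS = 1, INT_RET, INT_ANC_PROTOCOL, INT_ANC_VLAN_TAG };

/* Hypothetical "special" load offsets understood by this model. */
#define AD_OFF		(-0x1000)
#define AD_PROTOCOL	0
#define AD_VLAN_TAG	44

struct insn {
	unsigned int code;
	int k;
};

/* Unlisted opcodes map to 0, which the checker treats as "unknown". */
static const unsigned char codes[OP_MAX] = {
	[OP_LD_ABS] = INT_LD_ABS,
	[OP_RET]    = INT_RET,
};

/* vlan_aware selects whether this model "kernel" knows AD_VLAN_TAG. */
static int check_insn(struct insn *in, int vlan_aware)
{
	unsigned int code;

	if (in->code >= OP_MAX)
		return -EINVAL;
	code = codes[in->code];
	if (!code)
		return -EINVAL;		/* unknown opcode is rejected */

	if (code == INT_LD_ABS) {
		/* No default case: unknown offsets stay a plain load. */
		switch (in->k) {
		case AD_OFF + AD_PROTOCOL:
			code = INT_ANC_PROTOCOL;
			break;
		case AD_OFF + AD_VLAN_TAG:
			if (vlan_aware)
				code = INT_ANC_VLAN_TAG;
			break;
		}
	}
	in->code = code;
	return 0;
}
```

In this model an unknown opcode fails the check, but a filter using an offset the checker does not recognize is accepted as an ordinary load, so loading it cannot serve as a feature probe.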
_______________________________________________
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
* Re: [PATCH net-next v5] bridge: export multicast database via netlink
From: Stephen Hemminger @ 2012-12-12 0:48 UTC (permalink / raw)
To: Cong Wang
Cc: netdev, bridge, Herbert Xu, Jesper Dangaard Brouer, Thomas Graf,
David S. Miller
In-Reply-To: <1354874688-24564-1-git-send-email-amwang@redhat.com>
On Fri, 7 Dec 2012 18:04:48 +0800
Cong Wang <amwang@redhat.com> wrote:
> From: Cong Wang <amwang@redhat.com>
>
> V5: fix two bugs pointed out by Thomas
> remove seq check for now, mark it as TODO
>
> V4: remove some useless #include
> some coding style fix
>
> V3: drop debugging printk's
> update selinux perm table as well
>
> V2: drop patch 1/2, export ifindex directly
> Redesign netlink attributes
> Improve netlink seq check
> Handle IPv6 addr as well
>
> This patch exports bridge multicast database via netlink
> message type RTM_GETMDB. Similar to fdb, but currently bridge-specific.
> We may need to support modify multicast database too (RTM_{ADD,DEL}MDB).
>
> (Thanks to Thomas for patient reviews)
>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Thomas Graf <tgraf@suug.ch>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Cong Wang <amwang@redhat.com>
>
Applied, but required some manual fixing. It required adding if_bridge.h
to include/linux in iproute2 exported headers. Also patch still had some fuzz
against current version.
* Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
From: Ani Sinha @ 2012-12-12 0:50 UTC (permalink / raw)
To: Eric Dumazet
Cc: Eric W. Biederman, Michael Richardson, netdev, tcpdump-workers,
Francesco Ruggeri
In-Reply-To: <1355267060.27891.139.camel@edumazet-glaptop>
On Tue, Dec 11, 2012 at 3:04 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2012-12-11 at 14:36 -0800, Ani Sinha wrote:
>> >
>> > It is possible to test for the presence of support of the new vlan bpf
> If you want to test ANCILLARY possible values, it's already too late, as
> old kernels won't use any patch anyway.
>
So basically this means that if we generate a filter with these
special negative offset values and expect that the kernel will
complain if it does not recognize the newer values then we would be
wrong. And you are right. Old kernels never knew about them and the
code wasn't written in a way to return EINVAL if it didn't recognize a
special negative ancillary offset value.
* [PATCH] solos-pci: fix double-free of TX skb in DMA mode
From: David Woodhouse @ 2012-12-12 0:57 UTC (permalink / raw)
To: netdev; +Cc: nathan
[-- Attachment #1: Type: text/plain, Size: 1234 bytes --]
We weren't clearing card->tx_skb[port] when processing the TX done interrupt.
If there wasn't another skb ready to transmit immediately, this led to a
double-free because we'd free it *again* next time we did have a packet to
send.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org
---
drivers/atm/solos-pci.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c
index 6619a8a..c909b7b 100644
--- a/drivers/atm/solos-pci.c
+++ b/drivers/atm/solos-pci.c
@@ -945,10 +945,11 @@ static uint32_t fpga_tx(struct solos_card *card)
for (port = 0; tx_pending; tx_pending >>= 1, port++) {
if (tx_pending & 1) {
struct sk_buff *oldskb = card->tx_skb[port];
- if (oldskb)
+ if (oldskb) {
pci_unmap_single(card->dev, SKB_CB(oldskb)->dma_addr,
oldskb->len, PCI_DMA_TODEVICE);
-
+ card->tx_skb[port] = NULL;
+ }
spin_lock(&card->tx_queue_lock);
skb = skb_dequeue(&card->tx_queue[port]);
if (!skb)
--
1.8.0.1
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]
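The failure mode described in this patch can be modelled in a few lines of userspace C. This is only a sketch under simplifying assumptions: struct buf, struct card and tx_done() are hypothetical, the queue is one entry deep, and a counter stands in for the real free so that a double release can be observed safely.

```c
#include <assert.h>
#include <stddef.h>

#define NPORTS 2

/* Hypothetical buffer; "freed" counts releases instead of calling free(),
 * so a double release can be observed safely. */
struct buf {
	int freed;
};

struct card {
	struct buf *tx_buf[NPORTS];	/* buffer currently in flight, per port */
	struct buf *pending[NPORTS];	/* one-deep queue of the next buffer */
};

static void release(struct buf *b)
{
	b->freed++;
}

/* TX-done handling. With clear_slot == 0 this models the old behaviour,
 * where the in-flight slot kept pointing at an already released buffer. */
static void tx_done(struct card *c, int port, int clear_slot)
{
	struct buf *old = c->tx_buf[port];
	struct buf *next = c->pending[port];

	if (old && clear_slot)
		c->tx_buf[port] = NULL;

	if (next) {
		c->pending[port] = NULL;
		c->tx_buf[port] = next;	/* next buffer goes in flight */
	}
	if (old)
		release(old);
}
```

With nothing pending, two TX-done events release the same buffer twice unless the slot is cleared on the first one.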
* Re: netdevice wanrouter: Convert directly reference of netdev->priv
From: Paul Gortmaker @ 2012-12-12 0:58 UTC (permalink / raw)
To: Dan Carpenter; +Cc: wangchen, netdev, David Miller
In-Reply-To: <20121203090405.GA12089@elgon.mountain>
On Mon, Dec 3, 2012 at 4:04 AM, Dan Carpenter <dan.carpenter@oracle.com> wrote:
> Hello Wang Chen,
>
> The patch 7be6065b39c3: "netdevice wanrouter: Convert directly
> reference of netdev->priv" from Nov 20, 2008, leads to the following
> Smatch warning:
> net/wanrouter/wanmain.c:610 wanrouter_device_new_if()
> error: potential NULL dereference 'dev'.
>
> This is an old patch from 2008. It removed the allocation in
> wanrouter_device_new_if() so it looks like wanrouter has been completely
> broken for four years.
Hi Dan,
Crap -- wishing I'd seen this earlier. There was an RFC patch for
sending wanrouter to the bitbucket from Joe Perches, but aside
from the obvious build failures in it that Dave found (and I fixed)
there wasn't any real feedback (either positive or negative) to it:
http://patchwork.ozlabs.org/patch/198830/
Knowing it has been non-functional for ~4 years is I think a key
bit of information in justifying a removal, so folks like yourself
and JuliaL don't waste cycles fixing/auditing dead code. But it
will need to be 3.9 material now, it seems.
Paul.
--
>
> @@ -589,10 +591,6 @@ static int wanrouter_device_new_if(struct wan_device *wandev,
> err = -EPROTONOSUPPORT;
> goto out;
> } else {
> - dev = kzalloc(sizeof(struct net_device), GFP_KERNEL);
> - err = -ENOBUFS;
> - if (dev == NULL)
> - goto out;
> err = wandev->new_if(wandev, dev, cnf);
>
> "dev" is still NULL after the call to ->new_if().
>
> }
>
> Here is what the code looks like now:
>
> net/wanrouter/wanmain.c
> 590 if (cnf->config_id == WANCONFIG_MPPP) {
> 591 printk(KERN_INFO "%s: Wanpipe Mulit-Port PPP support has not been compiled in!\n",
> 592 wandev->name);
> 593 err = -EPROTONOSUPPORT;
> 594 goto out;
> 595 } else {
>
> We were supposed to allocate "dev" here.
>
> 596 err = wandev->new_if(wandev, dev, cnf);
> 597 }
> 598
> 599 if (!err) {
> 600 /* Register network interface. This will invoke init()
> 601 * function supplied by the driver. If device registered
> 602 * successfully, add it to the interface list.
> 603 */
> 604
> 605 #ifdef WANDEBUG
> 606 printk(KERN_INFO "%s: registering interface %s...\n",
> 607 wanrouter_modname, dev->name);
> 608 #endif
> 609
> 610 err = register_netdev(dev);
> ^^^^^^^^^^^^^^^^^^^^
>
> The kernel will always oops inside the call to register_netdev() because
> "dev" is still NULL.
>
> I suspect we should just revert the patch?
>
> regards,
> dan carpenter
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
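The reason "dev" stays NULL is plain C argument passing: the callback receives the pointer by value, so it cannot allocate a device on the caller's behalf. A minimal userspace sketch follows, with hypothetical names (struct fake_dev, new_if(), create_if()) that only mimic the shape of the wanrouter code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a network device. */
struct fake_dev {
	int configured;
};

/* Mirrors the shape of a ->new_if(wandev, dev, cnf) callback: it gets the
 * device pointer by value, so it cannot allocate one for the caller. */
static int new_if(struct fake_dev *dev)
{
	if (dev)
		dev->configured = 1;
	return 0;
}

/* alloc_first == 0 models the code after the allocation was removed. */
static struct fake_dev *create_if(int alloc_first)
{
	struct fake_dev *dev = NULL;

	if (alloc_first) {
		dev = calloc(1, sizeof(*dev));
		if (!dev)
			return NULL;
	}
	if (new_if(dev) != 0) {
		free(dev);
		return NULL;
	}
	return dev;	/* still NULL unless the caller allocated it */
}
```

Whatever the callback does internally, the caller's pointer only becomes valid if the caller allocates it first.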
* [PATCH net-next] bnx2: Fix accidental reversions.
From: Michael Chan @ 2012-12-12 2:24 UTC (permalink / raw)
To: davem; +Cc: netdev
Commit 4ce45e02469c382699f4c5f6df727aea9dd2e1ca
"bnx2: Add BNX2 prefix to CHIP ID and name macros"
accidentally reverted 2 commits to use pci_iounmap() and to make
pci_error_handlers const. This fixes those mistakes.
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index c16526d..a1adfaf 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -8572,7 +8572,7 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
return 0;
error:
- iounmap(bp->regview);
+ pci_iounmap(pdev, bp->regview);
pci_release_regions(pdev);
pci_disable_device(pdev);
pci_set_drvdata(pdev, NULL);
@@ -8750,7 +8750,7 @@ static void bnx2_io_resume(struct pci_dev *pdev)
rtnl_unlock();
}
-static struct pci_error_handlers bnx2_err_handler = {
+static const struct pci_error_handlers bnx2_err_handler = {
.error_detected = bnx2_io_error_detected,
.slot_reset = bnx2_io_slot_reset,
.resume = bnx2_io_resume,
--
1.6.4.GIT
* [PATCH net-next] pkt_sched: avoid requeues if possible
From: Eric Dumazet @ 2012-12-12 1:54 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Jamal Hadi Salim, John Fastabend
From: Eric Dumazet <edumazet@google.com>
With BQL being deployed, we are more likely to see the following behavior:
We dequeue a packet from qdisc in dequeue_skb(), then we realize target
tx queue is in XOFF state in sch_direct_xmit(), and we have to hold the
skb into gso_skb for later.
This shows in stats (tc -s qdisc dev eth0) as requeues.
The problem with these requeues is that high priority packets cannot be
dequeued as long as this (possibly low prio and big TSO packet) is not
removed from gso_skb.
At 1Gbps speed, a full size TSO packet is 500 us of extra latency.
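As a rough check of that figure, assuming a "full size TSO packet" means 64 KB (65536 bytes, which is an assumption and not stated in the changelog), the time on the wire can be computed as below. wire_time_us() is a hypothetical helper.

```c
#include <assert.h>

/* Time in microseconds to put `bytes` on a link running at `bits_per_sec`. */
static double wire_time_us(double bytes, double bits_per_sec)
{
	return bytes * 8.0 / bits_per_sec * 1e6;
}
```

For 65536 bytes at 1e9 bit/s this gives about 524 us, in line with the roughly 500 us quoted.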
In some cases, we know that all packets dequeued from a qdisc are
for a particular and known txq :
- If device is non multi queue
- For all MQ/MQPRIO slave qdiscs
This patch introduces a new qdisc flag, TCQ_F_ONETXQUEUE to mark
this capability, so that dequeue_skb() is allowed to dequeue a packet
only if the associated txq is not stopped.
This indeed reduces latencies for high prio packets (or improves fairness
with sfq/fq_codel), and almost removes qdisc 'requeues'.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
---
include/net/sch_generic.h | 7 +++++++
net/sched/sch_api.c | 2 ++
net/sched/sch_generic.c | 11 ++++++-----
net/sched/sch_mq.c | 4 +++-
net/sched/sch_mqprio.c | 4 ++++
5 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4616f46..1540f9c 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -50,6 +50,13 @@ struct Qdisc {
#define TCQ_F_INGRESS 2
#define TCQ_F_CAN_BYPASS 4
#define TCQ_F_MQROOT 8
+#define TCQ_F_ONETXQUEUE 0x10 /* dequeue_skb() can assume all skbs are for
+ * q->dev_queue : It can test
+ * netif_xmit_frozen_or_stopped() before
+ * dequeueing next packet.
+ * Its true for MQ/MQPRIO slaves, or non
+ * multiqueue device.
+ */
#define TCQ_F_WARN_NONWC (1 << 16)
int padded;
const struct Qdisc_ops *ops;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 4799c48..d84f7e7 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -833,6 +833,8 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
goto err_out3;
}
lockdep_set_class(qdisc_lock(sch), &qdisc_tx_lock);
+ if (!netif_is_multiqueue(dev))
+ sch->flags |= TCQ_F_ONETXQUEUE;
}
sch->handle = handle;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index aefc150..5d81a44 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -53,20 +53,19 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
static inline struct sk_buff *dequeue_skb(struct Qdisc *q)
{
struct sk_buff *skb = q->gso_skb;
+ const struct netdev_queue *txq = q->dev_queue;
if (unlikely(skb)) {
- struct net_device *dev = qdisc_dev(q);
- struct netdev_queue *txq;
-
/* check the reason of requeuing without tx lock first */
- txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
+ txq = netdev_get_tx_queue(txq->dev, skb_get_queue_mapping(skb));
if (!netif_xmit_frozen_or_stopped(txq)) {
q->gso_skb = NULL;
q->q.qlen--;
} else
skb = NULL;
} else {
- skb = q->dequeue(q);
+ if (!(q->flags & TCQ_F_ONETXQUEUE) || !netif_xmit_frozen_or_stopped(txq))
+ skb = q->dequeue(q);
}
return skb;
@@ -686,6 +685,8 @@ static void attach_one_default_qdisc(struct net_device *dev,
netdev_info(dev, "activation failed\n");
return;
}
+ if (!netif_is_multiqueue(dev))
+ qdisc->flags |= TCQ_F_ONETXQUEUE;
}
dev_queue->qdisc_sleeping = qdisc;
}
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 0a4b2f9..5da78a1 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -63,6 +63,7 @@ static int mq_init(struct Qdisc *sch, struct nlattr *opt)
if (qdisc == NULL)
goto err;
priv->qdiscs[ntx] = qdisc;
+ qdisc->flags |= TCQ_F_ONETXQUEUE;
}
sch->flags |= TCQ_F_MQROOT;
@@ -150,7 +151,8 @@ static int mq_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
dev_deactivate(dev);
*old = dev_graft_qdisc(dev_queue, new);
-
+ if (new)
+ new->flags |= TCQ_F_ONETXQUEUE;
if (dev->flags & IFF_UP)
dev_activate(dev);
return 0;
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index d1831ca..accec33 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -132,6 +132,7 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
goto err;
}
priv->qdiscs[i] = qdisc;
+ qdisc->flags |= TCQ_F_ONETXQUEUE;
}
/* If the mqprio options indicate that hardware should own
@@ -205,6 +206,9 @@ static int mqprio_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
*old = dev_graft_qdisc(dev_queue, new);
+ if (new)
+ new->flags |= TCQ_F_ONETXQUEUE;
+
if (dev->flags & IFF_UP)
dev_activate(dev);
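The effect of the new flag can be illustrated with a toy model of the dequeue path. Everything here is hypothetical (struct toy_qdisc, toy_xmit_one(), integer packet ids) except the flag's bit value, which is taken from the patch; it is a sketch of the idea, not the sch_generic.c logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define F_ONETXQUEUE 0x10	/* same bit value as in the patch; the rest is a toy model */
#define QLEN 4

struct toy_txq {
	bool stopped;		/* models netif_xmit_frozen_or_stopped() */
};

struct toy_qdisc {
	unsigned int flags;
	struct toy_txq *txq;	/* the only txq this qdisc feeds, if flagged */
	int pkts[QLEN];
	int head, len;
	int held;		/* models gso_skb: 0 means empty */
	int requeues;
};

static int toy_dequeue(struct toy_qdisc *q)
{
	int pkt;

	if (q->len == 0)
		return 0;
	pkt = q->pkts[q->head];
	q->head = (q->head + 1) % QLEN;
	q->len--;
	return pkt;
}

/* Returns a packet id to transmit, or 0 if nothing can be sent now. */
static int toy_xmit_one(struct toy_qdisc *q)
{
	int pkt = 0;

	if (q->held) {
		if (!q->txq->stopped) {
			pkt = q->held;
			q->held = 0;
		}
		return pkt;
	}
	/* The new test: leave packets in the qdisc while the txq is stopped. */
	if (!(q->flags & F_ONETXQUEUE) || !q->txq->stopped)
		pkt = toy_dequeue(q);

	if (pkt && q->txq->stopped) {
		q->held = pkt;	/* dequeued too early: park it */
		q->requeues++;
		pkt = 0;
	}
	return pkt;
}
```

Without the flag, a stopped txq causes one packet to be pulled out and parked, which blocks anything queued behind it; with the flag, the packet stays in the qdisc where the scheduler can still reorder around it.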
* Re: [PATCH net-next] bnx2: Fix accidental reversions.
From: David Miller @ 2012-12-12 2:28 UTC (permalink / raw)
To: mchan; +Cc: netdev
In-Reply-To: <1355279060-24192-1-git-send-email-mchan@broadcom.com>
From: "Michael Chan" <mchan@broadcom.com>
Date: Tue, 11 Dec 2012 18:24:20 -0800
> Commit 4ce45e02469c382699f4c5f6df727aea9dd2e1ca
> "bnx2: Add BNX2 prefix to CHIP ID and name macros"
>
> accidentally reverted 2 commits to use pci_iounmap() and to make
> pci_error_handlers const. This fixes those mistakes.
>
> Signed-off-by: Michael Chan <mchan@broadcom.com>
Applied, thanks.
* Re: [PATCH net-next 4/7] openvswitch: add ipv6 'set' action
From: Tom Herbert @ 2012-12-12 3:14 UTC (permalink / raw)
To: Jesse Gross
Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
David Miller
In-Reply-To: <1354214149-33651-5-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
> This patch adds ipv6 set action functionality. It allows to change
> traffic class, flow label, hop-limit, ipv6 source and destination
> address fields.
>
I have to wonder about these patches and the underlying design
direction. Aren't these sorts of things and more already implemented
by iptables, but in a modular and extensible fashion? Has there been
any thought given to hooking OVS into iptables to leverage all the
existing functionality?
Thanks,
Tom
> Signed-off-by: Ansis Atteka <aatteka-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
> ---
> net/openvswitch/actions.c | 93 ++++++++++++++++++++++++++++++++++++++++++++
> net/openvswitch/datapath.c | 20 ++++++++++
> 2 files changed, 113 insertions(+)
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 0811447..a58ed27 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -28,6 +28,7 @@
> #include <linux/if_arp.h>
> #include <linux/if_vlan.h>
> #include <net/ip.h>
> +#include <net/ipv6.h>
> #include <net/checksum.h>
> #include <net/dsfield.h>
>
> @@ -162,6 +163,53 @@ static void set_ip_addr(struct sk_buff *skb, struct iphdr *nh,
> *addr = new_addr;
> }
>
> +static void update_ipv6_checksum(struct sk_buff *skb, u8 l4_proto,
> + __be32 addr[4], const __be32 new_addr[4])
> +{
> + int transport_len = skb->len - skb_transport_offset(skb);
> +
> + if (l4_proto == IPPROTO_TCP) {
> + if (likely(transport_len >= sizeof(struct tcphdr)))
> + inet_proto_csum_replace16(&tcp_hdr(skb)->check, skb,
> + addr, new_addr, 1);
> + } else if (l4_proto == IPPROTO_UDP) {
> + if (likely(transport_len >= sizeof(struct udphdr))) {
> + struct udphdr *uh = udp_hdr(skb);
> +
> + if (uh->check || skb->ip_summed == CHECKSUM_PARTIAL) {
> + inet_proto_csum_replace16(&uh->check, skb,
> + addr, new_addr, 1);
> + if (!uh->check)
> + uh->check = CSUM_MANGLED_0;
> + }
> + }
> + }
> +}
> +
> +static void set_ipv6_addr(struct sk_buff *skb, u8 l4_proto,
> + __be32 addr[4], const __be32 new_addr[4],
> + bool recalculate_csum)
> +{
> + if (recalculate_csum)
> + update_ipv6_checksum(skb, l4_proto, addr, new_addr);
> +
> + skb->rxhash = 0;
> + memcpy(addr, new_addr, sizeof(__be32[4]));
> +}
> +
> +static void set_ipv6_tc(struct ipv6hdr *nh, u8 tc)
> +{
> + nh->priority = tc >> 4;
> + nh->flow_lbl[0] = (nh->flow_lbl[0] & 0x0F) | ((tc & 0x0F) << 4);
> +}
> +
> +static void set_ipv6_fl(struct ipv6hdr *nh, u32 fl)
> +{
> + nh->flow_lbl[0] = (nh->flow_lbl[0] & 0xF0) | (fl & 0x000F0000) >> 16;
> + nh->flow_lbl[1] = (fl & 0x0000FF00) >> 8;
> + nh->flow_lbl[2] = fl & 0x000000FF;
> +}
> +
> static void set_ip_ttl(struct sk_buff *skb, struct iphdr *nh, u8 new_ttl)
> {
> csum_replace2(&nh->check, htons(nh->ttl << 8), htons(new_ttl << 8));
> @@ -195,6 +243,47 @@ static int set_ipv4(struct sk_buff *skb, const struct ovs_key_ipv4 *ipv4_key)
> return 0;
> }
>
> +static int set_ipv6(struct sk_buff *skb, const struct ovs_key_ipv6 *ipv6_key)
> +{
> + struct ipv6hdr *nh;
> + int err;
> + __be32 *saddr;
> + __be32 *daddr;
> +
> + err = make_writable(skb, skb_network_offset(skb) +
> + sizeof(struct ipv6hdr));
> + if (unlikely(err))
> + return err;
> +
> + nh = ipv6_hdr(skb);
> + saddr = (__be32 *)&nh->saddr;
> + daddr = (__be32 *)&nh->daddr;
> +
> + if (memcmp(ipv6_key->ipv6_src, saddr, sizeof(ipv6_key->ipv6_src)))
> + set_ipv6_addr(skb, ipv6_key->ipv6_proto, saddr,
> + ipv6_key->ipv6_src, true);
> +
> + if (memcmp(ipv6_key->ipv6_dst, daddr, sizeof(ipv6_key->ipv6_dst))) {
> + unsigned int offset = 0;
> + int flags = IP6_FH_F_SKIP_RH;
> + bool recalc_csum = true;
> +
> + if (ipv6_ext_hdr(nh->nexthdr))
> + recalc_csum = ipv6_find_hdr(skb, &offset,
> + NEXTHDR_ROUTING, NULL,
> + &flags) != NEXTHDR_ROUTING;
> +
> + set_ipv6_addr(skb, ipv6_key->ipv6_proto, daddr,
> + ipv6_key->ipv6_dst, recalc_csum);
> + }
> +
> + set_ipv6_tc(nh, ipv6_key->ipv6_tclass);
> + set_ipv6_fl(nh, ntohl(ipv6_key->ipv6_label));
> + nh->hop_limit = ipv6_key->ipv6_hlimit;
> +
> + return 0;
> +}
> +
> /* Must follow make_writable() since that can move the skb data. */
> static void set_tp_port(struct sk_buff *skb, __be16 *port,
> __be16 new_port, __sum16 *check)
> @@ -347,6 +436,10 @@ static int execute_set_action(struct sk_buff *skb,
> err = set_ipv4(skb, nla_data(nested_attr));
> break;
>
> + case OVS_KEY_ATTR_IPV6:
> + err = set_ipv6(skb, nla_data(nested_attr));
> + break;
> +
> case OVS_KEY_ATTR_TCP:
> err = set_tcp(skb, nla_data(nested_attr));
> break;
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 4c4b62c..fd4a6a4 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -479,6 +479,7 @@ static int validate_set(const struct nlattr *a,
>
> switch (key_type) {
> const struct ovs_key_ipv4 *ipv4_key;
> + const struct ovs_key_ipv6 *ipv6_key;
>
> case OVS_KEY_ATTR_PRIORITY:
> case OVS_KEY_ATTR_ETHERNET:
> @@ -500,6 +501,25 @@ static int validate_set(const struct nlattr *a,
>
> break;
>
> + case OVS_KEY_ATTR_IPV6:
> + if (flow_key->eth.type != htons(ETH_P_IPV6))
> + return -EINVAL;
> +
> + if (!flow_key->ip.proto)
> + return -EINVAL;
> +
> + ipv6_key = nla_data(ovs_key);
> + if (ipv6_key->ipv6_proto != flow_key->ip.proto)
> + return -EINVAL;
> +
> + if (ipv6_key->ipv6_frag != flow_key->ip.frag)
> + return -EINVAL;
> +
> + if (ntohl(ipv6_key->ipv6_label) & 0xFFF00000)
> + return -EINVAL;
> +
> + break;
> +
> case OVS_KEY_ATTR_TCP:
> if (flow_key->ip.proto != IPPROTO_TCP)
> return -EINVAL;
> --
> 1.7.9.5
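For anyone following the bit manipulation in set_ipv6_tc() and set_ipv6_fl(), here is a userspace sketch of the same packing done on the raw first four bytes of an IPv6 header (4 bits version, 8 bits traffic class, 20 bits flow label). struct ip6_word0 and the helper names are hypothetical; b[1] plays the role of flow_lbl[0] in the quoted patch, and the low nibble of b[0] plays the role of the priority bitfield.

```c
#include <assert.h>
#include <stdint.h>

/* First four bytes of an IPv6 header, in wire order:
 * 4 bits version, 8 bits traffic class, 20 bits flow label. */
struct ip6_word0 {
	uint8_t b[4];
};

static void set_tc(struct ip6_word0 *h, uint8_t tc)
{
	h->b[0] = (uint8_t)((h->b[0] & 0xF0) | (tc >> 4));
	h->b[1] = (uint8_t)((h->b[1] & 0x0F) | ((tc & 0x0F) << 4));
}

static void set_fl(struct ip6_word0 *h, uint32_t fl)
{
	h->b[1] = (uint8_t)((h->b[1] & 0xF0) | ((fl & 0x000F0000) >> 16));
	h->b[2] = (uint8_t)((fl & 0x0000FF00) >> 8);
	h->b[3] = (uint8_t)(fl & 0x000000FF);
}

/* Same range test as the validate_set() hunk: only 20 bits are valid. */
static int fl_valid(uint32_t fl)
{
	return (fl & 0xFFF00000) == 0;
}
```

The traffic class straddles the first two bytes, which is why both setters have to mask and merge the shared byte rather than overwrite it.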
* Re: [PATCH] tun: allow setting ethernet addresss while running
From: Jan Engelhardt @ 2012-12-12 3:27 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: davem, netdev, jasowang
In-Reply-To: <1355188560-8388-1-git-send-email-shemminger@vyatta.com>
On Tuesday 2012-12-11 02:16, Stephen Hemminger wrote:
>This is a pure software device, and ok with live address change.
>--- a/drivers/net/tun.c
>+++ b/drivers/net/tun.c
>@@ -849,6 +849,7 @@ static void tun_net_init(struct net_device *dev)
> /* Ethernet TAP Device */
> ether_setup(dev);
> dev->priv_flags &= ~IFF_TX_SKB_SHARING;
>+ dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
>
> eth_hw_addr_random(dev);
Would this possibly apply to L2TP devices as well?
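As background for the question, the gating that such a flag provides can be sketched in userspace. My understanding is that the generic Ethernet set-address helper returns -EBUSY for a running device unless the flag is set, but treat that as an assumption; struct toy_netdev, toy_set_mac() and the bit value below are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define TOY_LIVE_ADDR_CHANGE 0x1	/* hypothetical bit, not the kernel's value */

struct toy_netdev {
	unsigned int priv_flags;
	bool running;
	unsigned char addr[6];
};

/* Refuse to change the address of a running device unless it opted in. */
static int toy_set_mac(struct toy_netdev *dev, const unsigned char *mac)
{
	if (dev->running && !(dev->priv_flags & TOY_LIVE_ADDR_CHANGE))
		return -EBUSY;
	memcpy(dev->addr, mac, sizeof(dev->addr));
	return 0;
}
```

Any purely software device with no hardware filter to reprogram is a plausible candidate for opting in the same way.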
* Re: [PATCH net-next rfc 0/2] Allow unpriveledge user to disable tuntap queue
From: Jason Wang @ 2012-12-12 3:29 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: pmoore, netdev, linux-kernel, mprivozn
In-Reply-To: <20121211124616.GC15435@redhat.com>
On 12/11/2012 08:46 PM, Michael S. Tsirkin wrote:
> On Tue, Dec 11, 2012 at 07:03:45PM +0800, Jason Wang wrote:
>> This series is an rfc that tries to solve the issue that the queues of tuntap
>> could not be disabled/enabled by an unprivileged user. This is needed for
>> unprivileged userspace such as qemu since the guest may change the number of queues
>> at any time, and qemu needs to configure the tuntap to disable/enable a specific
>> queue.
>>
>> Instead of introducing new flags/ioctls, this series tries to re-use the current
>> TUNSETQUEUE and IFF_ATTACH_QUEUE/IFF_DETACH_QUEUE. After this change,
>> IFF_DETACH_QUEUE is used to disable a specific queue instead of detaching all
>> its state from tuntap. IFF_ATTACH_QUEUE is used to: 1) create a new queue on
>> a tuntap device; in this situation, the previous DAC check is still done. 2)
>> re-enable a queue previously disabled by IFF_DETACH_QUEUE; in this situation,
>> we can bypass some of the checks done during queue creation (which checks need to
>> be done here needs discussion).
>>
>> Management software (such as libvirt) then can do:
>> - TUNSETIFF to create the device and queue 0
>> - TUNSETQUEUE to create the rest of the queues
>> - Pass them to unprivileged userspace (such as qemu)
> Sorry I find this somewhat confusing.
> Why doesn't management call TUNSETIFF to create all queues -
> seems cleaner, no? Also has the advantage that it works
> without selinux changes.
The issue is how to return those fds through TUNSETIFF. It looks like
there's no space in ifreq for TUNSETIFF, so we would need another new
ioctl to do this.
>
> So why don't we simply fix TUNSETQUEUE such that
> 1. It only works if already attached to device by TUNSETIFF
> 2. It does not attach/detach, instead simply enables/disables the queue
This is just what this patch does; the only difference is that when
TUNSETQUEUE is called through an fd that is not attached to the device,
it is used to create the queue.
> This way no new flags, just tweak the semantics of the
> existing ones. Need to do this before 3.8 is out though
> otherwise we'll end up maintaining the old semantics forever.
>
Yes, I will try to solve this issue soon.
>> Then the unprivileged userspace can enable and disable a specific queue through
>> IFF_ATTACH_QUEUE and IFF_DETACH_QUEUE.
>>
>> This is done by introducing an enabled flag that tracks whether the
>> queue is enabled; tuntap only sends/receives packets on a queue while it is enabled.
>>
>> Please comment, thanks!
>>
>> Jason Wang (2):
>> tuntap: forbid calling TUNSETQUEUE for a persistent device with no
>> queues
>> tuntap: allow unpriveledge user to enable and disable queues
>>
>> drivers/net/tun.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++---
>> 1 files changed, 73 insertions(+), 5 deletions(-)
* Re: [PATCH net-next rfc 2/2] tuntap: allow unpriveledge user to enable and disable queues
From: Jason Wang @ 2012-12-12 3:34 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: pmoore, netdev, linux-kernel, mprivozn
In-Reply-To: <20121211123012.GB15435@redhat.com>
On 12/11/2012 08:30 PM, Michael S. Tsirkin wrote:
> On Tue, Dec 11, 2012 at 07:03:47PM +0800, Jason Wang wrote:
>> Currently, when a file is attached to tuntap through TUNSETQUEUE, the uid/gid
>> and CAP_NET_ADMIN are checked, and we use this ioctl to create and destroy
>> queues. Sometimes, userspace such as qemu needs the ability to enable and
>> disable a specific queue without privilege, since the guest operating system may
>> change the number of queues it wants to use.
>>
>> To support this, this patch introduces a flag, enabled, which is
>> used to track whether the queue is enabled by userspace. It also restricts a
>> queue to attaching to only one device. With this patch, the DAC
>> checking when adding queues through IFF_ATTACH_QUEUE is still done, and after
>> this, IFF_DETACH_QUEUE/IFF_ATTACH_QUEUE can be used to disable/enable this
>> queue.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/net/tun.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++-----
>> 1 files changed, 73 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index d593f56..43831a7 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -138,6 +138,7 @@ struct tun_file {
>> /* only used for fasnyc */
>> unsigned int flags;
>> u16 queue_index;
>> + bool enabled;
>> };
>>
>> struct tun_flow_entry {
>> @@ -345,9 +346,11 @@ unlock:
>> static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
>> {
>> struct tun_struct *tun = netdev_priv(dev);
>> + struct tun_file *tfile;
>> struct tun_flow_entry *e;
>> u32 txq = 0;
>> u32 numqueues = 0;
>> + int i;
>>
>> rcu_read_lock();
>> numqueues = tun->numqueues;
>> @@ -366,6 +369,19 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
>> txq -= numqueues;
>> }
>>
>> + tfile = rcu_dereference(tun->tfiles[txq]);
>> + if (unlikely(!tfile->enabled))
> This unlikely tag is suspicious. It should be perfectly
> legal to use less queues than created.
Ok. will remove this check.
>
>> + /* tun_detach() should make sure there's at least one queue
>> + * that can be used for transmission.
>> + */
>> + for (i = 0; i < numqueues; i++) {
>> + tfile = rcu_dereference(tun->tfiles[i]);
>> + if (tfile->enabled) {
>> + txq = i;
>> + break;
>> + }
>> + }
>> +
> Worst case this will do a linear scan over all queues on each packet.
> Instead, I think we need a list of all queues and only install
> the active ones in the array.
Another approach is to use an extra variable, e.g. active_queues, to track
how many queues are enabled, and to reshuffle the pointers during
detach/attach so that [0, active_queues) holds the enabled queues and
[active_queues, num_queues) holds the disabled ones. That would avoid
this issue.
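
A minimal userspace sketch of the partitioned-array idea described above, assuming a fixed-size pointer array. The names struct queue, struct device, queue_attach(), queue_enable(), queue_disable() and select_queue() are hypothetical and are not taken from tun.c; RCU and locking are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUES 8

/* Hypothetical stand-ins for tun_file / tun_struct, not the kernel types. */
struct queue {
	int id;
	size_t index;           /* current slot in the owner's array */
};

struct device {
	struct queue *slots[MAX_QUEUES];
	size_t num_queues;      /* attached queues */
	size_t active_queues;   /* slots [0, active_queues) are enabled */
};

/* Exchange two slots and keep each queue's back-index in sync. */
static void swap_slots(struct device *dev, size_t a, size_t b)
{
	struct queue *tmp = dev->slots[a];

	dev->slots[a] = dev->slots[b];
	dev->slots[b] = tmp;
	dev->slots[a]->index = a;
	dev->slots[b]->index = b;
}

static bool queue_is_enabled(const struct device *dev, const struct queue *q)
{
	return q->index < dev->active_queues;
}

/* Attach a queue in the enabled state. Returns 0, or -1 if the array is full. */
static int queue_attach(struct device *dev, struct queue *q)
{
	if (dev->num_queues == MAX_QUEUES)
		return -1;
	q->index = dev->num_queues;
	dev->slots[dev->num_queues++] = q;
	/* Move the new queue to the end of the enabled prefix. */
	swap_slots(dev, q->index, dev->active_queues);
	dev->active_queues++;
	return 0;
}

/* Enabling an already enabled queue is a no-op. */
static void queue_enable(struct device *dev, struct queue *q)
{
	if (queue_is_enabled(dev, q))
		return;
	swap_slots(dev, q->index, dev->active_queues);
	dev->active_queues++;
}

/* Disabling an already disabled queue is a no-op. */
static void queue_disable(struct device *dev, struct queue *q)
{
	if (!queue_is_enabled(dev, q))
		return;
	dev->active_queues--;
	swap_slots(dev, q->index, dev->active_queues);
}

/* Transmit path: constant-time pick among enabled queues, no scan. */
static struct queue *select_queue(const struct device *dev, unsigned int hash)
{
	if (dev->active_queues == 0)
		return NULL;
	return dev->slots[hash % dev->active_queues];
}
```

In this sketch select_queue() never sees a disabled queue, so the per-packet loop goes away; the cost moves to enable/disable, which are a single swap each.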
>
>> rcu_read_unlock();
>> return txq;
>> }
>> @@ -386,6 +402,36 @@ static void tun_set_real_num_queues(struct tun_struct *tun)
>> netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
>> }
>>
>> +static int tun_enable(struct tun_file *tfile)
>> +{
>> + if (tfile->enabled == true)
> simply if (tfile->enabled)
Right.
>> + return -EINVAL;
> Actually it's better to have operations be
> idempotent. If it's enabled, enabling should
> be a NOP not an error.
Ok.
>> +
>> + tfile->enabled = true;
>> + return 0;
>> +}
>> +
>> +static int tun_disable(struct tun_file *tfile)
>> +{
>> + struct tun_struct *tun = rcu_dereference_protected(tfile->tun,
>> + lockdep_rtnl_is_held());
>> + u16 index = tfile->queue_index;
>> +
>> + if (!tun)
>> + return -EINVAL;
>> +
>> + if (tun->numqueues == 1)
>> + return -EINVAL;
> So if there's a single queue we can't disable it,
> but if there are > 1 we can disable them all.
> This seems arbitrary.
>
The question is whether we should allow userspace to disable all queues,
which looks useless to me, so I tried to forbid it.
>> +
>> + BUG_ON(index >= tun->numqueues);
>> + tfile->enabled = false;
>> +
>> + synchronize_net();
>> + tun_flow_delete_by_queue(tun, index);
>> +
>> + return 0;
>> +}
>> +
>> static void __tun_detach(struct tun_file *tfile, bool clean)
>> {
>> struct tun_file *ntfile;
>> @@ -446,6 +492,7 @@ static void tun_detach_all(struct net_device *dev)
>> BUG_ON(!tfile);
>> wake_up_all(&tfile->wq.wait);
>> rcu_assign_pointer(tfile->tun, NULL);
>> + tfile->enabled = false;
>> --tun->numqueues;
>> }
>> BUG_ON(tun->numqueues != 0);
>> @@ -490,6 +537,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
>> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>> sock_hold(&tfile->sk);
>> tun->numqueues++;
>> + tfile->enabled = true;
>>
>> tun_set_real_num_queues(tun);
>>
>> @@ -672,6 +720,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>> if (txq >= tun->numqueues)
>> goto drop;
>>
>> + /* Drop packet if the queue was not enabled */
>> + if (!tfile->enabled)
>> + goto drop;
>> +
>> tun_debug(KERN_INFO, tun, "tun_net_xmit %d\n", skb->len);
>>
>> BUG_ON(!tfile);
>> @@ -1010,6 +1062,9 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>> bool zerocopy = false;
>> int err;
>>
>> + if (!tfile->enabled)
>> + return -EINVAL;
>> +
>> if (!(tun->flags & TUN_NO_PI)) {
>> if ((len -= sizeof(pi)) > total_len)
>> return -EINVAL;
>> @@ -1199,6 +1254,9 @@ static ssize_t tun_put_user(struct tun_struct *tun,
>> struct tun_pi pi = { 0, skb->protocol };
>> ssize_t total = 0;
>>
>> + if (!tfile->enabled)
>> + return -EINVAL;
>> +
>> if (!(tun->flags & TUN_NO_PI)) {
>> if ((len -= sizeof(pi)) < 0)
>> return -EINVAL;
>> @@ -1769,15 +1827,21 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
>> if (dev->netdev_ops != &tap_netdev_ops &&
>> dev->netdev_ops != &tun_netdev_ops)
>> ret = -EINVAL;
>> - else if (tun_not_capable(tun))
>> - ret = -EPERM;
>> - /* TUNSETIFF is needed to do permission checking */
>> - else if (tun->numqueues == 0)
>> - ret = -EPERM;
>> - else
>> - ret = tun_attach(tun, file);
>> + else {
>> + if (!rcu_dereference(tfile->tun)) {
> Should be rcu_dereference_protected.
True.
>
>> + if (tun_not_capable(tun) ||
>> + tun->numqueues == 0)
>> + ret = -EPERM;
>> + else
>> + ret = tun_attach(tun, file);
>> + }
>> + else {
>> + /* FIXME: permission check? */
>> + ret = tun_enable(tfile);
>> + }
>> + }
>> } else if (ifr->ifr_flags & IFF_DETACH_QUEUE)
>> - __tun_detach(tfile, false);
>> + tun_disable(tfile);
>> else
>> ret = -EINVAL;
>>
>> @@ -2085,6 +2149,7 @@ static int tun_chr_open(struct inode *inode, struct file * file)
>> tfile->socket.file = file;
>> tfile->socket.ops = &tun_socket_ops;
>>
>> + tfile->enabled = false;
>> sock_init_data(&tfile->socket, &tfile->sk);
>> sk_change_net(&tfile->sk, tfile->net);
>>
>> --
>> 1.7.1
^ permalink raw reply
* Re: [PATCH 4/5] net: sfc: fix return value check in efx_ptp_probe_channel().
From: David Miller @ 2012-12-12 5:15 UTC (permalink / raw)
To: tipecaml
Cc: linux-kernel, kernel-janitors, linux-net-drivers, bhutchings,
netdev
In-Reply-To: <1355271894-5284-5-git-send-email-tipecaml@gmail.com>
From: Cyril Roelandt <tipecaml@gmail.com>
Date: Wed, 12 Dec 2012 01:24:53 +0100
> The ptp_clock_register() returns ERR_PTR() and never returns NULL. Replace the
> NULL check by a call to IS_ERR().
>
> Signed-off-by: Cyril Roelandt <tipecaml@gmail.com>
I'll let Ben queue this up.
Probably he'll want to avoid potentially leaving an ERR_PTR
in ptp->phc_clock even if, with this fix, that would be
harmless.
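
A minimal userspace sketch of the pattern discussed above, assuming re-created versions of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() helpers. struct clock, struct ptp_data, clock_register() and probe() are hypothetical stand-ins, not the sfc driver's code; the point is only that the result is checked in a local variable so the long-lived field never holds an error pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct clock { int id; };                       /* hypothetical */
struct ptp_data { struct clock *phc_clock; };   /* hypothetical */

static struct clock the_clock = { 1 };

/* Hypothetical register call: returns ERR_PTR() on failure, never NULL. */
static struct clock *clock_register(int fail)
{
	return fail ? ERR_PTR(-ENOMEM) : &the_clock;
}

static int probe(struct ptp_data *ptp, int fail)
{
	struct clock *clk = clock_register(fail);

	if (IS_ERR(clk)) {
		/* Keep the field NULL instead of storing the error pointer. */
		ptp->phc_clock = NULL;
		return (int)PTR_ERR(clk);
	}
	ptp->phc_clock = clk;
	return 0;
}
```

With this shape, later code that tests ptp->phc_clock against NULL keeps working after a failed registration.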
^ permalink raw reply
* Re: [PATCH] solos-pci: fix double-free of TX skb in DMA mode
From: David Miller @ 2012-12-12 5:23 UTC (permalink / raw)
To: dwmw2; +Cc: netdev, nathan
In-Reply-To: <1355273834.23544.37.camel@shinybook.infradead.org>
From: David Woodhouse <dwmw2@infradead.org>
Date: Wed, 12 Dec 2012 00:57:14 +0000
> We weren't clearing card->tx_skb[port] when processing the TX done interrupt.
> If there wasn't another skb ready to transmit immediately, this led to a
> double-free because we'd free it *again* next time we did have a packet to
> send.
>
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> Cc: stable@kernel.org
Acked-by: David S. Miller <davem@davemloft.net>
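
A minimal userspace sketch of the bug class described in the quoted changelog, assuming a per-port slot that tracks the buffer in flight. struct buf, struct card, release_buf() and tx_done() are hypothetical and much simpler than the solos-pci driver; the frees counter exists only so the behavior can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_PORTS 2

struct buf { int len; };   /* hypothetical stand-in for struct sk_buff */

struct card {
	struct buf *tx_skb[NUM_PORTS];  /* buffer currently in flight, per port */
	int frees;                      /* buffers released so far (illustration only) */
};

static void release_buf(struct card *card, struct buf *b)
{
	free(b);
	card->frees++;
}

/* TX-done handling: release the finished buffer, then start the next one, if any. */
static void tx_done(struct card *card, int port, struct buf *next)
{
	struct buf *old = card->tx_skb[port];

	/*
	 * Clear the slot before releasing. If this is skipped and no
	 * 'next' buffer is queued, the stale pointer stays in the slot
	 * and is released a second time on the following call.
	 */
	card->tx_skb[port] = NULL;
	if (old)
		release_buf(card, old);
	if (next)
		card->tx_skb[port] = next;
}
```

Calling tx_done() twice in a row with no new buffer releases the old one exactly once in this sketch.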
^ permalink raw reply
* Re: [PATCH] solos-pci: fix double-free of TX skb in DMA mode
From: David Miller @ 2012-12-12 5:24 UTC (permalink / raw)
To: dwmw2; +Cc: netdev, nathan
In-Reply-To: <20121212.002345.2154152659215725592.davem@davemloft.net>
From: David Miller <davem@davemloft.net>
Date: Wed, 12 Dec 2012 00:23:45 -0500 (EST)
> From: David Woodhouse <dwmw2@infradead.org>
> Date: Wed, 12 Dec 2012 00:57:14 +0000
>
>> We weren't clearing card->tx_skb[port] when processing the TX done interrupt.
>> If there wasn't another skb ready to transmit immediately, this led to a
>> double-free because we'd free it *again* next time we did have a packet to
>> send.
>>
>> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
>> Cc: stable@kernel.org
>
> Acked-by: David S. Miller <davem@davemloft.net>
Sorry, fingers slipped, I meant "Applied" :-)
^ permalink raw reply
* Re: [PATCH net-next] pkt_sched: avoid requeues if possible
From: David Miller @ 2012-12-12 5:24 UTC (permalink / raw)
To: erdnetdev; +Cc: netdev, jhs, john.r.fastabend
In-Reply-To: <1355277273.27891.166.camel@edumazet-glaptop>
From: Eric Dumazet <erdnetdev@gmail.com>
Date: Tue, 11 Dec 2012 17:54:33 -0800
> From: Eric Dumazet <edumazet@google.com>
>
> With BQL being deployed, the following behavior becomes more likely:
>
> We dequeue a packet from qdisc in dequeue_skb(), then we realize target
> tx queue is in XOFF state in sch_direct_xmit(), and we have to hold the
> skb into gso_skb for later.
>
> This shows in stats (tc -s qdisc dev eth0) as requeues.
>
> The problem with these requeues is that high priority packets cannot be
> dequeued as long as this (possibly low prio and big TSO) packet is not
> removed from gso_skb.
>
> At 1Gbps speed, a full size TSO packet is 500 us of extra latency.
>
> In some cases, we know that all packets dequeued from a qdisc are
> for a particular and known txq :
>
> - If device is non multi queue
> - For all MQ/MQPRIO slave qdiscs
>
> This patch introduces a new qdisc flag, TCQ_F_ONETXQUEUE to mark
> this capability, so that dequeue_skb() is allowed to dequeue a packet
> only if the associated txq is not stopped.
>
> This indeed reduces latencies for high prio packets (or improves fairness
> with sfq/fq_codel), and almost removes qdisc 'requeues'.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied.
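
A simplified userspace model of the behavior described in the quoted changelog, assuming a single FIFO and one associated tx queue. struct qdisc here is hypothetical and not the kernel structure: held plays the role of gso_skb, one_txqueue the role of TCQ_F_ONETXQUEUE, and txq_stopped the XOFF state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QLEN 4

struct packet { int prio; };

/* Hypothetical, simplified model; not the kernel's struct Qdisc. */
struct qdisc {
	struct packet *queue[QLEN];
	size_t head, count;
	struct packet *held;     /* dequeued but unsent packet (like gso_skb) */
	bool one_txqueue;        /* all packets go to one known tx queue */
	bool txq_stopped;        /* that tx queue is currently stopped */
	unsigned int requeues;
};

static bool enqueue_pkt(struct qdisc *q, struct packet *p)
{
	if (q->count == QLEN)
		return false;
	q->queue[(q->head + q->count) % QLEN] = p;
	q->count++;
	return true;
}

static struct packet *dequeue_pkt(struct qdisc *q)
{
	struct packet *p;

	if (q->held) {
		/* A held packet must go out first, once the txq allows it. */
		if (q->txq_stopped)
			return NULL;
		p = q->held;
		q->held = NULL;
		return p;
	}
	/* With a single known txq, do not dequeue while it is stopped. */
	if (q->one_txqueue && q->txq_stopped)
		return NULL;
	if (q->count == 0)
		return NULL;
	p = q->queue[q->head];
	q->head = (q->head + 1) % QLEN;
	q->count--;
	return p;
}

/* Returns true if a packet was handed to the device. */
static bool xmit_one(struct qdisc *q)
{
	struct packet *p = dequeue_pkt(q);

	if (!p)
		return false;
	if (q->txq_stopped) {
		/* Too late: the packet is out of the qdisc, so hold it. */
		q->held = p;
		q->requeues++;
		return false;
	}
	return true;
}
```

Without the flag, a stopped tx queue leaves the packet parked in held and counts a requeue; with the flag, the packet stays inside the queue, where in a real qdisc a later high priority packet could still be scheduled ahead of it.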
^ permalink raw reply
* Re: [PATCH net-next v5] bridge: export multicast database via netlink
From: Cong Wang @ 2012-12-12 7:59 UTC (permalink / raw)
To: Stephen Hemminger
Cc: netdev, bridge, Herbert Xu, Jesper Dangaard Brouer, Thomas Graf,
David S. Miller
In-Reply-To: <20121211164856.37ce94fe@nehalam.linuxnetplumber.net>
On Tue, 2012-12-11 at 16:48 -0800, Stephen Hemminger wrote:
>
> Applied, but required some manual fixing. It required adding if_bridge.h
> to include/linux in iproute2 exported headers. Also patch still had some fuzz
> against current version.
>
Thanks, Stephen!
I thought those headers were synced with the kernel headers automatically,
but it seems we have to keep them up to date manually.
^ permalink raw reply
* [PATCH 1/2] iproute2: implement add/del mdb entry
From: Cong Wang @ 2012-12-12 8:23 UTC (permalink / raw)
To: netdev; +Cc: Thomas Graf, Stephen Hemminger, bridge, Cong Wang
In-Reply-To: <1355300590-2390-1-git-send-email-amwang@redhat.com>
From: Cong Wang <amwang@redhat.com>
This patch implements:
bridge mdb { add | del } dev DEV port PORT grp GROUP
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
bridge/mdb.c | 76 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/if_bridge.h | 8 +++++
include/linux/rtnetlink.h | 4 ++
3 files changed, 88 insertions(+), 0 deletions(-)
diff --git a/bridge/mdb.c b/bridge/mdb.c
index 390d7f6..4d8a896 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -28,6 +28,7 @@ int filter_index;
static void usage(void)
{
+ fprintf(stderr, "Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP\n");
fprintf(stderr, " bridge mdb {show} [ dev DEV ]\n");
exit(-1);
}
@@ -153,11 +154,86 @@ static int mdb_show(int argc, char **argv)
return 0;
}
+static int mdb_modify(int cmd, int flags, int argc, char **argv)
+{
+ struct {
+ struct nlmsghdr n;
+ struct br_port_msg bpm;
+ char buf[1024];
+ } req;
+ struct br_mdb_entry entry;
+ char *d = NULL, *p = NULL, *grp = NULL;
+
+ memset(&req, 0, sizeof(req));
+ memset(&entry, 0, sizeof(entry));
+
+ req.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct br_port_msg));
+ req.n.nlmsg_flags = NLM_F_REQUEST|flags;
+ req.n.nlmsg_type = cmd;
+ req.bpm.family = PF_BRIDGE;
+
+ while (argc > 0) {
+ if (strcmp(*argv, "dev") == 0) {
+ NEXT_ARG();
+ d = *argv;
+ } else if (strcmp(*argv, "grp") == 0) {
+ NEXT_ARG();
+ grp = *argv;
+ } else {
+ if (strcmp(*argv, "port") == 0) {
+ NEXT_ARG();
+ p = *argv;
+ }
+ if (matches(*argv, "help") == 0)
+ usage();
+ }
+ argc--; argv++;
+ }
+
+ if (d == NULL || grp == NULL || p == NULL) {
+ fprintf(stderr, "Device, group address and port name are required arguments.\n");
+ exit(-1);
+ }
+
+ req.bpm.ifindex = ll_name_to_index(d);
+ if (req.bpm.ifindex == 0) {
+ fprintf(stderr, "Cannot find device \"%s\"\n", d);
+ return -1;
+ }
+
+ entry.ifindex = ll_name_to_index(p);
+ if (entry.ifindex == 0) {
+ fprintf(stderr, "Cannot find device \"%s\"\n", p);
+ return -1;
+ }
+
+ if (!inet_pton(AF_INET, grp, &entry.addr.u.ip4)) {
+ if (!inet_pton(AF_INET6, grp, &entry.addr.u.ip6)) {
+ fprintf(stderr, "Invalid address \"%s\"\n", grp);
+ return -1;
+ } else
+ entry.addr.proto = htons(ETH_P_IPV6);
+ } else
+ entry.addr.proto = htons(ETH_P_IP);
+
+ addattr_l(&req.n, sizeof(req), MDBA_SET_ENTRY, &entry, sizeof(entry));
+
+ if (rtnl_talk(&rth, &req.n, 0, 0, NULL) < 0)
+ exit(2);
+
+ return 0;
+}
+
int do_mdb(int argc, char **argv)
{
ll_init_map(&rth);
if (argc > 0) {
+ if (matches(*argv, "add") == 0)
+ return mdb_modify(RTM_NEWMDB, NLM_F_CREATE|NLM_F_EXCL, argc-1, argv+1);
+ if (matches(*argv, "delete") == 0)
+ return mdb_modify(RTM_DELMDB, 0, argc-1, argv+1);
+
if (matches(*argv, "show") == 0 ||
matches(*argv, "lst") == 0 ||
matches(*argv, "list") == 0)
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 151a8bb..b3b6a67 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -157,6 +157,7 @@ enum {
#define MDBA_ROUTER_MAX (__MDBA_ROUTER_MAX - 1)
struct br_port_msg {
+ __u8 family;
__u32 ifindex;
};
@@ -171,4 +172,11 @@ struct br_mdb_entry {
} addr;
};
+enum {
+ MDBA_SET_ENTRY_UNSPEC,
+ MDBA_SET_ENTRY,
+ __MDBA_SET_ENTRY_MAX,
+};
+#define MDBA_SET_ENTRY_MAX (__MDBA_SET_ENTRY_MAX - 1)
+
#endif /* _LINUX_IF_BRIDGE_H */
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index c82a159..3ea85dc 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -125,6 +125,10 @@ enum {
RTM_GETNETCONF = 82,
#define RTM_GETNETCONF RTM_GETNETCONF
+ RTM_NEWMDB = 84,
+#define RTM_NEWMDB RTM_NEWMDB
+ RTM_DELMDB = 85,
+#define RTM_DELMDB RTM_DELMDB
RTM_GETMDB = 86,
#define RTM_GETMDB RTM_GETMDB
--
1.7.7.6
^ permalink raw reply related
* [PATCH 2/2] iproute2: add support to monitor mdb entries too
From: Cong Wang @ 2012-12-12 8:23 UTC (permalink / raw)
To: netdev; +Cc: bridge, Cong Wang, Stephen Hemminger, Thomas Graf
In-Reply-To: <1355300590-2390-1-git-send-email-amwang@redhat.com>
From: Cong Wang <amwang@redhat.com>
This patch implements `bridge monitor mdb`.
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
bridge/br_common.h | 2 ++
bridge/mdb.c | 4 ++--
bridge/monitor.c | 14 ++++++++++++++
include/linux/rtnetlink.h | 2 ++
4 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/bridge/br_common.h b/bridge/br_common.h
index 892fb76..10f6ce9 100644
--- a/bridge/br_common.h
+++ b/bridge/br_common.h
@@ -3,6 +3,8 @@ extern int print_linkinfo(const struct sockaddr_nl *who,
void *arg);
extern int print_fdb(const struct sockaddr_nl *who,
struct nlmsghdr *n, void *arg);
+extern int print_mdb(const struct sockaddr_nl *who,
+ struct nlmsghdr *n, void *arg);
extern int do_fdb(int argc, char **argv);
extern int do_mdb(int argc, char **argv);
diff --git a/bridge/mdb.c b/bridge/mdb.c
index 4d8a896..121ce9c 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -82,8 +82,8 @@ int print_mdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
int len = n->nlmsg_len;
struct rtattr * tb[MDBA_MAX+1];
- if (n->nlmsg_type != RTM_GETMDB) {
- fprintf(stderr, "Not RTM_GETMDB: %08x %08x %08x\n",
+ if (n->nlmsg_type != RTM_GETMDB && n->nlmsg_type != RTM_NEWMDB && n->nlmsg_type != RTM_DELMDB) {
+ fprintf(stderr, "Not RTM_GETMDB, RTM_NEWMDB or RTM_DELMDB: %08x %08x %08x\n",
n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
return 0;
diff --git a/bridge/monitor.c b/bridge/monitor.c
index 2f60655..44e14d8 100644
--- a/bridge/monitor.c
+++ b/bridge/monitor.c
@@ -68,6 +68,12 @@ int accept_msg(const struct sockaddr_nl *who,
fprintf(fp, "[NEIGH]");
return print_fdb(who, n, arg);
+ case RTM_NEWMDB:
+ case RTM_DELMDB:
+ if (prefix_banner)
+ fprintf(fp, "[MDB]");
+ return print_mdb(who, n, arg);
+
case 15:
return show_mark(fp, n);
@@ -84,6 +90,7 @@ int do_monitor(int argc, char **argv)
unsigned groups = ~RTMGRP_TC;
int llink=0;
int lneigh=0;
+ int lmdb=0;
rtnl_close(&rth);
@@ -97,6 +104,9 @@ int do_monitor(int argc, char **argv)
} else if (matches(*argv, "fdb") == 0) {
lneigh = 1;
groups = 0;
+ } else if (matches(*argv, "mdb") == 0) {
+ lmdb = 1;
+ groups = 0;
} else if (strcmp(*argv, "all") == 0) {
groups = ~RTMGRP_TC;
prefix_banner=1;
@@ -116,6 +126,10 @@ int do_monitor(int argc, char **argv)
groups |= nl_mgrp(RTNLGRP_NEIGH);
}
+ if (lmdb) {
+ groups |= nl_mgrp(RTNLGRP_MDB);
+ }
+
if (file) {
FILE *fp;
fp = fopen(file, "r");
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 3ea85dc..87452b4 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -609,6 +609,8 @@ enum rtnetlink_groups {
#define RTNLGRP_IPV4_NETCONF RTNLGRP_IPV4_NETCONF
RTNLGRP_IPV6_NETCONF,
#define RTNLGRP_IPV6_NETCONF RTNLGRP_IPV6_NETCONF
+ RTNLGRP_MDB,
+#define RTNLGRP_MDB RTNLGRP_MDB
__RTNLGRP_MAX
};
#define RTNLGRP_MAX (__RTNLGRP_MAX - 1)
--
1.7.7.6
^ permalink raw reply related
* [PATCH net-next 1/2] bridge: notify mdb changes via netlink
From: Cong Wang @ 2012-12-12 8:23 UTC (permalink / raw)
To: netdev
Cc: bridge, Cong Wang, Herbert Xu, Stephen Hemminger, David S. Miller,
Thomas Graf
From: Cong Wang <amwang@redhat.com>
As Stephen mentioned, we need to monitor the mdb
changes in user-space, so add notifications via netlink too.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
include/uapi/linux/rtnetlink.h | 6 +++
net/bridge/br_mdb.c | 80 ++++++++++++++++++++++++++++++++++++++++
net/bridge/br_multicast.c | 2 +
net/bridge/br_private.h | 2 +
4 files changed, 90 insertions(+), 0 deletions(-)
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 354a1e7..7a5eb19 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -125,6 +125,10 @@ enum {
RTM_GETNETCONF = 82,
#define RTM_GETNETCONF RTM_GETNETCONF
+ RTM_NEWMDB = 84,
+#define RTM_NEWMDB RTM_NEWMDB
+ RTM_DELMDB = 85,
+#define RTM_DELMDB RTM_DELMDB
RTM_GETMDB = 86,
#define RTM_GETMDB RTM_GETMDB
@@ -607,6 +611,8 @@ enum rtnetlink_groups {
#define RTNLGRP_IPV4_NETCONF RTNLGRP_IPV4_NETCONF
RTNLGRP_IPV6_NETCONF,
#define RTNLGRP_IPV6_NETCONF RTNLGRP_IPV6_NETCONF
+ RTNLGRP_MDB,
+#define RTNLGRP_MDB RTNLGRP_MDB
__RTNLGRP_MAX
};
#define RTNLGRP_MAX (__RTNLGRP_MAX - 1)
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index ccc43a9..a8cfbf5 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -155,6 +155,86 @@ out:
return skb->len;
}
+static int nlmsg_populate_mdb_fill(struct sk_buff *skb,
+ struct net_device *dev,
+ struct br_mdb_entry *entry, u32 pid,
+ u32 seq, int type, unsigned int flags)
+{
+ struct nlmsghdr *nlh;
+ struct br_port_msg *bpm;
+ struct nlattr *nest, *nest2;
+
+ nlh = nlmsg_put(skb, pid, seq, type, sizeof(*bpm), NLM_F_MULTI);
+ if (!nlh)
+ return -EMSGSIZE;
+
+ bpm = nlmsg_data(nlh);
+ bpm->family = AF_BRIDGE;
+ bpm->ifindex = dev->ifindex;
+ nest = nla_nest_start(skb, MDBA_MDB);
+ if (nest == NULL)
+ goto cancel;
+ nest2 = nla_nest_start(skb, MDBA_MDB_ENTRY);
+ if (nest2 == NULL)
+ goto end;
+
+ if (nla_put(skb, MDBA_MDB_ENTRY_INFO, sizeof(*entry), entry))
+ goto end;
+
+ nla_nest_end(skb, nest2);
+ nla_nest_end(skb, nest);
+ return nlmsg_end(skb, nlh);
+
+end:
+ nla_nest_end(skb, nest);
+cancel:
+ nlmsg_cancel(skb, nlh);
+ return -EMSGSIZE;
+}
+
+static inline size_t rtnl_mdb_nlmsg_size(void)
+{
+ return NLMSG_ALIGN(sizeof(struct br_port_msg))
+ + nla_total_size(sizeof(struct br_mdb_entry));
+}
+
+static void __br_mdb_notify(struct net_device *dev, struct br_mdb_entry *entry,
+ int type)
+{
+ struct net *net = dev_net(dev);
+ struct sk_buff *skb;
+ int err = -ENOBUFS;
+
+ skb = nlmsg_new(rtnl_mdb_nlmsg_size(), GFP_ATOMIC);
+ if (!skb)
+ goto errout;
+
+ err = nlmsg_populate_mdb_fill(skb, dev, entry, 0, 0, type, NTF_SELF);
+ if (err < 0) {
+ kfree_skb(skb);
+ goto errout;
+ }
+
+ rtnl_notify(skb, net, 0, RTNLGRP_MDB, NULL, GFP_ATOMIC);
+ return;
+errout:
+ rtnl_set_sk_err(net, RTNLGRP_MDB, err);
+}
+
+void br_mdb_notify(struct net_device *dev, struct net_bridge_port *port,
+ struct br_ip *group, int type)
+{
+ struct br_mdb_entry entry;
+
+ entry.ifindex = port->dev->ifindex;
+ entry.addr.proto = group->proto;
+ entry.addr.u.ip4 = group->u.ip4;
+#if IS_ENABLED(CONFIG_IPV6)
+ entry.addr.u.ip6 = group->u.ip6;
+#endif
+ __br_mdb_notify(dev, &entry, type);
+}
+
void br_mdb_init(void)
{
rtnl_register(PF_BRIDGE, RTM_GETMDB, NULL, br_mdb_dump, NULL);
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 847b98a1..d929586 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -681,6 +681,7 @@ static int br_multicast_add_group(struct net_bridge *br,
(unsigned long)p);
rcu_assign_pointer(*pp, p);
+ br_mdb_notify(br->dev, port, group, RTM_NEWMDB);
found:
mod_timer(&p->timer, now + br->multicast_membership_interval);
@@ -1240,6 +1241,7 @@ static void br_multicast_leave_group(struct net_bridge *br,
hlist_del_init(&p->mglist);
del_timer(&p->timer);
call_rcu_bh(&p->rcu, br_multicast_free_pg);
+ br_mdb_notify(br->dev, port, group, RTM_DELMDB);
if (!mp->ports && !mp->mglist &&
netif_running(br->dev))
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index f95b766..2807c76 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -435,6 +435,8 @@ extern int br_multicast_toggle(struct net_bridge *br, unsigned long val);
extern int br_multicast_set_querier(struct net_bridge *br, unsigned long val);
extern int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val);
extern void br_mdb_init(void);
+extern void br_mdb_notify(struct net_device *dev, struct net_bridge_port *port,
+ struct br_ip *group, int type);
static inline bool br_multicast_is_router(struct net_bridge *br)
{
--
1.7.7.6
^ permalink raw reply related