* Re: [PATCH v14 06/17] Use callback to deal with skb_release_data() specially.
From: Eric Dumazet @ 2010-11-04 9:07 UTC (permalink / raw)
To: xiaohui.xin; +Cc: netdev, kvm, linux-kernel, mst, mingo, davem, herbert, jdike
In-Reply-To: <1288861465.2659.44.camel@edumazet-laptop>
On Thursday 04 November 2010 at 10:04 +0100, Eric Dumazet wrote:
> Hmm, I suggest you read the comment two lines above.
>
> If destructor_arg is now cleared each time we allocate a new skb, then,
> please move it before dataref in shinfo structure, so that the following
> memset() does the job efficiently...
Something like :
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e6ba898..2dca504 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -195,6 +195,9 @@ struct skb_shared_info {
__be32 ip6_frag_id;
__u8 tx_flags;
struct sk_buff *frag_list;
+ /* Intermediate layers must ensure that destructor_arg
+ * remains valid until skb destructor */
+ void *destructor_arg;
struct skb_shared_hwtstamps hwtstamps;
/*
@@ -202,9 +205,6 @@ struct skb_shared_info {
*/
atomic_t dataref;
- /* Intermediate layers must ensure that destructor_arg
- * remains valid until skb destructor */
- void * destructor_arg;
/* must be last field, see pskb_expand_head() */
skb_frag_t frags[MAX_SKB_FRAGS];
};
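
A minimal user-space sketch of the idea behind this reordering, assuming the
allocation path clears everything that precedes dataref with a single memset().
The struct below is a made-up stand-in (shinfo_demo and shinfo_demo_init are
hypothetical names, and the field set is not the real skb_shared_info layout):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for skb_shared_info; fields are illustrative only. */
struct shinfo_demo {
	unsigned char nr_frags;
	void *frag_list;
	void *destructor_arg;	/* placed before dataref, as proposed */
	int dataref;		/* first field NOT covered by the memset */
	long frags[4];		/* deliberately left untouched */
};

/* Zero every field that precedes dataref in one call, then set dataref. */
static void shinfo_demo_init(struct shinfo_demo *s)
{
	/* The trick only works if destructor_arg sits before dataref. */
	assert(offsetof(struct shinfo_demo, destructor_arg) <
	       offsetof(struct shinfo_demo, dataref));

	memset(s, 0, offsetof(struct shinfo_demo, dataref));
	s->dataref = 1;
}
```

With this layout no separate "destructor_arg = NULL" store is needed; a field
placed after dataref would have to be cleared by hand.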
^ permalink raw reply related
* Re: [RFC 0/3] MPEG2/TS drop analyzer iptables match extension
From: Jesper Dangaard Brouer @ 2010-11-04 9:20 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Netfilter Developers, paulmck, Eric Dumazet, netdev, Solvik Blum
In-Reply-To: <alpine.LNX.2.01.1011040113270.8604@obet.zrqbmnf.qr>
On Thu, 4 Nov 2010, Jan Engelhardt wrote:
> On Tuesday 2010-10-19 16:21, Jesper Dangaard Brouer wrote:
>>
>> This is my iptables match module for analyzing IPTV MPEG2/TS streams.
>> Currently it only detects dropped packets, but I want to extend it for
>> analyzing jitter and bursts.
>>
>> Jan Engelhardt convinced me that I should just send the module as-is
>> for review on the list. I wrote the code in 2009, and have only done
>> some minor changes to make it work on kernel 2.6.35 since.
>
> This now lives in the mp2t branch (since NFWS already actually) of xt-a,
> and I have taken the liberty to start updating it to higher standards.
> Please watch that branch, as I don't have any MPEG equipment around me
> to do runtime tests.
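
The thread does not say how the module detects drops. One common approach for
MPEG2/TS, sketched here as an assumption rather than as a description of the
module, is to track the 4-bit continuity counter carried in each TS header per
PID. The helper names (ts_pid, ts_cc, ts_cc_missing) are hypothetical, and the
sketch ignores duplicates, null packets and packets without payload:

```c
#include <assert.h>
#include <stdint.h>

#define TS_PACKET_SIZE 188	/* fixed size of one transport stream packet */
#define TS_SYNC_BYTE   0x47	/* first byte of every TS packet */

/* 13-bit PID from bytes 1-2 of the TS header. */
static unsigned ts_pid(const uint8_t *pkt)
{
	return ((pkt[1] & 0x1f) << 8) | pkt[2];
}

/* 4-bit continuity counter from byte 3 of the TS header. */
static unsigned ts_cc(const uint8_t *pkt)
{
	return pkt[3] & 0x0f;
}

/*
 * Number of packets missing between two consecutive counters of one PID.
 * The counter wraps modulo 16, so a burst of 16 or more losses is
 * undercounted. Duplicates (cur_cc == prev_cc) are not handled here.
 */
static unsigned ts_cc_missing(unsigned prev_cc, unsigned cur_cc)
{
	assert(prev_cc < 16 && cur_cc < 16);
	return (cur_cc - prev_cc - 1) & 0x0f;
}
```

A caller would keep the last counter per PID and add ts_cc_missing() to a drop
total whenever the result is non-zero.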
Jan, I would actually like to maintain the source via my own git tree.
And I would gladly accept your patches against that tree.
Since the workshop, I have been busy "Open Sourcing" the project myself.
I now have a git repository, which also contains the collector daemon,
web-code and database layout. The git tree also contains some README
documentation. For example, it describes how to set up a
testlab with MPEG2 streaming via VLC and controlled packet drops via
tc-netem, so you can perform your runtime tests.
I didn't plan to release the project just yet, as I wanted to do some
renaming. E.g. I want to rename the mp2t module to mpeg2ts, and I want to
rename the collector daemon from tvprobe to iptv-analyzer.
The git tree is temporarily located on people.netfilter.org:
git://people.netfilter.org/hawk/iptv-analyzer.git
I have bought the domain iptv-analyzer.org, but I have not installed any
servers on that domain, yet.
Solvik Blum (Cc'ed) is currently helping me out with the web-frontend.
Cheers,
Jesper Brouer
--
-------------------------------------------------------------------
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
-------------------------------------------------------------------
^ permalink raw reply
* Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation
From: Michael S. Tsirkin @ 2010-11-04 9:30 UTC (permalink / raw)
To: Shirley Ma; +Cc: David Miller, netdev, kvm, linux-kernel
In-Reply-To: <1288849126.12932.4.camel@localhost.localdomain>
On Wed, Nov 03, 2010 at 10:38:46PM -0700, Shirley Ma wrote:
> On Wed, 2010-11-03 at 12:48 +0200, Michael S. Tsirkin wrote:
> > I mean in practice, you see a benefit from this patch?
>
> Yes, I tested it. It does benefit the performance.
>
> > > My concern here is whether checking only in set up would be
> > sufficient
> > > for security?
> >
> > It better be sufficient because the checks that put_user does
> > are not effective when run from the kernel thread, anyway.
> >
> > > Would be there is a case guest could corrupt the ring
> > > later? If not, that's OK.
> >
> > You mean change the pointer after it's checked?
> > If you see such a case, please holler.
>
> I wonder about it, not a such case in mind.
>
> > To clarify: the combination of __put_user and separate
> > signalling is giving the same performance benefit as your
> > patch?
>
> Yes, it has similar performance, but I haven't finished the comparison
> for all message sizes yet.
>
> > I am mostly concerned with adding code that seems to help
> > speed for reasons we don't completely understand, because
> > then we might break the optimization easily without noticing.
>
> I don't think the patch I submitted would break anything.
No, I just meant that when a patch gives some benefit, I'd like
to understand where the benefit comes from so that I don't
break it later.
> It just
> reduced the cost of per used buffer 3 put_user() calls and guest
> signaling from one to one to many to one.
One thing to note is that deferred signalling needs to be
benchmarked with old guests which don't orphan skbs on xmit
(or disable orphaning in both networking stack and virtio-net).
>
> Thanks
> Shirley
OK, so I guess I'll queue the __put_user etc. patches for net-next. On top of this,
I think a patch which defers signalling would be nice to have;
then we can figure out whether a separate heads array still has benefits
for the non-zero-copy case: if yes, what they are; if no, whether it should be
used for zero copy only or for both non-zero copy and zero copy.
Makes sense?
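
A toy model of the deferred signalling discussed above, assuming the goal is
simply to turn one guest notification per used buffer into one per batch. The
demo_vq type and both functions are hypothetical and do not correspond to any
vhost API:

```c
#include <assert.h>
#include <limits.h>

struct demo_vq {
	unsigned pending_used;	/* used buffers recorded but not yet signalled */
	unsigned signals_sent;	/* guest notifications issued so far */
};

/* Record one completed buffer without notifying the guest. */
static void demo_add_used(struct demo_vq *vq)
{
	assert(vq->pending_used < UINT_MAX);
	vq->pending_used++;
}

/* Notify the guest once for everything accumulated since the last flush. */
static void demo_flush_signal(struct demo_vq *vq)
{
	if (vq->pending_used == 0)
		return;		/* nothing new: no signal */
	vq->pending_used = 0;
	vq->signals_sent++;
}
```

Three calls to demo_add_used() followed by one demo_flush_signal() produce a
single notification. Whether that saves anything in practice depends on the
guest behaviour mentioned above (skb orphaning on xmit).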
--
MST
^ permalink raw reply
* Re: [PATCH 1/2] r6040: fix multicast operations
From: Florian Fainelli @ 2010-11-04 10:04 UTC (permalink / raw)
To: Shawn Lin; +Cc: netdev, Marc Leclerc, Albert Chen, David Miller, Ben Hutchings
In-Reply-To: <1288788689.1837.141.camel@shawn-desktop>
Hello Shawn,
On Wednesday 03 November 2010 13:51:29 Shawn Lin wrote:
> On Wed, 2010-10-20 at 23:09 +0200, Florian Fainelli wrote:
> > This patch fixes the following issues with the r6040 NIC operating in
> > multicast:
> >
> > 1) When the IFF_ALLMULTI flag is set, we should write 0xffff to the NIC
> >    hash table registers to make it process multicast traffic
> >
> > 2) When the number of multicast addresses to handle is smaller than
> >    MCAST_MAX we should use the NIC multicast registers MID1_{L,M,H}.
> >
> > 3) The hashing of the address was not correct, due to an invalid
> >    subtraction (15 - (crc & 0x0f)) instead of (crc & 0x0f)
>
> I suggest to modify the comment as follows.
>
> 3) The hashing of the address was not correct, due to an invalid
> subtraction (15 - (crc & 0x0f)) instead of (crc & 0x0f) and an
> incorrect crc algorithm (ether_crc_le) instead of (ether_crc).
>
> [...]
>
> The original code I submitted to Florian has some issues mentioned by Ben
> Hutchings.
>
> This revision fixes these issues and another issue about the sequence of
> configuring multicast hash table registers.
>
> The correct sequence is to enable multicast function before writing values to
> hash table registers. I have verified it on my platform.
>
> The hash algorithm is provided by hardware designers. I also re-confirmed
> it with RDC's engineer.
>
> Please let me know if anyone has questions.
>
> The version is for net-next-2.6:
Please resubmit the patch with your Signed-off-by tag and the Tested-by:
to keep track of the issue. Thank you!
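
For readers following the hash discussion, here is a stand-alone sketch of the
bit selection used in the patched code: take the top 6 bits of the CRC, use the
upper two as the word index and the lower four as the bit index. It assumes
that the kernel's ether_crc() is the bit-reversed form of crc32_le() seeded
with ~0; all demo_* names are hypothetical user-space stand-ins:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise little-endian CRC-32 (reflected polynomial 0xedb88320), no final XOR. */
static uint32_t demo_crc32_le(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320u : 0);
	}
	return crc;
}

/* Reverse the bit order of a 32-bit word. */
static uint32_t demo_bitrev32(uint32_t x)
{
	uint32_t r = 0;

	for (int i = 0; i < 32; i++, x >>= 1)
		r = (r << 1) | (x & 1);
	return r;
}

/* Assumed equivalent of ether_crc(): bit-reversed crc32_le seeded with ~0. */
static uint32_t demo_ether_crc(size_t len, const uint8_t *data)
{
	return demo_bitrev32(demo_crc32_le(~0u, data, len));
}

/* Set the hash bit for one 6-byte address, following the patched formula. */
static void demo_hash_add(uint16_t table[4], const uint8_t addr[6])
{
	uint32_t crc = demo_ether_crc(6, addr) >> 26;	/* top 6 bits: 0..63 */

	assert(crc < 64);
	table[crc >> 4] |= (uint16_t)(1u << (crc & 0xf));	/* word 0..3, bit 0..15 */
}
```

The old formula used bit (15 - (crc & 0xf)) within the same word, i.e. the
mirrored bit position, on top of the different CRC variant (ether_crc_le).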
>
> ---
> diff --git a/drivers/net/r6040.c b/drivers/net/r6040.c
> index 0b014c8..e88e171 100644
> --- a/drivers/net/r6040.c
> +++ b/drivers/net/r6040.c
> @@ -69,6 +69,8 @@
>
> /* MAC registers */
> #define MCR0 0x00 /* Control register 0 */
> +#define PROMISC 0x0020 /* Promiscuous mode */
> +#define HASH_EN 0x0100 /* Enable multicast hash table function */
> #define MCR1 0x04 /* Control register 1 */
> #define MAC_RST 0x0001 /* Reset the MAC */
> #define MBCR 0x08 /* Bus control */
> @@ -851,77 +853,84 @@ static void r6040_multicast_list(struct net_device
> *dev) {
> struct r6040_private *lp = netdev_priv(dev);
> void __iomem *ioaddr = lp->base;
> - u16 *adrp;
> - u16 reg;
> unsigned long flags;
> struct netdev_hw_addr *ha;
> int i;
> + u16 hash_table[4] = { 0, };
>
> - /* MAC Address */
> - adrp = (u16 *)dev->dev_addr;
> - iowrite16(adrp[0], ioaddr + MID_0L);
> - iowrite16(adrp[1], ioaddr + MID_0M);
> - iowrite16(adrp[2], ioaddr + MID_0H);
> -
> - /* Promiscous Mode */
> spin_lock_irqsave(&lp->lock, flags);
>
> /* Clear AMCP & PROM bits */
> - reg = ioread16(ioaddr) & ~0x0120;
> + lp->mcr0 = ioread16(ioaddr + MCR0) & ~(PROMISC | HASH_EN);
> +
> + /* Promiscuous Mode */
> if (dev->flags & IFF_PROMISC) {
> - reg |= 0x0020;
> - lp->mcr0 |= 0x0020;
> + lp->mcr0 |= PROMISC;
> }
> - /* Too many multicast addresses
> - * accept all traffic */
> - else if ((netdev_mc_count(dev) > MCAST_MAX) ||
> - (dev->flags & IFF_ALLMULTI))
> - reg |= 0x0020;
> -
> - iowrite16(reg, ioaddr);
> - spin_unlock_irqrestore(&lp->lock, flags);
> + /* Enable multicast hash table function to
> + * receive all multicast packets. */
> + else if (dev->flags & IFF_ALLMULTI) {
> + lp->mcr0 |= HASH_EN;
> +
> + for (i = 0; i < MCAST_MAX ; i++) {
> + iowrite16(0, ioaddr + MID_1L + 8 * i);
> + iowrite16(0, ioaddr + MID_1M + 8 * i);
> + iowrite16(0, ioaddr + MID_1H + 8 * i);
> + }
>
> - /* Build the hash table */
> - if (netdev_mc_count(dev) > MCAST_MAX) {
> - u16 hash_table[4];
> + for (i = 0; i < 4; i++)
> + hash_table[i] = 0xffff;
> + }
> + /* Use internal multicast address registers if the number of
> + * multicast addresses is not greater than MCAST_MAX. */
> + else if (netdev_mc_count(dev) <= MCAST_MAX) {
> + i = 0;
> + netdev_for_each_mc_addr(ha, dev) {
> + u16 *adrp = (u16 *) ha->addr;
> + iowrite16(adrp[0], ioaddr + MID_1L + 8 * i);
> + iowrite16(adrp[1], ioaddr + MID_1M + 8 * i);
> + iowrite16(adrp[2], ioaddr + MID_1H + 8 * i);
> + i++;
> + }
> + while (i < MCAST_MAX) {
> + iowrite16(0, ioaddr + MID_1L + 8 * i);
> + iowrite16(0, ioaddr + MID_1M + 8 * i);
> + iowrite16(0, ioaddr + MID_1H + 8 * i);
> + i++;
> + }
> + }
> + /* Otherwise, Enable multicast hash table function. */
> + else {
> u32 crc;
>
> - for (i = 0; i < 4; i++)
> - hash_table[i] = 0;
> + lp->mcr0 |= HASH_EN;
>
> - netdev_for_each_mc_addr(ha, dev) {
> - char *addrs = ha->addr;
> + for (i = 0; i < MCAST_MAX ; i++) {
> + iowrite16(0, ioaddr + MID_1L + 8 * i);
> + iowrite16(0, ioaddr + MID_1M + 8 * i);
> + iowrite16(0, ioaddr + MID_1H + 8 * i);
> + }
>
> - if (!(*addrs & 1))
> - continue;
> + /* Build multicast hash table */
> + netdev_for_each_mc_addr(ha, dev) {
> + u8 *addrs = ha->addr;
>
> - crc = ether_crc_le(6, addrs);
> + crc = ether_crc(ETH_ALEN, addrs);
> crc >>= 26;
> - hash_table[crc >> 4] |= 1 << (15 - (crc & 0xf));
> + hash_table[crc >> 4] |= 1 << (crc & 0xf);
> }
> - /* Fill the MAC hash tables with their values */
> + }
> + iowrite16(lp->mcr0, ioaddr + MCR0);
> +
> + /* Fill the MAC hash tables with their values */
> + if (lp->mcr0 & HASH_EN) {
> iowrite16(hash_table[0], ioaddr + MAR0);
> iowrite16(hash_table[1], ioaddr + MAR1);
> iowrite16(hash_table[2], ioaddr + MAR2);
> iowrite16(hash_table[3], ioaddr + MAR3);
> }
> - /* Multicast Address 1~4 case */
> - i = 0;
> - netdev_for_each_mc_addr(ha, dev) {
> - if (i >= MCAST_MAX)
> - break;
> - adrp = (u16 *) ha->addr;
> - iowrite16(adrp[0], ioaddr + MID_1L + 8 * i);
> - iowrite16(adrp[1], ioaddr + MID_1M + 8 * i);
> - iowrite16(adrp[2], ioaddr + MID_1H + 8 * i);
> - i++;
> - }
> - while (i < MCAST_MAX) {
> - iowrite16(0xffff, ioaddr + MID_1L + 8 * i);
> - iowrite16(0xffff, ioaddr + MID_1M + 8 * i);
> - iowrite16(0xffff, ioaddr + MID_1H + 8 * i);
> - i++;
> - }
> +
> + spin_unlock_irqrestore(&lp->lock, flags);
> }
>
> static void netdev_get_drvinfo(struct net_device *dev,
> ---
>
> The version is for 2.6.32.y and 2.6.27.y:
>
> ---
> diff --git a/drivers/net/r6040.c b/drivers/net/r6040.c
> index 9ee9f01..f9af419 100644
> --- a/drivers/net/r6040.c
> +++ b/drivers/net/r6040.c
> @@ -69,6 +69,8 @@
>
> /* MAC registers */
> #define MCR0 0x00 /* Control register 0 */
> +#define PROMISC 0x0020 /* Promiscuous mode */
> +#define HASH_EN 0x0100 /* Enable multicast hash table function */
> #define MCR1 0x04 /* Control register 1 */
> #define MAC_RST 0x0001 /* Reset the MAC */
> #define MBCR 0x08 /* Bus control */
> @@ -935,76 +937,88 @@ static void r6040_multicast_list(struct net_device
> *dev) {
> struct r6040_private *lp = netdev_priv(dev);
> void __iomem *ioaddr = lp->base;
> - u16 *adrp;
> - u16 reg;
> unsigned long flags;
> struct dev_mc_list *dmi = dev->mc_list;
> int i;
> + u16 hash_table[4] = { 0, };
>
> - /* MAC Address */
> - adrp = (u16 *)dev->dev_addr;
> - iowrite16(adrp[0], ioaddr + MID_0L);
> - iowrite16(adrp[1], ioaddr + MID_0M);
> - iowrite16(adrp[2], ioaddr + MID_0H);
> -
> - /* Promiscous Mode */
> spin_lock_irqsave(&lp->lock, flags);
>
> /* Clear AMCP & PROM bits */
> - reg = ioread16(ioaddr) & ~0x0120;
> + lp->mcr0 = ioread16(ioaddr + MCR0) & ~(PROMISC | HASH_EN);
> +
> + /* Promiscuous Mode */
> if (dev->flags & IFF_PROMISC) {
> - reg |= 0x0020;
> - lp->mcr0 |= 0x0020;
> + lp->mcr0 |= PROMISC;
> }
> - /* Too many multicast addresses
> - * accept all traffic */
> - else if ((dev->mc_count > MCAST_MAX)
> - || (dev->flags & IFF_ALLMULTI))
> - reg |= 0x0020;
> + /* Enable multicast hash table function to
> + * receive all multicast packets. */
> + else if (dev->flags & IFF_ALLMULTI) {
> + lp->mcr0 |= HASH_EN;
> +
> + for (i = 0; i < MCAST_MAX ; i++) {
> + iowrite16(0, ioaddr + MID_1L + 8 * i);
> + iowrite16(0, ioaddr + MID_1M + 8 * i);
> + iowrite16(0, ioaddr + MID_1H + 8 * i);
> + }
>
> - iowrite16(reg, ioaddr);
> - spin_unlock_irqrestore(&lp->lock, flags);
> + for (i = 0; i < 4; i++)
> + hash_table[i] = 0xffff;
> + }
> + /* Use internal multicast address registers if the number of
> + * multicast addresses is not greater than MCAST_MAX. */
> + else if (dev->mc_count <= MCAST_MAX) {
> + i = 0;
> + while (i < dev->mc_count) {
> + u16 *adrp = (u16 *) dmi->dmi_addr;
> + dmi = dmi->next;
>
> - /* Build the hash table */
> - if (dev->mc_count > MCAST_MAX) {
> - u16 hash_table[4];
> + iowrite16(adrp[0], ioaddr + MID_1L + 8 * i);
> + iowrite16(adrp[1], ioaddr + MID_1M + 8 * i);
> + iowrite16(adrp[2], ioaddr + MID_1H + 8 * i);
> + i++;
> + }
> + while (i < MCAST_MAX) {
> + iowrite16(0, ioaddr + MID_1L + 8 * i);
> + iowrite16(0, ioaddr + MID_1M + 8 * i);
> + iowrite16(0, ioaddr + MID_1H + 8 * i);
> + i++;
> + }
> + }
> + /* Otherwise, Enable multicast hash table function. */
> + else {
> u32 crc;
>
> - for (i = 0; i < 4; i++)
> - hash_table[i] = 0;
> + lp->mcr0 |= HASH_EN;
>
> - for (i = 0; i < dev->mc_count; i++) {
> - char *addrs = dmi->dmi_addr;
> + for (i = 0; i < MCAST_MAX ; i++) {
> + iowrite16(0, ioaddr + MID_1L + 8 * i);
> + iowrite16(0, ioaddr + MID_1M + 8 * i);
> + iowrite16(0, ioaddr + MID_1H + 8 * i);
> + }
>
> + /* Build multicast hash table */
> + for (i = 0; i < dev->mc_count; i++) {
> + u8 *addrs = dmi->dmi_addr;
> dmi = dmi->next;
>
> - if (!(*addrs & 1))
> - continue;
> -
> - crc = ether_crc_le(6, addrs);
> + crc = ether_crc(ETH_ALEN, addrs);
> crc >>= 26;
> - hash_table[crc >> 4] |= 1 << (15 - (crc & 0xf));
> + hash_table[crc >> 4] |= 1 << (crc & 0xf);
> }
> - /* Fill the MAC hash tables with their values */
> +
> + }
> + iowrite16(lp->mcr0, ioaddr + MCR0);
> +
> + /* Fill the MAC hash tables with their values */
> + if (lp->mcr0 & HASH_EN) {
> iowrite16(hash_table[0], ioaddr + MAR0);
> iowrite16(hash_table[1], ioaddr + MAR1);
> iowrite16(hash_table[2], ioaddr + MAR2);
> iowrite16(hash_table[3], ioaddr + MAR3);
> }
> - /* Multicast Address 1~4 case */
> - dmi = dev->mc_list;
> - for (i = 0, dmi; (i < dev->mc_count) && (i < MCAST_MAX); i++) {
> - adrp = (u16 *)dmi->dmi_addr;
> - iowrite16(adrp[0], ioaddr + MID_1L + 8*i);
> - iowrite16(adrp[1], ioaddr + MID_1M + 8*i);
> - iowrite16(adrp[2], ioaddr + MID_1H + 8*i);
> - dmi = dmi->next;
> - }
> - for (i = dev->mc_count; i < MCAST_MAX; i++) {
> - iowrite16(0xffff, ioaddr + MID_1L + 8*i);
> - iowrite16(0xffff, ioaddr + MID_1M + 8*i);
> - iowrite16(0xffff, ioaddr + MID_1H + 8*i);
> - }
> +
> + spin_unlock_irqrestore(&lp->lock, flags);
> }
>
> static void netdev_get_drvinfo(struct net_device *dev,
> ---
^ permalink raw reply
* Freeing alive fib_info caused by ebc0ffae5
From: Michael Ellerman @ 2010-11-04 10:23 UTC (permalink / raw)
To: netdev; +Cc: eric.dumazet
[-- Attachment #1: Type: text/plain, Size: 694 bytes --]
Hi all,
I'm running Linus' latest or thereabouts (ff8b16d), and I'm seeing
"Freeing alive fib_info" messages, from free_fib_info().
Actually I only get one per boot, when network interfaces come up.
Seemingly related I am getting refcount problems when I shutdown, ie.
unregister_netdevice() sees a usage count of 1, which never decrements.
Bisect says it's ebc0ffae5 which causes the problem, or makes it appear.
fib: RCU conversion of fib_lookup()
fib_lookup() converted to be called in RCU protected context, no
reference taken and released on a contended cache line (fib_clntref)
Is this a bug in that commit, or a driver bug exposed?
cheers
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply
* Re: [RFC 0/3] MPEG2/TS drop analyzer iptables match extension
From: Jan Engelhardt @ 2010-11-04 10:29 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Netfilter Developers, paulmck, Eric Dumazet, netdev, Solvik Blum
In-Reply-To: <Pine.LNX.4.64.1011040953590.19565@ask.diku.dk>
On Thursday 2010-11-04 10:20, Jesper Dangaard Brouer wrote:
> On Thu, 4 Nov 2010, Jan Engelhardt wrote:
>> On Tuesday 2010-10-19 16:21, Jesper Dangaard Brouer wrote:
>>>
>>> This is my iptables match module for analyzing IPTV MPEG2/TS streams.
>>> Currently it only detects dropped packets, but I want to extend it for
>>> analyzing jitter and bursts.
>>>
>>> Jan Engelhardt convinced me that I should just send the module as-is
>>> for review on the list. I wrote the code in 2009, and have only done
>>> some minor changes to make it work on kernel 2.6.35 since.
>>
>> This now lives in the mp2t branch (since NFWS already actually) of xt-a,
>> and I have taken the liberty to start updating it to higher standards.
>> Please watch that branch, as I don't have any MPEG equipment around me
>> to do runtime tests.
>
> Jan, I would actually like to maintain the source via my own git tree. And I
> would gladly accept your patches against that tree.
I do not mind who is hosting what parts, as git repos can be
transferred easily, but I strongly suggest not to decouple xt_mp2t
from (any clone of) the xtables-addons structure base, because doing
so would bring you back to square one with regard to maintenance.
I recognize you may dislike splitting up the IPTV codebase, so I
propose that you make use of submodules, and have an Xt-a clone as
one submodule. That would allow merging in both directions.
^ permalink raw reply
* Re: Freeing alive fib_info caused by ebc0ffae5
From: Eric Dumazet @ 2010-11-04 10:30 UTC (permalink / raw)
To: michael; +Cc: netdev
In-Reply-To: <1288866186.30549.10.camel@concordia>
On Thursday 04 November 2010 at 21:23 +1100, Michael Ellerman wrote:
> Hi all,
>
> I'm running Linus' latest or thereabouts (ff8b16d), and I'm seeing
> "Freeing alive fib_info" messages, from free_fib_info().
>
> Actually I only get one per boot, when network interfaces come up.
> Seemingly related I am getting refcount problems when I shutdown, ie.
> unregister_netdevice() sees a usage count of 1, which never decrements.
>
> Bisect says it's ebc0ffae5 which causes the problem, or makes it appear.
>
> fib: RCU conversion of fib_lookup()
>
> fib_lookup() converted to be called in RCU protected context, no
> reference taken and released on a contended cache line (fib_clntref)
>
>
> Is this a bug in that commit, or a driver bug exposed?
Hi Michael, thanks for the report (and painful bisection I guess)
That's hard to say... Is it reproducible on my machine?
Thanks
^ permalink raw reply
* Re: Freeing alive fib_info caused by ebc0ffae5
From: Eric Dumazet @ 2010-11-04 10:46 UTC (permalink / raw)
To: michael; +Cc: netdev
In-Reply-To: <1288866626.2659.71.camel@edumazet-laptop>
On Thursday 04 November 2010 at 11:30 +0100, Eric Dumazet wrote:
> On Thursday 04 November 2010 at 21:23 +1100, Michael Ellerman wrote:
> > Hi all,
> >
> > I'm running Linus' latest or thereabouts (ff8b16d), and I'm seeing
> > "Freeing alive fib_info" messages, from free_fib_info().
> >
> > Actually I only get one per boot, when network interfaces come up.
> > Seemingly related I am getting refcount problems when I shutdown, ie.
> > unregister_netdevice() sees a usage count of 1, which never decrements.
> >
> > Bisect says it's ebc0ffae5 which causes the problem, or makes it appear.
> >
> > fib: RCU conversion of fib_lookup()
> >
> > fib_lookup() converted to be called in RCU protected context, no
> > reference taken and released on a contended cache line (fib_clntref)
> >
> >
> > Is this a bug in that commit, or a driver bug exposed?
>
> Hi Michael, thanks for the report (and painful bisection I guess)
>
> That's hard to say... Is it reproducible on my machine?
You could also ask for a stack trace; this might help to spot the bug.
Thanks
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 3e0da3e..8039db0 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -159,6 +159,7 @@ void free_fib_info(struct fib_info *fi)
{
if (fi->fib_dead == 0) {
pr_warning("Freeing alive fib_info %p\n", fi);
+ WARN_ON_ONCE(1);
return;
}
change_nexthops(fi) {
^ permalink raw reply related
* Re: Freeing alive fib_info caused by ebc0ffae5
From: Eric Dumazet @ 2010-11-04 11:21 UTC (permalink / raw)
To: michael; +Cc: netdev
In-Reply-To: <1288866626.2659.71.camel@edumazet-laptop>
On Thursday 04 November 2010 at 11:30 +0100, Eric Dumazet wrote:
> On Thursday 04 November 2010 at 21:23 +1100, Michael Ellerman wrote:
> > Hi all,
> >
> > I'm running Linus' latest or thereabouts (ff8b16d), and I'm seeing
> > "Freeing alive fib_info" messages, from free_fib_info().
> >
> > Actually I only get one per boot, when network interfaces come up.
> > Seemingly related I am getting refcount problems when I shutdown, ie.
> > unregister_netdevice() sees a usage count of 1, which never decrements.
> >
> > Bisect says it's ebc0ffae5 which causes the problem, or makes it appear.
> >
> > fib: RCU conversion of fib_lookup()
> >
> > fib_lookup() converted to be called in RCU protected context, no
> > reference taken and released on a contended cache line (fib_clntref)
> >
> >
> > Is this a bug in that commit, or a driver bug exposed?
>
> Hi Michael, thanks for the report (and painful bisection I guess)
>
> That's hard to say... Is it reproducible on my machine?
>
Hmm, a review of the code spotted a bug in fib_result_assign()
Please try the following patch:
Thanks again !
[PATCH] fib: fib_result_assign() should not change fib refcounts
After commit ebc0ffae5 (RCU conversion of fib_lookup()),
fib_result_assign() should not change fib refcounts anymore.
Thanks to Michael who did the bisection and bug report.
Reported-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
net/ipv4/fib_lookup.h | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index a29edf2..c079cc0 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -47,11 +47,8 @@ extern int fib_detect_death(struct fib_info *fi, int order,
static inline void fib_result_assign(struct fib_result *res,
struct fib_info *fi)
{
- if (res->fi != NULL)
- fib_info_put(res->fi);
+ /* we used to play games with refcounts, but we now use RCU */
res->fi = fi;
- if (fi != NULL)
- atomic_inc(&fi->fib_clntref);
}
#endif /* _FIB_LOOKUP_H */
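
A user-space sketch of why the old refcount handling goes wrong once lookups
stop taking references, as I read the patch above. The demo_* types and
functions are hypothetical; clntref stands in for fib_clntref:

```c
#include <assert.h>
#include <stddef.h>

struct demo_fib_info {
	int clntref;	/* stand-in for fib_clntref */
};

struct demo_fib_result {
	struct demo_fib_info *fi;	/* after the RCU conversion: no reference held */
};

/* Old behaviour: every assignment drops one reference and takes another. */
static void demo_assign_refcounted(struct demo_fib_result *res,
				   struct demo_fib_info *fi)
{
	if (res->fi != NULL) {
		assert(res->fi->clntref > 0);
		res->fi->clntref--;	/* drops a reference the lookup never took */
	}
	res->fi = fi;
	if (fi != NULL)
		fi->clntref++;		/* takes a reference nobody will drop */
}

/* Patched behaviour: a plain pointer store, lifetime is covered by RCU. */
static void demo_assign_rcu(struct demo_fib_result *res,
			    struct demo_fib_info *fi)
{
	res->fi = fi;
}
```

If two entries each start with one reference owned by the table, the
refcounted variant leaves the old entry at zero while it is still in use
(the "Freeing alive" symptom) and the new entry with a leaked reference,
which would match the usage count that never decrements at shutdown.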
^ permalink raw reply related
* Re: Freeing alive fib_info caused by ebc0ffae5
From: Michael Ellerman @ 2010-11-04 11:23 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1288869699.2659.77.camel@edumazet-laptop>
[-- Attachment #1: Type: text/plain, Size: 1295 bytes --]
On Thu, 2010-11-04 at 12:21 +0100, Eric Dumazet wrote:
> On Thursday 04 November 2010 at 11:30 +0100, Eric Dumazet wrote:
> > On Thursday 04 November 2010 at 21:23 +1100, Michael Ellerman wrote:
> > > Hi all,
> > >
> > > I'm running Linus' latest or thereabouts (ff8b16d), and I'm seeing
> > > "Freeing alive fib_info" messages, from free_fib_info().
> > >
> > > Actually I only get one per boot, when network interfaces come up.
> > > Seemingly related I am getting refcount problems when I shutdown, ie.
> > > unregister_netdevice() sees a usage count of 1, which never decrements.
> > >
> > > Bisect says it's ebc0ffae5 which causes the problem, or makes it appear.
> > >
> > > fib: RCU conversion of fib_lookup()
> > >
> > > fib_lookup() converted to be called in RCU protected context, no
> > > reference taken and released on a contended cache line (fib_clntref)
> > >
> > >
> > > Is this a bug in that commit, or a driver bug exposed?
> >
> > Hi Michael, thanks for the report (and painful bisection I guess)
> >
> > That's hard to say... Is it reproducible on my machine?
> >
>
> Hmm, a review of the code spotted a bug in fib_result_assign()
Aha, I was just adding some debug in there. Let me test the patch.
cheers
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply
* vhost-net-next updates
From: Michael S. Tsirkin @ 2010-11-04 11:26 UTC (permalink / raw)
To: Shirley Ma; +Cc: krkumar2, netdev, kvm, linux-kernel
I pushed out some optimization patches on vhost-net-next
branch on my vhost tree (intended for 2.6.38).
It would be helpful if people working on vhost-net optimizations
base their work on that tree just to make sure comparisons
are apples to apples.
I might rebase this as I didn't send a pull request to Dave yet
but I'll try not to. So far I have:
8b7347a vhost: get/put_user -> __get/__put_user
dfe5ac5 vhost: copy_to_user -> __copy_to_user
64e1c80 vhost-net: batch use/unuse mm
533a19b vhost: put mm after thread stop
3fcedec drivers/vhost/vhost.c: delete double assignment
Thanks!
--
MST
^ permalink raw reply
* Re: Freeing alive fib_info caused by ebc0ffae5
From: Michael Ellerman @ 2010-11-04 11:35 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1288869699.2659.77.camel@edumazet-laptop>
[-- Attachment #1: Type: text/plain, Size: 1301 bytes --]
On Thu, 2010-11-04 at 12:21 +0100, Eric Dumazet wrote:
>
> Hmm, a review of the code spotted a bug in fib_result_assign()
>
> Please try the following patch:
>
> Thanks again !
>
> [PATCH] fib: fib_result_assign() should not change fib refcounts
>
> After commit ebc0ffae5 (RCU conversion of fib_lookup()),
> fib_result_assign() should not change fib refcounts anymore.
>
> Thanks to Michael who did the bisection and bug report.
>
> Reported-by: Michael Ellerman <michael@ellerman.id.au>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> net/ipv4/fib_lookup.h | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
> index a29edf2..c079cc0 100644
> --- a/net/ipv4/fib_lookup.h
> +++ b/net/ipv4/fib_lookup.h
> @@ -47,11 +47,8 @@ extern int fib_detect_death(struct fib_info *fi, int order,
> static inline void fib_result_assign(struct fib_result *res,
> struct fib_info *fi)
> {
> - if (res->fi != NULL)
> - fib_info_put(res->fi);
> + /* we used to play games with refcounts, but we now use RCU */
> res->fi = fi;
> - if (fi != NULL)
> - atomic_inc(&fi->fib_clntref);
> }
>
> #endif /* _FIB_LOOKUP_H */
Perfect, that fixes it, thanks!
cheers
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply
* Re: [PATCH] virtio_net: Fix queue full check
From: Michael S. Tsirkin @ 2010-11-04 12:24 UTC (permalink / raw)
To: Rusty Russell; +Cc: Krishna Kumar2, davem, netdev, yvugenfi
In-Reply-To: <20101102161730.GA32311@redhat.com>
On Tue, Nov 02, 2010 at 06:17:30PM +0200, Michael S. Tsirkin wrote:
> On Fri, Oct 29, 2010 at 09:58:40PM +1030, Rusty Russell wrote:
> > On Fri, 29 Oct 2010 09:25:09 pm Krishna Kumar2 wrote:
> > > Rusty Russell <rusty@rustcorp.com.au> wrote on 10/29/2010 03:17:24 PM:
> > >
> > > > > Oct 17 10:22:40 localhost kernel: net eth0: Unexpected TX queue
> > > failure: -28
> > > > > Oct 17 10:28:22 localhost kernel: net eth0: Unexpected TX queue
> > > failure: -28
> > > > > Oct 17 10:35:58 localhost kernel: net eth0: Unexpected TX queue
> > > failure: -28
> > > > > Oct 17 10:41:06 localhost kernel: net eth0: Unexpected TX queue
> > > failure: -28
> > > > >
> > > > > I initially changed the check from -ENOMEM to -ENOSPC, but
> > > > > virtqueue_add_buf can return only -ENOSPC when it doesn't have
> > > > > space for new request. Patch removes redundant checks but
> > > > > displays the failure errno.
> > > > >
> > > > > Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> > > > > ---
> > > > > drivers/net/virtio_net.c | 15 ++++-----------
> > > > > 1 file changed, 4 insertions(+), 11 deletions(-)
> > > > >
> > > > > diff -ruNp org/drivers/net/virtio_net.c new/drivers/net/virtio_net.c
> > > > > --- org/drivers/net/virtio_net.c 2010-10-11 10:20:02.000000000 +0530
> > > > > +++ new/drivers/net/virtio_net.c 2010-10-21 17:37:45.000000000 +0530
> > > > > @@ -570,17 +570,10 @@ static netdev_tx_t start_xmit(struct sk_
> > > > >
> > > > > /* This can happen with OOM and indirect buffers. */
> > > > > if (unlikely(capacity < 0)) {
> > > > > - if (net_ratelimit()) {
> > > > > - if (likely(capacity == -ENOMEM)) {
> > > > > - dev_warn(&dev->dev,
> > > > > - "TX queue failure: out of memory\n");
> > > > > - } else {
> > > > > - dev->stats.tx_fifo_errors++;
> > > > > - dev_warn(&dev->dev,
> > > > > - "Unexpected TX queue failure: %d\n",
> > > > > - capacity);
> > > > > - }
> > > > > - }
> > > > > + if (net_ratelimit())
> > > > > + dev_warn(&dev->dev,
> > > > > + "TX queue failure (%d): out of memory\n",
> > > > > + capacity);
> > > >
> > > > Hold on... you were getting -ENOSPC, which shouldn't happen. What makes
> > > you
> > > > think it's out of memory?
> > >
> > > virtqueue_add_buf_gfp returns only -ENOSPC on failure, whether
> > > direct or indirect descriptors are used, so isn't -ENOSPC
> > > "expected"? (vring_add_indirect returns -ENOMEM on memory
> > > failure, but that is masked out and we go direct which is
> > > the failure point).
> >
> > Ah, OK, gotchya.
> > I'm not even sure the fallback to linear makes sense; if we're failing
> > kmallocs we should probably just return -ENOMEM. Would mean we can
> > tell the difference between "out of space" (which should never happen
> > since we stop the queue when we have < 2+MAX_SKB_FRAGS slots left)
> > and this case.
> >
> > Michael, what do you think?
> >
> > Thanks,
> > Rusty.
>
> Let's make sure I understand the issue: we use indirect buffers
> so we assume there's still a lot of place in the ring, then
> allocation for the indirect fails and so we return -ENOSPC?
>
> So first, I agree it's a bug. But I am not sure killing the fallback
> is such a good idea: recovering from add buf failure is hard
> generally, we should try to accommodate if we can. Let's just fix
> the return code for now?
>
> And generally, we should be smarter: as long as the ring is almost
> empty, and s/g list is short, it is a waste to use indirect buffers.
> BTW we have had a FIXME there for a long while, I think Yan suggested
> increasing that threshold to 3. Yan?
>
> Further, maybe preallocating some memory for the indirect buffers might
> be a good idea.
>
> In short, lots of good ideas, let's start with the minimal patch that is
> a good 2.6.37 candidate too. How about the following (untested)?
>
> virtio: fix add_buf return code for OOM
>
> add_buf returned ENOSPC on out of memory: this is a bug
> as at least virtio-net expects ENOMEM and handles it
> specially. Fix that.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
I thought about this some more. I think the original
code is actually correct in returning ENOSPC: indirect
buffers are nice, but it's a mistake
to rely on them as a memory allocation might fail.
And if you look at virtio-net, it is dropping packets
under memory pressure which is not really a happy outcome:
the packet will get freed, reallocated and we get another one,
adding pressure on the allocator instead of releasing it
until we free up some buffers.
So I now think we should calculate the capacity
assuming non-indirect entries, and if we manage to
use indirect, all the better.
So below is what I propose now - as a replacement for
my original patch. Krishna Kumar, Rusty, what do you think?
Separately I'm also considering moving the
if (vq->num_free < out + in)
check earlier in the function to keep all users honest,
but need to check what the implications are for e.g. block.
Thoughts on this?
---->
virtio: return correct capacity to users
We can't rely on indirect buffers for capacity
calculations because they need a memory allocation
which might fail.
So return the number of buffers we can guarantee users.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1475ed6..cc2f73e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -230,9 +230,6 @@ add_head:
pr_debug("Added buffer head %i to %p\n", head, vq);
END_USE(vq);
- /* If we're indirect, we can fit many (assuming not OOM). */
- if (vq->indirect)
- return vq->num_free ? vq->vring.num : 0;
return vq->num_free;
}
EXPORT_SYMBOL_GPL(virtqueue_add_buf_gfp);
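To illustrate the effect of the change, here is a minimal user-space sketch, not kernel code. The struct toy_vq type and the capacity_old(), capacity_new() and toy_should_stop_queue() helpers are hypothetical names made up for this example; only num_free, the indirect flag and the "2 + MAX_SKB_FRAGS" stop threshold come from the thread.

```c
#include <stdbool.h>

/* Toy stand-in for the virtqueue state used by the return statement. */
struct toy_vq {
	unsigned int num_free;  /* free descriptors left in the ring */
	unsigned int ring_size; /* stands in for vq->vring.num */
	bool indirect;          /* indirect descriptors are available */
};

/* Behaviour before the patch: with indirect descriptors, report the
 * whole ring as available as long as one slot is free. This is only
 * true if the indirect table allocation succeeds. */
int capacity_old(const struct toy_vq *vq)
{
	if (vq->indirect)
		return vq->num_free ? (int)vq->ring_size : 0;
	return (int)vq->num_free;
}

/* Behaviour after the patch: report only what is guaranteed even if
 * every indirect allocation fails. */
int capacity_new(const struct toy_vq *vq)
{
	return (int)vq->num_free;
}

/* Caller-side check as described in the thread: stop the queue when
 * fewer than 2 + max_frags slots are left. max_frags is a parameter
 * here because MAX_SKB_FRAGS depends on the kernel configuration. */
bool toy_should_stop_queue(int capacity, int max_frags)
{
	return capacity < 2 + max_frags;
}
```

With 3 free slots in a 256-entry ring, the old calculation reports 256 and the queue keeps running, while the new one reports 3 and the caller stops the queue before an add can fail.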
^ permalink raw reply related
* Re: [PATCH 2/2] inet_diag: Make sure we actually run the same bytecode we audited.
From: Thomas Graf @ 2010-11-04 13:28 UTC (permalink / raw)
To: Nelson Elhage; +Cc: netdev
In-Reply-To: <1288838141-17871-2-git-send-email-nelhage@ksplice.com>
On Wed, Nov 03, 2010 at 10:35:41PM -0400, Nelson Elhage wrote:
> We were using nlmsg_find_attr() to look up the bytecode by attribute when
> auditing, but then just using the first attribute when actually running
> bytecode. So, if we received a message with two attribute elements, where only
> the second had type INET_DIAG_REQ_BYTECODE, we would validate and run different
> bytecode strings.
>
> Fix this by consistently using nlmsg_find_attr everywhere.
>
> Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Both patches look good.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
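The mismatch described in the quoted changelog can be shown with a small user-space sketch. Everything here is hypothetical: struct toy_attr, TOY_REQ_BYTECODE, toy_find_attr() and toy_first_attr() only mimic the two lookup strategies and are not the netlink API.

```c
#include <stddef.h>

#define TOY_REQ_BYTECODE 2 /* hypothetical attribute type for this sketch */

struct toy_attr {
	int type;
	const char *payload;
};

/* Lookup by type, playing the role of nlmsg_find_attr(): this is what
 * the audit step used. */
const struct toy_attr *toy_find_attr(const struct toy_attr *attrs,
				     size_t n, int type)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (attrs[i].type == type)
			return &attrs[i];
	return NULL;
}

/* Blindly take the first attribute: this is what the run step used
 * before the fix. */
const struct toy_attr *toy_first_attr(const struct toy_attr *attrs, size_t n)
{
	return n ? &attrs[0] : NULL;
}
```

For a message whose first attribute is of some other type and whose second is the bytecode, the two helpers return different attributes, so the audited payload is not the one that gets executed. Using the by-type lookup in both places removes the gap.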
^ permalink raw reply
* Re: Routing over multiple interfaces
From: Patrick Schaaf @ 2010-11-04 12:51 UTC (permalink / raw)
To: netdev
In-Reply-To: <4CD08C6D.1090107@arndnet.de>
> iptables -t mangle -A PREROUTING -d $EXTERNAL -m statistic --mode nth --every 2 -j MARK --set-mark 6
If the statistic match is missing, a pretty good alternative I recently
"found" is using u32 to match for a bit from the IP ID. That is a
stateless decision, and here it probably has the theoretical advantage
of putting all fragments of a given packet onto the same link.
iptables -t mangle -A PREROUTING ... -m u32 --u32 0x2&0x1=0x0 -j MARK
--set-mark 6
best regards
Patrick
^ permalink raw reply
* Re: [RFC][net-next-2.6 PATCH 2/4] net: 8021Q consolidate header_ops routines
From: John Fastabend @ 2010-11-04 13:43 UTC (permalink / raw)
To: Jesse Gross; +Cc: netdev@vger.kernel.org
In-Reply-To: <AANLkTi=pObCfPHkPCUxD8yicZL3pyTwm9s_z4KKda62k@mail.gmail.com>
On 11/3/2010 5:47 PM, Jesse Gross wrote:
> On Thu, Oct 21, 2010 at 3:10 PM, John Fastabend
> <john.r.fastabend@intel.com> wrote:
>> The only thing the 8021Q header ops routines are required
>> for is the VLAN_FLAG_REORDER_HDR otherwise by the time
>> the VLAN tag has been added the packet is already on
>> its way down the stack. In this case using the Ethernet
>> ops works OK.
>>
>> At present the VLAN_FLAG_REORDER_HDR flag does not work
>> with vlan offloads. As I understand the flag the intent
>> is to allow taps on the vlan device and possibly the
>> QOS layer to see the vlan tag info.
>>
>> By inserting the tag in vlan_tci any taps or QOS policies
>> should be able to retrieve the vlan info. This allows
>> the flag to work the same in both the offload case and
>> non-offloaded case. And allows us to use the underlying
>> ethernet ops.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>
> I noticed that you dropped this patch from your most recent series, so
> I went back to take a look at it. I realized that it probably works
> inconsistently since header caching doesn't take into account
> skb->vlan_tci, so whether you see the tag depends on the state of the
> cache.
>
> It would be really good to have this type of code consolidation, both
> for the sake of sanity and to eliminate the inconsistent behavior. We
> could do that by either not using header caching or making it work
> with vlan offloading somehow. However, I'm not sure that there's
> really much point in that. VLAN_FLAG_REORDER_HDR doesn't work with
> cards that do vlan offloading, which is a pretty significant number of
> them. It similarly works inconsistently on the rx side. So it's
> broken most of the time and worse, the behavior changes depending on
> the NIC (and now the ethtool setting). Can we just eliminate it?
Yes, this is why I have dropped it for now. Also, rebuild is broken as best I can tell, although I doubt anyone would notice: you would need to clear VLAN_FLAG_REORDER_HDR and be using one of ARPHRD_{ROSE|AX25|NETROM}.
The problem with caching the vlan header is the skb priority to vlan priority map. We could cache the vid, sa, da, and protocols, but I cannot see any way to cache the vlan priority. Also, the cache would have to be flushed when the flag is toggled.
Thanks,
John.
^ permalink raw reply
* Re: Routing over multiple interfaces
From: Eric Dumazet @ 2010-11-04 14:01 UTC (permalink / raw)
To: Patrick Schaaf; +Cc: netdev
In-Reply-To: <1288875102.4357.40.camel@lat1>
Le jeudi 04 novembre 2010 à 13:51 +0100, Patrick Schaaf a écrit :
> > iptables -t mangle -A PREROUTING -d $EXTERNAL -m statistic --mode nth --every 2 -j MARK --set-mark 6
>
> If statistics match is missing, a pretty good alternative I recently
> "found" is using u32 to match for a bit from the IP ID. That is a
> stateless decision, and here it probably has the theoretical advantage
> of putting all fragments of a given packet onto the same link.
>
> iptables -t mangle -A PREROUTING ... -m u32 --u32 0x2&0x1=0x0 -j MARK
> --set-mark 6
Sure, that's a good tip/idea, but note that many UDP frames have IP.id = 0
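For reference, a small user-space sketch of what the quoted "--u32 0x2&0x1=0x0" test evaluates. The function name u32_rule_matches() is made up for this example; the logic follows the u32 match semantics of reading four bytes at the given offset in network byte order, masking, and comparing.

```c
#include <stdint.h>

/* Emulates "--u32 0x2&0x1=0x0" on a raw IPv4 header.
 * Bytes 2-3 are the total length and bytes 4-5 the identification
 * field, so the lowest bit of the 32-bit value read at offset 2 is
 * the lowest bit of the IP ID. Returns 1 if the rule would match. */
int u32_rule_matches(const uint8_t *iph)
{
	uint32_t v = ((uint32_t)iph[2] << 24) |
		     ((uint32_t)iph[3] << 16) |
		     ((uint32_t)iph[4] << 8)  |
		      (uint32_t)iph[5];

	return (v & 0x1u) == 0x0u;
}
```

An even IP ID matches and gets the mark, an odd one does not. A flow whose packets all carry IP ID 0 always matches, so every packet of that flow ends up on the same link and the split is no longer even.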
^ permalink raw reply
* Re: Linux 2.6.37-rc1 (net/sched: cls_cgroup)
From: Randy Dunlap @ 2010-11-04 15:56 UTC (permalink / raw)
To: Herbert Xu
Cc: Eric Dumazet, Linus Torvalds, Jamal Hadi Salim, Thomas Graf,
Linux Kernel Mailing List, netdev, Ben Blum
In-Reply-To: <20101103233105.GA26124@gondor.apana.org.au>
On 11/03/10 16:31, Herbert Xu wrote:
> On Wed, Nov 03, 2010 at 11:01:17PM +0100, Eric Dumazet wrote:
>>
>> commits 8e039d84b323c450
>> (cgroups: net_cls as module)
>>
>> followed by commit f845172531f
>> (cls_cgroup: Store classid in struct sock)
>
> Indeed, it looks like the tree I worked on didn't have the first
> patch applied for some reason.
>
> Anyway, this patch should fix the problem. Thanks Eric!
>
> cls_cgroup: Fix crash on module unload
>
> Somewhere along the lines net_cls_subsys_id became a macro when
> cls_cgroup is built as a module. Not only did it make cls_cgroup
> completely useless, it also causes it to crash on module unload.
>
> This patch fixes this by removing that macro.
>
> Thanks to Eric Dumazet for diagnosing this problem.
>
> Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Thanks.
>
> diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
> index 37dff78..d49c40f 100644
> --- a/net/sched/cls_cgroup.c
> +++ b/net/sched/cls_cgroup.c
> @@ -34,8 +34,6 @@ struct cgroup_subsys net_cls_subsys = {
> .populate = cgrp_populate,
> #ifdef CONFIG_NET_CLS_CGROUP
> .subsys_id = net_cls_subsys_id,
> -#else
> -#define net_cls_subsys_id net_cls_subsys.subsys_id
> #endif
> .module = THIS_MODULE,
> };
>
> Cheers,
--
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
^ permalink raw reply
* Re: [PATCH] virtio_net: Fix queue full check
From: Krishna Kumar2 @ 2010-11-04 16:17 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: davem, netdev, Rusty Russell, yvugenfi
In-Reply-To: <20101104122424.GA29830@redhat.com>
"Michael S. Tsirkin" <mst@redhat.com> wrote on 11/04/2010 05:54:24 PM:
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> I thought about this some more. I think the original
> code is actually correct in returning ENOSPC: indirect
> buffers are nice, but it's a mistake
> to rely on them as a memory allocation might fail.
>
> And if you look at virtio-net, it is dropping packets
> under memory pressure which is not really a happy outcome:
> the packet will get freed, reallocated and we get another one,
> adding pressure on the allocator instead of releasing it
> until we free up some buffers.
>
> So I now think we should calculate the capacity
> assuming non-indirect entries, and if we manage to
> use indirect, all the better.
>
> So below is what I propose now - as a replacement for
> my original patch. Krishna Kumar, Rusty, what do you think?
>
> Separately I'm also considering moving the
> if (vq->num_free < out + in)
> check earlier in the function to keep all users honest,
> but need to check what the implications are for e.g. block.
> Thoughts on this?
This looks like the right thing to do. Besides this, I
think virtio-net still needs to remove check for ENOMEM?
I will test this patch tomorrow.
Another question about add_recvbuf_small and
add_recvbuf_big - both call virtqueue_add_buf_gfp with
in+out > 1, and that can fail with -ENOSPC. So try_fill_recv
gets -ENOSPC. When that happens, oom is not set to true,
I thought it should have got set. Is this a bug?
Thanks,
- KK
^ permalink raw reply
* Re: [PATCH] virtio_net: Fix queue full check
From: Michael S. Tsirkin @ 2010-11-04 16:45 UTC (permalink / raw)
To: Krishna Kumar2; +Cc: davem, netdev, Rusty Russell, yvugenfi
In-Reply-To: <OF9196A33D.75E1F25E-ON652577D1.0057D889-652577D1.00592104@in.ibm.com>
On Thu, Nov 04, 2010 at 09:47:04PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 11/04/2010 05:54:24 PM:
>
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >
> > I thought about this some more. I think the original
> > code is actually correct in returning ENOSPC: indirect
> > buffers are nice, but it's a mistake
> > to rely on them as a memory allocation might fail.
> >
> > And if you look at virtio-net, it is dropping packets
> > under memory pressure which is not really a happy outcome:
> > the packet will get freed, reallocated and we get another one,
> > adding pressure on the allocator instead of releasing it
> > until we free up some buffers.
> >
> > So I now think we should calculate the capacity
> > assuming non-indirect entries, and if we manage to
> > use indirect, all the better.
> >
> > So below is what I propose now - as a replacement for
> > my original patch. Krishna Kumar, Rusty, what do you think?
> >
> > Separately I'm also considering moving the
> > if (vq->num_free < out + in)
> > check earlier in the function to keep all users honest,
> > but need to check what the implications are for e.g. block.
> > Thoughts on this?
>
> This looks like the right thing to do. Besides this, I
> think virtio-net still needs to remove check for ENOMEM?
Yes, the only valid reason for failure would be an unexpected error.
No need to special-case ENOMEM anymore.
> I will test this patch tomorrow.
>
> Another question about add_recvbuf_small and
> add_recvbuf_big - both call virtqueue_add_buf_gfp with
> in+out > 1, and that can fail with -ENOSPC. So try_fill_recv
> gets -ENOSPC. When that happens, oom is not set to true,
> I thought it should have got set. Is this a bug?
>
> Thanks,
>
> - KK
I don't see a bug: on ENOSPC we don't need to (and can't) add any more
buffers, and we know we will make progress since there must be some buffers
in the ring already. ENOMEM makes us try again later with more buffers
(and possibly a more aggressive GFP flag). What's wrong?
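A toy model of that distinction, as a user-space sketch rather than the driver code. The struct toy_ring type and the toy_add_buf() and toy_try_fill() helpers are hypothetical; only the two error codes and the meaning of the oom result are taken from the discussion.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy receive ring: free_slots models ring space, allocs_left models
 * how many buffer allocations will still succeed. */
struct toy_ring {
	unsigned int free_slots;
	unsigned int allocs_left;
};

/* Allocate a buffer, then post it. Allocation failure gives -ENOMEM,
 * a full ring gives -ENOSPC. */
int toy_add_buf(struct toy_ring *r)
{
	if (r->allocs_left == 0)
		return -ENOMEM;
	if (r->free_slots == 0)
		return -ENOSPC;
	r->allocs_left--;
	r->free_slots--;
	return 0;
}

/* Fill the ring until an add fails. Returns true only for -ENOMEM,
 * meaning the caller should retry later. -ENOSPC just means the ring
 * is full, so buffers are already posted and progress is guaranteed. */
bool toy_try_fill(struct toy_ring *r, unsigned int *added)
{
	bool oom = false;
	int err;

	*added = 0;
	for (;;) {
		err = toy_add_buf(r);
		if (err < 0) {
			oom = (err == -ENOMEM);
			break;
		}
		(*added)++;
	}
	return oom;
}
```

A ring that fills completely ends on -ENOSPC with oom false, while a ring that runs out of memory partway ends on -ENOMEM with oom true and a retry is scheduled.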
--
MST
^ permalink raw reply
* Re: RFC: Crypto API User-interface
From: Herbert Xu @ 2010-11-04 17:34 UTC (permalink / raw)
To: Linux Crypto Mailing List, netdev, Linux Kernel Mailing List
In-Reply-To: <20101019134418.GA13514@gondor.apana.org.au>
On Tue, Oct 19, 2010 at 09:44:18PM +0800, Herbert Xu wrote:
>
> OK I've gone ahead and implemented the user-space API for hashes
> and ciphers.
Here is a revised series with bug fixes and improvements. The
main change is that hashes can now be finalised by recvmsg instead
of requiring a preceding sendmsg with no MSG_MORE.
Thanks to Miloslav Trmac for reviewing this and contributing
fixes and improvements.
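As a rough illustration of finalising a hash by receiving, here is a user-space sketch. It assumes the interface as it later appeared in mainline (struct sockaddr_alg from <linux/if_alg.h>, bind() on a SOCK_SEQPACKET socket, then accept() for an operation socket), which may differ in detail from this revision of the series. The function name toy_af_alg_sha1() is made up for the example.

```c
#include <linux/if_alg.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef AF_ALG
#define AF_ALG 38 /* value assigned in patch 1/4 of this series */
#endif

/* Hash len bytes of data with SHA-1 through an AF_ALG socket.
 * Returns 0 and fills digest (20 bytes) on success, -1 on any
 * failure, including kernels without AF_ALG support. */
int toy_af_alg_sha1(const void *data, size_t len, unsigned char digest[20])
{
	struct sockaddr_alg sa;
	int tfm, op, ret = -1;

	memset(&sa, 0, sizeof(sa));
	sa.salg_family = AF_ALG;
	strcpy((char *)sa.salg_type, "hash");
	strcpy((char *)sa.salg_name, "sha1");

	tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
	if (tfm < 0)
		return -1;
	if (bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		goto out_tfm;

	op = accept(tfm, NULL, NULL);
	if (op < 0)
		goto out_tfm;

	/* MSG_MORE keeps the hash open; no closing send follows. */
	if (send(op, data, len, MSG_MORE) != (ssize_t)len)
		goto out_op;

	/* The receive finalises the hash and returns the digest. */
	if (recv(op, digest, 20, 0) != 20)
		goto out_op;

	ret = 0;
out_op:
	close(op);
out_tfm:
	close(tfm);
	return ret;
}
```

The point of the sketch is the last two calls: the data is sent with MSG_MORE set and the digest is still obtained, because the receive side performs the finalisation.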
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* [PATCH 1/4] net - Add AF_ALG macros
From: Herbert Xu @ 2010-11-04 17:36 UTC (permalink / raw)
To: Linux Crypto Mailing List, netdev, Linux Kernel Mailing List
In-Reply-To: <20101104173456.GA1321@gondor.apana.org.au>
net - Add AF_ALG macros
This patch adds the socket family/level macros for the yet-to-be-born
AF_ALG family. The AF_ALG family provides the user-space interface
for the kernel crypto API.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
include/linux/socket.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 5146b50..ebc081b 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -193,7 +193,8 @@ struct ucred {
#define AF_PHONET 35 /* Phonet sockets */
#define AF_IEEE802154 36 /* IEEE802154 sockets */
#define AF_CAIF 37 /* CAIF sockets */
-#define AF_MAX 38 /* For now.. */
+#define AF_ALG 38 /* Algorithm sockets */
+#define AF_MAX 39 /* For now.. */
/* Protocol families, same as address families. */
#define PF_UNSPEC AF_UNSPEC
@@ -234,6 +235,7 @@ struct ucred {
#define PF_PHONET AF_PHONET
#define PF_IEEE802154 AF_IEEE802154
#define PF_CAIF AF_CAIF
+#define PF_ALG AF_ALG
#define PF_MAX AF_MAX
/* Maximum queue length specifiable by listen. */
@@ -307,6 +309,7 @@ struct ucred {
#define SOL_RDS 276
#define SOL_IUCV 277
#define SOL_CAIF 278
+#define SOL_ALG 279
/* IPX options */
#define IPX_TYPE 1
^ permalink raw reply related