Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 1/5] First Patch on TFRC-SP. Copy base files from TFRC
From: gerrit @ 2009-09-19  6:16 UTC (permalink / raw)
  To: Ivo Calado; +Cc: Gerrit Renker, dccp, netdev
In-Reply-To: <cb00fa210909141738l60d02cd5p287d990326ce6c3c@mail.gmail.com>

>> Also separated the conditions
>> +               if ((len <= 0) ||
>> +                   (!tfrc_lh_closed_check(cur,
>> cong_evt->tfrchrx_ccval))) {
>> back into
>>                if (len <= 0)
>>                        return false;
>>
>>                if (!tfrc_lh_closed_check(cur, cong_evt->tfrchrx_ccval))
>>                        return false;
>>
>
> Thanks!
Yes I know, the above change is reintroduced by patch 2/2. Only found
out after I had gone through this one.


>> The following function pokes a hole in thei so far "abstract" data type;
>> the convention has been to access the internals of the struct only via
>> get-functions:
>>
>> static inline struct tfrc_loss_interval
>>        *tfrc_lh_get_loss_interval(struct tfrc_loss_hist *lh, const u8 i)
>> {
>>        BUG_ON(i >= lh->counter);
>>        return lh->ring[LIH_INDEX(lh->counter - i - 1)];
>> }
>>
>> (You use it in patch 3/5 to gain access to li_ccval and li_losses.
>> Better would be to have two separate accessor functions.)
>>
>
> Okay, I will fix this.
>
It would be great but is secondary at this stage. The primary objective
should be to get a common prototype out soon, and then verify that it is
correct. I expect several rewrites of other code to make this possible,
so the above detail can also be fixed once a prototype has been found to
work satisfactorily.


^ permalink raw reply

* Re: [patch] ipvs: Use atomic operations atomicly
From: Simon Horman @ 2009-09-19  9:51 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: lvs-devel, netdev, netfilter-devel, ?? shin hong, David Miller
In-Reply-To: <4A9E6B62.8080208@trash.net>

[ re-post with CC list ]

On Wed, Sep 02, 2009 at 02:56:02PM +0200, Patrick McHardy wrote:
> Simon Horman wrote:
> > On Mon, Aug 31, 2009 at 02:22:26PM +0200, Patrick McHardy wrote:
> >> It seems that proc_do_sync_threshold() should check whether this value
> >> is zero. The current checks also look racy since incorrect values are
> >> first updated, then overwritten again.
> > 
> > I'm wondering if an approach along the lines of the following is valid.
> > The idea is that the value in the ctl_table is essentially a scratch
> > value that is used by the parser and then copied into ip_vs_sync_threshold
> > if it is valid.
> 
> Even simpler would be to use a temporary buffer on the stack for copying
> the values from userspace and then copy them to the final buffer after
> validation.

Do you mean something like what I have done using read_table in
the new patch below?

> > I'm concerned that there are atomicity issues
> > surrounding writing ip_vs_sync_threshold while there might be readers.
> 
> That might be a problem if they are required to be "synchronized".

I don't think that the sychronisation needs extend beyond their
use in this snippet.

> 
> > --- a/net/netfilter/ipvs/ip_vs_core.c
> > +++ b/net/netfilter/ipvs/ip_vs_core.c
> > @@ -1362,8 +1362,7 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb,
> >  	    (ip_vs_sync_state & IP_VS_STATE_MASTER) &&
> >  	    (((cp->protocol != IPPROTO_TCP ||
> >  	       cp->state == IP_VS_TCP_S_ESTABLISHED) &&
> > -	      (pkts % sysctl_ip_vs_sync_threshold[1]
> > -	       == sysctl_ip_vs_sync_threshold[0])) ||
> > +	      (pkts % ip_vs_sync_threshold[1] == ip_vs_sync_threshold[0])) ||
> >  	     ((cp->protocol == IPPROTO_TCP) && (cp->old_state != cp->state) &&
> >  	      ((cp->state == IP_VS_TCP_S_FIN_WAIT) ||
> >  	       (cp->state == IP_VS_TCP_S_CLOSE_WAIT) ||
> > diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> > index fba2892..8a9ff21 100644
> > --- a/net/netfilter/ipvs/ip_vs_ctl.c
> > +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> > @@ -76,6 +76,11 @@ static atomic_t ip_vs_dropentry = ATOMIC_INIT(0);
> >  /* number of virtual services */
> >  static int ip_vs_num_services = 0;
> >  
> > +/* threshold handling */
> > +static int ip_vs_sync_threshold_min = 0;
> > +static int ip_vs_sync_threshold_max = INT_MAX;
> > +int ip_vs_sync_threshold[2] = { 3, 50 };
> > +
> 
> min should be 1 I guess or you still need to manually check
> that ip_vs_sync_threshold[1] != 0 to avoid a division be zero.

I think that the check that val[0] < val[1] ensures this.

Something that I noticed while putting this new patch together,
should the code be checking the value of rc and only assigning  values if
its 0. If so, I probably needs to be changed in a few places.

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index fba2892..c33ef7d 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -76,6 +76,9 @@ static atomic_t ip_vs_dropentry = ATOMIC_INIT(0);
 /* number of virtual services */
 static int ip_vs_num_services = 0;
 
+static int ip_vs_sync_threshold_min = 0;
+static int ip_vs_sync_threshold_max = INT_MAX;
+
 /* sysctl variables */
 static int sysctl_ip_vs_drop_entry = 0;
 static int sysctl_ip_vs_drop_packet = 0;
@@ -1520,18 +1523,17 @@ static int
 proc_do_sync_threshold(ctl_table *table, int write, struct file *filp,
 		       void __user *buffer, size_t *lenp, loff_t *ppos)
 {
+	struct ctl_table read_table;
 	int *valp = table->data;
 	int val[2];
 	int rc;
 
-	/* backup the value first */
-	memcpy(val, valp, sizeof(val));
+	memcpy(&read_table, table, sizeof(read_table));
+	read_table.data = &val;
 
-	rc = proc_dointvec(table, write, filp, buffer, lenp, ppos);
-	if (write && (valp[0] < 0 || valp[1] < 0 || valp[0] >= valp[1])) {
-		/* Restore the correct value */
+	rc = proc_dointvec_minmax(&read_table, write, filp, buffer, lenp, ppos);
+	if (write && (val[0] >= val[1]))
 		memcpy(valp, val, sizeof(val));
-	}
 	return rc;
 }
 
@@ -1698,6 +1700,8 @@ static struct ctl_table vs_vars[] = {
 		.maxlen		= sizeof(sysctl_ip_vs_sync_threshold),
 		.mode		= 0644,
 		.proc_handler	= proc_do_sync_threshold,
+		.extra1		= &ip_vs_sync_threshold_min,
+		.extra2		= &ip_vs_sync_threshold_max,
 	},
 	{
 		.procname	= "nat_icmp_send",

^ permalink raw reply related

* Re: [PATCH] cpmac: fix compilation errors against undeclared BUS_ID_SIZE
From: Florian Fainelli @ 2009-09-19 10:43 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Ralf Baechle, linux-mips
In-Reply-To: <200909160944.24265.florian@openwrt.org>

David,

Ping ? This fixes a build failure. Thank you very much !

Le mercredi 16 septembre 2009 09:44:22, Florian Fainelli a écrit :
> Hi David,
>
> This is relevant for 2.6.32-rc0, thanks !
> --
> From: Florian Fainelli <florian@openwrt.org>
> Subject: [PATCH] cpmac: fix compilation errors against undeclared
> BUS_ID_SIZE
>
> With the removal of BUS_ID_SIZE, cpmac was not fully
> converted to use MII_BUS_ID_SIZE as it ought to. This
> patch fixes the following cpmac build failure:
>  CC      drivers/net/cpmac.o
> drivers/net/cpmac.c: In function 'cpmac_start_xmit':
> drivers/net/cpmac.c:563: warning: comparison of distinct pointer types
> lacks a cast drivers/net/cpmac.c: In function 'cpmac_probe':
> drivers/net/cpmac.c:1112: error: 'BUS_ID_SIZE' undeclared (first use in
> this function) drivers/net/cpmac.c:1112: error: (Each undeclared identifier
> is reported only once drivers/net/cpmac.c:1112: error: for each function it
> appears in.)
>
> Reported-by: Ralf Baechle <ralf@linux-mips.org>
> Signed-off-by: Florian Fainelli <florian@openwrt.org>
> ---
> diff --git a/drivers/net/cpmac.c b/drivers/net/cpmac.c
> index 3e3fab8..61f9da2 100644
> --- a/drivers/net/cpmac.c
> +++ b/drivers/net/cpmac.c
> @@ -1109,7 +1109,7 @@ static int external_switch;
>  static int __devinit cpmac_probe(struct platform_device *pdev)
>  {
>  	int rc, phy_id;
> -	char mdio_bus_id[BUS_ID_SIZE];
> +	char mdio_bus_id[MII_BUS_ID_SIZE];
>  	struct resource *mem;
>  	struct cpmac_priv *priv;
>  	struct net_device *dev;
> @@ -1118,7 +1118,7 @@ static int __devinit cpmac_probe(struct
> platform_device *pdev) pdata = pdev->dev.platform_data;
>
>  	if (external_switch || dumb_switch) {
> -		strncpy(mdio_bus_id, "0", BUS_ID_SIZE); /* fixed phys bus */
> +		strncpy(mdio_bus_id, "0", MII_BUS_ID_SIZE); /* fixed phys bus */
>  		phy_id = pdev->id;
>  	} else {
>  		for (phy_id = 0; phy_id < PHY_MAX_ADDR; phy_id++) {
> @@ -1126,7 +1126,7 @@ static int __devinit cpmac_probe(struct
> platform_device *pdev) continue;
>  			if (!cpmac_mii->phy_map[phy_id])
>  				continue;
> -			strncpy(mdio_bus_id, cpmac_mii->id, BUS_ID_SIZE);
> +			strncpy(mdio_bus_id, cpmac_mii->id, MII_BUS_ID_SIZE);
>  			break;
>  		}
>  	}
> @@ -1167,7 +1167,7 @@ static int __devinit cpmac_probe(struct
> platform_device *pdev) priv->msg_enable = netif_msg_init(debug_level,
> 0xff);
>  	memcpy(dev->dev_addr, pdata->dev_addr, sizeof(dev->dev_addr));
>
> -	snprintf(priv->phy_name, BUS_ID_SIZE, PHY_ID_FMT, mdio_bus_id, phy_id);
> +	snprintf(priv->phy_name, MII_BUS_ID_SIZE, PHY_ID_FMT, mdio_bus_id,
> phy_id);
>
>  	priv->phy = phy_connect(dev, priv->phy_name, &cpmac_adjust_link, 0,
>  						PHY_INTERFACE_MODE_MII);

-- 
Best regards, Florian Fainelli
Email: florian@openwrt.org
Web: http://openwrt.org
IRC: [florian] on irc.freenode.net
-------------------------------

^ permalink raw reply

* b43 is broken in latest net-2.6 and linux-2.6
From: Oliver Hartkopp @ 2009-09-19 11:23 UTC (permalink / raw)
  To: Michael Buesch; +Cc: Linux Netdev List

Hello Michael,

my b43 wireless card (Dell 830) is not working with the latest net-2.6 (and
also linux-2.6 2.6.31-05767-gdf58bee).

net-2.6 2.6.31-03263-gc29854e is working
net-2.6 2.6.31-03301-ga97e178 is broken

I removed the patch with the work_queue stuff which did not help - so it's
probably the other patch you added to b43 recently.

Don't know ... the wlan0 link does not become ready anymore.

If you need some more information - please let me know.

Best regards,
Oliver

^ permalink raw reply

* Re: [PATCH 2/5] Implement loss counting on TFRC-SP receiver
From: gerrit @ 2009-09-19 12:11 UTC (permalink / raw)
  To: Ivo Calado; +Cc: Gerrit Renker, dccp, netdev
In-Reply-To: <cb00fa210909141739y234302b3x9c6df23b057a9473@mail.gmail.com>

>>                s64 len = dccp_delta_seqno(cur->li_seqno,
>> cong_evt_seqno);
>>                if ((len <= 0) ||
>>                    (!tfrc_lh_closed_check(cur,
>> cong_evt->tfrchrx_ccval))) {
>> +                       cur->li_losses += rh->num_losses;
>>                        return false;
>>                }
>> This has a multiplicative effect, since rh->num_losses is added to
cur->li_losses
>> each time the condition is evaluated. E.g. if 3 times in a row
reordered
>> (earlier)
>> sequence numbers arrive, or if the CCvals do not differ (high-speed
networks),
>> we end up with 3 * rh->num_losses, which can't be correct.
>
>
> The following code would be correct then?
>
>               if ((len <= 0) ||
>                   (!tfrc_lh_closed_check(cur, cong_evt->tfrchrx_ccval)))
{
> +                       cur->li_losses += rh->num_losses;
> +                       rh->num_losses  = 0;
>                       return false;
> With this change I suppose the could be fixed. With that, the
> rh->num_losses couldn't added twice. Am I correct?
>
>
The function tfrc_lh_interval_add() is called when
 * __two_after_loss() returns true (a new loss is detected) or
 * a data packet is ECN-CE marked.

I am still not sure about the 'len <= 0' case; this would be true
if an ECN-marked packet arrives whose sequence number is 'before'
the start of the current loss interval, or if a loss is detected
which is older than the start of the current loss interval.

The other case (tfrc_lh_closed_check) returns 1 if the current loss
interval is 'closed' according to RFC 4342, 10.2.

Intuitively, in the first case it refers to the preceding loss
interval (i.e. not cur->...), in the second case it seems correct.

Doing the first case is complicated due to going back in history.
The simplest solution I can think of at the moment is to ignore
the exception-case of reordered packets and do something like

    if (len <= 0) {
       /* FIXME: this belongs into the previous loss interval */  
tfrc_pr_debug("Warning: ignoring loss due to reordering");
       return false;
    }
    if (!tfrc_lh_closed_check(...)) {
        // your code from above
    }

However, there is a much deeper underlying question: currently the
implementation is not really what the specification says; if we
wanted to abide by the letter of the law, we would have to implement
the Loss Intervals Option first, and then sort out such details as
above. Discussion continues further below.

>> --- dccp_tree_work4.orig/net/dccp/ccids/lib/packet_history_sp.c +++
dccp_tree_work4/net/dccp/ccids/lib/packet_history_sp.c
>> @@ -244,6 +244,7 @@
>>                h->loss_count = 3;
>>                tfrc_sp_rx_hist_entry_from_skb(tfrc_rx_hist_entry(h, 3),
                                              skb, n3);
>> +               h->num_losses = dccp_loss_count(s2, s3, n3);
>>                return 1;
>>        }
>> This only measures the gap between s2 and s3, but the "hole" starts at s0,
>> so it would need to be dccp_loss_count(s0, s3, n3). Algorithm is
documented at
>> http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/\
>> ccid3/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt
>
> <snip>
>
>>  }
>> Here it is also between s0 and s2, not between s0 and s3. It is case
VI(c.3).
>> However, the above still is a crude approximation, since it only
measures between
>> the last sequence number received before the loss and the third
sequence
>> number
>> after the loss. It would be better to either
>>  * use the first sequence number after the loss (this can be s1, s2, or
s3) or
>>  * check if there are more holes between the first/second and the
second/third
>>   sequence numbers after the loss.
>> The second option would be the correct one, it should also take the NDP
counts
>> of each gap into account. And already we have a fairly complicated
algorithm.
>
>
>
> I'll study loss_detection_algorithm_notes.txt and correct the code. But
I have one question, that i don't know if is already answered by the
documentation:
> Further holes, between the the first and third packet received after the
hole are accounted only in future calls to the function, right? Because
the receiver needs to receive more packets to confirm loss, right?
> So, it's really necessary to look for other holes after the loss? Will
not this other holes be identified as losses in future?
I stand corrected, you are right: only the hole between
 * the highest sequence number before the loss (S0) and
 * the first sequence number after the loss
   (S1 or S3 depending on reordering)
are relevant.

Continuing the point from above, I would like to ask which way you would
like to go with your implementation:
 (a) receiver computes the Loss Event Rate, sender just uses this value
 (b) receiver only gathers the data (loss intervals, lost packets),
     sender does all the verification, book-keeping, and computation.

From reading your patches, I think it is going in the direction of (a).
But if this is the case, we don't need the Dropped Packets Option from
RFC 5622, 8.7. By definition it only makes sense if Loss Intervals
Options are also present.

So it is necessary to decide whether to go the full way, which means
 * support Loss Intervals and Dropped Packets alike
 * modify TFRC library (it will be a redesign)
 * modify receiver code
 * modify sender code,
or to use the present approach where
 * the receiver computes the Loss Rate and
 * a Mandatory Send Loss Event Rate feature is present during feature
   negotiation, to avoid problems with incompatible senders
   (there is a comment explaining this, in net/dccp/feat.c).

Thoughts?


^ permalink raw reply

* Re: [PATCH 4/5] Adds options DROPPED PACKETS and LOSS INTERVALS to receiver
From: gerrit @ 2009-09-19 13:16 UTC (permalink / raw)
  To: Ivo Calado; +Cc: Gerrit Renker, dccp, netdev
In-Reply-To: <cb00fa210909141740u585116c7i45165054d741a022@mail.gmail.com>

>> | Adds options DROPPED PACKETS and LOSS INTERVALS to receiver.
I must admit that I did not look at this deeply enough to be able to
say whether it would work or not. The comments that were sent were after
the first reading.

Whether to add the Loss Intervals / Dropped Packet options is related to
the question in patch 2/5. This needs to be clarified first: you do add
the Loss Intervals option, but if you do it, the division of the loss
intervals is not necessary - unless I am missing something here, this
computation is done by the sender.

If I understand RFC 4342/4828/5622 correctly, the sender would need to
keep track of the RTTs for each sent loss interval. Since the loss
interval boundaries are set by the receiver, the sender would need to
store the window counter value (or the RTT). RFC 4828 is a bit misleading
since it quotes RFC 3448/5348 (where the receiver computes the loss
event rate), whereas CCID-4 is based on RFC 4342 (where the sender
normally computes the loss event rate).

>> The condition above should be '&&', not '||'. Suggested alternative:
>>
>> +       if (tfrc_lh_slab == NULL)
>> +               goto lh_failed;
>> +
>> +       tfrc_ld_slab = kmem_cache_create("tfrc_sp_li_data",
>> +                                        sizeof(struct
>> tfrc_loss_data_entry), 0,
>> +                                        SLAB_HWCACHE_ALIGN, NULL);
>> +       if (tfrc_ld_slab != NULL)
>> +               return 0;
>> +
>> +       kmem_cache_destroy(tfrc_lh_slab);
>> +       tfrc_lh_slab = NULL;
>> +lh_failed:
>> +       return -ENOBUFS;
>>  }
>>
>
> Thanks for revising this. Adding one label for each failure case will
> not scale well. In another patch it will be needed to create another
> structure, and so, requiring another label.
Using such labels follows a coding convention in the networking code.
As an example, consider ip4_init_mib_net() in net/ipv4/af_inet.c.

The pattern is that if step n fails, it does a rollback, undoing all
preceding initialisations in the reverse order. I think this is also
in agreement with Documentation/CodingStyle, chap. 7.

> And how would be to determine if one packet's ecn is set to ECT 0 or ECT
> 1?
It should be possible to use '==' directly, i.e.
switch (DCCP_SKB_CB(skb)->dccpd_ecn) {
case  INET_ECN_NOT_ECT:  // ECN not enabled
case  INET_ECN_ECT_1:    // ECT(1), see below
case  INET_ECN_ECT_0:    // ECT(0)
case  INET_ECN_CE:       // congestion
}

However, the kernel currently only supports ECT(0). Resolving this is
ongoing work in another thread. For the moment, it simplifies the ECN
nonce verification; as per figure 1 in RFC 3540, the sum will always
be 0 if only ECT(0) is used.

This would allow to write a function stub for ECN nonce verification,
which for the moment only does something like

bool dccp_verify_ecn_nonce(const u8 sum)
{
      return sum == 0;
}

The same "fix" has currently been put into the Ack Vector nonce sum,
this is in
http://eden-feed.erg.abdn.ac.uk/cgi-bin/gitweb.cgi?p=dccp_exp.git;\
a=commitdiff;h=50e6081f6ff37102ac5f92df85f017e2c15f338a

^ permalink raw reply

* Re: ipw2200: firmware DMA loading rework
From: Bartlomiej Zolnierkiewicz @ 2009-09-19 13:25 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Tso Ted, Aneesh Kumar K.V, Zhu Yi, Andrew Morton, Mel Gorman,
	Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, James Ketrenos,
	Chatre, Reinette,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ipw2100-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
In-Reply-To: <200909022026.17910.bzolnier-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Wednesday 02 September 2009 20:26:17 Bartlomiej Zolnierkiewicz wrote:
> On Wednesday 02 September 2009 20:02:14 Luis R. Rodriguez wrote:
> > On Wed, Sep 2, 2009 at 10:48 AM, Bartlomiej
> > Zolnierkiewicz<bzolnier-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> > >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> > >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> > >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> > >>
> > >> s/2.6.30/2.6.31-rc6/
> > >>
> > >> The issue has always been there but it was some recent change that
> > >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> > >
> > > ipw2200 fix works fine but yesterday I got the following error while mounting
> > > ext4 filesystem (mb_history is optional so the mount succeeded):
> > 
> > OK so the mount succeeded.
> > 
> > > EXT4-fs (dm-2): barriers enabled
> > > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > > EXT4-fs (dm-2): internal journal on dm-2:8
> > > EXT4-fs (dm-2): delayed allocation enabled
> > > EXT4-fs: file extents enabled
> > > mount: page allocation failure. order:5, mode:0xc0d0
> > > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > > Call Trace:
> > >  [<c0394de3>] ? printk+0xf/0x14
> > >  [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > >  [<c016a71b>] __get_free_pages+0xf/0x32
> > >  [<c01865cf>] __kmalloc+0x28/0xfa
> > >  [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > >  [<c01f529d>] ext4_mb_init+0x392/0x460
> > >  [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > >  [<c0239bc8>] ? snprintf+0x15/0x17
> > >  [<c01c0b26>] ? disk_name+0x24/0x69
> > >  [<c018ba63>] get_sb_bdev+0xda/0x117
> > >  [<c01e6711>] ext4_get_sb+0x13/0x15
> > >  [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > >  [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > >  [<c018adad>] do_kern_mount+0x33/0xbd
> > >  [<c019d0af>] do_mount+0x660/0x6b8
> > >  [<c016a71b>] ? __get_free_pages+0xf/0x32
> > >  [<c019d168>] sys_mount+0x61/0x99
> > >  [<c0102908>] sysenter_do_call+0x12/0x36
> > > Mem-Info:
> > > DMA per-cpu:
> > > CPU    0: hi:    0, btch:   1 usd:   0
> > > Normal per-cpu:
> > > CPU    0: hi:  186, btch:  31 usd:   0
> > > Active_anon:25471 active_file:22802 inactive_anon:25812
> > >  inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > >  free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > > lowmem_reserve[]: 0 489 489
> > > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > > lowmem_reserve[]: 0 0 0
> > > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > > 57947 total pagecache pages
> > > 878 pages in swap cache
> > > Swap cache stats: add 920, delete 42, find 11/11
> > > Free swap  = 1016436kB
> > > Total swap = 1020116kB
> > > 131056 pages RAM
> > > 4233 pages reserved
> > > 90573 pages shared
> > > 77286 pages non-shared
> > > EXT4-fs: mballoc enabled
> > > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> > >
> > > Thus it seems like the original bug is still there and any ideas how to
> > > debug the problem further are appreciated..
> > >
> > > The complete dmesg and kernel config are here:
> > >
> > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
> > 
> > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > it is the same issue, which from the trace it seems it is, then this
> > is due to an extra kmalloc() allocation and this apparently will not
> > get fixed on 2.6.31 due to the closeness of the merge window and the
> > non-criticalness this issue has been deemed.
> > 
> > A patch fix is part of the ext4-patchqueue
> > http://repo.or.cz/w/ext4-patch-queue.git
> 
> Thanks for the pointer but the page allocation failures that I hit seem
> to be caused by the memory management itself and the ext4 issue fixed by:
> 
> http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=memory-leak-fix-ext4_group_info-allocation;h=c919fff34e70ec85f96d1833f9ce460c451000de;hb=HEAD
> 
> is a different problem (unrelated to this one).

Here is another data point.

This time it is an order-6 page allocation failure for rt2870sta
(w/ upcoming driver changes) and Linus' tree from few days ago..

ifconfig: page allocation failure. order:6, mode:0x8020
Pid: 4752, comm: ifconfig Tainted: G        WC 2.6.31-04082-g1824090-dirty #80
Call Trace:
 [<c03996f2>] ? printk+0xf/0x15
 [<c016b841>] __alloc_pages_nodemask+0x41d/0x462
 [<c010681e>] dma_generic_alloc_coherent+0x53/0xbd
 [<c02f83aa>] hcd_buffer_alloc+0xdb/0xe8
 [<c01067cb>] ? dma_generic_alloc_coherent+0x0/0xbd
 [<c02ee2d6>] usb_buffer_alloc+0x16/0x1d
 [<e121b627>] NICInitTransmit+0xe2/0x7e4 [rt2870sta]
 [<e121bfb1>] RTMPAllocTxRxRingMemory+0x11c/0x17b [rt2870sta]
 [<e11f0960>] rt28xx_init+0xa5/0x3f8 [rt2870sta]
 [<e121194a>] rt28xx_open+0x53/0xa2 [rt2870sta]
 [<e1211b77>] MainVirtualIF_open+0x23/0xf6 [rt2870sta]
 [<c03383a4>] dev_open+0x86/0xbb
 [<c0337b1a>] dev_change_flags+0x96/0x147
 [<c036e9cb>] devinet_ioctl+0x20f/0x4f8
 [<c036fc8f>] inet_ioctl+0x8e/0xa7
 [<c032ab50>] sock_ioctl+0x1c9/0x1ed
 [<c032a987>] ? sock_ioctl+0x0/0x1ed
 [<c0195732>] vfs_ioctl+0x18/0x71
 [<c0195cbb>] do_vfs_ioctl+0x491/0x4cf
 [<c01779d6>] ? handle_mm_fault+0x242/0x4ff
 [<c0119609>] ? do_page_fault+0x102/0x292
 [<c0140721>] ? up_read+0x16/0x29
 [<c0195d27>] sys_ioctl+0x2e/0x48
 [<c0102908>] sysenter_do_call+0x12/0x36
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd:  84
Active_anon:14664 active_file:30057 inactive_anon:31744
 inactive_file:29940 unevictable:2 dirty:11 writeback:0 unstable:0
 free:5421 slab:4037 mapped:7781 pagetables:963 bounce:0
DMA free:2060kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:124kB active_file:3284kB inactive_file:972kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 489 489
Normal free:19624kB min:2788kB low:3484kB high:4180kB active_anon:58656kB inactive_anon:126852kB active_file:116944kB inactive_file:118788kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 3*4kB 0*8kB 2*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2060kB
Normal: 2180*4kB 625*8kB 303*16kB 33*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 19624kB
64568 total pagecache pages
3652 pages in swap cache
Swap cache stats: add 21642, delete 17990, find 4906/6079
Free swap  = 981700kB
Total swap = 1020116kB
131056 pages RAM
4262 pages reserved
91941 pages shared
60834 pages non-shared
<-- ERROR in Alloc TX TxContext[0] HTTX_BUFFER !! 
<-- RTMPAllocTxRxRingMemory, Status=3
ERROR!!! RTMPAllocDMAMemory failed, Status[=0x00000003]
!!! rt28xx Initialized fail !!!

^ permalink raw reply

* [iproute2 PATCH] Add 'ip tuntap' support.
From: David Woodhouse @ 2009-09-19 16:48 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

This patch provides support for 'ip tuntap', allowing creation and
deletion of persistent tun/tap devices.

diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
new file mode 100644
index 0000000..915ba57
--- /dev/null
+++ b/include/linux/if_tun.h
@@ -0,0 +1,88 @@
+/*
+ *  Universal TUN/TAP device driver.
+ *  Copyright (C) 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ *  GNU General Public License for more details.
+ */
+
+#ifndef __IF_TUN_H
+#define __IF_TUN_H
+
+#include <linux/types.h>
+#include <linux/if_ether.h>
+
+/* Read queue size */
+#define TUN_READQ_SIZE	500
+
+/* TUN device flags */
+#define TUN_TUN_DEV 	0x0001	
+#define TUN_TAP_DEV	0x0002
+#define TUN_TYPE_MASK   0x000f
+
+#define TUN_FASYNC	0x0010
+#define TUN_NOCHECKSUM	0x0020
+#define TUN_NO_PI	0x0040
+#define TUN_ONE_QUEUE	0x0080
+#define TUN_PERSIST 	0x0100	
+#define TUN_VNET_HDR 	0x0200
+
+/* Ioctl defines */
+#define TUNSETNOCSUM  _IOW('T', 200, int) 
+#define TUNSETDEBUG   _IOW('T', 201, int) 
+#define TUNSETIFF     _IOW('T', 202, int) 
+#define TUNSETPERSIST _IOW('T', 203, int) 
+#define TUNSETOWNER   _IOW('T', 204, int)
+#define TUNSETLINK    _IOW('T', 205, int)
+#define TUNSETGROUP   _IOW('T', 206, int)
+#define TUNGETFEATURES _IOR('T', 207, unsigned int)
+#define TUNSETOFFLOAD  _IOW('T', 208, unsigned int)
+#define TUNSETTXFILTER _IOW('T', 209, unsigned int)
+#define TUNGETIFF      _IOR('T', 210, unsigned int)
+#define TUNGETSNDBUF   _IOR('T', 211, int)
+#define TUNSETSNDBUF   _IOW('T', 212, int)
+
+/* TUNSETIFF ifr flags */
+#define IFF_TUN		0x0001
+#define IFF_TAP		0x0002
+#define IFF_NO_PI	0x1000
+#define IFF_ONE_QUEUE	0x2000
+#define IFF_VNET_HDR	0x4000
+#define IFF_TUN_EXCL	0x8000
+
+/* Features for GSO (TUNSETOFFLOAD). */
+#define TUN_F_CSUM	0x01	/* You can hand me unchecksummed packets. */
+#define TUN_F_TSO4	0x02	/* I can handle TSO for IPv4 packets */
+#define TUN_F_TSO6	0x04	/* I can handle TSO for IPv6 packets */
+#define TUN_F_TSO_ECN	0x08	/* I can handle TSO with ECN bits. */
+
+/* Protocol info prepended to the packets (when IFF_NO_PI is not set) */
+#define TUN_PKT_STRIP	0x0001
+struct tun_pi {
+	__u16  flags;
+	__be16 proto;
+};
+
+/*
+ * Filter spec (used for SETXXFILTER ioctls)
+ * This stuff is applicable only to the TAP (Ethernet) devices.
+ * If the count is zero the filter is disabled and the driver accepts
+ * all packets (promisc mode).
+ * If the filter is enabled in order to accept broadcast packets
+ * broadcast addr must be explicitly included in the addr list.
+ */
+#define TUN_FLT_ALLMULTI 0x0001 /* Accept all multicast packets */
+struct tun_filter {
+	__u16  flags; /* TUN_FLT_ flags see above */
+	__u16  count; /* Number of addresses */
+	__u8   addr[0][ETH_ALEN];
+};
+
+#endif /* __IF_TUN_H */
diff --git a/ip/Makefile b/ip/Makefile
index 3c185cf..fd16fe9 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -1,6 +1,6 @@
 IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o \
     rtm_map.o iptunnel.o ip6tunnel.o tunnel.o ipneigh.o ipntable.o iplink.o \
-    ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o \
+    ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o iptuntap.o \
     ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o \
     iplink_vlan.o link_veth.o link_gre.o iplink_can.o
 
diff --git a/ip/ip.c b/ip/ip.c
index 2bd54b2..d846a76 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -47,7 +47,7 @@ static void usage(void)
 "Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }\n"
 "       ip [ -force ] -batch filename\n"
 "where  OBJECT := { link | addr | addrlabel | route | rule | neigh | ntable |\n"
-"                   tunnel | maddr | mroute | monitor | xfrm }\n"
+"                   tunnel | tuntap | maddr | mroute | monitor | xfrm }\n"
 "       OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n"
 "                    -f[amily] { inet | inet6 | ipx | dnet | link } |\n"
 "                    -o[neline] | -t[imestamp] | -b[atch] [filename] }\n");
@@ -75,6 +75,8 @@ static const struct cmd {
 	{ "link",	do_iplink },
 	{ "tunnel",	do_iptunnel },
 	{ "tunl",	do_iptunnel },
+	{ "tuntap",	do_iptuntap },
+	{ "tap",	do_iptuntap },
 	{ "monitor",	do_ipmonitor },
 	{ "xfrm",	do_xfrm },
 	{ "mroute",	do_multiroute },
diff --git a/ip/ip_common.h b/ip/ip_common.h
index 273065f..c857667 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -32,6 +32,7 @@ extern int do_ipneigh(int argc, char **argv);
 extern int do_ipntable(int argc, char **argv);
 extern int do_iptunnel(int argc, char **argv);
 extern int do_ip6tunnel(int argc, char **argv);
+extern int do_iptuntap(int argc, char **argv);
 extern int do_iplink(int argc, char **argv);
 extern int do_ipmonitor(int argc, char **argv);
 extern int do_multiaddr(int argc, char **argv);
diff --git a/ip/iptuntap.c b/ip/iptuntap.c
new file mode 100644
index 0000000..f7f64bc
--- /dev/null
+++ b/ip/iptuntap.c
@@ -0,0 +1,321 @@
+/*
+ * iptunnel.c	       "ip tuntap"
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	David Woodhouse <David.Woodhouse@intel.com>
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <arpa/inet.h>
+#include <sys/ioctl.h>
+#include <linux/if.h>
+#include <linux/if_tun.h>
+#include <pwd.h>
+#include <grp.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include <errno.h>
+
+#include "rt_names.h"
+#include "utils.h"
+#include "ip_common.h"
+
+#define TUNDEV "/dev/net/tun"
+
+static void usage(void) __attribute__((noreturn));
+
+static void usage(void)
+{
+	fprintf(stderr, "Usage: ip tuntap { add | del } [ dev PHYS_DEV ] \n");
+	fprintf(stderr, "          [ mode { tun | tap } ] [ user USER ] [ group GROUP ]\n");
+	fprintf(stderr, "          [ one_queue ] [ pi ] [ vnet_hdr ]\n");
+	fprintf(stderr, "\n");
+	fprintf(stderr, "Where: USER  := { STRING | NUMBER }\n");
+	fprintf(stderr, "       GROUP := { STRING | NUMBER }\n");
+	exit(-1);
+}
+
+static int tap_add_ioctl(struct ifreq *ifr, uid_t uid, gid_t gid)
+{
+	int fd = open(TUNDEV, O_RDWR);
+	int ret = -1;
+
+	ifr->ifr_flags |= IFF_TUN_EXCL;
+
+	fd = open(TUNDEV, O_RDWR);
+	if (fd < 0) {
+		perror("open");
+		return -1;
+	}
+	if (ioctl(fd, TUNSETIFF, ifr)) {
+		perror("ioctl(TUNSETIFF)");
+		goto out;
+	}
+	if (uid != -1 && ioctl(fd, TUNSETOWNER, uid)) {
+		perror("ioctl(TUNSETOWNER)");
+		goto out;
+	}
+	if (gid != -1 && ioctl(fd, TUNSETGROUP, gid)) {
+		perror("ioctl(TUNSETGROUP)");
+		goto out;
+	}
+	if (ioctl(fd, TUNSETPERSIST, 1)) {
+		perror("ioctl(TUNSETPERSIST)");
+		goto out;
+	}
+	ret = 0;
+ out:
+	close(fd);
+	return ret;
+}
+
+static int tap_del_ioctl(struct ifreq *ifr)
+{
+	int fd = open(TUNDEV, O_RDWR);
+	int ret = -1;
+
+	if (fd < 0) {
+		perror("open");
+		return -1;
+	}
+	if (ioctl(fd, TUNSETIFF, ifr)) {
+		perror("ioctl(TUNSETIFF)");
+		goto out;
+	}
+	if (ioctl(fd, TUNSETPERSIST, 0)) {
+		perror("ioctl(TUNSETPERSIST)");
+		goto out;
+	}
+	ret = 0;
+ out:
+	close(fd);
+	return ret;
+	
+}
+static int parse_args(int argc, char **argv, struct ifreq *ifr, uid_t *uid, gid_t *gid)
+{
+	int count = 0;
+
+	memset(ifr, 0, sizeof(*ifr));
+
+	ifr->ifr_flags |= IFF_NO_PI;
+
+	while (argc > 0) {
+		if (matches(*argv, "mode") == 0) {
+			NEXT_ARG();
+			if (matches(*argv, "tun") == 0) {
+				if (ifr->ifr_flags & IFF_TAP) {
+					fprintf(stderr,"You managed to ask for more than one tunnel mode.\n");
+					exit(-1);
+				}
+				ifr->ifr_flags |= IFF_TUN;
+			} else if (matches(*argv, "tap") == 0) {
+				if (ifr->ifr_flags & IFF_TUN) {
+					fprintf(stderr,"You managed to ask for more than one tunnel mode.\n");
+					exit(-1);
+				}
+				ifr->ifr_flags |= IFF_TAP;
+			} else {
+				fprintf(stderr,"Cannot guess tunnel mode.\n");
+				exit(-1);
+			}
+		} else if (uid && matches(*argv, "user") == 0) {
+			char *end;
+			unsigned long user;
+
+			NEXT_ARG();
+			if (**argv && ((user = strtol(*argv, &end, 10)), !*end))
+				*uid = user;
+			else {
+				struct passwd *pw = getpwnam(*argv);
+				if (!pw) {
+					fprintf(stderr, "invalid user \"%s\"\n", *argv);
+					exit(-1);
+				}
+				*uid = pw->pw_uid;
+			}
+		} else if (gid && matches(*argv, "group") == 0) {
+			char *end;
+			unsigned long group;
+
+			NEXT_ARG();
+			
+			if (**argv && ((group = strtol(*argv, &end, 10)), !*end))
+				*gid = group;
+			else {
+				struct group *gr = getgrnam(*argv);
+				if (!gr) {
+					fprintf(stderr, "invalid group \"%s\"\n", *argv);
+					exit(-1);
+				}
+				*gid = gr->gr_gid;
+			}
+		} else if (matches(*argv, "pi") == 0) {
+			ifr->ifr_flags &= ~IFF_NO_PI;
+		} else if (matches(*argv, "one_queue") == 0) {
+			ifr->ifr_flags |= IFF_ONE_QUEUE;
+		} else if (matches(*argv, "vnet_hdr") == 0) {
+			ifr->ifr_flags |= IFF_VNET_HDR;
+		} else if (matches(*argv, "dev") == 0) {
+			NEXT_ARG();
+			strncpy(ifr->ifr_name, *argv, IFNAMSIZ-1);
+		} else {
+			if (matches(*argv, "name") == 0) {
+				NEXT_ARG();
+			} else if (matches(*argv, "help") == 0)
+				usage();
+			if (ifr->ifr_name[0])
+				duparg2("name", *argv);
+			strncpy(ifr->ifr_name, *argv, IFNAMSIZ);
+		}
+		count++;
+		argc--; argv++;
+	}
+
+	return 0;
+}
+
+
+static int do_add(int argc, char **argv)
+{
+	struct ifreq ifr;
+	uid_t uid = -1;
+	gid_t gid = -1;
+
+	if (parse_args(argc, argv, &ifr, &uid, &gid) < 0)
+		return -1;
+
+	if (!(ifr.ifr_flags & TUN_TYPE_MASK)) {
+		fprintf(stderr, "You failed to specify a tunnel mode\n");
+		return -1;
+	}
+	return tap_add_ioctl(&ifr, uid, gid);
+}
+
+static int do_del(int argc, char **argv)
+{
+	struct ifreq ifr;
+
+	if (parse_args(argc, argv, &ifr, NULL, NULL) < 0)
+		return -1;
+
+	return tap_del_ioctl(&ifr);
+}
+
+static int read_prop(char *dev, char *prop, long *value)
+{
+	char fname[IFNAMSIZ+25], buf[80], *endp;
+	ssize_t len;
+	int fd;
+	long result;
+
+	sprintf(fname, "/sys/class/net/%s/%s", dev, prop);
+	fd = open(fname, O_RDONLY);
+	if (fd < 0) {
+		if (strcmp(prop, "tun_flags"))
+			fprintf(stderr, "open %s: %s\n", fname,
+				strerror(errno));
+		return -1;
+	}
+	len = read(fd, buf, sizeof(buf)-1);
+	close(fd);
+	if (len < 0) {
+		fprintf(stderr, "read %s: %s", fname, strerror(errno));
+		return -1;
+	}
+
+	buf[len] = 0;
+	result = strtol(buf, &endp, 0);
+	if (*endp != '\n') {
+		fprintf(stderr, "Failed to parse %s\n", fname);
+		return -1;
+	}
+	*value = result;
+	return 0;
+}
+
+static void print_flags(long flags)
+{
+	if (flags & IFF_TUN)
+		printf(" tun");
+
+	if (flags & IFF_TAP)
+		printf(" tap");
+
+	if (!(flags & IFF_NO_PI))
+		printf(" pi");
+
+	if (flags & IFF_ONE_QUEUE)
+		printf(" one_queue");
+
+	if (flags & IFF_VNET_HDR)
+		printf(" vnet_hdr");
+
+	flags &= ~(IFF_TUN|IFF_TAP|IFF_NO_PI|IFF_ONE_QUEUE|IFF_VNET_HDR);
+	if (flags)
+		printf(" UNKNOWN_FLAGS:%lx", flags);
+}
+
+static int do_show(int argc, char **argv)
+{
+	DIR *dir;
+	struct dirent *d;
+	long flags, owner = -1, group = -1;
+
+	dir = opendir("/sys/class/net");
+	if (!dir) {
+		perror("opendir");
+		return -1;
+	}
+	while ((d = readdir(dir))) {
+		if (d->d_name[0] == '.' &&
+		    (d->d_name[1] == 0 || d->d_name[1] == '.'))
+			continue;
+
+		if (read_prop(d->d_name, "tun_flags", &flags))
+			continue;
+
+		read_prop(d->d_name, "owner", &owner);
+		read_prop(d->d_name, "group", &group);
+		
+		printf("%s:", d->d_name);
+		print_flags(flags);
+		if (owner != -1)
+			printf(" user %ld", owner);
+		if (group != -1)
+			printf(" group %ld", group);
+		printf("\n");
+	}
+	return 0;
+}
+
+int do_iptuntap(int argc, char **argv)
+{
+	if (argc > 0) {
+		if (matches(*argv, "add") == 0)
+			return do_add(argc-1, argv+1);
+		if (matches(*argv, "del") == 0)
+			return do_del(argc-1, argv+1);
+		if (matches(*argv, "show") == 0 ||
+                    matches(*argv, "lst") == 0 ||
+                    matches(*argv, "list") == 0)
+                        return do_show(argc-1, argv+1);
+		if (matches(*argv, "help") == 0)
+			usage();
+	} else
+		return do_show(0, NULL);
+
+	fprintf(stderr, "Command \"%s\" is unknown, try \"ip tuntap help\".\n",
+		*argv);
+	exit(-1);
+}

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply related

* SO_TIMESTAMPING fix and design decisions
From: Christopher Zimmermann @ 2009-09-19 17:25 UTC (permalink / raw)
  To: netdev


[-- Attachment #1.1: Type: text/plain, Size: 3049 bytes --]

Hi, 

I'm currently working on the SO_TIMESTAMPING feature which is currently 
pretty much broken. The current status is the following:

-tx software timestamps don't work because of a race condition. See 
commit cd4d8fdad1f1320.
-rx software timestamps do work. But they are nothing new.
SO_TIMESTAMP[NS] has been available for years.

hardware timestamps only work for the Intel igb driver. I have access to 
two test machines with NICs supported by this driver.

-tx hardware timestamps do work. I still have to check what happens when 
there is high load of packets requesting timestamping.
-rx hardware timestamps work only for special PTP (Precision Time 
Protocol) packets. There exists a HWTSTAMP_FILTER_ALL option to 
timestamp all packets, but it doesn't work and it will not work. This is 
due to a problem in the hardware design. The Intel hardware is tuned for 
PTP (and so is the ioctl interface).


Right now I'm trying to fix the software tx timestamps. I see several 
ways to fix it:

-Do skb_get() before calling ops->ndo_start_xmit(). This breaks the 
wireless stack, because it is incompatible with pskb_expand_head().

-Do skb_clone() and skb_set_owner_w() before calling 
ops->ndo_start_xmit(). If the driver promises to do timestamping 
(shtx->in_progress==1), then this clone will be abandoned. -> Software 
timestamps only as fallback.

-Do skb_clone() and skb_set_owner_w() before calling 
ops->ndo_start_xmit(). Use this to send the software timestamp 
regardless of what the driver is doing. This results in software 
timestamps being always generated. Not only as fallback. The drawback is 
that userspace will eventually have to parse two timestamping messages 
(only if requested). This is not a big deal.

I chose the last option since it is easiest to implement without much 
interaction with the driver. Patch is attached. Any comments or ideas 
for a better implementation are welcome.
Currently the tx timestamp is returned with the whole packet, including 
all transport layer headers. I would like to return only the payload, 
since this would make the interface easier for userspace. There is 
nothing like a "payload" pointer in the sk_buff. How can I solve this? 
Add such a pointer?


To fix the hardware rx timestamps my idea is to change the ioctl 
interface, so that it allows userspace to fine tune the relevant 
hardware registers of the intel controler. This would allow hardware 
timestamps to be used in other scenarios than just for PTP.
I don't know any application which already uses this interface right 
now. But if there is one it would be easy to fix.
In case there appear some other controlers with timestamping support 
they will either need their own custom interface (if they are similarly 
limited) or they can just go without the ioctl interface and do hardware 
timestamping of all received packages if they are not so limited.

Could this be a way to go or what would you suggest?


Regards, 

Christopher Zimmermann

[-- Attachment #1.2: patch --]
[-- Type: application/octet-stream, Size: 4924 bytes --]

commit fe7b307ab374a495feba8c951fb7da2518a4e422
Author: Christopher Zimmermann <madroach@zakweb.de>
Date:   Sat Sep 19 19:21:28 2009 +0200

    net: Implement timestamping
    
    Avoid the skb_clone, skb_hold and software fallback problems by
    returning two seperate messages to userspace. One for software and one
    for hardware timestamp.

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index d2639c4..1bdce95 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -4435,7 +4435,7 @@ static void igb_tx_hwtstamp(struct igb_adapter *adapter, struct sk_buff *skb)
 			shhwtstamps.hwtstamp = ns_to_ktime(ns);
 			shhwtstamps.syststamp =
 				timecompare_transform(&adapter->compare, ns);
-			skb_tstamp_tx(skb, &shhwtstamps);
+			skb_tstamp_hw_tx(skb, &shhwtstamps);
 		}
 	}
 }
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index df7b23a..315a4c4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1868,7 +1868,7 @@ static inline ktime_t net_invalid_timestamp(void)
 }
 
 /**
- * skb_tstamp_tx - queue clone of skb with send time stamps
+ * skb_tstamp_tx - queue clone of skb with hardware send time stamps
  * @orig_skb:	the original outgoing packet
  * @hwtstamps:	hardware time stamps, may be NULL if not available
  *
@@ -1878,9 +1878,21 @@ static inline ktime_t net_invalid_timestamp(void)
  * generates a software time stamp (otherwise), then queues the clone
  * to the error queue of the socket.  Errors are silently ignored.
  */
-extern void skb_tstamp_tx(struct sk_buff *orig_skb,
+extern void skb_tstamp_hw_tx(struct sk_buff *orig_skb,
 			struct skb_shared_hwtstamps *hwtstamps);
 
+/**
+ * skb_tstamp_tx - queue clone of skb with software generated send time stamps
+ * @orig_skb:	the original outgoing packet
+ *
+ * If the skb has a socket associated, then this function clones the
+ * skb (thus sharing the actual data and optional structures), generates a
+ * software time stamp, then queues the clone to the error queue of the socket.
+ * Errors are silently ignored.
+ */
+extern void skb_tstamp_tx(struct sk_buff *skb);
+			
+
 extern __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len);
 extern __sum16 __skb_checksum_complete(struct sk_buff *skb);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 560c8c9..c04d3dd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1701,6 +1701,16 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 	int rc;
+	struct sk_buff *skb_tstamp = NULL;
+
+	if (unlikely(skb_tx(skb)->software && skb->sk)) {
+		skb_tstamp = skb_clone(skb, 0);
+
+		/* TODO: Is a sock_hold() needed here?
+		 * skb_set_owner_w doesn't do it. */
+		if (likely(skb_tstamp))
+		    	skb_set_owner_w(skb_tstamp, skb->sk);
+	}
 
 	if (likely(!skb->next)) {
 		if (!list_empty(&ptype_all))
@@ -1721,8 +1731,11 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			skb_dst_drop(skb);
 
 		rc = ops->ndo_start_xmit(skb, dev);
-		if (rc == NETDEV_TX_OK)
+		if (rc == NETDEV_TX_OK) {
+			if (unlikely(skb_tstamp))
+				skb_tstamp_tx(skb_tstamp);
 			txq_trans_update(txq);
+		}
 		/*
 		 * TODO: if skb_orphan() was called by
 		 * dev->hard_start_xmit() (for example, the unmodified
@@ -1752,6 +1765,8 @@ gso:
 			skb->next = nskb;
 			return rc;
 		}
+		if (unlikely(skb_tstamp))
+			skb_tstamp_tx(skb_tstamp);
 		txq_trans_update(txq);
 		if (unlikely(netif_tx_queue_stopped(txq) && skb->next))
 			return NETDEV_TX_BUSY;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 80a9616..1517a5e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2967,7 +2967,7 @@ int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer)
 }
 EXPORT_SYMBOL_GPL(skb_cow_data);
 
-void skb_tstamp_tx(struct sk_buff *orig_skb,
+void skb_tstamp_hw_tx(struct sk_buff *orig_skb,
 		struct skb_shared_hwtstamps *hwtstamps)
 {
 	struct sock *sk = orig_skb->sk;
@@ -2982,17 +2982,9 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
 	if (!skb)
 		return;
 
-	if (hwtstamps) {
+	if (hwtstamps)
 		*skb_hwtstamps(skb) =
 			*hwtstamps;
-	} else {
-		/*
-		 * no hardware time stamps available,
-		 * so keep the skb_shared_tx and only
-		 * store software time stamp
-		 */
-		skb->tstamp = ktime_get_real();
-	}
 
 	serr = SKB_EXT_ERR(skb);
 	memset(serr, 0, sizeof(*serr));
@@ -3002,6 +2994,23 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
 	if (err)
 		kfree_skb(skb);
 }
+EXPORT_SYMBOL_GPL(skb_tstamp_hw_tx);
+
+void skb_tstamp_tx(struct sk_buff *skb)
+{
+	struct sock_exterr_skb *serr;
+	int err;
+
+	skb->tstamp = ktime_get_real();
+
+	serr = SKB_EXT_ERR(skb);
+	memset(serr, 0, sizeof(*serr));
+	serr->ee.ee_errno = ENOMSG;
+	serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
+	err = sock_queue_err_skb(skb->sk, skb);
+	if (err)
+		kfree_skb(skb);
+}
 EXPORT_SYMBOL_GPL(skb_tstamp_tx);
 
 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply related

* [PATCH] atm: dereference of he_dev->rbps_virt in he_init_group()
From: Roel Kluin @ 2009-09-19 18:16 UTC (permalink / raw)
  To: David Miller; +Cc: chas, linux-atm-general, netdev, akpm
In-Reply-To: <20090911.125135.148893888.davem@davemloft.net>

he_dev->rbps_virt or he_dev->rbpl_virt allocation may fail, s
them. Make sure that he_init_group() cleans up after errors.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
---
Found with sed: http://kernelnewbies.org/roelkluin

This version was build, sparse and checkpatch tested.

diff --git a/drivers/atm/he.c b/drivers/atm/he.c
index 2de6406..00fd67e 100644
--- a/drivers/atm/he.c
+++ b/drivers/atm/he.c
@@ -777,7 +777,7 @@ he_init_cs_block_rcm(struct he_dev *he_dev)
 static int __devinit
 he_init_group(struct he_dev *he_dev, int group)
 {
-	int i;
+	int i, ret;
 
 	/* small buffer pool */
 	he_dev->rbps_pool = pci_pool_create("rbps", he_dev->pci_dev,
@@ -790,19 +790,27 @@ he_init_group(struct he_dev *he_dev, int group)
 	he_dev->rbps_base = pci_alloc_consistent(he_dev->pci_dev,
 		CONFIG_RBPS_SIZE * sizeof(struct he_rbp), &he_dev->rbps_phys);
 	if (he_dev->rbps_base == NULL) {
-		hprintk("failed to alloc rbps\n");
-		return -ENOMEM;
+		hprintk("failed to alloc rbps_base\n");
+		ret = -ENOMEM;
+		goto out_destroy_rbps_pool;
 	}
 	memset(he_dev->rbps_base, 0, CONFIG_RBPS_SIZE * sizeof(struct he_rbp));
 	he_dev->rbps_virt = kmalloc(CONFIG_RBPS_SIZE * sizeof(struct he_virt), GFP_KERNEL);
+	if (he_dev->rbps_virt == NULL) {
+		hprintk("failed to alloc rbps_virt\n");
+		ret = -ENOMEM;
+		goto out_free_rbps_base;
+	}
 
 	for (i = 0; i < CONFIG_RBPS_SIZE; ++i) {
 		dma_addr_t dma_handle;
 		void *cpuaddr;
 
 		cpuaddr = pci_pool_alloc(he_dev->rbps_pool, GFP_KERNEL|GFP_DMA, &dma_handle);
-		if (cpuaddr == NULL)
-			return -ENOMEM;
+		if (cpuaddr == NULL) {
+			ret = -ENOMEM;
+			goto out_free_rbps_virt;
+		}
 
 		he_dev->rbps_virt[i].virt = cpuaddr;
 		he_dev->rbps_base[i].status = RBP_LOANED | RBP_SMALLBUF | (i << RBP_INDEX_OFF);
@@ -827,25 +835,34 @@ he_init_group(struct he_dev *he_dev, int group)
 			CONFIG_RBPL_BUFSIZE, 8, 0);
 	if (he_dev->rbpl_pool == NULL) {
 		hprintk("unable to create rbpl pool\n");
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto out_free_rbps_virt;
 	}
 
 	he_dev->rbpl_base = pci_alloc_consistent(he_dev->pci_dev,
 		CONFIG_RBPL_SIZE * sizeof(struct he_rbp), &he_dev->rbpl_phys);
 	if (he_dev->rbpl_base == NULL) {
-		hprintk("failed to alloc rbpl\n");
-		return -ENOMEM;
+		hprintk("failed to alloc rbpl_base\n");
+		ret = -ENOMEM;
+		goto out_destroy_rbpl_pool;
 	}
 	memset(he_dev->rbpl_base, 0, CONFIG_RBPL_SIZE * sizeof(struct he_rbp));
 	he_dev->rbpl_virt = kmalloc(CONFIG_RBPL_SIZE * sizeof(struct he_virt), GFP_KERNEL);
+	if (he_dev->rbpl_virt == NULL) {
+		hprintk("failed to alloc rbpl_virt\n");
+		ret = -ENOMEM;
+		goto out_free_rbpl_base;
+	}
 
 	for (i = 0; i < CONFIG_RBPL_SIZE; ++i) {
 		dma_addr_t dma_handle;
 		void *cpuaddr;
 
 		cpuaddr = pci_pool_alloc(he_dev->rbpl_pool, GFP_KERNEL|GFP_DMA, &dma_handle);
-		if (cpuaddr == NULL)
-			return -ENOMEM;
+		if (cpuaddr == NULL) {
+			ret = -ENOMEM;
+			goto out_free_rbpl_virt;
+		}
 
 		he_dev->rbpl_virt[i].virt = cpuaddr;
 		he_dev->rbpl_base[i].status = RBP_LOANED | (i << RBP_INDEX_OFF);
@@ -870,7 +887,8 @@ he_init_group(struct he_dev *he_dev, int group)
 		CONFIG_RBRQ_SIZE * sizeof(struct he_rbrq), &he_dev->rbrq_phys);
 	if (he_dev->rbrq_base == NULL) {
 		hprintk("failed to allocate rbrq\n");
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto out_free_rbpl_virt;
 	}
 	memset(he_dev->rbrq_base, 0, CONFIG_RBRQ_SIZE * sizeof(struct he_rbrq));
 
@@ -894,7 +912,8 @@ he_init_group(struct he_dev *he_dev, int group)
 		CONFIG_TBRQ_SIZE * sizeof(struct he_tbrq), &he_dev->tbrq_phys);
 	if (he_dev->tbrq_base == NULL) {
 		hprintk("failed to allocate tbrq\n");
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto out_free_rbpq_base;
 	}
 	memset(he_dev->tbrq_base, 0, CONFIG_TBRQ_SIZE * sizeof(struct he_tbrq));
 
@@ -906,6 +925,39 @@ he_init_group(struct he_dev *he_dev, int group)
 	he_writel(he_dev, CONFIG_TBRQ_THRESH, G0_TBRQ_THRESH + (group * 16));
 
 	return 0;
+
+out_free_rbpq_base:
+	pci_free_consistent(he_dev->pci_dev, CONFIG_RBRQ_SIZE *
+			sizeof(struct he_rbrq), he_dev->rbrq_base,
+			he_dev->rbrq_phys);
+	i = CONFIG_RBPL_SIZE;
+out_free_rbpl_virt:
+	while (--i)
+		pci_pool_free(he_dev->rbps_pool, he_dev->rbpl_virt[i].virt,
+				he_dev->rbps_base[i].phys);
+	kfree(he_dev->rbpl_virt);
+
+out_free_rbpl_base:
+	pci_free_consistent(he_dev->pci_dev, CONFIG_RBPL_SIZE *
+			sizeof(struct he_rbp), he_dev->rbpl_base,
+			he_dev->rbpl_phys);
+out_destroy_rbpl_pool:
+	pci_pool_destroy(he_dev->rbpl_pool);
+
+	i = CONFIG_RBPL_SIZE;
+out_free_rbps_virt:
+	while (--i)
+		pci_pool_free(he_dev->rbpl_pool, he_dev->rbps_virt[i].virt,
+				he_dev->rbpl_base[i].phys);
+	kfree(he_dev->rbps_virt);
+
+out_free_rbps_base:
+	pci_free_consistent(he_dev->pci_dev, CONFIG_RBPS_SIZE *
+			sizeof(struct he_rbp), he_dev->rbps_base,
+			he_dev->rbps_phys);
+out_destroy_rbps_pool:
+	pci_pool_destroy(he_dev->rbps_pool);
+	return ret;
 }
 
 static int __devinit

^ permalink raw reply related

* Re: [PATCH] atm: dereference of he_dev->rbps_virt in he_init_group()
From: Joe Perches @ 2009-09-19 18:27 UTC (permalink / raw)
  To: Roel Kluin; +Cc: David Miller, chas, linux-atm-general, netdev, akpm
In-Reply-To: <4AB51FE7.7030509@gmail.com>

On Sat, 2009-09-19 at 20:16 +0200, Roel Kluin wrote:
> +	int i, ret;
> +		ret = -ENOMEM;
> +		ret = -ENOMEM;
> +			ret = -ENOMEM;
> +		ret = -ENOMEM;
> +		ret = -ENOMEM;
> +		ret = -ENOMEM;
> +		ret = -ENOMEM;
> +		ret = -ENOMEM;
> +out_destroy_rbps_pool:
> +	pci_pool_destroy(he_dev->rbps_pool);
> +	return ret;

It looks as if it'd be clearer to not use variable ret and
simply return -ENOMEM after the out_destroy_rbps_pool label.



^ permalink raw reply

* [PATCH 2/5] drivers/net: remove duplicate structure field initialization
From: Julia Lawall @ 2009-09-19 19:48 UTC (permalink / raw)
  To: netdev, linux-kernel, kernel-janitors

From: Julia Lawall <julia@diku.dk>

The definitions of vnet_ops and ehea_netdev_ops have initializations of a
local function and eth_change_mtu for their respective ndo_change_mtu
fields.  This change uses only the local function.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
identifier I, s, fld;
position p0,p;
expression E;
@@

struct I s =@p0 { ... .fld@p = E, ...};

@s@
identifier I, s, r.fld;
position r.p0,p;
expression E;
@@

struct I s =@p0 { ... .fld@p = E, ...};

@script:python@
p0 << r.p0;
fld << r.fld;
ps << s.p;
pr << r.p;
@@

if int(ps[0].line)!=int(pr[0].line) or int(ps[0].column)!=int(pr[0].column):
  cocci.print_main(fld,p0)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/net/ehea/ehea_main.c |    1 -
 drivers/net/sunvnet.c        |    1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/net/sunvnet.c b/drivers/net/sunvnet.c
index f1e5e45..bc74db0 100644
--- a/drivers/net/sunvnet.c
+++ b/drivers/net/sunvnet.c
@@ -1016,7 +1016,6 @@ static const struct net_device_ops vnet_ops = {
 	.ndo_open		= vnet_open,
 	.ndo_stop		= vnet_close,
 	.ndo_set_multicast_list	= vnet_set_rx_mode,
-	.ndo_change_mtu		= eth_change_mtu,
 	.ndo_set_mac_address	= vnet_set_mac_addr,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_tx_timeout		= vnet_tx_timeout,
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 977c3d3..41bd7ae 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -3083,7 +3083,6 @@ static const struct net_device_ops ehea_netdev_ops = {
 	.ndo_poll_controller	= ehea_netpoll,
 #endif
 	.ndo_get_stats		= ehea_get_stats,
-	.ndo_change_mtu		= eth_change_mtu,
 	.ndo_set_mac_address	= ehea_set_mac_addr,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_set_multicast_list	= ehea_set_multicast_list,

^ permalink raw reply related

* Re: [iproute2 PATCH] Add 'ip tuntap' support.
From: Stephen Hemminger @ 2009-09-19 19:52 UTC (permalink / raw)
  To: David Woodhouse; +Cc: netdev
In-Reply-To: <1253378923.6317.11.camel@macbook.infradead.org>

On Sat, 19 Sep 2009 09:48:43 -0700
David Woodhouse <dwmw2@infradead.org> wrote:

> This patch provides support for 'ip tuntap', allowing creation and
> deletion of persistent tun/tap devices.
> 
> diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
> new file mode 100644
> index 0000000..915ba57
> --- /dev/null
> +++ b/include/linux/if_tun.h
> @@ -0,0 +1,88 @@
> +/*
> + *  Universal TUN/TAP device driver.
> + *  Copyright (C) 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com>
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; either version 2 of the License, or
> + *  (at your option) any later version.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + *  GNU General Public License for more details.
> + */
> +
> +#ifndef __IF_TUN_H
> +#define __IF_TUN_H
> +
> +#include <linux/types.h>
> +#include <linux/if_ether.h>
> +
> +/* Read queue size */
> +#define TUN_READQ_SIZE	500
> +
> +/* TUN device flags */
> +#define TUN_TUN_DEV 	0x0001	
> +#define TUN_TAP_DEV	0x0002
> +#define TUN_TYPE_MASK   0x000f
> +
> +#define TUN_FASYNC	0x0010
> +#define TUN_NOCHECKSUM	0x0020
> +#define TUN_NO_PI	0x0040
> +#define TUN_ONE_QUEUE	0x0080
> +#define TUN_PERSIST 	0x0100	
> +#define TUN_VNET_HDR 	0x0200
> +
> +/* Ioctl defines */
> +#define TUNSETNOCSUM  _IOW('T', 200, int) 
> +#define TUNSETDEBUG   _IOW('T', 201, int) 
> +#define TUNSETIFF     _IOW('T', 202, int) 
> +#define TUNSETPERSIST _IOW('T', 203, int) 
> +#define TUNSETOWNER   _IOW('T', 204, int)
> +#define TUNSETLINK    _IOW('T', 205, int)
> +#define TUNSETGROUP   _IOW('T', 206, int)
> +#define TUNGETFEATURES _IOR('T', 207, unsigned int)
> +#define TUNSETOFFLOAD  _IOW('T', 208, unsigned int)
> +#define TUNSETTXFILTER _IOW('T', 209, unsigned int)
> +#define TUNGETIFF      _IOR('T', 210, unsigned int)
> +#define TUNGETSNDBUF   _IOR('T', 211, int)
> +#define TUNSETSNDBUF   _IOW('T', 212, int)
> +
> +/* TUNSETIFF ifr flags */
> +#define IFF_TUN		0x0001
> +#define IFF_TAP		0x0002
> +#define IFF_NO_PI	0x1000
> +#define IFF_ONE_QUEUE	0x2000
> +#define IFF_VNET_HDR	0x4000
> +#define IFF_TUN_EXCL	0x8000
> +
> +/* Features for GSO (TUNSETOFFLOAD). */
> +#define TUN_F_CSUM	0x01	/* You can hand me unchecksummed packets. */
> +#define TUN_F_TSO4	0x02	/* I can handle TSO for IPv4 packets */
> +#define TUN_F_TSO6	0x04	/* I can handle TSO for IPv6 packets */
> +#define TUN_F_TSO_ECN	0x08	/* I can handle TSO with ECN bits. */
> +
> +/* Protocol info prepended to the packets (when IFF_NO_PI is not set) */
> +#define TUN_PKT_STRIP	0x0001
> +struct tun_pi {
> +	__u16  flags;
> +	__be16 proto;
> +};
> +
> +/*
> + * Filter spec (used for SETXXFILTER ioctls)
> + * This stuff is applicable only to the TAP (Ethernet) devices.
> + * If the count is zero the filter is disabled and the driver accepts
> + * all packets (promisc mode).
> + * If the filter is enabled in order to accept broadcast packets
> + * broadcast addr must be explicitly included in the addr list.
> + */
> +#define TUN_FLT_ALLMULTI 0x0001 /* Accept all multicast packets */
> +struct tun_filter {
> +	__u16  flags; /* TUN_FLT_ flags see above */
> +	__u16  count; /* Number of addresses */
> +	__u8   addr[0][ETH_ALEN];
> +};
> +
> +#endif /* __IF_TUN_H */
> diff --git a/ip/Makefile b/ip/Makefile
> index 3c185cf..fd16fe9 100644
> --- a/ip/Makefile
> +++ b/ip/Makefile
> @@ -1,6 +1,6 @@
>  IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o \
>      rtm_map.o iptunnel.o ip6tunnel.o tunnel.o ipneigh.o ipntable.o iplink.o \
> -    ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o \
> +    ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o iptuntap.o \
>      ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o \
>      iplink_vlan.o link_veth.o link_gre.o iplink_can.o
>  
> diff --git a/ip/ip.c b/ip/ip.c
> index 2bd54b2..d846a76 100644
> --- a/ip/ip.c
> +++ b/ip/ip.c
> @@ -47,7 +47,7 @@ static void usage(void)
>  "Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }\n"
>  "       ip [ -force ] -batch filename\n"
>  "where  OBJECT := { link | addr | addrlabel | route | rule | neigh | ntable |\n"
> -"                   tunnel | maddr | mroute | monitor | xfrm }\n"
> +"                   tunnel | tuntap | maddr | mroute | monitor | xfrm }\n"
>  "       OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n"
>  "                    -f[amily] { inet | inet6 | ipx | dnet | link } |\n"
>  "                    -o[neline] | -t[imestamp] | -b[atch] [filename] }\n");
> @@ -75,6 +75,8 @@ static const struct cmd {
>  	{ "link",	do_iplink },
>  	{ "tunnel",	do_iptunnel },
>  	{ "tunl",	do_iptunnel },
> +	{ "tuntap",	do_iptuntap },
> +	{ "tap",	do_iptuntap },
>  	{ "monitor",	do_ipmonitor },
>  	{ "xfrm",	do_xfrm },
>  	{ "mroute",	do_multiroute },
> diff --git a/ip/ip_common.h b/ip/ip_common.h
> index 273065f..c857667 100644
> --- a/ip/ip_common.h
> +++ b/ip/ip_common.h
> @@ -32,6 +32,7 @@ extern int do_ipneigh(int argc, char **argv);
>  extern int do_ipntable(int argc, char **argv);
>  extern int do_iptunnel(int argc, char **argv);
>  extern int do_ip6tunnel(int argc, char **argv);
> +extern int do_iptuntap(int argc, char **argv);
>  extern int do_iplink(int argc, char **argv);
>  extern int do_ipmonitor(int argc, char **argv);
>  extern int do_multiaddr(int argc, char **argv);
> diff --git a/ip/iptuntap.c b/ip/iptuntap.c
> new file mode 100644
> index 0000000..f7f64bc
> --- /dev/null
> +++ b/ip/iptuntap.c
> @@ -0,0 +1,321 @@
> +/*
> + * iptunnel.c	       "ip tuntap"
> + *
> + *		This program is free software; you can redistribute it and/or
> + *		modify it under the terms of the GNU General Public License
> + *		as published by the Free Software Foundation; either version
> + *		2 of the License, or (at your option) any later version.
> + *
> + * Authors:	David Woodhouse <David.Woodhouse@intel.com>
> + *
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <arpa/inet.h>
> +#include <sys/ioctl.h>
> +#include <linux/if.h>
> +#include <linux/if_tun.h>
> +#include <pwd.h>
> +#include <grp.h>
> +#include <fcntl.h>
> +#include <dirent.h>
> +#include <errno.h>
> +
> +#include "rt_names.h"
> +#include "utils.h"
> +#include "ip_common.h"
> +
> +#define TUNDEV "/dev/net/tun"
> +
> +static void usage(void) __attribute__((noreturn));
> +
> +static void usage(void)
> +{
> +	fprintf(stderr, "Usage: ip tuntap { add | del } [ dev PHYS_DEV ] \n");
> +	fprintf(stderr, "          [ mode { tun | tap } ] [ user USER ] [ group GROUP ]\n");
> +	fprintf(stderr, "          [ one_queue ] [ pi ] [ vnet_hdr ]\n");
> +	fprintf(stderr, "\n");
> +	fprintf(stderr, "Where: USER  := { STRING | NUMBER }\n");
> +	fprintf(stderr, "       GROUP := { STRING | NUMBER }\n");
> +	exit(-1);
> +}
> +
> +static int tap_add_ioctl(struct ifreq *ifr, uid_t uid, gid_t gid)
> +{
> +	int fd = open(TUNDEV, O_RDWR);
> +	int ret = -1;
> +
> +	ifr->ifr_flags |= IFF_TUN_EXCL;
> +
> +	fd = open(TUNDEV, O_RDWR);
> +	if (fd < 0) {
> +		perror("open");
> +		return -1;
> +	}
> +	if (ioctl(fd, TUNSETIFF, ifr)) {
> +		perror("ioctl(TUNSETIFF)");
> +		goto out;
> +	}
> +	if (uid != -1 && ioctl(fd, TUNSETOWNER, uid)) {
> +		perror("ioctl(TUNSETOWNER)");
> +		goto out;
> +	}
> +	if (gid != -1 && ioctl(fd, TUNSETGROUP, gid)) {
> +		perror("ioctl(TUNSETGROUP)");
> +		goto out;
> +	}
> +	if (ioctl(fd, TUNSETPERSIST, 1)) {
> +		perror("ioctl(TUNSETPERSIST)");
> +		goto out;
> +	}
> +	ret = 0;
> + out:
> +	close(fd);
> +	return ret;
> +}
> +
> +static int tap_del_ioctl(struct ifreq *ifr)
> +{
> +	int fd = open(TUNDEV, O_RDWR);
> +	int ret = -1;
> +
> +	if (fd < 0) {
> +		perror("open");
> +		return -1;
> +	}
> +	if (ioctl(fd, TUNSETIFF, ifr)) {
> +		perror("ioctl(TUNSETIFF)");
> +		goto out;
> +	}
> +	if (ioctl(fd, TUNSETPERSIST, 0)) {
> +		perror("ioctl(TUNSETPERSIST)");
> +		goto out;
> +	}
> +	ret = 0;
> + out:
> +	close(fd);
> +	return ret;
> +	
> +}
> +static int parse_args(int argc, char **argv, struct ifreq *ifr, uid_t *uid, gid_t *gid)
> +{
> +	int count = 0;
> +
> +	memset(ifr, 0, sizeof(*ifr));
> +
> +	ifr->ifr_flags |= IFF_NO_PI;
> +
> +	while (argc > 0) {
> +		if (matches(*argv, "mode") == 0) {
> +			NEXT_ARG();
> +			if (matches(*argv, "tun") == 0) {
> +				if (ifr->ifr_flags & IFF_TAP) {
> +					fprintf(stderr,"You managed to ask for more than one tunnel mode.\n");
> +					exit(-1);
> +				}
> +				ifr->ifr_flags |= IFF_TUN;
> +			} else if (matches(*argv, "tap") == 0) {
> +				if (ifr->ifr_flags & IFF_TUN) {
> +					fprintf(stderr,"You managed to ask for more than one tunnel mode.\n");
> +					exit(-1);
> +				}
> +				ifr->ifr_flags |= IFF_TAP;
> +			} else {
> +				fprintf(stderr,"Cannot guess tunnel mode.\n");
> +				exit(-1);
> +			}
> +		} else if (uid && matches(*argv, "user") == 0) {
> +			char *end;
> +			unsigned long user;
> +
> +			NEXT_ARG();
> +			if (**argv && ((user = strtol(*argv, &end, 10)), !*end))
> +				*uid = user;
> +			else {
> +				struct passwd *pw = getpwnam(*argv);
> +				if (!pw) {
> +					fprintf(stderr, "invalid user \"%s\"\n", *argv);
> +					exit(-1);
> +				}
> +				*uid = pw->pw_uid;
> +			}
> +		} else if (gid && matches(*argv, "group") == 0) {
> +			char *end;
> +			unsigned long group;
> +
> +			NEXT_ARG();
> +			
> +			if (**argv && ((group = strtol(*argv, &end, 10)), !*end))
> +				*gid = group;
> +			else {
> +				struct group *gr = getgrnam(*argv);
> +				if (!gr) {
> +					fprintf(stderr, "invalid group \"%s\"\n", *argv);
> +					exit(-1);
> +				}
> +				*gid = gr->gr_gid;
> +			}
> +		} else if (matches(*argv, "pi") == 0) {
> +			ifr->ifr_flags &= ~IFF_NO_PI;
> +		} else if (matches(*argv, "one_queue") == 0) {
> +			ifr->ifr_flags |= IFF_ONE_QUEUE;
> +		} else if (matches(*argv, "vnet_hdr") == 0) {
> +			ifr->ifr_flags |= IFF_VNET_HDR;
> +		} else if (matches(*argv, "dev") == 0) {
> +			NEXT_ARG();
> +			strncpy(ifr->ifr_name, *argv, IFNAMSIZ-1);
> +		} else {
> +			if (matches(*argv, "name") == 0) {
> +				NEXT_ARG();
> +			} else if (matches(*argv, "help") == 0)
> +				usage();
> +			if (ifr->ifr_name[0])
> +				duparg2("name", *argv);
> +			strncpy(ifr->ifr_name, *argv, IFNAMSIZ);
> +		}
> +		count++;
> +		argc--; argv++;
> +	}
> +
> +	return 0;
> +}
> +
> +
> +static int do_add(int argc, char **argv)
> +{
> +	struct ifreq ifr;
> +	uid_t uid = -1;
> +	gid_t gid = -1;
> +
> +	if (parse_args(argc, argv, &ifr, &uid, &gid) < 0)
> +		return -1;
> +
> +	if (!(ifr.ifr_flags & TUN_TYPE_MASK)) {
> +		fprintf(stderr, "You failed to specify a tunnel mode\n");
> +		return -1;
> +	}
> +	return tap_add_ioctl(&ifr, uid, gid);
> +}
> +
> +static int do_del(int argc, char **argv)
> +{
> +	struct ifreq ifr;
> +
> +	if (parse_args(argc, argv, &ifr, NULL, NULL) < 0)
> +		return -1;
> +
> +	return tap_del_ioctl(&ifr);
> +}
> +
> +static int read_prop(char *dev, char *prop, long *value)
> +{
> +	char fname[IFNAMSIZ+25], buf[80], *endp;
> +	ssize_t len;
> +	int fd;
> +	long result;
> +
> +	sprintf(fname, "/sys/class/net/%s/%s", dev, prop);
> +	fd = open(fname, O_RDONLY);
> +	if (fd < 0) {
> +		if (strcmp(prop, "tun_flags"))
> +			fprintf(stderr, "open %s: %s\n", fname,
> +				strerror(errno));
> +		return -1;
> +	}
> +	len = read(fd, buf, sizeof(buf)-1);
> +	close(fd);
> +	if (len < 0) {
> +		fprintf(stderr, "read %s: %s", fname, strerror(errno));
> +		return -1;
> +	}
> +
> +	buf[len] = 0;
> +	result = strtol(buf, &endp, 0);
> +	if (*endp != '\n') {
> +		fprintf(stderr, "Failed to parse %s\n", fname);
> +		return -1;
> +	}
> +	*value = result;
> +	return 0;
> +}
> +
> +static void print_flags(long flags)
> +{
> +	if (flags & IFF_TUN)
> +		printf(" tun");
> +
> +	if (flags & IFF_TAP)
> +		printf(" tap");
> +
> +	if (!(flags & IFF_NO_PI))
> +		printf(" pi");
> +
> +	if (flags & IFF_ONE_QUEUE)
> +		printf(" one_queue");
> +
> +	if (flags & IFF_VNET_HDR)
> +		printf(" vnet_hdr");
> +
> +	flags &= ~(IFF_TUN|IFF_TAP|IFF_NO_PI|IFF_ONE_QUEUE|IFF_VNET_HDR);
> +	if (flags)
> +		printf(" UNKNOWN_FLAGS:%lx", flags);
> +}
> +
> +static int do_show(int argc, char **argv)
> +{
> +	DIR *dir;
> +	struct dirent *d;
> +	long flags, owner = -1, group = -1;
> +
> +	dir = opendir("/sys/class/net");
> +	if (!dir) {
> +		perror("opendir");
> +		return -1;
> +	}
> +	while ((d = readdir(dir))) {
> +		if (d->d_name[0] == '.' &&
> +		    (d->d_name[1] == 0 || d->d_name[1] == '.'))
> +			continue;
> +
> +		if (read_prop(d->d_name, "tun_flags", &flags))
> +			continue;
> +
> +		read_prop(d->d_name, "owner", &owner);
> +		read_prop(d->d_name, "group", &group);
> +		
> +		printf("%s:", d->d_name);
> +		print_flags(flags);
> +		if (owner != -1)
> +			printf(" user %ld", owner);
> +		if (group != -1)
> +			printf(" group %ld", group);
> +		printf("\n");
> +	}
> +	return 0;
> +}
> +
> +int do_iptuntap(int argc, char **argv)
> +{
> +	if (argc > 0) {
> +		if (matches(*argv, "add") == 0)
> +			return do_add(argc-1, argv+1);
> +		if (matches(*argv, "del") == 0)
> +			return do_del(argc-1, argv+1);
> +		if (matches(*argv, "show") == 0 ||
> +                    matches(*argv, "lst") == 0 ||
> +                    matches(*argv, "list") == 0)
> +                        return do_show(argc-1, argv+1);
> +		if (matches(*argv, "help") == 0)
> +			usage();
> +	} else
> +		return do_show(0, NULL);
> +
> +	fprintf(stderr, "Command \"%s\" is unknown, try \"ip tuntap help\".\n",
> +		*argv);
> +	exit(-1);
> +}
> 

I added it, but:
  * cleaned up whitespace
  * use if_tun.h from sanitized 2.6.30
  * ifdef IFF_TUN_EXCL flag, so can build with older code
   (currently iproute is releasing for 2.6.30)

^ permalink raw reply

* Re: [iproute2 PATCH] Add 'ip tuntap' support.
From: David Woodhouse @ 2009-09-19 19:55 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20090919125222.5f4718f1@s6510>

On Sat, 2009-09-19 at 12:52 -0700, Stephen Hemminger wrote:
> 
> I added it, but:
>   * cleaned up whitespace

Thanks.

>   * use if_tun.h from sanitized 2.6.30

Hm, what's the difference, other than the lack of IFF_TUN_EXCL?

>   * ifdef IFF_TUN_EXCL flag, so can build with older code
>    (currently iproute is releasing for 2.6.30)

IFF_TUN_EXCL is harmless (ignored) on older kernels -- when I looked at
compatibility I decided to just set it anyway. Sorry, perhaps I should
have mentioned that in the commit comment... but I'd completely
forgotten about it.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

^ permalink raw reply

* [PATCH RESEND] kaweth: Fix memory leak in kaweth_control()
From: Kevin Cernekee @ 2009-09-19 21:18 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Oliver Neukum,
	Greg Kroah-Hartman

kaweth_control() never frees the buffer that it allocates for the USB
control message.  Test case:

while :; do ifconfig eth2 down ; ifconfig eth2 up ; done

This is a tiny buffer so it is a slow leak.  If you want to speed up the
process, you can change the allocation size to e.g. 16384 bytes, and it
will consume several megabytes within a few minutes.

Signed-off-by: Kevin Cernekee <cernekee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Acked-by: Oliver Neukum <oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org>
---
 drivers/net/usb/kaweth.c |   18 +++++++++++-------
 1 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/net/usb/kaweth.c b/drivers/net/usb/kaweth.c
index e2a39b9..e391ef9 100644
--- a/drivers/net/usb/kaweth.c
+++ b/drivers/net/usb/kaweth.c
@@ -263,6 +263,7 @@ static int kaweth_control(struct kaweth_device *kaweth,
 			  int timeout)
 {
 	struct usb_ctrlrequest *dr;
+	int retval;
 
 	dbg("kaweth_control()");
 
@@ -278,18 +279,21 @@ static int kaweth_control(struct kaweth_device *kaweth,
 		return -ENOMEM;
 	}
 
-	dr->bRequestType= requesttype;
+	dr->bRequestType = requesttype;
 	dr->bRequest = request;
 	dr->wValue = cpu_to_le16(value);
 	dr->wIndex = cpu_to_le16(index);
 	dr->wLength = cpu_to_le16(size);
 
-	return kaweth_internal_control_msg(kaweth->dev,
-					pipe,
-					dr,
-					data,
-					size,
-					timeout);
+	retval = kaweth_internal_control_msg(kaweth->dev,
+					     pipe,
+					     dr,
+					     data,
+					     size,
+					     timeout);
+
+	kfree(dr);
+	return retval;
 }
 
 /****************************************************************
-- 
1.6.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: SO_TIMESTAMPING fix and design decisions
From: Peter P Waskiewicz Jr @ 2009-09-19 22:09 UTC (permalink / raw)
  To: Christopher Zimmermann; +Cc: netdev@vger.kernel.org
In-Reply-To: <20090919192549.0735c93a@pundit>

On Sat, 2009-09-19 at 10:25 -0700, Christopher Zimmermann wrote:
> Hi, 
> 
> I'm currently working on the SO_TIMESTAMPING feature which is currently 
> pretty much broken. The current status is the following:
> 
> -tx software timestamps don't work because of a race condition. See 
> commit cd4d8fdad1f1320.
> -rx software timestamps do work. But they are nothing new.
> SO_TIMESTAMP[NS] has been available for years.
> 
> hardware timestamps only work for the Intel igb driver. I have access to 
> two test machines with NICs supported by this driver.
> 

Intel's 82599, supported by ixgbe, also has the same IEEE 1588
timestamping support in hardware.  We haven't implemented the support
yet in ixgbe, but the hardware is there and does work.  If you were
curious of the interface, the datasheet for the hardware is available on
our SourceForge site (e1000.sf.net).

Cheers,
-PJ


^ permalink raw reply

* [PATCH 1/2] pktgen: check for link down
From: Stephen Hemminger @ 2009-09-20  5:18 UTC (permalink / raw)
  To: David Miller, Robert Olsson, Jesper Dangaard Brouer; +Cc: netdev

If cable is pulled, pktgen shouldn't continue slamming packets into the
device.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/core/pktgen.c	2009-09-19 11:20:55.546463176 -0700
+++ b/net/core/pktgen.c	2009-09-19 11:22:44.810509240 -0700
@@ -1959,7 +1959,7 @@ static int pktgen_setup_dev(struct pktge
 	if (odev->type != ARPHRD_ETHER) {
 		printk(KERN_ERR "pktgen: not an ethernet device: \"%s\"\n", ifname);
 		err = -EINVAL;
-	} else if (!netif_running(odev)) {
+	} else if (!netif_running(odev) || !netif_carrier_ok(odev)) {
 		printk(KERN_ERR "pktgen: device is down: \"%s\"\n", ifname);
 		err = -ENETDOWN;
 	} else {
@@ -3410,7 +3410,7 @@ static void pktgen_xmit(struct pktgen_de
 	/* Did we saturate the queue already? */
 	if (netif_tx_queue_stopped(txq) || netif_tx_queue_frozen(txq)) {
 		/* If device is down, then all queues are permnantly frozen */
-		if (netif_running(odev))
+		if (netif_running(odev) && netif_carrier_ok(odev))
 			idle(pkt_dev);
 		else
 			pktgen_stop_device(pkt_dev);

^ permalink raw reply

* [PATCH 2/2] pktgen: nmi watchdog keep alive
From: Stephen Hemminger @ 2009-09-20  5:21 UTC (permalink / raw)
  To: David Miller, Robert Olsson; +Cc: netdev

If pktgen gets really busy it takes up all the CPU,
and can starve the NMI thread and cause system reset.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/core/pktgen.c	2009-09-19 11:28:53.762463050 -0700
+++ b/net/core/pktgen.c	2009-09-19 11:30:36.534459968 -0700
@@ -136,6 +136,7 @@
 #include <linux/delay.h>
 #include <linux/timer.h>
 #include <linux/list.h>
+#include <linux/nmi.h>
 #include <linux/init.h>
 #include <linux/skbuff.h>
 #include <linux/netdevice.h>
@@ -3369,6 +3370,7 @@ static void idle(struct pktgen_dev *pkt_
 {
 	ktime_t idle_start = ktime_now();
 
+	touch_nmi_watchdog();
 	if (need_resched())
 		schedule();
 	else

^ permalink raw reply

* [PATCH] netdev: stats on multiqueue possible bug
From: Stephen Hemminger @ 2009-09-20  5:26 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Dave, I think you were trying to optimize something here that
doesn't need optimization.

If transmit stats (all) wrap to zero, then the stats would not
be set correctly. Move local variable into loop as well.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/core/dev.c	2009-09-19 15:19:13.902495560 -0700
+++ b/net/core/dev.c	2009-09-19 15:48:21.126458728 -0700
@@ -5079,19 +5079,19 @@ const struct net_device_stats *dev_get_s
 		unsigned long tx_bytes = 0, tx_packets = 0, tx_dropped = 0;
 		struct net_device_stats *stats = &dev->stats;
 		unsigned int i;
-		struct netdev_queue *txq;
 
 		for (i = 0; i < dev->num_tx_queues; i++) {
-			txq = netdev_get_tx_queue(dev, i);
+			const struct netdev_queue *txq
+				 = netdev_get_tx_queue(dev, i);
 			tx_bytes   += txq->tx_bytes;
 			tx_packets += txq->tx_packets;
 			tx_dropped += txq->tx_dropped;
 		}
-		if (tx_bytes || tx_packets || tx_dropped) {
-			stats->tx_bytes   = tx_bytes;
-			stats->tx_packets = tx_packets;
-			stats->tx_dropped = tx_dropped;
-		}
+
+		stats->tx_bytes   = tx_bytes;
+		stats->tx_packets = tx_packets;
+		stats->tx_dropped = tx_dropped;
+
 		return stats;
 	}
 }

^ permalink raw reply

* Re: sky2 rx length errors
From: Andrew Morton @ 2009-09-20  6:35 UTC (permalink / raw)
  To: Grozdan; +Cc: linux-kernel, Stephen Hemminger, netdev
In-Reply-To: <c5bd819b0909180641n7c353b80tc15b9c9fe02d5c95@mail.gmail.com>

(added cc's from the MAINTAINERS file)

On Fri, 18 Sep 2009 15:41:45 +0200 Grozdan <neutrino8@gmail.com> wrote:

> Hi,
> 
> I have a Marvell onboard NIC (88E8053) and I've been noticing for a
> while now a bit weird behavior with the sky2 driver. This mostly
> occurs with newer kernels (2.6.30, 2.6.31) and my older distro kernel
> (2.6.27.21) does not seem to have the same problem. Basically, the
> sky2 driver will randomly and unpredictably spew rx length error
> messages and reboot itself. I also noticed in dmesg that this mostly
> occurs after "martian destination" messages. After this message, sky2
> starts spewing messages as shown below and then reboots itself. It is
> not really a big problem for me, but since I'm virtually always logged
> in in IRC, the client always loses connection, waits for a few minutes
> to get a response from the server and then relogs me again. I do not
> think it's a HW problem as the Marvell NIC otherwise works perfectly
> and I've checked my cable modem too which operates without a problem.
> Any ideas?
> 
> PS: please cc me as I'm not subscribed to the mailing list
> 
> sky2 driver version 1.23
> sky2 0000:05:00.0: PCI INT A -> GSI 36 (level, low) -> IRQ 36
> sky2 0000:05:00.0: setting latency timer to 64
> sky2 0000:05:00.0: PCI: Disallowing DAC for device
> sky2 0000:05:00.0: Yukon-2 EC chip revision 2
> sky2 0000:05:00.0: irq 53 for MSI/MSI-X
> sky2 0000:05:00.0: No interrupt generated using MSI, switching to INTx mode.
> sky2 eth0: addr 00:11:d8:a1:5b:0e
> sky2 eth0: enabling interface
> sky2 eth0: Link is up at 100 Mbps, full duplex, flow control rx
> .....
> .....
> martian destination 0.0.0.0 from 172.23.204.1, dev eth0
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x5ea0100 length 598
> sky2 eth0: rx length error: status 0x5ea0100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x5ea0100 length 598
> sky2 eth0: rx length error: status 0x5ea0100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598
> sky2 eth0: rx length error: status 0x4420100 length 598


^ permalink raw reply

* Re: [PATCH] netdev: stats on multiqueue possible bug
From: Eric Dumazet @ 2009-09-20  7:24 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20090919222608.5fdf80b8@nehalam>

Stephen Hemminger a écrit :
> Dave, I think you were trying to optimize something here that
> doesn't need optimization.
> 
> If transmit stats (all) wrap to zero, then the stats would not
> be set correctly. Move local variable into loop as well.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> --- a/net/core/dev.c	2009-09-19 15:19:13.902495560 -0700
> +++ b/net/core/dev.c	2009-09-19 15:48:21.126458728 -0700
> @@ -5079,19 +5079,19 @@ const struct net_device_stats *dev_get_s
>  		unsigned long tx_bytes = 0, tx_packets = 0, tx_dropped = 0;
>  		struct net_device_stats *stats = &dev->stats;
>  		unsigned int i;
> -		struct netdev_queue *txq;
>  
>  		for (i = 0; i < dev->num_tx_queues; i++) {
> -			txq = netdev_get_tx_queue(dev, i);
> +			const struct netdev_queue *txq
> +				 = netdev_get_tx_queue(dev, i);
>  			tx_bytes   += txq->tx_bytes;
>  			tx_packets += txq->tx_packets;
>  			tx_dropped += txq->tx_dropped;
>  		}
> -		if (tx_bytes || tx_packets || tx_dropped) {
> -			stats->tx_bytes   = tx_bytes;
> -			stats->tx_packets = tx_packets;
> -			stats->tx_dropped = tx_dropped;
> -		}
> +
> +		stats->tx_bytes   = tx_bytes;
> +		stats->tx_packets = tx_packets;
> +		stats->tx_dropped = tx_dropped;
> +
>  		return stats;
>  	}
>  }

Most devices dont update txq->tx_bytes/tx_packets/tx_dropped yet, but
still update their device->stats

Your patch makes these devices clearing their stats.

In the case all stats wrap to zero, we'll give old values. If you think this is 
a bug, you should find another way to fix it :)

^ permalink raw reply

* Re: SO_TIMESTAMPING fix and design decisions
From: Christopher Zimmermann @ 2009-09-20  7:52 UTC (permalink / raw)
  To: Peter P Waskiewicz Jr, netdev@vger.kernel.org
In-Reply-To: <1253398161.14869.2.camel@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 1024 bytes --]

On Sat, 19 Sep 2009 15:09:21 -0700
Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> wrote:

> > hardware timestamps only work for the Intel igb driver. I have 
> > access to two test machines with NICs supported by this driver.
> 
> Intel's 82599, supported by ixgbe, also has the same IEEE 1588
> timestamping support in hardware.  We haven't implemented the support
> yet in ixgbe, but the hardware is there and does work.  If you were
> curious of the interface, the datasheet for the hardware is available on
> our SourceForge site (e1000.sf.net).

hi! thanks for the reply.

I already got the documentation for the 82576 cards I have access to. I 
won't be able to afford another pair.

What do you think about my idea to expose the relevant registers to 
userspace? I believe it would not be too difficult for userspace to 
configure the timestamps this way and would allow way more flexibility. 
Of course I would #DEFINE the constants used to set the registers.

Christopher Zimmermann

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [AX25] kernel panic
From: Bernard Pidoux @ 2009-09-20  8:42 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Ralf Baechle DL5RB, Linux Netdev List, linux-hams
In-Reply-To: <20090911120557.GA12175@linux-mips.org>

[-- Attachment #1: Type: text/plain, Size: 174 bytes --]

Hi,

Here are the first events noticed since I turned on 
CONFIG_DEBUG_OBJECTS_TIMERS option.

First a kernel BUG, second a kernel panic.

Best regards,

Bernard Pidoux






[-- Attachment #2: kernel_bug --]
[-- Type: text/plain, Size: 4116 bytes --]

------------[ cut here ]------------
kernel BUG at kernel/timer.c:913!
invalid opcode: 0000 [#1] 
last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
CPU 0 
Modules linked in: netconsole netrom mkiss rose ax25 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc af_packet ipv6 binfmt_misc loop ext3 jbd cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table snd_via82xx snd_ac97_codec ac97_bus snd_mpu401_uart snd_rawmidi snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_pcm snd_timer snd_page_alloc snd_mixer_oss snd soundcore shpchp pci_hotplug i2c_viapro i2c_core via_agp floppy 8139cp 8139too mii sg sr_mod processor rtc_cmos button thermal evdev pata_via ata_generic ide_pci_generic pata_acpi sata_via libata sd_mod scsi_mod crc_t10dif
Pid: 24497, comm: astropulse_5.06 Not tainted 2.6.31-nosmp #3 MS-7258
RIP: 0010:[<ffffffff81061412>]  [<ffffffff81061412>] cascade+0xb2/0xc0
RSP: 0000:ffffffff8155be00  EFLAGS: 00010082
RAX: 0000000000000000 RBX: ffff880050c4c218 RCX: 0000000104445204
RDX: ffffffff8132b3d0 RSI: ffff880050c4c218 RDI: ffffffff81673540
RBP: ffffffff8155be40 R08: 0000000000000004 R09: ffffffff81568bc0
R10: ffffffff8155be28 R11: 0000000000000001 R12: ffffffff81673540
R13: ffffffff8155be00 R14: 0000000000000012 R15: 0000000000000001
FS:  00000000016e2860(0063) GS:ffffffff81558000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe3eb15029c CR3: 000000007ecff000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process astropulse_5.06 (pid: 24497, threadinfo ffff88005807a000, task ffff880049588000)
Stack:
 ffff8800445748e0 ffff88007e86ab98 ffffffff8155be88 000000008c85481d
<0> 0000000000000000 ffffffff81673540 ffffffff8155be70 0000000000000081
<0> ffffffff8155bec0 ffffffff81061613 ffffffff81675150 ffffffff81674d50
Call Trace:
 <IRQ> 
 [<ffffffff81061613>] run_timer_softirq+0xf3/0x250
 [<ffffffff8102d0db>] ? lapic_next_event+0x2b/0x50
 [<ffffffff8107fc42>] ? clockevents_program_event+0x62/0xc0
 [<ffffffff8105b242>] __do_softirq+0xe2/0x1d0
 [<ffffffff8101414a>] call_softirq+0x1a/0x30
 [<ffffffff810162c5>] do_softirq+0x75/0xc0
 [<ffffffff8105ac45>] irq_exit+0x65/0x80
 [<ffffffff8102ddb5>] smp_apic_timer_interrupt+0x65/0xb0
 [<ffffffff81013c73>] apic_timer_interrupt+0x13/0x20
 <EOI> 
Code: 45 fe ff ff 4c 39 eb 48 8b 13 75 dd 48 8b 55 d8 65 48 33 14 25 28 00 00 00 44 89 f0 75 11 48 83 c4 20 5b 41 5c 41 5d 41 5e c9 c3 <0f> 0b eb fe e8 c5 29 ff ff 0f 1f 44 00 00 55 48 89 e5 48 83 ec 
RIP  [<ffffffff81061412>] cascade+0xb2/0xc0
 RSP <ffffffff8155be00>
---[ end trace 2efb8d4aaedbf503 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 24497, comm: astropulse_5.06 Tainted: G      D    2.6.31-nosmp #3
Call Trace:
 <IRQ>  [<ffffffff813d5e52>] panic+0xb2/0x180
 [<ffffffff8132f8ff>] ? __kfree_skb+0x6f/0xe0
 [<ffffffff810549bd>] ? console_unblank+0x8d/0xd0
 [<ffffffff812bdcee>] ? unblank_screen+0x1e/0x40
 [<ffffffff81053a36>] ? oops_exit+0x36/0x60
 [<ffffffff81017947>] oops_end+0xe7/0x100
 [<ffffffff810179bd>] ? oops_begin+0x5d/0x80
 [<ffffffff81017b52>] die+0x62/0xa0
 [<ffffffff81014e86>] do_trap+0x166/0x190
 [<ffffffff8107715d>] ? notify_die+0x3d/0x60
 [<ffffffff810153a5>] do_invalid_op+0xa5/0xd0
 [<ffffffff81061412>] ? cascade+0xb2/0xc0
 [<ffffffff8104dcde>] ? wake_up_state+0x1e/0x40
 [<ffffffff81013ddb>] invalid_op+0x1b/0x20
 [<ffffffff8132b3d0>] ? sock_def_write_space+0x0/0xb0
 [<ffffffff81061412>] ? cascade+0xb2/0xc0
 [<ffffffff81061613>] run_timer_softirq+0xf3/0x250
 [<ffffffff8102d0db>] ? lapic_next_event+0x2b/0x50
 [<ffffffff8107fc42>] ? clockevents_program_event+0x62/0xc0
 [<ffffffff8105b242>] __do_softirq+0xe2/0x1d0
 [<ffffffff8101414a>] call_softirq+0x1a/0x30
 [<ffffffff810162c5>] do_softirq+0x75/0xc0
 [<ffffffff8105ac45>] irq_exit+0x65/0x80
 [<ffffffff8102ddb5>] smp_apic_timer_interrupt+0x65/0xb0
 [<ffffffff81013c73>] apic_timer_interrupt+0x13/0x20
 <EOI> 
Rebooting in 60 seconds..[root@f6bvp-9 bernard]# 


^ permalink raw reply

* Re: [AX25] kernel panic
From: f8arr @ 2009-09-20  9:09 UTC (permalink / raw)
  To: Bernard Pidoux
  Cc: Jarek Poplawski, Ralf Baechle DL5RB, Linux Netdev List,
	linux-hams
In-Reply-To: <4AB5EAE5.6070605@upmc.fr>

hi Bernard,

My problem wasn't exactly the same has your.
But !

Doing a boot with the kernel option "nosmp" wasn't suffisant... it 
seemed to load the smp and then, while reading the boot option, 
unsetting them... but it was too late. (I saw that looking the dmesg).

My solution has been to compile a kernel without smp to be sure that 
nothing goes up... and it worked !

Regards
f8arr

Bernard Pidoux a écrit :
> Hi,
>
> Here are the first events noticed since I turned on 
> CONFIG_DEBUG_OBJECTS_TIMERS option.
>
> First a kernel BUG, second a kernel panic.
>
> Best regards,
>
> Bernard Pidoux
>
>
>
>
>

^ permalink raw reply

* Re: [iproute2] tc action mirred    question
From: Xiaofei Wu @ 2009-09-20  9:58 UTC (permalink / raw)
  To: hadi; +Cc: linux netdev
In-Reply-To: <1253104099.4584.9.camel@dogo.mojatatu.com>

Hi,

I come across another problem.

network topology:
 M
  |
 A
/  \
B  D
\  /
 C

node M  < ---- > node C
common path: M-A-B-C
the other path: M-A-D-C

With your help I can mirror the outgoing packets(node A wlan0) to wlan1(node A), then transmit it to D. D will route them to C.

There will be another problem.

When the link A-B is not available, there is no packect going out to mirror, node M could not get to node C. (if B is broken, A use ARP to ask the MAC of B's IP addr, but no reply)

So I want to forward the incoming packets( node M -> A(eth0) ) to wlan0(node A) and wlan1(node A) at the same time, route them separately. In this case, if one path is unavailable, it will not influence the other path.

Could iproute2 'tc' do this?

regards,
wu

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox