Netdev List

* Re: [PATCH v2] netfilter: Xtables: idletimer target implementation
From: Luciano Coelho @ 2010-06-03  7:04 UTC (permalink / raw)
  To: ext Jan Engelhardt
  Cc: netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
	kaber@trash.net, Timo Teras
In-Reply-To: <1275512485.2797.46.camel@powerslave>

On Wed, 2010-06-02 at 23:01 +0200, Coelho Luciano (Nokia-D/Helsinki)
wrote:
> On Wed, 2010-06-02 at 22:04 +0200, Coelho Luciano (Nokia-D/Helsinki)
> wrote:
> > What causes printk to appear under /sys/module even when compiled in, is
> > that it uses a module param.  This line:
> > 
> > module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
> > 
> > ...is what triggers the printk directory to be created in sysfs.  If I
> > add a similar line in my module, it shows up there too.
> > 
> > I still don't know if there is an actual kobject associated with it,
> > I'll check that next.
> 
> Okay, so here is how it goes: if the module is linked into the kernel
> and it has module parameters, the kernel creates a kobj for it as a
> module_ktype without parent, which will cause it to show up
> in /sys/module.
> 
> I could do the same in the module initialization when THIS_MODULE ==
> NULL, but I don't see any other module doing this.  In fact, I only see
> the kernel itself creating kobjects of module_ktype (in load_module()
> and in the case I just described).  Smells like a terrible hack to do
> that in the module itself... :(
> 
> Adding bogus parameters to the module just to trigger the kernel to create
> the kobject also seems to be too hacky...

Looking closer, it seems that it makes a bit of sense to add a kernel
module to /sys/devices/system.  I think it makes more sense than adding
to the module class or to the net class, actually.  The idletimer is not
a net device (so it doesn't fit in /sys/class/net) and it is not a
module, even though it may be handled by the xt_IDLETIMER module.

So we can look at the xt_idletimer as a system device, which is not a
peripheral device in itself, but a software timer device (there are
already similar components).

I'll add the kernel object we need as a system class device, so it will
go under /sys/devices/system/xt_idletimer.  Does that make sense to you?


-- 
Cheers,
Luca.


* Re: [net-next PATCH] drivers/net/enic: Use (pr|netdev)_<level> macro helpers
From: Scott Feldman @ 2010-06-03  7:19 UTC (permalink / raw)
  To: Joe Perches; +Cc: netdev, LKML
In-Reply-To: <1275492616.2489.20.camel@Joe-Laptop.home>

On 6/2/10 8:30 AM, "Joe Perches" <joe@perches.com> wrote:

> Compile tested only
> 
> Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> Remove #define PFX
> Use pr_<level>
> Use netdev_<level>
> Remove trailing periods from most formats
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Thanks Joe.  I tested it and there are a couple of places before the netdev
is registered where we need dev_<level> rather than netdev_<level>, otherwise
we get output like:

enic 0000:08:00.0: (unregistered net_device): vNIC MAC addr
00:25:b5:19:35:8f wq/rq 256/512

I like the dev_name(dev) in there but not the "(unregistered net_device)".

Also there were a couple of lines longer than 80 chars.

We can fix up the patch and resubmit if you like.

-scott
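
For readers unfamiliar with the pr_fmt convention mentioned in the patch
description, here is a minimal userspace sketch of how the prefix ends up
in every message.  It is not the enic code: KBUILD_MODNAME is hard-coded
(kbuild normally supplies it), and demo_pr_info() is a hypothetical
stand-in for pr_info() that formats into a buffer instead of the kernel log.

```c
#include <stdio.h>

/* Assumption: in a kernel build kbuild defines this; hard-coded for the demo. */
#define KBUILD_MODNAME "enic"

/* Same definition the patch adds: each format string gets a "module: " prefix. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/*
 * Hypothetical userspace stand-in for pr_info().  It relies on string
 * literal concatenation, so fmt must be a literal, and it needs at least
 * one argument after fmt.
 */
#define demo_pr_info(buf, size, fmt, ...) \
	snprintf((buf), (size), pr_fmt(fmt), __VA_ARGS__)
```

Calling demo_pr_info(buf, sizeof(buf), "link %s\n", "up") fills buf with
the message prefixed by "enic: ".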

* Re: [PATCH v2] netfilter: Xtables: idletimer target implementation
From: Jan Engelhardt @ 2010-06-03  7:58 UTC (permalink / raw)
  To: Luciano Coelho
  Cc: netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
	kaber@trash.net, Timo Teras
In-Reply-To: <1275548660.10855.4.camel@chilepepper>


On Thursday 2010-06-03 09:04, Luciano Coelho wrote:
>
>Looking closer, it seems that it makes a bit of sense to add a kernel
>module to /sys/devices/system.  I think it makes more sense than adding
>to the module class or to the net class, actually.  The idletimer is not
>a net device (so it doesn't fit in /sys/class/net) and it is not a
>module, even though it may be handled by the xt_IDLETIMER module.
>
>So we can look at the xt_idletimer as a system device, which is not a
>peripheral device in itself, but a software timer device (there are
>already similar components).
>
>I'll add the kernel object we need as a system class device, so it will
>go under /sys/devices/system/xt_idletimer.  Does that make sense to you?

Mh.. somehow I'd pick /sys/devices/virtual/xt_idletimer.
Or even create a /sys/net/xt_idletimer. (/sys has conceptual
subsystems directly beneath it: devices, fs, kernel, ...)

* Re: Question about an assignment in handle_ing()
From: Herbert Xu @ 2010-06-03  8:01 UTC (permalink / raw)
  To: jamal; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <1275226150.3587.9.camel@bigi>

On Sun, May 30, 2010 at 09:29:10AM -0400, jamal wrote:
>
> I have constructed a test case (attached) and my fear is unfortunately
> still there;-< What am i doing wrong?
> 
> The packet path is:
> -->eth0-->tcpdump eth0-->pedit-->mirror to dummy0-->tcpdump dummy0

Well, this doesn't guarantee a cloned packet at all.  Once af_packet
receives the packet it'll wake up any listeners like tcpdump; if
tcpdump gets to it before pedit runs, then the packet won't be
cloned anymore.

Anyway, I don't see why actions are special.  Everybody else lives
by the rule that cloned skbs are not writeable.  So if this was
indeed buggy as you say it would have shown up a long time ago.

Case in point, we had a bug in certain NIC drivers where they
modified cloned skbs for TSO.  This quickly showed up as bogus
packets in tcpdump and we fixed it.
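
The rule that cloned buffers are not writeable can be illustrated with a
small userspace sketch.  This is not kernel code: struct pkt and the pkt_*
helpers are hypothetical names that only mimic the idea of two handles
sharing one data area, where a writer must take a private copy first.

```c
#include <stdlib.h>
#include <string.h>

#define PKT_MAX 64

/* Data area that may be shared by several packet handles. */
struct shared_data {
	int refcnt;
	size_t len;
	unsigned char bytes[PKT_MAX];
};

/* Hypothetical packet handle, loosely analogous to an skb. */
struct pkt {
	struct shared_data *data;
};

static int pkt_init(struct pkt *p, const void *src, size_t len)
{
	if (len > PKT_MAX)
		return -1;
	p->data = malloc(sizeof(*p->data));
	if (!p->data)
		return -1;
	p->data->refcnt = 1;
	p->data->len = len;
	memcpy(p->data->bytes, src, len);
	return 0;
}

/* A clone shares the data area instead of copying it. */
static void pkt_clone(struct pkt *dst, const struct pkt *src)
{
	dst->data = src->data;
	dst->data->refcnt++;
}

static int pkt_is_cloned(const struct pkt *p)
{
	return p->data->refcnt > 1;
}

/* Must be called before modifying bytes: unshares the data if needed. */
static int pkt_make_writable(struct pkt *p)
{
	struct shared_data *copy;

	if (!pkt_is_cloned(p))
		return 0;
	copy = malloc(sizeof(*copy));
	if (!copy)
		return -1;
	*copy = *p->data;
	copy->refcnt = 1;
	p->data->refcnt--;
	p->data = copy;
	return 0;
}

/* Drop this handle's reference and free the data on the last put. */
static void pkt_put(struct pkt *p)
{
	if (--p->data->refcnt == 0)
		free(p->data);
	p->data = NULL;
}
```

A writer that skips pkt_make_writable() would change the bytes seen through
every other handle, which is the kind of corruption that showed up as bogus
packets in tcpdump.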

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Proposed linux kernel changes : scaling  tcp/ip stack
From: Mitchell Erblich @ 2010-06-03  8:16 UTC (permalink / raw)
  To: netdev

To whom it may concern,

First, my intent is to keep this discussion local to just a few tcp/ip
developers to see if there is any consensus that the approach below is
logical. Please also pass this email on to the "owner(s)" of this stack,
if there are any, to identify whether a case exists for the possible
changes below.

I am not currently on the linux kernel mail group.

I have experience with modifications of the Linux tcp/ip stack, and have
merged the changes into the company's local tree and left the possible 
global integration to others.

I have been approached by a number of companies about scaling the
stack with the assumption of a number of cpu cores. At present, I find extra
time on my hands and am considering looking into this area on my own.

The first assumption is that, if extra cores are available, a single
received homogeneous flow of a large number of packets/segments per
second (pps) can be split into non-equal flows. This split can in effect
allow a larger recv'd pps rate at the same core load while splitting off
other workloads, such as xmit'ing pure ACKs.

Simply put, again assuming Amdahl's law (and not looking to equalize the
load between cores), the idea is to create logical separations so that, in
a many-core system, different cores could run new kernel threads that
operate in parallel within the tcp/ip stack. The initial separation points
would be at the ip/tcp layer boundary and wherever any recv'd sk/pkt would
generate some form of output.

The ip/tcp layer would be split like the vintage AT&T STREAMS protocol,
which would need some form of queuing & scheduling. In addition,
the queuing/scheduling of other kernel threads would occur within ip & tcp
to separate the I/O.

A possible validation test is to measure the max recv'd pps rate within the
tcp/ip modules for a normal flow in TCP established state with normally
ordered, say 64-byte, non-fragmented segments, before and after each
incremental change. Alternatively, measure the same rate with fewer
core/cpu cycles.

I am willing to host a private git Linux.org tree that collects the proposed
changes and, if there is willingness and a perceived want/need, to then
identify how to implement the merge.

		Mitchell Erblich
		UNIX Kernel Engineer

* [Patch] infiniband: check local reserved ports
From: Amerigo Wang @ 2010-06-03  8:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Tetsuo Handa, Amerigo Wang, Roland Dreier, davem


Now that Tetsuo's patch has been merged, this is the missing part
for local port reservation.

Cc: Roland Dreier <rdreier@cisco.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: WANG Cong <amwang@redhat.com>

---
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index b930b81..7b89bab 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1978,6 +1978,7 @@ static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 	rover = net_random() % remaining + low;
 retry:
 	if (last_used_port != rover &&
+	    !inet_is_reserved_local_port(rover) &&
 	    !idr_find(ps, (unsigned short) rover)) {
 		int ret = cma_alloc_port(ps, id_priv, rover);
 		/*

* RE: [PATCH] ppp_generic: fix multilink fragment sizes
From: Paoloni, Gabriele @ 2010-06-03  8:41 UTC (permalink / raw)
  To: Ben McKeegan
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, alan@lxorguk.ukuu.org.uk,
	linux-ppp@vger.kernel.org, paulus@samba.org
In-Reply-To: <4C067EF7.9040609@netservers.co.uk>

Hi

I agree with you about replacing totlen with len (actually the previous one was quite bad).
I don't think we need to round up anyway, and I think nbigger is doing its job. Basically we are giving just one more byte to the first nbigger free channels, and for the rest the integer division rounds down automatically.

For example, say you have 5 free channels before transmitting and len is 83 bytes.

nbigger will be (ln 1284) 83%5=3

the frame will be split as follows: 17 - 17 - 17 - 16 - 16

For the first three iterations the channel to tx on gets an extra byte (ln1427) and nbigger is decreased by one (ln1428).

The only change I would make to the code is to replace totlen with len @ln1425
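
The split with len in place of totlen can be sketched in userspace as
follows.  This is a simplified illustration, not the ppp_generic.c code:
split_fragments() and its out[] array are hypothetical, all channels are
assumed to have speed 0 (equal share), and nfree is assumed to be positive.

```c
/*
 * Hypothetical helper: split len bytes across nfree equal channels and
 * store each fragment length in out[], which must hold nfree entries.
 */
static void split_fragments(int len, int nfree, int *out)
{
	int nbigger;
	int i = 0;

	if (nfree <= 0)
		return;

	nbigger = len % nfree;	/* number of channels that get one extra byte */

	while (nfree > 0) {
		int flen = len / nfree;	/* integer division rounds down */

		if (nbigger > 0) {
			flen++;
			nbigger--;
		}
		out[i++] = flen;
		len -= flen;	/* remaining bytes for the remaining channels */
		nfree--;
	}
}
```

With len = 83 and nfree = 5 this yields 17, 17, 17, 16, 16, matching the
example above.  Because len and nfree shrink together, len/nfree stays at
16 on every iteration.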

Now, if you agree, either you or I can submit a new patch.

Regards

Gabriele Paoloni   

>-----Original Message-----
>From: Ben McKeegan [mailto:ben@netservers.co.uk]
>Sent: 02 June 2010 16:56
>To: Paoloni, Gabriele
>Cc: davem@davemloft.net; netdev@vger.kernel.org; linux-
>kernel@vger.kernel.org; alan@lxorguk.ukuu.org.uk; linux-
>ppp@vger.kernel.org; paulus@samba.org
>Subject: Re: [PATCH] ppp_generic: fix multilink fragment sizes
>
>Paoloni, Gabriele wrote:
>> The proposed patch looks wrong to me.
>>
>> nbigger is already doing the job; I didn't use DIV_ROUND_UP because in
>general we don't have always to roundup, otherwise we would exceed the
>total bandwidth.
>
>I was basing this on the original code prior to your patch, which used
>DIV_ROUND_UP to get the fragment size.  Looking more closely I see your
>point, the original code was starting with the larger fragment size and
>decrementing rather than starting with the smaller size and incrementing
>as your code does, so that makes sense.
>
>
>>
>>  		flen = len;
>>  		if (nfree > 0) {
>>  			if (pch->speed == 0) {
>> -				flen = totlen/nfree;
>> +				if (nfree > 1)
>> +					flen = DIV_ROUND_UP(len, nfree);
>>  				if (nbigger > 0) {
>>  					flen++;
>>  					nbigger--;
>
>The important change here is the use of 'len' instead of 'totlen'.
>'nfree' and 'len' should decrease roughly proportionally with each
>iteration of the loop whereas 'totlen' remains unchanged.  Thus
>(totlen/nfree) gets bigger on each iteration whereas len/nfree should
>give roughly the same.  However, without rounding up here I'm not sure
>the logic is right either, since the side effect of nbigger is to make
>len decrease faster so it is not quite proportional to the decrease in
>nfree.  Is there a risk of ending up on the nfree == 1 iteration with
>flen == len - 1 and thus generating a superfluous extra 1 byte long
>fragment?  This would be a far worse situation than a slight imbalance
>in the size of the fragments.
>
>Perhaps the solution is to go back to a precalculated fragment size for
>the pch->speed == 0 case as per original code?
>
>Regards,
>Ben.

* Re: [PATCH v2] netdev:bfin_mac: reclaim and free tx skb as soon as possible after transfer
From: Junchang Wang @ 2010-06-03  8:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: sonic zhang, David Miller, netdev, uclinux-dist-devel
In-Reply-To: <1275537919.29413.55.camel@edumazet-laptop>

Hi Eric,

On Thu, Jun 3, 2010 at 12:05 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Not related to your patch, but reviewing it I see this driver still does
> the "dev->stats.tx_packets++; dev->stats.tx_bytes += (skb->len);"
>
> This is not necessary and expensive, since we update txq stats in core
> network stack.
>
>        rc = ops->ndo_start_xmit(skb, dev);
>        if (rc == NETDEV_TX_OK)
>                txq_trans_update(txq);  << here >>
>

Good suggestion for drivers. But I wonder whether there are stats for
received packets in the core network stack.

I.e., can I replace "dev->stats.rx_packets++" and "dev->stats.rx_bytes
+= (skb->len);" with something already maintained by the core stack? I
failed to find them.

Thanks.


-- 
--Junchang

* Re: Proposed linux kernel changes : scaling  tcp/ip stack
From: Eric Dumazet @ 2010-06-03  9:14 UTC (permalink / raw)
  To: Mitchell Erblich; +Cc: netdev
In-Reply-To: <FDFFEFAB-A741-4232-821E-17BFAE5CAFAC@earthlink.net>

Le jeudi 03 juin 2010 à 01:16 -0700, Mitchell Erblich a écrit :
> To whom it may concern,
> 
> First, my assumption is to keep this discussion local to just a few tcp/ip
> developers to see if there is any consensus that the below is a logical 
> approach. Please also pass this email if there is a "owner(s)" of this stack
> to identify if a case exists for the below possible changes.
> 
> I am not currently on the linux kernel mail group.
> 			
> I have experience with modifications of the Linux tcp/ip stack, and have
> merged the changes into the company's local tree and left the possible 
> global integration to others.
> 
> I have been approached by a number of companies about scaling the
> stack with the assumption of a number of cpu cores. At present, I find extra
> time on my hands and am considering looking into this area on my own.
> 
> The first assumption is that if extra cores are available, that a single
> received homogeneous flow of a large number of packets/segments per
> second (pps) can be split into non-equal flows. This split can in effect
> allow a larger recv'd pps rate at the same core load while splitting off
> other workloads, such as xmit'ing pure ACKs.
> 
> Simply, again assuming Amdahl's law (and not looking to equalize the load
> between cores), and creating logical separations where in a many core 
> system, different cores could have new kernel threads  that operate in 
> parallel within the tcp/ip stack. The initial separation points would be at 
> the ip/tcp layer boundary and where any recv'd sk/pkt would generate some 
> form of output.
> 
> The ip/tcp layer would be split like the vintage AT&T STREAMS protocol,
> which would need some form of queuing & scheduling. In addition,
> the queuing/scheduling of other kernel threads would occur within ip & tcp
> to separate the I/O.
> 
> A possible validation test is to identify the max recv'd pps rate within the
> tcp/ip modules within normal flow TCP established state with normal order 
> of say 64byte non fragmented segments, before and after each 
> incremental change. Or the same rate with fewer core/cpu cycles.
> 
> I am willing to have a private git Linux.org tree that concentrates proposed
> changes into this tree and if there is willingness, a seen want/need then identify
> how to implement the merge.

Hi Mitchell

We work every day to improve the network stack, and the standard linux tree
is pretty scalable; you don't need to set up a separate git tree for that.

Our beloved maintainer David S. Miller handles two trees, net-2.6 and
net-next-2.6 where we put all our changes.

http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6.git

I suggest you read the latest patches (say, about 10,000 of them), to
get an idea of the things we did over the last few years.

keywords : RCU, multiqueue, RPS, percpu data, lockless algos, cache line
placement...

It's nice to see another man joining the team!

Thanks



* [PATCH] ppp_generic: fix multilink fragment sizes
From: Ben McKeegan @ 2010-06-03  9:14 UTC (permalink / raw)
  To: davem; +Cc: ben, netdev, linux-kernel, gabriele.paoloni, alan, linux-ppp,
	paulus
In-Reply-To: <DF7BB929B28FCF479E888E3D9F8D9E88D3E1706A@irsmsx502.ger.corp.intel.com>

Fix bug in multilink fragment size calculation introduced by
commit 9c705260feea6ae329bc6b6d5f6d2ef0227eda0a
"ppp: ppp_mp_explode() redesign"

Signed-off-by: Ben McKeegan <ben@netservers.co.uk>
---
 drivers/net/ppp_generic.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c
index 0db3894..c980f74 100644
--- a/drivers/net/ppp_generic.c
+++ b/drivers/net/ppp_generic.c
@@ -1416,7 +1416,7 @@ static int ppp_mp_explode(struct ppp *ppp, struct sk_buff *skb)
 		flen = len;
 		if (nfree > 0) {
 			if (pch->speed == 0) {
-				flen = totlen/nfree;
+				flen = len/nfree;
 				if (nbigger > 0) {
 					flen++;
 					nbigger--;
-- 
1.5.6.5


* Re: [PATCH v2] netdev:bfin_mac: reclaim and free tx skb as soon as possible after transfer
From: Eric Dumazet @ 2010-06-03  9:19 UTC (permalink / raw)
  To: Junchang Wang; +Cc: sonic zhang, David Miller, netdev, uclinux-dist-devel
In-Reply-To: <AANLkTinrW2w9x6dbzTAzSrElqUueFOC6tJaPNsVcnLBv@mail.gmail.com>

Le jeudi 03 juin 2010 à 16:57 +0800, Junchang Wang a écrit :
> Hi Eric,
> 
> On Thu, Jun 3, 2010 at 12:05 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Not related to your patch, but reviewing it I see this driver still does
> > the "dev->stats.tx_packets++; dev->stats.tx_bytes += (skb->len);"
> >
> > This is not necessary and expensive, since we update txq stats in core
> > network stack.
> >
> >        rc = ops->ndo_start_xmit(skb, dev);
> >        if (rc == NETDEV_TX_OK)
> >                txq_trans_update(txq);  << here >>
> >
> 
> Good suggestion for drivers. But I wonder whether there are stats for
> received packets in core network stack.
> 

No, it's not there.

> I.e., can I replace "dev->stats.rx_packets++" and "dev->stats.rx_bytes
> += (skb->len);" with something already maintained by core stack? I
> failed to find them.
> 

As I said, the core network stack takes care of only three counters, because
it was 'free': they share a cache line with a spinlock we must hold when
calling the xmit function.

In the receive path, we don't dirty a cache line in the core network stack,
so updating counters would add a cost. (Modern NICs handle stats in firmware.)




* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to classes
From: Kay Sievers @ 2010-06-03  9:30 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Johannes Berg, Greg KH, netdev
In-Reply-To: <m17hmhrl6v.fsf_-_@fess.ebiederm.org>

On Thu, Jun 3, 2010 at 02:53, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> In the last painful restructuring of sysfs we started
> creating class directories under normal devices so we could place
> devices such as network devices directly under the hardware
> that implements them instead of in their class directories like
> /sys/class/net/.  This creation of class directories avoids the
> need to worry about namespace conflicts if something is renamed.
>
> A special exception was made for devices that were still placed
> directly in their class directory.  Looking at how this interacts
> with the wireless network devices it appears this special exception
> is either completely unneeded or at least needs to be restricted to
> a parent device with the same class as the child device.  Certainly
> in the case of unrelated classes we very much have the possibility
> of namespace clashes and we should be creating the subdirectory.

The class-glue-directories are only created between a bus-parent and
a class device. Class devices usually don't have other class
devices as parents, that's why it wasn't done that way.

If people use class devices from other classes as parents, they should
definitely convert the class that acts as a parent to a bus, to fit
into the usual model. None of this was ever really meant to be used that
way. The current behavior, not creating the glue-directory, is at
least the intended one from the driver core's perspective.

What kind of classes do this, where this change would help or would be needed?

I don't mind trying to see if that change works for people, but I can't
tell if there are any other users doing things like this which could break
with such a change. Stuff like udev will be fine with directories
inserted, but there are many things out there that just access their
parent's attributes with ../../foo, which might no longer work when we
insert directories.

Thanks,
Kay

* [patch] isdn/kcapi: return -EFAULT on copy_from_user errors
From: Dan Carpenter @ 2010-06-03  9:56 UTC (permalink / raw)
  To: Karsten Keil
  Cc: David S. Miller, Jan Kiszka, Tilman Schmidt, netdev,
	kernel-janitors

copy_from_user() returns the number of bytes remaining but we should
return -EFAULT here.  The error code gets returned to the user.  Both 
old_capi_manufacturer() and capi20_manufacturer() had other places
that already returned -EFAULT so this won't break anything.

Signed-off-by: Dan Carpenter <error27@gmail.com>

diff --git a/drivers/isdn/capi/kcapi.c b/drivers/isdn/capi/kcapi.c
index bde3c88..b054494 100644
--- a/drivers/isdn/capi/kcapi.c
+++ b/drivers/isdn/capi/kcapi.c
@@ -1020,12 +1020,12 @@ static int old_capi_manufacturer(unsigned int cmd, void __user *data)
 		if (cmd == AVMB1_ADDCARD) {
 		   if ((retval = copy_from_user(&cdef, data,
 					    sizeof(avmb1_carddef))))
-			   return retval;
+			   return -EFAULT;
 		   cdef.cardtype = AVM_CARDTYPE_B1;
 		} else {
 		   if ((retval = copy_from_user(&cdef, data,
 					    sizeof(avmb1_extcarddef))))
-			   return retval;
+			   return -EFAULT;
 		}
 		cparams.port = cdef.port;
 		cparams.irq = cdef.irq;
@@ -1218,7 +1218,7 @@ int capi20_manufacturer(unsigned int cmd, void __user *data)
 		kcapi_carddef cdef;
 
 		if ((retval = copy_from_user(&cdef, data, sizeof(cdef))))
-			return retval;
+			return -EFAULT;
 
 		cparams.port = cdef.port;
 		cparams.irq = cdef.irq;
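
The convention behind this change can be shown with a small userspace
sketch.  It does not call the real copy_from_user(): fake_copy_from_user()
and its "readable" parameter are hypothetical and only model the fact that
the function returns the number of bytes it could not copy, while
read_card_def() is an invented caller that turns any shortfall into -EFAULT.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical stand-in for copy_from_user(): copies what it can and
 * returns the number of bytes NOT copied (0 on success).  "readable"
 * models how much of the source is accessible.
 */
static unsigned long fake_copy_from_user(void *dst, const void *src,
					 unsigned long n,
					 unsigned long readable)
{
	unsigned long ok = n < readable ? n : readable;

	memcpy(dst, src, ok);
	return n - ok;
}

/* Returns 0 on success or -EFAULT, never a positive byte count. */
static int read_card_def(void *dst, const void *src, unsigned long n,
			 unsigned long readable)
{
	if (fake_copy_from_user(dst, src, n, readable))
		return -EFAULT;
	return 0;
}
```

Returning the raw result would hand the caller a positive number, which is
not a valid error code.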

* Re: [RFC][PATCH] Fix another namespace issue with devices assigned to  classes
From: Eric W. Biederman @ 2010-06-03 10:00 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Johannes Berg, Greg KH, netdev
In-Reply-To: <AANLkTinTSQ6Ncz3FqgFHasJc2ZKfm2vaweJHUam9b-gi@mail.gmail.com>

Kay Sievers <kay.sievers@vrfy.org> writes:

> On Thu, Jun 3, 2010 at 02:53, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> In the last painful restructuring of sysfs we started
>> creating class directories under normal devices so we could place
>> devices such as network devices directly under the hardware
>> that implements them instead of in their class directories like
>> /sys/class/net/.  This creation of class directories avoids the
>> need to worry about namespace conflicts if something is renamed.
>>
>> A special exception was made for devices that were still placed
>> directly in their class directory.  Looking at how this interacts
>> with the wireless network devices it appears this special exception
>> is either completely unneeded or at least needs to be restricted to
>> a parent device with the same class as the child device.  Certainly
>> in the case of unrelated classes we very much have the possibility
>> of namespace clashes and we should be creating the subdirectory.
>
> The class-glue-directories are only created between a bus-parent and
> a class device. Class devices usually don't have other class
> devices as parents, that's why it wasn't done that way.

> If people use class devices from other classes as parents, they should
> definitely convert the class that acts as a parent to a bus, to fit
> into the usual model. None of this was ever really meant to be used that
> way. The current behavior, not creating the glue-directory, is at
> least the intended one from the driver core's perspective.
>
> What kind of classes do this, where this change would help or would be needed?

> I don't mind trying if that change will work for people, I can't tell
> if there are any other users doing things like this which could break
> with such a change. Stuff like udev will be fine with directories
> inserted, but there are many things out there that just access their
> parent's attributes with ../../foo, which might no longer work when we
> insert directories.

To the best of my knowledge we are talking about a very limited number of
real world cases.

The driver in particular that causes problems is mac80211_hwsim. It
winds up placing network devices in a directory that isn't prepared to
take network namespace tagged members, with the result that when the
module is removed we don't delete the symlinks from /sys/class/net/.
I see no reason to believe we are free of possible namespace conflicts
either, which is why I suggested the patch.

From my perspective, not creating the directory in some weird corner case
that appears to practically never happen looks like an ugly nasty special case.

If the solution winds up being converting mac80211_hwsim to using a
bus instead of a class that seems reasonable to me as well.  More code
in one place to remove the chance of problems elsewhere.

Eric

* [patch] tehuti: return -EFAULT on copy_to_user errors
From: Dan Carpenter @ 2010-06-03 10:05 UTC (permalink / raw)
  To: Alexander Indenbaum
  Cc: Andy Gospodarek, David S. Miller, Jiri Pirko, Stephen Hemminger,
	Eric Dumazet, netdev, kernel-janitors

copy_to_user() returns the number of bytes remaining but we want to
return a negative error code here.

Signed-off-by: Dan Carpenter <error27@gmail.com>

diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
index 20ab161..737df60 100644
--- a/drivers/net/tehuti.c
+++ b/drivers/net/tehuti.c
@@ -646,7 +646,7 @@ static int bdx_ioctl_priv(struct net_device *ndev, struct ifreq *ifr, int cmd)
 		error = copy_from_user(data, ifr->ifr_data, sizeof(data));
 		if (error) {
 			pr_err("cant copy from user\n");
-			RET(error);
+			RET(-EFAULT);
 		}
 		DBG("%d 0x%x 0x%x\n", data[0], data[1], data[2]);
 	}
@@ -665,7 +665,7 @@ static int bdx_ioctl_priv(struct net_device *ndev, struct ifreq *ifr, int cmd)
 		    data[2]);
 		error = copy_to_user(ifr->ifr_data, data, sizeof(data));
 		if (error)
-			RET(error);
+			RET(-EFAULT);
 		break;
 
 	case BDX_OP_WRITE:

* Re: Call trace related to bonding seen in 2.6.34
From: Narendra K @ 2010-06-03  9:58 UTC (permalink / raw)
  To: netdev; +Cc: fubar
In-Reply-To: <EDA0A4495861324DA2618B4C45DCB3EE6128C6@blrx3m08.blr.amer.dell.com>

> Hello,
> 
> Call trace related to bond_mii_monitor  as described in this thread -
> http://patchwork.ozlabs.org/patch/41288/ was seen on 2.6.34 kernel.
> (Trace is similar to what is described in the post dated 2009-12-17
> 21:31:36.) The trace is seen when the network service is stopped. The
> issue occurs when the network service is started and stopped in quick
> succession. 
> 
> Bonding device configuration parameters are as below -
> 
> Bonding driver version:3.6.0
> Mode: balance-alb (issue is also seen with active-backup mode)
> Miimon=100
> 3 slaves with link up and one slave with link down.
> 
> Though this requires more thought and investigation, I thought this
> could be a good data point. The below change to the bonding driver
> seemed to make the issue go away -
> 
> drivers/net/bonding/bond_main.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 0075514..f280aaf 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -2408,7 +2408,7 @@ void bond_mii_monitor(struct work_struct *work)
>         }
> 
>  re_arm:
> -       if (bond->params.miimon)
> +       if (bond->params.miimon && !bond->kill_timers)
>                 queue_delayed_work(bond->wq, &bond->mii_work,
>                                    msecs_to_jiffies(bond->params.miimon));
>  out:
> 
> Any thoughts ?
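
The idea behind the one-line change, a self-rearming work item that stops
re-queueing itself once teardown has started, can be sketched in userspace
like this.  The struct and function names are hypothetical simplifications;
only the miimon and kill_timers fields mirror names from the diff, and a
counter stands in for queue_delayed_work().

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the bonding state. */
struct monitor {
	int miimon;		/* polling interval in ms, 0 = disabled */
	bool kill_timers;	/* set by the teardown path */
	int queued;		/* how many times the work was re-queued */
};

/* One run of the periodic work item; re-arms only when allowed. */
static void monitor_run(struct monitor *m)
{
	/* ... link state checks would go here ... */

	if (m->miimon && !m->kill_timers)
		m->queued++;	/* stands in for queue_delayed_work() */
}
```

Without the kill_timers check, a run that overlaps with teardown would
re-queue the work, which is the situation the change tries to avoid.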

Sorry, I missed attaching the trace here. Please find the trace below -

[  270.811391] bonding: bond0: Removing slave eth0
[  270.815934] bonding: bond0: Warning: the permanent HWaddr of eth0 - 00:22:19:5b:8b:97 - is still in use by bond0. Set the HWaddr of eth0 to a different address to avoid conflicts.
[  270.831913] bonding: bond0: releasing active interface eth0
[  270.831919] device eth0 left promiscuous mode
[  270.831953] bonding: bond0: making interface eth1 the new active one.
[  270.904113] ------------[ cut here ]------------
[  270.908076] kernel BUG at kernel/workqueue.c:354!
[  270.908076] invalid opcode: 0000 [#1] SMP 
[  270.908076] last sysfs file: /sys/devices/virtual/net/bond0/bonding/slaves
[  270.908076] CPU 0 
[  270.908076] Modules linked in: af_packet bonding ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop dm_mod joydev sg iTCO_wdt usbhid rtc_cmos rtc_core mptctl iTCO_vendor_support pcspkr hid tpm_tis ioatdma tpm dca bnx2 rtc_lib power_meter serio_raw sr_mod dcdbas tpm_bios cdrom button uhci_hcd ehci_hcd sd_mod crc_t10dif usbcore edd ext3 mbcache jbd fan processor ide_pci_generic ide_core ata_generic ata_piix libata mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[  270.908076] 
[  270.908076] Pid: 14690, comm: bond0 Not tainted 2.6.34-upstream #1 0K399H/PowerEdge R610
[  270.908076] RIP: 0010:[<ffffffff81062366>]  [<ffffffff81062366>] queue_delayed_work_on+0x106/0x110
[  270.908076] RSP: 0018:ffff880423b1ddc0  EFLAGS: 00010282
[  270.908076] RAX: 0000000000000000 RBX: ffff880423a289f0 RCX: 0000000000000019
[  270.908076] RDX: 0000000000000000 RSI: ffff880417871a00 RDI: 00000000ffffffff
[  270.908076] RBP: ffff880423b1ddf0 R08: 0000000000000018 R09: 0000000000000001
[  270.908076] R10: 0000000000000000 R11: 0000000000000003 R12: ffff880423a286c0
[  270.908076] R13: ffff880417871a00 R14: 00000000ffffffff R15: 0000000000000019
[  270.908076] FS:  0000000000000000(0000) GS:ffff880237200000(0000) knlGS:0000000000000000
[  270.908076] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  270.908076] CR2: 00007f6b162da980 CR3: 0000000001604000 CR4: 00000000000006f0
[  270.908076] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  270.908076] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  270.908076] Process bond0 (pid: 14690, threadinfo ffff880423b1c000, task ffff880423980100)
[  270.908076] Stack:
[  270.908076]  ffff880423980100 ffff8802261b1a00 ffff880423a286c0 0000000000000003
[  270.908076] <0> ffff880423a289f0 0000000000000000 ffff880423b1de00 ffffffff810623ac
[  270.908076] <0> ffff880423b1de50 ffffffffa030c331 ffffffff8160c020 ffff880423a286f0
[  270.908076] Call Trace:
[  270.908076]  [<ffffffff810623ac>] queue_delayed_work+0x1c/0x30
[  270.908076]  [<ffffffffa030c331>] bond_mii_monitor+0x371/0x600 [bonding]
[  270.908076]  [<ffffffffa030bfc0>] ? bond_mii_monitor+0x0/0x600 [bonding]
[  270.908076]  [<ffffffff81061523>] worker_thread+0x133/0x200
[  270.908076]  [<ffffffff81065af0>] ? autoremove_wake_function+0x0/0x40
[  270.908076]  [<ffffffff810613f0>] ? worker_thread+0x0/0x200
[  270.908076]  [<ffffffff81065546>] kthread+0x96/0xa0
[  270.908076]  [<ffffffff81003d04>] kernel_thread_helper+0x4/0x10
[  270.908076]  [<ffffffff810654b0>] ? kthread+0x0/0xa0
[  270.908076]  [<ffffffff81003d00>] ? kernel_thread_helper+0x0/0x10
[  270.908076] Code: ff 48 8b 75 08 4c 89 e7 e8 c8 79 ff ff e9 7c ff ff ff 44 89 f6 4c 89 e7 e8 68 7b ff ff ba 01 00 00 00 e9 28 ff ff ff 0f 0b eb fe <0f> 0b eb fe 66 0f 1f 44 00 00 55 48 89 f0 48 8b 35 65 18 76 00 
[  270.908076] RIP  [<ffffffff81062366>] queue_delayed_work_on+0x106/0x110
[  270.908076]  RSP <ffff880423b1ddc0>
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu

With regards,
Narendra K

^ permalink raw reply

* Re: [PATCH 1/2] fec: convert TX hook to netdev_tx_t
From: David Miller @ 2010-06-03 10:19 UTC (permalink / raw)
  To: dkirjanov; +Cc: netdev
In-Reply-To: <20100602191547.GA16211@hera.kernel.org>

From: Denis Kirjanov <dkirjanov@hera.kernel.org>
Date: Wed, 2 Jun 2010 19:15:47 +0000

> Convert TX hook return value to netdev_tx_t
> 
> Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>

Applied.

^ permalink raw reply

* Re: [PATCH 2/2] fec: Cleanup PHY probing
From: David Miller @ 2010-06-03 10:19 UTC (permalink / raw)
  To: dkirjanov; +Cc: netdev
In-Reply-To: <20100602191700.GA22351@hera.kernel.org>

From: Denis Kirjanov <dkirjanov@hera.kernel.org>
Date: Wed, 2 Jun 2010 19:17:00 +0000

> Cleanup PHY probing: use helpers from phylib
> 
> Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6] ipv4: add LINUX_MIB_IPRPFILTER snmp counter
From: David Miller @ 2010-06-03 10:19 UTC (permalink / raw)
  To: eric.dumazet; +Cc: cl, netdev, shemminger
In-Reply-To: <1275516327.29413.34.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 03 Jun 2010 00:05:27 +0200

> [PATCH net-next-2.6] ipv4: add LINUX_MIB_IPRPFILTER snmp counter
> 
> Christoph Lameter mentioned that packets could be dropped in input path
> because of rp_filter settings, without any SNMP counter being
> incremented. System administrator can have a hard time to track the
> problem.
> 
> This patch introduces a new counter, LINUX_MIB_IPRPFILTER, incremented
> each time we drop a packet because Reverse Path Filter triggers.
> 
> (We receive an IPv4 datagram on a given interface, and find the route to
> send an answer would use another interface)
> 
> netstat -s | grep IPReversePathFilter
>     IPReversePathFilter: 21714
> 
> Reported-by: Christoph Lameter <cl@linux-foundation.org>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply
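
For readers who want the counter without going through netstat: netstat -s
takes these values from /proc/net/netstat, which lists each counter group as
a line of names followed by a line of values. The following is only a
userspace sketch of that lookup, not part of the patch. The helper name
find_counter() and the fixed buffer size are assumptions made for
illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up `name` in one header/value line pair in /proc/net/netstat
 * format, e.g. header "TcpExt: A B" and values "TcpExt: 1 2".
 * Returns the counter value, or -1 if the name is not in this pair.
 * Hypothetical helper: lines longer than the local buffers are truncated.
 */
static long long find_counter(const char *header, const char *values,
                              const char *name)
{
    char hbuf[4096], vbuf[4096];
    char *hsave = NULL, *vsave = NULL;
    char *h, *v;

    /* strtok_r() modifies its input, so work on private copies. */
    snprintf(hbuf, sizeof(hbuf), "%s", header);
    snprintf(vbuf, sizeof(vbuf), "%s", values);

    /* Walk both lines in lockstep: the Nth name owns the Nth value. */
    h = strtok_r(hbuf, " \n", &hsave);
    v = strtok_r(vbuf, " \n", &vsave);
    while (h && v) {
        if (strcmp(h, name) == 0)
            return strtoll(v, NULL, 10);
        h = strtok_r(NULL, " \n", &hsave);
        v = strtok_r(NULL, " \n", &vsave);
    }
    return -1;
}
```

A caller would read /proc/net/netstat two lines at a time and pass each pair
to find_counter() with the name "IPReversePathFilter" until it returns
something other than -1.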

* Re: [PATCH v2] netfilter: Xtables: idletimer target implementation
From: Luciano Coelho @ 2010-06-03 10:13 UTC (permalink / raw)
  To: ext Jan Engelhardt
  Cc: netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
	kaber@trash.net, Timo Teras
In-Reply-To: <alpine.LSU.2.01.1006030956000.9308@obet.zrqbmnf.qr>

On Thu, 2010-06-03 at 09:58 +0200, ext Jan Engelhardt wrote:
> On Thursday 2010-06-03 09:04, Luciano Coelho wrote:
> >
> >Looking closer, it seems that it makes a bit of sense to add a kernel
> >module to /sys/device/system.  I think it makes more sense than adding
> >to the module class or to the net class, actually.  The idletimer is not
> >a net device (so it doesn't fit in /sys/class/net) and it is not a
> >module, even though it may be handled by the xt_IDLETIMER module.
> >
> >So we can look at the xt_idletimer as a system device, which is not a
> >peripheral device in itself, but a software timer device (there are
> >already similar components).
> >
> >I'll add the kernel object we need as a system class device, so it will
> >go under /sys/devices/system/xt_idletimer.  Does that make sense to you?
> 
> Mh.. somehow I'd pick /sys/devices/virtual/xt_idletimer.
> Or even create a /sys/net/xt_idletimer. (/sys has conceptual
> subsystems directly beneath it: devices, fs, kernel, ...)

Yes, I think I'll use the /sys/devices/virtual/misc class.  That seems to
be the place where, well, miscellaneous devices go. :) I think it fits
pretty nicely in that concept.

We could also have a /sys/net subsystem, but that's very high in the
sysfs hierarchy and adding it in the xt_IDLETIMER module wouldn't make
any sense.  This is something that should be added (if really needed) in
the net core subsystem, I guess.

I'll use the first option and resubmit the patch as v3.


-- 
Cheers,
Luca.


^ permalink raw reply

* Re: [PATCH] ipconfig: document DHCP hostname and DNS record
From: David Miller @ 2010-06-03 10:20 UTC (permalink / raw)
  To: fengguang.wu; +Cc: andi, netdev, linux-kernel
In-Reply-To: <20100603020244.GB4461@localhost>

From: Wu Fengguang <fengguang.wu@intel.com>
Date: Thu, 3 Jun 2010 10:02:44 +0800

> ipconfig: document DHCP hostname and DNS record
> 
> Now it's possible to update the DNS record for $HOST_NAME with
> 
> 	ip=::::$HOST_NAME::dhcp
> 
> CC: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6] drivers/net: use __packed annotation
From: David Miller @ 2010-06-03 10:20 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1275538209.29413.58.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 03 Jun 2010 06:10:09 +0200

> cleanup patch.
> 
> Use new __packed annotation in drivers/net/
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply
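
As background on what the annotation does: in the kernel, __packed is
shorthand for GCC's __attribute__((packed)), which removes the padding the
compiler would otherwise insert between struct members. The sketch below is
a userspace illustration only. The struct names are made up, and the
#define stands in for the kernel header that normally provides the macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's __packed macro (assumption: GCC/Clang). */
#ifndef __packed
#define __packed __attribute__((packed))
#endif

/* Natural layout: the compiler usually pads after `type` to align `len`. */
struct hdr_padded {
    uint8_t  type;
    uint32_t len;
};

/* Packed layout: members are placed back to back, as on the wire. */
struct hdr_packed {
    uint8_t  type;
    uint32_t len;
} __packed;

/* 1 byte for `type` + 4 bytes for `len`, with no padding in between. */
static_assert(sizeof(struct hdr_packed) == 5, "packed header is 5 bytes");
static_assert(offsetof(struct hdr_packed, len) == 1, "len follows type");
```

On common ABIs sizeof(struct hdr_padded) is 8, while the packed variant is
always 5. The patch itself only changes how the attribute is spelled, so the
layouts of the affected structures should stay the same.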

* Re: [PATCH net-next-2.6] ipv4: RCU conversion of ip_route_input_slow/ip_route_input_mc
From: David Miller @ 2010-06-03 10:20 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1275542491.29413.71.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 03 Jun 2010 07:21:31 +0200

> [PATCH] ipv4: rcu conversion of ip_route_input_slow/ip_route_input_mc
> 
> Avoid two atomic ops on struct in_device refcount per incoming packet,
> if slow path taken, (or route cache disabled)
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: use __packed annotation
From: David Miller @ 2010-06-03 10:22 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1275517209.29413.41.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 03 Jun 2010 00:20:09 +0200

> cleanup patch.
> 
> Use new __packed annotation in net/ and include/
> (except netfilter)
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] chelsio: Remove remnants of CONFIG_CHELSIO_T1_COUGAR
From: David Miller @ 2010-06-03 10:22 UTC (permalink / raw)
  To: rdreier; +Cc: netdev, shemminger
In-Reply-To: <ada39x5tioj.fsf@roland-alpha.cisco.com>

From: Roland Dreier <rdreier@cisco.com>
Date: Wed, 02 Jun 2010 11:04:28 -0700

> CONFIG_CHELSIO_T1_COUGAR cannot be set (it appears nowhere in any
> Kconfig files), and the code it protects could never build (cspi.h was
> never added to the kernel tree).  Therefore it's pretty safe to remove
> all vestiges of this dead code.
> 
> Signed-off-by: Roland Dreier <rolandd@cisco.com>

Applied.

^ permalink raw reply
