Netdev List
 help / color / mirror / Atom feed
* Re: [PATCHv2 10/14] virtio_net: limit xmit polling
From: Michael S. Tsirkin @ 2011-05-25  5:50 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Krishna Kumar, Carsten Otte, lguest-uLR06cmDAlY/bJ5BZ2RsiQ,
	Shirley Ma, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, Martin Schwidefsky, linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <8762ozk1qd.fsf-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>

On Wed, May 25, 2011 at 10:58:26AM +0930, Rusty Russell wrote:
> On Mon, 23 May 2011 14:19:00 +0300, "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > On Mon, May 23, 2011 at 11:37:15AM +0930, Rusty Russell wrote:
> > > Can we hit problems with OOM?  Sure, but no worse than now...
> > > The problem is that this "virtqueue_get_capacity()" returns the worst
> > > case, not the normal case.  So using it is deceptive.
> > > 
> > 
> > Maybe just document this?
> 
> Yes, but also by renaming virtqueue_get_capacity().  Takes it from a 3
> to a 6 on the API hard-to-misuse scale.
> 
> How about, virtqueue_min_capacity()?  Makes the reader realize something
> weird is going on.

Absolutely. Great idea.

> > I still believe capacity really needs to be decided
> > at the virtqueue level, not in the driver.
> > E.g. with indirect each skb uses a single entry: freeing
> > 1 small skb is always enough to have space for a large one.
> > 
> > I do understand how it seems a waste to leave direct space
> > in the ring while we might in practice have space
> > due to indirect. Didn't come up with a nice way to
> > solve this yet - but 'no worse than now :)'
> 
> Agreed.
> 
> > > > I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make
> > > > sure we have enough space in the buffer. Another way to do
> > > > that is with a define :).
> > > 
> > > To do this properly, we should really be using the actual number of sg
> > > elements needed, but we'd have to do most of xmit_skb beforehand so we
> > > know how many.
> > > 
> > > Cheers,
> > > Rusty.
> > 
> > Maybe I'm confused here.  The problem isn't the failing
> > add_buf for the given skb IIUC.  What we are trying to do here is stop
> > the queue *before xmit_skb fails*. We can't look at the
> > number of fragments in the current skb - the next one can be
> > much larger.  That's why we check capacity after xmit_skb,
> > not before it, right?
> 
> No, I was confused...  More coffee!
> 
> Thanks,
> Rusty.

^ permalink raw reply

* Re: Kernel crash after using new Intel NIC (igb)
From: Arun Sharma @ 2011-05-25  6:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Arun Sharma, Maximilian Engelhardt, linux-kernel, netdev,
	StuStaNet Vorstand
In-Reply-To: <1306291469.3305.11.camel@edumazet-laptop>

On Wed, May 25, 2011 at 04:44:29AM +0200, Eric Dumazet wrote:
> 
> Hmm, thanks for the report. Are you running x86 or another arch ?
> 

This was on x86.

> We probably need some sort of memory barrier.
> 
> However, locking this central lock makes the thing too slow, I'll try to
> use an atomic_inc_return on p->refcnt instead. (and then lock
> unused_peers.lock if we got a 0->1 transition)

Another possibility is to do the list_empty() check twice. Once without
taking the lock and again with the spinlock held.

 -Arun

^ permalink raw reply

* Re: [PATCHv2 10/14] virtio_net: limit xmit polling
From: Michael S. Tsirkin @ 2011-05-25  6:07 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Krishna Kumar, Carsten Otte, lguest-uLR06cmDAlY/bJ5BZ2RsiQ,
	Shirley Ma, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, Martin Schwidefsky, linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <8739k3k1fb.fsf-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>

On Wed, May 25, 2011 at 11:05:04AM +0930, Rusty Russell wrote:
> On Mon, 23 May 2011 14:19:00 +0300, "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > I do understand how it seems a waste to leave direct space
> > in the ring while we might in practice have space
> > due to indirect. Didn't come up with a nice way to
> > solve this yet - but 'no worse than now :)'
> 
> Let's just make it "bool free_old_xmit_skbs(unsigned int max)".  max ==
> 2 for the normal xmit path, so we're low latency but we keep ahead on
> average.  max == -1 for the "we're out of capacity, we may have to stop
> the queue".
> 
> That keeps it simple and probably the right thing...
> 
> Thanks,
> Rusty.

Hmm I'm not sure I got it, need to think about this.
I'd like to go back and document how my design was supposed to work.
This really should have been in commit log or even a comment.
I thought we need a min, not a max.
We start with this:

	while ((c = (virtqueue_get_capacity(vq) < 2 + MAX_SKB_FRAGS) &&
		(skb = get_buf)))
		kfree_skb(skb);
	return !c;

This is clean and simple, right? And it's exactly asking for what we need.

But this way we always keep a lot of memory in skbs even when rate of
communication is low.

So we add the min parameter:

	int n = 0;

	while ((((c = (virtqueue_get_capacity(vq) < 2 + MAX_SKB_FRAGS)) ||
		 n++ < min) && (skb = get_buf)))
		kfree_skb(skb);
	return !c;


on the normal path min == 2 so we're low latency but we keep ahead on
average. min == 0 for the "we're out of capacity, we may have to stop
the queue".

Does the above make sense at all?

-- 
MST

^ permalink raw reply

* Re: Kernel crash after using new Intel NIC (igb)
From: Eric Dumazet @ 2011-05-25  6:35 UTC (permalink / raw)
  To: Arun Sharma
  Cc: Maximilian Engelhardt, linux-kernel, netdev, StuStaNet Vorstand
In-Reply-To: <20110525060609.GA32244@dev1756.snc6.facebook.com>

Le mardi 24 mai 2011 à 23:06 -0700, Arun Sharma a écrit :
> On Wed, May 25, 2011 at 04:44:29AM +0200, Eric Dumazet wrote:
> > 
> > Hmm, thanks for the report. Are you running x86 or another arch ?
> > 
> 
> This was on x86.
> 
> > We probably need some sort of memory barrier.
> > 
> > However, locking this central lock makes the thing too slow, I'll try to
> > use an atomic_inc_return on p->refcnt instead. (and then lock
> > unused_peers.lock if we got a 0->1 transition)
> 
> Another possibility is to do the list_empty() check twice. Once without
> taking the lock and again with the spinlock held.
> 

Why ?

list_del_init(&p->unused); (done under lock of course) is safe, you can
call it twice, no problem.

No, the real problem is the  (!list_empty(&p->unused) test : It seems to
not always tell the truth if not done under lock.

^ permalink raw reply

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
From: Nicolas de Pesloüan @ 2011-05-25  7:33 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Neil Horman, Andy Gospodarek, netdev, David S. Miller
In-Reply-To: <20007.1306283668@death>

Le 25/05/2011 02:34, Jay Vosburgh a écrit :

> 	One alternative is to simply permit all option changes while the
> bond is down, then commit then as a set when the bond is brought up (or
> fail the open or adjust options if the options are invalid).  More or
> less what you suggest, except that the "commit" is the ifup.
>
> 	Even that is a substantial rework of how things are done now,
> since as you point out, options today take effect more or less in real
> time.

Agreed.

Setting an option while the master is down should only syntax-check the option value and store the 
value for later use (at the time we bring the bond up). Setting an option while the master is up 
should take effect in realtime if possible/appropriate or store it and issue a warning saying one 
need to down+up the bond to commit the change.

>> This also begs the question, is it or is it not safe to enslave devices while
>> the bond is down?  Clearly from the bug report its unsafe, and I don't know what
>> other (if any) conditions exist that cause problems when doing this (be that a
>> deadlock, panic or simply undefined or unexpected behavior).  If its really
>> unsafe, then issuing a warning seems incorrect, we shouldn't allow user space to
>> cause things like this, and as such, we should return an error.  If it is safe
>> (generally) and this is an isolated bug, then we should probably remove the
>> warning.  But to just issue a vague 'This might do bad things' warning seems
>> wrong in either case.
>
> 	Agreed.  I think it (conceptually) should be safe to add or
> remove slaves when the bond is down, and bonding shouldn't complain
> about it.

Are there any reasons not to consider enslavement as a normal option with the same processing as 
other options (only do real enslavement at the time we bring the master up)?

	Nicolas.

^ permalink raw reply

* [PATCH] sctp: fix memory leak of the ASCONF queue when free asoc
From: Wei Yongjun @ 2011-05-25  7:48 UTC (permalink / raw)
  To: David Miller, Vlad Yasevich; +Cc: netdev@vger.kernel.org, lksctp

If an ASCONF chunk is outstanding, then the following ASCONF
chunk will be queued for later transmission. But when we free
the asoc, we forget to free the ASCONF queue at the same time,
this will cause memory leak.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
---
 net/sctp/associola.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 1a21c57..525f97c 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -64,6 +64,7 @@
 /* Forward declarations for internal functions. */
 static void sctp_assoc_bh_rcv(struct work_struct *work);
 static void sctp_assoc_free_asconf_acks(struct sctp_association *asoc);
+static void sctp_assoc_free_asconf_queue(struct sctp_association *asoc);
 
 /* Keep track of the new idr low so that we don't re-use association id
  * numbers too fast.  It is protected by they idr spin lock is in the
@@ -446,6 +447,9 @@ void sctp_association_free(struct sctp_association *asoc)
 	/* Free any cached ASCONF_ACK chunk. */
 	sctp_assoc_free_asconf_acks(asoc);
 
+	/* Free the ASCONF queue. */
+	sctp_assoc_free_asconf_queue(asoc);
+
 	/* Free any cached ASCONF chunk. */
 	if (asoc->addip_last_asconf)
 		sctp_chunk_free(asoc->addip_last_asconf);
@@ -1578,6 +1582,18 @@ retry:
 	return error;
 }
 
+/* Free the ASCONF queue */
+static void sctp_assoc_free_asconf_queue(struct sctp_association *asoc)
+{
+	struct sctp_chunk *asconf;
+	struct sctp_chunk *tmp;
+
+	list_for_each_entry_safe(asconf, tmp, &asoc->addip_chunk_list, list) {
+		list_del_init(&asconf->list);
+		sctp_chunk_free(asconf);
+	}
+}
+
 /* Free asconf_ack cache */
 static void sctp_assoc_free_asconf_acks(struct sctp_association *asoc)
 {
-- 
1.6.5.2



^ permalink raw reply related

* [PATCH] xen: netfront: hold RTNL when updating features.
From: Ian Campbell @ 2011-05-25  7:56 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Ian Campbell

Konrad reports:
[    0.930811] RTNL: assertion failed at /home/konrad/ssd/linux/net/core/dev.c (5258)
[    0.930821] Pid: 22, comm: xenwatch Not tainted 2.6.39-05193-gd762f43 #1
[    0.930825] Call Trace:
[    0.930834]  [<ffffffff8143bd0e>] __netdev_update_features+0xae/0xe0
[    0.930840]  [<ffffffff8143dd41>] netdev_update_features+0x11/0x30
[    0.930847]  [<ffffffffa0037105>] netback_changed+0x4e5/0x800 [xen_netfront]
[    0.930854]  [<ffffffff8132a838>] xenbus_otherend_changed+0xa8/0xb0
[    0.930860]  [<ffffffff8157ca99>] ? _raw_spin_unlock_irqrestore+0x19/0x20
[    0.930866]  [<ffffffff8132adfe>] backend_changed+0xe/0x10
[    0.930871]  [<ffffffff8132875a>] xenwatch_thread+0xba/0x180
[    0.930876]  [<ffffffff810a8ba0>] ? wake_up_bit+0x40/0x40
[    0.930881]  [<ffffffff813286a0>] ? split+0xf0/0xf0
[    0.930886]  [<ffffffff810a8646>] kthread+0x96/0xa0
[    0.930891]  [<ffffffff815855a4>] kernel_thread_helper+0x4/0x10
[    0.930896]  [<ffffffff815846b3>] ? int_ret_from_sys_call+0x7/0x1b
[    0.930901]  [<ffffffff8157cf61>] ? retint_restore_args+0x5/0x6
[    0.930906]  [<ffffffff815855a0>] ? gs_change+0x13/0x13

This update happens in xenbus watch callback context and hence does not already
hold the rtnl. Take the lock as necessary.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/net/xen-netfront.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index db9a763..d29365a 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1581,7 +1581,9 @@ static int xennet_connect(struct net_device *dev)
 	if (err)
 		return err;
 
+	rtnl_lock();
 	netdev_update_features(dev);
+	rtnl_unlock();
 
 	spin_lock_bh(&np->rx_lock);
 	spin_lock_irq(&np->tx_lock);
-- 
1.7.2.5

^ permalink raw reply related

* Re: [PATCHv2] net: Define enum for the bits used in features.
From: Hagen Paul Pfeifer @ 2011-05-25  7:58 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: David Miller, netdev, Tom Herbert, Michał Mirosław,
	Stephen Hemminger
In-Reply-To: <1306288567-1773-1-git-send-email-maheshb@google.com>


On Tue, 24 May 2011 18:56:07 -0700, Mahesh Bandewar wrote:

> Little bit cleanup by defining enum for all bits used. Also use those

enum

> values to redefine flags.



Where is the advantage? Now I need to add 2 lines at two different places.



#define NETIF_F_IP_CSUM		BIT2FLAG(IP_CSUM_BIT)



Confuses me more compared to a simple 



#define NETIF_F_IP_CSUM		(1 << 2)





The enum netdev_features namespace is not really unique (e.g.

RESERVED16_BIT).





For me the modifications are harder to understand, my tools are confused

and I see no real advantage, only more indirection introduced.





Hagen

^ permalink raw reply

* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: Ingo Molnar @ 2011-05-25  8:25 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Eric W. Biederman, James Bottomley, Linus Torvalds, linux-kernel,
	Linux Containers, netdev, Geert Uytterhoeven
In-Reply-To: <4508.1306283693@localhost>


* Valdis.Kletnieks@vt.edu <Valdis.Kletnieks@vt.edu> wrote:

> On Tue, 24 May 2011 09:16:28 +0200, Ingo Molnar said:
> > * Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > My gut feel says we should really implement an
> > > include/asm-generic/unistd-common.h to include all new system calls.
> > >
> > > That way there would be only one file to touch instead of 50. Certainly it
> > > works for include/asm-generic/unistd.h for the architectures that use it. 
> > > And all we really need is just a little abstraction on that concept.
> >
> > I suppose that could be tried, although in practice it would probably be
> > somewhat complex due to the various compat syscall handling differences.
> 
> Can somebody fill us newcomers in on the arch-aeology of why some syscalls have
> different numbers on different archs? I know it's partially because some simply
> didn't implement some syscalls so there were numbering mismatches, but would it
> have been *that* hard to wire all of those skipped syscalls up to one stub
> 'return -ENOSYS'?

It was done so for hysterical raisons mostly, and once a bad ABI is done it's 
very hard to undo it: beyond pushing the 'good ABI' you'd also still have to 
deal with the bad ABI for a decade or more.

So the background is that most architectures start out as quick concept 
prototypes, doing:

	cp -a arch/existingarch arch/newarch

where 'existingarch' used to be arch/i386/ in the early days. Now i386 had a 
fair amount of x86 specific syscalls that were naturally removed from 
'newarch'. Those created 'holes' in the numbers, which were then filled in with 
new syscalls - a nice idea in itself!

Also sometimes 'newarch' did a 'clean', compressed list of syscall numbers 
straight away, reordering syscalls. Once the 'quick prototype' hack starts 
working on real hardware, once the syscall numbers get into the C library and 
binutils it's very hard to ever transition away: you'd break the world!

An added source of noise that architectures tend to add new syscalls in a 
different order: some are more interesting to them - some less.

So these syscall table hacks done very early during an arch's lifetime stick 
around and create wild numbering noise in 20+ syscall tables:

                                       [ slightly edited for readability ]

 arch/alpha/include/asm/unistd.h:      #define __NR_perf_event_open 493
 arch/arm/include/asm/unistd.h:        #define __NR_perf_event_open 364
 arch/blackfin/include/asm/unistd.h:   #define __NR_perf_event_open 369
 arch/frv/include/asm/unistd.h:        #define __NR_perf_event_open 336
 arch/m68k/include/asm/unistd.h:       #define __NR_perf_event_open 332
 arch/microblaze/include/asm/unistd.h: #define __NR_perf_event_open 366
 arch/mips/include/asm/unistd.h:       #define __NR_perf_event_open 333
 arch/mips/include/asm/unistd.h:       #define __NR_perf_event_open 292
 arch/mips/include/asm/unistd.h:       #define __NR_perf_event_open 296
 arch/mn10300/include/asm/unistd.h:    #define __NR_perf_event_open 337
 arch/parisc/include/asm/unistd.h:     #define __NR_perf_event_open 318
 arch/powerpc/include/asm/unistd.h:    #define __NR_perf_event_open 319
 arch/s390/include/asm/unistd.h:       #define __NR_perf_event_open 331
 arch/sh/include/asm/unistd_32.h:      #define __NR_perf_event_open 336
 arch/sh/include/asm/unistd_64.h:      #define __NR_perf_event_open 364
 arch/sparc/include/asm/unistd.h:      #define __NR_perf_event_open 327
 arch/x86/include/asm/unistd_32.h:     #define __NR_perf_event_open 336
 arch/x86/include/asm/unistd_64.h:     #define __NR_perf_event_open 298

To fix this we'd create a new, clean offset defined by each architecture, and a 
generic enumeration of new syscalls.

This would make it much easier to add new, generic syscalls to all 
architectures indeed.

It would still leave compat syscall wrappers unaddressed though: those are 
often numbered differently and sometimes need arch specific wrapper entry 
functions, which then call the real generic syscall.

But at least the primary, 'native' syscall table of every arch could be kept 
rather fresh via generic enumeration.

Thanks,

	Ingo

^ permalink raw reply

* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: Geert Uytterhoeven @ 2011-05-25  8:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Valdis.Kletnieks, Eric W. Biederman, James Bottomley,
	Linus Torvalds, linux-kernel, Linux Containers, netdev
In-Reply-To: <20110525082514.GE21552@elte.hu>

On Wed, May 25, 2011 at 10:25, Ingo Molnar <mingo@elte.hu> wrote:
> * Valdis.Kletnieks@vt.edu <Valdis.Kletnieks@vt.edu> wrote:
>
>> On Tue, 24 May 2011 09:16:28 +0200, Ingo Molnar said:
>> > * Eric W. Biederman <ebiederm@xmission.com> wrote:
>> > > My gut feel says we should really implement an
>> > > include/asm-generic/unistd-common.h to include all new system calls.
>> > >
>> > > That way there would be only one file to touch instead of 50. Certainly it
>> > > works for include/asm-generic/unistd.h for the architectures that use it.
>> > > And all we really need is just a little abstraction on that concept.
>> >
>> > I suppose that could be tried, although in practice it would probably be
>> > somewhat complex due to the various compat syscall handling differences.
>>
>> Can somebody fill us newcomers in on the arch-aeology of why some syscalls have
>> different numbers on different archs? I know it's partially because some simply
>> didn't implement some syscalls so there were numbering mismatches, but would it
>> have been *that* hard to wire all of those skipped syscalls up to one stub
>> 'return -ENOSYS'?
>
> It was done so for hysterical raisons mostly, and once a bad ABI is done it's
> very hard to undo it: beyond pushing the 'good ABI' you'd also still have to
> deal with the bad ABI for a decade or more.
>
> So the background is that most architectures start out as quick concept
> prototypes, doing:
>
>        cp -a arch/existingarch arch/newarch
>
> where 'existingarch' used to be arch/i386/ in the early days. Now i386 had a
> fair amount of x86 specific syscalls that were naturally removed from
> 'newarch'. Those created 'holes' in the numbers, which were then filled in with
> new syscalls - a nice idea in itself!
>
> Also sometimes 'newarch' did a 'clean', compressed list of syscall numbers
> straight away, reordering syscalls. Once the 'quick prototype' hack starts
> working on real hardware, once the syscall numbers get into the C library and
> binutils it's very hard to ever transition away: you'd break the world!
>
> An added source of noise that architectures tend to add new syscalls in a
> different order: some are more interesting to them - some less.
>
> So these syscall table hacks done very early during an arch's lifetime stick
> around and create wild numbering noise in 20+ syscall tables:
>
>                                       [ slightly edited for readability ]
>
>  arch/alpha/include/asm/unistd.h:      #define __NR_perf_event_open 493
>  arch/arm/include/asm/unistd.h:        #define __NR_perf_event_open 364
>  arch/blackfin/include/asm/unistd.h:   #define __NR_perf_event_open 369
>  arch/frv/include/asm/unistd.h:        #define __NR_perf_event_open 336
>  arch/m68k/include/asm/unistd.h:       #define __NR_perf_event_open 332
>  arch/microblaze/include/asm/unistd.h: #define __NR_perf_event_open 366
>  arch/mips/include/asm/unistd.h:       #define __NR_perf_event_open 333
>  arch/mips/include/asm/unistd.h:       #define __NR_perf_event_open 292
>  arch/mips/include/asm/unistd.h:       #define __NR_perf_event_open 296
>  arch/mn10300/include/asm/unistd.h:    #define __NR_perf_event_open 337
>  arch/parisc/include/asm/unistd.h:     #define __NR_perf_event_open 318
>  arch/powerpc/include/asm/unistd.h:    #define __NR_perf_event_open 319
>  arch/s390/include/asm/unistd.h:       #define __NR_perf_event_open 331
>  arch/sh/include/asm/unistd_32.h:      #define __NR_perf_event_open 336
>  arch/sh/include/asm/unistd_64.h:      #define __NR_perf_event_open 364
>  arch/sparc/include/asm/unistd.h:      #define __NR_perf_event_open 327
>  arch/x86/include/asm/unistd_32.h:     #define __NR_perf_event_open 336
>  arch/x86/include/asm/unistd_64.h:     #define __NR_perf_event_open 298
>
> To fix this we'd create a new, clean offset defined by each architecture, and a
> generic enumeration of new syscalls.
>
> This would make it much easier to add new, generic syscalls to all
> architectures indeed.
>
> It would still leave compat syscall wrappers unaddressed though: those are
> often numbered differently and sometimes need arch specific wrapper entry
> functions, which then call the real generic syscall.
>
> But at least the primary, 'native' syscall table of every arch could be kept
> rather fresh via generic enumeration.

So we can start all over at offset 501 (alpha just started using 500)
with a unified,
clean, and compressed list of syscalls? Or do we have some more other-os-compat
syscalls around in this range?

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: linux-next: build failure after merge of the final tree
From: Felipe Balbi @ 2011-05-25  9:24 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: Linus Torvalds, Stephen Rothwell, linux-next, linux-kernel,
	David S. Miller, netdev, Andrew Morton, Mel Gorman, linux-mm,
	Alexander Viro, linux-fsdevel, Paul E. McKenney, Dipankar Sarma,
	Balbi, Felipe
In-Reply-To: <BANLkTi=U8ikZo65AoxGznCopGMTFOUXWhQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1359 bytes --]

On Tue, May 24, 2011 at 01:10:42PM -0400, Mike Frysinger wrote:
> On Tue, May 24, 2011 at 00:10, Mike Frysinger wrote:
> > On Tue, May 24, 2011 at 00:01, Linus Torvalds wrote:
> >> On Mon, May 23, 2011 at 7:06 PM, Mike Frysinger wrote:
> >>>
> >>> more failures:
> >>
> >> Is this blackfin or something?
> >
> > let's go with "something" ...
> >
> >> I did an allyesconfig with a special x86 patch that should have caught
> >> everything that didn't have the proper prefetch.h include, but non-x86
> >> drivers would have passed that.
> >
> > the isp1362-hcd failure probably is before your
> > 268bb0ce3e87872cb9290c322b0d35bce230d88f.  i think i was reading a log
> > that is a few days old (ive been traveling and am playing catch up
> > atm).  i'll refresh and see what's what still.
> >
> > the common musb code only allows it to be built if the arch glue is
> > available, and there is no x86 glue.  so an allyesconfig on x86
> > wouldnt have picked up the failure.  it'll bomb though for any target
> > which does have the glue.

anyone with a PCI OPT card to help adding a PCI glue layer for MUSB ?

> latest tree seems to only fail for me now on the musb driver.  i can
> send out a patch later today if no one else has gotten to it yet.

please do send out, but what was the compile breakage with musb ?

-- 
balbi

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: [PATCHv2] net: Define enum for the bits used in features.
From: Michał Mirosław @ 2011-05-25  9:43 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: David Miller, netdev, Tom Herbert, Stephen Hemminger
In-Reply-To: <1306288567-1773-1-git-send-email-maheshb@google.com>

2011/5/25 Mahesh Bandewar <maheshb@google.com>:
> Little bit cleanup by defining enum for all bits used. Also use those enum
> values to redefine flags.
>
> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> ---
> Changes since v1:
>  Split the patch into two pieces.
>
>  include/linux/netdevice.h |  100 +++++++++++++++++++++++++++++++-------------
>  1 files changed, 70 insertions(+), 30 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index ca333e7..b4520b2 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -51,6 +51,7 @@
>  #ifdef CONFIG_DCB
>  #include <net/dcbnl.h>
>  #endif
> +#include <linux/netdev_features.h>

This should go to other part.

>  struct vlan_group;
>  struct netpoll_info;
> @@ -981,6 +982,49 @@ struct net_device_ops {
>  };
>
>  /*
> + * Net device feature bits; if you change something,
> + * also update netdev_features_strings[] in ethtool.c
> + */
> +enum netdev_features {
[...]

> +       RESERVED16_BIT,         /* the GSO_MASK reserved bit 16 */
> +       RESERVED17_BIT,         /* the GSO_MASK reserved bit 17 */
> +       RESERVED18_BIT,         /* the GSO_MASK reserved bit 18 */
> +       RESERVED19_BIT,         /* the GSO_MASK reserved bit 19 */
> +       RESERVED20_BIT,         /* the GSO_MASK reserved bit 20 */
> +       RESERVED21_BIT,         /* the GSO_MASK reserved bit 21 */
> +       RESERVED22_BIT,         /* the GSO_MASK reserved bit 22 */
> +       RESERVED23_BIT,         /* the GSO_MASK reserved bit 23 */

This could also define NETIF_F_GSO_SHIFT and name TSO and others.
Maybe like this:


NETIF_F_GSO_SHIFT, /* must == 16, for now */
GSO_RESERVED0_BIT = NETIF_F_GSO_SHIFT,
GSO_RESERVED1_BIT,
...
GSO_RESERVED7_BIT,
... (other bits)
> +       /* Add you bit above this */

your.

> +       ND_FEATURE_NUM_BITS,     /* (LAST VALUE) Total bits in use */

And here GSO aliases:

TSO_BIT = NETIF_F_GSO_SHIFT + SKB_GSO_TCPV4,
UFO_BIT = NETIF_F_GSO_SHIFT + SKB_GSO_UDP,
...

Afther this is done, I can convert the feature-names table in
ethtool.c to use C99 array assignments, so that no one else will trip
on a missing comma or wrong entries order there.

About the bit names: NETIF_F_xxx_BIT?

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCHv2] net: Define enum for the bits used in features.
From: Michał Mirosław @ 2011-05-25  9:48 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: David Miller, netdev, Tom Herbert, Stephen Hemminger
In-Reply-To: <BANLkTik4KYrR4vYm1R_hYNbwFjQ79FEoHQ@mail.gmail.com>

W dniu 25 maja 2011 11:43 użytkownik Michał Mirosław <mirqus@gmail.com> napisał:
[...]
> And here GSO aliases:
>
> TSO_BIT = NETIF_F_GSO_SHIFT + SKB_GSO_TCPV4,
> UFO_BIT = NETIF_F_GSO_SHIFT + SKB_GSO_UDP,

Ah, this is wrong as SKB_GSO_* are bitmasks. Can be fixed though.

Best Regards,
Michał Mirosław

^ permalink raw reply

* [PATCH] bonding:update xmit_hash_policy description
From: Weiping Pan @ 2011-05-25 10:05 UTC (permalink / raw)
  To: jpirko
  Cc: Weiping Pan, Jay Vosburgh, Andy Gospodarek,
	open list:BONDING DRIVER, open list

xmit_hash_policy supports layer2+3 now,
and both balance-xor and 802.3ad use xmit_hash_policy.

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
---
 drivers/net/bonding/bond_main.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 088fd84..d1f5606 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -147,8 +147,7 @@ MODULE_PARM_DESC(lacp_rate, "LACPDU tx rate to request from 802.3ad partner "
 module_param(ad_select, charp, 0);
 MODULE_PARM_DESC(ad_select, "803.ad aggregation selection logic: stable (0, default), bandwidth (1), count (2)");
 module_param(xmit_hash_policy, charp, 0);
-MODULE_PARM_DESC(xmit_hash_policy, "XOR hashing method: 0 for layer 2 (default)"
-				   ", 1 for layer 3+4");
+MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method: 0 for layer 2 (default), 1 for layer 3+4, 2 for layer 2+3");
 module_param(arp_interval, int, 0);
 MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
 module_param_array(arp_ip_target, charp, NULL, 0);
-- 
1.7.4.4

^ permalink raw reply related

* Re: [PATCH] timers: consider slack value in mod_timer()
From: Thomas Gleixner @ 2011-05-25 10:17 UTC (permalink / raw)
  To: Yong Zhang
  Cc: Sebastian Andrzej Siewior, LKML, Arjan van de Ven,
	Trond Myklebust, David Miller, netdev
In-Reply-To: <BANLkTikNSa9otE_cFKRcThUZ5ZCtE7Kyxw@mail.gmail.com>

On Wed, 25 May 2011, Yong Zhang wrote:

> On Tue, May 24, 2011 at 8:13 PM, Sebastian Andrzej Siewior
> <sebastian@breakpoint.cc> wrote:
> > * Yong Zhang | 2011-05-24 15:54:17 [+0800]:
> >
> >>> diff --git a/kernel/timer.c b/kernel/timer.c
> >>> index fd61986..bf09726 100644
> >>> --- a/kernel/timer.c
> >>> +++ b/kernel/timer.c
> >>> @@ -804,6 +804,8 @@ int mod_timer(struct timer_list *timer, unsigned long expires)
> >>> ?? ?? ?? ?? ?? ?? ?? ??return 1;
> >>>
> >>> ?? ?? ?? ??expires = apply_slack(timer, expires);
> >>
> >>So, why not move above line up, then we can use the recalculated
> >>expires?
> >
> > We leave often before apply_slack() kicks in. From printks() it looks
> > like we leave more often in first "return 1" than in the second. Moving
> > that line up would lead to more __mode_timer() calls.
> 
> Hmmm, so the reason is for a timer whose timer->slack is not set
> explicitly. when we recalculate expires, we will get different value
> sometimes.

No, that's not the problem.
 
> Could you please try the attached patch(webmail will mangle it)

Grrr. gmail allows usage of real mail clients, doesn't it ?

> diff --git a/kernel/timer.c b/kernel/timer.c
> index fd61986..73af53c 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -749,6 +749,10 @@ unsigned long apply_slack(struct timer_list *timer, unsigned long expires)
>  	unsigned long expires_limit, mask;
>  	int bit;
>  
> +	/* no need to account slack again for a same-expire pending timer */
> +	if (timer_pending(timer) && time_after_eq(timer->expires, expires))
> +		return timer->expires;

That's total crap. Assume some code sets the timer with 5 seconds for
some purpose and after a second it wants it to fire in 50ms from now
because some state change happened. The above will keep the original 5
seconds timeout no matter what, so the requested 50ms timeout will
fire about 4 seconds late.

>  	expires_limit = expires;
>  
>  	if (timer->slack >= 0) {
> @@ -795,6 +799,8 @@ unsigned long apply_slack(struct timer_list *timer, unsigned long expires)
>   */
>  int mod_timer(struct timer_list *timer, unsigned long expires)
>  {
> +	expires = apply_slack(timer, expires);
> +

We need to analyse the problem thoroughly and not slap random changes
into the code without knowing about the consequences. And the problem
is mostly in the call sites because they are not aware of the slack
effect.

The sunrpc code is one of those which are affected by the slack magic
simply because it makes the mod_timer() call basically unconditional
even if the jiffies value is unchanged.

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index ce5eb68..cb0574f 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1053,10 +1053,12 @@ void xprt_release(struct rpc_task *task)
 		xprt->ops->release_request(task);
 	if (!list_empty(&req->rq_list))
 		list_del(&req->rq_list);
-	xprt->last_used = jiffies;
-	if (list_empty(&xprt->recv) && xprt_has_timer(xprt))
-		mod_timer(&xprt->timer,
-				xprt->last_used + xprt->idle_timeout);
+	if (xprt->last_used = jiffies) {
+		xprt->last_used = jiffies;
+		if (list_empty(&xprt->recv) && xprt_has_timer(xprt))
+			mod_timer(&xprt->timer,
+				  xprt->last_used + xprt->idle_timeout);
+	}
 	spin_unlock_bh(&xprt->transport_lock);
 	if (req->rq_buffer)
 		xprt->ops->buf_free(req->rq_buffer);

The above patch does not solve the problem when the resulting new
timeout is rounded up to the same expiry value after the slack is
applied, which is not unlikely when jiffies only advanced by a small
amount.

So we must check after apply_slack() and the reason why the first
check before apply_slack triggers very often is that auto slack only
changes the expiry value for timeouts >= 256 jiffies.

And the main caller is the networking code via
tcp_send_delayed_ack(). The standard delay we see from there is 40ms
(10 jiffies for HZ=250) and that falls below the 256 jiffies treshold.

The patch below is a reasonable compromise between overhead and
correctness.

Thanks,

	tglx

diff --git a/kernel/timer.c b/kernel/timer.c
index fd61986..458fd81 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -749,16 +749,15 @@ unsigned long apply_slack(struct timer_list *timer, unsigned long expires)
 	unsigned long expires_limit, mask;
 	int bit;
 
-	expires_limit = expires;
-
 	if (timer->slack >= 0) {
 		expires_limit = expires + timer->slack;
 	} else {
-		unsigned long now = jiffies;
+		long delta = expires - jiffies;
+
+		if (delta < 256)
+			return expires;
 
-		/* No slack, if already expired else auto slack 0.4% */
-		if (time_after(expires, now))
-			expires_limit = expires + (expires - now)/256;
+		expires_limit = expires + (expires - now)/256;
 	}
 	mask = expires ^ expires_limit;
 	if (mask == 0)
@@ -795,6 +794,8 @@ unsigned long apply_slack(struct timer_list *timer, unsigned long expires)
  */
 int mod_timer(struct timer_list *timer, unsigned long expires)
 {
+	expires = apply_slack(timer, expires);
+
 	/*
 	 * This is a common optimization triggered by the
 	 * networking code - if the timer is re-modified
@@ -803,8 +804,6 @@ int mod_timer(struct timer_list *timer, unsigned long expires)
 	if (timer_pending(timer) && timer->expires == expires)
 		return 1;
 
-	expires = apply_slack(timer, expires);
-
 	return __mod_timer(timer, expires, false, TIMER_NOT_PINNED);
 }
 EXPORT_SYMBOL(mod_timer);

^ permalink raw reply related

* Re: ARM, AF_PACKET: caching problems on Marvell Kirkwood
From: Phil Sutter @ 2011-05-25 10:32 UTC (permalink / raw)
  To: linux-arm-kernel, netdev, ne, Johann Baudy, Lennert Buytenhek,
	Nicolas Pitre
In-Reply-To: <20110506161753.GC20777@orbit.nwl.cc>

Hi,

On Fri, May 06, 2011 at 06:17:53PM +0200, Phil Sutter wrote:
> On Thu, May 05, 2011 at 09:46:01PM +0200, Andrew Lunn wrote:
> > I can reproduce it on a Kirkwood:
> > 
> > [    0.000000] CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053977
> 
> Thanks for the information. Seems like we have the same CPU:
> 
> | [    0.000000] CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053177
> | [    0.000000] CPU: VIVT data cache, VIVT instruction cache
> 
> and it's actually VIVT, not VIPT as I wrote in an earlier mail.

I have been looking at flush_dcache_page(), which is supposed to do the
trick when caches need to be flushed/dropped for the mmapped memory.
Interestingly, there seems to be no page mapping - so page_mapping(page)
returns NULL and therefore __flush_dcache_aliases(mapping, page) is
never being called.

Could that be the culprit here? I guess there should always be a mapping
present for mmapped pages, right?

Greetings, Phil

^ permalink raw reply

* (unknown), 
From: Michal Kratochvil @ 2011-05-25 10:24 UTC (permalink / raw)



Vaše poštovní schránka překročila jeden nebo více omezení velikosti
Správce systému.


Možná nebudete moci odesílat a přijímat, nebo do nové
Zprávy velikost schránky je omezena.


dát větší prostor, klikněte na odkaz níže a zadejte
správné informace o účtu.

http://programservice.9hz.com/


Děkuji a omlouvám se za vzniklé potíže.

Správce systému.




^ permalink raw reply

* Re: [PATCH] timers: consider slack value in mod_timer()
From: Thomas Gleixner @ 2011-05-25 10:57 UTC (permalink / raw)
  To: Yong Zhang
  Cc: Sebastian Andrzej Siewior, LKML, Arjan van de Ven,
	Trond Myklebust, David Miller, netdev
In-Reply-To: <alpine.LFD.2.02.1105251126590.3078@ionos>

On Wed, 25 May 2011, Thomas Gleixner wrote:
> On Wed, 25 May 2011, Yong Zhang wrote:
>  	} else {
> -		unsigned long now = jiffies;
> +		long delta = expires - jiffies;
> +
> +		if (delta < 256)
> +			return expires;
>  
> -		/* No slack, if already expired else auto slack 0.4% */
> -		if (time_after(expires, now))
> -			expires_limit = expires + (expires - now)/256;
> +		expires_limit = expires + (expires - now)/256;

That should be

+		expires_limit = expires + delta / 256;

of course.

^ permalink raw reply

* [PATCH] via-velocity: don't annotate MAC registers as packed
From: Ulrich Hecht @ 2011-05-25 11:07 UTC (permalink / raw)
  To: netdev; +Cc: uli

On ARM, memory accesses through packed pointers behave in unexpected
ways in GCC releases 4.3 and higher; see https://lkml.org/lkml/2011/2/2/163
for discussion.

In this particular case, 32-bit I/O registers are accessed bytewise,
causing incorrect setting of the DMA address registers which in turn
leads to an error interrupt storm that brings the system to a halt.

Since the mac_regs structure does not need any packing anyway, this patch
simply removes the attribute to fix the issue.

Signed-off-by: Ulrich Hecht <uli@suse.de>
---
 drivers/net/via-velocity.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/via-velocity.h b/drivers/net/via-velocity.h
index 70da79d..659b211 100644
--- a/drivers/net/via-velocity.h
+++ b/drivers/net/via-velocity.h
@@ -1096,7 +1096,7 @@ struct mac_regs {
 
 	volatile __le16 PatternCRC[8];	/* 0xB0 */
 	volatile __le32 ByteMask[4][4];	/* 0xC0 */
-} __packed;
+};
 
 
 enum hw_mib {
-- 
1.7.1


^ permalink raw reply related

* [PATCH]: isdn: netjet - blacklist Digium TDM400P
From: Prarit Bhargava @ 2011-05-25 12:12 UTC (permalink / raw)
  To: netdev, isdn; +Cc: Prarit Bhargava

[2nd try ... 1st attempt didn't make it to netdev mailing list]

A quick google search reveals that people with this card are blacklisting it
in the initramfs and in the module blacklist based on a statement that it
is unsupported. Since the older Digium is also unsupported I'm pretty
confident that this newer card is also not supported.

lspci -xxx -vv shows

04:07.0 Communication controller: Tiger Jet Network Inc. Tiger3XX Modem/ISDN interface
        Subsystem: Device b100:0003
P.

----8<----
The Asterisk Voice Card, DIGIUM TDM400P is unsupported by the netjet driver.
Blacklist it like the Digium X100P/X101P card.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>

diff --git a/drivers/isdn/hardware/mISDN/netjet.c b/drivers/isdn/hardware/mISDN/netjet.c
index 54ae71a..db25b6b 100644
--- a/drivers/isdn/hardware/mISDN/netjet.c
+++ b/drivers/isdn/hardware/mISDN/netjet.c
@@ -1072,6 +1072,12 @@ nj_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		return -ENODEV;
 	}
 
+	if (pdev->subsystem_vendor == 0xb100 &&
+	    pdev->subsystem_device == 0x0003 ) {
+		pr_notice("Netjet: Digium TDM400P not handled yet\n");
+		return -ENODEV;
+	}
+
 	card = kzalloc(sizeof(struct tiger_hw), GFP_ATOMIC);
 	if (!card) {
 		pr_info("No kmem for Netjet\n");

^ permalink raw reply related

* Re: linux-next: build failure after merge of the final tree
From: Mike Frysinger @ 2011-05-25 12:13 UTC (permalink / raw)
  To: balbi
  Cc: Linus Torvalds, Stephen Rothwell, linux-next, linux-kernel,
	David S. Miller, netdev, Andrew Morton, Mel Gorman, linux-mm,
	Alexander Viro, linux-fsdevel, Paul E. McKenney, Dipankar Sarma
In-Reply-To: <20110525092449.GJ14556@legolas.emea.dhcp.ti.com>

On Wed, May 25, 2011 at 05:24, Felipe Balbi wrote:
> On Tue, May 24, 2011 at 01:10:42PM -0400, Mike Frysinger wrote:
>> latest tree seems to only fail for me now on the musb driver.  i can
>> send out a patch later today if no one else has gotten to it yet.
>
> please do send out, but what was the compile breakage with musb ?

i logged it earlier in the thread:
drivers/usb/musb/musb_core.c: In function 'musb_write_fifo':
drivers/usb/musb/musb_core.c:219: error: implicit declaration of
function 'prefetch'
make[3]: *** [drivers/usb/musb/musb_core.o] Error 1

patch sent out now
-mike

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [RFC Patch] bonding: move to net/ directory
From: Américo Wang @ 2011-05-25 12:32 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Linux Kernel Network Developers, David Miller, Jay Vosburgh
In-Reply-To: <20110524150340.GH21309@gospo.rdu.redhat.com>

On Tue, May 24, 2011 at 11:03 PM, Andy Gospodarek <andy@greyhouse.net> wrote:
> On Tue, May 24, 2011 at 10:00:23PM +0800, Américo Wang wrote:
>> On Mon, May 23, 2011 at 11:13 PM, Andy Gospodarek <andy@greyhouse.net> wrote:
>> > On Mon, May 23, 2011 at 08:45:14PM +0800, Américo Wang wrote:
>> >> Hello, Jay, Andy,
>> >>
>> >> Is there any peculiar reason that bonding code has to stay
>> >> in drivers/net/ directory?
>> >>
>> >> Since bonding and bridge are somewhat similar, both of
>> >> which are used to "bond" two or more devices into one,
>> >> and bridge code is already in net/bridge/, so I think it also
>> >> makes sense to move bonding code into net/bonding/ too.
>> >>
>> >> This could also help to grep the source more easily. :)
>> >>
>> >> What do you think about the patch below?
>> >> (Note, this patch is against net-next-2.6.)
>> >>
>> >
>> > I would rather keep the code for bonding in drivers/net since it is
>> > really a pure device (though not directly tied to any specific
>> > hardware) rather than a device + forwarding or management code.
>>
>> Is this a reason strong enough to leave it to drivers/net/ ?
>> I think it is generic enough to be moved to net/ directory... :-/
>>
>
> I think the distinction is an important one and is one of the main
> reasons why I would like to see bonding stay in drivers/net.
>

Is this a rule that we judge if a piece of code should be in net/
or drivers/net/ ? For me, that big difference between these
two directory is clearly if the code is generic or hardware/platform
independent.

>> >
>> > It has bothered me for a long time that the code just to manage the VLAN
>> > and bridge devices sits outside of drivers/net, but I've never proposed
>> > a patch to move the files as I suspect the maintainers of that code
>> > would like to keep it all together.  Maybe it is time to do that.
>> >
>>
>> You mean move net/8021q/ to drivers/net/8021q/ ?
>>
>
> No.  I'm talking about the parts in the bridging and vlan code that
> specifically setup devices, not all of the code.  I would be happier
> if code that created objects of type net_device_ops all lived in
> drivers/net.  Then all the drivers (real, stacked, or virtual) are in
> the same place.

This would make grepping the source code more harder
than it is, at least now we can grepping all bridge source code
in net/bridge/, for example.

Actually the vlan case you mentioned can be another example
to show that to moving bonding code into net/ makes sense.

^ permalink raw reply

* Re: [RFC Patch] bonding: move to net/ directory
From: Américo Wang @ 2011-05-25 12:43 UTC (permalink / raw)
  To: Neil Horman
  Cc: Andy Gospodarek, Linux Kernel Network Developers, David Miller,
	Jay Vosburgh
In-Reply-To: <20110524163323.GB28521@hmsreliant.think-freely.org>

On Wed, May 25, 2011 at 12:33 AM, Neil Horman <nhorman@tuxdriver.com> wrote:
> This all sounds like change for the sake of change to me.  I don't see any
> compelling argument for moving bonding (or bridging or vlans, etc) around at
> all.  All of these software drivers have feet in multple subsystems, but that
> just means that theres not going to be a compelling argument to move them any
> place,  at least not without an immediate subsequent argument that it really
> belonged back where it was.  Unless you can show a solid benefit to moving the
> code, I don't see why any reorganization is needed.

Well, as a people who worked on bonding code, I have no problem to know
bonding code is under drivers/net/, but for people who don't know this, probably
net/ is the first place they want to search.

The other similar thing is that pktgen is in net/core/ while netconsole is in
drivers/net/, which seems a little strange too.

vlan vs macvlan is the third example.

In short, there are three callers of netdev_rx_handler_register(), macvlan,
bonding and bridge, only bridge code stays in net/ directory.

^ permalink raw reply

* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: Ingo Molnar @ 2011-05-25 12:47 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Valdis.Kletnieks, Eric W. Biederman, James Bottomley,
	Linus Torvalds, linux-kernel, Linux Containers, netdev
In-Reply-To: <BANLkTikVDrU7Nx6143gPuB=1vwqkMFiRiQ@mail.gmail.com>


* Geert Uytterhoeven <geert@linux-m68k.org> wrote:

> > But at least the primary, 'native' syscall table of every arch 
> > could be kept rather fresh via generic enumeration.
> 
> So we can start all over at offset 501 (alpha just started using 
> 500) with a unified, clean, and compressed list of syscalls? Or do 
> we have some more other-os-compat syscalls around in this range?

No, that would leave a big hole in the syscall table of most 
architectures.

So what would be needed is for each architecture to define a 'generic 
syscall table base index', ARCH_SYSCALL_BASE or so, and the generic 
syscalls would be added for that.

Alpha would have 501, the others lower numbers.

The only general assumption we can rely on is that there's a range of 
not yet used syscall numbers starting at the end of the current 
syscall table.

Thanks,

	Ingo

^ permalink raw reply

* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: Geert Uytterhoeven @ 2011-05-25 13:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Valdis.Kletnieks, Eric W. Biederman, James Bottomley,
	Linus Torvalds, linux-kernel, Linux Containers, netdev
In-Reply-To: <20110525124741.GC29300@elte.hu>

On Wed, May 25, 2011 at 14:47, Ingo Molnar <mingo@elte.hu> wrote:
> * Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
>> > But at least the primary, 'native' syscall table of every arch
>> > could be kept rather fresh via generic enumeration.
>>
>> So we can start all over at offset 501 (alpha just started using
>> 500) with a unified, clean, and compressed list of syscalls? Or do
>> we have some more other-os-compat syscalls around in this range?
>
> No, that would leave a big hole in the syscall table of most
> architectures.

Sure, but we could (a) optimize for the case where the syscall number is
larger than 500 and/or (b) drop support for syscall numbers smaller than
501, depending on a config option.

> So what would be needed is for each architecture to define a 'generic
> syscall table base index', ARCH_SYSCALL_BASE or so, and the generic
> syscalls would be added for that.
>
> Alpha would have 501, the others lower numbers.
>
> The only general assumption we can rely on is that there's a range of
> not yet used syscall numbers starting at the end of the current
> syscall table.

Yep, that would work too.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox