Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [ICMP]: Avoid sparse warnings in net/ipv4/icmp.c
From: David Miller @ 2008-01-04  5:26 UTC (permalink / raw)
  To: dada1; +Cc: netdev
In-Reply-To: <477DC304.40508@cosmosbay.com>

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Fri, 04 Jan 2008 06:24:20 +0100

>    CHECK   net/ipv4/icmp.c
> net/ipv4/icmp.c:249:13: warning: context imbalance in 'icmp_xmit_unlock' - 
> unexpected unlock
> net/ipv4/icmp.c:376:13: warning: context imbalance in 'icmp_reply' - different 
> lock contexts for basic block
> net/ipv4/icmp.c:430:6: warning: context imbalance in 'icmp_send' - different 
> lock contexts for basic block
> 
> Solution is to declare icmp_xmit_unlock() as __inline__ (similar with 
> icmp_xmit_lock())

Please use "inline" instead of "__inline__".  If other stuff
uses __inline__ in that file, feel free to fix it up too.

^ permalink raw reply

* [ICMP]: Avoid sparse warnings in net/ipv4/icmp.c
From: Eric Dumazet @ 2008-01-04  5:24 UTC (permalink / raw)
  To: David S. Miller, Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 559 bytes --]

   CHECK   net/ipv4/icmp.c
net/ipv4/icmp.c:249:13: warning: context imbalance in 'icmp_xmit_unlock' - 
unexpected unlock
net/ipv4/icmp.c:376:13: warning: context imbalance in 'icmp_reply' - different 
lock contexts for basic block
net/ipv4/icmp.c:430:6: warning: context imbalance in 'icmp_send' - different 
lock contexts for basic block

Solution is to declare icmp_xmit_unlock() as __inline__ (similar with 
icmp_xmit_lock())

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

  net/ipv4/icmp.c |    2 +-
  1 files changed, 1 insertion(+), 1 deletion(-)


[-- Attachment #2: icmp.patch --]
[-- Type: text/plain, Size: 335 bytes --]

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index fc66c8a..01ce07b 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -246,7 +246,7 @@ static __inline__ int icmp_xmit_lock(void)
 	return 0;
 }
 
-static void icmp_xmit_unlock(void)
+static __inline__ void icmp_xmit_unlock(void)
 {
 	spin_unlock_bh(&icmp_sock->sk_lock.slock);
 }

^ permalink raw reply related

* Re: [NET]: prot_inuse cleanups and optimizations
From: David Miller @ 2008-01-04  4:47 UTC (permalink / raw)
  To: dada1; +Cc: netdev, acme
In-Reply-To: <20080103184154.220b5294.dada1@cosmosbay.com>

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Thu, 3 Jan 2008 18:41:54 +0100

> 1) Cleanups (all functions are prefixed by sock_prot_inuse)
> 
> sock_prot_inc_use(prot) -> sock_prot_inuse_add(prot,-1)
> sock_prot_dec_use(prot) -> sock_prot_inuse_add(prot,-1)
> sock_prot_inuse()       -> sock_prot_inuse_get()
> 
> New functions :
> 
> sock_prot_inuse_init() and sock_prot_inuse_free() to abstract pcounter use.
> 
> 2) if CONFIG_PROC_FS=n, we can zap 'inuse' member from "struct proto",
> since nobody wants to read the inuse value.
> 
> This saves 1372 bytes on i386/SMP and some cpu cycles.
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

I hate the ifdef, but it's in a header so what can I do? :-)

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH 1/2] LSM: Add inet_sys_snd_skb() LSM hook
From: David Miller @ 2008-01-04  4:45 UTC (permalink / raw)
  To: paul.moore; +Cc: netdev
In-Reply-To: <20080103172539.14445.1668.stgit@flek.americas.hpqcorp.net>

From: Paul Moore <paul.moore@hp.com>
Date: Thu, 03 Jan 2008 12:25:39 -0500

> Add an inet_sys_snd_skb() LSM hook to allow the LSM to provide packet level
> access control for all outbound packets.  Using the existing postroute_last
> netfilter hook turns out to be problematic as it is can be invoked multiple
> times for a single packet, e.g. individual IPsec transforms, adding unwanted
> overhead and complicating the security policy.
> 
> Signed-off-by: Paul Moore <paul.moore@hp.com>

I disagree with this change.

The packet is different each time you see it in the postrouting hook,
and also the new hook is thus redundant.

If it's a performance issue and you can classify the security early,
mark the SKB as "seen" and then on subsequent hooks you can just
return immediately if that flag is set.

^ permalink raw reply

* Re: [XFRM]: Do not define km_migrate() if !CONFIG_XFRM_MIGRATE
From: David Miller @ 2008-01-04  4:43 UTC (permalink / raw)
  To: dada1; +Cc: netdev
In-Reply-To: <20080103182449.ee02958e.dada1@cosmosbay.com>

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Thu, 3 Jan 2008 18:24:49 +0100

> In include/net/xfrm.h we find :
> 
> #ifdef CONFIG_XFRM_MIGRATE
> extern int km_migrate(struct xfrm_selector *sel, u8 dir, u8 type,
>                       struct xfrm_migrate *m, int num_bundles);
> ...
> #endif
> 
> We can also guard the function body itself in net/xfrm/xfrm_state.c
> with same condition.
> 
> (Problem spoted by sparse checker)
> make C=2 net/xfrm/xfrm_state.o
> ...
> net/xfrm/xfrm_state.c:1765:5: warning: symbol 'km_migrate' was not declared. Should it be static?
> ...
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Applied to net-2.6, thanks Eric.

^ permalink raw reply

* Re: [LIB] pcounter : unline too big functions
From: David Miller @ 2008-01-04  4:41 UTC (permalink / raw)
  To: dada1; +Cc: acme, herbert, netdev
In-Reply-To: <20080103165227.02fc0da4.dada1@cosmosbay.com>

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Thu, 3 Jan 2008 16:52:27 +0100

> Before pushing pcounter to Linus tree, I would like to make some adjustments.
> 
> Goal is to reduce kernel text size, by unlining too big functions.
...
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: [NET]: Avoid divides in net/core/gen_estimator.c
From: David Miller @ 2008-01-04  4:40 UTC (permalink / raw)
  To: dada1; +Cc: netdev
In-Reply-To: <20080103141235.dbff886f.dada1@cosmosbay.com>

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Thu, 3 Jan 2008 14:12:35 +0100

> We can void divides (as seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y on x86)
> changing ((HZ<<idx)/4) to ((HZ/4) << idx)
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH net-2.6.25 3/3] [TCP]: Perform setting of common control fields in one place
From: David Miller @ 2008-01-04  4:39 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev
In-Reply-To: <11993580071094-git-send-email-ilpo.jarvinen@helsinki.fi>

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu,  3 Jan 2008 13:00:07 +0200

> In case of segments which are purely for control without any
> data (SYN/ACK/FIN/RST), many fields are set to common values
> in multiple places.
 ...
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

It's nice to see Arnaldo's tools get some usage :-)

Applied, thanks!

^ permalink raw reply

* Re: [PATCH net-2.6.25 2/3] [TCP]: Urgent parameter effect can be simplified.
From: David Miller @ 2008-01-04  4:38 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev
In-Reply-To: <11993580071646-git-send-email-ilpo.jarvinen@helsinki.fi>

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu,  3 Jan 2008 13:00:06 +0200

> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Applied.

^ permalink raw reply

* Re: [PATCH net-2.6.25 1/3] [TCP]: cleanup tcp_parse_options deep indented switch
From: David Miller @ 2008-01-04  4:37 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev
In-Reply-To: <11993580071900-git-send-email-ilpo.jarvinen@helsinki.fi>

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu,  3 Jan 2008 13:00:05 +0200

> Removed case indentation level & combined some nested ifs, mostly
> within 80 lines now. This is a leftover from indent patch, it
> just had to be done manually to avoid messing it up completely.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Applied.

^ permalink raw reply

* Re: forcedeth: MAC-address reversed on resume from suspend
From: Björn Steinbrink @ 2008-01-04  3:43 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Andreas Mohr, Richard Jonsson, linux-kernel, Ayaz Abdulla,
	jgarzik, netdev
In-Reply-To: <20080102234209.GA10831@does.not.exist>

On 2008.01.03 01:42:09 +0200, Adrian Bunk wrote:
> [ original bug report: http://lkml.org/lkml/2008/1/2/253 ]
> 
> On Wed, Jan 02, 2008 at 10:48:43PM +0100, Andreas Mohr wrote:
> > Hi,
> > 
> > On Wed, Jan 02, 2008 at 10:04:49PM +0100, Richard Jonsson wrote:
> > > Bugreport regarding forcedeth driver.
> > >
> > > When returning from suspend-to-RAM the MAC-address byteorder is
> > > reversed.  After another suspend-resume cycle the MAC-address is
> > > again correct. This brings a great deal of pain since the NIC is
> > > assigned a random MAC-address when it is detected as invalid.
> > >
> > > I cannot say if this issue appeared recently or if it's been in
> > > the kernel for a while, as I've been using wireless until
> > > recently.
> > >
> > > I read somewhere on lkml that the mac is read from the device,
> > > then reversed and finally written back to the device. Can it be
> > > that it is read again on resume and reversed, and then again
> > > written to the device? This would explain why it's ok every other
> > > resume cycle.
> > 
> > Uh, Use The Source, Luke?
> > 
> > But no, I think it's simply driver dainbreadness, not a matter of
> > complicated writing and reading back in reversed order.
> > 
> > drivers/net/forcedeth.c has a nice DEV_HAS_CORRECT_MACADDR flag
> > which is being enabled for certain cards (those which need this) and
> > disabled for others.
> > 
> > The nv_probe() function then nicely obeys this flag by reversing the
> > address if needed, _however_ there's another function,
> > nv_copy_mac_to_hw(), which does _NOT_ obey it and simply copies the
> > _raw_ netdev dev_addr to the card register (NvRegMacAddrA etc.).

nv_copy_mac_to_hw() is called from nv_probe() _after_ the MAC address
has been fixed, and from nv_set_mac_address() which is supposed to get a
correct MAC address anyway. So I don't see any problem there.

After some digging, I guess it's related to
5070d3408405ae1941f259acac7a9882045c3be4

That introduced a flag in a register to signal if the MAC address has
been corrected, so that for example kexec won't reverse the adddress
again, when calling nv_probe() again.

As I know neither the suspend nor the kexec code well enough, I'll have
to make a few smart guesses (let's hope that that works out *g*):

a) suspend manages to reverse the MAC address again, so it must call
nv_probe. And we have lost the flag that stops us from reversing the
address. We cannot have lost the fixed MAC address, because then we'd
always get the reversed address, and not go back and forth. I'll assume
that it will also call nv_remove during suspend, because it's nicely
symmetrical then :-)

b) kexec does not call nv_remove, because otherwise, it would see a
reversed address on its nv_probe call anyway, and the flag wouldn't be
necessary.

Now the commit that introduced the flag did also introduce an indirect
change to nv_remove. Although it still says that it writes back the
broken MAC address, that's no longer true, because orig_mac now also
gets the correct address in nv_probe.

Smart guessing time again: That was required because otherwise a "rmood
forcedeth && modprobe forcedeth" would have produced a reversed MAC
address.

But that also causes the problem that we get when we loose either the
flag, we start reversing the fixed address. If we instead return the
card to its initial state in nv_remove, ie. unset the flag and write
back the reversed address, then everyone going through nv_remove _and_
nv_probe should be happy. And kexec, which only goes through nv_probe
again and doesn't loose the flag should be happy, too.

Richard, could you give this a spin? And then we'd likely need someone
to test that with kexec...

thanks,
Björn
---
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index a96583c..f84c752 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -5199,10 +5199,6 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 		dev->dev_addr[3] = (np->orig_mac[0] >> 16) & 0xff;
 		dev->dev_addr[4] = (np->orig_mac[0] >>  8) & 0xff;
 		dev->dev_addr[5] = (np->orig_mac[0] >>  0) & 0xff;
-		/* set permanent address to be correct aswell */
-		np->orig_mac[0] = (dev->dev_addr[0] << 0) + (dev->dev_addr[1] << 8) +
-			(dev->dev_addr[2] << 16) + (dev->dev_addr[3] << 24);
-		np->orig_mac[1] = (dev->dev_addr[4] << 0) + (dev->dev_addr[5] << 8);
 		writel(txreg|NVREG_TRANSMITPOLL_MAC_ADDR_REV, base + NvRegTransmitPoll);
 	}
 	memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);
@@ -5414,6 +5410,8 @@ static void __devexit nv_remove(struct pci_dev *pci_dev)
 	 */
 	writel(np->orig_mac[0], base + NvRegMacAddrA);
 	writel(np->orig_mac[1], base + NvRegMacAddrB);
+	writel(readl(base + NvRegTransmitPoll) & ~NVREG_TRANSMITPOLL_MAC_ADDR_REV,
+	       base + NvRegTransmitPoll);

 	/* free all structures */
 	free_rings(dev);

^ permalink raw reply related

* Re: [RFC PATCH] NET: Clone the sk_buff->iif field properly
From: David Miller @ 2008-01-04  3:36 UTC (permalink / raw)
  To: paul.moore; +Cc: joe, jarkao2, hadi, netdev
In-Reply-To: <200801032219.04223.paul.moore@hp.com>

From: Paul Moore <paul.moore@hp.com>
Date: Thu, 3 Jan 2008 22:19:03 -0500

> > Perhaps move the skb->cloned = 1 to just after n->cloned = 1
> > or
> > 	skb->cloned = n->cloned = 1;
> > or maybe
> > 	skb->cloned = 1;
> > 	C(cloned);
> 
> I thought about that, but I kinda like how the parent-skb-only changes are 
> grouped together at the end.  I think the distinction helps readability, but 
> then again we've already seen how subjective readability can be :)

I think either way is fine, as long as the stores are ordered
properly.  The ordering of the loads and little issues like
this skb->cloned thing are much less important.

Actually, if you look at the generated assembler, GCC makes
a mess of all of these bitfield accesses, largely destroying
the pure store stream since it does a read/modify/write on
a word for each bitfield set.

^ permalink raw reply

* lockless pagecache Cassini regression
From: David Miller @ 2008-01-04  3:32 UTC (permalink / raw)
  To: npiggin; +Cc: netdev

Nick, I think the following changeset:

    commit fa4f0774d7c6cccb4d1fda76b91dd8eddcb2dd6a

    [CASSINI]: dont touch page_count

    Remove page refcount manipulations from cassini driver by using
    another field in struct page. Needed for lockless pagecache.

    Signed-off-by: Nick Piggin <npiggin@suse.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Broke the Cassini driver.

While it is true that, within the cassini driver, you converted
the counter bumps and tests accurately, this changeset does not
account for the page count bumps that are going to occur
in other contexts when the SKBs being constructed are freed
up (and thus the skb_frag_struct pages get liberated).

Basically what these drivers do is allocate a page, give it
to the network card, and the card decides how to chop up the
page based upon the size of arriving packets.  Therefore we
don't know ahead of time how many references to the page we
will generate while attaching bits of the page to receive
packet SKBs that get built.

As incoming packets are constructed by the card using chunks of these
system pages, it indicates in the descriptor which parts of which
pages were used.  We then use this to attach the page parts onto the
SKB.  Once the SKB is constructed it is passed up to the networking,
later on the SKB is freed and the page references are dropped by
kfree_skbmem().

And it is these page reference counts that the Cassini driver needs to
test to be correct, not the driver local ones we are now using.

I do something similar to what Cassini was doing in the NIU driver
(drivers/net/niu.c).  I think we need to restore Cassini to something
similar to what NIU is doing.

The only tricky bit is the page count checks, and frankly those can
just be transformed into references to compount_head(page)->_count or
similar.

Actually, page_count() does exactly that, it is implemented by
checking atomic_read(&compound_head(page)->_count) and thus I
think the best thing to do is revert your changes.

These pages are freshly allocated pages, not stuff in the page
cache or similar, so I think this is safe and the way to move
forward.

Also, these changes added lots of new atomics to the driver.
It's already doing get_page() etc. as needed.

What do you think Nick?

[CASSINI]: Revert 'dont touch page_count'.

This reverts changeset fa4f0774d7c6cccb4d1fda76b91dd8eddcb2dd6a
([CASSINI]: dont touch page_count) because it breaks the driver.

The local page counting added by this changeset did not account
for the asynchronous page count changes done by kfree_skb()
and friends.

The change adds extra atomics and on top of it all appears to be
totally unnecessary as well.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/cassini.c b/drivers/net/cassini.c
index 9030ca5..ff957f2 100644
--- a/drivers/net/cassini.c
+++ b/drivers/net/cassini.c
@@ -336,30 +336,6 @@ static inline void cas_mask_intr(struct cas *cp)
 		cas_disable_irq(cp, i);
 }

-static inline void cas_buffer_init(cas_page_t *cp)
-{
-	struct page *page = cp->buffer;
-	atomic_set((atomic_t *)&page->lru.next, 1);
-}
-
-static inline int cas_buffer_count(cas_page_t *cp)
-{
-	struct page *page = cp->buffer;
-	return atomic_read((atomic_t *)&page->lru.next);
-}
-
-static inline void cas_buffer_inc(cas_page_t *cp)
-{
-	struct page *page = cp->buffer;
-	atomic_inc((atomic_t *)&page->lru.next);
-}
-
-static inline void cas_buffer_dec(cas_page_t *cp)
-{
-	struct page *page = cp->buffer;
-	atomic_dec((atomic_t *)&page->lru.next);
-}
-
 static void cas_enable_irq(struct cas *cp, const int ring)
 {
 	if (ring == 0) { /* all but TX_DONE */
@@ -497,7 +473,6 @@ static int cas_page_free(struct cas *cp, cas_page_t *page)
 {
 	pci_unmap_page(cp->pdev, page->dma_addr, cp->page_size,
 		       PCI_DMA_FROMDEVICE);
-	cas_buffer_dec(page);
 	__free_pages(page->buffer, cp->page_order);
 	kfree(page);
 	return 0;
@@ -527,7 +502,6 @@ static cas_page_t *cas_page_alloc(struct cas *cp, const gfp_t flags)
 	page->buffer = alloc_pages(flags, cp->page_order);
 	if (!page->buffer)
 		goto page_err;
-	cas_buffer_init(page);
 	page->dma_addr = pci_map_page(cp->pdev, page->buffer, 0,
 				      cp->page_size, PCI_DMA_FROMDEVICE);
 	return page;
@@ -606,7 +580,7 @@ static void cas_spare_recover(struct cas *cp, const gfp_t flags)
 	list_for_each_safe(elem, tmp, &list) {
 		cas_page_t *page = list_entry(elem, cas_page_t, list);

-		if (cas_buffer_count(page) > 1)
+		if (page_count(page->buffer) > 1) 
 			continue;

 		list_del(elem);
@@ -1374,7 +1348,7 @@ static inline cas_page_t *cas_page_spare(struct cas *cp, const int index)
 	cas_page_t *page = cp->rx_pages[1][index];
 	cas_page_t *new;

-	if (cas_buffer_count(page) == 1)
+	if (page_count(page->buffer) == 1)
 		return page;

 	new = cas_page_dequeue(cp);
@@ -1394,7 +1368,7 @@ static cas_page_t *cas_page_swap(struct cas *cp, const int ring,
 	cas_page_t **page1 = cp->rx_pages[1];

 	/* swap if buffer is in use */
-	if (cas_buffer_count(page0[index]) > 1) {
+	if (page_count(page0[index]->buffer) > 1) {
 		cas_page_t *new = cas_page_spare(cp, index);
 		if (new) {
 			page1[index] = page0[index];
@@ -2066,7 +2040,6 @@ static int cas_rx_process_pkt(struct cas *cp, struct cas_rx_comp *rxc,
 		skb->len      += hlen - swivel;

 		get_page(page->buffer);
-		cas_buffer_inc(page);
 		frag->page = page->buffer;
 		frag->page_offset = off;
 		frag->size = hlen - swivel;
@@ -2091,7 +2064,6 @@ static int cas_rx_process_pkt(struct cas *cp, struct cas_rx_comp *rxc,
 			frag++;

 			get_page(page->buffer);
-			cas_buffer_inc(page);
 			frag->page = page->buffer;
 			frag->page_offset = 0;
 			frag->size = hlen;
@@ -2255,7 +2227,7 @@ static int cas_post_rxds_ringN(struct cas *cp, int ring, int num)
 	released = 0;
 	while (entry != last) {
 		/* make a new buffer if it's still in use */
-		if (cas_buffer_count(page[entry]) > 1) {
+		if (page_count(page[entry]->buffer) > 1) {
 			cas_page_t *new = cas_page_dequeue(cp);
 			if (!new) {
 				/* let the timer know that we need to

^ permalink raw reply related

* Re: [RFC PATCH] NET: Clone the sk_buff->iif field properly
From: Paul Moore @ 2008-01-04  3:19 UTC (permalink / raw)
  To: Joe Perches; +Cc: David Miller, jarkao2, hadi, netdev
In-Reply-To: <1199403607.4888.6.camel@localhost>

On Thursday 03 January 2008 6:40:07 pm Joe Perches wrote:
> On Thu, 2008-01-03 at 18:13 -0500, Paul Moore wrote:
> > static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff
> > *skb)
> > {
> > #define C(x) n->x = skb->x
> >
> > 	n->next = n->prev = NULL;
> > 	n->sk = NULL;
> > 	__copy_skb_header(n, skb);
> >
> > 	C(len);
> > 	C(data_len);
> > 	C(mac_len);
> > 	n->hdr_len = skb->nohdr ? skb_headroom(skb) : skb->hdr_len;
> > 	n->cloned = 1;
> > 	n->nohdr = 0;
> > 	n->destructor = NULL;
> > 	C(iif);
> > 	C(tail);
> > 	C(end);
> > 	C(head);
> > 	C(data);
> > 	C(truesize);
> > 	atomic_set(&n->users, 1);
> >
> > 	atomic_inc(&(skb_shinfo(skb)->dataref));
> > 	skb->cloned = 1;
> >
> > 	return n;
> > #undef C
>
> Perhaps move the skb->cloned = 1 to just after n->cloned = 1
> or
> 	skb->cloned = n->cloned = 1;
> or maybe
> 	skb->cloned = 1;
> 	C(cloned);

I thought about that, but I kinda like how the parent-skb-only changes are 
grouped together at the end.  I think the distinction helps readability, but 
then again we've already seen how subjective readability can be :)

-- 
paul moore
linux security @ hp

^ permalink raw reply

* [PATCH] PHYLIB: Locking fixes for PHY I/O potentially sleeping
From: Nate Case @ 2008-01-03 23:36 UTC (permalink / raw)
  To: Andy Fleming; +Cc: netdev

PHY read/write functions can potentially sleep (e.g., a PHY accessed
via I2C).  The following changes were made to account for this:

    * Change spin locks to mutex locks
    * Add a BUG_ON() to phy_read() phy_write() to warn against
      calling them from an interrupt context.
    * Use work queue for PHY state machine handling since
      it can potentially sleep
    * Change phydev lock from spinlock to mutex

Signed-off-by: Nate Case <ncase@xes-inc.com>
---
 drivers/net/phy/mdio_bus.c   |    2 +-
 drivers/net/phy/phy.c        |   68 ++++++++++++++++++++++++++++-------------
 drivers/net/phy/phy_device.c |   11 +++----
 include/linux/phy.h          |    5 ++-
 4 files changed, 55 insertions(+), 31 deletions(-)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index c30196d..6e9f619 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -49,7 +49,7 @@ int mdiobus_register(struct mii_bus *bus)
 	int i;
 	int err = 0;
 
-	spin_lock_init(&bus->mdio_lock);
+	mutex_init(&bus->mdio_lock);
 
 	if (NULL == bus || NULL == bus->name ||
 			NULL == bus->read ||
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 7c9e6e3..12fccb1 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -26,7 +26,6 @@
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
 #include <linux/skbuff.h>
-#include <linux/spinlock.h>
 #include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/mii.h>
@@ -72,9 +71,11 @@ int phy_read(struct phy_device *phydev, u16 regnum)
 	int retval;
 	struct mii_bus *bus = phydev->bus;
 
-	spin_lock_bh(&bus->mdio_lock);
+	BUG_ON(in_interrupt());
+
+	mutex_lock(&bus->mdio_lock);
 	retval = bus->read(bus, phydev->addr, regnum);
-	spin_unlock_bh(&bus->mdio_lock);
+	mutex_unlock(&bus->mdio_lock);
 
 	return retval;
 }
@@ -95,9 +96,11 @@ int phy_write(struct phy_device *phydev, u16 regnum, u16 val)
 	int err;
 	struct mii_bus *bus = phydev->bus;
 
-	spin_lock_bh(&bus->mdio_lock);
+	BUG_ON(in_interrupt());
+
+	mutex_lock(&bus->mdio_lock);
 	err = bus->write(bus, phydev->addr, regnum, val);
-	spin_unlock_bh(&bus->mdio_lock);
+	mutex_unlock(&bus->mdio_lock);
 
 	return err;
 }
@@ -428,7 +431,7 @@ int phy_start_aneg(struct phy_device *phydev)
 {
 	int err;
 
-	spin_lock_bh(&phydev->lock);
+	mutex_lock(&phydev->lock);
 
 	if (AUTONEG_DISABLE == phydev->autoneg)
 		phy_sanitize_settings(phydev);
@@ -449,13 +452,14 @@ int phy_start_aneg(struct phy_device *phydev)
 	}
 
 out_unlock:
-	spin_unlock_bh(&phydev->lock);
+	mutex_unlock(&phydev->lock);
 	return err;
 }
 EXPORT_SYMBOL(phy_start_aneg);
 
 
 static void phy_change(struct work_struct *work);
+static void phy_state_machine(struct work_struct *work);
 static void phy_timer(unsigned long data);
 
 /**
@@ -476,6 +480,7 @@ void phy_start_machine(struct phy_device *phydev,
 {
 	phydev->adjust_state = handler;
 
+	INIT_WORK(&phydev->state_queue, phy_state_machine);
 	init_timer(&phydev->phy_timer);
 	phydev->phy_timer.function = &phy_timer;
 	phydev->phy_timer.data = (unsigned long) phydev;
@@ -493,11 +498,12 @@ void phy_start_machine(struct phy_device *phydev,
 void phy_stop_machine(struct phy_device *phydev)
 {
 	del_timer_sync(&phydev->phy_timer);
+	cancel_work_sync(&phydev->state_queue);
 
-	spin_lock_bh(&phydev->lock);
+	mutex_lock(&phydev->lock);
 	if (phydev->state > PHY_UP)
 		phydev->state = PHY_UP;
-	spin_unlock_bh(&phydev->lock);
+	mutex_unlock(&phydev->lock);
 
 	phydev->adjust_state = NULL;
 }
@@ -541,9 +547,9 @@ static void phy_force_reduction(struct phy_device *phydev)
  */
 void phy_error(struct phy_device *phydev)
 {
-	spin_lock_bh(&phydev->lock);
+	mutex_lock(&phydev->lock);
 	phydev->state = PHY_HALTED;
-	spin_unlock_bh(&phydev->lock);
+	mutex_unlock(&phydev->lock);
 }
 
 /**
@@ -705,10 +711,10 @@ static void phy_change(struct work_struct *work)
 	if (err)
 		goto phy_err;
 
-	spin_lock_bh(&phydev->lock);
+	mutex_lock(&phydev->lock);
 	if ((PHY_RUNNING == phydev->state) || (PHY_NOLINK == phydev->state))
 		phydev->state = PHY_CHANGELINK;
-	spin_unlock_bh(&phydev->lock);
+	mutex_unlock(&phydev->lock);
 
 	atomic_dec(&phydev->irq_disable);
 	enable_irq(phydev->irq);
@@ -735,7 +741,7 @@ phy_err:
  */
 void phy_stop(struct phy_device *phydev)
 {
-	spin_lock_bh(&phydev->lock);
+	mutex_lock(&phydev->lock);
 
 	if (PHY_HALTED == phydev->state)
 		goto out_unlock;
@@ -751,7 +757,7 @@ void phy_stop(struct phy_device *phydev)
 	phydev->state = PHY_HALTED;
 
 out_unlock:
-	spin_unlock_bh(&phydev->lock);
+	mutex_unlock(&phydev->lock);
 
 	/*
 	 * Cannot call flush_scheduled_work() here as desired because
@@ -773,7 +779,7 @@ out_unlock:
  */
 void phy_start(struct phy_device *phydev)
 {
-	spin_lock_bh(&phydev->lock);
+	mutex_lock(&phydev->lock);
 
 	switch (phydev->state) {
 		case PHY_STARTING:
@@ -787,19 +793,26 @@ void phy_start(struct phy_device *phydev)
 		default:
 			break;
 	}
-	spin_unlock_bh(&phydev->lock);
+	mutex_unlock(&phydev->lock);
 }
 EXPORT_SYMBOL(phy_stop);
 EXPORT_SYMBOL(phy_start);
 
-/* PHY timer which handles the state machine */
-static void phy_timer(unsigned long data)
+/**
+ * phy_state_machine - Handle the state machine
+ * @work: work_struct that describes the work to be done
+ *
+ * Description: Scheduled by the state_queue workqueue each time
+ *   phy_timer is triggered.
+ */
+static void phy_state_machine(struct work_struct *work)
 {
-	struct phy_device *phydev = (struct phy_device *)data;
+	struct phy_device *phydev =
+			container_of(work, struct phy_device, state_queue);
 	int needs_aneg = 0;
 	int err = 0;
 
-	spin_lock_bh(&phydev->lock);
+	mutex_lock(&phydev->lock);
 
 	if (phydev->adjust_state)
 		phydev->adjust_state(phydev->attached_dev);
@@ -965,7 +978,7 @@ static void phy_timer(unsigned long data)
 			break;
 	}
 
-	spin_unlock_bh(&phydev->lock);
+	mutex_unlock(&phydev->lock);
 
 	if (needs_aneg)
 		err = phy_start_aneg(phydev);
@@ -976,3 +989,14 @@ static void phy_timer(unsigned long data)
 	mod_timer(&phydev->phy_timer, jiffies + PHY_STATE_TIME * HZ);
 }
 
+/* PHY timer which schedules the state machine work */
+static void phy_timer(unsigned long data)
+{
+	struct phy_device *phydev = (struct phy_device *)data;
+
+	/*
+	 * PHY I/O operations can potentially sleep so we ensure that
+	 * it's done from a process context
+	 */
+	schedule_work(&phydev->state_queue);
+}
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 5b9e175..f4c4fd8 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -25,7 +25,6 @@
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
 #include <linux/skbuff.h>
-#include <linux/spinlock.h>
 #include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/mii.h>
@@ -80,7 +79,7 @@ struct phy_device* phy_device_create(struct mii_bus *bus, int addr, int phy_id)
 
 	dev->state = PHY_DOWN;
 
-	spin_lock_init(&dev->lock);
+	mutex_init(&dev->lock);
 
 	return dev;
 }
@@ -656,7 +655,7 @@ static int phy_probe(struct device *dev)
 	if (!(phydrv->flags & PHY_HAS_INTERRUPT))
 		phydev->irq = PHY_POLL;
 
-	spin_lock_bh(&phydev->lock);
+	mutex_lock(&phydev->lock);
 
 	/* Start out supporting everything. Eventually,
 	 * a controller will attach, and may modify one
@@ -670,7 +669,7 @@ static int phy_probe(struct device *dev)
 	if (phydev->drv->probe)
 		err = phydev->drv->probe(phydev);
 
-	spin_unlock_bh(&phydev->lock);
+	mutex_unlock(&phydev->lock);
 
 	return err;
 
@@ -682,9 +681,9 @@ static int phy_remove(struct device *dev)
 
 	phydev = to_phy_device(dev);
 
-	spin_lock_bh(&phydev->lock);
+	mutex_lock(&phydev->lock);
 	phydev->state = PHY_DOWN;
-	spin_unlock_bh(&phydev->lock);
+	mutex_unlock(&phydev->lock);
 
 	if (phydev->drv->remove)
 		phydev->drv->remove(phydev);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 554836e..5e43ae7 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -88,7 +88,7 @@ struct mii_bus {
 
 	/* A lock to ensure that only one thing can read/write
 	 * the MDIO bus at a time */
-	spinlock_t mdio_lock;
+	struct mutex mdio_lock;
 
 	struct device *dev;
 
@@ -284,10 +284,11 @@ struct phy_device {
 
 	/* Interrupt and Polling infrastructure */
 	struct work_struct phy_queue;
+	struct work_struct state_queue;
 	struct timer_list phy_timer;
 	atomic_t irq_disable;
 
-	spinlock_t lock;
+	struct mutex lock;
 
 	struct net_device *attached_dev;
 
-- 
1.5.3.3


^ permalink raw reply related

* Re: [RFC PATCH] NET: Clone the sk_buff->iif field properly
From: Joe Perches @ 2008-01-03 23:40 UTC (permalink / raw)
  To: Paul Moore; +Cc: David Miller, jarkao2, hadi, netdev
In-Reply-To: <200801031813.57621.paul.moore@hp.com>

On Thu, 2008-01-03 at 18:13 -0500, Paul Moore wrote:
> static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff 
> *skb)
> {
> #define C(x) n->x = skb->x
> 
> 	n->next = n->prev = NULL;
> 	n->sk = NULL;
> 	__copy_skb_header(n, skb);
> 
> 	C(len);
> 	C(data_len);
> 	C(mac_len);
> 	n->hdr_len = skb->nohdr ? skb_headroom(skb) : skb->hdr_len;
> 	n->cloned = 1;
> 	n->nohdr = 0;
> 	n->destructor = NULL;
> 	C(iif);
> 	C(tail);
> 	C(end);
> 	C(head);
> 	C(data);
> 	C(truesize);
> 	atomic_set(&n->users, 1);
> 
> 	atomic_inc(&(skb_shinfo(skb)->dataref));
> 	skb->cloned = 1;
> 
> 	return n;
> #undef C

Perhaps move the skb->cloned = 1 to just after n->cloned = 1
or
	skb->cloned = n->cloned = 1;
or maybe
	skb->cloned = 1;
	C(cloned);


^ permalink raw reply

* Re: [RFC PATCH] NET: Clone the sk_buff->iif field properly
From: David Miller @ 2008-01-03 23:25 UTC (permalink / raw)
  To: paul.moore; +Cc: jarkao2, hadi, netdev
In-Reply-To: <200801031813.57621.paul.moore@hp.com>

From: Paul Moore <paul.moore@hp.com>
Date: Thu, 3 Jan 2008 18:13:57 -0500

> On Thursday 03 January 2008 6:05:18 pm David Miller wrote:
> > From: Paul Moore <paul.moore@hp.com>
> > Date: Thu, 3 Jan 2008 16:20:06 -0500
> >
> > > On Thursday 03 January 2008 4:13:12 pm Jarek Poplawski wrote:
> > > > On Thu, Jan 03, 2008 at 11:15:34AM -0500, Paul Moore wrote:
> > > > ...
> > > >
> > > > > While I'm at it, is there some reason for this #define in
> > > > > __skb_clone()?
> > > > >
> > > > >  #define C(x) n->x = skb->x
> > > > >
> > > > > ... it seems kinda silly to me and I tend to think the code
> > > > > would be better without it.
> > > >
> > > > IMHO, if there are a lot of this, it's definitely more readable:
> > > > easier to check which values are simply copied and which need
> > > > something more. But, as usual, it's probably a question of taste,
> > > > and of course without it it would definitely look classier...
> > >
> > > For me personally, I would argue the readability bit.
> >
> > I definitely think the C() thing is more readable.
> >
> > Less typing, less reading...
> 
> Well, you're the boss :)  I just put the C() macro back in, but I kept 
> the reordering that was suggested to help reduce cacheline bounces 
> since that still makes sense to me.  The function now looks like this:

Looks ok to me.

^ permalink raw reply

* Re: [RFC PATCH] NET: Clone the sk_buff->iif field properly
From: Paul Moore @ 2008-01-03 23:13 UTC (permalink / raw)
  To: David Miller; +Cc: jarkao2, hadi, netdev
In-Reply-To: <20080103.150518.05131300.davem@davemloft.net>

On Thursday 03 January 2008 6:05:18 pm David Miller wrote:
> From: Paul Moore <paul.moore@hp.com>
> Date: Thu, 3 Jan 2008 16:20:06 -0500
>
> > On Thursday 03 January 2008 4:13:12 pm Jarek Poplawski wrote:
> > > On Thu, Jan 03, 2008 at 11:15:34AM -0500, Paul Moore wrote:
> > > ...
> > >
> > > > While I'm at it, is there some reason for this #define in
> > > > __skb_clone()?
> > > >
> > > >  #define C(x) n->x = skb->x
> > > >
> > > > ... it seems kinda silly to me and I tend to think the code
> > > > would be better without it.
> > >
> > > IMHO, if there are a lot of this, it's definitely more readable:
> > > easier to check which values are simply copied and which need
> > > something more. But, as usual, it's probably a question of taste,
> > > and of course without it it would definitely look classier...
> >
> > For me personally, I would argue the readability bit.
>
> I definitely think the C() thing is more readable.
>
> Less typing, less reading...

Well, you're the boss :)  I just put the C() macro back in, but I kept 
the reordering that was suggested to help reduce cacheline bounces 
since that still makes sense to me.  The function now looks like this:

static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff 
*skb)
{
#define C(x) n->x = skb->x

	n->next = n->prev = NULL;
	n->sk = NULL;
	__copy_skb_header(n, skb);

	C(len);
	C(data_len);
	C(mac_len);
	n->hdr_len = skb->nohdr ? skb_headroom(skb) : skb->hdr_len;
	n->cloned = 1;
	n->nohdr = 0;
	n->destructor = NULL;
	C(iif);
	C(tail);
	C(end);
	C(head);
	C(data);
	C(truesize);
	atomic_set(&n->users, 1);

	atomic_inc(&(skb_shinfo(skb)->dataref));
	skb->cloned = 1;

	return n;
#undef C
}

-- 
paul moore
linux security @ hp

^ permalink raw reply

* Re: [RFC PATCH] NET: Clone the sk_buff->iif field properly
From: David Miller @ 2008-01-03 23:05 UTC (permalink / raw)
  To: paul.moore; +Cc: jarkao2, hadi, netdev
In-Reply-To: <200801031620.06603.paul.moore@hp.com>

From: Paul Moore <paul.moore@hp.com>
Date: Thu, 3 Jan 2008 16:20:06 -0500

> On Thursday 03 January 2008 4:13:12 pm Jarek Poplawski wrote:
> > On Thu, Jan 03, 2008 at 11:15:34AM -0500, Paul Moore wrote:
> > ...
> >
> > > While I'm at it, is there some reason for this #define in
> > > __skb_clone()?
> > >
> > >  #define C(x) n->x = skb->x
> > >
> > > ... it seems kinda silly to me and I tend to think the code would
> > > be better without it.
> >
> > IMHO, if there are a lot of this, it's definitely more readable:
> > easier to check which values are simply copied and which need
> > something more. But, as usual, it's probably a question of taste, and
> > of course without it it would definitely look classier...
> 
> For me personally, I would argue the readability bit.

I definitely think the C() thing is more readable.

Less typing, less reading...


^ permalink raw reply

* [PATCH 1/1] r8169: fix missing loop variable increment
From: Francois Romieu @ 2008-01-03 22:38 UTC (permalink / raw)
  To: jeff
  Cc: netdev, Andrew Morton, David S. Miller,
	"Citizen Lee <citizen_lee

Spotted-by: Citizen Lee <citizen_lee@thecus.com>
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
---
 drivers/net/r8169.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 5863190..6ee9db1 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -2002,7 +2002,7 @@ static void rtl8169_set_magic_reg(void __iomem *ioaddr, unsigned mac_version)
 	u32 clk;
 
 	clk = RTL_R8(Config2) & PCI_Clock_66MHz;
-	for (i = 0; i < ARRAY_SIZE(cfg2_info); i++) {
+	for (i = 0; i < ARRAY_SIZE(cfg2_info); i++, p++) {
 		if ((p->mac_version == mac_version) && (p->clk == clk)) {
 			RTL_W32(0x7c, p->val);
 			break;
-- 
1.5.3.3


^ permalink raw reply related

* Re: [RFC PATCH] NET: Clone the sk_buff->iif field properly
From: Jarek Poplawski @ 2008-01-03 22:49 UTC (permalink / raw)
  To: Paul Moore; +Cc: hadi, netdev
In-Reply-To: <20080103220608.GB7258@ami.dom.local>

On Thu, Jan 03, 2008 at 11:06:08PM +0100, Jarek Poplawski wrote:
...
> I'm not macros fan in general: just yesterday I've cursed a bit at some
> guy (I forgot the name...), who gave all these "meaningful" names to
> macros in linux/pkt_cls.h. But, maybe after some time I'll start to
> defend them as well... Especially when I try to imagine doing the same
> without them.

...Jamal, as a matter of fact, I don't know why, just about writing this
previous message, I started to like all these names without exception!
Isn't it a miracle?

Jarek P.

^ permalink raw reply

* Re: [RFC PATCH] NET: Clone the sk_buff->iif field properly
From: Jarek Poplawski @ 2008-01-03 22:06 UTC (permalink / raw)
  To: Paul Moore; +Cc: hadi, netdev
In-Reply-To: <200801031620.06603.paul.moore@hp.com>

On Thu, Jan 03, 2008 at 04:20:06PM -0500, Paul Moore wrote:
...
> For me personally, I would argue the readability bit.  Whenever I see a 
> function/macro call I have to go find the function/macro definition 
> before I can understand what it is doing.  Granted, the macro is 
> defined "local" to the function but my point is that being able to look 
> at a line of code and understand it without having to look elsewhere is 
> a nice quality.  To loose that simply because someone wants to save a 
> few keystrokes is a mistake from my point of view.

When I first read this __skb_clone() I had mixed emotions about this
macro too. Later I didn't think about it, and only now, after your
question I've done a quick test, compared with __copy_skb_header() and
it seems there is really less rightwords eye moving. So, it was only
about this one very "local" macro.

I'm not macros fan in general: just yesterday I've cursed a bit at some
guy (I forgot the name...), who gave all these "meaningful" names to
macros in linux/pkt_cls.h. But, maybe after some time I'll start to
defend them as well... Especially when I try to imagine doing the same
without them.

Jarek P.

^ permalink raw reply

* Re: [RFC PATCH] NET: Clone the sk_buff->iif field properly
From: Paul Moore @ 2008-01-03 21:20 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: hadi, netdev
In-Reply-To: <20080103211312.GA7258@ami.dom.local>

On Thursday 03 January 2008 4:13:12 pm Jarek Poplawski wrote:
> On Thu, Jan 03, 2008 at 11:15:34AM -0500, Paul Moore wrote:
> ...
>
> > While I'm at it, is there some reason for this #define in
> > __skb_clone()?
> >
> >  #define C(x) n->x = skb->x
> >
> > ... it seems kinda silly to me and I tend to think the code would
> > be better without it.
>
> IMHO, if there are a lot of this, it's definitely more readable:
> easier to check which values are simply copied and which need
> something more. But, as usual, it's probably a question of taste, and
> of course without it it would definitely look classier...

For me personally, I would argue the readability bit.  Whenever I see a 
function/macro call I have to go find the function/macro definition 
before I can understand what it is doing.  Granted, the macro is 
defined "local" to the function but my point is that being able to look 
at a line of code and understand it without having to look elsewhere is 
a nice quality.  To loose that simply because someone wants to save a 
few keystrokes is a mistake from my point of view.

Besides, if we are really interested in writing a kernel with the least 
number of keystrokes possible wouldn't we be doing it in perl?  I'm 
sure somebody out there has ported the current kernel source to a 
single line of perl ... ;)

> PS: I hope you didn't suggest earlier my (better?) knowlege of git;
> otherwise don't bother: with your git push you are far ahead of my
> gitweb 'degree'.

 ;)

On a serious note, your comment about gitweb made me poke around with 
some of the extra little features ... that 'history' link for each file 
is pretty cool!

-- 
paul moore
linux security @ hp

^ permalink raw reply

* Re: [RFC PATCH] NET: Clone the sk_buff->iif field properly
From: Jarek Poplawski @ 2008-01-03 21:13 UTC (permalink / raw)
  To: Paul Moore; +Cc: hadi, netdev
In-Reply-To: <200801031115.34886.paul.moore@hp.com>

On Thu, Jan 03, 2008 at 11:15:34AM -0500, Paul Moore wrote:
...
> While I'm at it, is there some reason for this #define in __skb_clone()?
> 
>  #define C(x) n->x = skb->x
> 
> ... it seems kinda silly to me and I tend to think the code would be 
> better without it.

IMHO, if there are a lot of this, it's definitely more readable: easier
to check which values are simply copied and which need something more.
But, as usual, it's probably a question of taste, and of course without
it it would definitely look classier...

Cheers,
Jarek P.

PS: I hope you didn't suggest earlier my (better?) knowlege of git;
otherwise don't bother: with your git push you are far ahead of my
gitweb 'degree'.

^ permalink raw reply

* RE: [PATCH 2.6.23.12] net/bonding: option to specify initial bond interface number
From: Jari Takkala @ 2008-01-03 19:19 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev
In-Reply-To: <29661.1199379831@death>

On Thursday, January 03, 2008 12:04, Jay Vosburgh wrote:
>> In our startup scripts we need to be able to ensure that the
>> interface name is consistent across reboots. Sometimes bond1 may be
>> brought up before bond0 and it may have different options (requiring
>> a different instance of the bonding driver).
> 
> 	With the sysfs interface to bonding, your last statement is not
> true; any number of bonding interfaces, with arbitrary names, can be
> created and have their options set without loading multiple instances
> of the bonding driver.
> 
> 	Does your embedded system have sysfs available?  If it does,
> then it's to your advantage to use the sysfs API; for one thing, the
> single instance of the bonding driver with all interfaces through it
> should utilize fewer resources than loading the driver repeatedly.
> 

We do have sysfs available. I actually did not realize that it is possible to have one bonding instance and create multiple interfaces with different options. It looks like when I initially wrote this patch  (~2.6.20) either this feature was not available, or the bonding documentation may have been out of date. I see now in the latest Documentation/networking/bonding.txt that this is explained clearly.

Thank you for your help, I will drop the patch and move to using the sysfs method. Although I may need to patch it for our own uses to either not create bond0 on module load, or I may just run 'echo -bond0 > /sys/class/net/bonding_masters' immediately after loading the module.

Jari

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox