Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [patch 0/5][RFC] Update network drivers to use devres
From: Stephen Hemminger @ 2007-08-03  8:58 UTC (permalink / raw)
  To: Brandon Philips; +Cc: netdev, teheo
In-Reply-To: <20070802224206.GB5181@ifup.org>

On Thu, 2 Aug 2007 15:42:06 -0700
Brandon Philips <brandon@ifup.org> wrote:

> This patch set adds support for devres in the net core and converts the
> e100 and e1000 drivers to devres.  Devres is a simple resource manager
> for device drivers, see Documentation/driver-model/devres.txt for more
> information.
> 
> The use of devres will remain optional for drivers with this patch set.
> Drivers can be converted when it makes sense.

Just because devres exists is not sufficient motivation to change.

It seems that devres was a band-aid rather than fixing storage drivers
to have proper DMA lifetimes. 
Network devices seem to work fine thanks, and the resource requirements
are different. If ain't broke, don't fix it.

^ permalink raw reply

* [GIT PATCH] ucc_geth fixes for 2.6.22-rc1
From: Li Yang @ 2007-08-03  9:07 UTC (permalink / raw)
  To: jeff; +Cc: netdev, linuxppc-embedded

Please pull from 'ucc_geth' branch of
master.kernel.org:/pub/scm/linux/kernel/git/leo/fsl-soc.git ucc_geth

to receive the following fixes:

 drivers/net/ucc_geth_ethtool.c |    1 -
 drivers/net/ucc_geth_mii.c     |    3 ++-
 2 files changed, 2 insertions(+), 2 deletions(-)

Domen Puncer (1):
      ucc_geth: fix section mismatch

Jan Altenberg (1):
      ucc_geth: remove get_perm_addr from ucc_geth_ethtool.c


diff --git a/drivers/net/ucc_geth_ethtool.c
b/drivers/net/ucc_geth_ethtool.c
index a8994c7..64bef7c 100644
--- a/drivers/net/ucc_geth_ethtool.c
+++ b/drivers/net/ucc_geth_ethtool.c
@@ -379,7 +379,6 @@ static const struct ethtool_ops uec_ethtool_ops = {
 	.get_stats_count        = uec_get_stats_count,
 	.get_strings            = uec_get_strings,
 	.get_ethtool_stats      = uec_get_ethtool_stats,
-	.get_perm_addr          = ethtool_op_get_perm_addr,
 };
 
 void uec_set_ethtool_ops(struct net_device *netdev)
diff --git a/drivers/net/ucc_geth_mii.c b/drivers/net/ucc_geth_mii.c
index 5f8c2d3..6c257b8 100644
--- a/drivers/net/ucc_geth_mii.c
+++ b/drivers/net/ucc_geth_mii.c
@@ -272,7 +272,8 @@ int __init uec_mdio_init(void)
 	return of_register_platform_driver(&uec_mdio_driver);
 }
 
-void __exit uec_mdio_exit(void)
+/* called from __init ucc_geth_init, therefore can not be __exit */
+void uec_mdio_exit(void)
 {
 	of_unregister_platform_driver(&uec_mdio_driver);
 }


^ permalink raw reply related

* Re: [patch] genirq: fix simple and fasteoi irq handlers
From: Ingo Molnar @ 2007-08-03  8:46 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Gabriel C, Linus Torvalds, Thomas Gleixner, Jean-Baptiste Vignaud,
	linux-kernel, shemminger, linux-net, netdev, Andrew Morton,
	Alan Cox, marcin.slusarz
In-Reply-To: <20070803080408.GA12222@elte.hu>


* Ingo Molnar <mingo@elte.hu> wrote:

> * Jarek Poplawski <jarkao2@o2.pl> wrote:
> 
> > I can't guarantee this is all needed to fix this bug, but I think 
> > this patch is necessary here.
> 
> hmmm ... very interesting! Now _this_ is something we'd like to see 
> tested. Could you send a patch to Marcin that also undoes the 
> workaround we have in place now, so that he could check whether 
> ne2k-pci works fine with your fix alone?

or it would be nice if Marcin could test pure 2.6.22 plus your fix 
(without any other patches applied).

	Ingo

^ permalink raw reply

* Re: strange tcp behavior
From: Evgeniy Polyakov @ 2007-08-03  8:25 UTC (permalink / raw)
  To: Simon Arlott; +Cc: john, netdev, David Miller
In-Reply-To: <46B2293B.3020406@simon.arlott.org.uk>

On Thu, Aug 02, 2007 at 07:58:03PM +0100, Simon Arlott (simon@fire.lp0.eu) wrote:
> 19:24:32.897071 IP 192.168.7.4.50000 > 192.168.7.8.2500: S 705362199:705362199(0) win 1500
> 19:24:32.897211 IP 192.168.7.8.2500 > 192.168.7.4.50000: S 4159455228:4159455228(0) ack 705362200 win 14360 <mss 7180>
> 19:24:32.920784 IP 192.168.7.4.50000 > 192.168.7.8.2500: . ack 1 win 1500
> 19:24:32.921732 IP 192.168.7.4.50000 > 192.168.7.8.2500: P 1:17(16) ack 1 win 1500
> 19:24:32.921795 IP 192.168.7.8.2500 > 192.168.7.4.50000: . ack 17 win 14360
> 19:24:32.922881 IP 192.168.7.4.50000 > 192.168.7.8.2500: R 705362216:705362216(0) win 1500
> 19:24:34.927717 IP 192.168.7.8.2500 > 192.168.7.4.50000: R 1:1(0) ack 17 win 14360
> 
> According to RFC 793, the RST from .4 means that the connection 
> is CLOSED.

RFC 2525 - common tcp problems, says we should send RST in this case,
although it does not specify should we send it if socket is in CLOSED
state or not. Well, we send :)
Even if tcp_send_active_reset() will check if socket is in CLOSED state
and will not send data, but is still there, it will not be easily
triggered though, but it can be possible.

-- 
	Evgeniy Polyakov

^ permalink raw reply

* Re: strange tcp behavior
From: Evgeniy Polyakov @ 2007-08-03  8:22 UTC (permalink / raw)
  To: David Miller; +Cc: simon, john, netdev
In-Reply-To: <20070802.192134.107254907.davem@davemloft.net>

On Thu, Aug 02, 2007 at 07:21:34PM -0700, David Miller (davem@davemloft.net) wrote:
> > On Thu, Aug 02, 2007 at 10:08:42PM +0400, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> > > So, following patch fixes problem for me.
> > 
> > Or this one. Essentially the same though.
> > 
> > Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
> 
> So, this bug got introduced partly in 2.3.15, which is when
> we SMP threaded the networking stack.
> 
> The error check was present in inet_sendmsg() previously, it
> looked like this:
> 
> int inet_sendmsg(struct socket *sock, struct msghdr *msg, int size,
> 		 struct scm_cookie *scm)
> {
> 	struct sock *sk = sock->sk;
> 
> 	if (sk->shutdown & SEND_SHUTDOWN) {
> 		if (!(msg->msg_flags&MSG_NOSIGNAL))
> 			send_sig(SIGPIPE, current, 1);
> 		return(-EPIPE);
> 	}

This one would caught our problem.

> 	if (sk->prot->sendmsg == NULL) 
> 		return(-EOPNOTSUPP);
> 	if(sk->err)
> 		return sock_error(sk);

And this one too.

> 	/* We may need to bind the socket. */
> 	if (inet_autobind(sk) != 0)
> 		return -EAGAIN;
> 
> 	return sk->prot->sendmsg(sk, msg, size);
> }
> 
> I believe the idea was to move the sk->err check down into
> tcp_sendmsg().
> 
> But this raises a major issue.
> 
> What in the world are we doing allowing stream sockets to autobind?
> That is totally bogus.  Even if we autobind, that won't make a connect
> happen.

For accepted socket it is perfectly valid assumption - we could autobind 
it during the first send. Or may bind it during accept. Its a matter of
taste I think. Autobinding during first sending can end up being a 
protection against DoS in some obscure rare case...

> There is logic down in TCP to handle all of these details properly
> as long as we don't do this bogus autobind stuff.

Yes, TCP sending function will catch this problems.

> do_tcp_sendpages() and tcp_sendmsg() both invoke sk_stream_wait_connect()
> if TCP is in a state where data sending is not possible.  Inside of
> sk_stream_wait_connect() it handles socket errors as first priority,
> then if no socket errors are pending it checks if we are trying to
> connect currently and if not returns -EPIPE.  It is exactly what we
> want under these circumstances.
> 
> So the bug is purely that autobind is attempted for TCP sockets at
> all.
> 
> TCP's sendpage handles this correctly already, it calls directly down
> into tcp_sendpage(), inet_sendpage() is not used at all.
> 
> So the fix is to make tcp_sendmsg() direct as well, that bypasses all
> of this autobind madness.  The error checking and state verification
> in TCP's sendmsg() and sendpage() implementations will do the right
> thing.
> 
> Comments?
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index c209361..185c7ec 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -281,7 +281,7 @@ extern int			tcp_v4_remember_stamp(struct sock *sk);
>  
>  extern int		    	tcp_v4_tw_remember_stamp(struct inet_timewait_sock *tw);
>  
> -extern int			tcp_sendmsg(struct kiocb *iocb, struct sock *sk,
> +extern int			tcp_sendmsg(struct kiocb *iocb, struct socket *sock,
>  					    struct msghdr *msg, size_t size);

Maybe recvmsg should be changed too for symmetry?

-- 
	Evgeniy Polyakov

^ permalink raw reply

* Re: [patch] genirq: fix simple and fasteoi irq handlers
From: Ingo Molnar @ 2007-08-03  8:04 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Gabriel C, Linus Torvalds, Thomas Gleixner, Jean-Baptiste Vignaud,
	linux-kernel, shemminger, linux-net, netdev, Andrew Morton,
	Alan Cox, marcin.slusarz
In-Reply-To: <20070803060733.GA1582@ff.dom.local>

* Jarek Poplawski <jarkao2@o2.pl> wrote:

> I can't guarantee this is all needed to fix this bug, but I think this 
> patch is necessary here.

hmmm ... very interesting! Now _this_ is something we'd like to see 
tested. Could you send a patch to Marcin that also undoes the workaround 
we have in place now, so that he could check whether ne2k-pci works fine 
with your fix alone?

	Ingo

^ permalink raw reply

* Re: NETPOLL=y , NETDEVICES=n compile error ( Re: 2.6.23-rc1-mm1 )
From: Jarek Poplawski @ 2007-08-03  7:30 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Gabriel C, Andrew Morton, linux-kernel, netdev, jason.wessel,
	amitkale, Sam Ravnborg
In-Reply-To: <20070802155922.GQ11115@waste.org>

On Thu, Aug 02, 2007 at 10:59:23AM -0500, Matt Mackall wrote:
> On Thu, Aug 02, 2007 at 11:00:08AM +0200, Jarek Poplawski wrote:
> > On Wed, Aug 01, 2007 at 09:02:19PM -0500, Matt Mackall wrote:
...
> > > How about cc:ing the netpoll maintainer?
> > 
> > Is there a new one or do you suggest possibility of abusing the
> > authority of the netpoll's author with such trifles...?!
> 
> I'm just subtly suggesting that if you're going to have a discussion
> about netpoll, you ought to cc: me.

Thanks! I'm very honored. I've suspected there is some subtlety, but
wasn't sure of possible new patches to MAINTAINERS, so tried to be
subtle too...

> 
> > There are some notions about "other diagnostic tools" in some
> > net drivers, eg. 3c509.c, so there would be a little bit of
> > work if, after changing this, they really exist (and even if
> > not - maybe it's reasonable to save such possibility for the
> > future?).
> 
> I created it for netpoll, only netpoll clients have ever cared.

So, probably you're the best person to change this! Alas, it seems,
for some time any changes to netpoll could have a cold reception
here (pity for Ingo's laptop...).

Regards,
Jarek P.

^ permalink raw reply

* [patch] genirq: fix simple and fasteoi irq handlers
From: Jarek Poplawski @ 2007-08-03  6:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gabriel C, Linus Torvalds, Thomas Gleixner, Jean-Baptiste Vignaud,
	linux-kernel, shemminger, linux-net, netdev, Andrew Morton,
	Alan Cox, marcin.slusarz
In-Reply-To: <20070802201126.GA27000@elte.hu>

On Thu, Aug 02, 2007 at 10:11:26PM +0200, Ingo Molnar wrote:
> 
> * Gabriel C <nix.or.die@googlemail.com> wrote:
> 
> > I get a warning on each boot now with this patch ..
> > 
> > [   63.686613] WARNING: at kernel/irq/resend.c:70 check_irq_resend()
...
> we are still trying to figure out what happens with ne2k-pci. The 
> message will vanish soon.

Hi,

I can't guarantee this is all needed to fix this bug, but I think
this patch is necessary here.

Regards,
Jarek P.
------------>

Subject: genirq: fix simple and fasteoi irq handlers

After the "genirq: do not mask interrupts by default" patch interrupts
should be disabled not immediately upon request, but after they happen.
But, handle_simple_irq() and handle_fasteoi_irq() can skip this once or
more if an irq is just serviced (IRQ_INPROGRESS), possibly disrupting a
driver's work.

The main reason of problems here, pointing the broken patch and making
the first patch which can fix this was done by Marcin Slusarz.
Additional test patches of Thomas Gleixner and Ingo Molnar tested by
Marcin Slusarz helped to narrow possible reasons even more. Thanks.

PS: this patch fixes only one evident error here, but there could be
more places affected by above-mentioned change in irq handling.


Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>

---

diff -Nurp 2.6.23-rc1-/kernel/irq/chip.c 2.6.23-rc1/kernel/irq/chip.c
--- 2.6.23-rc1-/kernel/irq/chip.c	2007-07-09 01:32:17.000000000 +0200
+++ 2.6.23-rc1/kernel/irq/chip.c	2007-08-02 20:42:38.000000000 +0200
@@ -295,12 +295,11 @@ handle_simple_irq(unsigned int irq, stru
 
 	spin_lock(&desc->lock);
 
-	if (unlikely(desc->status & IRQ_INPROGRESS))
-		goto out_unlock;
 	kstat_cpu(cpu).irqs[irq]++;
 
 	action = desc->action;
-	if (unlikely(!action || (desc->status & IRQ_DISABLED))) {
+	if (unlikely(!action || (desc->status & (IRQ_INPROGRESS |
+						 IRQ_DISABLED)))) {
 		if (desc->chip->mask)
 			desc->chip->mask(irq);
 		desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
@@ -392,18 +391,16 @@ handle_fasteoi_irq(unsigned int irq, str
 
 	spin_lock(&desc->lock);
 
-	if (unlikely(desc->status & IRQ_INPROGRESS))
-		goto out;
-
 	desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
 	kstat_cpu(cpu).irqs[irq]++;
 
 	/*
-	 * If its disabled or no action available
+	 * If it's running, disabled or no action available
 	 * then mask it and get out of here:
 	 */
 	action = desc->action;
-	if (unlikely(!action || (desc->status & IRQ_DISABLED))) {
+	if (unlikely(!action || (desc->status & (IRQ_INPROGRESS |
+						 IRQ_DISABLED)))) {
 		desc->status |= IRQ_PENDING;
 		if (desc->chip->mask)
 			desc->chip->mask(irq);

^ permalink raw reply

* Re: Distributed storage.
From: Manu Abraham @ 2007-08-03  5:04 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: netdev, linux-kernel, linux-fsdevel
In-Reply-To: <20070731171347.GA14267@2ka.mipt.ru>

On 7/31/07, Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> TODO list currently includes following main items:
>     * redundancy algorithm (drop me a request of your own, but it is highly
>         unlikley that Reed-Solomon based will ever be used - it is too slow
>         for distributed RAID, I consider WEAVER codes)

LDPC codes[1][2] have been replacing Turbo code[3] with regards to
communication links and we have been seeing that transition. (maybe
helpful, came to mind seeing the mention of Turbo code) Don't know how
weaver compares to LDPC, though found some comparisons [4][5] But
looking at fault tolerance figures, i guess Weaver is much better.

[1] http://www.ldpc-codes.com/
[2] http://portal.acm.org/citation.cfm?id=1240497
[3] http://en.wikipedia.org/wiki/Turbo_code
[4] http://domino.research.ibm.com/library/cyberdig.nsf/papers/BD559022A190D41C85257212006CEC11/$File/rj10391.pdf
[5] http://hplabs.hp.com/personal/Jay_Wylie/publications/wylie_dsn2007.pdf

^ permalink raw reply

* Re: Distributed storage.
From: Mike Snitzer @ 2007-08-03  4:09 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: netdev, linux-kernel, linux-fsdevel, Daniel Phillips
In-Reply-To: <20070731171347.GA14267@2ka.mipt.ru>

On 7/31/07, Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> Hi.
>
> I'm pleased to announce first release of the distributed storage
> subsystem, which allows to form a storage on top of remote and local
> nodes, which in turn can be exported to another storage as a node to
> form tree-like storages.

Very interesting work, I read through your blog for the project and it
is amazing how quickly you developed/tested this code.  Thanks for
capturing the evolution of DST like you have.

> Compared to other similar approaches namely iSCSI and NBD,
> there are following advantages:
>     * non-blocking processing without busy loops (compared to both above)
>     * small, plugable architecture
>     * failover recovery (reconnect to remote target)
>     * autoconfiguration (full absence in NBD and/or device mapper on top of it)
>     * no additional allocatins (not including network part) - at least two in
>         device mapper for fast path
>     * very simple - try to compare with iSCSI
>     * works with different network protocols
>     * storage can be formed on top of remote nodes and be exported
>         simultaneously (iSCSI is peer-to-peer only, NBD requires device
>         mapper and is synchronous)

Having the in-kernel export is a great improvement over NBD's
userspace nbd-server (extra copy, etc).

But NBD's synchronous nature is actually an asset when coupled with MD
raid1 as it provides guarantees that the data has _really_ been
mirrored remotely.

> TODO list currently includes following main items:
>     * redundancy algorithm (drop me a request of your own, but it is highly
>         unlikley that Reed-Solomon based will ever be used - it is too slow
>         for distributed RAID, I consider WEAVER codes)

I'd like to better understand where you see DST heading in the area of
redundancy.    Based on your blog entries:
http://tservice.net.ru/~s0mbre/blog/devel/dst/2007_07_24_1.html
http://tservice.net.ru/~s0mbre/blog/devel/dst/2007_07_31_2.html
(and your todo above) implementing a mirroring algorithm appears to be
a near-term goal for you.  Can you comment on how your intended
implementation would compare, in terms of correctness and efficiency,
to say MD (raid1) + NBD?  MD raid1 has a write intent bitmap that is
useful to speed resyncs; what if any mechanisms do you see DST
embracing to provide similar and/or better reconstruction
infrastructure?  Do you intend to embrace any exisiting MD or DM
infrastructure?

BTW, you have definitely published some very compelling work and its
sad that you're predisposed to think DST won't be recieved well if you
pushed for inclusion (for others, as much was said in the 7.31.2007
blog post I referenced above).  Clearly others need to embrace DST to
help inclusion become a reality.  To that end, its great to see that
Daniel Phillips and the other zumastor folks will be putting DST
through its paces.

regards,
Mike

^ permalink raw reply

* Re: [PATCH 1/2] [TCP]: Also handle snd_una changes in tcp_cwnd_down
From: David Miller @ 2007-08-03  2:46 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev
In-Reply-To: <Pine.LNX.4.64.0708021412380.8788@kivilampi-30.cs.helsinki.fi>

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 2 Aug 2007 14:18:59 +0300 (EEST)

> Dave, please put these two patches to net-2.6 to complete bidir fix 
> series. ...And please push to stable as well, take just the minimized 
> "fix" portion of this "[TCP]: Also handle snd_una changes in 
> tcp_cwnd_down" patch as I described above. Other cleanups in it can be
> put just to net-2.6.

Ok, I will, thanks Ilpo!

^ permalink raw reply

* Re: [GIT PULL] sctp updates
From: David Miller @ 2007-08-03  2:44 UTC (permalink / raw)
  To: vyasevich; +Cc: netdev, lksctp-developers
In-Reply-To: <46B21A95.80103@verizon.net>

From: Vlad Yasevich <vyasevich@verizon.net>
Date: Thu, 02 Aug 2007 13:55:33 -0400

> Hi David
> 
> Please pull the following changes since commit fc34f6c617bf2a845d793af12b96bcc0afd472c4:
>  Andrew Morton (1):
>        Fix up "remove the arm26 port"
> 
> which are found in branch 'master' of the git repository at:
> 
>  master.kernel.org:/pub/scm/linux/kernel/git/vxy/lksctp-dev.git

Pulled, thanks a lot Vlad.

^ permalink raw reply

* Re: [PATCH] TIPC: fix two minor sparse warnings
From: David Miller @ 2007-08-03  2:28 UTC (permalink / raw)
  To: fw; +Cc: netdev, per.liden, jon.maloy, allan.stephens
In-Reply-To: <20070802225756.GA14999@Chamillionaire.breakpoint.cc>

From: Florian Westphal <fw@strlen.de>
Date: Fri, 3 Aug 2007 00:57:56 +0200

> fix two warnings generated by sparse:
> 
> link.c:2386 symbol 'msgcount' shadows an earlier one
> node.c:244 symbol 'addr_string' shadows an earlier one
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>

Looks good, applied, thanks Florian.

^ permalink raw reply

* Re: [PATCH] TIPC: make function tipc_nameseq_subscribe static
From: David Miller @ 2007-08-03  2:26 UTC (permalink / raw)
  To: fw; +Cc: netdev, per.liden, jon.maloy, allan.stephens
In-Reply-To: <20070802225638.GA14966@Chamillionaire.breakpoint.cc>

From: Florian Westphal <fw@strlen.de>
Date: Fri, 3 Aug 2007 00:56:38 +0200

> make needlessly global function tipc_nameseq_subscribe static.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>

Patch applied, thanks Florian.

^ permalink raw reply

* Re: strange tcp behavior
From: David Miller @ 2007-08-03  2:21 UTC (permalink / raw)
  To: johnpol; +Cc: simon, john, netdev
In-Reply-To: <20070802184840.GA8901@2ka.mipt.ru>

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Thu, 2 Aug 2007 22:48:42 +0400

> On Thu, Aug 02, 2007 at 10:08:42PM +0400, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> > So, following patch fixes problem for me.
> 
> Or this one. Essentially the same though.
> 
> Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>

So, this bug got introduced partly in 2.3.15, which is when
we SMP threaded the networking stack.

The error check was present in inet_sendmsg() previously, it
looked like this:

int inet_sendmsg(struct socket *sock, struct msghdr *msg, int size,
		 struct scm_cookie *scm)
{
	struct sock *sk = sock->sk;

	if (sk->shutdown & SEND_SHUTDOWN) {
		if (!(msg->msg_flags&MSG_NOSIGNAL))
			send_sig(SIGPIPE, current, 1);
		return(-EPIPE);
	}
	if (sk->prot->sendmsg == NULL) 
		return(-EOPNOTSUPP);
	if(sk->err)
		return sock_error(sk);

	/* We may need to bind the socket. */
	if (inet_autobind(sk) != 0)
		return -EAGAIN;

	return sk->prot->sendmsg(sk, msg, size);
}

I believe the idea was to move the sk->err check down into
tcp_sendmsg().

But this raises a major issue.

What in the world are we doing allowing stream sockets to autobind?
That is totally bogus.  Even if we autobind, that won't make a connect
happen.

There is logic down in TCP to handle all of these details properly
as long as we don't do this bogus autobind stuff.

do_tcp_sendpages() and tcp_sendmsg() both invoke sk_stream_wait_connect()
if TCP is in a state where data sending is not possible.  Inside of
sk_stream_wait_connect() it handles socket errors as first priority,
then if no socket errors are pending it checks if we are trying to
connect currently and if not returns -EPIPE.  It is exactly what we
want under these circumstances.

So the bug is purely that autobind is attempted for TCP sockets at
all.

TCP's sendpage handles this correctly already, it calls directly down
into tcp_sendpage(), inet_sendpage() is not used at all.

So the fix is to make tcp_sendmsg() direct as well, that bypasses all
of this autobind madness.  The error checking and state verification
in TCP's sendmsg() and sendpage() implementations will do the right
thing.

Comments?

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/net/tcp.h b/include/net/tcp.h
index c209361..185c7ec 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -281,7 +281,7 @@ extern int			tcp_v4_remember_stamp(struct sock *sk);
 
 extern int		    	tcp_v4_tw_remember_stamp(struct inet_timewait_sock *tw);
 
-extern int			tcp_sendmsg(struct kiocb *iocb, struct sock *sk,
+extern int			tcp_sendmsg(struct kiocb *iocb, struct socket *sock,
 					    struct msghdr *msg, size_t size);
 extern ssize_t			tcp_sendpage(struct socket *sock, struct page *page, int offset, size_t size, int flags);
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 06c08e5..e681034 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -831,7 +831,7 @@ const struct proto_ops inet_stream_ops = {
 	.shutdown	   = inet_shutdown,
 	.setsockopt	   = sock_common_setsockopt,
 	.getsockopt	   = sock_common_getsockopt,
-	.sendmsg	   = inet_sendmsg,
+	.sendmsg	   = tcp_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = tcp_sendpage,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index da4c0b6..7e74011 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -658,9 +658,10 @@ static inline int select_size(struct sock *sk)
 	return tmp;
 }
 
-int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
+int tcp_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 		size_t size)
 {
+	struct sock *sk = sock->sk;
 	struct iovec *iov;
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 3f5f742..9c94627 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2425,7 +2425,6 @@ struct proto tcp_prot = {
 	.shutdown		= tcp_shutdown,
 	.setsockopt		= tcp_setsockopt,
 	.getsockopt		= tcp_getsockopt,
-	.sendmsg		= tcp_sendmsg,
 	.recvmsg		= tcp_recvmsg,
 	.backlog_rcv		= tcp_v4_do_rcv,
 	.hash			= tcp_v4_hash,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index eed0937..b5f9637 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -484,7 +484,7 @@ const struct proto_ops inet6_stream_ops = {
 	.shutdown	   = inet_shutdown,		/* ok		*/
 	.setsockopt	   = sock_common_setsockopt,	/* ok		*/
 	.getsockopt	   = sock_common_getsockopt,	/* ok		*/
-	.sendmsg	   = inet_sendmsg,		/* ok		*/
+	.sendmsg	   = tcp_sendmsg,		/* ok		*/
 	.recvmsg	   = sock_common_recvmsg,	/* ok		*/
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = tcp_sendpage,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index f10f368..cbdb784 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -2115,7 +2115,6 @@ struct proto tcpv6_prot = {
 	.shutdown		= tcp_shutdown,
 	.setsockopt		= tcp_setsockopt,
 	.getsockopt		= tcp_getsockopt,
-	.sendmsg		= tcp_sendmsg,
 	.recvmsg		= tcp_recvmsg,
 	.backlog_rcv		= tcp_v6_do_rcv,
 	.hash			= tcp_v6_hash,

^ permalink raw reply related

* Re: [PATCH] ethtool: Add support for setting multiple rx/tx queues
From: Kok, Auke @ 2007-08-03  0:18 UTC (permalink / raw)
  To: netdev; +Cc: Auke Kok, jgarzik, davem, Nunley, Nicholas D
In-Reply-To: <20070731202156.15813.98302.stgit@localhost.localdomain>

Auke Kok wrote:
> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
> ---
> 
> @@ -496,6 +516,14 @@ static void parse_cmdline(int argc, char **argp)
>  				i = argc;
>  				break;
>  			}
> +			if (mode == MODE_SQUEUE) {
> +				parse_generic_cmdline(argc, argp, i,
> +					&gqueue_changed,
> +					cmdline_ring,

Nick pointed out the obvious typo here... I'll wait for some (more) comments 
before reposting.

Auke

^ permalink raw reply

* Re: TCP SACK issue, hung connection, tcpdump included
From: Ilpo Järvinen @ 2007-08-02 23:51 UTC (permalink / raw)
  To: Darryl Miles; +Cc: Netdev
In-Reply-To: <46B20D48.7060704@netbauds.net>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5007 bytes --]

...I dropped lkml, it's useless to bother them with this network related 
stuff...

On Thu, 2 Aug 2007, Darryl Miles wrote:
> Ilpo Järvinen wrote:
> > On Tue, 31 Jul 2007, Darryl L. Miles wrote:

[...RFC bashing, snip...] 

>  * The older linux kernel for not being 100% SACK RFC compliant in its
> implementation ?  Not a lot we can do about this now, but if we're able
> to identify there maybe backward compatibility issues with the same
> implementation thats a useful point to take forward.
>
>  * The newer linux kernel for enabling D-SACK by default when RFC2883
> doesn't even claim a cast iron case for D-SACK to be compatible with any
> 100% RFC compliant SACK implementation.

Are you aware that D-SACK processing and generation has been part of the 
linux kernel TCP far before 2.6 series even begun... ...and it goes far 
beoynd that, 2.4.0 had it too (2.2.0 didn't seem to have it, never before 
have I read that one IIRC :-) ).

> Does Ilpo have a particular vested interest in D-SACK that should be 
> disclosed?

Sure :-). ...my interest was to show you that it's not a bug :-).

> So it is necessary to turn off a TCP option (that is enabled by default)
> to be sure to have reliable TCP connections (that don't lock up) in the
> bugfree Linux networking stack ?  This is absurd.

...You'll have to turn a lot off to be compatible with everything around 
Internet, and still you would probably fail. Some people have to, e.g., to 
turn of window scaling to work-around buggy intermediate nodes (nat boxes 
or some firewalls), there's even a sysctl to workaround signed 16-bit 
window arithmetic bugs that's mostly legacy but I bet you can find host 
broken in that area too. Etc. Yet we don't off those by default.

> If such an option causes such a problem; then that option should not be
> enabled by default. 

...Linux TCP has enabled by default option which are _known_ (at least 
nowadays) to cause bad problems and many of them are _still_ enabled... 
Browse archives if you don't believe me... And I'm relatively sure it will 
do so also in future though I'm not the maintainer nor "anybody" to make 
such decisions...

> rather than wallpaper over the cracks with the voodoo of turning things 
> that are enabled by default off.

...I said that because it felt like you kept repeating that the generated 
DSACK block is a bug even though, like you now know, it's a feature, not a 
bug. :-)

> > 2) The ACK got discarded by the SERVER
> 
> I'd thought about that one, its a possibility.  The server in question
> does have period of high UDP load on a fair number of UDP sockets at
> once (a few 1000).  Both systems have 2Gb of RAM.  The server has maybe
> just 250Mb of RSS of all apps combined.

...There are three independent signs in the log to indicate discard out
of these 3 reasons. Whereas your theory _fails_ to explain some behavior 
in the log you presented, e.g., not updated timestamp which happen even 
_before_ the DSACK stuff?!?... I'll formulate this question: why didn't 
snd_una advance nor timestamp update though a cumulative ACK arrived?
You can check for yourself (in server log):

03:58:56.384503
03:58:56.462583
03:58:56.465707
03:58:56.678546

...I'm hoping SNMPs provide explanation to it.

> The client sent a SACK.  But from understanding more about D-SACK, this
> is a valid D-SACK response, but it appears to confuse the older Linux
> kernel at the server end.

...Are you saying that it's confused by _DSACK_ just because it's only 
"strange" thing you seem to find from the log? I see other things in your 
log which are exceptional and point to elsewhere... Please don't neglect 
them... ...Problems occur already before that DSACK is received by the 
server end.

> Agreed on this.  However discarding data is allowed (providing it is
> intentional discarding not a bug where the TCP stack is discarding segments it
> shouldn't), TCP should recover providing sufficient packets get through.

But if one end decides to discard everything after time t, TCP _will
not_ recover because "sufficient packets" won't "get through"... And 
that's what your log is telling me. Yes discarding is allowed but that 
wasn't the point, we're more interested here on why it got discarded
rather than allowance of discarding.

> Forgive me if I am mistaken, but while the server reports a checksum
> error, the client did not.  I took this to be a misreporting by tcpdump
> at the server, probably due to the "e1000" network card checksum
> offloading

...That's probably the reason, I agree, these show up. Thought that also 
myself, besides, it wouldn't cause that kind of breakage anyway.

> So the SNMP data would show up intentional discards (due to memory/resource
> issues).  So I'll get some of those too.
> 
> The SNMP stats aren't so useful right now as
> the box has been rebooted since then but I shall attempt to capture
> /proc/net/* data, cause the problem, then capture /proc/net/* data again
> if those numbers can help.

Good, thanks. 

-- 
 i.

^ permalink raw reply

* [PATCH] TIPC: fix two minor sparse warnings
From: Florian Westphal @ 2007-08-02 22:57 UTC (permalink / raw)
  To: netdev; +Cc: per.liden, jon.maloy, allan.stephens

fix two warnings generated by sparse:

link.c:2386 symbol 'msgcount' shadows an earlier one
node.c:244 symbol 'addr_string' shadows an earlier one

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/tipc/link.c |    2 +-
 net/tipc/node.c |    2 --
 2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 1d674e0..1b17fec 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -2383,10 +2383,10 @@ void tipc_link_changeover(struct link *l_ptr)
 		struct tipc_msg *msg = buf_msg(crs);
 
 		if ((msg_user(msg) == MSG_BUNDLER) && split_bundles) {
-			u32 msgcount = msg_msgcnt(msg);
 			struct tipc_msg *m = msg_get_wrapped(msg);
 			unchar* pos = (unchar*)m;
 
+			msgcount = msg_msgcnt(msg);
 			while (msgcount--) {
 				msg_set_seqno(m,msg_seqno(msg));
 				tipc_link_tunnel(l_ptr, &tunnel_hdr, m,
diff --git a/net/tipc/node.c b/net/tipc/node.c
index e2e452a..598f4d3 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -241,8 +241,6 @@ struct node *tipc_node_attach_link(struct link *l_ptr)
 		char addr_string[16];
 
 		if (n_ptr->link_cnt >= 2) {
-			char addr_string[16];
-
 			err("Attempt to create third link to %s\n",
 			    addr_string_fill(addr_string, n_ptr->addr));
 			return NULL;

^ permalink raw reply related

* [PATCH] TIPC: make function tipc_nameseq_subscribe static
From: Florian Westphal @ 2007-08-02 22:56 UTC (permalink / raw)
  To: netdev; +Cc: per.liden, jon.maloy, allan.stephens

make needlessly global function tipc_nameseq_subscribe static.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index d8473ee..ac7dfdd 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -501,7 +501,7 @@ end_node:
  * sequence overlapping with the requested sequence
  */
 
-void tipc_nameseq_subscribe(struct name_seq *nseq, struct subscription *s)
+static void tipc_nameseq_subscribe(struct name_seq *nseq, struct subscription *s)
 {
 	struct sub_seq *sseq = nseq->sseqs;
 


^ permalink raw reply related

* [patch 5/5][RFC] Update e1000 driver to use devres.
From: Brandon Philips @ 2007-08-02 22:45 UTC (permalink / raw)
  To: netdev; +Cc: teheo, Brandon Philips

[-- Attachment #1: e1000-devres.patch --]
[-- Type: text/plain, Size: 5523 bytes --]

Conversion of e1000 probe() and remove() to devres.

Signed-off-by: Brandon Philips <bphilips@suse.de>
---
 drivers/net/e1000/e1000.h      |    1 
 drivers/net/e1000/e1000_main.c |   79 ++++++++++++-----------------------------
 2 files changed, 26 insertions(+), 54 deletions(-)

Index: linux-2.6/drivers/net/e1000/e1000_main.c
===================================================================
--- linux-2.6.orig/drivers/net/e1000/e1000_main.c
+++ linux-2.6/drivers/net/e1000/e1000_main.c
@@ -868,7 +868,7 @@ e1000_probe(struct pci_dev *pdev,
 	int i, err, pci_using_dac;
 	uint16_t eeprom_data = 0;
 	uint16_t eeprom_apme_mask = E1000_EEPROM_APME;
-	if ((err = pci_enable_device(pdev)))
+	if ((err = pcim_enable_device(pdev)))
 		return err;
 
 	if (!(err = pci_set_dma_mask(pdev, DMA_64BIT_MASK)) &&
@@ -884,14 +884,14 @@ e1000_probe(struct pci_dev *pdev,
 	}
 
 	if ((err = pci_request_regions(pdev, e1000_driver_name)))
-		goto err_pci_reg;
+		goto err_dma;
 
 	pci_set_master(pdev);
 
 	err = -ENOMEM;
-	netdev = alloc_etherdev(sizeof(struct e1000_adapter));
+	netdev = devm_alloc_etherdev(&pdev->dev, sizeof(struct e1000_adapter));
 	if (!netdev)
-		goto err_alloc_etherdev;
+		goto err_dma;
 
 	SET_MODULE_OWNER(netdev);
 	SET_NETDEV_DEV(netdev, &pdev->dev);
@@ -907,9 +907,9 @@ e1000_probe(struct pci_dev *pdev,
 	mmio_len = pci_resource_len(pdev, BAR_0);
 
 	err = -EIO;
-	adapter->hw.hw_addr = ioremap(mmio_start, mmio_len);
+	adapter->hw.hw_addr = devm_ioremap(&pdev->dev, mmio_start, mmio_len);
 	if (!adapter->hw.hw_addr)
-		goto err_ioremap;
+		goto err_dma;
 
 	for (i = BAR_1; i <= BAR_5; i++) {
 		if (pci_resource_len(pdev, i) == 0)
@@ -952,7 +952,7 @@ e1000_probe(struct pci_dev *pdev,
 	/* setup the private structure */
 
 	if ((err = e1000_sw_init(adapter)))
-		goto err_sw_init;
+		goto err_dma;
 
 	err = -EIO;
 	/* Flash BAR mapping must happen after e1000_sw_init
@@ -961,7 +961,9 @@ e1000_probe(struct pci_dev *pdev,
 	   (pci_resource_flags(pdev, 1) & IORESOURCE_MEM)) {
 		flash_start = pci_resource_start(pdev, 1);
 		flash_len = pci_resource_len(pdev, 1);
-		adapter->hw.flash_address = ioremap(flash_start, flash_len);
+		adapter->hw.flash_address = devm_ioremap(&pdev->dev,
+							flash_start,
+							flash_len);
 		if (!adapter->hw.flash_address)
 			goto err_flashmap;
 	}
@@ -1163,27 +1165,11 @@ err_register:
 err_eeprom:
 	if (!e1000_check_phy_reset_block(&adapter->hw))
 		e1000_phy_hw_reset(&adapter->hw);
-
-	if (adapter->hw.flash_address)
-		iounmap(adapter->hw.flash_address);
 err_flashmap:
 #ifdef CONFIG_E1000_NAPI
 	for (i = 0; i < adapter->num_rx_queues; i++)
 		dev_put(&adapter->polling_netdev[i]);
 #endif
-
-	kfree(adapter->tx_ring);
-	kfree(adapter->rx_ring);
-#ifdef CONFIG_E1000_NAPI
-	kfree(adapter->polling_netdev);
-#endif
-err_sw_init:
-	iounmap(adapter->hw.hw_addr);
-err_ioremap:
-	free_netdev(netdev);
-err_alloc_etherdev:
-	pci_release_regions(pdev);
-err_pci_reg:
 err_dma:
 	pci_disable_device(pdev);
 	return err;
@@ -1224,21 +1210,6 @@ e1000_remove(struct pci_dev *pdev)
 
 	if (!e1000_check_phy_reset_block(&adapter->hw))
 		e1000_phy_hw_reset(&adapter->hw);
-
-	kfree(adapter->tx_ring);
-	kfree(adapter->rx_ring);
-#ifdef CONFIG_E1000_NAPI
-	kfree(adapter->polling_netdev);
-#endif
-
-	iounmap(adapter->hw.hw_addr);
-	if (adapter->hw.flash_address)
-		iounmap(adapter->hw.flash_address);
-	pci_release_regions(pdev);
-
-	free_netdev(netdev);
-
-	pci_disable_device(pdev);
 }
 
 /**
@@ -1350,27 +1321,27 @@ e1000_sw_init(struct e1000_adapter *adap
 static int __devinit
 e1000_alloc_queues(struct e1000_adapter *adapter)
 {
-	adapter->tx_ring = kcalloc(adapter->num_tx_queues,
-	                           sizeof(struct e1000_tx_ring), GFP_KERNEL);
+	adapter->tx_ring = devm_kcalloc(&adapter->pdev->dev,
+					adapter->num_tx_queues,
+					sizeof(struct e1000_tx_ring),
+					GFP_KERNEL);
 	if (!adapter->tx_ring)
 		return -ENOMEM;
 
-	adapter->rx_ring = kcalloc(adapter->num_rx_queues,
-	                           sizeof(struct e1000_rx_ring), GFP_KERNEL);
-	if (!adapter->rx_ring) {
-		kfree(adapter->tx_ring);
+	adapter->rx_ring = devm_kcalloc(&adapter->pdev->dev,
+					adapter->num_rx_queues,
+					sizeof(struct e1000_rx_ring),
+					GFP_KERNEL);
+	if (!adapter->rx_ring)
 		return -ENOMEM;
-	}
 
 #ifdef CONFIG_E1000_NAPI
-	adapter->polling_netdev = kcalloc(adapter->num_rx_queues,
-	                                  sizeof(struct net_device),
-	                                  GFP_KERNEL);
-	if (!adapter->polling_netdev) {
-		kfree(adapter->tx_ring);
-		kfree(adapter->rx_ring);
+	adapter->polling_netdev = devm_kcalloc(&adapter->pdev->dev,
+					       adapter->num_rx_queues,
+					       sizeof(struct net_device),
+					       GFP_KERNEL);
+	if (!adapter->polling_netdev)
 		return -ENOMEM;
-	}
 #endif
 
 	return E1000_SUCCESS;
@@ -5174,7 +5145,7 @@ e1000_resume(struct pci_dev *pdev)
 
 	pci_set_power_state(pdev, PCI_D0);
 	pci_restore_state(pdev);
-	if ((err = pci_enable_device(pdev))) {
+	if ((err = pcim_enable_device(pdev))) {
 		printk(KERN_ERR "e1000: Cannot enable PCI device from suspend\n");
 		return err;
 	}
Index: linux-2.6/drivers/net/e1000/e1000.h
===================================================================
--- linux-2.6.orig/drivers/net/e1000/e1000.h
+++ linux-2.6/drivers/net/e1000/e1000.h
@@ -41,6 +41,7 @@
 #include <linux/errno.h>
 #include <linux/ioport.h>
 #include <linux/pci.h>
+#include <linux/io.h>
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>

-- 

^ permalink raw reply

* [patch 4/5][RFC] Implement devm_kcalloc
From: Brandon Philips @ 2007-08-02 22:45 UTC (permalink / raw)
  To: netdev; +Cc: teheo, Brandon Philips

[-- Attachment #1: kcalloc-devres.patch --]
[-- Type: text/plain, Size: 1711 bytes --]

devm_kcalloc is a simple wrapper around devm_kzalloc for arrays.  This is
needed because kcalloc is often used in network devices. 

Signed-off-by: Brandon Philips <bphilips@suse.de>

---
 drivers/base/devres.c  |   16 ++++++++++++++++
 include/linux/device.h |    1 +
 2 files changed, 17 insertions(+)

Index: linux-2.6/drivers/base/devres.c
===================================================================
--- linux-2.6.orig/drivers/base/devres.c
+++ linux-2.6/drivers/base/devres.c
@@ -630,6 +630,22 @@ void * devm_kzalloc(struct device *dev, 
 EXPORT_SYMBOL_GPL(devm_kzalloc);
 
 /**
+ * devm_kcalloc - resource-managed kcalloc
+ * @dev: Device to allocate memory for
+ * @n: number of elements.
+ * @size: element size.
+ * @flags: the type of memory to allocate.
+ */
+inline void * devm_kcalloc(struct device * dev, size_t n, size_t size,
+			   gfp_t flags)
+{
+        if (n != 0 && size > ULONG_MAX / n)
+                return NULL;
+        return devm_kzalloc(dev, n * size, flags);
+}
+EXPORT_SYMBOL_GPL(devm_kcalloc);
+
+/**
  * devm_kfree - Resource-managed kfree
  * @dev: Device this memory belongs to
  * @p: Memory to free
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -402,6 +402,7 @@ extern int devres_release_group(struct d
 
 /* managed kzalloc/kfree for device drivers, no kmalloc, always use kzalloc */
 extern void *devm_kzalloc(struct device *dev, size_t size, gfp_t gfp);
+extern void *devm_kcalloc(struct device *dev, size_t n, size_t size, gfp_t flags);
 extern void devm_kfree(struct device *dev, void *p);
 
 struct device {

-- 

^ permalink raw reply

* [patch 3/5][RFC] Update e100 driver to use devres.
From: Brandon Philips @ 2007-08-02 22:45 UTC (permalink / raw)
  To: netdev; +Cc: teheo, Brandon Philips

[-- Attachment #1: e100-devres.patch --]
[-- Type: text/plain, Size: 5340 bytes --]

devres manages device resources and is currently used by all libata low level
drivers.   It can greatly reduce the complexity of the error handling on probe
and the device removal functions.

For example the e100_free() function and all of the gotos in e100_probe have
been removed.  Also, e100_remove() has been deleted and replaced with a much
simpler netdev_pci_remove_one() function that can handle PCI net devices that
don't require teardown besides resource deallocation.

Signed-off-by: Brandon Philips <bphilips@suse.de>

---
 drivers/net/e100.c |   70 ++++++++++++-----------------------------------------
 1 file changed, 17 insertions(+), 53 deletions(-)

Index: linux-2.6/drivers/net/e100.c
===================================================================
--- linux-2.6.orig/drivers/net/e100.c
+++ linux-2.6/drivers/net/e100.c
@@ -2517,18 +2517,11 @@ static int e100_do_ioctl(struct net_devi
 
 static int e100_alloc(struct nic *nic)
 {
-	nic->mem = pci_alloc_consistent(nic->pdev, sizeof(struct mem),
-		&nic->dma_addr);
-	return nic->mem ? 0 : -ENOMEM;
-}
+	struct device *dev = &nic->pdev->dev;
 
-static void e100_free(struct nic *nic)
-{
-	if(nic->mem) {
-		pci_free_consistent(nic->pdev, sizeof(struct mem),
-			nic->mem, nic->dma_addr);
-		nic->mem = NULL;
-	}
+	nic->mem = dmam_alloc_coherent(dev, sizeof(struct mem),
+		&nic->dma_addr, GFP_ATOMIC);
+	return nic->mem ? 0 : -ENOMEM;
 }
 
 static int e100_open(struct net_device *netdev)
@@ -2555,7 +2548,7 @@ static int __devinit e100_probe(struct p
 	struct nic *nic;
 	int err;
 
-	if(!(netdev = alloc_etherdev(sizeof(struct nic)))) {
+	if (!(netdev = devm_alloc_etherdev(&pdev->dev, sizeof(struct nic)))) {
 		if(((1 << debug) - 1) & NETIF_MSG_PROBE)
 			printk(KERN_ERR PFX "Etherdev alloc failed, abort.\n");
 		return -ENOMEM;
@@ -2585,26 +2578,26 @@ static int __devinit e100_probe(struct p
 	nic->msg_enable = (1 << debug) - 1;
 	pci_set_drvdata(pdev, netdev);
 
-	if((err = pci_enable_device(pdev))) {
+	if ((err = pcim_enable_device(pdev))) {
 		DPRINTK(PROBE, ERR, "Cannot enable PCI device, aborting.\n");
-		goto err_out_free_dev;
+		return err;
 	}
 
 	if(!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM)) {
 		DPRINTK(PROBE, ERR, "Cannot find proper PCI device "
 			"base address, aborting.\n");
 		err = -ENODEV;
-		goto err_out_disable_pdev;
+		return err;
 	}
 
 	if((err = pci_request_regions(pdev, DRV_NAME))) {
 		DPRINTK(PROBE, ERR, "Cannot obtain PCI resources, aborting.\n");
-		goto err_out_disable_pdev;
+		return err;
 	}
 
 	if((err = pci_set_dma_mask(pdev, DMA_32BIT_MASK))) {
 		DPRINTK(PROBE, ERR, "No usable DMA configuration, aborting.\n");
-		goto err_out_free_res;
+		return err;
 	}
 
 	SET_MODULE_OWNER(netdev);
@@ -2613,11 +2606,11 @@ static int __devinit e100_probe(struct p
 	if (use_io)
 		DPRINTK(PROBE, INFO, "using i/o access mode\n");
 
-	nic->csr = pci_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr));
+	nic->csr = pcim_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr));
 	if(!nic->csr) {
 		DPRINTK(PROBE, ERR, "Cannot map device registers, aborting.\n");
 		err = -ENOMEM;
-		goto err_out_free_res;
+		return err;
 	}
 
 	if(ent->driver_data)
@@ -2650,11 +2643,11 @@ static int __devinit e100_probe(struct p
 
 	if((err = e100_alloc(nic))) {
 		DPRINTK(PROBE, ERR, "Cannot alloc driver memory, aborting.\n");
-		goto err_out_iounmap;
+		return err;
 	}
 
 	if((err = e100_eeprom_load(nic)))
-		goto err_out_free;
+		return err;
 
 	e100_phy_init(nic);
 
@@ -2664,8 +2657,7 @@ static int __devinit e100_probe(struct p
 		if (!eeprom_bad_csum_allow) {
 			DPRINTK(PROBE, ERR, "Invalid MAC address from "
 			        "EEPROM, aborting.\n");
-			err = -EAGAIN;
-			goto err_out_free;
+			return -EAGAIN;
 		} else {
 			DPRINTK(PROBE, ERR, "Invalid MAC address from EEPROM, "
 			        "you MUST configure one.\n");
@@ -2685,7 +2677,7 @@ static int __devinit e100_probe(struct p
 	strcpy(netdev->name, "eth%d");
 	if((err = register_netdev(netdev))) {
 		DPRINTK(PROBE, ERR, "Cannot register net device, aborting.\n");
-		goto err_out_free;
+		return err;
 	}
 
 	DPRINTK(PROBE, INFO, "addr 0x%llx, irq %d, "
@@ -2695,36 +2687,8 @@ static int __devinit e100_probe(struct p
 		netdev->dev_addr[3], netdev->dev_addr[4], netdev->dev_addr[5]);
 
 	return 0;
-
-err_out_free:
-	e100_free(nic);
-err_out_iounmap:
-	pci_iounmap(pdev, nic->csr);
-err_out_free_res:
-	pci_release_regions(pdev);
-err_out_disable_pdev:
-	pci_disable_device(pdev);
-err_out_free_dev:
-	pci_set_drvdata(pdev, NULL);
-	free_netdev(netdev);
-	return err;
 }
 
-static void __devexit e100_remove(struct pci_dev *pdev)
-{
-	struct net_device *netdev = pci_get_drvdata(pdev);
-
-	if(netdev) {
-		struct nic *nic = netdev_priv(netdev);
-		unregister_netdev(netdev);
-		e100_free(nic);
-		iounmap(nic->csr);
-		free_netdev(netdev);
-		pci_release_regions(pdev);
-		pci_disable_device(pdev);
-		pci_set_drvdata(pdev, NULL);
-	}
-}
 
 #ifdef CONFIG_PM
 static int e100_suspend(struct pci_dev *pdev, pm_message_t state)
@@ -2875,7 +2839,7 @@ static struct pci_driver e100_driver = {
 	.name =         DRV_NAME,
 	.id_table =     e100_id_table,
 	.probe =        e100_probe,
-	.remove =       __devexit_p(e100_remove),
+	.remove =       __devexit_p(netdev_pci_remove_one),
 #ifdef CONFIG_PM
 	/* Power Management hooks */
 	.suspend =      e100_suspend,

-- 

^ permalink raw reply

* [patch 2/5][RFC] Update net core to use devres.
From: Brandon Philips @ 2007-08-02 22:45 UTC (permalink / raw)
  To: netdev; +Cc: teheo, Brandon Philips

[-- Attachment #1: ether-core-devres.patch --]
[-- Type: text/plain, Size: 7821 bytes --]

* netdev_pci_remove_one() can replace simple pci device remove
  functions

* devm_alloc_netdev() is like alloc_netdev but allocates memory using devres.

Signed-off-by: Brandon Philips <bphilips@suse.de>

---
 include/linux/etherdevice.h |    5 ++
 include/linux/netdevice.h   |    7 ++
 net/core/dev.c              |  109 +++++++++++++++++++++++++++++++++++++++-----
 net/ethernet/eth.c          |    8 +++
 4 files changed, 119 insertions(+), 10 deletions(-)

Index: linux-2.6/include/linux/netdevice.h
===================================================================
--- linux-2.6.orig/include/linux/netdevice.h
+++ linux-2.6/include/linux/netdevice.h
@@ -656,6 +656,7 @@ extern int		dev_queue_xmit(struct sk_buf
 extern int		register_netdevice(struct net_device *dev);
 extern void		unregister_netdevice(struct net_device *dev);
 extern void		free_netdev(struct net_device *dev);
+extern void		netdev_pci_remove_one(struct pci_dev *pdev);
 extern void		synchronize_net(void);
 extern int 		register_netdevice_notifier(struct notifier_block *nb);
 extern int		unregister_netdevice_notifier(struct notifier_block *nb);
@@ -1085,8 +1086,14 @@ extern void		ether_setup(struct net_devi
 extern struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 				       void (*setup)(struct net_device *),
 				       unsigned int queue_count);
+extern struct net_device *devm_alloc_netdev_mq(struct device *dev,
+				       int sizeof_priv, const char *name,
+				       void (*setup)(struct net_device *),
+				       unsigned int queue_count);
 #define alloc_netdev(sizeof_priv, name, setup) \
 	alloc_netdev_mq(sizeof_priv, name, setup, 1)
+#define devm_alloc_netdev(dev, sizeof_priv, name, setup) \
+	devm_alloc_netdev_mq(dev, sizeof_priv, name, setup, 1)
 extern int		register_netdev(struct net_device *dev);
 extern void		unregister_netdev(struct net_device *dev);
 /* Functions used for secondary unicast and multicast support */
Index: linux-2.6/net/core/dev.c
===================================================================
--- linux-2.6.orig/net/core/dev.c
+++ linux-2.6/net/core/dev.c
@@ -89,6 +89,7 @@
 #include <linux/interrupt.h>
 #include <linux/if_ether.h>
 #include <linux/netdevice.h>
+#include <linux/pci.h>
 #include <linux/etherdevice.h>
 #include <linux/notifier.h>
 #include <linux/skbuff.h>
@@ -3658,18 +3659,51 @@ static struct net_device_stats *internal
 }
 
 /**
- *	alloc_netdev_mq - allocate network device
- *	@sizeof_priv:	size of private data to allocate space for
- *	@name:		device name format string
- *	@setup:		callback to initialize device
- *	@queue_count:	the number of subqueues to allocate
+ * devm_free_netdev - wrapper around free_netdev for devres
+ */
+static void devm_free_netdev(struct device *gendev, void *res)
+{
+	struct net_device *dev = dev_get_drvdata(gendev);
+	free_netdev(dev);
+}
+
+/**
+ * register_netdev_devres - register netdev with a managed device
+ * @dev:	devres managed device responsible for the memory
+ * @netdev:		pointer to netdev to be managed
  *
- *	Allocates a struct net_device with private data area for driver use
- *	and performs basic initialization.  Also allocates subquue structs
- *	for each queue on the device at the end of the netdevice.
+ * Registers @netdev to the device @dev and calls free_netdev automatically when the
+ * device disappears
  */
-struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
-		void (*setup)(struct net_device *), unsigned int queue_count)
+static inline void * register_netdev_devres(struct device *gendev,
+						struct net_device *dev)
+{
+	struct net_device **p;
+
+	/* 0 size because we don't need it. The net_device is already alloc'd
+	 * in alloc_netdev_mq.  We can't use devm_kzalloc in alloc_netdeev_mq
+	 * because a net_device cannot be free'd directly as it can be a
+	 * kobject.  See free_netdev.
+	 */
+	p = devres_alloc(devm_free_netdev, 0, GFP_KERNEL);
+
+	if (unlikely(!p))
+		return NULL;
+
+	*p = dev;
+	devres_add(gendev, p);
+
+	return dev;
+}
+
+/**
+ *	__alloc_netdev_mq - does the work to allocate a network device
+ *	@dev:		devres managed device responsible for mem.
+ *			NULL if unmanaged
+ */
+struct net_device *__alloc_netdev_mq(struct device *gendev, int sizeof_priv,
+		const char *name, void (*setup)(struct net_device *),
+		unsigned int queue_count)
 {
 	void *p;
 	struct net_device *dev;
@@ -3706,8 +3740,43 @@ struct net_device *alloc_netdev_mq(int s
 	dev->get_stats = internal_stats;
 	setup(dev);
 	strcpy(dev->name, name);
+
+	/* If we are given a device then manage this netdev with devres */
+	if (gendev != NULL)
+		return register_netdev_devres(gendev, dev);
+
 	return dev;
 }
+
+/**
+ *	alloc_netdev_mq - alloc_netdev_mq for devres managed devices
+ *	@dev:		devres managed device responsible for mem.
+ */
+struct net_device *devm_alloc_netdev_mq(struct device *dev, int sizeof_priv, const
+				char *name, void (*setup)(struct net_device *),
+				unsigned int queue_count)
+{
+	return __alloc_netdev_mq(dev, sizeof_priv, name, setup, queue_count);
+}
+EXPORT_SYMBOL(devm_alloc_netdev_mq);
+
+/**
+ *	alloc_netdev_mq - allocate network device
+ *	@sizeof_priv:	size of private data to allocate space for
+ *	@name:		device name format string
+ *	@setup:		callback to initialize device
+ *	@queue_count:	the number of subqueues to allocate
+ *
+ *	Allocates a struct net_device with private data area for driver use
+ *	and performs basic initialization.  Also allocates subquue structs
+ *	for each queue on the device at the end of the netdevice.
+ */
+struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
+				void (*setup)(struct net_device *),
+				unsigned int queue_count)
+{
+	return __alloc_netdev_mq(NULL, sizeof_priv, name, setup, queue_count);
+}
 EXPORT_SYMBOL(alloc_netdev_mq);
 
 /**
@@ -3737,6 +3806,26 @@ void free_netdev(struct net_device *dev)
 #endif
 }
 
+#ifdef CONFIG_PCI
+/**
+ * netdev_pci_remove_one - free network device
+ * @pdev: pci_dev of the device to remove
+ *
+ * Simple remove function for pci network devices with no teardown besides
+ * resource deallocation.
+ */
+void netdev_pci_remove_one(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	if (netdev) {
+		unregister_netdev(netdev);
+		pci_set_drvdata(pdev, NULL);
+	}
+}
+EXPORT_SYMBOL(netdev_pci_remove_one);
+#endif
+
+
 /* Synchronize with packet receive processing. */
 void synchronize_net(void)
 {
Index: linux-2.6/include/linux/etherdevice.h
===================================================================
--- linux-2.6.orig/include/linux/etherdevice.h
+++ linux-2.6/include/linux/etherdevice.h
@@ -40,7 +40,12 @@ extern int		eth_header_cache(struct neig
 					 struct hh_cache *hh);
 
 extern struct net_device *alloc_etherdev_mq(int sizeof_priv, unsigned int queue_count);
+extern struct net_device *devm_alloc_etherdev_mq(struct device *dev,
+					      int sizeof_priv,
+					      unsigned int queue_count);
 #define alloc_etherdev(sizeof_priv) alloc_etherdev_mq(sizeof_priv, 1)
+#define devm_alloc_etherdev(dev, sizeof_priv) \
+	devm_alloc_etherdev_mq(dev, sizeof_priv, 1)
 
 /**
  * is_zero_ether_addr - Determine if give Ethernet address is all zeros.
Index: linux-2.6/net/ethernet/eth.c
===================================================================
--- linux-2.6.orig/net/ethernet/eth.c
+++ linux-2.6/net/ethernet/eth.c
@@ -337,3 +337,11 @@ struct net_device *alloc_etherdev_mq(int
 	return alloc_netdev_mq(sizeof_priv, "eth%d", ether_setup, queue_count);
 }
 EXPORT_SYMBOL(alloc_etherdev_mq);
+
+struct net_device *devm_alloc_etherdev_mq(struct device *dev, int sizeof_priv,
+						unsigned int queue_count)
+{
+	return devm_alloc_netdev_mq(dev, sizeof_priv, "eth%d", ether_setup,
+				    queue_count);
+}
+EXPORT_SYMBOL(devm_alloc_etherdev_mq);

-- 

^ permalink raw reply

* [patch 1/5][RFC] NET: Change pci_enable_device to pci_reenable_device to keep device enable balance
From: Brandon Philips @ 2007-08-02 22:44 UTC (permalink / raw)
  To: netdev; +Cc: teheo, Brandon Philips

[-- Attachment #1: net-pci-reenable-on-slot-reset.patch --]
[-- Type: text/plain, Size: 2552 bytes --]

On a slot_reset event pci_disable_device() is never called so calling
pci_enable_device() will unbalance the enable count.

Signed-off-by: Brandon Philips <bphilips@suse.de>

---
 drivers/net/e100.c             |    2 +-
 drivers/net/e1000/e1000_main.c |    2 +-
 drivers/net/ixgb/ixgb_main.c   |    2 +-
 drivers/net/s2io.c             |    2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/net/e100.c
===================================================================
--- linux-2.6.orig/drivers/net/e100.c
+++ linux-2.6/drivers/net/e100.c
@@ -2828,7 +2828,7 @@ static pci_ers_result_t e100_io_slot_res
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct nic *nic = netdev_priv(netdev);
 
-	if (pci_enable_device(pdev)) {
+	if (pci_reenable_device(pdev)) {
 		printk(KERN_ERR "e100: Cannot re-enable PCI device after reset.\n");
 		return PCI_ERS_RESULT_DISCONNECT;
 	}
Index: linux-2.6/drivers/net/e1000/e1000_main.c
===================================================================
--- linux-2.6.orig/drivers/net/e1000/e1000_main.c
+++ linux-2.6/drivers/net/e1000/e1000_main.c
@@ -5270,7 +5270,7 @@ static pci_ers_result_t e1000_io_slot_re
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct e1000_adapter *adapter = netdev->priv;
 
-	if (pci_enable_device(pdev)) {
+	if (pci_reenable_device(pdev)) {
 		printk(KERN_ERR "e1000: Cannot re-enable PCI device after reset.\n");
 		return PCI_ERS_RESULT_DISCONNECT;
 	}
Index: linux-2.6/drivers/net/ixgb/ixgb_main.c
===================================================================
--- linux-2.6.orig/drivers/net/ixgb/ixgb_main.c
+++ linux-2.6/drivers/net/ixgb/ixgb_main.c
@@ -2294,7 +2294,7 @@ static pci_ers_result_t ixgb_io_slot_res
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct ixgb_adapter *adapter = netdev_priv(netdev);
 
-	if(pci_enable_device(pdev)) {
+	if(pci_reenable_device(pdev)) {
 		DPRINTK(PROBE, ERR, "Cannot re-enable PCI device after reset.\n");
 		return PCI_ERS_RESULT_DISCONNECT;
 	}
Index: linux-2.6/drivers/net/s2io.c
===================================================================
--- linux-2.6.orig/drivers/net/s2io.c
+++ linux-2.6/drivers/net/s2io.c
@@ -7833,7 +7833,7 @@ static pci_ers_result_t s2io_io_slot_res
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct s2io_nic *sp = netdev->priv;
 
-	if (pci_enable_device(pdev)) {
+	if (pci_reenable_device(pdev)) {
 		printk(KERN_ERR "s2io: "
 		       "Cannot re-enable PCI device after reset.\n");
 		return PCI_ERS_RESULT_DISCONNECT;

-- 

^ permalink raw reply

* Re: [REGRESSION] tg3 dead after s2ram
From: Michael Chan @ 2007-08-02 23:38 UTC (permalink / raw)
  To: David Miller
  Cc: joachim.deguara, akpm, linux-kernel, michal.k.k.piotrowski,
	netdev, linux-acpi
In-Reply-To: <20070802.150636.77057800.davem@davemloft.net>

On Thu, 2007-08-02 at 15:06 -0700, David Miller wrote:
> From: "Michael Chan" <mchan@broadcom.com>
> Date: Thu, 02 Aug 2007 12:10:29 -0700
>
> > Alternatively, we can also fix it by calling pci_enable_device() again
> > in tg3_open().  But I think it is better to just always save and restore
> > in suspend/resume.  bnx2.c will also require the same fix.
> 
> We could do it that way.  But don't you think it's more reliable to
> save and restore around the event we know will be what clobbers the
> PCI config space on us? :-)
> 

Yes for sure when netif state is running and we were already doing that.

> Other things might happen between ->resume() and ->open() that could
> modify PCI config space, and we could overwrite such changes if we do
> the PCI restore in ->open().

I suggested calling pci_enable_device() in ->open(), not calling
pci_restore_state() in ->open().  I ultimately decided against it
because some devices do not enable memory as a workaround and it would
be messy to deal with it again in tg3_open().

I definitely agree that calling PCI restore in ->open() is a bad idea.
We used to save PCI state in ->probe() once and restore PCI state after
every chip reset.  This sequence caused many subtle problems.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox