Netdev List

* RE: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: Caitlin Bestler @ 2006-04-26 22:53 UTC
  To: David S. Miller, jeff; +Cc: kelly, netdev, rusty

David S. Miller wrote:
> From: Jeff Garzik <jeff@garzik.org>
> Date: Wed, 26 Apr 2006 15:46:58 -0400
> 
>> Oh, there are plenty of examples of filtering within an established
>> connection:  input rules.  I've seen "drop all packets from <these>
>> IPs" type rules frequently.  Victims of DoS use those kinds of rules
>> to stop packets as early as possible.
> 
> Yes, good point, but this applies to listening connections.
> 
> We'll need to figure out a way to deal with this.
> 
> It occurs to me that for established connections, netfilter
> can simply remove all matching entries from the netchannel lookup
> tables. 
> 
> But that still leaves the thorny listening socket issue.
> This may by itself make netfilter netchannel support
> important and that brings up a lot of issues about classifier
> algorithms. 
> 
> All of this I wanted to avoid as we start this work :-)
> 
> We can think about how to approach these other problems and
> start with something simple meanwhile.  That seems to me to
> be the best approach moving forward.
> 
> It's important to start really simple else we'll just keep
> getting bogged down in complexity and details and never
> implement anything.

How does this sound?

The netchannel qualifiers should only deal with TCP packets
for established connections. Listens would continue to be 
dealt with by the existing stack logic, vj_channelizing
only occurring when the connection was accepted.

The vj_netchannel qualifiers would conceptually take place
before the netfilter rules (to avoid making deployment
of netchannels dependent on netfilter) but their creation
would have to be approved by netfilter (if netfilter was
active). Netfilter could also revoke vj_channel qualifiers.

The rule that "if a vj_netchannel rule exists then it
must be ok with netfilter" is actually very easy to implement.
During early development you simply tell the testers "hey,
don't set up any netchannels that netfilter would reject"
and defer implementing enforcement until after the netchannels
code actually works. After all, if it isn't actually successfully
transmitting or receiving packets yet it can't really be acting
contrary to netfilter policy.
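
As a rough illustration of the approve-then-revoke rule described above,
here is a minimal userspace sketch in C. Everything in it is an assumption
made for illustration: the vj_qualifier structure, the vj_approve hook and
the example policy are hypothetical names, not part of the posted patches
or of netfilter itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical qualifier: the 4-tuple of one established TCP connection. */
struct vj_qualifier {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	bool in_use;
};

/* Hypothetical approval hook. NULL stands for "no filter active". */
typedef bool (*vj_approve_fn)(const struct vj_qualifier *q);

#define VJ_MAX_QUALIFIERS 16

static struct vj_qualifier vj_table[VJ_MAX_QUALIFIERS];
static vj_approve_fn vj_approve;

/* Example policy for this sketch only: reject one source address. */
static uint32_t vj_blocked_saddr;

static bool vj_example_policy(const struct vj_qualifier *q)
{
	return q->saddr != vj_blocked_saddr;
}

static bool vj_same_flow(const struct vj_qualifier *a,
			 const struct vj_qualifier *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* Create a qualifier only if the filter approves it.
 * Returns the slot index, or -1 if rejected or the table is full. */
static int vj_qualifier_add(const struct vj_qualifier *q)
{
	if (vj_approve && !vj_approve(q))
		return -1;
	for (size_t i = 0; i < VJ_MAX_QUALIFIERS; i++) {
		if (!vj_table[i].in_use) {
			vj_table[i] = *q;
			vj_table[i].in_use = true;
			return (int)i;
		}
	}
	return -1;
}

/* True if an active qualifier matches the packet's 4-tuple. */
static bool vj_qualifier_match(const struct vj_qualifier *pkt)
{
	for (size_t i = 0; i < VJ_MAX_QUALIFIERS; i++)
		if (vj_table[i].in_use && vj_same_flow(&vj_table[i], pkt))
			return true;
	return false;
}

/* Revocation: after a rule change, drop every qualifier the filter
 * no longer approves. Returns the number of qualifiers removed. */
static int vj_qualifier_revalidate(void)
{
	int removed = 0;

	if (!vj_approve)
		return 0;
	for (size_t i = 0; i < VJ_MAX_QUALIFIERS; i++) {
		if (vj_table[i].in_use && !vj_approve(&vj_table[i])) {
			vj_table[i].in_use = false;
			removed++;
		}
	}
	return removed;
}
```

In this sketch the filter is consulted only at creation time and on rule
changes, so a per-packet match never has to call into it. Whether that is
acceptable for packet counters is a separate question the sketch does not
address.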

* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: David S. Miller @ 2006-04-26 22:59 UTC
  To: caitlinb; +Cc: jeff, kelly, netdev, rusty
In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F143AEC1@NT-SJCA-0751.brcm.ad.broadcom.com>

From: "Caitlin Bestler" <caitlinb@broadcom.com>
Date: Wed, 26 Apr 2006 15:53:44 -0700

> The netchannel qualifiers should only deal with TCP packets
> for established connections. Listens would continue to be 
> dealt with by the existing stack logic, vj_channelizing
> only occurring when the connection was accepted.

I consider netchannel support for listening TCP sockets
to be absolutely essential.

* Re: tune back idle cwnd closing?
From: Zach Brown @ 2006-04-26 23:25 UTC
  To: David S. Miller; +Cc: jheffner, netdev
In-Reply-To: <20060426.144540.39973302.davem@davemloft.net>


>> Given that RFC2861 is Experimental (and I'm not aware of any current
>> efforts in the IETF to push it to the standard track), IMHO it would not
>> be inappropriate to make this behavior controlled via sysctl.
> 
> I have to respectfully disagree.

OK, thanks for taking the time to look at it.

- z

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
From: Shaw @ 2006-04-27  0:14 UTC
  To: Andy Gospodarek
  Cc: Michael Chan, Herbert Xu, netdev, auke-jan.h.kok, davem, jgarzik
In-Reply-To: <bdfc5d6e0604211346n50b15f56g4ebc2fe5fe88a63a@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1255 bytes --]

On 4/21/06, Andy Gospodarek <andy@greyhouse.net> wrote:
> On 4/21/06, Michael Chan <mchan@broadcom.com> wrote:
> > On Fri, 2006-04-21 at 16:01 -0400, Andy Gospodarek wrote:
> >
> > > I just hate to see extra resources used to solve problems that good
> > > coding can solve (not that my suggestion is necessarily a 'good' one),
> > > so I was trying to think of a way to resolve this without explicitly
> > > adding another workqueue.
> >
> > If you don't want to add another workqueue, then look at tg3, bnx2, and
> > one of the smc drivers on how to effectively wait for the driver's
> > workqueue task to finish without deadlocking with linkwatch_event.
> >
>
> I agree 100%.  I just hope others can manage to figure that out too.

Ok, here's another attempt.  The goal here is to serialize attempts to
clean the tx and rx buffers, and ensure that e1000_close is called
after the tx_timeout_task has completed running and/or that the task
is safe to run after e1000_close has run.

I'm concerned about the addition of the netif_running check to
e1000_down.  While something like this is needed, I'm not familiar
enough w/ the code to know if this is okay.
All explanations and comments are greatly appreciated.

Thanks,
Shaw

[-- Attachment #2: e1000.patch --]
[-- Type: text/x-patch, Size: 2736 bytes --]

diff -u -uprN -X linux-2.6.16.11/Documentation/dontdiff linux-2.6.16.11/drivers/net/e1000/e1000.h linux-2.6.16.11.e1000_patch/drivers/net/e1000/e1000.h
--- linux-2.6.16.11/drivers/net/e1000/e1000.h	2006-04-24 13:20:24.000000000 -0700
+++ linux-2.6.16.11.e1000_patch/drivers/net/e1000/e1000.h	2006-04-26 16:23:46.475842000 -0700
@@ -358,5 +358,8 @@ struct e1000_adapter {
 #ifdef CONFIG_PCI_MSI
 	boolean_t have_msi;
 #endif
+	uint32_t flags;
+#define E1000_CLEANING 0x00000001
+	spinlock_t clean_lock;
 };
 #endif /* _E1000_H_ */
diff -u -uprN -X linux-2.6.16.11/Documentation/dontdiff linux-2.6.16.11/drivers/net/e1000/e1000_main.c linux-2.6.16.11.e1000_patch/drivers/net/e1000/e1000_main.c
--- linux-2.6.16.11/drivers/net/e1000/e1000_main.c	2006-04-24 13:20:24.000000000 -0700
+++ linux-2.6.16.11.e1000_patch/drivers/net/e1000/e1000_main.c	2006-04-26 16:59:48.742905000 -0700
@@ -525,6 +525,16 @@ e1000_down(struct e1000_adapter *adapter
 	boolean_t mng_mode_enabled = (adapter->hw.mac_type >= e1000_82571) &&
 				     e1000_check_mng_mode(&adapter->hw);
 
+	spin_lock_bh(&adapter->clean_lock);
+	adapter->flags |= E1000_CLEANING;
+
+	if (!netif_running(netdev)) {
+	    adapter->flags &= ~E1000_CLEANING;
+	    spin_unlock_bh(&adapter->clean_lock);
+	    return;
+	}
+	spin_unlock_bh(&adapter->clean_lock);
+
 	e1000_irq_disable(adapter);
 #ifdef CONFIG_E1000_MQ
 	while (atomic_read(&adapter->rx_sched_call_data.count) != 0);
@@ -549,8 +559,12 @@ e1000_down(struct e1000_adapter *adapter
 	netif_stop_queue(netdev);
 
 	e1000_reset(adapter);
+
+	spin_lock_bh(&adapter->clean_lock);
 	e1000_clean_all_tx_rings(adapter);
 	e1000_clean_all_rx_rings(adapter);
+	adapter->flags &= ~E1000_CLEANING;
+	spin_unlock_bh(&adapter->clean_lock);
 
 	/* Power down the PHY so no link is implied when interface is down *
 	 * The PHY cannot be powered down if any of the following is TRUE *
@@ -1109,6 +1123,8 @@ e1000_sw_init(struct e1000_adapter *adap
 
 	atomic_set(&adapter->irq_sem, 1);
 	spin_lock_init(&adapter->stats_lock);
+	spin_lock_init(&adapter->clean_lock);
+	adapter->flags = 0;
 
 	return 0;
 }
@@ -1269,10 +1285,18 @@ e1000_close(struct net_device *netdev)
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 
+	/* Calling flush_scheduled_work() may deadlock because
+	 * linkwatch_event() may be on the workqueue and it will 
+	 * try to get the rtnl_lock which we are holding. */
+	while (adapter->flags & E1000_CLEANING) 
+	    msleep(1);
+
 	e1000_down(adapter);
 
+	spin_lock_bh(&adapter->clean_lock);
 	e1000_free_all_tx_resources(adapter);
 	e1000_free_all_rx_resources(adapter);
+	spin_unlock_bh(&adapter->clean_lock);
 
 	if ((adapter->hw.mng_cookie.status &
 			  E1000_MNG_DHCP_COOKIE_STATUS_VLAN_SUPPORT)) {



* Re: [RFC] Geographical/regulatory information for ieee80211
From: Larry Finger @ 2006-04-27  0:54 UTC
  To: Rick Jones, Christoph Hellwig, netdev
In-Reply-To: <4443D694.8090809@hp.com>

Rick Jones wrote:
> Christoph Hellwig wrote:
>> On Thu, Apr 13, 2006 at 07:59:21PM -0500, Larry Finger wrote:
>>
>>> I am planning on writing a new routine to be added to 
>>> net/ieee80211/ieee80211_geo.c that will populate an ieee80211_geo 
>>> object given a country code. The new routine will eliminate the need 
>>> for each driver to do their own.
>>
>>
>> This sounds like a generally good idea, but the question is:  do we want
>> this inside a kernel module or in userspace, either like the regulatory
>> daemon intel has (unfortunately in binary only form) or as a simple init
>> script.  I really don't want to recompile my kernel just because 
>> regulations
>> changed, and they seem to do that quite often.
> 
> Yet I would expect the regulatory bodies to look less favorably on 
> something more easily malleable by the end-user.

I don't think it would make that much difference as the user could easily lie about their locality 
and get any set of parameters that they wanted. Intel avoids this problem by hiding the locality in 
an EEPROM (ipw2200) or by combining the EEPROM information with the binary-only daemon (3945).

I am leaning toward putting the geographical information into a userland daemon. That way we won't 
have to patch the kernel every time a country modifies its regulations. In addition, the kernel will 
be smaller. The downside is that the daemon will have to be updated and supplied in some convenient 
form, perhaps as part of a wireless tools package.

Larry

* RE: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: Caitlin Bestler @ 2006-04-27  1:02 UTC
  To: David S. Miller; +Cc: jeff, kelly, netdev, rusty

netdev-owner@vger.kernel.org wrote:
> From: "Caitlin Bestler" <caitlinb@broadcom.com>
> Date: Wed, 26 Apr 2006 15:53:44 -0700
> 
>> The netchannel qualifiers should only deal with TCP packets for
>> established connections. Listens would continue to be dealt with by
>> the existing stack logic, vj_channelizing only occurring when the
>> connection was accepted.
> 
> I consider netchannel support for listening TCP sockets to be
> absolutely essential.

Meaning that inbound SYNs would be placed in a net channel
for processing by a Consumer at the other end of the ring?

If so the rules filtering SYNs would have to be applied either
before it went into the ring, or when the consumer end takes
them out. The latter makes more sense to me, because the rules
about what remote hosts can initiate a connection request to
a given TCP port can be fairly complex for a variety of
legitimate reasons.

Would it be reasonable to state that a net channel carrying
SYNs should not be set up when the consumer is a user mode
process?

* Re: [RFC] e1000 performance patch
From: Robin Humble @ 2006-04-27  2:43 UTC
  To: Rick Jones; +Cc: netdev
In-Reply-To: <444FF389.2090002@hp.com>

Hi Rick,

thanks for your comments.

On Wed, Apr 26, 2006 at 03:26:17PM -0700, Rick Jones wrote:
>Robin Humble wrote:
>>attached is a small patch for e1000 that dynamically changes Interrupt
>>Throttle Rate for best performance - both latency and bandwidth.
>>it makes e1000 look really good on netpipe with a ~28 us latency and
>>890 Mbit/s bandwidth.
>>
>>the basic idea is that high InterruptThrottleRate (~200k) is best for
>>small messages, 
>Best for small numbers of small messages?  If one is looking to have 
>high aggregate small packet rates, the higher throttle rate may degrade 
>the peak PPS one can achieve.

if small is <1kB, and there's a single client, then it looks to me like
the higher ITR the better.
for a single netpipe client (running 10k repetitions and from 0 byte to
1kB messages), the driver chooses 200k ITR until it gets close to 1kB
messages, when it drops to its next level of 90k ITR. about 15-20% cpu
is used.

<short delay whilst I run some tests>

for 3 netpipe clients (again running 10k repetitions and from 0 byte to
1kB messages, all with the patched e1000 driver), the server is at 200k
ITR until the 3 clients get to ~96 bytes, then it drops to 90k ITR, and
at ~512 byte messages it drops the ITR once more to 30k.

so I think the patched driver is doing the right thing there and
lowering the ITR more rapidly as it gets more clients.

but clearly I should be using netperf to get more accurate cpu numbers
and a more convincing aggregate table :-)

>It is a bit rough/messy as a writeup, but here is what I've seen wrt the 
>latency vs throughput tradeoffs:
>ftp://ftp.cup.hp.com/dist/networking/briefs/nic_latency_vs_tput.txt

from a quick read it looks like just the case with 32kB messages,
multiple simultaneous clients, and driver set to unlimited ITR sees
reduced throughput. is that right?

if so, then I'm not surprised.
this graph
  http://www.cita.utoronto.ca/mediawiki/index.php/Image:Cpu.100k.png
shows that (for our hardware etc. etc.) at 32kB the cpu usage if one
was using 100k ITR is already excessive, and unlimited ITR would be
worse than that... :-/
so for 32kB messages and a single client (never mind multiple clients)
I'd agree with your study that unlimited ITR is probably not a good
idea.

with a single client doing 32kB messages, my patched driver is probably
doing the right thing as it's at 30k ITR (and at its minimum ITR of
15k with multiple clients doing 32kB messages).

>> uint32_t goc = max(adapter->gotcl, adapter->gorcl) / 1000000;
>> uint32_t itr = goc > 10 ? (goc > 20 ? (goc > 100 ? 15000: 30000): 90000): 200000;

Hmmmm... I've just noticed that the gotcl/gorcl count is >200M on the
server when 3 clients are doing 32kB netpipes... so I can probably use
goc of > 150 or 200 as a threshold to switch to a lower ITR again.
maybe 3k or 6k...
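
For reference, the threshold ladder in the quoted snippet can be unrolled
into a standalone function. This is only a sketch: itr_for_octets() is a
hypothetical name, the counters are passed in as plain arguments rather
than read from the adapter, and the extra >150 or >200 tier floated above
is deliberately left out.

```c
#include <stdint.h>

/* Map the larger of the two octet counters (transmitted, received) to an
 * interrupt throttle rate, using the same cut-offs as the quoted ternary:
 *   goc <= 10        -> 200000
 *   10 < goc <= 20   ->  90000
 *   20 < goc <= 100  ->  30000
 *   goc > 100        ->  15000
 * where goc is the counter value in units of 1,000,000 octets. */
static uint32_t itr_for_octets(uint32_t gotcl, uint32_t gorcl)
{
	uint32_t goc = (gotcl > gorcl ? gotcl : gorcl) / 1000000;

	if (goc > 100)
		return 15000;
	if (goc > 20)
		return 30000;
	if (goc > 10)
		return 90000;
	return 200000;
}
```

Written this way, adding a further tier is a one-line change at the top of
the ladder, and each boundary can be checked in isolation.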

but overall I'm actually more worried about a mix of small and large
messages than multiple clients.

a large/small mix might well occur in 'the real world' and it'll be 2s
until the watchdog routine can adapt the ITR. potentially that 2s will
be at 200k ITR which is too high for large messages, and up to 2s of
cpu will be burnt needlessly.

can netperf (or some other tool) mix up big and small message sizes
like 'the real world' perhaps does?
that might help me find a good frequency at which to try to adapt the
ITR... (eg. 1, 10, 100 or 1000 times a second)

cheers,
robin

* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: Kelly Daly @ 2006-04-27  3:31 UTC
  To: David S. Miller; +Cc: rusty, netdev
In-Reply-To: <20060426.003335.26972263.davem@davemloft.net>

Hi Dave,

Thanks for your response.  =)

On Wednesday 26 April 2006 17:59, you wrote:
> Ok I have comments already just glancing at the initial patch.
>
> With the 32-bit descriptors in the channel, you indeed end up
> with a fixed sized pool with a lot of hard-to-finesse sizing
> and lookup problems to solve.

It should be quite trivial to resize this pool using RCU.

>
> So what I wanted to do was finesse the entire issue by simply
> side-stepping it initially.  Use a normal buffer with a tail
> descriptor, when you enqueue you give a tail descriptor pointer.


The tail pointers are an excellent idea - and they certainly fix a lot of 
compatibility issues that we side-stepped (we were going for the "make it 
work" approach rather than the "make it right" - figured we could get to that 
bit later  =P  ).
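
To make the tail-descriptor idea concrete, here is a small userspace sketch
in C. It is an assumption-laden illustration, not the posted code: struct
tail_desc, struct channel, CHAN_SLOTS and the helper names are all
hypothetical, the ring is single-threaded with no memory barriers, and
malloc() stands in for whatever allocator a driver would really use.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor stored at the tail of each packet buffer. */
struct tail_desc {
	unsigned char *data;	/* start of the buffer / packet data */
	uint32_t len;		/* bytes of packet data currently valid */
};

#define CHAN_SLOTS 8		/* hypothetical ring size, power of two */

/* The channel holds real pointers to tail descriptors, not pool indexes. */
struct channel {
	struct tail_desc *slot[CHAN_SLOTS];
	unsigned int head;	/* next slot the producer fills */
	unsigned int tail;	/* next slot the consumer drains */
};

/* Allocate an ordinary buffer with room for 'size' data bytes followed by
 * a suitably aligned descriptor. Returns the descriptor, or NULL. */
static struct tail_desc *buf_alloc(uint32_t size)
{
	size_t align = _Alignof(struct tail_desc);
	size_t off = ((size_t)size + align - 1) / align * align;
	unsigned char *buf = malloc(off + sizeof(struct tail_desc));
	struct tail_desc *td;

	if (!buf)
		return NULL;
	td = (struct tail_desc *)(buf + off);
	td->data = buf;
	td->len = 0;
	return td;
}

static void buf_free(struct tail_desc *td)
{
	free(td->data);		/* the descriptor lives inside this block */
}

/* Producer side: hand over the tail descriptor pointer. -1 if full. */
static int chan_enqueue(struct channel *c, struct tail_desc *td)
{
	if (c->head - c->tail == CHAN_SLOTS)
		return -1;
	c->slot[c->head % CHAN_SLOTS] = td;
	c->head++;
	return 0;
}

/* Consumer side: returns the next descriptor, or NULL if empty. */
static struct tail_desc *chan_dequeue(struct channel *c)
{
	struct tail_desc *td;

	if (c->head == c->tail)
		return NULL;
	td = c->slot[c->tail % CHAN_SLOTS];
	c->tail++;
	return td;
}
```

The consumer reaches the packet data through one pointer dereference, with
no pool lookup in between, which appears to be the point of the suggestion.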

> I really dislike the pools of buffers, partly because they are fixed
> size (or dynamically sized and even more expensive to implement), but
> moreso because there is all of this absolutely stupid state management
> you eat just to get at the real data.  That's pointless, we're trying
> to make this as light as possible.  Just use real pointers and
> describe the packet with a tail descriptor.

We approached this from the understanding that an intelligent NIC will be able 
to transition directly to userspace, which is a major win.  0 copies to 
userspace would be sweet.  I think we can still achieve this using your 
scheme without *too* much pain.

> Next, you can't even begin to work on the protocol channels before you
> do one very important piece of work.  Integration of all of the ipv4
> and ipv6 protocol hash tables into a central code, it's a total
> prerequisite.  Then you modify things to use a generic
> inet_{,listen_}lookup() or inet6_{,listen_}lookup() that takes a
> protocol number as well as saddr/daddr/sport/dport and searches
> from a central table.
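
A central table keyed by protocol number plus saddr/daddr/sport/dport, as
the quoted paragraph describes, might look roughly like the userspace
sketch below. All names here (flow_key, flow_entry, flow_insert,
flow_lookup, FLOW_BUCKETS) are hypothetical, the hash is a throwaway mix
chosen for brevity, and the listening-socket lookup is not covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key: protocol number plus the usual 4-tuple. */
struct flow_key {
	uint8_t proto;
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

struct flow_entry {
	struct flow_key key;
	void *sock;			/* opaque per-protocol socket */
	struct flow_entry *next;	/* hash chain */
};

#define FLOW_BUCKETS 64

/* One table shared by every protocol. */
static struct flow_entry *flow_table[FLOW_BUCKETS];

static unsigned int flow_hash(const struct flow_key *k)
{
	uint32_t h = k->saddr ^ k->daddr ^
		     (((uint32_t)k->sport << 16) | k->dport) ^ k->proto;

	h ^= h >> 16;
	h ^= h >> 8;
	return h % FLOW_BUCKETS;
}

static int flow_key_equal(const struct flow_key *a, const struct flow_key *b)
{
	return a->proto == b->proto &&
	       a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* Link a caller-owned entry into the table. */
static void flow_insert(struct flow_entry *e)
{
	unsigned int b = flow_hash(&e->key);

	e->next = flow_table[b];
	flow_table[b] = e;
}

/* Generic lookup: the protocol number is part of the key, so TCP and UDP
 * flows with the same addresses and ports resolve to different sockets. */
static void *flow_lookup(uint8_t proto, uint32_t saddr, uint32_t daddr,
			 uint16_t sport, uint16_t dport)
{
	struct flow_key k = { proto, saddr, daddr, sport, dport };
	struct flow_entry *e;

	for (e = flow_table[flow_hash(&k)]; e; e = e->next)
		if (flow_key_equal(&e->key, &k))
			return e->sock;
	return NULL;
}
```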

Understood.  And agreed.  Once again was side-stepped just to try to get a 
"working model".  Will look into this immediately.

> So I think I'll continue working on my implementation, it's more
> transitional and that's how we have to do this kind of work.


Thanks again for your comments  =) (and thanks to everyone else who took the 
time to respond to this)

Kelly

* RE: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: Rusty Russell @ 2006-04-27  3:40 UTC
  To: Caitlin Bestler; +Cc: David S. Miller, kelly, netdev
In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F143AE6C@NT-SJCA-0751.brcm.ad.broadcom.com>

On Wed, 2006-04-26 at 12:30 -0700, Caitlin Bestler wrote:
> David S. Miller wrote:
> 
> > 
> > I personally think allowing sockets to trump firewall rules
> > is an acceptable relaxation of the rules in order to simplify
> > the implementation.
> 
> I agree.  I have never seen a set of netfilter rules that
> would block arbitrary packets *within* an established connection.

Intelligent or no, this does happen.  More importantly, people rely on
packet counters.  Basically I don't think we can "relax" our firewall
implementation and retain trust 8(

I started thinking about this back in January.  We could force
everything through the "slow" path when something is registered with
netfilter (similarly raw sockets, bonding, divert).  Or, we could delay
LOCAL_IN hook processing until we get to socket receive.
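
The first of those two options amounts to a single up-front decision. A
minimal sketch in C, assuming a hypothetical rx_config summary of what is
currently registered (none of these names come from the kernel):

```c
#include <stdbool.h>

/* Hypothetical summary of features that need to see every packet. */
struct rx_config {
	unsigned int nf_hooks;		/* registered netfilter hooks */
	unsigned int raw_sockets;	/* open raw sockets */
	bool bonding;
	bool divert;
};

enum rx_path { RX_PATH_CHANNEL, RX_PATH_SLOW };

/* Use the channel fast path only when nothing else wants the packet. */
static enum rx_path rx_choose_path(const struct rx_config *cfg)
{
	if (cfg->nf_hooks || cfg->raw_sockets || cfg->bonding || cfg->divert)
		return RX_PATH_SLOW;
	return RX_PATH_CHANNEL;
}
```

The obvious cost is that a single registered hook sends all traffic down
the slow path, which is why delaying the hook is listed as an alternative.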

Delaying netfilter hook processing won't work for intelligent NICs that
write straight to mmapped buffers, but we could make that CAP_NET_RAW.

We *used* to have an nf_cache mechanism to determine exactly when the
netfilter hooks cared about a packet, but it was never used and was hard
to reconcile with connection-tracking timeouts...

Cheers,
Rusty.
-- 
 ccontrol: http://ozlabs.org/~rusty/ccontrol

* Re: TSO and IPoIB performance degradation
From: Troy Benjegerdes @ 2006-04-27  4:13 UTC
  To: David S. Miller
  Cc: mst, rick.jones2, netdev, rdreier, linux-kernel, openib-general
In-Reply-To: <20060320.023704.70907203.davem@davemloft.net>

On Mon, Mar 20, 2006 at 02:37:04AM -0800, David S. Miller wrote:
> From: "Michael S. Tsirkin" <mst@mellanox.co.il>
> Date: Mon, 20 Mar 2006 12:22:34 +0200
> 
> > Quoting r. David S. Miller <davem@davemloft.net>:
> > > The path an SKB can take is opaque and unknown until the very last
> > > moment it is actually given to the device transmit function.
> > 
> > Why, I was proposing looking at dst cache. If that's NULL, well,
> > we won't stretch ACKs. Worst case we apply the wrong optimization.
> > Right?
> 
> Where you receive a packet from isn't very useful for determining
> even the full path on which that packet itself flowed.
> 
> More importantly, packets also do not necessarily go back out over the
> same path on which packets are received for a connection.  This is
> actually quite common.
> 
> Maybe packets for this connection come in via IPoIB but go out via
> gigabit ethernet and another route altogether.
> 
> > What I'd like to clarify, however: rfc2581 explicitly states that in
> > some cases it might be OK to generate ACKs less frequently than
> > every second full-sized segment. Given Matt's measurements, TCP on
> > top of IP over InfiniBand on Linux seems to hit one of these cases.
> > Do you agree to that?
> 
> I disagree with Linux changing its behavior.  It would be great to
> turn off congestion control completely over local gigabit networks,
> but that isn't determinable in any way, so we don't do that.
> 
> The IPoIB situation is no different, you can set all the bits you want
> in incoming packets, the barrier to doing this remains the same.
> 
> It hurts performance if any packet drop occurs because it will require
> an extra round trip for recovery to begin to be triggered at the
> sender.
> 
> The network is a black box, routes to and from a destination are
> arbitrary, and so is packet rewriting and reflection, so being able to
> say "this all occurs on IPoIB" is simply infeasible.
> 
> I don't know how else to say this, we simply cannot special case IPoIB
> or any other topology type.

David is right. If you care about performance, you are already using SDP
or verbs layer for the transport anyway. If I am going to be doing IPoIB,
it's because eventually I expect the packet might get off the IB network
and onto some other network and go halfway across the country.


* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
From: Auke Kok @ 2006-04-27  4:55 UTC
  To: Shaw
  Cc: Andy Gospodarek, Michael Chan, Herbert Xu, netdev, auke-jan.h.kok,
	davem, jgarzik
In-Reply-To: <7bb8b8de0604261714h2471420xa06bb6639ddb6cea@mail.gmail.com>

Shaw wrote:
> On 4/21/06, Andy Gospodarek <andy@greyhouse.net> wrote:
>> On 4/21/06, Michael Chan <mchan@broadcom.com> wrote:
>>> On Fri, 2006-04-21 at 16:01 -0400, Andy Gospodarek wrote:
>>>
>>>> I just hate to see extra resources used to solve problems that good
>>>> coding can solve (not that my suggestion is necessarily a 'good' one),
>>>> so I was trying to think of a way to resolve this without explicitly
>>>> adding another workqueue.
>>> If you don't want to add another workqueue, then look at tg3, bnx2, and
>>> one of the smc drivers on how to effectively wait for the driver's
>>> workqueue task to finish without deadlocking with linkwatch_event.
>>>
>> I agree 100%.  I just hope others can manage to figure that out too.
> 
> Ok, here's another attempt.  The goal here is to serialize attempts to
> clean the tx and rx buffers, and ensure that e1000_close is called
> after the tx_timeout_task has completed running and/or that the task
> is safe to run after e1000_close has run.
> 
> I'm concerned about the addition of the netif_running check to
> e1000_down.  While something like this is needed, I'm not familiar
> enough w/ the code to know if this is okay.
> All explanations and comments are greatly appreciated.

I apologise for not getting back on this earlier but Jesse Brandeburg and I 
have been digging into this for two days and making some big progress. One of 
the main fixes will be that we're taking out a watchdog reset task completely 
and doing down/up cycles instead, which removes a large portion of the race 
conditions at this stage completely (the tx_timeout triggers a watchdog reset 
which can happen during an e1000_down causing a double free interrupt, or a 
double allocation).

We're making good progress with this and are now working on removing the last 
race between the ioctl path and the ifdn/ifup stuff, where the last remaining 
race location is in the ethtool test which does all sorts of funny lowlevel 
driver stuff that can seriously OOPS if you're running ethtool tests while 
ifup/downing your interface.

While I appreciate patches ;^) I think we're on a better path by making these 
cleanups, and actually reducing the code in large places. I hope to be able to 
push something out for RFC soon. Added benefit will be that we're dropping a 
whole bunch of irq operations where we didn't need to (soft resets).

Cheers,

Auke

* RE: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: James Morris @ 2006-04-27  4:58 UTC
  To: Rusty Russell; +Cc: Caitlin Bestler, David S. Miller, kelly, netdev
In-Reply-To: <1146109226.11864.37.camel@localhost.localdomain>

On Thu, 27 Apr 2006, Rusty Russell wrote:

> netfilter (similarly raw sockets, bonding, divert).  Or, we could delay
> LOCAL_IN hook processing until we get to socket receive.

This is an idea proposed for skfilter [1], too, allowing packets to be
filtered by local endpoint.

[1] http://people.redhat.com/jmorris/selinux/skfilter/

-- 
James Morris
<jmorris@namei.org>

* Re: Fw: Bug: PPP dropouts in >=2.6.16
From: Sven Schuster @ 2006-04-27  5:48 UTC
  To: Andrew Morton; +Cc: lkml, ak, jesse.brandeburg, netdev
In-Reply-To: <20060426150733.53cc9032.akpm@osdl.org>

Hi Andrew,

On Wed, Apr 26, 2006 at 03:07:33PM -0700, Andrew Morton told us:
> So there's something in -mm which fixes your kernel?  It's usually the
> other way around ;)

actually this was the first time that I tried a "normal" kernel.
I haven't chosen to run -mm because it fixed something for me
originally, I just run -mm for a matter of taste ;)

> And it sounds like something which has been in -mm for a long time, so it
> might not be a patch which I was planning on sending upstream.
>
> Can you think of a way in which we can identify which patch does the good
> deed?

My first thought was it had something to do with pata_via, as
mkinitrd complained it cannot find that module in 2.6.16.9
when I installed it. Taking a closer look, it doesn't even seem
like pata_via is really used, its use count in lsmod output is 0.
But, in the last few releases of -mm I had problems every now and
then where my box didn't want to boot complaining about lost
interrupts on hdb (hdb here, not hda) or it just froze after some
days of uptime (I was able to do sysrq though). Later on I ran
SMART self tests on both my hard drives which didn't reveal any
errors. Google told me some other guys with VIA based boards had
similar problems which went away when using a board with another
vendor's chipset. Being a lazy bastard and having no real time I
stopped digging into this...
How to debug? I might try unapplying VIA and/or IDE related patches
from -mm until I get the same problem like with the stable series.
If one would tell me which patches I should try :-)

Here's the dmesg output concerning my IDE controller:

Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 0000:00:07.1
PCI: Via IRQ fixup for 0000:00:07.1, from 255 to 0
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci0000:00:07.1
    ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
hda: Maxtor 6Y120L0, ATA DISK drive
hdb: Maxtor 6Y120L0, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: AOPEN CD-RW CRW4852 1.00 20030123, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 240121728 sectors (122942 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
hda: cache flushes supported
 hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 hda9 hda10 hda11 >
hdb: max request size: 128KiB
hdb: 240121728 sectors (122942 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
hdb: cache flushes supported
 hdb: hdb1 hdb2 hdb3 hdb4 < hdb5 hdb6 hdb7 hdb8 >
hdc: ATAPI 40X CD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20

If someone wants me to provide more info, test patches or anything
please tell me :-)

Thanks

Sven

--
Linux zion.homelinux.com 2.6.17-rc1-mm1_31 #31 Sat Apr 8 16:18:23 CEST 2006 i686 athlon i386 GNU/Linux
 07:19:57 up 12:02,  2 users,  load average: 0.19, 0.10, 0.13

* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: David S. Miller @ 2006-04-27  6:08 UTC
  To: caitlinb; +Cc: jeff, kelly, netdev, rusty
In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F143AEE0@NT-SJCA-0751.brcm.ad.broadcom.com>

From: "Caitlin Bestler" <caitlinb@broadcom.com>
Date: Wed, 26 Apr 2006 18:02:43 -0700

> Would it be reasonable to state that a net channel carrying
> SYNs should not be set up when the consumer is a user mode
> process?

I'm currently assuming that the protocol processing is still done in
the kernel on behalf of the user context, so the issues you raise
really aren't relevant.

We really shouldn't be jumping the gun so far into the implementation
as others seem to be doing.  Let's do it simple first and see if
putting things all the way to userspace even is necessary.

No work is going to get done if we keep carrying on like this
over details we really do not need to consider right away.

* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: David S. Miller @ 2006-04-27  6:16 UTC
  To: jmorris; +Cc: rusty, caitlinb, kelly, netdev
In-Reply-To: <Pine.LNX.4.64.0604270055010.6286@d.namei>

From: James Morris <jmorris@namei.org>
Date: Thu, 27 Apr 2006 00:58:41 -0400 (EDT)

> On Thu, 27 Apr 2006, Rusty Russell wrote:
> 
> > netfilter (similarly raw sockets, bonding, divert).  Or, we could delay
> > LOCAL_IN hook processing until we get to socket receive.
> 
> This is an idea proposed for skfilter [1], too, allowing packets to be
> filtered by local endpoint.
> 
> [1] http://people.redhat.com/jmorris/selinux/skfilter/

Moving forward this really is an important problem that we'll need to
solve, and we'll need to solve it such that netfilter can be fully
enabled in tandem with net channels doing their thing.

It's simple, if we don't make them work together, then as a
consequence the real life sites that would benefit the most from net
channels will not see the benefit from them because they will use
netfilter and they will have firewall rules enabled.  Our work is
largely wasteful if that's what happens.

But let's move forward on the bits we can implement now, believing
optimistically that we will find a way to deal with this issue
properly. :-)

* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: David S. Miller @ 2006-04-27  6:17 UTC
  To: rusty; +Cc: caitlinb, kelly, netdev
In-Reply-To: <1146109226.11864.37.camel@localhost.localdomain>

From: Rusty Russell <rusty@rustcorp.com.au>
Date: Thu, 27 Apr 2006 13:40:26 +1000

> We *used* to have an nf_cache mechanism to determine exactly when the
> netfilter hooks cared about a packet, but it was never used and was hard
> to reconcile with connection-tracking timeouts...

Let's not consider bringing that thing back :-)

* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: Andi Kleen @ 2006-04-27  6:17 UTC
  To: David S. Miller; +Cc: caitlinb, jeff, kelly, netdev, rusty
In-Reply-To: <20060426.230812.13989689.davem@davemloft.net>

On Thursday 27 April 2006 08:08, David S. Miller wrote:

> I'm currently assuming that the protocol processing is still done in
> the kernel on behalf of the user context, so the issues you raise
> really aren't relevant.
> 
> We really shouldn't be jumping the gun so far into the implementation
> as others seem to be doing.  Let's do it simple first and see if
> putting things all the way to userspace even is necessary.

I still have my doubts about doing that securely anyways.
 
> No work is going to get done if we keep carrying on like this
> over details we really do not need to consider right away.

One thing I would like to see is some generic code for the channels.
It might be interesting to see whether that data structure could be used
in other parts of the kernel that pass objects around (like the VM or
block layer).

-Andi

^ permalink raw reply

* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: David S. Miller @ 2006-04-27  6:25 UTC (permalink / raw)
  To: kelly; +Cc: rusty, netdev
In-Reply-To: <200604271331.37073.kelly@au.ibm.com>

From: Kelly Daly <kelly@au1.ibm.com>
Date: Thu, 27 Apr 2006 13:31:37 +1000

> It should be quite trivial to resize this pool using RCU.

Yes, a lot of this stuff can use RCU, in particular the channel
demux is a prime candidate.

There are some non-trivial issues wrt. synchronizing the net
channel lookup state with socket state changes (socket moves to
close or whatever).  This reminds me that we had some nice TCP
hash table RCU patches that Ben LaHaise posted at one point and
that slipped through the cracks.  That took care of all the event
ordering issues, it seemed at the time, and is something we need
to get back on track with.

> The tail pointers are an excellent idea - and they certainly fix a
> lot of compatibility issues that we side-stepped (we were going for
> the "make it work" approach rather than the "make it right" -
> figured we could get to that bit later =P ).

Start simple, we can keep mucking with the interfaces over and over
again as we move from simple netif_receive_skb() channels out to the
more complex socket demux style channel.

This is a big and long project, there are no style points for trying
to go all the way in the first pass :-)

> We approached this from the understanding that an intelligent NIC
> will be able to transition directly to userspace, which is a major
> win.  0 copies to userspace would be sweet.  I think we can still
> achieve this using your scheme without *too* much pain.

Understood.  What's your basic idea?  Just make the buffers in the
pool large enough to fit the SKB encapsulation at the end?

Note that this will change a lot of the assumptions currently in
your buffer handling code about buffer reuse and such.

So the idea in your scheme is to give the buffer pools to the NIC
in a per-channel way via a simple descriptor table?  And the u32's
are arbitrary keys that index into this descriptor table, right?

I would suggest just sticking to the simple global input queue.
Solve the easy problems and the buffering model first.  Then we
can port drivers and people can bang on the basic infrastructure.
Take my SKB encapsulator in my vj-2.6 tree once you've transformed
your buffer pools to accommodate.

I'll actually sit back and let you do that, I'm actually coming around
more to your scheme in some regards :-)  I'll sit and think about some
of the heavier issues we'll hit in the next phase and once you have
a cut at the current phase I'll work on a tg3 driver port.

Thanks!

^ permalink raw reply

* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: David S. Miller @ 2006-04-27  6:27 UTC (permalink / raw)
  To: ak; +Cc: caitlinb, jeff, kelly, netdev, rusty
In-Reply-To: <200604270817.36041.ak@suse.de>

From: Andi Kleen <ak@suse.de>
Date: Thu, 27 Apr 2006 08:17:35 +0200

> On Thursday 27 April 2006 08:08, David S. Miller wrote:
> 
> > I'm currently assuming that the protocol processing is still done in
> > the kernel on behalf of the user context, so the issues you raise
> > really aren't relevant.
> > 
> > We really shouldn't be jumping the gun so far into the implementation
> > as others seem to be doing.  Let's do it simple first and see if
> > putting things all the way to userspace even is necessary.
> 
> I still have my doubts about doing that securely anyways.

The NIC has a descriptor of buffers, the NIC can thus DMA right
into this buffer which only contains packet data and nothing
else outside of those packets.

The software implementation, of course, will not be able to do
this and will need to copy.

> One thing I would like to see is some generic code for the channels.
> It might be interesting to see whether that data structure could be used
> in other parts of the kernel that pass objects around (like the VM or
> block layer).

Seconded.  This should be easy once we have the basic global input
queue channel working.

I even put it in include/linux/netchannel.h in my vj-2.6 tree sort
of to hint at this. :)


^ permalink raw reply

* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: Andi Kleen @ 2006-04-27  6:41 UTC (permalink / raw)
  To: David S. Miller; +Cc: caitlinb, jeff, kelly, netdev, rusty
In-Reply-To: <20060426.232724.07687472.davem@davemloft.net>

On Thursday 27 April 2006 08:27, David S. Miller wrote:
> From: Andi Kleen <ak@suse.de>
> Date: Thu, 27 Apr 2006 08:17:35 +0200
> 
> > On Thursday 27 April 2006 08:08, David S. Miller wrote:
> > 
> > > I'm currently assuming that the protocol processing is still done in
> > > the kernel on behalf of the user context, so the issues you raise
> > > really aren't relevant.
> > > 
> > > We really shouldn't be jumping the gun so far into the implementation
> > > as others seem to be doing.  Let's do it simple first and see if
> > > putting things all the way to userspace even is necessary.
> > 
> > I still have my doubts about doing that securely anyways.
> 
> The NIC has a descriptor of buffers, the NIC can thus DMA right
> into this buffer which only contains packet data and nothing
> else outside of those packets.

Yes, but all clients will see all the data from all sockets, won't they?
[Unless you have an RDMA NIC that can scale to hundreds of thousands of
connections, but let's assume standard hardware for now]

-Andi

^ permalink raw reply

* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: David S. Miller @ 2006-04-27  7:52 UTC (permalink / raw)
  To: ak; +Cc: caitlinb, jeff, kelly, netdev, rusty
In-Reply-To: <200604270841.52349.ak@suse.de>

From: Andi Kleen <ak@suse.de>
Date: Thu, 27 Apr 2006 08:41:51 +0200

> Yes, but all clients will see all the data from all sockets, won't
> they?  [Unless you have an RDMA NIC that can scale to hundreds of
> thousands of connections, but let's assume standard hardware for
> now]

Each netchannel, which goes to a specific socket, has a ring
buffer of packets the NIC can use.  Those packets are mmap()'d
into userspace so we can control the layout, the page boundaries,
etc. and the NIC will only DMA packets matching that channel ID
into that userland area.

Have a look at the code Kelly posted.
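
[Illustrative sketch, not taken from Kelly's patches or the vj-2.6 tree.
It models the idea above in plain userspace C: one ring of packet slots
per channel, with a producer (standing in for the NIC) that only ever
writes into the ring of the matching channel ID.  Every name here
(chan_ring, chan_deliver, chan_receive, the slot and ring sizes) is
hypothetical, and the sketch is single-threaded: it leaves out the
mmap() step and the memory barriers a real producer/consumer pair
would need.]

```c
#include <assert.h>   /* only needed by callers that check results */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS   8u   /* power of two, so index wrap is a simple mask */
#define SLOT_BYTES   64u  /* hypothetical fixed size of one packet slot */
#define MAX_CHANNELS 4u   /* hypothetical number of channels */

struct pkt_slot {
    uint32_t len;
    uint8_t  data[SLOT_BYTES];
};

struct chan_ring {
    uint32_t head;                      /* next slot the producer fills */
    uint32_t tail;                      /* next slot the consumer reads */
    struct pkt_slot slots[RING_SLOTS];
};

/* One ring per channel; in the scheme described above each ring would
 * be the area mapped into the owning process. */
static struct chan_ring channels[MAX_CHANNELS];

/* Producer side: place a packet in the ring of chan_id and nowhere else.
 * Returns 0 on success, -1 on bad channel, oversized packet or full ring. */
static int chan_deliver(uint32_t chan_id, const void *pkt, uint32_t len)
{
    if (chan_id >= MAX_CHANNELS || len > SLOT_BYTES)
        return -1;

    struct chan_ring *r = &channels[chan_id];
    if (r->head - r->tail == RING_SLOTS)    /* unsigned wrap-safe "full" */
        return -1;

    struct pkt_slot *s = &r->slots[r->head & (RING_SLOTS - 1)];
    memcpy(s->data, pkt, len);
    s->len = len;
    r->head++;
    return 0;
}

/* Consumer side: copy out the oldest packet of chan_id.
 * Returns the packet length, or -1 if the ring is empty, the channel
 * is invalid, or buf is too small. */
static int chan_receive(uint32_t chan_id, void *buf, uint32_t buflen)
{
    if (chan_id >= MAX_CHANNELS)
        return -1;

    struct chan_ring *r = &channels[chan_id];
    if (r->head == r->tail)
        return -1;

    const struct pkt_slot *s = &r->slots[r->tail & (RING_SLOTS - 1)];
    if (s->len > buflen)
        return -1;

    memcpy(buf, s->data, s->len);
    r->tail++;
    return (int)s->len;
}
```

[A packet delivered with chan_deliver(1, ...) is visible only through
chan_receive(1, ...); a reader of channel 0 sees an empty ring.  That
per-channel isolation is the property being argued for above.]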

^ permalink raw reply

* Re: [RFC] bridge: partial rtnetlink hooks
From: Patrick McHardy @ 2006-04-27  8:24 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20060426104521.44682924@localhost.localdomain>

Stephen Hemminger wrote:
> This is the start of adding support for rtnetlink to the bridge code.
> So far it only supports accessing the list of links and notifying
> about link changes. It is just a prototype to get early feedback, don't
> use to build your own masterpiece yet.
> 

> +static int br_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
> +{
> +	struct net_device *dev;
> +	int idx = 0;
> +	int err = 0;
> +
> +	printk(KERN_DEBUG "bridge dump ifinfo\n");
> +	for (dev = dev_base; dev; dev = dev->next) {
> +		struct net_bridge_port *p = rcu_dereference(dev->br_port);


I think using rcu_dereference (especially without rcu_read_lock()) is
a bit misleading, the pointer is actually protected by the RTNL at
this point.


^ permalink raw reply

* [patch 1/9] smc911x Kconfig fix
From: akpm @ 2006-04-27  9:30 UTC (permalink / raw)
  To: jeff; +Cc: netdev, akpm, dustin


From: Andrew Morton <akpm@osdl.org>

In file included from drivers/net/smc911x.c:84:
drivers/net/smc911x.h:46:9: warning: "SMC_USE_16BIT" is not defined
drivers/net/smc911x.h:60:9: warning: "SMC_USE_32BIT" is not defined
drivers/net/smc911x.h:73:10: warning: "SMC_USE_PXA_DMA" is not defined
drivers/net/smc911x.c: In function `smc911x_reset':
drivers/net/smc911x.c:247: warning: implicit declaration of function `SMC_inl'
drivers/net/smc911x.c:249: warning: implicit declaration of function `SMC_outl'

Cc: Dustin McIntire <dustin@sensoria.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/net/Kconfig |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -puN drivers/net/Kconfig~smc911x-Kconfig-fix drivers/net/Kconfig
--- devel/drivers/net/Kconfig~smc911x-Kconfig-fix	2006-04-26 20:42:30.000000000 -0700
+++ devel-akpm/drivers/net/Kconfig	2006-04-26 20:42:56.000000000 -0700
@@ -869,7 +869,7 @@ config SMC911X
 	tristate "SMSC LAN911[5678] support"
 	select CRC32
 	select MII
-	depends on NET_ETHERNET
+	depends on NET_ETHERNET && ARCH_PXA
 	help
 	  This is a driver for SMSC's LAN911x series of Ethernet chipsets
 	  including the new LAN9115, LAN9116, LAN9117, and LAN9118.
_

^ permalink raw reply

* [patch 5/9] PCI Error Recovery: e100 network device driver
From: akpm @ 2006-04-27  9:30 UTC (permalink / raw)
  To: jeff; +Cc: netdev, akpm, linas, jesse.brandeburg


From: linas@austin.ibm.com (Linas Vepstas)

Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the Intel Ethernet e100
device driver.  The patch has been tested, and appears to work well.

Signed-off-by: Linas Vepstas <linas@linas.org>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/net/e100.c |   75 +++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 75 insertions(+)

diff -puN drivers/net/e100.c~pci-error-recovery-e100-network-device-driver drivers/net/e100.c
--- devel/drivers/net/e100.c~pci-error-recovery-e100-network-device-driver	2006-04-10 23:21:20.000000000 -0700
+++ devel-akpm/drivers/net/e100.c	2006-04-10 23:21:20.000000000 -0700
@@ -2726,6 +2726,80 @@ static void e100_shutdown(struct pci_dev
 		DPRINTK(PROBE,ERR, "Error enabling wake\n");
 }
 
+/* ------------------ PCI Error Recovery infrastructure  -------------- */
+/**
+ * e100_io_error_detected - called when PCI error is detected.
+ * @pdev: Pointer to PCI device
+ * @state: The current pci connection state
+ */
+static pci_ers_result_t e100_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+
+	/* Similar to calling e100_down(), but avoids adapter I/O. */
+	netdev->stop(netdev);
+
+	/* Detach; put netif into state similar to hotplug unplug. */
+	netif_poll_enable(netdev);
+	netif_device_detach(netdev);
+
+	/* Request a slot reset. */
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/**
+ * e100_io_slot_reset - called after the pci bus has been reset.
+ * @pdev: Pointer to PCI device
+ *
+ * Restart the card from scratch.
+ */
+static pci_ers_result_t e100_io_slot_reset(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct nic *nic = netdev_priv(netdev);
+
+	if (pci_enable_device(pdev)) {
+		printk(KERN_ERR "e100: Cannot re-enable PCI device after reset.\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+	pci_set_master(pdev);
+
+	/* Only one device per card can do a reset */
+	if (0 != PCI_FUNC(pdev->devfn))
+		return PCI_ERS_RESULT_RECOVERED;
+	e100_hw_reset(nic);
+	e100_phy_init(nic);
+
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
+/**
+ * e100_io_resume - resume normal operations
+ * @pdev: Pointer to PCI device
+ *
+ * Resume normal operations after an error recovery
+ * sequence has been completed.
+ */
+static void e100_io_resume(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct nic *nic = netdev_priv(netdev);
+
+	/* ack any pending wake events, disable PME */
+	pci_enable_wake(pdev, 0, 0);
+
+	netif_device_attach(netdev);
+	if (netif_running(netdev)) {
+		e100_open(netdev);
+		mod_timer(&nic->watchdog, jiffies);
+	}
+}
+
+static struct pci_error_handlers e100_err_handler = {
+	.error_detected = e100_io_error_detected,
+	.slot_reset = e100_io_slot_reset,
+	.resume = e100_io_resume,
+};
 
 static struct pci_driver e100_driver = {
 	.name =         DRV_NAME,
@@ -2737,6 +2811,7 @@ static struct pci_driver e100_driver = {
 	.resume =       e100_resume,
 #endif
 	.shutdown =     e100_shutdown,
+	.err_handler = &e100_err_handler,
 };
 
 static int __init e100_init_module(void)
_
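
[Illustrative sketch, not part of the patch.  It models, in plain
userspace C, the order in which the three callbacks registered above
are expected to run: error_detected first, then slot_reset if a reset
was requested, then resume once recovery succeeded.  The types and
names (fake_dev, err_handlers, run_recovery, the ERS_* values) are
hypothetical stand-ins rather than the kernel's definitions, and the
real PCI error recovery core has further steps and result codes that
are left out here.]

```c
#include <assert.h>   /* only needed by callers that check results */
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's pci_ers_result_t values. */
enum ers_result { ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED };

/* Hypothetical device state, just enough to observe the sequence. */
struct fake_dev {
    int attached;   /* 1 while the interface is usable */
    int resets;     /* number of slot_reset calls seen */
};

struct err_handlers {
    enum ers_result (*error_detected)(struct fake_dev *dev);
    enum ers_result (*slot_reset)(struct fake_dev *dev);
    void (*resume)(struct fake_dev *dev);
};

/* Driver side: stop using the device and ask for a slot reset. */
static enum ers_result drv_error_detected(struct fake_dev *dev)
{
    dev->attached = 0;
    return ERS_NEED_RESET;
}

/* Driver side: reinitialise the hardware after the slot was reset. */
static enum ers_result drv_slot_reset(struct fake_dev *dev)
{
    dev->resets++;
    return ERS_RECOVERED;
}

/* Driver side: traffic may flow again. */
static void drv_resume(struct fake_dev *dev)
{
    dev->attached = 1;
}

static const struct err_handlers drv_handlers = {
    .error_detected = drv_error_detected,
    .slot_reset     = drv_slot_reset,
    .resume         = drv_resume,
};

/* Platform side (simplified): drive the callbacks in order. */
static enum ers_result run_recovery(struct fake_dev *dev,
                                    const struct err_handlers *h)
{
    enum ers_result r = h->error_detected(dev);

    if (r == ERS_NEED_RESET)
        r = h->slot_reset(dev);   /* a real platform resets the slot first */

    if (r != ERS_RECOVERED)
        return ERS_DISCONNECT;    /* give up on the device */

    h->resume(dev);
    return ERS_RECOVERED;
}
```

[Running run_recovery() on an attached device leaves it attached again
with exactly one reset recorded, which mirrors the detach / reset /
re-attach flow of the e100 callbacks in the patch.]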

^ permalink raw reply

* [patch 4/9] PCI Error Recovery: e1000 network device driver
From: akpm @ 2006-04-27  9:30 UTC (permalink / raw)
  To: jeff; +Cc: netdev, akpm, linas, jesse.brandeburg


From: Linas Vepstas <linas@linas.org>

Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the Intel Gigabit Ethernet
e1000 device driver.  The patch has been tested, and appears to work well.

[akpm@osdl.org: minor cleanups]
Signed-off-by: Linas Vepstas <linas@linas.org>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/net/e1000/e1000_main.c |  116 ++++++++++++++++++++++++++++++-
 1 files changed, 115 insertions(+), 1 deletion(-)

diff -puN drivers/net/e1000/e1000_main.c~pci-error-recovery-e1000-network-device-driver drivers/net/e1000/e1000_main.c
--- devel/drivers/net/e1000/e1000_main.c~pci-error-recovery-e1000-network-device-driver	2006-04-22 01:39:15.000000000 -0700
+++ devel-akpm/drivers/net/e1000/e1000_main.c	2006-04-22 01:39:15.000000000 -0700
@@ -227,6 +227,16 @@ static int e1000_resume(struct pci_dev *
 static void e1000_netpoll (struct net_device *netdev);
 #endif
 
+static pci_ers_result_t e1000_io_error_detected(struct pci_dev *pdev,
+                     pci_channel_state_t state);
+static pci_ers_result_t e1000_io_slot_reset(struct pci_dev *pdev);
+static void e1000_io_resume(struct pci_dev *pdev);
+
+static struct pci_error_handlers e1000_err_handler = {
+	.error_detected = e1000_io_error_detected,
+	.slot_reset = e1000_io_slot_reset,
+	.resume = e1000_io_resume,
+};
 
 static struct pci_driver e1000_driver = {
 	.name     = e1000_driver_name,
@@ -236,8 +246,9 @@ static struct pci_driver e1000_driver = 
 	/* Power Managment Hooks */
 #ifdef CONFIG_PM
 	.suspend  = e1000_suspend,
-	.resume   = e1000_resume
+	.resume   = e1000_resume,
 #endif
+	.err_handler = &e1000_err_handler,
 };
 
 MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
@@ -3076,6 +3087,10 @@ e1000_update_stats(struct e1000_adapter 
 
 #define PHY_IDLE_ERROR_COUNT_MASK 0x00FF
 
+	/* Prevent stats update while adapter is being reset */
+	if (adapter->link_speed == 0)
+		return;
+
 	spin_lock_irqsave(&adapter->stats_lock, flags);
 
 	/* these counters are modified from e1000_adjust_tbi_stats,
@@ -4625,4 +4640,103 @@ e1000_netpoll(struct net_device *netdev)
 }
 #endif
 
+/**
+ * e1000_io_error_detected - called when PCI error is detected
+ * @pdev: Pointer to PCI device
+ * @state: The current pci connection state
+ *
+ * This function is called after a PCI bus error affecting
+ * this device has been detected.
+ */
+static pci_ers_result_t e1000_io_error_detected(struct pci_dev *pdev,
+						pci_channel_state_t state)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct e1000_adapter *adapter = netdev->priv;
+
+	netif_device_detach(netdev);
+
+	if (netif_running(netdev))
+		e1000_down(adapter);
+
+	/* Request a slot reset. */
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/**
+ * e1000_io_slot_reset - called after the pci bus has been reset.
+ * @pdev: Pointer to PCI device
+ *
+ * Restart the card from scratch, as if from a cold-boot. Implementation
+ * resembles the first-half of the e1000_resume routine.
+ */
+static pci_ers_result_t e1000_io_slot_reset(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct e1000_adapter *adapter = netdev->priv;
+
+	if (pci_enable_device(pdev)) {
+		printk(KERN_ERR "e1000: Cannot re-enable PCI device after "
+				"reset.\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+	pci_set_master(pdev);
+
+	pci_enable_wake(pdev, 3, 0);
+	pci_enable_wake(pdev, 4, 0); /* 4 == D3 cold */
+
+	/* Perform card reset only on one instance of the card */
+	if (PCI_FUNC (pdev->devfn) != 0)
+		return PCI_ERS_RESULT_RECOVERED;
+
+	e1000_reset(adapter);
+	E1000_WRITE_REG(&adapter->hw, WUS, ~0);
+
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
+/**
+ * e1000_io_resume - called when traffic can start flowing again.
+ * @pdev: Pointer to PCI device
+ *
+ * This callback is called when the error recovery driver tells us that
+ * it's OK to resume normal operation. Implementation resembles the
+ * second-half of the e1000_resume routine.
+ */
+static void e1000_io_resume(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct e1000_adapter *adapter = netdev->priv;
+	uint32_t manc, swsm;
+
+	if (netif_running(netdev)) {
+		if (e1000_up(adapter)) {
+			printk(KERN_ERR "e1000: can't bring device back up "
+					"after reset\n");
+			return;
+		}
+	}
+
+	netif_device_attach(netdev);
+
+	if (adapter->hw.mac_type >= e1000_82540 &&
+	    adapter->hw.media_type == e1000_media_type_copper) {
+		manc = E1000_READ_REG(&adapter->hw, MANC);
+		manc &= ~(E1000_MANC_ARP_EN);
+		E1000_WRITE_REG(&adapter->hw, MANC, manc);
+	}
+
+	switch (adapter->hw.mac_type) {
+	case e1000_82573:
+		swsm = E1000_READ_REG(&adapter->hw, SWSM);
+		E1000_WRITE_REG(&adapter->hw, SWSM, swsm | E1000_SWSM_DRV_LOAD);
+		break;
+	default:
+		break;
+	}
+
+	if (netif_running(netdev))
+		mod_timer(&adapter->watchdog_timer, jiffies);
+}
+
 /* e1000_main.c */
_

^ permalink raw reply
