Netdev List


* Re: [PATCH 1/9] mm: add generic adaptive large memory allocation APIs
From: Changli Gao @ 2010-05-14  8:12 UTC
  To: Peter Zijlstra
  Cc: akpm, Hoang-Nam Nguyen, Christoph Raisch, Roland Dreier,
	Sean Hefty, Hal Rosenstock, Divy Le Ray, James E.J. Bottomley,
	Theodore Ts'o, Andreas Dilger, Alexander Viro, Paul Menage,
	Li Zefan, linux-rdma, linux-kernel, netdev, linux-scsi,
	linux-ext4, linux-fsdevel, linux-mm, containers, Eric Dumazet,
	Tetsuo Handa
In-Reply-To: <1273824214.5605.3625.camel@twins>

On Fri, May 14, 2010 at 4:03 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2010-05-13 at 22:08 +0800, Changli Gao wrote:
>> > NAK, I really utterly dislike that inatomic argument. The alloc side
>> > doesn't function in atomic context either. Please keep the thing
>> > symmetric in that regards.
>> >
>>
>> There are some users who release memory in atomic context, for
>> example fs/file.c: fdmem.
>
> urgh, but yeah, aside from not using vmalloc to allocate fd tables one
> needs to deal with this.
>
> But if that is the only one, I'd let them do the workqueue thing that's
> already there. If there really are more people wanting to do this, then
> maybe add: kvfree_atomic().
>

Tetsuo has pointed out another one in apparmor, though apparmor hasn't
been merged into mainline:
http://kernel.ubuntu.com/git?p=jj/ubuntu-lucid.git;a=blobdiff;f=security/apparmor/match.c;h=d2cd55419acfcae85cb748c8f837a4384a3a0d29;hp=afc2dd2260edffcf88521ae86458ad03aa8ea12c;hb=f5eba4b0a01cc671affa429ba1512b6de7caeb5b;hpb=abdff9ddaf2644d0f9962490f73e030806ba90d3

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
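
For readers unfamiliar with the "workqueue thing" mentioned above: the
usual pattern is that the atomic path frees nothing and only links the
object onto a pending list, and a later context that is allowed to sleep
does the real free.  A minimal userspace sketch in C follows.  It assumes
a single thread and uses hypothetical names (big_buf,
big_buf_free_deferred, big_buf_flush_deferred); it is not the fs/file.c
code, and it omits the locking and work scheduling a kernel version
would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object; the link is embedded so deferring never allocates. */
struct big_buf {
	struct big_buf *next_deferred;
	size_t len;
	unsigned char *data;
};

/* Pending frees. A kernel version would protect this with a lock. */
static struct big_buf *deferred_head;

struct big_buf *big_buf_alloc(size_t len)
{
	struct big_buf *buf = malloc(sizeof(*buf));

	if (!buf)
		return NULL;
	buf->data = malloc(len ? len : 1);
	if (!buf->data) {
		free(buf);
		return NULL;
	}
	buf->next_deferred = NULL;
	buf->len = len;
	return buf;
}

/* Stand-in for the atomic-context path: constant time, frees nothing. */
void big_buf_free_deferred(struct big_buf *buf)
{
	assert(buf != NULL);
	buf->next_deferred = deferred_head;
	deferred_head = buf;
}

/* Stand-in for the work item: runs later and does the real frees. */
size_t big_buf_flush_deferred(void)
{
	size_t freed = 0;

	while (deferred_head) {
		struct big_buf *buf = deferred_head;

		deferred_head = buf->next_deferred;
		free(buf->data);
		free(buf);
		freed++;
	}
	return freed;
}
```

A dedicated kvfree_atomic(), as floated above, would presumably hide the
same two steps behind one call; that is a guess at the shape, not
something the thread specifies.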



* Re: [PATCH net-next-2.6 2/2] bonding: allow user-controlled output slave selection
From: John Fastabend @ 2010-05-14  8:53 UTC
  To: Jay Vosburgh; +Cc: Andy Gospodarek, Neil Horman, netdev@vger.kernel.org
In-Reply-To: <3797.1273776862@death.nxdomain.ibm.com>

Jay Vosburgh wrote:
> John Fastabend <john.r.fastabend@intel.com> wrote:
> 
>> Andy Gospodarek wrote:
>>> On Wed, May 12, 2010 at 12:41:54PM -0700, Jay Vosburgh wrote:
> [...]
>>>>       One goal I'm hoping to achieve is something that would satisfy
>>>> both the queue map stuff that you're looking for, and would meet the
>>>> needs of the FCOE people who also want to disable the duplicate
>>>> suppression (i.e., permit incoming traffic on the inactive slave) for a
>>>> different special case.
>>>>
>>>>       The FCOE proposal revolves around, essentially, active-backup
>>>> mode, but permitting incoming traffic on the inactive slave.  At the
>>>> moment, the patches attempt to special case things such that only
>>>> dev_add_pack listeners directly bound to the inactive slave are checked
>>>> (to permit the FCOE traffic to pass on the inactive slave, but still
>>>> dropping IP, as ip_rcv is a wildcard bind).
>>>>
>>>>       Your keep_all patch is, by and large, the same thing, except
>>>> that it permits anything to come in on the "inactive" slave, and it's a
>>>> switch that has to be turned on.
>>>>
>>>>       This seems like needless duplication to me; I'd prefer to see a
>>>> single solution that handles both cases instead of two special cases
>>>> that each do 90% of what the other does.
>>>>
>>>>       As far as a new mode goes, one major reason I think a separate
>>>> mode is warranted is the semantics: with either of these changes (to
>>>> permit more or less regular use of the "inactive" slaves), the mode
>>>> isn't really an "active-backup" mode any longer; there is no "inactive"
>>>> or "backup" slave.  I think of this as being a major change of
>>>> functionality, not simply a minor option.
>>>>
>>>>       Hence my thought that "active-backup" could stay as a "true" hot
>>>> standby mode (backup slaves are just that: for backup, only), and this
>>>> new mode would be the place to do the special queue-map / FCOE /
>>>> whatever that isn't really a hot standby configuration any longer.
>>>>
>>>>       As far as the behavior of the new mode (your concern about its
>>>> policy map approximations), in the end, it would probably act pretty
>>>> much like active-backup with your patch applied: traffic goes out the
>>>> active slave, unless directed otherwise.  It's a lot less complicated
>>>> than I had feared.
>>>>
>>> It's beginning to sound like the 'FCoE use-case' and the one Neil and I
>>> are proposing are quite similar.  The main goal of both is to have the
>>> option to have multiple slaves send and receive traffic during the
>>> steady-state, but in the event of a failover all traffic would run on a
>>> single interface.
>>>
>> I believe they are similar although I never considered using FCoE over a
>> device that is actually in the bond.  For example the current FCoE use
>> case is,
>>
>> bond0 ------> ethx
>>               |
>> vlan-fcoe -->  |
>>
>> Here vlan-fcoe is not a slave of bond0.  With the keep_all patch this
>> would work plus an additional configuration,
> 
>         I also recall discussion that another valid FCOE configuration
> is to simply bind to ethx, with no VLANs involved.
> 

Correct, this is valid and works today because the skb will be delivered
to exact matches in this case.

>> bond0 --> vlan-fcoe1  ---> ethx
>>   \                        |
>>    \ --- vlan-fcoe2  --->  |
>>
>> Here both vlan-fcoe1 and vlan-fcoe2 are slaves of bond0.
>>
>> Even with the keep_all patch it still seems a little inconsistent to drop
>> a packet outright if it is received on an inactive slave and destined for
>> a vlan on the bond and then to deliver the packet to devices that have
>> exact matches if it is received on an inactive slave but destined for the
>> bond device.  I'll post a patch in just a moment that hopefully
>> illustrates what I see as an unexpected side effect.
> 
>         Yes, I understand this, and I view this as a separate concern
> from the duplicate suppressor (although they are linked to a degree).

OK, my third patch adding a bond_should_drop flag to the sk_buff struct
should address this.

> 
>         The ultimate intent (for your changes) is to permit slaves to
> operate simultaneously as members of the bond as well as independent
> entities, which is a significant behavior change from the past.
> 

I expect the primary use case to be attaching a vlan outside the bond
over an enslaved real device.  Most of this is already implemented, though,
by delivering skbs to exact matches in the inactive case.

>>> The implementation proposed with this patch is a bit different than the
>>> 'mark-mode' patch you may recall I posted a few months ago.  It created
>>> a new mode that essentially did exactly what you are describing --
>>> transmit on the primary interface unless pushed to another interface via
>>> info in the skb and receive on all interfaces.  We initially did not
>>> create a new mode based on your reservations about the previous
>>> mark-mode patch and went the direction of enhancing one or two modes
>>> initially (figuring it would be good to run before walking), with the
>>> idea that other modes could take care of this output slave selection
>>> logic in the future.
>>>
>>>
>>>>>>    I presume you're overloading active-backup because it's not
>>>>>> etherchannel, 802.3ad, etc, and just talks right to the switch.  For the
>>>>>> regular load balance modes, I still think overlay into the existing
>>>>>> modes is preferable (more on that later); I'm thinking of "manual"
>>>>>> instead of another tweak to active-backup.
>>>>>>
>>>>>>    If users want to have actual hot-standby functionality, then
>>>>>> active-backup would do that, and nothing else (and it can be multi-queue
>>>>>> aware, but only one slave active at a time).
>>>>>>
>>>>> Yes, but active backup doesn't provide preferred output path selection in and of
>>>>> itself.  That's the feature here.
>>>>       I understand that; I'm suggesting that active-backup should
>>>> provide no service other than hot standby, and not be overloaded into a
>>>> manual load balancing scheme (both for your use, and for FCOE).
>>>>
>>>>       Maybe I'm worrying too much about defending the purity of the
>>>> active-backup mode; I understand what you're trying to do a little
>>>> better now, and yes, the "manual" mode I think of (in your queue mapping
>>>> scheme, not the other doodads I talked about) would basically be
>>>> active-backup with your queue mapper, minus the duplicate suppressor.
>>>>
>>> It doesn't matter terribly to me which direction is taken.  Again, a
>>> major reason this route was proposed was that you were not as keen on
>>> creating a new mode as I was at the time of that patch posting.  It's
>>> somewhat understandable as once a mode is added it's tough to take away,
>>> but when one sees how much we are really changing the way active-backup
>>> might behave in some cases maybe it makes sense to use a new mode?
>>>
>>> I guess I like the idea of adding this output selection to existing
>>> modes because it at least gives us the option to use queue maps to
>>> select output interfaces for more than a mode that looks like
>>> present-day active-backup minus the duplicate suppression.   I'm happy to
>>> code-up a patch that creates a new mode, but before I go do that and
>>> test it, I'd like to know we have come to an agreement on the direction
>>> for the future.
>>>
>>>>>>    Users who want the set of bonded slaves to look like a big
>>>>>> multiqueue buffet could use this "manual" mode and set things up however
>>>>>> they want.  One way to set it up is simply that the bond is N queues
>>>>>> wide, where N is the total of the queue counts of all the slaves.  If a
>>>>>> slave fails, N gets smaller, and the user code has to deal with that.
>>>>>> Since the queue count of a device can't change dynamically, the bond
>>>>>> would have to actually be set up with some big number of queues, and
>>>>>> then only a subset is actually active (or there is some sort of wrap).
>>>>>>
>>>>>>    In such an implementation, each slave would have a range of
>>>>>> queue IDs, not necessarily just one.  I'm a bit leery of exposing an API
>>>>>> where each slave is one queue ID, as it could make transitioning to real
>>>>>> multi-queue awareness difficult.
>>>>>>
>>>>> I'm sorry, what exactly do you mean when you say 'real' multi queue
>>>>> awareness?  How is this any less real than any other implementation?  The
>>>>> approach you outline above isn't any more or less valid than this one.
>>>>       That was my misunderstanding of how you planned to handle
>>>> things.  I had thought this patch was simply a scheme to use the queue
>>>> IDs for slave selection, without any method to further perform queue
>>>> selection on the slave itself (I hadn't thought of placing a tc action
>>>> on the slave itself, which you described later on).  I had been thinking
>>>> in terms of schemes to expose all of the slave queues on the bonding
>>>> device.
>>> It wasn't our original intention either.  I didn't mention it in my
>>> original post as it wasn't really the intent of our patch, but a nice
>>> side-effect for the informed user. :) Obviously a bit more testing could
>>> take place and we could add more examples to the documentation for the
>>> nice side-effect feature of this patch, but since this wasn't our
>>> original intent and we didn't test it, we did not advertise it.
>>>
>>>>       So, I don't see any issues with the queue mapping part.  I still
>>>> want to find a common solution for FCOE / your patch with regards to the
>>>> duplicate suppressor.
>>> Understood.
>>>
>>>>> While we're on the subject, Andy and I did discuss a model similar to what you
>>>>> describe above (what I'll refer to as a queue id passthrough model), in which
>>>>> you can tell the bonding driver to map a frame to a queue, and the bonding
>>>>> driver doesn't really do anything with the queue id other than pass it to the slave
>>>>> device for hardware based multiqueue tx handling.  While we could do that, it's
>>>>> my feeling such a model isn't the way to go for two primary reasons:
>>>>>
>>>>> 1) Inconsistent behavior.  Such an implementation makes assumptions regarding
>>>>> queue id specification within a driver.  For example, what if one of the slaves
>>>>> reserves some fixed number of low order queues for a specific purpose, and as
>>>>> such general use queues begin at an offset from zero, while other slaves do not?
>>>>> While it's easy to accommodate such needs when writing the tc filters, if a slave
>>>>> fails over, such a bias would change output traffic behavior, as the bonding
>>>>> driver can't be clearly informed of such a bias.  Likewise, what if a slave
>>>>> driver allocates more queues than it actually supports in hardware (like the
>>>>> implementation you propose, ixgbe IIRC actually does this)?  If slaves handled
>>>>> unimplemented tx queues differently (if one wrapped queues, while the other simply
>>>>> dropped frames to unimplemented queues for instance), a failover would change
>>>>> traffic patterns dramatically.
>>>>>
>>>>> 2) Need.  While (1) can pretty easily be managed with a few configuration
>>>>> guidelines (output queues on slaves have to be configured identically, lest
>>>>> chaos and madness befall you, etc), there's really no reason to bind users to
>>>>> such a system.  We're using tc filters to set the queue id on skbs enqueued to
>>>>> the bonding driver; there's absolutely no reason you can't add additional filters to
>>>>> the slaves directly.  Since the bonding driver uses dev_queue_xmit to send a
>>>>> frame to a slave, it has the opportunity to pass through another set of queuing
>>>>> disciplines and filters that can reset and re-assign the skb's queue mapping.  So
>>>>> with the approach in this patch you can get both direct output control without
>>>>> sacrificing actual hardware tx output queue control.  With a passthrough model,
>>>>> you save a bit of filter configuration, but at the expense of having to be much
>>>>> more careful about how you configure your slave nics, and detecting such errors
>>>>> in configuration would be rather difficult to track down, as it would require
>>>>> the generation of traffic that hit the right filter after a failover.
>>>>       I don't disagree with any of this.  One thought I do have is
>>>> that Eric Dumazet, I believe, has mentioned that the read lock in
>>>> bonding is a limiting factor on 10G performance.  In the far distant
>>>> future when bonding is RCU, going through the lock(s) on the tc actions
>>>> of the slave could have the same net effect, and in such a case, a
>>>> qdisc-less path may be of benefit.  Not a concern for today, I suspect.
>>>>
>>>>>>    There might also be a way to tie it in to the new RPS code on
>>>>>> the receive side.
>>>>>>
>>>>>>    If the slaves all have the same MAC and attach to a single
>>>>>> switch via etherchannel, then it all looks pretty much like a single big
>>>>>> honkin' multiqueue device.  The switch probably won't map the flows back
>>>>>> the same way, though.
>>>>>>
>>>>> I agree, they probably won't.  Receive side handling wasn't really our focus here
>>>>> though.  That's largely why we chose round robin and active backup as our first
>>>>> modes to use this with.  They are already written to expect frames on either
>>>>> interface.
>>>>>
>>>>>>    If the slaves are on discrete switches (without etherchannel),
>>>>>> things become more complicated.  If the slaves have the same MAC, then
>>>>>> the switches will be irritated about seeing that same MAC coming in from
>>>>>> multiple places.  If the slaves have different MACs, then ARP has the
>>>>>> same sort of issues.
>>>>>>
>>>>>>    In thinking about it, if it's linux bonding at both ends, there
>>>>>> could be any number of discrete switches in the path, and it wouldn't
>>>>>> matter as long as the linux end can work things out, e.g.,
>>>>>>
>>>>>>         -- switch 1 --
>>>>>> hostA  /              \  hostB
>>>>>> bond  ---- switch 2 ---- bond
>>>>>>        \              /
>>>>>>         -- switch 3 --
>>>>>>
>>>>>>    For something like this, the switches would never share MAC
>>>>>> information for the bonding slaves.  The issue here then becomes more of
>>>>>> detecting link failures (it would require either a "trunk failover" type
>>>>>> of function on the switch, or some kind of active probe between the
>>>>>> bonds).
>>>>>>
>>>>>>    Now, I realize that I'm babbling a bit, as from reading your
>>>>>> description, this isn't necessarily your target topology (which sounded
>>>>>> more like a case of slave A can reach only network X, and slave B can
>>>>>> reach anywhere, so sending to network X should use slave A
>>>>>> preferentially), or, as long as I'm doing ASCII-art,
>>>>>>
>>>>>>        --- switch 1 ---- network X
>>>>>> hostA /               /
>>>>>> bond  ---- switch 2 -+-- anywhere
>>>>>>
>>>>>>    Is that an accurate representation?  Or is it something a bit
>>>>>> different, e.g.,
>>>>>>
>>>>>>        --- switch 1 ---- network X -\
>>>>>> hostA /                             /
>>>>>> bond  ---- switch 2 ---- anywhere --
>>>>>>
>>>>>>    I.e., the "anywhere" connects back to network X from the
>>>>>> outside, so to speak.  Or, oh, maybe I'm missing it entirely, and you're
>>>>>> thinking of something like this:
>>>>>>
>>>>>>        --- switch 1 --- VPN --- web site
>>>>>> hostA /                          /
>>>>>> bond  ---- switch 2 - Internet -/
>>>>>>
>>>>>>    Where you prefer to hit "web site" via the VPN (perhaps it's a
>>>>>> more efficient or secure path), but can do it from the public network at
>>>>>> large if necessary.
>>>>>>
>>>>> Yes, this one.  I think the other models are equally interesting, but this model
>>>>> in which either path had universal reachability, but for some classes of traffic
>>>>> one path is preferred over the other is the one we had in mind.
>>>>>
>>>>>>    Now, regardless of the above, your first patch ("keep_all") is
>>>>>> to deal with the reverse problem, if this is a piggyback on top of
>>>>>> active-backup mode: how to get packets back, when both channels can be
>>>>>> active simultaneously.  That actually dovetails to a degree with work
>>>>>> I've been doing lately, but the solution there probably isn't what
>>>>>> you're looking for (there's a user space daemon to do path finding, and
>>>>>> the "bond IP" address is piggybacked on the slaves' MAC addresses, which
>>>>>> are not changed; the "bond IP" set exists in a separate subnet all its
>>>>>> own).
>>>>>>
>>>>>>    As I said, I'm not convinced that the "keep_all" option to
>>>>>> active-backup is really better than just a "manual" mode that lacks the
>>>>>> dup suppression and expects the user to set everything up.
>>>>>>
>>>>>>    As for the round-robin change in this patch, if I'm reading it
>>>>>> right, then the way it works is that the packets are round-robined,
>>>>>> unless there's a queue id passed in, in which case it's assigned to the
>>>>>> slave mapped to that queue id.  I'm not entirely sure why you picked
>>>>>> round-robin mode for that over balance-xor; it doesn't seem to fit well
>>>>>> with the description in the documentation.  Or is it just sort of a
>>>>>> demonstrator?
>>>>>>
>>>>> It was selected because round robin allows transmits on any interface already,
>>>>> and expects frames on any interface, so it was a 'safe' choice.  I would think
>>>>> balance-xor would also work.  Ideally it would be nice to get more modes
>>>>> supporting this mechanism.
>>>>       I think that this should work for balance-xor and 802.3ad.  The
>>>> only limitation for 802.3ad is that the spec requires "conversations" to
>>>> not be striped or to skip around in a manner that could lead to out of
>>>> order delivery.
>>> Agreed.  Checking would probably also have to be done to make sure that
>>> we were not transmitting on an inactive aggregator.
>>>
>>>>       I'm not so sure about the alb/tlb modes; at first thought, I
>>>> think it could have conflicts with the internal balancing done within
>>>> the modes (if, e.g., the tc action put traffic for the same destination
>>>> on two slaves).
>>>>
>>> TLB and ALB modes would certainly have to be done differently.  It
>>> should not be terribly difficult to move from the existing hashing
>>> that's done to one that relies on the queue_mapping, but it will take a
>>> bit to make sure it's not a complete hack.
>>>
>>> We decided against doing that for all modes on the first pass as it
>>> seemed like the active-backup and round-robin were the most-likely
>>> users.  We also wanted to present the code early rather than spending time
>>> supporting this on every-mode to find out that it just wasn't rational
>>> to do it on some of them.
>>>
>>>>>>    I do like one other aspect of the patch, and that's the concept
>>>>>> of overlaying the queue map on top of the balance algorithm.  So, e.g.,
>>>>>> balance-xor would do its usual thing, unless the packet is queue mapped,
>>>>>> in which case the packet's assignment is obeyed.  The balance-xor could
>>>>>> even optionally do its xor across the full set of all slaves output
>>>>>> queues instead of just across the slaves.  Round-robin can operate
>>>>>> similarly.  For those modes, a "balance by queue vs. balance by slave"
>>>>>> seems like a reasonable knob to have.
>>>>> Not sure what you mean here.  In the model implemented by this patch, there is
>>>>> one output queue per slave, and as such, balance by queue == balance by slave.
>>>>> That would make sense in the model you describe earlier in this note, but not in
>>>>> the model presented by this patch.
>>>>       Yes, I was thinking about what I had described; again,
>>>> predicated on my misunderstanding of how it all worked.
>>>>
>>>>>>    I do understand that you're proposing something relatively
>>>>>> simple, and I'm thinking out loud about alternate or additional
>>>>>> implementation details.  Some of this is "ooh ahh what if", but we also
>>>>>> don't want to end up with something that's forwards incompatible, and
>>>>>> I'm hoping to find one solution to multiple problems.
>>>>>>
>>>>> For clarification, can you enumerate what other problems you are trying to
>>>>> solve with this feature, or features similar to this?  From this email, the one
>>>>> that I most clearly see is the desire to allow a passthrough mode of queue
>>>>> selection, which I think I've noted can be done already (even without this
>>>>> patch), by attaching additional tc filters to the slaves output queues directly.
>>>>> What else do you have in mind?
>>>>       As I said above, I hadn't thought of stacking tc actions on to
>>>> the slaves directly, so I was thinking on ways to expose the slave
>>>> queues.
>>>>
>>>>       I still find something intriguing about a round-robin or xor
>>>> mode that robins/xors through all of the slave queues, though, but that
>>>> should be something separate (I'm not sure if such a scheme is actually
>>>> "better", either).
>>>>
>>>>       -J
>> It would be best if there was a solution for the FCoE use case that works
>> with the current bonding modes including 802.3ad.  There is switch support
>> to run mpio FCoE while doing link aggregation on the LAN side that we
>> should support.  I'm not sure the keep_all patch would be good in this
>> case.  Jay, I think you mentioned this at some point, but I missed the
>> conclusion?  Although maybe it would be OK; I'll think about it some more
>> tomorrow.
> 
>         How does that mpio FCOE / switch support function?  Does it rely
> on utilizing ports (for the FC traffic) that are not members of the
> 802.3ad active aggregator?  E.g.:
> 
>        / eth0,eth1 -- switch A -- etc
> bond0 -
>        \ eth2,eth3 -- switch B -- etc
> 
>         Bond0 has four slaves, eth0 - eth3.  Eth0 and eth1 connect to
> switch A; eth2 and eth3 to switch B.  Presuming that the switches aren't
> stacked / magic, either eth0/eth1 or eth2/eth3 will be the active
> aggregator (linux bonding only supports one active aggregator).
> 
>         Am I correct in presuming that the FCOE balancer gizmo doesn't
> care about the 802.3ad state of the ports, and it and the switch will
> run FC traffic across all four ports, regardless of which ports are
> active and which are not?
> 
>         Or is the switch even simpler than that, and it processes all
> FCOE traffic to ports, regardless of how the ports are configured
> (etherchannel, 802.3ad, etc)?
>

The switch is simple and FCoE traffic is bound to ports which are in 
turn bound to VSAN (virtual SAN) ports on the FC side.  So the FCoE 
traffic works regardless of how the ports are configured (etherchannel, 
802.3ad, etc).

>         As for other bonding modes, balance-xor or balance-rr (round
> robin) shouldn't have the same problems with the duplicate suppression
> logic that active-backup and 802.3ad have.  The alb/tlb modes might or
> might not be workable at all, depending upon how the FCOE traffic looks
> (e.g., what source and destination MAC addresses are in the FCOE
> frames?).

We have the ability to use a SAN MAC that can be different than the LAN
MAC which should allow this to work.

> 
>         In any event, wanting to run FCOE in conjunction with a variety
> of bonding modes suggests that Neil was right all along, and the
> duplicate suppressor change should be an option, not a new mode.

This should be OK, but all FCoE really needs is for skbs to be delivered to
devices that have packet handlers with exact matches.  Basically, making
the inactive VLAN case behave the same as an inactive real device.

Thanks,
John


> 
>         -J
> 
> ---
>         -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
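
The round-robin behaviour discussed in this thread (round-robin by
default, unless the skb carries a queue id that is mapped to a slave, in
which case that slave is used) can be sketched in userspace C.  This is
an illustration only: struct bond_sketch, select_output_slave() and the
convention that queue id 0 means "no mapping" are assumptions made for
the example, not the code from the patch.

```c
#include <assert.h>

#define NO_SLAVE (-1)

/* Hypothetical, simplified stand-in for the bond's transmit state. */
struct bond_sketch {
	int num_slaves;            /* slaves currently usable for transmit */
	unsigned int rr_counter;   /* round-robin cursor */
	const int *queue_to_slave; /* index: queue id, value: slave index or NO_SLAVE */
	int num_queues;            /* number of entries in queue_to_slave */
};

/*
 * Pick the output slave for one packet.
 * A mapped queue id overrides the default policy; anything else
 * (queue id 0, out-of-range id, unmapped id) falls back to round-robin.
 */
int select_output_slave(struct bond_sketch *bond, int queue_id)
{
	assert(bond != 0);

	if (bond->num_slaves <= 0)
		return NO_SLAVE;

	if (queue_id > 0 && queue_id < bond->num_queues) {
		int slave = bond->queue_to_slave[queue_id];

		if (slave >= 0 && slave < bond->num_slaves)
			return slave;
	}

	return (int)(bond->rr_counter++ % (unsigned int)bond->num_slaves);
}
```

In this sketch a failed slave would be handled by shrinking num_slaves
and clearing its map entries, so affected traffic falls back to the
default policy instead of being dropped.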



* Re: [net-next-2.6 PATCH 1/3] e1000: fix WARN_ON with mac-vlan
From: David Miller @ 2010-05-14 10:14 UTC
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, jpirko, jesse.brandeburg
In-Reply-To: <20100514012425.30457.23799.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 13 May 2010 18:25:33 -0700

> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> 
> When adding more than 14 mac-vlan adapters on e1000 the driver
> would fire a WARN_ON when adding the 15th.  The WARN_ON in this
> case is completely un-necessary, as the code below the WARN_ON is
> directly handling the value the WARN_ON triggered on.
> 
> CC: Jiri Pirko <jpirko@redhat.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.


* Re: [net-next-2.6 PATCH 2/3] e1000: cleanup unused parameters
From: David Miller @ 2010-05-14 10:14 UTC
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, jesse.brandeburg
In-Reply-To: <20100514012554.30457.66528.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 13 May 2010 18:25:56 -0700

> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> 
> During the cleanup pass after the removal of e1000e hardware from e1000 some
> parameters were missed.  Remove them because it is just dead code.
> 
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.


* Re: [net-next-2.6 PATCH 3/3] ixgb and e1000: Use new function for copybreak tests
From: David Miller @ 2010-05-14 10:14 UTC
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, joe
In-Reply-To: <20100514012615.30457.37881.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 13 May 2010 18:26:17 -0700

> From: Joe Perches <joe@perches.com>
> 
> There appears to be an off-by-1 defect in the maximum packet size
> copied when copybreak is specified in these modules.
> 
> The copybreak module params are specified as:
> "Maximum size of packet that is copied to a new buffer on receive"
> 
> The tests are changed from "< copybreak" to "<= copybreak"
> and moved into new static functions for readability.
> 
> Signed-off-by: Joe Perches <joe@perches.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.
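
The off-by-one described in the quoted commit message comes down to one
comparison.  A minimal sketch in C, assuming a hypothetical helper name
(rx_len_within_copybreak is not the function added by the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * copybreak is documented as the MAXIMUM size of packet that is copied
 * to a new buffer on receive, so a packet of exactly copybreak bytes
 * has to qualify. The old test used '<' and excluded that case.
 */
bool rx_len_within_copybreak(size_t len, size_t copybreak)
{
	return len <= copybreak;
}
```

With copybreak set to 256, a 256-byte packet is now copied and a
257-byte packet is not.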


* Re: [PATCH 0/6] sky2: update
From: David Miller @ 2010-05-14 10:15 UTC
  To: shemminger; +Cc: mikem, netdev
In-Reply-To: <20100513161247.833356588@vyatta.com>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Thu, 13 May 2010 09:12:47 -0700

> Bunch of patches from Mike, with some additional comments.

All applied to net-next-2.6, thanks.


* Re: [PATCH 0/9]qlcnic: cleanup
From: David Miller @ 2010-05-14 10:15 UTC
  To: amit.salecha; +Cc: netdev, ameen.rahman
In-Reply-To: <1273756070-7205-1-git-send-email-amit.salecha@qlogic.com>

From: Amit Kumar Salecha <amit.salecha@qlogic.com>
Date: Thu, 13 May 2010 06:07:41 -0700

> Hi
>   Series of 9 patches to cleanup unused code and to support quiescent
>   mode. 

All applied, thanks.


* Re: [PATCH] netfilter: Remove skb_is_nonlinear check from nf_conntrack_sip
From: Patrick McHardy @ 2010-05-14 10:16 UTC
  To: Jan Engelhardt; +Cc: Jason Gunthorpe, netfilter-devel, netdev
In-Reply-To: <alpine.LSU.2.01.1005140849500.28602@obet.zrqbmnf.qr>

Jan Engelhardt wrote:
> On Friday 2010-05-14 02:38, Jason Gunthorpe wrote:
> 
>> At least the XEN net front driver always produces non linear skbs,
>> so the SIP module does nothing at all when used with that NIC.
>>
>> Copy the hacky technique for accessing SKB data from the ftp conntrack,
>> better than nothing..
>>
>> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
>>
>> +/* This is slow, but it's simple. --RR */
>> +static char *sip_buffer;
>> +static DEFINE_SPINLOCK(nf_sip_lock);
>> +
> 
> skb_linearize seems simpler. (What about the cost?)

Yeah, we have to use skb_linearize(). The SIP NAT helper might mangle
the packet and alter its size, at which point we'd have to make a new
copy of the data area to get the offsets right.


* Re: [net-next-2.6 V6 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Patrick McHardy @ 2010-05-14 10:47 UTC
  To: Scott Feldman; +Cc: davem, netdev, chrisw, arnd
In-Reply-To: <C811BD6D.312C4%scofeldm@cisco.com>

Scott Feldman wrote:
> On 5/13/10 1:40 PM, "Patrick McHardy" <kaber@trash.net> wrote:
> 
>>> +  if (vf_port[IFLA_VF_PORT_VF])
>>> +   vf = nla_get_u32(vf_port[IFLA_VF_PORT_VF]);
>>> +  err = -EOPNOTSUPP;
>>> +  if (ops->ndo_set_vf_port)
>>> +   err = ops->ndo_set_vf_port(dev, vf, vf_port);
>> This appears to be addressing a single VF to issue commands.
>> I already explained this during the last set of VF patches,
>> messages are supposed to be symmetrical: since you're dumping
>> state for all existing VFs, you also need to accept configuration
>> for multiple VFs. Basically, the kernel must be able to receive
>> a message it created during a dump and fully recreate the state.
> 
> This was modeled same as existing IFLA_VF_ cmd where single VF is addressed
> on set, but all VFs for PF are dumped on get.

Yes, that one should have been done differently as well,
unfortunately my comments were ignored. So far rtnetlink
had two properties that are now broken:

- messages sent by the kernel could be sent back to the
  kernel to re-create an object in the same state

- the same parsing functions could be used in userspace for
  messages sent by the kernel and netlink error messages,
  which contain the original userspace message

I know of at least one program I wrote a few years ago which
relies on the second property. Anyway, this is easily fixable
by encapsulating all top-level VF attributes in a list and
invoking the ndo_set_vf_port() callback for each VF configuration.

^ permalink raw reply

* Re: [net-next-2.6 V7 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Patrick McHardy @ 2010-05-14 10:58 UTC (permalink / raw)
  To: Scott Feldman; +Cc: davem, netdev, chrisw, arnd
In-Reply-To: <20100514013526.1816.45104.stgit@savbu-pc100.cisco.com>

Scott Feldman wrote:
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -653,6 +653,26 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev)
>  		return 0;
>  }
>  
> +static size_t rtnl_vf_port_size(const struct net_device *dev)
> +{
> +	size_t vf_port_size = nla_total_size(sizeof(struct nlattr))
> +						     /* VF_PORT_VF */
> +		+ nla_total_size(VF_PORT_PROFILE_MAX)/* VF_PORT_PROFILE */
> +		+ nla_total_size(sizeof(struct ifla_vf_port_vsi))
> +						     /* VF_PORT_VSI_TYPE */
> +		+ nla_total_size(VF_PORT_UUID_MAX)   /* VF_PORT_VSI_INSTANCE */
> +		+ nla_total_size(VF_PORT_UUID_MAX)   /* VF_PORT_HOST_UUID */
> +		+ nla_total_size(1)		     /* VF_PORT_VDP_REQUEST */

Do messages generated by the kernel really contain a request?

> +		+ nla_total_size(2);		     /* VF_PORT_VDP_RESPONSE */
> +
> +	if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent)
> +		return 0;
> +	if (dev_num_vf(dev->dev.parent))
> +		return vf_port_size * dev_num_vf(dev->dev.parent);
> +	else
> +		return vf_port_size;
> +}
> +


> +static int rtnl_vf_port_fill_nest(struct sk_buff *skb, struct net_device *dev,
> +				  int vf)
> +{
> +	struct nlattr *data;
> +	int err;
> +
> +	data = nla_nest_start(skb, IFLA_VF_PORT);

We usually use a top-level attribute to encapsulate lists of identical
attributes. The other iflink attributes may only occur once and are
usually parsed using nla_parse_nested(), which will parse all
IFLA_VF_PORT attributes, but only return the last one.

Something like:

iflink message:
...
[IFLA_VF_PORTS]
  [IFLA_VF_PORT]
    [IFLA_VF_PORT_*], ...
  [IFLA_VF_PORT]
    [IFLA_VF_PORT_*], ...
  ...
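
To illustrate the difference, here is a rough userspace model. It is only a
sketch: struct attr, parse_into_table() and for_each_in_list() are made-up
stand-ins, not the kernel's nlattr API. A table-style parse keeps one slot
per attribute type, so repeated entries collapse to the last one, while
walking a container visits every entry.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for an attribute: a type plus one value.
 * This is not the kernel's struct nlattr, only a model of the idea. */
struct attr {
	int type;
	int value;
};

enum { ATTR_UNSPEC, ATTR_VF_PORT, ATTR_MAX = ATTR_VF_PORT };

/* Three attributes of the same type, as a dump of three VFs might emit. */
static const struct attr demo_msg[] = {
	{ ATTR_VF_PORT, 10 }, { ATTR_VF_PORT, 20 }, { ATTR_VF_PORT, 30 },
};
#define DEMO_LEN (sizeof(demo_msg) / sizeof(demo_msg[0]))

/* Table-style parse: one slot per type, so a repeated type overwrites
 * the earlier entry and only the last one survives. */
static void parse_into_table(const struct attr *attrs, size_t n,
			     const struct attr **tb)
{
	size_t i;

	for (i = 0; i <= ATTR_MAX; i++)
		tb[i] = NULL;
	for (i = 0; i < n; i++)
		if (attrs[i].type > 0 && attrs[i].type <= ATTR_MAX)
			tb[attrs[i].type] = &attrs[i];
}

/* List-style walk: hand every matching entry to a callback.
 * Returns the number of entries visited. */
static size_t for_each_in_list(const struct attr *attrs, size_t n,
			       void (*cb)(const struct attr *, void *),
			       void *ctx)
{
	size_t i, seen = 0;

	for (i = 0; i < n; i++) {
		if (attrs[i].type != ATTR_VF_PORT)
			continue;
		cb(&attrs[i], ctx);
		seen++;
	}
	return seen;
}

/* Example callback: accumulate the values it is given. */
static void sum_values(const struct attr *a, void *ctx)
{
	*(int *)ctx += a->value;
}

/* Value that survives the table parse of demo_msg. */
int last_value_from_table(void)
{
	const struct attr *tb[ATTR_MAX + 1];

	parse_into_table(demo_msg, DEMO_LEN, tb);
	assert(tb[ATTR_VF_PORT] != NULL);
	return tb[ATTR_VF_PORT]->value;
}

/* Sum over all entries when demo_msg is walked as a list. */
int sum_from_list(void)
{
	int sum = 0;
	size_t seen = for_each_in_list(demo_msg, DEMO_LEN, sum_values, &sum);

	assert(seen == DEMO_LEN);
	return sum;
}
```

In this model last_value_from_table() sees only the final entry, whereas
sum_from_list() reaches all three, which is the behaviour a per-VF callback
would need.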


> +	if (!data)
> +		return -EMSGSIZE;
> +
> +	if (vf != VF_PORT_VF_NOT_USED)
> +		nla_put_u32(skb, IFLA_VF_PORT_VF, vf);

This should be checking for errors or use NLA_PUT_U32.

> +
> +	err = dev->netdev_ops->ndo_get_vf_port(dev, vf, skb);
> +	if (err) {
> +		nla_nest_cancel(skb, data);
> +		return err;
> +	}
> +
> +	nla_nest_end(skb, data);
> +
> +	return 0;
> +}
> +

>  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>  			    int type, u32 pid, u32 seq, u32 change,
>  			    unsigned int flags)
> @@ -747,17 +819,23 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>  		goto nla_put_failure;
>  	copy_rtnl_link_stats64(nla_data(attr), stats);
>  
> +	if (dev->dev.parent)
> +		NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));

Just wondering, is the only case where dev.parent is non-NULL
really when virtual ports are present?

> +
>  	if (dev->netdev_ops->ndo_get_vf_config && dev->dev.parent) {
>  		int i;
>  		struct ifla_vf_info ivi;
>  
> -		NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
>  		for (i = 0; i < dev_num_vf(dev->dev.parent); i++) {
>  			if (dev->netdev_ops->ndo_get_vf_config(dev, i, &ivi))
>  				break;
>  			NLA_PUT(skb, IFLA_VFINFO, sizeof(ivi), &ivi);
>  		}
>  	}
> +
> +	if (rtnl_vf_port_fill(skb, dev))
> +		goto nla_put_failure;
> +
>  	if (dev->rtnl_link_ops) {
>  		if (rtnl_link_fill(skb, dev) < 0)
>  			goto nla_put_failure;
> @@ -824,6 +902,7 @@ const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>  				    .len = sizeof(struct ifla_vf_vlan) },
>  	[IFLA_VF_TX_RATE]	= { .type = NLA_BINARY,
>  				    .len = sizeof(struct ifla_vf_tx_rate) },
> +	[IFLA_VF_PORT]		= { .type = NLA_NESTED },
>  };
>  EXPORT_SYMBOL(ifla_policy);
>  
> @@ -832,6 +911,20 @@ static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
>  	[IFLA_INFO_DATA]	= { .type = NLA_NESTED },
>  };
>  
> +static const struct nla_policy ifla_vf_port_policy[IFLA_VF_PORT_MAX+1] = {
> +	[IFLA_VF_PORT_VF]	    = { .type = NLA_U32 },
> +	[IFLA_VF_PORT_PROFILE]	    = { .type = NLA_STRING,
> +					.len = VF_PORT_PROFILE_MAX },
> +	[IFLA_VF_PORT_VSI_TYPE]     = { .type = NLA_BINARY,
> +					.len = sizeof(struct ifla_vf_port_vsi)},
> +	[IFLA_VF_PORT_INSTANCE_UUID]= { .type = NLA_BINARY,
> +					.len = VF_PORT_UUID_MAX },
> +	[IFLA_VF_PORT_HOST_UUID]    = { .type = NLA_STRING,
> +					.len = VF_PORT_UUID_MAX },
> +	[IFLA_VF_PORT_REQUEST]	    = { .type = NLA_U8, },
> +	[IFLA_VF_PORT_RESPONSE]	    = { .type = NLA_U16, },
> +};
> +
>  struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[])
>  {
>  	struct net *net;
> @@ -1028,6 +1121,27 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
>  	}
>  	err = 0;
>  
> +	if (tb[IFLA_VF_PORT]) {
> +		struct nlattr *vf_port[IFLA_VF_PORT_MAX+1];
> +		int vf = VF_PORT_VF_NOT_USED;
> +
> +		err = nla_parse_nested(vf_port, IFLA_VF_PORT_MAX,
> +			tb[IFLA_VF_PORT], ifla_vf_port_policy);
> +		if (err < 0)
> +			goto errout;
> +
> +		if (vf_port[IFLA_VF_PORT_VF])
> +			vf = nla_get_u32(vf_port[IFLA_VF_PORT_VF]);
> +
> +		err = -EOPNOTSUPP;
> +		if (ops->ndo_set_vf_port)
> +			err = ops->ndo_set_vf_port(dev, vf, vf_port);
> +		if (err < 0)
> +			goto errout;
> +		modified = 1;
> +	}
> +	err = 0;
> +
>  errout:
>  	if (err < 0 && modified && net_ratelimit())
>  		printk(KERN_WARNING "A link change request failed with "
> 


^ permalink raw reply

* Re: [GIT PULL] last minute vhost-net fix
From: David Miller @ 2010-05-14 11:04 UTC (permalink / raw)
  To: mst; +Cc: kvm, virtualization, netdev, linux-kernel
In-Reply-To: <20100513084433.GA23082@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Thu, 13 May 2010 11:44:34 +0300

> David, if it's not too late, please pull the following
> last minute fix into 2.6.34.

Pulled, thanks.

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2010-05-14 11:06 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


One small last minute fix from the VHOST folks to deal with some
memory barrier issues.

Please pull, thanks a lot!

The following changes since commit cea0d767c29669bf89f86e4aee46ef462d2ebae8:
  Linus Torvalds (1):
        Merge branch 'hwmon-for-linus' of git://git.kernel.org/.../jdelvare/staging

are available in the git repository at:

  master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master

David S. Miller (1):
      Merge branch 'net-2.6' of git://git.kernel.org/.../mst/vhost

Michael S. Tsirkin (1):
      vhost: fix barrier pairing

 drivers/vhost/vhost.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

^ permalink raw reply

* Re: [net-next-2.6 V7 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Arnd Bergmann @ 2010-05-14 12:12 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Scott Feldman, davem, netdev, chrisw
In-Reply-To: <4BED2CD8.4020209@trash.net>

On Friday 14 May 2010, Patrick McHardy wrote:
> Scott Feldman wrote:
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -653,6 +653,26 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev)
> >  		return 0;
> >  }
> >  
> > +static size_t rtnl_vf_port_size(const struct net_device *dev)
> > +{
> > +	size_t vf_port_size = nla_total_size(sizeof(struct nlattr))
> > +						     /* VF_PORT_VF */
> > +		+ nla_total_size(VF_PORT_PROFILE_MAX)/* VF_PORT_PROFILE */
> > +		+ nla_total_size(sizeof(struct ifla_vf_port_vsi))
> > +						     /* VF_PORT_VSI_TYPE */
> > +		+ nla_total_size(VF_PORT_UUID_MAX)   /* VF_PORT_VSI_INSTANCE */
> > +		+ nla_total_size(VF_PORT_UUID_MAX)   /* VF_PORT_HOST_UUID */
> > +		+ nla_total_size(1)		     /* VF_PORT_VDP_REQUEST */
> 
> Do messages generated by the kernel really contain a request?

Yes, the request field of the VDP message shows the status (e.g. associated or
disassociated).

> > +static int rtnl_vf_port_fill_nest(struct sk_buff *skb, struct net_device *dev,
> > +				  int vf)
> > +{
> > +	struct nlattr *data;
> > +	int err;
> > +
> > +	data = nla_nest_start(skb, IFLA_VF_PORT);
> 
> We usually use a top-level attribute to encapsulate lists of identical
> attributes. The other iflink attributes may only occur once and are
> usually parsed using nla_parse_nested(), which will parse all
> IFLA_VF_PORT attributes, but only return the last one.
> 
> Something like:
> 
> iflink message:
> ...
> [IFLA_VF_PORTS]
>   [IFLA_VF_PORT]
>     [IFLA_VF_PORT_*], ...
>   [IFLA_VF_PORT]
>     [IFLA_VF_PORT_*], ...
>   ...

Ah, I was wondering about this already. Does this mean that IFLA_VFINFO
does this incorrectly as well?

> >  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
> >  			    int type, u32 pid, u32 seq, u32 change,
> >  			    unsigned int flags)
> > @@ -747,17 +819,23 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
> >  		goto nla_put_failure;
> >  	copy_rtnl_link_stats64(nla_data(attr), stats);
> >  
> > +	if (dev->dev.parent)
> > +		NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
> 
> Just wondering, is the only case where dev.parent is non-NULL
> really when virtual ports are present?

No, but if parent is NULL, we must not call dev_num_vf(). The way that enic
needs the attributes, they can be either for the VF of dev->dev.parent (the
PCI PF), or for the PF itself, even if it does not have VFs, in which case
it would be interesting to have IFLA_NUM_VF = 0 in the output.

Maybe a better structure would be to separate the two cases, also allowing
a port profile to be associated with both the PF and with each of its VFs?

Something like this:

[IFLA_NUM_VF]
[IFLA_VF_PORTS]
  [IFLA_VF_PORT]
    [IFLA_VF_PORT_*], ...
  [IFLA_VF_PORT]
    [IFLA_VF_PORT_*], ...
[IFLA_PORT_SELF]
  [IFLA_VF_PORT_*], ...

	Arnd

^ permalink raw reply

* [RFC] NF: IP tables idletimer target implementation
From: Luciano Coelho @ 2010-05-14 12:50 UTC (permalink / raw)
  To: netdev; +Cc: Timo Teras

This patch implements an idletimer IP tables target that can be used to
identify when interfaces have been idle for a certain period of time.

It adds a file to the sysfs for each interface that is brought up.  The file
contains the time remaining before the event is triggered.  This file can
also be used to set the timer manually.

The default timeout should be set when the IP table rule is defined with the
--timeout parameter set.
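
As a rough illustration of the intended semantics (a userspace model only,
not the module code): each matching packet pushes the deadline out again, and
the sysfs file reports the whole seconds left. TICKS_PER_SEC stands in for HZ
with an assumed value, and struct idle_timer and its helpers are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>

#define TICKS_PER_SEC 100UL	/* stand-in for HZ; the value is assumed */

struct idle_timer {
	unsigned long expires;	/* absolute tick at which the timer fires */
	int armed;
};

/* Called for every matching packet: push the deadline out again. */
void idle_timer_touch(struct idle_timer *t, unsigned long now,
		      unsigned int timeout_sec)
{
	assert(t != NULL);
	t->expires = now + TICKS_PER_SEC * timeout_sec;
	t->armed = 1;
}

/* Whole seconds left before expiry; 0 if not armed or already due.
 * The signed difference keeps the comparison valid across tick wraparound. */
unsigned long idle_timer_remaining(const struct idle_timer *t,
				   unsigned long now)
{
	assert(t != NULL);
	if (!t->armed || (long)(t->expires - now) <= 0)
		return 0;
	return (t->expires - now) / TICKS_PER_SEC;
}

/* Arm a timer at tick 1000 and report what is left after some ticks. */
unsigned long demo_remaining(unsigned int timeout_sec,
			     unsigned long elapsed_ticks)
{
	struct idle_timer t = { 0, 0 };
	unsigned long start = 1000;

	idle_timer_touch(&t, start, timeout_sec);
	return idle_timer_remaining(&t, start + elapsed_ticks);
}
```

With a 30 second timeout, demo_remaining(30, 1500) gives 15 in this model,
and any call at or past the deadline gives 0.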

This implementation was originally done by Timo Teras and a few other people
who have sent patches with updates and fixes.  It has lived for a while in
the linux-omap tree, but was removed when linux-omap was aligned with
upstream.  Now the patch has been forward-ported, which includes a few
changes related to net namespaces, x_tables etc.

While this is not the best approach for interface idle time monitoring, it is
non-intrusive and fits well in the existing architecture without any major
changes to the networking subsystem.

Cc: Timo Teras <timo.teras@iki.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com>
---
 include/linux/netfilter_ipv4/ipt_IDLETIMER.h |   22 ++
 net/ipv4/netfilter/Kconfig                   |   17 ++
 net/ipv4/netfilter/Makefile                  |    1 +
 net/ipv4/netfilter/ipt_IDLETIMER.c           |  320 ++++++++++++++++++++++++++
 4 files changed, 360 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter_ipv4/ipt_IDLETIMER.h
 create mode 100644 net/ipv4/netfilter/ipt_IDLETIMER.c

diff --git a/include/linux/netfilter_ipv4/ipt_IDLETIMER.h b/include/linux/netfilter_ipv4/ipt_IDLETIMER.h
new file mode 100644
index 0000000..89993e2
--- /dev/null
+++ b/include/linux/netfilter_ipv4/ipt_IDLETIMER.h
@@ -0,0 +1,22 @@
+/*
+ * linux/include/linux/netfilter_ipv4/ipt_IDLETIMER.h
+ *
+ * Header file for IP tables timer target module.
+ *
+ * Copyright (C) 2004 Nokia Corporation
+ * Written by Timo Teräs <ext-timo.teras@nokia.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _IPT_TIMER_H
+#define _IPT_TIMER_H
+
+struct ipt_idletimer_info {
+	unsigned int timeout;
+};
+
+#endif
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 1833bdb..91fba9a 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -204,6 +204,23 @@ config IP_NF_TARGET_REDIRECT
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP_NF_TARGET_IDLETIMER
+	tristate  "IDLETIMER target support"
+	depends on IP_NF_IPTABLES
+	help
+	  This option adds a `IDLETIMER' target. Each matching packet resets
+	  the timer associated with input and/or output interfaces. Timer
+	  expiry causes kobject uevent. Idle timer can be read via sysfs.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
+config IP_NF_TARGET_IDLETIMER_DEBUG
+	bool "IDLETIMER target debugging"
+	help
+	  Say Y here if you want to get debugging information when using the
+	  IDLETIMER target.  If unsure, say N.
+
+
 config NF_NAT_SNMP_BASIC
 	tristate "Basic SNMP-ALG support"
 	depends on NF_NAT
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 4811159..60bdaf1 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -60,6 +60,7 @@ obj-$(CONFIG_IP_NF_TARGET_MASQUERADE) += ipt_MASQUERADE.o
 obj-$(CONFIG_IP_NF_TARGET_NETMAP) += ipt_NETMAP.o
 obj-$(CONFIG_IP_NF_TARGET_REDIRECT) += ipt_REDIRECT.o
 obj-$(CONFIG_IP_NF_TARGET_REJECT) += ipt_REJECT.o
+obj-$(CONFIG_IP_NF_TARGET_IDLETIMER) += ipt_IDLETIMER.o
 obj-$(CONFIG_IP_NF_TARGET_ULOG) += ipt_ULOG.o
 
 # generic ARP tables
diff --git a/net/ipv4/netfilter/ipt_IDLETIMER.c b/net/ipv4/netfilter/ipt_IDLETIMER.c
new file mode 100644
index 0000000..2c5b465
--- /dev/null
+++ b/net/ipv4/netfilter/ipt_IDLETIMER.c
@@ -0,0 +1,320 @@
+/*
+ * linux/net/ipv4/netfilter/ipt_IDLETIMER.c
+ *
+ * Netfilter module to trigger a timer when packet matches.
+ * After timer expires a kevent will be sent.
+ *
+ * Copyright (C) 2004, 2010 Nokia Corporation
+ * Written by Timo Teras <ext-timo.teras@nokia.com>
+ *
+ * Contact: Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/timer.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/notifier.h>
+#include <linux/netfilter.h>
+#include <linux/rtnetlink.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter_ipv4/ipt_IDLETIMER.h>
+#include <linux/kobject.h>
+#include <linux/workqueue.h>
+
+#ifdef CONFIG_IP_NF_TARGET_IDLETIMER_DEBUG
+#define DEBUGP(format, args...) printk(KERN_DEBUG \
+				       "ipt_IDLETIMER:%s:" format "\n", \
+				       __func__ , ## args)
+#else
+#define DEBUGP(format, args...)
+#endif
+
+/*
+ * Internal timer management.
+ */
+static ssize_t utimer_attr_show(struct device *dev,
+				struct device_attribute *attr, char *buf);
+static ssize_t utimer_attr_store(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t count);
+
+struct utimer_t {
+	char name[IFNAMSIZ];
+	struct list_head entry;
+	struct timer_list timer;
+	struct work_struct work;
+	struct net *net;
+};
+
+static LIST_HEAD(active_utimer_head);
+static DEFINE_SPINLOCK(list_lock);
+static DEVICE_ATTR(idletimer, 0644, utimer_attr_show, utimer_attr_store);
+
+static void utimer_delete(struct utimer_t *timer)
+{
+	DEBUGP("Deleting timer '%s'\n", timer->name);
+
+	list_del(&timer->entry);
+	del_timer_sync(&timer->timer);
+	put_net(timer->net);
+	kfree(timer);
+}
+
+static void utimer_work(struct work_struct *work)
+{
+	struct utimer_t *timer = container_of(work, struct utimer_t, work);
+	struct net_device *netdev = NULL;
+
+	netdev = dev_get_by_name(timer->net, timer->name);
+
+	if (netdev != NULL) {
+		sysfs_notify(&netdev->dev.kobj, NULL,
+			     "idletimer");
+		dev_put(netdev);
+	}
+}
+
+static void utimer_expired(unsigned long data)
+{
+	struct utimer_t *timer = (struct utimer_t *) data;
+
+	DEBUGP("Timer '%s' expired\n", timer->name);
+
+	spin_lock_bh(&list_lock);
+	utimer_delete(timer);
+	spin_unlock_bh(&list_lock);
+
+	schedule_work(&timer->work);
+}
+
+static struct utimer_t *utimer_create(const char *name,
+				      struct net *net)
+{
+	struct utimer_t *timer;
+
+	timer = kmalloc(sizeof(struct utimer_t), GFP_ATOMIC);
+	if (timer == NULL)
+		return NULL;
+
+	list_add(&timer->entry, &active_utimer_head);
+	strlcpy(timer->name, name, sizeof(timer->name));
+	timer->net = get_net(net);
+
+	init_timer(&timer->timer);
+	timer->timer.function = utimer_expired;
+	timer->timer.data = (unsigned long) timer;
+
+	INIT_WORK(&timer->work, utimer_work);
+
+	DEBUGP("Created timer '%s'\n", timer->name);
+
+	return timer;
+}
+
+static struct utimer_t *__utimer_find(const char *name, const struct net *net)
+{
+	struct utimer_t *entry;
+
+	list_for_each_entry(entry, &active_utimer_head, entry) {
+		if (!strcmp(name, entry->name) && net == entry->net)
+			return entry;
+	}
+
+	return NULL;
+}
+
+static void utimer_modify(const char *name,
+			  struct net *net,
+			  unsigned long expires)
+{
+	struct utimer_t *timer;
+
+	DEBUGP("Modifying timer '%s'\n", name);
+	spin_lock_bh(&list_lock);
+	timer = __utimer_find(name, net);
+	if (timer == NULL)
+		timer = utimer_create(name, net);
+	mod_timer(&timer->timer, expires);
+	spin_unlock_bh(&list_lock);
+}
+
+static ssize_t utimer_attr_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	struct utimer_t *timer;
+	struct net_device *netdev = to_net_dev(dev);
+	unsigned long expires = 0;
+
+	spin_lock_bh(&list_lock);
+	timer = __utimer_find(netdev->name, dev_net(netdev));
+	if (timer)
+		expires = timer->timer.expires;
+	spin_unlock_bh(&list_lock);
+
+	if (expires)
+		return sprintf(buf, "%lu\n", (expires-jiffies) / HZ);
+
+	return sprintf(buf, "0\n");
+}
+
+static ssize_t utimer_attr_store(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t count)
+{
+	int expires;
+	struct net_device *netdev = to_net_dev(dev);
+
+	if (sscanf(buf, "%d", &expires) == 1) {
+		if (expires > 0)
+			utimer_modify(netdev->name,
+				      dev_net(netdev),
+				      jiffies+HZ*(unsigned long)expires);
+	}
+
+	return count;
+}
+
+static int utimer_notifier_call(struct notifier_block *this,
+				unsigned long event, void *ptr)
+{
+	struct net_device *netdev = ptr;
+	int ret;
+
+	switch (event) {
+	case NETDEV_UP:
+		DEBUGP("NETDEV_UP: %s\n", netdev->name);
+		ret = device_create_file(&netdev->dev,
+					 &dev_attr_idletimer);
+		WARN_ON(ret);
+
+		break;
+	case NETDEV_DOWN:
+		DEBUGP("NETDEV_DOWN: %s\n", netdev->name);
+		device_remove_file(&netdev->dev,
+				   &dev_attr_idletimer);
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block utimer_notifier_block = {
+	.notifier_call	= utimer_notifier_call,
+};
+
+
+static int utimer_init(void)
+{
+	return register_netdevice_notifier(&utimer_notifier_block);
+}
+
+static void utimer_fini(void)
+{
+	struct utimer_t *entry, *next;
+	struct net_device *dev;
+	struct net *net;
+
+	list_for_each_entry_safe(entry, next, &active_utimer_head, entry)
+		utimer_delete(entry);
+
+	rtnl_lock();
+	unregister_netdevice_notifier(&utimer_notifier_block);
+	for_each_net(net) {
+		for_each_netdev(net, dev) {
+			utimer_notifier_call(&utimer_notifier_block,
+					     NETDEV_DOWN, dev);
+		}
+	}
+	rtnl_unlock();
+}
+
+/*
+ * The actual iptables plugin.
+ */
+static unsigned int ipt_idletimer_target(struct sk_buff *skb,
+					 const struct xt_action_param *par)
+{
+	const struct ipt_idletimer_info *target = par->targinfo;
+	unsigned long expires;
+
+	expires = jiffies + HZ*target->timeout;
+
+	if (par->in != NULL)
+		utimer_modify(par->in->name,
+			      dev_net(par->in),
+			      expires);
+
+	if (par->out != NULL)
+		utimer_modify(par->out->name,
+			      dev_net(par->out),
+			      expires);
+
+	return XT_CONTINUE;
+}
+
+static int ipt_idletimer_checkentry(const struct xt_tgchk_param *par)
+{
+	const struct ipt_idletimer_info *info = par->targinfo;
+
+	if (info->timeout == 0) {
+		DEBUGP("timeout value is zero\n");
+		return false;
+	}
+
+	return true;
+}
+
+static struct xt_target ipt_idletimer = {
+	.name		= "IDLETIMER",
+	.family		= NFPROTO_IPV4,
+	.target		= ipt_idletimer_target,
+	.targetsize     = sizeof(struct ipt_idletimer_info),
+	.checkentry	= ipt_idletimer_checkentry,
+	.me		= THIS_MODULE,
+};
+
+static int __init init(void)
+{
+	int ret;
+
+	ret = utimer_init();
+	if (ret)
+		return ret;
+
+	ret =  xt_register_target(&ipt_idletimer);
+	if (ret < 0) {
+		utimer_fini();
+		return ret;
+	}
+
+	return 0;
+}
+
+static void __exit fini(void)
+{
+	xt_unregister_target(&ipt_idletimer);
+	utimer_fini();
+}
+
+module_init(init);
+module_exit(fini);
+
+MODULE_AUTHOR("Timo Teras <ext-timo.teras@nokia.com>");
+MODULE_DESCRIPTION("iptables idletimer target module");
+MODULE_LICENSE("GPL");
-- 
1.6.3.3


^ permalink raw reply related

* loosing IPMI-card by loading netconsole
From: Henning Fehrmann @ 2010-05-14 13:45 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: Jesse Brandeburg, Bruce Allan, PJ Waskiewicz, John Ronciak,
	netdev, Matt Mackall, Carsten Aulbert, Tejun Heo

Hello,

We have SuperMicro PDSM 2+ boards together with 82573E and 82573L Intel NICs.
Additionally, we have IPMI cards which are tunneled via the 82573E NIC.

We are using the e1000e driver for NICs together with the netconsole driver.
(netconsole netconsole=4444@client_IP/eth1,514@server_IP/server_MAC). Netconsole
is using the NIC which is NOT used by the IPMI card.

Usually we are able to access the IPMI card remotely with ipmitools.

With kernel 2.6.27.39 installed, everything worked together: the NICs,
remote access to the IPMI cards, and netconsole.

The driver version of e1000e is 0.3.3.3-k6. 

With a more recent kernel, 2.6.32.7, we lose the ability to access the IPMI
card remotely when the netconsole driver is loaded.
The IPMI card is accessible before the netconsole driver is loaded and
disappears once we use netconsole. Even unloading netconsole does not help then.

We get the IPMI card back when 'removing' eth0:
echo 1 > /sys/devices/pci0000:00/0000:00:1c.4/0000:0d:00.0/remove

0d:00.0 is eth0.

The version of the e1000e driver is 1.0.2-k2.

I compiled and loaded a later version of this driver (1.1.19) without solving this problem.

For eth0 (82573E) we have a firmware version 0.15-4 installed and for eth1 we use the firmware 
version 0.5-7. But this is the same for both kernel versions. 

Do you have an idea? 

Thank you and cheers,
Henning

^ permalink raw reply

* [PATCH] fsl_pq_mdio: Fix mdiobus allocation handling
From: Anton Vorontsov @ 2010-05-14 14:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linuxppc-dev

The driver could return success code even if mdiobus_alloc() failed.
This patch fixes the issue.
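
The failure mode can be sketched in plain userspace C. This is a hypothetical
model, not the driver code: probe_buggy() and probe_fixed() are illustrative
names, and malloc() stands in for the kernel allocators. With err
pre-initialised to 0, a jump to the cleanup label without an assignment
reports success.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Buggy shape: err starts at 0, so the goto below returns success. */
int probe_buggy(int second_alloc_fails)
{
	int err = 0;
	void *priv, *bus;

	priv = malloc(16);
	if (!priv)
		return -ENOMEM;

	bus = second_alloc_fails ? NULL : malloc(16);
	if (!bus)
		goto err_free_priv;	/* err is still 0 here */

	free(bus);
	free(priv);
	return 0;

err_free_priv:
	free(priv);
	return err;
}

/* Fixed shape: err is left uninitialised and set on every failing path. */
int probe_fixed(int second_alloc_fails)
{
	int err;
	void *priv, *bus;

	priv = malloc(16);
	if (!priv)
		return -ENOMEM;

	bus = second_alloc_fails ? NULL : malloc(16);
	if (!bus) {
		err = -ENOMEM;
		goto err_free_priv;
	}

	free(bus);
	free(priv);
	return 0;

err_free_priv:
	free(priv);
	assert(err < 0);	/* an error path must not report success */
	return err;
}
```

Dropping the initialiser also lets the compiler warn if a later error path
forgets to set err.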

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
---
 drivers/net/fsl_pq_mdio.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fsl_pq_mdio.c b/drivers/net/fsl_pq_mdio.c
index 3acac5f..ff028f5 100644
--- a/drivers/net/fsl_pq_mdio.c
+++ b/drivers/net/fsl_pq_mdio.c
@@ -277,15 +277,17 @@ static int fsl_pq_mdio_probe(struct of_device *ofdev,
 	int tbiaddr = -1;
 	const u32 *addrp;
 	u64 addr = 0, size = 0;
-	int err = 0;
+	int err;
 
 	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
 	if (!priv)
 		return -ENOMEM;
 
 	new_bus = mdiobus_alloc();
-	if (NULL == new_bus)
+	if (!new_bus) {
+		err = -ENOMEM;
 		goto err_free_priv;
+	}
 
 	new_bus->name = "Freescale PowerQUICC MII Bus",
 	new_bus->read = &fsl_pq_mdio_read,
-- 
1.7.0.5


^ permalink raw reply related

* [PATCH] gianfar: Remove legacy PM callbacks
From: Anton Vorontsov @ 2010-05-14 14:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linuxppc-dev

These callbacks were needed because dev_pm_ops support for OF
platform devices was in the powerpc tree, and the patch that
added dev_pm_ops for gianfar driver was in the netdev tree. Now
that netdev and powerpc trees have merged into Linus' tree, we
can remove the legacy hooks.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
---
 drivers/net/gianfar.c |   14 --------------
 1 files changed, 0 insertions(+), 14 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 5d3763f..fb23f04 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -1288,21 +1288,9 @@ static struct dev_pm_ops gfar_pm_ops = {
 
 #define GFAR_PM_OPS (&gfar_pm_ops)
 
-static int gfar_legacy_suspend(struct of_device *ofdev, pm_message_t state)
-{
-	return gfar_suspend(&ofdev->dev);
-}
-
-static int gfar_legacy_resume(struct of_device *ofdev)
-{
-	return gfar_resume(&ofdev->dev);
-}
-
 #else
 
 #define GFAR_PM_OPS NULL
-#define gfar_legacy_suspend NULL
-#define gfar_legacy_resume NULL
 
 #endif
 
@@ -3055,8 +3043,6 @@ static struct of_platform_driver gfar_driver = {
 
 	.probe = gfar_probe,
 	.remove = gfar_remove,
-	.suspend = gfar_legacy_suspend,
-	.resume = gfar_legacy_resume,
 	.driver.pm = GFAR_PM_OPS,
 };
 
-- 
1.7.0.5

^ permalink raw reply related

* RE: loosing IPMI-card by loading netconsole
From: Ronciak, John @ 2010-05-14 14:51 UTC (permalink / raw)
  To: Henning Fehrmann, Kirsher, Jeffrey T
  Cc: Brandeburg, Jesse, Allan, Bruce W, Waskiewicz Jr, Peter P,
	netdev@vger.kernel.org, Matt Mackall, Carsten Aulbert, Tejun Heo
In-Reply-To: <20100514134544.GA26674@gretchen.aei.mpg.de>

Sorry to hear about the problem you are having Henning.  What do you mean when you say "it disappears"?  Can both eth0 and eth1 ping (or be pinged)?  Do all the networking devices still show up in the system when you do an 'lspci'?  What happens if you down and then up the interface you are having problems with?  Does 'rmmod' do the same thing as your removal method?  Is there anything in the system logs saying anything about the interfaces?

We have not had reports of this so this is a bit unusual.  Please let us know.

Does this happen on other systems as well or just one particular system?


Cheers,
John


> -----Original Message-----
> From: Henning Fehrmann [mailto:henning.fehrmann@aei.mpg.de]
> Sent: Friday, May 14, 2010 6:46 AM
> To: Kirsher, Jeffrey T
> Cc: Brandeburg, Jesse; Allan, Bruce W; Waskiewicz Jr, Peter P; Ronciak,
> John; netdev@vger.kernel.org; Matt Mackall; Carsten Aulbert; Tejun Heo
> Subject: loosing IPMI-card by loading netconsole
> 
> Hello,
> 
> We have SuperMicro PDSM 2+ boards together with 82573E and 82573L Intel
> NICs.
> Additionally, we have IPMI cards which are tunneled via the 82573E NIC.
> 
> We are using the e1000e driver for NICs together with the netconsole
> driver.
> (netconsole netconsole=4444@client_IP/eth1,514@server_IP/server_MAC).
> Netconsole is using the NIC which is NOT used by the IPMI card.
> 
> Usually we are able to access the IPMI card remotely with ipmitools.
> 
> Having the kernel 2.6.27.39 installed everything worked together, the
> NICs, remotely accessing the IPMI cards and netconsole.
> 
> The driver version of e1000e is 0.3.3.3-k6.
> 
> Using a more recent kernel: 2.6.32.7 we lost the ability of remotely
> accessing the IPMI card when the netconsole driver is loaded.
> The IPMI card is accessible before the netconsole driver is loaded and
> disappears once we use netconsole. Even unloading netconsole does not
> help then.
> 
> We get the IPMI card back when 'removing' eth0:
> echo 1 > /sys/devices/pci0000:00/0000:00:1c.4/0000:0d:00.0/remove
> 
> 0d:00.0 is eth0.
> 
> 
> The version of the e1000e driver is 1.0.2-k2.
> 
> I compiled and loaded a later version of this driver (1.1.19) without
> solving this problem.
> 
> 
> For eth0 (82573E) we have a firmware version 0.15-4 installed and for
> eth1 we use the firmware version 0.5-7. But this is the same for both
> kernel versions.
> 
> Do you have an idea?
> 
> Thank you and cheers,
> Henning


^ permalink raw reply

* Re: [PATCH 0/6] sky2: update
From: Stephen Hemminger @ 2010-05-14 15:19 UTC (permalink / raw)
  To: David Miller; +Cc: mikem, netdev
In-Reply-To: <20100514.031501.57481177.davem@davemloft.net>

On Fri, 14 May 2010 03:15:01 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Thu, 13 May 2010 09:12:47 -0700
> 
> > Bunch of patches from Mike, with some additional comments.
> 
> All applied to net-next-2.6, thanks.

The first one needs to go to net-2.6 because it fixes a regression:
Current code will lose multicast addresses when the automatic
recovery from stuck chip happens. Auto recovery happens a lot
under load on some configurations.

-- 

^ permalink raw reply

* Fw: [Bug 15974] New: kernel panic when squid in bridge mode
From: Stephen Hemminger @ 2010-05-14 15:24 UTC (permalink / raw)
  To: Bart De Schuymer, Patrick McHardy; +Cc: netdev



Begin forwarded message:

Date: Fri, 14 May 2010 08:52:07 GMT
From: bugzilla-daemon@bugzilla.kernel.org
To: shemminger@linux-foundation.org
Subject: [Bug 15974] New: kernel panic when squid in bridge mode


https://bugzilla.kernel.org/show_bug.cgi?id=15974

           Summary: kernel panic when squid in bridge mode
           Product: Networking
           Version: 2.5
    Kernel Version: 2.6.30.5
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: IPV4
        AssignedTo: shemminger@linux-foundation.org
        ReportedBy: senthilkumaar2021@gmail.com
        Regression: No


Hi, we are using squid tproxy in bridge mode. The kernel version used is
2.6.30.5. Once every 10-15 hours we get a kernel panic message on the screen.
We are passing 100Mbps of traffic through the bridge. The following iptables
and ebtables rules are used for squid:

iptables -t mangle -N DIVERT
iptables -t mangle -A DIVERT -j MARK --set-mark 1
iptables -t mangle -A DIVERT -j ACCEPT

iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
iptables -t mangle -A PREROUTING -p tcp --dport 80 -j TPROXY --tproxy-mark
0x1/0x1 --on-port 3129

ebtables -t broute -A BROUTING -i $CLIENT_IFACE -p ipv4 --ip-proto tcp
--ip-dport 80 -j redirect --redirect-target DROP

ebtables -t broute -A BROUTING -i $INET_IFACE -p ipv4 --ip-proto tcp --ip-sport
80 -j redirect --redirect-target DROP 


We also got a kernel panic with kernel 2.6.28.5.

the error is

[<ffffffffa03933c2>] ? nf_nat_fn+0x138/0x14e [iptable_nat]
[<ffffffffa0393585>] ? nf_nat_in+0x2f/0x6e [iptable_nat]
[<ffffffffa027edaa>] ? br_nf_pre_routing_finish+0x0/0x2c4 [bridge]
[<ffffffffa027edfa>] br_nf_pre_routing_finish+0x50/0x2c4 [bridge]
[<ffffffffa027edaa>] ? br_nf_pre_routing_finish+0x0/0x2c4 [bridge]
[<ffffffff81339a50>] ? nf_hook_slow+0x68/0xc8
[<ffffffffa027edaa>] ? br_nf_pre_routing_finish+0x0/0x2c4 [bridge]
[<ffffffffa027f616>] br_nf_pre_routing+0x5a8/0x5c7 [bridge]
[<ffffffff813399ab>] nf_iterate+0x48/0x85
[<ffffffffa027a931>] ? br_handle_frame_finish+0x0/0x154 [bridge]
[<ffffffff81339a50>] nf_hook_slow+0x68/0xc8
[<ffffffffa027a931>] ? br_handle_frame_finish+0x0/0x154 [bridge]
[<ffffffffa027ac36>] br_handle_frame+0x1b1/0x1db [bridge]
[<ffffffff8131d54b>] netif_receive_skb+0x316/0x434
[<ffffffff8131dbfb>] napi_gro_receive+0x6e/0x83
[<ffffffffa0125bfe>] e1000_receive_skb+0x5c/0x65 [e1000e]
[<ffffffffa0125de8>] e1000_clean_rx_irq+0x1e1/0x28f [e1000e]
[<ffffffffa012730e>] e1000_clean+0x99/0x24a [e1000e]
[<ffffffff813bcfc5>] ? _spin_unlock_irqrestore+0x2c/0x43
[<ffffffff8131ba62>] net_rx_action+0xb8/0x1b4
[<ffffffff8104ed43>] __do_softirq+0x99/0x152
[<ffffffff8101284c>] call_softirq+0x1c/0x30
[<ffffffff81013a02>] do_softirq+0x52/0xb9
[<ffffffff8104e969>] irq_exit+0x53/0x8d
[<ffffffff81013d1a>] do_IRQ+0x135/0x157
[<ffffffff81011f93>] ret_from_intr+0x0/0x2e
<EOI> [<ffffffff81017e20>] ? mwait_idle+0x9e/0xc7
[<ffffffff81017e17>] ? mwait_idle+0x95/0xc7
[<ffffffff813bfd20>] ? atomic_notifier_call_chain+0x13/0x15
[<ffffffff810102f4>] ? enter_idle+0x27/0x29


Please help me fix this issue.

^ permalink raw reply

* Re: [RFC] NF: IP tables idletimer target implementation
From: Patrick McHardy @ 2010-05-14 16:03 UTC (permalink / raw)
  To: Luciano Coelho; +Cc: netdev, Timo Teras, Netfilter Development Mailinglist
In-Reply-To: <1273841458-10443-1-git-send-email-luciano.coelho@nokia.com>

Please CC netfilter-devel on future submissions.

Luciano Coelho wrote:
> It adds a file to the sysfs for each interface that is brought up.  The file
> contains the time remaining before the event is triggered.  This file can
> also be used to set the timer manually.

What is this used for? It doesn't seem too smart to poll manually
if you get an event anyway, and the timeout can already be set
per rule.

> diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
> index 1833bdb..91fba9a 100644
> --- a/net/ipv4/netfilter/Kconfig
> +++ b/net/ipv4/netfilter/Kconfig
> @@ -204,6 +204,23 @@ config IP_NF_TARGET_REDIRECT
>  
>  	  To compile it as a module, choose M here.  If unsure, say N.
>  
> +config IP_NF_TARGET_IDLETIMER

This should be a x_tables target, there's nothing IPv4-specific
about it.

> diff --git a/net/ipv4/netfilter/ipt_IDLETIMER.c b/net/ipv4/netfilter/ipt_IDLETIMER.c
> new file mode 100644
> index 0000000..2c5b465
> --- /dev/null
> +++ b/net/ipv4/netfilter/ipt_IDLETIMER.c

> +
> +#ifdef CONFIG_IP_NF_TARGET_IDLETIMER_DEBUG
> +#define DEBUGP(format, args...) printk(KERN_DEBUG \
> +				       "ipt_IDLETIMER:%s:" format "\n", \
> +				       __func__ , ## args)
> +#else
> +#define DEBUGP(format, args...)
> +#endif

Please use pr_debug and get rid of the config option.

> +
> +/*
> + * Internal timer management.
> + */
> +static ssize_t utimer_attr_show(struct device *dev,
> +				struct device_attribute *attr, char *buf);
> +static ssize_t utimer_attr_store(struct device *dev,
> +				 struct device_attribute *attr,
> +				 const char *buf, size_t count);
> +
> +struct utimer_t {
> +	char name[IFNAMSIZ];
> +	struct list_head entry;
> +	struct timer_list timer;
> +	struct work_struct work;
> +	struct net *net;
> +};
> +
> +static LIST_HEAD(active_utimer_head);
> +static DEFINE_SPINLOCK(list_lock);
> +static DEVICE_ATTR(idletimer, 0644, utimer_attr_show, utimer_attr_store);
> +
> +static void utimer_delete(struct utimer_t *timer)
> +{
> +	DEBUGP("Deleting timer '%s'\n", timer->name);
> +
> +	list_del(&timer->entry);
> +	del_timer_sync(&timer->timer);
> +	put_net(timer->net);
> +	kfree(timer);
> +}
> +
> +static void utimer_work(struct work_struct *work)
> +{
> +	struct utimer_t *timer = container_of(work, struct utimer_t, work);
> +	struct net_device *netdev = NULL;

Unnecessary initialization.

> +
> +	netdev = dev_get_by_name(timer->net, timer->name);
> +
> +	if (netdev != NULL) {
> +		sysfs_notify(&netdev->dev.kobj, NULL,
> +			     "idletimer");
> +		dev_put(netdev);
> +	}
> +}
> +
> +static void utimer_expired(unsigned long data)
> +{
> +	struct utimer_t *timer = (struct utimer_t *) data;
> +
> +	DEBUGP("Timer '%s' expired\n", timer->name);
> +
> +	spin_lock_bh(&list_lock);
> +	utimer_delete(timer);
> +	spin_unlock_bh(&list_lock);
> +
> +	schedule_work(&timer->work);

Use after free, utimer_delete() frees the timer.
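
One way to avoid this is to make the work handler the last user of the
object and free it there. Below is a minimal userspace sketch of that
ordering, not kernel code: struct utimer_model, model_expired() and
model_work() are hypothetical stand-ins for the patch's functions, and
the work_pending flag stands in for schedule_work().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct utimer_t. */
struct utimer_model {
	char name[16];
	int on_list;		/* models membership in active_utimer_head */
	int work_pending;	/* models a queued work item */
};

/* Records the last name the "work" notified about, for inspection. */
static char last_notified[16];

static struct utimer_model *model_create(const char *name)
{
	struct utimer_model *t = calloc(1, sizeof(*t));

	if (t == NULL)
		return NULL;
	snprintf(t->name, sizeof(t->name), "%s", name);
	t->on_list = 1;
	return t;
}

/* Expiry path: unlink and queue the work, but do NOT free here. */
static void model_expired(struct utimer_model *t)
{
	t->on_list = 0;
	t->work_pending = 1;
}

/* Work path: the last user of the object, so this is where it is freed. */
static void model_work(struct utimer_model *t)
{
	snprintf(last_notified, sizeof(last_notified), "%s", t->name);
	free(t);
}
```

The point is only the ordering: nothing touches the object after the
function that frees it has run.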

> +}
> +
> +static struct utimer_t *utimer_create(const char *name,
> +				      struct net *net)
> +{
> +	struct utimer_t *timer;
> +
> +	timer = kmalloc(sizeof(struct utimer_t), GFP_ATOMIC);
> +	if (timer == NULL)
> +		return NULL;
> +
> +	list_add(&timer->entry, &active_utimer_head);
> +	strlcpy(timer->name, name, sizeof(timer->name));
> +	timer->net = get_net(net);

How does this handle namespace exit?

> +
> +	init_timer(&timer->timer);
> +	timer->timer.function = utimer_expired;
> +	timer->timer.data = (unsigned long) timer;

setup_timer()

> +
> +	INIT_WORK(&timer->work, utimer_work);
> +
> +	DEBUGP("Created timer '%s'\n", timer->name);
> +
> +	return timer;
> +}
> +
> +static struct utimer_t *__utimer_find(const char *name, const struct net *net)
> +{
> +	struct utimer_t *entry;
> +
> +	list_for_each_entry(entry, &active_utimer_head, entry) {
> +		if (!strcmp(name, entry->name) && net == entry->net)
> +			return entry;
> +	}
> +
> +	return NULL;
> +}
> +
> +static void utimer_modify(const char *name,
> +			  struct net *net,
> +			  unsigned long expires)
> +{
> +	struct utimer_t *timer;
> +
> +	DEBUGP("Modifying timer '%s'\n", name);
> +	spin_lock_bh(&list_lock);
> +	timer = __utimer_find(name, net);

So you're scanning the list up to twice per packet? That seems
highly suboptimal, why not create the timer when the rule is
created and only update the timeout? You could use the interfaces
specified in struct ipt_ip.
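
A rough userspace model of that idea, assuming the timer can be resolved
once when the rule is checked and cached in the rule's private data.
All names here (timer_model, rule_model, rule_check, rule_hit) are
hypothetical, and the sketch ignores timer lifetime, which a real
implementation would have to tie to the rule or reference count.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_TIMERS 8

struct timer_model {
	char name[16];
	unsigned long expires;
	int in_use;
};

struct rule_model {
	unsigned long timeout;		/* from the rule */
	struct timer_model *timer;	/* resolved once at check time */
};

static struct timer_model timers[MAX_TIMERS];
static unsigned int lookups;		/* counts table scans */

/* Scan for a timer by name, creating it in a free slot if missing. */
static struct timer_model *timer_find_or_create(const char *name)
{
	struct timer_model *free_slot = NULL;
	size_t i;

	lookups++;
	for (i = 0; i < MAX_TIMERS; i++) {
		if (timers[i].in_use && strcmp(timers[i].name, name) == 0)
			return &timers[i];
		if (!timers[i].in_use && free_slot == NULL)
			free_slot = &timers[i];
	}
	if (free_slot == NULL)
		return NULL;
	snprintf(free_slot->name, sizeof(free_slot->name), "%s", name);
	free_slot->in_use = 1;
	return free_slot;
}

/* Rule setup (the checkentry analogue): the only place that scans. */
static int rule_check(struct rule_model *rule, const char *ifname)
{
	rule->timer = timer_find_or_create(ifname);
	return rule->timer ? 0 : -1;
}

/* Per-packet path: no lookup, just push the deadline out. */
static void rule_hit(struct rule_model *rule, unsigned long now)
{
	rule->timer->expires = now + rule->timeout;
}
```

With this split the per-packet cost no longer depends on how many
timers exist.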

> +	if (timer == NULL)
> +		timer = utimer_create(name, net);
> +	mod_timer(&timer->timer, expires);
> +	spin_unlock_bh(&list_lock);
> +}
> +
> +static ssize_t utimer_attr_show(struct device *dev,
> +				struct device_attribute *attr, char *buf)
> +{
> +	struct utimer_t *timer;
> +	struct net_device *netdev = to_net_dev(dev);
> +	unsigned long expires = 0;
> +
> +	spin_lock_bh(&list_lock);
> +	timer = __utimer_find(netdev->name, dev_net(netdev));
> +	if (timer)
> +		expires = timer->timer.expires;
> +	spin_unlock_bh(&list_lock);
> +
> +	if (expires)
> +		return sprintf(buf, "%lu\n", (expires-jiffies) / HZ);
> +
> +	return sprintf(buf, "0\n");
> +}
> +
> +static ssize_t utimer_attr_store(struct device *dev,
> +				 struct device_attribute *attr,
> +				 const char *buf, size_t count)
> +{
> +	int expires;
> +	struct net_device *netdev = to_net_dev(dev);
> +
> +	if (sscanf(buf, "%d", &expires) == 1) {
> +		if (expires > 0)

Using %u seems better.
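
A small userspace sketch of what that could look like. parse_timeout()
is a hypothetical helper, not part of the patch; it reads the value as
unsigned and rejects zero, so the separate "expires > 0" test and the
cast go away.

```c
#include <stdio.h>

/*
 * Parse a timeout in seconds from a sysfs-style buffer.
 * Returns 0 and stores the value on success, -1 if no number was
 * found or the value is zero.
 */
static int parse_timeout(const char *buf, unsigned int *timeout)
{
	unsigned int val;

	if (sscanf(buf, "%u", &val) != 1)
		return -1;
	if (val == 0)
		return -1;
	*timeout = val;
	return 0;
}
```

Note that in userspace C, %u still accepts a leading minus sign and
wraps the result, so an upper bound check may be worth adding.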

> +			utimer_modify(netdev->name,
> +				      dev_net(netdev),
> +				      jiffies+HZ*(unsigned long)expires);
> +	}
> +
> +	return count;
> +}
> +
> +static int utimer_notifier_call(struct notifier_block *this,
> +				unsigned long event, void *ptr)
> +{
> +	struct net_device *netdev = ptr;
> +	int ret;
> +
> +	switch (event) {
> +	case NETDEV_UP:
> +		DEBUGP("NETDEV_UP: %s\n", netdev->name);
> +		ret = device_create_file(&netdev->dev,
> +					 &dev_attr_idletimer);
> +		WARN_ON(ret);
> +
> +		break;
> +	case NETDEV_DOWN:
> +		DEBUGP("NETDEV_DOWN: %s\n", netdev->name);
> +		device_remove_file(&netdev->dev,
> +				   &dev_attr_idletimer);
> +		break;
> +	}
> +
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block utimer_notifier_block = {
> +	.notifier_call	= utimer_notifier_call,
> +};
> +
> +
> +static int utimer_init(void)
> +{
> +	return register_netdevice_notifier(&utimer_notifier_block);
> +}
> +
> +static void utimer_fini(void)
> +{
> +	struct utimer_t *entry, *next;
> +	struct net_device *dev;
> +	struct net *net;
> +
> +	list_for_each_entry_safe(entry, next, &active_utimer_head, entry)
> +		utimer_delete(entry);
> +
> +	rtnl_lock();

deadlock? unregister_netdevice_notifier() already takes the RTNL.

> +	unregister_netdevice_notifier(&utimer_notifier_block);
> +	for_each_net(net) {
> +		for_each_netdev(net, dev) {
> +			utimer_notifier_call(&utimer_notifier_block,
> +					     NETDEV_DOWN, dev);
> +		}
> +	}
> +	rtnl_unlock();
> +}
> +
> +/*
> + * The actual iptables plugin.
> + */
> +static unsigned int ipt_idletimer_target(struct sk_buff *skb,
> +					 const struct xt_action_param *par)
> +{
> +	const struct ipt_idletimer_info *target = par->targinfo;
> +	unsigned long expires;
> +
> +	expires = jiffies + HZ*target->timeout;
> +
> +	if (par->in != NULL)
> +		utimer_modify(par->in->name,
> +			      dev_net(par->in),
> +			      expires);
> +
> +	if (par->out != NULL)
> +		utimer_modify(par->out->name,
> +			      dev_net(par->out),
> +			      expires);
> +
> +	return XT_CONTINUE;
> +}
> +
> +static int ipt_idletimer_checkentry(const struct xt_tgchk_param *par)
> +{
> +	const struct ipt_idletimer_info *info = par->targinfo;
> +
> +	if (info->timeout == 0) {
> +		DEBUGP("timeout value is zero\n");
> +		return false;
> +	}
> +
> +	return true;

The return convention in the current net-next tree is 0 for
no error or an errno code otherwise.
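
As a sketch of that convention, assuming the usual negative errno
values such as -EINVAL. The struct and function names below are
hypothetical stand-ins for ipt_idletimer_info and the checkentry hook,
written as plain userspace C.

```c
#include <errno.h>

/* Hypothetical stand-in for struct ipt_idletimer_info. */
struct idletimer_info_model {
	unsigned int timeout;
};

/* Returns 0 when the rule is acceptable, a negative errno otherwise. */
static int idletimer_check(const struct idletimer_info_model *info)
{
	if (info->timeout == 0)
		return -EINVAL;
	return 0;
}
```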

> +}

^ permalink raw reply

* Re: loosing IPMI-card by loading netconsole
From: Tejun Heo @ 2010-05-14 16:27 UTC (permalink / raw)
  To: Ronciak, John
  Cc: Henning Fehrmann, Kirsher, Jeffrey T, Brandeburg, Jesse,
	Allan, Bruce W, Waskiewicz Jr, Peter P, netdev@vger.kernel.org,
	Matt Mackall, Carsten Aulbert
In-Reply-To: <DDC57477F5D6F845A0DDCB99D3C4812D0CA63BB9E4@orsmsx510.amr.corp.intel.com>

Hello, John.

As Henning seems offline, I'll try to fill in.

On 05/14/2010 04:51 PM, Ronciak, John wrote:
> Sorry to hear about the problem you are having Henning.  What do you
> mean when you say "it disappears"?

It stops responding to IPMI requests.

> Can both eth0 and eth1 ping (or be pinged)?  Do all the networking
> devices still show up in the system when you do an 'lspci'?

Yeah, everything other than IPMI works just fine.

> What happens if you down and then up the interface you are having
> problems with?  Does 'rmmod' do the same thing as your removal
> method?

Haven't tried these but well I think rmmoding should achieve about the
same thing.

> Is there anything in the system logs saying anything about the
> interfaces?

Nope.

> We have not had reports of this so this is a bit unusual.  Please let us know.
> 
> Does this happen on other systems as well or just one particular system?

Yeah, it happens on at least several hundred machines, so not an
isolated hardware issue at all.

To sum up.

On 2.6.27.39, netconsole + IPMI works fine.  On 2.6.32.7, as soon as
netconsole is loaded, IPMI stops working.  Unloading netconsole
doesn't revive IPMI but detaching the driver from the controller does.
In both cases, usual networking works fine.

Thanks.

-- 
tejun

^ permalink raw reply

* SR-IOV PCI quirk for 82599?
From: Fischer, Anna @ 2010-05-14 16:26 UTC (permalink / raw)
  To: netdev@vger.kernel.org, e1000-devel@lists.sourceforge.net

There is a PCI quirk for the 82576 controller that programs the PCI BARs to use Flash memory if the BIOS has not allocated resources for the SR-IOV VF BARs.

Is there a similar quirk for the 82599, or can even the same one be used for that device?

Thanks,
Anna 

^ permalink raw reply

* RE: loosing IPMI-card by loading netconsole
From: Ronciak, John @ 2010-05-14 16:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Henning Fehrmann, Kirsher, Jeffrey T, Brandeburg, Jesse,
	Allan, Bruce W, Waskiewicz Jr, Peter P, netdev@vger.kernel.org,
	Matt Mackall, Carsten Aulbert
In-Reply-To: <4BED79EB.1000204@kernel.org>


> -----Original Message-----
> From: Tejun Heo [mailto:tj@kernel.org]
> Sent: Friday, May 14, 2010 9:27 AM
> To: Ronciak, John
> Cc: Henning Fehrmann; Kirsher, Jeffrey T; Brandeburg, Jesse; Allan,
> Bruce W; Waskiewicz Jr, Peter P; netdev@vger.kernel.org; Matt Mackall;
> Carsten Aulbert
> Subject: Re: loosing IPMI-card by loading netconsole
> 
> Hello, John.
> 
> As Henning seems offline, I'll try to fill in.
> 
> On 05/14/2010 04:51 PM, Ronciak, John wrote:
> > Sorry to hear about the problem you are having Henning.  What do you
> > mean when you say "it disappears"?
> 
> It stops responding to IPMI requests.
> 
> > Can both eth0 and eth1 ping (or be pinged)?  Do all the networking
> > devices still show up in the system when you do an 'lspci'?
> 
> Yeah, everything other than IPMI works just fine.
> 
> > What happens if you down and then up the interface you are having
> > problems with?  Does 'rmmod' do the same thing as your removal
> > method?
> 
> Haven't tried these but well I think rmmoding should achieve about the
> same thing.
> 
> > Is there anything in the system logs saying anything about the
> > interfaces?
> 
> Nope.
> 
> > We have not had reports of this so this is a bit unusual.  Please let
> > us know.
> >
> > Does this happen on other systems as well or just one particular
> > system?
> 
> Yeah, it happens on at least several hundred machines, so not an
> isolated hardware issue at all.
> 
> To sum up.
> 
> On 2.6.27.39, netconsole + IPMI works fine.  On 2.6.32.7, as soon as
> netconsole is loaded, IPMI stops working.  Unloading netconsole doesn't
> revive IPMI but detaching the driver from the controller does.
> In both cases, usual networking works fine.
> 
> Thanks.
> 
> --
> Tejun

Thanks Tejun.

Since ordinary networking seems to be operational, could it be that with
the newer kernel the tunnels you had set up no longer exist or have been
disabled somehow?  And when this interface is removed and set up again,
does that get things fixed up again?

Cheers,
John


^ permalink raw reply

* Re: [net-next-2.6 V7 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Patrick McHardy @ 2010-05-14 16:42 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Scott Feldman, davem, netdev, chrisw
In-Reply-To: <201005141412.01578.arnd@arndb.de>

Arnd Bergmann wrote:
> On Friday 14 May 2010, Patrick McHardy wrote:
>>> +static int rtnl_vf_port_fill_nest(struct sk_buff *skb, struct net_device *dev,
>>> +				  int vf)
>>> +{
>>> +	struct nlattr *data;
>>> +	int err;
>>> +
>>> +	data = nla_nest_start(skb, IFLA_VF_PORT);
>> We usually use a top-level attribute to encapsulate lists of identical
>> attributes. The other iflink attributes may only occur once and are
>> usually parsed using nla_parse_nested(), which will parse all
>> IFLA_VF_PORT attributes, but only return the last one.
>>
>> Something like:
>>
>> iflink message:
>> ...
>> [IFLA_VF_PORTS]
>>   [IFLA_VF_PORT]
>>     [IFLA_VF_PORT_*], ...
>>   [IFLA_VF_PORT]
>>     [IFLA_VF_PORT_*], ...
>>   ...
> 
> Ah, I was wondering about this already. Does this mean that IFLA_VFINFO
> does this incorrectly as well?

Yes.
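
To illustrate why repeated attributes at the same level get lost, here
is a simplified userspace model. It is not the netlink API: attr_model,
parse_into_table() and count_all() are hypothetical, and ATTR_PORT is
not the real IFLA_* value. A table-style parse keeps one slot per
attribute type, so only the last occurrence survives, while walking a
dedicated container sees every element.

```c
#include <stddef.h>

/* Hypothetical attribute types for the model; not the real IFLA_* values. */
enum { ATTR_PORT = 1, ATTR_MAX = 1 };

struct attr_model {
	int type;
	int value;
};

/*
 * Table-style parse: one slot per attribute type, so when the same
 * type occurs several times only the last occurrence stays visible.
 */
static void parse_into_table(const struct attr_model *attrs, size_t n,
			     const struct attr_model *tb[ATTR_MAX + 1])
{
	size_t i;

	for (i = 0; i <= ATTR_MAX; i++)
		tb[i] = NULL;
	for (i = 0; i < n; i++)
		if (attrs[i].type >= 1 && attrs[i].type <= ATTR_MAX)
			tb[attrs[i].type] = &attrs[i];
}

/* List-style walk: visits every element of a container of identical attrs. */
static int count_all(const struct attr_model *attrs, size_t n, int type)
{
	int count = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (attrs[i].type == type)
			count++;
	return count;
}
```

Feeding three ATTR_PORT entries through parse_into_table() leaves only
the third in the table, which is the reason for wrapping the list in a
top-level attribute and iterating over it instead.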

>>>  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>>>  			    int type, u32 pid, u32 seq, u32 change,
>>>  			    unsigned int flags)
>>> @@ -747,17 +819,23 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>>>  		goto nla_put_failure;
>>>  	copy_rtnl_link_stats64(nla_data(attr), stats);
>>>  
>>> +	if (dev->dev.parent)
>>> +		NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
>> Just wondering, is the only case where dev.parent is non-NULL
>> really when virtual ports are present?
> 
> No, but if parent is NULL, we must not call dev_num_vf(). The way that enic
> needs the attributes, they can be either for the VF of dev->dev.parent (the
> PCI PF), or for the PF itself, even if it does not have VFs, in which case
> it would be interesting to have IFLA_NUM_VF = 0 in the output.

I see. I was mainly wondering about completely different types of
devices.

> Maybe a better structure would be to separate the two cases, also allowing
> a port profile to be associated with both the PF and with each of its VFs?
> 
> Something like this:
> 
> [IFLA_NUM_VF]
> [IFLA_VF_PORTS]
>   [IFLA_VF_PORT]
>     [IFLA_VF_PORT_*], ...
>   [IFLA_VF_PORT]
>     [IFLA_VF_PORT_*], ...
> [IFLA_PORT_SELF]
>   [IFLA_VF_PORT_*], ...

That would also be fine.

^ permalink raw reply
