Netdev List
* mmotm 2010-07-19 - e1000e vs. pm_qos_update_request issues
From: Valdis.Kletnieks @ 2010-07-20 20:35 UTC
  To: akpm, Thomas Gleixner, David S. Miller; +Cc: linux-kernel, e1000-devel, netdev
In-Reply-To: <201007200007.o6K07Xbg028863@imap1.linux-foundation.org>

On Mon, 19 Jul 2010 16:38:09 PDT, akpm@linux-foundation.org said:
> The mm-of-the-moment snapshot 2010-07-19-16-37 has been uploaded to
> 
>    http://userweb.kernel.org/~akpm/mmotm/

Throws a warning at boot:

[    1.786060] WARNING: at kernel/pm_qos_params.c:264 pm_qos_update_request+0x28/0x54()
[    1.786088] Hardware name: Latitude E6500
[    1.787045] pm_qos_update_request() called for unknown object
[    1.787966] Modules linked in:
[    1.788940] Pid: 1, comm: swapper Not tainted 2.6.35-rc5-mmotm0719 #1
[    1.790035] Call Trace:
[    1.791121]  [<ffffffff81037335>] warn_slowpath_common+0x80/0x98
[    1.792205]  [<ffffffff810373e1>] warn_slowpath_fmt+0x41/0x43
[    1.793279]  [<ffffffff81057c14>] pm_qos_update_request+0x28/0x54
[    1.794347]  [<ffffffff8134889e>] e1000_configure+0x421/0x459
[    1.795393]  [<ffffffff8134afbd>] e1000_open+0xbd/0x37c
[    1.796436]  [<ffffffff8105743a>] ? raw_notifier_call_chain+0xf/0x11
[    1.797491]  [<ffffffff8145f948>] __dev_open+0xae/0xe2
[    1.798547]  [<ffffffff8145f997>] dev_open+0x1b/0x49
[    1.799612]  [<ffffffff8146e36e>] netpoll_setup+0x84/0x259
[    1.800685]  [<ffffffff81b5037c>] init_netconsole+0xbc/0x21f
[    1.801744]  [<ffffffff81b5026c>] ? sir_wq_init+0x0/0x35
[    1.802793]  [<ffffffff81b502c0>] ? init_netconsole+0x0/0x21f
[    1.803845]  [<ffffffff810002ff>] do_one_initcall+0x7a/0x12f
[    1.804885]  [<ffffffff81b2ccae>] kernel_init+0x138/0x1c2
[    1.805915]  [<ffffffff81003554>] kernel_thread_helper+0x4/0x10
[    1.806937]  [<ffffffff81590e00>] ? restore_args+0x0/0x30
[    1.807955]  [<ffffffff81b2cb76>] ? kernel_init+0x0/0x1c2
[    1.808958]  [<ffffffff81003550>] ? kernel_thread_helper+0x0/0x10
[    1.809958] ---[ end trace 84b562a00a60539e ]---

Looks like a repeat of something I reported against -mmotm 2010-05-11, though a
WARNING rather than an outright crash - the traceback is pretty much identical.
 I have *no* idea why -rc3-mmotm0701 doesn't whinge similarly.


* Re: With disable_ipv6 set to 1 on an interface, ff00:/8 and fe80::/64 are still added on device UP
From: David Miller @ 2010-07-20 20:48 UTC
  To: brian.haley; +Cc: maheshkelkar, netdev
In-Reply-To: <4C460856.5090701@hp.com>

From: Brian Haley <brian.haley@hp.com>
Date: Tue, 20 Jul 2010 16:34:30 -0400

> I believe the easiest way to fix this is the following patch, can
> you please test it?
 ...
> If the interface has IPv6 disabled, don't add a multicast or
> link-local route since we won't be adding a link-local address.
> 
> Reported-by: Mahesh Kelkar <maheshkelkar@gmail.com>
> Signed-off-by: Brian Haley <brian.haley@hp.com>

This looks good to me, let me know when it has been tested.

* Re: mmotm 2010-07-19 - e1000e vs. pm_qos_update_request issues
From: Andrew Morton @ 2010-07-20 21:07 UTC
  To: Valdis.Kletnieks
  Cc: Rafael J. Wysocki, mark gross, e1000-devel, netdev, linux-kernel,
	James Bottomley, Thomas Gleixner, David S. Miller
In-Reply-To: <6182.1279658125@localhost>

On Tue, 20 Jul 2010 16:35:25 -0400
Valdis.Kletnieks@vt.edu wrote:

> On Mon, 19 Jul 2010 16:38:09 PDT, akpm@linux-foundation.org said:
> > The mm-of-the-moment snapshot 2010-07-19-16-37 has been uploaded to
> > 
> >    http://userweb.kernel.org/~akpm/mmotm/
> 
> Throws a warning at boot:
> 
> [    1.786060] WARNING: at kernel/pm_qos_params.c:264 pm_qos_update_request+0x28/0x54()
> [    1.786088] Hardware name: Latitude E6500
> [    1.787045] pm_qos_update_request() called for unknown object
> [    1.787966] Modules linked in:
> [    1.788940] Pid: 1, comm: swapper Not tainted 2.6.35-rc5-mmotm0719 #1
> [    1.790035] Call Trace:
> [    1.791121]  [<ffffffff81037335>] warn_slowpath_common+0x80/0x98
> [    1.792205]  [<ffffffff810373e1>] warn_slowpath_fmt+0x41/0x43
> [    1.793279]  [<ffffffff81057c14>] pm_qos_update_request+0x28/0x54
> [    1.794347]  [<ffffffff8134889e>] e1000_configure+0x421/0x459
> [    1.795393]  [<ffffffff8134afbd>] e1000_open+0xbd/0x37c
> [    1.796436]  [<ffffffff8105743a>] ? raw_notifier_call_chain+0xf/0x11
> [    1.797491]  [<ffffffff8145f948>] __dev_open+0xae/0xe2
> [    1.798547]  [<ffffffff8145f997>] dev_open+0x1b/0x49
> [    1.799612]  [<ffffffff8146e36e>] netpoll_setup+0x84/0x259
> [    1.800685]  [<ffffffff81b5037c>] init_netconsole+0xbc/0x21f
> [    1.801744]  [<ffffffff81b5026c>] ? sir_wq_init+0x0/0x35
> [    1.802793]  [<ffffffff81b502c0>] ? init_netconsole+0x0/0x21f
> [    1.803845]  [<ffffffff810002ff>] do_one_initcall+0x7a/0x12f
> [    1.804885]  [<ffffffff81b2ccae>] kernel_init+0x138/0x1c2
> [    1.805915]  [<ffffffff81003554>] kernel_thread_helper+0x4/0x10
> [    1.806937]  [<ffffffff81590e00>] ? restore_args+0x0/0x30
> [    1.807955]  [<ffffffff81b2cb76>] ? kernel_init+0x0/0x1c2
> [    1.808958]  [<ffffffff81003550>] ? kernel_thread_helper+0x0/0x10
> [    1.809958] ---[ end trace 84b562a00a60539e ]---
> 
> Looks like a repeat of something I reported against -mmotm 2010-05-11, though a
> WARNING rather than an outright crash - the traceback is pretty much identical.
>  I have *no* idea why -rc3-mmotm0701 doesn't whinge similarly.
> 

I don't recall you reporting that, sorry.

The warning was added by

: commit 82f682514a5df89ffb3890627eebf0897b7a84ec
: Author:     James Bottomley <James.Bottomley@suse.de>
: AuthorDate: Mon Jul 5 22:53:06 2010 +0200
: Commit:     Rafael J. Wysocki <rjw@sisk.pl>
: CommitDate: Mon Jul 19 02:00:34 2010 +0200
: 
:     pm_qos: Get rid of the allocation in pm_qos_add_request()


It's a pretty crappy warning too.  Neither the warning nor the code
comments provide developers with any hint as to what they have done
wrong, nor what they must do to fix things.  And the patch changelog
doesn't mention the new warnings *at all*.

So one must assume that the people who stuck this thing in the tree
have volunteered to fix e1000e.  Let's cc 'em.


* Re: [PATCH net-next] sysfs: add entry to indicate network interfaces with random MAC address
From: Stephen Hemminger @ 2010-07-20 21:18 UTC
  To: David Miller
  Cc: bhutchings, sassmann, netdev, linux-kernel, gospo, gregory.v.rose,
	alexander.h.duyck, leedom, harald
In-Reply-To: <20100720.131748.51255156.davem@davemloft.net>

On Tue, 20 Jul 2010 13:17:48 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Tue, 20 Jul 2010 15:29:54 +0100
> 
> > On Tue, 2010-07-20 at 14:41 +0200, Stefan Assmann wrote:
> >> Btw, the driver itself could also alter the flag. Then we'd have a well
> >> defined way of setting a stable address.
> > 
> > The driver can't know whether an address assigned by the user is stable.
> 
> If userspace can somehow obtain a persistent address, it can kick
> udev too.
> 
> I really don't see any real value provided by letting userspace mess
> with this.  Because the permanence communicated in this value is from
> the perspective of the kernel driver, it's really therefore about the
> thing that's in ->perm_addr[] not what happens to be in ->addr[] right
> now.

No one mentioned that the first octet of an Ethernet address already
indicates "software generated" Ethernet address. Per the standard,
if bit 1 is set it means address is locally assigned.

static inline bool is_locally_assigned_ether(const u8 *addr)
{
	return (addr[0] & 0x2) != 0;
}
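
For anyone who wants to try the check outside the kernel, here is a
minimal userspace sketch of the same test.  The helper names
(ether_is_locally_assigned, ether_is_multicast) and the sample addresses
are assumptions made for illustration, not existing kernel API.  The bit
positions follow the first-octet layout described above: 0x02 marks a
locally administered address, and 0x01 marks a group (multicast) address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6	/* octets in an Ethernet address */

/* Bit 0x01 of the first octet: group (multicast) address. */
static bool ether_is_multicast(const uint8_t *addr)
{
	assert(addr != NULL);
	return (addr[0] & 0x01) != 0;
}

/* Bit 0x02 of the first octet: locally administered address. */
static bool ether_is_locally_assigned(const uint8_t *addr)
{
	assert(addr != NULL);
	return (addr[0] & 0x02) != 0;
}
```

With these helpers an address such as 02:00:00:00:00:01 would be reported
as locally assigned, while an address whose first octet is 0x00 would not.
Note that the bit only says the address is not a globally unique one; it
cannot say whether a user-supplied address is stable across reboots.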

* [GIT] Networking
From: David Miller @ 2010-07-20 21:19 UTC
  To: torvalds; +Cc: akpm, netdev, linux-kernel


Several really small fixes, as is customary this late in the -rc
series...

1) Three bluetooth fixes via Marcel Holtmann:
   a) Fix L2CAP state machine race, from Andrei Emeltchenko

   b) HCI conn security level must be reset after auth failure, from
      Johan Hedberg.

   c) Existing HCI connections need auth levels adjusted when
      constrained by new connection settings.  From Ville Tervo.

2) TX queue hash is set incorrectly in stacked device situations
   because of how we early orphan the socket from the SKB these
   days.  Fix, by remembering sk->sk_hash in skb->rxhash and using
   it later, from Eric Dumazet.

3) Neighbour ->cache_update header op is optional, check for NULL
   was missing in neigh_update_hhs() leading to OOPS with GRE
   tunnels.  Fix from Doug Kehn.

4) RFS Socket sk->sk_tx_queue_mapping is read asynchronously from its
   setting.  Therefore doing two reads (one to analyze its validity,
   a second to fetch the actual value) is racy.  Do it in one read
   to fix the problem.  From Tom Herbert.

5) Don't do vhost-net flushes with mutex held, otherwise we deadlock
   with workqueue.  From Michael S. Tsirkin.


6) Guest triggerable condition results in pr_err() log, change to
   pr_debug().  Also from Michael S. Tsirkin.

7) Multicast router code in ipv4 leaks SKB if fib lookup fails, fix
   from Ben Greear.

8) Initialize workqueue earlier in rt2x00 wireless to avoid access to
   uninitialized lock on probe failure.  From Stephen Boyd.

9) Dangling hypervisor VIO interrupt can wedge ibmveth device on close.
   Fix using explicit hypervisor call to disable it before the free.
   From Robert Jennings.

10) Mobile ipv6 routing header code checks wrong address, should look at
    ipv6 header address not the one in the routing header.  From
    Arnaud Ebalard.

11) There are circumstances, which we do not 100% understand yet but
    which are proven by testing and experimentation, where we can call
    tcp_xmit_retransmit_queue() when the send queue of the socket is empty.

    If this happens, we deref a NULL skb trying to look at SACK information
    of the head skb.

    Just return immediately if ->packets_out is zero.

    From Ilpo Järvinen.

12) Bridge netpoll support is buggy, but we were only able to fix the
    problem with a series of non-trivial changes in net-next-2.6 which
    are not appropriate this late in the -rc series.  Just disable
    netpoll support in bridging for 2.6.35, it'll be working fine in
    2.6.36.

13) Phonet SKB leak fix from Rémi Denis-Courmont.

14) 8168dp ID fix in r8169 driver from Francois Romieu.

15) Packet scheduler NAT module requires all ICMPs to have an IP
    header in their payload, this is not correct, only some ICMPs do.
    Fix from Changli Gao based upon a patch and analysis by Rodrigo
    Partearroyo González.

16) NET_DSA needs to depend upon NET_ETHERNET.  Based upon a report by
    Randy Dunlap.

17) Fix locking in the axnet_cs interrupt handler since it can be
    invoked by the watchdog too.  From Ken Kawasaki.

18) hostap driver must unconditionally set dev->base_addr during probe.
    From John W. Linville.

19) Fix crash in xfrm bundle lookup, it assumed template resolution always
    results in constructed xfrms.  From Timo Teräs.

20) tcp_splice_read() doesn't hit sock_rps_record_flow() but it needs to.
    From Changli Gao.
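
As a rough illustration of the pattern behind item 4, here is a
userspace sketch of the "check, then fetch" problem and the single-read
alternative.  The struct and function names (fake_sock,
get_tx_queue_two_reads, get_tx_queue_one_read) and the use of -1 as the
"not set" marker are assumptions made for this example; this is not the
code from the actual fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket whose mapping may be reset elsewhere. */
struct fake_sock {
	volatile int tx_queue_mapping;	/* -1 means "no queue recorded" */
};

/*
 * Problematic shape: the field is read once to validate it and again to
 * return it.  If another context resets the field between the two reads,
 * the caller can receive -1 even though the check passed.
 */
static int get_tx_queue_two_reads(const struct fake_sock *sk)
{
	assert(sk != NULL);
	if (sk->tx_queue_mapping >= 0)		/* first read */
		return sk->tx_queue_mapping;	/* second read */
	return -1;
}

/*
 * Safer shape: read the field once into a local, then validate and
 * return the local copy, so both steps see the same value.
 */
static int get_tx_queue_one_read(const struct fake_sock *sk)
{
	int mapping;

	assert(sk != NULL);
	mapping = sk->tx_queue_mapping;		/* the only read */
	return mapping >= 0 ? mapping : -1;
}
```

In a single-threaded test both helpers return the same result; the
difference only shows up when the field changes between the two reads.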

Please pull, thanks a lot!

The following changes since commit 2f7989efd4398d92b8adffce2e07dd043a0895fe:

  Merge master.kernel.org:/home/rmk/linux-2.6-arm (2010-07-14 17:28:13 -0700)

are available in the git repository at:

  master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master

Andrei Emeltchenko (1):
      Bluetooth: Check L2CAP pending status before sending connect request

Arnaud Ebalard (1):
      IPv6: fix CoA check in RH2 input handler (mip6_rthdr_input())

Ben Greear (1):
      ipmr: Don't leak memory if fib lookup fails.

Changli Gao (2):
      act_nat: not all of the ICMP packets need an IP header payload
      rfs: call sock_rps_record_flow() in tcp_splice_read()

David S. Miller (4):
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless-2.6
      dsa: Fix Kconfig dependencies.
      Merge branch 'vhost-net' of git://git.kernel.org/.../mst/vhost
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless-2.6

Doug Kehn (1):
      net/core: neighbour update Oops

Eric Dumazet (1):
      net: skb_tx_hash() fix relative to skb_orphan_try()

Francois Romieu (1):
      r8169: incorrect identifier for a 8168dp

Herbert Xu (1):
      bridge: Partially disable netpoll support

Ilpo Järvinen (1):
      tcp: fix crash in tcp_xmit_retransmit_queue

Johan Hedberg (1):
      Bluetooth: Reset the security level after an authentication failure

John W. Linville (1):
      hostap_pci: set dev->base_addr during probe

Ken Kawasaki (1):
      axnet_cs: use spin_lock_irqsave in ax_interrupt

Michael S. Tsirkin (2):
      vhost-net: avoid flush under lock
      vhost: avoid pr_err on condition guest can trigger

Rajkumar Manoharan (1):
      ath9k_htc: fix memory leak in ath9k_hif_usb_alloc_urbs

Reinette Chatre (1):
      iwlwifi: remove key information during device restart

Robert Jennings (1):
      ibmveth: lost IRQ while closing/opening device leads to service loss

Rémi Denis-Courmont (1):
      Phonet: fix skb leak in pipe endpoint accept()

Stephen Boyd (1):
      rt2x00: Fix lockdep warning in rt2x00lib_probe_dev()

Timo Teräs (1):
      xfrm: do not assume that template resolving always returns xfrms

Tom Herbert (1):
      net: fix problem in reading sock TX queue

Ville Tervo (1):
      Bluetooth: Update sec_level/auth_type for already existing connections

 drivers/net/ibmveth.c                    |    4 +++-
 drivers/net/pcmcia/axnet_cs.c            |    7 ++++---
 drivers/net/r8169.c                      |    2 +-
 drivers/net/wireless/ath/ath9k/hif_usb.c |    8 ++++++--
 drivers/net/wireless/hostap/hostap_pci.c |    1 +
 drivers/net/wireless/iwlwifi/iwl-sta.h   |   11 +++++++++++
 drivers/net/wireless/rt2x00/rt2x00dev.c  |   10 +++++-----
 drivers/vhost/net.c                      |   13 +++++++++----
 include/net/sock.h                       |    7 +------
 net/bluetooth/hci_conn.c                 |    5 +++++
 net/bluetooth/hci_event.c                |    2 ++
 net/bluetooth/l2cap.c                    |   14 +++++++++++---
 net/bridge/br_device.c                   |    9 ---------
 net/bridge/br_forward.c                  |   23 +----------------------
 net/core/dev.c                           |   20 +++++++++++++-------
 net/core/neighbour.c                     |    5 ++++-
 net/dsa/Kconfig                          |    2 +-
 net/ipv4/ipmr.c                          |    8 ++++++--
 net/ipv4/tcp.c                           |    1 +
 net/ipv4/tcp_output.c                    |    3 +++
 net/ipv6/mip6.c                          |    3 ++-
 net/phonet/pep.c                         |    1 +
 net/sched/act_nat.c                      |    5 ++++-
 net/xfrm/xfrm_policy.c                   |   15 +++++++++++++--
 24 files changed, 108 insertions(+), 71 deletions(-)

* Re: [PATCH net-next] sysfs: add entry to indicate network interfaces with random MAC address
From: David Miller @ 2010-07-20 21:20 UTC
  To: shemminger
  Cc: bhutchings, sassmann, netdev, linux-kernel, gospo, gregory.v.rose,
	alexander.h.duyck, leedom, harald
In-Reply-To: <20100720141816.16f0a939@nehalam>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 20 Jul 2010 14:18:16 -0700

> No one mentioned that the first octet of an Ethernet address already
> indicates "software generated" Ethernet address. Per the standard,
> if bit 1 is set it means address is locally assigned.
> 
> static inline bool is_locally_assigned_ether(const u8 *addr)
> {
> 	return (addr[0] & 0x2) != 0;
> }

W00t!

Indeed, can udev just use that?  :-)

* Re: net/dsa
From: Karl Beldan @ 2010-07-20 21:28 UTC
  To: Lennert Buytenhek; +Cc: netdev
In-Reply-To: <20100720135950.GZ14513@mail.wantstofly.org>

On Tue, Jul 20, 2010 at 3:59 PM, Lennert Buytenhek
<buytenh@wantstofly.org> wrote:
> On Mon, Jul 12, 2010 at 10:59:29PM +0200, Karl Beldan wrote:
>
>> Hi,
>
> Hi Karl,
>
> Sorry, I didn't see your mail initially -- please CC me in the future.
>
>
>> I found the dsa code very handy to help manage a switch.
>
> Ah.  What particular part are you using?
>
>
The whole thing but the cascading stuff.
I could even reuse a tagging format almost as is!

>> Yet I was surprised I had to tweak the code to simply use the phy
>> layer state machine.
>
> You mean that net/dsa uses phy_attach() but not phy_start_machine() ?
> Have you seen problems arising from this?
>
>
I did not mean that, but I do not think there would be a problem,
since every in-tree driver provides its own poll_link() to do the job of
the phylib's state_queue.  It would only matter for a driver that does
not provide poll_link(); I guess this is what you had in mind.
What I had in mind in fact was the re-use of the phylib's interrupt
based code.  In this situation poll_link() is not there, but
replacing phy_attach() with phy_start_machine() is not enough.
Those are not big changes, but the code seems to aim at such versatile
behavior (and more), I can only imagine it would be useful for the
plethora of boards embedding a switch.

>> And I don't see much activity in the code nor any discussion, e.g no
>> follow up to http://patchwork.ozlabs.org/patch/16578.
>>
>> So I was wondering if there was anybody playing with this code, or
>> having ideas about features to add (vlan/stp callbacks) ?
>
> As far as I know, the code currently in the kernel works well for what
> it intends to do (which is to just expose the switch ports), and I'm
> not aware of any bugs in it.
>
> That said, you're right in that there are several more features that
> the hardware supports that the software could be extended to handle.
>
> For one, I don't have access to any Marvell switch chip hardware
> anymore, so that limits my ability to play with this.  Also, the
> relevant documentation is under a rather restrictive license, so the
> only way I can see net/dsa support for Marvell parts improving is if
> there's pressure from a large enough customer to make this happen.
>
Now I understand, but still, I am surprised nobody else touched the
code, with all those switches in the embedded business.

> If this is about non-Marvell parts, I'd welcome adding support for
> those into net/dsa.  For one, I would really like to see Broadcom
> switch chip support added -- the documentation for those chips is
> under similarly restrictive licensing, though.
>
>

Thanks,

Karl

* Re: net/dsa
From: Lennert Buytenhek @ 2010-07-20 21:54 UTC
  To: Karl Beldan; +Cc: netdev
In-Reply-To: <AANLkTinBA3uqfML78PRr1-xN2ye_3Pjj-NQE5t7eHxGy@mail.gmail.com>

On Tue, Jul 20, 2010 at 11:28:19PM +0200, Karl Beldan wrote:

> >> I found the dsa code very handy to help manage a switch.
> >
> > Ah.  What particular part are you using?
> 
> The whole thing but the cascading stuff.
> I could even reuse a tagging format almost as is!

I meant, which silicon part, i.e. which hardware/chips?  Anything with
available data sheets?


> >> Yet I was surprised I had to tweak the code to simply use the phy
> >> layer state machine.
> >
> > You mean that net/dsa uses phy_attach() but not phy_start_machine() ?
> > Have you seen problems arising from this?
>
> I did not mean that, but I do not think there would be a problem,
> since every in-tree driver provides its own poll_link() to do the job of
> the phylib's state_queue.  It would only matter for a driver that does
> not provide poll_link(); I guess this is what you had in mind.
> What I had in mind in fact was the re-use of the phylib's interrupt
> based code.  In this situation poll_link() is not there, but
> replacing phy_attach() with phy_start_machine() is not enough.

We cannot rely on the switch's interrupt pin being hooked up -- there
are many boards out where it's not wired up at all.  Therefore, polling
for link state changes is the only reliable and portable way.

(Of course, interrupt support can always be added, and that used
instead of polling if a load-time test shows that the interrupt pin
actually works.)


> Those are not big changes, but the code seems to aim at such versatile
> behavior (and more), I can only imagine it would be useful for the
> plethora of boards embedding a switch.

Although it supports Marvell chips only for now, net/dsa was written to
be able to handle other models of switch chips as well.  As I said, I
would love to see support for other switch chips added to it.


> > > So I was wondering if there was anybody playing with this code, or
> > > having ideas about features to add (vlan/stp callbacks) ?
> >
> > As far as I know, the code currently in the kernel works well for what
> > it intends to do (which is to just expose the switch ports), and I'm
> > not aware of any bugs in it.
> >
> > That said, you're right in that there are several more features that
> > the hardware supports that the software could be extended to handle.
> >
> > For one, I don't have access to any Marvell switch chip hardware
> > anymore, so that limits my ability to play with this.  Also, the
> > relevant documentation is under a rather restrictive license, so the
> > only way I can see net/dsa support for Marvell parts improving is if
> > there's pressure from a large enough customer to make this happen.
>
> Now I understand, but still, I am surprised nobody else touched the
> code, with all those switches in the embedded business.

Me too..

..then again, "embedded people" tend to hack up stuff in private
and ship whatever works -- they aren't exactly known for working with
upstream.

* Re: [PATCH net-next-2.6] ixgbe: fix ethtool stats
From: Jeff Kirsher @ 2010-07-20 22:06 UTC
  To: Eric Dumazet; +Cc: David Miller, Jesse Brandeburg, PJ Waskiewicz, netdev
In-Reply-To: <1279646906.2498.103.camel@edumazet-laptop>

On Tue, Jul 20, 2010 at 10:28, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Note : I am currently unable to test following patch, could you please
> Intel guys test it and Ack (or Nack) it ?
>
> Thanks !
>
> [PATCH net-next-2.6] ixgbe: fix ethtool stats
>
> In latest changes about 64bit stats on 32bit arches,
> [commit 28172739f0a276eb8 (net: fix 64 bit counters on 32 bit arches)],
> I missed ixgbe uses a bit of magic in its ixgbe_gstrings_stats
> definition.
>
> IXGBE_NETDEV_STAT() must now assume offsets relative to
> rtnl_link_stats64, not relative to dev->stats.
>
> As a bonus, we also get 64bit stats on ethtool -S
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  drivers/net/ixgbe/ixgbe_ethtool.c |   42 ++++++++++++++--------------
>  1 file changed, 21 insertions(+), 21 deletions(-)
>

Thanks Eric, I have added it to my queue.

-- 
Cheers,
Jeff

* [patch 3/3] drivers/net/82596.c: fix warning
From: akpm @ 2010-07-20 22:25 UTC
  To: davem; +Cc: netdev, akpm, segooon

From: Andrew Morton <akpm@linux-foundation.org>

drivers/net/82596.c: In function 'i596_open':
drivers/net/82596.c:1044: warning: label 'err_irq_dev' defined but not used

Caused by "82596: free resources on error"

Cc: Kulikov Vasiliy <segooon@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/net/82596.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN drivers/net/82596.c~drivers-net-82596c-fix-warning drivers/net/82596.c
--- a/drivers/net/82596.c~drivers-net-82596c-fix-warning
+++ a/drivers/net/82596.c
@@ -1040,8 +1040,8 @@ err_queue:
 err_irq_56:
 #ifdef ENABLE_MVME16x_NET
 	free_irq(0x56, dev);
-#endif
 err_irq_dev:
+#endif
 	free_irq(dev->irq, dev);
 
 	return res;
_
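
To see why moving the label inside the #ifdef silences the warning, here
is a small standalone sketch of the same unwind shape.  Everything in it
(fake_open, fake_free_irq, the HAVE_SECOND_IRQ switch) is a hypothetical
stand-in for the driver code, assuming that the only goto targeting
err_irq_dev lives in the conditionally compiled block.  When that block
is compiled out, a label left outside the #ifdef has no users and the
compiler reports it as defined but not used.

```c
#include <assert.h>

/* Counts simulated free_irq() calls so the unwind path can be observed. */
static int irqs_freed;

static void fake_free_irq(void)
{
	irqs_freed++;
}

/* Hypothetical open routine with an optional second interrupt. */
static int fake_open(int fail_second_irq, int fail_late)
{
	int res = 0;

	(void)fail_second_irq;	/* unused when HAVE_SECOND_IRQ is not defined */

#ifdef HAVE_SECOND_IRQ
	if (fail_second_irq) {
		res = -1;
		goto err_irq_dev;	/* the only user of this label */
	}
#endif
	if (fail_late) {
		res = -1;
		goto err_irq_second;
	}
	return 0;

err_irq_second:
#ifdef HAVE_SECOND_IRQ
	fake_free_irq();	/* release the optional second IRQ */
err_irq_dev:			/* label compiled only alongside its goto */
#endif
	fake_free_irq();	/* release the primary IRQ */
	return res;
}
```

Built without HAVE_SECOND_IRQ, a late failure releases only the primary
IRQ, and no unused label is left behind for the compiler to complain
about.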

* [patch 2/3] arch/um/drivers: remove duplicate structure field initialization
From: akpm @ 2010-07-20 22:25 UTC
  To: davem; +Cc: netdev, akpm, julia, shemminger

From: Julia Lawall <julia@diku.dk>

There are two initializations of ndo_set_mac_address, one to a local
function that is not used otherwise and one to a function that is defined
elsewhere.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
identifier I, s, fld;
position p0,p;
expression E;
@@

struct I s =@p0 { ... .fld@p = E, ...};

@s@
identifier I, s, r.fld;
position r.p0,p;
expression E;
@@

struct I s =@p0 { ... .fld@p = E, ...};

@script:python@
p0 << r.p0;
fld << r.fld;
ps << s.p;
pr << r.p;
@@

if int(ps[0].line)<int(pr[0].line) or int(ps[0].column)<int(pr[0].column):
  cocci.print_main(fld,p0)
// </smpl>

akpm:

- Use the standard eth_mac_addr() in uml_net_set_mac()

- Remove unneeded and racy local set_ether_mac()

- Remove duplicated (and incorrect)
  uml_netdev_ops.ndo_set_mac_address initializer.

Fixes 8bb95b39a16ed55226810596f92216c53329d2fe ("uml: convert network
device to netdevice ops").

[akpm@linux-foundation.org: rework as above]
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/um/drivers/net_kern.c |   10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff -puN arch/um/drivers/net_kern.c~arch-um-drivers-remove-duplicate-structure-field-initialization arch/um/drivers/net_kern.c
--- a/arch/um/drivers/net_kern.c~arch-um-drivers-remove-duplicate-structure-field-initialization
+++ a/arch/um/drivers/net_kern.c
@@ -25,11 +25,6 @@
 #include "net_kern.h"
 #include "net_user.h"
 
-static inline void set_ether_mac(struct net_device *dev, unsigned char *addr)
-{
-	memcpy(dev->dev_addr, addr, ETH_ALEN);
-}
-
 #define DRIVER_NAME "uml-netdev"
 
 static DEFINE_SPINLOCK(opened_lock);
@@ -266,7 +261,7 @@ static int uml_net_set_mac(struct net_de
 	struct sockaddr *hwaddr = addr;
 
 	spin_lock_irq(&lp->lock);
-	set_ether_mac(dev, hwaddr->sa_data);
+	eth_mac_addr(dev, hwaddr->sa_data);
 	spin_unlock_irq(&lp->lock);
 
 	return 0;
@@ -380,7 +375,6 @@ static const struct net_device_ops uml_n
 	.ndo_tx_timeout 	= uml_net_tx_timeout,
 	.ndo_set_mac_address	= uml_net_set_mac,
 	.ndo_change_mtu 	= uml_net_change_mtu,
-	.ndo_set_mac_address 	= eth_mac_addr,
 	.ndo_validate_addr	= eth_validate_addr,
 };
 
@@ -478,7 +472,7 @@ static void eth_configure(int n, void *i
 	    ((*transport->user->init)(&lp->user, dev) != 0))
 		goto out_unregister;
 
-	set_ether_mac(dev, device->mac);
+	eth_mac_addr(dev, device->mac);
 	dev->mtu = transport->user->mtu;
 	dev->netdev_ops = &uml_netdev_ops;
 	dev->ethtool_ops = &uml_net_ethtool_ops;
_

* [patch 1/3] drivers/net/cxgb3/t3_hw.c: use new hex_to_bin() method
From: akpm @ 2010-07-20 22:25 UTC
  To: davem; +Cc: netdev, akpm, ext-andriy.shevchenko, divy

From: Andy Shevchenko <ext-andriy.shevchenko@nokia.com>

Get rid of own implementation of hex_to_bin().

Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/net/cxgb3/t3_hw.c |   16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff -puN drivers/net/cxgb3/t3_hw.c~drivers-net-cxgb3-t3_hwc-use-new-hex_to_bin-method drivers/net/cxgb3/t3_hw.c
--- a/drivers/net/cxgb3/t3_hw.c~drivers-net-cxgb3-t3_hwc-use-new-hex_to_bin-method
+++ a/drivers/net/cxgb3/t3_hw.c
@@ -679,14 +679,6 @@ int t3_seeprom_wp(struct adapter *adapte
 	return t3_seeprom_write(adapter, EEPROM_STAT_ADDR, enable ? 0xc : 0);
 }
 
-/*
- * Convert a character holding a hex digit to a number.
- */
-static unsigned int hex2int(unsigned char c)
-{
-	return isdigit(c) ? c - '0' : toupper(c) - 'A' + 10;
-}
-
 /**
  *	get_vpd_params - read VPD parameters from VPD EEPROM
  *	@adapter: adapter to read
@@ -727,15 +719,15 @@ static int get_vpd_params(struct adapter
 		p->port_type[0] = uses_xaui(adapter) ? 1 : 2;
 		p->port_type[1] = uses_xaui(adapter) ? 6 : 2;
 	} else {
-		p->port_type[0] = hex2int(vpd.port0_data[0]);
-		p->port_type[1] = hex2int(vpd.port1_data[0]);
+		p->port_type[0] = hex_to_bin(vpd.port0_data[0]);
+		p->port_type[1] = hex_to_bin(vpd.port1_data[0]);
 		p->xauicfg[0] = simple_strtoul(vpd.xaui0cfg_data, NULL, 16);
 		p->xauicfg[1] = simple_strtoul(vpd.xaui1cfg_data, NULL, 16);
 	}
 
 	for (i = 0; i < 6; i++)
-		p->eth_base[i] = hex2int(vpd.na_data[2 * i]) * 16 +
-				 hex2int(vpd.na_data[2 * i + 1]);
+		p->eth_base[i] = hex_to_bin(vpd.na_data[2 * i]) * 16 +
+				 hex_to_bin(vpd.na_data[2 * i + 1]);
 	return 0;
 }
 
_
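
For reference, the conversion loop above can be tried in userspace with
a sketch like the following.  hex_digit_value() is a hypothetical
stand-in written for this example; it is assumed to behave like a hex
digit converter that returns a negative value for non-hex input, which
the removed hex2int() did not do (it returned a meaningless number for
characters such as 'g').  parse_mac_hex() is likewise an illustrative
name, mirroring the eth_base[] loop with an added validity check.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one hex digit to its value, or return -1 if it is not hex. */
static int hex_digit_value(char ch)
{
	if (ch >= '0' && ch <= '9')
		return ch - '0';
	ch = (char)tolower((unsigned char)ch);
	if (ch >= 'a' && ch <= 'f')
		return ch - 'a' + 10;
	return -1;
}

/*
 * Build a 6-octet address from 12 hex characters, two per octet.
 * Returns 0 on success, -1 on the first invalid or missing digit.
 */
static int parse_mac_hex(const char *hex, uint8_t out[6])
{
	assert(hex != NULL && out != NULL);

	for (int i = 0; i < 6; i++) {
		int hi, lo;

		hi = hex_digit_value(hex[2 * i]);
		if (hi < 0)
			return -1;	/* stop before reading past a NUL */
		lo = hex_digit_value(hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		out[i] = (uint8_t)(hi * 16 + lo);
	}
	return 0;
}
```

The patch itself does not check the return value of the converter, so
this sketch is slightly stricter than the driver code it illustrates.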

* Re: [BUG net-next-2.6] vlan, bonding, bnx2 problems
From: Jay Vosburgh @ 2010-07-20 22:58 UTC
  Cc: Michael Chan, Eric Dumazet, David Miller,
	pedro.netdev@dondevamos.com, netdev@vger.kernel.org,
	kaber@trash.net, bhutchings@solarflare.com
In-Reply-To: <25515.1279570766@death>


Jay Vosburgh <fubar@us.ibm.com> wrote:

>Michael Chan <mchan@broadcom.com> wrote:
>
>>Adding Jay to CC.
>>
>>On Mon, 2010-07-19 at 06:24 -0700, Eric Dumazet wrote:
>>> [   32.046479] BUG: scheduling while atomic: ifenslave/4586/0x00000100
>>> [   32.046540] Modules linked in: ipmi_si ipmi_msghandler hpilo
>>> bonding ipv6
>>> [   32.046784] Pid: 4586, comm: ifenslave Tainted: G        W
>>> 2.6.35-rc1-01453-g3e12451-dirty #836
>>> [   32.046860] Call Trace:
>>> [   32.046910]  [<c13421c4>] ? printk+0x18/0x1c
>>> [   32.046965]  [<c10315c9>] __schedule_bug+0x59/0x60
>>> [   32.047019]  [<c1342a2c>] schedule+0x57c/0x850
>>> [   32.047074]  [<c104a106>] ? lock_timer_base+0x26/0x50
>>> [   32.047128]  [<c1342f78>] schedule_timeout+0x118/0x250
>>> [   32.047183]  [<c104a2c0>] ? process_timeout+0x0/0x10
>>> [   32.047238]  [<c13430c5>] schedule_timeout_uninterruptible
>>> +0x15/0x20
>>> [   32.047295]  [<c104a345>] msleep+0x15/0x20
>>> [   32.047350]  [<c1227082>] bnx2_napi_disable+0x52/0x80
>>> [   32.047405]  [<c122b56f>] bnx2_netif_stop+0x3f/0xa0
>>> [   32.047460]  [<c122b62a>] bnx2_vlan_rx_register+0x5a/0x80
>>> [   32.047516]  [<f8ced776>] bond_enslave+0x526/0xa90 [bonding]
>>> [   32.047576]  [<f8b8f0d0>] ? fib6_clean_node+0x0/0xb0 [ipv6]
>>> [   32.047634]  [<f8b8dda0>] ? fib6_age+0x0/0x90 [ipv6]
>>> [   32.047689]  [<c129d2d3>] ? netdev_set_master+0x3/0xc0
>>> [   32.047746]  [<f8cee4cb>] bond_do_ioctl+0x31b/0x430 [bonding]
>>> [   32.047804]  [<c105b19a>] ? raw_notifier_call_chain+0x1a/0x20
>>> [   32.047861]  [<c12abd5d>] ? __rtnl_unlock+0xd/0x10
>>> [   32.047915]  [<c129f8cd>] ? __dev_get_by_name+0x7d/0xa0
>>> [   32.047970]  [<c12a19b0>] dev_ifsioc+0xf0/0x290
>>> [   32.048025]  [<f8cee1b0>] ? bond_do_ioctl+0x0/0x430 [bonding]
>>> [   32.048081]  [<c12a1ce1>] dev_ioctl+0x191/0x610
>>> [   32.048136]  [<c12eeb20>] ? udp_ioctl+0x0/0x70
>>> [   32.048189]  [<c128f67c>] sock_ioctl+0x6c/0x240
>>> [   32.048243]  [<c10d3a44>] vfs_ioctl+0x34/0xa0
>>> [   32.048297]  [<c10c7cab>] ? alloc_file+0x1b/0xa0
>>> [   32.048351]  [<c128f610>] ? sock_ioctl+0x0/0x240
>>> [   32.048404]  [<c10d4186>] do_vfs_ioctl+0x66/0x550
>>> [   32.048459]  [<c1022ca0>] ? do_page_fault+0x0/0x350
>>> [   32.048513]  [<c1022e41>] ? do_page_fault+0x1a1/0x350
>>> [   32.048568]  [<c129098c>] ? sys_socket+0x5c/0x70
>>> [   32.048622]  [<c1291860>] ? sys_socketcall+0x60/0x270
>>> [   32.048677]  [<c10d46a9>] sys_ioctl+0x39/0x60
>>> [   32.048730]  [<c1002bd0>] sysenter_do_call+0x12/0x26
>>> [   32.052025] bonding: bond0: enslaving eth1 as a backup interface with a down link.
>>> [   32.100207] tg3 0000:14:04.0: PME# enabled
>>> [   32.100222]  pci0000:00: wake-up capability enabled by ACPI
>>> [   32.224488]  pci0000:00: wake-up capability disabled by ACPI
>>> [   32.224492] tg3 0000:14:04.0: PME# disabled
>>> [   32.348516] tg3 0000:14:04.0: BAR 0: set to [mem 0xfdff0000-0xfdffffff 64bit] (PCI address [0xfdff0000-0xfdffffff]
>>> [   32.348524] tg3 0000:14:04.0: BAR 2: set to [mem 0xfdfe0000-0xfdfeffff 64bit] (PCI address [0xfdfe0000-0xfdfeffff]
>>> [   32.363711] bonding: bond0: enslaving eth2 as a backup interface with a down link.
>>> 
>>> 
>>> 
>>> For bnx2, it seems commit 212f9934afccf9c9739921
>>> was not sufficient to correct the "scheduling while atomic" bug...
>>> enslaving a bnx2 on a bond device with one vlan already set :
>>>  bond_enslave -> bnx2_vlan_rx_register -> bnx2_netif_stop ->
>>> bnx2_napi_disable -> msleep()
>>> 
>>
>>There are a number of drivers that call napi_disable() during
>>->ndo_vlan_rx_register().  bnx2 is lockless in the rx path and so we
>>need to disable NAPI rx processing and wait for it to be done before
>>modifying the vlgrp.
>>
>>Jay, is there an alternative to holding the bond->lock when calling the
>>slave's ->ndo_vlan_rx_register()?
>
>	I believe so.  The lock is held here nominally to mutex
>bonding's vlan_list.  The bond_add_vlans_on_slave function actually does
>the lock and call to ndo_vlan_rx_register (plus one add_vid call per
>configured VLAN); I think the call frame in the above stack has been
>optimized out.
>
>	For the specific cases of bond_add_vlans_on_slave and
>bond_del_vlans_from_slave, we should be able to get away without holding
>the bond->lock because we also hold RTNL, and it looks like all changes
>to the vlan_list are implicitly mutexed by RTNL because all VLAN add /
>remove for device or vid end up being done under RTNL.
>
>	The cases within bonding that change the vlan_list will still
>have to hold bond->lock, because other call sites within bonding check
>the vlan_list without RTNL (and it would be impractical to have them do
>so).
>
>	The patch is as follows; I'm compiling this now to test.  If it
>pans out, I'll post a formal submission in a bit.

	Just an update; the "VLAN 0" patch:

commit ad1afb00393915a51c21b1ae8704562bf036855f
Author: Pedro Garcia <pedro.netdev@dondevamos.com>
Date:   Sun Jul 18 15:38:44 2010 -0700

    vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)

	has broken a bunch of VLAN-related things in bonding (more than
just the ipv6 event thing that was already fixed).  Now, 8021q will do
an "add_vid" for VLAN 0 without doing a vlan_rx_register and supplying a
struct vlan_group; this confuses the existing bonding code, which
assumes that register comes first.

	I'm working out the best way to fix the VLAN breakage before I
can test the below patch (which may have to change).

	-J

>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 8228088..decddf5 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -565,10 +565,8 @@ static void bond_add_vlans_on_slave(struct bonding *bond, struct net_device *sla
> 	struct vlan_entry *vlan;
> 	const struct net_device_ops *slave_ops = slave_dev->netdev_ops;
>
>-	write_lock_bh(&bond->lock);
>-
> 	if (list_empty(&bond->vlan_list))
>-		goto out;
>+		return;
>
> 	if ((slave_dev->features & NETIF_F_HW_VLAN_RX) &&
> 	    slave_ops->ndo_vlan_rx_register)
>@@ -576,13 +574,10 @@ static void bond_add_vlans_on_slave(struct bonding *bond, struct net_device *sla
>
> 	if (!(slave_dev->features & NETIF_F_HW_VLAN_FILTER) ||
> 	    !(slave_ops->ndo_vlan_rx_add_vid))
>-		goto out;
>+		return;
>
> 	list_for_each_entry(vlan, &bond->vlan_list, vlan_list)
> 		slave_ops->ndo_vlan_rx_add_vid(slave_dev, vlan->vlan_id);
>-
>-out:
>-	write_unlock_bh(&bond->lock);
> }
>
> static void bond_del_vlans_from_slave(struct bonding *bond,
>@@ -592,10 +587,8 @@ static void bond_del_vlans_from_slave(struct bonding *bond,
> 	struct vlan_entry *vlan;
> 	struct net_device *vlan_dev;
>
>-	write_lock_bh(&bond->lock);
>-
> 	if (list_empty(&bond->vlan_list))
>-		goto out;
>+		return;
>
> 	if (!(slave_dev->features & NETIF_F_HW_VLAN_FILTER) ||
> 	    !(slave_ops->ndo_vlan_rx_kill_vid))
>@@ -614,9 +607,6 @@ unreg:
> 	if ((slave_dev->features & NETIF_F_HW_VLAN_RX) &&
> 	    slave_ops->ndo_vlan_rx_register)
> 		slave_ops->ndo_vlan_rx_register(slave_dev, NULL);
>-
>-out:
>-	write_unlock_bh(&bond->lock);
> }
>
> /*------------------------------- Link status -------------------------------*/

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

* [PATCH -next] net: NET_DSA depends on NET_ETHERNET
From: Randy Dunlap @ 2010-07-20 23:03 UTC (permalink / raw)
  To: David Miller; +Cc: sfr, netdev, linux-next, linux-kernel, Lennert Buytenhek
In-Reply-To: <20100710.190954.245403400.davem@davemloft.net>

From: Randy Dunlap <randy.dunlap@oracle.com>

NET_DSA code selects and uses PHYLIB code, but PHYLIB depends on
NET_ETHERNET.  However, "select" does not follow kconfig dependencies,
so explicitly list that requirement here instead.

Fixes this kconfig warning:

warning: (NET_DSA && NET && EXPERIMENTAL && !S390 ...) selects PHYLIB which has unmet direct dependencies (!S390 && NET_ETHERNET)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Lennert Buytenhek <buytenh@wantstofly.org>
---
 net/dsa/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Is there some reason that NET_DSA is bool instead of tristate?
I.e., net/dsa/ code cannot be built as loadable modules?
--- linux-next-20100713.orig/net/dsa/Kconfig
+++ linux-next-20100713/net/dsa/Kconfig
@@ -1,7 +1,7 @@
 menuconfig NET_DSA
 	bool "Distributed Switch Architecture support"
 	default n
-	depends on EXPERIMENTAL && !S390
+	depends on EXPERIMENTAL && !S390 && NET_ETHERNET
 	select PHYLIB
 	---help---
 	  This allows you to use hardware switch chips that use

* Re: [patch v2.2 1/4] [PATCH v2.1 1/4] netfilter: xt_ipvs (netfilter matcher for IPVS)
From: Simon Horman @ 2010-07-20 23:34 UTC (permalink / raw)
  To: Hannes Eder
  Cc: Patrick McHardy, lvs-devel, netdev, linux-kernel, netfilter,
	Wensong Zhang, Julius Volz, David S. Miller,
	Netfilter Development Mailinglist
In-Reply-To: <AANLkTimUEJGAJ6tsPKGltUGBwrdotK5YyKBIkQWbA8qZ@mail.gmail.com>

On Tue, Jul 20, 2010 at 02:44:11PM +0200, Hannes Eder wrote:
> Hi Simon,
> 
> On Tue, Jun 22, 2010 at 09:13, Simon Horman <horms@verge.net.au> wrote:
> > On Mon, May 03, 2010 at 01:29:46PM +0200, Hannes Eder wrote:
> >> Thank you for picking this series of patches up again and thanks for
> >> the feedback.
> >>
> >> I'll send an updated version in the next days.
> >
> > Hi Hannes,
> >
> > more than a few days seems to have passed.
> > Do you have time to fix the patches up?
> > If not, I'll take a stab at it.
> 
> /me working through the backlog of emails after vacation, however this
> email was buried in my inbox before my vacation, my bad.  I've been
> extremely busy lately and I did not have the time to work on the
> patches.  I saw your updated versions, I appreciate very much that you
> are taking it from there.

No problem, I assumed that you were busy with something.

* Re: [PATCH -next] net: NET_DSA depends on NET_ETHERNET
From: David Miller @ 2010-07-21  0:45 UTC (permalink / raw)
  To: randy.dunlap; +Cc: sfr, netdev, linux-next, linux-kernel, buytenh
In-Reply-To: <4C462B44.5010107@oracle.com>

From: Randy Dunlap <randy.dunlap@oracle.com>
Date: Tue, 20 Jul 2010 16:03:32 -0700

> From: Randy Dunlap <randy.dunlap@oracle.com>
> 
> NET_DSA code selects and uses PHYLIB code, but PHYLIB depends on
> NET_ETHERNET.  However, "select" does not follow kconfig dependencies,
> so explicitly list that requirement here instead.
> 
> Fixes this kconfig warning:
> 
> warning: (NET_DSA && NET && EXPERIMENTAL && !S390 ...) selects PHYLIB which has unmet direct dependencies (!S390 && NET_ETHERNET)
> 
> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>

Randy, this has been fixed in net-2.6 for some time now.

And I'm pretty sure I sent a copy of this to you when I
checked it in :-)

--------------------
From 336a283b9cbe47748ccd68fd8c5158f67cee644b Mon Sep 17 00:00:00 2001
From: David S. Miller <davem@davemloft.net>
Date: Mon, 12 Jul 2010 20:03:42 -0700
Subject: [PATCH 09/24] dsa: Fix Kconfig dependencies.

Based upon a report by Randy Dunlap.

DSA needs PHYLIB, but PHYLIB needs NET_ETHERNET.  So, in order
to select PHYLIB we have to make DSA depend upon NET_ETHERNET.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/dsa/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index c51b554..1120178 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -1,7 +1,7 @@
 menuconfig NET_DSA
 	bool "Distributed Switch Architecture support"
 	default n
-	depends on EXPERIMENTAL && !S390
+	depends on EXPERIMENTAL && NET_ETHERNET && !S390
 	select PHYLIB
 	---help---
 	  This allows you to use hardware switch chips that use
-- 
1.7.1.1

* Re: [patch v2.6 4/4] libxt_ipvs: user-space lib for netfilter matcher xt_ipvs
From: Simon Horman @ 2010-07-21  1:21 UTC (permalink / raw)
  To: lvs-devel, netdev, linux-kernel, netfilter, netfilter-devel
  Cc: Malcolm Turnbull, Wensong Zhang, Julius Volz, Patrick McHardy,
	David S. Miller, Hannes Eder
In-Reply-To: <20100711090500.421568837@vergenet.net>

On Sun, Jul 11, 2010 at 06:03:46PM +0900, horms@vergenet.net wrote:
> From:	Hannes Eder <heder@google.com>
> 
> The user-space library for the netfilter matcher xt_ipvs.

[snip]

> Index: iptables/include/linux/netfilter/xt_ipvs.h
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ iptables/include/linux/netfilter/xt_ipvs.h	2010-07-04 20:23:30.000000000 +0900
> @@ -0,0 +1,25 @@
> +#ifndef _XT_IPVS_H
> +#define _XT_IPVS_H 1
> +
> +#define XT_IPVS_IPVS_PROPERTY	(1 << 0) /* all other options imply this one */
> +#define XT_IPVS_PROTO		(1 << 1)
> +#define XT_IPVS_VADDR		(1 << 2)
> +#define XT_IPVS_VPORT		(1 << 3)
> +#define XT_IPVS_DIR		(1 << 4)
> +#define XT_IPVS_METHOD		(1 << 5)
> +#define XT_IPVS_VPORTCTL	(1 << 6)
> +#define XT_IPVS_MASK		((1 << 7) - 1)
> +#define XT_IPVS_ONCE_MASK	(XT_IPVS_MASK & ~XT_IPVS_IPVS_PROPERTY)
> +
> +struct xt_ipvs_mtinfo {
> +	union nf_inet_addr	vaddr, vmask;
> +	__be16			vport;
> +	__u16			l4proto;
> +	__u16			fwd_method;

The kernel version of this file has been updated so that
l4proto and fwd_method are __u8. This also needs to be updated.
I will post an updated patch (v2.7).

> +	__be16			vportctl;
> +
> +	__u8			invert;
> +	__u8			bitmask;
> +};
> +
> +#endif /* _XT_IPVS_H */
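
A minimal sketch of why the user-space copy has to follow the kernel's
switch to __u8: narrowing l4proto and fwd_method moves every later field, so
a library built against the old header would write vportctl, invert and
bitmask at offsets the kernel no longer reads. The sketch_* names and the
16-byte stand-in for union nf_inet_addr are assumptions made so that this
compiles without kernel or iptables headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for union nf_inet_addr; assumed to be 16 bytes, 4-byte aligned. */
union sketch_inet_addr {
	uint32_t all[4];
	uint8_t  bytes[16];
};

/* Layout as quoted above: l4proto and fwd_method are 16 bits wide. */
struct sketch_mtinfo_u16 {
	union sketch_inet_addr vaddr, vmask;
	uint16_t vport;
	uint16_t l4proto;
	uint16_t fwd_method;
	uint16_t vportctl;
	uint8_t  invert;
	uint8_t  bitmask;
};

/* Layout after the change described in the reply: both fields are 8 bits. */
struct sketch_mtinfo_u8 {
	union sketch_inet_addr vaddr, vmask;
	uint16_t vport;
	uint8_t  l4proto;
	uint8_t  fwd_method;
	uint16_t vportctl;
	uint8_t  invert;
	uint8_t  bitmask;
};

/* Returns 1 when the two layouts place vportctl at different offsets. */
int sketch_layouts_differ(void)
{
	/* The comparison only makes sense if the stand-in has the assumed size. */
	assert(sizeof(union sketch_inet_addr) == 16);

	return offsetof(struct sketch_mtinfo_u16, vportctl) !=
	       offsetof(struct sketch_mtinfo_u8, vportctl);
}
```

Calling sketch_layouts_differ() from a small test program is enough to see
the mismatch; the same offsetof() comparison can be repeated for invert and
bitmask.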

* Re: [patch v2.7 4/4] libxt_ipvs: user-space lib for netfilter matcher xt_ipvs
From: Simon Horman @ 2010-07-21  1:23 UTC (permalink / raw)
  To: lvs-devel, netdev, linux-kernel, netfilter, netfilter-devel
  Cc: Mark Brooks, Malcolm Turnbull, Wensong Zhang, Julius Volz,
	Patrick McHardy, David S. Miller, Hannes Eder
In-Reply-To: <20100721012146.GC22966@verge.net.au>

From:	Hannes Eder <heder@google.com>

The user-space library for the netfilter matcher xt_ipvs.

[ trivial up-port by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Acked-by: Simon Horman <horms@verge.net.au>

 configure.ac                      |   10 -
 extensions/libxt_ipvs.c           |  365 +++++++++++++++++++++++++++++++++++++
 extensions/libxt_ipvs.man         |   24 ++
 include/linux/netfilter/xt_ipvs.h |   25 +++
 4 files changed, 422 insertions(+), 2 deletions(-)
 create mode 100644 extensions/libxt_ipvs.c
 create mode 100644 extensions/libxt_ipvs.man
 create mode 100644 include/linux/netfilter/xt_ipvs.h

v2.7
* Update struct xt_ipvs_mtinfo to use __u8 instead of __u16 for the l4proto
  and fwd_method to reflect the same change to the kernel copy
  of struct xt_ipvs_mtinfo.

v2.1
* Trivial up-port

Index: iptables/configure.ac
===================================================================
--- iptables.orig/configure.ac	2010-07-21 09:43:55.000000000 +0900
+++ iptables/configure.ac	2010-07-21 09:44:02.000000000 +0900
@@ -52,12 +52,18 @@ AC_ARG_WITH([pkgconfigdir], AS_HELP_STRI
 	[Path to the pkgconfig directory [[LIBDIR/pkgconfig]]]),
 	[pkgconfigdir="$withval"], [pkgconfigdir='${libdir}/pkgconfig'])
 
-AC_CHECK_HEADER([linux/dccp.h])
-
 blacklist_modules="";
+
+AC_CHECK_HEADER([linux/dccp.h])
 if test "$ac_cv_header_linux_dccp_h" != "yes"; then
 	blacklist_modules="$blacklist_modules dccp";
 fi;
+
+AC_CHECK_HEADER([linux/ip_vs.h])
+if test "$ac_cv_header_linux_ip_vs_h" != "yes"; then
+	blacklist_modules="$blacklist_modules ipvs";
+fi;
+
 AC_SUBST([blacklist_modules])
 
 AM_CONDITIONAL([ENABLE_STATIC], [test "$enable_static" = "yes"])
Index: iptables/extensions/libxt_ipvs.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ iptables/extensions/libxt_ipvs.c	2010-07-21 10:07:17.000000000 +0900
@@ -0,0 +1,365 @@
+/*
+ * Shared library add-on to iptables to add IPVS matching.
+ *
+ * Detailed doc is in the kernel module source net/netfilter/xt_ipvs.c
+ *
+ * Author: Hannes Eder <heder@google.com>
+ */
+#include <sys/types.h>
+#include <assert.h>
+#include <ctype.h>
+#include <errno.h>
+#include <getopt.h>
+#include <netdb.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <xtables.h>
+#include <linux/ip_vs.h>
+#include <linux/netfilter/xt_ipvs.h>
+
+static const struct option ipvs_mt_opts[] = {
+	{ .name = "ipvs",     .has_arg = false, .val = '0' },
+	{ .name = "vproto",   .has_arg = true,  .val = '1' },
+	{ .name = "vaddr",    .has_arg = true,  .val = '2' },
+	{ .name = "vport",    .has_arg = true,  .val = '3' },
+	{ .name = "vdir",     .has_arg = true,  .val = '4' },
+	{ .name = "vmethod",  .has_arg = true,  .val = '5' },
+	{ .name = "vportctl", .has_arg = true,  .val = '6' },
+	{ .name = NULL }
+};
+
+static void ipvs_mt_help(void)
+{
+	printf(
+"IPVS match options:\n"
+"[!] --ipvs                      packet belongs to an IPVS connection\n"
+"\n"
+"Any of the following options implies --ipvs (even negated)\n"
+"[!] --vproto protocol           VIP protocol to match; by number or name,\n"
+"                                e.g. \"tcp\"\n"
+"[!] --vaddr address[/mask]      VIP address to match\n"
+"[!] --vport port                VIP port to match; by number or name,\n"
+"                                e.g. \"http\"\n"
+"    --vdir {ORIGINAL|REPLY}     flow direction of packet\n"
+"[!] --vmethod {GATE|IPIP|MASQ}  IPVS forwarding method used\n"
+"[!] --vportctl port             VIP port of the controlling connection to\n"
+"                                match, e.g. 21 for FTP\n"
+		);
+}
+
+static void ipvs_mt_parse_addr_and_mask(const char *arg,
+					union nf_inet_addr *address,
+					union nf_inet_addr *mask,
+					unsigned int family)
+{
+	struct in_addr *addr = NULL;
+	struct in6_addr *addr6 = NULL;
+	unsigned int naddrs = 0;
+
+	if (family == NFPROTO_IPV4) {
+		xtables_ipparse_any(arg, &addr, &mask->in, &naddrs);
+		if (naddrs > 1)
+			xtables_error(PARAMETER_PROBLEM,
+				      "multiple IP addresses not allowed");
+		if (naddrs == 1)
+			memcpy(&address->in, addr, sizeof(*addr));
+	} else if (family == NFPROTO_IPV6) {
+		xtables_ip6parse_any(arg, &addr6, &mask->in6, &naddrs);
+		if (naddrs > 1)
+			xtables_error(PARAMETER_PROBLEM,
+				      "multiple IP addresses not allowed");
+		if (naddrs == 1)
+			memcpy(&address->in6, addr6, sizeof(*addr6));
+	} else {
+		/* Hu? */
+		assert(false);
+	}
+}
+
+/* Function which parses command options; returns true if it ate an option */
+static int ipvs_mt_parse(int c, char **argv, int invert, unsigned int *flags,
+			 const void *entry, struct xt_entry_match **match,
+			 unsigned int family)
+{
+	struct xt_ipvs_mtinfo *data = (void *)(*match)->data;
+	char *p = NULL;
+	u_int8_t op = 0;
+
+	if ('0' <= c && c <= '6') {
+		static const int ops[] = {
+			XT_IPVS_IPVS_PROPERTY,
+			XT_IPVS_PROTO,
+			XT_IPVS_VADDR,
+			XT_IPVS_VPORT,
+			XT_IPVS_DIR,
+			XT_IPVS_METHOD,
+			XT_IPVS_VPORTCTL
+		};
+		op = ops[c - '0'];
+	} else
+		return 0;
+
+	if (*flags & op & XT_IPVS_ONCE_MASK)
+		goto multiple_use;
+
+	switch (c) {
+	case '0': /* --ipvs */
+		/* Nothing to do here. */
+		break;
+
+	case '1': /* --vproto */
+		/* Canonicalize into lower case */
+		for (p = optarg; *p != '\0'; ++p)
+			*p = tolower(*p);
+
+		data->l4proto = xtables_parse_protocol(optarg);
+		break;
+
+	case '2': /* --vaddr */
+		ipvs_mt_parse_addr_and_mask(optarg, &data->vaddr,
+					    &data->vmask, family);
+		break;
+
+	case '3': /* --vport */
+		data->vport = htons(xtables_parse_port(optarg, "tcp"));
+		break;
+
+	case '4': /* --vdir */
+		xtables_param_act(XTF_NO_INVERT, "ipvs", "--vdir", invert);
+		if (strcasecmp(optarg, "ORIGINAL") == 0) {
+			data->bitmask |= XT_IPVS_DIR;
+			data->invert   &= ~XT_IPVS_DIR;
+		} else if (strcasecmp(optarg, "REPLY") == 0) {
+			data->bitmask |= XT_IPVS_DIR;
+			data->invert  |= XT_IPVS_DIR;
+		} else {
+			xtables_param_act(XTF_BAD_VALUE,
+					  "ipvs", "--vdir", optarg);
+		}
+		break;
+
+	case '5': /* --vmethod */
+		if (strcasecmp(optarg, "GATE") == 0)
+			data->fwd_method = IP_VS_CONN_F_DROUTE;
+		else if (strcasecmp(optarg, "IPIP") == 0)
+			data->fwd_method = IP_VS_CONN_F_TUNNEL;
+		else if (strcasecmp(optarg, "MASQ") == 0)
+			data->fwd_method = IP_VS_CONN_F_MASQ;
+		else
+			xtables_param_act(XTF_BAD_VALUE,
+					  "ipvs", "--vmethod", optarg);
+		break;
+
+	case '6': /* --vportctl */
+		data->vportctl = htons(xtables_parse_port(optarg, "tcp"));
+		break;
+
+	default:
+		/* Hu? How did we come here? */
+		assert(false);
+		return 0;
+	}
+
+	if (op & XT_IPVS_ONCE_MASK) {
+		if (data->invert & XT_IPVS_IPVS_PROPERTY)
+			xtables_error(PARAMETER_PROBLEM,
+				      "! --ipvs cannot be together with"
+				      " other options");
+		data->bitmask |= XT_IPVS_IPVS_PROPERTY;
+	}
+
+	data->bitmask |= op;
+	if (invert)
+		data->invert |= op;
+	*flags |= op;
+	return 1;
+
+multiple_use:
+	xtables_error(PARAMETER_PROBLEM,
+		      "multiple use of the same IPVS option is not allowed");
+}
+
+static int ipvs_mt4_parse(int c, char **argv, int invert, unsigned int *flags,
+			  const void *entry, struct xt_entry_match **match)
+{
+	return ipvs_mt_parse(c, argv, invert, flags, entry, match,
+			     NFPROTO_IPV4);
+}
+
+static int ipvs_mt6_parse(int c, char **argv, int invert, unsigned int *flags,
+			  const void *entry, struct xt_entry_match **match)
+{
+	return ipvs_mt_parse(c, argv, invert, flags, entry, match,
+			     NFPROTO_IPV6);
+}
+
+static void ipvs_mt_check(unsigned int flags)
+{
+	if (flags == 0)
+		xtables_error(PARAMETER_PROBLEM,
+			      "IPVS: At least one option is required");
+}
+
+/* Shamelessly copied from libxt_conntrack.c */
+static void ipvs_mt_dump_addr(const union nf_inet_addr *addr,
+			      const union nf_inet_addr *mask,
+			      unsigned int family, bool numeric)
+{
+	char buf[BUFSIZ];
+
+	if (family == NFPROTO_IPV4) {
+		if (!numeric && addr->ip == 0) {
+			printf("anywhere ");
+			return;
+		}
+		if (numeric)
+			strcpy(buf, xtables_ipaddr_to_numeric(&addr->in));
+		else
+			strcpy(buf, xtables_ipaddr_to_anyname(&addr->in));
+		strcat(buf, xtables_ipmask_to_numeric(&mask->in));
+		printf("%s ", buf);
+	} else if (family == NFPROTO_IPV6) {
+		if (!numeric && addr->ip6[0] == 0 && addr->ip6[1] == 0 &&
+		    addr->ip6[2] == 0 && addr->ip6[3] == 0) {
+			printf("anywhere ");
+			return;
+		}
+		if (numeric)
+			strcpy(buf, xtables_ip6addr_to_numeric(&addr->in6));
+		else
+			strcpy(buf, xtables_ip6addr_to_anyname(&addr->in6));
+		strcat(buf, xtables_ip6mask_to_numeric(&mask->in6));
+		printf("%s ", buf);
+	}
+}
+
+static void ipvs_mt_dump(const void *ip, const struct xt_ipvs_mtinfo *data,
+			 unsigned int family, bool numeric, const char *prefix)
+{
+	if (data->bitmask == XT_IPVS_IPVS_PROPERTY) {
+		if (data->invert & XT_IPVS_IPVS_PROPERTY)
+			printf("! ");
+		printf("%sipvs ", prefix);
+	}
+
+	if (data->bitmask & XT_IPVS_PROTO) {
+		if (data->invert & XT_IPVS_PROTO)
+			printf("! ");
+		printf("%sproto %u ", prefix, data->l4proto);
+	}
+
+	if (data->bitmask & XT_IPVS_VADDR) {
+		if (data->invert & XT_IPVS_VADDR)
+			printf("! ");
+
+		printf("%svaddr ", prefix);
+		ipvs_mt_dump_addr(&data->vaddr, &data->vmask, family, numeric);
+	}
+
+	if (data->bitmask & XT_IPVS_VPORT) {
+		if (data->invert & XT_IPVS_VPORT)
+			printf("! ");
+
+		printf("%svport %u ", prefix, ntohs(data->vport));
+	}
+
+	if (data->bitmask & XT_IPVS_DIR) {
+		if (data->invert & XT_IPVS_DIR)
+			printf("%svdir REPLY ", prefix);
+		else
+			printf("%svdir ORIGINAL ", prefix);
+	}
+
+	if (data->bitmask & XT_IPVS_METHOD) {
+		if (data->invert & XT_IPVS_METHOD)
+			printf("! ");
+
+		printf("%svmethod ", prefix);
+		switch (data->fwd_method) {
+		case IP_VS_CONN_F_DROUTE:
+			printf("GATE ");
+			break;
+		case IP_VS_CONN_F_TUNNEL:
+			printf("IPIP ");
+			break;
+		case IP_VS_CONN_F_MASQ:
+			printf("MASQ ");
+			break;
+		default:
+			/* Hu? */
+			printf("UNKNOWN ");
+			break;
+		}
+	}
+
+	if (data->bitmask & XT_IPVS_VPORTCTL) {
+		if (data->invert & XT_IPVS_VPORTCTL)
+			printf("! ");
+
+		printf("%svportctl %u ", prefix, ntohs(data->vportctl));
+	}
+}
+
+static void ipvs_mt4_print(const void *ip, const struct xt_entry_match *match,
+			   int numeric)
+{
+	const struct xt_ipvs_mtinfo *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV4, numeric, "");
+}
+
+static void ipvs_mt6_print(const void *ip, const struct xt_entry_match *match,
+			   int numeric)
+{
+	const struct xt_ipvs_mtinfo *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV6, numeric, "");
+}
+
+static void ipvs_mt4_save(const void *ip, const struct xt_entry_match *match)
+{
+	const struct xt_ipvs_mtinfo *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV4, true, "--");
+}
+
+static void ipvs_mt6_save(const void *ip, const struct xt_entry_match *match)
+{
+	const struct xt_ipvs_mtinfo *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV6, true, "--");
+}
+
+static struct xtables_match ipvs_matches_reg[] = {
+	{
+		.version       = XTABLES_VERSION,
+		.name          = "ipvs",
+		.revision      = 0,
+		.family        = NFPROTO_IPV4,
+		.size          = XT_ALIGN(sizeof(struct xt_ipvs_mtinfo)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_ipvs_mtinfo)),
+		.help          = ipvs_mt_help,
+		.parse         = ipvs_mt4_parse,
+		.final_check   = ipvs_mt_check,
+		.print         = ipvs_mt4_print,
+		.save          = ipvs_mt4_save,
+		.extra_opts    = ipvs_mt_opts,
+	},
+	{
+		.version       = XTABLES_VERSION,
+		.name          = "ipvs",
+		.revision      = 0,
+		.family        = NFPROTO_IPV6,
+		.size          = XT_ALIGN(sizeof(struct xt_ipvs_mtinfo)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_ipvs_mtinfo)),
+		.help          = ipvs_mt_help,
+		.parse         = ipvs_mt6_parse,
+		.final_check   = ipvs_mt_check,
+		.print         = ipvs_mt6_print,
+		.save          = ipvs_mt6_save,
+		.extra_opts    = ipvs_mt_opts,
+	},
+};
+
+void _init(void)
+{
+	xtables_register_matches(ipvs_matches_reg,
+				 ARRAY_SIZE(ipvs_matches_reg));
+}
Index: iptables/extensions/libxt_ipvs.man
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ iptables/extensions/libxt_ipvs.man	2010-07-21 09:44:02.000000000 +0900
@@ -0,0 +1,24 @@
+Match IPVS connection properties.
+.TP
+[\fB!\fR] \fB\-\-ipvs\fP
+packet belongs to an IPVS connection
+.TP
+Any of the following options implies \-\-ipvs (even negated)
+.TP
+[\fB!\fR] \fB\-\-vproto\fP \fIprotocol\fP
+VIP protocol to match; by number or name, e.g. "tcp"
+.TP
+[\fB!\fR] \fB\-\-vaddr\fP \fIaddress\fP[\fB/\fP\fImask\fP]
+VIP address to match
+.TP
+[\fB!\fR] \fB\-\-vport\fP \fIport\fP
+VIP port to match; by number or name, e.g. "http"
+.TP
+\fB\-\-vdir\fP {\fBORIGINAL\fP|\fBREPLY\fP}
+flow direction of packet
+.TP
+[\fB!\fR] \fB\-\-vmethod\fP {\fBGATE\fP|\fBIPIP\fP|\fBMASQ\fP}
+IPVS forwarding method used
+.TP
+[\fB!\fR] \fB\-\-vportctl\fP \fIport\fP
+VIP port of the controlling connection to match, e.g. 21 for FTP
Index: iptables/include/linux/netfilter/xt_ipvs.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ iptables/include/linux/netfilter/xt_ipvs.h	2010-07-21 10:05:47.000000000 +0900
@@ -0,0 +1,25 @@
+#ifndef _XT_IPVS_H
+#define _XT_IPVS_H 1
+
+#define XT_IPVS_IPVS_PROPERTY	(1 << 0) /* all other options imply this one */
+#define XT_IPVS_PROTO		(1 << 1)
+#define XT_IPVS_VADDR		(1 << 2)
+#define XT_IPVS_VPORT		(1 << 3)
+#define XT_IPVS_DIR		(1 << 4)
+#define XT_IPVS_METHOD		(1 << 5)
+#define XT_IPVS_VPORTCTL	(1 << 6)
+#define XT_IPVS_MASK		((1 << 7) - 1)
+#define XT_IPVS_ONCE_MASK	(XT_IPVS_MASK & ~XT_IPVS_IPVS_PROPERTY)
+
+struct xt_ipvs_mtinfo {
+	union nf_inet_addr	vaddr, vmask;
+	__be16			vport;
+	__u8			l4proto;
+	__u8			fwd_method;
+	__be16			vportctl;
+
+	__u8			invert;
+	__u8			bitmask;
+};
+
+#endif /* _XT_IPVS_H */
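
A minimal standalone sketch of the "multiple use" check that ipvs_mt_parse()
performs with XT_IPVS_ONCE_MASK: every option except --ipvs may be given only
once, while --ipvs itself may repeat because its bit is masked out. The bit
values are copied from xt_ipvs.h above; sketch_note_option() is a
hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Same bit values as the XT_IPVS_* defines in xt_ipvs.h. */
#define XT_IPVS_IPVS_PROPERTY	(1 << 0)
#define XT_IPVS_PROTO		(1 << 1)
#define XT_IPVS_VADDR		(1 << 2)
#define XT_IPVS_MASK		((1 << 7) - 1)
#define XT_IPVS_ONCE_MASK	(XT_IPVS_MASK & ~XT_IPVS_IPVS_PROPERTY)

/*
 * Hypothetical helper: record option bit 'op' in *flags.
 * Returns 0 on success, or -1 if a once-only option is repeated.
 */
int sketch_note_option(unsigned int *flags, unsigned int op)
{
	assert(flags != NULL);

	/* Nonzero only if op was already seen and is not the --ipvs bit. */
	if (*flags & op & XT_IPVS_ONCE_MASK)
		return -1;

	*flags |= op;
	return 0;
}
```

In the patch itself the failing case is reported via xtables_error() rather
than a return value; the return code here only keeps the sketch testable.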

* Re: linux-next: manual merge of the net tree with the net-current tree
From: David Miller @ 2010-07-21  1:27 UTC (permalink / raw)
  To: joe; +Cc: sfr, netdev, linux-next, linux-kernel, jdike, mst
In-Reply-To: <1279593240.19374.2.camel@Joe-Laptop.home>

From: Joe Perches <joe@perches.com>
Date: Mon, 19 Jul 2010 19:34:00 -0700

> On Tue, 2010-07-20 at 12:20 +1000, Stephen Rothwell wrote:
>> I fixed it up (see below) and can carry the fix as necessary.
> @@@ -527,15 -527,12 +527,14 @@@ static long vhost_net_set_backend(struc
>   
>         /* start polling new socket */
>         oldsock = vq->private_data;
> -       if (sock == oldsock)
> -               goto done;
> +       if (sock != oldsock){
> 
> Trivial: missing space before open brace in commit
> dd1f4078f0d2de74a308f00a2dffbd550cfba59f

Thanks guys, I'm taking care of this as I merge net-2.6 into
net-next-2.6

* Re: [PATCH] LSM: Add post accept() hook.
From: Tetsuo Handa @ 2010-07-21  2:00 UTC (permalink / raw)
  To: paul.moore
  Cc: davem, eric.dumazet, jmorris, sam, serge, netdev,
	linux-security-module
In-Reply-To: <201007201552.29539.paul.moore@hp.com>

Paul Moore wrote:
> On Monday, July 19, 2010 09:36:31 pm Tetsuo Handa wrote:
> > One is for dropping connections from unwanted hosts. Administrators define
> > policy before enabling enforcing mode (the mode in which connections are
> > dropped if the operation was not granted by policy). Administrators specify
> > acceptable hosts (i.e. hosts which this host needs to communicate with)
> > and unacceptable hosts (i.e. hosts which this host need not communicate
> > with).
> 
> You can enforce per-host access controls without the need for a post-accept()
> hook, e.g. security_sock_rcv_skb() and the netfilter hooks
> (NF_INET_POST_ROUTING, NF_INET_FORWARD, NF_INET_LOCAL_OUT).  Or are you 
> interested in controlling which hosts an _application_ can communicate with?

I'm interested in controlling which ports on which hosts a _process_ can
communicate with. In TOMOYO's terms: "processes belonging to which TOMOYO
domain can communicate with which ports on which hosts".

TOMOYO's rules are

  Processes that belong to FOO domain can open /etc/fstab for reading.
     ( allow_read /etc/fstab )

  Processes that belong to FOO domain can create /tmp/file with mode 0600.
     ( allow_create /tmp/file 0600 )

  Processes that belong to FOO domain can connect to port 80 on host
  10.20.30.40 using TCP protocol.
     ( allow_network TCP connect 10.20.30.40 80 )

and so on. But currently,

  Processes that belong to FOO domain can accept TCP connections from port 1024
  on host 10.20.30.40.
     ( allow_network TCP accept 10.20.30.40 1024 )

  Processes that belong to FOO domain can receive UDP messages from port 65535
  on host 100.200.10.20.
     ( allow_network UDP connect 100.200.10.20 65535 )

are impossible.
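
A minimal sketch of the kind of check such a rule implies, assuming a
per-domain table of (protocol, operation, peer address, peer port) entries.
This is not TOMOYO's implementation: the sketch_* types, the SKETCH_IP()
macro and the exact-match policy are all hypothetical. The point it
illustrates is that the lookup needs the peer's address and port together
with the rule table of the process that actually receives the connection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_proto { SKETCH_TCP, SKETCH_UDP };
enum sketch_op    { SKETCH_CONNECT, SKETCH_ACCEPT };

/* One hypothetical "allow_network" entry of a domain. */
struct sketch_rule {
	enum sketch_proto proto;
	enum sketch_op    op;
	uint32_t          addr;	/* IPv4 address in host byte order */
	uint16_t          port;
};

/* Build a host-order IPv4 address from four octets. */
#define SKETCH_IP(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Returns 1 if some rule of the domain matches exactly, otherwise 0. */
int sketch_allowed(const struct sketch_rule *rules, size_t nrules,
		   enum sketch_proto proto, enum sketch_op op,
		   uint32_t addr, uint16_t port)
{
	size_t i;

	assert(rules != NULL || nrules == 0);

	for (i = 0; i < nrules; i++) {
		if (rules[i].proto == proto && rules[i].op == op &&
		    rules[i].addr == addr && rules[i].port == port)
			return 1;
	}
	return 0;
}
```

With a one-entry table for "allow_network TCP accept 10.20.30.40 1024", a
peer at 10.20.30.40 port 1024 is allowed and any other peer is refused.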

Regarding outgoing connections/datagrams, we can specify address/port
parameters from the point of view of the _process_ that actually sends requests.
But regarding incoming connections/datagrams, we cannot specify address/port
parameters from the point of view of the _process_ that actually receives
requests.

We can enforce per-host access controls using iptables.
But we can't use iptables for controlling address/port parameters for incoming
connections/datagrams because the process that actually receives requests
(ServerApp2 in the example below) is not always the same as the process that
created the socket (ServerApp1 in the example below).

> > Dropping connections would happen if some process was hijacked and the
> > process attempted to communicate with other processes using TCP
> > connections. But dropping connections should not happen under normal
> > circumstances.
> 
> It doesn't matter if dropping connections is normal or not, what matters is 
> that it can happen.
> 
> > The other is for updating a process's state variable upon accept().
> > The LKM version of TOMOYO has a per-task_struct variable that is used for
> > implementing stateful permissions. (As of now, not implemented for the LSM
> > version of TOMOYO.)
> 
> I'm open to re-introducing a post-accept() hook that does not have a return 
> value, in other words, a hook that can only be used to update LSM state and 
> not affect the connection.  Although I do think you could probably achieve the 
> same thing using some of the existing LSM hooks (look at how SELinux updates 
> its state upon accept()) but that is something you would have to look at and
> see if it works for TOMOYO.

I can't figure out why the hook must not affect the connection.
Could you clarify using the players below?

Server1 and Client1 are hosts which are connected via a TCP/IP network.
ServerApp1 and ServerApp2 are applications running on Server1 which might call
socket(), bind(), listen(), accept(), send(), recv(), shutdown(), close() and
execve().
ClientApp1 and ClientApp2 are applications running on Client1 which might call
socket(), connect(), send(), recv(), shutdown(), close().
Router1 and Router2 are routers which exist between Server1 and Client1.

  +-------+   +-------+   +-------+   +-------+
  |Server1|---|Router1|---|Router2|---|Client1|
  +-------+   +-------+   +-------+   +-------+

Event sequences:

Server1                       Client1

  ServerApp1 creates a socket using socket().

  ServerApp1 binds to an address using bind().

  ServerApp1 listens to the address using listen().

                                ClientApp1 creates a socket using socket().

                                ClientApp1 issues connect() request.

                                  Sends SYN.

    Receives SYN.

    Sends SYN/ACK.

                                  Receives SYN/ACK.

                                  Sends ACK.

    Receives ACK.

                                ClientApp1 issues send() request.

                                  Sends data.

    Receives data.

    Sends ACK.

                                  Receives ACK.

                                ClientApp1 issues send() request.

                                  Sends data.

    Receives data.

    Sends ACK.

                                  Receives ACK.

  ServerApp1 calls execve("ServerApp2").

  ServerApp2 issues accept() request.

    security_socket_accept() is called.

    sock->ops->accept() is called.

    security_socket_post_accept() is called. (*3)

    newsock->ops->getname() is called. (*1)

    move_addr_to_user() is called. (*2)

    fd_install() is called.

  ServerApp2 issues some requests.

    Some LSM hooks will be called.




*1: This may fail, and the connection is discarded if it does.
    Thus, newsock->ops->getname() affects the connection.
    This is not the fault of ServerApp2. It may be the fault of ClientApp1,
    Router1 or Router2, but discarding an already established connection is
    justified.

*2: This may fail, and the connection is discarded if it does.
    Thus, move_addr_to_user() affects the connection.
    Is this the fault of ServerApp2?
    If the upeer_sockaddr supplied by ServerApp2 was bad, this is the fault of
    ServerApp2. Thus, discarding an already established connection is justified.
    If the upeer_sockaddr supplied by ServerApp2 was good but physical RAM was
    not yet assigned to it, the write to upeer_sockaddr may invoke the OOM
    killer, which may choose and kill ServerApp2. This is not the fault of
    ServerApp2, yet discarding an already established connection is justified.

*3: newsock->ops->getname() and move_addr_to_user() already affect the
    connection. They discard already established connections even if the cause
    is not ServerApp2's fault. Why can't security_socket_post_accept() affecting
    the connection be justified?

Router1 and Router2 can inject RST into already established connections
at any time (if they are IDS/IPS, broken, or malicious).
How does security_socket_post_accept() returning an error differ from these
routers injecting RST?
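
To make the sequence above concrete, here is a small user-space model of the
accept() path as listed in the event sequence. It is only a sketch of the
argument, not kernel code: the enum, the struct and model_accept() are
hypothetical names, and the model simply assumes that any failure after
sock->ops->accept() has produced newsock discards the connection.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the accept() steps listed in the sequence. */
enum accept_step {
	STEP_SECURITY_ACCEPT,	/* security_socket_accept() */
	STEP_OPS_ACCEPT,	/* sock->ops->accept() */
	STEP_POST_ACCEPT,	/* security_socket_post_accept() (*3) */
	STEP_GETNAME,		/* newsock->ops->getname() (*1) */
	STEP_MOVE_ADDR,		/* move_addr_to_user() (*2) */
	STEP_FD_INSTALL,	/* fd_install(), listed to complete the sequence */
	STEP_COUNT
};

struct accept_result {
	int err;			/* 0 on success, negative on failure */
	bool connection_dropped;	/* an established connection was discarded */
};

/*
 * Walk the steps in order. fail_at names the step that returns an error;
 * passing STEP_COUNT means no step fails.
 */
static struct accept_result model_accept(enum accept_step fail_at)
{
	struct accept_result res = { 0, false };
	bool have_newsock = false;

	assert(fail_at <= STEP_COUNT);

	for (int step = 0; step < STEP_COUNT; step++) {
		if (step == (int)fail_at) {
			res.err = -1;
			/* Model assumption: once newsock exists, a failure releases it. */
			res.connection_dropped = have_newsock;
			return res;
		}
		if (step == STEP_OPS_ACCEPT)
			have_newsock = true;
	}
	return res;
}
```

In this model, a failure at STEP_POST_ACCEPT, STEP_GETNAME or STEP_MOVE_ADDR
gives the same result (error returned, connection dropped), which is the
point of question *3.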

Regards.


* linux-next: manual merge of the wireless tree with the net tree
From: Stephen Rothwell @ 2010-07-21  2:04 UTC (permalink / raw)
  To: John W. Linville
  Cc: linux-next, linux-kernel, Eric Dumazet, Wey-Yi Guy, David Miller,
	netdev

Hi John,

Today's linux-next merge of the wireless tree got a conflict in
drivers/net/wireless/iwlwifi/iwl-commands.h between commit
ba2d3587912f82d1ab4367975b1df460db60fb1e ("drivers/net: use __packed
annotation") from the net tree and commit
7c094c5cc4d28062abf0d33ca022dbea6c522558 ("iwlwifi: additional statistic
debug counter") from the wireless tree.

I fixed it up (see below) and can carry the fix as necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc drivers/net/wireless/iwlwifi/iwl-commands.h
index 8d2db9d,83247f7..0000000
--- a/drivers/net/wireless/iwlwifi/iwl-commands.h
+++ b/drivers/net/wireless/iwlwifi/iwl-commands.h
@@@ -3035,8 -3035,9 +3035,9 @@@ struct iwl39_statistics_tx 
  struct statistics_dbg {
  	__le32 burst_check;
  	__le32 burst_count;
- 	__le32 reserved[4];
+ 	__le32 wait_for_silence_timeout_cnt;
+ 	__le32 reserved[3];
 -} __attribute__ ((packed));
 +} __packed;
  
  struct iwl39_statistics_div {
  	__le32 tx_on_a;
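
As a sanity check on the resolution above, the following user-space sketch
suggests that the merged struct keeps the same size as before: one reserved
word becomes the new counter. The PACKED macro and the le32 typedef are
stand-ins introduced here for the kernel's __packed and __le32; the struct
names with _old/_new suffixes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's __packed (GCC/Clang attribute). */
#define PACKED __attribute__((packed))

typedef uint32_t le32;	/* stand-in for the kernel's __le32 */

/* Layout before the wireless tree commit. */
struct statistics_dbg_old {
	le32 burst_check;
	le32 burst_count;
	le32 reserved[4];
} PACKED;

/* Layout after the merge: new counter, one fewer reserved word. */
struct statistics_dbg_new {
	le32 burst_check;
	le32 burst_count;
	le32 wait_for_silence_timeout_cnt;
	le32 reserved[3];
} PACKED;

/* The new counter reuses the first reserved slot, so the size is unchanged. */
_Static_assert(sizeof(struct statistics_dbg_old) ==
	       sizeof(struct statistics_dbg_new),
	       "struct size must not change");
_Static_assert(offsetof(struct statistics_dbg_new,
			wait_for_silence_timeout_cnt) ==
	       offsetof(struct statistics_dbg_old, reserved),
	       "new counter must reuse the first reserved word");
```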


* linux-next: manual merge of the net tree with Linus' tree
From: Stephen Rothwell @ 2010-07-21  2:04 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: linux-next, linux-kernel, Herbert Xu


Hi all,

Today's linux-next merge of the net tree got a conflict in
net/bridge/br_device.c between commit
573201f36fd9c7c6d5218cdcd9948cee700b277d ("bridge: Partially disable
netpoll support") from Linus' tree and commit
91d2c34a4eed32876ca333b0ca44f3bc56645805 ("bridge: Fix netpoll support")
from the net tree.

The net tree commit seems to be a fuller fix, so I used that.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/



* Re: [RFC PATCH] dst: check if dst is freed in dst_check()
From: Eric Dumazet @ 2010-07-21  2:28 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev
In-Reply-To: <4C457120.9070105@6wind.com>

On Tuesday, 20 July 2010 at 11:49 +0200, Nicolas Dichtel wrote:
> Hi,
> 
> I probably missed something, but I cannot find where obsolete field is checked 
> when dst_check() is called. If dst->obsolete is > 1, dst cannot be used!
> 
> Attached is a proposal to fix this issue.
> 
> 

> diff --git a/include/net/dst.h b/include/net/dst.h
> index 81d1413..7bf4f9a 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -319,6 +319,8 @@ static inline int dst_input(struct sk_buff *skb)
>  
>  static inline struct dst_entry *dst_check(struct dst_entry *dst, u32 cookie)
>  {
> +	if (dst->obsolete > 1)
> +		return NULL;
>  	if (dst->obsolete)
>  		dst = dst->ops->check(dst, cookie);
>  	return dst;

I believe this is redundant and not needed.

In what case do you think this matters?

To my knowledge dst_check() is only used by net/xfrm/xfrm_policy.c

And xfrm_dst_check() does the necessary checks.

static struct dst_entry *xfrm_dst_check(struct dst_entry *dst, u32 cookie)
{
        /* Code (such as __xfrm4_bundle_create()) sets dst->obsolete
         * to "-1" to force all XFRM destinations to get validated by
         * dst_ops->check on every use.  We do this because when a
         * normal route referenced by an XFRM dst is obsoleted we do
         * not go looking around for all parent referencing XFRM dsts
         * so that we can invalidate them.  It is just too much work.
         * Instead we make the checks here on every use.  For example:
         *
         *      XFRM dst A --> IPv4 dst X
         *
         * X is the "xdst->route" of A (X is also the "dst->path" of A
         * in this example).  If X is marked obsolete, "A" will not
         * notice.  That's what we are validating here via the
         * stale_bundle() check.
         *
         * When a policy's bundle is pruned, we dst_free() the XFRM
         * dst which causes its ->obsolete field to be set to a
         * positive non-zero integer.  If an XFRM dst has been pruned
         * like this, we want to force a new route lookup.
         */
        if (dst->obsolete < 0 && !stale_bundle(dst))
                return dst;

        return NULL;
}
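
To illustrate how the two checks combine, here is a minimal user-space model.
It is a sketch only: the *_model names are hypothetical, the cookie argument
is dropped, and a plain "stale" flag stands in for stale_bundle().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dst_model;

struct dst_ops_model {
	struct dst_model *(*check)(struct dst_model *dst);
};

struct dst_model {
	int obsolete;	/* 0: no check, -1: check on every use, > 0: freed */
	bool stale;	/* stand-in for stale_bundle() */
	const struct dst_ops_model *ops;
};

/* Mirrors the condition in xfrm_dst_check() quoted above. */
static struct dst_model *xfrm_check_model(struct dst_model *dst)
{
	if (dst->obsolete < 0 && !dst->stale)
		return dst;
	return NULL;
}

static const struct dst_ops_model xfrm_ops_model = {
	.check = xfrm_check_model,
};

/* Mirrors dst_check() without the proposed "obsolete > 1" test. */
static struct dst_model *dst_check_model(struct dst_model *dst)
{
	assert(dst != NULL);
	if (dst->obsolete) {
		assert(dst->ops != NULL && dst->ops->check != NULL);
		dst = dst->ops->check(dst);
	}
	return dst;
}

/* Convenience wrapper: is a dst with this state still usable? */
static bool dst_usable_model(int obsolete, bool stale)
{
	struct dst_model d = { obsolete, stale, &xfrm_ops_model };

	return dst_check_model(&d) != NULL;
}
```

With these definitions, dst_usable_model(2, false) is false: a positive
->obsolete value is already rejected by the ops->check callback, which is
why the extra test looks redundant for the xfrm case.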




* Re: linux-next: manual merge of the net tree with Linus' tree
From: Herbert Xu @ 2010-07-21  2:31 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: David Miller, netdev, linux-next, linux-kernel
In-Reply-To: <20100721120448.31e325fd.sfr@canb.auug.org.au>

On Wed, Jul 21, 2010 at 12:04:48PM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the net tree got a conflict in
> net/bridge/br_device.c between commit
> 573201f36fd9c7c6d5218cdcd9948cee700b277d ("bridge: Partially disable
> netpoll support") from Linus' tree and commit
> 91d2c34a4eed32876ca333b0ca44f3bc56645805 ("bridge: Fix netpoll support")
> from the net tree.
> 
> The net tree commit seems to be a fuller fix, so I used that.

Yeah, 573201f is just the temporary fix for 2.6.35.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH net-next-2.6] ixgbe: fix ethtool stats
From: Eric Dumazet @ 2010-07-21  2:38 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: David Miller, Jesse Brandeburg, PJ Waskiewicz, netdev
In-Reply-To: <AANLkTik7JrI3HrtvQRTgrRU30fFb2lrgGxUJsWXedBL0@mail.gmail.com>

On Tuesday, 20 July 2010 at 15:06 -0700, Jeff Kirsher wrote:
> On Tue, Jul 20, 2010 at 10:28, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Note : I am currently unable to test following patch, could you please
> > Intel guys test it and Ack (or Nack) it ?
> >
> > Thanks !
> >
> > [PATCH net-next-2.6] ixgbe: fix ethtool stats
> >
> > In latest changes about 64bit stats on 32bit arches,
> > [commit 28172739f0a276eb8 (net: fix 64 bit counters on 32 bit arches)],
> > I missed ixgbe uses a bit of magic in its ixgbe_gstrings_stats
> > definition.
> >
> > IXGBE_NETDEV_STAT() must now assume offsets relative to
> > rtnl_link_stats64, not relative to dev->stats.
> >
> > As a bonus, we also get 64bit stats on ethtool -S
> >
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > ---
> >  drivers/net/ixgbe/ixgbe_ethtool.c |   42 ++++++++++++++--------------
> >  1 file changed, 21 insertions(+), 21 deletions(-)
> >
> 
> Thanks Eric, I have added it to my queue.
> 

Thanks !

By the way, my ixgbe configuration doesn't like net-next-2.6 at all.
(No link is established in my fiber loop configuration.)

Current linux-2.6 git runs correctly, with link at 10Gb, so there is a
regression somewhere.

As this machine is quite slow (I no longer have my Nehalem dev
machine and had to use an old setup), a bisection would take one month...
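
On the offsets mentioned in the quoted patch description: the idea of a
stats table keyed by offsets into a 64-bit stats structure can be sketched
roughly as below. This is not the ixgbe code; struct stats64_model is a
trimmed, hypothetical stand-in for rtnl_link_stats64, and
NETDEV_STAT_MODEL(), stat_table and read_stat_model() are names made up for
the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Trimmed, hypothetical stand-in for struct rtnl_link_stats64. */
struct stats64_model {
	uint64_t rx_packets;
	uint64_t tx_packets;
	uint64_t rx_bytes;
	uint64_t tx_bytes;
};

struct stat_desc_model {
	const char *name;
	size_t size;	/* size of the field in bytes */
	size_t offset;	/* offset relative to struct stats64_model */
};

/* Record name, size and offset of one field of the 64-bit stats struct. */
#define NETDEV_STAT_MODEL(field) {				\
	#field,							\
	sizeof(((struct stats64_model *)0)->field),		\
	offsetof(struct stats64_model, field)			\
}

static const struct stat_desc_model stat_table[] = {
	NETDEV_STAT_MODEL(rx_packets),
	NETDEV_STAT_MODEL(tx_packets),
	NETDEV_STAT_MODEL(rx_bytes),
	NETDEV_STAT_MODEL(tx_bytes),
};

/* Read one counter through its table entry; all entries are 64-bit here. */
static uint64_t read_stat_model(const struct stats64_model *stats,
				const struct stat_desc_model *desc)
{
	uint64_t value = 0;

	assert(desc->size == sizeof(value));
	memcpy(&value, (const char *)stats + desc->offset, sizeof(value));
	return value;
}
```

If the table were built with offsets into a different structure than the
one passed to the reader, each lookup would return the wrong field, which
is the kind of mismatch the patch description refers to.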





