Netdev List
* [ofa-general] [RFC PATCH] IPoIB: improve IPv4/IPv6 to IB mcast mapping functions
From: Roland Dreier @ 2008-01-04 22:05 UTC
  To: netdev; +Cc: general
In-Reply-To: <20071210203841.GJ30090@obsidianresearch.com>

Any objection to merging the following for 2.6.25?

[Rolf -- I think it makes more sense to delete the overwriting of the
P_Key in ipoib_multicast.c in this patch rather than later in the
series; do you agree?]

Thanks,
  Roland


From: Rolf Manderscheid <rvm@obsidianresearch.com>

An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs.  The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised.  This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.

The next step will be to add a way to configure the scope for an IPoIB
interface.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
---
 drivers/infiniband/core/cma.c                  |    4 +---
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    4 ----
 include/net/if_inet6.h                         |   11 +++++++----
 include/net/ip.h                               |   10 ++++++----
 net/ipv4/arp.c                                 |    2 +-
 net/ipv6/ndisc.c                               |    2 +-
 6 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 312ec74..982836e 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2610,11 +2610,9 @@ static void cma_set_mgid(struct rdma_id_private *id_priv,
 		/* IPv6 address is an SA assigned MGID. */
 		memcpy(mgid, &sin6->sin6_addr, sizeof *mgid);
 	} else {
-		ip_ib_mc_map(sin->sin_addr.s_addr, mc_map);
+		ip_ib_mc_map(sin->sin_addr.s_addr, dev_addr->broadcast, mc_map);
 		if (id_priv->id.ps == RDMA_PS_UDP)
 			mc_map[7] = 0x01;	/* Use RDMA CM signature */
-		mc_map[8] = ib_addr_get_pkey(dev_addr) >> 8;
-		mc_map[9] = (unsigned char) ib_addr_get_pkey(dev_addr);
 		*mgid = *(union ib_gid *) (mc_map + 4);
 	}
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 858ada1..2628339 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -788,10 +788,6 @@ void ipoib_mcast_restart_task(struct work_struct *work)
 
 		memcpy(mgid.raw, mclist->dmi_addr + 4, sizeof mgid);
 
-		/* Add in the P_Key */
-		mgid.raw[4] = (priv->pkey >> 8) & 0xff;
-		mgid.raw[5] = priv->pkey & 0xff;
-
 		mcast = __ipoib_mcast_find(dev, &mgid);
 		if (!mcast || test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) {
 			struct ipoib_mcast *nmcast;
diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index 448eccb..b24508a 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -269,18 +269,21 @@ static inline void ipv6_arcnet_mc_map(const struct in6_addr *addr, char *buf)
 	buf[0] = 0x00;
 }
 
-static inline void ipv6_ib_mc_map(struct in6_addr *addr, char *buf)
+static inline void ipv6_ib_mc_map(const struct in6_addr *addr,
+				  const unsigned char *broadcast, char *buf)
 {
+	unsigned char scope = broadcast[5] & 0xF;
+
 	buf[0]  = 0;		/* Reserved */
 	buf[1]  = 0xff;		/* Multicast QPN */
 	buf[2]  = 0xff;
 	buf[3]  = 0xff;
 	buf[4]  = 0xff;
-	buf[5]  = 0x12;		/* link local scope */
+	buf[5]  = 0x10 | scope;	/* scope from broadcast address */
 	buf[6]  = 0x60;		/* IPv6 signature */
 	buf[7]  = 0x1b;
-	buf[8]  = 0;		/* P_Key */
-	buf[9]  = 0;
+	buf[8]  = broadcast[8];	/* P_Key */
+	buf[9]  = broadcast[9];
 	memcpy(buf + 10, addr->s6_addr + 6, 10);
 }
 #endif
diff --git a/include/net/ip.h b/include/net/ip.h
index 840dd91..50c8889 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -266,20 +266,22 @@ static inline void ip_eth_mc_map(__be32 naddr, char *buf)
  *	Leave P_Key as 0 to be filled in by driver.
  */
 
-static inline void ip_ib_mc_map(__be32 naddr, char *buf)
+static inline void ip_ib_mc_map(__be32 naddr, const unsigned char *broadcast, char *buf)
 {
 	__u32 addr;
+	unsigned char scope = broadcast[5] & 0xF;
+
 	buf[0]  = 0;		/* Reserved */
 	buf[1]  = 0xff;		/* Multicast QPN */
 	buf[2]  = 0xff;
 	buf[3]  = 0xff;
 	addr    = ntohl(naddr);
 	buf[4]  = 0xff;
-	buf[5]  = 0x12;		/* link local scope */
+	buf[5]  = 0x10 | scope;	/* scope from broadcast address */
 	buf[6]  = 0x40;		/* IPv4 signature */
 	buf[7]  = 0x1b;
-	buf[8]  = 0;		/* P_Key */
-	buf[9]  = 0;
+	buf[8]  = broadcast[8];		/* P_Key */
+	buf[9]  = broadcast[9];
 	buf[10] = 0;
 	buf[11] = 0;
 	buf[12] = 0;
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 08174a2..54a76b8 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -211,7 +211,7 @@ int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir)
 		ip_tr_mc_map(addr, haddr);
 		return 0;
 	case ARPHRD_INFINIBAND:
-		ip_ib_mc_map(addr, haddr);
+		ip_ib_mc_map(addr, dev->broadcast, haddr);
 		return 0;
 	default:
 		if (dir) {
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 777ed73..85947ea 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -337,7 +337,7 @@ int ndisc_mc_map(struct in6_addr *addr, char *buf, struct net_device *dev, int d
 		ipv6_arcnet_mc_map(addr, buf);
 		return 0;
 	case ARPHRD_INFINIBAND:
-		ipv6_ib_mc_map(addr, buf);
+		ipv6_ib_mc_map(addr, dev->broadcast, buf);
 		return 0;
 	default:
 		if (dir) {
-- 
1.5.4.rc2
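
As a rough illustration of the mapping this patch describes, the following
user-space sketch builds the 20-byte IPoIB hardware address for an IPv4
multicast group, taking the scope nibble and the P_Key from the broadcast
address. The name sketch_ip_ib_mc_map, the host-byte-order argument, and the
placement of the low 28 bits of the group address in the last four bytes
(the part of ip_ib_mc_map() that is not visible in the hunk above) are
assumptions for illustration, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_INFINIBAND_ALEN 20	/* IPoIB hardware address length */

/*
 * User-space sketch (hypothetical name).  host_addr is the IPv4 group
 * address in host byte order; broadcast is the interface's 20-byte
 * link-level broadcast address.
 */
static void sketch_ip_ib_mc_map(uint32_t host_addr,
				const unsigned char *broadcast,
				unsigned char *buf)
{
	unsigned char scope;

	assert(broadcast != NULL && buf != NULL);
	scope = broadcast[5] & 0xF;

	memset(buf, 0, SKETCH_INFINIBAND_ALEN);	/* buf[0] is reserved */
	buf[1] = 0xff;			/* Multicast QPN */
	buf[2] = 0xff;
	buf[3] = 0xff;
	buf[4] = 0xff;
	buf[5] = 0x10 | scope;		/* scope from broadcast address */
	buf[6] = 0x40;			/* IPv4 signature */
	buf[7] = 0x1b;
	buf[8] = broadcast[8];		/* P_Key from broadcast address */
	buf[9] = broadcast[9];

	/* Assumption: low 28 bits of the group go in the last four bytes. */
	buf[16] = (host_addr >> 24) & 0x0f;
	buf[17] = (host_addr >> 16) & 0xff;
	buf[18] = (host_addr >> 8) & 0xff;
	buf[19] = host_addr & 0xff;
}
```

With a broadcast address whose byte 5 is 0x15 and whose P_Key bytes are
0x80 0x01, mapping 224.0.0.1 yields an address whose bytes 5, 8 and 9 are
0x15, 0x80 and 0x01, so neither the scope nor the P_Key needs patching up
afterwards by the caller.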


* Re: network interface state
From: Milan Kocian @ 2008-01-04 20:58 UTC
  To: David Miller; +Cc: netdev, drepper
In-Reply-To: <20071114.153103.230463871.davem@davemloft.net>


On Wed, 2007-11-14 at 15:31 -0800, David Miller wrote:
> From: Ulrich Drepper <drepper@redhat.com>
> Date: Wed, 14 Nov 2007 12:59:52 -0800
> 
> > Just FYI, with the current getaddrinfo code it is even more critical to
> > get to a point where I can cache network interface information and query
> > the kernel whether it changed.  We now have to read the RTM_GETADDR
> > tables for every lookup.  It was more limited with the old, incomplete
> > implementation.
> > 
> > Even if it's something as simple as a RTM_SEQUENCE request which returns
> > a number that is bumped at every interface change.
> 
> This sounds like a useful feature.  Essentially you want a generation
> ID that increments every time a configuration change is made?
> 
> Most daemons handle this by listening for events on the netlink
> socket, but I understand how that might not be practical for
> glibc.
> 
> > Related: I need to know about the device type (the ARPHRD_* values) to
> > determine whether a device is for a native transport or a tunnel.  What
> > I currently do is:
> > 
> > - at the beginning I get information about all interfaces using
> > RTM_GETADDR
> > 
> > - then later I have to find the device type by
> > 
> >   + reading the RTM_GETLINK data to get to the device name
> > 
> >   + then using the name and ioctl(SIOCGIFHWADDR) I get the device type
> > 
> > 
> > It would be so much nicer if the device type would be part of the
> > RTM_GETADDR data, or at least the RTM_GETLINK data.
> 
> It's part of the link information.  Look in ifinfomsg->ifi_type.
> 
> In general be suspicious if it seems netlink isn't providing
> the same information available via the old ioctls :-)

Sorry for the late and slightly off-topic question: is there any simple
way to differentiate virtual network devices (e.g. vlans, bridges) from
real devices?

They have the same ifinfomsg->ifi_type as real devices (ARPHRD_ETHER). I
know how to identify vlans via the IFLA_LINK attribute, but I have not
found a way to tell bridges apart from real devices.
Thanks for any answer.

regards,

milan kocian





* Re: NAPI poll behavior in various Intel drivers
From: James Chapman @ 2008-01-04 20:10 UTC
  To: David Miller; +Cc: netdev, auke-jan.h.kok
In-Reply-To: <20080104.034036.160194618.davem@davemloft.net>

David Miller wrote:
> Several Intel networking drivers such as e1000, e1000e
> and e100 all do this to exit NAPI polling:
> 
> 	if ((!tx_cleaned && (work_done == 0)) ||
>  	   !netif_running(poll_dev)) {
> 
> I tried to make this use in the NAPI rework:
> 
> 	if ((!tx_cleaned && (work_done < budget)) ||
>  	   !netif_running(poll_dev)) {
> 
> But that got reverted by:
> 
> 	commit f7bbb9098315d712351aba7861a8c9fcf6bf0213
> 
> 	e1000: Fix NAPI state bug when Rx complete
>     
> 	Don't exit polling when we have not yet used our budget, this causes
> 	the NAPI system to end up with a messed up poll list.
>     
> 	Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
> 	Signed-off-by: Jeff Garzik <jeff@garzik.org>
> 
> I definitely would not have signed off on that :-)
> 
> That "tx_cleaned" thing clouds the logic in all of these drivers'
> poll routines.
> 
> The one necessary precondition is that when work_done < budget
> we exit polling and return a value less than budget.
> 
> If the ->poll() returns a value less than budget, net_rx_action()
> assumes that the device has been removed from the poll() list.
> 
> 		/* Drivers must not modify the NAPI state if they
> 		 * consume the entire weight.  In such cases this code
> 		 * still "owns" the NAPI instance and therefore can
> 		 * move the instance around on the list at-will.
> 		 */
> 		if (unlikely(work == weight))
> 			list_move_tail(&n->poll_list, list);
> 
> This "work_done == 0" test in these drivers is thus wrong.  It
> should be "work_done < budget" and the whole tx_cleaned thing needs to
> be removed.
> 
> It happens to work, because what happens is that we loop again and
> process the same NAPI struct again.
> 
> As a result, E1000 devices get polled TWICE every time they
> process at least one RX packet, but do not consume the whole
> quota.
> 
> I smell a performance hack, and if so this is wrong and against
> all of the principles of NAPI.  Either that or it's a workaround
> for the "!netif_running()" case.

You have a good nose, Dave. :) And you can probably partly blame me for
this scheme. I worked with the e100 driver author (Scott Feldman) 5
years ago to performance tune and stress test the driver. I found much
better packets/sec forwarding performance (packets in one e100, out
another) by staying in polled mode until no receive or transmit was done.

With the latest NAPI, this code has to change. But rather than remove
the tx_cleaned logic completely, shouldn't transmit processing be
included in the work_done accounting when a driver does transmit cleanup
processing in the poll? If not, then when an interface is transmitting
far more than receiving, it will exit polled mode on every poll.

> I noticed this while trying to work on a generic fix for the
> "->poll() does not exit when device is brought down while being
> bombed with packets" bug.

-- 
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development
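
To make the disagreement concrete, the two exit tests quoted above can be
compared side by side in a small user-space model. The function names are
made up for this sketch, and keeping the netif_running() term in the second
test is an assumption carried over from the quoted driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Exit test used by the Intel drivers, as quoted above. */
static bool sketch_old_exit(bool tx_cleaned, int work_done, bool running)
{
	return (!tx_cleaned && work_done == 0) || !running;
}

/* Exit test Dave argues for: leave polling whenever budget is left over. */
static bool sketch_new_exit(int work_done, int budget, bool running)
{
	assert(work_done >= 0 && work_done <= budget);
	return work_done < budget || !running;
}
```

With one received packet, a budget of 64 and some TX cleanup done, the old
test keeps the device in polled mode even though poll() returns a value
below the budget, which is the case where net_rx_action() assumes the
device has already left the poll list. The new test exits in that case and
stays in polled mode only when the whole budget was consumed.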



* [PATCH net-2.6.24][ATM]: [nicstar] delay irq setup until card is configured
From: chas williams - CONTRACTOR @ 2008-01-04 21:29 UTC
  To: netdev; +Cc: davem

If an interrupt occurs too soon, the driver oopses while trying to handle
a buffer-shortage condition (caused by no buffers having been allocated
yet).

commit a3322d3d34854edf27f7950efaa93e68f5f71ace
Author: Chas Williams - CONTRACTOR <chas@relax.cmf.nrl.navy.mil>
Date:   Fri Jan 4 16:27:07 2008 -0500

    [ATM]: [nicstar] delay irq setup until card is configured
    
    Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>

diff --git a/drivers/atm/nicstar.c b/drivers/atm/nicstar.c
index 14ced85..0c205b0 100644
--- a/drivers/atm/nicstar.c
+++ b/drivers/atm/nicstar.c
@@ -625,14 +625,6 @@ static int __devinit ns_init_card(int i, struct pci_dev *pcidev)
    if (mac[i] == NULL)
       nicstar_init_eprom(card->membase);
 
-   if (request_irq(pcidev->irq, &ns_irq_handler, IRQF_DISABLED | IRQF_SHARED, "nicstar", card) != 0)
-   {
-      printk("nicstar%d: can't allocate IRQ %d.\n", i, pcidev->irq);
-      error = 9;
-      ns_init_card_error(card, error);
-      return error;
-   }
-
    /* Set the VPI/VCI MSb mask to zero so we can receive OAM cells */
    writel(0x00000000, card->membase + VPM);
       
@@ -858,8 +850,6 @@ static int __devinit ns_init_card(int i, struct pci_dev *pcidev)
       card->iovpool.count++;
    }
 
-   card->intcnt = 0;
-
    /* Configure NICStAR */
    if (card->rct_size == 4096)
       ns_cfg_rctsize = NS_CFG_RCTSIZE_4096_ENTRIES;
@@ -868,6 +858,15 @@ static int __devinit ns_init_card(int i, struct pci_dev *pcidev)
 
    card->efbie = 1;
 
+   card->intcnt = 0;
+   if (request_irq(pcidev->irq, &ns_irq_handler, IRQF_DISABLED | IRQF_SHARED, "nicstar", card) != 0)
+   {
+      printk("nicstar%d: can't allocate IRQ %d.\n", i, pcidev->irq);
+      error = 9;
+      ns_init_card_error(card, error);
+      return error;
+   }
+
    /* Register device */
    card->atmdev = atm_dev_register("nicstar", &atm_ops, -1, NULL);
    if (card->atmdev == NULL)


* Re: NAPI poll behavior in various Intel drivers
From: David Miller @ 2008-01-04 21:24 UTC
  To: jchapman; +Cc: netdev, auke-jan.h.kok
In-Reply-To: <477E92B6.8010809@katalix.com>

From: James Chapman <jchapman@katalix.com>
Date: Fri, 04 Jan 2008 20:10:30 +0000

> With the latest NAPI, this code has to change. But rather than remove
> the tx_cleaned logic completely, shouldn't transmit processing be
> included in the work_done accounting when a driver does transmit cleanup
> processing in the poll?

Most other NAPI drivers don't do this, they just process all the
pending TX work unconditionally and do not account it into the NAPI
poll work.

The logic is that, like link state handling, TX work is very cheap and
not the cpu cycle eater that RX packet processing is.
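
A poll routine following that convention might be modelled as below. This
is a user-space sketch with a made-up ring structure, not code from any
driver: TX cleanup runs unconditionally and only RX packets count against
the budget.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver's ring state. */
struct sketch_ring {
	int tx_pending;	/* completed TX descriptors awaiting cleanup */
	int rx_pending;	/* received packets waiting to be processed */
};

/*
 * Clean all TX work unconditionally, then process RX up to budget.
 * The return value counts RX packets only; TX cleanup is not accounted.
 */
static int sketch_poll(struct sketch_ring *ring, int budget)
{
	int work_done = 0;

	assert(ring != NULL && budget > 0);

	ring->tx_pending = 0;	/* cheap: reclaim every completed descriptor */

	while (work_done < budget && ring->rx_pending > 0) {
		ring->rx_pending--;
		work_done++;
	}
	return work_done;
}
```

In this model an interface with 100 TX completions and 3 received packets
returns 3, so it leaves polled mode after a single pass, which is the
behaviour James is asking about for transmit-heavy interfaces.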


* Re: 2.6.24-rc6-mm1
From: Torsten Kaiser @ 2008-01-04 21:24 UTC
  To: Jarek Poplawski
  Cc: Herbert Xu, Andrew Morton, linux-kernel, Neil Brown,
	J. Bruce Fields, netdev, Tom Tucker
In-Reply-To: <64bb37e0801040721p57ff3d54wc3de00546d1d2ff1@mail.gmail.com>

On Jan 4, 2008 4:21 PM, Torsten Kaiser <just.for.lkml@googlemail.com> wrote:
> On Jan 4, 2008 2:30 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:
> > - above git-nfsd and git-net tests should be probably repeated with
> > -rc6-mm1 git versions: so vanilla rc6 plus both these -mm patches
> > only, and if bug triggers, with one reversed; btw., since in previous
> > message you mentioned that 50 packages could be not enough to trigger
> > this, these 54 above could make too little margin yet.
>
> Yes, I think I really need to redo the git-nfsd-test.
> With IOMMU_DEBUG enabled rc6-mm1 worked for 52 packages; only a second
> run of kde-packages triggered it after only 5 packages.
> I don't know what this bug hates about kdeartwork-wallpaper (triggered
> it this time) or kdeartwork-styles.

49 more (kde-)packages did work too. Still looks like it is only in -mm.

Torsten


* Re: [PATCH 1/2] LSM: Add inet_sys_snd_skb() LSM hook
From: David Miller @ 2008-01-04 21:09 UTC
  To: paul.moore; +Cc: netdev
In-Reply-To: <200801040938.27515.paul.moore@hp.com>

From: Paul Moore <paul.moore@hp.com>
Date: Fri, 4 Jan 2008 09:38:27 -0500

> Unfortunately, it's not quite that easy at present.  The only field we 
> have in the skb where we could possibly set a flag is the secmark field 
> which is already taken.

Herbert Xu added a "peeked" field in net-2.6.25 that is only used on
input while processing socket receive queues.  You could use it on
output.


* Re: e1000_clean_tx_irq: Detected Tx Unit Hang - it's bug?
From: Kok, Auke @ 2008-01-04 20:11 UTC
  To: Badalian Vyacheslav; +Cc: netdev
In-Reply-To: <477E2833.70301@bigtelecom.ru>

Badalian Vyacheslav wrote:
> Hello all.
> Sometimes in dmesg I see this:
> 
> [16121.400422] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> [16121.400426]   Tx Queue             <0>
> [16121.400427]   TDH                  <28>
> [16121.400429]   TDT                  <28>
> [16121.400430]   next_to_use          <28>
> [16121.400431]   next_to_clean        <7d>
> [16121.400433] buffer_info[next_to_clean]
> [16121.400434]   time_stamp           <17b949>
> [16121.400435]   next_to_watch        <7d>
> [16121.400437]   jiffies              <17ba57>
> [16121.400438]   next_to_watch.status <1>

Might be a bug. What kernel version are you using? It appears the tx handler was
just sitting idle and this message might be bogus, which is one of the things that
we fixed recently.

Auke
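
The "sitting idle" reading can be checked against the numbers in the report
with two small helpers. This is only a sketch with made-up names;
interpreting TDH == TDT as "the hardware has nothing left to fetch" is the
usual head/tail ring convention and is assumed here rather than taken from
the driver source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hardware has nothing left to fetch once head has caught up with tail. */
static bool sketch_tx_hw_idle(uint32_t tdh, uint32_t tdt)
{
	return tdh == tdt;
}

/*
 * Age in jiffies of the oldest uncleaned buffer.  Unsigned subtraction
 * keeps the result correct across a wrap of the jiffies counter.
 */
static unsigned long sketch_tx_age(unsigned long jiffies_now,
				   unsigned long time_stamp)
{
	return jiffies_now - time_stamp;
}
```

For the values in the report, head and tail both read 0x28, and the oldest
buffer is 0x17ba57 - 0x17b949 = 0x10e = 270 jiffies old. How long that is
in seconds depends on HZ, which the report does not give.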


* ESFQ (or SFQ updates) mainline status?
From: Jeff Gustafson @ 2008-01-04 19:21 UTC
  To: netdev

Hi all,
	I have a question about the status of the spiffy updates to SFQ.  I
*really* like the ESFQ idea.  It appears to be exactly what I'm looking
for.  From what I can tell from this mailing list, SFQ is getting some
or all of the ESFQ features.  Although the ESFQ web site gives detailed
information about the status for inclusion in mainline, I am having
trouble finding out exactly what (if any) parts of ESFQ got into SFQ in
mainline.
	Did any of the ESFQ-like enhancements for SFQ reach .23 or do I still
need to patch?  I am running a stripped-down Fedora 8 box and it would
be very convenient if the patches were already in .23 or added by the
Fedora team.
	I tried to run the ESFQ command against SFQ, but the command was
rejected:

/sbin/tc qdisc add dev eth1 parent 1:12 handle 12: sfq perturb 10 hash ctnatchg

	It appears that even if the kernel does have the patches, the shipped
version of tc does not.

				..Jeff




* [PATCH 3/3] [SCTP]: Add back the code that accounted for FORWARD_TSN parameter in INIT.
From: Vlad Yasevich @ 2008-01-04 19:45 UTC
  To: netdev; +Cc: davem, lksctp-developers, Vlad Yasevich
In-Reply-To: <1199475946-11103-1-git-send-email-vladislav.yasevich@hp.com>

Some recent changes completely removed accounting for the FORWARD_TSN
parameter length in the INIT and INIT-ACK chunk.  This is wrong and
should be restored.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
 net/sctp/sm_make_chunk.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index ed7c9e3..3cc629d 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -210,6 +210,9 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc,
 	chunksize = sizeof(init) + addrs_len + SCTP_SAT_LEN(num_types);
 	chunksize += sizeof(ecap_param);
 
+	if (sctp_prsctp_enable)
+		chunksize += sizeof(prsctp_param);
+
 	/* ADDIP: Section 4.2.7:
 	 *  An implementation supporting this extension [ADDIP] MUST list
 	 *  the ASCONF,the ASCONF-ACK, and the AUTH  chunks in its INIT and
@@ -369,6 +372,9 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
 	if (asoc->peer.ecn_capable)
 		chunksize += sizeof(ecap_param);
 
+	if (sctp_prsctp_enable)
+		chunksize += sizeof(prsctp_param);
+
 	if (sctp_addip_enable) {
 		extensions[num_ext] = SCTP_CID_ASCONF;
 		extensions[num_ext+1] = SCTP_CID_ASCONF_ACK;
-- 
1.5.3.5
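
The accounting this patch restores can be pictured with a small sizing
helper. It is a sketch under stated assumptions: the helper name is
invented, base_len stands in for the fixed INIT part plus the address
parameters, and both the ECN-capable and the FORWARD_TSN-supported
parameters are taken to be bare 4-byte parameter headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed size of a bare SCTP parameter header: 2-byte type, 2-byte length. */
#define SKETCH_PARAMHDR_LEN 4

/*
 * Hypothetical sizing helper.  Every parameter that will later be
 * appended to the chunk has to be counted here, before the chunk
 * buffer is allocated.
 */
static size_t sketch_init_chunksize(size_t base_len, bool prsctp_enable)
{
	size_t chunksize = base_len;

	assert(base_len > 0);

	chunksize += SKETCH_PARAMHDR_LEN;		/* ECN capable */
	if (prsctp_enable)
		chunksize += SKETCH_PARAMHDR_LEN;	/* FORWARD_TSN supported */
	return chunksize;
}
```

If the second addition is left out while the parameter is still appended,
the chunk is sized four bytes too small in this model, which is the
mismatch the commit message describes.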



* [PATCH 2/3] [SCTP]: Correctly handle AUTH parameters in unexpected INIT
From: Vlad Yasevich @ 2008-01-04 19:45 UTC
  To: netdev; +Cc: davem, lksctp-developers, Vlad Yasevich
In-Reply-To: <1199475946-11103-1-git-send-email-vladislav.yasevich@hp.com>

When processing an unexpected INIT chunk, we do not need to
do any preservation of the old AUTH parameters.  In fact,
doing such preservations will nullify AUTH and allow connection
stealing.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
 net/sctp/sm_statefuns.c |   22 ----------------------
 1 files changed, 0 insertions(+), 22 deletions(-)

diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 5fb8477..d247ed4 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -1309,26 +1309,6 @@ static void sctp_tietags_populate(struct sctp_association *new_asoc,
 	new_asoc->c.initial_tsn         = asoc->c.initial_tsn;
 }
 
-static void sctp_auth_params_populate(struct sctp_association *new_asoc,
-				    const struct sctp_association *asoc)
-{
-	/* Only perform this if AUTH extension is enabled */
-	if (!sctp_auth_enable)
-		return;
-
-	/* We need to provide the same parameter information as
-	 * was in the original INIT.  This means that we need to copy
-	 * the HMACS, CHUNKS, and RANDOM parameter from the original
-	 * assocaition.
-	 */
-	memcpy(new_asoc->c.auth_random, asoc->c.auth_random,
-		sizeof(asoc->c.auth_random));
-	memcpy(new_asoc->c.auth_hmacs, asoc->c.auth_hmacs,
-		sizeof(asoc->c.auth_hmacs));
-	memcpy(new_asoc->c.auth_chunks, asoc->c.auth_chunks,
-		sizeof(asoc->c.auth_chunks));
-}
-
 /*
  * Compare vtag/tietag values to determine unexpected COOKIE-ECHO
  * handling action.
@@ -1486,8 +1466,6 @@ static sctp_disposition_t sctp_sf_do_unexpected_init(
 
 	sctp_tietags_populate(new_asoc, asoc);
 
-	sctp_auth_params_populate(new_asoc, asoc);
-
 	/* B) "Z" shall respond immediately with an INIT ACK chunk.  */
 
 	/* If there are errors need to be reported for unknown parameters,
-- 
1.5.3.5



* [PATCH 1/3] [SCTP]: Fix the name of the authentication event.
From: Vlad Yasevich @ 2008-01-04 19:45 UTC
  To: netdev; +Cc: davem, lksctp-developers, Vlad Yasevich
In-Reply-To: <1199475946-11103-1-git-send-email-vladislav.yasevich@hp.com>

The event should be called SCTP_AUTHENTICATION_INDICATION.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
 include/net/sctp/user.h |    2 +-
 net/sctp/ulpevent.c     |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
index 00848b6..954090b 100644
--- a/include/net/sctp/user.h
+++ b/include/net/sctp/user.h
@@ -450,7 +450,7 @@ enum sctp_sn_type {
 	SCTP_SHUTDOWN_EVENT,
 	SCTP_PARTIAL_DELIVERY_EVENT,
 	SCTP_ADAPTATION_INDICATION,
-	SCTP_AUTHENTICATION_EVENT,
+	SCTP_AUTHENTICATION_INDICATION,
 };
 
 /* Notification error codes used to fill up the error fields in some
diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index 2c17c7e..3073143 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -830,7 +830,7 @@ struct sctp_ulpevent *sctp_ulpevent_make_authkey(
 	ak = (struct sctp_authkey_event *)
 		skb_put(skb, sizeof(struct sctp_authkey_event));
 
-	ak->auth_type = SCTP_AUTHENTICATION_EVENT;
+	ak->auth_type = SCTP_AUTHENTICATION_INDICATION;
 	ak->auth_flags = 0;
 	ak->auth_length = sizeof(struct sctp_authkey_event);
 
-- 
1.5.3.5



* [PATCH 0/3] [SCTP] AUTH Bug fixes
From: Vlad Yasevich @ 2008-01-04 19:45 UTC
  To: netdev; +Cc: davem, lksctp-developers

Hi David

The following 3 patches address some issues with the AUTH implementation.
These are for 2.6.24.

Thanks
-vlad 


* sparc oops in ip_fast_csum
From: Mariusz Kozlowski @ 2008-01-04 17:37 UTC
  To: David Miller; +Cc: sparclinux, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3066 bytes --]

Hello,

        This comes from Linus's latest linux-2.6 tree. It happened randomly
and I can't reproduce it. More info below.

Unable to handle kernel paging request at virtual address 00000000be286000
tsk->{mm,active_mm}->context = 0000000000000eae
tsk->{mm,active_mm}->pgd = fffff800be0e4000
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
isic(9009): Oops [#1]
TSTATE: 0000008811009606 TPC: 0000000000572f00 TNPC: 0000000000572f04 Y: 00000000    Not tainted
TPC: <ip_fast_csum+0x48/0x80>
g0: fffff800bf729d40 g1: 0000000000000000 g2: 00000000624b9fd9 g3: 00000000017c0400
g4: fffff800be664800 g5: fffff8007f96c000 g6: fffff800be0a0000 g7: ffffffffffffe12a
o0: fffff800be285ff0 o1: 0000000000000002 o2: 0000098b7f1220bf o3: 0000000000000000
o4: 0000000000004034 o5: 0000000000000000 sp: fffff800be0a30f1 ret_pc: 00000000006490e0
RPC: <raw_sendmsg+0x4c8/0x780>
l0: fffff800be7045a0 l1: fffff800be27e4a0 l2: fffff800be4c8000 l3: fffff800be27e490
l4: 0000000000000000 l5: 0000000000000010 l6: 0000000000000000 l7: 00000000f7fa97e4
i0: 0000000000000000 i1: fffff800bf0f8de0 i2: fffff800be0a3e78 i3: 0000000000000055
i4: 0000000000000000 i5: 00ffffffffffffff i6: fffff800be0a3221 i7: 0000000000653910
I7: <inet_sendmsg+0x38/0x60>
Caller[0000000000653910]: inet_sendmsg+0x38/0x60
Caller[00000000005f8d00]: sock_sendmsg+0x88/0xc0
Caller[00000000005f90b0]: sys_sendto+0xb8/0x100
Caller[00000000004062d4]: linux_sparc_syscall32+0x3c/0x40
Caller[00000000000136cc]: 0x136d4
Instruction DUMP: 8ea1e001  22480005  852aa010 <c6022010> 106ffff9  01000000  84828002  9530a010  94428000

(gdb) l *0x0000000000572f00 
0x572f00 is at arch/sparc64/lib/ipcsum.S:24.
19              addccc  %o2, %g0, %o2
20              subcc   %g7, 1, %g7
21              be,a,pt %icc, 2f
22               sll    %o2, 16, %g2
23      
24              lduw    [%o0 + 0x10], %g3
25              ba,pt   %xcc, 1b
26               nop
27      2:      addcc   %o2, %g2, %g2
28              srl     %g2, 16, %o2

The box is sparc64 - sun ultra 60.

# cat /proc/cpuinfo
cpu             : TI UltraSparc II  (BlackBird)
fpu             : UltraSparc II integrated FPU
prom            : OBP 3.17.0 1998/10/23 11:26
type            : sun4u
ncpus probed    : 2
ncpus active    : 2
D$ parity tl1   : 0
I$ parity tl1   : 0
Cpu0ClkTck      : 000000001ad21e13
Cpu2ClkTck      : 000000001ad21e13
MMU Type        : Spitfire
State:
CPU0:           online
CPU2:           online

Linux sparc64 2.6.24-rc6 #2 SMP PREEMPT Fri Jan 4 14:12:37 CET 2008 sparc64 sun4u TI UltraSparc II (BlackBird) GNU/Linux
 
Gnu C                  4.1.2
Gnu make               3.81
binutils               Binutils
util-linux             2.12r
mount                  2.12r
module-init-tools      3.4
e2fsprogs              1.40.3
Linux C Library        6.1
Dynamic linker (ldd)   2.6.1
Procps                 3.2.7
Net-tools              1.60
Kbd                    1.13
Sh-utils               6.9
udev                   115
Modules Loaded         sg sr_mod cdrom

Regards,

        Mariusz
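
For readers unfamiliar with the routine in the trace, a portable C sketch
of an IPv4 header checksum follows. It is not the sparc64 assembly and the
name is made up. The point it illustrates is that the number of bytes read
is driven entirely by the ihl argument. The faulting load is at
[%o0 + 0x10], and with o0 ending in 5ff0 that address is the first word
past a page boundary, which would be consistent with ihl describing more
header than was actually present; that reading is a guess from the
register dump, not a confirmed diagnosis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Portable sketch of an IPv4 header checksum (hypothetical name).
 * It reads ihl * 4 bytes from hdr, so the caller must guarantee that
 * many bytes are readable.
 */
static uint16_t sketch_ip_csum(const unsigned char *hdr, unsigned int ihl)
{
	uint32_t sum = 0;
	size_t i;

	assert(hdr != NULL);

	/* Add the header up as big-endian 16-bit words. */
	for (i = 0; i < (size_t)ihl * 4; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Run over a header whose checksum field is zero, the function returns the
value to store there; run over a header with a correct checksum in place,
it returns 0.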

[-- Attachment #2: .config --]
[-- Type: text/plain, Size: 19835 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc6
# Fri Jan  4 13:56:01 2008
#
CONFIG_SPARC=y
CONFIG_SPARC64=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_64BIT=y
CONFIG_MMU=y
CONFIG_QUICKLIST=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_NO_VIRT_TO_BUS=y
CONFIG_OF=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_SPARC64_PAGE_SIZE_8KB=y
# CONFIG_SPARC64_PAGE_SIZE_64KB is not set
# CONFIG_SPARC64_PAGE_SIZE_512KB is not set
# CONFIG_SPARC64_PAGE_SIZE_4MB is not set
CONFIG_SECCOMP=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
# CONFIG_HOTPLUG_CPU is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
# CONFIG_CGROUPS is not set
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_FAIR_USER_SCHED=y
# CONFIG_FAIR_CGROUP_SCHED is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_RELAY=y
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_BLK_DEV_BSG is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
# CONFIG_IOSCHED_DEADLINE is not set
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_SYSVIPC_COMPAT=y
CONFIG_GENERIC_HARDIRQS=y

#
# General machine setup
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_NR_CPUS=4
# CONFIG_CPU_FREQ is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
# CONFIG_DISCONTIGMEM_MANUAL is not set
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_MEMORY_PRESENT=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_RESOURCES_64BIT=y
CONFIG_ZONE_DMA_FLAG=0
CONFIG_NR_QUICK=1
CONFIG_SBUS=y
CONFIG_SBUSCHAR=y
CONFIG_SUN_AUXIO=y
CONFIG_SUN_IO=y
# CONFIG_SUN_LDOMS is not set
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCI_SYSCALL=y
CONFIG_ARCH_SUPPORTS_MSI=y
# CONFIG_PCI_MSI is not set
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG is not set
CONFIG_SUN_OPENPROMFS=m
CONFIG_SPARC32_COMPAT=y
CONFIG_COMPAT=y
CONFIG_BINFMT_ELF32=y
# CONFIG_BINFMT_AOUT32 is not set

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=m
CONFIG_SOLARIS_EMUL=y
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
# CONFIG_CMDLINE_BOOL is not set

#
# Networking
#
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_XFRM=y
CONFIG_XFRM_USER=m
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
CONFIG_NET_KEY=m
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
# CONFIG_INET_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
# CONFIG_INET_LRO is not set
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
# CONFIG_IPV6 is not set
# CONFIG_INET6_XFRM_TUNNEL is not set
# CONFIG_INET6_TUNNEL is not set
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NETFILTER is not set
# CONFIG_IP_DCCP is not set
# CONFIG_IP_SCTP is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_SCHED is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_HAMRADIO is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set

#
# Wireless
#
# CONFIG_CFG80211 is not set
# CONFIG_WIRELESS_EXT is not set
# CONFIG_MAC80211 is not set
# CONFIG_IEEE80211 is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
# CONFIG_PREVENT_FIRMWARE_BUILD is not set
CONFIG_FW_LOADER=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_SYS_HYPERVISOR is not set
CONFIG_CONNECTOR=m
# CONFIG_MTD is not set
CONFIG_OF_DEVICE=y
# CONFIG_PARPORT is not set
# CONFIG_BLK_DEV is not set
# CONFIG_MISC_DEVICES is not set
# CONFIG_IDE is not set

#
# SCSI device support
#
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
# CONFIG_SCSI_TGT is not set
# CONFIG_SCSI_NETLINK is not set
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=m
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=m
# CONFIG_CHR_DEV_SCH is not set

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
CONFIG_SCSI_MULTI_LUN=y
# CONFIG_SCSI_CONSTANTS is not set
# CONFIG_SCSI_LOGGING is not set
# CONFIG_SCSI_SCAN_ASYNC is not set
CONFIG_SCSI_WAIT_SCAN=m

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=y
# CONFIG_SCSI_FC_ATTRS is not set
CONFIG_SCSI_ISCSI_ATTRS=y
# CONFIG_SCSI_SAS_LIBSAS is not set
# CONFIG_SCSI_SRP_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_STEX is not set
CONFIG_SCSI_SYM53C8XX_2=y
CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=1
CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16
CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64
CONFIG_SCSI_SYM53C8XX_MMIO=y
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLOGICPTI is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_SUNESP is not set
# CONFIG_SCSI_SRP is not set
# CONFIG_ATA is not set
# CONFIG_MD is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# CONFIG_IEEE1394 is not set
# CONFIG_I2O is not set
CONFIG_NETDEVICES=y
# CONFIG_NETDEVICES_MULTIQUEUE is not set
# CONFIG_DUMMY is not set
# CONFIG_BONDING is not set
# CONFIG_MACVLAN is not set
# CONFIG_EQUALIZER is not set
# CONFIG_TUN is not set
# CONFIG_VETH is not set
# CONFIG_IP1000 is not set
# CONFIG_ARCNET is not set
# CONFIG_PHYLIB is not set
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
# CONFIG_SUNLANCE is not set
CONFIG_HAPPYMEAL=y
# CONFIG_SUNBMAC is not set
# CONFIG_SUNQE is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
# CONFIG_NET_VENDOR_3COM is not set
# CONFIG_NET_TULIP is not set
# CONFIG_HP100 is not set
# CONFIG_IBM_NEW_EMAC_ZMII is not set
# CONFIG_IBM_NEW_EMAC_RGMII is not set
# CONFIG_IBM_NEW_EMAC_TAH is not set
# CONFIG_IBM_NEW_EMAC_EMAC4 is not set
# CONFIG_NET_PCI is not set
# CONFIG_B44 is not set
# CONFIG_NETDEV_1000 is not set
# CONFIG_NETDEV_10000 is not set
# CONFIG_TR is not set

#
# Wireless LAN
#
# CONFIG_WLAN_PRE80211 is not set
# CONFIG_WLAN_80211 is not set
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NET_FC is not set
# CONFIG_SHAPER is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set
# CONFIG_ISDN is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set
# CONFIG_INPUT_POLLDEV is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
CONFIG_KEYBOARD_SUNKBD=y
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
# CONFIG_SERIO_SERPORT is not set
CONFIG_SERIO_PCIPS2=m
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=m
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_VT_HW_CONSOLE_BINDING is not set
# CONFIG_SERIAL_NONSTANDARD is not set

#
# Serial drivers
#

#
# Non-8250 serial port support
#
CONFIG_SERIAL_SUNCORE=y
# CONFIG_SERIAL_SUNZILOG is not set
CONFIG_SERIAL_SUNSU=y
CONFIG_SERIAL_SUNSU_CONSOLE=y
CONFIG_SERIAL_SUNSAB=m
CONFIG_SERIAL_SUNHV=y
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
# CONFIG_IPMI_HANDLER is not set
# CONFIG_HW_RANDOM is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_RAW_DRIVER is not set
# CONFIG_TCG_TPM is not set
CONFIG_DEVPORT=y
# CONFIG_I2C is not set

#
# SPI support
#
# CONFIG_SPI is not set
# CONFIG_SPI_MASTER is not set
# CONFIG_W1 is not set
# CONFIG_POWER_SUPPLY is not set
# CONFIG_HWMON is not set
# CONFIG_WATCHDOG is not set

#
# Sonics Silicon Backplane
#
CONFIG_SSB_POSSIBLE=y
# CONFIG_SSB is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_SM501 is not set

#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set
# CONFIG_DVB_CORE is not set
# CONFIG_DAB is not set

#
# Graphics support
#
# CONFIG_DRM is not set
# CONFIG_VGASTATE is not set
CONFIG_VIDEO_OUTPUT_CONTROL=m
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
# CONFIG_FB_DDC is not set
# CONFIG_FB_CFB_FILLRECT is not set
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
# CONFIG_FB_SYS_FILLRECT is not set
# CONFIG_FB_SYS_COPYAREA is not set
# CONFIG_FB_SYS_IMAGEBLIT is not set
# CONFIG_FB_SYS_FOPS is not set
CONFIG_FB_DEFERRED_IO=y
# CONFIG_FB_SVGALIB is not set
# CONFIG_FB_MACMODES is not set
# CONFIG_FB_BACKLIGHT is not set
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_UVESA is not set
CONFIG_FB_SBUS=y
# CONFIG_FB_BW2 is not set
# CONFIG_FB_CG3 is not set
# CONFIG_FB_CG6 is not set
CONFIG_FB_FFB=y
# CONFIG_FB_TCX is not set
# CONFIG_FB_CG14 is not set
# CONFIG_FB_P9100 is not set
# CONFIG_FB_LEO is not set
# CONFIG_FB_XVR500 is not set
# CONFIG_FB_XVR2500 is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_BACKLIGHT_LCD_SUPPORT is not set

#
# Display device support
#
# CONFIG_DISPLAY_SUPPORT is not set

#
# Console display driver support
#
# CONFIG_PROM_CONSOLE is not set
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY is not set
# CONFIG_FRAMEBUFFER_CONSOLE_ROTATION is not set
CONFIG_FONTS=y
# CONFIG_FONT_8x8 is not set
# CONFIG_FONT_8x16 is not set
# CONFIG_FONT_6x11 is not set
# CONFIG_FONT_7x14 is not set
# CONFIG_FONT_PEARL_8x8 is not set
# CONFIG_FONT_ACORN_8x8 is not set
CONFIG_FONT_SUN8x16=y
# CONFIG_FONT_SUN12x22 is not set
# CONFIG_FONT_10x18 is not set
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
# CONFIG_LOGO_LINUX_CLUT224 is not set
CONFIG_LOGO_SUN_CLUT224=y

#
# Sound
#
# CONFIG_SOUND is not set
# CONFIG_HID_SUPPORT is not set
# CONFIG_USB_SUPPORT is not set
# CONFIG_MMC is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_INFINIBAND is not set
# CONFIG_RTC_CLASS is not set

#
# Userspace I/O
#
# CONFIG_UIO is not set

#
# Misc Linux/SPARC drivers
#
CONFIG_SUN_OPENPROMIO=y
# CONFIG_OBP_FLASH is not set
# CONFIG_SUN_BPP is not set
# CONFIG_BBC_I2C is not set
# CONFIG_ENVCTRL is not set
# CONFIG_DISPLAY7SEG is not set

#
# File systems
#
CONFIG_EXT2_FS=y
# CONFIG_EXT2_FS_XATTR is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=y
# CONFIG_EXT3_FS_XATTR is not set
# CONFIG_EXT4DEV_FS is not set
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_FS_POSIX_ACL is not set
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
# CONFIG_QUOTA is not set
CONFIG_DNOTIFY=y
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_FUSE_FS is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=m
# CONFIG_JOLIET is not set
# CONFIG_ZISOFS is not set
CONFIG_UDF_FS=m

#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_TMPFS_POSIX_ACL is not set
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set
# CONFIG_CONFIGFS_FS is not set

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_NETWORK_FILESYSTEMS is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_SUN_PARTITION=y
# CONFIG_NLS is not set
# CONFIG_DLM is not set
# CONFIG_INSTRUMENTATION is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_MAGIC_SYSRQ=y
# CONFIG_UNUSED_SYMBOLS is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SHIRQ=y
CONFIG_DETECT_SOFTLOCKUP=y
# CONFIG_SCHED_DEBUG is not set
# CONFIG_SCHEDSTATS is not set
# CONFIG_TIMER_STATS is not set
CONFIG_SLUB_DEBUG_ON=y
# CONFIG_DEBUG_PREEMPT is not set
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_PI_LIST=y
# CONFIG_RT_MUTEX_TESTER is not set
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
# CONFIG_PROVE_LOCKING is not set
CONFIG_LOCKDEP=y
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_LOCKDEP is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_LIST is not set
# CONFIG_DEBUG_SG is not set
CONFIG_FRAME_POINTER=y
# CONFIG_FORCED_INLINING is not set
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_FAULT_INJECTION is not set
# CONFIG_SAMPLES is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_DCFLUSH is not set
# CONFIG_STACK_DEBUG is not set
# CONFIG_DEBUG_BOOTMEM is not set
CONFIG_DEBUG_PAGEALLOC=y

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
CONFIG_CRYPTO=y
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_BLKCIPHER=y
CONFIG_CRYPTO_MANAGER=y
# CONFIG_CRYPTO_HMAC is not set
# CONFIG_CRYPTO_XCBC is not set
# CONFIG_CRYPTO_NULL is not set
# CONFIG_CRYPTO_MD4 is not set
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_SHA1=m
# CONFIG_CRYPTO_SHA256 is not set
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_WP512 is not set
# CONFIG_CRYPTO_TGR192 is not set
CONFIG_CRYPTO_GF128MUL=y
# CONFIG_CRYPTO_ECB is not set
CONFIG_CRYPTO_CBC=y
# CONFIG_CRYPTO_PCBC is not set
# CONFIG_CRYPTO_LRW is not set
# CONFIG_CRYPTO_XTS is not set
# CONFIG_CRYPTO_CRYPTD is not set
CONFIG_CRYPTO_DES=y
# CONFIG_CRYPTO_FCRYPT is not set
# CONFIG_CRYPTO_BLOWFISH is not set
# CONFIG_CRYPTO_TWOFISH is not set
# CONFIG_CRYPTO_SERPENT is not set
# CONFIG_CRYPTO_AES is not set
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST6 is not set
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_ARC4 is not set
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_ANUBIS is not set
# CONFIG_CRYPTO_SEED is not set
# CONFIG_CRYPTO_DEFLATE is not set
# CONFIG_CRYPTO_MICHAEL_MIC is not set
CONFIG_CRYPTO_CRC32C=m
# CONFIG_CRYPTO_CAMELLIA is not set
# CONFIG_CRYPTO_TEST is not set
# CONFIG_CRYPTO_AUTHENC is not set
CONFIG_CRYPTO_HW=y

#
# Library routines
#
CONFIG_BITREVERSE=y
# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
# CONFIG_CRC_ITU_T is not set
CONFIG_CRC32=y
# CONFIG_CRC7 is not set
CONFIG_LIBCRC32C=m
CONFIG_PLIST=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y

* Re: 2.6.24-rc6-mm1
From: Torsten Kaiser @ 2008-01-04 15:21 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Herbert Xu, Andrew Morton, linux-kernel, Neil Brown,
	J. Bruce Fields, netdev, Tom Tucker
In-Reply-To: <20080104133031.GA3329@ff.dom.local>

On Jan 4, 2008 2:30 PM, Jarek Poplawski <jarkao2@gmail.com> wrote:
> On 04-01-2008 11:23, Torsten Kaiser wrote:
> > On Jan 2, 2008 10:51 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> >> On Wed, Jan 02, 2008 at 07:29:59PM +0100, Torsten Kaiser wrote:
> >>> Vanilla 2.6.24-rc6 seems stable. I did not see any crash or warnings.
> >> OK that's great.  The next step would be to try excluding specific git
> >> trees from mm to see if they make a difference.
> >>
> >> The two specific trees of interest would be git-nfsd and git-net.
> >
> > git-nfsd from git://git.linux-nfs.org/projects/bfields/linux.git#for-mm
> > -> compiling and installing 54 packages worked without crashes.
> >
> > git-net from git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25.git
> > -> compiling and installing 95 packages worked without crashes.
> ...
> > I will enable CONFIG_IOMMU_DEBUG in -rc6-mm1 and see, as otherwise I
> > have no clue where to look...
>
> Hi,
>
> A few questions/suggestions:

I'm open for any suggestions and will try to answer any questions.
Sadly, the one thing that is not practical is bisecting the broken-out
mm-patches, as triggering this error is too unreliable and
time-consuming.

> - is it still vanilla -rc6-mm1; I've seen on kernel list you tried
> some fixes around raid?

Yes, without these fixes I can't boot.
But they should only run while the arrays are being started, so I doubt
that they are the cause.
(Also -rc3-mm2 did not need this fix)

My skbuff-double-free-detector is still in there, but was never triggered.

> - could you remind this lockdep warning; is it always and the same,
> always before crash, or no rules?

???
I see no lockdep warning before the crashes.
I have seen a warning about the dst->__refcnt in dst_release and
different warnings about list operations.

I think I have always posted everything I have seen before the
crashes. (captured via serial console)

(If you mean the lockdep problem in -rc6: that is more or less a
missing annotation during early bootup. The only problem with it is
that it causes lockdep to be turned off, so lockdep cannot be used
to find any real problem. A fix for that is in -mm, so I do have
lockdep on the mm-kernels.)

> - I've seen you looked after double freeing, but this last debug list
> warning could suggest locking problems during list modification too.

Yes, but Herbert explicitly mentioned double freeing an skb, so I
tried to catch this.
I do not know enough about the network core to verify the locking of
the involved lists.

> - above git-nfsd and git-net tests should be probably repeated with
> -rc6-mm1 git versions: so vanilla rc6 plus both these -mm patches
> only, and if bug triggers, with one reversed; btw., since in previous
> message you mentioned that 50 packages could be not enough to trigger
> this, these 54 above could make too little margin yet.

Yes, I think I really need to redo the git-nfsd-test.
With IOMMU_DEBUG enabled, rc6-mm1 worked for 52 packages; only a second
run of kde-packages triggered it, after only 5 packages.
I don't know what this bug hates about kdeartwork-wallpaper (triggered
it this time) or kdeartwork-styles.

Output from the crash with IOMMU_DEBUG (lockdep was enabled, but did
not trigger):
[15593.236374] Unable to handle kernel NULL pointer
dereference<3>list_add corruption. prev->next should be next
(ffffffff8078a410), but was ffff81011ec01e68. (prev=ffff81011ec01e68).
[15593.236374]  at 0000000000000000 RIP:
[15593.236374]  [<0000000000000000>]
[15593.236374] PGD 79d22067 PUD 7acd7067 PMD 0
[15593.236374] Oops: 0010 [1] SMP
[15593.236374] last sysfs file:
/sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[15593.236374] CPU 2
[15593.236374] Modules linked in: radeon drm w83792d ipv6 tuner
tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761
tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common
v4l1_compat sg hid pata_amd i2c_nforce2
[15593.236374] Pid: 510, comm: khpsbpkt Not tainted 2.6.24-rc6-mm1 #15
[15593.236374] RIP: 0010:[<0000000000000000>]  [<0000000000000000>]
[15593.236374] RSP: 0018:ffff81007eed3ee8  EFLAGS: 00010206
[15593.236374] RAX: ffff81007eed3ef0 RBX: ffff81011ec01e40 RCX: ffff81011ec01e40
[15593.236374] RDX: ffff81011ec01e68 RSI: ffff81011ec01e68 RDI: 0000000000000000
[15593.236374] RBP: ffff81007eed3f10 R08: 0000000000000000 R09: 0000000000000001
[15593.236374] R10: 0000000000000001 R11: 0000000000000058 R12: ffff81007eed3ef0
[15593.236374] R13: ffffffff80470e50 R14: 0000000000000000 R15: 0000000000000000
[15593.236374] FS:  00007f76e6c98700(0000) GS:ffff81011ff1f000(0000)
knlGS:00000000556f46c0
[15593.236374] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[15593.236374] CR2: 0000000000000000 CR3: 0000000079d29000 CR4: 00000000000006e0
[15593.236374] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15593.236374] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[15593.236374] Process khpsbpkt (pid: 510, threadinfo
ffff81007eed2000, task ffff81007eed0000)
[15593.236374] Stack:  ffffffff80470f0b ffff81011ec01e68
ffff81011ec014a8 00000000fffffffc
[15593.236374]  0000000000000000 ffff81007eed3f40 ffffffff8024d72d
00000000000001fb
[15593.236374]  ffff81007ff2bd98 00000000000001fb ffff81007ff2bcf0
ffff81007ff2df40
[15593.236374] Call Trace:
[15593.236374]  [<ffffffff80470f0b>] hpsbpkt_thread+0xbb/0x140
[15593.236374]  [<ffffffff8024d72d>] kthread+0x4d/0x80
[15593.236374]  [<ffffffff8020c4b8>] child_rip+0xa/0x12
[15593.236374]  [<ffffffff8020bbcf>] restore_args+0x0/0x30
[15593.236374]  [<ffffffff8024d6e0>] kthread+0x0/0x80
[15593.236374]  [<ffffffff8020c4ae>] child_rip+0x0/0x12
[15593.236374]
[15593.236374]
[15593.236374] Code:  Bad RIP value.
[15593.236374] RIP  [<0000000000000000>]
[15593.236374]  RSP <ffff81007eed3ee8>
[15593.236374] CR2: 0000000000000000
[15593.236377] ---[ end trace 11d2dc0fdbe1651f ]---
[15627.875963] ------------[ cut here ]------------
[15627.875963] kernel BUG at lib/list_debug.c:33!
[15627.875963] invalid opcode: 0000 [2] SMP
[15627.875963] last sysfs file:
/sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[15627.875963] CPU 3
[15627.875963] Modules linked in: radeon drm w83792d ipv6 tuner
tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761
tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common
v4l1_compat sg hid pata_amd i2c_nforce2
[15627.875963] Pid: 6258, comm: nxssh Tainted: G      D 2.6.24-rc6-mm1 #15
[15627.875963] RIP: 0010:[<ffffffff803bd954>]  [<ffffffff803bd954>]
__list_add+0x54/0x60
[15627.875963] RSP: 0000:ffff81007ffb3c80  EFLAGS: 00010082
[15627.875963] RAX: 0000000000000079 RBX: 0000000000000082 RCX: 000000000000b9f1
[15627.875963] RDX: 0000000000001514 RSI: 0000000000000001 RDI: ffffffff807641c0
[15627.875963] RBP: ffff81007ffb3c80 R08: 0000000000000001 R09: 0000000000000010
[15627.875963] R10: 0000000000000000 R11: 0000000000000020 R12: ffff81011ec01e40
[15627.875963] R13: ffff81011ec01e68 R14: 0000000000000002 R15: ffff81007eee2000
[15627.875963] FS:  00007f3531da2700(0000) GS:ffff81011ff1f280(0000)
knlGS:00000000556f46c0
[15627.875963] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15627.875963] CR2: 00007ff643d49fe0 CR3: 0000000079c37000 CR4: 00000000000006e0
[15627.875963] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15627.875963] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[15627.875963] Process nxssh (pid: 6258, threadinfo ffff810079d50000,
task ffff810079d4e000)
[15627.875963] Stack:  ffff81007ffb3ca0 ffffffff8046f9e8
ffff81011ec01e40 000000000000000c
[15627.875963]  ffff81007ffb3d40 ffffffff8047027c ffff81007ddd8000
ffff81007ddd8048
[15627.875963]  ffff81007ffb3ce0 ffffffff805d4366 ffff81007ddd8000
000000000000000c
[15627.875963] Call Trace:
[15627.875963]  <IRQ>  [<ffffffff8046f9e8>] queue_packet_complete+0x48/0x80
[15627.875963]  [<ffffffff8047027c>] hpsb_packet_received+0x51c/0x6d0
[15627.875963]  [<ffffffff805d4366>] _spin_unlock+0x26/0x30
[15627.875963]  [<ffffffff8047cc3d>] dma_rcv_tasklet+0x22d/0x430
[15627.875963]  [<ffffffff8021273e>] read_hpet+0xe/0x10
[15627.875963]  [<ffffffff805d48f2>] _spin_unlock_irqrestore+0x42/0x60
[15627.875963]  [<ffffffff8023d8b3>] tasklet_action+0x53/0xd0
[15627.875963]  [<ffffffff8023d754>] __do_softirq+0x84/0x110
[15627.875963]  [<ffffffff8020c82c>] call_softirq+0x1c/0x30
[15627.875963]  [<ffffffff8020eaa5>] do_softirq+0x65/0xc0
[15627.875963]  [<ffffffff8023d6c5>] irq_exit+0x95/0xa0
[15627.875963]  [<ffffffff8020ebbf>] do_IRQ+0x8f/0x100
[15627.875963]  [<ffffffff8020bb26>] ret_from_intr+0x0/0xf
[15627.875963]  <EOI>
[15627.875963]
[15627.875963] Code: 0f 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 8b 16
48 89 e5 e8
[15627.875963] RIP  [<ffffffff803bd954>] __list_add+0x54/0x60
[15627.875963]  RSP <ffff81007ffb3c80>
[15627.875963] ---[ end trace 11d2dc0fdbe1651f ]---
[15627.875963] Kernel panic - not syncing: Aiee, killing interrupt handler!

first oops:
(gdb) list *0xffffffff80470f0b
0xffffffff80470f0b is in hpsbpkt_thread (drivers/ieee1394/ieee1394_core.c:1139).
1134                    INIT_LIST_HEAD(&tmp);
1135                    spin_lock_irq(&pending_packets_lock);
1136                    list_splice_init(&hpsbpkt_queue, &tmp);
1137                    spin_unlock_irq(&pending_packets_lock);
1138
1139                    list_for_each_entry_safe(packet, p, &tmp, queue) {
1140                            list_del_init(&packet->queue);
1141                            packet->complete_routine(packet->complete_data);
1142                    }
1143

second oops:
(gdb) list *0xffffffff8046f9e8
0xffffffff8046f9e8 is in queue_packet_complete
(drivers/ieee1394/ieee1394_core.c:1115).
1110                    return;
1111            }
1112            if (packet->complete_routine != NULL) {
1113                    spin_lock_irqsave(&pending_packets_lock, flags);
1114                    list_add_tail(&packet->queue, &hpsbpkt_queue);
1115                    spin_unlock_irqrestore(&pending_packets_lock, flags);
1116                    wake_up_process(khpsbpkt_thread);
1117            }
1118            return;
1119    }
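
The two listings above share a common producer/consumer shape: the
interrupt path appends to a shared list under pending_packets_lock, and
the thread moves the whole list to a private one under the same lock
before running the callbacks unlocked. A minimal userspace sketch of
that shape follows. It is not the ieee1394 code: the struct, the
function names and the atomic_flag spinlock are hypothetical stand-ins,
and it makes no claim about where the corruption in the oopses comes
from.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a packet with a completion callback. */
struct packet {
    struct packet *next;
    void (*complete)(void *data);   /* may be NULL: then it is never queued */
    void *data;
};

/* Hypothetical spinlock standing in for pending_packets_lock. */
static atomic_flag queue_lock = ATOMIC_FLAG_INIT;
static struct packet *queue_head, *queue_tail;

static void lock(void)
{
    while (atomic_flag_test_and_set_explicit(&queue_lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void unlock(void)
{
    atomic_flag_clear_explicit(&queue_lock, memory_order_release);
}

/* Producer side: append one packet to the shared queue under the lock. */
static void enqueue(struct packet *p)
{
    assert(p != NULL);
    if (p->complete == NULL)
        return;
    lock();
    p->next = NULL;
    if (queue_tail)
        queue_tail->next = p;
    else
        queue_head = p;
    queue_tail = p;
    unlock();
}

/*
 * Consumer side: detach the whole queue under the lock, then run the
 * callbacks with the lock dropped. Returns the number of callbacks run.
 */
static int drain(void)
{
    struct packet *batch, *next;
    int n = 0;

    lock();
    batch = queue_head;
    queue_head = queue_tail = NULL;
    unlock();

    for (; batch != NULL; batch = next) {
        next = batch->next;     /* read before the callback can reuse the packet */
        batch->next = NULL;
        batch->complete(batch->data);
        n++;
    }
    return n;
}

/* Example callback: bump the counter that data points to. */
static void count_cb(void *data)
{
    ++*(int *)data;
}
```

With two packets that carry count_cb and one whose callback is NULL,
drain() runs two callbacks, and a second drain() finds the queue empty.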

Torsten

* Re: [PATCH 1/2] LSM: Add inet_sys_snd_skb() LSM hook
From: Paul Moore @ 2008-01-04 14:38 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20080103.204549.204229388.davem@davemloft.net>

On Thursday 03 January 2008 11:45:49 pm David Miller wrote:
> From: Paul Moore <paul.moore@hp.com>
> Date: Thu, 03 Jan 2008 12:25:39 -0500
>
> > Add an inet_sys_snd_skb() LSM hook to allow the LSM to provide
> > packet level access control for all outbound packets.  Using the
> > existing postroute_last netfilter hook turns out to be problematic
> > as it is can be invoked multiple times for a single packet, e.g.
> > individual IPsec transforms, adding unwanted overhead and
> > complicating the security policy.
> >
> > Signed-off-by: Paul Moore <paul.moore@hp.com>
>
> I disagree with this change.
>
> The packet is different each time you see it in the postrouting hook,
> and also the new hook is thus redundant.

Well, thanks for taking a look.

> If it's a performance issue and you can classify the security early,
> mark the SKB as "seen" and then on subsequent hooks you can just
> return immediately if that flag is set.

Unfortunately, it's not quite that easy at present.  The only field we 
have in the skb where we could possibly set a flag is the secmark field 
which is already taken.  Granted, there is the possibility of 
segmenting the secmark field to some degree but that brings about a new 
set of problems involving the number of unique labels, backwards 
compatibility, etc.
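
To make the trade-off concrete, here is a rough sketch of what
"segmenting" a 32-bit mark could look like: one bit is reserved as a
"seen" flag and the rest carries the label. The bit layout and every
name below are assumptions made up for illustration; nothing like this
exists in the patch or, as far as this sketch knows, in the kernel. It
shows both the early return David suggests and the cost mentioned
above: the label space shrinks, and any existing label that already
uses the reserved bit would be misread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: top bit is the "seen" flag, low 31 bits are the label. */
#define SEEN_BIT   UINT32_C(0x80000000)
#define LABEL_MASK UINT32_C(0x7fffffff)

static int was_seen(uint32_t secmark)
{
    return (secmark & SEEN_BIT) != 0;
}

static uint32_t mark_seen(uint32_t secmark)
{
    return secmark | SEEN_BIT;
}

static uint32_t label_of(uint32_t secmark)
{
    return secmark & LABEL_MASK;
}

/* Store a label while preserving the flag; labels above 31 bits no longer fit. */
static uint32_t set_label(uint32_t secmark, uint32_t label)
{
    assert(label <= LABEL_MASK);
    return (secmark & SEEN_BIT) | label;
}

/*
 * Model of a hook that may run several times for one packet.
 * Returns 1 if the full check ran, 0 if it was skipped.
 */
static int postroute_hook(uint32_t *secmark)
{
    if (was_seen(*secmark))
        return 0;
    /* A real policy check on label_of(*secmark) would go here. */
    *secmark = mark_seen(*secmark);
    return 1;
}
```

Calling postroute_hook() twice on the same mark runs the check once and
skips it the second time, while label_of() still returns the original
label.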

Regardless, back to the drawing board.  I'll have to think a bit harder 
about a way to make the netfilter hooks work ...

-- 
paul moore
linux security @ hp

* Re: lockless pagecache Cassini regression
From: Nick Piggin @ 2008-01-04 14:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20080104.035831.52071724.davem@davemloft.net>

On Fri, Jan 04, 2008 at 03:58:31AM -0800, David Miller wrote:
> From: Nick Piggin <npiggin@suse.de>
> Date: Fri, 4 Jan 2008 12:33:52 +0100
> 
> > Just for interest, the lockless pagecache actually makes
> > page->_count unstable for all pages that _have ever_ been pagecache
> > pages (since the last quiescent rcu state, anyway). Basically, it
> > looks up and takes a ref on the struct page without ever having a
> > prior pin or reference on that page. It can do this because it knows
> > the struct page won't actually get freed. After taking the ref, it
> > rechecks that it has got the right page...
> 
> Ok, I understand the needs now.
> 
> I think the way the drivers/net/niu.c driver handles things
> would work better.  It only performs get_page(), atomic
> increments on compound_head(page)->_count, and __free_page().
> Is that all legal with the lockless pagecache?

Yes, you can use the regular refcounting and freeing operations. If the
lockless pagecache has got a speculative reference on the page after the
driver drops all "real" references, it will take care of freeing the page.
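
As a rough illustration of that rule (whoever drops the last reference
frees the page, whether it is the driver or a speculative holder), here
is a small userspace model using C11 atomics. The struct and function
names are hypothetical and this is not the kernel implementation; in
particular, the recheck that the lookup found the right page is left
out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a page: a reference count and a "freed" marker. */
struct page_model {
    atomic_int count;
    int freed;
};

/* Speculative side: take a reference only if the count is not already zero. */
static bool get_unless_zero(struct page_model *p)
{
    int c = atomic_load(&p->count);

    while (c != 0) {
        /* On failure, c is reloaded with the current count and we retry. */
        if (atomic_compare_exchange_weak(&p->count, &c, c + 1))
            return true;
    }
    return false;
}

/* Drop one reference; the caller that drops the last one frees the page. */
static void put_ref(struct page_model *p)
{
    int old = atomic_fetch_sub(&p->count, 1);

    assert(old > 0);
    if (old == 1)
        p->freed++;
}
```

Starting from a count of 1, a speculative get followed by the driver's
put leaves the page alive; the later put from the speculative holder is
the one that frees it, and any further speculative get fails.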

 
> If so I can likely rework the Cassini page management to
> behave similarly.

If you get the chance, that would be very nice.

Thanks,
Nick

* Re: 2.6.24-rc6-mm1
From: Jarek Poplawski @ 2008-01-04 13:30 UTC (permalink / raw)
  To: Torsten Kaiser
  Cc: Herbert Xu, Andrew Morton, linux-kernel, Neil Brown,
	J. Bruce Fields, netdev, Tom Tucker
In-Reply-To: <64bb37e0801040223q17a76565k3c7667a197403ce5@mail.gmail.com>

On 04-01-2008 11:23, Torsten Kaiser wrote:
> On Jan 2, 2008 10:51 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>> On Wed, Jan 02, 2008 at 07:29:59PM +0100, Torsten Kaiser wrote:
>>> Vanilla 2.6.24-rc6 seems stable. I did not see any crash or warnings.
>> OK that's great.  The next step would be to try excluding specific git
>> trees from mm to see if they make a difference.
>>
>> The two specific trees of interest would be git-nfsd and git-net.
> 
> git-nfsd from git://git.linux-nfs.org/projects/bfields/linux.git#for-mm
> -> compiling and installing 54 packages worked without crashes.
> 
> git-net from git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25.git
> -> compiling and installing 95 packages worked without crashes.
...
> I will enable CONFIG_IOMMU_DEBUG in -rc6-mm1 and see, as otherwise I
> have no clue where to look...

Hi,

A few questions/suggestions:

- is it still vanilla -rc6-mm1; I've seen on kernel list you tried
some fixes around raid?

- could you remind this lockdep warning; is it always and the same,
always before crash, or no rules?

- I've seen you looked after double freeing, but this last debug list
warning could suggest locking problems during list modification too.

- above git-nfsd and git-net tests should be probably repeated with
-rc6-mm1 git versions: so vanilla rc6 plus both these -mm patches
only, and if bug triggers, with one reversed; btw., since in previous
message you mentioned that 50 packages could be not enough to trigger
this, these 54 above could make too little margin yet.

Regards,
Jarek P.

* Re: [PATCH] (Revised) USB VID/PID clash between pegasus and hci_usb drivers
From: Marcel Holtmann @ 2008-01-04 12:45 UTC (permalink / raw)
  To: Chris Rankin; +Cc: netdev, greg
In-Reply-To: <130968.6446.qm@web52904.mail.re2.yahoo.com>

Hi Chris,

> I have surprisingly managed to track down the USB device descriptor for the wired Belkin network
> adapter, and it looks as if the device class is actually 0x00 instead of 0xFF. I originally used
> 0xFF because that is what my 3com device used (for some reason) and so I was hoping that Belkin
> would also use 0xFF for the same reason.

can you please follow Documentation/SubmittingPatches. The commit
message should explain why this is needed and why this change is the
best choice.

The patch itself is Acked-by: Marcel Holtmann <marcel@holtmann.org>

Regards

Marcel



* e1000_clean_tx_irq: Detected Tx Unit Hang - it's bug?
From: Badalian Vyacheslav @ 2008-01-04 12:36 UTC (permalink / raw)
  To: netdev

Hello all.
Sometimes I see this in dmesg:

[16121.400422] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
[16121.400426]   Tx Queue             <0>
[16121.400427]   TDH                  <28>
[16121.400429]   TDT                  <28>
[16121.400430]   next_to_use          <28>
[16121.400431]   next_to_clean        <7d>
[16121.400433] buffer_info[next_to_clean]
[16121.400434]   time_stamp           <17b949>
[16121.400435]   next_to_watch        <7d>
[16121.400437]   jiffies              <17ba57>
[16121.400438]   next_to_watch.status <1>
[16121.400968] htb: too many events !

Is this a bug, or just a sign of overload?
Thanks
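
One number that can be read straight off the dump is how long the
oldest uncleaned buffer had been waiting: the difference between the
jiffies and time_stamp values, both printed in hex. The helper below is
only a sketch of that arithmetic. The function names are made up, and
the tick rate (HZ) of the reporting machine is not in the report, so
the conversion to milliseconds takes it as a parameter.

```c
#include <assert.h>

/* Elapsed ticks; unsigned subtraction stays correct if the counter wrapped. */
static unsigned long ticks_elapsed(unsigned long now, unsigned long then)
{
    return now - then;
}

/* Convert ticks to milliseconds for an assumed tick rate hz. */
static unsigned long ticks_to_ms(unsigned long ticks, unsigned long hz)
{
    assert(hz > 0);
    return ticks * 1000UL / hz;
}
```

For the values above, ticks_elapsed(0x17ba57, 0x17b949) is 0x10e, or
270 ticks. That would be 270 ms if HZ were 1000 and about 1.08 s if HZ
were 250.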

* Re: lockless pagecache Cassini regression
From: David Miller @ 2008-01-04 11:58 UTC (permalink / raw)
  To: npiggin; +Cc: netdev
In-Reply-To: <20080104113351.GA12706@wotan.suse.de>

From: Nick Piggin <npiggin@suse.de>
Date: Fri, 4 Jan 2008 12:33:52 +0100

> Just for interest, the lockless pagecache actually makes
> page->_count unstable for all pages that _have ever_ been pagecache
> pages (since the last quiescent rcu state, anyway). Basically, it
> looks up and takes a ref on the struct page without ever having a
> prior pin or reference on that page. It can do this because it knows
> the struct page won't actually get freed. After taking the ref, it
> rechecks that it has got the right page...

Ok, I understand the needs now.

I think the way the drivers/net/niu.c driver handles things
would work better.  It only performs get_page(), atomic
increments on compound_head(page)->_count, and __free_page().
Is that all legal with the lockless pagecache?

If so I can likely rework the Cassini page management to
behave similarly.
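
As a rough userspace sketch of that scheme (this is not the niu.c code;
the rxpage_* names are hypothetical and only the reference count is
modelled): the driver keeps the initial reference, takes one more per
SKB fragment, and treats a count above 1 as "still in use" because the
fragments are released asynchronously when the SKBs are freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the count is modelled. */
struct rxpage {
	atomic_int count;
};

/* Models page allocation: the driver holds the initial reference. */
static void rxpage_init(struct rxpage *p)
{
	atomic_init(&p->count, 1);
}

/* Models get_page(): one extra reference per SKB fragment. */
static void rxpage_get(struct rxpage *p)
{
	atomic_fetch_add(&p->count, 1);
}

/* Models the reference drop done when an SKB fragment is freed. */
static void rxpage_put(struct rxpage *p)
{
	int old = atomic_fetch_sub(&p->count, 1);

	assert(old > 0);
	(void)old;
}

/* Models the "page_count(page) > 1" test used before reusing a page. */
static bool rxpage_in_use(struct rxpage *p)
{
	return atomic_load(&p->count) > 1;
}
```

With this model a page stays "in use" until every fragment reference
has been dropped, no matter which context drops it.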

> Acked-by: Nick Piggin <npiggin@suse.de>

Thanks for reviewing.


* NAPI poll behavior in various Intel drivers
From: David Miller @ 2008-01-04 11:40 UTC (permalink / raw)
  To: netdev; +Cc: auke-jan.h.kok


Several Intel networking drivers such as e1000, e1000e
and e100 all do this to exit NAPI polling:

	if ((!tx_cleaned && (work_done == 0)) ||
 	   !netif_running(poll_dev)) {

In the NAPI rework I tried to make them use this instead:

	if ((!tx_cleaned && (work_done < budget)) ||
 	   !netif_running(poll_dev)) {

But that got reverted by:

	commit f7bbb9098315d712351aba7861a8c9fcf6bf0213

	e1000: Fix NAPI state bug when Rx complete
    
	Don't exit polling when we have not yet used our budget, this causes
	the NAPI system to end up with a messed up poll list.
    
	Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
	Signed-off-by: Jeff Garzik <jeff@garzik.org>

I definitely would not have signed off on that :-)

That "tx_cleaned" thing clouds the logic in all of these drivers'
poll routines.

The one necessary precondition is that when work_done < budget
we exit polling and return a value less than budget.

If the ->poll() returns a value less than budget, net_rx_action()
assumes that the device has been removed from the poll() list.

		/* Drivers must not modify the NAPI state if they
		 * consume the entire weight.  In such cases this code
		 * still "owns" the NAPI instance and therefore can
		 * move the instance around on the list at-will.
		 */
		if (unlikely(work == weight))
			list_move_tail(&n->poll_list, list);

The "work_done == 0" test in these drivers is thus wrong.  It
should be "work_done < budget" and the whole tx_cleaned thing needs to
be removed.
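
To make the difference concrete, here is a small userspace model (a
sketch only; the model_* names are made up and are not kernel APIs, and
the "legacy" mode ignores tx_cleaned and netif_running() for brevity).
It counts how many times an instance is polled under each exit rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one NAPI instance. */
struct model_napi {
	int  rx_pending;	/* packets waiting in the RX ring */
	bool scheduled;		/* still on the poll list? */
};

/* Stand-in for the RX clean routine: handle at most budget packets. */
static int model_clean_rx(struct model_napi *n, int budget)
{
	int done;

	assert(budget > 0);
	done = n->rx_pending < budget ? n->rx_pending : budget;
	n->rx_pending -= done;
	return done;
}

/*
 * legacy == true:  exit polling only when work_done == 0.
 * legacy == false: exit polling whenever work_done < budget.
 */
static int model_poll(struct model_napi *n, int budget, bool legacy)
{
	int work_done = model_clean_rx(n, budget);
	bool exit_polling = legacy ? (work_done == 0) : (work_done < budget);

	if (exit_polling)
		n->scheduled = false;	/* models completing NAPI */

	return work_done;
}

/* Poll until the instance leaves the list; return the number of polls. */
static int model_run(struct model_napi *n, int budget, bool legacy)
{
	int polls = 0;

	while (n->scheduled) {
		model_poll(n, budget, legacy);
		polls++;
	}
	return polls;
}
```

With 5 pending packets and a budget of 64, the "work_done == 0" rule
needs two polls in this model, while "work_done < budget" needs one.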

It happens to work, because what happens is that we loop again and
process the same NAPI struct again.

As a result, E1000 devices get polled TWICE every time they
process at least one RX packet, but do not consume the whole
quota.

I smell a performance hack, and if so this is wrong and against
all of the principles of NAPI.  Either that or it's a workaround
for the "!netif_running()" case.

I noticed this while trying to work on a generic fix for the
"->poll() does not exit when device is brought down while being
bombed with packets" bug.


* Re: lockless pagecache Cassini regression
From: Nick Piggin @ 2008-01-04 11:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20080103.193245.15134133.davem@davemloft.net>

On Thu, Jan 03, 2008 at 07:32:45PM -0800, David Miller wrote:
> 
> Nick, I think the following changeset:
> 
>     commit fa4f0774d7c6cccb4d1fda76b91dd8eddcb2dd6a
> 
>     [CASSINI]: dont touch page_count
>     
>     Remove page refcount manipulations from cassini driver by using
>     another field in struct page. Needed for lockless pagecache.
>     
>     Signed-off-by: Nick Piggin <npiggin@suse.de>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> Broke the Cassini driver.
> 
> While it is true that, within the cassini driver, you converted
> the counter bumps and tests accurately, this changeset does not
> account for the page count bumps that are going to occur
> in other contexts when the SKBs being constructed are freed
> up (and thus the skb_frag_struct pages get liberated).

Dang :(

 
> Basically what these drivers do is allocate a page, give it
> to the network card, and the card decides how to chop up the
> page based upon the size of arriving packets.  Therefore we
> don't know ahead of time how many references to the page we
> will generate while attaching bits of the page to receive
> packet SKBs that get built.
> 
> As incoming packets are constructed by the card using chunks of these
> system pages, it indicates in the descriptor which parts of which
> pages were used.  We then use this to attach the page parts onto the
> SKB.  Once the SKB is constructed it is passed up to the networking stack;
> later on the SKB is freed and the page references are dropped by
> kfree_skbmem().
> 
> And it is these page reference counts that the Cassini driver needs to
> test to be correct, not the driver local ones we are now using.
> 
> I do something similar to what Cassini was doing in the NIU driver
> (drivers/net/niu.c).  I think we need to restore Cassini to something
> similar to what NIU is doing.
> 
> The only tricky bit is the page count checks, and frankly those can
> just be transformed into references to compound_head(page)->_count or
> similar.
> 
> Actually, page_count() does exactly that, it is implemented by
> checking atomic_read(&compound_head(page)->_count) and thus I
> think the best thing to do is revert your changes.
> 
> These pages are freshly allocated pages, not stuff in the page
> cache or similar, so I think this is safe and the way to move
> forward.
> 
> Also, these changes added lots of new atomics to the driver.
> It's already doing get_page() etc. as needed.
> 
> What do you think Nick?

Thanks for the detailed explanation as always, Dave. And I hope I didn't
waste too much of your time debugging this...

It is a regression, so it clearly needs to be reverted. The revert would
actually break with the lockless pagecache; however, the lockless pagecache
is not merged anyway, so reverting is a no-brainer at this point.

Just for interest, the lockless pagecache actually makes page->_count unstable
for all pages that _have ever_ been pagecache pages (since the last quiescent
rcu state, anyway). Basically, it looks up and takes a ref on the struct page
without ever having a prior pin or reference on that page. It can do this
because it knows the struct page won't actually get freed. After taking the
ref, it rechecks that it has got the right page...

Anyway, that's the short story but probably gives you an idea of why it
hasn't been merged yet ;) I could avoid that whole hassle by rcu freeing
pagecache pages, but that would add other overheads.
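
Very roughly, and only as a userspace sketch (the spec_* names are
hypothetical, this is not the actual lockless pagecache code, and RCU
is not modelled at all), the lookup has this shape: take a reference
speculatively, then recheck that the slot still points at the same
page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct spec_page {
	atomic_int count;
};

/* One pagecache slot; the pointer may be replaced concurrently. */
struct spec_slot {
	_Atomic(struct spec_page *) page;
};

/* Take a reference unless the count has already dropped to zero. */
static bool spec_get_unless_zero(struct spec_page *p)
{
	int c = atomic_load(&p->count);

	while (c != 0) {
		/* On failure c is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->count, &c, c + 1))
			return true;
	}
	return false;
}

static void spec_put(struct spec_page *p)
{
	int old = atomic_fetch_sub(&p->count, 1);

	assert(old > 0);
	(void)old;
}

/* Speculative lookup: reference first, verify afterwards. */
static struct spec_page *spec_lookup(struct spec_slot *slot)
{
	for (;;) {
		struct spec_page *p = atomic_load(&slot->page);

		if (p == NULL)
			return NULL;
		/* A real implementation would retry here; this sketch
		 * reports a miss so that it cannot spin forever. */
		if (!spec_get_unless_zero(p))
			return NULL;
		if (p == atomic_load(&slot->page))
			return p;	/* still the right page */
		spec_put(p);		/* slot changed: drop the transient ref */
	}
}
```

The transient increment in spec_get_unless_zero() is what makes the
count unstable for anyone else who is watching it.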


> [CASSINI]: Revert 'dont touch page_count'.
> 
> This reverts changeset fa4f0774d7c6cccb4d1fda76b91dd8eddcb2dd6a
> ([CASSINI]: dont touch page_count) because it breaks the driver.
> 
> The local page counting added by this changeset did not account
> for the asynchronous page count changes done by kfree_skb()
> and friends.
> 
> The change adds extra atomics and on top of it all appears to be
> totally unnecessary as well.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>

Acked-by: Nick Piggin <npiggin@suse.de>


> diff --git a/drivers/net/cassini.c b/drivers/net/cassini.c
> index 9030ca5..ff957f2 100644
> --- a/drivers/net/cassini.c
> +++ b/drivers/net/cassini.c
> @@ -336,30 +336,6 @@ static inline void cas_mask_intr(struct cas *cp)
>  		cas_disable_irq(cp, i);
>  }
>  
> -static inline void cas_buffer_init(cas_page_t *cp)
> -{
> -	struct page *page = cp->buffer;
> -	atomic_set((atomic_t *)&page->lru.next, 1);
> -}
> -
> -static inline int cas_buffer_count(cas_page_t *cp)
> -{
> -	struct page *page = cp->buffer;
> -	return atomic_read((atomic_t *)&page->lru.next);
> -}
> -
> -static inline void cas_buffer_inc(cas_page_t *cp)
> -{
> -	struct page *page = cp->buffer;
> -	atomic_inc((atomic_t *)&page->lru.next);
> -}
> -
> -static inline void cas_buffer_dec(cas_page_t *cp)
> -{
> -	struct page *page = cp->buffer;
> -	atomic_dec((atomic_t *)&page->lru.next);
> -}
> -
>  static void cas_enable_irq(struct cas *cp, const int ring)
>  {
>  	if (ring == 0) { /* all but TX_DONE */
> @@ -497,7 +473,6 @@ static int cas_page_free(struct cas *cp, cas_page_t *page)
>  {
>  	pci_unmap_page(cp->pdev, page->dma_addr, cp->page_size,
>  		       PCI_DMA_FROMDEVICE);
> -	cas_buffer_dec(page);
>  	__free_pages(page->buffer, cp->page_order);
>  	kfree(page);
>  	return 0;
> @@ -527,7 +502,6 @@ static cas_page_t *cas_page_alloc(struct cas *cp, const gfp_t flags)
>  	page->buffer = alloc_pages(flags, cp->page_order);
>  	if (!page->buffer)
>  		goto page_err;
> -	cas_buffer_init(page);
>  	page->dma_addr = pci_map_page(cp->pdev, page->buffer, 0,
>  				      cp->page_size, PCI_DMA_FROMDEVICE);
>  	return page;
> @@ -606,7 +580,7 @@ static void cas_spare_recover(struct cas *cp, const gfp_t flags)
>  	list_for_each_safe(elem, tmp, &list) {
>  		cas_page_t *page = list_entry(elem, cas_page_t, list);
>  
> -		if (cas_buffer_count(page) > 1)
> +		if (page_count(page->buffer) > 1) 
>  			continue;
>  
>  		list_del(elem);
> @@ -1374,7 +1348,7 @@ static inline cas_page_t *cas_page_spare(struct cas *cp, const int index)
>  	cas_page_t *page = cp->rx_pages[1][index];
>  	cas_page_t *new;
>  
> -	if (cas_buffer_count(page) == 1)
> +	if (page_count(page->buffer) == 1)
>  		return page;
>  
>  	new = cas_page_dequeue(cp);
> @@ -1394,7 +1368,7 @@ static cas_page_t *cas_page_swap(struct cas *cp, const int ring,
>  	cas_page_t **page1 = cp->rx_pages[1];
>  
>  	/* swap if buffer is in use */
> -	if (cas_buffer_count(page0[index]) > 1) {
> +	if (page_count(page0[index]->buffer) > 1) {
>  		cas_page_t *new = cas_page_spare(cp, index);
>  		if (new) {
>  			page1[index] = page0[index];
> @@ -2066,7 +2040,6 @@ static int cas_rx_process_pkt(struct cas *cp, struct cas_rx_comp *rxc,
>  		skb->len      += hlen - swivel;
>  
>  		get_page(page->buffer);
> -		cas_buffer_inc(page);
>  		frag->page = page->buffer;
>  		frag->page_offset = off;
>  		frag->size = hlen - swivel;
> @@ -2091,7 +2064,6 @@ static int cas_rx_process_pkt(struct cas *cp, struct cas_rx_comp *rxc,
>  			frag++;
>  
>  			get_page(page->buffer);
> -			cas_buffer_inc(page);
>  			frag->page = page->buffer;
>  			frag->page_offset = 0;
>  			frag->size = hlen;
> @@ -2255,7 +2227,7 @@ static int cas_post_rxds_ringN(struct cas *cp, int ring, int num)
>  	released = 0;
>  	while (entry != last) {
>  		/* make a new buffer if it's still in use */
> -		if (cas_buffer_count(page[entry]) > 1) {
> +		if (page_count(page[entry]->buffer) > 1) {
>  			cas_page_t *new = cas_page_dequeue(cp);
>  			if (!new) {
>  				/* let the timer know that we need to


* [patch 8/9][NETNS][IPV6] make sysctls route per namespace
From: Daniel Lezcano @ 2008-01-04 11:12 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20080104111226.776105484@localhost.localdomain>

[-- Attachment #1: sysctl/move-sysctl-route-to-netns.patch --]
[-- Type: text/plain, Size: 11055 bytes --]

All the sysctls concerning routes are moved to the network namespace
structure. A helper function is called to initialize the variables.

Because the IPv6 protocol is not yet per-namespace, the variables are
accessed through the initial network namespace (init_net).
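
The general pattern can be sketched in userspace as follows (an
illustration only; the rtctl_* names are hypothetical, only two of the
variables are shown, and the defaults and the pointer fix-up are merged
into one function here, whereas the patch does them in two places):
duplicate a template table, then repoint each .data at the fields of
the namespace being set up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-namespace storage for two route sysctls. */
struct rtctl_vars {
	int max_size;
	int gc_elasticity;
};

struct rtctl_net {
	struct rtctl_vars sysctl;
};

/* Hypothetical, much reduced table entry. */
struct rtctl_entry {
	const char *procname;
	int *data;
};

static struct rtctl_net rtctl_init_net;

/* The template points at the initial namespace, as in the patch. */
static const struct rtctl_entry rtctl_template[] = {
	{ "max_size",      &rtctl_init_net.sysctl.max_size },
	{ "gc_elasticity", &rtctl_init_net.sysctl.gc_elasticity },
};

/* Set defaults, duplicate the template, repoint .data at this namespace. */
static struct rtctl_entry *rtctl_table_init(struct rtctl_net *net)
{
	struct rtctl_entry *table;

	assert(net != NULL);
	net->sysctl.max_size = 4096;
	net->sysctl.gc_elasticity = 9;

	table = malloc(sizeof(rtctl_template));
	if (!table)
		return NULL;
	memcpy(table, rtctl_template, sizeof(rtctl_template));

	table[0].data = &net->sysctl.max_size;
	table[1].data = &net->sysctl.gc_elasticity;
	return table;
}
```

A write through one namespace's table then changes only that
namespace's copy of the variable.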

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 include/net/ip6_route.h    |    2 -
 include/net/netns/ipv6.h   |    8 ++++++
 net/ipv6/ip6_fib.c         |   14 ++++++----
 net/ipv6/route.c           |   58 ++++++++++++++++++++++-----------------------
 net/ipv6/sysctl_net_ipv6.c |    9 ++++++
 5 files changed, 55 insertions(+), 36 deletions(-)

Index: net-2.6.25/include/net/netns/ipv6.h
===================================================================
--- net-2.6.25.orig/include/net/netns/ipv6.h
+++ net-2.6.25/include/net/netns/ipv6.h
@@ -13,6 +13,14 @@ struct netns_sysctl_ipv6 {
 	struct ctl_table_header *table;
    	struct inet_frags_ctl frags;
  	int bindv6only;
+  	int flush_delay;
+  	int ip6_rt_max_size;
+  	int ip6_rt_gc_min_interval;
+  	int ip6_rt_gc_timeout;
+  	int ip6_rt_gc_interval;
+  	int ip6_rt_gc_elasticity;
+  	int ip6_rt_mtu_expires;
+  	int ip6_rt_min_advmss;
 };
 
 struct netns_ipv6 {
Index: net-2.6.25/net/ipv6/route.c
===================================================================
--- net-2.6.25.orig/net/ipv6/route.c
+++ net-2.6.25/net/ipv6/route.c
@@ -73,14 +73,6 @@
 
 #define CLONE_OFFLINK_ROUTE 0
 
-static int ip6_rt_max_size = 4096;
-static int ip6_rt_gc_min_interval = HZ / 2;
-static int ip6_rt_gc_timeout = 60*HZ;
-int ip6_rt_gc_interval = 30*HZ;
-static int ip6_rt_gc_elasticity = 9;
-static int ip6_rt_mtu_expires = 10*60*HZ;
-static int ip6_rt_min_advmss = IPV6_MIN_MTU - 20 - 40;
-
 static struct rt6_info * ip6_rt_copy(struct rt6_info *ort);
 static struct dst_entry	*ip6_dst_check(struct dst_entry *dst, u32 cookie);
 static struct dst_entry *ip6_negative_advice(struct dst_entry *);
@@ -889,8 +881,8 @@ static inline unsigned int ipv6_advmss(u
 {
 	mtu -= sizeof(struct ipv6hdr) + sizeof(struct tcphdr);
 
-	if (mtu < ip6_rt_min_advmss)
-		mtu = ip6_rt_min_advmss;
+	if (mtu < init_net.ipv6.sysctl.ip6_rt_min_advmss)
+		mtu = init_net.ipv6.sysctl.ip6_rt_min_advmss;
 
 	/*
 	 * Maximal non-jumbo IPv6 payload is IPV6_MAXPLEN and
@@ -990,19 +982,19 @@ static int ip6_dst_gc(void)
 	static unsigned long last_gc;
 	unsigned long now = jiffies;
 
-	if (time_after(last_gc + ip6_rt_gc_min_interval, now) &&
-	    atomic_read(&ip6_dst_ops.entries) <= ip6_rt_max_size)
+	if (time_after(last_gc + init_net.ipv6.sysctl.ip6_rt_gc_min_interval, now) &&
+	    atomic_read(&ip6_dst_ops.entries) <= init_net.ipv6.sysctl.ip6_rt_max_size)
 		goto out;
 
 	expire++;
 	fib6_run_gc(expire);
 	last_gc = now;
 	if (atomic_read(&ip6_dst_ops.entries) < ip6_dst_ops.gc_thresh)
-		expire = ip6_rt_gc_timeout>>1;
+		expire = init_net.ipv6.sysctl.ip6_rt_gc_timeout>>1;
 
 out:
-	expire -= expire>>ip6_rt_gc_elasticity;
-	return (atomic_read(&ip6_dst_ops.entries) > ip6_rt_max_size);
+	expire -= expire>>init_net.ipv6.sysctl.ip6_rt_gc_elasticity;
+	return (atomic_read(&ip6_dst_ops.entries) > init_net.ipv6.sysctl.ip6_rt_max_size);
 }
 
 /* Clean host part of a prefix. Not necessary in radix tree,
@@ -1508,7 +1500,7 @@ void rt6_pmtu_discovery(struct in6_addr 
 		rt->u.dst.metrics[RTAX_MTU-1] = pmtu;
 		if (allfrag)
 			rt->u.dst.metrics[RTAX_FEATURES-1] |= RTAX_FEATURE_ALLFRAG;
-		dst_set_expires(&rt->u.dst, ip6_rt_mtu_expires);
+		dst_set_expires(&rt->u.dst, init_net.ipv6.sysctl.ip6_rt_mtu_expires);
 		rt->rt6i_flags |= RTF_MODIFIED|RTF_EXPIRES;
 		goto out;
 	}
@@ -1534,7 +1526,7 @@ void rt6_pmtu_discovery(struct in6_addr 
 		 * which is 10 mins. After 10 mins the decreased pmtu is expired
 		 * and detecting PMTU increase will be automatically happened.
 		 */
-		dst_set_expires(&nrt->u.dst, ip6_rt_mtu_expires);
+		dst_set_expires(&nrt->u.dst, init_net.ipv6.sysctl.ip6_rt_mtu_expires);
 		nrt->rt6i_flags |= RTF_DYNAMIC|RTF_EXPIRES;
 
 		ip6_ins_rt(nrt);
@@ -2390,15 +2382,14 @@ static inline void ipv6_route_proc_fini(
 
 #ifdef CONFIG_SYSCTL
 
-static int flush_delay;
-
 static
 int ipv6_sysctl_rtcache_flush(ctl_table *ctl, int write, struct file * filp,
 			      void __user *buffer, size_t *lenp, loff_t *ppos)
 {
+	int delay = init_net.ipv6.sysctl.flush_delay;
 	if (write) {
 		proc_dointvec(ctl, write, filp, buffer, lenp, ppos);
-		fib6_run_gc(flush_delay <= 0 ? ~0UL : (unsigned long)flush_delay);
+		fib6_run_gc(delay <= 0 ? ~0UL : (unsigned long)delay);
 		return 0;
 	} else
 		return -EINVAL;
@@ -2407,7 +2398,7 @@ int ipv6_sysctl_rtcache_flush(ctl_table 
 ctl_table ipv6_route_table_template[] = {
 	{
 		.procname	=	"flush",
-		.data		=	&flush_delay,
+		.data		=	&init_net.ipv6.sysctl.flush_delay,
 		.maxlen		=	sizeof(int),
 		.mode		=	0200,
 		.proc_handler	=	&ipv6_sysctl_rtcache_flush
@@ -2423,7 +2414,7 @@ ctl_table ipv6_route_table_template[] = 
 	{
 		.ctl_name	=	NET_IPV6_ROUTE_MAX_SIZE,
 		.procname	=	"max_size",
-		.data		=	&ip6_rt_max_size,
+		.data		=	&init_net.ipv6.sysctl.ip6_rt_max_size,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
 		.proc_handler	=	&proc_dointvec,
@@ -2431,7 +2422,7 @@ ctl_table ipv6_route_table_template[] = 
 	{
 		.ctl_name	=	NET_IPV6_ROUTE_GC_MIN_INTERVAL,
 		.procname	=	"gc_min_interval",
-		.data		=	&ip6_rt_gc_min_interval,
+		.data		=	&init_net.ipv6.sysctl.ip6_rt_gc_min_interval,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
 		.proc_handler	=	&proc_dointvec_jiffies,
@@ -2440,7 +2431,7 @@ ctl_table ipv6_route_table_template[] = 
 	{
 		.ctl_name	=	NET_IPV6_ROUTE_GC_TIMEOUT,
 		.procname	=	"gc_timeout",
-		.data		=	&ip6_rt_gc_timeout,
+		.data		=	&init_net.ipv6.sysctl.ip6_rt_gc_timeout,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
 		.proc_handler	=	&proc_dointvec_jiffies,
@@ -2449,7 +2440,7 @@ ctl_table ipv6_route_table_template[] = 
 	{
 		.ctl_name	=	NET_IPV6_ROUTE_GC_INTERVAL,
 		.procname	=	"gc_interval",
-		.data		=	&ip6_rt_gc_interval,
+		.data		=	&init_net.ipv6.sysctl.ip6_rt_gc_interval,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
 		.proc_handler	=	&proc_dointvec_jiffies,
@@ -2458,7 +2449,7 @@ ctl_table ipv6_route_table_template[] = 
 	{
 		.ctl_name	=	NET_IPV6_ROUTE_GC_ELASTICITY,
 		.procname	=	"gc_elasticity",
-		.data		=	&ip6_rt_gc_elasticity,
+		.data		=	&init_net.ipv6.sysctl.ip6_rt_gc_elasticity,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
 		.proc_handler	=	&proc_dointvec_jiffies,
@@ -2467,7 +2458,7 @@ ctl_table ipv6_route_table_template[] = 
 	{
 		.ctl_name	=	NET_IPV6_ROUTE_MTU_EXPIRES,
 		.procname	=	"mtu_expires",
-		.data		=	&ip6_rt_mtu_expires,
+		.data		=	&init_net.ipv6.sysctl.ip6_rt_mtu_expires,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
 		.proc_handler	=	&proc_dointvec_jiffies,
@@ -2476,7 +2467,7 @@ ctl_table ipv6_route_table_template[] = 
 	{
 		.ctl_name	=	NET_IPV6_ROUTE_MIN_ADVMSS,
 		.procname	=	"min_adv_mss",
-		.data		=	&ip6_rt_min_advmss,
+		.data		=	&init_net.ipv6.sysctl.ip6_rt_min_advmss,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
 		.proc_handler	=	&proc_dointvec_jiffies,
@@ -2485,7 +2476,7 @@ ctl_table ipv6_route_table_template[] = 
 	{
 		.ctl_name	=	NET_IPV6_ROUTE_GC_MIN_INTERVAL_MS,
 		.procname	=	"gc_min_interval_ms",
-		.data		=	&ip6_rt_gc_min_interval,
+		.data		=	&init_net.ipv6.sysctl.ip6_rt_gc_min_interval,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
 		.proc_handler	=	&proc_dointvec_ms_jiffies,
@@ -2498,6 +2489,15 @@ struct ctl_table *ipv6_route_sysctl_init
 {
 	struct ctl_table *table;
 
+	net->ipv6.sysctl.flush_delay = 0;
+	net->ipv6.sysctl.ip6_rt_max_size = 4096;
+	net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2;
+	net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ;
+	net->ipv6.sysctl.ip6_rt_gc_interval = 30*HZ;
+	net->ipv6.sysctl.ip6_rt_gc_elasticity = 9;
+	net->ipv6.sysctl.ip6_rt_mtu_expires = 10*60*HZ;
+	net->ipv6.sysctl.ip6_rt_min_advmss = IPV6_MIN_MTU - 20 - 40;
+
    	table = kmemdup(ipv6_route_table_template,
 			sizeof(ipv6_route_table_template),
 			GFP_KERNEL);
Index: net-2.6.25/net/ipv6/sysctl_net_ipv6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/sysctl_net_ipv6.c
+++ net-2.6.25/net/ipv6/sysctl_net_ipv6.c
@@ -113,7 +113,16 @@ static int ipv6_sysctl_net_init(struct n
    	if (!ipv6_icmp_table)
    		goto out_ipv6_route_table;
 
+   	ipv6_route_table[0].data = &net->ipv6.sysctl.flush_delay;
+   	ipv6_route_table[2].data = &net->ipv6.sysctl.ip6_rt_max_size;
+   	ipv6_route_table[3].data = &net->ipv6.sysctl.ip6_rt_gc_min_interval;
+   	ipv6_route_table[4].data = &net->ipv6.sysctl.ip6_rt_gc_timeout;
+   	ipv6_route_table[5].data = &net->ipv6.sysctl.ip6_rt_gc_interval;
+   	ipv6_route_table[6].data = &net->ipv6.sysctl.ip6_rt_gc_elasticity;
+   	ipv6_route_table[7].data = &net->ipv6.sysctl.ip6_rt_mtu_expires;
+   	ipv6_route_table[8].data = &net->ipv6.sysctl.ip6_rt_min_advmss;
    	ipv6_table[0].child = ipv6_route_table;
+
    	ipv6_table[1].child = ipv6_icmp_table;
 
   	ipv6_table[2].data = &net->ipv6.sysctl.bindv6only;
Index: net-2.6.25/include/net/ip6_route.h
===================================================================
--- net-2.6.25.orig/include/net/ip6_route.h
+++ net-2.6.25/include/net/ip6_route.h
@@ -43,8 +43,6 @@ extern struct rt6_info	ip6_prohibit_entr
 extern struct rt6_info	ip6_blk_hole_entry;
 #endif
 
-extern int ip6_rt_gc_interval;
-
 extern void			ip6_route_input(struct sk_buff *skb);
 
 extern struct dst_entry *	ip6_route_output(struct sock *sk,
Index: net-2.6.25/net/ipv6/ip6_fib.c
===================================================================
--- net-2.6.25.orig/net/ipv6/ip6_fib.c
+++ net-2.6.25/net/ipv6/ip6_fib.c
@@ -681,13 +681,15 @@ static __inline__ void fib6_start_gc(str
 {
 	if (ip6_fib_timer.expires == 0 &&
 	    (rt->rt6i_flags & (RTF_EXPIRES|RTF_CACHE)))
-		mod_timer(&ip6_fib_timer, jiffies + ip6_rt_gc_interval);
+		mod_timer(&ip6_fib_timer, jiffies +
+			  init_net.ipv6.sysctl.ip6_rt_gc_interval);
 }
 
 void fib6_force_start_gc(void)
 {
 	if (ip6_fib_timer.expires == 0)
-		mod_timer(&ip6_fib_timer, jiffies + ip6_rt_gc_interval);
+		mod_timer(&ip6_fib_timer, jiffies +
+			  init_net.ipv6.sysctl.ip6_rt_gc_interval);
 }
 
 /*
@@ -1447,7 +1449,8 @@ void fib6_run_gc(unsigned long dummy)
 {
 	if (dummy != ~0UL) {
 		spin_lock_bh(&fib6_gc_lock);
-		gc_args.timeout = dummy ? (int)dummy : ip6_rt_gc_interval;
+		gc_args.timeout = dummy ? (int)dummy :
+			init_net.ipv6.sysctl.ip6_rt_gc_interval;
 	} else {
 		local_bh_disable();
 		if (!spin_trylock(&fib6_gc_lock)) {
@@ -1455,7 +1458,7 @@ void fib6_run_gc(unsigned long dummy)
 			local_bh_enable();
 			return;
 		}
-		gc_args.timeout = ip6_rt_gc_interval;
+		gc_args.timeout = init_net.ipv6.sysctl.ip6_rt_gc_interval;
 	}
 	gc_args.more = 0;
 
@@ -1463,7 +1466,8 @@ void fib6_run_gc(unsigned long dummy)
 	fib6_clean_all(fib6_age, 0, NULL);
 
 	if (gc_args.more)
-		mod_timer(&ip6_fib_timer, jiffies + ip6_rt_gc_interval);
+		mod_timer(&ip6_fib_timer, jiffies +
+			  init_net.ipv6.sysctl.ip6_rt_gc_interval);
 	else {
 		del_timer(&ip6_fib_timer);
 		ip6_fib_timer.expires = 0;

-- 


* [patch 7/9][NETNS][IPV6] make mld_max_msf readonly in other namespaces
From: Daniel Lezcano @ 2008-01-04 11:12 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20080104111226.776105484@localhost.localdomain>

[-- Attachment #1: make-mld_max_msf-readonly.patch --]
[-- Type: text/plain, Size: 1114 bytes --]

mld_max_msf protects the system by capping the number of multicast
source filters. Making this variable per-namespace could be a problem:
if someone inside a namespace sets it to a large value, that
will impact the whole system, including other namespaces.

I don't see any benefit in making it per-namespace for now, so in order
to keep a directory entry in a newly created namespace, I make it
read-only when we are not in the initial network namespace.
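
In isolation the idea looks like this (a userspace sketch; the msf_*
names are hypothetical, and the 0644 mode used in the usage note is
an assumption, not something taken from this patch):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net and a sysctl table entry. */
struct msf_net {
	int id;
};

struct msf_entry {
	const char *procname;
	unsigned int mode;
};

static struct msf_net msf_init_net;

/* Leave the entry alone in the initial namespace, read-only elsewhere. */
static void msf_fixup_mode(struct msf_entry *e, const struct msf_net *net)
{
	assert(e != NULL);
	if (net != &msf_init_net)
		e->mode = 0444;
}
```

An entry starting at mode 0644 keeps that mode for the initial
namespace and becomes 0444 for any other namespace.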

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 net/ipv6/sysctl_net_ipv6.c |    3 +++
 1 file changed, 3 insertions(+)

Index: net-2.6.25/net/ipv6/sysctl_net_ipv6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/sysctl_net_ipv6.c
+++ net-2.6.25/net/ipv6/sysctl_net_ipv6.c
@@ -122,6 +122,9 @@ static int ipv6_sysctl_net_init(struct n
     	ipv6_table[5].data = &net->ipv6.sysctl.frags.timeout;
   	ipv6_table[6].data = &net->ipv6.sysctl.frags.secret_interval;
 
+	if (net != &init_net)
+		ipv6_table[7].mode = 0444;
+
 	ipv6_frag_sysctl_init(net);
 
 	net->ipv6.sysctl.bindv6only = 0;

-- 


