Netdev List

* Re: [PATCH 2.6.25 3/9] SCTP: Add the handling of "Set Primary IP Address" parameter to INIT
From: David Miller @ 2007-12-20 22:10 UTC
  To: vladislav.yasevich; +Cc: netdev, lksctp-developers
In-Reply-To: <1197927169-5106-4-git-send-email-vladislav.yasevich@hp.com>

From: Vlad Yasevich <vladislav.yasevich@hp.com>
Date: Mon, 17 Dec 2007 16:32:43 -0500

> The ADD-IP "Set Primary IP Address" parameter is allowed in the
> INIT/INIT-ACK exchange.  Allow processing of this parameter during
> the INIT/INIT-ACK.
> 
> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

Applied.
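
Accepting this parameter during INIT/INIT-ACK means recognizing it in the list of type-length-value parameters that follows the fixed INIT header. Each parameter is padded to a 4-byte boundary, and Set Primary Address is type 0xC004 in RFC 5061. A minimal sketch of locating one in a raw parameter list; find_set_primary() is hypothetical and not an lksctp function.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 5061 parameter type for "Set Primary Address". */
#define SCTP_PARAM_SET_PRIMARY 0xC004

/* Read a 16-bit big-endian value from a byte buffer. */
static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: scan the variable-length parameters of an
 * INIT/INIT-ACK chunk (buf points just past the fixed header) and
 * return the offset of the first Set Primary Address parameter,
 * or -1 if none is present or the list is malformed.
 */
static long find_set_primary(const uint8_t *buf, size_t len)
{
	size_t off = 0;

	while (off + 4 <= len) {
		uint16_t type = get_be16(buf + off);
		uint16_t plen = get_be16(buf + off + 2);

		/* Length covers the 4-byte header but not the padding. */
		if (plen < 4 || off + plen > len)
			return -1;
		if (type == SCTP_PARAM_SET_PRIMARY)
			return (long)off;
		/* Advance to the next 4-byte aligned parameter. */
		off += ((size_t)plen + 3) & ~(size_t)3;
	}
	return -1;
}
```

A real implementation would also validate the address parameter carried inside before acting on it; that step is omitted here.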

* Re: [PATCH 2.6.25 2/9] SCTP: Handle the wildcard ADD-IP Address parameter
From: David Miller @ 2007-12-20 22:09 UTC
  To: vladislav.yasevich; +Cc: netdev, lksctp-developers
In-Reply-To: <1197927169-5106-3-git-send-email-vladislav.yasevich@hp.com>

From: Vlad Yasevich <vladislav.yasevich@hp.com>
Date: Mon, 17 Dec 2007 16:32:42 -0500

> The Address Parameter in the parameter list of the ASCONF chunk
> may be a wildcard address.  In this case special processing
> is required.  For the 'add' case, the source IP of the packet is
> added.  In the 'del' case, all addresses except the source IP
> of the packet are removed. In the "mark primary" case, the source
> address is marked as primary.
> 
> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

Applied.
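
The three wildcard cases described above can be sketched as a small state update. This is an illustrative, IPv4-only model with hypothetical names (struct peer_addrs, apply_asconf()), not the lksctp code; the wildcard is taken to be 0.0.0.0, and refusing a non-wildcard delete of the packet's own source address is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PEER_ADDRS 8

/* Hypothetical, IPv4-only model of an association's peer address list. */
struct peer_addrs {
	uint32_t addr[MAX_PEER_ADDRS];
	size_t count;
	uint32_t primary;
};

enum asconf_op { ASCONF_ADD, ASCONF_DEL, ASCONF_SET_PRIMARY };

static bool has_addr(const struct peer_addrs *p, uint32_t a)
{
	for (size_t i = 0; i < p->count; i++)
		if (p->addr[i] == a)
			return true;
	return false;
}

/*
 * Apply one ASCONF parameter.  A wildcard (0.0.0.0) parameter address
 * is resolved against the packet's source address.  Returns false if
 * the request cannot be applied.
 */
static bool apply_asconf(struct peer_addrs *p, enum asconf_op op,
			 uint32_t param_addr, uint32_t src_addr)
{
	bool wildcard = (param_addr == 0);
	uint32_t a = wildcard ? src_addr : param_addr;

	switch (op) {
	case ASCONF_ADD:
		/* Wildcard add: the packet's source address is added. */
		if (has_addr(p, a))
			return true;
		if (p->count == MAX_PEER_ADDRS)
			return false;
		p->addr[p->count++] = a;
		return true;
	case ASCONF_DEL:
		if (wildcard) {
			/* Wildcard delete: keep only the packet's source. */
			if (!has_addr(p, src_addr))
				return false;
			p->addr[0] = src_addr;
			p->count = 1;
			p->primary = src_addr;
			return true;
		}
		/* Sketch assumption: never delete the request's source. */
		if (a == src_addr)
			return false;
		for (size_t i = 0; i < p->count; i++) {
			if (p->addr[i] == a) {
				p->addr[i] = p->addr[--p->count];
				if (p->primary == a)
					p->primary = src_addr;
				return true;
			}
		}
		return false;
	case ASCONF_SET_PRIMARY:
		/* Wildcard set-primary: the source becomes primary. */
		if (!has_addr(p, a))
			return false;
		p->primary = a;
		return true;
	}
	return false;
}
```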

* Re: [PATCH 2.6.25 1/9] SCTP: Discard unauthenticated ASCONF and ASCONF ACK chunks
From: David Miller @ 2007-12-20 22:08 UTC
  To: vladislav.yasevich; +Cc: netdev, lksctp-developers
In-Reply-To: <1197927169-5106-2-git-send-email-vladislav.yasevich@hp.com>

From: Vlad Yasevich <vladislav.yasevich@hp.com>
Date: Mon, 17 Dec 2007 16:32:41 -0500

> Now that we support AUTH, discard unauthenticated ASCONF and ASCONF ACK
> chunks as mandated in the ADD-IP spec.
> 
> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

Applied.
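
The rule amounts to a receive-side filter. A minimal sketch, assuming the caller already knows whether the packet was covered by a valid AUTH chunk; sctp_should_discard() is hypothetical, while the chunk type values 0xC1 (ASCONF) and 0x80 (ASCONF-ACK) are the ones assigned by RFC 5061.

```c
#include <stdbool.h>
#include <stdint.h>

/* Chunk type values assigned by RFC 5061. */
#define SCTP_CID_ASCONF     0xC1
#define SCTP_CID_ASCONF_ACK 0x80

/*
 * Hypothetical receive-path filter: ASCONF and ASCONF-ACK chunks are
 * only processed when they arrived authenticated; otherwise they are
 * dropped.  All other chunk types are left to the normal path.
 */
static bool sctp_should_discard(uint8_t chunk_type, bool authenticated)
{
	if (chunk_type == SCTP_CID_ASCONF ||
	    chunk_type == SCTP_CID_ASCONF_ACK)
		return !authenticated;
	return false;
}
```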

* Re: [PATCH] net/ipv4/: Spelling fixes
From: David Miller @ 2007-12-20 22:05 UTC
  To: joe
  Cc: linux-kernel, akpm, pekkas, kuznet, yoshfuji, jmorris, kaber,
	coreteam, netdev, netfilter-devel, netfilter
In-Reply-To: <1197920439-5455-25-git-send-email-joe@perches.com>

From: Joe Perches <joe@perches.com>
Date: Mon, 17 Dec 2007 11:40:31 -0800

> Signed-off-by: Joe Perches <joe@perches.com>

Applied.

* Re: [PATCH] net/netfilter/: Spelling fixes
From: David Miller @ 2007-12-20 22:04 UTC
  To: joe; +Cc: linux-kernel, akpm, kaber, coreteam, netdev, netfilter-devel,
	netfilter
In-Reply-To: <1197920439-5455-28-git-send-email-joe@perches.com>

From: Joe Perches <joe@perches.com>
Date: Mon, 17 Dec 2007 11:40:34 -0800

> Signed-off-by: Joe Perches <joe@perches.com>

Applied.

* Re: [PATCH] net/sctp/: Spelling fixes
From: David Miller @ 2007-12-20 22:04 UTC
  To: joe; +Cc: linux-kernel, akpm, sri, vladislav.yasevich, lksctp-developers,
	netdev
In-Reply-To: <1197920439-5455-31-git-send-email-joe@perches.com>

From: Joe Perches <joe@perches.com>
Date: Mon, 17 Dec 2007 11:40:37 -0800

> Signed-off-by: Joe Perches <joe@perches.com>

Applied.

* Re: [PATCH] net/netlabel/: Spelling fixes
From: David Miller @ 2007-12-20 22:03 UTC
  To: joe; +Cc: linux-kernel, akpm, paul.moore, netdev
In-Reply-To: <1197920439-5455-29-git-send-email-joe@perches.com>

From: Joe Perches <joe@perches.com>
Date: Mon, 17 Dec 2007 11:40:35 -0800

> Signed-off-by: Joe Perches <joe@perches.com>

Applied, thanks.

* Re: [PATCH] net/sched/: Spelling fixes
From: David Miller @ 2007-12-20 22:02 UTC
  To: joe; +Cc: linux-kernel, akpm, hadi, netdev
In-Reply-To: <1197920439-5455-30-git-send-email-joe@perches.com>

From: Joe Perches <joe@perches.com>
Date: Mon, 17 Dec 2007 11:40:36 -0800

> 
> Signed-off-by: Joe Perches <joe@perches.com>

Applied, thanks.

* Re: [PATCH] net/core/: Spelling fixes
From: David Miller @ 2007-12-20 22:02 UTC
  To: joe; +Cc: linux-kernel, akpm, netdev
In-Reply-To: <1197920439-5455-23-git-send-email-joe@perches.com>

From: Joe Perches <joe@perches.com>
Date: Mon, 17 Dec 2007 11:40:29 -0800

> 
> Signed-off-by: Joe Perches <joe@perches.com>

Applied.

* Re: [PATCH] net/ipv6/: Spelling fixes
From: David Miller @ 2007-12-20 22:01 UTC
  To: joe; +Cc: linux-kernel, akpm, pekkas, kuznet, yoshfuji, jmorris, kaber,
	netdev
In-Reply-To: <1197920439-5455-26-git-send-email-joe@perches.com>

From: Joe Perches <joe@perches.com>
Date: Mon, 17 Dec 2007 11:40:32 -0800

> Signed-off-by: Joe Perches <joe@perches.com>

Applied.

* Re: [PATCH] net/irda/: Spelling fixes
From: David Miller @ 2007-12-20 22:01 UTC
  To: joe; +Cc: linux-kernel, akpm, samuel, netdev
In-Reply-To: <1197920439-5455-27-git-send-email-joe@perches.com>

From: Joe Perches <joe@perches.com>
Date: Mon, 17 Dec 2007 11:40:33 -0800

> Signed-off-by: Joe Perches <joe@perches.com>

Applied.

* Re: [PATCH] net/dccp/: Spelling fixes
From: David Miller @ 2007-12-20 21:59 UTC
  To: joe; +Cc: linux-kernel, akpm, acme, dccp, netdev
In-Reply-To: <1197920439-5455-24-git-send-email-joe@perches.com>

From: Joe Perches <joe@perches.com>
Date: Mon, 17 Dec 2007 11:40:30 -0800

> Signed-off-by: Joe Perches <joe@perches.com>

Applied.

* Re: [PATCH] include/net/: Spelling fixes
From: David Miller @ 2007-12-20 21:56 UTC
  To: joe
  Cc: linux-kernel, akpm, pekkas, kuznet, yoshfuji, jmorris, kaber,
	ralf, samuel, sri, vladislav.yasevich, linux-hams,
	lksctp-developers, netdev
In-Reply-To: <1197920439-5455-19-git-send-email-joe@perches.com>

From: Joe Perches <joe@perches.com>
Date: Mon, 17 Dec 2007 11:40:25 -0800

> Signed-off-by: Joe Perches <joe@perches.com>

Applied.

* Re: [IPSEC]: Rename tunnel-mode functions to avoid collisions with tunnels
From: David Miller @ 2007-12-20 21:54 UTC
  To: herbert; +Cc: netdev
In-Reply-To: <20071219063833.GA7041@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 19 Dec 2007 14:38:33 +0800

> [IPSEC]: Rename tunnel-mode functions to avoid collisions with tunnels
> 
> It appears that I've managed to create two different functions both
> called xfrm6_tunnel_output.  This is because we have the plain tunnel
> encapsulation named xfrmX_tunnel as well as the tunnel-mode encapsulation
> which lives in the files xfrmX_mode_tunnel.c.
> 
> This patch renames functions from the latter to use the xfrmX_mode_tunnel
> prefix to avoid name-space conflicts.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, thanks Herbert.

* [PATCH 2/3] XFRM: RFC4303 compliant auditing
From: Paul Moore @ 2007-12-20 21:42 UTC
  To: netdev, linux-audit; +Cc: latten
In-Reply-To: <20071220214200.12122.89628.stgit@flek.lan>

This patch adds a number of new IPsec audit events to meet the auditing
requirements of RFC4303.  This includes audit hooks for the following events:

 * Could not find a valid SA [sections 2.1, 3.4.2]
   . xfrm_audit_state_notfound()
   . xfrm_audit_state_notfound_simple()

 * Sequence number overflow [section 3.3.3]
   . xfrm_audit_state_replay_overflow()

 * Replayed packet [section 3.4.3]
   . xfrm_audit_state_replay()

 * Integrity check failure [sections 3.4.4.1, 3.4.4.2]
   . xfrm_audit_state_icvfail()
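
For the replayed-packet case, the receiver keeps a sliding window of recently seen sequence numbers. The sketch below is a standalone model of the checks in xfrm_replay_check() from this patch (sequence number 0 rejected, packets older than the window rejected, already-seen packets rejected), plus a hypothetical replay_advance() that marks a packet as seen. struct replay_state is a simplification, not the kernel's struct xfrm_replay_state, and the window is capped at the 32 bits of the bitmap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receiver-side replay state with a 32-bit bitmap. */
struct replay_state {
	uint32_t last;    /* highest sequence number accepted so far */
	uint32_t bitmap;  /* bit n set => (last - n) already seen */
	uint32_t window;  /* configured window size */
};

/* Returns true if seq must be dropped (the point where an audit fires). */
static bool replay_check(const struct replay_state *r, uint32_t seq)
{
	uint32_t win = r->window < 32 ? r->window : 32;
	uint32_t diff;

	if (seq == 0)
		return true;             /* 0 is never a valid sequence number */
	if (seq > r->last)
		return false;            /* newer than anything seen so far */
	diff = r->last - seq;
	if (diff >= win)
		return true;             /* too old: left of the window */
	return (r->bitmap >> diff) & 1u; /* already seen => replay */
}

/* Record seq as received; call only after the check (and ICV) passed. */
static void replay_advance(struct replay_state *r, uint32_t seq)
{
	if (seq > r->last) {
		uint32_t shift = seq - r->last;

		/* Slide the window; a large jump clears all history. */
		r->bitmap = shift < 32 ? (r->bitmap << shift) | 1u : 1u;
		r->last = seq;
	} else {
		r->bitmap |= 1u << (r->last - seq);
	}
}
```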

While RFC4303 deals only with ESP, most of the changes in this patch apply to
IPsec in general, i.e. both AH and ESP.  In the one case where ESP-specific
code had to be modified (integrity check failure), the same change was made
to the AH code for the sake of consistency.

Signed-off-by: Paul Moore <paul.moore@hp.com>
---

 include/net/xfrm.h     |   33 ++++++++--
 net/ipv4/ah4.c         |    4 +
 net/ipv4/esp4.c        |    1 
 net/ipv6/ah6.c         |    2 -
 net/ipv6/esp6.c        |    1 
 net/ipv6/xfrm6_input.c |    4 +
 net/xfrm/xfrm_input.c  |    6 +-
 net/xfrm/xfrm_output.c |    2 +
 net/xfrm/xfrm_policy.c |   14 ++--
 net/xfrm/xfrm_state.c  |  153 +++++++++++++++++++++++++++++++++++++++++++-----
 10 files changed, 184 insertions(+), 36 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index ac6cf09..941d5cd 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -548,26 +548,33 @@ struct xfrm_audit
 };
 
 #ifdef CONFIG_AUDITSYSCALL
-static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 secid)
+static inline struct audit_buffer *xfrm_audit_start(const char *op)
 {
 	struct audit_buffer *audit_buf = NULL;
-	char *secctx;
-	u32 secctx_len;
 
+	if (audit_enabled == 0)
+		return NULL;
 	audit_buf = audit_log_start(current->audit_context, GFP_ATOMIC,
-			      AUDIT_MAC_IPSEC_EVENT);
+				    AUDIT_MAC_IPSEC_EVENT);
 	if (audit_buf == NULL)
 		return NULL;
+	audit_log_format(audit_buf, "op=%s", op);
+	return audit_buf;
+}
 
-	audit_log_format(audit_buf, "auid=%u", auid);
+static inline void xfrm_audit_helper_usrinfo(u32 auid, u32 secid,
+					     struct audit_buffer *audit_buf)
+{
+	char *secctx;
+	u32 secctx_len;
 
+	audit_log_format(audit_buf, " auid=%u", auid);
 	if (secid != 0 &&
 	    security_secid_to_secctx(secid, &secctx, &secctx_len) == 0) {
 		audit_log_format(audit_buf, " subj=%s", secctx);
 		security_release_secctx(secctx, secctx_len);
 	} else
 		audit_log_task_context(audit_buf);
-	return audit_buf;
 }
 
 extern void xfrm_audit_policy_add(struct xfrm_policy *xp, int result,
@@ -578,11 +585,22 @@ extern void xfrm_audit_state_add(struct xfrm_state *x, int result,
 				 u32 auid, u32 secid);
 extern void xfrm_audit_state_delete(struct xfrm_state *x, int result,
 				    u32 auid, u32 secid);
+extern void xfrm_audit_state_replay_overflow(struct xfrm_state *x,
+					     struct sk_buff *skb);
+extern void xfrm_audit_state_notfound_simple(struct sk_buff *skb, u16 family);
+extern void xfrm_audit_state_notfound(struct sk_buff *skb, u16 family,
+				      __be32 net_spi, __be32 net_seq);
+extern void xfrm_audit_state_icvfail(struct xfrm_state *x,
+				     struct sk_buff *skb, u8 proto);
 #else
 #define xfrm_audit_policy_add(x, r, a, s)	do { ; } while (0)
 #define xfrm_audit_policy_delete(x, r, a, s)	do { ; } while (0)
 #define xfrm_audit_state_add(x, r, a, s)	do { ; } while (0)
 #define xfrm_audit_state_delete(x, r, a, s)	do { ; } while (0)
+#define xfrm_audit_state_replay_overflow(x, s)	do { ; } while (0)
+#define xfrm_audit_state_notfound_simple(s, f)	do { ; } while (0)
+#define xfrm_audit_state_notfound(s, f, sp, sq)	do { ; } while (0)
+#define xfrm_audit_state_icvfail(x, s, p)	do { ; } while (0)
 #endif /* CONFIG_AUDITSYSCALL */
 
 static inline void xfrm_pol_hold(struct xfrm_policy *policy)
@@ -1193,7 +1211,8 @@ extern int xfrm_state_delete(struct xfrm_state *x);
 extern int xfrm_state_flush(u8 proto, struct xfrm_audit *audit_info);
 extern void xfrm_sad_getinfo(struct xfrmk_sadinfo *si);
 extern void xfrm_spd_getinfo(struct xfrmk_spdinfo *si);
-extern int xfrm_replay_check(struct xfrm_state *x, __be32 seq);
+extern int xfrm_replay_check(struct xfrm_state *x,
+			     struct sk_buff *skb, __be32 seq);
 extern void xfrm_replay_advance(struct xfrm_state *x, __be32 seq);
 extern void xfrm_replay_notify(struct xfrm_state *x, int event);
 extern int xfrm_state_mtu(struct xfrm_state *x, int mtu);
diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index d76803a..ec8de0a 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -179,8 +179,10 @@ static int ah_input(struct xfrm_state *x, struct sk_buff *skb)
 		err = ah_mac_digest(ahp, skb, ah->auth_data);
 		if (err)
 			goto unlock;
-		if (memcmp(ahp->work_icv, auth_data, ahp->icv_trunc_len))
+		if (memcmp(ahp->work_icv, auth_data, ahp->icv_trunc_len)) {
+			xfrm_audit_state_icvfail(x, skb, IPPROTO_AH);
 			err = -EBADMSG;
+		}
 	}
 unlock:
 	spin_unlock(&x->lock);
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 28ea5c7..b334c76 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -191,6 +191,7 @@ static int esp_input(struct xfrm_state *x, struct sk_buff *skb)
 			BUG();
 
 		if (unlikely(memcmp(esp->auth.work_icv, sum, alen))) {
+			xfrm_audit_state_icvfail(x, skb, IPPROTO_ESP);
 			err = -EBADMSG;
 			goto unlock;
 		}
diff --git a/net/ipv6/ah6.c b/net/ipv6/ah6.c
index 1b51d1e..2d32772 100644
--- a/net/ipv6/ah6.c
+++ b/net/ipv6/ah6.c
@@ -381,7 +381,7 @@ static int ah6_input(struct xfrm_state *x, struct sk_buff *skb)
 		if (err)
 			goto unlock;
 		if (memcmp(ahp->work_icv, auth_data, ahp->icv_trunc_len)) {
-			LIMIT_NETDEBUG(KERN_WARNING "ipsec ah authentication error\n");
+			xfrm_audit_state_icvfail(x, skb, IPPROTO_AH);
 			err = -EBADMSG;
 		}
 	}
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 5bd5292..e10f10b 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -186,6 +186,7 @@ static int esp6_input(struct xfrm_state *x, struct sk_buff *skb)
 			BUG();
 
 		if (unlikely(memcmp(esp->auth.work_icv, sum, alen))) {
+			xfrm_audit_state_icvfail(x, skb, IPPROTO_ESP);
 			ret = -EBADMSG;
 			goto unlock;
 		}
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 74f3aac..08b850a 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -136,8 +136,10 @@ int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 		break;
 	}
 
-	if (!xfrm_vec_one)
+	if (!xfrm_vec_one) {
+		xfrm_audit_state_notfound_simple(skb, AF_INET6);
 		goto drop;
+	}
 
 	/* Allocate new secpath or COW existing one. */
 	if (!skb->sp || atomic_read(&skb->sp->refcnt) != 1) {
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 8624cbd..87e1aac 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -139,8 +139,10 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 			goto drop;
 
 		x = xfrm_state_lookup(daddr, spi, nexthdr, family);
-		if (x == NULL)
+		if (x == NULL) {
+			xfrm_audit_state_notfound(skb, family, spi, seq);
 			goto drop;
+		}
 
 		skb->sp->xvec[skb->sp->len++] = x;
 
@@ -151,7 +153,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 		if ((x->encap ? x->encap->encap_type : 0) != encap_type)
 			goto drop_unlock;
 
-		if (x->props.replay_window && xfrm_replay_check(x, seq))
+		if (x->props.replay_window && xfrm_replay_check(x, skb, seq))
 			goto drop_unlock;
 
 		if (xfrm_state_check_expire(x))
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 26fa0cb..eb3333b 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -57,6 +57,8 @@ static int xfrm_output_one(struct sk_buff *skb, int err)
 
 		if (x->type->flags & XFRM_TYPE_REPLAY_PROT) {
 			XFRM_SKB_CB(skb)->seq = ++x->replay.oseq;
+			if (unlikely(x->replay.oseq == 0))
+				xfrm_audit_state_replay_overflow(x, skb);
 			if (xfrm_aevent_is_on())
 				xfrm_replay_notify(x, XFRM_REPLAY_UPDATE);
 		}
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index c8f0656..97cebfe 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2323,12 +2323,11 @@ void xfrm_audit_policy_add(struct xfrm_policy *xp, int result,
 {
 	struct audit_buffer *audit_buf;
 
-	if (audit_enabled == 0)
-		return;
-	audit_buf = xfrm_audit_start(auid, secid);
+	audit_buf = xfrm_audit_start("SPD-add");
 	if (audit_buf == NULL)
 		return;
-	audit_log_format(audit_buf, " op=SPD-add res=%u", result);
+	xfrm_audit_helper_usrinfo(auid, secid, audit_buf);
+	audit_log_format(audit_buf, " res=%u", result);
 	xfrm_audit_common_policyinfo(xp, audit_buf);
 	audit_log_end(audit_buf);
 }
@@ -2339,12 +2338,11 @@ void xfrm_audit_policy_delete(struct xfrm_policy *xp, int result,
 {
 	struct audit_buffer *audit_buf;
 
-	if (audit_enabled == 0)
-		return;
-	audit_buf = xfrm_audit_start(auid, secid);
+	audit_buf = xfrm_audit_start("SPD-delete");
 	if (audit_buf == NULL)
 		return;
-	audit_log_format(audit_buf, " op=SPD-delete res=%u", result);
+	xfrm_audit_helper_usrinfo(auid, secid, audit_buf);
+	audit_log_format(audit_buf, " res=%u", result);
 	xfrm_audit_common_policyinfo(xp, audit_buf);
 	audit_log_end(audit_buf);
 }
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index dd38e6f..1f00aeb 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -61,6 +61,13 @@ static unsigned int xfrm_state_genid;
 static struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned int family);
 static void xfrm_state_put_afinfo(struct xfrm_state_afinfo *afinfo);
 
+#ifdef CONFIG_AUDITSYSCALL
+static void xfrm_audit_state_replay(struct xfrm_state *x,
+				    struct sk_buff *skb, __be32 net_seq);
+#else
+#define xfrm_audit_state_replay(x, s, sq)	do { ; } while (0)
+#endif /* CONFIG_AUDITSYSCALL */
+
 static inline unsigned int xfrm_dst_hash(xfrm_address_t *daddr,
 					 xfrm_address_t *saddr,
 					 u32 reqid,
@@ -1609,13 +1616,14 @@ static void xfrm_replay_timer_handler(unsigned long data)
 	spin_unlock(&x->lock);
 }
 
-int xfrm_replay_check(struct xfrm_state *x, __be32 net_seq)
+int xfrm_replay_check(struct xfrm_state *x,
+		      struct sk_buff *skb, __be32 net_seq)
 {
 	u32 diff;
 	u32 seq = ntohl(net_seq);
 
 	if (unlikely(seq == 0))
-		return -EINVAL;
+		goto err;
 
 	if (likely(seq > x->replay.seq))
 		return 0;
@@ -1624,14 +1632,18 @@ int xfrm_replay_check(struct xfrm_state *x, __be32 net_seq)
 	if (diff >= min_t(unsigned int, x->props.replay_window,
 			  sizeof(x->replay.bitmap) * 8)) {
 		x->stats.replay_window++;
-		return -EINVAL;
+		goto err;
 	}
 
 	if (x->replay.bitmap & (1U << diff)) {
 		x->stats.replay++;
-		return -EINVAL;
+		goto err;
 	}
 	return 0;
+
+err:
+	xfrm_audit_state_replay(x, skb, net_seq);
+	return -EINVAL;	
 }
 EXPORT_SYMBOL(xfrm_replay_check);
 
@@ -1994,8 +2006,8 @@ void __init xfrm_state_init(void)
 }
 
 #ifdef CONFIG_AUDITSYSCALL
-static inline void xfrm_audit_common_stateinfo(struct xfrm_state *x,
-					       struct audit_buffer *audit_buf)
+static inline void xfrm_audit_helper_sainfo(struct xfrm_state *x,
+					    struct audit_buffer *audit_buf)
 {
 	struct xfrm_sec_ctx *ctx = x->security;
 	u32 spi = ntohl(x->id.spi);
@@ -2022,18 +2034,45 @@ static inline void xfrm_audit_common_stateinfo(struct xfrm_state *x,
 	audit_log_format(audit_buf, " spi=%u(0x%x)", spi, spi);
 }
 
+static inline void xfrm_audit_helper_pktinfo(struct sk_buff *skb, u16 family,
+					     struct audit_buffer *audit_buf)
+{
+	struct iphdr *iph4;
+	struct ipv6hdr *iph6;
+
+	switch (family) {
+	case AF_INET:
+		iph4 = ip_hdr(skb);
+		audit_log_format(audit_buf,
+				 " src=" NIPQUAD_FMT " dst=" NIPQUAD_FMT,
+				 NIPQUAD(iph4->saddr),
+				 NIPQUAD(iph4->daddr));
+		break;
+	case AF_INET6:
+		iph6 = ipv6_hdr(skb);
+		audit_log_format(audit_buf,
+				 " src=" NIP6_FMT " dst=" NIP6_FMT
+				 " flowlbl=0x%x%02x%02x",
+				 NIP6(iph6->saddr),
+				 NIP6(iph6->daddr),
+				 iph6->flow_lbl[0] & 0x0f,
+				 iph6->flow_lbl[1],
+				 iph6->flow_lbl[2]);
+		break;
+	}
+}
+
 void xfrm_audit_state_add(struct xfrm_state *x, int result,
 			  u32 auid, u32 secid)
 {
 	struct audit_buffer *audit_buf;
 
-	if (audit_enabled == 0)
-		return;
-	audit_buf = xfrm_audit_start(auid, secid);
+	audit_buf = xfrm_audit_start("SAD-add");
 	if (audit_buf == NULL)
 		return;
-	audit_log_format(audit_buf, " op=SAD-add res=%u", result);
-	xfrm_audit_common_stateinfo(x, audit_buf);
+	xfrm_audit_helper_usrinfo(auid, secid, audit_buf);
+	xfrm_audit_helper_sainfo(x, audit_buf);
+	audit_log_format(audit_buf, " res=%u", result);
 	audit_log_end(audit_buf);
 }
 EXPORT_SYMBOL_GPL(xfrm_audit_state_add);
@@ -2043,14 +2082,96 @@ void xfrm_audit_state_delete(struct xfrm_state *x, int result,
 {
 	struct audit_buffer *audit_buf;
 
-	if (audit_enabled == 0)
-		return;
-	audit_buf = xfrm_audit_start(auid, secid);
+	audit_buf = xfrm_audit_start("SAD-delete");
 	if (audit_buf == NULL)
 		return;
-	audit_log_format(audit_buf, " op=SAD-delete res=%u", result);
-	xfrm_audit_common_stateinfo(x, audit_buf);
+	xfrm_audit_helper_usrinfo(auid, secid, audit_buf);
+	xfrm_audit_helper_sainfo(x, audit_buf);
+	audit_log_format(audit_buf, " res=%u", result);
 	audit_log_end(audit_buf);
 }
 EXPORT_SYMBOL_GPL(xfrm_audit_state_delete);
+
+void xfrm_audit_state_replay_overflow(struct xfrm_state *x,
+				      struct sk_buff *skb)
+{
+	struct audit_buffer *audit_buf;
+	u32 spi;
+
+	audit_buf = xfrm_audit_start("SA-replay-overflow");
+	if (audit_buf == NULL)
+		return;
+	xfrm_audit_helper_pktinfo(skb, x->props.family, audit_buf);
+	/* don't record the sequence number because it's inherent in this kind
+	 * of audit message */
+	spi = ntohl(x->id.spi);
+	audit_log_format(audit_buf, " spi=%u(0x%x)", spi, spi);
+	audit_log_end(audit_buf);
+}
+EXPORT_SYMBOL_GPL(xfrm_audit_state_replay_overflow);
+
+static void xfrm_audit_state_replay(struct xfrm_state *x,
+			     struct sk_buff *skb, __be32 net_seq)
+{
+	struct audit_buffer *audit_buf;
+	u32 spi;
+
+	audit_buf = xfrm_audit_start("SA-replayed-pkt");
+	if (audit_buf == NULL)
+		return;
+	xfrm_audit_helper_pktinfo(skb, x->props.family, audit_buf);
+	spi = ntohl(x->id.spi);
+	audit_log_format(audit_buf, " spi=%u(0x%x) seqno=%u",
+			 spi, spi, ntohl(net_seq));
+	audit_log_end(audit_buf);
+}
+
+void xfrm_audit_state_notfound_simple(struct sk_buff *skb, u16 family)
+{
+	struct audit_buffer *audit_buf;
+
+	audit_buf = xfrm_audit_start("SA-notfound");
+	if (audit_buf == NULL)
+		return;
+	xfrm_audit_helper_pktinfo(skb, family, audit_buf);
+	audit_log_end(audit_buf);
+}
+EXPORT_SYMBOL_GPL(xfrm_audit_state_notfound_simple);
+
+void xfrm_audit_state_notfound(struct sk_buff *skb, u16 family,
+			       __be32 net_spi, __be32 net_seq)
+{
+	struct audit_buffer *audit_buf;
+	u32 spi;
+
+	audit_buf = xfrm_audit_start("SA-notfound");
+	if (audit_buf == NULL)
+		return;
+	xfrm_audit_helper_pktinfo(skb, family, audit_buf);
+	spi = ntohl(net_spi);
+	audit_log_format(audit_buf, " spi=%u(0x%x) seqno=%u",
+			 spi, spi, ntohl(net_seq));
+	audit_log_end(audit_buf);
+}
+EXPORT_SYMBOL_GPL(xfrm_audit_state_notfound);
+
+void xfrm_audit_state_icvfail(struct xfrm_state *x,
+			      struct sk_buff *skb, u8 proto)
+{
+	struct audit_buffer *audit_buf;
+	__be32 net_spi;
+	__be32 net_seq;
+
+	audit_buf = xfrm_audit_start("SA-icv-failure");
+	if (audit_buf == NULL)
+		return;
+	xfrm_audit_helper_pktinfo(skb, x->props.family, audit_buf);
+	if (xfrm_parse_spi(skb, proto, &net_spi, &net_seq) == 0) {
+		u32 spi = ntohl(net_spi);
+		audit_log_format(audit_buf, " spi=%u(0x%x) seqno=%u",
+				 spi, spi, ntohl(net_seq));
+	}
+	audit_log_end(audit_buf);
+}
+EXPORT_SYMBOL_GPL(xfrm_audit_state_icvfail);
 #endif /* CONFIG_AUDITSYSCALL */
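
The records built by these helpers start with the op= field and then append packet and SA details. A user-space sketch of the IPv6 "replayed packet" layout, assuming POSIX inet_ntop() for the address text; format_ipv6_audit() is hypothetical. The flow label is the low four bits of flow_lbl[0] followed by the two remaining bytes, so the sketch assembles one 20-bit value before printing; printing the bytes separately would need zero padding on the last two to stay unambiguous.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>

/*
 * Hypothetical rendering of an IPv6 SA audit record in the
 * "op=... src=... dst=... flowlbl=... spi=...(0x...) seqno=..." layout.
 * spi_be and seq_be are in network byte order, as on the wire.
 */
static int format_ipv6_audit(char *out, size_t outlen, const char *op,
			     const struct in6_addr *src,
			     const struct in6_addr *dst,
			     const uint8_t flow_lbl[3],
			     uint32_t spi_be, uint32_t seq_be)
{
	char s[INET6_ADDRSTRLEN], d[INET6_ADDRSTRLEN];
	/* Low nibble of byte 0 plus bytes 1 and 2 form the flow label. */
	uint32_t flow = ((uint32_t)(flow_lbl[0] & 0x0f) << 16) |
			((uint32_t)flow_lbl[1] << 8) | flow_lbl[2];
	uint32_t spi = ntohl(spi_be);

	if (!inet_ntop(AF_INET6, src, s, sizeof(s)) ||
	    !inet_ntop(AF_INET6, dst, d, sizeof(d)))
		return -1;
	return snprintf(out, outlen,
			"op=%s src=%s dst=%s flowlbl=0x%x spi=%u(0x%x) seqno=%u",
			op, s, d, (unsigned)flow, (unsigned)spi,
			(unsigned)spi, (unsigned)ntohl(seq_be));
}
```

For example, flow_lbl bytes {0x61, 0x05, 0x03} give flowlbl=0x10503; the 0x6 in the top nibble belongs to the traffic class and is masked off.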


* [PATCH 1/3] XFRM: Assorted IPsec fixups
From: Paul Moore @ 2007-12-20 21:42 UTC
  To: netdev, linux-audit; +Cc: latten
In-Reply-To: <20071220214200.12122.89628.stgit@flek.lan>

This patch fixes a number of small but potentially troublesome things in the
XFRM/IPsec code:

 * Use the 'audit_enabled' variable already in include/linux/audit.h
   This removes the need for extern declarations local to each XFRM audit
   function

 * Convert 'sid' to 'secid' everywhere we can
   The 'sid' name is specific to SELinux; 'secid' is the common naming
   convention used by the kernel when referring to tokenized LSM labels.
   Unfortunately we have to leave 'ctx_sid' in 'struct xfrm_sec_ctx',
   otherwise we risk breaking userspace

 * Convert address display to use standard NIP* macros
   This mirrors what was recently done with the SPD audit code and also
   removes some unnecessary memcpy() calls

 * Move common code to xfrm_audit_common_stateinfo()
   Code consolidation from the "less is more" book on software development

 * Proper spacing around commas in function arguments
   Minor style tweak since I was already touching the code

Signed-off-by: Paul Moore <paul.moore@hp.com>
---

 include/net/xfrm.h     |   14 ++++++-------
 net/xfrm/xfrm_policy.c |   15 ++++++--------
 net/xfrm/xfrm_state.c  |   53 ++++++++++++++++++++----------------------------
 3 files changed, 36 insertions(+), 46 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 32b99e2..ac6cf09 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -548,7 +548,7 @@ struct xfrm_audit
 };
 
 #ifdef CONFIG_AUDITSYSCALL
-static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 sid)
+static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 secid)
 {
 	struct audit_buffer *audit_buf = NULL;
 	char *secctx;
@@ -561,8 +561,8 @@ static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 sid)
 
 	audit_log_format(audit_buf, "auid=%u", auid);
 
-	if (sid != 0 &&
-	    security_secid_to_secctx(sid, &secctx, &secctx_len) == 0) {
+	if (secid != 0 &&
+	    security_secid_to_secctx(secid, &secctx, &secctx_len) == 0) {
 		audit_log_format(audit_buf, " subj=%s", secctx);
 		security_release_secctx(secctx, secctx_len);
 	} else
@@ -571,13 +571,13 @@ static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 sid)
 }
 
 extern void xfrm_audit_policy_add(struct xfrm_policy *xp, int result,
-				  u32 auid, u32 sid);
+				  u32 auid, u32 secid);
 extern void xfrm_audit_policy_delete(struct xfrm_policy *xp, int result,
-				  u32 auid, u32 sid);
+				  u32 auid, u32 secid);
 extern void xfrm_audit_state_add(struct xfrm_state *x, int result,
-				 u32 auid, u32 sid);
+				 u32 auid, u32 secid);
 extern void xfrm_audit_state_delete(struct xfrm_state *x, int result,
-				    u32 auid, u32 sid);
+				    u32 auid, u32 secid);
 #else
 #define xfrm_audit_policy_add(x, r, a, s)	do { ; } while (0)
 #define xfrm_audit_policy_delete(x, r, a, s)	do { ; } while (0)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index d2084b1..c8f0656 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -24,6 +24,7 @@
 #include <linux/netfilter.h>
 #include <linux/module.h>
 #include <linux/cache.h>
+#include <linux/audit.h>
 #include <net/dst.h>
 #include <net/xfrm.h>
 #include <net/ip.h>
@@ -2317,15 +2318,14 @@ static inline void xfrm_audit_common_policyinfo(struct xfrm_policy *xp,
 	}
 }
 
-void
-xfrm_audit_policy_add(struct xfrm_policy *xp, int result, u32 auid, u32 sid)
+void xfrm_audit_policy_add(struct xfrm_policy *xp, int result,
+			   u32 auid, u32 secid)
 {
 	struct audit_buffer *audit_buf;
-	extern int audit_enabled;
 
 	if (audit_enabled == 0)
 		return;
-	audit_buf = xfrm_audit_start(sid, auid);
+	audit_buf = xfrm_audit_start(auid, secid);
 	if (audit_buf == NULL)
 		return;
 	audit_log_format(audit_buf, " op=SPD-add res=%u", result);
@@ -2334,15 +2334,14 @@ xfrm_audit_policy_add(struct xfrm_policy *xp, int result, u32 auid, u32 sid)
 }
 EXPORT_SYMBOL_GPL(xfrm_audit_policy_add);
 
-void
-xfrm_audit_policy_delete(struct xfrm_policy *xp, int result, u32 auid, u32 sid)
+void xfrm_audit_policy_delete(struct xfrm_policy *xp, int result,
+			      u32 auid, u32 secid)
 {
 	struct audit_buffer *audit_buf;
-	extern int audit_enabled;
 
 	if (audit_enabled == 0)
 		return;
-	audit_buf = xfrm_audit_start(sid, auid);
+	audit_buf = xfrm_audit_start(auid, secid);
 	if (audit_buf == NULL)
 		return;
 	audit_log_format(audit_buf, " op=SPD-delete res=%u", result);
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 95df01c..dd38e6f 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -19,6 +19,7 @@
 #include <linux/ipsec.h>
 #include <linux/module.h>
 #include <linux/cache.h>
+#include <linux/audit.h>
 #include <asm/uaccess.h>
 
 #include "xfrm_hash.h"
@@ -1996,69 +1997,59 @@ void __init xfrm_state_init(void)
 static inline void xfrm_audit_common_stateinfo(struct xfrm_state *x,
 					       struct audit_buffer *audit_buf)
 {
-	if (x->security)
+	struct xfrm_sec_ctx *ctx = x->security;
+	u32 spi = ntohl(x->id.spi);
+
+	if (ctx)
 		audit_log_format(audit_buf, " sec_alg=%u sec_doi=%u sec_obj=%s",
-				 x->security->ctx_alg, x->security->ctx_doi,
-				 x->security->ctx_str);
+				 ctx->ctx_alg, ctx->ctx_doi, ctx->ctx_str);
 
 	switch(x->props.family) {
 	case AF_INET:
-		audit_log_format(audit_buf, " src=%u.%u.%u.%u dst=%u.%u.%u.%u",
+		audit_log_format(audit_buf,
+				 " src=" NIPQUAD_FMT " dst=" NIPQUAD_FMT,
 				 NIPQUAD(x->props.saddr.a4),
 				 NIPQUAD(x->id.daddr.a4));
 		break;
 	case AF_INET6:
-		{
-			struct in6_addr saddr6, daddr6;
-
-			memcpy(&saddr6, x->props.saddr.a6,
-				sizeof(struct in6_addr));
-			memcpy(&daddr6, x->id.daddr.a6,
-				sizeof(struct in6_addr));
-			audit_log_format(audit_buf,
-					 " src=" NIP6_FMT " dst=" NIP6_FMT,
-					 NIP6(saddr6), NIP6(daddr6));
-		}
+		audit_log_format(audit_buf,
+				 " src=" NIP6_FMT " dst=" NIP6_FMT,
+				 NIP6(*(struct in6_addr *)x->props.saddr.a6),
+				 NIP6(*(struct in6_addr *)x->id.daddr.a6));
 		break;
 	}
+
+	audit_log_format(audit_buf, " spi=%u(0x%x)", spi, spi);
 }
 
-void
-xfrm_audit_state_add(struct xfrm_state *x, int result, u32 auid, u32 sid)
+void xfrm_audit_state_add(struct xfrm_state *x, int result,
+			  u32 auid, u32 secid)
 {
 	struct audit_buffer *audit_buf;
-	u32 spi;
-	extern int audit_enabled;
 
 	if (audit_enabled == 0)
 		return;
-	audit_buf = xfrm_audit_start(sid, auid);
+	audit_buf = xfrm_audit_start(auid, secid);
 	if (audit_buf == NULL)
 		return;
-	audit_log_format(audit_buf, " op=SAD-add res=%u",result);
+	audit_log_format(audit_buf, " op=SAD-add res=%u", result);
 	xfrm_audit_common_stateinfo(x, audit_buf);
-	spi = ntohl(x->id.spi);
-	audit_log_format(audit_buf, " spi=%u(0x%x)", spi, spi);
 	audit_log_end(audit_buf);
 }
 EXPORT_SYMBOL_GPL(xfrm_audit_state_add);
 
-void
-xfrm_audit_state_delete(struct xfrm_state *x, int result, u32 auid, u32 sid)
+void xfrm_audit_state_delete(struct xfrm_state *x, int result,
+			     u32 auid, u32 secid)
 {
 	struct audit_buffer *audit_buf;
-	u32 spi;
-	extern int audit_enabled;
 
 	if (audit_enabled == 0)
 		return;
-	audit_buf = xfrm_audit_start(sid, auid);
+	audit_buf = xfrm_audit_start(auid, secid);
 	if (audit_buf == NULL)
 		return;
-	audit_log_format(audit_buf, " op=SAD-delete res=%u",result);
+	audit_log_format(audit_buf, " op=SAD-delete res=%u", result);
 	xfrm_audit_common_stateinfo(x, audit_buf);
-	spi = ntohl(x->id.spi);
-	audit_log_format(audit_buf, " spi=%u(0x%x)", spi, spi);
 	audit_log_end(audit_buf);
 }
 EXPORT_SYMBOL_GPL(xfrm_audit_state_delete);


* [PATCH 0/3] XFRM audit fixes/additions for net-2.6.25
From: Paul Moore @ 2007-12-20 21:42 UTC
  To: netdev, linux-audit; +Cc: latten

Three patches based on today's net-2.6.25.  Some of the audit messages are
a little difficult to test by their nature, but I've verified that I'm still
able to send/receive IPsec-protected traffic with the patches applied.

The first patch was posted before, but David decided it was best to split it
so that some parts could be pulled into 2.6.24; the 2.6.24 bits (the SPI
byte-order fix) were accepted, so patch #1 in this series is what remains
for 2.6.25.

The second patch was posted before as an RFC patch without anyone complaining
too loudly.  Eric Paris made some suggestions about better handling of the
"op=" audit field and I've tried to take that into account with this patch.

The final patch fixes the replay counter overflow issue that has been
discussed on netdev.  From the discussion this sounded like the best course
of action, but if I'm wrong, just drop this patch and I'll cook up something
else to solve the problem.

Thanks.

--
paul moore
linux security @ hp

* [PATCH 3/3] XFRM: Drop packets when replay counter would overflow
From: Paul Moore @ 2007-12-20 21:42 UTC
  To: netdev, linux-audit; +Cc: latten
In-Reply-To: <20071220214200.12122.89628.stgit@flek.lan>

According to RFC4303, section 3.3.3, we need to drop outgoing packets that
would cause the replay counter to overflow:

   3.3.3.  Sequence Number Generation

   The sender's counter is initialized to 0 when an SA is established.
   The sender increments the sequence number (or ESN) counter for this
   SA and inserts the low-order 32 bits of the value into the Sequence
   Number field.  Thus, the first packet sent using a given SA will
   contain a sequence number of 1.

   If anti-replay is enabled (the default), the sender checks to ensure
   that the counter has not cycled before inserting the new value in the
   Sequence Number field.  In other words, the sender MUST NOT send a
   packet on an SA if doing so would cause the sequence number to cycle.
   An attempt to transmit a packet that would result in sequence number
   overflow is an auditable event.  The audit log entry for this event
   SHOULD include the SPI value, current date/time, Source Address,
   Destination Address, and (in IPv6) the cleartext Flow ID.

Signed-off-by: Paul Moore <paul.moore@hp.com>
---

 net/xfrm/xfrm_output.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index eb3333b..284eeef 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -57,8 +57,11 @@ static int xfrm_output_one(struct sk_buff *skb, int err)

 		if (x->type->flags & XFRM_TYPE_REPLAY_PROT) {
 			XFRM_SKB_CB(skb)->seq = ++x->replay.oseq;
-			if (unlikely(x->replay.oseq == 0))
+			if (unlikely(x->replay.oseq == 0)) {
+				x->replay.oseq--;
 				xfrm_audit_state_replay_overflow(x, skb);
+				goto error;
+			}
 			if (xfrm_aevent_is_on())
 				xfrm_replay_notify(x, XFRM_REPLAY_UPDATE);
 		}


* Re: [PATCH 1/2] e1000e: Use deferrable timer for watchdog
From: Kok, Auke @ 2007-12-20 21:27 UTC (permalink / raw)
  To: jeff; +Cc: netdev, parag.warudkar
In-Reply-To: <20071220185125.21903.44752.stgit@localhost.localdomain>

Auke Kok wrote:
> From: Parag Warudkar <parag.warudkar@gmail.com>
> 
> Reduce wakeups from idle per second.
> 
> Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com>
> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
> ---

Jeff,

Given the discussion with Stephen, I'd like to skip merging this patch and the
e1000 one for now. The unforeseen implications of this are just not controlled
enough, and we need to guarantee some limit on the deferral first.

Auke


* Re: [PATCH] sky2: Use deferrable timer for watchdog
From: Kok, Auke @ 2007-12-20 21:24 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Parag Warudkar, Arjan van de Ven, netdev, akpm, linux-kernel
In-Reply-To: <20071220130841.6d2801f2@deepthought>

Stephen Hemminger wrote:
> On Thu, 20 Dec 2007 15:36:13 -0500
> "Parag Warudkar" <parag.warudkar@gmail.com> wrote:
> 
>> On Dec 20, 2007 3:04 PM, Arjan van de Ven <arjan@linux.intel.com> wrote:
>>>> I think it is reasonable for Network driver watchdogs to use a
>>>> deferrable timer - if the machine is 100% IDLE there is no one needing
>>>> the network to be up. If there is something running even on the other
>>>> CPU - that is going to cause an IPI, reschedule, TLB invalidation etc.
>>>> which will make it very likely in practice that each CPU will be
>>>> interrupted in reasonable amount of time.
>>> this is not correct; many machines are idle waiting for network data. Think of webservers...
>> Yes, I forgot the receive case. So if a server was 100% IDLE, a web
>> server was listening for network data, we reached 0 wakeups per
>> second on the CPU where the network watchdog timer is scheduled to run
>> deferred, _and_ the network link went down, the watchdog would not
>> run and redo the link until someone else woke up that CPU later.
>> So as long as we make sure we don't convert every timer to deferrable,
>> we should be OK. Maybe this can be resolved easily by having a
>> non-deferrable "dont-allow-deferring-for-too-long" timer on each CPU
>> that just causes at least one wakeup within some reasonable time delta
>> of the previous wakeup (whoever caused that one). It is still
>> beneficial in that all deferrable timers would run at once without
>> needing a separate wakeup for each.
>>
>>>> Of course there are theoretical cases where we could land into a
>>>> situation where a CPU in a multiprocessor machine is IDLE infinitely
>>>> and that causes the watchdog that happens to be bound to run on the
>>>> same CPU to not run. To take care of these unlikely cases I think the
>>>> timer mechanism should have a reasonable limit on how long a CPU can
>>>> go IDLE if there are deferrable timers.
>>> how about something else instead: a timer mechanism that takes a range instead..
>>> that at least has defined semantics; the deferrable semantics really are "indefinite".
>>> Lets keep at least the semantics clear and clean.
>>>
>> Wouldn't the simpler solution of installing a non-deferrable timer
>> per CPU, which does not allow the CPU to stay IDLE for more than x units
>> of time at once (or something to that effect), work? A range would
>> complicate things, and I am not sure how many users will know a
>> reasonably correct range for their normal operation. In this instance,
>> what range could the e1000 watchdog give and still succeed at what it
>> wants to do: bring up the link in a reasonable amount of time while
>> also realizing the power savings?
>>
>> Perhaps depending on the machine type (server/laptop/desktop, maybe
>> based on Preemption) we could have normal or deferrable timers, but
>> that would exclude servers from power savings, and I am not sure data
>> center folks will like that :) .
>>
>> Parag
> 
> 
> The problem is that on a server the receiver will go deaf if the chip
> bug that the watchdog is looking for triggers.  Yes, no packets come in
> and it will happily just sit there.
> 
> So for now, I am not going to apply your simple patch; instead I will
> work on a two-stage timer, per Arjan's suggestion, for a later release.

I also think that's the right way to go. I'll ask Jeff to hold off on the
two patches for now.

Auke



* Re: [PATCH] sky2: Use deferrable timer for watchdog
From: Stephen Hemminger @ 2007-12-20 21:08 UTC (permalink / raw)
  To: Parag Warudkar; +Cc: Arjan van de Ven, Kok, Auke, netdev, akpm, linux-kernel
In-Reply-To: <82e4877d0712201236l2962cc86y73f0be0d6e2ae4be@mail.gmail.com>

On Thu, 20 Dec 2007 15:36:13 -0500
"Parag Warudkar" <parag.warudkar@gmail.com> wrote:

> On Dec 20, 2007 3:04 PM, Arjan van de Ven <arjan@linux.intel.com> wrote:
> > > I think it is reasonable for Network driver watchdogs to use a
> > > deferrable timer - if the machine is 100% IDLE there is no one needing
> > > the network to be up. If there is something running even on the other
> > > CPU - that is going to cause an IPI, reschedule, TLB invalidation etc.
> > > which will make it very likely in practice that each CPU will be
> > > interrupted in reasonable amount of time.
> >
> > this is not correct; many machines are idle waiting for network data. Think of webservers...
> 
> Yes, I forgot the receive case. So if a server was 100% IDLE, a web
> server was listening for network data, we reached 0 wakeups per
> second on the CPU where the network watchdog timer is scheduled to run
> deferred, _and_ the network link went down, the watchdog would not
> run and redo the link until someone else woke up that CPU later.
> So as long as we make sure we don't convert every timer to deferrable,
> we should be OK. Maybe this can be resolved easily by having a
> non-deferrable "dont-allow-deferring-for-too-long" timer on each CPU
> that just causes at least one wakeup within some reasonable time delta
> of the previous wakeup (whoever caused that one). It is still
> beneficial in that all deferrable timers would run at once without
> needing a separate wakeup for each.
> 
> >
> > >
> > > Of course there are theoretical cases where we could land into a
> > > situation where a CPU in a multiprocessor machine is IDLE infinitely
> > > and that causes the watchdog that happens to be bound to run on the
> > > same CPU to not run. To take care of these unlikely cases I think the
> > > timer mechanism should have a reasonable limit on how long a CPU can
> > > go IDLE if there are deferrable timers.
> >
> > how about something else instead: a timer mechanism that takes a range instead..
> > that at least has defined semantics; the deferrable semantics really are "indefinite".
> > Lets keep at least the semantics clear and clean.
> >
> 
> Wouldn't the simpler solution of installing a non-deferrable timer
> per CPU, which does not allow the CPU to stay IDLE for more than x units
> of time at once (or something to that effect), work? A range would
> complicate things, and I am not sure how many users will know a
> reasonably correct range for their normal operation. In this instance,
> what range could the e1000 watchdog give and still succeed at what it
> wants to do: bring up the link in a reasonable amount of time while
> also realizing the power savings?
> 
> Perhaps depending on the machine type (server/laptop/desktop, maybe
> based on Preemption) we could have normal or deferrable timers, but
> that would exclude servers from power savings, and I am not sure data
> center folks will like that :) .
> 
> Parag


The problem is that on a server the receiver will go deaf if the chip
bug that the watchdog is looking for triggers.  Yes, no packets come in
and it will happily just sit there.

So for now, I am not going to apply your simple patch; instead I will
work on a two-stage timer, per Arjan's suggestion, for a later release.
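
The difference between a plain deferrable watchdog and a two-stage timer can be modelled in a few lines. This is a simplified userspace sketch under stated assumptions: `run_deferrable()`, `run_two_stage()`, the tick units and the `max_defer` bound are all hypothetical, not kernel timer APIs. It only shows that on a CPU that never wakes up again the deferrable timer never fires, while the non-deferrable second stage bounds the delay.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of one CPU.  'wakeups' lists, in ascending order, the
 * ticks at which something else wakes the CPU; a deferrable timer can only
 * run at one of those.  Returns the tick at which the watchdog runs.
 */
static int64_t run_deferrable(int64_t expires, const int64_t *wakeups, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (wakeups[i] >= expires)
            return wakeups[i];
    return -1;  /* CPU never woke up again: the watchdog never runs */
}

/*
 * Two-stage variant: the deferrable timer is backed by a non-deferrable
 * one at expires + max_defer, so the delay is bounded even on an idle CPU.
 */
static int64_t run_two_stage(int64_t expires, int64_t max_defer,
                             const int64_t *wakeups, size_t n)
{
    int64_t hard = expires + max_defer;
    int64_t soft;

    assert(max_defer >= 0);
    soft = run_deferrable(expires, wakeups, n);
    return (soft >= 0 && soft <= hard) ? soft : hard;
}
```

On a busy CPU both variants piggyback on an existing wakeup; only on an idle receiver does the second stage cost an extra wakeup.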

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>


* Re: After many hours all outbound connections get stuck in SYN_SENT
From: Ilpo Järvinen @ 2007-12-20 21:05 UTC (permalink / raw)
  To: James Nichols
  Cc: Glen Turner, Jan Engelhardt, Eric Dumazet, LKML,
	Linux Netdev List
In-Reply-To: <83a51e120712200837p9e3d1a4g15b5f4763597073e@mail.gmail.com>

On Thu, 20 Dec 2007, James Nichols wrote:

> > You should probably also investigate the Linux kernel,
> > especially the size and locks of the components of the Sack data
> > structures and what happens to those data structures after Sack is
> > disabled (presumably the Sack data structure is in some unhappy
> > circumstance, and disabling Sack allows the data to be discarded,
> > magically unclogging the box).

...Not sure if you now want to invent such a structure. Yes, we have the
per-skb ->sacked field, but again, in SYN_SENT there are very few things
that touch it at all, and they just set it to zero (though that would not
even be mandatory for tcp_transmit_skb, IIRC; I checked that just a couple
of days ago for other reasons).

Another thing is rx_opt.sack_ok, which is just a couple of flag bits that
tell which TCP variant is in use (and it's mostly used only after the SYN
handshake completes). The rest (the actual SACK blocks) is in the ack_skb,
but again it has very little meaning in the SYN_SENT state unless somebody
is crazy enough to add SACK blocks to SYN-ACKs :-).

> > In the absence of the reporter wanting to dump the kernel's
> > core, how about a patch to print the Sack datastructure when
> > the command to disable Sack is received by the kernel?
> > Maybe just print the last 16b of the IP address?
> 
> Given the fact that I've had this problem for so long, over a variety
> of networking hardware vendors and colo-facilities, this really sounds
> good to me.  It will be challenging for me to justify a kernel core
> dump, but a simple patch to dump the Sack data would be do-able.

If your symptoms really are SYNs leaving (if they show up in tcpdump,
they have certainly left the TCP code already) and SYN-ACKs not showing
up even at a point as early as tcpdump (so the TCP receive code certainly
hasn't run yet), there's very little chance that Linux's TCP code has a
bug here: the only things it does in such a scenario are SYN generation
and SYN retransmission (and those are trivially verifiable from tcpdump).

-- 
 i.


* Re: After many hours all outbound connections get stuck in SYN_SENT
From: Justin Banks @ 2007-12-20 20:49 UTC (permalink / raw)
  To: James Nichols
  Cc: Eric Dumazet, Jan Engelhardt, linux-kernel, Linux Netdev List
In-Reply-To: <83a51e120712200808m7fa63e9jc588124a6da5f740@mail.gmail.com>

James Nichols wrote
> > I still dont understand.
> >
> > "tcpdump -p -n -s 1600 -c 10000" doesnt reveal User data at all.
> >
> > Without any exact data from you, I am afraid nobody can help.
> 
> Oh, I didn't see that you specified specific options.  I'll still have
> to anonymize 2000+ IP addresses, but I think there is an open source
> tool that will do this for you.


tcpdump -p -n -s 1600 -c 10000 | perl -pe 's/(\d+\.\d+\.\d+\.\d+)/HIDE.THIS.IP.ADDR/g'

-justinb

-- 
Justin Banks
BakBone Software
justinb@bakbone.com


* Re: After many hours all outbound connections get stuck in SYN_SENT
From: Ilpo Järvinen @ 2007-12-20 20:44 UTC (permalink / raw)
  To: James Nichols; +Cc: Eric Dumazet, Jan Engelhardt, LKML, Linux Netdev List
In-Reply-To: <83a51e120712200808m7fa63e9jc588124a6da5f740@mail.gmail.com>

On Thu, 20 Dec 2007, James Nichols wrote:

> > I still dont understand.
> >
> > "tcpdump -p -n -s 1600 -c 10000" doesnt reveal User data at all.
> >
> > Without any exact data from you, I am afraid nobody can help.
> 
> Oh, I didn't see that you specified specific options.  I'll still have
> to anonymize 2000+ IP addresses, but I think there is an open source
> tool that will do this for you.

Even a simple for loop in shell can do that. It's not that hard, and
there's very little need for manual work! Ingredients: for, cut, grep
and sed.
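
The same job can be sketched as a small C filter, swapping in C for the shell tools named above. Unlike a blanket substitution to one fixed string, this hypothetical `anonymize_line()` maps each distinct IPv4 address to a stable token (`IP1`, `IP2`, ...), so the trace still shows which packets belong to the same host. `struct ip_map`, the token format and the `MAX_HOSTS` cap are assumptions of this sketch; it only recognizes dotted-quad IPv4 text, not IPv6 or hostnames.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_HOSTS 4096  /* arbitrary cap for this sketch */

/* Remembers each distinct address so it always maps to the same token. */
struct ip_map {
    uint32_t addr[MAX_HOSTS];
    size_t count;
};

/* Returns the 1-based token number for 'a', adding it if new; 0 if full. */
static size_t ip_map_lookup(struct ip_map *m, uint32_t a)
{
    for (size_t i = 0; i < m->count; i++)
        if (m->addr[i] == a)
            return i + 1;
    if (m->count == MAX_HOSTS)
        return 0;
    m->addr[m->count++] = a;
    return m->count;
}

/* Parses a dotted quad at 's'; stores it in *a and returns its length, or 0. */
static size_t parse_quad(const char *s, uint32_t *a)
{
    const char *p = s;
    uint32_t v = 0;

    for (int k = 0; k < 4; k++) {
        unsigned octet = 0;
        int digits = 0;
        while (digits < 3 && isdigit((unsigned char)*p)) {
            octet = octet * 10 + (unsigned)(*p++ - '0');
            digits++;
        }
        /* 1-3 digits, value <= 255, and not followed by another digit */
        if (digits == 0 || octet > 255 || isdigit((unsigned char)*p))
            return 0;
        v = (v << 8) | octet;
        if (k < 3) {
            if (*p != '.')
                return 0;
            p++;
        }
    }
    *a = v;
    return (size_t)(p - s);
}

/* Copies 'in' to 'out', replacing each IPv4 address with "IP<n>". */
static void anonymize_line(struct ip_map *m, const char *in,
                           char *out, size_t outsz)
{
    size_t o = 0;

    assert(outsz > 0);
    for (size_t i = 0; in[i] != '\0' && o + 1 < outsz;) {
        uint32_t a = 0;
        size_t len = 0;
        /* start only at a boundary, so the port in "1.2.3.4.80" is left alone */
        int boundary = i == 0 || !(isdigit((unsigned char)in[i - 1]) ||
                                   in[i - 1] == '.');

        if (boundary && isdigit((unsigned char)in[i]))
            len = parse_quad(in + i, &a);
        if (len > 0) {
            size_t id = ip_map_lookup(m, a);
            /* "IP?" when the table is full, so no real address leaks */
            int w = id ? snprintf(out + o, outsz - o, "IP%zu", id)
                       : snprintf(out + o, outsz - o, "IP?");
            if (w < 0 || (size_t)w >= outsz - o)
                break;          /* no room left for the token */
            o += (size_t)w;
            i += len;
        } else {
            out[o++] = in[i++];
        }
    }
    out[o] = '\0';
}
```

Each line of `tcpdump -n` output would be passed through `anonymize_line()` with one shared `struct ip_map`, so the same host keeps the same token across the whole capture.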


-- 
 i.


* Re: [PATCH] sky2: Use deferrable timer for watchdog
From: Parag Warudkar @ 2007-12-20 20:36 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Kok, Auke, Stephen Hemminger, netdev, akpm, linux-kernel
In-Reply-To: <476ACABC.4010503@linux.intel.com>

On Dec 20, 2007 3:04 PM, Arjan van de Ven <arjan@linux.intel.com> wrote:
> > I think it is reasonable for Network driver watchdogs to use a
> > deferrable timer - if the machine is 100% IDLE there is no one needing
> > the network to be up. If there is something running even on the other
> > CPU - that is going to cause an IPI, reschedule, TLB invalidation etc.
> > which will make it very likely in practice that each CPU will be
> > interrupted in reasonable amount of time.
>
> this is not correct; many machines are idle waiting for network data. Think of webservers...

Yes, I forgot the receive case. So if a server was 100% IDLE, a web
server was listening for network data, we reached 0 wakeups per
second on the CPU where the network watchdog timer is scheduled to run
deferred, _and_ the network link went down, the watchdog would not
run and redo the link until someone else woke up that CPU later.
So as long as we make sure we don't convert every timer to deferrable,
we should be OK. Maybe this can be resolved easily by having a
non-deferrable "dont-allow-deferring-for-too-long" timer on each CPU
that just causes at least one wakeup within some reasonable time delta
of the previous wakeup (whoever caused that one). It is still
beneficial in that all deferrable timers would run at once without
needing a separate wakeup for each.

>
> >
> > Of course there are theoretical cases where we could land into a
> > situation where a CPU in a multiprocessor machine is IDLE infinitely
> > and that causes the watchdog that happens to be bound to run on the
> > same CPU to not run. To take care of these unlikely cases I think the
> > timer mechanism should have a reasonable limit on how long a CPU can
> > go IDLE if there are deferrable timers.
>
> how about something else instead: a timer mechanism that takes a range instead..
> that at least has defined semantics; the deferrable semantics really are "indefinite".
> Lets keep at least the semantics clear and clean.
>

Wouldn't the simpler solution of installing a non-deferrable timer
per CPU, which does not allow the CPU to stay IDLE for more than x units
of time at once (or something to that effect), work? A range would
complicate things, and I am not sure how many users will know a
reasonably correct range for their normal operation. In this instance,
what range could the e1000 watchdog give and still succeed at what it
wants to do: bring up the link in a reasonable amount of time while
also realizing the power savings?

Perhaps depending on the machine type (server/laptop/desktop, maybe
based on Preemption) we could have normal or deferrable timers, but
that would exclude servers from power savings, and I am not sure data
center folks will like that :) .
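
Arjan's alternative, a timer that is given a window instead of an open-ended deferral, has well-defined batching behaviour. The sketch below is a hypothetical illustration, not the kernel timer API (`struct range_timer` and `plan_wakeups()` are invented names): each timer may fire anywhere in `[earliest, latest]`, and sorting by deadline, then waking at the earliest outstanding deadline, serves every overlapping window in one wakeup. This is the classic greedy interval-stabbing algorithm, which yields the minimum number of wakeups for a fixed set of windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical range timer: may fire at any instant in [earliest, latest]. */
struct range_timer {
    int64_t earliest;
    int64_t latest;
};

static int by_latest(const void *a, const void *b)
{
    const struct range_timer *x = a, *y = b;
    return (x->latest > y->latest) - (x->latest < y->latest);
}

/*
 * Sorts by deadline, wakes at the earliest outstanding deadline and lets
 * every timer whose window contains that instant fire in the same wakeup.
 * Writes the wakeup times to 'wake' (room for n entries), returns the count.
 */
static size_t plan_wakeups(struct range_timer *t, size_t n, int64_t *wake)
{
    size_t count = 0;

    if (n == 0)
        return 0;
    qsort(t, n, sizeof *t, by_latest);
    for (size_t i = 0; i < n; i++) {
        assert(t[i].earliest <= t[i].latest);
        /* the last wakeup is <= t[i].latest, so it is in range if >= earliest */
        if (count > 0 && t[i].earliest <= wake[count - 1])
            continue;
        wake[count++] = t[i].latest;
    }
    return count;
}
```

A timer with `earliest == latest` behaves like an ordinary timer, so exact and relaxed timers can share one mechanism.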

Parag

