Netdev List

* Re: [PATCH] phy: new SMSC LAN83C185 PHY driver
From: Francois Romieu @ 2006-05-08  0:01 UTC
  To: Herbert Valerio Riedel; +Cc: afleming, netdev
In-Reply-To: <E1FcqmN-0003ZV-FB@fencepost.gnu.org>

Herbert Valerio Riedel <hvr@gnu.org> :
> new SMSC LAN83C185 10BaseT/100BaseTX PHY driver for the PHY subsystem
> 
> Signed-off-by: Herbert Valerio Riedel <hvr@gnu.org>

Fine-with-me: Francois Romieu <romieu@fr.zoreil.com>

-- 
Ueimor

* Re: [PATCH]ip_options_fragment() has no effect on fragmentation
From: weiyj @ 2006-05-05 17:50 UTC
  To: David S. Miller; +Cc: weiyj, netdev
In-Reply-To: <20060202.170840.114568749.davem@davemloft.net>

Hello Mr. David:

Will this patch be used?

This patch fixes the following problem: when I send the router an IPv4
packet that contains the Record Route option and needs to be fragmented,
the router does not fragment it correctly. After fragmentation by the
router, the second fragment still contains the Record Route option.
According to RFC 791, the Record Route option must not be copied on
fragmentation; it goes in the first fragment only.

On Thursday 02 February 2006 20:08, David S. Miller wrote:
> From: Wei Yongjun <weiyj@soft.fujitsu.com>
> Date: Wed, 01 Feb 2006 14:21:41 -0500
>
> Your patch is still corrupt; new lines were added by your email client,
> which splits up the patch headers.
>
> I applied the patch by hand, but next time I won't put so much effort
> into fixing up your work.  Please learn how to submit patches
> properly.
>
> Thank you.

--- a/net/ipv4/ip_options.c.orig 2006-01-27 09:14:33.463612696 +0900
+++ b/net/ipv4/ip_options.c 2006-01-27 09:12:21.857619848 +0900
@@ -207,7 +207,7 @@
 
 void ip_options_fragment(struct sk_buff * skb) 
 {
- unsigned char * optptr = skb->nh.raw;
+ unsigned char * optptr = skb->nh.raw + sizeof(struct iphdr);
  struct ip_options * opt = &(IPCB(skb)->opt);
  int  l = opt->optlen;
  int  optlen;

* Re: [rfc][patch] ipvs: use proper timeout instead of fixed value
From: Andy Gospodarek @ 2006-05-08  3:13 UTC
  To: Horms; +Cc: Andy Gospodarek, netdev, wensong, ja
In-Reply-To: <20060507043838.GF8374@verge.net.au>

On Sun, May 07, 2006 at 01:38:40PM +0900, Horms wrote:
> On Fri, May 05, 2006 at 02:57:26PM -0400, Andy Gospodarek wrote:
> > On Fri, May 05, 2006 at 12:20:54PM +0900, Horms wrote:
> > > 
> > > Sorry, I misunderstood your patch completely the first time around.
> > > Yes, I think this is an excellent idea. Assuming it's tested and works
> > > I'm happy to sign off on it and prod DaveM.
> > 
> > Horms,
> > 
> > I'll get a setup together and post results when I have them.
> 
> I was thinking that it would be nice if the timeout could be sent over
> the wire, though that might bring in some compatibility issues,
> and thus your approach might be the best idea.
> 
> -- 
> Horms                                           http://www.vergenet.net/~horms/


Horms,

At first I too thought that including the timeout in the messages would
be nice, but it could raise compatibility issues, and maybe its only
advantage would be quicker timeout of connections, something that might
be undesirable in such an LVS environment.

-andy


* Re: [rfc][patch] ipvs: use proper timeout instead of fixed value
From: Andy Gospodarek @ 2006-05-08  3:27 UTC
  To: Wensong Zhang; +Cc: Andy Gospodarek, netdev, horms, ja
In-Reply-To: <Pine.LNX.4.64.0605072302370.7007@penguin.linux-vs.org>

On Sun, May 07, 2006 at 11:32:00PM +0800, Wensong Zhang wrote:
> 
> Hi Andy,
> 
> Yes, the original synchronization design is a somewhat arbitrary, or
> compromise, solution. We didn't want to synchronize every state change
> from the master to the backup load balancer, because we were afraid
> there would be too many state-change synchronization messages and some
> performance penalty. So we only synchronize TCP connections in the
> ESTABLISHED state, or UDP connections, to the backup load balancer, and
> use a timeout of 3 minutes.
>
> Your change is probably ok, but we should be aware that it may create
> many more connection entries on the backup load balancer for TCP
> applications than on the master, because there is no connection-close
> synchronization and the timeout changes from 3 minutes to 15 minutes.

Wensong,

This is an interesting point.  Do you feel it would be reasonable to
synchronize states for established connections and terminating
connections rather than all state changes?  It would seem desirable to
keep the table at a minimum size without tracking every state change,
since that could create too much noise between master and backup,
especially for short-lived connections.

I could work some more to expand and test this patch to include more
detailed connection tracking if you (or anyone else) feel it would be
valuable.

-andy


> 
> The simple solution is probably to make the timeout value tunable, so
> that users can tune it for their application. The better solution is to
> synchronize every connection state change from master to backup, so that
> the backup has almost the same connection state.
> 
> Thanks,
> 
> Wensong
> 
> 
> On Thu, 4 May 2006, Andy Gospodarek wrote:
> 
> >
> >Instead of using the default timeout of 3 minutes, this uses the timeout
> >specific to the protocol used for the connection. The 3 minute timeout
> >seems somewhat arbitrary (though I know it is used other places in the
> >ipvs code) and when failing over it would be much nicer to use one of
> >the configured timeout values.
> >
> >Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> >---
> >
> >ip_vs_sync.c |    5 +++--
> >1 files changed, 3 insertions(+), 2 deletions(-)
> >
> >diff --git a/net/ipv4/ipvs/ip_vs_sync.c b/net/ipv4/ipvs/ip_vs_sync.c
> >--- a/net/ipv4/ipvs/ip_vs_sync.c
> >+++ b/net/ipv4/ipvs/ip_vs_sync.c
> >@@ -67,7 +67,6 @@ struct ip_vs_sync_conn_options {
> >	struct ip_vs_seq        out_seq;        /* outgoing seq. struct */
> >};
> >
> >-#define IP_VS_SYNC_CONN_TIMEOUT (3*60*HZ)
> >#define SIMPLE_CONN_SIZE  (sizeof(struct ip_vs_sync_conn))
> >#define FULL_CONN_SIZE  \
> >(sizeof(struct ip_vs_sync_conn) + sizeof(struct ip_vs_sync_conn_options))
> >@@ -279,6 +278,7 @@ static void ip_vs_process_message(const
> >	struct ip_vs_sync_conn *s;
> >	struct ip_vs_sync_conn_options *opt;
> >	struct ip_vs_conn *cp;
> >+	struct ip_vs_protocol *pp;
> >	char *p;
> >	int i;
> >
> >@@ -337,7 +337,8 @@ static void ip_vs_process_message(const
> >			p += SIMPLE_CONN_SIZE;
> >
> >		atomic_set(&cp->in_pkts, sysctl_ip_vs_sync_threshold[0]);
> >-		cp->timeout = IP_VS_SYNC_CONN_TIMEOUT;
> >+		pp = ip_vs_proto_get(s->protocol);
> >+		cp->timeout = pp->timeout_table[cp->state];
> >		ip_vs_conn_put(cp);
> >
> >		if (p > buffer+buflen) {
> >
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
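The patch replaces the fixed value with a lookup in the protocol's per-state timeout table. Below is a minimal userspace sketch of that lookup, with hypothetical stand-in types and illustrative timeout values (only the 15-minute ESTABLISHED value comes from Wensong's comment above). It also falls back to the old 3-minute default when the protocol or state is unknown: as far as I know, ip_vs_proto_get() returns NULL for a protocol with no registered handler, and the quoted hunk dereferences pp without checking, so a guard of this kind may be worth considering.

```c
#include <stddef.h>

#define HZ 100 /* assumed tick rate, only for this sketch */

/* The fixed value the patch removes. */
#define SYNC_CONN_TIMEOUT_DEFAULT (3 * 60 * HZ)

/* Hypothetical, trimmed-down stand-ins for the IPVS types. */
enum conn_state { S_NONE, S_ESTABLISHED, S_CLOSE, S_LAST };

struct proto {
	int protocol;
	const int *timeout_table; /* indexed by enum conn_state */
	int num_states;
};

static const int tcp_timeouts[S_LAST] = {
	[S_NONE]        = 2 * HZ,       /* illustrative */
	[S_ESTABLISHED] = 15 * 60 * HZ, /* the 15 minutes mentioned above */
	[S_CLOSE]       = 10 * HZ,      /* illustrative */
};

static const struct proto tcp_proto = { 6, tcp_timeouts, S_LAST };

/* Timeout for a connection learned from a sync message: the
 * protocol's per-state value if known, else the old fixed default. */
static int sync_conn_timeout(const struct proto *pp, int state)
{
	if (pp == NULL || state < 0 || state >= pp->num_states)
		return SYNC_CONN_TIMEOUT_DEFAULT;
	return pp->timeout_table[state];
}
```

Falling back to the old constant keeps the previous behaviour for anything the table cannot answer, rather than crashing on a NULL protocol handler or reading past the table.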

* Re: dBm cutoff at -1dBm is too low
From: Pavel Roskin @ 2006-05-08  3:35 UTC
  To: jt; +Cc: NetDev
In-Reply-To: <20060505172818.GA7543@bougret.hpl.hp.com>

On Fri, 2006-05-05 at 10:28 -0700, Jean Tourrilhes wrote:
> 	I tried to use 'signed' in the struct a long while ago, and
> for some reason it broke left and right, I don't remember the
> details. So, whatever we do, it would not be straightforward.

Then let's keep the structure as is and change what iwconfig is
displaying:

--- iwlib.c
+++ iwlib.c
@@ -1400,7 +1400,7 @@ iw_print_stats(char *		buffer,
 	    {
 	      len = snprintf(buffer, buflen, "Signal level%c%d dBm  ",
 			     qual->updated & IW_QUAL_LEVEL_UPDATED ? '=' : ':',
-			     qual->level - 0x100);
+			     ((qual->level + 192) & 0xff) - 192);
 	      buffer += len;
 	      buflen -= len;
 	    }

Semantically, it's already nonsense to use an unsigned number for a
value that is almost always negative.  The true meaning of qual->level
is already a subject of convention.  So let's just adjust this
convention so that we don't punish the clients that report valid
positive dBm values.

The driver calculates a value (perhaps an integer) and pushes it into an
unsigned char in the hope that the client side can interpret it using
common sense.  Interpreting 0dBm as -256dBm goes against common sense.
This has to be specifically worked around in the drivers.

Of course, it would be nice to document it somewhere, and to have
constants e.g.

#define DBM_MIN (-192)
#define DBM_MAX 63
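A minimal sketch of the proposed convention, assuming qual->level is meant to cover [DBM_MIN, DBM_MAX] = [-192, 63]: bytes 0..63 decode as 0..63 dBm and bytes 64..255 as -192..-1 dBm. The helper names are hypothetical, not wireless-tools API. The mapping is easy to get backwards (192 must be added before masking and subtracted afterwards), so a round-trip check is worthwhile.

```c
#define DBM_MIN (-192)
#define DBM_MAX 63

/* Decode the 8-bit level as dBm in [DBM_MIN, DBM_MAX]. */
static int level_to_dbm(unsigned char level)
{
	return ((level - DBM_MIN) & 0xff) + DBM_MIN;
}

/* Encode dBm into the 8-bit level, clamping to the representable range. */
static unsigned char dbm_to_level(int dbm)
{
	if (dbm < DBM_MIN)
		dbm = DBM_MIN;
	if (dbm > DBM_MAX)
		dbm = DBM_MAX;
	return (unsigned char)dbm; /* conversion to unsigned is modulo 256 */
}
```

With this convention a driver reporting 0 dBm round-trips as 0 rather than -256, and the usual negative readings decode exactly as they did with the old `level - 0x100` rule.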

-- 
Regards,
Pavel Roskin

* Re: [PATCH]ip_options_fragment() has no effect on fragmentation
From: David S. Miller @ 2006-05-08  5:39 UTC
  To: weiyj; +Cc: netdev
In-Reply-To: <200605051350.03300.weiyj@soft.fujitsu.com>

From: weiyj@soft.fujitsu.com
Date: Fri, 5 May 2006 13:50:02 -0400

> Will this patch be used?

Your patch is in the tree already.

* Re: [PATCH]ip_options_fragment() has no effect on fragmentation
From: David S. Miller @ 2006-05-08  5:41 UTC
  To: weiyj; +Cc: netdev
In-Reply-To: <200605051350.03300.weiyj@soft.fujitsu.com>

From: weiyj@soft.fujitsu.com
Date: Fri, 5 May 2006 13:50:02 -0400

> Will this patch be used?
>
> This patch fixes the following problem: when I send the router an IPv4
> packet that contains the Record Route option and needs to be fragmented,
> the router does not fragment it correctly. After fragmentation by the
> router, the second fragment still contains the Record Route option.
> According to RFC 791, the Record Route option must not be copied on
> fragmentation; it goes in the first fragment only.

Actually it didn't get applied for some reason.

Please resubmit it properly with a full changelog and
"Signed-off-by: " lines, and make double sure that your
email client does not corrupt the patch so that I may
apply it cleanly.  Test this by sending it to yourself
and trying to apply the patch.

Thanks.

* Re: [PATCH] core: linkwatch should use jiffies64
From: David S. Miller @ 2006-05-08  6:07 UTC
  To: stefan; +Cc: netdev
In-Reply-To: <200605071213.56877.stefan@loplof.de>

From: Stefan Rompf <stefan@loplof.de>
Date: Sun, 7 May 2006 12:13:56 +0200

> the linkwatch code can overflow on a jiffies wrap, scheduling
> work with a too large delay. If the delay is >0x80000000,
> internal_add_timer() seems to overflow too, hiding the bug, so
> this isn't triggered too easily.

What is so special about what linkwatch is doing that it
needs this kind of treatment when other similar pieces of
code do not?

We have all sorts of interfaces, such as time_after() et al.,
to deal with wrapping issues.  Furthermore, using 64-bit
jiffies here might not be appropriate because they are not
guaranteed to be accessed atomically: nothing here prevents
the upper 32 bits from being updated while we sample the
lower 32 bits on 32-bit systems, which do the 64-bit jiffies
load using two 32-bit loads.
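For reference, the wrap-safe idiom that time_after() is built on compares timestamps by the sign of their difference rather than by their raw values. Below is a minimal sketch for an explicit 32-bit counter; time_after32() and delay_until() are hypothetical helpers, not the kernel's macros, and delay_until() only illustrates how a delay computed this way stays small across a wrap instead of becoming a huge unsigned value.

```c
#include <stdint.h>

/* Nonzero if counter value a is later than b, even if the counter
 * wrapped between them (valid while they are < 2^31 ticks apart). */
static int time_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Ticks from `now` until `deadline`, or 0 if the deadline has passed.
 * The subtraction is modulo 2^32, so it stays small across a wrap. */
static uint32_t delay_until(uint32_t now, uint32_t deadline)
{
	return time_after32(deadline, now) ? deadline - now : 0;
}
```

Because only the 32-bit difference matters, no 64-bit counter (and hence no non-atomic two-word load) is needed.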

* Re: [PATCH]ip_options_fragment() has no effect on fragmentation
From: Wei Yongjun @ 2006-05-06  0:36 UTC
  To: David S. Miller; +Cc: weiyj, netdev

I have tested the patch on a Linux system; hopefully this mail is not
corrupted this time.

Fix the options pointer in ip_options_fragment(): optptr pointed to the
start of the IPv4 header, but it must point to the IPv4 options that
follow the header.

Signed-off-by: Wei Yongjun <weiyj@soft.fujitsu.com>

--- a/net/ipv4/ip_options.c	2006-01-27 09:14:33.463612696 +0900
+++ b/net/ipv4/ip_options.c	2006-01-27 09:12:21.857619848 +0900
@@ -207,7 +207,7 @@

 void ip_options_fragment(struct sk_buff * skb) 
 {
-	unsigned char * optptr = skb->nh.raw;
+	unsigned char * optptr = skb->nh.raw + sizeof(struct iphdr);
 	struct ip_options * opt = &(IPCB(skb)->opt);
 	int  l = opt->optlen;
 	int  optlen;

On Monday 08 May 2006 01:41, David S. Miller wrote:
> Actually it didn't get applied for some reason.
>
> Please resubmit it properly with a full changelog and
> "Signed-off-by: " lines, and make double sure that your
> email client does not corrupt the patch so that I may
> apply it cleanly.  Test this by sending it to yourself
> and trying to apply the patch.
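To see why the old pointer made the function a no-op, here is a minimal userspace model of the option walk, assuming the usual IPv4 layout (a fixed 20-byte header followed by the options) and the RFC 791 rule that options whose "copied" bit (0x80) is clear belong in the first fragment only. The function name and macros are local to this sketch, not kernel API. With optptr at the start of the header, the first byte read is the version/IHL byte and the "length" byte is the TOS field, which is usually 0; a length check like the one below then stops the walk immediately, which fits the symptom in the subject line.

```c
#include <string.h>

#define IPHDR_LEN       20     /* fixed IPv4 header, without options */
#define IPOPT_END       0
#define IPOPT_NOOP      1
#define IPOPT_COPIED(t) ((t) & 0x80) /* RFC 791 "copied" flag */

/* Hypothetical model of the fixed ip_options_fragment(): overwrite
 * every option that must not be copied into later fragments with
 * NOOP bytes.  `optlen` is the length of the options area. */
static void options_fragment_sketch(unsigned char *iph, int optlen)
{
	unsigned char *optptr = iph + IPHDR_LEN; /* the fix: skip the header */
	int l = optlen;

	while (l > 0) {
		if (*optptr == IPOPT_END)
			return;
		if (*optptr == IPOPT_NOOP) { /* single-byte option */
			l--;
			optptr++;
			continue;
		}
		if (l < 2)
			return;                  /* no room for a length byte */
		int len = optptr[1];
		if (len < 2 || len > l)
			return;                  /* malformed option: stop */
		if (!IPOPT_COPIED(*optptr))
			memset(optptr, IPOPT_NOOP, len);
		l -= len;
		optptr += len;
	}
}
```

Replacing a non-copied option with NOOPs, rather than removing it, keeps the header length and the offsets of any remaining options unchanged.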

* Re: ipsec tunnel asymmetrical mtu
From: Marco Berizzi @ 2006-05-08  8:28 UTC
  To: herbert; +Cc: netdev
In-Reply-To: <E1FXVdb-0003Tt-00@gondolin.me.apana.org.au>

Herbert Xu wrote:

>However, the fact that the tcpdump causes more chunky packets to
>make it through could be an indication that there is a bug somewhere
>in our NAT/IPsec code or at least a suboptimal memory allocation
>strategy that's somehow avoided when AF_PACKET pins the skb down.

Ciao Herbert,

I have discovered another tricky behaviour. Take a look:

root@Mimosa:~# ping 10.49.59.23
PING 10.49.59.23 (10.49.59.23) 56(84) bytes of data.
64 bytes from 10.49.59.23: icmp_seq=1 ttl=247 time=91.9 ms
64 bytes from 10.49.59.23: icmp_seq=2 ttl=247 time=49.3 ms
64 bytes from 10.49.59.23: icmp_seq=3 ttl=247 time=106 ms
64 bytes from 10.49.59.23: icmp_seq=4 ttl=247 time=74.3 ms

--- 10.49.59.23 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2998ms
rtt min/avg/max/mdev = 49.316/80.460/106.257/21.241 ms
root@Mimosa:~# cd /tmp/
root@Mimosa:/tmp# tcpdump -v -p -n ip host 10.49.59.23 > /tmp/NULL-10.49.59.23 &
[1] 18981
root@Mimosa:/tmp# tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes

root@Mimosa:/tmp# ping 10.49.59.23
PING 10.49.59.23 (10.49.59.23) 56(84) bytes of data.

--- 10.49.59.23 ping statistics ---
8 packets transmitted, 0 received, 100% packet loss, time 6999ms

root@Mimosa:/tmp# fg
tcpdump -v -p -n ip host 10.49.59.23 >/tmp/NULL-10.49.59.23
101 packets captured
101 packets received by filter
0 packets dropped by kernel
root@Mimosa:/tmp# ping 10.49.59.23
PING 10.49.59.23 (10.49.59.23) 56(84) bytes of data.

--- 10.49.59.23 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2000ms

root@Mimosa:/tmp# cat NULL-10.49.59.23
10:09:44.401764 IP (tos 0x0, ttl 127, id 494, offset 0, flags [DF], length: 482) 172.22.1.84.1064 > 10.49.59.23.3218: P 2911920500:2911920942(442) ack 1338762722 win 32148
10:09:44.482254 IP (tos 0x0, ttl  52, id 49677, offset 0, flags [none], length: 40) 10.49.59.23.3218 > 172.29.128.1.1064: . [tcp sum ok] ack 2911920942 win 65535
10:09:45.152849 IP (tos 0x0, ttl  52, id 49827, offset 0, flags [none], length: 184) 10.49.59.23.3218 > 172.29.128.1.1064: P 0:144(144) ack 1 win 65535
10:09:45.341709 IP (tos 0x0, ttl 127, id 495, offset 0, flags [DF], length: 40) 172.22.1.84.1064 > 10.49.59.23.3218: . [tcp sum ok] ack 145 win 32004
10:09:47.028958 IP (tos 0x0, ttl 247, id 50107, offset 0, flags [none], length: 84) 10.49.59.23 > 172.29.128.1: icmp 64: echo reply seq 1
10:09:48.029890 IP (tos 0x0, ttl 247, id 50365, offset 0, flags [none], length: 84) 10.49.59.23 > 172.29.128.1: icmp 64: echo reply seq 2
10:09:49.026640 IP (tos 0x0, ttl 247, id 50565, offset 0, flags [none], length: 84) 10.49.59.23 > 172.29.128.1: icmp 64: echo reply seq 3

root@Mimosa:/tmp# ip r s
172.30.30.30 via 85.32.35.1 dev eth0
85.32.35.1 dev eth0  scope link
85.36.58.168/29 dev eth0  proto kernel  scope link  src 85.36.58.174
81.113.185.96/27 via 85.32.35.1 dev eth0
85.32.35.0/27 dev eth1  scope link
172.22.1.0/24 via 85.32.35.1 dev eth0  src 172.18.1.254
172.18.1.0/24 dev eth2  proto kernel  scope link  src 172.18.1.254
172.25.5.0/24 via 85.32.35.1 dev eth0
172.25.1.0/24 via 85.32.35.1 dev eth0
172.21.1.0/24 via 85.32.35.1 dev eth0
172.17.1.0/24 via 85.32.35.1 dev eth0
172.23.4.0/23 via 85.32.35.1 dev eth0
172.23.2.0/23 via 85.32.35.1 dev eth0
172.23.0.0/23 via 85.32.35.1 dev eth0
172.16.0.0/23 via 85.32.35.1 dev eth0
10.0.0.0/8 via 85.32.35.1 dev eth0  src 172.29.128.1
127.0.0.0/8 dev lo  scope link
default via 85.32.35.1 dev eth0  metric 1

After running 'tcpdump ip host x.y.z.w', replies to 'ping x.y.z.w'
no longer appear on the Mimosa console.



* Re: [rfc][patch] ipvs: use proper timeout instead of fixed value
From: Horms @ 2006-05-08  8:56 UTC
  To: Andy Gospodarek; +Cc: netdev, wensong, ja
In-Reply-To: <20060508031333.GA25913@gospo.rdu.redhat.com>

On Sun, May 07, 2006 at 11:13:33PM -0400, Andy Gospodarek wrote:
> On Sun, May 07, 2006 at 01:38:40PM +0900, Horms wrote:
> > On Fri, May 05, 2006 at 02:57:26PM -0400, Andy Gospodarek wrote:
> > > On Fri, May 05, 2006 at 12:20:54PM +0900, Horms wrote:
> > > > 
> > > > Sorry, I misunderstood your patch completely the first time around.
> > > > Yes, I think this is an excellent idea. Assuming it's tested and works
> > > > I'm happy to sign off on it and prod DaveM.
> > > 
> > > Horms,
> > > 
> > > I'll get a setup together and post results when I have them.
> > 
> > I was thinking that it would be nice if the timeout could be sent over
> > the wire, though that might bring in some compatibility issues,
> > and thus your approach might be the best idea.
> > 
> > -- 
> > Horms                                           http://www.vergenet.net/~horms/
> 
> 
> Horms,
> 
> At first I too thought that including the timeout in the messages would
> be nice, but it could raise compatibility issues, and maybe its only
> advantage would be quicker timeout of connections, something that might
> be undesirable in such an LVS environment.

Yes, I agree. Wensong raised the issue of having very many connections
on the backup. Personally, I think it's unlikely that would be a
problem in practice, though making it tunable might be nice in any case.

-- 
Horms                                           http://www.vergenet.net/~horms/


* Initial benchmarks of some VJ ideas [mmap memcpy vs copy_to_user].
From: Evgeniy Polyakov @ 2006-05-08 12:24 UTC
  To: netdev; +Cc: davem, caitlinb, kelly, rusty, johnpol

[-- Attachment #1: Type: text/plain, Size: 1668 bytes --]

I hope Van Jacobson does not take offence at the name shortening :)

I've slightly modified the UDP receive path and run several benchmarks
in the following cases:
1. plain recvfrom() using copy_to_user(), with 4k and 40k buffers.
2. recvfrom() remains the same, but instead of copy_to_user(), skb->data
is copied into a kernel buffer that can be mapped into userspace (again
with 4k and 40k buffers).
3. recvfrom() remains the same, but no data is copied at all; only the
iovec pointer is advanced and its length decreased.

The receiver is a simple single-threaded userspace application that does
blocking reads from a UDP socket with default socket/stack parameters.
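A minimal sketch of such a receiver, assuming IPv4 over loopback for illustration: one thread, default socket options, and blocking recvfrom() calls into a single buffer whose size plays the role of the 4k/40k buffers in the cases above. The helper names are hypothetical; this is not the program used for the measurements.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a UDP socket bound to 127.0.0.1 on a kernel-chosen port, with
 * default socket options; the bound address is stored in *addr. */
static int open_udp_receiver(struct sockaddr_in *addr)
{
	socklen_t len = sizeof(*addr);
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	*addr = (struct sockaddr_in){ .sin_family = AF_INET, .sin_port = 0 };
	addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
	    getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Receive `count` datagrams with blocking recvfrom() into one buffer
 * of `bufsize` bytes; returns the total bytes received, or -1. */
static long recv_datagrams(int fd, size_t bufsize, int count)
{
	char *buf = malloc(bufsize);
	long total = 0;

	if (buf == NULL)
		return -1;
	for (int i = 0; i < count; i++) {
		ssize_t n = recvfrom(fd, buf, bufsize, 0, NULL, NULL);
		if (n < 0) {
			total = -1;
			break;
		}
		total += n;
	}
	free(buf);
	return total;
}
```

Reusing one buffer keeps the userspace side cheap, so differences between the three kernel variants show up mostly in kernel CPU time.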

The receiver runs on a 2.4 GHz Xeon (HT enabled) with 1 GB of RAM and an e1000 gigabit NIC.
The sender runs on an amd64 NVIDIA nForce4 system with 1 GB of RAM and an r8169 NIC.
The machines are connected through a D-Link DGS-1216T gigabit switch.

Performance graph attached.

Conclusions:
at least in the UDP case with a 1 Gbit NIC, throughput did not increase,
but that may be due either to the NIC speed (I do not trust the NVIDIA
and/or Realtek hardware) or to a broken sender application.
So the only observable result here is the change in CPU usage:
it decreased by 30% for the copy_to_user() -> memcpy() change
with 40k buffers. 4k buffers are too small to show any performance
change because of syscall overhead.

Even if we translate the CPU savings into network speed, we still
cannot get a 6x (or even 2x) performance gain.

Admittedly, TCP processing is much more costly: the e1000 interrupt
handler is too big, and there are many context switches and other
cache-unfriendly and locking operations. Still, I wonder where the
6x (!) performance gain comes from.

-- 
	Evgeniy Polyakov

[-- Attachment #2: netchannel_speed.png --]
[-- Type: image/png, Size: 9120 bytes --]

* Re: [RFC 1/3] [IPSEC] xfrm: Undo afinfo lock proliferation
From: Herbert Xu @ 2006-05-08 13:47 UTC
  To: Ingo Oeser; +Cc: Miika Komu, Diego Beltrami, David S. Miller, netdev
In-Reply-To: <200605081522.14524.netdev@axxeo.de>

Hi Ingo:

On Mon, May 08, 2006 at 03:22:14PM +0200, Ingo Oeser wrote:
> 
> you wrote:
> > diff --git a/include/net/xfrm.h b/include/net/xfrm.h
> > --- a/include/net/xfrm.h
> > +++ b/include/net/xfrm.h
> > @@ -204,8 +204,7 @@ struct xfrm_type;
> >  struct xfrm_dst;
> >  struct xfrm_policy_afinfo {
> >  	unsigned short		family;
> > -	rwlock_t		lock;
> > -	struct xfrm_type_map	*type_map;
> > +	struct xfrm_type	*type_map[256];
> 
> What does this magic number "256" mean? I guess we need either
> a comment or a proper constant here. I prefer the latter.

It's just moving an existing line down, if you look further up in
the patch.  Of course I would have nothing against a patch that
replaced 256 with IPPROTO_MAX or something.

However, I prefer to not mix clean-ups with substantive changes so
if you have the time please post a separate patch for this.  Otherwise
I'll do a patch when I resubmit this.
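The 256 presumably gives one slot per value of the 8-bit IP protocol field, so any protocol byte is a valid index. A minimal sketch of a named-constant version, using a hypothetical local constant XFRM_PROTO_SLOTS (so the sketch does not depend on any particular header's IPPROTO_MAX) and trimmed-down hypothetical types, not the real xfrm structures:

```c
#include <stddef.h>

/* Hypothetical name for "one slot per 8-bit IP protocol number". */
#define XFRM_PROTO_SLOTS 256

struct xfrm_type_sketch {
	unsigned char proto; /* e.g. 50 for ESP, 51 for AH */
	const char *name;
};

struct afinfo_sketch {
	unsigned short family;
	const struct xfrm_type_sketch *type_map[XFRM_PROTO_SLOTS];
};

/* Register a type in its protocol's slot; fails if already taken. */
static int register_type(struct afinfo_sketch *afinfo,
			 const struct xfrm_type_sketch *type)
{
	if (afinfo->type_map[type->proto] != NULL)
		return -1;
	afinfo->type_map[type->proto] = type;
	return 0;
}

/* An unsigned char index can never exceed XFRM_PROTO_SLOTS - 1. */
static const struct xfrm_type_sketch *
lookup_type(const struct afinfo_sketch *afinfo, unsigned char proto)
{
	return afinfo->type_map[proto];
}
```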

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [RFC 1/3] [IPSEC] xfrm: Undo afinfo lock proliferation
From: Ingo Oeser @ 2006-05-08 13:58 UTC
  To: Herbert Xu; +Cc: Miika Komu, Diego Beltrami, David S. Miller, netdev
In-Reply-To: <20060508134729.GA13089@gondor.apana.org.au>

Hi Herbert,

Herbert Xu wrote:
> It's just moving an existing line down if you look further up in
> the patch.  Of course I would have nothing against a patch that
> replaced 256 with IPPROTO_MAX or something.

Ahh, that's the actual meaning here... 

> However, I prefer to not mix clean-ups with substantive changes so
> if you have the time please post a separate patch for this.  Otherwise
> I'll do a patch when I resubmit this.

Would be nice to include such a patch in this patchset, since you seem
to be able to decipher that magic value :-)

Thanks & Regards

Ingo Oeser

* [PATCH,RFC] set_security implementation inside softmac
From: Daniel Drake @ 2006-05-08 14:57 UTC
  To: netdev; +Cc: ipw2100-admin, softmac-dev

[-- Attachment #1: Type: text/plain, Size: 692 bytes --]

Hi,

This patch moves the boring set_security logic (which is currently 
duplicated line-for-line in zd1211 and bcm43xx) into softmac, and adds a 
hook so that drivers can still be notified about security changes (so that
they can upload keys to the device, etc).

I also attached a patch for bcm43xx to illustrate this.

However, the set_security design in ieee80211 feels somewhat sub-standard
(or have I just misunderstood it?): each driver is expected to populate
ieee->sec from that callback, and the driver gets no opportunity to say
"actually, I can't work with that configuration".

Is it worth fixing the whole thing, or is this kind of patch acceptable?

Thanks,
Daniel

[-- Attachment #2: softmac-set-security.patch --]
[-- Type: text/x-patch, Size: 2888 bytes --]

Index: linux/include/net/ieee80211softmac.h
===================================================================
--- linux.orig/include/net/ieee80211softmac.h
+++ linux/include/net/ieee80211softmac.h
@@ -172,6 +172,10 @@ struct ieee80211softmac_device {
 	void (*set_bssid_filter)(struct net_device *dev, const u8 *bssid);
 	void (*set_channel)(struct net_device *dev, u8 channel);
 
+	/* implement this if you need to configure hardware encryption
+	 * when the user changes security settings */
+	void (*set_security)(struct net_device *dev);
+
 	/* assign if you need it, informational only */
 	void (*link_change)(struct net_device *dev);
 
Index: linux/net/ieee80211/softmac/ieee80211softmac_module.c
===================================================================
--- linux.orig/net/ieee80211/softmac/ieee80211softmac_module.c
+++ linux/net/ieee80211/softmac/ieee80211softmac_module.c
@@ -27,6 +27,51 @@
 #include "ieee80211softmac_priv.h"
 #include <linux/sort.h>
 #include <linux/etherdevice.h>
+#include <net/ieee80211.h>
+
+static void set_security(struct net_device *dev,
+	struct ieee80211_security *sec)
+{
+	struct ieee80211softmac_device *mac = ieee80211_priv(dev);
+	struct ieee80211_device *ieee = mac->ieee;
+	struct ieee80211_security *secinfo = &ieee->sec;
+	int keyidx;
+
+	dprintk(KERN_NOTICE PFX "set_security:\n");
+	secinfo->flags = sec->flags;
+
+	for (keyidx = 0; keyidx < WEP_KEYS; keyidx++)
+		if (sec->flags & (1 << keyidx)) {
+			secinfo->encode_alg[keyidx] = sec->encode_alg[keyidx];
+			secinfo->key_sizes[keyidx] = sec->key_sizes[keyidx];
+			memcpy(secinfo->keys[keyidx], sec->keys[keyidx],
+			       SCM_KEY_LEN);
+		}
+
+	if (sec->flags & SEC_ACTIVE_KEY) {
+		secinfo->active_key = sec->active_key;
+		dprintk("   .active_key = %d\n", sec->active_key);
+	}
+	if (sec->flags & SEC_UNICAST_GROUP) {
+		secinfo->unicast_uses_group = sec->unicast_uses_group;
+		dprintk("   .unicast_uses_group = %d\n", sec->unicast_uses_group);
+	}
+	if (sec->flags & SEC_LEVEL) {
+		secinfo->level = sec->level;
+		dprintk("   .level = %d\n", sec->level);
+	}
+	if (sec->flags & SEC_ENABLED) {
+		secinfo->enabled = sec->enabled;
+		dprintk("   .enabled = %d\n", sec->enabled);
+	}
+	if (sec->flags & SEC_ENCRYPT) {
+		secinfo->encrypt = sec->encrypt;
+		dprintk("   .encrypt = %d\n", sec->encrypt);
+	}
+
+	if (mac->set_security)
+		mac->set_security(dev);
+}
 
 struct net_device *alloc_ieee80211softmac(int sizeof_priv)
 {
@@ -44,6 +89,7 @@ struct net_device *alloc_ieee80211softma
 	softmac->ieee->handle_assoc_response = ieee80211softmac_handle_assoc_response;
 	softmac->ieee->handle_reassoc_request = ieee80211softmac_handle_reassoc_req;
 	softmac->ieee->handle_disassoc = ieee80211softmac_handle_disassoc;
+	softmac->ieee->set_security = set_security;
 	softmac->scaninfo = NULL;
 
 	softmac->associnfo.scan_retry = IEEE80211SOFTMAC_ASSOC_SCAN_RETRY_LIMIT;

[-- Attachment #3: bcm43xx-set-security.patch --]
[-- Type: text/x-patch, Size: 3372 bytes --]

Index: linux/drivers/net/wireless/bcm43xx/bcm43xx_main.c
===================================================================
--- linux.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c
+++ linux/drivers/net/wireless/bcm43xx/bcm43xx_main.c
@@ -3548,8 +3548,7 @@ static void bcm43xx_ieee80211_set_chan(s
 }
 
 /* set_security() callback in struct ieee80211_device */
-static void bcm43xx_ieee80211_set_security(struct net_device *net_dev,
-					   struct ieee80211_security *sec)
+static void bcm43xx_ieee80211_set_security(struct net_device *net_dev)
 {
 	struct bcm43xx_private *bcm = bcm43xx_priv(net_dev);
 	struct ieee80211_security *secinfo = &bcm->ieee->sec;
@@ -3560,42 +3559,15 @@ static void bcm43xx_ieee80211_set_securi
 
 	bcm43xx_lock_mmio(bcm, flags);
 
-	for (keyidx = 0; keyidx<WEP_KEYS; keyidx++)
-		if (sec->flags & (1<<keyidx)) {
-			secinfo->encode_alg[keyidx] = sec->encode_alg[keyidx];
-			secinfo->key_sizes[keyidx] = sec->key_sizes[keyidx];
-			memcpy(secinfo->keys[keyidx], sec->keys[keyidx], SCM_KEY_LEN);
-		}
-	
-	if (sec->flags & SEC_ACTIVE_KEY) {
-		secinfo->active_key = sec->active_key;
-		dprintk(KERN_INFO PFX "   .active_key = %d\n", sec->active_key);
-	}
-	if (sec->flags & SEC_UNICAST_GROUP) {
-		secinfo->unicast_uses_group = sec->unicast_uses_group;
-		dprintk(KERN_INFO PFX "   .unicast_uses_group = %d\n", sec->unicast_uses_group);
-	}
-	if (sec->flags & SEC_LEVEL) {
-		secinfo->level = sec->level;
-		dprintk(KERN_INFO PFX "   .level = %d\n", sec->level);
-	}
-	if (sec->flags & SEC_ENABLED) {
-		secinfo->enabled = sec->enabled;
-		dprintk(KERN_INFO PFX "   .enabled = %d\n", sec->enabled);
-	}
-	if (sec->flags & SEC_ENCRYPT) {
-		secinfo->encrypt = sec->encrypt;
-		dprintk(KERN_INFO PFX "   .encrypt = %d\n", sec->encrypt);
-	}
 	if (bcm->initialized && !bcm->ieee->host_encrypt) {
 		if (secinfo->enabled) {
 			/* upload WEP keys to hardware */
 			char null_address[6] = { 0 };
 			u8 algorithm = 0;
 			for (keyidx = 0; keyidx<WEP_KEYS; keyidx++) {
-				if (!(sec->flags & (1<<keyidx)))
+				if (!(secinfo->flags & (1<<keyidx)))
 					continue;
-				switch (sec->encode_alg[keyidx]) {
+				switch (secinfo->encode_alg[keyidx]) {
 					case SEC_ALG_NONE: algorithm = BCM43xx_SEC_ALGO_NONE; break;
 					case SEC_ALG_WEP:
 						algorithm = BCM43xx_SEC_ALGO_WEP;
@@ -3614,7 +3586,7 @@ static void bcm43xx_ieee80211_set_securi
 						assert(0);
 						break;
 				}
-				bcm43xx_key_write(bcm, keyidx, algorithm, sec->keys[keyidx], secinfo->key_sizes[keyidx], &null_address[0]);
+				bcm43xx_key_write(bcm, keyidx, algorithm, secinfo->keys[keyidx], secinfo->key_sizes[keyidx], &null_address[0]);
 				bcm->key[keyidx].enabled = 1;
 				bcm->key[keyidx].algorithm = algorithm;
 			}
@@ -3695,6 +3667,7 @@ static int bcm43xx_init_private(struct b
 	bcm->ieee = netdev_priv(net_dev);
 	bcm->softmac = ieee80211_priv(net_dev);
 	bcm->softmac->set_channel = bcm43xx_ieee80211_set_chan;
+	bcm->softmac->set_security = bcm43xx_ieee80211_set_security;
 
 	bcm->irq_savedstate = BCM43xx_IRQ_INITIAL;
 	bcm->pci_dev = pci_dev;
@@ -3730,7 +3703,6 @@ static int bcm43xx_init_private(struct b
 	
 	bcm->ieee->iw_mode = BCM43xx_INITIAL_IWMODE;
 	bcm->ieee->tx_headroom = sizeof(struct bcm43xx_txhdr);
-	bcm->ieee->set_security = bcm43xx_ieee80211_set_security;
 	bcm->ieee->hard_start_xmit = bcm43xx_ieee80211_hard_start_xmit;
 
 	return 0;
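The division of labour in these two patches, with a generic helper that merges only the flagged fields into the shared ieee->sec and then calls an optional driver hook that reads the merged state, can be modeled in a few lines of userspace C. All types, bit values and names below are hypothetical stand-ins, not the ieee80211/softmac API; the point is that the hook reads the merged state rather than the incoming update, as the bcm43xx change does with secinfo.

```c
#include <string.h>

/* Hypothetical, trimmed-down stand-ins; bit values are illustrative. */
#define WEP_KEYS    4
#define KEY_LEN     32
#define SEC_LEVEL   (1u << 4)
#define SEC_ENABLED (1u << 5)

struct security {
	unsigned int flags; /* which fields below are valid */
	unsigned char keys[WEP_KEYS][KEY_LEN];
	unsigned char key_sizes[WEP_KEYS];
	int level;
	int enabled;
};

struct softmac {
	struct security sec;                    /* shared state */
	void (*set_security)(struct softmac *); /* optional driver hook */
};

/* Generic helper: merge only the flagged fields, then notify driver. */
static void softmac_set_security(struct softmac *mac,
				 const struct security *update)
{
	struct security *s = &mac->sec;

	s->flags = update->flags;
	for (int k = 0; k < WEP_KEYS; k++)
		if (update->flags & (1u << k)) {
			s->key_sizes[k] = update->key_sizes[k];
			memcpy(s->keys[k], update->keys[k], KEY_LEN);
		}
	if (update->flags & SEC_LEVEL)
		s->level = update->level;
	if (update->flags & SEC_ENABLED)
		s->enabled = update->enabled;
	if (mac->set_security)
		mac->set_security(mac);
}

/* Example driver hook: a real driver would upload keys here.  This
 * one only counts the key slots flagged in the merged state. */
static int flagged_keys;

static void example_driver_hook(struct softmac *mac)
{
	flagged_keys = 0;
	for (int k = 0; k < WEP_KEYS; k++)
		if (mac->sec.flags & (1u << k))
			flagged_keys++;
}
```

Keeping the merge in one place means a driver hook only has to translate the already-merged state into hardware operations.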

* Re: [rfc][patch] ipvs: use proper timeout instead of fixed value
From: Wensong Zhang @ 2006-05-08 16:39 UTC
  To: Andy Gospodarek; +Cc: netdev, horms, ja
In-Reply-To: <20060508032700.GB25913@gospo.rdu.redhat.com>


Hi Andy,

On Sun, 7 May 2006, Andy Gospodarek wrote:

> On Sun, May 07, 2006 at 11:32:00PM +0800, Wensong Zhang wrote:
>>
>> Hi Andy,
>>
>> Yes, the original synchronization design is a somewhat arbitrary, or
>> compromise, solution. We didn't want to synchronize every state change
>> from the master to the backup load balancer, because we were afraid
>> there would be too many state-change synchronization messages and some
>> performance penalty. So we only synchronize TCP connections in the
>> ESTABLISHED state, or UDP connections, to the backup load balancer, and
>> use a timeout of 3 minutes.
>>
>> Your change is probably ok, but we should be aware that it may create
>> many more connection entries on the backup load balancer for TCP
>> applications than on the master, because there is no connection-close
>> synchronization and the timeout changes from 3 minutes to 15 minutes.
>
> Wensong,
>
> This is an interesting point.  Do you feel it would be reasonable to
> synchronize states for established connections and terminating
> connections rather than all state changes?  It would seem desirable to
> keep the table at a minimum size without tracking every state change,
> since that could create too much noise between master and backup,
> especially for short-lived connections.
>

Yeah, it's reasonable, but we also need to synchronize state updates for
established connections that keep receiving packets, so that we can
maintain their timeouts more accurately. It's really a trade-off between
keeping connection timeouts accurate and keeping the volume of
synchronization messages as low as possible.
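One way to picture this trade-off is a packet-count-based resync rule. The sketch below sends a sync message for a UDP or ESTABLISHED TCP connection whenever its packet count reaches threshold, threshold + period, threshold + 2 * period, and so on. The rule is assumed to mirror IPVS's two-value sync_threshold sysctl, but the function, names and default values here are illustrative assumptions, not the kernel code.

```c
#include <stdbool.h>

/* Illustrative (threshold, period) pair; the values are assumptions. */
static const int sync_threshold[2] = { 3, 50 };

/* Should the master send a sync message after this packet?  Only UDP
 * and ESTABLISHED TCP connections qualify, and only on packet counts
 * threshold, threshold + period, ..., so a long-lived connection is
 * refreshed on the backup periodically rather than on every packet. */
static bool should_sync(bool is_tcp, bool established, int in_pkts)
{
	if (is_tcp && !established)
		return false;
	return in_pkts % sync_threshold[1] == sync_threshold[0];
}
```

Shrinking the period keeps backup timeouts more accurate at the cost of more sync traffic; growing it does the opposite.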

> I could work some more to expand and test this patch to include more
> detailed connection tracking if you (or anyone else) feel it would be
> valuable.
>

Maybe it's better to write a more detailed design first.

Thanks,

Wensong

> -andy
>
>
>>
>> The simple solution is probably to make the timeout value tunable, so
>> that users can tune it for their application. The better solution is to
>> synchronize every connection state change from master to backup, so that
>> the backup has almost the same connection state.
>>
>> Thanks,
>>
>> Wensong
>>
>>
>> On Thu, 4 May 2006, Andy Gospodarek wrote:
>>
>>>
>>> Instead of using the default timeout of 3 minutes, this uses the timeout
>>> specific to the protocol used for the connection. The 3 minute timeout
>>> seems somewhat arbitrary (though I know it is used other places in the
>>> ipvs code) and when failing over it would be much nicer to use one of
>>> the configured timeout values.
>>>
>>> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
>>> ---
>>>
>>> ip_vs_sync.c |    5 +++--
>>> 1 files changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/ipv4/ipvs/ip_vs_sync.c b/net/ipv4/ipvs/ip_vs_sync.c
>>> --- a/net/ipv4/ipvs/ip_vs_sync.c
>>> +++ b/net/ipv4/ipvs/ip_vs_sync.c
>>> @@ -67,7 +67,6 @@ struct ip_vs_sync_conn_options {
>>> 	struct ip_vs_seq        out_seq;        /* outgoing seq. struct */
>>> };
>>>
>>> -#define IP_VS_SYNC_CONN_TIMEOUT (3*60*HZ)
>>> #define SIMPLE_CONN_SIZE  (sizeof(struct ip_vs_sync_conn))
>>> #define FULL_CONN_SIZE  \
>>> (sizeof(struct ip_vs_sync_conn) + sizeof(struct ip_vs_sync_conn_options))
>>> @@ -279,6 +278,7 @@ static void ip_vs_process_message(const
>>> 	struct ip_vs_sync_conn *s;
>>> 	struct ip_vs_sync_conn_options *opt;
>>> 	struct ip_vs_conn *cp;
>>> +	struct ip_vs_protocol *pp;
>>> 	char *p;
>>> 	int i;
>>>
>>> @@ -337,7 +337,8 @@ static void ip_vs_process_message(const
>>> 			p += SIMPLE_CONN_SIZE;
>>>
>>> 		atomic_set(&cp->in_pkts, sysctl_ip_vs_sync_threshold[0]);
>>> -		cp->timeout = IP_VS_SYNC_CONN_TIMEOUT;
>>> +		pp = ip_vs_proto_get(s->protocol);
>>> +		cp->timeout = pp->timeout_table[cp->state];
>>> 		ip_vs_conn_put(cp);
>>>
>>> 		if (p > buffer+buflen) {
>>>
>> -
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
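The quoted patch takes the timeout from the protocol's table, while the earlier reply suggests making the old fixed value tunable. The two ideas can be combined in a small lookup. This sketch uses hypothetical names: `struct proto_timeouts` stands in for `struct ip_vs_protocol`, `sync_default_timeout` stands in for a sysctl that does not necessarily exist, and `HZ` is fixed at an assumed 1000. Instead of dereferencing the protocol pointer unconditionally as the patch does, the lookup falls back to the default when no protocol entry is found or the state index is out of range.

```c
#include <assert.h>
#include <stddef.h>

#define HZ 1000 /* assumed tick rate, for the sketch only */

/* Hypothetical tunable, initialised to the old fixed value (3 minutes). */
static int sync_default_timeout = 3 * 60 * HZ;

/* Hypothetical stand-in for the per-protocol timeout table. */
struct proto_timeouts {
    const int *timeout_table; /* one timeout (in ticks) per state */
    int num_states;
};

/* Pick the timeout for a connection received from the master. */
static int sync_conn_timeout(const struct proto_timeouts *pp, int state)
{
    if (pp == NULL || pp->timeout_table == NULL ||
        state < 0 || state >= pp->num_states)
        return sync_default_timeout;
    return pp->timeout_table[state];
}
```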

* Re: IPv6 connect() from site-local to global IPv6 address.
From: Rick Jones @ 2006-05-08 16:44 UTC (permalink / raw)
  To: David Woodhouse
  Cc: YOSHIFUJI Hideaki / 吉藤英明, netdev
In-Reply-To: <1146926830.2503.67.camel@shinybook.infradead.org>

>>Anyway, it is valid to use an (obsolete) site-local source address
>>for a global destination address.
>>The problem seems to be that the router does NOT send an ICMPv6 destination
>>unreachable to the sender. I don't know why, but it SHOULD.
> 
> 
> I'll pursue that question later. It wouldn't be _sufficient_ since there
> are (buggy) programs, including Evolution, which will not fall back to
> the second and subsequent addresses from getaddrinfo() -- they'll just
> give up when the first attempt to connect fails. So we really do need
> the IPv4 address to be listed _first_ in the results, as it used to be.

Or get the applications fixed, no?  Kludging around application bugs
sounds a bit like the "Fram Oil Filter" commercial where the mechanic is
grinning while he says "You can pay me now, or you can pay me later." As
in: pay for the slightly more expensive oil filter now, or for engine repair
later.

> I wish to deploy IPv6 internally so that we can develop and test IPv6
> support. There is _no_ chance of getting proper IPv6 connectivity to the
> outside world through the corporate firewall. I'd like IPv6 to be usable
> _internally_ though, without breaking connectivity to the outside world
> over IPv4.
> 
> How should I do this?

Other than fixing the applications that only take the first response
(isn't that a generic application bug going back nearly decades now?
Amazing how things stay the same, isn't it?), can you run a caching-only
name server at the edge that filters out the IPv6 responses so your
systems never see global IPv6 responses?
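The application-side fix is the standard getaddrinfo() loop: request all address families, try each returned address in order, and fall back to the next one when connect() fails. A minimal sketch (the function name `connect_any` is hypothetical):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Connect to host:service, trying every address getaddrinfo() returns
 * (IPv6 and IPv4 alike) in order. Returns a connected fd or -1. */
static int connect_any(const char *host, const char *service)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1, rc;

    assert(host != NULL && service != NULL);
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* do not restrict to one family */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_protocol = IPPROTO_TCP;

    rc = getaddrinfo(host, service, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return -1;
    }

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;                 /* e.g. family not supported here */
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* success: keep this socket */
        close(fd);                    /* failure: try the next address */
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

A blocking connect() to an address whose packets are silently dropped can take a long time to fail. A real client would therefore likely use a non-blocking connect with a timeout before moving on to the next address; that part is omitted here.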

rick jones

* Re: [PATCH] netdev: hotplug napi race cleanup
From: Stephen Hemminger @ 2006-05-08 16:54 UTC (permalink / raw)
  To: David S. Miller; +Cc: herbert, patrakov, netdev, akpm
In-Reply-To: <20060506.180947.35317492.davem@davemloft.net>

On Sat, 06 May 2006 18:09:47 -0700 (PDT)
"David S. Miller" <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@osdl.org>
> Date: Mon, 24 Apr 2006 15:23:41 -0700
> 
> > This follows after the earlier two patches.
> > 
> > Change the initialization of the class device portion of the net device
> > to be done earlier, so that any races before registration completes are
> > harmless.  Add a mutex to avoid changes to netdevice during the
> > class device registration. 
> > 
> > Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
> 
> I'm not going to apply this patch and instead request that we think
> about why this problem exists in the first place.
> 
> This patch is even stronger evidence that doing the sysfs registry in
> the todo list processing is wrong.  If you can legally do this while
> holding the rtnl semaphore, you can just as equally do it inside of
> register_netdevice() which is where it truly belongs.
> 
> Then you can handle errors properly, unwind the state, and return the
> error to the caller instead of just losing the error and leaving the
> device in a half-registered state.

The issue is: are there network devices that can't sleep during
register_netdevice()?

* Re: [PATCH] TCP congestion module: add TCP-LP supporting for 2.6.16.14
From: Pavel Machek @ 2006-05-08 11:29 UTC (permalink / raw)
  To: Wong Edison; +Cc: netdev, linux-kernel
In-Reply-To: <3feffd230605062232m1b9a3951h6d21071cdacc890f@mail.gmail.com>

Hi!

> TCP Low Priority is a distributed algorithm whose goal is to utilize
> only the excess network bandwidth as compared to the "fair share" of
> bandwidth as targeted by TCP. Available from:
>   http://www.ece.rice.edu/~akuzma/Doc/akuzma/TCP-LP.pdf

Nice... I'd like to use something like this on my (overloaded)
GPRS/EDGE link.

Unfortunately, the patch does not include a documentation update AFAICS.
How do I use it? "net-nice -n 19 rsync" would be nice, but I guess that
would be quite complex...?
							Pavel
-- 
Thanks for all the (sleeping) penguins.

* Re: dBm cutoff at -1dBm is too low
From: Jean Tourrilhes @ 2006-05-08 17:17 UTC (permalink / raw)
  To: Pavel Roskin; +Cc: NetDev
In-Reply-To: <1146972732.24434.89.camel@dv>

On Sat, May 06, 2006 at 11:32:12PM -0400, Pavel Roskin wrote:
> On Fri, 2006-05-05 at 10:28 -0700, Jean Tourrilhes wrote:
> > 
> > 	There are still quite a few drivers which have not been
> > converted to use IW_QUAL_DBM, so I don't want to drop the backward
> > compatibility yet.
> 
> But shouldn't you trust the drivers using IW_QUAL_DBM, whether the value
> is positive or negative?

	You can't remove the test, making the rest pointless. Old-style
drivers never used the flags, new-style drivers that don't report
dBm will never use the flags, and there is no way to distinguish
the two apart from the 'sign' of the value.
	We may want to perform an audit of the various drivers,
in-tree and out-of-tree, to see how much progress we have made so far.

> The problem is the driver has to take care of it.  It cannot just take
> the dBm value from the card and pass it to userspace.  It has to limit
> the value at -1.  Otherwise iwconfig would show -256dBm or something
> like that.  I can imagine that some GUI can decide that the connection
> has become very bad, and that would confuse the user.

	Correct.

> I suggest -192dBm to 63dBm.  That's enough padding on both sides, so
> that the drivers can just pass the firmware value without checking.

	Yep, seems reasonable.

> Pavel Roskin

	Jean
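The proposed range fits the 8-bit level field if values 0..63 are read as positive dBm and 64..255 as -192..-1 dBm (value minus 256). A minimal sketch of that mapping, assuming the level is carried in an unsigned 8-bit field as in `struct iw_quality`. The helper names are hypothetical, and the clamp in `dbm_encode()` only matters for values outside the proposed range:

```c
#include <assert.h>
#include <stdint.h>

#define DBM_MIN (-192) /* encoded as 64 */
#define DBM_MAX 63     /* encoded as 63 */

/* Encode a dBm value into the 8-bit level field, clamping to the
 * proposed range. Conversion to uint8_t is modulo 256, so -1 -> 255. */
static uint8_t dbm_encode(int dbm)
{
    if (dbm < DBM_MIN)
        dbm = DBM_MIN;
    if (dbm > DBM_MAX)
        dbm = DBM_MAX;
    return (uint8_t)dbm;
}

/* Decode the 8-bit level field back to dBm: 0..63 stay positive,
 * 64..255 map to -192..-1. */
static int dbm_decode(uint8_t v)
{
    return v > DBM_MAX ? (int)v - 256 : (int)v;
}
```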

* Re: [PATCH] TCP congestion module: add TCP-LP supporting for 2.6.16.14
From: Wong Edison @ 2006-05-08 17:25 UTC (permalink / raw)
  To: Pavel Machek; +Cc: netdev, linux-kernel
In-Reply-To: <20060508112915.GB4162@ucw.cz>

Umm...
As I am new to submitting patches,
and although I have already read a lot of references,
I know only a little about the required format :(

I would like to do whatever I need to do.
Could you give me some hints about how to do so?

net-nice -n 19 rsync??
Sorry, I don't know what that is... :(

As for how to use it:
1. patch the source tree
2. choose TCP-LP as a module and build it
3. after installing, use:
  sysctl -w net.ipv4.tcp_congestion_control=lp
4. you will then find that your connections have relatively lower
   priority than those of other computers

I put my work on my site:
http://edin.no-ip.com/project/tcp-lp/

Regards,
Edison

2006/5/8, Pavel Machek <pavel@suse.cz>:
> Hi!
>
> > TCP Low Priority is a distributed algorithm whose goal is to utilize
> > only the excess network bandwidth as compared to the "fair share" of
> > bandwidth as targeted by TCP. Available from:
> >   http://www.ece.rice.edu/~akuzma/Doc/akuzma/TCP-LP.pdf
>
> Nice... I'd like to use something like this on my (overloaded)
> GPRS/EDGE link.
>
> Unfortunately, the patch does not include a documentation update AFAICS.
> How do I use it? "net-nice -n 19 rsync" would be nice, but I guess that
> would be quite complex...?
>                                                         Pavel
> --
> Thanks for all the (sleeping) penguins.
>

* Re: 2.4 kern: want to print TCP cwnd with every packet
From: Stephen Hemminger @ 2006-05-08 17:31 UTC (permalink / raw)
  To: George P Nychis; +Cc: netdev
In-Reply-To: <28365.128.237.227.255.1146925156.squirrel@128.237.227.255>

On Sat, 6 May 2006 10:19:16 -0400 (EDT)
"George P Nychis" <gnychis@cmu.edu> wrote:

> Hi,
> 
> I'd like to print the TCP cwnd for the sender with every packet, before it is sent out.  This way I could plot the sender window over time to show TCP's behavior in certain conditions.
> 
> I see several places in tcp_input.c where I could print the current window, but I'd have to add code in multiple places.  I was wondering if there is any one place, right before a packet is sent out, where I could printk() tp->snd_cwnd.
> 
> Thanks!
> George

Look at http://developer.osdl.org/shemminger/prototypes/tcpprobe.tar.gz
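tcpprobe hooks into the kernel, which is what true per-packet logging needs. If sampling once per write() from the sending application is good enough, the TCP_INFO socket option (a different technique from tcpprobe) exposes the sender's congestion window as `tcpi_snd_cwnd`, in segments. A minimal sketch, assuming a kernel and C library that provide TCP_INFO and `struct tcp_info`. The helper names are hypothetical:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the sender's congestion window (in segments) for a connected
 * TCP socket, or -1 if TCP_INFO cannot be read. */
static long snd_cwnd(int fd)
{
    struct tcp_info info;
    socklen_t len = sizeof(info);

    memset(&info, 0, sizeof(info));
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) != 0)
        return -1;
    return (long)info.tcpi_snd_cwnd;
}

/* Send a buffer, then log the cwnd seen right after the write. This
 * samples once per send() call, not once per packet on the wire. */
static ssize_t send_and_log(int fd, const void *buf, size_t n)
{
    ssize_t sent;

    assert(buf != NULL || n == 0);
    sent = send(fd, buf, n, 0);
    fprintf(stderr, "sent=%zd cwnd=%ld\n", sent, snd_cwnd(fd));
    return sent;
}
```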

* Re: [PATCH] TCP congestion module: add TCP-LP supporting for 2.6.16.14
From: David S. Miller @ 2006-05-08 17:43 UTC (permalink / raw)
  To: pavel; +Cc: hswong3i, netdev, linux-kernel
In-Reply-To: <20060508112915.GB4162@ucw.cz>

From: Pavel Machek <pavel@suse.cz>
Date: Mon, 8 May 2006 11:29:15 +0000

> Hi!
> 
> > TCP Low Priority is a distributed algorithm whose goal is to utilize
> > only the excess network bandwidth as compared to the "fair share" of
> > bandwidth as targeted by TCP. Available from:
> >   http://www.ece.rice.edu/~akuzma/Doc/akuzma/TCP-LP.pdf
> 
> Nice... I'd like to use something like this on my (overloaded)
> GPRS/EDGE link.
> 
> Unfortunately, the patch does not include a documentation update AFAICS.
> How do I use it? "net-nice -n 19 rsync" would be nice, but I guess that
> would be quite complex...?

You could select it as the default congestion control algorithm
in your kernel config, but that's probably not what you want.

Or, just include it, and select it with the TCP_CONGESTION socket
option when you want it.  Sorry, this does require app modifications.
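Per-socket selection comes down to one setsockopt() call at the IPPROTO_TCP level. A minimal sketch with hypothetical helper names. `set_congestion()` selects an algorithm by name (for example "lp" once the module is loaded), and `get_congestion()` reads back the name in use. The fallback `#define` of `TCP_CONGESTION` to 13 is only for userspace headers that predate the option. On current kernels, a failure with ENOENT usually means the requested algorithm is not available. Whether a socket returned by accept() inherits the listener's choice may depend on the kernel version, so this sketch does not rely on it; calling `set_congestion()` on the accepted socket is the unambiguous option.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_CONGESTION
#define TCP_CONGESTION 13 /* Linux option number, for old headers */
#endif

/* Select a congestion control algorithm (e.g. "lp") for this socket only.
 * Returns 0 on success, -1 with errno set on failure. */
static int set_congestion(int fd, const char *name)
{
    assert(name != NULL);
    return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name,
                      (socklen_t)strlen(name));
}

/* Copy the name of the algorithm in use into buf (always NUL-terminated).
 * Returns 0 on success, -1 with errno set on failure. */
static int get_congestion(int fd, char *buf, size_t size)
{
    socklen_t len;

    assert(buf != NULL && size > 1);
    memset(buf, 0, size);
    len = (socklen_t)(size - 1); /* leave room for the terminator */
    return getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len);
}
```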

* Re: [RFC] SECMARK 1.0
From: Karl MacMillan @ 2006-05-08 17:41 UTC (permalink / raw)
  To: James Morris
  Cc: Joshua Brindle, selinux, netdev, netfilter-devel, Stephen Smalley,
	Daniel J Walsh
In-Reply-To: <Pine.LNX.4.64.0605071323330.9423@d.namei>

On Sun, 2006-05-07 at 13:43 -0400, James Morris wrote:
> On Sun, 7 May 2006, Joshua Brindle wrote:
> 
> > It looks like you are labeling all packets on an established connection as
> > tracked_packet_t. What is the rationale for not labeling all ftp traffic as
> > ftpd_packet_t? Granted that it's very unlikely for established connections to
> > go to the wrong process, but the SELinux policy should be able to clearly show
> > that ftpd and sshd cannot see each other's packets, but these policies say that
> > they can both send/receive tracked_packet_t.
> 
> Yes.  This is due to the way connection tracking works, there's no 
> indication of which connection the packet belongs to, just its state.  We 
> depend on conntrack to correctly determine the state of the packet, 
> anyway, and we don't have real IP security without something like IPsec.
> 
> SELinux policy analysis will need to work on the assumption that conntrack 
> is working correctly, and that a packet which is a valid part of an ftp 
> connection won't end up delivering data to sshd.
> 
> We could look at something similar to the CONNMARK target, where 
> connections are labeled, but it's very ugly and really not useful, as we 
> assume that conntrack is working correctly.
> 

Something like CONNMARK seems preferable to me (perhaps even allowing
type_transition rules to give the related packets a unique type). This
makes the labeling reflect the real security property of the packets.
Yes, we are trusting the conntrack to mark the packets accurately, but
it makes the policy match the intent. Otherwise it is not possible to
reason about information flow using just the policy.

Are there serious downsides to this approach?

> You can always not use conntrack and emulate the existing controls, as 
> well.
> 

Yes, but gaining connection tracking is a major advantage of this
approach over the existing controls.

Karl

-- 
Karl MacMillan
Tresys Technology
www.tresys.com
> > Obviously these are just examples, I'm just curious if there was a reason to
> > label established packets differently from the new connection packets (and the
> > same as all the other established packets)
> 
> The label for the new connection packet is essential for ensuring that the 
> connection being created is the right type.  We know that if a valid SYN 
> arrives on port 22, that this is an attempt to establish a connection with 
> sshd (assuming it's configured to listen on port 22).  Then, conntrack 
> observes the TCP handshake, and creates connection state if successful.
> 
> Also, we only want sshd to receive new connections here, not create them.
> 
> > I imagine that, at least at first, it would be good to have allow domain
> > unlabeled_t:packet { send recv }; in an (enabled) conditional so that the
> > migration will be easier.
> 
> See 
> 
> http://people.redhat.com/jmorris/selinux/secmark/policy/packettest/packettest.te
> 
> # Totally insecure, for testing.
> #
> # Allow all unlabled packets to all domains.
> #
> allow domain unlabeled_t:packet { send recv };
> 
> As the policy is generated by macros, it should be very simple to change 
> just a few of the macros to generate compatible policy rules for the new 
> controls.
> 
> > Also, we need to come up with a mechanism for distributing default marking
> > rules that can accompany a policy. The rules could go into a section in the
> > .pp file but how does that integrate with various firewall systems that take
> > control of the iptables rules?
> 
> I believe the way to handle this is to create SELinux input and output 
> chains and then add all of the SELinux labeling rules via them, so the 
> SELinux rules are at least partitioned off, and these tools will be able 
> to treat them opaquely. Have a look at the example rulesets I posted on 
> the web site.
> 
> > And finally, what happens if the labeling rule changes during an established
> > connection? Do the packets related to that connection retain the original
> > label or will they get the new label?
> 
> It depends on which rule is changed.  If it's for a NEW connection, only 
> packets hitting that will get the new label.  For ESTABLISHED & RELATED, 
> all packets on those rules will get the new label once the rule is 
> updated.
> 
> > Thanks, this will be very beneficial to the SELinux community
> 
> I hope so, the SELinux policy should be pretty simple, and the iptables 
> rules perhaps not so much, but they'll be generated by macros.
> 
> I've also seen some interesting performance improvements (especially if 
> conntrack is not loaded), although we won't really know the overall 
> picture until all of the current policy is converted over.  Then it'll be 
> a case of how well iptables performs with lots of rules and chains.
> 
> 
> - James
> -- 
> James Morris
> <jmorris@namei.org>
> 
> --
> This message was distributed to subscribers of the selinux mailing list.
> If you no longer wish to subscribe, send mail to majordomo@tycho.nsa.gov with
> the words "unsubscribe selinux" without quotes as the message.


* Re: [PATCH] TCP congestion module: add TCP-LP supporting for 2.6.16.14
From: Wong Edison @ 2006-05-08 17:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: pavel, netdev, linux-kernel
In-Reply-To: <20060508.104322.58430929.davem@davemloft.net>

> Or, just include it, and select it with the TCP_CONGESTION socket
> option when you want it.  Sorry, this does require app modifications.

I would like to have more information about this.
So within the app,
after creating the socket,
I then call setsockopt() (!?)
to set TCP_CONGESTION to "lp" (in my case)??

Does that mean the socket's congestion algorithm will then be what I set,
for this socket within this app only??

What about the socket created by accept()??
Will it still use the default, or the listening socket's setting??

thanks
