Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: netlink nlmsg_pid supposed to be pid or tid?
From: Alexey Kuznetsov @ 2005-11-21 22:49 UTC (permalink / raw)
  To: Christopher Friesen; +Cc: Herbert Xu, netdev, linux-kernel, davem
In-Reply-To: <4382406D.1040508@nortel.com>

Hello!

> TIPC wants the user to fill in the pid to use in the nlmsghdr portion of 
> a particular message.

It is wrong. netlink_pid used not to be associated with process pids.
Kernel used pid just as a seed to calculate a random value to bind,
when user did not bind explicitly. It is equal to current->pid occasionally.
F.e. libnetlink from iproute autobinds and gets netlink_pid with
getsockname().

When user binds the socket himself, he was free to bind to any value,
including pid and tgid.

Actually, I remember one discussion. Herbert, wait a minute...
That's it: February 2005, Subject: [PATCH] Add audit uid to netlink credentials
We decided (or not?) that binding to anything but tgid and pid
must be prohibited by security reasons. Apaprently, the finding was lost.

Alexey

^ permalink raw reply

* [NETLINK]: Use tgid instead of pid for nlmsg_pid
From: Herbert Xu @ 2005-11-21 22:16 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: herbert, cfriesen, netdev, linux-kernel, davem
In-Reply-To: <20051121213549.GA28187@ms2.inr.ac.ru>

Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> wrote:
> 
> I agree, apparently netlink_autobind was missed when sed'ing pid->tgid.
> Of course, it does not matter, but tgid is nicer choice from user's viewpoint.

Great, here is the patch to do just that.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -476,7 +476,7 @@ static int netlink_autobind(struct socke
 	struct hlist_head *head;
 	struct sock *osk;
 	struct hlist_node *node;
-	s32 pid = current->pid;
+	s32 pid = current->tgid;
 	int err;
 	static s32 rover = -4097;
 

^ permalink raw reply

* Re: netlink nlmsg_pid supposed to be pid or tid?
From: Herbert Xu @ 2005-11-21 22:15 UTC (permalink / raw)
  To: Christopher Friesen; +Cc: kuznet, herbert, netdev, linux-kernel, davem
In-Reply-To: <4382406D.1040508@nortel.com>

Christopher Friesen <cfriesen@nortel.com> wrote:
> 
> When an NPTL child thread uses getpid() to specify the pid, it never 
> receives a response to this request.  Running the same code on the 
> parent works, and running the same code under Linuxthreads works.

As I said before, you should not rely on getpid() to work.  Any other
process in the system can bind to your pid which makes it unavailable
to your process.  In which case the kernel will pick an arbitrary
negative pid as your address.

You should always use getsockaddr(2) to get your local address.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH -mm2] net: Fix compiler-error on dgrs.c when !CONFIG_PCI
From: Herbert Xu @ 2005-11-21 22:12 UTC (permalink / raw)
  To: Richard Knutsson
  Cc: herbert, akpm, linux-kernel, netdev, jgarzik, ashutosh.naik
In-Reply-To: <43824066.4080109@student.ltu.se>

Richard Knutsson <ricknu-0@student.ltu.se> wrote:
>
> We need it even if pci_register_driver() and pci_register_driver() is 
> empty shells. And instead of removing #endif CONFIG_PCI for 

I don't think so.  Please see my previous patch where pci_register_driver
is not called at all if CONFIG_PCI is not defined.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: netlink nlmsg_pid supposed to be pid or tid?
From: Christopher Friesen @ 2005-11-21 21:47 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: Herbert Xu, netdev, linux-kernel, davem
In-Reply-To: <20051121213549.GA28187@ms2.inr.ac.ru>

Alexey Kuznetsov wrote:
> Hello!
> 
> 
>>I tend to agree with you here that tgid makes more sense.
> 
> 
> I agree, apparently netlink_autobind was missed when sed'ing pid->tgid.
> Of course, it does not matter, but tgid is nicer choice from user's viewpoint.

I'm glad you agree, but I'm not sure what you mean by "it does not matter".

TIPC wants the user to fill in the pid to use in the nlmsghdr portion of 
a particular message.

When an NPTL child thread uses getpid() to specify the pid, it never 
receives a response to this request.  Running the same code on the 
parent works, and running the same code under Linuxthreads works.

Using gettid() works, but it also means that only the thread that issued 
the request can read the reply.

Chris

^ permalink raw reply

* Re: [PATCH -mm2] net: Fix compiler-error on dgrs.c when !CONFIG_PCI
From: Richard Knutsson @ 2005-11-21 21:47 UTC (permalink / raw)
  To: Herbert Xu; +Cc: akpm, linux-kernel, netdev, jgarzik, ashutosh.naik
In-Reply-To: <20051121203758.GA25509@gondor.apana.org.au>

Herbert Xu wrote:

>On Mon, Nov 21, 2005 at 01:52:49PM +0100, Richard Knutsson wrote:
>  
>
>>This patch requirer the 
>>"net-fix-compiler-error-on-dgrsc-when-config_pci.patch" (added to the 
>>-mm tree after 2.6.15-rc1-mm2):
>>
>>--- 
>>devel/drivers/net/dgrs.c~net-fix-compiler-error-on-dgrsc-when-config_pci 
>>2005-11-19 18:00:34.000000000 -0800
>>+++ devel-akpm/drivers/net/dgrs.c	2005-11-19 18:00:34.000000000 -0800
>>@@ -1458,6 +1458,8 @@ static struct pci_driver dgrs_pci_driver
>>	.probe = dgrs_pci_probe,
>>	.remove = __devexit_p(dgrs_pci_remove),
>>};
>>+#else
>>+static struct pci_driver dgrs_pci_driver = {};
>>#endif
>>    
>>
>
>I don't see the point.  We shouldn't have this structure at all
>if CONFIG_PCI is not set.
>
>Cheers,
>
We need it even if pci_register_driver() and pci_register_driver() is 
empty shells. And instead of removing #endif CONFIG_PCI for 
dgrs_pci_driver, which needs dgrs_pci_tbl, dgrs_pci_probe() and 
dgrs_pci_remove() (and so on), I added the empty dgrs_pci_driver-structure.

cu,

^ permalink raw reply

* Re: netlink nlmsg_pid supposed to be pid or tid?
From: Alexey Kuznetsov @ 2005-11-21 21:35 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Christopher Friesen, netdev, linux-kernel, davem
In-Reply-To: <E1EeIcx-0006i3-00@gondolin.me.apana.org.au>

Hello!

> I tend to agree with you here that tgid makes more sense.

I agree, apparently netlink_autobind was missed when sed'ing pid->tgid.
Of course, it does not matter, but tgid is nicer choice from user's viewpoint.

Alexey

^ permalink raw reply

* Re: netlink nlmsg_pid supposed to be pid or tid?
From: Herbert Xu @ 2005-11-21 20:51 UTC (permalink / raw)
  To: Christopher Friesen; +Cc: netdev, linux-kernel, davem, kuznet
In-Reply-To: <438220C3.4040602@nortel.com>

Christopher Friesen <cfriesen@nortel.com> wrote:
> 
> When using netlink, should "nlmsg_pid" be set to pid (current->tgid) or 
> tid (current->pid)?

I tend to agree with you here that tgid makes more sense.  Dave, what
do you think?

However, you should never assume whatever value the kernel uses since
it may always choose an arbitrary value should the preferred pid/tgid
be taken by someone else.

So always use getsockname(2) to find out the correct address.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH -mm2] net: Fix compiler-error on dgrs.c when !CONFIG_PCI
From: Herbert Xu @ 2005-11-21 20:37 UTC (permalink / raw)
  To: Richard Knutsson; +Cc: akpm, linux-kernel, netdev, jgarzik, ashutosh.naik
In-Reply-To: <4381C321.5010805@student.ltu.se>

On Mon, Nov 21, 2005 at 01:52:49PM +0100, Richard Knutsson wrote:
>
> What do you think about this patch? Will you sign it? It is no longer an 
> error-warning-fix but a bug-fix (and some cleanup).
> I "took" you implementation of dgrs_(un)register_eisa(), especially 
> since eisa needed to be unregistered if pci succeeds (I take you word 
> for it to be so).
> (BTW, this patch is compiled with CONFIG_PCI set, CONFIG_EISA set and 
> both set without errors/warnings for dgrs.o.)
> 
> This patch requirer the 
> "net-fix-compiler-error-on-dgrsc-when-config_pci.patch" (added to the 
> -mm tree after 2.6.15-rc1-mm2):
> 
> --- 
> devel/drivers/net/dgrs.c~net-fix-compiler-error-on-dgrsc-when-config_pci 
> 2005-11-19 18:00:34.000000000 -0800
> +++ devel-akpm/drivers/net/dgrs.c	2005-11-19 18:00:34.000000000 -0800
> @@ -1458,6 +1458,8 @@ static struct pci_driver dgrs_pci_driver
> 	.probe = dgrs_pci_probe,
> 	.remove = __devexit_p(dgrs_pci_remove),
> };
> +#else
> +static struct pci_driver dgrs_pci_driver = {};
> #endif

I don't see the point.  We shouldn't have this structure at all
if CONFIG_PCI is not set.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: netlink nlmsg_pid supposed to be pid or tid?
From: Christopher Friesen @ 2005-11-21 19:46 UTC (permalink / raw)
  To: netdev, Linux Kernel Mailing List
In-Reply-To: <438220C3.4040602@nortel.com>

Friesen, Christopher [CAR:VC21:EXCH] wrote:
> 
> When using netlink, should "nlmsg_pid" be set to pid (current->tgid) or 
> tid (current->pid)?

The reason I ask is that with 2.6.10 we had to set it to tid, which 
really doesn't make much sense given than sockets are shared across all 
threads in a process (and gettid() isn't even a libc function).

Chris

^ permalink raw reply

* netlink nlmsg_pid supposed to be pid or tid?
From: Christopher Friesen @ 2005-11-21 19:32 UTC (permalink / raw)
  To: netdev, Linux Kernel Mailing List


When using netlink, should "nlmsg_pid" be set to pid (current->tgid) or 
tid (current->pid)?

Chris

^ permalink raw reply

* Re: [2.6 patch] kill drivers/net/irda/sir_core.c
From: Jean Tourrilhes @ 2005-11-21 18:09 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: jgarzik, netdev, linux-kernel
In-Reply-To: <20051120232904.GM16060@stusta.de>

On Mon, Nov 21, 2005 at 12:29:04AM +0100, Adrian Bunk wrote:
> EXPORT_SYMBOL's do nowadays belong to the files where the actual 
> functions are.
> 
> Moving the module_init/module_exit to the file with the actual functions 
> has the advantage of saving a few bytes due to the removal of two 
> functions.
> 
> 
> Signed-off-by: Adrian Bunk <bunk@stusta.de>

	Go for it.
	Ack-by : Jean Tourrilhes <jt@hpl.hp.com>

	Jean

^ permalink raw reply

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
From: Patrick McHardy @ 2005-11-21 16:52 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, netfilter-devel, yasuyuki.kozakai
In-Reply-To: <20051120.230034.105307423.davem@davemloft.net>

David S. Miller wrote:
> I've read over Patrick's two most recent postings of these patches
> and I think they are generally sane and I cannot find any holes in
> them.  Herbert brought up the legitimate concern about defragmentation,
> but I think that's a detail and does not take away from the structural
> soundness of Patrick's approach.

I think we implicitly agreed on moving the POST_ROUTING hook before
fragmentation and change the user-visible behaviour of the mangle
POSTROUTING chain. At least neither Harald not Rusty objected to
the patch :)

^ permalink raw reply

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
From: Patrick McHardy @ 2005-11-21 16:34 UTC (permalink / raw)
  To: Yasuyuki KOZAKAI; +Cc: netdev, netfilter-devel, davem
In-Reply-To: <200511211053.jALAro04019574@toshiba.co.jp>

Yasuyuki KOZAKAI wrote:
>>I don't see why it is confusing. Plain text packets are visible before
>>encapsulation (and they have to be because we don't necessarily know
>>if packets will be encapsulated at the time the hooks are called in
>>case the policy lookup after NAT returns a policy), plain text packets
>>are visible after decapsulation. With different hooks we can't have
>>symetrical behaviour because of the case I mentioned above, and that
>>would be confusing IMO.
> 
> Well, what I worried about was just ease to use, not internal processing.
> I suspected that users can correctly configure IPsec and packet filtering.
> Just doing "iptables -P INPUT -j DROP; iptables -A INPUT -p esp -j ACCEPT"
> will drop all input packets and this is different behavior with current
> kernel, for example. So I just imagined many people would say "Why ?".
> 
> But as you said in other mail, probably this is my needles fear, isn't this ?
> As you know, I'm basically worrier. :)

:) I think its OK since in tunnel mode the decapsulated packets already
need to be allowed by the ruleset, so I think its not too confusing to
expect the same in transport mode.

>>>Then, why don't we make xfrm{4,6}_rcv_spi() return 0 (1 if IPv6)
>>>so that ip_local_deliver_finish()/ip6_input_finish() can continue to process
>>>headers if packets have not been mangled ? Is this difficult or impossible to
>>>implement ?
>>
>>I'm not sure I understand. Do you propose to check if the packet was
>>mangled after the PRE_ROUTING hook (if it was not stolen or queued)
>>and if not return directly to ip6_input_finish()?
> 
> Yes.
> 
>> Where would the LOCAL_IN hook be called?
> 
> Ah, indeed. But we can add it just before return directly to
> ip6_input_finish().

Please see my mail to Kazunori. The hooks take ownership of the
skb, changing this would become pretty ugly because of queued
or stolen packets. And it would still leave one path where packets
are not directly returned, so I think the double-parsing should
be handled otherwise.

>>AFAICT statistics are not affected by my patches, except for the
>>iptables counters. The double parsing of extension headers is indeed
>>a problem I forgot about, it looks like we need to carry nhoff around
>>if it can't be derived from the packet.
> 
> Of cause that is one solution.

I'm going to try that if I can't come up with something better.

^ permalink raw reply

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
From: Patrick McHardy @ 2005-11-21 16:29 UTC (permalink / raw)
  To: yoshfuji; +Cc: netdev, netfilter-devel, kazunori, davem
In-Reply-To: <20051121.175033.87771604.yoshfuji@linux-ipv6.org>

YOSHIFUJI Hideaki / ^[$B5HF#1QL@^[ wrote:
> Hello.
> 
> In article <438185ED.3050005@miyazawa.org> (at Mon, 21 Nov 2005 17:31:41 +0900), Kazunori Miyazawa <kazunori@miyazawa.org> says:
> 
> 
>>Your ip_xfrm_transport_hook is a good idea, I think.
>>
>>We could call ip6_rcv_finish if the netfilter changed the addresses
>>or otherwise we can continue the loop to avoid the cost in a similar
>>way because we can know the change with checking skb->dst.
> 
> 
> Well, I agree.
> 
> Probably, we can do similarly for ipv6; e.g.:
> 
> int ip6_xfrm_transport_hook(struct sk_buff *skb)
> {
> #if 0 /* We NEVER support NAT. :-) */
>      if (likely(skb->dst == NULL)) {
>             int err = ip6_route_input()
>             if (unlikely(err))
>                      goto drop;
>      }
> #endif
>      __skb_pull(skb, skb->h.raw - skb->nh.raw);
>      return NET_RX_SUCCESS;
> drop:
>      kfree_skb(skb);
>      return NET_RX_DROP;
> }
> 
> :
> 
>       } else {
> #ifdef CONFIG_NETFILTER
>              skb->nh.ipv6h->payload_len = htons(skb->len);
> 	     skb->h.raw = skb->data;
>              __skb_push(skb, skb->data - skb->nh.raw);
> 
>              if (NF_HOOK(PF_INET6, NF_IP6_PRE_ROUTING, skb, skb->dev, NULL,
>                          ip6_xfrm_transport_hook) == NET_RX_DROP)
>                  return -1;
> #endif
>              return 1;
>       }
> 
> Then, we can continue parsing extension headers, I think.

Is it the rerouting you're concerned about? It will usually not
happen because skb->dst is not NULL. Its needed for NFQUEUE,
packets can be changes in userspace and need rerouting afterwards.
In any case there would still be one path on which extension
headers would be parsed twice, so I'm going to look into different
ways to fix that.

^ permalink raw reply

* Re: [2.6 patch] do not select NET_CLS
From: Patrick McHardy @ 2005-11-21 16:16 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Christian, linux-kernel, netdev, Roman Zippel
In-Reply-To: <20051121155955.GW16060@stusta.de>

Adrian Bunk wrote:
> 2.6.15-rc changes NET_CLS to being automatically select'ed when needed.
> 
> This patch confuses users since NET_CLS is a bool, and compiling an 
> additional module that select's NET_CLS causes unresolved symbols since 
> it's not user-visible that adding a module changes the kernel image.
> 
> This patch therefore changes NET_CLS back to the 2.6.14 status quo of 
> being an user-visible option.

I disagree with this patch. NET_CLS enables the infrastructure support
for classifiers. Users generally don't care about infrastructure but
directly usable things, so I'd prefer to have it automatically selected. 
And there are lots of other cases where enabling a module causes changes
in the kernel image. Some examples include some of the netfilter stuff,
the IPsec transforms, NET_CLS_ROUTE4, the ieee80211 stuff, and a lot
more.

^ permalink raw reply

* [2.6 patch] do not select NET_CLS
From: Adrian Bunk @ 2005-11-21 15:59 UTC (permalink / raw)
  To: Christian; +Cc: linux-kernel, netdev, Roman Zippel
In-Reply-To: <20051116235813.GS5735@stusta.de>

>...
> But there's a change in 2.6.15-rc1 that makes this issue much worse:
> It is no longer user-visible.
> 
> tristate's select'ing bool's that do not change parts of the (modular) 
> driver but compile additional code into the kernel are simply wrong.

The patch below (should apply against 2.6.15-rc2) fixes this issue.

cu
Adrian


<--  snip  -->


2.6.15-rc changes NET_CLS to being automatically select'ed when needed.

This patch confuses users since NET_CLS is a bool, and compiling an 
additional module that select's NET_CLS causes unresolved symbols since 
it's not user-visible that adding a module changes the kernel image.

This patch therefore changes NET_CLS back to the 2.6.14 status quo of 
being an user-visible option.


Signed-off-by: Adrian Bunk <bunk@stusta.de>

---

 net/sched/Kconfig |   25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

--- linux-2.6.15-rc1-mm2-full/net/sched/Kconfig.old	2005-11-21 16:39:04.000000000 +0100
+++ linux-2.6.15-rc1-mm2-full/net/sched/Kconfig	2005-11-21 16:41:59.000000000 +0100
@@ -254,11 +254,23 @@
 comment "Classification"
 
 config NET_CLS
-	boolean
+	bool "Packet classifier API"
+	help
+	  The CBQ scheduling algorithm requires that network packets which are
+	  scheduled to be sent out over a network device be classified
+	  according to some criterion. If you say Y here, you will get a
+	  choice of several different packet classifiers with the following
+	  questions.
+
+	  This will enable you to use Differentiated Services (diffserv) and
+	  Resource Reservation Protocol (RSVP) on your Linux router.
+	  Documentation and software is at
+	  <http://diffserv.sourceforge.net/>.
+
+if NET_CLS
 
 config NET_CLS_BASIC
 	tristate "Elementary classification (BASIC)"
-	select NET_CLS
 	---help---
 	  Say Y here if you want to be able to classify packets using
 	  only extended matches and actions.
@@ -268,7 +280,6 @@
 
 config NET_CLS_TCINDEX
 	tristate "Traffic-Control Index (TCINDEX)"
-	select NET_CLS
 	---help---
 	  Say Y here if you want to be able to classify packets based on
 	  traffic control indices. You will want this feature if you want
@@ -280,7 +291,6 @@
 config NET_CLS_ROUTE4
 	tristate "Routing decision (ROUTE)"
 	select NET_CLS_ROUTE
-	select NET_CLS
 	---help---
 	  If you say Y here, you will be able to classify packets
 	  according to the route table entry they matched.
@@ -293,7 +303,6 @@
 
 config NET_CLS_FW
 	tristate "Netfilter mark (FW)"
-	select NET_CLS
 	---help---
 	  If you say Y here, you will be able to classify packets
 	  according to netfilter/firewall marks.
@@ -303,7 +312,6 @@
 
 config NET_CLS_U32
 	tristate "Universal 32bit comparisons w/ hashing (U32)"
-	select NET_CLS
 	---help---
 	  Say Y here to be able to classify packetes using a universal
 	  32bit pieces based comparison scheme.
@@ -326,7 +334,6 @@
 
 config NET_CLS_RSVP
 	tristate "IPv4 Resource Reservation Protocol (RSVP)"
-	select NET_CLS
 	select NET_ESTIMATOR
 	---help---
 	  The Resource Reservation Protocol (RSVP) permits end systems to
@@ -341,7 +348,6 @@
 
 config NET_CLS_RSVP6
 	tristate "IPv6 Resource Reservation Protocol (RSVP6)"
-	select NET_CLS
 	select NET_ESTIMATOR
 	---help---
 	  The Resource Reservation Protocol (RSVP) permits end systems to
@@ -356,7 +362,6 @@
 
 config NET_EMATCH
 	bool "Extended Matches"
-	select NET_CLS
 	---help---
 	  Say Y here if you want to use extended matches on top of classifiers
 	  and select the extended matches below.
@@ -541,6 +546,8 @@
 	  automaticaly selected if needed but can be selected manually for
 	  statstical purposes.
 
+endif # NET_CLS
+
 endif # NET_SCHED
 
 endmenu

^ permalink raw reply

* Re: [PATCH -mm2] net: Fix compiler-error on dgrs.c when !CONFIG_PCI
From: Richard Knutsson @ 2005-11-21 12:52 UTC (permalink / raw)
  To: Herbert Xu; +Cc: akpm, linux-kernel, netdev, jgarzik, ashutosh.naik
In-Reply-To: <20051120204001.GA11043@gondor.apana.org.au>

Herbert Xu wrote:

>On Sun, Nov 20, 2005 at 04:35:46PM +0100, Richard Knutsson wrote:
>  
>
>>>-#ifdef CONFIG_EISA
>>>-	cardcount = eisa_driver_register(&dgrs_eisa_driver);
>>>+	cardcount = dgrs_register_eisa();
>>>	if (cardcount < 0)
>>>		return cardcount;
>>>-#endif
>>>-	cardcount = pci_register_driver(&dgrs_pci_driver);
>>>-	if (cardcount)
>>>+	cardcount = dgrs_register_pci();
>>>+	if (cardcount < 0) {
>>>      
>>>
>>Are you sure it should be "cardcount < 0" and not "cardcount"?
>>    
>>
>
>Yes if cardcount is >= 0 then the registration was successful.
>
>  
>
>>>+		dgrs_unregister_eisa();
>>>      
>>>
>>Why change the behaviour off this driver?
>>    
>>
>
>Because the driver was buggy.  When this function returns a non-zero
>value, it must return the system to its original state.
>
>That means if the EISA driver has already been registered then it must
>be unregistered.
>
>Cheers,
>  
>
What do you think about this patch? Will you sign it? It is no longer an 
error-warning-fix but a bug-fix (and some cleanup).
I "took" you implementation of dgrs_(un)register_eisa(), especially 
since eisa needed to be unregistered if pci succeeds (I take you word 
for it to be so).
(BTW, this patch is compiled with CONFIG_PCI set, CONFIG_EISA set and 
both set without errors/warnings for dgrs.o.)

This patch requirer the 
"net-fix-compiler-error-on-dgrsc-when-config_pci.patch" (added to the 
-mm tree after 2.6.15-rc1-mm2):

--- devel/drivers/net/dgrs.c~net-fix-compiler-error-on-dgrsc-when-config_pci	2005-11-19 18:00:34.000000000 -0800
+++ devel-akpm/drivers/net/dgrs.c	2005-11-19 18:00:34.000000000 -0800
@@ -1458,6 +1458,8 @@ static struct pci_driver dgrs_pci_driver
 	.probe = dgrs_pci_probe,
 	.remove = __devexit_p(dgrs_pci_remove),
 };
+#else
+static struct pci_driver dgrs_pci_driver = {};
 #endif
 
 

Signed-off-by: Richard Knutsson <ricknu-0@student.ltu.se>
---

diff -Narup a/drivers/net/dgrs.c b/drivers/net/dgrs.c
--- a/drivers/net/dgrs.c	2005-11-21 11:25:29.000000000 +0100
+++ b/drivers/net/dgrs.c	2005-11-21 11:38:33.000000000 +0100
@@ -1522,6 +1522,26 @@ static struct eisa_driver dgrs_eisa_driv
 		.remove = __devexit_p(dgrs_eisa_remove),
 	}
 };
+
+
+static inline int dgrs_register_eisa(void)
+{
+	return eisa_driver_register(&dgrs_eisa_driver);
+}
+
+static inline void dgrs_unregister_eisa(void)
+{
+	eisa_driver_unregister(&dgrs_eisa_driver);
+}
+
+#else
+
+static inline int dgrs_register_eisa(void)
+{
+	return 0;
+}
+
+static inline void dgrs_unregister_eisa(void) {}
 #endif
 
 /*
@@ -1551,7 +1571,7 @@ MODULE_PARM_DESC(nicmode, "Digi RightSwi
 static int __init dgrs_init_module (void)
 {
 	int	i;
-	int	cardcount = 0;
+	int	cardcount;
 
 	/*
 	 *	Command line variable overrides
@@ -1592,25 +1612,23 @@ static int __init dgrs_init_module (void
 	/*
 	 *	Find and configure all the cards
 	 */
-#ifdef CONFIG_EISA
-	cardcount = eisa_driver_register(&dgrs_eisa_driver);
+
+	cardcount = dgrs_register_eisa();
 	if (cardcount < 0)
 		return cardcount;
-#endif
+
 	cardcount = pci_register_driver(&dgrs_pci_driver);
-	if (cardcount)
+	if (cardcount < 0) {
+		dgrs_unregister_eisa();
 		return cardcount;
+	}
 	return 0;
 }
 
 static void __exit dgrs_cleanup_module (void)
 {
-#ifdef CONFIG_EISA
-	eisa_driver_unregister (&dgrs_eisa_driver);
-#endif
-#ifdef CONFIG_PCI
+	dgrs_unregister_eisa();
 	pci_unregister_driver (&dgrs_pci_driver);
-#endif
 }
 
 module_init(dgrs_init_module);

^ permalink raw reply

* Netlink send from the interrupt context
From: Robert Iakobashvili @ 2005-11-21 11:16 UTC (permalink / raw)
  To: netdev

Are there netlink socket netlink_unicast () and netlink_broadcast ()
interrupt safe?

If not, where is the problem and the direction to make them safe?
If it is not easy, what could be a workaroud?

Thank you in advance.
----------------------------------------------------------------
Robert Iakobashvili  - NAVIGARE NECESSE EST
coroberti at gmail dot com
---------------------------------------------------------------

^ permalink raw reply

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
From: Yasuyuki KOZAKAI @ 2005-11-21 10:53 UTC (permalink / raw)
  To: kaber; +Cc: netdev, netfilter-devel, davem, yasuyuki.kozakai
In-Reply-To: <43816EB4.4080205@trash.net>


Hi,

From: Patrick McHardy <kaber@trash.net>
Date: Mon, 21 Nov 2005 07:52:36 +0100

> I don't see why it is confusing. Plain text packets are visible before
> encapsulation (and they have to be because we don't necessarily know
> if packets will be encapsulated at the time the hooks are called in
> case the policy lookup after NAT returns a policy), plain text packets
> are visible after decapsulation. With different hooks we can't have
> symetrical behaviour because of the case I mentioned above, and that
> would be confusing IMO.

Well, what I worried about was just ease to use, not internal processing.
I suspected that users can correctly configure IPsec and packet filtering.
Just doing "iptables -P INPUT -j DROP; iptables -A INPUT -p esp -j ACCEPT"
will drop all input packets and this is different behavior with current
kernel, for example. So I just imagined many people would say "Why ?".

But as you said in other mail, probably this is my needles fear, isn't this ?
As you know, I'm basically worrier. :)

> > And this can be said about IPv6 input path. If packets have not been mangled
> > (this is ordinary case because ip6tables doesn't have neither NAT nor
> > target module to mangle addresses directly), they don't have to re-route
> > and don't have to re-visit ip6_input_finish().
> > 
> > In the other way, if their addresses have been mangled, it's necessary to
> > re-route. I agree re-visiting ip6_input_finish() in this case.
> 
> Same goes for ip6_input_finish as for ip_local_deliver_finish(),
> the packet would continue its path there anyway. Do you actually
> mean ip6_rcv_finish()?

No. I mean ip6_input_finish(). calling ip6_input_finish() twice causes
problem at processing of IPv6 extension headers. This is different point
between IPv4 and IPv6.

Please note that I don't mean IPv4 processing in your patch has problem.
I think it will work. What I wanted to do was just avoiding double processing
of extension headers and synchronizing IPv4/IPv6 behavior as possible.

> > Then, why don't we make xfrm{4,6}_rcv_spi() return 0 (1 if IPv6)
> > so that ip_local_deliver_finish()/ip6_input_finish() can continue to process
> > headers if packets have not been mangled ? Is this difficult or impossible to
> > implement ?
> 
> I'm not sure I understand. Do you propose to check if the packet was
> mangled after the PRE_ROUTING hook (if it was not stolen or queued)
> and if not return directly to ip6_input_finish()?

Yes.

>                                                    Where would the
> LOCAL_IN hook be called?

Ah, indeed. But we can add it just before return directly to
ip6_input_finish().

> > This can solve the issue about twice processing of statistics and IPv6
> > extension headers. Because ip_local_deliver_finish()/ip6_input_finish() can
> > continue to process extension headers after ESP/AH in ordinary case.
> 
> AFAICT statistics are not affected by my patches, except for the
> iptables counters. The double parsing of extension headers is indeed
> a problem I forgot about, it looks like we need to carry nhoff around
> if it can't be derived from the packet.

Of cause that is one solution.

-- Yasuyuki Kozakai

^ permalink raw reply

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2005-11-21  8:50 UTC (permalink / raw)
  To: kazunori, kaber; +Cc: netdev, netfilter-devel, davem
In-Reply-To: <438185ED.3050005@miyazawa.org>

Hello.

In article <438185ED.3050005@miyazawa.org> (at Mon, 21 Nov 2005 17:31:41 +0900), Kazunori Miyazawa <kazunori@miyazawa.org> says:

> Your ip_xfrm_transport_hook is a good idea, I think.
> 
> We could call ip6_rcv_finish if the netfilter changed the addresses
> or otherwise we can continue the loop to avoid the cost in a similar
> way because we can know the change with checking skb->dst.

Well, I agree.

In article <20051120163135.16666.76993.sendpatchset@localhost.localdomain> (at Sun, 20 Nov 2005 17:31:36 +0100), Patrick McHardy <kaber@trash.net> says:

> diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
> index b93e7cd..3c39296 100644
> --- a/net/ipv4/netfilter.c
> +++ b/net/ipv4/netfilter.c
> @@ -105,6 +105,26 @@ int ip_dst_output(struct sk_buff *skb)
>  	return dst_output(skb);
>  }
>  EXPORT_SYMBOL(ip_dst_output);
> +
> +/*
> + * okfn for transport mode xfrm_input.c hook. Basically a copy of
> + * ip_rcv_finish without statistics and option parsing.
> + */
> +int ip_xfrm_transport_hook(struct sk_buff *skb)
> +{
> +	struct iphdr *iph = skb->nh.iph;
> +
> +	if (likely(skb->dst == NULL)) {
> +		int err = ip_route_input(skb, iph->daddr, iph->saddr, iph->tos,
> +		                         skb->dev);
> +		if (unlikely(err))
> +			goto drop;
> +	}
> +	return dst_input(skb);
> +drop:
> +	kfree_skb(skb);
> +	return NET_RX_DROP;
> +}
>  #endif /* CONFIG_XFRM */
>  
:
> @@ -129,7 +133,16 @@ int xfrm6_rcv_spi(struct sk_buff **pskb,
>  		netif_rx(skb);
>  		return -1;
>  	} else {
> +#ifdef CONFIG_NETFILTER
> +		skb->nh.ipv6h->payload_len = htons(skb->len);
> +		__skb_push(skb, skb->data - skb->nh.raw);
> +
> +		NF_HOOK(PF_INET6, NF_IP6_PRE_ROUTING, skb, skb->dev, NULL,
> +		        ip6_rcv_finish);
> +		return -1;
> +#else
>  		return 1;
> +#endif
>  	}
>  

Probably, we can do similarly for ipv6; e.g.:

int ip6_xfrm_transport_hook(struct sk_buff *skb)
{
#if 0 /* We NEVER support NAT. :-) */
     if (likely(skb->dst == NULL)) {
            int err = ip6_route_input()
            if (unlikely(err))
                     goto drop;
     }
#endif
     __skb_pull(skb, skb->h.raw - skb->nh.raw);
     return NET_RX_SUCCESS;
drop:
     kfree_skb(skb);
     return NET_RX_DROP;
}

:

      } else {
#ifdef CONFIG_NETFILTER
             skb->nh.ipv6h->payload_len = htons(skb->len);
	     skb->h.raw = skb->data;
             __skb_push(skb, skb->data - skb->nh.raw);

             if (NF_HOOK(PF_INET6, NF_IP6_PRE_ROUTING, skb, skb->dev, NULL,
                         ip6_xfrm_transport_hook) == NET_RX_DROP)
                 return -1;
#endif
             return 1;
      }

Then, we can continue parsing extension headers, I think.

-- 
YOSHIFUJI Hideaki @ USAGI Project  <yoshfuji@linux-ipv6.org>
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA

^ permalink raw reply

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
From: Herbert Xu @ 2005-11-21  7:47 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, netfilter-devel, kaber, yasuyuki.kozakai
In-Reply-To: <20051120.230034.105307423.davem@davemloft.net>

David S. Miller <davem@davemloft.net> wrote:
> 
> I've read over Patrick's two most recent postings of these patches
> and I think they are generally sane and I cannot find any holes in
> them.  Herbert brought up the legitimate concern about defragmentation,
> but I think that's a detail and does not take away from the structural
> soundness of Patrick's approach.

Yes I agree completely.  The new IPsec stack has been around for three
years and it's about time that we have proper netfilter support for it.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
From: David S. Miller @ 2005-11-21  7:00 UTC (permalink / raw)
  To: kaber; +Cc: netdev, netfilter-devel, yasuyuki.kozakai
In-Reply-To: <43816EB4.4080205@trash.net>

From: Patrick McHardy <kaber@trash.net>
Date: Mon, 21 Nov 2005 07:52:36 +0100

> I don't see why it is confusing. Plain text packets are visible before
> encapsulation (and they have to be because we don't necessarily know
> if packets will be encapsulated at the time the hooks are called in
> case the policy lookup after NAT returns a policy), plain text packets
> are visible after decapsulation. With different hooks we can't have
> symetrical behaviour because of the case I mentioned above, and that
> would be confusing IMO.

I think this is a very important point.

I can see no serious argument against this behavior, especially for
output.  On input, there is an argument of paranoia about seeing
plaintext packets, but administrator could do this anyways with
tcpdump or custom kernel module if this system is the decapsulation
point.

I've read over Patrick's two most recent postings of these patches
and I think they are generally sane and I cannot find any holes in
them.  Herbert brought up the legitimate concern about defragmentation,
but I think that's a detail and does not take away from the structural
soundness of Patrick's approach.

^ permalink raw reply

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
From: Patrick McHardy @ 2005-11-21  6:52 UTC (permalink / raw)
  To: Yasuyuki KOZAKAI; +Cc: netdev, netfilter-devel, davem
In-Reply-To: <200511210442.jAL4gPoO001846@toshiba.co.jp>

Yasuyuki KOZAKAI wrote:
> At first, now I could agree to use same name for hooks before/after xfrm
> processing, if it's important to keep compatibility than to avoid difficulty
> to use. Even now I think it's confusing to pass packets before/after xfrm to
> same hook, and believe it's ideal to use different name for them.
> But people can configure correctly according to you, then my concern might be
> needless fear.

I don't see why it is confusing. Plain text packets are visible before
encapsulation (and they have to be because we don't necessarily know
if packets will be encapsulated at the time the hooks are called in
case the policy lookup after NAT returns a policy), plain text packets
are visible after decapsulation. With different hooks we can't have
symetrical behaviour because of the case I mentioned above, and that
would be confusing IMO.

> Next is about re-visiting stack, I'm beginning to understand your patch.
> It looks natural to re-route after DNAT on input path of IPv4. That's really
> needed if packets have been mangled.
> 
> But is there any reason that we have to allow packets to re-visit
> ip_local_deliver_finish() in the case that they have not been mangled
> at PRE_ROUTING ? I think no. This situation is like ip_nat_out().

My patches don't change when and if packets will reach
ip_local_deliver_finish(), they just add a possibility for rerouting.
Currently the transforms are called from ip_local_deliver_finish() and
in transport mode the decapsulated packet continues its path in
ip_local_deliver_finish(), with my patches they will go through two
netfilter hooks and continue the exact same codepath, given that
they are not NATed to a foreign destination IP on their way.

> And this can be said about IPv6 input path. If packets have not been mangled
> (this is ordinary case because ip6tables doesn't have neither NAT nor
> target module to mangle addresses directly), they don't have to re-route
> and don't have to re-visit ip6_input_finish().
> 
> In the other way, if their addresses have been mangled, it's necessary to
> re-route. I agree re-visiting ip6_input_finish() in this case.

Same goes for ip6_input_finish as for ip_local_deliver_finish(),
the packet would continue its path there anyway. Do you actually
mean ip6_rcv_finish()?

> Then, why don't we make xfrm{4,6}_rcv_spi() return 0 (1 if IPv6)
> so that ip_local_deliver_finish()/ip6_input_finish() can continue to process
> headers if packets have not been mangled ? Is this difficult or impossible to
> implement ?

I'm not sure I understand. Do you propose to check if the packet was
mangled after the PRE_ROUTING hook (if it was not stolen or queued)
and if not return directly to ip6_input_finish()? Where would the
LOCAL_IN hook be called?

> This can solve the issue about twice processing of statistics and IPv6
> extension headers. Because ip_local_deliver_finish()/ip6_input_finish() can
> continue to process extension headers after ESP/AH in ordinary case.

AFAICT statistics are not affected by my patches, except for the
iptables counters. The double parsing of extension headers is indeed
a problem I forgot about, it looks like we need to carry nhoff around
if it can't be derived from the packet.

Regards
Patrick

^ permalink raw reply

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
From: Yasuyuki KOZAKAI @ 2005-11-21  4:42 UTC (permalink / raw)
  To: kaber; +Cc: netdev, netfilter-devel, davem
In-Reply-To: <20051120163135.16666.76993.sendpatchset@localhost.localdomain>

Hi, Patrick,

From: Patrick McHardy <kaber@trash.net>
Date: Sun, 20 Nov 2005 17:31:36 +0100

> [IPV4/6]: Netfilter IPsec input hooks
> 
> When the innermost transform uses transport mode the decapsulated packet
> is not visible to netfilter. Pass the packet through the PRE_ROUTING and
> LOCAL_IN hooks again before handing it to upper layer protocols to make
> netfilter-visibility symetrical to the output path.
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>

At first, now I could agree to use same name for hooks before/after xfrm
processing, if it's important to keep compatibility than to avoid difficulty
to use. Even now I think it's confusing to pass packets before/after xfrm to
same hook, and believe it's ideal to use different name for them.
But people can configure correctly according to you, then my concern might be
needless fear.

Next is about re-visiting stack, I'm beginning to understand your patch.
It looks natural to re-route after DNAT on input path of IPv4. That's really
needed if packets have been mangled.

But is there any reason that we have to allow packets to re-visit
ip_local_deliver_finish() in the case that they have not been mangled
at PRE_ROUTING ? I think no. This situation is like ip_nat_out().

And this can be said about IPv6 input path. If packets have not been mangled
(this is ordinary case because ip6tables doesn't have neither NAT nor
target module to mangle addresses directly), they don't have to re-route
and don't have to re-visit ip6_input_finish().

In the other way, if their addresses have been mangled, it's necessary to
re-route. I agree re-visiting ip6_input_finish() in this case.

Then, why don't we make xfrm{4,6}_rcv_spi() return 0 (1 if IPv6)
so that ip_local_deliver_finish()/ip6_input_finish() can continue to process
headers if packets have not been mangled ? Is this difficult or impossible to
implement ?

This can solve the issue about twice processing of statistics and IPv6
extension headers. Because ip_local_deliver_finish()/ip6_input_finish() can
continue to process extension headers after ESP/AH in ordinary case.

In special case, if some codes mangle IPv6 addresses, that's the codes
to take care of the possibility of re-visiting ip6_input_finish().

What do you think ?

# Please note that these are just my opinions and other USAGI guys might have
# other opinions.

Regards,

-- Yasuyuki Kozakai

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox