Netdev List


* Re: [RFC PATCH v3 1/5] irq: add tracepoint to softirq_raise
From: Koki Sanagi @ 2010-07-21  6:57 UTC (permalink / raw)
  To: Neil Horman
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers
In-Reply-To: <20100720110439.GA1995@hmsreliant.think-freely.org>

(2010/07/20 20:04), Neil Horman wrote:
> On Tue, Jul 20, 2010 at 09:45:31AM +0900, Koki Sanagi wrote:
>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>>
>> Add a tracepoint for tracing when softirq action is raised.
>>
>> Together with the existing tracepoints, this completes the set of
>> softirq tracepoints: softirq_raise, softirq_entry and softirq_exit.
>>
>> And when this tracepoint is used in combination with
>> the softirq_entry tracepoint we can determine
>> the softirq raise latency.
>>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
>> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
>>
>> [ factorize softirq events with DECLARE_EVENT_CLASS ]
>> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
>> ---
>>  include/linux/interrupt.h  |    8 +++++-
>>  include/trace/events/irq.h |   57 ++++++++++++++++++++++++++-----------------
>>  kernel/softirq.c           |    4 +-
>>  3 files changed, 43 insertions(+), 26 deletions(-)
>>
>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
>> index c233113..1cb5726 100644
>> --- a/include/linux/interrupt.h
>> +++ b/include/linux/interrupt.h
>> @@ -18,6 +18,7 @@
>>  #include <asm/atomic.h>
>>  #include <asm/ptrace.h>
>>  #include <asm/system.h>
>> +#include <trace/events/irq.h>
>>  
>>  /*
>>   * These correspond to the IORESOURCE_IRQ_* defines in
>> @@ -402,7 +403,12 @@ asmlinkage void do_softirq(void);
>>  asmlinkage void __do_softirq(void);
>>  extern void open_softirq(int nr, void (*action)(struct softirq_action *));
>>  extern void softirq_init(void);
>> -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
>> +static inline void __raise_softirq_irqoff(unsigned int nr)
>> +{
>> +	trace_softirq_raise(nr);
>> +	or_softirq_pending(1UL << nr);
>> +}
>> +
> We already have tracepoints in irq_enter and irq_exit.  If the goal here is to
> detect latency during packet processing, can't the delta in time between those
> two points be used to determine interrupt handling latency?

True, the time between irq_entry and irq_exit is not directly related to
packet-processing latency, but it is indirectly related: softirq_entry is not
reached until the IRQ handler exits, and the softirq_entry time does bear on
packet-processing latency. That is why I show it as a reference.

> 
> 
>>  extern void raise_softirq_irqoff(unsigned int nr);
>>  extern void raise_softirq(unsigned int nr);
>>  extern void wakeup_softirqd(void);
>> diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
>> index 0e4cfb6..717744c 100644
>> --- a/include/trace/events/irq.h
>> +++ b/include/trace/events/irq.h
>> @@ -5,7 +5,9 @@
>>  #define _TRACE_IRQ_H
>>  
>>  #include <linux/tracepoint.h>
>> -#include <linux/interrupt.h>
>> +
>> +struct irqaction;
>> +struct softirq_action;
>>  
>>  #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
>>  #define show_softirq_name(val)				\
>> @@ -84,56 +86,65 @@ TRACE_EVENT(irq_handler_exit,
>>  
>>  DECLARE_EVENT_CLASS(softirq,
>>  
>> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
>> +	TP_PROTO(unsigned int nr),
>>  
>> -	TP_ARGS(h, vec),
>> +	TP_ARGS(nr),
>>  
>>  	TP_STRUCT__entry(
>> -		__field(	int,	vec			)
>> +		__field(	unsigned int,	vec	)
>>  	),
>>  
>>  	TP_fast_assign(
>> -		__entry->vec = (int)(h - vec);
>> +		__entry->vec	= nr;
>>  	),
>>  
>>  	TP_printk("vec=%d [action=%s]", __entry->vec,
>> -		  show_softirq_name(__entry->vec))
>> +		show_softirq_name(__entry->vec))
>> +);
>> +
>> +/**
>> + * softirq_raise - called immediately when a softirq is raised
>> + * @nr: softirq vector number
>> + *
>> + * Tracepoint for tracing when softirq action is raised.
>> + * Also, when used in combination with the softirq_entry tracepoint
>> + * we can determine the softirq raise latency.
>> + */
>> +DEFINE_EVENT(softirq, softirq_raise,
>> +
>> +	TP_PROTO(unsigned int nr),
>> +
>> +	TP_ARGS(nr)
>>  );
>>  
>>  /**
>>   * softirq_entry - called immediately before the softirq handler
>> - * @h: pointer to struct softirq_action
>> - * @vec: pointer to first struct softirq_action in softirq_vec array
>> + * @nr: softirq vector number
>>   *
>> - * The @h parameter, contains a pointer to the struct softirq_action
>> - * which has a pointer to the action handler that is called. By subtracting
>> - * the @vec pointer from the @h pointer, we can determine the softirq
>> - * number. Also, when used in combination with the softirq_exit tracepoint
>> + * Tracepoint for tracing when softirq action starts.
>> + * Also, when used in combination with the softirq_exit tracepoint
>>   * we can determine the softirq latency.
>>   */
>>  DEFINE_EVENT(softirq, softirq_entry,
>>  
>> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
>> +	TP_PROTO(unsigned int nr),
>>  
>> -	TP_ARGS(h, vec)
>> +	TP_ARGS(nr)
>>  );
>>  
>>  /**
>>   * softirq_exit - called immediately after the softirq handler returns
>> - * @h: pointer to struct softirq_action
>> - * @vec: pointer to first struct softirq_action in softirq_vec array
>> + * @nr: softirq vector number
>>   *
>> - * The @h parameter contains a pointer to the struct softirq_action
>> - * that has handled the softirq. By subtracting the @vec pointer from
>> - * the @h pointer, we can determine the softirq number. Also, when used in
>> - * combination with the softirq_entry tracepoint we can determine the softirq
>> - * latency.
>> + * Tracepoint for tracing when softirq action ends.
>> + * Also, when used in combination with the softirq_entry tracepoint
>> + * we can determine the softirq latency.
>>   */
>>  DEFINE_EVENT(softirq, softirq_exit,
>>  
>> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
>> +	TP_PROTO(unsigned int nr),
>>  
>> -	TP_ARGS(h, vec)
>> +	TP_ARGS(nr)
>>  );
>>  
>>  #endif /*  _TRACE_IRQ_H */
>> diff --git a/kernel/softirq.c b/kernel/softirq.c
>> index 825e112..6790599 100644
>> --- a/kernel/softirq.c
>> +++ b/kernel/softirq.c
>> @@ -215,9 +215,9 @@ restart:
>>  			int prev_count = preempt_count();
>>  			kstat_incr_softirqs_this_cpu(h - softirq_vec);
>>  
>> -			trace_softirq_entry(h, softirq_vec);
>> +			trace_softirq_entry(h - softirq_vec);
>>  			h->action(h);
>> -			trace_softirq_exit(h, softirq_vec);
>> +			trace_softirq_exit(h - softirq_vec);
> 
> You're losing information here by reducing the number of parameters in this
> tracepoint.  How many other tracepoint scripts rely on having both pointers
> handy?  Why not just do the pointer math inside your tracehook instead?

In __raise_softirq_irqoff() there is no way to refer to softirq_vec, so it
cannot use the softirq DECLARE_EVENT_CLASS as is.
Currently, no script uses softirq_entry or softirq_exit.

Thanks,
Koki Sanagi.

> 
>>  			if (unlikely(prev_count != preempt_count())) {
>>  				printk(KERN_ERR "huh, entered softirq %td %s %p"
>>  				       "with preempt_count %08x,"
>>
>>
> 
> 




* Re: [PATCH net-next] sysfs: add entry to indicate network interfaces with random MAC address
From: Harald Hoyer @ 2010-07-21  6:47 UTC (permalink / raw)
  To: David Miller
  Cc: shemminger, bhutchings, sassmann, netdev, linux-kernel, gospo,
	gregory.v.rose, alexander.h.duyck, leedom
In-Reply-To: <20100720.233457.267367495.davem@davemloft.net>

On 07/21/2010 08:34 AM, David Miller wrote:
> From: Harald Hoyer <harald@redhat.com>
> Date: Wed, 21 Jul 2010 08:26:27 +0200
>
>> On 07/20/2010 11:20 PM, David Miller wrote:
>>> From: Stephen Hemminger <shemminger@vyatta.com>
>>> Date: Tue, 20 Jul 2010 14:18:16 -0700
>>>
>>>> No one mentioned that the first octet of an Ethernet address already
>>>> indicates "software generated" Ethernet address. Per the standard,
>>>> if bit 1 is set it means address is locally assigned.
>>>>
>>>> static inline bool is_locally_assigned_ether(const u8 *addr)
>>>> {
>>>> 	return (addr[0] & 0x2) != 0;
>>>> }
>>>
>>> W00t!
>>>
>>> Indeed, can udev just use that?  :-)
>>
>> It already does:
>> see /lib/udev/rules.d/75-persistent-net-generator.rules
>
> So... why doesn't this work?

It works, but knowing that the MAC is randomly generated would be valuable.
For non-random, locally assigned MACs (bit 1 set), we could then easily
create persistent rules based on the MAC, instead of ignoring them completely
as we do currently.


* Re: [RFC PATCH] dst: check if dst is freed in dst_check()
From: David Miller @ 2010-07-21  6:41 UTC (permalink / raw)
  To: eric.dumazet; +Cc: nicolas.dichtel, netdev
In-Reply-To: <1279679288.2492.15.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Jul 2010 04:28:08 +0200

> Le mardi 20 juillet 2010 à 11:49 +0200, Nicolas Dichtel a écrit :
>> diff --git a/include/net/dst.h b/include/net/dst.h
>> index 81d1413..7bf4f9a 100644
>> --- a/include/net/dst.h
>> +++ b/include/net/dst.h
>> @@ -319,6 +319,8 @@ static inline int dst_input(struct sk_buff *skb)
>>  
>>  static inline struct dst_entry *dst_check(struct dst_entry *dst, u32 cookie)
>>  {
>> +	if (dst->obsolete > 1)
>> +		return NULL;
>>  	if (dst->obsolete)
>>  		dst = dst->ops->check(dst, cookie);
>>  	return dst;
> 
> I believe this is not needed and redundant.
> 
> In what case do you think this matters ?
> 
> To my knowledge dst_check() is only used by net/xfrm/xfrm_policy.c
> 
> And xfrm_dst_check() does the necessary checks.

Right, last time I was snooping around in here I came to the
same conclusion.  In fact I think I'm the author of that
enormous comment in xfrm_dst_check(). :-)




* Re: [patch v2.6 4/4] libxt_ipvs: user-space lib for netfilter matcher xt_ipvs
From: Jan Engelhardt @ 2010-07-21  6:35 UTC (permalink / raw)
  To: Simon Horman
  Cc: lvs-devel, netdev, linux-kernel, netfilter, netfilter-devel,
	Malcolm Turnbull, Wensong Zhang, Julius Volz, Patrick McHardy,
	David S. Miller, Hannes Eder
In-Reply-To: <20100721012146.GC22966@verge.net.au>


On Wednesday 2010-07-21 03:21, Simon Horman wrote:
>> +
>> +#define XT_IPVS_IPVS_PROPERTY	(1 << 0) /* all other options imply this one */
>> +#define XT_IPVS_PROTO		(1 << 1)
>> +#define XT_IPVS_VADDR		(1 << 2)
>> +#define XT_IPVS_VPORT		(1 << 3)
>> +#define XT_IPVS_DIR		(1 << 4)
>> +#define XT_IPVS_METHOD		(1 << 5)
>> +#define XT_IPVS_VPORTCTL	(1 << 6)
>> +#define XT_IPVS_MASK		((1 << 7) - 1)
>> +#define XT_IPVS_ONCE_MASK	(XT_IPVS_MASK & ~XT_IPVS_IPVS_PROPERTY)

Can't these just be an enum?



* Re: [PATCH net-next] sysfs: add entry to indicate network interfaces with random MAC address
From: David Miller @ 2010-07-21  6:34 UTC (permalink / raw)
  To: harald
  Cc: shemminger, bhutchings, sassmann, netdev, linux-kernel, gospo,
	gregory.v.rose, alexander.h.duyck, leedom
In-Reply-To: <4C469313.6010807@redhat.com>

From: Harald Hoyer <harald@redhat.com>
Date: Wed, 21 Jul 2010 08:26:27 +0200

> On 07/20/2010 11:20 PM, David Miller wrote:
>> From: Stephen Hemminger <shemminger@vyatta.com>
>> Date: Tue, 20 Jul 2010 14:18:16 -0700
>>
>>> No one mentioned that the first octet of an Ethernet address already
>>> indicates "software generated" Ethernet address. Per the standard,
>>> if bit 1 is set it means address is locally assigned.
>>>
>>> static inline bool is_locally_assigned_ether(const u8 *addr)
>>> {
>>> 	return (addr[0] & 0x2) != 0;
>>> }
>>
>> W00t!
>>
>> Indeed, can udev just use that?  :-)
> 
> It already does:
> see /lib/udev/rules.d/75-persistent-net-generator.rules

So... why doesn't this work?


* Re: [PATCH net-next] sysfs: add entry to indicate network interfaces with random MAC address
From: Harald Hoyer @ 2010-07-21  6:26 UTC (permalink / raw)
  To: David Miller
  Cc: shemminger, bhutchings, sassmann, netdev, linux-kernel, gospo,
	gregory.v.rose, alexander.h.duyck, leedom
In-Reply-To: <20100720.142045.32697196.davem@davemloft.net>

On 07/20/2010 11:20 PM, David Miller wrote:
> From: Stephen Hemminger<shemminger@vyatta.com>
> Date: Tue, 20 Jul 2010 14:18:16 -0700
>
>> No one mentioned that the first octet of an Ethernet address already
>> indicates "software generated" Ethernet address. Per the standard,
>> if bit 1 is set it means address is locally assigned.
>>
>> static inline bool is_locally_assigned_ether(const u8 *addr)
>> {
>> 	return (addr[0] & 0x2) != 0;
>> }
>
> W00t!
>
> Indeed, can udev just use that?  :-)

It already does:
see /lib/udev/rules.d/75-persistent-net-generator.rules


* Re: [PATCH -next] net: NET_DSA depends on NET_ETHERNET
From: Randy Dunlap @ 2010-07-21  5:41 UTC (permalink / raw)
  To: David Miller; +Cc: sfr, netdev, linux-next, linux-kernel, buytenh
In-Reply-To: <20100720.174530.139530021.davem@davemloft.net>

On 07/20/10 17:45, David Miller wrote:
> From: Randy Dunlap <randy.dunlap@oracle.com>
> Date: Tue, 20 Jul 2010 16:03:32 -0700
> 
>> From: Randy Dunlap <randy.dunlap@oracle.com>
>>
>> NET_DSA code selects and uses PHYLIB code, but PHYLIB depends on
>> NET_ETHERNET.  However, "select" does not follow kconfig dependencies,
>> so explicitly list that requirement here instead.
>>
>> Fixes this kconfig warning:
>>
>> warning: (NET_DSA && NET && EXPERIMENTAL && !S390 ...) selects PHYLIB which has unmet direct dependencies (!S390 && NET_ETHERNET)
>>
>> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
> 
> Randy, this has been fixed in net-2.6 for some time now.

OK, I did see the commit get merged today.

> And I'm pretty sure I sent a copy of this to you when I
> checked it in :-)

I missed it somehow.  Thanks.

> --------------------
> From 336a283b9cbe47748ccd68fd8c5158f67cee644b Mon Sep 17 00:00:00 2001
> From: David S. Miller <davem@davemloft.net>
> Date: Mon, 12 Jul 2010 20:03:42 -0700
> Subject: [PATCH 09/24] dsa: Fix Kconfig dependencies.
> 
> Based upon a report by Randy Dunlap.
> 
> DSA needs PHYLIB, but PHYLIB needs NET_ETHERNET.  So, in order
> to select PHYLIB we have to make DSA depend upon NET_ETHERNET.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  net/dsa/Kconfig |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
> index c51b554..1120178 100644
> --- a/net/dsa/Kconfig
> +++ b/net/dsa/Kconfig
> @@ -1,7 +1,7 @@
>  menuconfig NET_DSA
>  	bool "Distributed Switch Architecture support"
>  	default n
> -	depends on EXPERIMENTAL && !S390
> +	depends on EXPERIMENTAL && NET_ETHERNET && !S390
>  	select PHYLIB
>  	---help---
>  	  This allows you to use hardware switch chips that use


-- 
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


* Re: [patch 1/1] drivers/s390/net: use memdup_user
From: Frank Blaschka @ 2010-07-21  5:35 UTC (permalink / raw)
  To: akpm; +Cc: netdev, linux-s390
In-Reply-To: <201007202227.o6KMRZ7V021566@imap1.linux-foundation.org>

Hi,

I added this patch to my patch set

Thx

Frank


* [PATCH 2/2] sysfs: allow creating symlinks from untagged to tagged directories
From: Eric W. Biederman @ 2010-07-21  5:12 UTC (permalink / raw)
  To: Greg KH
  Cc: Andrew Morton, Greg KH, Rafael J. Wysocki, Maciej W. Rozycki,
	Kay Sievers, Johannes Berg, netdev
In-Reply-To: <m1vd894dy5.fsf_-_@fess.ebiederm.org>


Supporting symlinks from untagged to tagged directories is reasonable,
and needed to support CONFIG_SYSFS_DEPRECATED.  So don't fail in that
case; allow it to work.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/sysfs/symlink.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index 6603833..a7ac78f 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -67,7 +67,8 @@ static int sysfs_do_create_link(struct kobject *kobj, struct kobject *target,
 
 	sysfs_addrm_start(&acxt, parent_sd);
 	/* Symlinks must be between directories with the same ns_type */
-	if (ns_type == sysfs_ns_type(sd->s_symlink.target_sd->s_parent)) {
+	if (!ns_type ||
+	    (ns_type == sysfs_ns_type(sd->s_symlink.target_sd->s_parent))) {
 		if (warn)
 			error = sysfs_add_one(&acxt, sd);
 		else
-- 
1.6.5.2.143.g8cc62



* [PATCH 1/2] sysfs: sysfs_delete_link handle symlinks from untagged to tagged directories.
From: Eric W. Biederman @ 2010-07-21  5:10 UTC (permalink / raw)
  To: Greg KH
  Cc: Andrew Morton, Greg KH, Rafael J. Wysocki, Maciej W. Rozycki,
	Kay Sievers, Johannes Berg, netdev
In-Reply-To: <m139vd5sms.fsf_-_@fess.ebiederm.org>


This happens for network devices when SYSFS_DEPRECATED is enabled.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/sysfs/symlink.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index 44bca5f..6603833 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -135,7 +135,7 @@ void sysfs_delete_link(struct kobject *kobj, struct kobject *targ,
 {
 	const void *ns = NULL;
 	spin_lock(&sysfs_assoc_lock);
-	if (targ->sd)
+	if (targ->sd && sysfs_ns_type(kobj->sd))
 		ns = targ->sd->s_ns;
 	spin_unlock(&sysfs_assoc_lock);
 	sysfs_hash_and_remove(kobj->sd, ns, name);
-- 
1.6.5.2.143.g8cc62



* [PATCH 0/2] Support untagged symlinks to tagged directories.
From: Eric W. Biederman @ 2010-07-21  5:08 UTC (permalink / raw)
  To: Greg KH
  Cc: Andrew Morton, Greg KH, Rafael J. Wysocki, Maciej W. Rozycki,
	Kay Sievers, Johannes Berg, netdev
In-Reply-To: <20100720201334.GA11991@suse.de>

Greg KH <gregkh@suse.de> writes:

> On Mon, Jul 19, 2010 at 01:34:51PM -0700, Andrew Morton wrote:
>> On Thu, 8 Jul 2010 16:06:01 -0700
>> Greg KH <greg@kroah.com> wrote:
>> 
>> > On Thu, Jul 08, 2010 at 03:28:53PM -0700, Eric W. Biederman wrote:
>> > > Greg KH <greg@kroah.com> writes:
>> > > 
>> > > > With this patch, how does the existing code fail as the drivers aren't
>> > > > fixed up?
>> > > >
>> > > > I like this change, just worried it will cause problems if it gets into
>> > > > .35, without your RFC patch.  Will it?
>> > > 
>> 
>> geethanks!
>> 
>> On the FC6 test box I have no networking.
>
> Ick.
>
> Eric, any ideas?

Yes.  I just found some time to test my fixes and things are looking
good.  It really is just two one line fixes.

On the other part of this: when debugging with SYSFS_DEPRECATED enabled,
the mac80211_hwsim driver works fine, with no problems.  I expect the
bnep driver will also be fine.

What is affecting those two is arguably a bug in the non-deprecated
sysfs mode.

Regardless, here are my fixes.  I have split them into a patch for
the warning and a patch for sysfs_delete_link, because at least
the sysfs_delete_link fix needs to make it into 2.6.35 if we can.

Eric


* Re: [PATCH] Export SMBIOS provided firmware instance and label to sysfs
From: Greg KH @ 2010-07-21  3:59 UTC (permalink / raw)
  To: Narendra K
  Cc: netdev, linux-hotplug, linux-pci, matt_domsch, charles_rose,
	jordan_hargrave, vijay_nijhawan
In-Reply-To: <20100714121345.GA20411@auslistsprd01.us.dell.com>

On Wed, Jul 14, 2010 at 07:13:45AM -0500, Narendra K wrote:
> @@ -333,6 +358,7 @@ static void __init dmi_decode(const struct dmi_header *dm, void *dummy)
>  		break;
>  	case 41:	/* Onboard Devices Extended Information */
>  		dmi_save_extended_devices(dm);
> +		break;

Why make this change?  It's not relevant to your patch, right?

> +enum smbios_attr_enum {
> +	SMBIOS_ATTR_NONE = 0,
> +	SMBIOS_ATTR_LABEL_SHOW,
> +	SMBIOS_ATTR_INSTANCE_SHOW,
> +};
> +
> +static mode_t
> +find_smbios_instance_string(struct pci_dev *pdev, char *buf, int attribute)

Why isn't 'attribute' an enumerated type like you just defined above?
Extra type-checking is always good, especially as the variable name
'attribute' means something totally different in other parts of this
file.

> +{
> +	const struct dmi_device *dmi;
> +	struct dmi_dev_onboard *donboard;
> +	int bus;
> +	int devfn;
> +
> +	bus = pdev->bus->number;
> +	devfn = pdev->devfn;
> +
> +	dmi = NULL;
> +	while ((dmi = dmi_find_device(DMI_DEV_TYPE_DEV_ONBOARD,
> +				      NULL, dmi)) != NULL) {
> +		donboard = dmi->device_data;
> +		if (donboard && donboard->bus == bus &&
> +					donboard->devfn == devfn) {
> +			if (buf) {
> +				if (attribute == SMBIOS_ATTR_INSTANCE_SHOW)
> +					return scnprintf(buf, PAGE_SIZE,
> +							 "%d\n",
> +							 donboard->instance);
> +				else if (attribute == SMBIOS_ATTR_LABEL_SHOW)
> +					return scnprintf(buf, PAGE_SIZE,
> +							 "%s\n",
> +							 dmi->name);
> +			}
> +			return strlen(dmi->name);
> +		}
> +	}
> +	return 0;
> +}
> +
> +static mode_t
> +smbios_instance_string_exist(struct kobject *kobj, struct attribute *attr,
> +			     int n)
> +{
> +	struct device *dev;
> +	struct pci_dev *pdev;
> +
> +	dev = container_of(kobj, struct device, kobj);
> +	pdev = to_pci_dev(dev);
> +
> +	return find_smbios_instance_string(pdev, NULL, SMBIOS_ATTR_NONE);
> +}
> +
> +static ssize_t
> +smbioslabel_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	struct pci_dev *pdev;
> +	pdev = to_pci_dev(dev);
> +
> +	return find_smbios_instance_string(pdev, buf,
> +					   SMBIOS_ATTR_LABEL_SHOW);
> +}
> +
> +static ssize_t
> +smbiosinstance_show(struct device *dev,
> +		    struct device_attribute *attr, char *buf)
> +{
> +	struct pci_dev *pdev;
> +	pdev = to_pci_dev(dev);
> +
> +	return find_smbios_instance_string(pdev, buf,
> +					   SMBIOS_ATTR_INSTANCE_SHOW);
> +}
> +
> +static struct device_attribute smbios_attr_label = {
> +	.attr = {.name = "label", .mode = 0444, .owner = THIS_MODULE},
> +	.show = smbioslabel_show,
> +};
> +
> +static struct device_attribute smbios_attr_instance = {
> +	.attr = {.name = "index", .mode = 0444, .owner = THIS_MODULE},
> +	.show = smbiosinstance_show,
> +};
> +
> +static struct attribute *smbios_attributes[] = {
> +	&smbios_attr_label.attr,
> +	&smbios_attr_instance.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group smbios_attr_group = {
> +	.attrs = smbios_attributes,
> +	.is_visible = smbios_instance_string_exist,
> +};
> +
> +static int
> +pci_create_smbiosname_file(struct pci_dev *pdev)
> +{
> +	if (!sysfs_create_group(&pdev->dev.kobj, &smbios_attr_group))
> +		return 0;
> +	return -ENODEV;
> +}
> +
> +static int
> +pci_remove_smbiosname_file(struct pci_dev *pdev)
> +{
> +		sysfs_remove_group(&pdev->dev.kobj, &smbios_attr_group);
> +		return 0;
> +}

What's with the extra indentation?

Why return a value at all here?

> +
> +int pci_create_firmware_label_files(struct pci_dev *pdev)
> +{
> +	if (!pci_create_smbiosname_file(pdev))
> +		return 0;
> +	return -ENODEV;
> +}
> +
> +int pci_remove_firmware_label_files(struct pci_dev *pdev)
> +{
> +	if (!pci_remove_smbiosname_file(pdev))
> +		return 0;
> +	return -ENODEV;
> +}

Why return values for these two functions if you never check them
anywhere?  Either check the return value and do something with it, or
just make them 'void'.

Also, you need to add documentation for what this sysfs file is and does
in the Documentation/ABI/ directory.  That must be in this patch for
it to be acceptable.

thanks,

greg k-h


* Re: [PATCH] Export SMBIOS provided firmware instance and label to sysfs
From: Greg KH @ 2010-07-21  3:54 UTC (permalink / raw)
  To: Narendra_K
  Cc: netdev, linux-hotplug, linux-pci, Matt_Domsch, Charles_Rose,
	Jordan_Hargrave, Vijay_Nijhawan
In-Reply-To: <EDA0A4495861324DA2618B4C45DCB3EE612BD1@blrx3m08.blr.amer.dell.com>

On Mon, Jul 19, 2010 at 10:24:39PM +0530, Narendra_K@Dell.com wrote:
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org [mailto:netdev-
> > owner@vger.kernel.org] On Behalf Of Narendra K
> > Sent: Wednesday, July 14, 2010 5:44 PM
> > To: greg@kroah.com
> > Cc: netdev@vger.kernel.org; linux-hotplug@vger.kernel.org; linux-
> > pci@vger.kernel.org; Domsch, Matt; Rose, Charles; Hargrave, Jordan;
> > Nijhawan, Vijay
> > Subject: Re: [PATCH] Export SMBIOS provided firmware instance and
> label
> > to sysfs
> > 
> > 
> > V1 -> V2:
> > 
> > 1. The 'smbios_attr' buffer is not being used as mentioned above
> > 
> > 2. The function 'smbios_instance_string_exist' is split into two
> > functions,
> > the other being 'find_smbios_instance_string' which would print the
> > result
> > into the sysfs provided 'buf' of associated device. The function
> > 'smbios_instance_string_exist' would let us know if the label exists
> or
> > not.
> > 
> > Please find the patch with above changes here -
> > 
> > From: Narendra K <narendra_k@dell.com>
> > Subject: [PATCH] Export SMBIOS provided firmware instance and label to
> > sysfs
> > 
> 
> Greg,
> 
> Thanks for the review comments. 
> 
> This version of the patch has all the suggestions incorporated. Please
> let us know if there are any concerns. If the approach is acceptable,
> please consider this patch for inclusion.

What "version"?  The previous one you sent?  I'll look at it, but note
that I'm not the maintainer who you need to convince to accept it :)

thanks,

greg k-h


* Re: linux-next: manual merge of the net tree with Linus' tree
From: Stephen Rothwell @ 2010-07-21  3:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-next, linux-kernel, herbert
In-Reply-To: <20100720.202759.00438343.davem@davemloft.net>


Hi Dave,

On Tue, 20 Jul 2010 20:27:59 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
>
> The net-2.6 changes should be undone as net-next-2.6 has the fixes
> that allow bridge netpoll to work properly, thus in net-next-2.6 the
> net-2.6 commit is completely unnecessary.
> 
> I did this when I merged net-2.6 into net-next-2.6 about an hour ago
> :-)

Excellent, thanks.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/



* Re: linux-next: manual merge of the net tree with Linus' tree
From: David Miller @ 2010-07-21  3:27 UTC (permalink / raw)
  To: sfr; +Cc: netdev, linux-next, linux-kernel, herbert
In-Reply-To: <20100721120448.31e325fd.sfr@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Wed, 21 Jul 2010 12:04:48 +1000

> Today's linux-next merge of the net tree got a conflict in
> net/bridge/br_device.c between commit
> 573201f36fd9c7c6d5218cdcd9948cee700b277d ("bridge: Partially disable
> netpoll support") from Linus' tree and commit
> 91d2c34a4eed32876ca333b0ca44f3bc56645805 ("bridge: Fix netpoll support")
> from the net tree.
> 
> The net tree commit seems to be a fuller fix, so I used that.

The net-2.6 changes should be undone as net-next-2.6 has the fixes
that allow bridge netpoll to work properly, thus in net-next-2.6 the
net-2.6 commit is completely unnecessary.

I did this when I merged net-2.6 into net-next-2.6 about an hour ago
:-)


* Re: [PATCH net-next-2.6] ixgbe: fix ethtool stats
From: Eric Dumazet @ 2010-07-21  2:38 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: David Miller, Jesse Brandeburg, PJ Waskiewicz, netdev
In-Reply-To: <AANLkTik7JrI3HrtvQRTgrRU30fFb2lrgGxUJsWXedBL0@mail.gmail.com>

Le mardi 20 juillet 2010 à 15:06 -0700, Jeff Kirsher a écrit :
> On Tue, Jul 20, 2010 at 10:28, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Note : I am currently unable to test following patch, could you please
> > Intel guys test it and Ack (or Nack) it ?
> >
> > Thanks !
> >
> > [PATCH net-next-2.6] ixgbe: fix ethtool stats
> >
> > In latest changes about 64bit stats on 32bit arches,
> > [commit 28172739f0a276eb8 (net: fix 64 bit counters on 32 bit arches)],
> > I missed ixgbe uses a bit of magic in its ixgbe_gstrings_stats
> > definition.
> >
> > IXGBE_NETDEV_STAT() must now assume offsets relative to
> > rtnl_link_stats64, not relative to dev->stats.
> >
> > As a bonus, we also get 64bit stats on ethtool -S
> >
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > ---
> >  drivers/net/ixgbe/ixgbe_ethtool.c |   42 ++++++++++++++--------------
> >  1 file changed, 21 insertions(+), 21 deletions(-)
> >
> 
> Thanks Eric, I have added it to my queue.
> 

Thanks !

By the way, my ixgbe configuration doesn't work with net-next-2.6 at all
(no link is established in my fiber loop configuration).

The current linux-2.6 git tree runs correctly, with link at 10Gb, so there
is a regression somewhere.

As this machine is quite slow (I no longer have my Nehalem dev machine
and had to use an old setup), a bisection would take a month...





* Re: linux-next: manual merge of the net tree with Linus' tree
From: Herbert Xu @ 2010-07-21  2:31 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: David Miller, netdev, linux-next, linux-kernel
In-Reply-To: <20100721120448.31e325fd.sfr@canb.auug.org.au>

On Wed, Jul 21, 2010 at 12:04:48PM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the net tree got a conflict in
> net/bridge/br_device.c between commit
> 573201f36fd9c7c6d5218cdcd9948cee700b277d ("bridge: Partially disable
> netpoll support") from Linus' tree and commit
> 91d2c34a4eed32876ca333b0ca44f3bc56645805 ("bridge: Fix netpoll support")
> from the net tree.
> 
> The net tree commit seems to be a fuller fix, so I used that.

Yeah, 573201f is just the temporary fix for 2.6.35.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [RFC PATCH] dst: check if dst is freed in dst_check()
From: Eric Dumazet @ 2010-07-21  2:28 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev
In-Reply-To: <4C457120.9070105@6wind.com>

Le mardi 20 juillet 2010 à 11:49 +0200, Nicolas Dichtel a écrit :
> Hi,
> 
> I probably missed something, but I cannot find where the obsolete field is checked
> when dst_check() is called. If dst->obsolete is > 1, dst cannot be used!
> 
> Attached is a proposal to fix this issue.
> 
> 

> diff --git a/include/net/dst.h b/include/net/dst.h
> index 81d1413..7bf4f9a 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -319,6 +319,8 @@ static inline int dst_input(struct sk_buff *skb)
>  
>  static inline struct dst_entry *dst_check(struct dst_entry *dst, u32 cookie)
>  {
> +	if (dst->obsolete > 1)
> +		return NULL;
>  	if (dst->obsolete)
>  		dst = dst->ops->check(dst, cookie);
>  	return dst;

I believe this is not needed and redundant.

In what case do you think this matters ?

To my knowledge dst_check() is only used by net/xfrm/xfrm_policy.c

And xfrm_dst_check() does the necessary checks.

static struct dst_entry *xfrm_dst_check(struct dst_entry *dst, u32 cookie)
{
        /* Code (such as __xfrm4_bundle_create()) sets dst->obsolete
         * to "-1" to force all XFRM destinations to get validated by
         * dst_ops->check on every use.  We do this because when a
         * normal route referenced by an XFRM dst is obsoleted we do
         * not go looking around for all parent referencing XFRM dsts
         * so that we can invalidate them.  It is just too much work.
         * Instead we make the checks here on every use.  For example:
         *
         *      XFRM dst A --> IPv4 dst X
         *
         * X is the "xdst->route" of A (X is also the "dst->path" of A
         * in this example).  If X is marked obsolete, "A" will not
         * notice.  That's what we are validating here via the
         * stale_bundle() check.
         *
         * When a policy's bundle is pruned, we dst_free() the XFRM
         * dst which causes its ->obsolete field to be set to a
         * positive non-zero integer.  If an XFRM dst has been pruned
         * like this, we want to force a new route lookup.
         */
        if (dst->obsolete < 0 && !stale_bundle(dst))
                return dst;

        return NULL;
}
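A minimal user-space model of the argument above, assuming simplified stand-in types: the `model_*` names and the `stale` field (which stands in for `stale_bundle()`) are hypothetical and are not kernel API. It shows that, on the xfrm path, the unmodified dst_check() already rejects a pruned entry (positive `obsolete`), because the check callback returns NULL for anything that is not negative and fresh:

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct model_dst;

struct model_dst_ops {
	struct model_dst *(*check)(struct model_dst *dst, unsigned int cookie);
};

struct model_dst {
	int obsolete;                    /* < 0: validate on every use; > 0: freed/pruned */
	int stale;                       /* stands in for stale_bundle(dst) */
	const struct model_dst_ops *ops;
};

/* Same decision as the quoted xfrm_dst_check(). */
static struct model_dst *model_xfrm_dst_check(struct model_dst *dst,
					      unsigned int cookie)
{
	(void)cookie;
	if (dst->obsolete < 0 && !dst->stale)
		return dst;
	return NULL;
}

static const struct model_dst_ops model_xfrm_ops = {
	.check = model_xfrm_dst_check,
};

/* dst_check() as it is today, without the proposed "obsolete > 1" test. */
static struct model_dst *model_dst_check(struct model_dst *dst,
					 unsigned int cookie)
{
	if (dst->obsolete)
		dst = dst->ops->check(dst, cookie);
	return dst;
}
```

With these stand-ins, an entry with `obsolete = -1` that is not stale is returned unchanged, while an entry with `obsolete = 2` comes back as NULL, which is exactly what the proposed extra test would have produced.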



^ permalink raw reply

* linux-next: manual merge of the net tree with Linus' tree
From: Stephen Rothwell @ 2010-07-21  2:04 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: linux-next, linux-kernel, Herbert Xu

[-- Attachment #1: Type: text/plain, Size: 493 bytes --]

Hi all,

Today's linux-next merge of the net tree got a conflict in
net/bridge/br_device.c between commit
573201f36fd9c7c6d5218cdcd9948cee700b277d ("bridge: Partially disable
netpoll support") from Linus' tree and commit
91d2c34a4eed32876ca333b0ca44f3bc56645805 ("bridge: Fix netpoll support")
from the net tree.

The net tree commit seems to be a fuller fix, so I used that.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* linux-next: manual merge of the wireless tree with the net tree
From: Stephen Rothwell @ 2010-07-21  2:04 UTC (permalink / raw)
  To: John W. Linville
  Cc: linux-next, linux-kernel, Eric Dumazet, Wey-Yi Guy, David Miller,
	netdev

Hi John,

Today's linux-next merge of the wireless tree got a conflict in
drivers/net/wireless/iwlwifi/iwl-commands.h between commit
ba2d3587912f82d1ab4367975b1df460db60fb1e ("drivers/net: use __packed
annotation") from the net tree and commit
7c094c5cc4d28062abf0d33ca022dbea6c522558 ("iwlwifi: additional statistic
debug counter") from the wireless tree.

I fixed it up (see below) and can carry the fix as necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc drivers/net/wireless/iwlwifi/iwl-commands.h
index 8d2db9d,83247f7..0000000
--- a/drivers/net/wireless/iwlwifi/iwl-commands.h
+++ b/drivers/net/wireless/iwlwifi/iwl-commands.h
@@@ -3035,8 -3035,9 +3035,9 @@@ struct iwl39_statistics_tx 
  struct statistics_dbg {
  	__le32 burst_check;
  	__le32 burst_count;
- 	__le32 reserved[4];
+ 	__le32 wait_for_silence_timeout_cnt;
+ 	__le32 reserved[3];
 -} __attribute__ ((packed));
 +} __packed;
  
  struct iwl39_statistics_div {
  	__le32 tx_on_a;
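A small sketch of why this resolution keeps the layout intact, assuming `uint32_t` in place of `__le32` and re-creating the kernel's `__packed` shorthand locally (in the kernel it expands to `__attribute__((packed))`). The `_before`/`_after` struct names are hypothetical. One reserved word becomes a named counter, so the total size does not change:

```c
#include <stdint.h>

/* Local stand-in for the kernel's shorthand. */
#ifndef __packed
#define __packed __attribute__((packed))
#endif

/* Layout before the wireless-tree commit. */
struct statistics_dbg_before {
	uint32_t burst_check;
	uint32_t burst_count;
	uint32_t reserved[4];
} __attribute__ ((packed));

/* Layout in the merged result. */
struct statistics_dbg_after {
	uint32_t burst_check;
	uint32_t burst_count;
	uint32_t wait_for_silence_timeout_cnt;
	uint32_t reserved[3];
} __packed;

/* A reserved word was renamed, not added: the size must be unchanged. */
_Static_assert(sizeof(struct statistics_dbg_before) ==
	       sizeof(struct statistics_dbg_after),
	       "statistics_dbg size must not change");
_Static_assert(sizeof(struct statistics_dbg_after) == 6 * sizeof(uint32_t),
	       "packed: no padding");
```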

^ permalink raw reply

* Re: [PATCH] LSM: Add post accept() hook.
From: Tetsuo Handa @ 2010-07-21  2:00 UTC (permalink / raw)
  To: paul.moore
  Cc: davem, eric.dumazet, jmorris, sam, serge, netdev,
	linux-security-module
In-Reply-To: <201007201552.29539.paul.moore@hp.com>

Paul Moore wrote:
> On Monday, July 19, 2010 09:36:31 pm Tetsuo Handa wrote:
> > One is for dropping connections from unwanted hosts. Administrators define
> > policy before enabling enforcing mode (the mode which connections are
> > dropped if operation was not granted by policy). Administrators specify
> > acceptable hosts (i.e. hosts which this host needs to communicate with)
> > and unacceptable hosts (i.e. hosts which this host needn't to communicate
> > with).
> 
> You can enforce per-host access controls without the need for a post-accept() 
> hooks, e.g. security_sock_rcv_skb() and the netfilter hooks 
> (NF_INET_POST_ROUTING, NF_INET_FORWARD, NF_INET_LOCAL_OUT).  Or are you 
> interested in controlling which hosts an _application_ can communicate with?

I'm interested in controlling which ports on which hosts a _process_ can
communicate with. In TOMOYO's words, "processes that belong to which TOMOYO's
domain can communicate with which ports on which hosts".

TOMOYO's rules are

  Processes that belong to FOO domain can open /etc/fstab for reading.
     ( allow_read /etc/fstab )

  Processes that belong to FOO domain can create /tmp/file with mode 0600.
     ( allow_create /tmp/file 0600 )

  Processes that belong to FOO domain can connect to port 80 on host
  10.20.30.40 using TCP protocol.
     ( allow_network TCP connect 10.20.30.40 80 )

and so on. But currently,

  Processes that belong to FOO domain can accept TCP connections from port 1024
  on host 10.20.30.40.
     ( allow_network TCP accept 10.20.30.40 1024 )

  Processes that belong to FOO domain can receive UDP messages from port 65535
  on host 100.200.10.20.
     ( allow_network UDP connect 100.200.10.20 65535 )

cannot be enforced.
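To make the missing case concrete, here is a hypothetical sketch of checking an "allow_network TCP accept" rule; the struct and function names are illustrative and are not TOMOYO's code. The point is that the peer address and port it compares against only exist once accept() has produced a connected socket, in the context of the process that called accept():

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical in-memory form of "allow_network <proto> <op> <addr> <port>". */
struct net_rule {
	const char *proto;      /* "TCP" or "UDP" */
	const char *op;         /* "connect", "accept", ... */
	struct in_addr addr;
	uint16_t port;          /* host byte order */
};

/* True if the accepted peer (as returned by accept()) matches a TCP accept rule. */
static bool rule_allows_accept(const struct net_rule *rules, size_t n,
			       const struct sockaddr_in *peer)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(rules[i].proto, "TCP") != 0 ||
		    strcmp(rules[i].op, "accept") != 0)
			continue;
		if (rules[i].addr.s_addr == peer->sin_addr.s_addr &&
		    rules[i].port == ntohs(peer->sin_port))
			return true;
	}
	return false;
}
```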

For outgoing connections/datagrams, we can specify address/port parameters
from the point of view of the _process_ that actually sends the requests.
But for incoming connections/datagrams, we cannot specify address/port
parameters from the point of view of the _process_ that actually receives them.

We can enforce per-host access controls using iptables.
But we can't use iptables to control address/port parameters for incoming
connections/datagrams, because the process that actually receives the requests
(ServerApp2 in the example below) is not always the same as the process that
created the socket (ServerApp1 in the example below).

> > Dropping connections would happen if some process was hijacked and the
> > process attempted to communicate with other processes using TCP
> > connections. But dropping connections should not happen in normal
> > circumstance.
> 
> It doesn't matter if dropping connections is normal or not, what matters is 
> that it can happen.
> 
> > The other is for updating process's state variable upon accept() operation.
> > LKM version of TOMOYO has per a task_struct variable that is used for
> > implementing stateful permissions. (As of now, not implemented for LSM
> > version of TOMOYO.)
> 
> I'm open to re-introducing a post-accept() hook that does not have a return 
> value, in other words, a hook that can only be used to update LSM state and 
> not affect the connection.  Although I do think you could probably achieve the 
> same thing using some of the existing LSM hooks (look at how SELinux updates 
> its state upon accept()) but that is something you would have to look it and 
> see if it works for TOMOYO.

I can't see why the hook must not affect the connection.
Could you clarify using the players below?

Server1 and Client1 are hosts which are connected on TCP/IP network.
ServerApp1 and ServerApp2 are applications running on Server1 which might call
socket(), bind(), listen(), accept(), send(), recv(), shutdown(), close() and
execve().
ClientApp1 and ClientApp2 are applications running on Client1 which might call
socket(), connect(), send(), recv(), shutdown(), close().
Router1 and Router2 are routers which exist between Server1 and Client1.

  +-------+   +-------+   +-------+   +-------+
  |Server1|---|Router1|---|Router2|---|Client1|
  +-------+   +-------+   +-------+   +-------+

Event sequences:

Server1                       Client1

  ServerApp1 creates a socket using socket().

  ServerApp1 binds to an address using bind().

  ServerApp1 listens to the address using listen().

                                ClientApp1 creates a socket using socket().

                                ClientApp1 issues connect() request.

                                  Sends SYN.

    Receives SYN.

    Sends SYN/ACK.

                                  Receives SYN/ACK.

                                  Sends ACK.

    Receives ACK.

                                ClientApp1 issues send() request.

                                  Sends data.

    Receives data.

    Sends ACK.

                                  Receives ACK.

                                ClientApp1 issues send() request.

                                  Sends data.

    Receives data.

    Sends ACK.

                                  Receives ACK.

  ServerApp1 calls execve("ServerApp2").

  ServerApp2 issues accept() request.

    security_socket_accept() is called.

    sock->ops->accept() is called.

    security_socket_post_accept() is called. (*3)

    newsock->ops->getname() is called. (*1)

    move_addr_to_user() is called. (*2)

    fd_install() is called.

  ServerApp2 issues some requests.

    Some LSM hooks will be called.
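The accept() steps above can be modelled in user space as follows. This is a hypothetical sketch, not the kernel's accept code: each step is represented by its result (0 or a negative errno), the names are illustrative, and `post_accept` stands for the proposed hook. It shows that a failure in any step after sock->ops->accept() tears down an already established connection in the same way, which is the comparison made in the notes below:

```c
#include <errno.h>   /* negative errno values are passed in as step results */
#include <stddef.h>

/* Hypothetical step results: 0 on success, negative errno on failure. */
struct accept_steps {
	int post_accept;     /* proposed security_socket_post_accept() */
	int getname;         /* newsock->ops->getname() */
	int copy_to_user;    /* move_addr_to_user() */
};

/*
 * Returns what accept() would report: next_fd on success, otherwise the
 * first negative error. *released is set when the established connection
 * is torn down instead of being installed in the fd table.
 */
static int model_accept(const struct accept_steps *s, int next_fd,
			int *released)
{
	const int results[] = { s->post_accept, s->getname, s->copy_to_user };

	*released = 0;
	for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++) {
		if (results[i] < 0) {
			*released = 1;     /* connection discarded */
			return results[i];
		}
	}
	return next_fd;                    /* corresponds to fd_install() */
}
```

From ServerApp2's point of view, a hook denial (-EPERM here) and a bad upeer_sockaddr (-EFAULT) look the same: accept() fails and the connection is gone.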




*1: This may fail, and the connection is discarded if it does.
    Thus, newsock->ops->getname() affects the connection.
    This is not the fault of ServerApp2. Maybe it is the fault of ClientApp1,
    Router1 or Router2, but discarding the already established connection is
    justified.

*2: This may fail, and the connection is discarded if it does.
    Thus, move_addr_to_user() affects the connection.
    Is this the fault of ServerApp2?
    If the upeer_sockaddr supplied by ServerApp2 was bad, it is the fault of
    ServerApp2, so discarding the already established connection is justified.
    If the upeer_sockaddr supplied by ServerApp2 was good but physical RAM had
    not yet been assigned to it, the OOM killer may be invoked while writing to
    upeer_sockaddr and may choose to kill ServerApp2. This is not the fault of
    ServerApp2, but discarding the already established connection is still
    justified.

*3: newsock->ops->getname() and move_addr_to_user() already affect the
    connection. They discard already established connections even if the cause
    is not ServerApp2's fault. Why can't security_socket_post_accept()
    affecting the connection be justified in the same way?

Router1 and Router2 can inject RST into already established connections
at any time (if they are IDS/IPS, broken, or malicious).
How does security_socket_post_accept() returning an error differ from these
routers injecting RST?

Regards.

^ permalink raw reply

* Re: linux-next: manual merge of the net tree with the net-current tree
From: David Miller @ 2010-07-21  1:27 UTC (permalink / raw)
  To: joe; +Cc: sfr, netdev, linux-next, linux-kernel, jdike, mst
In-Reply-To: <1279593240.19374.2.camel@Joe-Laptop.home>

From: Joe Perches <joe@perches.com>
Date: Mon, 19 Jul 2010 19:34:00 -0700

> On Tue, 2010-07-20 at 12:20 +1000, Stephen Rothwell wrote:
>> I fixed it up (see below) and can carry the fix as necessary.
> @@@ -527,15 -527,12 +527,14 @@@ static long vhost_net_set_backend(struc
>   
>         /* start polling new socket */
>         oldsock = vq->private_data;
> -       if (sock == oldsock)
> -               goto done;
> +       if (sock != oldsock){
> 
> Trivial: missing space before open brace in commit
> dd1f4078f0d2de74a308f00a2dffbd550cfba59f

Thanks guys, I'm taking care of this as I merge net-2.6 into
net-next-2.6

^ permalink raw reply

* Re: [patch v2.7 4/4] libxt_ipvs: user-space lib for netfilter matcher xt_ipvs
From: Simon Horman @ 2010-07-21  1:23 UTC (permalink / raw)
  To: lvs-devel, netdev, linux-kernel, netfilter, netfilter-devel
  Cc: Mark Brooks, Malcolm Turnbull, Wensong Zhang, Julius Volz,
	Patrick McHardy, David S. Miller, Hannes Eder
In-Reply-To: <20100721012146.GC22966@verge.net.au>

From:	Hannes Eder <heder@google.com>

The user-space library for the netfilter matcher xt_ipvs.

[ trivial up-port by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Acked-by: Simon Horman <horms@verge.net.au>

 configure.ac                      |   10 ++++++++--
 extensions/libxt_ipvs.c           |  365 +++++++++++++++++++++++++++++++++++++
 extensions/libxt_ipvs.man         |   24 ++
 include/linux/netfilter/xt_ipvs.h |   25 +++
 4 files changed, 422 insertions(+), 2 deletions(-)
 create mode 100644 extensions/libxt_ipvs.c
 create mode 100644 extensions/libxt_ipvs.man
 create mode 100644 include/linux/netfilter/xt_ipvs.h

v2.7
* Update struct xt_ipvs_mtinfo to use __u8 instead of __u16 for l4proto
  and fwd_method, to reflect the same change to the kernel copy
  of struct xt_ipvs_mtinfo.

v2.1
* Trivial up-port
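The user-space copy has to track the kernel's because the match data is handed to the kernel as a fixed-size blob. A quick sketch of the size change, assuming a simplified 16-byte stand-in for `union nf_inet_addr` and `uint8_t`/`uint16_t` for `__u8`/`__u16`; the `_v26`/`_v27` struct names are hypothetical:

```c
#include <stdint.h>

/* Simplified stand-in for union nf_inet_addr (16 bytes, 4-byte aligned). */
union model_inet_addr {
	uint32_t all[4];
	uint32_t ip;
};

/* v2.6 user-space layout: l4proto and fwd_method as 16-bit fields. */
struct ipvs_mtinfo_v26 {
	union model_inet_addr vaddr, vmask;
	uint16_t vport;
	uint16_t l4proto;
	uint16_t fwd_method;
	uint16_t vportctl;
	uint8_t  invert;
	uint8_t  bitmask;
};

/* v2.7 layout, matching the updated kernel header. */
struct ipvs_mtinfo_v27 {
	union model_inet_addr vaddr, vmask;
	uint16_t vport;
	uint8_t  l4proto;
	uint8_t  fwd_method;
	uint16_t vportctl;
	uint8_t  invert;
	uint8_t  bitmask;
};
```

On common ABIs the v2.6 layout is 42 bytes plus tail padding (44), while the v2.7 layout is exactly 40, so `.size = XT_ALIGN(sizeof(struct xt_ipvs_mtinfo))` in user space would no longer agree with what the kernel module expects.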

Index: iptables/configure.ac
===================================================================
--- iptables.orig/configure.ac	2010-07-21 09:43:55.000000000 +0900
+++ iptables/configure.ac	2010-07-21 09:44:02.000000000 +0900
@@ -52,12 +52,18 @@ AC_ARG_WITH([pkgconfigdir], AS_HELP_STRI
 	[Path to the pkgconfig directory [[LIBDIR/pkgconfig]]]),
 	[pkgconfigdir="$withval"], [pkgconfigdir='${libdir}/pkgconfig'])
 
-AC_CHECK_HEADER([linux/dccp.h])
-
 blacklist_modules="";
+
+AC_CHECK_HEADER([linux/dccp.h])
 if test "$ac_cv_header_linux_dccp_h" != "yes"; then
 	blacklist_modules="$blacklist_modules dccp";
 fi;
+
+AC_CHECK_HEADER([linux/ip_vs.h])
+if test "$ac_cv_header_linux_ip_vs_h" != "yes"; then
+	blacklist_modules="$blacklist_modules ipvs";
+fi;
+
 AC_SUBST([blacklist_modules])
 
 AM_CONDITIONAL([ENABLE_STATIC], [test "$enable_static" = "yes"])
Index: iptables/extensions/libxt_ipvs.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ iptables/extensions/libxt_ipvs.c	2010-07-21 10:07:17.000000000 +0900
@@ -0,0 +1,365 @@
+/*
+ * Shared library add-on to iptables to add IPVS matching.
+ *
+ * Detailed doc is in the kernel module source net/netfilter/xt_ipvs.c
+ *
+ * Author: Hannes Eder <heder@google.com>
+ */
+#include <sys/types.h>
+#include <assert.h>
+#include <ctype.h>
+#include <errno.h>
+#include <getopt.h>
+#include <netdb.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <xtables.h>
+#include <linux/ip_vs.h>
+#include <linux/netfilter/xt_ipvs.h>
+
+static const struct option ipvs_mt_opts[] = {
+	{ .name = "ipvs",     .has_arg = false, .val = '0' },
+	{ .name = "vproto",   .has_arg = true,  .val = '1' },
+	{ .name = "vaddr",    .has_arg = true,  .val = '2' },
+	{ .name = "vport",    .has_arg = true,  .val = '3' },
+	{ .name = "vdir",     .has_arg = true,  .val = '4' },
+	{ .name = "vmethod",  .has_arg = true,  .val = '5' },
+	{ .name = "vportctl", .has_arg = true,  .val = '6' },
+	{ .name = NULL }
+};
+
+static void ipvs_mt_help(void)
+{
+	printf(
+"IPVS match options:\n"
+"[!] --ipvs                      packet belongs to an IPVS connection\n"
+"\n"
+"Any of the following options implies --ipvs (even negated)\n"
+"[!] --vproto protocol           VIP protocol to match; by number or name,\n"
+"                                e.g. \"tcp\"\n"
+"[!] --vaddr address[/mask]      VIP address to match\n"
+"[!] --vport port                VIP port to match; by number or name,\n"
+"                                e.g. \"http\"\n"
+"    --vdir {ORIGINAL|REPLY}     flow direction of packet\n"
+"[!] --vmethod {GATE|IPIP|MASQ}  IPVS forwarding method used\n"
+"[!] --vportctl port             VIP port of the controlling connection to\n"
+"                                match, e.g. 21 for FTP\n"
+		);
+}
+
+static void ipvs_mt_parse_addr_and_mask(const char *arg,
+					union nf_inet_addr *address,
+					union nf_inet_addr *mask,
+					unsigned int family)
+{
+	struct in_addr *addr = NULL;
+	struct in6_addr *addr6 = NULL;
+	unsigned int naddrs = 0;
+
+	if (family == NFPROTO_IPV4) {
+		xtables_ipparse_any(arg, &addr, &mask->in, &naddrs);
+		if (naddrs > 1)
+			xtables_error(PARAMETER_PROBLEM,
+				      "multiple IP addresses not allowed");
+		if (naddrs == 1)
+			memcpy(&address->in, addr, sizeof(*addr));
+	} else if (family == NFPROTO_IPV6) {
+		xtables_ip6parse_any(arg, &addr6, &mask->in6, &naddrs);
+		if (naddrs > 1)
+			xtables_error(PARAMETER_PROBLEM,
+				      "multiple IP addresses not allowed");
+		if (naddrs == 1)
+			memcpy(&address->in6, addr6, sizeof(*addr6));
+	} else {
+		/* Hu? */
+		assert(false);
+	}
+}
+
+/* Function which parses command options; returns true if it ate an option */
+static int ipvs_mt_parse(int c, char **argv, int invert, unsigned int *flags,
+			 const void *entry, struct xt_entry_match **match,
+			 unsigned int family)
+{
+	struct xt_ipvs_mtinfo *data = (void *)(*match)->data;
+	char *p = NULL;
+	u_int8_t op = 0;
+
+	if ('0' <= c && c <= '6') {
+		static const int ops[] = {
+			XT_IPVS_IPVS_PROPERTY,
+			XT_IPVS_PROTO,
+			XT_IPVS_VADDR,
+			XT_IPVS_VPORT,
+			XT_IPVS_DIR,
+			XT_IPVS_METHOD,
+			XT_IPVS_VPORTCTL
+		};
+		op = ops[c - '0'];
+	} else
+		return 0;
+
+	if (*flags & op & XT_IPVS_ONCE_MASK)
+		goto multiple_use;
+
+	switch (c) {
+	case '0': /* --ipvs */
+		/* Nothing to do here. */
+		break;
+
+	case '1': /* --vproto */
+		/* Canonicalize into lower case */
+		for (p = optarg; *p != '\0'; ++p)
+			*p = tolower(*p);
+
+		data->l4proto = xtables_parse_protocol(optarg);
+		break;
+
+	case '2': /* --vaddr */
+		ipvs_mt_parse_addr_and_mask(optarg, &data->vaddr,
+					    &data->vmask, family);
+		break;
+
+	case '3': /* --vport */
+		data->vport = htons(xtables_parse_port(optarg, "tcp"));
+		break;
+
+	case '4': /* --vdir */
+		xtables_param_act(XTF_NO_INVERT, "ipvs", "--vdir", invert);
+		if (strcasecmp(optarg, "ORIGINAL") == 0) {
+			data->bitmask |= XT_IPVS_DIR;
+			data->invert   &= ~XT_IPVS_DIR;
+		} else if (strcasecmp(optarg, "REPLY") == 0) {
+			data->bitmask |= XT_IPVS_DIR;
+			data->invert  |= XT_IPVS_DIR;
+		} else {
+			xtables_param_act(XTF_BAD_VALUE,
+					  "ipvs", "--vdir", optarg);
+		}
+		break;
+
+	case '5': /* --vmethod */
+		if (strcasecmp(optarg, "GATE") == 0)
+			data->fwd_method = IP_VS_CONN_F_DROUTE;
+		else if (strcasecmp(optarg, "IPIP") == 0)
+			data->fwd_method = IP_VS_CONN_F_TUNNEL;
+		else if (strcasecmp(optarg, "MASQ") == 0)
+			data->fwd_method = IP_VS_CONN_F_MASQ;
+		else
+			xtables_param_act(XTF_BAD_VALUE,
+					  "ipvs", "--vmethod", optarg);
+		break;
+
+	case '6': /* --vportctl */
+		data->vportctl = htons(xtables_parse_port(optarg, "tcp"));
+		break;
+
+	default:
+		/* Hu? How did we come here? */
+		assert(false);
+		return 0;
+	}
+
+	if (op & XT_IPVS_ONCE_MASK) {
+		if (data->invert & XT_IPVS_IPVS_PROPERTY)
+			xtables_error(PARAMETER_PROBLEM,
+				      "! --ipvs cannot be together with"
+				      " other options");
+		data->bitmask |= XT_IPVS_IPVS_PROPERTY;
+	}
+
+	data->bitmask |= op;
+	if (invert)
+		data->invert |= op;
+	*flags |= op;
+	return 1;
+
+multiple_use:
+	xtables_error(PARAMETER_PROBLEM,
+		      "multiple use of the same IPVS option is not allowed");
+}
+
+static int ipvs_mt4_parse(int c, char **argv, int invert, unsigned int *flags,
+			  const void *entry, struct xt_entry_match **match)
+{
+	return ipvs_mt_parse(c, argv, invert, flags, entry, match,
+			     NFPROTO_IPV4);
+}
+
+static int ipvs_mt6_parse(int c, char **argv, int invert, unsigned int *flags,
+			  const void *entry, struct xt_entry_match **match)
+{
+	return ipvs_mt_parse(c, argv, invert, flags, entry, match,
+			     NFPROTO_IPV6);
+}
+
+static void ipvs_mt_check(unsigned int flags)
+{
+	if (flags == 0)
+		xtables_error(PARAMETER_PROBLEM,
+			      "IPVS: At least one option is required");
+}
+
+/* Shamelessly copied from libxt_conntrack.c */
+static void ipvs_mt_dump_addr(const union nf_inet_addr *addr,
+			      const union nf_inet_addr *mask,
+			      unsigned int family, bool numeric)
+{
+	char buf[BUFSIZ];
+
+	if (family == NFPROTO_IPV4) {
+		if (!numeric && addr->ip == 0) {
+			printf("anywhere ");
+			return;
+		}
+		if (numeric)
+			strcpy(buf, xtables_ipaddr_to_numeric(&addr->in));
+		else
+			strcpy(buf, xtables_ipaddr_to_anyname(&addr->in));
+		strcat(buf, xtables_ipmask_to_numeric(&mask->in));
+		printf("%s ", buf);
+	} else if (family == NFPROTO_IPV6) {
+		if (!numeric && addr->ip6[0] == 0 && addr->ip6[1] == 0 &&
+		    addr->ip6[2] == 0 && addr->ip6[3] == 0) {
+			printf("anywhere ");
+			return;
+		}
+		if (numeric)
+			strcpy(buf, xtables_ip6addr_to_numeric(&addr->in6));
+		else
+			strcpy(buf, xtables_ip6addr_to_anyname(&addr->in6));
+		strcat(buf, xtables_ip6mask_to_numeric(&mask->in6));
+		printf("%s ", buf);
+	}
+}
+
+static void ipvs_mt_dump(const void *ip, const struct xt_ipvs_mtinfo *data,
+			 unsigned int family, bool numeric, const char *prefix)
+{
+	if (data->bitmask == XT_IPVS_IPVS_PROPERTY) {
+		if (data->invert & XT_IPVS_IPVS_PROPERTY)
+			printf("! ");
+		printf("%sipvs ", prefix);
+	}
+
+	if (data->bitmask & XT_IPVS_PROTO) {
+		if (data->invert & XT_IPVS_PROTO)
+			printf("! ");
+		printf("%sproto %u ", prefix, data->l4proto);
+	}
+
+	if (data->bitmask & XT_IPVS_VADDR) {
+		if (data->invert & XT_IPVS_VADDR)
+			printf("! ");
+
+		printf("%svaddr ", prefix);
+		ipvs_mt_dump_addr(&data->vaddr, &data->vmask, family, numeric);
+	}
+
+	if (data->bitmask & XT_IPVS_VPORT) {
+		if (data->invert & XT_IPVS_VPORT)
+			printf("! ");
+
+		printf("%svport %u ", prefix, ntohs(data->vport));
+	}
+
+	if (data->bitmask & XT_IPVS_DIR) {
+		if (data->invert & XT_IPVS_DIR)
+			printf("%svdir REPLY ", prefix);
+		else
+			printf("%svdir ORIGINAL ", prefix);
+	}
+
+	if (data->bitmask & XT_IPVS_METHOD) {
+		if (data->invert & XT_IPVS_METHOD)
+			printf("! ");
+
+		printf("%svmethod ", prefix);
+		switch (data->fwd_method) {
+		case IP_VS_CONN_F_DROUTE:
+			printf("GATE ");
+			break;
+		case IP_VS_CONN_F_TUNNEL:
+			printf("IPIP ");
+			break;
+		case IP_VS_CONN_F_MASQ:
+			printf("MASQ ");
+			break;
+		default:
+			/* Hu? */
+			printf("UNKNOWN ");
+			break;
+		}
+	}
+
+	if (data->bitmask & XT_IPVS_VPORTCTL) {
+		if (data->invert & XT_IPVS_VPORTCTL)
+			printf("! ");
+
+		printf("%svportctl %u ", prefix, ntohs(data->vportctl));
+	}
+}
+
+static void ipvs_mt4_print(const void *ip, const struct xt_entry_match *match,
+			   int numeric)
+{
+	const struct xt_ipvs_mtinfo *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV4, numeric, "");
+}
+
+static void ipvs_mt6_print(const void *ip, const struct xt_entry_match *match,
+			   int numeric)
+{
+	const struct xt_ipvs_mtinfo *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV6, numeric, "");
+}
+
+static void ipvs_mt4_save(const void *ip, const struct xt_entry_match *match)
+{
+	const struct xt_ipvs_mtinfo *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV4, true, "--");
+}
+
+static void ipvs_mt6_save(const void *ip, const struct xt_entry_match *match)
+{
+	const struct xt_ipvs_mtinfo *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV6, true, "--");
+}
+
+static struct xtables_match ipvs_matches_reg[] = {
+	{
+		.version       = XTABLES_VERSION,
+		.name          = "ipvs",
+		.revision      = 0,
+		.family        = NFPROTO_IPV4,
+		.size          = XT_ALIGN(sizeof(struct xt_ipvs_mtinfo)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_ipvs_mtinfo)),
+		.help          = ipvs_mt_help,
+		.parse         = ipvs_mt4_parse,
+		.final_check   = ipvs_mt_check,
+		.print         = ipvs_mt4_print,
+		.save          = ipvs_mt4_save,
+		.extra_opts    = ipvs_mt_opts,
+	},
+	{
+		.version       = XTABLES_VERSION,
+		.name          = "ipvs",
+		.revision      = 0,
+		.family        = NFPROTO_IPV6,
+		.size          = XT_ALIGN(sizeof(struct xt_ipvs_mtinfo)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_ipvs_mtinfo)),
+		.help          = ipvs_mt_help,
+		.parse         = ipvs_mt6_parse,
+		.final_check   = ipvs_mt_check,
+		.print         = ipvs_mt6_print,
+		.save          = ipvs_mt6_save,
+		.extra_opts    = ipvs_mt_opts,
+	},
+};
+
+void _init(void)
+{
+	xtables_register_matches(ipvs_matches_reg,
+				 ARRAY_SIZE(ipvs_matches_reg));
+}
Index: iptables/extensions/libxt_ipvs.man
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ iptables/extensions/libxt_ipvs.man	2010-07-21 09:44:02.000000000 +0900
@@ -0,0 +1,24 @@
+Match IPVS connection properties.
+.TP
+[\fB!\fR] \fB\-\-ipvs\fP
+packet belongs to an IPVS connection
+.TP
+Any of the following options implies \-\-ipvs (even negated)
+.TP
+[\fB!\fR] \fB\-\-vproto\fP \fIprotocol\fP
+VIP protocol to match; by number or name, e.g. "tcp"
+.TP
+[\fB!\fR] \fB\-\-vaddr\fP \fIaddress\fP[\fB/\fP\fImask\fP]
+VIP address to match
+.TP
+[\fB!\fR] \fB\-\-vport\fP \fIport\fP
+VIP port to match; by number or name, e.g. "http"
+.TP
+\fB\-\-vdir\fP {\fBORIGINAL\fP|\fBREPLY\fP}
+flow direction of packet
+.TP
+[\fB!\fR] \fB\-\-vmethod\fP {\fBGATE\fP|\fBIPIP\fP|\fBMASQ\fP}
+IPVS forwarding method used
+.TP
+[\fB!\fR] \fB\-\-vportctl\fP \fIport\fP
+VIP port of the controlling connection to match, e.g. 21 for FTP
Index: iptables/include/linux/netfilter/xt_ipvs.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ iptables/include/linux/netfilter/xt_ipvs.h	2010-07-21 10:05:47.000000000 +0900
@@ -0,0 +1,25 @@
+#ifndef _XT_IPVS_H
+#define _XT_IPVS_H 1
+
+#define XT_IPVS_IPVS_PROPERTY	(1 << 0) /* all other options imply this one */
+#define XT_IPVS_PROTO		(1 << 1)
+#define XT_IPVS_VADDR		(1 << 2)
+#define XT_IPVS_VPORT		(1 << 3)
+#define XT_IPVS_DIR		(1 << 4)
+#define XT_IPVS_METHOD		(1 << 5)
+#define XT_IPVS_VPORTCTL	(1 << 6)
+#define XT_IPVS_MASK		((1 << 7) - 1)
+#define XT_IPVS_ONCE_MASK	(XT_IPVS_MASK & ~XT_IPVS_IPVS_PROPERTY)
+
+struct xt_ipvs_mtinfo {
+	union nf_inet_addr	vaddr, vmask;
+	__be16			vport;
+	__u8			l4proto;
+	__u8			fwd_method;
+	__be16			vportctl;
+
+	__u8			invert;
+	__u8			bitmask;
+};
+
+#endif /* _XT_IPVS_H */
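The flag bookkeeping in ipvs_mt_parse() above comes down to two bit tests on the masks from xt_ipvs.h. The sketch below copies those macro values; `option_seen_twice()` and `apply_option()` are hypothetical helpers that isolate (a) the `*flags & op & XT_IPVS_ONCE_MASK` test that rejects a repeated option, and (b) the rule that every option other than --ipvs also sets XT_IPVS_IPVS_PROPERTY:

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from include/linux/netfilter/xt_ipvs.h in the patch. */
#define XT_IPVS_IPVS_PROPERTY	(1 << 0)
#define XT_IPVS_PROTO		(1 << 1)
#define XT_IPVS_VADDR		(1 << 2)
#define XT_IPVS_VPORT		(1 << 3)
#define XT_IPVS_DIR		(1 << 4)
#define XT_IPVS_METHOD		(1 << 5)
#define XT_IPVS_VPORTCTL	(1 << 6)
#define XT_IPVS_MASK		((1 << 7) - 1)
#define XT_IPVS_ONCE_MASK	(XT_IPVS_MASK & ~XT_IPVS_IPVS_PROPERTY)

/* True if op was already given; --ipvs itself is exempt from the check. */
static bool option_seen_twice(unsigned int flags, uint8_t op)
{
	return (flags & op & XT_IPVS_ONCE_MASK) != 0;
}

/* Record op in the match bitmask; any option but --ipvs implies --ipvs. */
static uint8_t apply_option(uint8_t bitmask, uint8_t op)
{
	if (op & XT_IPVS_ONCE_MASK)
		bitmask |= XT_IPVS_IPVS_PROPERTY;
	return bitmask | op;
}
```

This is also why ipvs_mt_dump() prints "--ipvs" only when the bitmask is exactly XT_IPVS_IPVS_PROPERTY: with any other option present, the property bit is implied.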

^ permalink raw reply

* Re: [patch v2.6 4/4] libxt_ipvs: user-space lib for netfilter matcher xt_ipvs
From: Simon Horman @ 2010-07-21  1:21 UTC (permalink / raw)
  To: lvs-devel, netdev, linux-kernel, netfilter, netfilter-devel
  Cc: Malcolm Turnbull, Wensong Zhang, Julius Volz, Patrick McHardy,
	David S. Miller, Hannes Eder
In-Reply-To: <20100711090500.421568837@vergenet.net>

On Sun, Jul 11, 2010 at 06:03:46PM +0900, horms@vergenet.net wrote:
> From:	Hannes Eder <heder@google.com>
> 
> The user-space library for the netfilter matcher xt_ipvs.

[snip]

> Index: iptables/include/linux/netfilter/xt_ipvs.h
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ iptables/include/linux/netfilter/xt_ipvs.h	2010-07-04 20:23:30.000000000 +0900
> @@ -0,0 +1,25 @@
> +#ifndef _XT_IPVS_H
> +#define _XT_IPVS_H 1
> +
> +#define XT_IPVS_IPVS_PROPERTY	(1 << 0) /* all other options imply this one */
> +#define XT_IPVS_PROTO		(1 << 1)
> +#define XT_IPVS_VADDR		(1 << 2)
> +#define XT_IPVS_VPORT		(1 << 3)
> +#define XT_IPVS_DIR		(1 << 4)
> +#define XT_IPVS_METHOD		(1 << 5)
> +#define XT_IPVS_VPORTCTL	(1 << 6)
> +#define XT_IPVS_MASK		((1 << 7) - 1)
> +#define XT_IPVS_ONCE_MASK	(XT_IPVS_MASK & ~XT_IPVS_IPVS_PROPERTY)
> +
> +struct xt_ipvs_mtinfo {
> +	union nf_inet_addr	vaddr, vmask;
> +	__be16			vport;
> +	__u16			l4proto;
> +	__u16			fwd_method;

The kernel version of this file has been updated so that
l4proto and fwd_method are __u8. This also needs to be updated.
I will post an updated patch (v2.7).

> +	__be16			vportctl;
> +
> +	__u8			invert;
> +	__u8			bitmask;
> +};
> +
> +#endif /* _XT_IPVS_H */

^ permalink raw reply

* Re: [PATCH -next] net: NET_DSA depends on NET_ETHERNET
From: David Miller @ 2010-07-21  0:45 UTC (permalink / raw)
  To: randy.dunlap; +Cc: sfr, netdev, linux-next, linux-kernel, buytenh
In-Reply-To: <4C462B44.5010107@oracle.com>

From: Randy Dunlap <randy.dunlap@oracle.com>
Date: Tue, 20 Jul 2010 16:03:32 -0700

> From: Randy Dunlap <randy.dunlap@oracle.com>
> 
> NET_DSA code selects and uses PHYLIB code, but PHYLIB depends on
> NET_ETHERNET.  However, "select" does not follow kconfig dependencies,
> so explicitly list that requirement here instead.
> 
> Fixes this kconfig warning:
> 
> warning: (NET_DSA && NET && EXPERIMENTAL && !S390 ...) selects PHYLIB which has unmet direct dependencies (!S390 && NET_ETHERNET)
> 
> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>

Randy, this has been fixed in net-2.6 for some time now.

And I'm pretty sure I sent a copy of this to you when I
checked it in :-)

--------------------
From 336a283b9cbe47748ccd68fd8c5158f67cee644b Mon Sep 17 00:00:00 2001
From: David S. Miller <davem@davemloft.net>
Date: Mon, 12 Jul 2010 20:03:42 -0700
Subject: [PATCH 09/24] dsa: Fix Kconfig dependencies.

Based upon a report by Randy Dunlap.

DSA needs PHYLIB, but PHYLIB needs NET_ETHERNET.  So, in order
to select PHYLIB we have to make DSA depend upon NET_ETHERNET.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/dsa/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index c51b554..1120178 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -1,7 +1,7 @@
 menuconfig NET_DSA
 	bool "Distributed Switch Architecture support"
 	default n
-	depends on EXPERIMENTAL && !S390
+	depends on EXPERIMENTAL && NET_ETHERNET && !S390
 	select PHYLIB
 	---help---
 	  This allows you to use hardware switch chips that use
-- 
1.7.1.1

^ permalink raw reply related
