Netdev List


* Re: [PATCH 0/3] virtio-net: inline header support
From: Rusty Russell @ 2012-10-04  3:34 UTC (permalink / raw)
  To: Anthony Liguori, Michael S. Tsirkin, Thomas Lendacky
  Cc: Sasha Levin, virtualization, linux-kernel, avi, kvm, netdev
In-Reply-To: <87mx033u74.fsf@codemonkey.ws>

Anthony Liguori <anthony@codemonkey.ws> writes:
> Rusty Russell <rusty@rustcorp.com.au> writes:
>
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>>
>>> Thinking about Sasha's patches, we can reduce ring usage
>>> for virtio net small packets dramatically if we put
>>> virtio net header inline with the data.
>>> This can be done for free in case guest net stack allocated
>>> extra head room for the packet, and I don't see
>>> why would this have any downsides.
>>
>> I've been wanting to do this for the longest time... but...
>>
>>> Even though with my recent patches qemu
>>> no longer requires header to be the first s/g element,
>>> we need a new feature bit to detect this.
>>> A trivial qemu patch will be sent separately.
>>
>> There's a reason I haven't done this.  I really, really dislike "my
>> implementation isn't broken" feature bits.  We could have an infinite
>> number of them, for each bug in each device.
>
> This is a bug in the specification.
>
> The QEMU implementation pre-dates the specification.  All of the actual
> implementations of virtio relied on the semantics of s/g elements and
> still do.

lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
ever going to be merged, so I'm not sure what its status is?  But I'm
determined to fix qemu, and hence my torture patch to make sure this
doesn't creep in again.

> What's in the specification really doesn't matter when it doesn't agree
> with all of the existing implementations.
>
> Users use implementations, not specifications.  The specification really
> ought to be changed here.

I'm sorely tempted, except that we're losing a real optimization because
of this :(

The specification has long contained the footnote:

        The current qemu device implementations mistakenly insist that
        the first descriptor cover the header in these cases exactly, so
        a cautious driver should arrange it so.

I'd like to tie this caveat to the PCI capability change, so this note
will move to the appendix with the old PCI layout.

Cheers,
Rusty.
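
For readers following the thread, the optimization under discussion can be sketched in plain C. This is an illustration only, not the virtio-net driver: `struct pkt`, `struct sg_elem`, `build_tx_sg` and `HDR_LEN` are hypothetical names, and the 10-byte length assumes the basic virtio-net header without the mergeable-buffers field. When the packet has enough headroom, the header is written directly in front of the payload and one s/g element covers both; otherwise the header needs an element of its own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not the virtio-net driver API. */
#define HDR_LEN 10  /* assumed size of the basic virtio-net header */

struct pkt {
	unsigned char *head;  /* start of the allocated buffer */
	unsigned char *data;  /* start of the packet payload */
	size_t len;           /* payload length */
};

struct sg_elem {
	const unsigned char *addr;
	size_t len;
};

/*
 * Fill sg[] for transmit and return the number of elements used.
 * With enough headroom the header is copied just in front of the
 * payload and a single element covers both; otherwise the header
 * gets an element of its own.
 */
int build_tx_sg(struct pkt *p, const unsigned char *hdr, struct sg_elem sg[2])
{
	size_t headroom = (size_t)(p->data - p->head);

	if (headroom >= HDR_LEN) {
		unsigned char *start = p->data - HDR_LEN;

		memcpy(start, hdr, HDR_LEN);
		sg[0].addr = start;
		sg[0].len = HDR_LEN + p->len;
		return 1;
	}

	sg[0].addr = hdr;
	sg[0].len = HDR_LEN;
	sg[1].addr = p->data;
	sg[1].len = p->len;
	return 2;
}
```

The single-element path is only safe against a device that does not insist on the header occupying the first descriptor exactly, which is the point being argued in this thread.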


* Re: [PATCH 3/3] vxlan: virtual extensible lan
From: Stephen Hemminger @ 2012-10-04  4:02 UTC (permalink / raw)
  To: Jesse Gross; +Cc: davem, netdev
In-Reply-To: <CAEP_g=9Y3MmsABQJyi508FEgs4635tnaVqk4Ncwbpm41bLskhg@mail.gmail.com>

On Wed, 3 Oct 2012 18:54:46 -0700
Jesse Gross <jesse@nicira.com> wrote:

> On Mon, Oct 1, 2012 at 3:32 PM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> > --- /dev/null   1970-01-01 00:00:00.000000000 +0000
> > +++ b/drivers/net/vxlan.c       2012-10-01 15:08:38.024499080 -0700
> > +/* Transmit local packets over Vxlan
> > + *
> > + * Outer IP header inherits ECN and DF from inner header.
> > + * Outer UDP destination is the VXLAN assigned port.
> > + *           source port is based on hash of flow if available
> > + *                       otherwise use a random value
> > + */
> > +static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
> [...]
> > +       hash = skb_get_rxhash(skb);
> [...]
> > +       uh->dest = htons(vxlan_port);
> > +       uh->source = hash ? :random32();
> 
> I think this may lead to packet reordering in some cases.  If the
> protocol is unknown then it may still represent part of a flow and
> random source ports could lead to taking different paths.
> 
> Also, does it make sense to restrict the range of ports to, say, the
> upper half of the range?

Since port is always a pair, there is no need to restrict range, unless
there was a broken firewall in the way. One bug there is that random32()
can return 0 which is not a valid port number.  A better fallback
would be a hash of the MAC header.
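
A rough userspace sketch of that idea, under stated assumptions: `pick_src_port` and `fnv1a` are hypothetical helpers, FNV-1a merely stands in for whatever hash the kernel code would use, and a flow hash of 0 is taken to mean "not available". The fallback is deterministic per MAC pair, so frames of one flow keep one source port, and the result is never 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: FNV-1a over the first n bytes, standing in for
 * whatever hash the kernel code would actually use. */
uint32_t fnv1a(const unsigned char *p, size_t n)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < n; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Pick a UDP source port for an encapsulated frame.
 * flow_hash == 0 is treated as "no flow hash available".
 */
uint16_t pick_src_port(uint32_t flow_hash, const unsigned char *eth_hdr)
{
	/* Fall back to hashing the destination and source MAC (12 bytes). */
	uint32_t h = flow_hash ? flow_hash : fnv1a(eth_hdr, 12);
	uint16_t port = (uint16_t)(h ^ (h >> 16)); /* fold to 16 bits */

	return port ? port : 1; /* 0 is not a valid port number */
}
```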


* RE: [PATCHv4 1/4] modem_shm: Add Modem Access Framework
From: Arun MURTHY @ 2012-10-04  4:08 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-doc@vger.kernel.org, alan@lxorguk.ukuu.org.uk
In-Reply-To: <20121003151748.GA5745@kroah.com>

> On Wed, Oct 03, 2012 at 05:54:08AM +0200, Arun MURTHY wrote:
> > > On Mon, Oct 01, 2012 at 07:30:38AM +0200, Arun MURTHY wrote:
> > > > > On Fri, Sep 28, 2012 at 01:35:01PM +0530, Arun Murthy wrote:
> > > > > > +#include <linux/module.h>
> > > > > > +#include <linux/slab.h>
> > > > > > +#include <linux/err.h>
> > > > > > +#include <linux/printk.h>
> > > > > > +#include <linux/modem_shm/modem.h>
> > > > > > +
> > > > > > +static struct class *modem_class;
> > > > >
> > > > > What's wrong with a bus_type instead?
> > > >
> > > > Can I know the advantage of using bus_type over class?
> > >
> > > You have devices living on a bus, and it's much more descriptive
> > > than a class (which we are going to eventually get rid of one of these
> > > days...).
> > >
> > > Might I ask why you choose a class over a bus_type?
> >
> > Basically my requirement is to create a central entity for accessing
> > and releasing modem from APE.
> 
> What is an "APE"?
> 
> And what do you mean by "accessing" and "releasing"?

APE - Application Processor Engine
There are two processors but on a single chip, one being the APE and other
is the modem. So 'accessing' means requesting access or waking-up the
co-processor and releasing means allowing the co-processor to sleep.


> 
> > Since this is done by different clients the central entity should be
> > able to handle the request and play safely, since this has more effect
> > in system suspend and deep sleep. Using class helps me in achieving
> > this and also create an entry to user space which can be used in the
> > later parts. Moreover this is not something like a bus or so, so I didn't
> > use bus instead went with a simple class approach.
> 
> But as you have devices that are "binding" to this "controller", a bus might
> make more sense, right?

As explained above regarding the platform, the concept of a bus doesn't
come into the picture at all. Here it's just waking up the modem and allowing
it to go to sleep.

> 
> I don't see how a class helps out for you here more than anything else, what
> are you expecting from the class interface?  You aren't using the reference
> counting logic it provides, so why use it at all?

I am using the reference counting logic in class, such as class_for_each_device.

> 
> Actually, why use the driver core at all in the first place if you aren't needing
> the devices to show up in sysfs (as you don't have a device, you are just a
> mediator)?

Yes I am something like a mediator, but since this is associated with many
clients, there should be some central entity to take inputs from all the clients
and act accordingly. This MAF does that. Sysfs will also be created for this
MAF in the coming versions.

> 
> > > > > > +int modem_release(struct modem_desc *mdesc) {
> > > > > > +	if (!mdesc->release)
> > > > > > +		return -EFAULT;
> > > > > > +
> > > > > > +	if (modem_is_requested(mdesc)) {
> > > > > > +		atomic_dec(&mdesc->mclients->cnt);
> > > > > > +		if (atomic_read(&mdesc->use_cnt) == 1) {
> > > > > > +			mdesc->release(mdesc);
> > > > > > +			atomic_dec(&mdesc->use_cnt);
> > > > > > +		}
> > > > >
> > > > > Eeek, why aren't you using the built-in reference counting that
> > > > > the struct device provided to you, and instead are rolling your own?
> > > > > This happens in many places, why?
> > > >
> > > > My usage of counters over here is for each modem there are many
> > > > clients.
> > > > Each of the clients will have a ref to modem_desc. Each of them
> > > > use this for requesting and releasing the modem. One counter for
> > > > tracking the request and release for each client which is done by
> > > > variable 'cnt' in struct clients.
> > > > The counter use_cnt is used for tracking the modem request/release
> > > > irrespective of the clients and counter cli_cnt is used for
> > > > restricting the modem_get to the no of clients defined in no_clients.
> > > >
> > > > So totally 3 counter one for restricting the usage of modem_get by
> > > > clients, second for restricting modem request/release at top
> > > > level, and 3rd for restricting modem release/request for per
> > > > client per modem basis.
> > > >
> > > > Can you let me know if the same can be achieved by using built-in
> > > > ref counting?
> > >
> > > Yes, because you don't need all of those different levels, just
> > > stick with one and you should be fine. :)
> > >
> >
> > No, checks at all these levels are required, I have briefed out the need also.
> 
> I still don't understand, sorry.

The pictorial view by Anish should help in understanding.
        Modem             Client1   Client2   Client3   Client4
State   turn-on           request
State   no-state-change             request
State   no-state-change                       request
State   no-state-change                                 request
State   no-state-change   release
State   no-state-change                       release
State   no-state-change             release
State   turn-off                                        release

This is just a simple, straightforward example.
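
The behaviour in the table can be modelled with a short C sketch. This is a hypothetical userspace model, not the MAF code from the patch: `struct modem`, `maf_request`, `maf_release` and `MAX_CLIENTS` are made-up names. It assumes one use counter for the modem plus one flag per client, so that the first request turns the modem on, the last release turns it off, and a client that requests or releases twice is rejected.

```c
#include <stdbool.h>

/* Hypothetical model of the table above; not the MAF code itself. */
#define MAX_CLIENTS 4

struct modem {
	bool requested[MAX_CLIENTS]; /* per-client state: one request each */
	int use_cnt;                 /* number of clients holding the modem */
	bool on;                     /* current modem state */
};

/* Returns 0 on success, -1 if this client already holds the modem. */
int maf_request(struct modem *m, int client)
{
	if (client < 0 || client >= MAX_CLIENTS || m->requested[client])
		return -1;
	m->requested[client] = true;
	if (m->use_cnt++ == 0)
		m->on = true;        /* first request: turn-on */
	return 0;
}

/* Returns 0 on success, -1 if this client does not hold the modem. */
int maf_release(struct modem *m, int client)
{
	if (client < 0 || client >= MAX_CLIENTS || !m->requested[client])
		return -1;
	m->requested[client] = false;
	if (--m->use_cnt == 0)
		m->on = false;       /* last release: turn-off */
	return 0;
}
```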

> 
> > This will have effect on system power management, i.e suspend and deep
> > sleep.
> 
> How does power management matter?  If you tie into the driver model
> properly, power management comes "for free" so you don't have to do
> anything special about it.  Why not use that logic instead of trying to roll your
> own?

As said, there are two processors on a single chip here, one being the
APE (Application Processor Engine) and the other the modem. Although they are
on a single chip, the modem must be released before the APE can enter deep sleep.

> 
> > We restrict that the drivers should request modem only once and
> > release only once, but we cannot rely on the clients hence a check for
> > the same has to be done in the MAF.
> 
> You can't rely on the clients to do what?  And why can't you rely on them?
> What is going to happen?  Who is a "client" here?  Other kernel code?

Yes, let me take my driver itself as an example. Here the clients are the
shared memory driver, sim driver, security etc. Shared memory driver
is the communicating media between the APE and Modem and hence
needs to wake-up the modem and after completion should allow modem
to enter sleep.
Similarly it's the same for sim driver also.
We define that the clients such as shared memory driver and the sim
driver should request the modem only once and then release it only once,
and since this is a MAF, shouldn't it take care of checking the same?

> 
> I really don't understand your model at all as to what you are trying to
> mediate and manage here, sorry.  I suggest writing it all up as your first patch
> (documentation is good), so that we can properly review your
> implementation and not argue about how to implement something that I
> honestly don't understand.

Sorry for that. Actually my 4th patch in this patchset includes the documentation.
Since it's the kernel doc I have made it as the last patch in this patchset, else
kernel doc compilation will fail.
Please feel free to refer to the 4th patch for the documentation part and if still
not clear I can provide more explanation on this.

> 
> > Also the no of clients should be defined and hence a check for the
> > same is done in MAF.
> 
> Defined where?  What is "MAF"?

This driver is MAF(Modem Access Framework).

> 
> > Apart from all these the requests coming from all the clients is to be
> > accumulated and based on that modem release or access should be
> > performed, hence so.
> 
> That sentence makes no sense to me, it must be too early for me here...

Thanks and Regards,
Arun R Murthy
-----------------


* Re: [PATCH 0/3] virtio-net: inline header support
From: Anthony Liguori @ 2012-10-04  4:29 UTC (permalink / raw)
  To: Rusty Russell, Michael S. Tsirkin, Thomas Lendacky
  Cc: Sasha Levin, virtualization, linux-kernel, avi, kvm, netdev
In-Reply-To: <87391u3o67.fsf@rustcorp.com.au>

Rusty Russell <rusty@rustcorp.com.au> writes:

> Anthony Liguori <anthony@codemonkey.ws> writes:
>> Rusty Russell <rusty@rustcorp.com.au> writes:
>>
>>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>>>
>>>> Thinking about Sasha's patches, we can reduce ring usage
>>>> for virtio net small packets dramatically if we put
>>>> virtio net header inline with the data.
>>>> This can be done for free in case guest net stack allocated
>>>> extra head room for the packet, and I don't see
>>>> why would this have any downsides.
>>>
>>> I've been wanting to do this for the longest time... but...
>>>
>>>> Even though with my recent patches qemu
>>>> no longer requires header to be the first s/g element,
>>>> we need a new feature bit to detect this.
>>>> A trivial qemu patch will be sent separately.
>>>
>>> There's a reason I haven't done this.  I really, really dislike "my
>>> implementation isn't broken" feature bits.  We could have an infinite
>>> number of them, for each bug in each device.
>>
>> This is a bug in the specification.
>>
>> The QEMU implementation pre-dates the specification.  All of the actual
>> implementations of virtio relied on the semantics of s/g elements and
>> still do.
>
> lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
> ever going to be merged, so I'm not sure what its status is?  But I'm
> determined to fix qemu, and hence my torture patch to make sure this
> doesn't creep in again.

There are even more implementations out there and I'd wager they all
rely on framing.

>> What's in the specification really doesn't matter when it doesn't agree
>> with all of the existing implementations.
>>
>> Users use implementations, not specifications.  The specification really
>> ought to be changed here.
>
> I'm sorely tempted, except that we're losing a real optimization because
> of this :(

What optimizations?  What Michael is proposing is still achievable with
a device feature.  Are there other optimizations that can be achieved by
changing framing that we can't achieve with feature bits?

As I mentioned in another note, bad framing decisions can cause
performance issues too...

> The specification has long contained the footnote:
>
>         The current qemu device implementations mistakenly insist that
>         the first descriptor cover the header in these cases exactly, so
>         a cautious driver should arrange it so.

I seem to recall this being a compromise between you and I..  I think
I objected strongly to this back when you first wrote the spec and you
added this to appease me ;-)

Regards,

Anthony Liguori

>
> I'd like to tie this caveat to the PCI capability change, so this note
> will move to the appendix with the old PCI layout.
>
> Cheers,
> Rusty.


* [RFC] vxlan: use ether header as fallback hash
From: Stephen Hemminger @ 2012-10-04  4:39 UTC (permalink / raw)
  To: Jesse Gross, davem, netdev
In-Reply-To: <CAEP_g=9Y3MmsABQJyi508FEgs4635tnaVqk4Ncwbpm41bLskhg@mail.gmail.com>

VXLAN bases the source UDP port on the flow to help the
receiver load balance based on outer header
contents.

This patch changes the algorithm to better handle packets
that cannot be categorized by the rxhash() function.
It adds a fallback that uses jhash on the Ethernet header.

It also fixes a bug where the old code could assign 0 as a port
value.


Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
RFC for now, compile tested only

--- a/drivers/net/vxlan.c	2012-10-03 21:25:43.747968165 -0700
+++ b/drivers/net/vxlan.c	2012-10-03 21:36:10.213805422 -0700
@@ -622,12 +622,30 @@ static inline u8 vxlan_ecn_encap(u8 tos,
 	return INET_ECN_encapsulate(tos, inner);
 }
 
+/* Compute hash to use for source port
+ *   first choice to use L4 flow hash since it will spread
+ *     better and maybe available from hardware
+ *   secondary choice is to use jhash on the Ethernet header
+ * Always returns non-zero value
+ */
+static u16 vxlan_flow_hash(struct sk_buff *skb)
+{
+	u16 hash = skb_get_rxhash(skb);
+
+	if (!hash)
+		hash = jhash(skb->data, 3, skb->protocol);
+
+	if (!hash)
+		hash = 1;
+
+	return hash;
+}
+
 /* Transmit local packets over Vxlan
  *
  * Outer IP header inherits ECN and DF from inner header.
  * Outer UDP destination is the VXLAN assigned port.
- *           source port is based on hash of flow if available
- *                       otherwise use a random value
+ *           source port is based on hash of flow
  */
 static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 {
@@ -641,8 +659,8 @@ static netdev_tx_t vxlan_xmit(struct sk_
 	struct flowi4 fl4;
 	struct vxlan_fdb *f;
 	unsigned int pkt_len = skb->len;
-	u32 hash;
 	__be32 dst;
+	__be16 src_port;
 	__be16 df = 0;
 	__u8 tos, ttl;
 	int err;
@@ -670,7 +688,7 @@ static netdev_tx_t vxlan_xmit(struct sk_
 	if (tos == 1)
 		tos = vxlan_get_dsfield(old_iph, skb);
 
-	hash = skb_get_rxhash(skb);
+	src_port = (__force __be16) vxlan_flow_hash(skb);
 
 	rt = ip_route_output_gre(dev_net(dev), &fl4, dst,
 				 vxlan->saddr, vxlan->vni,
@@ -703,7 +721,7 @@ static netdev_tx_t vxlan_xmit(struct sk_
 	uh = udp_hdr(skb);
 
 	uh->dest = htons(vxlan_port);
-	uh->source = hash ? :random32();
+	uh->source = src_port;
 
 	uh->len = htons(skb->len);
 	uh->check = 0;


* [patch v3 01/11] netlink: add reference of module in netlink_dump_start
From: Gao feng @ 2012-10-04  4:41 UTC (permalink / raw)
  To: davem, eric.dumazet, steffen.klassert
  Cc: netfilter-devel, linux-rdma, netdev, linux-crypto, pablo,
	stephen.hemminger, jengelh, Gao feng

I get a panic when I run ss -a and rmmod inet_diag at the
same time.

This is because netlink_dump uses inet_diag_dump, a function
that belongs to the inet_diag module.

I searched the code and found that many modules have the same problem.
We need to take a reference on the module that cb->dump belongs
to.

Thanks for all the help from Stephen, Jan, Eric, Steffen and Pablo.

Changes from v2:

Delete netlink_dump_done, and call module_put in netlink_dump
and netlink_sock_destruct.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
 include/linux/netlink.h  |    5 ++++-
 net/netlink/af_netlink.c |   21 +++++++++++++++++----
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index f80c56a..bf4e501 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -245,6 +245,8 @@ struct netlink_callback {
 					struct netlink_callback *cb);
 	int			(*done)(struct netlink_callback *cb);
 	void			*data;
+	/* the module that dump function belong to */
+	struct module		*module;
 	u16			family;
 	u16			min_dump_alloc;
 	unsigned int		prev_seq, seq;
@@ -262,8 +264,9 @@ __nlmsg_put(struct sk_buff *skb, u32 portid, u32 seq, int type, int len, int fla
 
 struct netlink_dump_control {
 	int (*dump)(struct sk_buff *skb, struct netlink_callback *);
-	int (*done)(struct netlink_callback*);
+	int (*done)(struct netlink_callback *);
 	void *data;
+	struct module *module;
 	u16 min_dump_alloc;
 };
 
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 0f2e3ad..0905dfb 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -169,6 +169,7 @@ static void netlink_sock_destruct(struct sock *sk)
 	if (nlk->cb) {
 		if (nlk->cb->done)
 			nlk->cb->done(nlk->cb);
+		module_put(nlk->cb->module);
 		netlink_destroy_callback(nlk->cb);
 	}
 
@@ -1755,6 +1756,8 @@ static int netlink_dump(struct sock *sk)
 
 	if (cb->done)
 		cb->done(cb);
+
+	module_put(cb->module);
 	nlk->cb = NULL;
 	mutex_unlock(nlk->cb_mutex);
 
@@ -1784,6 +1787,7 @@ int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	cb->done = control->done;
 	cb->nlh = nlh;
 	cb->data = control->data;
+	cb->module = control->module;
 	cb->min_dump_alloc = control->min_dump_alloc;
 	atomic_inc(&skb->users);
 	cb->skb = skb;
@@ -1794,19 +1798,28 @@ int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 		return -ECONNREFUSED;
 	}
 	nlk = nlk_sk(sk);
-	/* A dump is in progress... */
+
 	mutex_lock(nlk->cb_mutex);
+	/* A dump is in progress... */
 	if (nlk->cb) {
 		mutex_unlock(nlk->cb_mutex);
 		netlink_destroy_callback(cb);
-		sock_put(sk);
-		return -EBUSY;
+		ret = -EBUSY;
+		goto out;
 	}
+	/* add reference of module witch cb->dump belong to */
+	if (!try_module_get(cb->module)) {
+		mutex_unlock(nlk->cb_mutex);
+		netlink_destroy_callback(cb);
+		ret = -EPROTONOSUPPORT;
+		goto out;
+	}
+
 	nlk->cb = cb;
 	mutex_unlock(nlk->cb_mutex);
 
 	ret = netlink_dump(sk);
-
+out:
 	sock_put(sk);
 
 	if (ret)
-- 
1.7.7.6
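
The reference-holding pattern this patch describes can be sketched as a small userspace model. Everything here is hypothetical (`struct module_model`, `model_get`, `model_put`, `model_unload`, `dump_start`, `dump_finish`); it only mirrors the intent of try_module_get() and module_put(), namely that a dump pins the module owning its callback and that removal is refused while the pin is held. A NULL owner stands for built-in code, which needs no pin.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel objects involved. */
struct module_model {
	int refcnt;
	bool unloading;   /* set once removal has begun */
};

struct dump_cb {
	struct module_model *owner; /* module providing the dump callback */
	int (*dump)(void);
};

/* Mirrors the intent of try_module_get(): fail once unloading started. */
bool model_get(struct module_model *m)
{
	if (m == NULL)
		return true;  /* built-in code: nothing to pin */
	if (m->unloading)
		return false;
	m->refcnt++;
	return true;
}

/* Mirrors the intent of module_put(): drop one reference. */
void model_put(struct module_model *m)
{
	if (m != NULL)
		m->refcnt--;
}

/* Removal is only allowed when nobody holds a reference. */
bool model_unload(struct module_model *m)
{
	if (m->refcnt > 0)
		return false;
	m->unloading = true;
	return true;
}

/* Pin the owner before starting a dump; -1 if it is going away. */
int dump_start(struct dump_cb *cb)
{
	if (!model_get(cb->owner))
		return -1;
	return 0;
}

/* Drop the pin when the dump finishes or the socket is destroyed. */
void dump_finish(struct dump_cb *cb)
{
	model_put(cb->owner);
}
```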



* [patch v3 02/11] inet_diag: pass inet_diag module to netlink_dump_start
From: Gao feng @ 2012-10-04  4:41 UTC (permalink / raw)
  To: davem, eric.dumazet, steffen.klassert
  Cc: netfilter-devel, linux-rdma, netdev, linux-crypto, pablo,
	stephen.hemminger, jengelh, Gao feng
In-Reply-To: <1349325689-16104-1-git-send-email-gaofeng@cn.fujitsu.com>

set netlink_dump_control.module to avoid panic.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
 net/ipv4/inet_diag.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 535584c..5ffd7bc 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -981,6 +981,7 @@ static int inet_diag_rcv_msg_compat(struct sk_buff *skb, struct nlmsghdr *nlh)
 		{
 			struct netlink_dump_control c = {
 				.dump = inet_diag_dump_compat,
+				.module = THIS_MODULE,
 			};
 			return netlink_dump_start(net->diag_nlsk, skb, nlh, &c);
 		}
@@ -1010,6 +1011,7 @@ static int inet_diag_handler_dump(struct sk_buff *skb, struct nlmsghdr *h)
 		{
 			struct netlink_dump_control c = {
 				.dump = inet_diag_dump,
+				.module = THIS_MODULE,
 			};
 			return netlink_dump_start(net->diag_nlsk, skb, h, &c);
 		}
-- 
1.7.7.6


* [patch v3 03/11] unix_diag: pass unix_diag module to netlink_dump_start
From: Gao feng @ 2012-10-04  4:41 UTC (permalink / raw)
  To: davem, eric.dumazet, steffen.klassert
  Cc: netfilter-devel, linux-rdma, netdev, linux-crypto, pablo,
	stephen.hemminger, jengelh, Gao feng
In-Reply-To: <1349325689-16104-1-git-send-email-gaofeng@cn.fujitsu.com>

set netlink_dump_control.module to avoid panic.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
 net/unix/diag.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/unix/diag.c b/net/unix/diag.c
index 06748f1..e1cd31f 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -299,6 +299,7 @@ static int unix_diag_handler_dump(struct sk_buff *skb, struct nlmsghdr *h)
 	if (h->nlmsg_flags & NLM_F_DUMP) {
 		struct netlink_dump_control c = {
 			.dump = unix_diag_dump,
+			.module = THIS_MODULE,
 		};
 		return netlink_dump_start(net->diag_nlsk, skb, h, &c);
 	} else
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [patch v3 04/11] nf_conntrack_netlink: pass nf_conntrack_netlink module to netlink_dump_start
From: Gao feng @ 2012-10-04  4:41 UTC (permalink / raw)
  To: davem, eric.dumazet, steffen.klassert
  Cc: netfilter-devel, linux-rdma, netdev, linux-crypto, pablo,
	stephen.hemminger, jengelh, Gao feng
In-Reply-To: <1349325689-16104-1-git-send-email-gaofeng@cn.fujitsu.com>

set netlink_dump_control.module to avoid panic.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
 net/netfilter/nf_conntrack_netlink.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 7bbfb3d..20af287 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1022,6 +1022,7 @@ ctnetlink_get_conntrack(struct sock *ctnl, struct sk_buff *skb,
 		struct netlink_dump_control c = {
 			.dump = ctnetlink_dump_table,
 			.done = ctnetlink_done,
+			.module = THIS_MODULE,
 		};
 #ifdef CONFIG_NF_CONNTRACK_MARK
 		if (cda[CTA_MARK] && cda[CTA_MARK_MASK]) {
@@ -1723,6 +1724,7 @@ ctnetlink_stat_ct_cpu(struct sock *ctnl, struct sk_buff *skb,
 	if (nlh->nlmsg_flags & NLM_F_DUMP) {
 		struct netlink_dump_control c = {
 			.dump = ctnetlink_ct_stat_cpu_dump,
+			.module = THIS_MODULE,
 		};
 		return netlink_dump_start(ctnl, skb, nlh, &c);
 	}
@@ -2242,6 +2244,7 @@ ctnetlink_get_expect(struct sock *ctnl, struct sk_buff *skb,
 		struct netlink_dump_control c = {
 			.dump = ctnetlink_exp_dump_table,
 			.done = ctnetlink_exp_done,
+			.module = THIS_MODULE,
 		};
 		return netlink_dump_start(ctnl, skb, nlh, &c);
 	}
@@ -2680,6 +2683,7 @@ ctnetlink_stat_exp_cpu(struct sock *ctnl, struct sk_buff *skb,
 	if (nlh->nlmsg_flags & NLM_F_DUMP) {
 		struct netlink_dump_control c = {
 			.dump = ctnetlink_exp_stat_cpu_dump,
+			.module = THIS_MODULE,
 		};
 		return netlink_dump_start(ctnl, skb, nlh, &c);
 	}
-- 
1.7.7.6



* [patch v3 05/11] nfnetlink_acct: pass nfnetlink_acct module to netlink_dump_start
From: Gao feng @ 2012-10-04  4:41 UTC (permalink / raw)
  To: davem, eric.dumazet, steffen.klassert
  Cc: netfilter-devel, linux-rdma, netdev, linux-crypto, pablo,
	stephen.hemminger, jengelh, Gao feng
In-Reply-To: <1349325689-16104-1-git-send-email-gaofeng@cn.fujitsu.com>

set netlink_dump_control.module to avoid panic.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
 net/netfilter/nfnetlink_acct.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nfnetlink_acct.c b/net/netfilter/nfnetlink_acct.c
index 589d686..8acf8b8 100644
--- a/net/netfilter/nfnetlink_acct.c
+++ b/net/netfilter/nfnetlink_acct.c
@@ -175,6 +175,7 @@ nfnl_acct_get(struct sock *nfnl, struct sk_buff *skb,
 	if (nlh->nlmsg_flags & NLM_F_DUMP) {
 		struct netlink_dump_control c = {
 			.dump = nfnl_acct_dump,
+			.module = THIS_MODULE,
 		};
 		return netlink_dump_start(nfnl, skb, nlh, &c);
 	}
-- 
1.7.7.6



* [patch v3 06/11] nfnetlink_cthelper: pass nfnetlink_cthelper module to netlink_dump_start
From: Gao feng @ 2012-10-04  4:41 UTC (permalink / raw)
  To: davem, eric.dumazet, steffen.klassert
  Cc: netfilter-devel, linux-rdma, netdev, linux-crypto, pablo,
	stephen.hemminger, jengelh, Gao feng
In-Reply-To: <1349325689-16104-1-git-send-email-gaofeng@cn.fujitsu.com>

set netlink_dump_control.module to avoid panic.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
 net/netfilter/nfnetlink_cthelper.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nfnetlink_cthelper.c b/net/netfilter/nfnetlink_cthelper.c
index 945950a..2bcd2d8 100644
--- a/net/netfilter/nfnetlink_cthelper.c
+++ b/net/netfilter/nfnetlink_cthelper.c
@@ -504,6 +504,7 @@ nfnl_cthelper_get(struct sock *nfnl, struct sk_buff *skb,
 	if (nlh->nlmsg_flags & NLM_F_DUMP) {
 		struct netlink_dump_control c = {
 			.dump = nfnl_cthelper_dump_table,
+			.module = THIS_MODULE,
 		};
 		return netlink_dump_start(nfnl, skb, nlh, &c);
 	}
-- 
1.7.7.6


* [patch v3 07/11] nfnetlink_cttimeout: pass nfnetlink_cttimeout module to netlink_dump_start
From: Gao feng @ 2012-10-04  4:41 UTC (permalink / raw)
  To: davem, eric.dumazet, steffen.klassert
  Cc: netfilter-devel, linux-rdma, netdev, linux-crypto, pablo,
	stephen.hemminger, jengelh, Gao feng
In-Reply-To: <1349325689-16104-1-git-send-email-gaofeng@cn.fujitsu.com>

set netlink_dump_control.module to avoid panic.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
 net/netfilter/nfnetlink_cttimeout.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index 8847b4d..e0ab57c 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -248,6 +248,7 @@ cttimeout_get_timeout(struct sock *ctnl, struct sk_buff *skb,
 	if (nlh->nlmsg_flags & NLM_F_DUMP) {
 		struct netlink_dump_control c = {
 			.dump = ctnl_timeout_dump,
+			.module = THIS_MODULE,
 		};
 		return netlink_dump_start(ctnl, skb, nlh, &c);
 	}
-- 
1.7.7.6


* [patch v3 08/11] crypto: pass crypto_user module to netlink_dump_start
From: Gao feng @ 2012-10-04  4:41 UTC (permalink / raw)
  To: davem, eric.dumazet, steffen.klassert
  Cc: netfilter-devel, linux-rdma, netdev, linux-crypto, pablo,
	stephen.hemminger, jengelh, Gao feng, Herbert Xu
In-Reply-To: <1349325689-16104-1-git-send-email-gaofeng@cn.fujitsu.com>

set netlink_dump_control.module to avoid panic.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
 crypto/crypto_user.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index 6bba414..f08297b 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -430,13 +430,15 @@ static struct crypto_link {
 	int (*doit)(struct sk_buff *, struct nlmsghdr *, struct nlattr **);
 	int (*dump)(struct sk_buff *, struct netlink_callback *);
 	int (*done)(struct netlink_callback *);
+	struct module *module;
 } crypto_dispatch[CRYPTO_NR_MSGTYPES] = {
 	[CRYPTO_MSG_NEWALG	- CRYPTO_MSG_BASE] = { .doit = crypto_add_alg},
 	[CRYPTO_MSG_DELALG	- CRYPTO_MSG_BASE] = { .doit = crypto_del_alg},
 	[CRYPTO_MSG_UPDATEALG	- CRYPTO_MSG_BASE] = { .doit = crypto_update_alg},
 	[CRYPTO_MSG_GETALG	- CRYPTO_MSG_BASE] = { .doit = crypto_report,
 						       .dump = crypto_dump_report,
-						       .done = crypto_dump_report_done},
+						       .done = crypto_dump_report_done,
+						       .module = THIS_MODULE},
 };
 
 static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
@@ -471,6 +473,7 @@ static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 				.dump = link->dump,
 				.done = link->done,
 				.min_dump_alloc = dump_alloc,
+				.module = link->module,
 			};
 			return netlink_dump_start(crypto_nlsk, skb, nlh, &c);
 		}
-- 
1.7.7.6


* [patch v3 09/11] xfrm: pass xfrm_user module to netlink_dump_start
From: Gao feng @ 2012-10-04  4:41 UTC (permalink / raw)
  To: davem, eric.dumazet, steffen.klassert
  Cc: netfilter-devel, linux-rdma, netdev, linux-crypto, pablo,
	stephen.hemminger, jengelh, Gao feng
In-Reply-To: <1349325689-16104-1-git-send-email-gaofeng@cn.fujitsu.com>

set netlink_dump_control.module to avoid panic.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
 net/xfrm/xfrm_user.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 421f984..3d909f7 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2308,17 +2308,20 @@ static struct xfrm_link {
 	int (*doit)(struct sk_buff *, struct nlmsghdr *, struct nlattr **);
 	int (*dump)(struct sk_buff *, struct netlink_callback *);
 	int (*done)(struct netlink_callback *);
+	struct module *module;
 } xfrm_dispatch[XFRM_NR_MSGTYPES] = {
 	[XFRM_MSG_NEWSA       - XFRM_MSG_BASE] = { .doit = xfrm_add_sa        },
 	[XFRM_MSG_DELSA       - XFRM_MSG_BASE] = { .doit = xfrm_del_sa        },
 	[XFRM_MSG_GETSA       - XFRM_MSG_BASE] = { .doit = xfrm_get_sa,
 						   .dump = xfrm_dump_sa,
-						   .done = xfrm_dump_sa_done  },
+						   .done = xfrm_dump_sa_done,
+						   .module = THIS_MODULE      },
 	[XFRM_MSG_NEWPOLICY   - XFRM_MSG_BASE] = { .doit = xfrm_add_policy    },
 	[XFRM_MSG_DELPOLICY   - XFRM_MSG_BASE] = { .doit = xfrm_get_policy    },
 	[XFRM_MSG_GETPOLICY   - XFRM_MSG_BASE] = { .doit = xfrm_get_policy,
 						   .dump = xfrm_dump_policy,
-						   .done = xfrm_dump_policy_done },
+						   .done = xfrm_dump_policy_done,
+						   .module = THIS_MODULE      },
 	[XFRM_MSG_ALLOCSPI    - XFRM_MSG_BASE] = { .doit = xfrm_alloc_userspi },
 	[XFRM_MSG_ACQUIRE     - XFRM_MSG_BASE] = { .doit = xfrm_add_acquire   },
 	[XFRM_MSG_EXPIRE      - XFRM_MSG_BASE] = { .doit = xfrm_add_sa_expire },
@@ -2362,6 +2365,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 			struct netlink_dump_control c = {
 				.dump = link->dump,
 				.done = link->done,
+				.module = link->module,
 			};
 			return netlink_dump_start(net->xfrm.nlsk, skb, nlh, &c);
 		}
-- 
1.7.7.6


^ permalink raw reply related

* [patch v3 10/11] ipset: pass ipset module to netlink_dump_start
From: Gao feng @ 2012-10-04  4:41 UTC (permalink / raw)
  To: davem, eric.dumazet, steffen.klassert
  Cc: netfilter-devel, linux-rdma, netdev, linux-crypto, pablo,
	stephen.hemminger, jengelh, Gao feng, Jozsef Kadlecsik
In-Reply-To: <1349325689-16104-1-git-send-email-gaofeng@cn.fujitsu.com>

set netlink_dump_control.module to avoid panic.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
---
 net/netfilter/ipset/ip_set_core.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 778465f..6f2b55c 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1185,6 +1185,7 @@ ip_set_dump(struct sock *ctnl, struct sk_buff *skb,
 		struct netlink_dump_control c = {
 			.dump = ip_set_dump_start,
 			.done = ip_set_dump_done,
+			.module = THIS_MODULE,
 		};
 		return netlink_dump_start(ctnl, skb, nlh, &c);
 	}
-- 
1.7.7.6


^ permalink raw reply related

* [patch v3 11/11] infiniband: pass rdma_cm module to netlink_dump_start
From: Gao feng @ 2012-10-04  4:41 UTC (permalink / raw)
  To: davem, eric.dumazet, steffen.klassert
  Cc: netfilter-devel, linux-rdma, netdev, linux-crypto, pablo,
	stephen.hemminger, jengelh, Gao feng, Roland Dreier, Sean Hefty
In-Reply-To: <1349325689-16104-1-git-send-email-gaofeng@cn.fujitsu.com>

set netlink_dump_control.module to avoid panic.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/core/cma.c     |    3 ++-
 drivers/infiniband/core/netlink.c |    1 +
 include/rdma/rdma_netlink.h       |    1 +
 3 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 26b3760..4fff27a 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -3498,7 +3498,8 @@ out:
 }
 
 static const struct ibnl_client_cbs cma_cb_table[] = {
-	[RDMA_NL_RDMA_CM_ID_STATS] = { .dump = cma_get_id_stats },
+	[RDMA_NL_RDMA_CM_ID_STATS] = { .dump = cma_get_id_stats,
+				       .module = THIS_MODULE },
 };
 
 static int __init cma_init(void)
diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c
index fe10a94..da06abd 100644
--- a/drivers/infiniband/core/netlink.c
+++ b/drivers/infiniband/core/netlink.c
@@ -154,6 +154,7 @@ static int ibnl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 			{
 				struct netlink_dump_control c = {
 					.dump = client->cb_table[op].dump,
+					.module = client->cb_table[op].module,
 				};
 				return netlink_dump_start(nls, skb, nlh, &c);
 			}
diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h
index 3c5363a..bd3d8b2 100644
--- a/include/rdma/rdma_netlink.h
+++ b/include/rdma/rdma_netlink.h
@@ -39,6 +39,7 @@ struct rdma_cm_id_stats {
 
 struct ibnl_client_cbs {
 	int (*dump)(struct sk_buff *skb, struct netlink_callback *nlcb);
+	struct module *module;
 };
 
 int ibnl_init(void);
-- 
1.7.7.6

^ permalink raw reply related

* Re: [RFC] vxlan: use ether header as fallback hash
From: Eric Dumazet @ 2012-10-04  4:57 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Jesse Gross, davem, netdev
In-Reply-To: <20121003213906.09b57539@nehalam.linuxnetplumber.net>

On Wed, 2012-10-03 at 21:39 -0700, Stephen Hemminger wrote:
> VXLAN bases source UDP port based on flow to help the
> receiver to be able to load balance based on outer header
> contents.
> 
> This patch changes the algorithm to better handle packets
> that can not be categorized by the rxhash() function.
> It adds a fallback to use jhash on the Ether header.
> 
> It also fixes a bug where the old code could assign 0 as a port
> value.
> 
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> ---
> RFC for now, compile tested only
> 
> --- a/drivers/net/vxlan.c	2012-10-03 21:25:43.747968165 -0700
> +++ b/drivers/net/vxlan.c	2012-10-03 21:36:10.213805422 -0700
> @@ -622,12 +622,30 @@ static inline u8 vxlan_ecn_encap(u8 tos,
>  	return INET_ECN_encapsulate(tos, inner);
>  }
>  

#include <linux/jhash.h>

> +/* Compute hash to use for source port
> + *   first choice to use L4 flow hash since it will spread
> + *     better and maybe available from hardware
> + *   secondary choice is to use jhash on the Ethernet header
> + * Always returns non-zero value
> + */
> +static u16 vxlan_flow_hash(struct sk_buff *skb)
> +{
> +	u16 hash = skb_get_rxhash(skb);

skb_get_rxhash(skb) returns a u32, which could have its low-order 16 bits
set to 0.

So I would use u32 hash = skb_get_rxhash(skb);



> +
> +	if (!hash)
> +		hash = jhash(skb->data, 3, skb->protocol);
> +

then here, do :

	hash ^= hash >> 16;
	hash &= 0xffff;


> +	if (!hash)
> +		hash = 1;
> +
> +	return hash;
> +}
> +
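
For illustration, the two suggestions above can be combined into one
helper. This is only a userspace sketch: fold_flow_hash() is a
hypothetical name, and it takes the 32-bit hash as a parameter instead
of calling skb_get_rxhash(). It folds the high half into the low half
before truncating, so a hash whose low 16 bits happen to be zero still
contributes entropy, and it keeps the quoted patch's "never return 0"
rule.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit flow hash into a non-zero 16-bit value.
 * Hypothetical userspace stand-in, not kernel code. */
uint16_t fold_flow_hash(uint32_t hash)
{
	hash ^= hash >> 16;	/* mix the high half into the low half */
	hash &= 0xffff;		/* keep 16 bits */
	if (!hash)
		hash = 1;	/* the quoted patch never returns 0 */
	assert(hash != 0 && hash <= 0xffff);
	return (uint16_t)hash;
}
```

With a plain u16 assignment, a hash such as 0xABCD0000 would truncate
to 0; with the fold it becomes 0xABCD.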

^ permalink raw reply

* Re: [RFC] vxlan: use ether header as fallback hash
From: Stephen Hemminger @ 2012-10-04  5:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jesse Gross, davem, netdev
In-Reply-To: <1349326673.16011.5.camel@edumazet-glaptop>


> On Wed, 2012-10-03 at 21:39 -0700, Stephen Hemminger wrote:
> > VXLAN bases source UDP port based on flow to help the
> > receiver to be able to load balance based on outer header
> > contents.
> > 
> > This patch changes the algorithm to better handle packets
> > that can not be categorized by the rxhash() function.
> > It adds a fallback to use jhash on the Ether header.
> > 
> > It also fixes a bug where the old code could assign 0 as a port
> > value.
> > 
> > 
> > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> > 
> > ---
> > RFC for now, compile tested only
> > 
> > --- a/drivers/net/vxlan.c	2012-10-03 21:25:43.747968165 -0700
> > +++ b/drivers/net/vxlan.c	2012-10-03 21:36:10.213805422 -0700
> > @@ -622,12 +622,30 @@ static inline u8 vxlan_ecn_encap(u8 tos,
> >  	return INET_ECN_encapsulate(tos, inner);
> >  }
> >  
> 
> #include <linux/jhash.h>
> 
> > +/* Compute hash to use for source port
> > + *   first choice to use L4 flow hash since it will spread
> > + *     better and maybe available from hardware
> > + *   secondary choice is to use jhash on the Ethernet header
> > + * Always returns non-zero value
> > + */
> > +static u16 vxlan_flow_hash(struct sk_buff *skb)
> > +{
> > +	u16 hash = skb_get_rxhash(skb);
> 
> skb_get_rxhash(skb) returns an u32, that could have low order 16bits
> set to 0.
> 
> So I would use u32 hash = skb_get_rxhash(skb);


Turns out that 0 is a valid source port if no reply is requested
(per RFC and wikipedia)

^ permalink raw reply

* Re: [patch v3 02/11] inet_diag: pass inet_diag module to netlink_dump_start
From: Eric Dumazet @ 2012-10-04  5:08 UTC (permalink / raw)
  To: Gao feng
  Cc: davem, steffen.klassert, netfilter-devel, linux-rdma, netdev,
	linux-crypto, pablo, stephen.hemminger, jengelh
In-Reply-To: <1349325689-16104-2-git-send-email-gaofeng@cn.fujitsu.com>

On Thu, 2012-10-04 at 12:41 +0800, Gao feng wrote:
> set netlink_dump_control.module to avoid panic.
> 
> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> ---
>  net/ipv4/inet_diag.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
> index 535584c..5ffd7bc 100644
> --- a/net/ipv4/inet_diag.c
> +++ b/net/ipv4/inet_diag.c
> @@ -981,6 +981,7 @@ static int inet_diag_rcv_msg_compat(struct sk_buff *skb, struct nlmsghdr *nlh)
>  		{
>  			struct netlink_dump_control c = {
>  				.dump = inet_diag_dump_compat,
> +				.module = THIS_MODULE,
>  			};
>  			return netlink_dump_start(net->diag_nlsk, skb, nlh, &c);
>  		}
> @@ -1010,6 +1011,7 @@ static int inet_diag_handler_dump(struct sk_buff *skb, struct nlmsghdr *h)
>  		{
>  			struct netlink_dump_control c = {
>  				.dump = inet_diag_dump,
> +				.module = THIS_MODULE,
>  			};
>  			return netlink_dump_start(net->diag_nlsk, skb, h, &c);
>  		}


I believe Pablo's suggestion was to make netlink_dump_start()
automatically pass THIS_MODULE, so that we don't need to change all call
sites?

extern int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
				  const struct nlmsghdr *nlh,
				  struct netlink_dump_control *control);

static inline int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
					const struct nlmsghdr *nlh,
					struct netlink_dump_control *control)
{
	control->module = THIS_MODULE;
	return __netlink_dump_start(ssk, skb, nlh, control);
}

or :

extern int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
				  const struct nlmsghdr *nlh,
				  struct netlink_dump_control *control,
				  struct module *module);

static inline int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
					const struct nlmsghdr *nlh,
					struct netlink_dump_control *control)
{
	return __netlink_dump_start(ssk, skb, nlh, control, THIS_MODULE);
}



^ permalink raw reply

* Re: [PATCH 0/3] virtio-net: inline header support
From: Rusty Russell @ 2012-10-04  5:17 UTC (permalink / raw)
  To: Anthony Liguori, Michael S. Tsirkin, Thomas Lendacky
  Cc: Sasha Levin, virtualization, linux-kernel, avi, kvm, netdev
In-Reply-To: <87sj9vxbnf.fsf@codemonkey.ws>

Anthony Liguori <anthony@codemonkey.ws> writes:

> Rusty Russell <rusty@rustcorp.com.au> writes:
>
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>>
>> There's a reason I haven't done this.  I really, really dislike "my
>> implementation isn't broken" feature bits.  We could have an infinite
>> number of them, for each bug in each device.
>>
>> So my plan was to tie this assumption to the new PCI layout.  And have a
>> stress-testing patch like the one below in the kernel (see my virtio-wip
>> branch for stuff like this).  Turn it on at boot with
>> "virtio_ring.torture" on the kernel commandline.
>>
>> BTW, I've fixed lguest, but my kvm here (Ubuntu precise, kvm-qemu 1.0)
>> is too old.  Building the latest git now...
>>
>> Cheers,
>> Rusty.
>>
>> Subject: virtio: CONFIG_VIRTIO_DEVICE_TORTURE
>>
>> Virtio devices are not supposed to depend on the framing of the scatter-gather
>> lists, but various implementations did.  Safeguard this in future by adding
>> an option to deliberately create perverse descriptors.
>>
>> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>
> Ignoring framing is really a bad idea.  You want backends to enforce
> reasonable framing because guests shouldn't do silly things with framing.
>
> For instance, with virtio-blk, if you want decent performance, you
> absolutely want to avoid bouncing the data.  If you're using O_DIRECT in
> the host to submit I/O requests, then it's critical that all of the s/g
> elements are aligned to a sector boundary and sized to a sector
> boundary.
>
> Yes, QEMU can handle if that's not the case, but it would be insanely
> stupid for a guest not to do this.  This is the sort of thing that ought
> to be enforced in the specification because a guest cannot perform well
> if it doesn't follow these rules.

Lack of imagination is what got us into trouble in the first place; when
presented with one counter-example, it's useful to look for others.

That's our job, not to dismiss them as "insanely stupid".

For example:
1) Perhaps the guest isn't trying to perform well, it's trying to be a
   tiny bootloader?
2) Perhaps the guest is the direct consumer, and aligning buffers is
   redundant.

> A spec isn't terribly useful if the result is guest drivers that are
> slow.  There's very little to gain by not enforcing rules around framing
> and there's a lot to lose if a guest frames incorrectly.

The guest has the flexibility, and gets to decide.  The spec is not
forcing them to perform badly.

> In the rare case where we want to make a framing change, we should use
> feature bits like Michael is proposing.
>
> In this case, we should simply say that with the feature bit, the vnet
> header can be in the same element as the data but not allow the header
> to be spread across multiple elements.

I'd love to split struct virtio_net_hdr_mrg_rxbuf, so the num_buffers
ends up somewhere else.

The simplest rules are "never" or "always".

Cheers,
Rusty.
PS.  Inserting zero-length buffers is something I'd be prepared to rule
     out, my current patch does it just for yuks...
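
To make "not depending on framing" concrete, here is a rough userspace
sketch of a device-side read that does not care how the driver split a
buffer across elements. struct sg_elem and sg_gather() are hypothetical
names for illustration only, not taken from any virtio implementation.
The copy walks the list, crosses element boundaries as needed, and
simply steps over zero-length elements.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one scatter-gather element. */
struct sg_elem {
	const void *base;
	size_t len;
};

/* Copy up to len bytes from the start of the list into dst, crossing
 * element boundaries as needed.  Returns the number of bytes copied,
 * which is less than len if the list is too short. */
size_t sg_gather(const struct sg_elem *sg, size_t count, void *dst, size_t len)
{
	size_t done = 0;

	assert(sg != NULL || count == 0);
	for (size_t i = 0; i < count && done < len; i++) {
		size_t n = sg[i].len;

		if (n > len - done)
			n = len - done;	/* do not read past the request */
		if (n)			/* zero-length elements are skipped */
			memcpy((char *)dst + done, sg[i].base, n);
		done += n;
	}
	return done;
}
```

A header read written this way gives the same result whether the header
sits alone in the first element, shares an element with the data, or is
spread over several elements.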

^ permalink raw reply

* Re: [PATCH 3/3] vxlan: virtual extensible lan
From: Eric Dumazet @ 2012-10-04  6:34 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Jesse Gross, davem, netdev
In-Reply-To: <20121003210220.5e74aa5d@nehalam.linuxnetplumber.net>

On Wed, 2012-10-03 at 21:02 -0700, Stephen Hemminger wrote:

>
> Since port is always a pair, there is no need to restrict range, unless
> there was a broken firewall in the way. One bug there is that random32()
> can return 0 which is not a valid port number.  A better fallback
> would be a hash of the MAC header.

But using up to 65536 values for the port means that if vxlan traffic
crosses a stateful firewall (e.g. netfilter with conntrack), this might
need 65536 contexts/sessions.

So random32() is not needed.

What might be needed is additional parameters for a vxlan tunnel, to
give an optional range for the source port.
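
As a sketch of what such a range parameter could look like, assuming
the tunnel carries a hypothetical port_min/port_max pair (names invented
here for illustration), a 32-bit flow hash can be scaled into the
inclusive range with one multiply and shift, avoiding a modulo:

```c
#include <assert.h>
#include <stdint.h>

/* Map a 32-bit flow hash onto [port_min, port_max] inclusive.
 * Hypothetical helper; port_min and port_max are assumed to be
 * per-tunnel configuration values. */
uint16_t src_port_in_range(uint32_t hash, uint16_t port_min, uint16_t port_max)
{
	assert(port_min <= port_max);

	uint32_t range = (uint32_t)(port_max - port_min) + 1;

	/* (hash * range) >> 32 lies in [0, range - 1] */
	return (uint16_t)((((uint64_t)hash * range) >> 32) + port_min);
}
```

A narrow range bounds the number of firewall sessions a tunnel can
create, and a range starting above zero also avoids port 0.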

^ permalink raw reply

* IPVS-DR problem with neigh lookup in 3.6
From: Julian Anastasov @ 2012-10-04  6:52 UTC (permalink / raw)
  To: netdev


	Hello,

	I suspect commit a263b3093641fb1ec377582c90986a7fd0625184
is causing a problem for the IPVS Direct Routing mode, which
requests output route for real server IP (RIP) but packets
contain virtual IP (VIP). Before 3.6 ip_finish_output2 used
the neighbour address while now we use iph daddr. The
result is that now we send ARP query for VIP instead of RIP.
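
	A simplified model of the selection described above, written
as standalone C with a hypothetical helper name (neigh_key) and
addresses as plain integers, assuming rt_gateway is 0 when the
destination is on the local subnet:

```c
#include <stdint.h>

/* Pick the address used for the neighbour (ARP) lookup.
 * Hypothetical model: use the gateway when the route has one,
 * otherwise fall back to the destination in the IP header. */
uint32_t neigh_key(uint32_t rt_gateway, uint32_t daddr)
{
	return rt_gateway ? rt_gateway : daddr;
}
```

	For an on-link real server the route has no gateway, so
under this model the lookup key is the packet's daddr (VIP), not
the RIP that the route was requested for.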

	Is it a good idea to keep rt_gateway always
valid and to check RTCF_REDIRECTED where needed?
While checking this option I see that the value of
0 is not good for matching; for example, callforward_do_filter
in net/netfilter/nf_conntrack_h323_main.c will fail
too. If we make such a change, some rt_gateway checks may
disappear and others will be replaced with a check for the
RTCF_REDIRECTED flag. Should I try this option?
Adding rt_dst does not look good, and it will not
help callforward_do_filter either.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: IPVS-DR problem with neigh lookup in 3.6
From: David Miller @ 2012-10-04  6:58 UTC (permalink / raw)
  To: ja; +Cc: netdev
In-Reply-To: <alpine.LFD.2.00.1210040923010.1618@ja.ssi.bg>

From: Julian Anastasov <ja@ssi.bg>
Date: Thu, 4 Oct 2012 09:52:47 +0300 (EEST)

> 	Is it a good idea to keep rt_gateway always
> valid and to check RTCF_REDIRECTED where needed?

What exactly do you mean by this?  rt_gateway can only
be explicit nexthop or zero for local subnet.

It cannot take on any other value, otherwise routes are
not properly sharable.

> Because adding rt_dst does not look good, it will not
> help to callforward_do_filter too.

Adding rt_dst is not to be seriously considered.

^ permalink raw reply

* Re: [PATCH 0/3] virtio-net: inline header support
From: Paolo Bonzini @ 2012-10-04  7:09 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Thomas Lendacky, kvm, netdev, linux-kernel,
	virtualization, avi, Sasha Levin
In-Reply-To: <87bogj2j1b.fsf@rustcorp.com.au>

Il 04/10/2012 02:11, Rusty Russell ha scritto:
> > > There's a reason I haven't done this.  I really, really dislike "my
> > > implementation isn't broken" feature bits.  We could have an infinite
> > > number of them, for each bug in each device.
> >
> > However, this bug affects (almost) all implementations and (almost) all
> > devices.  It even makes sense to reserve a transport feature bit for it
> > instead of a device feature bit.
>
> Perhaps, but we have to fix the bugs first!

Yes. :)  Isn't that what mst's patch does?

> As I said, my torture patch broke qemu immediately.  Since no one has
> leapt onto fixing that, I'll take a look now...

I can look at virtio-scsi.

Paolo

^ permalink raw reply
