* Re: Open vSwitch Design
From: jamal @ 2011-11-25 11:24 UTC (permalink / raw)
To: Stephen Hemminger
Cc: dev-yBygre7rU0TnMu66kgdUjQ, Chris Wright, Herbert Xu,
Eric Dumazet, netdev, John Fastabend, David Miller
In-Reply-To: <20111124212021.2ae2fb7f-QE31Isp8l5DVJhW05BI4jyWSNWFUUkiGXqFh9Ls21Oc@public.gmane.org>
On Thu, 2011-11-24 at 21:20 -0800, Stephen Hemminger wrote:
> On Thu, 24 Nov 2011 17:30:33 -0500
> jamal <hadi-fAAogVwAN2Kw5LPnMra/2Q@public.gmane.org> wrote:
>
> > Can you explain why you couldnt use the current bridge code (likely with
> > some mods)? I can see you want to isolate the VMs via the virtual ports;
> > maybe even vlans on the virtual ports - the current bridge code should
> > be able to handle that.
>
> The way openvswitch works is that the flow table is populated
> by user space. The kernel bridge works completely differently (it learns
> about MAC addresses).
>
Most hardware bridges out there support all different modes:
You can have learning in the hardware or defer it to user/control plane
by setting some flags. You can have broadcasting done in hardware or
defer to user space.
The mods I was thinking of would bring the Linux bridge to the
same behavior. You would then need to allow netlink updates of the bridge
MAC table from user space. There may be weaknesses in the current bridging
code in relation to VLANs that need to be addressed.
[But my concern was not so much the bridge - because changes are needed
in that case; it is the "match, actionlist" that is already in place
that got to me.]
> Actually, this is what puts me off on the current implementation.
> I would prefer that the kernel implementation was just a software
> implementation of a hardware OpenFlow switch. That way it would
> be transparent that the control plane in user space was talking to kernel
> or hardware.
Or alternatively, allow the bridge code to support the different modes.
Learning as well as broadcasting mode needs to be settable.
Then you have interesting capability in the kernel that meets the
requirements of an open flow switch (+ anyone who wants to do policy
control in user space with their favorite standard).
> > The tc classifier-action-qdisc infrastructure handles this.
> > The sampler needs a new action defined.
>
> There are too many damn layers in the software path already.
I think what they are doing in the separation of control and data
is reasonable. The policy and control are in user space. The fastpath
is in the kernel, and it may be in a variety of spots (some ARP entry
here, some L3 entry there, a couple of match-action items, etc.).
The brains which understand what the different things mean in
aggregate, in terms of a service, are in user space.
>
> The problem is that there are two flow classifiers, one in OpenVswitch
> in the kernel, and the other in the user space flow manager. I think the
> issue is that the two have different code.
I see. I can understand having a simple classifier in the kernel and
a more complex "consulting" component sitting in user space which updates
the kernel on how to deal with subsequent packets of the flow.
> Is the kernel/userspace API for OpenVswitch nailed down and documented
> well enough that alternative control plane software could be built?
They do have a generic netlink interface. I would prefer the netlink
interface already in place (which would have worked if they used
the stuff already in place).
cheers,
jamal
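
[Editor's illustration] The split discussed above (a simple classifier in the kernel, populated from user space, with unknown flows handed back to user space) can be sketched in plain C. This is a minimal sketch under assumptions, not Open vSwitch or bridge code: `flow_table`, `flow_install`, `flow_lookup` and the `verdict` values are hypothetical names, and the one-slot-per-bucket table is only there to keep the example short.

```c
#include <stdint.h>

/* Hypothetical verdicts: a miss means "ask the user-space control plane". */
enum verdict { VERDICT_MISS = -1, VERDICT_DROP = 0, VERDICT_FORWARD = 1 };

struct flow_entry {
	uint32_t key;        /* stand-in for a real flow key (addresses, ports, ...) */
	int out_port;        /* where to send matching packets */
	enum verdict action; /* what to do on a match */
	int used;            /* non-zero once the control plane installed it */
};

#define TABLE_SIZE 64

struct flow_table {
	struct flow_entry slot[TABLE_SIZE];
};

/* Control-plane side: install (or overwrite) the entry for one flow. */
static void flow_install(struct flow_table *t, uint32_t key,
			 enum verdict action, int out_port)
{
	struct flow_entry *e = &t->slot[key % TABLE_SIZE];

	e->key = key;
	e->action = action;
	e->out_port = out_port;
	e->used = 1;
}

/* Fast-path side: exact match only; anything unknown is reported as a miss. */
static enum verdict flow_lookup(const struct flow_table *t, uint32_t key,
				int *out_port)
{
	const struct flow_entry *e = &t->slot[key % TABLE_SIZE];

	if (!e->used || e->key != key)
		return VERDICT_MISS;
	*out_port = e->out_port;
	return e->action;
}
```

The first packet of a flow misses, the control plane decides and installs an entry, and later packets of the same flow stay on the fast path.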
* Re: Open vSwitch Design
From: jamal @ 2011-11-25 11:34 UTC (permalink / raw)
To: Eric Dumazet
Cc: dev-yBygre7rU0TnMu66kgdUjQ, chrisw-H+wXaHxf7aLQT0dZR+AlfA,
netdev-u79uwXL29TY76Z2rM5mHXA, Florian Westphal,
john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
herbert-F6s6mLieUQo7FNHlEwC/lvQIK84fMopw,
shemminger-ZtmgI6mnKB3QT0dZR+AlfA, David Miller
In-Reply-To: <1322203013.2872.28.camel@edumazet-laptop>
Hrm. I forgot about the flow classifier - it may be what the openflow
folks need. It is more friendly for the well defined tuples than u32.
But what do you mean "refactor"? I can already use this classifier
and attach actions to set policy in the kernel.
cheers,
jamal
On Fri, 2011-11-25 at 07:36 +0100, Eric Dumazet wrote:
> > >
> > > Maybe its time to factorize the thing, eventually use it in a third
> > > component (Open vSwitch...)
> >
> > Yes.
>
> A third reason to do that anyway is that net/sched/sch_sfb.c should use
> __skb_get_rxhash() providing the perturbation itself, and not use the
> standard (hashrnd) one ).
>
> Right now, if two flows share same rxhash, the double SFB hash will also
> share the same final hash.
>
> (This point was mentioned by Florian Westphal)
>
>
>
* Re: [PATCH 2/4] NET: NETROM: When adding a route verify length of mnemonic string.
From: Dan Carpenter @ 2011-11-25 11:36 UTC (permalink / raw)
To: Ralf Baechle
Cc: David S. Miller, netdev, linux-hams, Walter Harms,
Thomas Osterried
In-Reply-To: <cfff1df64b18a89140ff995189c6a3c484815997.1322214950.git.ralf@linux-mips.org>
On Fri, Nov 25, 2011 at 09:08:49AM +0000, Ralf Baechle wrote:
> struct nr_route_struct's mnemonic permits a string of up to 7 bytes to be
> used. If userland passes a not zero terminated string to the kernel adding
> a node to the routing table might result in the kernel attempting to read
> copy a too long string.
>
> Mnemonic is part of the NET/ROM routing protocol; NET/ROM routing table
> updates only broadcast 6 bytes. The 7th byte in the mnemonic array exists
> only as a \0 termination character for the kernel code's convenience.
>
> Fixed by rejecting mnemonic strings that have no terminating \0 in the first
> 7 characters. Do this test only NETROM_NODE to avoid breaking NETROM_NEIGH
> where userland might passing an uninitialized mnemonic field.
Good point... I missed that.
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
regards,
dan carpenter
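
[Editor's illustration] The check described in the quoted commit message can be sketched in user-space C. This is an illustration under assumptions, not the kernel patch: `mnemonic_is_terminated` and `MNEMONIC_LEN` are hypothetical names, and `memchr` is used here as a portable equivalent of the `strnlen(..., 7) == 7` test that appears in the series.

```c
#include <stdbool.h>
#include <string.h>

/* Field size from the commit message: 6 protocol bytes plus one '\0'. */
#define MNEMONIC_LEN 7

/*
 * Return true if a NUL terminator occurs within the first MNEMONIC_LEN
 * bytes, i.e. the field can safely be treated as a C string.
 */
static bool mnemonic_is_terminated(const char *mnemonic)
{
	return memchr(mnemonic, '\0', MNEMONIC_LEN) != NULL;
}
```

A caller would reject the request (the patch returns an error for NETROM_NODE) whenever this returns false.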
* Re: [PATCH net-next 1/2] netem: rate-latency extension
From: Hagen Paul Pfeifer @ 2011-11-25 12:02 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Stephen Hemminger, netdev
In-Reply-To: <1322201600.2872.15.camel@edumazet-laptop>
* Eric Dumazet | 2011-11-25 07:13:20 [+0100]:
>Yes, but Hagen change adds a few lines to netem, and netem already
>handles throttling. This is why I believe its a nice enhancement.
We first modified TBF, but TBF addresses a slightly different task, so the patch
was a little bit awkward and complex (more awkward than the two additional
netem enqueue() lines). So in the end: yes, netem is the right place for this;
only a few lines in netem are required. Additionally, setting up a qdisc chain
with TBF, netem, ... is also error prone. Students of mine repeatedly make
mistakes here. This change makes a complete emulation setup even easier. But
this is only a side note.
Hagen
* Re: [PATCH 3/4] NET: NETROM: Cleanup argument SIOCADDRT ioctl argument checking.
From: walter harms @ 2011-11-25 12:12 UTC (permalink / raw)
Cc: Ralf Baechle, David S. Miller, netdev, linux-hams,
Thomas Osterried, Kernel Janitors List
In-Reply-To: <4ECF7A76.3090305@bfs.de>
hi,
according to LXR there are several places where the check is >AX25_MAX_DIGIS instead of >=.
any takers ?
re,
wh
Am 25.11.2011 12:22, schrieb walter harms:
>
>
> Am 25.11.2011 10:09, schrieb Ralf Baechle:
>> nr_route.ndigis is unsigned int so the nr_route.ndigis < 0 expression is
>> never true and can be dropped. Doing the nr_ax25_dev_get call later
>> allows the nr_route.ndigis test to bail out without having to dev_put.
>>
>> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
>> Cc: Thomas Osterried <thomas@osterried.de>
>> ---
>> net/netrom/nr_route.c | 6 ++----
>> 1 files changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
>> index 8d7716c..2cf3301 100644
>> --- a/net/netrom/nr_route.c
>> +++ b/net/netrom/nr_route.c
>> @@ -670,12 +670,10 @@ int nr_rt_ioctl(unsigned int cmd, void __user *arg)
>> case SIOCADDRT:
>> if (copy_from_user(&nr_route, arg, sizeof(struct nr_route_struct)))
>> return -EFAULT;
>> - if ((dev = nr_ax25_dev_get(nr_route.device)) == NULL)
>> + if (nr_route.ndigis > AX25_MAX_DIGIS)
>> return -EINVAL;
>> - if (nr_route.ndigis < 0 || nr_route.ndigis > AX25_MAX_DIGIS) {
>> - dev_put(dev);
>> + if ((dev = nr_ax25_dev_get(nr_route.device)) == NULL)
>> return -EINVAL;
>> - }
>> switch (nr_route.type) {
>> case NETROM_NODE:
>> if (strnlen(nr_route.mnemonic, 7) == 7) {
>
> I realy do not know if that matters but some use AX25_MAX_DIGIS as array
> and therefore it should be >=AX25_MAX_DIGIS.
>
> struct rose_route_struct {
> rose_address address;
> unsigned short mask;
> ax25_address neighbour;
> char device[16];
> unsigned char ndigis;
> ax25_address digipeaters[AX25_MAX_DIGIS];
> };
* Re: Open vSwitch Design
From: Eric Dumazet @ 2011-11-25 13:02 UTC (permalink / raw)
To: jhs-jkUAjuhPggJWk0Htik3J/w
Cc: dev-yBygre7rU0TnMu66kgdUjQ, chrisw-H+wXaHxf7aLQT0dZR+AlfA,
netdev-u79uwXL29TY76Z2rM5mHXA, Florian Westphal,
john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
herbert-F6s6mLieUQo7FNHlEwC/lvQIK84fMopw,
shemminger-ZtmgI6mnKB3QT0dZR+AlfA, David Miller
In-Reply-To: <1322220862.1908.79.camel@mojatatu>
Le vendredi 25 novembre 2011 à 06:34 -0500, jamal a écrit :
> Hrm. I forgot about the flow classifier - it may be what the openflow
> folks need. It is more friendly for the well defined tuples than u32.
>
> But what do you mean "refactor"? I can already use this classifier
> and attach actions to set policy in the kernel.
cls_flow is not complete, since it doesn't handle tunnels, for example.
It calls a 'partial flow classifier' to find each needed element, one by
one.
(Adding tunnel decap would require performing this several times for each
packet.)
__skb_get_rxhash() is more tunnel aware, yet some protocols are still
missing, for example IPPROTO_IPV6.
Instead of adding logic to both dissectors, we could have a central flow
dissector, filling a temporary pivot structure with the elements found (src
addr, dst addr, ports, ...), going through tunnel encapsulation if found.
Then net/sched/cls_flow.c could pick needed elems from this structure to
compute the hash as specified in tc command :
(for example : tc filter ... flow hash keys proto-dst,dst ...)
(One dissector call per packet for any number of keys in the filter)
Same for net/sched/sch_sfb.c : Use the pivot structure and compute the
two hashes (using two hashrnd values)
And __skb_get_rxhash() could use the same flow dissector, and pick (src
addr, dst addr, ports) to compute skb->rxhash, and set skb->l4_rxhash if
"ports" is not null.
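
[Editor's illustration] The pivot-structure idea can be sketched in user-space C: dissect once, then let each consumer hash the same keys with its own perturbation. This is a sketch under assumptions, not kernel code: `pivot_keys`, `mix` and `pivot_hash` are hypothetical names, and the mixing function is a simple stand-in rather than the kernel's jhash.

```c
#include <stdint.h>

/* Hypothetical pivot structure, filled once per packet by one dissector. */
struct pivot_keys {
	uint32_t src;      /* source address */
	uint32_t dst;      /* destination address */
	uint32_t ports;    /* (source port << 16) | destination port, 0 if none */
	uint8_t  ip_proto; /* transport protocol number */
};

/* Stand-in mixing step (not jhash): fold one 32-bit word into the state. */
static uint32_t mix(uint32_t h, uint32_t v)
{
	h ^= v;
	h *= 0x9E3779B1u;
	return h ^ (h >> 15);
}

/* Hash the dissected keys, perturbed by a caller-supplied seed. */
static uint32_t pivot_hash(const struct pivot_keys *k, uint32_t seed)
{
	uint32_t h = mix(seed, k->src);

	h = mix(h, k->dst);
	h = mix(h, k->ports);
	return mix(h, k->ip_proto);
}
```

With this shape, a consumer that needs two hashes (as suggested for SFB) calls `pivot_hash()` twice with two different seeds on the same dissected keys, so two flows that collide under one seed are not forced to collide under the other.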
* Re: [v4 PATCH 2/2] NETFILTER userspace part for target HMARK
From: Jan Engelhardt @ 2011-11-25 13:20 UTC (permalink / raw)
To: Hans Schillstrom; +Cc: kaber, pablo, netfilter-devel, netdev, hans.schillstrom
In-Reply-To: <1322213787-25796-3-git-send-email-hans@schillstrom.com>
On Friday 2011-11-25 10:36, Hans Schillstrom wrote:
>+Parameters:
>+For all masks default is all "1:s", to disable a field use mask 0
>+For IPv6 it's just the last 32 bits that is included in the hash
Why limit IPv6 to 32?
>diff --git a/include/linux/netfilter/xt_hmark.h b/include/linux/netfilter/xt_hmark.h
>new file mode 100644
>index 0000000..1760015
>--- /dev/null
>+++ b/include/linux/netfilter/xt_hmark.h
>@@ -0,0 +1,62 @@
>+#ifndef XT_HMARK_H_
>+#define XT_HMARK_H_
>+
>+#include <linux/types.h>
>+
>+/*
>+ * Flags must not start at 0, since it's used as none.
>+ */
>+enum {
>+ XT_HMARK_USE_SNAT = 1, /* SNAT & DNAT are used by the kernel module */
>+ XT_HMARK_USE_DNAT,
>+ XT_HMARK_SADR_AND,
>+ XT_HMARK_DADR_AND,
>+ XT_HMARK_SPI_AND,
>+ XT_HMARK_SPI_OR,
>+ XT_HMARK_SPORT_AND,
>+ XT_HMARK_DPORT_AND,
>+ XT_HMARK_SPORT_OR,
>+ XT_HMARK_DPORT_OR,
>+ XT_HMARK_PROTO_AND,
>+ XT_HMARK_RND,
>+ XT_HMARK_MODULUS,
>+ XT_HMARK_OFFSET,
>+ XT_F_HMARK_USE_SNAT = 1 << XT_HMARK_USE_SNAT,
This file does not match the kernel-side xt_hmark.h.
Definitions only used within the userspace side should go into libxt_hmark.c
anyhow.
>+union ports {
>+ struct {
>+ __u16 src;
>+ __u16 dst;
>+ } p16;
>+ __u32 v32;
>+};
Bad name "ports", big clash potential.
* Re: [PATCH 3/4] NET: NETROM: Cleanup argument SIOCADDRT ioctl argument checking.
From: Thomas Osterried @ 2011-11-25 13:26 UTC (permalink / raw)
To: wharms
Cc: Ralf Baechle, David S. Miller, netdev, linux-hams,
Kernel Janitors List
In-Reply-To: <4ECF8629.7000301@bfs.de>
Am Freitag, den 25. November 2011 um 13:12:25 Uhr, schrieb walter harms <wharms@bfs.de> in <4ECF8629.7000301@bfs.de>:
> hi,
> according to LXR there are several places where the check is >AX25_MAX_DIGIS instead of >=.
>
> any takers ?
nr_route.ndigis is used at
nr_call_to_digi(&digi, nr_route.ndigis, nr_route.digipeaters).
Imagine nr_route.ndigis is 0.
static ax25_digi *nr_call_to_digi(ax25_digi *digi, int ndigis,
	ax25_address *digipeaters)
{
	int i;

	if (ndigis == 0)
		return NULL;	/* <-- with ndigis == 0 we leave here */

	for (i = 0; i < ndigis; i++) {
		digi->calls[i] = digipeaters[i];
		digi->repeated[i] = 0;
	}

	digi->ndigi = ndigis;
	digi->lastrepeat = -1;

	return digi;
}
Imagine ndigis is 8 (AX25_MAX_DIGIS), as large as nr_route.digipeaters (because it's digipeaters[AX25_MAX_DIGIS]).
We fill the array from i = 0 while i < ndigis (last index 7) -> 8 entries == the number of elements in digipeaters
-> everything is fine with that.
vy 73,
- Thomas dl9sau
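
[Editor's illustration] The argument above is that ndigis is a count, not an index, so a value equal to the array size is still in bounds. A small user-space sketch, assuming the value 8 for AX25_MAX_DIGIS as stated in the message; `route_req`, `ndigis_ok` and `copy_digis` are hypothetical names, and `int` stands in for `ax25_address`.

```c
#include <stdbool.h>

#define MAX_DIGIS 8 /* AX25_MAX_DIGIS, per the message above */

/* Hypothetical stand-in for a route request carrying a digipeater list. */
struct route_req {
	unsigned int ndigis;        /* number of valid entries */
	int digipeaters[MAX_DIGIS]; /* int stands in for ax25_address */
};

/* A count may equal the array size; only a larger value is out of range. */
static bool ndigis_ok(unsigned int ndigis)
{
	return ndigis <= MAX_DIGIS;
}

/* Copy ndigis entries; return the highest index written, or -1 for none. */
static int copy_digis(int *dst, const struct route_req *req)
{
	int last = -1;

	for (unsigned int i = 0; i < req->ndigis; i++) {
		dst[i] = req->digipeaters[i];
		last = (int)i;
	}
	return last;
}
```

If the value were used directly as an array index instead, `>=` would be the right rejection test; which comparison is correct depends on how each call site uses the field.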
>
> re,
> wh
>
>
> Am 25.11.2011 12:22, schrieb walter harms:
> >
> >
> > Am 25.11.2011 10:09, schrieb Ralf Baechle:
> >> nr_route.ndigis is unsigned int so the nr_route.ndigis < 0 expression is
> >> never true and can be dropped. Doing the nr_ax25_dev_get call later
> >> allows the nr_route.ndigis test to bail out without having to dev_put.
> >>
> >> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
> >> Cc: Thomas Osterried <thomas@osterried.de>
> >> ---
> >> net/netrom/nr_route.c | 6 ++----
> >> 1 files changed, 2 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
> >> index 8d7716c..2cf3301 100644
> >> --- a/net/netrom/nr_route.c
> >> +++ b/net/netrom/nr_route.c
> >> @@ -670,12 +670,10 @@ int nr_rt_ioctl(unsigned int cmd, void __user *arg)
> >> case SIOCADDRT:
> >> if (copy_from_user(&nr_route, arg, sizeof(struct nr_route_struct)))
> >> return -EFAULT;
> >> - if ((dev = nr_ax25_dev_get(nr_route.device)) == NULL)
> >> + if (nr_route.ndigis > AX25_MAX_DIGIS)
> >> return -EINVAL;
> >> - if (nr_route.ndigis < 0 || nr_route.ndigis > AX25_MAX_DIGIS) {
> >> - dev_put(dev);
> >> + if ((dev = nr_ax25_dev_get(nr_route.device)) == NULL)
> >> return -EINVAL;
> >> - }
> >> switch (nr_route.type) {
> >> case NETROM_NODE:
> >> if (strnlen(nr_route.mnemonic, 7) == 7) {
> >
> > I realy do not know if that matters but some use AX25_MAX_DIGIS as array
> > and therefore it should be >=AX25_MAX_DIGIS.
> >
> > struct rose_route_struct {
> > rose_address address;
> > unsigned short mask;
> > ax25_address neighbour;
> > char device[16];
> > unsigned char ndigis;
> > ax25_address digipeaters[AX25_MAX_DIGIS];
> > };
* Re: [PATCH] at91_ether: use gpio_is_valid for phy IRQ line
From: Jean-Christophe PLAGNIOL-VILLARD @ 2011-11-25 13:47 UTC (permalink / raw)
To: Jamie Iles
Cc: Nicolas Ferre, netdev, sfr, linux-next, linux-kernel,
linux-arm-kernel
In-Reply-To: <20111124222836.GB28582@gallagher>
On 22:28 Thu 24 Nov , Jamie Iles wrote:
> Hi Nicolas,
>
> On Thu, Nov 24, 2011 at 10:21:14PM +0100, Nicolas Ferre wrote:
> > Use the generic gpiolib gpio_is_valid() function to test
> > if the phy IRQ line GPIO is actually provided.
> >
> > For non-connected or non-existing phy IRQ lines, -EINVAL
> > value is used for phy_irq_pin field of struct at91_eth_data.
> >
> > Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
> > ---
> > drivers/net/ethernet/cadence/at91_ether.c | 23 +++++++++++++----------
> > 1 files changed, 13 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/cadence/at91_ether.c b/drivers/net/ethernet/cadence/at91_ether.c
> > index 56624d3..a1c4143 100644
> > --- a/drivers/net/ethernet/cadence/at91_ether.c
> > +++ b/drivers/net/ethernet/cadence/at91_ether.c
> > @@ -255,8 +255,7 @@ static void enable_phyirq(struct net_device *dev)
> > unsigned int dsintr, irq_number;
> > int status;
> >
> > - irq_number = lp->board_data.phy_irq_pin;
> > - if (!irq_number) {
> > + if (!gpio_is_valid(lp->board_data.phy_irq_pin)) {
> > /*
> > * PHY doesn't have an IRQ pin (RTL8201, DP83847, AC101L),
> > * or board does not have it connected.
> > @@ -265,6 +264,7 @@ static void enable_phyirq(struct net_device *dev)
> > return;
> > }
> >
> > + irq_number = lp->board_data.phy_irq_pin;
>
> Does this need to be:
>
> irq_number = gpio_to_irq(lp->board_data.phy_irq_pin);
>
> and the same for the other occurrences? Otherwise this looks like the
> right thing to me.
yes
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Best Regards,
J.
* Netem man page
From: Hagen Paul Pfeifer @ 2011-11-25 13:47 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger
netem man page attached. The patch describes the Correlated Loss Generator
extensions, which are in the mainline kernel but not yet supported by iproute2
(the userland code is missing). I decided to retain the CLG paragraphs.
If that is not wanted, I can re-submit the man page reflecting the current state
(without CLG). After that I will send a patch describing the netem rate and cell
extension.
Stephen?
* [PATCH iproute2] netem: add man-page
From: Hagen Paul Pfeifer @ 2011-11-25 13:47 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger, Hagen Paul Pfeifer
In-Reply-To: <1322228876-10500-1-git-send-email-hagen@jauu.net>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
---
man/man8/tc-netem.8 | 272 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 272 insertions(+), 0 deletions(-)
create mode 100644 man/man8/tc-netem.8
diff --git a/man/man8/tc-netem.8 b/man/man8/tc-netem.8
new file mode 100644
index 0000000..c8ed292
--- /dev/null
+++ b/man/man8/tc-netem.8
@@ -0,0 +1,272 @@
+.TH NETEM 8 "25 November 2011" "iproute2" "Linux"
+.SH NAME
+NetEm \- Network Emulator
+.SH SYNOPSIS
+.B tc qdisc ... dev
+dev
+.B ] add netem [ limit
+packets
+.B ]
+
+.B tc qdisc ... dev
+dev
+.B ] add netem [ logging
+LEVEL ]
+.B ]
+
+.B tc qdisc ... dev
+dev
+.B ] add netem [ delay
+TIME [ JITTER [CORRELATION]]]
+.B ]
+
+.B tc qdisc ... dev
+dev
+.B ] add netem [ distribution
+{uniform|normal|pareto|paretonormal} ]
+.B ]
+
+.B tc qdisc ... dev
+dev
+.B ] add netem [ drop
+PERCENT [CORRELATION]]
+.B ]
+
+.B tc qdisc ... dev
+dev
+.B ] add netem [ loss
+PERCENT [CORRELATION]]
+.B ]
+
+
+.B tc qdisc ... dev
+dev
+.B ] add netem [ query ] [ loss_GI
+ploss [burst_length [density [pisol [good_burst_length]]]]
+.B ]
+
+.B tc qdisc ... dev
+dev
+.B ] add netem [ query ] [ loss_4state
+[p13 [p31 [p32 [p23 [p14]]]]]
+.B ]
+
+.B tc qdisc ... dev
+dev
+.B ] add netem [ loss_gilb_ell
+p [r [1-h [1-k]]]
+.B ]
+
+.B tc qdisc ... dev
+dev
+.B ] add netem [ loss_gilb_ell_4s
+p [r [1-h [1-k]]]
+.B ]
+
+.B tc qdisc ... dev
+dev
+.B ] add netem [ loss_pattern
+FILENAME [REPETITIONS]]
+.B ]
+
+.B tc qdisc ... dev
+dev
+.B ] add netem [ corrupt
+PERCENT [CORRELATION]]
+.B ]
+
+.B tc qdisc ... dev
+dev
+.B ] add netem [ duplicate
+PERCENT [CORRELATION]]
+.B ]
+
+.B tc qdisc ... dev
+dev
+.B ] add netem [ reorder
+PERCENT [CORRELATION] [ gap DISTANCE ]]
+.B ]
+
+.SH DESCRIPTION
+NetEm is an enhancement of the Linux traffic control facilities
+that allows adding delay, packet loss, duplication and other
+characteristics to packets outgoing from a selected network
+interface. NetEm is built using the existing Quality Of Service (QOS)
+and Differentiated Services (diffserv) facilities in the Linux
+kernel.
+
+.SH netem OPTIONS
+netem has the following options:
+
+.B limit
+packets
+
+limits the effect of selected options to the indicated number of next packets.
+
+.B logging
+LEVEL
+
+sets a logging level. Currently it works with loss_GI, loss_4state, loss_bern,
+loss_gilb, loss_gilb_ell, loss_gilb_ell_4s, loss_pattern options. The default
+value is level 0, which means that no data will be logged. When logging level
+is set to 1 the kernel logs will include a line like "netem loss event
+algorithm [type] x RFPLE y" for each loss event. The acronym RFPLE means
+"Received From Previous Loss Event" and it counts the number y of good packets
+received between two loss events while x is the number of all lost packets and
+algorithm refers to the selected loss generation algorithm (4-state, gilb_ell
+or deterministic). The type label applies only to the GI algorithm and can be
+burst or isolated.
+
+.B delay
+TIME [ JITTER [CORRELATION]]
+
+adds the chosen delay to the packets outgoing from the chosen network interface.
+The optional parameters allow introducing a delay variation and a correlation.
+Delay and jitter values are expressed in ms while correlation is a percentage.
+
+.B distribution
+{uniform|normal|pareto|paretonormal}
+
+allows choosing the delay distribution. If not specified, the default
+distribution is normal. Additional parameters allow modelling situations in
+which the network has variable delays depending on traffic flows sharing the
+same path, which causes several delay peaks and a tail.
+
+.B drop
+PERCENT [CORRELATION]
+
+OR
+
+.B loss
+PERCENT [CORRELATION]
+
+adds an independent loss probability to the packets outgoing from the chosen
+network interface. It is also possible to add a correlation, but this option
+is now deprecated due to its observed bad behaviour.
+
+.B query
+
+enables the query mode. It applies to loss_GI and loss_4state options. If it is
+used with the loss_GI option, the transition probabilities which correspond to
+the input intuitive parameters are calculated and printed to screen, without
+copying them into the netem qdisc. Similarly, if it is used with the loss_4state
+option, it calculates and prints the intuitive parameters that correspond to
+the input transition probabilities.
+
+.B loss_GI
+ploss [burst_length [density [pisol [good_burst_length]]]]
+
+adds packet losses according to the GI (General and Intuitive) loss model,
+using the intuitive parameters. The parameter ploss is mandatory while the
+others are optional. The intuitive parameters are converted to the transition
+probabilities of the 4-state Markov model. If the only parameter specified is
+ploss, it corresponds to the Bernoulli model while the optional parameters
+allow extending the model to 2-state (burst_length), 3-state (density), and
+4-state (pisol). If good_burst_length is not specified, the hypothesis of
+statistical independence for the losses within the burst will be used.
+
+.B loss_4state
+p13 [p31 [p32 [p23 [p14]]]]
+
+adds packet losses according to the 4-state Markov model using the transition
+probabilities as input parameters. The parameter p13 is mandatory and if used
+alone corresponds to the Bernoulli model. The optional parameters allow
+extending the model to 2-state (p31), 3-state (p23 and p32) and 4-state (p14).
+State 1 corresponds to good reception, State 4 to independent losses, State 3
+to burst losses and State 2 to good reception within a burst.
+
+.B loss_gilb_ell
+p [r [1-h [1-k]]]
+
+adds packet losses according to the Gilbert-Elliot loss model or its special
+cases (Gilbert, Simple Gilbert and Bernoulli). To use the Bernoulli model, the
+only needed parameter is p while the others will be set to the default
+values r=1-p, 1-h=1 and 1-k=0. The parameters needed for the Simple Gilbert
+model are two (p and r), while three parameters (p, r, 1-h) are needed for the
+Gilbert model and four (p, r, 1-h and 1-k) are needed for the Gilbert-Elliot
+model. As known, p and r are the transition probabilities between the bad and
+the good states, 1-h is the loss probability in the bad state and 1-k is the
+loss probability in the good state.
+
+.B loss_gilb_ell_4s
+p [r [1-h [1-k]]]
+
+adds packet losses according to the Gilbert-Elliot-4s loss model. It is a
+particular version of the GI model whose behaviour is very similar to that of
+the Gilbert-Elliot model. The input parameters are the same as those of the real
+Gilbert-Elliot model or its special cases. The transition probabilities and GI
+parameters that correspond to the Gilbert-Elliot input parameters are calculated
+and, if the query mode is enabled, printed to screen. This option is included to
+study the correspondence between the GI model and the models available in the
+literature; it has no practical use at the moment.
+
+.B loss_pattern
+FILENAME [REPETITIONS]
+
+adds packet losses according to a deterministic loss pattern. It reads from the
+text file FILENAME a sequence of "1" and "0" where "1" are the loss events and
+"0" are the regular transmission of packets. The parameter REPETITIONS is
+optional and is the number of "replicas" of the loss pattern file. By default
+it is 0, which means infinite repetition of the loss
+pattern.
+
+.B corrupt
+PERCENT [CORRELATION]
+
+allows emulating random noise by introducing an error in a random position
+for a chosen percent of packets. It is also possible to add a correlation
+through the proper parameter.
+
+.B duplicate
+PERCENT [CORRELATION]
+
+using this option the chosen percent of packets is duplicated before queueing
+them. It is also possible to add a correlation through the proper parameter.
+
+.B reorder
+PERCENT [CORRELATION] [ gap DISTANCE ]
+
+there are two ways to use this option:
+
+.B reorder
+gap 5 10 ms
+
+in this first example every 5th (10th, 15th) packet is sent immediately while
+other packets are delayed by 10 ms
+
+.B reorder
+25% 50%
+
+in this second example 25% of packets are sent immediately (with correlation of
+50%) while the others are delayed by 10 ms.
+
+.SH LIMITATIONS
+The main known limitations of Netem are related to timer granularity, since
+Linux is not a real-time operating system, to the choice of Pseudo-Random
+Number Generator (PRNG), and to the original loss model.
+
+.SH SOURCES
+.TP
+o
+Hemminger S. , "Network Emulation with NetEm", Open Source Development Lab,
+April 2005
+(http://devresources.linux-foundation.org/shemminger/netem/LCA2005_paper.pdf)
+
+.TP
+o
+Netem page from Linux foundation, (http://www.linuxfoundation.org/en/Net:Netem)
+
+.TP
+o
+Salsano S., Ludovici F., Ordine A., "Definition of a general and intuitive loss
+model for packet networks and its implementation in the Netem module in the
+Linux kernel", available at http://netgroup.uniroma2.it/NetemCLG
+
+.SH SEE ALSO
+.BR tc (8),
+.BR tc-tbf (8)
+
+.SH AUTHOR
+Netem was written by Stephen Hemminger at OSDL and is based on NISTnet. This
+manpage was created by Fabio Ludovici <fabio.ludovici at yahoo dot it> and
+Hagen Paul Pfeifer <hagen@jauu.net>
--
1.7.7
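
[Editor's illustration] The Gilbert-Elliot description in the man page text (p and r as transition probabilities, 1-h and 1-k as per-state loss probabilities) can be sketched as a two-state simulation. This is a sketch under assumptions, not netem's kernel code: it assumes p is the good-to-bad probability and r the bad-to-good probability (which is consistent with the stated Bernoulli defaults r=1-p, 1-h=1, 1-k=0), it draws the state transition before the loss decision, and `ge_model`, `ge_rand`, `ge_next` and `ge_count_losses` are hypothetical names with a toy LCG as random source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Parameters follow the man page naming: p, r, 1-h, 1-k. */
struct ge_model {
	double p;         /* assumed: P(good -> bad) */
	double r;         /* assumed: P(bad -> good) */
	double loss_bad;  /* 1-h: loss probability in the bad state */
	double loss_good; /* 1-k: loss probability in the good state */
	bool bad;         /* current state */
	uint32_t rng;     /* state of the toy generator below */
};

/* Toy LCG, only to keep the sketch self-contained; returns a value in [0, 1). */
static double ge_rand(struct ge_model *m)
{
	m->rng = m->rng * 1664525u + 1013904223u;
	return (m->rng >> 8) / 16777216.0;
}

/* Advance the chain by one packet; return true if that packet is lost. */
static bool ge_next(struct ge_model *m)
{
	if (m->bad) {
		if (ge_rand(m) < m->r)
			m->bad = false;
	} else {
		if (ge_rand(m) < m->p)
			m->bad = true;
	}
	return ge_rand(m) < (m->bad ? m->loss_bad : m->loss_good);
}

/* Run n packets through the model and count the losses. */
static unsigned int ge_count_losses(struct ge_model *m, unsigned int n)
{
	unsigned int lost = 0;

	for (unsigned int i = 0; i < n; i++)
		if (ge_next(m))
			lost++;
	return lost;
}
```

Setting only p and leaving the other fields at the Bernoulli defaults gives independent losses with probability p; raising loss_good above 0 or lowering loss_bad below 1 moves toward the full four-parameter model.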
* Re: [PATCH] at91_ether: use gpio_is_valid for phy IRQ line
From: Nicolas Ferre @ 2011-11-25 13:56 UTC (permalink / raw)
To: Jean-Christophe PLAGNIOL-VILLARD, Jamie Iles
Cc: netdev, sfr, linux-next, linux-kernel, linux-arm-kernel
In-Reply-To: <20111125134708.GM15531@game.jcrosoft.org>
On 11/25/2011 02:47 PM, Jean-Christophe PLAGNIOL-VILLARD :
> On 22:28 Thu 24 Nov , Jamie Iles wrote:
>> Hi Nicolas,
>>
>> On Thu, Nov 24, 2011 at 10:21:14PM +0100, Nicolas Ferre wrote:
>>> Use the generic gpiolib gpio_is_valid() function to test
>>> if the phy IRQ line GPIO is actually provided.
>>>
>>> For non-connected or non-existing phy IRQ lines, -EINVAL
>>> value is used for phy_irq_pin field of struct at91_eth_data.
>>>
>>> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
>>> ---
>>> drivers/net/ethernet/cadence/at91_ether.c | 23 +++++++++++++----------
>>> 1 files changed, 13 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/cadence/at91_ether.c b/drivers/net/ethernet/cadence/at91_ether.c
>>> index 56624d3..a1c4143 100644
>>> --- a/drivers/net/ethernet/cadence/at91_ether.c
>>> +++ b/drivers/net/ethernet/cadence/at91_ether.c
>>> @@ -255,8 +255,7 @@ static void enable_phyirq(struct net_device *dev)
>>> unsigned int dsintr, irq_number;
>>> int status;
>>>
>>> - irq_number = lp->board_data.phy_irq_pin;
>>> - if (!irq_number) {
>>> + if (!gpio_is_valid(lp->board_data.phy_irq_pin)) {
>>> /*
>>> * PHY doesn't have an IRQ pin (RTL8201, DP83847, AC101L),
>>> * or board does not have it connected.
>>> @@ -265,6 +264,7 @@ static void enable_phyirq(struct net_device *dev)
>>> return;
>>> }
>>>
>>> + irq_number = lp->board_data.phy_irq_pin;
>>
>> Does this need to be:
>>
>> irq_number = gpio_to_irq(lp->board_data.phy_irq_pin);
>>
>> and the same for the other occurrences? Otherwise this looks like the
>> right thing to me.
> yes
True, but I preferred to separate the changes. As the existing code did not
do it this way, I kept to that. I will post an additional patch on top
of this one to add the gpio_to_irq() call.
> Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Best regards,
--
Nicolas Ferre
* TCP fast retransmit
From: Esztermann, Ansgar @ 2011-11-25 13:33 UTC (permalink / raw)
To: netdev@vger.kernel.org
[originally posted to lkml]
Hello list,
Is there some documentation available on TCP fast retransmit? There seem to be quite a lot of descriptions -- from informal to scholarly papers -- on the various algorithms available to calculate the proper size of the congestion window, but I have been unable so far to find out *when* a fast retransmit is triggered. RFC 2581 states the third dupACK "should" do it, and this seems to be quoted fairly often. However, I can easily produce connections that fail to perform fast retransmit even after 5 dupACKs. Some people mention Linux uses a different (presumably more sophisticated) algorithm to trigger fast retransmits, but no-one seems to elaborate.
Thanks,
A.
--
Ansgar Esztermann
DV-Systemadministration
Max-Planck-Institut für biophysikalische Chemie, Abteilung 105
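
[Editor's illustration] The RFC 2581 rule mentioned in the question (retransmit on the third duplicate ACK) can be sketched as a counter. This is a simplified sketch and not Linux's trigger logic, which the question itself says differs: it treats any repeated ACK number as a duplicate and ignores the other conditions a real stack checks. `dupack_state` and `on_ack` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

struct dupack_state {
	uint32_t last_ack;    /* highest ACK number seen so far */
	unsigned int dupacks; /* consecutive duplicates of last_ack */
};

/*
 * Feed one incoming ACK number. Returns true exactly when the third
 * consecutive duplicate arrives, the point at which the RFC rule
 * would trigger a fast retransmit.
 */
static bool on_ack(struct dupack_state *s, uint32_t ack)
{
	if (ack == s->last_ack) {
		s->dupacks++;
		return s->dupacks == 3;
	}
	s->last_ack = ack;
	s->dupacks = 0;
	return false;
}
```

Any ACK that advances the number resets the counter, so interleaved new ACKs prevent the threshold from being reached in this simplified model.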
* RE: [v4 PATCH 2/2] NETFILTER userspace part for target HMARK
From: Hans Schillström @ 2011-11-25 14:04 UTC (permalink / raw)
To: Jan Engelhardt, Hans Schillstrom
Cc: kaber@trash.net, pablo@netfilter.org,
netfilter-devel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <alpine.LNX.2.01.1111251418440.6279@frira.zrqbmnf.qr>
>On Friday 2011-11-25 10:36, Hans Schillstrom wrote:
>
>>+Parameters:
>>+For all masks default is all "1:s", to disable a field use mask 0
>>+For IPv6 it's just the last 32 bits that is included in the hash
>
>Why limit IPv6 to 32?
Performance, and the gain of adding another 192 bits to jhash isn't much.
However, there are some cases where it hurts, e.g. when you can't mask off a subnet.
I'm not sure if it's a problem or not...
>>diff --git a/include/linux/netfilter/xt_hmark.h b/include/linux/netfilter/xt_hmark.h
[snip]
>
>This file does not match the kernel-side xt_hmark.h.
>Definitions only used within the userspace side should go into libxt_hmark.c
>anyhow.
Oops, I just forgot to copy the file :-)
>>+union ports {
>>+ struct {
>>+ __u16 src;
>>+ __u16 dst;
>>+ } p16;
>>+ __u32 v32;
>>+};
>
>Bad name "ports", big clash potential.
Yes, I'll change that..
Thanks
Hans
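
[Editor's illustration] The trade-off discussed above (only the last 32 bits of an IPv6 address feed the hash) can be sketched in user-space C. This is an illustration of the description in the thread, not the HMARK code: `ipv6_low32` is a hypothetical name and the mask handling is an assumption about how an AND mask would be applied.

```c
#include <stdint.h>

/*
 * Take the low-order 32 bits (bytes 12..15) of a 16-byte IPv6 address
 * in network byte order and apply an AND mask. Bytes 0..11, which
 * include the subnet prefix, never reach the result.
 */
static uint32_t ipv6_low32(const uint8_t addr[16], uint32_t mask)
{
	uint32_t v = ((uint32_t)addr[12] << 24) | ((uint32_t)addr[13] << 16) |
		     ((uint32_t)addr[14] << 8) | (uint32_t)addr[15];

	return v & mask;
}
```

Two hosts such as 2001:db8:0:1::1 and 2001:db8:0:2::1 produce the same value here, which shows why masking by subnet is not possible when only these bits are used.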
* Re: [PATCH net] net: Revert ARCNET and PHYLIB to tristate options
From: Ben Hutchings @ 2011-11-25 14:07 UTC (permalink / raw)
To: David Miller; +Cc: jeffrey.t.kirsher, netdev, debian-kernel
In-Reply-To: <20111125.013102.841551383223873520.davem@davemloft.net>
On Fri, 2011-11-25 at 01:31 -0500, David Miller wrote:
> From: Ben Hutchings <ben@decadent.org.uk>
> Date: Thu, 24 Nov 2011 07:23:30 +0000
>
> > Commit 88491d8103498a6166f70d5999902fec70924314 ("drivers/net: Kconfig
> > & Makefile cleanup") changed the type of these options to bool, but
> > they select code that could (and still can) be built as modules.
> >
> > Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> > ---
> > I consider the inability to build arcnet.o and libphy.o as modules to be
> > a regression in 3.2 for general-purpose distribution kernels.
> > Therefore, please apply this to the net tree.
>
> I challenge you to get PHYLIB set to 'm' in a configuration such as
> "allmodconfig" which is clearly in line with the kind of configuration
> which "distribution kernels" use.
[...]
Well, I can't think why it would be built in, since PHY modules can be
auto-loaded now.
$ grep PHYLIB /boot/config-2.6.32-5-amd64
CONFIG_PHYLIB=m
$ grep PHYLIB /boot/config-3.1.0-1-amd64
CONFIG_PHYLIB=m
Ben.
--
Ben Hutchings
Teamwork is essential - it allows you to blame someone else.
^ permalink raw reply
* [PATCH 0/6] NFS: create DNS resolver cache per network namespace
From: Stanislav Kinsbursky @ 2011-11-25 14:12 UTC (permalink / raw)
To: Trond.Myklebust
Cc: linux-nfs, xemul, neilb, netdev, linux-kernel, jbottomley,
bfields, davem, devel
This patch set was created in context of clone of git
branch: git://git.linux-nfs.org/projects/trondmy/nfs-2.6.git.
tag: v3.1
This patch set depends on previous patch sets titled:
1) "SUNRPC: initial part of making pipefs work in net ns"
2) "SUNPRC: cleanup PipeFS for network-namespace-aware users"
Actually, this patch set consists of two tightly connected parts:
1) DNS resolver cache per network namespace implementation itself
2) DNS resolver cache dentries creation by PipeFS network namespace aware
routines.
Thus this patch set is the second part of "PipeFS per network namespace"
story.
The following series consists of:
---
Stanislav Kinsbursky (6):
SUNRPC: split cache creation and PipeFS registration
NFS: split cache creation and PipeFS registration
NFS: handle NFS caches dentries by network namespace aware routines
NFS: DNS resolver cache per network namespace context introduced
NFS: DNS resolver PipeFS notifier introduced
NFS: remove RPC PipeFS mount point references from NFS cache routines
fs/nfs/cache_lib.c | 61 ++++++++++++++------
fs/nfs/cache_lib.h | 10 +++
fs/nfs/dns_resolve.c | 125 ++++++++++++++++++++++++++++++++++--------
fs/nfs/dns_resolve.h | 14 ++++-
fs/nfs/inode.c | 33 ++++++++++-
fs/nfs/netns.h | 13 ++++
fs/nfs/nfs4namespace.c | 8 ++-
include/linux/sunrpc/cache.h | 2 +
net/sunrpc/cache.c | 12 ++--
9 files changed, 218 insertions(+), 60 deletions(-)
create mode 100644 fs/nfs/netns.h
--
Signature
^ permalink raw reply
* [PATCH 1/6] SUNRPC: split cache creation and PipeFS registration
From: Stanislav Kinsbursky @ 2011-11-25 14:12 UTC (permalink / raw)
To: Trond.Myklebust
Cc: linux-nfs, xemul, neilb, netdev, linux-kernel, jbottomley,
bfields, davem, devel
In-Reply-To: <20111125130557.6271.95071.stgit@localhost6.localdomain6>
This precursor patch splits SUNRPC cache creation and PipeFS registration.
It's required for the later split of NFS DNS resolver cache creation per network
namespace context and PipeFS registration/unregistration on MOUNT/UMOUNT
events.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
fs/nfs/cache_lib.c | 3 +++
include/linux/sunrpc/cache.h | 2 ++
net/sunrpc/cache.c | 12 +++++-------
3 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/fs/nfs/cache_lib.c b/fs/nfs/cache_lib.c
index c98b439..d62a895 100644
--- a/fs/nfs/cache_lib.c
+++ b/fs/nfs/cache_lib.c
@@ -120,6 +120,7 @@ int nfs_cache_register(struct cache_detail *cd)
mnt = rpc_get_mount();
if (IS_ERR(mnt))
return PTR_ERR(mnt);
+ sunrpc_init_cache_detail(cd);
ret = vfs_path_lookup(mnt->mnt_root, mnt, "/cache", 0, &path);
if (ret)
goto err;
@@ -128,6 +129,7 @@ int nfs_cache_register(struct cache_detail *cd)
if (!ret)
return ret;
err:
+ sunrpc_destroy_cache_detail(cd);
rpc_put_mount();
return ret;
}
@@ -135,6 +137,7 @@ err:
void nfs_cache_unregister(struct cache_detail *cd)
{
sunrpc_cache_unregister_pipefs(cd);
+ sunrpc_destroy_cache_detail(cd);
rpc_put_mount();
}
diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 5efd8ce..57d9fa7 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -202,6 +202,8 @@ extern int cache_register_net(struct cache_detail *cd, struct net *net);
extern void cache_unregister(struct cache_detail *cd);
extern void cache_unregister_net(struct cache_detail *cd, struct net *net);
+extern void sunrpc_init_cache_detail(struct cache_detail *cd);
+extern void sunrpc_destroy_cache_detail(struct cache_detail *cd);
extern int sunrpc_cache_register_pipefs(struct dentry *parent, const char *,
mode_t, struct cache_detail *);
extern void sunrpc_cache_unregister_pipefs(struct cache_detail *);
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 72ad836..320e549 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -344,7 +344,7 @@ static int current_index;
static void do_cache_clean(struct work_struct *work);
static struct delayed_work cache_cleaner;
-static void sunrpc_init_cache_detail(struct cache_detail *cd)
+void sunrpc_init_cache_detail(struct cache_detail *cd)
{
rwlock_init(&cd->hash_lock);
INIT_LIST_HEAD(&cd->queue);
@@ -360,8 +360,9 @@ static void sunrpc_init_cache_detail(struct cache_detail *cd)
/* start the cleaning process */
schedule_delayed_work(&cache_cleaner, 0);
}
+EXPORT_SYMBOL_GPL(sunrpc_init_cache_detail);
-static void sunrpc_destroy_cache_detail(struct cache_detail *cd)
+void sunrpc_destroy_cache_detail(struct cache_detail *cd)
{
cache_purge(cd);
spin_lock(&cache_list_lock);
@@ -384,6 +385,7 @@ static void sunrpc_destroy_cache_detail(struct cache_detail *cd)
out:
printk(KERN_ERR "nfsd: failed to unregister %s cache\n", cd->name);
}
+EXPORT_SYMBOL_GPL(sunrpc_destroy_cache_detail);
/* clean cache tries to find something to clean
* and cleans it.
@@ -1785,17 +1787,14 @@ int sunrpc_cache_register_pipefs(struct dentry *parent,
struct dentry *dir;
int ret = 0;
- sunrpc_init_cache_detail(cd);
q.name = name;
q.len = strlen(name);
q.hash = full_name_hash(q.name, q.len);
dir = rpc_create_cache_dir(parent, &q, umode, cd);
if (!IS_ERR(dir))
cd->u.pipefs.dir = dir;
- else {
- sunrpc_destroy_cache_detail(cd);
+ else
ret = PTR_ERR(dir);
- }
return ret;
}
EXPORT_SYMBOL_GPL(sunrpc_cache_register_pipefs);
@@ -1804,7 +1803,6 @@ void sunrpc_cache_unregister_pipefs(struct cache_detail *cd)
{
rpc_remove_cache_dir(cd->u.pipefs.dir);
cd->u.pipefs.dir = NULL;
- sunrpc_destroy_cache_detail(cd);
}
EXPORT_SYMBOL_GPL(sunrpc_cache_unregister_pipefs);
^ permalink raw reply related
* [PATCH 2/6] NFS: split cache creation and PipeFS registration
From: Stanislav Kinsbursky @ 2011-11-25 14:12 UTC (permalink / raw)
To: Trond.Myklebust
Cc: linux-nfs, xemul, neilb, netdev, linux-kernel, jbottomley,
bfields, davem, devel
In-Reply-To: <20111125130557.6271.95071.stgit@localhost6.localdomain6>
This precursor patch splits NFS cache creation and PipeFS registration.
It's required for the later split of NFS DNS resolver cache creation per network
namespace context and PipeFS registration/unregistration on MOUNT/UMOUNT
events.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
fs/nfs/cache_lib.c | 11 +++++++++--
fs/nfs/cache_lib.h | 2 ++
fs/nfs/dns_resolve.c | 11 ++++++++++-
3 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/cache_lib.c b/fs/nfs/cache_lib.c
index d62a895..9d79a2e 100644
--- a/fs/nfs/cache_lib.c
+++ b/fs/nfs/cache_lib.c
@@ -120,7 +120,6 @@ int nfs_cache_register(struct cache_detail *cd)
mnt = rpc_get_mount();
if (IS_ERR(mnt))
return PTR_ERR(mnt);
- sunrpc_init_cache_detail(cd);
ret = vfs_path_lookup(mnt->mnt_root, mnt, "/cache", 0, &path);
if (ret)
goto err;
@@ -129,7 +128,6 @@ int nfs_cache_register(struct cache_detail *cd)
if (!ret)
return ret;
err:
- sunrpc_destroy_cache_detail(cd);
rpc_put_mount();
return ret;
}
@@ -141,3 +139,12 @@ void nfs_cache_unregister(struct cache_detail *cd)
rpc_put_mount();
}
+void nfs_cache_init(struct cache_detail *cd)
+{
+ sunrpc_init_cache_detail(cd);
+}
+
+void nfs_cache_destroy(struct cache_detail *cd)
+{
+ sunrpc_destroy_cache_detail(cd);
+}
diff --git a/fs/nfs/cache_lib.h b/fs/nfs/cache_lib.h
index 7cf6caf..815dd66 100644
--- a/fs/nfs/cache_lib.h
+++ b/fs/nfs/cache_lib.h
@@ -23,5 +23,7 @@ extern struct nfs_cache_defer_req *nfs_cache_defer_req_alloc(void);
extern void nfs_cache_defer_req_put(struct nfs_cache_defer_req *dreq);
extern int nfs_cache_wait_for_upcall(struct nfs_cache_defer_req *dreq);
+extern void nfs_cache_init(struct cache_detail *cd);
+extern void nfs_cache_destroy(struct cache_detail *cd);
extern int nfs_cache_register(struct cache_detail *cd);
extern void nfs_cache_unregister(struct cache_detail *cd);
diff --git a/fs/nfs/dns_resolve.c b/fs/nfs/dns_resolve.c
index a6e711a..619dea6 100644
--- a/fs/nfs/dns_resolve.c
+++ b/fs/nfs/dns_resolve.c
@@ -361,12 +361,21 @@ ssize_t nfs_dns_resolve_name(char *name, size_t namelen,
int nfs_dns_resolver_init(void)
{
- return nfs_cache_register(&nfs_dns_resolve);
+ int err;
+
+ nfs_cache_init(&nfs_dns_resolve);
+ err = nfs_cache_register(&nfs_dns_resolve);
+ if (err) {
+ nfs_cache_destroy(&nfs_dns_resolve);
+ return err;
+ }
+ return 0;
}
void nfs_dns_resolver_destroy(void)
{
nfs_cache_unregister(&nfs_dns_resolve);
+ nfs_cache_destroy(&nfs_dns_resolve);
}
#endif
^ permalink raw reply related
* [PATCH 3/6] NFS: handle NFS caches dentries by network namespace aware routines
From: Stanislav Kinsbursky @ 2011-11-25 14:12 UTC (permalink / raw)
To: Trond.Myklebust
Cc: linux-nfs, xemul, neilb, netdev, linux-kernel, jbottomley,
bfields, davem, devel
In-Reply-To: <20111125130557.6271.95071.stgit@localhost6.localdomain6>
This patch makes the PipeFS dentries of NFS caches be allocated and destroyed in
network namespace context by the PipeFS network-namespace-aware routines.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
fs/nfs/cache_lib.c | 44 +++++++++++++++++++++++++++++++++++---------
fs/nfs/cache_lib.h | 4 ++--
fs/nfs/dns_resolve.c | 4 ++--
3 files changed, 39 insertions(+), 13 deletions(-)
diff --git a/fs/nfs/cache_lib.c b/fs/nfs/cache_lib.c
index 9d79a2e..5dd017b 100644
--- a/fs/nfs/cache_lib.c
+++ b/fs/nfs/cache_lib.c
@@ -13,6 +13,7 @@
#include <linux/slab.h>
#include <linux/sunrpc/cache.h>
#include <linux/sunrpc/rpc_pipe_fs.h>
+#include <net/net_namespace.h>
#include "cache_lib.h"
@@ -111,20 +112,34 @@ int nfs_cache_wait_for_upcall(struct nfs_cache_defer_req *dreq)
return 0;
}
-int nfs_cache_register(struct cache_detail *cd)
+static int nfs_cache_register_sb(struct super_block *sb, struct cache_detail *cd)
+{
+ int ret;
+ struct dentry *dir;
+
+ dir = rpc_d_lookup_sb(sb, "cache");
+ BUG_ON(dir == NULL);
+ ret = sunrpc_cache_register_pipefs(dir, cd->name, 0600, cd);
+ dput(dir);
+ return ret;
+}
+
+int nfs_cache_register_net(struct net *net, struct cache_detail *cd)
{
struct vfsmount *mnt;
- struct path path;
+ struct super_block *pipefs_sb;
int ret;
mnt = rpc_get_mount();
if (IS_ERR(mnt))
return PTR_ERR(mnt);
- ret = vfs_path_lookup(mnt->mnt_root, mnt, "/cache", 0, &path);
- if (ret)
+ pipefs_sb = rpc_get_sb_net(net);
+ if (!pipefs_sb) {
+ ret = -ENOENT;
goto err;
- ret = sunrpc_cache_register_pipefs(path.dentry, cd->name, 0600, cd);
- path_put(&path);
+ }
+ ret = nfs_cache_register_sb(pipefs_sb, cd);
+ rpc_put_sb_net(net);
if (!ret)
return ret;
err:
@@ -132,10 +147,21 @@ err:
return ret;
}
-void nfs_cache_unregister(struct cache_detail *cd)
+static void nfs_cache_unregister_sb(struct super_block *sb, struct cache_detail *cd)
{
- sunrpc_cache_unregister_pipefs(cd);
- sunrpc_destroy_cache_detail(cd);
+ if (cd->u.pipefs.dir)
+ sunrpc_cache_unregister_pipefs(cd);
+}
+
+void nfs_cache_unregister_net(struct net *net, struct cache_detail *cd)
+{
+ struct super_block *pipefs_sb;
+
+ pipefs_sb = rpc_get_sb_net(net);
+ if (pipefs_sb) {
+ nfs_cache_unregister_sb(pipefs_sb, cd);
+ rpc_put_sb_net(net);
+ }
rpc_put_mount();
}
diff --git a/fs/nfs/cache_lib.h b/fs/nfs/cache_lib.h
index 815dd66..e0a6cc4 100644
--- a/fs/nfs/cache_lib.h
+++ b/fs/nfs/cache_lib.h
@@ -25,5 +25,5 @@ extern int nfs_cache_wait_for_upcall(struct nfs_cache_defer_req *dreq);
extern void nfs_cache_init(struct cache_detail *cd);
extern void nfs_cache_destroy(struct cache_detail *cd);
-extern int nfs_cache_register(struct cache_detail *cd);
-extern void nfs_cache_unregister(struct cache_detail *cd);
+extern int nfs_cache_register_net(struct net *net, struct cache_detail *cd);
+extern void nfs_cache_unregister_net(struct net *net, struct cache_detail *cd);
diff --git a/fs/nfs/dns_resolve.c b/fs/nfs/dns_resolve.c
index 619dea6..3cbf4b8 100644
--- a/fs/nfs/dns_resolve.c
+++ b/fs/nfs/dns_resolve.c
@@ -364,7 +364,7 @@ int nfs_dns_resolver_init(void)
int err;
nfs_cache_init(&nfs_dns_resolve);
- err = nfs_cache_register(&nfs_dns_resolve);
+ err = nfs_cache_register_net(&init_net, &nfs_dns_resolve);
if (err) {
nfs_cache_destroy(&nfs_dns_resolve);
return err;
@@ -374,7 +374,7 @@ int nfs_dns_resolver_init(void)
void nfs_dns_resolver_destroy(void)
{
- nfs_cache_unregister(&nfs_dns_resolve);
+ nfs_cache_unregister_net(&init_net, &nfs_dns_resolve);
nfs_cache_destroy(&nfs_dns_resolve);
}
^ permalink raw reply related
* [PATCH 4/6] NFS: DNS resolver cache per network namespace context introduced
From: Stanislav Kinsbursky @ 2011-11-25 14:13 UTC (permalink / raw)
To: Trond.Myklebust
Cc: linux-nfs, xemul, neilb, netdev, linux-kernel, jbottomley,
bfields, davem, devel
In-Reply-To: <20111125130557.6271.95071.stgit@localhost6.localdomain6>
This patch implements DNS resolver cache creation and registration for each
live network namespace context.
This is done by registering NFS per-net operations, which are responsible for
DNS cache allocation/registration and unregistration/destruction, instead of
initializing and destroying the static "nfs_dns_resolve" cache detail (which is
removed).
A pointer to the per-network DNS resolver cache is stored in the new per-net
"nfs_net" structure.
This patch also changes the nfs_dns_resolve_name() function prototype (and its
callers) by adding a network pointer parameter, which is used to get the proper
DNS resolver cache pointer for the do_cache_lookup_wait() call.
Note: the empty nfs_dns_resolver_init() and nfs_dns_resolver_destroy() functions
will be used in the next patch in the series.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
fs/nfs/dns_resolve.c | 96 ++++++++++++++++++++++++++++++++----------------
fs/nfs/dns_resolve.h | 14 ++++++-
fs/nfs/inode.c | 33 +++++++++++++++--
fs/nfs/netns.h | 13 +++++++
fs/nfs/nfs4namespace.c | 8 +++-
5 files changed, 123 insertions(+), 41 deletions(-)
create mode 100644 fs/nfs/netns.h
diff --git a/fs/nfs/dns_resolve.c b/fs/nfs/dns_resolve.c
index 3cbf4b8..9aea78a 100644
--- a/fs/nfs/dns_resolve.c
+++ b/fs/nfs/dns_resolve.c
@@ -11,7 +11,7 @@
#include <linux/sunrpc/clnt.h>
#include <linux/dns_resolver.h>
-ssize_t nfs_dns_resolve_name(char *name, size_t namelen,
+ssize_t nfs_dns_resolve_name(struct net *net, char *name, size_t namelen,
struct sockaddr *sa, size_t salen)
{
ssize_t ret;
@@ -43,12 +43,11 @@ ssize_t nfs_dns_resolve_name(char *name, size_t namelen,
#include "dns_resolve.h"
#include "cache_lib.h"
+#include "netns.h"
#define NFS_DNS_HASHBITS 4
#define NFS_DNS_HASHTBL_SIZE (1 << NFS_DNS_HASHBITS)
-static struct cache_head *nfs_dns_table[NFS_DNS_HASHTBL_SIZE];
-
struct nfs_dns_ent {
struct cache_head h;
@@ -259,21 +258,6 @@ out:
return ret;
}
-static struct cache_detail nfs_dns_resolve = {
- .owner = THIS_MODULE,
- .hash_size = NFS_DNS_HASHTBL_SIZE,
- .hash_table = nfs_dns_table,
- .name = "dns_resolve",
- .cache_put = nfs_dns_ent_put,
- .cache_upcall = nfs_dns_upcall,
- .cache_parse = nfs_dns_parse,
- .cache_show = nfs_dns_show,
- .match = nfs_dns_match,
- .init = nfs_dns_ent_init,
- .update = nfs_dns_ent_update,
- .alloc = nfs_dns_ent_alloc,
-};
-
static int do_cache_lookup(struct cache_detail *cd,
struct nfs_dns_ent *key,
struct nfs_dns_ent **item,
@@ -336,8 +320,8 @@ out:
return ret;
}
-ssize_t nfs_dns_resolve_name(char *name, size_t namelen,
- struct sockaddr *sa, size_t salen)
+ssize_t nfs_dns_resolve_name(struct net *net, char *name,
+ size_t namelen, struct sockaddr *sa, size_t salen)
{
struct nfs_dns_ent key = {
.hostname = name,
@@ -345,37 +329,83 @@ ssize_t nfs_dns_resolve_name(char *name, size_t namelen,
};
struct nfs_dns_ent *item = NULL;
ssize_t ret;
+ struct nfs_net *nn = net_generic(net, nfs_net_id);
- ret = do_cache_lookup_wait(&nfs_dns_resolve, &key, &item);
+ ret = do_cache_lookup_wait(nn->nfs_dns_resolve, &key, &item);
if (ret == 0) {
if (salen >= item->addrlen) {
memcpy(sa, &item->addr, item->addrlen);
ret = item->addrlen;
} else
ret = -EOVERFLOW;
- cache_put(&item->h, &nfs_dns_resolve);
+ cache_put(&item->h, nn->nfs_dns_resolve);
} else if (ret == -ENOENT)
ret = -ESRCH;
return ret;
}
-int nfs_dns_resolver_init(void)
+int nfs_dns_resolver_cache_init(struct net *net)
{
- int err;
+ int err = -ENOMEM;
+ struct nfs_net *nn = net_generic(net, nfs_net_id);
+ struct cache_detail *cd;
+ struct cache_head **tbl;
+
+ cd = kzalloc(sizeof(struct cache_detail), GFP_KERNEL);
+ if (cd == NULL)
+ goto err_cd;
+
+ tbl = kzalloc(NFS_DNS_HASHTBL_SIZE * sizeof(struct cache_head *),
+ GFP_KERNEL);
+ if (tbl == NULL)
+ goto err_tbl;
+
+ cd->owner = THIS_MODULE;
+ cd->hash_size = NFS_DNS_HASHTBL_SIZE;
+ cd->hash_table = tbl;
+ cd->name = "dns_resolve";
+ cd->cache_put = nfs_dns_ent_put;
+ cd->cache_upcall = nfs_dns_upcall;
+ cd->cache_parse = nfs_dns_parse;
+ cd->cache_show = nfs_dns_show;
+ cd->match = nfs_dns_match;
+ cd->init = nfs_dns_ent_init;
+ cd->update = nfs_dns_ent_update;
+ cd->alloc = nfs_dns_ent_alloc;
+
+ nfs_cache_init(cd);
+ err = nfs_cache_register_net(net, cd);
+ if (err)
+ goto err_reg;
+ nn->nfs_dns_resolve = cd;
+ return 0;
- nfs_cache_init(&nfs_dns_resolve);
- err = nfs_cache_register_net(&init_net, &nfs_dns_resolve);
- if (err) {
- nfs_cache_destroy(&nfs_dns_resolve);
- return err;
- }
+err_reg:
+ nfs_cache_destroy(cd);
+ kfree(cd->hash_table);
+err_tbl:
+ kfree(cd);
+err_cd:
+ return err;
+}
+
+void nfs_dns_resolver_cache_destroy(struct net *net)
+{
+ struct nfs_net *nn = net_generic(net, nfs_net_id);
+ struct cache_detail *cd = nn->nfs_dns_resolve;
+
+ nfs_cache_unregister_net(net, cd);
+ nfs_cache_destroy(cd);
+ kfree(cd->hash_table);
+ kfree(cd);
+}
+
+int nfs_dns_resolver_init(void)
+{
return 0;
}
void nfs_dns_resolver_destroy(void)
{
- nfs_cache_unregister_net(&init_net, &nfs_dns_resolve);
- nfs_cache_destroy(&nfs_dns_resolve);
}
-
#endif
diff --git a/fs/nfs/dns_resolve.h b/fs/nfs/dns_resolve.h
index 199bb55..2e4f596 100644
--- a/fs/nfs/dns_resolve.h
+++ b/fs/nfs/dns_resolve.h
@@ -15,12 +15,22 @@ static inline int nfs_dns_resolver_init(void)
static inline void nfs_dns_resolver_destroy(void)
{}
+
+static inline int nfs_dns_resolver_cache_init(struct net *net)
+{
+ return 0;
+}
+
+static inline void nfs_dns_resolver_cache_destroy(struct net *net)
+{}
#else
extern int nfs_dns_resolver_init(void);
extern void nfs_dns_resolver_destroy(void);
+extern int nfs_dns_resolver_cache_init(struct net *net);
+extern void nfs_dns_resolver_cache_destroy(struct net *net);
#endif
-extern ssize_t nfs_dns_resolve_name(char *name, size_t namelen,
- struct sockaddr *sa, size_t salen);
+extern ssize_t nfs_dns_resolve_name(struct net *net, char *name,
+ size_t namelen, struct sockaddr *sa, size_t salen);
#endif
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index fe12037..84d8506 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -50,6 +50,7 @@
#include "fscache.h"
#include "dns_resolve.h"
#include "pnfs.h"
+#include "netns.h"
#define NFSDBG_FACILITY NFSDBG_VFS
@@ -1550,6 +1551,25 @@ static void nfsiod_stop(void)
destroy_workqueue(wq);
}
+int nfs_net_id;
+
+static int nfs_net_init(struct net *net)
+{
+ return nfs_dns_resolver_cache_init(net);
+}
+
+static void nfs_net_exit(struct net *net)
+{
+ nfs_dns_resolver_cache_destroy(net);
+}
+
+static struct pernet_operations nfs_net_ops = {
+ .init = nfs_net_init,
+ .exit = nfs_net_exit,
+ .id = &nfs_net_id,
+ .size = sizeof(struct nfs_net),
+};
+
/*
* Initialize NFS
*/
@@ -1559,10 +1579,14 @@ static int __init init_nfs_fs(void)
err = nfs_idmap_init();
if (err < 0)
- goto out9;
+ goto out10;
err = nfs_dns_resolver_init();
if (err < 0)
+ goto out9;
+
+ err = register_pernet_subsys(&nfs_net_ops);
+ if (err < 0)
goto out8;
err = nfs_fscache_register();
@@ -1623,10 +1647,12 @@ out5:
out6:
nfs_fscache_unregister();
out7:
- nfs_dns_resolver_destroy();
+ unregister_pernet_subsys(&nfs_net_ops);
out8:
- nfs_idmap_quit();
+ nfs_dns_resolver_destroy();
out9:
+ nfs_idmap_quit();
+out10:
return err;
}
@@ -1638,6 +1664,7 @@ static void __exit exit_nfs_fs(void)
nfs_destroy_inodecache();
nfs_destroy_nfspagecache();
nfs_fscache_unregister();
+ unregister_pernet_subsys(&nfs_net_ops);
nfs_dns_resolver_destroy();
nfs_idmap_quit();
#ifdef CONFIG_PROC_FS
diff --git a/fs/nfs/netns.h b/fs/nfs/netns.h
new file mode 100644
index 0000000..8c1f130
--- /dev/null
+++ b/fs/nfs/netns.h
@@ -0,0 +1,13 @@
+#ifndef __NFS_NETNS_H__
+#define __NFS_NETNS_H__
+
+#include <net/net_namespace.h>
+#include <net/netns/generic.h>
+
+struct nfs_net {
+ struct cache_detail *nfs_dns_resolve;
+};
+
+extern int nfs_net_id;
+
+#endif
diff --git a/fs/nfs/nfs4namespace.c b/fs/nfs/nfs4namespace.c
index bb80c49..919a369 100644
--- a/fs/nfs/nfs4namespace.c
+++ b/fs/nfs/nfs4namespace.c
@@ -94,13 +94,14 @@ static int nfs4_validate_fspath(struct dentry *dentry,
}
static size_t nfs_parse_server_name(char *string, size_t len,
- struct sockaddr *sa, size_t salen)
+ struct sockaddr *sa, size_t salen, struct nfs_server *server)
{
ssize_t ret;
ret = rpc_pton(string, len, sa, salen);
if (ret == 0) {
- ret = nfs_dns_resolve_name(string, len, sa, salen);
+ ret = nfs_dns_resolve_name(server->client->cl_xprt->xprt_net,
+ string, len, sa, salen);
if (ret < 0)
ret = 0;
}
@@ -137,7 +138,8 @@ static struct vfsmount *try_location(struct nfs_clone_mount *mountdata,
continue;
mountdata->addrlen = nfs_parse_server_name(buf->data, buf->len,
- mountdata->addr, addr_bufsize);
+ mountdata->addr, addr_bufsize,
+ NFS_SB(mountdata->sb));
if (mountdata->addrlen == 0)
continue;
^ permalink raw reply related
* [PATCH 5/6] NFS: DNS resolver PipeFS notifier introduced
From: Stanislav Kinsbursky @ 2011-11-25 14:13 UTC (permalink / raw)
To: Trond.Myklebust
Cc: linux-nfs, xemul, neilb, netdev, linux-kernel, jbottomley,
bfields, davem, devel
In-Reply-To: <20111125130557.6271.95071.stgit@localhost6.localdomain6>
This patch subscribes DNS resolver caches to RPC pipefs notifications. The
notifier is registered on NFS module load. The notifier callback is responsible
for creation/destruction of the PipeFS DNS resolver cache directory.
Note that no locking is required in the notifier callback, because the PipeFS
superblock pointer is passed as an argument from its creation or destruction
routine, so we can be sure it is valid.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
fs/nfs/cache_lib.c | 4 ++--
fs/nfs/cache_lib.h | 4 ++++
fs/nfs/dns_resolve.c | 38 +++++++++++++++++++++++++++++++++++++-
3 files changed, 43 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/cache_lib.c b/fs/nfs/cache_lib.c
index 5dd017b..5905a31 100644
--- a/fs/nfs/cache_lib.c
+++ b/fs/nfs/cache_lib.c
@@ -112,7 +112,7 @@ int nfs_cache_wait_for_upcall(struct nfs_cache_defer_req *dreq)
return 0;
}
-static int nfs_cache_register_sb(struct super_block *sb, struct cache_detail *cd)
+int nfs_cache_register_sb(struct super_block *sb, struct cache_detail *cd)
{
int ret;
struct dentry *dir;
@@ -147,7 +147,7 @@ err:
return ret;
}
-static void nfs_cache_unregister_sb(struct super_block *sb, struct cache_detail *cd)
+void nfs_cache_unregister_sb(struct super_block *sb, struct cache_detail *cd)
{
if (cd->u.pipefs.dir)
sunrpc_cache_unregister_pipefs(cd);
diff --git a/fs/nfs/cache_lib.h b/fs/nfs/cache_lib.h
index e0a6cc4..317db95 100644
--- a/fs/nfs/cache_lib.h
+++ b/fs/nfs/cache_lib.h
@@ -27,3 +27,7 @@ extern void nfs_cache_init(struct cache_detail *cd);
extern void nfs_cache_destroy(struct cache_detail *cd);
extern int nfs_cache_register_net(struct net *net, struct cache_detail *cd);
extern void nfs_cache_unregister_net(struct net *net, struct cache_detail *cd);
+extern int nfs_cache_register_sb(struct super_block *sb,
+ struct cache_detail *cd);
+extern void nfs_cache_unregister_sb(struct super_block *sb,
+ struct cache_detail *cd);
diff --git a/fs/nfs/dns_resolve.c b/fs/nfs/dns_resolve.c
index 9aea78a..200eb67 100644
--- a/fs/nfs/dns_resolve.c
+++ b/fs/nfs/dns_resolve.c
@@ -40,6 +40,7 @@ ssize_t nfs_dns_resolve_name(struct net *net, char *name, size_t namelen,
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/cache.h>
#include <linux/sunrpc/svcauth.h>
+#include <linux/sunrpc/rpc_pipe_fs.h>
#include "dns_resolve.h"
#include "cache_lib.h"
@@ -400,12 +401,47 @@ void nfs_dns_resolver_cache_destroy(struct net *net)
kfree(cd);
}
+static int rpc_pipefs_event(struct notifier_block *nb, unsigned long event,
+ void *ptr)
+{
+ struct super_block *sb = ptr;
+ struct net *net = sb->s_fs_info;
+ struct nfs_net *nn = net_generic(net, nfs_net_id);
+ struct cache_detail *cd = nn->nfs_dns_resolve;
+ int ret = 0;
+
+ if (cd == NULL)
+ return 0;
+
+ if (!try_module_get(THIS_MODULE))
+ return 0;
+
+ switch (event) {
+ case RPC_PIPEFS_MOUNT:
+ ret = nfs_cache_register_sb(sb, cd);
+ break;
+ case RPC_PIPEFS_UMOUNT:
+ nfs_cache_unregister_sb(sb, cd);
+ break;
+ default:
+ ret = -ENOTSUPP;
+ break;
+ }
+ module_put(THIS_MODULE);
+ return ret;
+}
+
+static struct notifier_block nfs_dns_resolver_block = {
+ .notifier_call = rpc_pipefs_event,
+};
+
int nfs_dns_resolver_init(void)
{
- return 0;
+ return rpc_pipefs_notifier_register(&nfs_dns_resolver_block);
}
void nfs_dns_resolver_destroy(void)
{
+ rpc_pipefs_notifier_unregister(&nfs_dns_resolver_block);
}
#endif
^ permalink raw reply related
* [PATCH 6/6] NFS: remove RPC PipeFS mount point references from NFS cache routines
From: Stanislav Kinsbursky @ 2011-11-25 14:13 UTC (permalink / raw)
To: Trond.Myklebust
Cc: linux-nfs, xemul, neilb, netdev, linux-kernel, jbottomley,
bfields, davem, devel
In-Reply-To: <20111125130557.6271.95071.stgit@localhost6.localdomain6>
This is a cleanup patch. We don't need this reference anymore, because the DNS
resolver cache now creates its dentries in per-net operations and on PipeFS
mount/umount notification.
Note that nfs_cache_register_net() now returns 0 instead of -ENOENT when the
PipeFS superblock is absent. This is OK: the DNS resolver cache will be
registered on the PipeFS mount event.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
---
fs/nfs/cache_lib.c | 19 ++++---------------
1 files changed, 4 insertions(+), 15 deletions(-)
diff --git a/fs/nfs/cache_lib.c b/fs/nfs/cache_lib.c
index 5905a31..dded263 100644
--- a/fs/nfs/cache_lib.c
+++ b/fs/nfs/cache_lib.c
@@ -126,24 +126,14 @@ int nfs_cache_register_sb(struct super_block *sb, struct cache_detail *cd)
int nfs_cache_register_net(struct net *net, struct cache_detail *cd)
{
- struct vfsmount *mnt;
struct super_block *pipefs_sb;
- int ret;
+ int ret = 0;
- mnt = rpc_get_mount();
- if (IS_ERR(mnt))
- return PTR_ERR(mnt);
pipefs_sb = rpc_get_sb_net(net);
- if (!pipefs_sb) {
- ret = -ENOENT;
- goto err;
+ if (pipefs_sb) {
+ ret = nfs_cache_register_sb(pipefs_sb, cd);
+ rpc_put_sb_net(net);
}
- ret = nfs_cache_register_sb(pipefs_sb, cd);
- rpc_put_sb_net(net);
- if (!ret)
- return ret;
-err:
- rpc_put_mount();
return ret;
}
@@ -162,7 +152,6 @@ void nfs_cache_unregister_net(struct net *net, struct cache_detail *cd)
nfs_cache_unregister_sb(pipefs_sb, cd);
rpc_put_sb_net(net);
}
- rpc_put_mount();
}
void nfs_cache_init(struct cache_detail *cd)
^ permalink raw reply related
* RE: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark
From: David Laight @ 2011-11-25 14:19 UTC (permalink / raw)
To: Hans Schillstrom, kaber, pablo, jengelh, netfilter-devel, netdev
Cc: hans.schillstrom
In-Reply-To: <1322213787-25796-2-git-send-email-hans@schillstrom.com>
> + addr1 = (__force u32) ip6->saddr.s6_addr32[3];
> + addr2 = (__force u32) ip6->daddr.s6_addr32[3];
...
> + ports.v32 = * (__force u32 *) (skb->data + nhoff);
Is this code even vaguely portable??
I suspect the 'ports' bit has serious endianness problems.
I'm also not sure whether linux guarantees the alignment
of skb->data here.
David
^ permalink raw reply
* RE: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Eric Dumazet @ 2011-11-25 14:36 UTC (permalink / raw)
To: David Laight
Cc: Hans Schillstrom, kaber, pablo, jengelh, netfilter-devel, netdev,
hans.schillstrom
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6D8AEE3@saturn3.aculab.com>
On Friday 25 November 2011 at 14:19 +0000, David Laight wrote:
> > + addr1 = (__force u32) ip6->saddr.s6_addr32[3];
> > + addr2 = (__force u32) ip6->daddr.s6_addr32[3];
> ...
> > + ports.v32 = * (__force u32 *) (skb->data + nhoff);
>
> Is this code even vaguely portable??
Yes it is.
> I suspect the 'ports' bit has serious endianness problems.
We don't care about endianness here, and we document that with the (__force
u32) cast.
> I'm also not sure whether linux guarantees the alignment
> of skb->data here.
It is guaranteed throughout the whole Linux stack.
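The endianness point can be sketched in user space. The helper below is a hypothetical stand-in (load_ports_word() is not from the patch): it reads the two 16-bit ports as one 32-bit word. It uses memcpy() so the sketch does not depend on alignment, whereas the kernel code relies on the alignment guarantee stated above. The numeric value differs between big- and little-endian hosts, but since it is only ever used as hash input on the same host, all that matters is that identical bytes always give the identical word.

```c
#include <stdint.h>
#include <string.h>

/*
 * Read source and destination port as a single 32-bit hash input.
 * The bytes are taken as-is (network order); no byte swapping is done
 * because the value is never interpreted as a number.
 */
static uint32_t load_ports_word(const uint8_t *l4hdr)
{
	uint32_t w;

	memcpy(&w, l4hdr, sizeof(w));
	return w;
}
```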
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [v4 PATCH 1/2] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Eric Dumazet @ 2011-11-25 14:43 UTC (permalink / raw)
To: Hans Schillstrom
Cc: kaber, pablo, jengelh, netfilter-devel, netdev, hans.schillstrom
In-Reply-To: <1322213787-25796-2-git-send-email-hans@schillstrom.com>
On Friday 25 November 2011 at 10:36 +0100, Hans Schillstrom wrote:
> From: Hans Schillstrom <hans.schillstrom@ericsson.com>
>
> The target allows you to create rules in the "raw" and "mangle" tables
> which alter the netfilter mark (nfmark) field within a given range.
> First a 32 bit hash value is generated then modulus by <limit> and
> finally an offset is added before it's written to nfmark.
> Prior to routing, the nfmark can influence the routing method (see
> "Use netfilter MARK value as routing key") and can also be used by
> other subsystems to change their behavior.
>
Oh well, yet another duplicated flow dissector ...
> +/*
> + * Calc hash value, special care is taken on icmp and fragmented messages
> + * i.e. fragmented messages don't use ports.
> + */
> +__u32 hmark_v6(struct sk_buff *skb, const struct xt_action_param *par)
> +{
> + struct xt_hmark_info *info = (struct xt_hmark_info *)par->targinfo;
> +no6ports:
> + nexthdr &= info->prmask;
> + /* get a consistent hash (same value on both flow directions) */
> + if (addr2 < addr1)
> + swap(addr1, addr2);
> + hash = jhash_3words(addr1, addr2, ports.v32, info->hashrnd) ^ nexthdr;
What's the point of computing the hash if info->hmod is zero, since we then
don't set skb->mark?
> + if (info->hmod)
> + skb->mark = (hash % info->hmod) + info->hoffs;
> +
> + return XT_CONTINUE;
> +}
> +#endif
> +
Same problem/question on hmark_v4()
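A minimal user-space sketch of the suggested ordering follows: test hmod first, and only then order the address pair and hash. The names hmark_sketch() and mix3() are hypothetical; mix3() is a stand-in mixer and NOT the kernel's jhash_3words(), and the ports word is assumed to be direction-independent already.

```c
#include <stdint.h>

/* Stand-in mixer for illustration only; not jhash_3words(). */
static uint32_t mix3(uint32_t a, uint32_t b, uint32_t c, uint32_t seed)
{
	uint32_t h = seed ^ 0x9e3779b9u;

	h = (h ^ a) * 0x85ebca6bu;
	h = (h ^ b) * 0xc2b2ae35u;
	h = (h ^ c) * 0x27d4eb2fu;
	return h ^ (h >> 16);
}

/* Returns the new mark, or old_mark unchanged when hmod is zero. */
static uint32_t hmark_sketch(uint32_t addr1, uint32_t addr2, uint32_t ports,
			     uint32_t seed, uint32_t hmod, uint32_t hoffs,
			     uint32_t old_mark)
{
	if (hmod == 0)		/* nothing would be written: skip the hash */
		return old_mark;

	if (addr2 < addr1) {	/* order the pair so both directions match */
		uint32_t tmp = addr1;

		addr1 = addr2;
		addr2 = tmp;
	}
	return mix3(addr1, addr2, ports, seed) % hmod + hoffs;
}
```

Calling it with the addresses swapped gives the same mark, and the result always lies in the range hoffs to hoffs + hmod - 1.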
^ permalink raw reply