Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net-next-2.6] be2net: request native mode each time the card is reset
From: Sathya Perla @ 2011-07-20  5:52 UTC (permalink / raw)
  To: netdev

Currently be3-native mode is requested only in probe(). It must be requested, each time the card is reset either after an EEH error or after
sleep/hibernation.
Also, the be_cmd_check_native_mode() is better named be_cmd_req_native_mode()

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---
 drivers/net/benet/be_cmds.c |    2 +-
 drivers/net/benet/be_cmds.h |    2 +-
 drivers/net/benet/be_main.c |    6 ++++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index f520a5c..054fa67 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -2390,7 +2390,7 @@ err:
 }
 
 /* Uses mbox */
-int be_cmd_check_native_mode(struct be_adapter *adapter)
+int be_cmd_req_native_mode(struct be_adapter *adapter)
 {
 	struct be_mcc_wrb *wrb;
 	struct be_cmd_req_set_func_cap *req;
diff --git a/drivers/net/benet/be_cmds.h b/drivers/net/benet/be_cmds.h
index 1151df6..8e4d488 100644
--- a/drivers/net/benet/be_cmds.h
+++ b/drivers/net/benet/be_cmds.h
@@ -1545,7 +1545,7 @@ extern int be_cmd_set_qos(struct be_adapter *adapter, u32 bps, u32 domain);
 extern void be_detect_dump_ue(struct be_adapter *adapter);
 extern int be_cmd_get_die_temperature(struct be_adapter *adapter);
 extern int be_cmd_get_cntl_attributes(struct be_adapter *adapter);
-extern int be_cmd_check_native_mode(struct be_adapter *adapter);
+extern int be_cmd_req_native_mode(struct be_adapter *adapter);
 extern int be_cmd_get_reg_len(struct be_adapter *adapter, u32 *log_size);
 extern void be_cmd_get_regs(struct be_adapter *adapter, u32 buf_len, void *buf);
 
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index ae2d262..c411bb1 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -2511,6 +2511,8 @@ static int be_setup(struct be_adapter *adapter)
 	int status;
 	u8 mac[ETH_ALEN];
 
+	be_cmd_req_native_mode(adapter);
+
 	cap_flags = en_flags = BE_IF_FLAGS_UNTAGGED |
 				BE_IF_FLAGS_BROADCAST |
 				BE_IF_FLAGS_MULTICAST;
@@ -2618,6 +2620,8 @@ static int be_clear(struct be_adapter *adapter)
 
 	be_cmd_if_destroy(adapter, adapter->if_handle,  0);
 
+	adapter->be3_native = 0;
+
 	/* tell fw we're done with firing cmds */
 	be_cmd_fw_clean(adapter);
 	return 0;
@@ -3215,8 +3219,6 @@ static int be_get_config(struct be_adapter *adapter)
 	if (status)
 		return status;
 
-	be_cmd_check_native_mode(adapter);
-
 	if ((num_vfs && adapter->sriov_enabled) ||
 		(adapter->function_mode & 0x400) ||
 		lancer_chip(adapter) || !be_physfn(adapter)) {
-- 
1.7.4


^ permalink raw reply related

* Funds payment board
From: FINAL PAYMENT SETTLEMENT BOARD @ 2011-07-20  5:19 UTC (permalink / raw)



We wish to inform you now that the square peg is now in square hole and
can be accelerated so that your payment can be processed and will be
released to you as soon as you respond to this letter. Also note that from
the record in our file, YOUR APPROVED PAYMENT IS $850,000.00
(EIGHT HUNDRED AND FIFTY THOUSAND DOLLARS).I wrote to know if this is  
your valid email.

Please, let me know if this email is valid.

Email:debtsetlementboard@ozledim.net

Regards

Robin Steven






^ permalink raw reply

* Re: [PATCH 1/3] stmmac: unify MAC and PHY configuration parameters
From: Giuseppe CAVALLARO @ 2011-07-20  5:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, stuart.menefy
In-Reply-To: <20110719.115738.131562530013095617.davem@davemloft.net>

Hello David,

On 7/19/2011 8:57 PM, David Miller wrote:
> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
> Date: Tue, 19 Jul 2011 10:38:02 +0200
> 
>> Prior to this change, most PHY configuration parameters were passed
>> into the STMMAC device as a separate PHY device. As well as being
>> unusual, this made it difficult to make changes to the MAC/PHY
>> relationship.
>>
>> This patch moves all the PHY parameters into the MAC configuration
>> structure, mainly as a separate structure. This allows us to completly
>> ignore the MDIO bus attached to a stmmac if desired, and not create
>> the PHY bus. It also allows the stmmac driver to use a different PHY
>> from the one it is connected to, for example a fixed PHY or bit banging
>> PHY.
>>
>> Also derive the stmmac/PHY connection type (MII/RMII etc) from the
>> mode can be passed into <platf>_configure_ethernet.
>> STLinux kernel at git://git.stlinux.com/stm/linux-sh4-2.6.32.y.git
>> provides several examples how to use this new infrastructure (that
>> actually is easier to maintain and clearer).
>>
>> Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
>> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> 
> I find these changes confusing, because I can't see where these
> platform data objects are created that end up being used by
> the stmmac driver.
> 
> I'm concerned about this because if you're changing these data
> structures, you'll need to update also the code that creates
> these platform data objects.

You are right and indeed I wanted to provide some other patches to show
how to use the new infrastructure instead of adding the stlinux git
where our platforms that already use that.

> Finally, this patch needs to update Documentation/networking/stmmac.txt

Agree and sorry to have forgotten that.
I'll update the driver's documentation trying to add a valid example how
to use the new structures (taking as example stm devel).
After that I'll send the patches again.

At any rate, as I also do via email, I'm happy to support all guys that
are starting to use in the driver on several architectures (arm, sh, ppc).

This patch is useful because tidy-up the code and make easier the MDIO
part (also in case we use fixed_link etc.).

Thanks for your feedback.

Regards
Peppe

> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply

* Re: [BUG] ipv6: all routes share same inetpeer
From: Eric Dumazet @ 2011-07-20  5:29 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110719.115929.106510307852361614.davem@davemloft.net>

Le mardi 19 juillet 2011 à 11:59 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 19 Jul 2011 20:57:50 +0200
> 
> > Le mardi 19 juillet 2011 à 20:20 +0200, Eric Dumazet a écrit :
> >> Le mardi 19 juillet 2011 à 10:37 -0700, David Miller a écrit :
> >> > From: Eric Dumazet <eric.dumazet@gmail.com>
> >> > Date: Tue, 19 Jul 2011 19:23:49 +0200
> >> > 
> >> > > Maybe you can find the bug before me ?
> >> > 
> >> > I think when we add the route we cow the metrics almost immediately.
> >> > The daddr is, unfortunately, fully prefixed at that point.
> >> 
> >> Yes, we shall provide a second ip6_rt_copy() argument, with the
> >> destination address.
> >> 
> > 
> > Hmm, or maybe just change the dst_copy_metrics(&rt->dst, &ort->dst);
> > call done from ip6_rt_copy(), to avoid doing the COW if not really
> > needed ?
> 
> This is ok if it handles the case where ort's metrics point to
> writable inetpeer memory.

OK but if ort's metrics are writeable we must perform the dst_copy_metrics()
and therefore fill rt6i_dst before ?

My first patch had an issue in rt6_alloc_cow(), line 710, where 
ipv6_addr_equal(&rt->rt6i_dst.addr, daddr) becomes always true.

I guess I can replace it by ipv6_addr_equal(&ort->rt6i_dst.addr, daddr)




^ permalink raw reply

* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Eric Dumazet @ 2011-07-20  4:24 UTC (permalink / raw)
  To: Neil Horman
  Cc: Ben Greear, Jiri Pirko, netdev, Alexey Dobriyan, David S. Miller
In-Reply-To: <20110720020737.GB2692@neilslaptop.think-freely.org>

Le mardi 19 juillet 2011 à 22:07 -0400, Neil Horman a écrit :
> > 
> I think this is a good idea.  It lets pktgen dynamically make the clone/share
> decision dynamically and only impacts performance for those systems.
> 

Just let pktgen refuse to use clone_skb command for these devices.

At that point, an userland application is going to be faster and more
flexible than pktgen.

So when I said "remove pktgen" from distro it was not a joke.
Maybe its time to admit pktgen has to be removed from kernel sources.

$ wc net/core/pktgen.c
 3788 10881 92771 net/core/pktgen.c

Hmmm, 3788 lines, patched 180 times for a thing that only sends UDP
frames...

^ permalink raw reply

* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Ben Greear @ 2011-07-20  4:19 UTC (permalink / raw)
  To: Neil Horman
  Cc: Eric Dumazet, Jiri Pirko, netdev, Alexey Dobriyan,
	David S. Miller
In-Reply-To: <20110720020737.GB2692@neilslaptop.think-freely.org>

On 07/19/2011 07:07 PM, Neil Horman wrote:
> On Tue, Jul 19, 2011 at 05:52:49PM -0700, Ben Greear wrote:
>> On 07/19/2011 05:43 PM, Eric Dumazet wrote:
>>> Le mardi 19 juillet 2011 à 20:19 -0400, Neil Horman a écrit :

>> When the features-flags work gets completed so that we can start adding
>> new flags, we could add a CANT_DO_MULTI_SKB flag to drivers with known
>> issues and then restrict pktgen config accordingly.
>>
> I think this is a good idea.  It lets pktgen dynamically make the clone/share
> decision dynamically and only impacts performance for those systems.
>
>> Upstream code already clears skb memory to avoid leaking kernel memory
>> contents, so if you take away multi-skb too, pktgen is going to suck
>> at what it is supposed to do:  run fast as possible.
>>
> I don't want to take away multi-skb, but I do want pktgen to work reliably.  I
> think flagging drivers that need unshared skbs is the way to go.

Lets all cheer on the Intel NIC driver guys and Mr. Miraslaw then!

>> If you want real fun, use pktgen on a wlan0 device...it will crash
>> regardless of whether you use multi-skb or not because of xmit-queue
>> number issues :P
>>
> I'll try that, thanks :)

Here's a probably-white-space damaged, and deemed-unfit-for-kernel patch
that at least works around the problem...my users don't really need pktgen
on wlans, so I didn't put any extra effort into it, but for someone
more motivated..this might be a good starting point :)

I guess it doesn't really crash, but if I recall correctly, it filled
up the queues in a bad way and effectively blocked all wifi traffic
until reboot, or maybe rmmod ath9k.


 From d57130f29843cab30196b11d4476231f245e3f92 Mon Sep 17 00:00:00 2001
From: Ben Greear <greearb@candelatech.com>
Date: Wed, 9 Feb 2011 16:58:42 -0800
Subject: [PATCH 31/38] mac80211: Set up tx-queue-mapping in subif_start_xmit.

Otherwise, ath9k gets confused about which queue to use
and spews a warning like this when driving traffic with
pktgen.

WARNING: at /home/greearb/git/linux.wireless-testing-ct/drivers/net/wireless/ath/ath9k/xmit.c:1748 ath_tx_start+0x4a2/0x662 [ath9k]()
Hardware name: To Be Filled By O.E.M.
Modules linked in: ath5k arc4 ath9k mac80211 ath9k_common ath9k_hw ath cfg80211 nfs lockd bluetooth cryptd aes_i586 aes_generic veth 8021q garp stp l]
Pid: 1729, comm: kpktgend_0 Tainted: G        W   2.6.38-rc4-wl+ #21
Call Trace:
  [<c043091b>] ? warn_slowpath_common+0x65/0x7a
  [<fabe784e>] ? ath_tx_start+0x4a2/0x662 [ath9k]
  [<c043093f>] ? warn_slowpath_null+0xf/0x13
  [<fabe784e>] ? ath_tx_start+0x4a2/0x662 [ath9k]
  [<fabe14d0>] ? ath9k_tx+0x14f/0x183 [ath9k]
  [<fab9026d>] ? __ieee80211_tx+0x10c/0x18c [mac80211]
  [<fab90397>] ? ieee80211_tx+0xaa/0x188 [mac80211]
  [<fab905f3>] ? ieee80211_xmit+0x17e/0x186 [mac80211]
  [<fab8ecc0>] ? ieee80211_skb_resize+0x8e/0xd2 [mac80211]
  [<fab9148b>] ? ieee80211_subif_start_xmit+0x643/0x65c [mac80211]
  [<c0440000>] ? rescuer_thread+0x25/0x1c8
  [<f92cd354>] ? pktgen_thread_worker+0x114c/0x1b44 [pktgen]
  [<fab90e48>] ? ieee80211_subif_start_xmit+0x0/0x65c [mac80211]
  [<c042d612>] ? default_wake_function+0xb/0xd
  [<c04254c7>] ? __wake_up_common+0x34/0x5c
  [<c0443a29>] ? autoremove_wake_function+0x0/0x2f
  [<f92cc208>] ? pktgen_thread_worker+0x0/0x1b44 [pktgen]
  [<c044371a>] ? kthread+0x62/0x67
  [<c04436b8>] ? kthread+0x0/0x67
  [<c04035f6>] ? kernel_thread_helper+0x6/0x10

Signed-off-by: Ben Greear <greearb@candelatech.com>
---
:100644 100644 64e0f75... e8ff199... M	net/mac80211/tx.c
  net/mac80211/tx.c |    2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 64e0f75..e8ff199 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1971,6 +1971,8 @@ netdev_tx_t ieee80211_subif_start_xmit(struct sk_buff *skb,
  	} else
  		memcpy(skb_push(skb, hdrlen), &hdr, hdrlen);

+	skb_set_queue_mapping(skb, ieee80211_select_queue(sdata, skb));
+
  	nh_pos += hdrlen;
  	h_pos += hdrlen;

-- 
1.7.3.4


>
> Regards
> Neil


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply related

* (unknown)
From: 168 @ 2011-07-20  2:16 UTC (permalink / raw)




^ permalink raw reply

* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Neil Horman @ 2011-07-20  2:07 UTC (permalink / raw)
  To: Ben Greear
  Cc: Eric Dumazet, Jiri Pirko, netdev, Alexey Dobriyan,
	David S. Miller
In-Reply-To: <4E2626E1.6030005@candelatech.com>

On Tue, Jul 19, 2011 at 05:52:49PM -0700, Ben Greear wrote:
> On 07/19/2011 05:43 PM, Eric Dumazet wrote:
> >Le mardi 19 juillet 2011 à 20:19 -0400, Neil Horman a écrit :
> 
> >>I'm sensitive to the performance impact, but I would much rather see a lower
> >>performing pktgen that doesn't randomly crash, and bring the performance back up
> >>in a safe, reliable way.  To that end, I've been starting to think about
> >>pre-allocating a ring buffer of skbs with a skb->users count biased up to
> >>prevent driver freeing.  That way we could detect 'unused skb's' by a user count
> >>that was at the bias level.  Thoughts?
> >>
> >
> >I dont know. I use pktgen maybe once per week and never got a single
> >crash like this. We probably are very few pktgen users in the world, and
> >we use it exactly to avoid calling skb_clone() or other expensive per
> >xmit setup.
> >
> >Just remove pktgen from RedHat kernels, if you dont trust sysadmins.
> ># CONFIG_PKTGEN is not set
> >
> >Alternatively, add a check to problematic drivers to _not_ mess skb if
> >skb_shared(skb) is true : eventually use skb_share_check()
> 
> When the features-flags work gets completed so that we can start adding
> new flags, we could add a CANT_DO_MULTI_SKB flag to drivers with known
> issues and then restrict pktgen config accordingly.
> 
I think this is a good idea.  It lets pktgen dynamically make the clone/share
decision dynamically and only impacts performance for those systems.

> Upstream code already clears skb memory to avoid leaking kernel memory
> contents, so if you take away multi-skb too, pktgen is going to suck
> at what it is supposed to do:  run fast as possible.
> 
I don't want to take away multi-skb, but I do want pktgen to work reliably.  I
think flagging drivers that need unshared skbs is the way to go.

> If you want real fun, use pktgen on a wlan0 device...it will crash
> regardless of whether you use multi-skb or not because of xmit-queue
> number issues :P
> 
I'll try that, thanks :)

Regards
Neil


^ permalink raw reply

* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Neil Horman @ 2011-07-20  1:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jiri Pirko, netdev, Alexey Dobriyan, David S. Miller
In-Reply-To: <1311122593.3113.46.camel@edumazet-laptop>

On Wed, Jul 20, 2011 at 02:43:12AM +0200, Eric Dumazet wrote:
> Le mardi 19 juillet 2011 à 20:19 -0400, Neil Horman a écrit :
> > > 
> > You are correct Eric, this can cause a significant performance regression, but I
> > think that beats causing a panic or other unexpected behavior.  I read your
> > previous threads with others regarding fixing this with vlans, but I don't think
> > its fair to just say 'its fast, but it might cause oopses'. 
> > 
> > And its not sufficient to simply forbid soft drivers to make use of pktgen, its
> > not just a soft driver problem, its systemic.  Any driver which assumes that it
> > has exclusive access to an skb submitted for transmit is at risk from pktgen in
> > its current implementation.  That of course as a subset includes all the soft
> > drivers, but others are also suceptible.  As examples (some of which I noted in
> > the origional post) virtio_net uses the skb->cb to hold vnet header information
> > which will be corrupted on sucessive sends.  bnx2x linearizes skbs under certain
> > circumstances, which means pktgen, if it marshals a fragmented frame will not
> > send a fragmented frame after the first iteration.  The PPP and Slip drivers
> > skb_push the skb to prepend a header to the frame on send, meaning sucessive
> > uses, up until they get an skb_under_panic will get iteratively more malformed
> > frames on the wire as ppp headers get stacked on top of one another.  These are
> > ust a few of the examples I've found.
> > 
> > The long and the short of it in my mind, is that we have a fundamental
> > disconnect between driver asumptions and pktgen.  If its ok to submit shared
> > skbs to drivers, then we need to augment drivers that modify skbs on transmit to
> > clone the skb (likey not an efficient solution), or if its not ok to do so, we
> > need to change pktgen to not behave that way.
> > 
> 
> Its a known problem, please check mail archives. Nobody felt a fix was
> needed.
> 
As I said in my origional note, I said I read the archives, I didn't agree with
the conclusion that a fix was unnecessecary.  I'm sorry if you don't care for
dissenting opinions.

> > > Note : a sysadmin has other ways to make a machine panic or reboot or
> > > halt...
> > Yes, predictable ways, that the sysadmin can see coming based on what they're
> > doing (i.e. no one should be shocked if they dd /dev/random to /dev/kmem and get
> > a hang or panic, or if they issue a sysrq-c, etc).  This case is different.  A
> > sysadmin reasonably expects pktgen to send the frames they configure on the
> > interface they specify.  While its arguably reasonable to forsee that it may not
> > work with soft interfaces, pktgen just won't work with some hardware drivers (as
> > per the examples above).  And it won't always be an oops, it may be occasional
> > random behvaior in the output data, and its highly dependent not just on the use
> > of pktgen, but rather the specific command(s) issued.
> > 
> > 
> > I'm sensitive to the performance impact, but I would much rather see a lower
> > performing pktgen that doesn't randomly crash, and bring the performance back up
> > in a safe, reliable way.  To that end, I've been starting to think about
> > pre-allocating a ring buffer of skbs with a skb->users count biased up to
> > prevent driver freeing.  That way we could detect 'unused skb's' by a user count
> > that was at the bias level.  Thoughts?
> > 
> 
> I dont know. I use pktgen maybe once per week and never got a single
> crash like this. We probably are very few pktgen users in the world, and
> we use it exactly to avoid calling skb_clone() or other expensive per
> xmit setup.
Please re-read my origional post.  Lots of drivers work just fine, some don't.
Some just behave differently.  Its the random results thats broken and I feel
needs fixing.  I get that performance is an issue, and I'm open to other
solutions, I'm not open to just saying 'its fine, mostly'.

> 
> Just remove pktgen from RedHat kernels, if you dont trust sysadmins.
> # CONFIG_PKTGEN is not set
> 
You're twisting my words.  At what point in time did I say I
don't trust sysadmins?  I want to give them a tool that works reliably without
them having to comb through their nic drivers xmit patch to ensure that pktgen
works without crashing or causing other odd behavior.  I don't think thats too
much to ask.
 
> Alternatively, add a check to problematic drivers to _not_ mess skb if
> skb_shared(skb) is true : eventually use skb_share_check()
> 
The former isn't feasible, as many skb modifications are neccesitated by the
nature of the hardware.  The latter is possible, but far less scalable than just
modifying pktgen.

FWIW, I like Ben's solution, adding a flag to drivers noting that they can't
handle multi-skb.  Then we can dynamically enforce a clone in pktgen when needed
(or buffer up additional skbs)

^ permalink raw reply

* Re: [PATCH] Constrain UFO fragment sizes to multiples of 8 bytes
From: Maciej Żenczykowski @ 2011-07-20  1:30 UTC (permalink / raw)
  To: Bill Sommerfeld; +Cc: David S. Miller, netdev, Tom Herbert
In-Reply-To: <1311124953-32159-1-git-send-email-wsommerfeld@google.com>

Ack.
This should probably make it into all manner of stable branches.

On Tue, Jul 19, 2011 at 6:22 PM, Bill Sommerfeld <wsommerfeld@google.com> wrote:
> Because the ip fragment offset field counts 8-byte chunks, ip
> fragments other than the last must contain a multiple of 8 bytes of
> payload.  ip_ufo_append_data wasn't respecting this constraint and,
> depending on the MTU and ip option sizes, could create malformed
> non-final fragments.
>
> Google-Bug-Id: 5009328
> Signed-off-by: Bill Sommerfeld <wsommerfeld@google.com>
> ---
>
> Note to reviewers: The first two hunks simply rename the "mtu"
> parameter to "maxfraglen"; the real change is in the third hunk.
> maxfraglen (the size of the largest non-final fragment which fits
> inside mtu) is already computed for the non-UFO fragmentation path.
>
> We saw this problem in a 2.6.34-based kernel with a bond device that
> had UFO enabled (as a result of changeset d9f5950f90292f7c);
> 1742f183fc218798 rewrote netdev_increment_features to no longer force
> on UFO.  The MTU on this bond device was set to 1470; fragmented
> datagrams were mangled, causing the receiver to get errors of the
> form:
>
> [506259.362640] UDP: short packet: From x.x.x.x:nnnn 1471/1469 to y.y.y.y:nnnn
>
>  net/ipv4/ip_output.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index be27e60..ccaaa85 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -734,7 +734,7 @@ static inline int ip_ufo_append_data(struct sock *sk,
>                        int getfrag(void *from, char *to, int offset, int len,
>                               int odd, struct sk_buff *skb),
>                        void *from, int length, int hh_len, int fragheaderlen,
> -                       int transhdrlen, int mtu, unsigned int flags)
> +                       int transhdrlen, int maxfraglen, unsigned int flags)
>  {
>        struct sk_buff *skb;
>        int err;
> @@ -767,7 +767,7 @@ static inline int ip_ufo_append_data(struct sock *sk,
>                skb->csum = 0;
>
>                /* specify the length of each IP datagram fragment */
> -               skb_shinfo(skb)->gso_size = mtu - fragheaderlen;
> +               skb_shinfo(skb)->gso_size = maxfraglen - fragheaderlen;
>                skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
>                __skb_queue_tail(queue, skb);
>        }
> @@ -831,7 +831,7 @@ static int __ip_append_data(struct sock *sk,
>            (rt->dst.dev->features & NETIF_F_UFO) && !rt->dst.header_len) {
>                err = ip_ufo_append_data(sk, queue, getfrag, from, length,
>                                         hh_len, fragheaderlen, transhdrlen,
> -                                        mtu, flags);
> +                                        maxfraglen, flags);
>                if (err)
>                        goto error;
>                return 0;
> --
> 1.7.3.1
>
>



-- 
Maciej A. Żenczykowski
Kernel Networking Developer @ Google
1600 Amphitheatre Parkway, Mountain View, CA 94043
tel: +1 (650) 253-0062

^ permalink raw reply

* [PATCH] Constrain UFO fragment sizes to multiples of 8 bytes
From: Bill Sommerfeld @ 2011-07-20  1:22 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Tom Herbert, Maciej Żenczykowski, Bill Sommerfeld

Because the ip fragment offset field counts 8-byte chunks, ip
fragments other than the last must contain a multiple of 8 bytes of
payload.  ip_ufo_append_data wasn't respecting this constraint and,
depending on the MTU and ip option sizes, could create malformed
non-final fragments.

Google-Bug-Id: 5009328
Signed-off-by: Bill Sommerfeld <wsommerfeld@google.com>
---

Note to reviewers: The first two hunks simply rename the "mtu"
parameter to "maxfraglen"; the real change is in the third hunk.
maxfraglen (the size of the largest non-final fragment which fits
inside mtu) is already computed for the non-UFO fragmentation path.

We saw this problem in a 2.6.34-based kernel with a bond device that
had UFO enabled (as a result of changeset d9f5950f90292f7c);
1742f183fc218798 rewrote netdev_increment_features to no longer force
on UFO.  The MTU on this bond device was set to 1470; fragmented
datagrams were mangled, causing the receiver to get errors of the
form:

[506259.362640] UDP: short packet: From x.x.x.x:nnnn 1471/1469 to y.y.y.y:nnnn

 net/ipv4/ip_output.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index be27e60..ccaaa85 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -734,7 +734,7 @@ static inline int ip_ufo_append_data(struct sock *sk,
 			int getfrag(void *from, char *to, int offset, int len,
 			       int odd, struct sk_buff *skb),
 			void *from, int length, int hh_len, int fragheaderlen,
-			int transhdrlen, int mtu, unsigned int flags)
+			int transhdrlen, int maxfraglen, unsigned int flags)
 {
 	struct sk_buff *skb;
 	int err;
@@ -767,7 +767,7 @@ static inline int ip_ufo_append_data(struct sock *sk,
 		skb->csum = 0;

 		/* specify the length of each IP datagram fragment */
-		skb_shinfo(skb)->gso_size = mtu - fragheaderlen;
+		skb_shinfo(skb)->gso_size = maxfraglen - fragheaderlen;
 		skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
 		__skb_queue_tail(queue, skb);
 	}
@@ -831,7 +831,7 @@ static int __ip_append_data(struct sock *sk,
 	    (rt->dst.dev->features & NETIF_F_UFO) && !rt->dst.header_len) {
 		err = ip_ufo_append_data(sk, queue, getfrag, from, length,
 					 hh_len, fragheaderlen, transhdrlen,
-					 mtu, flags);
+					 maxfraglen, flags);
 		if (err)
 			goto error;
 		return 0;
-- 
1.7.3.1

^ permalink raw reply related

* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Ben Greear @ 2011-07-20  0:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Neil Horman, Jiri Pirko, netdev, Alexey Dobriyan, David S. Miller
In-Reply-To: <1311122593.3113.46.camel@edumazet-laptop>

On 07/19/2011 05:43 PM, Eric Dumazet wrote:
> Le mardi 19 juillet 2011 à 20:19 -0400, Neil Horman a écrit :

>> I'm sensitive to the performance impact, but I would much rather see a lower
>> performing pktgen that doesn't randomly crash, and bring the performance back up
>> in a safe, reliable way.  To that end, I've been starting to think about
>> pre-allocating a ring buffer of skbs with a skb->users count biased up to
>> prevent driver freeing.  That way we could detect 'unused skb's' by a user count
>> that was at the bias level.  Thoughts?
>>
>
> I dont know. I use pktgen maybe once per week and never got a single
> crash like this. We probably are very few pktgen users in the world, and
> we use it exactly to avoid calling skb_clone() or other expensive per
> xmit setup.
>
> Just remove pktgen from RedHat kernels, if you dont trust sysadmins.
> # CONFIG_PKTGEN is not set
>
> Alternatively, add a check to problematic drivers to _not_ mess skb if
> skb_shared(skb) is true : eventually use skb_share_check()

When the features-flags work gets completed so that we can start adding
new flags, we could add a CANT_DO_MULTI_SKB flag to drivers with known
issues and then restrict pktgen config accordingly.

Upstream code already clears skb memory to avoid leaking kernel memory
contents, so if you take away multi-skb too, pktgen is going to suck
at what it is supposed to do:  run fast as possible.

If you want real fun, use pktgen on a wlan0 device...it will crash
regardless of whether you use multi-skb or not because of xmit-queue
number issues :P

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Eric Dumazet @ 2011-07-20  0:43 UTC (permalink / raw)
  To: Neil Horman; +Cc: Jiri Pirko, netdev, Alexey Dobriyan, David S. Miller
In-Reply-To: <20110720001904.GA1992@neilslaptop.think-freely.org>

Le mardi 19 juillet 2011 à 20:19 -0400, Neil Horman a écrit :
> > 
> You are correct Eric, this can cause a significant performance regression, but I
> think that beats causing a panic or other unexpected behavior.  I read your
> previous threads with others regarding fixing this with vlans, but I don't think
> its fair to just say 'its fast, but it might cause oopses'. 
> 
> And its not sufficient to simply forbid soft drivers to make use of pktgen, its
> not just a soft driver problem, its systemic.  Any driver which assumes that it
> has exclusive access to an skb submitted for transmit is at risk from pktgen in
> its current implementation.  That of course as a subset includes all the soft
> drivers, but others are also suceptible.  As examples (some of which I noted in
> the origional post) virtio_net uses the skb->cb to hold vnet header information
> which will be corrupted on sucessive sends.  bnx2x linearizes skbs under certain
> circumstances, which means pktgen, if it marshals a fragmented frame will not
> send a fragmented frame after the first iteration.  The PPP and Slip drivers
> skb_push the skb to prepend a header to the frame on send, meaning sucessive
> uses, up until they get an skb_under_panic will get iteratively more malformed
> frames on the wire as ppp headers get stacked on top of one another.  These are
> ust a few of the examples I've found.
> 
> The long and the short of it in my mind, is that we have a fundamental
> disconnect between driver asumptions and pktgen.  If its ok to submit shared
> skbs to drivers, then we need to augment drivers that modify skbs on transmit to
> clone the skb (likey not an efficient solution), or if its not ok to do so, we
> need to change pktgen to not behave that way.
> 

Its a known problem, please check mail archives. Nobody felt a fix was
needed.

> > Note : a sysadmin has other ways to make a machine panic or reboot or
> > halt...
> Yes, predictable ways, that the sysadmin can see coming based on what they're
> doing (i.e. no one should be shocked if they dd /dev/random to /dev/kmem and get
> a hang or panic, or if they issue a sysrq-c, etc).  This case is different.  A
> sysadmin reasonably expects pktgen to send the frames they configure on the
> interface they specify.  While its arguably reasonable to forsee that it may not
> work with soft interfaces, pktgen just won't work with some hardware drivers (as
> per the examples above).  And it won't always be an oops, it may be occasional
> random behvaior in the output data, and its highly dependent not just on the use
> of pktgen, but rather the specific command(s) issued.
> 
> 
> I'm sensitive to the performance impact, but I would much rather see a lower
> performing pktgen that doesn't randomly crash, and bring the performance back up
> in a safe, reliable way.  To that end, I've been starting to think about
> pre-allocating a ring buffer of skbs with a skb->users count biased up to
> prevent driver freeing.  That way we could detect 'unused skb's' by a user count
> that was at the bias level.  Thoughts?
> 

I dont know. I use pktgen maybe once per week and never got a single
crash like this. We probably are very few pktgen users in the world, and
we use it exactly to avoid calling skb_clone() or other expensive per
xmit setup.

Just remove pktgen from RedHat kernels, if you dont trust sysadmins.
# CONFIG_PKTGEN is not set

Alternatively, add a check to problematic drivers to _not_ mess skb if
skb_shared(skb) is true : eventually use skb_share_check()




^ permalink raw reply

* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Rick Jones @ 2011-07-20  0:25 UTC (permalink / raw)
  To: Neil Horman
  Cc: Eric Dumazet, Jiri Pirko, netdev, Alexey Dobriyan,
	David S. Miller
In-Reply-To: <20110720001904.GA1992@neilslaptop.think-freely.org>

How "everyday use" is pktgen considered?  As "everyday use" as 
netperf/iperf/whatnot? Is it actually considered something one would run 
on production rather than test systems?

rick jones

^ permalink raw reply

* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Neil Horman @ 2011-07-20  0:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jiri Pirko, netdev, Alexey Dobriyan, David S. Miller
In-Reply-To: <1311108107.3113.22.camel@edumazet-laptop>

On Tue, Jul 19, 2011 at 10:41:47PM +0200, Eric Dumazet wrote:
> Le mardi 19 juillet 2011 à 22:29 +0200, Jiri Pirko a écrit :
> 
> > You are right, but it may not cause panic, right? In case this patch
> > would cause significant performance regression, how about to just forbid
> > pktgen to run on soft-netdevs ?
> > 
> 
> Please do
> 
You are correct Eric, this can cause a significant performance regression, but I
think that beats causing a panic or other unexpected behavior.  I read your
previous threads with others regarding fixing this with vlans, but I don't think
its fair to just say 'its fast, but it might cause oopses'. 

And its not sufficient to simply forbid soft drivers to make use of pktgen, its
not just a soft driver problem, its systemic.  Any driver which assumes that it
has exclusive access to an skb submitted for transmit is at risk from pktgen in
its current implementation.  That of course as a subset includes all the soft
drivers, but others are also suceptible.  As examples (some of which I noted in
the origional post) virtio_net uses the skb->cb to hold vnet header information
which will be corrupted on sucessive sends.  bnx2x linearizes skbs under certain
circumstances, which means pktgen, if it marshals a fragmented frame will not
send a fragmented frame after the first iteration.  The PPP and Slip drivers
skb_push the skb to prepend a header to the frame on send, meaning sucessive
uses, up until they get an skb_under_panic will get iteratively more malformed
frames on the wire as ppp headers get stacked on top of one another.  These are
ust a few of the examples I've found.

The long and the short of it in my mind, is that we have a fundamental
disconnect between driver asumptions and pktgen.  If its ok to submit shared
skbs to drivers, then we need to augment drivers that modify skbs on transmit to
clone the skb (likey not an efficient solution), or if its not ok to do so, we
need to change pktgen to not behave that way.

> Note : a sysadmin has other ways to make a machine panic or reboot or
> halt...
Yes, predictable ways, that the sysadmin can see coming based on what they're
doing (i.e. no one should be shocked if they dd /dev/random to /dev/kmem and get
a hang or panic, or if they issue a sysrq-c, etc).  This case is different.  A
sysadmin reasonably expects pktgen to send the frames they configure on the
interface they specify.  While its arguably reasonable to forsee that it may not
work with soft interfaces, pktgen just won't work with some hardware drivers (as
per the examples above).  And it won't always be an oops, it may be occasional
random behvaior in the output data, and its highly dependent not just on the use
of pktgen, but rather the specific command(s) issued.

I'm sensitive to the performance impact, but I would much rather see a lower
performing pktgen that doesn't randomly crash, and bring the performance back up
in a safe, reliable way.  To that end, I've been starting to think about
pre-allocating a ring buffer of skbs with a skb->users count biased up to
prevent driver freeing.  That way we could detect 'unused skb's' by a user count
that was at the bias level.  Thoughts?

Neil

> 
> 
> 
> 

^ permalink raw reply

* Re: [PATCH] slcan: remove unused 'leased', 'line' and 'pid' fields from the 'slcan' structure
From: David Miller @ 2011-07-19 23:56 UTC (permalink / raw)
  To: matvejchikov; +Cc: netdev
In-Reply-To: <CAKh5naa=x13=wUE+MLAYv0XQ4OhuY5G47ZDmFz0vbkibNtoz-A@mail.gmail.com>

From: Matvejchikov Ilya <matvejchikov@gmail.com>
Date: Tue, 19 Jul 2011 11:58:48 +0400

> Signed-off-by: Matvejchikov Ilya <matvejchikov@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] slip: remove unused 'line' field from the 'slip' structure
From: David Miller @ 2011-07-19 23:55 UTC (permalink / raw)
  To: matvejchikov; +Cc: netdev
In-Reply-To: <CAKh5nabuWJtnfM6PEOkGbEog89jmiAAiPPArV==2F5mX0PUH-Q@mail.gmail.com>

From: Matvejchikov Ilya <matvejchikov@gmail.com>
Date: Tue, 19 Jul 2011 11:56:44 +0400

> Signed-off-by: Matvejchikov Ilya <matvejchikov@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 00/45] Update bna driver version to 3.0.2.0
From: David Miller @ 2011-07-19 23:24 UTC (permalink / raw)
  To: rmody; +Cc: netdev, adapter_linux_open_src_team
In-Reply-To: <E5313AF6F2BFD14293E5FD0F94750F86A83BDF4C07@HQ1-EXCH01.corp.brocade.com>

From: Rasesh Mody <rmody@brocade.com>
Date: Tue, 19 Jul 2011 15:53:06 -0700

> We can further split the submission into multiple smaller
> submissions (e.g. 4 patches a time as you suggested), but there
> still will be one big patch set for new hardware and feature
> enablement

That's fine, just don't add the patch that causes the driver
to recognize the new device IDs until all the necessary
support code is there.

^ permalink raw reply

* RE: [PATCH 00/45] Update bna driver version to 3.0.2.0
From: Rasesh Mody @ 2011-07-19 22:53 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, Adapter Linux Open SRC Team
In-Reply-To: <20110718.185805.413849001475627611.davem@davemloft.net>

>From: David Miller [mailto:davem@davemloft.net]
>Sent: Monday, July 18, 2011 6:58 PM
>
>From: Rasesh Mody <rmody@brocade.com>
>Date: Mon, 18 Jul 2011 17:48:03 -0700
>
>> We wanted to avoid submitting developing code that could break
>> upstream driver due to partial changes.
>
>So what you're saying is that if only some of the patches are applied
>the driver will not work properly?
>
>That's not allowed either, your set of changes must be fully
>bisectable meaning that at any point in the series the driver
>must still compile and work properly.
>
>You guys need to sort out how you guys submit your changes.
>
>When you have 4 patches ready to go, post them immediately.
>
>Because if there are fundamental problem with the first 4,
>everything else that depends upon them will need to change.

David,

We can further split the submission into multiple smaller submissions (e.g. 4 patches a time as you suggested), but there still will be one big patch set for new hardware and feature enablement, which cannot be split it into small bisectable patches. Is this acceptable? What is the upstream guideline for submitting big changes such as driver re-architecture or re-organization?

Thanks,
--Rasesh

^ permalink raw reply

* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Ben Greear @ 2011-07-19 21:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jiri Pirko, Neil Horman, netdev, Alexey Dobriyan, David S. Miller
In-Reply-To: <1311108107.3113.22.camel@edumazet-laptop>

On 07/19/2011 01:41 PM, Eric Dumazet wrote:
> Le mardi 19 juillet 2011 à 22:29 +0200, Jiri Pirko a écrit :
>
>> You are right, but it may not cause panic, right? In case this patch
>> would cause significant performance regression, how about to just forbid
>> pktgen to run on soft-netdevs ?
>>
>
> Please do

No, please do not.  Pktgen *can* work on soft-devs if you do not
use multi-skb, and it can be useful for testing highs speed vlan
traffic and various other things.

If you must, forbid multi-skb on soft devices, but don't just
break pktgen for all soft-devices.

Thanks,
Ben

>
> Note : a sysadmin has other ways to make a machine panic or reboot or
> halt...
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [PATCH net-next-2.6] ipv6: make fragment identifications less predictable
From: Matt Mackall @ 2011-07-19 20:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Fernando Gont, David Miller, security, Eugene Teo, netdev
In-Reply-To: <1311108423.3113.24.camel@edumazet-laptop>

On Tue, 2011-07-19 at 22:47 +0200, Eric Dumazet wrote:
> IPv6 fragment identification generation is way beyond what we use for
> IPv4 : It uses a single generator. Its not scalable and allows DOS
> attacks.
> 
> Now inetpeer is IPv6 aware, we can use it to provide a more secure and
> scalable frag ident generator (per destination, instead of system wide)

This code really needs to get moved out of random.c and into net/. Other
than that, looks fine to me.

> This patch :
> 1) defines a new secure_ipv6_id() helper
> 2) extends inet_getid() to provide 32bit results
> 3) extends ipv6_select_ident() with a new dest parameter
> 
> Reported-by: Fernando Gont <fernando@gont.com.ar>
> CC: Matt Mackall <mpm@selenic.com>
> CC: Eugene Teo <eugeneteo@kernel.sg>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  drivers/char/random.c  |   15 +++++++++++++++
>  include/linux/random.h |    1 +
>  include/net/inetpeer.h |   13 ++++++++++---
>  include/net/ipv6.h     |   12 +-----------
>  net/ipv4/inetpeer.c    |    7 +++++--
>  net/ipv6/ip6_output.c  |   36 +++++++++++++++++++++++++++++++-----
>  net/ipv6/udp.c         |    2 +-
>  7 files changed, 64 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index d4ddeba..7292819 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1523,6 +1523,21 @@ __u32 secure_ip_id(__be32 daddr)
>  	return half_md4_transform(hash, keyptr->secret);
>  }
>  
> +__u32 secure_ipv6_id(const __be32 daddr[4])
> +{
> +	const struct keydata *keyptr;
> +	__u32 hash[4];
> +
> +	keyptr = get_keyptr();
> +
> +	hash[0] = (__force __u32)daddr[0];
> +	hash[1] = (__force __u32)daddr[1];
> +	hash[2] = (__force __u32)daddr[2];
> +	hash[3] = (__force __u32)daddr[3];
> +
> +	return half_md4_transform(hash, keyptr->secret);
> +}
> +
>  #ifdef CONFIG_INET
>  
>  __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
> diff --git a/include/linux/random.h b/include/linux/random.h
> index fb7ab9d..ce29a04 100644
> --- a/include/linux/random.h
> +++ b/include/linux/random.h
> @@ -58,6 +58,7 @@ extern void get_random_bytes(void *buf, int nbytes);
>  void generate_random_uuid(unsigned char uuid_out[16]);
>  
>  extern __u32 secure_ip_id(__be32 daddr);
> +extern __u32 secure_ipv6_id(const __be32 daddr[4]);
>  extern u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
>  extern u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
>  				      __be16 dport);
> diff --git a/include/net/inetpeer.h b/include/net/inetpeer.h
> index 39d1230..4233e6f 100644
> --- a/include/net/inetpeer.h
> +++ b/include/net/inetpeer.h
> @@ -71,7 +71,7 @@ static inline bool inet_metrics_new(const struct inet_peer *p)
>  }
>  
>  /* can be called with or without local BH being disabled */
> -struct inet_peer	*inet_getpeer(struct inetpeer_addr *daddr, int create);
> +struct inet_peer	*inet_getpeer(const struct inetpeer_addr *daddr, int create);
>  
>  static inline struct inet_peer *inet_getpeer_v4(__be32 v4daddr, int create)
>  {
> @@ -106,11 +106,18 @@ static inline void inet_peer_refcheck(const struct inet_peer *p)
>  
> 
>  /* can be called with or without local BH being disabled */
> -static inline __u16	inet_getid(struct inet_peer *p, int more)
> +static inline int inet_getid(struct inet_peer *p, int more)
>  {
> +	int old, new;
>  	more++;
>  	inet_peer_refcheck(p);
> -	return atomic_add_return(more, &p->ip_id_count) - more;
> +	do {
> +		old = atomic_read(&p->ip_id_count);
> +		new = old + more;
> +		if (!new)
> +			new = 1;
> +	} while (atomic_cmpxchg(&p->ip_id_count, old, new) != old);
> +	return new;
>  }
>  
>  #endif /* _NET_INETPEER_H */
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index c033ed0..3b5ac1f 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -463,17 +463,7 @@ static inline int ipv6_addr_diff(const struct in6_addr *a1, const struct in6_add
>  	return __ipv6_addr_diff(a1, a2, sizeof(struct in6_addr));
>  }
>  
> -static __inline__ void ipv6_select_ident(struct frag_hdr *fhdr)
> -{
> -	static u32 ipv6_fragmentation_id = 1;
> -	static DEFINE_SPINLOCK(ip6_id_lock);
> -
> -	spin_lock_bh(&ip6_id_lock);
> -	fhdr->identification = htonl(ipv6_fragmentation_id);
> -	if (++ipv6_fragmentation_id == 0)
> -		ipv6_fragmentation_id = 1;
> -	spin_unlock_bh(&ip6_id_lock);
> -}
> +extern void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt);
>  
>  /*
>   *	Prototypes exported by ipv6
> diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
> index 90c5f0d..e382138 100644
> --- a/net/ipv4/inetpeer.c
> +++ b/net/ipv4/inetpeer.c
> @@ -391,7 +391,7 @@ static int inet_peer_gc(struct inet_peer_base *base,
>  	return cnt;
>  }
>  
> -struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
> +struct inet_peer *inet_getpeer(const struct inetpeer_addr *daddr, int create)
>  {
>  	struct inet_peer __rcu **stack[PEER_MAXDEPTH], ***stackptr;
>  	struct inet_peer_base *base = family_to_base(daddr->family);
> @@ -436,7 +436,10 @@ relookup:
>  		p->daddr = *daddr;
>  		atomic_set(&p->refcnt, 1);
>  		atomic_set(&p->rid, 0);
> -		atomic_set(&p->ip_id_count, secure_ip_id(daddr->addr.a4));
> +		atomic_set(&p->ip_id_count,
> +				(daddr->family == AF_INET) ?
> +					secure_ip_id(daddr->addr.a4) :
> +					secure_ipv6_id(daddr->addr.a6));
>  		p->tcp_ts_stamp = 0;
>  		p->metrics[RTAX_LOCK-1] = INETPEER_METRICS_NEW;
>  		p->rate_tokens = 0;
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 8db0e48..32e5339 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -596,6 +596,31 @@ int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
>  	return offset;
>  }
>  
> +void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt)
> +{
> +	static atomic_t ipv6_fragmentation_id;
> +	int old, new;
> +
> +	if (rt) {
> +		struct inet_peer *peer;
> +
> +		if (!rt->rt6i_peer)
> +			rt6_bind_peer(rt, 1);
> +		peer = rt->rt6i_peer;
> +		if (peer) {
> +			fhdr->identification = htonl(inet_getid(peer, 0));
> +			return;
> +		}
> +	}
> +	do {
> +		old = atomic_read(&ipv6_fragmentation_id);
> +		new = old + 1;
> +		if (!new)
> +			new = 1;
> +	} while (atomic_cmpxchg(&ipv6_fragmentation_id, old, new) != old);
> +	fhdr->identification = htonl(new);
> +}
> +
>  int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
>  {
>  	struct sk_buff *frag;
> @@ -680,7 +705,7 @@ int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
>  		skb_reset_network_header(skb);
>  		memcpy(skb_network_header(skb), tmp_hdr, hlen);
>  
> -		ipv6_select_ident(fh);
> +		ipv6_select_ident(fh, rt);
>  		fh->nexthdr = nexthdr;
>  		fh->reserved = 0;
>  		fh->frag_off = htons(IP6_MF);
> @@ -826,7 +851,7 @@ slow_path:
>  		fh->nexthdr = nexthdr;
>  		fh->reserved = 0;
>  		if (!frag_id) {
> -			ipv6_select_ident(fh);
> +			ipv6_select_ident(fh, rt);
>  			frag_id = fh->identification;
>  		} else
>  			fh->identification = frag_id;
> @@ -1076,7 +1101,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
>  			int getfrag(void *from, char *to, int offset, int len,
>  			int odd, struct sk_buff *skb),
>  			void *from, int length, int hh_len, int fragheaderlen,
> -			int transhdrlen, int mtu,unsigned int flags)
> +			int transhdrlen, int mtu,unsigned int flags,
> +			struct rt6_info *rt)
>  
>  {
>  	struct sk_buff *skb;
> @@ -1120,7 +1146,7 @@ static inline int ip6_ufo_append_data(struct sock *sk,
>  		skb_shinfo(skb)->gso_size = (mtu - fragheaderlen -
>  					     sizeof(struct frag_hdr)) & ~7;
>  		skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> -		ipv6_select_ident(&fhdr);
> +		ipv6_select_ident(&fhdr, rt);
>  		skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
>  		__skb_queue_tail(&sk->sk_write_queue, skb);
>  
> @@ -1286,7 +1312,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
>  
>  			err = ip6_ufo_append_data(sk, getfrag, from, length,
>  						  hh_len, fragheaderlen,
> -						  transhdrlen, mtu, flags);
> +						  transhdrlen, mtu, flags, rt);
>  			if (err)
>  				goto error;
>  			return 0;
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 328985c..29213b5 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -1359,7 +1359,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, u32 features)
>  	fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
>  	fptr->nexthdr = nexthdr;
>  	fptr->reserved = 0;
> -	ipv6_select_ident(fptr);
> +	ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));
>  
>  	/* Fragment the skb. ipv6 header and the remaining fields of the
>  	 * fragment header are updated in ipv6_gso_segment()
> 


-- 
Mathematics is the supreme nostalgia of our time.



^ permalink raw reply

* Re: [PATCH net-next]vhost: fix condition check for # of outstanding dma buffers
From: Shirley Ma @ 2011-07-19 20:56 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David Miller, netdev, jasowang
In-Reply-To: <20110719190945.GB8667@redhat.com>

On Tue, 2011-07-19 at 22:09 +0300, Michael S. Tsirkin wrote:
> On Tue, Jul 19, 2011 at 11:37:58AM -0700, Shirley Ma wrote:
> > Signed-off-by: Shirley Ma <xma@us.ibm.com>
> > ---
> > 
> >  drivers/vhost/net.c |    6 ++++--
> >  1 files changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 70ac604..83cb738 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -189,8 +189,10 @@ static void handle_tx(struct vhost_net *net)
> >                               break;
> >                       }
> >                       /* If more outstanding DMAs, queue the work */
> > -                     if (unlikely(vq->upend_idx - vq->done_idx >
> > -                                  VHOST_MAX_PEND)) {
> > +                     if (unlikely((vq->upend_idx - vq->done_idx >
> > +                                     VHOST_MAX_PEND) ||
> > +                                  (vq->upend_idx - vq->done_idx >
> > +                                      VHOST_MAX_PEND -
> UIO_MAXIOV))) {
> 
> Could you please explain why this makes sense please?
> VHOST_MAX_PEND is 128 UIO_MAXIOV is 1024 so
> the result is negative?

I thought it is equal to:

if (vq->upend_idx > vq->done_idx) 
	check vq->upend_idx - vq->done_idx > VHOST_MAX_PEND
if (vq->upend_idx < vq->done_idx)
	check vq->upend_idx + UIO_MAXIOV - vq->done_idx > VHOST_MAX_PEND
	

> I thought upend_idx - done_idx is exactly the number
> of buffers, so once we get too many we stop until
> one gets freed?

They are index, so in vhost zerocopy callback, we can get the idx right
away.

> 
> >                               tx_poll_start(net, sock);
> >                               set_bit(SOCK_ASYNC_NOSPACE,
> &sock->flags);
> >                               break;
> > 


^ permalink raw reply

* Re: [PATCH] vhost: clean up outstanding buffers before setting vring
From: Shirley Ma @ 2011-07-19 20:50 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David Miller, netdev, jasowang
In-Reply-To: <20110719194956.GC8667@redhat.com>

On Tue, 2011-07-19 at 22:49 +0300, Michael S. Tsirkin wrote:
> On Tue, Jul 19, 2011 at 11:02:26AM -0700, Shirley Ma wrote:
> > The outstanding DMA buffers need to be clean up before setting vring
> in
> > vhost. Otherwise the vring would be out of sync.
> > 
> > Signed-off-by: Shirley Ma<xma@us.ibm.com>
> 
> I suspect what is missing is calling
> vhost_zerocopy_signal_used then?
> 
> If yes we probably should do it after
> changing the backend, not on vring set.

I think vhost_zerocopy_signal_used might not be sufficient. But we can
test it out by remove/reloading the guest virtio_net module.

> > ---
> > 
> >  drivers/vhost/vhost.c |   11 +++++++++--
> >  1 files changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index c14c42b..d6315b4 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -445,8 +445,10 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> >                       vhost_poll_flush(&dev->vqs[i].poll);
> >               }
> >               /* Wait for all lower device DMAs done. */
> > -             if (dev->vqs[i].ubufs)
> > +             if (dev->vqs[i].ubufs) {
> >                       vhost_ubuf_put_and_wait(dev->vqs[i].ubufs);
> > +                     kfree(dev->vqs[i].ubufs);
> > +             }
> >  
> >               /* Signal guest as appropriate. */
> >               vhost_zerocopy_signal_used(&dev->vqs[i]);
> > @@ -651,6 +653,12 @@ static long vhost_set_vring(struct vhost_dev
> *d, int ioctl, void __user *argp)
> >       vq = d->vqs + idx;
> >  
> >       mutex_lock(&vq->mutex);
> > +     /* Wait for all lower device DMAs done. */
> > +     if (vq->ubufs)
> > +             vhost_ubuf_put_and_wait(vq->ubufs);
> 
> Could you elaborate on the problem you observe please?
> At least in theory, existing code flushes outstanding
> requests when backend is changed.
> And since vring set verifies no backend is active,
> we should be fine?

The problem encounters when guest rmmod virtio_net module, then reload
the module, and configure the interface, it complains about some ring id
is not a head. With this patch, the problem is solved. 

> 
> > +
> > +     /* Signal guest as appropriate. */
> > +     vhost_zerocopy_signal_used(vq);
> >  
> >       switch (ioctl) {
> >       case VHOST_SET_VRING_NUM:
> > @@ -1592,7 +1600,6 @@ void vhost_ubuf_put_and_wait(struct
> vhost_ubuf_ref *ubufs)
> >  {
> >       kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
> >       wait_event(ubufs->wait, !atomic_read(&ubufs->kref.refcount));
> > -     kfree(ubufs);
> 
> Won't this leak memory when ubufs are switched in
> vhost_net_set_backend? 

Right, I forgot to check net.c, whenever it calls
vhot_ubuf_put_and_wait, it should call kfree(ubufs).




^ permalink raw reply

* [PATCH net-next-2.6] ipv6: make fragment identifications less predictable
From: Eric Dumazet @ 2011-07-19 20:47 UTC (permalink / raw)
  To: Fernando Gont, David Miller; +Cc: security, Eugene Teo, netdev, Matt Mackall
In-Reply-To: <1311089463.2375.42.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

IPv6 fragment identification generation is way beyond what we use for
IPv4 : It uses a single generator. Its not scalable and allows DOS
attacks.

Now inetpeer is IPv6 aware, we can use it to provide a more secure and
scalable frag ident generator (per destination, instead of system wide)

This patch :
1) defines a new secure_ipv6_id() helper
2) extends inet_getid() to provide 32bit results
3) extends ipv6_select_ident() with a new dest parameter

Reported-by: Fernando Gont <fernando@gont.com.ar>
CC: Matt Mackall <mpm@selenic.com>
CC: Eugene Teo <eugeneteo@kernel.sg>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/char/random.c  |   15 +++++++++++++++
 include/linux/random.h |    1 +
 include/net/inetpeer.h |   13 ++++++++++---
 include/net/ipv6.h     |   12 +-----------
 net/ipv4/inetpeer.c    |    7 +++++--
 net/ipv6/ip6_output.c  |   36 +++++++++++++++++++++++++++++++-----
 net/ipv6/udp.c         |    2 +-
 7 files changed, 64 insertions(+), 22 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d4ddeba..7292819 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1523,6 +1523,21 @@ __u32 secure_ip_id(__be32 daddr)
 	return half_md4_transform(hash, keyptr->secret);
 }
 
+__u32 secure_ipv6_id(const __be32 daddr[4])
+{
+	const struct keydata *keyptr;
+	__u32 hash[4];
+
+	keyptr = get_keyptr();
+
+	hash[0] = (__force __u32)daddr[0];
+	hash[1] = (__force __u32)daddr[1];
+	hash[2] = (__force __u32)daddr[2];
+	hash[3] = (__force __u32)daddr[3];
+
+	return half_md4_transform(hash, keyptr->secret);
+}
+
 #ifdef CONFIG_INET
 
 __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
diff --git a/include/linux/random.h b/include/linux/random.h
index fb7ab9d..ce29a04 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -58,6 +58,7 @@ extern void get_random_bytes(void *buf, int nbytes);
 void generate_random_uuid(unsigned char uuid_out[16]);
 
 extern __u32 secure_ip_id(__be32 daddr);
+extern __u32 secure_ipv6_id(const __be32 daddr[4]);
 extern u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
 extern u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 				      __be16 dport);
diff --git a/include/net/inetpeer.h b/include/net/inetpeer.h
index 39d1230..4233e6f 100644
--- a/include/net/inetpeer.h
+++ b/include/net/inetpeer.h
@@ -71,7 +71,7 @@ static inline bool inet_metrics_new(const struct inet_peer *p)
 }
 
 /* can be called with or without local BH being disabled */
-struct inet_peer	*inet_getpeer(struct inetpeer_addr *daddr, int create);
+struct inet_peer	*inet_getpeer(const struct inetpeer_addr *daddr, int create);
 
 static inline struct inet_peer *inet_getpeer_v4(__be32 v4daddr, int create)
 {
@@ -106,11 +106,18 @@ static inline void inet_peer_refcheck(const struct inet_peer *p)
 
 
 /* can be called with or without local BH being disabled */
-static inline __u16	inet_getid(struct inet_peer *p, int more)
+static inline int inet_getid(struct inet_peer *p, int more)
 {
+	int old, new;
 	more++;
 	inet_peer_refcheck(p);
-	return atomic_add_return(more, &p->ip_id_count) - more;
+	do {
+		old = atomic_read(&p->ip_id_count);
+		new = old + more;
+		if (!new)
+			new = 1;
+	} while (atomic_cmpxchg(&p->ip_id_count, old, new) != old);
+	return new;
 }
 
 #endif /* _NET_INETPEER_H */
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index c033ed0..3b5ac1f 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -463,17 +463,7 @@ static inline int ipv6_addr_diff(const struct in6_addr *a1, const struct in6_add
 	return __ipv6_addr_diff(a1, a2, sizeof(struct in6_addr));
 }
 
-static __inline__ void ipv6_select_ident(struct frag_hdr *fhdr)
-{
-	static u32 ipv6_fragmentation_id = 1;
-	static DEFINE_SPINLOCK(ip6_id_lock);
-
-	spin_lock_bh(&ip6_id_lock);
-	fhdr->identification = htonl(ipv6_fragmentation_id);
-	if (++ipv6_fragmentation_id == 0)
-		ipv6_fragmentation_id = 1;
-	spin_unlock_bh(&ip6_id_lock);
-}
+extern void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt);
 
 /*
  *	Prototypes exported by ipv6
diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index 90c5f0d..e382138 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -391,7 +391,7 @@ static int inet_peer_gc(struct inet_peer_base *base,
 	return cnt;
 }
 
-struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
+struct inet_peer *inet_getpeer(const struct inetpeer_addr *daddr, int create)
 {
 	struct inet_peer __rcu **stack[PEER_MAXDEPTH], ***stackptr;
 	struct inet_peer_base *base = family_to_base(daddr->family);
@@ -436,7 +436,10 @@ relookup:
 		p->daddr = *daddr;
 		atomic_set(&p->refcnt, 1);
 		atomic_set(&p->rid, 0);
-		atomic_set(&p->ip_id_count, secure_ip_id(daddr->addr.a4));
+		atomic_set(&p->ip_id_count,
+				(daddr->family == AF_INET) ?
+					secure_ip_id(daddr->addr.a4) :
+					secure_ipv6_id(daddr->addr.a6));
 		p->tcp_ts_stamp = 0;
 		p->metrics[RTAX_LOCK-1] = INETPEER_METRICS_NEW;
 		p->rate_tokens = 0;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 8db0e48..32e5339 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -596,6 +596,31 @@ int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
 	return offset;
 }
 
+void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt)
+{
+	static atomic_t ipv6_fragmentation_id;
+	int old, new;
+
+	if (rt) {
+		struct inet_peer *peer;
+
+		if (!rt->rt6i_peer)
+			rt6_bind_peer(rt, 1);
+		peer = rt->rt6i_peer;
+		if (peer) {
+			fhdr->identification = htonl(inet_getid(peer, 0));
+			return;
+		}
+	}
+	do {
+		old = atomic_read(&ipv6_fragmentation_id);
+		new = old + 1;
+		if (!new)
+			new = 1;
+	} while (atomic_cmpxchg(&ipv6_fragmentation_id, old, new) != old);
+	fhdr->identification = htonl(new);
+}
+
 int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 {
 	struct sk_buff *frag;
@@ -680,7 +705,7 @@ int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 		skb_reset_network_header(skb);
 		memcpy(skb_network_header(skb), tmp_hdr, hlen);
 
-		ipv6_select_ident(fh);
+		ipv6_select_ident(fh, rt);
 		fh->nexthdr = nexthdr;
 		fh->reserved = 0;
 		fh->frag_off = htons(IP6_MF);
@@ -826,7 +851,7 @@ slow_path:
 		fh->nexthdr = nexthdr;
 		fh->reserved = 0;
 		if (!frag_id) {
-			ipv6_select_ident(fh);
+			ipv6_select_ident(fh, rt);
 			frag_id = fh->identification;
 		} else
 			fh->identification = frag_id;
@@ -1076,7 +1101,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 			int getfrag(void *from, char *to, int offset, int len,
 			int odd, struct sk_buff *skb),
 			void *from, int length, int hh_len, int fragheaderlen,
-			int transhdrlen, int mtu,unsigned int flags)
+			int transhdrlen, int mtu,unsigned int flags,
+			struct rt6_info *rt)
 
 {
 	struct sk_buff *skb;
@@ -1120,7 +1146,7 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 		skb_shinfo(skb)->gso_size = (mtu - fragheaderlen -
 					     sizeof(struct frag_hdr)) & ~7;
 		skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
-		ipv6_select_ident(&fhdr);
+		ipv6_select_ident(&fhdr, rt);
 		skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
 		__skb_queue_tail(&sk->sk_write_queue, skb);
 
@@ -1286,7 +1312,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 
 			err = ip6_ufo_append_data(sk, getfrag, from, length,
 						  hh_len, fragheaderlen,
-						  transhdrlen, mtu, flags);
+						  transhdrlen, mtu, flags, rt);
 			if (err)
 				goto error;
 			return 0;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 328985c..29213b5 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1359,7 +1359,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, u32 features)
 	fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
 	fptr->nexthdr = nexthdr;
 	fptr->reserved = 0;
-	ipv6_select_ident(fptr);
+	ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));
 
 	/* Fragment the skb. ipv6 header and the remaining fields of the
 	 * fragment header are updated in ipv6_gso_segment()



^ permalink raw reply related

* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Eric Dumazet @ 2011-07-19 20:41 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Neil Horman, netdev, Alexey Dobriyan, David S. Miller
In-Reply-To: <20110719202922.GA2352@minipsycho>

Le mardi 19 juillet 2011 à 22:29 +0200, Jiri Pirko a écrit :

> You are right, but it may not cause panic, right? In case this patch
> would cause significant performance regression, how about to just forbid
> pktgen to run on soft-netdevs ?
> 

Please do

Note : a sysadmin has other ways to make a machine panic or reboot or
halt...

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox