Netdev List
 help / color / mirror / Atom feed
* Re: [patch net-next] switchdev: fix typo in inline function definition
From: David Miller @ 2015-01-18 17:24 UTC (permalink / raw)
  To: jiri; +Cc: netdev, sfeldma
In-Reply-To: <1421573156-6101-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Sun, 18 Jan 2015 10:25:56 +0100

> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

Oops, applied, thanks!

^ permalink raw reply

* Re: [PATCH net-next] iproute2: bridge: support vlan range
From: Scott Feldman @ 2015-01-18 17:44 UTC (permalink / raw)
  To: roopa; +Cc: Netdev, shemminger, vyasevic@redhat.com, Wilson Kok
In-Reply-To: <54BB78DA.6060208@cumulusnetworks.com>

On Sun, Jan 18, 2015 at 1:11 AM, roopa <roopa@cumulusnetworks.com> wrote:
> On 1/17/15, 5:35 PM, Scott Feldman wrote:
>>
>> On Thu, Jan 15, 2015 at 10:52 PM,  <roopa@cumulusnetworks.com> wrote:
>>>
>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>
>>> This patch adds vlan range support to bridge command
>>> using the newly added vinfo flags BRIDGE_VLAN_INFO_RANGE_BEGIN and
>>> BRIDGE_VLAN_INFO_RANGE_END.
>>>
>
>>
>>> +                               vinfo.flags |=
>>> BRIDGE_VLAN_INFO_RANGE_BEGIN;
>>> +                       } else {
>>> +                               vinfo.vid = atoi(*argv);
>>> +                       }
>>>                  } else if (strcmp(*argv, "self") == 0) {
>>>                          flags |= BRIDGE_FLAGS_SELF;
>>>                  } else if (strcmp(*argv, "master") == 0) {
>>> @@ -67,7 +78,7 @@ static int vlan_modify(int cmd, int argc, char **argv)
>>>                  argc--; argv++;
>>>          }
>>>
>>> -       if (d == NULL || vid == -1) {
>>> +       if (d == NULL || vinfo.vid == -1) {
>>
>> Where was vinfo.vid initialized to -1?  Maybe use vid rather than
>> vinfo.vid in the code above where parsing the arg, and continue using
>> vid and vid_end until final put of vinfo.
>>
> There is already a "memset(&vinfo, 0, sizeof(vinfo));"  in the code in the
> beginning of the function.

That's the problem...vinfo.vid is initialized to 0, not -1, so
checking if vinfo.vid == -1 is always false.

^ permalink raw reply

* Re: BW regression after "tcp: refine TSO autosizing"
From: Eric Dumazet @ 2015-01-18 17:48 UTC (permalink / raw)
  To: Eyal Perry
  Cc: Or Gerlitz, Linux Netdev List, Amir Vadai, Yevgeny Petrilin,
	Saeed Mahameed, Ido Shamay, Amir Ancel, Eyal Perry
In-Reply-To: <CAM_57Zgx3DoQbQMffAJNMeZHa8Yb8M+JuwEH=1Fo_Xou+DkgcQ@mail.gmail.com>

On Sun, 2015-01-18 at 18:22 +0200, Eyal Perry wrote:

> 
> Please let me know if you see something in the results.

Getting high throughput on a single flow means lot of tweaking.

For a start, mlx4 is known to have interrupt mitigation that can hurt,
as the TX interrupt timer is restarted for every packet that is
delivered to the NIC.

ethtool -c ethX
..
tx-usecs: 16
tx-frames: 16
tx-usecs-irq: 0
tx-frames-irq: 256
...

-> TX IRQ can be delayed by 16*16 = 256 usec.

Can you try :

ethtool -C ethX tx-usecs 2 tx-frames 2

Or even

ethtool -C ethX tx-usecs 1 tx-frames 1

Interrupt mitigation is a trade-off.

If one customer wants high throughput on a single flow, then you might
remove interrupt mitigation.

If another customer wants cpu efficiency with thousand of flows, I guess
current mlx4 defaults are pretty good.

^ permalink raw reply

* Re: Wireless scanning while turning off the radio problem..
From: Arend van Spriel @ 2015-01-18 17:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Emmanuel Grumbach, Johannes Berg, David Miller,
	Linux Wireless List, Network Development
In-Reply-To: <CA+55aFzEr-R1anNm91pnM=ujfhBstJikZBVtduphff1PtccHDw@mail.gmail.com>

On 01/18/15 17:53, Linus Torvalds wrote:
> On Sun, Jan 18, 2015 at 11:24 PM, Emmanuel Grumbach<egrumbach@gmail.com>  wrote:
>>
>> we have different scan flows based on the firmware version you have,
>> so it would help if you could tell me what firmware you have.
>
> Sure. It's the larest one I could find
>
>     iwlwifi 0000:01:00.0: loaded firmware version 23.11.10.0 op_mode iwlmvm
>
> with the actual firmware file being 'iwlwifi-7260-10.ucode' from the
> current linux-firmware tree.
>
> Iin a different email Arend van Spriel<arend@broadcom.com>  wrote:
>>
>> The function iwl_trans_pcie_stop_device() put device in low-power and
>> resets the cpu on the device.  So iwl_op_mode_hw_rf_kill ends up in
>> iwl_mvm_set_hw_rfkill_state which schedules cfg80211_rfkill_sync_work
>> and returns true if firmware is running.  The patch below might work.
>
> Any suggestions for how to best try to trigger this for testing?
> Looking at my logs, it turns out that I actually got this three times,
> but they were all on the same boot, and I think the first case might
> just have triggered the later ones.
>
> The trigger was turning off wifi from the wifi settings app due to
> being in an airplane when they were closing the doors. I don't *think*
> there was actually any wifi around at the time, which may or may not
> have made the scanning take longer and made it easier to trigger.
>
> But I've done it before (although this machine has been upgraded to
> F21 reasonably recently, and I did update the ucode file before the
> trip). And I did it afterwards to test. And it happened that one time
> (and then apparently kept happening during suspend/resume/shutdown,
> but as mentioned, I blame that on some sticky problem from the first
> time, and those events in turn happened because I couldn't get
> wireless to work afterwards).
>
> IOW, I'm not at all sure I can recreate it, so your "analyzing the
> source code for how this could happen" may be the only good way..

Your issue occurs when firmware is instructed by user-space, ie. 
wpa_supplicant(?), to do "scheduled scan". This type of scanning is done 
entirely in the device. Typically, this type of scanning is done by 
wpa_supplicant after 3 normal scans in which no configured networks were 
found. The idea is that the host can be idle while the device does the 
scanning and wakes up the host when a configured network is found.

So as you indicated you were in location where none of your configured 
networks were available. Flipping the rfkill switch in that situation is 
the way to trigger the issue.

Regards,
Arend

^ permalink raw reply

* Re: Wireless scanning while turning off the radio problem..
From: Arend van Spriel @ 2015-01-18 18:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Emmanuel Grumbach, Johannes Berg, David Miller,
	Linux Wireless List, Network Development
In-Reply-To: <54BBF207.2030708@broadcom.com>

On 01/18/15 18:48, Arend van Spriel wrote:
> On 01/18/15 17:53, Linus Torvalds wrote:
>> On Sun, Jan 18, 2015 at 11:24 PM, Emmanuel
>> Grumbach<egrumbach@gmail.com> wrote:
>>>
>>> we have different scan flows based on the firmware version you have,
>>> so it would help if you could tell me what firmware you have.
>>
>> Sure. It's the larest one I could find
>>
>> iwlwifi 0000:01:00.0: loaded firmware version 23.11.10.0 op_mode iwlmvm
>>
>> with the actual firmware file being 'iwlwifi-7260-10.ucode' from the
>> current linux-firmware tree.
>>
>> Iin a different email Arend van Spriel<arend@broadcom.com> wrote:
>>>
>>> The function iwl_trans_pcie_stop_device() put device in low-power and
>>> resets the cpu on the device. So iwl_op_mode_hw_rf_kill ends up in
>>> iwl_mvm_set_hw_rfkill_state which schedules cfg80211_rfkill_sync_work
>>> and returns true if firmware is running. The patch below might work.
>>
>> Any suggestions for how to best try to trigger this for testing?
>> Looking at my logs, it turns out that I actually got this three times,
>> but they were all on the same boot, and I think the first case might
>> just have triggered the later ones.
>>
>> The trigger was turning off wifi from the wifi settings app due to
>> being in an airplane when they were closing the doors. I don't *think*
>> there was actually any wifi around at the time, which may or may not
>> have made the scanning take longer and made it easier to trigger.
>>
>> But I've done it before (although this machine has been upgraded to
>> F21 reasonably recently, and I did update the ucode file before the
>> trip). And I did it afterwards to test. And it happened that one time
>> (and then apparently kept happening during suspend/resume/shutdown,
>> but as mentioned, I blame that on some sticky problem from the first
>> time, and those events in turn happened because I couldn't get
>> wireless to work afterwards).
>>
>> IOW, I'm not at all sure I can recreate it, so your "analyzing the
>> source code for how this could happen" may be the only good way..
>
> Your issue occurs when firmware is instructed by user-space, ie.
> wpa_supplicant(?), to do "scheduled scan". This type of scanning is done
> entirely in the device. Typically, this type of scanning is done by
> wpa_supplicant after 3 normal scans in which no configured networks were
> found. The idea is that the host can be idle while the device does the
> scanning and wakes up the host when a configured network is found.
>
> So as you indicated you were in location where none of your configured
> networks were available. Flipping the rfkill switch in that situation is
> the way to trigger the issue.

After a scheduled scan has been set by wpa_supplicant, which is 
difficult to tell from user-space. Checking the nl80211 code it seems 
that kernel sends a netlink event for this so if you run 'iw event' you 
should be able to see the "start_sched_scan" event.

Regards,
Arend

> Regards,
> Arend
> --
> To unsubscribe from this list: send the line "unsubscribe
> linux-wireless" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Wireless scanning while turning off the radio problem..
From: Linus Torvalds @ 2015-01-18 18:40 UTC (permalink / raw)
  To: Arend van Spriel
  Cc: Emmanuel Grumbach, Johannes Berg, David Miller,
	Linux Wireless List, Network Development
In-Reply-To: <54BBF207.2030708-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>

On Mon, Jan 19, 2015 at 5:48 AM, Arend van Spriel <arend-dY08KVG/lbpWk0Htik3J/w@public.gmane.org> wrote:
>
> So as you indicated you were in location where none of your configured
> networks were available. Flipping the rfkill switch in that situation is the
> way to trigger the issue.

So you certainly seem to be able to explain the behavior I saw under
the circumstances they happened.

I suspect the best thing to do is to just apply your patch. I may not
be able to really test it much for the next few days anyway. Emmanuel?

             Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 0/2] bgmac: some fixes to napi usage
From: Hauke Mehrtens @ 2015-01-18 18:49 UTC (permalink / raw)
  To: zajec5, davem; +Cc: netdev, Hauke Mehrtens

I compared the napi documentation with the bgmac driver and found some 
problems in that driver. These two patches should fix the problems.

Hauke Mehrtens (2):
  bgmac: register napi before the device
  bgmac: activate irqs only if there is nothing to poll

 drivers/net/ethernet/broadcom/bgmac.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

-- 
1.9.1

^ permalink raw reply

* [PATCH 1/2] bgmac: register napi before the device
From: Hauke Mehrtens @ 2015-01-18 18:49 UTC (permalink / raw)
  To: zajec5, davem; +Cc: netdev, Hauke Mehrtens
In-Reply-To: <1421606999-23692-1-git-send-email-hauke@hauke-m.de>

napi should get registered before the netdev and not after.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
---
 drivers/net/ethernet/broadcom/bgmac.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
index 05c6af6..aa9f950 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -1515,6 +1515,8 @@ static int bgmac_probe(struct bcma_device *core)
 	if (core->bus->sprom.boardflags_lo & BGMAC_BFL_ENETADM)
 		bgmac_warn(bgmac, "Support for ADMtek ethernet switch not implemented\n");
 
+	netif_napi_add(net_dev, &bgmac->napi, bgmac_poll, BGMAC_WEIGHT);
+
 	err = bgmac_mii_register(bgmac);
 	if (err) {
 		bgmac_err(bgmac, "Cannot register MDIO\n");
@@ -1529,8 +1531,6 @@ static int bgmac_probe(struct bcma_device *core)
 
 	netif_carrier_off(net_dev);
 
-	netif_napi_add(net_dev, &bgmac->napi, bgmac_poll, BGMAC_WEIGHT);
-
 	return 0;
 
 err_mii_unregister:
@@ -1549,9 +1549,9 @@ static void bgmac_remove(struct bcma_device *core)
 {
 	struct bgmac *bgmac = bcma_get_drvdata(core);
 
-	netif_napi_del(&bgmac->napi);
 	unregister_netdev(bgmac->net_dev);
 	bgmac_mii_unregister(bgmac);
+	netif_napi_del(&bgmac->napi);
 	bgmac_dma_free(bgmac);
 	bcma_set_drvdata(core, NULL);
 	free_netdev(bgmac->net_dev);
-- 
1.9.1

^ permalink raw reply related

* [PATCH 2/2] bgmac: activate irqs only if there is nothing to poll
From: Hauke Mehrtens @ 2015-01-18 18:49 UTC (permalink / raw)
  To: zajec5, davem; +Cc: netdev, Hauke Mehrtens
In-Reply-To: <1421606999-23692-1-git-send-email-hauke@hauke-m.de>

IRQs should only get activated when there is nothing to poll in the
queue any more and to after every poll.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
---
 drivers/net/ethernet/broadcom/bgmac.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
index aa9f950..3007d95 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -1167,10 +1167,10 @@ static int bgmac_poll(struct napi_struct *napi, int weight)
 		bgmac->int_status = 0;
 	}
 
-	if (handled < weight)
+	if (handled < weight) {
 		napi_complete(napi);
-
-	bgmac_chip_intrs_on(bgmac);
+		bgmac_chip_intrs_on(bgmac);
+	}
 
 	return handled;
 }
-- 
1.9.1

^ permalink raw reply related

* Re: Wireless scanning while turning off the radio problem..
From: Emmanuel Grumbach @ 2015-01-18 18:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arend van Spriel, Johannes Berg, David Miller,
	Linux Wireless List, Network Development
In-Reply-To: <CA+55aFws=-MY+rUsEny=DsBx9gct93BnfeGXSnOCLxEFtvDj8A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Sun, Jan 18, 2015 at 8:40 PM, Linus Torvalds
<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
> On Mon, Jan 19, 2015 at 5:48 AM, Arend van Spriel <arend-dY08KVG/lbpWk0Htik3J/w@public.gmane.org> wrote:
>>
>> So as you indicated you were in location where none of your configured
>> networks were available. Flipping the rfkill switch in that situation is the
>> way to trigger the issue.
>
> So you certainly seem to be able to explain the behavior I saw under
> the circumstances they happened.
>
> I suspect the best thing to do is to just apply your patch. I may not
> be able to really test it much for the next few days anyway. Emmanuel?
>

Sorry - I was a bit busy.
The patch seems wrong, we can't really call that function from the
rfkill interrupt - it will blow up.
The good news is that I could reproduce the bug based on what Arend
pointed. I totally missed the fact
that it was scheduled scan - thanks Arend for that.

So the system I have here doesn't have HW rfkill so I had to implement
a hook that fakes it,
but I can't reproduce the problem.
I'll come up with a patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Wireless scanning while turning off the radio problem..
From: Emmanuel Grumbach @ 2015-01-18 19:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arend van Spriel, Johannes Berg, David Miller,
	Linux Wireless List, Network Development
In-Reply-To: <CANUX_P1pmB2PD00N2r2LgjNL4fG6RuTPxsO8G2weU+8EeENPTg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Sun, Jan 18, 2015 at 8:52 PM, Emmanuel Grumbach <egrumbach-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Sun, Jan 18, 2015 at 8:40 PM, Linus Torvalds
> <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
>> On Mon, Jan 19, 2015 at 5:48 AM, Arend van Spriel <arend-dY08KVG/lbpWk0Htik3J/w@public.gmane.org> wrote:
>>>
>>> So as you indicated you were in location where none of your configured
>>> networks were available. Flipping the rfkill switch in that situation is the
>>> way to trigger the issue.
>>
>> So you certainly seem to be able to explain the behavior I saw under
>> the circumstances they happened.
>>
>> I suspect the best thing to do is to just apply your patch. I may not
>> be able to really test it much for the next few days anyway. Emmanuel?
>>
>
> Sorry - I was a bit busy.
> The patch seems wrong, we can't really call that function from the
> rfkill interrupt - it will blow up.
> The good news is that I could reproduce the bug based on what Arend
> pointed. I totally missed the fact
> that it was scheduled scan - thanks Arend for that.
>
> So the system I have here doesn't have HW rfkill so I had to implement
> a hook that fakes it,
> but I can't reproduce the problem.
> I'll come up with a patch.


Ok - Here is the patch:

diff --git a/drivers/net/wireless/iwlwifi/mvm/ops.c
b/drivers/net/wireless/iwlwifi/mvm/ops.c
index 384eefd..bbd8054 100644
--- a/drivers/net/wireless/iwlwifi/mvm/ops.c
+++ b/drivers/net/wireless/iwlwifi/mvm/ops.c
@@ -867,6 +867,9 @@ static bool iwl_mvm_set_hw_rfkill_state(struct
iwl_op_mode *op_mode, bool state)
        if (calibrating)
                iwl_abort_notification_waits(&mvm->notif_wait);

+       if (state)
+               mvm->scan_status = IWL_MVM_SCAN_NONE;
+
        /*
         * Stop the device if we run OPERATIONAL firmware or if we are in the
         * middle of the calibrations.


I will send that for internal review, because I am not all that much
familiar with this and I want the guy who wrote that code to take a
look before I send a pull request, but that should help.

In any case, Linus, until you get the fix (which should be sent to
upstream in 12 hours) you can restart the whole wifi stack instead of
rebooting:

sudo modprobe -r iwlwifi
sudo modprobe iwlwifi
sudo killall wpa_supplicant


that should bring your wifi back without the need to reboot.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: Wireless scanning while turning off the radio problem..
From: Emmanuel Grumbach @ 2015-01-18 19:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arend van Spriel, Johannes Berg, David Miller,
	Linux Wireless List, Network Development
In-Reply-To: <CANUX_P3kv5ofQJLGyEvRSgetHg=TJsaVennLJWq3ummgRJHWgg@mail.gmail.com>

Emmanuel Grumbach
egrumbach@gmail.com


On Sun, Jan 18, 2015 at 9:13 PM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
> On Sun, Jan 18, 2015 at 8:52 PM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>> On Sun, Jan 18, 2015 at 8:40 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>> On Mon, Jan 19, 2015 at 5:48 AM, Arend van Spriel <arend@broadcom.com> wrote:
>>>>
>>>> So as you indicated you were in location where none of your configured
>>>> networks were available. Flipping the rfkill switch in that situation is the
>>>> way to trigger the issue.
>>>
>>> So you certainly seem to be able to explain the behavior I saw under
>>> the circumstances they happened.
>>>
>>> I suspect the best thing to do is to just apply your patch. I may not
>>> be able to really test it much for the next few days anyway. Emmanuel?
>>>
>>
>> Sorry - I was a bit busy.
>> The patch seems wrong, we can't really call that function from the
>> rfkill interrupt - it will blow up.
>> The good news is that I could reproduce the bug based on what Arend
>> pointed. I totally missed the fact
>> that it was scheduled scan - thanks Arend for that.
>>
>> So the system I have here doesn't have HW rfkill so I had to implement
>> a hook that fakes it,
>> but I can't reproduce the problem.
>> I'll come up with a patch.
>
>
> Ok - Here is the patch:
>
> diff --git a/drivers/net/wireless/iwlwifi/mvm/ops.c
> b/drivers/net/wireless/iwlwifi/mvm/ops.c
> index 384eefd..bbd8054 100644
> --- a/drivers/net/wireless/iwlwifi/mvm/ops.c
> +++ b/drivers/net/wireless/iwlwifi/mvm/ops.c
> @@ -867,6 +867,9 @@ static bool iwl_mvm_set_hw_rfkill_state(struct
> iwl_op_mode *op_mode, bool state)
>         if (calibrating)
>                 iwl_abort_notification_waits(&mvm->notif_wait);
>
> +       if (state)
> +               mvm->scan_status = IWL_MVM_SCAN_NONE;
> +
>         /*
>          * Stop the device if we run OPERATIONAL firmware or if we are in the
>          * middle of the calibrations.
>
>
> I will send that for internal review, because I am not all that much
> familiar with this and I want the guy who wrote that code to take a
> look before I send a pull request, but that should help.
>
Nah this is bad...

I have a correct patch - pull request on the way.

^ permalink raw reply

* [net-next PATCH v2 1/1] net: sched: Introduce connmark action
From: Jamal Hadi Salim @ 2015-01-18 20:00 UTC (permalink / raw)
  To: davem; +Cc: netdev, nbd, pablo, fw, jiri, Jamal Hadi Salim

From: Felix Fietkau <nbd@openwrt.org>

This tc action allows you to retrieve the connection tracking mark
This action has been used heavily by openwrt for a few years now.

There are known limitations currently:

doesn't work for initial packets, since we only query the ct table.
  Fine given use case is for returning packets

no implicit defrag.
  frags should be rare so fix later..

won't work for more complex tasks, e.g. lookup of other extensions
  since we have no means to store results

we still have a 2nd lookup later on via normal conntrack path.
This shouldn't break anything though since skb->nfct isn't altered.

V2:
remove unnecessary braces
change the action identifier to 14
Fix some stylistic issues caught by checkpatch

Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 include/net/tc_act/tc_connmark.h        |   14 +++
 include/uapi/linux/tc_act/tc_connmark.h |   22 ++++
 net/sched/Kconfig                       |   11 ++
 net/sched/Makefile                      |    1 +
 net/sched/act_connmark.c                |  201 +++++++++++++++++++++++++++++++
 5 files changed, 249 insertions(+)
 create mode 100644 include/net/tc_act/tc_connmark.h
 create mode 100644 include/uapi/linux/tc_act/tc_connmark.h
 create mode 100644 net/sched/act_connmark.c

diff --git a/include/net/tc_act/tc_connmark.h b/include/net/tc_act/tc_connmark.h
new file mode 100644
index 0000000..5c1104c
--- /dev/null
+++ b/include/net/tc_act/tc_connmark.h
@@ -0,0 +1,14 @@
+#ifndef __NET_TC_CONNMARK_H
+#define __NET_TC_CONNMARK_H
+
+#include <net/act_api.h>
+
+struct tcf_connmark_info {
+	struct tcf_common common;
+	u16 zone;
+};
+
+#define to_connmark(a) \
+	container_of(a->priv, struct tcf_connmark_info, common)
+
+#endif /* __NET_TC_CONNMARK_H */
diff --git a/include/uapi/linux/tc_act/tc_connmark.h b/include/uapi/linux/tc_act/tc_connmark.h
new file mode 100644
index 0000000..994b097
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_connmark.h
@@ -0,0 +1,22 @@
+#ifndef __UAPI_TC_CONNMARK_H
+#define __UAPI_TC_CONNMARK_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+
+#define TCA_ACT_CONNMARK 14
+
+struct tc_connmark {
+	tc_gen;
+	__u16 zone;
+};
+
+enum {
+	TCA_CONNMARK_UNSPEC,
+	TCA_CONNMARK_PARMS,
+	TCA_CONNMARK_TM,
+	__TCA_CONNMARK_MAX
+};
+#define TCA_CONNMARK_MAX (__TCA_CONNMARK_MAX - 1)
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index c54c9d9..db20cae 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -698,6 +698,17 @@ config NET_ACT_VLAN
 	  To compile this code as a module, choose M here: the
 	  module will be called act_vlan.
 
+config NET_ACT_CONNMARK
+        tristate "Netfilter Connection Mark Retriever"
+        depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+        ---help---
+	  Say Y here to allow retrieving of conn mark
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_connmark.
+
 config NET_CLS_IND
 	bool "Incoming device classification"
 	depends on NET_CLS_U32 || NET_CLS_FW
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 679f24a..47304cd 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_NET_ACT_SIMP)	+= act_simple.o
 obj-$(CONFIG_NET_ACT_SKBEDIT)	+= act_skbedit.o
 obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
 obj-$(CONFIG_NET_ACT_VLAN)	+= act_vlan.o
+obj-$(CONFIG_NET_ACT_CONNMARK)	+= act_connmark.o
 obj-$(CONFIG_NET_SCH_FIFO)	+= sch_fifo.o
 obj-$(CONFIG_NET_SCH_CBQ)	+= sch_cbq.o
 obj-$(CONFIG_NET_SCH_HTB)	+= sch_htb.o
diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
new file mode 100644
index 0000000..edb92a5
--- /dev/null
+++ b/net/sched/act_connmark.c
@@ -0,0 +1,201 @@
+/*
+ * net/sched/act_connmark.c  netfilter connmark retriever action
+ * skb mark is over-written
+ *
+ * Copyright (c) 2011 Felix Fietkau <nbd@openwrt.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+*/
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/pkt_cls.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/act_api.h>
+#include <uapi/linux/tc_act/tc_connmark.h>
+#include <net/tc_act/tc_connmark.h>
+
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+
+#define CONNMARK_TAB_MASK     3
+
+static struct tcf_hashinfo connmark_hash_info;
+
+static int tcf_connmark(struct sk_buff *skb, const struct tc_action *a,
+			struct tcf_result *res)
+{
+	const struct nf_conntrack_tuple_hash *thash;
+	struct nf_conntrack_tuple tuple;
+	enum ip_conntrack_info ctinfo;
+	struct tcf_connmark_info *ca = a->priv;
+	struct nf_conn *c;
+	int proto;
+
+	spin_lock(&ca->tcf_lock);
+	ca->tcf_tm.lastuse = jiffies;
+	bstats_update(&ca->tcf_bstats, skb);
+
+	if (skb->protocol == htons(ETH_P_IP)) {
+		if (skb->len < sizeof(struct iphdr))
+			goto out;
+
+		proto = NFPROTO_IPV4;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		if (skb->len < sizeof(struct ipv6hdr))
+			goto out;
+
+		proto = NFPROTO_IPV6;
+	} else {
+		goto out;
+	}
+
+	c = nf_ct_get(skb, &ctinfo);
+	if (c) {
+		skb->mark = c->mark;
+		/* using overlimits stats to count how many packets marked */
+		ca->tcf_qstats.overlimits++;
+		nf_ct_put(c);
+		goto out;
+	}
+
+	if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
+			       proto, &tuple))
+		goto out;
+
+	thash = nf_conntrack_find_get(dev_net(skb->dev), ca->zone, &tuple);
+	if (!thash)
+		goto out;
+
+	c = nf_ct_tuplehash_to_ctrack(thash);
+	/* using overlimits stats to count how many packets marked */
+	ca->tcf_qstats.overlimits++;
+	skb->mark = c->mark;
+	nf_ct_put(c);
+
+out:
+	skb->nfct = NULL;
+	spin_unlock(&ca->tcf_lock);
+	return ca->tcf_action;
+}
+
+static const struct nla_policy connmark_policy[TCA_CONNMARK_MAX + 1] = {
+	[TCA_CONNMARK_PARMS] = { .len = sizeof(struct tc_connmark) },
+};
+
+static int tcf_connmark_init(struct net *net, struct nlattr *nla,
+			     struct nlattr *est, struct tc_action *a,
+			     int ovr, int bind)
+{
+	struct nlattr *tb[TCA_CONNMARK_MAX + 1];
+	struct tcf_connmark_info *ci;
+	struct tc_connmark *parm;
+	int ret = 0;
+
+	if (!nla)
+		return -EINVAL;
+
+	ret = nla_parse_nested(tb, TCA_CONNMARK_MAX, nla, connmark_policy);
+	if (ret < 0)
+		return ret;
+
+	parm = nla_data(tb[TCA_CONNMARK_PARMS]);
+
+	if (!tcf_hash_check(parm->index, a, bind)) {
+		ret = tcf_hash_create(parm->index, est, a, sizeof(*ci), bind);
+		if (ret)
+			return ret;
+
+		ci = to_connmark(a);
+		ci->tcf_action = parm->action;
+		ci->zone = parm->zone;
+
+		tcf_hash_insert(a);
+		ret = ACT_P_CREATED;
+	} else {
+		ci = to_connmark(a);
+		if (bind)
+			return 0;
+		tcf_hash_release(a, bind);
+		if (!ovr)
+			return -EEXIST;
+		/* replacing action and zone */
+		ci->tcf_action = parm->action;
+		ci->zone = parm->zone;
+	}
+
+	return ret;
+}
+
+static inline int tcf_connmark_dump(struct sk_buff *skb, struct tc_action *a,
+				    int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_connmark_info *ci = a->priv;
+
+	struct tc_connmark opt = {
+		.index   = ci->tcf_index,
+		.refcnt  = ci->tcf_refcnt - ref,
+		.bindcnt = ci->tcf_bindcnt - bind,
+		.action  = ci->tcf_action,
+		.zone   = ci->zone,
+	};
+	struct tcf_t t;
+
+	if (nla_put(skb, TCA_CONNMARK_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	t.install = jiffies_to_clock_t(jiffies - ci->tcf_tm.install);
+	t.lastuse = jiffies_to_clock_t(jiffies - ci->tcf_tm.lastuse);
+	t.expires = jiffies_to_clock_t(ci->tcf_tm.expires);
+	if (nla_put(skb, TCA_CONNMARK_TM, sizeof(t), &t))
+		goto nla_put_failure;
+
+	return skb->len;
+nla_put_failure:
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static struct tc_action_ops act_connmark_ops = {
+	.kind		=	"connmark",
+	.hinfo		=	&connmark_hash_info,
+	.type		=	TCA_ACT_CONNMARK,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_connmark,
+	.dump		=	tcf_connmark_dump,
+	.init		=	tcf_connmark_init,
+};
+
+MODULE_AUTHOR("Felix Fietkau <nbd@openwrt.org>");
+MODULE_DESCRIPTION("Connection tracking mark restoring");
+MODULE_LICENSE("GPL");
+
+static int __init connmark_init_module(void)
+{
+	int ret;
+
+	ret = tcf_hashinfo_init(&connmark_hash_info, CONNMARK_TAB_MASK);
+	if (ret)
+		return ret;
+
+	return tcf_register_action(&act_connmark_ops, CONNMARK_TAB_MASK);
+}
+
+static void __exit connmark_cleanup_module(void)
+{
+	tcf_unregister_action(&act_connmark_ops);
+}
+
+module_init(connmark_init_module);
+module_exit(connmark_cleanup_module);
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH v4 3/4] can: kvaser_usb: Add support for the Usbcan-II family
From: Ahmed S. Darwish @ 2015-01-18 20:12 UTC (permalink / raw)
  To: Olivier Sobrie
  Cc: Oliver Hartkopp, Wolfgang Grandegger, Marc Kleine-Budde,
	David S. Miller, Paul Gortmaker, Linux-CAN, netdev, LKML
In-Reply-To: <20150112135302.GB18351@hposo>

Hi!

On Mon, Jan 12, 2015 at 02:53:02PM +0100, Olivier Sobrie wrote:
> Hello,
> 
> On Sun, Jan 11, 2015 at 03:36:12PM -0500, Ahmed S. Darwish wrote:
> > From: Ahmed S. Darwish <ahmed.darwish@valeo.com>
> > 

...

> > @@ -98,7 +128,13 @@
> >  #define CMD_START_CHIP_REPLY		27
> >  #define CMD_STOP_CHIP			28
> >  #define CMD_STOP_CHIP_REPLY		29
> > -#define CMD_GET_CARD_INFO2		32
> > +#define CMD_READ_CLOCK			30
> > +#define CMD_READ_CLOCK_REPLY		31
> 
> These two defines are not used.
> 

They were added for completeness: the only gap in our continuous
sequence of command IDs from 12 to 39 ;-) No big deal, to be
removed in the next submission.

...

> > +
> > +struct kvaser_msg_tx_acknowledge_header {
> > +	u8 channel;
> > +	u8 tid;
> > +};
> 
> Is this struct really needed? Can't you simply use
> leaf_msg_tx_acknowledge or usbcan_msg_tx_acknowledge
> structures to read the header.
> Same for kvaser_msg_rx_can_header.
> 

They're added to ensure type-safety throughout the code. Basically
they're the common part of a command that has different wire format
between the Leaf and the USBCan, but share a common header.  Such
notation was only added when it was strictly necessary.

For example, there are three functions where 'rx_can_header' is
referenced in the driver, and one function where 'tx_acknowledge_header'
is referenced. Without such header structure, I'll have to sprinkle
3 to 4 extra blocks of:

	switch (dev->family) {
	       case KVASER_LEAF: 
	       case KVASER_USBCAN:
	}

which would be _really_ ugly. The *_header notation ensures that, in
the body of each function, we're accessing the fields in a very safe
manner.

Thanks,
Darwish

^ permalink raw reply

* Re: [PATCH v4 3/4] can: kvaser_usb: Add support for the Usbcan-II family
From: Marc Kleine-Budde @ 2015-01-18 20:13 UTC (permalink / raw)
  To: Ahmed S. Darwish, Olivier Sobrie
  Cc: Oliver Hartkopp, Wolfgang Grandegger, David S. Miller,
	Paul Gortmaker, Linux-CAN, netdev, LKML
In-Reply-To: <20150118201221.GA15143@linux>

[-- Attachment #1: Type: text/plain, Size: 2137 bytes --]

On 01/18/2015 09:12 PM, Ahmed S. Darwish wrote:
> Hi!
> 
> On Mon, Jan 12, 2015 at 02:53:02PM +0100, Olivier Sobrie wrote:
>> Hello,
>>
>> On Sun, Jan 11, 2015 at 03:36:12PM -0500, Ahmed S. Darwish wrote:
>>> From: Ahmed S. Darwish <ahmed.darwish@valeo.com>
>>>
> 
> ...
> 
>>> @@ -98,7 +128,13 @@
>>>  #define CMD_START_CHIP_REPLY		27
>>>  #define CMD_STOP_CHIP			28
>>>  #define CMD_STOP_CHIP_REPLY		29
>>> -#define CMD_GET_CARD_INFO2		32
>>> +#define CMD_READ_CLOCK			30
>>> +#define CMD_READ_CLOCK_REPLY		31
>>
>> These two defines are not used.
>>
> 
> They were added for completeness: the only gap in our continuous
> sequence of command IDs from 12 to 39 ;-) No big deal, to be
> removed in the next submission.
> 
> ...
> 
>>> +
>>> +struct kvaser_msg_tx_acknowledge_header {
>>> +	u8 channel;
>>> +	u8 tid;
>>> +};
>>
>> Is this struct really needed? Can't you simply use
>> leaf_msg_tx_acknowledge or usbcan_msg_tx_acknowledge
>> structures to read the header.
>> Same for kvaser_msg_rx_can_header.
>>
> 
> They're added to ensure type-safety throughout the code. Basically
> they're the common part of a command that has different wire format
> between the Leaf and the USBCan, but share a common header.  Such
> notation was only added when it was strictly necessary.
> 
> For example, there are three functions where 'rx_can_header' is
> referenced in the driver, and one function where 'tx_acknowledge_header'
> is referenced. Without such header structure, I'll have to sprinkle
> 3 to 4 extra blocks of:
> 
> 	switch (dev->family) {
> 	       case KVASER_LEAF: 
> 	       case KVASER_USBCAN:
> 	}
> 
> which would be _really_ ugly. The *_header notation ensures that, in
> the body of each function, we're accessing the fields in a very safe
> manner.

+1 Keep it as it is.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* [PATCH iproute2 0/3] Unify output for inet sockets
From: Vadim Kochan @ 2015-01-18 20:43 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan

This is series is about to have output information from /proc and
Netlink in one place.

Some short summary:

    1) Make memory info more readable.
    2) Unify output of inet sockets from /proc and Netlink.
    3) Added new '-H, --human' option to show info in more human format
        which is used for memory info meanwhile.

Vadim Kochan (3):
  ss: Make meminfo look little bit more readable
  ss: Unify inet sockets output
  ss: Unify tcp stats output

 include/utils.h |   3 +
 ip/ipaddress.c  |  31 +--
 lib/utils.c     |  40 ++-
 misc/ss.c       | 737 ++++++++++++++++++++++++++++++--------------------------
 4 files changed, 437 insertions(+), 374 deletions(-)

-- 
2.1.3

^ permalink raw reply

* [PATCH iproute2 1/3] ss: Make meminfo look little bit more readable
From: Vadim Kochan @ 2015-01-18 20:43 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan
In-Reply-To: <1421613815-6635-1-git-send-email-vadim4j@gmail.com>

From: Vadim Kochan <vadim4j@gmail.com>

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
 include/utils.h |  3 +++
 ip/ipaddress.c  | 31 ++--------------------
 lib/utils.c     | 40 +++++++++++++++++++++++++++-
 misc/ss.c       | 82 ++++++++++++++++++++++++++++++++++-----------------------
 4 files changed, 93 insertions(+), 63 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index e1fe7cf..6b0b76c 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -5,6 +5,7 @@
 #include <asm/types.h>
 #include <resolv.h>
 #include <stdlib.h>
+#include <stdbool.h>
 
 #include "libnetlink.h"
 #include "ll_map.h"
@@ -162,4 +163,6 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 		char **name, char **type, char **link, char **dev,
 		int *group, int *index);
 
+char *sprint_num(char *str, uint64_t num, bool use_iec);
+
 #endif /* __UTILS_H__ */
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index d5e863d..5a2a956 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -362,41 +362,14 @@ static void print_vfinfo(FILE *fp, struct rtattr *vfinfo)
 
 static void print_num(FILE *fp, unsigned width, uint64_t count)
 {
-	const char *prefix = "kMGTPE";
-	const unsigned int base = use_iec ? 1024 : 1000;
-	uint64_t powi = 1;
-	uint16_t powj = 1;
-	uint8_t precision = 2;
 	char buf[64];
 
-	if (!human_readable || count < base) {
+	if (!human_readable) {
 		fprintf(fp, "%-*"PRIu64" ", width, count);
 		return;
 	}
 
-	/* increase value by a factor of 1000/1024 and print
-	 * if result is something a human can read */
-	for(;;) {
-		powi *= base;
-		if (count / base < powi)
-			break;
-
-		if (!prefix[1])
-			break;
-		++prefix;
-	}
-
-	/* try to guess a good number of digits for precision */
-	for (; precision > 0; precision--) {
-		powj *= 10;
-		if (count / powi < powj)
-			break;
-	}
-
-	snprintf(buf, sizeof(buf), "%.*f%c%s", precision,
-		(double) count / powi, *prefix, use_iec ? "i" : "");
-
-	fprintf(fp, "%-*s ", width, buf);
+	fprintf(fp, "%-*s ", width, sprint_num(buf, count, use_iec));
 }
 
 static void print_link_stats64(FILE *fp, const struct rtnl_link_stats64 *s,
diff --git a/lib/utils.c b/lib/utils.c
index f65ceaa..21e2e78 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -28,7 +28,7 @@
 #include <time.h>
 #include <sys/time.h>
 #include <errno.h>
-
+#include <inttypes.h>
 
 #include "utils.h"
 
@@ -878,3 +878,41 @@ void print_nlmsg_timestamp(FILE *fp, const struct nlmsghdr *n)
 	tstr[strlen(tstr)-1] = 0;
 	fprintf(fp, "Timestamp: %s %lu us\n", tstr, usecs);
 }
+
+char *sprint_num(char *str, uint64_t num, bool use_iec)
+{
+	const char *prefix = "kMGTPE";
+	const unsigned int base = use_iec ? 1024 : 1000;
+	uint64_t powi = 1;
+	uint16_t powj = 1;
+	uint8_t precision = 2;
+
+	if (num < base) {
+		sprintf(str, "%"PRIu64, num);
+		return str;
+	}
+
+	/* increase value by a factor of 1000/1024 and print
+	 * if result is something a human can read */
+	for(;;) {
+		powi *= base;
+		if (num / base < powi)
+			break;
+
+		if (!prefix[1])
+			break;
+		++prefix;
+	}
+
+	/* try to guess a good number of digits for precision */
+	for (; precision > 0; precision--) {
+		powj *= 10;
+		if (num / powi < powj)
+			break;
+	}
+
+	sprintf(str, "%.*f%c%s", precision,
+		(double) num / powi, *prefix, use_iec ? "i" : "");
+
+	return str;
+}
diff --git a/misc/ss.c b/misc/ss.c
index f434f57..c5995ab 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -26,6 +26,7 @@
 #include <fnmatch.h>
 #include <getopt.h>
 #include <stdbool.h>
+#include <inttypes.h>
 
 #include "utils.h"
 #include "rt_names.h"
@@ -96,6 +97,7 @@ int show_tcpinfo = 0;
 int show_bpf = 0;
 int show_proc_ctx = 0;
 int show_sock_ctx = 0;
+bool show_human = 0;
 /* If show_users & show_proc_ctx only do user_ent_hash_build() once */
 int user_ent_hash_build_init = 0;
 
@@ -1598,37 +1600,56 @@ outerr:
 	return ferror(fp) ? -1 : 0;
 }
 
-static char *sprint_bw(char *buf, double bw)
+static char *sprint_bandw(char *buf, double bw)
 {
-	if (bw > 1000000.)
-		sprintf(buf,"%.1fM", bw / 1000000.);
-	else if (bw > 1000.)
-		sprintf(buf,"%.1fK", bw / 1000.);
-	else
-		sprintf(buf, "%g", bw);
+	return sprint_num(buf, bw, false);
+}
 
-	return buf;
+static char *sprint_bytes(char *buf, uint64_t bytes)
+{
+	if (!show_human) {
+		sprintf(buf, "%"PRIu64, bytes);
+		return buf;
+	}
+
+	return sprint_num(buf, bytes, false);
 }
 
 static void print_skmeminfo(struct rtattr *tb[], int attrtype)
 {
+	char buf[64];
 	const __u32 *skmeminfo;
-	if (!tb[attrtype])
+
+	if (!tb[attrtype]) {
+		if (attrtype == INET_DIAG_SKMEMINFO) {
+			if (!tb[INET_DIAG_MEMINFO])
+				return;
+
+			const struct inet_diag_meminfo *minfo =
+				RTA_DATA(tb[INET_DIAG_MEMINFO]);
+
+			printf(" mem:(rd=%s,", sprint_bytes(buf, minfo->idiag_rmem));
+			printf("wr=%s,", sprint_bytes(buf, minfo->idiag_tmem));
+			printf("fwd=%s,", sprint_bytes(buf, minfo->idiag_fmem));
+			printf("wr_queue=%s)", sprint_bytes(buf, minfo->idiag_wmem));
+		}
 		return;
+	}
+
 	skmeminfo = RTA_DATA(tb[attrtype]);
 
-	printf(" skmem:(r%u,rb%u,t%u,tb%u,f%u,w%u,o%u",
-	       skmeminfo[SK_MEMINFO_RMEM_ALLOC],
-	       skmeminfo[SK_MEMINFO_RCVBUF],
-	       skmeminfo[SK_MEMINFO_WMEM_ALLOC],
-	       skmeminfo[SK_MEMINFO_SNDBUF],
-	       skmeminfo[SK_MEMINFO_FWD_ALLOC],
-	       skmeminfo[SK_MEMINFO_WMEM_QUEUED],
-	       skmeminfo[SK_MEMINFO_OPTMEM]);
+	printf(" skmem:(");
+	printf("rd=%s,", sprint_bytes(buf, skmeminfo[SK_MEMINFO_RMEM_ALLOC]));
+	printf("rcv_buf=%s,", sprint_bytes(buf, skmeminfo[SK_MEMINFO_RCVBUF]));
+	printf("wr=%s,", sprint_bytes(buf, skmeminfo[SK_MEMINFO_WMEM_ALLOC]));
+	printf("snd_buf=%s,", sprint_bytes(buf, skmeminfo[SK_MEMINFO_SNDBUF]));
+	printf("fwd=%s,", sprint_bytes(buf, skmeminfo[SK_MEMINFO_FWD_ALLOC]));
+	printf("wr_queue=%s,", sprint_bytes(buf, skmeminfo[SK_MEMINFO_WMEM_QUEUED]));
+	printf("oth=%s", sprint_bytes(buf, skmeminfo[SK_MEMINFO_OPTMEM]));
 
 	if (RTA_PAYLOAD(tb[attrtype]) >=
 		(SK_MEMINFO_BACKLOG + 1) * sizeof(__u32))
-		printf(",bl%u", skmeminfo[SK_MEMINFO_BACKLOG]);
+		printf(",bklog=%s", sprint_bytes(buf, skmeminfo[SK_MEMINFO_BACKLOG]));
 
 	printf(")");
 }
@@ -1639,17 +1660,7 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
 	char b1[64];
 	double rtt = 0;
 
-	if (tb[INET_DIAG_SKMEMINFO]) {
-		print_skmeminfo(tb, INET_DIAG_SKMEMINFO);
-	} else if (tb[INET_DIAG_MEMINFO]) {
-		const struct inet_diag_meminfo *minfo
-			= RTA_DATA(tb[INET_DIAG_MEMINFO]);
-		printf(" mem:(r%u,w%u,f%u,t%u)",
-		       minfo->idiag_rmem,
-		       minfo->idiag_wmem,
-		       minfo->idiag_fmem,
-		       minfo->idiag_tmem);
-	}
+	print_skmeminfo(tb, INET_DIAG_SKMEMINFO);
 
 	if (tb[INET_DIAG_INFO]) {
 		struct tcp_info *info;
@@ -1723,7 +1734,7 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
 
 		if (rtt > 0 && info->tcpi_snd_mss && info->tcpi_snd_cwnd) {
 			printf(" send %sbps",
-			       sprint_bw(b1, (double) info->tcpi_snd_cwnd *
+			       sprint_bandw(b1, (double) info->tcpi_snd_cwnd *
 					 (double) info->tcpi_snd_mss * 8000000.
 					 / rtt));
 		}
@@ -1740,12 +1751,12 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
 		if (info->tcpi_pacing_rate &&
 		    info->tcpi_pacing_rate != ~0ULL) {
 			printf(" pacing_rate %sbps",
-				sprint_bw(b1, info->tcpi_pacing_rate * 8.));
+				sprint_bandw(b1, info->tcpi_pacing_rate * 8.));
 
 			if (info->tcpi_max_pacing_rate &&
 			    info->tcpi_max_pacing_rate != ~0ULL)
 				printf("/%sbps",
-					sprint_bw(b1, info->tcpi_max_pacing_rate * 8.));
+					sprint_bandw(b1, info->tcpi_max_pacing_rate * 8.));
 		}
 		if (info->tcpi_unacked)
 			printf(" unacked:%u", info->tcpi_unacked);
@@ -3207,6 +3218,7 @@ static void _usage(FILE *dest)
 "   -b, --bpf           show bpf filter socket information\n"
 "   -Z, --context       display process SELinux security contexts\n"
 "   -z, --contexts      display process and socket SELinux security contexts\n"
+"   -H, --human         display info in human readable format\n"
 "\n"
 "   -4, --ipv4          display only IP version 4 sockets\n"
 "   -6, --ipv6          display only IP version 6 sockets\n"
@@ -3306,6 +3318,7 @@ static const struct option long_opts[] = {
 	{ "help", 0, 0, 'h' },
 	{ "context", 0, 0, 'Z' },
 	{ "contexts", 0, 0, 'z' },
+	{ "human", 0, 0, 'H' },
 	{ 0 }
 
 };
@@ -3321,7 +3334,7 @@ int main(int argc, char *argv[])
 	struct filter dbs_filter = {};
 	int state_filter = 0;
 
-	while ((ch = getopt_long(argc, argv, "dhaletuwxnro460spbf:miA:D:F:vVzZ",
+	while ((ch = getopt_long(argc, argv, "dhaletuwxnro460spbf:miA:D:F:vVzZH",
 				 long_opts, NULL)) != EOF) {
 		switch(ch) {
 		case 'n':
@@ -3493,6 +3506,9 @@ int main(int argc, char *argv[])
 			show_proc_ctx++;
 			user_ent_hash_build();
 			break;
+		case 'H':
+			show_human = true;
+			break;
 		case 'h':
 		case '?':
 			help();
-- 
2.1.3

^ permalink raw reply related

* [PATCH iproute2 2/3] ss: Unify inet sockets output
From: Vadim Kochan @ 2015-01-18 20:43 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan
In-Reply-To: <1421613815-6635-1-git-send-email-vadim4j@gmail.com>

From: Vadim Kochan <vadim4j@gmail.com>

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
 misc/ss.c | 355 +++++++++++++++++++++++++-------------------------------------
 1 file changed, 144 insertions(+), 211 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index c5995ab..40439b3 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -708,6 +708,7 @@ struct tcpstat
 	int		refcnt;
 	unsigned long long sk;
 	int		rto, ato, qack, cwnd, ssthresh;
+	unsigned int	iface;
 };
 
 static const char *tmr_name[] = {
@@ -746,12 +747,6 @@ static const char *print_ms_timer(int timeout)
 	return buf;
 }
 
-static const char *print_hz_timer(int timeout)
-{
-	int hz = get_user_hz();
-	return print_ms_timer(((timeout*1000) + hz-1)/hz);
-}
-
 struct scache
 {
 	struct scache *next;
@@ -1441,55 +1436,124 @@ out:
 	return res;
 }
 
-static int tcp_show_line(char *line, const struct filter *f, int family)
+static char *proto_name(int protocol)
+{
+	switch (protocol) {
+	case IPPROTO_UDP:
+		return "udp";
+	case IPPROTO_TCP:
+		return "tcp";
+	case IPPROTO_DCCP:
+		return "dccp";
+	}
+
+	return "???";
+}
+
+static void inet_stats_print(struct tcpstat *s, int protocol)
+{
+	char *buf = NULL;
+
+	if (netid_width)
+		printf("%-*s ", netid_width, proto_name(protocol));
+	if (state_width)
+		printf("%-*s ", state_width, sstate_name[s->state]);
+
+	printf("%-6d %-6d ", s->rq, s->wq);
+
+	formatted_print(&s->local, s->lport, s->iface);
+	formatted_print(&s->remote, s->rport, 0);
+
+	if (show_options) {
+		if (s->timer) {
+			if (s->timer > 4)
+				s->timer = 5;
+			printf(" timer:(%s,%s,%d)",
+			       tmr_name[s->timer],
+			       print_ms_timer(s->timeout),
+			       s->retrs);
+		}
+	}
+
+	if (show_proc_ctx || show_sock_ctx) {
+		if (find_entry(s->ino, &buf,
+				(show_proc_ctx & show_sock_ctx) ?
+				PROC_SOCK_CTX : PROC_CTX) > 0) {
+			printf(" users:(%s)", buf);
+			free(buf);
+		}
+	} else if (show_users) {
+		if (find_entry(s->ino, &buf, USERS) > 0) {
+			printf(" users:(%s)", buf);
+			free(buf);
+		}
+	}
+}
+
+static int proc_parse_inet_addr(char *loc, char *rem, int family, struct tcpstat *s)
+{
+	s->local.family = s->remote.family = family;
+	if (family == AF_INET) {
+		sscanf(loc, "%x:%x", s->local.data, (unsigned*)&s->lport);
+		sscanf(rem, "%x:%x", s->remote.data, (unsigned*)&s->rport);
+		s->local.bytelen = s->remote.bytelen = 4;
+		return 0;
+	} else {
+		sscanf(loc, "%08x%08x%08x%08x:%x",
+		       s->local.data,
+		       s->local.data + 1,
+		       s->local.data + 2,
+		       s->local.data + 3,
+		       &s->lport);
+		sscanf(rem, "%08x%08x%08x%08x:%x",
+		       s->remote.data,
+		       s->remote.data + 1,
+		       s->remote.data + 2,
+		       s->remote.data + 3,
+		       &s->rport);
+		s->local.bytelen = s->remote.bytelen = 16;
+		return 0;
+	}
+	return -1;
+}
+
+static int proc_inet_split_line(char *line, char **loc, char **rem, char **data)
 {
-	struct tcpstat s;
-	char *loc, *rem, *data;
-	char opt[256];
-	int n;
 	char *p;
 
 	if ((p = strchr(line, ':')) == NULL)
 		return -1;
-	loc = p+2;
 
-	if ((p = strchr(loc, ':')) == NULL)
+	*loc = p+2;
+	if ((p = strchr(*loc, ':')) == NULL)
 		return -1;
-	p[5] = 0;
-	rem = p+6;
 
-	if ((p = strchr(rem, ':')) == NULL)
+	p[5] = 0;
+	*rem = p+6;
+	if ((p = strchr(*rem, ':')) == NULL)
 		return -1;
+
 	p[5] = 0;
-	data = p+6;
+	*data = p+6;
+	return 0;
+}
 
-	do {
-		int state = (data[1] >= 'A') ? (data[1] - 'A' + 10) : (data[1] - '0');
+static int tcp_show_line(char *line, const struct filter *f, int family)
+{
+	struct tcpstat s = {};
+	char *loc, *rem, *data;
+	char opt[256];
+	int n;
+	int hz = get_user_hz();
 
-		if (!(f->states & (1<<state)))
-			return 0;
-	} while (0);
+	if (proc_inet_split_line(line, &loc, &rem, &data))
+		return -1;
 
-	s.local.family = s.remote.family = family;
-	if (family == AF_INET) {
-		sscanf(loc, "%x:%x", s.local.data, (unsigned*)&s.lport);
-		sscanf(rem, "%x:%x", s.remote.data, (unsigned*)&s.rport);
-		s.local.bytelen = s.remote.bytelen = 4;
-	} else {
-		sscanf(loc, "%08x%08x%08x%08x:%x",
-		       s.local.data,
-		       s.local.data+1,
-		       s.local.data+2,
-		       s.local.data+3,
-		       &s.lport);
-		sscanf(rem, "%08x%08x%08x%08x:%x",
-		       s.remote.data,
-		       s.remote.data+1,
-		       s.remote.data+2,
-		       s.remote.data+3,
-		       &s.rport);
-		s.local.bytelen = s.remote.bytelen = 16;
-	}
+	int state = (data[1] >= 'A') ? (data[1] - 'A' + 10) : (data[1] - '0');
+	if (!(f->states & (1 << state)))
+		return 0;
+
+	proc_parse_inet_addr(loc, rem, family, &s);
 
 	if (f->f && run_ssfilter(f->f, &s) == 0)
 		return 0;
@@ -1511,66 +1575,36 @@ static int tcp_show_line(char *line, const struct filter *f, int family)
 		s.ato = s.qack = 0;
 	}
 
-	if (netid_width)
-		printf("%-*s ", netid_width, "tcp");
-	if (state_width)
-		printf("%-*s ", state_width, sstate_name[s.state]);
-
-	printf("%-6d %-6d ", s.rq, s.wq);
+	s.retrs = s.timer != 1 ? s.probes : s.retrs;
+	s.timeout = (s.timeout * 1000 + hz - 1) / hz;
 
-	formatted_print(&s.local, s.lport, 0);
-	formatted_print(&s.remote, s.rport, 0);
+	inet_stats_print(&s, IPPROTO_TCP);
 
-	if (show_options) {
-		if (s.timer) {
-			if (s.timer > 4)
-				s.timer = 5;
-			printf(" timer:(%s,%s,%d)",
-			       tmr_name[s.timer],
-			       print_hz_timer(s.timeout),
-			       s.timer != 1 ? s.probes : s.retrs);
-		}
+	if (show_details) {
+		if (s.uid)
+			printf(" uid:%u", (unsigned)s.uid);
+		printf(" ino:%u", s.ino);
+		printf(" sk:%llx", s.sk);
+		if (opt[0])
+			printf(" opt:\"%s\"", opt);
 	}
+
 	if (show_tcpinfo) {
-		int hz = get_user_hz();
-		if (s.rto && s.rto != 3*hz)
-			printf(" rto:%g", (double)s.rto/hz);
+		if (s.rto && s.rto != 3 * hz)
+			printf(" rto:%g", (double)s.rto / hz);
 		if (s.ato)
-			printf(" ato:%g", (double)s.ato/hz);
+			printf(" ato:%g", (double)s.ato / hz);
 		if (s.cwnd != 2)
 			printf(" cwnd:%d", s.cwnd);
 		if (s.ssthresh != -1)
 			printf(" ssthresh:%d", s.ssthresh);
-		if (s.qack/2)
-			printf(" qack:%d", s.qack/2);
-		if (s.qack&1)
+		if (s.qack / 2)
+			printf(" qack:%d", s.qack / 2);
+		if (s.qack & 1)
 			printf(" bidir");
 	}
-	char *buf = NULL;
-	if (show_proc_ctx || show_sock_ctx) {
-		if (find_entry(s.ino, &buf,
-				(show_proc_ctx & show_sock_ctx) ?
-				PROC_SOCK_CTX : PROC_CTX) > 0) {
-			printf(" users:(%s)", buf);
-			free(buf);
-		}
-	} else if (show_users) {
-		if (find_entry(s.ino, &buf, USERS) > 0) {
-			printf(" users:(%s)", buf);
-			free(buf);
-		}
-	}
 
-	if (show_details) {
-		if (s.uid)
-			printf(" uid:%u", (unsigned)s.uid);
-		printf(" ino:%u", s.ino);
-		printf(" sk:%llx", s.sk);
-		if (opt[0])
-			printf(" opt:\"%s\"", opt);
-	}
 	printf("\n");
-
 	return 0;
 }
 
@@ -1779,25 +1813,11 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
 	}
 }
 
-static char *proto_name(int protocol)
-{
-	switch (protocol) {
-	case IPPROTO_UDP:
-		return "udp";
-	case IPPROTO_TCP:
-		return "tcp";
-	case IPPROTO_DCCP:
-		return "dccp";
-	}
-
-	return "???";
-}
-
 static int inet_show_sock(struct nlmsghdr *nlh, struct filter *f, int protocol)
 {
 	struct rtattr * tb[INET_DIAG_MAX+1];
 	struct inet_diag_msg *r = NLMSG_DATA(nlh);
-	struct tcpstat s;
+	struct tcpstat s = {};
 
 	parse_rtattr(tb, INET_DIAG_MAX, (struct rtattr*)(r+1),
 		     nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
@@ -1806,52 +1826,28 @@ static int inet_show_sock(struct nlmsghdr *nlh, struct filter *f, int protocol)
 	s.local.family = s.remote.family = r->idiag_family;
 	s.lport = ntohs(r->id.idiag_sport);
 	s.rport = ntohs(r->id.idiag_dport);
+	s.wq = r->idiag_wqueue;
+	s.rq = r->idiag_rqueue;
+	s.timer = r->idiag_timer;
+	s.timeout = r->idiag_expires;
+	s.retrs = r->idiag_retrans;
+	s.ino = r->idiag_inode;
+	s.uid = r->idiag_uid;
+	s.iface = r->id.idiag_if;
+
 	if (s.local.family == AF_INET) {
 		s.local.bytelen = s.remote.bytelen = 4;
 	} else {
 		s.local.bytelen = s.remote.bytelen = 16;
 	}
+
 	memcpy(s.local.data, r->id.idiag_src, s.local.bytelen);
 	memcpy(s.remote.data, r->id.idiag_dst, s.local.bytelen);
 
 	if (f && f->f && run_ssfilter(f->f, &s) == 0)
 		return 0;
 
-	if (netid_width)
-		printf("%-*s ", netid_width, proto_name(protocol));
-	if (state_width)
-		printf("%-*s ", state_width, sstate_name[s.state]);
-
-	printf("%-6d %-6d ", r->idiag_rqueue, r->idiag_wqueue);
-
-	formatted_print(&s.local, s.lport, r->id.idiag_if);
-	formatted_print(&s.remote, s.rport, 0);
-
-	if (show_options) {
-		if (r->idiag_timer) {
-			if (r->idiag_timer > 4)
-				r->idiag_timer = 5;
-			printf(" timer:(%s,%s,%d)",
-			       tmr_name[r->idiag_timer],
-			       print_ms_timer(r->idiag_expires),
-			       r->idiag_retrans);
-		}
-	}
-	char *buf = NULL;
-
-	if (show_proc_ctx || show_sock_ctx) {
-		if (find_entry(r->idiag_inode, &buf,
-				(show_proc_ctx & show_sock_ctx) ?
-				PROC_SOCK_CTX : PROC_CTX) > 0) {
-			printf(" users:(%s)", buf);
-			free(buf);
-		}
-	} else if (show_users) {
-		if (find_entry(r->idiag_inode, &buf, USERS) > 0) {
-			printf(" users:(%s)", buf);
-			free(buf);
-		}
-	}
+	inet_stats_print(&s, protocol);
 
 	if (show_details) {
 		if (r->idiag_uid)
@@ -1867,13 +1863,13 @@ static int inet_show_sock(struct nlmsghdr *nlh, struct filter *f, int protocol)
 			printf(" %c-%c", mask & 1 ? '-' : '<', mask & 2 ? '-' : '>');
 		}
 	}
+
 	if (show_mem || show_tcpinfo) {
 		printf("\n\t");
 		tcp_show_info(nlh, r, tb);
 	}
 
 	printf("\n");
-
 	return 0;
 }
 
@@ -2194,53 +2190,19 @@ outerr:
 
 static int dgram_show_line(char *line, const struct filter *f, int family)
 {
-	struct tcpstat s;
+	struct tcpstat s = {};
 	char *loc, *rem, *data;
 	char opt[256];
 	int n;
-	char *p;
-
-	if ((p = strchr(line, ':')) == NULL)
-		return -1;
-	loc = p+2;
-
-	if ((p = strchr(loc, ':')) == NULL)
-		return -1;
-	p[5] = 0;
-	rem = p+6;
 
-	if ((p = strchr(rem, ':')) == NULL)
+	if (proc_inet_split_line(line, &loc, &rem, &data))
 		return -1;
-	p[5] = 0;
-	data = p+6;
-
-	do {
-		int state = (data[1] >= 'A') ? (data[1] - 'A' + 10) : (data[1] - '0');
 
-		if (!(f->states & (1<<state)))
-			return 0;
-	} while (0);
+	int state = (data[1] >= 'A') ? (data[1] - 'A' + 10) : (data[1] - '0');
+	if (!(f->states & (1 << state)))
+		return 0;
 
-	s.local.family = s.remote.family = family;
-	if (family == AF_INET) {
-		sscanf(loc, "%x:%x", s.local.data, (unsigned*)&s.lport);
-		sscanf(rem, "%x:%x", s.remote.data, (unsigned*)&s.rport);
-		s.local.bytelen = s.remote.bytelen = 4;
-	} else {
-		sscanf(loc, "%08x%08x%08x%08x:%x",
-		       s.local.data,
-		       s.local.data+1,
-		       s.local.data+2,
-		       s.local.data+3,
-		       &s.lport);
-		sscanf(rem, "%08x%08x%08x%08x:%x",
-		       s.remote.data,
-		       s.remote.data+1,
-		       s.remote.data+2,
-		       s.remote.data+3,
-		       &s.rport);
-		s.local.bytelen = s.remote.bytelen = 16;
-	}
+	proc_parse_inet_addr(loc, rem, family, &s);
 
 	if (f->f && run_ssfilter(f->f, &s) == 0)
 		return 0;
@@ -2254,31 +2216,7 @@ static int dgram_show_line(char *line, const struct filter *f, int family)
 	if (n < 9)
 		opt[0] = 0;
 
-	if (netid_width)
-		printf("%-*s ", netid_width, dg_proto);
-	if (state_width)
-		printf("%-*s ", state_width, sstate_name[s.state]);
-
-	printf("%-6d %-6d ", s.rq, s.wq);
-
-	formatted_print(&s.local, s.lport, 0);
-	formatted_print(&s.remote, s.rport, 0);
-
-	char *buf = NULL;
-
-	if (show_proc_ctx || show_sock_ctx) {
-		if (find_entry(s.ino, &buf,
-				(show_proc_ctx & show_sock_ctx) ?
-				PROC_SOCK_CTX : PROC_CTX) > 0) {
-			printf(" users:(%s)", buf);
-			free(buf);
-		}
-	} else if (show_users) {
-		if (find_entry(s.ino, &buf, USERS) > 0) {
-			printf(" users:(%s)", buf);
-			free(buf);
-		}
-	}
+	inet_stats_print(&s, IPPROTO_UDP);
 
 	if (show_details) {
 		if (s.uid)
@@ -2288,12 +2226,11 @@ static int dgram_show_line(char *line, const struct filter *f, int family)
 		if (opt[0])
 			printf(" opt:\"%s\"", opt);
 	}
-	printf("\n");
 
+	printf("\n");
 	return 0;
 }
 
-
 static int udp_show(struct filter *f)
 {
 	FILE *fp = NULL;
@@ -2362,7 +2299,6 @@ outerr:
 	} while (0);
 }
 
-
 struct unixstat
 {
 	struct unixstat *next;
@@ -2376,12 +2312,9 @@ struct unixstat
 	char *name;
 };
 
-
-
 int unix_state_map[] = { SS_CLOSE, SS_SYN_SENT,
 			 SS_ESTABLISHED, SS_CLOSING };
 
-
 #define MAX_UNIX_REMEMBER (1024*1024/sizeof(struct unixstat))
 
 static void unix_list_free(struct unixstat *list)
-- 
2.1.3

^ permalink raw reply related

* [PATCH iproute2 3/3] ss: Unify tcp stats output
From: Vadim Kochan @ 2015-01-18 20:43 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan
In-Reply-To: <1421613815-6635-1-git-send-email-vadim4j@gmail.com>

From: Vadim Kochan <vadim4j@gmail.com>

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
 misc/ss.c | 362 +++++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 231 insertions(+), 131 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 40439b3..73097b2 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -691,24 +691,59 @@ static const char *sstate_namel[] = {
 	[SS_CLOSING] = "closing",
 };
 
+struct dctcpstat
+{
+	unsigned int	ce_state;
+	unsigned int	alpha;
+	unsigned int	ab_ecn;
+	unsigned int	ab_tot;
+	bool		enabled;
+};
+
 struct tcpstat
 {
-	inet_prefix	local;
-	inet_prefix	remote;
-	int		lport;
-	int		rport;
-	int		state;
-	int		rq, wq;
-	int		timer;
-	int		timeout;
-	int		retrs;
-	unsigned	ino;
-	int		probes;
-	unsigned	uid;
-	int		refcnt;
-	unsigned long long sk;
-	int		rto, ato, qack, cwnd, ssthresh;
-	unsigned int	iface;
+	inet_prefix	    local;
+	inet_prefix	    remote;
+	int		    lport;
+	int		    rport;
+	int		    state;
+	int		    rq, wq;
+	unsigned	    ino;
+	unsigned	    uid;
+	int		    refcnt;
+	unsigned int	    iface;
+	unsigned long long  sk;
+	int		    timer;
+	int		    timeout;
+	int		    probes;
+	char		    *cong_alg;
+	double		    rto, ato, rtt, rttvar;
+	int		    qack, cwnd, ssthresh, backoff;
+	double		    send_bps;
+	int		    snd_wscale;
+	int		    rcv_wscale;
+	int		    mss;
+	unsigned int	    lastsnd;
+	unsigned int	    lastrcv;
+	unsigned int	    lastack;
+	double		    pacing_rate;
+	double		    pacing_rate_max;
+	unsigned int	    unacked;
+	unsigned int	    retrans;
+	unsigned int	    retrans_total;
+	unsigned int	    lost;
+	unsigned int	    sacked;
+	unsigned int	    fackets;
+	unsigned int	    reordering;
+	double		    rcv_rtt;
+	int		    rcv_space;
+	bool		    has_ts_opt;
+	bool		    has_sack_opt;
+	bool		    has_ecn_opt;
+	bool		    has_ecnseen_opt;
+	bool		    has_fastopen_opt;
+	bool		    has_wscale_opt;
+	struct dctcpstat    *dctcp;
 };
 
 static const char *tmr_name[] = {
@@ -1471,7 +1506,7 @@ static void inet_stats_print(struct tcpstat *s, int protocol)
 			printf(" timer:(%s,%s,%d)",
 			       tmr_name[s->timer],
 			       print_ms_timer(s->timeout),
-			       s->retrs);
+			       s->retrans);
 		}
 	}
 
@@ -1538,8 +1573,107 @@ static int proc_inet_split_line(char *line, char **loc, char **rem, char **data)
 	return 0;
 }
 
+static char *sprint_bandw(char *buf, double bw)
+{
+	return sprint_num(buf, bw, false);
+}
+
+static char *sprint_bytes(char *buf, uint64_t bytes)
+{
+	if (!show_human) {
+		sprintf(buf, "%"PRIu64, bytes);
+		return buf;
+	}
+
+	return sprint_num(buf, bytes, false);
+}
+
+static void tcp_stats_print(struct tcpstat *s)
+{
+	char b1[64];
+
+	if (s->has_ts_opt)
+		printf(" ts");
+	if (s->has_sack_opt)
+		printf(" sack");
+	if (s->has_ecn_opt)
+		printf(" ecn");
+	if (s->has_ecnseen_opt)
+		printf(" ecnseen");
+	if (s->has_fastopen_opt)
+		printf(" fastopen");
+	if (s->cong_alg)
+		printf(" %s", s->cong_alg);
+	if (s->has_wscale_opt)
+		printf(" wscale:%d,%d", s->snd_wscale, s->rcv_wscale);
+	if (s->rto)
+		printf(" rto:%g", s->rto);
+	if (s->backoff)
+		printf(" backoff:%u", s->backoff);
+	if (s->rtt)
+		printf(" rtt:%g/%g", s->rtt, s->rttvar);
+	if (s->ato)
+		printf(" ato:%g", s->ato);
+
+	if (s->qack)
+		printf(" qack:%d", s->qack);
+	if (s->qack & 1)
+		printf(" bidir");
+
+	if (s->mss)
+		printf(" mss:%d", s->mss);
+	if (s->cwnd && s->cwnd != 2)
+		printf(" cwnd:%d", s->cwnd);
+	if (s->ssthresh)
+		printf(" ssthresh:%d", s->ssthresh);
+
+	if (s->dctcp && s->dctcp->enabled) {
+		struct dctcpstat *dctcp = s->dctcp;
+
+		printf(" ce_state %u alpha %u ab_ecn %u ab_tot %u",
+				dctcp->ce_state, dctcp->alpha, dctcp->ab_ecn,
+				dctcp->ab_tot);
+	} else if (s->dctcp) {
+		printf(" fallback_mode");
+	}
+
+	if (s->send_bps)
+		printf(" send %sbps", sprint_bandw(b1, s->send_bps));
+	if (s->lastsnd)
+		printf(" lastsnd:%u", s->lastsnd);
+	if (s->lastrcv)
+		printf(" lastrcv:%u", s->lastrcv);
+	if (s->lastack)
+		printf(" lastack:%u", s->lastack);
+
+	if (s->pacing_rate) {
+		printf(" pacing_rate %sbps", sprint_bandw(b1, s->pacing_rate));
+		if (s->pacing_rate_max)
+				printf("/%sbps", sprint_bandw(b1,
+							s->pacing_rate_max));
+	}
+
+	if (s->unacked)
+		printf(" unacked:%u", s->unacked);
+	if (s->retrans || s->retrans_total)
+		printf(" retrans:%u/%u", s->retrans, s->retrans_total);
+	if (s->lost)
+		printf(" lost:%u", s->lost);
+	if (s->sacked && s->state != SS_LISTEN)
+		printf(" sacked:%u", s->sacked);
+	if (s->fackets)
+		printf(" fackets:%u", s->fackets);
+	if (s->reordering != 3)
+		printf(" reordering:%d", s->reordering);
+	if (s->rcv_rtt)
+		printf(" rcv_rtt:%g", s->rcv_rtt);
+	if (s->rcv_space)
+		printf(" rcv_space:%d", s->rcv_space);
+}
+
 static int tcp_show_line(char *line, const struct filter *f, int family)
 {
+	int rto = 0, ato = 0;
 	struct tcpstat s = {};
 	char *loc, *rem, *data;
 	char opt[256];
@@ -1561,22 +1695,27 @@ static int tcp_show_line(char *line, const struct filter *f, int family)
 	opt[0] = 0;
 	n = sscanf(data, "%x %x:%x %x:%x %x %d %d %u %d %llx %d %d %d %d %d %[^\n]\n",
 		   &s.state, &s.wq, &s.rq,
-		   &s.timer, &s.timeout, &s.retrs, &s.uid, &s.probes, &s.ino,
-		   &s.refcnt, &s.sk, &s.rto, &s.ato, &s.qack,
+		   &s.timer, &s.timeout, &s.retrans, &s.uid, &s.probes, &s.ino,
+		   &s.refcnt, &s.sk, &rto, &ato, &s.qack,
 		   &s.cwnd, &s.ssthresh, opt);
 
 	if (n < 17)
 		opt[0] = 0;
 
 	if (n < 12) {
-		s.rto = 0;
+		rto = 0;
 		s.cwnd = 2;
 		s.ssthresh = -1;
-		s.ato = s.qack = 0;
+		ato = s.qack = 0;
 	}
 
-	s.retrs = s.timer != 1 ? s.probes : s.retrs;
-	s.timeout = (s.timeout * 1000 + hz - 1) / hz;
+	s.retrans   = s.timer != 1 ? s.probes : s.retrans;
+	s.timeout   = (s.timeout * 1000 + hz - 1) / hz;
+	s.ato	    = (double)ato / hz;
+	s.qack	   /= 2;
+	s.rto	    = (double)rto;
+	s.ssthresh  = s.ssthresh == -1 ? 0 : s.ssthresh;
+	s.rto	    = s.rto != 3 * hz  ? s.rto / hz : 0;
 
 	inet_stats_print(&s, IPPROTO_TCP);
 
@@ -1589,20 +1728,8 @@ static int tcp_show_line(char *line, const struct filter *f, int family)
 			printf(" opt:\"%s\"", opt);
 	}
 
-	if (show_tcpinfo) {
-		if (s.rto && s.rto != 3 * hz)
-			printf(" rto:%g", (double)s.rto / hz);
-		if (s.ato)
-			printf(" ato:%g", (double)s.ato / hz);
-		if (s.cwnd != 2)
-			printf(" cwnd:%d", s.cwnd);
-		if (s.ssthresh != -1)
-			printf(" ssthresh:%d", s.ssthresh);
-		if (s.qack / 2)
-			printf(" qack:%d", s.qack / 2);
-		if (s.qack & 1)
-			printf(" bidir");
-	}
+	if (show_tcpinfo)
+		tcp_stats_print(&s);
 
 	printf("\n");
 	return 0;
@@ -1634,21 +1761,6 @@ outerr:
 	return ferror(fp) ? -1 : 0;
 }
 
-static char *sprint_bandw(char *buf, double bw)
-{
-	return sprint_num(buf, bw, false);
-}
-
-static char *sprint_bytes(char *buf, uint64_t bytes)
-{
-	if (!show_human) {
-		sprintf(buf, "%"PRIu64, bytes);
-		return buf;
-	}
-
-	return sprint_num(buf, bytes, false);
-}
-
 static void print_skmeminfo(struct rtattr *tb[], int attrtype)
 {
 	char buf[64];
@@ -1688,11 +1800,13 @@ static void print_skmeminfo(struct rtattr *tb[], int attrtype)
 	printf(")");
 }
 
+#define TCPI_HAS_OPT(info, opt) !!(info->tcpi_options & (opt))
+
 static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
 		struct rtattr *tb[])
 {
-	char b1[64];
 	double rtt = 0;
+	struct tcpstat s = {};
 
 	print_skmeminfo(tb, INET_DIAG_SKMEMINFO);
 
@@ -1709,39 +1823,49 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
 			info = RTA_DATA(tb[INET_DIAG_INFO]);
 
 		if (show_options) {
-			if (info->tcpi_options & TCPI_OPT_TIMESTAMPS)
-				printf(" ts");
-			if (info->tcpi_options & TCPI_OPT_SACK)
-				printf(" sack");
-			if (info->tcpi_options & TCPI_OPT_ECN)
-				printf(" ecn");
-			if (info->tcpi_options & TCPI_OPT_ECN_SEEN)
-				printf(" ecnseen");
-			if (info->tcpi_options & TCPI_OPT_SYN_DATA)
-				printf(" fastopen");
-		}
-
-		if (tb[INET_DIAG_CONG])
-			printf(" %s", rta_getattr_str(tb[INET_DIAG_CONG]));
-
-		if (info->tcpi_options & TCPI_OPT_WSCALE)
-			printf(" wscale:%d,%d", info->tcpi_snd_wscale,
-			       info->tcpi_rcv_wscale);
+			s.has_ts_opt	   = TCPI_HAS_OPT(info, TCPI_OPT_TIMESTAMPS);
+			s.has_sack_opt	   = TCPI_HAS_OPT(info, TCPI_OPT_SACK);
+			s.has_ecn_opt	   = TCPI_HAS_OPT(info, TCPI_OPT_ECN);
+			s.has_ecnseen_opt  = TCPI_HAS_OPT(info, TCPI_OPT_ECN_SEEN);
+			s.has_fastopen_opt = TCPI_HAS_OPT(info, TCPI_OPT_SYN_DATA);
+		}
+
+		if (tb[INET_DIAG_CONG]) {
+			const char *cong_attr = rta_getattr_str(tb[INET_DIAG_CONG]);
+			s.cong_alg = malloc(strlen(cong_attr + 1));
+			strcpy(s.cong_alg, cong_attr);
+		}
+
+		if (TCPI_HAS_OPT(info, TCPI_OPT_WSCALE)) {
+			s.has_wscale_opt  = true;
+			s.snd_wscale	  = info->tcpi_snd_wscale;
+			s.rcv_wscale	  = info->tcpi_rcv_wscale;
+		}
+
 		if (info->tcpi_rto && info->tcpi_rto != 3000000)
-			printf(" rto:%g", (double)info->tcpi_rto/1000);
-		if (info->tcpi_backoff)
-			printf(" backoff:%u", info->tcpi_backoff);
-		if (info->tcpi_rtt)
-			printf(" rtt:%g/%g", (double)info->tcpi_rtt/1000,
-			       (double)info->tcpi_rttvar/1000);
-		if (info->tcpi_ato)
-			printf(" ato:%g", (double)info->tcpi_ato/1000);
-		if (info->tcpi_snd_mss)
-			printf(" mss:%d", info->tcpi_snd_mss);
-		if (info->tcpi_snd_cwnd != 2)
-			printf(" cwnd:%d", info->tcpi_snd_cwnd);
+			s.rto = (double)info->tcpi_rto / 1000;
+
+		s.backoff	 = info->tcpi_backoff;
+		s.rtt		 = (double)info->tcpi_rtt / 1000;
+		s.rttvar	 = (double)info->tcpi_rttvar / 1000;
+		s.ato		 = (double)info->tcpi_rttvar / 1000;
+		s.mss		 = info->tcpi_snd_mss;
+		s.rcv_space	 = info->tcpi_rcv_space;
+		s.rcv_rtt	 = (double)info->tcpi_rcv_rtt / 1000;
+		s.lastsnd	 = info->tcpi_last_data_sent;
+		s.lastrcv	 = info->tcpi_last_data_recv;
+		s.lastack	 = info->tcpi_last_ack_recv;
+		s.unacked	 = info->tcpi_unacked;
+		s.retrans	 = info->tcpi_retrans;
+		s.retrans_total  = info->tcpi_total_retrans;
+		s.lost		 = info->tcpi_lost;
+		s.sacked	 = info->tcpi_sacked;
+		s.reordering	 = info->tcpi_reordering;
+		s.rcv_space	 = info->tcpi_rcv_space;
+		s.cwnd		 = info->tcpi_snd_cwnd;
+
 		if (info->tcpi_snd_ssthresh < 0xFFFF)
-			printf(" ssthresh:%d", info->tcpi_snd_ssthresh);
+			s.ssthresh = info->tcpi_snd_ssthresh;
 
 		rtt = (double) info->tcpi_rtt;
 		if (tb[INET_DIAG_VEGASINFO]) {
@@ -1749,67 +1873,43 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
 				= RTA_DATA(tb[INET_DIAG_VEGASINFO]);
 
 			if (vinfo->tcpv_enabled &&
-			    vinfo->tcpv_rtt && vinfo->tcpv_rtt != 0x7fffffff)
+					vinfo->tcpv_rtt && vinfo->tcpv_rtt != 0x7fffffff)
 				rtt =  vinfo->tcpv_rtt;
 		}
 
 		if (tb[INET_DIAG_DCTCPINFO]) {
+			struct dctcpstat *dctcp = malloc(sizeof(struct
+						dctcpstat));
+
 			const struct tcp_dctcp_info *dinfo
 				= RTA_DATA(tb[INET_DIAG_DCTCPINFO]);
 
-			if (dinfo->dctcp_enabled) {
-				printf(" ce_state %u alpha %u ab_ecn %u ab_tot %u",
-				       dinfo->dctcp_ce_state, dinfo->dctcp_alpha,
-				       dinfo->dctcp_ab_ecn, dinfo->dctcp_ab_tot);
-			} else {
-				printf(" fallback_mode");
-			}
+			dctcp->enabled	= !!dinfo->dctcp_enabled;
+			dctcp->ce_state = dinfo->dctcp_ce_state;
+			dctcp->alpha	= dinfo->dctcp_alpha;
+			dctcp->ab_ecn	= dinfo->dctcp_ab_ecn;
+			dctcp->ab_tot	= dinfo->dctcp_ab_tot;
+			s.dctcp		= dctcp;
 		}
 
 		if (rtt > 0 && info->tcpi_snd_mss && info->tcpi_snd_cwnd) {
-			printf(" send %sbps",
-			       sprint_bandw(b1, (double) info->tcpi_snd_cwnd *
-					 (double) info->tcpi_snd_mss * 8000000.
-					 / rtt));
+			s.send_bps = (double) info->tcpi_snd_cwnd *
+				(double)info->tcpi_snd_mss * 8000000. / rtt;
 		}
 
-		if (info->tcpi_last_data_sent)
-			printf(" lastsnd:%u", info->tcpi_last_data_sent);
-
-		if (info->tcpi_last_data_recv)
-			printf(" lastrcv:%u", info->tcpi_last_data_recv);
-
-		if (info->tcpi_last_ack_recv)
-			printf(" lastack:%u", info->tcpi_last_ack_recv);
-
 		if (info->tcpi_pacing_rate &&
-		    info->tcpi_pacing_rate != ~0ULL) {
-			printf(" pacing_rate %sbps",
-				sprint_bandw(b1, info->tcpi_pacing_rate * 8.));
+				info->tcpi_pacing_rate != ~0ULL) {
+			s.pacing_rate = info->tcpi_pacing_rate * 8.;
 
 			if (info->tcpi_max_pacing_rate &&
-			    info->tcpi_max_pacing_rate != ~0ULL)
-				printf("/%sbps",
-					sprint_bandw(b1, info->tcpi_max_pacing_rate * 8.));
-		}
-		if (info->tcpi_unacked)
-			printf(" unacked:%u", info->tcpi_unacked);
-		if (info->tcpi_retrans || info->tcpi_total_retrans)
-			printf(" retrans:%u/%u", info->tcpi_retrans,
-			       info->tcpi_total_retrans);
-		if (info->tcpi_lost)
-			printf(" lost:%u", info->tcpi_lost);
-		if (info->tcpi_sacked && r->idiag_state != SS_LISTEN)
-			printf(" sacked:%u", info->tcpi_sacked);
-		if (info->tcpi_fackets)
-			printf(" fackets:%u", info->tcpi_fackets);
-		if (info->tcpi_reordering != 3)
-			printf(" reordering:%d", info->tcpi_reordering);
-		if (info->tcpi_rcv_rtt)
-			printf(" rcv_rtt:%g", (double) info->tcpi_rcv_rtt/1000);
-		if (info->tcpi_rcv_space)
-			printf(" rcv_space:%d", info->tcpi_rcv_space);
-
+					info->tcpi_max_pacing_rate != ~0ULL)
+				s.pacing_rate_max = info->tcpi_max_pacing_rate * 8.;
+		}
+		tcp_stats_print(&s);
+		if (s.dctcp)
+			free(s.dctcp);
+		if (s.cong_alg)
+			free(s.cong_alg);
 	}
 }
 
@@ -1830,7 +1930,7 @@ static int inet_show_sock(struct nlmsghdr *nlh, struct filter *f, int protocol)
 	s.rq = r->idiag_rqueue;
 	s.timer = r->idiag_timer;
 	s.timeout = r->idiag_expires;
-	s.retrs = r->idiag_retrans;
+	s.retrans = r->idiag_retrans;
 	s.ino = r->idiag_inode;
 	s.uid = r->idiag_uid;
 	s.iface = r->id.idiag_if;
-- 
2.1.3

^ permalink raw reply related

* Re: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
From: roopa @ 2015-01-18 20:55 UTC (permalink / raw)
  To: Scott Feldman
  Cc: stephen@networkplumber.org, David S. Miller, Jamal Hadi Salim,
	Jiří Pírko, Arad, Ronen, Thomas Graf,
	john fastabend, vyasevic@redhat.com, Netdev, Wilson Kok,
	Andy Gospodarek
In-Reply-To: <54BB7874.90201@cumulusnetworks.com>

On 1/18/15, 1:10 AM, roopa wrote:
> On 1/17/15, 5:05 PM, Scott Feldman wrote:
>> On Fri, Jan 16, 2015 at 11:32 PM, <roopa@cumulusnetworks.com> wrote:
>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>
>>> On a Linux bridge with bridge forwarding offloaded to a switch ASIC,
>>> there is a need to not re-forward the frames that come up to the
>>> kernel in software.
>>>
>>> Typically these are broadcast or multicast packets forwarded by the
>>> hardware to multiple destination ports including sending a copy of
>>> the packet to the kernel (e.g. an arp broadcast).
>>> The bridge driver will try to forward the packet again, resulting in
>>> two copies of the same packet.
>>>
>>> These packets can also come up to the kernel for logging when they hit
>>> a LOG acl in hardware.
>>>
>>> This patch makes forwarding a flag on the port similar to
>>> learn and flood and drops the packet just before forwarding.
>>> (The forwarding disable on a bridge is tested to work on our boxes.
>>> The bridge port flag addition is only compile tested.
>>> This will need to be further refined to cover cases where a 
>>> non-switch port
>>> is bridged to a switch port etc. We will submit more patches to cover
>>> all cases if we agree on this approach).
>> Good topic to bring up, thanks for proposing a patch.  There is indeed
>> duplicate pkts sent out in the case where both the bridge and the
>> offloaded device are flooding these non-unicast pkts, such as ARP
>> requests.  We do have per-port control today over unicast flooding
>> using BR_FLOOD (IFLA_BRPORT_UNICAST_FLOOD).
>>
>> As you point out, this doesn't solve the case for non-offloaded ports
>> bridged with switch ports.  If this port setting is enabled on an
>> offloaded switch port, for example, the non-offloaded port can't get
>> an ARP request resolved, if the MAC is behind the offloaded switch
>> port.  But do we care?  Is there a use-case for this one, mixing
>> offloaded and non-offloaded ports in a bridge?
>
> Not sure. I don't know the use case, but I think I might have heard 
> that there could be a case
>  where a switch port could be bridged with a vm's port running on the 
> switch. (?)

Ignoring the above usecase for a bit. And thinking through this again, 
It appears that this check should
be only on the ingress port, no ?

If the ingress bridge port is an offloaded port, don't flood or forward 
because hardware has already done it.
And this is best done with the offload feature flag on the bridge port.

But, If we bring in the usecase, where a bridge has offloaded and 
non-offloaded ports mixed,
there should be an option to disable this check and to achieve that its 
better for this to be a
flag on the port maintained by the bridge driver like the one proposed 
by this patch (BR_FORWARD).

Thanks,
Roopa

^ permalink raw reply

* Re: [net-next PATCH v2 1/1] net: sched: Introduce connmark action
From: Cong Wang @ 2015-01-18 21:00 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: David Miller, netdev, nbd, pablo, Florian Westphal,
	Jiří Pírko
In-Reply-To: <1421611245-18492-1-git-send-email-jhs@emojatatu.com>

On Sun, Jan 18, 2015 at 12:00 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> +
> +MODULE_AUTHOR("Felix Fietkau <nbd@openwrt.org>");
> +MODULE_DESCRIPTION("Connection tracking mark restoring");
> +MODULE_LICENSE("GPL");


Please move these to the bottom.

> +
> +static int __init connmark_init_module(void)
> +{
> +       int ret;
> +
> +       ret = tcf_hashinfo_init(&connmark_hash_info, CONNMARK_TAB_MASK);
> +       if (ret)
> +               return ret;
> +

Is this against latest net-next? We don't need to init the hashinfo anymore,
tcf_register_action() already does that.

> +       return tcf_register_action(&act_connmark_ops, CONNMARK_TAB_MASK);
> +}
> +
> +static void __exit connmark_cleanup_module(void)
> +{
> +       tcf_unregister_action(&act_connmark_ops);
> +}
> +

Even if we really needed, you forgot to call tcf_hashinfo_destroy()?

Thanks.

^ permalink raw reply

* Re: [net-next PATCH v2 1/1] net: sched: Introduce connmark action
From: Jamal Hadi Salim @ 2015-01-18 21:22 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, netdev, nbd, pablo, Florian Westphal,
	Jiří Pírko
In-Reply-To: <CAHA+R7Muz_d47NX01tKO7v7V7m=5mKf+BPP1jo9dEH==8R6bgw@mail.gmail.com>

On 01/18/15 16:00, Cong Wang wrote:
> On Sun, Jan 18, 2015 at 12:00 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>> +
>> +MODULE_AUTHOR("Felix Fietkau <nbd@openwrt.org>");
>> +MODULE_DESCRIPTION("Connection tracking mark restoring");
>> +MODULE_LICENSE("GPL");
>
>
> Please move these to the bottom.
>

Done.

>> +
>> +static int __init connmark_init_module(void)
>> +{
>> +       int ret;
>> +
>> +       ret = tcf_hashinfo_init(&connmark_hash_info, CONNMARK_TAB_MASK);
>> +       if (ret)
>> +               return ret;
>> +
>
> Is this against latest net-next? We don't need to init the hashinfo anymore,
> tcf_register_action() already does that.
>

The code itself has been living outside the tree - so i am just doing
only necessary transforms. The above was needed because the action
was maintaining its own hash.
I will convert to the new mode.

>> +       return tcf_register_action(&act_connmark_ops, CONNMARK_TAB_MASK);
>> +}
>> +
>> +static void __exit connmark_cleanup_module(void)
>> +{
>> +       tcf_unregister_action(&act_connmark_ops);
>> +}
>> +
>
> Even if we really needed, you forgot to call tcf_hashinfo_destroy()?
>

Didnt follow - why do you need tcf_hashinfo_destroy()?

Will send v3 shortly

cheers,
jamal

> Thanks.
>

^ permalink raw reply

* [net-next PATCH v3 1/1] net: sched: Introduce connmark action
From: Jamal Hadi Salim @ 2015-01-18 21:35 UTC (permalink / raw)
  To: davem; +Cc: netdev, nbd, pablo, fw, jiri, cwang, Jamal Hadi Salim

From: Felix Fietkau <nbd@openwrt.org>

This tc action allows you to retrieve the connection tracking mark
This action has been used heavily by openwrt for a few years now.

There are known limitations currently:

doesn't work for initial packets, since we only query the ct table.
  Fine given use case is for returning packets

no implicit defrag.
  frags should be rare so fix later..

won't work for more complex tasks, e.g. lookup of other extensions
  since we have no means to store results

we still have a 2nd lookup later on via normal conntrack path.
This shouldn't break anything though since skb->nfct isn't altered.

V2:
remove unnecessary braces (Jiri)
change the action identifier to 14 (Jiri)
Fix some stylistic issues caught by checkpatch
V3:
Move module params to bottom (Cong)
Get rid of tcf_hashinfo_init and friends and conform to newer API (Cong)

Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 include/net/tc_act/tc_connmark.h        |   14 +++
 include/uapi/linux/tc_act/tc_connmark.h |   22 ++++
 net/sched/Kconfig                       |   11 ++
 net/sched/Makefile                      |    1 +
 net/sched/act_connmark.c                |  192 +++++++++++++++++++++++++++++++
 5 files changed, 240 insertions(+)
 create mode 100644 include/net/tc_act/tc_connmark.h
 create mode 100644 include/uapi/linux/tc_act/tc_connmark.h
 create mode 100644 net/sched/act_connmark.c

diff --git a/include/net/tc_act/tc_connmark.h b/include/net/tc_act/tc_connmark.h
new file mode 100644
index 0000000..5c1104c
--- /dev/null
+++ b/include/net/tc_act/tc_connmark.h
@@ -0,0 +1,14 @@
+#ifndef __NET_TC_CONNMARK_H
+#define __NET_TC_CONNMARK_H
+
+#include <net/act_api.h>
+
+struct tcf_connmark_info {
+	struct tcf_common common;
+	u16 zone;
+};
+
+#define to_connmark(a) \
+	container_of(a->priv, struct tcf_connmark_info, common)
+
+#endif /* __NET_TC_CONNMARK_H */
diff --git a/include/uapi/linux/tc_act/tc_connmark.h b/include/uapi/linux/tc_act/tc_connmark.h
new file mode 100644
index 0000000..994b097
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_connmark.h
@@ -0,0 +1,22 @@
+#ifndef __UAPI_TC_CONNMARK_H
+#define __UAPI_TC_CONNMARK_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+
+#define TCA_ACT_CONNMARK 14
+
+struct tc_connmark {
+	tc_gen;
+	__u16 zone;
+};
+
+enum {
+	TCA_CONNMARK_UNSPEC,
+	TCA_CONNMARK_PARMS,
+	TCA_CONNMARK_TM,
+	__TCA_CONNMARK_MAX
+};
+#define TCA_CONNMARK_MAX (__TCA_CONNMARK_MAX - 1)
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index c54c9d9..db20cae 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -698,6 +698,17 @@ config NET_ACT_VLAN
 	  To compile this code as a module, choose M here: the
 	  module will be called act_vlan.
 
+config NET_ACT_CONNMARK
+        tristate "Netfilter Connection Mark Retriever"
+        depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+        ---help---
+	  Say Y here to allow retrieving of conn mark
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_connmark.
+
 config NET_CLS_IND
 	bool "Incoming device classification"
 	depends on NET_CLS_U32 || NET_CLS_FW
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 679f24a..47304cd 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_NET_ACT_SIMP)	+= act_simple.o
 obj-$(CONFIG_NET_ACT_SKBEDIT)	+= act_skbedit.o
 obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
 obj-$(CONFIG_NET_ACT_VLAN)	+= act_vlan.o
+obj-$(CONFIG_NET_ACT_CONNMARK)	+= act_connmark.o
 obj-$(CONFIG_NET_SCH_FIFO)	+= sch_fifo.o
 obj-$(CONFIG_NET_SCH_CBQ)	+= sch_cbq.o
 obj-$(CONFIG_NET_SCH_HTB)	+= sch_htb.o
diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
new file mode 100644
index 0000000..8e47251
--- /dev/null
+++ b/net/sched/act_connmark.c
@@ -0,0 +1,192 @@
+/*
+ * net/sched/act_connmark.c  netfilter connmark retriever action
+ * skb mark is over-written
+ *
+ * Copyright (c) 2011 Felix Fietkau <nbd@openwrt.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+*/
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/pkt_cls.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/act_api.h>
+#include <uapi/linux/tc_act/tc_connmark.h>
+#include <net/tc_act/tc_connmark.h>
+
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+
+#define CONNMARK_TAB_MASK     3
+
+static int tcf_connmark(struct sk_buff *skb, const struct tc_action *a,
+			struct tcf_result *res)
+{
+	const struct nf_conntrack_tuple_hash *thash;
+	struct nf_conntrack_tuple tuple;
+	enum ip_conntrack_info ctinfo;
+	struct tcf_connmark_info *ca = a->priv;
+	struct nf_conn *c;
+	int proto;
+
+	spin_lock(&ca->tcf_lock);
+	ca->tcf_tm.lastuse = jiffies;
+	bstats_update(&ca->tcf_bstats, skb);
+
+	if (skb->protocol == htons(ETH_P_IP)) {
+		if (skb->len < sizeof(struct iphdr))
+			goto out;
+
+		proto = NFPROTO_IPV4;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		if (skb->len < sizeof(struct ipv6hdr))
+			goto out;
+
+		proto = NFPROTO_IPV6;
+	} else {
+		goto out;
+	}
+
+	c = nf_ct_get(skb, &ctinfo);
+	if (c) {
+		skb->mark = c->mark;
+		/* using overlimits stats to count how many packets marked */
+		ca->tcf_qstats.overlimits++;
+		nf_ct_put(c);
+		goto out;
+	}
+
+	if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
+			       proto, &tuple))
+		goto out;
+
+	thash = nf_conntrack_find_get(dev_net(skb->dev), ca->zone, &tuple);
+	if (!thash)
+		goto out;
+
+	c = nf_ct_tuplehash_to_ctrack(thash);
+	/* using overlimits stats to count how many packets marked */
+	ca->tcf_qstats.overlimits++;
+	skb->mark = c->mark;
+	nf_ct_put(c);
+
+out:
+	skb->nfct = NULL;
+	spin_unlock(&ca->tcf_lock);
+	return ca->tcf_action;
+}
+
+static const struct nla_policy connmark_policy[TCA_CONNMARK_MAX + 1] = {
+	[TCA_CONNMARK_PARMS] = { .len = sizeof(struct tc_connmark) },
+};
+
+static int tcf_connmark_init(struct net *net, struct nlattr *nla,
+			     struct nlattr *est, struct tc_action *a,
+			     int ovr, int bind)
+{
+	struct nlattr *tb[TCA_CONNMARK_MAX + 1];
+	struct tcf_connmark_info *ci;
+	struct tc_connmark *parm;
+	int ret = 0;
+
+	if (!nla)
+		return -EINVAL;
+
+	ret = nla_parse_nested(tb, TCA_CONNMARK_MAX, nla, connmark_policy);
+	if (ret < 0)
+		return ret;
+
+	parm = nla_data(tb[TCA_CONNMARK_PARMS]);
+
+	if (!tcf_hash_check(parm->index, a, bind)) {
+		ret = tcf_hash_create(parm->index, est, a, sizeof(*ci), bind);
+		if (ret)
+			return ret;
+
+		ci = to_connmark(a);
+		ci->tcf_action = parm->action;
+		ci->zone = parm->zone;
+
+		tcf_hash_insert(a);
+		ret = ACT_P_CREATED;
+	} else {
+		ci = to_connmark(a);
+		if (bind)
+			return 0;
+		tcf_hash_release(a, bind);
+		if (!ovr)
+			return -EEXIST;
+		/* replacing action and zone */
+		ci->tcf_action = parm->action;
+		ci->zone = parm->zone;
+	}
+
+	return ret;
+}
+
+static inline int tcf_connmark_dump(struct sk_buff *skb, struct tc_action *a,
+				    int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_connmark_info *ci = a->priv;
+
+	struct tc_connmark opt = {
+		.index   = ci->tcf_index,
+		.refcnt  = ci->tcf_refcnt - ref,
+		.bindcnt = ci->tcf_bindcnt - bind,
+		.action  = ci->tcf_action,
+		.zone   = ci->zone,
+	};
+	struct tcf_t t;
+
+	if (nla_put(skb, TCA_CONNMARK_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	t.install = jiffies_to_clock_t(jiffies - ci->tcf_tm.install);
+	t.lastuse = jiffies_to_clock_t(jiffies - ci->tcf_tm.lastuse);
+	t.expires = jiffies_to_clock_t(ci->tcf_tm.expires);
+	if (nla_put(skb, TCA_CONNMARK_TM, sizeof(t), &t))
+		goto nla_put_failure;
+
+	return skb->len;
+nla_put_failure:
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static struct tc_action_ops act_connmark_ops = {
+	.kind		=	"connmark",
+	.type		=	TCA_ACT_CONNMARK,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_connmark,
+	.dump		=	tcf_connmark_dump,
+	.init		=	tcf_connmark_init,
+};
+
+static int __init connmark_init_module(void)
+{
+	return tcf_register_action(&act_connmark_ops, CONNMARK_TAB_MASK);
+}
+
+static void __exit connmark_cleanup_module(void)
+{
+	tcf_unregister_action(&act_connmark_ops);
+}
+
+module_init(connmark_init_module);
+module_exit(connmark_cleanup_module);
+MODULE_AUTHOR("Felix Fietkau <nbd@openwrt.org>");
+MODULE_DESCRIPTION("Connection tracking mark restoring");
+MODULE_LICENSE("GPL");
+
-- 
1.7.9.5

^ permalink raw reply related

* Re: [RFC PATCH v2 1/2] net: af_packet support for direct ring access in user space
From: Neil Horman @ 2015-01-18 22:02 UTC (permalink / raw)
  To: David Miller
  Cc: john.fastabend, netdev, danny.zhou, dborkman, john.ronciak,
	hannes, brouer
In-Reply-To: <20150114.153509.1264618607573705890.davem@davemloft.net>

On Wed, Jan 14, 2015 at 03:35:09PM -0500, David Miller wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Mon, 12 Jan 2015 20:35:11 -0800
> 
> > +		if ((region.direction != DMA_BIDIRECTIONAL) &&
> > +		    (region.direction != DMA_TO_DEVICE) &&
> > +		    (region.direction != DMA_FROM_DEVICE))
> > +			return -EFAULT;
>  ...
> > +		if ((umem->nmap == npages) &&
> > +		    (0 != dma_map_sg(dev->dev.parent, umem->sglist,
> > +				     umem->nmap, region.direction))) {
> > +			region.iova = sg_dma_address(umem->sglist) + offset;
> 
> I am having trouble seeing how this can work.
> 
> dma_map_{single,sg}() mappings need synchronization after a DMA
> transfer takes place.
> 
> For example if the DMA occurs to the device, then that region can
> be cached in the PCI controller's internal caches and thus future
> cpu writes into that memory region will not be seen, until a
> dma_sync_*() is invoked.
> 
> That isn't going to happen when the device transmit queue is
> being completely managed in userspace.
> 
> And this takes us back to the issue of protection, I don't think
> it is addressed properly yet.
> 
> CAP_NET_ADMIN privileges do not mean "can crap all over memory"
> yet with this feature that can still happen.
> 
> If we are dealing with a device which cannot provide strict protection
> to only the process's locked local pages, you have to do something
> to implement that protection.
> 
> And you have _exactly_ one option to do that, abstracting the page
> addresses and eating a system call to trigger the sends, so that you
> can read from the user's (fake) descriptors and write into the real
> descriptors (translating the DMA addresses along the way) and
> triggering the TX doorbell.
> 
> I am not going to consider seriously an implementation that says "yeah
> sometimes the user can crap onto other people's memory", this isn't
> MS-DOS, it's a system where proper memory protections are mandatory
> rather than optional.
> 
This is probably a stupid question, but can you not dynamically mark the address
range that gets mapped for dma as uncacheable? i.e. Something simmilar to
ioremap_noncache, but to mark the region as uncacheable within the pci
controller?  Would doing so not obviate the need for sync operations
(potentially at the cost of some performance, though perhaps not as much as
incurring a system call)
Neil

^ permalink raw reply

* Re: BW regression after "tcp: refine TSO autosizing"
From: Eyal Perry @ 2015-01-18 21:40 UTC (permalink / raw)
  To: Eric Dumazet, Eyal Perry
  Cc: Or Gerlitz, Linux Netdev List, Amir Vadai, Yevgeny Petrilin,
	Saeed Mahameed, Ido Shamay, Amir Ancel
In-Reply-To: <1421603317.11734.154.camel@edumazet-glaptop2.roam.corp.google.com>


On 1/18/2015 19:48 PM, Eric Dumazet wrote:
> On Sun, 2015-01-18 at 18:22 +0200, Eyal Perry wrote:
>
>> Please let me know if you see something in the results.
> Getting high throughput on a single flow means lot of tweaking.
>
> For a start, mlx4 is known to have interrupt mitigation that can hurt,
> as the TX interrupt timer is restarted for every packet that is
> delivered to the NIC.
>
> ethtool -c ethX
> ..
> tx-usecs: 16
> tx-frames: 16
> tx-usecs-irq: 0
> tx-frames-irq: 256
> ...
>
> -> TX IRQ can be delayed by 16*16 = 256 usec.
>
> Can you try :
>
> ethtool -C ethX tx-usecs 2 tx-frames 2
>
> Or even
>
> ethtool -C ethX tx-usecs 1 tx-frames 1
So indeed, interrupt mitigation (tx-usecs 1 tx-frames 1) improves things up
for the "refined TSO autosizing" kernel (from 18.4Gbps to 19.7Gbps). but
in the
other kernel, the BW is remains the same with and without the coalescing.
> Interrupt mitigation is a trade-off.
>
> If one customer wants high throughput on a single flow, then you might
> remove interrupt mitigation.
>
> If another customer wants cpu efficiency with thousand of flows, I guess
> current mlx4 defaults are pretty good.
>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox