Netdev List
* [linux-firmware v4 2/2] rtl_nic: add new firmware for RTL8111F
From: Hayes Wang @ 2011-09-01  7:07 UTC
  To: dwmw2; +Cc: romieu, netdev, Hayes Wang
In-Reply-To: <1314860833-4000-1-git-send-email-hayeswang@realtek.com>

Add new firmware:
1. rtl_nic/rtl8168f-1.fw
   version: 0.0.2
2. rtl_nic/rtl8168f-2.fw
   version: 0.0.2

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 WHENCE                |    6 ++++++
 rtl_nic/rtl8168f-1.fw |  Bin 0 -> 3024 bytes
 rtl_nic/rtl8168f-2.fw |  Bin 0 -> 336 bytes
 3 files changed, 6 insertions(+), 0 deletions(-)
 create mode 100644 rtl_nic/rtl8168f-1.fw
 create mode 100644 rtl_nic/rtl8168f-2.fw

diff --git a/WHENCE b/WHENCE
index 3803906..ea3918c 100644
--- a/WHENCE
+++ b/WHENCE
@@ -1661,6 +1661,12 @@ File: rtl_nic/rtl8168e-2.fw
 File: rtl_nic/rtl8168e-3.fw
 Version: 0.0.2
 
+File: rtl_nic/rtl8168f-1.fw
+Version: 0.0.2
+
+File: rtl_nic/rtl8168f-2.fw
+Version: 0.0.2
+
 Licence:
  * Copyright © 2011, Realtek Semiconductor Corporation
  *
diff --git a/rtl_nic/rtl8168f-1.fw b/rtl_nic/rtl8168f-1.fw
new file mode 100644
index 0000000000000000000000000000000000000000..ffb1dd137eebeb2394acbfcf14c96a6091e73a06
GIT binary patch
literal 3024
zcmbuBdx+Fk6vyw*I>+72d~Q0quCou{ubCBXwSQKlFtQRMB9jc#KoE>FQqefNiaMJj
zT9yQ|83c*Dgb0O!yNX!)NB9a9U%%1yLCP$Ft-9^*^f_~XqigtI1DA8|d4A8i=ic8q
z=iEojm)0bw)qF59xiB#)F{wJ9sF^$^IXRg`XTQ0u%Ms_Q+`Ku{s;d(-7tWjc+QJu?
zEn2)VnM~AFPl;DgnUS12V@i$N=G+K%L!CNT;<9Zn4sCI>o%TjP*_9!89~L+lI9F;$
zadnY~Mb7-9iM+4bRz7As)Zr6%*bWYDqu8`(qpD7FvCUozpVi*1FA7R@O>IoSb|u<p
zbxd~|T^iHdA-!*OSxmd3{h$}1#n6k;uBMn?3iX$b9*gM}Xorhw08J-iz%I$<a^y11
zJ+U|iQ$+OMY{4Fy71ajOCS;Lz(V+#R1;&pQ-KHTwOLPHp^NTc!o<Lq7h^9A*Zm!3s
zjD1?PEQJia`ku1Sy=+TQ*;dg8?0$8kzch)iKP%cgO>{dp_18r+J4AmYR;G|#M&A={
zF^c`+kRCGHTeR8e$D&_Co58n<819bf4>9n=XC4UW;C-#uQ{C7R=gNA~8thN4VGow8
z;l3C9;J#=I-#E2q275k&&#Q582Glc3baX3xMzj*X70jKZMeq0WeQDexIvCq}^bS6S
z&x-k?yQwEqEP5*{y4HI6hUn62(O#WC=hiT1`*337Z~Mt+htH4WL?;;kjOan)^fLY%
z(Z42&))^1(PxAS<iuNfNE$2Mz!5LwHvQD%VyI_}(TS0B@MdYdQ`>CdT-}9AuU8UbA
zP4|}Pt4s&})+)cZIcOw)bfoCWFkislu#d{ep+~l#XmLtCKRO?WzL~BNU8njU2AtVi
z^WTU4bDl3Zp0n71JZ}6mp06~XIyNG&g10%&(9=@t&d`tH(5ukZ;0&&@=u?gHX>d;$
z9X3gHI#^Ei^>eVQ!q0{E^ZT{*v59>cHzK0fkgX#}*BF0pX=ddvvzO}wx(Kz7x%X%v
zEf=l8{vJIE#)kOG7omqgQ_r}(?FM>CzXzJ_U2MmPHHEgHt>gT$O<ku?=pW7+IUPCk
z`HRVC^R^G0!nn_EHRt+9tbkcEyw=Z1p=c2pa=zp*CjUd<3eiDe3Akgq-uCddo||tS
zm~3{R*aMpda5RGNDKOQ+x8LF3bKX|_*u0*+Mc}l&jknNYJFce=OctjB8_PL9uiJ=U
zmG^5n;&l_yS%1Ez_991LPEmg|x>cfUYCP6d208W*`$wFz9&w`h+4(l@CFkFqNkO<j
z$i5GIbUFP|_TX_W>p=%|u%ylFG8aUT(Bn3IyLWnh2K{kl_Kpmw^D=v9?43)cMMoUx
zeZYPIGiEh3vA6lPIGxB1ZwL3nay0)VdKCFecz2q0mKrLMgRLsz=eL<N&kpi@-?^f1
zzXYBUqL;p+H)lk5aIXel65TUI^fIyAcpD9CXV>GoGUPH0e^94=%Tn~SjoRB*l8=~A
z%GXG~2<Mz#;cFXqLUb=Rv~kzYli%Sxem7Xw^7h;LaPD?K1)Nuenx5{@dpAI|FMVx>
zUSIF)$i3BI$nP`MXxQw|52A+p??lfWpf2@wH9ny(dt2--jI?>4ExHqb>;G!QMeM2I
z%FPJ9$Wi1ro9RPfG25%W0WJPJ^gMYRmy<8<Z3{!6pUlVR47(gX%(*s+-YD_kvD8g4
zBg+&${$1;GkIzyZAH(#2RuF)l7zI#!a|7PSTI!GAqhGDmR4bZ5X5T_PxAHOAm`@xY
zYTv+MXz0`KeKPpGK#YKyNrnAuW$&;LnBAUp|9~0)y#pDmV+lBz)dO#mm+x>H-vZ0K
zkGy@zTWYf~Omr@@Fvi~8SFl^e8{Nb19qimm=EnSvav#gN<7Ugg`E~9E7&dTLr@?xT
z^FJTs{}(HEk;S}k?hTgBqVEzn3%|H8^S%HqPW0l-%p|t^S!b_;6<HpW&F65=Bd|Fj
zZ~F$$in|v($2yNMvCP+U-Q5pH>a_nEdF<WH>P}|+7<G5CqMdw~9*F)4Ui&W1zs=ke
zql|lYBOc0cqC?(kHq2-jchgZ*EqwY@(RsDZkm<<p=7`QpM{QQ@zi5AGOYML5`Tv40
z{EwT>I)_8t{^REVw?6uBJH$nDafY;*Um>gCo?^^BpQ<i+$U7Hv-C6tp8^gAL$oGfm
MzqYOImd5}71$?3tJpcdz

literal 0
HcmV?d00001

diff --git a/rtl_nic/rtl8168f-2.fw b/rtl_nic/rtl8168f-2.fw
new file mode 100644
index 0000000000000000000000000000000000000000..880a223f35ab98b5367d2d1ee788ac0c9b05d762
GIT binary patch
literal 336
zcmXw!y-EX75QUEk8YD%wS#>~kA+_0i?~<^j5w%s2yud<;R)L6ETFfJeRqT8MpCOem
z;a{vRY(+5f%(A&~_=YnxoI3z;eKXE|FTXqph3mLZNL-#~KJ}hGd*{?4HN@bw7lJ!3
z21jSb$z*g<c<=I%B_ZqkZolhs2ka|$pg@e}HxhD&DYQX9#U6cJtpRgj*M<^B?QsO#
zTX(Bnn{9J;qrRmLYKmB!x1tZmvFKODALB%HLAJ;ja+7=||DHtOs{Xt2S@eVah3K68
pbi(+!%m6bidzA}pqXtCe%=|h92o_7bN9aXW9hLp^-}iP7@DF!bO921?

literal 0
HcmV?d00001

-- 
1.7.6

* [PATCH net-next v2 3/4] r8169: fix the reset setting for 8111evl
From: Hayes Wang @ 2011-09-01  6:53 UTC
  To: romieu; +Cc: netdev, linux-kernel, Hayes Wang
In-Reply-To: <1314860034-3911-1-git-send-email-hayeswang@realtek.com>

The RTL8111EVL should stop any pending TLP requests before a reset.
This is done by setting bit 7 of register 0x37 (StopReq in ChipCmd).

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 193daf1..b999433 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -3995,6 +3995,7 @@ static void rtl8169_hw_reset(struct rtl8169_private *tp)
 		while (RTL_R8(TxPoll) & NPQ)
 			udelay(20);
 	} else if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
+		RTL_W8(ChipCmd, RTL_R8(ChipCmd) | StopReq);
 		while (!(RTL_R32(TxConfig) & TXCFG_EMPTY))
 			udelay(100);
 	} else {
-- 
1.7.6
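
The read-modify-write in the hunk above can be sketched in isolation as
follows. This is only a user-space illustration: struct fake_regs and
request_stop() are hypothetical names, and the offset 0x37 and bit 7 are
taken from the commit message rather than from a datasheet.

```c
#include <stdint.h>

/* Assumed values, taken from the commit message: register 0x37, bit 7. */
#define CHIP_CMD_OFFSET	0x37u
#define STOP_REQ	(1u << 7)	/* 0x80 */

/* Hypothetical stand-in for the device's 8-bit register space. */
struct fake_regs {
	uint8_t mem[256];
};

/* Set the stop-request bit while preserving every other bit. */
static void request_stop(struct fake_regs *regs)
{
	uint8_t cmd = regs->mem[CHIP_CMD_OFFSET];

	regs->mem[CHIP_CMD_OFFSET] = (uint8_t)(cmd | STOP_REQ);
}
```

The point of OR-ing into the value read back is that the other command
bits already set in the register are left untouched.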

* [PATCH net-next v2 1/4] r8169: fix WOL setting for 8105 and 8111EVL
From: Hayes Wang @ 2011-09-01  6:53 UTC
  To: romieu; +Cc: netdev, linux-kernel, Hayes Wang

The RTL8105, RTL8111E, and RTL8111EVL need RxConfig bits 1 to 3
enabled in order to support Wake-on-LAN.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 1cf8c3c..764c98e 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -3322,6 +3322,11 @@ static void r810x_pll_power_down(struct rtl8169_private *tp)
 	if (__rtl8169_get_wol(tp) & WAKE_ANY) {
 		rtl_writephy(tp, 0x1f, 0x0000);
 		rtl_writephy(tp, MII_BMCR, 0x0000);
+
+		if (tp->mac_version == RTL_GIGA_MAC_VER_29 ||
+		    tp->mac_version == RTL_GIGA_MAC_VER_30)
+			RTL_W32(RxConfig, RTL_R32(RxConfig) | AcceptBroadcast |
+				AcceptMulticast | AcceptMyPhys);
 		return;
 	}
 
@@ -3417,7 +3422,8 @@ static void r8168_pll_power_down(struct rtl8169_private *tp)
 		rtl_writephy(tp, MII_BMCR, 0x0000);
 
 		if (tp->mac_version == RTL_GIGA_MAC_VER_32 ||
-		    tp->mac_version == RTL_GIGA_MAC_VER_33)
+		    tp->mac_version == RTL_GIGA_MAC_VER_33 ||
+		    tp->mac_version == RTL_GIGA_MAC_VER_34)
 			RTL_W32(RxConfig, RTL_R32(RxConfig) | AcceptBroadcast |
 				AcceptMulticast | AcceptMyPhys);
 		return;
-- 
1.7.6
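
As a rough illustration of "RxConfig bits 1 to 3", the sketch below builds
the combined mask. The macro names and rx_config_for_wol() are hypothetical,
and the mapping of each Accept* flag to a specific bit is an assumption; only
the bit range comes from the commit message.

```c
#include <stdint.h>

/* Assumed bit assignments within RxConfig (bits 1..3). */
#define ACCEPT_MY_PHYS		(1u << 1)	/* 0x02 */
#define ACCEPT_MULTICAST	(1u << 2)	/* 0x04 */
#define ACCEPT_BROADCAST	(1u << 3)	/* 0x08 */

#define WOL_RX_MASK \
	(ACCEPT_MY_PHYS | ACCEPT_MULTICAST | ACCEPT_BROADCAST)

/* Return rx_config with the three accept bits set, other bits unchanged. */
static uint32_t rx_config_for_wol(uint32_t rx_config)
{
	return rx_config | WOL_RX_MASK;
}
```

Under these assumptions the mask is 0x0e, so the receiver keeps accepting
unicast, multicast and broadcast frames while the PLL is powered down.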

* [PATCH net-next v2 4/4] r8169: support new chips of RTL8111F
From: Hayes Wang @ 2011-09-01  6:53 UTC
  To: romieu; +Cc: netdev, linux-kernel, Hayes Wang
In-Reply-To: <1314860034-3911-1-git-send-email-hayeswang@realtek.com>

Add support for the new RTL8111F chips (RTL_GIGA_MAC_VER_35 and
RTL_GIGA_MAC_VER_36).

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c |  178 +++++++++++++++++++++++++++++++++-
 1 files changed, 176 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index b999433..40008d2 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -42,6 +42,8 @@
 #define FIRMWARE_8168E_1	"rtl_nic/rtl8168e-1.fw"
 #define FIRMWARE_8168E_2	"rtl_nic/rtl8168e-2.fw"
 #define FIRMWARE_8168E_3	"rtl_nic/rtl8168e-3.fw"
+#define FIRMWARE_8168F_1	"rtl_nic/rtl8168f-1.fw"
+#define FIRMWARE_8168F_2	"rtl_nic/rtl8168f-2.fw"
 #define FIRMWARE_8105E_1	"rtl_nic/rtl8105e-1.fw"
 
 #ifdef RTL8169_DEBUG
@@ -133,6 +135,8 @@ enum mac_version {
 	RTL_GIGA_MAC_VER_32,
 	RTL_GIGA_MAC_VER_33,
 	RTL_GIGA_MAC_VER_34,
+	RTL_GIGA_MAC_VER_35,
+	RTL_GIGA_MAC_VER_36,
 	RTL_GIGA_MAC_NONE   = 0xff,
 };
 
@@ -218,7 +222,11 @@ static const struct {
 	[RTL_GIGA_MAC_VER_33] =
 		_R("RTL8168e/8111e",	RTL_TD_1, FIRMWARE_8168E_2),
 	[RTL_GIGA_MAC_VER_34] =
-		_R("RTL8168evl/8111evl",RTL_TD_1, FIRMWARE_8168E_3)
+		_R("RTL8168evl/8111evl",RTL_TD_1, FIRMWARE_8168E_3),
+	[RTL_GIGA_MAC_VER_35] =
+		_R("RTL8168f/8111f",	RTL_TD_1, FIRMWARE_8168F_1),
+	[RTL_GIGA_MAC_VER_36] =
+		_R("RTL8168f/8111f",	RTL_TD_1, FIRMWARE_8168F_2)
 };
 #undef _R
 
@@ -1200,6 +1208,19 @@ static void rtl_link_chg_patch(struct rtl8169_private *tp)
 			     ERIAR_EXGMAC);
 		rtl_w1w0_eri(ioaddr, 0xdc, ERIAR_MASK_0001, 0x01, 0x00,
 			     ERIAR_EXGMAC);
+	} else if (tp->mac_version == RTL_GIGA_MAC_VER_35 ||
+		   tp->mac_version == RTL_GIGA_MAC_VER_36) {
+		if (RTL_R8(PHYstatus) & _1000bpsF) {
+			rtl_eri_write(ioaddr, 0x1bc, ERIAR_MASK_1111,
+				      0x00000011, ERIAR_EXGMAC);
+			rtl_eri_write(ioaddr, 0x1dc, ERIAR_MASK_1111,
+				      0x00000005, ERIAR_EXGMAC);
+		} else {
+			rtl_eri_write(ioaddr, 0x1bc, ERIAR_MASK_1111,
+				      0x0000001f, ERIAR_EXGMAC);
+			rtl_eri_write(ioaddr, 0x1dc, ERIAR_MASK_1111,
+				      0x0000003f, ERIAR_EXGMAC);
+		}
 	}
 }
 
@@ -1739,6 +1760,10 @@ static void rtl8169_get_mac_version(struct rtl8169_private *tp,
 		u32 val;
 		int mac_version;
 	} mac_info[] = {
+		/* 8168F family. */
+		{ 0x7cf00000, 0x48100000,	RTL_GIGA_MAC_VER_36 },
+		{ 0x7cf00000, 0x48000000,	RTL_GIGA_MAC_VER_35 },
+
 		/* 8168E family. */
 		{ 0x7c800000, 0x2c800000,	RTL_GIGA_MAC_VER_34 },
 		{ 0x7cf00000, 0x2c200000,	RTL_GIGA_MAC_VER_33 },
@@ -2873,6 +2898,97 @@ static void rtl8168e_2_hw_phy_config(struct rtl8169_private *tp)
 	rtl_writephy(tp, 0x1f, 0x0000);
 }
 
+static void rtl8168f_1_hw_phy_config(struct rtl8169_private *tp)
+{
+	static const struct phy_reg phy_reg_init[] = {
+		/* Channel estimation fine tune */
+		{ 0x1f, 0x0003 },
+		{ 0x09, 0xa20f },
+		{ 0x1f, 0x0000 },
+
+		/* Modify green table for giga & fnet */
+		{ 0x1f, 0x0005 },
+		{ 0x05, 0x8b55 },
+		{ 0x06, 0x0000 },
+		{ 0x05, 0x8b5e },
+		{ 0x06, 0x0000 },
+		{ 0x05, 0x8b67 },
+		{ 0x06, 0x0000 },
+		{ 0x05, 0x8b70 },
+		{ 0x06, 0x0000 },
+		{ 0x1f, 0x0000 },
+		{ 0x1f, 0x0007 },
+		{ 0x1e, 0x0078 },
+		{ 0x17, 0x0000 },
+		{ 0x19, 0x00fb },
+		{ 0x1f, 0x0000 },
+
+		/* Modify green table for 10M */
+		{ 0x1f, 0x0005 },
+		{ 0x05, 0x8b79 },
+		{ 0x06, 0xaa00 },
+		{ 0x1f, 0x0000 },
+
+		/* Disable hiimpedance detection (RTCT) */
+		{ 0x1f, 0x0003 },
+		{ 0x01, 0x328a },
+		{ 0x1f, 0x0000 }
+	};
+
+	rtl_apply_firmware(tp);
+
+	rtl_writephy_batch(tp, phy_reg_init, ARRAY_SIZE(phy_reg_init));
+
+	/* For 4-corner performance improve */
+	rtl_writephy(tp, 0x1f, 0x0005);
+	rtl_writephy(tp, 0x05, 0x8b80);
+	rtl_w1w0_phy(tp, 0x06, 0x0006, 0x0000);
+	rtl_writephy(tp, 0x1f, 0x0000);
+
+	/* PHY auto speed down */
+	rtl_writephy(tp, 0x1f, 0x0007);
+	rtl_writephy(tp, 0x1e, 0x002d);
+	rtl_w1w0_phy(tp, 0x18, 0x0010, 0x0000);
+	rtl_writephy(tp, 0x1f, 0x0000);
+	rtl_w1w0_phy(tp, 0x14, 0x8000, 0x0000);
+
+	/* improve 10M EEE waveform */
+	rtl_writephy(tp, 0x1f, 0x0005);
+	rtl_writephy(tp, 0x05, 0x8b86);
+	rtl_w1w0_phy(tp, 0x06, 0x0001, 0x0000);
+	rtl_writephy(tp, 0x1f, 0x0000);
+
+	/* Improve 2-pair detection performance */
+	rtl_writephy(tp, 0x1f, 0x0005);
+	rtl_writephy(tp, 0x05, 0x8b85);
+	rtl_w1w0_phy(tp, 0x06, 0x4000, 0x0000);
+	rtl_writephy(tp, 0x1f, 0x0000);
+}
+
+static void rtl8168f_2_hw_phy_config(struct rtl8169_private *tp)
+{
+	rtl_apply_firmware(tp);
+
+	/* For 4-corner performance improve */
+	rtl_writephy(tp, 0x1f, 0x0005);
+	rtl_writephy(tp, 0x05, 0x8b80);
+	rtl_w1w0_phy(tp, 0x06, 0x0006, 0x0000);
+	rtl_writephy(tp, 0x1f, 0x0000);
+
+	/* PHY auto speed down */
+	rtl_writephy(tp, 0x1f, 0x0007);
+	rtl_writephy(tp, 0x1e, 0x002d);
+	rtl_w1w0_phy(tp, 0x18, 0x0010, 0x0000);
+	rtl_writephy(tp, 0x1f, 0x0000);
+	rtl_w1w0_phy(tp, 0x14, 0x8000, 0x0000);
+
+	/* improve 10M EEE waveform */
+	rtl_writephy(tp, 0x1f, 0x0005);
+	rtl_writephy(tp, 0x05, 0x8b86);
+	rtl_w1w0_phy(tp, 0x06, 0x0001, 0x0000);
+	rtl_writephy(tp, 0x1f, 0x0000);
+}
+
 static void rtl8102e_hw_phy_config(struct rtl8169_private *tp)
 {
 	static const struct phy_reg phy_reg_init[] = {
@@ -2997,6 +3113,12 @@ static void rtl_hw_phy_config(struct net_device *dev)
 	case RTL_GIGA_MAC_VER_34:
 		rtl8168e_2_hw_phy_config(tp);
 		break;
+	case RTL_GIGA_MAC_VER_35:
+		rtl8168f_1_hw_phy_config(tp);
+		break;
+	case RTL_GIGA_MAC_VER_36:
+		rtl8168f_2_hw_phy_config(tp);
+		break;
 
 	default:
 		break;
@@ -3522,6 +3644,8 @@ static void __devinit rtl_init_pll_power_ops(struct rtl8169_private *tp)
 	case RTL_GIGA_MAC_VER_32:
 	case RTL_GIGA_MAC_VER_33:
 	case RTL_GIGA_MAC_VER_34:
+	case RTL_GIGA_MAC_VER_35:
+	case RTL_GIGA_MAC_VER_36:
 		ops->down	= r8168_pll_power_down;
 		ops->up		= r8168_pll_power_up;
 		break;
@@ -3994,7 +4118,9 @@ static void rtl8169_hw_reset(struct rtl8169_private *tp)
 	    tp->mac_version == RTL_GIGA_MAC_VER_31) {
 		while (RTL_R8(TxPoll) & NPQ)
 			udelay(20);
-	} else if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
+	} else if (tp->mac_version == RTL_GIGA_MAC_VER_34 ||
+		   tp->mac_version == RTL_GIGA_MAC_VER_35 ||
+		   tp->mac_version == RTL_GIGA_MAC_VER_36) {
 		RTL_W8(ChipCmd, RTL_R8(ChipCmd) | StopReq);
 		while (!(RTL_R32(TxConfig) & TXCFG_EMPTY))
 			udelay(100);
@@ -4480,6 +4606,49 @@ static void rtl_hw_start_8168e_2(void __iomem *ioaddr, struct pci_dev *pdev)
 	RTL_W8(Config5, RTL_R8(Config5) & ~Spi_en);
 }
 
+static void rtl_hw_start_8168f_1(void __iomem *ioaddr, struct pci_dev *pdev)
+{
+	static const struct ephy_info e_info_8168f_1[] = {
+		{ 0x06, 0x00c0,	0x0020 },
+		{ 0x08, 0x0001,	0x0002 },
+		{ 0x09, 0x0000,	0x0080 },
+		{ 0x19, 0x0000,	0x0224 }
+	};
+
+	rtl_csi_access_enable_1(ioaddr);
+
+	rtl_ephy_init(ioaddr, e_info_8168f_1, ARRAY_SIZE(e_info_8168f_1));
+
+	rtl_tx_performance_tweak(pdev, 0x5 << MAX_READ_REQUEST_SHIFT);
+
+	rtl_eri_write(ioaddr, 0xc0, ERIAR_MASK_0011, 0x0000, ERIAR_EXGMAC);
+	rtl_eri_write(ioaddr, 0xb8, ERIAR_MASK_0011, 0x0000, ERIAR_EXGMAC);
+	rtl_eri_write(ioaddr, 0xc8, ERIAR_MASK_1111, 0x00100002, ERIAR_EXGMAC);
+	rtl_eri_write(ioaddr, 0xe8, ERIAR_MASK_1111, 0x00100006, ERIAR_EXGMAC);
+	rtl_w1w0_eri(ioaddr, 0xdc, ERIAR_MASK_0001, 0x00, 0x01, ERIAR_EXGMAC);
+	rtl_w1w0_eri(ioaddr, 0xdc, ERIAR_MASK_0001, 0x01, 0x00, ERIAR_EXGMAC);
+	rtl_w1w0_eri(ioaddr, 0x1b0, ERIAR_MASK_0001, 0x10, 0x00, ERIAR_EXGMAC);
+	rtl_w1w0_eri(ioaddr, 0x1d0, ERIAR_MASK_0001, 0x10, 0x00, ERIAR_EXGMAC);
+	rtl_eri_write(ioaddr, 0xcc, ERIAR_MASK_1111, 0x00000050, ERIAR_EXGMAC);
+	rtl_eri_write(ioaddr, 0xd0, ERIAR_MASK_1111, 0x00000060, ERIAR_EXGMAC);
+	rtl_w1w0_eri(ioaddr, 0x0d4, ERIAR_MASK_0011, 0x0c00, 0xff00,
+		     ERIAR_EXGMAC);
+
+	RTL_W8(MaxTxPacketSize, EarlySize);
+
+	rtl_disable_clock_request(pdev);
+
+	RTL_W32(TxConfig, RTL_R32(TxConfig) | TXCFG_AUTO_FIFO);
+	RTL_W8(MCU, RTL_R8(MCU) & ~NOW_IS_OOB);
+
+	/* Adjust EEE LED frequency */
+	RTL_W8(EEE_LED, RTL_R8(EEE_LED) & ~0x07);
+
+	RTL_W8(DLLPR, RTL_R8(DLLPR) | PFM_EN);
+	RTL_W32(MISC, RTL_R32(MISC) | PWM_EN);
+	RTL_W8(Config5, RTL_R8(Config5) & ~Spi_en);
+}
+
 static void rtl_hw_start_8168(struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
@@ -4574,6 +4743,11 @@ static void rtl_hw_start_8168(struct net_device *dev)
 		rtl_hw_start_8168e_2(ioaddr, pdev);
 		break;
 
+	case RTL_GIGA_MAC_VER_35:
+	case RTL_GIGA_MAC_VER_36:
+		rtl_hw_start_8168f_1(ioaddr, pdev);
+		break;
+
 	default:
 		printk(KERN_ERR PFX "%s: unknown chipset (mac_version = %d).\n",
 			dev->name, tp->mac_version);
-- 
1.7.6
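
The two mac_info entries added above are { mask, value, version } triples.
A minimal sketch of how such a table might be searched is shown below. It
assumes the driver masks a 32-bit identification word and takes the first
matching entry; lookup_mac_version() and the VER_* constants are
hypothetical names used only for this example.

```c
#include <stddef.h>
#include <stdint.h>

enum { VER_34 = 34, VER_35 = 35, VER_36 = 36, VER_NONE = 0xff };

struct mac_info {
	uint32_t mask;
	uint32_t val;
	int mac_version;
};

/* Entries copied from the hunk; order matters, first match wins. */
static const struct mac_info mac_table[] = {
	{ 0x7cf00000, 0x48100000, VER_36 },
	{ 0x7cf00000, 0x48000000, VER_35 },
	{ 0x7c800000, 0x2c800000, VER_34 },
};

/* Return the version of the first entry whose masked value matches. */
static int lookup_mac_version(uint32_t reg)
{
	size_t i;

	for (i = 0; i < sizeof(mac_table) / sizeof(mac_table[0]); i++) {
		if ((reg & mac_table[i].mask) == mac_table[i].val)
			return mac_table[i].mac_version;
	}
	return VER_NONE;
}
```

Bits outside the mask are ignored, so the low 20 bits and bit 31 of the
identification word do not affect which entry is selected.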

* [PATCH net-next v2 2/4] r8169: define the early size for 8111evl
From: Hayes Wang @ 2011-09-01  6:53 UTC
  To: romieu; +Cc: netdev, linux-kernel, Hayes Wang
In-Reply-To: <1314860034-3911-1-git-send-email-hayeswang@realtek.com>

For RTL8111EVL, the MaxTxPacketSize register doesn't actually
limit the tx size. It influences the early tx feature.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 764c98e..193daf1 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -311,6 +311,7 @@ enum rtl_registers {
 	MaxTxPacketSize	= 0xec,	/* 8101/8168. Unit of 128 bytes. */
 
 #define TxPacketMax	(8064 >> 7)
+#define EarlySize	0x27
 
 	FuncEvent	= 0xf0,
 	FuncEventMask	= 0xf4,
@@ -4463,7 +4464,7 @@ static void rtl_hw_start_8168e_2(void __iomem *ioaddr, struct pci_dev *pdev)
 	rtl_w1w0_eri(ioaddr, 0x0d4, ERIAR_MASK_0011, 0x0c00, 0xff00,
 		     ERIAR_EXGMAC);
 
-	RTL_W8(MaxTxPacketSize, 0x27);
+	RTL_W8(MaxTxPacketSize, EarlySize);
 
 	rtl_disable_clock_request(pdev);
 
-- 
1.7.6
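
The context line in the hunk notes that MaxTxPacketSize is expressed in
units of 128 bytes, which is why TxPacketMax is written as (8064 >> 7). The
conversion can be sketched as below; the helper names are hypothetical.
Per the commit message, on the 8111EVL the value 0x27 tunes early tx rather
than capping the packet size, so the byte figure for it is only illustrative.

```c
#include <stdint.h>

/* The register counts in units of 128 bytes, i.e. a shift by 7. */
#define TX_UNIT_SHIFT	7

/* Convert a size in bytes to register units (rounds down). */
static uint8_t tx_bytes_to_units(uint32_t bytes)
{
	return (uint8_t)(bytes >> TX_UNIT_SHIFT);
}

/* Convert register units back to bytes. */
static uint32_t tx_units_to_bytes(uint8_t units)
{
	return (uint32_t)units << TX_UNIT_SHIFT;
}
```

With these helpers, 8064 bytes maps to 63 units, and 0x27 corresponds to
39 units or 4992 bytes.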

* Re: [RFMC] per-container tcp buffer limitation
From: KAMEZAWA Hiroyuki @ 2011-09-01  6:48 UTC
  To: Glauber Costa
  Cc: netdev, Linux Containers, linux-mm, Pavel Emelyanov,
	Eric W. Biederman, David Miller, Stephen Hemminger, penberg
In-Reply-To: <4E5EF14F.3040300@parallels.com>

On Wed, 31 Aug 2011 23:43:27 -0300
Glauber Costa <glommer@parallels.com> wrote:

> Hello People,
> 
> [ For the ones in linux-mm that are receiving this for the first time,
>    this is a follow up of
>    http://thread.gmane.org/gmane.linux.kernel.containers/21295 ]
> 
> Here is a new, a bit more mature version of my previous RFC. Now I 
> Request For More Comments from you guys in this new version of the patch.
> 
> Highlights:
> 
> * Although I do intend to experiment with more scenarios (suggestions 
> welcome), there does not seem to be a (huge) performance hit with this 
> patch applied, at least in a basic latency benchmark. That indicates 
> that even if we can demonstrate a performance hit, it won't be too hard 
> to optimize it away (famous last words?)
> 
> Since the patch touches both rcv and snd sides, I benchmarked it with 
> netperf against localhost. Command line: netperf -t TCP_RR -H localhost.
> 
> Without the patch
> =================
> 
> Socket Size   Request  Resp.   Elapsed  Trans.
> Send   Recv   Size     Size    Time     Rate
> bytes  Bytes  bytes    bytes   secs.    per sec
> 
> 16384  87380  1        1       10.00    26996.35
> 16384  87380
> 
> With the patch
> ===============
> 
> Local /Remote
> Socket Size   Request  Resp.   Elapsed  Trans.
> Send   Recv   Size     Size    Time     Rate
> bytes  Bytes  bytes    bytes   secs.    per sec
> 
> 16384  87380  1        1       10.00    27291.86
> 16384  87380
> 
> 
> As you can see, the rate is a bit higher, but the difference is only
> about one percent, meaning it is basically unchanged. I will benchmark
> it with various levels of cgroup nesting on my next submission so we
> can have a better idea of the impact of it when enabled.
> 
seems nice.

> * As nicely pointed out by Kamezawa, I dropped the sockets cgroup, and 
> introduced a kmem cgroup. After careful consideration, I decided not to 
> reuse the memcg. Basically, my impression is that memcg is concerned 
> with user objects, with page granularity and its swap attributes. 
> Because kernel objects are entirely different, I prefer to group them here.
> 

I myself have no objection to this direction. What do others think?

Thanks,
-Kame
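
For reference, the relative change between the two netperf transaction
rates quoted above can be worked out with a small helper. The function name
is hypothetical; the two inputs are the figures from the quoted benchmark.

```c
/* Relative change from 'before' to 'after', in percent. */
static double relative_change_percent(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

Calling relative_change_percent(26996.35, 27291.86) gives a little under
1.1%, which is small enough that run-to-run noise on a localhost TCP_RR test
could plausibly account for it.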

* [PATCH] ipv6: Create module parameter for use_tempaddr
From: Paul Stewart @ 2011-09-01  6:03 UTC
  To: netdev

When ipv6 is used as a module, there is no good place to set
the default value for use_tempaddr.  Using sysctl.conf will
set this parameter too early -- before the module is loaded.
To solve this, create a module parameter that will set the
default value of use_tempaddr for all devices.

Signed-off-by: Paul Stewart <pstew@chromium.org>
---
 include/linux/ipv6.h |    1 +
 net/ipv6/addrconf.c  |    3 +++
 net/ipv6/af_inet6.c  |    8 ++++++++
 3 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 0c99776..0d45a7c 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -178,6 +178,7 @@ struct ipv6_devconf {
 struct ipv6_params {
 	__s32 disable_ipv6;
 	__s32 autoconf;
+	__s32 use_tempaddr;
 };
 extern struct ipv6_params ipv6_defaults;
 #endif
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index f012ebd..27314a2 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4609,6 +4609,9 @@ static int __net_init addrconf_init_net(struct net *net)
 		/* these will be inherited by all namespaces */
 		dflt->autoconf = ipv6_defaults.autoconf;
 		dflt->disable_ipv6 = ipv6_defaults.disable_ipv6;
+#ifdef CONFIG_IPV6_PRIVACY
+		dflt->use_tempaddr = ipv6_defaults.use_tempaddr;
+#endif
 	}
 
 	net->ipv6.devconf_all = all;
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 3b5669a..5022950 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -76,6 +76,9 @@ static DEFINE_SPINLOCK(inetsw6_lock);
 struct ipv6_params ipv6_defaults = {
 	.disable_ipv6 = 0,
 	.autoconf = 1,
+#ifdef CONFIG_IPV6_PRIVACY
+	.use_tempaddr = 0,
+#endif
 };
 
 static int disable_ipv6_mod = 0;
@@ -89,6 +92,11 @@ MODULE_PARM_DESC(disable_ipv6, "Disable IPv6 on all interfaces");
 module_param_named(autoconf, ipv6_defaults.autoconf, int, 0444);
 MODULE_PARM_DESC(autoconf, "Enable IPv6 address autoconfiguration on all interfaces");
 
+#ifdef CONFIG_IPV6_PRIVACY
+module_param_named(use_tempaddr, ipv6_defaults.use_tempaddr, int, 0444);
+MODULE_PARM_DESC(use_tempaddr, "Enable IPv6 address privacy for autoconfiguration by default");
+#endif
+
 static __inline__ struct ipv6_pinfo *inet6_sk_generic(struct sock *sk)
 {
 	const int offset = sk->sk_prot->obj_size - sizeof(struct ipv6_pinfo);
-- 
1.7.3.1

* [PATCH 3/3] net/irda: sh_irda: add PM support
From: Kuninori Morimoto @ 2011-09-01  6:08 UTC
  To: David S. Miller; +Cc: Magnus, Linux-Net, Kuninori Morimoto
In-Reply-To: <8739ggdcie.wl%kuninori.morimoto.gx@renesas.com>

From: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
---
 drivers/net/irda/sh_irda.c |   38 +++++++++++++++++++++++++++-----------
 1 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/drivers/net/irda/sh_irda.c b/drivers/net/irda/sh_irda.c
index f8e5750..71e817a 100644
--- a/drivers/net/irda/sh_irda.c
+++ b/drivers/net/irda/sh_irda.c
@@ -24,6 +24,7 @@
  */
 #include <linux/module.h>
 #include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
 #include <linux/clk.h>
 #include <net/irda/wrapper.h>
 #include <net/irda/irda_device.h>
@@ -143,7 +144,7 @@ struct sh_irda_xir_func {
 struct sh_irda_self {
 	void __iomem		*membase;
 	unsigned int		irq;
-	struct clk		*clk;
+	struct platform_device	*pdev;
 
 	struct net_device	*ndev;
 
@@ -692,7 +693,7 @@ static int sh_irda_open(struct net_device *ndev)
 	struct sh_irda_self *self = netdev_priv(ndev);
 	int err;
 
-	clk_enable(self->clk);
+	pm_runtime_get_sync(&self->pdev->dev);
 	err = sh_irda_crc_init(self);
 	if (err)
 		goto open_err;
@@ -716,7 +717,7 @@ static int sh_irda_open(struct net_device *ndev)
 	return 0;
 
 open_err:
-	clk_disable(self->clk);
+	pm_runtime_put_sync(&self->pdev->dev);
 
 	return err;
 }
@@ -732,6 +733,7 @@ static int sh_irda_stop(struct net_device *ndev)
 	}
 
 	netif_stop_queue(ndev);
+	pm_runtime_put_sync(&self->pdev->dev);
 
 	dev_info(&ndev->dev, "stoped\n");
 
@@ -784,11 +786,8 @@ static int __devinit sh_irda_probe(struct platform_device *pdev)
 	if (err)
 		goto err_mem_2;
 
-	self->clk = clk_get(&pdev->dev, NULL);
-	if (IS_ERR(self->clk)) {
-		dev_err(&pdev->dev, "cannot get irda clock\n");
-		goto err_mem_3;
-	}
+	self->pdev = pdev;
+	pm_runtime_enable(&pdev->dev);
 
 	irda_init_max_qos_capabilies(&self->qos);
 
@@ -818,8 +817,7 @@ static int __devinit sh_irda_probe(struct platform_device *pdev)
 	goto exit;
 
 err_mem_4:
-	clk_put(self->clk);
-err_mem_3:
+	pm_runtime_disable(&pdev->dev);
 	sh_irda_remove_iobuf(self);
 err_mem_2:
 	iounmap(self->membase);
@@ -838,7 +836,7 @@ static int __devexit sh_irda_remove(struct platform_device *pdev)
 		return 0;
 
 	unregister_netdev(ndev);
-	clk_put(self->clk);
+	pm_runtime_disable(&pdev->dev);
 	sh_irda_remove_iobuf(self);
 	iounmap(self->membase);
 	free_netdev(ndev);
@@ -847,11 +845,29 @@ static int __devexit sh_irda_remove(struct platform_device *pdev)
 	return 0;
 }
 
+static int sh_irda_runtime_nop(struct device *dev)
+{
+	/* Runtime PM callback shared between ->runtime_suspend()
+	 * and ->runtime_resume(). Simply returns success.
+	 *
+	 * This driver re-initializes all registers after
+	 * pm_runtime_get_sync() anyway so there is no need
+	 * to save and restore registers here.
+	 */
+	return 0;
+}
+
+static const struct dev_pm_ops sh_irda_pm_ops = {
+	.runtime_suspend	= sh_irda_runtime_nop,
+	.runtime_resume		= sh_irda_runtime_nop,
+};
+
 static struct platform_driver sh_irda_driver = {
 	.probe	= sh_irda_probe,
 	.remove	= __devexit_p(sh_irda_remove),
 	.driver	= {
 		.name	= DRIVER_NAME,
+		.pm	= &sh_irda_pm_ops,
 	},
 };
 
-- 
1.7.4.1

* [PATCH 2/3] net/irda: sh_irda: update author's email address
From: Kuninori Morimoto @ 2011-09-01  6:08 UTC
  To: David S. Miller; +Cc: Magnus, Linux-Net, Kuninori Morimoto
In-Reply-To: <8739ggdcie.wl%kuninori.morimoto.gx@renesas.com>

From: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>

It also cleans up whitespace.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
---
 drivers/net/irda/sh_irda.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/irda/sh_irda.c b/drivers/net/irda/sh_irda.c
index a25d32a..f8e5750 100644
--- a/drivers/net/irda/sh_irda.c
+++ b/drivers/net/irda/sh_irda.c
@@ -2,7 +2,7 @@
  * SuperH IrDA Driver
  *
  * Copyright (C) 2010 Renesas Solutions Corp.
- * Kuninori Morimoto <morimoto.kuninori@renesas.com>
+ * Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
  *
  * Based on sh_sir.c
  * Copyright (C) 2009 Renesas Solutions Corp.
@@ -142,7 +142,7 @@ struct sh_irda_xir_func {
 
 struct sh_irda_self {
 	void __iomem		*membase;
-	unsigned int		 irq;
+	unsigned int		irq;
 	struct clk		*clk;
 
 	struct net_device	*ndev;
@@ -432,9 +432,9 @@ static void sh_irda_set_mode(struct sh_irda_self *self, enum sh_irda_mode mode)
 		func	= &sh_irda_mfir_func;
 		break;
 	default:
-		name = "NONE";
-		data = 0;
-		func = &sh_irda_xir_func;
+		name	= "NONE";
+		data	= 0;
+		func	= &sh_irda_xir_func;
 		break;
 	}
 
@@ -848,10 +848,10 @@ static int __devexit sh_irda_remove(struct platform_device *pdev)
 }
 
 static struct platform_driver sh_irda_driver = {
-	.probe   = sh_irda_probe,
-	.remove  = __devexit_p(sh_irda_remove),
-	.driver  = {
-		.name = DRIVER_NAME,
+	.probe	= sh_irda_probe,
+	.remove	= __devexit_p(sh_irda_remove),
+	.driver	= {
+		.name	= DRIVER_NAME,
 	},
 };
 
@@ -868,6 +868,6 @@ static void __exit sh_irda_exit(void)
 module_init(sh_irda_init);
 module_exit(sh_irda_exit);
 
-MODULE_AUTHOR("Kuninori Morimoto <morimoto.kuninori@renesas.com>");
+MODULE_AUTHOR("Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>");
 MODULE_DESCRIPTION("SuperH IrDA driver");
 MODULE_LICENSE("GPL");
-- 
1.7.4.1

* [PATCH 1/3] net/irda: sh_irda: add sh_irda_ index to all functions
From: Kuninori Morimoto @ 2011-09-01  6:08 UTC
  To: David S. Miller; +Cc: Magnus, Linux-Net, Kuninori Morimoto
In-Reply-To: <8739ggdcie.wl%kuninori.morimoto.gx@renesas.com>

From: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
---
 drivers/net/irda/sh_irda.c |   68 ++++++++++++++++++++++----------------------
 1 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/drivers/net/irda/sh_irda.c b/drivers/net/irda/sh_irda.c
index 4488bd5..a25d32a 100644
--- a/drivers/net/irda/sh_irda.c
+++ b/drivers/net/irda/sh_irda.c
@@ -262,7 +262,7 @@ static int sh_irda_set_baudrate(struct sh_irda_self *self, int baudrate)
 	return 0;
 }
 
-static int xir_get_rcv_length(struct sh_irda_self *self)
+static int sh_irda_get_rcv_length(struct sh_irda_self *self)
 {
 	return RFL_MASK & sh_irda_read(self, IRRFLR);
 }
@@ -272,47 +272,47 @@ static int xir_get_rcv_length(struct sh_irda_self *self)
  *		NONE MODE
  *
  *=====================================*/
-static int xir_fre(struct sh_irda_self *self)
+static int sh_irda_xir_fre(struct sh_irda_self *self)
 {
 	struct device *dev = &self->ndev->dev;
 	dev_err(dev, "none mode: frame recv\n");
 	return 0;
 }
 
-static int xir_trov(struct sh_irda_self *self)
+static int sh_irda_xir_trov(struct sh_irda_self *self)
 {
 	struct device *dev = &self->ndev->dev;
 	dev_err(dev, "none mode: buffer ram over\n");
 	return 0;
 }
 
-static int xir_9(struct sh_irda_self *self)
+static int sh_irda_xir_9(struct sh_irda_self *self)
 {
 	struct device *dev = &self->ndev->dev;
 	dev_err(dev, "none mode: time over\n");
 	return 0;
 }
 
-static int xir_8(struct sh_irda_self *self)
+static int sh_irda_xir_8(struct sh_irda_self *self)
 {
 	struct device *dev = &self->ndev->dev;
 	dev_err(dev, "none mode: framing error\n");
 	return 0;
 }
 
-static int xir_fte(struct sh_irda_self *self)
+static int sh_irda_xir_fte(struct sh_irda_self *self)
 {
 	struct device *dev = &self->ndev->dev;
 	dev_err(dev, "none mode: frame transmit end\n");
 	return 0;
 }
 
-static struct sh_irda_xir_func xir_func = {
-	.xir_fre	= xir_fre,
-	.xir_trov	= xir_trov,
-	.xir_9		= xir_9,
-	.xir_8		= xir_8,
-	.xir_fte	= xir_fte,
+static struct sh_irda_xir_func sh_irda_xir_func = {
+	.xir_fre	= sh_irda_xir_fre,
+	.xir_trov	= sh_irda_xir_trov,
+	.xir_9		= sh_irda_xir_9,
+	.xir_8		= sh_irda_xir_8,
+	.xir_fte	= sh_irda_xir_fte,
 };
 
 /*=====================================
@@ -321,12 +321,12 @@ static struct sh_irda_xir_func xir_func = {
  *
  * MIR/FIR are not supported now
  *=====================================*/
-static struct sh_irda_xir_func mfir_func = {
-	.xir_fre	= xir_fre,
-	.xir_trov	= xir_trov,
-	.xir_9		= xir_9,
-	.xir_8		= xir_8,
-	.xir_fte	= xir_fte,
+static struct sh_irda_xir_func sh_irda_mfir_func = {
+	.xir_fre	= sh_irda_xir_fre,
+	.xir_trov	= sh_irda_xir_trov,
+	.xir_9		= sh_irda_xir_9,
+	.xir_8		= sh_irda_xir_8,
+	.xir_fte	= sh_irda_xir_fte,
 };
 
 /*=====================================
@@ -334,12 +334,12 @@ static struct sh_irda_xir_func mfir_func = {
  *		SIR MODE
  *
  *=====================================*/
-static int sir_fre(struct sh_irda_self *self)
+static int sh_irda_sir_fre(struct sh_irda_self *self)
 {
 	struct device *dev = &self->ndev->dev;
 	u16 data16;
 	u8  *data = (u8 *)&data16;
-	int len = xir_get_rcv_length(self);
+	int len = sh_irda_get_rcv_length(self);
 	int i, j;
 
 	if (len > IRDARAM_LEN)
@@ -362,7 +362,7 @@ static int sir_fre(struct sh_irda_self *self)
 	return 0;
 }
 
-static int sir_trov(struct sh_irda_self *self)
+static int sh_irda_sir_trov(struct sh_irda_self *self)
 {
 	struct device *dev = &self->ndev->dev;
 
@@ -371,7 +371,7 @@ static int sir_trov(struct sh_irda_self *self)
 	return 0;
 }
 
-static int sir_tot(struct sh_irda_self *self)
+static int sh_irda_sir_tot(struct sh_irda_self *self)
 {
 	struct device *dev = &self->ndev->dev;
 
@@ -381,7 +381,7 @@ static int sir_tot(struct sh_irda_self *self)
 	return 0;
 }
 
-static int sir_fer(struct sh_irda_self *self)
+static int sh_irda_sir_fer(struct sh_irda_self *self)
 {
 	struct device *dev = &self->ndev->dev;
 
@@ -390,7 +390,7 @@ static int sir_fer(struct sh_irda_self *self)
 	return 0;
 }
 
-static int sir_fte(struct sh_irda_self *self)
+static int sh_irda_sir_fte(struct sh_irda_self *self)
 {
 	struct device *dev = &self->ndev->dev;
 
@@ -400,12 +400,12 @@ static int sir_fte(struct sh_irda_self *self)
 	return 0;
 }
 
-static struct sh_irda_xir_func sir_func = {
-	.xir_fre	= sir_fre,
-	.xir_trov	= sir_trov,
-	.xir_9		= sir_tot,
-	.xir_8		= sir_fer,
-	.xir_fte	= sir_fte,
+static struct sh_irda_xir_func sh_irda_sir_func = {
+	.xir_fre	= sh_irda_sir_fre,
+	.xir_trov	= sh_irda_sir_trov,
+	.xir_9		= sh_irda_sir_tot,
+	.xir_8		= sh_irda_sir_fer,
+	.xir_fte	= sh_irda_sir_fte,
 };
 
 static void sh_irda_set_mode(struct sh_irda_self *self, enum sh_irda_mode mode)
@@ -419,22 +419,22 @@ static void sh_irda_set_mode(struct sh_irda_self *self, enum sh_irda_mode mode)
 	case SH_IRDA_SIR:
 		name	= "SIR";
 		data	= TMD_SIR;
-		func	= &sir_func;
+		func	= &sh_irda_sir_func;
 		break;
 	case SH_IRDA_MIR:
 		name	= "MIR";
 		data	= TMD_MIR;
-		func	= &mfir_func;
+		func	= &sh_irda_mfir_func;
 		break;
 	case SH_IRDA_FIR:
 		name	= "FIR";
 		data	= TMD_FIR;
-		func	= &mfir_func;
+		func	= &sh_irda_mfir_func;
 		break;
 	default:
 		name = "NONE";
 		data = 0;
-		func = &xir_func;
+		func = &sh_irda_xir_func;
 		break;
 	}
 
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH 0/3] net/irda: sh_irda: add PM support
From: Kuninori Morimoto @ 2011-09-01  6:04 UTC (permalink / raw)
  To: David S. Miller; +Cc: Magnus, Linux-Net, Kuninori Morimoto


Hi David

These are tidy-up and PM support patches for sh_irda

Kuninori Morimoto (3):
      net/irda: sh_irda: add sh_irda_ index to all functions
      net/irda: sh_irda: update author's email address
      net/irda: sh_irda: add PM support

^ permalink raw reply

* RE: [PATCH net-next 3/3] r8169: support new chips of RTL8111F
From: hayeswang @ 2011-09-01  6:02 UTC (permalink / raw)
  To: 'Francois Romieu'; +Cc: netdev, linux-kernel
In-Reply-To: <20110831191358.GB24634@electric-eye.fr.zoreil.com>

 

> -----Original Message-----
> From: Francois Romieu [mailto:romieu@fr.zoreil.com] 
> Sent: Thursday, September 01, 2011 3:14 AM
> To: Hayeswang
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net-next 3/3] r8169: support new chips of RTL8111F
> 
> Hayes Wang <hayeswang@realtek.com> :
> [...]
> > diff --git a/drivers/net/ethernet/realtek/r8169.c 
> > b/drivers/net/ethernet/realtek/r8169.c
> > index 68f1e2f..c04fbc0 100644
> > --- a/drivers/net/ethernet/realtek/r8169.c
> > +++ b/drivers/net/ethernet/realtek/r8169.c
> [...]
> > @@ -4476,6 +4602,49 @@ static void 
> rtl_hw_start_8168e_2(void __iomem *ioaddr, struct pci_dev *pdev)
> >  	RTL_W8(Config5, RTL_R8(Config5) & ~Spi_en);  }
> >  
> > +static void rtl_hw_start_8168f_1(void __iomem *ioaddr, 
> struct pci_dev 
> > +*pdev)
> [...]
> > +	RTL_W8(MaxTxPacketSize, 0x27);
> 
> Hmmm...
> 
> $ grep MaxTxPacketSize drivers/net/r8169.c
> 	MaxTxPacketSize	= 0xec,	/* 8101/8168. Unit of 128 bytes. */
> 	RTL_W8(MaxTxPacketSize, TxPacketMax);
> 	RTL_W8(MaxTxPacketSize, TxPacketMax);
> 	RTL_W8(MaxTxPacketSize, TxPacketMax);
> 	RTL_W8(MaxTxPacketSize, TxPacketMax);
> 	RTL_W8(MaxTxPacketSize, TxPacketMax);
> 	RTL_W8(MaxTxPacketSize, TxPacketMax);
> 	RTL_W8(MaxTxPacketSize, 0x27);
> 	RTL_W8(MaxTxPacketSize, TxPacketMax);
> 	RTL_W8(MaxTxPacketSize, TxPacketMax);
> 
> Is the 0x27 value still in units of 128 bytes ?

Yes.

> 
> Could it be TxPacketMax as everywhere else in the driver 
> instead of 0x27 ?

Yes, it is fine to replace it with TxPacketMax.
The value was suggested by our hardware engineer. This chip supports 9K bytes for
tx, and TxPacketMax * 128 < 9K. This register doesn't actually limit the tx
size. It influences the behavior when sending large packets. Thus, a different
setting may just result in different performance when sending large packets.
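
As a rough illustration of the arithmetic above (a sketch only, not driver
code): the register is in units of 128 bytes, so 0x27 corresponds to
39 * 128 = 4992 bytes. The helper names are hypothetical, the TxPacketMax
value of 8064 >> 7 is an assumption about the driver's current definition,
and "9K" is taken as 9 * 1024 here.

```c
#include <assert.h>
#include <stdint.h>

/* MaxTxPacketSize is expressed in units of 128 bytes. */
#define TX_UNIT_BYTES 128u

/* Assumption: mirrors the driver's TxPacketMax definition (8064 >> 7 == 63). */
#define TX_PACKET_MAX_ASSUMED (8064u >> 7)

/* "9K" interpreted as 9 * 1024 bytes for this sketch. */
#define TX_LIMIT_9K (9u * 1024u)

/* Hypothetical helper: convert a MaxTxPacketSize register value to bytes. */
static inline uint32_t max_tx_reg_to_bytes(uint8_t reg)
{
	return (uint32_t)reg * TX_UNIT_BYTES;
}

/* Hypothetical helper: nonzero if the register setting stays below limit_bytes. */
static inline int tx_setting_within_limit(uint8_t reg, uint32_t limit_bytes)
{
	assert(limit_bytes > 0);
	return max_tx_reg_to_bytes(reg) < limit_bytes;
}
```

Under these assumptions both 0x27 (4992 bytes) and TxPacketMax (8064 bytes)
stay below the 9K limit.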

 
Best Regards,
Hayes

^ permalink raw reply

* RE: [PATCH net-next 1/3] r8169: fix WOL setting for 8105 and 8111EVL
From: hayeswang @ 2011-09-01  6:02 UTC (permalink / raw)
  To: 'Francois Romieu'; +Cc: netdev, linux-kernel
In-Reply-To: <20110831191344.GA24634@electric-eye.fr.zoreil.com>

 

> -----Original Message-----
> From: Francois Romieu [mailto:romieu@fr.zoreil.com] 
> Sent: Thursday, September 01, 2011 3:14 AM
> To: Hayeswang
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net-next 1/3] r8169: fix WOL setting for 
> 8105 and 8111EVL
> 
> Hayes Wang <hayeswang@realtek.com> :
> > 8105, 8111E, and 8111EVL need enable RxConfig bit 1 ~ 3 for 
> supporting 
> > wake on lan.
> [...]
> > diff --git a/drivers/net/ethernet/realtek/r8169.c 
> > b/drivers/net/ethernet/realtek/r8169.c
> > index 1cf8c3c..96e003a 100644
> > --- a/drivers/net/ethernet/realtek/r8169.c
> > +++ b/drivers/net/ethernet/realtek/r8169.c
> > @@ -3416,8 +3416,11 @@ static void 
> r8168_pll_power_down(struct rtl8169_private *tp)
> >  		rtl_writephy(tp, 0x1f, 0x0000);
> >  		rtl_writephy(tp, MII_BMCR, 0x0000);
> >  
> > -		if (tp->mac_version == RTL_GIGA_MAC_VER_32 ||
> > -		    tp->mac_version == RTL_GIGA_MAC_VER_33)
> > +		if (tp->mac_version == RTL_GIGA_MAC_VER_29 ||
> > +		    tp->mac_version == RTL_GIGA_MAC_VER_30 ||
> > +		    tp->mac_version == RTL_GIGA_MAC_VER_32 ||
> > +		    tp->mac_version == RTL_GIGA_MAC_VER_33 ||
> > +		    tp->mac_version == RTL_GIGA_MAC_VER_34)
> >  			RTL_W32(RxConfig, RTL_R32(RxConfig) | 
> AcceptBroadcast |
> >  				AcceptMulticast | AcceptMyPhys);
> 
> Fine for RTL_GIGA_MAC_VER_34 but RTL_GIGA_MAC_VER_29 and 
> RTL_GIGA_MAC_VER_30 use r810x_pll_power_{up/down}, not their 
> r8168_pll_xyz siblings.
> 

I will fix it. Thanks.
 
Best Regards,
Hayes

^ permalink raw reply

* Re: [patch net-next-2.6 1/2] net: allow to change carrier via sysfs
From: Jiri Pirko @ 2011-09-01  5:46 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Ben Hutchings, Ben Greear, Nicolas de Pesloüan,
	Michał Mirosław, netdev, davem, eric.dumazet
In-Reply-To: <20110831144023.6361de56@nehalam.ftrdhcpuser.net>

Wed, Aug 31, 2011 at 11:40:23PM CEST, shemminger@vyatta.com wrote:
>On Wed, 31 Aug 2011 22:36:45 +0100
>Ben Hutchings <bhutchings@solarflare.com> wrote:
>
>> On Wed, 2011-08-31 at 13:48 -0700, Ben Greear wrote:
>> > On 08/31/2011 01:31 PM, Nicolas de Pesloüan wrote:
>> > > Le 31/08/2011 22:12, Ben Hutchings a écrit :
>> > >> On Wed, 2011-08-31 at 22:03 +0200, Nicolas de Pesloüan wrote:
>> > >>> Le 31/08/2011 10:45, Jiri Pirko a écrit :
>> > >>>
>> > >>>>>>> Do you expect drivers using implementation different than just calling
>> > >>>>>>> netif_carrier_on/off? Or is it supposed to also e.g. power down PHYs?
>> > >>>>>> Yes, generally it can be used also for en/disable phy, for testing
>> > >>>>>> purposes if hw and driver would support it.
>> > >>>>>
>> > >>>>> I'd like to see this working for GRE tunnel devices (for keepalive
>> > >>>>> daemon to be able to indicate to routing daemons whether tunnel is
>> > >>>>> really working) - implementation would be identical to dummy's case.
>> > >>>>> Should I prepare a patch or can I leave it to you?
>> > >>>>
>> > >>>> Ok, I can include it to this patchset (I'm going to repost first patch
>> > >>>> anyway)
>> > >>>
>> > >>> Can't we assume that the dummy's case is the default behavior and
>> > >>> register this default
>> > >>> ndo_change_carrier callback for every device ?
>> > >>
>> > >> You have got to be joking. No device driver that has real link
>> > >> monitoring should use this implementation.
>> > >
>> > > Well, why not? Arguably, this is probably not the feature one would use every day, but...
>> > >
>> > > Testing a cluster reaction to a link down event would be easier if one doesn't need to unplug the cable for the test. I understand that one can turn off the
>> > > switch port (physical or virtual), but echo 0 > /sys/class/net/eth0/carrier would be nice too.
>> > 
>> > There is special hardware out there that can do bypass, and often it also has a mode
>> > that will programmatically cut link by throwing some relays.  We use this for our
>> > testing equipment...
>> > 
>> > If there is some way to twiddle standard-ish hardware to actually drop link, that
>> > would be neat.  I'd think it should be an ethtool type of thing, however.
>> 
>> We need to be able to control this as part of our driver test suite (on
>> the peer, not the device under test).  There are various MDIO bits that
>> look like they should do this but unfortunately they don't have
>> consistent effects.  Besides that, many PHYs are not MDIO-manageable.
>> So this would have to be a device-specific operation, whether it's
>> exposed through ethtool or sysfs.
>> 
>> > Actually dropping link, and letting that naturally propagate up the stack seems
>> > more reasonable than lying about the status half way up the stack.
>> 
>
>For testing clustering, there are hooks in vmware and QEMU/KVM to allow
>dropping carrier on the VM side.

Afaik in kvm this is only possible on emulated e1000 (last time I
checked).

^ permalink raw reply

* Re: [patch net-next-2.6 1/2] net: allow to change carrier via sysfs
From: Jiri Pirko @ 2011-09-01  5:44 UTC (permalink / raw)
  To: Ben Greear
  Cc: Nicolas de Pesloüan, Ben Hutchings,
	Michał Mirosław, netdev, davem, eric.dumazet,
	shemminger
In-Reply-To: <4E5E9E16.6060502@candelatech.com>

Wed, Aug 31, 2011 at 10:48:22PM CEST, greearb@candelatech.com wrote:
>On 08/31/2011 01:31 PM, Nicolas de Pesloüan wrote:
>>Le 31/08/2011 22:12, Ben Hutchings a écrit :
>>>On Wed, 2011-08-31 at 22:03 +0200, Nicolas de Pesloüan wrote:
>>>>Le 31/08/2011 10:45, Jiri Pirko a écrit :
>>>>
>>>>>>>>Do you expect drivers using implementation different than just calling
>>>>>>>>netif_carrier_on/off? Or is it supposed to also e.g. power down PHYs?
>>>>>>>Yes, generally it can be used also for en/disable phy, for testing
>>>>>>>purposes if hw and driver would support it.
>>>>>>
>>>>>>I'd like to see this working for GRE tunnel devices (for keepalive
>>>>>>daemon to be able to indicate to routing daemons whether tunnel is
>>>>>>really working) - implementation would be identical to dummy's case.
>>>>>>Should I prepare a patch or can I leave it to you?
>>>>>
>>>>>Ok, I can include it to this patchset (I'm going to repost first patch
>>>>>anyway)
>>>>
>>>>Can't we assume that the dummy's case is the default behavior and
>>>>register this default
>>>>ndo_change_carrier callback for every device ?
>>>
>>>You have got to be joking. No device driver that has real link
>>>monitoring should use this implementation.
>>
>>Well, why not? Arguably, this is probably not the feature one would use every day, but...
>>
>>Testing a cluster reaction to a link down event would be easier if one doesn't need to unplug the cable for the test. I understand that one can turn off the
>>switch port (physical or virtual), but echo 0 > /sys/class/net/eth0/carrier would be nice too.
>
>There is special hardware out there that can do bypass, and often it also has a mode
>that will programmatically cut link by throwing some relays.  We use this for our
>testing equipment...
>
>If there is some way to twiddle standard-ish hardware to actually drop link, that
>would be neat.  I'd think it should be an ethtool type of thing, however.

Ethtool can implement this eventually by calling the same ndo.

>
>Actually dropping link, and letting that naturally propagate up the stack seems
>more reasonable than lying about the status half way up the stack.

Yes, that is really the intention of the proposed ndo. A real hw driver
should implement that as you say, not by directly setting carrier_on/off.
>
>Thanks,
>Ben
>
>-- 
>Ben Greear <greearb@candelatech.com>
>Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: slow performance on disk/network i/o full speed after drop_caches
From: Stefan Priebe - Profihost AG @ 2011-09-01  5:41 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Zhu Yanhai, Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton,
	Mel Gorman, Jens Axboe, Linux Netdev List, KOSAKI Motohiro
In-Reply-To: <20110901041458.GA30123@localhost>

Thanks!

Am 01.09.2011 06:14, schrieb Wu Fengguang:
> Hi Stefan,
>
> On Wed, Aug 31, 2011 at 03:11:02PM +0800, Stefan Priebe - Profihost AG wrote:
>> Hi Fengguang,
>> Hi Yanhai,
>>
>>> you're absolutely correct, zone_reclaim_mode is on - but why?
>>> There must be some linux software which switches it on.
>>>
>>> ~# grep 'zone_reclaim_mode' /etc/sysctl.* -r -i
>>> ~#
>>>
>>> also
>>> ~# grep 'zone_reclaim_mode' /etc/sysctl.* -r -i
>>> ~#
>>>
>>> tells us nothing.
>>>
>>> I've then read this:
>>>
>>> "zone_reclaim_mode is set during bootup to 1 if it is determined that
>>> pages from remote zones will cause a measurable performance reduction.
>>> The page allocator will then reclaim easily reusable pages (those page
>>> cache pages that are currently not used) before allocating off node pages."
>>>
>>> Why does the kernel do that here in our case on these machines.
>>
>> Can nobody help why the kernel in this case set it to 1?
>
> It's determined by RECLAIM_DISTANCE.
>
> build_zonelists():
>
>                  /*
>                   * If another node is sufficiently far away then it is better
>                   * to reclaim pages in a zone before going off node.
>                   */
>                  if (distance > RECLAIM_DISTANCE)
>                          zone_reclaim_mode = 1;
>
> Since Linux v3.0 RECLAIM_DISTANCE is increased from 20 to 30 by this commit.
> It may well help your case, too.
>
> commit 32e45ff43eaf5c17f5a82c9ad358d515622c2562
> Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Date:   Wed Jun 15 15:08:20 2011 -0700
>
>      mm: increase RECLAIM_DISTANCE to 30
>
>      Recently, Robert Mueller reported (http://lkml.org/lkml/2010/9/12/236)
>      that zone_reclaim_mode doesn't work properly on his new NUMA server (Dual
>      Xeon E5520 + Intel S5520UR MB).  He is using Cyrus IMAPd and it's built on
>      a very traditional single-process model.
>
>        * a master process which reads config files and manages the other
>          process
>        * multiple imapd processes, one per connection
>        * multiple pop3d processes, one per connection
>        * multiple lmtpd processes, one per connection
>        * periodical "cleanup" processes.
>
>      There are thousands of independent processes.  The problem is, recent
>      Intel motherboard turn on zone_reclaim_mode by default and traditional
>      prefork model software don't work well on it.  Unfortunatelly, such models
>      are still typical even in the 21st century.  We can't ignore them.
>
>      This patch raises the zone_reclaim_mode threshold to 30.  30 doesn't have
>      any specific meaning.  but 20 means that one-hop QPI/Hypertransport and
>      such relatively cheap 2-4 socket machine are often used for traditional
>      servers as above.  The intention is that these machines don't use
>      zone_reclaim_mode.
>
>      Note: ia64 and Power have arch specific RECLAIM_DISTANCE definitions.
>      This patch doesn't change such high-end NUMA machine behavior.
>
>      Dave Hansen said:
>
>      : I know specifically of pieces of x86 hardware that set the information
>      : in the BIOS to '21' *specifically* so they'll get the zone_reclaim_mode
>      : behavior which that implies.
>      :
>      : They've done performance testing and run very large and scary benchmarks
>      : to make sure that they _want_ this turned on.  What this means for them
>      : is that they'll probably be de-optimized, at least on newer versions of
>      : the kernel.
>      :
>      : If you want to do this for particular systems, maybe _that_'s what we
>      : should do.  Have a list of specific configurations that need the
>      : defaults overridden either because they're buggy, or they have an
>      : unusual hardware configuration not really reflected in the distance
>      : table.
>
>      And later said:
>
>      : The original change in the hardware tables was for the benefit of a
>      : benchmark.  Said benchmark isn't going to get run on mainline until the
>      : next batch of enterprise distros drops, at which point the hardware where
>      : this was done will be irrelevant for the benchmark.  I'm sure any new
>      : hardware will just set this distance to another yet arbitrary value to
>      : make the kernel do what it wants.  :)
>      :
>      : Also, when the hardware got _set_ to this initially, I complained.  So, I
>      : guess I'm getting my way now, with this patch.  I'm cool with it.
>
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index b91a40e..fc839bf 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -60,7 +60,7 @@ int arch_update_cpu_topology(void);
>    * (in whatever arch specific measurement units returned by node_distance())
>    * then switch on zone reclaim on boot.
>    */
> -#define RECLAIM_DISTANCE 20
> +#define RECLAIM_DISTANCE 30
>   #endif
>   #ifndef PENALTY_FOR_NODE_WITH_CPUS
>   #define PENALTY_FOR_NODE_WITH_CPUS     (1)
>
> Thanks,
> Fengguang
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: slow performance on disk/network i/o full speed after drop_caches
From: Wu Fengguang @ 2011-09-01  4:14 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Zhu Yanhai, Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton,
	Mel Gorman, Jens Axboe, Linux Netdev List, KOSAKI Motohiro
In-Reply-To: <4E5DDE86.3040202@profihost.ag>

Hi Stefan,

On Wed, Aug 31, 2011 at 03:11:02PM +0800, Stefan Priebe - Profihost AG wrote:
> Hi Fengguang,
> Hi Yanhai,
> 
> > you're absolutely correct, zone_reclaim_mode is on - but why?
> > There must be some linux software which switches it on.
> >
> > ~# grep 'zone_reclaim_mode' /etc/sysctl.* -r -i
> > ~#
> >
> > also
> > ~# grep 'zone_reclaim_mode' /etc/sysctl.* -r -i
> > ~#
> >
> > tells us nothing.
> >
> > I've then read this:
> >
> > "zone_reclaim_mode is set during bootup to 1 if it is determined that
> > pages from remote zones will cause a measurable performance reduction.
> > The page allocator will then reclaim easily reusable pages (those page
> > cache pages that are currently not used) before allocating off node pages."
> >
> > Why does the kernel do that here in our case on these machines.
> 
> Can nobody help why the kernel in this case set it to 1?

It's determined by RECLAIM_DISTANCE.

build_zonelists():

                /*
                 * If another node is sufficiently far away then it is better
                 * to reclaim pages in a zone before going off node.
                 */
                if (distance > RECLAIM_DISTANCE)
                        zone_reclaim_mode = 1;

Since Linux v3.0 RECLAIM_DISTANCE is increased from 20 to 30 by this commit.
It may well help your case, too.

commit 32e45ff43eaf5c17f5a82c9ad358d515622c2562
Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date:   Wed Jun 15 15:08:20 2011 -0700

    mm: increase RECLAIM_DISTANCE to 30
    
    Recently, Robert Mueller reported (http://lkml.org/lkml/2010/9/12/236)
    that zone_reclaim_mode doesn't work properly on his new NUMA server (Dual
    Xeon E5520 + Intel S5520UR MB).  He is using Cyrus IMAPd and it's built on
    a very traditional single-process model.
    
      * a master process which reads config files and manages the other
        process
      * multiple imapd processes, one per connection
      * multiple pop3d processes, one per connection
      * multiple lmtpd processes, one per connection
      * periodical "cleanup" processes.
    
    There are thousands of independent processes.  The problem is, recent
    Intel motherboard turn on zone_reclaim_mode by default and traditional
    prefork model software don't work well on it.  Unfortunatelly, such models
    are still typical even in the 21st century.  We can't ignore them.
    
    This patch raises the zone_reclaim_mode threshold to 30.  30 doesn't have
    any specific meaning.  but 20 means that one-hop QPI/Hypertransport and
    such relatively cheap 2-4 socket machine are often used for traditional
    servers as above.  The intention is that these machines don't use
    zone_reclaim_mode.
    
    Note: ia64 and Power have arch specific RECLAIM_DISTANCE definitions.
    This patch doesn't change such high-end NUMA machine behavior.
    
    Dave Hansen said:
    
    : I know specifically of pieces of x86 hardware that set the information
    : in the BIOS to '21' *specifically* so they'll get the zone_reclaim_mode
    : behavior which that implies.
    :
    : They've done performance testing and run very large and scary benchmarks
    : to make sure that they _want_ this turned on.  What this means for them
    : is that they'll probably be de-optimized, at least on newer versions of
    : the kernel.
    :
    : If you want to do this for particular systems, maybe _that_'s what we
    : should do.  Have a list of specific configurations that need the
    : defaults overridden either because they're buggy, or they have an
    : unusual hardware configuration not really reflected in the distance
    : table.

    And later said:
    
    : The original change in the hardware tables was for the benefit of a
    : benchmark.  Said benchmark isn't going to get run on mainline until the
    : next batch of enterprise distros drops, at which point the hardware where
    : this was done will be irrelevant for the benchmark.  I'm sure any new
    : hardware will just set this distance to another yet arbitrary value to
    : make the kernel do what it wants.  :)
    :
    : Also, when the hardware got _set_ to this initially, I complained.  So, I
    : guess I'm getting my way now, with this patch.  I'm cool with it.

diff --git a/include/linux/topology.h b/include/linux/topology.h
index b91a40e..fc839bf 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -60,7 +60,7 @@ int arch_update_cpu_topology(void);
  * (in whatever arch specific measurement units returned by node_distance())
  * then switch on zone reclaim on boot.
  */
-#define RECLAIM_DISTANCE 20
+#define RECLAIM_DISTANCE 30
 #endif
 #ifndef PENALTY_FOR_NODE_WITH_CPUS
 #define PENALTY_FOR_NODE_WITH_CPUS     (1)
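
To make the effect of the threshold change concrete, here is a small
user-space sketch of the comparison done in build_zonelists(). The function
name and the sample distances are hypothetical; only the strict
`distance > RECLAIM_DISTANCE` test comes from the code quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the boot-time check quoted above. */
static inline bool zone_reclaim_enabled(int distance, int reclaim_distance)
{
	assert(distance >= 0 && reclaim_distance >= 0);
	/* Same strict comparison as in build_zonelists(). */
	return distance > reclaim_distance;
}
```

With a reported node distance of 21 (the BIOS value Dave Hansen mentions),
the old threshold of 20 switches zone_reclaim_mode on, while the new
threshold of 30 leaves it off. A distance of exactly 20 does not enable it
under either threshold, because the comparison is strict.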

Thanks,
Fengguang

^ permalink raw reply related

* Re: [linux-firmware v3 1/2] rtl_nic: update firmware for RTL8111E-VL
From: Ben Hutchings @ 2011-09-01  3:55 UTC (permalink / raw)
  To: Hayes Wang; +Cc: dwmw2, romieu, netdev
In-Reply-To: <1314807407-1512-1-git-send-email-hayeswang@realtek.com>

[-- Attachment #1: Type: text/plain, Size: 777 bytes --]

On Wed, 2011-08-31 at 18:35 +0100, Hayes Wang wrote:
> Updated firmware with stability fixes.
> Version: 0.0.2
> 
> Signed-off-by: Hayes Wang <hayeswang@realtek.com>
> ---
>  WHENCE                |    2 +-
>  rtl_nic/rtl8168e-3.fw |  Bin 2804 -> 3552 bytes
>  2 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/WHENCE b/WHENCE
> index a47b307..eb7bdfd 100644
> --- a/WHENCE
> +++ b/WHENCE
> @@ -1657,7 +1657,7 @@ File: rtl_nic/rtl8168d-2.fw
>  File: rtl_nic/rtl8105e-1.fw
>  File: rtl_nic/rtl8168e-1.fw
>  File: rtl_nic/rtl8168e-2.fw
> -File: rtl_nic/rtl8168e-3.fw
> +File: rtl_nic/rtl8168e-3.fw (version: 0.0.2)
[...]

Like I said before, the version belongs on a separate line:

File: rtl_nic/rtl8168e-3.fw
Version: 0.0.2

Ben.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* [RFMC] per-container tcp buffer limitation
From: Glauber Costa @ 2011-09-01  2:43 UTC (permalink / raw)
  To: netdev
  Cc: Linux Containers, linux-mm, Pavel Emelyanov, Eric W. Biederman,
	David Miller, KAMEZAWA Hiroyuki, Stephen Hemminger, penberg

[-- Attachment #1: Type: text/plain, Size: 2508 bytes --]

Hello People,

[ For the ones in linux-mm that are receiving this for the first time,
   this is a follow up of
   http://thread.gmane.org/gmane.linux.kernel.containers/21295 ]

Here is a new, a bit more mature version of my previous RFC. Now I 
Request For More Comments from you guys in this new version of the patch.

Highlights:

* Although I do intend to experiment with more scenarios (suggestions 
welcome), there does not seem to be a (huge) performance hit with this 
patch applied, at least in a basic latency benchmark. That indicates 
that even if we can demonstrate a performance hit, it won't be too hard 
to optimize it away (famous last words?)

Since the patch touches both rcv and snd sides, I benchmarked it with 
netperf against localhost. Command line: netperf -t TCP_RR -H localhost.

Without the patch
=================

Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    26996.35
16384  87380

With the patch
===============

Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    27291.86
16384  87380


As you can see, the rate is a bit higher, but only by about one percent, 
meaning it is basically unchanged. I will benchmark it with 
various levels of cgroup nesting in my next submission so we can have a 
better idea of its impact when enabled.
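
For reference, the difference between the two runs can be computed as in
the sketch below. percent_change is a hypothetical helper; the inputs are
the two transaction rates from the tables above.

```c
#include <assert.h>

/* Relative change of `after` versus `before`, in percent. */
static inline double percent_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

Calling percent_change(26996.35, 27291.86) gives a value just above one
percent, which is small enough that run-to-run noise could account for it.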

* As nicely pointed out by Kamezawa, I dropped the sockets cgroup, and 
introduced a kmem cgroup. After careful consideration, I decided not to 
reuse the memcg. Basically, my impression is that memcg is concerned 
with user objects, with page granularity and its swap attributes. 
Because kernel objects are entirely different, I prefer to group them here.

* Only tcp ipv4 is converted - because it is basically the one in which
memory pressure thresholds are really put to use. I plan to touch the 
other protocols in the next submission.

* As with other sysctls, the sysctl controlling tcp memory pressure 
behaviour was made per-netns. But it will show cgroup data for the 
current cgroup. The cgroup control file, however, will only set a 
maximum value. The pressure thresholds are not the business of the box 
administrator, but rather of the container's: anything goes, provided 
none of the 3 values goes over the maximum.
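
The rule in the last item can be sketched as follows. This is a minimal
user-space illustration, not the code in the attached patch: the function
name is hypothetical, and whether the patch performs the check in exactly
this way is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: accept a container's three tcp_mem thresholds
 * only if none of them exceeds the maximum set in the cgroup control file.
 */
static inline bool tcp_mem_within_limit(const long mem[3], long max)
{
	size_t i;

	assert(mem != NULL);
	for (i = 0; i < 3; i++) {
		if (mem[i] > max)
			return false;
	}
	return true;
}
```

A write to the per-netns sysctl would then be rejected whenever this
returns false, leaving the container free to pick any thresholds at or
below the administrator's maximum.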

Comments welcome

[-- Attachment #2: tcp-membuf.patch --]
[-- Type: text/plain, Size: 33843 bytes --]

diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index ac663c1..363b8e8 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -35,6 +35,10 @@ SUBSYS(cpuacct)
 SUBSYS(mem_cgroup)
 #endif
 
+#ifdef CONFIG_CGROUP_KMEM
+SUBSYS(kmem)
+#endif
+
 /* */
 
 #ifdef CONFIG_CGROUP_DEVICE
diff --git a/include/linux/kmem_cgroup.h b/include/linux/kmem_cgroup.h
new file mode 100644
index 0000000..9c62718
--- /dev/null
+++ b/include/linux/kmem_cgroup.h
@@ -0,0 +1,68 @@
+/* kmem_cgroup.h - Kernel Memory Controller
+ *
+ * Copyright Parallels Inc., 2011
+ * Author: Glauber Costa <glommer@parallels.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _LINUX_KMEM_CGROUP_H
+#define _LINUX_KMEM_CGROUP_H
+#include <linux/cgroup.h>
+#include <linux/atomic.h>
+#include <linux/percpu_counter.h>
+
+struct kmem_cgroup {
+	struct cgroup_subsys_state css;
+	struct kmem_cgroup *parent;
+
+	int tcp_memory_pressure;
+	int tcp_max_memory;
+	atomic_long_t tcp_memory_allocated;
+	struct percpu_counter tcp_sockets_allocated;
+	long tcp_prot_mem[3];
+
+	atomic_long_t udp_memory_allocated;
+};
+
+
+#ifdef CONFIG_CGROUP_KMEM
+static inline struct kmem_cgroup *cgroup_sk(struct cgroup *cgrp)
+{
+	return container_of(cgroup_subsys_state(cgrp, kmem_subsys_id),
+		struct kmem_cgroup, css);
+}
+
+static inline struct kmem_cgroup *task_sk(struct task_struct *tsk)
+{
+	return container_of(task_subsys_state(tsk, kmem_subsys_id),
+		struct kmem_cgroup, css);
+}
+
+static inline bool kmem_cgroup_disabled(void)
+{
+	if (kmem_subsys.disabled)
+		return true;
+	return false;
+}
+#else
+static inline struct kmem_cgroup *cgroup_sk(struct cgroup *cgrp)
+{
+	return NULL;
+}
+
+static inline struct kmem_cgroup *task_sk(struct task_struct *tsk)
+{
+	return NULL;
+}
+#endif /* CONFIG_CGROUP_KMEM */
+#endif /* _LINUX_KMEM_CGROUP_H */
+
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index d786b4f..bbd023a 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -55,6 +55,7 @@ struct netns_ipv4 {
 	int current_rt_cache_rebuild_count;
 
 	unsigned int sysctl_ping_group_range[2];
+	long sysctl_tcp_mem[3];
 
 	atomic_t rt_genid;
 	atomic_t dev_addr_genid;
diff --git a/include/net/sock.h b/include/net/sock.h
index 8e4062f..b68e6ea 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -62,7 +62,9 @@
 #include <linux/atomic.h>
 #include <net/dst.h>
 #include <net/checksum.h>
+#include <linux/kmem_cgroup.h>
 
+int sockets_populate(struct cgroup_subsys *ss, struct cgroup *cgrp);
 /*
  * This structure really needs to be cleaned up.
  * Most of it is for TCP, and not used by any of
@@ -339,6 +341,7 @@ struct sock {
 #endif
 	__u32			sk_mark;
 	u32			sk_classid;
+	struct kmem_cgroup	*sk_cgrp;
 	void			(*sk_state_change)(struct sock *sk);
 	void			(*sk_data_ready)(struct sock *sk, int bytes);
 	void			(*sk_write_space)(struct sock *sk);
@@ -786,16 +789,18 @@ struct proto {
 
 	/* Memory pressure */
 	void			(*enter_memory_pressure)(struct sock *sk);
-	atomic_long_t		*memory_allocated;	/* Current allocated memory. */
-	struct percpu_counter	*sockets_allocated;	/* Current number of sockets. */
+	atomic_long_t		*(*memory_allocated)(struct kmem_cgroup *sg);	/* Current allocated memory. */
+	struct percpu_counter	*(*sockets_allocated)(struct kmem_cgroup *sg);	/* Current number of sockets. */
+
+	int			(*init_cgroup)(struct cgroup *cgrp, struct cgroup_subsys *ss);
 	/*
 	 * Pressure flag: try to collapse.
 	 * Technical note: it is used by multiple contexts non atomically.
 	 * All the __sk_mem_schedule() is of this nature: accounting
 	 * is strict, actions are advisory and have some latency.
 	 */
-	int			*memory_pressure;
-	long			*sysctl_mem;
+	int			*(*memory_pressure)(struct kmem_cgroup *sg);
+	long			*(*prot_mem)(struct kmem_cgroup *sg);
 	int			*sysctl_wmem;
 	int			*sysctl_rmem;
 	int			max_header;
@@ -826,6 +831,56 @@ struct proto {
 #endif
 };
 
+#define sk_memory_pressure(sk)						\
+({									\
+	int *__ret = NULL;						\
+	if (sk->sk_prot->memory_pressure)				\
+		__ret = sk->sk_prot->memory_pressure(sk->sk_cgrp);	\
+	__ret;								\
+})
+
+#define sk_sockets_allocated(sk)				\
+({ 								\
+	struct percpu_counter *__p;				\
+	__p = sk->sk_prot->sockets_allocated(sk->sk_cgrp);	\
+	__p;							\
+})
+
+#define sk_memory_allocated(sk)					\
+({								\
+	atomic_long_t *__mem;					\
+	__mem = sk->sk_prot->memory_allocated(sk->sk_cgrp);	\
+	__mem;							\
+})
+
+#define sk_prot_mem(sk)						\
+({								\
+	long *__mem = sk->sk_prot->prot_mem(sk->sk_cgrp);	\
+	__mem;							\
+})
+
+#define sg_memory_pressure(prot, sg)				\
+({								\
+	int *__ret = NULL;  					\
+	if (prot->memory_pressure)				\
+		__ret = prot->memory_pressure(sg);		\
+	__ret;							\
+})
+
+#define sg_memory_allocated(prot, sg)				\
+({								\
+	atomic_long_t *__mem; 					\
+	__mem = prot->memory_allocated(sg);			\
+	__mem;							\
+})
+
+#define sg_sockets_allocated(prot, sg)				\
+({ 								\
+	struct percpu_counter *__p;				\
+	__p = prot->sockets_allocated(sg);			\
+	__p;							\
+})
+
 extern int proto_register(struct proto *prot, int alloc_slab);
 extern void proto_unregister(struct proto *prot);
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 149a415..97405ed 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -230,7 +230,6 @@ extern int sysctl_tcp_fack;
 extern int sysctl_tcp_reordering;
 extern int sysctl_tcp_ecn;
 extern int sysctl_tcp_dsack;
-extern long sysctl_tcp_mem[3];
 extern int sysctl_tcp_wmem[3];
 extern int sysctl_tcp_rmem[3];
 extern int sysctl_tcp_app_win;
@@ -255,7 +254,13 @@ extern int sysctl_tcp_thin_dupack;
 
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
-extern int tcp_memory_pressure;
+
+struct kmem_cgroup;
+extern long *tcp_sysctl_mem(struct kmem_cgroup *sg);
+struct percpu_counter *sockets_allocated_tcp(struct kmem_cgroup *sg);
+int *memory_pressure_tcp(struct kmem_cgroup *sg);
+int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss);
+atomic_long_t *memory_allocated_tcp(struct kmem_cgroup *sg);
 
 /*
  * The next routines deal with comparing 32 bit unsigned ints
@@ -286,7 +291,7 @@ static inline bool tcp_too_many_orphans(struct sock *sk, int shift)
 	}
 
 	if (sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
-	    atomic_long_read(&tcp_memory_allocated) > sysctl_tcp_mem[2])
+	    atomic_long_read(sk_memory_allocated(sk)) > sk_prot_mem(sk)[2])
 		return true;
 	return false;
 }
diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h
index 779abb9..44d2191 100644
--- a/include/trace/events/sock.h
+++ b/include/trace/events/sock.h
@@ -31,13 +31,14 @@ TRACE_EVENT(sock_rcvqueue_full,
 
 TRACE_EVENT(sock_exceed_buf_limit,
 
-	TP_PROTO(struct sock *sk, struct proto *prot, long allocated),
+	TP_PROTO(struct sock *sk, struct proto *prot, long allocated,
+		 long *prot_mem),
 
-	TP_ARGS(sk, prot, allocated),
+	TP_ARGS(sk, prot, allocated, prot_mem),
 
 	TP_STRUCT__entry(
 		__array(char, name, 32)
-		__field(long *, sysctl_mem)
+		__field(long *, prot_mem)
 		__field(long, allocated)
 		__field(int, sysctl_rmem)
 		__field(int, rmem_alloc)
@@ -45,7 +46,7 @@ TRACE_EVENT(sock_exceed_buf_limit,
 
 	TP_fast_assign(
 		strncpy(__entry->name, prot->name, 32);
-		__entry->sysctl_mem = prot->sysctl_mem;
+		__entry->prot_mem = prot_mem;
 		__entry->allocated = allocated;
 		__entry->sysctl_rmem = prot->sysctl_rmem[0];
 		__entry->rmem_alloc = atomic_read(&sk->sk_rmem_alloc);
@@ -54,9 +55,9 @@ TRACE_EVENT(sock_exceed_buf_limit,
 	TP_printk("proto:%s sysctl_mem=%ld,%ld,%ld allocated=%ld "
 		"sysctl_rmem=%d rmem_alloc=%d",
 		__entry->name,
-		__entry->sysctl_mem[0],
-		__entry->sysctl_mem[1],
-		__entry->sysctl_mem[2],
+		__entry->prot_mem[0],
+		__entry->prot_mem[1],
+		__entry->prot_mem[2],
 		__entry->allocated,
 		__entry->sysctl_rmem,
 		__entry->rmem_alloc)
diff --git a/init/Kconfig b/init/Kconfig
index d627783..ed3019c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -690,6 +690,17 @@ config CGROUP_MEM_RES_CTLR_SWAP_ENABLED
 	  select this option (if, for some reason, they need to disable it
 	  then swapaccount=0 does the trick).
 
+config CGROUP_KMEM
+	bool "Kernel Memory Resource Controller for Control Groups"
+	depends on CGROUPS 
+	help
+	  The Kernel Memory cgroup can limit the amount of memory used by
+	  certain kernel objects in the system. Those are fundamentally
+	  different from the entities handled by the Memory Controller,
+	  which are page-based, and can be swapped. Users of the kmem
+	  cgroup can use it to guarantee that no group of processes will
+	  ever exhaust kernel resources alone.
+
 config CGROUP_PERF
 	bool "Enable perf_event per-cpu per-container group (cgroup) monitoring"
 	depends on PERF_EVENTS && CGROUPS
diff --git a/mm/Makefile b/mm/Makefile
index 836e416..1b1aa24 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -45,6 +45,7 @@ obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_QUICKLIST) += quicklist.o
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o
 obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o
+obj-$(CONFIG_CGROUP_KMEM) += kmem_cgroup.o
 obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
 obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
 obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
diff --git a/mm/kmem_cgroup.c b/mm/kmem_cgroup.c
new file mode 100644
index 0000000..d2a86dd
--- /dev/null
+++ b/mm/kmem_cgroup.c
@@ -0,0 +1,53 @@
+/* kmem_cgroup.c - Kernel Memory Controller
+ *
+ * Copyright Parallels Inc, 2011
+ * Author: Glauber Costa <glommer@parallels.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/cgroup.h>
+#include <linux/slab.h>
+#include <net/sock.h>
+
+static int kmem_populate(struct cgroup_subsys *ss, struct cgroup *cgrp)
+{
+	return sockets_populate(ss, cgrp);
+}
+
+static void
+kmem_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
+{
+	struct kmem_cgroup *sk = cgroup_sk(cgrp);
+	kfree(sk);
+}
+
+static struct cgroup_subsys_state *kmem_create(
+	struct cgroup_subsys *ss, struct cgroup *cgrp)
+{
+	struct kmem_cgroup *sk = kzalloc(sizeof(*sk), GFP_KERNEL);
+
+	if (!sk)
+		return ERR_PTR(-ENOMEM);
+
+	if (cgrp->parent)
+		sk->parent = cgroup_sk(cgrp->parent);
+
+	return &sk->css;
+}
+
+struct cgroup_subsys kmem_subsys = {
+	.name = "kmem",
+	.create = kmem_create,
+	.destroy = kmem_destroy,
+	.populate = kmem_populate,
+	.subsys_id = kmem_subsys_id,
+};
diff --git a/net/core/sock.c b/net/core/sock.c
index bc745d0..2b748d5 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -134,6 +134,25 @@
 #include <net/tcp.h>
 #endif
 
+static DEFINE_RWLOCK(proto_list_lock);
+static LIST_HEAD(proto_list);
+
+int sockets_populate(struct cgroup_subsys *ss, struct cgroup *cgrp)
+{
+	struct proto *proto;
+	int ret = 0;
+
+	read_lock(&proto_list_lock);
+	list_for_each_entry(proto, &proto_list, node) {
+		if (proto->init_cgroup) {
+			ret |= proto->init_cgroup(cgrp, ss);
+		}
+	}
+	read_unlock(&proto_list_lock);
+	
+	return ret;
+}
+
 /*
  * Each address family might have different locking rules, so we have
  * one slock key per address family:
@@ -1114,6 +1133,16 @@ void sock_update_classid(struct sock *sk)
 		sk->sk_classid = classid;
 }
 EXPORT_SYMBOL(sock_update_classid);
+
+void sock_update_cgrp(struct sock *sk)
+{
+#ifdef CONFIG_CGROUP_KMEM
+	rcu_read_lock(); 
+	sk->sk_cgrp = task_sk(current);
+	rcu_read_unlock();
+#endif
+}
+
 #endif
 
 /**
@@ -1141,6 +1170,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		atomic_set(&sk->sk_wmem_alloc, 1);
 
 		sock_update_classid(sk);
+		sock_update_cgrp(sk);
 	}
 
 	return sk;
@@ -1289,8 +1319,8 @@ struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
 		sk_set_socket(newsk, NULL);
 		newsk->sk_wq = NULL;
 
-		if (newsk->sk_prot->sockets_allocated)
-			percpu_counter_inc(newsk->sk_prot->sockets_allocated);
+		if (newsk->sk_prot->sockets_allocated)
+			percpu_counter_inc(sk_sockets_allocated(newsk));
 
 		if (sock_flag(newsk, SOCK_TIMESTAMP) ||
 		    sock_flag(newsk, SOCK_TIMESTAMPING_RX_SOFTWARE))
@@ -1678,29 +1708,50 @@ EXPORT_SYMBOL(sk_wait_data);
  */
 int __sk_mem_schedule(struct sock *sk, int size, int kind)
 {
-	struct proto *prot = sk->sk_prot;
 	int amt = sk_mem_pages(size);
+	struct proto *prot = sk->sk_prot;
 	long allocated;
+	int *memory_pressure;
+	long *prot_mem;
+	int parent_failure = 0;
+	struct kmem_cgroup *sg = sk->sk_cgrp;
 
 	sk->sk_forward_alloc += amt * SK_MEM_QUANTUM;
-	allocated = atomic_long_add_return(amt, prot->memory_allocated);
+
+	memory_pressure = sk_memory_pressure(sk);
+	prot_mem = sk_prot_mem(sk);
+
+	allocated = atomic_long_add_return(amt, sk_memory_allocated(sk));
+
+#ifdef CONFIG_CGROUP_KMEM
+	for (sg = sk->sk_cgrp->parent; sg != NULL; sg = sg->parent) {
+		long alloc;
+		/*
+		 * Deep nesting is not the common case, and stopping in the
+		 * middle would be complicated, so we bill all the way up to
+		 * the root and, if needed, unbill everything later.
+		 */
+		alloc = atomic_long_add_return(amt, sg_memory_allocated(prot, sg));
+		parent_failure |= (alloc > sk_prot_mem(sk)[2]);
+	} 
+#endif
+
+	/* Over hard limit (we, or our parents) */
+	if (parent_failure || (allocated > prot_mem[2]))
+		goto suppress_allocation;
 
 	/* Under limit. */
-	if (allocated <= prot->sysctl_mem[0]) {
-		if (prot->memory_pressure && *prot->memory_pressure)
-			*prot->memory_pressure = 0;
+	if (allocated <= prot_mem[0]) {
+		if (memory_pressure && *memory_pressure)
+			*memory_pressure = 0;
 		return 1;
 	}
 
 	/* Under pressure. */
-	if (allocated > prot->sysctl_mem[1])
+	if (allocated > prot_mem[1])
 		if (prot->enter_memory_pressure)
 			prot->enter_memory_pressure(sk);
 
-	/* Over hard limit. */
-	if (allocated > prot->sysctl_mem[2])
-		goto suppress_allocation;
-
 	/* guarantee minimum buffer size under pressure */
 	if (kind == SK_MEM_RECV) {
 		if (atomic_read(&sk->sk_rmem_alloc) < prot->sysctl_rmem[0])
@@ -1714,13 +1765,13 @@ int __sk_mem_schedule(struct sock *sk, int size, int kind)
 				return 1;
 	}
 
-	if (prot->memory_pressure) {
+	if (memory_pressure) {
 		int alloc;
 
-		if (!*prot->memory_pressure)
+		if (!*memory_pressure)
 			return 1;
-		alloc = percpu_counter_read_positive(prot->sockets_allocated);
-		if (prot->sysctl_mem[2] > alloc *
+		alloc = percpu_counter_read_positive(sk_sockets_allocated(sk));
+		if (prot_mem[2] > alloc *
 		    sk_mem_pages(sk->sk_wmem_queued +
 				 atomic_read(&sk->sk_rmem_alloc) +
 				 sk->sk_forward_alloc))
@@ -1739,11 +1790,19 @@ suppress_allocation:
 			return 1;
 	}
 
-	trace_sock_exceed_buf_limit(sk, prot, allocated);
+	trace_sock_exceed_buf_limit(sk, prot, allocated, prot_mem);
 
 	/* Alas. Undo changes. */
 	sk->sk_forward_alloc -= amt * SK_MEM_QUANTUM;
-	atomic_long_sub(amt, prot->memory_allocated);
+
+	atomic_long_sub(amt, sk_memory_allocated(sk));
+
+#ifdef CONFIG_CGROUP_KMEM
+	for (sg = sk->sk_cgrp->parent; sg != NULL; sg = sg->parent) {
+		atomic_long_sub(amt, sg_memory_allocated(prot, sg));
+	}
+#endif
+
 	return 0;
 }
 EXPORT_SYMBOL(__sk_mem_schedule);
@@ -1755,14 +1814,25 @@ EXPORT_SYMBOL(__sk_mem_schedule);
 void __sk_mem_reclaim(struct sock *sk)
 {
 	struct proto *prot = sk->sk_prot;
+	struct kmem_cgroup *sg = sk->sk_cgrp;
+	int *memory_pressure = sk_memory_pressure(sk);
+	
 
 	atomic_long_sub(sk->sk_forward_alloc >> SK_MEM_QUANTUM_SHIFT,
-		   prot->memory_allocated);
+		   sk_memory_allocated(sk));
+
+#ifdef CONFIG_CGROUP_KMEM
+	for (sg = sk->sk_cgrp->parent; sg != NULL; sg = sg->parent) {
+		atomic_long_sub(sk->sk_forward_alloc >> SK_MEM_QUANTUM_SHIFT,
+						sg_memory_allocated(prot, sg));
+	}
+#endif
+
 	sk->sk_forward_alloc &= SK_MEM_QUANTUM - 1;
 
-	if (prot->memory_pressure && *prot->memory_pressure &&
-	    (atomic_long_read(prot->memory_allocated) < prot->sysctl_mem[0]))
-		*prot->memory_pressure = 0;
+	if (memory_pressure && *memory_pressure &&
+	    (atomic_long_read(sk_memory_allocated(sk)) < sk_prot_mem(sk)[0]))
+		*memory_pressure = 0;
 }
 EXPORT_SYMBOL(__sk_mem_reclaim);
 
@@ -2254,9 +2324,6 @@ void sk_common_release(struct sock *sk)
 }
 EXPORT_SYMBOL(sk_common_release);
 
-static DEFINE_RWLOCK(proto_list_lock);
-static LIST_HEAD(proto_list);
-
 #ifdef CONFIG_PROC_FS
 #define PROTO_INUSE_NR	64	/* should be enough for the first time */
 struct prot_inuse {
@@ -2481,13 +2548,15 @@ static char proto_method_implemented(const void *method)
 
 static void proto_seq_printf(struct seq_file *seq, struct proto *proto)
 {
+	struct kmem_cgroup *sg = task_sk(current);
+
 	seq_printf(seq, "%-9s %4u %6d  %6ld   %-3s %6u   %-3s  %-10s "
 			"%2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c\n",
 		   proto->name,
 		   proto->obj_size,
 		   sock_prot_inuse_get(seq_file_net(seq), proto),
-		   proto->memory_allocated != NULL ? atomic_long_read(proto->memory_allocated) : -1L,
-		   proto->memory_pressure != NULL ? *proto->memory_pressure ? "yes" : "no" : "NI",
+		   proto->memory_allocated != NULL ? atomic_long_read(sg_memory_allocated(proto, sg)) : -1L,
+		   proto->memory_pressure != NULL ? *sg_memory_pressure(proto, sg) ? "yes" : "no" : "NI",
 		   proto->max_header,
 		   proto->slab == NULL ? "no" : "yes",
 		   module_name(proto->owner),
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index b14ec7d..e8e8889 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -53,19 +53,21 @@ static int sockstat_seq_show(struct seq_file *seq, void *v)
 	struct net *net = seq->private;
 	int orphans, sockets;
 
+	struct kmem_cgroup *sg = task_sk(current);
+
 	local_bh_disable();
 	orphans = percpu_counter_sum_positive(&tcp_orphan_count);
-	sockets = percpu_counter_sum_positive(&tcp_sockets_allocated);
+	sockets = percpu_counter_sum_positive(sg_sockets_allocated((&tcp_prot), sg));
 	local_bh_enable();
 
 	socket_seq_show(seq);
 	seq_printf(seq, "TCP: inuse %d orphan %d tw %d alloc %d mem %ld\n",
 		   sock_prot_inuse_get(net, &tcp_prot), orphans,
 		   tcp_death_row.tw_count, sockets,
-		   atomic_long_read(&tcp_memory_allocated));
+		   atomic_long_read(sg_memory_allocated((&tcp_prot), sg)));
 	seq_printf(seq, "UDP: inuse %d mem %ld\n",
 		   sock_prot_inuse_get(net, &udp_prot),
-		   atomic_long_read(&udp_memory_allocated));
+		   atomic_long_read(sg_memory_allocated((&udp_prot), sg)));
 	seq_printf(seq, "UDPLITE: inuse %d\n",
 		   sock_prot_inuse_get(net, &udplite_prot));
 	seq_printf(seq, "RAW: inuse %d\n",
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 69fd720..9ce7e75 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -14,6 +14,8 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/nsproxy.h>
+#include <linux/kmem_cgroup.h>
+#include <linux/swap.h>
 #include <net/snmp.h>
 #include <net/icmp.h>
 #include <net/ip.h>
@@ -174,6 +176,43 @@ static int proc_allowed_congestion_control(ctl_table *ctl,
 	return ret;
 }
 
+static int ipv4_tcp_mem(ctl_table *ctl, int write,
+			   void __user *buffer, size_t *lenp,
+			   loff_t *ppos)
+{
+	int ret;
+	unsigned long vec[3];
+	struct kmem_cgroup *kmem = task_sk(current);
+	struct net *net = current->nsproxy->net_ns;
+	int i;
+
+	ctl_table tmp = {
+		.data = &vec,
+		.maxlen = sizeof(vec),
+		.mode = ctl->mode,
+	};
+
+	if (!write) {
+		ctl->data = &net->ipv4.sysctl_tcp_mem;
+		return proc_doulongvec_minmax(ctl, write, buffer, lenp, ppos);
+	}
+
+	ret = proc_doulongvec_minmax(&tmp, write, buffer, lenp, ppos);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < 3; i++)
+		if (vec[i] > kmem->tcp_max_memory)
+			return -EINVAL;
+
+	for (i = 0; i < 3; i++) {
+		net->ipv4.sysctl_tcp_mem[i] = vec[i];
+		kmem->tcp_prot_mem[i] = net->ipv4.sysctl_tcp_mem[i];
+	}
+
+	return 0;
+}
+
 static struct ctl_table ipv4_table[] = {
 	{
 		.procname	= "tcp_timestamps",
@@ -433,13 +472,6 @@ static struct ctl_table ipv4_table[] = {
 		.proc_handler	= proc_dointvec
 	},
 	{
-		.procname	= "tcp_mem",
-		.data		= &sysctl_tcp_mem,
-		.maxlen		= sizeof(sysctl_tcp_mem),
-		.mode		= 0644,
-		.proc_handler	= proc_doulongvec_minmax
-	},
-	{
 		.procname	= "tcp_wmem",
 		.data		= &sysctl_tcp_wmem,
 		.maxlen		= sizeof(sysctl_tcp_wmem),
@@ -721,6 +753,12 @@ static struct ctl_table ipv4_net_table[] = {
 		.mode		= 0644,
 		.proc_handler	= ipv4_ping_group_range,
 	},
+	{
+		.procname	= "tcp_mem",
+		.maxlen		= sizeof(init_net.ipv4.sysctl_tcp_mem),
+		.mode		= 0644,
+		.proc_handler	= ipv4_tcp_mem,
+	},
 	{ }
 };
 
@@ -734,6 +772,7 @@ EXPORT_SYMBOL_GPL(net_ipv4_ctl_path);
 static __net_init int ipv4_sysctl_init_net(struct net *net)
 {
 	struct ctl_table *table;
+	unsigned long limit;
 
 	table = ipv4_net_table;
 	if (!net_eq(net, &init_net)) {
@@ -769,6 +808,12 @@ static __net_init int ipv4_sysctl_init_net(struct net *net)
 
 	net->ipv4.sysctl_rt_cache_rebuild_count = 4;
 
+	limit = nr_free_buffer_pages() / 8;
+	limit = max(limit, 128UL);
+	net->ipv4.sysctl_tcp_mem[0] = limit / 4 * 3;
+	net->ipv4.sysctl_tcp_mem[1] = limit;
+	net->ipv4.sysctl_tcp_mem[2] = net->ipv4.sysctl_tcp_mem[0] * 2;
+
 	net->ipv4.ipv4_hdr = register_net_sysctl_table(net,
 			net_ipv4_ctl_path, table);
 	if (net->ipv4.ipv4_hdr == NULL)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 46febca..beec487 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -266,6 +266,7 @@
 #include <linux/crypto.h>
 #include <linux/time.h>
 #include <linux/slab.h>
+#include <linux/nsproxy.h>
 
 #include <net/icmp.h>
 #include <net/tcp.h>
@@ -282,23 +283,12 @@ int sysctl_tcp_fin_timeout __read_mostly = TCP_FIN_TIMEOUT;
 struct percpu_counter tcp_orphan_count;
 EXPORT_SYMBOL_GPL(tcp_orphan_count);
 
-long sysctl_tcp_mem[3] __read_mostly;
 int sysctl_tcp_wmem[3] __read_mostly;
 int sysctl_tcp_rmem[3] __read_mostly;
 
-EXPORT_SYMBOL(sysctl_tcp_mem);
 EXPORT_SYMBOL(sysctl_tcp_rmem);
 EXPORT_SYMBOL(sysctl_tcp_wmem);
 
-atomic_long_t tcp_memory_allocated;	/* Current allocated memory. */
-EXPORT_SYMBOL(tcp_memory_allocated);
-
-/*
- * Current number of TCP sockets.
- */
-struct percpu_counter tcp_sockets_allocated;
-EXPORT_SYMBOL(tcp_sockets_allocated);
-
 /*
  * TCP splice context
  */
@@ -308,17 +298,141 @@ struct tcp_splice_state {
 	unsigned int flags;
 };
 
+#ifdef CONFIG_CGROUP_KMEM
 /*
  * Pressure flag: try to collapse.
  * Technical note: it is used by multiple contexts non atomically.
  * All the __sk_mem_schedule() is of this nature: accounting
  * is strict, actions are advisory and have some latency.
  */
-int tcp_memory_pressure __read_mostly;
-EXPORT_SYMBOL(tcp_memory_pressure);
-
 void tcp_enter_memory_pressure(struct sock *sk)
 {
+	struct kmem_cgroup *sg = sk->sk_cgrp;
+	if (!sg->tcp_memory_pressure) {
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPMEMORYPRESSURES);
+		sg->tcp_memory_pressure = 1;
+	}
+}
+EXPORT_SYMBOL(tcp_enter_memory_pressure);
+
+long *tcp_sysctl_mem(struct kmem_cgroup *sg)
+{
+	return sg->tcp_prot_mem;
+}
+EXPORT_SYMBOL(tcp_sysctl_mem);
+
+atomic_long_t *memory_allocated_tcp(struct kmem_cgroup *sg)
+{
+	return &(sg->tcp_memory_allocated);
+}
+EXPORT_SYMBOL(memory_allocated_tcp);
+
+static int tcp_write_maxmem(struct cgroup *cgrp, struct cftype *cft, u64 val)
+{
+	struct kmem_cgroup *sg = cgroup_sk(cgrp);
+
+	if (!cgroup_lock_live_group(cgrp))
+		return -ENODEV;
+
+	/*
+	 * We can't allow more memory than our parents. Since this
+	 * will be tested for all calls, by induction, there is no need
+	 * to test any parent other than our own.
+	 */
+	if (sg->parent && (val > sg->parent->tcp_max_memory))
+		val = sg->parent->tcp_max_memory;
+
+	sg->tcp_max_memory = val;
+
+	sg->tcp_prot_mem[0] = val / 2;
+	sg->tcp_prot_mem[1] = (val * 2) / 3;
+	sg->tcp_prot_mem[2] = val;
+
+	cgroup_unlock();
+
+	return 0;
+}
+
+static u64 tcp_read_maxmem(struct cgroup *cgrp, struct cftype *cft)
+{
+	struct kmem_cgroup *sg = cgroup_sk(cgrp);
+	u64 ret;
+
+	if (!cgroup_lock_live_group(cgrp))
+		return -ENODEV;
+	ret = sg->tcp_max_memory;
+
+	cgroup_unlock();
+	return ret;
+}
+
+static struct cftype tcp_files[] = {
+	{
+		.name = "tcp_maxmem",
+		.write_u64 = tcp_write_maxmem,
+		.read_u64 = tcp_read_maxmem,
+	},
+};
+
+int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss)
+{
+	struct kmem_cgroup *sg = cgroup_sk(cgrp);
+	unsigned long limit;
+	struct net *net = current->nsproxy->net_ns;
+
+	sg->tcp_memory_pressure = 0;
+
+	percpu_counter_init(&sg->tcp_sockets_allocated, 0);
+	atomic_long_set(&sg->tcp_memory_allocated, 0);
+
+	limit = nr_free_buffer_pages() / 8;
+	limit = max(limit, 128UL);
+
+	if (sg->parent)
+		sg->tcp_max_memory = sg->parent->tcp_max_memory;
+	else
+		sg->tcp_max_memory = limit * 2;
+
+	sg->tcp_prot_mem[0] = net->ipv4.sysctl_tcp_mem[0];
+	sg->tcp_prot_mem[1] = net->ipv4.sysctl_tcp_mem[1];
+	sg->tcp_prot_mem[2] = net->ipv4.sysctl_tcp_mem[2];
+
+	return cgroup_add_files(cgrp, ss, tcp_files, ARRAY_SIZE(tcp_files));
+}
+EXPORT_SYMBOL(tcp_init_cgroup);
+
+int *memory_pressure_tcp(struct kmem_cgroup *sg)
+{
+	return &sg->tcp_memory_pressure;
+}
+EXPORT_SYMBOL(memory_pressure_tcp);
+
+struct percpu_counter *sockets_allocated_tcp(struct kmem_cgroup *sg)
+{
+	return &sg->tcp_sockets_allocated;
+}
+EXPORT_SYMBOL(sockets_allocated_tcp);
+#else
+
+/* Current number of TCP sockets. */
+struct percpu_counter tcp_sockets_allocated;
+atomic_long_t tcp_memory_allocated;	/* Current allocated memory. */
+int tcp_memory_pressure;
+
+int *memory_pressure_tcp(struct kmem_cgroup *sg)
+{
+	return &tcp_memory_pressure;
+}
+EXPORT_SYMBOL(memory_pressure_tcp);
+
+struct percpu_counter *sockets_allocated_tcp(struct kmem_cgroup *sg)
+{
+	return &tcp_sockets_allocated;
+}
+EXPORT_SYMBOL(sockets_allocated_tcp);
+
+void tcp_enter_memory_pressure(struct sock *sk)
+{
 	if (!tcp_memory_pressure) {
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPMEMORYPRESSURES);
 		tcp_memory_pressure = 1;
@@ -326,6 +440,19 @@ void tcp_enter_memory_pressure(struct sock *sk)
 }
 EXPORT_SYMBOL(tcp_enter_memory_pressure);
 
+long *tcp_sysctl_mem(struct kmem_cgroup *sg)
+{
+	return init_net.ipv4.sysctl_tcp_mem;
+}
+EXPORT_SYMBOL(tcp_sysctl_mem);
+
+atomic_long_t *memory_allocated_tcp(struct kmem_cgroup *sg)
+{
+	return &tcp_memory_allocated;
+}
+EXPORT_SYMBOL(memory_allocated_tcp);
+#endif /* CONFIG_CGROUP_KMEM */
+
 /* Convert seconds to retransmits based on initial and max timeout */
 static u8 secs_to_retrans(int seconds, int timeout, int rto_max)
 {
@@ -3226,7 +3353,9 @@ void __init tcp_init(void)
 
 	BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > sizeof(skb->cb));
 
+#ifndef CONFIG_CGROUP_KMEM
 	percpu_counter_init(&tcp_sockets_allocated, 0);
+#endif
 	percpu_counter_init(&tcp_orphan_count, 0);
 	tcp_hashinfo.bind_bucket_cachep =
 		kmem_cache_create("tcp_bind_bucket",
@@ -3277,14 +3406,8 @@ void __init tcp_init(void)
 	sysctl_tcp_max_orphans = cnt / 2;
 	sysctl_max_syn_backlog = max(128, cnt / 256);
 
-	limit = nr_free_buffer_pages() / 8;
-	limit = max(limit, 128UL);
-	sysctl_tcp_mem[0] = limit / 4 * 3;
-	sysctl_tcp_mem[1] = limit;
-	sysctl_tcp_mem[2] = sysctl_tcp_mem[0] * 2;
-
 	/* Set per-socket limits to no more than 1/128 the pressure threshold */
-	limit = ((unsigned long)sysctl_tcp_mem[1]) << (PAGE_SHIFT - 7);
+	limit = ((unsigned long)init_net.ipv4.sysctl_tcp_mem[1]) << (PAGE_SHIFT - 7);
 	max_share = min(4UL*1024*1024, limit);
 
 	sysctl_tcp_wmem[0] = SK_MEM_QUANTUM;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ea0d218..c44e830 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -316,7 +316,7 @@ static void tcp_grow_window(struct sock *sk, struct sk_buff *skb)
 	/* Check #1 */
 	if (tp->rcv_ssthresh < tp->window_clamp &&
 	    (int)tp->rcv_ssthresh < tcp_space(sk) &&
-	    !tcp_memory_pressure) {
+	    !sk_memory_pressure(sk)) {
 		int incr;
 
 		/* Check #2. Increase window, if skb with such overhead
@@ -393,15 +393,16 @@ static void tcp_clamp_window(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
+	struct proto *prot = sk->sk_prot;
 
 	icsk->icsk_ack.quick = 0;
 
-	if (sk->sk_rcvbuf < sysctl_tcp_rmem[2] &&
+	if (sk->sk_rcvbuf < prot->sysctl_rmem[2] &&
 	    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK) &&
-	    !tcp_memory_pressure &&
-	    atomic_long_read(&tcp_memory_allocated) < sysctl_tcp_mem[0]) {
+	    !sk_memory_pressure(sk) &&
+	    atomic_long_read(sk_memory_allocated(sk)) < sk_prot_mem(sk)[0]) {
 		sk->sk_rcvbuf = min(atomic_read(&sk->sk_rmem_alloc),
-				    sysctl_tcp_rmem[2]);
+				    prot->sysctl_rmem[2]);
 	}
 	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
 		tp->rcv_ssthresh = min(tp->window_clamp, 2U * tp->advmss);
@@ -4806,7 +4807,7 @@ static int tcp_prune_queue(struct sock *sk)
 
 	if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf)
 		tcp_clamp_window(sk);
-	else if (tcp_memory_pressure)
+	else if (sk_memory_pressure(sk))
 		tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U * tp->advmss);
 
 	tcp_collapse_ofo_queue(sk);
@@ -4872,11 +4873,11 @@ static int tcp_should_expand_sndbuf(struct sock *sk)
 		return 0;
 
 	/* If we are under global TCP memory pressure, do not expand.  */
-	if (tcp_memory_pressure)
+	if (sk_memory_pressure(sk))
 		return 0;
 
 	/* If we are under soft global TCP memory pressure, do not expand.  */
-	if (atomic_long_read(&tcp_memory_allocated) >= sysctl_tcp_mem[0])
+	if (atomic_long_read(sk_memory_allocated(sk)) >= sk_prot_mem(sk)[0])
 		return 0;
 
 	/* If we filled the congestion window, do not expand.  */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 1c12b8e..88034a3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1901,7 +1901,7 @@ static int tcp_v4_init_sock(struct sock *sk)
 	sk->sk_rcvbuf = sysctl_tcp_rmem[1];
 
 	local_bh_disable();
-	percpu_counter_inc(&tcp_sockets_allocated);
+	percpu_counter_inc(sk_sockets_allocated(sk));
 	local_bh_enable();
 
 	return 0;
@@ -1957,7 +1957,7 @@ void tcp_v4_destroy_sock(struct sock *sk)
 		tp->cookie_values = NULL;
 	}
 
-	percpu_counter_dec(&tcp_sockets_allocated);
+	percpu_counter_dec(sk_sockets_allocated(sk));
 }
 EXPORT_SYMBOL(tcp_v4_destroy_sock);
 
@@ -2598,11 +2598,14 @@ struct proto tcp_prot = {
 	.unhash			= inet_unhash,
 	.get_port		= inet_csk_get_port,
 	.enter_memory_pressure	= tcp_enter_memory_pressure,
-	.sockets_allocated	= &tcp_sockets_allocated,
+	.memory_pressure	= memory_pressure_tcp,
+	.sockets_allocated	= sockets_allocated_tcp,
 	.orphan_count		= &tcp_orphan_count,
-	.memory_allocated	= &tcp_memory_allocated,
-	.memory_pressure	= &tcp_memory_pressure,
-	.sysctl_mem		= sysctl_tcp_mem,
+	.memory_allocated	= memory_allocated_tcp,
+#ifdef CONFIG_CGROUP_KMEM
+	.init_cgroup		= tcp_init_cgroup,
+#endif
+	.prot_mem		= tcp_sysctl_mem,
 	.sysctl_wmem		= sysctl_tcp_wmem,
 	.sysctl_rmem		= sysctl_tcp_rmem,
 	.max_header		= MAX_TCP_HEADER,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 882e0b0..06aeb31 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1912,7 +1912,7 @@ u32 __tcp_select_window(struct sock *sk)
 	if (free_space < (full_space >> 1)) {
 		icsk->icsk_ack.quick = 0;
 
-		if (tcp_memory_pressure)
+		if (sk_memory_pressure(sk))
 			tp->rcv_ssthresh = min(tp->rcv_ssthresh,
 					       4U * tp->advmss);
 
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index ecd44b0..2c67617 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -261,7 +261,7 @@ static void tcp_delack_timer(unsigned long data)
 	}
 
 out:
-	if (tcp_memory_pressure)
+	if (sk_memory_pressure(sk))
 		sk_mem_reclaim(sk);
 out_unlock:
 	bh_unlock_sock(sk);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 1b5a193..258f137 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -120,9 +120,6 @@ EXPORT_SYMBOL(sysctl_udp_rmem_min);
 int sysctl_udp_wmem_min __read_mostly;
 EXPORT_SYMBOL(sysctl_udp_wmem_min);
 
-atomic_long_t udp_memory_allocated;
-EXPORT_SYMBOL(udp_memory_allocated);
-
 #define MAX_UDP_PORTS 65536
 #define PORTS_PER_CHAIN (MAX_UDP_PORTS / UDP_HTABLE_SIZE_MIN)
 
@@ -1918,6 +1915,24 @@ unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait)
 }
 EXPORT_SYMBOL(udp_poll);
 
+#ifdef CONFIG_CGROUP_KMEM
+static atomic_long_t *memory_allocated_udp(struct kmem_cgroup *sg)
+{
+	return &sg->udp_memory_allocated;
+}
+#else
+atomic_long_t udp_memory_allocated;
+static atomic_long_t *memory_allocated_udp(struct kmem_cgroup *sg)
+{
+	return &udp_memory_allocated;
+}
+#endif
+
+static long *udp_sysctl_mem(struct kmem_cgroup *sg)
+{
+	return sysctl_udp_mem;
+}
+
 struct proto udp_prot = {
 	.name		   = "UDP",
 	.owner		   = THIS_MODULE,
@@ -1936,8 +1951,8 @@ struct proto udp_prot = {
 	.unhash		   = udp_lib_unhash,
 	.rehash		   = udp_v4_rehash,
 	.get_port	   = udp_v4_get_port,
-	.memory_allocated  = &udp_memory_allocated,
-	.sysctl_mem	   = sysctl_udp_mem,
+	.memory_allocated  = memory_allocated_udp,
+	.prot_mem	   = udp_sysctl_mem,
 	.sysctl_wmem	   = &sysctl_udp_wmem_min,
 	.sysctl_rmem	   = &sysctl_udp_rmem_min,
 	.obj_size	   = sizeof(struct udp_sock),
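
For readers following the accounting logic in __sk_mem_schedule(), here is a
rough user-space sketch of the "bill every level up to the root, then unbill
on failure" idea. It is not kernel code: struct group, group_set_max() and
group_charge() are hypothetical names invented for illustration, and plain
longs stand in for atomic_long_t. It also assumes each level is checked
against its own hard limit, whereas the patch as posted compares every
ancestor against the socket's own prot_mem[2]. The threshold split mirrors
the val / 2, (val * 2) / 3, val values set in tcp_write_maxmem().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kmem_cgroup (illustration only). */
struct group {
	struct group *parent;	/* NULL for the root group */
	long allocated;		/* pages currently charged to this group */
	long mem[3];		/* low, pressure and hard thresholds */
};

/* Derive the three thresholds from one maximum, as tcp_write_maxmem() does. */
void group_set_max(struct group *g, long max)
{
	g->mem[0] = max / 2;
	g->mem[1] = (max * 2) / 3;
	g->mem[2] = max;
}

/*
 * Charge amt pages to g and to every ancestor. If any level ends up over
 * its hard limit, undo the charge on all levels and report failure.
 */
bool group_charge(struct group *g, long amt)
{
	struct group *p;
	bool over = false;

	assert(amt >= 0);

	/* Bill all the way up to the root without stopping early. */
	for (p = g; p != NULL; p = p->parent) {
		p->allocated += amt;
		over |= (p->allocated > p->mem[2]);
	}
	if (!over)
		return true;

	/* Some level exceeded its hard limit: unbill everything. */
	for (p = g; p != NULL; p = p->parent)
		p->allocated -= amt;
	return false;
}
```

Charging unconditionally and rolling back afterwards keeps both loops
trivial, at the cost of briefly overcounting in ancestors when a charge is
rejected. That matches the "accounting is strict, actions are advisory"
note in the struct proto comment.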

^ permalink raw reply related

* Re: BQL crap and wireless
From: Adrian Chadd @ 2011-09-01  2:44 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Jim Gettys, Andrew McGregor, Tom Herbert, Dave Taht,
	linux-wireless, Matt Smith, Kevin Hayes, Derek Smithies, netdev
In-Reply-To: <CAB=NE6Wj4BxSjZmTOg9EFMcJ+H0RBZE4q8+6DB3x8qP=r42yNQ@mail.gmail.com>

I think there are enough interesting ideas here to keep people busy
experimenting for a while.

I'm going to refrain from chiming in further until I've got the
FreeBSD 11n TX code at the point where I can begin tinkering with
adaptive queue management in the driver and rate control layer.

Thanks for the pointers and interesting discussion,



Adrian

^ permalink raw reply

* Re: [PATCH 18/24] sctp: Remove unnecessary OOM logging messages
From: Joe Perches @ 2011-09-01  0:25 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, vladislav.yasevich, sri, linux-sctp, netdev,
	linux-kernel, Andrew Morton
In-Reply-To: <20110829.181520.933815499847625814.davem@davemloft.net>

On Mon, 2011-08-29 at 18:15 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 29 Aug 2011 23:51:21 +0200
> > Le lundi 29 août 2011 à 23:43 +0200, Eric Dumazet a écrit :
> >> Furthermore, a failed vmalloc() is not guaranteed to emit an OOM
> >> message, is it ?
> > It currently displays a message without context :
> > vmap allocation for size XXXXXX failed: use vmalloc=<size> to increase
> > size.
> > So we dont know which part of the kernel asked this allocation.
> > Please dont remove existing error messages after failed vmalloc() calls.
> Indeed.
> Joe, these vmalloc() and also the __GFP_NOWARN cases will need to be
> attended to and this series resubmitted as such.

No worries.

Andrew Morton picked up a patch I posted that
changes vmalloc to be similar to kmalloc when
the pointer returned is NULL (OOM).  It now
uses dump_stack for those cases.

https://patchwork.kernel.org/patch/1114682/

I'll keep all the current vmalloc failure messages
for now and resubmit in a day or two this series
with acks.  Not batman or netfilter though as they
were picked up by their maintainers.

A month or two after the vmalloc patch hits
mainline and/or wider testing, and it's deemed
acceptable, removing vmalloc site specific OOM
messages should be appropriate.

Anyone object?

I plan on submitting drivers/net OOM removals
next week.

^ permalink raw reply

* Re: [PATCH] bridge: mask forwarding of IEEE 802 local multicast groups
From: David Lamparter @ 2011-09-01  0:16 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Nick Carter, David Lamparter, eswierk, netdev,
	Michał Mirosław, davem
In-Reply-To: <20110831134904.1a050924@nehalam.ftrdhcpuser.net>

On Wed, Aug 31, 2011 at 01:49:04PM -0700, Stephen Hemminger wrote:
> On Wed, 31 Aug 2011 21:41:26 +0100
> Nick Carter <ncarter100@gmail.com> wrote:
> 
> > On 15 August 2011 19:25, Stephen Hemminger
> > <shemminger@linux-foundation.org> wrote:
> > > On Mon, 15 Aug 2011 17:27:12 +0100
> > > Nick Carter <ncarter100@gmail.com> wrote:
> > >
> > >> On 28 July 2011 16:41, Stephen Hemminger
> > >> <shemminger@linux-foundation.org> wrote:
> > >> > On Wed, 27 Jul 2011 13:17:15 +0200
> > >> > David Lamparter <equinox@diac24.net> wrote:
> > >> >
> > >> >> On Fri, Jul 15, 2011 at 06:33:45PM +0200, David Lamparter wrote:
> > >> >> > On Fri, Jul 15, 2011 at 06:03:57PM +0200, David Lamparter wrote:
> > >> >> > > On Fri, Jul 15, 2011 at 04:44:50PM +0100, Nick Carter wrote:
> > >> >> > > > On 12 July 2011 12:36, David Lamparter <equinox@diac24.net> wrote:
> > >> >> > > > > On Mon, Jul 11, 2011 at 08:27:55AM -0700, Stephen Hemminger wrote:
> > >> >> > > > >> I am still undecided on this. Understand the need, but don't like idea
> > >> >> > > > >> of bridge behaving in non-conforming manner. Will see if IEEE 802 committee
> > >> >> > > > >> has any input.
> > >> >> > > > >
> > >> >> > > > > The patch doesn't make the bridge behave nonconformant. The default mask
> > >> >> > > > > is 0, which just keeps the old behaviour.
> > >> >> >
> > >> >> > P.S.: I'd like to once more stress this. In my opinion the patch should
> > >> >> > be merged because it provides desirable functionality at a small cost
> > >> >> > (one test, one knob) and __does not change any default behaviour__.
> > >> >>
> > >> >> Stephen, anything new on this?
> > >> >
> > >> > No.
> > >> > Don't like adding yet another hack user visible API which will have
> > >> > to be maintained for too long. But on the other hand I don't have
> > >> > a better solution at my finger tips. If better idea doesn't come
> > >> > along, then we can go with yours.
> > >> >
> > >> I have not noticed any other proposals and this thread has been open
> > >> for quite a while.  Have we waited long enough ? If so can this patch
> > >> be taken ?
> > >>
> > >
> > > I am testing an alternative. The problem with your proposal is that
> > > it relies on the multicast address. It turns out there are people using
> > > other addresses for the STP group address, so using that as a identifier
> > > is incorrect.
> > If the chosen STP group address is in the local multicast group range
> > this patch will handle it.
> > 
> > David Lamparter has reviewed this patch and asked for it to be merged.
> >  This patch has at least two real world uses.  Ed needs this patch to
> > forward LLDP frames and I need this patch to forward 802.1X frames.
> > 
> > This patch has been out for review for 9 weeks and it still looks like
> > the best solution.
> 
> I prefer the netfilter solution because it is more general. We already have
> a firewall solution why shouldn't this case be part of it?

Nick's patch *IS* the netfilter solution. Check where it jumps to:

forward:
	switch (p->state) {
	case BR_STATE_FORWARDING:
		rhook = rcu_dereference(br_should_route_hook);
		if (rhook) {
			if ((*rhook)(skb)) {
				*pskb = skb;
				return RX_HANDLER_PASS;

This calls ebt_broute, which returns true if the BROUTING chain says
"DROP", which means "don't bridge, deliver on physdev" in this context.

Your patch reinvents the wheel - a new ebtables chain - and does not
allow any control without bridge-netfilter in the kernel.

Nick's patch allows rudimentary control (enough for most cases, I'd say)
when bridge-netfilter is disabled and full same-as-other-multicast
control when bridge-netfilter is enabled/loaded.
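
As a rough illustration of the "one test, one knob" idea, here is a
minimal user-space sketch. It assumes the mask carries one bit per
reserved address 01:80:C2:00:00:0X (bit X set means "forward"); that
layout and the function names are assumptions for illustration, not
taken from Nick's patch. A mask of 0 forwards none of the reserved
groups, i.e. the old behaviour.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* IEEE 802.1D reserves 01:80:C2:00:00:00 through 01:80:C2:00:00:0F as
 * link-local multicast groups; only the low nibble of the last octet varies. */
static const uint8_t link_local_prefix[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

static bool is_link_local_group(const uint8_t dest[6])
{
	return memcmp(dest, link_local_prefix, 5) == 0 &&
	       (dest[5] & 0xf0) == 0;
}

/* Hypothetical helper: may the bridge forward a frame sent to dest?
 * fwd_mask is the assumed per-group knob, one bit per reserved address. */
static bool may_forward_group(const uint8_t dest[6], uint16_t fwd_mask)
{
	if (!is_link_local_group(dest))
		return true;	/* not a reserved group: normal bridging applies */

	/* dest[5] is 0..15 here, so the shift stays within 16 bits */
	return (fwd_mask & (1u << dest[5])) != 0;
}
```

Under that assumed layout, LLDP (01:80:C2:00:00:0E) would be enabled
with bit 14 and 802.1X (01:80:C2:00:00:03) with bit 3.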


-David


* Re: RFC - should network devices trim frames > soft mtu
From: David Lamparter @ 2011-09-01  0:10 UTC (permalink / raw)
  To: Michael Chan; +Cc: Stephen Hemminger, David Miller, netdev@vger.kernel.org
In-Reply-To: <1314829651.9556.37.camel@HP1>

On Wed, Aug 31, 2011 at 03:27:31PM -0700, Michael Chan wrote:
> > This means that for non-VLAN tagged frames, the device drops received
> > packets if the length is greater than the MTU.  I don't see that in
> > other devices. What is the correct method? IMHO the bnx2 driver is
> > wrong here and if the policy is desired it should be enforced at
> > the next level (netif_receive_skb).  Hardcoding a protocol value is
> > kind of a giveaway that something is fishy.
> > 
> 
> I guess the reasoning is that we program the RX MTU in our chip to
> automatically discard packets bigger than the RX MTU and count them as
> over-size packets.  We add 4 bytes to the RX MTU to account for the VLAN
> tag which may be stripped or not stripped by the chip depending on
> settings.  The extra 4 bytes in the RX MTU setting will allow over-size
> packets by up to 4 bytes to get through.
> 
> I agree we should move this to the next level.

802.3ac allows either unconditionally raising the maximum frame size to
1522, or checking the protocol and accepting only 802.1Q frames at 1522
while restricting everything else to 1518.

802.3as raises the bar to 2000 bytes, but explicitly states that the
actual payload - without encapsulation headers from 802.1Q, 1ad, 1ah,
MPLS & co. - should keep the 1500 byte limit.

I think the sensible approach would be to move the MTU check as close
as possible to the border between ethernet and the upper layer
protocols, i.e. the driver shouldn't check this at all and should try to
tx/rx as much as the hardware supports. This is needed for QinQ, 802.1ah & co.
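
To make the numbers concrete, here is a small user-space sketch of the
two checks being contrasted. The constant and function names are
hypothetical; the frame lengths include the 14-byte header and the
4-byte FCS, so 1518 = 14 + 1500 + 4 and 1522 adds one 4-byte 802.1Q tag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
	SK_FRAME_BASIC     = 1518,	/* 14 header + 1500 payload + 4 FCS */
	SK_FRAME_QTAGGED   = 1522,	/* basic frame + one 802.1Q tag */
	SK_ETHERTYPE_8021Q = 0x8100
};

/* The stricter 802.3ac reading: only 802.1Q-tagged frames may use the
 * extra 4 bytes. This is the style of check that hardcodes a protocol. */
static bool strict_8023ac_len_ok(size_t frame_len, uint16_t ethertype)
{
	size_t limit = (ethertype == SK_ETHERTYPE_8021Q) ?
		       SK_FRAME_QTAGGED : SK_FRAME_BASIC;

	return frame_len <= limit;
}

/* The alternative: check only the payload handed to the upper layer,
 * after every encapsulation header has been stripped, so stacked tags
 * (QinQ, 802.1ah) never count against the MTU. */
static bool upper_layer_len_ok(size_t payload_len, size_t mtu)
{
	return payload_len <= mtu;
}
```

With the second form the driver needs no knowledge of which
encapsulations are in use; it only has to pass up whatever the hardware
can receive.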


-David


* Re: [PATCH net-next 8/8] tg3: Code movement
From: Joe Perches @ 2011-08-31 23:59 UTC (permalink / raw)
  To: Matt Carlson; +Cc: davem, netdev
In-Reply-To: <1314827094-29714-9-git-send-email-mcarlson@broadcom.com>

On Wed, 2011-08-31 at 14:44 -0700, Matt Carlson wrote:
> This patch just moves some code around for better organization.
> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
[]
> @@ -15541,7 +15542,7 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
>  		tnapi->tx_pending = TG3_DEF_TX_RING_PENDING;
>  
>  		tnapi->int_mbox = intmbx;
> -		if (i < 4)
> +		if (i <= 4)
>  			intmbx += 0x8;
>  		else
>  			intmbx += 0x4;

Not just code movement.
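
A small user-space sketch of why the changed comparison matters. It
models only the condition from the quoted hunk; the surrounding loop is
not shown in the quote, so the loop shape and the helper name are
assumptions.

```c
#include <stdint.h>

/* Total offset added to the starting mailbox address after vectors
 * 0..n-1 have each advanced it: 0x8 while i <= last_wide, 0x4 after.
 * "if (i < 4)" corresponds to last_wide = 3, "if (i <= 4)" to 4. */
static uint32_t mbox_offset(int n, int last_wide)
{
	uint32_t off = 0;

	for (int i = 0; i < n; i++)
		off += (i <= last_wide) ? 0x8 : 0x4;

	return off;
}
```

Under this model the two versions agree for the first five vectors
(indexes 0 to 4), but from index 5 onward the mailbox address is 0x4
higher with the new condition: mbox_offset(5, 3) is 0x24 while
mbox_offset(5, 4) is 0x28.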


