From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: References: <2ce4491cf3eb01922d0fec97147a1533cca30c41.1435170868.git.mschiffer@universe-factory.net> <20150807161622.GB2893@odroid> From: Matthias Schiffer Message-ID: <55C4ECD3.3020608@universe-factory.net> Date: Fri, 7 Aug 2015 19:37:23 +0200 MIME-Version: 1.0 In-Reply-To: <20150807161622.GB2893@odroid> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="geNHxr7bT76WC6TX6qlwDLnhpGhBkJt88" Subject: Re: [B.A.T.M.A.N.] [RFC] batman-adv: add generic netlink query API to replace debugfs files List-Id: The list for a Better Approach To Mobile Ad-hoc Networking List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: b.a.t.m.a.n@lists.open-mesh.org This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --geNHxr7bT76WC6TX6qlwDLnhpGhBkJt88 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 08/07/2015 06:16 PM, Linus L=C3=BCssing wrote: > Hi Matthias, >=20 > here at the Wireless BattleMesh we finally had the chance to get > some initial discussions on your patch going. For the start a few > comprehension questions came up: >=20 >=20 > On Wed, Jun 24, 2015 at 08:34:28PM +0200, Matthias Schiffer wrote: >> * As batman-adv uses single_open, the whole content of the originators= / >> transglobal files must fit into a single buffer; in large batman-adv= >> networks this often fails (as an order-5 allocation or even more wou= ld be >> necessary) >> * When originators or transglobal aren't just used for debugging, they= are >> first converted to text, and then parsed again in userspace by tools= like >> alfred/batadv-vis. Sending MAC address lists from the kernel to user= space >> as text makes the buffer size issue even worse. >=20 > These two points can be addressed through debugfs too, for > instance using sequential debugfs writes, right? (In fact IIRC you > had started with that approach until you got the feedback from > Gregk, right?) I've had a look at the different functions the seq_file API provides, but I didn't write any code. >=20 > Can you elaborate a little more on the "order-5 allocation"? > What amount of free RAM did the machines have where we observed > Out-of-Memory kernel panics upon debugfs access? Can you give some > numbers / calculations why we ended up with several Megabytes > memory allocations on debugfs access? An order-5 allocation are 2^5 =3D 32 pages of memory, i.e. 128K of RAM. A= s these are allocated by kmalloc, these 32 pages are allocated as one piece of physical RAM. As the RAM gets fragmented more and more the longer a system is running, 32 pages in one piece can be hard to find even when there are still tens of MB of free RAM. The fragmentation issue becomes a bit worse because the seq_file code starts with a single page and loops until the buffer is big enough to fit the whole output, always freeing the buffer in each loop iteration and allocation a new buffer twice the size. >=20 >=20 > The debugfs race conditions Gregk and you talked about are on > adding/removing debugfs files, right? Are there any known race > conditions on simple reads/writes in the abscene of removing > debugfs files? The race conditions only occur when files are removed, but that alone is bad enough - I'd really like to avoid enabling debugfs at all on critical systems. >=20 >=20 > Since you've had a look at both the netlink and sequential debugfs > approach already, can you give some estimation about the > complexity or rough number of lines of code to change for the > sequential debugfs approach? I guess that should be possible in 100~200 lines of code. Most of it would be similar to the code I've implemented for the netlink API: storing counters between the callback runs to keep track of the current position in the data structures. Of course, it will have the same drawbacks: when originator/tt entries are added or removed between the calls, entries will be duplicate or missing from the output. The main reason why I didn't consider fixing the debugfs code first was that returning to userspace in the middle of the read will make the race conditions much easier to hit: at the moment, the race can only occur when the file is removed between open() and read(), which is usually very short. The time between several read() calls can be much bigger, especially when the files are read and parsed line by line. >=20 > One thought that popped up here was, whether it'd make sense to > first "fix" the debugfs approach to the extent possible with a > couple of lines instead of 800+ lines to get rid of the issues > we frequently observe. And then merge a complete fix but bigger > patchset implementing netlink support with a more thorough review > and discussions on what we'd need for its API now and upcoming > features. >=20 > Cheers, Linus >=20 --geNHxr7bT76WC6TX6qlwDLnhpGhBkJt88 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCgAGBQJVxOzTAAoJEBbvP2TLIB2cHHQP/Ap7WTlPD2LIH54kuI+8BJqH glxoyDLe12fsfolk51odsnjoW2IqL84AAHmRZZBCWsaM2iorNTdJovXOSxOWX1a6 JpnATLYkbeMfyb/C6n7Vq4VSjbbzZAdSYuctcR9qLArDswpPXj73NqawOBUKbYw9 McTgVOJXdwZeCrTX26rIXlYzqVKAY0B8tZ18Hn/+u3/cgG05sGEHAwgzx7VyW1y+ Rw4iNGhH9EL145FDwVHB2+aQKa25BmEFHfcSkwI+F9JX9xcjHto+ihBVaZE3/NL6 T+u3VB8O1IfNhiaSjjXFsfx3DktBz+mUGZtVVF/j6Ud+n8H76ApNXv9A6bSwekq8 QDqocupo2ZSrnba7PAqMtq8foqHpbrsIFR0CEn82bvah1CeOM/h+hCAisN6be1Ex 3W9Wg6x9LSfLuaELHG1CTf7ElmgVEKYvOD1xB6LuDI/4j9AcOovFNIAzlUGZQWV+ w+RO568uXbwOQdnuivoe6ePOT6dw/M9Q5qDSomm4hASVILuanHqULt00kItGO4Vg avlD6zMJ5ZdVAt4b17wGwUSKl/UYd9cK0yCx0AM4Jvyp/0w05JOj0DHCmHp1qEPq 1paROpTVlHvUMjy8pYFKzo3hAMyRsZW73x7ipW0skLlT93V6kSfAHlZ+AoIRtyo8 Oi25Hf3TV633tzK1aNFs =CaOD -----END PGP SIGNATURE----- --geNHxr7bT76WC6TX6qlwDLnhpGhBkJt88--