* Potentially unbounded allocations in seq_read?
@ 2013-12-11 17:04 Tvrtko Ursulin
2013-12-11 17:48 ` Tvrtko Ursulin
2013-12-11 17:49 ` Al Viro
0 siblings, 2 replies; 11+ messages in thread
From: Tvrtko Ursulin @ 2013-12-11 17:04 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton; +Cc: linux-fsdevel, linux-kernel
Hi all,
It seems that the buffer allocation in seq_read can double in size
indefinitely, at least I've seen that in practice with /proc/<pid>/smaps
(attempting to double m->size to 4M on a read of 1000 bytes). This
produces an ugly WARN_ON_ONCE, which should perhaps be avoided? (given
that it can be triggered by userspace at will)
From the top comment in seq_file.c one would think that it is a
fundamental limitation of the current code that everything which will be
read (even if in chunks) needs to be in the kernel side buffer at the
same time?
If that is true, then the only way to fix it would be to completely
re-design the seq_file interface; silencing the allocation failure with
__GFP_NOWARN could perhaps serve as a temporary measure.
As an alternative, since it does sound a bit pathological, perhaps
seq_file users who know they can be printing out such huge amounts of
text should just use a different (new?) facility?
Thanks,
Tvrtko
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Potentially unbounded allocations in seq_read?
@ 2013-12-11 17:48 Tvrtko Ursulin
From: Tvrtko Ursulin @ 2013-12-11 17:48 UTC (permalink / raw)
To: Alexander Viro; +Cc: Andrew Morton, linux-fsdevel, linux-kernel

On Wed, 2013-12-11 at 17:04 +0000, Tvrtko Ursulin wrote:
> Hi all,
>
> It seems that the buffer allocation in seq_read can double in size
> indefinitely, at least I've seen that in practice with /proc/<pid>/smaps
> (attempting to double m->size to 4M on a read of 1000 bytes). This
> produces an ugly WARN_ON_ONCE, which should perhaps be avoided? (given
> that it can be triggered by userspace at will)
>
> From the top comment in seq_file.c one would think that it is a
> fundamental limitation of the current code that everything which will be
> read (even if in chunks) needs to be in the kernel side buffer at the
> same time?

Oh-oh, seems that m->size is doubled on every read. So if app is reading
with a buffer smaller than data available, it can do nine reads before
it hits a >MAX_ORDER allocation. Not good. :)

Regards,
Tvrtko
* Re: Potentially unbounded allocations in seq_read?
@ 2013-12-11 18:00 Al Viro
From: Al Viro @ 2013-12-11 18:00 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Andrew Morton, linux-fsdevel, linux-kernel

On Wed, Dec 11, 2013 at 05:48:32PM +0000, Tvrtko Ursulin wrote:
> On Wed, 2013-12-11 at 17:04 +0000, Tvrtko Ursulin wrote:
> > Hi all,
> >
> > It seems that the buffer allocation in seq_read can double in size
> > indefinitely, at least I've seen that in practice with /proc/<pid>/smaps
> > (attempting to double m->size to 4M on a read of 1000 bytes). This
> > produces an ugly WARN_ON_ONCE, which should perhaps be avoided? (given
> > that it can be triggered by userspace at will)
> >
> > From the top comment in seq_file.c one would think that it is a
> > fundamental limitation of the current code that everything which will be
> > read (even if in chunks) needs to be in the kernel side buffer at the
> > same time?
>
> Oh-oh, seems that m->size is doubled on every read. So if app is reading
> with a buffer smaller than data available, it can do nine reads before
> it hits a >MAX_ORDER allocation. Not good. :)

Huh?  Is that from observation or from reading the code?  If it's the
former, I would really like to see details, if it's the latter... you are
misreading it.  m->size is doubled until it's large enough to hold
->show() output; size argument of seq_read() has nothing to do with that.
Once the damn thing is large enough, read() is served from it.  So are
subsequent reads, until you manage to eat all that had been generated.
Then the same buffer is used for the next entry; again, no doubling
unless that next entry is even bigger and won't fit.

Doubling on each read(2) takes really strange iterator to trigger and
you'll need ->show() spewing bigger and bigger entries.  Again - details,
please...
* Re: Potentially unbounded allocations in seq_read?
@ 2013-12-11 17:49 Al Viro
From: Al Viro @ 2013-12-11 17:49 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Andrew Morton, linux-fsdevel, linux-kernel

On Wed, Dec 11, 2013 at 05:04:41PM +0000, Tvrtko Ursulin wrote:
> Hi all,
>
> It seems that the buffer allocation in seq_read can double in size
> indefinitely, at least I've seen that in practice with /proc/<pid>/smaps
> (attempting to double m->size to 4M on a read of 1000 bytes). This
> produces an ugly WARN_ON_ONCE, which should perhaps be avoided? (given
> that it can be triggered by userspace at will)

An entry in /proc/<pid>/smaps that did not fit into 2Mb?  Seriously?
How in hell has that happened?  If you can trigger that at will, please
post the reproducer.

> From the top comment in seq_file.c one would think that it is a
> fundamental limitation of the current code that everything which will be
> read (even if in chunks) needs to be in the kernel side buffer at the
> same time?
>
> If that is true then only way to fix it would be to completely re-design
> the seq_file interface, just silencing the allocation failure with
> __GFP_NOWARN perhaps as a temporary measure.
>
> As an alternative, since it does sound a bit pathological, perhaps users
> for seq_file who know can be printing out such huge amounts of text
> should just use a different (new?) facility?

If a seq_file user is attempting to spew a couple of megs of text in one
->show() call, there's definitely something misused.  Either they ought
to use a different iterator (might be feasible if that monster entry is
produced by some kind of loop) or just not use seq_file at all.  I'm very
surprised that /proc/*/smaps has managed to step into that, though -
show_pid_smap() shouldn't be able to do so, AFAICS...
* Re: Potentially unbounded allocations in seq_read?
@ 2013-12-11 17:59 Tvrtko Ursulin
From: Tvrtko Ursulin @ 2013-12-11 17:59 UTC (permalink / raw)
To: Al Viro; +Cc: Andrew Morton, linux-fsdevel, linux-kernel

On Wed, 2013-12-11 at 17:49 +0000, Al Viro wrote:
> On Wed, Dec 11, 2013 at 05:04:41PM +0000, Tvrtko Ursulin wrote:
> > Hi all,
> >
> > It seems that the buffer allocation in seq_read can double in size
> > indefinitely, at least I've seen that in practice with /proc/<pid>/smaps
> > (attempting to double m->size to 4M on a read of 1000 bytes). This
> > produces an ugly WARN_ON_ONCE, which should perhaps be avoided? (given
> > that it can be triggered by userspace at will)
>
> An entry in /proc/<pid>/smaps that did not fit into 2Mb?  Seriously?
> How in hell has that happened?  If you can trigger that at will, please
> post the reproducer.

Yeah, no, wrong assumption. It was not about the size but the number of
reads. For example:

open("/proc/3131/smaps", O_RDONLY|O_LARGEFILE) = 3
fstat64(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
brk(0xf9019000) = 0xf9019000
read(3, "be483000-be4ab000 rw-s 00000000 "..., 4096) = 4054
write(1, "be483000-be4ab000 rw-s 00000000 "..., 4054) = 4054
read(3, "be5c3000-be5c5000 rw-s 10da3f000"..., 4096) = 3656
write(1, "be5c3000-be5c5000 rw-s 10da3f000"..., 3656) = 3656
read(3, "be6b1000-be6b2000 rw-s 10da48000"..., 4096) = 3661
write(1, "be6b1000-be6b2000 rw-s 10da48000"..., 3661) = 3661
read(3, "be6b9000-be6ba000 rw-s 10da38000"..., 4096) = 3599
write(1, "be6b9000-be6ba000 rw-s 10da38000"..., 3599) = 3599
read(3, "be6d6000-be7b6000 rw-s 10d889000"..., 4096) = 3599
write(1, "be6d6000-be7b6000 rw-s 10d889000"..., 3599) = 3599
read(3, "be884000-be885000 rw-s 10d85c000"..., 4096) = 3661
write(1, "be884000-be885000 rw-s 10d85c000"..., 3661) = 3661
read(3, "be88d000-be8de000 rw-p 00000000 "..., 4096) = 4007
write(1, "be88d000-be8de000 rw-p 00000000 "..., 4007) = 4007
read(3, "bea29000-bea4d000 r-xp 00000000 "..., 4096) = 4057
write(1, "bea29000-bea4d000 r-xp 00000000 "..., 4057) = 4057
read(3, "beab3000-bead0000 r--p 00000000 "..., 4096) = 2092
write(1, "beab3000-bead0000 r--p 00000000 "..., 2092) = 2092
read(3, 0xf9017030, 4096) = -1 ENOMEM (Out of memory)

Regards,
Tvrtko
* Re: Potentially unbounded allocations in seq_read?
@ 2013-12-11 18:07 Al Viro
From: Al Viro @ 2013-12-11 18:07 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Andrew Morton, linux-fsdevel, linux-kernel

On Wed, Dec 11, 2013 at 05:59:57PM +0000, Tvrtko Ursulin wrote:
> On Wed, 2013-12-11 at 17:49 +0000, Al Viro wrote:
> > On Wed, Dec 11, 2013 at 05:04:41PM +0000, Tvrtko Ursulin wrote:
> > > Hi all,
> > >
> > > It seems that the buffer allocation in seq_read can double in size
> > > indefinitely, at least I've seen that in practice with /proc/<pid>/smaps
> > > (attempting to double m->size to 4M on a read of 1000 bytes). This
> > > produces an ugly WARN_ON_ONCE, which should perhaps be avoided? (given
> > > that it can be triggered by userspace at will)
> >
> > An entry in /proc/<pid>/smaps that did not fit into 2Mb?  Seriously?
> > How in hell has that happened?  If you can trigger that at will, please
> > post the reproducer.
>
> Yeah, no, wrong assumption. It was not about the size but the number of
> reads. For example:
>
> open("/proc/3131/smaps", O_RDONLY|O_LARGEFILE) = 3
> fstat64(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
> brk(0xf9019000) = 0xf9019000
> read(3, "be483000-be4ab000 rw-s 00000000 "..., 4096) = 4054
> write(1, "be483000-be4ab000 rw-s 00000000 "..., 4054) = 4054
> read(3, "be5c3000-be5c5000 rw-s 10da3f000"..., 4096) = 3656
> write(1, "be5c3000-be5c5000 rw-s 10da3f000"..., 3656) = 3656
> read(3, "be6b1000-be6b2000 rw-s 10da48000"..., 4096) = 3661
> write(1, "be6b1000-be6b2000 rw-s 10da48000"..., 3661) = 3661
> read(3, "be6b9000-be6ba000 rw-s 10da38000"..., 4096) = 3599
> write(1, "be6b9000-be6ba000 rw-s 10da38000"..., 3599) = 3599
> read(3, "be6d6000-be7b6000 rw-s 10d889000"..., 4096) = 3599
> write(1, "be6d6000-be7b6000 rw-s 10d889000"..., 3599) = 3599
> read(3, "be884000-be885000 rw-s 10d85c000"..., 4096) = 3661
> write(1, "be884000-be885000 rw-s 10d85c000"..., 3661) = 3661
> read(3, "be88d000-be8de000 rw-p 00000000 "..., 4096) = 4007
> write(1, "be88d000-be8de000 rw-p 00000000 "..., 4007) = 4007
> read(3, "bea29000-bea4d000 r-xp 00000000 "..., 4096) = 4057
> write(1, "bea29000-bea4d000 r-xp 00000000 "..., 4057) = 4057
> read(3, "beab3000-bead0000 r--p 00000000 "..., 4096) = 2092
> write(1, "beab3000-bead0000 r--p 00000000 "..., 2092) = 2092
> read(3, 0xf9017030, 4096) = -1 ENOMEM (Out of memory)

OK, so you've got ENOMEM somewhere, but where had it come from?
The buffer from previous read() ought to have sufficed for this one,
unless the next entry had been much longer than usual...
* Re: Potentially unbounded allocations in seq_read?
@ 2013-12-12 13:40 Tvrtko Ursulin
From: Tvrtko Ursulin @ 2013-12-12 13:40 UTC (permalink / raw)
To: Al Viro; +Cc: Andrew Morton, linux-fsdevel, linux-kernel

On Wed, 2013-12-11 at 18:07 +0000, Al Viro wrote:
> On Wed, Dec 11, 2013 at 05:59:57PM +0000, Tvrtko Ursulin wrote:
> > On Wed, 2013-12-11 at 17:49 +0000, Al Viro wrote:
> > > On Wed, Dec 11, 2013 at 05:04:41PM +0000, Tvrtko Ursulin wrote:
> > > > Hi all,
> > > >
> > > > It seems that the buffer allocation in seq_read can double in size
> > > > indefinitely, at least I've seen that in practice with /proc/<pid>/smaps
> > > > (attempting to double m->size to 4M on a read of 1000 bytes). This
> > > > produces an ugly WARN_ON_ONCE, which should perhaps be avoided? (given
> > > > that it can be triggered by userspace at will)
> > >
> > > An entry in /proc/<pid>/smaps that did not fit into 2Mb?  Seriously?
> > > How in hell has that happened?  If you can trigger that at will, please
> > > post the reproducer.
> >
> > Yeah, no, wrong assumption. It was not about the size but the number of
> > reads. For example:
> >
> > open("/proc/3131/smaps", O_RDONLY|O_LARGEFILE) = 3
> > fstat64(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
> > brk(0xf9019000) = 0xf9019000
> > read(3, "be483000-be4ab000 rw-s 00000000 "..., 4096) = 4054
> > write(1, "be483000-be4ab000 rw-s 00000000 "..., 4054) = 4054
> > read(3, "be5c3000-be5c5000 rw-s 10da3f000"..., 4096) = 3656
> > write(1, "be5c3000-be5c5000 rw-s 10da3f000"..., 3656) = 3656
> > read(3, "be6b1000-be6b2000 rw-s 10da48000"..., 4096) = 3661
> > write(1, "be6b1000-be6b2000 rw-s 10da48000"..., 3661) = 3661
> > read(3, "be6b9000-be6ba000 rw-s 10da38000"..., 4096) = 3599
> > write(1, "be6b9000-be6ba000 rw-s 10da38000"..., 3599) = 3599
> > read(3, "be6d6000-be7b6000 rw-s 10d889000"..., 4096) = 3599
> > write(1, "be6d6000-be7b6000 rw-s 10d889000"..., 3599) = 3599
> > read(3, "be884000-be885000 rw-s 10d85c000"..., 4096) = 3661
> > write(1, "be884000-be885000 rw-s 10d85c000"..., 3661) = 3661
> > read(3, "be88d000-be8de000 rw-p 00000000 "..., 4096) = 4007
> > write(1, "be88d000-be8de000 rw-p 00000000 "..., 4007) = 4007
> > read(3, "bea29000-bea4d000 r-xp 00000000 "..., 4096) = 4057
> > write(1, "bea29000-bea4d000 r-xp 00000000 "..., 4057) = 4057
> > read(3, "beab3000-bead0000 r--p 00000000 "..., 4096) = 2092
> > write(1, "beab3000-bead0000 r--p 00000000 "..., 2092) = 2092
> > read(3, 0xf9017030, 4096) = -1 ENOMEM (Out of memory)
>
> OK, so you've got ENOMEM somewhere, but where had it come from?
> The buffer from previous read() ought to have sufficed for this one,
> unless the next entry had been much longer than usual...

So this is the story... task_mmu.c:show_map_vma() calls seq_path. There
we have a d_path call which returns -ENAMETOOLONG and keeps doing so
even though the buffer grows to huge proportions. It is something on
tmpfs, don't know what.

But in the meantime, shouldn't seq_path be a bit more considerate on
this particular error and not mark the state as "could not fit" forever?
Perhaps it would make sense to limit it a bit?

Or even more so, on errors _other_ than -ENAMETOOLONG it will at the
moment mark the result as "need more space". That also sounds broken to
me.

Regards,
Tvrtko
* Re: Potentially unbounded allocations in seq_read?
@ 2013-12-12 13:49 Al Viro
From: Al Viro @ 2013-12-12 13:49 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Andrew Morton, linux-fsdevel, linux-kernel

On Thu, Dec 12, 2013 at 01:40:40PM +0000, Tvrtko Ursulin wrote:
> So this is the story... task_mmu.c:show_map_vma() calls seq_path. There
> we have a d_path call which returns -ENAMETOOLONG and keeps doing so
> even though the buffer grows to huge proportions. It is something on
> tmpfs, don't know what.
>
> But in the meantime, shouldn't seq_path be a bit more considerate on
> this particular error and not mark the state as "could not fit" forever?
> Perhaps it would make sense to limit it a bit?
>
> Or even more so, on errors _other_ than -ENAMETOOLONG it will at the
> moment mark the result as "need more space". That also sounds broken to
> me.

a) *what* errors other than -ENAMETOOLONG?
b) d_path() not fitting into 2Mb is definitely a bug.  If you really have
managed to get a dentry tree 1 million levels deep, you have much worse
problems.
c) which kernel version it is?
* Re: Potentially unbounded allocations in seq_read?
@ 2013-12-12 13:59 Tvrtko Ursulin
From: Tvrtko Ursulin @ 2013-12-12 13:59 UTC (permalink / raw)
To: Al Viro; +Cc: Andrew Morton, linux-fsdevel, linux-kernel

On Thu, 2013-12-12 at 13:49 +0000, Al Viro wrote:
> On Thu, Dec 12, 2013 at 01:40:40PM +0000, Tvrtko Ursulin wrote:
> > So this is the story... task_mmu.c:show_map_vma() calls seq_path. There
> > we have a d_path call which returns -ENAMETOOLONG and keeps doing so
> > even though the buffer grows to huge proportions. It is something on
> > tmpfs, don't know what.
> >
> > But in the meantime, shouldn't seq_path be a bit more considerate on
> > this particular error and not mark the state as "could not fit" forever?
> > Perhaps it would make sense to limit it a bit?
> >
> > Or even more so, on errors _other_ than -ENAMETOOLONG it will at the
> > moment mark the result as "need more space". That also sounds broken to
> > me.
>
> a) *what* errors other than -ENAMETOOLONG?

Is this your way of saying there can't be any other errors from d_path?

> b) d_path() not fitting into 2Mb is definitely a bug.  If you really have
> managed to get a dentry tree 1 million levels deep, you have much worse
> problems.
> c) which kernel version it is?

3.10. I can't imagine this is an actual dentry tree somewhere, probably
just a bug of some sort. I'll probably hunt it down completely some time
next week, time permitting.

Regards,
Tvrtko
* Re: Potentially unbounded allocations in seq_read?
@ 2013-12-12 14:21 Al Viro
From: Al Viro @ 2013-12-12 14:21 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Andrew Morton, linux-fsdevel, linux-kernel

On Thu, Dec 12, 2013 at 01:59:31PM +0000, Tvrtko Ursulin wrote:
> > a) *what* errors other than -ENAMETOOLONG?
>
> Is this your way of saying there can't be any other errors from d_path?

Check yourself...  It is the only error that makes sense there and yes,
it is the only one being returned.

> > b) d_path() not fitting into 2Mb is definitely a bug.  If you really have
> > managed to get a dentry tree 1 million levels deep, you have much worse
> > problems.
> > c) which kernel version it is?
>
> 3.10. I can't imagine this is an actual dentry tree somewhere, probably
> just a bug of some sort. I'll probably hunt it down completely some time
> next week, time permitting.

Sounds like missing backport of 118b23 ("cope with potentially long
->d_dname() output for shmem/hugetlb").  It *is* in -stable (linux-3.10.y
has it since 3.10.17 as commit ad4c3c), but if your tree doesn't have it,
that's the one to try first...
* Re: Potentially unbounded allocations in seq_read?
@ 2013-12-12 14:52 Tvrtko Ursulin
From: Tvrtko Ursulin @ 2013-12-12 14:52 UTC (permalink / raw)
To: Al Viro; +Cc: Andrew Morton, linux-fsdevel, linux-kernel

On Thu, 2013-12-12 at 14:21 +0000, Al Viro wrote:
> On Thu, Dec 12, 2013 at 01:59:31PM +0000, Tvrtko Ursulin wrote:
> > > a) *what* errors other than -ENAMETOOLONG?
> >
> > Is this your way of saying there can't be any other errors from d_path?
>
> Check yourself...  It is the only error that makes sense there and yes,
> it is the only one being returned.

Cool then, that's why I e-mailed you in the first place, because you know
this area very well.

> > > b) d_path() not fitting into 2Mb is definitely a bug.  If you really have
> > > managed to get a dentry tree 1 million levels deep, you have much worse
> > > problems.
> > > c) which kernel version it is?
> >
> > 3.10. I can't imagine this is an actual dentry tree somewhere, probably
> > just a bug of some sort. I'll probably hunt it down completely some time
> > next week, time permitting.
>
> Sounds like missing backport of 118b23 ("cope with potentially long
> ->d_dname() output for shmem/hugetlb").  It *is* in -stable (linux-3.10.y
> has it since 3.10.17 as commit ad4c3c), but if your tree doesn't have it,
> that's the one to try first...

Yes that fixed it, thanks!

Regards,
Tvrtko
end of thread, other threads:[~2013-12-12 14:52 UTC | newest]

Thread overview: 11+ messages
2013-12-11 17:04 Potentially unbounded allocations in seq_read? Tvrtko Ursulin
2013-12-11 17:48 ` Tvrtko Ursulin
2013-12-11 18:00   ` Al Viro
2013-12-11 17:49 ` Al Viro
2013-12-11 17:59   ` Tvrtko Ursulin
2013-12-11 18:07     ` Al Viro
2013-12-12 13:40       ` Tvrtko Ursulin
2013-12-12 13:49         ` Al Viro
2013-12-12 13:59           ` Tvrtko Ursulin
2013-12-12 14:21             ` Al Viro
2013-12-12 14:52               ` Tvrtko Ursulin