* Magic in videobuf
@ 2010-03-15 8:07 Pawel Osciak
2010-03-15 11:25 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 13+ messages in thread
From: Pawel Osciak @ 2010-03-15 8:07 UTC (permalink / raw)
To: linux-media@vger.kernel.org
Cc: Marek Szyprowski, 'Hans Verkuil',
kyungmin.park@samsung.com
Hello,
is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code
besides driver debugging? I intend to remove them, as we weren't able
to find any particular use for them when we were discussing this at
the memory handling meeting in Norway...
Best regards
--
Pawel Osciak
Linux Platform Group
Samsung Poland R&D Center
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Magic in videobuf
2010-03-15 8:07 Magic in videobuf Pawel Osciak
@ 2010-03-15 11:25 ` Mauro Carvalho Chehab
2010-03-15 13:27 ` Hans Verkuil
0 siblings, 1 reply; 13+ messages in thread
From: Mauro Carvalho Chehab @ 2010-03-15 11:25 UTC (permalink / raw)
To: Pawel Osciak
Cc: linux-media@vger.kernel.org, Marek Szyprowski,
'Hans Verkuil', kyungmin.park@samsung.com
Hi Pawel,
Pawel Osciak wrote:
> Hello,
>
> is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code
> besides driver debugging? I intend to remove them, as we weren't able
> to find any particular use for them when we were discussing this at
> the memory handling meeting in Norway...
It is a sort of paranoid check to avoid the risk of mass memory corruption
if something goes deadly wrong with the video buffers.
The original videobuf, written back in 2001/2002 had this code, and I've
kept it on the redesign I did in 2007, since I know that DMA is very badly
implemented on some chipsets. There are several reports of the video driver
to corrupt the system memory and damaging the disk data when a PCI transfer
to disk happens at the same time that a PCI2PCI data transfer happens (This
basically affects overlay mode, where the hardware is programmed to transfer
data from the video board to the video adapter board).
The DMA bug is present on several VIA and SYS old chipsets. It happened again
in some newer chips (2007?), and the fix were to add a quirk blocking overlay
mode on the reported broken hardware. It seems that newer BIOSes for those
newer hardware fixed this issue.
That's said, I never got any report from anyone explicitly saying that they
hit the MAGIC_CHECK() logic.
I prefer to keep this logic, but maybe we can add a CONFIG option to disable it.
Something like:
#ifdef CONFIG_VIDEO_DMA_PARANOID_CHECK
#define MAGIC_CHECK() ...
#else
#define MAGIC_CHECK()
#endif
Cheers,
Mauro
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Magic in videobuf
2010-03-15 11:25 ` Mauro Carvalho Chehab
@ 2010-03-15 13:27 ` Hans Verkuil
2010-03-15 16:23 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 13+ messages in thread
From: Hans Verkuil @ 2010-03-15 13:27 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Pawel Osciak, linux-media@vger.kernel.org, Marek Szyprowski,
kyungmin.park@samsung.com
> Hi Pawel,
>
> Pawel Osciak wrote:
>> Hello,
>>
>> is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code
>> besides driver debugging? I intend to remove them, as we weren't able
>> to find any particular use for them when we were discussing this at
>> the memory handling meeting in Norway...
>
> It is a sort of paranoid check to avoid the risk of mass memory corruption
> if something goes deadly wrong with the video buffers.
>
> The original videobuf, written back in 2001/2002 had this code, and I've
> kept it on the redesign I did in 2007, since I know that DMA is very badly
> implemented on some chipsets. There are several reports of the video
> driver
> to corrupt the system memory and damaging the disk data when a PCI
> transfer
> to disk happens at the same time that a PCI2PCI data transfer happens
> (This
> basically affects overlay mode, where the hardware is programmed to
> transfer
> data from the video board to the video adapter board).
>
> The DMA bug is present on several VIA and SYS old chipsets. It happened
> again
> in some newer chips (2007?), and the fix were to add a quirk blocking
> overlay
> mode on the reported broken hardware. It seems that newer BIOSes for those
> newer hardware fixed this issue.
>
> That's said, I never got any report from anyone explicitly saying that
> they
> hit the MAGIC_CHECK() logic.
>
> I prefer to keep this logic, but maybe we can add a CONFIG option to
> disable it.
> Something like:
>
> #ifdef CONFIG_VIDEO_DMA_PARANOID_CHECK
> #define MAGIC_CHECK() ...
> #else
> #define MAGIC_CHECK()
> #endif
What on earth does this magic check have to do with possible DMA
overruns/memory corruption? This assumes that somehow exactly these magic
fields are overwritten and that you didn't crash because of memory
corruption elsewhere much earlier. It pollutes the code for no good
reason. All it does is oops anyway, so it really doesn't 'avoid' a crash
(as if you could in such scenarios). And most likely the damage has been
done already in that case.
Please let us get rid of this. It makes no sense whatsoever.
Regards,
Hans
--
Hans Verkuil - video4linux developer - sponsored by TANDBERG Telecom
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Magic in videobuf
2010-03-15 13:27 ` Hans Verkuil
@ 2010-03-15 16:23 ` Mauro Carvalho Chehab
2010-03-15 16:40 ` Hans Verkuil
0 siblings, 1 reply; 13+ messages in thread
From: Mauro Carvalho Chehab @ 2010-03-15 16:23 UTC (permalink / raw)
To: Hans Verkuil
Cc: Pawel Osciak, linux-media@vger.kernel.org, Marek Szyprowski,
kyungmin.park@samsung.com
Hans Verkuil wrote:
>> Hi Pawel,
>>
>> Pawel Osciak wrote:
>>> Hello,
>>>
>>> is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code
>>> besides driver debugging? I intend to remove them, as we weren't able
>>> to find any particular use for them when we were discussing this at
>>> the memory handling meeting in Norway...
>> It is a sort of paranoid check to avoid the risk of mass memory corruption
>> if something goes deadly wrong with the video buffers.
>>
>> The original videobuf, written back in 2001/2002 had this code, and I've
>> kept it on the redesign I did in 2007, since I know that DMA is very badly
>> implemented on some chipsets. There are several reports of the video
>> driver
>> to corrupt the system memory and damaging the disk data when a PCI
>> transfer
>> to disk happens at the same time that a PCI2PCI data transfer happens
>> (This
>> basically affects overlay mode, where the hardware is programmed to
>> transfer
>> data from the video board to the video adapter board).
>>
>> The DMA bug is present on several VIA and SYS old chipsets. It happened
>> again
>> in some newer chips (2007?), and the fix were to add a quirk blocking
>> overlay
>> mode on the reported broken hardware. It seems that newer BIOSes for those
>> newer hardware fixed this issue.
>>
>> That's said, I never got any report from anyone explicitly saying that
>> they
>> hit the MAGIC_CHECK() logic.
>>
>> I prefer to keep this logic, but maybe we can add a CONFIG option to
>> disable it.
>> Something like:
>>
>> #ifdef CONFIG_VIDEO_DMA_PARANOID_CHECK
>> #define MAGIC_CHECK() ...
>> #else
>> #define MAGIC_CHECK()
>> #endif
>
> What on earth does this magic check have to do with possible DMA
> overruns/memory corruption? This assumes that somehow exactly these magic
> fields are overwritten and that you didn't crash because of memory
> corruption elsewhere much earlier.
Yes, that's the assumption. As, in general, there are more than one videobuffer,
and assuming that one buffer physical address is close to the other, if the data
got miss-aligned at the DMA, it is likely that the magic number of the next buffer
will be overwritten if something got bad. The real situation will depend on how
fragmented is the memory.
> It pollutes the code
There are only 18 occurences of MAGIC* at a given videobuf driver:
$ grep MAGIC ~v4l/master_hg/v4l/videobuf-dma-sg.c |wc -l
18
So, I don't think it is too much pollution.
> for no good
> reason. All it does is oops anyway, so it really doesn't 'avoid' a crash
> (as if you could in such scenarios). And most likely the damage has been
> done already in that case.
It won't avoid the damage, but the error message could potentially help
to track the issue. It will also likely limit the damage.
> Please let us get rid of this. It makes no sense whatsoever.
I don't have a strong opinion about this subject, but if this code might help
to avoid propagating the damage and to track the issue, I don't see why we
need to remove it, especially since it is easy to disable the entire logic
by just adding a few #if's to remove this code on environments where no
problem is expected.
--
Cheers,
Mauro
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Magic in videobuf
2010-03-15 16:23 ` Mauro Carvalho Chehab
@ 2010-03-15 16:40 ` Hans Verkuil
2010-03-15 17:26 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 13+ messages in thread
From: Hans Verkuil @ 2010-03-15 16:40 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Pawel Osciak, linux-media@vger.kernel.org, Marek Szyprowski,
kyungmin.park@samsung.com
> Hans Verkuil wrote:
>>> Hi Pawel,
>>>
>>> Pawel Osciak wrote:
>>>> Hello,
>>>>
>>>> is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code
>>>> besides driver debugging? I intend to remove them, as we weren't able
>>>> to find any particular use for them when we were discussing this at
>>>> the memory handling meeting in Norway...
>>> It is a sort of paranoid check to avoid the risk of mass memory
>>> corruption
>>> if something goes deadly wrong with the video buffers.
>>>
>>> The original videobuf, written back in 2001/2002 had this code, and
>>> I've
>>> kept it on the redesign I did in 2007, since I know that DMA is very
>>> badly
>>> implemented on some chipsets. There are several reports of the video
>>> driver
>>> to corrupt the system memory and damaging the disk data when a PCI
>>> transfer
>>> to disk happens at the same time that a PCI2PCI data transfer happens
>>> (This
>>> basically affects overlay mode, where the hardware is programmed to
>>> transfer
>>> data from the video board to the video adapter board).
>>>
>>> The DMA bug is present on several VIA and SYS old chipsets. It happened
>>> again
>>> in some newer chips (2007?), and the fix were to add a quirk blocking
>>> overlay
>>> mode on the reported broken hardware. It seems that newer BIOSes for
>>> those
>>> newer hardware fixed this issue.
>>>
>>> That's said, I never got any report from anyone explicitly saying that
>>> they
>>> hit the MAGIC_CHECK() logic.
>>>
>>> I prefer to keep this logic, but maybe we can add a CONFIG option to
>>> disable it.
>>> Something like:
>>>
>>> #ifdef CONFIG_VIDEO_DMA_PARANOID_CHECK
>>> #define MAGIC_CHECK() ...
>>> #else
>>> #define MAGIC_CHECK()
>>> #endif
>>
>> What on earth does this magic check have to do with possible DMA
>> overruns/memory corruption? This assumes that somehow exactly these
>> magic
>> fields are overwritten and that you didn't crash because of memory
>> corruption elsewhere much earlier.
>
> Yes, that's the assumption. As, in general, there are more than one
> videobuffer,
> and assuming that one buffer physical address is close to the other, if
> the data
> got miss-aligned at the DMA, it is likely that the magic number of the
> next buffer
> will be overwritten if something got bad. The real situation will depend
> on how
> fragmented is the memory.
For the record: we are talking about the magic fields as found in
include/media/videobuf*.h. None of the magic field there are actually in
the video buffers. They are in administrative structures or in ops structs
which are unlikely to be close in memory to the actual buffers.
Magic values that are actually put in the buffers themselves might serve
some purpose.
>
>> It pollutes the code
>
> There are only 18 occurences of MAGIC* at a given videobuf driver:
> $ grep MAGIC ~v4l/master_hg/v4l/videobuf-dma-sg.c |wc -l
> 18
>
> So, I don't think it is too much pollution.
It is, because it is absolute not clear what its purpose is, and in this
case even when I know the purpose it still makes no sense. Code like that
confuses people and does more harm than good.
>
>> for no good
>> reason. All it does is oops anyway, so it really doesn't 'avoid' a crash
>> (as if you could in such scenarios). And most likely the damage has been
>> done already in that case.
>
> It won't avoid the damage, but the error message could potentially help
> to track the issue. It will also likely limit the damage.
>
>> Please let us get rid of this. It makes no sense whatsoever.
>
> I don't have a strong opinion about this subject, but if this code might
> help
> to avoid propagating the damage and to track the issue, I don't see why we
> need to remove it, especially since it is easy to disable the entire logic
> by just adding a few #if's to remove this code on environments where no
> problem is expected.
It is highly unlikely that this code ever prevented these issues.
Especially given the places where the check is done. I think this is just
debug code that has been dragged along for all these years without anyone
bothering to remove it.
Regards,
Hans
--
Hans Verkuil - video4linux developer - sponsored by TANDBERG Telecom
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Magic in videobuf
2010-03-15 16:40 ` Hans Verkuil
@ 2010-03-15 17:26 ` Mauro Carvalho Chehab
2010-03-15 23:30 ` Andy Walls
2010-03-16 14:58 ` Pawel Osciak
0 siblings, 2 replies; 13+ messages in thread
From: Mauro Carvalho Chehab @ 2010-03-15 17:26 UTC (permalink / raw)
To: Hans Verkuil
Cc: Pawel Osciak, linux-media@vger.kernel.org, Marek Szyprowski,
kyungmin.park@samsung.com
Hans Verkuil wrote:
>> Hans Verkuil wrote:
>>>> Hi Pawel,
>>>>
>>>> Pawel Osciak wrote:
>>>>> Hello,
>>>>>
>>>>> is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code
>>>>> besides driver debugging? I intend to remove them, as we weren't able
>>>>> to find any particular use for them when we were discussing this at
>>>>> the memory handling meeting in Norway...
>>>> It is a sort of paranoid check to avoid the risk of mass memory
>>>> corruption
>>>> if something goes deadly wrong with the video buffers.
>>>>
>>>> The original videobuf, written back in 2001/2002 had this code, and
>>>> I've
>>>> kept it on the redesign I did in 2007, since I know that DMA is very
>>>> badly
>>>> implemented on some chipsets. There are several reports of the video
>>>> driver
>>>> to corrupt the system memory and damaging the disk data when a PCI
>>>> transfer
>>>> to disk happens at the same time that a PCI2PCI data transfer happens
>>>> (This
>>>> basically affects overlay mode, where the hardware is programmed to
>>>> transfer
>>>> data from the video board to the video adapter board).
>>>>
>>>> The DMA bug is present on several VIA and SYS old chipsets. It happened
>>>> again
>>>> in some newer chips (2007?), and the fix were to add a quirk blocking
>>>> overlay
>>>> mode on the reported broken hardware. It seems that newer BIOSes for
>>>> those
>>>> newer hardware fixed this issue.
>>>>
>>>> That's said, I never got any report from anyone explicitly saying that
>>>> they
>>>> hit the MAGIC_CHECK() logic.
>>>>
>>>> I prefer to keep this logic, but maybe we can add a CONFIG option to
>>>> disable it.
>>>> Something like:
>>>>
>>>> #ifdef CONFIG_VIDEO_DMA_PARANOID_CHECK
>>>> #define MAGIC_CHECK() ...
>>>> #else
>>>> #define MAGIC_CHECK()
>>>> #endif
>>> What on earth does this magic check have to do with possible DMA
>>> overruns/memory corruption? This assumes that somehow exactly these
>>> magic
>>> fields are overwritten and that you didn't crash because of memory
>>> corruption elsewhere much earlier.
>> Yes, that's the assumption. As, in general, there are more than one
>> videobuffer,
>> and assuming that one buffer physical address is close to the other, if
>> the data
>> got miss-aligned at the DMA, it is likely that the magic number of the
>> next buffer
>> will be overwritten if something got bad. The real situation will depend
>> on how
>> fragmented is the memory.
>
> For the record: we are talking about the magic fields as found in
> include/media/videobuf*.h. None of the magic field there are actually in
> the video buffers. They are in administrative structures or in ops structs
> which are unlikely to be close in memory to the actual buffers.
Well, Pawel's email didn't mentioned that he is referring just to one type
of magic check.
> Magic values that are actually put in the buffers themselves might serve
> some purpose.
>
>>> It pollutes the code
>> There are only 18 occurences of MAGIC* at a given videobuf driver:
>> $ grep MAGIC ~v4l/master_hg/v4l/videobuf-dma-sg.c |wc -l
>> 18
>>
>> So, I don't think it is too much pollution.
>
> It is, because it is absolute not clear what its purpose is, and in this
> case even when I know the purpose it still makes no sense. Code like that
> confuses people and does more harm than good.
>
>>> for no good
>>> reason. All it does is oops anyway, so it really doesn't 'avoid' a crash
>>> (as if you could in such scenarios). And most likely the damage has been
>>> done already in that case.
>> It won't avoid the damage, but the error message could potentially help
>> to track the issue. It will also likely limit the damage.
>>
>>> Please let us get rid of this. It makes no sense whatsoever.
>> I don't have a strong opinion about this subject, but if this code might
>> help
>> to avoid propagating the damage and to track the issue, I don't see why we
>> need to remove it, especially since it is easy to disable the entire logic
>> by just adding a few #if's to remove this code on environments where no
>> problem is expected.
>
> It is highly unlikely that this code ever prevented these issues.
> Especially given the places where the check is done. I think this is just
> debug code that has been dragged along for all these years without anyone
> bothering to remove it.
I remember that when I did the conversion, the memory magic helped a lot to find
several issues. So, yes, they are very useful when debug troubles at videbuf.
I remember I had to re-format one disk, during that time, due to a videobuf issue.
So, those checks help people that are touching at the videobuf code, reducing the
chances of damaging their disk partitions when trying to implement overlay mode and
userptr on the videobuf implementations that misses those features, or when
working on a different mmap() logic at the driver.
They also helped me with some troubles related to compat32 stuff and troubles at
mmap() logic on the driver, as the videobuf magic hits when the data segment
is pointing to the wrong place. This may also help to find bugs with troubles with
the memory allocators.
By looking only at the adm struct magic, I don't see any problem on getting rid
of them.
--
Cheers,
Mauro
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Magic in videobuf
2010-03-15 17:26 ` Mauro Carvalho Chehab
@ 2010-03-15 23:30 ` Andy Walls
2010-03-16 10:13 ` Pawel Osciak
2010-03-16 14:58 ` Pawel Osciak
1 sibling, 1 reply; 13+ messages in thread
From: Andy Walls @ 2010-03-15 23:30 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Hans Verkuil, Pawel Osciak, linux-media@vger.kernel.org,
Marek Szyprowski, kyungmin.park@samsung.com
On Mon, 2010-03-15 at 14:26 -0300, Mauro Carvalho Chehab wrote:
> Hans Verkuil wrote:
> >> Hans Verkuil wrote:
> >>>> Pawel Osciak wrote:
> >>>>> is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code
> >>>>> besides driver debugging? I intend to remove them, as we weren't able
> >>>>> to find any particular use for them when we were discussing this at
> >>>>> the memory handling meeting in Norway...
> >>>> It is a sort of paranoid check to avoid the risk of mass memory
> >>>> corruption
> >>>> if something goes deadly wrong with the video buffers.
> >>>>
> >>> What on earth does this magic check have to do with possible DMA
> >>> overruns/memory corruption? This assumes that somehow exactly these
> >>> magic
> >>> fields are overwritten and that you didn't crash because of memory
> >>> corruption elsewhere much earlier.
> All it does is oops anyway, so it really doesn't 'avoid' a crash
> >>> (as if you could in such scenarios). And most likely the damage has been
> >>> done already in that case.
> >> It won't avoid the damage, but the error message could potentially help
> >> to track the issue. It will also likely limit the damage.
> >>
> >>> Please let us get rid of this. It makes no sense whatsoever.
> >> I don't have a strong opinion about this subject, but if this code might
> >> help
> >> to avoid propagating the damage and to track the issue, I don't see why we
> >> need to remove it, especially since it is easy to disable the entire logic
> >> by just adding a few #if's to remove this code on environments where no
> >> problem is expected.
> >
> > It is highly unlikely that this code ever prevented these issues.
> > Especially given the places where the check is done. I think this is just
> > debug code that has been dragged along for all these years without anyone
> > bothering to remove it.
> I remember I had to re-format one disk, during that time, due to a videobuf issue.
> So, those checks help people that are touching at the videobuf code, reducing the
> chances of damaging their disk partitions when trying to implement overlay mode and
> userptr on the videobuf implementations that misses those features, or when
> working on a different mmap() logic at the driver.
In a previous job, working on a particularly large application, I had
occasional corruption in a shared memory segment that was shared by many
writer processes and 2 readers. A simple checksum on the data header
(and contents if appropriate) was enough to detect corrpution and avoid
dereferencing a corrupted pointer to the next data element (when walking
a data area filled with Key-Length-Value encoded data).
This "forward error detection" was inelegant to me - kind of like
putting armor on one's car instead of learning to drive properly. I
only resorted to using the checksum because there was almost no way to
find which process was corrupting shared memory in a reasonable amount
of time. It allowed me to change a "show stopper" bug into an annoying
data presentation bug, so the product could be released to a production
environment.
In a development environment, it would be much better to disable such
defensive coding and let the kernel Oops. You'll never find the
problems if you keep hiding them from yourself.
Regards,
Andy
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Magic in videobuf
2010-03-15 23:30 ` Andy Walls
@ 2010-03-16 10:13 ` Pawel Osciak
2010-03-16 16:35 ` Hans Verkuil
0 siblings, 1 reply; 13+ messages in thread
From: Pawel Osciak @ 2010-03-16 10:13 UTC (permalink / raw)
To: 'Andy Walls', 'Mauro Carvalho Chehab'
Cc: 'Hans Verkuil', linux-media, Marek Szyprowski,
kyungmin.park
>Andy Walls wrote:
>On Mon, 2010-03-15 at 14:26 -0300, Mauro Carvalho Chehab wrote:
>> Hans Verkuil wrote:
>> >> Hans Verkuil wrote:
>> >>>> Pawel Osciak wrote:
>
>> >>>>> is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code
>> >>>>> besides driver debugging? I intend to remove them, as we weren't able
>> >>>>> to find any particular use for them when we were discussing this at
>> >>>>> the memory handling meeting in Norway...
>> >>>> It is a sort of paranoid check to avoid the risk of mass memory
>> >>>> corruption
>> >>>> if something goes deadly wrong with the video buffers.
>> >>>>
>
>> >>> What on earth does this magic check have to do with possible DMA
>> >>> overruns/memory corruption? This assumes that somehow exactly these
>> >>> magic
>> >>> fields are overwritten and that you didn't crash because of memory
>> >>> corruption elsewhere much earlier.
>
>> All it does is oops anyway, so it really doesn't 'avoid' a crash
>> >>> (as if you could in such scenarios). And most likely the damage has been
>> >>> done already in that case.
>> >> It won't avoid the damage, but the error message could potentially help
>> >> to track the issue. It will also likely limit the damage.
>> >>
>> >>> Please let us get rid of this. It makes no sense whatsoever.
>> >> I don't have a strong opinion about this subject, but if this code might
>> >> help
>> >> to avoid propagating the damage and to track the issue, I don't see why we
>> >> need to remove it, especially since it is easy to disable the entire logic
>> >> by just adding a few #if's to remove this code on environments where no
>> >> problem is expected.
>> >
>> > It is highly unlikely that this code ever prevented these issues.
>> > Especially given the places where the check is done. I think this is just
>> > debug code that has been dragged along for all these years without anyone
>> > bothering to remove it.
>
>> I remember I had to re-format one disk, during that time, due to a videobuf issue.
>> So, those checks help people that are touching at the videobuf code, reducing the
>> chances of damaging their disk partitions when trying to implement overlay mode and
>> userptr on the videobuf implementations that misses those features, or when
>> working on a different mmap() logic at the driver.
>
>
>In a previous job, working on a particularly large application, I had
>occasional corruption in a shared memory segment that was shared by many
>writer processes and 2 readers. A simple checksum on the data header
>(and contents if appropriate) was enough to detect corrpution and avoid
>dereferencing a corrupted pointer to the next data element (when walking
>a data area filled with Key-Length-Value encoded data).
>
>This "forward error detection" was inelegant to me - kind of like
>putting armor on one's car instead of learning to drive properly. I
>only resorted to using the checksum because there was almost no way to
>find which process was corrupting shared memory in a reasonable amount
>of time. It allowed me to change a "show stopper" bug into an annoying
>data presentation bug, so the product could be released to a production
>environment.
>
>In a development environment, it would be much better to disable such
>defensive coding and let the kernel Oops. You'll never find the
>problems if you keep hiding them from yourself.
So, to sum up (I hope I understood you guys correctly):
we are not seeing any particular reason (besides debugging) for having
the checks in videobuf-core. Checks in memory-specific handling may have some
uses, although I am not sure how much. I am not an expert on sg drivers, but as
the magics are in the kernel control structures, they are not really a subject
to corruption. What may get corrupted is video data or sg lists, but the magics
are in a separate memory areas anyway. So videobuf-core magics should be removed
and we are leaning towards removing memory-type magics as well?
Best regards
--
Pawel Osciak
Linux Platform Group
Samsung Poland R&D Center
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Magic in videobuf
2010-03-15 17:26 ` Mauro Carvalho Chehab
2010-03-15 23:30 ` Andy Walls
@ 2010-03-16 14:58 ` Pawel Osciak
1 sibling, 0 replies; 13+ messages in thread
From: Pawel Osciak @ 2010-03-16 14:58 UTC (permalink / raw)
To: 'Mauro Carvalho Chehab', 'Hans Verkuil'
Cc: linux-media, Marek Szyprowski, kyungmin.park
>Mauro Carvalho Chehab wrote:
>>>>>> is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code
>>>>>> besides driver debugging? I intend to remove them, as we weren't able
>>>>>> to find any particular use for them when we were discussing this at
>>>>>> the memory handling meeting in Norway...
>>>>> It is a sort of paranoid check to avoid the risk of mass memory
>>>>> corruption
>>>>> if something goes deadly wrong with the video buffers.
>>>>>
>>>>> The original videobuf, written back in 2001/2002 had this code, and
>>>>> I've
>>>>> kept it on the redesign I did in 2007, since I know that DMA is very
>>>>> badly
>>>>> implemented on some chipsets. There are several reports of the video
>>>>> driver
>>>>> to corrupt the system memory and damaging the disk data when a PCI
>>>>> transfer
>>>>> to disk happens at the same time that a PCI2PCI data transfer happens
>>>>> (This
>>>>> basically affects overlay mode, where the hardware is programmed to
>>>>> transfer
>>>>> data from the video board to the video adapter board).
>>>>>
>>>>> The DMA bug is present on several VIA and SYS old chipsets. It happened
>>>>> again
>>>>> in some newer chips (2007?), and the fix were to add a quirk blocking
>>>>> overlay
>>>>> mode on the reported broken hardware. It seems that newer BIOSes for
>>>>> those
>>>>> newer hardware fixed this issue.
>>>>>
>>>>> That's said, I never got any report from anyone explicitly saying that
>>>>> they
>>>>> hit the MAGIC_CHECK() logic.
>>>>>
>>>>> I prefer to keep this logic, but maybe we can add a CONFIG option to
>>>>> disable it.
>>>>> Something like:
>>>>>
>>>>> #ifdef CONFIG_VIDEO_DMA_PARANOID_CHECK
>>>>> #define MAGIC_CHECK() ...
>>>>> #else
>>>>> #define MAGIC_CHECK()
>>>>> #endif
>>>> What on earth does this magic check have to do with possible DMA
>>>> overruns/memory corruption? This assumes that somehow exactly these
>>>> magic
>>>> fields are overwritten and that you didn't crash because of memory
>>>> corruption elsewhere much earlier.
>>> Yes, that's the assumption. As, in general, there are more than one
>>> videobuffer,
>>> and assuming that one buffer physical address is close to the other, if
>>> the data
>>> got miss-aligned at the DMA, it is likely that the magic number of the
>>> next buffer
>>> will be overwritten if something got bad. The real situation will depend
>>> on how
>>> fragmented is the memory.
>>
>> For the record: we are talking about the magic fields as found in
>> include/media/videobuf*.h. None of the magic field there are actually in
>> the video buffers. They are in administrative structures or in ops structs
>> which are unlikely to be close in memory to the actual buffers.
>
>Well, Pawel's email didn't mentioned that he is referring just to one type
>of magic check.
Thanks for the explanation. Although I don't really understand how any of
the magic numbers would help with DMA corruption, short of DMA operations
overwriting random memory areas (or just huge amounts of memory). If I understood
correctly, even the magics in dma-sg code are in structs videobuf_dmabuf and
videobuf_dma_sg_memory. Do dma operations touch those structures directly?
Best regards
--
Pawel Osciak
Linux Platform Group
Samsung Poland R&D Center
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Magic in videobuf
2010-03-16 10:13 ` Pawel Osciak
@ 2010-03-16 16:35 ` Hans Verkuil
2010-03-16 16:45 ` Devin Heitmueller
2010-03-17 6:50 ` Pawel Osciak
0 siblings, 2 replies; 13+ messages in thread
From: Hans Verkuil @ 2010-03-16 16:35 UTC (permalink / raw)
To: Pawel Osciak
Cc: 'Andy Walls', 'Mauro Carvalho Chehab',
linux-media, Marek Szyprowski, kyungmin.park
>>Andy Walls wrote:
>>On Mon, 2010-03-15 at 14:26 -0300, Mauro Carvalho Chehab wrote:
>>> Hans Verkuil wrote:
>>> >> Hans Verkuil wrote:
>>> >>>> Pawel Osciak wrote:
>>
>>> >>>>> is anyone aware of any other uses for MAGIC_CHECK()s in videobuf
>>> code
>>> >>>>> besides driver debugging? I intend to remove them, as we weren't
>>> able
>>> >>>>> to find any particular use for them when we were discussing this
>>> at
>>> >>>>> the memory handling meeting in Norway...
>>> >>>> It is a sort of paranoid check to avoid the risk of mass memory
>>> >>>> corruption
>>> >>>> if something goes deadly wrong with the video buffers.
>>> >>>>
>>
>>> >>> What on earth does this magic check have to do with possible DMA
>>> >>> overruns/memory corruption? This assumes that somehow exactly these
>>> >>> magic
>>> >>> fields are overwritten and that you didn't crash because of memory
>>> >>> corruption elsewhere much earlier.
>>
>>> All it does is oops anyway, so it really doesn't 'avoid' a crash
>>> >>> (as if you could in such scenarios). And most likely the damage has
>>> been
>>> >>> done already in that case.
>>> >> It won't avoid the damage, but the error message could potentially
>>> help
>>> >> to track the issue. It will also likely limit the damage.
>>> >>
>>> >>> Please let us get rid of this. It makes no sense whatsoever.
>>> >> I don't have a strong opinion about this subject, but if this code
>>> might
>>> >> help
>>> >> to avoid propagating the damage and to track the issue, I don't see
>>> why we
>>> >> need to remove it, especially since it is easy to disable the entire
>>> logic
>>> >> by just adding a few #if's to remove this code on environments where
>>> no
>>> >> problem is expected.
>>> >
>>> > It is highly unlikely that this code ever prevented these issues.
>>> > Especially given the places where the check is done. I think this is
>>> just
>>> > debug code that has been dragged along for all these years without
>>> anyone
>>> > bothering to remove it.
>>
>>> I remember I had to re-format one disk, during that time, due to a
>>> videobuf issue.
>>> So, those checks help people that are touching at the videobuf code,
>>> reducing the
>>> chances of damaging their disk partitions when trying to implement
>>> overlay mode and
>>> userptr on the videobuf implementations that misses those features, or
>>> when
>>> working on a different mmap() logic at the driver.
>>
>>
>>In a previous job, working on a particularly large application, I had
>>occasional corruption in a shared memory segment that was shared by many
>>writer processes and 2 readers. A simple checksum on the data header
>>(and contents if appropriate) was enough to detect corrpution and avoid
>>dereferencing a corrupted pointer to the next data element (when walking
>>a data area filled with Key-Length-Value encoded data).
>>
>>This "forward error detection" was inelegant to me - kind of like
>>putting armor on one's car instead of learning to drive properly. I
>>only resorted to using the checksum because there was almost no way to
>>find which process was corrupting shared memory in a reasonable amount
>>of time. It allowed me to change a "show stopper" bug into an annoying
>>data presentation bug, so the product could be released to a production
>>environment.
>>
>>In a development environment, it would be much better to disable such
>>defensive coding and let the kernel Oops. You'll never find the
>>problems if you keep hiding them from yourself.
>
> So, to sum up (I hope I understood you guys correctly):
>
> we are not seeing any particular reason (besides debugging) for having
> the checks in videobuf-core. Checks in memory-specific handling may have
> some
> uses, although I am not sure how much. I am not an expert on sg drivers,
> but as
> the magics are in the kernel control structures, they are not really a
> subject
> to corruption. What may get corrupted is video data or sg lists, but the
> magics
> are in a separate memory areas anyway. So videobuf-core magics should be
> removed
> and we are leaning towards removing memory-type magics as well?
That is my opinion, yes. However, there is one case where this is actually
useful. Take for example the function videobuf_to_dma in
videobuf-dma-sg.c. This is called by drivers and it makes sense that that
function should double-check that the videobuf_buffer is associated with
the dma_sg memtype.
But calling this 'magic' is a poor choice of name. There is nothing magic
about it, in this case it is just an identifier of the memtype. And there
may be better ways to do this check anyway.
I have not done any analysis, but might be enough to check whether the
int_ops field of videobuf_queue equals the sg_ops pointer. If so, then the
whole magic handling can go away in this case.
Regards,
Hans
--
Hans Verkuil - video4linux developer - sponsored by TANDBERG Telecom
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Magic in videobuf
2010-03-16 16:35 ` Hans Verkuil
@ 2010-03-16 16:45 ` Devin Heitmueller
2010-03-17 6:50 ` Pawel Osciak
1 sibling, 0 replies; 13+ messages in thread
From: Devin Heitmueller @ 2010-03-16 16:45 UTC (permalink / raw)
To: Hans Verkuil
Cc: Pawel Osciak, Andy Walls, Mauro Carvalho Chehab, linux-media,
Marek Szyprowski, kyungmin.park
On Tue, Mar 16, 2010 at 12:35 PM, Hans Verkuil <hverkuil@xs4all.nl> wrote:
> That is my opinion, yes. However, there is one case where this is actually
> useful. Take for example the function videobuf_to_dma in
> videobuf-dma-sg.c. This is called by drivers and it makes sense that that
> function should double-check that the videobuf_buffer is associated with
> the dma_sg memtype.
>
> But calling this 'magic' is a poor choice of name. There is nothing magic
> about it, in this case it is just an identifier of the memtype. And there
> may be better ways to do this check anyway.
It's funny, because when I saw the word "magic", I knew *exactly* what
it did. The notion of putting a magic value at the top of a structure
is not that uncommon (although you don't see it in Linux much). I've
seen it done many times over the years (all using the term "magic"
which is why I knew what it did). The cases where it is really
helpful is when you have lots of (void *) pointers in your buffer
management code, as it lets you detect cases where a function gets
passed the wrong buffer type (which wouldn't fail a compile time due
to the void * pointer). And by using different magic values for
different structure types, in many cases it will give you a hint where
the problem is (because you would now know *what* type of structure
got passed into the function).
That said, these sorts of cases usually aren't usually intended to
catch *random* memory corruption as they are to help isolate problems
in buffer handling code.
Devin
--
Devin J. Heitmueller - Kernel Labs
http://www.kernellabs.com
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Magic in videobuf
2010-03-16 16:35 ` Hans Verkuil
2010-03-16 16:45 ` Devin Heitmueller
@ 2010-03-17 6:50 ` Pawel Osciak
2010-03-17 7:59 ` Hans Verkuil
1 sibling, 1 reply; 13+ messages in thread
From: Pawel Osciak @ 2010-03-17 6:50 UTC (permalink / raw)
To: 'Hans Verkuil'
Cc: 'Andy Walls', 'Mauro Carvalho Chehab',
linux-media, Marek Szyprowski, kyungmin.park
>Hans Verkuil wrote:
>>>Andy Walls wrote:
>>>On Mon, 2010-03-15 at 14:26 -0300, Mauro Carvalho Chehab wrote:
>>>> Hans Verkuil wrote:
>>>> >> Hans Verkuil wrote:
>>>> >>>> Pawel Osciak wrote:
>>>
>>>> >>>>> is anyone aware of any other uses for MAGIC_CHECK()s in videobuf
>>>> code
>>>> >>>>> besides driver debugging? I intend to remove them, as we weren't
>>>> able
>>>> >>>>> to find any particular use for them when we were discussing this
>>>> at
>>>> >>>>> the memory handling meeting in Norway...
>>>> >>>> It is a sort of paranoid check to avoid the risk of mass memory
>>>> >>>> corruption
>>>> >>>> if something goes deadly wrong with the video buffers.
>>>> >>>>
>>>
>>>> >>> What on earth does this magic check have to do with possible DMA
>>>> >>> overruns/memory corruption? This assumes that somehow exactly these
>>>> >>> magic
>>>> >>> fields are overwritten and that you didn't crash because of memory
>>>> >>> corruption elsewhere much earlier.
>>>
>>>> All it does is oops anyway, so it really doesn't 'avoid' a crash
>>>> >>> (as if you could in such scenarios). And most likely the damage has
>>>> been
>>>> >>> done already in that case.
>>>> >> It won't avoid the damage, but the error message could potentially
>>>> help
>>>> >> to track the issue. It will also likely limit the damage.
>>>> >>
>>>> >>> Please let us get rid of this. It makes no sense whatsoever.
>>>> >> I don't have a strong opinion about this subject, but if this code
>>>> might
>>>> >> help
>>>> >> to avoid propagating the damage and to track the issue, I don't see
>>>> why we
>>>> >> need to remove it, especially since it is easy to disable the entire
>>>> logic
>>>> >> by just adding a few #if's to remove this code on environments where
>>>> no
>>>> >> problem is expected.
>>>> >
>>>> > It is highly unlikely that this code ever prevented these issues.
>>>> > Especially given the places where the check is done. I think this is
>>>> just
>>>> > debug code that has been dragged along for all these years without
>>>> anyone
>>>> > bothering to remove it.
>>>
>>>> I remember I had to re-format one disk, during that time, due to a
>>>> videobuf issue.
>>>> So, those checks help people that are touching at the videobuf code,
>>>> reducing the
>>>> chances of damaging their disk partitions when trying to implement
>>>> overlay mode and
>>>> userptr on the videobuf implementations that misses those features, or
>>>> when
>>>> working on a different mmap() logic at the driver.
>>>
>>>
>>>In a previous job, working on a particularly large application, I had
>>>occasional corruption in a shared memory segment that was shared by many
>>>writer processes and 2 readers. A simple checksum on the data header
>>>(and contents if appropriate) was enough to detect corrpution and avoid
>>>dereferencing a corrupted pointer to the next data element (when walking
>>>a data area filled with Key-Length-Value encoded data).
>>>
>>>This "forward error detection" was inelegant to me - kind of like
>>>putting armor on one's car instead of learning to drive properly. I
>>>only resorted to using the checksum because there was almost no way to
>>>find which process was corrupting shared memory in a reasonable amount
>>>of time. It allowed me to change a "show stopper" bug into an annoying
>>>data presentation bug, so the product could be released to a production
>>>environment.
>>>
>>>In a development environment, it would be much better to disable such
>>>defensive coding and let the kernel Oops. You'll never find the
>>>problems if you keep hiding them from yourself.
>>
>> So, to sum up (I hope I understood you guys correctly):
>>
>> we are not seeing any particular reason (besides debugging) for having
>> the checks in videobuf-core. Checks in memory-specific handling may have
>> some
>> uses, although I am not sure how much. I am not an expert on sg drivers,
>> but as
>> the magics are in the kernel control structures, they are not really a
>> subject
>> to corruption. What may get corrupted is video data or sg lists, but the
>> magics
>> are in a separate memory areas anyway. So videobuf-core magics should be
>> removed
>> and we are leaning towards removing memory-type magics as well?
>
>That is my opinion, yes. However, there is one case where this is actually
>useful. Take for example the function videobuf_to_dma in
>videobuf-dma-sg.c. This is called by drivers and it makes sense that that
>function should double-check that the videobuf_buffer is associated with
>the dma_sg memtype.
>
>But calling this 'magic' is a poor choice of name. There is nothing magic
>about it, in this case it is just an identifier of the memtype. And there
>may be better ways to do this check anyway.
>
>I have not done any analysis, but might be enough to check whether the
>int_ops field of videobuf_queue equals the sg_ops pointer. If so, then the
>whole magic handling can go away in this case.
Well... I see this discussion is dragging on a bit.
I will not be touching magics for now then, at least not until we arrive at
a consensus sometime in the future.
Best regards
--
Pawel Osciak
Linux Platform Group
Samsung Poland R&D Center
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Magic in videobuf
2010-03-17 6:50 ` Pawel Osciak
@ 2010-03-17 7:59 ` Hans Verkuil
0 siblings, 0 replies; 13+ messages in thread
From: Hans Verkuil @ 2010-03-17 7:59 UTC (permalink / raw)
To: Pawel Osciak
Cc: 'Andy Walls', 'Mauro Carvalho Chehab',
linux-media, Marek Szyprowski, kyungmin.park
On Wednesday 17 March 2010 07:50:27 Pawel Osciak wrote:
>>That is my opinion, yes. However, there is one case where this is actually
>>useful. Take for example the function videobuf_to_dma in
>>videobuf-dma-sg.c. This is called by drivers and it makes sense that that
>>function should double-check that the videobuf_buffer is associated with
>>the dma_sg memtype.
>>
>>But calling this 'magic' is a poor choice of name. There is nothing magic
>>about it, in this case it is just an identifier of the memtype. And there
>>may be better ways to do this check anyway.
>>
>>I have not done any analysis, but might be enough to check whether the
>>int_ops field of videobuf_queue equals the sg_ops pointer. If so, then the
>>whole magic handling can go away in this case.
>
>
> Well... I see this discussion is dragging on a bit.
> I will not be touching magics for now then, at least not until we arrive at
> a consensus sometime in the future.
You give up too easily :-)
But I agree that it is probably best to start with checkpatch cleanups.
I did a bit more research. The idea that I had to check the int_ops field
doesn't work at the moment because videobuf_qtype_ops contains a wild variety
of functions: some operate on a queue, some on a buffer. Some get a queue
pointer, but really should have gotten a buffer pointer. Ops like copy_to_user
and copy_stream shouldn't have been in the qtype at all, at least not in the
current form.
Anyway, it would help a lot if the qtype_ops are split into two structs: one
for queues, one for buffers. Each queue or buffer struct then has a pointer
to the ops. And that can be used by the qtype code as a check to detect
whether it is called from the right context.
Frankly, in the long term the queue-specific ops should probably be replaced
by buffer-specific ops anyway since we want to have a lot more flexibility
here and possible have different memory types per buffer (or even plane) on
the same queue.
It is my impression that all the queue-specific ops just walk over the buffers
anyway, so they can easily be replaced by buffer-specific ops and the 'walk over'
part can be moved to the core. So perhaps switching immediately to buffer-only
ops might actually be better than create separate queue and buffer ops.
Regards,
Hans
--
Hans Verkuil - video4linux developer - sponsored by TANDBERG
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2010-03-17 8:00 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-15 8:07 Magic in videobuf Pawel Osciak
2010-03-15 11:25 ` Mauro Carvalho Chehab
2010-03-15 13:27 ` Hans Verkuil
2010-03-15 16:23 ` Mauro Carvalho Chehab
2010-03-15 16:40 ` Hans Verkuil
2010-03-15 17:26 ` Mauro Carvalho Chehab
2010-03-15 23:30 ` Andy Walls
2010-03-16 10:13 ` Pawel Osciak
2010-03-16 16:35 ` Hans Verkuil
2010-03-16 16:45 ` Devin Heitmueller
2010-03-17 6:50 ` Pawel Osciak
2010-03-17 7:59 ` Hans Verkuil
2010-03-16 14:58 ` Pawel Osciak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox