**Important Note:** This article is part of a series in which TechReport.us discusses the theory of Video Stream Matching.

This step performs the following:

- Extract frames, called mean frames, from the input video(s).
- Store these mean frames on the hard disk, where they are used for decision making.
- Extract features from the mean frames.

**3.2 Stages of Image Analysis**

Image analysis starts from the raw input and is the last step before the computer-vision process.

**3.2.1 Preprocessing**

A video arrives as input; it may be the source video or the target video, and it consists of a huge number of frames. Processing all of these frames is not practical, because it would require very high processing speed and very fast systems. Therefore only the mean frames (key frames) are taken for processing.

**3.2.2 Mean Frames**

"Key frame" is the older terminology; it is now being replaced by "mean frame". Preprocessing is required to select only the useful frames: those that represent a clip uniquely. For this, the key frame or mean frame is extracted from the video; it possesses the same distribution as the whole clip, or one close to it. [6]

So the mean frames are selected first, using one of several mean-frame extraction methods.

**3.2.2.1 Mean Frame Extraction**

When a video arrives as input, the system first reads it from the desired location and saves it to a drive. Next, a feature is selected that has the ability to define each frame uniquely. [6]

That feature will be helpful in the classification process. In this system, the histogram is selected.

**3.2.2.1.1 Histogram Feature**

The histogram is applied to frames in the form of tiles. First the image is divided into a number of non-overlapping tiles, then the histogram is calculated for each tile. After calculation, the histogram values are stored in column form: each tile places its values in one column, respectively. [7, 8]

The histogram is calculated on the grayscale image, so it has 256 possible values, which need to be reduced. These values can be reduced by selecting a useful bin number.

The bin number is the value that shows into how many parts the total intensity spectrum is divided. This value is a power of two. It reduces the number of data values, and this is the first step toward data reduction.

**Mathematically:**

B = number of bins

I = number of intensity levels

WB = number of intensity levels per bin (bin width)

WB = I / B

The number of possible windows per image must be in ideal form. Ideal form means the image's rows and columns are powers of two. If the image dimensions are not powers of two, the image is first converted (padded) to power-of-two dimensions before other processing starts.

If rows and cols are both powers of two

Then

continue processing

Else

apply padding.

The number of tiles per image depends on the size of the tile. The size of the window must also be a power of two, such as 2, 4, 8, 16, or 32.

The greater the size of the window, the lower the quality of the information; the smaller the size of the window, the greater the processing time. So this must be a suitable selection.

WN = number of windows

R = number of rows of the image

C = number of columns of the image

S × S = size of each window

WN = (R × C) / S²

So the total number of elements per feature vector is:

TVH = number of elements of the feature vector

TVH = WN × WB

In this system, the data is sent back to the caller module as a single column vector.

For classification, or for decision making (i.e., which frames need to be kept and which do not), the KS test is applied. [9]

**3.2.2.2 KS Test**

The Kolmogorov-Smirnov test (KS-test) tries to determine if two datasets differ significantly. The KS-test has the advantage of making no assumption about the distribution of data. (Technically speaking it is non-parametric and distribution free.) [25]

In a typical experiment, data collected in one situation (let’s call this the control group) is compared to data collected in a different situation (let’s call this the treatment group) with the aim of seeing if the first situation produces different results from the second situation.

If the outcomes for the treatment situation are “the same” as outcomes in the control situation, we assume that treatment in fact causes no effect. Rarely are the outcomes of the two groups identical, so the question arises: How different must the outcomes be? Statistics aim to assign numbers to the test results; *P*-values report if the numbers differ significantly. Reject the null hypothesis if *P* is “small”.

The process of assigning numbers to results is not straightforward. There is no fairy godmother who can wave her magic wand and tell you whether the results are evidence for or against an effective treatment.

One simple strategy you might have thought of is surely dead wrong: try lots of different statistics and pick the one that reports what you want.

Every statistical test makes “mistakes”: it tells you the treatment is effective when it isn’t, or tells you the treatment is not effective when it is. These mistakes are not user errors; rather, the statistical tool (properly used and applied to real data) simply lies some small fraction of the time, say a few percent. Thus if you apply many different statistical tests, you are very likely to get at least one wrong answer.

Statisticians, of course, try to make statistics that only rarely (say 5% of the time) lie. In doing this they tune their tests to be particularly good at detecting differences in common situations. Used in those situations the tests may be the best possible tests. Used in different situations the tests may lie outrageously.

For example, consider the datasets of two images, where one is called control and the other is called treatment:

controlA={0.22, -0.87, -2.39, -1.79, 0.37, -1.54, 1.28, -0.31, -0.74, 1.72, 0.38, -0.17, -0.62, -1.10, 0.30, 0.15, 2.30, 0.19, -0.50, -0.09}

treatmentA={-5.13, -2.19, -2.43, -3.83, 0.50, -3.25, 4.32, 1.63, 5.18, -0.43, 7.11, 4.87, -3.10, -5.81, 3.76, 6.31, 2.58, 0.07, 5.76, 3.50}

Notice that both datasets are approximately balanced around zero; evidently the mean in both cases is “near” zero. However there is substantially more variation in the treatment group which ranges approximately from -6 to 6 whereas the control group ranges approximately from -2½ to 2½. The datasets are different, but parametric approaches cannot see the difference.
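Assuming SciPy is available, the two datasets above can be compared directly; a two-sample t-test (parametric, comparing means) misses the difference, while the two-sample KS test flags it:

```python
from scipy.stats import ks_2samp, ttest_ind

controlA = [0.22, -0.87, -2.39, -1.79, 0.37, -1.54, 1.28, -0.31, -0.74, 1.72,
            0.38, -0.17, -0.62, -1.10, 0.30, 0.15, 2.30, 0.19, -0.50, -0.09]
treatmentA = [-5.13, -2.19, -2.43, -3.83, 0.50, -3.25, 4.32, 1.63, 5.18, -0.43,
              7.11, 4.87, -3.10, -5.81, 3.76, 6.31, 2.58, 0.07, 5.76, 3.50]

t_stat, t_p = ttest_ind(controlA, treatmentA)   # parametric: compares means only
ks_stat, ks_p = ks_2samp(controlA, treatmentA)  # non-parametric: compares full distributions

print(f"t-test  p = {t_p:.3f}")   # not small: both means are near zero
print(f"KS test p = {ks_p:.3f}")  # small: the spreads clearly differ
```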

Another problematic situation is one in which the treatment and control groups are smallish datasets (say 20 items each) that differ in mean, but a substantially non-normal distribution masks the difference.

For example, again consider the data set of two images:

controlB={1.26, 0.34, 0.70, 1.75, 50.57, 1.55, 0.08, 0.42, 0.50, 3.20, 0.15, 0.49, 0.95, 0.24, 1.37, 0.17, 6.98, 0.10, 0.94, 0.38}

treatmentB= {2.37, 2.16, 14.82, 1.73, 41.04, 0.23, 1.32, 2.91, 39.41, 0.11, 27.44, 4.51, 0.51, 4.50, 0.18, 14.68, 4.66, 1.30, 2.06, 1.19}

These datasets were drawn from lognormal distributions that differ substantially in mean.

The KS test detects this difference; a comparison of means does not.
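Again assuming SciPy, the same comparison on these lognormal samples shows the KS test detecting the difference while the mean-based t-test does not:

```python
from scipy.stats import ks_2samp, ttest_ind

controlB = [1.26, 0.34, 0.70, 1.75, 50.57, 1.55, 0.08, 0.42, 0.50, 3.20,
            0.15, 0.49, 0.95, 0.24, 1.37, 0.17, 6.98, 0.10, 0.94, 0.38]
treatmentB = [2.37, 2.16, 14.82, 1.73, 41.04, 0.23, 1.32, 2.91, 39.41, 0.11,
              27.44, 4.51, 0.51, 4.50, 0.18, 14.68, 4.66, 1.30, 2.06, 1.19]

# A few extreme values (50.57, 41.04, 39.41, ...) inflate the variance,
# so the difference in means fails to reach significance.
t_stat, t_p = ttest_ind(controlB, treatmentB)
ks_stat, ks_p = ks_2samp(controlB, treatmentB)

print(f"t-test  p = {t_p:.3f}")   # not significant
print(f"KS test p = {ks_p:.3f}")  # detects the distributional difference
```

This is exactly the property that makes the KS test attractive for the frame-selection step: histogram data from video frames need not be normally distributed.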