I was curious about this statement:
On average it would be in the middle of the list, that search would take O(N/2)
time. Let’s prove this.
So I’ve made this Python script that tests this statement above.
I used numpy’s randint
to generate an integer from 0 - 1000.
I have created a 1000 cell random number list by appending randomly generated integers (0-1000) to a list, 1000 times. Let’s call it test_list
.
I have created a 10 cell random integer(0-1000) list to use as a search_item_list
.
I have run each item
from the search_item_list
against the test_list
and recorded how many steps it took every time.
I compared the number of steps
each time to the length of the test_list
to get a float between 0 and 1
representing how far the list was traversed to find the number 0 none and 1 meaning all the steppes. So half of the steps would be 0.5, I call it steps_depth
.
I’ve wrapped it up in a function, to which I can pass an argument and specify how many times I want the new 1k test array to be created and all these following commands executed on it, it returns one list of all recorder steps_depth
s
Then I’ve created a function that creates a specified number of lists which I called groups
, each filled with an even range of 100/number_of_lists. So if you create two groups then group0 = 0-51, group1 = 51 - 101, etc.
Then I’ve run results from my tests (all recorded step_depths
) through another function that checks in
which of the created group the step_depth should go and it counts the number of members of each group.
Based on my research usually about 40% of times the number is up to the first half and 60% of times is past the first half. Also surprisingly it’s about 35-40% in the last quarter alone.
And if I make 10 groups (0-11, 11-21, … 91-101) then it’s about 35% in group 10.
The numbers don’t seem to be evenly distributed nor really randomly generated.
I wonder what other “Random” integer methods are available in other packages and how random and evenly distributed would they come out.
Also note that I’ve done these steppes more than 200.000 times.
Each time creating new random array and drawing 10 new random values to search against the array (including scenarios when value was not found at all)
See the result of running the test total 200 times below
each line represent percentage distribution of results according to the part of the array they were found, 1k array, 10 numbers, run 10 times.
[q1, q2, q3, q4]
- q is quarter of the array and the number itself represent it’s share in the whole array
So i’ve run this 20 times:
FIRST TEST:
[24.0, 19.0, 7.0, 50.0]
in this test 24% of times the searched number was found in the first quarter of the test_array, 19% of times in the second quarter, 7% of times in the third quarter and 50%(!!!) of times in the fourth quarter.
The other 19 tests:
[24.0, 13.0, 16.0, 47.0]
[25.0, 23.0, 8.0, 44.0]
[25.0, 18.0, 8.0, 49.0]
[13.0, 10.0, 16.0, 61.0]
[21.0, 14.0, 13.0, 52.0]
[23.0, 16.0, 13.0, 48.0]
[24.0, 16.0, 12.0, 48.0]
[18.0, 17.0, 21.0, 44.0]
[22.0, 24.0, 16.0, 38.0]
[23.0, 12.0, 13.0, 52.0]
[26.0, 18.0, 17.0, 39.0]
[21.0, 17.0, 16.0, 46.0]
[24.0, 15.0, 11.0, 50.0]
[28.0, 17.0, 12.0, 43.0]
[18.0, 21.0, 14.0, 47.0]
[23.0, 17.0, 8.0, 52.0]
[22.0, 19.0, 12.0, 47.0]
[20.0, 19.0, 11.0, 50.0]
[18.0, 17.0, 8.0, 57.0]
[24.0, 14.0, 10.0, 52.0]
[14.0, 21.0, 18.0, 47.0]
[26.0, 14.0, 21.0, 39.0]
[22.0, 18.0, 12.0, 48.0]
[20.0, 12.0, 9.0, 59.0]
[31.0, 14.0, 13.0, 42.0]
[22.0, 11.0, 13.0, 54.0]
[20.0, 20.0, 18.0, 42.0]
[20.0, 15.0, 9.0, 56.0]
[22.0, 22.0, 9.0, 47.0]
Average for each group: 22.1, 16.76, 12.8, 48.33
Notice how the last quarter is dominating all the time…