Just a quick disclaimer: I am in no way a master statistician. Most of my experience with entropy and RNGs comes from building my own RNGs on Linux and from decoding radio signals. I have also worked on the RTL-entropy project. If you are a master statistician and see that I have done something wrong, please email me. I really want to be proven wrong on this one.
For a while I have been working on the next-gen detector for the Energetic Ray Global Observatory (ERGO) project. As part of this I have had access to their massive database. One day in 2015 we had the idea to try to create a HRNG (hardware random number generator) using timestamp data from our ERGO units. As a preliminary measure I decided to run some entropy tests on the data itself. The timestamp data usually looks like this.
However, our units are only accurate to ~100 ns, or about 30 meters at the speed of light (it's actually better than that in a controlled environment), so I decided to lop the last two digits off. And since the first 9 digits didn't fluctuate, I cut those off too. Remember, I was just going for entropy here. The first test I performed was on a full timestamp file. I used the program ent and got these readings for unit 522.
I found this weird, because even urandom on Linux looks more like this.
The background of this website, for example, looks like this.
Then I decided to take the last 10,000 events from the unit and keep only digits 10-17 of each timestamp (a sketch of this trimming is below), so each line looked like this.
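For reference, here's a minimal sketch of that trimming step, assuming one decimal timestamp per line with 1-indexed digit positions; the filenames are placeholders, and the actual one-liner I used isn't shown, so treat this as an illustration:

```python
# trim_timestamps.py - keep only digits 10-17 of each timestamp
# (assumes one decimal timestamp per line; positions are 1-indexed)
with open("unit522_timestamps.txt") as src, \
        open("unit522_trimmed.txt", "w") as dst:
    for line in src:
        line = line.strip()
        if line:
            dst.write(line[9:17] + "\n")  # digits 10-17 inclusive

# for the "no newlines" variant, concatenate instead:
# dst.write(line[9:17])
```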
OK, looks pretty random to me, right? Let's run ent again on the 10,000 lines.
Huh?!?! What's going on here? Well, maybe I need to remove the newlines, so I did that. It looked like this now.
At this point I was freaking out because I thought there was a massive hardware issue, so I decided to look at one of our older versions: the first one, made by some students from M.I.T. and Tom Bales. I ran the same set of tests on unit 9.
Timestamps, uncut, last 10,000:
Timestamps, last 10,000, digits 10-17:
Timestamps, last 10,000, cut, no newlines, with ent -c:
Now, I just redid all of these tests with new data, which you can find here, and it's the same as it was months ago. If anyone wants more data to play with, send me an email. The only thing I have found is that 7 is the predominant digit. Remember, this is for two different units with two totally different GPS and detector systems, numbers 522 and 9; you can view them here (when our database isn't backing up, lol).
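If you want to check the digit distribution yourself, here's a minimal sketch (the filename is a placeholder for one of the trimmed timestamp files above):

```python
# digit_freq.py - count how often each decimal digit appears
# in a trimmed timestamp file (one event per line)
from collections import Counter

with open("unit9_trimmed.txt") as f:
    counts = Counter(c for line in f for c in line.strip())

total = sum(counts.values())
for digit in sorted(counts):
    print(f"{digit}: {counts[digit]:6d} ({counts[digit] / total:.2%})")
```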
Update Sat May 14 08:38:17 EDT 2016:
My friend Grant Hernandez suggested that "ent is treating your ASCII timestamp sample file as binary. It doesn't know the representation of the numbers in binary," which made more sense to me than a hardware issue across versions, or than cosmic-ray timing not being random, so we tried packing the data with a simple Python script. The file that we packed required newlines so that each event could be packed as its own 4-byte stream; a sketch of the idea is below. The output data from the packed file looked something like this.
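The exact script isn't shown here, so here's a minimal sketch of the packing, assuming each line holds one 8-digit decimal value (digits 10-17 of a timestamp), which always fits in an unsigned 32-bit integer:

```python
# pack_events.py - pack each 8-digit event value into 4 bytes
# (big-endian unsigned 32-bit; 8 decimal digits always fit in 32 bits)
import struct

with open("unit9_trimmed.txt") as src, \
        open("unit9_packed.bin", "wb") as dst:
    for line in src:
        line = line.strip()
        if line:
            dst.write(struct.pack(">I", int(line)))
```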
The result isn't human readable, but it's the same data as our no-newlines file, just packed into bytes. After I created the byte file, we ran some tests; the first one was on the unit 9 cut file with newlines.
Much more information dense than before. However, the chi-square test stuck out like a sore thumb; let's check what the ent manual says about the chi-square test in this case.
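For anyone unfamiliar, the chi-square statistic ent reports compares the observed byte frequencies against the uniform expectation. Here's a minimal sketch of that computation (my own illustration of the standard statistic, not ent's actual source; the filename is a placeholder):

```python
# chi_square.py - chi-square statistic over byte frequencies,
# comparing a file's bytes to a uniform distribution
from collections import Counter

def chi_square(path):
    data = open(path, "rb").read()
    counts = Counter(data)
    expected = len(data) / 256  # uniform expectation per byte value
    return sum((counts.get(b, 0) - expected) ** 2 / expected
               for b in range(256))

print(chi_square("unit9_packed.bin"))
```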
Weird. Let's try this test with our other unit.
Uhh, that's weirdly similar… At first I thought this was an error because it looked so much like the first test.
The -c output for the second test was pretty long, so I'm going to post it here.
So I'm going to create and run a few more tests that aren't in the scope of ent; we will see what happens.
Update Mon May 16 11:49:30 EDT 2016 Finding a needle in a haystack of needles:
I have run a few more tests and I still can't figure out why it's (apparently) not random, although there is some correlation with sin(prime_numbers) for some weird reason. However, I have decided that there are some jobs humans just aren't good at, and this is one of them. I have decided to employ my knowledge of Scala and this cool new library to learn more about this data. Now, apparently employing neural nets for something like this "is overkill." I could do something like subtract every event from every other event and look for patterns there, but if this is caused by build-up in the GM tube (which I'm planning on testing hardware-wise) it won't be very consistent and may just be adding more entropy to the pool. Having a deep learning algorithm will also pave the way for the spatial tracking across all units that I was planning on doing with machine learning, as well as allow me to find sequences such as coordinated events within a localized group, and maybe even find the power of the cosmic ray that hit the area. Anyways, I love overkill, especially if it doesn't cost me anything.

I am also creating a test for the Bonferroni correction (a sketch of the idea is below) that I may incorporate back into ent when I get it working. I'm also going to try isolating a unit in a Faraday cage to see if any low-frequency radio signals are setting off the GM tube at a constant interval, and maybe building a CV-enabled cloud chamber (Simon's idea) and putting it on top of the unit to count how many events are from radiation.
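The Bonferroni correction itself is simple: when you run m significance tests, you compare each p-value against alpha/m instead of alpha, which controls the chance of any false positive across the whole battery. A minimal sketch (the p-values below are made-up placeholders, not results from my data):

```python
# bonferroni.py - Bonferroni-corrected significance check
# across a battery of m tests
def bonferroni(p_values, alpha=0.05):
    """Return which tests stay significant after dividing
    alpha by the number of tests performed."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# e.g. p-values from several randomness tests (placeholders)
p_values = [0.0004, 0.03, 0.2, 0.01]
print(bonferroni(p_values))  # [True, False, False, True]
```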
Update Wed May 18 15:29:36 EDT 2016 isotope test:
I ran a test using three isotopes to trigger the GM tube on our units. After I collected 10,000 events, I ran ent on the packed timestamps and got this.
There could still be a trigger from LF radio even at these high rates.
Update Wed May 18 18:17:04 EDT 2016 isotope test update:
I have found that this experiment is flawed: triggering the GPS multiple times within the 200 ms window causes it to use the last event in that window, so the numbers will always be high. However, if I move the isotopes further away, I allow the unit to be triggered by cosmic rays. Short of running this experiment in a very deep salt mine or under the Swiss Alps, I have no way of ensuring that cosmic rays are not triggering the unit. I am testing the results that I have from the Faraday-caged unit. We will see what the results are.