Analysing the randomness provided by Stacks VRF within a Smart Contract

vicnicius · September 28, 2024, 11:58am

Hey everyone! I’ve quickly researched using random numbers from within a smart contract for an app I’m developing. I’m sharing it here for two reasons:

I’d appreciate it if someone with more expertise could review the steps and methods and let me know if anything could have affected the results and how to improve it.
It could benefit other projects using a similar approach.

I’m building an app on Stacks that uses the VRF seed for an on-chain drawing mechanism. The smart contract responsible for the drawing has a feature that lets it determine the “difficulty” of its drawing. It will accept a difficulty setting from 1 to 10, which will correlate to the chances someone will have of guessing the number the drawing mechanism selected. For a given difficulty d, the chances should be 1/10^d.

It works in two steps:

Grab the VRF seed for a given block height (tenure height post-Nakamoto) where you want the drawing to happen, and apply the necessary transformations to build a large Clarity unsigned integer.
The smart contract will then apply a modulo operation on that large number to limit possible results according to the set difficulty.
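
As a rough illustration of these two steps, here is a minimal Python sketch. It assumes the seed arrives as a 16-byte buffer read in little-endian order; the byte order, the function names, and the sample bytes are illustrative assumptions and are not taken from the City Coin contract.

```python
# Sketch of the two-step draw, outside of Clarity.
# Assumptions: 16-byte seed slice, little-endian byte order, hypothetical names.

UINT128_MAX = 2**128 - 1  # Clarity's uint is an unsigned 128-bit integer


def seed_to_uint(seed_bytes: bytes) -> int:
    """Step 1: turn 16 seed bytes into one large unsigned integer."""
    if len(seed_bytes) != 16:
        raise ValueError("expected exactly 16 bytes")
    return int.from_bytes(seed_bytes, "little")


def draw(seed: int, difficulty: int) -> int:
    """Step 2: reduce the seed to one of 10**difficulty possible results."""
    if not 1 <= difficulty <= 10:
        raise ValueError("difficulty must be between 1 and 10")
    return seed % 10**difficulty


example_seed = seed_to_uint(bytes(range(16)))  # made-up bytes, not a real VRF seed
print(draw(example_seed, 3))
```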

For step one, I’m leveraging the City Coin VRF contract: STX Transaction - SPSCW…DYQ11.citycoin-vrf-v2. I believe most implementations will take a very similar approach, but since this was already there and the contract was audited, I felt the best path would be to reuse it.

For step two, I grab the number the VRF contract generated as seed and do the following:

(define-private (pick-lottery-numbers (seed uint))
    (if (is-eq difficulty u1) (ok (mod seed u10))
    (if (is-eq difficulty u2) (ok (mod seed u100))
    (if (is-eq difficulty u3) (ok (mod seed u1000))
    (if (is-eq difficulty u4) (ok (mod seed u10000))
    (if (is-eq difficulty u5) (ok (mod seed u100000))
    (if (is-eq difficulty u6) (ok (mod seed u1000000))
    (if (is-eq difficulty u7) (ok (mod seed u10000000))
    (if (is-eq difficulty u8) (ok (mod seed u100000000))
    (if (is-eq difficulty u9) (ok (mod seed u1000000000))
    (if (is-eq difficulty u10) (ok (mod seed u10000000000))
    err-invalid-difficulty)))))))))))
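
One thing worth checking about the modulo step is modulo bias: when the number of possible seed values is not an exact multiple of 10^d, the lowest results occur one extra time. The sketch below computes that imbalance exactly, assuming the seed is uniform over the full 128-bit range of a Clarity uint (an assumption about the input, not something the contract guarantees).

```python
from fractions import Fraction

SEED_RANGE = 2**128  # number of distinct values a Clarity uint can take


def modulo_bias(difficulty: int) -> Fraction:
    """Relative excess probability of the most likely result over the least likely."""
    modulus = 10**difficulty
    base, extra = divmod(SEED_RANGE, modulus)
    # Results below `extra` are hit (base + 1) times, the rest `base` times.
    if extra == 0:
        return Fraction(0)
    return Fraction(1, base)


for d in (1, 5, 10):
    print(d, float(modulo_bias(d)))
```

Under that assumption the relative imbalance stays below 1e-28 for every difficulty from 1 to 10, which is far too small to show up in a Chi-squared test on a few thousand samples.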

In my analysis, I collected this function’s output and plotted it to observe the distributions visually. I also ran a Chi-squared test, comparing the observed results with those expected from a uniform distribution (the null hypothesis), considering that the modulo operation would group results. A p-value of less than 0.05 suggests that the data distribution significantly differs from a uniform distribution. A p-value greater than or equal to 0.05 suggests there is insufficient evidence to conclude that the data significantly differs from a uniform distribution.
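
The Chi-squared comparison described above can be sketched as follows. This is not the notebook's code: it uses only the Python standard library, simulated draws instead of real contract output, and the Wilson-Hilferty normal approximation for the p-value, so values can differ slightly from an exact Chi-squared routine.

```python
import math
import random


def chi_squared_uniform(counts: list[int]) -> tuple[float, float]:
    """Return (statistic, approximate p-value) against a uniform distribution."""
    total = sum(counts)
    expected = total / len(counts)
    statistic = sum((observed - expected) ** 2 / expected for observed in counts)
    df = len(counts) - 1
    # Wilson-Hilferty: (X/df)^(1/3) is roughly normal with the mean and variance below.
    variance = 2 / (9 * df)
    z = ((statistic / df) ** (1 / 3) - (1 - variance)) / math.sqrt(variance)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return statistic, p_value


# Simulated draws at difficulty 1 (10 bins); stands in for real contract output.
rng = random.Random(42)
counts = [0] * 10
for _ in range(5000):
    counts[rng.getrandbits(128) % 10] += 1

print(chi_squared_uniform(counts))
```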

Difficulty 1
[Image: distribution plot for difficulty 1 (Screenshot 2024-09-28 at 11.46.28)]
P-value: 0.004575483833877736

Difficulty 2
[Image: distribution plot for difficulty 2 (Screenshot 2024-09-28 at 11.54.11)]
P-value: 0.263850889504435

Difficulty 3
[Image: distribution plot for difficulty 3 (Screenshot 2024-09-28 at 12.13.11)]
P-value: 0.37192589368491225

Difficulty 4
[Image: distribution plot for difficulty 4 (Screenshot 2024-09-28 at 12.15.30)]
P-value: 0.3905126463210302

For difficulty 5, the result changes sharply, probably because the sample has become too small relative to the 100,000 possible outcomes to test (?).

Difficulty 5
[Image: distribution plot for difficulty 5 (Screenshot 2024-09-28 at 12.41.29)]
P-value: 1
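
On the sample-size question, a common rule of thumb for the Chi-squared test is that each bin should have an expected count of at least 5. Here is a small sketch of what that implies per difficulty, assuming one bin per possible result. The threshold of 5 is the conventional guideline, not something derived from this data set, and the function names are made up for illustration.

```python
MIN_EXPECTED_PER_BIN = 5  # conventional rule of thumb for Chi-squared tests


def min_samples(difficulty: int) -> int:
    """Smallest sample size that gives every bin an expected count of at least 5."""
    bins = 10**difficulty
    return MIN_EXPECTED_PER_BIN * bins


def expected_per_bin(samples: int, difficulty: int) -> float:
    """Expected number of observations per bin under a uniform distribution."""
    return samples / 10**difficulty


for d in range(1, 6):
    print(d, min_samples(d))
```

With one bin per result, difficulty 5 would need 500,000 samples by this guideline. Grouping results into fewer, wider bins is one way to run the test with less data.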

You can have a look at the data and how every calculation was made here: An analysis of the randomness of the drawing mechanism behind the Felix Lottery Smart Contract / vini.btc | Observable

You can also play with the data. This is how I collected the data: felix-contract/scripts/rnd-analysis.js at main · vini-btc/felix-contract · GitHub

My main questions are:

Is there something wrong conceptually or in implementing the statistical tests?
Is there something wrong with implementing the random integer generation in the smart contract?
Would you consider the results enough to claim the drawing mechanism is fair?
If I want to increase the confidence that my algorithm outputs are close to a uniform distribution, would adding another “source of randomness” to increase entropy make sense? I was thinking of adding the result from the rnd integer generated by the City Coin contract to something like the block timestamp or the Bitcoin block hash. Still, those probably open the possibility of miners colluding to get a specific result.

In general, any insights or feedback are very welcome!

friedger · September 29, 2024, 8:51pm

Just for reference, there was an earlier analysis of the VRF: Analysis of the Stacks blockchain VRF

eriq · October 2, 2024, 2:46pm

Hi vicnicius,
I used a different approach to pick random winners in my onchain raffle from a valid range of integers.

I create some entropy by hashing together the following:
- Header ID hash
- Timestamp
- A user-generated string (the name of the raffle)

Once I have the hash, I convert it to a number within the provided range. This way I can always reproduce the same result by inputting the block height, the range, and the name.
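
A minimal Python sketch of this kind of scheme is shown below. The choice of SHA-256, the concatenation order, the 16-byte timestamp encoding, and the function name are all assumptions for illustration; the actual raffle contract may combine its inputs differently.

```python
import hashlib


def pick_winner(header_hash: bytes, timestamp: int, raffle_name: str, range_size: int) -> int:
    """Hash the three inputs together and map the digest into [0, range_size)."""
    if range_size <= 0:
        raise ValueError("range_size must be positive")
    material = header_hash + timestamp.to_bytes(16, "big") + raffle_name.encode("utf-8")
    digest = hashlib.sha256(material).digest()
    return int.from_bytes(digest, "big") % range_size


# Made-up inputs: a placeholder 32-byte hash, an arbitrary timestamp, a raffle name.
header = bytes(32)
first = pick_winner(header, 1727880000, "my-raffle", 500)
second = pick_winner(header, 1727880000, "my-raffle", 500)
print(first == second)  # same inputs give the same winner
```

Because every input is fixed once the block exists, anyone can recompute the winner and verify the draw.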

I chose the header ID hash because:

id-header-hash: This property returns a (buff 32) value containing the index block hash of a Stacks block. This hash is globally unique, and is derived from the block hash and the history of accepted PoX operations. This is also the block hash value you would pass into (at-block).

Cheers

Eriq

vicnicius · October 2, 2024, 4:25pm

Thank you very much, @eriq! Yeah, I’m taking the same steps, adding other variables from different sources to add entropy. I’d still be curious to understand if my results make sense, especially since I’m probably not great with statistics.

eriq · October 3, 2024, 10:47am

I checked your code, and your approach sounds good, but I would definitely add the id-header-hash from Stacks blocks to generate more entropy. It cannot be predicted, and I think it is impossible to manipulate the results on the minter side.
The timestamp is another unpredictable value: because miners are competing for the reward, it’s impossible to know the exact time at which the transaction will be executed.
On my side, I’m trying to keep the draw mechanism as simple as possible, and I believe the id-header-hash could be enough as a source of randomness; anything else you add to the entropy is a plus.
The most important thing is to prevent ticket purchases after the draw block. This is the only exploit we need to take care of.
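
The rule about not selling tickets once the draw block is reached can be sketched as a simple height check. The class and field names below are hypothetical, and a real contract would compare against the chain's current block height rather than a value passed in by the caller.

```python
class Raffle:
    """Toy model of a raffle that stops selling tickets at the draw block."""

    def __init__(self, draw_height: int) -> None:
        self.draw_height = draw_height
        self.tickets: list[str] = []

    def buy_ticket(self, buyer: str, current_height: int) -> bool:
        # Reject purchases at or after the draw block, when the outcome may be known.
        if current_height >= self.draw_height:
            return False
        self.tickets.append(buyer)
        return True


raffle = Raffle(draw_height=100)
print(raffle.buy_ticket("alice", 99))   # True: before the draw block
print(raffle.buy_ticket("bob", 100))    # False: draw block reached
```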

vicnicius · October 4, 2024, 2:39pm

Thanks for having a look! It makes sense to add the id-header-hash, for sure. My only concern with it was a very theoretical edge case where you have a HUGE prize lottery, and miners could collude to control the generated block in a way that would benefit them. If the incentives were high enough, perhaps… but that was when I thought of using it alone, not in combination with other inputs.
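
The collusion concern can be put into numbers with a deliberately simplified model: suppose a block producer could privately try several candidate blocks and publish only one whose hash yields a chosen result. The model, the function name, and the independence of attempts are all assumptions for illustration; it says nothing about how feasible this is on Stacks.

```python
def win_probability(difficulty: int, attempts: int) -> float:
    """Chance that at least one of `attempts` independent candidates hits the target."""
    single = 1 / 10**difficulty
    return 1 - (1 - single) ** attempts


for attempts in (1, 10, 100):
    print(attempts, win_probability(1, attempts))
```

In this model the advantage grows quickly at low difficulty and slowly at high difficulty, which is why the size of the prize relative to the cost of each attempt matters.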

Please share your approach as well, if that’s ok! Perhaps we could expose an optimal strategy in an isolated contract like the City Coin VRF, and I’d be happy to run the same analysis on this version to see if there’s any difference (there should be).

eriq · October 4, 2024, 5:32pm

Hi bro, here is a snippet of my approach. “Keep it simple” is my motto.
On the playground the entropy is very low; on mainnet it is better. Either way, the results are random enough…
I still believe it’s impossible to force both the timestamp and the id-header-hash to produce a predetermined result… Only a signer with more than 50% of the stacking could manipulate the chain… The competition between miners is enough to prevent this behaviour.

Would you like to DM on X? My account is @instoppabile

Eriq

snippet