Flow Design
The underlying structure of storage is called Flow. Flow is a continuously appended list of sectors, with each sector being a 256-byte data segment. The data submitted by users will be converted into several consecutive sectors. Users need to upload the metadata of the submission to the storage contract, which will then allocate the starting position in the Flow for the submission. Afterward, users will send the actual data to the storage nodes in the 0g network.
(Note: Concepts like "file" and "segment" are abstracted at a higher level and are not considered within the scope of Flow.)
The Merkle tree over the Flow
A Merkle tree is constructed based on Flow, and the storage contract will maintain the root of the Merkle tree. Since users only submit metadata instead of the complete data to the contract, both the structure of the Merkle tree and the submission metadata are specially designed.
The Merkle root
The Merkle root over the Flow is defined as padding the flow with zeros until the number of sectors is a power of 2. Then, compute the keccak256 hash of each sector, and use keccak256 to calculate the Merkle root of these hashes. The diagram below illustrates an example of a flow with 5 sectors.
Submissions
Since users only upload metadata to the storage contract, certain constraints must be met to ensure that the contract can correctly maintain the Merkle root: The user must pad the original data with zero bytes until it meets the submission conditions.
A submission consists of several concatenated sector arrays that must adhere to the following:
Each sector array's length must be a power of 2, and the lengths of these sector arrays must decrease strictly monotonically.
The length of the longest sector array must not exceed 8 times the length of the shortest sector array.
The metadata for a submission includes:
The size of the original data in bytes
The Merkle root and length of each sector array
The hash of the submission is defined as:
The keccak256 hash of the concatenate for each sector array‘s Merkle root.
This approach ensures that:
The user's submission is always composed of several Merkle trees over the subtrees within the Flow, providing the Merkle root.
The user's submission will not be excessively long.
When receiving submissions, the contract will pad the Flow with zeros to ensure that the start position aligns with the length of the first sector array. For example, if the first sector array contains 64 elements, the submission should be aligned to a 64-element boundary.
Padding for a submission
Here we provide a concrete algorithm for padding the original data to be a valid submission
For example, 13 can be decomposed to (8, 4, 1)
Proof of Random Access
PoRA (Proof of Random Access) is the mining algorithm for storage nodes. It periodically issues tasks requiring storage nodes to read the stored data and compute a hash value. When the hash value is below a specified threshold, the node finds a solution and submits the relevant content to the blockchain to claim a reward.
List of parameters
Constant parameters
Also we can derive the following parameters
Admin adjustable parameters
Mine context
The process of PoRA is not working all the time. It is activated by L1 blocks periodically. For every EPOCH_INTERVAL
blocks, a mine context will be released by the storage mine contract. All the miners will work on PoRA to find valid solutions. Once TARGET_SUBMITS
solutions are submitted, the miners stop PoRA until the next trigger. This can avoid making the miners’ disk busy all the time.
A mining context will be made with the following fields:
epoch
the index of the contextblockNumber
the block number of the block triggering PoRAblockDigest
the hash value of the block triggering PoRAflowRoot
the Merkle root of the FlowflowLength
the number of sectors in the Flowdigest
(referred as Context Digest) the keccak256 hash of (blockDigest
,flowRoot
,flowLength
)
Registration
Miners must register in the smart contract to obtain a minerId
for PoRA. The mine contract will generate a random ID for the miner’s on-chain address. Miners can change their miner ID at any time by requesting a new ID.
In the latest version of the storage node, the node will automatically check and request an ID upon startup.
In future versions, the rewards earned by the same miner ID may be capped.
Data sealing
To ensure that each miner ID independently stores a copy and to maintain the total number of copies in the storage network, PoRA introduces a data sealing mechanism. Unlike Filecoin, the sealing and unsealing processes do not involve heavy cryptographic computations. Instead, they are designed to ensure that the cost of sealing and unsealing is higher than the cost of storage.
The data is sealed at every SEAL_SIZE
byte. The sealed data is computed by
the raw data
unsealed
in the Flow,the
minerId
,the index of the first sector in
unsealed
(should be in alignment ofSEAL_SIZE / SECTOR_SIZE
)the digest
contextDigest
of the first context whoseflowLength
covers the wholeunsealed
.
unsealed
is regarded as an array of 32-byte elements and is sealed by
<aside> 💡 By default, all integers are in 256-bit big-endian format when computing hash values.
</aside>
The sealing process is sequential and the unsealing process is parallel-able. A node can answer a part of the data without unsealing the whole chunk.
Calculate loading chunks
The general process of PoRA involves the following steps:
Based on some on-chain seeds (such as a context digest) and a user-provided nonce, a recall position is computed through a CPU-intensive task.
The miner mixes the loaded data at the recall position with the intermediate results of this CPU-intensive task to calculate the final hash.
As the Flow data grows to the terabyte (TB) level or higher, a single node may be unable to store all the data. If many recall positions are not stored by the node, PoRA will shift from challenging data retrieval to challenging CPU capacity, which contradicts the design's original intent.
Recall Range
To address this, PoRA allows users to specify a recall range, which influences the calculation of the recall position. The miner can specify the mining range with two parameters:
startPosition
The index of the start sector.miningLength
The length of sectors for mine range.A 64-bit binary number with wildcard characters, for example:
0011xx001x
, wherex
represents either 0 or 1. This number is specified with two fieldsshardId
: replace all the wildcard with 0shardMask
: replace all the wildcard with 1, and others to 0
The mining range should satisfy the following conditions:
The
startPosition
should be in alignment ofSECTORS_PER_PRICE
The
miningLength
should not exceedSECTORS_PER_MINE × shardDegree
startPosition + miningLength
should not exceed theflowLength
in the most recent context.
Recall Position
Consider a big integer recallSeed
calculated from the CPU-intensive task. The recall position is calculated as
Iteration of PoRA
Pick a random
nonce
with 32 bytesGet the current mining context with
flowLength
andcontextDigest
Specify the mining range and calculated digest as
Compute recall position of PoRA chunk with a large Scratchpad
ScratchPad
Regard
recallSeed
as a 256-bit big-endian format integer and calculaterecallPosition
as described in the previous sectionLoad
LOADING_SIZE
-bytes sealed data started atrecallPosition
(in terms of sectors) assealedData
Let
mixedData
besealedData
xorscratchpad
Regard
mixedData
as an array ofSEAL_SIZE
-byte elements. For each segmenty
with indexi
, computeporaHash
and check if it reachestargetQuality
.
Design of ScratchPadHash
in PoRA
ScratchPadHash
in PoRADifficulty Adjustment
Once a MineContext
is released at a specific block number, TARGET_SUBMITS
valid solutions are expected to be submitted in exactly TARGET_BLOCKS
blocks. Otherwise the poraTarget
will be adjusted in the following steps.
Decide the
actualBlocks
blocks as the actual time taken to submitTARGET_SUBMITS
solutions. If there are not enough solutions until the next Mine Context release, the blocks between two context releases is defined asactualBlocks
.Adjust the
poraTarget
Economic Model
List of Parameters
Constant parameters
Admin adjustable parameters
Pricing
When users submit data, they need to pay a fee based on the size of the submission, calculated as SECTOR_PRICE × <number of sectors in the submission>
. The zeroes padded after the original data to meet the submission requirements are also subject to fees. However, zeroes padded before the original data by the storage contract to achieve alignment requirements do not incur any fees.
Although the pre-padded zeroes do not incur fees, the fees paid by the user will be evenly distributed across sectors, including the original data and all the zero padded part.
Reward Bucket
Every PRICING_SIZE
bytes of data in the Flow form a pricing chunk, which has a reward bucket, collecting the fees for each sector within this chunk. When a pricing chunk is filled with data, its rewards in that buckets are released linearly over a period defined by DATA_LIFETIME
.
When PoRA's recallPosition
hits a pricing chunk, it will take half of the "remaining released rewards.”
System Rewards
In the early stages of the ecosystem, the foundation can reserve a portion of tokens for system rewards. When PoRA’s recallPosition
hits a pricing chunk that is "releasing rewards", an additional reward of BASE_REWARD
will be issued.
The funds for the base reward will be manually deposited into the reward contract and tracked separately. If the balance for the base reward is insufficient to cover a signle base reward, miners will not be able to receive the full base reward.
Service Fee
A system service fee is charged as a proportion of the storage fees paid by the user, according to the parameter SERVICE_FEE_RATE_BP
.
Last updated