TechAE Blogs - Explore now for new leading-edge technologies

TechAE Blogs - a global platform designed to promote the latest technologies like artificial intelligence, big data analytics, and blockchain.


Data Generation

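Step 1: Don't switch to the root user

Run these steps as your regular user. The commands that need elevated privileges already use sudo.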

Step 2: sudo apt-get update

Run this command to refresh the package lists, so that apt installs the latest available package versions.


Step 3: sudo apt-get install gcc make flex bison byacc git

Next, install the tools needed to build the kit: the gcc compiler, make, the flex lexer generator, the bison and byacc parser generators, and git.
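Before building, you can confirm that each tool from Step 3 is on your PATH. Below is a minimal sketch in POSIX sh. The function name missing_tools is our own and is not part of tpcds-kit.

```sh
#!/bin/sh
# missing_tools [CMD...]: print every named command that is NOT on PATH.
# With no arguments it checks the build tools installed in Step 3.
# Returns 0 if everything was found, 1 otherwise.
# (Hypothetical helper for this guide, not part of tpcds-kit.)
missing_tools() {
  if [ "$#" -eq 0 ]; then
    set -- gcc make flex bison byacc git
  fi
  status=0
  for t in "$@"; do
    if ! command -v "$t" >/dev/null 2>&1; then
      echo "$t"        # report the missing command
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_tools || echo "Re-run Step 3"` prints any missing tools and a reminder. If it prints nothing, every tool was found.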


Step 4: git clone https://github.com/gregrahn/tpcds-kit.git

Clone the tpcds-kit repository from GitHub.


Step 5: cd tpcds-kit/tools

Move into the tpcds-kit/tools directory, which holds the source code and makefile for the data generator.

Step 6: make OS=LINUX

Compile the kit's tools, including the dsdgen data generator, for your operating system (here, Linux).


Step 7: ./dsdgen -scale 5 -force

Finally, run the generator. This command produces about 5 GB of test data (scale factor 5), including 24 .dat files. The -force flag lets dsdgen overwrite files left over from a previous run.
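To sanity-check the result, you can count the .dat files and add up their sizes. The sketch below assumes the files were written to the directory you ran dsdgen from, which is the usual result of the command above. The helper names count_dat_files and dat_total_bytes are hypothetical, and find -printf is a GNU extension that is standard on Linux.

```sh
#!/bin/sh
# count_dat_files [DIR]: print how many *.dat files are in DIR (default: .).
count_dat_files() {
  find "${1:-.}" -maxdepth 1 -type f -name '*.dat' | wc -l
}

# dat_total_bytes [DIR]: print the combined size of those files in bytes.
# Uses GNU find's -printf '%s' (file size) and sums the sizes with awk.
dat_total_bytes() {
  find "${1:-.}" -maxdepth 1 -type f -name '*.dat' -printf '%s\n' |
    awk '{ total += $1 } END { print total + 0 }'
}
```

For example, from the tools directory run `count_dat_files .`, then `dat_total_bytes . | awk '{ printf "%.2f GB\n", $1 / 1e9 }'` to see roughly how much data was written.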


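You can generate up to 100 TB of test data just by changing the scale value in the command above. The scale factor is roughly the data size in GB, so 100 TB corresponds to a scale factor of 100000. The table below shows the row counts per scale factor.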
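At large scale factors, generating every table in a single process can take a long time. dsdgen can split the work into chunks with its -parallel and -child options. Below is a minimal sketch that launches one process per chunk and waits for all of them. It assumes your dsdgen build supports these options (confirm in its usage output) and that ./dsdgen is the binary built in Step 6. The helper name generate_parallel and the DSDGEN variable are our own.

```sh
#!/bin/sh
# generate_parallel SCALE CHUNKS OUTDIR
# Starts CHUNKS dsdgen processes in the background; process k builds the
# k-th slice of the data via -parallel CHUNKS -child k. Hypothetical helper.
# DSDGEN defaults to ./dsdgen (the binary compiled in Step 6).
DSDGEN="${DSDGEN:-./dsdgen}"

generate_parallel() {
  scale="$1"; chunks="$2"; outdir="$3"
  mkdir -p "$outdir" || return 1
  pids=""
  child=1
  while [ "$child" -le "$chunks" ]; do
    "$DSDGEN" -scale "$scale" -dir "$outdir" \
      -parallel "$chunks" -child "$child" -force &
    pids="$pids $!"             # remember each background process ID
    child=$((child + 1))
  done
  status=0
  for pid in $pids; do
    # wait PID returns that process's exit status; record any failure.
    wait "$pid" || status=1
  done
  return "$status"
}
```

For example, `generate_parallel 100 8 /data/tpcds_sf100` (a hypothetical output path) would run eight generator processes for scale factor 100. Waiting on each process ID, rather than using a bare `wait`, lets the function report a failed chunk instead of silently succeeding.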


Conclusion

Generating initial test data is easy with these seven steps. You can also download ready-made datasets from Kaggle or other websites. For the official documentation, refer to this document.
