TechAE Blogs - Explore now for new leading-edge technologies

Hive File Formats

This article outlines the file formats that Apache Hive supports for data stored in the Hadoop Distributed File System (HDFS). After reading it, you'll have a better grasp of the file formats available in Hive, as well as how and when to use each one. Hive can read and write several file formats that are commonly used in Apache Hadoop, and a Hive table can be stored as a text file, SequenceFile, RCFile, Avro, ORC, or Parquet.

File Formats

Hive Text File Format Example

-- id and name are placeholder columns
CREATE TABLE textfile_table (id INT, name STRING)
STORED AS TEXTFILE;

Hive Sequence File Format Example

-- id and name are placeholder columns
CREATE TABLE sequencefile_table (id INT, name STRING)
STORED AS SEQUENCEFILE;

Hive RC File Format Example

-- id and name are placeholder columns
CREATE TABLE rcfile_table (id INT, name STRING)
STORED AS RCFILE;

Hive AVRO File Format Example

-- id and name are placeholder columns
CREATE TABLE avro_table (id INT, name STRING)
STORED AS AVRO;

Hive ORC File Format Example

-- id and name are placeholder columns
CREATE TABLE orc_table (id INT, name STRING)
STORED AS ORC;

Hive Parquet File Format Example

-- id and name are placeholder columns
CREATE TABLE parquet_table (id INT, name STRING)
STORED AS PARQUET;

Pros and Cons of File Formats

The table above summarizes the pros and cons of the different file formats.

For analytical queries, the columnar RC and ORC formats generally outperform the Text and Sequence File formats. Between RC and ORC, ORC is usually preferable because it takes less time to retrieve data and less space to store it. On the other hand, ORC adds CPU overhead, because its compressed data has to be decompressed when it is read.
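One way to check this on your own data is to look at a table's storage metadata. This is a minimal sketch, assuming a table named orc_table already exists; which statistics appear (for example, the total size on disk) depends on whether Hive has gathered them for that table.

```sql
-- Prints the table's InputFormat, OutputFormat, and SerDe library,
-- along with its location and any table parameters Hive has recorded.
DESCRIBE FORMATTED orc_table;
```

Running the same statement against a text-format copy of the data gives a rough side-by-side view of how each format is stored.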

a) Use the TEXTFILE format if your data is plain text delimited by a known character, such as a comma or a tab.
b) Use the SEQUENCEFILE format if your data arrives as many small files, each smaller than the HDFS block size.
c) Use RCFILE to run analytics on your data while also storing it efficiently.
d) Use the ORC format to store your data efficiently, saving storage space and improving query performance.
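The first and last options are often combined: delimited text is loaded into a TEXTFILE table, then copied into an ORC table for querying. The following is a sketch of that pattern, not a prescribed workflow. The table names (events_text, events_orc), the column names, and the comma delimiter are all assumptions chosen for illustration.

```sql
-- Hypothetical staging table over comma-delimited text files.
CREATE TABLE events_text (
  event_id INT,
  event_name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Copy the same rows into an ORC table with CREATE TABLE AS SELECT.
-- The new table takes its column names and types from the query.
CREATE TABLE events_orc
STORED AS ORC
AS
SELECT event_id, event_name
FROM events_text;
```

Queries can then target events_orc, while events_text remains the landing place for raw delimited files.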

Hive also supports a number of compression codecs, including Gzip, Bzip2, LZO, and Snappy. In addition, a table can be partitioned so that the data for each partition is stored in its own subdirectory.
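Compression and partitioning can both be declared when the table is created. The sketch below assumes a hypothetical sales_orc table partitioned by a sale_date string column, and a Hive version recent enough to accept INSERT ... VALUES; the names and the sample row are illustrative only. The "orc.compress" table property selects the codec for ORC files.

```sql
-- Hypothetical ORC table, compressed with Snappy and partitioned by date.
CREATE TABLE sales_orc (
  sale_id INT,
  amount DOUBLE
)
PARTITIONED BY (sale_date STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");

-- Write one row into a single named (static) partition.
INSERT INTO TABLE sales_orc PARTITION (sale_date='2024-01-01')
VALUES (1, 9.99);
```

Each partition is stored in its own subdirectory under the table's location, named after the partition column and value (here, sale_date=2024-01-01), so a query that filters on sale_date only needs to read the matching directories.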
