Everything you need to know about Big Data

The term Big Data refers to data that is nearly impossible to process on a single computer because it is huge in size, generated at high speed, and arrives in many different formats.

For example, the data generated by Facebook can be termed Big Data. Millions of users are chatting and sharing posts on Facebook at any given point in time. This generates data that is huge, that moves quickly from one place to another (posts going viral in seconds), and that comes in different formats (images, text, videos).

Characteristics of Big Data:

There are five basic characteristics of Big Data:

Volume: It refers to the vast amount of data being generated every second of every day.

Example: Data generated by a satellite orbiting the Earth.

Velocity: It refers to the speed at which new data is generated and the speed at which data moves around.

Example: Social media posts going viral in seconds.

Variety: It refers to the different formats of data.

Example: Email, video, audio, and financial transactions.

Variability: It refers to data flows that can be highly inconsistent; more data might flow at one time and less at another.

Example: The number of tweets can increase a hundredfold during an important event.

Complexity: Data comes from multiple sources, which makes it difficult to link, match, cleanse, and transform.

Example: Combining data from Facebook, Instagram, and WhatsApp.
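
To make the complexity point concrete, here is a minimal Python sketch of linking, matching, and cleansing records from two sources. The records, field names, and helper functions are hypothetical and made up for illustration; real exports from these platforms look different.

```python
# Hypothetical records from two sources that describe the same users
# with different field names and inconsistent formatting.
facebook_rows = [
    {"user_email": " Alice@Example.com ", "posts": 12},
    {"user_email": "bob@example.com", "posts": 3},
]
instagram_rows = [
    {"email": "alice@example.com", "photos": 40},
    {"email": "carol@example.com", "photos": 7},
]


def clean_email(raw):
    """Cleanse: trim whitespace and lowercase so the same address matches."""
    return raw.strip().lower()


def combine(facebook, instagram):
    """Link records from both sources on the cleaned email address."""
    combined = {}
    for row in facebook:
        key = clean_email(row["user_email"])
        combined.setdefault(key, {"posts": 0, "photos": 0})["posts"] = row["posts"]
    for row in instagram:
        key = clean_email(row["email"])
        combined.setdefault(key, {"posts": 0, "photos": 0})["photos"] = row["photos"]
    return combined


profiles = combine(facebook_rows, instagram_rows)
for email, activity in sorted(profiles.items()):
    print(email, activity)
```

Even in this tiny case, the two sources disagree on field names and on how an email address is written, so the data has to be cleaned before it can be matched.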

Types of Big Data:

There are three types of Big Data:

Structured Data: It refers to any data that follows a fixed format or schema.

Example: A spreadsheet.

Unstructured Data: It refers to messy data which does not follow a format or schema. Most of the data generated today is unstructured; figures of 80 to 90% are often cited.

Example: Audio, images, video, etc.

Semi-structured Data: It refers to data that falls between structured and unstructured. It does not reside in a formatted table, but it is somewhat organized and can be easily extracted.

Example: HTML, JSON.
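
The difference between the three types can be sketched in a few lines of Python using only the standard library. The sample values below are invented for illustration.

```python
import csv
import io
import json

# Structured: every row follows the same fixed columns (the schema).
structured = "name,age\nAsha,31\nRavi,27\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: no fixed table, but keys label each value,
# and records may carry different fields.
semi_structured = '[{"name": "Asha", "tags": ["admin"]}, {"name": "Ravi"}]'
records = json.loads(semi_structured)

# Unstructured: raw bytes (for example, part of an image or audio file).
# Nothing in the data itself says what each byte means.
unstructured = bytes([137, 80, 78, 71])

print(rows[0]["age"])          # column lookup by name
print(records[0].get("tags"))  # field present on this record
print(records[1].get("tags"))  # field absent on this record
print(len(unstructured))       # only generic facts such as size are available
```

Structured data can be queried by column, semi-structured data by key (with some fields possibly missing), while unstructured data needs extra processing before anything specific can be read from it.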

Sources of Big Data:

There are three main sources of Big Data:

Human Generated: Data that humans create and share.

Example: Social media posts, emails, presentations, audio, and video files.

Machine Generated: Data generated by machines without relying on active human intervention.

Example: Sensors on vehicles, security cameras, satellites.

Organization Generated: Data generated as organizations run their business.

Example: Records generated when we make a purchase at an online or physical store.

Big Data Processing:

There are two ways to process Big Data.

Batch Processing: It is used when we need to process all the data at once or periodically, such as daily, weekly, or monthly.

Example: Monthly mobile bill.

Stream Processing: It is used when we need to process data incrementally, as it arrives.

Example: Heart monitor, fraud detection.
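
The contrast between the two approaches can be shown with a small Python sketch. The charge values and function names are hypothetical; a real system would read from files or a message queue rather than from a list.

```python
def batch_total(charges):
    """Batch: all records are available, so process them in one pass."""
    return sum(charges)


def streaming_totals(charges):
    """Streaming: update the result as each record arrives."""
    running = 0
    for charge in charges:
        running += charge
        yield running


# Hypothetical daily call charges for one billing period.
daily_charges = [20, 35, 10, 50]

print(batch_total(daily_charges))             # one answer at the end of the period
print(list(streaming_totals(daily_charges)))  # an updated answer after every record
```

The batch version gives a single result once the period is over, which suits a monthly bill. The streaming version has a current result after every record, which is what a heart monitor or fraud check needs.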

Techniques of Working with Big Data:

There are four main techniques used to work with Big Data.

Artificial Intelligence (AI): It is a branch of computer science which deals with teaching computer systems to perform tasks that would typically need human intelligence.

Example: Self-driving cars.

Machine Learning (ML): It is a branch of Artificial Intelligence which focuses on enabling computers to learn to perform tasks from data, with minimal human intervention.

Example: Email filters to sort out spam.
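
As a rough illustration of learning from data, here is a toy spam filter in Python. It is only a word-count sketch with made-up training emails, not how production spam filters or ML libraries work, but it shows the key idea: the rules come from labelled examples rather than being written by hand.

```python
from collections import Counter

# Hypothetical labelled training emails ("ham" means not spam).
training = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]


def train(examples):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts


def classify(counts, text):
    """Label a message by which class has seen its words more often."""
    words = text.split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["ham"][w] for w in words)
    return "spam" if spam_score > ham_score else "ham"


model = train(training)
print(classify(model, "free prize inside"))
print(classify(model, "agenda for lunch"))
```

Adding more labelled emails changes the counts and therefore the filter's behaviour, without anyone editing the classification logic.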

Deep Learning: It is a branch of Machine Learning that uses multi-layered neural networks to imitate the way humans gain certain types of knowledge.

Example: Face recognition.

Data Science: It is a versatile and multidisciplinary field which focuses on finding actionable insights from large sets of raw and structured data. It combines tools and workflows from disciplines like math and statistics, computer science, and business to process, manage, and analyze data.

Example: Healthcare recommendations.

Thank you for reading and Happy Learning.

