Quick Reference to the Six D’s of the Data Field

Whether you are a beginner or an experienced professional in the data field, and regardless of your specialty or the technologies you work with, you will hear about one or more of the following concepts. Every data professional should understand at least the general idea behind each of them. In this article, we go through each concept and explain the technologies and roles associated with it.

Data Engineering

Data Engineering is concerned with designing, implementing, and maintaining an organization’s data pipelines, using different types of data processing such as batch or streaming. The Data Engineer acts as the organization’s technical arm: they implement and maintain the data pipelines that other systems and teams depend on to track the organization’s KPIs and to get insights from the data.

Roles

  • Data Engineer
  • ETL Developer
  • Data Integration Developer
  • Data Scientist 

Concepts & Technology

  • ETL (Extract Transform Load)
  • Data Warehouse 
  • SQL (extracting and querying information from an RDBMS)
  • NoSQL concepts and technologies such as MongoDB (document database) and Redis (key-value store)
  • Big Data ecosystem projects (Spark, Hive, Pig, etc.)
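To make the ETL idea concrete, here is a minimal sketch in Python using only the standard library. It assumes a small hypothetical CSV export of orders as the source and an in-memory SQLite database standing in for the data warehouse; the column names, the cleaning rule (drop rows without an amount), and the `sales` table are illustrative assumptions, not a prescribed design.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file, an API, or a source database.
RAW_CSV = """order_id,customer,amount,order_date
1,alice,120.50,2024-01-03
2,bob,,2024-01-04
3,carol,75.00,2024-01-04
"""

def extract(text):
    """Extract: parse the CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows without an amount, cast types, and normalize names."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]), row["customer"].title(),
                        float(row["amount"]), row["order_date"]))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a warehouse-style table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Query the loaded data with SQL, e.g. revenue per day.
for day, revenue in conn.execute(
        "SELECT order_date, SUM(amount) FROM sales GROUP BY order_date ORDER BY order_date"):
    print(day, revenue)
```

In a real pipeline, each step would typically run on a schedule (batch) or be triggered by incoming events (streaming), and the load target would be a proper data warehouse rather than SQLite.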

For an in-depth view of data engineering skills and technologies, check our article “Data Engineering Roadmap”:

https://blog.datavalley.technology/2020/07/23/data-engineering-detailed-roadmap-data-engineering/

Data Science

Data Science is the art of uncovering the insights and trends in data. It has been around since ancient times. The ancient Egyptians used census data to increase efficiency in tax collection and they accurately predicted the flooding of the Nile river every year. Since then, people working in data science have carved out a unique and distinct field for the work they do.

Today, thanks to massive advances in storing large datasets and in processing and learning algorithms, the term Data Science has gained wide exposure, and large companies are in great need of data professionals who can help them learn more about their business and establish new strategies for the future.

Roles

  • Data Scientist
  • Machine Learning Engineer
  • AI Engineer

Concepts & Technology

  • Mathematics (Calculus, Linear Algebra)
  • Statistics
  • Machine Learning Algorithms
  • Deep Learning Techniques
  • Data Warehouse 
  • SQL (extracting and querying information from an RDBMS)
  • NoSQL concepts and technologies such as MongoDB (document database) and Redis (key-value store)
  • Data Visualization (Tableau, Power BI, QlikView, Qlik Sense)

For an in-depth view of Data Science skills and technologies, check our previous article:

https://blog.datavalley.technology/2020/07/06/data-science-roadmap-concepts-tools-and-technologies/

Data Visualization

Data Visualization is the art of presenting data in a visual format that clearly shows the required insights and KPIs. Without proper visualizations, it is hard to make accurate and fast decisions by looking only at numbers and spreadsheets. Great data visualization saves time and effort and keeps tracking the organization’s KPIs and insights simple and straightforward.

Roles

  • BI Developer
  • Data Scientist 

Concepts & Technology

  • Business Intelligence Concepts
  • Domain knowledge of the business
  • Reports and Dashboard Design
  • Data Modeling 
  • Data Warehouse 

Data Virtualization

Data virtualization is an approach that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how the data is formatted or where it is physically located. The goal of data virtualization is to create a single representation of data from multiple, disparate sources without having to copy or move the data.
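The following Python sketch illustrates the idea on a small scale. It assumes two hypothetical sources, a SQLite table of customers and a function standing in for an orders API, plus a hypothetical `VirtualCustomerOrders` class that combines them only when it is queried, without copying either source into a central store. Commercial data virtualization products typically do this with full query engines, filter pushdown, and caching; this is only an illustration of the concept.

```python
import sqlite3

# Source 1 (hypothetical): a relational database holding customers.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Alice", "EG"), (2, "Bob", "US")])

# Source 2 (hypothetical): orders served by another system, e.g. a REST API or a file.
def fetch_orders():
    return [{"customer_id": 1, "total": 40.0},
            {"customer_id": 1, "total": 60.0},
            {"customer_id": 2, "total": 25.0}]

class VirtualCustomerOrders:
    """A virtual view: data stays in its sources and is combined only at query time."""

    def __init__(self, db, orders_source):
        self.db = db
        self.orders_source = orders_source

    def query(self, country=None):
        # Push the filter down to the database source instead of copying the whole table.
        sql = "SELECT id, name FROM customers"
        params = ()
        if country is not None:
            sql += " WHERE country = ?"
            params = (country,)
        customers = dict(self.db.execute(sql, params).fetchall())  # id -> name
        # Combine with the second source on the fly: total order value per customer.
        totals = {}
        for order in self.orders_source():
            name = customers.get(order["customer_id"])
            if name is not None:
                totals[name] = totals.get(name, 0.0) + order["total"]
        return totals

view = VirtualCustomerOrders(crm, fetch_orders)
print(view.query())              # {'Alice': 100.0, 'Bob': 25.0}
print(view.query(country="US"))  # {'Bob': 25.0}
```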

Roles

  • Data Engineer
  • Data Scientist
  • BI Developer

Some of the vendors that offer data virtualization solutions:

  • Actifio
  • AtScale
  • Data Virtuality
  • Denodo
  • IBM Cloud Pak for Data

Data Governance

By definition, “data governance is a set of principles and practices that ensure high quality through the complete lifecycle of your data.” In other words, the role of data governance in an organization is to define data quality and data security rules, and to make sure that data coming from different sources is consistent and correctly processed according to the organization’s standards and business logic. Data governance rests on three main pillars: data security, data quality, and business processes.

Roles

  • Data Engineer
  • BI Developer
  • Data Quality Developer
  • Data Steward
  • Business Analyst

Data Quality

The role of data quality is to develop the processes required to make sure that the organization’s data is of high quality and follows the quality rules set by the organization’s data governance. Data quality rules and processes are based on six main dimensions:

Completeness: Is all the required information present in the data, or is some of it missing?
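A completeness check can be sketched in a few lines of Python. The records, the list of required fields, and the function names below are hypothetical; the sketch measures the share of non-missing values per required field and lists the rows that miss any of them.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "name": "Alice", "email": "alice@example.com", "phone": None},
    {"id": 2, "name": "Bob", "email": None, "phone": "555-0101"},
    {"id": 3, "name": None, "email": "carol@example.com", "phone": "555-0102"},
]

REQUIRED = ["id", "name", "email"]  # assumed rule set by data governance (phone is optional)

def completeness(rows, required):
    """Return the share of non-missing values per required field (1.0 = fully complete)."""
    scores = {}
    for field in required:
        present = sum(1 for row in rows if row.get(field) not in (None, ""))
        scores[field] = present / len(rows)
    return scores

def incomplete_rows(rows, required):
    """Return the ids of rows that miss at least one required field."""
    return [row["id"] for row in rows
            if any(row.get(field) in (None, "") for field in required)]

print(completeness(records, REQUIRED))
print(incomplete_rows(records, REQUIRED))  # [2, 3]
```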

Integrity: Do the relationships between different datasets exist, and do they reflect the real relationships in the business logic? For example, employee data should be linked to store data, and employee data should also be linked to department data.
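For integrity, a common check is referential: every foreign key should point to an existing parent record. The sketch below assumes hypothetical store, department, and employee data that mirror the example above; in a relational database the same rule is usually enforced with foreign key constraints.

```python
# Hypothetical reference data, keyed by primary key.
stores = {10: "Store A", 11: "Store B"}
departments = {1: "Sales", 2: "Logistics"}

# Hypothetical employee records carrying foreign keys to stores and departments.
employees = [
    {"emp_id": 100, "store_id": 10, "dept_id": 1},
    {"emp_id": 101, "store_id": 12, "dept_id": 2},  # store 12 does not exist
    {"emp_id": 102, "store_id": 11, "dept_id": 3},  # department 3 does not exist
]

def broken_links(rows, foreign_key, parent_table, id_field="emp_id"):
    """Return the ids of rows whose foreign key has no matching parent record."""
    return [row[id_field] for row in rows if row[foreign_key] not in parent_table]

print(broken_links(employees, "store_id", stores))      # [101]
print(broken_links(employees, "dept_id", departments))  # [102]
```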

Consistency: Is the information consistent across all data sources? For example, a customer with a deactivated account should not have any purchases in our sales transactions.
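The consistency rule from this example can be expressed as a simple cross-source check. The customer and sales data below are hypothetical, and the rule follows the example literally (any sale for a deactivated customer is flagged); a real rule might also compare the sale date with the deactivation date.

```python
# Hypothetical account data from the CRM system.
customers = {
    "C1": {"status": "active"},
    "C2": {"status": "deactivated"},
}

# Hypothetical transactions from the sales system.
sales = [
    {"sale_id": 1, "customer_id": "C1", "amount": 30.0},
    {"sale_id": 2, "customer_id": "C2", "amount": 12.5},
]

def inconsistent_sales(sales_rows, customer_table):
    """Flag sales recorded against customers whose account is deactivated."""
    return [sale["sale_id"] for sale in sales_rows
            if customer_table.get(sale["customer_id"], {}).get("status") == "deactivated"]

print(inconsistent_sales(sales, customers))  # [2]
```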

Conformity: Is the data stored in the standard format? For example, do all dates use the same standard format, and do all names follow the naming standard?
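Conformity checks usually compare values against an agreed pattern. In this sketch the standards themselves are assumptions for illustration: dates must be written exactly as `YYYY-MM-DD`, and names must be capitalized words separated by single spaces. The name pattern is deliberately simple and would reject valid names such as "O'Brien", so a real standard would need a richer rule.

```python
import re
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"                                  # assumed standard: ISO 8601 dates
NAME_PATTERN = re.compile(r"[A-Z][a-z]+( [A-Z][a-z]+)*")  # assumed standard for names

def date_conforms(value):
    """True if value is a real calendar date written exactly as YYYY-MM-DD."""
    try:
        # Round-trip so that non-padded forms such as "2024-3-1" are rejected too.
        return datetime.strptime(value, DATE_FORMAT).strftime(DATE_FORMAT) == value
    except ValueError:
        return False

def name_conforms(value):
    """True if the whole value matches the assumed naming pattern."""
    return NAME_PATTERN.fullmatch(value) is not None

dates = ["2024-03-01", "03/01/2024", "2024-13-01", "2024-3-1"]
names = ["Alice Smith", "alice smith", " Bob"]
print([d for d in dates if not date_conforms(d)])  # ['03/01/2024', '2024-13-01', '2024-3-1']
print([n for n in names if not name_conforms(n)])  # ['alice smith', ' Bob']
```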

Accuracy: Is the data accurate, and does it represent real-world values?
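Accuracy is harder to check automatically because it needs a trusted source of truth. The sketch below assumes a hypothetical reference price list and compares recorded values against it, separating values that disagree with the reference from values that cannot be verified at all.

```python
# Hypothetical trusted reference, e.g. the official price list maintained by the business.
reference_prices = {"P1": 9.99, "P2": 24.50}

# Hypothetical values as recorded in a downstream system.
recorded = [
    {"line": 1, "product": "P1", "unit_price": 9.99},
    {"line": 2, "product": "P2", "unit_price": 42.50},  # disagrees with the reference
    {"line": 3, "product": "P3", "unit_price": 5.00},   # no reference to verify against
]

def accuracy_issues(rows, reference, tolerance=0.01):
    """Return (lines that disagree with the reference, lines that cannot be verified)."""
    wrong, unverifiable = [], []
    for row in rows:
        expected = reference.get(row["product"])
        if expected is None:
            unverifiable.append(row["line"])
        elif abs(row["unit_price"] - expected) > tolerance:
            wrong.append(row["line"])
    return wrong, unverifiable

print(accuracy_issues(recorded, reference_prices))  # ([2], [3])
```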

Timeliness: Is the data loaded and made available within the expected timeframe?
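A basic timeliness check compares each dataset’s last load time with an allowed maximum age. The table names, timestamps, and the 24-hour window below are hypothetical; a fixed `now` is passed in so the result is reproducible.

```python
from datetime import datetime, timedelta, timezone

def is_fresh(last_loaded_at, max_age, now=None):
    """True if the data was loaded no longer than max_age before now."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_loaded_at <= max_age

# Hypothetical last-load timestamps per table (UTC).
last_loads = {
    "sales": datetime(2024, 5, 1, 6, 0, tzinfo=timezone.utc),
    "inventory": datetime(2024, 4, 29, 23, 0, tzinfo=timezone.utc),
}
MAX_AGE = timedelta(hours=24)  # assumed freshness requirement
now = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)

stale = [name for name, ts in last_loads.items() if not is_fresh(ts, MAX_AGE, now)]
print(stale)  # ['inventory']
```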
