Skip to main content

Discretization in data mining

Discretization in data mining

Data discretization refers to a method of converting a huge number of data values into smaller ones so that the evaluation and management of data become easy. In other words, data discretization is a method of converting attributes values of continuous data into a finite set of intervals with minimum data loss. There are two forms of data discretization first is supervised discretization, and the second is unsupervised discretization. Supervised discretization refers to a method in which the class data is used. Unsupervised discretization refers to a method depending upon the way which operation proceeds. It means it works on the top-down splitting strategy and bottom-up merging strategy.

Now, we can understand this concept with the help of an example

Suppose we have an attribute of Age with the given values

Another example is analytics, where we gather the static data of website visitors. For example, all visitors who visit the site with the IP address of India are shown under country level.

Some Famous techniques of data discretization

1.Histogram analysis

2.Binning

3.Cluster Analysis

4.Data discretization using decision tree analysis

5.Data discretization using correlation analysis

Data discretization and concept hierarchy generation

The term hierarchy represents an organizational structure or mapping in which items are ranked according to their levels of importance. In other words, we can say that a hierarchy concept refers to a sequence of mappings with a set of more general concepts to complex concepts. It means mapping is done from low-level concepts to high-level concepts. For example, in computer science, there are different types of hierarchical systems. A document is placed in a folder in windows at a specific place in the tree structure is the best example of a computer hierarchical tree model. There are two types of hierarchy: top-down mapping and the second one is bottom-up mapping.

Let's understand this concept hierarchy for the dimension location with the help of an example.

A particular city can map with the belonging country. For example, New Delhi can be mapped to India, and India can be mapped to Asia.

Top-down mapping

Top-down mapping generally starts with the top with some general information and ends with the bottom to the specialized information.

Bottom-up mapping

Bottom-up mapping generally starts with the bottom with some specialized information and ends with the top to the generalized information.

Comments

Popular posts from this blog

Why Laravel Framework is the Most Popular PHP Framework in 2025

Laravel In 2025, Laravel continues to be the most popular PHP framework among developers and students alike. Its ease of use, advanced features, and strong community support make it ideal for building modern web applications. Here’s why Laravel stands out: 1. Easy to Learn and Use Laravel is beginner-friendly and has a simple, readable syntax, making it ideal for students and new developers. Unlike other PHP frameworks, you don’t need extensive experience to start building projects. With clear structure and step-by-step documentation, Laravel allows developers to quickly learn the framework while practicing real-world web development skills. 2. MVC Architecture for Organized Development Laravel follows the Model-View-Controller (MVC) architecture , which separates application logic from presentation. This structure makes coding organized, easier to maintain, and scalable for large projects. For students, learning MVC in Laravel helps understand professional ...

The Latest Popular Programming Languages in the IT Sector & Their Salary Packages (2025)

Popular Programming Languages in 2025 The IT industry is rapidly evolving in 2025, driven by emerging technologies that transform the way businesses build, automate, and innovate. Programming languages play a vital role in this digital revolution, powering everything from web and mobile development to artificial intelligence and cloud computing. The most popular programming languages in today’s IT sector stand out for their versatility, scalability, and strong developer communities. With increasing global demand, mastering top languages such as Python, Java, JavaScript, C++, and emerging frameworks ensures excellent career growth and competitive salary packages across software development, data science, and IT engineering roles. 1. Python Python stands as the most versatile and beginner-friendly language, widely used in data science, artificial intelligence (AI), machine learning (ML), automation, and web development . Its simple syntax and powerful libraries like Pandas, ...

Data Mining And Basic Data Mining Task

Data Mining And Basic Data Mining Task Data Mining Basic Task Data Mining- In industry lots of data available in business,science or any type of industry. Firstly that all data and daily transaction saved in operational database.in that operation database all data saved related with day to day transaction. Data warehouse collect data from operational data warehouse and save successfully.In data warehouse gives only important data from operational database.if operational database contains 100 transaction then in data warehouse gives a 95 transactions from operational database. Data mining basically coming from KDD (knowledge discovery database) concept.Data mining is only part of KDD process. Data mining used from selecting data from data warehouse and show that data to user with with graphical formation like pi chart,bar chart ,diagram etc. Data mining select a important data from data warehouse with user requirement and show that data to user wi...