
Categorical variable - chi-square binning

Latest reply: Nov 21, 2021 14:19:38

Hello, everyone!

This post introduces chi-square binning for categorical variables. Read on for the details.

When a categorical variable is encountered in modeling, it is often converted to dummy variables for processing.

However, if a categorical variable has too many attribute values, too many dummy variables are generated and the dimensionality increases. In many cases, only some of the dummy variables enter the model, so some information about the categorical variable may be lost.
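To make the dimension increase concrete, here is a minimal sketch of dummy (one-hot) encoding in plain Python. The function name `one_hot` and the sample data are hypothetical and only serve as an illustration:

```python
def one_hot(values):
    """Return (levels, rows): one 0/1 column per distinct attribute value."""
    levels = sorted(set(values))
    rows = [[1 if v == lvl else 0 for lvl in levels] for v in values]
    return levels, rows


# Hypothetical sample: one categorical variable with 5 distinct attribute values.
cities = ["A", "B", "C", "A", "D", "E", "B"]
levels, rows = one_hot(cities)

# One input column becomes len(levels) dummy columns.
print(len(levels), rows[0])
```

A variable with hundreds of attribute values would produce hundreds of such columns, which is the problem binning tries to avoid.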

Besides converting to dummy variables, you can also bin the categorical variable to reduce the number of attribute values.

Binning algorithm

A categorical variable has no natural ordering of its values, so, unlike a continuous variable, there is no size relationship between adjacent bins to preserve. Before binning, calculate the bad sample rate of each attribute of the categorical variable, sort the attributes by bad sample rate, and then merge adjacent attributes until the termination condition is reached.

The algorithm is as follows:

(1) Calculate the total number of samples, number of good samples, number of bad samples, sample proportion, and bad sample rate of each attribute of the categorical variable, and then sort the attributes by bad sample rate. At this point, each attribute forms its own group.

(2) Calculate the chi-square value of every pair of adjacent groups and merge the pair with the smallest chi-square value.

(3) Repeat step (2) until the number of groups is <= BinMax.

(4) Check whether each group contains both bad samples and good samples. If a group contains only bad samples or only good samples, merge it with the adjacent group that gives the smallest chi-square value.

(5) Repeat step (4) until every group contains both bad samples and good samples.

(6) Check whether the proportion of samples in each group is >= BinPcntMin. If the proportion of samples in a group is < BinPcntMin, merge it with the adjacent group that gives the smallest chi-square value.

(7) Repeat step (6) until every group accounts for >= BinPcntMin of the samples.
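The post does not spell out the chi-square statistic used in step (2). A common choice, assumed here, is the Pearson chi-square of the 2x2 table formed by two adjacent groups (rows) and the good/bad label (columns). The symbols below are notation introduced only for this sketch:

```latex
\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(A_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N}
```

Here A_ij is the observed number of samples of class j (good or bad) in group i, R_i is the total number of samples in group i, C_j is the total number of samples of class j over both groups, and N = R_1 + R_2. A small value means the two groups have similar bad sample rates, so merging them loses little information.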

Code section:

2. Calculate the total samples, good samples, bad samples, and bad sample rate.
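The original code for this step is not shown in the post. As a rough sketch only, the statistics could be computed as follows, assuming labels are encoded as 1 = bad sample and 0 = good sample. The function name `attribute_stats` and the dictionary keys are hypothetical:

```python
from collections import Counter


def attribute_stats(values, labels):
    """Per-attribute counts, sorted by bad sample rate (ascending).

    Assumes labels are 1 for a bad sample and 0 for a good sample.
    """
    total = Counter(values)
    bad = Counter(v for v, y in zip(values, labels) if y == 1)
    n = len(values)
    stats = []
    for attr, cnt in total.items():
        b = bad[attr]
        stats.append({
            "attr": attr,
            "total": cnt,
            "good": cnt - b,
            "bad": b,
            "pcnt": cnt / n,       # share of all samples in this attribute
            "bad_rate": b / cnt,   # bad samples / total samples
        })
    stats.sort(key=lambda s: s["bad_rate"])
    return stats


# Hypothetical sample data.
values = ["a", "a", "b", "b", "b", "c"]
labels = [0, 1, 1, 1, 0, 0]
stats = attribute_stats(values, labels)
for row in stats:
    print(row)
```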

3. Calculate the chi-square value.
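The original function is not shown in the post. The sketch below assumes the Pearson chi-square statistic on the 2x2 table of two adjacent groups by good/bad counts. The name `chi2_pair` and its signature are hypothetical:

```python
def chi2_pair(good1, bad1, good2, bad2):
    """Pearson chi-square for the 2x2 table of two adjacent groups."""
    n = good1 + bad1 + good2 + bad2
    if n == 0:
        return 0.0
    col_good, col_bad = good1 + good2, bad1 + bad2
    chi2 = 0.0
    for good, bad in ((good1, bad1), (good2, bad2)):
        row = good + bad
        for observed, col in ((good, col_good), (bad, col_bad)):
            expected = row * col / n
            if expected > 0:  # skip empty cells to avoid division by zero
                chi2 += (observed - expected) ** 2 / expected
    return chi2


same = chi2_pair(10, 10, 10, 10)      # identical bad rates
opposite = chi2_pair(10, 0, 0, 10)    # completely different bad rates
print(same, opposite)
```

Groups with identical bad sample rates give a value of 0, so they are the first candidates for merging.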

4. The following is the single-variable binning function. It calls the two preceding functions and returns the binning result for a single variable.

Following the algorithm above, the binning function has three parts: merging two adjacent bins, checking whether each bin contains both good and bad samples, and checking whether the proportion of each bin is greater than or equal to BinPcntMin. In the function, spe_attri is a special attribute value that is kept as a separate group.
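Since the original function is not shown, the following is only a sketch of how the three parts might fit together. It makes several assumptions: labels are 1 = bad and 0 = good, the chi-square is the Pearson statistic on a 2x2 table, a failing group is merged with whichever neighbor gives the smaller chi-square value, and proportions are taken over all samples. Every name except `spe_attri` is hypothetical, and the parameters `bin_max` and `bin_pcnt_min` stand in for BinMax and BinPcntMin:

```python
def chi2_pair(g1, g2):
    """Pearson chi-square of two groups, each a dict with 'good'/'bad' counts."""
    n = g1["good"] + g1["bad"] + g2["good"] + g2["bad"]
    cols = {k: g1[k] + g2[k] for k in ("good", "bad")}
    chi2 = 0.0
    for g in (g1, g2):
        row = g["good"] + g["bad"]
        for k in ("good", "bad"):
            expected = row * cols[k] / n
            if expected > 0:
                chi2 += (g[k] - expected) ** 2 / expected
    return chi2


def merge_at(groups, i):
    """Merge groups[i + 1] into groups[i] in place."""
    a, b = groups[i], groups.pop(i + 1)
    a["attrs"] += b["attrs"]
    a["good"] += b["good"]
    a["bad"] += b["bad"]


def merge_failing(groups, fails):
    """Merge each failing group with its closest neighbor until none fail."""
    while len(groups) > 1:
        failing = [i for i, g in enumerate(groups) if fails(g)]
        if not failing:
            break
        i = failing[0]
        pairs = [j for j in (i - 1, i) if 0 <= j < len(groups) - 1]
        merge_at(groups, min(pairs, key=lambda j: chi2_pair(groups[j], groups[j + 1])))


def bin_categorical(values, labels, bin_max=5, bin_pcnt_min=0.05, spe_attri=()):
    # Step 1: one group per attribute, sorted by bad sample rate.
    counts = {}
    for v, y in zip(values, labels):
        if v in spe_attri:
            continue  # special attributes are kept out of the merging
        g = counts.setdefault(v, {"attrs": [v], "good": 0, "bad": 0})
        g["bad" if y == 1 else "good"] += 1
    groups = sorted(counts.values(), key=lambda g: g["bad"] / (g["good"] + g["bad"]))
    # Steps 2-3: merge the adjacent pair with the smallest chi-square value.
    while len(groups) > max(bin_max, 1):
        i = min(range(len(groups) - 1),
                key=lambda j: chi2_pair(groups[j], groups[j + 1]))
        merge_at(groups, i)
    # Steps 4-5: every group must contain both good and bad samples.
    merge_failing(groups, lambda g: g["good"] == 0 or g["bad"] == 0)
    # Steps 6-7: every group must hold at least bin_pcnt_min of all samples.
    n = len(values)
    merge_failing(groups, lambda g: (g["good"] + g["bad"]) / n < bin_pcnt_min)
    present = set(values)
    return [g["attrs"] for g in groups] + [[a] for a in spe_attri if a in present]


# Hypothetical data: (attribute, total samples, bad samples).
spec = [("A", 10, 1), ("B", 10, 1), ("C", 10, 5), ("D", 10, 9), ("E", 2, 2)]
values, labels = [], []
for attr, total, bad in spec:
    values += [attr] * total
    labels += [1] * bad + [0] * (total - bad)

bins = bin_categorical(values, labels, bin_max=3)
print(bins)
```

In this example A and B have the same bad sample rate and merge first, and the small all-bad attribute E is absorbed into D.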

That's all, thanks!

user_3427849 Created Jul 27, 2019 09:28:48: very detail, thank you for your share.
JJ_G Created Jul 27, 2019 09:32:45: Thanks
Zebra Created May 29, 2021 09:13:24: Great share
olive.zhao Admin Created Nov 17, 2021 08:54:06: Thanks for your sharing!
user_4358465 Created Nov 21, 2021 14:19:38: Excellent...a good summary
