
Categorical variable - chi-square binning

Latest reply: Nov 21, 2021 14:19:38

Hello, everyone!

This post covers chi-square binning of categorical variables. Please find more details as you read further down.


When a categorical variable is encountered in modeling, it is often converted to dummy variables for processing.


However, if a categorical variable has too many attribute values, too many dummy variables are generated and the dimensionality of the data increases. In many cases, only some of the dummy variables enter the model, so part of the information carried by the categorical variable may be lost.
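To make the dimension growth concrete, here is a minimal sketch in plain Python. The `one_hot` helper and the `cities` sample are hypothetical and only for illustration; they are not from the original post.

```python
def one_hot(values):
    """Return the sorted distinct values and one 0/1 column per distinct value."""
    levels = sorted(set(values))
    matrix = [[1 if v == lv else 0 for lv in levels] for v in values]
    return levels, matrix

# Hypothetical sample: one categorical column with four distinct values.
cities = ["Beijing", "Shanghai", "Shenzhen", "Beijing", "Chengdu"]
levels, matrix = one_hot(cities)

# One input column becomes len(levels) columns; a variable with hundreds
# of distinct values would add hundreds of columns in the same way.
print(len(levels), "dummy columns:", levels)
```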


Besides converting to dummy variables, you can also bin the categorical variable, that is, merge its attribute values into groups to reduce the number of distinct values.


Binning algorithm


A categorical variable has no inherent ordering, so unlike a continuous variable there is no need to preserve an order between adjacent bins. Before binning, calculate the bad sample rate of each attribute value of the categorical variable, sort the attribute values by bad sample rate, and then merge adjacent attribute values until the termination condition is reached.


The algorithm is as follows:


(1) For each attribute value of the categorical variable, calculate the total number of samples, the number of good samples, the number of bad samples, the sample proportion, and the bad sample rate, and then sort the attribute values by bad sample rate. At this point, each attribute value forms its own group.


(2) Calculate the chi-square value of every pair of adjacent groups and merge the pair with the smallest chi-square value.


(3) Repeat step (2) until the number of groups is <=BinMax.


(4) Check whether each group contains both bad samples and good samples. If a group contains only bad samples or only good samples, merge it with the adjacent group that gives the smallest chi-square value.


(5) Repeat step (4) until each group contains bad samples and good samples at the same time.


(6) Check whether the proportion of samples in each group is >=BinPcntMin. If the proportion of samples in a group is <BinPcntMin, merge it with the adjacent group that gives the smallest chi-square value.


(7) Repeat step (6) until the proportion of samples in every group is >=BinPcntMin.
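The post does not spell out how the chi-square value in step (2) is computed. A common choice, assumed here, is the Pearson chi-square statistic of the 2x2 table formed by two adjacent groups (rows) and the good/bad label (columns). The symbols below are introduced for illustration only.

```latex
\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(A_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N}
```

Here A_ij is the observed number of samples of class j (good or bad) in group i, R_i is the total number of samples in group i, C_j is the total number of samples of class j across both groups, and N = R_1 + R_2. A small value means the two groups have similar bad sample rates, which is why the pair with the smallest value is merged first.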


Code section:


1. Loading.


[Code screenshot: image.png]


2. Calculate the total, good, and bad sample counts and the bad sample rate.


[Code screenshot: image.png]
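The original code for this step was posted as an image. As a stand-in, the following is a minimal sketch in plain Python, not the author's function. The name `level_stats`, the dictionary keys, and the convention that label 1 means a bad sample are all assumptions.

```python
from collections import Counter

def level_stats(values, labels):
    """Per-attribute-value counts for one categorical variable.

    Assumes labels use 1 for a bad sample and 0 for a good sample.
    """
    total = Counter(values)
    bad = Counter(v for v, y in zip(values, labels) if y == 1)
    n = len(values)
    rows = []
    for level, cnt in total.items():
        b = bad[level]  # Counter returns 0 for values with no bad samples
        rows.append({
            "level": level,
            "total": cnt,
            "good": cnt - b,
            "bad": b,
            "pcnt": cnt / n,       # share of all samples
            "bad_rate": b / cnt,   # bad samples / samples with this value
        })
    # Sort by bad sample rate so values with similar risk become adjacent.
    rows.sort(key=lambda r: r["bad_rate"])
    return rows

# Hypothetical toy data.
values = ["A", "A", "B", "B", "B", "C", "C", "C", "C", "D"]
labels = [0, 1, 0, 0, 1, 0, 0, 0, 1, 1]
stats = level_stats(values, labels)
for row in stats:
    print(row)
```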


3. Calculate the chi-square value.


[Code screenshot: image.png]
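The original function for this step was also posted as an image. The sketch below assumes the standard Pearson chi-square statistic on the 2x2 table of two adjacent groups against the good/bad label; the name `chi2_pair` and the dictionary layout are hypothetical.

```python
def chi2_pair(g1, g2):
    """Pearson chi-square for two groups, each a dict with 'good' and 'bad' counts."""
    observed = [[g1["good"], g1["bad"]], [g2["good"], g2["bad"]]]
    row_tot = [sum(r) for r in observed]
    col_tot = [observed[0][j] + observed[1][j] for j in range(2)]
    n = sum(row_tot)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:  # skip cells whose expected count is zero
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical counts: bad rates of 25% and 50%.
a = {"good": 30, "bad": 10}
b = {"good": 20, "bad": 20}
print(chi2_pair(a, b))  # larger value = more different bad rates
print(chi2_pair(a, a))  # identical groups give 0.0
```

Skipping cells with a zero expected count is a design choice for this sketch: it keeps the function defined when both groups contain only good or only bad samples.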


4. The following is the single-variable binning function. It calls the two preceding functions and returns the binning result for one variable.


Following the algorithm above, the binning function has three parts: merging adjacent bins, checking that each bin contains both good and bad samples, and checking that the sample proportion of each bin is greater than or equal to BinPcntMin. In the code, spe_attri is a special attribute value that is kept as a separate group.


[Code screenshots: four images, image.png]
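Since the binning function itself was posted only as images, here is a compact sketch of steps (1) to (7) in plain Python. It is an illustration under stated assumptions, not the author's code: all function names are hypothetical, label 1 is assumed to mean a bad sample, `bin_max` (at least 1) and `bin_pcnt_min` stand in for BinMax and BinPcntMin, and the spe_attri handling is left out.

```python
from collections import Counter

def pair_chi2(groups, j):
    """Pearson chi-square between groups[j] and groups[j + 1]."""
    a = (groups[j]["good"], groups[j]["bad"])
    b = (groups[j + 1]["good"], groups[j + 1]["bad"])
    n = sum(a) + sum(b)
    chi2 = 0.0
    for row in (a, b):
        for k in range(2):
            expected = sum(row) * (a[k] + b[k]) / n
            if expected > 0:
                chi2 += (row[k] - expected) ** 2 / expected
    return chi2

def merge(groups, j):
    """Merge groups[j] and groups[j + 1] in place."""
    left, right = groups[j], groups.pop(j + 1)
    groups[j] = {"levels": left["levels"] + right["levels"],
                 "good": left["good"] + right["good"],
                 "bad": left["bad"] + right["bad"]}

def merge_with_closest(groups, i):
    """Merge group i with whichever neighbor gives the smaller chi-square."""
    candidates = [j for j in (i - 1, i) if 0 <= j < len(groups) - 1]
    merge(groups, min(candidates, key=lambda j: pair_chi2(groups, j)))

def chi_merge_categorical(values, labels, bin_max=5, bin_pcnt_min=0.05):
    # Step (1): one group per attribute value, sorted by bad sample rate.
    total = Counter(values)
    bad = Counter(v for v, y in zip(values, labels) if y == 1)
    groups = [{"levels": [lv], "good": total[lv] - bad[lv], "bad": bad[lv]}
              for lv in total]
    groups.sort(key=lambda g: g["bad"] / (g["good"] + g["bad"]))
    n = len(values)

    # Steps (2)-(3): merge the closest adjacent pair until <= bin_max groups.
    while len(groups) > bin_max:
        merge(groups, min(range(len(groups) - 1),
                          key=lambda j: pair_chi2(groups, j)))

    # Steps (4)-(5): no group may contain only good or only bad samples.
    while len(groups) > 1:
        pure = [i for i, g in enumerate(groups)
                if g["good"] == 0 or g["bad"] == 0]
        if not pure:
            break
        merge_with_closest(groups, pure[0])

    # Steps (6)-(7): every group must hold at least bin_pcnt_min of samples.
    while len(groups) > 1:
        small = [i for i, g in enumerate(groups)
                 if (g["good"] + g["bad"]) / n < bin_pcnt_min]
        if not small:
            break
        merge_with_closest(groups, small[0])
    return groups

# Hypothetical toy data: (good, bad) counts per attribute value.
data = {"A": (9, 1), "B": (8, 2), "C": (5, 5), "D": (4, 6), "E": (0, 10)}
values, labels = [], []
for lv, (g, b) in data.items():
    values += [lv] * (g + b)
    labels += [0] * g + [1] * b

bins = chi_merge_categorical(values, labels, bin_max=3, bin_pcnt_min=0.1)
for g in bins:
    print(g)
```

On this toy data the first loop merges C with D and then A with B; the second loop folds the all-bad value E into its only neighbor, leaving two groups.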

That's all, thanks!


user_3427849
Created Jul 27, 2019 09:28:48

Very detailed, thank you for sharing.

JJ_G
Created Jul 27, 2019 09:32:45

Thanks

olive.zhao
Admin Created Jul 29, 2019 06:46:14

[Image]

Zebra
Created May 29, 2021 09:13:24

Great share

olive.zhao
Admin Created Nov 17, 2021 08:54:06

Thanks for your sharing!

user_4358465
Created Nov 21, 2021 14:19:38

Excellent...a good summary