
[Technical Deep Dive] Low-level details of DAYU data preparation: implementing a simple asynchronous request

Latest reply: Mar 30, 2021 02:17:26

Hello, everyone!

Today I'd like to introduce a low-level detail of data preparation in DAYU.


 The ETL operator developed by DAYU Data includes a data preparation function. Data preparation fetches a sample of the data (100 rows by default) from the data source and displays it for the user to work with. The user can apply a series of instructions to process the data, and the ETL operator then loads the processed data to the target. The following figure shows the data preparation page:

[Figure: data preparation page]

        Fetching sample data from a large DLI table or a large OBS file can time out. We therefore set out to design an asynchronous way of fetching the data.

        The asynchronous solution has to do two things: (1) fetch the data asynchronously to prevent timeouts; (2) cache the fetched raw data in the background, so that when the user manipulates the sample data on the page, the cache is used directly instead of fetching the sample data again.

        The final implementation uses an in-memory Map plus a thread pool, which avoids the extra workload of bringing in a database. The core logic is as follows:

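if (execute) {
    // Cache key for this user's sampling session.
    String uuid = getuuid(user);
    // Look up sample rows that an earlier request may already have fetched.
    convert2RowList = getTempRowData(uuid, dataId);
    if (convert2RowList == null) {
        // Cache miss: mark the entry as pending and fetch in the background.
        convert2RowList = Waitting;
        tempSamplingData.put(uuid, new RowListAge(Waitting));
        executorService.submit(() -> {
            try {
                // Read the sample rows (100 by default) from the OBS file.
                List<Row> rowList = getObsFileContent(tokenInfo, user, charsetName, 100);
                tempSamplingData.put(uuid, new RowListAge(rowList, new ArrayList<>(), dataId));
            } catch (Exception e) {
                LOGGER.error("get obs file error", e);
                // Keep the exception so that a later poll can rethrow it.
                tempSamplingData.put(uuid, new RowListAge(Error, e));
            }
        });
    }
    if (Error == convert2RowList) {
        // The background fetch failed: clear the entry and surface the error.
        RowListAge error = tempSamplingData.remove(uuid);
        if (error != null && error.e != null) {
            throw error.e;
        } else {
            throw new IdeException(ErrorCode.OBS_RESOURCE_INVALID_OBSPATH);
        }
    }
}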
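
        For readers who want to try the idea outside DAYU, here is a minimal, self-contained sketch of the same Map + thread pool pattern. It is not the DAYU code: SampleCache, Entry, State and the loader parameter are hypothetical names made up for illustration, rows are plain strings, and two details are my own assumptions rather than anything the post states. It uses putIfAbsent so that concurrent requests for one key schedule only one fetch, and it returns the error entry to the caller instead of throwing.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;

/** Hypothetical stand-in for the Map + thread pool logic; not DAYU code. */
final class SampleCache {
    enum State { WAITING, READY, ERROR }

    /** One cache slot: a state plus either the rows or the captured exception. */
    static final class Entry {
        final State state;
        final List<String> rows;   // non-null only when state == READY
        final Exception error;     // non-null only when state == ERROR

        Entry(State state, List<String> rows, Exception error) {
            this.state = state;
            this.rows = rows;
            this.error = error;
        }
    }

    private final ConcurrentMap<String, Entry> cache = new ConcurrentHashMap<>();
    private final Executor executor;

    SampleCache(Executor executor) {
        this.executor = executor;
    }

    /** Returns the cached entry, or schedules a fetch and returns WAITING. */
    Entry execute(String key, Callable<List<String>> loader) {
        Entry waiting = new Entry(State.WAITING, null, null);
        // putIfAbsent is atomic, so only one caller per key schedules the fetch.
        Entry existing = cache.putIfAbsent(key, waiting);
        if (existing == null) {
            executor.execute(() -> {
                try {
                    cache.put(key, new Entry(State.READY, loader.call(), null));
                } catch (Exception e) {
                    // Store the failure so that the next poll can report it.
                    cache.put(key, new Entry(State.ERROR, null, e));
                }
            });
            return waiting;
        }
        if (existing.state == State.ERROR) {
            // Drop the failed entry so that a later request can retry.
            cache.remove(key, existing);
        }
        return existing;
    }
}
```

        In a service you would pass a real pool, for example new SampleCache(Executors.newFixedThreadPool(4)); passing Runnable::run runs the fetch inline, which is handy in tests. The sketch never evicts READY entries, so a real cache would also need some expiry policy.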

        (1) The frontend sends the execute request, and the backend looks up the data in the Map. If the data is not there, the backend submits a data-fetching task to the thread pool and returns a waiting marker to the frontend.
        (2) When the data-fetching task finishes, it puts the raw data it fetched into the Map.
        (3) If an exception is caught, the exception is put into the Map instead.
        (4) The frontend keeps polling with the same request until data is returned, an exception is thrown, or the whole request has taken more than 2 minutes.
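
        The polling in step (4) can be sketched as a loop with a deadline. The post does not show the real client code, so everything here is an assumption: Poller and pollUntilReady are hypothetical names, the sketch is written in Java only to match the code in this post, and an empty Optional stands in for the waiting marker.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical polling helper; the post does not show the real client code. */
final class Poller {
    private Poller() { }

    /**
     * Calls attempt repeatedly until it yields a value or the timeout elapses.
     * An empty Optional means "still waiting".
     */
    static <T> T pollUntilReady(Supplier<Optional<T>> attempt, Duration timeout, Duration interval)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            // An exception thrown by the attempt propagates and ends the polling.
            Optional<T> result = attempt.get();
            if (result.isPresent()) {
                return result.get();
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("no data within " + timeout);
            }
            Thread.sleep(interval.toMillis());
        }
    }
}
```

        With the 2-minute limit from the post, a call would look like Poller.pollUntilReady(request, Duration.ofMinutes(2), Duration.ofSeconds(1)), where request is whatever issues the execute call; the one-second interval is an arbitrary choice, not a value from the post.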

