Tag: Java

  • Spark Join Optimisation

    Last time, we discussed how to find the bottleneck from the Spark UI, but in practice many performance issues in our daily Spark jobs are related to data skew. There are many situations where data skew occurs: The first problem is actually relatively easy to solve. A common and straightforward way is to repartition directly. The…

    Continue Reading →

  • Yarn Capacity Scheduler Introduction

    Yarn Resource Allocation Overview Yarn’s three classic schedulers, FIFO (which is rarely used), Fair Scheduler, and the topic of today’s discussion, Yarn’s Capacity Scheduler, are familiar to many. The official documentation provides a detailed explanation of the configuration parameters for the Capacity Scheduler. Today, I will use a practical example to help you understand the…

    Continue Reading →

  • Zookeeper Election Mechanism Introduction

    What’s Zookeeper? Zookeeper is a distributed system. In most cases, Zookeeper acts as a coordinator rather than as storage or a message queue. For instance, let’s say we want to inform other systems after we have finished some work. In this scenario, we can create a path as a signal on Zookeeper, and systems which want to…

    Continue Reading →

  • Spark Executor Memory Introduction

    Overview Hello everyone, today I would like to introduce a problem that we often encounter in our work: “Container killed by YARN for exceeding memory limits”. What causes this problem? What is the difference between this problem and OOM? What is the relationship between this problem and the memory structure of Spark Executor? Today, let’s…

    Continue Reading →

  • What’s virtual thread in JDK 19?

    Recently, JDK 19 was released, which introduced several new features. One of the more notable features is the addition of virtual threads. Many people may be confused about what virtual threads are and how they differ from the platform threads we currently use. To understand virtual threads in JDK 19, we first need to understand…

    Continue Reading →

  • Spark Basic Tuning Guide

    How does Spark work? Nowadays Python is becoming more and more popular among data scientists, and PySpark has rightfully become one of the most popular tools among them. Today let’s use a very simple PySpark sample to help us understand more about Spark. Architecture First of all, Spark architecture is based on the most typical…

    Continue Reading →

  • Explanation of the Principle of OAuth 2.0

    What’s OAuth? OAuth is a protocol designed to protect users who want to share their data with third-party applications or systems. OAuth 2.0 is the successor to OAuth 1.0, and it is the version mostly used nowadays. Sample Let’s say we’re going to log in to YouTube using a Google account Terms In…

    Continue Reading →

  • Domain Driven Design

    What’s DDD? DDD (Domain Driven Design) is just a concept, an abstract guideline or direction to help us reduce the complexity of our own application. But because it’s a concept, it has confused many developers. We need to implement the concept by ourselves without any restrictions, which means you don’t have a tutorial sample…

    Continue Reading →