Blog


  • Spark Join Optimisation

    Last time, we discussed how to find the bottleneck in the Spark UI, but in practice many performance issues in our daily Spark jobs are related to data skew. There are many situations where data skew occurs: The first problem is actually relatively easy to solve. A common and straightforward fix is to repartition the data directly. The…



  • Golang Basics – Closure 

    First of all, what is a closure? According to Wikipedia: In programming languages, a closure, also lexical closure or function closure, is a technique for implementing lexically scoped name binding in a language with first-class functions. Operationally, a closure is a record storing a function together with an environment. The environment is a mapping…



  • Yarn Capacity Scheduler Introduction

    Yarn Resource Allocation Overview: Yarn’s three classic schedulers, FIFO (which is rarely used), the Fair Scheduler, and the topic of today’s discussion, the Capacity Scheduler, are familiar to many. The official documentation provides a detailed explanation of the configuration parameters for the Capacity Scheduler. Today, I will use a practical example to help you understand the…
