Let's figure out the newWorkStealingPool method, which prepares an ExecutorService for us.

This thread pool is special. Its behavior is based on the idea of "stealing" work.

Tasks are queued and distributed among worker threads. If one worker is busy, another, idle worker can steal a task from its queue and execute it. This approach was introduced in Java to reduce contention in multi-threaded applications. It is built on top of the fork/join framework.

fork/join

In the fork/join framework, tasks are decomposed recursively, that is, they are broken down into subtasks. Then the subtasks are executed individually, and the results of the subtasks are combined to form the result of the original task.

The fork method starts a task asynchronously on some thread, and the join method lets you wait for this task to finish.
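As an illustration, here is a minimal sketch of that decomposition using RecursiveTask. The SumTask class, the input array and the threshold of 1,000 elements are just an example for this explanation, not code from the lesson.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums an array by recursively splitting it into subtasks
class SumTask extends RecursiveTask<Long> {
    private final long[] numbers;
    private final int from;
    private final int to;

    SumTask(long[] numbers, int from, int to) {
        this.numbers = numbers;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        // Small enough: compute directly
        if (to - from <= 1_000) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += numbers[i];
            }
            return sum;
        }
        // Otherwise split the range in half
        int middle = (from + to) / 2;
        SumTask left = new SumTask(numbers, from, middle);
        SumTask right = new SumTask(numbers, middle, to);
        left.fork();                        // start the left half asynchronously
        long rightResult = right.compute(); // process the right half on the current thread
        return left.join() + rightResult;   // wait for the left half and combine the results
    }
}

A ForkJoinPool can then run the whole computation with forkJoinPool.invoke(new SumTask(numbers, 0, numbers.length)).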

newWorkStealingPool

The newWorkStealingPool method comes in two overloaded versions:

public static ExecutorService newWorkStealingPool(int parallelism) {
    return new ForkJoinPool
        (parallelism,
         ForkJoinPool.defaultForkJoinWorkerThreadFactory,
         null, true);
}

public static ExecutorService newWorkStealingPool() {
    return new ForkJoinPool
        (Runtime.getRuntime().availableProcessors(),
         ForkJoinPool.defaultForkJoinWorkerThreadFactory,
         null, true);
}

The first thing to note is that under the hood we are not calling the ThreadPoolExecutor constructor. Here we are working with ForkJoinPool, which, like ThreadPoolExecutor, is an implementation of AbstractExecutorService.

We have two overloads to choose from. In the first, we specify the desired level of parallelism ourselves. If we do not pass this value, the parallelism of our pool will equal the number of processors available to the Java virtual machine.
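In code, the choice between the two looks like this (the parallelism value of 4 is arbitrary and only for illustration):

// Explicit parallelism level
ExecutorService explicitParallelism = Executors.newWorkStealingPool(4);

// Parallelism equal to Runtime.getRuntime().availableProcessors()
ExecutorService defaultParallelism = Executors.newWorkStealingPool();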

It remains to figure out how it works in practice:

Collection<Callable<Void>> tasks = new ArrayList<>();
ExecutorService executorService = Executors.newWorkStealingPool(10);

for (int i = 0; i < 10; i++) {
    int taskNumber = i;
    Callable<Void> callable = () -> {
        System.out.println("Processed user request #" + taskNumber + " on " + Thread.currentThread().getName() + " thread");
        return null;
    };
    tasks.add(callable);
}
executorService.invokeAll(tasks);

We create 10 tasks, each of which prints the thread it was executed on. After that, we launch all the tasks using the invokeAll method, which blocks until every task has completed.

Results when executing 10 tasks on 10 threads in the pool:

Processed user request #9 on ForkJoinPool-1-worker-10 thread
Processed user request #4 on ForkJoinPool-1-worker-5 thread
Processed user request #7 on ForkJoinPool-1-worker-8 thread
Processed user request #1 on ForkJoinPool-1-worker-2 thread
Processed user request #2 on ForkJoinPool-1-worker-3 thread
Processed user request #3 on ForkJoinPool-1-worker-4 thread
Processed user request #6 on ForkJoinPool-1-worker-7 thread
Processed user request #0 on ForkJoinPool-1-worker-1 thread
Processed user request #5 on ForkJoinPool-1-worker-6 thread
Processed user request #8 on ForkJoinPool-1-worker-9 thread

We see that after the queue is formed, the threads take tasks for execution. You can also check how 20 tasks will be distributed in a pool of 10 threads.
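Only the loop bound changes, the rest of the code stays the same:

for (int i = 0; i < 20; i++) {
    int taskNumber = i;
    Callable<Void> callable = () -> {
        System.out.println("Processed user request #" + taskNumber + " on " + Thread.currentThread().getName() + " thread");
        return null;
    };
    tasks.add(callable);
}

The resulting distribution: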

Processed user request #3 on ForkJoinPool-1-worker-4 thread
Processed user request #7 on ForkJoinPool-1-worker-8 thread
Processed user request #2 on ForkJoinPool-1-worker-3 thread
Processed user request #4 on ForkJoinPool-1-worker-5 thread
Processed user request #1 on ForkJoinPool-1-worker-2 thread
Processed user request #5 on ForkJoinPool-1-worker-6 thread
Processed user request #8 on ForkJoinPool-1-worker-9 thread
Processed user request #9 on ForkJoinPool-1-worker-10 thread
Processed user request #0 on ForkJoinPool-1-worker-1 thread
Processed user request #6 on ForkJoinPool-1-worker-7 thread
Processed user request #10 on ForkJoinPool-1-worker-9 thread
Processed user request #12 on ForkJoinPool-1-worker-1 thread
Processed user request #13 on ForkJoinPool-1-worker-8 thread
Processed user request #11 on ForkJoinPool-1-worker-6 thread
Processed user request #15 on ForkJoinPool-1-worker-8 thread
Processed user request #14 on ForkJoinPool-1-worker-1 thread
Processed user request #17 on ForkJoinPool-1-worker-6 thread
Processed user request #16 on ForkJoinPool-1-worker-7 thread
Processed user request #19 on ForkJoinPool-1-worker-6 thread
Processed user request #18 on ForkJoinPool-1-worker-1 thread

From the output, we can see that some threads manage to complete several tasks (ForkJoinPool-1-worker-6 completed 4 tasks), while some complete only one (ForkJoinPool-1-worker-2). If a 1-second delay is added to the implementation of the call method, the picture changes.

Callable<Void> callable = () -> {
    System.out.println("Processed user request #" + taskNumber + " on " + Thread.currentThread().getName() + " thread");
    TimeUnit.SECONDS.sleep(1);
    return null;
};

For the sake of experiment, let's run the same code on another machine. The resulting output:

Processed user request #2 on ForkJoinPool-1-worker-23 thread
Processed user request #7 on ForkJoinPool-1-worker-31 thread
Processed user request #4 on ForkJoinPool-1-worker-27 thread
Processed user request #5 on ForkJoinPool-1-worker-13 thread
Processed user request #0 on ForkJoinPool-1-worker-19 thread
Processed user request #8 on ForkJoinPool-1-worker-3 thread
Processed user request #9 on ForkJoinPool-1-worker-21 thread
Processed user request #6 on ForkJoinPool-1-worker-17 thread
Processed user request #3 on ForkJoinPool-1-worker-9 thread
Processed user request #1 on ForkJoinPool-1-worker-5 thread
Processed user request #12 on ForkJoinPool-1-worker-23 thread
Processed user request #15 on ForkJoinPool-1-worker-19 thread
Processed user request #14 on ForkJoinPool-1-worker-27 thread
Processed user request #11 on ForkJoinPool-1-worker-3 thread
Processed user request #13 on ForkJoinPool-1-worker-13 thread
Processed user request #10 on ForkJoinPool-1-worker-31 thread
Processed user request #18 on ForkJoinPool-1-worker-5 thread
Processed user request #16 on ForkJoinPool-1-worker-9 thread
Processed user request #17 on ForkJoinPool-1-worker-21 thread
Processed user request #19 on ForkJoinPool-1-worker-17 thread

What's notable about this output is that we asked for ten threads in the pool, yet the names of the worker threads don't run from one to ten; some of the numbers are higher than ten. Looking at the unique names, we see that there really are ten workers (3, 5, 9, 13, 17, 19, 21, 23, 27 and 31). It is quite reasonable to ask why this happened. Whenever you don't understand what is going on, use the debugger.

This is what we'll do. Let's cast the executorService object to a ForkJoinPool:

final ForkJoinPool forkJoinPool = (ForkJoinPool) executorService;

We will use the Evaluate Expression action to examine this object after the invokeAll method is called. To do this, add any statement after invokeAll, such as an empty System.out.println(), and set a breakpoint on it.
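Putting the pieces together, the setup might look like this (the trailing empty println exists only to give us a line to break on):

ExecutorService executorService = Executors.newWorkStealingPool(10);
final ForkJoinPool forkJoinPool = (ForkJoinPool) executorService;

executorService.invokeAll(tasks);
System.out.println(); // set a breakpoint here and inspect forkJoinPool with Evaluate Expression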

We can see that the pool has 10 threads, but the size of the array of worker threads is 32. Weird, but okay. Let's keep digging. When creating a pool, let's try to set the parallelism level to more than 32, say 40.

ExecutorService executorService = Executors.newWorkStealingPool(40);

In the debugger, let's look at the forkJoinPool object again.

Now the size of the array of worker threads is 128. We can assume that this is one of the JVM's internal optimizations. Let's try to find it in the code of the JDK (openjdk-14):
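What we find there is, roughly, the following sizing calculation. The snippet below is a paraphrase pulled out into a standalone method for readability, not a verbatim copy of the openjdk-14 source:

// Rounds the parallelism up to the next power of two and doubles it;
// this is (approximately) how the worker array is sized
static int workerArraySize(int parallelism) {
    int n = (parallelism > 1) ? parallelism - 1 : 1;
    // Smear the highest set bit to the right, so n becomes 2^k - 1
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    // Next power of two, times two
    return (n + 1) << 1;
}

For a parallelism of 10 this gives 32, and for 40 it gives 128, which matches what we saw in the debugger.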

Just as we suspected: the size of the array of worker threads is calculated by performing bitwise manipulations on the parallelism value. We don't need to try to figure out what exactly is happening here. It is enough to simply know that such an optimization exists.

Another interesting aspect of our example is the invokeAll method. It is worth noting that invokeAll returns a list of results (in our case, a List<Future<Void>>) in which we can find the result of each task.

var results = executorService.invokeAll(tasks);
for (Future<Void> result : results) {
    // Process the task's result, e.g. by calling result.get()
}

This special kind of service and thread pool can be used for workloads whose level of concurrency is not fixed in advance but rather implicit, for example when tasks spawn subtasks of their own.