Monday, October 3, 2022

Codechef Starters 57, Tree and Divisors

Introduction


In the TREEDIVS problem from the Codechef Starters 57 contest, you need to output a statistic for each node in the tree.

Each node v is associated with an integer value A_v.


The statistic to compute for a node is the number of divisors of the product of the values of all nodes within its subtree.

For v a node in the tree,


Statistics(v) = DivisorCount(TreeProduct(v))


TreeProduct(v) = Product_{u in Tree(v)} A_u


Counting divisors of a number requires its prime factorization.
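
If n = p_1^e_1 * p_2^e_2 * ... * p_k^e_k, then DivisorCount(n) = (e_1 + 1) * (e_2 + 1) * ... * (e_k + 1): each divisor is obtained by independently choosing an exponent between 0 and e_i for every prime p_i.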



While my logic was correct, my submission got Time Limit Exceeded on a few test cases.





In this article, we will go over multiple optimization strategies to bring the runtime within the time limit.


Refer to the successful submission from another community member, from which I got the hints for these optimizations. See also the Codechef editorial & the Small-to-Large merging USACO article.


The performance of 4 different versions, run against 2 different types of input, is:



Runtime got slightly worse on random input. On slim input, it went down drastically.


Setup


Input format


The input consists of

  1. N, the tree size
  2. A, the list of node values
  3. (u_i, v_i), the list of edges used to build the adjacency list


3
4
100 101 102 103
1 2
1 3
1 4
4
2 2 2 2
1 2
2 3
3 4
5
43 525 524 12 289
1 2
1 3
3 4
4 5


Expected output


For each test case, we output the statistic of each node:


192 2 8 2 
5 4 3 2
1080 12 60 18 3
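
For instance, in the first test case, TreeProduct(1) = 100 * 101 * 102 * 103 = 2^3 * 3 * 5^2 * 17 * 101 * 103, so Statistics(1) = 4 * 2 * 3 * 2 * 2 * 2 = 192, the first value on the first output line.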


Input Parsing


package codechef.starters57.TREEDIVS;

import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;

public class TreeAndDivisorsMain {
public static void main(String[] args) throws IOException {
TreeAndDivisorsFactory factory = TreeAndDivisors3::new;

//InputStream inputStream = System.in;
InputStream inputStream = new FileInputStream("TREEDIVS");
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
PrintWriter writer = new PrintWriter(new BufferedOutputStream(System.out));

String[] tokens;

tokens = bufferedReader.readLine().split(" ");
int T = Integer.parseInt(tokens[0]);
while (T > 0) {
tokens = bufferedReader.readLine().split(" ");
int N = Integer.parseInt(tokens[0]);
tokens = bufferedReader.readLine().split(" ");

int[] A = new int[N];
for (int i = 0; i < N; i++) {
A[i] = Integer.parseInt(tokens[i]);
}

TreeAndDivisors treeAndDivisors = factory.createTreeAndDivisors(N, A);
for (int i = 0; i < N-1; i++) {
tokens = bufferedReader.readLine().split(" ");
int u = Integer.parseInt(tokens[0]);
int v = Integer.parseInt(tokens[1]);
treeAndDivisors.addEdge(u-1, v-1);
}

int[] divisorCount = treeAndDivisors.divisors();
String divisorLine = AbstractTreeAndDivisors.listToString(divisorCount);
writer.println(divisorLine);

T--;
}

writer.close();
inputStream.close();
}
}


Load Test


We define an interface for the solution contract:


public interface TreeAndDivisors {
void addEdge(int u, int v);
int[] divisors();
}


We define a factory to decouple running a solution on input data from the implementation itself.


public interface TreeAndDivisorsFactory {
TreeAndDivisors createTreeAndDivisors(int n, int[] A);
}


We define a tree generator to compare the effect of the tree structure shape on runtime:


public interface TreeGenerator {
void generate(List<List<Integer>> adjacency);
}
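
These pieces can be combined into a simple timing harness. The sketch below is hypothetical (the timeSolution name, the fixed seed and the node value range are illustrative, not from the original code) and assumes the usual java.util imports:

static long timeSolution(TreeAndDivisorsFactory factory, TreeGenerator generator, int n) {
    // build the tree shape with the chosen generator
    List<List<Integer>> adjacency = new ArrayList<>();
    for (int i = 0; i < n; i++) {
        adjacency.add(new ArrayList<>());
    }
    generator.generate(adjacency);

    // assign arbitrary node values
    Random random = new Random(42);
    int[] A = new int[n];
    for (int i = 0; i < n; i++) {
        A[i] = 1 + random.nextInt(1_000_000);
    }

    // feed the edges to the solution under test and time the computation
    TreeAndDivisors treeAndDivisors = factory.createTreeAndDivisors(n, A);
    for (int parent = 0; parent < n; parent++) {
        for (int child : adjacency.get(parent)) {
            treeAndDivisors.addEdge(parent, child);
        }
    }
    long start = System.nanoTime();
    treeAndDivisors.divisors();
    return (System.nanoTime() - start) / 1_000_000; // elapsed milliseconds
}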


Unit Test


Here's a JUnit 5 parameterized unit test:


@ParameterizedTest
@MethodSource
public void testSolution(int[] A, int[][] edges, int[] expectedDivisors) {
TreeAndDivisorsFactory factory = TreeAndDivisors3::new;
TreeAndDivisors treeAndDivisors = factory.createTreeAndDivisors(A.length, A);
Arrays.stream(edges)
.forEach(edge -> treeAndDivisors.addEdge(edge[0], edge[1]));
Assertions.assertArrayEquals(expectedDivisors, treeAndDivisors.divisors());
}

static Object[][] testSolution() {
return new Object[][] {
new Object[] {
new int[] { 100, 101, 102, 103 },
new int[][] { { 0, 1 }, { 0, 2 }, { 0, 3 } },
new int[] { 192, 2, 8, 2 } }
};
}


Input Generation


We define 3 input generators, for a random distribution and 2 extreme cases.

1. With setRandomAdjacency, the node distribution is random. For each node i, 0 < i < N, its parent is selected at random in [0, ..., i-1].


2. With setSlimAdjacency, the tree degenerates into a list.




3. With setThickAdjacency, a single parent has all the other nodes as direct children.




static void setThickAdjacency(List<List<Integer>> adjacency) {
for (int i = 1; i < adjacency.size(); i++) {
adjacency.get(0).add(i);
}
}

static void setSlimAdjacency(List<List<Integer>> adjacency) {
for (int i = 1; i < adjacency.size(); i++) {
adjacency.get(i-1).add(i);
}
}

static void setRandomAdjacency(List<List<Integer>> adjacency) {
for (int i = 1; i < adjacency.size(); i++) {
int parent = random.nextInt(i);
adjacency.get(parent).add(i);
}
}



Analysis


Recursion enables computing the parent value from its children's values. A Depth First Search traversal from the root, processing each node after all of its children, can compute all node statistics. The traversal runs iteratively, with an explicit data structure instead of the call stack, to avoid a potential StackOverflowError.


Like in this other problem, the DFS traversal is abstracted away to focus only on the custom logic. This construct separates the traversal logic from the update logic, which lives in updateNode.

private void dfs() {
Traversal traversal = new AdjacencyListDFSTraversal(adjacency);
traversal.traverse(this::updateNode);
}

protected abstract void updateNode(int current, int parent, List<Integer> children);
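
The traversal implementation itself is not listed in this post. A minimal sketch of what it could look like, assuming the visitor signature matches updateNode, the root is node 0, and the usual java.util imports (names are illustrative):

interface Visitor {
    void visit(int current, int parent, List<Integer> children);
}

// Hypothetical sketch: an iterative DFS from the root records a parent-before-children order,
// then replays it in reverse so every node is visited after all of its children.
static void traverse(List<List<Integer>> adjacency, Visitor visitor) {
    int n = adjacency.size();
    int[] parent = new int[n];
    int[] order = new int[n];
    parent[0] = -1;
    Deque<Integer> stack = new ArrayDeque<>();
    stack.push(0);
    int index = 0;
    while (!stack.isEmpty()) {
        int node = stack.pop();
        order[index++] = node;
        for (int next : adjacency.get(node)) {
            if (next != parent[node]) {
                parent[next] = node;
                stack.push(next);
            }
        }
    }
    for (int i = n - 1; i >= 0; i--) { // bottom-up: children before parents
        int node = order[i];
        visitor.visit(node, parent[node], adjacency.get(node));
    }
}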

Data structure: Hash Map


When computing the product value of the current node, we need to keep track of the prime factorization of the product value of each child node.

Each prime factorization is stored as a Map<Integer,Integer> where

  • keys are prime numbers
  • values are associated exponents
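
The primeFactors and divisorCount helpers are used throughout the versions below. Minimal sketches of what they could look like (trial division, with any modular reduction of the count left out; the actual implementations are not listed in this post):

// Factorize a value into a prime -> exponent map by trial division.
static Map<Integer, Integer> primeFactors(int value) {
    Map<Integer, Integer> exponents = new HashMap<>();
    for (int p = 2; (long) p * p <= value; p++) {
        while (value % p == 0) {
            exponents.merge(p, 1, Integer::sum);
            value /= p;
        }
    }
    if (value > 1) {
        exponents.merge(value, 1, Integer::sum);
    }
    return exponents;
}

// Divisor count from the exponent map: the product of (exponent + 1) over all primes.
static int divisorCount(Map<Integer, Integer> exponents) {
    int count = 1;
    for (int exponent : exponents.values()) {
        count *= exponent + 1;
    }
    return count;
}

For example, primeFactors(100) yields {2=2, 5=2}, and divisorCount on that map returns (2+1) * (2+1) = 9.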


Version 1

Computing the parent hash map requires 1 merge operation per child.
private void updateNode(int current, int parent, List<Integer> children) {
Map<Integer, Integer> parentExponents = primeFactors(A[current]);

children.stream()
.filter(child -> !(child == parent))
.map(child -> primeExponents[child])
.forEach(childExponents -> mergeExponents(parentExponents, childExponents));

primeExponents[current] = parentExponents;
divisorCount[current] = divisorCount(parentExponents);
}
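
mergeExponents folds a child's exponent map into the parent's by summing exponents per prime; the stack trace below shows it goes through HashMap.merge. A minimal sketch, as the actual implementation is not listed here:

// Add each of the child's prime exponents into the parent's map.
static void mergeExponents(Map<Integer, Integer> parent, Map<Integer, Integer> child) {
    child.forEach((prime, exponent) -> parent.merge(prime, exponent, Integer::sum));
}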

In the slim case where the tree is a long list,

  • node n at the bottom requires 0 merges
  • node n-1 requires 1 merge
  • ...
  • node 1 at the top requires n-1 merges

Summing 0 + 1 + ... + (N-1) = N(N-1)/2, this results in O(N^2) total merges.

However we get an exponential trend in the runtime as N grows:



Then we start getting memory errors as we increase N. The error is thrown while growing the hash table, in HashMap.resize:


Caused by: java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.HashMap.resize(HashMap.java:702)
at java.base/java.util.HashMap.merge(HashMap.java:1363)
at codechef.starters57.TREEDIVS.AbstractTreeAndDivisors.mergeExponents(AbstractTreeAndDivisors.java:85)
at codechef.starters57.TREEDIVS.TreeAndDivisors1.lambda$updateNode$2(TreeAndDivisors1.java:19)
at codechef.starters57.TREEDIVS.TreeAndDivisors1$$Lambda$472/0x0000000800d54248.accept(Unknown Source)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at codechef.starters57.TREEDIVS.TreeAndDivisors1.updateNode(TreeAndDivisors1.java:19)
at codechef.starters57.TREEDIVS.AbstractTreeAndDivisors$$Lambda$466/0x0000000800d4efa0.visit(Unknown Source)
at graph.tree.traversal.AbstractDFSTraversal.traverse(AbstractDFSTraversal.java:63)
at graph.tree.traversal.AbstractDFSTraversal.traverse(AbstractDFSTraversal.java:36)
at codechef.starters57.TREEDIVS.AbstractTreeAndDivisors.dfs(AbstractTreeAndDivisors.java:78)
at codechef.starters57.TREEDIVS.AbstractTreeAndDivisors.divisors(AbstractTreeAndDivisors.java:67)
at codechef.starters57.TREEDIVS.TreeAndDivisorsTest.runInput(TreeAndDivisorsTest.java:89)
at codechef.starters57.TREEDIVS.TreeAndDivisorsTest.runBatch(TreeAndDivisorsTest.java:68)
at codechef.starters57.TREEDIVS.TreeAndDivisorsTest.run(TreeAndDivisorsTest.java:24)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)


The original approach had a memory leak: it kept every intermediate hash map alive, while only the current node's and its children's maps were required.

To reduce the memory footprint, we should null out the children's Map references once we no longer need them. This marks those objects as candidates for garbage collection.


@Override
protected void updateNode(int current, int parent, List<Integer> children) {
Map<Integer, Integer> parentExponents = primeFactors(A[current]);

children.stream()
.filter(child -> !(child == parent))
.forEach(child -> {
mergeExponents(parentExponents, primeExponents[child]);
primeExponents[child] = null;
});

primeExponents[current] = parentExponents;
divisorCount[current] = divisorCount(parentExponents);
}


We can now see the O(N^2) trend:



Versions 2 & 3


We can skip a large number of merges by reusing a child's map instead of building a new one. Selecting the child with the largest map is optimal, since it minimizes the number of entries left to merge in (the small-to-large idea). That same object is then assigned to the current node.

private void updateNode(int current, int parent, List<Integer> children) {
Map<Integer, Integer> valueExponents = primeFactors(A[current]);

Optional<Integer> maxOptional = children.stream()
.filter(child -> child != parent)
.max(Comparator.comparing(child -> primeExponents[child].size()));

Map<Integer, Integer> parentExponents;

if (maxOptional.isPresent()) {
int maxChild = maxOptional.get();

parentExponents = primeExponents[maxChild];
mergeExponents(parentExponents, valueExponents);
children.stream()
.filter(child -> !(child == parent || child == maxChild))
.map(child -> primeExponents[child])
.forEach(childExponents -> mergeExponents(parentExponents, childExponents));
} else {
parentExponents = valueExponents;
}

primeExponents[current] = parentExponents;
divisorCount[current] = divisorCount(parentExponents);
}



Version 4


Computing the result requires 2 operations:

  1. merge prime factor exponents
  2. compute divisor count

Instead of doing them separately, combining them into a single pass reduces the number of iterations over the hash map keys by 50%.

We switch from mergeExponents to mergeMultiplyExponents to perform both actions at once.
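
The mergeMultiplyExponents implementation is not listed here. A minimal sketch of the idea, ignoring the modular arithmetic the actual solution would need (with a modulus, dividing by (oldExponent + 1) would require a modular inverse):

// Merge `source` into `target` while incrementally maintaining the divisor count dc:
// for each prime touched, divide out the old factor (oldExponent + 1),
// then multiply in the new factor (newExponent + 1). Without a modulus the division is exact.
static int mergeMultiplyExponents(int dc, Map<Integer, Integer> target, Map<Integer, Integer> source) {
    for (Map.Entry<Integer, Integer> entry : source.entrySet()) {
        int prime = entry.getKey();
        int oldExponent = target.getOrDefault(prime, 0);
        int newExponent = oldExponent + entry.getValue();
        target.put(prime, newExponent);
        dc = dc / (oldExponent + 1) * (newExponent + 1);
    }
    return dc;
}

The updated updateNode then threads the running divisor count dc through these merges: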


private void updateNode(int current, int parent, List<Integer> children) {
Map<Integer, Integer> valueExponents = primeFactors(A[current]);

Optional<Integer> maxOptional = children.stream()
.filter(child -> child != parent)
.max(Comparator.comparing(child -> primeExponents[child].size()));

Map<Integer, Integer> parentExponents;
int dc;

if (maxOptional.isPresent()) {
int maxChild = maxOptional.get();

parentExponents = primeExponents[maxChild];
dc = divisorCount[maxChild];

dc = mergeMultiplyExponents(dc, parentExponents, valueExponents);

for (int child: children) {
if (child == parent || child == maxChild) {
continue;
}
dc = mergeMultiplyExponents(dc, parentExponents, primeExponents[child]);
}
} else {
parentExponents = valueExponents;
dc = divisorCount(valueExponents);
}

primeExponents[current] = parentExponents;
divisorCount[current] = dc;
}


Gradle Report


See this commit for the full application code.

We compare multiple versions

  1. Original
  2. Null out children maps
  3. Reuse max child Map
  4. 50% less Map scans

 against 2 inputs

  • random
  • slim

For slim input with N=4000 nodes, the runtimes were

  1. [27.915s] Original version
  2. [17.442s] Null out children maps
  3. [4.536s] Reuse max child Map
  4. [0.312s] Reuse Map scan for both merging and divisor count computation

For random input with N=30000 nodes, the runtimes were

  1. [3.562s] Original version
  2. [3.324s] Null out children maps
  3. [3.368s] Reuse max child Map
  4. [3.881s] Reuse Map scan for both merging and divisor count computation





Conclusion


As we merge hash tables bottom-up towards the root, the hash tables grow significantly. Iterating through their entries is the bottleneck of the execution. We should optimize towards reducing full hash table scans as much as possible.

      Wednesday, September 7, 2022

      Meta Hacker Cup 2022, Qualification Round, Problem D, Second Flight

      Introduction


      In Problem D, we need to return the max number of passengers who can travel from airport X to airport Y, using either

      1. a direct flight, available in the morning & the evening
      2. a 2-leg flight, going through 1 intermediate airport

      The graph contains N nodes and M edges. We need to answer Q queries overall.
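
      For a query (X, Y), the answer computed by the solutions below is 2 * c(X, Y) + Sum_{Z != X, Y} min(c(X, Z), c(Z, Y)), where c(a, b) is the direct-flight capacity (0 when there is no flight): the direct flight can be taken in both the morning and the evening, while each connection through an intermediate airport Z is limited by its smaller leg.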


      Interface


      The end goal is to implement the following interface within the problem constraints:


      interface SecondFlightService {
      void addFlight(int x, int y, int capacity);
      void setConnectionCapacity();
      long passengers(int x, int y);
      }

      • addFlight is called for each edge to add it
      • setConnectionCapacity is called once after all edges were added
      • passengers is called to process the query

      Input Parsing


      package hackercup2022;

      import java.io.BufferedOutputStream;
      import java.io.BufferedReader;
      import java.io.FileInputStream;
      import java.io.FileOutputStream;
      import java.io.InputStream;
      import java.io.InputStreamReader;
      import java.io.OutputStream;
      import java.io.PrintWriter;
      import java.util.ArrayList;
      import java.util.List;

      public class SecondFlight implements SecondFlightService {

      public static void main(String[] args) throws Exception {
      //InputStream inputStream = System.in;
      InputStream inputStream = new FileInputStream("second_flight_sample_input.txt");
      BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));

      //OutputStream outputStream = System.out;
      OutputStream outputStream = new FileOutputStream("second_flight_sample_output.txt");
      PrintWriter writer = new PrintWriter(new BufferedOutputStream(outputStream));

      String[] tokens;

      tokens = bufferedReader.readLine().split(" ");
      int T = Integer.parseInt(tokens[0]);
      for (int t = 1; t <= T; t++) {
      tokens = bufferedReader.readLine().split(" ");
      int N = Integer.parseInt(tokens[0]);
      int M = Integer.parseInt(tokens[1]);
      int Q = Integer.parseInt(tokens[2]);
      int[] A = new int[M];
      int[] B = new int[M];
      int[] C = new int[M];
      SecondFlightService secondFlight = new SecondFlight(N);
      for (int i = 0; i < M; i++) {
      tokens = bufferedReader.readLine().split(" ");
      A[i] = Integer.parseInt(tokens[0]);
      B[i] = Integer.parseInt(tokens[1]);
      C[i] = Integer.parseInt(tokens[2]);
      secondFlight.addFlight(A[i], B[i], C[i]);
      }

      secondFlight.setConnectionCapacity();

      writer.print(String.format("Case #%d:", t));
      for (int i = 0; i < Q; i++) {
      tokens = bufferedReader.readLine().split(" ");
      int x = Integer.parseInt(tokens[0]);
      int y = Integer.parseInt(tokens[1]);
      writer.print(" " + secondFlight.passengers(x, y));
      }
      writer.println();
      }

      writer.close();
      outputStream.close();
      inputStream.close();
      }
      }


      Solution with adjacency matrix


      We iterate over the nodes. We maintain two 2D arrays, capacity and connectionCapacity, holding the counts for direct and indirect flights respectively.


      public class SecondFlight {


      /**
      * number of airports
      */
      int N;

      public SecondFlight(int N) {
      this.N = N;
      capacity = new int[N+1][N+1];
      connectionCapacity = new int[N+1][N+1];
      }

      int[][] capacity;
      int[][] connectionCapacity;
      public void addFlight(int a, int b, int c) {
      capacity[a][b] = c;
      capacity[b][a] = c;
      }

      public void setConnectionCapacity() {
      for (int x = 1; x <= N; x++) {
      for (int y = 1; y <= N; y++) {
      if (capacity[x][y] == 0) {
      continue;
      }
      for (int z = 1; z <= N; z++) {
      if (capacity[y][z] == 0 || x == z) {
      continue;
      }

      connectionCapacity[x][z] += Math.min(capacity[x][y], capacity[y][z]);
      // the symmetric entry connectionCapacity[z][x] is filled when the loop reaches the (z, y, x) ordering
      }
      }
      }
      }

      public int passengers(int x, int y) {
      return 2 * capacity[x][y] + connectionCapacity[x][y];
      }
      }


      This approach runs in O(N^3) runtime complexity, O(N^2) space complexity.

      We cache all possible values. Each 2-leg flight [x, y, z] contributes to the count of the associated (x, z) pair.

      The matrix is symmetrical since we use an undirected graph. A flight going in either direction [x,y] or [y,x] has the same capacity.


      Caching all values


      We now switch from an adjacency matrix to an adjacency list.

      We use directed edges to represent the graph.


      static class Edge {
      int a, b, capacity;
      Edge(int a, int b, int capacity) {
      this.a = a;
      this.b = b;
      this.capacity = capacity;
      }
      }


      We iterate over the list of edges, List<Edge>, instead of the nodes.

      We look up the airports connected to a given airport in the List<List<Edge>> adjacency list.


      public class SecondFlight {


      /**
      * number of airports
      */
      int N;

      int[][] capacity;


      List<Edge> edges = new ArrayList<>();
      List<List<Edge>> adjacency = new ArrayList<>();

      public SecondFlight(int N) {
      this.N = N;
      capacity = new int[N+1][N+1];

      for (int i = 0; i <= N; i++) {
      adjacency.add(new ArrayList<>());
      }
      }

      public void addFlight(int a, int b, int c) {
      addEdge(new Edge(a, b, c));
      addEdge(new Edge(b, a, c));
      }

      private void addEdge(Edge edge) {
      edges.add(edge);
      adjacency.get(edge.a).add(edge);
      capacity[edge.a][edge.b] = 2*edge.capacity;
      }


      public void setConnectionCapacity() {
      edges.forEach(this::addSecondLeg);
      }

      private void addSecondLeg(Edge leg1) {
      for (Edge leg2 : adjacency.get(leg1.b)) {
      if (leg2.b != leg1.a) {
      capacity[leg1.a][leg2.b] += Math.min(leg1.capacity, leg2.capacity);
      }
      }
      }

      public int passengers(int x, int y) {
      return capacity[x][y];
      }
      }


      What's the new runtime complexity?

      For each edge leg1, we iterate over all the neighbors of the destination airport of the flight represented by leg1. If we don't end up with a round trip, we update the count associated with the considered 2-leg flight, [leg1.a, leg1.b, leg2.b].

      Each edge incoming to a node v can be used as leg1, each edge outgoing from v can be used as leg2, as long as it's not the reverse of leg1.

      Since the graph is undirected, we have

      deg_in(v) = deg_out(v) = deg(v)

      so the overall complexity is

      Sum_v(deg(v)^2) <= (Sum_v deg(v))^2 = (2M)^2 = O(M^2)

      To summarize,

      • runtime complexity is O(M^2)
      • space complexity is O(N^2)

      which won't work with the problem constraints

      1 <= M,N <= 200 000


      Computing the query


      Computing each query on demand, instead of precomputing all answers, does not require the 2D array:


      public class SecondFlight implements SecondFlightService {


      /**
      * number of airports
      */
      int N;

      static class Edge {
      int a, b, capacity;
      Edge(int a, int b, int capacity) {
      this.a = a;
      this.b = b;
      this.capacity = capacity;
      }

      @Override
      public String toString() {
      return String.format("(%d->%d) %d", a, b, capacity);
      }
      }
      List<Map<Integer, Edge>> adjacency = new ArrayList<>();

      public SecondFlight(int N) {
      this.N = N;
      for (int i = 0; i <= N; i++) {
      adjacency.add(new HashMap<>());
      }
      }

      public void addFlight(int a, int b, int c) {
      addEdge(new Edge(a, b, c));
      addEdge(new Edge(b, a, c));
      }

      private void addEdge(Edge edge) {
      adjacency.get(edge.a).put(edge.b, edge);
      }

      public void setConnectionCapacity() {
      }

      public long passengers(int x, int y) {
      long p = 0;
      for (Edge leg1: adjacency.get(x).values()) {
      if (leg1.b == y) {
      p += 2 * leg1.capacity;
      continue;
      }

      Edge leg2 = adjacency.get(leg1.b).get(y);
      if (leg2 == null) {
      continue;
      }

      p += Math.min(leg1.capacity, leg2.capacity);
      }
      return p;
      }
      }


      We switched from List<Edge> to Map<Integer, Edge> in the adjacency list, so the second leg towards y can be looked up in constant time.

      The complexity of answering one query is O(deg(x)), which is O(M) in the worst case.


      Optimizations


      Optimizations outlined in the editorial are:


      Caching query results


      We cache the results in a 2D map, Map<Integer, Map<Integer, Long>>.

      Selecting the node with smaller degree


      We compare adjacency list sizes to pick the start node with fewer neighbors.

      Solution


      The new version of the query logic is:

      Map<Integer, Map<Integer, Long>> results = new HashMap<>();

      public long passengers(int x, int y) {
      if (adjacency.get(x).size() > adjacency.get(y).size()) {
      return passengers(y, x);
      }
      // assert x has fewer neighbors than y

      return results.computeIfAbsent(x, x2 -> new HashMap<>())
      .computeIfAbsent(y, y2 -> computePassengers(x, y));
      }

      private long computePassengers(int x, int y) {
      long p = 0;
      for (Edge leg1: adjacency.get(x).values()) {
      if (leg1.b == y) {
      p += 2 * leg1.capacity;
      continue;
      }

      Edge leg2 = adjacency.get(leg1.b).get(y);
      if (leg2 == null) {
      continue;
      }

      p += Math.min(leg1.capacity, leg2.capacity);
      }
      return p;
      }

      Conclusion


      Simple approaches did not fit the problem constraints.

      We looked at doing offline computation. To answer a query in O(1), caching all values required O(N^2) space. Calculating them took O(M^2) runtime. Both sizes are impractical.

      For online computation, a single query initially ran in O(M), with an optimized data structure using a map in the adjacency list.

      It turned out that simple further optimizations, caching previous queries and picking the node with the smaller degree, did the trick, reducing the overall runtime from O(Q*M) down to O(Q^(1/2)*M).



      Thursday, May 12, 2022

      Trie Data Structure for string deduplication

      Introduction


      In this Leetcode weekly contest problem, we need to compute the number of unique subarrays of an array nums that match the following condition:

      • At most k elements in nums[i..j] are divisible by p, with 0 <= i <= j < n.

      Since the same subarray sequence can occur in multiple places in the array, how do we avoid counting it twice?


      Unit Test


      import org.junit.jupiter.api.Assertions;
      import org.junit.jupiter.api.Test;

      public class KDivisibleElementsSubarraysTest {
      KDivisibleElementsSubarrays KDivisibleElementsSubarrays = new KDivisibleElementsSubarrays();

      @Test
      public void solution() {
      int[] nums = new int[] { 2, 3, 3, 2, 2 };
      int k = 2;
      int p = 2;
      Assertions.assertEquals(11, KDivisibleElementsSubarrays.countDistinct2(nums, k, p));
      }
      }


      Brute-Force solution


      We can cache the subarrays in a Set<List<Integer>>. For each subarray, we need to go through the whole subarray to copy it into a list and then compute its hash. This results in O(N^3) complexity.


      import java.util.HashSet;
      import java.util.List;
      import java.util.Set;
      import java.util.stream.Collectors;
      import java.util.stream.IntStream;

      public class KDivisibleElementsSubarrays {

      public int countDistinct(int[] nums, int k, int p) {
      int n = 0;
      Set<List<Integer>> s = new HashSet<>();
      for (int i = 0; i < nums.length; i++) {
      for (int j = i; j < nums.length; j++) {
      List<Integer> l = subarray(nums, i, j);
      if (s.contains(l)) {
      continue;
      }

      if (divisible(l, k, p)) {
      n++;
      }

      s.add(l);
      }
      }
      return n;
      }

      private List<Integer> subarray(int[] nums, int i, int j) {
      return IntStream.range(i, j+1)
      .map(k -> nums[k])
      .boxed()
      .collect(Collectors.toList());
      }

      private boolean divisible(List<Integer> l, int k, int p) {
      return l.stream()
      .filter(i -> (i % p) == 0)
      .count() <= k;
      }
      }



      String matching algorithm


      Rabin-Karp algorithm


      The editorial suggests a rolling hash, from the Rabin-Karp string matching algorithm.


      For each subarray of a fixed size, we are able to recompute the hash code in just O(1) time. We perform the following steps to update the hash code (see the sketch after this list):

      • remove the leftmost character from the previous subarray
      • shift the middle characters that are in common
      • add the rightmost character from the next subarray
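
      A minimal sketch of such a rolling hash update, assuming a polynomial hash with an illustrative base and modulus (the name roll and its parameters are hypothetical):

      // Polynomial hash of a window of fixed size m:
      //   hash = v[i]*B^(m-1) + v[i+1]*B^(m-2) + ... + v[i+m-1]  (mod MOD)
      // Sliding the window one position to the right runs in O(1).
      static long roll(long hash, int outgoing, int incoming, long highestPower, long base, long mod) {
          hash = (hash - (long) outgoing * highestPower % mod + mod) % mod; // remove the leftmost element
          hash = hash * base % mod;                                         // shift the remaining elements
          hash = (hash + incoming) % mod;                                   // add the new rightmost element
          return hash;
      }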

      2 scenarios can happen after looking up the hash code:


      1. Favorable case: Hash code does not match with any other subarrays visited so far. It is necessary that hash codes match if subarrays are the same. So we can move on to the next subarray in O(1) time after assessing a True Negative.

      2. Adversarial case: Hash code matches an already cached subarray's hash code. But the subarrays may have different contents while their hashes match: it can be a False Positive, because the hash function is not injective. We need to compare the 2 subarrays element by element in O(N) time to make sure the content is the same.


      For more info, refer to CLRS chapter 32, String Matching, and section 32.2, The Rabin-Karp algorithm.


      String matching algorithms


      Given a String of size n and a Pattern of size m, with characters from an Alphabet of size |Σ|, we have the following runtime complexity table:


      Algorithm          | Preprocessing | Matching

      Brute-force        | 0             | O((n-m+1)*m)
      Rabin-Karp         | O(m)          | O((n-m+1)*m)
      Finite automaton   | O(m*|Σ|)      | O(n)
      Knuth-Morris-Pratt | O(m)          | O(n)



      Trie Data Structure


      To reduce the complexity from O(N^3) down to O(N^2), we leverage a trie data structure. It is a tree where each node holds a single character as its value, and has up to |Σ| children, one per character of the alphabet.


      Like for a suffix tree, to build the trie we sequentially insert each suffix [i..n-1], 0 <= i < n, into the trie, starting at the root node.

      In the trie diagram below, some nodes are marked with the beginning index of each suffix of the original string, "23322".



      Each subarray [i..j] is a prefix of the suffix [i..n-1]. Walking the trie decides whether the subarray spelled by the path from the trie root to the current node should be counted as new or marked as a duplicate.


      package leetcode.weekly291;

      import java.util.HashMap;
      import java.util.Map;

      public class KDivisibleElementsSubarrays {

      static class Trie {
      Map<Integer, Trie> children = new HashMap<>();
      }

      private final Trie root = new Trie();
      public int countDistinct(int[] nums, int k, int p) {
      int n = 0;
      for (int i = 0; i < nums.length; i++) {
      n += suffixSubarrays(nums, i, k, p);
      }
      return n;
      }

      private int suffixSubarrays(int[] nums, int start, int k, int p) {
      int n = 0;
      Trie current = root;
      int divisible = 0;
      for (int i = start; i < nums.length; i++) {
      if (nums[i] % p == 0) {
      divisible++;
      }
      if (divisible > k) {
      break;
      }

      Trie child;
      if (current.children.containsKey(nums[i])) {
      // subarray [start ... i] was already inserted before: it is a duplicate, reuse the existing node
      child = current.children.get(nums[i]);
      } else {
      if (divisible <= k) {
      n++;
      }
      child = new Trie();
      current.children.put(nums[i], child);
      }
      current = child;
      }
      return n;
      }
      }


      Conclusion


      The problem of finding distinct subarrays is an example of string matching. We can use a rolling hash to find potential matches. This works well for inputs with a small number of matches. Due to collision checking, the worst-case complexity becomes the same as the naive approach's.

      As an alternative, we can leverage a Trie data structure to deduplicate strings. To build it, we insert each suffix of the original string, matching the longest common prefix between the current suffix and the already inserted suffixes. This allows all subarrays starting at a given index to be processed at once, by processing just the suffix starting at that index.