Introduction
Problem C in Codeforces Educational Codeforces Round 125 describes a string processing process with following rules
- characters only include '(' and ')'
- find shortest prefix that is either
- a palindrome
- a regular sequence: it should be well-formed, like an arithmetic expression
- repeat parsing process
For a regular sequence, we could maintain a counter. We increment it on opening brackets and decrement on closing brackets. If it reaches 0 after being positive only it is a valid prefix matching a regular sequence. If it becomes negative before reaching 0, sequence is not regular.
For a palindrome, naive palindrome checking complexity is too high, O(N) for a general alphabet on a string of size N, as we need to compare N/2 characters.
We need to leverage the binary & shorted prefix properties of the problems.
Running a case disjunction lets us discover properties of the output.
Finite State-machine
We look at all the possibilities for the first few characters. We draw a decision tree to develop an intuition on parsing logics.
We can then assign a state to every possible pattern of minimal prefix being either a palindrome or regular. Here's the finite state machine version:
Conclusion
For string parsing problem, it helps to visualize the simple cases via a decision tree. We then expand it to get a more exhaustive picture to include most cases. We can leverage compiler theory / Finite State Machine to extract the possible states. For validation purpose we run a sample input against the FSM to see how the match progresses as input is parsed.
Annex
Dot file for decision tree
digraph G {
rankdir="LR"
Start -> O1;
Start -> C1;
Start -> E1;
O1 -> O2;
O1 -> C2;
C1 -> OStar;
OStar -> C3;
OStar -> E2;
Start [label="^"];
O1, O2 [label="("];
C1, C2, C3 [label=")"];
E1, E2 [label="$"]
OStar [label="(*"];
}~
Dot file for finite state machine
digraph FSM {
rankdir=LR;
Invisible -> Start;
Start -> S1[label="("];
S1 -> R[label=")"]
S1 -> P2[label="("]
R -> Start[style="dashed"]
P2 -> Start[style="dashed"]
Pn -> Start[style="dashed"]
Start -> P1[label=")"];
P1 -> P1[label="("]
P1 -> Pn[label=")"]
Start -> End[label="EOF"]
S1 -> End[label="EOF"]
P1 -> End[label="EOF"]
Invisible[style=invis];
Start[label="^"];
End[label="$", shape=doublecircle];
}
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.