The eighth post of the series of programming job interview challenge is out. 68 readers provided answers to job interview challenge #7 and most of them had the correct solution. As rvh mentioned, the trick here was to understand that the round table has a symmetric shape and: Actually this algorithm isn’t limited to just a round table. It will work with any shape that is symmetric with respect to both x and y axes. I couldn’t describe the correct answer better than Yoav Zobel, as he was the first one who also formally proved his algorithm:

The Correct answer (as provided by Yoav Zobel):

The algorithm is pretty simple:
1. Place the first coin right in the middle of the table.
2. For the next coins, put each coin in the direct opposite of the coin that your opponent placed (same radius on the opposite side).

You will always win.
I’ve thought of the formal proof (reductio ad absurdum):
Let’s say I followed the algorithm, and let’s assume I lost, since I didn’t have a place to put a coin. We know it’s not the first turn because in the first turn the table is empty.
Now look at the last coin placed by my opponent on the table (in the previous turn). The opposite place of this coin is surely not vacant (because if it has been vacant, I would have a place for my coin). Since it’s not vacant, there must be another coin blocking (even partialy) this space.
Now let’s look at this “blocking” coin. It could have been placed either by me or by my opponent.
1. If it was mine, then I’ve must have placed it after the opponent placed a coin in the opposite side. But we know that it’s not possible since the opponent wouldn’t have a place to put his last coin.
2. So it must have been my opponent’s coin. But that option is not possible as well, because if my opponent indeed put this coin, then right after that I would have put my coin in the opposite side. Again, this is not possible since the opponent wouldn’t have a place to put his last coin.
So we ruled out all of the options, so now we know that if I’ve followed the algorithm, there’s no way I’ll lose the game.

2 Readers wrote blog entries about last week’s challenge:

Those are the commentators who answered correctly:

wasabi, William Ellis, Rudi Angela, Caio Romão, Yoav Zobel, Mike, Jonathan Gilbert, Chris Jobson, Igor Ostrovsky, Tim Niblett, Abhijit, ER, raj, rvh, Christof Jans, jiji, Demie, u8ijk, David Avraamides, Trevor, Sajid Umerji, Loz, Pieter G, Cruise, BA, zack, herbie, xyzzy, Mike Harris, Jim Lebeau, Alex, Justin Henzie, Mark R, Leo, andrew, Mike Stok, Joost, Jamie Macey, Adrian Quark, Tom A, Rodrigo Marcuschi, Tom Ritchford, Anton Irinev, Dustin, Michael, Matt, Fahri Basegmez, Michael Mrozek, Binil Thomas, Hal, Ali Pang, Ben Kavanagh, Krystian Cybulski and Vivek.

As to the wrong solutions, due to the fact that there were a very small amount of them and I didn’t see the same mistake twice, I won’t get into details about them. But, if someone wants an explanation, please leave a comment and I will answer immediately.

This week challenge:

You are writing a software component that receives a binary record every 20 millisecond. Your component function as a pipe, meaning it can’t store the binary records, it should do whatever needed on the current record and than pass it to the next component in the pipeline. Your component goal is to alert whenever it identifies a specific expression (which is provided at the initialization process) in the stream of records – you are looking for a specific combination of binary records. If the incoming records assembles such a combination (with the other ones that came before it), alert about it (raise an event, for instance). Provide the most efficient algorithm to solve this problem, just remember the following issues:

  1. There is already a method to compare two binary records (O(1) complexity).
  2. You can’t store the incoming records (the next component in the pipe may change the data and you don’t want to deep copy every incoming record).
  3. You can do “preparations” at initialization time (computations, memory allocations, etc…).
  4. If something is not clear, please ask me for more details (as you would do in a real job interview).

Reminder: You can provide your answers either by leaving a comment or by posting it in your blog (better). Comment with answers will be approved only next week, because if readers will see the correct answer in one of the comments, it will become pointless.

Tags :

29 Responses to “A Programming Job Interview Challenge #8 – A Needle in a Haystack”


  1. Pieter G

    Said on June 16, 2008 :

    the fastest way i can come up with is to generate a finite state machine at initialisation.

    the transitions between states would be the defined by the records you look for in the pattern and one transistion for an unmatched record.

    when the machine enters the goal state is should send the notification (how to most quickly do that i leave to someone else).

    when reaching te goal state the machine should not terminate but continue (else we may miss a occurrence)

    example of generated FSM when looking for the pattern: ’1001′
    (x mean anything/default)
    (input not nessesarily restricted to ’1′ and ’0′)
    S[1] –> S1
    S[x] –> S

    S1[0] –> S2
    S1[X] –>S

    S2[1] –> G
    S2[x] –> S

    G (goal) send alert
    G[0] –> S1
    G[X] — > S

    state transition takes O(1) time, and we only have to compare the current result to the relevant possible state transitions. so we have a minimal number of comparisons as well.

  2. Dirk

    Said on June 16, 2008 :

    This is a fun example of “thinking outside of the box”, great for interviews indeed, I may use it one day :-)

    as for the answer: compare one record at a time and keep the position you’re at in the expression.
    If it matches, move your position forward; if it doesn’t match, start over at position zero.
    When you reach the end of the expression, it’s time for an alert…

    (note: you could keep multiple positions if multiple expressions needed to be identified)

  3. Mark R

    Said on June 16, 2008 :

    I’ve actually run into a similar situation in real life, and I used the Knuth-Morris-Pratt algorithm:
    http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

    For this question, you need to recognize that a “text string” can be made of arbitrary sized “characters” – all you need is an equality operator, which was given as part of the question.

  4. Josh Bjornson

    Said on June 16, 2008 :

    public class HaystackNeedleFinder {

    private final List invalidSequenceDefinition;
    private final int sequenceSize;
    private int sequencePosition;

    /**
    * Store the target sequence that should trigger alerts.
    *
    * @param invalidSequenceDefinition
    */
    public HaystackNeedleFinder(final List invalidSequenceDefinition) {
    this.invalidSequenceDefinition = Collections.unmodifiableList(invalidSequenceDefinition);
    this.sequencePosition = 0;
    this.sequenceSize = this.invalidSequenceDefinition.size();
    }

    /**
    * This method should be replaced with a call to the provided method that compares two
    * records and returns true when they match (point 1:
    * “There is already a method to compare two binary records (O(1) complexity)”);
    * this “dummy” implementation uses the Arrays.equals() method
    * for two byte arrays.
    *
    * @param record1
    * @param record2
    * @return true when the records are equal, false otherwise
    */
    public static final boolean areRecordsEqual(final byte[] record1, final byte[] record2) {
    return Arrays.equals(record1, record2);
    }

    /**
    * As we process a record, we keep track of how much of the invalid sequence we have matched so far (what position in the
    * invalid sequence definition we have matched up to now with this record). As soon as we get a record that doesn’t match
    * the sequence definition, we will start over again, trying to match incoming records from the beginning of the
    * invalid sequence definition.
    *
    * @throws InvalidRecordSequenceException when this record is the final record in a matching sequence, along with previous record(s)
    */
    public void processRecord(final byte[] record) throws InvalidRecordSequenceException {
    if (areRecordsEqual(record, invalidSequenceDefinition.get(sequencePosition))) {
    sequencePosition++;
    if (sequencePosition == sequenceSize) {
    // we have now matched the whole sequence
    sequencePosition = 0;
    throw new InvalidRecordSequenceException(invalidSequenceDefinition);
    }
    } else {
    // the records do not match; reset the matching sequence marker to the beginning
    sequencePosition = 0;
    }
    }
    }

    /**
    * Exception to signify that an invalid record sequence has been encountered.
    *
    * @author jbjornson
    */

    class InvalidRecordSequenceException extends Exception {

    private static final long serialVersionUID = 1L;

    private final List invalidSequenceDefinition;

    public InvalidRecordSequenceException(final List invalidSequenceDefinition) {
    super();
    this.invalidSequenceDefinition = Collections.unmodifiableList(invalidSequenceDefinition);
    }

    public List getInvalidSequenceDefinition() {
    return invalidSequenceDefinition;
    }
    }

  5. Tim C

    Said on June 16, 2008 :

    Sounds like you’d want to use a state machine.

  6. Timothy Fries

    Said on June 16, 2008 :

    At startup time, you’ll need to compile the matching expression into a simple state machine that can take an existing state and the current record as arguments, and return the resulting state. States may be one of Start, Complete, Error, and any number of intermediate states that represent a valid partial match in progress.

    A list is kept of all in-flight states. When the component starts up, this list is empty. When a record is received, for each state in this list, run the state machine with that state and the received record. If the result is not Error or Complete, store the resulting state in the list to be processed again when the next record is received. If the result is Complete, perform the designated alert.

    Additionally, for each incoming record, call the state machine with the record and the Start state; then handle the result as above.

  7. Tristan

    Said on June 16, 2008 :

    Ummm, I would probably create some sort of DFA/NFA/RegEx. (they are equivalent anyways)

    The set up would be done at startup, state traversal is computationaly trivial due to O(1) comparisons, and we just need to store state, but not the actual records.

  8. Jon von Gillern

    Said on June 16, 2008 :

    w00t, I knew CprE 210 would come in handy some day!

    http://www.vonsharp.net/ProgrammingJobInterviewChallenge8Answer.aspx

  9. Michael Mrozek

    Said on June 16, 2008 :

    Presumably the desired combination is provided at initialization as an array of binary blocks. Alternatively, it may be one large block, but presumably there is then a way to extract specific parts of that block; it amounts to the same thing. Keep an index of how much of the desired pattern has passed through the pipe. Start the index at 0. When a block comes in, compare it to the desired sequence block at array index 0 (if the desired sequence is one large block instead, extract 0 to the size of the incoming block and use that for comparison). If it matches, increment the search index by 1 (or by the size of the block if using a single search block). Keep checking input blocks against the current search block marked by the index. If an input block doesn’t match, reset the index to 0. If the index surpasses the total number of blocks in the search array (or the length of the one large search block), raise the alert.

  10. Edward Shen

    Said on June 16, 2008 :

    Multiple State Machines (or one well built one) should do the trick.

  11. Alex

    Said on June 16, 2008 :

    Given the super-generic nature of the question, it’s difficult to say if this would be a sufficient answer… But given that we’re looking for a specific combination of records, I would build a state machine around the expression. Simply consider every path to the “fail” state to be an edge back to the start state. You wouldn’t have to store the records in the nodes, simply the test that would have to run over the record.

    So let’s say my requirement was “3 single digit records in a row.” Set up an FSA with 4 nodes. Start, first,second,finish.
    -Get the first record. It’s a “4″. It passes the “single digit” test. update your state machine so that it’s at state “first”.
    -Get the next record. The record is “403BBX”. it fails the single digit test, go back to “start” node.
    -Get the next record, it’s a “3″. advance 1 node.
    -Get the next record, it’s a “6″. advance 1 node.
    -Get the next record, it’s a “5″. advance 1 node.
    You’re at the “finish” node. Signal to wherever, fire the “found what we were looking for” event. The record is passed along the pipe as soon as the test is run. We actually don’t even need to use the “compare” method.

    This algorithm depends on a few things: First of all, it requires that the combination or expression we’re looking for can be represented by an FSA, or other structure studied while learning Automata (Maybe a Context Free Grammar, for instance). Second, it requires that we know the expression or combination while in the source-code phase, and it’s not being passed in dynamically (writing code that generates FSA’s would NOT count as a simple solution to this problem).

  12. Trinity

    Said on June 17, 2008 :

    Hi,

    Given the constraints here is my algorithm. This will work based on the assumption that given the expression to compare, we can compute the sequence of binary records equivalent to the expression. We need to do this just once during the initialization time and after that we can use the available record-comparison methods to compare two binary records.

    1) Use the given expression and create the sequence of binary records representing that expression. Set a pointer to the first binary record that we just created.

    2) Once this is set up, start receiving the incoming binary records (one every 20ms).

    3) Read the incoming binary record and use the O(1) comparison method to check if there is a match between the incoming binary record and the binary record that the pointer is pointing to.
    If the records match, move the pointer to the next binary record (in the precomputed list of records) and pass the received (incoming) binary record to the next component in the pipeline.

    If the binary records do not match then, reset the pointer to point to the beginning of the precomputed list.

    4) If the pointer has moved past the end of the precomputed list, generate the alert specifying that the expression match has found.

    5) Continue steps 3 and 4 comparing each record and manipulating the pointer accordingly.

    Let me know your comments

    Regards
    Trinity

  13. Antoine Hersen

    Said on June 17, 2008 :

    Hello,

    What kind of expression ? A regular expression ?

    If so a finite state machine that can be easily derived from the regular expression should suffice.

    But it doubt the problem is that easy so what am I missing ?

    Regards,
    Antoine Hersen

  14. kuldip saini

    Said on June 17, 2008 :

    Just set a flag on first matching record arrival. If subsequent records matches keep it in set state and on last matching record arrival raise the event also reset the flag.
    If after first correct record subsequent records does not match reset the flag.

  15. Mark Brackett

    Said on June 17, 2008 :

    In code (C#):
    class PatternMatch {
    readonly Record[] pattern;
    readonly int patternLength;
    int patternIndex = 0;

    PatternMatch(Record[] pattern) {
    this.pattern = pattern;
    this.patternLength = this.pattern.Length;
    }

    Record ProcessRecord(Record record) {
    if (pattern[patternIndex] == record) {
    patternIndex++;
    } else if (patternIndex != 0 && pattern[0] == record) {
    patternIndex = 1;
    } else {
    patternIndex = 0; //reset
    }

    if (patternIndex == patternLength) {
    patternIndex = 0; //reset
    OnPatternFound(record);
    }

    return record;
    }
    }

    In words:
    Store a pattern as an array of the record structure. Use an index variable to track how far along the pattern the previous records matched.
    As each record comes in, compare the pattern at the index to the incoming record. If they’re the same, then increment your index. Otherwise, compare the record to the pattern start.

  16. Heiko Hatzfeld

    Said on June 18, 2008 :

    How complex can the sequence of “search” records be?

    to be more exact:
    - How many records will be in one sequence?
    - Will the sequence allow Wild-cards (Match “A”, Match “B”, Match (up to) any 2, Match “C”)?
    - Will there be quantifiers (up to X times… at least Y times…)?

  17. Amit

    Said on June 18, 2008 :

    Hi Heiko,

    I assume that sequence = expression we are looking for. So:
    - The expression can be of any length (you can assume that it is not bigger than your memory, that is not the point here).
    - No, the expression (sequence) is “hard coded” (it is not defined by rules).
    - Again, the expression (sequence) is “hard coded”.

    Is everything clear now?

  18. Heiko Hatzfeld

    Said on June 18, 2008 :

    Hello…

    I posted my solution here…

    It is using enumerators to keep track of all currently possible matchings. So it will correctly alert the user 3 times if
    “Foo” “Foo”
    is the pattern and the input is
    “Foo” “Foo” “Foo” “Foo”

    Also it is using generics this time, so it can work with any object that is IEquatable

    http://roadrunner74.wordpress.com/2008/06/23/dev102-needle-in-a-haystack/

  19. Ken Snyder

    Said on June 18, 2008 :

    Can the matching expressions overlap? For example, say you need to match “abcdab” and you receive records “ab”, “cd”, “ab”, “cd”, then “ab”–would there be two matches there or only one?

  20. Shahar Y

    Said on June 18, 2008 :

    Hi Ken,

    There will be one match.

  21. Heiko Hatzfeld

    Said on June 18, 2008 :

    Doh…

    One match is far to simple to be a real challenge ;)

    And I never trust a requirement that says there will only be one… After all… This is a pipe, the pipe could be up for days, and you are sure you only want one match? ;)

  22. Shahar Y

    Said on June 18, 2008 :

    Hi Heiko,

    If it is an expression inside an expression, like in Ken example, one match is enough.
    I didn’t meen that you are only looking for the first match, though (you keep on working, looking for matches all the time).
    But a good solution will find both matches in Ken’s example too.

    BTW – the link that you provided for your solution brings me into a page where I get the following message – “Looks like you have a problem here sir/madam. You sure you have the right place?”.

  23. Heiko Hatzfeld

    Said on June 18, 2008 :

    The Blog was sceduled for creation. I removed that and protected it by password…

    Try now…

    The password is “foo”

    You can wipe this post, since i will unprotect it on monday…

    Heiko

  24. James Curran

    Said on June 18, 2008 :

    Since relying on the trackbacks have failed the last two times I posted an answer on my blog, I’ll note here that I’ve written on a solution at:
    http://honestillusion.com/blogs/blog_0/archive/2008/06/18/dev102-s-programming-job-interview-challenge-8.aspx

  25. Jonathan Gilbert

    Said on June 19, 2008 :

    I will assume the existence of a generic ring buffer structure.

    Since the only comparison between incoming records is equality and the incoming records cannot be stored, there is no way to avoid a linear search of the possible records in the match; if we can’t store the record itself, we’ll have to store *which* record it was.

    To this end, in the initialization, each record in the “expression” to match must be assigned an ID. Since there may be repetition in the records in the expression (e.g. in the “ab” “cd” “ab” example in Ken Snyder’s question), the initialization is O(n^2) on the size of the expression; for each subsequent record in the expression, the previous records must be searched to determine whether an ID has already been assigned to that record. For Ken Snyder’s input, after initialization, the IDs 1, 2 and 1 would be assigned to the three tokens in the search expression. The ringbuffer is assigned to the total number of elements in the search expression and initialized with a semaphore value different from any ID assigned to the search tokens, such as -1.

    Since there is no way provided to order or hash the records, a linear search can’t be avoided, but we can at least work to reduce the constant a little bit. There are many different ways of predicting input, but to be as efficient as possible, whatever algorithm is used needs to be matched against the characteristics of the input. No information is provided about these, so I will present a simple heuristic: A doubly-linked list is maintained in which each node refers to one of the tokens from the input expression. Only unique tokens are present; there is no need for duplication. Each node also stores the frequency with which that node has been hit. On each new token from the input stream, the linked list is searched for a match. If one is found, that node’s frequency count is incremented and then the order of the linked list is maintained in order of decreasing frequency. Thus, more frequently-matched tokens will be closer to the start of the list and a full scan won’t be necessary. This is obviously still O(n), but the constant should be measurably smaller for at least some types of input.

    Having identified which of the possible search tokens, if any, the input matches, the ID of that token (or the “no match” semaphore value if there was no match) is placed into the ringbuffer, overwriting the ID remembered for the token that has just gone “out of reach”. Finally, the ring buffer is scanned, comparing the IDs against those in the search input. If they all match, then a search hit is reported.

    If the offset into the stream is required, then it is a trivial matter to count the binary records as they pass through.

    The complexity is then O(n*m) with a respectable constant. I can’t see any way to improve on this within the constraints listed. The performance could be improved substantially, though, if a comparison operator were available that could order the binary records. By sorting a copy of the search expression’s distinct records (along with their IDs), a binary search can be used to identify which of the search records each input record matches, reducing the complexity to O(m lg n). Alternately, a hashing operator could provide a respectable look-up time. Depending on the characteristics of the input records and the amount of memory available, a hashing operator might be faster or slower: If the input records are such that a greater-than or less-than comparison can be returned very close to the start of the record, a binary search of even a moderately large search could take less time than simply computing the hash of the input record (which must touch every byte of it). Alternately, if the search expression is not too large and memory requirements are lax enough that an array capable of essentially eliminating collisions might be used, hashing could provide essentially O(1) look-ups of the incoming records. However, guaranteeing this with generality would not be possible; it would only work for certain classes of input, and there are guaranteed to be pathologically bad cases, depending on the hashing algorithm and how aggressively collisions are avoided (by increasing the array size).

  26. Mark R

    Said on June 25, 2008 :

    I was quite surprised not to see my post here – the dog must have ate it. Here it is again, as best as I can remember.

    I have run into a similar problem in real life, and I solved it by using the Knuth-Morris-Pratt algorithm:
    http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

    Since it examines each character only once, it is perfect for using in a pipe. It is only necessary to realize that the “character” can be of any size, as long as you have an equals operator – which was a given condition of the problem.

3 Trackback(s)

  1. Jun 17, 2008: Arjan`s World » LINKBLOG for June 17, 2008
  2. Jun 18, 2008: Dev102 - Needle in a Haystack « Some things to note…
  3. Jun 23, 2008: A PROGRAMMING JOB INTERVIEW CHALLENGE #9 | Dev102.com

Post a Comment