591. Tag Validator

Given a string representing a code snippet, you need to implement a tag validator to parse the code and return whether it is valid. A code snippet is valid if all the following rules hold:

  1. The code must be wrapped in a valid closed tag. Otherwise, the code is invalid.
  2. A closed tag (not necessarily valid) has exactly the following format : <TAG_NAME>TAG_CONTENT</TAG_NAME>. Among them, <TAG_NAME> is the start tag, and </TAG_NAME> is the end tag. The TAG_NAME in start and end tags should be the same. A closed tag is valid if and only if the TAG_NAME and TAG_CONTENT are valid.
  3. A valid TAG_NAME only contain upper-case letters, and has length in range [1,9]. Otherwise, the TAG_NAME is invalid.
  4. A valid TAG_CONTENT may contain other valid closed tags, cdata and any characters (see note1) EXCEPT unmatched <, unmatched start and end tag, and unmatched or closed tags with invalid TAG_NAME. Otherwise, the TAG_CONTENT is invalid.
  5. A start tag is unmatched if no end tag exists with the same TAG_NAME, and vice versa. However, you also need to consider the issue of unbalanced when tags are nested.
  6. A < is unmatched if you cannot find a subsequent >. And when you find a < or </, all the subsequent characters until the next > should be parsed as TAG_NAME (not necessarily valid).
  7. The cdata has the following format : <![CDATA[CDATA_CONTENT]]>. The range of CDATA_CONTENT is defined as the characters between <![CDATA[ and the first subsequent ]]>.
  8. CDATA_CONTENT may contain any characters. The function of cdata is to forbid the validator to parse CDATA_CONTENT, so even it has some characters that can be parsed as tag (no matter valid or invalid), you should treat it as regular characters.

Valid Code Examples:

Input: "<DIV>This is the first line <![CDATA[<div>]]></DIV>"
Output: True
Explanation:
The code is wrapped in a closed tag : <DIV> and </DIV>.
The TAG_NAME is valid, the TAG_CONTENT consists of some characters and cdata.
Although CDATA_CONTENT has unmatched start tag with invalid TAG_NAME, it should be considered as plain text, not parsed as tag.
So TAG_CONTENT is valid, and then the code is valid. Thus return true.
Input: "<DIV>>> ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>"
Output: True
Explanation:
We first separate the code into : start_tag|tag_content|end_tag.
start_tag -> "<DIV>"
end_tag -> "</DIV>"
tag_content could also be separated into : text1|cdata|text2.
text1 -> ">> ![cdata[]] "
cdata -> "<![CDATA[<div>]>]]>", where the CDATA_CONTENT is "<div>]>"
text2 -> "]]>>]"
The reason why start_tag is NOT "<DIV>>>" is because of the rule 6. The reason why cdata is NOT "<![CDATA[<div>]>]]>]]>" is because of the rule 7.

Invalid Code Examples:

Input: "<A>  <B> </A>   </B>"
Output: False
Explanation: Unbalanced. If "<A>" is closed, then "<B>" must be unmatched, and vice versa.

Input: "<DIV>  div tag is not closed  <DIV>"
Output: False

Input: "<DIV>  unmatched <  </DIV>"
Output: False

Input: "<DIV> closed tags with invalid tag name  <b>123</b> </DIV>"
Output: False

Input: "<DIV> unmatched tags with invalid tag name  </1234567890> and <CDATA[[]]>  </DIV>"
Output: False

Input: "<DIV>  unmatched start tag <B>  and unmatched end tag </C>  </DIV>"
Output: False

Note:

  1. For simplicity, you could assume the input code (including the any characters mentioned above) only contain letters, digits, '<','>','/','!','[',']' and ' '.

Rust Solution

struct Solution;

struct Parser {
    code: Vec<char>,
    index: usize,
    n: usize,
}

impl Parser {
    fn new(code: String) -> Self {
        let code: Vec<char> = code.chars().collect();
        let index = 0;
        let n = code.len();
        Parser { code, index, n }
    }

    fn parse_code(&mut self) -> Result<(), String> {
        self.parse_closed_tag()?;
        if self.peek().is_none() {
            Ok(())
        } else {
            Err("parse_code".to_string())
        }
    }

    fn parse_closed_tag(&mut self) -> Result<(), String> {
        let start_tag = self.parse_start_tag()?;
        self.parse_content()?;
        let end_tag = self.parse_end_tag()?;
        if start_tag != end_tag {
            return Err(start_tag);
        }
        Ok(())
    }

    fn parse_content(&mut self) -> Result<(), String> {
        while let Some(c) = self.peek() {
            if c != '<' {
                self.next();
            } else {
                if let Some(s) = self.peek_more(2) {
                    if s == "<!" {
                        self.parse_cdata()?;
                        continue;
                    }
                    if s == "</" {
                        break;
                    }
                    self.parse_closed_tag()?
                } else {
                    return Err("<".to_string());
                }
            }
        }
        Ok(())
    }

    fn parse_cdata(&mut self) -> Result<(), String> {
        if let Some(s) = self.next_more(9) {
            if s != "<![CDATA[" {
                Err(s)
            } else {
                while let Some(s) = self.peek_more(3) {
                    if s == "]]>" {
                        break;
                    } else {
                        self.next();
                    }
                }
                if let Some(s) = self.next_more(3) {
                    if s == "]]>" {
                        Ok(())
                    } else {
                        Err(s)
                    }
                } else {
                    Err("]]>".to_string())
                }
            }
        } else {
            Err("<![CDATA[".to_string())
        }
    }

    fn parse_start_tag(&mut self) -> Result<String, String> {
        if let Some(c) = self.next() {
            if c != '<' {
                Err("<".to_string())
            } else {
                let mut tag = "".to_string();
                while let Some(c) = self.next() {
                    match c {
                        '>' => {
                            break;
                        }
                        'A'..='Z' => {
                            tag.push(c);
                        }
                        _ => return Err("<".to_string()),
                    }
                }
                if !(1..=9).contains(&tag.len()) {
                    return Err(tag);
                }
                Ok(tag)
            }
        } else {
            Err("<".to_string())
        }
    }

    fn parse_end_tag(&mut self) -> Result<String, String> {
        if let Some(s) = self.next_more(2) {
            if s != "</" {
                Err(s)
            } else {
                let mut tag = "".to_string();
                while let Some(c) = self.next() {
                    if c == '>' {
                        break;
                    } else {
                        tag.push(c);
                    }
                }
                Ok(tag)
            }
        } else {
            Err("<".to_string())
        }
    }

    fn peek(&self) -> Option<char> {
        if self.index < self.n {
            Some(self.code[self.index])
        } else {
            None
        }
    }
    fn peek_more(&self, size: usize) -> Option<String> {
        if self.index + size < self.n {
            Some(self.code[self.index..self.index + size].iter().collect())
        } else {
            None
        }
    }
    fn next(&mut self) -> Option<char> {
        if self.index < self.n {
            let c = self.code[self.index];
            self.index += 1;
            Some(c)
        } else {
            None
        }
    }
    fn next_more(&mut self, size: usize) -> Option<String> {
        if self.index + size <= self.n {
            let s = self.code[self.index..self.index + size].iter().collect();
            self.index += size;
            Some(s)
        } else {
            None
        }
    }
}

impl Solution {
    fn is_valid(code: String) -> bool {
        let mut parser = Parser::new(code);
        parser.parse_code().is_ok()
    }
}

#[test]
fn test() {
    let code = "<DIV>This is the first line <![CDATA[<div>]]></DIV>".to_string();
    let res = true;
    assert_eq!(Solution::is_valid(code), res);
    let code = "<DIV>>>  ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>".to_string();
    let res = true;
    assert_eq!(Solution::is_valid(code), res);
    let code = "<A>  <B> </A>   </B>".to_string();
    let res = false;
    assert_eq!(Solution::is_valid(code), res);
    let code = "<DIV>  div tag is not closed  <DIV>".to_string();
    let res = false;
    assert_eq!(Solution::is_valid(code), res);
    let code = "<DIV>  unmatched <  </DIV>".to_string();
    let res = false;
    assert_eq!(Solution::is_valid(code), res);
    let code = "<DIV> closed tags with invalid tag name  <b>123</b> </DIV>".to_string();
    let res = false;
    assert_eq!(Solution::is_valid(code), res);
    let code = "<DIV> unmatched tags with invalid tag name  </1234567890> and <CDATA[[]]>  </DIV>"
        .to_string();
    let res = false;
    assert_eq!(Solution::is_valid(code), res);
    let code = "<DIV>  unmatched start tag <B>  and unmatched end tag </C>  </DIV>".to_string();
    let res = false;
    assert_eq!(Solution::is_valid(code), res);
    let code = "<A></A><B></B>".to_string();
    let res = false;
    assert_eq!(Solution::is_valid(code), res);
    let code = "<A><A>/A></A></A>".to_string();
    let res = true;
    assert_eq!(Solution::is_valid(code), res);
    let code = "<A<></A<>".to_string();
    let res = false;
    assert_eq!(Solution::is_valid(code), res);
}

Having problems with this solution? Click here to submit an issue on github.