--- date: '2026-02-10' description: quotes and parsing id: results modified: 2026-06-05 15:08:40 GMT-04:00 tags: - sfwr4tb3 - assignment title: path, sed created: '2026-02-10' published: '2026-02-10' pageLayout: default slug: thoughts/university/twenty-five-twenty-six/sfwr-4tb3/88-Assignment-5/results permalink: https://aarnphm.xyz/thoughts/university/twenty-five-twenty-six/sfwr-4tb3/88-Assignment-5/results.md generator: quartz: v4.6.0 hostedProvider: Cloudflare baseUrl: aarnphm.xyz full: https://aarnphm.xyz/llms-full.txt --- ## A1 Implement a sanitizer for pathnames using Flex and C. Read from stdin, produce sanitized portable pathname on stdout or error on stderr. Rules: - Components separated by `/`, consecutive `/` same as single `/` - Final `/` has no meaning, initial `/` is significant - Leading/trailing spaces allowed but no significance - Component: `a-z`, `A-Z`, `0-9`, `.` (dot) - `.` component = current directory (remove), `..` = parent directory (resolve) - Max 14 chars per component, max 255 chars total pathname | stdin | stdout | | :------------------- | :---------- | | `/aaa//bb/c/` | `/aaa/bb/c` | | `aaa/b.b/../cc/./dd` | `aaa/cc/dd` | Errors (to stderr, terminate immediately): | stdin | stderr | | :------------------------------- | :------------------- | | `/a//b/#/c` | `invalid character` | | `/012345678901234/bb` | `component too long` | | `aa/../..` | `malformed pathname` | | `/this/is/a/path/...too long...` | `pathname too long` | ```text [^generated-inline-footnote-1]*"/"+"/"* { int i=0; while(yytext[i]==' '||yytext[i]=='\t') i++; rawlen=yyleng-i; absolute=true; } [^generated-inline-footnote-2]+ ; [ \t]+$ ; "/"+"/"* { rawlen += yyleng; if (rawlen > MAX_PATH) error("pathname too long"); } {LONGCOMP} { error("component too long"); } {COMP} { rawlen += yyleng; if (rawlen > MAX_PATH) error("pathname too long"); push_comp(yytext, yyleng); } {INVALID} { error("invalid character"); } \n { output_path(); } <> { output_path(); return 0; } %% int main() { depth = 0; absolute = false; rawlen = 0; yylex(); return 0; } ``` ## A2 Write a `sed` command that removes double quotation marks surrounding CSV cells that contain neither commas nor double-quotation marks. Input: ``` "We need to add ""double quotation"" marks around cell contents","in some cases, but,","they can be unnecessary in others." "Write a sed command",that conservatively removes all "strictly unnecessary" ones,"""and no more.""" Be careful to ignore cells that contain "double quotes with no commas!",test,"test." ``` Expected output: ``` "We need to add ""double quotation"" marks around cell contents","in some cases, but,",they can be unnecessary in others. Write a sed command,that conservatively removes all "strictly unnecessary" ones,"""and no more.""" Be careful to ignore cells that contain "double quotes with no commas!",test,test. ``` ```bash sed -E ':loop; s/(^|,)"([^",]*)"(,|$)/\1\2\3/g; t loop' excessive-quotes.csv ``` The pattern matches a cell boundary (`^` or `,`), then `"`, then content with no commas or quotes (`[^",]*`), then `"`, then cell boundary (`,` or `$`). The loop ensures all such cells on a line are cleaned. ## A3 Write a `sed` command to convert JSON string-encoded booleans, null, and numbers to their native JSON types. - `"true"` → `true`, `"false"` → `false`, `"null"` → `null` - `""` → `` (integers and decimals, no spaces) Input: `{"true": true, "false": "false", "null": "null", "number1": "3.14159", ...}` Expected output: `{"true": true, "false": false, "null": null, "number1": 3.14159, ...}` ```bash sed -E 's/: "true"/: true/g; s/: "false"/: false/g; s/: "null"/: null/g; s/: "(-?[0-9]+\.?[0-9]*)"/: \1/g' quoted-data.json ``` Replace string-valued `"true"`, `"false"`, `"null"` with their unquoted equivalents. For numbers, match `": "` followed by an optional minus, digits, optional decimal point and more digits, then closing `"`. The number regex `(-?[0-9]+\.?[0-9]*)` avoids matching strings with spaces or letters. ## A4 ### Part A: Parse regular expressions into AST Extend the regex parser from the course notes with attribute evaluation rules so `parse` returns the abstract syntax tree. Attribute grammar: ``` expression(e) → term(e) { '|' term(f) « e := Choice(e, f) » } term(e) → factor(e) { factor(f) « e := Conc(e, f) » } factor(e) → atom(e) [ '*' « e := Star(e) » | '+' « e := Conc(e, Star(e)) » | '?' « e := Choice(e, ε()) » ] atom(e) → plainchar(e) | escapedchar(e) | '(' expression(e) ')' plainchar(e) → ' ' « e := Sym(' ') » | ... | '~' « e := Sym('~') » escapedchar(e) → '\\' ( '(' « e := Sym('(') » | ')' | ... | '|' « e := Sym('|') ») ``` _sol_: ```python PlainChars = ( ' !"#$%&\',-./0123456789:;<=>@ABCDEFGHIJKLMNO' + 'PQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{}~' ) EscapedChars = '()*+?\\|' FirstFactor = PlainChars + '\\(' src: str pos: int sym: str def nxt(): global pos, sym if pos < len(src): sym, pos = src[pos], pos + 1 else: sym = chr(0) def expression(): e = term() while sym == '|': nxt() f = term() e = Choice(e, f) return e def term(): e = factor() while sym in FirstFactor: f = factor() e = Conc(e, f) return e def factor(): e = atom() if sym == '*': nxt() e = Star(e) elif sym == '+': nxt() e = Conc(e, Star(e)) elif sym == '?': nxt() e = Choice(e, ε()) return e def atom(): if sym in PlainChars: e = Sym(sym) nxt() elif sym == '\\': nxt() if sym in EscapedChars: e = Sym(sym) nxt() else: raise Exception('invalid escaped character at ' + str(pos)) elif sym == '(': nxt() e = expression() if sym == ')': nxt() else: raise Exception("')' expected at " + str(pos)) else: raise Exception('invalid character at ' + str(pos)) return e def parse(s: str): global src, pos src, pos = s, 0 nxt() e = expression() if sym != chr(0): raise Exception('unexpected character at ' + str(pos)) return e ``` ### Part B: Extended attribute grammar for counted repetitions Extended grammar for `factor`: ``` factor(e) → atom(e) [ '*' « e := Star(e) » | '+' « e := Conc(e, Star(e)) » | '?' « e := Choice(e, ε()) » | '{' integer(n) ( '}' « e := repeat(e, n) » | ',' ( integer(m) '}' « e := repeatRange(e, n, m) » | '}' « e := Conc(repeat(e, n), Star(e)) » ) ) ] ``` where: - $\text{repeat}(e, 0) = \epsilon()$, $\text{repeat}(e, n) = \text{Conc}(\text{repeat}(e, n-1), e)$ for $n \ge 1$ - $\text{repeatRange}(e, n, m) = \text{Choice}(\text{repeat}(e, n), \text{repeatRange}(e, n+1, m))$ for $n < m$, $\text{repeatRange}(e, n, n) = \text{repeat}(e, n)$ Written with attribute rules: ``` factor(e) → atom(e) [ '*' « e := Star(e) » | '+' « e := Conc(e, Star(e)) » | '?' « e := Choice(e, ε()) » | '{' integer(n) ('}' « e := eN where e0 := ε(); ei := Conc(ei-1, e) for i = 1..n; eN := en » | ',' (integer(m) '}' « e := eN where e0 := ε(); ei := Conc(ei-1, e) for i = 1..m; eN := en | Choice(en, en+1) | ... | Choice(...Choice(en, en+1)..., em) » | '}' « eN := Conc(en, Star(e)) where e0 := ε(); ei := Conc(ei-1, e) for i = 1..n » ) ) ] integer(n) → digit(n) { digit(d) « n := 10 × n + d » } digit(d) → '0' « d := 0 » | ... | '9' « d := 9 » ``` _sol_: ```python PlainChars = ( ' !"#$%&\',-./0123456789:;<=>@ABCDEFGHIJKLMNO' + 'PQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz~' ) EscapedChars = '()*+?\\|{}' FirstFactor = PlainChars + '\\(' src: str pos: int sym: str def nxt(): global pos, sym if pos < len(src): sym, pos = src[pos], pos + 1 else: sym = chr(0) def repeat(e, n): r = ε() for _ in range(n): r = Conc(r, e) return r def repeatRange(e, lo, hi): r = repeat(e, lo) for i in range(lo + 1, hi + 1): r = Choice(r, repeat(e, i)) return r def integer(): if '0' <= sym <= '9': n = ord(sym) - ord('0') nxt() else: raise Exception('digit expected at ' + str(pos)) while '0' <= sym <= '9': n = 10 * n + ord(sym) - ord('0') nxt() return n def expression(): e = term() while sym == '|': nxt() f = term() e = Choice(e, f) return e def term(): e = factor() while sym in FirstFactor: f = factor() e = Conc(e, f) return e def factor(): e = atom() if sym == '*': nxt() e = Star(e) elif sym == '+': nxt() e = Conc(e, Star(e)) elif sym == '?': nxt() e = Choice(e, ε()) elif sym == '{': nxt() n = integer() if sym == '}': nxt() e = repeat(e, n) elif sym == ',': nxt() if sym == '}': nxt() e = Conc(repeat(e, n), Star(e)) else: m = integer() if sym == '}': nxt() else: raise Exception("'}' expected at " + str(pos)) e = repeatRange(e, n, m) else: raise Exception("'}' or ',' expected at " + str(pos)) return e def atom(): if sym in PlainChars: e = Sym(sym) nxt() elif sym == '\\': nxt() if sym in EscapedChars: e = Sym(sym) nxt() else: raise Exception('invalid escaped character at ' + str(pos)) elif sym == '(': nxt() e = expression() if sym == ')': nxt() else: raise Exception("')' expected at " + str(pos)) else: raise Exception('invalid character at ' + str(pos)) return e def parse(s: str): global src, pos src, pos = s, 0 nxt() e = expression() if sym != chr(0): raise Exception('unexpected character at ' + str(pos)) return e ``` [^generated-inline-footnote-1]: \t [^generated-inline-footnote-2]: \t