---
date: '2026-01-15'
description: AST et al.
id: results
modified: 2026-06-05 15:08:40 GMT-04:00
tags:
  - sfwr4tb3
  - assignment
title: regular constructions and language parser
created: '2026-01-15'
published: '2026-01-15'
pageLayout: default
slug: thoughts/university/twenty-five-twenty-six/sfwr-4tb3/88-Assignment-2/results
permalink: https://aarnphm.xyz/thoughts/university/twenty-five-twenty-six/sfwr-4tb3/88-Assignment-2/results.md
generator:
  quartz: v4.6.0
  hostedProvider: Cloudflare
  baseUrl: aarnphm.xyz
full: https://aarnphm.xyz/llms-full.txt
---
## A1

Consider expression made up of identifiers $a, b, c, d$ and binary operators $+, -$ like shown

$$
\begin{align*}
a &+ b &+ c \\
a &- b &- c \\
a &- b &+ c - d \\
a &+ b &- c + d
\end{align*}
$$

Write grammars as below with NLTK and draw the parse trees with NLTK

> \[!question\] 1.
>
> Write a grammar such that $+$ binds tighter than $-$ and both $+$ and $-$ associate to the left. That is, $a+b+c$ is parsed as $\left( a+b \right)+c$ and $a-b+c-d$ as $\left( a - \left( b+c \right) \right) - d$. Draw the parse tree for $a+b+c$ and $a-b+c-d$!

> \[!question\] 2.
>
> Write a grammar such that $-$ binds tighter than $+$ and both $-$ and $+$ associate to the left. That is $a+b+c$ is parsed as $\left( a+b \right) + c$ and $a-b+c-d$ as $\left( a-b \right) + \left( c-d \right)$. Draw the parse tree for $a+b+c$ and $a-b+c-d$!

> \[!question\] 3.
>
> Write a grammar such that $+$ and $-$ bind equally strongly and associate to the left. That is, $a+b+c$ is parsed as $\left( a+b \right)+c$ is parsed as $\left( a+b \right) + c$ and $a-b+c-d$ as $\left( \left( a-b \right) +c \right) - d$. Draw the parse tree for $a+b+c$ and $a-b+c-d$!

> \[!question\] 4.
>
> Write a grammar such that $+$ and $-$ bind equally strongly and associate to the right. That is $a+b+c$ is parsed as $a+\left( b+c \right)$ and $a-b+c-d$ as $a-\left( b+\left( c-d \right) \right)$. Draw the parses trees for $a+b+c$ and $a-b+c-d$!

> \[!question\] 5.
>
> Now consider expression made up of identifiers $a, b, c, d$ binary operator $+$ and unary operator $-$ such that
>
> $$
> \begin{aligned}
> &-a+b+c
> &a+-b+c
> \end{aligned}
> $$
>
> Write a grammar such that $-$ binds tighter than $+$ and $+$ associates to the left. That is, $-a+b+c$ is parsed as $\left( \left( -a \right)+b \right)+c$ and $a+-b+c$ as $a+\left( \left( -b \right) + c \right)$. Draw the parse tree for $-a+b+c$ and $a+-b+c$!

> \[!question\] 6.
>
> Write a grammar such that $+$ binds tighter than $-$ and $+$ associates to the left. That is, $-a+b+c$ is parsed as $-\left( \left( a+b \right) +c \right)$ and $a+-b+c$ as $a+\left( -\left( b+c \right) \right)$. Draw the parse trees for $-a+b+c$ and $a+-b+c$!

```python title="a1.py" path="thoughts/university/twenty-five-twenty-six/sfwr-4tb3/88 Assignment 2/a1.py"
#!/usr/bin/env python3
"""
A1: Grammar variations for expressions with identifiers a, b, c, d and operators +, -

Key principles:
- Tighter binding = deeper in grammar hierarchy (lower nonterminal)
- Left-associativity = left recursion (E -> E op T)
- Right-associativity = right recursion (E -> T op E)
"""

from nltk import CFG
from nltk.parse.chart import ChartParser
from nltk.tree import Tree


def parse_and_draw(grammar: CFG, sentence: str, title: str = ''):
  parser = ChartParser(grammar)
  tokens = list(sentence.replace(' ', ''))
  trees = list(parser.parse(tokens))
  if trees:
    if title:
      print(f'\n{title}')
    trees[0].pretty_print()
    return trees[0]
  print(f'No parse found for: {sentence}')
  return None


# Q1: + binds tighter than -, both left-associative
# a+b+c -> (a+b)+c
# a-b+c-d -> (a-(b+c))-d
grammar_q1 = CFG.fromstring("""
    E -> E '-' T | T
    T -> T '+' F | F
    F -> 'a' | 'b' | 'c' | 'd'
""")

# Q2: - binds tighter than +, both left-associative
# a+b+c -> (a+b)+c
# a-b+c-d -> (a-b)+(c-d)
grammar_q2 = CFG.fromstring("""
    E -> E '+' T | T
    T -> T '-' F | F
    F -> 'a' | 'b' | 'c' | 'd'
""")

# Q3: + and - bind equally, left-associative
# a+b+c -> (a+b)+c
# a-b+c-d -> ((a-b)+c)-d
grammar_q3 = CFG.fromstring("""
    E -> E '+' T | E '-' T | T
    T -> 'a' | 'b' | 'c' | 'd'
""")

# Q4: + and - bind equally, right-associative
# a+b+c -> a+(b+c)
# a-b+c-d -> a-(b+(c-d))
grammar_q4 = CFG.fromstring("""
    E -> T '+' E | T '-' E | T
    T -> 'a' | 'b' | 'c' | 'd'
""")

# Q5: unary - binds tighter than binary +, + left-associative
# -a+b+c -> ((-a)+b)+c
# a+-b+c -> ((a+(-b))+c)
# NOTE: the assignment says a+-b+c -> a+((-b)+c) but that contradicts left-assoc
# implementing standard left-associativity here
grammar_q5 = CFG.fromstring("""
    E -> E '+' T | T
    T -> '-' T | F
    F -> 'a' | 'b' | 'c' | 'd'
""")

# Q6: binary + binds tighter than unary -, + left-associative
# -a+b+c -> -((a+b)+c)
# a+-b+c -> a+(-(b+c))
grammar_q6 = CFG.fromstring("""
    E -> '-' E | T
    T -> T '+' U | U
    U -> '-' E | F
    F -> 'a' | 'b' | 'c' | 'd'
""")


def main():
  # Q1 tests
  parse_and_draw(grammar_q1, 'a+b+c', 'Q1: a+b+c -> (a+b)+c')
  parse_and_draw(grammar_q1, 'a-b+c-d', 'Q1: a-b+c-d -> (a-(b+c))-d')

  # Q2 tests
  parse_and_draw(grammar_q2, 'a+b+c', 'Q2: a+b+c -> (a+b)+c')
  parse_and_draw(grammar_q2, 'a-b+c-d', 'Q2: a-b+c-d -> (a-b)+(c-d)')

  # Q3 tests
  parse_and_draw(grammar_q3, 'a+b+c', 'Q3: a+b+c -> (a+b)+c')
  parse_and_draw(grammar_q3, 'a-b+c-d', 'Q3: a-b+c-d -> ((a-b)+c)-d')

  # Q4 tests
  parse_and_draw(grammar_q4, 'a+b+c', 'Q4: a+b+c -> a+(b+c)')
  parse_and_draw(grammar_q4, 'a-b+c-d', 'Q4: a-b+c-d -> a-(b+(c-d))')

  # Q5 tests
  parse_and_draw(grammar_q5, '-a+b+c', 'Q5: -a+b+c -> ((-a)+b)+c')
  parse_and_draw(grammar_q5, 'a+-b+c', 'Q5: a+-b+c -> ((a+(-b))+c)')

  # Q6 tests
  parse_and_draw(grammar_q6, '-a+b+c', 'Q6: -a+b+c -> -((a+b)+c)')
  parse_and_draw(grammar_q6, 'a+-b+c', 'Q6: a+-b+c -> a+(-(b+c))')


if __name__ == '__main__':
  main()

```

## A2

Consider following grammar for arithmetic expressions:

$$
\begin{aligned}
\text{expression} &\to [\, + \mid - \,]\, \text{term}\, \{\, (\, + \mid - \,)\, \text{term}\, \} \\
\text{term}       &\to \text{factor}\, \{\, (\, \times \mid /\, )\, \text{factor}\, \} \\
\text{factor}     &\to \text{number} \mid \text{identifier} \mid (\, \text{expression}\, )
\end{aligned}
$$

### part 1.

Use the LaTeX `mdwtools` package to

1. pretty-print above grammar, as in:
   ![[thoughts/university/twenty-five-twenty-six/sfwr-4tb3/88 Assignment 2/img/expressiongrammar.webp]]
2. to draw the syntax diagrams of the three nonterminals, as in:
   ![[thoughts/university/twenty-five-twenty-six/sfwr-4tb3/88 Assignment 2/img/termdiagram.webp]]

For this, create a file `prettyprintgrammar.tex` from the terminal and use this as template:

```latex
\documentclass{article}
\usepackage[rounded]{syntax}

\title{Syntax Diagrams for Expressions}
\author{Aaron Pham}

\setlength{\linewidth}{160mm}
\setlength
{\textwidth}{160mm}

\begin{document}
\maketitle

\begin{grammar}
...
\end{grammar}


\begin{syntdiag} % for expression
...
\end{syntdiag}

\begin{syntdiag} % for term
...
\end{syntdiag}

\begin{syntdiag} % for factor
...
\end{syntdiag}


\end{document}

```

You can run `pdflatex` from the terminal with `pdflatex prettyprintgrammar.tex`

### part 3.

Use the railroad diagram generator RR to draw the syntax diagram! You can either use the website <http://bottlecaps.de/rr/ui> or run RR locally. To run RR locally:

1. Run `java -jar ./rr-2.2-SNAPSHOT-java11/rr.war -gui` inside the unzipped folder. That should print the message `Now listening on http://localhost:8080/`.
2. Open `http://localhost:8080/` in your web browser.

Note that RR uses a W3C standard for EBNF. Insert the grammar and the generated SVG or PNG diagrams in the cell below.

![[thoughts/university/twenty-five-twenty-six/sfwr-4tb3/88 Assignment 2/diagram]]

## A3

Procedure `derivable` from the course notes can be used for unrestricted grammars, not just context-sensitive grammars. The procedure will terminate with `true` or `false` for context-sensitive grammars (_decision procedure_) but may or may not terminate for unrestricted grammars (_semi-decision procedure_).

```python
from collections.abc import Iterator


class Grammar:
  def __init__(
    self, T: set[str], N: set[str], P: set[tuple[str, str]], S: str
  ):
    self.T, self.N, self.P, self.S = T, N, P, S

  def L(self, log=False, stats=False) -> Iterator[str]:
    dd, d = set(), {self.S}
    if log:
      print('    ', self.S)
    while d != set():
      if stats:
        print('# added derivations:', len(d))
      if log:
        print()
      dd.update(d)
      d = set()
      for π in sorted(dd, key=len):
        for σ, τ in self.P:  # production σ → τ
          i = π.find(σ, 0)
          while i != -1:  # π == π[0:i] + σ + π[i + len(σ):]
            χ = π[0:i] + τ + π[i + len(σ) :]
            χ = χ.replace('  ', ' ')
            if (χ not in dd) and (χ not in d):
              if all(a in self.T for a in χ.split()):
                yield χ.strip()
              if log:
                print('    ', π, '⇒', χ)
              d.add(χ)
            i = π.find(σ, i + 1)


def derivable(
  G: Grammar, ω: str, log=False, stats=False
) -> bool:  # G must be context-sensitive
  dd, d, ω = set(), {G.S}, ω.strip()
  if log:
    print('    ', G.S)
  while d != set():
    if stats:
      print('# added derivations:', len(d))
    if log:
      print()
    dd.update(d)
    d = set()
    for π in sorted(dd, key=len):
      for σ, τ in G.P:  # production σ → τ
        i = π.find(σ, 0)
        while i != -1:  # π == π[0:i] + σ + π[i + len(σ):]
          χ = π[0:i] + τ + π[i + len(σ) :]
          χ = χ.replace('  ', ' ')
          if (χ not in dd) and (χ not in d):
            if χ.strip() == ω:
              return True
            elif len(χ.strip()) <= len(ω):
              if log:
                print('    ', π, '⇒', χ)
              d.add(χ)
          i = π.find(σ, i + 1)
  return False


setattr(Grammar, 'derivable', derivable)
```

Consider the language $\{ a^{n}b^{2n}c^{n} \mid n \ge 1 \}$. Write a grammar, $G$, for this language, and use procedure `derivable` to check that `a b b c, a a b b b b c c, a a a b b b b b b c c c` are derivable, but `a b c, a b b b c, a b b c c, a a b b c c` are not.! The grammar must be monotonic, meaning context-sensitive.

```python title="a3.py" path="thoughts/university/twenty-five-twenty-six/sfwr-4tb3/88 Assignment 2/a3.py"
#!/usr/bin/env python3
"""
A3: Context-Sensitive Grammar for {a^n b^{2n} c^n | n ≥ 1}

The grammar must be monotonic (context-sensitive): |α| ≤ |β| for all α → β.

Strategy:
1. Generate structure: S → a B B C | a S B B C
   - Each 'a' comes with two B's and one C
   - 'a' is already terminal, B and C are nonterminals

2. Permute to sort: C B → B C
   - Move all B's to the left of all C's

3. Convert nonterminals to terminals (left-to-right):
   - a B → a b  (start conversion at boundary)
   - b B → b b  (propagate through B's)
   - b C → b c  (convert first C)
   - c C → c c  (propagate through C's)

All productions are monotonic (length-preserving or increasing).
"""


class Grammar:
  def __init__(
    self, T: set[str], N: set[str], P: set[tuple[str, str]], S: str
  ):
    self.T, self.N, self.P, self.S = T, N, P, S

  def derivable(self, ω: str, log=False, stats=False) -> bool:
    dd, d, ω = set(), {self.S}, ω.strip()
    if log:
      print('    ', self.S)
    while d:
      if stats:
        print('# added derivations:', len(d))
      if log:
        print()
      dd.update(d)
      d = set()
      for π in sorted(dd, key=len):
        for σ, τ in self.P:
          i = π.find(σ, 0)
          while i != -1:
            χ = π[0:i] + τ + π[i + len(σ) :]
            χ = χ.replace('  ', ' ')
            if (χ not in dd) and (χ not in d):
              if χ.strip() == ω:
                return True
              elif len(χ.strip()) <= len(ω):
                if log:
                  print('    ', π, '⇒', χ)
                d.add(χ)
            i = π.find(σ, i + 1)
    return False


# Grammar for {a^n b^{2n} c^n | n ≥ 1}
# Terminals: a, b, c
# Nonterminals: S, B, C
# Start symbol: S
T = {'a', 'b', 'c'}
N = {'S', 'B', 'C'}
P = {
  ('S', 'a B B C'),  # base case: n=1 generates a B B C
  ('S', 'a S B B C'),  # recursive: each level adds a, B B, C
  ('C B', 'B C'),  # permutation: move B's left of C's
  ('a B', 'a b'),  # conversion: terminal 'a' triggers B→b
  ('b B', 'b b'),  # propagate: b triggers next B→b
  ('b C', 'b c'),  # conversion: last b triggers C→c
  ('c C', 'c c'),  # propagate: c triggers next C→c
}
S = 'S'

G = Grammar(T, N, P, S)


def main():
  print('=' * 70)
  print('A3: Context-Sensitive Grammar for {a^n b^{2n} c^n | n ≥ 1}')
  print('=' * 70)

  print('\n--- Grammar ---')
  print(f'T = {T}')
  print(f'N = {N}')
  print(f'S = {S}')
  print('P = {')
  for lhs, rhs in sorted(P, key=lambda x: (x[0], x[1])):
    print(f'    {lhs} → {rhs}')
  print('}')

  print('\n--- Monotonicity Check ---')
  for lhs, rhs in P:
    lhs_len = len(lhs.split())
    rhs_len = len(rhs.split())
    status = '✓' if lhs_len <= rhs_len else '✗'
    print(f'  {status} |{lhs}| = {lhs_len} ≤ |{rhs}| = {rhs_len}')

  print('\n--- Testing Derivability ---')

  # Should be derivable
  valid = ['a b b c', 'a a b b b b c c', 'a a a b b b b b b c c c']
  print('\nValid strings (should be derivable):')
  for s in valid:
    result = G.derivable(s)
    status = '✓' if result else '✗'
    n = s.count('a')
    print(f'  {status} n={n}: "{s}" → {result}')

  # Should NOT be derivable
  invalid = ['a b c', 'a b b b c', 'a b b c c', 'a a b b c c']
  print('\nInvalid strings (should NOT be derivable):')
  for s in invalid:
    result = G.derivable(s)
    status = '✓' if not result else '✗'
    print(f'  {status} "{s}" → {result}')

  print('\n--- Sample Derivation (n=1) ---')
  print('Target: a b b c')
  G.derivable('a b b c', log=True)

  print('\n--- Sample Derivation (n=2) ---')
  print('Target: a a b b b b c c')
  G.derivable('a a b b b b c c', log=True)


if __name__ == '__main__':
  main()

```

## A4

Explain in simple words the languages described by the following regular expressions. Avoid paraphrasing the regular expressions!

> \[!question\] 1
>
> $(a^{*}b^{*})^{*}$

All strings over the alphabet $\{a, b\}$, including the empty string.

> \[!question\] 2
>
> $(a^{*}[b])^{*}$

All strings over the alphabet $\{a, b\}$, including the empty string (same language as question 1).

> \[!question\] 3
>
> $(a^{*}ba^{*}b)^{*} a^{*}$

Strings over $\{a, b\}$ with an even number of $b$‘s, including zero.

> \[!question\] 4
>
> $(a^{*}[ba^{*}c])^{*}$

Strings over $\{a, b, c\}$ where deleting all $a$‘s leaves a sequence of $bc$ pairs, possibly empty.

> \[!question\] 5
>
> $(a\mid ba)^{*}[b]$

Strings over $\{a, b\}$ with no consecutive $b$‘s.

> \[!question\] 6
>
> $a^{*}(ba+)^{*}$

Strings over $\{a, b\}$ where every $b$ is immediately followed by at least one $a$.

