---
slug: index
tags:
  - evergreen
  - fruit
description: Aaron's digital garden
title: Aaron's notes
date: 2022-04-22
---

Beige and rosé are my two favourite colours. I try to be present, but you will find me either [writing](https://aarnphm.xyz/thoughts/writing#motivation) or [reading](https://aarnphm.xyz/books). I like to take long walks, host [functions](https://aarnphm.xyz/thoughts/atelier-with-friends), and people-watch.

Cooking is my love language; it is how my mom expresses her love for me. How one cooks their eggs says a lot about how they treat others.

[open-source projects](https://aarnphm.xyz/thoughts/work) are an overall net positive for everyone, so contribute.

I believe in [tools](https://aarnphm.xyz/thoughts/papers/Tools-for-Conviviality-by-Ivan-Illich.pdf) that give back [agency](https://aarnphm.xyz/thoughts/Agency) to users and help them fulfil their [desire](https://aarnphm.xyz/thoughts/desire) in life. Understanding the [inner workings](https://aarnphm.xyz/thoughts/mechanistic-interpretability) of large language models would help us do better science.

Currently, I’m building [serving infrastructure](https://bentoml.com) for [ml](https://aarnphm.xyz/thoughts/Machine-learning) systems and exploring our interactions with [large language models](https://aarnphm.xyz/thoughts/LLMs).

I’m best reached [here](https://twitter.com/aarnphm_) or

---
slug: books
tags:
  - evergreen
description: reconstructed source of "https://aarnphm.xyz/books"
title: antilibrary.
date: 2022-04-22
---

A (mostly) up-to-date list of books that I have read, want to read, am reading, or have finished. See also: [digital version](https://aarnphm.xyz/curius)

> In essence, an [antilibrary](https://nesslabs.com/antilibrary) is a collection of unread books. It is an ode to the self, a reminder of the topics one wants to explore.

## current.

| title | author | notes |
| ----- | ------ | ----- |
| Essays in Love | Alain de Botton | |
| [Nietzsche and Philosophy](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche) | Gilles Deleuze | |
| [The Gay Science](https://aarnphm.xyz/thoughts/papers/The-Gay-Science-by-Friedrich-Nietzsche.pdf) | Friedrich Nietzsche | |
| Beyond Good and Evil | Friedrich Nietzsche | |
| Beyond The Pleasure Principle | Sigmund [Freud](https://aarnphm.xyz/thoughts/Freud) | |
| The Critique of Pure Reason | Immanuel Kant | |
| The Metaphysics of Morals | Immanuel Kant | |
| Crime and Punishment | Fyodor Dostoevsky | |
| Structure and Interpretation of Computer Programs | Abelson and Sussman | [pdf](https://web.mit.edu/6.001/6.037/sicp.pdf) |
| Man and His Symbols | Carl G. Jung | |

## to read.

### [philosophy](https://aarnphm.xyz/tags/philosophy)

| title | author | notes |
| ----- | ------ | ----- |
| A Treatise of Human Nature | David Hume | |
| The Evolution of Modern Metaphysics: Making Sense of Things | A. W. Moore | |
| [Being and Some Philosophers](https://aarnphm.xyz/thoughts/papers/Being-and-Some-Philosophers.pdf) | Etienne Gilson | |
| The Phenomenology of Spirit | G. W. F. Hegel | |
| The World as Will and [Representation](https://aarnphm.xyz/thoughts/representations) | Arthur Schopenhauer | |
| The Prince | Niccolò Machiavelli | |
| Utilitarianism | John Stuart [Mill](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill) | |
| Meditations on First Philosophy | René [Descartes](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Descartes), French ed. | |
| Existentialism in Social [Pedagogy](https://aarnphm.xyz/thoughts/education) | Søren Kierkegaard | |
| [The Will To Believe](https://aarnphm.xyz/thoughts/The-Will-To-Believe) | William James | |
| The Care of the Self | Michel Foucault | |
| Metaphysical Myths, Mathematical Practice: The Ontology and [Epistemology](https://aarnphm.xyz/thoughts/Epistemology) of the Exact Sciences | Jody Azzouni | |
| Repetition | Kierkegaard | |
| On Certainty | Ludwig Wittgenstein | |
| The Conquest of Happiness | Bertrand Russell | [html](https://russell-j.com/beginner/COH-TEXT.HTM) |
| Being and Time | Heidegger | |
| Pensées | Pascal | [html](https://www.gutenberg.org/files/18269/18269-h/18269-h.htm) |
| Being and Nothingness | Jean-Paul Sartre | |

#### [Nietzsche](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche)

- **The Birth of Tragedy**
- **The Will to Power**
- **Thus Spoke Zarathustra**
- **Twilight of the Idols**
- **On The Genealogy of Morals**
- **Ecce Homo**

#### [Kant](https://aarnphm.xyz/thoughts/Philosophy-and-Kant)

- **The Critique of Practical Reason**
- **Groundwork of the Metaphysics of Morals**

#### [Camus](https://aarnphm.xyz/thoughts/Camus)

- **The Fall**
- **The Rebel**
- **The First Man**
- **Resistance, Rebellion, and Death**

### non-fiction

| title | author | notes |
| ----- | ------ | ----- |
| Deep Work | Cal Newport | |
| Digital Minimalism | Cal Newport | |
| Playing Software: Homo Ludens in Computational Culture | Miguel Sicart | |
| Reimagining Capitalism in a World on Fire | Rebecca Henderson | |
| Principles | Ray Dalio | |
| Mindset | Dr. Carol S. Dweck | |
| The Pleasure of Finding Things Out | Richard P. Feynman | |
| Walden and Civil Disobedience | Henry David Thoreau | |
| Deep Sleep | Jade Wu | |
| Are We Spiritual Machines? | Ray Kurzweil | [html](https://onlinebooks.library.upenn.edu/webbin/book/lookupid?key=olbp56055) |

### fiction

| title | author |
| ----- | ------ |
| Recursion | Blake Crouch |
| The Trial | Franz Kafka |
| Sea of Tranquility | Emily St. John Mandel |
| Oblivion | David Foster Wallace |
| The Uninhabitable Earth | David Wallace-Wells |
| The Idiot | Fyodor Dostoevsky |
| The Brothers Karamazov | Fyodor Dostoevsky |
| Fall On Your Knees | Ann-Marie MacDonald |
| Foundation series | Isaac Asimov |
| The Three-Body Problem | Liu Cixin |
| Robinson Crusoe | Daniel Defoe |
| The Overstory | Richard Powers |
| Rejection | Tony Tulathimutte |

### poetry

| title | author |
| ----- | ------ |
| Dog Songs | Mary Oliver |
| Come Home To Yourself | Déjà Rae |

---

## finished.
### 2024

- **The Triple Helix: Gene, Organism, and Environment** by Richard Lewontin
- **Fear and Trembling** by Søren Kierkegaard
- **Either/Or** by Søren Kierkegaard
- **The Lily of the Field and the Bird of the Air** by Søren Kierkegaard
- **Meditations** by Marcus Aurelius
- **[The Myth of Sisyphus](https://aarnphm.xyz/thoughts/Camus#the-myth-of-sisyphus)** by Albert Camus
- **The Stranger** by [Albert Camus](https://aarnphm.xyz/thoughts/Camus)
- **The Metamorphosis** by Franz Kafka
- **The End of the Affair** by Graham Greene
- **The Little Book of [Deep Learning](https://aarnphm.xyz/thoughts/deep-learning)** by [François Fleuret](https://fleuret.org/public/lbdl.pdf)
- **[The Ego and the Id](https://aarnphm.xyz/thoughts/Freud#the-ego-and-the-id)** by Sigmund Freud
- **Tomorrow, and Tomorrow, and Tomorrow** by Gabrielle Zevin
- **[Web Browser Engineering](https://browser.engineering/onepage.html)** by Pavel Panchekha & Chris Harrelson
- **1984** by George Orwell

### 2023

- **Why I Write** by George Orwell
- **Why I Am So Wise** by Friedrich Nietzsche
- **[Civilisation and its Discontents](https://aarnphm.xyz/thoughts/Civilisation-and-its-Discontents)** by Sigmund Freud
- **Dopamine Nation** by Dr. Anna Lembke
- **The Midnight Library** by Matt Haig
- **Out of Love** by Hazel Hayes
- **In Emergency, Break Glass: What Nietzsche Can Teach Us About Joyful Living in a Tech-Saturated World** by Nate Anderson
- **The Subtle Art of Not Giving a Fuck** by Mark Manson
- **[Pretentiousness: Why it Matters](https://aarnphm.xyz/thoughts/fashion#pretentious)** by Dan Fox
- **The Republic** by Plato
- **Apology** by Plato
- **Symposium** by Plato
- **Pillow Thoughts IV** by Courtney Peppernell
- **Radically Human: How New Technology Is Transforming Business and Shaping Our Future** by Paul Daugherty and H. James Wilson

### 2022

- **Infinite Jest** by DFW
- **Dune** series by Frank Herbert
- **Kafka on the Shore** by Haruki Murakami
- **21 Lessons for the 21st Century** by Yuval Noah Harari
- **The Outsiders: Eight Unconventional CEOs and Their Radically Rational Blueprint for Success** by Will Thorndike

### 2021

- **Working in Public: The Making and Maintenance of Open Source Software** by Nadia Eghbal
- **The Death of Ivan Ilyich** by Tolstoy
- **Godfather** and **The Sicilian** by Mario Puzo
- **1984** by George Orwell

---
slug: cheatsheet
tags:
  - evergreen
description: reconstructed source of "https://aarnphm.xyz/cheatsheet"
title: cheatsheet
date: 2024-10-10
---

A list of cheatsheets for whatever fits my workflow

$$
\begin{aligned}
\text{Big O(micron)} &: O \text{ or } \mathcal{O} \\
\text{Big Omega} &: \Omega \\
\text{Big Theta} &: \Theta \\
\text{Small O(micron)} &: o \\
\text{Small Omega} &: \omega \\
\text{On the order of} &: \sim
\end{aligned}
$$

For instance, average-case quicksort runs in $\mathcal{O}(n \log n)$, and $n^2 + n \sim n^2$ as $n \to \infty$.

---
slug: curius
tags:
  - evergreen
  - hyperlinks
description: curius dot app slash aaron dash pham
title: curius.
date: 2024-01-26
---

See curius.app/aaron-pham or curius.aarnphm.xyz

---
slug: ideas
tags:
  - technical
  - evergreen
description: Liste de projets, d'idées, d'écrits auxquels on reviendra.
title: ideas.
date: 2022-01-25
---

### lettres

- love (wip)
- self-healing and love
- growth after death
- education and pedagogical implications for the next generations
- recommendation system and word2vec (see the sketch after this list)
- social interactions à la carte.
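A minimal sketch of the word2vec-and-recommendations idea above, purely illustrative: embed items as vectors and rank candidates by cosine similarity. Everything here (the `embeddings` table, its values, the item names, the `recommend` helper) is a hypothetical stand-in for vectors a word2vec-style model would actually learn from co-occurrence data.

```js
// Illustrative only: `embeddings` stands in for vectors a word2vec-style model would learn.
const embeddings = {
  espresso: [0.9, 0.1, 0.3],
  latte: [0.85, 0.2, 0.35],
  "green tea": [0.2, 0.9, 0.4],
}

// Cosine similarity between two equal-length vectors.
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0)
const cosine = (a, b) => dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)))

// Rank every other item by similarity to the one the user already likes.
function recommend(liked, k = 2) {
  return Object.keys(embeddings)
    .filter((item) => item !== liked)
    .sort((a, b) => cosine(embeddings[b], embeddings[liked]) - cosine(embeddings[a], embeddings[liked]))
    .slice(0, k)
}

console.log(recommend("espresso")) // ["latte", "green tea"]
```

The ranking-by-similarity shape stays the same whether the vectors come from word2vec, matrix factorisation, or an LLM embedding endpoint; only `embeddings` changes.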
### projets

- LaTeX codeblock renderer for [neovim](https://aarnphm.xyz/uses#neovim), in editor
  - Support KaTeX, and probably MathJax
  - Uses `conceallevel`
- yet another emulator in Rust
  - Want to stream the currently running process and make it clickable?
  - Vim and Emacs support
  - multiplexer
  - stream TTY?

```mermaid
flowchart TD
  1[GUI] --> 2[tty] --> 3[rsh]
  1 --> 5[multiplexer]
  2 --> 1
```

- rsh: new shell language written with Rust-like syntax
  - I get fed up with bash
  - Should be cloud-first?
  - Nix inspiration for caching and package management?
- [Rust](https://aarnphm.xyz/thoughts/Rust) key-value store
  - Think of it as MongoDB but with Redis capabilities
- Dockerfile for LLM
  - [ollama](https://github.com/ollama/ollama)’s Modelfile.
  - Dockerfile frontend, [BuildKit](https://aarnphm.xyz/thoughts/BuildKit), [OCI](https://aarnphm.xyz/thoughts/OCI)-compliant frontend.
  - Stay away from Docker 😄
- disappearing text
  - For svg: [codepen](https://codepen.io/Mikhail-Bespalov/pen/yLmpxOG)

> Im thinking to build a toronto compute company, looking for funding
>
> — aaron (@aarnphm\_) [11 octobre 2024](https://twitter.com/aarnphm_/status/1844775079286120682)

### écriture

- bazil: A [Bazel](https://bazel.build/) for the unversed
  - Bazel is hard to get started with

---
slug: infinite-poem
tags:
  - seed
description: reconstructed source of "https://aarnphm.xyz/infinite-poem"
title: infinite poem
date: 2024-10-11
---

```js
// Assumes RiTa v2+ is in scope, e.g. `import { RiTa } from "rita"` or a script tag.
const rules = {
  start: "$line1\n $line2\n$line3\n $line4\n$line5",
  line1: "What shall a $dog_breed do?",
  line2: "$verbs through the $nature_place,",
  line3: "Then she $verbs her $dog_feature.",
  line4: "$human_action, I $human_verb",
  line5: "This $adj $noun of $emotion.",
  dog_breed: "labrador (4) | terrier | shepherd | beagle | poodle",
  dog_feature: "floppy ears | wagging tail | wet nose | playful eyes | soft fur",
  verbs: "runs | leaps | bounds | trots | dashes",
  nature_place: "meadow | forest | garden | park | beach",
  human_action: "Watching | Smiling | Laughing | Wondering | Marveling",
  human_verb: "contemplate | ponder | appreciate | cherish | admire",
  adj: "simple | joyful | precious | fleeting | eternal",
  noun: "moment | bond | connection | friendship | companionship",
  emotion: "love | happiness | wonder | gratitude | peace",
}

// Generate and print 10 poems
for (let i = 0; i < 10; i++) {
  console.log(`Poem ${i + 1}:`)
  console.log(RiTa.grammar(rules).expand())
  console.log() // Add a blank line between poems
}
```

---
slug: influence
tags:
  - growth
description: A list of folks that inspire me a bunch
title: affecter.
date: 2024-01-23
---

I think a lot about this [quote](https://aarnphm.xyz/quotes#life-jobs-smart) from Steve Jobs, and realised that you are who you surround yourself with. Whether online or in daily life, we populate our minds and our time with the people we hang around and work with.
People to whom I owe a lot, including but not limited to: [Jacky](https://jzhao.xyz/), [Chaoyu](https://twitter.com/chaoyu_), [Sean](https://www.linkedin.com/in/ssheng/), [Hank and John](https://www.youtube.com/@vlogbrothers), [Kieran](https://www.fourtet.net/), Nicole, Jesse, [Tommy](https://tommytrinh.me/)

---
slug: inspo
tags:
  - technical
  - seed
description: cool stuff on the internet
title: website inspiration
date: 2024-10-24
---

## website

_see also: [portfolio trail](https://curius.app/aaron-pham/portfolio)_

- Brian Sholis’ website: clean visuals, great content ([link](https://www.sholis.com/))
- Jacky’s website ([link](https://jzhao.xyz/))
  - daylightcomputer-inspired, but in pure CSS and [HTML](https://github.com/jackyzha0/sunlit)
- Daylight Computer ([link](https://daylightcomputer.com/))
  - clean aesthetics with nice hierarchical components
  - warm graphics, smooth animation

## essay

- vintage, letter type

## protocol

- Willow: protocol for synchronisable data store ([link](https://willowprotocol.org/specs/index.html#specifications))

## resources

---
slug: movies
tags:
  - evergreen
description: reconstructed source of "https://aarnphm.xyz/movies"
title: movies.
date: 2024-02-07
---

A (mostly) up-to-date list of films, movies, and shows that I have watched or put on the watch list.

> Similar to an [antilibrary](https://aarnphm.xyz/books), an anti-blockbuster is a collection of movies and short films that represent the art of film-making.

Honourable mentions: [mubi](https://mubi.com/en/ca) and [a24](https://a24films.com/)

## to [watch.](https://aarnphm.xyz/thoughts/Cinematography)

- [ ] The King of Comedy (1982)
- [ ] Dead Poets Society (1989)
- [ ] La Haine (1995)
- [ ] Flame & Citron (2008)
- [ ] Blue is the Warmest Color (2013)
- [ ] Frances Ha (2012)
- [ ] Dallas Buyers Club (2013)
- [ ] Paterson (2016)
- [ ] Manchester by the Sea (2016)
- [ ] The Killing of a Sacred Deer (2017)
- [ ] The Favourite (2018)
- [ ] Under The Silver Lake (2018)
- [ ] The Father (2020)
- [ ] Poor Things (2023)
- [ ] Maestro (2023)
- [ ] Dune: Part Two (2024)

## recurring.

### vintage.

- Citizen Kane (1941)
- Casablanca (1942)
- The Godfather (1972)
- Chinatown (1974)
- Scarface (1983)
- Midnight Run (1988)
- Goodfellas (1990)
- Schindler’s List (1993)
- Pulp Fiction (1994)
- Forrest Gump (1994)
- Good Will Hunting (1997)
- Notting Hill (1999)
- Chicago (2002)

### thriller.

- The Breakfast Club (1985)
- The Silence of the Lambs (1991)
- My Cousin Vinny (1992)
- The Shawshank Redemption (1994)
- No Country for Old Men (2007)
- Whiplash (2014)
- Fury (2014)
- The Revenant (2015)
- La La Land (2016)
- Hacksaw Ridge (2016)
- Joker (2019)
- The Banshees of Inisherin (2022)

### comedy.

- Intouchables (2011)
- The Intern (2015)
- Jojo Rabbit (2019)

### buster.

- Saving Private Ryan (1998)
- Fight Club (1999)
- The Social Network (2010)
- Hacksaw Ridge (2016)
- Blade Runner 2049 (2017)
- John Wick series (2014 - 2022)
- Dune (2021)

### a24.

- Ex Machina (2015)
- Lady Bird (2017)
- The Lighthouse (2019)
- Uncut Gems (2019)
- The Green Knight (2021)
- The Tragedy of Macbeth (2021)
- Everything Everywhere All at Once (2022)
- Causeway (2023)
- Past Lives (2023)
- The Whale (2023)
- Dream Scenario (2023)

### bond.

- Dr. No (1962)
- Goldfinger (1964)
- Never Say Never Again (1983)
- Octopussy (1983)
- Casino Royale (2006)
- Skyfall (2012)
- Spectre (2015)

### wes anderson.

- Rushmore (1998)
- The Royal Tenenbaums (2001)
- The Life Aquatic with Steve Zissou (2004)
- The Darjeeling Limited (2007)
- Fantastic Mr. Fox (2009)
- Moonrise Kingdom (2012)
- The Grand Budapest Hotel (2014)
- Isle of Dogs (2018)
- The French Dispatch (2021)
- Asteroid City (2023)

### christopher nolan.

- Following (1998)
- Memento (2000)
- Insomnia (2002)
- Batman Begins (2005)
- The Prestige (2006)
- The Dark Knight (2008)
- Inception (2010)
- The Dark Knight Rises (2012)
- Interstellar (2014)
- Dunkirk (2017)
- Tenet (2020)
- Oppenheimer (2023)

### martin scorsese.

- Mean Streets (1973)
- Taxi Driver (1976)
- The Wolf of Wall Street (2013)
- The Irishman (2019)
- Killers of the Flower Moon (2023)

### short.

- The Wonderful Story of Henry Sugar (2023)

### shows.

- Black Mirror
- Bojack Horseman
- True Detective (2014)

---
slug: posts/2023
tags:
  - fruit
  - growth
description: "2023: A letter to myself."
title: "2023: a letter."
date: 2023-12-31T00:00:00Z
---

_tw: self-harm. This is a public journal entry. Some of the following writing may contain information that you might find disturbing. Please treat it with kindness and care should you choose to read it._

This is 2023. A letter to myself. I will start with a mere nostalgic reflection filled with the good and the not so good, and end with what I wish to accomplish in 2024.

---

_To 2023 self,_

Per tradition, your year began with a visit from your parents. We decided to do SF-NY, with good food, shopping, and visiting relatives. Living on a different continent, you’ve longed for the simple joys of home – those weekend returns, the comfort of a home-cooked meal, the tender gesture of cut fruits like in your younger days. Now, these moments are treasured yearly reunions, a home-away-from-home gathering, where relatives journey across the globe to rekindle familial bonds. Despite finding these gatherings overwhelming, in your heart, you cherish these visits, holding onto the warmth of your parents’ presence.

Returning to San Francisco was like stepping back into a vibrant painting, a city pulsating with life and a myriad of experiences. It was here, amidst its dynamic streets and scenic vistas, that you found writing and reading, surrounded by a community of inspiring individuals. And it was in San Francisco that you and J crossed paths.

Running became a rhythmic solace amidst sleepless nights and turbulent thoughts. It was a discipline that anchored you, a steady presence in the chaos of young adulthood. You would find yourself lacing up for a run, whether it was pushing through work until dawn or embarking on a 5-mile run along the Bay, regardless of how tired you were. In a sense, running offered a sense of stability, a means to channel your energy and thoughts, though often at the expense of your physical and emotional health, under the guise of youthful resilience. _“You are young; you should be fine,”_ you would tell yourself, perhaps a bit too cavalierly.

Then came the spontaneous decision to go backpacking in Yosemite with S. Despite not being in the best shape, the allure of adventure was irresistible. It was a journey of firsts – your inaugural backpacking trip, your first visit to the awe-inspiring Yosemite, and your first encounter with the chill of near-freezing nights outdoors. Each moment was a revelation, an invitation to embrace the unfamiliar and challenging, a vivid reminder to savor every new experience life offered.

![](https://aarnphm.xyz/posts/2023/../../posts/images/2023-collage-yosemite.svg)

---

Quoting [Nietzsche](https://aarnphm.xyz/posts/2023/../../thoughts/Philosophy-and-Nietzsche), _“To live is to suffer, to survive is to find some meaning in suffering”_.
I don’t know about you, but the moment life hints at normalcy and tranquility, a restless itch starts to stir within me. It’s like K often says with a knowing smile, “you are a messy gyal.” There’s a peculiar comfort in chaos, a familiar embrace in the whirlwind of change that I’ve always gravitated towards. Now, as I stand at this crossroads in the Bay Area, that restlessness is more pronounced than ever. I had taken a leave from school to move to the Bay for work, a decision that now hung in the balance.

The Bay Area beckoned me to stay. Here, life was a beautiful mosaic of experiences – doing what I love, being surrounded by friends, and cherishing those weekends with J. Yet, it was shadowed by the looming uncertainties of visa statuses, a constant undercurrent of anxiety about the future.

The alternative, returning to Canada, loomed like a storm cloud. It’s a retreat into a past that’s drenched in discomfort, a reversion into what I’ve always perceived as a life of constraints and unfulfilled potential. The very thought of leaving J, hitting pause on our shared dreams in San Francisco, sends a pang of sorrow through me. Canada isn’t just a different location; it’s a return to a version of myself that I’ve struggled to leave behind. The visa challenges remained, a familiar yet unwelcome companion, no matter which border I call home.

In the quiet moments, your mind wrestled with these paths, each fraught with its own set of fears and what-ifs. “Stop, don’t leave. You can do it. Stay,” a voice within you whispered, a blend of hope and desperation. It was a plea to cling to the life you’ve started to build here, to not let go of the joy and love you’ve found. This internal dialogue became your constant soundtrack, a reflection of the turmoil that dances within your heart.

It was a Sunday afternoon; you and J were enjoying a cup of coffee in the Marina. Like whispers of the gentlest breeze, the wind danced through J’s hair, each strand a melody, weaving tales of love in the air. It carried J’s scent, a tapestry of [rose](https://aarnphm.xyz/posts/2023/../../thoughts/Scents#le-labos-rose-31) subtly entwined with earth’s warm embrace, a tender symphony barely touching the senses. You whispered in her ears, _“I have to leave for Toronto.”_

Her hair, once a playground for your fingers, now swayed to the rhythm of a compassionate wind, each strand moving with the grace of unshed tears. The air, perfumed with the delicate scent of roses and spices, seemed to hold our memories, cradling them gently as if to soften the blow of parting. Our eyes met, tinged with the inevitable sorrow of farewell. Words were unnecessary; our hearts spoke in silent verses, each beat a soft adieu. It was a parting not of anger or regret, but of two souls acknowledging their journey together had reached a tender, inevitable end. We both sat there and cried in silence. The gentle wind, a compassionate witness to our farewell, carried away the last whispers of a love that was as beautiful as it was ephemeral, leaving behind a calm, poignant tranquility.

---

It was now sunny July, and you found yourself back in Canada, slowly acclimating to the new life. The makeshift bed, consisting of two fitted sheets, a duvet, and a pillow while waiting for furniture to arrive back from SF, offered a modest comfort yet lacked the essence of home — a feeling that remained elusive, a sense of displacement that gently lingered. This wasn’t your first rodeo.
Relocation has somewhat become normalcy for you: leaving Hanoi for boarding school seven years ago; moving across Canada for university and living in campus housing; moving into student housing _alone_ amidst 2020’s misfortunes; returning to Vietnam shortly after; moving back to Canada for online university in an overcrowded, unhygienic household filled with strangers; followed by your determination to leave Canada once and for all for SF, to chase the “American dream”. However, this time, the feeling stirred differently.

Gone was the wide-eyed public school kid who first stepped onto Canadian soil, filled with aspirations. Faded, too, was the image of the bewildered freshman adrift in a sea of unfamiliar faces at university. And the weary, drained engineer who sought refuge in San Francisco, seeking an escape, had evolved. Now, as you sat amidst the quiet of your new space, you grappled with a curious blend of familiarity and foreignness, a paradox yet to be unraveled. It was as though each move had subtly reshaped you, leaving you at this juncture—a point where the past’s reflections and the present’s realities were gently converging, weaving a tapestry of your journey, both unique and universal. In this moment, you were at the cusp of reconciling these myriad selves, each a chapter in the unfolding story of your life.

Staring into the abyss, you wonder what would unfold in this next chapter of life…

---

There, in the quietude of your new surroundings, you embarked on a pilgrimage of the self. It was a journey marked not by physical distances but by the rich, inner landscapes you traversed. In the company of books – those silent yet eloquent companions – you sought refuge. The philosophers, with their timeless musings, the historians narrating tales of yore, and the modern sages offering insights of the present, became your guides in this quest for understanding.

You also rekindled old friendships, those that had lain dormant in the wake of your sojourn to San Francisco. It was as if you were gathering scattered pieces of a once-familiar mosaic, each friend a fragment of a life you once knew.

There was a sense of quiet accomplishment in the transformation of your apartment. Each piece of furniture was a testament to a life being patiently rebuilt, piece by piece. Physical exertion, too, found its place in your routine – the climbing gym, the disciplined rhythm of your runs, a pursuit of wellness that contrasted with the less tangible journey of the mind. The runs, though lacking the scenic vistas of San Francisco, offered a subtler, more introspective landscape.

Work, too, assumed a new significance with [OpenLLM](https://aarnphm.xyz/posts/2023/../../thoughts/work#openllm----serve-fine-tune-and-deploy-llms-in-production). It demanded of you a pace and a depth of understanding that was both exhilarating and daunting. The ability to assimilate, to adapt swiftly, became what you grew accustomed to. Then there was HackTheNorth. Convincing S to sponsor HackTheNorth, and your subsequent workshop on language models, was not merely a professional victory, but a reconnection to a vibrant belief in hacker culture, filled with anticipation and excitement for building technology.

![](https://aarnphm.xyz/posts/2023/../../posts/images/2023-collage-htn.svg)

---

![](https://aarnphm.xyz/posts/2023/../../posts/images/2023-collage-heal.svg)

You [showed up](https://x.com/daniellefong/status/1732922352244302196).
You showed your love and affection for your friends through the warmth of home-cooked meals. Potlucks, tasting menus - they were your ways of nurturing the bonds of friendship, a respite from the pressures of student life. Remember that Halloween, when you cooked a feast, ensuring your friends were well-fed and ready for a night of revelry? Surrounded by the sizzling skillet and steaming hot mashed potatoes, you found a sense of belonging.

You’ve never seen yourself as the quintessential party-goer, often feeling like an observer on the fringes of the festivities. But you went, drawn by the camaraderie, even as a part of you remained reluctant. At the party, a familiar sensation crept in – a detachment, a subtle unravelling of your connection with the scene around you. Your inner id, usually so deeply buried, surfaced to whisper a stark truth: you didn’t quite mesh with this crowd. This realisation triggered a rush of anxiety, a feeling that swelled like a wave, urging you to escape, to find solace in the quiet of your own space.

So, you left. You left the noise, the laughter, and returned to the silence of your home. There, in the aftermath of the evening’s earlier warmth, you were greeted by the remnants of your culinary endeavours – the pots, pans, and utensils bearing testament to the meal shared in love and friendship. In the stillness of your kitchen, a profound sense of loneliness enveloped you. You sat there, amidst the silent witnesses of your earlier joy, and tears began to fall. It was a poignant contrast – the joy of cooking for others and the solitary ache of feeling out of place, misunderstood.

---

_Remember that breakup with J?_

The summer was a portrait of heartbreak, painted in shades of sorrow and restless nights. It’s funny how we try to mend ourselves, isn’t it? With a schedule as a plaster over a gaping wound. I had it all mapped out, or so I thought. But life, in its infinite jest, has a way of upending even the best-laid plans.

It was on a nondescript day, November 13th, that I found myself on a date with a woman I’d met in the digital maze of online dating. The evening was unremarkable, tinged with the effort of trying to reconnect with the world. We ended up at her place - an encounter that was, at best, mediocre. In the midst of the intimacy, memories of J invaded my mind, unbidden, like ghosts from a past life.

J and I, we were polyamorous. Unorthodox, yes (but not really in SF), but to each other, we were anchors. My reluctance to move back to Canada was rooted in her – she was my ‘it’, my endgame. And then, as if summoned by the universe, a message from J pierced the night. Her words, simple yet loaded, unravelled me. We had agreed to silence, to give time and space for healing. But there I was, haunted by the love that embraced my most authentic self, the part of me unshielded by the armor I’d forged over the years.

That night was a symphony of restlessness, the presence of another unable to fill the void. At 3:30 am, my phone shattered the silence – it was J. Panic and longing intertwined as I answered. _What harm could there be?_

What followed was a mosaic of late-night conversations, spanning many weeks. J’s voice, laced with tears, spoke of longing and loss. Our talks were a roller-coaster of emotions – laughter quickly drowned by arguments, smiles eclipsed by sorrow. I was a cocktail of anger and sadness; I had moved on, or so I had convinced myself. Why now, in the midst of this? J’s behaviour was a mystery, a deviation from her usual sensibility.
And there I lay, sleep eluding me, troubled by the thought of her distress. It was a pain that seeped deep into my bones, a relentless reminder of a love that refused to be buried.

One morning, you found yourself seeking refuge in the kitchen. It’s curious how, in times of turmoil, we gravitate towards the mundane, the ritualistic. There’s a certain healing power in cooking – the methodical chopping of vegetables, the hiss and dance of ingredients in the skillet, the rich tapestry of scents that fill the air. But even in this culinary cocoon, the spectre of J haunted you, infusing your silent tears with the bitterness of memory.

As you lost yourself in these reflections, a momentary lapse in attention brought a sharp pain – a startling intrusion into your reverie. A drop of blood bloomed on the cutting board, a vivid contrast against the muted colours of the vegetables. The sight of it, coupled with the realisation that you had inadvertently cut your finger, brought a wave of lightheadedness. Yet, even as the shock set in, you instinctively reached for a towel, pressing it firmly against the wound. With a calm born of necessity, you navigated your way to the first-aid kit. Your hands, guided by a survival instinct that momentarily eclipsed the overwhelming thoughts of J, worked diligently to clean and dress the wound.

After tending to the injury, you slumped against the fridge, your gaze drifting aimlessly to the ceiling. In an instant, a thought flickered through your mind – the notion of ending it all. But just as quickly as it surfaced, it dissipated at the thought of your mother. The image of her, perhaps unaware of the depths of your current struggles, yet invariably intertwined with your existence, acted as a grounding force. In the quiet of your kitchen, with the pain in your finger a sharp but grounding sensation, you were left to confront your ‘ego’ – the pain, the emotion, the longing, the love, and the indomitable will to endure.

---

Navigating the aftermath of a first serious relationship is akin to finding one’s way through an uncharted wilderness, especially for someone who had always embraced solitude. My relationship with J was a journey into unexplored emotional depths, a discovery of a love both profound and transformative. Yet, when it ended, I was adrift in a sea of emotions, overwhelmed like a teacup caught in a relentless downpour.

In relationships, we often find ourselves surprised by the depths and complexities of those we hold close. J was a revelation in this sense, a mirror to parts of myself I hadn’t known. But as the emotional turbulence continued, my logical self, long subdued, finally asserted itself. It whispered of the need for closure, for the sake of my own well-being. The final call to J was a bridge between past and future, a necessary severance, blocking all lines of communication going forward. This decision, difficult as it was, felt like the only way forward, a path to healing for both of us.

Sharing experiences with Mom did lift a weight off your shoulders. It marked a turning point, a chance to truly move on. And before you knew it, Christmas break was upon you. Your return to school was marked by a fresh perspective, one shaped by your stint in SF. School is now a place for you to explore your interests and experience the joy of learning, as it should be. For the first time, you found joy in the very structure of academia.
![](https://aarnphm.xyz/posts/2023/../../posts/images/2023-collage-finals.svg)

---

2023’s Aaron did:

- Work-wise, [OpenLLM](https://aarnphm.xyz/posts/2023/../../thoughts/work#openllm----serve-fine-tune-and-deploy-llms-in-production): we actually made revenue this year, and got to work with some very, very cool companies!! You also did [buildspace S4](https://buildspace.so/)
- Favourite movie that made me cry has to be [Past Lives](https://www.youtube.com/watch?v=kA244xewjcI\&ab_channel=A24). The quintessential symphony of my journey so far.
- Favourite restaurant is [CIMA](http://www.cimaonlocke.ca/). The food is amazing and I love the staff there. I have cried here many, many times.
- I found philosophy somewhat cumbersome before, but this one class in university did change my perspective on the subject. Read [Nietzsche](https://aarnphm.xyz/posts/2023/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche)’s work, explored metaphysics, and found _Beyond Good and Evil_ my favourite of 2023.
- Expanded my vinyl collection, including Daft Punk, Fleetwood Mac, Led Zeppelin.
- Made some house [tunes](https://www.youtube.com/playlist?list=PLsRPzRsbp3lCxe4gXH4S4Zf38X_45Oj6N), very much inspired by Fred again, Four Tet, and Peggy Gou. You are still finding your tone, but keep working at it. The following was hacked together in an afternoon: [](https://aarnphm.xyz/posts/2023/../../posts/images/2023-flac-1.mp3) _ID 1_
- Also, that one YouTube video I kept playing on repeat is [this one from the Lot Radio](https://www.youtube.com/watch?v=hvO0PrMBH9I\&ab_channel=TheLotRadio), or anything from this [query](https://www.youtube.com/results?search_query=four+tet+lot+radio)
- Favourite object is this [10-inch pan](https://madeincookware.com/products/stainless-steel-frying-pan/10-inch). I kid you not, having a stainless steel pan feels like a hack. Absolutely love this bad boy. Second favourite object is this [turtleneck](https://www.ralphlauren.ca/men-clothing-sweaters/wool-cashmere-turtleneck-sweater/625236.html?dwvar625236_colorname=Vest%20Olive%20Heather#q=turtleneck\&br=t\&fq=division%253A%2522Men%2522\&start=1). I wore this pretty much everywhere. If you see me IRL, chances are you saw me wearing this.

---

Looking back, twenty-twenty-three was filled with moments of joy and sorrow, of love and loss. What I want for 2024:

- `atelier with friends`, where you can pay what you think the meal is worth. I want to do at least _10_ this year.
- Continue down the rabbit hole of philosophy: Deleuze and Camus
- I want to tend to my garden a bit better. There are too many `draft` and `noindex` notes that need to be taken care of. Mainly because 2023 was pretty much turmoil galore 🤗
- Learning to _let go_, and to set boundaries.
- Finish that G2. (_ok I do need to get a driver’s license_)
- Apprendre le français

Kindly,

_Your present self_

---
slug: posts/Chaos
tags:
  - sapling
  - growth
  - self
description: Chaos has been, and always will be, a driving force within life, intuitive yet disheveled. And a few things I learned growing up in a foreign land.
title: Chaos is intuitive yet disheveled.
date: 2024-02-18
---

And a few things I learned growing up in a foreign land.

_This is an extension of [chaos](https://aarnphm.xyz/posts/Chaos/../../thoughts/Chaos) that has recently occupied my [chain of thought](https://aarnphm.xyz/posts/Chaos/../../thoughts/NLP).
See this on [substack](https://open.substack.com/pub/livingalone/p/chaos-has-and-always-be-a-driven?r=1z8i4s\&utm_campaign=post\&utm_medium=web\&showWelcomeOnShare=true)_

![](https://aarnphm.xyz/posts/Chaos/../../posts/images/passage-giorgio-m.webp)

_Passage by Giorgio Morandi_

Chaos isn’t merely an undercurrent of life; it’s a pervasive force, ever-present, often simmering just beneath the surface, ready to erupt and manifest in myriad forms. It serves not only as a backdrop in the narratives of storytellers and the musings of philosophers but also as a distinct entity with the power to challenge those brave enough to embrace its unpredictability. To move abroad, to step into the unknown, is to court chaos – to acknowledge and accept the inevitability of change and the sharp tang of constant motion.

So far, I’ve lived on my own (or far away from family) for a third of my life, having made the leap to Canada at 16. This move, though seemingly late compared to my high school peers, was a turning point. It wasn’t just a change of scenery; rather, it formed a new way of seeing and being in the world for me. To articulate the essence of moving to a new continent, let alone of partaking in the Western [educational](https://aarnphm.xyz/posts/Chaos/../../thoughts/education) system, is, to this day, a task fraught with complexity that I have yet to comprehend.

In the years before my departure, I was enrolled at [Hanoi-Amsterdam](https://en.wikipedia.org/wiki/Hanoi_%E2%80%93_Amsterdam_High_School), what some would consider the “crème de la crème” of the public school system in Vietnam. Middle school was pretty much an endless march of memorisation and night classes, all leading up to the high school entrance exams. Within this rigorous routine, there was no room for complaints or questions. I wasn’t content, yet I found a way to push through, not realising the toll it was taking on my mental and physical health. Therapy, attempted much later, didn’t reveal anything new. Perhaps my continued sessions are a search for the external validation that I’ve longed for. My sense of [self](https://aarnphm.xyz/posts/Chaos/../../thoughts/Value) was intertwined with being accepted into this institution.

---

```poetry language=fr
Three weeks before the entrance exam, or something like that.
Saturday afternoon.
```

The sun blazed down with a ferocity that seemed almost personal, its rays relentless against the backdrop of an afternoon sky devoid of clouds. Inside, within the four walls of the room where I had spent my years growing from a child into something resembling an adult, I sat hunched over my literature review. The task was simple in theory: memorize one of three essential poems. Yet, as the sunlight fought its way through the window, casting a harsh light on the pages before me, the words seemed to dance and dodge my grasp, refusing to be tamed. My focus was a blade, dulling with each failed attempt to carve the verses into my memory.

The stillness of the room, a stark contrast to the turmoil within me, was punctuated only by the occasional creak of the house settling, as if it too strained under the weight of the heat. The air was thick, the kind of heat that makes the mind sluggish, the body weary. It was as if the entire world outside had paused, holding its breath, while I waged my silent battle within these familiar walls. Frustration mounted within me, a tide that threatened to breach its banks.
I pressed on, the words of the poem blurring before my eyes, each line a testament to my faltering resolve. My mom, ever attuned to my struggles, sensed my distress. Her suggestion to move on was gentle, her words soft, “It’s okay, darling, let’s skip this one.” But to me, they sounded like a verdict, a confirmation of my fears. At 15, her words did not offer the comfort she intended. Instead, they unleashed the floodgates, and tears streamed down my face, a silent scream of defiance and despair.

```poetry language=fr
Mom, I can't fail this exam.
```

I managed through sobs, the words thick in my throat. The room, with its memories and familiar comforts, felt suddenly alien, a witness to my vulnerability. In that moment, the outside heat, the oppressive stillness, and the chaos of my inner turmoil melded into a single, inescapable reality.

---

Even after securing my place at Hanoi-Amsterdam, my disdain for it grew. The competitive and toxic atmosphere was a far cry from what I expected. It was a battleground for status, with little regard for collaboration or personal growth. My mom, herself an educator, saw the system’s failure to nurture curiosity or critical thinking. So, when the chance to study abroad presented itself, I seized it, leaving Vietnam behind. This decision marked the start of a tumultuous journey within.

[Entropy](https://aarnphm.xyz/posts/Chaos/../../thoughts/Entropy) was seemingly first introduced to me in the form of the Canadian education system. The transition from the rigid, rote-learning environment to the more open, discussion-based system in Canada was jarring. The shift from a public school to a private boarding school was equally disorienting. The culture shock was palpable, and the adjustment period was fraught with challenges. I was a stranger in a strange land, a fish out of water, and the chaos of my new reality was overwhelming. I was completely baffled, destroyed, up to no good (if you knew me, you know what I’m talking about!). But one thing that I learnt from all the trauma accumulated throughout my experience at Amsterdam was that “Mama ain’t raised no quitter.” Thus, it was a not-quite-okay-but-found-a-functional-way-to-survive mental model that carried me through high school. Then the rest was history.

Seemingly, this untamed, curious inner child still clings to my being and propels me forward. It is that inner chaos that encourages me to embark on this journey of understanding.

> The world is a scary place, but I’m learning to cope through it. The [Übermensch](https://aarnphm.xyz/posts/Chaos/../../thoughts/Philosophy-and-Nietzsche) crossed over the bridge and guided me through the trenches of life.

---

![](https://aarnphm.xyz/posts/Chaos/../../posts/images/cima.webp)

> I’m not sure where I want to go from here.

Writing it down felt like opening a door I have long left shut. Each word was a step deeper into memories I had neatly folded away, not realising how much they still pulsed with life beneath the surface. Each of them felt like a sword carving deep into the heart, prying open the floodgates of emotions long buried. It’s one thing to carry your past quietly within you, another entirely to lay it out for the world—and yourself—to see. Suddenly, the chaos I thought I had managed whispered louder, demanding attention.

[Equanimity](https://aarnphm.xyz/posts/Chaos/../../thoughts/Chaos#versus-equanimity), that state of calm balance, feels elusive, almost mythical, when you’ve danced with chaos so intimately.
It’s as if I’ve befriended the storm, finding a strange comfort in its unpredictability, its relentless energy. This chaos, it doesn’t just disturb; it defines, shaping the contours of who I am, how I see the world. There’s a fear in tranquility, a suspicion of its silence. What does it mean to be at peace when you’ve grown accustomed to the noise?

Yet, this journey—my journey—isn’t about conquering the chaos but learning to live with it, to see its patterns and understand its rhythms. Maybe equanimity isn’t about taming the monster but recognizing it as a part of the self, a reflection of the complexities and contradictions that make us human. The pursuit of balance isn’t a battle but a negotiation, a conversation with the parts of ourselves we fear and love in equal measure.

Embarking on this exploration of different “entropic phenomena,” as I’ve come to call it, isn’t running away. It’s a [search](https://aarnphm.xyz/posts/Chaos/../../thoughts/Search) for understanding, a way to navigate the tumult with eyes wide open. There’s beauty in the chaos, lessons in the turbulence. And perhaps, in acknowledging this, I move closer to the equanimity I seek—not as a destination, but as a way of being, fluid and ever-evolving, amidst the storms and stillness alike.

---

Last but not least, I would leave you, future Aaron, with a few questions to which past-Aaron has long sought an answer. Let us, the duality of self, partake in a [Socratic dialogue](https://aarnphm.xyz/posts/Chaos/../../thoughts/questions); hopefully, through the process, we can find some normalcy within ourselves:

## Q: who are you trying to become?

A: Perhaps it is less about becoming and more about unravelling the complexities from within. There is a certain naive desire, a childlike curiosity, that propels me towards the unknown, the seas of uncertainty. In embracing this naive desire, I become a vessel of my own making, navigating the complex seas of existence. As it may be, at the moment, I’m trying to protect that child and shield him from the turbulence and chaos we call life.

## Q: why can’t you move back home?

A: Consider the river and the dam. The river, a living artery, courses from its source with a purpose as clear as its waters. It meanders, shaped by the land it traverses, until it reaches the dam. Here, it lies in a deep réservoir, a body of water in waiting, destined to flow through turbines and continue its journey downstream. This cycle is perpetual: the sun draws the water skyward, and it returns as rain, nourishing the earth on its way back to the river. But the droplets that return are transformed, no longer the same entities that once rested in the dam’s embrace.

The act of leaving one’s home for foreign shores is akin to such a journey - a voyage of transformation, of encountering new landscapes, of merging with unfamiliar currents. When one leaves home, they embark on a trajectory vastly different from those who stay. The familiar becomes distant, and upon return, the once-known world feels alienated. You stand apart, changed in the eyes of those who remember who you once were. “Home” remains a static concept, a memory preserved in amber, while you, like the river, have been irrevocably altered by your experiences.
_In other words, this is often known as [the theme of displacement](https://aarnphm.xyz/posts/Chaos/../../thoughts/displacement)_

To return home is to face a poignant paradox: the physical space may be unchanged, the same faces may greet you, the house of your childhood may still nestle in its familiar spot, but your perception of it all has shifted. Gone is the person you once were; you have now become the confluence of experiences that mold the “now” you, just as the returning water is forever changed by its journey.

Yet, despite these changes, the essence remains. The being of ‘aqua’ remains unchanged, just as the inner child within us persists. It is this unchanging essence that bridges the gap between the person we have become and the place we once called home. The question, then, is not why you cannot move back home, but rather how one can reconcile the transformed self with a place that is both intimately familiar and strangely foreign, a place etched in memory, unchanged by time yet estranged by the journey’s passage.

## Q: what do you want to achieve?

A: I want to achieve a sense of peace, a balance between the chaos and the stillness: navigating the tumult with grace, and learning to let people in. I want to look back on what we have gone through, the stillness, the moments of joy and sorrow, and know that I have lived fully, embracing the complexities and contradictions that make me human. I want to settle down, finding a place where I truly find happiness, and sparring partners who will help me enjoy the journey a lil bit more.

## Q: what is next?

A: Change is hard; it pushes us from the comfort of our well-defined boundaries, daring us to step beyond the familiar. It whispers of growth, of the necessity to stretch our skins beyond the contours of our current selves. This leap, from one domain to another, is fraught with challenge, yet it pulses with the thrill of exploration.

Yet, in this era, the drive for transformation often crashes against the shores of economic reality. Monetary value trickles in sparingly, hardly enough to spark the fires of self-renewal. Chaos, in its disdain for the stagnant, scoffs at the notion of safety. Safety, a gilded cage, stifles growth, ensuring that within its confines, we remain less than what we might become.

Life, then, poses its eternal riddles: Why does fear of the unknown paralyze us so? How do we stand firm in the belief that we are not solitary wanderers in this vast expanse? The warmth of unseen affections often goes unnoticed, yet in the heart’s quiet moments, we understand that our absence would echo in spaces we have touched. The world, with its myriad terrors and wonders, unfolds before us, a realm where the overman’s gaze might fall upon us. Yet, this overman, this ideal, is but a mask, a collective facade beneath which we all seek refuge.

An unexpected call from a high school friend, a rarity, blooms like a flower in the desert. It’s a testament to the enduring nature of connections, a comforting reminder that amidst the vastness, there are anchors, points of light in the familiarity of shared pasts.

But the immensity of it all can be overwhelming. Life teems with endless possibilities, a ceaseless buzzing that fills the mind with anxiety. The world, too large, our time, too fleeting, and the soul, too eager, finds itself adrift in a sea of potential paths. I’ve learned the art of detachment. People, with their inherent unpredictability, often disappoint.
By tempering expectations, we shield ourselves from the sting of disillusionment. Camus mused on alienation, a reflection on the distance between the self and the other, a chasm often widened by unmet expectations.

What lies ahead is a question that perpetually dances on the edges of my thoughts, a melody whose tune is both haunting and invigorating. Perhaps the answer to this enigma doesn’t reside in a single destination or outcome but rather in the delicate equilibrium between the facets of my being. On one hand, there’s the driven Aaron, fueled by curiosity and a relentless pursuit of excellence. This Aaron is a force, a whirlwind of ambition and determination, always pushing forward, always reaching for the next peak to conquer. On the other hand, there exists another Aaron, one who carries the weight of past hurts and seeks not just to advance but to heal. This Aaron understands that growth isn’t solely about personal achievements but also about nurturing and repairing the web of relationships that envelops him. This version of myself is attuned to the quiet, often overlooked work of mending bridges and soothing wounds, both his own and those of the people around him.

The path forward, then, might not be a straight line but a winding road that requires navigating the complexities of these dual identities. It’s about recognising that the quest for achievement and the journey toward healing are not mutually exclusive but are, in fact, complementary forces. By embracing both the drive to excel and the need to heal, one can forge a way forward that honours the entirety of one’s aspirations. In this balance, you might find not just the next step but a deeper understanding of what it means to truly live. It’s about making peace with the multifaceted nature of my desires and recognising that every facet, whether driven by ambition or the need for connection, plays a crucial role in defining who I am and who I aspire to be. The road ahead is one of integration, where the driven and the broken parts of me walk hand in hand, each lending strength to the other as I continue to explore the vast landscape of possibilities that life offers.

With regards,

Anh P.

![](https://aarnphm.xyz/posts/Chaos/../../posts/images/aaron-younglings.webp)

---
slug: posts/Questions-about-Apology
tags:
  - philosophy
  - fruit
description: Questions about Plato's Apology
title: Questions about Apology
date: 2023-11-09
---

In Plato’s [Apology](https://aarnphm.xyz/posts/Questions-about-Apology/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato#apology), Socrates delineates a distinct boundary between pursuing a life of justice and engaging in politics. He posits that a life devoted to righteousness is fundamentally at odds with the realm of political involvement (Apology, pp. 41-42). Through the Socratic dialogue of Socrates’ own trial, Plato illustrates an examination of the moral and ethical foundations of the Athenian society and political system, underscored by Socrates’ assertion, _“He who will fight for the right, if he would live even for a brief space, must have a private station and not a public one”_. Consequently, I find myself aligning with Socrates’ perspective, asserting that leading a just life is an endeavour incompatible with holding political office.
[Socrates](https://aarnphm.xyz/posts/Questions-about-Apology/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates) ascribes his abstention from political engagement to a divine mandate (“a voice”) directing him towards the pursuit of [truth](https://aarnphm.xyz/posts/Questions-about-Apology/../../thoughts/Will-to-Truth) and virtue. At the outset of the trial, Socrates mentions a prophecy from the Oracle of Delphi, which declares that “no man \[is] wiser” than Socrates. Spurred by this proclamation, Socrates engages with those renowned for their wisdom through scrutiny and questioning in an attempt to unravel the oracle’s message. Yet, none could furnish satisfactory answers to his inquiries, leading Socrates to a profound realisation: his true wisdom is rooted in recognising his own ignorance, in knowing that he knows nothing (Apology 21a-23b). Socrates then embarks on a path of reminding those around him always to use intellect to scrutinise their lives and to question whether they live truthfully, embodying a commitment to virtuous living (Apology, 23b-23d). This philosophy is further emphasised in his dialogue, where the statement “the unexamined life is not worth living” encapsulates his conviction and mission to lead a life rooted in truth and virtue.

Socrates’ commitment to a virtuous life often stood in stark contrast to the political pragmatism of Athens, a discrepancy which earned Socrates numerous adversaries over the years as he advocated for a virtuous way of life. Throughout his trial, Socrates shed light on the corruption ingrained within the Athenian democratic system, as evidenced by the charges against him—corrupting the youth and displaying impiety towards the Athenian pantheon (Apology 24a-28b). These accusations stemmed from his associations with individuals who had fallen out of political favour in Athens post-Peloponnesian War (Britannica). Upheld by the principle that “injustice and disobedience to a better, whether God or man, is evil and dishonourable,” Socrates found the notion of partaking in the “public life” of Athens’ turbulent political scene unpalatable, especially when faced with political decisions (Apology 24d, 25a). Hence, he chose to steer clear of a “public life,” recognising that the political domain, fraught with inherent compromises, could lead individuals towards committing injustices, thereby tarnishing the soul. In Socrates’ view, it was his duty as a philosopher to uphold moral integrity without succumbing to the compromises inherent in politics.

Socrates even suggests that death is preferable to a life of dishonesty or moral compromise (38a, 30c-d). His willingness to face death rather than retract his philosophical beliefs during his trial epitomises this stance. By abstaining from political life, Socrates was able to dedicate himself to a life of virtue and truth, even at the cost of his own life. Through this choice, Socrates exemplifies the notion that a life worth living is one committed to higher principles rather than personal or political gain.

While I understand this stance, I find it somewhat implausible, as I believe a life worth living necessitates a balance between moral integrity and political engagement rather than solely focusing on maintaining a high moral compass. If one aligns solely with Socrates’ ideas, there’s a risk of being perceived as selfish for not seizing the opportunity to effect positive change. Historical figures like Martin Luther King Jr. embody a different ideology, embracing political engagement to drive substantial changes for the betterment of society (Strauss, B).
embody a different ideology by embracing political engagement to drive substantial changes for the betterment of society (Strauss, B). In conclusion, [Plato](https://aarnphm.xyz/posts/Questions-about-Apology/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato)’s Apology articulates Socrates’ persuasive arguments regarding the dangers inherent in political life for individuals committed to justice and truth. Socrates’ life and trial serve as poignant exemplars of these challenges within the historical context. The core of Socrates’ philosophical inquiries and thought-provoking arguments, which challenge the values and norms of Athenian society, suggests that total withdrawal from public life is the sole path for a philosopher whose mission is to pursue truth and maintain personal integrity, irrespective of the political climate. While these ideas may hold relevance in a specific context, I argue that they are implausible. A nuanced balance between political engagement and moral integrity should be the cornerstone one aims for to lead a life that is truly worth living. ### References Plato. (n.d.). _Apology_ (B. Jowett, Trans.). Encyclopaedia Britannica. (n.d.). Background of the trial - Socrates. Strauss, B. (n.d.). Martin Luther King Jr. and Socrates. _feedback: criticising the Apology’s arguments; on being perceived as selfish: Socrates is in general concerned with appearance vs. essence (he values essence over appearance), so the charge holds only if you are actually selfish; the Allegory of the Cave, for example. Devaluation of opinion. Earlier, in the Republic, a thought experiment (the just man who appears unjust vs. the unjust man who appears just) ⇒ the life of the just would be the better one to live._ --- slug: posts/Questions-about-Metaphysics tags: - philosophy - fruit description: Questions about Aristotle's Metaphysics title: Questions about Metaphysics date: 2023-11-16 --- In reflecting upon [Aristotle](https://aarnphm.xyz/posts/Questions-about-Metaphysics/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Aristotle)’s [Being qua being](https://aarnphm.xyz/posts/Questions-about-Metaphysics/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Being-qua-being), especially his demarcation of wisdom from mere experiential and technical knowledge, it becomes compelling to juxtapose his perspectives with the more fluid modern conceptions of knowledge proposed by [Nietzsche](https://aarnphm.xyz/posts/Questions-about-Metaphysics/../../thoughts/Philosophy-and-Nietzsche) and [Freud](https://aarnphm.xyz/posts/Questions-about-Metaphysics/../../thoughts/Freud). Nietzsche’s theories, emphasising the subjective nature of knowledge, alongside [Freud](https://aarnphm.xyz/posts/Questions-about-Metaphysics/../../thoughts/Freud)’s insights into the unconscious dimensions of human comprehension, present a stark contrast to Aristotle’s more structured paradigm. Aristotle delineates wisdom as a form of knowledge superior to others, stating, _“For the wise man must not be ordered but must order, and he must not obey another, but the less wise must obey him”_ (Metaphysics, Book 1, Chapter 2). This hierarchical and seemingly rigid distinction appears less pertinent in contemporary discourse, where the boundaries between various domains of knowledge are increasingly permeable and intertwined.
My argument posits that while Aristotle’s framework offers a valuable basis for understanding wisdom, a modern interpretation of wisdom should not only incorporate a philosophical understanding of universal truths but also embrace the dynamic and ethical application of knowledge in varied contexts. Wisdom, in today’s world, goes beyond simple comprehension or command; it encapsulates adaptability, cooperative engagement, and the sophisticated application of knowledge in addressing the complex challenges that define our times. [Aristotle](https://aarnphm.xyz/posts/Questions-about-Metaphysics/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Aristotle) establishes a clear hierarchy between experience, knowledge, and wisdom, positing that while experience is valuable for practical action, it falls short of constituting true knowledge or wisdom. He notes, _“With a view to action, experience seems in no respect inferior to art… But yet we think that knowledge and understanding belong to art rather than to experience…”_ (Metaphysics, 132). Here, experience is depicted as the practical application of skills, a necessary but insufficient component of deeper understanding. In contrast, knowledge, particularly in forms like art or technical mastery, is portrayed as encompassing a comprehension of underlying principles and causes. Furthermore, Aristotle demarcates knowledge as a progression beyond experience, implying a deep understanding of the ‘why’ behind things. He states, _“For men of experience know that the thing is so, but do not know why, while the others know the ‘why’ and the cause”_ (Metaphysics, 132). Here, knowledge represents a transition from simply acknowledging facts to understanding their foundational principles and broader implications. This includes both ‘technē,’ a kind of knowledge relevant to making things (craftsmanship or art), and ‘epistēmē,’ scientific knowledge. These forms of knowledge are not just about knowing facts or processes; they involve understanding the principles and causes behind them. Wisdom (_Sophia_), according to Aristotle, is the pinnacle in this hierarchy. In discussing the nature of sciences and their quest for understanding, he observes, _“Clearly then Wisdom is knowledge about certain principles and causes” (Metaphysics, Book 1, Chapter 1)_. This assertion posits wisdom not as a mere collection of knowledge but as a synthesis of practical know-how, theoretical understanding, and philosophical introspection. It is through this synthesis that one apprehends the fundamental nature of reality. In Aristotle’s philosophical construct, wisdom thus signifies a deep and comprehensive grasp of universal truths and causes, transcending the limitations of both practical experience and technical knowledge. Wisdom is characterised by an ability to teach and understand the causes in every branch of knowledge. He views wisdom as the highest form of knowledge, one that seeks to understand the ultimate causes and principles of all things. In the modern world, the distinction between experiential knowledge and wisdom, as outlined by Aristotle, seems less rigid. This perspective is further challenged by the contributions of thinkers like Nietzsche and Freud, who bring unique insights into the nature of knowledge. 
Nietzsche’s concept of [perspectivalism](https://aarnphm.xyz/posts/Questions-about-Metaphysics/../../thoughts/Philosophy-and-Nietzsche#thus-spoke-zarathustra) suggests that all knowledge is subjective and shaped by our viewpoints, challenging the idea of objective or absolute wisdom (The Atlas Society; Nietzsche on Truth and Philosophy, Cambridge University Press). Similarly, Freud’s deterministic view of the unconscious mind and the role of instincts in shaping human behaviour highlight the complexities and unconscious elements in our understanding of knowledge and wisdom (Internet Encyclopedia of Philosophy). These perspectives imply that in the modern context, where knowledge is often seen as more fluid and multifaceted, Aristotle’s structured approach to wisdom may not fully encapsulate the diverse and subjective nature of understanding. Technological advancements and the widespread accessibility of information have facilitated the acquisition of deep knowledge in various fields, transcending traditional academic boundaries. This democratisation of knowledge hints at a more integrated relationship between experience, technical expertise, and wisdom, aligning with the contemporary educational emphasis on interdisciplinary approaches and problem-solving skills. Furthermore, Nietzsche’s criticism of the concept of a predetermined human telos or purpose stands in opposition to Aristotle’s view of wisdom as the pursuit of universal and objective truths. Nietzsche’s perspective suggests that the potential for human excellence and virtue is not a fixed or singular path but rather a diverse and evolving journey shaped by individual experiences and perspectives. In conclusion, Aristotle’s hierarchical ordering of experience, knowledge, and wisdom in Metaphysics, while foundational, is increasingly at odds with contemporary views and seems implausible for the modern world. Nietzsche’s critique, especially his rejection of objective moral values and advocacy for individualistic value creation, challenges Aristotle’s wisdom hierarchy (Philosophy Now). The [Übermensch](https://aarnphm.xyz/posts/Questions-about-Metaphysics/../../thoughts/Chaos) concept, focusing on individual value creation through self-justified actions, stands in stark contrast to Aristotle’s view of wisdom as understanding universal principles (Philosophy Now). Thus, a modern reinterpretation is warranted. Contemporary wisdom should merge a philosophical understanding of universal truths with dynamic, ethical knowledge application in various contexts. Wisdom today surpasses mere comprehension or command, embodying adaptability, cooperation, and innovative application of knowledge for complex challenges. My argument therefore aligns more with Nietzsche’s vision, advocating a nuanced, individualistic approach to wisdom for the 21st century. ### References 1. Ansell-Pearson, K. (2012). Nietzsche’s Übermensch: A Hero of Our Time. _Philosophy Now_. 2. Thornton, S. (2020). Sigmund Freud (1856—1939). In _Internet Encyclopedia of Philosophy_. 3. Clark, M. (1990). _Nietzsche on Truth and Philosophy_. Cambridge University Press. --- slug: posts/Questions-about-Spinoza tags: - philosophy - fruit description: Questions about Spinoza's Ethics. In the Appendix to Ethics Part One (pp. 180-85), Spinoza criticizes the idea “that God directs all things to some definite end” and “that God has made all things for man and has made man to worship God.” (181). Why do people believe such things?
title: Questions about Spinoza date: 2023-11-30 --- In delving deeper into the philosophical insights of Baruch [Spinoza](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Sphinoza)’s “Ethics,” particularly his repudiation of teleology and the anthropocentric conception of divine power, it becomes essential to contrast these views with the philosophical tenets found in Friedrich Nietzsche’s “Beyond Good and Evil.” Spinoza, with his staunch rationalism, argues that misconceptions about the divine will lead to a skewed understanding of morality and aesthetics. He suggests that true morality emerges from comprehending nature and God as entities devoid of human-like intentions or ends. For Spinoza, morality is less about adhering to external moral codes and more about aligning oneself with a profound understanding of God’s nature. [Nietzsche](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/Philosophy-and-Nietzsche), while sharing Spinoza’s scepticism of conventional [morality](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/moral), approaches the subject from a different vantage point. His concept of perspectivalism, particularly the idea of the [“Will to Power”](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/Will#as-power), challenges traditional notions of morality and [truth](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/Will-to-Truth) as expressions of an inherent drive in all living beings to assert and maintain influence and perspective. Unlike Spinoza, Nietzsche is more focused on the role of individual power in shaping morals, eschewing the existence of a higher being. While Spinoza prompts us to envision a deterministic universe without divine purpose, urging a more objective approach to morality, Nietzsche confronts us with the nihilistic consequences of such a universe, advocating for the creation of personal values in response. This juxtaposition of ideas is crucial in contemporary philosophical discussions, urging a critical reassessment of our moral beliefs. I posit that our moral values should not only draw strength from personal conviction but also be grounded in a rational understanding of our environment and history, informed by disciplines like anthropology. This balanced approach offers a way to navigate the complex landscape of moral philosophy in the modern world. The genesis of teleological beliefs, Spinoza believed, stemmed from human ignorance and the inherent desire to seek personal advantage. He wrote, _“all men are born ignorant of the causes of things, and that all men want to seek their own advantage and are conscious of wanting this.”_ He asserts that individuals, born ignorant of the causes of things and conscious of their desires, mistake their subjective experiences and desires for universal truths. This ignorance leads them to ascribe purpose and intention to natural phenomena, a projection of their own human-centric perspective. Spinoza argues that people, unable to comprehend the true causes of events, resort to the idea of a purposeful divine intervention, attributing their fortunes and misfortunes to a deity’s will. This anthropocentric view, according to Spinoza, arises not from an understanding of the universe but from a fundamental ignorance about it. The arguments for the fallacy of teleological thinking are multifaceted.
Spinoza first argues that attributing purposes to nature inverts the true order of cause and effect. By assuming that events occur for a specific end, people mistakenly elevate what are mere effects to the status of causes. Spinoza also challenges the notion of divine purpose, suggesting that if God created the world for an end, it implies a deficiency in God, contradicting the notion of divine perfection. He asserts that everything in nature occurs out of necessity and follows from God’s nature, not from a divine intention or goal. Spinoza’s argument here is radical for his time, as it removes divine will from the equation of existence, positioning nature and its occurrences as manifestations of a deterministic universe. Spinoza extends his critique to the realm of human morality and [aesthetics](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/aesthetic-value), arguing that the belief in a purposeful universe has led to skewed notions of good and evil, beauty and ugliness. He posits that these concepts are subjective and arise from how things affect individuals personally rather than from any intrinsic quality of the things themselves. By believing that everything is created for human use, people judge the value of things based on their utility or pleasure. This anthropocentric perspective, according to Spinoza, leads to a distorted understanding of nature and contributes to conflicts and scepticism, as what is considered ‘good’ or ‘beautiful’ varies widely among individuals. Nietzsche’s approach in [Beyond Good and Evil](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/Philosophy-and-Nietzsche#anatomy-of-beyond-good-and-evil) presents a stark divergence from Spinoza’s rationalistic [determinism](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/Determinism). Nietzsche, known for his provocative style and radical ideas, fundamentally challenges the concept of God, dismissing it as a mere human construct. He criticizes the Christian moral framework and the notion of an objective, universal truth. Nietzsche argues that what is often perceived as truth is merely a manifestation of human will and the power dynamics at play in society. In “Beyond Good and Evil,” Nietzsche states, “There is no such thing as moral phenomena, but only a moral interpretation of phenomena” (Beyond Good and Evil, Aphorism 108). This perspective reflects his belief in perspectivalism, the idea that all knowledge is interpretive and contingent upon individual perspectives. Nietzsche’s critique extends to the realm of [metaphysics](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/Metaphysics) and [epistemology](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/Epistemology): he views the belief in God and divine teleology as a weakness, a human invention to impose meaning and order in a fundamentally chaotic and purposeless universe. Contrasting with Spinoza’s deterministic view, where everything follows from the necessity of God’s nature, Nietzsche’s perspective is that the universe and human existence lack any inherent meaning or purpose. He posits that moral values are not just human-centric interpretations but fundamental expressions of “The Will to Power”, and that subjective interpretation shapes human understanding and morality. For Nietzsche, the universe is not a cosmos ordered by divine providence or natural law but is instead a dynamic play of forces and wills, constantly in flux and beyond any fixed moral categorization.
Furthermore, Nietzsche’s critique of divine teleology is intertwined with his broader rejection of traditional metaphysical and moral systems. He perceives these systems as symptomatic of humanity’s fear of facing the existential void – the absence of inherent meaning or purpose in life. In Aphorism 36 of “Beyond Good and Evil,” Nietzsche explains this perspective, highlighting the human tendency to construct metaphysical worlds as a way of coping with the inherent meaninglessness of existence. In contrast to Spinoza’s concepts of morality, Nietzsche presents a more critical analysis of morality and aesthetics. In Nietzsche’s view, moral systems are tools employed by individuals or groups to exert their influence and control over others through “herd instincts” (Nietzsche, “Beyond Good and Evil,” Aphorism 202). This perspective implies that moral and aesthetic judgments are more about asserting dominance and control than about any objective assessment of utility or pleasure. In synthesizing Spinoza’s rational critique with Nietzsche’s radical perspective, we uncover a comprehensive philosophical framework that profoundly challenges traditional beliefs in divine purpose and absolute morality. Nietzsche extends beyond Spinoza’s critique of anthropocentrism, delving into the deeper power dynamics that shape moral and aesthetic assertions. He presents morality not as a universal truth but as a subjective construct influenced by prevailing power structures and individual wills, reflecting his broader themes of scepticism towards absolute truths and the subjective nature of human experience. This combined perspective of Spinoza’s deterministic view and Nietzsche’s perspectivalism offers a potent critique of human-centric views of the universe and teleological thinking. It underscores the contingency, subjectivity, and influence of desires and power structures in our interpretations and judgments. This dual approach not only remains profoundly relevant in contemporary discourse but also enriches our understanding of philosophical and ethical discussions. Together, Spinoza and Nietzsche compel us to reconsider our notions of the universe, morality, and our place within it, highlighting the necessity of acknowledging the complex interplay of knowledge, power, and subjective human experience in shaping our worldview. --- slug: posts/chatgpt tags: - engineer4a03 - fruit description: And its implication on how we assess learning. an overview. title: On ChatGPT and its pedagogical consequences date: 2024-10-02 --- _The following is an excerpt of a paper I wrote for my coursework._ > [!question]- Question > > In the context of Gartner’s hype cycle, what has been the trajectory of generative conversational AI? > > Should a format including generative conversational AI be introduced to replace traditional essay assignments in educational settings, and if so, what are some potential implications for student learning and assessment? ([Dwivedi et al., 2023](#bib-dwivedi2023102642)) ## Introduction. Historically, Alan Turing’s seminal work “Computing Machinery and Intelligence” laid the foundation for exploring the possibilities of a thinking machine ([Turing, 1950](#bib-10.1093/mind/lix.236.433)).
Subsequently, the development of [AI](https://aarnphm.xyz/posts/chatgpt/../../thoughts/Machine-learning) took a symbolic approach: world representations through systems that utilise high-level symbols and manipulate tokens to arrive at a result, commonly referred to as Good Old-Fashioned AI (GOFAI) ([Haugeland, 1997](#bib-10.7551/mitpress/4626.001.0001)). While GOFAI showed promise through decision-tree [reasoning](https://aarnphm.xyz/posts/chatgpt/../../thoughts/reason), its limitations became apparent in the 1980s when the field entered the “AI Winter.” This was likely due to the cynicism within the AI researchers’ community and a reduction in funding, which halted most research and development ([Hendler, 2008](#bib-handler2008avoidanotheraiwinter)). However, given the rise of Moore’s Law and the exponential growth of available computing and [data](https://aarnphm.xyz/posts/chatgpt/../../thoughts/data), a new approach to [AI](https://aarnphm.xyz/posts/chatgpt/../../thoughts/AGI) arose, focusing on statistical methods and connectionist networks such as artificial neural networks. ([Haugeland, 1997](#bib-10.7551/mitpress/4626.001.0001)) dubbed this approach New Fangled AI (NFAI). Fast-forward to the $21^{\text{st}}$ century: ML has entered the mainstream through the rise of generative AI (GenAI). This paper posits that GenAI currently occupies the “peak of inflated expectations”, approaching the “trough of disillusionment” on Gartner’s hype cycle. It will also examine the implications of machine-assisted interfaces beyond conversational UI and their pedagogical consequences for student learning and assessment. ## Gartner’s hype cycle. For context, applications such as ChatGPT are built on top of the [Transformers](https://aarnphm.xyz/posts/chatgpt/../../thoughts/Transformers) architecture and pre-trained on a large corpus of [text](https://aarnphm.xyz/posts/chatgpt/../../thoughts/Language#representation) ([Brown et al., 2020](#bib-brown2020languagemodelsfewshotlearners)). Given an input sequence of $n$ tokens, these systems predict the next token at index $n+1$. Most implementations of transformers are autoregressive ([Croft, 2023](#bib-croft2023llm)), meaning that the model predicts future values (index $n+1 \to \infty$) based on past values (index $0 \to n$). However, ([Keles et al., 2022, p. 4](#bib-keles2022computationalcomplexityselfattention)) proved that the computational complexity of self-attention is quadratic; therefore, running these systems in production remains a scaling problem ([Kaplan et al., 2020](#bib-kaplan2020scalinglawsneurallanguage)). The current positioning of GenAI at the peak of inflated expectations aligns with the ([Gartner, 2024](#bib-gartner2024multimodal)) prediction. Three key factors support this assessment: rapid advancement in research, widespread enterprise adoption, and increased public awareness. Ongoing research in GenAI, specifically language models, spans several topics, including mechanistic interpretability ([Nanda, 2023](#bib-nanda2023concrete)), which explores the inner workings of auto-regressive models, information retrieval techniques aimed at improving correctness and reducing hallucinations in LLM systems ([Béchard & Ayala, 2024](#bib-béchard2024reducinghallucinationstructuredoutputs); [Dhuliawala et al., 2023](#bib-dhuliawala2023chainofverificationreduceshallucinationlarge)), as well as vested interests in multimodal applications of transformers ([Xu et al., 2023](#bib-xu2023multimodallearningtransformerssurvey)).
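To make the autoregressive formulation above concrete, here is a minimal sketch of a greedy decoding loop. It is an illustration rather than any particular library's API: `model` is a hypothetical callable mapping a token-id sequence to next-token logits.

```python
import numpy as np

def greedy_decode(model, prompt_ids, max_new_tokens=32, eos_id=None):
    """Autoregressive decoding: token n+1 is predicted from tokens 0..n."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(ids)               # hypothetical: next-token logits, shape (vocab_size,)
        next_id = int(np.argmax(logits))  # greedy: take the most likely token
        ids.append(next_id)
        if next_id == eos_id:             # stop once the model emits end-of-sequence
            break
    return ids
```

Each iteration conditions on the entire prefix generated so far, which is where the quadratic cost of self-attention shows up: without KV caching, every step recomputes attention over all previous tokens, so total work grows quadratically with sequence length.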
Leading research labs, from Anthropic with their interpretability and alignment work ([Bricken et al., 2023](#bib-bricken2023monosemanticity); [Elhage et al., 2022](#bib-elhage2022superposition); [Templeton et al., 2024](#bib-templeton2024scaling)) and AI21’s Jamba with its innovative hybrid transformer architecture ([Team et al., 2024](#bib-jambateam2024jamba15hybridtransformermambamodels)), to open-weights models from [Meta](https://www.llama.com/) and [Google](https://deepmind.google/technologies/gemini/pro/), continue to redefine the boundaries of what these systems are capable of. Enterprise adoption is evident with Salesforce ([Nijkamp et al., 2023](#bib-nijkamp2023xgen7btechnicalreport)), Oracle’s [collaboration with Cohere](https://cohere.com/customer-stories/oracle), and Microsoft’s Copilot for its 365 Product Suite. However, widespread implementation doesn’t necessarily equate to immediate, measurable productivity gains. Integrating these systems effectively into enterprise workflows to deliver tangible business value takes time and effort. Despite the field’s excitement, the current hype and expectations often exceed its reliable capabilities, especially for complex use cases. Significant challenges persist, including hallucinations and lack of factual grounding ([Huang et al., 2023, p. 3](#bib-huang2023surveyhallucinationlargelanguage)). We observe such behaviour in ChatGPT, where the knowledge cutoff prevents the system from providing up-to-date information, so it will “hallucinate” and provide inaccurate answers. ([Dwivedi et al., 2023, p. 4.4.9.1.2](#bib-dwivedi2023102642)) As the field progresses towards the “trough of disillusionment” on Gartner’s hype cycle, a more realistic assessment of GenAI’s capabilities will likely emerge, paving the way for more effective applications. ## Implications of machine-assisted interfaces and their pedagogical consequences for student learning and assessment. The proliferation of conversational user interfaces (CUI) is based upon a simple heuristic of how [auto-regressive models](https://aarnphm.xyz/posts/chatgpt/../../thoughts/Autoregressive-models) surface their internal state through generating the next tokens. CUIs often prove frustrating when dealing with tasks requiring larger information sets. Additionally, for tasks that require frequent information retrieval (research, academic writing), CUIs are suboptimal as they compel users to maintain information in their working memory unnecessarily. Pozdniakov et al. propose a framework that incorporates both application and interaction design, emphasizing manual alignment inputs from end users ([Pozdniakov et al., 2024, p. 3](#bib-pozdniakov2024largelanguagemodelsmeet)). This approach, when applied to replace traditional essay assignments, has two major implications for student learning and assessment: a shift in core competencies and collaborative assessment methods. With machine-assisted interfaces, students will need to develop stronger critical thinking skills to evaluate AI-generated content and formulate precise instructions. The focus will shift towards the process of reaching desired outcomes and improving information retrieval skills. This shift aligns with the potential for machine-assisted proofs to solve novel problems, as discussed by ([Tao, 2024](#bib-tao2024machineassisted)). These new interfaces will require instructors to adapt their evaluation methods. Assessment will need to consider students’ flexibility of pace and their level of engagement with a given topic.
This approach encourages a more holistic, cross-disciplinary understanding, better preparing students for continuous learning in our rapidly evolving technological landscape. ## References - Béchard, P., & Ayala, O. M. (2024). _Reducing hallucination in structured outputs via Retrieval-Augmented Generation_. arXiv preprint arXiv:2404.08189 [arxiv](https://arxiv.org/abs/2404.08189) - Bricken, T., Templeton, A., Batson, J., Chen, B., Jermyn, A., Conerly, T., Turner, N., Anil, C., Denison, C., Askell, A., Lasenby, R., Wu, Y., Kravec, S., Schiefer, N., Maxwell, T., Joseph, N., Hatfield-Dodds, Z., Tamkin, A., Nguyen, K., … Olah, C. (2023). Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. _Transformer Circuits Thread_. [\[link\]](https://transformer-circuits.pub/2023/monosemantic-features/index.html) - Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). _Language Models are Few-Shot Learners_. arXiv preprint arXiv:2005.14165 [arxiv](https://arxiv.org/abs/2005.14165) - Croft, B. (2023). _LLM Visualization_. - Dhuliawala, S., Komeili, M., Xu, J., Raileanu, R., Li, X., Celikyilmaz, A., & Weston, J. (2023). _Chain-of-Verification Reduces Hallucination in Large Language Models_. arXiv preprint arXiv:2309.11495 [arxiv](https://arxiv.org/abs/2309.11495) - Dwivedi, Y. K., Kshetri, N., Hughes, L., Slade, E. L., Jeyaraj, A., Kar, A. K., Baabdullah, A. M., Koohang, A., Raghavan, V., Ahuja, M., Albanna, H., Albashrawi, M. A., Al-Busaidi, A. S., Balakrishnan, J., Barlette, Y., Basu, S., Bose, I., Brooks, L., Buhalis, D., … Wright, R. (2023). Opinion Paper: “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. _International Journal of Information Management_, _71_, 102642. - Elhage, N., Hume, T., Olsson, C., Schiefer, N., Henighan, T., Kravec, S., Hatfield-Dodds, Z., Lasenby, R., Drain, D., Chen, C., Grosse, R., McCandlish, S., Kaplan, J., Amodei, D., Wattenberg, M., & Olah, C. (2022). Toy Models of Superposition. _Transformer Circuits Thread_. [\[link\]](https://transformer-circuits.pub/2022/toy_model/index.html) - Gartner. (2024). _Gartner Predicts 40 Percent of Generative AI Solutions Will Be Multimodal By 2027_. - Haugeland, J. (1997). _Mind Design II: Philosophy, Psychology, and Artificial Intelligence_. The MIT Press. - Hendler, J. (2008). Avoiding Another AI Winter. _IEEE Intelligent Systems_, _23_(2), 2–4. - Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., & Liu, T. (2023). _A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions_. arXiv preprint arXiv:2311.05232 [arxiv](https://arxiv.org/abs/2311.05232) - Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). _Scaling Laws for Neural Language Models_. arXiv preprint arXiv:2001.08361 [arxiv](https://arxiv.org/abs/2001.08361) - Keles, F. D., Wijewardena, P. M., & Hegde, C. (2022). _On The Computational Complexity of Self-Attention_. arXiv preprint arXiv:2209.04881 [arxiv](https://arxiv.org/abs/2209.04881) - Nanda, N. (2023). _Concrete Steps to Get Started in Transformer Mechanistic Interpretability_. 
- Nijkamp, E., Xie, T., Hayashi, H., Pang, B., Xia, C., Xing, C., Vig, J., Yavuz, S., Laban, P., Krause, B., Purushwalkam, S., Niu, T., Kryściński, W., Murakhovs’ka, L., Choubey, P. K., Fabbri, A., Liu, Y., Meng, R., Tu, L., … Xiong, C. (2023). _XGen-7B Technical Report_. arXiv preprint arXiv:2309.03450 [arxiv](https://arxiv.org/abs/2309.03450) - Pozdniakov, S., Brazil, J., Abdi, S., Bakharia, A., Sadiq, S., Gasevic, D., Denny, P., & Khosravi, H. (2024). _Large Language Models Meet User Interfaces: The Case of Provisioning Feedback_. arXiv preprint arXiv:2404.11072 [arxiv](https://arxiv.org/abs/2404.11072) - Tao, T. (2024). _Machine-Assisted Proofs_. - Team, J., Lenz, B., Arazi, A., Bergman, A., Manevich, A., Peleg, B., Aviram, B., Almagor, C., Fridman, C., Padnos, D., Gissin, D., Jannai, D., Muhlgay, D., Zimberg, D., Gerber, E. M., Dolev, E., Krakovsky, E., Safahi, E., Schwartz, E., … Shoham, Y. (2024). _Jamba-1.5: Hybrid Transformer-Mamba Models at Scale_. arXiv preprint arXiv:2408.12570 [arxiv](https://arxiv.org/abs/2408.12570) - Templeton, A., Conerly, T., Marcus, J., Lindsey, J., Bricken, T., Chen, B., Pearce, A., Citro, C., Ameisen, E., Jones, A., Cunningham, H., Turner, N. L., McDougall, C., MacDiarmid, M., Freeman, C. D., Sumers, T. R., Rees, E., Batson, J., Jermyn, A., … Henighan, T. (2024). Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. _Transformer Circuits Thread_. [\[link\]](https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html) - Turing, A. M. (1950). I.—Computing Machinery and Intelligence. _Mind_, _LIX_(236), 433–460. - Xu, P., Zhu, X., & Clifton, D. A. (2023). _Multimodal Learning with Transformers: A Survey_. arXiv preprint arXiv:2206.06488 [arxiv](https://arxiv.org/abs/2206.06488) --- slug: posts/corporate-personhood tags: - engineer4a03 - fruit description: and moral responsibilities of corporation. title: Of Corporations, Courts, Personhood, and Morality date: 2024-11-19 --- The following is an excerpt of a paper I wrote for my coursework. > [!question]- Question > > Read “Of Corporations, Courts, Personhood, and Morality. Business Ethics Quarterly, 25(4), 415-431.” ([Blair, 2015](#bib-blair2015ofcorporations)) > > After reading this paper and in consideration of objective reality, subjective reality and legal fiction, do you think corporations should be regarded as separate legal “persons”? Do you agree with Prof. Thomas Donaldson’s vision of corporations as “moral” persons? Why or why not? How does the concept of corporate “personhood” influence our thinking about the social responsibilities of corporations? Corporate personhood poses complex philosophical challenges that intersect with practical questions of morality, responsibility and social impact. While corporations have historically been granted legal personhood to facilitate commerce and establish clear rules of operation ([Blair, 2015](#bib-blair2015ofcorporations)), this legal fiction deserves a thorough examination to determine its ethical validity in the 21st century. This essay posits that corporations should not be considered separate legal “persons” beyond a practical legal framework, and it strongly opposes the ([Donaldson, 1984](#bib-donaldson1984corporation)) vision of treating corporations as moral agents. ## Against Donaldson’s argument Legally, corporations are treated as separate entities, allowing them to own property, enter contracts, and be liable for debts independently of their shareholders.
([Blair, 2015](#bib-blair2015ofcorporations)) highlights that this legal fiction facilitates economic growth by encouraging investment and risk-taking. The objective reality is that corporations are collectives of individuals, and legal personhood is a tool for managing complex economic activities. However, conflating this legal construct with moral personhood is problematic. ([Donaldson, 1984](#bib-donaldson1984corporation)) posits that corporations are moral agents capable of ethical reasoning and responsibility. However, [Kant](https://aarnphm.xyz/posts/corporate-personhood/../../thoughts/Philosophy-and-Kant)’s categorical imperative challenges this very notion. Kantian [ethics](https://aarnphm.xyz/posts/corporate-personhood/../../thoughts/ethics) require autonomous agents capable of rational decision-making and moral consideration for others ([Kant, 1785](#bib-kant1785kangft)). Corporations, driven primarily by profit maximisation, lack the capacity for moral autonomy. Their decision-making processes are constrained by shareholder interests and market forces, limiting their ability to act out of duty or universal moral laws. The critique by ([Deleuze & Guattari, 1972](#bib-deleuze1972anti)) further elucidates the inherent contradictions within capitalist systems. They argue that capitalism dissolves traditional structures and encourages an unrestrained pursuit of profit and market power. Under this framework, corporations act as agents of “deterritorialisation”—entities that disrupt established social norms in their relentless pursuit of growth. When corporations are granted personhood, they influence and shape the socio-political landscape, often without meaningful accountability. This “schizophrenic” drive for growth highlights the ethical risks associated with conflating corporate interests with those of individuals. ([Chomsky, 1999](#bib-chomsky1999profit)) further argues that corporations, empowered by neo-liberal policies, often operate contrary to the public good, undermining democratic processes and social welfare. This perspective reinforces the view that corporations lack the moral orientation necessary to be considered moral persons. While Donaldson’s vision of corporations as “moral persons” attempts to impose ethical obligations on corporate behaviour, it fails to address the fundamental contradiction between profit-driven corporate structure and genuine moral [agency](https://aarnphm.xyz/posts/corporate-personhood/../../thoughts/Agency). “True” moral personhood requires the capacity for autonomous ethical reasoning and the ability to act against self-interest when morally required. Corporate fiduciary duties to shareholders, as highlighted in the Delaware court decisions discussed in ([Blair, 2015](#bib-blair2015ofcorporations)), structurally prevent this kind of authentic moral reasoning. ## Implications on social responsibility By treating corporations as persons, we risk anthropomorphising entities that are fundamentally tools of capital accumulation. With the rise of surveillance capitalism, ([Zuboff, 2020](#bib-zuboff2020surveillance)) describes how corporations exploit personal [data](https://aarnphm.xyz/posts/corporate-personhood/../../thoughts/data) for profit, often at the expense of individual privacy and autonomy. Similarly, ([Crawford, 2021](#bib-atlasofai)) demonstrates that the deployment of AI to optimise corporate efficiency often lacks moral oversight, exacerbating inequalities and affecting marginalised communities disproportionately.
Recognising that corporations are not moral agents, we shift the onus onto legal frameworks and societal pressures to enforce ethical behaviour. This understanding aligns with the objective reality of corporations as collections of individuals whose actions must be guided by laws and norms rather than assumed moral capacities. In conclusion, corporate personhood should be recognised as a limited legal fiction rather than a morally meaningful category. While legal personhood serves practical functions in commerce and law, extending this to claims of moral personhood obscures the need for external regulation and democratic oversight of corporate power. Instead of expecting corporations to embody moral principles, society should strengthen regulatory frameworks that ensure corporate actions align with the broader public interest, especially in the era of AI and data capitalism. [^analogy] ## References - Blair, M. M. (2015). Of Corporations, Courts, Personhood, and Morality. _Business Ethics Quarterly_, _25_(4), 415–431. - Chomsky, N. (1999). _Profit Over People: Neoliberalism and Global Order_. Seven Stories Press. - Crawford, K. (2021). _The Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence_. Yale University Press. - Deleuze, G., & Guattari, F. (1972). _Anti-Oedipus: Capitalism and Schizophrenia_. Les Editions de Minuit. - Donaldson, T. (1984). Corporations & Morality. _Noûs_, _18_(3), 548–551. - Kant, I. (1785). _Groundwork for the Metaphysics of Morals_ (T. E. Hill & A. Zweig, Eds.). Oxford University Press. - Zuboff, S. (2020). Surveillance Capitalism. _Project Syndicate_. [^analogy]: data capitalism and surveillance capitalism are used interchangeably in this context, as both refer to the same concept of using personal data for profit. --- slug: posts/index tags: - fruit description: collections of writing title: posts. date: 2024-01-10 --- Collections of writing I really like. Some will also get posted on [chaos of living alone.](https://livingalonealone.com/) --- slug: posts/new tags: - fruit description: I saw a disstrack dropped at a hackathon demo for the first time. And on perplexity of hackathon. title: I saw a disstrack dropped at a hackathon. date: 2024-09-30 --- _And on the perplexity of hackathons. See this on [substack](https://open.substack.com/pub/livingalone/p/i-saw-a-disstrack-dropped-at-a-hackathon?r=1z8i4s\&utm_campaign=post\&utm_medium=web\&showWelcomeOnShare=true)_ ![](https://aarnphm.xyz/posts/new/../../posts/images/cohere.webp) _Cohere Toronto Office_ ## feels and results. The train station loomed, a grey monolith against the ever-darkening sky. It was half-past seven on a Sunday, and I was running late for the 20:23 Lakeshore West Train back to Hamilton. Quickly grabbing my laptops from the bags I had packed for the weekend away, I hopped back onto the [stream](https://x.com/i/broadcasts/1OwxWNvzRejJQ) to catch the others’ presentations. It wasn’t any ordinary Sunday, but rather the demo night of [New Build](https://x.com/newsystems_/status/1828455648377327976). Exhaustion clung to me like a second skin after 48 hours of sleep deprivation and intense focus hacking on a project. Our team had already finished the demo, yet something gnawed at the corner of my mind. A vague unease, shapeless as the fog, settled over me. I couldn’t shake the feeling of [displacement](https://aarnphm.xyz/posts/new/../../thoughts/displacement) that slipped through my fingers, leaving the aftertaste of a half-remembered dream.
![](https://aarnphm.xyz/posts/new/../../posts/images/new-feeling.webp) I have done a fair share of [hackathons](https://jzhao.xyz/posts/hackathons) in the past, yet New Build stood apart from most hackathons I have attended. New Build is **the** definition of an “unc cracked tpot club” that builds projects over a weekend. It was the distilled essence of Toronto’s raw talent, representing the fast-growing tech scene in Canada. New Build was a multidisciplinary hackathon that combined intensive project development with team formation inspired by the NBA Draft[^1]. One major feature that differentiated New Build from other hackathons was the draft mechanics. We knew who the team captains were. Lo and behold, yours truly was one of them. Given the crowd of cracked and brilliant minds participating in this event, the weight of self-imposed expectations hung heavy. I felt compelled to match their prowess, not for their sake but to prove something to myself. Yet beneath it all, a voice whispered a simple desire lingering at the back of my mind - to savour the experience and craft something genuine and [quaint](https://maggieappleton.com/folk-interfaces). I had an idea in mind infused with warmth, a reflection of my inner child, free from the cold glare of corporatism. > I want to play and build something novel! Yet, on Saturday morning, as soon as the clock struck 08:30, my corporate-wired mind took control, drowning out any remnants of authenticity I had. We immediately got carried away into short-term optimisation[^2] of the problem statement, min-maxxing for the potential outcomes of the project. Additionally, we were so fixated on the name that we kept trying to make it work. > **We have fallen into the trap of corporatisation of hackathons**. ![](https://aarnphm.xyz/posts/new/../../posts/images/new-question.webp) This mindset got to me, and it showed during the demo. The panel said nothing. No questions, no grilling. Defeat washed over me, heavy as the silence. I felt small, like one of those shuttered storefronts dotting the neighbourhood. On the train home, I watched the city blur past - all grit, neon, and late-night diners. Something shifted, quiet as a whisper: I know my shit. Damn good, actually. The city kept moving, indifferent. And so would I. ## on hacker culture and implications of New Build. _the following is an excerpt from [Hacking the Hackathon](https://jzhao.xyz/posts/hackathons)_ A weird thing about startup/hustle culture: we fetishise exhaustion as a badge of honour. We have collectively decided that bags under our eyes are way cooler than a new iPhone. This behaviour very much stems from Silicon Valley’s [saviorism](https://stanforddaily.com/2018/02/16/silicon-valleys-saviorism-problem/) attitude. The time-boxed nature of hackathons serves as a microcosm of this zeitgeist, compelling participants to push their limits in a 24-36 hour sprint to push out marketable products. The fundamental issue with this approach is its [reductionist](https://aarnphm.xyz/posts/new/../../thoughts/reductionism) nature. These rapid-fire development sessions rarely build upon existing knowledge or work in the field. More often than not, they ignore crucial context surrounding the complex issues they attempt to address, distilling multifaceted problems into a simple web app[^3]. This methodology prioritizes speed and novelty over depth and nuance, potentially leading to superficial innovations that fail to address root causes or consider long-term implications.
“Hackers” are makers compelled to create - not for money or fame, but for the pure joy of bringing something new to life. The congregations of craftsmen eventually led to the formation of hackerspaces such as hackathons – a kind of digital-age speakeasy for the intellectually adventurous. These spaces were initially conceived as the “third space” outside the state’s influence and the capitalist market. Yet, these spaces often struggle to remain true to their vision without intentional intervention. The commercialisation of hackathons can be seen as an unintended consequence of their underlying financial incentives. Hackathons aren’t cheap to run, so organizers, with the best of intentions, turn to sponsorships to keep the lights on and the Red Bull flowing. But each logo slapped on a banner chips away at the original ethos. It’s a classic chicken-and-egg problem. Hackathons need money, but the incentive structure that foots the bill slowly morphs hackathons away from their original purposes. It is tricky, right? How do you keep the spirit of innovation and learning while all these other factors are at play? ```mermaid flowchart LR A[sponsors trying to maximize the benefits] --> B[organizers increases size and scope of events] B --> C[Hackers are incentivized to build] C --> A ``` > It's likely New Builds 2 will happen September 2025.\ > \ > If you're a company, a fund, and institution that wants to get involved to help make that happen, we can start discussing now.\ > \ > So far I know:\ > \ > \- Draft Night should move to Paradise Cinema for scale and theatrics.\ > \ > \- Grand… > > — V (@internetvin) [1 October 2024](https://twitter.com/internetvin/status/1841118814676668585) I think organizers should emphasize the ethos of hackathons, eliminate the focus on prizes and short-term projects, and replace it with something better. Reclaiming these design spaces means cultivating a culture of [play](https://aarnphm.xyz/posts/new/../../thoughts/play) - a space “for unfettered exploration which gives individuals freedom to explore ideas that might not have clear monetary values.” ```poetry language=fr A hackathon should be the infrastructure layer so that everyone can play. ``` ### implications from New Build. New Build addressed some of these problems and challenges pretty well, such as the [draft mechanics](https://x.com/aarnphm_/status/1839714935963607405), which introduced some [entropy](https://aarnphm.xyz/posts/new/../../thoughts/Entropy), but it fell short in terms of prize incentives. _K and I were chatting about how New Build felt like an extended [New Office Hours](https://x.com/aarnphm_/status/1775641922029162773), which is a good first step in cultivating spaces for play._ New Build represents what Toronto has to offer, a first step toward solving the “human capital flight” (often referred to as “brain drain”) in Canada. Looking ahead, I’d love to see New Build create more space for pure play. Maybe even go full retreat-style, similar to [rabbitholeathon](https://www.rabbitholeathon.com/). I have faith in the New Build team. They’ve got good people. And good people are the ultimate moat. ### going forward with hackathons. As for me, I keep saying each hackathon will be my last. The 36-hour coding binges aren’t as appealing as they once were. But I said that last time too, so who knows? There’s something addictive about the energy of a good hackathon[^4]. Here’s the thing about hackathons: they don’t have to choose between being recruiting events and playgrounds for innovation. The best ones are both.
But right now, the scales are tipped too far towards recruitment. It’s like optimizing for an acquisition instead of building something people want. The real magic of hackathons happens when you put hackers first. Everything else – the jobs, the networking, the sponsorships – that all follows naturally when you get the core experience right. ## to my teammates. ```poetry language=fr I'm obsessed with your work. I'm so blessed to have a chance to work with you all. I'm sorry that I couldn't do more, but overall it was a net positive. I wouldn't trade anything for it. Even though we didn't win, I'm glad that we did work on something together. I do hope that we cross paths again in the future. regards, aaron. ``` [^1]: At a conventional hackathon, one either forms a team beforehand with friends or, for the unversed, picks a team at the event. [^2]: [Hackathons as Co-optation Ritual: Socializing Workers and Institutionalizing Innovation in the “New” Economy](https://academicworks.cuny.edu/cgi/viewcontent.cgi?article=1575\&context=gc_pubs) by _Sharon Zukin and Max Papadantonakis_ [^3]: One team built AI agents to solve public-policy problems. Per the demo, it seemed to recommend building “more police stations” to solve Moss Park’s challenges. However, it is not as simple as just “building more police stations”. The judge was pretty firm on this, but the idea was there. [^4]: Honestly, I only do this one because of [Tommy](https://tommytrinh.me/), tyfe. --- slug: posts/to-the-past-lovers tags: - sapling - poetry - love description: on past love. title: un ancien amour. date: 2024-02-12 --- ```poetry language=en Beneath the quiet of night, under the vast sky, where stars whisper stories of ancient light, I find you again in the sigh of the wind, in the gentle caress of the moon's soft beam. Your smile, a memory etched in the stars, a lantern guides me through the harrowed wall, of my heart. Your laughter, a melody that reverberates across the eons, a symphony that lingers in the silence of my mind, keeps me company among the tumultuous life. The miles stretched wide, a chasm of silent cries, a beacon once thought to withstand the test of time. But time, a cruel mistress, adds distance to the miles, a facade of perfection, at best, a jest. No distance too far, no age too enduring, to dim the echo of your laughter, to quell the fire of your touch. Yet, here I stand, reminiscing the good old days, lost in the labyrinth of time. a prisoner of the heart, and a slave to the mind, that misses the idea of you. ``` --- slug: quotes tags: - evergreen description: A collection of quotes, wisdom, and advice. title: advice. date: 2024-01-23 --- ## On life. Throw me some wisdom, and advices? I have none. — Jesse Your life so far is a drawing canvas. You can’t change what’s already been drawn, but you can always paint a new line. — paraphrased from [@tommytrxnh](https://twitter.com/tommytrxnh) 20 years from now you will be more disappointed by the things that you didn’t do than by the ones you did do. So throw off the bowlines. Sail away from the safe harbour. Catch the trade winds in your sails. Explore. Dream. Discover. — Mark Twain Sometimes, we \[care] too much about potential, less on credentials — Kate ## On bits and bytes. Computer is a bicycle for the mind. — [Steve Jobs](https://www.youtube.com/watch?v=ob_GX50Za6c\&ab_channel=MichaelLawrence) All I can say to the young is close your eyes. — Ted Nelson An expert is a man who has made all the mistakes, which can be made, in a very narrow field.
— Niels Bohr ## On perspectives. Our capacity to deal with [language](https://aarnphm.xyz/thoughts/Language) is a complex, genetically-determined part of our biological endowment. It’s a product of evolution, part of our nature. — Noam Chomsky The falseness of an opinion is not for us any objection to it: it is here, perhaps, that our new [language](https://aarnphm.xyz/thoughts/Language) sounds most strangely. The question is, how far an opinion is life-furthering, life-preserving, species-preserving, perhaps species-rearing, and we are fundamentally inclined to maintain that the falsest opinions — that the renunciation of false opinions would be \[a renunciation of life]. — [Friedrich Nietzsche](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche) I always feel happy, you know why? Because I don’t expect anything from anyone. Expectations always hurt. Life is short. So love your life. Be Happy. & Keep smiling. Just live for yourself & before you speak, listen. Before you write, think. Before you spend, earn. Before you pray, forgive. Before you hurt, feel. Before you hate, love. Before you quit, try. Before you die, live. — William Shakespeare A pessimist sees the difficulty in every opportunity; an optimist sees the opportunity in every difficulty. — Winston Churchill People think focus means saying yes to the thing you’ve got to focus on. But that’s not what it means at all. It means saying no to the hundred other good ideas that there are — Steve Jobs _The moral thing that should wish to say is very simple. I should say: Love is wise, hatred is foolish_ — [Bertrand Russell](https://www.youtube.com/watch?v=ihaB8AFOhZo\&ab_channel=PhilosophieKanal) Craftsman is knowing how to work, Art is knowing when to stop. — Ben Affleck ## On drive. Life can be much broader when you discover one simple fact...that everything around you was made up by people no smarter than you.... Once you learn that, you'll never be the same again.— Steve Jobs I just wondered how things were put together. — Claude Shannon Never stop learning. Assume nothing, question everything. Teach others what you know. Analyze objectively — Richard Feynman The first principle is that you must not fool yourself, and you are the easiest person to fool. — Richard Feynman Success consists of going from failure to failure without loss of enthusiasm. — Winston Churchill \[One] who works with the door open gets all kinds of interruptions, but \[they] also occasionally gets clues as to what the world is and what might be important. — Richard Hamming ## On randomness and fun. I have to be successful because I like expensive things. — some random person on twitter People like you think I get lucky. Here’s the thing, I make my own luck. — Harvey Specter --- slug: thoughts/AGI tags: - seed description: resconstructed source of "https://aarnphm.xyz/thoughts/AGI" title: AGI date: 2024-02-07 --- The proposal is that such an AGI would be able to understand or learn any intellectual task that a human being can. It would also be able to learn and improve itself, and possibly be able to do things that humans cannot do. 
We saw “some sparks” in [LLMs](https://aarnphm.xyz/thoughts/AGI/../../thoughts/LLMs) suggesting that they can “understand” [natural language](https://aarnphm.xyz/thoughts/AGI/../../thoughts/NLP). See also [Yann’s chat with Lex](https://www.youtube.com/watch?v=5t1vTLU7s40\&ab_channel=LexFridman) --- slug: thoughts/Agency tags: - seed - philosophy description: resconstructed source of "https://aarnphm.xyz/thoughts/Agency" title: Agency date: 2024-02-07 --- > The ability and freedom to act in one’s immediate environment. Considered to be a study of [action theory](https://aarnphm.xyz/thoughts/Agency/../../thoughts/action-theory) > Everyone talks about having agency, but when it comes to falling in love, we have none (that’s why it is called falling) [Chaos](https://aarnphm.xyz/thoughts/Agency/../../thoughts/Chaos) allows for agency, but too much [entropy](https://aarnphm.xyz/thoughts/Agency/../../thoughts/Entropy) can create problems. ## Self-determination theory [link](https://selfdeterminationtheory.org/theory/) ## having a shit blog has made me feel abundant source: [Escaping Flatland](https://www.henrikkarlsson.xyz/p/having-a-shit-blog-has-made-me-feel) --- slug: thoughts/Alignment tags: - seed - ml description: resconstructed source of "https://aarnphm.xyz/thoughts/Alignment" title: Alignment date: 2024-03-05 --- See also: [Overton Window](https://aarnphm.xyz/thoughts/Alignment/../../thoughts/Overton-Window) and this [blog on alignment research](https://openai.com/blog/our-approach-to-alignment-research) The act of aligning oneself with a particular group or ideology. This can be done for a variety of reasons, including: - To gain social acceptance - To gain power - To gain resources Often invoked as a solution to “hallucination” in large models’ token generation. > To align a model is simply to teach it to generate tokens that are within the bounds of the Overton Window. The goal is to build an aligned system that helps us solve other alignment problems > Should we build [ethically](https://aarnphm.xyz/thoughts/Alignment/../../thoughts/ethics) aligned systems, or [morally](https://aarnphm.xyz/thoughts/Alignment/../../thoughts/moral) aligned systems? One of [mechanistic interpretability](https://aarnphm.xyz/thoughts/Alignment/../../thoughts/mechanistic-interpretability)’s goals is to [ablate](https://aarnphm.xyz/thoughts/Alignment/../../thoughts/mechanistic-interpretability#ablation) harmful features ### [design](https://aarnphm.xyz/thoughts/Alignment/../../thoughts/design) See also [Information Theory](https://aarnphm.xyz/thoughts/Alignment/../../thoughts/Information-Theory) --- slug: thoughts/Attention tags: - technical - seed description: resconstructed source of "https://aarnphm.xyz/thoughts/Attention" title: Attention date: 2024-02-07 --- ([Vaswani et al., 2023](#bib-vaswani2023attentionneed)) Attention operates on a sequence of query $Q$, key $K$ and value $V$ vectors.
The attention matrix of a sequence is then computed as:

$$
A(Q, K, V) = \text{softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right)V \quad \text{for } Q, K, V \in \mathbb{R}^{L \times d}
$$

## Multi-head Attention

Allows the model to jointly attend to information from different representation subspaces at different positions:

$$
\begin{aligned}
\text{MHA}(Q,K,V) &= \text{concat}(\text{head}_1, \cdots, \text{head}_n) W^O \\
&\text{where } \space \text{head}_i = \text{A}(QW_i^Q, KW_i^K, VW_i^V) \\
W^O & \in \mathbb{R}^{hd_v \times d_{\text{model}}}
\end{aligned}
$$

## Group-Query Attention

by ([Ainslie et al., 2023](#bib-ainslie2023gqatraininggeneralizedmultiquery))

idea: reduce the number of KV heads $n_k$ to a fraction $n_k^{'} = \frac{n_q}{k}$ of the number of query heads $n_q$, by evenly dividing the query heads into $n_k^{'}$ groups that each share one KV head.

## RadixAttention

Implemented in ([Zheng et al., 2024](#bib-zheng2024sglangefficientexecutionstructured)), where they maintain an LRU eviction policy to keep the relevant [KV cache](https://aarnphm.xyz/thoughts/Attention/../../thoughts/KV-compression) for all requests within a [radix tree](https://aarnphm.xyz/thoughts/Attention/../../thoughts/Radix-tree)

radix tree setup:

- key: sequence of tokens
- value: KV cache tensor (stored in GPU in a paged layout)

![](https://aarnphm.xyz/thoughts/Attention/../../thoughts/images/vllm/radix-attention.webp) _dynamic evolution of the radix tree in response to various requests._

> [!abstract]- explanation of RadixAttention with LRU eviction policy
>
> These requests include two chat sessions, a batch of few-shot learning inquiries, and a self-consistency sampling. Each tree edge carries a label denoting a substring or a sequence of tokens. The nodes are color-coded to reflect different states: green for newly added nodes, blue for cached nodes accessed during the time point, and red for nodes that have been evicted.
>
> [full explanation](https://lmsys.org/blog/2024-01-17-sglang/#backend-automatic-kv-cache-reuse-with-radixattention)

### cache-aware scheduling

We define the hit rate as

$$
\begin{aligned}
\text{hit rate} &= \frac{\sum_{r \in R} \text{number of cached prefill tokens in } r}{\sum_{r \in R} \text{number of prefill tokens in } r} \\[8pt]
&= 1 - \frac{C}{\sum_{r \in R} \text{number of prefill tokens}}
\end{aligned}
$$

_in batch settings: sort requests by matched prefix length and prioritise those with longer matched prefixes instead of a FIFO schedule._

```pseudo
\begin{algorithm}
\caption{Cache-Aware Scheduling for RadixAttention with Continuous Batching}
\begin{algorithmic}
\State \textbf{Input:} The radix tree $T$, the memory pool $P$, the current running batch $B$, the waiting queue $Q$.
\State \textbf{Output:} Finished requests and updated system state.
\State // Get all requests from the waiting queue
\State requests $\gets Q.\text{get\_all\_requests}()$
\State // Search for prefix matching for all waiting requests
\For{req $\in$ requests}
  \State req.prefix\_node, req.prefix\_len $\gets$ T.match\_prefix(req.input\_tokens)
\EndFor
\State // Sort the requests according to matched prefix lengths
\State requests.sort()
\State // Select requests for the next batch
\State available\_size $\gets$ T.evictable\_size() + P.available\_size()
\State current\_size $\gets$ 0
\State new\_batch $\gets$ []
\For{req $\in$ requests}
  \If{req.size() + current\_size $\le$ available\_size}
    \State new\_batch.append(req)
    \State current\_size $\gets$ current\_size + req.size()
    \State $\delta \gets T.\text{increase\_ref\_counter}(req.\text{prefix\_node})$
    \State available\_size $\gets$ available\_size + $\delta$
  \EndIf
\EndFor
\State Q.remove\_requests(new\_batch)
\State // Insert requests into the current running batch
\State B.merge(new\_batch)
\State // Allocate new memory and do eviction if necessary
\State needed\_size $\gets$ B.needed\_size()
\State success, buffer $\gets$ P.alloc(needed\_size)
\If{$\neg \text{success}$}
  \State T.evict(needed\_size)
  \State success, buffer $\gets$ P.alloc(needed\_size)
\EndIf
\State B.run(buffer)
\State // Process finished requests
\State finished\_requests $\gets$ B.drop\_finished\_requests()
\For{req $\in$ finished\_requests}
  \State T.decrease\_ref\_counter(req.prefix\_node)
  \State T.insert(req)
\EndFor
\State \Return finished\_requests
\end{algorithmic}
\end{algorithm}
```

We get the lower bound:

$$
C \ge \sum_{e \in \text{edges}(T)} \mid e \mid
$$

Consider visiting the radix tree $T$ in DFS order. For each edge $e$ of $T$, the first time we compute the KV cache associated with $e$, we then compute the whole subtree of $e$. During the computation of $e$’s subtree, edge $e$ will be continuously hit, so no additional computation happens.

> [!tip] cache hit
>
> with cache size $\ge$ the maximum request length (which equals the longest path in the radix tree), edge $e$ **WILL NOT** be evicted during the computation of its subtree, since the common prefix including $e$ of the subtree will be continuously hit.

We can show that longest-shared-prefix-first order is equivalent to DFS order by induction [^proof]

### compressed FSM for jump-ahead tokens.

Implemented in ([Zheng et al., 2024](#bib-zheng2024sglangefficientexecutionstructured))

#### Method 1: [FSM](https://aarnphm.xyz/thoughts/Attention/../../thoughts/constrained-decoding/../../thoughts/constrained-decoding#guided-generations-with-fsm)-based decoding

- intuition: use an FSM ([Willard & Louf, 2023](#bib-willard2023efficientguidedgenerationlarge)) to guide generations by increasing the logit bias for tokens that conform to a given JSON schema. This allows us to track the current state during decoding and filter out invalid tokens by applying logit bias to the output. ![](https://aarnphm.xyz/thoughts/Attention/../../thoughts/constrained-decoding/../../thoughts/images/vllm/constrained-json-fsm.webp)
- limitation: since the construction of the FSM requires token-level access, it can only transition the state by _one_ token at a time, resulting in slow decoding.

#### Method 2: Interleaved-based

- intuition: breaks JSON schemas down into parts, each containing either a chunked prefill part or a constrained decoding part. They are then executed interleaved by the inference system. Faster than per-token decoding, given that chunked prefill components can process multiple tokens per forward pass. See also using llama.cpp as a backend.
- limitation:
  - interleaved-based methods require custom syntax, making them less expressive than regex.
  - they struggle to deal with tokenization boundaries, due to conflicts between decode and chunked prefill segments.
  - frequent communication between the interpreter and the backend adds overhead.

#### **Method 3: Jump-Forward Decoding with compressed FSM**

![](https://aarnphm.xyz/thoughts/Attention/../../thoughts/constrained-decoding/../../thoughts/images/vllm/jump-forward-decoding-fsm.webp)

> [!tip] tokenization boundary handling
>
> During decoding, it is preferred to combine multiple characters into a single token.
>
> For example, when decoding `"Hello"` in the context of JSON decoding, the LLM might output the following tokens: `"`, `He`, `llo`, `",`
>
> This may cause some strange behaviour if we combine the last `"` with `,` (the regex `"[\w\d\s]*"` followed by the last `,` will lead to endless decoding, because the token `",` is not valid even if the LM wants to stop.)

Fix:

- implement a re-tokenization mechanism during the jump-forward phase (append the string instead of the tokens, followed by re-tokenization of the entire text) $\to$ adds approximately 4% overhead
- use a comprehensive regex to guide the decoding phase, instead of employing multiple concatenated regexes [^coalescence]

[Link to original](https://aarnphm.xyz/thoughts/Attention/../../thoughts/constrained-decoding#compressed-fsm-for-jump-ahead-tokens)

## RingAttention

([Liu et al., 2023](#bib-liu2023ringattentionblockwisetransformers))

## RazorAttention

([Tang et al., 2024](#bib-tang2024razorattentionefficientkvcache))

## Paged Attention

by ([Kwon et al., 2023](#bib-kwon2023efficient))

Used in conjunction with [continuous batching](https://aarnphm.xyz/thoughts/Attention/../../thoughts/Continuous-batching), implemented through [vLLM](https://aarnphm.xyz/thoughts/Attention/../../thoughts/vllm). It reduces the memory usage of the attention mechanism by swapping the KV cache in and out of memory. The block manager is similar to _virtual memory_ in an OS. Essentially, it’s a form of **paging**, such that the KV cache does not need to be stored in contiguous memory; it partitions the KV cache of each sequence into KV blocks.

Another optimization is to use [KV compression](https://aarnphm.xyz/thoughts/Attention/../../thoughts/KV-compression) to reduce the size of the KV cache for longer contexts.

Given:

- each block contains KV vectors for a fixed number of tokens, denoted as block size $B$.
- Key block $K_j= (k_{(j-1)B+1}, \ldots, k_{jB})$
- Value block $V_j= (v_{(j-1)B+1}, \ldots, v_{jB})$

$$
A_{ij} = \frac{\exp(q_i^T K_j / \sqrt{d})}{\sum_{t=1}^{i//B} \exp(q_i^T K_t / \sqrt{d})}, \quad o_i = \sum_{j=1}^{i//B} V_j A_{ij}^T
$$

where $A_{ij}=(a_{i,(j-1)B+1}, \ldots, a_{i,jB})$ is the row vector of attention scores on the j-th KV block.

## References

- Ainslie, J., Lee-Thorp, J., de Jong, M., Zemlyanskiy, Y., Lebrón, F., & Sanghai, S. (2023). _GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints_. arXiv preprint arXiv:2305.13245 [arxiv](https://arxiv.org/abs/2305.13245)
- Kwon, W., Li, Z., Zhuang, S., Sheng, Y., Zheng, L., Yu, C. H., Gonzalez, J. E., Zhang, H., & Stoica, I. (2023). Efficient Memory Management for Large Language Model Serving with PagedAttention. _Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles_.
- Liu, H., Zaharia, M., & Abbeel, P. (2023). _Ring Attention with Blockwise Transformers for Near-Infinite Context_.
  arXiv preprint arXiv:2310.01889 [arxiv](https://arxiv.org/abs/2310.01889)
- Tang, H., Lin, Y., Lin, J., Han, Q., Hong, S., Yao, Y., & Wang, G. (2024). _RazorAttention: Efficient KV Cache Compression Through Retrieval Heads_. arXiv preprint arXiv:2407.15891 [arxiv](https://arxiv.org/abs/2407.15891)
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2023). _Attention Is All You Need_. arXiv preprint arXiv:1706.03762 [arxiv](https://arxiv.org/abs/1706.03762)
- Willard, B. T., & Louf, R. (2023). _Efficient Guided Generation for Large Language Models_. arXiv preprint arXiv:2307.09702 [arxiv](https://arxiv.org/abs/2307.09702)
- Zheng, L., Yin, L., Xie, Z., Sun, C., Huang, J., Yu, C. H., Cao, S., Kozyrakis, C., Stoica, I., Gonzalez, J. E., Barrett, C., & Sheng, Y. (2024). _SGLang: Efficient Execution of Structured Language Model Programs_. arXiv preprint arXiv:2312.07104 [arxiv](https://arxiv.org/abs/2312.07104)

[^proof]:
    _base_: a random request corresponding to node $x \in T$ will be processed.

    - All requests corresponding to nodes $\{v_{1}, \ldots, v_{n}\}$ on the path $x \gets \text{root}$ don’t need recomputation.
    - Thus, the computation complexity for requests of nodes $\{v_{1}, \ldots, v_{n}, x\}$ is aligned with DFS.

    _induction_: assume we visit node $y \in T$, and the visited nodes align with DFS order. Let $P$ denote the _path of_ $y \gets \text{root}$.

    - Each node that has not been visited has a lowest common ancestor with the visited nodes on $P$.
    - Since nodes on $P$ are cached, a node $z$ that has yet to be visited, with its lowest common ancestor on $P$, will have the _longest shared prefix_.
    - longest-shared-prefix-first order will therefore select $z$, which is a valid DFS order. q.e.d

---
slug: thoughts/Autograd
tags:
  - seed
  - ml
description: resconstructed source of "https://aarnphm.xyz/thoughts/Autograd"
title: Autograd
date: 2021-10-10
---

Auto differentiation and [XLA](https://aarnphm.xyz/thoughts/Autograd/../../thoughts/XLA)

$f(x) = e^{2x} - x^3 \rightarrow \frac{df}{dx} = 2e^{2x} - 3x^2$ ← manual diff

Others:

- numerical, symbolic
- autodiff
  - similar to symbolic, but on demand
  - instead of an expression → returns a numerical value

Forward mode

- compute the partial derivative of each scalar w.r.t. each input in a forward pass
- represented as a tuple of the primal $v_i$ and the _tangent_ $\dot{v}_i$: $v_i \rightarrow (v_i, \dot{v}_i)$
- [Jax](https://aarnphm.xyz/thoughts/Autograd/../../thoughts/Jax) uses operator overloading.

Reverse mode

- store values and dependencies of intermediate variables in memory
- After the forward pass, compute the partial derivatives of the output w.r.t. the intermediate variables via adjoints $\bar{v}$

---
slug: thoughts/Automatic-Differentiation
tags:
  - math
description: resconstructed source of "https://aarnphm.xyz/thoughts/Automatic-Differentiation"
title: Automatic Differentiation
date: 2024-02-07
---

see also: [Autograd](https://aarnphm.xyz/thoughts/Automatic-Differentiation/../../thoughts/Autograd) and [Jax](https://aarnphm.xyz/thoughts/Automatic-Differentiation/../../thoughts/Jax)

Input: code that computes a function
Output: code that computes the derivative of that function

AD writes functions as a sequence of composition blocks $f(x) = f_n \circ f_{n-1} \circ \ldots \circ f_1(x)$, and then computes the derivative of the function by applying the chain rule.

---
slug: thoughts/Autoregressive-models
tags:
  - seed
  - ml
description: resconstructed source of "https://aarnphm.xyz/thoughts/Autoregressive-models"
title: Autoregressive models
date: 2024-02-07
---

A statistical model is autoregressive if it predicts future values based on past values. For example, an autoregressive model might seek to predict a stock’s future prices based on its past performance.
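A small numerical sketch of the idea: fit an AR(2) model with least squares and predict one step ahead (the coefficients and the simulated data are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate an AR(2) process: x_t = 0.6 * x_{t-1} - 0.2 * x_{t-2} + noise
x = np.zeros(200)
for t in range(2, len(x)):
    x[t] = 0.6 * x[t - 1] - 0.2 * x[t - 2] + rng.normal(scale=0.1)

# estimate the coefficients from lagged values via least squares
lags = np.column_stack([x[1:-1], x[:-2]])  # lag-1 and lag-2 columns
targets = x[2:]
coef, *_ = np.linalg.lstsq(lags, targets, rcond=None)

# one-step-ahead prediction from the two most recent values
x_next = coef @ np.array([x[-1], x[-2]])
```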
In the context of LLMs, generative pre-trained [transformers](https://aarnphm.xyz/thoughts/Autoregressive-models/../../thoughts/Transformers) (GPTs) are derivations of auto-regressive models, in that they take an input sequence of $n$ tokens and predict the next token at index $n+1$.

“Auto-regressive model” is often considered the more correct terminology when describing text-generation models.

---
slug: thoughts/Behavirourism
tags:
  - philosophy
description: resconstructed source of "https://aarnphm.xyz/thoughts/Behavirourism"
title: Behaviourism
date: 2024-02-07
---

Positive reinforcement (praise, rewards) strengthens a behaviour and increases the likelihood of it being repeated, whereas negative reinforcement ensures such behaviour is not repeated.

### critique.

- a one-dimensional way to understand human behaviour, as it focuses only on observable behaviours and neglects internal mental processes
- deterministic, as it assumes that behaviour is determined by the environment and not by the individual, which induces [confirmation bias](https://aarnphm.xyz/thoughts/Behavirourism/../../thoughts/confirmation-bias)
- [Compression](https://aarnphm.xyz/thoughts/Behavirourism/../../thoughts/Compression) problems

---
slug: thoughts/BuildKit
tags:
  - seed
  - container
description: resconstructed source of "https://aarnphm.xyz/thoughts/BuildKit"
title: BuildKit
date: 2024-02-08
---

A concurrent, cache-efficient, and secure build system for building [OCI-compliant](https://aarnphm.xyz/thoughts/BuildKit/../../thoughts/OCI) images and artifacts.

Containers are a form of [Content-addressable storage](https://aarnphm.xyz/thoughts/BuildKit/../../thoughts/Content-addressable-storage), such that you can run your application within an isolated environment.

### LLB

You can think of it this way: LLB is to a Dockerfile what LLVM IR is to C.

Marshaled as a protobuf message, see [definition](https://github.com/moby/buildkit/blob/master/solver/pb/ops.proto)

See also [Flatbuffer](https://aarnphm.xyz/thoughts/BuildKit/../../thoughts/In-memory-representation)

---
slug: thoughts/Camus
tags:
  - philosophy
description: Camus, a scattered thoughts and notes.
title: Camus
date: 2024-02-28
---

Of the absurd [reasoning](https://aarnphm.xyz/thoughts/Camus/../../thoughts/reason) and [existentialism](https://aarnphm.xyz/thoughts/Camus/../../thoughts/Existentialism).

## The Myth of Sisyphus

### Absurd and suicide

Suicide is the solution for the absurd:

- People never died because of ontological arguments
- Suicide is often the result of people who didn’t find worth in living
- “Life is not worth living, therefore I took the easy way out”

as a paradox:

- Suicide is the justification of the meaning of life ← the most important question for philosophers
- From [Nietzschean](https://aarnphm.xyz/thoughts/Camus/../../thoughts/Philosophy-and-Nietzsche) prose, those who say “no” act as if they said “yes”: Schopenhauer

Fantasising the act of eluding:

> Hope of another life one must “deserve”, or the trickery of those who live not for life itself but for some great idea that will transcend it, refine it, give it meaning, and betray it.

Logic is easy, but it is impossible to be logical to the bitter end. It is considered truth if one decides to die by one’s own hand, but does that mean life itself simply has no meaning?
> **Absurd reasoning** is based on whether there is [logic](https://aarnphm.xyz/thoughts/Camus/../../thoughts/logic) to the reasons for men who died by their “own hands, consequently following to its conclusion their emotional inclination”

The absurd comes from the abject at birth, similar to how the end pages of a book start from its beginning. To understand the absurd is to understand the art of living, the world of intelligence.

Seemingly the questions of the absurd stem from the question “Why”. The weariness of a normal life inaugurates the impulse of consciousness.

Heidegger: “mere anxiety \[is] a source of everything.”

---

### definition of absurd

See also P.17, P.20, P.25

I realise that if through science I can seize phenomena and enumerate them, I cannot, for all that, apprehend the world.

The absurd is the confrontation of this irrational world and the longing for clarity whose call echoes in the human heart.

```poetry
The absurd is measured by the man in the world
```

The attacks of [reason](https://aarnphm.xyz/thoughts/Camus/../../thoughts/reason) and decency are never stronger than our own.

Once we recognise the absurd, it becomes a passion. Whether one lives with this passion or not is a different question.

Philosophers live through their lenses of the world, such that they run these experiments and believe so strongly in the results.

Jaspers despairs of any ontology because we have lost naïveté.

Kierkegaard lives the absurd: no truth is absolute and can render satisfactory an existence impossible in itself.

The absurd is born from reasoning man’s attempt to make sense of the world, and the irreparable silence of the universe echoed back to him.

> [!note] P.30
>
> In all these cases, from the simplest to the most complex, the magnitude of the absurdity will be in direct ratio to the distance between the two terms of my comparison. There are absurd marriages, challenges, rancors, silences, wars, and even peace treaties. For each of them the absurdity springs from a comparison. I am thus justified in saying that the feeling of absurdity does not spring from the mere scrutiny of a fact or an impression, but that it bursts from the comparison between a bare fact and a certain reality, between an action and the world that transcends it. The absurd is essentially a divorce. It lies in neither of the elements compared; it is born of their confrontation. In this particular case and on the plane of intelligence, I can therefore say that the Absurd is not in man (if such a metaphor could have a meaning) nor in the world, but in their presence together. For the moment it is the only bond uniting them. If I wish to limit myself to facts, I know what man wants, I know what the world offers him, and now I can say that I know what links them.

Rule of method: a man is always a prey to his truths. Once he has admitted them, he cannot free himself from them. A man who becomes conscious of the absurd is forever bound to it.

Indeed, Kierkegaard himself shows us the path taken.

> I do not want to suggest anything here, but how can one fail to read in his works the signs of an almost intentional mutilation of the soul to balance the mutilation accepted in regard to the absurd? It is the leitmotiv of the Journal. “What I lacked was the animal which also belongs to human destiny… . But give me a body then.” And further on: “Oh!
especially in my early youth what should I not have given to be a man, even for six months… What I lack, basically, is a body and the physical conditions of existence.”

> Reconciliation through scandal is still reconciliation. It allows one perhaps, as can be seen, to derive hope of its contrary, which is death.

Kierkegaard’s view on despair is that it is not a fact, but a state: the state of sin. For sin is what alienates from God. The absurd, the metaphysical state of the conscious man, does not lead to God. Therefore, the absurd is sin without God.

> [!note] P.44
>
> I read merely these assertions of Husserl, apparently paradoxical yet rigorously logical if what precedes is accepted: “That which is true is true absolutely, in itself; truth is one, identical with itself, however different the creatures who perceive it, men, monsters, angels or gods.” Reason triumphs and trumpets forth with that voice, I cannot deny. What can its assertions mean in the absurd world? The perception of an angel or a god has no meaning for me. That geometrical spot where divine reason ratifies mine will always be incomprehensible to me. There, too, I discern a leap, and though performed in the abstract, it nonetheless means for me forgetting just what I do not want to forget. When Husserl exclaims: “If all masses subject to attraction were to disappear, the law of attraction would not be destroyed but would simply remain without any possible application,” I know that I am faced with a metaphysic of consolation. And if I want to discover the point where thought leaves the path of evidence, I have only to reread the parallel reasoning that Husserl voices regarding the mind: “If we could contemplate clearly the exact laws of psychic processes, they would be seen to be likewise eternal and invariable, like the basic laws of theoretical science. Hence they would be valid even if there were no psychic process.” Even if the mind were not, its laws would be! I see then that Husserl aims to make of a psychological truth a rational rule: after having denied the integrating power of human reason, he leaps by this expedient to eternal Reason.

Husserl’s concrete universe, in which not all essences are formal but some are material, the first being the object of logic and the second of science, is a mere question of definition. I then realize that merely the order of the procession has been changed. This world has ceased to have its reflection in a higher universe, but the heaven of forms is figured in the host of images of this earth. This changes nothing for me. Rather than encountering here a taste for the concrete, the meaning of the human condition, I find an intellectualism sufficiently unbridled to generalize the concrete itself.

### absurd freedom

If I were a tree among trees, a cat among animals, this life would have a meaning, or rather this problem would not arise, for I should belong to this world. I should be this world to which I am now opposed by my whole consciousness and my whole insistence upon familiarity. This ridiculous reason is what sets me in opposition to all creation. I cannot cross it out with a stroke of the pen. What I believe to be true I must therefore preserve.

The absurd is simultaneously the awareness and the rejection of death.

Suicide as a solution for the absurd, the absolute: man cannot seem to live with his dreadful future, so he chooses suicide as a solution.
- Consciousness and revolt, as rejections, are the contrary of renunciation
- The method is a matter of persistence

> [!tip] Freedom
>
> Knowing whether or not a man is free involves knowing whether he can have a master.

The paradox of this freedom is that understanding metaphysical liberty takes away its meaning of being free.

God and the problem of evil: either we are not free and God the all-powerful is responsible for evil, or we are free and responsible but God is not all-powerful.

Freedom cannot be inferred as a general solution; it can only be derived from one’s experience. I don’t inherit freedom from a higher being, as I am the owner of my own thoughts and actions, such that I am responsible for my own actions. If the absurd cancels out eternal freedom, it restores and magnifies my freedom of action.

Man is bound to postulate his freedom based on the illusion in which he was living.

Losing oneself in that bottomless certainty, feeling henceforth sufficiently remote from one’s own life to increase it and take a broad view of it - this involves a principle of liberation.

Such new independence has a definite time limit, like any freedom of action.

### the absurd man

The actor trains himself to feed only on appearances.

---

### Analysis

Camus’ argument on the absurd:

- The world is full of irrationality and indifference. The world is silent against humanity’s search for the meaning of life.
- Meaning and value are constructed by humans, instead of what Kierkegaard implies in putting faith forward as a solution that outsources our value system. Because eventually life is meaningless.
- But what Kierkegaard is doing is actually a philosophical suicide.

> [!note] Note
>
> I don’t know whether this world has a meaning that transcends it. But I know that I do not know that meaning and that it is impossible for me just now to know it

> [!note] Note
>
> What can a meaning outside my condition mean to me? I can understand only in human terms.

Did he mean the world or the human as absurd? No, because as rational human beings we are programmed to create order and put meaning to life in an indifferent and irrational universe. The why arises, and trying to find the rational in an irrational world is absurd. The absurd cannot be negated, meaning we can live either in acceptance of it or in escape from it.

Religion is a set of pre-made answers for existential and philosophical questions, used as a tool for control. Philosophical suicide is to elude the absurd and try to figure out the meaning of life with a set of man-made beliefs.

How to live life in a meaningless world? It is to let loose of all definitions of meaning and live life fruitfully. Instead of despairing, see the silver lining: focus on this life, create [value](https://aarnphm.xyz/thoughts/Camus/../../thoughts/Value) on our own, while our time is limited, with a full perception of it.

One should not merely accept the absurd; we should revolt against it, as we have full control of our own actions and freedom.

Rebellion: full of thought and action, as a rejection of hope. The goal is to live solely with what one knows, to accommodate what is, and to bring in nothing that is not certain.
---
slug: thoughts/Cauchy-Schwarz
tags:
  - math
description: resconstructed source of "https://aarnphm.xyz/thoughts/Cauchy-Schwarz"
title: Cauchy-Schwarz
date: 2024-11-05
---

_useful for deriving upper bounds, e.g. when analysing the error or convergence rate of an algorithm_

> [!abstract] format
>
> for all vectors $u$ and $v$ of an inner product space, we have
>
> $$
> \mid \langle u, v \rangle \mid ^2 \le \langle u, u \rangle \cdot \langle v, v \rangle
> $$

In the context of the Euclidean norm:

$$
\mid x^T y \mid \le \|x\|_2 \|y\|_2
$$

## proof

_using the Pythagorean theorem_

Special case $v=0$: then $\langle u, v \rangle = 0$ and $\|u\|\|v\| = 0$, so both sides vanish and the inequality holds; note that $u$ and $v$ are then [linearly dependent](https://aarnphm.xyz/thoughts/Cauchy-Schwarz/../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/tut/tut1#linear-dependence-of-vectors), which is the equality case. q.e.d

Assume that $v \neq 0$. Let $z \coloneqq u - \frac{\langle u, v \rangle}{\langle v, v \rangle} v$

It follows from linearity of the inner product that

$$
\langle z,v \rangle = \langle u - \frac{\langle u,v \rangle}{\langle v, v \rangle} v,v \rangle = \langle u,v \rangle - \frac{\langle u,v \rangle}{\langle v,v \rangle}\langle v,v \rangle = 0
$$

Therefore $z$ is orthogonal to $v$ (i.e. $z$ is the projection of $u$ onto the plane orthogonal to $v$). We can then apply the Pythagorean theorem to

$$
u = \frac{\langle u,v \rangle}{\langle v,v \rangle} v + z
$$

which gives

$$
\begin{aligned}
\|u\|^{2} &= \mid \frac{\langle u,v \rangle}{\langle v,v \rangle} \mid^{2} \|v\|^{2} + \|z\|^2 \\
&=\frac{\mid \langle u,v \rangle \mid^{2}}{(\|v\|^2)^{2}} \|v\|^{2} + \|z\|^2 \\
&= \frac{\mid \langle u, v \rangle\mid^2}{\|v\|^{2} } + \|z\|^2 \ge \frac{\mid \langle u,v \rangle \mid^2}{\|v\|^{2} }\\
\end{aligned}
$$

Multiplying through by $\|v\|^{2}$ yields the inequality. Equality holds iff $\|z\|^{2}=0 \iff z=0$, which establishes linear dependence between $u$ and $v$. q.e.d

---
slug: thoughts/Chaos
tags:
  - philosophy
description: Chaos a la carte.
title: Chaos
date: 2024-01-08
---

Full [post](https://aarnphm.xyz/thoughts/Chaos/../../posts/Chaos).

> Chaos: a loose [collection](https://subconscious.substack.com/p/self-organizing-ideas) of one’s [will](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Will) to life.

The etymology of chaos traces back to the Greek word χάος (khaos), meaning abyss: that which gapes wide open, that which is vast and [empty](https://www.merriam-webster.com/wordplay/chaos-meaning-and-history).

## as system.

See also [Chaos as an intermittently forced linear system](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/papers/Chaos-as-an-intermittently-forced-linear-system.pdf)

Chaos theory posits that within the apparent randomness of complex systems lie underlying patterns, self-similarity, and self-organization. The amount of time for which a system can be predicted depends on the following:

- how much uncertainty can be tolerated in the forecast.
- the accuracy with which the current state of the system can be measured.
- a time scale, often known as the [Lyapunov time](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Lyapunov-time)

We can often see [entropy](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Entropy) as a consequence of chaos. The two are often linked, yet distinct, concepts. The loss of order induces unpredictability within deterministic systems; such systems are _sensitively dependent_ on initial conditions. Entropy, by contrast, deals with the number of ways a system can be arranged.
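A quick numerical sketch of this sensitive dependence, using the Lorenz system written out just below (the classic parameters $\sigma = 10, \rho = 28, \beta = 8/3$; the forward-Euler integrator is a simplification for illustration):

```python
import numpy as np

def lorenz_step(s: np.ndarray, dt: float = 1e-3,
                sigma: float = 10.0, rho: float = 28.0, beta: float = 8.0 / 3.0) -> np.ndarray:
    # one forward-Euler step of the Lorenz equations
    x, y, z = s
    return s + dt * np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-8, 0.0, 0.0])  # perturb one coordinate by 1e-8

for _ in range(40_000):  # integrate for 40 time units
    a, b = lorenz_step(a), lorenz_step(b)

# the trajectories have diverged by many orders of magnitude relative to the perturbation
print(np.linalg.norm(a - b))
```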
We can observe this through the Lorenz [attractor](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/attractor) system:

$$
\begin{align*}
\frac{dx}{dt} &= \sigma(y - x), \\
\frac{dy}{dt} &= x(\rho - z) - y, \\
\frac{dz}{dt} &= xy - \beta z.
\end{align*}
$$

> Chaos: When the present determines the future, but the approximate present does not approximately determine the future.

## as scale.

See also [this tweet](https://twitter.com/eshear/status/1760755072571777412)

Often known as cognitive dissonance, or linked with emotional turmoil.

The personal-traits continuum characterised by Carl Jung suggests that the human psyche lies on a spectrum between extroversion and introversion, rather than at the definitive single point that modern psychology perceives it to be.

_How does Chaos influence the scale of the human psyche?_

## fundamentals.

What [Nietzsche](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Philosophy-and-Nietzsche) would imply:

> Alas! there cometh the time when man will no longer launch the arrow of his longing beyond man—and the string of his bow will have unlearned to whizz!
>
> I tell you: one must still have chaos in one, to give birth to a dancing star. I tell you: ye have still chaos in you.

_extracted from Z V, Death of The God_

Chaos is the essence of one’s existence, such that the world is not governed by fixed rules and predetermined order.

Nietzsche rejects [transcendentals](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Transcendentals), which construct a reality beyond sensory understanding. These truths are [philosophers’ prejudices](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Philosophy-and-Nietzsche#prejudices-of-philosophers) that deny one’s will to [power](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Will#as-power); all truths are just one’s perception and experience.

- “eternal recurrence” [^1] \- a litmus test for an individual’s capacity to affirm life ← actively mentioned throughout [Thus Spoke Zarathustra](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Philosophy-and-Nietzsche#thus-spoke-zarathustra).
  - implies the possibility of a composite [self](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/papers/Nietzsche-the-Kantian-Self-and-Eternal-Recurrence.pdf), such that the individual remains the same for the eternity of life.
  - allows chaos to remain the force of life, and will to power to be a configuration of chaos.

> The self, for Nietzsche, is not just a radically unstable postmodern self. It is such a self, but it is not simply such a self. It also has a stability, sameness, and unity that goes far beyond anything Kant ever imagined in his wildest dreams

The Übermensch must find his footing and create his own values through the act of living. Chaos is important for the creation of anything that is truly new and valuable.

Chaos, in its many forms, is often seen as a force to be feared or avoided, yet it is also a catalyst for growth. It challenges the boundaries of our comfort zones and compels us to engage with aspects of our lives and selves that we might prefer to ignore.

## versus equanimity.

Equanimity should be the thing one seeks, yet chaos is all I desire. (moment of chaos, moment of equanimity)

The rule of a utilitarian is to maximize desire at all costs; does that mean, therefore, that I should always seek chaos? Nietzsche would argue that the motion of chaos invokes entropy, entropy induces value, and the Übermensch embarks upon the creation of value.
[Taste](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/taste) implies a multiplicity of being. It is driven by inner chaos to explore and expand our [representation](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Language) of the world. Yet ignorance seems to overload chaos and prevents the maximum utilization of one’s potential. I wonder if chaos is just a collection of different entropic phenomena.

Equanimity represents a state of calmness and balance, even in the face of adversity. Achieving it is not about denying chaos or the tumult of emotions it can evoke, but rather about finding a way to navigate through it without being overwhelmed. It’s about learning to coexist with the chaos, recognizing it as a part of the broader tapestry of life and the self.

Running away from normalcy to seek out “different entropic phenomena” speaks to a deep-seated curiosity and a desire not just for experience, but for understanding the intricate dynamics of life. It’s a testament to the strength and resilience of the human spirit in its quest for meaning, even when faced with the seemingly insurmountable.

[^1]: See also [Giles Deleuze’s](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Giles-Deleuze#nietzsche-and-philosophy) interpretation.

---
slug: thoughts/Cholesky-decomposition
tags:
  - math
description: resconstructed source of "https://aarnphm.xyz/thoughts/Cholesky-decomposition"
title: Cholesky decomposition
date: 2024-10-28
---

The decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose (used for [Monte-Carlo simulations](https://aarnphm.xyz/thoughts/Cholesky-decomposition/../../thoughts/Monte-Carlo#simulations)):

$$
A = LL^{*}
$$

where $L$ is a lower triangular matrix with real and positive diagonal entries, and $L^{*}$ is the conjugate transpose of $L$.

---
slug: thoughts/Cinematography
tags:
  - film
  - evergreen
description: resconstructed source of "https://aarnphm.xyz/thoughts/Cinematography"
title: Cinematography
date: 2023-09-11
---

Notes on format:

- Anamorphic [lenses](https://aarnphm.xyz/thoughts/Cinematography/../../thoughts/lenses)

Equipment:

- A7III
  - Shallow depth of field
- FX3
  - larger sensor pixel area

> [Lighting](https://aarnphm.xyz/thoughts/Cinematography/../../thoughts/Lighting) is key

[Planimetric composition](https://aarnphm.xyz/thoughts/Cinematography/../../thoughts/Planimetric-composition) - Wes Anderson

---
slug: thoughts/Civilisation-and-its-Discontents
tags:
  - philosophy
description: resconstructed source of "https://aarnphm.xyz/thoughts/Civilisation-and-its-Discontents"
title: Civilisation and its Discontents
date: 2023-10-10
---

See also: [Freud](https://aarnphm.xyz/thoughts/Civilisation-and-its-Discontents/../../thoughts/Freud)

C1: ego and the sense of self within the societal context

- Oceanic feeling
- ignorance of the existence of others
- Can’t seem to separate himself from the sense of reality

C2: the meaning of happiness?

- his discontent with personal freedom and societal restrictions
- The sense of guilt? Guilt for not following societal norms
- Eros and Thanatos

C3: What are the core purposes of these biological beings we call the self?
Freud argues the human psyche is not a single monolith; rather, it comprises a complex interplay of the following components:

- Id: the primal, instinctive part of the self, seeking immediate gratification of pleasure
- Ego: the logical, rational, conscious part of the psyche
- Superego: internalized moral and societal values

C5: Emphasis on how the construct of the human psyche creates internal conflicts; adding civilization’s norms increases the tendency for aggression versus self-love.

---
slug: thoughts/Color
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/Color"
title: Color
date: 2024-03-09
---

### theory.

Complementary

Analogous

Triadic

See also [coolors.co](https://aarnphm.xyz/thoughts/Color/../../coolors.co)

contrast, combination, thickness

1. background Color
2. surface area

---
slug: thoughts/Compiler
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/Compiler"
title: Compiler
date: 2024-10-07
---

## just-in-time compiler

```mermaid
graph TD
    A[Source Code] --> B[Bytecode / IR]
    B --> C[Interpreter]
    C --> D{Hot Spot?}
    D -->|Yes| E[JIT Compiler]
    D -->|No| C
    E --> F[Native Machine Code]
    F --> G[Execution]
    C --> G
```

See also: [thoughts/jit.py](https://cdn.aarnphm.xyz/assets/thoughts/jit.py)

toy example for branch optimization, memoising results for previously seen (“hot”) inputs:

```python
import numpy as np
import numpy.typing as npt

# memoise DCT results for inputs we have already seen
cache: dict[tuple[float, ...], npt.NDArray[np.float32]] = {}


def dct_jit(x: npt.NDArray[np.float32]) -> npt.NDArray[np.float32]:
    x_tuple = tuple(x)
    if x_tuple in cache:
        return cache[x_tuple]
    N = len(x)
    result = np.zeros(N, dtype=np.float32)
    for k in range(N):
        sum_val = 0.0
        for n in range(N):
            sum_val += x[n] * np.cos(np.pi * k * (2 * n + 1) / (2 * N))
        result[k] = sum_val
    cache[x_tuple] = result
    return result
```

---
slug: thoughts/Compression
tags:
  - seed
  - technical
description: resconstructed source of "https://aarnphm.xyz/thoughts/Compression"
title: Compression
date: 2024-02-07
---

---
slug: thoughts/Constructionist
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/Constructionist"
title: Constructionist
date: 2024-02-07
---

Mindstorm and Design Justice

---
slug: thoughts/Containers
tags:
  - technical
  - storage
description: resconstructed source of "https://aarnphm.xyz/thoughts/Containers"
title: Containers
date: 2024-02-08
---

See also [OCI specification](https://aarnphm.xyz/thoughts/Containers/../../thoughts/OCI), [BuildKit](https://aarnphm.xyz/thoughts/Containers/../../thoughts/BuildKit)

---
slug: thoughts/Content-addressable-storage
tags:
  - seed
  - technical
description: resconstructed source of "https://aarnphm.xyz/thoughts/Content-addressable-storage"
title: Content-addressable storage
date: 2023-04-15
---

Content-addressed storage is a mechanism to store information such that it can be retrieved based on its content, not its name or location.

> If you have a book, say “Control Systems Engineering by N.S. Nise, with ISBN: 978-1-119-47422-7”, you can find the book anywhere, including its information and content.
>
> By contrast, if I use location-addressing to identify the book, say, “the book on the second shelf of the third row in the library”, it would be difficult to find the book if the library is reorganized.
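A toy sketch of the idea: an in-memory store where the address of a blob is simply the SHA-256 hash of its bytes (the `put`/`get` names and the dict-backed store are hypothetical):

```python
import hashlib

store: dict[str, bytes] = {}

def put(content: bytes) -> str:
    # the address *is* a digest of the content
    key = hashlib.sha256(content).hexdigest()
    store[key] = content
    return key

def get(key: str) -> bytes:
    content = store[key]
    # retrieval is self-verifying: re-hashing must reproduce the address
    assert hashlib.sha256(content).hexdigest() == key
    return content

cid = put(b"Control Systems Engineering, N.S. Nise")
assert get(cid) == b"Control Systems Engineering, N.S. Nise"
```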
| Content-addressed | Location-addressed |
| --- | --- |
| uses cryptographic hash functions[^1] to generate unique keys, so content is retrieved based on what it contains | e.g. [HTTP](https://aarnphm.xyz/thoughts/Content-addressable-storage/../../thoughts/HTTP): look up content by its location (URI), so the content is controlled by the owner of the location |

## Immutable Objects, Mutable References

Utilizes a [Merkle DAG](https://aarnphm.xyz/thoughts/Content-addressable-storage/../../thoughts/Merkle-DAG): immutable content-addressed objects and mutable pointers into the DAG, a dichotomy present in many distributed systems.

See also [IPFS](https://aarnphm.xyz/thoughts/Content-addressable-storage/../../thoughts/IPFS), [Block-reference mechanism](https://aarnphm.xyz/thoughts/Content-addressable-storage/../../thoughts/Block-reference-mechanism)

[^1]: See [cryptographic functions](https://aarnphm.xyz/thoughts/Content-addressable-storage/../../thoughts/cryptography#functions)

---
slug: thoughts/Continuous-batching
tags:
  - ml
description: resconstructed source of "https://aarnphm.xyz/thoughts/Continuous-batching"
title: Continuous batching
date: 2024-02-08
---

([Yu et al., 2022](#bib-280922)) improves on static batching, reducing cost and improving throughput by continuously appending incoming requests into the existing KV cache [^paper]

![](https://aarnphm.xyz/thoughts/Continuous-batching/../../thoughts/images/vllm/continuous-batching.webp)

## References

- Yu, G.-I., Jeong, J. S., Kim, G.-W., Kim, S., & Chun, B.-G. (2022). Orca: A Distributed Serving System for Transformer-Based Generative Models. _16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)_, 521–538.

[^paper]: The [paper](https://www.usenix.org/conference/osdi22/presentation/yu) and [presentation](https://www.youtube.com/watch?v=Ob9PPLxETYU\&ab_channel=USENIX) for the paper. The most notable open-source implementation is [vLLM](https://aarnphm.xyz/thoughts/Continuous-batching/../../thoughts/vllm). p/s: actually, I think it was first implemented in [huggingface/tgi](https://github.com/huggingface/text-generation-inference)

---
slug: thoughts/Database
tags:
  - technical
description: resconstructed source of "https://aarnphm.xyz/thoughts/Database"
title: Database
date: 2024-02-09
---

See also [introduction](https://aarnphm.xyz/thoughts/Database/../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/DBMS)

---
slug: thoughts/Determinism
tags:
  - seed
  - computing
description: resconstructed source of "https://aarnphm.xyz/thoughts/Determinism"
title: Determinism
date: 2024-01-08
---

The argument from Hume

---
slug: thoughts/Digital-garden
tags:
  - seed
  - pattern
description: resconstructed source of "https://aarnphm.xyz/thoughts/Digital-garden"
title: Digital garden
date: 2024-02-09
---

A collection of notes, thoughts, and ideas that are cultivated and grown over time. It’s a place where you can plant seeds, grow them, and let them bloom; a place where you can let your thoughts grow organically, and where you can let your ideas flourish.

In a sense, it is a form of [hypertext](https://aarnphm.xyz/thoughts/Digital-garden/../../thoughts/Hypertext), a personalized Xanadu system.
Wikipedia is also considered society’s digital garden.

Joel Hooks puts this better than I ever can:

> A garden is usually a place where things grow.
>
> Gardens can be very personal and full of whimsy or a garden can be a source of food and sustenance.
>
> We gather and work together in community gardens to share the labor as well as the rewards of a collective effort.
>
> It’s a comparison that you can take very far. From “planting seeds” and “pulling weeds” to tending multiple gardens that each serve an individual need or desired outcome.
>
> Like with real gardens, our digital gardens are a constant ebb and flow towards [entropy](https://aarnphm.xyz/thoughts/Digital-garden/../../thoughts/Entropy).

> Nerding hard on digital gardens, personal wikis, and experimental knowledge systems with [@\_jonesian](https://twitter.com/_jonesian) today.\
> \
> We have an epic collection going, check these out...\
> \
> 1\. [@tomcritchlow](https://twitter.com/tomcritchlow)'s Wikifolders: [pic.twitter.com/9ri6g9hD93](https://t.co/9ri6g9hD93)
>
> — Maggie Appleton (@Mappletons) [15 April 2020](https://twitter.com/Mappletons/status/1250532315459194880)

See also: [post](https://maggieappleton.com/garden-history) and [introduction](https://joelhooks.com/digital-garden)

## The garden and the stream

[Source](https://hapgood.us/2015/10/17/the-garden-and-the-stream-a-technopastoral/)

---
slug: thoughts/Dishes
tags:
  - evergreen
  - menu
description: resconstructed source of "https://aarnphm.xyz/thoughts/Dishes"
title: Menus
date: 2023-10-26
---

A collection of courses. See [atelier with friends](https://aarnphm.xyz/thoughts/Dishes/../../thoughts/atelier-with-friends/) if you are interested in joining.

This serves as a ground truth for the collection of dishes throughout.

## italienne.

1. Uovo in Raviolo

### salsa.

1. Marinara
2. Sugo Pomodoro

## le viandier.

1. Soupe à l’Oignon Gratinée
2. Chicken liver pâté à la Jacques Pépin
3. La trout meunière aux choux de Bruxelles
4. Canard à l’orange
5. Salade Landaise
6. la charcuterie
7. Mousse au chocolat
8. gâteau au chocolat - espresso buttercream, honey, sea salt, chocolate ganache.
9. choux au craquelin - matcha cream, powdered sugar, matcha powder.

---
slug: thoughts/Dysregulation
tags:
  - seed
  - psychology
description: resconstructed source of "https://aarnphm.xyz/thoughts/Dysregulation"
title: Dysregulation
date: 2024-02-12
---

> That feeling when you want to text that person back, but you are too nervous about why they didn’t, and then you start forming scenarios in your head about why such things happen.

The prefrontal cortex goes to sleep and the amygdala takes over ⇒ reaffirming core beliefs ⇒ getting caught in anxiety.

- [ ] How to deal with it?
- [ ] Regulate your emotions, cut through that energy
- [ ] Stop and name the feeling, turn the prefrontal cortex back on for the logical brain
- [ ] Safety lies within you, not in the other person

---
slug: thoughts/Embedding
tags:
  - seed
  - ml
description: resconstructed source of "https://aarnphm.xyz/thoughts/Embedding"
title: Embedding
date: 2024-02-25
---

See also [Transformers](https://aarnphm.xyz/thoughts/Embedding/../../thoughts/Transformers#inference)

---
slug: thoughts/Entropy
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/Entropy"
title: Entropy
date: 2024-01-11
---

> In particular, "good, aligned, conversational AI" is just one of many possible different rollouts. Finetuning / alignment tries to "collapse" and control the entropy to that region of the simulator.
Jailbreak prompts try to knock the state into other logprob ravines.
>
> — Andrej Karpathy (@karpathy) [6 March 2023](https://twitter.com/karpathy/status/1632800082679705600)

$$
S = k_b \ln \Omega
$$

---
slug: thoughts/Epistemology
tags:
  - seed
  - philosophy
description: resconstructed source of "https://aarnphm.xyz/thoughts/Epistemology"
title: Epistemology
date: 2024-02-07
---

The study of knowledge and justified belief.

---
slug: thoughts/Euler's-identity
tags:
  - math
description: resconstructed source of "https://aarnphm.xyz/thoughts/Euler's-identity"
title: Euler's identity
date: 2024-11-05
---

Probably the most [beautiful](https://aarnphm.xyz/thoughts/Euler's-identity/../../thoughts/aesthetic-value#beauty) equation in mathematics:

$$
\begin{aligned}
e^{i \pi} &+ 1 = 0 \\ \\
\because e &: \text{Euler's number} \\
i &: \text{imaginary unit satisfying } i^{2} = -1 \\
\pi &: \text{pi}
\end{aligned}
$$

a special case of Euler’s formula:

$$
e^{i \theta} = \cos(\theta) + i \sin(\theta)
$$

---
slug: thoughts/Existentialism
tags:
  - philosophy
description: resconstructed source of "https://aarnphm.xyz/thoughts/Existentialism"
title: Existentialism
date: 2024-02-29
---

See also [Camus](https://aarnphm.xyz/thoughts/Existentialism/../../thoughts/Camus)’s absurdism.

The school of philosophy that emerged against the backdrop of WWII, when an entire generation was confronted with the anxiety-provoking givens of death, freedom, and meaninglessness. While most of its frontier figures were French, most notably Jean-Paul Sartre, Simone de Beauvoir, Albert [Camus](https://aarnphm.xyz/thoughts/Existentialism/../../thoughts/Camus), Gabriel Marcel, and Maurice Merleau-Ponty, the conceptual groundwork of the movement was laid much earlier in the nineteenth century by pioneers like Søren Kierkegaard and Friedrich [Nietzsche](https://aarnphm.xyz/thoughts/Existentialism/../../thoughts/Philosophy-and-Nietzsche), and by twentieth-century German philosophers like Edmund Husserl, Martin Heidegger, and Karl Jaspers, as well as the prominent Spanish intellectuals José Ortega y Gasset and Miguel de Unamuno.

See also [definition](https://plato.stanford.edu/entries/existentialism/)

---
slug: thoughts/Expenses
tags:
  - evergreen
description: resconstructed source of "https://aarnphm.xyz/thoughts/Expenses"
title: Expenses
date: 2024-01-09
---

> [!tip] TL;DR
>
> This is for personal use, and I fully understand that I’m very fortunate to afford such a lifestyle.

### Subscriptions:

| Description | \$ | occurrence | Currency | Card |
| --- | --- | --- | --- | --- |
| Apple TV | 9.95 | M | USD | Chase |
| Discord Nitro | 9.99 | M | USD | Chase |
| Perplexity Pro | 200 | Y | USD | Chase |
| bookbear express | 70 | Y | USD | Chase |
| Vocabulary | 29.99 | Y | USD | Chase |
| Duolingo Max | 149.99 | Y | USD | Chase |
| Strava | 79.99 | Y | USD | Chase |
| Twitter Premium+ | 210 | Y | USD | Chase |
| Uber One | 9.99 | M | USD | Chase |
| Youtube Premium Student | 7.99 | M | USD | Chase |
| Grammarly (for mom) | 144 | Y | USD | Chase |
| [fashion](https://aarnphm.xyz/thoughts/Expenses/../../thoughts/fashion) | recurrent | Y | USD | Chase |

### Archive:

List of subscriptions I have stopped using.
| Description | \$ | occurrence | Currency | Card |
| --- | --- | --- | --- | --- |
| ChatGPT Plus | 20 | M | USD | Chase |
| Apple One | 19.95 | M | USD | Chase |
| Midjourney | 10 | M | USD | Chase |
| Supermaven Pro | 10 | M | USD | Chase |

---
slug: thoughts/Fisher-Yates
tags:
  - seed
description: Fisher-Yates shuffle algorithm
title: Fisher-Yates
date: 2024-01-30
---

Produces an _unbiased_ permutation: every permutation is equally likely.

Pseudocode:

```pseudo
\begin{algorithm}
\caption{Fisher-Yates shuffle}
\begin{algorithmic}
\REQUIRE An array $A$ of length $n$
\FOR{$i = n-1$ \TO $1$}
  \STATE $j \gets$ random integer such that $0 \leq j \leq i$
  \STATE swap $A[i]$ and $A[j]$
\ENDFOR
\end{algorithmic}
\end{algorithm}
```

Implementation of the modern Fisher-Yates algorithm (underscore.js’s `_.sample`; the helpers `isArrayLike`, `values`, `random`, `toArray`, and `getLength` are library internals, not defined here):

```js title="FisherYates.js"
function sample(obj, n, guard) {
  // single-sample fast path: pick one random element
  if (n == null || guard) {
    if (!isArrayLike(obj)) obj = values(obj)
    return obj[random(obj.length - 1)]
  }
  var sample = toArray(obj)
  var length = getLength(sample)
  n = Math.max(Math.min(n, length), 0)
  var last = length - 1
  // partial Fisher-Yates: shuffle only the first n positions
  for (var index = 0; index < n; index++) {
    var rand = random(index, last)
    var temp = sample[index]
    sample[index] = sample[rand]
    sample[rand] = temp
  }
  return sample.slice(0, n)
}
```

---
slug: thoughts/Freud
tags:
  - seed
  - philosophy
description: resconstructed source of "https://aarnphm.xyz/thoughts/Freud"
title: Sigmund Freud
date: 2023-10-10
---

## Beyond the Pleasure Principle

## The Ego and The Id

> The state of consciousness is very transitory

- [ ] P.16-18
- [ ] The relationship between Pcpt and Cs.

### Cs, Pcs, Ucs.

Two kinds of consciousness, but in a dynamic sense it is one.

The ego is a coherent organisation of mental processes, to which consciousness is attached.

> But what about those internal processes which we may—roughly and inexactly—sum up under the name of thought-processes? They represent displacements of mental energy which are effected somewhere in the interior of the apparatus as this energy proceeds on its way towards action. Do they advance to the surface, which causes consciousness to be generated? Or does consciousness make its way to them? This is clearly one of the difficulties that arise when one begins to take the spatial or topographical idea of mental life seriously. Both are equally unimaginable. There must be a third alternative.

How something unconscious in itself becomes preconscious, and how we can make something that is repressed (pre)conscious, would be answered:

Internal perception yields sensations of processes arising in the most diverse strata of the mental apparatus. These sensations are multilocular, like external perceptions; they may come from different places simultaneously and may thus have different or even opposite qualities.

Sensations of a pleasurable nature have not anything inherently impelling about them, whereas unpleasurable ones have it in the highest degree. The latter impel towards change, towards discharge, and that is why we interpret unpleasure as implying a heightening and pleasure a lowering of energic cathexis. Let us call what becomes conscious as pleasure and unpleasure a quantitative and qualitative ‘something’ in the course of mental events; the question then is whether this ‘something’ can become conscious in the place where it is, or whether it must first be transmitted to the system Pcpt.

Clinical experience decides for the latter. It shows us that this ‘something’ behaves like a repressed impulse.
It can exert driving force without the ego noticing the compulsion.

> [!tip] The Ego
>
> The ego is the id modified by the influence of the perceptual system; object-cathexis and the Oedipus complex describe the form the ego takes.

### Object-choices and identification

```poetry
At this point we must widen our range a little. We succeeded in explaining the painful disorder of melancholia by supposing that [in those suffering from it] an object which was lost has been set up again inside the ego - that is, that an object-cathexis has been replaced by an identification.
```

At that time, however, we did not appreciate the full significance of this process and did not know how common and how typical it is. Since then we have come to understand that this kind of substitution has a great share in determining the form taken by the ego and that it makes an essential contribution towards building up what is called its ‘character’.

At the very beginning, in the individual’s primitive oral phase, object-cathexis and identification are no doubt indistinguishable from each other. We can only suppose that later on object-cathexes proceed from the id, which feels erotic trends as needs. The ego, which to begin with is still feeble, becomes aware of the object-cathexes, and either acquiesces in them or tries to fend them off by the process of repression.

The super-ego originates from the experience that led to totemism. Early conflicts of the ego with the object-cathexes of the id can be continued in conflicts with their heir, the super-ego. If the ego has not succeeded in properly mastering the Oedipus complex, the energic cathexis of the latter, springing from the id, will come into operation once more in the reaction-formation of the ego ideal.

---
slug: thoughts/GPU-programming
tags:
  - seed
  - ml
description: resconstructed source of "https://aarnphm.xyz/thoughts/GPU-programming"
title: GPU programming
date: 2023-10-10
---

---
slug: thoughts/Garbage-in-Garbage-out
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/Garbage-in-Garbage-out"
title: Garbage in Garbage out
date: 2024-02-08
---

There is a notion of “garbage in, garbage out” in CS which states that bad [data](https://aarnphm.xyz/thoughts/Garbage-in-Garbage-out/../../thoughts/data) and inputs will produce outputs of equally poor quality.

The problem of [alignment](https://aarnphm.xyz/thoughts/Garbage-in-Garbage-out/../../thoughts/Alignment): How can we ingest [information](https://aarnphm.xyz/thoughts/Garbage-in-Garbage-out/../../thoughts/Information-Theory) into a system to align with our objectives? How does one create agenda-free [representations](https://aarnphm.xyz/thoughts/Garbage-in-Garbage-out/../../thoughts/representations) of an agenda-filled world?

---
slug: thoughts/Gestalt-Principles
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/Gestalt-Principles"
title: Gestalt Principles
date: 2024-03-09
---

Relates to how we perceive [composition](https://aarnphm.xyz/thoughts/Gestalt-Principles/../../thoughts/composition)

Proximity

Common Region

---
slug: thoughts/Giles-Deleuze
tags:
  - philosophy
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/Giles-Deleuze"
title: Giles Deleuze
date: 2024-02-24
---

French philosopher, known for his work on the concepts of multiplicity, being, and affirmation. He also worked on critical philosophy and the study of sense and value.
## [Nietzsche and Philosophy](https://aarnphm.xyz/thoughts/Giles-Deleuze/../../thoughts/Philosophy-and-Nietzsche)

The common misunderstanding of power is that it is the object of the will. Instead, Deleuze posits power as the subject of the will, such that the [Will to Power](https://aarnphm.xyz/thoughts/Giles-Deleuze/../../thoughts/Will-to-Power) is not a [desire](https://aarnphm.xyz/thoughts/Giles-Deleuze/../../thoughts/desire) for domination, but an expressive force that creates values.

Nietzsche's genealogical work on [morals](https://aarnphm.xyz/thoughts/Giles-Deleuze/../../thoughts/Philosophy-and-Nietzsche#on-genealogy-of-morals) makes nihilism the presupposition of all metaphysics rather than a particular metaphysics, which allows nihilism to be overcome via the active negation of reactive forces.

Deleuze rejects the traditional metaphysical view of being as stable and singular, instead proposing an ontology of difference where being is understood as a dynamic process of becoming. [^1] This process is characterized by the constant creation of new relations and entities, without any predetermined goal or final state. In this framework, the will to power is seen as the differential and generative force that drives the process of becoming, constantly creating new values and ways of being.

Deleuze interprets Nietzsche's "eternal return" as an affirmation of becoming.

The analogy of a dice throw[^2]: when we throw the dice, the outcome combines chance (the randomness of the throw) and necessity (the resulting combination that follows the throw). Deleuze infers that necessity is not something separate from chance but is affirmed through chance: necessity (the outcome of the dice throw) is realized through **the act** of throwing the dice.

Nietzsche turns chance into an affirmation, identifying it with multiplicity, fragments, parts, and [chaos](https://aarnphm.xyz/thoughts/Giles-Deleuze/../../thoughts/Chaos). The dice throw affirms becoming, and the combination it forms upon falling is the affirmation of necessity.

## active and reactive forces.
See also: [action theory](https://aarnphm.xyz/thoughts/Giles-Deleuze/../../thoughts/action-theory)

## Capitalism and Schizophrenia

[^1]: See [this note](https://faculty.fordham.edu/tampio/Tampio%20-%20Multiplicity.pdf)

[^2]: [chances](https://piratesandrevolutionaries.blogspot.com/2009/05/dicethrow-11-in-deleuze-nietzsche.html?m=1): "Nietzsche identifies chance with the multiple, with fragments, with limbs, with chaos: the chaos of the dice that are shaken and thrown."

---
slug: thoughts/Group-theory
tags:
  - math
description: reconstructed source of "https://aarnphm.xyz/thoughts/Group-theory"
title: group theory
date: 2024-02-26
---

## Graph isomorphism

---
slug: thoughts/HTTP
tags:
  - technical
description: reconstructed source of "https://aarnphm.xyz/thoughts/HTTP"
title: HTTP
date: 2024-02-08
---

---
slug: thoughts/Hegel
tags:
  - philosophy
description: reconstructed source of "https://aarnphm.xyz/thoughts/Hegel"
title: Hegel
date: 2024-02-07
---

## Phenomenology of Spirit

---
slug: thoughts/Hidden-Markov-model
tags:
  - seed
  - ml
description: reconstructed source of "https://aarnphm.xyz/thoughts/Hidden-Markov-model"
title: Hidden Markov model
date: 2024-10-02
---

See also [wikipedia](https://en.wikipedia.org/wiki/Hidden_Markov_model)

A Markov model where observations are dependent on a latent [_Markov process_](https://en.wikipedia.org/wiki/Markov_chain) $X$

> an HMM has an additional requirement that the outcome of $Y$ at time $t = t_0$ must be "influenced" exclusively by the outcome of $X$ at $t = t_0$ and that the outcomes of $X$ and $Y$ at $t < t_0$ must not affect the outcome of $Y$ at $t = t_0$

---
slug: thoughts/Hypertext
description: reconstructed source of "https://aarnphm.xyz/thoughts/Hypertext"
title: Hypertext
---

It is _non-sequential_ writing: text that branches and enables choices for readers, without the need to follow a predetermined path.

Hypertext can also be interpreted as a [database](https://aarnphm.xyz/thoughts/Hypertext/../../thoughts/Database) format in which information related to that on a display can be accessed directly from the display

![](https://aarnphm.xyz/thoughts/Hypertext/../../thoughts/images/hypertext.webp)

Ted Nelson also brought up the concept of transclusion, which includes parts of documents within other documents by reference. He envisioned a utopia, a global hypertext system (Xanadu) where all data is stored once, nothing is deleted, all information can be accessed through links [^1], and everyone would be paid fairly for their work.

## fiction

See also: [url](http://fictionaut.com/blog/2010/02/12/checking-in-with-hypertext-fiction/)

non-linear space that uses hypertext to explore narrative possibilities

[^1]: [Interview with Ted Nelson](https://ics.uci.edu/~ejw/csr/nelson_pg.html)

---
slug: thoughts/IPFS
tags:
  - seed
  - technical
description: reconstructed source of "https://aarnphm.xyz/thoughts/IPFS"
title: IPFS
date: 2024-02-08
---

IPFS is a decentralized storage and delivery network built on top of [p2p](https://aarnphm.xyz/thoughts/IPFS/../../thoughts/p2p) networking and content-based addressing (CID).

> Can be seen in [git](https://aarnphm.xyz/thoughts/IPFS/../../thoughts/git) repositories, BitTorrent, and most recently Ethereum.

Similar to how we can reference a URI, we can look up content by its [content-address](https://aarnphm.xyz/thoughts/IPFS/../../thoughts/Content-addressable-storage)

How would we use IPFS to share and publish data?
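A minimal sketch of the content-addressing idea in Python; this is not the real CID scheme (which wraps digests in multihash/multibase encodings), and `publish`, `fetch`, and the in-memory `store` are hypothetical stand-ins for the peer network:

```python
import hashlib

# toy in-memory stand-in for the distributed block store
store: dict[str, bytes] = {}

def content_address(data: bytes) -> str:
    # the address is derived from the bytes themselves, so identical
    # content always maps to the same identifier on every peer
    return hashlib.sha256(data).hexdigest()

def publish(data: bytes) -> str:
    cid = content_address(data)
    store[cid] = data
    return cid

def fetch(cid: str) -> bytes:
    return store[cid]

cid = publish(b"hello ipfs")
assert fetch(cid) == b"hello ipfs"
```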
---
slug: thoughts/In-memory-representation
tags:
  - technical
  - seed
description: reconstructed source of "https://aarnphm.xyz/thoughts/In-memory-representation"
title: In memory representation
date: 2022-10-01
---

## flatbuffer

_difference_ from protobuf: no unpacking/parsing step

[Benchmark](https://google.github.io/flatbuffers/flatbuffers_benchmarks.html)

zero-copy memory access, at the cost of a slightly larger wire format

## protobuf

---
slug: thoughts/Information-Theory
tags:
  - seed
description: reconstructed source of "https://aarnphm.xyz/thoughts/Information-Theory"
title: Information Theory
date: 2024-01-20
---

See also [pdf](https://fleuret.org/public/EN_essays/fleuret-inf-theory-2024.pdf)

> Less horror. Probably full of typo.\
> \
> Source tex there: [pic.twitter.com/9e4FdQol3b](https://t.co/9e4FdQol3b)
>
> — François Fleuret (@francoisfleuret) [January 18, 2024](https://twitter.com/francoisfleuret/status/1748011011590799462)

## hierarchy

related to [design](https://aarnphm.xyz/thoughts/Information-Theory/../../thoughts/design)

---
slug: thoughts/Intelligence-amplification
tags:
  - seed
  - ml
description: reconstructed source of "https://aarnphm.xyz/thoughts/Intelligence-amplification"
title: Intelligence amplification
date: 2024-01-07
---

> I'm playing around with calling our tech, as it is today, IA (intelligence amplification) instead of AI. IA have the vibe of tools for thought, needing human interaction, and resemble a lot more what we actually have today. AI feels more like independent long-running agents.
>
> — Andrej Karpathy (@karpathy) [January 7, 2024](https://twitter.com/karpathy/status/1744062845426532473)

Intelligence should be thought of as a tool for thought, not an independent agent.

These systems should be built on top of human intelligence, not replace it. Next-token prediction is too primitive a basis to call a system intelligent.

Can a [transformer](https://aarnphm.xyz/thoughts/Intelligence-amplification/../../thoughts/Transformers) ever be [Turing-complete](https://aarnphm.xyz/thoughts/Intelligence-amplification/../../thoughts/Turing-complete-Transformers)?

## research area.

A lot of alpha in mechanistic analysis of the [representations](https://aarnphm.xyz/thoughts/Intelligence-amplification/../../thoughts/representations) these models exhibit, or "virtual brain analysis".

---
slug: thoughts/Jax
tags:
  - seed
  - ml
description: reconstructed source of "https://aarnphm.xyz/thoughts/Jax"
title: Jax
date: 2022-11-07
---

NumPy + [Autograd](https://aarnphm.xyz/thoughts/Jax/../../thoughts/Autograd). Uses [XLA](https://aarnphm.xyz/thoughts/Jax/../../thoughts/XLA) to compile and run NumPy code on accelerators.
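A minimal sketch of the "NumPy + Autograd" workflow (`loss` is an arbitrary example function, not from any particular codebase):

```python
import jax.numpy as jnp
from jax import grad

def loss(w):
    # ordinary NumPy-style code; jax traces and differentiates it
    return jnp.sum(w ** 2)

w = jnp.arange(3.0)
print(loss(w))        # 5.0
print(grad(loss)(w))  # [0. 2. 4.]
```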
Asynchronous dispatch; for synchronous execution use `block_until_ready()`

```python
import jax.numpy as jnp
from jax import random

key = random.PRNGKey(0)
x = random.normal(key, (10,))
jnp.dot(x, x.T).block_until_ready()
```

- notable functions:
  - `jit()` for compilation of multiple computations
  - `grad()` for transformations (autodiff, Jacobian-vector products)
  - `vmap()` for auto-vectorisation

> Arrays are **immutable** in Jax

- Treat functions as pure so they can be compiled with [XLA](https://aarnphm.xyz/thoughts/Jax/../../thoughts/XLA)

```python title="entropix/dslider.py"
from typing import Tuple

import jax
import jax.numpy as jnp

EPS = 1e-8  # small constant to avoid log(0); defined elsewhere in the original source

@jax.jit
def kl_divergence(logp: jnp.ndarray, logq: jnp.ndarray) -> jnp.ndarray:
  """Compute KL divergence between two log probability distributions."""
  p = jnp.exp(logp)
  return jnp.sum(jnp.where(p > 0, p * (logp - logq), 0.0), axis=-1)

@jax.jit
def ent_varent(logp: jnp.ndarray) -> Tuple[jnp.ndarray, jnp.ndarray]:
  """Compute entropy and varentropy from log probabilities."""
  p = jnp.exp(logp)
  ent = -jnp.sum(p * logp, axis=-1)
  diff = logp + ent[..., None]
  varent = jnp.sum(p * diff**2, axis=-1)
  return ent, varent

@jax.jit
def normalize_logits(logits: jnp.ndarray, noise_floor: float) -> jnp.ndarray:
  """Normalize logits to log probabilities with noise floor truncation."""
  shifted = logits - jnp.max(logits, axis=-1, keepdims=True)
  normalized = shifted - jax.nn.logsumexp(shifted + EPS, axis=-1, keepdims=True)
  # noise floor calculated for bfloat16
  return jnp.where(normalized < noise_floor, jnp.log(EPS), normalized)
```

_references: [github](https://github.com/xjdr-alt/entropix/blob/main/entropix/dslider.py)_

## control flow

see also [link](https://jax.readthedocs.io/en/latest/notebooks/Common_Gotchas_in_JAX.html#python-control-flow-jit)

The following works:

```python
@jax.jit
def f(x):
  for i in range(3):
    x = 2 * x
  return x

print(f(3))

@jax.jit
def g(x):
  y = 0.
  for i in range(x.shape[0]):
    y = y + x[i]
  return y

print(g(jnp.array([1., 2., 3.])))
```

> [!warning]- doesn't work
>
> ```python {2,4,6}
> @jax.jit
> def fail(x):
>   if x < 3: return 3. * x ** 2
>   else: return -4 * x
>
> fail(2)
> ```

Reasoning: `jit` traces code on the `ShapedArray` abstraction, where each abstract value represents the set of all array values with a fixed shape and dtype

> [!tip]+ type coercion tradeoff
>
> If we trace a Python function on a `ShapedArray((), jnp.float32)` that isn't committed to a specific concrete value, then when we hit a line like `if x < 3`, the expression `x < 3` evaluates to an abstract `ShapedArray((), jnp.bool_)` that represents the set `{True, False}`.

Fix: you can use `static_argnums` to specify which arguments should be treated as static

```python
from functools import partial

@partial(jax.jit, static_argnums=(0,))
def f(x):
  if x < 3:
    return 3. * x ** 2
  else:
    return -4 * x
```

## buffers

> [!question] How does JAX handle memory buffers?
[fast replay buffers](https://github.com/instadeepai/flashbax)

---
slug: thoughts/KV-compression
tags:
  - ml
description: reconstructed source of "https://aarnphm.xyz/thoughts/KV-compression"
title: KV compression
date: 2024-10-10
---

see also: [github](https://github.com/October2001/Awesome-KV-Cache-Compression)

TLDR: Most algorithms determine importance by aggregating attention over observed queries ([Liu et al., 2023](#bib-liu2023scissorhandsexploitingpersistenceimportance); [Zhang et al., 2023](#bib-zhang2023h2oheavyhitteroracleefficient))

More recent work aggregates attention from _limited observation windows_ ([Cai et al., 2024](#bib-cai2024pyramidkvdynamickvcache); [Li et al., 2024](#bib-li2024snapkvllmknowslooking)), using top\_k to find the $k$ indices of attention per head to preserve, and evicting the not-so-important ones.

## idea.

Look at past attention weights for each pair of key and value vectors (a measure of the degree to which that KV's representation has been queried during past attention operations)

Then select the KV with the least attention to evict

Think of the LFU (least-frequently-used) cache management policy

the KV cache for each sequence in a particular layer is allocated on the GPU as a _#attention heads $\times$ sequence length_ tensor.

> [!tip] Important
>
> total memory allocation scales with the _maximum_ sequence length for all attention heads of the KV cache

## Adaptive KV-cache compression

See also [paper](https://arxiv.org/abs/2310.01801) ([Ge et al., 2024](#bib-ge2024modeltellsdiscardadaptive))

## Streaming LLM

_Using attention sinks_

see also [paper](https://arxiv.org/abs/2309.17453) ([Xiao et al., 2024](#bib-xiao2024efficientstreaminglanguagemodels))

Ablate attention among layers that are deemed less valuable to the current generation.

## Pyramid-KV

See also [paper](https://arxiv.org/abs/2406.02069) ([Cai et al., 2024](#bib-cai2024pyramidkvdynamickvcache))

![](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/images/pyramid-kv.webp)

## Snap-KV

See also [paper](https://arxiv.org/abs/2404.14469), [github](https://github.com/FasterDecoding/SnapKV) ([Li et al., 2024](#bib-li2024snapkvllmknowslooking))

Voting: calculate attention weights for each query within the observation window across all attention heads, then aggregate to highlight prefix positions. Formally, for a single batch:

$$
\begin{aligned}
C &= \sum_{i=0}^{L_{\text{obs}}} W_{\text{obs}}[:, i, :] \\
I &= \text{Top}_{k}(C, k)
\end{aligned}
$$

_[hijack for llama\_hijack\_4\_37.py](https://github.com/FasterDecoding/SnapKV/blob/82135ce2cc60f212a9ba918467f3d9c8134e163f/snapkv/monkeypatch/llama_hijack_4_37.py#L19)_

> [!tip] Important
>
> $k$ is defined as $\lfloor p \times L_{\text{prefix}} \rfloor$, where $p$ is the compression rate.

Hit rate: the attention features above a predefined threshold $\Theta$ are considered important features.

The idea is to have two stages:

- **Vote for important features**: select important positions based on aggregated attention within the fixed observation window.
- **Update and store the compressed KV**: concatenate attention features within the window and update the KV-cache.
- clustering via pooling ⇒ keep frequently hit attention features together

```python
# pool1d is SnapKV's configured pooling op (e.g. torch.nn.functional.avg_pool1d
# or max_pool1d); pooling clusters neighbouring positions so frequently
# queried regions are retained as contiguous chunks
attn_cache = pool1d(attn_weights_sum, kernel_size=kernel_size, padding=kernel_size//2, stride=1)
```

## Ada-KV

idea: instead of uniform eviction across the KV cache, allocate a budget $B_i$ per attention head and evict dynamically per head

_built on top of PyramidKV and SnapKV_

![](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/images/vllm/ada-kv.webp)

> [!note] Note
>
> With Ada-SnapKV, each attention layer is still assigned a fixed compression rate (refer to the image example)

See also [paper](https://arxiv.org/abs/2407.11550) ([Feng et al., 2024](#bib-feng2024adakvoptimizingkvcache))

## KIVI

link: [github](https://github.com/jy-yuan/KIVI)

---

## KV-Compress

_variable compression rates per attention head_

source: [github](https://github.com/IsaacRe/vllm-kvcompress)

Reuses the eviction metric from the [idea above](#idea): aggregate past attention per KV pair and evict the least-queried entries, LFU-style.

[Link to original](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/vllm/../../thoughts/KV-compression#idea)

> [!notes] Notes
>
> A variation of [Ada-SnapKV](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/vllm/../../thoughts/KV-compression#ada-kv)

idea:

- _group-query-compression_: compress the KV-cache of GQA without repeating it into the dimension of $\sum$ query heads.
- Modified PagedAttention that computes _against_ the KV-cache (which contains variable numbers of KVs per head)

![](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/vllm/../../thoughts/images/vllm/kv-compress-vllm.webp)

> For vLLM, each cache block stores KV for every attention head of every layer
>
> For KV-Compress, each block only holds KVs for a single head. Block tables are expanded by $l \times H$ so that a unique block for each specific KV head and layer can be retrieved

### Query-Group Compression (QGC)

KV compression algorithms weren't designed with GQA in mind.

- [Pyramid-KV](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/vllm/../../thoughts/KV-compression#pyramid-kv) caches and compresses KV _after_ repetition for alignment with query tensors
- This introduces redundancy in the cache before compression

> modification of eviction-based methods per group

### Block layout and allocation

idea: adapt PagedAttention to page out cache on a per-head, per-layer, as well as per-sequence basis

![](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/vllm/../../thoughts/images/vllm/paged-attention-block-kv-compress.webp)

> [!note]- explanation
>
> A simplified example with two KV heads and a block size of two:
>
> - KV metrics are visualized for a given cache state, highlighting blocks of a particular sequence in the decoding batch that is scheduled to evict two blocks.
> - Logical indices are displayed under the corresponding metrics slot.
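A toy sketch of the LFU-style eviction metric described in the idea section, assuming we already track the attention each cached KV has received (`evict_lowest` and the tensor shapes are illustrative, not taken from any of the repos above):

```python
import torch

def evict_lowest(attn_weights: torch.Tensor, n_evict: int) -> torch.Tensor:
    """attn_weights: [n_heads, n_queries, kv_len] past attention probabilities.
    Returns, per head, the indices of the KV entries to keep."""
    metric = attn_weights.sum(dim=1)                      # total attention per KV position
    kv_len = metric.size(-1)
    keep = metric.topk(kv_len - n_evict, dim=-1).indices  # least-queried entries are dropped
    return keep.sort(dim=-1).values                       # restore positional order

w = torch.rand(2, 8, 16)                 # 2 heads, 8 past queries, 16 cached KVs
print(evict_lowest(w, n_evict=4).shape)  # torch.Size([2, 12])
```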
#### Evict from Paged KV cache

> need to evict whole KV blocks instead of single KV entries

[Link to original](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/vllm#kv-compress)

---

## References

- Cai, Z., Zhang, Y., Gao, B., Liu, Y., Liu, T., Lu, K., Xiong, W., Dong, Y., Chang, B., Hu, J., & Xiao, W. (2024). _PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling_. arXiv preprint arXiv:2406.02069 [arxiv](https://arxiv.org/abs/2406.02069)
- Feng, Y., Lv, J., Cao, Y., Xie, X., & Zhou, S. K. (2024). _Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference_. arXiv preprint arXiv:2407.11550 [arxiv](https://arxiv.org/abs/2407.11550)
- Ge, S., Zhang, Y., Liu, L., Zhang, M., Han, J., & Gao, J. (2024). _Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs_. arXiv preprint arXiv:2310.01801 [arxiv](https://arxiv.org/abs/2310.01801)
- Li, Y., Huang, Y., Yang, B., Venkitesh, B., Locatelli, A., Ye, H., Cai, T., Lewis, P., & Chen, D. (2024). _SnapKV: LLM Knows What You are Looking for Before Generation_. arXiv preprint arXiv:2404.14469 [arxiv](https://arxiv.org/abs/2404.14469)
- Liu, Z., Desai, A., Liao, F., Wang, W., Xie, V., Xu, Z., Kyrillidis, A., & Shrivastava, A. (2023). _Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time_. arXiv preprint arXiv:2305.17118 [arxiv](https://arxiv.org/abs/2305.17118)
- Xiao, G., Tian, Y., Chen, B., Han, S., & Lewis, M. (2024). _Efficient Streaming Language Models with Attention Sinks_. arXiv preprint arXiv:2309.17453 [arxiv](https://arxiv.org/abs/2309.17453)
- Zhang, Z., Sheng, Y., Zhou, T., Chen, T., Zheng, L., Cai, R., Song, Z., Tian, Y., Ré, C., Barrett, C., Wang, Z., & Chen, B. (2023). _H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models_. arXiv preprint arXiv:2306.14048 [arxiv](https://arxiv.org/abs/2306.14048)

---
slug: thoughts/LLMs
tags:
  - sapling
  - ml
  - llm
description: reconstructed source of "https://aarnphm.xyz/thoughts/LLMs"
title: LLMs
date: 2024-02-07
---

[large language](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Machine-learning) models, often implemented as [autoregressive](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Autoregressive-models) [transformers](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Transformers) models.

> [!note] GPTs and friends
>
> Most variants of LLMs are decoder-only

They have "capabilities" to understand [natural language](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/NLP) and exhibit [emergent behaviour](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/emergent-behaviour) of [intelligence](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/intelligence), but are probably not [AGI](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/AGI) due to the [observer-expectancy effect](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/observer-expectancy-effect).

One way or another it is a form of [behaviourism](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Behavirourism), through [reinforcement learning](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Machine-learning): the model is "told" what is good or bad, and thus acts accordingly towards its users. However, this induces [confirmation bias](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/confirmation-bias), where one reads his/her own prejudices into the problem.
### Scalability

Incredibly hard to scale, mainly due to their [large](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/large-models) memory footprint and token memory allocation.

### Optimization

See also: [this talk](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/images/htn-openllm.pdf)

- [Quantization](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/quantization): reduce the computational and memory costs of inference by representing the weights and activations with low-precision data types
- [Continuous batching](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Continuous-batching): implementing [Paged Attention](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Attention#paged-attention) with a custom scheduler to manage swapping the kv-cache for better resource utilisation

### on how we are being [taught](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/education#teaching).

How would we assess thinking?

Like the calculator, it _simplifies_ and increases accessibility for the masses, but in doing so _loses_ the value in the _act of doing_ math. We do math to internalize concepts and to practice thinking coherently. Similarly, we [write](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/writing) to crystallise our ideas, and in the process improve through the act of putting them down. Rephrasing and arranging sentences poses a challenge for the writer, and in doing so teaches you how to think coherently.

Writing essays is an exercise for students to articulate their thoughts, rather than a test of their understanding of the material.

### on [ethics](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/ethics)

See also [Alignment](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Alignment).

There are ethical concerns with the act of "hallucinating" content, therefore alignment research is crucial to ensure that the model does not produce harmful content.

### as philosophical tool.

By creating better [representations](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/representations) of the world for both humans and machines to understand, we can truly have assistive tools that enhance our understanding of the world around us

### AI generated content

Don't shit where you eat: **[Garbage in, garbage out](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Garbage-in-Garbage-out)**. The quality of the content is highly dependent on the quality of the data the model was trained on; models are incredibly sensitive to [data](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/data) variances and biases.

Bland doublespeak

See also: [All the better to see you with](https://www.kernelmag.io/2/all-the-better-to-see-you)

> Here's a real problem though. Most people find writing hard and will get AIs to do it for them whenever they can get away with it. Which means bland doublespeak will become the default style of writing. Ugh.
>
> — Paul Graham (@paulg) [February 25, 2024](https://twitter.com/paulg/status/1761801995302662175)

### machine-assisted writings

_source: [`gwern[dot]net`](https://gwern.net/gpt-3)_

Idea: use [sparse autoencoders](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/mechanistic-interpretability#sparse-autoencoders) to guide idea generation

### Good-enough

> "How did we get AI art before self-driving cars?" IMHO this is the single best heuristic for predicting the speed at which certain AI advances will happen.
> [pic.twitter.com/yAo6pwEsxD](https://t.co/yAo6pwEsxD)
>
> — Joshua Achiam (@jachiam0) [December 1, 2022](https://twitter.com/jachiam0/status/1598448668537155586)

This only works when you need a "good-enough" item, where the value outweighs the process. One should still consider putting in the work rather than settling for good enough. In working through a problem, one learns about bottlenecks and the problems to be solved, gaining invaluable experience that would not be achieved by relying on interaction with the models alone.

### as [search](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Search)

These models are incredibly useful for summarization and information gathering. With the [taxonomy](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/taxonomy) of [RAG](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/RAG) or other CoT tooling, you can augment them and improve search efficiency by quite a lot.

notable mentions:

- [perplexity.ai](https://perplexity.ai/): [RAG](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/RAG)-first search engine
- [explorer.globe.engineer](https://explorer.globe.engineer/): tree-based [information retrieval](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/information-retrieval)
- [Exa labs](https://twitter.com/ExaAiLabs)
- [You.com](https://you.com/?chatMode=default)

### Programming

Overall should be a net positive, but it's a double-edged sword.

#### as end-users

[Source](https://www.geoffreylitt.com/2023/03/25/llm-end-user-programming.html)

> I think it's likely that soon all computer users will have the ability to develop small software tools from scratch, and to describe modifications they'd like made to software they're already using

#### as developers

Tools that lower the barrier of entry are always a good thing, but they will likely lead to even higher discrepancies in software quality.

Increased productivity, but also increased technical debt: generated code is mostly "bad" code, and we often have to nudge it along with a lot of **[prompt engineering](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/prompt-engineering)**.

---
slug: thoughts/Language
tags:
  - seed
description: reconstructed source of "https://aarnphm.xyz/thoughts/Language"
title: Language
date: 2024-01-08
---

> Language as a public tool to understand the private life.

important to our self-knowledge ⇒ emphasised through reading books.

## communication.

Notably through the work of "Philosophical Investigations" by Ludwig Wittgenstein

- The concept of the "language-game"
- The idea that each of us constructs, through language, a picture through which we see the world.
- Conflict arises when pictures are not aligned, often leading to context collapse.

Possibly the most salient feature of [LLMs](https://aarnphm.xyz/thoughts/Language/../../thoughts/LLMs) is that the system is surprisingly patient in each interaction with humans.

## [representations](https://aarnphm.xyz/thoughts/Language/../../thoughts/representations).

[Language models](https://aarnphm.xyz/thoughts/Language/../../thoughts/LLMs) are a representation of our knowledge. Techniques such as [deep learning](https://aarnphm.xyz/thoughts/Language/../../thoughts/deep-learning) have risen to prominence due to their ability to learn from data, and in doing so they can represent the world in a way closer to how we perceive it.
---
slug: thoughts/Lighting
tags:
  - seed
  - film
description: reconstructed source of "https://aarnphm.xyz/thoughts/Lighting"
title: Lighting
date: 2023-11-11
---

### Key light

- Book light
  - key source ⇒ bounced towards a diffuser
- Spot light ⇒ soft and dim contrast to the shot

---
slug: thoughts/Low-rank-adapters
tags:
  - ml
description: reconstructed source of "https://aarnphm.xyz/thoughts/Low-rank-adapters"
title: Low rank adapters
date: 2024-02-08
---

---
slug: thoughts/Lyapunov-time
tags:
  - seed
  - math
description: reconstructed source of "https://aarnphm.xyz/thoughts/Lyapunov-time"
title: Lyapunov time
date: 2024-02-25
---

---
slug: thoughts/Machine-learning
tags:
  - ml
  - sapling
description: reconstructed source of "https://aarnphm.xyz/thoughts/Machine-learning"
title: Machine learning
date: 2024-02-07
---

Detects patterns within data and uses them to make useful predictions.

Generally [DL](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/deep-learning) $\subset$ ML $\subset$ AI

Some main explorations:

- [Transformers](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/Transformers)
- CNN
- [Optimization](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/optimization)
  - Gradient descent
  - hyperparameter tuning
- Recommender systems
- Reinforcement learning
  - Q-learning
  - Policy Gradient
  - [Monte-Carlo](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/Monte-Carlo) Tree Search
- Generative Models
  - GAN
  - VAE
  - Autoencoder
- Supervised Q-learning
- [Low-rank adapters](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/Low-rank-adapters)

Fields

- [mechanistic interpretability](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/mechanistic-interpretability)

Related:

- [linear algebra](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm#linear-algebra-review).
- [autograd](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/Automatic-Differentiation)
- [supervised machine learning](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm).

---
slug: thoughts/Merkle-DAG
tags:
  - seed
  - technical
description: reconstructed source of "https://aarnphm.xyz/thoughts/Merkle-DAG"
title: Merkle DAG
date: 2024-02-08
---

It is a directed acyclic [graph](https://aarnphm.xyz/thoughts/Merkle-DAG/../../thoughts/university/twenty-three-twenty-four/sfwr-2c03/Graphs) where each node is a version of the content and edges represent changes (diffs)

Each node has an identifier which is the result of hashing the content. Merkle DAG nodes are _immutable_ and _[content-addressable](https://aarnphm.xyz/thoughts/Merkle-DAG/../../thoughts/Content-addressable-storage)_: any change to a node alters its identifier and thus affects all of its ancestors, creating a different DAG.
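A minimal sketch of why edits propagate upward, assuming nodes are hashed together with their children's identifiers (`node_id` is illustrative, not git's or IPFS's actual encoding):

```python
import hashlib
import json

def node_id(content: str, children: list[str]) -> str:
    # a node's identifier covers its content plus its children's identifiers,
    # so changing any descendant changes every ancestor's identifier
    payload = json.dumps({"content": content, "children": sorted(children)})
    return hashlib.sha256(payload.encode()).hexdigest()

leaf = node_id("hello", [])
root = node_id("dir", [leaf])
# editing the leaf yields a new leaf id, hence a new root id (a different DAG):
assert node_id("dir", [node_id("hello!", [])]) != root
```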
Examples of the DAG in action:

- [IPFS](https://aarnphm.xyz/thoughts/Merkle-DAG/../../thoughts/IPFS)
- [Containers](https://aarnphm.xyz/thoughts/Merkle-DAG/../../thoughts/Containers)
- [git](https://aarnphm.xyz/thoughts/Merkle-DAG/../../thoughts/git)

---
slug: thoughts/Metaphysics
tags:
  - philosophy
description: reconstructed source of "https://aarnphm.xyz/thoughts/Metaphysics"
title: Metaphysics
date: 2024-02-09
---

See also: [The Evolution of Modern Metaphysics](https://aarnphm.xyz/thoughts/Metaphysics/../../books#tagsphilosophy-philosophy)

A gentle introduction from [Aristotle](https://aarnphm.xyz/thoughts/Metaphysics/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Aristotle), with [Being qua being](https://aarnphm.xyz/thoughts/Metaphysics/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Being-qua-being)

---
slug: thoughts/Misra-Gries-heavy-hitters-algorithm
tags:
  - algorithm
description: extends Boyer-Moore finding algorithm
title: Misra-Gries heavy-hitters algorithm
date: 2024-10-11
---

one of the earliest [data](https://aarnphm.xyz/thoughts/Misra-Gries-heavy-hitters-algorithm/../../thoughts/data) streaming algorithms.

## problem.

> Given a bag $b$ of $n$ elements and an integer $k \geq 2$, find the values that occur more than $n/k$ times in $b$

idea: two passes over the values in $b$, while storing at most $k$ values from $b$ and their number of occurrences.

Assume the bag is available as an array $b[0:n-1]$ of $n$ elements. A _heavy hitter_ of bag $b$ is a value that occurs more than $n/k$ times in $b$ for some integer $k \geq 2$.

## pseudocode.

Here $t$ is a bag (multiset): re-adding a value already in $t$ records another occurrence of it, while $d$ counts the distinct values in $t$.

```pseudo
\begin{algorithm}
\caption{Misra--Gries}
\begin{algorithmic}
\State $t \gets \{\}$
\State $d \gets 0$
\For{$i \gets 0$ to $n-1$}
\If{$b[i] \notin t$}
\State $t \gets t \cup \{b[i]\}$
\State $d \gets d + 1$
\Else
\State $t \gets t \cup \{b[i]\}$
\EndIf
\If{$d = k$}
\State Delete $k$ distinct values from $t$
\State Update $d$
\EndIf
\EndFor
\end{algorithmic}
\end{algorithm}
```

---
slug: thoughts/Monte-Carlo
tags:
  - seed
description: reconstructed source of "https://aarnphm.xyz/thoughts/Monte-Carlo"
title: Monte-Carlo methods
date: 2024-04-12
---

## tree search.

a [search](https://aarnphm.xyz/thoughts/Monte-Carlo/../../thoughts/Search) algorithm based on random sampling of the search space.

- Selection: from root $R$, select successive child nodes until a leaf $L$ is reached.
  - The root is the current game state and a leaf is any node from which no simulation has yet been initiated.
- Expansion: unless $L$ ends the game decisively for either player, create one (or more) child nodes and choose node $C$ from one of them.
- Simulation: complete **one** random playout from node $C$.
- Backpropagation: use the result of the playout to update information in the nodes on the path from $C$ to $R$.
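The selection step is commonly implemented with UCT (UCB1 applied to trees), balancing win rate against under-explored children; a minimal sketch, where the `(wins, visits)` representation is an assumption for illustration:

```python
import math

def uct_select(children: list[tuple[float, int]], c: float = math.sqrt(2)) -> int:
    """Pick the child index maximising win rate (exploitation) plus an
    exploration bonus that grows for rarely-visited children."""
    total = sum(visits for _, visits in children)

    def score(wins: float, visits: int) -> float:
        if visits == 0:
            return float("inf")  # always try unvisited children first
        return wins / visits + c * math.sqrt(math.log(total) / visits)

    return max(range(len(children)), key=lambda i: score(*children[i]))

print(uct_select([(6, 10), (3, 4), (0, 0)]))  # 2: the unvisited child wins
```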
## simulations

---
slug: thoughts/NLP
tags:
  - seed
  - ml
description: reconstructed source of "https://aarnphm.xyz/thoughts/NLP"
title: NLP
date: 2024-02-07
---

See also: [LLMs](https://aarnphm.xyz/thoughts/NLP/../../thoughts/LLMs)

### CoT prompting

arxiv: [2201.11903](https://arxiv.org/abs/2201.11903)

---
slug: thoughts/Nagle-and-TCP-Cork
tags:
  - seed
  - networking
description: reconstructed source of "https://aarnphm.xyz/thoughts/Nagle-and-TCP-Cork"
title: Nagle's algorithm and TCP_CORK
date: 2022-07-01
---

### Nagle's algorithm and Delayed ACK

- _small packets_ → wasteful for TCP
  → Nagle's algorithm: maximize the ratio of data content to packet overhead
  → Delayed ACK: avoids the `silly window` syndrome

```prolog
if available_data & window_size > MSS
    send payload on wire
else if unconfirmed_data
    queue
else
    send
```

### Cork algorithm

---
slug: thoughts/Nesterov-momentum
tags:
  - ml
  - optimization
description: reconstructed source of "https://aarnphm.xyz/thoughts/Nesterov-momentum"
title: Nesterov momentum
date: 2024-11-11
---

See also [paper](http://www.cs.toronto.edu/%7Ehinton/absps/momentum.pdf)

idea:

- first take a step in the direction of the accumulated momentum,
- compute the gradient at this "lookahead" position,
- make the update using this gradient.

> [!abstract] definition
>
> For a parameter vector $\theta$, the update can be expressed as
>
> $$
> \begin{aligned}
> v_t &= \mu v_{t-1} + \nabla L(\theta_t + \mu v_{t-1}) \\
> \theta_{t+1} &= \theta_t - \alpha v_t
> \end{aligned}
> $$

Achieves better convergence rates:

| function type            | gradient descent                   | Nesterov AG                               |
| ------------------------ | ---------------------------------- | ----------------------------------------- |
| Smooth                   | $\Theta(\frac{1}{T})$              | $\Theta(\frac{1}{T^{2}})$                 |
| Smooth & Strongly Convex | $\Theta(\exp(-\frac{T}{\kappa}))$  | $\Theta(\exp(-\frac{T}{\sqrt{\kappa}}))$  |

---
slug: thoughts/Networked-Thoughts
tags:
  - seed
  - pattern
description: reconstructed source of "https://aarnphm.xyz/thoughts/Networked-Thoughts"
title: Networked Thoughts
date: 2024-02-09
---

---
slug: thoughts/OCI
tags:
  - seed
  - container
description: reconstructed source of "https://aarnphm.xyz/thoughts/OCI"
title: OCI Format
date: 2023-08-10
---

A standard for packaging and running containerized applications.

[Specification](https://github.com/opencontainers/image-spec):

### Layout

Directory structure for [location-addressable](https://aarnphm.xyz/thoughts/OCI/../../thoughts/Content-addressable-storage) blobs

---
slug: thoughts/Orwellian
tags:
  - seed
  - philosophy
description: reconstructed source of "https://aarnphm.xyz/thoughts/Orwellian"
title: Orwellian
date: 2024-10-02
---

Describes a situation, idea, or societal condition that George Orwell identified as destructive to the welfare of a free and open society.

---
slug: thoughts/Overton-Window
tags:
  - seed
description: reconstructed source of "https://aarnphm.xyz/thoughts/Overton-Window"
title: Overton Window
date: 2024-03-05
---

_also known as the window of discourse_

> The window framing the range of ideas that people are prepared to entertain. Ideas outside the window are not seriously considered.

Most prominent in policy-making, but it also applies to the perception of ideas in general. Moving the window requires people and ideas outside of it to shift what the public considers "generally" acceptable.
---
slug: thoughts/PJRT
tags:
  - ml
description: reconstructed source of "https://aarnphm.xyz/thoughts/PJRT"
title: PJRT
date: 2024-03-04
---

Blog [post](https://opensource.googleblog.com/2023/05/pjrt-simplifying-ml-hardware-and-framework-integration.html) and [source](https://github.com/openxla/xla/tree/main/xla/pjrt)

A lower layer of the stack for framework-hardware communication. Acts as an abstraction for transpiling to different hardware targets: [TPU](https://aarnphm.xyz/thoughts/PJRT/../../thoughts/TPU), [GPU](https://aarnphm.xyz/thoughts/PJRT/../../thoughts/GPU-programming)

---
slug: thoughts/PageRank
tags:
  - seed
  - algorithm
description: reconstructed source of "https://aarnphm.xyz/thoughts/PageRank"
title: PageRank
date: 2024-09-04
---

---
slug: thoughts/Pavlovian-scale
tags:
  - seed
description: reconstructed source of "https://aarnphm.xyz/thoughts/Pavlovian-scale"
title: Pavlovian scale
date: 2024-09-25
---

Also known as classical conditioning

> a biologically potent stimulus is paired with a neutral stimulus

---
slug: thoughts/Philosophy-and-Kant
tags:
  - philosophy
  - seed
description: reconstructed source of "https://aarnphm.xyz/thoughts/Philosophy-and-Kant"
title: Philosophy and Kant
date: 2023-12-04
---

### ontology framework.

### critique.

---
slug: thoughts/Philosophy-and-Nietzsche
tags:
  - philosophy
  - seed
description: reconstructed source of "https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche"
title: Philosophy and Nietzsche
date: 2023-12-04
---

See also: Nietzsche's [Life](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche) and overall influence

## Nietzsche and Philosophy

_by [Gilles Deleuze](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/Giles-Deleuze)_

The decadence of modern philosophy lies in its theory of value, which imposes conformism and a new form of submission.

A philosophy of sense and values has to be a critique.

### Value

Problem with [Kant](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/Philosophy-and-Kant): he failed to pose the problem of critique in terms of values.

The notion of [aesthetic value](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/aesthetic-value) implies a critical reversal.

Critical philosophy has two inseparable moments: the referring back of all things and any kind of origin to values, but also the referring back of these values to something which is, as it were, their origin and determines their value.

This is Nietzsche's twofold struggle:

- against those who remove values from criticism, contenting themselves with producing inventories of existing values or with criticising things in the name of established values (the "philosophical labourers", Kant and Schopenhauer, [BGE](#anatomy-of-beyond-good-and-evil) 211)
- against those who criticise, or respect, values by deriving them from simple facts, from so-called "objective facts" (the utilitarians, the "scholars", BGE Part 6).

Nietzsche attacks both the "high" idea of foundation which leaves values indifferent to their own origin and the idea of a simple causal derivation or smooth beginning which suggests an indifferent origin of values.

Genealogy: substitute the pathos of difference or distance for both the Kantian principle of universality and the principle of resemblance dear to utilitarianism (GM I).

### Sense

- there is no fixed definition of sense
- We don't know where the force comes from
- Philosophy is symptomatology, not semeiology
- To interpret and to evaluate is to weigh causes and effects.
Force is not a cause, but a symptom.

### Against [dialectics](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/dialectics)

Theory of force

When life struggles with another form of life, it affirms its own difference and enjoys this difference. The negative is not present in the essence as that from which force draws its activity: on the contrary it is a result of activity, of the existence of an active force and the affirmation of its difference. The negative is a product of existence itself: the aggression necessarily linked to an active existence, the aggression of an affirmation. As for negation as a concept, "it is only a subsequently-invented pale contrasting image in relation to its positive basic concept - filled with life and passion through and through" (GM I 10 p. 37).

For the speculative element of negation, opposition or contradiction Nietzsche substitutes the practical element of difference, the object of affirmation and enjoyment. It is in this sense that there is a Nietzschean empiricism. The question which Nietzsche constantly repeats, "what does a will want, what does this one or that one want?", must not be understood as the search for a goal, a motive or an object for this will. What a will wants is to affirm its difference. In its essential relation with the "other" a will makes its difference an object of affirmation.

> "The pleasure of knowing oneself different", the enjoyment of difference (BGE 260);

This is the new, aggressive and elevated conceptual element that empiricism substitutes for the heavy notions of the dialectic and above all, as the dialectician puts it, for the labour of the negative.

### Tragedy

> [!tip] Tragic
>
> The linking among contradictions, negatives, and oppositions

Tragedy has three ways of dying:

- Socrates' dialectics, or the Euripidean death
- Christianity
- Modern dialectics and Wagner

1. BT emphasizes that the contradiction is between primitive unity and individuality
2. This is reflected in the opposition of Dionysus and Apollo
   - Apollo overcomes the suffering of the individual by the radiant glorification of the eternity of the phenomenon: he constructs appearances of appearance, thus freed from suffering
   - Dionysus shatters the individual, absorbing him into original being ⇒ reproduces contradiction as the pain of the individual and introduces it into a higher pleasure
3. Two antithetical ways of resolving tragedy
4. Reconciliation dominated by Dionysus

### Nietzsche's Evolution

The tragic in its totality lies within its contradiction, and in Dionysus' resolutions and expressions of such solutions.

Characteristic of tragic culture, as seen in Kant, Schopenhauer, and Wagner in trying to resolve it: wisdom takes the place of science as the highest end.

### Existence and Innocence

Necessary to disperse the universe, to lose respect for the whole

> Innocence is the game of existence, of force and of will

Existence affirmed and appreciated, force not separated, the will not divided in two - a first approximation of innocence.

Mentions Heraclitus = the tragic thinker

Heraclitus understood existence on the basis of an instinct of [play](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/play)

Existence as an [aesthetic](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/aesthetic-value) phenomenon rather than a moral or religious one

Affirmation of being

Heraclitus denied the duality of worlds, "he denied being itself". Moreover he made an affirmation of becoming. We have to reflect for a long time to understand what it means to make an affirmation of becoming.
In the first place it is doubtless to say that there is only becoming. No doubt it is also to affirm becoming. But we also affirm the being of becoming, we say that becoming affirms being or that being is affirmed in becoming. Heraclitus has two thoughts which are like ciphers: according to one there is no being, everything is becoming; according to the other, being is the being of becoming as such. A working thought which affirms becoming and a contemplative thought which affirms the being of becoming. These two ways of thinking are inseparable, they are the thought of a single element, as Fire and Dike, as Physis and Logos. For there is no being beyond becoming, nothing beyond multiplicity; neither multiplicity nor becoming are appearances or illusions.

Multiplicity is the inseparable manifestation, essential transformation and constant symptom of unity.

Affirming the being of becoming and affirming becoming are two states of the return.

Eternal return is the distinct return of outward movement, the distinct contemplation of action.

### The dice-throw

The game has two sets of movement.

The earth is where the dice are thrown and the sky is where the dice are thrown back.

The dice-throw affirms becoming, and it affirms the being of becoming.

It is not a large number of throws that produces the repetition of a combination, but rather the number of the combination which produces the repetition of the dice throw.

The dice thrown once are the affirmation of chance; the combination they form on falling is the affirmation of necessity.

Necessity is affirmed by chance, and chance is in turn affirmed through the act of necessity.

### Nietzsche and Mallarmé

1. To think is to send out a dice-throw
2. Man does not know how to play
3. To throw the dice is not only irrational, but also constitutes the tragic attempt and the tragic thought par excellence

Necessity is the abolition of chance

### Tragic thoughts

The spirit of revenge names the different forms in which nihilism takes place.

It is a type, but not separable from a typology.

The Touchstone

Relates to other tragic philosophers, but one shan't take this at face value.

Of tragedy in Nietzsche's philosophy, one must ask:

- How does this other one think?
- How much ressentiment and bad conscience remains in his thought?

Zarathustra opposes playing to betting, dancing to leaping.

---

## _Anatomy_ of Beyond Good and Evil

### Prejudices of Philosophers

[Source](https://www.marxists.org/reference/archive/nietzsche/1886/beyond-good-evil/ch01.htm)

- Begins by critiquing the traditional approaches to [truth](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/Will-to-Truth) and morality, deeming them a "hazardous enterprise". Philosophers, who are "deemed to pursue the truth", don't seem to fully understand why.

### The Free Spirit

[Source](https://www.marxists.org/reference/archive/nietzsche/1886/beyond-good-evil/ch02.htm)

> [!note] Aphorism 24
>
> What strange simplification and falsification mankind lives on! One can never cease wondering once one has acquired eyes for this marvel! How we have made everything around us clear and free and easy and simple! How we have been able to give our senses a passport to everything superficial, our thoughts a godlike desire for wanton gambolling and false conclusions!

- How, from the beginning, we have contrived to retain our ignorance so as to enjoy an almost inconceivable freedom, frivolity, impetuosity, bravery, and cheerfulness of life, so as to enjoy life!

Man lives in blissful ignorance, and it is this ignorance that allows him to enjoy life.
This involves a deliberate overlooking or misunderstanding of the complexity and depth of reality, such that one grants one's thoughts the freedom to roam superficially.

> And only on this solidified, granite-like foundation of ignorance could knowledge rear itself hitherto, the will to knowledge on the foundation of a far more powerful will, the will to ignorance, to the uncertain, to the untrue! Not as its opposite, but — as its refinement!

Nietzsche posits that humans have contrived to retain their ignorance in order to enjoy life with freedom, lack of scruple, heartiness, and gaiety. This foundation of ignorance allows knowledge to rise, but it does so on the foundation of a far more powerful [will](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/Will): the will to ignorance, to uncertainty, to the untrue.

Nietzsche points to a paradox in our existence: our will to knowledge is built upon a foundation of ignorance. The will to knowledge is not opposed to ignorance; rather, it is its refinement. The will to ignorance is itself a strategy of [power](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/Will-to-Power), as it motivates [force](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/Giles-Deleuze#active-and-reactive-forces).

---

## The Gay Science

Mentions the death of God and begins the introduction to the doctrine of eternal recurrence.

> [!note] The connotation of "gay" in Nietzsche's dialectics
>
> The original title was "la gaya scienza", and "gay" does not mean homosexuality here; rather, flexible and joyful. Transcribed word for word, it would mean The Joyful Science.

---

## On Genealogy of Morals

---

## Thus Spoke Zarathustra

Consciousness is what you make of it. The values you gather through experience are curated largely by your environment, and Zarathustra guides you on acting morally.

People are innately good, but circumstances make them act a certain way.

---
slug: thoughts/Planimetric-composition
tags:
  - film
description: reconstructed source of "https://aarnphm.xyz/thoughts/Planimetric-composition"
title: Planimetric composition
date: 2023-08-11
---

---
slug: thoughts/Progressive-disclosure
tags:
  - seed
description: reconstructed source of "https://aarnphm.xyz/thoughts/Progressive-disclosure"
title: Progressive disclosure
date: 2024-09-02
---

> make complexity easier to learn, but still enable power users to discover all workflows.
---
slug: thoughts/PyTorch
tags:
  - ml
  - framework
description: tidbits from PyTorch
title: PyTorch
date: 2024-11-11
---

see also: [unstable docs](https://pytorch.org/docs/main/)

## `MultiMarginLoss`

Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch `Tensor`) and output $y$ (a 1D tensor of target class indices, $0 \le y \le x.\text{size}(1) - 1$):

For each mini-batch sample, the loss in terms of the 1D input $x$ and scalar output $y$ is:

$$
\text{loss}(x, y) = \frac{\sum_{i} \max(0, \text{margin} - x[y] + x[i])^p}{x.\text{size}(0)}, \quad i \in \{0, \ldots, x.\text{size}(0) - 1\} \text{ and } i \neq y
$$

---
slug: thoughts/RAG
tags:
  - technical
  - ml
description: reconstructed source of "https://aarnphm.xyz/thoughts/RAG"
title: RAG
date: 2024-02-07
---

Retrieval-Augmented Generation

paper: [arxiv](https://arxiv.org/abs/2005.11401)

Since models have finite memory and limited context windows, generation often leads to "hallucinations" and a lack of cohesion.

The idea of RAG is to combine a pretrained retriever with a seq2seq model and fine-tune end-to-end.

Two core components include [embeddings](https://aarnphm.xyz/thoughts/RAG/../../thoughts/Embedding) and vector databases.

---
slug: thoughts/Radix-tree
tags:
  - technical
description: reconstructed source of "https://aarnphm.xyz/thoughts/Radix-tree"
title: Radix tree
date: 2024-11-18
---

A prefix [trie](https://aarnphm.xyz/thoughts/Radix-tree/../../thoughts/university/twenty-three-twenty-four/sfwr-2c03/Hash-tables) in which each node that is an only child is merged with its parent.

![](https://aarnphm.xyz/thoughts/Radix-tree/../../thoughts/images/Patricia_trie.svg)

_By Claudio Rocchini - Own work, CC BY 2.5, [wikimedia](https://commons.wikimedia.org/w/index.php?curid=2118795)_

result: the number of internal nodes is at most the radix $r$ of the tree, where $r = 2^x$ for some integer $x \ge 1$.

Edges can be labelled with sequences of elements as well as single elements.

Keys at each node are compared chunk-of-bits by chunk-of-bits, where the quantity of bits in any given chunk is the radix $r$ of the radix tree:

- if $r=2$, the radix trie is binary, which minimises sparsity at the expense of maximising trie depth
- if $r \ge 4$ is a power of two, it is an r-ary trie, which lessens the depth at the expense of some sparseness

**Lookup pseudocode**:

```pseudo
\begin{algorithm}
\caption{Lookup}
\begin{algorithmic}
\State $\text{traverseNode} \gets \text{root}$
\State $\text{elementsFound} \gets 0$
\While{traverseNode $\neq \text{null} \land \neg \text{traverseNode}.\text{isLeaf}() \land \text{elementsFound} < \text{length}(x)$}
\State nextEdge $\gets$ select edge from traverseNode.edges where edge.label is a prefix of $x.\text{suffix}(\text{elementsFound})$
\If{nextEdge $\neq \text{null}$}
\State traverseNode $\gets$ nextEdge.targetNode
\State elementsFound $\gets$ elementsFound + length(nextEdge.label)
\Else
\State traverseNode $\gets$ null
\EndIf
\EndWhile
\State \Return traverseNode $\neq \text{null} \land \text{traverseNode}.\text{isLeaf}() \land \text{elementsFound} = \text{length}(x)$
\end{algorithmic}
\end{algorithm}
```

## complexity

Permits lookup, deletion, and insertion in $O(k)$ rather than $O(\log n)$.

Normally $k \ge \log n$, but in a balanced tree every comparison is a string comparison, which requires $O(k)$ worst-case time. Whereas in a trie each comparison takes constant time, though looking up a string of length $m$ requires $m$ comparisons.
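A runnable companion to the pseudocode above: a plain (uncompressed) trie for brevity, since a radix tree additionally merges single-child chains but has the same $O(k)$ lookup bound:

```python
class TrieNode:
    def __init__(self):
        self.children: dict[str, "TrieNode"] = {}
        self.is_key = False

def insert(root: TrieNode, key: str) -> None:
    node = root
    for ch in key:  # O(k) in the key length, independent of n
        node = node.children.setdefault(ch, TrieNode())
    node.is_key = True

def lookup(root: TrieNode, key: str) -> bool:
    node = root
    for ch in key:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_key

root = TrieNode()
insert(root, "roman")
insert(root, "romane")
print(lookup(root, "roman"), lookup(root, "rom"))  # True False
```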
---
slug: thoughts/Routh-Hurwitz-criterion
tags:
  - seed
description: reconstructed source of "https://aarnphm.xyz/thoughts/Routh-Hurwitz-criterion"
title: Routh-Hurwitz criterion
date: 2024-02-06
---

> Condition for the stability of linear time-invariant (LTI) [control systems](https://aarnphm.xyz/thoughts/Routh-Hurwitz-criterion/../../tags/sfwr3dx4)

> [!tip] sufficient condition for Stability
>
> All coefficients in the first column of the completed Routh array have the same sign

For a system with transfer function $\hat{G}(s) = \frac{\mathcal{N}(s)}{\mathcal{D}(s)}$

Input-output stability implies that all roots of $\mathcal{D}(s)$ are in the Left Half Plane (LHP)

---
slug: thoughts/Rust
tags:
  - seed
  - technical
description: reconstructed source of "https://aarnphm.xyz/thoughts/Rust"
title: Rust
date: 2022-10-29
---

Ownership and Borrowing

- Stack and heaps

```rust
fn main() {
    let s = String::from("Hello");
}
```

- a value can be borrowed mutably only ONCE
- long-running owners
- refcount

Foreign-Function Interfaces (FFI)

---
slug: thoughts/SVCCA
tags:
  - ml
description: reconstructed source of "https://aarnphm.xyz/thoughts/SVCCA"
title: SVCCA
date: 2024-11-04
---

([Raghu et al., 2017](#bib-raghu2017svccasingularvectorcanonical)) proposed a way to compare two representations that is both invariant to affine transformations and fast to compute [^explain]

> based on canonical correlation analysis, which is invariant to linear transformations.

> [!abstract] definition
>
> Given a dataset $X = \{x_{1},\cdots, x_m\}$ and a neuron $i$ on layer $l$, we define $z_i^l$ to be the _vector_ of outputs on $X$:
>
> $$
> z^l_i = (z^l_i(x_1), \cdots, z^l_i(x_m))
> $$

SVCCA proceeds as follows:

1. **Input**: takes as input two (not necessarily different) sets of neurons $l_{1} = \{z_1^{l_{1}}, \cdots, z_{m_{1}}^{l_1}\}$ and $l_{2} = \{z_1^{l_2}, \cdots, z_{m_2}^{l_{2}}\}$
2. **Step 1**: perform [SVD](https://aarnphm.xyz/thoughts/SVCCA/../../thoughts/Singular-Value-Decomposition) on each subspace to get subspaces $l^{'}_1 \subset l_1, l^{'}_2 \subset l_2$
3. **Step 2**: compute the canonical correlation similarity of $l^{'}_1, l^{'}_2$, i.e. the maximal correlation between $X, Y$, which can be expressed as
   $$
   \max \frac{a^T \Sigma_{XY} b}{\sqrt{a^T \Sigma_{XX} a}\sqrt{b^T \Sigma_{YY} b}}
   $$
   where $\Sigma_{XX}, \Sigma_{XY}, \Sigma_{YX}, \Sigma_{YY}$ are the covariance and cross-covariance terms. Performing the change of basis $\tilde{x_{1}} = \Sigma_{XX}^{\frac{1}{2}} a$ and $\tilde{y_1} = \Sigma_{YY}^{\frac{1}{2}} b$ and applying Cauchy-Schwarz, we recover an eigenvalue problem:
   $$
   \tilde{x_{1}} = \argmax \left[ \frac{x^T \Sigma_{XX}^{-\frac{1}{2}} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} \Sigma_{XX}^{-\frac{1}{2}} x}{\|x\|} \right]
   $$
4. **Output**: aligned directions $(\tilde{z_i^{l_{1}}}, \tilde{z_i^{l_{2}}})$ and correlations $\rho_i$

> [!tip] distributed representations
>
> SVCCA has no preference for representations that are neuron (axis) aligned. [^testnet]

## References

- Raghu, M., Gilmer, J., Yosinski, J., & Sohl-Dickstein, J. (2017). _SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability_. arXiv preprint arXiv:1706.05806 [arxiv](https://arxiv.org/abs/1706.05806)
[^explain]: this allows comparisons between different layers of a network, and more comparisons to be calculated than with previous methods

[^testnet]: Experiments were conducted with a convolutional network followed by a residual network:

    convnet: `conv --> conv --> bn --> pool --> conv --> conv --> conv --> conv --> bn --> pool --> fc --> bn --> fc --> bn --> out`

    resnet: `conv --> (x10 c/bn/r block) --> (x10 c/bn/r block) --> (x10 c/bn/r block) --> bn --> fc --> out`

    Note that SVD and CCA work with $\text{span}(z_1, \cdots, z_m)$ instead of being axis-aligned to the $z_i$ directions. This is important if representations are distributed across many dimensions, which we observe in cross-branch superposition!

---
slug: thoughts/Scents
tags:
  - evergreen
description: reconstructed source of "https://aarnphm.xyz/thoughts/Scents"
title: Scents
date: 2024-01-07
---

A (mostly) up-to-date list of scents that I use/like/prefer. See the [antilibrary](https://aarnphm.xyz/thoughts/Scents/../../books) for the reading list.

### like.

- Maison Margiela's _Lazy Sunday Morning_
- Maison Francis Kurkdjian's _OUD satin mood_
- Tom Ford's _Noir de Noir_

### current.

#### [Le Labo's Rose 31](https://www.lelabofragrances.ca/rose-31.html?bypass=true\&region=CA\&locale=EN\&gad_source=1)

- Definitely a winter/spring scent.
- If you like the smell of roses. Alternatives are Matcha 26 or Fleurs d'Oranger 27.

#### [Le Labo's Labdanum 18](https://www.lelabofragrances.ca/labdanum-18.html?bypass=true\&region=CA\&locale=EN\&gad_source=1)

- warm and sweet scent, good for a summer or fall night.
- definitely lasts a lot longer compared to Rose 31.

---
slug: thoughts/Search
tags:
  - seed
  - technical
description: reconstructed source of "https://aarnphm.xyz/thoughts/Search"
title: Search
date: 2024-02-07
---

## Engine

A search engine is essentially query processing. It is a form of [information retrieval](https://aarnphm.xyz/thoughts/Search/../../thoughts/information-retrieval) that helps one answer [questions](https://aarnphm.xyz/thoughts/Search/../../thoughts/questions)

Search results are generally presented as a list, often referred to as search engine results pages (SERPs). Some search engines also mine [data](https://aarnphm.xyz/thoughts/Search/../../thoughts/data) available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler.
## query

See also [PageRank](https://aarnphm.xyz/thoughts/Search/../../thoughts/PageRank)

### HITS algorithm

---
slug: thoughts/Singular-Value-Decomposition
tags:
  - ml
description: resconstructed source of "https://aarnphm.xyz/thoughts/Singular-Value-Decomposition"
title: Singular Value Decomposition
date: 2024-10-21
---

$$
\begin{aligned}
X &= \begin{bmatrix} \vdots & \vdots & \cdots & \vdots \\ x_1 & x_2 & \cdots & x_m \\ \vdots & \vdots & \cdots & \vdots \end{bmatrix} = U \Sigma V^T \\
&= \begin{bmatrix} \vdots & \vdots & \cdots & \vdots \\ u_{1} & u_{2} & \cdots & u_n \\ \vdots & \vdots & \cdots & \vdots \end{bmatrix}
\begin{bmatrix} \sigma_1 & \cdots & \cdots & \cdots \\ \vdots & \sigma_2 & \cdots & \cdots \\ \vdots & \cdots & \ddots & \cdots \\ \vdots & \cdots & \cdots & \sigma_m \\ 0 & 0 & 0 & 0 \\ \end{bmatrix}
{\begin{bmatrix} \vdots & \vdots & \cdots & \vdots \\ v_{1} & v_{2} & \cdots & v_m \\ \vdots & \vdots & \cdots & \vdots \end{bmatrix}}^T \\
\\
x_k &\in \mathbb{R}^n \\
\\
\text{U, V } &: \text{unitary matrices} \\
\Sigma &: \text{diagonal matrix}
\end{aligned}
$$

where the columns $\begin{bmatrix} \vdots \\ u_{1} \\ \vdots \end{bmatrix}$ of $U$ are “eigen-faces”

$U$ and $V$ are orthonormal, meaning:

$$
\begin{aligned}
U U^T &= U^T U = \mathbb{I}_{n \times n} \\
V V^T &= V^T V = \mathbb{I}_{m \times m} \\
\\
\Sigma &: \text{diagonal} \quad \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_m \geq 0
\end{aligned}
$$

---
slug: thoughts/TPU
tags:
  - hardware
description: resconstructed source of "https://aarnphm.xyz/thoughts/TPU"
title: TPU
date: 2024-03-04
---

See also: [XLA](https://aarnphm.xyz/thoughts/TPU/../../thoughts/XLA), and [architecture](https://cloud.google.com/tpu/docs/system-architecture-tpu-vm)

---
slug: thoughts/The-Prisoner's-Dilemma
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/The-Prisoner's-Dilemma"
title: The Prisoner's Dilemma
date: 2024-04-12
---

a [game theory](https://aarnphm.xyz/thoughts/The-Prisoner's-Dilemma/../../thoughts/game-theory) thought experiment involving two rational agents, each of whom can cooperate for mutual benefit or “defect” for individual reward.

---
slug: thoughts/The-Will-To-Believe
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/The-Will-To-Believe"
title: The Will To Believe
date: 2024-02-08
---

Book: [web](https://www.gutenberg.org/files/26659/26659-h/26659-h.htm)

## rationality

> But this relief seems to be a negative rather than a positive character. Shall we then say that the feeling of rationality is constituted merely by the absence of any feeling of irrationality? Just as we feel no particular pleasure when we breathe freely, but a very intense feeling of distress when the respiratory motions are prevented,—so any unobstructed tendency to action discharges itself without the production of much cogitative accompaniment, and any perfectly fluent course of thought awakens but little feeling; but when the movement is inhibited, or when the thought meets with difficulties, we experience distress. It is only when the distress is upon us that we can be said to strive, to crave, or to aspire.

> All feeling whatever, in the light of certain recent psychological speculations, seems to depend for its physical condition not on simple discharge of nerve-currents, but on their discharge under arrest, impediment, or resistance.
---
slug: thoughts/Transcendentals
tags:
  - philosophy
description: resconstructed source of "https://aarnphm.xyz/thoughts/Transcendentals"
title: Transcendentals
date: 2024-01-14
---

> properties of being that are universal to all beings

### truth.

Kant’s [transcendental idealism](https://aarnphm.xyz/thoughts/Transcendentals/../../thoughts/Philosophy-and-Kant).

---
slug: thoughts/Transformers
tags:
  - ml
description: resconstructed source of "https://aarnphm.xyz/thoughts/Transformers"
title: Transformers
date: 2024-02-07
---

See also: [LLMs](https://aarnphm.xyz/thoughts/Transformers/../../thoughts/LLMs), [embedding](https://aarnphm.xyz/thoughts/Transformers/../../thoughts/Embedding), [visualisation from Brendan Bycroft](https://bbycroft.net/llm)

> A multi-layer perceptron (MLP) architecture built on top of a [multi-head attention](https://aarnphm.xyz/thoughts/Transformers/../../thoughts/Attention#muti-head-attention) mechanism ([Vaswani et al., 2023](#bib-vaswani2023attentionneed)) to signal high entropy tokens to be amplified and less important tokens to be diminished.

ELI5: Mom often creates a food list consisting of $n$ items to buy. Your job is to guess what the last item on this list would be.

Most implementations are [autoregressive](https://aarnphm.xyz/thoughts/Transformers/../../thoughts/Autoregressive-models). Most major SOTA models are decoder-only, as encoder-decoder models have lagged behind due to their expensive encoding phase.

[state-space models](https://aarnphm.xyz/thoughts/Transformers/../../thoughts/state-space-models) address transformers’ [efficiency issues](https://arxiv.org/pdf/2009.06732) in attention layers within information-dense data

## memory limitations.

_excerpt from [arxiv](https://arxiv.org/html/2403.14123)_

> "How is LLaMa.cpp possible?"\
> great post by [@finbarrtimbers](https://twitter.com/finbarrtimbers) \
> \
> llama.cpp surprised many people (myself included) with how quickly you can run large LLMs on small computers, e.g. 7B runs @ \~16 tok/s on a MacBook. Wait don't you need supercomputers to work… [pic.twitter.com/EIp9iPkZ6x](https://t.co/EIp9iPkZ6x)
>
> — Andrej Karpathy (@karpathy) [15 August 2023](https://twitter.com/karpathy/status/1691571869051445433)

## inference.

Either compute-bound (batch inference, saturated usage) or memory-bound (latency)

[speculative decoding](https://aarnphm.xyz/thoughts/Transformers/../../thoughts/vllm#speculative-decoding) ⇒ memory-bound (to saturate FLOPs)

### next-token prediction.

Sampling: we essentially look forward K tokens, and then we sample from the distribution of the next token.

## Feynman-Kac

Let $\mathcal{V}$ be the vocab of a given transformer model, and $\mathcal{S} = \mathcal{V}^{*}$ the set of multi-token strings. Assume $\mathcal{V}$ contains the token `EOS` and write $\mathcal{F} \subseteq \mathcal{S}$ for the set of `EOS`-terminated strings.

> [!definition]
>
> A Feynman-Kac transformer model is a tuple $(s_{0}, \{M_t\}_{t\ge 1}, \{G_t\}_{t\ge 1})$ where:
>
> - $s_{0} \in \mathcal{S}$ is an _initial state_, which we take to be the empty string $\epsilon$
> - $M_t(s_t \mid s_{t-1}, f_\theta)$ is a _Markov kernel_ from $s_{t-1} \in \mathcal{F}^c$ to $s_t \in \mathcal{S}$, parameterised by a transformer network $f_\theta: \mathcal{F}^c \to \mathbb{R}^{\mid \mathcal{V} \mid}$ mapping non-`EOS`-terminated strings to vectors of logits
> - $G_t(s_{t-1}, s_t, f_\theta)$ is a _potential function_, mapping a pair $(s_{t-1}, s_t) \in \mathcal{F}^c \times \mathcal{S}$ to a real-valued non-negative score.
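To ground the definition, a minimal toy sketch of the three components (everything below is hypothetical: `f_theta` stands in for a transformer returning logits, and the potential just checks a placeholder constraint):

```python
import numpy as np

def is_valid_prefix(s: tuple) -> bool:
    return True  # placeholder constraint; a real potential might check a JSON grammar

def M_t(s_prev: tuple, f_theta) -> tuple:
    """Markov kernel M_t(s_t | s_{t-1}, f_theta): extend s_prev by one token
    sampled from the softmax of the transformer's logits."""
    logits = f_theta(s_prev)           # vector of length |V|
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return s_prev + (int(np.random.choice(len(p), p=p)),)

def G_t(s_prev: tuple, s_t: tuple, f_theta) -> float:
    """Potential function: non-negative score used to reweight the chain."""
    return 1.0 if is_valid_prefix(s_t) else 0.0

s0 = ()                                # initial state: the empty string
f_theta = lambda s: np.zeros(10)       # stand-in for a 10-token-vocab transformer
print(M_t(s0, f_theta), G_t(s0, (3,), f_theta))
```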
Goal: generate from the distribution $\mathbb{P}$ that reweights the Markov chain $\mathbb{M}$ by the potential functions $G_t$. We define ==_step-t filtering posteriors_==:

$$
\mathbb{P}_t(s_t) = \frac{\mathbb{E}_\mathbb{M} \left[ \prod_{i=1}^{t \wedge T} G_i(S_{i-1}, S_i, f_\theta) \cdot [S_t = s_t] \right]}{\mathbb{E}_\mathbb{M} \left[ \prod_{i=1}^{t \wedge T} G_i(S_{i-1}, S_i, f_\theta) \right]}
$$

_Given that $T$ is almost surely finite_, we can then define the _overall posterior_ $\mathbb{P}(s) = \lim_{t \to \infty} \mathbb{P}_t(s)$ ([Lew et al., 2023, see §2.2 for examples](#bib-lew2023sequentialmontecarlosteering))

```pseudo
\begin{algorithm}
\caption{Sequential Monte Carlo Transformer Steering}
\begin{algorithmic}
\State \textbf{Input:} $N$ (\# particles), $K$ (factor), Feynman-Kac Transformer model $\{s_0, \{M_t\}_{t \geq 1}, \{G_t\}_{t \geq 1}\}$
\State \textbf{Output:} Weighted particle approximation $\{(x_i, w_i)\}_{i=1,\ldots,N}$ of the posterior $\mathbb{P}$ \\
\State \textbf{Output:} Unbiased estimate $\hat{Z}$ of the partition function $Z = \mathbb{E}_\mathbb{M}[\prod_{t=1}^T G_t(s_{t-1}, s_t, f_\theta)]$ \\
\State Initialize $f_\theta \gets \texttt{CachedTransformer}()$
\State Initialize $(x_i, w_i) \gets (s_0, 1)$ for $i = 1, \ldots, N$
\State Initialize $t \gets 1$
\While{$x_i \not\in \mathcal{F}$ for some $i \in \{1, \ldots, N\}$}
\State $K_i \gets K (1 - \mathbb{1}_{\mathcal{F}}(x_i)) + \mathbb{1}_{\mathcal{F}}(x_i)$ for $i = 1, \ldots, N$
\State $N' \gets \sum_{i=1}^N K_i$
\For{$i \in \{1, \ldots, N\}$}
\If{$x_i \in \mathcal{F}$}
\State Set $(x_{i,1}, w_{i,1}) \gets (x_i, w_i \cdot \frac{N'}{N})$
\Else
\State Generate $x_{i,k} \sim M_t(\cdot \mid x_i, f_\theta)$ for $k = 1, \ldots, K$
\State Set $w_{i,k} \gets w_i \cdot G_t(x_i, x_{i,k}, f_\theta) \cdot \frac{N'}{K N}$ for $k = 1, \ldots, K$
\EndIf
\EndFor
\State Set normalized weights $\hat{w}_{i,k} \gets \frac{w_{(i,k)}}{\sum_{j=1}^N \sum_{l=1}^{K_j} w_{(j,l)}}$ for $i = 1, \ldots, N$ and $k = 1, \ldots, K_i$
\State Set $c^* \gets \inf\{c \in \mathbb{R}_{> 0} \mid \sum_{i=1}^N \sum_{k=1}^{K_i} (\mathbb{1} \wedge c \hat{w}_{(i,k)}) > N\}$
\State Set $(I_\text{det}, I_\text{stoch}, I_\text{strat}) \gets (\{(i,k) \mid c^{*} \hat{w}_{i,k} \geq 1\}, \{(i,k) \mid c^{*} \cdot \hat{w}_{i,k} < 1\}, \{\})$
\State Set $\alpha \gets \frac{\sum_{i \in I_\text{stoch}} \hat{w}_i}{|I_\text{det}|}$ and generate $U \sim \text{Uniform}([0, \alpha])$
\For{$i \in I_\text{stoch}$}
\State Set $U \gets U - \hat{w}_i$
\If{$U < 0$}
\State Set $I_\text{strat} \gets I_\text{strat} \cup \{i\}$
\State Set $U \gets U + \alpha$
\EndIf
\EndFor
\State Set particles $\{(x_i, w_i)\}_{i=1,\ldots,|I_\text{det}|} \gets \{(x_j, w_j \cdot \frac{N}{N'}) \mid j \in I_\text{det}\}$
\State Set particles $\{(x_i, w_i)\}_{i=|I_\text{det}|+1,\ldots,N} \gets \{(x_j, \frac{N}{c^* N'} \sum_{l=1}^{N} \sum_{k=1}^{K_l} w_{(j,k)}) \mid j \in I_\text{strat}\}$
\EndWhile
\State \Return $\left((x_i, w_i)_{i=1,\ldots,N}, \hat{Z} = \frac{1}{N} \sum_{i=1}^N w_i \right)$
\end{algorithmic}
\end{algorithm}
```

## References

- Lew, A. K., Zhi-Xuan, T., Grand, G., & Mansinghka, V. K. (2023). _Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs_. arXiv preprint arXiv:2306.03081 [arxiv](https://arxiv.org/abs/2306.03081)
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2023). _Attention Is All You Need_. arXiv preprint arXiv:1706.03762 [arxiv](https://arxiv.org/abs/1706.03762)

---
slug: thoughts/Turing-complete-Transformers
tags:
  - seed
  - ml
description: resconstructed source of "https://aarnphm.xyz/thoughts/Turing-complete-Transformers"
title: Turing-complete Transformers
date: 2024-01-30
---

> Turing Complete Transformers: Two Transformers Are More Powerful Than One\
> "We prove transformers are not Turing complete, propose a new architecture that is Turing complete, and empirically demonstrate that the new architecture can generalize more effectively than transformers."… [pic.twitter.com/LGVlZt0afu](https://t.co/LGVlZt0afu)
>
> — Burny — Effective Omni (@burny\_tech) [7 January 2024](https://twitter.com/burny_tech/status/1744100637187461455)

The idea is to combine two small [transformers](https://aarnphm.xyz/thoughts/Turing-complete-Transformers/../../thoughts/Transformers) rather than one [large model](https://aarnphm.xyz/thoughts/Turing-complete-Transformers/../../thoughts/large-models)

More specialised for given tasks, and proven to be Turing-complete?

---
slug: thoughts/Value
tags:
  - philosophy
description: resconstructed source of "https://aarnphm.xyz/thoughts/Value"
title: Value
date: 2024-02-07
---

Encapsulates the following branches:

- [Moral](https://aarnphm.xyz/thoughts/Value/../../thoughts/moral)
- [Aesthetic](https://aarnphm.xyz/thoughts/Value/../../thoughts/aesthetic-value)

> [!tip] Axiology
>
> concerns goodness of all varieties, encompassing the nature of value and where it comes from.

---
slug: thoughts/Vietnamese-poem
tags:
  - seed
  - poem
description: dedicated to my roots, Vietnamese born.
title: Vietnamese poem
date: 2024-11-18
---

## Tố Hữu

Nguyễn Kim Thành (alias: Tố Hữu) was born in Phù Lai Village, near the ancient capital Huế. He is considered one of the pioneers of contemporary Vietnamese literature

### Vú em

```poetry language=vi
Nàng gửi con về nương xóm cũ
Nghẹn ngào trở lại đẩy xe nôi
Rồi từ hôm ấy, ôm con chủ
Trong cánh tay êm, luống ngậm ngùi

Nàng nhớ con nằm trong tổ lạnh
Không chăn, không nệm ấm, không màn.
Biết đâu trong những giờ hiu quạnh
Nó gọi tên nàng tiếng đã khan!

Rồi từ hôm ấy, dưới đêm sâu
Hồi hộp nàng ra vịn cửa lầu
Nhìn xuống ven trời dày bóng nặng
Tìm nghe trong gió tiếng con đâu

Gió vẫn vô tình lơ đãng bay
Những tàu cau yếu sẽ lung lay
Xạc xào động cánh đau lòng mẹ
Nghe tiếng lòng con vẳng tới đây!

Ta thấy nàng nghiêng mình rũ rượi
Gục đầu thổn thức trong bàn tay...

Bạn ơi, nguồn thảm sầu kia bởi
Số phận hay do chế độ này?

Huế, tháng 5-1938
```

---
slug: thoughts/Will-to-Power
tags:
  - philosophy
description: resconstructed source of "https://aarnphm.xyz/thoughts/Will-to-Power"
title: Will to Power
date: 2024-02-24
---

---
slug: thoughts/Will-to-Truth
tags:
  - seed
  - philosophy
description: resconstructed source of "https://aarnphm.xyz/thoughts/Will-to-Truth"
title: Will to Truth
date: 2023-10-24
---

See also: [Philosophy and Nietzsche](https://aarnphm.xyz/thoughts/Will-to-Truth/../../thoughts/Philosophy-and-Nietzsche)

_excerpt from [Nietzsche](https://aarnphm.xyz/thoughts/Will-to-Truth/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche)’s Beyond Good and Evil_

> The will to truth, which is still going to tempt us to many a hazardous enterprise; that celebrated veracity of which all philosophers have hitherto spoken with reverence: what questions this will to truth has already set before us! What strange, wicked, questionable questions! It is already a long story — yet does it not seem as if it has only just begun?
> Is it any wonder we should at last grow distrustful, lose our patience, turn impatiently away? That this sphinx should teach us too to ask questions? Who really is it that here questions us? What really is it in us that wants ‘the truth’?

Nietzsche critiques the traditional approaches to truth and morality, deeming it a “hazardous enterprise”. Philosophers, who are “deemed to pursue the truth”, don’t seem to fully understand why

> We did indeed pause for a long time before the question of the origin of this will — until finally we came to a complete halt before an even more fundamental question. We asked after the value of this will. Granted we want truth: why not rather untruth? And uncertainty? Even ignorance? — The problem of the value of truth stepped before us — or was it we who stepped before this problem? Which of us is Oedipus here? Which of us sphinx? It is, it seems, a rendezvous of questions and question-marks. And, would you believe it, it has finally almost come to seem to us that this problem has never before been posed — that we have been the first to see it, to fix our eye on it, to hazard it? For there is a hazard in it and perhaps there exists no greater hazard

---
slug: thoughts/Will
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/Will"
title: Will
date: 2024-01-14
---

## truth.

[Nietzsche](https://aarnphm.xyz/thoughts/Will/../../thoughts/Philosophy-and-Nietzsche) critiques the traditional approaches to [truth](https://aarnphm.xyz/thoughts/Will/../../thoughts/Will-to-Truth).

Nietzsche argues that philosophical thinking, like all conscious thinking, is driven by “instinctive” psychological forces, underneath which lie “valuations or, more clearly, physiological demands for the preservation of a certain type of life.” What we really value is not truth, but survival, he says. He resists “accustomed value feelings,” and wants to go “beyond good and evil” (201)

## power.

## rationality

> But this relief seems to be a negative rather than a positive character. Shall we then say that the feeling of rationality is constituted merely by the absence of any feeling of irrationality? Just as we feel no particular pleasure when we breathe freely, but a very intense feeling of distress when the respiratory motions are prevented,—so any unobstructed tendency to action discharges itself without the production of much cogitative accompaniment, and any perfectly fluent course of thought awakens but little feeling; but when the movement is inhibited, or when the thought meets with difficulties, we experience distress. It is only when the distress is upon us that we can be said to strive, to crave, or to aspire.

> All feeling whatever, in the light of certain recent psychological speculations, seems to depend for its physical condition not on simple discharge of nerve-currents, but on their discharge under arrest, impediment, or resistance.

[Link to original](https://aarnphm.xyz/thoughts/Will/../../thoughts/The-Will-To-Believe#rationality)

---
slug: thoughts/XLA
tags:
  - seed
  - ml
description: resconstructed source of "https://aarnphm.xyz/thoughts/XLA"
title: XLA
date: 2022-12-23
---

- Accelerated Linear Algebra
- Developed from TensorFlow

```python
import tensorflow as tf

@tf.function(jit_compile=True)  # XLA fuses the add, multiply, and reduce
def calc(x, y, z):
    return tf.reduce_sum(x + y * z)
```

Optimise the compute graph via a single kernel launch vs.
launching three separate kernels

See also [PJRT](https://aarnphm.xyz/thoughts/XLA/../../thoughts/PJRT)

---
slug: thoughts/action-theory
tags:
  - philosophy
description: resconstructed source of "https://aarnphm.xyz/thoughts/action-theory"
title: action theory
date: 2024-02-22
---

There is a huge difference between activity and passivity

---
slug: thoughts/aesthetic-value
tags:
  - philosophy
description: resconstructed source of "https://aarnphm.xyz/thoughts/aesthetic-value"
title: aesthetic value
date: 2024-01-30
---

Also known as [taste](https://aarnphm.xyz/thoughts/aesthetic-value/../../thoughts/taste), under the scope of [value](https://aarnphm.xyz/thoughts/aesthetic-value/../../thoughts/Value)

[Source](https://plato.stanford.edu/entries/aesthetic-concept/)

> - whether artworks are necessarily aesthetic objects;
> - how to square the allegedly perceptual basis of aesthetic judgments with the fact that we give [reasons](https://aarnphm.xyz/thoughts/aesthetic-value/../../thoughts/reason) in support of them;
> - how best to capture the elusive contrast between an aesthetic attitude and a practical one;
> - whether to define aesthetic experience according to its phenomenological or [representational](https://aarnphm.xyz/thoughts/aesthetic-value/../../thoughts/representations) content;
> - how best to understand the relation between aesthetic value and aesthetic experience

## beauty

---
slug: thoughts/algebraic-geometry
tags:
  - math
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/algebraic-geometry"
title: Algebraic geometry
date: 2024-05-22
---

See also [git source](https://github.com/stacks/stacks-project) and [web view](https://stacks.math.columbia.edu/)

---
slug: thoughts/atelier-with-friends/dundurn
tags:
  - menu
description: atelier with friends deux - orangeville
title: dundurn.
date: 2024-03-23
---

## entrée.

### Soupe à l’Oignon Gratinée

oignons caramélisés, bouillon de bœuf, gruyère, baguette.

## plat principal.

### [poissons.](https://aarnphm.xyz/thoughts/atelier-with-friends/dundurn/../../../../thoughts/atelier-with-friends/images/dundurn-1.webp)

flétan, sauce au beurre citronné, carottes anciennes rôties, purée de carottes.

## dessert.

### tiramisu.

espresso, mascarpone, biscuits à la cuillère, cacao.

---
slug: thoughts/atelier-with-friends/index
tags:
  - menu
  - evergreen
description: resconstructed source of "https://aarnphm.xyz/thoughts/atelier-with-friends/index"
title: atelier with friends.
date: 2024-03-07
---

Somewhat impromptu supper club hosted by yours truly. See also [dishes](https://aarnphm.xyz/thoughts/atelier-with-friends/index/../../../../thoughts/Dishes) for a comprehensive repertoire.

---
slug: thoughts/atelier-with-friends/orangeville
tags:
  - menu
description: atelier with friends uno - orangeville
title: orangeville.
date: 2024-03-08
---

## pasta.

### [Uovo la Raviolo](https://aarnphm.xyz/thoughts/atelier-with-friends/orangeville/../../../../thoughts/atelier-with-friends/images/orangeville-1.webp)

uovo, beurre noisette, salvia, parmigiano reggiano, ricotta ripieno, noce moscata

### [Pomodori alla fetuccine](https://aarnphm.xyz/thoughts/atelier-with-friends/orangeville/../../../../thoughts/atelier-with-friends/images/orangeville-2.webp)

bucatini, marinara, olio oliva

### Aglio e Olio

bucatini, olio oliva, aglio

### Pesto alla bucatini

bucatini, pesto, pepe

## salsa.

### Marinara

pomodori, cipolla, basilico, aglio confit, fiocchi di peperoncino, origano, aceto di vino bianco.
### Pesto

basilico, olio extra vergine di oliva, pinoli tostati, parmigiano reggiano, aglio tritato.

---
slug: thoughts/attractor
tags:
  - math
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/attractor"
title: Attractor
date: 2024-03-25
---

A set of points described by a dynamical system. Some exhibit [chaotic](https://aarnphm.xyz/thoughts/attractor/../../thoughts/Chaos) behaviour, see also [Paul Bourke’s work](https://paulbourke.net/fractals/)

They often create visually appealing patterns, but their applications range from physics to biology: how we understand weather patterns, bird migration patterns, quantum phenomena.

---
slug: thoughts/being
tags:
  - seed
  - philosophy
description: resconstructed source of "https://aarnphm.xyz/thoughts/being"
title: being.
date: 2024-06-12
---

What is being as a part of [epistemology](https://aarnphm.xyz/thoughts/being/../../thoughts/Epistemology)?

### why do we practice art?

Practicing anything, no matter how well or how badly, we practice art as an act of becoming, for our soul to grow.

---
slug: thoughts/composition
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/composition"
title: composition
date: 2024-03-09
---

How we combine elements to make a comprehensive model. Uses [Color](https://aarnphm.xyz/thoughts/composition/../../thoughts/Color)

---
slug: thoughts/computational-poem
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/computational-poem"
title: computational poem
date: 2024-10-11
---

Workshop by [Alicia Guo](https://www.aliciaguo.com/)

See also [thoughts/code/poem.js](https://cdn.aarnphm.xyz/assets/thoughts/code/poem.js)

## text generation with grammars

So what shapes languages? Grammars do.

## context-free grammars

---
slug: thoughts/confirmation-bias
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/confirmation-bias"
title: confirmation bias
date: 2024-02-07
---

---
slug: thoughts/constrained-decoding
tags:
  - ml
  - proposal
description: structured generations in vLLM a la carte
title: constrained decoding
date: 2024-11-18
---

The following document describes and summarizes existing works in vLLM to improve general guided decoding performance. [^performance] This design will largely affect how `logit_processor` are currently being handled within the vLLM architecture.

Main mega thread: [vllm-project/vllm#5423](https://github.com/vllm-project/vllm/issues/5423)

Goal:

- Improve general TPS when using guided decoding.
- Standardize logit processor interface [^samplingpr]
  - separate compute\_logits and preparing logits into two separate steps

Orthogonal, but still goals:

- [vllm-project/vllm#5006](https://github.com/vllm-project/vllm/pull/5006)
- Logit processor plugins, similar to how vLLM plugins are handled. [vllm-project/vllm#4769](https://github.com/vllm-project/vllm/pull/4769)
- xgrammar:

Scope: `logit_processor`, sampling controller interface

## background

![flow](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/images/vllm/pre-optimized-logit-processor-handling.webp)

_reference: [vllm-project/vllm#5329](https://github.com/vllm-project/vllm/pull/5329)_

Currently, generation with an FSM is super slow, even with warmup steps to initialize the given FSM. This behaviour is further exemplified when running with context longer than 4096 tokens.
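For orientation, the FSM-backed logit processor pattern under discussion looks roughly like the following sketch (hypothetical, not vLLM's actual interface); note that it carries FSM state across decoding steps:

```python
class FSMLogitsProcessor:
    """Sketch of a guided-decoding processor: mask logits so that only
    tokens allowed by the current FSM state survive sampling."""

    def __init__(self, fsm_index: dict):
        self.index = fsm_index  # state -> {token_id: next_state}
        self.state = 0

    def __call__(self, logits: list) -> list:
        allowed = self.index[self.state]
        return [x if i in allowed else float("-inf") for i, x in enumerate(logits)]

    def advance(self, token_id: int) -> None:
        self.state = self.index[self.state][token_id]

proc = FSMLogitsProcessor({0: {1: 2, 7: 2}, 2: {5: 0}})
print(proc([0.1] * 8))  # every logit except tokens 1 and 7 is masked to -inf
```

Because the mask depends on per-request FSM state, it has to be applied per sequence at every decoding step, which is where the cost below comes from.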
Additionally, all outlines logit processors are considered stateful, which slows down the model executor, given that in V0 logit processors are applied [row-by-row, blocking](https://github.com/vllm-project/vllm/blob/1ea291a4173a82c537ab42487e23375be4926d30/vllm/model_executor/layers/logits_processor.py#L143)

Thus, compared to sglang, vLLM V0 is currently not up to par.

## plan

- Implement [jump-ahead decoding](https://lmsys.org/blog/2024-02-05-compressed-fsm/#method-1-finite-state-machine-based) through a JSONWorker; we can then extend this to a CFGWorker
  - similar to how spec decode is currently implemented in V0; echoing [**@cadedaniel**](https://github.com/cadedaniel): “tree scoring in \[spec decode] could use the same API as multi-path jump decoding.”

> [!question] How should we handle the FSM per request?
>
> - Currently, users can specify different schemas per request, which means the FSM will be compiled per request. This is suboptimal because it slows down general TTFT.
> - For most use cases, we should assume the JSON schema is handled similarly to how the system prompt is currently being handled (passed during server init)

---

## appendix.

The following includes background information about guided generations.

### compressed FSM for jump-ahead tokens.

Implemented in ([Zheng et al., 2024](#bib-zheng2024sglangefficientexecutionstructured))

#### Method 1: [FSM](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/constrained-decoding#guided-generations-with-fsm)-based decoding

- intuition: Using an FSM ([Willard & Louf, 2023](#bib-willard2023efficientguidedgenerationlarge)) to guide generations by increasing the logit bias for tokens that conform to the given JSON schema. This allows us to track the current state during decoding and filter out invalid tokens by applying logit bias to the output.

  ![](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/images/vllm/constrained-json-fsm.webp)

- limitation: given that construction of the FSM requires token-level access, it can only transition the state by _one_ token at a time, resulting in slow decoding.

#### Method 2: Interleaved-based

- intuition: breaks down JSON schemas into parts, each containing either a chunked prefill part or a constrained decoding part. They are then executed interleaved by the inference system. Faster than per-token decoding given that chunked prefill components can process multiple tokens per forward pass. See also using llama.cpp as a backend.
- limitation:
  - the interleaved approach requires custom syntax, making it less expressive compared to regex.
  - struggles to deal with tokenization boundaries due to conflicts between decode and chunked prefill segments.
  - frequent communication between interpreter and backend adds additional overhead.

#### **Method 3: Jump-Forward Decoding with compressed FSM**

![](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/images/vllm/jump-forward-decoding-fsm.webp)

> [!tip] tokenization boundary handling
>
> During decoding, it is preferred to combine multiple characters into a single token.
>
> For example, when decoding `"Hello"` in the context of JSON decoding, the LLM might output the following tokens: `"`, `He`, `llo`, `",`
>
> This may cause some strange behaviour if we combine the last `"` with `,` (with the regex `"[\w\d\s]*"`, the last token `",` will lead to endless decoding because it is not valid even if the LM wants to stop.)
Fix:

- implement a re-tokenization mechanism during the jump-forward phase (append the string instead of the tokens, followed by re-tokenization of the entire text) $\to$ adds approximately 4% of overhead
- use a comprehensive regex to guide the decoding phase, instead of employing multiple concatenated regexes [^coalescence]

### Coalescence

intuition: Instead of expanding to $n$ states, we can compress certain chunks into one state to reduce the size of said FSM.

![](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/images/vllm/part-of-json-fsm.webp)

_figure 1: initial FSM state_

![](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/images/vllm/compressed-fsm-json.webp)

_figure 2: compressed FSM state_

A way to adapt the character-level regex FSM to work with tokens in `outlines`:

```python
from outlines.fsm.regex import make_deterministic_fsm, create_fsm_index_tokenizer

# assuming `fsm` is a character-level regex FSM and `tokenizer` an
# outlines-wrapped tokenizer built elsewhere
new_fsm, _ = make_deterministic_fsm(fsm)
idx, _ = create_fsm_index_tokenizer(new_fsm, tokenizer)
```

```mermaid
stateDiagram-v2
    [*] --> InputPrompt: Start
    state "input prompt" as InputPrompt
    state "next-token probability distribution" as GetProb
    state "valid tokens" as ListTokens {
        [*] --> CheckTransitions
        CheckTransitions --> FilterTokens: Get index[0].keys()
        FilterTokens --> [*]
    }
    state "Sample Token" as SampleToken
    state "Update FSM State" as UpdateState
    InputPrompt --> GetProb: "model.generate"
    GetProb --> ListTokens: Get next-token distribution
    ListTokens --> SampleToken: Use filtered token list
    SampleToken --> UpdateState: Selected token X
    UpdateState --> [*]: new_state = index[0]["X"]
```

```python
# decode token ids so that transitions are keyed by strings rather than ids
idx_with_tokens = {
    state: {tokenizer.tokenizer.decode([key]): value for key, value in transitions.items()}
    for state, transitions in idx.items()
}
```

> [!note]- example
>
> ```mermaid
> stateDiagram-v2
>     direction LR
>     0 --> 2: n
>     0 --> 1: t
>     1 --> 2: a
>     2 --> 4: na
>     2 --> 3: a
>     3 --> 5: am
>     4 --> 6: me
>     5 --> 6: me
>     2 --> 6: name
>     6 --> 7: e
>     6 --> 8: c
>     7 --> 9: p
>     8 --> 9: p
>     9 --> 11: Paul
>     9 --> 12: Pa
>     9 --> 10: Jo
>     11 --> 13: aul
>     12 --> 14: ul
>     10 --> 26: o
>     26 --> 27: h
>     27 --> 14: n
>     13 --> 14: l
>     14 --> 16: s
>     14 --> 15: s
>     15 --> 17: s
>     16 --> 17: s
>     17 --> 18: a
>     17 --> 19: ag
>     18 --> 20: ge
>     19 --> 20: e
>     20 --> 21: 30
>     20 --> 22: 20
>     21 --> 24: 2
>     22 --> 24: 2
>     22 --> 23: 3
>     24 --> 25: 0
>     25 --> [*]
> ```

_note:_ each state of the FSM represents a forward pass of the LM. In vanilla generation these passes are necessary anyway, so there is no added overhead from the FSM for controlling the generated outputs.

From states 2–6, we observe that there are eight different paths that produce the same generation `name`. We don’t need to explore all of them, given that they all yield the result `name`.

But suffice to say, we can hijack this behaviour to accelerate generations by appending any of the following token sequences for the **word** to the currently generated sequence:

- \[”name”]
- \[”n”, “a”, “m”, “e”]
- \[”na”, “m”, “e”]
- \[”nam”, “e”]
- \[”n”, “am”, “e”]
- \[”n”, “ame”]
- \[”na”, “me”]
- \[”n”, “a”, “me”]

A simplified index can be shown as:

```python
simplified_index = {
    0: {'{"': 2},
    2: {"name": 6},
    6: {'":"': 9},
    9: {'Paul': 14, 'John': 14},
    14: {'","': 17},
    17: {'age': 20},
    20: {'":': 22},
    22: {'20': 24, '30': 24},
    24: {'}': 25},
}
```

That’s at least a 5x speedup over vanilla structured generation, given that of the 9 transitions in this index, only two states (9 and 22) offer a real choice. Therefore we only need to call the model twice!!
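A sketch of how jump-forward decoding exploits this (a hypothetical helper around the `simplified_index` above): whenever the current state has exactly one outgoing transition, append the token without a forward pass, and only call the model where the FSM genuinely branches.

```python
def generate_with_jumps(model_step, index: dict, state: int = 0):
    """model_step(state) -> chosen token; only called when there is a real choice."""
    out, calls = [], 0
    while state in index:
        transitions = index[state]
        if len(transitions) == 1:        # deterministic: jump forward for free
            token, state = next(iter(transitions.items()))
        else:                            # branching: spend a forward pass
            token = model_step(state)
            state = transitions[token]
            calls += 1
        out.append(token)
    return "".join(out), calls

# only states 9 and 22 branch, so the "model" is consulted exactly twice
text, calls = generate_with_jumps(lambda s: "Paul" if s == 9 else "30", simplified_index)
print(text, calls)  # {"name":"Paul","age":30} 2
```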
> [!tip]- difference in sampling distribution
>
> All these paths lead to the same string and the same speedup; however, they lead to potentially very different states for the LLM when it reaches state 6. That is, the strings are the same, but each path leads to a different conditional probability distribution in state 6.
>
> ![](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/images/vllm/json-difference-in-sampling-distribution.webp)

### Guided generations with FSM.

([Willard & Louf, 2023](#bib-willard2023efficientguidedgenerationlarge)), implemented at

_assumption: we are building against [autoregressive transformers models](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/Autoregressive-models)_

- Let $\mathcal{F} \subset \mathcal{P}(\mathcal{V})$, where $\mathcal{P}$ is the power set operator, be the subset of multi-token strings that end with the token $\text{EOS} \in \mathcal{V}$.
- The text generation task is to draw samples from $\mathcal{F}$

Notable sampling methods include greedy decoding (generating tokens recursively by taking the highest-probability token) and beam search (using a heuristic to find the mode of the distribution) [^smc]

Pseudocode for the sampling procedure is as follows:

```pseudo
\begin{algorithm}
\caption{LLM token sampling}
\begin{algorithmic}
\Function{sample}{$L$}
\State $s \gets ()$
\For{$i \gets 1, L$}
\State $\alpha \gets \text{LM}(s, \theta)$
\State Sample $s' \sim \text{Categorical}(\alpha)$
\If{$s' = \text{EOS}$}
\State \textbf{break}
\EndIf
\State $s \gets \text{append}(s, s')$
\EndFor
\State \Return $s$
\EndFunction
\end{algorithmic}
\end{algorithm}
```

Given that we are dealing with a finite discrete distribution, we can then compute an un-normalized conditional distribution by applying a boolean mask $m: \mathcal{P}(\mathcal{V}) \to \{0,1\}^N$, which restricts the support of the original distribution:

$$
\begin{aligned}
\alpha &= \text{LM}(\tilde{S_t}, \theta) \\
\tilde{\alpha} &= m(\tilde{S_t}) \odot \alpha \\
\tilde{s}_{t+1} &\sim \text{Categorical}(\tilde{\alpha})
\end{aligned}
$$

> [!math] augmentation upon sampling algorithm
>
> ```pseudo
> \begin{algorithm}
> \caption{token sampling with masking}
> \begin{algorithmic}
> \Function{sample}{$L$}
> \State $s \gets ()$
> \For{$i \gets 1, L$}
> \State $\alpha \gets \text{LM}(s, \theta)$
> \State Construct the mask m($s$)
> \State $\tilde{\alpha} \gets m \odot \alpha$
> \State Sample $\tilde{s} \sim \text{Categorical}(\tilde{\alpha})$
> \If{$\tilde{s} = \text{EOS}$}
> \State \textbf{break}
> \EndIf
> \State $s \gets \text{append}(s, \tilde{s})$
> \EndFor
> \State \Return $s$
> \EndFunction
> \end{algorithmic}
> \end{algorithm}
> ```

> [!tip] finite automaton
>
> We define a _finite-state machine_, given by $(Q, \Sigma, \delta, q_0, F)$ [^automaton-definition], where the characters comprising the strings in $\mathcal{V}$ are drawn from $\Sigma$, i.e: $\mathcal{V} \in \mathcal{P}(\Sigma)$
>
> ![](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/images/vllm/fsm-iterative-generations.webp)

We define finding the sub-sequences of the FSM $M$ that accept the string $v$ as follows:

```pseudo
\begin{algorithm}
\caption{Find sub-sequences of the FSM $M$ that accept the string $v$}
\begin{algorithmic}
\Function{FindSubSequences}{$M, v$}
\State $M = (Q, \Sigma, \delta, q_0, F)$
\State $\texttt{res} \gets ()$
\For{$r \in \delta^{-1}(\cdot, v_0)$}
\Comment{$\text{ Loop through states that read } v_0$}
\State $p \gets (r)$
\For{$i \gets 1, |v| - 1$}
\Comment{$\text{ Walk the FSM}$}
\If{$\delta(r, v_i) = \emptyset$}
\Comment{$\text{ The FSM does not read } v_i$}
\State $p \gets ()$
\State \textbf{break}
\Comment{$\text{ Stop walking and try the next start state}$}
\EndIf
\State $r \gets \delta(r, v_i)$
\State $p \gets \text{append}(p, r)$
\EndFor
\State $\texttt{res} \gets \text{append}(\texttt{res}, p)$
\EndFor
\State \Return $\texttt{res}$
\EndFunction
\end{algorithmic}
\end{algorithm}
```

We can then define the construction of $\sigma$:

```pseudo
\begin{algorithm}
\caption{Construct a map from FSM states to subsets of $\mathcal{V}$}
\begin{algorithmic}
\Function{MapStatesToVocab}{$M, \mathcal{V}$}
\State $M = (Q, \Sigma, \delta, q_0, F)$
\State Initialize the map $\sigma$ with empty sets for each element in $Q$
\For{$v \in \mathcal{V}$}
\Comment{$\text{Loop through the vocabulary}$}
\State $Z \gets \text{find\_sub\_sequences}(M, v)$
\For{$z \in Z$}
\Comment{$\text{Loop through state sequences accepting } v$}
\State $\sigma(z_0) \gets \sigma(z_0) \cup v$
\EndFor
\EndFor
\State \Return $\sigma$
\EndFunction
\end{algorithmic}
\end{algorithm}
```

## References

- Lew, A. K., Zhi-Xuan, T., Grand, G., & Mansinghka, V. K. (2023). _Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs_. arXiv preprint arXiv:2306.03081 [arxiv](https://arxiv.org/abs/2306.03081)
- Willard, B. T., & Louf, R. (2023). _Efficient Guided Generation for Large Language Models_. arXiv preprint arXiv:2307.09702 [arxiv](https://arxiv.org/abs/2307.09702)
- Zheng, L., Yin, L., Xie, Z., Sun, C., Huang, J., Yu, C. H., Cao, S., Kozyrakis, C., Stoica, I., Gonzalez, J. E., Barrett, C., & Sheng, Y. (2024). _SGLang: Efficient Execution of Structured Language Model Programs_. arXiv preprint arXiv:2312.07104 [arxiv](https://arxiv.org/abs/2312.07104)

[^performance]:
    Benchmark script can be found at [vllm-project/vllm#10046](https://github.com/vllm-project/vllm/pull/10046). Current RFC [vllm-project/vllm#5423](https://github.com/vllm-project/vllm/issues/5423)

    Note that `lm-format-enforcer` failed to compile the test schema.

[^samplingpr]: [vllm-project/vllm#6273](https://github.com/vllm-project/vllm/pull/6273) proposed a sampling controller interface, but [**@cadedaniel**](https://github.com/cadedaniel) shares some [concerns](https://github.com/vllm-project/vllm/pull/6273#issuecomment-2243654991) wrt fast-forward tokens

[^coalescence]: this phenomenon is also known as [coalescence](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/constrained-decoding#coalescence) in structured generation, where we exploit deterministic structure in the desired outputs to skip expensive forward passes

[^smc]: ([Lew et al., 2023](#bib-lew2023sequentialmontecarlosteering)) recently proposed sequential [Monte Carlo steering](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/Monte-Carlo). The idea is to cast causal generation as a _posterior inference_ problem in a class of discrete probabilistic sequence models. See also [Feynman-Kac transformers models](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/Transformers#feynman-kac)

[^automaton-definition]:
    [finite state machine](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/university/twenty-three-twenty-four/sfwr-2fa3/DFA)

    - $Q$ is a finite set of states
    - $\Sigma$ is a finite alphabet
    - $\delta: Q \times \Sigma \to Q$ is the transition function
    - $q_0 \in Q$ is the start state
    - $F \subseteq Q$ is the set of all accepted states.
---
slug: thoughts/cryptography
tags:
  - technical
description: resconstructed source of "https://aarnphm.xyz/thoughts/cryptography"
title: cryptography
date: 2024-02-08
---

### functions.

See also [Merkle DAG](https://aarnphm.xyz/thoughts/cryptography/../../thoughts/Merkle-DAG)

---
slug: thoughts/cyanotype
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/cyanotype"
title: cyanotype
date: 2024-10-03
---

> slow-reacting, economical photographic printing formulation

In the context of writing, similar to telescopic writing.

---
slug: thoughts/data
tags:
  - seed
  - pattern
description: resconstructed source of "https://aarnphm.xyz/thoughts/data"
title: data
date: 2024-02-07
---

Representation of information in a formalised manner suitable for communication, interpretation, or processing by humans or by automatic means. ⇒ semanticity

Logistic regression:

$$
\frac{1}{1 + e^{-(x - \mu)/s}}
$$

- schema + relational.

## theory

See also [database](https://aarnphm.xyz/thoughts/data/../../tags/sfwr3db3)

## types.

nominal data

- qualitative data
- mutually exclusive
- cannot be ranked
- $= \neq \in \notin$

ordinal data

- represents categories
- $= \neq \in \notin > <$

time-series data (interval)

- no true zero
- $= \neq > < + -$

ratio data

- $= \neq > < + - \times \%$

## dimensionality

---
slug: thoughts/deep-learning
tags:
  - ml
  - framework
description: resconstructed source of "https://aarnphm.xyz/thoughts/deep-learning"
title: deep learning
date: 2024-01-11
---

See also: [The Little Book of Deep Learning](https://aarnphm.xyz/thoughts/deep-learning/../../books#2024) ([pdf](https://fleuret.org/public/lbdl.pdf) or [lectures](https://fleuret.org/dlc/)) or this [lecture series at CMU](https://dlsyscourse.org/lectures/)

- [PyTorch](https://aarnphm.xyz/thoughts/deep-learning/../../thoughts/PyTorch)
- [Jax](https://aarnphm.xyz/thoughts/deep-learning/../../thoughts/Jax): from the [autograd](https://github.com/HIPS/autograd) project, by pretty much the same core team

---
slug: thoughts/design
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/design"
title: design
date: 2024-03-09
---

### what?

---
slug: thoughts/desire
tags:
  - philosophy
description: resconstructed source of "https://aarnphm.xyz/thoughts/desire"
title: Desire
date: 2024-02-08
---

---
slug: thoughts/dialectics
tags:
  - philosophy
description: resconstructed source of "https://aarnphm.xyz/thoughts/dialectics"
title: dialectics
date: 2024-02-07
---

often involves some sort of contradiction between opposing sides.

### [Hegel](https://aarnphm.xyz/thoughts/dialectics/../../thoughts/Hegel)’s dialectics

The opposing sides are dependent on the topics being discussed. In [Phenomenology of Spirit](https://aarnphm.xyz/thoughts/dialectics/../../thoughts/Hegel#phenomenology-of-spirit), which presents his [epistemology](https://aarnphm.xyz/thoughts/dialectics/../../thoughts/Epistemology), the “opposing sides” are different definitions of consciousness and of the object that consciousness is aware of or claims to know. In his work in [logic](https://aarnphm.xyz/thoughts/dialectics/../../thoughts/logic), the opposing sides are logical concepts that are opposed to one another.

---
slug: thoughts/displacement
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/displacement"
title: displacement
date: 2024-01-08
---

Often explored through Graham Greene’s works.
---
slug: thoughts/distraction
tags:
  - seed
  - philosophy
description: resconstructed source of "https://aarnphm.xyz/thoughts/distraction"
title: distraction
date: 2024-03-18
---

> i think we forget that a core part of the human experience is to create and be creative\
> \
> we create all the time, even in the most mundane ways\
> \
> but creativity isnt about the act of \*producing\* something - its about reaching a state of awareness that allows you to filter out the… [pic.twitter.com/A7PML2zLtt](https://t.co/A7PML2zLtt)
>
> — harpriya (@harpriiya) [18 March 2024](https://twitter.com/harpriiya/status/1769532246674022407)

---
slug: thoughts/education
tags:
  - pattern
description: resconstructed source of "https://aarnphm.xyz/thoughts/education"
title: education
date: 2024-02-07
---

See more on [the extension](https://aarnphm.xyz/thoughts/education/../../posts/education)

## system

The current[^1] education system is not designed to inspire curiosity, and it will keep being this way unless the definition of success and quality moves to more holistic measures.

University should be a place for you to think, not to always be right. It should encourage a form of [intellectual playfulness](https://aarnphm.xyz/thoughts/education/../../thoughts/play) and [agency](https://aarnphm.xyz/thoughts/education/../../thoughts/Agency) to explore.

## teaching

I do think that professors should use more primary sources, less secondary. Secondary sources curate and compress the amount of information being given. Compression can lead to [confirmation bias](https://aarnphm.xyz/thoughts/education/../../thoughts/confirmation-bias), but saturation of information also overloads the students.

### shortification/tiktok-fication of information

> \# on shortification of "learning"\
> \
> There are a lot of videos on YouTube/TikTok etc. that give the appearance of education, but if you look closely they are really just entertainment. This is very convenient for everyone involved : the people watching enjoy thinking they are…
>
> — Andrej Karpathy (@karpathy) [10 February 2024](https://twitter.com/karpathy/status/1756380066580455557)

The idea of learning is that it is supposed to be mentally challenging, not fun and easy. In the process of shortification, we are losing the depth of information, as does any form of [compression](https://aarnphm.xyz/thoughts/education/../../thoughts/Compression). Similar to how [LLMs](https://aarnphm.xyz/thoughts/education/../../thoughts/LLMs) are being trained today.

> Learning is not supposed to be fun. It doesn’t have to be actively not fun either, but the primary feeling should be that of effort. It should look a lot less like that “10 minute full body” workout from your local digital media creator and a lot more like a serious session at the gym. You want the mental equivalent of sweating. It’s not that the quickie doesn’t do anything, it’s just that it is wildly suboptimal if you actually care to learn.

The process of learning should be enduring, but rewarding. It should be a process of internalizing concepts and practicing thinking coherently, similar to how we [write](https://aarnphm.xyz/thoughts/education/../../thoughts/writing).
### [Constructionist](https://aarnphm.xyz/thoughts/education/../../thoughts/Constructionist) critique[^2]:

Too many tools and too much space: large spaces should start small and widen, rather than having everything readily available

[^1]: [WEF on relevance of education system](https://www.weforum.org/agenda/2020/04/our-education-system-is-losing-relevance-heres-how-to-update-it/), written on April 13, 2020

[^2]: See [here](https://saskschoolboards.ca/wp-content/uploads/97-07.htm#:~:text=Constructivist%20teaching%20is%20based%20on,rather%20than%20passively%20receiving%20information.)

---
slug: thoughts/effective-procedure
tags:
  - math
description: resconstructed source of "https://aarnphm.xyz/thoughts/effective-procedure"
title: effective procedure
date: 2024-10-08
---

In [logic](https://aarnphm.xyz/thoughts/effective-procedure/../../thoughts/logic), an effective procedure is a procedure for solving a problem from a specific class by any intuitively ‘effective’ means.

## formation rules for propositional calculus (wff: well-formed formula)

$$
\begin{aligned}
\text{FR1} &. \text{ A variable standing alone is a wff} \\
\text{FR2} &. \text{ If } \alpha \text{ is a wff, so is } \neg \alpha \\
\text{FR3} &. \text{ If } \alpha \text{ and } \beta \text{ are wffs, then } (\alpha \cdot \beta ), (\alpha \space \beta), (\alpha \vee \beta ), (\alpha \supset \beta), \text{ and } (\alpha \equiv \beta) \text{ are wffs}
\end{aligned}
$$

---
slug: thoughts/emergent-behaviour
tags:
  - seed
  - psychology
description: resconstructed source of "https://aarnphm.xyz/thoughts/emergent-behaviour"
title: emergent behaviour
date: 2024-02-07
---

> When a complex entity exhibits properties, or behaviours that its parts do not have on their own. Or how complex properties can emerge from simple rules.

We observe this in:

- [LLMs](https://aarnphm.xyz/thoughts/emergent-behaviour/../../thoughts/LLMs), speculations at most
- Ant colonies
- mold simulations

In the context of a single agent within multi-agent systems, is it due to the rules themselves ([reductionist](https://aarnphm.xyz/thoughts/emergent-behaviour/../../thoughts/reductionism)) or are additional factors involved here?

---
slug: thoughts/ethics
tags:
  - philosophy
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/ethics"
title: ethics
date: 2024-03-05
---

Closely connected to [value](https://aarnphm.xyz/thoughts/ethics/../../thoughts/Value) theory, or [moral](https://aarnphm.xyz/thoughts/ethics/../../thoughts/moral) philosophy.

[Kantian](https://aarnphm.xyz/thoughts/ethics/../../thoughts/Philosophy-and-Kant) ethics presupposes that there is a universal moral law that applies to all rational beings, his deontological ethics framework based on the “categorical imperative”. This is different from [Mill’s utilitarianism](https://aarnphm.xyz/thoughts/ethics/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill), who argued actions are right insofar as they promote happiness and wrong insofar as they produce the _reverse_ of happiness.

[Nietzsche](https://aarnphm.xyz/thoughts/ethics/../../thoughts/Philosophy-and-Nietzsche) critiqued conventional moral theories, and argued for a reevaluation of [value](https://aarnphm.xyz/thoughts/ethics/../../thoughts/Value). He believed that traditional morality stifled the full potential of human excellence, seen through BGE or “On the Genealogy of Morals”.

Ethics arguments are based on the principles of “good” versus “evil”. What is defined as “good” and “evil”?
Is a human whose ideology falls outside of the [Overton Window](https://aarnphm.xyz/thoughts/ethics/../../thoughts/Overton-Window) considered “evil”? That’s why it’s important to understand our [alignment](https://aarnphm.xyz/thoughts/ethics/../../thoughts/Alignment) through anthropological work, so that we don’t repeat history.

## normative.

### consequentialism

- Utilitarianism

See also [John Stuart Mill](https://aarnphm.xyz/thoughts/ethics/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill)

Locke: Action is acceptable if it respects the human rights of everyone involved

- Common good

### deontology

Duty ethics

### virtue

### care

## meta-ethics.

---
slug: thoughts/fashion
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/fashion"
title: fashion.
date: 2024-02-19
---

Fashion is rather a hobby than a need (prob. why [expenses](https://aarnphm.xyz/thoughts/fashion/../../thoughts/Expenses) are high, but worth it). It is an art, a form of self-expression and self-care, a way for one to assert themselves to the world.

> My mantra: “Quality over quantity.” Get a few good pieces that will last you a long time.

I do follow trends, and fashion shows. A mix of smart casual and streetwear is my comfort zone. Keep it simple and minimalistic. **Less is always more**.

### gender dynamics.

See also: [Fashion and the Homospectatorial Look](https://aarnphm.xyz/thoughts/fashion/../../thoughts/papers/Fashion-and-the-Homospectatorial-Look.pdf) and [this video](https://www.youtube.com/watch?v=DA2PqBAyGqI\&t=454s\&ab_channel=oliSUNvia)

Fuss’s arguments suggest that contemporary fashion photography does not simply cater to a heterosexual male gaze but also tacitly produces a gaze that, while regulating homosexual desire, provides opportunities for its expression. She argues that fashion photography often presents women in a manner that is eroticized, which can be seen as catering to a male gaze. However, this same eroticization can also appeal to women, creating a homospectatorial look where women are viewing other women through a lens that is both homoerotic and commodified.

Plays well into spaces that open up for more nuanced and subtle expression of [desire](https://aarnphm.xyz/thoughts/fashion/../../thoughts/desire) and [identity](https://aarnphm.xyz/thoughts/fashion/../../thoughts/identity). [Quiet luxury](https://aarnphm.xyz/thoughts/fashion/../../thoughts/fashion#quiet-luxury)’s emphasis on minimalism and subtle textures mirrors the homospectatorial look.

### why fashion shows matter, and why they don’t.

Similar to math and AI conferences, it is a place for people to show off their work and get inspired. Legendary designers, the likes of Ralph Lauren and Dean and Dan Caten, have over the years pretty much shaped and influenced how our jeans and casual style look. Just look down at the pair of jeans, or slim-fit trousers you wear. Some of the crests or folded textures are probably inspired by one of these designers.

It matters in the sense that it drives the industry, but it also doesn’t matter because you can pretty much get the look or highlights of what is trendy from the internet, or social media, via “shortification” of videos and information. I do care about fashion shows simply because I appreciate the art and the work that goes into it.

### pretentious.

Extensions from the book _Pretentiousness: Why It Matters_ by Dan Fox.
> Pretentiousness is for anyone who has braved being different, whether that’s making a stand against artistic consensus or running the gauntlet of the last bus home dressed differently from everyone else

> Calling a person pretentious can be a way of calling out the trappings and absurdities of power. It’s a way of undermining the authority that they have positioned themselves with.

Fashion often gets associated with pretentiousness, as it presents a signal of wealth. Simply by its outlandish message, people often get the impression that those who wear these logo-mania brands have a lot of capital in their possession, _which is usually the case_. In reality it is the middle-class demographic who are purchasing these products, who can indulge themselves without compromising their financial well-being.

One shouldn’t be judged by the clothes they wear, but rather by the character they possess. If you feel good in a designer dress, or you worked hard for it, then by all means you should enjoy it to the fullest. However, it often signals flex culture, or is read as new money. It is also a matter of [taste](https://aarnphm.xyz/thoughts/fashion/../../thoughts/taste), and [identity](https://aarnphm.xyz/thoughts/fashion/../../thoughts/identity).

### quiet luxury.

> you don't actually love "quiet luxury." you're in love with the idea of inherited wealth, spacious uncluttered homes, and enough free time to pursue your hobbies [pic.twitter.com/OwSn6wWxVP](https://t.co/OwSn6wWxVP)
>
> — derek guy (@dieworkwear) [16 April 2023](https://twitter.com/dieworkwear/status/1647662031619895296)

Not a huge fan of fast fashion; I’d rather spend a bit more on a good pair of jeans that will last me years, than a cheap pair that will last me a few months.

> Few exceptions include Uniqlo, Muji, but I wouldn’t consider them fast fashion, because they are actually high-quality products 😄

Don’t buy into maximalist brands. Overpriced, and the churn rate is high.

> Few exceptions: Tom Ford, Maison Margiela, Saint Laurent

---

> Go for quiet luxury, aka timeless pieces

The following are a few of my favourite brands, in no particular order:

| Brand                                   | Genre of clothing to get                                                       |
| --------------------------------------- | ------------------------------------------------------------------------------ |
| Brunello Cucinelli                      | Cashmere, wool, and linen 🤌 😗                                                 |
| Manière De Voir                         | Probably the best black fit tee I’ve ever worn.                                |
| Oak + Fort                              | Minimalistic, clean, and simple. Also not to be confused with _Frank and Oak_  |
| Studio Nicholson                        | Trousers and shirts, ong their leroy jacky are amazing.                        |
| COS                                     | Basics and essentials.                                                         |
| [Stoffa](https://stoffa.co/pages/store) | Custom made and tailored; wanna know how to style.                             |
| Sefr                                    | Pretty niche, but some of their beige lazaro shirts are nice.                  |
| Ted Baker                               | Holy crap that’s half of my closet. Trousers, shirts, suits, cargo, etc.       |
| Ralph Lauren Polo                       | Them trench coats are nice, daily driver during winter szn.                    |
| Mansur Gavriel                          | Their bags are my fav.                                                         |
| Olend Backpacks                         | For the love of backpacks.                                                     |
| Bellroy                                 | Tote, durable, flexible.                                                       |
| Loro Piana                              | If you can afford it, go for it                                                |
| Club Monaco                             | Trench coats, overcoats, too based.                                            |
| Brooks Brothers                         | Suits on special occasions.                                                    |
| Arc'teryx                               | Technical wear, performance, gear is awesome.                                  |
| Timberland                              | Utility, quality, style and worth the money.                                   |
| Banana Republic                         | Got their cashmere and merino wool sweaters. They are good.                    |
| Abercrombie & Fitch                     | Baggy jeans, flannels, comfy wear.                                             |
| Massimo Dutti                           | Their leather jackets are nice.                                                |
| Sézane                                  | Their blouses are nice.                                                        |

---
slug: thoughts/friendship
tags:
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/friendship"
title: friendship
date: 2024-06-22
---

> Heartbreaks are also what you make of them. Relationships teach you how to gently treat someone as one of your own; they also bash and crush your heart, as if the world is crumbling in front of your eyes. But it is okay; relationships are what we deem worthy of giving meaning to the absurdity of life.

## of pleasure.

or utility?

> the issue with a majority of San Francisco's culture of authentic relating, circling, cuddle parties, "deep convo" events, is that it’s intimacy without relationship. Closeness without friendship.\
> \
> In other words, porn. It may feel good when you’re doing it, but empty…
>
> — Patricia Mou (@patriciamou\_) [16 February 2024](https://twitter.com/patriciamou_/status/1758354933521478126)

> [!question]- How does one weave that delicate fabric of trust, when so many are ensnared in their own lives?
>
> To cultivate trust, we must first turn inward, nurturing the quiet confidence of self-trust before extending our hands to others. Yet, amidst the hustle and haste, how do we anchor our souls in authenticity and leave room for empathy? Can we create a sanctuary for trust to blossom, even in the most turbulent of seas?

Trust requires vulnerability - the courage to show up as our authentic selves, to share our hopes and fears without guarantee. When we have faith in each other’s core intentions, even as we stumble and err, we weave a web of trust that can withstand the tempests of misunderstanding.

> … because old friends may feel like strangers once substantial time has passed. Consistent with this possibility, several of the barriers that participants endorsed when thinking about reaching out to old friends are similar to the barriers that make people reluctant to talk to strangers.

_People are surprisingly hesitant to reach out to old friends_ - [Communication Psychology](https://www.nature.com/articles/s44271-024-00075-8)

Reaching out feels… weird. In this age of tallying, being genuine would be the ONLY metric that matters in maintaining relationships.

But, can love so alloyed be counted as love at all? If each act of kindness conceals a grasping need, each smile a silent plea, then perhaps the better part of friendship has been lost. We trade in a base currency, a barter of pleasure and utility. The sacred alchemy of selfless affection feels beyond our ken.

I yearn for something higher, a way of relating that does not reduce us to mere instruments of the other’s satisfaction. But the way is unclear, the path grown over from neglect.

Perhaps most importantly, we must learn to give without strings, without ledger. It is ok to reach out to old friends, or go on that walk every Saturday to meet strangers. To offer our time, our energy, our care - not as a loan to be repaid, but as a gift freely bestowed. To delight in the joy of the other without thought to our own gain.

None of this is to lay blame on anyone. We are all, to some degree, caught up in this dance, of pursuing something greater than ourselves. But even in such a world, we need not resign ourselves to a life of bartered affections.

Still, I hold out hope that true friendship may yet be possible - a meeting of souls, freely given, that seeks the good of the other for their own sweet sake.
Even if the world declares it folly, still I dream of a love uncorrupted.

## of mutual caring.

> To care about something is generally to find it worthwhile or valuable in some way; caring about one’s friend is no exception. A central difference among the various accounts of mutual caring is the way in which these accounts understand the kind of evaluation implicit therein - [SEP](https://plato.stanford.edu/entries/friendship/)

However, people are more afraid of commitment, and they are even more afraid of being hurt. Who wouldn’t? We have seen too much, borne witness to the cruelties that humans can inflict upon one another. It’s no wonder that hearts grow hesitant, that souls recoil from the prospect of vulnerability. And so we retreat, donning armor forged from fear, shielding ourselves from this tumultuous life. But in our haste to protect ourselves, do we not also rob ourselves of life’s greatest joys?

In the end, perhaps the greatest fear is not of commitment or solitude, of judgment or pain. Perhaps what we truly fear is the glorious, terrifying possibility of being seen, of being known in all our imperfect beauty. For to be truly seen is to be vulnerable, and to relinquish control of oneself. And friends will be the ones who are there for you along the way.

--- slug: thoughts/game-theory tags: - seed description: reconstructed source of "https://aarnphm.xyz/thoughts/game-theory" title: game theory date: 2024-04-12 ---

The field emerged when John von Neumann published the paper _On the Theory of Games of Strategy_; von Neumann’s original proof used Brouwer’s fixed-point theorem on continuous mappings into compact convex sets.

--- slug: thoughts/git tags: - technical description: reconstructed source of "https://aarnphm.xyz/thoughts/git" title: git date: 2024-02-08 ---

That one tool that every developer uses, but no one really understands.

See also [The Git Parable](https://tom.preston-werner.com/2009/05/19/the-git-parable) by Tom Preston-Werner.

# internals

--- slug: thoughts/human-interaction tags: - seed description: reconstructed source of "https://aarnphm.xyz/thoughts/human-interaction" title: human interaction date: 2024-02-06 ---

--- slug: thoughts/identity tags: - philosophy description: reconstructed source of "https://aarnphm.xyz/thoughts/identity" title: identity date: 2024-02-19 ---

### [Freud](https://aarnphm.xyz/thoughts/identity/../../thoughts/Freud)

--- slug: thoughts/index tags: - evergreen - fruit description: reconstructed source of "https://aarnphm.xyz/thoughts/index" title: thoughts date: 2024-01-09 ---

A collection of scattered thoughts, ideas, and concepts that I entertain quite a lot.
Here are some of my favourite [posts](https://aarnphm.xyz/thoughts/index/../../posts/) of [writing](https://aarnphm.xyz/thoughts/index/../../thoughts/writing)

--- slug: thoughts/information-retrieval tags: - seed description: reconstructed source of "https://aarnphm.xyz/thoughts/information-retrieval" title: information retrieval date: 2024-02-07 ---

--- slug: thoughts/intelligence tags: - seed description: reconstructed source of "https://aarnphm.xyz/thoughts/intelligence" title: Intelligence date: 2024-02-07 ---

Lecture from [Hinton](https://www.youtube.com/watch?v=rGgGOccMEiY\&ab_channel=CSERCambridge)

## neuroscience

--- slug: thoughts/joininteract tags: - seed - application description: reconstructed source of "https://aarnphm.xyz/thoughts/joininteract" title: interact cohort 2024 date: 2024-08-23 ---

> [!question] Someone gives you 50,000 dollars for a project that explicitly can’t be a business. What’s the project you work on and why?

I want to host [dinners](https://aarnphm.xyz/thoughts/joininteract/../../thoughts/atelier-with-friends) centered around intimacy and cultural curation in different cities. The project would involve hosting a series of 4-course meals for small groups, with each event celebrating the local cuisine and culture of its location. I also want to handcraft unique ceramic dishes for each course, adding my personal touch to the experience. Traveling to various cities would also allow me to explore regional ingredients, cooking techniques, and food traditions. I would document this culinary journey on a website featuring photos, recipes, and behind-the-scenes content from each dinner.

At its core, this project stems from my love for people and my belief that cooking is a profound way to show care and strengthen human connection. In an age where superficial aspects of life often dominate, I cherish the authentic stories and bonds that can form when we gather around a shared meal.

> [!question] What’s something you accomplished or created in the last year that you’re proud of?

In the last year, I learned the quiet courage of loving oneself, through hosting dinners, attuning to my inner child, and letting go. It was a hard-fought lesson, wrested from countless small surrenders to the ache and beauty that comprise this mortal coil.

There were days mired in melancholy when I yearned to be someone, anyone else. To slip out of my own skin and leave behind the burdens I carried. But slowly, tentatively, I began to make peace with the face in the mirror - both foreign and familiar, an ally and adversary. I came to understand that even in the midst of pain, pinpricks of light could be found if one only remembered to look.

It takes a peculiar kind of bravery to embrace the fullness of who you are, scars and all. To grant yourself grace on the days when you have nothing to give. I am still learning the art of it - how to meet my own gaze without flinching, how to be gentle with the wounded parts of my soul. But I am proud of how far I have come. Of the hard-won compassion I now extend to myself in moments of frailty and despair.

There is a hushed valor in choosing to love the unlovely parts of your own being. In that quietude between the shadows and the light, I am beginning to discover the makings of peace. Some days, that is enough. Some days, it is everything.

> [!question] Elaine Scarry: “Beauty comes out to greet us and prepares us for the other undertakings, for finding out about truth and committing ourselves to justice.” Agree or disagree?
Elaine Scarry’s defence of [beauty](https://aarnphm.xyz/thoughts/joininteract/../../thoughts/papers/On-Beauty-and-Being-Just.pdf) against moral condemnation offers a compelling perspective on its role in our pursuit of higher truths and values. She argues that beauty’s immediate allure and clarity serve as an entry point to deeper understanding and ethical contemplation.

The “clear discernibility” of beauty, according to Scarry, introduces us to states of certainty and conviction while simultaneously highlighting our capacity for error. This paradox encourages a nuanced approach to perception and judgment. Beauty’s power to induce “radical decentering” frees our minds from self-preoccupation, allowing us to better perceive the complexities of the world and the subtleties of truth. Scarry metaphorically describes beautiful things as “ladders reaching toward the beauty of the world,” suggesting that aesthetic experiences can elevate our consciousness and attune us to broader concepts of goodness and justice.

[Kant](https://aarnphm.xyz/thoughts/joininteract/../../thoughts/Philosophy-and-Kant), in his Critique of Judgment, proposed that beauty can be seen as a symbol of the good. However, he cautioned that such an analogy should be approached with an awareness of the aspects in which beauty and goodness differ, as well as the aspects in which they reveal similarities.

Ultimately, [beauty](https://aarnphm.xyz/thoughts/joininteract/../../thoughts/beauty) serves as a guide, albeit one shaped by cultural biases and power structures. While it can point us in fruitful directions and attune us to what is worthwhile, to truly arrive at truth, we must critically examine our notions of beauty and remain open to challenging our tastes.

> [!question] What qualities or skills best characterize the way you discover and solve problems?

My approach to discovering and solving problems is characterized by an intense curiosity and a willingness to dive deep into complex issues. I often cultivate a digital garden, a space where I voraciously explore tangentially related concepts and perspectives. While technology grants me access to a wealth of information and resources, it also exposes me to a world of uncertainty and ambiguity.

Embracing the [Socratic](https://aarnphm.xyz/thoughts/joininteract/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates) paradox, I remain acutely aware of the limitations of my knowledge. This [epistemic](https://aarnphm.xyz/thoughts/joininteract/../../Epistemology) humility is not just a philosophical stance but a practical necessity for solving complex problems. It requires an openness to being wrong and a willingness to engage in trial-and-error experimentation.

Underpinning this approach is a fundamental belief in [agency](https://aarnphm.xyz/thoughts/joininteract/../../Agency) and self-efficacy. Kant’s exhortation “Sapere aude” (dare to know) from his essay “Answering the Question: What is Enlightenment” resonates deeply with me. It encourages a program of intellectual self-liberation through reason, a path I strive to follow. I believe that with sufficient determination, ingenuity, and grit, we can achieve remarkable things. In this context, neuroticism, often viewed as a liability, becomes a gift when transmuted into dogged persistence. Combined with the humility to recognize the scope of ignorance, it propels a restless journey of discovery, chasing the light of knowledge into the unknown.
Through this alchemical process, vices are transformed into virtues in the relentless pursuit of truth.

> [!question] Who (between 18 and 23) would you be the most excited to find out was in your Fellowship class? Why?

* I would be thrilled to meet individuals who are bridging the understanding gap between foundational models and humans through innovative interfaces and interactions. Language models perceive the world differently than we do, and developing rich interfaces to connect these distinct worldviews could lead to profound insights and a deeper understanding of our world. While techniques like prompting and dimensionality reduction offer glimpses into the possibilities, current interactions remain static. We have yet to experience a true extension of self through these models, as I firmly believe they are magical beings. Solving this understanding gap would enhance our journey to refine our taste and attune to what is truly meaningful.

--- slug: thoughts/large-models tags: - ml description: reconstructed source of "https://aarnphm.xyz/thoughts/large-models" title: Foundational large models date: 2024-01-08 ---

Popularized through [LLMs](https://aarnphm.xyz/thoughts/large-models/../../thoughts/LLMs) and the [GPT-3 paper](https://arxiv.org/abs/2005.14165). See also: §7.1 of [The Little Book of Deep Learning](https://aarnphm.xyz/thoughts/large-models/../../books#2024)

Though, they should be thought of as [Intelligence amplification](https://aarnphm.xyz/thoughts/large-models/../../thoughts/Intelligence-amplification) rather than “artificial intelligence” systems.

## Scaling laws

Initial [work](https://arxiv.org/abs/2001.08361) from OpenAI

Distributed serving of large models requires cost-efficient methods[^1]

- [Petals](https://petals.dev/): a decentralized system that runs Llama 2 over the internet

### large world models

[LWM](https://github.com/LargeWorldModel/LWM): implementation of [RingAttention](https://aarnphm.xyz/thoughts/large-models/../../thoughts/Attention#ringattention)

## visions

[^1]: [Distributed Inference and Fine-tuning of Large Language Models over the Internet](https://arxiv.org/abs/2312.08361)

--- slug: thoughts/latent-space tags: - seed - ml description: reconstructed source of "https://aarnphm.xyz/thoughts/latent-space" title: latent space date: 2024-04-03 ---

--- slug: thoughts/lenses tags: - seed - film description: reconstructed source of "https://aarnphm.xyz/thoughts/lenses" title: Lenses date: 2024-01-22 ---

A collection of lenses I use for both photos and [videos.](https://aarnphm.xyz/thoughts/lenses/../../thoughts/Cinematography)

- Sony 10-18mm f/4 OSS
- Sony 16-35mm f/2.8 GM II
- Sony 24-70mm f/2.8 GM II
- Sony 50mm f/1.8
- Sony 85mm f/1.8

--- slug: thoughts/linguistic tags: - seed description: reconstructed source of "https://aarnphm.xyz/thoughts/linguistic" title: linguistic date: 2024-02-12 ---

--- slug: thoughts/logic tags: - philosophy description: reconstructed source of "https://aarnphm.xyz/thoughts/logic" title: logic date: 2024-03-02 ---

--- slug: thoughts/mechanistic-interpretability tags: - interp description: all things mech interp title: mechanistic interpretability date: 2024-10-30 ---

[whirlwind tour](https://www.youtube.com/watch?v=veT2VI4vHyU\&ab_channel=FAR%E2%80%A4AI), [initial exploration](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/pdfs/tinymorph-exploration.pdf), [glossary](https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J)

> The subfield of alignment that delves into reverse engineering of a neural network, especially
[LLMs](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/LLMs)

To attack the _curse of dimensionality_, the question remains: _how do we hope to understand a function over such a large space, without an exponential amount of time?_ [^lesswrongarc]

## inference

application in the wild: [Goodfire](https://goodfire.ai/) and [Transluce](https://transluce.org/)

> [!question]+ How would we do inference with SAEs?
>
> > Quick 🧵 and some quick introspection into how they might run inference
> >
> > — aaron (@aarnphm\_) [September 25, 2024](https://twitter.com/aarnphm_/status/1839016131321016380)

idea: treat SAEs as a `logit_processor`, though there are currently some bottlenecks with `logit_processor` in [vLLM](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/vllm), similar to [guided decoding](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/vllm#guided-decoding)

Currently, before v1, `logit_processor`s are applied row-wise, meaning logits are processed before being passed down to the scheduling group [^vllm-caveats]

## steering

refers to the process of manually modifying certain activations and hidden states of the neural net to influence its outputs

For example, the following is a toy example of how a decoder-only transformer (e.g. GPT-2) generates text given the prompt “The weather in California is”

```mermaid
flowchart LR
A[The weather in California is] --> B[H0] --> D[H1] --> E[H2] --> C[... hot]
```

To steer the model, we modify the $H_2$ layer by amplifying certain features with scale 20 (call it $H_{3}$)[^1]

```mermaid
flowchart LR
A[The weather in California is] --> B[H0] --> D[H1] --> E[H3] --> C[... cold]
```

One usually uses techniques such as [sparse autoencoders](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/mechanistic-interpretability#sparse-autoencoders) to decompose model activations into a set of interpretable features.

For feature [ablation](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/mechanistic-interpretability#ablation), we observe that feature activations can be strengthened or weakened to directly influence the model’s outputs

A few examples: ([Panickssery et al., 2024](#bib-panickssery2024steeringllama2contrastive)) use contrastive activation additions to steer Llama 2

### contrastive activation additions

intuition: using a contrast pair for steering vector additions at certain activation layers

Uses _mean difference_, which produces a difference vector similar to PCA:

Given a dataset $\mathcal{D}$ of prompts $p$ with positive completion $c_p$ and negative completion $c_n$, we calculate the mean difference $v_\text{MD}$ at layer $L$ as follows:

$$
v_\text{MD} = \frac{1}{\mid \mathcal{D} \mid} \sum_{p,c_p,c_n \in \mathcal{D}} a_L(p,c_p) - a_L(p, c_n)
$$

> [!tip] implication
>
> by steering existing learned representations of behaviors, CAA results in better out-of-distribution generalization than basic supervised finetuning of the entire model.

## sparse autoencoders

abbrev: SAE

_see also: [landscape](https://docs.google.com/document/d/1lHvRXJsbi41bNGZ_znGN7DmlLXITXyWyISan7Qx2y6s/edit?tab=t.0#heading=h.j9b3g3x1o1z4)_

Often consists of a single MLP layer with a ReLU activation, trained on a subset of the datasets that the main LLM was trained on.
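To make this concrete before the formal definition below, here is a minimal PyTorch sketch of such a baseline autoencoder; the class name, dimensions, initialization, and L1 coefficient are illustrative assumptions, not details from any particular SAE codebase.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BaselineSAE(nn.Module):
    """Minimal sketch of a baseline SAE (hypothetical, for illustration):
    encode activations x in R^n into M >> n sparse features, then reconstruct."""

    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # f(x) = ReLU(W_enc (x - b_dec) + b_enc)
        return F.relu((x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f: torch.Tensor) -> torch.Tensor:
        # x_hat(f) = W_dec f(x) + b_dec
        return f @ self.W_dec + self.b_dec

    def loss(self, x: torch.Tensor, l1_coeff: float = 1e-3) -> torch.Tensor:
        # reconstruction loss + lambda * L1 sparsity penalty on feature activations
        f = self.encode(x)
        x_hat = self.decode(f)
        recon = (x - x_hat).pow(2).sum(-1).mean()
        sparsity = l1_coeff * f.abs().sum(-1).mean()
        return recon + sparsity
```

The `encode`/`decode` pair and the two loss terms map one-to-one onto the definition and training objective formalized below.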
> empirical example: if we wish to interpret all features related to the author Camus, we might want to train an SAE on all available text by Camus to interpret “similar” features from Llama-3.1

> [!abstract] definition
>
> We wish to decompose a model’s activation $x \in \mathbb{R}^n$ into a sparse, linear combination of feature directions:
>
> $$
> \begin{aligned} x \approx x_{0} + &\sum_{i=1}^{M} f_i(x) d_i \\[8pt] \because \quad &d_i \ (M \gg n): \text{ latent unit-norm feature directions} \\ &f_i(x) \ge 0: \text{ corresponding feature activation for } x \end{aligned}
> $$

Thus, the baseline architecture of SAEs is a linear autoencoder with an L1 penalty on the activations:

$$
\begin{aligned} f(x) &\coloneqq \text{ReLU}(W_\text{enc}(x - b_\text{dec}) + b_\text{enc}) \\ \hat{x}(f) &\coloneqq W_\text{dec} f(x) + b_\text{dec} \end{aligned}
$$

> training it to reconstruct a large dataset of model activations $x \sim \mathcal{D}$, constraining the hidden representation $f$ to be sparse

We use the [L1 norm](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-autoencoder/../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/tut/tut1#l1norm) with coefficient $\lambda$ to construct the loss during training:

$$
\begin{aligned} \mathcal{L}(x) &\coloneqq \| x-\hat{x}(f(x)) \|_2^2 + \lambda \| f(x) \|_1 \\[8pt] &\because \|x-\hat{x}(f(x)) \|_2^2 : \text{ reconstruction loss} \end{aligned}
$$

> [!tip] intuition
>
> We want good reconstruction fidelity at a given sparsity level (as measured by L0), achieved via a mixture of reconstruction fidelity and L1 regularization.

We can reduce the sparsity loss term without affecting reconstruction by scaling up the norm of the decoder weights, or by constraining the norms of the columns of $W_\text{dec}$ during training

Ideas: the encoder output $f(x)$ has two roles

- detects which features are active ⇐ L1 is crucial to ensure sparsity in the decomposition
- _estimates_ magnitudes of active features ⇐ here L1 is an unwanted bias

### Gated SAE

_a Pareto improvement over baseline SAEs that reduces the bias of the L1 penalty_ ([Rajamanoharan et al., 2024](#bib-rajamanoharan2024improvingdictionarylearninggated))

A clear consequence of this bias during training is _shrinkage_ ([Sharkey, 2024](#bib-sharkey2024feature)) [^shrinkage]

The idea is to use a [gated ReLU](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-autoencoder/../../thoughts/optimization#gated-linear-units-and-variants) encoder ([Dauphin et al., 2017](#bib-dauphin2017languagemodelinggatedconvolutional); [Shazeer, 2020](#bib-shazeer2020gluvariantsimprovetransformer)):

$$
\tilde{f}(\mathbf{x}) \coloneqq \underbrace{\mathbb{1}[\underbrace{(\mathbf{W}_{\text{gate}}(\mathbf{x} - \mathbf{b}_{\text{dec}}) + \mathbf{b}_{\text{gate}}) > 0}_{\pi_{\text{gate}}(\mathbf{x})}]}_{f_{\text{gate}}(\mathbf{x})} \odot \underbrace{\text{ReLU}(\mathbf{W}_{\text{mag}}(\mathbf{x} - \mathbf{b}_{\text{dec}}) + \mathbf{b}_{\text{mag}})}_{f_{\text{mag}}(\mathbf{x})}
$$

where $\mathbb{1}[\bullet > 0]$ is the (point-wise) Heaviside step function and $\odot$ denotes element-wise multiplication.
| term                 | annotations                                                                      |
| -------------------- | -------------------------------------------------------------------------------- |
| $f_\text{gate}$      | which features are deemed to be active                                           |
| $f_\text{mag}$       | feature activation magnitudes (for features that have been deemed to be active)  |
| $\pi_\text{gate}(x)$ | $f_\text{gate}$ sub-layer’s pre-activations                                      |

To negate the increase in parameters, use weight sharing:

Scale $W_\text{mag}$ in terms of $W_\text{gate}$ with a vector-valued rescaling parameter $r_\text{mag} \in \mathbb{R}^M$:

$$
(W_\text{mag})_{ij} \coloneqq (\exp (r_\text{mag}))_i \cdot (W_\text{gate})_{ij}
$$

![](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-autoencoder/../../thoughts/images/gated-sae-architecture.webp)

_Figure 3: Gated SAE with weight sharing between gating and magnitude paths_

![](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-autoencoder/../../thoughts/images/jump_relu.webp)

_Figure 4: A gated encoder becomes a single-layer linear encoder with a JumpReLU ([Erichson et al., 2019](#bib-erichson2019jumpreluretrofitdefensestrategy)) activation function $\sigma_\theta$_

### feature suppression

See also: [link](https://www.alignmentforum.org/posts/3JuSjTZyMzaSeTxKk/addressing-feature-suppression-in-saes)

The loss function of SAEs combines an MSE reconstruction loss with a sparsity term:

$$
\begin{aligned} L(x, f(x), y) &= \|y-x\|^2/d + c\mid f(x) \mid \\[8pt] &\because d: \text{ dimensionality of }x \end{aligned}
$$

> the reconstruction is not perfect, given that only one term of the loss is reconstruction. **For smaller values of $f(x)$, features will be suppressed**

> [!note]- illustrated example
>
> consider one binary feature in one dimension: $x=1$ with probability $p$ and $x=0$ otherwise. Ideally, an optimal SAE would extract feature activations $f(x) \in \{0,1\}$ and have decoder $W_d=1$
>
> However, if we train the SAE by optimizing the loss function $L(x, f(x), y)$, say the encoder outputs feature activation $a$ if $x=1$ and 0 otherwise; ignoring the bias term, the optimization problem becomes:
>
> $$
> \begin{aligned} a &= \argmin p * L(1,a,a) + (1-p) * L(0,0,0) \\ &= \argmin (1-a)^2 + \mid a \mid * c \\ &= \argmin a^2 + (c-2) *a +1 \end{aligned} \Longrightarrow \boxed{a = 1-\frac{c}{2}}
> $$

> [!question]+ How do we fix feature suppression in training SAEs?
> > introduce an element-wise scaling factor per feature in between the encoder and decoder, represented by a vector $s$:
> >
> > $$
> > \begin{aligned} f(x) &= \text{ReLU}(W_e x + b_e) \\ f_s(x) &= s \odot f(x) \\ y &= W_d f_s(x) + b_d \end{aligned}
> > $$

[Link to original](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-autoencoder)

## sparse crosscoders

> [!tip] maturity
>
> a research preview from Anthropic; this is pretty much still a work in progress

see also [reproduction on Gemma 2B](https://colab.research.google.com/drive/124ODki4dUjfi21nuZPHRySALx9I74YHj?usp=sharing) and [github](https://github.com/ckkissane/crosscoder-model-diff-replication)

A variant of [sparse autoencoder](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/sparse-autoencoder) that reads from and writes to multiple layers ([Lindsey et al., 2024](#bib-lindsey2024sparsecrosscoders))

Crosscoders produce shared features across layers and even across models. They resolve:

- cross-layer features: resolve cross-layer superposition
- circuit simplification: remove redundant features from analysis and skip over many uninteresting identity circuit connections
- model diffing: produce shared sets of features across models. This covers versions of one model across training, as well as completely independent models with different architectures.

## motivations

### cross-layer [superposition](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/mechanistic-interpretability#superposition-hypothesis)

![](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/images/additive-residual-stream-llm.webp)

_given the additive properties of transformers’ residual stream, **adjacent layers** in larger transformers can be thought of as “almost parallel”_

> [!tip]- intuition
>
> On the basis of the superposition hypothesis, a feature is a linear combination of neurons at any given layer.
>
> ![](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/images/feature-neurons.webp)

![](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/images/one-step-circuit.webp)

![](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/images/parallel-joint-branch.webp)

_if we think of adjacent layers as being “almost parallel branches that potentially have superposition between them”, then we can apply dictionary learning jointly [^jointlysae]_

### persistent features and complexity

A current drawback of sparse autoencoders is that we have to train them against certain activation layers to extract features. In terms of the residual stream per layer, we end up with lots of duplicate features across layers.

> Crosscoders can simplify the circuit _given that we use an appropriate architecture_ [^risks]

## setup.

> Autoencoders and transcoders are special cases of crosscoders.
>
> - autoencoders: read and predict the same layer
> - transcoders: read from layer $n$ and predict layer $n+1$

Crosscoders read/write to many layers, subject to causality constraints.
> [!math]+ crosscoders
>
> Let one compute the vector of feature activations $f(x_j)$ on data point $x_j$ by summing over contributions of activations of different layers $a^l(x_j)$ for layers $l \in L$:
>
> $$
> \begin{aligned} f(x_j) &= \text{ReLU}(\sum_{l\in L}W_{\text{enc}}^l a^l(x_j) + b_{\text{enc}}) \\[8pt] &\because W^l_{\text{enc}} : \text{ encoder weights at layer } l \\[8pt] &\because a^l(x_j) : \text{ activation on datapoint } x_j \text{ at layer } l \\ \end{aligned}
> $$

We have the loss

$$
L = \sum_{l\in L} \|a^l(x_j) - \hat{a}^{l}(x_j)\|^2 + \sum_{l\in L}\sum_i f_i(x_j) \|W^l_{\text{dec},i}\|
$$

where $\hat{a}^{l}(x_j)$ is the reconstruction of layer $l$, and the regularization can be rewritten as:

$$
\sum_{l\in L}\sum_{i} f_i(x_j) \|W^l_{\text{dec},i}\| = \sum_{i} f_i(x_j)(\displaystyle\sum_{l \in L} \|W^l_{\text{dec},i}\|)
$$

_i.e., we weight the L1 regularization penalty by the L1 norm of the per-layer decoder weight norms $\sum_{l\in L} \|W^l_{\text{dec},i}\|$_ [^l2weightnorm]

We use L1 due to:

- baseline loss comparison: L2 exhibits lower loss than the sum of per-layer SAE losses, as models would effectively obtain a loss “bonus” by spreading features across layers
- layer-wise sparsity surfaces layer-specific features: based on empirical results of [model diffing](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/sparse-crosscoders#model-diffing), L1 uncovers a mix of shared and model-specific features, whereas L2 tends to uncover only shared features.

## variants

![](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/images/crosscoders-variants.webp)

good to explore:

1. strictly causal crosscoders to capture MLP computation, treating the computation performed by attention layers as linear
2. combining strictly causal crosscoders for MLP outputs with weakly causal crosscoders for attention outputs
3. interpretable attention replacement layers that could be used in combination with strictly causal crosscoders for a “replacement model”

## model diffing

see also: [model stitching](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/model-stiching) and [SVCCA](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/SVCCA)

> ([Laakso & Cottrell, 2000](#bib-doi:10.1080/09515080050002726)) propose comparing [representations](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/representations) by transforming them into representations of distances between data points. [^sne]

## questions

> How do features change over model training? When do they form?

> As we make a model wider, do we get more features? Or are they largely the same, packed less densely?

[Link to original](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders)

## superposition hypothesis

> [!abstract]+ tl/dr
>
> the phenomenon where a neural network represents _more_ than $n$ features in an $n$-dimensional space

> Linear representations of neurons can represent more features than dimensions. As sparsity increases, models use superposition to represent more [features](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/mechanistic-interpretability#features) than dimensions.
>
> neural networks “want to represent more features than they have neurons”.
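A tiny numpy illustration of the hypothesis (my own toy construction, not from the original write-up): pack $m=3$ “features” into $n=2$ dimensions as maximally spread unit vectors, and observe the interference that appears on read-out.

```python
import numpy as np

# m = 3 feature directions in n = 2 dimensions, spread 120 degrees apart
m, n = 3, 2
angles = 2 * np.pi * np.arange(m) / m
D = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (m, n) unit directions

f = np.array([1.0, 0.0, 0.0])  # sparse feature vector: only feature 0 is active
x = f @ D                      # superposed 2-dimensional representation
readout = D @ x                # dot-product readout of all 3 features

print(readout)                 # [ 1.  -0.5 -0.5]: feature 0 recovered, others see interference
print(np.maximum(readout, 0))  # [ 1.   0.   0. ]: ReLU filters out the negative interference
```

The ReLU read-out at the end is exactly the kind of non-linear filtering the next paragraph refers to.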
When features are sparse, superposition allows compression beyond what a linear model can do, at the cost of interference that requires non-linear filtering.

reasoning: “noisy simulation”, where small neural networks exploit feature sparsity and properties of high-dimensional spaces to approximately simulate much larger, much sparser neural networks

In a sense, superposition is a form of **lossy [compression](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/Compression)**

### importance

- sparsity: how _frequently_ does the feature appear in the input?
- importance: how useful is it for lowering the loss?

### over-complete basis

_reasoning for the set of $n$ directions [^direction]_

## features

> A property of an input to the model

When we talk about features ([Elhage et al., 2022, see “Empirical Phenomena”](#bib-elhage2022superposition)), the theory builds around several observed empirical phenomena:

1. Word embeddings: have directions corresponding to semantic properties ([Mikolov et al., 2013](#bib-mikolov-etal-2013-linguistic)). For example:

   ```prolog
   V(king) - V(man) = V(monarch)
   ```

2. Latent space: similar vector arithmetic and interpretable directions have also been found in generative adversarial networks.

We can define features as properties of inputs which a sufficiently large neural network will reliably dedicate a neuron to represent ([Elhage et al., 2022, see “Features as Direction”](#bib-elhage2022superposition))

## ablation

> refers to the process of removing a subset of a model’s parameters to evaluate its prediction outcomes.

idea: delete one activation of the network to see how performance on a task changes.

- zero ablation or _pruning_: deletion by setting activations to zero
- mean ablation: deletion by setting activations to the mean of the dataset
- random ablation or _resampling_

## residual stream

```mermaid
flowchart LR
A[Token] --> B[Embeddings] --> C[x0]
C[x0] --> E[H] --> D[x1]
C[x0] --> D
D --> F[MLP] --> G[x2]
D --> G[x2]
G --> I[...] --> J[unembed] --> X[logits]
```

The residual stream $x_{0}$ has dimension $\mathit{(C,E)}$ where

- $\mathit{C}$: the number of tokens in the context window and
- $\mathit{E}$: the embedding dimension.

The [Attention](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/Attention) mechanism $\mathit{H}$ processes the residual stream $x_{0}$, and the result is added back to form $x_{1}$:

$$
x_{1} = \mathit{H}{(x_{0})} + x_{0}
$$

## grokking

See also: [writeup](https://www.alignmentforum.org/posts/N6WM6hs7RQMKDhYjB/a-mechanistic-interpretability-analysis-of-grokking), [code](https://colab.research.google.com/drive/1F6_1_cWXE5M7WocUcpQWp3v8z4b1jL20), [circuit threads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html)

> A phenomenon discovered by ([Power et al., 2022](#bib-power2022grokkinggeneralizationoverfittingsmall)) where models trained on small algorithmic tasks like modular addition initially memorise the training data, but after a long time suddenly learn to generalise to unseen data

> [!tip] empirical claims
>
> related to phase change

## References

- Dauphin, Y. N., Fan, A., Auli, M., & Grangier, D. (2017). _Language Modeling with Gated Convolutional Networks_. arXiv preprint arXiv:1612.08083 [arxiv](https://arxiv.org/abs/1612.08083)
- Erichson, N. B., Yao, Z., & Mahoney, M. W. (2019). _JumpReLU: A Retrofit Defense Strategy for Adversarial Attacks_.
arXiv preprint arXiv:1904.03750 [arxiv](https://arxiv.org/abs/1904.03750)
- Rajamanoharan, S., Conmy, A., Smith, L., Lieberum, T., Varma, V., Kramár, J., Shah, R., & Nanda, N. (2024). _Improving Dictionary Learning with Gated Sparse Autoencoders_. arXiv preprint arXiv:2404.16014 [arxiv](https://arxiv.org/abs/2404.16014)
- Sharkey, L. (2024). _Addressing Feature Suppression in SAEs_. AI Alignment Forum. [\[post\]](https://www.alignmentforum.org/posts/3JuSjTZyMzaSeTxKk/addressing-feature-suppression-in-saes)
- Shazeer, N. (2020). _GLU Variants Improve Transformer_. arXiv preprint arXiv:2002.05202 [arxiv](https://arxiv.org/abs/2002.05202)
- Gorton, L. (2024). _The Missing Curve Detectors of InceptionV1: Applying Sparse Autoencoders to InceptionV1 Early Vision_. arXiv preprint arXiv:2406.03662 [arxiv](https://arxiv.org/abs/2406.03662)
- Laakso, A., & Cottrell, G. (2000). Content and cluster analysis: Assessing representational similarity in neural systems. _Philosophical Psychology_, _13_(1), 47–76.
- Lindsey, J., Templeton, A., Marcus, J., Conerly, T., Batson, J., & Olah, C. (2024). Sparse Crosscoders for Cross-Layer Features and Model Diffing. _Transformer Circuits Thread_. [\[link\]](https://transformer-circuits.pub/2024/crosscoders/index.html)
- Elhage, N., Hume, T., Olsson, C., Schiefer, N., Henighan, T., Kravec, S., Hatfield-Dodds, Z., Lasenby, R., Drain, D., Chen, C., Grosse, R., McCandlish, S., Kaplan, J., Amodei, D., Wattenberg, M., & Olah, C. (2022). Toy Models of Superposition. _Transformer Circuits Thread_. [\[link\]](https://transformer-circuits.pub/2022/toy_model/index.html)
- Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations. In L. Vanderwende, H. Daumé III, & K. Kirchhoff (Eds.), _Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies_ (pp. 746–751). Association for Computational Linguistics.
- Panickssery, N., Gabrieli, N., Schulz, J., Tong, M., Hubinger, E., & Turner, A. M. (2024). _Steering Llama 2 via Contrastive Activation Addition_. arXiv preprint arXiv:2312.06681 [arxiv](https://arxiv.org/abs/2312.06681)
- Power, A., Burda, Y., Edwards, H., Babuschkin, I., & Misra, V. (2022). _Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets_. arXiv preprint arXiv:2201.02177 [arxiv](https://arxiv.org/abs/2201.02177)

[^lesswrongarc]: good read from [Lawrence C](https://www.lesswrong.com/posts/6FkWnktH3mjMAxdRT/what-i-would-do-if-i-wasn-t-at-arc-evals#Ambitious_mechanistic_interpretability) for ambitious mech interp.

[^vllm-caveats]: [the benchmark](https://github.com/vllm-project/vllm/pull/10046) was run against `vllm#0.6.3.dev236+g48138a84`, with all configuration specified in the pull request.

[^1]: An example steering function can be:

    $$
    H_{3} = H_{2} + \text{steering\_strength} * \text{SAE}.W_{\text{dec}}[20] * \text{max\_activation}
    $$

[^shrinkage]: If we hold $\hat{x}(\bullet)$ fixed, L1 pushes $f(x) \to 0$, while the reconstruction loss pushes $f(x)$ high enough to produce an accurate reconstruction; the optimal value lies somewhere in between.
However, rescaling the [shrunk](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-autoencoder/../../thoughts/mechanistic-interpretability#feature-suppression) feature activations ([Sharkey, 2024](#bib-sharkey2024feature)) is not necessarily enough to overcome the bias induced by L1: an SAE might learn sub-optimal encoder and decoder directions that are not improved by the fix.

[^jointlysae]: ([Gorton, 2024](#bib-gorton2024missingcurvedetectorsinceptionv1)) applies SAEs to study InceptionV1, where cross-branch superposition is significant in interpreting models with parallel branches

[^risks]: the causal description it provides likely differs from that of the underlying model.

[^l2weightnorm]: $\|W_\text{dec,i}^l\|$ is the L2 norm of a single feature’s decoder vector at a given layer. In principle, one might have expected to use the L2 norm of the per-layer norms $\sqrt{\sum_{l \in L} \|W_\text{dec,i}^l\|^2}$

[^sne]: Chris Olah’s [blog post](https://colah.github.io/posts/2015-01-Visualizing-Representations/) explains how t-SNE can be used to visualize collections of networks in a function space.

[^direction]: Even though features still correspond to directions, the set of interpretable directions is larger than the number of dimensions

--- slug: thoughts/model-stiching tags: - ml description: reconstructed source of "https://aarnphm.xyz/thoughts/model-stiching" title: model stitching date: 2024-11-04 ---

([Lenc & Vedaldi, 2015](#bib-lenc2015understandingimagerepresentationsmeasuring))

## References

- Lenc, K., & Vedaldi, A. (2015). _Understanding image representations by measuring their equivariance and equivalence_. arXiv preprint arXiv:1411.5908 [arxiv](https://arxiv.org/abs/1411.5908)

--- slug: thoughts/monetary tags: - seed description: reconstructed source of "https://aarnphm.xyz/thoughts/monetary" title: Monetary date: 2024-01-20 ---

Karpathy on [AI’s 30 under 30](https://twitter.com/karpathy/status/1748816969858720232)

> [Money](https://aarnphm.xyz/thoughts/monetary/../../thoughts/monetary) is an information system for labor allocation. (this [tweet](https://x.com/elonmusk/status/1349977642708168704?s=20))

Money doesn’t have any intrinsic power. You can’t simply throw more money into a system and hope it will fix the problem. [Chaos](https://aarnphm.xyz/thoughts/monetary/../../thoughts/Chaos) is produced from the act of generating wealth.

What does it really mean to accumulate wealth? If capital gains are a byproduct of the pursuit of [knowledge](https://aarnphm.xyz/thoughts/monetary/../../thoughts/Epistemology), chances are you will enjoy your time. The problem with curiosity that is not in [alignment](https://aarnphm.xyz/thoughts/monetary/../../thoughts/Alignment) with capitalism is that you will run out of time and money sooner or later.

--- slug: thoughts/moral tags: - philosophy description: reconstructed source of "https://aarnphm.xyz/thoughts/moral" title: Moral date: 2024-02-07 ---

See also: [Value](https://aarnphm.xyz/thoughts/moral/../../thoughts/Value)

> [!tip] Justification
>
> Provide criteria for judging actions. It might be that the criterion is simple, such as right actions maximize the good, or it may be complex, such as the right action is the one that gives adequate weight to each competing duty

Most notable are Kant’s [deontological ethics](https://aarnphm.xyz/thoughts/moral/../../thoughts/Philosophy-and-Kant), utilitarianism, and virtue ethics. These consider what is right, and provide accounts of wrongness and permissibility.
--- slug: thoughts/music-theory tags: - seed - sapling description: reconstructed source of "https://aarnphm.xyz/thoughts/music-theory" title: Music theory date: 2023-09-25 ---

Half steps → between E-F, B-C

Full step from E → F#

Minor → flat the major scale

Elements per side for House/Techno:

- L-R: perc, piano, strings, pads, guitars, synths, fx, bv
- M: vocals, snare, bass, kick

### Effects

### Vocals

Call and response

### Mids

- Guitar, piano

### Bass

Usually with synths/808s

Tools:

- Ableton Operator
- Serum

1/16 note grid i-iv-vii

### Drums

UKG: → swing (1/16th bar off)

#### Kick

- Fat 909

#### Hats

Jacking hi-hats: a closed hi-hat followed by an open one, or vice versa

#### Claps

#### Percussion

### Syncopation

### Major third

For any given scale, choose an altered chord:

- Instead of playing a III as a minor, play it as a major
- major III to a minor VI

--- slug: thoughts/observer-expectancy-effect tags: - seed description: reconstructed source of "https://aarnphm.xyz/thoughts/observer-expectancy-effect" title: observer-expectancy effect date: 2024-02-07 ---

The observer’s prejudices influence the behaviour of the people they are observing.

## Robert Rosenthal

paper: **The effect of experimenter bias on the performance of the albino rat**

### Clever Hans

## Operant conditioning

Where behaviours are modified through the association of stimuli with reinforcement or punishment. Thus, operants, or behaviours affected by the environment, are conditioned to happen more or less often based on the environmental consequence of the behaviour.

--- slug: thoughts/optimization tags: - ml description: reconstructed source of "https://aarnphm.xyz/thoughts/optimization" title: ml optimization date: 2024-10-31 ---

A list of activation and optimization functions used in ML training to reduce loss.

## sigmoid

$$
\text{sigmoid}(x) = \frac{1}{1+e^{-x}}
$$

## ReLU

$$
\text{FFN}(x, W_{1}, W_{2}, b_{1}, b_{2}) = \max(0, xW_{1}+b_{1})W_{2} + b_{2}
$$

A version in T5 without bias:

$$
\text{FFN}_\text{ReLU}(x,W_{1},W_{2}) = \max(xW_{1},0)W_{2}
$$

## Swish

([Ramachandran et al., 2017](#bib-ramachandran2017searchingactivationfunctions)) introduce an alternative to ReLU that works better on deeper models across different tasks:

$$
f(x) = x \cdot \text{sigmoid}(\beta x) \quad \because \beta : \text{ constant parameter}
$$

## Gated Linear Units and Variants

> component-wise product of two linear transformations of the input, one of which is sigmoid-activated.

([Shazeer, 2020](#bib-shazeer2020gluvariantsimprovetransformer)) introduces a few GLU variants that yield improvements in the [Transformers](https://aarnphm.xyz/thoughts/optimization/../../thoughts/Transformers) architecture.
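As a concrete sketch of the SwiGLU variant formalized in the equations that follow, assuming PyTorch; the dimensions and $\beta = 1$ (PyTorch’s built-in SiLU) are my assumptions, not prescribed by the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    """FFN_SwiGLU(x, W, V, W2) = (Swish_beta(x W) ⊗ x V) W2, bias-free,
    matching the feed-forward variants defined below (illustrative sketch)."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        # d_ff is usually scaled by 2/3 to keep the parameter count comparable
        self.W = nn.Linear(d_model, d_ff, bias=False)
        self.V = nn.Linear(d_model, d_ff, bias=False)
        self.W2 = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # F.silu is Swish with beta = 1
        return self.W2(F.silu(self.W(x)) * self.V(x))
```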
$$
\begin{aligned} \text{GLU}(x,W,V,b,c) &= \sigma(xW+b) \otimes (xV+c) \\ \text{Bilinear}(x,W,V,b,c) &= (xW+b) \otimes (xV+c) \end{aligned}
$$

GLU in other variants:

$$
\begin{aligned} \text{ReGLU}(x,W,V,b,c) &= \max(0, xW+b) \otimes (xV+c) \\ \text{GEGLU}(x,W,V,b,c) &= \text{GELU}(xW+b) \otimes (xV+c) \\ \text{SwiGLU}(x,W,V,b,c) &= \text{Swish}_\beta(xW+b) \otimes (xV+c) \end{aligned}
$$

The FFN for transformer layers would become:

$$
\begin{aligned} \text{FFN}_\text{GLU}(x,W,V,W_{2}) &= (\sigma (xW) \otimes xV)W_{2} \\ \text{FFN}_\text{Bilinear}(x,W,V,W_{2}) &= (xW \otimes xV)W_{2} \\ \text{FFN}_\text{ReGLU}(x,W,V,W_{2}) &= (\max(0, xW) \otimes xV)W_{2} \\ \text{FFN}_\text{GEGLU}(x,W,V,W_{2}) &= (\text{GELU}(xW) \otimes xV)W_{2} \\ \text{FFN}_\text{SwiGLU}(x,W,V,W_{2}) &= (\text{Swish}_\beta(xW) \otimes xV)W_{2} \end{aligned}
$$

_note_: reduce the number of hidden units $d_\text{ff}$ (the second dimension of $W$ and $V$, and the first dimension of $W_{2}$) by a factor of $\frac{2}{3}$ when comparing these layers

## JumpReLU

([Rajamanoharan et al., 2024](#bib-rajamanoharan2024jumpingaheadimprovingreconstruction))

## momentum

See also [Stochastic gradient descent](https://aarnphm.xyz/thoughts/optimization/../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Stochastic-gradient-descent)

### Nesterov momentum

See also [paper](http://www.cs.toronto.edu/%7Ehinton/absps/momentum.pdf)

idea:

- first take a step in the direction of the accumulated momentum
- compute the gradient at the “lookahead” position
- make the update using this gradient

> [!abstract] definition
>
> For a parameter vector $\theta$, the update can be expressed as
>
> $$
> \begin{aligned} v_t &= \mu v_{t-1} + \nabla L(\theta_t + \mu v_{t-1}) \\ \theta_{t+1} &= \theta_t - \alpha v_t \end{aligned}
> $$

Achieves better convergence rates:

| function type            | gradient descent                        | Nesterov AG                                      |
| ------------------------ | --------------------------------------- | ------------------------------------------------ |
| Smooth                   | $\mathcal{O}(\frac{1}{T})$              | $\mathcal{O}(\frac{1}{T^{2}})$                   |
| Smooth & Strongly Convex | $\mathcal{O}(\exp (-\frac{T}{\kappa}))$ | $\mathcal{O}(\exp (-\frac{T}{\sqrt{\kappa}}))$   |

[Link to original](https://aarnphm.xyz/thoughts/optimization/../../thoughts/Nesterov-momentum)

### Polyak’s Momentum

## References

- Rajamanoharan, S., Lieberum, T., Sonnerat, N., Conmy, A., Varma, V., Kramár, J., & Nanda, N. (2024). _Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders_. arXiv preprint arXiv:2407.14435 [arxiv](https://arxiv.org/abs/2407.14435)
- Ramachandran, P., Zoph, B., & Le, Q. V. (2017). _Searching for Activation Functions_. arXiv preprint arXiv:1710.05941 [arxiv](https://arxiv.org/abs/1710.05941)
- Shazeer, N. (2020). _GLU Variants Improve Transformer_. arXiv preprint arXiv:2002.05202 [arxiv](https://arxiv.org/abs/2002.05202)

--- slug: thoughts/papers/index tags: - folder description: reconstructed source of "https://aarnphm.xyz/thoughts/papers/index" title: papers. date: 2024-01-20 ---

A somewhat local cache of all papers I’ve read. This is one source of my Zotero [library](https://aarnphm.xyz/thoughts/papers/index/../../../../books).

--- slug: thoughts/pdfs/index tags: - folder description: reconstructed source of "https://aarnphm.xyz/thoughts/pdfs/index" title: pdfs.
date: 2024-10-29 ---

The following includes a list of PDFs that are pretty cool

--- slug: thoughts/personal-computing tags: - seed - computing description: reconstructed source of "https://aarnphm.xyz/thoughts/personal-computing" title: personal computing date: 2024-02-25 ---

See [this tweet](https://twitter.com/joekndy/status/1761616198482219368)

--- slug: thoughts/play tags: - seed - philosophy description: reconstructed source of "https://aarnphm.xyz/thoughts/play" title: Play date: 2023-10-18 ---

### play?

> the intentional activity of doing the thing you want to do

⇒ create shared ownership of spaces which competition cannot touch, and have as much fun as we can.

Turn life into a canvas, rather than a graph with checkpoints. Throw away your 5-year life plan to create a garden of your curiosity.

Commercial viability vs. creative endeavour; [Do things that don't scale](https://paulgraham.com/ds.html)

### software.

> 🏀 A Note on Playful Software 🏀\
> \
> Playful software != video games. I mean tinkerable, whimsical, playful consumer software: creative software, social networks, dating apps, messengers\
> \
> Play that's not segregated from ordinary life [pic.twitter.com/cHyGpcI9m8](https://t.co/cHyGpcI9m8)
>
> — XH (@xhfloz) [September 19, 2023](https://twitter.com/xhfloz/status/1704176399173488823)

Four components:

- whimsy
- new people
- surprise
- joy

Involves freedom of choice:

- social
- about the process

### Create spaces, not products

> not necessarily meaning you are doing it for yourself, but making it possible for others to utilise the space.

### Play as a form of tinkering

Internet [playground](https://woolgather.sh/issue/2)

Can we shift the [education system](https://aarnphm.xyz/thoughts/play/../../thoughts/education#system) away from assessing students and let them explore their own interests?

[Magic Circle](https://subconscious.substack.com/p/magic-circles) or [from squishy\[dot\]computer](https://newsletter.squishy.computer/p/magic-circles)

- is a space in which a game takes place. Once we step into it, we suspend the rules of life, allowing the rules of the game to take over our interactions
- the boundaries of the magic circle are often marked via ceremonies:
  - the national anthem before an Olympic game
  - a gong before yoga class
  - walking down the aisle at a wedding

⇒ similar to the idea of [liminal space](https://en.wikipedia.org/wiki/Liminality) in anthropology, or the [Game of Life](https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life)

Graeber on [What’s the point if we can’t have fun](https://davidgraeber.org/articles/whats-the-point-if-we-cant-have-fun/)

> Why does the existence of action carried out for the sheer pleasure of acting, the exertion of powers for the sheer pleasure of exerting them, strike us as mysterious? What does it tell us about ourselves that we instinctively assume that it is?

- “Man plays only when he is in the full sense of the word a man” (Friedrich Schiller, 1795)

### [philosophy](https://aarnphm.xyz/thoughts/play/../../tags/philosophy)

> [!notes] Philosophy as play
>
> Involves a form of perspective shifting: trying on or inhabiting alternative perspectives

Intellectual playfulness[^1], loosely, is the disposition to try out new ideas, perspectives, and systems of thought (which involves perspective shifting) for the sheer joy of it. It is a disposition to explore ideas for the value of exploration itself.
- intellectually playful exploration can sometimes better serve the goal of finding the truth than exploration that is strictly aimed at finding the truth
- it functions against epistemic traps: belief systems that undermine our epistemic efforts, leaving us stuck inside them

### Irony

Play involves lightness with rules - the ability to lightly step away from them, but also the ability to lightly adopt them.

To be serious about a game is to play it under the idea that its goals are really and genuinely important - as an Olympic athlete does. To be playful about games is neither to be utterly serious nor utterly ironic, but to move easily into and out of commitments to rule-sets.

> To be playful is to wear the game’s care lightly

> To be playful is to be pretentious

#### Pretentiousness: Why It Matters by Daniel Fox

- argues that pretentiousness invokes curiosity and creativity, rather than deserving its negative connotation

Necessitates freedom; conditional freedom? Play often initiates some sort of pressure, such that it expects us to be part of the construction.

[^1]: excerpt from [Playfulness vs Epistemic Traps](https://philpapers.org/archive/NGUPVE.pdf)

--- slug: thoughts/prompt-engineering tags: - seed - ml description: reconstructed source of "https://aarnphm.xyz/thoughts/prompt-engineering" title: Prompt engineering date: 2024-02-12 ---

A constructive way to form communication with [LLMs](https://aarnphm.xyz/thoughts/prompt-engineering/../../thoughts/LLMs). As we improve the quality of prompts, we can expect better results from the models. Similar to [linguistic](https://aarnphm.xyz/thoughts/prompt-engineering/../../thoughts/linguistic)s, a good prompt is a good form of communication with the system.

This is different from [zero-shot prompting](https://aarnphm.xyz/thoughts/prompt-engineering/../../thoughts/zero-shot-learning)

## CoT prompting

See also: [NLP](https://aarnphm.xyz/thoughts/prompt-engineering/../../thoughts/NLP)

You can think of it as explaining a big topic to a five-year-old: you break the topic down into smaller, logical parts that mimic a train of thought.

## Least-to-most prompting

The model is prompted to first list the sub-problems of a problem, then solve them in sequence.

--- slug: thoughts/quantization tags: - seed - ml description: reconstructed source of "https://aarnphm.xyz/thoughts/quantization" title: Quantization date: 2024-02-05 ---

See also: [this talk](https://aarnphm.xyz/thoughts/quantization/../../thoughts/images/htn-openllm.pdf) I gave at Hack the North 2023.

> reduce the computational and memory costs of running inference by representing the weights and activations with low-precision data types

- `int16`
- [half precision](https://aarnphm.xyz/thoughts/quantization/../../thoughts/quantization#fp32-to-fp16)
- `bfloat16`
- `int8`

> [!note] Note
>
> This also applies to post-training quantization, where the methodology is applied after the model has been trained, instead of at load time.

## `fp32` to `fp16`

> Does my operation support `fp16`?

- CPUs support storing `fp16` weights, but computations are done in `fp32`

> Is my operation _sensitive_ to `fp16`?

For example, `epsilon` in `LayerNormalization` is usually very small ($1e^{-12}$), but the smallest normal value in `fp16` is $\approx 6e^{-5}$, which causes `NaN` issues.
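A quick numpy demonstration of this failure mode (hypothetical values, mirroring the `LayerNormalization` example above): the `fp32` epsilon underflows to zero in `fp16`, so normalizing a zero-variance input turns into a `0/0` division.

```python
import numpy as np

# the fp32 LayerNorm epsilon underflows to zero in fp16
eps32 = np.float32(1e-12)
eps16 = np.float16(1e-12)
print(eps32, eps16)  # 1e-12 0.0

x = np.zeros(4, dtype=np.float16)  # zero-variance activations
var = x.var()                      # 0.0 in fp16

print((x - x.mean()) / np.sqrt(var + eps16))                      # [nan nan nan nan] <- 0/0
print((x - x.mean()) / np.sqrt(var.astype(np.float32) + eps32))   # [0. 0. 0. 0.], fp32 is fine
```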
## `fp32` to `int8`

Consider a float `x` in `[a, b]`; the _affine quantization scheme_ is:

$$
x = S \cdot (x_q - Z)
$$

where:

- $x_q$ is the quantized `int8` value associated with `x`
- $S$ and $Z$ are the scaling and zero-point parameters
  - $S$ is the scale, a positive `float32`
  - $Z$ is the zero-point: the `int8` value corresponding to the value `0` in `fp32`

Thus the quantized value $x_q$ is: $x_q = \text{round}(x / S + Z)$. For example, with $S=0.1$ and $Z=0$, $x=1.27$ maps to $x_q = \text{round}(12.7) = 13$.

Any `fp32` value outside of `[a, b]` is clipped to the closest representable value.

$$
\forall x \in [a, b] \quad x_q = \text{clip}(\text{round}(x/S + Z), \text{round}(a/S + Z), \text{round}(b/S + Z))
$$

See also: [paper](https://arxiv.org/abs/1712.05877)

## quantization time

- Post-training **dynamic quantization**: the range of each activation is computed on the fly at _runtime_
- Post-training **static quantization**: the range of each activation is computed _offline_ before _runtime_
  - observers are put on activations to collect their values
  - a certain number of forward passes are run on calibration datasets
  - the range of each activation is computed according to some _calibration technique_
- **Quantization-aware training**: the range of each activation is computed _during training_
  - `fake_quantize` operations are inserted in the computation graph
  - `fake_quantize` is a no-op during inference, but during training it simulates the effect of quantization

## Methods and libraries

[bitsandbytes](https://github.com/TimDettmers/bitsandbytes) and [GPTQ](https://arxiv.org/abs/2210.17323)

--- slug: thoughts/questions tags: - seed description: reconstructed source of "https://aarnphm.xyz/thoughts/questions" title: questions date: 2024-02-07 ---

What are questions, really? People always say “there is no such thing as a stupid question”, but I do think questions come from an innate ability and desire to learn, not from subjective opinion.

Source: [Ask better questions on Kernel](https://www.kernel.community/en/learn/module-2/better-questions/)

### Socratic method

[Socrates](https://aarnphm.xyz/thoughts/questions/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates) is notoriously known for just asking questions and engaging in such dialogues. A method of hypothesis elimination, in that better hypotheses are found by steadily identifying and eliminating those that lead to contradictions.

A Socratic Circle is an approach to understanding texts. It is based on the assumptions that all knowledge is a posteriori knowledge, that all thinking comes from asking questions, and that one question should lead to further questions. Students are often involved in Socratic [dialectics](https://aarnphm.xyz/thoughts/questions/../../thoughts/dialectics), where the inner circle explores and asks questions, the outer circle provides feedback, and vice versa.

--- slug: thoughts/reason tags: - philosophy description: reconstructed source of "https://aarnphm.xyz/thoughts/reason" title: reason date: 2024-02-26 ---

### inductive.

### deductive.

--- slug: thoughts/reductionism tags: - seed - psychology description: reconstructed source of "https://aarnphm.xyz/thoughts/reductionism" title: reductionism date: 2024-02-07 ---

See also: [Compression](https://aarnphm.xyz/thoughts/reductionism/../../thoughts/Compression)

Reductionism is a relationship among theories:

1. Ontology: a belief that the whole of reality consists of a minimal number of parts
2. Methodology: a scientific attempt to provide explanations in terms of ever-smaller entities
3. Theory: suggests a newer theory does not replace or absorb an older one, but reduces it to more basic terms.
--- slug: thoughts/representations tags: - seed - philosophy description: reconstructed source of "https://aarnphm.xyz/thoughts/representations" title: representations. date: 2024-02-25 ---

## symbolic

--- slug: thoughts/scripts/index tags: - seed description: reconstructed source of "https://aarnphm.xyz/thoughts/scripts/index" title: scripts. date: 2024-10-30 ---

A list of tools to be used for this vault.

--- slug: thoughts/sparse-autoencoder tags: - ml - interp description: reconstructed source of "https://aarnphm.xyz/thoughts/sparse-autoencoder" title: sparse autoencoder date: 2024-11-04 ---

abbrev: SAE

_see also: [landscape](https://docs.google.com/document/d/1lHvRXJsbi41bNGZ_znGN7DmlLXITXyWyISan7Qx2y6s/edit?tab=t.0#heading=h.j9b3g3x1o1z4)_

Often consists of a single MLP layer with a ReLU activation, trained on a subset of the datasets that the main LLM was trained on.

> empirical example: if we wish to interpret all features related to the author Camus, we might want to train an SAE on all available text by Camus to interpret “similar” features from Llama-3.1

> [!abstract] definition
>
> We wish to decompose a model’s activation $x \in \mathbb{R}^n$ into a sparse, linear combination of feature directions:
>
> $$
> \begin{aligned} x \approx x_{0} + &\sum_{i=1}^{M} f_i(x) d_i \\[8pt] \because \quad &d_i \ (M \gg n): \text{ latent unit-norm feature directions} \\ &f_i(x) \ge 0: \text{ corresponding feature activation for } x \end{aligned}
> $$

Thus, the baseline architecture of SAEs is a linear autoencoder with an L1 penalty on the activations:

$$
\begin{aligned} f(x) &\coloneqq \text{ReLU}(W_\text{enc}(x - b_\text{dec}) + b_\text{enc}) \\ \hat{x}(f) &\coloneqq W_\text{dec} f(x) + b_\text{dec} \end{aligned}
$$

> training it to reconstruct a large dataset of model activations $x \sim \mathcal{D}$, constraining the hidden representation $f$ to be sparse

We use the [L1 norm](https://aarnphm.xyz/thoughts/sparse-autoencoder/../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/tut/tut1#l1norm) with coefficient $\lambda$ to construct the loss during training:

$$
\begin{aligned} \mathcal{L}(x) &\coloneqq \| x-\hat{x}(f(x)) \|_2^2 + \lambda \| f(x) \|_1 \\[8pt] &\because \|x-\hat{x}(f(x)) \|_2^2 : \text{ reconstruction loss} \end{aligned}
$$

> [!tip] intuition
>
> We want good reconstruction fidelity at a given sparsity level (as measured by L0), achieved via a mixture of reconstruction fidelity and L1 regularization.
We can reduce the sparsity loss term without affecting reconstruction by scaling up the norm of the decoder weights, or by constraining the norms of the columns of $W_\text{dec}$ during training. Ideas: the output of the encoder, $f(x)$, has two roles - detects which features are active ⇐ L1 is crucial to ensure sparsity in the decomposition - _estimates_ magnitudes of active features ⇐ here L1 is an unwanted bias ### Gated SAE _a Pareto improvement over baseline SAEs that reduces the bias of the L1 penalty_ ([Rajamanoharan et al., 2024](#bib-rajamanoharan2024improvingdictionarylearninggated)) A clear consequence of the bias during training is _shrinkage_ ([Sharkey, 2024](#bib-sharkey2024feature)) [^shrinkage] The idea is to use a [gated ReLU](https://aarnphm.xyz/thoughts/sparse-autoencoder/../../thoughts/optimization#gated-linear-units-and-variants) encoder ([Dauphin et al., 2017](#bib-dauphin2017languagemodelinggatedconvolutional); [Shazeer, 2020](#bib-shazeer2020gluvariantsimprovetransformer)): $$ \tilde{f}(\mathbf{x}) \coloneqq \underbrace{\mathbb{1}[\underbrace{(\mathbf{W}_{\text{gate}}(\mathbf{x} - \mathbf{b}_{\text{dec}}) + \mathbf{b}_{\text{gate}}) > 0}_{\pi_{\text{gate}}(\mathbf{x})}]}_{f_{\text{gate}}(\mathbf{x})} \odot \underbrace{\text{ReLU}(\mathbf{W}_{\text{mag}}(\mathbf{x} - \mathbf{b}_{\text{dec}}) + \mathbf{b}_{\text{mag}})}_{f_{\text{mag}}(\mathbf{x})} $$ where $\mathbb{1}[\bullet > 0]$ is the (point-wise) Heaviside step function and $\odot$ denotes element-wise multiplication. | term | annotations | | -------------------- | ------------------------------------------------------------------------------- | | $f_\text{gate}$ | which features are deemed to be active | | $f_\text{mag}$ | feature activation magnitudes (for features that have been deemed to be active) | | $\pi_\text{gate}(x)$ | $f_\text{gate}$ sub-layer’s pre-activations | To negate the increase in parameters, use weight sharing: scale $W_\text{mag}$ in terms of $W_\text{gate}$ with a vector-valued rescaling parameter $r_\text{mag} \in \mathbb{R}^M$: $$ (W_\text{mag})_{ij} \coloneqq (\exp (r_\text{mag}))_i \cdot (W_\text{gate})_{ij} $$ ![](https://aarnphm.xyz/thoughts/sparse-autoencoder/../../thoughts/images/gated-sae-architecture.webp) _Figure 3: Gated SAE with weight sharing between gating and magnitude paths_ ![](https://aarnphm.xyz/thoughts/sparse-autoencoder/../../thoughts/images/jump_relu.webp) _Figure 4: A gated encoder becomes a single-layer linear encoder with a JumpReLU ([Erichson et al., 2019](#bib-erichson2019jumpreluretrofitdefensestrategy)) activation function $\sigma_\theta$_ ### feature suppression See also: [link](https://www.alignmentforum.org/posts/3JuSjTZyMzaSeTxKk/addressing-feature-suppression-in-saes) The loss function of SAEs combines an MSE reconstruction loss with a sparsity term: $$ \begin{aligned} L(x, f(x), y) &= \|y-x\|^2/d + c\mid f(x) \mid \\[8pt] &\because d: \text{ dimensionality of }x \end{aligned} $$ > the reconstruction is not perfect, given that only one of the two terms rewards reconstruction. **For smaller values of $f(x)$, features will be suppressed** > [!note]- illustrated example > > consider one binary feature in one dimension, $x=1$ with probability $p$ and $x=0$ otherwise. Ideally, the optimal SAE would extract a feature activation of $f(x) \in \{0,1\}$ and have decoder $W_d=1$ > > However, if we train the SAE by optimizing the loss function $L(x, f(x), y)$, say the encoder outputs feature activation $a$ if $x=1$ and 0 otherwise; ignoring the bias term, the optimization problem becomes: > > $$ > \begin{aligned} a &= \argmin p * L(1,a,a) + (1-p) * L(0,0,0) \\ &= \argmin (1-a)^2 + \mid a \mid * c \\ &= \argmin a^2 + (c-2) *a +1 \end{aligned} \Longrightarrow \boxed{a = 1-\frac{c}{2}} > $$ > [!question]+ How do we fix feature suppression in training SAEs? > > introduce an element-wise scaling factor per feature between the encoder and decoder, represented by a vector $s$: > > $$ > \begin{aligned} f(x) &= \text{ReLU}(W_e x + b_e) \\ f_s(x) &= s \odot f(x) \\ y &= W_d f_s(x) + b_d \end{aligned} > $$
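A sketch of this per-feature rescaling fix, assuming PyTorch with illustrative dimensions; note the L1 penalty stays on the unscaled $f(x)$ while reconstruction uses $f_s(x)$, so the learned $s$ can undo shrinkage:

```python
# a sketch of the per-feature rescaling fix above; assumes PyTorch,
# with illustrative dimensions. The L1 penalty is applied to the
# unscaled f(x), while reconstruction uses f_s(x) = s * f(x), so the
# learned scale s can undo shrinkage without fighting the sparsity term.
import torch
import torch.nn as nn


class RescaledSAE(nn.Module):
    def __init__(self, n: int, m: int):
        super().__init__()
        self.W_e = nn.Parameter(torch.randn(m, n) * 0.01)
        self.W_d = nn.Parameter(torch.randn(n, m) * 0.01)
        self.b_e = nn.Parameter(torch.zeros(m))
        self.b_d = nn.Parameter(torch.zeros(n))
        self.s = nn.Parameter(torch.ones(m))  # element-wise scaling vector s

    def forward(self, x: torch.Tensor):
        f = torch.relu(x @ self.W_e.T + self.b_e)  # f(x) = ReLU(W_e x + b_e)
        f_s = self.s * f                           # f_s(x) = s * f(x)
        y = f_s @ self.W_d.T + self.b_d            # y = W_d f_s(x) + b_d
        return f, f_s, y


def suppression_aware_loss(x, f, y, c: float = 0.1):
    d = x.shape[-1]  # dimensionality of x
    return (y - x).pow(2).sum(-1).mean() / d + c * f.abs().sum(-1).mean()
```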
## References - Dauphin, Y. N., Fan, A., Auli, M., & Grangier, D. (2017). _Language Modeling with Gated Convolutional Networks_. arXiv preprint arXiv:1612.08083 [arxiv](https://arxiv.org/abs/1612.08083) - Erichson, N. B., Yao, Z., & Mahoney, M. W. (2019). _JumpReLU: A Retrofit Defense Strategy for Adversarial Attacks_. arXiv preprint arXiv:1904.03750 [arxiv](https://arxiv.org/abs/1904.03750) - Rajamanoharan, S., Conmy, A., Smith, L., Lieberum, T., Varma, V., Kramár, J., Shah, R., & Nanda, N. (2024). _Improving Dictionary Learning with Gated Sparse Autoencoders_. arXiv preprint arXiv:2404.16014 [arxiv](https://arxiv.org/abs/2404.16014) - Sharkey, L. (2024). _Addressing Feature Suppression in SAEs_. AI Alignment Forum. [\[post\]](https://www.alignmentforum.org/posts/3JuSjTZyMzaSeTxKk/addressing-feature-suppression-in-saes) - Shazeer, N. (2020). _GLU Variants Improve Transformer_. arXiv preprint arXiv:2002.05202 [arxiv](https://arxiv.org/abs/2002.05202) [^shrinkage]: If we hold $\hat{x}(\bullet)$ fixed, the L1 penalty pushes $f(x) \to 0$, while the reconstruction loss pushes $f(x)$ high enough to produce an accurate reconstruction. An optimal value lies somewhere in between. However, rescaling the [shrunk](https://aarnphm.xyz/thoughts/sparse-autoencoder/../../thoughts/mechanistic-interpretability#feature-suppression) feature activations ([Sharkey, 2024](#bib-sharkey2024feature)) is not necessarily enough to overcome the bias induced by L1: an SAE might learn sub-optimal encoder and decoder directions that are not improved by the fix. --- slug: thoughts/sparse-crosscoders tags: - interp description: reconstructed source of "https://aarnphm.xyz/thoughts/sparse-crosscoders" title: sparse crosscoders date: 2024-11-03 --- > [!tip] maturity > > a research preview from Anthropic; this is pretty much still a work in progress see also [reproduction on Gemma 2B](https://colab.research.google.com/drive/124ODki4dUjfi21nuZPHRySALx9I74YHj?usp=sharing) and [github](https://github.com/ckkissane/crosscoder-model-diff-replication) A variant of the [sparse autoencoder](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/sparse-autoencoder) that reads from and writes to multiple layers ([Lindsey et al., 2024](#bib-lindsey2024sparsecrosscoders)) Crosscoders produce shared features across layers and even across models Resolve: - cross-layer features: resolve cross-layer superposition - circuit simplification: remove redundant features from analysis and enable jumping over many uninteresting identity circuit connections - model diffing: produce shared sets of features across models. This includes one model at different points of training, as well as completely independent models with different architectures.
## motivations ### cross-layer [superposition](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/mechanistic-interpretability#superposition-hypothesis) ![](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/images/additive-residual-stream-llm.webp) _given the additive properties of transformers’ residual stream, **adjacent layers** in larger transformers can be thought of as “almost parallel”_ > [!tip]- intuition > > On the basis of the superposition hypothesis, a feature is a linear combination of neurons at any given layer. > > ![](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/images/feature-neurons.webp) ![](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/images/one-step-circuit.webp) ![](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/images/parallel-joint-branch.webp) _if we think of adjacent layers as being “almost parallel branches that potentially have superposition between them”, then we can apply dictionary learning jointly [^jointlysae]_ ### persistent features and complexity A current drawback of sparse autoencoders is that we have to train them against certain activation layers to extract features. In terms of the residual stream per layer, we end up with lots of duplicate features across layers. > Crosscoders can simplify the circuit _given that we use an appropriate architecture_ [^risks] ## setup. > Autoencoders and transcoders are special cases of crosscoders. > > - autoencoders: read and predict the same layer > - transcoders: read from layer $n$ and predict layer $n+1$ Crosscoders read/write to many layers, subject to causality constraints. > [!math]+ crosscoders > > Compute the vector of feature activations $f(x_j)$ on data point $x_j$ by summing over contributions of activations of different layers $a^l(x_j)$ for layers $l \in L$: > > $$ > \begin{aligned} f(x_j) &= \text{ReLU}(\sum_{l\in L}W_{\text{enc}}^l a^l(x_j) + b_{\text{enc}}) \\[8pt] &\because W^l_{\text{enc}} : \text{ encoder weights at layer } l \\[8pt] &\because a^l(x_j) : \text{ activation on datapoint } x_j \text{ at layer } l \\ \end{aligned} > $$ We have the loss $$ L = \sum_{l\in L} \|a^l(x_j) - a^{l^{\prime}}(x_j)\|^2 + \sum_{l\in L}\sum_i f_i(x_j) \|W^l_{\text{dec},i}\| $$ where $a^{l^{\prime}}(x_j)$ denotes the crosscoder’s reconstruction of layer $l$, and the regularization can be rewritten as: $$ \sum_{l\in L}\sum_{i} f_i(x_j) \|W^l_{\text{dec},i}\| = \sum_{i} f_i(x_j)(\displaystyle\sum_{l \in L} \|W^l_{\text{dec},i}\|) $$ _weighting the L1 regularization penalty by the L1 norm of per-layer decoder weight norms,_ $\sum\limits_{l\in L} \|W^l_{\text{dec},i}\|$ [^l2weightnorm] (a minimal code sketch of this objective follows the list of variants below) We use L1 due to: - baseline loss comparison: L2 exhibits lower loss than the sum of per-layer SAE losses, as models would effectively obtain a loss “bonus” by spreading features across layers - layer-wise sparsity surfaces layer-specific features: based on empirical results of [model diffing](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/sparse-crosscoders#model-diffing), L1 uncovers a mix of shared and model-specific features, whereas L2 tends to uncover only shared features. ## variants ![](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/images/crosscoders-variants.webp) good to explore: 1. strictly causal crosscoders to capture MLP computation and treat computation performed by attention layers as linear 2. combine strictly causal crosscoders for MLP outputs with weakly causal crosscoders for attention outputs 3. interpretable attention replacement layers that could be used in combination with strictly causal crosscoders for a “replacement model”
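A minimal sketch of the crosscoder objective above, assuming PyTorch; the layer count, dimensions, and sparsity coefficient are illustrative, and per-layer decoder biases are omitted for brevity:

```python
# a minimal sketch of the crosscoder objective above; assumes PyTorch,
# with illustrative layer count, dimensions, and sparsity coefficient.
import torch
import torch.nn as nn


class Crosscoder(nn.Module):
    def __init__(self, num_layers: int, n: int, m: int):
        super().__init__()
        # one encoder/decoder matrix per layer l in L, shared m-dim feature space
        self.W_enc = nn.Parameter(torch.randn(num_layers, m, n) * 0.01)
        self.W_dec = nn.Parameter(torch.randn(num_layers, n, m) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(m))

    def forward(self, a: torch.Tensor):
        # a: (num_layers, batch, n) holds the activations a^l(x_j) per layer
        # f(x_j) = ReLU(sum_l W_enc^l a^l(x_j) + b_enc)
        f = torch.relu(torch.einsum("lmn,lbn->bm", self.W_enc, a) + self.b_enc)
        # per-layer reconstruction a^{l'}(x_j) = W_dec^l f(x_j)
        a_hat = torch.einsum("lnm,bm->lbn", self.W_dec, f)
        return f, a_hat

    def loss(self, a, f, a_hat, lam: float = 2.0):
        # sum_l ||a^l - a^{l'}||^2, averaged over the batch
        recon = (a - a_hat).pow(2).sum(-1).sum(0).mean()
        # sum_i f_i(x_j) * sum_l ||W_dec,i^l||: L1 weighted by decoder norms
        dec_norms = self.W_dec.norm(dim=1).sum(dim=0)  # shape (m,)
        sparsity = lam * (f * dec_norms).sum(-1).mean()
        return recon + sparsity


coder = Crosscoder(num_layers=4, n=512, m=8192)
acts = torch.randn(4, 8, 512)  # activations for 4 layers, batch of 8
f, a_hat = coder(acts)
total = coder.loss(acts, f, a_hat)
```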
## model diffing see also: [model stitching](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/model-stiching) and [SVCCA](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/SVCCA) > ([Laakso & Cottrell, 2000](#bib-doi:10.1080/09515080050002726)) propose comparing [representations](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/representations) by transforming them into representations of distances between data points. [^sne] ## questions > How do features change over model training? When do they form? > As we make a model wider, do we get more features? Or are they largely the same, just packed less densely? ## References - Gorton, L. (2024). _The Missing Curve Detectors of InceptionV1: Applying Sparse Autoencoders to InceptionV1 Early Vision_. arXiv preprint arXiv:2406.03662 [arxiv](https://arxiv.org/abs/2406.03662) - Laakso, A., & Cottrell, G. (2000). Content and cluster analysis: Assessing representational similarity in neural systems. _Philosophical Psychology_, _13_(1), 47–76. - Lindsey, J., Templeton, A., Marcus, J., Conerly, T., Batson, J., & Olah, C. (2024). Sparse Crosscoders for Cross-Layer Features and Model Diffing. _Transformer Circuits Thread_. [\[link\]](https://transformer-circuits.pub/2024/crosscoders/index.html) [^jointlysae]: ([Gorton, 2024](#bib-gorton2024missingcurvedetectorsinceptionv1)) applies SAEs to study InceptionV1, where cross-branch superposition is significant in interpreting models with parallel branches [^risks]: the causal description it provides likely differs from that of the underlying model. [^l2weightnorm]: $\|W_{\text{dec},i}^l\|$ is the L2 norm of a single feature’s decoder vector at a given layer. In principle, one might have expected to use the L2 norm of the per-layer norms, $\sqrt{\sum_{l \in L} \|W_{\text{dec},i}^l\|^2}$ [^sne]: Chris Olah’s [blog post](https://colah.github.io/posts/2015-01-Visualizing-Representations/) explains how t-SNE can be used to visualize collections of networks in a function space. --- slug: thoughts/state-space-models tags: - ml description: reconstructed source of "https://aarnphm.xyz/thoughts/state-space-models" title: state-space models date: 2024-02-07 --- See [state-space/mamba](https://github.com/state-spaces/mamba) and [paper](https://arxiv.org/abs/2312.00752) Mamba uses a selective SSM scan. State-space duality (SSD): SSM + attention layers (SMA, or structured masked [attention](https://aarnphm.xyz/thoughts/state-space-models/../../thoughts/Attention)) --- slug: thoughts/tacit-knowledge tags: - seed description: reconstructed source of "https://aarnphm.xyz/thoughts/tacit-knowledge" title: tacit knowledge date: 2024-10-22 --- --- slug: thoughts/taste tags: - seed - pattern description: reconstructed source of "https://aarnphm.xyz/thoughts/taste" title: taste date: 2024-02-19 --- ## as guide. [Jacky’s post](https://jzhao.xyz/posts/aesthetics-and-taste) > We have built up an instinctive habit of looking things up and seeing how other people have done it before trying it for ourselves. But the downside is that this habit primes our brains to value our work in the context of the taste of others rather than of our own. We have outsourced our [value](https://aarnphm.xyz/thoughts/taste/../../thoughts/Value) systems for what is good and bad (how we may judge [aesthetic value](https://aarnphm.xyz/thoughts/taste/../../thoughts/aesthetic-value)) to other people.
> Looking at the history of scientific progress, we see plenty of evidence of how this reliance on the taste of committees and society broadly only serves to inhibit progress. Managed creativity can, at best, produce only what its managers specify. All that remains are the ideas that live in the Overton Window --- slug: thoughts/taxonomy tags: - seed description: reconstructed source of "https://aarnphm.xyz/thoughts/taxonomy" title: Taxonomy date: 2024-02-07 --- --- slug: thoughts/university/twenty-four-twenty-five/engineer-4a03/case-study tags: - engineer4a03 description: a case study into how surveillance capitalism drives one of the most influential controversies in data privacy of the 21st century title: Cambridge Analytica, a case study date: 2024-11-08 --- ## group The Cambridge Analytica scandal epitomises a dark reality regarding the ethical responsibilities of corporations operating within the framework of surveillance capitalism ([Zuboff, 2019](#bib-zuboff2019age)). Through what Zuboff calls “extraction practices” ([Zuboff, 2015, p. 78](#bib-doi:10.1057/jit.2015.5)), Cambridge Analytica harvested personal data from millions of Facebook users, treating individual privacy not as a right but as a commodity to be seized. [^1] As Zuboff argues, this new economic logic is fundamentally incompatible with democratic norms, as it concentrates unprecedented power in private companies while eliminating traditional reciprocities between corporations and people. The ethical responsibility of Facebook lies in its facilitation of an infrastructure that prioritizes data acquisition over user privacy ([Srnicek, 2017, p. 2; see expansion, monopolisation, invulnerabilities](#bib-srnicek2017platformcapitalism)). By designing a platform that encourages extensive data sharing and by failing to enforce strict oversight over third-party data access, Facebook normalized surveillance as a core aspect of its business model ([Couldry & Mejias, 2019](#bib-couldry2019costs)). This aligns with the principles of surveillance capitalism, where the commodification of personal information becomes a driving economic force, often at the expense of individual autonomy and privacy. Cambridge Analytica’s actions further exemplify the perils of surveillance capitalism by demonstrating how personal data can be weaponised to manipulate democratic processes. The firm’s use of regression-based ML algorithms to influence electoral outcomes highlights a significant ethical breach—transforming citizens from participants in a democracy into subjects of behavioral manipulation ([Susser et al., 2019](#bib-susser2019technology)). This not only undermines individual rights but also poses a threat to the integrity of democratic institutions. In a sense, Chris Wylie assumed significant ethical responsibilities as a whistleblower. By exposing the company’s unethical data practices, Wylie upheld a moral imperative to prevent harm to society and protect democratic processes. Whistleblowers often face substantial personal and professional risks, but their actions are vital in bringing unethical practices to light ([Vandekerckhove & Langenberg, 2012](#bib-vandekerckhove2012organize)). Wylie’s decision to reveal the inner workings of Cambridge Analytica provided transparency and prompted a global discourse on data privacy and the dangers of surveillance capitalism. Regulators and policymakers share in the ethical responsibility due to their delayed response to the evolving landscape of data privacy.
The lack of robust legal frameworks allowed surveillance capitalism to flourish unchecked, exposing vulnerabilities in data protection and user rights ([Acquisti et al., 2016](#bib-10.1257/jel.54.2.442)). The scandal underscores the urgent need for comprehensive regulations that address the complexities of data commodification in the digital age. ## References - Acquisti, A., Taylor, C., & Wagman, L. (2016). The Economics of Privacy. _Journal of Economic Literature_, _54_(2), 442–492. - Couldry, N., & Mejias, U. A. (2019). _The Costs of Connection: How Data Is Colonizing Human Life and Appropriating It for Capitalism_. Stanford University Press. - Srnicek, N. (2017). The challenges of platform capitalism: Understanding the logic of a new business model. _Juncture_, _23_(4), 254–257. - Susser, D., Roessler, B., & Nissenbaum, H. (2019). Technology, autonomy, and manipulation. _Internet Policy Review_, _8_(2). - Vandekerckhove, W., & Langenberg, S. (2012). Can We Organize Courage? Implications from Foucault’s Parrhesia. _Electronic Journal of Business Ethics and Organizational Studies_. - Zuboff, S. (2015). Big other: Surveillance Capitalism and the Prospects of an Information Civilization. _Journal of Information Technology_, _30_(1), 75–89. - Zuboff, S. (2019). _The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power_. PublicAffairs. [^1]: Surveillance capitalism operates by extracting surplus data from individuals—often without their explicit consent—and using it to predict and influence behavior for profit ([Zuboff, 2015, p. 81](#bib-doi:10.1057/jit.2015.5)). Facebook’s business model relied heavily on harvesting vast amounts of user data to drive targeted advertising, creating an environment ripe for exploitation. --- slug: thoughts/university/twenty-four-twenty-five/engineer-4a03/index tags: - university - engineer4a03 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/index" title: Engineering Ethics date: 2024-10-29 --- See also [ethics](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/index/../../../../../../../../thoughts/ethics), [literature review](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/index/../../../../../../../../thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review), [case study](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/index/../../../../../../../../thoughts/university/twenty-four-twenty-five/engineer-4a03/case-study) --- slug: thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review tags: - engineer4a03 description: How we understand machine learning systems is how we can move towards a safe future, yet on the road ahead lie many troubles to overcome. A literature review into the inception of the field, as well as where we go from here. title: machine learning, as inception of time, a literature review date: 2024-10-07 --- See also [essays on ChatGPT](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../posts/chatgpt), [case study on Cambridge Analytica](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/university/twenty-four-twenty-five/engineer-4a03/case-study) ## introduction.
To understand how AI is fundamentally political, we need to go beyond neural nets and statistical pattern recognition to instead ask _what_ is being optimized, and _for whom_, and _who_ gets to decide. Then we can trace the implications of those choices. -- Kate Crawford, _The Atlas of AI_ 1979’s “Star Trek: The Motion Picture” centered on its antagonist, V’Ger, an artificial entity that had outgrown its original programming and sought annihilation upon planet Earth. At its core, the movie is mostly fictional, yet its relevance to our current state of affairs is uncanny. Much in Artificial Intelligence (AI) has changed since the 1960s, including a shift from symbolic systems to the more recent hype around deep connectionist networks. AI has expanded rapidly as an academic field and as an industry[^1]. Yet the belief that human intelligence can be formalised and reproduced by machines has always been the core dispute in the history of AI. There have always been two narratives discussed among academics and industry practitioners on how we should approach such systems: the likes of Marvin Minsky claiming “machines can think” ([Crawford, 2021, pp. 5–9](#bib-atlasofai)); while Dreyfus ([Dreyfus, 2008](#bib-dreyfus2008why)) believed a Heideggerian AI system would dissolve the framing problem[^framing]. Nowadays, this narrative morphs into two verticals: entities that seek to build systems capable of outperforming humans at tasks, with a greater degree of accuracy and efficiency (OpenAI, Anthropic, SSI, many AI labs, etc.[^ssi]), and companies that build AI systems to amplify our abilities to create and to improve the efficiency of our work (Runway, Cohere, etc.). This literature review aims to provide a comprehensive overview of the current state of AI, through its history and current adoption. It also includes investigations into certain concerns for diversity, equity, and inclusion (DEI) within the field, as well as the ethical implications of AI systems. It concludes by positing questions about where we go from here. ## growth. _Mathematicians wish to treat matters of perception mathematically, and make themselves ridiculous \[...] the mind \[...] does it tacitly, naturally, and without technical rules._ -- Pascal, _Pensées_ The inception of [AI](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/Machine-learning) might well have begun with the belief that a total formalisation of knowledge must be possible[^2]. From Plato’s dichotomy of the rational soul from the body with its skills and intuition[^3], to Leibniz’s conception of the binary system as a “universal characteristic” ([Leibniz, 1951, pp. 15, 25, 38](#bib-leibniz_selections_1951)) that led to Babbage’s design of the “Analytical Engine”, recognized as the “first digital computer”, Alan Turing posited that a high-speed digital computer, programmed with rules, might exhibit [emergent behaviour](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/emergent-behaviour) of [intelligence](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/intelligence) ([Turing, 1950](#bib-10.1093/mind/lix.236.433)).
Thus a paradigm among researchers that focused on symbolic [reasoning](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/reason) was born, referred to as Good Old-Fashioned AI (GOFAI) ([Haugeland, 1997](#bib-10.7551/mitpress/4626.001.0001)). GOFAI was built on high-level symbolic representations of the world, popularized through expert systems ([Jackson, 1998](#bib-jackson_introduction_1998)) that tried to mimic human experts on specialized tasks [^4]. Yet we observed a period of “AI Winter”, where most symbolic AI research either reached dead ends or saw its funding dry up ([Hendler, 2008](#bib-handler2008avoidanotheraiwinter)). This is largely due to GOFAI’s semantic representations, which were implausible to scale to generalized tasks. Concurrently, Donald Norman’s Parallel Distributed Processing ([Rumelhart et al., 1986](#bib-10.7551/mitpress/5236.001.0001)) group investigated variations of Rosenblatt’s perceptron ([Rosenblatt, 1958](#bib-rosenblatt1958perceptron)), proposing intermediate processors within the network (often known as “hidden layers”) alongside inputs and outputs to extrapolate appropriate responses based on what the network had learned during training. These systems, built on top of statistical methods[^5] and connectionist networks, are often referred to by Haugeland as New-Fangled AI (NFAI) ([Haugeland, 1997](#bib-10.7551/mitpress/4626.001.0001)). In retrospect, GOFAI is [deterministic](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/Determinism) in the sense that intentionality is injected into symbolic tokens through explicit programming. Connectionist networks, on the other hand, are often considered black-box models, given the hidden nature of their intermediate representations. Unlike GOFAI, their internal representation is determined by the state of the entire network rather than any single unit. Given Moore’s Law and the exponential amounts of compute and data available, we are currently witnessing the dominance of connectionist networks, especially with the injection of LLMs into the mainstream ([Kaplan et al., 2020](#bib-kaplan2020scalinglawsneurallanguage)), where the majority of research is focused on developing artificial neural networks that optimize loss functions ([Vaswani et al., 2023](#bib-vaswani2023attentionneed)) ([Srivastava et al., 2014](#bib-srivastava_dropout_2014)). One notable example that combines both GOFAI and NFAI systems is AlphaZero, a connectionist-network-based Go-playing system that uses a deep neural network to assess new positions and Monte Carlo Tree Search (a GOFAI algorithm) to determine its next move ([Silver et al., 2017](#bib-silver2017masteringchessshogiselfplay)). ## adoption. For context, we produce a lot of data: social media consumption, email transactions, search, online shopping, mainly due to the rise of the internet and Web 2.0 post-9/11. While capitalism has always been a fraught system, there are incentives for harvesting our attention and predicting our future behaviour — what Zuboff refers to as “surveillance capitalism” ([Carr, 2019](#bib-carr2019thieves)). In a sense, surveillance capitalism is built on top of the notion of _extraction imperatives_, where the Googles and Facebooks of the world have to mine as much information as possible [^6].
Machine learning benefited from this phenomenon, since statistical methods detect patterns in given data and yield predictions/decisions. ML can be categorized into two subfields: supervised learning (where algorithms are trained on labelled data to make predictions based on given labels) and unsupervised learning (where algorithms learn patterns from unlabelled data)[^7]. Supervised learning methods, including Naive Bayes, decision trees, and other Bayesian models, have been well integrated into industry to solve forecasting and classification problems ([Wu et al., 2020](#bib-zhang2020labelingmethod)) ## fairness See also: MIT Technology Review ([Hao et al., 2019](#bib-haokarbuolamwini2019)), the Dartmouth investigation into the COMPAS system ([Dressel, 2018](#bib-doi:10.1126/sciadv.aao5580)) DEI has become a key aspect of technological progress in the $21^{\text{st}}$ century. This applies to AI, where the black-box nature of models has made it difficult for researchers to diagnose and correct certain bias bugs. Two main DEI methods have emerged for addressing these problems: improving data diversity and ensuring fairness during the training procedure. The primary method for fighting bias bugs in contemporary AI systems is increasing data diversity. There is a timeless saying in computer science, “[Garbage in Garbage out](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/Garbage-in-Garbage-out)”, which essentially states that bad data will produce outputs of equally poor quality. This is most prevalent in AI, given the black-box nature of these networks. One case of this is the very first iteration of Google Photos’ image recognition, which identified people with darker skin as “gorillas” ([BBC News, 2015](#bib-bbcgoogleapology2015)). Alliances such as The Data & Trust Alliance, including Meta, Nike, and CVS Health, have been formed to regulate and combat algorithmic bias. The Data & Trust Alliance aims to confront the dangers of powerful algorithms in the workforce before they can cause harm, instead of simply reacting after the damage is done (Lohr, 2021). Clarke (2021) proposed that these models should be closely inspected and regulated to mitigate misrepresentation of marginalized groups (Khan, 2022). Truth is, data lacks context. A prime example of this is COMPAS, used by US courts to assess the likelihood of a criminal reoffending. ProPublica concluded that COMPAS was inherently biased against those of African descent, citing that it overestimated the false positive rate for those of African descent twofold ([Angwin et al., 2016](#bib-angwinlarsonmattukirchner2016)). Interestingly, a study done at Dartmouth showed that random volunteers, given the same information as the COMPAS algorithm, predicted recidivism with surprisingly similar accuracy ([Dressel, 2018](#bib-doi:10.1126/sciadv.aao5580)). The question remains: how do we ensure fairness and DEI for marginalized groups when prejudice and subjectivity obviously introduce bias? It is not a problem we can’t solve; rather, collectively we should define what makes an algorithm **fair**. ## References - Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A Learning Algorithm for Boltzmann Machines. _Cognitive Science_, _9_(1), 147–169. - Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). How We Analyzed the COMPAS Recidivism Algorithm. _ProPublica_. - Aristotle. (2009).
_Nicomachean Ethics_ (L. Brown, Ed.; W. D. Ross, Trans.). Oxford University Press. - BBC News. (2015). Google apologises for Photos app’s racist blunder. _BBC News_. - Carr, N. (2019). Thieves of Experience: How Google and Facebook Corrupted Capitalism. _Los Angeles Review of Books_. - Crawford, K. (2021). _The Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence_. Yale University Press. - Dressel, J., & Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. _Science Advances_, _4_(1), eaao5580. - Dreyfus, H. L. (1972). _What Computers Can’t Do: A Critique of Artificial Reason_ (1st ed.). Harper & Row. - Dreyfus, H. L. (2008). Why Heideggerian AI Failed and How Fixing It Would Require Making It More Heideggerian. In _The Mechanical Mind in History_ (pp. 331–362). MIT Press. - Hao, K., Kar, J., & Buolamwini, J. (2019). Can you make AI fairer than a judge? Play our courtroom algorithm game. _MIT Technology Review_. - Haugeland, J. (1997). _Mind Design II: Philosophy, Psychology, and Artificial Intelligence_. The MIT Press. - Hendler, J. (2008). Avoiding Another AI Winter. _IEEE Intelligent Systems_, _23_(2), 2–4. - Jackson, P. (1998). _Introduction to Expert Systems_ (3rd ed., p. 542). Addison Wesley. - Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. _Science_, _349_(6245), 255–260. - Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). _Scaling Laws for Neural Language Models_. arXiv preprint arXiv:2001.08361 [arxiv](https://arxiv.org/abs/2001.08361) - Leibniz, G. W. (1951). _Leibniz Selections_ (P. P. Wiener, Ed.; p. 606). Charles Scribner’s Sons. - McKinsey & Company. (2024). McKinsey technology trends outlook 2024. _McKinsey Digital_. - Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. _Psychological Review_, _65_(6), 386–408. - Rumelhart, D. E., McClelland, J. L., & the PDP Research Group. (1986). _Parallel Distributed Processing, Volume 1: Explorations in the Microstructure of Cognition: Foundations_. The MIT Press. - Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., & Hassabis, D. (2017). _Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm_. arXiv preprint arXiv:1712.01815 [arxiv](https://arxiv.org/abs/1712.01815) - Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. _Journal of Machine Learning Research_, _15_(56), 1929–1958. - Turing, A. M. (1950). Computing Machinery and Intelligence. _Mind_, _LIX_(236), 433–460. - Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2023). _Attention Is All You Need_. arXiv preprint arXiv:1706.03762 [arxiv](https://arxiv.org/abs/1706.03762) - Wu, D., Wang, X., Su, J., Tang, B., & Wu, S. (2020). A Labeling Method for Financial Time Series Prediction Based on Trends. _Entropy_, _22_(10). [^1]: ([Jordan & Mitchell, 2015](#bib-jordan2015machine)) described the emerging trends within classical machine learning systems, focusing on recommendation systems.
A recent McKinsey outlook report on 2024 technology trends cited around 570 billion dollars of equity investment in the adoption of generative AI, notably the integration of LLMs into enterprise use cases ([McKinsey & Company, 2024](#bib-mckinsey2024techtrends)) [^framing]: An intelligent being learns from its experience, then applies such intuition to predict future events. How does one select the appropriate context (frame) for a given situation?\ Dreyfus’ argument is that machines are not yet able to represent humans’ reliance on many unconscious and subconscious processes ([Dreyfus, 1972](#bib-dreyfus1972what)). A Heideggerian AI would exhibit Dasein (being-in-the-world). [^ssi]: Their goals are to build “artificial super intelligence” (ASI) systems. This target is largely due to a certain observer-expectancy effect we observe in current AI systems. [^2]: According to Plato, Socrates asked Euthyphro, a fellow Athenian who is about to turn in his own father for murder in the name of piety: “I want to know what is characteristic of piety which makes all actions pious. \[…] that I may have it to turn to, and to use as a standard whereby to judge your actions and those of other men.” This is Socrates’ version of an [effective procedure](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/effective-procedure) for modern-day computer scientists. [^3]: According to Plato, all knowledge must be universally applicable, with explicit definitions; in other words, intuition and feeling do not constitute knowing. Aristotle differed from Plato in holding that intuition is necessary for applying theory to practice ([Aristotle, 2009, p. 8, book VI](#bib-aristotle_nicomachean_ethics)). For Plato, cooks, who proceed by taste and intuition, do not possess understanding, because they have no knowledge. Intuition is considered a mere belief. [^4]: Allen Newell and Herbert Simon’s work at RAND initially showed that computers can simulate important aspects of intelligence. [^5]: Notable figures include John Hopfield and Hinton’s “A Learning Algorithm for Boltzmann Machines” ([Ackley et al., 1985](#bib-ackley_learning_1985)), which introduced Boltzmann distributions in training neural networks, as well as Hinton’s later work on the backpropagation algorithm. [^6]: Some notable quotes: - “Unlike financial derivatives, which they in some ways resemble, these new data derivatives draw their value, parasite-like, from human experience.”. - “\[Facebook’s algorithm fine-tuning and data wrangling] is aimed at solving one problem: how and when to intervene in the state of play that is your daily life in order to modify your behavior and thus sharply increase the predictability of your actions now, soon, and later.” [^7]: This is a mere simplification of the field. ML researchers also investigate specific sub-fields --- slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/DBMS tags: - sfwr3db3 - university description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/DBMS" title: DBMS date: 2024-09-04 --- Book: Database Management Systems [ISBN-13:978-0072465631](https://www.amazon.ca/Database-Management-Systems-Raghu-Ramakrishnan/dp/0072465638) > [!tip] Midterm > > Thurs Oct. 24 2024 (during lecture time) Due at 2200, late penalty of 20% per 24h, max 5 days.
```bash ssh se3db3 ``` Relational Model, E-R Model, Views, Indexes, Constraints, Relational Algebra - 2.5 exabytes of [data](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/DBMS/../../../../../../../../thoughts/data) per day. ## [search](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/DBMS/../../../../../../../../thoughts/Search) vs. query - indexed keyword - [PageRank](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/DBMS/../../../../../../../../thoughts/PageRank) - data independence - fault tolerant - concurrency control for transactions - reliable storage to maintain semantics ## independence - logical: protection from changes in _logical_ structure - physical: protection from changes in _physical_ structure --- slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/Entity-Relationship-Models tags: - sfwr3db3 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/Entity-Relationship-Models" title: Entity-Relationship Models date: 2024-09-11 --- ## E/R model > sketch database schemas including constraints. - Entity set = rectangle - Attribute = oval, with a line to the rectangle (representing its entity set) ## relationship - connects two or more entity sets. - represented by a _diamond_ The value of a relationship is a **relationship set** ### many-to-many relationship > an entity of either set can be connected to many entities of the other set. ### many-to-one relationship > each entity of the first set can be connected to at most one entity of the second set, and each entity of the second set can be connected to many entities of the first set. --- slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/Keys-and-Foreign-Keys tags: - sfwr3db3 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/Keys-and-Foreign-Keys" title: Foreign Keys and Relational Models date: 2024-09-09 --- See also [slides](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/Keys-and-Foreign-Keys/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/relationalModel_Sept5.pdf) > A relation is a table Relations are **unordered** ⇒ relations are sets ## tuple and domain constraints - tuple: expresses conditions on the values of each tuple - domain constraint: a tuple constraint that involves a single attribute ```sql (GPA <= 4.0) AND (GPA >= 0.0) ``` ## unique identifier > A set of attributes $K$ is a _superkey_ for a relation $r$ if $r$ cannot contain two distinct tuples $t_1$ and $t_2$ such that $t_1{[K]} = t_2{[K]}$ > $K$ is a _(candidate) key_ for $r$ if $K$ is a minimal superkey ex: superkey of `RegNum` ## primary keys and `null` values > Presence of nulls in keys > [!tip] definition > > Each relation must have a **primary key** on which nulls are not allowed. > > notation: the attributes of the primary keys are _underlined_ ⇒ references between relations are realised through primary keys > [!note] Remark > > A set of fields is a _key_ for a relation if: > > 1. No two distinct tuples can have the same values in all key fields > 2. This is not true for any subset of the key (minimal) > > If condition 2 is false, then it is a _superkey_ > > If there’s > 1 key for a relation, one of the keys is chosen to be the _primary key_ Example: requirements: - For a given student and course, there is a single grade.
```sql CREATE TABLE Enrolled ( sid INTEGER, cid INTEGER, grade INTEGER, PRIMARY KEY (sid, cid), UNIQUE (cid, grade) ); ``` - Students can take only one course, and receive a single grade for that course; further, no two students in a course receive the same grade ```sql CREATE TABLE Enrolled ( sid INTEGER, cid INTEGER, grade INTEGER, PRIMARY KEY (sid), UNIQUE (cid, grade) ); ``` > ICs are validated when data is updated ## inter-relational constraints (foreign keys) Referential integrity constraints _are imposed in order to guarantee that **values** refer to existing tuples_ > [!note] Definition > > A _foreign key_ requires that the values on a set $X$ of attributes of a relation $R_1$ **must appear as values** for the _primary key_ of another relation $R_2$ Ex: _sid_ is a _foreign key_ referring to _Students_ > If all foreign key constraints are enforced ⇒ referential integrity is enforced ## enforcing referential integrity See also [source](https://www.ibm.com/docs/en/informix-servers/14.10?topic=integrity-referential) --- slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/a1/content tags: - sfwr3db3 - assignment description: some notes about entity-relationship models and foreign keys title: E/R models and keys date: 2024-09-26 --- **Problem 1**: Consider the relations `PLAYERS` and `PLAYS` given by the schemas below. - `PLAYERS (playerID, firstName, lastName, gender, DOB, height, weight, drafted)` - `PLAYS (playerID, teamID, teamName, number, position, startYear)` PLAYERS provides information on all basketball players in the league, giving the playerID, first name and last name of the player, the gender, the date of birth (DOB), the player’s height and weight, and the year they were drafted into the league. PLAYS provides information about which players play on which teams. A player with playerID plays on a team with a teamID and team name. The player has a number, the position they play on the team, and the year they started playing with this team. For example, playerID 5 plays with teamID 1, the Toronto Raptors, with the number 4, in the point guard position, since 2021. Given these schemas, answer the following questions: > [!question] 1.a (9 marks) > > Identify three candidate keys. For each candidate key, describe the key, and briefly state the assumptions or conditions under which each candidate key would be valid Candidate keys: 1. $\text{playerID}$ in the `PLAYERS` relation: - description: playerID is a sole attribute, so it is a minimal superkey, given that each player has a unique `playerID` - assumption: each player has a unique playerID 2. $\{\text{playerID}, \text{teamID}, \text{number}\}$ in the `PLAYS` relation: - description: $\{\text{playerID}, \text{teamID}, \text{number}\}$ is a minimal superkey given the assumption below. - assumption: A player uses the same number for their duration at a given team. 3. $\{\text{playerID}, \text{teamID}, \text{startYear}\}$ in the `PLAYS` relation: - description: $\{\text{playerID}, \text{teamID}, \text{startYear}\}$ uniquely identifies a tuple under the assumption below, making it a minimal superkey. - assumption: A player can only be associated with one team at a given period in time. > [!question] 1.b (6 marks) > > List three integrity constraints that should hold over these relations. For each constraint, describe in one sentence why your constraint is necessary. 1. `playerID` in `PLAYS` references `playerID` in `PLAYERS`: - reason: a foreign key constraint is necessary to ensure referential integrity; in other words, every player in `PLAYS` must exist in `PLAYERS` 2.
`drafted` in `PLAYERS` must be less than or equal to `startYear` in `PLAYS`: - reason: temporal integrity constraint, i.e., a player cannot start playing for a team before they were drafted into the league 3. $\{\text{teamID}, \text{number}\}$ in the `PLAYS` table must be unique per `playerID` - reason: uniqueness constraint, i.e., no two players on the same team have the same number at any point in time --- **Problem 2**: You will prepare an E-R diagram describing the schema of airline operations storing information in an airline database. MacAir Aviation manages flight operations, passenger services, fleet maintenance, and staff. The company, henceforth referred to as “MacAir”, has hired you to design their database. MacAir wants to store information about people, where a person is represented with a person ID, name, age, and phone number. There are four types of persons: passenger, pilot, cabin crew, and ground staff: - A passenger has a dietary preference (e.g., ‘Vegan’, ‘Gluten-Free’, ‘Lactose-Free’, etc.). - A pilot and a cabin crew member both have a position (e.g., ‘Captain’, ‘First Officer’, etc.) and a salary. - Ground staff have attributes for salary and department (e.g., Billing and Invoicing, Information Technology, etc.). An airline ticket has a 13-digit numeric ticket number, a seat number (e.g., 38A, 2E, etc.), and a class (‘E’, ‘B’, or ‘F’, representing economy, business, and first-class, respectively). Passengers book one or more tickets through a travel website (e.g., ‘Expedia’, ‘SkyScanner’, etc.) with an associated price. A ticket is bought by exactly one passenger. MacAir records an airline with an identifying alias, which is a 2-letter alphabetic code (‘AC’ for Air Canada), and the airline name (e.g., ‘Air Canada’). Airplanes have a serial number, a manufacturer, and a model (e.g. 737MAX). A pilot flies many airplanes, however, an airplane must be flown by at least one pilot. A cabin crew member works for at most one airline, and an airline has to have at least one cabin crew member working for it. An airline must own at least one airplane, but an airplane is owned by exactly one airline. A country has a code (a 3-letter alphabetic code, e.g., ‘CAN’ for Canada), a name, and a continent. An airport has an IATA code (International Air Transport Association, 3-letter alphabetic code, e.g., ‘YYZ’ for Toronto Pearson Airport), a name, and a city. A country has zero or more airports, however, an airport must be in exactly one country. An airline belongs to exactly one country, but a country can have many airlines. Ground staff work for at most one airport but an airport must have at least one ground staff. A (flight) route is represented with a numeric ID, the number of stops (e.g., 0 for nonstop), and the duration (in hours). A route contains exactly one source airport and exactly one destination airport (e.g., source airport: ’YYZ’, destination airport: ’MCO’). However, airports serve as the source or destination on many routes. An airline has many routes around the world, and a route is used by many airlines. The entity ‘Scheduled Flights’ contains all flights that serve a route. Scheduled flights are defined via an alpha-numeric flight number, departure date, arrival date, scheduled departure time, scheduled arrival time, actual departure time, and actual arrival time. A scheduled flight contains exactly one route, but a route participates in many (scheduled) flights. For example, the ‘YYZ’ to ‘MCO’ route appears in the scheduled flights for (AC1670, Sept.
13, Sept 13, 17:45, 20:35, 18:00, 20:50) Airlines use at least one scheduled flight to conduct operations, but a scheduled flight is associated to exactly one airline. A ticket is bought for exactly one (scheduled) flight, and there must be at least one ticket purchased for a (scheduled) flight. Baggage is associated to exactly one ticket. We record the type of bags (i.e., carry-on, checked, oversized, strollers), the total quantity of bags for each type (e.g., 2 carry-on bags, 2 checked bags, 1 stroller), the total weight of all bags for a type (e.g., 30kg for carry-on bags, 60kg for checked bags, 5kg for a stroller), and whether the bags (per type) are fragile. A ticket is associated to many (types of) bags. > [!question] 2.a > > Draw the ER diagram capturing the described requirements. You may use any drawing tool of your choice, but please ensure your ER diagram is clearly readable, and the notation you use is clear and consistent (i.e., notation from the lecture slides or textbook). > [!question] 2.b > > Give a brief (one sentence) description of each of your entities and relationships, and any constraints that exist. For example, $X$ is a weak entity with attributes $(a, b, c)$, and has a many-one relationship with $Y$ _Person_: denotes the meta definition of a person with attributes $(\text{id [PK], name, age, phone\_number})$ _Baggage_: is an entity with attributes $(\text{type}, \text{quantity}, \text{weight}, \text{is\_fragile})$, has a many-to-one relationship with _Ticket_ _Passenger_: is a subclass of _Person_, with attributes $(\text{dietary\_preference})$, has a one-to-many relationship with _Ticket_ _Ticket_: is a strong entity with attributes $(\text{ticket\_number [PK]}, \text{seat\_number, class, price, travel\_website})$, having a one-to-many relationship with _Baggage_ _Pilot_: is a subclass of _Person_, with attributes $(\text{position},\text{salary})$, has a “fly” one-to-many relationship with _airplane_ _Cabin Crew_: is a subclass of _Person_, with attributes $(\text{position},\text{salary})$, has a “work” many-to-one relationship with _airline_ _Ground Staff_: is a subclass of _Person_, with attributes $(\text{department},\text{salary})$, has a “work” many-to-one relationship with _airport_ _airport_: is a strong entity with attributes $(\text{iata\_code [PK, FK]}, \text{name [PK]}, \text{city})$, has a “has” one-to-many relationship with _Ground Staff_ and many-to-one with _country_ _country_: is a strong entity with attributes $(\text{code [PK]}, \text{name}, \text{continent})$, has a one-to-many relationship with _airline_ _airline_: is a strong entity with attributes $(\text{name}, \text{alias [PK]})$, has a one-to-many relationship with _scheduled\_flight_, and one-to-many with _airplane_ _airplane_: is a strong entity with attributes $(\text{serial\_number [PK]}, \text{manufacturer}, \text{model})$, has a many-to-one relationship with _pilot_ _flight\_route_: is a strong entity with attributes $(\text{id [PK]}, \text{stop, duration})$, has a one-to-many relationship with _scheduled\_flight_ and one-to-one with _airport_ through the relationships `source` and `dest` _scheduled\_flight_: is a strong entity with attributes: $$ \begin{aligned} (\text{flight\_number [PK]}, \text{departure\_date}, \text{arrival\_date} & \\ \text{scheduled\_departure\_time}, & \text{scheduled\_arrival\_time}, \\ \text{actual\_departure\_time}, & \text{actual\_arrival\_time}) \end{aligned} $$ has a one-to-many relationship with _flight\_route_ and one-to-many with _airport_ through the relationship `source` Constraints:
- All person IDs are unique. - An airline must own at least one airplane and have at least one cabin crew member. - An airplane must be flown by at least one pilot. - An airport must have at least one ground staff. - A scheduled flight must have at least one ticket purchased for it. - A country can have zero or more airports, but an airport must be in exactly one country. - An airline belongs to exactly one country. - A route contains exactly one source airport and one destination airport. - A scheduled flight contains exactly one route and is associated with exactly one airline. - A ticket is bought for exactly one scheduled flight and by exactly one passenger. > [!question] 2.c > > Provide the corresponding DB2 `CREATE TABLE` statements describing the relational schema. Please include all your statements in an executable script `airline.ddl` that can be run on the DB2 command line, in a single command. Ensure that your script runs on the CAS DB2 server. See also: [airline.ddl](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/a1/airline.ddl) --- slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/a2/content tags: - sfwr3db3 - assignment description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/a2/content" title: SQL and Relational Algebra date: 2024-11-11 --- ## 1. SQL > [!question] Q1 > > Find all passengers, between the ages of 20 and 30 (inclusive), who have a “Vegan” or “Vegetarian” dietary preference. Return their ID, name, and age. ```sql SELECT p.personid AS id, p.name, p.age FROM person p JOIN passenger pass ON p.personid = pass.personid WHERE p.age BETWEEN 20 AND 30 AND pass.dietarypref IN ('Vegan', 'Vegetarian') ORDER BY p.personid; ``` > [!question] Q2 > > a. Find the number of airplanes that exist for each model. Return the model and the count for each model. b. Extend your query from (a) to find the number of airplanes in each model for any of the following airlines: ‘Air Canada’, ‘Etihad Airways’, or ‘United Airlines’. Return the name of the airline, the model, and the number of airplanes. ```sql -- Q2a SELECT model, COUNT(*) AS numairplanes FROM airplane GROUP BY model ORDER BY model; -- Q2b SELECT a.name AS airlinename, p.model, COUNT(*) AS numairplanes FROM airplane p JOIN airline a ON p.airlinealias = a.alias WHERE a.name IN ('Air Canada', 'Etihad Airways', 'United Airlines') GROUP BY a.name, p.model ORDER BY a.name, p.model; ``` > [!question] Q3 > > a. For each “Air Canada” ticket, find the average of the total weight, for all baggage associated to the ticket. Return the ticket number, and the average total (baggage) weight. b. Find all tickets with “Oversized”, non-fragile baggage with a total weight (strictly) greater than 90 lbs, during the holiday season from Dec. 10, 2023 to Jan. 3, 2024 (inclusive). Return all qualifying ticket numbers, and the total `(Oversized)` baggage weight.
```sql -- Q3a SELECT t.ticketno, AVG(b.totalweight) AS AverageBaggageWeight FROM ticket t JOIN scheduledflight sf ON t.flightno = sf.flightno AND t.flightdepdate = sf.depdate JOIN airline a ON sf.airlinealias = a.alias LEFT JOIN baggage b ON t.ticketno = b.ticketno WHERE a.name = 'Air Canada' GROUP BY t.ticketno ORDER BY t.ticketno; -- Q3b SELECT b.ticketno, b.totalweight AS OversizedBaggageWeight FROM baggage b JOIN ticket t ON b.ticketno = t.ticketno JOIN scheduledflight sf ON t.flightno = sf.flightno AND t.flightdepdate = sf.depdate WHERE b.bagtype = 'Oversized' AND b.fragile = FALSE AND b.totalweight > 90 AND sf.depdate BETWEEN '2023-12-10' AND '2024-01-03' ORDER BY b.ticketno; ``` > [!question] Q4 > > Where and when are the cheapest tickets for flights from Toronto “YYZ” to Orlando “MCO”? Return the ticket number, the date of departure, the minimum price (rename to min-Price), and the website where the ticket(s) were purchased. ```sql WITH MinPriceFlights AS ( -- First find the minimum price for this route SELECT MIN(b.Price) as min_price FROM Route r JOIN ScheduledFlight sf ON r.RouteID = sf.RouteID JOIN Ticket t ON sf.FlightNo = t.FlightNo AND sf.DepDate = t.FlightDepDate JOIN Book b ON t.TicketNo = b.TicketNo WHERE r.srcAirport = 'YYZ' AND r.dstAirport = 'MCO' ) SELECT t.TicketNo, sf.DepDate as DepartureDate, b.Price as minPrice, b.Website FROM Route r JOIN ScheduledFlight sf ON r.RouteID = sf.RouteID JOIN Ticket t ON sf.FlightNo = t.FlightNo AND sf.DepDate = t.FlightDepDate JOIN Book b ON t.TicketNo = b.TicketNo CROSS JOIN MinPriceFlights mpf WHERE r.srcAirport = 'YYZ' AND r.dstAirport = 'MCO' AND b.Price = mpf.min_price ORDER BY sf.DepDate; ``` > [!question] Q5 > > a. Which routes are served by at least three airlines? Return the routeID, and display your results in descending order by the number of airlines. b. Which routes are not served by any airline? Return the routeID, the source and destination airports ```sql -- Q5a SELECT u.RouteID, COUNT(DISTINCT u.AirlineAlias) as NumAirlines FROM Use u GROUP BY u.RouteID HAVING COUNT(DISTINCT u.AirlineAlias) >= 3 ORDER BY NumAirlines DESC; -- Q5b SELECT r.RouteID, r.srcAirport as SourceAirport, r.dstAirport as DestinationAirport FROM Route r LEFT JOIN Use u ON r.RouteID = u.RouteID WHERE u.AirlineAlias IS NULL ORDER BY r.RouteID; ``` > [!question] Q6 > > a. Find the number of distinct passengers who also work as either a pilot, cabin crew, or ground staff. Rename this result as NumStaffPassengers. b. For each airline, how many pilots or cabin crew are also passengers? Return the airline (alias), and the corresponding count ```sql -- Q6a SELECT COUNT(DISTINCT p.PersonID) as NumStaffPassengers FROM Passenger p WHERE p.PersonID IN ( SELECT PersonID FROM Pilot UNION SELECT PersonID FROM CabinCrew UNION SELECT PersonID FROM GroundStaff ); -- Q6b SELECT a.Alias as AirlineAlias, COUNT(DISTINCT p.PersonID) as StaffPassengerCount FROM Airline a LEFT JOIN ( -- Get all pilots and cabin crew SELECT PersonID, AirlineAlias FROM CabinCrew UNION -- For pilots, we need to get their airline through the planes they fly SELECT DISTINCT pi.PersonID, ap.AirlineAlias FROM Pilot pi JOIN Flies f ON pi.PersonID = f.PilotID JOIN Airplane ap ON f.AirplaneSNo = ap.SerialNo ) AS staff ON a.Alias = staff.AirlineAlias -- Join with Passenger to check which staff are also passengers JOIN Passenger pass ON staff.PersonID = pass.PersonID GROUP BY a.Alias ORDER BY a.Alias; ``` > [!question] Q7 > > a. 
> [!question] Q7
>
> a. Find all the one-way routes operated by airline “ACA”, i.e., airline alias = ‘ACA’. In this context, a one-way route is where the airline serves from a source airport to a destination airport, but not in the reverse direction. Return the route ID, and the corresponding source and destination airports, respectively.
>
> b. Find the most popular route where the departure date lies between “2023-12-01” to “2023-12-31” (inclusive). Popularity is defined as the maximum number of tickets purchased during this time duration. Return the route ID, the corresponding source and destination airports, and the number of tickets sold along this route.

```sql
-- Q7a
SELECT r1.RouteID, r1.srcAirport as SourceAirport, r1.dstAirport as DestinationAirport
FROM Route r1
JOIN Use u1 ON r1.RouteID = u1.RouteID
WHERE u1.AirlineAlias = 'ACA'
  AND NOT EXISTS (
    -- Check if reverse route exists
    SELECT 1
    FROM Route r2
    JOIN Use u2 ON r2.RouteID = u2.RouteID
    WHERE u2.AirlineAlias = 'ACA'
      AND r2.srcAirport = r1.dstAirport
      AND r2.dstAirport = r1.srcAirport
  )
ORDER BY r1.RouteID;

-- Q7b
WITH RouteTickets AS (
  -- Count tickets per route in December 2023
  SELECT r.RouteID, r.srcAirport, r.dstAirport, COUNT(*) as TicketCount
  FROM Route r
  JOIN ScheduledFlight sf ON r.RouteID = sf.RouteID
  JOIN Ticket t ON sf.FlightNo = t.FlightNo AND sf.DepDate = t.FlightDepDate
  WHERE sf.DepDate BETWEEN '2023-12-01' AND '2023-12-31'
  GROUP BY r.RouteID, r.srcAirport, r.dstAirport
),
MaxTickets AS (
  -- Find the maximum ticket count
  SELECT MAX(TicketCount) as MaxCount FROM RouteTickets
)
SELECT rt.RouteID, rt.srcAirport as SourceAirport, rt.dstAirport as DestinationAirport, rt.TicketCount as NumberOfTickets
FROM RouteTickets rt, MaxTickets mt
WHERE rt.TicketCount = mt.MaxCount
ORDER BY rt.RouteID;
```

> [!question] Q8
>
> a. Which Air Canada (alias “ACA”) flights from source airport “YYZ” to destination airport “MCO” have “First” class tickets? Return all satisfying flight numbers.
>
> b. Find all airlines that are unique to their country (i.e., they are the only airline for their country). Return the airline alias, airline name, and the country name.

```sql
-- Q8a (note: the original script labeled two Q8b variants and omitted this
-- query; this reconstruction assumes Ticket carries a Class attribute)
SELECT DISTINCT sf.FlightNo
FROM Route r
JOIN ScheduledFlight sf ON r.RouteID = sf.RouteID
JOIN Ticket t ON sf.FlightNo = t.FlightNo AND sf.DepDate = t.FlightDepDate
WHERE sf.AirlineAlias = 'ACA'
  AND r.srcAirport = 'YYZ'
  AND r.dstAirport = 'MCO'
  AND t.Class = 'First';

-- Q8b (CTE formulation)
WITH AirlinesPerCountry AS (
  -- Count airlines per country
  SELECT c.Code as CountryCode, c.Name as CountryName, COUNT(*) as AirlineCount
  FROM Country c
  JOIN Airline a ON c.Code = a.CountryCode
  GROUP BY c.Code, c.Name
  HAVING COUNT(*) = 1
)
SELECT a.Alias as AirlineAlias, a.Name as AirlineName, apc.CountryName
FROM Airline a
JOIN AirlinesPerCountry apc ON a.CountryCode = apc.CountryCode
ORDER BY apc.CountryName, a.Name;

-- Q8b (equivalent NOT EXISTS formulation)
SELECT a1.Alias as AirlineAlias, a1.Name as AirlineName, c.Name as CountryName
FROM Airline a1
JOIN Country c ON a1.CountryCode = c.Code
WHERE NOT EXISTS (
  SELECT 1
  FROM Airline a2
  WHERE a2.CountryCode = a1.CountryCode
    AND a2.Alias != a1.Alias
)
ORDER BY c.Name, a1.Name;
```

## 2. Relational Algebra

> [!question] Question
>
> For queries Q1 - Q6, give the corresponding relational algebra expression

### Q1

$$
\begin{align}
& R_1 = \text{Person} \bowtie_{\text{Person.PersonID} = \text{Passenger.PersonID}} \text{Passenger} \\[6pt]
& R_2 = \sigma_{\substack{ \text{Age} \geq 20 \\ \wedge \, \text{Age} \leq 30 \\ \wedge \, \big(\text{DietaryPref} = \text{'Vegan'} \\ \phantom{\wedge \,} \vee \, \text{DietaryPref} = \text{'Vegetarian'}\big) }} (R_1) \\[6pt]
& \text{Result} = \pi_{\text{PersonID}, \, \text{Name}, \, \text{Age}} (R_2)
\end{align}
$$

### Q2

a.

$$
\gamma_{\text{Model}, \text{count}(*) \rightarrow \text{NumAirplanes}}(\text{Airplane})
$$

b.
$$
\begin{align}
& R_1 = \text{Airplane} \bowtie_{\text{AirlineAlias = Alias}} \text{Airline} \\[6pt]
& R_2 = \sigma_{\substack{ \text{Name} = \text{'Air Canada'} \\ \vee \, \text{Name} = \text{'Etihad Airways'} \\ \vee \, \text{Name} = \text{'United Airlines'} }} (R_1) \\[6pt]
& \text{Result} = \gamma_{\substack{ \text{Name}, \text{Model}, \\ \text{count}(*) \rightarrow \text{NumAirplanes} }} (R_2)
\end{align}
$$

### Q3

a.

$$
\begin{align}
& R_1 = \text{Ticket} \bowtie_{ \substack{ \text{FlightNo = FlightNo} \\ \wedge \, \text{FlightDepDate = DepDate} }} \text{ScheduledFlight} \\[6pt]
& R_2 = R_1 \bowtie_{\text{AirlineAlias = Alias}} \text{Airline} \\[6pt]
& R_3 = R_2 \bowtie_{\text{Ticket.TicketNo = Baggage.TicketNo}} \text{Baggage} \\[6pt]
& R_4 = \sigma_{\text{Name} = \text{'Air Canada'}} (R_3) \\[6pt]
& R_5 = \pi_{\text{TicketNo}, \text{TotalWeight}} (R_4) \\[6pt]
& \text{Result} = \\
& \quad \gamma_{\text{TicketNo}, \, \text{avg}(\text{TotalWeight}) \rightarrow \text{AverageBaggageWeight}} (R_5)
\end{align}
$$

_NOTE_: the join producing $R_3$ (with Baggage) should be a **left outer join** (`\leftouterjoin`), but the current LaTeX renderer doesn't support it.

b.

$$
\begin{align}
& R_1 = \text{Ticket} \bowtie_{ \substack{ \text{FlightNo = FlightNo} \\ \wedge \, \text{FlightDepDate = DepDate} } } \text{ScheduledFlight} \\[6pt]
& R_2 = \text{Baggage} \bowtie_{\text{TicketNo = TicketNo}} R_1 \\[6pt]
& R_3 = \sigma_{\substack{ \text{BagType} = \text{'Oversized'} \\ \wedge \, \text{Fragile} = \text{False} \\ \wedge \, \text{TotalWeight} > 90 \\ \wedge \, \text{DepDate} \geq \text{'2023-12-10'} \\ \wedge \, \text{DepDate} \leq \text{'2024-01-03'} }} (R_2) \\[6pt]
& \text{Result} = \pi_{\text{TicketNo, TotalWeight}} (R_3)
\end{align}
$$

### Q4

$$
\begin{align}
& R_1 = \sigma_{\substack{\text{srcAirport} = \text{'YYZ'} \\ \land \, \text{dstAirport} = \text{'MCO'}}} (\text{Route}) \\[6pt]
& R_2 = R_1 \bowtie_{\text{Route.RouteID} = \text{ScheduledFlight.RouteID}} \text{ScheduledFlight} \\[6pt]
& R_3 = R_2 \bowtie_{ \substack{ \text{ScheduledFlight.FlightNo} = \text{Ticket.FlightNo} \\ \land \, \text{ScheduledFlight.DepDate} = \text{Ticket.FlightDepDate} }} \text{Ticket} \\[6pt]
& R_4 = R_3 \bowtie_{\text{Ticket.TicketNo} = \text{Book.TicketNo}} \text{Book} \\[6pt]
& \text{MinPrice} = \mathcal{G}_{\emptyset, \, \text{min\_price} \leftarrow \text{MIN(Price)}} \Big( \Pi_{\text{Price}} (R_4) \Big) \\[6pt]
& \text{Result} = \\
& \quad \Pi_{ \substack{ \text{TicketNo}, \, \text{DepDate} \rightarrow \text{DepartureDate}, \\ \text{Price} \rightarrow \text{minPrice}, \, \text{Website} }} \Big( \sigma_{\text{Price} = \text{min\_price}} (R_4 \times \text{MinPrice}) \Big)
\end{align}
$$

### Q5

a.

$$
\begin{align}
R_1 &= \Pi_{\text{RouteID}, \text{AirlineAlias}} (\text{Use}) \\[8pt]
R_2 &= \mathcal{G}_{\text{RouteID}, \text{NumAirlines} \leftarrow \text{COUNT}(\text{AirlineAlias})} (R_1) \\
\text{Result} &= \Pi_{\text{RouteID}} (\sigma_{\text{NumAirlines} \geq 3} (R_2))
\end{align}
$$

b.

$$
\begin{align}
R_1 &= \text{Route} \: \bowtie_{\text{Route.RouteID = Use.RouteID}} \: \text{Use} \\[6pt]
R_2 &= \sigma_{\text{AirlineAlias} \: \text{IS} \: \text{NULL}} (R_1) \\[6pt]
\text{Result} &= \\
& \quad \Pi_{\text{RouteID}, \, \substack{ \text{srcAirport} \rightarrow \text{SourceAirport}, \\ \text{dstAirport} \rightarrow \text{DestinationAirport} }} (R_2)
\end{align}
$$

_NOTE_: the join producing $R_1$ should be a **left outer join** (`\leftouterjoin`), but the current LaTeX renderer doesn't support it.

### Q6

a.
$$
\begin{align}
& \text{Staff} = \\
& \quad \Pi_{\text{PersonID}} (\text{Pilot}) \space \cup \\
& \quad \Pi_{\text{PersonID}} (\text{CabinCrew}) \space \cup \\
& \quad \Pi_{\text{PersonID}} (\text{GroundStaff}) \\[6pt]
& \text{StaffPassengers} = \\
& \quad \Pi_{\text{PersonID}} (\text{Passenger}) \cap \text{Staff} \\[6pt]
& \text{Result} = \\
& \quad \mathcal{G}_{\emptyset, \, \text{NumStaffPassengers} \leftarrow \text{COUNT(PersonID)}} (\text{StaffPassengers})
\end{align}
$$

b.

$$
\begin{align}
& \text{CabinCrewWithAirline} = \\
& \quad \Pi_{\text{PersonID}, \, \text{AirlineAlias}} (\text{CabinCrew}) \\[6pt]
& \text{PilotsWithPlanes} = \\
& \quad \Pi_{\text{PersonID}, \, \text{AirlineAlias}} (\\
& \qquad \text{Pilot} \bowtie_{\text{Pilot.PersonID} = \text{Flies.PilotID}} \text{Flies} \\
& \qquad \bowtie_{\text{Flies.AirplaneSNo} = \text{Airplane.SerialNo}} \text{Airplane}\\
& \quad ) \\[6pt]
& \text{AllStaffWithAirline} = \\
& \quad \text{CabinCrewWithAirline} \cup \text{PilotsWithPlanes} \\[6pt]
& \text{StaffPassengers} = \\
& \quad \text{AllStaffWithAirline} \bowtie_{\text{PersonID}} \Pi_{\text{PersonID}} (\text{Passenger}) \\[6pt]
& \text{Result} = \\
& \quad \mathcal{G}_{\text{AirlineAlias}, \, \text{StaffPassengerCount} \leftarrow \text{COUNT(PersonID)}} (\text{StaffPassengers})
\end{align}
$$

## 3. Indexes

The following are two possible indexes:

### $\text{(FlightNo, DepDate)}$ on `ScheduledFlight` table

- Attributes: (FlightNo, DepDate) on the `ScheduledFlight` table
- Properties: composite index over both attributes; clustered
- Benefits:
  - Q3, Q4, Q7b, since these queries heavily join with ScheduledFlight and filter on departure dates
  - the composite nature supports queries that use both FlightNo and DepDate in joins (frequent, due to the foreign-key relationship with the Ticket table)
  - these fields are part of the primary key of ScheduledFlight and are frequently used in joins with Ticket
  - helps with range scans on DepDate

### $\text{(RouteID, AirlineAlias)}$ on `Use` table

- Attributes: (RouteID, AirlineAlias) on the `Use` table
- Properties: composite index; unclustered
- Benefits:
  - Q5a, Q5b, Q7a, and indirectly Q4, since these rely on the route-airline relationship
  - Q5a needs to count distinct airlines per route, so this index eliminates a full scan
  - Q7a looks for ACA's routes, so this index provides direct access
  - being unclustered is appropriate, as `Use` is frequently accessed for lookups but doesn't require physical ordering
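A minimal DDL sketch of these two indexes (the index names are made up; in DB2, the `CLUSTER` option marks an index as the table's clustered index):

```sql
-- composite, clustered index on ScheduledFlight
CREATE INDEX idx_sf_flightno_depdate ON ScheduledFlight (FlightNo, DepDate) CLUSTER;

-- composite, unclustered index on Use
CREATE INDEX idx_use_routeid_airline ON Use (RouteID, AirlineAlias);
```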
---
slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/index
tags:
  - university
  - sfwr3db3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/index"
title: Databases
date: 2024-10-29
---

See also [databases](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/index/../../../../../../../../thoughts/databases)

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm
tags:
  - sfwr3db3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm"
title: databases internals
date: 2024-10-23
---

## Practice

Q1.

- a. F
- b. F (correction: should be T. A relation R(A,B,C) **may** have at most three minimal keys, not superkeys)
- c. T
- d. T
- e. T (any op involving a null yields null)
- f. F (DML: data manipulation, not management)
- g. F (a weak entity set needs one or more many-one supporting relationships, not many-many)
- h. F

Q3.

```prolog
Product(maker, model, price)
PC(model, speed)
Printer(model, type)
```

- model is the PK for all relations
- `type` is either “laser” or “ink-jet”
- every PC model and every printer model is a Product model (every PC/printer must be referenced in relation to Product)
- the price of a product should not be more than 10% higher than the average price of all products (the average price of all products is given as the value avgPrice)
- model and price are int; all other attributes are of type char(20)

```sql title="create schema"
create table Product(
  model INTEGER NOT NULL PRIMARY KEY,
  maker CHAR(20),
  price INTEGER CHECK (price <= 1.10 * avgPrice) -- avgPrice: the given average price
);
create table PC(
  model INTEGER NOT NULL PRIMARY KEY,
  speed CHAR(20),
  FOREIGN KEY(model) REFERENCES Product(model)
);
create table Printer(
  model INTEGER NOT NULL PRIMARY KEY,
  type CHAR(20) CHECK (type IN ('laser', 'ink-jet')),
  FOREIGN KEY(model) REFERENCES Product(model)
);
```

```sql title="find makers from whom a combination (PC and Printer) can be bought for less than 2000"
SELECT DISTINCT p.maker
FROM Product p
WHERE EXISTS (
  SELECT *
  FROM PC pc, Printer pr, Product p1, Product p2
  WHERE p1.model = pc.model AND p2.model = pr.model
    AND p1.price + p2.price < 2000
    AND p1.maker = p.maker AND p2.maker = p.maker
);
```

```sql title="For each maker, find the min and max price of a (PC, ink-jet printer) combination"
SELECT p1.maker, MIN(p1.price + p2.price), MAX(p1.price + p2.price)
FROM Product p1, Product p2, PC pc, Printer pr
WHERE pr.type = 'ink-jet'
  AND p1.model = pc.model
  AND p2.model = pr.model
  AND p1.maker = p2.maker
GROUP BY p1.maker
ORDER BY p1.maker;
```

Q4.

a. (1,3)

b. cartesian products

# Foreign Keys and Relational Models

See also [slides](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/Keys-and-Foreign-Keys/../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/relationalModel_Sept5.pdf)

> A relation is a table

Relations are **unordered** ⇒ relations are sets

## tuple and domain constraints

- tuple: expresses conditions on the values of each tuple
- domain constraint: a tuple constraint that involves a single attribute

```sql
(GPA <= 4.0) AND (GPA >= 0.0)
```

## unique identifier

> A set $K$ of attributes is a _superkey_ for a relation $r$ if $r$ cannot contain two distinct tuples $t_1$ and $t_2$ such that $t_1{[K]} = t_2{[K]}$
>
> $K$ is a _(candidate) key_ for $r$ if $K$ is a minimal superkey

ex: superkey of `RegNum`

## primary keys and `null` values

> Presence of nulls in keys

> [!tip] definition
>
> Each relation must have a **primary key** on which nulls are not allowed.
>
> notation: the attributes of the primary key are _underlined_

⇒ references between relations are realised through primary keys

> [!note] Remark
>
> A set of fields is a _key_ for a relation if:
>
> 1. No two distinct tuples can have the same values in all key fields
> 2. This is not true for any subset of the key (minimality)
>
> If #2 is false, then it is a _superkey_
>
> If there's > 1 key for a relation, one of the keys is chosen to be the _primary key_

Example: requirements:

- For a given student and course, there is a single grade.
```sql
CREATE TABLE Enrolled (
  sid INTEGER,
  cid INTEGER,
  grade INTEGER,
  PRIMARY KEY (sid, cid),
  UNIQUE (cid, grade)
);
```

- Students can take only one course and receive a single grade for that course; further, no two students in a course receive the same grade

```sql
CREATE TABLE Enrolled (
  sid INTEGER,
  cid INTEGER,
  grade INTEGER,
  PRIMARY KEY (sid),
  UNIQUE (cid, grade)
);
```

> ICs (integrity constraints) are validated when data is updated

## referential integrity constraints (foreign keys)

Referential integrity constraints _are imposed in order to guarantee that **values** refer to existing tuples_

> [!note] Definition
>
> A _foreign key_ requires that the values on a set $X$ of attributes of a relation $R_1$ **must appear as values** for the _primary key_ of another relation $R_2$

Ex: _sid_ is a _foreign key_ referring to _Students_

> If all foreign key constraints are enforced ⇒ referential integrity is enforced

## enforcing referential integrity

See also [source](https://www.ibm.com/docs/en/informix-servers/14.10?topic=integrity-referential)

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/Keys-and-Foreign-Keys)

## [ER Model](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/Entity-Relationship-Models)

> A weak entity doesn't have enough information to have its own PK, and relies on a supporting entity for unique identification

> [!tip] Weak Entity
>
> for a weak entity we need one (or more) many-to-one (supporting) relationship(s) to other (supporting) entity sets

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/weak-entity.webp)

Role: an entity set may appear more than once in a relationship (label each edge to the relationship)

## sql.

```sql
create table Beers (
  name CHAR(20) PRIMARY KEY, -- fixed-length string of $n$ characters
  manf VARCHAR(20)           -- variable-length string of up to $n$ characters
);

create table Sells (
  bar CHAR(20),
  beer CHAR(20) REFERENCES Beers(name),
  price REAL NOT NULL,
  PRIMARY KEY (bar, beer)
);

-- or
create table Sells (
  bar CHAR(20),
  beer CHAR(20),
  price REAL NOT NULL,
  PRIMARY KEY (bar, beer),
  FOREIGN KEY(beer) REFERENCES Beers(name)
);
```

> [!tip] values
>
> any value can be `NULL`, unless specified otherwise

> [!tip] PRIMARY KEYS vs. UNIQUE.
>
> - 1 PK per relation, but several UNIQUEs
> - No attribute of a PK can be NULL
> - Attributes declared UNIQUE may have NULL

### DATE and TIME

```sql
DATE 'yyyy-mm-dd'
TIME 'hh:mm:ss'
```

### constraints.

- keys
- foreign keys
- domain
- tuple-based
- assertions
- the `REFERENCES`-ed attribute ==**must be**== `PRIMARY KEY` or `UNIQUE`

```prolog
FOREIGN KEY (attributes) REFERENCES relation(attributes)
```

**enforcing** constraints from relation $R$ to relation $S$, the following violations are possible:

1. an insert/update to $R$ introduces values not found in $S$
2. a deletion/update to $S$ causes tuples of $R$ to “dangle”

ex: suppose $R = \text{Sells}$ and $S = \text{Beers}$

_a delete or update to $S$ removes a beer value found in some tuples of $R$_

actions:

1. _Default_: reject the modification
2. `CASCADE`: make the same changes in Sells
   - Delete beer: delete the Sells tuple
   - Update beer: change the value in Sells
3. `SET NULL`: change beer to `NULL`

> Choose either `CASCADE` or `SET NULL` as the policy, otherwise reject is the default

```sql
create table Sells (
  bar CHAR(20),
  beer CHAR(20) CHECK (beer IN (SELECT name FROM Beers)),
  price REAL CHECK (price <= 5.00),
  FOREIGN KEY(beer) REFERENCES Beers(name)
    ON DELETE SET NULL
    ON UPDATE CASCADE
);
```

> [!tip] attribute-based check
>
> `CHECK()`: the condition may use the attribute's own name, but **any other relation/attribute name MUST BE IN a subquery**
>
> `CHECK` only runs when a value for that attribute is inserted or updated.

> [!note] Tuple-based checks
>
> added as a relation-schema element
>
> checked on insert or update only

```sql
create table Sells (
  bar CHAR(20),
  beer CHAR(20),
  price REAL,
  CHECK (bar = 'Joe''s Bar' OR price <= 5.00)
);
```

### queries

```sql
SELECT name FROM Beers WHERE manf = 'Anheuser-Busch';

SELECT t.name FROM Beers t WHERE t.manf = 'Anheuser-Busch';

SELECT * FROM Beers WHERE manf = 'Anheuser-Busch';

SELECT name AS beer, manf FROM Beers WHERE manf = 'Anheuser-Busch';

SELECT bar, beer, price*95 AS priceInYen FROM Sells;

-- constants as expr (using Likes(drinker,beer))
SELECT drinker, 'likes Bud' as whoLikesBud FROM Likes WHERE beer = 'Bud';
```

> [!note] patterns
>
> `%` is any string, and `_` is any character
>
> ```sql
> SELECT name FROM Drinkers
> WHERE phone LIKE '%555-_ _ _ _';
> ```

> In SQL, logic is 3-valued: TRUE, FALSE, UNKNOWN
>
> - comparing any value with `NULL` yields `UNKNOWN`
> - a tuple is in a query answer iff the `WHERE` clause is `TRUE`

`ANY()` and `ALL()` express any-of and all-of comparisons over a subquery result.

> [!tip]
>
> `IN` is concise
>
> ```sql
> SELECT * FROM Cartoons WHERE LastName IN ('Simpsons', 'Smurfs', 'Flintstones')
> ```

`IN` is a predicate about `R`'s tuples

```sql
-- (1,2) satisfies the condition, 1 is output once
SELECT a FROM R -- loop once
WHERE b IN (SELECT b FROM S);

-- (1,2) with (2,5) and (1,2) with (2,6) both satisfy the condition, 1 is output twice
SELECT a FROM R, S -- double loop
WHERE R.b = S.b;
```

> The NOT EQUAL operator in SQL is `<>`

> [!note] Difference between
>
> - `<> ANY` means not = a, or not = b, or not = c
> - `NOT IN` means not = a, and not = b, and not = c (analogous to `<> ALL`)

> [!note]
>
> `EXISTS()` is true iff the subquery result is not empty.

> [!note]
>
> structure: `()()`

### bag

> a bag (or multiset) is like a set, but an element may appear more than once.

- Force results to be a set with `SELECT DISTINCT`
- Force results to be a bag with `UNION ALL`
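A quick illustration of bag vs. set semantics (`R(a)` is a hypothetical relation):

```sql
SELECT a FROM R;                                -- bag: duplicates preserved
SELECT DISTINCT a FROM R;                       -- set: duplicates removed
(SELECT a FROM R) UNION ALL (SELECT a FROM R);  -- bag union: every row appears twice
(SELECT a FROM R) UNION (SELECT a FROM R);      -- set union: duplicates removed
```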
`ORDER BY` may be followed by `DESC` for descending order.

### insert, update, delete

```sql
INSERT INTO Likes VALUES('Sally', 'Bud');
-- or
INSERT INTO Likes(beer, drinker) VALUES('Bud', 'Sally');
```

add a `DEFAULT` value during `CREATE TABLE` (the `DEFAULT` value will be used if an inserted tuple has no value for the given attribute)

```sql
create table Drinkers (
  name CHAR(30) PRIMARY KEY,
  addr CHAR(50) DEFAULT '123 Sesame Street',
  phone CHAR(16)
);

-- in this case, this will use the DEFAULT value for addr
-- | name  | address           | phone |
-- | Sally | 123 Sesame Street | NULL  |
INSERT INTO Drinkers(name) VALUES('Sally');
```

_Those drinkers who frequent at least one bar that Sally also frequents_

```sql
INSERT INTO Buddies
  (SELECT d2.drinker
   FROM Frequents d1, Frequents d2
   WHERE d1.drinker = 'Sally'
     AND d2.drinker <> 'Sally'
     AND d1.bar = d2.bar);
```

`DELETE FROM`:

```sql
-- remove a tuple
DELETE FROM Beers WHERE name = 'Bud';

-- remove all tuples
DELETE FROM Likes;

-- Delete from Beer(name, manf) all beers for which there is another beer by the same manufacturer
DELETE FROM Beers b
WHERE EXISTS (
  SELECT name FROM Beers
  WHERE manf = b.manf AND name <> b.name
);
```

`UPDATE` schema:

```prolog
UPDATE <relation> SET <attribute assignments> WHERE <condition>
```

### aggregations

`SUM`, `AVG`, `COUNT`, `MIN`, `MAX` can be applied to a column in the `SELECT` clause

`COUNT(*)` counts the number of tuples

```sql
-- find average price of Bud
SELECT AVG(price) FROM Sells WHERE beer = 'Bud';

-- to aggregate over distinct values, use DISTINCT
SELECT COUNT(DISTINCT price) FROM Sells WHERE beer = 'Bud';
```

> `NULL` never contributes to a sum, average, or count
>
> however, if all values in a column are `NULL`, then the aggregation is `NULL`
>
> exception: `COUNT` of an empty set is 0

`GROUP BY`: group according to the values of all listed attributes; any aggregation is applied only within each group:

```sql
-- find the youngest employees per rating
SELECT rating, MIN(age)
FROM Employees
GROUP BY rating;

-- find for each drinker the average price of Bud at the bars they frequent
SELECT drinker, AVG(price)
FROM Frequents, Sells
WHERE beer = 'Bud' AND Frequents.bar = Sells.bar
GROUP BY drinker;
```

> [!tip] restriction on `SELECT` with `GROUP BY`
>
> each element of `SELECT` must be either:
>
> 1. Aggregated
> 2. An attribute on the `GROUP BY` list
>
> > [!warning]- illegal example
> >
> > ```sql
> > SELECT bar,beer,AVG(price) FROM Sells GROUP BY bar
> > -- only one tuple out for each bar, no unique way to select which beer to output
> > ```

`HAVING` _may_ follow `GROUP BY`

> If so, the condition applies to each group, and groups not satisfying the condition are eliminated.

```sql
-- Get average price of beer given all beer groups exists with at
-- least three bars or manufactured by Pete's
SELECT beer, AVG(price)
FROM Sells
GROUP BY beer
HAVING COUNT(bar) >= 3 OR beer IN (SELECT name FROM Beers WHERE manf = 'Pete''s');
```

requirements on `HAVING`:

- Anything goes in a subquery
- Outside subqueries, they may refer to attributes only if they are either:
  - a grouping attribute
  - aggregated

### cross product (cartesian product)

```sql
-- Frequents x Sells
-- (Bar) | Beer | Price | Drinker | (Bar)
-- Joe   | Bud  | 3.00  | Aaron   | Joe
-- Joe   | Bud  | 3.00  | Mary    | Jane
SELECT drinker
FROM Frequents, Sells
WHERE beer = 'Bud' AND Frequents.bar = Sells.bar;
```

Also known as **join operations** ⇒ join operations can be seen as cartesian products (plus a selection).
Outer join preserves dangling tuples by padding with `NULL`

> A tuple of $R$ that has no tuple of $S$ with which it joins is said to be `dangling`

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/left-outer-join.webp) _Left outer join_

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/right-outer-join.webp) _Right outer join_

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/full-outer-join.webp) _full outer join_

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/inner-join.webp) _inner join_

```sql
R [NATURAL] [LEFT|RIGHT|FULL] OUTERJOIN [ON] S

-- example
R NATURAL FULL OUTERJOIN S
```

- natural: checks equality on all common attributes, and no two attributes share the same name
- left: pad dangling tuples of R only
- right: pad dangling tuples of S only
- full: pad both (default)

## views

- many views (how users see data), a single _logical schema_ (logical structure) and _physical schema_ (files and indexes used)

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/view-abstraction.webp)

virtual views _are not stored in the database_ (think of a query for constructing the relation)

materialized views are constructed and stored in the DB.

```sql title="views default to virtual"
CREATE [MATERIALIZED] VIEW <name> AS <query>;

-- example: CanDrink(drinker, beer)
create view CanDrink AS
SELECT drinker, beer
FROM Frequents f, Sells s
WHERE f.bar = s.bar;
```

> Usually one shouldn't update a view, as it simply doesn't exist physically.

## index

idea: a data structure to speed up access to the tuples of a relation; organize records via a tree or a hashing structure

DS: B+ Tree index or hash-based index

### B+ Tree

note: each node is at least 50% full

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/b-plus-tree.webp)

> [!tip] cost
>
> the tree is “height-balanced”
>
> insert/delete at $\log_{F}N$ cost
>
> min 50% occupancy; each node contains $d \leq m \leq 2d$ entries, where $d$ is the _order of the tree_

#### insert a data entry

- find the correct leaf $L$
- put the data entry onto $L$
  - if $L$ has enough space ⇒ done!
  - otherwise `split` $L$:
    - redistribute entries evenly, `copy up` the middle key
    - insert an index entry pointing to $L_{2}$ into the parent of $L$

> splits grow the tree; a root split increases its height

#### delete a data entry

- find the leaf $L$ where the entry belongs
- remove the entry
- if $L$ is at least half-full ⇒ done!
- if not:
  - redistribute, borrowing from a sibling (an adjacent node with the same parent as $L$)
  - if that fails, merge $L$ and the sibling
  - if a merge occurred, delete the entry (pointing to $L$ or the sibling) from the parent of $L$

> merges can propagate up to the root, decreasing the height
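To get a feel for the $\log_{F}N$ cost above (the numbers here are illustrative, not from the notes): with fan-out $F = 100$ and $N = 10^{9}$ data entries, the height is about $\log_{100} 10^{9} = 4.5$, so a lookup touches only ~4-5 nodes.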
### Hash-based Index

- an index is a collection of _buckets_

Insert: if a bucket is full ⇒ `split`

### Alternatives for data entries

|                       | How                                                                        |
| --------------------- | -------------------------------------------------------------------------- |
| By Value              | record contents are stored in the index file (no need to follow pointers)   |
| By Reference          | the index stores $\langle \text{key}, \text{rid} \rangle$ pairs pointing to the records |
| By List of References | the index stores $\langle \text{key}, \text{list of rids} \rangle$ for all matching records |

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/tut/t1
tags:
  - sfwr3db3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/tut/t1"
title: /squeel/
date: 2024-09-13
---

TA: [Jongjun Park](mailto:parkj182@mcmaster.ca)

```sql
db2 connect to se3db3
```

```bash
scp /path/to/.ddl macid@se3db3.cas.mcmaster.ca:/home/macid/workspace/.ddl
# on server
db2 -tnf .ddl
```

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-3ra3/Stakeholders
tags:
  - sfwr3ra3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/Stakeholders"
title: Stakeholders
date: 2024-09-13
---

## stakeholders

help to _identify_ the problem

Eliciting the SSON to validate the first hypothesis

## personas are V.A.R.I.E.D

1. Vivid
2. Actionable
3. Real
4. Identifiable
5. Exact
6. Detailed

ex: Personal Flotation Device (PFD)

Which personas could be relevant?

- Cabin Crew
- Frequent traveller
- Traveller with small children
- Traveller with a disability

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-3ra3/acme-visits
tags:
  - sfwr3ra3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/acme-visits"
title: acme visits
date: 2024-09-12
---

See also [description](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/acme-visits/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3ra3/acme/requirements)

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-3ra3/index
tags:
  - sfwr3ra3
  - university
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/index"
title: Software Requirements and Security Considerations
date: 2024-09-04
---

prof: [Dr. Sébastien Mosser](https://mosser.github.io/teaching/)

See also

### requirements.

functional vs. non-functional: IEEE 29148

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-3ra3/midterm
tags:
  - sfwr3ra3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/midterm"
title: requirements notes
date: 2024-10-21
---

> People do not buy products, they buy solutions to their problems.
A good requirement is:

- necessary
- verifiable
- attainable

Goals: desired results for the target organisation

- obstacles: properties to be overcome

Behaviour:

- functional: outcomes produced by the _system_
- non-functional: properties of how the system achieves outcomes

Constraints:

- imposed by the environment

VARIED framework:

- Vivid: actually meet the persona
- Actionable: it should help the team to build the product
- Real: go where users are, observe and interact
- Identifiable: dog-food
- Exact: be specific
- Detailed: good personas are substantial

> [!abstract] Requirement Engineering
>
> is a **human activity** based on cognitive psychology, anthropology, sociology, and linguistics

## SSON (Single Statement of Need)

> a clear, concise statement about the system's overall goals and how it will accomplish those goals

- describes what **capability the system** being developed will provide

## goal.

> conveys the **intention/rationale/objective** of stakeholders

- supports _elicitation, analysis, and provides inputs for specification_

Actor: acts within a system to achieve the goal

Agent: acts on behalf of other actors

Role: an actor can play $0 \cdots n$ roles

Position: consistent roles that are cohesive

> [!note] Hard Goal
>
> Can **measure**, quantify and describe in their entirety

> [!note] Soft Goal
>
> We know we need it but cannot describe it fully

Resource: can be used by an actor to achieve goals

Plans: how an actor will execute actions

### resolving soft goals

1. Definitions: convert soft ⇒ hard
2. Contributions: create sub-goals to solve soft goals as functions
3. Decomposition: decompose a soft goal into multiple sub-goals

## Risks.

flexible, adaptive, and changeable

> potential events that can impact your project's progress

Issues: known problems that can be identified

Risk (what if) ⇒ issue (current): know who to contact and how to mitigate such risks

> stay calm, figure out the root cause, and come up with solutions

### RACI matrix

Responsible, Accountable, Consulted, Informed

### fish-bone diagram

- scope creep
- risk register
- risk assessment

### probability and impact matrix

inherent risk: a measure of a risk, calculated from its probability and impact

- time risks
- budget risks
- scope risks: not being able to deliver milestones
- external risks

Single point of failure: a risk that has the potential to cause a catastrophic failure.

dependency: relations between different tasks

### mitigation strategies

- avoid
- accept
- reduce and control (use a decision tree)
- transfer

## Requirements of Oppression

socio-technical, DEI ⇒ social infrastructure that reflects, reinforces, and amplifies the matrix of oppression

- Gender
- Ability
- Race

Bias in data:

1. Center the margins, or increase diversity
2. social conflict?
3. Human-centric

## [tacit knowledge](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/midterm/../../../../../../../../thoughts/tacit-knowledge)

## modelling

> a conceptual representation of something

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-3ra3/t2
tags:
  - sfwr3ra3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/t2"
title: Identifying Stakeholders
date: 2024-09-09
---

See also [instruction](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/t2/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3ra3/T02.pdf)

## case study

- As people go green, there is an increased need for information on the facilities for cycling and pedestrian traffic in cities.
- The tutorials for this course will develop an application that allows citizens to find information about these facilities.
- Assume that you have been hired as a consulting company for the city of Hamilton to provide a mobile application (codename: BikeTour) for these facilities

> [!question] Task 1
>
> Identify the stakeholders of the following software system.
>
> > Brainstorm a collection of stakeholders that you should consult for this application.
> >
> > Differentiate the direct stakeholders from the indirect ones.

Reminder: a stakeholder is any individual/group/org with a vested interest in your product

1. direct:
   - City of Hamilton
   - Hamilton cyclists
   - Pedestrians and other active transportation users
2. indirect:
   - Hamilton City Council: decisions on infrastructure billing
   - Government
   - property developers
   - Nearby municipalities with connecting transportation (i.e. Dundas, Burlington, etc.)
   - Local academic institutions that might benefit from this infrastructure
   - Cycling advocacy groups

> [!question] Task 2
>
> What other requirements sources could be used to develop that product?
>
> > Brainstorm a collection of requirements sources for this application.

- regulations and industry “best practice”
- municipal bylaws and official plans
- accessibility guidelines
- cycling apps' engagement strategies
- social media
- local newspapers
- similar functionalities
- existing transport data
- cycling volumes, traffic, collision stats
- bike share data, transit network data (enables multi-modal trip planning)
- User research / stakeholder input
- Surveys, “blind studies”, interviews with users
- Testing and feedback loop
- refinement of functionality
- Technical requirements

> [!question] Task 3
>
> Perform an analysis of what types of elicitation methods would be appropriate for your identified stakeholders.

- Interviews
  - Focus on the needs of the stakeholders (one-on-one), allowing detailed discussions with subject-matter experts
- Focus groups
  - Organize focus groups with Hamilton cyclists and pedestrians to get direct feedback on their needs, experiences, and expectations for the app
- Surveys
  - Gather feedback from a large number of users through surveys to elicit requirements.
- Building a knowledge base from existing documents (i.e. city bylaws, municipality plans, etc.) ⇒ informs the elicitation process
- Brainstorming sessions
  - conduct sessions with diverse stakeholders to generate possible solutions

> [!question] Task 4
>
> Identify your “most-valuable” stakeholder(s) and the most valuable feature(s) BikeTour can bring to them

Write a couple of scenarios:

1. discovering new routes

   Sarah is an avid cyclist living in Hamilton. She commutes to work by bike daily but is getting bored of her usual route. She opens up the BikeTour app, selects her starting point and destination, and browses the suggested route options. She filters for routes that are scenic but still bike-friendly and direct enough for commuting. The app displays several new route options with elevation profiles, estimated trip duration, and route difficulty ratings sourced from other local cyclists' trip data.

2. data-driven cycling infrastructure planning

   Trevor is a PM in the City of Hamilton's Sustainable Mobility Department. He is working on prioritising new cycling infrastructure projects for the coming calendar year. Trevor logs into the BikeTour admin dashboard and views the aggregated, anonymized trip data from Hamilton cyclists using the app.
A heatmap shows the most popular cycling routes, while another data visualization identifies “problem areas” with frequent cyclist-reported issues like potholes, close calls with cars, or inadequate bike parking. By cross-referencing this crowdsourced data from actual Hamilton cyclists with the city's existing cycling network data, Trevor can easily identify key gaps and safety hotspots.

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4aa4/index
tags:
  - sfwr4aa4
  - university
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/index"
title: Real-time system a la carte
date: 2024-09-06
---

1. Soft RT system are those which do not : False
2. A good scheduling algorithm for hard real time: False
3. (continuous graph): B

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab4/content
tags:
  - lab
  - sfwr4aa4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab4/content"
title: Threaded LED
date: 2024-10-04
---

See [part1](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab4/part1/main.c)

See [part2](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab4/part2/main.c)

See [part3](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab4/part3/main.c)

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab5/content
tags:
  - lab
  - sfwr4aa4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab5/content"
title: external LED
date: 2024-10-04
---

See [part1](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab5/part1/main.c)

See [part2](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab5/part2/main.c)

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/lab8
tags:
  - sfwr4aa4
  - lab
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/lab8"
title: PWM and Shared Memory
date: 2024-11-08
---

$$
f_\text{PWM} = \frac{f_\text{clk}}{N(X+1)}
$$

See [thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part1.c](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part1.c)

See [thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part2.c](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part2.c) and [thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part2\_application.c](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part2_application.c)

See [thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part3.c](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part3.c) and [thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part3\_application.c](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part3_application.c)

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/content
tags:
  - sfwr4aa4
  - lab
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/content"
title: PID Controller from input signals
date: 2024-11-15
---

Transfer function for angular speed:

$$
\frac{A}{1 + \tau s}
$$
The input signal begins at time $t_{0}$, and its minimum and maximum values are given by $u_\text{min}, u_\text{max}$. The resulting output signal is initially at $y_{0}$ and eventually settles down to a steady-state value of $y_\text{ss}$.

The steady-state gain $A$ is given by:

$$
A = \frac{y_\text{ss} - y_0}{u_\text{max} - u_\text{min}} = \frac{\Delta y}{\Delta u}
$$

The time constant $\tau$ is the time required for the output to increase by $0.632 \times \Delta y$ from its initial value.

Let $t_1$ be the time when the change in output is $0.632 \times \Delta y$:

$$
\begin{aligned}
y(t_{1}) &= 0.632 \times (y_\text{ss} - y_{0}) + y_{0} \\[8pt]
\tau &= t_{1} - t_{0}
\end{aligned}
$$

## find the transfer function

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/first-graph.webp)

$$
\begin{align*}
\Delta u &= 3 \text{ V} \\[6pt]
\Delta y &= 8.285 - 2.454 = 5.831 \text{ rad/s} \\[6pt]
A &= \frac{5.831}{3} \approx 1.944 \text{ (rad/s)/V} \\[12pt]
\text{target velocity} &= 2.454 + 0.632 \times 5.831 = 6.139 \text{ rad/s} \\[8pt]
\tau &\approx 0.029 \text{ s}
\end{align*}
$$

_note: the output reaches it at around 5.029 s_

## graphs

see [simulink file](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/lab9part2.slx)

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/graph-p2.webp)

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/pid-setup.webp)

_proportional_ $P = 350 \times \frac{\pi}{180}$

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm
tags:
  - sfwr4aa4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm"
title: rt_system items
date: 2024-10-22
---

correctness: $|C(t) - C_s(t)| < \epsilon$

**drift** is the rate of change (RoC) of the clock value relative to a perfect clock.
Given a clock with bounded drift $\rho$, then

$$
\left| \frac{dC(t)}{dt} - 1 \right| < \rho
$$

Monotonicity: $\forall t_{2} > t_{1}: C(t_{2}) > C(t_{1})$

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/rt-sys-failure.webp)

## kernels

`syscall` in the kernel: user space and kernel space are separate address spaces

```mermaid
graph LR
A[procedure] -- parameters --> B[TRAP]
B --> C[Kernel]
C --> B --> A
```

> [!tip] Important
>
> a user process becomes a kernel process when it _executes a syscall_

Scheduling ensures fairness, minimal response time, and maximal throughput

|              | OS                | RTOS                                      |
| ------------ | ----------------- | ----------------------------------------- |
| philos       | time-sharing      | event-driven                              |
| requirements | high throughput   | schedulability (meet all hard deadlines)  |
| metrics      | fast avg response | ensured worst-case response               |
| overload     | fairness          | meet critical deadlines                   |

> Kernel programs can always preempt user-space programs

Kernel program example:

```c
#include <linux/module.h> /* Required by module macros */
#include <linux/kernel.h> /* KERN_INFO needs it */
#include <linux/init.h>   /* __init and __exit macros */

static char *my_string __initdata = "dummy";
static int my_int __initdata = 4;

/* Init function with user defined name */
static int __init hello_4_init(void) {
  printk(KERN_INFO "Hello %s world, number %d\n", my_string, my_int);
  return 0;
}

/* Exit function with user defined name */
static void __exit hello_4_exit(void) {
  printk(KERN_INFO "Goodbye cruel world 4\n");
}

/* Macros to be used after defining init and exit functions */
module_init(hello_4_init);
module_exit(hello_4_exit);
```

## **preemption** && `syscall`

> The act of temporarily interrupting a currently scheduled task in favour of higher priority tasks.

> NOTE: `make` doesn't recompile if the DAG hasn't changed.

## process

- independent execution, logical unit of work scheduled by the OS
- in virtual memory:
  - Stack: stores local variables and function arguments
  - Heap: dynamically allocated (think of `malloc`, `calloc`)
  - BSS segment: uninitialized data
  - Data segment: initialized data (global & static variables)
  - text: read-only region containing program instructions

|          | stack                                       | heap                        |
| -------- | ------------------------------------------- | --------------------------- |
| creation | `Member m`                                  | `Member* m = new Member()`  |
| lifetime | function runs to completion                 | until delete/free is called |
| grow     | fixed                                       | dynamically added by OS     |
| err      | stack overflow                              | heap fragmentation          |
| when     | size of memory is known, data size is small | large-scale dynamic memory  |

## `fork()`

- creates a `child` process that is identical to its parent; returns `0` to the child process and the child's pid to the parent
- adds a lot of overhead, since the address space is duplicated. **Data space is not shared**

> variables initialized before `fork()` will be duplicated in both parent and child.
```c
#include <stdio.h>
#include <unistd.h>

int main(int argc, char** argv) {
  int child = fork();
  int c = 0;

  if (child)          /* parent of the first fork */
    c += 5;
  else {
    child = fork();   /* the first child forks again */
    c += 5;
    if (child) c += 5; /* middle process takes both increments */
  }
  printf("%d ", c);   /* parent prints 5, middle prints 10, grandchild prints 5 */
}
```

## threads

- program-wide resources: global data & instructions
- execution state of a control stream
- shared address space for faster context switching

> - Needs synchronisation (global variables are shared between threads)
> - lacks robustness (one thread can crash the whole program)

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/mem-layout-threaded.webp)

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/single-vs-multithreaded.webp)

```c
#include <pthread.h>

void *foo(void *args) { return NULL; }

int main(void) {
  pthread_attr_t attr;
  pthread_attr_init(&attr); /* takes a pointer to the attr struct */

  pthread_t thread;
  pthread_create(&thread, &attr, foo, NULL);
  pthread_join(thread, NULL);
}
```

To avoid race conditions, use semaphores.

## polling and interrupt

- polling: reading a memory location to receive updates of an event
  - think of
    ```prolog
    while (true) {
      if (event) {
        process_data()
        event = 0;
      }
    }
    ```
- interrupt: receive an interrupt signal
  - think of
    ```prolog
    signal(SIGNAL, handler)

    void handler(int sig) {
      process_data()
    }

    int main() {
      while (1) { do_work() }
    }
    ```

|              | interrupt | polling |
| ------------ | --------- | ------- |
| speed        | fast      | slow    |
| efficiency   | good      | poor    |
| cpu-waste    | low       | high    |
| multitasking | yes       | yes     |
| complexity   | high      | low     |
| debug        | difficult | easy    |

## process priority

`nice`: change process priority

- 0-99: RT tasks
- 100-139: users

> the lower the nice value, the higher the priority

```c
#include <sys/resource.h>

int getpriority(int which, id_t who);
int setpriority(int which, id_t who, int value);
```

set the scheduling policy: `sched_setscheduler(pid, SCHED_FIFO | SCHED_RR | SCHED_DEADLINE, &param)`

## scheduling

1. Priority-based preemptive scheduling

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/pbps.webp)

Temporal parameters:

Let the following be the scheduling parameters:

| desc                 | var                   |
| -------------------- | --------------------- |
| # of tasks           | $n$                   |
| release/arrival time | $r_{i,j}$             |
| absolute deadline    | $d_i$                 |
| relative deadline    | $D_i = d_i - r_{i,j}$ |
| execution time       | $e_i$                 |
| response time        | $R_i$                 |

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/abs-rel-deadline.webp)

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/resp-time-exec-time.webp)

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/resp-time-preempted-exec.webp) _response time when execution is preempted_

> The period $p_i$ of a periodic task $T_i$ is the **minimum length** of all time intervals between release times of consecutive jobs.
> The phase of a task, $\phi_i$, is the release time $r_{i,1}$ of a task $T_i$, or $\phi_i = r_{i,1}$
>
> tasks are _in phase_ if their first instances are released simultaneously

> [!tip] Representation
>
> a periodic task $T_i$ can be represented by:
>
> - 4-tuple $(\phi_i, P_i, e_i, D_i)$
> - 3-tuple $(P_i, e_i, D_i)$, or $(0, P_i, e_i, D_i)$
> - 2-tuple $(P_i, e_i)$, or $(0, P_i, e_i, P_i)$

> [!tip] Utilisation factor
>
> for a task $T_i$ with execution time $e_i$ and period $p_i$:
>
> $$
> u_i = \frac{e_i}{p_i}
> $$

For a system with $n$ tasks, the overall system utilisation is $U = \sum_{i=1}^{n}{u_i}$

## cyclic executive

assume tasks are non-preemptive, and job parameters with hard deadlines are known.

- no race conditions, no deadlocks, just function calls
- however, very brittle: the number of frames $F$ can be large, and release times of tasks must be fixed

### _hyperperiod_

> is the least common multiple (lcm) of the periods.

> [!tip] maximum number of arriving jobs
>
> $$
> N = \sum_{i=1}^{n} \frac{H}{p_i}
> $$

**Frames**: each task must fit within a single frame of size $f$ ⇒ number of frames $F = \frac{H}{f}$

C1: a job must fit in a frame, i.e. $f \geq \max_{1 \leq i \leq n} e_i$

C2: the hyperperiod has an integer number of frames, i.e. $\frac{H}{f}$ is an integer

C3: $2f - \text{gcd}(P_i, f) \leq D_i$ for every task.

### task slices

idea: if the frame-size constraints can't be met, then “slice” a task into smaller sub-tasks

$T_3=(20, 5)$ becomes $T_{3_{1}}=(20,1)$, $T_{3_{2}}=(20,3)$ and $T_{3_{3}}=(20, 1)$

### Flow Graph for hyper-period

- Denote all jobs in a hyperperiod of $F$ frames as $J_{1} \cdots J_{k}$
- Vertices:
  - $k$ job vertices $J_{1},J_{2},\cdots,J_{k}$
  - $F$ frame vertices $x,y,\cdots,z$
- Edges:
  - $(\text{source}, J_i)$ with capacity $C_i=e_i$
    - encodes jobs' compute requirements
  - $(J_i, x)$ with capacity $f$ iff $J_i$ can be scheduled in frame $x$
    - encodes periods and deadlines
    - an edge connects a job node and a frame node iff:
      1. the job arrives **before** or at the starting time of the frame
      2. the job's absolute deadline is **larger** than or equal to the ending time of the frame
  - $(x, \text{sink})$ with capacity $f$
    - encodes the limited computational capacity in each frame

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/flow-graph-hyperperiod.webp)

## static priority assignment

For higher priority:

- shorter-period tasks (rate monotonic, RM)
- tasks with shorter relative deadlines (deadline monotonic, DM)

### rate-monotonic

- running on a uniprocessor, tasks are preemptive, no OS overhead for preemption

> task $T_i$ has higher priority than task $T_j$ if $p_i < p_j$

> [!tip] schedulability test for RM (Test 1)
>
> Given $n$ periodic processes, independent and preemptable, $D_i \geq p_i$ for all processes, and **periods of all tasks being _integer_ multiples of each other**,
>
> a sufficient condition for the tasks to be schedulable on a uniprocessor: $U = \sum_{i=1}^{n}\frac{e_i}{p_i} \leq 1$

> [!tip] schedulability test for RM (Test 2)
>
> A _sufficient_ but not necessary condition is $U \leq n \cdot (2^{\frac{1}{n}} - 1)$ for $n$ periodic tasks
>
> for $n \to \infty$, we have $U < \ln(2) \approx 0.693$

> [!tip] schedulability test for RM (Test 3)
>
> Consider a set of tasks $(T_{1}, T_{2})$ with $p_{1} < p_{2}$. Suppose $T_2$ finishes executing at $t$. The total number of instances of task $T_1$ released over the time interval $[0, t)$ is $\lceil \frac{t}{p_{1}} \rceil$.
>
> Thus the following condition must be met for every instance of task $T_1$ released during the time interval $(0, t)$:
>
> $$
> t = \lceil \frac{t}{p_{1}} \rceil \space e_1 + e_2
> $$

idea: find $k$ such that time $t = k \cdot p_1 \geq k \cdot e_1 + e_2$ and $k \cdot p_1 \leq p_2$ for task 2

> [!tip] general solution for RM-schedulability
>
> The time-demand function for task $i$, $1 \leq i \leq n$:
>
> $$
> \omega_i(t) = \sum_{k=1}^{i} \lceil \frac{t}{p_k} \rceil \space e_k \leq t, \quad 0 < t \leq p_i
> $$
>
> must hold at some time instant $t$ chosen as $t=k_j p_j$, $(j=1,\cdots,i)$ and $k_j = 1, \cdots, \lfloor \frac{p_i}{p_j} \rfloor$
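A small sketch of this time-demand test, assuming implicit deadlines ($D_i = p_i$) and integer periods; the task set at the bottom is a made-up example, not from the lectures:

```python
import math

def rm_schedulable(tasks):
    """Time-demand analysis for RM. `tasks` is a list of (p_i, e_i)
    pairs sorted by period, i.e. by decreasing RM priority."""
    for i, (p_i, e_i) in enumerate(tasks):
        # candidate instants: multiples of higher-priority periods, up to p_i
        points = {p_i}
        for p_j, _ in tasks[:i]:
            points |= {k * p_j for k in range(1, p_i // p_j + 1)}
        # w_i(t) = e_i + sum over higher-priority tasks of ceil(t/p_j) * e_j
        if not any(
            e_i + sum(math.ceil(t / p_j) * e_j for p_j, e_j in tasks[:i]) <= t
            for t in sorted(points)
        ):
            return False  # task i can miss its deadline in the worst case
    return True

# example: T1=(4,1), T2=(5,2), T3=(20,5). U = 0.9 > 3*(2^(1/3)-1) ~ 0.78,
# so Test 2 is inconclusive, but the time-demand test succeeds (at t=15 for T3).
print(rm_schedulable([(4, 1), (5, 2), (20, 5)]))  # True
```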
### deadline-monotonic

- if every task has its period equal to its relative deadline, DM is the same as RM
- with arbitrary deadlines, DM performs better than RM
- **RM always fails if DM fails**

## dynamic priority assignment

### earliest-deadline first (EDF)

_depends on the closeness of absolute deadlines_

> [!tip] EDF schedulability test 1
>
> a set of $n$ periodic tasks, each of whose relative deadline is equal to or greater than its period, is schedulable iff $\sum_{i=1}^{n}(\frac{e_i}{p_i}) \leq 1$

> [!tip] EDF schedulability test 2
>
> when relative deadlines are not equal to or greater than their periods:
>
> $$
> \sum_{i=1}^{n}(\frac{e_i}{\text{min}(D_i, p_i)}) \leq 1
> $$

## Priority Inversion

**critical sections** avoid **race conditions**

> A higher priority task can be blocked by a lower priority task due to resource contention

This shows how resource contention can delay the completion of higher priority tasks

- access to shared resources guarded by mutexes or semaphores
- access to non-preemptive subsystems (storage, networks)

Resource Access Control

### mutex

serially reusable: a resource's use cannot be interrupted

> If a job wants to use $k_i$ units of resource $R_i$, it executes a lock $L(R_i; k_i)$, and unlocks $U(R_i; k_i)$ once it is finished

### Non-preemptive Critical Section Protocol (NPCS)

idea: schedule all critical sections non-preemptively

**While a task holds a resource, it executes at a priority higher than the priorities of all tasks**

**a higher priority task is blocked only when some lower priority job is in a critical section**

pros:

- needs zero knowledge about the resource requirements of tasks

cons:

- a task can be blocked by a lower priority task for a long time, even without a resource conflict

### Priority Inheritance Protocol (PIP)

idea: increase the priorities only upon resource contention, avoiding the NPCS drawback

would still run into deadlock (think of two tasks acquiring the same resources in opposite order)

### Priority Ceiling Protocol (PCP)

idea: extends PIP to prevent deadlocks

- assigned priorities are fixed
- the resource requirements of all the tasks that will request a resource $R$ are known

`ceiling(R)`: the highest priority among those tasks.
Each resource has a fixed priority ceiling.

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3
tags:
  - sfwr4aa4
  - quiz
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3"
title: q3
date: 2024-09-27
---

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3-1.webp) Answer: B

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3-2.webp) Answer: 8

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3-3.webp) Answer: 70

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3-4.webp) Answer: D

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3-5.webp) Answer: B

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3-6.webp) Answer: B

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3-7.webp) Answer: C

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3-8.webp) Answer: B

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4aa4/w2
tags:
  - sfwr4aa4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/w2"
title: Fork and threads
date: 2024-09-13
---

Q1: T
Q2: T
Q3: F
Q4: User
Q5: Save memory space
Q6: Yes, but the modification can only be seen in the child process; the value in the parent process cannot be changed.

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4g06ab/index
tags:
  - university
  - sfwr4g06ab
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4g06ab/index"
title: Software Engineering Capstone a la carte.
date: 2024-09-04
---

## projects

See [tinymorph](https://tinymorph.aarnphm.xyz)

## statement and goals.

1. natural-language driven terminal

   Possible prof: [Emil Sekerinski](https://www.cas.mcmaster.ca/~emil/) or [Richard Paige](https://www.google.com/search?q=Richard+Paige\&sourceid=chrome\&ie=UTF-8)

   - [warp](https://www.warp.dev) as an example, but closed source
   - So you can think of it like [Alacritty](https://github.com/alacritty/alacritty) but with an async command-runner
   - voice-driven assistant: real-time transcription ⇒ generate commands from language to shell commands
     - voice → natural language
     - natural language → commands
   - Configuration, maybe in Lua
   - stretch goal: a new shell based on Rust syntax and its borrowing concept for variables.
2. WYSIWYG editor (chosen, see [docs](https://tinymorph.aarnphm.xyz))

   - Markdown renderer
   - train a [SAE](https://transformer-circuits.pub/2023/monosemantic-features/index.html) for a specific type of writing tonality ⇒ manual steering for text generation on creative writing
   - exploration of internal writing features based on text
   - inspired by [Prism](https://x.com/thesephist/status/1747099907016540181)

3. Infrastructure and AI Companion for Engineering Knowledge Management (19)

   - [Quartz](https://quartz.jzhao.xyz/) + similarity search + ANN for reranking

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept
tags:
  - sfwr4ml3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept"
title: Bias and intercept
date: 2024-09-16
---

See also: [slides 3](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture3.pdf), [slides 4](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture4.pdf), [slides 5](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture5.pdf)

## adding bias in D-dimensions OLS

$$
X^{'}_{n \times (d+1)} = \begin{pmatrix} x_1^{1} & \cdots & x_1^{d} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ x_n^{1} & \cdots & x_n^{d} & 1 \end{pmatrix}
$$

and

$$
W_{(d+1) \times 1} = \begin{pmatrix} w_1 \\ \vdots \\ w_d \\ w_0 \end{pmatrix}
$$

Add a new auxiliary dimension to the input data, $x_{d+1} = 1$

Solve OLS:

$$
\min\limits_{W \in \mathbb{R}^{(d+1) \times 1}} \|X^{'}W - Y\|_2^2
$$

Gradient for $f: \mathbb{R}^d \rightarrow \mathbb{R}$

$$
\triangledown_{w} \space f = \begin{bmatrix} \frac{\partial f}{\partial w_1} \\ \vdots \\ \frac{\partial f}{\partial w_d} \\ \end{bmatrix}
$$

Jacobian for $g: \mathbb{R}^m \rightarrow \mathbb{R}^n$

$$
\begin{aligned} \triangledown_{w} \space g &= \begin{bmatrix} \frac{\partial g_1}{\partial w_1} & \cdots & \frac{\partial g_1}{\partial w_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_n}{\partial w_1} & \cdots & \frac{\partial g_n}{\partial w_m} \end{bmatrix}_{n \times m} \\ \\ &u, v \in \mathbb{R}^d \\ &\because g(u) = u^T v \implies \triangledown_{u} \space g = v \text{ (gradient) } \\ \\ &A \in \mathbb{R}^{n \times n}; u \in \mathbb{R}^n \\ &\because g(u) = u^T A u \implies \triangledown_{u} \space g = (A + A^T) u \text{ (the Jacobian is the row vector } u^T(A + A^T)\text{)} \end{aligned}
$$

> [!tip] result
>
> $$
> W^{\text{LS}} = (X^T X)^{-1} X^T Y
> $$

## non-linear data

Idea: augment the input with additional (non-linear) feature dimensions, in the same way the bias trick pads a constant 1, then fit a linear model in the enlarged space.

## multivariate polynomials.

> question the case of multivariate polynomials
>
> - Assume $M \gg d$
> - Number of terms (monomials): $\approx (\frac{M}{d})^d$
> - `#` of training samples $\approx$ `#` parameters

An example of the `Curse of dimensionality`

## overfitting.

strategies to avoid:

- add more training data
- L1 (Lasso) or L2 (Ridge) regularization
  - add a penalty term to the objective function
  - L1 yields sparse models, since it forces some parameters to be zero (robust to outliers): the absolute-value penalty on the weights drives some model coefficients to exactly 0.
$$
\text{Loss}(w) = \text{Error} + \lambda \times \| w \|_{1}
$$

  - L2 suits higher non-linearity where all features carry signal: it doesn't perform feature selection, since weights are only reduced near 0 instead of exactly 0 like L1

$$
\text{Loss}(w) = \text{Error} + \lambda \times \| w \|_{2}^{2}
$$

- Cross-validation
  - split data into k folds
- early stopping
- dropout, see [example](https://keras.io/api/layers/regularization_layers/dropout/)
  - randomly selected neurons are ignored ⇒ makes the network less sensitive

**sample complexity** of learning multivariate polynomials

## regularization.

L2 regularization:

$$
\text{min}_{W \in \mathbb{R}^{d}} \| XW - Y \|^{2}_{2} + \lambda \| W \|_{2}^{2}
$$

> [!tip] Solving
>
> $$
> W^{\text{RLS}} = (X^T X + \lambda I)^{-1} X^T Y
> $$
>
> The inverse exists as long as $\lambda > 0$

## polynomial curve-fitting revisited

feature map: $\phi{(x)}: \mathbb{R}^{d_1} \rightarrow \mathbb{R}^{d_2}$ where $d_{2} \gg d_{1}$

training:

- $W^{*} = \arg\min\limits_{W} \| \phi W - Y \|^{2}_{2} + \lambda \| W \|_{2}^{2}$
- $W^{*} = (\phi^T \phi + \lambda I)^{-1} \phi^T Y$

prediction:

- $\hat{y} = \langle{W^{*}, \phi{(x)}} \rangle = {W^{*}}^T \phi(x)$

> [!abstract] choices of $\phi$
>
> - Gaussian basis functions: $\phi(x) = \exp{(-\frac{\| x - \mu \|^{2}}{2\sigma^{2}})}$
> - Polynomial basis functions: $\phi(x) = \{1, x, x^{2}, \ldots, x^{d}\}$
> - Fourier basis functions: DFT, FFT

## computational complexity

calculate $W^{\text{RLS}} = (\phi^T \phi + \lambda I)^{-1} \phi^T Y$

matmul:

- Naive: $O(d^3)$
- Strassen's algorithm: $O(d^{2.81})$
- Coppersmith-Winograd: $O(d^{2.376})$

matrix inversion:

- Gaussian elimination: $O(d^3)$
- [Cholesky decomposition](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept/../../../../../../../../thoughts/Cholesky-decomposition): $O(d^3)$ (involves around $\frac{1}{3}n^3$ FLOPs)

## kernels

compute higher-dimensional inner products

$$
K(x^i, x^j) = \langle \phi(x^i), \phi(x^j) \rangle
$$

Polynomial kernel of degree 2:

$$
k(x^i, x^j) = (1 + (x^i)^T x^j)^2 = (1 + \langle{x^i, x^j} \rangle)^2 \\ \\ \because O(d) \text{ operations}
$$

> [!abstract] degree M polynomial
>
> $$
> k(x^i, x^j) = (1 + (x^i)^T x^j)^M
> $$

How many operations?

- improved: $d + \log M$ ops (an $O(d)$ inner product, then raising to the $M$-th power by repeated squaring in $O(\log M)$)
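As a sanity check of the degree-2 identity above, the kernel value can be compared against an explicit feature map. The map spelled out below is the standard construction for $(1 + \langle x, z \rangle)^2$, and the data are random; this is only an illustrative sketch:

```python
import numpy as np
from itertools import combinations

def phi(x):
    # explicit feature map for k(x, z) = (1 + <x, z>)^2
    feats = [1.0]
    feats += list(np.sqrt(2) * x)                                   # sqrt(2) * x_i
    feats += list(x**2)                                             # x_i^2
    feats += [np.sqrt(2) * x[i] * x[j]
              for i, j in combinations(range(len(x)), 2)]           # sqrt(2) * x_i * x_j
    return np.array(feats)

rng = np.random.default_rng(0)
x, z = rng.standard_normal(5), rng.standard_normal(5)

lhs = (1 + x @ z) ** 2    # O(d) kernel evaluation
rhs = phi(x) @ phi(z)     # O(d^2) inner product in the lifted space
assert np.isclose(lhs, rhs)
```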
---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression
tags:
  - sfwr4ml3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression"
title: Linear regression
date: 2024-09-10
---

See also [slides for curve fitting](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture1.pdf), [regression](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture2.pdf), [colab link](https://colab.research.google.com/drive/1eljHSwYJSR5ox6bB9zopalZmMSJoNl4v?usp=sharing)

python: [ols\_and\_kls.py](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/code/ols_and_kls.py)

## curve fitting.

> [!question] how do we fit a distribution of data over a curve?
>
> Given a set of $n$ data points $S=\set{(x^i, y^i)}^{n}_{i=1}$

- $x \in \mathbb{R}^{d}$
- $y \in \mathbb{R}$ (or $\mathbb{R}^{k}$)

## ols.

> [!tip] Ordinary Least Squares (OLS)
>
> Let $\hat{y^i}$ be the prediction of a model $X$, $d^i = \| y^i - \hat{y^i} \|$ is the error; minimize $\sum_{i=1}^{n} (y^i - \hat{y^i})^2$

In the case of 1-D ordinary least squares, the problem equates to finding $a,b \in \mathbb{R}$ that minimize $\min\limits_{a,b} \sum_{i=1}^{n} (ax^i + b - y^i)^2$

### optimal solution

$$
\begin{aligned} a &= \frac{\overline{xy} - \overline{x} \cdot \overline{y}}{\overline{x^2} - (\overline{x})^2} = \frac{\text{COV}(x,y)}{\text{Var}(x)} \\ b &= \overline{y} - a \overline{x} \end{aligned}
$$

where $\overline{x} = \frac{1}{N} \sum{x^i}$, $\overline{y} = \frac{1}{N} \sum{y^i}$, $\overline{xy} = \frac{1}{N} \sum{x^i y^i}$, $\overline{x^2} = \frac{1}{N} \sum{(x^i)^2}$

### hyperplane

> [!abstract] Hyperplane equation
>
> $$
> \hat{y} = w_{0} + \sum_{j=1}^{d}{w_j x_j} \\ \because w_0: \text{the y-intercept (bias)}
> $$

Homogeneous hyperplane:

$$
\begin{aligned} w_{0} & = 0 \\ \hat{y} &= \sum_{j=1}^{d}{w_j x_j} = \langle{w,x} \rangle \\ &= w^Tx \end{aligned}
$$

Matrix form OLS:

$$
X_{n\times d} = \begin{pmatrix} x_1^1 & \cdots & x_d^1 \\ \vdots & \ddots & \vdots \\ x_1^n & \cdots & x_d^n \end{pmatrix}, Y_{n\times 1} = \begin{pmatrix} y^1 \\ \vdots \\ y^n \end{pmatrix}, W_{d\times 1} = \begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix}
$$

$$
\begin{aligned} \text{Obj} &: \sum_{i=1}^n (\hat{y}^i - y^i)^2 = \sum_{i=1}^n (\langle w, x^i \rangle - y^i)^2 \\ \\ \text{Def} &: \Delta = \begin{pmatrix} \Delta_1 \\ \vdots \\ \Delta_n \end{pmatrix} = \begin{pmatrix} x_1^1 & \cdots & x_d^1 \\ \vdots & \ddots & \vdots \\ x_1^n & \cdots & x_d^n \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix} - \begin{pmatrix} y^1 \\ \vdots \\ y^n \end{pmatrix} = \begin{pmatrix} \hat{y}^1 - y^1 \\ \vdots \\ \hat{y}^n - y^n \end{pmatrix} \end{aligned}
$$

> [!question] minimize
>
> $$
> \min\limits_{W \in \mathbb{R}^{d \times 1}} \|XW - Y\|_2^2
> $$

> [!abstract] OLS solution
>
> $$
> W^{\text{LS}} = (X^T X)^{-1}{X^T Y}
> $$

Example:

$$
\hat{y} = w_{0} + w_{1} \cdot x_{1} + w_{2} \cdot x_{2}
$$

With

$$
X_{n \times 2} = \begin{pmatrix} x^{1}_{1} & x^{1}_{2} \\ x^{2}_{1} & x^{2}_{2} \\ x^{3}_{1} & x^{3}_{2} \end{pmatrix}
$$

and

$$
X^{'}_{n \times 3} = \begin{pmatrix} x^{1}_{1} & x^{1}_{2} & 1 \\ x^{2}_{1} & x^{2}_{2} & 1 \\ x^{3}_{1} & x^{3}_{2} & 1 \end{pmatrix}
$$

With

$$
W = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
$$

and

$$
W^{'} = \begin{pmatrix} w_1 \\ w_2 \\ w_0 \end{pmatrix}
$$

thus

$$
X^{'} W^{'} = \begin{pmatrix} w_0 + \sum{w_i \times x_i^{1}} \\ \vdots \\ w_0 + \sum{w_i \times x_i^{n}} \end{pmatrix}
$$

See also [Bias and intercept](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept)
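A minimal numpy sketch of the closed-form solution with the bias-column trick, on synthetic data (the names and values below are illustrative, not from the course code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.standard_normal((n, d))
w_true, b_true = np.array([1.5, -2.0, 0.5]), 0.7
Y = X @ w_true + b_true + 0.01 * rng.standard_normal(n)

# append the auxiliary dimension x_{d+1} = 1 so w_0 is absorbed into W
Xp = np.hstack([X, np.ones((n, 1))])

# W_LS = (X^T X)^{-1} X^T Y, computed via lstsq for numerical stability
W, *_ = np.linalg.lstsq(Xp, Y, rcond=None)
print(W)  # ~ [1.5, -2.0, 0.5, 0.7]; the last entry is the bias w_0
```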
---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Stochastic-gradient-descent
tags:
  - sfwr4ml3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Stochastic-gradient-descent"
title: Stochastic gradient descent
date: 2024-11-11
---

See also [SGD and ODEs](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Stochastic-gradient-descent/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/A4)

[Nesterov momentum](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Stochastic-gradient-descent/../../../../../../../../thoughts/Nesterov-momentum) is based on [On the importance of initialization and momentum in deep learning](http://www.cs.toronto.edu/%7Ehinton/absps/momentum.pdf)

```pseudo
\begin{algorithm}
\caption{SGD}
\begin{algorithmic}
\State \textbf{input:} $\gamma$ (lr), $\theta_0$ (params), $f(\theta)$ (objective), $\lambda$ (weight decay),
\State $\mu$ (momentum), $\tau$ (dampening), nesterov, maximize
\For{$t = 1$ to $...$}
  \State $g_t \gets \nabla_\theta f_t(\theta_{t-1})$
  \If{$\lambda \neq 0$}
    \State $g_t \gets g_t + \lambda\theta_{t-1}$
  \EndIf
  \If{$\mu \neq 0$}
    \If{$t > 1$}
      \State $b_t \gets \mu b_{t-1} + (1-\tau)g_t$
    \Else
      \State $b_t \gets g_t$
    \EndIf
    \If{$\text{nesterov}$}
      \State $g_t \gets g_t + \mu b_t$
    \Else
      \State $g_t \gets b_t$
    \EndIf
  \EndIf
  \If{$\text{maximize}$}
    \State $\theta_t \gets \theta_{t-1} + \gamma g_t$
  \Else
    \State $\theta_t \gets \theta_{t-1} - \gamma g_t$
  \EndIf
\EndFor
\State \textbf{return} $\theta_t$
\end{algorithmic}
\end{algorithm}
```

# Nesterov momentum

See also [paper](http://www.cs.toronto.edu/%7Ehinton/absps/momentum.pdf)

idea:

- first take a step in the direction of accumulated momentum
- compute the gradient at the "lookahead" position,
- make the update using this gradient.

> [!abstract] definition
>
> For a parameter vector $\theta$, the update can be expressed as
>
> $$
> \begin{aligned} v_t &= \mu v_{t-1} + \nabla L(\theta_t + \mu v_{t-1}) \\ \theta_{t+1} &= \theta_t - \alpha v_t \end{aligned}
> $$

Achieves better convergence rates

| function type            | gradient descent                | Nesterov AG                             |
| ------------------------ | ------------------------------- | --------------------------------------- |
| Smooth                   | $O(\frac{1}{T})$                | $O(\frac{1}{T^{2}})$                    |
| Smooth & Strongly Convex | $O(\exp(-\frac{T}{\kappa}))$    | $O(\exp(-\frac{T}{\sqrt{\kappa}}))$     |

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Stochastic-gradient-descent/../../../../../../../../thoughts/Nesterov-momentum)
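A minimal numpy sketch of this lookahead update on a toy quadratic; the matrix, step size, and momentum values below are made up for illustration:

```python
import numpy as np

# minimise f(theta) = 0.5 * theta^T A theta, whose gradient is A @ theta
A = np.diag([1.0, 10.0])
grad = lambda th: A @ th

theta, v = np.array([5.0, 5.0]), np.zeros(2)
mu, lr = 0.9, 0.05
for _ in range(100):
    v = mu * v + grad(theta + mu * v)   # gradient at the lookahead point
    theta = theta - lr * v
print(theta)  # -> close to the minimiser [0, 0]
```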
---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Support-Vector-Machine
tags:
  - sfwr4ml3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Support-Vector-Machine"
title: Support Vector Machine
date: 2024-11-11
---

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content
tags:
  - sfwr4ml3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content"
title: Least Squared Regression
date: 2024-10-07
---

See also [jupyter notebook](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/LSR), [pdf](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/assignment.pdf), [solutions](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/solution.pdf)

## question 1.

### problem 1.

> [!question]- part 1
>
> 1. Divide the dataset into three parts: 1800 samples for training, 200 samples for validation, and 200 samples for testing. Perform linear OLS (without regularization) on the training samples twice—first with a homogeneous model (i.e., where the y-intercepts are zero) and then with a non-homogeneous model (allowing for a non-zero y-intercept). Report the MSE on both the training data and the validation data for each model
> 2. Compare the results. Which approach performs better? Why? Apply the better-performing approach to the test set and report the MSE.
> 3. Do you observe significant overfitting in any of the cases?

1. For the homogeneous model, the MSE on training data is 26.1649 and on validation data is 77.0800

   ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q1-p1-1.webp)

   Whereas with the non-homogeneous model, the MSE on training data is 2.5900 and on validation data is 8.8059

   ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q1-p1-12.webp)

2. We can observe that the non-homogeneous model clearly performs better than the homogeneous model, given a significantly lower MSE (indicating that predictions are closer to the actual values). The smaller gap between training and validation sets for the non-homogeneous model also shows better consistency, i.e. better generalisation.

   Test set MSE for the non-homogeneous model is 2.5900

   ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q1-p1-2.webp)

3. We observe in both cases that the training MSE is significantly lower than the validation MSE, indicating overfitting. The non-homogeneous model shows a smaller difference between training and validation MSE, which suggests milder overfitting. The homogeneous model shows more severe overfitting due to its constraint (forcing the intercept to zero).

> [!question]- part 2
>
> 1. Divide the dataset into three parts: 200 samples for training, 1800 samples for validation, and 200 samples for testing. Perform linear OLS (without regularization) on the training samples twice—first with a homogeneous model (i.e., where the y-intercepts are zero) and then with a non-homogeneous model (allowing for a non-zero y-intercept). Report the MSE on both the training data and the validation data for each model
> 2. Compare these results with those from the previous part. Do you observe less overfitting or more overfitting? How did you arrive at this conclusion?

1. For the homogeneous model, the MSE on training data is 0.000 and on validation data is 151.2655

   ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q1-p2-1.webp)

   Whereas with the non-homogeneous model, the MSE on training data is 0.000 and on validation data is 15.8158

   ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q1-p2-nhom.webp)

2. We observe an increase in overfitting, given the perfect fit on training data versus the validation MSE for both models. We can still see that the non-homogeneous model outperforms the homogeneous one, but the difference between training and validation MSE is significantly higher than in the previous case. This is largely because of the smaller training set (200 training samples versus 1800): the models have less data to train on.

### problem 2.

> [!question]- part 1
>
> 1. Divide the Dataset into Three Parts:
>
>    - **Training Data**: Select **200 data points**.
>    - **Validation Data**: Assign **1800 data points**.
>    - **Testing Data**: Set aside the **remaining 200 data points** for testing.
>
> 2. Run Regularized Least Squares (non-homogeneous) using 200 training data points. Choose various values of lambda within the range `{exp(-2), exp(-1.5), exp(-1), …, exp(3.5), exp(4)}`. This corresponds to $\lambda$ values ranging from exp(-2) to exp(4) with a step size of 0.5. For each value of $\lambda$, run Regularized Least Squares (non-homogeneous) using the 200 training data points and compute the Training MSE and Validation MSE.
> 3. Plot the Training MSE and Validation MSE as functions of lambda.

The following is the graph for Training and Validation MSE as functions of lambda.

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q2-p2-g.webp)

> [!question]- part 2
>
> 1. What is the best value for lambda? Why?
> 2. Use the best value of lambda to report the results on the test set.

1. The best $\lambda$ is the one corresponding to the lowest point on the validation MSE curve, as it is the one that minimizes the validation MSE. From the graph, we observe it is around $\lambda \approx 7.3891$
2. Using $\lambda \approx 7.3891$, we get a Test MSE of around 1.3947

   ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q1-p2-rls-test.webp)

### problem 3.

> [!question]- part 1
>
> Choose a preprocessing approach (i.e., select a mapping) that transforms the 900-dimensional data points (900 pixels) into a new space. This new space can be either lower-dimensional or higher-dimensional. Clearly explain your preprocessing approach.

We will use the 2D Discrete Cosine Transform (DCT) to transform our data, followed by feature selection to reduce dimensionality by selecting the top-k coefficients.

Reason:

1. DCT is widely used in image compression (think of JPEG). It transforms an image from the spatial to the frequency domain.
2. Reducing dimensionality helps with overfitting, given we will only use 200 samples for training. In this case, we choose `n_coeffs=100`

> [!question]- part 2
>
> implement your preprocessing approach.

See the [jupyter notebook](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/LSR) for more information

> [!question] part 3
>
> Report the MSE on the training and validation sets for different values of lambda and plot it. **As mentioned, it should perform better for getting points.** choose the best value of lambda, apply your preprocessing approach to the test set, and then report the MSE after running RLS.

The following graph shows the Training and Validation MSE as functions of $\lambda$. The optimal $\lambda$ is found to be $\lambda \approx 4.4817$

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q1-dct-preprocess.webp)

The resulting Test MSE is found to be around 3.2911

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q1-test-dct.webp)
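A hedged sketch of the $\lambda$ sweep described in problem 2, using the closed-form $W^{\text{RLS}} = (X^T X + \lambda I)^{-1} X^T Y$; the data here are synthetic stand-ins for the assignment's dataset, so everything below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([rng.standard_normal((2000, 10)), np.ones((2000, 1))])  # ones column = bias
W_true = rng.standard_normal(11)
Y = X @ W_true + 0.5 * rng.standard_normal(2000)
X_tr, Y_tr, X_val, Y_val = X[:200], Y[:200], X[200:], Y[200:]

def rls_fit(X, Y, lam):
    # W_RLS = (X^T X + lambda * I)^{-1} X^T Y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

for lam in np.exp(np.arange(-2.0, 4.5, 0.5)):  # exp(-2), exp(-1.5), ..., exp(4)
    W = rls_fit(X_tr, Y_tr, lam)
    train_mse = np.mean((X_tr @ W - Y_tr) ** 2)
    val_mse = np.mean((X_val @ W - Y_val) ** 2)
    print(f'{lam:8.4f}  train={train_mse:.4f}  val={val_mse:.4f}')
```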
---

## question 2.

> [!question] problem statement
>
> In this question, we will use least squares to find the best line ($\hat{y}=ax + b$) that fits a non-linear function, namely $f(x) = 2x - x^3 -1$
>
> For this, assume that you are given a set of $n$ training points $\{ (x^i, y^i)\}^{n}_{i=1} = \{(({i}/{n}), 2({i}/{n})- ({i}/{n})^3- 1)\}^{n}_{i=1}$.
>
> Find a line (i.e $a,b \in \mathbb{R}$) that fits the training data the best when $n \to \infty$. Write down your calculations as well as the final values for $a$ and $b$.
>
> Additional notes: the $n \to \infty$ assumption basically means that we are dealing with an integral rather than a finite summation. You can also assume $x$ is uniformly distributed on \[0, 1]

We need to minimize the sum of squared errors:

$$
\text{MSE}(a,b) = \int_{0}^{1}(ax + b - f(x))^2 \, dx
$$

We can compute $\mu_{x}, \mu_{y}$:

$$
\begin{aligned} \mu_{x} &= \int_{0}^{1}x dx = \frac{1}{2} \\ \mu_{y} &= \int_{0}^{1}f(x) dx = \int_{0}^{1}(2x - x^3 - 1) dx = [x^2]^{1}_{0} - [\frac{x^4}{4}]^{1}_{0} - [x]^{1}_{0} = - \frac{1}{4} \end{aligned}
$$

$$
\begin{aligned} \text{Var}(x) &= E[x^2] - (E[x])^2 = \int_{0}^{1}x^2 dx - (\frac{1}{2})^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12} \\ \text{Cov}(x,y) &= E[xy] - E[x]E[y] = \int_{0}^{1}x(2x - x^3 - 1) dx - (\frac{1}{2})(-\frac{1}{4}) \end{aligned}
$$

Compute $E[xy] = \int_{0}^{1}(2x^2-x^4-x)dx = \frac{2}{3} - \frac{1}{5} - \frac{1}{2} = - \frac{1}{30}$

Therefore we can compute the covariance:

$$
\text{Cov}(x,y) = - \frac{1}{30} + \frac{1}{8} = \frac{11}{120}
$$

Slope $a$ and intercept $b$ can then be computed as:

$$
\begin{aligned} a &= \frac{\text{Cov}(x,y)}{\text{Var}(x)} = \frac{11}{120} \times 12 = 1.1 \\ b &= \mu_{y} - a\mu_{x} = - \frac{1}{4} - \frac{11}{10} \times \frac{1}{2} = - \frac{4}{5} = -0.8 \end{aligned}
$$

Thus, the best-fitting line is $\hat{y} = ax + b = \frac{11}{10}x - \frac{4}{5}$
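A quick numerical check of this result, sampling the given training points for large $n$ (only a verification sketch, not part of the assignment):

```python
import numpy as np

n = 1_000_000
x = np.arange(1, n + 1) / n        # the training points i/n
y = 2 * x - x**3 - 1               # f(x) = 2x - x^3 - 1

a = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # Cov(x, y) / Var(x)
b = y.mean() - a * x.mean()
print(a, b)  # -> approximately 1.1, -0.8
```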
## question 3.

> [!question] problem statement
>
> In this question, we would like to fit a line with zero y-intercept ($\hat{y} = ax$) to the curve $y=x^2$. However, instead of minimising the sum of squared errors, we want to minimise the following objective function:
>
> $$
> \sum_{i} [\log {\frac{\hat{y}^i}{y^i}}]^2
> $$
>
> Assume that the distribution of $x$ is uniform on \[2, 4]. What is the optimal value for $a$? Show your work.

_assumption: natural log (the antiderivative $\int \log x \, dx = x \log x - x$ used below holds for the natural logarithm, and the final answer is given in terms of $e$)_

We need to minimize the objective function

$$
\text{Objective}(a) = \sum_{i} [\log {\frac{\hat{y}^i}{y^i}}]^2
$$

where $\hat{y}^i = ax^i$ and $y^i=(x^i)^2$

Given $x$ is uniformly distributed on \[2, 4], we can express the sum as an integral:

$$
\begin{aligned} \text{Objective}(a) &= \int_{2}^{4} [\log {\frac{ax}{x^2}}]^2 dx \\ &= \int_{2}^{4} [\log(a) + \log(x) - 2 \log(x)]^2 dx \\ &= \int_{2}^{4} [\log(a) - \log(x)]^2 dx \end{aligned}
$$

let $\ell = \log(a)$; we can rewrite the objective function as:

$$
\begin{aligned} \text{Objective}(\ell) &= \int_{2}^{4} [\ell - \log(x)]^2 dx \\ &= \int_{2}^{4} [\ell^2 - 2\ell \log(x) + \log^2(x)] dx \\ &= \ell^2 \int_{2}^{4} dx - 2\ell \int_{2}^{4} \log(x) dx + \int_{2}^{4} \log^2(x) dx \end{aligned}
$$

Compute each integral:

$$
\begin{aligned} I_0 &= \int_{2}^{4} dx = 4 - 2 = 2 \\ I_1 &= \int_{2}^{4} \log(x) dx = [x \log(x) - x]^{4}_{2} = 4 \log(4) - 4 - (2 \log(2) - 2) = 6 \log(2) - 2 \\ I_2 &= \int_{2}^{4} \log^2(x) dx \end{aligned}
$$

Since we are only interested in finding the optimal $a$, we take the partial derivative of the objective function:

$$
\frac{\partial}{\partial \ell} \text{Objective}(\ell) = \frac{\partial}{\partial \ell} (\ell^2 I_0 - 2 \ell I_1 + I_2) = 2\ell I_0 - 2I_1
$$

Set to zero to find the minimising $\ell$: $\log(a) = \ell = \frac{I_1}{I_0} = \frac{6 \log(2) - 2}{2} = 3\log(2) - 1$

Therefore, $a_{\text{opt}} = e^{\ell} = e^{3 \log(2) - 1} = e^{3 \log(2)} \times \frac{1}{e} = \frac{8}{e}$

Thus, the optimal value for $a$ is $a=8/e$
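A quick Monte Carlo check of this result: $\ell^{*} = \frac{I_1}{I_0}$ is exactly the mean of $\log x$ under the uniform distribution on $[2, 4]$ (a verification sketch only):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(2, 4, 1_000_000)
ell = np.mean(np.log(x))          # optimal ell = E[log x] over Uniform[2, 4]
print(np.exp(ell), 8 / np.e)      # both ~ 2.943
```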
---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content
tags:
  - sfwr4ml3
description: implementation of PCA on LFW and TNC datasets
title: PCA and Kernels, from scratch
date: 2024-10-21
---

See also [jupyter notebook](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/PCA), [pdf](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/assignment.pdf), [solutions](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/solution.pdf)

## question 1.

### task 1: eigenfaces

implementation of `centeralize_data()` and `pca_components()`:

```python
import numpy as np


def centeralize_data(data):
    # subtract the per-feature mean; return both the centered data and the mean
    return data - (data_mean := np.mean(data, axis=0).reshape(1, -1)), data_mean


# fmt: off
def pca_components(Vt, n_components): return Vt[:n_components]
# fmt: on
```

Yields the following when running `plot_class_representatives`: [result](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-t1.webp)

### task 2: PCA transformation and reconstructing

> [!question] part A
>
> Implement `pca_tranform`

```python
def pca_transform(X, n_components):
    U, s, *result = normalized_svd(X)
    return U[:, :n_components] * s[:n_components], *result
```

> [!question] part B
>
> Implement `pca_inverse_transform`

```python
def pca_inverse_transform(transformed_data, Vt, n_components, data_mean):
    return transformed_data @ pca_components(Vt, n_components) + data_mean
```

Which yields the following for the TNC visualisation:

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-tnc-viz.webp)

and the LFW visualisation:

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-lfw-viz.webp)

We also expect some loss of information during reconstruction:

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-bush-loss-info.webp)

### task 3: average reconstruction error for LFW

$$
\text{error}=\frac{1}{n}\sum_{i=1}^n||x_i-\text{reconstruct}(pca(x_i))||^2_2
$$

> [!question] part A
>
> plot average reconstruction error on training and testing data points

Training code:

```python
# Define the number of components to test in PCA
c_components = [2, 10, 30, 60, 100]

# Initialize lists to store the reconstruction errors for training and testing data
train_errors, test_errors = [], []

# Initialize deterministic seed
SEED = 42

X_train, X_test = train_test_split(X_bush, train_size=400, random_state=SEED)


# \text{error}=\frac{1}{n}\sum_{i=1}^n||x_i-\text{reconstruct}(pca(x_i))||^2_2
def mse(train_data, reconstructed):
    return np.mean(np.sum((train_data - reconstructed) ** 2, axis=1))


# Loop through each specified number of components for PCA
for n_components in c_components:
    # Apply PCA and then inverse PCA to the training data
    transformed_train, Vt_train, mean_train = pca_transform(X_train, n_components)
    # Calculate the Mean Squared Error (MSE) as the reconstruction error for the training set
    train_errors.append(mse(X_train, pca_inverse_transform(transformed_train, Vt_train, n_components, mean_train)))
    # Normalize the test data. Transform the test data using the train data's PCA components
    # and reconstruct the test data.
    # Calculate the Mean Squared Error (MSE) as the reconstruction error for the test set
    test_errors.append(mse(X_test, pca_inverse_transform((X_test - mean_train) @ pca_components(Vt_train, n_components).T, Vt_train, n_components, mean_train)))  # fmt: skip

# Print the average reconstruction errors for each number of components
for i, n_components in enumerate(c_components):
    print(f'Components: {n_components}\n\tTrain Error: {train_errors[i]:.4f}\n\tTest Error: {test_errors[i]:.4f}')
```

yields the following observation

```prolog
Components: 2
	Train Error: 40.2048
	Test Error: 44.1277
Components: 10
	Train Error: 21.6275
	Test Error: 25.1425
Components: 30
	Train Error: 11.6392
	Test Error: 15.6092
Components: 60
	Train Error: 6.6892
	Test Error: 11.4092
Components: 100
	Train Error: 3.7635
	Test Error: 8.7075
```

The eval results graph:

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-t3-eval.webp)

> [!question] part B
>
> 1. Explain the difference between the two graphs
> 2. What would the error be if we compute it for the TNC dataset while using two components and 2000 samples?

1. The following observations can be made:
   - Both errors decrease as the number of components increases (lower means better reconstruction quality). However, the test error curve (red) sits above the train error curve (blue). This shows some overfitting, given the smaller training size (400 samples) relative to the LFW dataset (which includes 1288 entries)
   - Both show diminishing returns, yet this effect is more pronounced for the test error
   - As `n_components` increases, we see a decrease in bias (improving reconstruction for both train and test data). However, the test error decreases more slowly, given that later components are less effective at reconstructing features of unseen data
2. The average reconstruction error for TNC is shown below:

   ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-t3-tnc-reconstruct-error.webp)

### task 4: Kernel PCA

> [!question] part A
>
> Apply Kernel PCA and plot transformed Data

Applied a `StandardScaler` to `X_TNC` and plotted a 3x4 grid, with position (1,1) being the original data plot, followed by 11 slots for `gamma` from $[ 0.0001 \cdots 1 ]$.
Run on `n_components=2`

```python
gamma_values = [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1]
n_components = 2

# Standardize the features
scaler = StandardScaler()
X_TNC_scaled = scaler.fit_transform(X_TNC)

# Create subplots to visualize the transformed data for each gamma
plt.figure(figsize=(20, 15))

# Plot the original data before applying Kernel PCA
plt.subplot(3, 4, 1)
plt.scatter(X_TNC_scaled[:, 0], X_TNC_scaled[:, 1], c=Y_TNC, cmap='bwr')
plt.title('Original Data')
plt.xlabel('coord_x')
plt.ylabel('coord_y')

# Set the limits for the x and y axes
x_limits = (-4, 4)
y_limits = (-4, 4)

# Apply Kernel PCA for each gamma value
for idx, gamma in enumerate(gamma_values):
    # Apply Kernel PCA
    kpca = KernelPCA(n_components=n_components, kernel='rbf', gamma=gamma)
    X_kpca = kpca.fit_transform(X_TNC_scaled)

    # Plot the transformed data
    plt.subplot(3, 4, idx + 2)
    plt.scatter(X_kpca[:, 0], X_kpca[:, 1], c=Y_TNC, cmap='bwr')
    plt.title(f'Gamma = {gamma}')
    plt.xlabel('First principal component')
    plt.ylabel('Second principal component')

    # Set fixed x and y axis limits
    plt.xlim(x_limits)
    plt.ylim(y_limits)

plt.tight_layout()
plt.show()
```

Yields the following graph:

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-t4-kernel-pca-n-2.webp)

> [!question] part B
>
> Based on your observations, how does Kernel PCA compare to Linear PCA on this dataset with red and blue labels? In what ways does Kernel PCA affect the distribution of the data points, particularly in terms of how well the red and blue points are organized? Choose the best value(s) for `gamma` and report it (them). What criteria did you use to determine the optimal `gamma` value?

**Comparison**:

- Kernel PCA is more effective at capturing the non-linear relationships in the data, where we see a spread between the blue and red circles, modifying the data distribution. Linear PCA, in contrast, maintains the circular structure, meaning it doesn't alter the data distribution much.

**Effects**:

- For small values of gamma $[ 0.0001, 0.0005, 0.001 ]$ the points are highly concentrated, meaning the kernel is too wide (this makes sense given that `gamma` is inversely related to the kernel width)
- For gamma $[ 0.005, \cdots 0.05 ]$, we notice a separation between the blue and red circles.
- For gamma $[0.1, 0.2]$, we start to see features similar to the original data, albeit scaled down given the RBF kernel.
- At gamma $[0.5, 1]$, the data spread out, forming elongated features.

> Gamma in $[ 0.1, 0.2 ]$ seems to provide the best representation of the original data

**Criteria**:

- class separation: how well the blue and red circles are separated from each other
- compactness: how tightly clustered the points within each class are
- structure preservation: how well the circular nature of the original dataset is preserved
- dimensionality reduction: how well the data is projected into the lower-dimensional space

> [!question] part C
>
> Find best values for reconstruction error of kernel PCA

The training loop yields the following:

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-t4-part-b-opt-kernel.webp)

> [!question] part D
>
> 1. Visualisation of Reconstruction Error
> 2. How does kernel PCA compare to Linear PCA on this dataset?
> If Kernel PCA shows improved performance, please justify your answer. If Linear PCA performs better, explain the reasons for its effectiveness.

Reconstruction error for kernel PCA as well as linear PCA:

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-t4-reconstruct-err-pca-kernels.webp)

**Performance**:

- Linear PCA has a significantly better reconstruction error than kernel PCA (6.68 for linear PCA against 47.48 at $\text{gamma}=0.01$ for kernel PCA)
- Regardless of `gamma`, kernel PCA shows much higher error

**Reasoning for Linear PCA**:

1. Data characteristics: LFW most likely contains mostly linear relationships between features (face images have strong linear correlations in pixel intensities and structures)
2. Dimensionality: This aligns with Task 3 Part B, where we observe the same value with `n_components=60` for linear PCA
3. Overfitting: linear PCA is less prone to overfitting, whereas kernel PCA might find local optima that overfit to patterns in the data (in this case face features). Additionally, RBF is more sensitive to outliers

Explanation of why kernel PCA doesn't work as well:

1. Kernel: RBF assumes local, non-linear relationships. This might not suit facial data given the strong linear correlation among facial features.
2. Gamma: even $\text{gamma}=0.01$, which achieves the lowest error, still underperforms compared to linear PCA.
3. Noise: non-linear kernel mappings are more prone to capturing noise or irrelevant patterns in facial images.

---

## question 2.

> [!note] problem statement
>
> "Driving high" is prohibited in the city, and the police have started using a tester that shows whether a driver is high on cannabis. The tester is a binary classifier (1 for positive result, and 0 for negative result) which is not accurate all the time:
>
> - if the driver is truly high, then the test will be positive with probability $1 - \beta_1$ and negative with probability $\beta_1$ (so the probability of a wrong result is $\beta_1$ in this case)
> - if the driver is not high, then the test will be positive with probability $\beta_2$ and negative with probability $1-\beta_2$ (so the probability of a wrong result is $\beta_2$ in this case)
>
> Assume the probability of (a randomly selected driver from the population) being "truly high" is $\alpha$

> [!question] part 1
>
> What is the probability that the tester shows a positive result for a (randomly selected) driver? (write your answer in terms of $\alpha, \beta_1, \beta_2$)

Probability of a driver being truly high: $P(\text{High}) = \alpha$

Probability of a driver not being high: $P(\text{Not High}) = 1- \alpha$

Probability of a positive test given the driver is high: $P(\text{Positive} | \text{High}) = 1 - \beta_1$

Probability of a positive test given the driver is not high: $P(\text{Positive} | \text{Not High}) = \beta_2$

_using the law of total probability to find the overall probability of a positive test result:_

$$
\begin{aligned} P(\text{Positive}) &= P(\text{Positive} | \text{High}) \cdot P(\text{High}) + P(\text{Positive} | \text{Not High}) P(\text{Not High}) \\ &= (1 - \beta_1) \cdot \alpha + (\beta_2) \cdot (1 - \alpha) \end{aligned}
$$

> [!question] part 2
>
> The police have collected test results for n randomly selected drivers (i.i.d. samples). What is the likelihood that there are exactly $n_{+}$ positive samples among the $n$ samples?
> Write your solution in terms of $\alpha, \beta_1, \beta_2, n_{+}, n$

Let the probability of a positive test result for a randomly selected driver be

$$
p = P(\text{Positive}) = (1 - \beta_1) \cdot \alpha + (\beta_2) \cdot (1 - \alpha)
$$

Now, apply binomial probability to find the likelihood of $n_{+}$ positive samples among $n$ samples:

$$
\begin{aligned} P(X=n_{+}) &= \binom{n}{n_{+}} \cdot p^{n_{+}} \cdot (1-p)^{n-n_{+}} \\ &= \binom{n}{n_{+}} \cdot [(1 - \beta_1) \cdot \alpha + (\beta_2) \cdot (1 - \alpha)]^{n_{+}} \\ &\quad \quad \quad \quad \cdot (1 - ((1 - \beta_1) \cdot \alpha + (\beta_2) \cdot (1 - \alpha)))^{n-n_{+}} \\ &= \binom{n}{n_{+}} \cdot [(1 - \beta_1 - \beta_2) \cdot \alpha + \beta_2]^{n_{+}} \cdot (1 - \beta_2 + \alpha \cdot (\beta_1 + \beta_2 - 1))^{n-n_{+}} \\ \end{aligned}
$$

> [!question] part 3
>
> What is the maximum likelihood estimate of $\alpha$ given a set of $n$ random samples from which $n_{+}$ are positive results? In this part, you can assume that $\beta_1$ and $\beta_2$ are fixed and given. Simplify your final result in terms of $n, n_{+}, \beta_1, \beta_2$

_Assumption: using the natural log $\ln$_

_MLE of $\alpha$_

Let the likelihood function be $L(\alpha)$:

$$
\begin{aligned} L(\alpha) &= \binom{n}{n_{+}} \cdot p(\alpha)^{n_{+}} \cdot (1-p(\alpha))^{n-n_{+}} \\ \\ \because &\quad p(\alpha) = (1 - \beta_1) \cdot \alpha + \beta_2 \cdot (1-\alpha) \end{aligned}
$$

Take the log of both sides and drop the constant term:

$$
\ln L(\alpha ) = n_{+} \ln [p(\alpha)] + (n-n_{+}) \ln [1-p(\alpha)]
$$

To find the maximum likelihood, we differentiate with respect to $\alpha$ and set to zero:

$$
\begin{aligned} n_{+} \cdot \frac{p^{'}(\alpha)}{p(\alpha )} &- (n-n_{+}) \cdot \frac{p^{'}(\alpha)}{1-p(\alpha )} = 0 \\ \\ \because &\quad p'(\alpha ) = 1 - \beta_1 - \beta_2 \\ \\ n_{+} \cdot \frac{1 - \beta_1 - \beta_2}{p(\alpha )} &= (n-n_{+}) \cdot \frac{1 - \beta_1 - \beta_2}{1-p(\alpha )} \\ \\ n_{+} - n_{+} p(\alpha ) &= n p(\alpha) - n_{+} p(\alpha) \\ n_{+} &= np(\alpha) \end{aligned}
$$

Substituting $p(\alpha) = (1 - \beta_1) \cdot \alpha + \beta_2 \cdot (1-\alpha)$:

$$
\begin{aligned} n_{+} &= n [(1-\beta_1) \cdot \alpha + \beta_2 \cdot (1-\alpha)] \\ \frac{n_{+}}{n} &= (1-\beta_1-\beta_2) \cdot \alpha + \beta_2 \\ \\ \text{MLE for } \hat{\alpha} &= \frac{\frac{n_{+}}{n} - \beta_2}{1 - \beta_{1} - \beta_{2}} \\ &= \frac{n_{+} - n \cdot \beta_{2}}{n - n\cdot \beta_{1} - n\cdot \beta_{2}} \end{aligned}
$$

> [!question] part 4
>
> What will be the maximum likelihood estimate of $\alpha$ for the special cases of
>
> - $(i) \ \beta_{1} = \beta_{2} = 0$
> - $(ii) \ \beta_{1} = \beta_{2} = 0.5$
> - $(iii) \ \beta_{1} = 0.2, \beta_{2} = 0.3$

For $(i) \ \beta_{1} = \beta_{2} = 0$: $\hat{\alpha} = \frac{n_{+}}{n}$

For $(ii) \ \beta_{1} = \beta_{2} = 0.5$: $\hat{\alpha} = \text{undefined}$

_note: this makes sense; when the test is completely random, it carries no information about the true proportion of high drivers._

For $(iii) \ \beta_{1} = 0.2, \beta_{2} = 0.3$: $\hat{\alpha} = \frac{n_+ - 0.3n}{0.5n} = \frac{2n_{+}}{n} - \frac{3}{5} = \frac{2n_+}{n} - 0.6$
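A quick simulation check of the estimator from part 3; the parameter values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, b1, b2, n = 0.2, 0.2, 0.3, 1_000_000

high = rng.random(n) < alpha
# positive with probability 1 - b1 if high, with probability b2 otherwise
positive = np.where(high, rng.random(n) < 1 - b1, rng.random(n) < b2)
n_pos = positive.sum()

alpha_hat = (n_pos / n - b2) / (1 - b1 - b2)
print(alpha_hat)  # -> close to the true alpha = 0.2
```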
---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content
tags:
  - sfwr4ml3
description: implementation in pure PyTorch
title: SVM and Logistic Regression
date: 2024-11-11
---

See also [jupyter notebook](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/svm)

## task 1: linear [SVM](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Support-Vector-Machine) for MNIST classification

> [!question] part a
>
> Is the implementation of the multi-class linear SVM similar to the end-to-end multi-class SVM that we learned in the class? Are there any significant differences?

| Differences        | multi-class linear SVM                                                                                                                                                                                                 | end-to-end multi-class SVM                                                                                  |
| ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------- |
| Loss function      | Uses `MultiMarginLoss`, which creates a criterion that optimises a multi-class classification hinge loss [^multiloss]                                                                                                    | multi-vector encoding where $h(x) = \arg\max_{y} \langle w, \Psi(x, y) \rangle$                              |
| Architecture       | A single linear layer based on the given `input_size` and `num_classes`                                                                                                                                                  | optimized over pairs of class scores with multi-vector encoding                                              |
| Parameter Learning | Uses [SGD](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/SGD) with minibatches to optimize MML | A theoretical formulation optimizing over the multi-vector encoded space [^theoretical]                      |

> [!question] part B
>
> 1. Compute the accuracy on the train and test set after each epoch in the training. Plot these accuracies as a function of the epoch number and include it in the report (include only the plot in your report, not all the 2\*100 numbers).
> 2. Compute the hinge loss on the train and test set after each epoch in the training. Plot these loss values as a function of the epoch number and include it in the report (include only the plot in your report, not all the 2\*100 numbers).
> 3. Report the last epoch results (including loss values and accuracies) for both train and test sets.
> 4. Does the model show significant overfitting? Or do you think there might be other factors that are more significant in the mediocre performance of the model?

The following includes graphs for both accuracy and loss on train/test sets after 100 epochs

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/t1-partb.webp)

Last epoch results for both train and test sets:

```prolog
-------------------------------------------------------------
Epoch 100
 - Train loss: 0.016170, Train accuracy: 100.00%
 - Test loss: 0.165001, Test accuracy: 78.50%
-------------------------------------------------------------
```

We observe training accuracy continuing to improve, while test accuracy plateaus. The same observation can be made in the `Loss vs. Epochs` graph, where the gap between training and test loss widens as epochs increase.
_While this shows evidence of overfitting, one can argue there are other factors affecting model performance:_

**Limited training data**:

- we currently use only 0.25% of the MNIST dataset (which is around 150 samples) [^size]
- This makes it difficult for the model to learn generalizable patterns

**Model limitation**:

- A linear SVM can only learn linear decision boundaries
- MNIST requires non-linear decision boundaries to achieve high performance (we observe this in how quickly test accuracy plateaus around 78.5%)

> We don't observe degrading test performance, which would be the primary signature of overfitting.

> [!question] part c
>
> Weight decay works like regularization. Set weight decay to each of the values (0.1, 1, 10) during defining the SGD optimizer (see [SGD optimizer documentation](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html) for how to do that).
>
> Plot the train/test losses and accuracies per epoch. Also report the last epoch results (loss and accuracy for both train and test).
>
> > [!tip] Important
> >
> > Does weight decay help in this case? Justify the results.

The following are logs for the set of weight decay values (0.1, 1, 10)

```text
Training with weight decay = 0.1
=============================================================
Epoch 020
 - Train loss: 0.1048, Train accuracy: 94.67%
 - Test loss: 0.2342, Test accuracy: 75.30%
-------------------------------------------------------------
Epoch 040
 - Train loss: 0.0638, Train accuracy: 98.00%
 - Test loss: 0.2072, Test accuracy: 78.60%
-------------------------------------------------------------
Epoch 060
 - Train loss: 0.0520, Train accuracy: 98.67%
 - Test loss: 0.2034, Test accuracy: 79.10%
-------------------------------------------------------------
Epoch 080
 - Train loss: 0.0447, Train accuracy: 99.33%
 - Test loss: 0.2043, Test accuracy: 80.00%
-------------------------------------------------------------
Epoch 100
 - Train loss: 0.0422, Train accuracy: 99.33%
 - Test loss: 0.2051, Test accuracy: 79.60%
-------------------------------------------------------------
```

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/t1-partc-wd-point1.webp)

```text
Training with weight decay = 1
=============================================================
Epoch 020
 - Train loss: 0.2499, Train accuracy: 90.67%
 - Test loss: 0.3714, Test accuracy: 73.00%
-------------------------------------------------------------
Epoch 040
 - Train loss: 0.2374, Train accuracy: 89.33%
 - Test loss: 0.3621, Test accuracy: 73.30%
-------------------------------------------------------------
Epoch 060
 - Train loss: 0.2416, Train accuracy: 87.33%
 - Test loss: 0.3646, Test accuracy: 72.80%
-------------------------------------------------------------
Epoch 080
 - Train loss: 0.2367, Train accuracy: 90.67%
 - Test loss: 0.3621, Test accuracy: 74.70%
-------------------------------------------------------------
Epoch 100
 - Train loss: 0.2366, Train accuracy: 90.67%
 - Test loss: 0.3592, Test accuracy: 74.20%
-------------------------------------------------------------
```

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/t1-partc-wd-1.webp)

```text
Training with weight decay = 10
=============================================================
Epoch 020
 - Train loss: 0.7413, Train accuracy: 33.33%
 - Test loss: 0.7881, Test accuracy: 23.10%
-------------------------------------------------------------
Epoch 040
 - Train loss: 0.7422, Train accuracy: 37.33%
 - Test loss: 0.7906, Test accuracy: 22.00%
-------------------------------------------------------------
Epoch 060
 - Train loss: 0.7437, Train accuracy: 33.33%
 - Test loss: 0.7938, Test accuracy: 18.50%
-------------------------------------------------------------
Epoch 080
 - Train loss: 0.7316, Train accuracy: 26.67%
 - Test loss: 0.7883, Test accuracy: 16.90%
-------------------------------------------------------------
Epoch 100
 - Train loss: 0.7415, Train accuracy: 24.00%
 - Test loss: 0.7953, Test accuracy: 13.70%
-------------------------------------------------------------
```

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/t1-partc-wd-10.webp)

```text
final results comparison:
======================================================================
weight decay    train loss    test loss    train acc    test acc
----------------------------------------------------------------------
0.1             0.0422        0.2051       99.33%       79.60%
1.0             0.2366        0.3592       90.67%       74.20%
10.0            0.7415        0.7953       24.00%       13.70%
```

Yes, but the result is highly sensitive to the given weight decay value.

1. with `weight_decay = 0.1` we observe the best performance: training accuracy reaches 99.33%, there is a smaller gap between train and test loss, and the learning curves are smooth with stable convergence.
2. with `weight_decay = 1` we see a decrease in training accuracy and a larger gap between training and test loss; training becomes somewhat unstable with fluctuating accuracy, and the regularisation is too strong, which affects learning.
3. with `weight_decay = 10`, model performance is severely impaired because the penalty is too restrictive: training is unstable, loss values stay high, and the regularisation is far too aggressive.

> The small dataset makes the model more sensitive to regularisation; the model's linearity reduces how much regularisation it needs.

> Weight decay does help when properly tuned, and makes learning a bit more stable.

## task 2: Logistic Regression for MNIST classification

> [!question] part a
>
> Use Cross Entropy Loss (rather than Hinge loss) to implement logistic regression

_context_:

- Hinge loss: penalizes predictions that are not sufficiently confident; it only cares about correct classification with a sufficient margin
- cross-entropy: for the binary case, the loss is defined as:

$$
L(y, p(x)) = -(y \log(p(x)) + (1-y) \log (1-p(x)))
$$

For the multi-class case, it is defined as:

$$
L(y, p(x)) = - \sum y_i \log(p_i(x))
$$
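A minimal PyTorch illustration of the difference between the two losses on a made-up logit vector; the hinge loss is zero once the margin is satisfied, while cross-entropy always produces a non-zero gradient signal (purely an illustrative sketch):

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 0.5, -1.0]])  # raw scores for 3 classes
target = torch.tensor([0])                 # true class index

hinge = nn.MultiMarginLoss()(logits, target)  # 0.0: margin of 1 already met for all classes
ce = nn.CrossEntropyLoss()(logits, target)    # ~0.24: -log softmax probability of the true class
print(hinge.item(), ce.item())
```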
> [!question] part b
>
> 1. Compute the accuracy on the train and test set after each epoch in the training. Plot these accuracies as a function of the epoch number.
> 2. Compute the cross-entropy loss on the train and test set after each epoch in the training. Plot these loss values as a function of the epoch number.
> 3. Report the last epoch results (including loss values and accuracies) for both train and test sets.
> 4. Does the model show significant overfitting? Or do you think there might be other factors that are more significant in the mediocre performance of the model?

The following graph shows both accuracy and loss on the train/test datasets:

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/t2-partb.webp)

```text
-------------------------------------------------------------
Epoch 100
 - Train loss: 2.3271, Train accuracy: 8.67%
 - Test loss: 2.3272, Test accuracy: 8.20%
-------------------------------------------------------------
```

No sign of overfitting, given that train and test accuracy are very close together, and the training and test loss curves nearly coincide.

The reasons for the poor performance are as follows:

- random chance baseline: for a 10-class problem, random guessing would give \~10% accuracy, so the model performs slightly worse than chance.
- The model doesn't seem to learn at all; it performs significantly worse than the SVM.
- Cross-entropy loss might need additional tuning.
- Non-linearity: given that MNIST data contains non-linear features, it might be hard for logistic regression to capture all the information in the training dataset.

> [!question] part c
>
> Does it work better, worse, or similar?

Significantly worse, due to the difference in loss function.

## task 3: non-linearity

> [!question] part a
>
> Add a hidden layer with 5000 neurons and a RELU layer for both logistic regression and SVM models in Task 1 and Task 2.
>
> 1. For both models, plot the train loss and the test loss.
> 2. For both models, plot the train and test accuracies.
> 3. For both models, report the loss and accuracy for both train and test sets.

The following is the modified version of LinearSVM with a hidden layer:

```python
class ModifiedModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = x.view(-1, input_size)
        x = self.fc1(x)
        x = self.relu(x)
        return self.fc2(x)
```

With training/test accuracy and loss graph:

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/t3-parta.webp)

Final epoch result:

```text
------------------------------------------------------------
Epoch 100:
Train Loss: 0.0033, Train Accuracy: 100.00%
Test Loss: 0.1723, Test Accuracy: 78.10%
------------------------------------------------------------
```

Modified version of `LogisticRegression` with a hidden layer:

```python
class ModifiedLogisticModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = x.view(-1, input_size)
        x = self.fc1(x)
        x = self.relu(x)
        return self.fc2(x)
```

With training/test accuracy and loss graph:

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/t3-partb-lr.webp)

Final epoch result:

```text
------------------------------------------------------------
Epoch 100:
Train Loss: 0.1133, Train Accuracy: 100.00%
Test Loss: 0.6675, Test Accuracy: 78.70%
------------------------------------------------------------
```

> [!question] part b
>
> Compare the results with the linear model (without weight decay, to keep the comparison fair). Which approach works better? Why? Which approach is more prone to overfitting? Explain your findings and justify it.
The linear model works better in this case: even though the hidden-layer model achieves lower training loss, test accuracy is similar. The added complexity of the hidden layer and [ReLU](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/optimization#relu) activation didn't improve the model's performance given the dataset size (too small)

The problem set might be linearly separable enough that the model simply learns to generalise the overall behaviour of the whole dataset (also known as grokking [^grokking]).

> Note that the overfitting suggests there wasn't enough data in the given training sets, given that we observe similar test metrics for both `LinearSVM` and `ModifiedModel` (with ReLU and a hidden layer)

So it is not so much a question of which architecture works better; the limiting factor is the training data rather than the architectural options.

## task 4: data augmentation

> [!note]+ instruction
>
> In this task, we will explore the concept of data augmentation, which is a powerful technique used to enhance the diversity of our training dataset without collecting new data. By applying various transformations to the original training images, we can create modified versions of these images. We can then use these modified images to train our model with a "richer" set of examples. The use of data augmentation helps to improve the robustness and generalization of our models. Data augmentation is particularly beneficial in tasks like image classification, where we expect the model to be invariant to slight variations of images (e.g., rotation, cropping, blurring, etc.)
>
> For this task, you are given a code that uses Gaussian Blur augmentation, which applies a Gaussian filter to slightly blur the images. If you run the code, you will see that this type of augmentation actually makes the model less accurate (compared with Task 3, SVM test accuracy)
>
> For this task, you must explore other types of data augmentation and find one that improves the test accuracy by at least 1 percent compared with not using any augmentation (i.e., compared with Task 3, SVM test accuracy). Only change the augmentation approach, and keep the other parts of the code unchanged. Read the PyTorch documentation on different augmentation techniques [here](https://pytorch.org/vision/stable/transforms.html), and then try to identify a good augmentation method from them.
>
> Report the augmentation approach that you used, and explain why you think it helps. Also include train/test accuracy plots per epoch, and the train/test accuracy at the final epoch.

The following augmentation achieves higher test accuracy compared to `ModifiedModel` without any transformation

```python
augmentation = transforms.Compose([
    # Small random rotation with higher probability of small angles
    transforms.RandomRotation(degrees=3, fill=0),  # Even more conservative rotation
    # Very subtle random perspective
    transforms.RandomPerspective(distortion_scale=0.15, p=0.3, fill=0),
    # Convert to tensor
    transforms.ToTensor(),
    # Normalize to improve training stability
    transforms.Normalize((0.1307,), (0.3081,)),  # MNIST mean and std
    # Extremely subtle random noise
    transforms.RandomAdjustSharpness(sharpness_factor=1.2, p=0.3)
])
```

### **Explanation**

`ToTensor` is self-explanatory.
An additional augmentation playground can also be found in the [jupyter notebook](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/svm)

#### `RandomRotation`

- we use $\pm 3$ degrees, given that digits can appear at slightly different angles in the dataset
- small rotations preserve readability while increasing variety
- fill is set to 0 to preserve the black background

#### `RandomPerspective`

- adds a small distortion scale to simulate viewing-angle variations
- helps with robustness to viewpoint changes

#### `Normalize`

- uses the MNIST mean and std to normalise training inputs
- makes training more stable

#### `RandomAdjustSharpness`

- simulates some random noise
- one could also use `RandomErasing`, but they work essentially the same way

### results

The following is the final epoch result:

```text
-------------------------------------------------------------
Epoch 100
- Train loss: 0.015159, Train accuracy: 99.33%
- Test loss: 0.183071, Test accuracy: 81.10%
-------------------------------------------------------------
```

With graphs:

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/t4-highest.webp)

[^multiloss]: [Loss](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/PyTorch#multimarginloss) is defined as: $\text{loss}(x,y) = \frac{\sum_{i \neq y} \max(0, \text{margin} - x[y] + x[i])^p}{x.\text{size}(0)}$

[^theoretical]: Given input $(x_1, y_1), \ldots, (x_m, y_m)$ and parameters:

    - regularization parameter $\lambda > 0$
    - loss function $\delta: \mathcal{Y} \times \mathcal{Y} \rightarrow \mathbb{R}_{+}$
    - class-sensitive feature mapping $\Psi: \mathcal{X} \times \mathcal{Y} \rightarrow \mathbb{R}^d$

    In this case, we solve for

    $$
    \min_{w \in \mathbb{R}^d} (\lambda \|w\|^2 + \frac{1}{m} \sum_{i=1}^{m} \max_{y^{'} \in \mathcal{Y}}(\delta (y^{'}, y_i) + \langle w, \Psi (x_i, y^{'}) - \Psi (x_i, y_i) \rangle))
    $$

[^size]: MNIST datasets are [60000](https://keras.io/api/datasets/mnist/) 28×28 grayscale images, therefore $0.25/100 \times 60000 = 150$ samples are being used

[^grokking]: [grokking](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/mechanistic-interpretability#grokking) is a phenomenon where a neural network first “memorizes” patterns in the training data and then suddenly generalises to unseen data, jumping from random-chance performance to near-perfect generalisation. It is typically observed in larger networks, well past the point of apparent overfitting.

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/annotated/index
tags:
  - sfwr4ml3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/annotated/index"
title: annotated slides.
date: 2024-11-01
---

Slides for all lectures with annotations.
---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/index
tags:
  - university
  - sfwr4ml3
  - ml
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/index"
title: Introduction to Machine Learning
date: 2024-09-10
---

See also [machine learning](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/index/../../../../../../../../thoughts/Machine-learning) and [introduction](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/index/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture0.pdf)

For annotated slides check out the [annotated folders](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/index/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/annotated)

Books:

- “Pattern Recognition and Machine Learning” by Christopher M. Bishop
- [Understanding Machine Learning](https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf) by Shai Shalev-Shwartz and Shai Ben-David.

Generative-adversarial networks: [github](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix)

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/likelihood
tags:
  - sfwr4ml3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/likelihood"
title: likelihood
date: 2024-10-07
---

## maximum likelihood estimation

$$
\begin{aligned} \alpha^{\text{ML}} &= \argmax_{\alpha} P(X | \alpha) \\ &= \argmin_{\alpha} - \sum_{i} \log (P(x^i | \alpha)) \end{aligned}
$$

$P(\alpha)$ captures the a priori distribution of $\alpha$. $P(\alpha | X)$ is the posterior distribution of $\alpha$ given $X$.

## maximum a posteriori estimation

$$
\begin{aligned} \alpha^{\text{MAP}} &= \argmax_{\alpha} P(\alpha | X) \\ &= \argmax_{\alpha} \frac{P(X|\alpha)P(\alpha)}{P(X)} \\ &= \argmin_{\alpha}(-\log P(\alpha)) - \sum_{i=1}^{n} \log P(x^i | \alpha) \end{aligned}
$$

$$
\begin{aligned} \argmax_{W} P(X | \alpha) P (\alpha) &= \argmax_{W} [\log P(\alpha) + \sum_{i} \log P(x^i, y^i | W)] \\ &= \argmax_{W} [\ln \frac{1}{\beta} - \lambda {\parallel W \parallel}_{2}^{2} - \sum_{i} \frac{({x^i}^T W - y^i)^2}{\sigma^2}] \end{aligned}
$$

with a Gaussian-style prior (note the negative exponent, so heavier weights are less probable):

$$
P(W) = \frac{1}{\beta} e^{-\lambda \parallel W \parallel_{2}^{2}}
$$

> [!question] What if we have
>
> $$
> P(W) = \frac{1}{\beta} e^{-\frac{\lambda \parallel W \parallel_{2}^{2}}{r^2}}
> $$

$$
\argmax_{W} P(Z | \alpha) = \argmax_{W} \sum \log P(x^i, y^i | W)
$$

$$
P(y | x, W) = \frac{1}{\gamma} e^{-\frac{(x^T W-y)^2}{2 \sigma^2}}
$$

## expected error minimisation

Squared loss: $l(\hat{y},y)=(y-\hat{y})^2$

The solution to $y^* = \argmin_{\hat{y}} E_{X,Y}(Y-\hat{y}(X))^2$ is $E[Y | X=x]$

Instead we have $Z = \{(x^i, y^i)\}^n_{i=1}$

### error decomposition

$$
\begin{aligned} &E_{x,y}(y-\hat{y_Z}(x))^2 \\ &= E_{xy}(y-y^{*}(x))^2 + E_x(y^{*}(x) - \hat{y_Z}(x))^2 \\ &= \text{noise} + \text{estimation error} \end{aligned}
$$

### bias-variance decompositions

For a linear estimator:

$$
\begin{aligned} E_Z&E_{x,y}(y-(\hat{y}_Z(x)\coloneqq W^T_Zx))^2 \\ =& E_{x,y}(y-y^{*}(x))^2 \quad \text{noise} \\ &+ E_x(y^{*}(x) - E_Z(\hat{y_Z}(x)))^2 \quad \text{bias} \\ &+ E_xE_Z(\hat{y_Z}(x) - E_Z(\hat{y_Z}(x)))^2 \quad \text{variance} \end{aligned}
$$
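A minimal simulation sketch of this decomposition (the toy 1-D model, noise level, and sample sizes below are all illustrative assumptions): repeatedly draw training sets $Z$, fit a least-squares estimator, and estimate bias and variance at a fixed test point.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true, sigma, n, trials = 2.0, 1.0, 20, 2000
x0 = 1.5  # fixed test point

preds = []
for _ in range(trials):
    x = rng.uniform(-1, 1, n)            # draw a fresh training set Z
    y = w_true * x + rng.normal(0, sigma, n)
    w_hat = (x @ y) / (x @ x)            # 1-D least squares through the origin
    preds.append(w_hat * x0)

preds = np.array(preds)
bias_sq = (preds.mean() - w_true * x0) ** 2  # (E_Z[ŷ_Z(x0)] - y*(x0))^2
variance = preds.var()                       # E_Z[(ŷ_Z(x0) - E_Z[ŷ_Z(x0)])^2]
noise = sigma ** 2                           # irreducible noise term
print(bias_sq, variance, noise)
```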
---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm
tags:
  - sfwr4ml3
  - ml
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm"
title: Supervised machine learning
date: 2024-10-28
---

See also: [book](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Understand-Machine-Learning.pdf)

## probability density function

if $X$ is a random variable, the probability density function (pdf) is a function $f(x)$ such that:

$$
P(a \leq X \leq b) = \int_{a}^{b} f(x) dx
$$

if the distribution of $X$ is uniform over $[a,b]$, then $f(x) = \frac{1}{b-a}$

## curve fitting.

> [!question] how do we fit a distribution of data over a curve?
>
> Given a set of $n$ data points $S=\{(x^i, y^i)\}^{n}_{i=1}$

- $x \in \mathbb{R}^{d}$
- $y \in \mathbb{R}$ (or $\mathbb{R}^{k}$)

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression#curve-fitting)

In the case of 1-D ordinary least squares, the problem amounts to finding $a,b \in \mathbb{R}$ that minimize $\min\limits_{a,b} \sum_{i=1}^{n} (ax^i + b - y^i)^2$

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression#1dols)

> [!question]+ minimize
>
> $$
> \begin{aligned} \frac{\partial f}{\partial a} &= 2 \sum^{n}_{i=1}{(ax^i + b - y^i)} x^{i} = 0 \\ \frac{\partial f}{\partial b} &= 2 \sum^{n}_{i=1}{(ax^i + b - y^i)} = 0 \\ \\ \implies 2nb + 2a \sum_{i=1}^{n} x^i &= 2 \sum_{i=1}^{n} y^i \\ \implies b + a \overline{x} &= \overline{y} \\ \implies b &= \overline{y} - a \overline{x} \\ \\ \because \overline{y} &= \frac{1}{n} \sum_{i=1}^{n} y^{i} \\ \overline{x} &= \frac{1}{n} \sum_{i=1}^{n} x^{i} \end{aligned}
> $$

### optimal solution

$$
\begin{aligned} a &= \frac{\overline{xy} - \overline{x} \cdot \overline{y}}{\overline{x^2} - (\overline{x})^2} = \frac{\text{COV}(x,y)}{\text{Var}(x)} \\ b &= \overline{y} - a \overline{x} \end{aligned}
$$

where $\overline{x} = \frac{1}{N} \sum{x^i}$, $\overline{y} = \frac{1}{N} \sum{y^i}$, $\overline{xy} = \frac{1}{N} \sum{x^i y^i}$, $\overline{x^2} = \frac{1}{N} \sum{(x^i)^2}$

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression#optimal-solution)

### hyperplane

> [!abstract] Hyperplane equation
>
> $$
> \hat{y} = w_{0} + \sum_{j=1}^{d}{w_j x_j} \\ \because w_0: \text{the y-intercept (bias)}
> $$

Homogeneous hyperplane:

$$
\begin{aligned} w_{0} & = 0 \\ \hat{y} &= \sum_{j=1}^{d}{w_j x_j} = \langle{w,x} \rangle \\ &= w^Tx \end{aligned}
$$

Matrix form OLS:

$$
X_{n\times d} = \begin{pmatrix} x_1^1 & \cdots & x_d^1 \\ \vdots & \ddots & \vdots \\ x_1^n & \cdots & x_d^n \end{pmatrix}, \quad Y_{n\times 1} = \begin{pmatrix} y^1 \\ \vdots \\ y^n \end{pmatrix}, \quad W_{d\times 1} = \begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix}
$$

$$
\begin{aligned} \text{Obj} &: \sum_{i=1}^n (\hat{y}^i - y^i)^2 = \sum_{i=1}^n (\langle w, x^i \rangle - y^i)^2 \\ \text{Def} &: \Delta = \begin{pmatrix} \Delta_1 \\ \vdots \\ \Delta_n \end{pmatrix} = \begin{pmatrix} x_1^1 & \cdots & x_d^1 \\ \vdots & \ddots & \vdots \\ x_1^n & \cdots & x_d^n \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix} - \begin{pmatrix} y^1 \\ \vdots \\ y^n \end{pmatrix} = \begin{pmatrix} \hat{y}^1 - y^1 \\ \vdots \\ \hat{y}^n - y^n \end{pmatrix} \end{aligned}
$$
> [!question] minimize
>
> $$
> \min\limits_{W \in \mathbb{R}^{d \times 1}} \|XW - Y\|_2^2
> $$

> [!abstract] OLS solution
>
> $$
> W^{\text{LS}} = (X^T X)^{-1}{X^T Y}
> $$

Example:

$$
\hat{y} = w_{0} + w_{1} \cdot x_{1} + w_{2} \cdot x_{2}
$$

With (here $n = 3$)

$$
X_{n \times 2} = \begin{pmatrix} x^{1}_{1} & x^{1}_{2} \\ x^{2}_{1} & x^{2}_{2} \\ x^{3}_{1} & x^{3}_{2} \end{pmatrix}
$$

and

$$
X^{'}_{n \times 3} = \begin{pmatrix} x^{1}_{1} & x^{1}_{2} & 1 \\ x^{2}_{1} & x^{2}_{2} & 1 \\ x^{3}_{1} & x^{3}_{2} & 1 \end{pmatrix}
$$

With

$$
W = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
$$

and

$$
W^{'} = \begin{pmatrix} w_1 \\ w_2 \\ w_0 \end{pmatrix}
$$

thus

$$
X^{'} W^{'} = \begin{pmatrix} w_0 + \sum{w_i \times x_i^{1}} \\ \vdots \\ w_0 + \sum{w_i \times x_i^{n}} \end{pmatrix}
$$

See also [Bias and intercept](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression/../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept)

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression#hyperplane)

## adding bias in D-dimensions OLS

$$
X^{'}_{n \times (d+1)} = \begin{pmatrix} x^{1}_{1} & \cdots & x^{1}_{d} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ x^{n}_{1} & \cdots & x^{n}_{d} & 1 \end{pmatrix}
$$

and

$$
W_{(d+1) \times 1} = \begin{pmatrix} w_1 \\ \vdots \\ w_d \\ w_0 \end{pmatrix}
$$

Add a new auxiliary dimension to the input data, $x_{d+1} = 1$

Solve OLS:

$$
\min\limits_{W \in \mathbb{R}^{(d+1) \times 1}} \|X^{'}W - Y\|_2^2
$$

Gradient for $f: \mathbb{R}^d \rightarrow \mathbb{R}$

$$
\triangledown_{w} \space f = \begin{bmatrix} \frac{\partial f}{\partial w_1} \\ \vdots \\ \frac{\partial f}{\partial w_d} \\ \end{bmatrix}
$$

Jacobian for $g: \mathbb{R}^m \rightarrow \mathbb{R}^n$

$$
\begin{aligned} \triangledown_{w} \space g &= \begin{bmatrix} \frac{\partial g_1}{\partial w_1} & \cdots & \frac{\partial g_1}{\partial w_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_n}{\partial w_1} & \cdots & \frac{\partial g_n}{\partial w_m} \end{bmatrix}_{n \times m} \\ \\ &u, v \in \mathbb{R}^d \\ &\because g(u) = u^T v \implies \triangledown_{u} \space g = v \text{ (gradient) } \\ \\ &A \in \mathbb{R}^{n \times n}; u \in \mathbb{R}^n \\ &\because g(u) = u^T A u \implies \triangledown_{u} \space g = (A + A^T) u \text{ (gradient of the quadratic form) } \end{aligned}
$$

> [!tip] result
>
> $$
> W^{\text{LS}} = (X^T X)^{-1} X^T Y
> $$

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept#adding-bias-in-d-dimensions-ols)
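A minimal NumPy sketch of this closed form with the auxiliary ones-column (the function name and the optional ridge term are my own, for illustration):

```python
import numpy as np

def ols_with_bias(X, Y, ridge=0.0):
    # append the auxiliary all-ones column so w_0 is learned as the last weight
    Xp = np.hstack([X, np.ones((X.shape[0], 1))])
    d = Xp.shape[1]
    # W = (X'^T X' + λI)^{-1} X'^T Y; solve() is preferred over an explicit inverse
    return np.linalg.solve(Xp.T @ Xp + ridge * np.eye(d), Xp.T @ Y)
```

Setting `ridge > 0` gives the regularized solution $W^{\text{RLS}}$ discussed in the next section.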
## overfitting.

strategies to avoid:

- add more training data
- L1 (Lasso) or L2 (Ridge) regularization
  - add a penalty term to the objective function
  - L1 yields sparse models, since it forces some parameters to become exactly zero (and is more robust to outliers): penalising the absolute value of the weights drives some model coefficients to exactly 0.
    $$
    \text{Loss}(w) = \text{Error} + \lambda \times \| w \|
    $$
  - L2 works better when many features each carry signal: it does not perform feature selection, since weights are only shrunk toward 0 rather than set exactly to 0 as with L1.
    $$
    \text{Loss}(w) = \text{Error} + \lambda \times w^2
    $$
- Cross-validation
  - split data into k folds
- early stopping
- dropout, see [example](https://keras.io/api/layers/regularization_layers/dropout/)
  - randomly selected neurons are ignored ⇒ makes the network less sensitive

**sample complexity** of learning multivariate polynomials

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept#overfitting)

## regularization.

L2 regularization:

$$
\text{min}_{W \in \mathbb{R}^{d}} \| XW - Y \|^{2}_{2} + \lambda \| W \|_{2}^{2}
$$

> [!tip] Solving
>
> One can show that
>
> $$
> W^{\text{RLS}} = (X^T X + \lambda I)^{-1} X^T Y
> $$
>
> The inverse exists as long as $\lambda > 0$

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept#regularization)

## polynomial curve-fitting revisited

feature map: $\phi{(x)}: R^{d_1} \rightarrow R^{d_2}$ where $d_{2} \gg d_{1}$

training:

- $W^{*} = \argmin\limits_{W} \| \phi W - Y \|^{2}_{2} + \lambda \| W \|_{2}^{2}$
- $W^{*} = (\phi^T \phi + \lambda I)^{-1} \phi^T Y$

prediction:

- $\hat{y} = \langle{W^{*}, \phi{(x)}} \rangle = {W^{*}}^T \phi(x)$

> [!abstract] choices of basis functions
>
> - Gaussian basis functions: $\phi(x) = \exp{(-\frac{\| x - \mu \|^{2}}{2\sigma^{2}})}$
> - Polynomial basis functions: $\phi(x) = \{1, x, x^{2}, \ldots, x^{d}\}$
> - Fourier basis functions: DFT, FFT

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept#polynomial-curve-fitting-revisited)

## kernels

compute higher-dimensional inner products

$$
K(x^i, x^j) = \langle \phi(x^i), \phi(x^j) \rangle
$$

Polynomial kernel of degree 2:

$$
k(x^i, x^j) = (1 + (x^i)^T x^j)^2 = (1 + \langle{x^i, x^j} \rangle)^2 \\ \\ \because O(d) \text{ operations}
$$

> [!abstract] degree M polynomial
>
> $$
> k(x^i, x^j) = (1 + (x^i)^T x^j)^M
> $$

How many operations?

- improved: $d + \log M$ ops

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept#kernels)

## kernel least squares

Steps:

- $W^{*} = \argmin\limits_{W} \|\phi W - Y\|_2^2 + \lambda \| W \|_2^2$
- show that $\exists \space a \in \mathbb{R}^n \mid W^{*} = \phi^T a$, or $W^{*} = \sum a_i \phi(x^i)$

> [!note]- proof
>
> $$
> \begin{aligned} 0 &= \frac{\partial}{\partial W} (\|\phi W - Y\|_2^2 + \lambda \| W \|_2^2) \\ &= 2 (\phi^T \phi) W - 2 \phi^T Y + 2 \lambda W \\ &\implies \lambda W = \phi^T (Y - \phi W) \\ &\implies W = \phi^T \frac{(Y - \phi W)}{\lambda} = \phi^T a \end{aligned}
> $$

- use $W^{*} = \sum a_i \phi(x^i)$ to form the dual representation of the problem:

$$
\min\limits_{\overrightarrow{a} \in \mathbb{R}^n} \| Ka - Y \|_2^2 + \lambda a^T K a \\ \because \hat{Y} = \phi \phi^T a = K_{n \times n} \, a_{n \times 1}
$$

Solution:

$$
a^{*} = (K + \lambda I)^{-1} Y
$$

### choices

- polynomial kernel: $K(x, z) = (1 + x^T z)^d$
- Gaussian kernel: $K(x, z) = e^{-\frac{\|x-z\|_2^2}{2\sigma^2}} = e^{-\alpha \|x-z\|^2_2}$
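A minimal NumPy sketch of kernel least squares in the dual, using the Gaussian kernel above (the function names are my own, for illustration):

```python
import numpy as np

def gaussian_kernel(A, B, alpha=1.0):
    # K[i, j] = exp(-alpha * ||A_i - B_j||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-alpha * d2)

def kernel_ls_fit(X, Y, lam=1e-2, alpha=1.0):
    K = gaussian_kernel(X, X, alpha)
    # a* = (K + λI)^{-1} Y
    return np.linalg.solve(K + lam * np.eye(len(X)), Y)

def kernel_ls_predict(X_train, a, X_new, alpha=1.0):
    # ŷ(x) = Σ_i a_i K(x, x^i)
    return gaussian_kernel(X_new, X_train, alpha) @ a
```

Note that training only ever touches the $n \times n$ Gram matrix $K$, never the feature map $\phi$ itself.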
## mapping high-dimensional data

## minimising reconstruction error

- Given $X \in \mathbb{R}^{d \times n}$, find $A$ that minimises the reconstruction error:

$$
\min\limits_{A,B} \sum_{i} \| x^i - B A x^i \|_2^2
$$

> if $q=d$, then the error is zero.

Solution:

- $B = A^T$
- $\min\limits_{A} \sum_i \| x^i - A^T A x^i \|^2$ subject to $A A^T = I_{q \times q}$
- assuming the data is centered, i.e. $\frac{1}{n} \sum_{i} x^i = \begin{bmatrix} 0 & \cdots & 0 \end{bmatrix}^T$

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/principal-component-analysis#minimising-reconstruction-error)

## eigenvalue decomposition

$$
\begin{aligned} X^T X \mathcal{u} &= \lambda \mathcal{u} \\ X^T X &= U^T \Lambda U \\ \\ \\ \because \Lambda &= \text{diag}(\lambda_1, \lambda_2, \cdots, \lambda_d) \\ &= \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_d \end{bmatrix} \end{aligned}
$$

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/principal-component-analysis#eigenvalue-decomposition)

## pca

Idea: given input $x^1, \cdots, x^n \in \mathbb{R}^d$, $\mu = \frac{1}{n} \sum_{i} x^i$

Thus

$$
C = \sum (x^i - \mu)(x^i - \mu)^T
$$

Find the eigenvectors/values of $C$:

$$
C = U^T \Lambda U
$$

Optimal $A$ is:

$$
A = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_q^T \end{bmatrix}
$$

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/principal-component-analysis#pca)

## bayes rules and chain rules

Joint distribution: $P(X,Y)$

Conditional distribution of $X$ given $Y$: $P(X|Y) = \frac{P(X,Y)}{P(Y)}$

Bayes rule: $P(X|Y) = \frac{P(Y|X)P(X)}{P(Y)}$

Chain rule: $P(X_1, X_2, \ldots , X_k) = P(X_1)P(X_2|X_1)P(X_3|X_2,X_1)\ldots P(X_k|X_1,X_2,\ldots,X_{k-1})$

> [!note] i.i.d assumption
>
> assume an underlying distribution $D$; train and test sets are independent and identically distributed (i.i.d)

Example: flip a coin

Outcome $H=0$ or $T=1$ with $P(H) = p$ and $P(T) = 1-p$; that is, $x \in \{0,1\}$ is a Bernoulli random variable with $P(x=0)=\alpha$ and $P(x=1)=1-\alpha$.

The [maximum likelihood estimate](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/likelihood) would be

$$
\alpha^{\text{ML}} = \argmax_{\alpha} P(X | \alpha) = \argmin_{\alpha} - \sum_{i} \log (P(x^i | \alpha))
$$
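As a concrete instance: if the sample $X$ contains $n_0$ zeros out of $n$ flips, the negative log-likelihood and its stationary point give the familiar empirical frequency:

$$
\begin{aligned}
-\sum_{i} \log P(x^i \mid \alpha) &= -n_0 \log \alpha - (n - n_0) \log (1 - \alpha) \\
-\frac{n_0}{\alpha} + \frac{n - n_0}{1 - \alpha} = 0 &\implies \alpha^{\text{ML}} = \frac{n_0}{n}
\end{aligned}
$$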
## maximum a posteriori estimation

$$
\begin{aligned} \alpha^{\text{MAP}} &= \argmax_{\alpha} P(\alpha | X) \\ &= \argmax_{\alpha} \frac{P(X|\alpha)P(\alpha)}{P(X)} \\ &= \argmin_{\alpha}(-\log P(\alpha)) - \sum_{i=1}^{n} \log P(x^i | \alpha) \end{aligned}
$$

$$
\begin{aligned} \argmax_{W} P(X | \alpha) P (\alpha) &= \argmax_{W} [\log P(\alpha) + \sum_{i} \log P(x^i, y^i | W)] \\ &= \argmax_{W} [\ln \frac{1}{\beta} - \lambda {\parallel W \parallel}_{2}^{2} - \sum_{i} \frac{({x^i}^T W - y^i)^2}{\sigma^2}] \end{aligned}
$$

with a Gaussian-style prior (note the negative exponent, so heavier weights are less probable):

$$
P(W) = \frac{1}{\beta} e^{-\lambda \parallel W \parallel_{2}^{2}}
$$

> [!question] What if we have
>
> $$
> P(W) = \frac{1}{\beta} e^{-\frac{\lambda \parallel W \parallel_{2}^{2}}{r^2}}
> $$

$$
\argmax_{W} P(Z | \alpha) = \argmax_{W} \sum \log P(x^i, y^i | W)
$$

$$
P(y | x, W) = \frac{1}{\gamma} e^{-\frac{(x^T W-y)^2}{2 \sigma^2}}
$$

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/likelihood#maximum-a-posteriori-estimation)

## expected error minimisation

Squared loss: $l(\hat{y},y)=(y-\hat{y})^2$

The solution to $y^* = \argmin_{\hat{y}} E_{X,Y}(Y-\hat{y}(X))^2$ is $E[Y | X=x]$

Instead we have $Z = \{(x^i, y^i)\}^n_{i=1}$

### error decomposition

$$
\begin{aligned} &E_{x,y}(y-\hat{y_Z}(x))^2 \\ &= E_{xy}(y-y^{*}(x))^2 + E_x(y^{*}(x) - \hat{y_Z}(x))^2 \\ &= \text{noise} + \text{estimation error} \end{aligned}
$$

### bias-variance decompositions

For a linear estimator:

$$
\begin{aligned} E_Z&E_{x,y}(y-(\hat{y}_Z(x)\coloneqq W^T_Zx))^2 \\ =& E_{x,y}(y-y^{*}(x))^2 \quad \text{noise} \\ &+ E_x(y^{*}(x) - E_Z(\hat{y_Z}(x)))^2 \quad \text{bias} \\ &+ E_xE_Z(\hat{y_Z}(x) - E_Z(\hat{y_Z}(x)))^2 \quad \text{variance} \end{aligned}
$$

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/likelihood#expected-error-minimisation)

# nearest neighbour

See also: [slides 13](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture13.pdf), [slides 14](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture14.pdf), [slides 15](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture15.pdf)

$$
\hat{y}_W(x) = \text{sign}(W^T x) = 1_{W^T x \geq 0} \\ \\ \because \hat{W} = \argmin_{W} L_{Z}^{0-1} (\hat{y}_W)
$$

Think of continuous surrogate loss functions: margin loss, cross-entropy/negative log-likelihood, etc.
## linear programming

$$
\max_{W \in \mathbb{R}^d} \langle{u, w} \rangle = \sum_{i=1}^{d} u_i w_i \\ \\ \text{s.t } A w \ge v
$$

Given that the data is linearly separable

> $\exists W^{*} \mid \forall i \in [n], ({W^{*}}^T x^i)y^i > 0$

So

> $\exists W^{*}, \gamma > 0 \mid \forall i \in [n], ({W^{*}}^T x^i)y^i \ge \gamma$

So

> $\exists W^{*} \mid \forall i \in [n], ({W^{*}}^T x^i)y^i \ge 1$

## perceptron

Rosenblatt’s perceptron algorithm

```pseudo
\begin{algorithm}
\caption{Batch Perceptron}
\begin{algorithmic}
\REQUIRE Training set $(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)$
\STATE Initialize $\mathbf{w}^{(1)} = (0,\ldots,0)$
\FOR{$t = 1,2,\ldots$}
\IF{$(\exists \space i \text{ s.t. } y_i\langle\mathbf{w}^{(t)}, \mathbf{x}_i\rangle \leq 0)$}
\STATE $\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} + y_i\mathbf{x}_i$
\ELSE
\STATE \textbf{output} $\mathbf{w}^{(t)}$
\STATE \textbf{break}
\ENDIF
\ENDFOR
\end{algorithmic}
\end{algorithm}
```

### greedy update

$$
W_{\text{new}}^T x^i y^i = \langle W_{\text{old}}+ y^i x^i, x^i \rangle y^i
$$

## SVM

idea: maximize the margin — more robust to “perturbations”

The Euclidean distance between a point $x$ and the hyperplane parametrized by $W$ is:

$$
\frac{\mid W^T x + b \mid }{\|W\|_2}
$$

> Assuming $\| W \|_2=1$, the distance is $\mid W^T x + b \mid$

### maximum margin hyperplane

$W$ has $\gamma$ margin if

- $W^T x + b \ge \gamma \forall \text{ blue x}$
- $W^T x +b \le - \gamma \forall \text{ red x}$

Margin:

$$
Z = \{(x^{i}, y^{i})\}_{i=1}^{n}, y \in \{-1, 1\}, \|W\|_2 = 1
$$

```pseudo
\begin{algorithm}
\caption{Hard-SVM}
\begin{algorithmic}
\REQUIRE Training set $(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)$
\STATE \textbf{solve:} $(w_{0},b_{0}) = \argmin\limits_{(w,b)} \|w\|^2 \text{ s.t } \forall i, y_{i}(\langle{w,x_i} \rangle + b) \ge 1$
\STATE \textbf{output:} $\hat{w} = \frac{w_0}{\|w_0\|}, \hat{b} = \frac{b_0}{\|w_0\|}$
\end{algorithmic}
\end{algorithm}
```

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour)
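A tiny runnable NumPy version of the batch perceptron pseudocode above (the function name and the `max_iters` cap are my own additions):

```python
import numpy as np

def batch_perceptron(X, y, max_iters=1000):
    """X: (m, d) inputs, y: (m,) labels in {-1, +1}; assumes linearly separable data."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        margins = y * (X @ w)
        violated = np.where(margins <= 0)[0]
        if len(violated) == 0:
            return w            # every point has positive margin: done
        i = violated[0]         # pick any violated constraint
        w = w + y[i] * X[i]     # w^{(t+1)} = w^{(t)} + y_i x_i
    return w
```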
## linear algebra review.

Diagonal matrix: every entry except the diagonal is zero.

$$
A = \begin{bmatrix} a_{1} & 0 & \cdots & 0 \\ 0 & a_{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{n} \end{bmatrix}
$$

trace: the sum of the entries on the main diagonal: $\text{tr}(A) = \sum_{i=1}^{n} a_{ii}$

Properties of the transpose:

$$
\begin{aligned} (A^T)^T &= A \\ (A + B)^T &= A^T + B^T \\ (AB)^T &= B^T A^T \end{aligned}
$$

Properties of the inverse:

$$
\begin{aligned} (A^{-1})^{-1} &= A \\ (AB)^{-1} &= B^{-1} A^{-1} \\ (A^T)^{-1} &= (A^{-1})^T \end{aligned}
$$

> [!tip] Inverse of a matrix
>
> if the matrix $A^{-1}$ exists, then A is invertible (non-singular), and vice versa.

### quadratic form

> Given a square matrix $A \in \mathbb{R}^{n \times n}$, the quadratic form is defined as: $x^TAx \in \mathbb{R}$

$$
x^TAx = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j
$$

### norms

A function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is a norm if it satisfies the following properties:

- non-negativity: $\forall x \in \mathbb{R}^n, f(x) \geq 0$
- definiteness: $f(x) = 0 \iff x=0$
- homogeneity: $\forall x \in \mathbb{R}^n, t\in \mathbb{R}, f(tx) = \mid t\mid f(x)$
- triangle inequality: $\forall x, y \in \mathbb{R}^n, f(x+y) \leq f(x) + f(y)$

### symmetry

> A square matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $A = A^T$, written $A \in \mathbb{S}^n$.
>
> It is anti-symmetric if $A = -A^T$

Given any square matrix $A \in \mathbb{R}^{n \times n}$, the matrix $A + A^T$ is symmetric, and $A - A^T$ is anti-symmetric.

> $A = \frac{1}{2}(A+A^T) + \frac{1}{2}(A-A^T)$

> [!tip] positive definite
>
> $A$ is positive definite if $x^TAx > 0$ for all non-zero $x \in \mathbb{R}^n$.
>
> - It is denoted by $A \succ 0$.
> - The set of all positive definite matrices is denoted by $\mathbb{S}^n_{++}$

> [!tip] positive semi-definite
>
> $A$ is positive semi-definite if $x^TAx \geq 0 \ \forall x \in \mathbb{R}^n$.
>
> - It is denoted by $A \succeq 0$.
> - The set of all positive semi-definite matrices is denoted by $\mathbb{S}^n_{+}$

> [!tip] negative definite
>
> $A$ is negative definite if $x^TAx < 0$ for all non-zero $x \in \mathbb{R}^n$.
>
> - It is denoted by $A \prec 0$.
> - The set of all negative definite matrices is denoted by $\mathbb{S}^n_{--}$

> [!tip] negative semi-definite
>
> $A$ is negative semi-definite if $x^TAx \leq 0 \ \forall x \in \mathbb{R}^n$.
>
> - It is denoted by $A \preceq 0$.
> - The set of all negative semi-definite matrices is denoted by $\mathbb{S}^n_{-}$

A symmetric matrix $A \in \mathbb{S}^n$ is indefinite if it is neither positive semi-definite nor negative semi-definite:

$$
\exists x_1, x_2 \in \mathbb{R}^n \space \mid \space x_1^TAx_1 > 0 \space \text{and} \space x_2^TAx_2 < 0
$$

> Given **any** matrix $A \in \mathbb{R}^{m \times n}$, the matrix $G = A^TA$ is always positive semi-definite (known as the Gram matrix)
>
> Proof: $x^TGx = x^TA^TAx = (Ax)^T(Ax) = \|Ax\|_2^2 \geq 0$

### eigenvalues and eigenvectors

A non-zero vector $x \in \mathbb{C}^n$ is an eigenvector of A and $\lambda \in \mathbb{C}$ is the corresponding eigenvalue of A if:

$$
Ax = \lambda x
$$

> [!note] finding eigenvalues
>
> $$
> \begin{aligned} \exists \text{ non-zero eigenvector } x \in \mathbb{C}^n &\iff \text{null space of } (A - \lambda I) \text{ is non-trivial} \\ &\iff (A - \lambda I) \text{ is singular} \\ &\iff \det(A - \lambda I) = 0 \end{aligned}
> $$
>
> Solve for the eigenvectors via $(A-\lambda_{i}I)x_i=0$

## matrix representation of a system of linear equations

$$
\begin{aligned} x_1 + x_2 + x_3 &= 5 \\ x_1 - 2x_2 - 3x_3 &= -1 \\ 2x_1 + x_2 - x_3 &= 3 \end{aligned}
$$

Equivalent matrix representation $Ax = b$:

$$
\begin{aligned} A &= \begin{bmatrix} 1 & 1 & 1 \\ 1 & -2 & -3 \\ 2 & 1 & -1 \end{bmatrix} \\ x &= \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \\ b &= \begin{bmatrix} 5 \\ -1 \\ 3 \end{bmatrix} \end{aligned} \because A \in R^{m \times n}, x \in R^n, b \in R^m
$$

> [!tip] Transpose of a matrix
>
> $A \in R^{m \times n}$ and $A^T \in R^{n \times m}$
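A quick numerical check of the 3×3 system above (illustrative, using NumPy's standard solver):

```python
import numpy as np

A = np.array([[1, 1, 1],
              [1, -2, -3],
              [2, 1, -1]])
b = np.array([5, -1, 3])

x = np.linalg.solve(A, b)   # valid because A is square and invertible (det(A) = 5)
assert np.allclose(A @ x, b)
print(x)                    # -> [ 4. -2.  3.]
```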
## dot product.

$$
\langle x, y \rangle = x^T y = \sum_{i=1}^{n} x_i y_i
$$

## linear combination of columns

Let $A \in R^{m \times n}$ with columns $a_1, \ldots, a_n$, and $x \in R^n$.

Then $Ax = \sum_{i=1}^{n}{a_i x_i} \in R^m$, a linear combination of the columns of $A$.

## inverse of a matrix

The inverse of a square matrix $A \in R^{n \times n}$ is a **unique** matrix denoted by $A^{-1} \in \mathbb{R}^{n\times{n}}$

$$
A^{-1} A = I = A A^{-1}
$$

## euclidean norm

$L_{2}$ norm:

$$
\| x \|_{2} = \sqrt{\sum_{i=1}^{n}{x_i^2}} = \sqrt{x^T x}
$$

L1 norm: $\| x \|_{1} = \sum_{i=1}^{n}{|x_i|}$

$L_{\infty}$ norm: $\| x \|_{\infty} = \max_{i}{|x_i|}$

p-norm: $\| x \|_{p} = (\sum_{i=1}^{n}{|x_i|^p})^{1/p}$

> [!tip] Comparison
>
> $\|x\|_{\infty} \leq \|x\|_{2} \leq \|x\|_{1}$
>
> One can prove this with the Cauchy-Schwarz inequality

## linear dependence of vectors

A set $\{x_1, x_2, \ldots, x_n\} \subseteq \mathbb{R}^d$ is linearly independent if no vector can be written as a linear combination of the others:

$$
\forall i \in [ n ], \forall \{a_j\} \subseteq \mathbb{R} \space s.t. \space x_i \neq \sum_{j \neq i}{a_j x_j}
$$

## Span

> Given a set of vectors $\{x_1, x_2, \ldots, x_n\} \subseteq \mathbb{R}^d$, the span of the set is the set of all possible linear combinations of the vectors.
>
> $$
> \text{span}(\{x_1, x_2, \ldots, x_n\}) = \{ y: y = \sum_{i=1}^{n}{\alpha_i x_i} \mid \alpha_i \in \mathbb{R} \}
> $$

If $x_{1}, x_{2}, \ldots, x_{n}$ are linearly independent and $n = d$, then the span of the set is the entire space $\mathbb{R}^d$

## Rank

For a matrix $A \in \mathbb{R}^{m \times n}$:

- column rank: max number of linearly independent columns of $A$
- row rank: max number of linearly independent rows of $A$

If $\text{rank}(A) = m$, then the rows are linearly independent. If $\text{rank}(A) = n$, then the columns are linearly independent.
> The rank of a matrix $A$ is the number of linearly independent columns of $A$:
>
> - if $A$ is full rank, then $\text{rank}(A) = \min(m, n)$ (in general, $\text{rank}(A) \leq \min(m, n)$)
> - $\text{rank}(A) = \text{rank}(A^T)$

## solving linear system of equations

If $A \in \mathbb{R}^{n \times n}$ is invertible, there exists a unique solution:

$$
x = A^{-1}b
$$

## Range and Projection

Given a matrix $A \in \mathbb{R}^{m \times n}$, the range of $A$, denoted by $\mathcal{R}(A)$, is the span of the columns of $A$:

$$
\mathcal{R}(A) = \{ y \in \mathbb{R}^m \mid y = Ax, \ x \in \mathbb{R}^n \}
$$

The projection of a vector $y \in \mathbb{R}^m$ onto $\text{span}(\{x_1, \cdots, x_n\})$, $x_i \in \mathbb{R}^m$, is the vector in the span that is as close as possible to $y$ wrt the $l_2$ norm

$$
\text{Proj}(y; \{x_{1}, \cdots, x_n\}) = \argmin_{{v \in \text{span}(\{x_1, \cdots, x_n\})}} \| y - v \|_2
$$

## Null space

of $A$ is the set of all vectors that satisfies the following:

$$
\mathcal{N}(A) = \{ x \in \mathbb{R}^n \mid Ax = 0 \}
$$

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/tut/tut1)

## probability theory

With Bayes rule we have

$$
P(Y|X) = \frac{P(X|Y)P(Y)}{P(X)}
$$

The chain rule states, for events $A_1, \ldots A_n$:

$$
\begin{aligned} P(A_1 \cap A_2 \cap \ldots \cap A_n) &= P(A_n|A_{n-1} \cap \ldots \cap A_1)P(A_{n-1} \cap \ldots \cap A_1) \\ &= P(A_1) \prod_{i=2}^{n} P(A_i|\cap_{j=1}^{i-1} A_j) \end{aligned}
$$

> [!tip] Law of Total Probability
>
> If $B_{1}, \ldots , B_{n}$ form a finite partition of the sample space, i.e. $\forall i \neq j, B_i \cap B_j = \emptyset \land \cup_{i=1}^{n} B_i = \Omega$, then the law of total probability states that for an event A
>
> $$
> P(A) = \sum_{i=1}^{n} P(A|B_i)P(B_i)
> $$

### cumulative distribution function

For a random variable X, a CDF $F_X(x): \mathbb{R} \rightarrow [0,1]$ is defined as:

$$
F_X(x) \coloneqq P(X \leq x)
$$

- $0 \leq F_X(x) \leq 1$

### discrete random variables

Poisson distribution: $X \sim \text{Poisson}(\lambda), \lambda > 0$

$$
\begin{aligned} p_X(x) &= \frac{e^{-\lambda} \lambda^x}{x!} \\ \mathbb{E}[X] &= \lambda \\ \text{Var}(X) &= \lambda \end{aligned}
$$

### continuous random variables

Uniform distribution: $X \sim \text{Unif}(a,b), a \le b$

$$
\begin{aligned} f_X(x) &= \begin{cases} \frac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases} \\ \\ \mathbb{E}[X] &= \frac{a+b}{2} \\ \text{Var}(X) &= \frac{(b-a)^2}{12} \end{aligned}
$$

Exponential distribution: $X \sim \text{Exp}(\lambda), \lambda > 0$

$$
\begin{aligned} f_X(x) &= \lambda e^{-\lambda x} \\ \\ \mathbb{E}[X] &= \frac{1}{\lambda} \\ \text{Var}(X) &= \frac{1}{\lambda^2} \end{aligned}
$$

Gaussian distribution: $X \sim \mathcal{N}(\mu, \sigma^2), -\infty < \mu < \infty, \sigma^2 > 0$

$$
\begin{aligned} p_X(x) &= \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \\ \\ \mathbb{E}[X] &= \mu \\ \text{Var}(X) &= \sigma^2 \end{aligned}
$$

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour
tags:
  - sfwr4ml3
  - ml
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour"
title: nearest neighbour
date: 2024-10-28
---
See also: [slides 13](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture13.pdf), [slides 14](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture14.pdf), [slides 15](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture15.pdf)

$$
\hat{y}_W(x) = \text{sign}(W^T x) = 1_{W^T x \geq 0} \\ \\ \because \hat{W} = \argmin_{W} L_{Z}^{0-1} (\hat{y}_W)
$$

Think of continuous surrogate loss functions: margin loss, cross-entropy/negative log-likelihood, etc.

## linear programming

$$
\max_{W \in \mathbb{R}^d} \langle{u, w} \rangle = \sum_{i=1}^{d} u_i w_i \\ \\ \text{s.t } A w \ge v
$$

Given that the data is linearly separable

> $\exists W^{*} \mid \forall i \in [n], ({W^{*}}^T x^i)y^i > 0$

So

> $\exists W^{*}, \gamma > 0 \mid \forall i \in [n], ({W^{*}}^T x^i)y^i \ge \gamma$

So

> $\exists W^{*} \mid \forall i \in [n], ({W^{*}}^T x^i)y^i \ge 1$

## perceptron

Rosenblatt’s perceptron algorithm

```pseudo
\begin{algorithm}
\caption{Batch Perceptron}
\begin{algorithmic}
\REQUIRE Training set $(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)$
\STATE Initialize $\mathbf{w}^{(1)} = (0,\ldots,0)$
\FOR{$t = 1,2,\ldots$}
\IF{$(\exists \space i \text{ s.t. } y_i\langle\mathbf{w}^{(t)}, \mathbf{x}_i\rangle \leq 0)$}
\STATE $\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} + y_i\mathbf{x}_i$
\ELSE
\STATE \textbf{output} $\mathbf{w}^{(t)}$
\STATE \textbf{break}
\ENDIF
\ENDFOR
\end{algorithmic}
\end{algorithm}
```

### greedy update

$$
W_{\text{new}}^T x^i y^i = \langle W_{\text{old}}+ y^i x^i, x^i \rangle y^i
$$

## SVM

idea: maximize the margin — more robust to “perturbations”

The Euclidean distance between a point $x$ and the hyperplane parametrized by $W$ is:

$$
\frac{\mid W^T x + b \mid }{\|W\|_2}
$$

> Assuming $\| W \|_2=1$, the distance is $\mid W^T x + b \mid$

### maximum margin hyperplane

$W$ has $\gamma$ margin if

- $W^T x + b \ge \gamma \forall \text{ blue x}$
- $W^T x +b \le - \gamma \forall \text{ red x}$

Margin:

$$
Z = \{(x^{i}, y^{i})\}_{i=1}^{n}, y \in \{-1, 1\}, \|W\|_2 = 1
$$

```pseudo
\begin{algorithm}
\caption{Hard-SVM}
\begin{algorithmic}
\REQUIRE Training set $(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)$
\STATE \textbf{solve:} $(w_{0},b_{0}) = \argmin\limits_{(w,b)} \|w\|^2 \text{ s.t } \forall i, y_{i}(\langle{w,x_i} \rangle + b) \ge 1$
\STATE \textbf{output:} $\hat{w} = \frac{w_0}{\|w_0\|}, \hat{b} = \frac{b_0}{\|w_0\|}$
\end{algorithmic}
\end{algorithm}
```

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/principal-component-analysis
tags:
  - sfwr4ml3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/principal-component-analysis"
title: principal component analysis
date: 2024-10-07
---

## problem statement

- map $x \in R^d$ to $z \in \mathbb{R}^q$ with $q < d$
- A $q \times d$ matrix can represent a linear mapping:

$$
z = Ax
$$

- Assume that $A A^T = I$ (orthonormal rows)

## minimising reconstruction error

- Given $X \in \mathbb{R}^{d \times n}$, find $A$ that minimises the reconstruction error:

$$
\min\limits_{A,B} \sum_{i} \| x^i - B A x^i \|_2^2
$$

> if $q=d$, then the error is zero.
Solution:

- $B = A^T$
- $\min\limits_{A} \sum_i \| x^i - A^T A x^i \|^2$ subject to $A A^T = I_{q \times q}$
- assuming the data is centered, i.e. $\frac{1}{n} \sum_{i} x^i = \begin{bmatrix} 0 & \cdots & 0 \end{bmatrix}^T$

## eigenvalue decomposition

$$
\begin{aligned} X^T X \mathcal{u} &= \lambda \mathcal{u} \\ X^T X &= U^T \Lambda U \\ \\ \\ \because \Lambda &= \text{diag}(\lambda_1, \lambda_2, \cdots, \lambda_d) \\ &= \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_d \end{bmatrix} \end{aligned}
$$

## pca

Idea: given input $x^1, \cdots, x^n \in \mathbb{R}^d$, $\mu = \frac{1}{n} \sum_{i} x^i$

Thus

$$
C = \sum (x^i - \mu)(x^i - \mu)^T
$$

Find the eigenvectors/values of $C$:

$$
C = U^T \Lambda U
$$

Optimal $A$ is:

$$
A = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_q^T \end{bmatrix}
$$

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/tut/tut1
tags:
  - sfwr4ml3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/tut/tut1"
title: linalg review
date: 2024-09-11
---

## matrix representation of a system of linear equations

$$
\begin{aligned} x_1 + x_2 + x_3 &= 5 \\ x_1 - 2x_2 - 3x_3 &= -1 \\ 2x_1 + x_2 - x_3 &= 3 \end{aligned}
$$

Equivalent matrix representation $Ax = b$:

$$
\begin{aligned} A &= \begin{bmatrix} 1 & 1 & 1 \\ 1 & -2 & -3 \\ 2 & 1 & -1 \end{bmatrix} \\ x &= \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \\ b &= \begin{bmatrix} 5 \\ -1 \\ 3 \end{bmatrix} \end{aligned} \because A \in R^{m \times n}, x \in R^n, b \in R^m
$$

> [!tip] Transpose of a matrix
>
> $A \in R^{m \times n}$ and $A^T \in R^{n \times m}$

## dot product.

$$
\langle x, y \rangle = x^T y = \sum_{i=1}^{n} x_i y_i
$$

## linear combination of columns

Let $A \in R^{m \times n}$ with columns $a_1, \ldots, a_n$, and $x \in R^n$.

Then $Ax = \sum_{i=1}^{n}{a_i x_i} \in R^m$, a linear combination of the columns of $A$.

## inverse of a matrix

The inverse of a square matrix $A \in R^{n \times n}$ is a **unique** matrix denoted by $A^{-1} \in \mathbb{R}^{n\times{n}}$

$$
A^{-1} A = I = A A^{-1}
$$

## euclidean norm

$L_{2}$ norm:

$$
\| x \|_{2} = \sqrt{\sum_{i=1}^{n}{x_i^2}} = \sqrt{x^T x}
$$

L1 norm: $\| x \|_{1} = \sum_{i=1}^{n}{|x_i|}$

$L_{\infty}$ norm: $\| x \|_{\infty} = \max_{i}{|x_i|}$

p-norm: $\| x \|_{p} = (\sum_{i=1}^{n}{|x_i|^p})^{1/p}$

> [!tip] Comparison
>
> $\|x\|_{\infty} \leq \|x\|_{2} \leq \|x\|_{1}$
>
> One can prove this with the Cauchy-Schwarz inequality

## linear dependence of vectors

A set $\{x_1, x_2, \ldots, x_n\} \subseteq \mathbb{R}^d$ is linearly independent if no vector can be written as a linear combination of the others:

$$
\forall i \in [ n ], \forall \{a_j\} \subseteq \mathbb{R} \space s.t. \space x_i \neq \sum_{j \neq i}{a_j x_j}
$$

## Span

> Given a set of vectors $\{x_1, x_2, \ldots, x_n\} \subseteq \mathbb{R}^d$, the span of the set is the set of all possible linear combinations of the vectors.
>
> $$
> \text{span}(\{x_1, x_2, \ldots, x_n\}) = \{ y: y = \sum_{i=1}^{n}{\alpha_i x_i} \mid \alpha_i \in \mathbb{R} \}
> $$

If $x_{1}, x_{2}, \ldots, x_{n}$ are linearly independent and $n = d$, then the span of the set is the entire space $\mathbb{R}^d$

## Rank

For a matrix $A \in \mathbb{R}^{m \times n}$:

- column rank: max number of linearly independent columns of $A$
- row rank: max number of linearly independent rows of $A$

If $\text{rank}(A) = m$, then the rows are linearly independent. If $\text{rank}(A) = n$, then the columns are linearly independent.
> The rank of a matrix $A$ is the number of linearly independent columns of $A$:
>
> - if $A$ is full rank, then $\text{rank}(A) = \min(m, n)$ (in general, $\text{rank}(A) \leq \min(m, n)$)
> - $\text{rank}(A) = \text{rank}(A^T)$

## solving linear system of equations

If $A \in \mathbb{R}^{n \times n}$ is invertible, there exists a unique solution:

$$
x = A^{-1}b
$$

## Range and Projection

Given a matrix $A \in \mathbb{R}^{m \times n}$, the range of $A$, denoted by $\mathcal{R}(A)$, is the span of the columns of $A$:

$$
\mathcal{R}(A) = \{ y \in \mathbb{R}^m \mid y = Ax, \ x \in \mathbb{R}^n \}
$$

The projection of a vector $y \in \mathbb{R}^m$ onto $\text{span}(\{x_1, \cdots, x_n\})$, $x_i \in \mathbb{R}^m$, is the vector in the span that is as close as possible to $y$ wrt the $l_2$ norm

$$
\text{Proj}(y; \{x_{1}, \cdots, x_n\}) = \argmin_{{v \in \text{span}(\{x_1, \cdots, x_n\})}} \| y - v \|_2
$$

## Null space

of $A$ is the set of all vectors that satisfies the following:

$$
\mathcal{N}(A) = \{ x \in \mathbb{R}^n \mid Ax = 0 \}
$$

---
slug: thoughts/university/twenty-three-twenty-four/astron-2e03/Atmosphere
tags:
  - astron2e03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/Atmosphere"
title: Atmospheric properties for exoplanets
date: 2024-03-07
---

Ref: [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/Atmosphere/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/06_Atmospheres_2024.pdf)

### features.

Scale height, with $\mu$ the mean molecular weight:

$$
H = \frac{k_BT}{\mu m_H g}
$$

1. solid or dash solid q
2. larger mean molecular weights?

B: shallow features ⇒ higher mean molecular weights

mass-metallicity trend

> [!question] Question
>
> Can we detect clouds in exoplanets?

Clouds suppress atmospheric chemical signatures

> [!tip] Important
>
> This introduces a degeneracy between cloud-top pressure and mean molecular weight

### clouds/winds on giant planets.

Wind cells

Hadley cells

> [!note] Coriolis Effect
>
> Winds do not follow a straight trajectory

### Winds on tidally-locked exoplanets

---
slug: thoughts/university/twenty-three-twenty-four/astron-2e03/Blackbody-Radiation
tags:
  - astron2e03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/Blackbody-Radiation"
title: Blackbody Radiation
date: 2024-02-06
---

### Atmospheric escape

> _non-thermal escape_: A physical process that results in the full or partial loss of a planet’s atmosphere.

**large-scale magnetic fields**

- conductive material
- convective motion
- has kinetic energy

> Mars doesn’t have a convective interior, since its core has cooled off.
>
> radioactive decay within the core

### Stellar winds

continuous flow of _ionized particles_ emitted by the Sun and other stars.

#### Charge exchange

### Thermal escape

#### Jeans escape

Given the Maxwell-Boltzmann distribution, the probability of a particle having a certain velocity is given by:

$$
\left( \frac{dN}{dv} \right)_{m,T} = v^2 \left( \frac{m}{2 \pi k_B T} \right)^{\frac{3}{2}} \exp \left( -\frac{mv^2}{2k_BT} \right)
$$

---
slug: thoughts/university/twenty-three-twenty-four/astron-2e03/Exoplanets
tags:
  - astron2e03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/Exoplanets"
title: Exoplanets
date: 2024-02-02
---

### Q1)

a. _Would you see any of the solar system planets transit?_

For an inclination of $i = 45 \degree$, transits are mostly observed when the orbital plane is edge-on to the observer.
It is plausible that some larger planets orbiting close to the ecliptic plane would be seen to transit the Sun, given a direct line of sight.

b. _If you monitored the Sun with radial velocity (RV) measurements and your technology was precise enough that you could measure RV signals down to 1 m/s, show and discuss whether you’re able to detect Venus._

The semi-amplitude $K$ of the radial velocity curve is given by

$$
K = \frac{M_p \sin i}{(M_{*}+M_p)^{\frac{2}{3}}} \left( \frac{2 \pi G}{P} \right)^{\frac{1}{3}}
$$

We have

$$
\begin{align*}
G &= 6.674 \times 10^{-11} \ \text{m}^3\,\text{kg}^{-1}\,\text{s}^{-2} \\
M_p &= 4.87 \times 10^{24} \ \text{kg} \\
M_{*} &= 1.989 \times 10^{30} \ \text{kg} \\
P &= 224.7 \text{ days} \\
K & = \frac{4.87 \times 10^{24} \sin 45\degree}{(M_{*}+M_p)^{\frac{2}{3}}} \left( \frac{2 \pi G}{224.7 \times 24 \times 3600} \right)^{\frac{1}{3}} \approx 0.061 \ \text{m/s}
\end{align*}
$$

Given that the precision of the RV measurements is 1 m/s, we can conclude that Venus is not detectable with the current technology. Venus induces only a very small motion in the Sun through its gravitational pull, since RV is more sensitive to larger planets closer to their host stars.

c. _Using the same RV measurements, show and discuss whether you’re able to detect Jupiter_

For Jupiter, we have

$$
\begin{align*}
M_p &= 1.898 \times 10^{27} \ \text{kg} \\
M_{*} &= 1.989 \times 10^{30} \ \text{kg} \\
P &= 11.86 \ \text{yr} \approx 4331 \text{ days} \\
K &= \frac{1.898 \times 10^{27} \sin 45\degree}{(M_{*}+M_p)^{\frac{2}{3}}} \left( \frac{2 \pi G}{4331 \times 24 \times 3600} \right)^{\frac{1}{3}} \approx 8.81 \ \text{m/s}
\end{align*}
$$

We can conclude that Jupiter is detectable with the current technology. This is due to Jupiter’s significant mass and gravitational pull on the Sun, which induces a larger motion visible via the Doppler shifts.

d. _If you knew that the Sun’s mass is $1 M$ and you successfully detected Venus and/or Jupiter using these RV data, could you measure either planet’s absolute mass and why_

Detecting a planet using RV only allows us to measure the planet’s minimum mass, not its absolute mass. This has to do with the unknown inclination angle of its orbit ($\sin i$).

If the orbit is edge-on ($i = 90 \degree$), then RV gives the closest approximation to the planet’s absolute mass. However, in this case our $i = 45 \degree$, so we can only measure the minimum mass of the planet under the assumption of an edge-on orbit.

e. _If you also monitored the Sun with astrometric measurements and your technology was precise enough that you could measure signals down to 10 $\mu \text{as}$ (i.e. micro-arcseconds), show and discuss whether you’re able to detect Jupiter_

The amplitude of the astrometric signal $a$ is given by

$$
a = \frac{m_{p}}{m_{*}} \frac{a_{p}}{d}
$$

where $m_{p}$ is the mass of the planet, $m_{*}$ is the mass of the star, $a_{p}$ is the semi-major axis of the planet’s orbit, and $d$ is the distance to the star. With $a_p$ in au and $d$ in pc, the result comes out in arcseconds. For Jupiter, we have

$$
\begin{align*}
m_{p} &= 1.898 \times 10^{27} \ \text{kg} \\
m_{*} &= 1.989 \times 10^{30} \ \text{kg} \\
a_{p} &= 5.2 \ \text{au} \\
d &= 10 \ \text{pc} \\
a &= \frac{1.898 \times 10^{27}}{1.989 \times 10^{30}} \times \frac{5.2}{10} \approx 4.96 \times 10^{-4} \ \text{arcsec} \approx 496.2 \ \mu\text{as}
\end{align*}
$$

Therefore, Jupiter would be easily detectable. The large signal is the result of Jupiter’s substantial mass and larger distance from the Sun.
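A quick numerical check of these astrometric amplitudes (an illustrative sketch; the helper name is my own):

```python
def astrometric_amplitude_muas(m_p_kg, m_star_kg, a_p_au, d_pc):
    # with a_p in au and d in pc, (m_p / m_*) * a_p / d comes out in arcseconds
    return (m_p_kg / m_star_kg) * (a_p_au / d_pc) * 1e6  # convert to micro-arcseconds

M_SUN = 1.989e30
print(astrometric_amplitude_muas(1.898e27, M_SUN, 5.2, 10))   # Jupiter: ~496 µas
print(astrometric_amplitude_muas(4.87e24, M_SUN, 0.72, 10))   # Venus (part f below): ~0.18 µas
```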
f. _Using the same astrometric measurements, show and discuss whether you’re able to detect Venus_

For Venus, we have

$$
\begin{align*}
m_{p} &= 4.87 \times 10^{24} \ \text{kg} \\
m_{*} &= 1.989 \times 10^{30} \ \text{kg} \\
a_{p} &= 0.72 \ \text{au} \\
d &= 10 \ \text{pc} \\
a &= \frac{4.87 \times 10^{24}}{1.989 \times 10^{30}} \times \frac{0.72}{10} \approx 1.77 \times 10^{-7} \ \text{arcsec} \approx 0.177 \ \mu\text{as}
\end{align*}
$$

Therefore, Venus would not be detectable. The small signal is the result of Venus’s smaller mass and closer proximity to the Sun, so it exerts a smaller gravitational effect on the Sun’s position.

g. _If you knew that the Sun’s mass is 1 M and you successfully detected Venus and/or Jupiter using these astrometric data, could you measure either planet’s absolute mass and why?_

Yes, since astrometry measures the displacement of the star’s position relative to distant background stars as it orbits the common centre of mass. The amplitude of the astrometric signal is directly proportional to the mass of the planet and inversely proportional to the mass of the star, therefore we can calculate the absolute mass of the planet given the semi-major axis of its orbit and the mass of the star (which is 1 M here).

### Q2)

$$
\begin{align*}
L_{\text{orb}} &= \frac{2 \pi a^2 \sqrt{1-e^2}}{P} M \\
L_{\text{rot}} &= I \omega \\
I &= \frac{2}{5} M R^2 \\
\omega &= \frac{2 \pi}{P_{\text{rot}}}
\end{align*}
$$

a. _Derive the expression for the ratio of orbital to rotational angular momenta. For this exercise, assume a circular orbit_

For the ratio $\frac{L_{\text{orb}}}{L_{\text{rot}}}$ we have

$$
\begin{align*}
L_{\text{orb}} &= \frac{2 \pi a^2}{P} M \\
L_{\text{rot}} & = I \omega = \frac{2}{5} M R^2 \frac{2 \pi}{P_{\text{rot}}} = \frac{4 \pi M R^2}{5 P_{\text{rot}}}
\end{align*}
$$

Therefore $\frac{L_{\text{orb}}}{L_{\text{rot}}} = \frac{5 a^2 P_{\text{rot}}}{2 R^2 P}$

b. _It is a common misconception that the planets in our solar system orbit the Sun. In reality, the planets and the Sun all orbit their common center of mass. As such, the Sun has a non-zero semimajor axis $a_{\odot}$. Let us approximate the solar system as a 1-planet system that contains the Sun and Jupiter. In this scenario, what is the expression for $a_{\odot}$ in terms of Jupiter’s semimajor axis $a_J$ and both objects’ masses?_

In a two-body system, the distance of the Sun from the barycenter is, to first approximation,

$$
a_{\odot} = \frac{a_J M_J}{M_{\odot}}
$$

where $a_J$ is the semimajor axis of Jupiter, $M_J$ is the mass of Jupiter, and $M_{\odot}$ is the mass of the Sun.

The total distance $D$ between the Sun and Jupiter is the sum of their distances to the center of mass: $D = a_{\odot} + a_J$

Thus, accounting for this, the distance of the Sun from the barycenter is given by:

$$
a_{\odot} = \frac{a_J M_J}{M_J + M_{\odot}}
$$

c. _Using this expression, calculate the value of $a_{\odot}$ in au_

Given that $a_J = 5.2 \ \text{au}$, $M_J = 1.898 \times 10^{27} \ \text{kg}$, and $M_{\odot} = 1.989 \times 10^{30} \ \text{kg}$, we have

$$
a_{\odot} = \frac{5.2 \times 1.898 \times 10^{27}}{1.898 \times 10^{27} + 1.989 \times 10^{30}} \approx 0.00496 \ \text{au}
$$

d. _Given your value of $a_\odot$, calculate the ratio of the Sun’s orbital angular momentum to its rotational angular momentum.
Is most of the Sun’s angular momentum manifested as orbital or rotational?_

Using the formula derived in part a, with $a_{\odot} = 0.00496 \ \text{au} = 7.42 \times 10^{8} \ \text{m}$:

$$
\frac{L_{\text{orb}}}{L_{\text{rot}}} = \frac{5 a_{\odot}^2 P_{\text{rot}}}{2 R^2 P} = \frac{5 \times (7.42 \times 10^{8} \ \text{m})^2 \times (25 \times 86400 \ \text{s})}{2 \times (6.96 \times 10^8 \ \text{m})^2 \times (11.86 \times 3.153 \times 10^7 \ \text{s})} \approx 0.0164
$$

This indicates that most of the Sun’s angular momentum is manifested as rotational.

e. _Now calculate the ratio of Jupiter’s orbital angular momentum to its rotational angular momentum. Is most of Jupiter’s angular momentum manifested as orbital or rotational?_

Using the formula derived in part a, with $a_J = 5.2 \ \text{au} = 7.78 \times 10^{11} \ \text{m}$:

$$
\frac{L_{\text{orb}}}{L_{\text{rot}}} = \frac{5 a_J^2 P_{\text{rot}}}{2 R^2 P} = \frac{5 \times (7.78 \times 10^{11} \ \text{m})^2 \times (9.93 \times 3600 \ \text{s})}{2 \times (7.149 \times 10^7 \ \text{m})^2 \times (11.86 \times 3.153 \times 10^7 \ \text{s})} \approx 28287.8
$$

This indicates that most of Jupiter’s angular momentum is manifested as orbital.

f. _In parts d) and e) above, you should have found that the total angular momenta of both the Sun and Jupiter are heavily dominated by either their own $L_{\text{orb}}$ or $L_{\text{rot}}$. Using the dominant forms of angular momenta for each body, calculate the ratio $\frac{L_J}{L_\odot}$_

For Jupiter’s orbital angular momentum $L_{\text{orb}, J}$, we have $L_{\text{orb}, J} = M_J \sqrt{G M_{\odot} a_J}$, and for the Sun’s rotational angular momentum $L_{\text{rot}, \odot} = I_{\odot} \omega_{\odot}$, we have $L_{\text{rot}, \odot} = \frac{2}{5} M_{\odot} R_{\odot}^2 \omega_{\odot} = \frac{2}{5} M_{\odot} R_{\odot}^2 \frac{2 \pi}{P_{\text{rot,} \odot}}$

Thus the ratio $\frac{L_J}{L_\odot}$ is given by

$$
\frac{L_J}{L_\odot} = \frac{L_{\text{orb}, J}}{L_{\text{rot}, \odot}} = \frac{M_J \sqrt{G M_{\odot} a_J}}{\frac{2}{5} M_{\odot} R_{\odot}^2 \frac{2 \pi}{P_{\text{rot,} \odot}}}
$$

Given that $a_J = 5.2 \ \text{au}$, $M_J = 1.898 \times 10^{27} \ \text{kg}$, $M_{\odot} = 1.989 \times 10^{30} \ \text{kg}$, $R_{\odot} = 6.96 \times 10^8 \ \text{m}$, and $P_{\text{rot,} \odot} = 25 \times 86400 \ \text{s}$, we have

$$
\frac{L_J}{L_\odot} \approx 17.20
$$

g. _Comment on where most of the angular momentum in the solar system is located._

Most of the angular momentum in the solar system resides in the orbital motion of the planets, with Jupiter making the most significant contribution to the total. This is because the orbital angular momentum of a body is proportional to its mass and its distance from the center of mass, and inversely proportional to its orbital period.
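A quick numeric check of the two ratios from parts d) and e) (an illustrative sketch; the helper name is my own):

```python
AU = 1.496e11  # metres per au

def orb_to_rot_ratio(a_m, P_orb_s, R_m, P_rot_s):
    # L_orb / L_rot = 5 a^2 P_rot / (2 R^2 P_orb), from Q2 part a
    return 5 * a_m**2 * P_rot_s / (2 * R_m**2 * P_orb_s)

P_jup = 11.86 * 3.153e7  # Jupiter's orbital period in seconds
print(orb_to_rot_ratio(0.00496 * AU, P_jup, 6.96e8, 25 * 86400))  # Sun: ~0.016
print(orb_to_rot_ratio(5.2 * AU, P_jup, 7.149e7, 9.93 * 3600))    # Jupiter: ~2.8e4
```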
### Q3)

$$
\begin{align}
v(\theta) &= \sqrt{GM \left( \frac{2}{r(\theta)} - \frac{1}{a} \right)} \\
E = K + U &= -\frac{GMm}{2a}
\end{align}
$$

a. _Use the conservation of angular momentum L and mechanical energy E to derive Eq. 4_

The angular momentum $L$ of a planet in orbit around a larger mass is given by

$$
L = mrv_{\perp}
$$

where:

- $m$ is the mass of the planet
- $v_{\perp}$ is the component of the planet’s velocity perpendicular to the vector pointing from the Sun
- $r$ is the distance from the planet to the larger mass.

In an elliptical orbit, the direction of the velocity changes, but the magnitude of the angular momentum is conserved since there are no external torques. Therefore

$$
L = mr(\theta)v(\theta)\sin \phi = \text{constant}
$$

The kinetic energy $K$ and the potential energy $U$ of a planet in orbit around a larger mass are given by

$$
\begin{align}
K &= \frac{1}{2}mv(\theta)^2 \\
U &= -\frac{GMm}{r(\theta)}
\end{align}
$$

so the total mechanical energy $E$ is

$$
E = K + U = \frac{1}{2}mv(\theta)^2 - \frac{GMm}{r(\theta)}
$$

Given that the orbital velocity $v(\theta)$ is

$$
v(\theta) = \sqrt{GM \left( \frac{2}{r(\theta)} - \frac{1}{a} \right)}
$$

we can substitute $v(\theta)$ into the equation for $K$ to get

$$
K = GMm \left( \frac{1}{r(\theta)} - \frac{1}{2a} \right)
$$

Thus the total mechanical energy $E$ of a planet in orbit around a larger mass is

$$
\begin{align}
E = K + U &= GMm \left( \frac{1}{r(\theta)} - \frac{1}{2a} \right) - \frac{GMm}{r(\theta)} \\
&= GMm \left( \frac{1}{r(\theta)} - \frac{1}{2a} - \frac{1}{r(\theta)} \right) \\
&= -\frac{GMm}{2a}
\end{align}
$$

b. _Use Eq. 4 to derive Eq. 3_

$$
E = K + U = \frac{1}{2}mv(\theta)^2 - \frac{GMm}{r(\theta)}
$$

Since $E$ remains constant, and the total energy of a bound orbit is negative, we have

$$
E = -\frac{GMm}{2a}
$$

where $a$ is the semi-major axis of the orbit. We equate the two expressions and solve for $v(\theta)$ to get

$$
\begin{align}
-\frac{GMm}{2a} &= \frac{1}{2}mv(\theta)^2 - \frac{GMm}{r(\theta)} \\
v(\theta)^2 &= GM \left( \frac{2}{r(\theta)} - \frac{1}{a} \right) \\
v(\theta) &= \sqrt{GM \left( \frac{2}{r(\theta)} - \frac{1}{a} \right)}
\end{align}
$$

---
slug: thoughts/university/twenty-three-twenty-four/astron-2e03/Heating-Cooling-GH-effect
tags:
  - astron2e03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/Heating-Cooling-GH-effect"
title: Heating, Cooling, and the Greenhouse Effect
date: 2024-02-26
---

Ref: [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/Heating-Cooling-GH-effect/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/06_HeatingCooling_GHeffect_2024.pdf)

### toy model.

eq: $T_{\text{surf}} \approx 1.32 \times T_{\text{atm, 2}}$

> [!note] General form
>
> $$
> T_{\text{surf}} = {\lbrack \frac{(n+1)S}{\omega} \rbrack}^{\frac{1}{4}} = (n+1)^{\frac{1}{4}} \times T_{\text{atm, n}}
> $$

---
slug: thoughts/university/twenty-three-twenty-four/astron-2e03/W1
tags:
  - astron2e03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1"
title: Solar systems in the context of exoplanets
date: 2024-01-08
---

Ref: [Solar System Exoplanets 2024](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/Solar-System-Exoplanets-2024.pdf)

## Obj.

- contents of the solar system and orbital properties
- compare properties of the Solar System to known exoplanetary systems
- _six_ techniques for exoplanet detection & their limitations.

---

## How people learn?

> Students enter the classroom with preconceptions about how the world works. If their _initial understanding is not fully engaged, they may fail to grasp new concepts_

_develop competence_

1. foundation knowledge
2. interrelationships among facts and concepts
3. retrieval and application.
--- slug: thoughts/university/twenty-three-twenty-four/astron-2e03/W1 tags: - astron2e03 description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1" title: Solar systems in the context of exoplanets date: 2024-01-08 ---

Ref: [Solar System Exoplanets 2024](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/Solar-System-Exoplanets-2024.pdf)

## Obj.

- content of the solar system and orbital properties
- compare properties of the Solar System to known exoplanetary systems
- _six_ techniques for exoplanet detection & their limitations.

---

## How do people learn?

> Students enter the classroom with preconceptions about how the world works. If their _initial understanding is not fully engaged, they may fail to grasp new concepts_

To _develop competence_:

1. foundation knowledge
2. interrelationships among facts and concepts
3. retrieval and application.

## Solar system

Sun → terrestrial planets → asteroid belt → Jovian (gas giants) ~ Ice giant planets → Trans-Neptunian objects (TNOs) (Dwarf planets → Kuiper belt → Oort cloud)

> `1 au` (astronomical unit): average distance between Earth and Sun

> Planetary orbits are (nearly) _co-planar_

- Dispersion in mutual inclinations: $\Delta{i} \approx 2\text{ deg}$
- Pluto and many other TNOs are _more highly inclined_

## Consequence of **Protoplanetary disks**

_from the ALMA telescope_

- radio images of _warm dust continuum_ ($\leq 10^6\text{ yr}$)
- Disk sizes $\approx 100\text{ au}$
- Variety of morphologies

> [!question] Question
>
> Concentric gaps opened by _protoplanets_?

- Due to active construction of _two protoplanets?_

> [!question] Question
>
> What other _dynamical properties_ do you expect for planets formed from a disk?

- **Keplerian Motion**: planets formed from a disk are expected to exhibit Keplerian motion → a direct consequence of forming in a disk, rather than an independent property

## Regular vs. Irregular Satellites (aka, moons)

| Regular Satellites | Irregular |
| ---------------------------------------------------------- | ----------------------------- |
| Resemble mini planetary systems | Irregular orbits |
| prograde | prograde or retrograde orbits |
| low mutual inclinations, e.g., the 4 Galilean moons of Jupiter | highly elliptical |
| nearly circular orbits | highly inclined |

![Exoplanets discovery technique](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/exoplanets-discovery-technique.webp)

> Most exoplanetary systems are compact

Kepler-11 System

## Transit

![Transit](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/transit.webp)

- Time-resolved photometry (i.e., stellar brightness) = “light curve”

Can measure:

- Orbital period
- Orbital inclination
  - Has to be edge-on
  - **relative to the telescope**, not to the _star_
  - Reference is the line-of-sight to the exoplanetary system.
- Planet radius

### transit depth.
$$
\begin{aligned}
Z &= \frac{\text{Area}_{pl}}{\text{Area}_{*}} = \left(\frac{R_{pl}}{R_*}\right)^2 \\\
Z&: \text{transit depth} \\\
R_{pl}&: \text{planet radius} \\\
R_{*}&: \text{stellar radius}
\end{aligned}
$$

### limb-darkening

- stars appear fainter at their edges compared to their centres
- depends on the **star’s temperature structure** and the **wavelength of the observations**

![Example transit graph](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/transit-graph.webp)

> The greater the depth, the larger the planet
> Limb-darkening depends only on the star and the wavelength of observation
> Depth **doesn’t depend** on how far the planet is from the star (distance affects the _duration_: more distant planets orbit more slowly)
> Duration is impacted by _period_ and _inclination_

### known transiting exoplanets

![Radius Period diagram](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/radius-period-diagram.webp)

Geometric transit probability:

$$
\begin{align*}
P_{tr} &\approx \frac{R_{*}}{a} \\
&= 0.5\% \left( \frac{R_{*}}{R_{\odot}} \right) \left( \frac{a}{a_{\oplus}} \right)^{-1}
\end{align*}
$$

where $\odot$ and $\oplus$ denote the _Sun_ and _Earth_ respectively

## Transit Timing Variations

_oscillating orbits_

![Transit timing variation example](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/transit-timing-variation.webp)

> B exhibits the larger TTV
> A is more massive, since B is influenced by A (pulled by its gravitational effect)

## Radial velocity

Only the star is observed: chemical abundances in the stellar atmosphere show up as absorption features in the spectrum (the dotted vertical lines in the graphs).

Time-resolved spectroscopy to measure _Doppler-shifted spectral features_

> A radial velocity shift translates into a wavelength shift

$$
\frac{\lambda_{obs}}{\lambda_{ref}} = \sqrt{\frac{1+v_{rad}/c}{1-v_{rad}/c}}
$$

Can measure

- Orbital period
- Orbital eccentricity
- Planet’s minimum mass

semi-amplitude of the RV signal _K_

> K depends on the orbital inclination _i_, so the RV method is _sensitive only to a lower limit on the planetary mass_ (the minimum mass $M_p \sin i$)

$$
\begin{align}
K &= M_p(\frac{2\pi{G}}{PM_{*}^{2}})^{1/3} \\\
K &= M_p \sin{i} (\frac{2\pi{G}}{PM_{*}^{2}})^{1/3}
\end{align}
$$

_Derivation_

$$
\begin{align}
a_{*}M_{*} &= a_pM_p \\\
P^2 &= \frac{4\pi^2}{GM_{*}}a_p^3
\end{align}
$$

$M_p$: planet mass, $i$: orbital inclination, $P$: orbital period, $M_{*}$: stellar mass

> - Insensitive to face-on orbits, maximally sensitive to edge-on
> - Easier to detect big planets

Transits + Radial Velocity (Radius + mass) → planet bulk density

## Astrometry

> proper motions

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/astrometry.webp)

### Aside - Parallax

`1 pc = 1 AU / 1"`

1” arcsec = 1/60 arcminute = (1/60)/60 degrees

a parsec is the distance at which 1 AU subtends an angle of 1 arcsecond

Consider a star-planet system located at distance _d_ from us: $x=d\theta = 1{AU}(\frac{d}{1pc})(\frac{\theta}{1"})$

$$
\triangle{\theta} = \frac{M_p}{d}(\frac{GP^2}{4\pi^2M^2_{*}})^{1/3}
$$

biased toward long periods

### Gravitational Microlensing

> Mass bends spacetime → light rays are bent by the curved spacetime → a massive object acts as a _gravitational lens_

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/gravitational-microlensing.webp)
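To close out the transit formulas above, a small worked example in MATLAB (the numbers are assumed round values for a Jupiter-size planet, $R_{pl} \approx 0.1 R_{*}$, orbiting a Sun-like star at 5.2 au):

```matlab
AU = 1.496e11; Rsun = 6.96e8;
Rp_over_Rs = 0.10;               % Jupiter/Sun radius ratio, roughly
a_over_Rs  = 5.2 * AU / Rsun;    % orbital distance in stellar radii
Z   = Rp_over_Rs^2               % transit depth ~ 1e-2, i.e., a 1% dip
Ptr = 1 / a_over_Rs              % geometric transit probability ~ 9e-4 (~0.1%)
```

The probability agrees with the $0.5\%\,(R_{*}/R_{\odot})(a/a_{\oplus})^{-1}$ scaling: $0.5\%/5.2 \approx 0.1\%$.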
--- slug: thoughts/university/twenty-three-twenty-four/astron-2e03/index tags: - university - astron2e03 description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/index" title: Planetary Astronomy date: 2024-01-08 ---

Dr. [Ryan Cloutier](mailto:ryan.cloutier@mcmaster.ca) or [link](https://avenue.cllmcmaster.ca/d2l/home/598689)

Book (optional): [_Fundamental Planetary Science: Physics, Chemistry, and Habitability by Lissauer, J.J. & de Pater, I. (ISBN 9781108411981)_](https://www.cambridge.org/highereducation/books/fundamental-planetary-science/8FD11659BE64C35A172DF0432D7FCFA4#overview)

- warm-up quizzes before classes, post-lecture quizzes
  - Participation marks for pre and post will be graded
- 3 in-class tests
  - Jan 25th
  - Feb 15th
  - March 21st
- 3 take-home assignments, due as hard copies at the beginning of class:
  - Feb 1st
  - March 7th
  - April 4th
- 1 final exam

## Overview

1. Our solar system in the context of exoplanetary systems
2. Exoplanet discovery techniques
   - Transits + Transit Timing Variations
   - Radial velocity
   - Astrometry
   - Direct Imaging
   - Gravitational microlensing
3. Orbital mechanics
   - Kepler’s laws
   - Gravity, angular momentum, and energy
   - Orbital resonances
   - 3-body dynamics
   - Tides
4. Heating & Cooling
   - Blackbody radiation
   - Star-planet interactions
   - Greenhouse effect
5. Planetary Atmospheres
   - Thermal structure
   - Energy transport
   - Cloud formation
   - Composition
   - Exoplanet transmission spectra
6. Planetary Interiors
   - Bulk density and composition
   - Mass-radius relation
7. Exoplanet formation/demographics
   - Core accretion
   - Measuring occurrence rates
   - Observed occurrence rates and formation inferences

--- slug: thoughts/university/twenty-three-twenty-four/commerce-4be3/Defining-Internal-Alignment-and-Job-Analysis tags: - commerce4be3 description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Defining-Internal-Alignment-and-Job-Analysis" title: Defining Internal Alignment & Job Analysis date: 2024-01-24 ---

> internal alignment: the relationship among different jobs/skills/competencies within a single organisation; the resulting job structure is also known as internal equity

Structure needs to

- support organisation strategy
- support workflow (the process by which goods/services are delivered to the customer)
- motivate behaviour (line-of-sight)

> **career laddering/progression**

## Internal Pay Structure

> refers to the array of pay rates for different work or skills within a single organisation

- number of levels
- pay differentials between levels
- criteria or bases used to determine those levels and differentials.

## differentials

> pay differences among levels

- requiring more skill/experience
- performed in unpleasant work conditions
- adds more value to the company
- motivations

## criteria

- content: work performed in a job
- value: worth of the work

## structure.

Job-based structure: work content - tasks, behaviours, responsibilities (engineering teams)

Person-based structure: skills, knowledge, competencies of employees (lawyers, clientele)

## impact.

### external factors

- economic pressures: inflation, COL
- Government policies, laws and regulations: Pay-Equity Act
- Stakeholders: board, employees
- Cultures and customs: high-performance culture and a focus on internal equity.
### organisation factors

- strategy
- technology
- human capital
- HR policy
- Employee acceptance
- Cost implications

## internal labour markets

> rules and procedures that determine pay for different jobs within a single organisation and allocate employees among those different jobs.

## Strategy for designing internal structures.

| Tailored | Loosely Coupled | Egalitarian | Hierarchical |
| -------------------------------------------- | ----------------------------------------------------- | ------------------------------------------------- | -------------------------------------------------------------- |
| Adopted by organisations with low costs | Adopted by organisations requiring constant innovation | Few levels | multiple levels |
| well-defined jobs with a detailed pay structure | Pay structures are more loosely linked to the organization | smaller differentials | detailed job descriptions |
| McDonald’s | Jobs are flexible, adaptable and changing | Equal treatment means knowledgeable employees may feel underpaid | |
| | | higher performance when collaboration is required | higher performance when workflow depends on individual effort. |

## Equity theory: fairness

- employees compare the ratio of their own outcomes to inputs against that of others

## Tournament theory

- pay differentials between levels drive the relationship between motivation and performance

## Institutional theory

- Copy others and conform
- use “best practices”
- what aligns for one organisation might not align with another

## Consequences.

_for an internally-aligned Pay Structure_

- efficiency
- fairness
- compliance

--- slug: thoughts/university/twenty-three-twenty-four/commerce-4be3/Designing-Pay-Levels--and-Employee-Benefits tags: - commerce4be3 description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Designing-Pay-Levels--and-Employee-Benefits" title: Designing Pay Levels and Employee Benefits date: 2024-02-28 ---

See also [Designing Pay Levels, Pay Mix and Pay Structure](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Designing-Pay-Levels--and-Employee-Benefits/../../../../../../../../thoughts/university/twenty-three-twenty-four/commerce-4be3/Designing-Pay-Levels,-Pay-Mix-and-Pay-Structure.pdf) and [Pay Employment Benefits](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Designing-Pay-Levels--and-Employee-Benefits/../../../../../../../../thoughts/university/twenty-three-twenty-four/commerce-4be3/Pay-Employment-Benefits.pdf)

## decisions for externally competitive pay levels and structure.

- employer’s competitive pay policy
- purpose of survey
- construct market line.
- balance competitiveness with internal alignment through _pay ranges, flat rates, bands_

## survey.

- adjust pay level
- pay mix: stock, benefits
- pay structure: job evaluation results.
- estimate competitors’ labour costs (competitive intelligence)

### design

Which jobs to include?

- benchmark job approach, low-high approach, conversion/survey level

What information to collect?

- organisation data, total compensation data, information about incumbents

### interpretation

Verify anomalies, accuracy of match, validation against other trends.

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Designing-Pay-Levels--and-Employee-Benefits/../../../../../../../../thoughts/university/twenty-three-twenty-four/commerce-4be3/survey-data-elements-for-inclusion.webp)

## select relevant market competitors

1. Relevant labor markets
2. Fuzzy markets: new organisations, or organisations with unique jobs, must fuse diverse factors to define their relevant markets, which makes those markets fuzzy

> [!question] Question
>
> What factors determine the relevant market for pay surveys? Why is the definition of the relevant market important?

- **Industry and Job Function**: depending on the job sector and industry size.
- **Geographic Location**: location-based pay
- **Experience and Education Level**: pay for experience and education
- **Market trends**: market trends and changes

its importance stems from:

- **Competitiveness**: to attract and retain employees.
- **Fairness and Equity**: enhance satisfaction and reduce turnover.
- **Legal compliance**: to avoid discrimination.

### organization

| Basic Elements | Examples | Rationale |
| --------------------- | ---------------------------------------------- | ---------------------------------------------------------------------------------------- |
| Identification | Company name, address, contact person | Further contacts |
| Financial performance | Assets, sales, profits (after taxes), cashflow | Indicates the nature of the product/service markets, the ability to pay, size and financials |
| Size | Profit centres, product lines | Importance of specific job groups to business success |
| | Total number of employees | Impact on labour market |
| Structure | Organizational charts | Indicates how the business is organized and how important managerial jobs are. |

### Total compensation

- cash forms used
- non-cash forms used

| | Advantages | Disadvantages |
| ---------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| **Base pay** | Tells how competitors are valuing the work in similar jobs. | Fails to include performance incentives and other forms, so will not give a true picture if competitors offer low base but high incentives. |
| **Total cash** | Tells how competitors are valuing work; also tells the cash pay-for-performance opportunity in the job. | Not all employees may receive incentives, so it may overstate the competitors’ pay; plus, it does not include long-term incentives. |
| **Total compensation (base + bonus + stock options + benefits)** | Tells the total value competitors place on this work. | All employees may not receive all the forms. Don’t set base pay equal to competitors’ total compensation. |

### incumbent & jobs

| Basic Elements | Examples | Rationale |
| -------------- | ---------------------------------------------------------------------------------------- | ------------------------------------------------------ |
| Date | Date survey data in effect | Need to update rates to current date |
| Job | Match generic job description | Indicates degree of similarity with survey’s key jobs |
| Individual | Number of employees supervised and reporting levels | Describes scope of responsibilities |
| | Years since degree, education, date of hire | Indicates training and tenure of incumbents |
| Pay | Actual rates paid to each individual, total earnings, last increase, bonuses, incentives | |

### hr outcomes.
| Basic Elements | Examples | Rationale |
| ------------------ | -------------------------------------------------------------------------------------- | ----------------------------------------------- |
| Productivity | Revenues to employee ratio, revenues to labour costs ratio | Reflect organization performance and efficiency |
| Total labour costs | Number of employees x (average wages and benefits) | Major expense |
| Attraction | Yield ratio, number accepting offer to number of job offers ratio | Reveals recruiting success |
| Retention | Turnover rate; number of high or low performers who leave to number of employees ratio | Reveals outflow of people |
| Employee views | Total pay satisfaction | Reveals what employees think about their pay |

## market pay line

> links a company’s benchmark jobs on the horizontal axis with the market rates paid by competitors on the vertical axis.

### Internal structure and external market rates

- pay-policy line
- pay ranges

#### pay-policy line

> percent above or below the market line, indicating an intent to “lead”, “lag”, or “match” market rates.

> [!tip] Develop grades
>
> a single grade will have the same pay range

#### pay range

- midpoints where the pay-policy line crosses the centre of each grade, plus a minimum and maximum
- larger ranges in managerial jobs reflect the greater opportunity for performance variation in the work
- some firms use percentiles as maximums and minimums, while others establish them separately.

> pay disparity among candidates.

1. Internal pressures
   - recognizing performance differences with pay
   - expectations of pay growth over time
2. External pressures
   - differences in quality among individuals
   - differences in productivity or value variations
   - mix of pay forms

#### range overlap

Overlap ought to be large enough to induce employees to seek promotions.

## Broadbanding

> collapse salary grades into a few broad bands, each with a minimum and maximum

- flexibility
- career growth

---

## Employee Benefits

- Flexible hours
- WFH: 45% of employees love their jobs (according to Forbes)
- Vacation time and PTO: without time off, employees are more prone to burnout
- Paid parental leave

> part of the compensation package, other than pay for time worked.

Growth in Employee Benefits

- Cost effectiveness of Benefits
- Union
- Employer impetus
- Government impetus

## issues.

- ensure external competitiveness
- adequacy of benefits
- Who should be protected?
- How much choice should employees have among an array of benefits?
- How should benefits be financed?

> [!question] Question
>
> How does external equity differ for pay versus benefits?

- Pay is quantifiable in monetary terms, whereas benefits are more subjective in terms of equity.

--- slug: thoughts/university/twenty-three-twenty-four/commerce-4be3/Final-report tags: - commerce4be3 description: an Uber compensation policies case study title: Uber compensation analysis date: 2024-03-20 ---

---

## meeting minutes

### 2024-03-20

Evan is missing, everyone else is present

Part 1: Company introduction (Evan)
Part 2: Identification of issues/problem statement (Vanessa, Josh)
Part 3: Analysis of the current compensation system
Part 4: Proposals for compensation package and performance criteria (Aaron, Imran)
Part 5: Implementation and details of improvement, suggestions
Part 6: Conclusion, recommendations

---

To establish pay transparency, Uber should disclose to drivers how their pay is calculated, including the commission Uber takes from each fare, typically around 25% (Zinkula, 2024).
Uber can provide a detailed breakdown in the driver app and weekly pay statements showing the passenger fare, Uber’s take rate, and the driver payout for each trip. Uber should also publish its average take rates and driver earnings by city to provide greater transparency and allow drivers to make informed decisions.

Uber should ensure drivers do not operate at a loss after accounting for expenses like fuel, insurance, and vehicle maintenance, estimated at \$0.32 per mile (Zoepf, 2018). Uber should guarantee drivers a minimum hourly earnings rate after accounting for expenses, or a minimum rate card per mile and per minute, to implement minimum earnings guarantees. This will require extensive research on different car models and fuel consumption, as well as constructing statistical models to accurately predict the expected costs for each driver. This will provide greater financial security and help compensate drivers fairly for their time and costs.

Uber should reward high-performing drivers with incentives based on metrics like trips completed and utilization rate, in addition to the current perks provided through Uber Pro (Uber, 2024). Uber should also consider tenure-based increases (an example is their proposed Upfront Driver Pay), such as raising driver rates by 2-3% for each year of service (Sherman, 2024). This will help retain experienced drivers and demonstrate that Uber values its long-term driver partners. Lastly, Uber should expand its driver rewards program, Uber Pro, which offers vehicle maintenance discounts based on points earned for trips (Mishel, 2024).

Kessler (2020) reported that while Uber has provided some sick pay and other financial assistance to drivers, many say it was insufficient during the pandemic. Drivers are classified as independent contractors, lacking benefits like health insurance and paid time off, and the benefits Uber does provide remain limited compared to employee benefits packages. To provide more security for drivers, Uber should look into offering occupational accident insurance, disability payments, and subsidized health insurance in more markets. Uber already provides some of these benefits to drivers in European cities like London and Paris (SERUpractice)

--- slug: thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-description-exercise tags: - commerce4be3 description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-description-exercise" title: exercise date: 2024-01-31 ---

4BE3 Job Description Assignment

Factor 1 – Knowledge/Education/Training

We agree with the Level 3 decision because the types of duties required of the Customer Service Representative are quite heavy, and a person who is still in high school may not be fully able to grasp these ideas or fully understand how to do them.

Factor 2 – Skill Gained by Experience

We agree with the Level 3 decision (over 3 months and including 6 months) because these tasks may take a bit more time to learn and require more training and support. Employee probation is usually completed after 3 months, so they should be able to do these tasks on their own by that point with minimal support.

Factor 3 – Responsibility for Decisions and Skill in Operations

We think that Factor 3 should be increased to a Level 4 because the employee has decision-making authority over refunds and payments, and also needs to pay close attention to inventory and ordering to ensure everything is ordered in a timely manner. They also need to manage multiple programs.
Factor 4 – Responsibility for Ingenuity and Creativity

We agree with the Level 2 decision because the role does not require much creativity other than the occasional thinking on the spot and coming up with solutions, and these solutions will not be implemented company-wide.

--- slug: thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-based-Pay-structures-and-Job-Evaluation tags: - commerce4be3 - seed description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-based-Pay-structures-and-Job-Evaluation" title: Job-based Pay structures and Job Evaluation date: 2024-01-31 ---

See [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-based-Pay-structures-and-Job-Evaluation/../../../../../../../../thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-based-Pay-Structures-and-Job-Evaluation.pdf)

## Speakers

- ask better questions
  - need to know what you want from the compensation philosophy
  - what is the purpose of the compensation plan?
  - are we still doing this correctly?
  - question this philosophy
  - ask hard questions about hard problems
- the internet is not a good salary source
- Good communication, and understand what your compensation philosophy is
- Keys:
  - Jobs and the hierarchy they bring
  - skills and competencies they offer
- Document it and version-control it.
- Subjectivity and bias are built in
- Research the roles within the jobs.

## Job Evaluation

> systematically determining the relative worth of jobs to create a job structure within an organisation

> based on a combination of job content, skills, values, organisation culture, and the external market

Decisions:

- purpose
- single or multiple plans
- among alternative approaches
- involvement of relevant stakeholders
- evaluate usefulness

## establish purpose

aligned if it

- supports organisation strategy
- supports workflow
- is fair to employees
- motivates behaviour toward organisation objectives.

## single vs. multiple plans

- evaluation plans for different types of workflow
- the number of job evaluation plans _depends_ on how detailed it needs to be to make pay decisions

## choices of job evaluation

### simple ranking.

- order from highest to lowest based on relative values
- advantages: simple, fast, easy to understand and explain to employees; least expensive initially.
- disadvantages:
  - if the ranking criteria are poorly defined, evaluations become biased
  - evaluators must be knowledgeable about all jobs
  - results are difficult to defend, and costly solutions may be required.

| alternative methods | description |
| -------------------- | ---------------------------------------------------------------------------------------------------- |
| Alternation | order descriptions alternately at each extreme; evaluators agree on which jobs are the most valuable |
| Paired comparison | compare each job with every other job; number of comparisons = n(n-1)/2 |

### classification

- a series of classes covers the range of jobs
- descriptions are labels which capture the general nature of the work

## point method

- assignment of a numeric score
- the procedure results in a relative ordering of jobs based on the number of points that each job “scores”.

### 1. Job Analysis

- a set of representative benchmark jobs is drawn for analysis

### 2. Determine Compensable Factors

- based on the strategy and values of the organisation
- based on the work performed
- acceptable to stakeholders affected by the resulting pay structure

challenges:

- small numbers
- unique criteria

### 3.
Scale the Factors

- 4 to 8 degrees

### 4. Weigh the Factors

- weights reflect the relative importance of each factor
- determined through an advisory committee (a priori judgement approach)

### 5 & 6. Communicate.

Who?

- employees
- consultants
- union representatives

### Design process and Job structures

provides a hierarchy of work, or a job structure

[exercise](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-based-Pay-structures-and-Job-Evaluation/../../../../../../../../thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-description-exercise)

## Skill-based plans

- in the trades
- link pay to the depth or breadth of skills
- pay individuals for all relevant skills → the wage attaches to the person

### types.

1. depth
   - based on the knowledge of the person
2. generalist/breadth
   - increased by acquiring new knowledge

> [!tip] Purpose of skill-based plans
>
> support the organisation’s strategy and workflow, are fair to employees, and motivate behaviour

### Outcomes

- well accepted by employees and provide strong motivation for individuals to acquire new skills
- become increasingly expensive
- flexibility permits leaner staffing
- success is determined by how well the plan aligns

## Competency-based plans

- fairness and motivation
- skill-based, a foundation for successful work
- core competencies are often linked to the mission statement
- a competency set translates core competencies into action
- indicators are observable behaviours

### Analysis

- core competencies are not unique to each company
- companies differ in how they apply their competencies
- verify possession of a competency
- there is no objective way of certifying competencies
- relatively few levels, with wide differentials between levels

## internal alignment reflected in structures

- the purpose of job- and person-based procedures is to design and manage a pay structure.

## reliability and validity

- consultants
- improve reliability by using evaluators familiar with the work and trained in job evaluation.
- validity refers to the degree to which the evaluation assesses relative job worth.

## Acceptability

- formal appeals process → request a re-analysis or skills re-evaluation.
- employee attitude surveys assess perceptions of how useful the evaluation is as a management tool.

## bias.

To ensure a bias-free evaluation:

- define compensable factors and scales to include the content of the jobs
- ensure factor weights are not consistently biased against certain jobs
- apply the plan in as bias-free a manner as feasible.

--- slug: thoughts/university/twenty-three-twenty-four/commerce-4be3/Pay-model tags: - commerce4be3 description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Pay-model" title: Pay model date: 2024-01-10 ---

see also: [Slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Pay-model/../../../../../../../../thoughts/university/twenty-three-twenty-four/commerce-4be3/The-Pay-Model.pdf)

## compensation.

> refers to all forms of financial returns and tangible services and benefits received as part of an employment relationship

1. societal
   - pay and benefits as a measure of justice
   - job losses or gains in a country are a function of labor costs
2. stockholders
   - ESA: employment options plan and stock purchase plan, ISO
   - executive pay: VPs, higher up.
   - performance measures
3. managers
   - major expense that must be managed
   - major determinant of employee attitudes and behaviours
4.
employees
   - financial freedom
   - exchange of goods
   - incentive to work a job, and a reward for having done so.

Merit payment. Total Rewards: RRSP (the Canadian analogue of a 401k), health spending account, employment security, union membership. Social capital: employee value proposition; psychological safety, i.e., being safe in the work environment without fear of retaliation.

## total reward.

### total compensation

- includes cash payments (IA, CPP)

Cash compensation:

- Base pay: Job evaluation
- merit increases are increments
- COLA (cost of living adjustment)
- incentives (bonuses)

Benefits

- health insurance
- pension: retirement and saving
- allowances

### relational returns

> Non-financial returns that substantially impact employee behaviour, such as employment security and learning and developmental opportunities

- psychological returns
- recognition and status

## pay model.

```mermaid
graph LR
SP{{Strategic policies}} --> T{{Techniques}} --> SO{{Strategic objectives}}
```

collective bargaining

- objectives
- policies that form the foundation of compensation
- techniques that make up the compensation system.

### internal alignment

- comparisons among jobs and skill levels within the organization
- pertains to pay rates for employees within the organization
- pay relationships affect compensation objectives

### external competitiveness

- pay comparisons with competitors externally
- **market driven**
- objectives:
  - ensure pay is sufficient to attract
  - control labor costs to ensure competitive pricing of the product.

### employee contributions

- how employees are rewarded
- bases for performance-based evaluations, so employees perceive pay as fair.

### management

- the right people get the right pay for achieving the right objectives the right way.

## pay techniques

- four basic policies
- tools and mechanisms that are used to achieve objectives.

Gender inequality [article](https://web.archive.org/web/20230602214140/https://www.theglobeandmail.com/business/careers/article-not-a-single-large-public-canadian-firm-has-closed-the-gender-pay-gap/)

--- slug: thoughts/university/twenty-three-twenty-four/commerce-4be3/index tags: - university - commerce4be3 description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/index" title: Compensation date: 2024-10-29 ---

--- slug: thoughts/university/twenty-three-twenty-four/compsci-4x03/A1 tags: - swfr4x03 description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A1" title: Floating points error, Taylor series, and approximation date: 2023-09-25 ---

**Problem 1 \[5 points]** Consider solving the scalar equation $ax = b$, for given $a$ and $b$, and assume that you have computed $\hat{x}$. To measure the quality of $\hat{x}$, we can compute the residual $r = b − a\hat{x}$. Derive the error in $fl(r)$, that is, the relative error in the floating point representation of $r$. Can it be large? Explain.
_Answer_: Given $r = b - a\hat{x}$,

- Let $fl(a)$ be the floating point representation of $a$
- Let $fl(b)$ be the floating point representation of $b$
- Let $fl(\hat{x})$ be the floating point representation of $\hat{x}$

Assuming the relative error of $fl(\hat{x})$ is $\delta_{\hat{x}}$ ⇒ $fl(\hat{x}) = \hat{x}_{true}(1+\delta_{\hat{x}})$

Therefore: $a\hat{x}=a\hat{x}_{true}(1+\delta_{\hat{x}})$

Assuming the relative error of $fl(a\hat{x})$ is $\delta_{m}$ ⇒ $fl(a\hat{x}) = a\hat{x}_{true}(1+\delta_{\hat{x}})(1+\delta_{m})$

The computed residual is $r = b - a\hat{x}_{true}(1+\delta_{\hat{x}})(1+\delta_{m})$

Assuming the relative error of $fl(b-a\hat{x})$ is $\delta_{s}$ ⇒ $fl(b-a\hat{x}) = \lbrack b - a\hat{x}_{true}(1+\delta_{\hat{x}})(1+\delta_{m})\rbrack(1+\delta_{s})$

Thus, the error in $fl(r)$ is driven by $\delta_{r} = (1+\delta_{\hat{x}})(1+\delta_{m})(1+\delta_{s}) - 1$

> The error can be large if:
>
> - the relative error of $\hat{x}$ is large
> - there is significant rounding error in the multiplication or subtraction (i.e., $\delta_m$ or $\delta_s$ is large)
> - $a$ and $b$ are such that $b \approx a\hat{x}$, so $b - a\hat{x}$ suffers “catastrophic cancellation”: the residual is tiny relative to the operands, so its _relative_ error can blow up

---

**Problem 2 \[2 points]** Explain the output of the following code

```matlab
clear all;
x = 10/9;
for i=1:20
    x = 10*(x-1);
end
x
```

Is the result accurate?

_Answer_: The above MATLAB code proceeds as follows:

1. `clear all` clears all variables in the current workspace
2. `x = 10/9` initialises the first value of $x$ to $\frac{10}{9}$
3. The `for` loop runs 20 times, where each iteration updates $x$ via $x := 10(x-1)$
4. Finally, `x` prints out the value of `x` in the MATLAB terminal window.

The output of the code is not correct, due to floating point errors. Machine epsilon $\epsilon_{mach}$ by default in MATLAB (which uses double precision) is approximately $2.2204e-16$.

Since $\frac{10}{9}$ is not exactly representable, the stored value is $\frac{10}{9} + \delta$ with $|\delta|$ on the order of $\epsilon_{mach}$. The map $x \mapsto 10(x-1)$ leaves $\frac{10}{9}$ fixed but multiplies the error $\delta$ by 10 on every iteration, so after 20 iterations the error is of order $10^{20}\epsilon_{mach} \approx 10^{4}$, and the printed $x$ bears no resemblance to $\frac{10}{9}$.
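A minimal instrumented version (my own sketch) makes this growth visible at each step:

```matlab
% track how far x drifts from the exact fixed point 10/9 at every iteration
x = 10/9;                        % stored as 10/9 + delta, |delta| ~ eps
for i = 1:20
    x = 10*(x - 1);              % exact map keeps 10/9 fixed; error grows 10x
    fprintf('i=%2d  error=%.3e\n', i, x - 10/9);
end
```

---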
**Problem 3 \[3 points]** Suppose you approximate $e^x$ by its truncated Taylor series. For given $x = 0.1$, derive the smallest number of terms of the series needed to achieve accuracy of $10^{−8}$. Write a Matlab program to check that your approximation is accurate up to $10^{−8}$. Name your program `check_exp.m`.

_Answer_: The Taylor series of a real or complex $f$ at $c$ is defined by $f(x) = \sum^{\infty}_{k=0}\frac{f^{(k)}(c)}{k!}(x-c)^k$

Given $f$ has $n+1$ continuous derivatives on $[a, b]$, i.e., $f \in C^{n+1}[a, b]$, the truncated Taylor series can be written as $f(x) = \sum^{n}_{k=0}\frac{f^{(k)}(c)}{k!}(x-c)^k + E_{n+1}$ where $E_{n+1} = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x-c)^{n+1}$ for some $\xi$ between $c$ and $x$.

Hence, with $x := x+h$, we have $f(x+h) = \sum^{n}_{k=0}\frac{f^{(k)}(x)}{k!}h^k + E_{n+1}$ where $E_{n+1} = \frac{f^{(n+1)}(\xi)}{(n+1)!}h^{n+1}$ and $\xi$ is between $x$ and $x+h$.

Thus, we need to find $n$ such that $|E_{n+1}| = \frac{e^{\xi}}{(n+1)!}x^{n+1} \le 10^{-8}$ with $\xi$ between 0 and $x$. With $x=0.1$, $e^{\xi} \le e^{0.1} \approx 1.1052$, so

$E_{n+1} = \frac{e^{\xi}}{(n+1)!}x^{n+1} \le \frac{1.1052}{(n+1)!}0.1^{n+1} \le 10^{-8} \Leftrightarrow \frac{0.1^{n+1}}{(n+1)!} \le 9.0481 \times 10^{-9}$

From this inequality, $n=5$ (i.e., the first 6 terms, up to $\frac{x^5}{5!}$) makes the Taylor series accurate to $10^{-8}$.

The Matlab program below examines the above terms:

```matlab title="check_exp.m"
function check_exp()
    x = 0.1;

    % Approximation using the first 6 terms of the Taylor series
    approx = 1 + x + x^2/factorial(2) + x^3/factorial(3) + x^4/factorial(4) + x^5/factorial(5);
    actual = exp(x);
    error = abs(approx - actual);

    % Display the results
    fprintf('Approximated value: %f\n', approx);
    fprintf('Actual value: %f\n', actual);
    fprintf('Error: %e\n', error);

    % Check if the error is less than 10^-8
    if error < 10^-8
        disp('The approximation is accurate up to 10^-8.');
    else
        disp('The approximation is NOT accurate up to 10^-8.');
    end
end
```

---

**Problem 4 \[3 points]** The sine function has the Taylor series expansion $sin(x) = x − \frac{x^3}{3!} + \frac{x^5}{5!} − \frac{x^7}{7!} + \cdots$ Suppose we approximate $sin(x)$ by $x − \frac{x^3}{3!} + \frac{x^5}{5!}$. What are the absolute and relative errors in this approximation for $x = 0.1, 0.5, 1.0$? Write a Matlab program to produce these errors; name it `sin_approx.m`.

_Answer_: Taking $y=sin(x)$ as the exact value and $\tilde{y} = x − \frac{x^3}{3!} + \frac{x^5}{5!}$ as the approximate value:

- The absolute error is given by $|y - \tilde{y}|$
- The relative error is given by $\frac{|y-\tilde{y}|}{|y|}$

For $x \in \{0.1, 0.5, 1.0\}$, the following table represents the errors:

| Error | $x=0.1$ | $x=0.5$ | $x=1.0$ |
| -------- | ------------ | ------------ | ------------ |
| Absolute | 1.983852e-11 | 1.544729e-06 | 1.956819e-04 |
| Relative | 1.987162e-10 | 3.222042e-06 | 2.325474e-04 |

```matlab title="sin_approx.m"
function sin_approx()
    % Define the values of x
    x_values = [0.1, 0.5, 1.0];

    % Loop through each value of x to compute the errors
    for i = 1:length(x_values)
        x = x_values(i);

        % Calculate the approximation
        approx = x - x^3/factorial(3) + x^5/factorial(5);

        % Calculate the actual value of sin(x)
        actual = sin(x);

        % Calculate the absolute error
        abs_error = abs(approx - actual);

        % Calculate the relative error
        rel_error = abs_error / abs(actual);

        % Display the results for each x
        fprintf('For x = %f:\n', x);
        fprintf('Approximated value: %f\n', approx);
        fprintf('Actual value: %f\n', actual);
        fprintf('Absolute Error: %e\n', abs_error);
        fprintf('Relative Error: %e\n\n', rel_error);
    end
end
```

---

**Problem 5 \[2 points]** How many terms are needed in the series $arccot(x) = \frac{π}{2} − x + \frac{x^3}{3} − \frac{x^5}{5} + \frac{x^7}{7} + · · ·$ to compute $arccot(x)$ for $|x| \le 0.5$ accurate to 12 decimal places.
_Answer_: To calculate $arccot(x)$ for $|x| \le 0.5$ accurate to 12 decimal places, we need the truncation error to satisfy $|E| < 10^{-12}$.

The general term of the Taylor series of $arccot(x)$ is $a_n = \frac{(-1)^n x^{2n+1}}{2n+1}$. Since this is an alternating series with terms decreasing in magnitude, the error after truncation is bounded by the magnitude of the first omitted term. On the interval $|x| \le 0.5$, the largest possible error occurs at $x=0.5$.

Thus, the inequality to solve for $n$ is $\left|\frac{(-1)^{n}x^{2n+1}}{2n+1}\right| < 10^{-12} \Leftrightarrow \frac{x^{2n+1}}{2n+1} < 10^{-12}$ at $x = 0.5$.

Using the following function `find_nth_terms`, we find that $n=17$ ensures $arccot(x)$ for $|x| \le 0.5$ is accurate to 12 decimal places.

```python
import math

def find_nth_terms(x: float, eps: float = 1e-12):
    n = 0
    term = x
    while abs(term) >= eps:
        n += 1
        term = math.pow(-1, n) * math.pow(x, 2 * n + 1) / (2 * n + 1)
    return n

find_nth_terms(0.5)
```

---

**Problem 6 \[2 points]** Consider the expression $1024 + x$. Derive for what values of $x$ this expression evaluates to 1024.

_Answer_: In IEEE 754 double precision, $\epsilon_{mach} = 2^{-52} \approx 2.2\times 10^{−16}$

From the definition of machine epsilon ($1024 + \epsilon_{mach}\cdot 1024 > 1024$), the spacing between $N$ and the next representable number is proportional to $N$, that is, about $N\epsilon_{mach}$.

Thus the expression evaluates to 1024 for any $x$ within the range $x < \frac{1}{2}\epsilon_{mach}N$

Substituting $N=1024$ and $\epsilon_{mach} \approx 2.2\times 10^{−16}$ ⇒ $x < \frac{1}{2}\times 2.2\times 10^{-16}\times 1024 \approx 1.1368448×10^{−13}$

> $\forall x \lessapprox 1.1368448×10^{−13} \rightarrow (1024 + x) \: \text{evaluates to} \: 1024$

---

**Problem 7 \[2 points]** Give an example in base-10 floating-point arithmetic when

a. $(a + b) + c \neq a + (b + c)$

b. $(a ∗ b) ∗ c \neq a ∗ (b ∗ c)$

_Answer_: For the first example $(a + b) + c \neq a + (b + c)$, assuming double precision:

Let:

- $a=1.0$
- $b=1.0\times 10^{-16}$
- $c=-1.0$

⇒ $(a+b)+c = 0$, whereas $a+(b+c) = 1.11022\times 10^{-16}$

The reasoning from _Problem 6_ (applied with $N=1$) explains that $(a+b) = a$, since $b < \frac{1}{2}\epsilon_{mach} \approx 1.11\times 10^{-16}$; therefore $(a+b)+c=0$, whereas in $a+(b+c)$ the sum $b+c = -0.9999999999999999$ is representable, so $a+(b+c) \approx 1.0 - 0.9999999999999999 \approx 1.11022\times 10^{-16}$ due to rounding in floating point arithmetic.

For the second example $(a ∗ b) ∗ c \neq a ∗ (b ∗ c)$, assume the $FP$ system $(10, 3, L, U)$ where $x=\pm{d_0.d_1d_2}\times 10^e, d_0 \neq 0, e \in [L, U]$, with rounding to 3 significant digits. (Note: the original choice $a=1.23, b=4.56, c=7.89$ gives the same result either way under consistent rounding, so the values are adjusted here.)

Let:

- $a=2.22$
- $b=4.71$
- $c=3.33$

⇒ $(a*b)*c=35.0$ ($a*b=10.4562$ rounds to $10.5$, and $10.5*c=34.965$ rounds to $35.0$), whereas $a*(b*c)=34.9$ ($b*c=15.6843$ rounds to $15.7$, and $15.7*a = 34.854$ rounds to $34.9$)
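The first example can be checked directly in MATLAB (a two-line sketch of my own):

```matlab
a = 1.0; b = 1.0e-16; c = -1.0;
r1 = (a + b) + c   % 0: b is below eps(1)/2, so fl(a+b) = a and b is absorbed
r2 = a + (b + c)   % ~1.1102e-16: b + c is representable, so nothing is lost
```

---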
**Problem 8 \[8 points]** Consider a binary floating-point (FP) system with normalised FP numbers and 8 binary digits after the binary point: $x=\pm{1.d_1d_2 · · · d_8 × 2^e}$ For this problem, assume that we do not have a restriction on the exponent $e$. Name this system B8. (a) \[2 points] What is the value (in decimal) of the unit roundoff in B8? (b) (1 point) What is the next binary number after $1.10011001$? (c) \[5 points] The binary representation of the decimal $0.1$ is infinite: $0.00011001100110011001100110011 · · ·$. Assume it is rounded to the nearest FP number in B8. What is this number (in binary)?

_Answer_: The B8 system can also be defined as $FP(2, 8, L, U)$

(a). For a binary FP system with $p$ binary digits after the binary point, the unit roundoff $u$ is given by $u=2^{-p}$

With $p=8$, the unit roundoff for this system in decimal is $u = 2^{-8} = 0.00390625$

(b). Given $u=2^{-8}=0.00000001$ in binary, the next binary number can be calculated as:

```
  1.10011001
+ 0.00000001
= 1.10011010
```

(c). Normalised in B8, $0.1 = 1.10011001\,1001\ldots_2 \times 2^{-4}$. We look at the 9th digit after the binary point of the significand to determine how to round; that digit is 1 (worth $2^{-9}$), so we round up.

Therefore, 0.1 rounded to the nearest FP number in B8 is $1.10011010_2 \times 2^{-4}$, i.e., $0.000110011010$ in binary.

---

**Problem 9 \[10 points]** For a scalar function $f$ consider the derivative approximations $f^{'}(x) \approx g_1(x, h) := \frac{f(x + 2h) − f(x)}{2h}$ and $f^{'}(x) \approx g_2(x, h) := \frac{f(x + h) − f(x − h)}{2h}$ .

a. \[4 points] Let $f(x) = e^{sin(x)}$ and $x_0 = \frac{\pi}{4}$.

- Write a Matlab program that computes the errors $|f ′(x_0)−g1(x_0, h)|$ and $|f′(x_0)−g_2(x_0, h)|$ for each $h = 10^{−k}, k = 1, 1.5, 2, 2.5 . . . , 16$.
- Using `loglog`, plot on the same plot these errors versus $h$. Name your program `derivative_approx.m`.

For each of these approximations:

b. \[4 points] Derive the value of $h$ for which the error is the smallest.

c. \[2 points] What is the smallest error and for what value of $h$ is it achieved? How does this value compare to the theoretically “optimum” value?

_Answer_:

(a).

```matlab title="derivative_approx.m"
function derivative_approx()
    % Define the function f and its derivative
    f = @(x) exp(sin(x));
    df = @(x) cos(x) * exp(sin(x));

    % Define the approximation functions g1 and g2
    g1 = @(x, h) (f(x + 2*h) - f(x)) / (2*h);
    g2 = @(x, h) (f(x + h) - f(x - h)) / (2*h);

    % Define x0
    x0 = pi/4;

    % Define k values and compute h values
    k_values = 1:0.5:16;
    h_values = 10.^(-k_values);

    % Initialize error arrays
    errors_g1 = zeros(size(h_values));
    errors_g2 = zeros(size(h_values));

    % Compute errors for each h_value
    for i = 1:length(h_values)
        h = h_values(i);
        errors_g1(i) = abs(df(x0) - g1(x0, h));
        errors_g2(i) = abs(df(x0) - g2(x0, h));
    end

    % Find the h value for which the error is the smallest for each approximation
    [~, idx_min_error_g1] = min(errors_g1);
    [~, idx_min_error_g2] = min(errors_g2);
    h_min_error_g1 = h_values(idx_min_error_g1);
    h_min_error_g2 = h_values(idx_min_error_g2);

    % Display the h values for the smallest errors
    fprintf('For g1, the smallest error is at h = %e\n', h_min_error_g1);
    fprintf('For g2, the smallest error is at h = %e\n', h_min_error_g2);

    % Plot errors using loglog
    loglog(h_values, errors_g1, '-o', 'DisplayName', '|f''(x_0) - g_1(x_0, h)|');
    hold on;
    loglog(h_values, errors_g2, '-x', 'DisplayName', '|f''(x_0) - g_2(x_0, h)|');
    hold off;

    % Add labels, title, and legend
    xlabel('h');
    ylabel('Error');
    title('Errors in Derivative Approximations');
    legend;
    grid on;
end
```

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A1/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/derivative-approx.svg)

(b).
The Taylor series expansion of a function $f(x)$ around a point $a$ is: $f(x) = \sum_{n=0}^{\infty}{\frac{f^{(n)}(a)}{n!}(x-a)^n} = f(a) + f^{'}(a)(x-a) + \frac{f^{''}(a)}{2!}(x-a)^2 + \frac{f^{'''}(a)}{3!}(x-a)^3 + ...$

For the first approximation $g_1(x, h)$, with the Taylor series expansion: $f(x+2h) = f(x) + 2hf^{'}(x) + \frac{(2h)^2}{2!}f^{''}(\xi)$ for $x \leq \xi \leq x + 2h$

$\rightarrow g_1(x, h) = f^{'}(x) + h{f^{''}(\xi)}$ for $x \leq \xi \leq x + 2h$

Hence the truncation error term is $hf^{''}(\xi)$; balancing it against the rounding error $\epsilon_{mach}|f|/h$ gives

⇒ $h \approx 2\sqrt{\epsilon_{mach}} \big/ \sqrt{\lvert e^{\sin x}\cos^2 x − e^{\sin x}\sin x \rvert} = 2\sqrt{\epsilon_{mach}} \Big/ \sqrt{\lvert \frac{e^{\frac{1}{\sqrt{2}}}}{2} - \frac{e^{\frac{1}{\sqrt{2}}}}{\sqrt{2}} \rvert}$

For the second approximation $g_2(x, h)$: the truncation error term is $-\frac{1}{6}h^2f^{'''}(x)$

(c). For $g_1$, the smallest error is at h = 1.000000e-08

For $g_2$, the smallest error is at h = 3.162278e-06

These agree in order of magnitude with the theoretical optima: $h^{*} \sim \epsilon_{mach}^{1/2} \approx 1.5\times 10^{-8}$ for $g_1$ and $h^{*} \sim \epsilon_{mach}^{1/3} \approx 6\times 10^{-6}$ for $g_2$.

---

**Problem 10 \[7 points]** In the Patriot disaster example, the decimal value 0.1 was converted to a single precision number with chopping. Suppose that it is converted to a double precision number with chopping. (a). \[5 points] What is the error in this double precision representation of 0.1. (b). \[2 points] What is the error in the computed time after 100 hours?

_Answer_:

(a). The double precision representation of $0.1$ has:

- Sign: $0$
- Exponent field: $01111111011$, which is 1019 in decimal ⇒ the effective exponent is $1019-1023=-4$
- Significand: the infinitely repeating pattern $1001\,1001\,1001\ldots$, which is chopped off at 52 bits.

Therefore, $\epsilon_{mach} = 2^{-52}$ and thus $\text{roundoff error} = \frac{1}{2}\epsilon_{mach} = 2^{-53} \approx 1.11×10^{−16}$

(b). After 100 hours: $100 × 60 × 60 × 10 × 1.11 × 10^{−16} \approx 3.996×10^{−10} \text{ sec}$

--- slug: thoughts/university/twenty-three-twenty-four/compsci-4x03/A2 tags: - swfr4x03 description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2" title: Gaussian elimination, LU decompositions, and errors LS solving date: 2023-10-24 ---

**Problem 1 \[8 points]** Consider the system $Ax = b$, where $A=\begin{bmatrix} 0.1 & 0.3 & 0.9\\ 0.3 & 0.9 & 2.7\\ 0.6 & 0.7 & 0.1 \end{bmatrix}$ and $b = \begin{bmatrix} 1.3 & 3.9 & 1.4\end{bmatrix}^T$

a. \[2 points] Show that $A$ is singular.

b. \[2 points] If we were to use Gaussian elimination with partial pivoting to solve this system using exact arithmetic, show where the process fails.

c. \[2 points] Solve this system in double precision using partial pivoting. Do not use Matlab’s functions. What is the solution that you obtain?

d. \[2 points] Matlab’s `A\b` produces `NaN -Inf Inf` as a solution. Explain why NaN, -Inf and Inf.

_Answer_:

a.
_For $A$ to be singular, prove $det(A) = 0$_

Note first that $R_2 = 3R_1$, so the rows of $A$ are linearly dependent and $A$ must be singular; the elimination below confirms this.

_Using Gaussian elimination without partial pivoting_ (each row operation below preserves the determinant):

$$
\begin{aligned}
A|b &= \begin{bmatrix} 0.1 & 0.3 & 0.9 & | & 1.3\\ 0.3 & 0.9 & 2.7 & | & 3.9\\ 0.6 & 0.7 & 0.1 & | & 1.4 \end{bmatrix} \\\
R_{2} - R_{1} \rightarrow A|b &= \begin{bmatrix} 0.1 & 0.3 & 0.9 & | & 1.3\\ 0.2 & 0.6 & 1.8 & | & 2.6\\ 0.6 & 0.7 & 0.1 & | & 1.4 \end{bmatrix} \\\
R_{3} - 3R_{1} \rightarrow A|b &= \begin{bmatrix} 0.1 & 0.3 & 0.9 & | & 1.3\\ 0.2 & 0.6 & 1.8 & | & 2.6\\ 0.3 & -0.2 & -2.6 & | & -2.5 \end{bmatrix} \\\
R_3 - \frac{1}{2}R_2 \rightarrow A|b &= \begin{bmatrix} 0.1 & 0.3 & 0.9 & | & 1.3\\ 0.2 & 0.6 & 1.8 & | & 2.6\\ 0.2 & -0.5 & -3.5 & | & -3.8 \end{bmatrix} \\\
& \\\
det(A) = a(ei−fh)−b(di−fg)+c(dh−eg), A &=\begin{bmatrix} a & b & c \\ d & e & f\\ g & h & i \end{bmatrix} \\\
& \\\
\rightarrow det(A) = 0.1\times(-0.6\times 3.5+1.8\times 0.5) - & \\\
0.3\times(-0.2\times 3.5-1.8\times 0.2) + & \\\
0.9\times(-0.5\times 0.2-0.6\times 0.2) &= 0
\end{aligned}
$$

> [!tip] Lemma
>
> **$A$ is singular**

b. _With partial pivoting_:

$$
\begin{align}
A|b &=\begin{bmatrix} 0.1 & 0.3 & 0.9 & | & 1.3\\ 0.3 & 0.9 & 2.7 & | & 3.9\\ 0.6 & 0.7 & 0.1 & | & 1.4 \end{bmatrix} \\\
R3 \leftrightarrow R1 \leftarrow A|b&=\begin{bmatrix} 0.6 & 0.7 & 0.1 & | & 1.4\\ 0.3 & 0.9 & 2.7 & | & 3.9\\ 0.1 & 0.3 & 0.9 & | & 1.3 \end{bmatrix} \\\
R2 - \frac{1}{2}R1 \leftarrow A|b&=\begin{bmatrix} 0.6 & 0.7 & 0.1 & | & 1.4\\ 0 & 0.55 & 2.65 & | & 3.2\\ 0.1 & 0.3 & 0.9 & | & 1.3 \end{bmatrix} \\\
R3 - \frac{1}{6}R1 \leftarrow A|b&=\begin{bmatrix} 0.6 & 0.7 & 0.1 & | & 1.4\\ 0 & 0.55 & 2.65 & | & 3.2\\ 0 & 0.18333333 & 0.88333333 & | & 1.06666667 \end{bmatrix} \\\
R3 - \frac{1}{3}R2 \leftarrow A|b&=\begin{bmatrix} 0.6 & 0.7 & 0.1 & | & 1.4\\ 0 & 0.55 & 2.65 & | & 3.2\\ 0 & 0 & 0 & | & -0.3 \end{bmatrix}
\end{align}
$$

After $R3-\frac{1}{3}R2$, the last row reads $0=-0.3$, which is inconsistent: the process fails because no nonzero pivot remains in the third column.

c. _With partial pivoting in double precision_

The $LU$ decomposition of $A=\begin{bmatrix} 0.1 & 0.3 & 0.9\\ 0.3 & 0.9 & 2.7\\ 0.6 & 0.7 & 0.1 \end{bmatrix}$

The following portrays the steps to calculate $U$ _(upper triangular)_:

$$
\begin{aligned}
R_3 \leftrightarrow R_1 \rightarrow U &= \begin{bmatrix} 0.6 & 0.7 & 0.1\\ 0.3 & 0.9 & 2.7\\ 0.1 & 0.3 & 0.9 \end{bmatrix}, \quad P_1 = \begin{bmatrix} 0 & 0 & 1\\ 0 & 1 & 0\\ 1 & 0 & 0 \end{bmatrix} \\\
R_2 - \frac{1}{2}R_1 \rightarrow U &= \begin{bmatrix} 0.6 & 0.7 & 0.1\\ 0 & 0.55 & 2.6500000000000004\\ 0.1 & 0.3 & 0.9 \end{bmatrix} \\\
R_3 - \frac{1}{6}R_1 \rightarrow U &= \begin{bmatrix} 0.6 & 0.7 & 0.1\\ 0 & 0.55 & 2.6500000000000004\\ 0 & 0.18333333333333335 & 0.8833333333333333 \end{bmatrix} \\\
R_3 - \frac{1}{3}R_2 \rightarrow U &= \begin{bmatrix} 0.6 & 0.7 & 0.1\\ 0 & 0.55 & 2.6500000000000004\\ 0 & 0 & 4.8109664400423476 \times 10^{-17} \end{bmatrix}
\end{aligned}
$$

_note: the $a_{33}$ entry is close to zero, which is consistent with the previous finding_

$L=\begin{bmatrix} 1 & 0 & 0\\ 0.5 & 1 & 0\\ 0.16666666666666669 & 0.33333333333333326 & 1 \end{bmatrix}$

To solve for $x$ with the $LU$ decomposition, we solve $L(Ux)=Pb$

$\rightarrow x=\begin{bmatrix} 14.006993006993 & -10.48951048951048 & 3.3846153846153832\end{bmatrix}$
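A quick MATLAB cross-check of parts a and c (a minimal sketch; part d below explains why the backslash result blows up):

```matlab
A = [0.1 0.3 0.9; 0.3 0.9 2.7; 0.6 0.7 0.1];
b = [1.3; 3.9; 1.4];
rank(A)   % 2, since row 2 = 3*row 1: A is singular
A \ b     % issues a warning and produces the NaN/-Inf/Inf vector discussed in part d
```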
d. Since $A$ is singular, it doesn’t have an inverse. Matlab uses LU decomposition, and as we explored above, the final pivot is zero (or a tiny number on the order of rounding error), since the matrix is singular and _ill-conditioned_. Back-substitution then divides by this (near-)zero pivot: the last equation $0 \cdot x_3 = \text{nonzero}$ yields $x_3 = \pm$`Inf`; substituting that into the second equation propagates `-Inf` into $x_2$; and the first equation then produces an `Inf - Inf` form, which evaluates to `NaN`.

---

**Problem 2 \[2 points]** Apply Gaussian elimination with partial pivoting on the following matrix $A=\begin{bmatrix} 1 & 0 & 0 & 0 & 1\\ −1 & 1 & 0 & 0 & 1\\ −1 & −1 & 1 & 0 & 1\\ −1 & −1 & −1 & 1 & 1\\ −1 & −1 & −1 & −1 & 1 \end{bmatrix}$ Show all the steps.

_Answer_: (With all first-column entries equal in magnitude, the pivot is already maximal, so no row swaps are needed.)

$A=\begin{bmatrix} 1 & 0 & 0 & 0 & 1\\ −1 & 1 & 0 & 0 & 1\\ −1 & −1 & 1 & 0 & 1\\ −1 & −1 & −1 & 1 & 1\\ −1 & −1 & −1 & −1 & 1 \end{bmatrix}$

$R2+R1 \text{ and } R3+R1\text{ and } R4+R1\text{ and } R5+R1\rightarrow A=\begin{bmatrix} 1 & 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0 & 2\\ 0 & −1 & 1 & 0 & 2\\ 0 & −1 & −1 & 1 & 2\\ 0 & −1 & −1 & −1 & 2 \end{bmatrix}$

$R3+R2 \text{ and } R4+R2\text{ and } R5+R2\rightarrow A=\begin{bmatrix} 1 & 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0 & 2\\ 0 & 0 & 1 & 0 & 4\\ 0 & 0 & −1 & 1 & 4\\ 0 & 0 & −1 & −1 & 4 \end{bmatrix}$

$R4+R3 \text{ and } R5+R3\rightarrow A=\begin{bmatrix} 1 & 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0 & 2\\ 0 & 0 & 1 & 0 & 4\\ 0 & 0 & 0 & 1 & 8\\ 0 & 0 & 0 & −1 & 8 \end{bmatrix}$

$R5+R4\rightarrow A=\begin{bmatrix} 1 & 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0 & 2\\ 0 & 0 & 1 & 0 & 4\\ 0 & 0 & 0 & 1 & 8\\ 0 & 0 & 0 & 0 & 16 \end{bmatrix}$

---

**Problem 3 \[5 points]** (a) (3 points) Let $A$, $B$, and $C$ be $n × n$ matrices, where $B$ and $C$ are nonsingular. For an $n-$vector $b$, describe how you would implement the formula $x = C^{-1} (A + I)(A + B^{−1})b$ without computing any inverses. Here, $I$ is the $n × n$ identity matrix. (b) (2 points) What is the complexity of your approach in terms of big-O notation?

_Answer_:

a. _Given $B$ and $C$ are non-singular_

1. Step 1: _Use the $LU$ decomposition of $B$, such that $B=LU$_
2. Step 2: Solve for $y$ in $By=b$ (as $y=B^{-1}b$)
   1. solve for $z$ in $Lz=b$ via forward substitution
   2. solve for $y$ in $Uy=z$ via backward substitution
3. Step 3: Compute $z=(A+B^{-1})b$
   1. This becomes $z=Ab+y$
4. Step 4: Compute $w = (A+I)z$
   1. Via _matrix multiplication_ $\rightarrow w=Az + z$
5. Step 5: _Use the $LU$ decomposition of $C$, such that $C=LU$_
6. Step 6: Solve for $x$ in $Cx=w$ (as $x=C^{-1}w$)
   1. Solve for $z'$ in $Lz'=w$ via forward substitution
   2. Solve for $x$ in $Ux=z'$ via backward substitution

These steps compute $x = C^{-1} (A + I)(A + B^{−1})b$ without forming any inverses.
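As a sketch of the steps above in MATLAB (the function name `eval_formula` is my own; backslash performs the LU factor-and-solve of steps 1-2 and 5-6 internally):

```matlab
function x = eval_formula(A, B, C, b)
    y = B \ b;      % y = B^{-1} b  (steps 1-2: LU solve, no inverse formed)
    z = A*b + y;    % z = (A + B^{-1}) b  (step 3)
    w = A*z + z;    % w = (A + I) z  (step 4)
    x = C \ w;      % x = C^{-1} w  (steps 5-6: LU solve)
end
```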
b. Complexity analysis

Let `total_cost` accumulate the running cost in big-O terms.

Step 1 _using the $LU$ decomposition of $B$_ $\rightarrow \text{total\_cost}=O(n^3)$

Step 2 _solving each of $Lz=b$ and $Uy=z$_ takes $O(n^2)$ each, thus solving $By=b$ via the $LU$ decomposition takes $O(2n^2)$ $\rightarrow \text{total\_cost}=O(n^3) + O(2n^2)$

Step 3 _Compute $z=(A+B^{-1})b$_

- MatmulOp of $Ab$ is $O(n^2)$
- AddOp of $Ab+y$ is $O(n)$
- Total for this step: $O(n^2) + O(n)$

$\rightarrow \text{total\_cost}=O(n^3) + O(3n^2) + O(n)$

Step 4 _Compute $w = (A+I)z$_

- MatmulOp of $Az$ is $O(n^2)$
- AddOp of $Az+z$ is $O(n)$
- Total for this step: $O(n^2) + O(n)$

$\rightarrow \text{total\_cost}=O(n^3) + O(4n^2) + O(2n)$

Step 5 _using the $LU$ decomposition of $C$_ $\rightarrow \text{total\_cost}=O(2n^3) + O(4n^2) + O(2n)$

Step 6 _solving each of $Lz'=w$ and $Ux=z'$ using the LU decomposition_ takes $O(2n^2)$ $\rightarrow \text{total\_cost}=O(2n^3) + O(6n^2) + O(2n)$, i.e., $O(n^3)$ overall, dominated by the two factorizations.

---

**Problem 4 \[6 points]** An $n × n$ Hilbert matrix, denote it by $H$, has entries $h_{ij} = \frac{1}{(i+j-1)}, i, j = 1, . . . , n.$ For $n = 2, 3, . . .$ , generate the Hilbert matrix of order $n$, and also generate the $n-$vector $b = Hx$, where $x$ is a random vector. Solve the resulting system $Hx = b$ to obtain an approximate solution $\hat{x}$. (See the functions `hilb` and `rand`.) (a) \[2 points] How large can you take $n$ before the error $\frac{\Vert{\hat{x} - x}\Vert}{\Vert{x}\Vert}$ is 100 percent? (b) \[2 points] For $n$ up to the value you find in (a), report $\frac{\Vert{r}\Vert}{\Vert{b}\Vert}$ , where $r = b − H\hat{x}$, and $\frac{\Vert{\hat{x} - x}\Vert}{\Vert{x}\Vert}$. (c) \[2 points] As $n$ increases, how does the number of correct digits in the computed solution relate to the condition number of the matrix? See the `cond` function. Submit your Matlab program producing the above results. Name the Matlab file `hilb_problem.m`.

_Answer_: The following `hilb_problem.m` is used:

```matlab title="hilb_problem.m"
function hilb_problem()
    n = 1;
    while true
        % Generate Hilbert matrix of order n
        H = hilb(n);
        % Generate random vector x
        x = rand(n, 1);
        % Compute b = Hx
        b = H * x;
        % Solve the system Hx = b
        x_hat = H \ b;
        % Compute the relative error
        error = norm(x_hat - x) / norm(x);
        fprintf("error=%d, n=%d\n", error, n)
        % If the error is 100 percent, break
        if error >= 1
            break;
        end
        n = n + 1;
    end
    fprintf('\n=============\n\nThe largest n before the error is 100 percent is: %d\n\n=============\n', n-1);
    for i = 1:n-1
        H = hilb(i);
        x = rand(i, 1);
        b = H * x;
        x_hat = H \ b;
        r = b - H * x_hat;
        rel_resid = norm(r) / norm(b);
        rel_error = norm(x_hat - x) / norm(x);
        %fprintf('%d %.16f\n',i, rel_resid)
        fprintf('| %d | %.32f | %.32f |\n', i, rel_resid, rel_error);
    end
    cond_num = cond(H);
    fprintf('The condition number of the matrix for n = %d is: %f\n', n-1, cond_num);
end
```

a. The largest $n$ is $12$ before the error $\frac{\Vert{\hat{x} - x}\Vert}{\Vert{x}\Vert}$ reaches 100 percent.
b. The following entails the values of $\frac{\Vert{r}\Vert}{\Vert{b}\Vert}$ and $\frac{\Vert{\hat{x} - x}\Vert}{\Vert{x}\Vert}$

| n | $\frac{\Vert{r}\Vert}{\Vert{b}\Vert}$ | $\frac{\Vert{\hat{x} - x}\Vert}{\Vert{x}\Vert}$ |
| -- | ------------------------------------- | ----------------------------------------------- |
| 1 | 0.00000000000000000000000000000000 | 0.00000000000000000000000000000000 |
| 2 | 0.00000000000000000000000000000000 | 0.00000000000000013220372219891702 |
| 3 | 0.00000000000000000000000000000000 | 0.00000000000000363350625815651572 |
| 4 | 0.00000000000000000000000000000000 | 0.00000000000006709266750580992637 |
| 5 | 0.00000000000000007733975117624287 | 0.00000000000747821082933078000054 |
| 6 | 0.00000000000000013934207506736382 | 0.00000000023960543432895825359428 |
| 7 | 0.00000000000000010660570398371085 | 0.00000000837749558262967895463873 |
| 8 | 0.00000000000000007165565184570407 | 0.00000009992506975169996005028294 |
| 9 | 0.00000000000000007076549838447114 | 0.00000608952488692639798140973303 |
| 10 | 0.00000000000000012662840530707719 | 0.00002450986238666613242472361311 |
| 11 | 0.00000000000000011997633780813789 | 0.00379971054180424641297242338567 |
| 12 | 0.00000000000000006503338066505365 | 0.25404291536273732043937911839748 |

c. _As $n$ increases, the condition number increases, which means the matrix becomes more ill-conditioned, so fewer digits in the computed solution are correct._

> [!tip] IMPORTANT
>
> The number of correct digits in the computed solution decreases as the condition number grows with $n$

---

**Problem 5 \[4 points]** You have to interpolate $sin(x)$ by a polynomial of degree five using equally spaced points in \[0, 1]. (a) \[2 points] What (absolute) error would you expect if you use this polynomial? (b) \[2 points] Using equally spaced points, what degree polynomial would you use to achieve a maximum error of $10^{-8}$?

_Answer_:

a. Interpolating $sin(x)$ by a polynomial of degree _five_ using equally spaced points in $[0,1]$, the error is as follows: $f(x) - p_n(x) = E(x) = \frac{f^{n+1}(\xi)}{(n+1)!}\prod_{i=0}^{n}{(x-x_i)}$

where

- $n$ is the degree of the polynomial ($n=5$)
- $f^{n+1}(\xi)$ is the $(n+1)\text{-th}$ derivative of $f$

The derivatives of $sin(x)$ cycle every 4 terms: $cos(x), -sin(x), -cos(x), sin(x)$. Therefore the 6th derivative is $-sin(x)$

Here $h=\frac{b-a}{n}=\frac{1}{5}$ and $M = max_{0\leq t\leq 1}|-sin(t)| = sin(1) \approx 0.8415$

Therefore $|E(x)| = |sin(x) - p_5(x)| \leq \frac{M}{4(n+1)}h^{n+1}=\frac{sin(1)}{24}(1/5)^6 \approx 2.244×10^{−6}$

b. To achieve a maximum error of $10^{-8}$, we require $|sin(x) - p_n(x)| \leq\frac{max_{0\leq t\leq 1}|sin^{(n+1)}(t)|}{4(n+1)\cdot n^{n+1}} \le 10^{-8}$

Since the derivatives of $sin(x)$ cycle every 4 terms, the max value of $|sin^{(n+1)}(t)|$ over $[0,1]$ is bounded by 1.

Thus we need to solve for $n$ in $\frac{1}{4(n+1)n^{n+1}}\le 10^{-8} \rightarrow n\approx 7 \text{ (through trial and error)}$

Hence a polynomial of degree _seven_ achieves the desired error bound.
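A quick numerical sanity check of the part (a) bound (a sketch of my own using `polyfit`; the observed maximum error should sit at or below roughly $2.24\times 10^{-6}$):

```matlab
n = 5;
x = linspace(0, 1, n+1);            % 6 equally spaced interpolation points
p = polyfit(x, sin(x), n);          % degree-5 interpolant of sin on [0,1]
t = linspace(0, 1, 1000);
max_err = max(abs(sin(t) - polyval(p, t)))   % should fall below the ~2.244e-6 bound
```

---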
To construct the interpolating polynomial for these data, we will use the _Lagrange basis_

$P(x)=\sum_{i=0}^{n-1}{y_i}{L_i(x)}$

where $L_i(x)$ is the $i\text{-th}$ Lagrange basis polynomial, defined as

$L_i(x) = \prod_{j=0,j\neq i}^{n-1}\frac{x-x_j}{x_i-x_j}$

With $y(x) = \sqrt{x}$, and data points $x_0=1,y_0=1;x_1=4,y_1=2;x_2=9,y_2=3$

$P(x)=\sum_{i=0}^{2}{y_i}{L_i(x)} \text{ where } L_i(x) = \prod_{j=0,j\neq i}^{2}\frac{x-x_j}{x_i-x_j}$

$L_0(x) = \frac{(x-x_1)(x-x_2)}{(x_0-x_1)(x_0-x_2)} = \frac{(x-4)(x-9)}{(1-4)(1-9)} = \frac{(x-4)(x-9)}{24}$

$L_1(x) = \frac{(x-x_0)(x-x_2)}{(x_1-x_0)(x_1-x_2)}=\frac{(x-1)(x-9)}{(4-1)(4-9)}=\frac{(x-1)(9-x)}{15}$

$L_2(x) = \frac{(x-x_0)(x-x_1)}{(x_2-x_0)(x_2-x_1)}=\frac{(x-1)(x-4)}{(9-1)(9-4)} = \frac{(x-4)(x-1)}{40}$

$P(x) = y_0L_0(x) + y_1L_1(x) + y_2L_2(x) = 1 * \frac{(x-4)(x-9)}{24} + 2*\frac{(x-1)(9-x)}{15} + 3*\frac{(x-4)(x-1)}{40}$

> [!tip] IMPORTANT
>
> The interpolating polynomial $P(x)=\frac{(x-4)(x-9)}{24} + \frac{2(x-1)(9-x)}{15} + \frac{3(x-4)(x-1)}{40}$

b. The approximation $\sqrt{1.5}\approx P(1.5)=\frac{(1.5-4)(1.5-9)}{24} + \frac{2(1.5-1)(9-1.5)}{15} + \frac{3(1.5-4)(1.5-1)}{40}=1.1875$

---

**Problem 7 \[7 points]** Let $f(x) = \frac{sin(x)}{(1+20x)^2}$. Interpolate this function over $x \in [−1, 1]$ using

(a) \[2 points] polynomial interpolation of degree $n = 15$ at equally spaced points. Then evaluate this polynomial at $N = 100$ equally spaced points. Denote the interpolating polynomial by $p(x)$. Plot

- $f(x)$ and $p(x)$ versus $x$ at the interpolation points and at the $N$ points (on the same plot);
- $|f(x) − p(x)|$ versus $x$ at the $N$ points.

You can use the `polyfit` function. See the `linspace` function.

(b) \[2 points] Repeat (a) but now using Chebyshev points.

(c) \[2 points] Repeat (a) but now using spline interpolation at $n + 1$ equally spaced points. See the `spline` function.

(d) \[1 point] Discuss the accuracies of your results.

Submit your plots (6 in total) and the Matlab code producing them. Name your Matlab file `interp_problem.m`.

_Answer_

The $f(x)$ implementation in matlab is as follows:

```matlab
f = @(x) sin(x)./((1 + 20*x).^2);
```

a. The following is a snippet of `interp_problem.m` for polynomial interpolation of degree $n=15$

```matlab
% (a) Polynomial interpolation of degree n = 15 at equally spaced points
% Define the number of interpolation points and the degree of the polynomial
n = 15;
N = 100;

% Generate n+1 equally spaced points in the interval [-1, 1]
x = linspace(-1, 1, n+1);
y = f(x);

% Interpolate using polyfit
p_coeff = polyfit(x, y, n);

% Evaluate the interpolating polynomial at N equally spaced points
x_N = linspace(-1, 1, N);
p_N = polyval(p_coeff, x_N);

% Plot f(x) and p(x) on the same graph
figure;
plot(x_N, f(x_N), 'b-', x_N, p_N, 'r--', x, y, 'go');
legend('f(x)', 'p(x)', 'Interpolation Points');
title('f(x) and p(x) vs. x');
xlabel('x');
ylabel('y');

% Plot the absolute error |f(x) - p(x)| at the N points
figure;
plot(x_N, abs(f(x_N) - p_N), 'm-');
title('Absolute Error |f(x) - p(x)| vs. x');
xlabel('x');
ylabel('Error');
```

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a2-fig1.webp)

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a2-fig2.webp)

b.
The following is a snippet of `interp_problem.m` for Chebyshev points

```matlab
% (b) Polynomial interpolation using Chebyshev points
% Generate Chebyshev points in the interval [-1, 1]:
% x_i = cos((2i - 1)*pi / (2*(n+1))), i = 1, ..., n+1
x_cheb = cos((2*(1:n+1)-1)*pi/(2*(n+1)));
y_cheb = f(x_cheb);

% Interpolate using polyfit
p_cheb_coeff = polyfit(x_cheb, y_cheb, n);

% Evaluate the interpolating polynomial at N equally spaced points
p_cheb_N = polyval(p_cheb_coeff, x_N);

% Plot f(x) and p(x) using Chebyshev points on the same graph
figure;
plot(x_N, f(x_N), 'b-', x_N, p_cheb_N, 'r--', x_cheb, y_cheb, 'go');
legend('f(x)', 'p(x) with Chebyshev', 'Interpolation Points');
title('f(x) and p(x) with Chebyshev vs. x');
xlabel('x');
ylabel('y');

% Plot the absolute error |f(x) - p(x)| using Chebyshev points at the N points
figure;
plot(x_N, abs(f(x_N) - p_cheb_N), 'm-');
title('Absolute Error |f(x) - p(x) with Chebyshev| vs. x');
xlabel('x');
ylabel('Error');
```

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a2-fig3.webp)

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a2-fig4.webp)

c. The following is a snippet of `interp_problem.m` for spline interpolation at $n + 1$ equally spaced points.

```matlab
% (c) Spline interpolation at n+1 equally spaced points
% Evaluate the function at n+1 equally spaced points
y_spline = f(x);

% Use the spline function to get the piecewise polynomial representation
pp = spline(x, y_spline);

% Evaluate the spline at N equally spaced points
spline_N = ppval(pp, x_N);

% Plot f(x) and the spline on the same graph
figure;
plot(x_N, f(x_N), 'b-', x_N, spline_N, 'r--', x, y_spline, 'go');
legend('f(x)', 'spline(x)', 'Interpolation Points');
title('f(x) and spline(x) vs. x');
xlabel('x');
ylabel('y');

% Plot the absolute error |f(x) - spline(x)| at the N points
figure;
plot(x_N, abs(f(x_N) - spline_N), 'm-');
title('Absolute Error |f(x) - spline(x)| vs. x');
xlabel('x');
ylabel('Error');
```

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a2-fig5.webp)

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a2-fig6.webp)

d. Discussion

1. Polynomial interpolation at equally spaced points _shows oscillations_ near the endpoints due to _Runge's phenomenon_ (oscillations near the endpoints of the interpolated interval become pronounced). This oscillation is visible in the error plot.
2. Polynomial interpolation at Chebyshev points mitigates these oscillations, since the nodes cluster near the endpoints.
3. Spline interpolation provides a piecewise polynomial that fits the function smoothly and can offer better accuracy than a single global polynomial.

---

**Problem 8 \[4 points]** Given the three data points $(−1, 1), (0, 0), (1, 1)$, determine the interpolating polynomial of degree two using:

a. \[1 point] monomial basis

b. \[1 point] Lagrange basis

c. \[1 point] Newton basis

\[1 point] Show that the three [representations](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2/../../../../../../../../thoughts/representations) give the same polynomial.

_Answer_:

a.
Monomial basis

The monomial basis for a polynomial of degree two is given by: $p(x)=a_0+a_1x+a_2x^2$

Evaluating at the data points gives the linear system

$a_0-a_1+a_2=1$

$a_0=0$

$a_0+a_1+a_2=1$

Solving this system gives $a_0=0$, $a_1=0$, $a_2=1$

> [!note] NOTE
>
> Thus the _monomial basis_ representation of this polynomial of degree two is $p(x) = x^2$

b. Lagrange basis

The Lagrange basis for a polynomial of degree two is given by:

$p(x)=\sum_{j=0}^{2}{y_j}{L_j(x)} = f(x_0)L_0{(x)} + f(x_1)L_1{(x)} + f(x_2)L_2{(x)}$

where

$L_0(x) = \frac{(x-x_1)(x-x_2)}{(x_0-x_1)(x_0-x_2)} = \frac{x(x-1)}{2}$

$L_1(x) = \frac{(x-x_0)(x-x_2)}{(x_1-x_0)(x_1-x_2)}=-(x+1)(x-1)$

$L_2(x) = \frac{(x-x_0)(x-x_1)}{(x_2-x_0)(x_2-x_1)}=\frac{x(x+1)}{2}$

Thus $p(x) = 1*\frac{x(x-1)}{2} + 0*(-(x+1)(x-1)) + 1*\frac{x(x+1)}{2} = x^2$

> [!note] NOTE
>
> Thus the _Lagrange basis_ representation of this polynomial of degree two is $p(x) = x^2$

c. Newton basis

The Newton basis for a polynomial of degree two is given by:

$p(x)=f(x_0)+(x-x_0)f[x_0, x_1] + (x-x_0)(x-x_1)f[x_0, x_1, x_2]$

where

$f[x_0,x_1]=\frac{f(x_1)-f(x_0)}{x_1-x_0} = \frac{0-1}{0+1} = -1$

$f[x_0,x_1,x_2]=\frac{f[x_1, x_2]-f[x_0, x_1]}{x_2-x_0} = \frac{1+1}{1+1} = 1$

We have $f[x_1,x_2]=\frac{f(x_2)-f(x_1)}{x_2-x_1} = \frac{1-0}{1-0} = 1$

Thus $p(x)=1+(x+1)(−1)+(x+1)(x)(1) =1 - x-1 + (x^2+x)=x^2$

> [!note] NOTE
>
> Thus the _Newton basis_ representation of this polynomial of degree two is $p(x) = x^2$

Therefore all three bases yield the same polynomial of degree two.

---
slug: thoughts/university/twenty-three-twenty-four/compsci-4x03/A3
tags:
  - swfr4x03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A3"
title: Least squares, Trapezoidal and Simpson's rules
date: 2023-11-30
---

**Problem 1**

a.

```matlab
function [q, nfun] = adsimpson(f, a, b, tol)
    persistent recursion_depth nfun_internal;
    if isempty(recursion_depth)
        recursion_depth = 0;
    end
    if isempty(nfun_internal)
        nfun_internal = 0;
    end

    recursion_depth = recursion_depth + 1;
    nfun_internal = nfun_internal + 1; % Increment function evaluations

    if recursion_depth > 1000 % Check recursion depth
        error('Maximum recursion depth exceeded.');
    end

    c = (a + b)/2;
    h = b - a;
    fa = f(a);
    fb = f(b);
    fc = f(c);
    S = (h/6) * (fa + 4*fc + fb);
    d = (a + c)/2;
    e = (c + b)/2;
    fd = f(d);
    fe = f(e);
    Sleft = (h/12) * (fa + 4*fd + fc);
    Sright = (h/12) * (fc + 4*fe + fb);
    S2 = Sleft + Sright;

    if abs(S2 - S) < 15*tol
        q = S2 + (S2 - S)/15;
    else
        mid = (a + b)/2;
        [q_left, nfun_left] = adsimpson(f, a, mid, tol/2);
        [q_right, nfun_right] = adsimpson(f, mid, b, tol/2);
        q = q_left + q_right;
        nfun_internal = nfun_internal + nfun_left + nfun_right;
    end

    if nargout > 1
        nfun = nfun_internal;
    end

    recursion_depth = recursion_depth - 1;
    if recursion_depth == 0
        nfun_internal = 0; % Reset on the last exit
    end
end
```
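The acceptance test `abs(S2 - S) < 15*tol` and the returned value `q = S2 + (S2 - S)/15` come from Richardson extrapolation. Simpson's rule has error $O(h^4)$, so halving the interval width divides the error by roughly 16; writing $I$ for the true integral,

$I - S \approx 16(I - S_2) \rightarrow I - S_2 \approx \frac{S_2 - S}{15}$

so $\frac{S_2 - S}{15}$ estimates the error of $S_2$, and adding it back to $S_2$ gains one extra order of accuracy.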
b.

```matlab
function q = dsimpson(f, a, b, c, d, tol)
    function qx = integrand_x(y)
        [qx, ~] = adsimpson(@(x) f(x, y), a, b, tol);
    end
    [q, ~] = adsimpson(@(y) integrand_x(y), c, d, tol);
end
```

The outputs are as follows

```prolog
dsimpson 2.9491801536006179e-01
integral2 2.9491801499984915e-01
|dsimpson-integral2| =3.60e-10
```

---

**Problem 2**

```matlab title="pendulum.m"
function pendulum
    % Define the range for x values
    x_values = linspace(-0.99, 0.99, 200); % Adjust the number of points for smoothness
    K_values = zeros(size(x_values));
    evals = zeros(size(x_values));
    tol = 1e-10;

    % Define the integrand for the elliptic integral of the first kind
    for i = 1:length(x_values)
        x = x_values(i);
        integrand = @(theta) 1 ./ sqrt(1 - x^2 .* sin(theta).^2);
        % Use adsimpson to integrate and capture the number of function evaluations
        [K_values(i), evals(i)] = adsimpson(integrand, 0, pi/2, tol);
    end

    % Plot K(x) versus x
    figure;
    plot(x_values, K_values);
    title('Complete Elliptic Integral of the First Kind K(x) versus x');
    xlabel('x');
    ylabel('K(x)');

    % Plot the number of function evaluations versus x
    figure;
    plot(x_values, evals);
    title('Number of Function Evaluations versus x');
    xlabel('x');
    ylabel('Number of Function Evaluations');
end
```

The following graphs are then produced

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A3/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a3-p2-f1.svg)

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A3/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a3-p2-f2.svg)

_Explanation_

The graph shows an extreme spike in the number of function evaluations at both ends of the $x$ range, close to $\pm1$. This is consistent with the expectation that as $x$ approaches $\pm1$, the integrand of the complete elliptic integral of the first kind, $\frac{d\theta}{\sqrt{1 - x^2 \sin^2 \theta}}$, approaches a singularity for some $\theta$ within the interval $[0, \pi/2]$.

When $x$ is near $\pm1$, the term $x^2 \sin^2 \theta$ can approach $1$, causing the denominator to approach zero and the integrand to become very large or approach infinity, especially as $\theta$ approaches $\pi/2$.

The adaptive Simpson’s method tries to maintain the specified tolerance by increasing the number of intervals (and thus function evaluations) where the integrand varies rapidly or becomes difficult to approximate due to singular behavior. Near these singularities, even small intervals can have large differences in the integrand values, leading the adaptive algorithm to recursively subdivide the intervals, resulting in a substantial increase in function evaluations.

The sharp increase in function evaluations at the edges of the graph indicates that the algorithm is working as expected, refining the integration intervals to handle the challenging behavior of the integrand near the points where it is not well-behaved. The function evaluations become extremely high as the integrand requires very fine subdivisions to approximate the integral within the specified tolerance near the singular points.
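As a sanity check on the adaptive integrator (assuming `adsimpson` from Problem 1 is on the MATLAB path), the built-in `ellipke` evaluates the same integral; note that `ellipke` takes the parameter $m = x^2$, not the modulus $x$:

```matlab
% Compare adsimpson against MATLAB's ellipke at a benign point x = 0.5.
x = 0.5;
integrand = @(theta) 1 ./ sqrt(1 - x^2 .* sin(theta).^2);
K_adsimpson = adsimpson(integrand, 0, pi/2, 1e-10);
K_builtin = ellipke(x^2); % ellipke expects the parameter m = x^2
fprintf('adsimpson: %.12f  ellipke: %.12f  diff: %.2e\n', ...
        K_adsimpson, K_builtin, abs(K_adsimpson - K_builtin));
```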
---

**Problem 3**

### Composite Trapezoidal rule

```matlab title="trapezoid.m"
function I = trapezoid(f, a, b, n)
    % Composite Trapezoidal Rule
    x = linspace(a, b, n+1); % Generate n+1 points from a to b
    y = f(x);
    dx = (b - a)/n;
    I = (dx/2) * (y(1) + 2*sum(y(2:end-1)) + y(end));
end
```

### Composite Simpson’s rule

```matlab title="simpson.m"
function I = simpson(f, a, b, n)
    % Composite Simpson's Rule
    % Ensure n is even
    if mod(n, 2) == 1
        warning('Simpson’s rule requires an even number of intervals.');
        n = n + 1;
    end
    x = linspace(a, b, n+1); % Generate n+1 points from a to b
    y = f(x);
    dx = (b - a)/n;
    I = (dx/3) * (y(1) + 4*sum(y(2:2:end-1)) + 2*sum(y(3:2:end-2)) + y(end));
end
```

a. Given $\int_{0}^{\frac{\pi}{2}}e^xcos(x)dx$ with absolute error of at most $tol=10^{-4}$

#### Trapezoidal

The error bound is given by $E_t\leq \frac{(b-a)^3}{12n^2}max_{a\leq x\leq b}|f^{''}(x)|$, where $f(x)=e^xcos(x)$

Differentiating, $f^{'}(x)=e^x(cos(x)-sin(x))$ and $f^{''}(x)=-2e^xsin(x)$

Since $2e^xsin(x)$ is increasing on $[0, \frac{\pi}{2}]$, $|f^{''}(x)|$ is maximised at $x=\frac{\pi}{2}$:

$max|f^{''}(x)| = 2e^{\frac{\pi}{2}} \approx 9.6210$

Then, we need to solve $\frac{(\frac{\pi}{2})^3}{12n^2}\cdot 2e^{\frac{\pi}{2}} \leq 10^{-4}$, which gives $n \geq 177$ to satisfy the `tol`

#### Simpson’s

The error bound is given by $E_s \leq \frac{(b-a)^5}{180n^4}max_{a\leq x\leq b}|f^{(4)}(x)|$

Differentiating twice more, $f^{(4)}(x)=-4e^xcos(x)$, whose magnitude on $[0, \frac{\pi}{2}]$ is maximised at $x=\frac{\pi}{4}$:

$max|f^{(4)}(x)| = 4e^{\frac{\pi}{4}}cos(\frac{\pi}{4}) = 2\sqrt{2}e^{\frac{\pi}{4}} \approx 6.2035$

Then, we need to solve $\frac{(\frac{\pi}{2})^5}{180n^4}\cdot 2\sqrt{2}e^{\frac{\pi}{4}} \leq 10^{-4}$, which yields $n \geq 8$ ($n$ even)

b.

#### Trapezoidal

Using the following

```matlab
f = @(x) exp(x) .* cos(x);
a = 0;
b = pi/2;
tol = 1e-4;

% Compute the exact integral value
exact_integral = integral(f, a, b);

% Initialize n and the approximate integral
n = 1;
approx_integral = 0;

while true
    n = n + 1; % Increment n
    % Compute the trapezoidal approximation
    approx_integral = trapezoid(f, a, b, n);
    % Calculate the absolute error
    error = abs(exact_integral - approx_integral);
    % Check if the error is within the tolerance
    if error <= tol
        break;
    end
end

% Display the smallest n that meets the tolerance requirement
disp(n);
```

yields a smallest $n = 110$

#### Simpson’s

Using the following

```matlab
f = @(x) exp(x) .* cos(x);
a = 0;
b = pi/2;
tol = 1e-4;

% Compute the exact integral value
exact_integral = integral(f, a, b);

% Initialize n (must be even for Simpson's rule) and the approximate integral
n = 2; % Start with the smallest even number
approx_integral = 0;

while true
    % Compute the Simpson's approximation
    approx_integral = simpson(f, a, b, n);
    % Calculate the absolute error
    error = abs(exact_integral - approx_integral);
    % Check if the error is within the tolerance
    if error <= tol
        break;
    end
    n = n + 2; % Increment n by 2 to ensure it's even
end

% Display the smallest n that meets the tolerance requirement
disp(['The smallest n for Simpson''s rule is ', num2str(n)]);
```

yields a smallest $n = 8$

Both empirical values are consistent with the theoretical bounds from part (a): $110 \leq 177$ and $8 \leq 8$.

c.
#### Trapezoidal

The following

```matlab
f = @(x) exp(x) .* cos(x);
a = 0;
b = pi/2;
n_values = 2:200; % n can be any integer for the trapezoidal rule
tol = 1e-4;

exact_integral = integral(f, a, b);

% Initialize arrays to store the actual errors and theoretical error bounds
actual_errors_trap = zeros(size(n_values));
bounds_trap = zeros(size(n_values));

% Second derivative for the trapezoidal rule error bound
f_second = @(x) -2 * exp(x) .* sin(x); % f''(x) = -2 e^x sin(x)
max_f_second = max(abs(f_second(linspace(a, b, 1000)))); % Max over [a, b]

% Calculate errors and bounds for each n
for i = 1:length(n_values)
    n = n_values(i);

    % Trapezoidal rule calculations
    approx_integral_trap = trapezoid(f, a, b, n);
    actual_errors_trap(i) = abs(exact_integral - approx_integral_trap);
    bounds_trap(i) = ((b - a)^3 / (12 * n^2)) * max_f_second;
end

% Plot the error bounds and actual errors on a loglog plot
figure;
loglog(n_values, bounds_trap, 'r-', n_values, actual_errors_trap, 'b--');
legend('Trapezoid Bound', 'Trapezoid Actual');
title('Error Bounds and Actual Errors for Trapezoidal Rule');
xlabel('n (number of subintervals)');
ylabel('Error');
```

yields

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A3/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a3-p3-c-trapezoidal.webp)

#### Simpson’s

The following:

```matlab title="errors.m"
f = @(x) exp(x) .* cos(x);
a = 0;
b = pi/2;
n_values = 2:2:200; % Simpson's rule requires an even number of intervals
tol = 1e-4;

exact_integral = integral(f, a, b);

% Initialize arrays to store the actual errors and theoretical error bounds
actual_errors_simp = zeros(size(n_values));
bounds_simp = zeros(size(n_values));

% Fourth derivative for Simpson's rule error bound
f_fourth = @(x) -4 * exp(x) .* cos(x); % f''''(x) = -4 e^x cos(x)
max_f_4th = max(abs(f_fourth(linspace(a, b, 1000))));

% Calculate errors and bounds for each n
for i = 1:length(n_values)
    n = n_values(i);

    % Simpson's rule calculations
    approx_integral_simp = simpson(f, a, b, n);
    actual_errors_simp(i) = abs(exact_integral - approx_integral_simp);
    bounds_simp(i) = ((b - a)^5 / (180 * n^4)) * max_f_4th;
end

% Plot the error bounds and actual errors on a loglog plot
figure;
loglog(n_values, bounds_simp, 'r-', n_values, actual_errors_simp, 'b--');
legend('Simpson Bound', 'Simpson Actual');
title('Error Bounds and Actual Errors for Simpson''s Rule');
xlabel('n (number of subintervals)');
ylabel('Error');
```

yields

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A3/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a3-p3-c-simpsons.webp)

d.

#### Trapezoidal

The theoretical error bound is proportional to $\frac{1}{n^2}$, so on the `loglog` plot it appears as a straight line with slope $-2$: the bound decreases with the square of $n$.

The actual error also diminishes as $n$ grows. Like the bound, it is expected to decrease with increasing $n$ (possibly faster or slower), so it too appears as a roughly straight line on the `loglog` plot.

#### Simpson’s

The theoretical error bound is proportional to $\frac{1}{n^4}$, so on the `loglog` plot it appears as a straight line with slope $-4$.
The actual error observed when using Simpson’s rule also shows a rapid decrease with increasing $n$. The actual error may decrease faster than the error bound predicts because the bound is a worst-case estimate; the true error is often smaller, especially for well-behaved functions.

The difference in slopes between the actual error curve and the theoretical error bound curve is expected: the theoretical curve represents the maximum possible error, not the exact error, which can be much smaller depending on how the function behaves within each subinterval.

The actual error may flatten as $n$ increases past a certain point. This is due to the limitations of numerical precision in Matlab.

---

**Problem 4**

```matlab title="timeadd.m"
function timeadd
    % Define the sizes of the matrices
    sizes = 500:100:1500;
    times_addR = zeros(length(sizes), 1);
    times_addC = zeros(length(sizes), 1);

    % Time the functions and record the execution times
    for i = 1:length(sizes)
        n = sizes(i);
        A = rand(n, n);
        B = rand(n, n);

        f_addR = @() addR(A, B);
        f_addC = @() addC(A, B);

        times_addR(i) = timeit(f_addR);
        times_addC(i) = timeit(f_addC);
    end

    % Perform least squares fitting to the model t = c + k*n^2
    X = [ones(length(sizes), 1), sizes'.^2];
    crow_krow = X \ times_addR;
    ccol_kcol = X \ times_addC;

    % Output the constants
    fprintf('crow: %e\n', crow_krow(1));
    fprintf('krow: %e\n', crow_krow(2));
    fprintf('ccol: %e\n', ccol_kcol(1));
    fprintf('kcol: %e\n', ccol_kcol(2));

    % Plot the results
    figure;
    loglog(sizes, times_addR, 'o-', 'DisplayName', 'addR');
    hold on;
    loglog(sizes, times_addC, 'o-', 'DisplayName', 'addC');
    xlabel('Matrix Size (n)');
    ylabel('Time (seconds)');
    title('Time Complexity of Matrix Addition');
    legend show;
    grid on;
end

function C = addR(A, B)
    [n, ~] = size(A);
    C = zeros(n, n);
    for i = 1:n
        C(i, :) = A(i, :) + B(i, :);
    end
end

function C = addC(A, B)
    [n, ~] = size(A);
    C = zeros(n, n);
    for j = 1:n
        C(:, j) = A(:, j) + B(:, j);
    end
end
```

Yields

```matlab
crow: -7.047139e-03
krow: 2.787915e-08
ccol: -4.545719e-04
kcol: 1.913233e-09
```

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A3/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a3-p4-timeadd.webp)

Reasons why $k_{row}$ is an order of magnitude larger than $k_{col}$:

1. Overhead of function calls: the measurement includes noise from system load and other processes.
2. `addR` memory access: `addR` is not optimal given MATLAB’s column-major order. Accessing elements row-wise can lead to cache misses and inefficient usage of memory bandwidth.
3. Added overheads, possibly associated with MATLAB’s JIT compilation and memory management.
4. Least-squares fitting: the model $t=c+kn^2$ is fit to noisy timings; if the noise grows with $n$, the quadratic coefficient can be overestimated.

---

**Problem 5**

$y=ae^{x} + bx^3$

For each datapoint $(x_i, y_i)$, compute the residual as $r_i=ae^{x_i}+bx_{i}^{3} - y_i$

Sum of squared residuals $S=\sum_{i=1}^{n}{r_i^{2}}$

In this case, with the data points $(-1,0)$, $(0,1)$, $(1,2)$,

$S=(ae^{-1}-b-0)^2+(a-1)^2 + (ae+b-2)^2$

is minimized when $\frac{\partial S}{\partial a}=0$ and $\frac{\partial S}{\partial b}=0$, which gives

$2(ae^{-1}-b)(e^{-1}) + 2(a-1) + 2(ae+b-2)e = 0$

and

$-2(ae^{-1} -b) + 2(ae+b-2)=0$

Solving,

$a=\frac{2e+2e^2+2e^3}{1+4e^2+e^4}$ and $b=\frac{-e^3+2+e+4e^2}{1+4e^2+e^4}$
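Assuming the data points implied by the residuals above, $(-1,0)$, $(0,1)$, $(1,2)$, the closed-form coefficients can be checked against MATLAB's backslash on the corresponding $3\times 2$ design matrix (a minimal sketch, not part of the submission):

```matlab
% Closed-form least-squares coefficients
e = exp(1);
a = (2*e + 2*e^2 + 2*e^3) / (1 + 4*e^2 + e^4);
b = (2 + e + 4*e^2 - e^3) / (1 + 4*e^2 + e^4);

% Design matrix rows are [e^x, x^3] at x = -1, 0, 1; y holds the data values
X = [exp(-1), -1; 1, 0; e, 1];
y = [0; 1; 2];
ab = X \ y; % linear least squares via backslash

fprintf('closed form: a=%.6f b=%.6f;  backslash: a=%.6f b=%.6f\n', ...
        a, b, ab(1), ab(2));
```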
---

**Problem 6**

a.

$r_i =k(l_i-l_0) -F(l_i)$

$\phi(k)=\sum_{i=1}^{n}[k(l_i-l_0) - F(l_i)]^2$

$\frac{\partial \phi}{\partial k}=\sum_{i=1}^{n}2[k(l_i-l_0) - F(l_i)](l_i-l_0)=0$

Or $k\sum_{i=1}^{n}(l_i-l_0)^2=\sum_{i=1}^{n}F(l_i)(l_i-l_0) \rightarrow k=\frac{\sum_{i=1}^{n}F(l_i)(l_i-l_0)}{\sum_{i=1}^{n}(l_i-l_0)^2}$

$k \approx 0.8996$ N/m

```python
# Given data
l_values = [7, 9.4, 12.3]  # l values
F_values = [2, 4, 6]  # F(l) values
l0 = 5.3  # Unstretched length of the spring

# Calculate the numerator and denominator for the k value
numerator = sum([F * (l - l0) for F, l in zip(F_values, l_values)])
denominator = sum([(l - l0)**2 for l in l_values])

# Calculate k
k = numerator / denominator
```

b. Using the same logic with the additional data, we get $k\approx 0.9052$ N/m

```python
# Additional measurements for part B
additional_l_values = [8.3, 11.3, 14.4, 15.9]  # Additional l values
additional_F_values = [3, 5, 8, 10]  # Additional F(l) values

# Combine old and new data points
all_l_values = l_values + additional_l_values
all_F_values = F_values + additional_F_values

# Calculate the numerator and denominator for the new k value
numerator_all = sum([F * (l - l0) for F, l in zip(all_F_values, all_l_values)])
denominator_all = sum([(l - l0)**2 for l in all_l_values])

# Calculate the new k using all data
k_all = numerator_all / denominator_all
```

To determine which constant `k` best fits the combined dataset, we calculate the sum of squares of residuals `SSR` over the entire dataset

```python
# Calculate the sum of squares of residuals for the original k and the new k
def sum_of_squares(k, l_values, F_values, l0):
    return sum([(k * (l - l0) - F)**2 for l, F in zip(l_values, F_values)])

# Sum of squares of residuals using k from part A for the whole data
SSR_k = sum_of_squares(k, all_l_values, all_F_values, l0)

# Sum of squares of residuals using k from part B for the whole data
SSR_k_all = sum_of_squares(k_all, all_l_values, all_F_values, l0)

SSR_k, SSR_k_all
```

This yields an SSR of approximately 0.9062 for the part (a) constant and approximately 0.8962 for the part (b) constant. Lower is better here, so the part (b) estimate fits the entire dataset better than the part (a) estimate.

---
slug: thoughts/university/twenty-three-twenty-four/compsci-4x03/A4
tags:
  - swfr4x03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4"
title: SGD, ODEs
date: 2023-11-30
---

## P1

You are given the file [points.mat](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/points.mat) with training data. There are three kinds of points. The goal is to train with these data and classify the points in $[0, 1] × [0, 1]$. Modify the file [netbpfull.m](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/compsci-4x03/netbpfull.m) such that it works with the three categories of points.

• Load the data with `load points.mat`. This will result in an array $x$ containing points in 2D and an array `labels` containing labels.

• Modify the function cost such that it returns $\text{accuracy}=\frac{\text{number of points classified correctly}}{\text{total number of points}}*100$ and also returns the indices (in `x`) of training points that are not classified correctly.
• [netbpfull.m](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/compsci-4x03/netbpfull.m) should plot - accuracy versus number of iterations - cost versus number of iterations and - two plots like in ![this plot](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/Figure1.webp) - The training should stop if accuracy of 95% is reached; otherwise it should continue to `Niter=1e6`. For full marks, you need to achieve 95%. For pretty code, see [net.py](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/compsci-4x03/net.py). _Solution_ The following contains the diff of the original `netbpfull.m` and the modified version ```diff diff --git a/content/thoughts/university/compsci-4x03/netbpfull.m b/content/thoughts/university/compsci-4x03/netbpfull.m index 5a1b6e13..df3714f9 100644 --- a/content/thoughts/university/compsci-4x03/netbpfull.m +++ b/content/thoughts/university/compsci-4x03/netbpfull.m @@ -1,134 +1,225 @@ function netbp_full %NETBP_FULL % Extended version of netbp, with more graphics -% -% Set up data for neural net test -% Use backpropagation to train -% Visualize results -% -% C F Higham and D J Higham, Aug 2017 -% -% -% xcoords, ycoords, targets -x1 = [0.1,0.3,0.1,0.6,0.4,0.6,0.5,0.9,0.4,0.7]; -x2 = [0.1,0.4,0.5,0.9,0.2,0.3,0.6,0.2,0.4,0.6]; -y = [ones(1,5) zeros(1,5); zeros(1,5) ones(1,5)]; - -figure(1) -clf -a1 = subplot(1,1,1); -plot(x1(1:5),x2(1:5),'ro','MarkerSize',12,'LineWidth',4) -hold on -plot(x1(6:10),x2(6:10),'bx','MarkerSize',12,'LineWidth',4) -a1.XTick = [0 1]; -a1.YTick = [0 1]; -a1.FontWeight = 'Bold'; -a1.FontSize = 16; -xlim([0,1]) -ylim([0,1]) - -%print -dpng pic_xy.webp - - -% Initialize weights and biases + +% Load the data +load points.mat x; % This loads 'x' which contains points and 'labels' +load points.mat labels; % This loads 'x' which contains points and 'labels' + +x_mean = mean(x, 2); +x_std = std(x, 0, 2); +x = (x - x_mean) ./ x_std; % Normalize the data + +% Initialize weights and biases for a network with three outputs rng(5000); -W2 = 0.5*randn(2,2); -W3 = 0.5*randn(3,2); -W4 = 0.5*randn(2,3); -b2 = 0.5*randn(2,1); -b3 = 0.5*randn(3,1); -b4 = 0.5*randn(2,1); - - - -% Forward and Back propagate -% Pick a training point at random -eta = 0.05; +num_hidden_1 = 20; % Increased the number of neurons +num_hidden_2 = 20; +W2 = randn(num_hidden_1, 2) * 0.01; +W3 = randn(num_hidden_2, num_hidden_1) * 0.01; +W4 = randn(size(labels, 1), num_hidden_2) * 0.01; +b2 = zeros(num_hidden_1, 1); +b3 = zeros(num_hidden_2, 1); +b4 = zeros(size(labels, 1), 1); + +% Training parameters +eta = 0.001; % Adjusted learning rate +alpha = 0.89; % Momentum term +alpha_leak = 0.01; % Define this once at the beginning of your script +lambda = 0.001; % L2 Regularization strength Niter = 1e6; -savecost = zeros(Niter,1); +batch_size = 16; % Adjusted batch size for batch training +% Learning rate decay +decay_rate = 0.99; +decay_step = 10000; % Apply decay every 10000 iterations + +% buffers +savecost = zeros(Niter, 1); +saveaccuracy = zeros(Niter, 1); +savemisclassified = cell(Niter, 1); + +% Momentum variables +mW2 = zeros(size(W2)); +mW3 = zeros(size(W3)); +mW4 = zeros(size(W4)); +mb2 = zeros(size(b2)); +mb3 = zeros(size(b3)); +mb4 = zeros(size(b4)); + +% Training loop with batch training for counter = 1:Niter - k = randi(10); - x = [x1(k); x2(k)]; - % Forward pass - a2 = activate(x,W2,b2); - a3 = activate(a2,W3,b3); - a4 
= activate(a3,W4,b4); - % Backward pass - delta4 = a4.*(1-a4).*(a4-y(:,k)); - delta3 = a3.*(1-a3).*(W4'*delta4); - delta2 = a2.*(1-a2).*(W3'*delta3); - % Gradient step - W2 = W2 - eta*delta2*x'; - W3 = W3 - eta*delta3*a2'; - W4 = W4 - eta*delta4*a3'; - b2 = b2 - eta*delta2; - b3 = b3 - eta*delta3; - b4 = b4 - eta*delta4; - % Monitor progress - newcost = cost(W2,W3,W4,b2,b3,b4); % display cost to screen - newcost = cost(W2,W3,W4,b2,b3,b4); % display cost to screen - fprintf("iter=% 5d cost=%e\n", counter, newcost) + % Select a batch of points + batch_indices = randperm(size(x, 2), batch_size); + x_batch = x(:, batch_indices); + labels_batch = labels(:, batch_indices); + + % Initialize gradients for the batch + gradW2 = zeros(size(W2)); + gradW3 = zeros(size(W3)); + gradW4 = zeros(size(W4)); + gradb2 = zeros(size(b2)); + gradb3 = zeros(size(b3)); + gradb4 = zeros(size(b4)); + + % Loop over all examples in the batch + for k = 1:batch_size + xk = x_batch(:, k); + labelk = labels_batch(:, k); + + % Forward pass + a2 = actfn(xk, W2, b2, 'leaky_relu'); + a3 = actfn(a2, W3, b3, 'leaky_relu'); + a4 = actfn(a3, W4, b4, 'sigmoid'); + + % Backward pass + delta4 = (a4 - labelk) .* a4 .* (1 - a4); + delta3 = (W4' * delta4) .* (a3 > 0 + alpha_leak * (a3 <= 0)); % Leaky ReLU derivative + delta2 = (W3' * delta3) .* (a2 > 0 + alpha_leak * (a2 <= 0)); % Leaky ReLU derivative + + % Accumulate gradients over the batch + gradW4 = gradW4 + delta4 * a3'; + gradW3 = gradW3 + delta3 * a2'; + gradW2 = gradW2 + delta2 * xk'; + gradb4 = gradb4 + delta4; + gradb3 = gradb3 + delta3; + gradb2 = gradb2 + delta2; + end + + % Average gradients over the batch + gradW4 = gradW4 + (lambda / batch_size) * W4; + gradW3 = gradW3 + (lambda / batch_size) * W3; + gradW2 = gradW2 + (lambda / batch_size) * W2; + gradb4 = gradb4 / batch_size; + gradb3 = gradb3 / batch_size; + gradb2 = gradb2 / batch_size; + + % Update weights with gradients + mW4 = alpha * mW4 - eta * gradW4; + mW3 = alpha * mW3 - eta * gradW3; + mW2 = alpha * mW2 - eta * gradW2; + mb4 = alpha * mb4 - eta * gradb4; + mb3 = alpha * mb3 - eta * gradb3; + mb2 = alpha * mb2 - eta * gradb2; + + W4 = W4 + mW4; + W3 = W3 + mW3; + W2 = W2 + mW2; + b4 = b4 + mb4; + b3 = b3 + mb3; + b2 = b2 + mb2; + % Calculate cost and accuracy for the whole dataset + [newcost, accuracy, misclassified] = cost(W2, W3, W4, b2, b3, b4, x, labels); savecost(counter) = newcost; + saveaccuracy(counter) = accuracy; + savemisclassified{counter} = misclassified; + + % Apply decay to the learning rate + if mod(counter, decay_step) == 0 + eta = eta * decay_rate; + end + + % Early stopping if accuracy is above 95% + if accuracy >= 95 + fprintf('Achieved 95\n', counter, newcost, accuracy); + end end -figure(2) -clf -semilogy([1:1e4:Niter],savecost(1:1e4:Niter),'b-','LineWidth',2) -xlabel('Iteration Number') -ylabel('Value of cost function') -set(gca,'FontWeight','Bold','FontSize',18) -print -dpng pic_cost.webp - -%%% Display shaded and unshaded regions -N = 500; -Dx = 1/N; -Dy = 1/N; -xvals = [0:Dx:1]; -yvals = [0:Dy:1]; -for k1 = 1:N+1 - xk = xvals(k1); - for k2 = 1:N+1 - yk = yvals(k2); - xy = [xk;yk]; - a2 = activate(xy,W2,b2); - a3 = activate(a2,W3,b3); - a4 = activate(a3,W4,b4); - Aval(k2,k1) = a4(1); - Bval(k2,k1) = a4(2); - end +% After training loop: Plot accuracy vs. number of iterations +figure; +plot(saveaccuracy); +xlabel('Number of Iterations'); +ylabel('Accuracy (%)'); +title('Accuracy vs. Number of Iterations'); + +% Plot cost vs. 
number of iterations +figure; +plot(savecost); +xlabel('Number of Iterations'); +ylabel('Cost'); +title('Cost vs. Number of Iterations'); + +% Plot decision boundaries and points +% First, create a meshgrid to cover the input space +[xv, yv] = meshgrid(linspace(min(x(1,:)), max(x(1,:)), 100), linspace(min(x(2,:)), max(x(2,:)), 100)); +mesh_x = [xv(:)'; yv(:)']; +mesh_a2 = actfn(mesh_x, W2, b2, 'leaky_relu'); +mesh_a3 = actfn(mesh_a2, W3, b3, 'leaky_relu'); +mesh_a4 = actfn(mesh_a3, W4, b4, 'sigmoid'); +[~, mesh_classes] = max(mesh_a4); +mesh_classes = reshape(mesh_classes, size(xv)); + +% Find the misclassified points from the last iteration +misclassified_indices = savemisclassified{end}; +classified_correctly_indices = setdiff(1:size(x, 2), misclassified_indices); + +% First Plot: Decision boundaries and correctly classified points only +figure; +contourf(xv, yv, mesh_classes); +hold on; +gscatter(x(1,classified_correctly_indices), x(2,classified_correctly_indices), vec2ind(labels(:,classified_correctly_indices)), 'rgb', 'osd', 12, 'LineWidth', 4); +title('Decision Boundaries and Correctly Classified Points'); +xlabel('Feature 1'); +ylabel('Feature 2'); +legend('Class 1', 'Class 2', 'Class 3'); +hold off; + +% Second Plot: Decision boundaries and misclassified points only +figure; +contourf(xv, yv, mesh_classes); +hold on; +gscatter(x(1,misclassified_indices), x(2,misclassified_indices), vec2ind(labels(:,misclassified_indices)), 'rgb', 'osd', 12, 'LineWidth', 4); +title('Decision Boundaries and Misclassified Points Only'); +xlabel('Feature 1'); +ylabel('Feature 2'); +legend('Misclassified'); +hold off; + + +% Activation function with switch for ReLU +function z = actfn(x, W, b, activation_type) + if strcmp(activation_type, 'leaky_relu') + % Define the Leaky ReLU slope for negative inputs + alpha_leak = 0.01; + z = max(alpha_leak * (W * x + b), W * x + b); + elseif strcmp(activation_type, 'relu') + z = max(0, W * x + b); + else + z = 1 ./ (1 + exp(-W * x - b)); + end +end + +% Cost function with accuracy and misclassified indices calculation +function [costval, accuracy, misclassified] = cost(W2, W3, W4, b2, b3, b4, x, labels) + misclassified = []; + correct_count = 0; + costval = 0; % Initialize the cost value + + for i = 1:size(x, 2) + input = x(:, i); + target = labels(:, i); + a2 = actfn(input, W2, b2, 'leaky_relu'); + a3 = actfn(a2, W3, b3, 'leaky_relu'); + a4 = actfn(a3, W4, b4, 'sigmoid'); + + % Compute the cross-entropy loss + epsilon = 1e-12; % since it could happen log(0), so set a small epsilon + costval = costval - sum(target .* log(a4 + epsilon) + (1 - target) .* log(1 - a4 + epsilon)); + + [~, predicted_class] = max(a4); + actual_class = find(target == 1); + if predicted_class == actual_class + correct_count = correct_count + 1; + else + misclassified = [misclassified, i]; + end + end + costval = costval / size(x, 2); % Average the cost over all examples + accuracy = (correct_count / size(x, 2)) * 100; end -[X,Y] = meshgrid(xvals,yvals); - -figure(3) -clf -a2 = subplot(1,1,1); -Mval = Aval>Bval; -contourf(X,Y,Mval,[0.5 0.5]) -hold on -colormap([1 1 1; 0.8 0.8 0.8]) -plot(x1(1:5),x2(1:5),'ro','MarkerSize',12,'LineWidth',4) -plot(x1(6:10),x2(6:10),'bx','MarkerSize',12,'LineWidth',4) -a2.XTick = [0 1]; -a2.YTick = [0 1]; -a2.FontWeight = 'Bold'; -a2.FontSize = 16; -xlim([0,1]) -ylim([0,1]) - -print -dpng pic_bdy_bp.webp - - function costval = cost(W2,W3,W4,b2,b3,b4) - - costvec = zeros(10,1); - for i = 1:10 - x =[x1(i);x2(i)]; - a2 = activate(x,W2,b2); - a3 = 
activate(a2,W3,b3);
-         a4 = activate(a3,W4,b4);
-         costvec(i) = norm(y(:,i) - a4,2);
-     end
-     costval = norm(costvec,2)^2;
-   end % of nested function
 end
```

I have made the following changes:

- While loading `points.mat`, `x` is now normalized so that training converges faster
- The hidden layers have been increased to 20 neurons each
- `W4` and `b4` are now initialized with the number of classes in the dataset
- The weights are initialized at a smaller scale (multiplied by 0.01) to keep the initial distribution tight, reducing the risk of saturating neurons when a sigmoid activation function is used
- Hyperparameter tuning:
  - The initial learning rate has been reduced to `0.001` for more stable training
  - Momentum (`alpha`) is set to 0.89 and used in the weight updates. This helps the network escape shallow local minima so that a better minimum can be found
  - Added a batch size of `16` for mini-batch gradient descent, a balance between single-sample SGD and full-batch gradient descent
  - Added learning rate decay, which reduces the learning rate by a factor of `0.99` every `10000` iterations. This helps the network settle into a minimum
- Training process:
  - Introduced batch-aware training, which trains on a batch of points instead of a single point, helping the network converge faster
  - Updated the activation function to provide three options: `sigmoid`, `relu`, and `leaky_relu`. The latter two are used for the hidden layers, while the former is used for the output layer
  - Updated the backpropagation to compute Leaky ReLU derivatives for the hidden layers
  - Gradients are accumulated over each batch, and the weights are updated with L2 regularization
  - Finally, updated the cost function from norm-based error (MSE) to cross-entropy loss, which is more suitable for classification problems
- Activation function: the Leaky ReLU activation function is explicitly defined, which helps mitigate the “dying ReLU” problem where neurons can become inactive and only output zero

The following contains graphs of the training process:

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p1-acc.webp)

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p1-cost.webp)

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p1-correct.webp)

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p1-failed.webp)

## P2

Implement in Matlab the bisection and Newton’s method for finding roots of scalar equations.

Use your implementation of the bisection method to find a root of

a. $\frac{1}{x} - exp(2-\sqrt{x})$ in $[0.1, 1]$

b. $x*sin(x) − 1$ in $[0, 2]$.

Use your implementation of Newton’s method and Matlab’s `fsolve` to find a root of

a. $\frac{1}{x} - exp(2-\sqrt{x})$ with initial guess $x_0 = 1$

b. $x*sin(x) − 1$ with initial guess $x_0 = 2$

For the bisection method, use $\text{tol} = 10^{−10}$. For your Newton and `fsolve`, solve until $|f(x_n)| \leq 10^{−10}$. If you are obtaining different roots, explain the differences. Also, discuss the number of iterations.
_Solution_ ### Bisection method ```matlab title="bisection.m" function [root, fval, iter] = bisection(f, a, b, tol) if f(a) * f(b) >= 0 error('f(a) and f(b) must have opposite signs'); end iter = 0; while (b - a) / 2 > tol iter = iter + 1; c = (a + b) / 2; if f(c) == 0 break; end if f(a) * f(c) < 0 b = c; else a = c; end end root = (a + b) / 2; fval = f(root); end f1 = @(x) 1/x - exp(2 - sqrt(x)); a1 = 0.1; b1 = 1; tol = 1e-9; [root_bisect1, fval_bisect1, iter_bisect1] = bisection(f1, a1, b1, tol); f2 = @(x) x*sin(x) - 1; a2 = 0; b2 = 2; [root_bisect_2, fval_bisect_2, iter_bisect_2] = bisection(f2, a2, b2, tol); ``` ### Newton method ```matlab title="newton.m" function [root, fval, iter] = newton(f, df, x0, tol) maxIter = 1000; % Limit number of iterations to prevent infinite loop iter = 0; x = x0; fx = f(x); while abs(fx) > tol && iter < maxIter iter = iter + 1; x = x - fx / df(x); fx = f(x); end root = x; fval = fx; end df1 = @(x) -1/x^2 + (1/(2*sqrt(x))) * exp(2 - sqrt(x)); x1 = 1; [root_newton_1, fval_newton_1, iter_newton_1] = newton(f1, df1, x1, tol); df2 = @(x) sin(x) + x*cos(x); x2 = 2; [root_newton_2, fval_newton_2, iter_newton_2] = newton(f2, df2, x2, tol); ``` ### `fsolve` ```matlab title="fsolve.m" options = optimoptions('fsolve', 'Display', 'off', 'FunctionTolerance', 1e-9); [root_fsolve_1, fval_fsolve_1, exitflag_1, output_1] = fsolve(f1, x1, options); iter_fsolve_1 = output_1.iterations; [root_fsolve_2, fval_fsolve_2, exitflag_2, output_2] = fsolve(f2, x2, options); iter_fsolve_2 = output_2.iterations; ``` ### Table For $\frac{1}{x} - exp(2-\sqrt{x})$ | method | root $r$ | $f(r)$ | num. iterations | | --------- | -------- | ----------- | --------------- | | bisection | 0.2152 | 8.7809e-09 | 29 | | Newton | 28.6942 | -2.5223e-14 | 9 | | `fsolve` | 28.5131 | -3.7357e-04 | 12 | For $x*sin(x) − 1$ | method | root $r$ | $f(r)$ | num. iterations | | --------- | -------- | ----------- | --------------- | | bisection | 1.1142 | 4.3660e-10 | 30 | | Newton | -9.3172 | -2.4834e-11 | 5 | | `fsolve` | 1.1142 | -1.9488e-08 | 3 | ### Analysis For $\frac{1}{x} - exp(2-\sqrt{x})$ 1. Bisection method: The bisection method found a root in the interval $[0.1,1]$ as expected. This method guarantees convergence to a root when it exists within the interval and the function changes sign. However, it is generally slower, as indicated by the higher number of iterations. 2. Newton’s method: converged to a completely different root, which is outside the interval considered for the bisection method. This shows that Newton’s method is highly sensitive to the initial guess. It also converges faster (fewer iterations) but can lead to roots that are far from the initial guess if the function is complex or if the derivative does not behave well. 3. `fsolve`: Similar to Newton’s method, `fsolve` also found a root far from the interval used for the bisection method. Likely uses a variant of Newton’s method or a similar approach, which explains the similar behavior. For $x*sin(x) − 1$ 1. Bisection method: As with the first function, the bisection method finds a root within the specified interval. The method is reliable but slow, as seen from the number of iterations. 2. Newton’s method: converged to a negative root, which is quite far from the interval $[0,2]$. This indicates that for this particular function, the method diverged significantly from the initial guess due to the function’s complex behavior, especially when considering trigonometric functions combined with polynomial terms. 
**Discussion**:

- Root differences: the significant differences in roots, especially for Newton’s method and `fsolve`, highlight the sensitivity of these methods to initial guesses and to the nature of the function. For complex functions, especially those with multiple roots, the choice of initial guess can lead to convergence to entirely different roots.
- Number of iterations: Newton’s method and `fsolve` generally require fewer iterations than the bisection method, demonstrating their faster convergence rate. However, this comes at the cost of potentially finding different roots, as seen in the results.

## P3

The annuity equation is $A =\frac{P}{r}(1 − (1 + r)^{−n})$ where $A$ is borrowed amount, $P$ is the amount of each payment, $r$ is the interest rate per period, and there are $n$ equally spaced payments.

- Write Newton’s method for finding $r$.
- Implement the function `function r = interest(A, n, P)` which returns the annual interest rate. Your function must call `fsolve`. Ensure that `fsolve` uses the analytical form of the derivative. Report the values of `interest(100000, 20*12, 1000), interest(100000, 20*12, 100)`. Interpret the results.

_Solution_

### Newton’s method for finding $r$

Given $A =\frac{P}{r}(1 − (1 + r)^{−n})$

We have $f(r)=\frac{P}{r}(1 − (1 + r)^{−n}) - A$

Newton’s method says $r_1 = r_0 - \frac{f(r_0)}{f^{'}(r_0)}$, with $f^{'}(r)=-\frac{P}{r^2}(1-(1+r)^{-n}) + \frac{Pn}{r}(1+r)^{-n-1}$

Thus, $r = r_0 - \frac{\frac{P}{r_0}(1 − (1 + r_0)^{−n}) - A}{-\frac{P}{r_0^2}(1-(1+r_0)^{-n}) + \frac{Pn}{r_0}(1+r_0)^{-n-1}}$

### Implementation

```matlab title="interest.m"
function r = interest(A, n, P)
    % Define the function f(r)
    function value = f(r)
        value = P/r * (1 - (1 + r)^-n) - A;
    end

    % Define the derivative f'(r)
    function value = df(r)
        value = -P/r^2 * (1 - (1 + r)^-n) + P*n/r * (1 + r)^(-n-1);
    end

    % Initial guess for r
    r_initial = 0.05; % A typical starting value for interest rates

    % Solve for r using fsolve
    options = optimoptions('fsolve', 'Display', 'none', 'SpecifyObjectiveGradient', true);
    r = fsolve(@(r) deal(f(r), df(r)), r_initial, options);
end

f_1000=interest(100000, 20*12, 1000)
f_100=interest(100000, 20*12, 100)
```

The computed rates are `f_1000=0.0500` and `f_100=-0.0099`, using the initial guess `r_0=0.05` (a 5% interest rate per period).

For $P=1000$, the interest rate that satisfies the annuity equation is approximately 5% per period.

For $P=100$, the 240 payments total only 24,000, far less than the borrowed 100,000, so no positive rate can satisfy the equation: the computed rate is negative ($\approx -0.99\%$ per period), i.e. well below the initial guess of 5%.

_Note that the initial guess strongly affects Newton-type approximations. Starting from `1%` instead, one might observe a different value._

## P4

Consider Newton’s method on $x^5 − x^3 − 4x = 0$

a. How do the computed approximations behave with $x_0 = 1$?

b. Try your implementation with $x_0 = 1$ and $x_0 = 1 + 10^{−14}$. Explain why this method behaves differently, when started with $x_0 = 1 + 10^{−14}$, compared to when it is started with $x_0 = 1$.

c. Solve also with `fsolve`. Comment on the results.
_Solution_

Given the Newton implementation

```matlab
function [root, fval, iter] = newton(f, df, x0, tol)
    maxIter = 1000; % Limit number of iterations to prevent infinite loop
    iter = 0;
    x = x0;
    fx = f(x);

    while abs(fx) > tol && iter < maxIter
        iter = iter + 1;
        x = x - fx / df(x);
        fx = f(x);
    end

    root = x;
    fval = fx;
end

% Define the function and its derivative
f = @(x) x.^5 - x.^3 - 4*x;
df = @(x) 5*x.^4 - 3*x.^2 - 4;

% Initial guess x0 = 1
x0 = 1;
tol = 1e-10;
root = newton(f, df, x0, tol);
disp(root);

% Initial guess x0 = 1 + 10^-14
x0 = 1 + 1e-14;
root = newton(f, df, x0, tol);
disp(root);
```

a. With $x_0 = 1$ the approximations do not converge: $f(1)=-4$ and $f'(1)=-2$ give $x_1 = 1 - \frac{-4}{-2} = -1$; then $f(-1)=4$ and $f'(-1)=-2$ give $x_2 = 1$. The iterates cycle between $1$ and $-1$ indefinitely (note that $x=1$ is not a root, since $f(1)=-4$); the implementation only stops because `maxIter` is reached, and it reports the last iterate.

b. With $x_0=1$: the method is trapped in the exact 2-cycle $1 \rightarrow -1 \rightarrow 1 \rightarrow \cdots$ described above.

With $x_0=1+10^{-14}$: the approximation converges to $x\approx 1.600485180440241$, which is a genuine root ($x^2 = \frac{1+\sqrt{17}}{2}$). The tiny perturbation breaks the exact symmetry of the cycle; the deviation grows with each sweep until the iterates escape the cycle and are attracted to the nearby root. A change of $10^{-14}$ in the initial guess thus produces a completely different outcome, highlighting the method's sensitivity to initial conditions.

c. Using `fsolve`

```matlab
options = optimoptions('fsolve','Display','none'); % Suppress fsolve output
root_fsolve = fsolve(f, 1, options);
disp(root_fsolve);
```

[fsolve](https://www.mathworks.com/help/optim/ug/fsolve.html) yields $0$, which is an exact root ($f(0)=0$, since $f(x) = x(x^4 - x^2 - 4)$). The difference from the Newton runs comes from the algorithms: fsolve globalizes the Newton step with a trust-region strategy (minimizing the sum of squares of the function, with Levenberg-Marquardt also available as an option), which is more robust than the plain Newton iteration, so from $x_0 = 1$ it avoids the $1 \leftrightarrow -1$ cycle and converges to the root at the origin.

## P5

Implement Newton’s method for systems of equations. Each of the following systems of nonlinear equations may present some difficulty in computing a solution. Use Matlab’s `fsolve` and your own implementation of Newton’s method to solve each of the systems from the given starting point. In some cases, the nonlinear solver may fail to converge or may converge to a point other than a solution. When this happens, try to explain the reason for the observed behavior. Report for `fsolve` and your implementation of Newton’s method and each of the systems below, the number of iterations needed to achieve accuracy of $10^{−6}$ (if achieved).
![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p5-a.webp)![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p5-b.webp)![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p5-c.webp)![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p5-d.webp)

_Solution_

All of the following Newton's method implementations follow this framework:

```matlab
function p5
    % Define the system of equations F
    function F = equations(x)
    end

    % Define the Jacobian of the system
    function J = jacobian(x)
    end

    % Initial guess
    x0 = ...;

    % Tolerance and maximum number of iterations
    tol = 1e-6;
    max_iter = 100;

    % Newton's method
    x = x0;
    for iter = 1:max_iter
        F_val = equations(x);
        J_val = jacobian(x);
        delta = -J_val \ F_val; % Solve for the change using the backslash operator
        x = x + delta; % Update the solution

        % Check for convergence
        if norm(delta, Inf) < tol
            fprintf('Newton''s method: Solution found after %d iterations.\n', iter);
            fprintf('x1 = %.6f, x2 = %.6f\n', x(1), x(2));
            break;
        end
    end

    if iter == max_iter
        fprintf('Newton''s method: No solution found after %d iterations.\n', max_iter);
    end

    % fsolve method
    options = optimoptions('fsolve', 'Display', 'off', 'TolFun', tol, 'MaxIterations', max_iter);
    [x_fsolve, ~, exitflag, output] = fsolve(@equations, x0, options);

    if exitflag > 0 % fsolve converged to a solution
        fprintf('fsolve: Solution found after %d function evaluations.\n', output.funcCount);
        fprintf('x1 = %.6f, x2 = %.6f\n', x_fsolve(1), x_fsolve(2));
    else
        fprintf('fsolve: No solution found, exit flag = %d.\n', exitflag);
    end
end
```

Here `equations` defines the nonlinear system and `jacobian` its Jacobian; the loop updates the approximation `x` and checks for convergence.

### A.
```matlab function p5 % Define the system of equations function F = equations(x) F = zeros(2,1); % Ensure F is a column vector F(1) = x(1) + x(2)*(x(2)*(5 - x(2)) - 2) - 13; F(2) = x(1) + x(2)*(x(2)*(1 + x(2)) - 14) - 29; end % Define the Jacobian of the system function J = jacobian(x) J = zeros(2,2); % Initialize J as a 2x2 matrix J(1,1) = 1; J(1,2) = (5 - 3*x(2))*x(2) - 2; J(2,1) = 1; J(2,2) = (1 + 3*x(2))*x(2) - 14; end % Initial guess x0 = [15; -2]; % Tolerance and maximum number of iterations tol = 1e-6; max_iter = 100; % Newton's method x = x0; for iter = 1:max_iter F_val = equations(x); J_val = jacobian(x); delta = -J_val \ F_val; % Solve for the change using the backslash operator x = x + delta; % Update the solution % Check for convergence if norm(delta, Inf) < tol fprintf('Newton''s method: Solution found after %d iterations.\n', iter); fprintf('x1 = %.6f, x2 = %.6f\n', x(1), x(2)); break; end end if iter == max_iter fprintf('Newton''s method: No solution found after %d iterations.\n', max_iter); end % fsolve method options = optimoptions('fsolve', 'Display', 'off', 'TolFun', tol, 'MaxIterations', max_iter); [x_fsolve, ~, exitflag, output] = fsolve(@equations, x0, options); if exitflag > 0 % fsolve converged to a solution fprintf('fsolve: Solution found after %d function evaluations.\n', output.iterations); fprintf('x1 = %.6f, x2 = %.6f\n', x_fsolve(1), x_fsolve(2)); else fprintf('fsolve: No solution found, exit flag = %d.\n', exitflag); end end ``` yields ```prolog Newton's method: Solution found after 23 iterations. x1 = 5.000000, x2 = 4.000000 fsolve: No solution found, exit flag = -2. ``` Since Newton’s method is locally convergent, meaning that if the starting point is close enough to the actual solution, it will usually converge quickly, and in this case, it did. This means the initial guess was sufficiently close with the true solution. However, `fsolve` did not converge to a solution. Since `fsolve` uses Levenberg-Marquardt algorithm (This algorithm is a [trust-region type algorithm](https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm), which is a combination of the Gauss-Newton algorithm and the method of gradient descent), it does come with limitation: 1. The Levenberg-Marquardt algorithm can be sensitive to the starting values. If the initial guess is not sufficiently close to the true solution, the algorithm may not converge. (which we observed) 2. Local minima: The algorithm may converge to a local minimum instead of a global minimum, especially if the function landscape is complex with multiple minima. exit flag of -2 means that the two consecutive steps taken by the algorithm were unable to decrease the residual norm, and the algorithm terminated prematurely. 
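One detail worth noting: `fsolve`'s default algorithm is `'trust-region-dogleg'`; Levenberg-Marquardt must be requested explicitly. A hedged sketch (reusing `equations` and `x0` from the listing above) of retrying system A with it:

```matlab
% fsolve defaults to 'trust-region-dogleg'; Levenberg-Marquardt is opt-in
% and may behave differently on this system.
options_lm = optimoptions('fsolve', 'Algorithm', 'levenberg-marquardt', ...
                          'Display', 'off', 'TolFun', 1e-6);
[x_lm, ~, flag_lm] = fsolve(@equations, [15; -2], options_lm);
```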
### B

```matlab
function p5
    % Define the system of equations
    function F = equations(x)
        F = zeros(3,1); % Ensure F is a column vector
        F(1) = x(1)^2 + x(2)^2 + x(3)^2 - 5;
        F(2) = x(1) + x(2) - 1;
        F(3) = x(1) + x(3) - 3;
    end

    % Define the Jacobian of the system
    function J = jacobian(x)
        J = zeros(3,3); % Initialize J as a 3x3 matrix
        J(1,1) = 2*x(1);
        J(1,2) = 2*x(2);
        J(1,3) = 2*x(3);
        J(2,1) = 1;
        J(2,2) = 1;
        J(2,3) = 0;
        J(3,1) = 1;
        J(3,2) = 0;
        J(3,3) = 1;
    end

    % Initial guess
    x0 = [(1+sqrt(3))/2; (1-sqrt(3))/2; sqrt(3)];

    % Tolerance and maximum number of iterations
    tol = 1e-6;
    max_iter = 100;

    % Newton's method
    x = x0;
    for iter = 1:max_iter
        F_val = equations(x);
        J_val = jacobian(x);
        delta = -J_val \ F_val; % Solve for the change using the backslash operator
        x = x + delta; % Update the solution

        % Check for convergence
        if norm(delta, Inf) < tol
            fprintf('Newton''s method: Solution found after %d iterations.\n', iter);
            fprintf('x1 = %.6f, x2 = %.6f, x3 = %.6f\n', x(1), x(2), x(3));
            break;
        end
    end

    if iter == max_iter
        fprintf('Newton''s method: No solution found after %d iterations.\n', max_iter);
    end

    % fsolve method
    options = optimoptions('fsolve', 'Display', 'off', 'TolFun', tol, 'MaxIterations', max_iter);
    [x_fsolve, ~, exitflag, output] = fsolve(@equations, x0, options);

    if exitflag > 0 % fsolve converged to a solution
        fprintf('fsolve: Solution found after %d function evaluations.\n', output.iterations);
        fprintf('x1 = %.6f, x2 = %.6f, x3 = %.6f\n', x_fsolve(1), x_fsolve(2), x_fsolve(3));
    else
        fprintf('fsolve: No solution found, exit flag = %d.\n', exitflag);
    end
end
```

yields

```prolog
Newton's method: Solution found after 57 iterations.
x1 = 1.666667, x2 = -0.666667, x3 = 1.333333
fsolve: Solution found after 6 function evaluations.
x1 = 1.000000, x2 = 0.000000, x3 = 2.000000
```

The system has exactly two solutions: substituting $x_2 = 1 - x_1$ and $x_3 = 3 - x_1$ into the first equation gives $3x_1^2 - 8x_1 + 5 = 0$, so $x_1 = 5/3$ or $x_1 = 1$. Newton's method converged to $(5/3, -2/3, 4/3)$ and `fsolve` to $(1, 0, 2)$; both are genuine roots, just different ones.

The starting point is degenerate: it satisfies $x_1 = x_2 + x_3$, which is exactly the condition under which the Jacobian above is singular ($\det J = 2x_1 - 2x_2 - 2x_3$). Newton's method therefore begins with a near-unusable step and needs many iterations (57) to recover, while `fsolve`'s trust-region safeguards handle the degenerate start gracefully and converge in a few iterations, to the other root. The two methods simply take different paths through the solution space, which is why they land on different, equally valid, solutions.
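A quick check that both returned points are exact roots of the system:

```matlab
% Both computed points satisfy all three equations exactly.
check = @(x) [x(1)^2 + x(2)^2 + x(3)^2 - 5; x(1) + x(2) - 1; x(1) + x(3) - 3];
disp(check([5/3; -2/3; 4/3]))  % Newton's root -> [0; 0; 0]
disp(check([1; 0; 2]))         % fsolve's root -> [0; 0; 0]
```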
### C ```matlab function p5 % Define the system of equations function F = equations(x) F = zeros(4,1); % Ensure F is a column vector F(1) = x(1) + x(2)*10; F(2) = sqrt(5)*(x(3) - x(4)); F(3) = (x(2)-x(3))^2; F(4) = sqrt(10)*(x(1)-x(4))^2; end % Define the Jacobian of the system function J = jacobian(x) J = zeros(4,4); % Initialize J as a 3x3 matrix J(1, :) = [1, 10, 0, 0]; J(2, :) = [0, 0, sqrt(5), -sqrt(5)]; J(3, :) = [0, 2*(x(2) - x(3)), -2*(x(2) - x(3)), 0]; J(4, :) = [2*sqrt(10)*(x(1) - x(4)), 0, 0, -2*sqrt(10)*(x(1) - x(4))]; end % Initial guess x0 = [1; 2; 1; 1]; % Tolerance and maximum number of iterations tol = 1e-6; max_iter = 100; % Newton's method x = x0; for iter = 1:max_iter F_val = equations(x); J_val = jacobian(x); delta = -J_val \ F_val; % Solve for the change using the backslash operator x = x + delta; % Update the solution % Check for convergence if norm(delta, Inf) < tol fprintf('Newton''s method: Solution found after %d iterations.\n', iter); fprintf('x1 = %.6f, x2 = %.6f, x3 = %.6f, x4 = %.6f\n', x); break; end end if iter == max_iter fprintf('Newton''s method: No solution found after %d iterations.\n', max_iter); end % fsolve method options = optimoptions('fsolve', 'Display', 'iter', 'TolFun', tol, 'MaxIterations', max_iter); [x_fsolve, fval, exitflag, output] = fsolve(@equations, x0, options); if exitflag > 0 % fsolve converged to a solution fprintf('fsolve: Solution found after %d function evaluations.\n', output.funcCount); fprintf('x1 = %.6f, x2 = %.6f, x3 = %.6f, x4 = %.6f\n', x_fsolve); else fprintf('fsolve: No solution found, exit flag = %d.\n', exitflag); end end ``` yields ```prolog Newton's method: No solution found after 100 iterations. fsolve: Solution found after 35 function evaluations. x1 = -0.002673, x2 = 0.000267, x3 = 0.000407, x4 = 0.000407 ``` Newton’s method cannot find convergence after 100 steps because of divergence, if the initial guess is not close to the root, especially in the presence of steep gradients or saddle points. The Jacobian matrix at some point during the iteration may become ill-conditioned, which would lead to large numerical errors in the computation of the inverse or the solution of the linear system in each iteration `fsolve` converged here, meaning Levenberg-Marquardt is probably more robust in converging a local minima in this case. 
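This system resembles Powell's classic singular function: at the solution $x^* = (0,0,0,0)$ the Jacobian rows for $F_3$ and $F_4$ vanish, so $J(x^*)$ is rank-deficient and Newton's method loses its usual quadratic convergence near the root — the same ill-conditioning suspected above. A small check (a sketch; the Jacobian entries match the listing above):

```matlab
% At x* = 0 the Jacobian rows for F(3) and F(4) are identically zero,
% so the Jacobian has rank 2 of 4.
x_star = zeros(4,1);
J_star = [1, 10, 0, 0;
          0, 0, sqrt(5), -sqrt(5);
          0, 2*(x_star(2)-x_star(3)), -2*(x_star(2)-x_star(3)), 0;
          2*sqrt(10)*(x_star(1)-x_star(4)), 0, 0, -2*sqrt(10)*(x_star(1)-x_star(4))];
fprintf('rank of J at x* = %d\n', rank(J_star)); % prints 2
```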
### D

```matlab
function p5
  % Define the system of equations
  function F = equations(x)
    F = zeros(2,1); % Ensure F is a column vector
    F(1) = x(1);
    F(2) = 10*x(1) / (x(1) + 0.1) + 2*x(2)^2;
  end

  % Define the Jacobian of the system
  function J = jacobian(x)
    J = zeros(2,2); % Initialize J as a 2x2 matrix
    J(1, :) = [1, 0];
    J(2, :) = [10*(0.1)/(x(1) + 0.1)^2, 4*x(2)];
  end

  % Initial guess
  x0 = [1.8; 0];

  % Tolerance and maximum number of iterations
  tol = 1e-6;
  max_iter = 100;

  % Newton's method
  x = x0;
  for iter = 1:max_iter
    F_val = equations(x);
    J_val = jacobian(x);
    delta = -J_val \ F_val; % Solve for the Newton step using the backslash operator
    x = x + delta; % Update the solution
    % Check for convergence
    if norm(delta, Inf) < tol
      fprintf('Newton''s method: Solution found after %d iterations.\n', iter);
      fprintf('x1 = %.6f, x2 = %.6f\n', x(1), x(2));
      break;
    end
  end
  if iter == max_iter
    fprintf('Newton''s method: No solution found after %d iterations.\n', max_iter);
  end

  % fsolve method
  options = optimoptions('fsolve', 'Display', 'off', 'TolFun', tol, 'MaxIterations', max_iter);
  [x_fsolve, ~, exitflag, output] = fsolve(@equations, x0, options);
  if exitflag > 0 % fsolve converged to a solution
    fprintf('fsolve: Solution found after %d function evaluations.\n', output.funcCount);
    fprintf('x1 = %.6f, x2 = %.6f\n', x_fsolve(1), x_fsolve(2));
  else
    fprintf('fsolve: No solution found, exit flag = %d.\n', exitflag);
  end
end
```

yields

```prolog
Newton's method: No solution found after 100 iterations.
fsolve: Solution found after 15 function evaluations.
x1 = 0.000000, x2 = -0.000316
```

For Newton's method:

- Non-convergence: the failure is rooted in the structure of $F$. The solution has $x_2 = 0$, and since $x_2$ enters $F_2$ only through $2x_2^2$, the partial derivative $\partial F_2/\partial x_2 = 4x_2$ vanishes whenever $x_2 = 0$; the Jacobian is therefore singular both at the initial guess $(1.8, 0)$ and at the root itself, so the Newton step is ill-defined there.
- Sensitive derivative: the term $\frac{10x_1}{x_1+0.1}$ has a derivative that becomes very large as $x_1$ approaches $-0.1$, which can cause numerical issues such as overflow or large rounding errors that prevent convergence.
- Flat regions: the method might also get stuck in a flat region of the function where the gradient is very small, producing steps too small to significantly change the estimate of the solution.

For `fsolve`, the same argument as in part C applies: the Levenberg-Marquardt algorithm is more robust at minimising the residual of this system of non-linear equations.

## P6

You are given the data file [data.txt](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/compsci-4x03/data.txt). Each row contains the 2D coordinates $(x_i, y_i)$ of an object at time $t_i$. This object exhibits periodic motion. Implement the function `function period = findPeriod(file_name)` that reads the data from a file and computes the period of the periodic motion. The points in time where the object returns to the same position must be determined using fsolve. Report the value for the computed period.
_Solution_

```matlab title="findPeriod.m"
function period = findPeriod(file_name)
  % Parse the data from the file
  data = parse(file_name);

  % Extract time, x, and y coordinates
  t = data(:, 1);
  x = data(:, 2);
  y = data(:, 3);

  % Define a tolerance for how close the object needs to be to its initial position
  tolerance = 1e-9;

  % Define spline interpolants for x and y (extrapolating beyond the data range)
  x_interp = @(tq) interp1(t, x, tq, 'spline', 'extrap');
  y_interp = @(tq) interp1(t, y, tq, 'spline', 'extrap');

  % Define the distance function from the initial position
  distance_from_initial = @(tq) sqrt((x_interp(tq) - x(1))^2 + (y_interp(tq) - y(1))^2);

  % Initial guess for fsolve - use the midpoint of the time data
  initial_guess = t(floor(length(t)/2));

  % Use fsolve to find a time at which the distance returns to (approximately) zero
  options = optimoptions('fsolve', 'Display', 'iter', 'TolFun', tolerance, 'MaxFunEvals', 10000);
  t_period = fsolve(distance_from_initial, initial_guess, options);

  % Calculate the period
  period = t_period - t(1);
end

function data = parse(file_name)
  % Open the file
  fid = fopen(file_name, 'rt');
  if fid == -1
    error('Failed to open file: %s', file_name);
  end

  % Read the data from the file, assuming values separated by spaces or tabs
  data = fscanf(fid, '%f %f %f', [3, Inf]);

  % Transpose the data to have rows as individual entries
  data = data';

  % Close the file
  fclose(fid);
end
```

yields the computed period of `39.3870`

## P7

Consider two bodies of masses $\mu = 0.012277471$ and $\hat{\mu} = 1 - \mu$ (Earth and Sun) in a planar motion, and a third body of negligible mass (moon) moving in the same plane. The motion is given by

$u_1^{''} = u_1 + 2u_2^{'} -\hat{\mu}\frac{u_1+\mu}{((u_1+\mu)^2+u_2^2)^{\frac{3}{2}}}-\mu\frac{(u_1-\hat{\mu})}{((u_1-\hat{\mu})^2+u_2^2)^{\frac{3}{2}}}$

and

$u_2^{''} = u_2 - 2u_1^{'} - \hat{\mu}\frac{u_2}{((u_1+\mu)^2 + u_2^2)^{\frac{3}{2}}} - \mu\frac{u_2}{((u_1 - \hat{\mu})^2 + u_2^2)^{\frac{3}{2}}}$

The initial values are $u_1(0) = 0.994$, $u_1^{'}(0) = 0$, $u_2(0) = 0$, $u_2^{'}(0) = -2.001585106379082522420537862224$.

Implement the classical Runge-Kutta method of order 4 and integrate this problem on $[0,17.1]$ with uniform stepsize using 100, 1000, 10,000, and 20,000 steps. Plot the orbits for each case. How many uniform steps are needed before the orbit appears to be qualitatively correct? Submit plots and discussion.
_Solution_

```matlab title="rk4.m"
function rk4
  % Constants
  mu = 0.012277471;
  mu_hat = 1 - mu;

  % Initial Conditions
  u0 = [0.994, 0, 0, -2.001585106379082522420537862224];

  % Time Span
  t_span = [0 17.1];

  % Solve for different step counts
  step_sizes = [100, 1000, 10000, 20000];
  for i = 1:length(step_sizes)
    solve_with_steps(t_span, u0, step_sizes(i), mu, mu_hat);
  end
end

function solve_with_steps(t_span, u0, steps, mu, mu_hat)
  % RK4 Integration
  h = (t_span(2) - t_span(1)) / steps;
  % steps+1 grid points give exactly `steps` uniform intervals of size h
  t = linspace(t_span(1), t_span(2), steps+1);
  u = zeros(length(u0), length(t));
  u(:,1) = u0';

  for i = 1:length(t)-1
    k1 = equations(t(i), u(:,i), mu, mu_hat);
    k2 = equations(t(i) + h/2, u(:,i) + h/2*k1, mu, mu_hat);
    k3 = equations(t(i) + h/2, u(:,i) + h/2*k2, mu, mu_hat);
    k4 = equations(t(i) + h, u(:,i) + h*k3, mu, mu_hat);
    u(:,i+1) = u(:,i) + h/6 * (k1 + 2*k2 + 2*k3 + k4);
  end

  % Plotting
  figure;
  plot(u(1,:), u(3,:));
  xlabel('u1');
  ylabel('u2');
  title(sprintf('Orbit of the Third Body with RK4 (%d Steps)', steps));
  grid on;

  % NOTE: The below is the corresponding approximation using ode45,
  % but for the sake of this assignment, we implement RK4
  % t_eval = linspace(t_span(1), t_span(2), steps);
  % [T, U] = ode45(@(t,u) equations(t, u, mu, mu_hat), t_eval, u0);
  % % Plotting
  % figure;
  % plot(U(:,1), U(:,3));
  % xlabel('u1');
  % ylabel('u2');
  % title(sprintf('Orbit of the Third Body (%d Steps)', steps));
  % grid on;
end

function dudt = equations(t, u, mu, mu_hat)
  u1 = u(1);
  u1_prime = u(2);
  u2 = u(3);
  u2_prime = u(4);

  delta1 = ((u1 + mu)^2 + u2^2)^1.5;
  delta2 = ((u1 - mu_hat)^2 + u2^2)^1.5;

  du1dt = u1_prime;
  du1_primedt = u1 + 2*u2_prime - mu_hat*(u1 + mu)/delta1 - mu*(u1 - mu_hat)/delta2;
  du2dt = u2_prime;
  du2_primedt = u2 - 2*u1_prime - mu_hat*u2/delta1 - mu*u2/delta2;

  dudt = [du1dt; du1_primedt; du2dt; du2_primedt];
end
```

yields the following graphs

### 100 steps

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p7-100.webp)

The plot for 100 steps shows a spiral pattern rather than the expected closed orbit. This divergence could be due to several factors:

- Step size: 100 steps over $[0, 17.1]$ gives $h \approx 0.17$, too large to accurately capture the dynamics of the system. The three-body problem is known for its sensitivity to initial conditions and step sizes, and requires a smaller step size for an accurate solution.
- Numerical stability: the RK4 method, while fourth-order accurate, is not guaranteed to be stable for all step sizes and problems.

### 1000 steps

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p7-1000.webp)

The plot for the 1000-step case shows a significant improvement over the 100-step case. It demonstrates a more defined and coherent orbit, which suggests that the step size is more appropriate for capturing the dynamics of the system. The spiral pattern from the 100-step case is less pronounced, and the orbit begins to resemble the expected closed path of the three-body problem. However, there is still some noticeable deviation and distortion in the orbit, which indicates that while the solution is converging towards the correct behaviour with a smaller step size, further refinement might be necessary.
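As a sanity check on the refinement trend (assuming the computation is in the asymptotic regime where the error constant is stable): classical RK4 has global error $O(h^4)$, so multiplying the step count by 10 should shrink the error by roughly a factor of $10^4$,

$$
\frac{e(h/10)}{e(h)} \approx \frac{C(h/10)^4}{Ch^4} = 10^{-4}
$$

which is consistent with the rapid qualitative improvement from 100 to 1000 to 10,000 steps.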
In practice, continuing to reduce the step size can help further improve the accuracy of the orbit.

### 10000 steps

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p7-10000.webp)

The plot for 10,000 steps demonstrates a substantial improvement and now depicts a closed orbit, which is characteristic of the three-body problem when solved with sufficient numerical accuracy. The step size is now small enough to capture the system's dynamics accurately, and the RK4 method yields a reliable approximation of the moon's orbit. The orbit is smooth and does not exhibit the distortions seen in the plots with fewer steps, suggesting that the numerical integration is now sufficiently resolving the trajectory over the time span of interest. With 10,000 steps the orbit appears qualitatively correct, showing the expected behaviour of a third body under the gravitational influence of the other two massive bodies.

- Sufficient resolution: 10,000 steps ($h \approx 0.0017$) provide a high enough resolution for the RK4 method to produce a stable and accurate orbit.
- Numerical accuracy: the smaller step size has reduced the numerical errors to a level where they do not significantly affect the qualitative behaviour of the solution.
- Orbit stability: the closed and stable orbit indicates that the solution is likely converging to the true physical behaviour of the system.

### 20000 steps

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p7-20000.webp)

The plot for 20,000 steps exhibits a very stable and well-defined orbit, closely resembling the plot for 10,000 steps. This consistency between the two resolutions suggests that the numerical solution has converged: increasing the step count further does not significantly change the orbit's shape or accuracy.

_NOTE:_ The above implementation codes RK4 by hand, since `ode45` is an adaptive method and does not conform to the fixed-step RK4 required here. An equivalent [Python implementation](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/compsci-4x03/rk4.py) uses RK45.

## P8

The following system of ODEs, formulated by [Lorenz](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/Chaos#as-system), represents a crude model of atmospheric circulation:

$$
\begin{align}
y_1^{'} &= \sigma(y_2-y_1) \\\
y_2^{'} &= ry_1 - y_2 - y_1y_3 \\\
y_3^{'} &= y_1y_2 - by_3
\end{align}
$$

Set $\sigma = 10, b = \frac{8}{3}, r = 28$, take initial values $y_1(0) = 15$, $y_2(0) = 15$, and $y_3(0) = 36$, and integrate this ODE from $t = 0$ to $t = 100$ using Matlab's `ode45`. Plot each component of the solution as a function of $t$. Plot also $(y_1, y_2)$, $(y_1, y_3)$, and $(y_2, y_3)$ (in separate plots). Change the initial values by a tiny amount (e.g. $10^{-10}$) and integrate again. Compare the difference in the computed solutions.
_Solution_

```matlab title="lorenz.m"
function lorenz
  % Parameters
  sigma = 10;
  b = 8/3;
  r = 28;

  % Initial conditions
  y0 = [15; 15; 36];

  % Time span
  tspan = [0 100];

  % Solve the ODE
  [t, Y] = ode45(@(t,y) lorenzODE(t, y, sigma, b, r), tspan, y0);

  % Plotting the solutions
  figure;
  subplot(2, 2, 1);
  plot(t, Y(:,1), t, Y(:,2), t, Y(:,3));
  title('Time Series of y1, y2, y3');
  legend('y1', 'y2', 'y3');
  xlabel('Time');
  ylabel('Values');

  subplot(2, 2, 2);
  plot(Y(:,1), Y(:,2));
  title('y1 vs y2');
  xlabel('y1');
  ylabel('y2');

  subplot(2, 2, 3);
  plot(Y(:,1), Y(:,3));
  title('y1 vs y3');
  xlabel('y1');
  ylabel('y3');

  subplot(2, 2, 4);
  plot(Y(:,2), Y(:,3));
  title('y2 vs y3');
  xlabel('y2');
  ylabel('y3');

  % Modify initial conditions and solve again
  y0_mod = y0 + 1e-10;
  [t_mod, Y_mod] = ode45(@(t,y) lorenzODE(t, y, sigma, b, r), tspan, y0_mod);

  % Interpolate Y_mod to match the time points of t
  Y_mod_interp = interp1(t_mod, Y_mod, t);

  % Compute the differences
  Y_diff = Y - Y_mod_interp;

  % Plot the differences
  figure;
  plot(t, Y_diff);
  title('Difference in Solutions with Modified Initial Conditions');
  legend('Δy1', 'Δy2', 'Δy3');
  xlabel('Time');
  ylabel('Difference in Values');
end

function dydt = lorenzODE(t, y, sigma, b, r)
  % Lorenz system ODEs
  dydt = [sigma*(y(2) - y(1));
          r*y(1) - y(2) - y(1)*y(3);
          y(1)*y(2) - b*y(3)];
end
```

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p8-four-graphs.webp)

The difference between the two computed time series is shown in

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p8-delta.webp)

and the componentwise differences between the two Lorenz solutions are shown in

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p8-delta-funcs.webp)

The difference in the time series of $y_1, y_2, y_3$ indicates how sensitive the Lorenz system is to initial conditions. Even though the change in the initial conditions is extremely small (on the order of $10^{-10}$), the differences in the variables grow over time. This divergence is a characteristic of [chaotic](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/Chaos#as-system) systems and is known as sensitivity to initial conditions, or the butterfly effect.

The phase-space difference (delta) plots show that the discrepancies between the two sets of solutions also exhibit complex behaviour. Initially the differences are small, but as time progresses they become more pronounced, indicating that the system's trajectory has deviated significantly from the original path.

## P9

Let $A$ be an $n \times n$ nonsingular matrix. Let $F(X) = I - AX$ where $I$ is the $n \times n$ identity matrix. When $F(X)$ is the zero $n \times n$ matrix, then $X = A^{-1}$. We can use Newton's method to find $A^{-1}$:

$X_{k+1} = X_k + A^{-1}(I - AX_k)$

We replace $A^{-1}$ by $X_k$ to obtain the formula

$X_{k+1} = X_k + X_k(I - AX_k)$ (1)

a. Write a function to compute the inverse of a given matrix $A$ using (1). You can use as an initial guess

$X_0 = \frac{A^T}{\|A\|_1\|A\|_{\infty}}$

Test your program on a few random matrices and report numerical experiments comparing its accuracy and efficiency with Matlab's inverse function `inv`.

b. Does (1) converge quadratically?
Provide sufficient detail supporting your claim.

_Solution_

a. The following entails the MATLAB solution (Python equivalent is [inverse\_newt.py](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/compsci-4x03/inverse_newt.py))

```matlab title="matrix_inverse_newt.m"
function A_inv = matrix_inverse_newt(A)
  tol = 1e-9;
  max_iter = 100;
  n = size(A, 1);
  I = eye(n);
  Xk = A' / (norm(A, 1) * norm(A, inf));

  for k = 1:max_iter
    Rk = I - A * Xk;
    Xk_new = Xk + Xk * Rk;
    % Stopping criterion based on the norm of the residual matrix
    if norm(Rk) < tol
      break;
    end
    Xk = Xk_new;
  end
  A_inv = Xk;
end

% Test (run from a script or the command window; a function file cannot
% contain top-level statements):
%   rng(0);               % Seed for reproducibility
%   n = 4;                % Size of the matrix
%   A = rand(n, n);
%   A_inv = matrix_inverse_newt(A);
%   A_inv_true = inv(A);  % Compare with MATLAB's built-in inverse function
%   disp('Inverse using Newton''s method:'); disp(A_inv);
%   disp('True Inverse:'); disp(A_inv_true);
%   disp('Difference:'); disp(A_inv - A_inv_true);
```

yields

```prolog
Inverse using Newton's method:
  -15.2997    3.0761   14.7235    9.6445
   -0.2088   -1.8442    1.0366    1.8711
   14.5694   -1.9337  -14.6497   -9.0413
   -0.3690    0.5345    1.4378   -0.4008

True Inverse:
  -15.2997    3.0761   14.7235    9.6445
   -0.2088   -1.8442    1.0366    1.8711
   14.5694   -1.9337  -14.6497   -9.0413
   -0.3690    0.5345    1.4378   -0.4008

Difference:
   1.0e-09 *
    0.6111   -0.1011   -0.6027   -0.3839
    0.0353   -0.0058   -0.0348   -0.0222
   -0.5881    0.0973    0.5800    0.3694
    0.0273   -0.0045   -0.0269   -0.0171
```

b. For quadratic convergence, we need

$\lim_{k\to\infty} \frac{\|e_{k+1}\|}{\|e_k\|^2} = C$

for some constant $C$. In this case we track the error $E_k = X_k - A^{-1}$ as $k$ increases. Substituting into the iteration $X_{k+1} = X_k + X_k(I - AX_k)$ and using $I - AX_k = -AE_k$ gives

$E_{k+1} = (I - X_kA)E_k = -E_kAE_k$

> Therefore $\|E_{k+1}\| = \|E_kAE_k\| \leq \|A\|\,\|E_k\|^2$, which is exactly the quadratic bound $\|E_{k+1}\| \leq C\|E_k\|^2$ with $C = \|A\|$.

The following modification of the Python implementation is used to track errors

```python title="newt_err.py"
import numpy as np

def errors(A):
    # Compute the inverse and record the error at each iteration
    A_inv_newton, errors_newton = matrix_inverse_newt_err(A)
    # Check for quadratic convergence via the ratios e_{k+1} / e_k^2
    ratios = []
    for i in range(1, len(errors_newton) - 1):
        ratios.append(errors_newton[i + 1] / errors_newton[i] ** 2)
    return ratios

def matrix_inverse_newt_err(A, tol=1e-9, max_iter=100):
    n = A.shape[0]
    I = np.eye(n)
    A_inv_true = np.linalg.inv(A)  # True inverse for error calculation
    Xk = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    errors = []  # Track errors over iterations
    for _ in range(max_iter):
        Rk = I - np.dot(A, Xk)
        Xk_new = Xk + np.dot(Xk, Rk)
        # Calculate and store the current error
        current_error = np.linalg.norm(Xk_new - A_inv_true)
        errors.append(current_error)
        # Stopping criterion based on the norm of the residual matrix
        if current_error < tol:
            break
        Xk = Xk_new
    return Xk, errors

if __name__ == "__main__":
    # Test the function with a random matrix
    np.random.seed(420)  # Seed for reproducibility
    n = 4  # Size of the matrix
    A = np.random.rand(n, n)
    print(errors(A))
```

From the Python implementation, we calculated the ratio of the error at step $k+1$ to the square of the error at step $k$, and observed that these ratios seemed to stabilise around a constant value rather than decreasing to zero. The ratios did not deviate significantly, indicating a consistent rate of convergence that could be quadratic.
To assert that the convergence is quadratic, we would expect the ratios to be bounded and $\|E_{k+1}\|$ to be significantly less than $\|E_k\|^2$ as $k$ increases. The results from Python showed that the error does decrease from one iteration to the next, which is consistent with convergence.

> The discrepancy between the theoretical expectation of quadratic convergence and the observed stabilisation of the error ratios might suggest that while Newton's method for matrix inversion is converging, it may not exhibit pure quadratic convergence in this empirical test. There could be several reasons for this:

- The matrix $A$ used in the test may not meet the conditions required for quadratic convergence throughout the iterations.
- The numerical precision and floating-point representation in Python may affect the calculation of the error and its ratios.

---

slug: thoughts/university/twenty-three-twenty-four/compsci-4x03/Equations
tags:
  - fruit
  - swfr4x03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/Equations"
title: ODEs, Polynomials approx., Linear Least Squares, and Errors
date: 2023-12-06

---

### Machine epsilon

$fl(x) = x(1+\epsilon) \space\text{where }|\epsilon|\leq{u}$

$|\frac{fl(x)-x}{x}|=|\epsilon|\leq u \space\text{is called relative error.}$

$\text{Cancellations occur when subtracting nearby numbers containing roundoff.}$

### Taylor series

$$
\begin{aligned}
f(x) &= \sum_{k=0}^{\infty}\frac{f^{(k)}(c)}{k!}(x-c)^k\\\
E_{n+1} &= \frac{f^{(n+1)}(\xi)}{(n+1)!}h^{n+1}, \quad h := x-c\\\
|E_{n+1}| &\leq ch^{n+1}\\\
\end{aligned}
$$

### Polynomial Interpolation

$$
\begin{aligned}
v(x) = &\sum_{j=0}^{n}c_j\phi_{j}(x) \space \rightarrow \text{linearly independent iff} \space (v(x) = 0 \space \forall \space x \rightarrow c_j=0 \space \forall \space j)\\\
&\\\
\text{Linear system: } &\begin{bmatrix} \phi_0(x_0) & \phi_1(x_0) & \cdots & \phi_n(x_0) \\ \phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_n(x_1) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_0(x_n) & \phi_1(x_n) & \cdots & \phi_n(x_n) \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{bmatrix}
\end{aligned}
$$

$$
\begin{aligned}
\text{Monomial basis: }&\phi_j(x)=x^j, \space j=0,1,...,n \space \rightarrow v(x)=\sum_{j=0}^{n}c_jx^j\\\
&p_n(x_i) = c_0 + c_1x_i + c_2x_i^2 + \cdots + c_nx_i^n = y_i \\\
&\\\
X: &\text{Vandermonde matrix} \rightarrow \text{det}(X)=\prod_{i=0}^{n-1} \left[ \prod_{j=i+1}^{n} (x_j - x_i) \right]\\\
\text{if } &x_i \space\text{are distinct:}\\\
&\bullet\space \text{det}(X) \neq 0\\\
&\bullet\space X\space \text{is nonsingular}\\\
&\bullet\space \text{system has unique solution}\\\
&\bullet\space \text{unique polynomial of degree}\leq{n}\space \text{that interpolates the data}\\\
&\bullet\space \text{can be poorly conditioned, work is }O(n^3)\\\
\end{aligned}
$$

$$
\begin{aligned}
\text{Lagrange basis: }&L_j(x_i) = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} \\\
&L_j(x) = \prod_{i=0,i\neq{j}}^{n}\frac{x-x_i}{x_j-x_i}\\\
&p_n(x_i) = \sum_{j=0}^{n} y_jL_j(x_i) = \sum_{j=0}^{i-1} y_jL_j(x_i) + y_iL_i(x_i) + \sum_{j=i+1}^{n} y_jL_j(x_i) = y_i\\\
\end{aligned}
$$

$$
\begin{aligned}
\text{Newton's basis: }&\phi_j(x)=\prod_{i=0}^{j-1}(x-x_i), j=0:n\\\
&p_n(x_i)=c_0 + c_1(x_i-x_0)+ \cdots + c_n(x_i-x_0)(x_i-x_1)\cdots(x_i-x_{n-1})=f(x_i)\\\
\end{aligned}
$$

$$
\begin{aligned}
&\text{Divided differences: }f[x_i,\cdots,x_j] =
\frac{f[x_{i+1},\cdots,x_j]-f[x_i,\cdots,x_{j-1}]}{x_j-x_i}\\\
&\bullet\space\text{at } x=x_0 \text{ then } c_0 = f(x_0) = f[x_0]\\\
&\bullet\space\text{at } x=x_1 \text{ then } c_1 = \frac{f(x_1)-f(x_0)}{x_1-x_0} = f[x_0, x_1]\\\
&\bullet\space\text{at } x=x_2 \text{ then } c_2 = \frac{f(x_2)-c_0-c_1(x_2-x_0)}{(x_2-x_0)(x_2-x_1)} = \frac{\frac{f(x_2)-f(x_1)}{x_2-x_1}-\frac{f(x_1)-f(x_0)}{x_1-x_0}}{x_2-x_0} = f[x_0, x_1, x_2]\\\
&\\\
&\therefore\forall x\in{[a,b]}\space\exists\space\xi=\xi(x)\in(a,b)\space : \space f(x)-p_n(x)=\frac{f^{(n+1)}(\xi)}{(n+1)!} \prod_{i=0}^{n} (x - x_i)\\\
&\therefore\space\text{Error: } |f(x)-p_n(x)|\leq\frac{M}{4(n+1)}h^{n+1}\\\
&\text{where: }\\\
&\bullet\space M=\max_{a\leq{t}\leq{b}}|f^{(n+1)}(t)|\\\
&\bullet\space h=\frac{b-a}{n}\\\
&\bullet\space x_i=a+ih \text{ for }i=0,1,\cdots,n
\end{aligned}
$$

### Basic Numeric Integration

$$
\begin{aligned}
&I_f = \int_{a}^{b}{f(x)dx} \approx \sum_{j=0}^{n}a_jf(x_j)\space\text{(quadrature rule)}\\\
&\bullet\space x_0,\cdots,x_n\space\text{be distinct points in } [a,b]\\\
&\bullet\space p_n(x)\space\text{be interpolating polynomial of }f\rightarrow\space \int_{a}^{b}f(x)dx\approx\int_{a}^{b}p_n(x)dx\\\
&\bullet\space \text{Uses Lagrange form: }\int_{a}^{b}f(x)dx\approx\sum_{j=0}^{n}f(x_j)\int_{a}^{b}L_j(x)dx=\sum_{j=0}^{n}f(x_j)a_j\\\
\end{aligned}
$$

$$
\begin{aligned}
\text{Trapezoidal rule: } &f(x) \approx p_1(x)=f(x_0)L_0(x) + f(x_1)L_1(x)\space(n=1, x_0=a,x_1=b)\\\
\therefore\space &I_f=\int_{a}^{b}f(x)dx \approx f(a)\int_{a}^{b}{\frac{x-b}{a-b}dx} + f(b)\int_{a}^{b}{\frac{x-a}{b-a}dx} \\\
&\space\space\space\space=\frac{b-a}{2}[f(a) + f(b)]\\\
\text{Error: } &f(x) - p_1(x) = \frac{1}{2}f^{''}(\xi(x))(x-a)(x-b)\\\
\text{then: }&\int_{a}^{b}{(f(x)-p_1(x))dx} = \frac{1}{2}\int_{a}^{b}{f^{''}(\xi(x))(x-a)(x-b)dx}\\\
\text{From MVT: } &\exists\space\eta\in(a,b) \space : \space \int_{a}^{b}{f^{''}(\xi(x))(x-a)(x-b)dx} = f^{''}(\eta)\int_{a}^{b}{(x-a)(x-b)dx}\\\
\therefore\space&\text{Error of Trapezoidal rule: }\space I_f - I_{trap} = -\frac{f^{''}(\eta)}{12}(b-a)^3\\\
\end{aligned}
$$

$$
\begin{aligned}
\text{Midpoint rule: } &I_f \approx I_{mid} = (b-a)f(\frac{a+b}{2})\\\
&\text{Let } m=\frac{a+b}{2}\rightarrow f(x)=f(m)+f^{'}(m)(x-m)+\frac{1}{2}f^{''}(\xi(x))(x-m)^2\\\
\therefore\space&I_f = \int_{a}^{b} f(x) = (b - a)f(m) + \frac{1}{2} \int_{a}^{b} f''(\xi(x))(x - m)^2 \, dx\\\
&\exists\space\eta\in(a,b)\space : \space \frac{1}{2} \int_{a}^{b} f''(\xi(x))(x - m)^2 \, dx = \frac{f''(\eta)}{24}(b - a)^3\\\
\therefore\space&\text{Error of Midpoint rule: }\space I_f - I_{mid} = \frac{f^{''}(\eta)}{24}(b-a)^3\\\
\end{aligned}
$$

$$
\begin{aligned}
\text{Simpson's rule: } &I_f \approx I_{simp} = \frac{b-a}{6}[f(a) + 4f(\frac{a+b}{2}) + f(b)]\\\
&(p_2(x),n=2,x_0=a,x_1=\frac{a+b}{2},x_2=b)\\\
\therefore\space&\text{Error of Simpson's rule: }\space I_f - I_{Simpson} = -\frac{f^{(4)}(\eta)}{90}(\frac{b-a}{2})^5,\space\eta\in(a,b)\\\
\end{aligned}
$$

### Composite Numeric Integration

$$
\begin{aligned}
&\bullet\space\text{subdivide }[a,b]\space\text{into }r\space\text{subintervals}\\\
&\bullet\space h=\frac{b-a}{r}\space\text{length per interval}\\\
&\bullet\space t_i=a+ih\space\text{for }i=0,1,\cdots,r\\\
&t_0=a,t_r=b\space\rightarrow\space\int_{a}^{b}f(x)\,dx=\sum_{i=1}^{r}\int_{t_{i-1}}^{t_i}f(x)\,dx\\\
\end{aligned}
$$

$$
\begin{aligned}
\text{Composite Trapezoidal rule: } &I_{cf} = \frac{h}{2} [f(a) + f(b)] + h \sum_{i=1}^{r-1} f(t_i)\\\
\text{Error: } &I_f - I_{cf} = -\frac{f^{''}(\mu)}{12}(b-a)h^2\\\
\text{Composite
Simpson rule: } &I_{cs} = \frac{h}{3} [f(a) + 2 \sum_{i=1}^{r/2-1} f(t_{2i}) + 4 \sum_{i=1}^{r/2} f(t_{2i-1}) + f(b)]\\\
\text{Error: } &I_f - I_{cs} = -\frac{f^{(4)}(\zeta)}{180}(b-a)h^4\\\
\text{Composite Midpoint rule: } &I_{cm} = h \sum_{i=1}^{r} f(a + (i - 1/2)h)\\\
\text{Error: } &I_f - I_{cm} = \frac{f^{''}(\eta)}{24}(b-a)h^2\\\
\end{aligned}
$$

### Linear Least Squares

_Find $c_j$ such that $\sum_{k=0}^{m}(v(x_k)-y_k)^2=\sum_{k=0}^{m}(\sum_{j=0}^{n}c_j\phi_j(x_k)-y_k)^2$ is minimised_

Conditions: $\frac{\partial \phi}{\partial a} = 0, \quad \frac{\partial \phi}{\partial b} = 0$

$$
\begin{aligned}
\text{Linear fit: } y_k&=ax_k+b,k=1,\cdots,m\\\
\begin{bmatrix} \sum_{k=0}^{m} x_k^2 & \sum_{k=0}^{m} x_k \\ \sum_{k=0}^{m} x_k & m + 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} &= \begin{bmatrix} \sum_{k=0}^{m} x_k y_k \\ \sum_{k=0}^{m} y_k \end{bmatrix}\\\
p &= \sum_{k=0}^{m} x_k, \quad q = \sum_{k=0}^{m} y_k, \quad r = \sum_{k=0}^{m} x_k y_k, \quad s = \sum_{k=0}^{m} x_k^2\\\
\rightarrow\begin{bmatrix} s & p \\ p & m + 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} &= \begin{bmatrix} r \\ q \end{bmatrix}\\\
\leftrightarrow A\mathbf{z} &= \begin{bmatrix} x_0 & 1 \\ x_1 & 1 \\ \vdots & \vdots \\ x_m & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_m \end{bmatrix} = \mathbf{f}\space\text{is overdetermined}\\\
&\\\
\end{aligned}
$$

$$
\begin{aligned}
\text{Solving linear system: }r &= b - Ax\\\
||r||_2^2 &= \sum_{i=1}^{m}r_i^2 = \sum_{i=1}^{m}(b_i-\sum_{j=1}^{n}a_{ij}x_j)^2\\\
\text{Let } \phi(x) &= \frac{1}{2}\|r\|^2_2 = \frac{1}{2} \sum_{i=1}^{m} (b_i - \sum_{j=1}^{n} a_{ij}x_j)^2\\\
\text{Conditions}: \frac{\partial \phi}{\partial x_k} &= 0, \quad k = 1, \cdots, n\\\
0&=\sum_{i=1}^{m}(b_i-\sum_{j=1}^{n}a_{ij}x_j)(-a_{ik})\\\
\rightarrow \sum_{i=1}^{m}a_{ik}\sum_{j=1}^{n}a_{ij}x_j &= \sum_{i=1}^{m}a_{ik}b_i, k=1,\cdots,n\space (\text{equivalent to } A^{T}Ax=A^{T}b)\\\
\end{aligned}
$$

$$
\begin{aligned}
A^T Ax &= A^T b \space\text{is called the normal equations}\\\
\text{If }A \text{ has full column rank}, &\min_{x} \|b - Ax\|_2\space\text{has the unique solution:}\\\
x&=(A^TA)^{-1}A^Tb=A^{+}b\\\
\end{aligned}
$$

$$
\begin{aligned}
\text{Adaptive Simpson: find } &Q \space : \space |Q - I| \leq \text{tol}\\\
I &= \int_{a}^{b} f(x) \, dx = S(a, b) + E(a, b) \\\
S_1=S(a, b) &= \frac{h}{6} \left[ f(a) + 4f\left( \frac{a + b}{2} \right) + f(b) \right] \\\
E_1=E(a, b) &= -\frac{1}{90} \left( \frac{h}{2} \right)^5 f^{(4)}(\xi), \quad \xi \text{ between } a \text{ and } b\\\
\end{aligned}
$$

$$
\begin{aligned}
S =\space&\text{quadSimpson}(f, a, b, \text{tol})\\\
&h = b - a, \quad c = \frac{a + b}{2}\\\
&S_1 = \frac{h}{6} [f(a) + 4f\left(\frac{a+b}{2}\right) + f(b)]\\\
&S_2 = \frac{h}{12} [f(a) + 4f\left(\frac{a+c}{2}\right) + 2f(c) + 4f\left(\frac{c+b}{2}\right) + f(b)]\\\
&\tilde{E}_2 = \frac{1}{15}(S_2 - S_1)\\\
&\text{if } |\tilde{E}_2| \leq \text{tol}\\\
&\space\space\text{return } Q = S_2 + \tilde{E}_2 \\\
&\text{else}\\\
&\space\space Q_1 = \text{quadSimpson}(f, a, c, \text{tol}/2)\\\
&\space\space Q_2 = \text{quadSimpson}(f, c, b, \text{tol}/2)\\\
&\space\space\text{return } Q = Q_1 + Q_2 \\\
\end{aligned}
$$

### Newton’s Method for Nonlinear equations

$x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}$

Convergence: if $f, f', f''$ are continuous in a neighborhood of a root $r$ of $f$ and $f'(r) \neq 0$, then $\exists\space\delta>0$ such that if $|r-x_0|\leq{\delta}$, then for all $x_n$: $|r-x_n|\leq{\delta}$ and $|r-x_{n+1}|\leq
c(\delta)|r-x_n|^2$

$|e_{n+1}|\leq c(\delta)|e_n|^2$ (quadratic convergence, order 2), where

$c(\delta)=\frac{1}{2}\cdot\frac{\max_{|r-x|\leq{\delta}}|f''(x)|}{\min_{|r-x|\leq{\delta}}|f'(x)|}$

For a nonlinear system: denote $\mathbf{x}=(x_1,x_2,\cdots,x_n)^T$ and $\mathbf{F}=(f_1,f_2,\cdots,f_n)^T$, find $\mathbf{x}^{*}$ such that $F(x^{*})=0$

$$
\begin{aligned}
F(x^{(k)}) + F'(x^{(k)})(x^{(k+1)}-x^{(k)}) &= 0\\\
F'(x^{(k)}) \space&\text{is the Jacobian of } \mathbf{F} \space\text{at } x^{(k)}\\\
\text{Let } \mathbf{s} &= \mathbf{x}^{(k+1)} - \mathbf{x}^{(k)}\\\
\therefore\space F'(x^{(k)})s &= -F(x^{(k)})\\\
\mathbf{x}^{(k+1)} &= \mathbf{x}^{(k)} + \mathbf{s}\\\
\end{aligned}
$$

### IVP in ODEs.

$$
\begin{aligned}
\text{Given } y'&=f(t,y), y(a)=c, \text{ find } y(t) \text{ for } t\in[a,b]\\\
y' &\equiv y'(t) \equiv \frac{dy}{dt}\\\
\text{System of n first-order: } y' &= f(t,y), f: \mathbb{R} \times \mathbb{R}^n \rightarrow \mathbb{R}^n\\\
\end{aligned}
$$

$$
\begin{aligned}
\text{Forward Euler's method (explicit): } y(t_{i+1}) &\approx y_{i+1} = y_i + hf(t_i, y_i)\\\
\text{where: }h &= \frac{b-a}{N}, N > 1\\\
h &= \text{step size}\\\
t_0 &= a, t_i=a+ih, i=1,2,\cdots,N\\\
\end{aligned}
$$

$\text{Backward Euler's method (implicit): } y_{i+1} = y_i + hf(t_{i+1}, y_{i+1})$

> Non-linear, then apply Newton’s method

$$
\begin{aligned}
\text{FE Stability: } y'&=\lambda{y},y(0)=y_0\\\
\text{Exact sol: } y(t)&=y_0e^{\lambda{t}}\\\
\text{FE sol with constant stepsize h: } y_{i+1}&=(1+h\lambda)y_i=(1+h\lambda)^{i+1}y_0\\\
\text{To be numerically stable: } h&\leq{\frac{2}{|\lambda|}}\\\
&\\\
\text{BE Stability: } y'&=\lambda{y},y(0)=y_0\\\
|y_{i+1}| &= \frac{1}{|1-h\lambda|}|y_i| \leq |y_i|\space\forall\space h > 0 \\\
\end{aligned}
$$

### Order, Error, Convergence and Stiffness

$$
\begin{aligned}
\text{Local truncation error of FE: } &d_i = \frac{y(t_{i+1}) - y(t_i)}{h} - f(t_i, y(t_i)) = \frac{h}{2}y''(\eta_i)\space\text{(q=1)}\\\
\text{Local truncation error of BE: } &d_i = -\frac{h}{2}y''(\xi_i)\space\text{(q=1)}\\\
\end{aligned}
$$

$\text{A method is of order }q\text{ if }q\text{ is the lowest positive integer such that, for any smooth exact solution }y(t):\max_{i}|d_i|=O(h^q)$

$$
\begin{aligned}
\text{Global error: } e_i &= y(t_i) - y_i, i=0,1,\cdots,N\\\
\text{Consider } u' &= f(t,u), u(t_{i-1}) = y_{i-1}, \space\text{local error: }l_i=y_i - u(t_i)\\\
\end{aligned}
$$

$$
\begin{aligned}
\text{Convergence: } &\max_i e_i = \max_i |y(t_i) - y_i| \rightarrow 0 \text{ as } h \rightarrow 0\\\
\end{aligned}
$$

> Stiffness is when the stepsize is restricted by stability rather than accuracy

### Runge-Kutta Methods

$$
\begin{aligned}
\text{Implicit trapezoidal: } y'(t) &= f(t,y), y(t_i)=y_i\\\
y_{i+1} &= y_i + \frac{h}{2} [f(t_i, y_i) + f(t_{i+1}, y_{i+1})]\\\
d_i = O(h^2) &= \frac{y(t_{i+1})-y(t_i)}{h}-\frac{1}{2}[f(t_i,y(t_i)) + f(t_{i+1},y(t_{i+1}))]\\\
&\\\
\text{Explicit trapezoidal: } Y&=y_i+hf(t_i,y_i)\\\
y_{i+1} &= y_i + \frac{h}{2} [f(t_i, y_i) + f(t_{i+1}, Y)]\\\
d_i = O(h^2) &= \frac{y(t_{i+1})-y(t_i)}{h}-\frac{1}{2}[f(t_i,y(t_i)) + f(t_{i+1},y(t_i)+hf(t_i,y(t_i)))]\\\
&\\\
\text{Implicit midpoint: } y_{i+1} &= y_i + hf(t_i+h/2, (y_i+y_{i+1})/2)\\\
\text{Explicit midpoint: } Y &= y_i + \frac{h}{2}f(t_i, y_i)\\\
y_{i+1} &= y_i + hf(t_i+h/2, Y)\\\
\end{aligned}
$$

Classical RK4: based on Simpson’s quadrature rule, $O(h^4)$ accuracy

$$
\begin{align*}
Y_1 &= y_i \\\
Y_2 &= y_i + \frac{h}{2}f(t_i, Y_1) \\\
Y_3 &= y_i + \frac{h}{2}f(t_i + \frac{h}{2}, Y_2) \\\
Y_4 &= y_i + hf(t_i + \frac{h}{2}, Y_3) \\\
y_{i+1} &= y_i + \frac{h}{6} [f(t_i, Y_1) + 2f(t_i +
\frac{h}{2}, Y_2) + 2f(t_i + \frac{h}{2}, Y_3) + f(t_{i+1}, Y_4)]\\\
\end{align*}
$$

---

slug: thoughts/university/twenty-three-twenty-four/compsci-4x03/index
tags:
  - university
  - swfr4x03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/index"
title: Scientific Computation
date: 2023-09-04

---

Introduction to Scientific Computation

---

slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors
tags:
  - eng3px3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors"
title: Conversion Factors
date: 2024-01-23

---

See also: [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors.pdf) and [this one](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/3PX3-04-Conversion-Factors.pdf)

To be relevant to the economic analysis process, a factor must be:

- explicitly incorporated into the [NVF](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Net-value-analysis) by giving it a _conversion factor_
- included as a hard constraint

> conversion factor: converts benefits and costs into common units

Determinants:

- time, cost of labour, opportunity cost
- marginal NV and quantity-dependent conversion factors

### cost of labour.

- wages
- materials
- overhead: HR, tools/equipment

### cost of time.

- overtime shifts, extra work, or outsourcing?
- additional factor: happiness, time already spent (context: not all time is equal)

### opportunity cost.

> negative impact from having to give up the best alternative

> [!tip] Important
>
> Should always consider this when going forward with a project.

- cost of those forgone alternatives in _conversion units_
- costs for not solving other problems
- compare NV for solving the other one.

> Double counting: including mutually exclusive alternatives together in the NVF counts the same value twice.

### conversion function.

- quantity-dependent conversion factors

$$
NV_{\text{oranges}}(x) = B_{\text{oranges}}(x) - C_{\text{oranges}}(x)
$$

### marginal value change.

> extra net value obtained for one more item

$$
\Delta NV = NV(x+1) - NV(x)
$$

### environmental impact conversion.

> externalities: an externality of a decision is an impact (benefit or cost) on people _other_ than the decision makers.

> externalities don’t carry the same weight as direct benefits and costs (a failure of incentives)

Correct this failure with policies:

- taxes: carbon emission
- subsidies

### economics of GHG emissions.

- changes over time and is relatively hard to calculate accurately.
- a 2022 study in Nature estimates it at $\$185/\text{tonne}$

### health costs.

- difficult to answer this, but the most common pollutants are $PM_{2.5}$ (fine particulate matter) and $NO_x$ (nitrogen oxides)

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/table-health-costs.webp)

### ethical consideration

> [!question] ethical
>
> - What is the cost of negative societal/ethical/equality impact?
> - Can you put the price on safety?
F-N graph

Emission cost for $PM_{2.5}$ per year is

$$
\begin{align*}
& = \frac{\text{Health cost per year}}{\text{total emissions}} \cdot \text{emissions of power generation} \cdot \frac{1}{\text{total annual generation}} \\\
& = \frac{\$166e9}{3.5e6\space \text{tonnes}} \cdot 6000 \text{ tonnes} \cdot \frac{1}{640e9 \text{ kWh}} \\\
&= \$0.0004446429 \text{ per kWh} \\\
\end{align*}
$$

---

slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Finals
tags:
  - eng3px3
description: Economics for engineer, a guide.
title: Economics for engineer, a guide.
date: 2024-04-12

---

## samples.

4.b 11.e 12.c 13.d 14.b 15.a 16.e 17.c 18.c 19.b 20.c 21.b 22.c 23.b 24.a 25.a 26.e 27.a 28.e 29.a 30.a

---

## [net value function](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Net-Value-Function)

$$
\text{NVF} = \text{benefit} - \text{cost}
$$

## conversion factors

### marginal value change.

> extra net value obtained for one more item

$$
\Delta NV = NV(x+1) - NV(x)
$$

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors#marginal-value-change)

## optimisation

# model-based

- conclusions from the model of the system

Components:

- decision variables
- constraints
- objectives
- functions: mathematical function that determines the objective as a function of the decision variables

$$
\begin{align*}
\min_{x} \phi = f(x) & &\leftarrow &\space \text{Objective function} \\\
\text{s.t} & &\leftarrow &\space \text{Constraints} \\\
h(x) = 0 & &\leftarrow &\space \text{Equality constraints} \\\
g(x) \leq 0 & &\leftarrow &\space \text{Inequality constraints} \\\
x_{lb} \leq x \leq x_{ub} & &\leftarrow &\space \text{Bounds}
\end{align*}
$$

## decision variables

### discrete.

> limited to a fixed or countable set of values

$$
x_{\mathcal{D}} \mid a \in \mathcal{I} = \lbrace 1, 2, 3, 4, 5 \rbrace
$$

### continuous.
> can take any value within a range

$$
x_{\mathcal{C}} \subset \mathbb{R}
$$

## constraints

- physical limitations: cannot purchase negative raw materials
- model assumptions: assumptions about the system

> [!tip]
>
> a decision variable's upper and lower bounds ($x^{\mathcal{U}}$ and $x^{\mathcal{L}}$)

> [!note] Properties
>
> - **Active/binding**: $\exists \space x^{*} \mid g(x^{*}) = 0$
> - **Inactive**: $\exists \space x^{*} \mid g(x^{*}) < 0$

### graphing models

> [!note] feasible set of an optimization model
>
> The collection of decision variables that satisfy all constraints
>
> $$
> \mathcal{S} \triangleq \lbrace x : g(x) \leq 0, h(x) = 0, x^L \leq x \leq x^U \rbrace
> $$

## outcomes

> [!tip] optimal value
>
> the optimal value $\phi^{*}$ is the value of the objective at the optimum(s)
>
> $$
> \phi^{*} \triangleq \phi(x^{*})
> $$

> Constraints satisfied, but not binding

Linear optimization problems

$$
\begin{aligned}
\underset{x_1,x_2}{\min} \space \phi &= 50x_1 + 37.5x_2 \\
&\text{s.t} \\\
0.3x_1 + 0.4x_2 &\geq 2000 \\\
0.4x_1 + 0.15x_2 &\geq 1500 \\\
0.2x_1 + 0.35x_2 &\leq 1000, \\\
x_1 &\leq 9000 \\\
x_2 &\leq 6000 \\\
x_i &\geq 0
\end{aligned}
$$

See also [Linear Optimization](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization/../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization)

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization#model-based)

Linear optimization:

```math
\begin{align*}
\min_{x} \phi = c^\mathbf{T} \mathcal{x} & &\leftarrow &\space \text{Objective function} \\\
\text{s.t} & &\leftarrow &\space \text{Constraints} \\\
A_h \mathcal{x} = \mathcal{b}_h & &\leftarrow &\space \text{Equality constraints} \\\
A_g \mathcal{x} \leq \mathcal{b}_g & &\leftarrow &\space \text{Inequality constraints} \\\
\mathcal{x}_{lb} \leq \mathcal{x} \leq \mathcal{x}_{ub} & &\leftarrow &\space \text{Variable Bounds}
\end{align*}
```

[Link to original](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization#linops)

## time value of money

### interest

Interest $I$ is the compensation for loaning money.

> [!tip] interest rate
>
> $i = \frac{I}{P}$. Thus $F = P(1+i)$

> [!tip] Simple interest
>
> $I_{\text{each}} = P \times \frac{i}{\text{year}}$, total interest $I = I_{\text{each}} \times N_{\text{year}}$
>
> $F_n = P(1 + ni)$

> [!tip] Compound interest
>
> $F_n = P(1+i)^n$

> [!tip] nominal interest rate
>
> $r$ is the equivalent yearly rate if interest is withdrawn so it doesn't compound (i.e. $r=mi$ where $m$ is the number of compounding periods per year)

> [!tip] effective annual interest rate
>
> $i_{\text{eff}} = (1 + \frac{r}{m})^m - 1$

> [!tip] effective interest rates
>
> how much interest do you accrue after a year if the nominal rate is 12%?
> $F=P(1+i)^m=P(1+\frac{r}{m})^m$

> [!tip] continuous compounding
>
> $F = P e^{ry}$

### net present value

$$
\text{NPV} = \text{CF}_0 + \sum_{n=1}^{N}{\frac{\text{CF}_n}{(1+i)^n}}
$$

where $\text{CF}_0$ is the initial cash flow, $\text{CF}_n$ is the cash flow at the end of the $n^{th}$ period, and $i$ is the _effective interest rate_

> [!tip] discount rate
>
> Present value $PV = \frac{\text{CF}_t}{(1+r_d)^t}$, where $\text{CF}_t$ is a cash flow happening $t$ years in the future, and $r_d$ is the discount rate.
>
> sources: opportunity cost, inflation, risk, time preference, option premium

regular deposit: Future value $FV = A \sum_{k=0}^{n-1}(1+i)^k = A \frac{(1+i)^n - 1}{i}$ where $A$ is the monthly, or time period, deposit.

the fraction of the last payment that was interest is $\frac{i}{1+i}$; the principal of the last payment is $A = F_{\text{last}}(1+i)$

> [!tip] geometric series
>
> $$
> \sum_{k=0}^{n-1}r^k = \frac{1-r^n}{1-r}
> $$

### inflation

> [!tip] real vs. nominal
>
> nominal value refers to the actual cash flow at the time it happens; real value refers to the equivalent amount of value at a reference time, converted using inflation rates.
>
> real dollars $R = \frac{\text{CF}_n}{(1+r_i)^n}$, where $\text{CF}_n$ is the nominal cash flow at time $n$, and $r_i$ is the effective yearly inflation rate.

> [!tip] internal rate of return
>
> the discount rate that results in an NPV of zero (break-even scenario)
>
> $$
> \text{CF}_0 + \sum_{n=1}^{N}{\frac{\text{CF}_n}{(1+r_{\text{IRR}})^n}} = 0
> $$

> [!tip] minimum acceptable rate of return
>
> a rate of return set by stakeholders that must be earned for a project to be accepted
>
> real vs. nominal MARR: real MARR applies if returns are calculated using real dollars, nominal MARR if returns are calculated using nominal dollars.
>
> $\text{MARR}_{\text{real}} = \frac{1+\text{MARR}}{1+f} - 1$ where $f$ is the inflation rate

## risk management and stochastic modelling

> Convert to dollars/wk to base calculations on the same unit

uncertainty: evaluate likeliness and potential impact, organize into a risk matrix, determine expected impact, then propose mitigation strategies

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/most-critical-risk.webp)

> [!tip] expected impact
>
> the chance it happens multiplied by the impact it will have if it happens.
> $\text{E[NPV]} = \sum_{i}{\text{NPV}(x_i)p(x_i)}$
>
> Then use this to create the necessary mitigation

### NPV with risk and uncertainty

> [!note] probability distribution
>
> $p(x)$ of a discrete random variable $x$: normalization requires that $\sum_{i}{p(x_i)} = 1$
>
> PDF (probability density function) $p(x)$ of a continuous random variable $x$: normalization requires that $\int{p(x)dx} = 1$

> [!tip] expected value for reducing stochastic to deterministic
>
> of a function $f(x)$: $\text{E}[f] = \sum_{i}{f(x_i)p(x_i)}$ for a discrete random variable $x$ with probability distribution $p(x)$
>
> of a function $f(x)$: $\text{E}[f] = \int_x{f(x)p(x)dx}$ for a continuous random variable $x$ with PDF $p(x)$

> [!note] Normal distribution
>
> $f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
>
> `NORM.DIST(x, mean, stddev, cumulative)`: cumulative is `1` for CDF, `0` for PDF

`NORM.INV(RAND(), 0.5, 0.05)`: draw values from a normal distribution with mean 0.5 and stddev 0.05

### non-linear deterministic and stochastic models

the mean value $\mu_{x}$ of a random variable $x$ is its own expected value $\text{E}[x]$, the variance $\sigma^2_{x}$ is the expected value of the squared deviation from the mean $\text{E}[(x-\mu_x)^2]$, and the stddev is $\sigma_x$

> [!tip] central limit theorem
>
> as the sample size becomes large enough, the distribution of the sample mean will be approximately normally distributed, regardless of the distribution of the population; simulated using [Monte-Carlo](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/Monte-Carlo)

> Expected value of linear and nonlinear functions: suppose $x$ and $y$ are independent random variables with means $\mu_x$ and $\mu_y$, and variances $\sigma^2_x$ and $\sigma^2_y$; then $E[x^{2}] = \sigma_x^2 + \mu_x^2$ and $E[xy] = \int \int xy\,p_xp_y\,dx\,dy=\int xp_xdx \int yp_ydy=\mu_x \mu_y$

Dealing with 12 months per year: outcomes over a year should be **normally distributed** (CLT), with mean given by the expected value of the monthly outcome and stddev given by the stddev of the monthly outcome divided by the square root of the # of rolls ($\sqrt{12}$)

---

## project management and CPM

- scope, cost, time to maximize quality

WBS (work breakdown structure): hierarchical decomposition of the total scope of work

CPM (critical path method): determine the longest path through the network, the critical path, and hence the shortest time to complete the project

![cpm.webp](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/cpm.webp)

crashing a project means using additional resources to shorten a specific task

## supply and demand

market equilibrium: where the supply and demand curves intersect; quantity demanded equals quantity supplied.

shift to the right: greater demand, higher price, higher quantity. shift to the left: lower demand, lower price, lower quantity.

factors of production: land, labour, capital, entrepreneurship

determinants of demand:

- price: quantity demanded $Q_d$ falls when price $P$ rises and vice versa
- prices of related goods: substitutes and complements

determinants of supply:

- price: quantity supplied $Q_s$ rises when price $P$ rises and vice versa
- factors of production
- fiscal policies, taxes, regulation

> [!tip] elasticity: how responsive quantity demanded or supplied is to a change in price.
>
> Surplus when $Q_s > Q_d$, shortage when $Q_s < Q_d$.
> Elasticity of demand: $E_d = \frac{\% \Delta Q_d}{\% \Delta P} = \frac{\mid \frac{P}{Q_D} \mid}{\mid \frac{dP}{dQ_D} \mid}$
>
> Elasticity of supply: $E_s = \frac{\% \Delta Q_s}{\% \Delta P} = \frac{\mid \frac{P}{Q_S} \mid}{\mid \frac{dP}{dQ_S} \mid}$
>
> higher slope corresponds to lower elasticity (inelastic); lower slope corresponds to higher elasticity (elastic)

Demand elasticity: $E_D <1$ means that if price increases by 5%, demand will decrease by less than 5% (inelastic). $E_D >1$ means that if price increases by 5%, demand will decrease by more than 5% (elastic).

> [!tip] taxes
>
> arbitrarily lower the equilibrium quantity;
>
> how the price seen by consumers vs. suppliers changes depends on the relative elasticities of demand and supply: more of the price change ends up on the less elastic side
>
> how much quantity changes depends on the total elasticities of demand and supply: more elastic means more quantity change.

> [!tip] subsidies
>
> arbitrarily increase the equilibrium quantity;
>
> how the price seen by consumers vs. suppliers changes depends on the relative elasticities of demand and supply: more of the price change ends up on the less elastic side
>
> how much quantity changes depends on the total elasticities of demand and supply: more elastic means more quantity change.

## behavioural economics

invisible hand of the market: the self-interest of individuals leads to the best outcome for society as a whole in a free market economy, as rational actors are motivated by incentives.

perfect competition: wheat (control of price: none, low barrier to entry, high # of producers, products are identical)

monopolistic competition: restaurants (control of price: low, low barrier to entry, high # of producers, products are similar)

oligopoly: airlines (control of price: high, high barrier to entry, few producers, products are similar)

monopoly: utilities (control of price: high, high barrier to entry, one producer, unique product)

game theory, most notably [The Prisoner’s Dilemma](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/The-Prisoner's-Dilemma)

anti-trust legislation: prevent monopolies, promote competition, protect consumers

> behavioural economics: adds psychology to look at reasons people make _irrational_ decisions
>
> “bounded rationality”: you don’t have perfect information, and understand there’s an opportunity cost to get it

law of demand and the _ultimatum game_: people will pay less for a good if they can get it elsewhere for less, even if they value it more than the price they pay.

[Cooperation](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/Cooperation): R. Axelrod’s _The Evolution of Cooperation_ proposes a “strategy”: what you do depends on what the other person does.

PPF (production possibility frontier): trade-offs between two goods, given a fixed amount of resources.

risk aversion: people prefer a certain outcome to a risky one, even if the expected value of the risky one is higher. ⇒ assume that the given investment is a loss, then calculate based on marginal gains

## tax, incentives and depreciations

_income, corporate, property, sales_

personal income tax: progressive tax rate

corporate tax: flat tax rate, regardless of income level → net income: subtracting expenses from gross income.

Profit on investments will be taxed. If one investment yields a loss, the loss can be offset against the profits from another to pay less tax overall.
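As a toy illustration of offsetting (all numbers invented): with a flat corporate rate of $25\%$, a $\$100\text{k}$ profit on one investment and a $\$30\text{k}$ loss on another are taxed on the net,

$$
0.25 \times (\$100\text{k} - \$30\text{k}) = \$17.5\text{k} \quad \text{instead of} \quad 0.25 \times \$100\text{k} = \$25\text{k}
$$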
[optimization](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization) strategies: minimize liabilities, time expenditures → incorporate into financial models, do sensitivity analysis

before-tax MARR: set MARR high enough to include the taxes that need to be paid ⇒ for an investment's gross profit

after-tax MARR: if tax is explicitly accounted for in the cash flows of the project, then MARR should be lower ⇒ for final investment decisions

$$
MARR_{\text{after-tax}} = MARR_{\text{before-tax}} \times (1 - \text{corporate tax rate})
$$

_incentives_: tax credits, tax reliefs, programs to encourage certain activities

_depreciation_: due to use-related physical loss, technological obsolescence, functional loss, or market fluctuation.

> Depreciation is a non-cash expense, but it reduces the taxable income of a business. One can deduct annually by spreading the cost of an asset over its useful life.

affects NPV (net present value), IRR (internal rate of return), and payback period calculations

_Market value_: the actual amount the asset can be sold for, estimated

_Book value_: the depreciated value of the asset, using a depreciation model

_Salvage value_: the estimated value of the asset at the end of its useful life

> [!tip] value calculations
>
> Depreciation in year $n$, $D(n)$, is the decline in book value over that year: $BV(n) = BV(n-1) - D(n)$
>
> Salvage value $SV$ is the book value at the object’s EOL: $SV = BV(N) = MV(0) - \sum_{n=1}^{N} D(n)$

> [!note] Straight-line depreciation
>
> spreads uniformly over the useful life; SLD for a period is $D_{\text{sl}}(n) = \frac{\text{Purchase price}-\text{Salvage value after N periods}}{\text{N periods of useful life}}$.
>
> book value at the end of the $n^{th}$ year: $BV_{\text{sl}}(n) = P - n \times \frac{P-S}{N}$

> [!note] Declining-balance depreciation
>
> different assets are classified into classes: $D_{\text{db}}(n) = BV_{\text{db}}(n-1) \times d \space (\text{depreciation rate})$, such that the book value at the end of a period is $BV_{\text{db}}(n) = P(1-d)^n$
>
> given salvage value $S$ and period of useful life $N$, the depreciation rate is $d = 1 - \sqrt[N]{\frac{S}{P}}$

> [!note] Sum-of-years-digits depreciation
>
> $D_{\text{syd}}(n) = \frac{N-n+1}{\sum_{i=1}^{N} i} \times (P-S)$

> [!note] Unit of production depreciation
>
> $D_{\text{uop}}(n) = \frac{\text{units produced in period}}{\text{life in \# of units}} \times (P - S)$
>
> assumes SLD but vs. # of units rather than time.
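A minimal Python sketch of the three time-based models above, in the spirit of the course's Python examples; the function names and the $P=10000$, $S=1000$, $N=5$ figures are illustrative assumptions, not from the course:

```python
def sld(P, S, N):
    """Straight-line: the (P - S) depreciable base spreads uniformly over N periods."""
    return [(P - S) / N] * N

def declining_balance(P, S, N):
    """Declining-balance: constant rate d = 1 - (S/P)**(1/N) applied to book value."""
    d = 1 - (S / P) ** (1 / N)
    bv, ds = P, []
    for _ in range(N):
        ds.append(bv * d)
        bv -= ds[-1]
    return ds  # book value ends at S (up to rounding)

def syd(P, S, N):
    """Sum-of-years-digits: weight (N - n + 1) / (1 + 2 + ... + N) on the base."""
    total = N * (N + 1) // 2
    return [(N - n + 1) / total * (P - S) for n in range(1, N + 1)]

P, S, N = 10_000, 1_000, 5
for model in (sld, declining_balance, syd):
    D = model(P, S, N)
    # every model depreciates the asset from P down to (approximately) S
    print(model.__name__, [round(x) for x in D], round(P - sum(D)))
```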
---

slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization
tags:
  - eng3px3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization"
title: Linear Optimization in Economics Analysis
date: 2024-02-08

---

See also [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/3PX3-08---Linear-Optimization.pdf), [optimization](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization)

Linearization around [first order Taylor series](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/Equations#taylor-series) expansions

Usage:

- Resource allocation
- Project selection
- Scheduling and Capital budgeting
- Energy network optimization

> [!tip] Criteria for optimization models
>
> - comprised of only **continuous variables**
> - **linear objective function**
> - only **linear** equality and inequality constraints

$$
\begin{align*}
\min_{x} \phi = c^\mathbf{T} \mathcal{x} & &\leftarrow &\space \text{Objective function} \\\
\text{s.t} & &\leftarrow &\space \text{Constraints} \\\
A_h \mathcal{x} = \mathcal{b}_h & &\leftarrow &\space \text{Equality constraints} \\\
A_g \mathcal{x} \leq \mathcal{b}_g & &\leftarrow &\space \text{Inequality constraints} \\\
\mathcal{x}_{lb} \leq \mathcal{x} \leq \mathcal{x}_{ub} & &\leftarrow &\space \text{Variable Bounds}
\end{align*}
$$

where:

- $\mathcal{x} \rightarrow j^{\text{th}}$: decision variables
- $c \rightarrow j^{\text{th}}$: cost coefficients of the $j^{\text{th}}$ decision variable
- $a_{i, j}$: constraint coefficient for variable $j$ in constraint $i$
- $b_i \rightarrow \text{RHS}$: coefficient for constraint $i$
- $(A_k \mid k = \lbrace \mathcal{h}, \mathcal{g} \rbrace)$: matrix of size $\lbrack m_k \times n \rbrack$

## Sensitivity reports

### Decision variables

**Reduced cost**: the amount the objective function will change if variable bounds are tightened

**Allowable increase/decrease**: how much an objective coefficient must change before the optimal solution changes.

> [!note]
>
> If there are simultaneous changes to objective coefficients, and $\sum_{\text{each coefficient}}(\frac{\text{Proposed change}}{\text{Allowable change}}) \leq 100 \%$, then the optimal solution _would not change_.

### Constraints

**Final value**: the value of the constraint at the optimal solution

**Shadow price**: of a constraint is the marginal improvement of the objective function value if the RHS is increased by 1 unit.

**Allowable increase/decrease**: how much the constraint can change before the shadow price changes.
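A minimal sketch of where these report quantities come from, using an invented toy LP (maximise $3x_1 + 5x_2$; the LP and its numbers are assumptions for illustration, not the course's lemon/orange model). With SciPy's HiGHS backend, the constraint duals, i.e. the shadow prices, can be read off the result:

```python
from scipy.optimize import linprog

# maximise 3*x1 + 5*x2  ->  linprog minimises, so negate the objective
c = [-3.0, -5.0]
A_ub = [[1.0, 0.0],   #   x1          <= 4
        [0.0, 2.0],   #        2*x2   <= 12
        [3.0, 2.0]]   # 3*x1 + 2*x2   <= 18
b_ub = [4.0, 12.0, 18.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2, method="highs")
print(res.x)                   # optimal decision variables: [2. 6.]
# marginals follow linprog's minimisation sign convention; negating gives the
# shadow price: improvement in the maximised objective per unit of extra RHS
print(-res.ineqlin.marginals)  # [0.  1.5 1. ] -> constraint 1 is not binding
```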
See [lemon\_orange.py](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/eng-3px3/lemon_orange.py)

---
slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Net-Value-Function
tags:
  - eng3px3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Net-Value-Function"
title: Net value function
date: 2024-01-09
---

See [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Net-Value-Function/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/3PX3-Net-Value-Functions.pdf)

## What is economics?

> Allocation of resources

> everything _has_ a cost

Cost-benefit analysis: _Jules Dupuit_

See [Economics evolving: a history of economic thought by _Agnar Sandmo_](https://press.princeton.edu/books/paperback/9780691148427/economics-evolving)

## Net Value functions

$$
\text{Net value = [Benefit] - [Cost]}
$$

- relativity
- perspective: $\text{(Benefit - Cost)}_{\text{client}}$

$$
\text{Benefits}_{\text{client}} > \text{Sale Price} > \text{Cost}_{\text{producer}}
$$

$$
\text{System Net Value =} \space \text{Benefits}_{\text{client}} - \text{Cost}_{\text{producer}}
$$

$$
\text{NVF = Benefits - Cost of space - Cost of time - ...}
$$

Unit matching and conversion

> [!notes] marginal value, quantity-dependent value
>
> the _marginal net value_ of buying an apple is the change in NV from buying one more apple (the slope of the NVF with respect to the number of apples bought); each subsequent item may give more NV or lower costs.

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Net-Value-Function/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/marginal-apple-q.webp)

---
slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Net-value-analysis
tags:
  - eng3px3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Net-value-analysis"
title: Net Value Analysis
date: 2024-01-16
---

See [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Net-value-analysis/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Engineering-Economics--and--Net-Value-Applications.pdf)

> determine which option has the most positive net value for the lab

$$
\text{Net value}_{\text{Lab}} = \text{Benefits}_{\text{Lab}} - \text{Cost}_{\text{Lab}}
$$

Simulation: gold nanoparticles

$$
\begin{aligned}
NV_P \text{ (relative to purchasing)} &= NV_P - NV_P = 0 \\\
NV_F \text{ (relative to purchasing)} &= NV_F - NV_P = C_P - C_F \\\
NV_{nR} \text{ (relative to purchasing)} &= NV_{nR} - NV_P = C_P - C_{nR} \\\
\end{aligned}
$$

### relative to purchasing

$$
NV = \$896 \, \text{week}^{-1} - \left( \frac{\$5}{100 \, \text{mL}} q_{\text{ingred}} + \frac{\$12.5}{\text{hr}} t_{\text{FumeHood}} + \frac{\$100}{\text{hr}} t_{\text{SEM}} + \frac{\$15}{\text{hr}} t_{\text{GradStudent}} + C_{\text{other}} \right)
$$

---
slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Non-linear-optimization
tags:
  - eng3px3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Non-linear-optimization"
title: Non-linear Optimization
date: 2024-04-12
---

See also [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Non-linear-optimization/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/3PX3-09---Nonlinear-Optimization.pdf)

---
slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization
tags:
  - eng3px3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization"
title: Economic Optimization
date: 2024-02-01
---

See also [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/3PX3-07-Optimization-Problem-Formulation.pdf)

# model-based

- conclusions from the model of the system

Components:

- decision variables
- constraints
- objectives
- functions: the mathematical function that determines the objective as a function of the decision variables

$$
\begin{align*}
\min_{x} \phi = f(x) & &\leftarrow &\space \text{Objective function} \\\
\text{s.t} & &\leftarrow &\space \text{Constraints} \\\
h(x) = 0 & &\leftarrow &\space \text{Equality constraints} \\\
g(x) \leq 0 & &\leftarrow &\space \text{Inequality constraints} \\\
x_{lb} \leq x \leq x_{ub} & &\leftarrow &\space \text{Bounds}
\end{align*}
$$

## decision variables

### discrete.

> limited to a fixed or countable set of values

$$
x_{\mathcal{D}} \in \mathcal{I} = \lbrace 1, 2, 3, 4, 5 \rbrace
$$

### continuous.

> can take any value within a range

$$
x_{\mathcal{C}} \subset \mathbb{R}
$$

## constraints

- physical limitations: cannot purchase negative raw materials
- model assumptions: assumptions about the system

> [!tip]
>
> a decision variable's upper and lower bounds ($x^{\mathcal{U}}$ and $x^{\mathcal{L}}$)

> [!note] Properties
>
> - **Active/binding**: $\exists \space x^{*} \mid g(x^{*}) = 0$
> - **Inactive**: $\exists \space x^{*} \mid g(x^{*}) < 0$

### graphing models

> [!note] feasible set of an optimization model
>
> The collection of decision variables that satisfy all constraints
>
> $$
> \mathcal{S} \triangleq \lbrace x : g(x) \leq 0, h(x) = 0, x^L \leq x \leq x^U \rbrace
> $$

## outcomes

> [!tip] optimal value
>
> the optimal value $\phi^{*}$ is the value of the objective at the optimum(s)
>
> $$
> \phi^{*} \triangleq \phi(x^{*})
> $$
>
> a constraint can be satisfied without being binding

Linear optimization problems

$$
\begin{aligned}
\underset{x_1,x_2}{\min} \space \phi &= 50x_1 + 37.5x_2 \\\
&\text{s.t} \\\
0.3x_1 + 0.4x_2 &\geq 2000 \\\
0.4x_1 + 0.15x_2 &\geq 1500 \\\
0.2x_1 + 0.35x_2 &\leq 1000, \\\
x_1 &\leq 9000 \\\
x_2 &\leq 6000 \\\
x_i &\geq 0
\end{aligned}
$$

See also [Linear Optimization](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization)

---
slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Sensitivity-analysis
tags:
  - eng3px3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Sensitivity-analysis"
title: Sensitivity analysis
date: 2024-02-01
---

See [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Sensitivity-analysis/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/3PX3-06-Sensitivity-Analysis.pdf)

### Marginal analysis

> determining the impact of a decision on net value, especially when the decision is incremental (e.g., the change in NV with one more orange)

### Sensitivity analysis

> how sensitive the model (i.e., the NVF) is to changes in its inputs or parameters (like conversion factors).

= marginal analysis for each variable separately, comparing the results.
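Mechanically, this is just marginal analysis repeated once per input. A small sketch with a made-up NVF and ±10% perturbations (both the model and the numbers are illustrative assumptions, not the course's):

```python
# One-at-a-time sensitivity: perturb each parameter of a net value function
# separately and compare the resulting change in NV. Toy model for illustration.

def nvf(p: dict) -> float:
    """NV = benefits - costs."""
    return p["price"] * p["units"] - p["unit_cost"] * p["units"] - p["fixed_cost"]

base = {"price": 12.0, "units": 800.0, "unit_cost": 7.0, "fixed_cost": 1500.0}
base_nv = nvf(base)  # 2500.0

for name in base:
    for rel in (-0.10, +0.10):  # one input at a time, +/- 10%
        trial = dict(base, **{name: base[name] * (1 + rel)})
        print(f"{name:>10} {rel:+.0%}: dNV = {nvf(trial) - base_nv:+.1f}")
```

Ranking the resulting |dNV| values shows which inputs or conversion factors the model is most sensitive to, and therefore which estimates deserve the most research effort.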
---
slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Simple-Report
tags:
  - eng3px3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Simple-Report"
title: NVF for affordable housing
date: 2024-02-05
---

The high-level net value function is defined by performance parameters and conversion factors:

$$
\text{NVF} = \text{HouseSalesRevenue} - \text{LabourCost} - \text{EnergyCost} - \text{MaterialsCost} - \text{R\&D} - \text{UpfrontConstructionCost}
$$

The performance parameters and conversion factors are defined by the following:

- **Productivity and housing construction rate**: defined by the rate of construction and the number of prefabricated units sold. The conversion factor is applied to units sold, which generate revenue. The assumption here is that there is a specific quota of units to be sold that meets the production capacity, and that the price per unit sold is within market value.
- **Labour cost**: defined by the operational cost for a given number of workers to build the houses and automations. The conversion factors are derived from the average wage per hour, the number of working hours per year, and the number of workers on the project. The assumption here is that workers are paid according to their job and the number of hours they work (i.e. no overtime, \$35/hour, 45 hours/week).
- **Energy cost**: defined by the energy consumption for the construction and operation of the houses. The conversion factors are derived from the consumption per square foot, the total operational area, and the energy price. The assumption here is that the energy efficiency of the workplace is within standard, using the market price for energy consumption per kWh.
- **Materials cost**: defined by the cost of materials for the construction of the houses. The conversion factors are derived from the average cost of material per square foot, the total area of the houses, and the number of houses built per year. The assumption, similar to the above, is that there is a certain quota of houses to be built, and that the cost of raw materials required for construction meets standards and policies from rule makers.
- **R\&D**: defined by the yearly budget for research and development to innovate and improve both the construction and the prefabricated units. The conversion factor is a lump sum of the budget allocated for R\&D. The assumption is that the budget will be able to afford the best team and resources to innovate on the current design.
- **Upfront construction cost**: defined by the initial investment to start the operation, including factory setup, equipment purchases, compliance with building codes, and other considerations. The conversion factor is an amortized, one-time cost of the initial investment. This is reflected as the capital expenditure needed for the project.

Some of the following considerations are made for the aforementioned [NVF](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Simple-Report/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Net-Value-Function#net-value-functions), as well as the performance parameters and conversion factors:

- **Environmental**: The focus on material use is vital: all materials should be sustainable and non-toxic, to reduce emissions. Additionally, energy consumption would increase due to the utilisation of automation software and robotics.
  Therefore, the NVF should reflect that both `EnergyCost` and `MaterialsCost` would increase from the initial assumptions after due diligence.
- **Regulatory**: Compliance with building codes and the different considerations for prefabricated homes are recognized. Compliance can imply additional costs, and therefore `UpfrontConstructionCost` could increase. This could also affect the operational feasibility of the project if any regulatory requirements are not met.
- **Ethical and DEI**: Possible **DEI** concerns include the wage gap, a smaller workforce due to robots and automation, and the potential displacement of workers. Additionally, DEI considerations are also taken into account to provide affordable housing to different socio-economic classes, such that the project aligns with broader social objectives. However, this would also increase `UpfrontConstructionCost`, similar to the regulatory considerations.

---
slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Technical-Design
tags:
  - eng3px3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Technical-Design"
title: Technical Design
date: 2024-01-25
---

See also: [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Technical-Design/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/3PX3-05-Tech-Design.pdf)

> technical analysis: using science to determine how variables are related, in order to draw conclusions in an engineering-relevant context

- Licensing is not discipline-specific

> engineering design:
>
> - making decisions: _on the basis of engineering principles_
> - creating plans: _for someone to create/modify something_
> - for the benefit of humans

### terms.

1. Decision variables
   - what one could change about the design
2. Performance parameters
   - describe how well the realised design works, in ways relevant to the end users
   - can't control performance parameters directly

### optimum engineering design.

1. use **technical analysis** to determine decision variables
2. write the **NVF** in terms of _decision variables_
3. use **optimisation methods** to determine
   - the optimum set of decision variables
   - the corresponding value of the NVF
   - how sensitive the optimum set and resulting NVF are to changes in decision variables and other parameters

### validity and assumptions:

- push to one extreme

---
slug: thoughts/university/twenty-three-twenty-four/eng-3px3/index
tags:
  - university
  - eng3px3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/index"
title: Engineering Economics
date: 2024-01-09
---

Dr. [Matt Minnick](mailto:prof3px3@mcmaster.ca)

Objectives:

1. Economic principles to make decisions
2. Formulate a Net Value function to evaluate and compare the value & cost of alternative engineering decisions
3. Make assumptions or perform necessary research to cope with ambiguity and uncertainty in required tasks
4. Apply fundamentals of cost, price, present value, and other financial metrics
5. Manage group projects and interpersonal relations
6. Economic analysis
Progress check-in:

- One-pager on what you completed last week, Gantt chart, progress

---
slug: thoughts/university/twenty-three-twenty-four/hci-4hc3/Interaction-Critical-Evaluation
tags:
  - sfwr4hc3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/hci-4hc3/Interaction-Critical-Evaluation"
title: Interaction Critical Evaluation
date: 2023-09-25
---

The smartphone stands as a quintessential example of human-centered design in modern technology. Its interaction paradigm is built around the principles of affordances and signifiers; the touchscreen affords gestures such as taps, swipes, and pinches, while the graphical user interface is replete with icons and visual elements that signify their function and operation. For instance, a trash bin icon universally suggests deletion, and an envelope icon suggests messaging or email. The design of the smartphone interface also heavily relies on mappings; the spatial arrangement of apps on the home screen often corresponds to their frequency of use or importance, with the most essential apps placed at the bottom within easy reach of the thumb. Feedback is another critical aspect, with the device providing tactile, visual, or auditory responses to interactions, confirming actions such as sending a message or taking a photo. The smartphone's conceptual model is designed to be intuitive, often mirroring real-world objects and actions, which reduces the learning curve and makes the technology accessible to a broad audience.

However, despite the general usability, smartphones can sometimes lead to unintended interactions, such as accidental inputs when the device is in a user's pocket, commonly referred to as 'pocket dialing.' This phenomenon supports the hypothesis that while the design is highly optimized for intentional use, it can occasionally misinterpret unintentional user input as valid. Nonetheless, the smartphone's design is overwhelmingly helpful and useful, enabling a vast array of tasks to be performed with a single, portable device. It is a powerful testament to human-centered design, with its success lying in its ability to evolve continually, integrating feedback from millions of users to refine its interaction model. The smartphone not only accomplishes its intended tasks but also anticipates and adapts to user needs, often extending beyond its basic functions to serve as a camera, a GPS device, a gaming console, and much more, making it an indispensable tool in daily life.

The convection oven stove is a staple in many kitchens (including mine), offering a combination of traditional stove top cooking and the advanced technology of convection baking. In terms of affordances, the stove provides clear cues for interaction; burners afford placing pots and pans, and the oven affords inserting food for baking or roasting. The knobs and buttons are signifiers that indicate where to interact to adjust the temperature and settings. The design typically includes mappings that are logical and aligned with the user's expectations; for instance, turning a knob to the right often increases the heat, which is a standard convention in many cultures. Feedback is immediate and informative; the glow of an electric burner or the ignition click of a gas stove provides a clear indication that the stove is operational, while digital displays on the oven relay the temperature and cooking mode.
The conceptual model of a convection oven stove is built upon the user's familiarity with cooking appliances, leveraging analogies to traditional ovens and stoves while introducing new features like fan-assisted cooking, which improves heat distribution and cooking times. Despite these intuitive design elements, there can be unintended interactions or experiences. For example, there are knobs on my convection oven that are relatively confusing, and its software interface is often too complex for my daily usage. Additionally, the stove's flat surface can sometimes make it unclear whether a burner is hot, which can be a safety hazard if the only feedback is visual and not tactile. Observations that support these unintended interactions include anecdotal evidence of users accidentally leaving the convection feature on or off, misunderstanding the icons that indicate convection settings, or touching a hot surface without realizing it because the stove lacks adequate warning indicators for residual heat.

In conclusion, while the convection oven stove is designed to enhance the cooking experience by providing more uniform heat and faster cooking times, it is not without its usability challenges. The design is generally helpful, facilitating a wide range of cooking tasks, but it requires users to adapt and learn cooking techniques specific to convection, as well as the oven-specific interface. Improvements could be made to enhance the user experience, such as better signifiers for the convection feature and clearer safety warnings for hot surfaces.

Last but not least, a smart fridge represents a leap forward in kitchen appliance technology, integrating features such as inventory tracking, internet connectivity, and even internal cameras. The affordances of a smart fridge are similar to those of traditional refrigerators, such as storing food at cool temperatures, but they also include interactive touch screens and the ability to sync with other smart devices. Signifiers are evident in the design of the touch screen interface, which often uses icons and menus to indicate where to tap to access features like temperature control, shopping lists, or to view the contents of the fridge via an internal camera. Mappings in a smart fridge are designed to be intuitive; for instance, adjusting the temperature settings involves sliding a bar, which corresponds with the user's mental model of up for more and down for less. Feedback is provided through the touch screen with visual confirmation when a setting is changed, or when the fridge door is left open, sometimes accompanied by an auditory alert.

The conceptual model of a smart fridge is built upon the idea that a refrigerator can be more than just a cooling appliance; it can be a food management system. It assumes that users will understand and appreciate the additional functionalities, like being able to check the contents of their fridge from their smartphone while at the grocery store. However, smart fridges can introduce unintended interactions. I find the multitude of features overwhelming or non-essential, leading to underutilization of the technology. For instance, the interface is rather cluttered and complex, and I sometimes struggle to perform even simple tasks like changing the temperature. Moreover, if the fridge's software requires regular updates or experiences glitches, it can lead to frustration or even temporary loss of basic functionalities.
Observations that support these potential issues include users ignoring smart features and using the fridge as a traditional refrigerator, or instances where a software malfunction may cause the interface to freeze or become unresponsive, requiring a reset or technical support. In assessing the helpfulness and usefulness of the smart fridge's design, it's clear that it aims to enhance the user's experience by integrating with their digital life and providing convenience. However, the design's success is contingent upon the user's engagement with the smart features and their tolerance for adopting new technology in a traditionally non-technical space. While the smart fridge is a forward-thinking appliance, it must balance its advanced capabilities with the fundamental requirement of being user-friendly and reliable in performing its primary task of food preservation.

---
slug: thoughts/university/twenty-three-twenty-four/hci-4hc3/Interactive-cycle
tags:
  - sfwr4hc3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/hci-4hc3/Interactive-cycle"
title: Interactive cycle
date: 2023-10-10
---

```mermaid
flowchart TD
1[Computer] --> 2[Interaction] --> 3[User]
4[Input] --> 5[Interface] --> 6[Output]
1 --> 6 --> 3 --> 4 --> 1
```

---
slug: thoughts/university/twenty-three-twenty-four/hci-4hc3/Psychopathology-of-everything
tags:
  - sfwr4hc3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/hci-4hc3/Psychopathology-of-everything"
title: Psychopathology of everything
date: 2023-10-10
---

### The complexity of modern devices

> What is a good design?
>
> - Discoverability
>   - possible to figure out what actions are possible
>   - where and how to perform them
> - Understanding
>   - What does it mean?
>   - How is it supposed to be used?

| Design fields | Purpose                       | Optimisation target                                                          | Users                |
| ------------- | ----------------------------- | ---------------------------------------------------------------------------- | -------------------- |
| Industrial    | form & material               | function, value, appearance of the product & system                           | users & manufacturer |
| Interaction   | understandability & usability | understanding in technology interaction, built upon psychology, design, art   | users                |
| Experience    | emotional impact              | designing products focused on quality and enjoyment of the total experience   | users                |

?: What are the deficiencies in human-machine interaction?

- limitations of today's technology
- self-imposed restrictions, such as cost
- lack of understanding of the design principles

> Human Centred Design is an approach that puts human needs, capabilities, and behaviour first, then designs to accommodate those needs, capabilities, and ways of behaving.
| Term                                                      | Role                                                                                                |
| --------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
| Experience design, Industrial design, Interaction design  | Areas of focus                                                                                      |
| Human-centred design                                      | Process that ensures designs match the needs and capabilities of the people for whom they are intended |

### Fundamental principles of Interaction

#### Experience

- how fondly people remember their interaction
- discoverability
- affordances
- signifiers
- constraints
- mappings
- feedback
- conceptual model of the system

---
slug: thoughts/university/twenty-three-twenty-four/hci-4hc3/System-Image-and-Paradox-of-Technology
tags:
  - swfr4hc3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/hci-4hc3/System-Image-and-Paradox-of-Technology"
title: System Image and Paradox of Technology
date: 2023-09-12
---

[Fundamental principles of Interaction](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/hci-4hc3/System-Image-and-Paradox-of-Technology/../../../../../../../../thoughts/university/twenty-three-twenty-four/hci-4hc3/Psychopathology-of-everything#fundamental-principles-of-interaction) goes into the fundamentals of interaction when designing a product → conceptual models

- People create mental models of themselves, the environment, and interactions
- Designers' conceptual model vs. users' conceptual models

> What does it take to create a good conceptual model?
>
> - User studies?
> - Low-fidelity prototypes

Paradox of technology

The same technology that simplifies life by providing more functions in each device also complicates life by making the device harder to learn, harder to use.

Design challenges

- Price
- feature parity
- reliability
- support

---
slug: thoughts/university/twenty-three-twenty-four/hci-4hc3/index
tags:
  - university
  - sfwr4hc3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/hci-4hc3/index"
title: Human Centred Design
date: 2023-09-04
---

The following includes notes for the course 4HC3 - Human-Centred Interface

---
slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Aristotle
tags:
  - philos1aa3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Aristotle"
title: Aristotle
date: 2023-09-11
---

Two types of learning

- From perception to habit. Animal association of particulars.
- From perception to belief. Rational cognition of particulars. This is the learning of experience (empeiria).

> being qua being

> Metaphysics: the philosophical study of being qua being

Against Plato

- being = idea
- how do ideas cause particulars?
- how do ideas cause motion?

> That ideas, immaterial and changeless, are real seems unrealistic

Form: a thing's organization or disposition to behave

Essence: a principle of reality

Matter: potential for a substance to change

### Are Aristotle's forms the same as Plato's Ideas?

They are similar in what they are expected to do, but they work in different ways. For Plato, the Idea of horse is different from every particular horse. It is a separate entity, immaterial, changeless, and better, more real than any fleshy animal. For Aristotle, forms have no existence separate from the individual substances whose form they are. Where there is a form, there is a particular substance.

Light is the actualization of a potential state of a transparent medium. It is an accident of a transparent medium.
The medium is a substance: air. It has accidents. One of these accidents is to become illuminated in the presence of colored bodies, which is what we see as light.

## The Unmoved Mover (Metaphysics, Book 12, Chapters 6-7)

The Unmoved Mover causes motion without itself moving. Even without moving, a thing can cause other things to move toward it by causing love or desire. Something loved or desired need make no motion of its own to cause things to move toward it; it initiates motion without moving. That is how the Unmoved Mover moves things: by being the object of love and desire.

- Necessarily exists (cannot not exist)
- The final cause of motion in nature
- The comprehensive reason for everything else
- Divine
- Alive and happy (because imperturbable)

---
slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Descartes
tags:
  - philos1aa3
  - seed
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Descartes"
title: Descartes
date: 2023-12-09
---

> Descartes's Method of Doubt: press doubt as far as possible in order to find the boundaries of knowledge.

---
slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Epicurus
tags:
  - philosophy
  - philos1aa3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Epicurus"
title: Epicurus
date: 2023-11-09
---

[Socrates](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Epicurus/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates)' ideas:

- care for the self → happiness and virtue are problems of knowledge
- Idealism: Being is Idea; fundamental reality is immaterial, spiritual, and rational
- Never by itself leads anyone astray

Epicurus

- Emphasizes the value of philosophy as care of the self.
- Denies Idealism, affirms materialism in the form of atomism

## Desire

> completely innocent; if it goes wrong, it is because of its beliefs

May be necessary or not necessary

### Necessary

_non-satisfaction brings pain_

- Happiness (philosophy, friends)
- Life (food, water)
- Untroubled body (law, leisure)

### Not necessary

_non-satisfaction is not necessarily painful. Any pain of non-satisfaction can be relieved by other means (change one's opinion about the object)_

- Natural (sex, immortality)
- Conventional (reputation)

## The notoriety of Epicurus

- materialism, denying the spiritual in nature
- belief in chance and no final purpose
- disbelief in afterlife
- hedonism: pleasure is the highest good

## Idea of Pleasure

- A feeling, not a sensation
- An evaluation of sensation
- Pleasure and pain are distinct qualities, like the two poles of a magnet. Neither is merely the lack of the other

## Pleasure

### Kinetic

_depends on an object and is intermittent or discontinuous_

Shadowed by pain

Excess → produces pain

### Katastematic

_Continuous, independent of external objects._

Types:

- _Aponia_: leisure, physical ease, stressless well-being
- _Ataraxia_: untroubled, tranquil mind

Un-quantified terms

_Reasons for promoting pleasure as **the highest good**_

1. Cradle Argument

   The goodness of pleasure is learned in the cradle. The first good, naturally pursued
2. Conceptual Argument

   The concept of good becomes meaningless when conceived as independent of pleasure

   > The more obstructive the definition, the more unpersuasive it becomes

## Plato against Pleasure as the Good

- Pleasure is the replenishment of a lack
- A life spent in pursuit of pleasure constantly tries to fulfill newly arising lacks
- Any pleasure is made better by adding virtue.
  - Pleasure plus wisdom is better than pleasure without wisdom
  - Pleasure plus courage is better than pleasure without courage

> So pleasure cannot be the highest good

⇒ Answer of Epicurus:

> Wisdom, courage, and all the virtues _are_ katastematic pleasures

Higher and lower hedonism

## Virtues

> [!note] NOTE
>
> Personal qualities that assist us in the pursuit of happiness

Katastematic virtues according to Epicurus

### Prudence, practical wisdom

The truly prudent have knowledge of kinetic pleasures ⇒ whether to choose or avoid them ⇒ never interfere with their katastematic pleasure

A successful life may look like an unsuccessful life in the eyes of the world

Aim for self-sufficiency, cultivate leisure, prefer a private life, private pleasures, a low profile

Learn to live on little → circumstances change ⇒ make do with less

### Self-sufficiency

### Frugality

Fewer toys ⇒ more katastematic pleasure

Wealth shouldn't be the most important thing

Doesn't advocate poverty → invites us to think about wealth in a new way

Wealth is not money, but the means to enjoy life

> Wealth is an abundance of katastematic pleasures

Basis in nature ⇒ easy to apply

### Friendship

Being a friend, having friends ⇒ supports katastematic pleasures

Awareness among friends that they _are not alone_

Sense of security := the katastematic pleasure of virtues

### Justice

Can't be happy when acting unjustly ⇒ Epicurus believes in the social contract.

Humans lived without any organisation; the first civilisation was an agreement to prevent harm among themselves

- Why? The motive is not fear, but the desire for friendship. It is unpleasant to be prepared to fight at every moment → violence is not a way of life

> [!tip] IMPORTANT
>
> Justice and pleasures are fundamental building blocks of society

Saw the need for a more formal definition of the contract ⇒ Law and Justice

> Justice is neither natural nor sheerly conventional. It originated in convention, but the motive is natural (the pleasure of security and friendship)

- Justice is a conventional good contrived to promote pleasure
- Not eternal. Justice changes as circumstances change
- Not inherently good. Good as a means to the higher end of pleasure.

## Challenge to Religion

- Our world is one of infinite worlds in endless void
- Nothing spiritual in nature. Human beings are not special in nature. They are animals, systems of matter, like everything else. Death is extinction.
- The gods take no interest in human affairs and cannot be moved by sacrifice or prayer.
- Religious ceremonies are superstitious. They are the way a powerful few control the rest. The aim of philosophy is to liberate people from superstition.
## Tetrapharmakos

_The four-fold remedy_

- The gods present no fears
- Death presents no worry
- The good is readily attainable
- The terrible is readily endurable

---
slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill
tags:
  - philos1aa3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill"
title: John Stuart Mill
date: 2023-11-30
---

On _Liberty_

> “There are but few persons… whose experiments, if adopted by others, would be likely to be of any improvement on established practice. But these few are the salt of the earth; without them, human life would become a stagnant pool. Not only is it they who introduce good things which did not before exist; it is they who keep the life in those which already exist.” (252)

> “The general tendency of things throughout the world is to render mediocrity the ascendant power among mankind.” (253)

> “The initiation of all wise or noble things comes and must come from individuals.” (253)

> “What crushes individuality is despotism” (251)

Experiments in living ⇒ praise for these people.

Liberty is limited by the requirement of _do no harm_

Self-regarding actions vs. other-regarding actions

Self: be as you are, be as different as you like

Other: do no harm

If something is not a moral matter, then moralizing about it is wrong

Utilitarianism: the individual's expression

- not to maximise pleasure
- but to maximise the **progress of humanity**

> Harmonious development of humanity

Unlimited liberty with regard to self-regarding actions, but not other-regarding actions.

### Moral Individuality

Be as individual as we like, but still be moral to others

[Moral](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill/../../../../../../../../thoughts/moral)ity is a good thing. Ask [Nietzsche](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill/../../../../../../../../thoughts/Philosophy-and-Nietzsche): what is a good thing?

---
slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche
tags:
  - philosophy
  - philos1aa3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche"
title: Nietzsche's Life
date: 2023-11-30
---

_1844-1900_

Theology, acquainted with Greek philosophy

_Twilight of the Idols, 1889_

# Problem of Socrates.

**Decadent: in decline, decay. “Doing a bad thing carefully.”**

> An unexamined life is not worth living. Knowledge alone makes life worth living

The important things in life, goodness and happiness, depend on reasons and arguments → a bizarre equation.

> Maybe something is wrong with this? What drives Socrates to these demands for reasons and knowledge? What are the motives?

Hume: common life

Plato: turns away from appearance, material life, to know true being

Artists: playful presentation, loving life for the work

Philosophers: not joy but seriousness, engaged in serious business, striving to know the truth. Invent nothing; contemplate what is, what is true, “being qua being”

> One must by all means stretch out one’s fingers and make the attempt to grasp this amazing finesse, that the value of life cannot be estimated. (269)

Life is not a closed book, but evolving still.

# God is Dead

_God of the philosophers_

Atheism

Denis Diderot: “It is … very important not to mistake hemlock for parsley; but to believe or not to believe in god, is not important at all”

Nietzsche: “God is dead”.
This means optimism, faith in science, the redemptive power of knowledge, is “dead”, that is, unconvincing, hard to take seriously.

_Nihilism: the highest values are devaluing themselves_

How do we have a duty to truth? Doesn't atheism need reasons?

For Nietzsche, belief in god is passing into the past and has no future. Science has devitalised god. So too dies the superior value of truth. The value of truth is problematic.

## From _Thus Spoke Zarathustra_

Contrast between nobility and goodness

_noble spirits vs. the good_

nobility > good

noble people: maintain nobility, though others might consider it a setback; do not become “a churl” (churlish, misanthropic, a hater of humanity)

## Morality as Anti-Nature

Critiques of Christianity:

- Anti-nature because anti-difference, when nature is all difference
- Anti-nature because it values people all the same, when in nature, by nature, we are amazingly different.

Security with “herd mentality”

Regarded as a sign of decline, docile (democracy, or [John Stuart Mill](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill)'s institutions)

> Obstacles that are good for us: become who we are

Herd = human society

- like sheep: unhappy on our own. We watch each other carefully, fall into line
- all herd: all animals nervously watch each other

Peace is overrated, as it defeats the kind of challenges leaders need in order to grow

herd = security (the survival values of Europe at this time?)

Everything modern people think is good is bad; things judged to be evil could turn out to be good in the future.

Decadent: arts and philosophy ⇒ the artist wants to play, unlike the empiricist, and doesn't care about the truth (the philosophers' motive)

Arguments against morality: one rule for everything

Judging everything by one rule: morality reduces humans to a singularity

Objects to Kant's morals, but not to [John Stuart Mill](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill)'s Utilitarianism

Respect for others is imposed on us ⇒ refuses to embrace it

> Morals compromise creativity; value creativity more than morals (beyond good and evil)

### What I Owe the Ancients

_The Birth of Tragedy, 1872_

Greek tragic drama

- Aeschylus, _Oresteia_
- Sophocles, _Oedipus the King_
- Euripides, _Bacchae_

> Why do we enjoy tragedies? Why do we enjoy watching people suffer?

> “All becoming and growing - all that guarantees a future - involves pain.” (282)

> “Art is worth more than truth.”

Shares [Plato](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato)'s view of democracy.

Values a life of creativity, inventing new values. Acknowledges death, suffering, tragedy. They are not defects, nor to be overcome by science.

Knowledge can demoralise people; questions the very idea that knowledge is good. Knowledge is not the path to virtue and happiness

> Science tells us there is no good in itself, no purpose in itself. Values are selected by us, not stumbled upon.

Life has no fixed value, since we cannot see all that life has to offer. It is a place for adventure.

Creating values is something science cannot do, but art can ⇒ art is more important than truth.
---
slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous
tags:
  - philos1aa3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous"
title: Nous
date: 2023-12-07
---

Notes: [notes](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/All.pdf)

Reference: [text](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/1A3Reader\(2019\).pdf)

### [Socrates](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates)'s Idea of Good

- Something is good when it contributes to the flourishing of human beings
- Denies democracy
  - the masses are childish
  - unnatural, confuses freedom with lack of restraint
  - inefficient
  - bad at financial management

In [Phaedo](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Phaedo-and-Apology): `The body confuses the soul and does not allow it to acquire truth and wisdom`

_See [Apology](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato#apology) for more information_

_Arguments for survival_

The idea that all things come into being from their opposite → to have come from the dead, a soul _must_ exist despite being dead

- Understanding of perfection is independent of experience
- To have knowledge independently of experience → the soul must have existed before the body
- Yes, the soul will survive death, because a soul that exists before birth must come from something dead
- it does not require a living body to be a living soul

_Against soul scattering_

- what can dissolve and scatter must be composite
- the composite changes, the simple doesn't
- Ideas are simple
- Understanding ideas is a pure power of mind
- ideas are simple → souls that understand them are also simple
- The soul does not consist of parts → cannot change
- The soul brings life to a body → death changes the body, but the soul lives on.
- The idea of the Even cannot become odd, the Hot cannot become cold; the soul, which makes a body alive, cannot die.

_See [Republic](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato#republic)_

- Belief is liable to error, knowledge is not.
- Belief can be changed by persuasion, knowledge cannot be.
- Belief does not bring understanding, knowledge does.
- True belief, right opinion, is still essentially belief or opinion, and cannot be knowledge since its truth is accidental.
- Opinion is shameful because it is not a passive thing that innocently occurs to a person.

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/knowledge-map-thoughts.webp)

_Allegory of the Cave_

Plato's pessimism: we are sunk in error, addicted to opinion, and democracy is hopeless. It is the political expression of minds ruled by opinion, bereft of wise knowledge.

Plato's optimism: the cosmos is organized by goodness.
By grasping that, we understand the world we live in, and by understanding that, we understand how best to live.

> And we can understand that, the idea of the good. At least some of us can. They are the philosophers, masters of the dialectic, and they should govern the rest.

[Aristotle](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Aristotle)

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/aristotle-metaphysics.webp)![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/aristotle-form.webp)

_What is Truth?_

> “To say of what is that it is, and of what is not that it is not, is true.”

correspondence theory of truth

> Truth is the correspondence of substance and statement

_Theory of Causes_

- Formal cause: law of change
- Material cause: material persisting through change
- Efficient cause: agent of change
- Final cause (teleological cause, _telos_): purpose of change

[Epicurus](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Epicurus#desire)

- God doesn't concern himself with human affairs
- nothing to fear

> Happiness is uninterrupted tranquility. Intervening would disturb the gods' tranquility; the existence of evils proves the gods' indifference.

Atomism

Not against the soul, only against its immateriality

[Stoic](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Stoic)

Cynic principles

Materialism without atomism

Matter is continuous and without void. No empty space.

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/epistemology.webp)

_Free will and determinism_

> Epictetus: “If a good man could foresee the future, he would cooperate with sickness, death, and mutilation; for he would be aware that this had been ordained by the universal order of things, and that the whole is more important than the parts.”

All causes are either:

1. _Antecedent causes_: events leading up to a change.
2. _Active, operating causes_: immediately produce the effect.

_Moral_

The highest good (= virtue) is right volition. Every act is chosen, voluntary. No moral luck. Whether life goes well or ill is completely in our control. Suffering is a kind of error, a cognitive mistake, due to wrong judgment and false belief.

[Descartes](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Descartes)

Müller-Lyer illusion

> The idea of god is the idea of a perfect being

> I think, therefore I exist

Thinking implies existence; this provides a test for truth, the _cogito_

Descartes equates material substance (matter, body) with spatial extension. The essence of body, what makes a body corporeal or material, is spatial extension.

_Implications_

1. primary and secondary qualities

   ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/descartes-qualities.webp)
2. Plenum

   > If space is identical with matter, then the idea of “empty space” becomes impossible. The physical universe is therefore “filled up,” a plenum, with no empty space.

3. Inertness

   - Spatial extension is the whole essence of matter.
   - No other quality, except those primary qualities that necessarily accompany extension
   - Motion is not essential to a body. If a body moves, motion was transmitted to it from another moving body.

4. Mind-body problem

   - Mental perceptions vs. physical causation

[Spinoza](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Sphinoza)

Deus sive Natura

Descartes: mind and body are separate substances. A person is a substantial union of thinking substance and extended substance.

Spinoza: a human being cannot be a substance.

1. Substance cannot **not** exist; it is a necessary, self-caused, _causa sui_ being.
2. No human being is a necessary, self-caused being.
3. Therefore no human being is a substance.

Even less can a human being be what Descartes said: a composite of two substances.

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/spinoza-knowledge.webp)

---
slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato
tags:
  - philosophy
  - philos1aa3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato"
title: Plato
date: 2023-11-08
---

See also [Socrates](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates)

# Apology

- Defence against the charges brought against him
  - promoting the study of things under the earth and in the sky
  - corrupting the youth of Athens → promoting new gods (the trial lasted one day) → poison
- Contrary to the charges → he never took money for instruction
  - available to anyone without any fees → Socrates didn't teach, just talked in public
  - conversed with the knowledgeable → these people never share their wisdom
- Does Socrates go after people? Does he need to embarrass them?
- Oracle at Delphi → is anyone wiser than Socrates?
  - No one is wiser than Socrates
  - Socrates doesn't feel wise?
  - Enquire about what _wisdom_ is?
- Enquired of the knowledgeable → bored
- Enquired of the poets → wisdom doesn't come from poets (poets are inspired by the wisdom of the god)
- Enquired of the technicians → not the right source of knowledge for wisdom
  - the craftsman is incapable of explaining why he does what he does ⇒ verbal explanation is irrelevant

> Wisdom can be used both for good and bad

```poetry language=fr
Wisdom is a penetration into the good, not the bad
```

Oracle at Delphi → a reputation for enigma (ironic, playing with words)

The Oracle could mock humanity

```poetry language=fr
No one is wiser than Socrates
```

- He is the wisest
- No one is wise (Socrates is as wise as anyone, and he knows he is not wise → no hope)
- He is the wisest because he knows he is not wise

> He does not know anything **fine** and **good**

Socrates possesses no expertise in making goodness

> Socrates' mission: relentlessly confront people with their ignorance; take care of your soul

Convicted by the stupidity of them all

### Dialog

_Dikasts_ → Arguments:

- His behaviour is not subversive, not contrary to their beliefs, and doesn't harm the city
- His actions are sanctified by the gods → you don't understand anything
- He follows his conscience rather than their democracy
- Contempt towards democracy, mockery
- Democracy is a childish form of government

> My trial will be equivalent to a doctor being prosecuted by a pastry-cook before a jury of children

#### Context of the trial

- Athens → democracy from 508 to 322 BCE
- Peloponnesian War → Sparta defeats Athens
- Alkibiades: friend of Socrates
  - Dissolute
  - Athens conflicted about this character
  - Admired for his charm and leadership qualities
  - Poty, aristocratic → fear of his ambition
  - Fear of the friends of Alkibiades
  - His family is from Sparta
  - Charged with treason → sentenced to death → resurfaces in Sparta → democracy is corrupted
  - Assassination attempted
  - Returns to Athens as a hero
  - 404 → Athens accepts terms of surrender to Sparta
  - Assassinated while traveling → a probable enemy of democracy (Socrates) → look for someone to blame
  - Because he was a friend of Alkibiades → find a scapegoat for a failing democracy

Among the philosophers and poets contemptuous of Athenian democracy:

- The masses are childish, fickle, easily misled
- Unnatural, a tyranny of the weak over the strong
- Confuses freedom with lack of restraint, favors flatterers
- Inefficient

> Government should be efficient → choose the best person to govern

Socratic rule would not be aristocratic, but rather rule by experts, masters of the acquired craft of rulership.

Philosophical rulers →

- Crucial to make them wise and knowledgeable
- Establish education → reason well and follow reasons

[Nietzsche](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche) → problems with Socrates

- Reasoned and consciously sound
- The unexamined mind is not worth living?

> What makes wisdom so good?

## Socratic Idea of Good

> Something is good when it contributes to the full flourishing of a human being in all our powers and faculties for the natural duration of life.

Know how to use all things in a way that tends toward the good

What is not good might not be wise? → if he really cares about Athens and wisdom, then why not participate and join in the debates?

- Afraid of participating in this democracy?
- He heard a voice → it never tells him what to do, only speaks against things he is considering doing
  - Turned him away from doing wrong things
  - Too honest to survive if occupied with justice
  - Cannot serve justice through public service
  - A man who serves justice must live a private, not a public, life

→ Never does the voice tell him to defend himself

→ He thought about fleeing → the voice told him not to → what happens after the trial may not be a bad thing → death is not a bad thing → Socrates is unmoved after receiving death

> You too must be of good hope in the face of death. A good man cannot be harmed

What you think harms me harms you a lot more

He knows there is a life after death?

> Divine mission: encourage Athenians in self-reflection for a good moral life

Difference

- The gods are perfectly just and follow a moral standard, whereas everyone else doesn't (Homer shows us gods bringing about death)
- Poets: the gods bring meaningless suffering to people → Socrates thinks this is wrong; the poets defame the gods

Virtuous

- masculine
- Conventionally good; knowledge and wisdom are good only if used wisely

What does wisdom do for us?

- Knowledge of the good, and the power that comes with it
- Knowing the good universally and philosophically, and acting from that

> A good person cannot be harmed

The unexamined life is not worth living

Doing wrong is worse than suffering wrong

Riches and power contribute nothing to happiness. Only wisdom and virtue matter, and wisdom is the ultimate virtue

# Phaedo

_by Plato_

_a month later, Socrates is in jail, waiting for his execution_

He tries to write poetry

- Writes a hymn to Apollo (the god of the Oracle at Delphi)
- fables of Aesop
- A recurring dream across his life → ends with him hearing a voice: Socrates, practice art

_last day of life_

The jurors think death is the worst harm, yet Socrates said:

> That might be true for somebody. Philosophers should fear death less than anyone else

Philosophy is a preparation for, a practice of, death

What is death?

- Turning away from images of the body, to the intellectual form of ideas
- Freeing the soul from the confusion of the mind
- The body is an obstacle to knowledge
- Truth is known by intellect, by reasoning → involves the best part of the soul, ignoring the senses

> The body confuses the soul and does not allow it to acquire truth and wisdom. As long as we have a body and our soul is fused with such an evil, we shall never adequately attain what we want, which is truth.

> If we are ever to have pure knowledge, we must escape from the body and observe things in themselves with the soul by itself. It seems likely that we shall attain wisdom only when we are dead (65d-66e)

Explains the value of philosophy as a preparation for death: free of deception and the senses → attain the wisdom that waits for us on the other side

Is there another side?

Last hour: prove: the soul cannot die.

- Contrived and unconvincing
- Life after death?

Arguments:

1. Arguments of survival
   1. All things come into being from their opposite; the living come from the dead, and the dead come from the living
      - To have come from the dead, the soul must exist while dead
2. 
   1. Understanding of perfection is independent of sense experience
      - _A priori_ knowledge: independent of experience
      - _A posteriori_ knowledge: depends on experience
   2. To have knowledge independently of experience → the soul must have been alive prior to bodily life → will it survive death?
   3. Yes, because a soul that exists before birth must come from something dead → association with a living body is not essential to a soul; it does not require a living body to be a living soul
3. Against soul scattering
   - What can dissolve and scatter must be composite → _what is composite changes, what is simple does not change_
   - Ideas like Equality or Justice do not change → Ideas are simple, not composite (simple = non-composite)
   - Understanding ideas is a **pure** power of mind and does not depend on the body
   - Since ideas are simple → the soul that understands them must also be simple
   - So a soul does not consist of parts, is indivisible, and therefore cannot change
   - So death cannot change the soul
4. 
   1. The soul brings life to the body → makes the body alive, in the way that the form of the Even makes six even, and the form of the Hot makes fire hot
   2. The idea of the Even cannot become odd; the idea of the Hot cannot become cold.

> Pythagorean: the soul is more than what makes a body alive → the soul is a being in its own right, a separate entity, detachable from the body, which it makes alive

No one does evil knowingly ⇒ conclusion: you cannot harm a good man, you cannot kill a soul; he is tranquil

I shall no longer be with you; offered his cup of poison → reminded of what is important

To Crito:

- We owe a cock to Asclepius. Make the offering and do not forget → death as the cure to a disease, the disease of the body. The world cannot be trusted; it lacks the things we need → the other side is much better

Plato: such was the end of our comrade, the best of the men he knew. It conquered the West → Western civilisation places a great deal of faith in this.

### Symposium

_some time after Socrates' death_

_Agathon, a poet of tragedies, is the host_

> The soul is an abstraction from empirical knowledge.

Alkibiades, of Socrates:

> This utterly unnatural, this truly extraordinary man … this hopelessly arrogant, this unbelievably insolent man … \[of] amazing arrogance and pride … he is unique

- Seduction is a game with respect to Socrates → Socrates' life is one big game of irony

Alkibiades: seduction as a trade

- invites Socrates to wrestle in the gym ⇐ doesn't work
- gets him drunk (no one has seen Socrates drunk). Lies down → makes a move → nothing

> Finding yourself falling in love with Socrates
>
> - He pretends to fall in love → others will fall in love with him

Ordinary irony: saying something false ⇒ to imply the opposite

Socratic irony: both is and is not seriously meant ⇒ true in one way, false in another

> _to Euthyphro_: You think that your knowledge of the divine, and of piety and impiety, is so accurate that … you have no fear of having acted impiously in bringing your father to trial

- His knowledge of the divine is false ⇒ Euthyphro is a self-righteous fool who doesn't understand anything

He gets angry when seeing Socrates falling for someone else. The future replicates the past. Love must be more than intense alliances.

> Socrates can't teach anybody anything that they don't already know
>
> - Alkibiades is so rough beneath the smooth surface → not interesting to bring in as a disciple
> - Alkibiades knows himself too well to love, to be drawn to Socrates
> - Maybe Alkibiades can be redeemed by philosophers

## Republic

1. Metaphysics: philosophical theory of being

> To be is to be an idea. Idea means ideal form; it is perfect, not known by the senses or the body

The idea is intelligible (grasped by intellect), not sensible (sense-perceptible), being. True reality is the world of ideas: immaterial, changeless, ethereal, fully rational.
> Opinions without knowledge are shameful and ugly things.

Sensible things are copies of ideas ← bad copies

> The idea of the good is the idea that makes things good.

1. The idea of the good is the most important thing for everyone to know
   - Conforming to it, everything else becomes useful and beneficial
   - Without knowing the idea of the good, everything else is useless

> Obtaining knowledge of the good is the foundation of Platonic philosophy

Merely believed to be true: everyone wants what is good

> Every soul pursues the good and does what it does for its sake

We can't understand our own good without knowing how it integrates with the good of everything → makes people the best of what they can be.

Should we be satisfied with opinion when knowledge is out there to be found?

Opinion: doxa. Knowledge: episteme. Understanding: nous.

- Belief is liable to error; knowledge is not.
- Belief can be changed by persuasion; knowledge cannot be.
- Belief does not bring understanding; knowledge does.
- True belief, right opinion, is still essentially belief or opinion, and cannot be knowledge since its truth is accidental.
- Opinion is shameful because it is not a passive thing that innocently occurs to a person.

### Sun : Visible Things : Sight

The idea of the good stands to intelligible things and understanding as the sun stands to visible things and sight.

The good illuminates understanding for us:

- cause of the being of the ideas
- cause of our knowledge of them

The good is virtually all ideas (as white light is virtually all colors) ⇒ it makes minds true and understanding.

> To understand is to focus the intellect on the form, the idea (stare into the sun and not be blinded)

To understand an idea is to understand what true being is. Knowledge knows what is, and that it must be as it is.

Criterion of knowledge:

- Infallibility, the impossibility of error

Understanding (nous) ⇒ Philosophy

- Beauty and justice are entities

Thought (dianoia) ⇒ Science

- Requires some intellectual understanding → thoughts
- Hypotheses: the first principles of a science, which remain unexplained within it

Perception (aisthesis) ⇒ Opinion

> How do we understand philosophy?

## Dialectic

> Inquiry that systematically attempts to grasp, with respect to each thing itself, what the being of it is (that is, the idea)

It does away with presuppositions: it overcomes everything hypothetical in thought and leads to presuppositionless knowledge.

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Presocratics tags: - philos1aa3 - philosophy description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Presocratics" title: Presocratics philosophers date: 2023-09-09 ---

### Anaximander

> the material cause and first element of things was the unbounded _apeiron_

- the earth is cylindrical in form, and its depth is a third of its breadth
- something capable of begetting hot and cold was separated off from the eternal at the origin of the world. The heavenly bodies are wheels of fire, separated off from the fire which encircles the world and enclosed by air

Living creatures arose from the moist element as it was evaporated by the sun. Man was, in the beginning, like another animal, namely a fish.

often associated with the _apeiron_, the _infinite_, as the fundamental principle of the universe

> Humans arose from inside of fish, and emerged once capable of protecting themselves.

### Xenophanes

#### Parmenides

> Archaic

The questions of the beginning:

```
Originate from the gods?
Natural occurrence: where do things come from?
```
- Pythagoras
- Scientific

> Being is one entity. Not-being cannot be known.

Change is not being → what is not, is not. Logic convinces us that `change is not real`: the appearance of change is a delusion.

> Truth is being → homogeneous, changeless

Ultimate being is not one →

## Materialism

atheist?

### Empedocles

A perspective on nature: a configuration of _materialism_

Elements: Earth, Air, Fire and Water
Elemental forces: Love and Strife

Real differences among things are manifestations of these real particles.

### Democritus

Atomism. The atomic hypothesis → nature is Atoms and Void

- Nature is body and void, nothing else: no purpose or design
- The soul, too, is body

The system of the atom: atoms of different sizes and shapes coalesce and accumulate → soul

> The soul is matter of the finest, smallest kind, coherently part of every part of the organism

- Primary and secondary qualities
  - Primary: the atoms' size, shape, and weight
  - Secondary: from molecular combinations, e.g. hot/cold, moist/dry

> Sweet exists by convention, bitter by convention, colour by convention.

Division comes to a stop at the atom → the atom is the least divisible factor, an absolute stopping point for destruction → destruction is only the rearrangement of atoms.

If there are gods → they must have bodies → no arrangement of atoms is immortal → the gods are not immortal. For the Greeks, gods and humans are not different in kind.

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates tags: - philosophy - philos1aa3 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates" title: Socrates date: 2023-09-25 ---

[Pre-Socratic](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Presocratics) lives by materialism; refers to the Ionians:

> keen on the science of _nature_: why does it come to be, why does it exist?

Socrates grew dissatisfied with natural science.

Apollo's oracle at Delphi: `Know yourself`

He knows nothing about superior things or the right way to live. He is not wise.

> One cannot be wise about nature without being wise about oneself: how we live, and how to use knowledge well

### Wisdom

> Begins in self-understanding

What is the way we ought to live?

Philo-sophia, love-of-wisdom != being wise. One can only love what one lacks.

> Precisely not to be wise

Philosopher vs. fool: the philosopher knows he is not wise, whereas a fool doesn't.

The philosopher longs for wisdom ⇒ only god is wise; complete wisdom is not for humans.

Learned from Pythagoras: the soul is not mere human life, the paradox of life.

Hermenedies: the Pythagorean cares for the soul (rational, pursuing the rational).

Neither gods nor beasts: spiritual beings operate between gods and beasts

- they do not possess wisdom but have the ability to pursue it

Socrates: What is X?

- true of all cases
- the reason why something is an X

ex: Logos: why the gods love some things and not others

> Platonic Idea (Form): the idea of X is the form all particular Xs share, and which causes them to be X.

Plato ⇒ an Idea for everything Socrates asks about ⇒ more than materialism ⇒ either materialism or the Socratic/Platonic view is wrong

- The Idea is not material, not a body in space or time
- not given in ordinary experience

The charges at the Athenian trial? Animosity with regard to his way of being?
--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Sphinoza tags: - philos1aa3 - philosophy description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Sphinoza" title: Spinoza date: 2023-10-10 ---

## Ethics (1677)

Deus sive Natura

Europeans considered Spinoza an atheist ⇒ yet his works are everywhere concerned with God

## God of the Philosophers/Metaphysicians

- The Unmoved Mover of Aristotle (and, earlier, the pre-Socratic Ionian Xenophanes) ⇒ a god of the values philosophers esteem, not the values of ordinary beings
- Not a personal being with loves or desires, but a more rationalistic god

Monotheism: God != nature ⇒ God creates nature
Spinoza: God = nature, the all-powerful, perfect being

Substance:

**Aristotle**: Substance is that which exists in itself and not in another
**Descartes**: Two substances, mental and physical (dualism)
**Parmenides**: "Say and think only this: Being is". A single substance, numerically one.
**Spinoza**: Combines Aristotle's conclusion about what substance is with Parmenides' argument that there can be only one _single substance_

What is this one being? Alive, infinite, intelligent, all-powerful, material, spatial.

> God is not transcendent, not a supernatural cause: God is nature

## Substance (1D3)

Substance is the cause of itself, causa sui. Cause of itself (causa sui) = necessary existence. (How could something that does not exist bring itself into existence?)

Substance has no external cause; its cause belongs to the substance itself. It has to exist: its essence includes existence.

## Monism

> Substance is unique. There exists only one substance.

Not one _kind_ of substance, but _numerically one single substance_.

1. Something exists
2. Whatever exists has a sufficient cause
3. Therefore a causa sui substance must exist, and
4. There is at most one

Why one?

1. If there were two causa sui substances, there must be a difference between them
2. If there is a difference, then there must be a cause of it
3. But one causa sui substance cannot cause change in another self-caused being

## Aristotle's Idea of Substance

Color does not exist in itself; it exists in the horse. The horse exists in itself.

> Nature is not a totality of things

## Mode and Attribute

**Attribute**: That which the intellect perceives of substance as constituting its essence (1D4)

Substance has infinitely many attributes; each attribute has infinitely many modes.

**Mode**: The affections of a substance; that is, that which is in something else and conceived through something else (1D5)

Each mode is connected to, and can be modified by, others.

"Mode" = modification, modality, way. A mode of substance is a modification of it, some way in which substance is modified.

"Conceived through" = explained by, made intelligible or reasonable by.

## Spinoza's idea of Substance (= God)

One single, infinite, eternal, complex substance, comprising infinitely many modes of infinitely many attributes.

Substance > Attribute > Mode

Divisibility implies corruptibility: God can't have parts and pieces. Is God divisible?

Definition 6: By God I mean an absolutely infinite being; that is, substance consisting of infinitely many attributes.

Proposition 11: God necessarily exists.

### Method for the proof of God

Ontological proof: explains God as a being that cannot not exist; God's essence includes existence.

Cosmological proof: God is the first cause, the ultimate cause of everything else. Without God the chain of cause and effect would recede forever, and the world would be without a rational foundation.
Aristotle's proof of the Unmoved Mover in the Metaphysics was this type of proof.

Teleological proof ("design argument"): nature shows evidence of intelligent design. ← Spinoza rejects the _teleological proof_.

His three arguments in Proposition 11:

#### Reductio ad absurdum (reduction to absurdity)

```prolog
To prove P:
  assume NP
  show that if NP, then Q & NQ
  Q and NQ is a contradiction and is impossible (contradiction = False)
  so not NP
  therefore P
```

First proof (ontological argument):

1. Suppose God does not exist
2. Axiom 7: If a thing can be conceived as not existing, its essence does not involve existence
3. Prop. 7: Existence belongs to the nature of substance.
   - Why? a. Substance cannot be produced by another. b. So, from Def. 1, substance is self-caused, so its essence involves existence
4. The hypothetical non-existence of God reduces to a contradiction
5. Therefore, God exists.

Second proof (cosmological argument):

1. For everything, there must be a cause, either of its existence or of its non-existence.
2. The cause, whether of existence or non-existence, is either in the thing or in another.
3. A thing necessarily exists if no cause prevents its existence.
4. So, if God does not exist, there must be a cause of non-existence, and this cause must be in another.
5. What causes God not to exist must absolutely exclude God from being, and can therefore have nothing in common with God.
6. If two things have nothing in common, one cannot prevent the other's existence.
7. Therefore, no cause prevents God's existence.
8. So, God exists.

> There is nothing of which we can be more certain than the existence of an absolutely infinite or perfect entity
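Not part of the course notes, but the reductio pattern sketched above can be written as a tiny formal proof. A Lean 4 sketch (term-mode, using only the core `Classical.byContradiction`; the names `reductio`, `P`, `Q` are my own):

```lean
-- Reductio ad absurdum: to prove P, assume ¬P and derive the
-- impossible Q ∧ ¬Q; classical logic then licenses concluding P.
theorem reductio {P Q : Prop} (h : ¬P → Q ∧ ¬Q) : P :=
  Classical.byContradiction fun np => (h np).2 (h np).1
```

Spinoza's two proofs instantiate this schema with "God does not exist" as the assumed ¬P.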
--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/index tags: - philos1aa3 - university description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/index" title: Philosophical Text date: 2023-09-04 ---

An introduction to philosophy through the close reading of selected classical texts. Authors to be considered may include Plato, Descartes, Hobbes, Hume, Marx, Mill, Nietzsche.

The full notes can be found [here](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/index/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/All.pdf), with all of the reference [text](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/index/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/1A3Reader\(2019\).pdf). Tutorial notes can be found [here](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/index/../../../../../../../../thoughts/university/philo-1aa3/tut)

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Being-qua-being tags: - philosophy - philos1aa3 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Being-qua-being" title: Being qua being date: 2023-09-15 ---

# Wisdom[](#wisdom)

- The most general knowledge
- Not instrumental knowledge
- Practical (techne) → practical knowledge
- Epistemic (episteme) → theoretical knowledge

> Aristotle values episteme, since techne is dependent on others

```
Theoretical knowledge is independent of other scenarios
```

Intrinsic value of episteme? Instrumental values are only intelligible once intrinsic values are in place → intrinsic values over instrumental values.

- Science of being (`being qua being`) ⇒ is it possible scientifically?
- Philosophy of science
  - How do we justify that science is possible?
- A unifying principle to account for the variety of things
- Questions of metaphysics
  - A science of metaphysics to study everything (the study of being)

Being has many senses.

# Passage[](#passage)

_Bk. 4 Ch. 2 (p. 135)_

Priority of substance. Focal point:

'Healthy' → everything called healthy is functionally related to 'health', whether producing health or some other aspect of health.

So too, being is related to one central point: all things that are, are of substance:

- affections
- processes
- destructions/privations/qualities
- productive [of substance]

If there is a science for one → a science for all?

Metaphysics is the science for investigating this. What is the focal point of metaphysics? What is the primary sense of being?

> The primary sense of being is being a substance

A substance is a unified thing in the world; ex: a human is a substance.

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Epicurus tags: - philos1aa3 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Epicurus" title: Epicurus's Hedonism and Materialism date: 2023-09-11 ---

# Materialism[](#materialism)

- everything is made of atoms

# Central role of chance[](#central-role-of-chance)

# Denial of afterlife[](#denial-of-afterlife)

- believes in gods and souls, but souls are _material things_
- the soul dies once the body dies

# Hedonism[](#hedonism)

- pleasure as the highest good

Q:

- Why is pleasure the highest good?
- Which pleasures should we choose?
- Agree?

Pleasure (good) vs. Pain (bad)

Blessed life ⇒ healthy body ⇒ undisturbed soul → leads to pleasure

```
Endure pain for greater pleasure
```

Kinetic vs. katastematic pleasure: object-dependent vs. object-independent

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Phaedo-and-Apology tags: - philosophy - philos1aa3 description: Phaedo and Apology title: Phaedo and Apology date: 2023-09-22 ---

## Structure

Problem ⇒ Thesis ⇒ Structure

## Apology

- An outsider
- Speaking a language that is not the court's ⇒ knowing what is good is enough for being good → the connection between wisdom and good action ⇒ to find the truth of what is good

> Athenians are harming themselves, since Socrates is the only one concerned with the truth and with what is truly good.

- "Ancient" / recent accusations
- Bad associations from both the ancient and the recent accusers
  - Ancient:
    - Physicist → enquires into things in the heavens and under the earth
    - Sophist: uses arguments that lead people away from the truth
  - He doesn't care about physics
  - "human/political virtue" ⇒ people aren't as wise as they think they are
  - Corrupts the youth
  - Hates democracy
  - Does not believe in the gods of the state

⇒ he can't corrupt the youth: "No one does bad willingly. A corrupted person is more likely to harm the people around him."
⇒ If you knew I was harming people → you should have come to me [and instructed me]

> The unexamined life is not worth living

- One who won't act on what is good and known → lives a life of ignorance → a life not worth living.

## Phaedo

- Pain and pleasure
- How is suicide wrong while facing death is good?
- Philosophers desire death?
- What if the soul is not immortal?
- Single vs. composite

> All knowing is remembering ⇐ a priori

pp 62-63:

- We know absolute equality
- Material equalities fall short of absolute equality

> To see inequality → one needs knowledge of what absolute equality is, and that knowledge cannot come from experience

For Socrates → ideas are not fluid

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Republic tags: - philosophy - philos1aa3 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Republic" title: Allegory of the cave, Republic date: 2023-09-13 ---

Allegory of the cave:

- The prisoners see the shadows, not the real world they reflect
- The escaped prisoner → convince the others to go out and see what the real world is
- Talking about the shadows (once escaped, one identifies the real world rather than the shadow)

### Republic pp. 124-125

State of ignorance? Being in the cave = ignorance

eyesight ⇒ capacity for learning
Sun = knowledge, the intelligible world
shadows → the sensible world

> those who attain these visions are unwilling to descend to human affairs

⇒ the desire the allegory points to (they don't want to go back down into the cave)

Fear of the unknown? Journey to enlightenment? Enlightenment as a duty?

Why do most of them not want to escape at the beginning? Why does only one of them escape? → fear of the unknown? The form?

See also: [Symposium](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Republic/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Symposium#symposium-pp-105-106)

Ladder of beauty

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Republic/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/IMG_0308.webp)

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Spinoza tags: - philos1aa3 - philosophy description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Spinoza" title: Arguments regarding Spinoza date: 2023-10-11 ---

Substance = God = Nature → there is only one substance

Many attributes:

- Thought
- Extension

Each attribute:

- Modes (Aristotle's substances)

```mermaid
stateDiagram-v2
state "Substances" as A
state "Thoughts" as B
state "Extension" as C
A --> B
A --> C
B --> M1
B --> M2
B --> M3
```

_Think Least of Death_ by Nadler

## Principle of Sufficient Reason

> Everything that exists (and everything that does not exist) has a cause or reason

## Ethics Part II, pp. 190

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Stoic tags: - philos1aa3 - philosophy description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Stoic" title: Stoic date: 2023-11-05 ---

### Epictetus - Enchiridion

What we can control belongs to us: opinion, pursuit, desire, aversion
What we can't control belongs to others: property, reputation, command, body

**Concern for the care of the self**

> Some things are in our control and others not. Things in our control are opinion, pursuit, desire, aversion, and, in a word, whatever are our own actions. Things not in our control are body, property, reputation, command, and, in one word, whatever are not our own actions.
| IN our control                          | NOT in our control               |
| --------------------------------------- | -------------------------------- |
| Action/pursuit (influences reputation)   | Reputation (perceived by others) |
| Acting on desire                         | The feeling of desire            |

> Epictetus is saying that you can train the feeling of desire.

**All harm comes from false belief**

> Duties are universally measured by relations

→ maintain tranquility

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Symposium tags: - philosophy - philos1aa3 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Symposium" title: Symposium date: 2023-09-29 ---

### Q1.

ex: Thesis: the Euthyphro dialogue shows [Socrates](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Symposium/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates)' view that philosophy is necessary for a good life: Euthyphro's definition of piety and Socrates' objections to it ⇒ the discussion in the Apology about the relation between philosophy and the good life; [Phaedo](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Symposium/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato#phaedo)'s philosophy of the soul.

### Q2.

See [Questions about Apology](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Symposium/../../../../../../../../../../posts/Questions-about-Apology)

- Intro: Thesis: whether you agree or disagree with Socrates
- Why Socrates believes justice cannot be fought for in public life
- Why Socrates is right/wrong
- Conclusion

### Symposium pp. 105-106

1. Beauty of one body
2. Beauty in every body is the same
3. Beauty in mind/soul
4. Beauty in institutions/laws/sciences
5. Absolute beauty

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/T1 tags: - philos1aa3 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/T1" title: Piety date: 2023-09-13 ---

- Problem
  - Motivated by a problem, questions
  - Explicit/implicit via the abstract
- Thesis
  - Conclusions
- Arguments
- Structure

Euthyphro

- What is piety?
- Are pious things pious because the gods love them, or do the gods love them because they are pious?

Socrates

- wants a standard for judging that does not appeal to what the gods love

1. Bits and perceptions → flagging ($*$)
   - Sit with it multiple times
2. Connections
3. Big picture

--- slug: thoughts/university/twenty-three-twenty-four/sfwr-2c03/Graphs tags: - sfwr2c03 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/Graphs" title: Graphs date: 2024-02-26 ---

See also [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/Graphs/../../../../../../../../thoughts/university/sfwr-2c03/graph-algo.pdf)

_Nodes_ as [information](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/Graphs/../../../../../../../../thoughts/Information-Theory) and _edges_ as relationships between [data](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/Graphs/../../../../../../../../thoughts/data)

## Directed acyclic graph (DAG)

Application: [Merkle DAG](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/Graphs/../../../../../../../../thoughts/Merkle-DAG)

## undirected.
> [!note]
>
> - $\mathcal{N}$: set of vertices
> - $\mathcal{E}$: set of _undirected_ edges: $\mathcal{E} \subseteq \mathcal{N} \times \mathcal{N}$

A _path_ is a sequence of nodes and edges connecting two nodes.

> A graph is **connected** if there is a path between every pair of vertices.

In a weighted undirected graph, each edge has a weight: $w: \mathcal{E} \to \mathbb{R}$

See also [Graph isomorphism](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/Graphs/../../../../../../../../thoughts/Group-theory#graph-isomorphism)

## directed.

> [!note]
>
> - $\mathcal{N}$: set of vertices
> - $\mathcal{E}$: set of edges containing node pairs: $\mathcal{E} \subseteq \mathcal{N} \times \mathcal{N}$

> [!tip]
>
> paths follow edge direction: edges are ordered pairs.

> [!note]
>
> a **cycle** is a path with at least one edge from a node back to itself

**Strongly connected component**: maximal sub-graph in which all node pairs are strongly connected.

## matrix [representation](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/Graphs/../../../../../../../../thoughts/representations)

Let $\mathcal{G} = (\mathcal{N}, \mathcal{E})$ be a directed graph in which each $n \in \mathcal{N}$ has an identifier $id(n)$ with $0 \leq id(n) < |\mathcal{N}|$.

Let $M$ be a $| \mathcal{N} | \times | \mathcal{N} |$ matrix.

> For every pair of nodes $(m, n)$, set $M[id(m), id(n)] \coloneqq ((m, n) \in \mathcal{E})$

## The adjacency list representation

> [!tip] Important
>
> Let $A \lbrack 0 \dots |\mathcal{N}|)$ be an array of _bags_. For every edge $(m, n) \in \mathcal{E}$, add $(m,n)$ to bag $A \lbrack id(m) \rbrack$.

| ops                                      | complexity                                 |
| ---------------------------------------- | ----------------------------------------- |
| add/remove nodes                         | $\Theta(\|\mathcal{N}\|)$ (copy array)     |
| add/remove edges $(n, m)$                | $\Theta(\|out(n)\|)$ (adding to bag)       |
| check an edge $(n, m)$ exists            | $\Theta(\|out(n)\|)$ (searching bags)      |
| iterate over all _incoming_ edges of $n$ | $\Theta(\|\mathcal{E}\|)$ (scan all bags)  |
| iterate over all _outgoing_ edges of $n$ | $\Theta(\|out(n)\|)$ (scan a bag)          |
| Check or change the weight of $(n, m)$   | $\Theta(1)$                                |

## comparison.

> **Dense** graph: $|\mathcal{E}| \approx \Theta(|\mathcal{N}|^2)$
> **Sparse** graph: $|\mathcal{E}| \approx \Theta(|\mathcal{N}|)$

## Traversing undirected graph.

### Depth-first search (DFS)

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/Graphs/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2c03/images/example-graph-dfs.webp)

```pseudo
\begin{algorithm}
\caption{DFS-R(G, marked, n)}
\begin{algorithmic}
\REQUIRE $G = (\mathcal{N}, \mathcal{E}), \text{marked}, n \in \mathcal{N}$
\FORALL{$ (n, m) \in \mathcal{E} $}
\IF{$\neg \text{marked}[m]$}
\STATE $\text{marked}[m] \coloneqq \text{true}$
\STATE $\text{DFS-R}{(G, \text{marked}, m)}$
\ENDIF
\ENDFOR
\end{algorithmic}
\end{algorithm}
```

$\text{marked} \coloneqq \lbrace n \longmapsto (n = s) \mid n \in \mathcal{N} \rbrace$ (initially only the start node $s$ is marked)

> [!tip] Conclusion
>
> - finds all nodes to which $n_3$ is connected
> - $\mathcal{G}$ is **not** a connected graph
> - The order of recursive calls determines the order in which we discover the nodes $n_3$ is connected to.
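To make the traversal concrete, here is a small Python sketch of DFS over the adjacency-list (bag) representation. It is iterative rather than recursive, an implementation choice of mine, not of the notes; the graph dict stands in for the bag array $A$:

```python
from collections import defaultdict

def dfs(graph: dict[int, list[int]], s: int) -> set[int]:
    """Return the set of nodes reachable from s.

    graph maps each node to the list of its neighbours,
    mirroring the adjacency-list representation above.
    """
    marked = {s}                # only the start node is marked initially
    stack = [s]
    while stack:
        n = stack.pop()
        for m in graph[n]:      # traverse each edge (n, m) once
            if m not in marked:
                marked.add(m)
                stack.append(m)
    return marked

# Example: two components, {0, 1, 2} and {3, 4}
g = defaultdict(list, {0: [1, 2], 1: [0], 2: [0], 3: [4], 4: [3]})
assert dfs(g, 0) == {0, 1, 2}   # node 3 unreachable: G is not connected
```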
#### Complexity

- $|\mathcal{N}|$ memory for _marked_ and at most $|\mathcal{N}|$ recursive calls
- we inspect each node at most once and traverse each edge once: $\Theta(|\mathcal{N}| + |\mathcal{E}|)$

#### Connected-components

```pseudo
\begin{algorithm}
\caption{DFS-CC-R(G, cc, n)}
\begin{algorithmic}
\REQUIRE $G = (\mathcal{N}, \mathcal{E}), cc, n \in \mathcal{N}$
\FORALL{$(n, m) \in \mathcal{E}$}
\IF{$cc[m] = \text{unmarked}$}
\STATE $cc[m] \coloneqq cc[n]$
\STATE $\text{DFS-CC-R}(G, cc, m)$
\ENDIF
\ENDFOR
\end{algorithmic}
\end{algorithm}
```

```pseudo
\begin{algorithm}
\caption{COMPONENTS(G, s)}
\begin{algorithmic}
\REQUIRE $G = (\mathcal{N}, \mathcal{E}), s \in \mathcal{N}$
\STATE $cc \coloneqq \{ n \mapsto \text{unmarked} \}$
\FORALL{$n \in \mathcal{N}$}
\IF{$cc[n] = \text{unmarked}$}
\STATE $cc[n] \coloneqq n$
\STATE $\text{DFS-CC-R}(G, cc, n)$
\ENDIF
\ENDFOR
\end{algorithmic}
\end{algorithm}
```

#### Two-colourability

> [!tip] Bipartite graph
>
> A graph is _bipartite_ if we can partition the nodes in two sets such that no two nodes in the same set share an edge.

```pseudo
\begin{algorithm}
\caption{DFS-TC-R(G, colors, n)}
\begin{algorithmic}
\REQUIRE $G = (\mathcal{N}, \mathcal{E}), \text{colors}, n \in \mathcal{N}$
\FORALL{$(n, m) \in \mathcal{E}$}
\IF{$\text{colors}[m] = 0$}
\STATE $\text{colors}[m] \coloneqq -\text{colors}[n]$
\STATE $\text{DFS-TC-R}(G, \text{colors}, m)$
\ELSIF{$\text{colors}[m] = \text{colors}[n]$}
\STATE \textbf{print} "This graph is not bipartite."
\ENDIF
\ENDFOR
\end{algorithmic}
\end{algorithm}
```

```pseudo
\begin{algorithm}
\caption{TwoColors(G)}
\begin{algorithmic}
\REQUIRE $G = (\mathcal{N}, \mathcal{E})$
\STATE $\text{colors} \coloneqq \{ n \mapsto 0 \mid n \in \mathcal{N} \}$
\FORALL{$n \in \mathcal{N}$}
\IF{$\text{colors}[n] = 0$}
\STATE $\text{colors}[n] \coloneqq 1$
\STATE $\text{DFS-TC-R}(G, \text{colors}, n)$
\ENDIF
\ENDFOR
\end{algorithmic}
\end{algorithm}
```

### Breadth-first search (BFS)

```pseudo
\begin{algorithm}
\caption{BFS(G, s)}
\begin{algorithmic}
\REQUIRE $G = (\mathcal{N}, \mathcal{E}), s \in \mathcal{N}$
\STATE $\text{marked} \coloneqq \{ n \mapsto (n = s) \mid n \in \mathcal{N} \}$
\STATE $Q \coloneqq \text{a queue holding only } s$
\WHILE{$\neg\text{Empty}(Q)$}
\STATE $n \coloneqq \text{Dequeue}(Q)$
\FORALL{$(n, m) \in \mathcal{E}$}
\IF{$\neg\text{marked}[m]$}
\STATE $\text{marked}[m] \coloneqq \text{true}$
\STATE $\text{Enqueue}(Q, m)$
\ENDIF
\ENDFOR
\ENDWHILE
\end{algorithmic}
\end{algorithm}
```
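A small Python sketch tying the two ideas together: the two-colourability check from above, written with the BFS queue pattern just shown (my own adaptation; it returns a boolean instead of printing):

```python
from collections import deque

def is_bipartite(graph: dict[int, list[int]], nodes: list[int]) -> bool:
    """Two-colour the graph; colours[n] is 0 (unassigned), 1, or -1."""
    colours = {n: 0 for n in nodes}
    for s in nodes:                        # cover every component
        if colours[s] != 0:
            continue
        colours[s] = 1
        queue = deque([s])
        while queue:
            n = queue.popleft()
            for m in graph.get(n, []):
                if colours[m] == 0:
                    colours[m] = -colours[n]   # opposite colour
                    queue.append(m)
                elif colours[m] == colours[n]:
                    return False               # odd cycle: not bipartite
    return True

assert is_bipartite({0: [1], 1: [0, 2], 2: [1]}, [0, 1, 2])            # a path
assert not is_bipartite({0: [1, 2], 1: [0, 2], 2: [0, 1]}, [0, 1, 2])  # a triangle
```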
#### Single-source shortest path

> Given an undirected graph _without weights_ and a node $s \in \mathcal{N}$, find a shortest path from node $s$ to all nodes $s$ can reach.

```pseudo
\begin{algorithm}
\caption{BFS-SSSP(G, s)}
\begin{algorithmic}
\REQUIRE $G = (\mathcal{N}, \mathcal{E}), s \in \mathcal{N}$
\STATE $\text{distance} \coloneqq \{ n \mapsto \infty \mid n \in \mathcal{N} \}$
\STATE $\text{distance}[s] \coloneqq 0$
\STATE $Q \coloneqq \text{a queue holding only } s$
\WHILE{$\neg\text{Empty}(Q)$}
\STATE $n \coloneqq \text{Dequeue}(Q)$
\FORALL{$(n, m) \in \mathcal{E}$}
\IF{$\text{distance}[m] = \infty$}
\STATE $\text{distance}[m] \coloneqq \text{distance}[n] + 1$
\STATE $\text{Enqueue}(Q, m)$
\ENDIF
\ENDFOR
\ENDWHILE
\end{algorithmic}
\end{algorithm}
```

--- slug: thoughts/university/twenty-three-twenty-four/sfwr-2c03/Hash-tables tags: - sfwr2c03 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/Hash-tables" title: Hash tables and 2-3 tree date: 2024-02-26 ---

See also [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/Hash-tables/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2c03/balanced-search-tree-hash.pdf)

--- slug: thoughts/university/twenty-three-twenty-four/sfwr-2c03/Sorting tags: - sfwr2c03 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/Sorting" title: Sorting date: 2024-01-24 ---

### correctness of `BestTwoSum`

Let $\text{TS}(L, \text{start}, \text{end}) = \lbrace (L[i], L[j]) \mid (L[i] + L[j] = w) \land (\text{start} \leq i \leq j \leq \text{end}) \rbrace$

```prolog
result := empty bag
i, j := 0, N-1
while i < j do
  if L[i] + L[j] = w then
    add (L[i], L[j]) to result
    i, j := i+1, j-1
  else if L[i] + L[j] < w then
    i := i+1
  else
    j := j-1
return result
/* result = TS(L, 0, N-1) */
```

### selection sort.

```prolog
Input: L[0...N) of N values
For pos := 0 to N-2 do
  min := pos
  For i := pos+1 to N-1 do
    if L[i] < L[min] then
      min := i
  swap L[pos] and L[min]
```

Comparisons: $\sum_{\text{pos}=0}^{N-2}(N-1-\text{pos}) = \Theta(N^2)$; changes: $2(N-1) = \Theta(N)$

### insertion sort.

```prolog
Input: L[0...N) of N values
For pos := 1 to N-1 do
  v := L[pos]
  p := pos
  while p > 0 and v < L[p-1] do
    L[p] := L[p-1]
    p := p-1
  L[p] := v
```

Comparisons: at most $\sum_{\text{pos}=1}^{N-1}\text{pos} = \frac{N(N-1)}{2}$; changes: at most $\sum_{\text{pos}=1}^{N-1}(1+\text{pos}) = \frac{N(N-1)}{2} + N - 1$

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/Sorting/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2c03/images/sumary-sorting.webp)

### merge sort.

- divide-and-conquer

### A lower bound for general-purpose sorting

_assume we have a list $L \lbrack 0 \dots N)$ of $N$ distinct values_

$S$: all possible lists $L$ that are treated the same by algorithm $A$ because they yield the same comparison outcomes $C: L[i] < L[j]$

> [!question] Question
>
> Can we improve on mergesort's $\mathcal{O}(N)$ memory?
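For reference, a minimal top-down mergesort sketch in Python (my own illustration, not from the course notes); the `merged` buffer is the $\Theta(N)$ auxiliary memory the question above refers to:

```python
def merge_sort(L: list[int]) -> list[int]:
    """Divide-and-conquer sort: T(N) = 2T(N/2) + Theta(N) = Theta(N log N)."""
    if len(L) <= 1:
        return L
    mid = len(L) // 2
    left, right = merge_sort(L[:mid]), merge_sort(L[mid:])
    merged, i, j = [], 0, 0          # Theta(N) auxiliary memory
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # <= keeps the sort stable
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

assert merge_sort([5, 1, 4, 2, 3]) == [1, 2, 3, 4, 5]
```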
### quick sort.

Complexity of quicksort (worst case):

$$
T(N) = \begin{cases} 1 & \text{if } N \leq 1; \\ T(N-1) + N & \text{if } N > 1 \end{cases}
$$

recursion tree: in the worst case each level does $\Theta(N)$ work over $N$ levels, giving $\Theta(N^2)$; the expected complexity with random pivots is $\Theta(N \log_2 N)$.

--- slug: thoughts/university/twenty-three-twenty-four/sfwr-2c03/W1 tags: - sfwr2c03 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/W1" title: Complexity analysis date: 2024-01-08 ---

> algorithm = takes one or more values as input, produces an output

Ex: `contains`

> Given a list $L$ and value $v$, return whether $v \in L.$

Input: $L$ is an array, $v$ is a value
Output: true if $v \in L$, false otherwise

```prolog
i, r := 0, false
while i neq |L| do
  if L[i] = v then
    r := true
    i := i + 1
  else
    i := i + 1
return r
```

## invariants

> An induction hypothesis that holds at the beginning of each iteration.

[Deterministic](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/W1/../../../../../../../../thoughts/Determinism)

1. Base case: `inv` holds before the loop
2. Hypothesis: if `inv` holds after the `j`-th iteration, `j < m`, then it holds after the `(j+1)`-th iteration

> `contains` is correct; its runtime complexity is $\text{ContainsRuntime(|L|)}=|L|$ and its memory complexity is $\text{ContainsMemory(|L|)}=1$

### runtime.

![graph comparison](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2c03/images/compare-graphs.webp)

Models are simplifications; they show different growth rates.

> We are interested in the scalability of an algorithm

_For large enough inputs, `contains` will always be faster than `altc` because the **order of growth** of $\text{CRuntime}$ is smaller than that of $\text{AltCRuntime}$_

![Growth](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2c03/images/growth.webp)

> [!note]
>
> $f(n) = \mathcal{O}(g(n)) \iff \space \exists \space n_{0}\ , c>0 \mid 0 \leq f(n) \leq c \cdot g(n) \space \forall \space n \geq n_{0}$

> [!note]
>
> $f(n) = \Omega(g(n)) \iff \space \exists \space n_{0}\ , c>0 \mid 0 \leq c \cdot g(n) \leq f(n) \space \forall \space n \geq n_{0}$

> [!note]
>
> $f(n) = \Theta(g(n)) \iff \space \exists n_{0}, c_{lb}, c_{ub} \mid 0 \leq c_{lb} \cdot g(n) \leq f(n) \leq c_{ub} \cdot g(n) \forall n \geq n_{0}$

--- slug: thoughts/university/twenty-three-twenty-four/sfwr-2c03/W2 tags: - sfwr2c03 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/W2" title: Fundamentals date: 2024-01-15 ---

See also: [complexity analysis](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/W2/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2c03/W1#complexity)

`LinearSearch(L, v, o)`: potentially high cost

### _recursive binary search:_

```prolog
LowerBoundRec(L, v, begin, end)
# Input: L: ordered array, v: value, 0 <= begin <= end <= |L|
if begin = end then
  return begin
else
  mid := (begin + end) div 2
  if L[mid] < v then
    return LowerBoundRec(L, v, mid+1, end)
  else if L[mid] >= v then
    return LowerBoundRec(L, v, begin, mid)
# Result: returns the first offset r, begin <= r <= end, with L[r] >= v;
# if no such offset exists, r = end (= |L| when searching the whole array)
```

> repetition → induction

Induction hypothesis:

$$
\forall \space L', v', b, e : (0 \leq b \leq e \leq |L'|) \land (e - b < m) \implies \text{LowerBoundRec}(L', v', b, e) \text{ is correct}
$$

Recursive case: `mid := (begin + end) div 2` gives $b \leq \text{mid} < e$, so both recursive calls shrink the range.

termination bound function: $e - b$
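A direct Python transcription of `LowerBoundRec` (a sketch; the test values are my own):

```python
def lower_bound_rec(L: list[int], v: int, begin: int, end: int) -> int:
    """First offset r in [begin, end] with L[r] >= v; returns end if none."""
    if begin == end:
        return begin
    mid = (begin + end) // 2
    if L[mid] < v:
        return lower_bound_rec(L, v, mid + 1, end)   # e - b shrinks: terminates
    return lower_bound_rec(L, v, begin, mid)

L = [1, 3, 3, 7, 9]
assert lower_bound_rec(L, 3, 0, len(L)) == 1
assert lower_bound_rec(L, 8, 0, len(L)) == 4
assert lower_bound_rec(L, 10, 0, len(L)) == 5        # not found: r = |L|
```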
Complexity:

$$
T(n) = \begin{cases} 1 & \text{if } n = 0;\\ T(\lfloor \frac{n}{2} \rfloor) + 1 & \text{if } n > 0. \end{cases}
$$

Assume $n=2^x$; the work per level is constant, so

$$
T(n) = x + 2 = \log_2(n) + 2 \rightarrow \Theta(\log_2(n))
$$

> We can usually assume $n=2^x$

> [!abstract] Theorem
>
> `LowerBoundRec` is correct and has runtime and memory complexity $\Theta(\log_2(|L|))$

### _non-recursive binary search:_

```prolog
LowerBound(L, v, begin, end)
# Input: L: ordered array, v: value, 0 <= begin <= end <= |L|
while begin < end do
  mid := (begin + end) div 2
  if L[mid] < v then
    begin := mid + 1
  else
    end := mid
return begin
```

--- slug: thoughts/university/twenty-three-twenty-four/sfwr-2c03/a1/A1 tags: - sfwr2c03 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/a1/A1" title: Time complexity and recurrence relations date: 2024-01-08 ---

## Problem 1

### 1.1

Consider the following functions of $n$:

$$
\begin{aligned}
&n^2 \quad \sum_{i=0}^{n}5\cdot{i} \quad n^3\cdot \sqrt{\frac{1}{n^3}} \quad n^2 + 2^n \quad (\Pi_{i=1}^{9}i) \quad (\sum_{i=0}^{\log_2(n)}2^i) + 1 \quad 7^{\ln{(n)}} \\
&-\ln{(\frac{1}{n})} \quad \ln{(2^n)} \quad 10 \quad n\log_2{(n^7)} \quad \sqrt{n^4} \quad n^n \quad 5n \\
\end{aligned}
$$

Group the above functions that have identical growth rate and order these groups by increasing growth. Hence:

- If you place functions $f_1(n)$ and $f_2(n)$ in the same group, then we must have $f_1(n) = \Theta{(f_2(n))}$;
- If you place function $f_1(n)$ in a group ordered before the group in which you place function $f_2(n)$, then we must have $f_1(n) = \mathcal{O}{(f_2(n))} \land f_1(n) \neq \Omega{(f_2(n))}$.

_Solution:_

Theorem 3.25 states the following:

$$
\lim_{n \to \infty} \frac{f(n)}{g(n)} \space \text{is defined and is} \space \begin{cases} \infty & \space \text{then} \space f(n) = \Omega(g(n)); \\ c \text{, with c > 0 a constant} & \space \text{then} \space f(n) = \Theta(g(n)); \\ 0 & \space \text{then} \space f(n) = \mathcal{O}(g(n)); \end{cases}
$$

The following rules of thumb determine the order of growth of functions:

$$
\begin{align}
c \cdot f(n) &= \Theta(f(n)) \\
\log_{a}(n) &= \Theta(\log_{b}(n)) \\
\lim_{n \to \infty} \frac{n^c}{n^{c+d}} &= \lim_{n \to \infty} \frac{1}{n^d} = 0 \rightarrow n^c = \mathcal{O}(n^{c+d}) \\
\log_{2}(n)^c &= \mathcal{O}(n^d) \space \forall c > 0, d > 0 \\
n^c &= \mathcal{O}(d^{n}) \space \forall c > 0, d > 1 \\
d^{\frac{n}{u}} &= \mathcal{O}(c^{\frac{n}{v}}) \space \forall c \geq d \geq 1, u \geq v \geq 1 \\
\sum_{i=1}^{m}{c_{i} \cdot n^{d_i}} &= \Theta(n^{\max_i d_i}) \\
f(n) + g(n) &= \Theta(g(n)) \space \text{if} \space f(n) = \mathcal{O}(g(n)) \\
h(n) \cdot g(n) &= \mathcal{O}({h(n) \cdot g(n)})
\end{align}
$$

Let's start by analysing the growth rate of each function:

- $n^2$: polynomial growth; $n^2 = \Theta(n^2)$ by rule 3.
- $\sum_{i=0}^{n}5\cdot{i}$: polynomial growth; $\sum_{i=0}^{n}5\cdot{i} = 5 \cdot \frac{n(n+1)}{2} = \frac{5}{2}n^2 + \frac{5}{2}n$, which is $\Theta(n^2)$ by rule 7.
- $n^3 \cdot \sqrt{\frac{1}{n^3}}$: polynomial growth; $= n^3 \cdot \frac{1}{n^{\frac{3}{2}}} = \Theta(n^{\frac{3}{2}})$ by rules 9 and 3.
- $n^2 + 2^n$: exponential growth; $=\Theta(2^n)$ by rules 5 and 8.
- $(\Pi_{i=1}^{9}i)$: a constant, $= 1 \cdot 2 \cdots 9 = 9!$.
- $(\sum_{i=0}^{\log_2{(n)}}{2^i}) + 1$: linear, since the sum is a geometric series: $\sum_{i=0}^{\log_2{(n)}}{2^i} + 1 = \frac{1 \cdot (2^{\log_2(n)+1}-1)}{2-1} + 1 = 2 \cdot 2^{\log_2(n)} = 2n$.
  Therefore, by rule 1, this is $\Theta(n)$.
- $7^{\ln(n)}$: polynomial, since $7^{\ln(n)} = n^{\ln(7)}$ with $\ln 7 \approx 1.95$; by rule 3 this is $\Theta(n^{\ln 7})$.
- $-\ln(\frac{1}{n})$: logarithmic, since $-\ln (\frac{1}{n}) = \ln(n)$; rule 1 yields $\Theta{(\ln{n})}$.
- $\ln (2^n)$: linear, since $\ln(2^n) = n \cdot \ln{2}$; rule 1 yields $\Theta{(n)}$.
- $10$: constant.
- $n\log_2(n^7)$: since $n\log_{2}(n^7) = n \cdot 7\log_{2}{(n)} = 7n\log_{2}(n)$, rule 1 yields $\Theta(n\log_{2}(n))$.
- $\sqrt{n^4}$: polynomial growth, $=n^2 = \Theta(n^2)$ by rule 3.
- $n^n$: super-exponential.
- $5n$: linear by rule 1, giving $\Theta(n)$.

Thus, in order of increasing growth rate:

- $10 \quad (\Pi_{i=1}^{9}i)$ (_constant_)
- $-\ln(\frac{1}{n})$ (_logarithmic_)
- $\ln (2^n) \quad 5n \quad (\sum_{i=0}^{\log_2{(n)}}{2^i}) + 1$ (_linear_)
- $n\log_2(n^7)$ (_linearithmic_)
- $n^3 \cdot \sqrt{\frac{1}{n^3}}$ ($\Theta(n^{3/2})$)
- $7^{\ln(n)}$ ($\Theta(n^{\ln 7})$)
- $n^2 \quad \sum_{i=0}^{n}5\cdot{i} \quad \sqrt{n^4}$ (_quadratic_)
- $n^2 + 2^n$ (_exponential_)
- $n^n$ (_super-exponential_)

### 1.2

Consider the following recurrence:

$$
T(n) = \begin{cases} 7 & \text{if } n \leq 1;\\ 3T(n-2) & \text{if } n > 1. \end{cases}
$$

Use induction to prove that $T(n) = f(n)$ with $f(n) = 7 \cdot 3^{\lfloor \frac{n}{2} \rfloor}$

_Solution:_

#### base case:

For $n=0$ and $n=1$:

- $n = 0 \rightarrow f(0) = 7 \cdot 3^{\lfloor \frac{0}{2} \rfloor} = 7 \cdot 3^0 = 7 = T(0)$
- $n = 1 \rightarrow f(1) = 7 \cdot 3^{\lfloor \frac{1}{2} \rfloor} = 7 \cdot 3^0 = 7 = T(1)$

Thus the base case holds.

#### induction hypothesis:

Assume $T(k) = f(k)$ holds for all $k < n$, with $n > 1$. Then:

$$
T(n) = 3T(n-2) = 3f(n-2) = 3 \cdot 7 \cdot 3^{\lfloor \frac{n-2}{2} \rfloor} = 7 \cdot 3^{\lfloor \frac{n-2}{2} \rfloor + 1} = 7 \cdot 3^{\lfloor \frac{n}{2} \rfloor} = f(n)
$$

Therefore the induction step holds.

#### conclusion:

> $T(n) = f(n) = 7 \cdot 3^{\lfloor \frac{n}{2} \rfloor}$ is true for all $n \geq 0$.

---

## Problem 2

Consider the following `Count` algorithm (line numbers added for reference):

```prolog
Algorithm Count(L, v):
Pre: L is an array, v is a value
1. i, c := 0, 0
2. while i neq |L| do
3.   if L[i] = v then
4.     c := c + 1
5.   end if
6.   i := i + 1
7. end while
8. return c
Post: return the number of copies of v in L
```

### 2.1

Provide an invariant for the while loop at Line 2

_Solution_:

$$
0 \leq i \leq |L| \space \land \space c = \sum_{j=0}^{i-1} \lbrack L \lbrack j \rbrack = v \rbrack
$$

### 2.2

Provide a bound function for the while loop at Line 2

_Solution_:

$$
f(i) = |L| - i
$$
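The invariant and bound function can also be checked mechanically before writing the proof. A Python sketch of my own (the asserts mirror 2.1 and 2.2, and the Iverson-bracket sum is written as a generator expression):

```python
def count(L: list[int], v: int) -> int:
    """Count copies of v in L, asserting the loop invariant each iteration."""
    i, c = 0, 0
    while i != len(L):
        # invariant: 0 <= i <= |L| and c = sum of [L[j] = v] for j < i
        assert 0 <= i <= len(L) and c == sum(1 for j in range(i) if L[j] == v)
        bound = len(L) - i            # bound function f(i) = |L| - i
        if L[i] == v:
            c += 1
        i += 1
        assert len(L) - i < bound     # f strictly decreases each iteration
    return c

assert count([1, 2, 1, 1, 3], 1) == 3
```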
### 2.3

Prove that the `Count` algorithm is correct.

_Solution:_

Line 1: `i, c := 0, 0`

- $L \lbrack 0, i)$ with $i=0$ is $L \lbrack 0, 0)$
- $L \lbrack 0,0)$ is empty, hence $c = \sum_{j=0}^{i-1} \lbrack L \lbrack j \rbrack = v \rbrack = 0$
- the bound function $f(i) = |L| - i$ starts at $|L|$, with $|L| \geq 0$

Line 2: `while i neq |L| do`

- the bound function $f(i)$ stops the loop when it reaches $0$
- the invariant still holds, with $i \neq |L|$

Now we prove the invariant holds again at the end of each iteration.

Lines 3-6: `if L[i] = v then ...`

- if $L \lbrack i \rbrack = v$, then $c_{\text{new}} = c + 1 = \sum_{j=0}^{i-1} \lbrack L \lbrack j \rbrack = v \rbrack + 1 = \sum_{j=0}^{i_{\text{new}}-1} \lbrack L \lbrack j \rbrack = v \rbrack$; otherwise $c$ is unchanged and likewise still matches the sum
- $i_{\text{new}} = i + 1$, hence $0 < i_{\text{new}} \leq |L|$, which implies $0 \leq i_{\text{new}} \leq |L|$, and $f(i_{\text{new}}) = f(i) - 1$
- $f(i)$ strictly decreases after each iteration, since $i_{\text{new}} := i + 1$

Therefore the invariant holds after each iteration.

Line 7: `end while`

- $i = |L|$, hence $f(i) = 0$ and the loop stops
- $c = \sum_{j=0}^{i-1} \lbrack L \lbrack j \rbrack = v \rbrack = \sum_{j=0}^{|L|-1} \lbrack L \lbrack j \rbrack = v \rbrack$, which is exactly the postcondition

Therefore `Count` is correct. $\square$

### 2.4

What is the runtime and memory complexity of the `Count` algorithm?

_Solution:_

- Line 1 executes 2 instructions once
- Line 2 executes 2 instructions $|L| + 1$ times
- Lines 3-5 (the if) execute at most 4 instructions $|L|$ times
- Line 6 executes 2 instructions $|L|$ times

The total work is linear in $N = |L|$, so the runtime complexity is $\Theta(N)$. The memory complexity is $\Theta(1)$, since only two variables are used.

### 2.5

Provide an algorithm `FastCount(L, v)` that operates on ordered lists `L` and computes the same result as `Count(L, v)`, but with $\mathcal{O}(\log_2(|L|))$ runtime.

_Solution:_

```prolog
Algorithm FastCount(L, v):
Pre: L is an ordered array, v is a value

function binarySearchFirst(L, v)
  low, high := 0, |L| - 1
  result := -1
  while low <= high do
    mid := (low + high) div 2
    if L[mid] < v then
      low := mid + 1
    else if L[mid] > v then
      high := mid - 1
    else
      result := mid
      high := mid - 1
  end while
  return result
end function

function binarySearchLast(L, v)
  low, high := 0, |L| - 1
  result := -1
  while low <= high do
    mid := (low + high) div 2
    if L[mid] < v then
      low := mid + 1
    else if L[mid] > v then
      high := mid - 1
    else
      result := mid
      low := mid + 1
  end while
  return result
end function

firstIndex := binarySearchFirst(L, v)
if firstIndex = -1 then
  return 0
end if
lastIndex := binarySearchLast(L, v)
return lastIndex - firstIndex + 1

Post: return the number of copies of v in L
```
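For comparison, the same two-binary-search idea is available in Python's standard `bisect` module (a sketch of mine, not part of the submitted solution):

```python
from bisect import bisect_left, bisect_right

def fast_count(L: list[int], v: int) -> int:
    """Number of copies of v in ordered L, in O(log |L|) time."""
    lo = bisect_left(L, v)    # first index with L[i] >= v
    hi = bisect_right(L, v)   # first index with L[i] > v
    return hi - lo            # lastIndex - firstIndex + 1 when v occurs, else 0

assert fast_count([1, 2, 2, 2, 5], 2) == 3
assert fast_count([1, 2, 2, 2, 5], 4) == 0
```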
--- slug: thoughts/university/twenty-three-twenty-four/sfwr-2c03/a10/A10 tags: - sfwr2c03 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/a10/A10" title: MST with shortest path optimisation date: 2024-04-01 ---

## Problem 1.

Consider we want to build McMaster Maps, the best online route-planning tool in existence. The developers of McMaster Maps have decided to represent road information as a massive graph in which the nodes are crossings and the edges are the roads between crossings. The developers have also decided that McMaster University is the _only destination that matters_. Hence, McMaster Maps will be optimised toward computing directions to McMaster University.

To do so, McMaster Maps maintains a _single-sink shortest path index_ that maintains the shortest path from any node to McMaster University. This index is represented by the typical _path_ and _cost_ arrays as computed by either `Dijkstra` or `Bellman-Ford`.

Once in a while, an update to the road network happens: the weight of a single edge changes. (This can represent adding and removing edges: addition changes the weight of the edge from $\infty$ to a numeric value, and removal changes the weight of the edge from a numeric value to $\infty$.)

> [!question] P1.1
>
> Given the road network as a graph $\mathcal{G}$, the _shortest path index_, and the edge that was changed, write an algorithm that determines whether the shortest path index is still valid. You may assume the graph is already updated. Argue why your algorithm is correct. What is the complexity of your algorithm?

```pseudo
\begin{algorithm}
\caption{CheckShortestPathIndexValidity($\mathcal{G}, path, cost, u, v, w$)}
\begin{algorithmic}
\REQUIRE $\mathcal{G}$, $path$ and $cost$, edge $(u, v)$ with new weight $w$
\ENSURE Returns true if the shortest path index is still valid, false otherwise
\IF{$cost[u] + w < cost[v]$}
\STATE \RETURN false
\ENDIF
\IF{$path[v] = u$ and $cost[u] + w > cost[v]$}
\STATE \RETURN false
\ENDIF
\STATE \RETURN true
\end{algorithmic}
\end{algorithm}
```

The shortest path index is invalid in two cases:

- The new path through edge $(u,v)$ is shorter than the shortest path currently recorded for $v$. This is checked by the condition $cost[u] + w < cost[v]$.
- $v$'s current shortest path goes through $u$ via this edge, and the new weight $w$ makes that path longer than the recorded cost. This is checked by $path[v] = u$ and $cost[u] + w > cost[v]$.

If neither of these conditions is met, the shortest path index remains valid. Therefore the algorithm is correct. $\square$

The algorithm performs a constant number of comparisons and array lookups, so it runs in $O(1)$.

> [!question] P1.2
>
> Assume the shortest path index is no longer valid: provide a modification of the `Dijkstra` algorithm that restores the index to a valid state without recomputing all shortest paths. You may assume the graph is already updated. Argue why your algorithm is correct.

```pseudo
\begin{algorithm}
\caption{UpdateSingleSinkShortestPath($G, w, e=(u,v), \text{path}, \text{cost}$)}
\begin{algorithmic}
\REQUIRE Graph $G=(V,E)$, weight function $w:E\to\mathbb{R}^+$, updated edge $e=(u,v) \in E$ with new weight $w'(e)$, $\text{path}[1..|V|]$ and $\text{cost}[1..|V|]$
\ENSURE Updated shortest path tree arrays $\text{path}[1..|V|]$ and $\text{cost}[1..|V|]$
\STATE Initialize min-priority queue $Q$ with $(v, \text{cost}[u] + w'(e))$
\WHILE{$Q$ is not empty}
\STATE Extract vertex $x$ with minimum $\text{cost}$ from $Q$
\FOR{each neighbor $y$ of $x$}
\IF{$\text{cost}[x] + w(x,y) < \text{cost}[y]$}
\STATE $\text{cost}[y] \leftarrow \text{cost}[x] + w(x,y)$
\STATE $\text{path}[y] \leftarrow x$
\IF{$y$ is in $Q$}
\STATE Decrease key of $y$ in $Q$ to $\text{cost}[y]$
\ELSE
\STATE Insert $y$ into $Q$ with key $\text{cost}[y]$
\ENDIF
\ENDIF
\ENDFOR
\ENDWHILE
\RETURN $\text{path}, \text{cost}$
\end{algorithmic}
\end{algorithm}
```
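A compact Python sketch of this repair using `heapq` (an illustration under assumptions: lazy deletion of stale entries replaces the decrease-key step, `graph[x]` yields `(y, w)` neighbour pairs, and the caller seeds it with the endpoint of the changed edge and its new tentative cost, i.e. `repair(graph, cost, path, v, cost[u] + w)`; like the pseudocode above, it propagates improvements):

```python
import heapq

def repair(graph, cost, path, start, new_cost):
    """Dijkstra-style repair: propagate an improved estimate from `start`."""
    if new_cost >= cost.get(start, float("inf")):
        return cost, path                       # no improvement to propagate
    cost[start] = new_cost
    pq = [(new_cost, start)]                    # min-priority queue
    while pq:
        d, x = heapq.heappop(pq)
        if d > cost[x]:                         # stale entry: skip (lazy deletion)
            continue
        for y, w in graph.get(x, []):
            if cost[x] + w < cost.get(y, float("inf")):
                cost[y] = cost[x] + w           # shorter route found via x
                path[y] = x
                heapq.heappush(pq, (cost[y], y))
    return cost, path
```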
Correctness: Let $T$ be the shortest path tree rooted at the sink vertex $t$ before the edge update, and let $T'$ be the true shortest path tree after the edge update.

We define the following boundary function:

$$
B(k) = \{ v \in V: \text{dist}_{T'}(v,t) \leq k \space \land \space \text{dist}_{T'}(v,t) < \text{dist}_T(v,t) \}
$$

i.e., $B(k)$ is the set of vertices whose true shortest path to $t$ is at most $k$ and strictly less than their distance in $T$.

The following invariant holds for the while loop:

$$
\forall v \in B(k): \text{cost}[v] = \text{dist}_{T'}(v,t) \space \land \space \text{path}[v] \text{ is correct}
$$

For the base case, $k=0$: $B(0)$ is empty, thus the invariant holds.

Suppose the invariant holds after the $k$-th iteration, i.e. for $B(k)$. Let $x$ be the vertex extracted from $Q$ in the $(k+1)$-th iteration. We argue that $x \in B(k+1) \setminus B(k)$, i.e. $\text{dist}_{T'}(x,t)=k+1 \space \land \space \text{dist}_{T'}(x,t) < \text{dist}_{T}(x,t)$.

First, note that $x$ must have been inserted into $Q$ in some previous iteration, say when visiting a vertex $y \in B(j)$ for some $j \leq k$. By the induction hypothesis, $\text{cost}[y] = \text{dist}_{T'}(y,t)$ at that time, so the tentative distance $\text{cost}[y] + w(y,x)$ used to insert $x$ into $Q$ equals $\text{dist}_{T'}(x,t)$. Since $x$ is extracted in the $(k+1)$-th iteration, we have $\text{dist}_{T'}(x,t) = k+1$. Moreover, we must have $\text{dist}_{T'}(x,t) < \text{dist}_{T}(x,t)$, for otherwise $x$ would have been reached just as quickly via a path in $T$ and would not have needed updating when extracted. Thus, $x \in B(k+1) \setminus B(k)$.

The algorithm then correctly updates $\text{cost}[x]$ to $\text{dist}_{T'}(x,t)$ and $\text{path}[x]$ to its parent $y$ in $T'$. Furthermore, for each neighbor $z$ of $x$, if $\text{cost}[x] + w(x,z) < \text{cost}[z]$, then the path to $z$ through $x$ is shorter than its current path, so the algorithm correctly updates $\text{cost}[z]$ and $\text{path}[z]$ and inserts $z$ into $Q$ with the updated distance.

Therefore, after the $(k+1)$-th iteration, the invariant holds for all vertices in $B(k+1)$, including the newly added vertex $x$. By induction, the invariant holds for all $k$. $\square$

> [!question] P1.3
>
> Explain which graph representation you used for your algorithm and what the complexity of your modified-`Dijkstra` algorithm is using this graph representation.
>
> _note_: Express the complexity in terms of the number of nodes affected by the change. For example, use a notation in which $C$ is the number of nodes affected by the edge change, $\text{incoming}(C)$ the incoming edges of $C$, and $\text{outgoing}(C)$ the outgoing edges of $C$.

The algorithm uses an adjacency list representation, as each vertex maintains a list of its outgoing edges. The overall complexity is $O((|C| + |\text{outgoing}(C)|)\log |C|)$.

The main loop runs until $Q$ is empty. Per iteration, it extracts the vertex $x$ with minimum distance from $Q$, which takes $O(\log|C|)$ time, where $|C|$ is the number of affected vertices. For each outgoing edge $(x,y)$ of $x$, we update the distance and parent of $y$ and either decrease its key in $Q$ or insert it into $Q$; each such operation takes $O(\log|C|)$ time in the worst case. The total number of iterations of the inner loop is bounded by $|\text{outgoing}(C)|$, as each outgoing edge of an affected vertex is processed at most once.

Space complexity is $O(|C|)$, as the min-priority queue $Q$ stores at most one live entry per affected vertex.
> [!question] P1.4
>
> What is the worst-case complexity of your solution if you use the other graph representation?

If an adjacency matrix representation is used, the graph is represented by a $|V|\times|V|$ matrix, where $|V|$ is the number of vertices in the graph.

With this representation, the main difference in the algorithm is the loop that iterates over the neighbors of the current vertex $x$. In an adjacency matrix, finding the neighbors of $x$ requires scanning the entire row corresponding to $x$, which takes $O(|V|)$ time.

Therefore, the overall time complexity of the modified Dijkstra algorithm using an adjacency matrix becomes:

$$
O(|C|\cdot |V| \log |C|)
$$

If all vertices are affected ($|C| = |V|$), the complexity becomes $O(|V|^2 \log |V|)$. Space complexity is $O(|V|^2)$, as the matrix stores $|V|^2$ entries.

---

## Problem 2.

Consider a company managing many servers placed all over the world. The company wants to add network connections between a minimal number of servers to ensure that there is a path of communication between all pairs of servers. While researching this problem, the company was advised that some connections can be built more reliably than others: according to the consulted contractors, the probability that a connection $(m,n)$ between servers $m$ and $n$ will work at any given time is $p(m,n)$ (we have $p(m,n)=p(n,m)$).

The company wants to _minimise_ the number of connections while _maximising_ the probability that all servers are connected to each other at any given time. We will help the company out in their challenge to figure out which connections they need to build.

> [!question] P2.1
>
> Model the above problem as a graph problem: what are the nodes and edges in your graph, do the edges have weights, and what problem are you trying to answer on your graph?

**Nodes**: Each server is represented by a node (vertex) in the graph. Denote the set of all servers as $V$.

**Edges**: Potential network connections between servers are represented by edges in the graph. An edge $(m,n)$ exists between nodes $m$ and $n$ if a connection can be built. Denote the set of all possible connections as $E$.

**Weights**: Each edge $(m,n)$ has weight $-\log(p(m,n))$, derived from the probability that the connection will work at any given time.

The problem is to find a _minimum spanning tree_ (MST) of the graph: a spanning tree connects all servers with a minimal number of edges, and minimising the sum of the weights $-\log(p(m,n))$ maximises the product of the probabilities. Stated precisely:

> Find a subset of edges $E' \subseteq E$ such that the graph $(V, E')$ is a spanning tree and the sum of the weights of the edges in $E'$ is minimised (equivalently, the product of the edge reliabilities is maximised).

> [!question] P2.2
>
> Provide an algorithm `NetworkPlan` to find the network connections to build. Explain why your algorithm is correct.

```pseudo
\begin{algorithm}
\caption{NetworkPlan}
\begin{algorithmic}
\REQUIRE $G$
\REQUIRE $p$
\STATE $E \gets \emptyset$ \COMMENT{Set of edges in the graph}
\FOR{each edge $(m,n)$ in $G$}
\STATE $w(m,n) \gets -\log(p(m,n))$ \COMMENT{Compute edge weight}
\STATE $E \gets E \cup \{(m,n)\}$
\ENDFOR
\STATE $E \gets \text{SortEdges}(E)$ \COMMENT{Sort edges by weight in ascending order}
\STATE $MST \gets \emptyset$
\STATE $DS \gets \text{MakeSet}(G.V)$ \COMMENT{Initialize disjoint sets}
\FOR{each edge $(m,n)$ in $E$}
\IF{$\text{FindSet}(DS, m) \neq \text{FindSet}(DS, n)$}
\STATE $MST \gets MST \cup \{(m,n)\}$ \COMMENT{Add edge to MST}
\STATE $\text{UnionSets}(DS, m, n)$ \COMMENT{Union sets containing $m$ and $n$}
\ENDIF
\ENDFOR
\RETURN $MST$
\end{algorithmic}
\end{algorithm}
```

It follows Kruskal's algorithm to find the MST.
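A runnable Python sketch of `NetworkPlan` (the function and variable names are my own; the disjoint set uses path compression and arbitrary-root union, a simplification of the textbook union-by-rank):

```python
import math

def network_plan(nodes, connections):
    """connections: list of (m, n, p) with reliability p(m, n) in (0, 1].

    Returns spanning edges minimising sum(-log p), i.e. maximising
    the product of the chosen edges' reliabilities (Kruskal).
    """
    parent = {n: n for n in nodes}

    def find(x):                       # find root, compressing the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for m, n, p in sorted(connections, key=lambda e: -math.log(e[2])):
        rm, rn = find(m), find(n)
        if rm != rn:                   # edge joins two components: keep it
            parent[rm] = rn
            mst.append((m, n))
    return mst

edges = [("a", "b", 0.9), ("b", "c", 0.5), ("a", "c", 0.99)]
assert network_plan(["a", "b", "c"], edges) == [("a", "c"), ("a", "b")]
```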
Since we use the negative logarithm of the probabilities as edge weights, minimising $\sum -\log p(m,n)$ is the same as maximising $\sum \log p(m,n) = \log \prod p(m,n)$, so the MST found by Kruskal maximises the probability that all chosen connections work.

> [!question] P2.3
>
> Explain which graph representation you used for your algorithm and what the complexity of your algorithm is using this graph representation.

It uses an adjacency list representation, where each node maintains a list of its neighboring nodes and the corresponding edge weights.

Using an adjacency list, the time complexity of Kruskal's algorithm is $O(E \log E)$, where $E$ is the number of edges in the graph. Sorting the edges takes $O(E \log E)$ time, and the main loop takes near-linear time using a disjoint-set data structure to efficiently check for cycles.

> [!question] P2.4
>
> What is the worst-case complexity of your solution if you use the other graph representation? Explain your answer.

Constructing the adjacency matrix representation of the graph takes $O(V^2)$ time, where $V$ is the number of vertices in the graph.

Sorting the edges takes $O(V^2 \log V^2) = O(V^2 \log V)$ time.

The main loop of Kruskal's algorithm takes $O(E \log V)$ time, where $E$ is the number of edges: we iterate over all edges ($O(E)$) and perform `UnionSets` operations, which take $O(\log V)$ time using an efficient disjoint-set data structure with union-by-rank and path compression.

Therefore, the worst case is $O(V^2 + V^2 \log V + E \log V) = O(V^2 \log V)$.

--- slug: thoughts/university/twenty-three-twenty-four/sfwr-2c03/a2/A2 tags: - sfwr2c03 description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/a2/A2" title: Collections and data structure types. date: 2024-02-04 ---

## Problem 1

> [!question] P1.1
>
> Consider an initially-empty stack $S$ and the following sequence of operations:
>
> ```plaintext
> PUSH(S, 3), POP(S), PUSH(S, 17), PUSH(S, 5), PUSH(S, 15), POP(S), POP(S), POP(S)
> ```
>
> Illustrate the result of each operation (clearly indicate the content of the stack after the operation and, in case of a `POP`, the value returned by the operation)

_Solution_

Stack $S$ is initially empty: $S = \emptyset$

The sequence of operations is as follows:

1. `PUSH(S, 3)`: $S = \lbrace 3 \rbrace$
2. `POP(S)`: $S = \emptyset$, returned value is $3$
3. `PUSH(S, 17)`: $S = \lbrace 17 \rbrace$
4. `PUSH(S, 5)`: $S = \lbrace 17, 5 \rbrace$
5. `PUSH(S, 15)`: $S = \lbrace 17, 5, 15 \rbrace$
6. `POP(S)`: $S = \lbrace 17, 5 \rbrace$, returned value is $15$
7. `POP(S)`: $S = \lbrace 17 \rbrace$, returned value is $5$
8. `POP(S)`: $S = \emptyset$, returned value is $17$

> [!question] P1.2
>
> Consider an initially-empty queue $Q$ and the following sequence of operations:
>
> ```plaintext
> ENQUEUE(Q, 3), DEQUEUE(Q), ENQUEUE(Q, 17), ENQUEUE(Q, 5), ENQUEUE(Q, 15), DEQUEUE(Q), DEQUEUE(Q), DEQUEUE(Q)
> ```
>
> Illustrate the result of each operation (clearly indicate the content of the queue after the operation and, in case of a `DEQUEUE`, the value returned by the operation)

_Solution_

Queue $Q$ is initially empty: $Q = \emptyset$

The sequence of operations is as follows:

1. `ENQUEUE(Q, 3)`: $Q = \lbrace 3 \rbrace$
2. `DEQUEUE(Q)`: $Q = \emptyset$, returned value is $3$
3. `ENQUEUE(Q, 17)`: $Q = \lbrace 17 \rbrace$
4. `ENQUEUE(Q, 5)`: $Q = \lbrace 17, 5 \rbrace$
5. `ENQUEUE(Q, 15)`: $Q = \lbrace 17, 5, 15 \rbrace$
6. `DEQUEUE(Q)`: $Q = \lbrace 5, 15 \rbrace$, returned value is $17$
> [!question] P1.2
>
> Consider an initially-empty queue $Q$ and the following sequence of operations:
>
> ```plaintext
> ENQUEUE(Q, 3), DEQUEUE(Q), ENQUEUE(Q, 17), ENQUEUE(Q, 5), ENQUEUE(Q, 15), DEQUEUE(Q), DEQUEUE(Q), DEQUEUE(Q)
> ```
>
> Illustrate the result of each operation (clearly indicate the content of the queue after the operation and, in case of a `DEQUEUE`, the returned value by the operation).

_Solution_

Queue $Q$ is initially empty, or $Q = \emptyset$

The sequence of operations is as follows:

1. `ENQUEUE(Q, 3)`: $Q = \lbrace 3 \rbrace$
2. `DEQUEUE(Q)`: $Q = \emptyset$, returned value is $3$
3. `ENQUEUE(Q, 17)`: $Q = \lbrace 17 \rbrace$
4. `ENQUEUE(Q, 5)`: $Q = \lbrace 17, 5 \rbrace$
5. `ENQUEUE(Q, 15)`: $Q = \lbrace 17, 5, 15 \rbrace$
6. `DEQUEUE(Q)`: $Q = \lbrace 5, 15 \rbrace$, returned value is $17$
7. `DEQUEUE(Q)`: $Q = \lbrace 15 \rbrace$, returned value is $5$
8. `DEQUEUE(Q)`: $Q = \emptyset$, returned value is $15$

> [!question] P1.3
>
> Assume we have a stack implementation `MyDynArrayStack` using dynamic arrays: the implementation supports `N` push operations with an amortized runtime complexity of $\Theta(1)$ per operation, and the implementation supports `POP`, `EMPTY`, and `SIZE` operations with runtime complexity of $\Theta(1)$.
>
> Provide a _queue_ implementation that uses `MyDynArrayStack` and supports any valid sequence of `N` `ENQUEUE` and `DEQUEUE` operations with an amortized runtime complexity of $\Theta(1)$ per operation. Explain why your implementation has the stated amortized runtime complexity for `ENQUEUE` and `DEQUEUE` operations.

_Solution_

Queue implementation `MyDynArrayQueue` using two `MyDynArrayStack`, `sIn` and `sOut`:

```pseudo
\begin{algorithm}
\caption{Queue}
\begin{algorithmic}
\STATE $sIn := \text{MyDynArrayStack()}$
\STATE $sOut := \text{MyDynArrayStack()}$
\end{algorithmic}
\end{algorithm}
```

```pseudo
\begin{algorithm}
\caption{ENQUEUE(x)}
\begin{algorithmic}
\STATE $sIn.\text{PUSH}(x)$
\end{algorithmic}
\end{algorithm}
```

```pseudo
\begin{algorithm}
\caption{DEQUEUE()}
\begin{algorithmic}
\IF{$sOut.\text{EMPTY}()$}
    \WHILE{$\neg sIn.\text{EMPTY}()$}
        \STATE $sOut.\text{PUSH}(sIn.\text{POP}())$
    \ENDWHILE
\ENDIF
\RETURN $sOut.\text{POP}()$
\end{algorithmic}
\end{algorithm}
```

where `sIn` receives newly enqueued elements, and `sOut` holds elements in reversed (i.e., FIFO) order, ready to be popped.

Amortized runtime complexity explanation:

1. `ENQUEUE`
   - `PUSH` to `sIn` has an amortized runtime complexity of $\Theta(1)$ per operation, as stated in the problem.
2. `DEQUEUE`
   - When `sOut` is empty, the transfer of elements from `sIn` to `sOut` has a worst-case runtime of $\Theta(N)$.
   - However, each element is transferred from `sIn` to `sOut` at most once over its lifetime, so any sequence of $N$ operations performs at most $2N$ pushes and $2N$ pops in total. Charging the transfer cost to the element's original `ENQUEUE` makes the amortized runtime complexity $\Theta(1)$ per operation.
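A minimal Python sketch of this construction (using Python lists as the dynamic-array stacks; the class and attribute names are mine, not from the assignment):

```python
class MyDynArrayQueue:
    """FIFO queue built from two dynamic-array stacks."""

    def __init__(self):
        self.s_in = []   # receives newly enqueued elements
        self.s_out = []  # holds elements in reversed (FIFO) order

    def enqueue(self, x):
        self.s_in.append(x)  # amortized Theta(1) push

    def dequeue(self):
        if not self.s_out:
            # The transfer reverses the order, so the oldest element ends on top.
            while self.s_in:
                self.s_out.append(self.s_in.pop())
        return self.s_out.pop()  # raises IndexError on an empty queue

q = MyDynArrayQueue()
for x in (3, 17, 5, 15):
    q.enqueue(x)
print(q.dequeue(), q.dequeue())  # 3 17
```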
---

## Problem 2

Consider the relations `courses(prog, code, name)` (that models courses named `name` and identified by the program `prog` the course is part of, e.g., SFWRENG, and the course code `code`, e.g., `2C03`) and `enrolled(prog, code, sid)` (that models students with identifier `sid` enrolling for a course identified by program `prog` and course code `code`).

We want to compute the list of all pairs (`sid`, `name`) in which `sid` is a student identifier $s$ and `name` is the name of a course the student with identifier $s$ is enrolled in. To compute this list, we developed the following two nested-loop algorithms:

```pseudo
\begin{algorithm}
\caption{CEJOIN(courses, enrolled)}
\begin{algorithmic}
\STATE $output := \emptyset$.
\FOR{$(p_c, c_c, n_c) \in \text{courses}$}
    \FOR{$(p_e, c_e, s_e) \in \text{enrolled}$}
        \IF{$p_c = p_e \textbf{ and } c_c = c_e$}
            \STATE add $(s_e, n_c)$ to $output$.
        \ENDIF
    \ENDFOR
\ENDFOR
\RETURN $output$
\end{algorithmic}
\end{algorithm}
```

```pseudo
\begin{algorithm}
\caption{ECJOIN(courses, enrolled)}
\begin{algorithmic}
\STATE $output := \emptyset$.
\FOR{$(p_e, c_e, s_e) \in \text{enrolled}$}
    \FOR{$(p_c, c_c, n_c) \in \text{courses}$}
        \IF{$p_c = p_e \textbf{ and } c_c = c_e$}
            \STATE add $(s_e, n_c)$ to $output$.
        \ENDIF
    \ENDFOR
\ENDFOR
\RETURN $output$
\end{algorithmic}
\end{algorithm}
```

Assume we have significantly more students enrolled for courses than courses (`|enrolled| > |courses|`).

> [!question] P2.1
>
> Assume we are running algorithm `CEJOIN` and `ECJOIN` on a computer $\mathbb{C}$ where _every_ instruction takes _exactly the same_ amount of time to execute. Argue why `CEJOIN` must be faster than `ECJOIN` when running on computer $\mathbb{C}$?

_Solution_

Both algorithms execute their inner-loop body exactly $|enrolled| \cdot |courses|$ times, so the difference lies in the loop-management instructions. The outer relation is scanned once, and the inner loop is restarted for each iteration of the outer loop:

- `CEJOIN` iterates over `courses` in its outer loop, so it executes only $|courses|$ outer-loop iterations and restarts its inner loop over `enrolled` $|courses|$ times.
- `ECJOIN` iterates over `enrolled` in its outer loop, so it executes $|enrolled|$ outer-loop iterations and restarts its inner loop $|enrolled|$ times. Since $|enrolled| > |courses|$, this means strictly more loop-initialization and bookkeeping instructions.

Thus, `CEJOIN` must be faster than `ECJOIN` when running on computer $\mathbb{C}$.

> [!question] P2.2
>
> ![real-world-system](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/a2/A2/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2c03/a2/real-world-comparison.webp)
>
> Implementation of `CEJOIN` and `ECJOIN` in [impl\_22.cpp](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/sfwr-2c03/a2/impl_22.cpp) shows that `ECJOIN` is actually faster than `CEJOIN`. Explain why this is the case in a real-world system.

_Solution_

Real-world processors use fast _caches_, so _cache locality_ dominates. `ECJOIN`'s inner loop repeatedly scans `courses`, the smaller relation: once loaded, it largely stays in cache, so almost every inner-loop access is a cache hit, while the large `enrolled` relation is streamed through only once by the outer loop. `CEJOIN` instead rescans the large `enrolled` relation once per course; since `enrolled` does not fit in cache, each rescan incurs many cache misses. Hence `ECJOIN` is faster in practice, even though it executes more instructions.

> [!question] P2.3
>
> The measurements in the above figure have a few sudden jumps, e.g., at 1 500 000, 2 500 000, 4 500 000, 8 500 000, and 17 000 000. Explain what causes these jumps.

_Solution_

These jumps are caused by the sizes of the cache hierarchy. As the data size grows past the capacity of the processor's L1, L2, or L3 cache, the working set no longer fits in that cache level, causing sharply more frequent cache misses; at the largest sizes the processor must fall back on the much slower main memory, which induces the bumps we observe in the graph. Context-switching overhead could in principle also cause jumps if the code ran in multiple processes or threads; however, the implementation runs in a single thread, so we can rule out context-switching overhead.

> [!question] P2.4
>
> Write an algorithm that efficiently computes the same result as `CEJOIN` and `ECJOIN` in all-case $\Theta(|enrolled| \log_2{(|courses|)})$. In the design of your algorithm, you may require that either enrolled or courses is ordered. Argue why your algorithm is correct and why it has the runtime complexity we specified.

_Solution_

We require that `courses` is ordered by the pair `(prog, code)` (the problem statement allows this). The following algorithm can then be used:

```pseudo
\begin{algorithm}
\caption{EFFJOIN(courses, enrolled)}
\begin{algorithmic}
\REQUIRE $courses$ is sorted by $(prog, code)$
\STATE $output := \emptyset$.
\FOR{$(p_e, c_e, s_e) \in \text{enrolled}$}
    \STATE $i := \text{binary-search}(courses, (p_e, c_e))$
    \IF{$i \neq \text{null}$}
        \STATE add $(s_e, courses[i].name)$ to $output$.
    \ENDIF
\ENDFOR
\RETURN $output$
\end{algorithmic}
\end{algorithm}
```

With the following binary search algorithm:

```pseudo
\begin{algorithm}
\caption{binary-search(courses, (p, c))}
\begin{algorithmic}
\STATE $l := 0$
\STATE $r := \text{len}(courses) - 1$
\WHILE{$l \leq r$}
    \STATE $m := l + \lfloor (r-l)/2 \rfloor$
    \IF{$courses[m].prog = p \textbf{ and } courses[m].code = c$}
        \RETURN $m$
    \ELSIF{$courses[m].prog < p \textbf{ or } (courses[m].prog = p \textbf{ and } courses[m].code < c)$}
        \STATE $l := m + 1$
    \ELSE
        \STATE $r := m - 1$
    \ENDIF
\ENDWHILE
\RETURN $\text{null}$
\end{algorithmic}
\end{algorithm}
```

In all cases this yields a runtime complexity of $\Theta(|enrolled| \log_2{(|courses|)})$.

**Correctness and complexity**

1. Since `courses` is required to be ordered, no sorting is needed at query time. Binary search over the sorted `(prog, code)` pairs finds the unique matching course (each `(prog, code)` pair identifies one course), so `EFFJOIN` outputs exactly the pairs $(s_e, name)$ produced by the nested-loop joins.
2. Each binary search has time complexity $\Theta(\log_2{|courses|})$ and is executed once per tuple of `enrolled`.
3. The loop iterates over each element of `enrolled` exactly once, contributing $\Theta(|enrolled|)$ iterations.

Thus, the total time complexity is $\Theta(|enrolled| \log_2{(|courses|)})$.

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-2c03/a3/A3
tags:
  - sfwr2c03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/a3/A3"
title: Questions on sortings and medians
date: 2024-02-11
---

## Problem 1

Consider the following program

```pseudo
\begin{algorithm}
\caption{Sort$(L[0 \ldots N])$}
\begin{algorithmic}
\REQUIRE $L$ is an array.
\WHILE{$L$ is not sorted}
    \STATE $L \gets$ a random permutation of $L$.
\ENDWHILE
\ENSURE $L$ is sorted.
\end{algorithmic}
\end{algorithm}
```

Assume we can test that $L$ is sorted in $\Theta(|L|)$ time, and that we can compute a random permutation of $L$ in $\Theta(|L|)$ time.

> [!question] P1.1
>
> Does the $SORT$ program sort correctly? If yes, then provide an invariant for the while-loop and provide a bound function that can be used to prove the correctness of the program. If no, then argue why the program is not correct.

_Solution_

The program sorts correctly (with probability 1), given enough time. This is known as Bogosort: a list has finitely many ($n!$) permutations, at least one of which is sorted, and each iteration samples one of them uniformly at random.

The invariant for the while-loop:

$$
\text{Invariant: } L \text{ is a permutation of the original input } L_0
$$

This invariant holds throughout the while-loop because shuffling only rearranges the elements of the list.

A deterministic bound function does not exist: the loop is probabilistic, and a single run may shuffle arbitrarily often, so termination can only be argued probabilistically. Each iteration is a Bernoulli trial that produces the sorted permutation with success probability $p = \frac{1}{n!}$, so the cumulative probability of having sorted the list after $k$ attempts is $1 - (1 - \frac{1}{n!})^k$, which tends to $1$, and the expected number of iterations is $n!$.

> [!question] P1.2
>
> Assume the program $SORT$ is correct. Is the program stable? Explain why.

_Solution_

The program is not stable, since SORT is non-deterministic.

- Each iteration generates a random permutation of the list. If the list contains duplicate elements, their relative order is not preserved between permutations.
- The algorithm does not consider the original order of equal elements when determining whether the list is sorted.

> [!question] P1.3
>
> What is the worst case runtime complexity of this program? What is the best case runtime complexity of this program? Is this program optimal? Explain your arguments.

_Solution_

The worst-case runtime is unbounded: permutations are sampled independently _with_ replacement, so no finite bound on the number of iterations exists for any particular run. (In expectation the program performs $n!$ iterations, each costing $\Theta(n)$, i.e., $\Theta(n! \cdot n)$ expected work.)

The best case occurs when the list is already sorted (the loop body never runs) or when the first generated permutation is the sorted list (probability $\frac{1}{n!}$); either way the runtime is $\Theta(n)$: at most one random permutation plus a linear check for sortedness.

No, the program is not optimal: it relies on random permutations, needs $n!$ iterations in expectation, and offers no guarantee of termination on any particular run. It is far from optimal compared to sorting algorithms such as merge-sort or heap-sort, which sort in $\Theta(n \log n)$.

> [!question] P1.4
>
> What is the expected case runtime complexity of this program? Explain your answer.

_Solution_

As mentioned above, each iteration is a Bernoulli trial with success probability $p = \frac{1}{n!}$, where $n$ is the number of elements in the list. The expected number of iterations is therefore $\frac{1}{p} = n!$, and each iteration takes $\Theta(n)$ time (generating a random permutation and checking whether the array is sorted). The expected case runtime complexity is thus $\Theta(n! \cdot n)$.

## Problem 2

The median of a list $L$ of distinct values is the middle value $\mathcal{v} \in L$: an equal number of values in $L$ are smaller and larger than $\mathcal{v}$. For example, in the list $L = [1,5,4,2,3]$, the median is 3.

Consider two sorted lists $\mathcal{A} \lbrack 0 \ldots N)$ and $\mathcal{B} \lbrack 0 \ldots M)$ with $N + M$ distinct values. You may assume that the total number of values in $\mathcal{A}$ and $\mathcal{B}$ is odd ($N+M$ is odd). Hence, there is a value $\mathcal{v} \in \mathcal{A} \cup \mathcal{B}$ such that an equal amount $E = \lfloor \frac{N+M}{2} \rfloor$ of other values are smaller and larger than $v$.

> [!question] P2.1
>
> Provide an algorithm `Median(A, B)` that computes the median of the combined list $\mathcal{A} \cup \mathcal{B}$ in $\mathcal{O}(\log_2(N+M))$ time.

_Solution_

```pseudo
\begin{algorithm}
\caption{Median$(A[0 \ldots N), B[0 \ldots M))$}
\begin{algorithmic}
\REQUIRE $|A| \leq |B|$
\STATE $N \gets |A|$
\STATE $M \gets |B|$
\STATE $low \coloneqq 0$
\STATE $high \coloneqq N$
\WHILE{$low \leq high$}
    \STATE $i \coloneqq \lfloor \frac{low + high}{2} \rfloor$ \COMMENT{partition index in $A$}
    \STATE $j \coloneqq \lfloor \frac{N+M+1}{2} \rfloor - i$ \COMMENT{partition index in $B$}
    \STATE $A_{\text{left}} \coloneqq i > 0 \space ? \space A[i-1] \space : \space -\infty$
    \STATE $A_{\text{right}} \coloneqq i < N \space ? \space A[i] \space : \space \infty$
    \STATE $B_{\text{left}} \coloneqq j > 0 \space ? \space B[j-1] \space : \space -\infty$
    \STATE $B_{\text{right}} \coloneqq j < M \space ? \space B[j] \space : \space \infty$
    \IF{$A_{\text{left}} \leq B_{\text{right}} \land B_{\text{left}} \leq A_{\text{right}}$}
        \IF{$(N+M) \mod 2 = 1$}
            \RETURN $\max(A_{\text{left}}, B_{\text{left}})$
        \ELSE
            \RETURN $\frac{\max(A_{\text{left}}, B_{\text{left}}) + \min(A_{\text{right}}, B_{\text{right}})}{2}$ \COMMENT{unreachable here, since $N+M$ is odd; kept for generality}
        \ENDIF
    \ELSIF{$A_{\text{left}} > B_{\text{right}}$}
        \STATE $high \gets i - 1$
    \ELSE
        \STATE $low \gets i + 1$
    \ENDIF
\ENDWHILE
\end{algorithmic}
\end{algorithm}
```

> [!question] P2.2
>
> Explain why your algorithm is correct and why the complexity is $\Theta(\log_2(N+M))$.

_Solution_

The algorithm searches for a partition of $\mathcal{A} \cup \mathcal{B}$ into a left part of exactly $\lfloor \frac{N+M+1}{2} \rfloor$ elements and a right part with the rest, such that every element on the left is $\leq$ every element on the right (the check $A_{\text{left}} \leq B_{\text{right}} \land B_{\text{left}} \leq A_{\text{right}}$). Since $N+M$ is odd, the median is then the maximum of the left part. If the check fails, the partition index $i$ in the smaller array is either too large ($A_{\text{left}} > B_{\text{right}}$) or too small, and adjusting $low$ or $high$ halves the search space accordingly.

Since it employs binary search over the smaller array, the number of iterations is $O(\log_2 \min(N, M)) \subseteq O(\log_2(N+M))$, and each iteration does constant work, achieving the required complexity of $O(\log_2(N+M))$.

> [!question] P2.3
>
> Let $\mathcal{P}$ be an algorithm with complexity $\Theta(\log_2(N+M))$ that computes the middle value $A \cup B$. Argue how we can use $P$ to break up the Merge-step necessary to merge two sorted lists with $N+M = 2E + 1$ values into two independent Merge-steps that each merge only $E$ values.

_Solution_

After using $\mathcal{P}$ to find the median of $\mathcal{A} \cup \mathcal{B}$, given that $N+M = 2E+1$, the median splits the combined list into two halves of $E$ elements each. Partition $\mathcal{A}$ and $\mathcal{B}$ into subsets $\mathcal{A}_{\text{left}}, \mathcal{A}_{\text{right}}$ and $\mathcal{B}_{\text{left}}, \mathcal{B}_{\text{right}}$ such that the left subsets contain the items $\leq$ the median and the right subsets contain the items $\geq$ the median (both split points are found by binary search, since the lists are sorted). Proceed with two independent Merge-steps of $E$ values each: merge $\mathcal{A}_{\text{left}}$ with $\mathcal{B}_{\text{left}}$, and $\mathcal{A}_{\text{right}}$ with $\mathcal{B}_{\text{right}}$. Finally, concatenate the two merged halves into one sorted list. The overall cost of the merge operations is $O(2E)$, as each sub-problem merges $E$ elements.

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-2c03/a4/A4
tags:
  - sfwr2c03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/a4/A4"
title: Efficient additions
date: 2024-02-11
---

## problem statement.

Typically, we assume that basic operations on natural numbers (e.g., adding or multiplying two natural numbers together) are performed in constant time. In practice, this assumption is correct whenever we restrict ourselves to natural numbers with some maximum size (e.g., 64 bit natural numbers, for which basic operations are supported directly by modern processors). Applications such as cryptography often work with huge natural numbers, however (e.g., 4048 bit values, which can hold a maximum of $\approx 3.7 \cdot 10^{1218}$). Hence, for these applications we can no longer assume that operations on natural numbers are in constant time: these applications require the development of efficient algorithms even for basic operations on natural numbers.

Consider two $n$-digit natural numbers $A = a_{1} \dots a_{n}$ and $B = b_{1} \dots b_{n}$ written in base 10: the digits $a_{1} \dots a_{n}$ and $b_{1} \dots b_{n}$ each have a value in $0 \dots 9$.
For example, if $n=4$, then we could have $A=3456, B=9870$, in which case $a_{1}=3, a_{2}=4, a_{3}=5, a_{4}=6, b_{1}=9, b_{2}=8, b_{3}=7, b_{4}=0$.

> [!question] 1.1
>
> Write an algorithm `ADD(A, B)` that computes $A + B$ in $\Theta(n)$. Explain why your algorithm is correct and the runtime complexity is $\Theta(n)$.

Assumption: one converts $A$ and $B$ into two arrays of $n$ integers, $A = \lbrack a_{1} \dots a_{n} \rbrack$ and $B = \lbrack b_{1} \dots b_{n} \rbrack$, with $a_1$ and $b_1$ the most significant digits.

```pseudo
\begin{algorithm}
\caption{ADD(A, B)}
\begin{algorithmic}
\INPUT $A \coloneqq \lbrack a_{1} \dots a_{n} \rbrack$
\INPUT $B \coloneqq \lbrack b_{1} \dots b_{n} \rbrack$
\STATE $C \gets \lbrack \space \rbrack \text{ where } |C| = n + 1$
\STATE $carry \gets 0$
\STATE $i \gets n$
\WHILE{$i \geq 1$}
    \STATE $C[i+1] \gets (a_{i} + b_{i} + carry) \mod 10$
    \STATE $carry \gets \lfloor (a_{i} + b_{i} + carry) / 10 \rfloor$
    \STATE $i \gets i - 1$
\ENDWHILE
\STATE $C[1] \gets carry$
\IF{$C[1] = 0$}
    \STATE $C \gets C[2 \dots n+1]$
\ENDIF
\OUTPUT $C$
\end{algorithmic}
\end{algorithm}
```

Runtime complexity: $\Theta(n)$

- Initialising the output array $C$ takes $\Theta(n)$ time.
- The `while` loop iterates $n$ times, and each iteration performs constant-time operations (addition, modulo, division) in $\Theta(1)$ time.
- Finally, the adjustment of the output array $C$ takes at most $\Theta(n)$ time.

Thus, the total runtime complexity is $\Theta(n)$.

Correctness:

Loop invariant: at the start of each iteration of the `while` loop (with current index $i$ and current carry $carry$),

$$
carry \cdot 10^{n-i} + \sum_{k=i+1}^{n} C[k+1] \cdot 10^{n-k} = \sum_{k=i+1}^{n} (a_k + b_k) \cdot 10^{n-k}
$$

i.e., the output digits written so far, together with the pending carry, exactly represent the sum of the digit positions processed so far.

Bound function: $f(i) = i$, which starts at $n \geq 0$ and strictly decreases each iteration, so the loop terminates.

_Proof_

Base case: $i = n$. No positions have been processed, both sides of the invariant are $0$, and $carry = 0$, so the invariant holds.

Inductive step: assume the invariant holds at the start of an iteration with index $i$. The loop body sets $C[i+1] = (a_i + b_i + carry) \bmod 10$ and $carry' = \lfloor (a_i + b_i + carry) / 10 \rfloor$, so $carry' \cdot 10 + C[i+1] = a_i + b_i + carry$. Multiplying this equation by $10^{n-i}$ and adding the invariant for $i$ shows that the invariant holds for $i - 1$. Meanwhile, $f(i)$ strictly decreases, since $i_{\text{new}} = i - 1$.

At termination ($i = 0$) the invariant gives $carry \cdot 10^{n} + \sum_{k=1}^{n} C[k+1] \cdot 10^{n-k} = A + B$, and the final statement $C[1] \gets carry$ stores this leading carry, so $C$ is the decimal representation of $A + B$.

> [!question] 1.2
>
> What is the runtime complexity of this algorithm in terms of the number of digits in A and B?

The runtime complexity of the pen-and-paper multiplication algorithm is $\Theta(n^2)$, where $n$ is the number of digits in $A$ and $B$. Each digit of $B$ multiplies every digit of $A$, which results in $n^2$ digit multiplications; the $n$ partial products (of at most $2n$ digits each) are then summed with $n$ additions of $\Theta(n)$ digit operations each, which is also $O(n^2)$. Overall, pen-and-paper multiplication of two $n$-digit numbers takes $\Theta(n^2)$ time.
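A small Python sketch of the pen-and-paper method (digit lists are most-significant-first, mirroring the $a_1 \dots a_n$ convention; the function name is mine):

```python
def school_multiply(A, B):
    """Pen-and-paper multiplication of two digit lists, Theta(n^2).

    A, B: digits a_1..a_n, most significant first, e.g. 3456 -> [3, 4, 5, 6].
    Returns the digits of A * B, most significant first.
    """
    n = len(A)
    result = [0] * (2 * n)  # product of two n-digit numbers has at most 2n digits
    for i in range(n - 1, -1, -1):        # each digit of B ...
        carry = 0
        for j in range(n - 1, -1, -1):    # ... multiplies every digit of A
            cur = result[i + j + 1] + A[j] * B[i] + carry
            result[i + j + 1] = cur % 10
            carry = cur // 10
        result[i] += carry
    while len(result) > 1 and result[0] == 0:  # strip leading zeros
        result.pop(0)
    return result

assert school_multiply([3, 4, 5, 6], [9, 8, 7, 0]) == [3, 4, 1, 1, 0, 7, 2, 0]
```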
> [!question] 1.3
>
> Let $C$ be an $n$-digit number with $n=2m$. Hence, $C = C_{\text{high}} \cdot 10^m + C_{\text{low}}$ where $C_{\text{high}}$ is the first $m$ digits of C and $C_{\text{low}}$ is the remaining $m$ digits of C. For example, if $n=4, A=3456, B=9870$, then $m=2$ and
>
> $$
> \begin{aligned} &A=A_{\text{high}} \cdot 10^m + A_{\text{low}}, &A_{\text{high}} = 34,\quad &A_{\text{low}} = 56 \\ &B=B_{\text{high}} \cdot 10^m + B_{\text{low}}, &B_{\text{high}} = 98,\quad &B_{\text{low}} = 70 \end{aligned}
> $$
>
> Using the breakdown of a number into their high and low part, one notices the following
>
> $$
> \begin{aligned} A \times B &= (A_{\text{high}} \cdot 10^m + A_{\text{low}}) \cdot (B_{\text{high}} \cdot 10^m + B_{\text{low}}) \\ & = A_{\text{high}} \times B_{\text{high}} \cdot 10^{2m} + (A_{\text{high}} \times B_{\text{low}} + A_{\text{low}} \times B_{\text{high}}) \cdot 10^m + A_{\text{low}} \times B_{\text{low}} \end{aligned}
> $$
>
> Here is the following recursive algorithm `BREAKSDOWNMULTIPLY(A, B)` that computes $A \times B$:
>
> ```pseudo
> \begin{algorithm}
> \caption{BREAKSDOWNMULTIPLY(A, B)}
> \begin{algorithmic}
> \INPUT $A \text{ and } B \text{ have } n=2m \text{ digits}$
> \IF{$n = 1$}
>     \RETURN $a_{1} \times b_{1}$
> \ELSE
>     \STATE $hh \coloneqq \text{BREAKSDOWNMULTIPLY}(A_{\text{high}}, B_{\text{high}})$
>     \STATE $hl \coloneqq \text{BREAKSDOWNMULTIPLY}(A_{\text{high}}, B_{\text{low}})$
>     \STATE $lh \coloneqq \text{BREAKSDOWNMULTIPLY}(A_{\text{low}}, B_{\text{high}})$
>     \STATE $ll \coloneqq \text{BREAKSDOWNMULTIPLY}(A_{\text{low}}, B_{\text{low}})$
>     \RETURN $hh \cdot 10^{2m} + (hl + lh) \cdot 10^m + ll$
> \ENDIF
> \end{algorithmic}
> \end{algorithm}
> ```
>
> Prove that algorithm `BREAKSDOWNMULTIPLY(A, B)` is correct.

The proposed `BREAKSDOWNMULTIPLY(A, B)` is a divide-and-conquer multiplication (the four-recursive-call precursor of Karatsuba's algorithm).

Base case: $n = 1$, where the algorithm returns $a_1 \times b_1$, a correct multiplication of two single-digit numbers.

Inductive step: assume the algorithm is correct for all inputs with fewer than $n$ digits. For $n = 2m$, the recursive calls compute $hh = A_{\text{high}} \times B_{\text{high}}$, $hl = A_{\text{high}} \times B_{\text{low}}$, $lh = A_{\text{low}} \times B_{\text{high}}$, and $ll = A_{\text{low}} \times B_{\text{low}}$ correctly, since each operand has only $m$ digits. By the identity derived above, $A \times B = hh \cdot 10^{2m} + (hl + lh) \cdot 10^m + ll$, which is exactly what the algorithm returns. Therefore, by induction, the algorithm is correct.

> [!question] 1.4
>
> Give a recurrence $T(n)$ for the runtime complexity of `BREAKSDOWNMULTIPLY(A, B)` Explain each term in the recurrence.
>
> Draw a recurrence tree for $T(n)$ and use this recurrence tree to solve the recurrence $T(n)$ by proving that $T(n) = \Theta (f(n))$ for some function $f(n)$
>
> What is the runtime complexity of `BREAKSDOWNMULTIPLY(A, B)`? Do you expect this algorithm to be faster than the pen-and-paper multiplication algorithm?

_Hint: Feel free to assume that $n = 2^k, k \in \mathbb{N}$. Feel free to assume that we can add two $v$-digit number in $\Theta(v)$ (e.g., using `ADD`) and that we can multiply a $v$-digit number with $10^w$ in $\Theta (v+w)$._

For two $n$-digit numbers $A$ and $B$, the recurrence $T(n)$ is:

$$
T(n) = \begin{cases} \Theta(1) & \text{if } n = 1 \\ 4T(n/2) + \Theta(n) & \text{if } n > 1 \end{cases}
$$

- The base case $n=1$ costs $\Theta(1)$, as it only performs a single-digit multiplication, with no recursive calls.
- The recursive case $n>1$ performs 4 recursive calls, each on a number half the size of the original input (since $n=2m$), hence $4T(n/2)$.
- $\Theta(n)$ is the linear cost of combining the four sub-products: the additions and the multiplications by powers of 10, per our assumptions that two $v$-digit numbers can be added in $\Theta(v)$ and a $v$-digit number can be multiplied by $10^w$ in $\Theta(v+w)$.
The recurrence tree for $T(n)$ is:

```text
T(n)
├── T(n/2)
│   ├── T(n/4)
│   │   ├── T(n/8)
│   │   │   ├── ...
│   │   │   └── ...
│   │   └── T(n/8)
│   ├── T(n/4)
│   ├── T(n/4)
│   └── T(n/4)
├── T(n/2)
│   ├── T(n/4)
│   └── ...
├── T(n/2)
│   ├── T(n/4)
│   └── ...
└── T(n/2)
    ├── T(n/4)
    └── ...
```

- The total number of nodes at depth $k$ is $4^k$, since each call spawns four recursive calls.
- The work done at depth $k$ is $4^k \cdot (n/2^k) = 2^k \cdot n$: each of the $4^k$ nodes at that depth does $\Theta(n/2^k)$ work.
- The depth of the tree is $\log_2 n$, since the input size is halved at each level.

Therefore, one can solve for $T(n)$:

$$
\begin{aligned} T(n) &= \sum_{k=0}^{\log_2(n)} 2^k \cdot n \\ &= n \cdot \sum_{k=0}^{\log_2(n)} 2^k \\ &= n \cdot \frac{2^{\log_2(n) + 1} - 1}{2 - 1} \\ &= n \cdot (2n - 1) \\ &= 2n^2 - n \\ &= \Theta(n^2) \end{aligned}
$$

Thus the runtime complexity of `BREAKSDOWNMULTIPLY(A, B)` is quadratic, $\Theta(n^2)$: asymptotically no faster than the pen-and-paper multiplication algorithm, which also takes $\Theta(n^2)$ time.

> [!question] 1.5
>
> One can observe
>
> $$
> (A_{\text{high}} + A_{\text{low}}) \times (B_{\text{high}} + B_{\text{low}}) = A_{\text{high}} \times B_{\text{high}} + A_{\text{high}} \times B_{\text{low}} + A_{\text{low}} \times B_{\text{high}} + A_{\text{low}} \times B_{\text{low}}
> $$
>
> Hence by rearranging terms, one can conclude that
>
> $$
> A_{\text{high}} \times B_{\text{low}} + A_{\text{low}} \times B_{\text{high}} = (A_{\text{high}} + A_{\text{low}}) \times (B_{\text{high}} + B_{\text{low}}) - A_{\text{high}} \times B_{\text{high}} - A_{\text{low}} \times B_{\text{low}}
> $$
>
> Based on conclusion above, $A \times B$ can be seen as:
>
> $$
> \begin{aligned} A \times B &= (A_{\text{high}} \cdot 10^m + A_{\text{low}}) \times (B_{\text{high}} \cdot 10^m + B_{\text{low}}) \\ &= A_{\text{high}} \times B_{\text{high}} \cdot 10^{2m} + A_{\text{high}} \times B_{\text{low}} \cdot 10^m + A_{\text{low}} \times B_{\text{high}} \cdot 10^m + A_{\text{low}} \times B_{\text{low}} \\ &= A_{\text{high}} \times B_{\text{high}} \cdot 10^{2m} + (A_{\text{high}} \times B_{\text{low}} + A_{\text{low}} \times B_{\text{high}}) \cdot 10^m + A_{\text{low}} \times B_{\text{low}} \\ &= A_{\text{high}} \times B_{\text{high}} \cdot 10^{2m} + \left(\left((A_{\text{high}} + A_{\text{low}}) \times (B_{\text{high}} + B_{\text{low}})\right) - \left(A_{\text{high}} \times B_{\text{high}}\right) - \left(A_{\text{low}} \times B_{\text{low}}\right)\right) \cdot 10^m + A_{\text{low}} \times B_{\text{low}}. \end{aligned}
> $$
>
> The final rewritten form of $A \times B$ only requires three multiplication terms, namely $A_{\text{high}} \times B_{\text{high}}, A_{\text{low}} \times B_{\text{low}}, (A_{\text{high}} + A_{\text{low}}) \times (B_{\text{high}} + B_{\text{low}})$
>
> Use the observation to construct a recursive multiplication `SMARTMATHSMULTIPLY(A, B)` that only perform three recursive multiplications. Argue why `SMARTMATHSMULTIPLY(A, B)` is correct.
```pseudo
\begin{algorithm}
\caption{SMARTMATHSMULTIPLY(A, B)}
\begin{algorithmic}
\INPUT $A \text{ and } B \text{ have } n=2m \text{ digits}$
\IF{$n = 1$}
    \RETURN $a_{1} \times b_{1}$
\ELSE
    \STATE $hh \coloneqq \text{SMARTMATHSMULTIPLY}(A_{\text{high}}, B_{\text{high}})$
    \STATE $ll \coloneqq \text{SMARTMATHSMULTIPLY}(A_{\text{low}}, B_{\text{low}})$
    \STATE $mid \coloneqq \text{SMARTMATHSMULTIPLY}(A_{\text{high}} + A_{\text{low}}, B_{\text{high}} + B_{\text{low}})$
    \RETURN $hh \cdot 10^{2m} + (mid - hh - ll) \cdot 10^m + ll$
\ENDIF
\end{algorithmic}
\end{algorithm}
```

The proposed `SMARTMATHSMULTIPLY(A, B)` is _the basis_ of Karatsuba's algorithm.

Base case: $n=1$, where $A \times B = a_1 \times b_1$ is a correct multiplication of two single-digit numbers.

Assume that `SMARTMATHSMULTIPLY(A, B)` correctly computes the product $A \times B$ for all $A, B$ with fewer than $n$ digits. The following invariants hold per recursive call:

- $A = A_{\text{high}} \cdot 10^m + A_{\text{low}} \land B = B_{\text{high}} \cdot 10^m + B_{\text{low}}$ where $m = \frac{n}{2}$ (true from the problem statement and $n=2^k$)
- the recursive calls compute $P_{1}, P_{2}, P_{3}$ correctly, where $P_{1} = A_{\text{high}} \times B_{\text{high}}$, $P_{2} = A_{\text{low}} \times B_{\text{low}}$, $P_{3} = (A_{\text{high}} + A_{\text{low}}) \times (B_{\text{high}} + B_{\text{low}})$, since all operands have fewer than $n$ digits (induction hypothesis)
- combination invariant: $P_{4} = P_{3}-P_{2}-P_{1} = A_{\text{high}} \times B_{\text{low}} + A_{\text{low}} \times B_{\text{high}}$ and $A \times B = P_{1} \cdot 10^{2m} + P_{4} \cdot 10^m + P_{2}$ (true from the rearrangement derived above)

Thus, the algorithm is correct.

> [!question] 1.6
>
> Give a recurrence $T(n)$ for the runtime complexity of `SMARTMATHSMULTIPLY(A, B)` Explain each term in the recurrence.
>
> Solve the recurrence $T(n)$ by proving that $T(n) = \Theta (f(n))$ for some function $f(n)$. Use any methods that you find comfortable with.
>
> What is the runtime complexity of `SMARTMATHSMULTIPLY(A, B)`? Do you expect this algorithm to be faster than the pen-and-paper multiplication algorithm?

_Hint: Feel free to assume that $n = 2^k, k \in \mathbb{N}$. Feel free to assume that we can add two $v$-digit number in $\Theta(v)$ (e.g., using `ADD`) and that we can multiply a $v$-digit number with $10^w$ in $\Theta (v+w)$._

For two $n$-digit numbers $A$ and $B$, the recurrence $T(n)$ is:

$$
T(n) = \begin{cases} \Theta(1) & \text{if } n = 1 \\ 3T(n/2) + \Theta(n) & \text{if } n > 1 \end{cases}
$$

- The base case $n=1$ costs $\Theta(1)$, as it only performs a single-digit multiplication, with no recursive calls.
- The recursive case $n>1$ performs 3 recursive calls, each on a number half the size of the original input (since $n=2m$), hence $3T(n/2)$.
- $\Theta(n)$ is the linear cost of combining the sub-products: the additions, subtractions, and multiplications by powers of 10, per our assumptions that two $v$-digit numbers can be added in $\Theta(v)$ and a $v$-digit number can be multiplied by $10^w$ in $\Theta(v+w)$.

Using the Master Theorem with $a=3, b=2$, we compare $f(n)=\Theta(n)$ against $n^{\log_b a} = n^{\log_2 3} \approx n^{1.585}$.

> The master theorem states that if $f(N) = O(N^{\log_b a - \epsilon})$ for some $\epsilon > 0$, then $T(N) = \Theta(N^{\log_b a})$.

Since $f(n) = \Theta(n) = O(n^{\log_2 3 - \epsilon})$ (e.g., with $\epsilon = 0.5$), this case applies, and thus

$$
T(n) = \Theta(n^{\log_2 3})
$$

> [!tip] Runtime complexity
>
> The runtime complexity of `SMARTMATHSMULTIPLY(A, B)` is $\Theta(n^{\log_2 3}) \approx \Theta(n^{1.585})$. This algorithm is expected to be faster than the pen-and-paper multiplication algorithm, which takes $\Theta(n^2)$ time.
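As a sanity check, a small Python sketch of the three-multiplication scheme (the splitting rule, base case, and function name are my assumptions, not part of the assignment):

```python
def smart_multiply(a: int, b: int) -> int:
    """Karatsuba-style multiplication with three recursive calls."""
    if a < 10 or b < 10:  # base case: a single-digit operand
        return a * b
    m = max(len(str(a)), len(str(b))) // 2  # split off the low m digits
    a_high, a_low = divmod(a, 10 ** m)
    b_high, b_low = divmod(b, 10 ** m)
    hh = smart_multiply(a_high, b_high)
    ll = smart_multiply(a_low, b_low)
    mid = smart_multiply(a_high + a_low, b_high + b_low)
    return hh * 10 ** (2 * m) + (mid - hh - ll) * 10 ** m + ll

assert smart_multiply(3456, 9870) == 3456 * 9870
```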
---
slug: thoughts/university/twenty-three-twenty-four/sfwr-2c03/a5/A5
tags:
  - sfwr2c03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/a5/A5"
title: Min heap and binary search tree
date: 2024-02-26
---

## Problème 1.

Consider the following sequence of values $S = [3, 42, 39, 86, 49, 89, 99, 20, 88, 51, 64]$

> [!note] Note
>
> We can represent a tree textually via the following representation
>
> ```text
> 13 (
>   11 (
>     8 (
>       2
>       4
>     )
>     12 (
>       *
>       1
>     )
>   )
>   7 (
>     5
>     6 (
>       99
>       *
>     )
>   )
> )
> ```
>
> Where we use $*$ as a placeholder for a missing child for those nodes that only have a single child.

> [!question] P1.1
>
> Draw the min heap (as a tree) obtained by adding the values in $S$ in sequence. Show each step

1. $S = [3]$. The root of the heap.

   ```text
   3
   ```

2. $S = [3, 42]$. Added as the left child of the root.

   ```text
   3 ( 42 * )
   ```

3. $S = [3, 42, 39]$. Added as the right child of the root.

   ```text
   3 ( 42 39 )
   ```

4. $S = [3, 42, 39, 86]$. Added as the left child of 42; no swap, since $42 < 86$.

   ```text
   3 ( 42 ( 86 * ) 39 )
   ```

5. $S = [3, 42, 39, 86, 49]$. Added as the right child of 42.

   ```text
   3 ( 42 ( 86 49 ) 39 )
   ```

6. $S = [3, 42, 39, 86, 49, 89]$. Added as the left child of 39.

   ```text
   3 ( 42 ( 86 49 ) 39 ( 89 * ) )
   ```

7. $S = [3, 42, 39, 86, 49, 89, 99]$. Added as the right child of 39.

   ```text
   3 ( 42 ( 86 49 ) 39 ( 89 99 ) )
   ```

8. $S = [3, 42, 39, 86, 49, 89, 99, 20]$. 20 is added as the left child of 86; $20 < 86$, so swap; then $20 < 42$, so swap again.

   ```text
   3 ( 20 ( 42 ( 86 * ) 49 ) 39 ( 89 99 ) )
   ```

9. $S = [3, 42, 39, 86, 49, 89, 99, 20, 88]$. 88 becomes the right child of 42.

   ```text
   3 ( 20 ( 42 ( 86 88 ) 49 ) 39 ( 89 99 ) )
   ```

10. $S = [3, 42, 39, 86, 49, 89, 99, 20, 88, 51]$. 51 becomes the left child of 49.

    ```text
    3 ( 20 ( 42 ( 86 88 ) 49 ( 51 * ) ) 39 ( 89 99 ) )
    ```

11. $S = [3, 42, 39, 86, 49, 89, 99, 20, 88, 51, 64]$. 64 becomes the right child of 49.

    ```text
    3 ( 20 ( 42 ( 86 88 ) 49 ( 51 64 ) ) 39 ( 89 99 ) )
    ```

> [!question] P1.2
>
> Draw the max heap (as a tree) obtained by adding the values in $S$ in sequence. Show each step

1. $S = [3]$. The root of the heap.

   ```text
   3
   ```

2. $S = [3, 42]$. 42 swims above 3: 42 becomes the root, 3 its left child.

   ```text
   42 ( 3 * )
   ```

3. $S = [3, 42, 39]$. 39 becomes the right child of the root.

   ```text
   42 ( 3 39 )
   ```

4. $S = [3, 42, 39, 86]$. 86 is added as the left child of 3 and swims to the root; 42 becomes its left child, 3 the left child of 42.

   ```text
   86 ( 42 ( 3 * ) 39 )
   ```

5. $S = [3, 42, 39, 86, 49]$. 49 is added as the right child of 42 and swaps with it: 42 becomes the right child of 49.

   ```text
   86 ( 49 ( 3 42 ) 39 )
   ```

6. $S = [3, 42, 39, 86, 49, 89]$. 89 is added as the right child of 39, swaps with 39 and then with 86: 89 becomes the root.

   ```text
   89 ( 49 ( 3 42 ) 86 ( 39 * ) )
   ```

7. $S = [3, 42, 39, 86, 49, 89, 99]$. 99 is added as the right child of 86, swaps with 86 and then with 89: 99 becomes the root.

   ```text
   99 ( 49 ( 3 42 ) 89 ( 39 86 ) )
   ```

8. $S = [3, 42, 39, 86, 49, 89, 99, 20]$. 20 is added as the left child of 3 and swaps with it: 3 becomes the left child of 20.

   ```text
   99 ( 49 ( 20 ( 3 * ) 42 ) 89 ( 39 86 ) )
   ```

9. $S = [3, 42, 39, 86, 49, 89, 99, 20, 88]$. 88 is added as the right child of 20, swaps with 20 and then with 49.

   ```text
   99 ( 88 ( 49 ( 3 20 ) 42 ) 89 ( 39 86 ) )
   ```

10. $S = [3, 42, 39, 86, 49, 89, 99, 20, 88, 51]$. 51 is added as the left child of 42 and swaps with it.

    ```text
    99 ( 88 ( 49 ( 3 20 ) 51 ( 42 * ) ) 89 ( 39 86 ) )
    ```

11. $S = [3, 42, 39, 86, 49, 89, 99, 20, 88, 51, 64]$. 64 is added as the right child of 51 and swaps with it.

    ```text
    99 ( 88 ( 49 ( 3 20 ) 64 ( 42 51 ) ) 89 ( 39 86 ) )
    ```

> [!question] P1.3
>
> Draw the binary search tree obtained by adding the values in $S$ in sequence. Show each step

1. $S = [3]$. The root of the tree.

   ```text
   3
   ```

2. $S = [3, 42]$. 42 becomes the right child of 3.

   ```text
   3 ( * 42 )
   ```

3. $S = [3, 42, 39]$. 39 becomes the left child of 42.

   ```text
   3 ( * 42 ( 39 * ) )
   ```

4. $S = [3, 42, 39, 86]$. 86 becomes the right child of 42.

   ```text
   3 ( * 42 ( 39 86 ) )
   ```

5. $S = [3, 42, 39, 86, 49]$. 49 becomes the left child of 86.

   ```text
   3 ( * 42 ( 39 86 ( 49 * ) ) )
   ```

6. $S = [3, 42, 39, 86, 49, 89]$. 89 becomes the right child of 86.

   ```text
   3 ( * 42 ( 39 86 ( 49 89 ) ) )
   ```

7. $S = [3, 42, 39, 86, 49, 89, 99]$. 99 becomes the right child of 89.

   ```text
   3 ( * 42 ( 39 86 ( 49 89 ( * 99 ) ) ) )
   ```

8. $S = [3, 42, 39, 86, 49, 89, 99, 20]$. 20 becomes the left child of 39.

   ```text
   3 ( * 42 ( 39 ( 20 * ) 86 ( 49 89 ( * 99 ) ) ) )
   ```

9. $S = [3, 42, 39, 86, 49, 89, 99, 20, 88]$. 88 becomes the left child of 89.

   ```text
   3 ( * 42 ( 39 ( 20 * ) 86 ( 49 89 ( 88 99 ) ) ) )
   ```

10. $S = [3, 42, 39, 86, 49, 89, 99, 20, 88, 51]$. 51 becomes the right child of 49.

    ```text
    3 ( * 42 ( 39 ( 20 * ) 86 ( 49 ( * 51 ) 89 ( 88 99 ) ) ) )
    ```

11. $S = [3, 42, 39, 86, 49, 89, 99, 20, 88, 51, 64]$. 64 becomes the right child of 51.

    ```text
    3 ( * 42 ( 39 ( 20 * ) 86 ( 49 ( * 51 ( * 64 ) ) 89 ( 88 99 ) ) ) )
    ```

## Problème 2.

Given an ordered list $L$ and value $v$, the `LowerBound` algorithm provides the position $p$ in list $L$ such that $p$ is the first offset in $L$ of a value larger-equal to $v$. Hence, $v \leq L[p]$ (or, if no such offset exists, $p = |L|$). The `LowerBound` algorithm does so in $\Theta(\log_2(|L|))$ comparisons.

Argue that `LowerBound` is _worst-case optimal_: any algorithm that finds the correct position $p$ for any inputs $L$ and $v$ using only comparisons will require $\Theta(\log_2(|L|))$ comparisons.

_Solution_

For a list of size $|L|$ there are $|L| + 1$ possible outcomes for the position $p$. Any comparison-based algorithm can be modelled as a binary decision tree whose leaves are the possible outcomes: a binary tree of height $h$ has at most $2^h$ leaves, so $2^h \geq |L| + 1$, i.e., $h \geq \log_2(|L| + 1)$. The height of the decision tree is the worst-case number of comparisons, so every comparison-based algorithm needs $\Omega(\log_2(|L|))$ comparisons on some input.

Given that `LowerBound` operates in $\Theta(\log_2(|L|))$ comparisons, it matches this theoretical lower bound. Therefore, no comparison-based algorithm can guarantee a better worst-case performance for finding position $p$, making `LowerBound` worst-case optimal.

## Problème 3.

Min heaps and max heaps allow one to efficiently store values and efficiently look up and remove the _smallest values_ and _largest values_, respectively. One cannot easily remove the largest value from a min heap or the smallest value from a max heap, however.

> [!question] P3.1
>
> Assume a value $v$ is part of a min heap of at most $n$ values and that we know $v$ is stored at position $p$ in that heap.
> Provide an algorithm that can remove $v$ from the heap in worst-case $\mathcal{O}(\log_2(n))$.

```pseudo
\begin{algorithm}
\caption{RemoveValue($heap, p$)}
\begin{algorithmic}
\Procedure{RemoveValue}{$heap, p$}
\State $n \gets \text{size}(heap)$
\State $heap[p] \gets heap[n]$ \Comment{overwrite $v$ with the last element}
\State remove the last element from $heap$
\If{$p > 1 \text{ and } heap[p] < heap[\lfloor p/2 \rfloor]$}
\State $\text{HeapifyUp}(heap, p)$ \Comment{the moved value may be smaller than its new parent}
\Else
\State $\text{HeapifyDown}(heap, p)$
\EndIf
\EndProcedure
\end{algorithmic}
\end{algorithm}
```

Here `HeapifyUp` repeatedly swaps $heap[p]$ with its parent while it is smaller than that parent. Both `HeapifyUp` and `HeapifyDown` follow a single root-to-leaf path of a balanced binary tree, so removal takes worst-case $\mathcal{O}(\log_2(n))$.

```pseudo
\begin{algorithm}
\caption{HeapifyDown($heap, i$)}
\begin{algorithmic}
\Procedure{HeapifyDown}{$heap, i$}
\State $n \gets \text{size}(heap)$
\While{$\text{lchild}(i) \leq n$}
\State $\text{left} \gets \text{lchild}(i)$
\State $\text{right} \gets \text{rchild}(i)$
\State $\text{smallest} \gets i$
\If{$\text{left} \leq n \text{ and } heap[\text{left}] < heap[\text{smallest}]$}
\State $\text{smallest} \gets \text{left}$
\EndIf
\If{$\text{right} \leq n \text{ and } heap[\text{right}] < heap[\text{smallest}]$}
\State $\text{smallest} \gets \text{right}$
\EndIf
\If{$\text{smallest} = i$}
\State \textbf{break}
\Else
\State $\text{Swap } heap[i] \text{ with } heap[\text{smallest}]$
\State $i \gets \text{smallest}$
\EndIf
\EndWhile
\EndProcedure
\end{algorithmic}
\end{algorithm}
```

> [!question] P3.2
>
> Provide a data structure that allows one to efficiently store values and efficiently look up and remove _both_ the smallest and the largest values: all three of these operations should be supported in $\Theta(\log_2(n))$

We will implement a Double-ended Priority Queue (DEPQ) as a min-max heap (following Atkinson et al.): a node at position $i$ is on a _min level_ if $\lfloor \log_2 i \rfloor$ is even, and on a _max level_ otherwise. The minimum sits at the root ($heap[1]$) and the maximum at $heap[2]$ or $heap[3]$.

```pseudo
\begin{algorithm}
\caption{Insert($heap, v$)}
\begin{algorithmic}
\Procedure{Insert}{$heap, v$}
\State $heap.push(v)$
\State $\text{Swim}(heap, \text{size}(heap))$
\EndProcedure
\end{algorithmic}
\end{algorithm}
```

```pseudo
\begin{algorithm}
\caption{RemoveMin($heap$)}
\begin{algorithmic}
\Procedure{RemoveMin}{$heap$}
\State $min \gets heap[1]$
\State $heap[1] \gets heap[\text{size}(heap)]$
\State remove the last element from $heap$
\State $\text{Sink}(heap, 1)$
\State \Return $min$
\EndProcedure
\end{algorithmic}
\end{algorithm}
```

```pseudo
\begin{algorithm}
\caption{RemoveMax($heap$)}
\begin{algorithmic}
\Procedure{RemoveMax}{$heap$}
\State $maxPos \gets \text{argmax}\{heap[2], heap[3]\}$ \Comment{for $\text{size}(heap) < 3$, the maximum is the last of $heap[1], heap[2]$}
\State $max \gets heap[maxPos]$
\State $heap[maxPos] \gets heap[\text{size}(heap)]$
\State remove the last element from $heap$
\State $\text{Sink}(heap, maxPos)$
\State \Return $max$
\EndProcedure
\end{algorithmic}
\end{algorithm}
```

`Swim` restores the ordering after an insertion. It first compares the new node with its parent (which lies on the opposite kind of level) and then climbs grandparent-to-grandparent along levels of the same kind:

```pseudo
\begin{algorithm}
\caption{Swim($heap, i$)}
\begin{algorithmic}
\Procedure{Swim}{$heap, i$}
\If{$i = 1$}
\State \Return
\EndIf
\State $parent \gets \lfloor i/2 \rfloor$
\If{$\lfloor \log_2 i \rfloor \text{ is even}$} \Comment{$i$ is on a min level}
\If{$heap[i] > heap[parent]$}
\State $\text{Swap}(heap[i], heap[parent])$
\State $\text{SwimMax}(heap, parent)$
\Else
\State $\text{SwimMin}(heap, i)$
\EndIf
\Else \Comment{$i$ is on a max level}
\If{$heap[i] < heap[parent]$}
\State $\text{Swap}(heap[i], heap[parent])$
\State $\text{SwimMin}(heap, parent)$
\Else
\State $\text{SwimMax}(heap, i)$
\EndIf
\EndIf
\EndProcedure
\end{algorithmic}
\end{algorithm}
```

where `SwimMin` repeatedly swaps $heap[i]$ with its grandparent $heap[\lfloor i/4 \rfloor]$ while the grandparent exists and is larger, and `SwimMax` does the same while the grandparent is smaller.

```pseudo
\begin{algorithm}
\caption{Sink($heap, i$)}
\begin{algorithmic}
\Procedure{Sink}{$heap, i$} \Comment{trickle-down for a min level; RemoveMax uses the symmetric max-level version}
\While{$\text{lchild}(i) \leq \text{size}(heap)$}
\State $t \gets$ index of the smallest value among the children and grandchildren of $i$
\If{$heap[t] \geq heap[i]$}
\State \Return
\EndIf
\State $\text{Swap}(heap[i], heap[t])$
\If{$t$ is a grandchild of $i$}
\If{$heap[t] > heap[\lfloor t/2 \rfloor]$}
\State $\text{Swap}(heap[t], heap[\lfloor t/2 \rfloor])$ \Comment{restore the max level that was passed through}
\EndIf
\State $i \gets t$
\Else
\State \Return
\EndIf
\EndWhile
\EndProcedure
\end{algorithmic}
\end{algorithm}
```

Lookups are $\Theta(1)$ ($heap[1]$ for the minimum; the larger of $heap[2]$ and $heap[3]$ for the maximum), and `Insert`, `RemoveMin`, and `RemoveMax` each traverse a single root-to-leaf (or leaf-to-root) path of the balanced tree, so all three operations run in $\Theta(\log_2(n))$.

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-2c03/a6/A6
tags:
  - sfwr2c03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/a6/A6"
title: LLRB and hash tables
date: 2024-02-26
---

## Problème 1.

Consider the sequence of values $S=[3, 42, 39, 86, 49, 89, 99, 20, 88, 51, 64]$

> [!question] 1.1
>
> Draw a left-leaning red-black tree obtained by adding the values in S in sequence. Show each step.

Use the format `[R|B]-value` for a node, where `R` or `B` denotes whether the node is red or black.

The left-leaning red-black tree obtained by adding the values in S in sequence is as follows:

1. $S=[3]$

   ```plaintext
   B-3
   ```

2. $S=[3, 42]$ The new node 42 is added red.

   ```plaintext
   B-3
     \
     R-42
   ```

3. $S=[3, 42, 39]$ 39 becomes the new root; recolour 3 and 42.

   ```plaintext
      B-39
     /    \
   R-3    R-42
   ```

4. $S=[3, 42, 39, 86]$ Add 86; recolour along the tree.

   ```plaintext
      B-39
     /    \
   R-3    R-42
             \
             R-86
   ```

5. $S=[3, 42, 39, 86, 49]$ Add 49; rotate and recolour 49 and 42.

   ```plaintext
      B-39
     /    \
   R-3    R-49
         /    \
      R-42    R-86
   ```

6. $S=[3, 42, 39, 86, 49, 89]$ Add 89; recolour 42 and 86.

   ```plaintext
      B-39
     /    \
   R-3    R-49
         /    \
      B-42    B-86
                 \
                 R-89
   ```

7. $S=[3, 42, 39, 86, 49, 89, 99]$ Add 99; rotate and recolour 86 and 89.

   ```plaintext
      B-39
     /    \
   R-3    R-49
         /    \
      B-42    B-89
             /    \
          R-86    R-99
   ```

8. $S=[3, 42, 39, 86, 49, 89, 99, 20]$ Add 20 below 3, red; recolour 3.

   ```plaintext
        B-39
       /    \
     B-3    R-49
       \    /    \
     R-20 B-42   B-89
                /    \
             R-86    R-99
   ```

9. $S=[3, 42, 39, 86, 49, 89, 99, 20, 88]$ Add 88; 49 becomes the new root, rebalancing into subtrees rooted at R-39 and R-89.

   ```plaintext
          B-49
         /    \
      R-39    R-89
      /  \    /  \
    B-3 R-42 B-86 B-99
      \        \
     R-20      R-88
   ```

10. $S=[3, 42, 39, 86, 49, 89, 99, 20, 88, 51]$ Add 51 as the left child of 86.

    ```plaintext
           B-49
          /    \
       R-39    R-89
       /  \    /  \
     B-3 R-42 B-86 B-99
       \      /  \
      R-20 R-51  R-88
    ```

11. $S=[3, 42, 39, 86, 49, 89, 99, 20, 88, 51, 64]$ Add 64; 86 becomes red, 64 becomes the right child of 51; recolour 51 and 88.

    ```plaintext
           B-49
          /    \
       R-39    R-89
       /  \    /  \
     B-3 R-42 R-86 B-99
       \      /  \
      R-20 B-51  B-88
              \
             R-64
    ```

> [!question] 1.2
>
> Consider the hash function $h(x) = (x+7) \text{ mod } 13$ and a hash-table of 13 table entries that uses hashing with separate chaining. Draw the hash-table obtained by adding the values in $S$ in sequence. Show each step.

The hash-table obtained by adding the values in $S$ in sequence is as follows:

1. $S=[3]$, $h(3) = 10 \text{ mod } 13 = 10$

   ```plaintext
   0:
   1:
   2:
   3:
   4:
   5:
   6:
   7:
   8:
   9:
   10: 3
   11:
   12:
   ```

2. $S=[3, 42]$, $h(42) = 49 \text{ mod } 13 = 10$. Collision with 3; chain after 3.

   ```plaintext
   0:
   1:
   2:
   3:
   4:
   5:
   6:
   7:
   8:
   9:
   10: 3 -> 42
   11:
   12:
   ```

3. $S=[3, 42, 39]$, $h(39) = 46 \text{ mod } 13 = 7$

   ```plaintext
   0:
   1:
   2:
   3:
   4:
   5:
   6:
   7: 39
   8:
   9:
   10: 3 -> 42
   11:
   12:
   ```

4. $S = [3, 42, 39, 86]$, $h(86) = 93 \text{ mod } 13 = 2$

   ```plaintext
   0:
   1:
   2: 86
   3:
   4:
   5:
   6:
   7: 39
   8:
   9:
   10: 3 -> 42
   11:
   12:
   ```

5. $S = [3, 42, 39, 86, 49]$, $h(49) = 56 \text{ mod } 13 = 4$

   ```plaintext
   0:
   1:
   2: 86
   3:
   4: 49
   5:
   6:
   7: 39
   8:
   9:
   10: 3 -> 42
   11:
   12:
   ```

6. $S = [3, 42, 39, 86, 49, 89]$, $h(89) = 96 \text{ mod } 13 = 5$

   ```plaintext
   0:
   1:
   2: 86
   3:
   4: 49
   5: 89
   6:
   7: 39
   8:
   9:
   10: 3 -> 42
   11:
   12:
   ```

7. $S = [3, 42, 39, 86, 49, 89, 99]$, $h(99) = 106 \text{ mod } 13 = 2$. Collides with 86; chain after 86.

   ```plaintext
   0:
   1:
   2: 86 -> 99
   3:
   4: 49
   5: 89
   6:
   7: 39
   8:
   9:
   10: 3 -> 42
   11:
   12:
   ```

8. $S = [3, 42, 39, 86, 49, 89, 99, 20]$, $h(20) = 27 \text{ mod } 13 = 1$

   ```plaintext
   0:
   1: 20
   2: 86 -> 99
   3:
   4: 49
   5: 89
   6:
   7: 39
   8:
   9:
   10: 3 -> 42
   11:
   12:
   ```

9. $S = [3, 42, 39, 86, 49, 89, 99, 20, 88]$, $h(88) = 95 \text{ mod } 13 = 4$. Collides with 49; chain after 49.

   ```plaintext
   0:
   1: 20
   2: 86 -> 99
   3:
   4: 49 -> 88
   5: 89
   6:
   7: 39
   8:
   9:
   10: 3 -> 42
   11:
   12:
   ```

10. $S = [3, 42, 39, 86, 49, 89, 99, 20, 88, 51]$, $h(51) = 58 \text{ mod } 13 = 6$

    ```plaintext
    0:
    1: 20
    2: 86 -> 99
    3:
    4: 49 -> 88
    5: 89
    6: 51
    7: 39
    8:
    9:
    10: 3 -> 42
    11:
    12:
    ```

11. $S = [3, 42, 39, 86, 49, 89, 99, 20, 88, 51, 64]$, $h(64) = 71 \text{ mod } 13 = 6$. Collides with 51; chain after 51.

    ```plaintext
    0:
    1: 20
    2: 86 -> 99
    3:
    4: 49 -> 88
    5: 89
    6: 51 -> 64
    7: 39
    8:
    9:
    10: 3 -> 42
    11:
    12:
    ```

> [!question] 1.3
>
> Consider the hash function $h(x) = (x+7) \text{ mod } 13$ and a hash-table of 13 table entries that uses hashing with linear probing. Draw the hash-table obtained by adding the values in $S$ in sequence. Show each step.

1. $S=[3]$, $h(3) = 10 \text{ mod } 13 = 10$

   ```plaintext
   0:
   1:
   2:
   3:
   4:
   5:
   6:
   7:
   8:
   9:
   10: 3
   11:
   12:
   ```

2. $S=[3, 42]$, $h(42) = 49 \text{ mod } 13 = 10$. Collides with 3; probe index 11, which is free.

   ```plaintext
   0:
   1:
   2:
   3:
   4:
   5:
   6:
   7:
   8:
   9:
   10: 3
   11: 42
   12:
   ```

3. $S=[3, 42, 39]$, $h(39) = 46 \text{ mod } 13 = 7$

   ```plaintext
   0:
   1:
   2:
   3:
   4:
   5:
   6:
   7: 39
   8:
   9:
   10: 3
   11: 42
   12:
   ```

4. $S = [3, 42, 39, 86]$, $h(86) = 93 \text{ mod } 13 = 2$

   ```plaintext
   0:
   1:
   2: 86
   3:
   4:
   5:
   6:
   7: 39
   8:
   9:
   10: 3
   11: 42
   12:
   ```

5. $S = [3, 42, 39, 86, 49]$, $h(49) = 56 \text{ mod } 13 = 4$

   ```plaintext
   0:
   1:
   2: 86
   3:
   4: 49
   5:
   6:
   7: 39
   8:
   9:
   10: 3
   11: 42
   12:
   ```

6. $S = [3, 42, 39, 86, 49, 89]$, $h(89) = 96 \text{ mod } 13 = 5$

   ```plaintext
   0:
   1:
   2: 86
   3:
   4: 49
   5: 89
   6:
   7: 39
   8:
   9:
   10: 3
   11: 42
   12:
   ```

7. $S = [3, 42, 39, 86, 49, 89, 99]$, $h(99) = 106 \text{ mod } 13 = 2$. Collides with 86; probe index 3, which is free.

   ```plaintext
   0:
   1:
   2: 86
   3: 99
   4: 49
   5: 89
   6:
   7: 39
   8:
   9:
   10: 3
   11: 42
   12:
   ```

8. $S = [3, 42, 39, 86, 49, 89, 99, 20]$, $h(20) = 27 \text{ mod } 13 = 1$

   ```plaintext
   0:
   1: 20
   2: 86
   3: 99
   4: 49
   5: 89
   6:
   7: 39
   8:
   9:
   10: 3
   11: 42
   12:
   ```

9. $S = [3, 42, 39, 86, 49, 89, 99, 20, 88]$, $h(88) = 95 \text{ mod } 13 = 4$. Collides with 49 at index 4; probe 5, collides with 89; probe 6, which is free.

   ```plaintext
   0:
   1: 20
   2: 86
   3: 99
   4: 49
   5: 89
   6: 88
   7: 39
   8:
   9:
   10: 3
   11: 42
   12:
   ```

10. $S = [3, 42, 39, 86, 49, 89, 99, 20, 88, 51]$, $h(51) = 58 \text{ mod } 13 = 6$. Collides with 88 at index 6; probe 7, collides with 39; probe 8, which is free.

    ```plaintext
    0:
    1: 20
    2: 86
    3: 99
    4: 49
    5: 89
    6: 88
    7: 39
    8: 51
    9:
    10: 3
    11: 42
    12:
    ```

11. $S = [3, 42, 39, 86, 49, 89, 99, 20, 88, 51, 64]$, $h(64) = 71 \text{ mod } 13 = 6$. Collides with 88 at index 6; probe 7, collides with 39; probe 8, collides with 51; probe 9, which is free.

    ```plaintext
    0:
    1: 20
    2: 86
    3: 99
    4: 49
    5: 89
    6: 88
    7: 39
    8: 51
    9: 64
    10: 3
    11: 42
    12:
    ```

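As a cross-check of both tables, a quick Python sketch that builds the chained and linearly probed tables for $h(x) = (x+7) \bmod 13$ (the helper names are mine):

```python
S = [3, 42, 39, 86, 49, 89, 99, 20, 88, 51, 64]
h = lambda x: (x + 7) % 13

def chained(values, size=13):
    table = [[] for _ in range(size)]
    for v in values:
        table[h(v)].append(v)  # collisions chain in insertion order
    return table

def probed(values, size=13):
    table = [None] * size
    for v in values:
        i = h(v)
        while table[i] is not None:  # linear probing: step to the next slot
            i = (i + 1) % size
        table[i] = v
    return table

print(chained(S))  # slot 10 -> [3, 42], slot 2 -> [86, 99], slot 6 -> [51, 64], ...
print(probed(S))   # [None, 20, 86, 99, 49, 89, 88, 39, 51, 64, 3, 42, None]
```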
## Problème 2.

Consider a list $L$ of $N$ sorted values. Show how to construct a valid left-leaning red-black tree holding the values in $L$ in $\Theta(N)$.

_Solution_

The following depicts the pseudocode implementation of the program

```pseudo
\begin{algorithm}
\caption{BuildLLRBTree($L, \text{start}, \text{end}$)}
\begin{algorithmic}
\IF{$\text{start} > \text{end}$}
    \RETURN $NULL$
\ENDIF
\STATE $\text{mid} \gets \lfloor (start + end) / 2 \rfloor$
\STATE $\text{left} \gets BuildLLRBTree(L, start, mid-1)$
\STATE $\text{right} \gets BuildLLRBTree(L, mid+1, end)$
\STATE $\text{node} \gets$ new Node($L[mid]$)
\STATE $\text{node.left} \gets left$
\STATE $\text{node.right} \gets right$
\STATE $\text{node.color} \gets \text{BLACK}$
\RETURN $node$
\end{algorithmic}
\end{algorithm}
```

Calling `BuildLLRBTree(L, 1, N)` recursively builds a balanced binary search tree from the sorted list: every element of $L$ is visited exactly once and each call performs constant work besides its recursive calls ($T(N) = 2T(N/2) + \Theta(1)$), so construction takes $\Theta(N)$. One subtlety: colouring every node black yields a valid red-black tree only when the tree is perfect ($N = 2^k - 1$). In general, the nodes on the incomplete deepest level must instead be coloured red and placed as left children of distinct black parents (forming 3-nodes), so that every root-to-leaf path contains the same number of black nodes; this recolouring can still be done in the same single $\Theta(N)$ pass by tracking each node's depth.

## Problème 3.

Consider a set of strings $S$. We want to figure out whether $S$ has duplicates efficiently. We do not want to do so by sorting $S$ and then checking for duplicates: comparing strings can be a lot of work (e.g., they might differ in only a single character). Assume that you have a hash function $h$ that can compute a suitable hash code for any string $s \in S$ in $\mathcal{O}(|s|)$.

Show how one can use hashing to find whether $S$ has duplicates without performing many comparisons between strings. Your algorithm should have an expected runtime of $\mathcal{O}(|S|)$ in which $|S| = \sum_{s \in S}|s|$ represents the total length of all strings in $S$.

_Solution_

```pseudo
\begin{algorithm}
\caption{Check for Duplicates Using Hashing}
\begin{algorithmic}
\Require A set of strings $S$
\Ensure True if there are duplicates in $S$, False otherwise
\State Initialize an empty hash table $H$
\For{each string $s \in S$}
    \State Compute $h(s)$ using the hash function
    \If{$h(s)$ is in $H$}
        \For{each string $s' \in H[h(s)]$}
            \If{$s = s'$}
                \State \Return True
            \EndIf
        \EndFor
        \State Append $s$ to $H[h(s)]$
    \Else
        \State Insert $s$ into $H$ at $h(s)$
    \EndIf
\EndFor
\State \Return False
\end{algorithmic}
\end{algorithm}
```

Computing $h(s)$ costs $\mathcal{O}(|s|)$, so hashing every string costs $\mathcal{O}(|S|)$ in total. Each string is compared only against the strings in its own bucket; assuming a good hash function, collisions between distinct strings are rare in expectation, so the expected total comparison cost is also $\mathcal{O}(|S|)$, giving an expected runtime of $\mathcal{O}(|S|)$.

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-2c03/a7/A7
tags:
  - sfwr2c03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/a7/A7"
title: Graph search and A-star
date: 2024-03-17
---

## Problème 1.

Consider a $m \times n$ game board in which each cell has a numeric value, e.g:

$$
\begin{array}{c|c|c|c|c|c|c}
 & \textbf{A} & \textbf{B} & \textbf{C} & \textbf{D} & \textbf{E} & \textbf{F}\\
\hline
\textbf{G} & 1 & 2 & 2 & 3 & 4 & 2\\
\hline
\textbf{H} & 3 & 4 & 4 & 4 & 4 & 1\\
\hline
\textbf{I} & 1 & 4 & 1 & 3 & 1 & 4\\
\hline
\textbf{J} & 2 & 3 & 1 & 4 & 1 & 2\\
\hline
\textbf{K} & 3 & 3 & 2 & 1 & 4 & 2\\
\hline
\end{array}
$$

A player starts the game with a token in the top-left cell (the cell GA in this example) and the player finishes the game by moving to the bottom-right cell (the cell KF in this example). In each round of the game, the player can move in four directions (up, down, left, and right). The distance of each move is determined by the value of the cell. When going over the border of the game board, one ends up on the other side. For example, if the player is in the cell JB, which has value 3, then the player can move 3 steps up (reaching GB), 3 steps right (reaching JE), 3 steps down (reaching HB), and 3 steps left (wrapping around, also reaching JE). The score of a player is the total number of rounds the player needs to reach the bottom-right cell.
> [!question] P1.1
>
> Model the above problem as a graph problem: What are the nodes and edges in your graph, do the edges have weights, and what problem are you trying to answer on your graph?

The game is modelled as an unweighted directed graph, where the problem is to find the shortest path (minimum number of rounds) from the top-left cell to the bottom-right cell:

- **Nodes**: Each cell in the game board is a node.
- **Edges**: Edges are the possible moves from one cell to another. The edges are unweighted, as each move takes one round regardless of the distance.

For example, cell $JB$ is connected to $GB, JE, JE, HB$ (moving left and right both wrap around to $JE$).

> [!question] P1.2
>
> Provide an efficient algorithm that given a $m \times n$ game board, will find an optimal solution if such a solution exists. If the game board has no solution, then the algorithm should report that the game board is invalid. The runtime of the algorithm should be worst-case $\mathcal{O}(mn)$

Implement a breadth-first search to find the shortest path:

```pseudo
\begin{algorithm}
\caption{Shortest Path}
\begin{algorithmic}
\Procedure{ShortestPath}{$\text{board, m, n}$}
\State ${start} \gets (0, 0)$
\State ${end} \gets (m-1, n-1)$
\State ${queue} \gets \text{empty queue}$
\State ${visited} \gets \text{boolean array of size } m \times n \text{ initialized to } false$
\State ${distance} \gets \text{integer array of size } m \times n \text{ initialized to } \infty$
\State $visited[start] \gets true$
\State $distance[start] \gets 0$
\State $queue.\text{enqueue}(start)$
\While{$queue \text{ is not empty}$}
\State $(i, j) \gets queue.\text{dequeue}()$
\If{$(i, j) = end$}
\State \Return $distance[end]$
\EndIf
\State $value \gets board[i][j]$
\For{$(x,y) \text{ in } \{(i - value, j), (i + value, j), (i, j - value), (i, j + value)\}$}
\State $(x, y) \gets (x \bmod m, y \bmod n)$
\State $new \gets (x, y)$
\If{$visited[new] = false$}
\State $visited[new] \gets true$
\State $distance[new] \gets distance[(i, j)] + 1$
\State $queue.\text{enqueue}(new)$
\EndIf
\EndFor
\EndWhile
\State \Return $\infty$ \Comment{No solution exists: the board is invalid}
\EndProcedure
\end{algorithmic}
\end{algorithm}
```
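A runnable Python version of the search (0-indexed coordinates; the names and the direct use of the example board are my choices), which also demonstrates the wrap-around move rule:

```python
from collections import deque

def shortest_path(board):
    """BFS over the board graph; returns the minimum number of rounds,
    or None if the bottom-right cell is unreachable (invalid board)."""
    m, n = len(board), len(board[0])
    start, end = (0, 0), (m - 1, n - 1)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        i, j = queue.popleft()
        if (i, j) == end:
            return dist[(i, j)]
        v = board[i][j]
        for x, y in ((i - v, j), (i + v, j), (i, j - v), (i, j + v)):
            cell = (x % m, y % n)  # wrap around the borders
            if cell not in dist:
                dist[cell] = dist[(i, j)] + 1
                queue.append(cell)
    return None

BOARD = [  # the example board, rows G..K, columns A..F
    [1, 2, 2, 3, 4, 2],
    [3, 4, 4, 4, 4, 1],
    [1, 4, 1, 3, 1, 4],
    [2, 3, 1, 4, 1, 2],
    [3, 3, 2, 1, 4, 2],
]
print(shortest_path(BOARD))  # minimum number of rounds, or None if unreachable
```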
> [!question] P1.3
>
> Explain why your algorithm is correct and has a complexity that is worst-case $\mathcal{O}(mn)$

BFS computes shortest paths in an unweighted graph, and its worst-case runtime complexity is $\mathcal{O}(V+E)$. In this setup, $V = mn$ (one vertex per cell) and $E \leq 4mn$ (each cell has at most 4 outgoing moves: up, down, left, and right). Thus, the worst-case runtime complexity is $\mathcal{O}(mn)$.

Let $T(m,n)$ be the maximum number of operations performed by the algorithm. The bound we claim is

$$
T(m,n) \leq c \cdot mn
$$

for some constant $c$. The invariant is that BFS processes every cell at most once (the `visited` check).

Base case: the start cell is $(0, 0)$, so $T(1,1) = 1 \leq c \cdot 1 \cdot 1$.

Assume the bound holds after processing the first $k$ cells. Consider the $(k+1)^{\text{th}}$ cell $(i,j)$ to be processed. The number of operations for this cell is:

- dequeue $(i,j)$: constant $c_2$
- check whether $(i,j)$ is the end cell: constant $c_3$
- retrieve the value of the cell: constant $c_4$
- iterate through the 4 possible moves: constant $c_5$
- check whether each neighbour is visited, and mark it: constant $c_6$

The total per-cell cost is $c_2 + c_3 + c_4 + c_5 + c_6 = c_7$, a constant. Choosing $c \geq c_7$, after processing the $(k+1)^{\text{th}}$ cell the total work satisfies

$$
T \leq c \cdot mn + c_7 \leq 2c \cdot mn
$$

so the $\mathcal{O}(mn)$ bound holds for the $(k+1)^{\text{th}}$ cell as well; since at most $mn$ cells are ever processed, the total work is $\mathcal{O}(mn)$.

> [!question] P1.4
>
> Which of the two graph representation we saw in the course material did you use to store the game board? What would the complexity of your algorithm be if you used the other graph representation?

The algorithm effectively uses an adjacency list representation: the (at most 4) neighbours of a cell are computed on the fly from the cell's value, exactly as an adjacency list scan would provide them.

With an adjacency matrix, the graph would be stored as a 2D matrix of size $(mn) \times (mn)$, so the space complexity increases to $\mathcal{O}((mn)^2)$. Even though each matrix entry is accessible in constant time, finding the neighbours of a dequeued cell requires scanning an entire row of length $mn$, so BFS spends $\mathcal{O}(mn)$ per vertex and the worst-case time complexity becomes $\mathcal{O}((mn)^2)$.

---

## Problème 2.

Edge-labeled graphs are graphs in which edges have labels that represent the type of relationship that is expressed by that edge. For example, in a social network graph, the edges could be labeled `parentOf`, `friendOf`, and `worksWith`.

One way to express graph queries (that express how new information can be derived from edge-labeled graphs) is via a query graph that expresses how new relationships between source node s and target node t can be derived from existing information. The first query relates nodes that represent grandparents and their grandchildren, the second query relates nodes that represent ancestors and their descendants, and the third query relates everyone with a direct family relationship.

Let $Q$ be a graph query and $G$ be an edge-labeled graph representing a data set. The graph query evaluation problem is the problem of computing the derived relationship in $G$ expressed by $Q$. Typically, queries are small and data graphs are enormous. Hence, here we will assume that the size of a query graph is constant.

> [!question] P2.1
>
> Model the graph query evaluation problem as a graph problem: What are the nodes and edges in your graph, do the edges have weights, and what problem are you trying to answer on your graph?

The graph query evaluation can be modelled on a directed, edge-labeled graph, where:

- **Nodes**: Each node represents a data element in the graph.
- **Edges**: Each edge represents a labeled relationship between two nodes, for example, 'childOf' or 'parentOf'. The edges carry labels but no weights.

The problem is to find all the subgraphs or paths in the data graph that match the pattern specified by the query. For example, for the query `grandParentOf(s, t)`, the problem is to find all pairs of nodes $(s, t)$ in the graph such that there exists a path of length 2 from s to t, with both edges labeled "parentOf".
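To illustrate that example, a tiny Python sketch that evaluates `grandParentOf` on an edge-labeled graph stored as an adjacency list (the data and names are made up):

```python
# adjacency list: node -> list of (label, neighbour)
G = {
    "ada": [("parentOf", "ben")],
    "ben": [("parentOf", "cai"), ("friendOf", "ada")],
    "cai": [],
}

def grand_parent_of(graph):
    """All pairs (s, t) joined by two consecutive 'parentOf' edges."""
    pairs = []
    for s, edges in graph.items():
        for label1, mid in edges:
            if label1 != "parentOf":
                continue
            for label2, t in graph[mid]:
                if label2 == "parentOf":
                    pairs.append((s, t))
    return pairs

print(grand_parent_of(G))  # [('ada', 'cai')]
```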
```pseudo
\begin{algorithm}
\caption{Graph Query Evaluation}
\begin{algorithmic}
\STATE \textbf{Input:} Graph $G$, Source node $n$, Query $Q$
\STATE \textbf{Output:} All nodes $m$ such that $(n, m)$ is in the derived relationship
\STATE $R \leftarrow []$ \COMMENT{List to store result nodes}
\STATE $Visited \leftarrow \{\}$
\STATE $Queue \leftarrow \text{InitializeQueue}()$
\STATE $\text{Enqueue}(Queue, (n, 0))$ \COMMENT{Enqueue source node with depth 0}
\WHILE{$\text{Queue is not empty}$}
\STATE $(u, depth) \leftarrow \text{Dequeue}(Queue)$
\IF{$u \not\in Visited$}
\STATE $Visited \leftarrow Visited \cup \{u\}$
\FORALL{$v \text{ such that } Q(u, v) \text{ is true}$}
\STATE $R.\text{append}(v)$
\IF{$\text{Q is transitive}$}
\STATE $\text{Enqueue}(Queue, (v, depth + 1))$
\ENDIF
\ENDFOR
\FORALL{$v \text{ in Neighbors}(G, u)$}
\IF{$v \not\in Visited$}
\STATE $\text{Enqueue}(Queue, (v, depth + 1))$
\ENDIF
\ENDFOR
\ENDIF
\ENDWHILE
\RETURN $R$
\end{algorithmic}
\end{algorithm}
```

> [!question] P2.3
>
> Explain how you represented your graph $G$, why your algorithm is correct, and why your algorithm has a complexity that is worst-case $O(|G|)$.

The graph $G$ is represented as an adjacency list, where each node maintains a list of its neighbours together with the corresponding edge labels. The algorithm performs a BFS traversal from the source node $n$, which takes $O(|G| + |R|)$ time, where $|G|$ is the number of nodes and edges in the graph and $|R|$ is the number of nodes in the derived relationship. Since $Q$ has constant size and $|R|$ is at most the number of nodes in $G$, the overall time complexity is $O(|G|)$.

Correctness follows from the BFS invariant: the algorithm visits every node reachable from $n$, and for each node it examines all of its neighbours. It checks whether a node has been visited before; if not, it adds the node to the visited set and enqueues its neighbours, so each node and edge is processed exactly once. Therefore, the algorithm is correct.
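To make this concrete, a minimal Python sketch for the special case of a label-path query such as `grandParentOf` (two `parentOf` hops), assuming the graph is an adjacency list of `(label, neighbour)` pairs; the dict format and function name are illustrative:

```python
from collections import deque

def evaluate_path_query(graph, source, labels):
    """Return all nodes m such that some path from `source` to m
    spells out `labels` (a fixed, constant-length label sequence)."""
    results = set()
    # BFS states are (node, how many labels already matched)
    queue = deque([(source, 0)])
    seen = {(source, 0)}
    while queue:
        node, matched = queue.popleft()
        if matched == len(labels):
            results.add(node)
            continue
        for label, nxt in graph.get(node, []):
            if label == labels[matched] and (nxt, matched + 1) not in seen:
                seen.add((nxt, matched + 1))
                queue.append((nxt, matched + 1))
    return results

family = {"ann": [("parentOf", "bob")], "bob": [("parentOf", "cal")]}
print(evaluate_path_query(family, "ann", ["parentOf", "parentOf"]))  # {'cal'}
```

Since the query length is constant, the number of BFS states is $O(|G|)$, matching the bound above.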
---
slug: thoughts/university/twenty-three-twenty-four/sfwr-2c03/a8/A8
tags:
  - sfwr2c03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/a8/A8"
title: Trees path
date: 2024-03-22
---

## Problème 1.

Let $\mathcal{G} = (\mathcal{N}, \mathcal{E})$ be an _undirected tree_ (graph $\mathcal{G}$ is _undirected, connected_, and has $|\mathcal{N}|=|\mathcal{E}| + 1$ if we count edges $(v, w)$ and $(w, v)$ as the same edge). Let $m, n \in \mathcal{N}$. We say that the _distance_ between $m$ and $n$, denoted by `dist(m,n)`, is the length of the shortest path between $m$ and $n$.

> [!question] P1.1
>
> Show that $\text{dist}(m, n) = \text{dist}(n, m)$

An undirected tree $\mathcal{G}$ has the following properties:

1. $\mathcal{G}$ is connected and acyclic
2. A tree with $n$ nodes has $n-1$ edges
3. In a tree, there is a unique path between any two vertices.

Let $m$ and $n$ be two arbitrary vertices in $\mathcal{G}$. There exists a unique path $P$ from $m$ to $n$. Let the vertices along the path $P$ be: $m, v_{1}, v_{2}, \dots, v_k, n$. The length of the path is $k+1$, the number of edges.

Let the vertices along the path $P^{'}$ be: $n, v_k, \dots, v_2, v_1, m$. Since $\mathcal{G}$ is an undirected graph, each edge $(v_i, v_{i+1})$ in $P$ corresponds to the same edge $(v_{i+1}, v_i)$ in $P^{'}$. Therefore, the length of path $P^{'}$ is also $k+1$

`dist(m, n)` denotes the length of the shortest path from $m$ to $n$, and `dist(n, m)` denotes the length of the shortest path from $n$ to $m$. Since there is a unique path between $m$ and $n$ in the tree, both $P$ and $P^{'}$ are shortest paths between $m$ and $n$, with length $k+1$

Therefore $\text{dist}(m, n) = \text{dist}(n, m)$ $\square$

> [!question] P1.2
>
> Prove that there is a _unique_ path without repeated nodes and edges from node $m$ to node $n$ with length `dist(m,n)`

Let $m$ and $n$ be two arbitrary vertices in $\mathcal{G}$. Suppose, for contradiction, there are two different simple paths connecting $m$ and $n$:

$$
\begin{align*}
P_1 & : m, u_1, u_2, \dots, u_k, n \\
P_2 & : m, v_1, v_2, \dots, v_j, n
\end{align*}
$$

Since the paths are different and since $P_2$ is a simple path, $P_1$ must contain an edge that isn’t in $P_2$.

Let $j \ge 1$ be the first index for which the edge $\{ u_{j-1}, u_j \}$ of $P_1$ is not in $P_2$. Then $u_{j-1} = v_{j-1}$. Let $u_k$ be the first vertex in path $P_1$ after $u_{j-1}$ (that is, $k \geq j$) that is also in the path $P_2$. Then $u_k = v_l$ for some $l \geq j$.

We now have two simple paths, $Q_1: u_{j-1}, \dots, u_k$ using edges from $P_1$ and $Q_2 : v_{j-1}, \dots, v_l$ using edges from $P_2$, between $u_{j-1} = v_{j-1}$ and $u_k = v_l$. The paths $Q_1$ and $Q_2$ have no vertices or edges in common, thus the path from $u_{j-1}$ to $u_k$ along $Q_1$ followed by the path from $v_l$ back to $v_{j-1}$ along the reverse of $Q_2$ is a cycle in $\mathcal{G}$, which contradicts the assumption that $\mathcal{G}$ is a tree.

Thus, the path from $m$ to $n$ is a unique simple path $\square$

> [!question] P1.3
>
> Prove the triangle inequality $\text{dist}(m, n) \leq \text{dist}(m, x) + \text{dist}(x, n)$

Let $m,n,x$ be three arbitrary vertices in the undirected tree $\mathcal{G}$. There exists a unique simple path $P_1$ from $m$ to $x$ (with length $\text{dist}(m,x)$) and a unique simple path $P_2$ from $x$ to $n$ (with length $\text{dist}(x,n)$). Consider $P$ formed by concatenating $P_1$ and $P_2$.
This is a path from $m$ to $n$ that passes through $x$, with length $\text{dist}(m,x)+\text{dist}(x,n)$. Since $P$ is a path from $m$ to $n$, and $\text{dist}(m,n)$ denotes the length of the shortest path between $m$ and $n$, we have

$$
\text{dist}(m,n) \leq \text{length}(P) = \text{dist}(m,x) + \text{dist}(x,n)
$$

> [!question] P1.4
>
> Provide an algorithm that computes the distance $d=\text{max}_{m,n \in N} \text{dist}(m,n)$ that is the maximum distance between any pair of nodes in $\mathcal{G}$ in $\mathcal{O}(|\mathcal{N}| + \mathcal{E})$

```pseudo
\begin{algorithm}
\caption{Maximum Distance in Tree}
\begin{algorithmic}
\Procedure{MaxDistance}{$\mathcal{G}$}
\State $(u, \_) \gets$ \Call{BFS}{$\mathcal{G}, v$} \Comment{$v$ is any arbitrary node in $\mathcal{N}$; $u$ is the node farthest from $v$}
\State $(\_, d) \gets$ \Call{BFS}{$\mathcal{G}, u$}
\State \Return $d$
\EndProcedure
\Procedure{BFS}{$\mathcal{G}, s$}
\State $Q \gets$ empty queue
\State $\text{dist}[v] \gets \infty$ for all $v \in \mathcal{N}$
\State $\text{dist}[s] \gets 0$
\State $Q.\text{Enqueue}(s)$
\State $f \gets s$
\While{$Q$ is not empty}
\State $u \gets Q.\text{Dequeue}()$
\State $f \gets u$ \Comment{the last dequeued node is a farthest node from $s$}
\ForAll{$v \in \mathcal{N}$ such that $(u, v) \in \mathcal{E}$}
\If{$\text{dist}[v] = \infty$}
\State $\text{dist}[v] \gets \text{dist}[u] + 1$
\State $Q.\text{Enqueue}(v)$
\EndIf
\EndFor
\EndWhile
\State \Return $(f, \text{dist}[f])$
\EndProcedure
\end{algorithmic}
\end{algorithm}
```
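A minimal Python sketch of the same double-BFS (the classic two-pass tree-diameter trick), assuming the tree is an adjacency list `dict[node, list[node]]`; names are illustrative:

```python
from collections import deque

def bfs_farthest(adj, s):
    """Return (farthest node from s, its distance) by BFS."""
    dist = {s: 0}
    queue = deque([s])
    far = s
    while queue:
        u = queue.popleft()
        if dist[u] > dist[far]:
            far = u
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return far, dist[far]

def max_distance(adj):
    """Tree diameter: BFS from any node to find u, then BFS again from u."""
    v = next(iter(adj))  # arbitrary start node
    u, _ = bfs_farthest(adj, v)
    _, d = bfs_farthest(adj, u)
    return d
```

Two BFS passes over the tree cost $\mathcal{O}(|\mathcal{N}| + |\mathcal{E}|)$ in total.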
---
slug: thoughts/university/twenty-three-twenty-four/sfwr-2c03/a9/A9
tags:
  - sfwr2c03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/a9/A9"
title: Shortest path and series-parallel graph
date: 2024-03-25
---

## Problème 1.

A regional government wants to improve their existing infrastructure between a collection of towns $T$. Specifically, the government wants to build a minimum number of roads such that there is a route from each town to each other town. The government has been advised by a dubious consultant that in the resulting road network, the number of users of a given road is independent of the presence of alternative routes.

The regional government wants to minimise the number of roads it has to build to ensure that one can travel from one town to the other. Furthermore, the government wants to maximize the benefits of the road network by maximizing the number of users of the roads built. Hence, the government wants to only build roads that are expected to be used often. To help the construction plans, the government has asked the dubious consultant to estimate, for each pair of cities, the number of road users that would use the road between these two cities (if that road was built). Now the regional government is looking for a construction plan for a minimum number of roads connecting all towns that see the highest total usage among them.

> [!question] P1.1
>
> Model the above problem as a graph problem: What are the nodes and edges in your graph, do the edges have weights, and what problems are you trying to answer on your graph?

- **Nodes**: Each town in the set of towns $T$ is represented as a node in the graph. Denote the set of nodes as $V$
- **Edges**: Each potential road that can be built between a pair of towns. Denote the set as $E$. Each edge weight represents the estimated number of users who would use the road should it be constructed. Denote the weight of an edge between nodes $i$ and $j$ as $w(i, j)$.

The problem can then be modelled as: given a weighted undirected graph $\mathcal{G} = (V, E)$ representing towns and potential roads, find a maximum weight minimum spanning tree of $\mathcal{G}$.

Explanation:

- We need to ensure there is a route from each town to every other town while minimising the number of roads being built, thus a minimum spanning tree of the graph.
- Among all possible spanning trees, we want to find the one with the maximum total edge weight, thus a maximum weight MST.

> [!question] P1.2
>
> Provide an algorithm $\text{ConstructionPlan}$ to find the minimum number of roads to build. Explain why your algorithm is correct.

Let $w(x, y)$ be the weight of the edge between nodes $x$ and $y$.

```pseudo
\begin{algorithm}
\caption{$\text{ConstructionPlan}(G, T)$}
\begin{algorithmic}
\REQUIRE Graph $G = (T, E)$ with nodes $T$ representing towns and weighted edges $E$ representing potential roads with weights as estimated road usage
\ENSURE Set of edges $E'$ representing roads to build for maximum weight minimum spanning tree
\STATE $\text{Adj} \gets \text{new list}[|T|]$ \COMMENT{Adjacency list for graph}
\FOR{$(u, v, w) \in E$}
\STATE $\text{Adj}[u].\text{append}((v, w))$
\STATE $\text{Adj}[v].\text{append}((u, w))$
\ENDFOR
\STATE $E' \gets \emptyset$
\STATE Pick an arbitrary node $s \in T$
\STATE $V \gets \{s\}$
\STATE $Q \gets \text{new max-priority queue}$ \COMMENT{Priority queue of edges}
\FOR{$(v, w) \in \text{Adj}[s]$}
\STATE $Q.\text{insert}((s, v, w))$
\ENDFOR
\WHILE{$|V| < |T|$}
\STATE $(u, v, w) \gets Q.\text{extract\_max}()$ \COMMENT{Get max weight edge}
\IF{$v \notin V$}
\STATE $E' \gets E' \cup \{(u, v)\}$
\STATE $V \gets V \cup \{v\}$ \COMMENT{Mark $v$ as visited}
\FOR{$(x, w) \in \text{Adj}[v]$}
\IF{$x \notin V$}
\STATE $Q.\text{insert}((v, x, w))$
\ENDIF
\ENDFOR
\ENDIF
\ENDWHILE
\RETURN $E'$
\end{algorithmic}
\end{algorithm}
```

_This is Prim’s algorithm for finding MSTs, with the minimum weight replaced by the maximum weight to fit the problem description in P1.1_

==**Correctness**:==

Invariant: $V \text{ contains all visited nodes}, E' \text{ contains edges of a maximum weight spanning tree over nodes in } V$

- At L4, the invariant holds since $V = \{s\}$ and $E' = \emptyset$
- Per iteration, we extend $E'$ with the maximum weight edge $(u, v)$ from any visited node $u \in V$ to any unvisited node $v \in T \setminus V$ (L6-14), then add $v$ to $V$ (L15), maintaining the invariant because:
  - $V_{\text{new}} = V \cup \{v\}$
  - $E'_{\text{new}} = E' \cup \{(u, v)\}$ is a maximum weight spanning tree over $V_{\text{new}}$ because $(u, v)$ is the maximum weight edge connecting $V$ to $T \setminus V$.

The algorithm never creates a cycle in $E'$ since we only add edges from visited nodes to unvisited nodes, which cannot create a cycle.

==**Bound function**:==

Let $w(E')$ be the total weight of edges in $E'$. Per iteration, $w(E')$ is the maximum possible weight of any spanning tree over the nodes in $V$. This holds initially when $V=\{s\}$ and $w(E') = 0$. Per iteration, we add the maximum weight edge $(u, v)$ from $V$ to $T \setminus V$ to $E'$. The bound function is maintained because:

- $w(E'_{\text{new}}) = w(E') + w(u, v) \geq w(E')$
- Any other spanning tree $T'$ over $V \cup \{v\}$ must contain an edge $(x,y)$ from $V$ to $\{v\}$, and $w(x,y) \leq w(u,v)$. Therefore, $w(T') \leq w(E' \cup \{(u,v)\})$

The loop terminates when $|V| = |T|$, i.e. all nodes are visited. Thus the algorithm is correct. $\square$
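A minimal Python sketch of this maximum-weight variant of Prim’s algorithm, assuming towns are hashable names and using the standard-library min-heap `heapq` with negated weights to play the role of the max-priority queue:

```python
import heapq
from collections import defaultdict

def construction_plan(towns, edges):
    """edges: iterable of (u, v, w) triples; returns a set of roads forming
    a maximum weight spanning tree, or None if the graph is not connected."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    start = next(iter(towns))
    visited = {start}
    plan = set()
    # negate weights so heapq's min-heap pops the maximum weight edge first
    heap = [(-w, start, v) for v, w in adj[start]]
    heapq.heapify(heap)
    while heap and len(visited) < len(towns):
        neg_w, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        plan.add((u, v))
        for x, w in adj[v]:
            if x not in visited:
                heapq.heappush(heap, (-w, v, x))
    return plan if len(visited) == len(towns) else None
```

On a complete graph this performs $O(|T|^2)$ heap operations, matching the $\mathcal{O}(|T|^2\log|T|)$ bound discussed in P1.3.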
> [!question] P1.3
>
> Explain which graph representation you used for your algorithm and what the complexity of your algorithm is using this graph representation.

The algorithm uses the adjacency list representation of the graph $G=(T,E)$: for each node $u \in T$, we maintain a list $\text{Adj}[u]$ containing the node $v$ and weight $w(u,v)$ for every edge $(u,v) \in E$.

The time complexity of the algorithm is $\mathcal{O}(|T|^2\log|T| + |E|)$:

1. Initialising the adjacency list takes $\mathcal{O}(|E|)$ to populate
2. The while loop at L4 runs for $|T|$ iterations
3. Inside the loop, extracting the maximum weight edge from $Q$ takes $O(\log|T|)$, and inserting the up to $|T|$ edges of the newly added node takes $O(|T|\log|T|)$; therefore each iteration of the while loop takes $O(|T|\log|T|)$
4. Adding the extracted edge to $E'$ takes $O(1)$

Thus, the overall time complexity is $\mathcal{O}(|T|^2\log|T| + |E|)$. If the graph is complete, then $|E| = \frac{|T|(|T|-1)}{2} = O(|T|^2)$, and the time complexity is $\mathcal{O}(|T|^2\log|T|)$

The space complexity is $O(|T| + |E|)$

> [!question] P1.4
>
> What is the worst-case complexity of your solution if you use the other graph representation? Explain your answer

The worst-case complexity of the algorithm using the adjacency matrix representation is $\mathcal{O}(|T|^2\log|T|)$.

With the adjacency matrix representation, we use a $|T| \times |T|$ matrix $\text{Adj}$ to represent the graph, where $\text{Adj}[u][v]=w$ for an edge $(u,v)$ with weight $w$. Initialising the matrix takes $O(|T|^2)$ time. Adding the edges from the starting node $s$ to the priority queue $Q$ takes $O(|T|\log|T|)$ time, as we need to scan the row $\text{Adj}[s]$, which has $|T|$ elements.

Inside the while loop, the for loop (L18-23) that iterates over the edges of the newly visited node $v$ now takes $O(|T|\log|T|)$ time, as we need to scan a row with $|T|$ entries, and for each unvisited node we insert the edge into $Q$, which takes $O(\log|T|)$ time. Therefore, the while loop takes $O(|T| \cdot (|T|\log |T|)) = O(|T|^2\log|T|)$

Thus, the total time complexity would be $O(|T|^2 + |T|^2\log|T|) = O(|T|^2\log|T|)$, whereas the space complexity is $O(|T|^2)$. The adjacency list would be more efficient for sparse graphs where $|E| \ll |T|^2$.

---

## Problème 2.

A directed graph $\mathcal{G} = (\mathcal{N}, \mathcal{E}, s, t)$ with $s \in \mathcal{N}$ the _source_ and $t \in \mathcal{N}$ the _target_ is a series-parallel graph if it can be constructed inductively using the following rules:

1. An _elementary_ series-parallel graph is a single edge from $s$ to $t$
2. The _series_-construction. Let $\mathcal{G}_1=(\mathcal{N}_1, \mathcal{E}_1, s_1, t_1)$ and $\mathcal{G}_2=(\mathcal{N}_2, \mathcal{E}_2, s_2, t_2)$ be two series-parallel graphs with the node $c = t_1 = s_2$ in common ($\mathcal{N}_1 \cap \mathcal{N}_2 = \{c\}$). The graph $\mathcal{G}=(\mathcal{N}_1 \cup \mathcal{N}_2, \mathcal{E}_1 \cup \mathcal{E}_2, s_1, t_2)$ is a series-parallel graph
3. The _parallel_-construction. Let $\mathcal{G}_1=(\mathcal{N}_1, \mathcal{E}_1, s_1, t_1)$ and $\mathcal{G}_2=(\mathcal{N}_2, \mathcal{E}_2, s_2, t_2)$ be two series-parallel graphs without nodes in common ($\mathcal{N}_1 \cap \mathcal{N}_2 = \emptyset$) and let $s,t \notin (\mathcal{N}_1 \cup \mathcal{N}_2)$ be two fresh nodes. The graph $\mathcal{G}=(\mathcal{N}_1 \cup \mathcal{N}_2 \cup \{s,t\},\mathcal{E}_1 \cup \mathcal{E}_2 \cup \{(s,s_{1}), (s,s_{2}), (t_{1},t), (t_{2},t)\}, s, t)$ is a series-parallel graph.
Now assume we have a series-parallel graph $\mathcal{G} = (\mathcal{N}, \mathcal{E}, s, t)$ with an edge-weight function $weight: \mathcal{E} \to \mathbb{Z}$ (here, $\mathbb{Z}$ are the integers, which includes negative numbers). We note that series-parallel graphs are relatively simple structures.

> [!question] P2.1
>
> Write an algorithm to compute the single-source shortest paths from the source $s$ to all nodes $n \in \mathcal{N}$ in $\mathcal{O}(|\mathcal{N}|+|\mathcal{E}|)$ time

```pseudo
\begin{algorithm}
\caption{ShortestPathsSP($\mathcal{G} = (\mathcal{N}, \mathcal{E}, s, t)$, $weight$)}
\begin{algorithmic}
\State $dist \gets \emptyset$ \Comment{Initialize empty dictionary for distances}
\If{$\mathcal{G}$ is an elementary graph (single edge $(s,t)$)}
\State $dist[s] \gets 0$
\State $dist[t] \gets weight(s,t)$
\ElsIf{$\mathcal{G}$ is a series composition of $\mathcal{G}_1$ and $\mathcal{G}_2$}
\State $dist_1 \gets ShortestPathsSP(\mathcal{G}_1, weight)$
\State $dist_2 \gets ShortestPathsSP(\mathcal{G}_2, weight)$
\State $dist \gets dist_1 \cup dist_2$ \Comment{Combine distances}
\For{each node $n$ in $\mathcal{G}_2$}
\State $dist[n] \gets dist[n] + dist_1[t_1]$
\EndFor
\ElsIf{$\mathcal{G}$ is a parallel composition of $\mathcal{G}_1$ and $\mathcal{G}_2$}
\State $dist_1 \gets ShortestPathsSP(\mathcal{G}_1, weight)$
\State $dist_2 \gets ShortestPathsSP(\mathcal{G}_2, weight)$
\State $dist \gets dist_1 \cup dist_2$ \Comment{Combine distances}
\State $dist[s] \gets 0$
\State $dist[t] \gets \min(dist_1[t_1], dist_2[t_2])$
\EndIf
\State \Return $dist$
\end{algorithmic}
\end{algorithm}
```

> [!question] P2.2
>
> Explain why your algorithm is correct

Base case: For an elementary graph with one edge $(s,t)$, the algorithm sets $dist[s]=0$ and $dist[t]=weight(s,t)$, which is the shortest path from $s$ to $t$.

If $\mathcal{G}$ is a series composition: let $\mathcal{G}_1$ and $\mathcal{G}_2$ be the two series-parallel graphs with common node $c$.

- It recursively computes the shortest paths from the sources to all nodes in $\mathcal{G}_1$ and $\mathcal{G}_2$, storing them in $dist_1$ and $dist_2$ respectively.
- For any node $n \in \mathcal{G}_2$, the shortest path from $s$ must go through the target $t_1$ of $\mathcal{G}_1$. Thus, we update $dist[n] = dist[n] + dist_1[t_1]$.

If $\mathcal{G}$ is a parallel composition: let $\mathcal{G}_1$ and $\mathcal{G}_2$ be the two series-parallel graphs with new source $s$ and target $t$

- It recursively computes the shortest paths to all nodes in $\mathcal{G}_1$ and $\mathcal{G}_2$, storing them in $dist_1$ and $dist_2$ respectively.
- The new source $s$ has distance 0, correctly set by $dist[s]=0$
- The new target $t$ has distance $\min(dist_1[t_1], dist_2[t_2])$, which is the shortest path from $s$ to $t$.

In both cases, it combines the distances using the union operation, and the algorithm is correct. $\square$
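A minimal Python sketch of this recursion, assuming the series-parallel graph is handed to us as its decomposition tree (nested tuples, an illustrative format) and that the four connecting edges added by the parallel construction have weight 0, as the pseudocode above implicitly does:

```python
# ("edge", s, t, w)          : elementary graph, one edge s -> t of weight w
# ("series", g1, g2)         : target of g1 is the shared node c = source of g2
# ("parallel", s, t, g1, g2) : fresh source s and target t
def shortest_paths_sp(g):
    """Return (source, target, dist) where dist maps every node to its
    shortest distance from the source; negative weights are fine."""
    if g[0] == "edge":
        _, s, t, w = g
        return s, t, {s: 0, t: w}
    if g[0] == "series":
        s1, c, d1 = shortest_paths_sp(g[1])
        _, t2, d2 = shortest_paths_sp(g[2])  # d2 is measured from c
        dist = dict(d1)
        for n, dn in d2.items():             # every path into G2 passes through c
            dist[n] = dn + d1[c]
        return s1, t2, dist
    if g[0] == "parallel":
        _, s, t, g1, g2 = g
        s1, t1, d1 = shortest_paths_sp(g1)
        s2, t2, d2 = shortest_paths_sp(g2)
        dist = {**d1, **d2}                  # node sets are disjoint
        dist[s] = 0                          # 0-weight connectors assumed
        dist[t] = min(d1[t1], d2[t2])
        return s, t, dist

g = ("parallel", "s", "t",
     ("edge", "a", "b", 5),
     ("series", ("edge", "c", "d", 2), ("edge", "d", "e", -1)))
# shortest_paths_sp(g)[2]["t"] == 1, via the series branch: 2 + (-1)
```

Each decomposition node is processed once, which is the $\mathcal{O}(|\mathcal{N}|+|\mathcal{E}|)$ bound argued in P2.3 below.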
> [!question] P2.3
>
> Explain which graph representation you used for your algorithm and why your algorithm has the stated complexity

The algorithm uses the adjacency list representation of the graph $\mathcal{G} = (\mathcal{N}, \mathcal{E}, s, t)$: each node maintains a list of its outgoing edges. Since each node in a series-parallel graph has at most 2 outgoing edges, the adjacency list representation is efficient, and finding the outgoing edges of a node during the recursive traversal takes constant time.

For the base case, the algorithm takes $O(1)$ time to set the distances for the elementary graph. In both recursive cases, it calls itself on $\mathcal{G}_1$ and $\mathcal{G}_2$, taking $O(|\mathcal{N}_1| + |\mathcal{E}_1|)$ and $O(|\mathcal{N}_2| + |\mathcal{E}_2|)$ time respectively, then updates the distance for each node in $\mathcal{G}_2$ in $O(|\mathcal{N}_2|)$ time. Thus, the time complexity is $O(|\mathcal{N}_1| + |\mathcal{E}_1| + |\mathcal{N}_2| + |\mathcal{E}_2|)$.

Since each node is visited exactly once during the traversal, the time complexity is $O(|\mathcal{N}| + |\mathcal{E}|)$ (since $|\mathcal{N}_1| + |\mathcal{N}_2| = |\mathcal{N}|$ and $|\mathcal{E}_1| + |\mathcal{E}_2| = |\mathcal{E}|$)

> [!question] P2.4
>
> What is the worst-case complexity of your solution if you use the other graph representation? Explain your answer

If the adjacency matrix representation is used, the worst-case complexity of the algorithm is $O(|\mathcal{N}|^2)$. It takes $O(|\mathcal{N}|^2)$ time to initialise the adjacency matrix. The analysis of the algorithm remains the same; the difference is that finding the outgoing edges of a node now takes $O(|\mathcal{N}|)$ time in the worst case, as it has to traverse the entire row of the matrix corresponding to that node. Since this operation is performed for each node during the traversal, the total complexity would be $O(|\mathcal{N}|^2)$.

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-2c03/index
tags:
  - university
  - sfwr2c03
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2c03/index"
title: Data Structures and Algorithm
date: 2024-01-08
---

Pretty self-explanatory: a collection of notes for the Data Structures and Algorithms course.

Prof. [Jelle Hellings](https://jhellings.nl) or [mail](mailto:jhellings@mcmaster.ca) and [link](https://avenue.cllmcmaster.ca/d2l/home/598208)

Due date per assignment, 5.25% each:

1. Jan 21st
2. Jan 28th
3. Feb 4th
4. Feb 11th
5. Feb 18th
6. Feb 26th
7. March 4th
8. March 11th
9. March 18th
10. March 25th
11. April 3rd
12. April 10th

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-2fa3/DFA
tags:
  - sfwr2fa3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/DFA"
title: Deterministic Finite Automata
date: 2024-01-12
---

## definition

$$
\Sigma^{*}: \text{set of all strings based off }\Sigma
$$

$$
\begin{align*}
\text{DFA}\quad M &= (Q, \Sigma, \delta, s, F) \\
Q &: \text{finite set of states} \\
\Sigma &: \text{finite alphabet} \\
\delta &: Q \times \Sigma \rightarrow Q \\
s &: \text{start state},\quad s\in{Q} \\
F &: \text{set of final states},\quad F\subseteq{Q} \\
\end{align*}
$$

### examples

Ex: $\Sigma = \{a, b\}$. Create a DFA $M$ that accepts all strings that contain at least three $a$’s.
$$
\begin{align*}
Q &= \{s_1, s_2, s_3, s_4\} \\
s &= s_1 \\
F &= \{s_4\} \\
\end{align*}
$$

Transition function:

$$
\begin{align*}
\delta(s_1, a) = s_2 \\
\delta(s_1, b) = s_1 \\
\delta(s_2, a) = s_3 \\
\delta(s_2, b) = s_2 \\
\delta(s_3, a) = s_4 \\
\delta(s_3, b) = s_3 \\
\delta(s_4, a) = \delta(s_4, b) = s_4 \\
\end{align*}
$$

[representation](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/DFA/../../../../../../../../thoughts/representations):

```mermaid
stateDiagram-v2
direction LR
classDef accepting fill:#4CAF50,stroke:#333,stroke-width:2px
classDef start fill:#FFD700,stroke:#333,stroke-width:2px
s1: s1
s2: s2
s3: s3
s4: s4
[*] --> s1
s1 --> s1: b
s1 --> s2: a
s2 --> s2: b
s2 --> s3: a
s3 --> s3: b
s3 --> s4: a
s4 --> s4: a,b
class s4 accepting
class s1 start
```

> if the run ends in a final state then accept, otherwise reject the string

## language.

[Language](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/DFA/../../../../../../../../thoughts/Language) of machine $\mathcal{L}(M)$ is the set of strings $M$ accepts, such that $\mathcal{L}(M) \subseteq \Sigma^{*}$

$$
\mathcal{L}(M) = \{w \in \Sigma^{*} \mid \hat{\delta}(s, w) \in F\}
$$

> Assumption: $\Sigma = \{a, b\}$

> [!math] Questions
>
> Find DFA $M$ such that $\mathcal{L}(M)=$ the following
>
> 1. $\{ xab \mid x \in \Sigma^{*} \}$
> 2. $\{ x \mid |x| \% 2 = 0 \}$
> 3. $\{ x \mid x = 2^n\space ,\space n \in \mathbb{N} \}$, $\Sigma = \{0, 1\}$
> 4. $\{ x \mid "abc" \in x \}$, $\Sigma = \{a, b, c\}$
> 5. $\{ x \mid \text{a is the second last char of x} \}$
> 6. $\{ a^n \cdot b^n \mid n \ge 0 \}$
> 7. $\{ x \mid \text{a is the fifth last char of x} \}$

1.

```mermaid
stateDiagram-v2
direction LR
classDef accepting fill:#4CAF50,stroke:#333,stroke-width:2px
classDef start fill:#FFD700,stroke:#333,stroke-width:2px
s0: q0
s1: q1
s2: q2
[*] --> s0
s0 --> s0: b
s0 --> s1: a
s1 --> s1: a
s1 --> s2: b
s2 --> s1: a
s2 --> s0: b
class s2 accepting
class s0 start
```

2.

```mermaid
stateDiagram-v2
direction LR
classDef accepting fill:#4CAF50,stroke:#333,stroke-width:2px
classDef start fill:#FFD700,stroke:#333,stroke-width:2px
s0: q0
s1: q1
[*] --> s0
s0 --> s1: a,b
s1 --> s0: a,b
class s0 accepting
class s0 start
```

3.

```mermaid
stateDiagram-v2
direction LR
classDef accepting fill:#4CAF50,stroke:#333,stroke-width:2px
classDef start fill:#FFD700,stroke:#333,stroke-width:2px
classDef dead fill:#ff6b6b,stroke:#333,stroke-width:2px
s0: q0
s1: q1
s2: q2
s3: dead
[*] --> s0
s0 --> s3: 0
s0 --> s1: 1
s1 --> s2: 0
s1 --> s3: 1
s2 --> s2: 0
s2 --> s3: 1
s3 --> s3: 0,1
class s1,s2 accepting
class s0 start
class s3 dead
```

4.

```mermaid
stateDiagram-v2
direction LR
classDef accepting fill:#4CAF50,stroke:#333,stroke-width:2px
classDef start fill:#FFD700,stroke:#333,stroke-width:2px
s0: q0
s1: q1
s2: q2
s3: q3
[*] --> s0
s0 --> s0: b,c
s0 --> s1: a
s1 --> s1: a
s1 --> s0: c
s1 --> s2: b
s2 --> s1: a
s2 --> s0: b
s2 --> s3: c
s3 --> s3: a,b,c
class s3 accepting
class s0 start
```

5.

```mermaid
stateDiagram-v2
direction LR
classDef accepting fill:#4CAF50,stroke:#333,stroke-width:2px
classDef start fill:#FFD700,stroke:#333,stroke-width:2px
s0: q0
s1: q1
s2: q2
s3: q3
[*] --> s0
s0 --> s0: b
s0 --> s1: a
s1 --> s3: a
s1 --> s2: b
s2 --> s1: a
s2 --> s0: b
s3 --> s3: a
s3 --> s2: b
class s2,s3 accepting
class s0 start
```

6. non-regular. _proof using Pumping Lemma_

- assume the language is regular, let $p$ be the pumping length.
- Consider the string $s = a^p \cdot b^p$
- any way of dividing $s=xyz$ where $\mid xy \mid \le p$ and $\mid y \mid \ge 1$ results in $y$ containing only $a$’s
- the pumped string $xy^2z$ then has more $a$’s than $b$’s, so it wouldn’t be in the language q.e.d

7.

```mermaid
stateDiagram-v2
direction LR
classDef accepting fill:#4CAF50,stroke:#333,stroke-width:2px
classDef start fill:#FFD700,stroke:#333,stroke-width:2px
s0: q0
s1: q1
s2: q2
s3: q3
s4: q4
s5: q5
[*] --> s0
s0 --> s0: a,b
s0 --> s1: a
s1 --> s2: a,b
s2 --> s3: a,b
s3 --> s4: a,b
s4 --> s5: a,b
s5 --> s0: a,b
class s5 accepting
class s0 start
```

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-2fa3/Finals
tags:
  - sfwr2fa3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/Finals"
title: Crib.
date: 2024-04-15
---

$$
\begin{align*} \neg (\exists x \mid R:P(x)) &\equiv \forall x \mid R:\neg P(x) \\ \neg (\forall x \mid R:P(x)) &\equiv \exists x \mid R:\neg P(x) \end{align*}
$$

> [!tip] regular language
>
> $$
> \begin{align*} \hat{\delta}(q, \epsilon) &= q \\ \hat{\delta}(q, xa) &= \delta(\hat{\delta}(q, x), a) \end{align*}
> $$
> All finite languages are regular, but not all regular languages are finite

> [!tip] Pumping Lemma
>
> $$
> \text{L is regular} \implies (\exists k \mid k \geq 0: (\forall x,y,z \mid xyz \in L \land |y| \geq k : (\exists u,v,w \mid y=uvw \land |v| \geq 1: (\forall i \mid i \geq 0: xuv^iwz \in L))))
> $$
>
> - demon picks $k$
> - you pick $x,y,z \leftarrow xyz \in L \land |y| \geq k$
> - demon picks $u,v,w \leftarrow uvw = y \land |v| \geq 1$
> - you pick an $i \geq 0$, and show $xuv^iwz \notin L$

> [!note] context-free grammar
>
> $$
> \begin{align*} \mathbb{G} = (N, \Sigma, P, S) &\quad N: \text{non-terminal symbols} \\ &\quad \Sigma: \text{terminal symbols s.t. } \Sigma \cap N = \emptyset \\ &\quad P: \text{production rules, a finite subset of } N \times (N \cup \Sigma)^{*} \\ &\quad S: \text{start symbol} \in N \end{align*}
> $$

> [!note] Properties
>
> - $\exists \text{ CFG } G \mid L(G) = L \iff L \text{ is a context-free language}$
> - L is regular $\implies$ L is context-free
> - $L_{1}, L_{2} \text{ are context-free} \implies L_{1} \cup L_{2} \text{ is context-free}$
> - context-free languages are not closed under complement, and not closed under intersection.
>   ($L_1 \cap L_2$ and $\sim L_1$ need not be context-free)
>
> We know that $\{a^nb^nc^n\mid n \geq 0\}$ is not CF

> [!tip] Pushdown Automata PDA
>
> $$
> \begin{align*} \text{PDA} = (Q, \Sigma, \Gamma, \delta, s, \bot, F) &\quad Q: \text{Finite set of states} \\ &\quad \Sigma: \text{Finite input alphabet} \\ &\quad \Gamma: \text{Finite stack alphabet} \\ &\quad \delta \subset (Q \times (\Sigma \cup \{\epsilon\}) \times \Gamma) \times (Q \times \Gamma^{*}) \\ &\quad s: \text{start state} \in Q \\ &\quad \bot: \text{initial stack symbol} \in \Gamma \\ &\quad F: \text{final states}, F \subseteq Q \end{align*}
> $$

> [!note] Properties
>
> $\exists \text{ PDA } M \text{ with } \mathcal{L}(M) = L \iff L \text{ is context-free}$

> [!tip] Turing machine
>
> $$
> \begin{align*} \text{TM} = (Q, \Sigma, \Gamma, \delta, s, q_{\text{accept}}, q_{\text{reject}}, \square) &\quad Q: \text{Finite set of states} \\ &\quad \Sigma: \text{Finite input alphabet} \\ &\quad \Gamma: \text{Finite tape alphabet} \\ &\quad \delta: (Q \times \Gamma) \rightarrow Q \times \Gamma \times \{L, R\} \\ &\quad s: \text{start state} \in Q \\ &\quad q_{\text{accept}}: \text{accept state} \in Q \\ &\quad q_{\text{reject}}: \text{reject state} \in Q \\ &\quad \square: \text{blank symbol} \in \Gamma \end{align*}
> $$
>
> Transition function: $\delta(q, x) = (p, y, D)$: when in state $q$ scanning symbol $x$, write $y$ on the tape cell, move the head in direction $D$, and enter state $p$
>
> transitioning to $q_{\text{accept}}$ or $q_{\text{reject}}$ halts the machine, accepting or rejecting respectively.

> [!note] Properties
>
> - A TM is “total” iff it halts on all inputs
> - $\mathcal{L}(M) = L \iff (\forall s \mid s \in L \iff M \text{ accepts s})$
> - L is recognizable: $\iff \exists \text{ TM M s.t } \mathcal{L}(M)=L$
> - L is decidable: $\iff \exists \text{ total TM M s.t } \mathcal{L}(M)=L \land \forall s \in \Sigma^{*} \text{ M halts on s}$
> - $\text{L is decidable} \implies \text{L is recognizable}$

> [!tip] Church-Turing Thesis
>
> > Conjecture 1: All reasonable models of computation are equivalent:
> >
> > - perfect memory
> > - finite amount of time
> >
> > Conjecture 2: Anything a modern digital computer can do, a Turing machine can do.
>
> Equivalent models
>
> - TMs with multiple tapes.
> - NTMs.
> - PDA with two stacks.
> [!note] Finite Automata from Church-Turing Thesis
>
> Finite automata can be encoded as strings:
>
> Let $0^n10^m10^j10^{k_1}1\ldots 10^{k_n}$ be a DFA with $n$ states, $m$ input characters, $j$ final states, $k_1\ldots k_n$ transitions
>
> $$
> \begin{align*} A_{\text{DFA}} &= \{M\#w \mid M \text{ is a DFA which accepts } w\} \quad (1) \\ A_{\text{TM}} &= \{M\#w \mid M \text{ is a TM which accepts } w\} \quad (2) \end{align*}
> $$
>
> M is a “recognizer” $\implies M(x) = \begin{cases} \text{accept} & \text{if } x \in L \\ \text{reject or loop} & \text{if } x \notin L \end{cases}$
>
> M is a “decider” $\implies M(x) = \begin{cases} \text{accept} & \text{if } x \in L \\ \text{reject} & \text{if } x \notin L \end{cases}$

> [!tip] Decidability and Recognizability
>
> (1) is decidable: create a TM $M^{'}$ such that $M^{'}(M\#w)$ simulates the DFA $M$ on $w$. The simulation always halts, therefore $M'$ is total, and $\mathcal{L}(M') = A_{\text{DFA}}$:
> $M\#w \in \mathcal{L}(M^{'}) \iff M \text{ accepts } w \iff M\#w \in A_{\text{DFA}}$
>
> (2) is recognizable: create a TM $M^{'}$ such that $M^{'}(M\#w)$ runs $M$ on $w$
> $M\#w \in \mathcal{L}(M^{'}) \iff M \text{ accepts } w \iff M\#w \in A_{\text{TM}} \implies \mathcal{L}(M^{'}) = A_{\text{TM}}$
> Note that all regular languages are decidable languages

> [!tip] Proof that $A_{\text{TM}}$ is undecidable
>
> Assume $A_{\text{TM}}$ is decidable
>
> $\exists$ a decider $D$ for $A_{\text{TM}}$.
>
> Let $P$ be another TM such that $P(M)$: call $D$ on $M\#M$ and flip the answer
>
> Paradox machine: $P$ never loops, and $P(M) = \begin{cases} \text{accept} & \text{if } D \text{ rejects } M\#M \\ \text{reject} & \text{if } D \text{ accepts } M\#M \end{cases}$, so $P(P)$ accepts $\iff P(P)$ rejects, a contradiction

> [!tip] Countability
>
> - A set $S$ is countably infinite if $\exists$ a bijection $f: S \rightarrow \mathbb{N}$
> - A set $S$ is uncountable if there is **NO** injection from $S$ to $\mathbb{N}$
>
> Theorem:
>
> - The set of all PDAs is countably infinite
> - $\Sigma^{*}$ is countably infinite (list all strings in order of length; every string appears after finitely many steps)
> - The set of all TMs is countably infinite (every TM can be encoded as a finite string over $\Sigma = \{0,1\}$); so are REC, DEC, CF, REG
> - The set of all languages is uncountable.

> [!tip] Diagonalization and Problems
>
> The set of all languages is uncountable, so the set of unrecognizable languages is uncountable.
>
> Proof: a language over $\Sigma = \{0,1\}$ can be encoded as an infinite binary string. List the recognizable languages $L_{1}, L_{2}, \dots$ (the set of TMs is countable) as rows of a table and build a new language by taking the negation of the diagonal; the resulting language differs from every $L_i$, hence is unrecognizable.
>
> Theorem
>
> - L is decidable $\iff$ L and $\sim L$ are both recognizable
>
> > Proof: $L$ is decidable $\iff \sim L$ is decidable. $L$ is decidable $\implies$ L is recognizable
> >
> > Let $R_L, R_{\sim L}$ be recognizers. Create TM $M$ that runs $R_L$ and $R_{\sim L}$ on $x$ concurrently: if $R_L$ accepts $\implies$ accept, if $R_{\sim L}$ accepts $\implies$ reject.
> >
> > Since $x \in L \implies R_L(x) \text{ halts}$ and $x \notin L \implies R_{\sim L}(x) \text{ halts}$, $M$ always halts, so $M$ decides $L$.

> [!tip] Reduction on universal TMs
>
> $\sim A_{\text{TM}} = \{M\#w \mid M \text{ does not accept w}\}$, which implies $\sim A_{\text{TM}}$ is unrecognizable
>
> HP is undecidable, but recognizable.
>
> $$
> \text{Halting problem} = \{M\#w \mid M \text{ halts on w} \}
> $$
>
> Proof: Assume HP is decidable.
$\exists D_{\text{HP}}$ with $D_{\text{HP}}(M\#w) = \begin{cases} \text{accept} & \text{if M halts on w} \\ \text{reject} & \text{if M loops on w} \end{cases}$

> [!question]- Build a TM
>
> ```prolog
> calls $D_{HP}$ on $M\#w$:
>   accepts:
>     - run $M$ on $w$
>     - accept -> accept
>     - reject -> reject
>   reject: reject
> ```
>
> Then $M^{'}$ is total. Since $M\#w \in \mathcal{L}(M^{'}) \iff \text{M accepts w} \iff M\#w \in A_{\text{TM}}$, we have $\mathcal{L}(M^{'}) = A_{\text{TM}}$, which means $M^{'}$ is a decider for $A_{\text{TM}}$ (a contradiction) $\square$

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-2fa3/NFA
tags:
  - sfwr2fa3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/NFA"
title: NFA
date: 2024-01-30
---

## definition

$$
\Sigma^{*}: \text{set of all strings based off }\Sigma
$$

$$
\begin{align*}
\text{NFA}\quad M &= (Q, \Sigma, \Delta, S, F) \\
Q &: \text{finite set of states} \\
\Sigma &: \text{finite alphabet} \\
\Delta &: Q \times \Sigma \rightarrow P(Q) \\
S &: \text{Start states},\quad S \subseteq Q \\
F &: \text{Final states},\quad F \subseteq Q \\
\end{align*}
$$

## examples

1. $\mathcal{L}(M) = \{ abxba \mid x \in \Sigma^{*}\}$

```mermaid
stateDiagram-v2
direction LR
classDef accepting fill:#4CAF50,stroke:#333,stroke-width:2px
classDef start fill:#FFD700,stroke:#333,stroke-width:2px
s0: q0
s1: q1
s2: q2
s3: q3
s4: q4
s5: q5
[*] --> s0
s0 --> s1: a
s1 --> s2: b
s2 --> s2: Σ
s2 --> s3: b
s3 --> s4: a
s4 --> [*]
class s4 accepting
class s0 start
```

2. $\mathcal{L}(M) = \{ yx \mid (x = 00 \lor x = 11) \land y \in \Sigma^{*}\}$

```mermaid
stateDiagram-v2
direction LR
classDef accepting fill:#4CAF50,stroke:#333,stroke-width:2px
classDef start fill:#FFD700,stroke:#333,stroke-width:2px
s0: q0
s1: q1
s2: q2
s3: q3
s4: q4
[*] --> s0
s0 --> s0: 0,1
s0 --> s1: 0
s0 --> s3: 1
s1 --> s2: 0
s3 --> s4: 1
s2 --> [*]
s4 --> [*]
class s2,s4 accepting
class s0 start
```

## epsilon transition

Given the following $M$

```mermaid
stateDiagram-v2
direction LR
[*] --> s1
s1 --> s2: 1
s2 --> s3: 1
s3 --> s4: ε
s1 --> s4: ε
s1 --> s1: 0
s3 --> s3: 1
```

$\mathcal{L}(M) = \{0^n1^m \mid n \geq 0, m \neq 1\}$

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-2fa3/Nous
tags:
  - sfwr2fa3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/Nous"
title: Automata
date: 2024-03-08
---

Book: [pdf](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2fa3/Automata-and-Computability.pdf)

Q1: T/F; if F, explain why.

Q4: regular expressions, 5 separate questions

Q2/Q3: DFAs and NFAs

- Product construction: $\cap$ $\cup$ of DFAs
- Subset construction: NFA to DFA
- Quotient construction: State minimization

### Set theory

Complement in $\Sigma^{*}$:

$$
\overline{L} = \Sigma^{*} - L
$$

associative:

$$
\begin{align*}
(A \cup B) \cup C &= A \cup (B \cup C), \\
(A \cap B) \cap C &= A \cap (B \cap C), \\
(AB)C &= A(BC).
\end{align*}
$$

commutative:

$$
\begin{align*} A \cup B &= B \cup A \\ A \cap B &= B \cap A \end{align*}
$$

> [!tip] null set
>
> null set $\emptyset$ is the identity for $\cup$ and annihilator for set concatenation
>
> $A \cup \emptyset = A$ and $A \emptyset = \emptyset A = \emptyset$

set $\{\epsilon\}$ is an identity for set concatenation $\{\epsilon\}A = A\{\epsilon\} = A$

Set union and intersection distribute over each other

$$
\begin{align*} A \cup (B \cap C) &= (A \cup B) \cap (A \cup C) \\ A \cap (B \cup C) &= (A \cap B) \cup (A \cap C) \end{align*}
$$

Set concatenation distributes over union

$$
\begin{align*} A(B \cup C) &= AB \cup AC \\ (A \cup B)C &= AC \cup BC \end{align*}
$$

## DFA

## definition

$$
\Sigma^{*}: \text{set of all strings based off }\Sigma
$$

$$
\begin{align*}
\text{DFA}\quad M &= (Q, \Sigma, \delta, s, F) \\
Q &: \text{finite set of states} \\
\Sigma &: \text{finite alphabet} \\
\delta &: Q \times \Sigma \rightarrow Q \\
s &: \text{start state},\quad s\in{Q} \\
F &: \text{set of final states},\quad F\subseteq{Q} \\
\end{align*}
$$

### examples

Ex: $\Sigma = \{a, b\}$. Create a DFA $M$ that accepts all strings that contain at least three $a$’s.

$$
\begin{align*}
Q &= \{s_1, s_2, s_3, s_4\} \\
s &= s_1 \\
F &= \{s_4\} \\
\end{align*}
$$

Transition function:

$$
\begin{align*}
\delta(s_1, a) = s_2 \\
\delta(s_1, b) = s_1 \\
\delta(s_2, a) = s_3 \\
\delta(s_2, b) = s_2 \\
\delta(s_3, a) = s_4 \\
\delta(s_3, b) = s_3 \\
\delta(s_4, a) = \delta(s_4, b) = s_4 \\
\end{align*}
$$

[representation](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2fa3/DFA/../../../../../thoughts/representations):

```mermaid
stateDiagram-v2
direction LR
classDef accepting fill:#4CAF50,stroke:#333,stroke-width:2px
classDef start fill:#FFD700,stroke:#333,stroke-width:2px
s1: s1
s2: s2
s3: s3
s4: s4
[*] --> s1
s1 --> s1: b
s1 --> s2: a
s2 --> s2: b
s2 --> s3: a
s3 --> s3: b
s3 --> s4: a
s4 --> s4: a,b
class s4 accepting
class s1 start
```

> if the run ends in a final state then accept, otherwise reject the string

[Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2fa3/DFA#definition)

> Let $\delta : Q \times \Sigma \rightarrow Q$; extend it to $\hat{\delta} : Q \times \Sigma^{*} \rightarrow Q$: where $\delta(q, c) = p$ for a single character, $\hat{\delta}(q, w) = p$ for a whole string.

### regularity

> [!tip] Important
>
> $$
> \begin{align*} \hat{\delta}(q, \epsilon) &= q \\ \hat{\delta}(q, xa) &= \delta(\hat{\delta}(q, x), a) \end{align*}
> $$

> [!tip] Important
>
> a subset $A \subset \Sigma^{*}$ is regular if and only if there exists a DFA $M$ such that $\mathcal{L}(M) = A$

> [!tip] Important
>
> All finite languages are regular, but not all regular languages are finite

#### examples

Show $L$ is regular where $L = \{ x \mid x \text{ read as a binary number is divisible by } 3 \} \cup \{ \epsilon \}$, with $\Sigma = \{0, 1\}$

Three states, $q_{0}, q_{1}, q_{2}$, where $q_{0}$ denotes that the value read so far is 0 mod 3, $q_{1}$ denotes that it is 1 mod 3, and $q_{2}$ denotes that it is 2 mod 3.
For any $x \in \{0, 1\}^{*}$: $\hat{\delta}(q_{0}, x) = q_{0} \iff \#x \equiv 0 \space mod \space 3$, $\hat{\delta}(q_{0}, x) = q_{1} \iff \#x \equiv 1 \space mod \space 3$, and $\hat{\delta}(q_{0}, x) = q_{2} \iff \#x \equiv 2 \space mod \space 3$, where $\#x$ is the value of $x$ read as a binary number.

```mermaid
stateDiagram-v2
direction LR
[*] --> q0
q0 --> q1 : 1
q0 --> q0 : 0
q1 --> q2 : 0
q1 --> q0 : 1
q2 --> q1 : 0
q2 --> q2 : 1
q0 --> [*]
```

---

## product construction

Assume that $A, B$ are regular, so there are automata

$$
M_1 = (Q_1, \Sigma, \delta_1, s_1, F_1) \quad M_2 = (Q_2, \Sigma, \delta_2, s_2, F_2)
$$

Define

$$
M_{3} = (Q_{3}, \Sigma, \delta_3, s_{3}, F_{3})
$$

where $Q_{3}=Q_{1} \times Q_{2}$, $s_{3} = (s_{1}, s_{2})$, $F_{3} = F_{1} \times F_{2}$, and $\delta_{3}((p, q), x) = (\delta_{1}(p, x), \delta_{2}(q, x))$

With $L(M_{1}) = A$ and $L(M_{2}) = B$, **$A \cap B$** is regular.

> [!tip] Lemma 4.1
>
> $$
> \hat{\delta_3}((p, q), x) = (\hat{\delta_1}(p, x), \hat{\delta_2}(q, x)) \space \forall x \in \Sigma^*
> $$

Complement machine: replace $F$ with $Q - F \subseteq Q$ to accept $\overline{L}$

Trivial machines: $\mathcal{L}(M_{1}) = \{\}$, $\mathcal{L}(M_{2}) = \Sigma^*$, $\mathcal{L}(M_{3})=\{ \epsilon \}$

> [!note] De Morgan laws
>
> $$
> A \cup B = \overline{\overline{A} \cap \overline{B}}
> $$

> [!tip] Theorem 4.2
>
> $$
> L(M_3) = L(M_1) \cap L(M_2)
> $$
> $\overline{L}$ is regular
> $L_{1} \cap L_{2}$ is regular
> $L_{1} \cup L_{2}$ is regular

---

## NFA

## definition

$$
\Sigma^{*}: \text{set of all strings based off }\Sigma
$$

$$
\begin{align*}
\text{NFA}\quad M &= (Q, \Sigma, \Delta, S, F) \\
Q &: \text{finite set of states} \\
\Sigma &: \text{finite alphabet} \\
\Delta &: Q \times \Sigma \rightarrow P(Q) \\
S &: \text{Start states},\quad S \subseteq Q \\
F &: \text{Final states},\quad F \subseteq Q \\
\end{align*}
$$

[Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2fa3/NFA#definition)

> [!tip] transition function
>
> $\hat{\Delta}: P(Q) \times \Sigma^* \rightarrow P(Q)$
>
> $$
> \begin{align*} \hat{\Delta}(A, \epsilon) &= A \\ \hat{\Delta}(A, xa) &= \bigcup_{p \in \hat{\Delta}(A, x)} \Delta(p, a) \end{align*}
> $$

## subset construction

> [!tip] acceptance
>
> $N$ accepts $x \in \Sigma^*$ if
>
> $$
> \hat{\Delta}(S, x) \cap F \neq \emptyset
> $$

Define $L(N) = \{ x \in \Sigma^* \mid N\text{ accepts } x\}$

> [!tip] Theorem 4.3
>
> Every DFA $(Q, \Sigma, \delta, s, F)$ is equivalent to an NFA $(Q, \Sigma, \Delta, \{s\} , F)$ where $\Delta(p, a) = \{ \delta(p, a) \}$

> [!tip] Lemma 6.1
>
> For any $x, y \in \Sigma^*$ and $A \subseteq Q$,
>
> $$
> \hat{\Delta}(A, xy) = \hat{\Delta}(\hat{\Delta}(A, x), y)
> $$

> [!tip] Lemma 6.2
>
> $\hat{\Delta}$ commutes with set union:
>
> $$
> \hat{\Delta}(\bigcup_i A_i, x) =\bigcup_i \hat{\Delta}(A_i, x)
> $$

Let $N = (Q_N, \Sigma, \Delta_N, S_N, F_N)$ be an arbitrary NFA. Let $M$ be the DFA $M = (Q_M, \Sigma, \delta_M, s_M, F_M)$ where:

$$
\begin{align*} Q_M &= P(Q_N) \\ \delta_M(A, a) &= \hat{\Delta}_N(A, a) \\ s_M &= S_N \\ F_M &= \{ A \subseteq Q_N \mid A \cap F_N \neq \emptyset \} \end{align*}
$$

> [!tip] Lemma 6.3
>
> For any $A \subseteq Q_N \land x \in \Sigma^*$
>
> $$
> \hat{\delta}_M(A, x) = \hat{\Delta}_N(A, x)
> $$

> [!tip] Theorem 6.4
>
> The automata M and N accept the same sets.
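A minimal Python sketch of the subset construction, assuming an NFA without $\epsilon$-transitions given as a dict mapping `(state, symbol)` to a set of states; `frozenset`s stand in for the subsets $A \subseteq Q_N$:

```python
from collections import deque

def subset_construction(delta, start_states, final_states, alphabet):
    """Build the equivalent DFA; returns (dfa_delta, dfa_start, dfa_finals).
    delta: dict[(state, symbol) -> set[state]] for the NFA."""
    dfa_start = frozenset(start_states)
    dfa_delta, dfa_finals = {}, set()
    queue = deque([dfa_start])
    seen = {dfa_start}
    while queue:
        A = queue.popleft()
        if A & frozenset(final_states):
            dfa_finals.add(A)  # F_M: subsets that meet F_N
        for a in alphabet:
            # delta_M(A, a) = union of Delta_N(p, a) over p in A
            B = frozenset().union(*(delta.get((p, a), set()) for p in A))
            dfa_delta[(A, a)] = B
            if B not in seen:
                seen.add(B)
                queue.append(B)
    return dfa_delta, dfa_start, dfa_finals
```

Only reachable subsets are constructed, which is why the resulting DFA is usually far smaller than the full $P(Q_N)$.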
## regex

_atomic patterns_ are:

- $L(a) = \{a\}$
- $L(\epsilon) = \{\epsilon\}$
- $L(\emptyset) = \emptyset$
- $L(\#) = \Sigma$: matched by any symbols
- $L(@) = \Sigma^*$: matched by any string

_compound patterns_ are formed by combining binary operators and unary operators.

> [!tip] redundancy
>
> $a^+ \equiv aa^*$, $\alpha \cap \beta = \overline{\overline{\alpha} + \overline{\beta}}$

> if $\alpha$ and $\beta$ are patterns, then so are $\alpha + \beta, \alpha \cap \beta, \alpha^*, \alpha^+, \overline{\alpha}, \alpha \beta$

> [!tip] The following hold for matching:
>
> $L(\alpha + \beta) = L(\alpha) \cup L(\beta)$
>
> $L(\alpha \cap \beta) = L(\alpha) \cap L(\beta)$
>
> $L(\alpha\beta) = L(\alpha)L(\beta) = \{yz \mid y \in L(\alpha) \land z \in L(\beta)\}$
>
> $L(\alpha^*) = L(\alpha)^0 \cup L(\alpha)^1 \cup \dots = L(\alpha)^*$
>
> $L(\alpha^+) = L(\alpha)^+$

> [!tip] Theorem 7.1
>
> $\Sigma^* = L(\#^*) = L(@)$
>
> Singleton set $\{x\} = L(x)$
>
> Finite set: $\{x_{1},x_{2},\dots,x_m\} = L(x_{1}+x_{2}+\dots+x_m)$

> [!tip] Theorem 9
>
> $$
> \begin{array}{cccl} \alpha + (\beta + \gamma) & \equiv & (\alpha + \beta) + \gamma & (9.1) \\ \alpha + \beta & \equiv & \beta + \alpha & (9.2) \\ \alpha + \phi & \equiv & \alpha & (9.3) \\ \alpha + \alpha & \equiv & \alpha & (9.4) \\ \alpha(\beta\gamma) & \equiv & (\alpha\beta)\gamma & (9.5) \\ \epsilon \alpha & \equiv & \alpha\epsilon \equiv \alpha & (9.6) \\ \alpha(\beta + \gamma) & \equiv & \alpha\beta + \alpha\gamma & (9.7) \\ (\alpha + \beta)\gamma & \equiv & \alpha\gamma + \beta\gamma & (9.8) \\ \phi\alpha & \equiv & \alpha\phi \equiv \phi & (9.9) \\ \epsilon + \alpha\alpha^* & \equiv & \alpha^* & (9.10) \\ \epsilon + \alpha^*\alpha & \equiv & \alpha^* & (9.11) \\ \beta + \alpha\gamma \leq \gamma & \Rightarrow & \alpha^*\beta \leq \gamma & (9.12) \\ \beta + \gamma\alpha \leq \gamma & \Rightarrow & \beta\alpha^* \leq \gamma & (9.13) \\ (\alpha\beta)^*\alpha & \equiv & \alpha(\beta\alpha)^* & (9.14) \\ (\alpha^* \beta)^* \alpha^* & \equiv & (\alpha + \beta)^* & (9.15) \\ \alpha^* (\beta\alpha^*)^* & \equiv & (\alpha + \beta)^* & (9.16) \\ (\epsilon + \alpha)^* & \equiv & \alpha^* & (9.17) \\ \alpha\alpha^* & \equiv & \alpha^* \alpha & (9.18) \\ \end{array}
> $$

## quotient automata

also known as DFA state minimization

> [!tip] definition
>
> $$
> p \approx q \iff \forall x \in \Sigma^{*} : (\hat{\delta}(p, x) \in F \iff \hat{\delta}(q, x) \in F)
> $$
>
> and $[p]$ denotes the equivalence class of $p$, so $p \approx q \equiv [p] = [q]$

Define

$$
M / \approx \space = (Q', \Sigma, \delta', [s], F')
$$

where (13.1)

$$
\begin{align*} Q' &= Q / \approx \\ \delta'([p], a) &= [\delta(p, a)] \\ s' &= [s] \\ F' &= \{ [p] \mid p \in F \} \end{align*}
$$

> [!tip] Lemma 13.5
>
> If $p \approx q$, then $\delta(p, a) \approx \delta(q, a)$; equivalently, if $[p] = [q]$, then $[\delta(p, a)] = [\delta(q, a)]$

> [!tip] Lemma 13.6
>
> $p \in F \iff [p] \in F'$

> [!tip] Lemma 13.7
>
> $$
> \forall x \in \Sigma^*, \hat{\delta'}([p], x) = [\hat{\delta}(p, x)]
> $$

> [!tip] Theorem 13.8
>
> $L(M / \approx) = L(M)$

> [!tip] algorithm
>
> 1. Build a table of all pairs $\{p, q\}$
> 2. Mark a pair $\{p, q\}$ if $(p \in F \land q \notin F) \lor (q \in F \land p \notin F)$
> 3. While there exists an unmarked pair $\{p, q\}$ such that $\{ \delta(p, a), \delta(q, a) \}$ is marked for some $a$, mark $\{p, q\}$
> 4. $p \approx q \iff \{p, q\}$ is not marked
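A minimal Python sketch of this marking algorithm, assuming a complete DFA given as `delta[(state, symbol)] -> state`; it returns the unmarked pairs, i.e. the pairs of equivalent states:

```python
from itertools import combinations

def equivalent_pairs(states, alphabet, delta, finals):
    """Marking algorithm: returns the pairs {p, q} with p ~ q."""
    # steps 1 & 2: table of all pairs, mark those split by acceptance
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in finals) != (q in finals)}
    # step 3: propagate markings until a fixed point is reached
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)
                    changed = True
                    break
    # step 4: p ~ q iff {p, q} was never marked
    return {frozenset((p, q)) for p, q in combinations(states, 2)
            if frozenset((p, q)) not in marked}
```

Merging each group of equivalent states then yields the quotient automaton $M / \approx$.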
---
slug: thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a1/A1
tags:
  - sfwr2fa3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a1/A1"
title: DFAs, NFAs, and regular languages
date: 2024-02-16
---

## Q1.

For each statement below, state if it is true or false, and explain why. The explanation does not need to be a formal proof, but the argument should be sound.

> [!question] Statement a
>
> If $L_1$ is regular and $|L_1| = k$ and $L_2$ is non-regular, then $L_1 \cap L_2$ is regular.

This statement is **true**. All finite languages are regular, and $|L_1| = k$ implies that $L_1$ is finite. Every string in $L_1 \cap L_2$ must also be in $L_1$, so $L_1 \cap L_2$ is a subset of a finite language; a subset of a finite language is finite, and therefore regular.

> [!question] Statement b
>
> If $L_1$ and $L_2$ are non-regular, then $L_1 \cup L_2$ is regular.

This statement is **false**. The union of two non-regular languages is not guaranteed to be regular. A language is regular iff there is a finite automaton that accepts it. For a counterexample, take $L_1 = L_2 = \{a^nb^n \mid n \geq 0\}$, which is non-regular; then $L_1 \cup L_2 = L_1$ is non-regular. Which renders the statement **false**.

> [!question] Statement c
>
> $\forall L_1 \mid L_1 \text{ :non-regular, } \exists L_2 \mid L_2 \text{ :regular} \land L_{1} \subseteq L_{2}$

This statement is **true**. Let $\Sigma$ be the alphabet of $L_1$, and choose $L_2 = \Sigma^{*}$, which is regular. Since $\Sigma^{*}$ is the set of all strings formed from $\Sigma$ plus the empty string, it is guaranteed to contain $L_1$. Therefore $L_1 \subseteq \Sigma^{*}$, and so $L_1 \subseteq L_2$

## Q2.

Create a DFA $M$ such that:

> [!question] Statement a
>
> M accepts all strings which begin with $b$ but do not contain the substring $bab$.

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a1/A1/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a1/dfa_2a.svg)

> [!question] Statement b
>
> $\mathcal{L}{(M)} = \lbrace a^ib^jc^k \mid i+j+k \text{ is a multiple of 3} \rbrace$, $\Sigma = \lbrace a,b,c \rbrace$

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a1/A1/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a1/dfa_2b.svg)

> [!question] Statement c
>
> $\mathcal{L}{(M)} = \lbrace x \mid \text{at least two a's in last three characters of x} \rbrace$

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a1/A1/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a1/dfa_2c.svg)

## Q3.

Via product construction, create a DFA $M$ such that

$$
\mathcal{L}(M) = \{ a^n b^m \mid n \lor m \text{ is a multiple of 3} \}
$$

First create two machines: one where $n$ is a multiple of 3 and one where $m$ is a multiple of 3.
Then create the “union” machine:

$$
\begin{align*}
\mathcal{L}(M_1) &= \lbrace a^nb^m \mid n \text{ is a multiple of 3} \rbrace \\
\mathcal{L}(M_2) &= \lbrace a^nb^m \mid m \text{ is a multiple of 3} \rbrace
\end{align*}
$$

First, we will construct $M_1$:

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a1/A1/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a1/dfa_3a.svg)

Then, we will construct $M_2$:

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a1/A1/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a1/dfa_3b.svg)

From product construction, we will create $M$ based on $M_1$ and $M_2$:

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a1/A1/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a1/dfa_23.svg)

## Q4.

Create an NFA which accepts all strings in which the third last character is an $a$. Then via subset construction, create an equivalent DFA. Show all your work

_Solution_

We define the following NFA $(Q, \Sigma, \delta, q_0, F)$ with:

- $Q = \{q_0, q_1, q_2, q_3\}$
- $\Sigma = \{a, b\}$
- Start state $q_0$
- Accept state $q_{3}$
- Transition function $\delta$ as follows (the accept state needs no outgoing transitions, since the NFA guesses the third last position nondeterministically):

$$
\begin{align*}
\delta(q_0, a) &= \{q_0, q_1\} \\
\delta(q_0, b) &= \{q_0\} \\
\delta(q_1, a) &= \{q_2\} \\
\delta(q_1, b) &= \{q_2\} \\
\delta(q_2, a) &= \{q_3\} \\
\delta(q_2, b) &= \{q_3\} \\
\delta(q_3, a) &= \delta(q_3, b) = \emptyset
\end{align*}
$$

Via subset construction, we can create the following DFA. The start state of the DFA is $\{ q_0 \}$, as it is the epsilon closure of the start state of the NFA.

Transition table:

| DFA state                        | $a$                              | $b$                       |
| -------------------------------- | -------------------------------- | ------------------------- |
| $\{q_{0}\}$                      | $\{q_{0}, q_{1}\}$               | $\{q_{0}\}$               |
| $\{q_{0}, q_{1}\}$               | $\{q_{0}, q_{1}, q_{2}\}$        | $\{q_{0}, q_{2}\}$        |
| $\{q_{0}, q_{2}\}$               | $\{q_{0}, q_{1}, q_{3}\}$        | $\{q_{0}, q_{3}\}$        |
| $\{q_{0}, q_{1}, q_{2}\}$        | $\{q_{0}, q_{1}, q_{2}, q_{3}\}$ | $\{q_{0}, q_{2}, q_{3}\}$ |
| $\{q_{0}, q_{1}, q_{3}\}$        | $\{q_{0}, q_{1}, q_{2}\}$        | $\{q_{0}, q_{2}\}$        |
| $\{q_{0}, q_{3}\}$               | $\{q_{0}, q_{1}\}$               | $\{q_{0}\}$               |
| $\{q_{0}, q_{1}, q_{2}, q_{3}\}$ | $\{q_{0}, q_{1}, q_{2}, q_{3}\}$ | $\{q_{0}, q_{2}, q_{3}\}$ |
| $\{q_{0}, q_{2}, q_{3}\}$        | $\{q_{0}, q_{1}, q_{3}\}$        | $\{q_{0}, q_{3}\}$        |

The final states are any states that include $q_{3}$: $\{q_{0}, q_{1}, q_{3}\}$, $\{q_{0}, q_{3}\}$, $\{q_{0}, q_{1}, q_{2}, q_{3}\}$, and $\{q_{0}, q_{2}, q_{3}\}$.

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a1/A1/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a1/dfa_44.svg)

Where

```python
dfa_states = {
    'D0': '{q0}',
    'D1': '{q0, q1}',
    'D2': '{q0, q2}',
    'D3': '{q0, q1, q2}',
    'D4': '{q0, q1, q3}',
    'D5': '{q0, q3}',
    'D6': '{q0, q1, q2, q3}',
    'D7': '{q0, q2, q3}'
}
```

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a2/A2
tags:
  - sfwr2fa3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a2/A2"
title: Regex, pumping lemma
date: 2024-03-18
---

## Problème 1.

Give regular expressions for the languages below

> [!question] 1.a
>
> $$
> L = \{ a^nb^m \mid (n+m) \% 2 = 0 \}
> $$

$$
L = (aa)^*(bb)^* + a(aa)^*b(bb)^*
$$

> [!question] 1.b
>
> $$
> L = \{ w \mid w \text{ does not contain the substring: } aba \}
> $$

$$
L = b^*a^*(bb^+a^+)^*b^*
$$

> [!question] 1.c
>
> $$
> L = \{ w \mid w \text{ has an even number of } b' \text{s}\}
> $$

$$
L = a^*(ba^*ba^*)^*
$$

## Problème 2.
Minimize the number of states in this DFA via quotient construction. Show all your steps; specifically, in your table where you are “marking” nodes, indicate which iteration you marked them on.

Final states are $\{ 0, 3 \}$, and non-final states are $\{ 1, 2, 4, 5, 6, 7, 8, 9 \}$

Initial table with all pairs:

```plaintext
    0 1 2 3 4 5 6 7 8 9
---------------------
0 | * * *
1 | * * *
2 | * * *
3 | * * *
4 | * * *
5 | * * * * *
6 | * * * * *
7 | * * * *
8 | * * * * *
9 | * * * *
```

Mark based on final/non-final states:

```plaintext
    0 1 2 3 4 5 6 7 8 9
---------------------
0 | ✓ ✓ ✓
1 | ✓
2 | ✓
3 | ✓ ✓ ✓
4 | ✓
5 | ✓
6 | ✓
7 |
8 | ✓
9 |
```

### first iteration.

{0,2} -(a)→ {1,3}: not marked
{0,2} -(b)→ {6,7}: not marked
{0,3} -(a)→ {1,4}: not marked
{0,3} -(b)→ {6,8}: not marked
{0,4} -(a)→ {1,5}: not marked
{0,4} -(b)→ {6,8}: not marked
{0,7} -(a)→ {1,9}: not marked
{0,7} -(b)→ {6,8}: not marked
{0,8} -(a)→ {1,9}: not marked
{0,8} -(b)→ {6,5}: not marked
{0,9} -(a)→ {1,7}: not marked
{0,9} -(b)→ {6,6}: not marked

```plaintext
    0 1 2 3 4 5 6 7 8 9
---------------------
0 | ✓ ✓ ✓
1 | ✓
2 | ✓
3 | ✓ ✓ ✓
4 | ✓
5 | ✓
6 | ✓
7 |
8 | ✓
9 |
```

### second iteration.

{1,2} -(a)→ {2,3}: marked, so we mark {1,2}
{1,5} -(a)→ {2,0}:

```plaintext
    0 1 2 3 4 5 6 7 8 9
---------------------
0 | ✓ ✓ ✓
1 | ✓
2 | x ✓
3 | ✓ ✓ ✓
4 | ✓
5 | ✓
6 | ✓
7 |
8 | ✓
9 |
```

_sorry I gave up_

## Problème 3.

Your friend is looking at the formal definition of the pumping lemma and they think something is wrong

$$
L \text{ is regular } \implies (\exists k \mid k>0: (\forall x,y,z \mid xyz \in L \land |y| > k: ( \exists u,v,w \mid y=uvw \land v \neq \epsilon: (\forall i | i \geq 0: xuv^iwz \in L))))
$$

They understand the argument it is crafted around, that is, due to the fact that strings are arbitrarily long and a DFA has finite states there must be a segment of accepted strings which “loop” in the machine. However, they claim for the pumping lemma above to hold, $L$ must be infinite, because if $L$ was finite the argument about “looping” no longer holds. Therefore, the pumping lemma only holds when $L$ is infinite.

You can see where your friend is coming from, but they are incorrect. Why? Be precise in your argument, that is, show how if $L$ is finite, then

$$
(\exists k \mid k > 0: (\forall x,y,z \mid xyz \in L \land |y| > k: ( \exists u,v,w \mid y=uvw \land v \neq \epsilon: (\forall i | i \geq 0: xuv^iwz \in L))))
$$

evaluates to true. (hint: If $L$ is finite, there is a “longest string”)

_Solution_

Let $\ell$ be the length of the longest string in $L$. We can choose a pumping length $k = \ell + 1$, so $k > \ell$. Now let’s evaluate the following statement

$$
(\forall x,y,z \mid xyz \in L \land |y| > k: ( \exists u,v,w \mid y=uvw \land v \neq \epsilon: (\forall i | i \geq 0: xuv^iwz \in L)))
$$

Since $k > \ell$, there doesn’t exist a string $xyz \in L$ such that $|y| > k$. Therefore, the antecedent of the implication $xyz \in L \land |y| > k$ is always false, so the entire inner implication is vacuously true, and the entire statement is true.

Therefore, the pumping lemma holds for finite languages.

## Problème 4.

Using the Pumping Lemma, prove the following languages are not regular. Make your steps in the “game” and variable choices very clear for each question.

> [!note] Pumping Lemma
>
> There exists a pumping length $p$ such that $\forall s \in L, |s| \geq p$, we can write $s = xyz$ such that
>
> i. $|y| > 0$
>
> ii. $|xy| \leq p$
>
> iii. $\forall i \geq 0, xy^iz \in L$
> [!question] 4.a
>
> $$
> L = \{ a^{nm}b^ma^n \mid n,m \geq 0 \}
> $$

Assume $L$ is regular. Let $s = a^{p^2}b^pa^p$. $s \in L$ since choosing $n=p, m=p$ gives $s$, and $|s|=p^2+2p \geq p$ for $p \geq 1$.

By the pumping lemma, we can write $s=xyz$ satisfying conditions i)–iii). Since $|xy| \leq p$, $y$ must consist of only $a$’s. Let $y = a^k$ for $1 \leq k \leq p$.

Consider the string $xy^0z = xz = a^{p^2-k}b^pa^p$. From condition iii), this must be in $L$. However, $xz$ has a first block of $a$’s of length $p^2-k$, a $b$-block of length $p$, and a last $a$-block of length $p$. To be in $L$ with $m = p$ and $n = p$, the first block must have exactly $nm = p^2$ $a$’s, but it has only $p^2-k < p^2$, so $xz \notin L$.

Thus, it contradicts the pumping lemma, and $L$ is not regular. $\square$

> [!question] 4.b
>
> $$
> L = \{ww \mid w \in \Sigma^*\}
> $$

Assume $L$ is regular. Let $s = a^pb^pa^pb^p$. $s \in L$ since choosing $w=a^pb^p$ gives $s$, and $|s|=4p \geq p$ for $p \geq 1$.

By the pumping lemma, we can write $s=xyz$ satisfying conditions i)–iii). Since $|xy| \leq p$, $y$ must consist of only $a$’s. Let $y = a^k$ for $1 \leq k \leq p$.

Consider the string $xy^2z = a^{p+k}b^pa^pb^p$. By condition iii) of the pumping lemma, this must be in $L$. However, for $xy^2z$ to be in $L$, it must be of the form $ww$ for some $w \in \Sigma^*$. Since $k \geq 1$: if $k$ is odd, the length $4p+k$ is odd and the string cannot be split into two equal halves; if $k$ is even, the first half begins with $a$ while the second half begins inside the first $b$-block, so the two halves differ. So $xy^2z \notin L$.

Thus, it contradicts the pumping lemma, and $L$ is not regular. $\square$

> [!question] 4.c
>
> $$
> L = \{ a^{k^3} \mid k \geq 0 \}
> $$

Assume $L$ is regular. Let $s = a^{p^3}$. $s \in L$ since choosing $k=p$ gives $s$, and $|s|=p^3 \geq p$ for $p \geq 1$.

By the pumping lemma, we can write $s=xyz$ satisfying conditions i)–iii). Since $|xy| \leq p$, $y$ must consist of only $a$’s. Let $y = a^j$ for $1 \leq j \leq p$.

Consider the string $xy^2z = a^{p^3+j}$. By condition iii) of the pumping lemma, this must be in $L$. However, for $xy^2z$ to be in $L$, it must be of the form $a^{k^3}$ for some $k \geq 0$. Since $p^3 < p^3+j < (p+1)^3$, $p^3+j$ is not a perfect cube, so $xy^2z \notin L$.

Thus, it contradicts the pumping lemma, and $L$ is not regular. $\square$

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a3/A3
tags:
  - sfwr2fa3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a3/A3"
title: Context-free grammar and push-down Turing machine
date: 2024-04-10
---

## Problème 1.

Give a context free grammar for the following language:

$$
L = \{a^nb^mc^k \mid k \neq n + m\}
$$

_Solution_

The following CFG generates the language $L$:

$$
\begin{aligned} S &\rightarrow S_1 \mid S_2 \mid S_3 \\ S_1 &\rightarrow aS_1c \mid aS_1 \mid A \\ S_2 &\rightarrow bS_2c \mid bS_2 \mid B \\ S_3 &\rightarrow aS_3b \mid cS_3 \mid C \\ A &\rightarrow aAc \mid aA \mid a \mid \varepsilon \\ B &\rightarrow bBc \mid bB \mid b \mid \varepsilon \\ C &\rightarrow aCb \mid cC \mid c \mid \varepsilon \end{aligned}
$$

## Problème 2.

Let

$$
L_1 =\{a^nb^mc^k \mid n,m,k \geq 0\}
$$

and let

$$
L_2 = \{a^nb^nc^n \mid n \geq 1\}
$$

Complete the pushdown automata $M$ such that $L(M) = L_1 - L_2$, where $\Sigma = \{a,b,c\}$.
_Solution_

$L(M)$ accepts all strings in $a^*b^*c^*$ where the numbers of $a$’s, $b$’s, and $c$’s are not all the same, or where one or more of the character types does not appear; the following is the pushdown automaton $M$

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a3/A3/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2fa3/a3/p2.webp)

## Problème 3.

The first table:

|              | 1                    | 0                    | x                  | #                   | c                    | $\square$                      |
| ------------ | -------------------- | -------------------- | ------------------ | ------------------- | -------------------- | ------------------------------ |
| $q_{s}$      | $(q_{1,1}, x, R)$    | $(q_{1,3}, x, R)$    | $(q_{s}, x, R)$    | -                   | -                    | -                              |
| $q_{1,1}$    | $(q_{1,1},1,R)$      | $(q_{1,1},0,R)$      | -                  | $(q_{1,2}, \#, R)$  | -                    | -                              |
| $q_{1,2}$    | $(q_{1,5},x,R)$      | ==$(q_{1,2},x,R)$==  | $(q_{1,2}, x, R)$  | -                   | -                    | -                              |
| $q_{1,3}$    | $(q_{1,3},1,R)$      | $(q_{1,3},0,R)$      | -                  | $(q_{1,4}, \#, R)$  | -                    | -                              |
| $q_{1,4}$    | ==$(q_{1,3},c,L)$==  | $(q_{1,7},x,R)$      | $(q_{1,4}, x, R)$  | -                   | -                    | -                              |
| $q_{1,5}$    | $(q_{1,5},1,R)$      | $(q_{1,5},0,R)$      | -                  | $(q_{1,8},\#,R)$    | -                    | -                              |
| $q_{1,6}$    | $(q_{1,6},1,R)$      | $(q_{1,6},0,R)$      | -                  | $(q_{1,9},\#,R)$    | -                    | -                              |
| $q_{1,7}$    | $(q_{1,7},1,R)$      | $(q_{1,7},0,R)$      | -                  | $(q_{1,10},\#,R)$   | -                    | -                              |
| $q_{1,8}$    | ==$(q_{1,8},1,R)$==  | ==$(q_{1,8},1,R)$==  | -                  | -                   | ==$(q_{1,8}, c,R)$== | ==$(q_{1,end1}, \square, L)$== |
| $q_{1,9}$    | ==$(q_{1,9},1,R)$==  | ==$(q_{1,9},0,R)$==  | -                  | -                   | ==$(q_{1,9},1,R)$==  | ==$(q_{1,end2}, \square, L)$== |
| $q_{1,10}$   | ==$(q_{1,10},c,R)$== | ==$(q_{1,10},1,R)$== | -                  | -                   | ==$(q_{1,10},c,R)$== | ==$(q_{1,end3}, \square, L)$== |
| $q_{1,end1}$ | $(q_{1,end1},1,L)$   | $(q_{1,end1},0,L)$   | -                  | $(q_{1,end2},\#,L)$ | $(q_{1,end1},c,L)$   | -                              |
| $q_{1,end2}$ | $(q_{1,end3},1,L)$   | $(q_{1,end3},0,L)$   | $(q_{1,end2},x,L)$ | $(q_{1,end2},\#,L)$ | -                    | -                              |
| $q_{1,end3}$ | $(q_{1,end3},1,L)$   | $(q_{1,end3},0,L)$   | $(q_{1,end3},x,L)$ | $(q_{1,end3},\#,L)$ | -                    | $(q_{2,s},\square,R)$          |

The second table:

|           | 1                         | 0                         | x                           | #                           | c                         | $\square$               |
| --------- | ------------------------- | ------------------------- | --------------------------- | --------------------------- | ------------------------- | ----------------------- |
| $q_{2,s}$ | -                         | -                         | ==$(q_{2,s}, \square, R)$== | ==$(q_{2,1}, \square, R)$== | -                         | -                       |
| $q_{2,1}$ | -                         | -                         | ==$(q_{2,1}, \square, R)$== | ==$(q_{2,1}, \square, R)$== | -                         | -                       |
| $q_{2,2}$ | ==$(q_{2,1},\square,L)$== | ==$(q_{2,1},\square,L)$== | -                           | -                           | ==$(q_{2,1},\square,L)$== | $(q_{3,s}, \square, L)$ |

The final transition table:

|           | 1                       | 0                       | c                       | $\square$                 |
| --------- | ----------------------- | ----------------------- | ----------------------- | ------------------------- |
| $q_{3,s}$ | $(q_{3,s},1,R)$         | $(q_{3,s},0,R)$         | $(q_{3,s},c,R)$         | $(q_{3,1},\square,R)$     |
| $q_{3,1}$ | $(q_{3,2}, 0, L)$       | $(q_{3,2}, 1, L)$       | $(q_{3,1}, 1, L)$       | $(q_{3,2},\square,R)$     |
| $q_{3,2}$ | $(q_{3,1}, \square, L)$ | $(q_{3,1}, \square, L)$ | $(q_{3,1}, \square, L)$ | $(q_{3,end}, \square, L)$ |

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-2fa3/index
tags:
  - university
  - sfwr2fa3
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-2fa3/index"
title: Discrete Mathematics
date: 2024-10-29
---

Ok second try.
---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3bb4/Sequential-programming
tags:
  - sfwr3bb4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3bb4/Sequential-programming"
title: Sequential programming
date: 2023-09-11
---

```mermaid
flowchart TD
  1[x>=0] --> 2[z+u*y = x*y & u >=0] --> 4[z = x*y]
```

# Annotations and correctness

```prolog
{P} S {Q}
```

```mermaid
---
title: correctness assertion
---
stateDiagram-v2
  direction LR
  P --> Q: S
```

### rules for correctness

If $P\wedge B \rightarrow Q$

# Sequential composition

_Array as a partial function_

```algorithm
x := (x; E:F)
```

> _array_ is a function $D \rightarrow T$ where $D$ is a ‘small’ range of integers and $T$ is the type of array element

_alter function `(x; E:F)`_ is defined by

```algorithm
(x; E:F)(G) = F    if E = G
(x; E:F)(G) = x(G) if E != G
```

For example: Given array `x`:

```algorithm
{x(0) = a ^ x(1) = b}
x(1) := c
{x(0) = a ^ x(1) = c}
```

> S is the sum(0..k) → loop invariant

```algorithm
s, k := a(0), 1
{s = (\sum i | 0 <= i < k : a(i))}
```

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3bb4/index
tags:
  - university
  - sfwr3bb4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3bb4/index"
title: Concurrent System Design
date: 2023-09-04
---

## Elements of Software Design

This is a series of interactive notebooks that are used for teaching software design and more specifically concurrent system design at McMaster University. The accompanying assignments and tests are not part of this repository.

A novel aspect is the use of state diagrams for teaching concurrency, including non-interference, see: [Teaching Concurrency with the Disappearing Formal Method](http://doi.org/10.1007/978-3-030-32441-4_9). Sekerinski, E. In Dongol, B.; Petre, L.; and Smith, G., editor(s), Formal Methods Teaching, volume 11758 of Lecture Notes in Computer Science, pages 135–149, 2019. Springer, Cham.

The course notes are being constantly revised; comments are welcome. See my [home page](http://www.cas.mcmaster.ca/~emil/) for the latest installments of the courses using these notes. Of course, I would love to hear if you plan to use these notes for courses or otherwise.

Most images are cell attachments, which are not rendered by GitHub. Images that are linked files are rendered, but not in the correct size. To view the notebooks properly, follow the instructions below.

— Emil Sekerinski

See also: [Data Structures and Algorithms](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3bb4/index/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-2c03/).

### Installation

You need [Python 3](https://www.python.org/downloads/).
Update `pip3` or `pip`, depending on your installation:

```
pip3 install --upgrade pip
```

Install Jupyter:

```
pip3 install jupyter
```

Jupyter can now be run by:

```
jupyter notebook
```

The notebooks rely on the following Jupyter extensions:

- [`exercise`](https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/exercise/readme.html) with [`rubberband`](https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/rubberband/readme.html): for revealing solution hints with the ⊞ symbol; these extensions can also be installed through the [`Jupyter nbextensions configurator`](https://github.com/ipython-contrib/jupyter_contrib_nbextensions)
- [`jupyter-emil-extension`](https://gitlab.cas.mcmaster.ca/parksj6/jupyter-se3bb4-extension): for formatting of algorithms and layout of slides. Install locally by:

```sh
curl -LJO https://gitlab.cas.mcmaster.ca/parksj6/jupyter-emil-extension/-/jobs/artifacts/master/download?job=build
unzip -a jupyter-emil-extension.zip
cd dist/
python3 -m pip install -e . --upgrade
jupyter nbextension install --py jupyter_emil_extension
jupyter nbextension enable --py jupyter_emil_extension
```

- [`RISE`](https://github.com/damianavila/RISE): optionally, for presenting the notebooks as slides; the resolution may need to be adjusted under “Edit Notebook Metadata”

You also need to install the programming languages that are used:

- [Java](https://java.com/en/download/): any recent version, only command line (no IDE)
- [Go](https://golang.org/dl/): any recent version, only command line (no IDE)

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Block-Diagrams
tags:
  - sfwr3dx4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Block-Diagrams"
title: Block Diagrams
date: 2024-01-24
---

See also [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Block-Diagrams/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/block_diagrams.pdf)

## Moving through summing junction

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Block-Diagrams/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/summing-junction.webp)

## Reduction via Familiar Forms

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Frequency-Domain
tags:
  - sfwr3dx4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Frequency-Domain"
title: Frequency Domain and a la carte.
date: 2024-01-09
---

[Introduction](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Frequency-Domain/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/intro.pdf) and [Notes](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Frequency-Domain/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/frequency_domain.pdf)

- Open-loop versus closed-loop
- Transient and steady-state response
- Stability
  - Total response = Natural response + Forced response
  - Natural response (homogeneous solution): evolution of system due to initial conditions
  - Forced response (particular solution): evolution of system due to input

Control objectives:

- Stabilize the system
- Produce the desired transient response
- Decrease/eliminate steady-state error
- Make system “robust” to withstand disturbances and variations in parameters
- Achieve optimal performance

## Block diagram representation of a system

```mermaid
stateDiagram-v2
  direction LR
  [*] --> System: r(t)
  System --> End: c(t)
```

_System as linear differential equation_

## Laplace Transform

```mermaid
graph LR
Diff{{differential equations}} -- "Laplace transform" --> Algebraic{{algebraic equations}} -- "inverse Laplace transform" --> End{{time domain solution}}
```

$$
\mathcal{L} \{f(t)\} = \int_0^{\infty}f(t)e^{-st}\,dt = F(s)
$$

$$
\begin{array}{c c c}
\hline
\text{Item no.} & f(t) & F(s) \\
\hline
1. & \delta(t) & 1 \\
2. & u(t) & \frac{1}{s} \\
3. & tu(t) & \frac{1}{s^2} \\
4. & t^n u(t) & \frac{n!}{s^{n+1}} \\
5. & e^{-at}u(t) & \frac{1}{s + a} \\
6. & \sin(\omega t)u(t) & \frac{\omega}{s^2 + \omega^2} \\
7. & \cos(\omega t)u(t) & \frac{s}{s^2 + \omega^2} \\
\hline
\end{array}
$$

$$
\delta{(t)} = 0, \quad t \neq 0,\quad \int_0^{\infty}{\delta{(t)}}dt=1
$$

### Properties

$$
\begin{aligned}
& f(0-)\text{: initial condition just before 0} \\[12pt]
& \textbf{Linearity:} \quad \mathcal{L}\{k_1 f_1(t) \pm k_2 f_2(t)\} = k_1 F_1(s) \pm k_2 F_2(s) \\[12pt]
& \textbf{Differentiation:} \\
& \quad \mathcal{L}\left\{\frac{df(t)}{dt}\right\} = sF(s) - f(0^-) \\
& \quad \mathcal{L}\left\{\frac{d^2f(t)}{dt^2}\right\} = s^2 F(s) - sf(0^-) - f'(0^-) \\[12pt]
& \textbf{Frequency Shifting:} \quad \mathcal{L}\{e^{-at}f(t)\} = F(s + a) \\
\end{aligned}
$$

### Transfer function

$n^{th}$ order _linear, time-invariant_ (LTI) differential equation:

$$
a_n \frac{d^n c(t)}{dt^n} + a_{n-1} \frac{d^{n-1} c(t)}{dt^{n-1}} + \cdots + a_0 c(t) = b_m \frac{d^m r(t)}{dt^m} + b_{m-1} \frac{d^{m-1} r(t)}{dt^{m-1}} + \cdots + b_0 r(t)
$$

_taking the Laplace transform of both sides_

$$
\begin{aligned}
& a_n s^n C(s) + a_{n-1} s^{n-1} C(s) + \cdots + a_0 C(s) \text{ and init terms for } c(t) \\
& = b_m s^m R(s) + b_{m-1} s^{m-1} R(s) + \cdots + b_0 R(s) \text{ and init terms for } r(t) \\
\end{aligned}
$$

_assume initial conditions are zero_

$$
\begin{aligned}
(a_n s^n + a_{n-1} s^{n-1} + \cdots + a_0)C(s) &= (b_m s^m + b_{m-1} s^{m-1} + \cdots + b_0)R(s) \\[8pt]
\frac{C(s)}{R(s)} &= G(s) = \frac{b_m s^m + b_{m-1} s^{m-1} + \cdots + b_0}{a_n s^n + a_{n-1} s^{n-1} + \cdots + a_0}
\end{aligned}
$$

> [!tip] Transfer function
>
> $$
> G(s)=\frac{C(s)}{R(s)}
> $$

Q: $G(s) = \frac{1}{s+2}$. Input: $u(t)$. What is $y(t)$?
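A quick symbolic check before working it by hand (a minimal sympy sketch I'm adding here, assuming $u(t)$ is the unit step so that $U(s) = \frac{1}{s}$):

```python
import sympy as sp

s, t = sp.symbols('s t', positive=True)

G = 1 / (s + 2)         # plant
U = 1 / s               # Laplace transform of the unit step u(t)
Y = sp.apart(G * U, s)  # partial-fraction expansion of Y(s)

print(Y)                                      # 1/(2*s) - 1/(2*(s + 2))
print(sp.inverse_laplace_transform(Y, s, t))  # Heaviside(t)/2 - exp(-2*t)*Heaviside(t)/2
```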
$$
\begin{aligned}
Y(s) &= G(s)\cdot U(s) \rightarrow Y(s)=\frac{1}{s(s+2)} = \frac{A}{s} + \frac{B}{s+2} = \frac{1}{2s} - \frac{1}{2(s+2)} \\
y(t) &= \frac{1}{2}(1-e^{-2t})u(t)
\end{aligned}
$$

## Inverse Laplace transform

$$
\mathcal{L}^{-1} \{ F(s) \} = \frac{1}{2\pi j} \lim_{\omega \to \infty} \int_{\sigma-j\omega}^{\sigma+j\omega} F(s) e^{st} \, ds
$$

## Partial fraction expansion

$$
\begin{aligned}
F(s) &= \frac{N(s)}{D(s)} \\[8pt]
N(s) &: m^{th} \text{ order polynomial in } s \\
D(s) &: n^{th} \text{ order polynomial in } s \\
\end{aligned}
$$

### Decomposition of $\frac{N(s)}{D(s)}$

1. **Divide if improper**: if $\deg N(s) \geq \deg D(s)$, divide so that $\frac{N(s)}{D(s)} = \text{a polynomial } + \frac{N_1(s)}{D(s)}$ with $\deg N_1(s) < \deg D(s)$
2. **Factor Denominator**: into factor form
   $$
   (ps+q)^m \text{ and } (as^2+bs+c)^n
   $$
3. **Linear Factors**: $(ps+q)^m$ such that:
   $$
   \sum_{j=1}^{m}\frac{A_j}{(ps+q)^j}
   $$
4. **Quadratic Factors**: $(as^2+bs+c)^n$ such that
   $$
   \sum_{j=1}^{n}{\frac{B_j s+C_j}{(as^2+bs+c)^j}}
   $$
5. **Determine Unknown**

## Stability analysis using roots of $D(s)$

> [!tip] roots of
>
> roots of $D(s)$ as **poles**

$$
G(s) = \frac{N(s)}{D(s)} = \frac{N(s)}{\prod_{j=1}^{n}(s+p_j)} = \sum_{j=1}^{n}{\frac{A_j}{s+p_j}}
$$

> $p_j$ can be complex

Solving for $g(t)$ gives

$$
g(t) = \sum_{j=1}^{n}{\mathcal{L}^{-1}\{\frac{A_j}{(s+p_j)}\}} = \sum_{j=1}^{n}{A_je^{-p_jt}}
$$

### stability analysis

> [!tip] Important
>
> If $\sigma_i > 0$ then the pole $s = -\sigma_i \pm j\omega_i$ lies in the left half of the $s$-plane, its term decays with time, and the system is ==**stable**==

### Complex root

For poles at $s=-\sigma_i \pm j\omega_i$ we get

$$
\frac{\alpha + j\beta}{s + \sigma_i + j\omega_i} + \frac{\alpha - j\beta}{s + \sigma_i - j\omega_i}
$$

The poles must lie in the LHP for the time function associated with them to be _stable_.

## Impedance of Inductor

$$
Z(s) = \frac{V(s)}{I(s)} = Ls
$$

since the voltage-current relation for an inductor is $v(t) = L\frac{di(t)}{dt}$

## Impedance of Capacitor

$$
Z(s) = \frac{V(s)}{I(s)} = \frac{1}{Cs}
$$

since the voltage-current relation for a capacitor is $v(t) = \frac{1}{C} \int_0^{t}{i(\tau) d\tau}$

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Root-locus-control
tags:
  - sfwr3dx4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Root-locus-control"
title: Root locus control
date: 2024-02-28
---

See also [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Root-locus-control/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/root_locus_control.pdf) and [Root locus](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Root-locus-control/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Root-locus)

closed-loop properties as a function of $K_1 G(s)$

## improving transient response

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Root-locus-control/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/transient-response-root-locus.webp)

> [!question] Question
>
> How to calculate $K$?

- From the magnitude condition: $K$ is the product of distances from the open-loop poles to the point in question, divided by the product of distances from the open-loop zeros to that point

> Second-order poles for the second-order system.

## improving steady state error (SSE)

adding PID (compensator) with an integrator ($\frac{1}{s}$) in the feed-forward path.

### ideal integral compensation

_proportional-plus-integral (PI) controller_ ⇒ drives the steady-state error to zero.
![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Root-locus-control/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/ideal-integral-compensator.webp)

> [!tip] Important
>
> Add a zero at $s=-a$, close to the compensator's pole at the origin

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Root-locus-control/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/zero-add-compensator.webp)

$$
\frac{K}{s}(s+a) = K_p + \frac{K_i}{s}
$$

where $K_p$ is the proportional gain, and $K_i$ is the integral gain.

> [!tip] Implementation
>
> $$
> G_c(s) = K_p + \frac{K_i}{s} = \frac{K_p(s+\frac{K_i}{K_p})}{s}
> $$

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Root-locus-control/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/idea-integral-compensator-impl.webp)

### lag compensation

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Root-locus
tags:
  - sfwr3dx4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Root-locus"
title: Root locus
date: 2024-02-28
---

See also [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Root-locus/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/root_locus.pdf)

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/State-space-representation
tags:
  - sfwr3dx4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/State-space-representation"
title: State space representation
date: 2024-01-24
---

See also [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/State-space-representation/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/state_space.pdf)

> time-domain technique

$$
\begin{align}
\dot{x} &= Ax + Bu \\
y &= Cx + Du
\end{align}
$$

- _Linearly independent_
- _State vector_: $x = [x_{1},x_{2},\ldots, x_{n}]^{T}$

## transfer function to a state space representation

### controller form

Given

$$
G(s) = \frac{\sum_{i=1}^{n-1}b_is^i + b_{0}}{s^n + \sum_{i=1}^{n-1}a_is^{i} + a_{0}} = \frac{Y(s)}{U(s)}
$$

We get _controller canonical state space_ form:

$$
\begin{aligned}
\dot{x}(t) &= \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-2} & -a_{n-1} \end{bmatrix} x(t) + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 0 \\ 1 \end{bmatrix} u(t) \\
y(t) &= \begin{bmatrix} b_0 & b_1 & \cdots & b_{n-2} & b_{n-1} \end{bmatrix} x(t).
\end{aligned}
$$

### observer form

We get _observer canonical state space_ form:

$$
\begin{aligned}
\dot{x}(t) &= \begin{bmatrix} -a_{n-1} & 1 & 0 & \cdots & 0 & 0 \\ -a_{n-2} & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ -a_2 & 0 & 0 & \cdots & 1 & 0 \\ -a_1 & 0 & 0 & \cdots & 0 & 1 \\ -a_0 & 0 & 0 & \cdots & 0 & 0 \end{bmatrix} x(t) + \begin{bmatrix} b_{n-1} \\ b_{n-2} \\ \vdots \\ b_2 \\ b_1 \\ b_0 \end{bmatrix} u(t) \\
y(t) &= \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 \end{bmatrix} x(t).
\end{aligned}
$$

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Time-response
tags:
  - sfwr3dx4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Time-response"
title: Time response
date: 2024-01-31
---

## first order.

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Time-response/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/time-constant.webp)

## second order.

$$
G(s) = \frac{b}{s^2 + as + b}
$$

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Time-response/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/second-order-system.webp)

$$
C(s) = \frac{9}{s(s^2+9s+9)}
$$

### over-damped response.

From inspection of the poles, the form of the system's response is

$$
c(t) = K_1 + K_2e^{-\sigma_1 t} + K_3e^{-\sigma_2 t}
$$

### critically damped response.

The system's response (repeated real pole):

$$
c(t) = K_1 + K_2e^{-\sigma_1 t} + K_3te^{-\sigma_1 t}
$$

where $-\sigma_1=-3$ is our (repeated) pole location.

### under-damped response.

Unit step response to the system:

$$
C(s) = \frac{K_1}{s} + \frac{\alpha + j\beta}{s+1+j\sqrt{8}}+ \frac{\alpha - j\beta}{s+1-j\sqrt{8}}
$$

Thus the form of the system's response:

$$
c(t) = K_1 + e^{-\sigma_dt} \lbrack 2\alpha \cos \omega_d t+ 2\beta \sin \omega_d t \rbrack
$$

$$
e^{-\sigma_dt} \lbrack 2\alpha \cos \omega_d t+ 2\beta \sin \omega_d t \rbrack = K_4 e^{-\sigma_d t} \cos (\omega_dt - \phi)
$$

where $\phi = \tan^{-1}(\frac{\beta}{\alpha})$ and $K_4=\sqrt{(2\alpha)^2 + (2\beta)^2}$

### general second-order systems

- natural frequency $\omega_n$: frequency of oscillation of the system
- damping ratio $\zeta = \frac{\text{exponential decay frequency}}{\text{natural frequency (rad/sec)}}$

_Deriving_ $\zeta$:

- For an _under-damped_ system, the real part of the poles is $\sigma = -\frac{a}{2}$

### %OS (percent overshoot)

$$
\%OS = e^{-\zeta \pi / \sqrt{1-\zeta^2}} \times 100 \%
$$

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a1/content
tags:
  - sfwr3dx4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a1/content"
title: Transfer functions of continuous-time systems
date: 2024-02-09
---

**Problem 1**: Consider the following system:

![assignment-1-circuit](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a1/content/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/assignment-1-circuit.webp)

Let $R_1 = 40\Omega, R_2 = 20\Omega, L = 10mH, C= 1\mu F$. The input is $v_{in}$; the output is $v_{out}$. Give both the transfer function and the state space representation for the system.

_Solution_

The given circuit is a second-order linear system, due to the presence of one inductor ($L$) and one capacitor ($C$).
The transfer function $H(s)$ is the ratio of output to input in the Laplace domain:

$$
H(s) = \frac{V_{out}(s)}{V_{in}(s)}
$$

Given that the impedance of the inductor is $Z_l = sL$ and the impedance of the capacitor is $Z_c = \frac{1}{sC}$, the total impedance of the circuit is given by:

$$
Z_{\text{total}} = \frac{1}{\frac{1}{sL} + sC}
$$

Using the voltage divider rule, the transfer function is given by:

$$
H(s) = \frac{V_{out}(s)}{V_{in}(s)} = \frac{\frac{1}{sC}}{\frac{1}{sL} + \frac{1}{sC}}
$$

---

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a2/content
tags:
  - sfwr3dx4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a2/content"
title: Second-order systems
date: 2024-03-01
---

### problem 1.

Consider the following system:

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a2/content/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a2/a1-system.webp)

> [!question] Question
>
> Using the properties of second-order systems, determine $K_p$ and $K_d$ such that the overshoot is 10 percent and the settling time is 1 second. Confirm that your design meets the requirements by plotting the step response.

Given the percent overshoot $\%OS$ and settling time based on the damping ratio $\zeta$ and natural frequency $\omega_n$:

$$
\begin{align*}
\%{OS} &= e^{\frac{-\zeta\pi}{\sqrt{1-\zeta^2}}} \times 100 \% \\
T_s &= \frac{4}{\zeta\omega_n}
\end{align*}
$$

For 10% overshoot, we can solve for $\zeta$: $\zeta = \frac{-\ln(\%{OS}/100)}{\sqrt{\pi^2 + \ln^2(\%{OS}/100)}} \approx 5.916 \times 10^{-1}$.

For a 1 second settling time, we can solve for $\omega_n$: $\omega_n = \frac{4}{\zeta T_s} \approx 6.76 \text{ rad/s}$.

Given the second-order standard form:

$$
G(s) = \frac{\omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2}
$$

and the transfer function of the PD controller in the given system:

$$
G_c(s) = K_p + K_d s
$$

with plant $\frac{1}{s^2+7s+5}$, the closed-loop characteristic polynomial is $s^2 + (7+K_d)s + (5+K_p)$. Matching it against $s^2 + 2\zeta\omega_n s + \omega_n^2$ gives the equations for $K_p$ and $K_d$:

$$
\begin{align*}
7+K_d &= 2\zeta\omega_n \\
5+K_p &= \omega_n^2
\end{align*}
$$

Thus, $K_p \approx 40.78$ and $K_d \approx 1.0$.
The following is the [code](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a2/p1.py) snippet for generating the graphs and results:

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a2/content/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a2/p1.webp)

```python title="p1.py"
from scipy.optimize import fsolve
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import TransferFunction, step

OS, Ts = 0.10, 1.0
zeta = fsolve(lambda z: np.exp(-z*np.pi/np.sqrt(1-z**2)) - OS, 0.5)[0]
wn = 4 / (zeta * Ts)

# Coefficients from the standard second-order system
a1 = 2 * zeta * wn  # coefficient of s
a0 = wn**2          # constant coefficient

# Equating the coefficients to solve for Kp and Kd
# 7 + Kd = a1 and 5 + Kp = a0
Kd = a1 - 7
Kp = a0 - 5

# Confirm the design by plotting the step response
# First, define the transfer function of the closed-loop system with the calculated Kp and Kd
G = TransferFunction([Kd, Kp], [1, 7+Kd, 5+Kp])

# Now, generate the step response of the system
time = np.linspace(0, 5, 500)
time, response = step(G, T=time)

print(Kp, Kd, zeta, wn)

# Plot the step response
plt.figure(figsize=(10, 6))
plt.plot(time, response)
plt.title('Step Response of the Designed PD Controlled System')
plt.xlabel('Time (seconds)')
plt.ylabel('Output')
plt.grid(True)
plt.show()
```

---

### problem 2.

Consider the following system:

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a2/content/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a2/p2.webp)

> [!question] set a.
>
> If $K_d=K_p=K_i = 1$, is the system stable? (Please determine this by explicitly finding the poles of the closed-loop system and reasoning about stability based on the pole locations.)

Given that $K_d = K_p = K_i = 1$, the PID controller transfer function is:

$$
C(s) = K_p + \frac{K_i}{s} + K_d s = 1 + \frac{1}{s} + s
$$

The open-loop transfer function is given by: $G(s) = C(s) P(s) = (1 + s + \frac{1}{s}) \frac{1}{s^2 + 3s + 1} = \frac{s^2+s+1}{s^3+3s^2+s}$.

Thus the closed-loop transfer function is given by $T(s) = \frac{G(s)}{1 + G(s)} = \frac{s^2 + s + 1}{s^3 + 4s^2 + 2s + 1}$.

We need to solve $s^3 + 4s^2 + 2s + 1 = 0$ to find the poles of the closed-loop system.

```python
import numpy as np

print(np.roots([1, 4, 2, 1]))
```

which yields approximately `[-3.511+0.j -0.244+0.474j -0.244-0.474j]` as poles. Since all the poles have negative real parts, the system is stable.

> [!question] set b.
>
> Fix $K_i = 10$. Using the [Routh-Hurwitz criterion](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a2/content/../../../../../../../../../../thoughts/Routh-Hurwitz-criterion), determine the ranges of $K_p$ and $K_d$ that result in a stable system.
The open-loop transfer function is given by

$$
G(s) = C(s) P(s) = (K_p + \frac{K_i}{s} + K_d s) \frac{1}{s^2 + 3s + 1} = \frac{K_d s^2+K_p s + 10}{s^3+3s^2+s}
$$

The characteristic equation of the closed-loop system is given by $1 + G(s) = 0$:

$$
\begin{align*}
1 + \frac{K_d s^2+K_p s + 10}{s^3+3s^2+s} &= 0 \\
s^3+3s^2+s + K_d s^2 + K_p s + 10 &= 0 \\
s^3 + (3+K_d) s^2 + (K_p + 1)s + 10 &= 0
\end{align*}
$$

Applying the Routh-Hurwitz criterion, we have the following table:

```python
from sympy import symbols, Matrix, simplify

Kd, Kp = symbols('Kd Kp')

# coefficients of s^3 + (3 + Kd) s^2 + (Kp + 1) s + 10
a3, a2, a1, a0 = 1, 3 + Kd, Kp + 1, 10

b1 = simplify((a2 * a1 - a3 * a0) / a2)  # first element of the s^1 row
routh = Matrix([
    [a3, a1],
    [a2, a0],
    [b1, 0],
    [a0, 0],
])
print(routh)
```

which results in the following table:

```prolog
Matrix([[1, Kp + 1], [Kd + 3, 10], [(Kd*Kp + Kd + 3*Kp - 7)/(Kd + 3), 0], [10, 0]])
```

The conditions for stability from the Routh-Hurwitz criterion state that all the elements in the first column of the Routh array must be positive. Thus, we have the following inequalities:

$$
\begin{align*}
K_d + 3 &> 0 \\
(K_d + 3)(K_p + 1) - 10 &> 0
\end{align*}
$$

Solving for $K_d$ and $K_p$ yields the following ranges:

$$
\begin{align*}
K_d &> -3 \\
K_p &> \frac{10}{K_d + 3} - 1
\end{align*}
$$

> [!question] set c.
>
> For the system in the first question, suppose that you want the steady-state error to be $10\%$. What should the values of $K_p$ and $K_d$ be? (Hint: the system is not in the unity gain form that we discussed in detail in lecture, so be careful.)

The open-loop transfer function is given by:

$$
G(s)H(s) = (K_p + K_d s)\frac{1}{s^2+7s+5}
$$

The transfer function for the closed loop is given by:

$$
T(s) = \frac{G(s)H(s)}{1+G(s)H(s)}
$$

From the final value theorem, the steady-state error is given by

$$
\lim_{s\to0}s\cdot R(s) \cdot (1-T(s))
$$

For step input $R(s) =\frac{1}{s}$ we get

$$
SSE = 0.1 = \lim_{s\to0} s \cdot \frac{1}{s} \cdot (1 - \frac{K_p + K_d s}{s^2+7s +5 + K_p + K_d s})
$$

$K_p = \frac{5}{8}$

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a3/A3
tags:
  - sfwr3dx4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a3/A3"
title: Open-loop system
date: 2024-03-18
---

See also [problem](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a3/A3/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a3/assignment3.pdf)

## Problème 1

A unity feedback system has transfer function

$$
G(s) = \frac{K}{s(s^2+4s+13)}
$$

> [!question] 1.a
>
> Plot the root locus for this problem

We need to find the poles and zeros of the open-loop transfer function $G(s)$. The poles are given by the roots of the denominator polynomial:

$$
s(s^2+4s+13) = 0 \implies s = 0, -2 \pm 3j
$$

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a3/A3/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a3/p1a.webp)

And the following code to draw the plot:

```python
import matplotlib.pyplot as plt
import control as ctl

numerator = [1]
denominator = [1, 4, 13, 0]
G_s = ctl.TransferFunction(numerator, denominator)
rlist, klist = ctl.root_locus(G_s, Plot=True, grid=True)

# Show the plot
plt.show()
```

> [!question] 1.b
>
> Find the value of $K$ that gives a damping ratio of 0.2588

K is approximately $10.000$.
The following code is used:

```python
import numpy as np
import control as ctl

G = ctl.TransferFunction([1], [1, 4, 13, 0])
zeta = 0.2588

# Sweep K and stop when the dominant complex poles reach the target damping ratio
K_range = np.linspace(0, 10, 1000)
for K in K_range:
    poles = np.roots(np.polyadd(G.den[0][0], K * G.num[0][0]))
    p = poles[np.abs(poles.imag).argmax()]  # dominant complex pole
    zeta_actual = -np.cos(np.angle(p))
    if np.abs(zeta_actual - zeta) < 0.001:
        break

# Print the gain value
print(f'The gain K for a damping ratio of 0.2588 is approximately: {K:.3f}')
```

> [!question] 1.c
>
> Find the location of the roots for the value of $K$ found in 1.b

With $K=10$, the poles are `[-1.5+2.78388218j -1.5-2.78388218j -1. +0.j ]`. The code is:

```python
K = 10
print(np.roots([1, 4, 13, K]))
```

> [!question] 1.d
>
> Plot the step response of your closed-loop system, along with the step response of an ideal second order system with damping ratio 0.2588 and poles that correspond to the two poles with imaginary parts.

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a3/A3/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a3/p1d.webp)

Here is the code for that:

```python
# Use the imaginary (damped) part of the complex poles as the frequency
wn = np.abs(poles[0].imag)

# Create the closed-loop transfer function with the found gain K
G_cl = ctl.feedback(K * G, 1)

# Create an ideal second-order system with the same damping ratio and natural frequency
G_ideal = ctl.tf([wn**2], [1, 2 * zeta * wn, wn**2])

# Generate time vector for simulation
t = np.linspace(0, 10, 1000)

# Simulate the step response of the closed-loop system and the ideal system
_, y_cl = ctl.step_response(G_cl, t)
_, y_ideal = ctl.step_response(G_ideal, t)

# Plot the step responses
plt.figure()
plt.plot(t, y_cl, label='Closed-loop System')
plt.plot(t, y_ideal, '--', label='Ideal Second-Order System')
plt.xlabel('Time')
plt.ylabel('Output')
plt.title('Step Response Comparison')
plt.legend()
plt.grid()
plt.show()
```

> [!question] 1.e
>
> Find the value of K that leads to a marginally stable system.

The characteristic equation is given by:

$$
1 + G(s) = 0
$$

or

$$
s^3 + 4s^2 + 13s + K = 0
$$

Let's set up the [Routh-Hurwitz table](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a3/A3/../../../../../../../../../../thoughts/Routh-Hurwitz-criterion):

| $s^3$ | 1                    | 13 |
| ----- | -------------------- | -- |
| $s^2$ | 4                    | K  |
| $s^1$ | $b=13 - \frac{K}{4}$ | 0  |
| $s^0$ | K                    | -  |

For marginal stability, the system must have poles on the imaginary axis. This occurs when the first element of any row in the Routh array is zero. Let $b=0$; then $13 - \frac{K}{4} = 0 \implies K = 52$.

Therefore, the system is marginally stable for $K = 52$. This is the critical gain $K_{cr}$. For $K < 52$, all elements in the first column of the Routh array are positive, indicating stability. For $K > 52$, there is a sign change in the first column, indicating instability.

For the frequency of oscillation at marginal stability, solve the characteristic equation for $s$ with $K=52$:

$$
(s^2+13)(s+4) = 0
$$

The imaginary roots are $\pm j\sqrt{13}$, thus the frequency of oscillation is $\sqrt{13} \approx 3.61$ rad/s.

---

## Problème 2

Consider the open-loop system

$$
G(s) = \frac{(s+10)}{(s+1)(s+2)(s+12)}
$$

> [!question] 2.a
>
> Suppose that design specifications are that the $\%OS$ is 20% and the settling time is 1 second. Use the root-locus approach to design a PD controller for this system.
Given $\%OS$ is 20% and the settling time is 1 second, we can find $\zeta$, $\sigma$, $\omega_{n}$ as:

$$
\begin{align*}
\zeta &= -\ln(\%OS/100) / \sqrt{\pi^2 + \ln^2(\%OS/100)} \approx 0.456 \\
\sigma &= \zeta\omega_n = \frac{4}{T_s} = 4 \\
\omega_{n} &= \frac{4}{\zeta T_s} \approx 8.77
\end{align*}
$$

The code to find them:

```python
import numpy as np
import matplotlib.pyplot as plt
import control as ctl

# Desired specifications
OS = 0.20  # 20% overshoot
Ts = 1     # 1 second settling time

# Calculations for desired pole locations
zeta = -np.log(OS) / np.sqrt(np.pi**2 + np.log(OS) ** 2)  # damping ratio
sigma = 4 / Ts                  # magnitude of the real part of the desired poles
wn = sigma / zeta               # natural frequency
wd = wn * np.sqrt(1 - zeta**2)  # imaginary (damped) part of the desired poles
print(zeta, sigma, wn, wd)
```

So the desired dominant poles are:

$$
s_{1,2} = -\zeta \omega_n \pm j\omega_{n}\sqrt{1-\zeta^2} \approx -4 \pm 7.81j
$$

Propose a PD controller $D(s) = K(s+z)$ where $K$ is the gain and $z$ is the zero introduced by the PD controller.

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a3/A3/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a3/p2a.webp)

The code can be found [here](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a3/p2.py)

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a4/A4
tags:
  - sfwr3dx4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a4/A4"
title: Joint-control open-loop system
date: 2024-03-28
---

See also [problem](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a4/A4/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a4/assignment4.pdf)

## Problème 1

A robot arm has a joint-control open-loop transfer function

$$
G(s) = \frac{300(s+100)}{s(s+10)(s+40)}
$$

> [!question] 1.a
>
> Plot the asymptotic approximation of the Bode plot

The code can be found in [p1a.py](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a4/p1a.py)

![bode plot](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a4/A4/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a4/p1a.webp)

> [!question] 1.b
>
> Repeat this with the pole at 0 and the denominator replaced by a pole at -1

The code can be found in [p1b.py](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a4/p1b.py)

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a4/A4/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a4/p1b.webp)

> [!question] 1.c
>
> For the system above, estimate the bandwidth using only the asymptotic approximation

The corner frequencies are at 1, 10, and 40 rad/s, given the poles. Since the dominant pole is at $-1$, the gain starts dropping at $-20$ dB/decade from 1 rad/s. For the magnitude to fall 3 dB, the frequency needs to increase by a factor of $10^{\frac{3}{20}} \approx 1.41$, giving a bandwidth estimate of about 1.41 rad/s.

> [!question] 1.d
>
> Use MATLAB to find the bandwidth of the system in (b). Why is the result different than your answer in c?
The following is the code for finding the bandwidth of the system

```matlab
% Define the transfer function
num = 300 * [1 100];
den = conv([1 1], conv([1 10], [1 40]));
G = tf(num, den);

% Generate frequency vector (in rad/s)
w = logspace(-2, 3, 1000);

% Compute magnitude and phase
[mag, phase, w] = bode(G, w);

% Asymptotic magnitude approximation
asymp_mag = zeros(size(w));
asymp_mag(w < 1) = 300 * 100 / (1 * 10 * 40); % DC gain
asymp_mag(w >= 1 & w < 10) = 300 * 100 ./ (w(w >= 1 & w < 10) * 10 * 40); % -20 dB/dec slope
asymp_mag(w >= 10 & w < 40) = 300 * 100 ./ (w(w >= 10 & w < 40).^2 * 40); % -40 dB/dec slope
asymp_mag(w >= 40) = 300 * 100 ./ (w(w >= 40).^3); % -60 dB/dec slope

% Asymptotic phase approximation
asymp_phase = zeros(size(w));
asymp_phase(w < 0.1) = 0; % 0 deg
asymp_phase(w >= 0.1 & w < 1) = -45; % -45 deg
asymp_phase(w >= 1 & w < 10) = -90; % -90 deg
asymp_phase(w >= 10 & w < 40) = -180; % -180 deg
asymp_phase(w >= 40) = -270; % -270 deg

% Plot Bode diagram
figure;
subplot(2, 1, 1);
loglog(w, squeeze(mag));
hold on;
loglog(w, asymp_mag, '--');
ylabel('Magnitude');
title('Asymptotic Bode Plot');
grid on;

subplot(2, 1, 2);
semilogx(w, squeeze(phase));
hold on;
semilogx(w, asymp_phase, '--');
xlabel('Frequency (rad/s)');
ylabel('Phase (deg)');
grid on;

% Find the bandwidth
mag_db = 20*log10(squeeze(mag));
bandwidth = w(find(mag_db >= -3, 1, 'last'));
fprintf('The bandwidth of the system is %.2f rad/s.\n', bandwidth);
```

The result yields 28.74 rad/s. The difference arises because:

- The actual magnitude plot has smooth transitions around the corner frequencies, which the asymptotic approximation does not capture. This leads to some error in the bandwidth estimate.
- The asymptotic approximation does not account for the effect of the zero at $s=-100$, which causes an increase in the magnitude plot at high frequencies.
- The -3 dB point on the actual magnitude plot occurs at a slightly higher frequency than predicted by the asymptotic approximation.

---

## Problème 2

A system has plant

$$
G(s) = \frac{3s^2+4s-2}{s^3+3s^2+7s+5}
$$

> [!question] Question
>
> Add state variable feedback so that the closed-loop poles are at -4, -4, and -5

The given plant transfer function yields the following state-space representation

$$
A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -5 & -7 & -3 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad C = \begin{bmatrix} -2 & 4 & 3 \end{bmatrix}
$$

The controllability matrix is

$$
M_C = \begin{bmatrix} B & AB & A^2B \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & -3 \\ 1 & -3 & 2 \end{bmatrix}
$$

For poles at $p_1 = -4, p_2=-4, p_3 = -5$ the desired characteristic equation is:

$$
\Delta_D(s) = (s+4)(s+4)(s+5) = (s^2+8s+16)(s+5) = s^3 + 13s^2 + 56s + 80
$$

The feedback gain vector $K$ from Ackermann's formula, $K = e_3^T M_C^{-1} \Delta_D(A)$, where $e_{3}^{T} = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}$ and

$$
\Delta_D(A) = A^3 + 13A^2 + 56A + 80I
$$

yields $K = \begin{bmatrix} 75 & 49 & 10 \end{bmatrix}$

The code is found in [p2.py](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a4/p2.py).
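For reference, here is a minimal sketch of that computation (my own sketch, not the contents of the linked `p2.py`); `ctl.acker` in the python-control package implements Ackermann's formula:

```python
import numpy as np
import control as ctl

# Plant in controller canonical form (from the transfer function above)
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [-5, -7, -3]])
B = np.array([[0], [0], [1]])

# Place the closed-loop poles at -4, -4, -5 via Ackermann's formula
K = ctl.acker(A, B, [-4, -4, -5])
print(K)  # expected: [[75. 49. 10.]]
```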
---

## Problème 3

A system is given by

$$
\dot{x} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 9.8 & 0 \end{bmatrix} x + \begin{bmatrix} 0 \\ 1 \\ 0 \\ -1 \end{bmatrix} u
$$

> [!question] Question
>
> Use state variable feedback to place the closed-loop poles at $s= -2 \pm j$, -5, and -5

The controllability matrix is given by

```matlab
% System matrices
A = [0 1 0 0; 0 0 -1 0; 0 0 0 1; 0 0 9.8 0];
B = [0; 1; 0; -1];
C = eye(4);

% Desired closed-loop poles
p_des = [-2+1j, -2-1j, -5, -5];
p_des_poly = poly(p_des);

CM = ctrb(A, B);
```

which yields

$$
CM = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & -1 & 0 & -9.8 \\ -1 & 0 & -9.8 & 0 \end{bmatrix}
$$

This system is controllable. The desired poles are $s=-2 \pm j, -5, -5$, which yield the characteristic equation:

$$
(s+2+j)(s+2-j)(s+5)^2 = s^4 + 14s^3 + 70s^2 + 150s + 125 = 0
$$

The closed loop with state feedback $u=-Kx$ is $\dot{x}=(A-BK)x$ where $K=\begin{bmatrix} k_1 & k_2 & k_3 & k_4 \end{bmatrix}$. Solving for $K$ from [p3.m](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a4/p3.m) yields

$$
K = \begin{bmatrix} -14.2045 & -17.0455 & -94.0045 & -31.0455 \end{bmatrix}
$$

Therefore the state feedback control is $u = -Kx = \begin{bmatrix} 14.2045 & 17.0455 & 94.0045 & 31.0455 \end{bmatrix} x$.

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a5/A5
tags:
  - sfwr3dx4
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a5/A5"
title: Observer and state-space model
date: 2024-04-10
---

## Problème 1

Consider the following state-space model:

$$
\begin{aligned}
\dot{x} &= \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -5 & -6 & 0 \\ \end{bmatrix} x + \begin{bmatrix} 0 \\ 0 \\ 1 \\ \end{bmatrix} u \\
y &= \begin{bmatrix} 1 & 0 & 0 \\ \end{bmatrix} x
\end{aligned}
$$

Design an observer to place the observer poles at -10, -10, -15

_Solution_

The desired characteristic equation of the observer is given by:

$$
\det(sI - (A - LC)) = (s + 10)(s + 10)(s + 15) = s^3 + 35s^2 + 400s + 1500
$$

Expanding $\det(sI - (A-LC))$ in terms of the gains $L = \begin{bmatrix} l_1 & l_2 & l_3 \end{bmatrix}^T$ gives

$$
\det(sI - (A-LC)) = s^3 + l_1 s^2 + (l_2 + 6)s + (6l_1 + l_3 + 5)
$$

Matching coefficients and solving gives the observer gain matrix:

$$
L = \begin{bmatrix} 35 \\ 394 \\ 1285 \end{bmatrix}
$$

Thus the observer dynamics are given by:

$$
\dot{\hat{x}} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -5 & -6 & 0 \\ \end{bmatrix} \hat{x} + \begin{bmatrix} 0 \\ 0 \\ 1 \\ \end{bmatrix} u + \begin{bmatrix} 35 \\ 394 \\ 1285 \\ \end{bmatrix} (y - \hat{y})
$$

## Problème 2

Given the plant

$$
\begin{aligned}
\dot{x} &= \begin{bmatrix} -1 & 1 \\ 0 & 2 \\ \end{bmatrix} x + \begin{bmatrix} 0 \\ 1 \\ \end{bmatrix} u \\
y &= \begin{bmatrix} 1 & 1 \\ \end{bmatrix} x
\end{aligned}
$$

Design an integral controller to yield a 10% overshoot, 0.5 second settling time and zero steady-state error for a step input.

_Solution_

The code for the integral controller is given by [p2.py](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a5/p2.py).

Add an integrator to the plant to ensure zero steady-state error for a step input. The augmented state-space model becomes:

$$
\dot{x}_a = \begin{bmatrix} -1 & 1 & 0 \\ 0 & 2 & 0 \\ -1 & -1 & 0 \\ \end{bmatrix} x_a + \begin{bmatrix} 0 \\ 1 \\ 0 \\ \end{bmatrix} u
$$

$$
y = \begin{bmatrix} 1 & 1 & 0 \\ \end{bmatrix} x_a
$$

where $x_a = \begin{bmatrix} x \\ \int e \, dt \end{bmatrix}$ and $e = r - y$ is the tracking error.
Then, design the state feedback gains $K = \begin{bmatrix} k_1 & k_2 & k_i \end{bmatrix}$ such that the closed-loop system meets the transient response specifications. The characteristic equation of the closed-loop system is:

$$
\left| sI - (A_a - B_aK) \right| = 0
$$

Expanding this yields:

$$
(s + k_1)(s^2 + (1 - k_2)s + k_i) = 0
$$

The control law is then given by:

$$
u = -Kx_a = -k_1x_1 - k_2x_2 - k_i\int{e \, dt}
$$

The code yields:

```prolog
zeta: 0.5911550337988974
omega_n: 13.53282902556064
Desired poles: [  -8.        +10.91501083j   -8.        -10.91501083j
 -135.32829026 +0.j        ]
Plant model: : sys[2]
Inputs (1): ['u[0]']
Outputs (1): ['y[0]']
States (2): ['x[0]', 'x[1]']

A = [[-1.  1.]
     [ 0.  2.]]

B = [[0.]
     [1.]]

C = [[1. 1.]]

D = [[0.]]

Augmented plant model: : sys[3]
Inputs (1): ['u[0]']
Outputs (1): ['y[0]']
States (3): ['x[0]', 'x[1]', 'x[2]']

A = [[-1.  1.  0.]
     [ 0.  2.  0.]
     [-1. -1.  0.]]

B = [[0.]
     [1.]
     [0.]]

C = [[1. 1. 0.]]

D = [[0.]]

State feedback gains: K = [[-10193.77795361    152.32829026 -12391.83976888]]
Integral controller transfer function:
-1.239e+04
----------
    s

Open-loop transfer function: : sys[6]
Inputs (1): ['u[0]']
Outputs (1): ['y[0]']
States (3): ['sys[4]_x[0]', 'sys[2]_x[0]', 'sys[2]_x[1]']

A = [[-0.00000000e+00  0.00000000e+00  0.00000000e+00]
     [ 0.00000000e+00 -1.00000000e+00  1.00000000e+00]
     [-1.23918398e+04  0.00000000e+00  2.00000000e+00]]

B = [[1.]
     [0.]
     [0.]]

C = [[0. 1. 1.]]

D = [[0.]]

Closed-loop transfer function: : sys[9]
Inputs (1): ['u[0]']
Outputs (1): ['y[0]']
States (3): ['sys[6]_sys[4]_x[0]', 'sys[6]_sys[2]_x[0]', 'sys[6]_sys[2]_x[1]']

A = [[ 0.00000000e+00 -1.00000000e+00 -1.00000000e+00]
     [ 0.00000000e+00 -1.00000000e+00  1.00000000e+00]
     [-1.23918398e+04  0.00000000e+00  2.00000000e+00]]

B = [[1.]
     [0.]
     [0.]]

C = [[0. 1. 1.]]

D = [[0.]]
```

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/index
tags:
  - sfwr3dx4
  - university
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/index"
title: Control System
date: 2024-10-10
---

External source: [theory](https://engineering.purdue.edu/~sundara2/misc/ece380_notes.pdf)

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab1/content
tags:
  - sfwr3dx4
  - lab
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab1/content"
title: PID Controller
date: 2024-01-24
---

See [lab notes](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab1/content/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab1/lab1.pdf)

### prelab.

The general open loop transfer function which models the angular velocity $\omega(t)$ of a motor is:

$$
G_{\omega}(s) = \frac{\omega(s)}{U(s)} = \frac{A}{\tau s + 1}
$$

where $A$ and $\tau$ are positive constants.

> [!question] 1
>
> What is the transfer function of the angular position of a motor $\theta(t)$?

Since angular velocity is the derivative of angular position, we have: $\omega(t) = \frac{d \theta(t)}{dt}$

From Theorem 7 of Table 2.2: $\mathcal{L} \{ \frac{d f(t)}{dt} \} = s F(s) - f(0^-)$.
Assuming the initial angular position is zero, i.e. $\theta(0^-) = 0$: $\mathcal{L}\{\omega(t)\} = \omega(s) = s \Theta(s)$

$$
\begin{align}
\Theta(s) &= \frac{\omega(s)}{s} \\
&= \frac{G_{\omega}(s) \cdot U(s)}{s} \\
&= \frac{A}{s(\tau s + 1)} \cdot U(s)
\end{align}
$$

> The transfer function of the angular position of a motor is $\Theta(s) = \frac{A}{s(\tau s + 1)} \cdot U(s)$

> [!question] 2
>
> What, if any, is the steady state value of $\omega(t)$ in open loop response to a step input:
>
> $$
> u(t) = \begin{cases} U_{\mathcal{o}}, & t \geq 0 \\ 0, & t < 0 \end{cases}
> $$

Using the final value theorem from the Laplace transform, we have:

$$
\lim_{t \to \infty} f(t) = \lim_{s \to 0} s F(s)
$$

The Laplace transform of $u(t)$ is $U(s) = \frac{U_{\mathcal{o}}}{s}$, so the steady-state value of $\omega(t)$ is:

$$
\begin{align*}
&= \lim_{s \to 0} s \cdot G_{\omega}(s) \cdot \frac{U_{\mathcal{o}}}{s} \\
&= \lim_{s \to 0} \frac{A}{\tau s + 1} \cdot U_{\mathcal{o}} \\
&= A \cdot U_{\mathcal{o}}
\end{align*}
$$

---

### lab.

#### 5.1

$$
\begin{align}
\frac{\theta}{V} &= \frac{K}{s((Js+b)(Ls+R) + K^{2})} \\
& = \frac{K}{s(JLs^{2}+bLs + JRs+bR + K^{2})} \\
G(s) & = \frac{K}{JLs^{3} + s^{2}(bL+JR) + (K^2+bR)s}
\end{align}
$$

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab1/content/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab1/5.1-graph.webp)

1. What does the graph represent? What does the first derivative of the graph represent and look like?
   - The angular position of the motor. The first derivative would be the angular velocity, or the rate of change. It would start at zero (as the first part is flat), then keep increasing since the slope is positive.
2. What is represented by the non-linear section?
   - It represents that the system is accelerating
3. Steady-state error
4. percent overshoot
5. settling time of this response
6. is the response stable with respect to angular position?

#### 5.2

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab1/content/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab1/5.2-graph.webp)

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab2/content
tags:
  - sfwr3dx4
  - lab
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab2/content"
title: Empirical Estimation of Transfer Functions for First Order Systems
date: 2024-02-14
---

See [lab notes](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab2/content/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab2/lab2.pdf)

## prelab.

The transfer function of a DC electric motor with respect to angular velocity is:

$$
G_{\omega}(s) = \frac{\Omega(s)}{V(s)} = \frac{A}{\tau s + 1}
$$

Where

- $A$ and $\tau$ are positive, real-valued constants
- $V(s)$ and $\Omega(s)$ are voltage and angular velocity as functions of $s$ in the Laplace domain. Note that $\Omega(s) \coloneqq \mathcal{L}\{\omega(t)\}$

### Q1.

We will now develop a formula for the motor DC gain constant $A$ in terms of a step input change $\Delta V$ and step output change $\Delta \omega$

> [!question] problem a.
>
> Using the final value theorem, find an expression for the steady state value of $\omega(t)$ when a step input of amplitude $V_x$ is applied.
>
> note: $\tau > 0$ so the pole of $G_{\omega}(s)$ at $s=-\frac{1}{\tau}$ is in the open Left Half Plane (LHP) so the system is stable.
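The final-value computation worked through below can also be checked symbolically (a minimal sympy sketch I'm adding, with $A$, $\tau$, $V_x$ declared as positive symbols):

```python
import sympy as sp

s, A, tau, Vx = sp.symbols('s A tau V_x', positive=True)

G = A / (tau * s + 1)             # first-order motor model
Omega = G * Vx / s                # response to the step input V(s) = Vx/s
print(sp.limit(s * Omega, s, 0))  # final value theorem -> A*V_x
```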
Given the final value theorem, the steady-state value of $f(t)$ is:

$$
\lim_{t \to \infty} f(t) = \lim_{s \to 0} s F(s)
$$

When a step input $V(s) = \frac{V_x}{s}$ is applied, the output in the Laplace domain is given by:

$$
\begin{align*}
\Omega(s) = G_{\omega}(s) \cdot V(s) &= \frac{A}{\tau s + 1} \cdot \frac{V_x}{s} \\
\lim_{t \to \infty} \omega(t) = \lim_{s \to 0} s \Omega(s) &= \lim_{s \to 0} \left(\frac{A \cdot V_x}{\tau s + 1}\right)
\end{align*}
$$

Since $\tau > 0$, as $s \to 0$:

$$
\lim_{t \to \infty} \omega(t) = \frac{A \cdot V_x}{\tau \cdot 0 + 1} = A \cdot V_x
$$

> The steady-state value of $\omega(t)$ in response to a step of amplitude $V_x$ is the product of the DC gain $A$ and the step amplitude $V_x$

> [!question] problem b.
>
> Give the expression for $\omega (t)$ in response to a step input $V_x$ at time $t=0$. Assume a non-zero initial condition for $\omega (t)$, i.e., $\omega(t) = \omega_0$.
>
> note: the response due to a non-zero initial condition $\omega_0$ can be modeled as the response due to input $v(t) = \omega_0 \delta(t)$ where $\delta(t)$ is the impulse function.

Since $G_{\omega}(s)$ is a linear system, the response to step input $V_x$ with non-zero initial condition $\omega_0$ is just the sum of the responses due to the step and the initial condition.

The zero-state response to step input $V_x$ is given by the inverse Laplace transform of $G_{\omega}(s) \cdot V(s)$:

$$
\omega_{zs}(t) = AV_x \cdot (1 - e^{-\frac{t}{\tau}})
$$

The zero-input response to initial condition $\omega_0$ can be modeled as the response to an impulse input. The Laplace transform of the zero-input response is:

$$
\Omega_{zi}(s) = \frac{\tau \omega_0}{\tau s + 1}
$$

for which the zero-input response is given by: $\omega_{zi}(t) = \omega_0 \cdot e^{-\frac{t}{\tau}}$

The total response $\omega(t)$ is the sum of the zero-state and zero-input responses (due to linearity):

$$
\omega(t) = \omega_{zs}(t) + \omega_{zi}(t) = AV_x \cdot (1 - e^{-\frac{t}{\tau}}) + \omega_0 \cdot e^{-\frac{t}{\tau}}
$$

> [!question] problem c.
>
> For $\omega(t)$ as computed in part b, what is $\lim_{t \to \infty} \omega(t)$? How does it compare to the result in part a?

$$
\lim_{t \to \infty} \omega(t) = \lim_{t \to \infty} (AV_x \cdot (1 - e^{-\frac{t}{\tau}}) + \omega_0 \cdot e^{-\frac{t}{\tau}})
$$

Since $e^{-\frac{t}{\tau}} \to 0$ as $t \to \infty$, the steady-state value of $\omega(t)$ is:

$$
\lim_{t \to \infty} \omega(t) = AV_x (1-0) + 0 = AV_x
$$

> We see that the steady-state value of $\omega(t)$ is the same. The initial condition $\omega_0$ does not affect the steady-state value of $\omega(t)$; it only influences the transient response.

> [!question] problem d.
>
> Now assume that you run the motor with an initial step input of $V_{\text{min}}$ until time $t_0$. At time $t_0$, assume that the system has reached steady state and the step input is changed to $V_{\text{max}}$ at time $t_0$. In other words, the system input will take the form
>
> $$
> v(t) = \begin{cases} v_{\text{min}} & \text{if } 0 \leq t < t_0 \\ v_{\text{max}} & \text{if } t \geq t_0 \end{cases}
> $$
>
> where $t_0 \gg \tau$ and $V_{\text{max}}$ and $V_{\text{min}}$ may be non-zero.
>
> Use the results above to show that:
>
> $$
> A = \frac{\Delta \omega}{\Delta V}
> $$
>
> where $\Delta V = V_{\text{max}} - V_{\text{min}}$ and $\Delta \omega = \omega_{ss} - \omega_0$, where $\omega_0$ is the steady-state response to a constant input $V_\text{min}$ and $\omega_{ss}$ is the steady-state response to the input $V_\text{max}$

For input $V_{\text{min}}$, the steady-state response is $\omega_0 = A \cdot V_{\text{min}}$

Similarly, for input $V_{\text{max}}$, the steady-state response is $\omega_{ss} = A \cdot V_{\text{max}}$

Thus, the change in steady-state response is

$$
\Delta \omega = \omega_{ss} - \omega_0 = A \cdot V_{\text{max}} - A \cdot V_{\text{min}} = A \cdot \Delta V
$$

Thus $A = \frac{\Delta \omega}{\Delta V}$

### Q2.

Using the formula derived in Q1, use the following graphs to calculate A for this system.

Given $A = \frac{\Delta \omega}{\Delta V}$: from the graph, $V_{\text{min}} = 1$, $V_{\text{max}} = 5$, $\omega_0 = 5$, and $\omega_{\text{ss}} = 25$, so $A = \frac{25-5}{5-1} = 5$

### Q3.

For a first order system, the time it takes a step response to reach 63.2% of its steady state value ($t_1 − t_0$ in Fig. 1) is the response's time constant $\tau$, i.e., at time $t_1, \omega(t_1) = 0.632\Delta \omega + \omega_0$. Find the time constant $\tau$ for the above system.

$$
\omega(t_1) = 0.632\Delta \omega + \omega_0 = 0.632 \cdot (25 - 5) + 5 = 17.64
$$

From the graph, $\tau \approx 0.8 \sec$

### Q4.

Using $A$ and $\tau$ calculated in Q2 and Q3, find the transfer function in terms of $s$

$$
G_{\omega}(s) = \frac{\Omega(s)}{V(s)} = \frac{A}{\tau s + 1} = \frac{5}{0.8s + 1}
$$

### Q5.

The system quickly rises to a steady-state value without any oscillation, which suggests a first-order system. The transfer function:

$$
G(s) = \frac{K}{\tau s + 1}
$$

### Q6.

Deriving a transfer function experimentally and then using simulation software to design is preferable in situations where experimenting directly with the plant poses high risks, incurs excessive costs, or requires downtime. For example, in a chemical processing plant we probably don't want to experiment with the actual system, since that could lead to dangerous chemical reactions or wasted materials. Using simulation, engineers can safely and cost-effectively design and test control strategies before implementing them in the real system.

Conversely, deriving the transfer function experimentally and then using simulation software to design is not preferable in situations where the plant is simple and safe to experiment with, and the cost of experimenting is low. For instance, for a small educational laboratory setup with low-cost components and minimal hazard concerns, it might be more practical and educational to design and calibrate the controller directly through experimentation and observe behaviours in real time.

---

## lab.
- [ ] Check to change the TCP address of the model URI from `QUARC > Settings > Preferences > Model`

$$
G(s) = \frac{1.867468}{0.027947\,s + 1}
$$

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab3/content
tags:
  - sfwr3dx4
  - lab
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab3/content"
title: Voltage-controlled electromechanical systems
date: 2024-02-28
---

See [lab notes](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab3/content/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab3/lab3.pdf) and [prelab.](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab3/content/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab3/prelab)

> [!question] Question
>
> Do you notice a pattern in the DC gain values you calculated? If so, what is that pattern?

All DC gains are roughly constant around $A \approx 98.76$, and the steady-state output is linear with respect to the input voltage.

> [!question] Question
>
> Why do you think this is happening?

> [!question] Question
>
> What does this say about our model of the DC motor?

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab3/prelab
tags:
  - sfwr3dx4
  - lab
description: resconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab3/prelab"
title: Root mean square
date: 2024-03-05
---

See also: [pdf](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab3/prelab/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab3/prelab.pdf)

### problème 1.

> The Root Mean Square (RMS) value of a signal $f(t)$ that is periodic with period $T$ is given by the equation $\sqrt{\frac{1}{T} \int_0^T{(f(t))^2dt}}$

It can be shown that the RMS value of $u(t) = B \sin{\omega t}$ is $\frac{B}{\sqrt{2}}$

> [!question] 1.a
>
> Square wave

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab3/prelab/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/Square-wave-signal.webp)

The square wave function is defined as:

$$
f(t) = \begin{cases} 1 & \text{if } 0 \leq t < \frac{T}{2} \\ 0 & \text{if } \frac{T}{2} \leq t < T \end{cases}
$$

```python
import sympy as sp

t = sp.symbols('t')
T = 2
RMS = sp.sqrt(1/T * sp.integrate(1, (t, 0, T/2)))
```

> RMS = $\frac{1}{\sqrt{2}}$

> [!question] 1.b
>
> Sawtooth wave

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab3/prelab/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/Saw-tooth-signal.webp)

A sawtooth wave function is defined as:

$$
f(t) = \frac{2A}{T}(t - \frac{T}{2})
$$

```python
import sympy as sp

t = sp.symbols('t')
T = 1
A = 0.5
f_t = 2 * A / T * (t - T/2)
RMS = sp.sqrt(1/T * sp.integrate(f_t**2, (t, 0, T)))
```

> RMS = $\frac{\sqrt{3}}{6}$

> [!question] 1.c
>
> sine wave

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab3/prelab/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/sine-wave-signals.webp)

A general form of the sine wave can be written as

$$
f(t) = A \sin(\omega t + \phi)
$$

The amplitude is 2.3, with no phase shift

> RMS = $\frac{2.3}{\sqrt{2}}$

---

### problème 2.

Find the cutoff frequency of the following low-pass filters.
---

### problème 2.

Find the cutoff frequency of the following low-pass filters.

The cutoff frequency of a low-pass filter is the frequency at which the amplitude falls to $\frac{1}{\sqrt{2}} \approx 0.707$ of its passband value.

> [!question] 2.a
>
> ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab3/prelab/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/bode-plot-2a.webp)

> 0.05 Hz

> [!question] 2.b
>
> ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab3/prelab/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/bode-plot-2.webp)

> approx. $1.1 \times 10^{5}$ Hz

> [!question] 2.c
>
> ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab3/prelab/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/bode-plot-3.webp)

> approx. 1.1 Hz

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab4/prelab
tags:
  - sfwr3dx4
description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab4/prelab"
title: Root locus and graphical analysis
date: 2024-03-20
---

See also [problem](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab4/prelab/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab4/lab4-prelab.pdf)

## Problème 1

> [!question] 1.a
>
> What does a root locus plot depict?

A root locus plot depicts the locations of the closed-loop poles of a system in the complex $s$-plane as a function of a gain parameter, commonly the controller gain $K$:

- it represents how the roots (poles) of the closed-loop characteristic equation move in the complex plane as $K$ is varied from $0 \to \infty$
- the root locus starts at the open-loop poles when $K=0$ and ends at the open-loop zeros (finite or at infinity) as $K \to \infty$
- its shape determines the stability and transient response characteristics of the closed-loop system
- points on the root locus satisfy the angle condition and the magnitude condition in relation to the open-loop transfer function.

> [!question] 1.b
>
> What must be done to a transfer function before its root locus can be graphed?

1. Find the open-loop poles and zeros of $G(s)H(s)$ (the closed-loop characteristic equation is $1+KG(s)H(s)=0$). The poles are the roots of the denominator polynomial, and the zeros are the roots of the numerator polynomial.
2. Determine the number of branches of the root locus, which equals the number of open-loop poles; the number of branches that go to infinity equals the number of poles minus the number of zeros.
3. Check for root locus existence on the real axis.
4. Determine breakaway and break-in points, where the root locus departs from and arrives on the real axis, by solving $\frac{dK}{ds} = 0$, where $K$ is the open-loop gain.
5. Calculate the asymptote centroid and angles. The centroid is the center of gravity of the poles and zeros; the asymptote angles are given by $(2q+1)\frac{180^\circ}{P-Z}$ where $q=0,1,2,\dots$
6. Determine the angle of departure and arrival at complex poles and zeros using the angle condition.

> [!question] 1.c
>
> What is the significance of the gain $K$?

$K$ represents the variable loop gain in a feedback control system. Since the root locus starts at the open-loop poles when $K=0$ and ends at the open-loop zeros as $K \to \infty$, $K$ determines the trajectory of the closed-loop poles. The stability and transient response characteristics of the closed-loop system depend on the pole locations, which are determined by $K$. For example:

- If poles are in the right-half plane for a certain $K$, the system is unstable.
- Poles further from the origin (higher $K$) give faster response.
- Poles with larger imaginary parts (higher $K$) produce more oscillation.

Finally, $K$ can be selected to achieve target specifications such as damping ratio and settling time, shaping the system response via gain tuning.

> [!question] 1.d
>
> How can a root locus plot be used to design a controller?

1. **Selecting the gain $K$**: the root locus shows the trajectories of the closed-loop poles as $K$ varies. By selecting $K$, the desired pole locations can be achieved to meet the desired transient response characteristics.
2. **Assessing stability**: the root locus allows determining the range of $K$ for which the closed-loop system is stable. The system is stable if all poles lie in the left-half plane. Segments of the real axis to the left of an odd number of poles and zeros are part of the root locus.
3. **Adding poles and zeros**: if the original root locus does not pass through the desired closed-loop pole locations, poles and zeros can be added via the controller to reshape the root locus (lead compensators add zeros and lag compensators add poles).
4. **Meeting specifications**: lines of constant damping ratio $\zeta$ and natural frequency $\omega_n$ can be drawn on the root locus to meet the desired transient response characteristics.
5. **Improving steady-state error**: adding poles at the origin, or close to it, with PI or lag controllers increases the system type and reduces steady-state error.

> [!question] 1.e
>
> Imagine we have a partially finished root locus plot where only the pole and zero locations have been plotted. What are the rules for completing the root locus plot using pencil and paper?

1. Number of branches:
   - The number of branches of the root locus equals the number of open-loop poles; $P-Z$ of them end at zeros at infinity.
   - Branches start at the poles and end at the zeros (finite or infinite).
2. Symmetry:
   - The root locus is symmetrical about the real axis.
3. Real axis segments:
   - Portions of the real axis are part of the root locus if the number of real poles and zeros to the right is odd.
4. Asymptotes as $K \to \infty$:
   - Asymptotes intersect at the centroid of the poles and zeros, and the angles are given by $(2q+1)\frac{180^\circ}{P-Z}$ where $q=0,1,2,\dots$
5. Breakaway and break-in points:
   - Breakaway and break-in points, where the locus departs from or arrives on the real axis, are found by solving $\frac{dK}{ds} = 0$.

---

## Problème 2

For each of the following transfer functions, sketch a root locus plot using the pencil-and-paper method you outlined above:

> [!question] 2.a
>
> $$
> G(s) = \frac{1}{(s+5)(s+9)}
> $$

poles: $s=-5, -9$, $n=2$; zeros: at $\infty$, $m=0$

branches: 2 ($n>m$)

asymptotes: $\theta = (2q+1)\frac{180^\circ}{n-m} = 90^\circ, 270^\circ$ for $q=0,1$

centroid: $\sigma = \frac{-5-9}{2} = -7$

root locus on real axis: exists between $-5$ and $-9$

angles of departure/arrival: not applicable since there are no complex poles or zeros; by symmetry the breakaway point is at $s=-7$

locus is symmetrical about the real axis

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab4/prelab/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab4/p2a.webp)

> [!question] 2.b
>
> $$
> G(s) = \frac{(s-4)(s-7)}{(s+2)(s+5)(s+12)}
> $$

poles: $s=-2, -5, -12$, $n=3$; zeros: $s=4, 7$, $m=2$

branches: 3 ($n>m$)

asymptotes: $\theta = (2q+1)\frac{180^\circ}{n-m} = 180^\circ$ for $q=0$

centroid: $\sigma = \frac{(-2-5-12)-(4+7)}{3-2} = -30$

root locus on real axis: between $4$ and $7$, between $-2$ and $-5$, and from $-12$ to $-\infty$.
breakaway/break-in points: solving $\frac{dK}{ds}=0$ gives two points, a break-in at $s \approx 5.18$ and a breakaway at $s \approx -3.13$

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab4/prelab/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab4/p2b.webp)

> [!question] 2.c
>
> $$
> G(s) = \frac{(s+7)}{(s+8)(s+9)(s+3)^2}
> $$

poles: $s=-8, -9, -3, -3$, $n=4$; zeros: $s=-7$, $m=1$

branches: 4 ($n>m$)

asymptotes: $\theta = (2q+1)\frac{180^\circ}{n-m} = 60^\circ, 180^\circ, 300^\circ$ for $q=0,1,2$

centroid: $\sigma = \frac{(-8-9-3-3)-(-7)}{4-1} = -\frac{16}{3} \approx -5.33$

root locus on real axis: at the double pole $s=-3$, between $-7$ and $-8$, and from $-9$ to $-\infty$.

breakaway/break-in points: the double pole at $s=-3$ satisfies $\frac{dK}{ds}=0$ and is a breakaway point

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab4/prelab/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab4/p2c.webp)

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab5/prelab
tags:
  - sfwr3dx4
description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab5/prelab"
title: Steady state error and PID controller
date: 2024-04-03
---

See also [problem](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab5/prelab/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/lab5/lab5-prelab.pdf)

## Problème 1

In Lab 4, we used a PD compensator to control our ball and beam apparatus. The transfer function of our PD compensator was as follows:

$$
G_C(s) = K_Ds + K_P
$$

However, we did not use the compensator in this form. The transfer function we used in lab was as follows:

$$
G_C(s) = K_C(s+z)
$$

> [!question] Question
>
> Solve for $K_C$ and $z$ in terms of $K_P$ and $K_D$.
Setting the two forms equal,

$$
\begin{align*}
G_C(s) &= K_C(s+z) \\
G_C(s) &= K_Ds + K_P
\end{align*}
$$

which can be rewritten as:

$$
(K_C - K_D)s + K_Cz - K_P = 0
$$

For this to hold for all $s$, the coefficient of $s$ and the constant term must each be zero:

$$
\begin{align*}
K_C - K_D &= 0 \\
K_Cz - K_P &= 0
\end{align*}
$$

Thus, we can solve for $K_C$ and $z$ as follows:

$$
\begin{align*}
K_C &= K_D \\
z &= \frac{K_P}{K_C} = \frac{K_P}{K_D}
\end{align*}
$$

## Problème 2

Given that the transfer function of our Ball and Beam plant used in the previous lab is as follows:

$$
G(s) = \frac{0.419}{s^2}
$$

And given that the controller is applied to the plant in cascade configuration, find:

> [!question] 2.a
>
> Static error constant for position (position constant)

This is a Type 2 system, thus the position constant is $K_p = \lim_{s\to 0} G(s) = \infty$

> [!question] 2.b
>
> Static error constant for velocity (velocity constant)

Velocity constant $K_v = \lim_{s\to 0} sG(s) = \lim_{s\to 0} \frac{0.419}{s} = \infty$

> [!question] 2.c
>
> Static error constant for acceleration (acceleration constant)

Acceleration constant $K_a = \lim_{s\to 0} s^2G(s) = \lim_{s\to 0} \frac{0.419s^2}{s^2} = 0.419$

> [!question] 2.d
>
> Steady-state error for a step input $u(t)$

For a step input $R(s) = \frac{1}{s}$, the steady-state error is given by:

$$
e_{ss} = \lim_{s\to 0} \frac{sR(s)}{1+G(s)} = \frac{1}{1+K_p} = 0
$$

> [!question] 2.e
>
> Steady-state error for a ramp input $tu(t)$

For a ramp input $R(s) = \frac{1}{s^2}$, the steady-state error is given by:

$$
e_{ss} = \lim_{s\to 0} \frac{sR(s)}{1+G(s)} = \frac{1}{K_v} = 0
$$

since $K_v = \infty$ for a Type 2 system.

> [!question] 2.f
>
> Steady-state error for a parabolic input $t^2u(t)$

For a parabolic input $R(s) = \frac{1}{s^3}$, the steady-state error is given by:

$$
e_{ss} = \lim_{s\to 0} \frac{sR(s)}{1+G(s)} = \frac{1}{K_a} = \frac{1}{0.419} \approx 2.39
$$
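These limits are quick to verify symbolically; a minimal sketch with sympy:

```python
import sympy as sp

s = sp.symbols('s', positive=True)
G = sp.Rational(419, 1000) / s**2

print(sp.limit(G, s, 0, '+'))         # oo    (position constant K_p)
print(sp.limit(s * G, s, 0, '+'))     # oo    (velocity constant K_v)
print(sp.limit(s**2 * G, s, 0, '+'))  # 419/1000 (acceleration constant K_a)

# steady-state errors e_ss = lim s*R(s)/(1 + G(s)) for step, ramp, parabola
for name, R in [('step', 1/s), ('ramp', 1/s**2), ('parabola', 1/s**3)]:
    print(name, sp.limit(s * R / (1 + G), s, 0, '+'))
# step 0, ramp 0, parabola 1000/419 ≈ 2.39
```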
## Problème 3

We will be augmenting our controller to include an integrator. The transfer function of our new PID compensator will be as follows:

$$
G_C(s) = K_Ds + K_P + \frac{K_I}{s}
$$

Given that the transfer function for our plant has not changed, and given that this controller is also applied to the plant in cascade configuration.

The closed-loop transfer function is

$$
\frac{Y(s)}{R(s)} = \frac{G(s)G_C(s)}{1+G(s)G_C(s)} = \frac{\frac{0.419}{s^2} \cdot (K_Ds + K_P + \frac{K_I}{s})}{1+\frac{0.419}{s^2} \cdot (K_Ds + K_P + \frac{K_I}{s})} = \frac{0.419(K_Ds^2 + K_Ps + K_I)}{s^3 + 0.419(K_Ds^2 + K_Ps + K_I)}
$$

Note that the loop transfer function $G_C(s)G(s) = \frac{0.419(K_Ds^2 + K_Ps + K_I)}{s^3}$ now has three integrators, i.e. the system is Type 3.

> [!question] 3.a
>
> Static error constant for position (position constant)

$$
K_p = \lim_{s\to 0} G_C(s)G(s) = \lim_{s\to 0} \frac{0.419(K_Ds^2+K_Ps+K_I)}{s^3} = \infty
$$

> [!question] 3.b
>
> Static error constant for velocity (velocity constant)

$$
K_v = \lim_{s\to 0} sG_C(s)G(s) = \lim_{s\to 0} \frac{0.419(K_Ds^2+K_Ps+K_I)}{s^2} = \infty
$$

since the $\frac{0.419K_I}{s^2}$ term diverges as $s \to 0$.

> [!question] 3.c
>
> Static error constant for acceleration (acceleration constant)

$$
K_a = \lim_{s\to 0} s^2G_C(s)G(s) = \lim_{s\to 0} \frac{0.419(K_Ds^2+K_Ps+K_I)}{s} = \infty
$$

> [!question] 3.d
>
> Steady-state error for a step input $u(t)$

For a step input $R(s) = \frac{1}{s}$, the steady-state error is given by:

$$
e_{ss} = \lim_{s\to 0} \frac{sR(s)}{1+G_C(s)G(s)} = \frac{1}{1+K_p} = 0
$$

> [!question] 3.e
>
> Steady-state error for a ramp input $tu(t)$

For a ramp input $R(s) = \frac{1}{s^2}$, the steady-state error is given by:

$$
e_{ss} = \lim_{s\to 0} \frac{sR(s)}{1+G_C(s)G(s)} = \frac{1}{K_v} = 0
$$

> [!question] 3.f
>
> Steady-state error for a parabolic input $t^2u(t)$

For a parabolic input $R(s) = \frac{1}{s^3}$, the steady-state error is given by:

$$
e_{ss} = \lim_{s\to 0} \frac{sR(s)}{1+G_C(s)G(s)} = \frac{1}{K_a} = 0
$$
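The Type 3 result can be checked the same way with symbolic gains; a minimal sketch:

```python
import sympy as sp

s = sp.symbols('s', positive=True)
KD, KP, KI = sp.symbols('K_D K_P K_I', positive=True)
L = sp.Rational(419, 1000) / s**2 * (KD * s + KP + KI / s)  # G_C(s) * G(s)

print(sp.limit(L, s, 0, '+'))         # oo (position constant)
print(sp.limit(s * L, s, 0, '+'))     # oo (velocity constant)
print(sp.limit(s**2 * L, s, 0, '+'))  # oo (acceleration constant)
for R in (1/s, 1/s**2, 1/s**3):       # step, ramp, parabola
    print(sp.limit(s * R / (1 + L), s, 0, '+'))  # all 0
```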
## Problème 4

Ideally you want your controller design to reject a step disturbance input at $D(s)$. This means that in the steady state for $D(s) = \frac{1}{s}$, the output $Y(s)$ is unchanged.

> [!question] 4.a
>
> Ignoring the input $R(s)$, what is the transfer function $\frac{E(s)}{D(s)}$ in terms of $G_1(s)$ and $G_2(s)$?

With $R(s) = 0$ and the disturbance entering at the plant input, $Y(s) = G_2(s)\left(D(s) + G_1(s)E(s)\right)$ and $E(s) = -Y(s)$, so:

$$
\frac{E(s)}{D(s)}=\frac{-G_2(s)}{1+G_1(s)G_2(s)}
$$

> [!question] 4.b
>
> For $G_1(s) = K_C(s+z)$ and $G_2(s) = \frac{0.419}{s^2}$ what is the steady state error resulting from step inputs $R(s) = \frac{A}{s}$ and $D(s) = \frac{B}{s}$

With $L(s) = G_1(s)G_2(s) = \frac{0.419K_C(s+z)}{s^2}$, the steady-state error due to the step reference $R(s) = \frac{A}{s}$ is zero, since the loop is Type 2:

$$
e_{ss}(R) = \lim_{s\to 0} \frac{sR(s)}{1+L(s)} = \lim_{s\to 0} \frac{As^2}{s^2+0.419K_C(s+z)} = 0
$$

The steady-state error due to the step disturbance $D(s) = \frac{B}{s}$ is:

$$
e_{ss}(D) = \lim_{s\to 0} s \cdot \frac{B}{s} \cdot \frac{-G_2(s)}{1+L(s)} = \lim_{s\to 0} \frac{-0.419B}{s^2+0.419K_C(s+z)} = \frac{-B}{K_C z}
$$

Thus, the total steady-state error is $e_{ss} = e_{ss}(R) + e_{ss}(D) = \frac{-B}{K_C z} = \frac{-B}{K_P}$: the PD controller does not reject the step disturbance.

> [!question] 4.c
>
> For $G_1(s) = K_Ds + K_P + \frac{K_I}{s}$ and $G_2(s) = \frac{0.419}{s^2}$ what is the steady state error resulting from step inputs $R(s) = \frac{A}{s}$ and $D(s) = \frac{B}{s}$

$$
L(s) = G_1(s)G_2(s) = \frac{0.419(K_Ds + K_P + \frac{K_I}{s})}{s^2} = \frac{0.419(K_Ds^2 + K_Ps + K_I)}{s^3}
$$

The steady-state error due to the step reference $R(s) = \frac{A}{s}$ is zero:

$$
e_{ss}(R) = \lim_{s\to 0} \frac{sR(s)}{1+L(s)} = \lim_{s\to 0} \frac{As^3}{s^3+0.419(K_Ds^2+K_Ps+K_I)} = 0
$$

The integrator ahead of the disturbance entry point also rejects the step disturbance:

$$
e_{ss}(D) = \lim_{s\to 0} s \cdot \frac{B}{s} \cdot \frac{-G_2(s)}{1+L(s)} = \lim_{s\to 0} \frac{-0.419Bs}{s^3+0.419(K_Ds^2+K_Ps+K_I)} = 0
$$

so the total steady-state error is $e_{ss} = 0$.

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous
tags:
  - sfwr3dx4
description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous"
title: Tout ce qu'il faut savoir sur la conception des systèmes de contrôle
date: 2024-02-17
---

See also [source for code](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/code/tests.py) and [jupyter notebook](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/code/midterm)

Book: [ISBN: 978-1-119-47422-7](https://www.wiley.com/en-us/Control+Systems+Engineering%2C+8th+Edition-p-9781119474227) and [pdf](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Norman-S.-Nise---Control-System-Engineering-Wiley-\(2019\).pdf)

> [!note] Note
>
> `sp.Heaviside(t)` is $u(t)$

> [!tip] snippets
>
> ```python
> import sympy as sp
> from sympy import symbols, apart, inverse_laplace_transform, simplify
> from sympy.abc import s, t
> ```

## [Frequency domain](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/frequency_domain.pdf)

See [notes](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Frequency-Domain)

> [!tip] Common Laplace transform
>
> ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/laplace-transform-table.webp)

> [!tip] Laplace Theorem
>
> ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/laplace-theorem.webp)

### Transfer function

$n^{th}$ order _linear, time-invariant_ (LTI) differential equation:
$$
a_n \frac{d^n c(t)}{dt^n} + a_{n-1} \frac{d^{n-1} c(t)}{dt^{n-1}} + \cdots + a_0 c(t) = b_m \frac{d^m r(t)}{dt^m} + b_{m-1} \frac{d^{m-1} r(t)}{dt^{m-1}} + \cdots + b_0 r(t)
$$

_taking the Laplace transform of both sides_

$$
\begin{aligned}
& a_n s^n C(s) + a_{n-1} s^{n-1} C(s) + \cdots + a_0 C(s) + \text{ initial-condition terms for } c(t) \\
& = b_m s^m R(s) + b_{m-1} s^{m-1} R(s) + \cdots + b_0 R(s) + \text{ initial-condition terms for } r(t) \\
\end{aligned}
$$

_assuming initial conditions are zero_

$$
\begin{aligned}
(a_n s^n + a_{n-1} s^{n-1} + \cdots + a_0)C(s) &= (b_m s^m + b_{m-1} s^{m-1} + \cdots + b_0)R(s) \\[8pt]
\frac{C(s)}{R(s)} &= G(s) = \frac{b_m s^m + b_{m-1} s^{m-1} + \cdots + b_0}{a_n s^n + a_{n-1} s^{n-1} + \cdots + a_0}
\end{aligned}
$$

> [!tip] Transfer function
>
> $$
> G(s)=\frac{C(s)}{R(s)}
> $$

Q: $G(s) = \frac{1}{s+2}$. Input: $u(t)$. What is $y(t)$?

$$
\begin{aligned}
Y(s) &= G(s)\cdot U(s) \rightarrow Y(s)=\frac{1}{s(s+2)} = \frac{A}{s} + \frac{B}{s+2} = \frac{1}{2s} - \frac{1}{2(s+2)} \\
y(t) &= \frac{1}{2}(1-e^{-2t})u(t)
\end{aligned}
$$

[Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Frequency-Domain#transfer-function)

A transfer function with feedback has the form

$$
\frac{G(s)}{1+G(s)H(s)}
$$

### Equivalent Resistance and Impedance

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/electrical-system-equivalence.webp)

## [Block Diagrams](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Block-Diagrams)

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/block-diagram-algebra.webp)

## [State space representation](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/State-space-representation)

$$
\begin{align*}
\dot{x} &= Ax + Bu \\
y &= Cx + Du
\end{align*}
$$

### controller form

Given

$$
G(s) = \frac{\sum_{i=1}^{n-1}b_is^i + b_{0}}{s^n + \sum_{i=1}^{n-1}a_is^{i} + a_{0}} = \frac{Y(s)}{U(s)}
$$

We get _controller canonical state space_ form:

$$
\begin{aligned}
\dot{x}(t) &= \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-2} & -a_{n-1} \end{bmatrix} x(t) + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 0 \\ 1 \end{bmatrix} u(t) \\
y(t) &= \begin{bmatrix} b_0 & b_1 & \cdots & b_{n-2} & b_{n-1} \end{bmatrix} x(t).
\end{aligned}
$$
[Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/State-space-representation#controller-form)

### observer form

We get _observer canonical state space_ form:

$$
\begin{aligned}
\dot{x}(t) &= \begin{bmatrix} -a_{n-1} & 1 & 0 & \cdots & 0 & 0 \\ -a_{n-2} & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ -a_2 & 0 & 0 & \cdots & 1 & 0 \\ -a_1 & 0 & 0 & \cdots & 0 & 1 \\ -a_0 & 0 & 0 & \cdots & 0 & 0 \end{bmatrix} x(t) + \begin{bmatrix} b_{n-1} \\ b_{n-2} \\ \vdots \\ b_2 \\ b_1 \\ b_0 \end{bmatrix} u(t) \\
y(t) &= \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 \end{bmatrix} x(t).
\end{aligned}
$$

[Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/State-space-representation#observer-form)

## [stability](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/stability)

See [this](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/a2/content) for applications

### Necessary and sufficient condition for stability

> A necessary condition to have all roots in the open left half-plane is that all coefficients of the polynomial are present and have the same sign.

[Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/stability#necessary-and-sufficient-condition-for-stability)

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/stability-comparison.webp)

[Routh table](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/stability#routh-table)

---

## [Time response](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Time-response)

> [!tip] Tip
>
> To find the transfer function of a system from a step response graph, _look for the time at which the response reaches about 63% of its final value_

> [!tip] Closed-loop transfer function
>
> $$
> T(s) = \frac{G(s)}{1+G(s)}
> $$

### %OS (percent overshoot)

$$
\%OS = e^{-\zeta \pi / \sqrt{1-\zeta^2}} \times 100 \%
$$

[Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/Time-response#os-percent-overshoot)

## [steady-state error](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/steady-state-error)

If a unity feedback system has a feedforward transfer function $G(s)$, then the transfer function $\frac{E(s)}{R(s)}$ can be derived as:

$$
\begin{aligned}
C(s) &= E(s)\cdot G(s) \\
E(s) &= R(s) - C(s)
\end{aligned}
$$

so $\frac{E(s)}{R(s)} = \frac{1}{1+G(s)}$; for $G(s) = K$ this gives $\frac{1}{1+K}$.

## state space design

### Pole placement with phase-variable form

Closed-loop system characteristic equation:

$$
\det(sI - (A-BK)) = 0
$$
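As an illustration of the mechanics, a minimal pole-placement sketch with scipy; the plant matrices here are hypothetical, chosen only to show the workflow:

```python
import numpy as np
from scipy.signal import place_poles

# hypothetical phase-variable (controller canonical) plant
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # characteristic polynomial: s^2 + 3s + 2
B = np.array([[0.0], [1.0]])

# choose desired closed-loop poles, solve for the state-feedback gain K
K = place_poles(A, B, [-4.0, -5.0]).gain_matrix

# verify: eigenvalues of A - BK are the placed poles
print(np.linalg.eigvals(A - B @ K))  # [-4. -5.]
```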
### Gain and Phase Stability Margins

A closed-loop pole exists when

$$
1+KG(s)H(s) = 0
$$

## zero order hold

Nyquist frequency:

$$
f_N = \frac{1}{2}f_s
$$

Set the third pole to $s=-2$ to cancel a zero.

---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/stability
tags:
  - sfwr3dx4
description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/stability"
title: Stability and natural responses.
date: 2024-02-06
---

See also [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/stability/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/stability.pdf)

> **Stable** if the natural response tends to zero as $t \to \infty$.

### BIBO stability (bounded-input, bounded-output)

A system is BIBO stable if the output is bounded for any bounded input.

Stability and Poles

> stable if _all poles_ are strictly in the left side of the complex plane.
> unstable if _any pole_ is in the right side of the complex plane.
> marginally stable if no pole is on the right hand side, and its poles on the imaginary axis are of multiplicity one

### Necessary and sufficient condition for stability

> A necessary condition to have all roots in the open left half-plane is that all coefficients of the polynomial are present and have the same sign.

### [Routh-Hurwitz criterion](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/stability/../../../../../../../../thoughts/Routh-Hurwitz-criterion)

^routh-table

Given

$$
\frac{N(s)}{a_4s^4 + a_3s^3 + a_2s^2 + a_1s + a_0}
$$

The characteristic equation is $a_4s^4 + a_3s^3 + a_2s^2 + a_1s + a_0 = 0$

Create a basic Routh table (a code sketch for building such a table follows at the end of this note):

$$
\begin{array}{c|c|c|c}
s^4 & a_4 & a_2 & a_0 \\
\hline
s^3 & a_3 & a_1 & 0 \\
\hline
s^2 & \frac{\begin{vmatrix} -a_4 & a_2 \\ -a_3 & a_1 \\ \end{vmatrix}}{a_{3}} = b_1 & \frac{\begin{vmatrix} -a_4 & a_0 \\ -a_3 & 0 \\ \end{vmatrix}}{a_{3}} = b_2 & \frac{\begin{vmatrix} -a_4 & 0 \\ -a_3 & 0 \\ \end{vmatrix}}{a_{3}} = 0 \\
\hline
s^1 & \frac{\begin{vmatrix} -a_3 & a_1 \\ -b_1 & b_2 \\ \end{vmatrix}}{b_{1}} = c_1 & \frac{\begin{vmatrix} -a_3 & 0 \\ -b_1 & 0 \\ \end{vmatrix}}{b_{1}} = 0 & \frac{\begin{vmatrix} -a_3 & 0 \\ -b_1 & 0 \\ \end{vmatrix}}{b_{1}} = 0 \\
\hline
s^0 & \frac{\begin{vmatrix} -b_1 & b_2 \\ -c_1 & 0 \\ \end{vmatrix}}{c_1} = d_1 & \frac{\begin{vmatrix} -b_1 & 0 \\ -c_1 & 0 \\ \end{vmatrix}}{c_{1}} = 0 & \frac{\begin{vmatrix} -b_1 & 0 \\ -c_1 & 0 \\ \end{vmatrix}}{c_{1}} = 0 \\
\end{array}
$$

> states that the number of poles in the right half plane is equal to the number of sign changes in the first coefficient column of the table

> [!tip] stability
>
> The system is deemed **stable** if there are no sign changes in the first column
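A minimal sketch of the tabulation (my own helper, not from the course materials), which builds the array and counts sign changes in the first column; it assumes no zero appears in the first column (no epsilon handling):

```python
import sympy as sp

def routh_array(coeffs):
    """coeffs = [a_n, ..., a_0]; returns the rows of the Routh array."""
    width = len(coeffs[0::2])
    rows = [coeffs[0::2], coeffs[1::2] + [0] * (width - len(coeffs[1::2]))]
    for _ in range(len(coeffs) - 2):
        prev, cur = rows[-2], rows[-1]
        rows.append([sp.simplify((cur[0] * prev[j + 1] - prev[0] * cur[j + 1]) / cur[0])
                     for j in range(width - 1)] + [0])
    return rows

first_col = [row[0] for row in routh_array([1, 3, 3, 1])]  # (s+1)^3, stable
sign_changes = sum(1 for a, b in zip(first_col, first_col[1:]) if a * b < 0)
print(first_col, sign_changes)  # [1, 3, 8/3, 1] 0 -> no RHP poles
```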
---
slug: thoughts/university/twenty-three-twenty-four/sfwr-3dx4/steady-state-error
tags:
  - sfwr3dx4
description: reconstructed source of "https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/steady-state-error"
title: steady-state error
date: 2024-03-06
---

See also [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/steady-state-error/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/steady_state_error.pdf)

> [!tip] Important
>
> System type is the number of integrators in the forward path, i.e. the value of $n$

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/sfwr-3dx4/steady-state-error/../../../../../../../../thoughts/university/twenty-three-twenty-four/sfwr-3dx4/images/steady-state-error-table.webp)

---
slug: thoughts/vllm
tags:
  - ml
  - serving
description: reconstructed source of "https://aarnphm.xyz/thoughts/vllm"
title: vLLM
date: 2024-09-09
---

See also [Paged Attention](https://aarnphm.xyz/thoughts/vllm/../../thoughts/Attention#paged-attention) ([Kwon et al., 2023](#bib-kwon2023efficient))

## KV-Compress

_variable compression rates per attention head_

source: [github](https://github.com/IsaacRe/vllm-kvcompress)

## idea.

Look at the past attention weights for each pair of key and value vectors (a measure of the degree to which that KV's representation has been queried during past attention operations), then select the KV with the least attention to evict. Think of the LFU (least frequently used) cache management policy.

The KV cache for each sequence in a particular layer is allocated on the GPU as a _#attention heads $\times$ sequence length_ tensor.

> [!tip] Important
>
> total memory allocation scales with the _maximum_ sequence length over all attention heads of the KV cache

[Lien vers l'original](https://aarnphm.xyz/thoughts/vllm/../../thoughts/KV-compression#idea)

> [!notes] Notes
>
> A variation of [Ada-SnapKV](https://aarnphm.xyz/thoughts/vllm/../../thoughts/KV-compression#ada-kv)

idea:

- _group-query-compression_: compress the KV-cache of GQA without repeating it into the dimension of $\sum$ query heads.
- Modified PagedAttention that computes _against_ the KV-cache (which contains a variable number of KVs per head)

![](https://aarnphm.xyz/thoughts/vllm/../../thoughts/images/vllm/kv-compress-vllm.webp)

> For vLLM, each cache block stores KV for every attention head of every layer
>
> For KV-Compress, each block only holds KVs for a single head. Block tables are expanded $l \times H$ so that the unique block for each specific KV head and layer can be retrieved

### Query-Group Compression (QGC)

KV compression algorithms were not designed with GQA in mind:

- [Pyramid-KV](https://aarnphm.xyz/thoughts/vllm/../../thoughts/KV-compression#pyramid-kv) caches and compresses KV _after_ repetition, for alignment with query tensors
- Redundancy in cache before compression

> modification of eviction-based methods per group

### Block layout and allocation

idea: adapt PagedAttention to page out cache on a _per-head, per-layer, as well as per-sequence, basis_

![](https://aarnphm.xyz/thoughts/vllm/../../thoughts/images/vllm/paged-attention-block-kv-compress.webp)

> [!note]- explanation
>
> A simplified example with two KV heads and a block size of two:
>
> - KV metrics are visualized for a given cache state, highlighting blocks of a particular sequence in the decoding batch that is scheduled to evict two blocks.
> - Logical indices are displayed under the corresponding metrics slot.
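A toy sketch of the metric-based eviction idea above (aggregate the attention mass each cached KV has received, then evict the lowest-scoring positions per head); the shapes and names here are hypothetical, not vLLM's actual API:

```python
import numpy as np

def least_attended_kv(attn_weights: np.ndarray, n_evict: int) -> np.ndarray:
    """attn_weights: [n_queries, n_heads, seq_len] past attention weights.
    Returns per-head indices of KV positions to evict: [n_heads, n_evict]."""
    # LFU-style metric: total attention each KV position has received so far
    metric = attn_weights.sum(axis=0)              # [n_heads, seq_len]
    return np.argsort(metric, axis=-1)[:, :n_evict]

rng = np.random.default_rng(0)
w = rng.random((128, 8, 512))   # hypothetical logged attention weights
evict = least_attended_kv(w, n_evict=64)
print(evict.shape)              # (8, 64) -> per-head eviction, as in KV-Compress
```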
#### Evict from Paged KV cache

> need to evict KV blocks instead of evicting single KV attention entries

## automatic prefix caching

_excerpt from [github](https://github.com/vllm-project/vllm/blob/main/docs/source/automatic_prefix_caching/details.md)_

## block manager and evictor

see also: [v2](https://github.com/vllm-project/vllm/blob/main/vllm/core/block_manager.py) and [v1](https://github.com/vllm-project/vllm/blob/5eda21e773447d81ffc661ac094716420dc7b7cb/vllm/core/block_manager_v1.py), [benchmark](https://docs.google.com/document/d/1XxYUFai07ta5rE7OdtCVhLJ5J0oAxEqrGgarFdjv0Zc/edit?tab=t.0)

Reasoning for v2:

- support sliding-window attention
- lookahead slots for [speculative decoding](https://aarnphm.xyz/thoughts/vllm/../../thoughts/vllm#speculative-decoding)

## speculative decoding

See [slides](https://docs.google.com/presentation/d/1p1xE-EbSAnXpTSiSI0gmy_wdwxN5XaULO3AnCWWoRe4/edit#slide=id.p)

> Speculative execution for LLMs is an excellent inference-time optimization.\
> \
> It hinges on the following unintuitive observation: forwarding an LLM on a single input token takes about as much time as forwarding an LLM on K input tokens in a batch (for larger K than you might…
>
> — Andrej Karpathy (@karpathy) [August 31, 2023](https://twitter.com/karpathy/status/1697318534555336961)

- not all parameters are required for generating tokens
- constrained tokens have low information-density

> [!note] Ideas
>
> Use a small, cheap "draft model" to generate K candidate tokens ⇒ feed them back to the large model in a batch
>
> - use sampling logic to get the probability of the next token, then forward pass all later tokens at once.

## continuous batching

([Yu et al., 2022](#bib-280922)) improves on static batching, reducing cost and improving throughput by continuously appending incoming requests into the existing KV cache [^paper]

![](https://aarnphm.xyz/thoughts/vllm/../../thoughts/Continuous-batching/../../thoughts/images/vllm/continuous-batching.webp)

[Lien vers l'original](https://aarnphm.xyz/thoughts/vllm/../../thoughts/Continuous-batching)

## guided decoding

See [vllm-project/vllm#5423](https://github.com/vllm-project/vllm/issues/5423)

- not supported from `SamplingParams`
- requires support for batch/async logits processing
- the engine will die if it fails

Benchmark script: [vllm-project/vllm#10046](https://github.com/vllm-project/vllm/pull/10046)

![](https://aarnphm.xyz/thoughts/vllm/../../thoughts/images/vllm/benchmark-guided-before-optimization.webp)

> [!quote] overhead
>
> Currently logit_processors run in the frontend, so we should move them to the model_executor layers

### waterfall

see also [introduction slides](https://docs.google.com/presentation/d/1QL-XPFXiFpDBh86DbEegFXBXFXjix4v032GhShbKf3s/edit)

tldr: bottleneck at the `AsyncLogitProcessor` and `Scheduling` layers, given that these are row-wise operations [^row-wise]

```mermaid
---
title: Initialization flow
---
graph TB
  subgraph Engine
    AsyncLLMEngine[AsyncLLMEngine]
  end
  subgraph Executors
    GPU[GPUExecutorAsync]
    TPU[TPUExecutorAsync]
    XPU[XPUExecutorAsync]
  end
  subgraph Workers
    GPUWorker[GPUWorker]
  end
  subgraph Model Runners
    EmbeddingModelRunner[EmbeddingModelRunner]
    GPUModelRunner[GPUModelRunner]
  end
  subgraph Control Plane
    Scheduling[Scheduling]
    SequenceGroup[SequenceGroup]
    KVCache[KVCache]
    Executors
  end
  AsyncLLMEngine --> |init| C[init]
  C --> |device_type=gpu| GPU
  C --> |device_type=tpu| TPU
  C --> |device_type=xpu| XPU
  GPU --> Workers
  Workers --> |model_type=decoder| GPUModelRunner
  Workers --> |model_type=embeddings| EmbeddingModelRunner
  GPUModelRunner --> ModelClassImpl[LlamaModelForCausalLM]
```
```mermaid
---
title: Request flow
---
graph TB
  subgraph Engine
    AsyncLLMEngine[AsyncLLMEngine]
  end
  subgraph Executors
    GPU[GPUExecutorAsync]
    TPU[TPUExecutorAsync]
    XPU[XPUExecutorAsync]
  end
  subgraph Workers
    GPUWorker[GPUWorker]
  end
  subgraph Model Runners
    EmbeddingModelRunner[EmbeddingModelRunner]
    GPUModelRunner[GPUModelRunner]
  end
  subgraph control plane
    Scheduling[Scheduling]
  end
  Request[prompt, sampling_params] --> AsyncLLMEngine
  AsyncLLMEngine --> |add_request_async| AsyncLogitProcessor[AsyncLogitProcessorList]
  AsyncLogitProcessor --> Scheduling --> Executors
  GPU --> GPUWorker --> GPUModelRunner --> |.execute_model| ModelClassImpl[LlamaModelForCausalLM]
```

> [!note]+ some related items
>
> Worker base: `vllm/worker/worker_base.py`
>
> Initialize GPU cache and sequence group in the ModelRunner step
>
> The Executor handles all KVCache, block manager, and evictor layers during model execution
>
> broadcast SPMD with sequence groups

### proposal

The following document describes and summarizes existing work in vLLM to improve general guided decoding performance. [^performance] This design will largely affect how `logit_processor`s are currently handled within the vLLM architecture.

Main mega thread: [vllm-project/vllm#5423](https://github.com/vllm-project/vllm/issues/5423)

Goal:

- Improve general TPS when using guided decoding.
- Standardize the logit processor interface [^samplingpr]
- Separate `compute_logits` and logits preparation into two distinct steps

Orthogonal, but still goals:

- [vllm-project/vllm#5006](https://github.com/vllm-project/vllm/pull/5006)
- Logit processor plugins, similar to how vLLM plugins are handled. [vllm-project/vllm#4769](https://github.com/vllm-project/vllm/pull/4769)
- xgrammar:

Scope: `logit_processor`, sampling controller interface

## background

![flow](https://aarnphm.xyz/thoughts/vllm/../../thoughts/constrained-decoding/../../thoughts/images/vllm/pre-optimized-logit-processor-handling.webp)

_reference: [vllm-project/vllm#5329](https://github.com/vllm-project/vllm/pull/5329)_

Currently, generation with an FSM is very slow, even with warmup steps to initialize the given FSM. This behaviour is further exemplified when running with context longer than 4096 tokens.

Additionally, all outlines logit processors are considered stateful, which slows down the model executor, given that in V0 logit processors are applied [row-by-row blocking](https://github.com/vllm-project/vllm/blob/1ea291a4173a82c537ab42487e23375be4926d30/vllm/model_executor/layers/logits_processor.py#L143)

Thus, compared to sglang, vLLM V0 is currently not up to par.

## plan

- Implement [jump-ahead decoding](https://lmsys.org/blog/2024-02-05-compressed-fsm/#method-1-finite-state-machine-based) through a JSONWorker; we can then extend this to a CFGWorker
  - similar to how spec decode is currently implemented in V0
  - echo from [**@cadedaniel**](https://github.com/cadedaniel): "tree scoring in [spec decode] could use the same API as multi-path jump decoding."

> [!question] How should we handle FSM per request?
>
> - Currently, users can specify different schemas per request, which means the FSM will be compiled per request. This is suboptimal because it slows down general TTFT.
> - For most use cases, we should assume a JSON schema, similar to how the system prompt is currently being handled (passed during server init)

---

## appendix.

The following includes background information about guided generation.

### compressed FSM for jump-ahead tokens.
Implemented in ([Zheng et al., 2024](#bib-zheng2024sglangefficientexecutionstructured))

#### Method 1: [FSM](https://aarnphm.xyz/thoughts/vllm/../../thoughts/constrained-decoding/../../thoughts/constrained-decoding#guided-generations-with-fsm)-based decoding

- intuition: use an FSM ([Willard & Louf, 2023](#bib-willard2023efficientguidedgenerationlarge)) to guide generation by increasing the logit bias for tokens that conform to the given JSON schema. This allows us to track the current state during decoding and filter out invalid tokens by applying logit bias to the output.

  ![](https://aarnphm.xyz/thoughts/vllm/../../thoughts/constrained-decoding/../../thoughts/images/vllm/constrained-json-fsm.webp)

- limitation: given that construction of the FSM requires token-level access, it can only transition the state _one_ token at a time, resulting in slow decoding.

#### Method 2: Interleaved-based

- intuition: break the JSON schema down into parts, each containing either a chunked prefill part or a constrained decoding part. These are then executed interleaved by the inference system. This is faster than per-token decoding, given that chunked prefill components can process multiple tokens per forward pass.

  See also using llama.cpp as backend.

- limitation:
  - interleaved-based methods require custom syntax, making them less expressive than regex.
  - they struggle with tokenization boundaries due to conflicts between decode and chunked prefill segments.
  - frequent communication between the interpreter and the backend adds overhead.

#### **Method 3: Jump-Forward Decoding with compressed FSM**

![](https://aarnphm.xyz/thoughts/vllm/../../thoughts/constrained-decoding/../../thoughts/images/vllm/jump-forward-decoding-fsm.webp)

> [!tip] tokenization boundary handling
>
> During decoding, it is preferable to combine multiple characters into a single token.
>
> For example, when decoding `"Hello"` in the context of JSON decoding, the LLM might output the following tokens: `"`, `He`, `llo`, `",`
>
> This may cause strange behaviour if we combine the last `"` with `,` (matching the regex `"[\w\d\s]*"` against the last `,` will lead to endless decoding, because the token `",` is not valid even if the LM wants to stop.)

Fix:

- implement a re-tokenization mechanism during the jump-forward phase (append the string instead of the tokens, followed by re-tokenization of the entire text) $\to$ adds approximately 4% overhead
- use a comprehensive regex to guide the decoding phase, instead of employing multiple concatenated regexes [^coalescence]

### Coalescence

intuition: instead of expanding to $n$ states, we can compress certain chunks into one state to reduce the size of the FSM.
![](https://aarnphm.xyz/thoughts/vllm/../../thoughts/constrained-decoding/../../thoughts/images/vllm/part-of-json-fsm.webp)

_figure 1: initial FSM state_

![](https://aarnphm.xyz/thoughts/vllm/../../thoughts/constrained-decoding/../../thoughts/images/vllm/compressed-fsm-json.webp)

_figure 2: compressed FSM state_

A way to adapt the character-level regex FSM to work with tokens in `outlines`:

```python
import interegular
from outlines.fsm.regex import make_deterministic_fsm, create_fsm_index_tokenizer

# build a character-level FSM from a regex pattern (interegular), then determinise it;
# `tokenizer` is assumed to be an outlines-adapted tokenizer
char_fsm = interegular.parse_pattern(r'"[\w\d\s]*"').to_fsm()

new_fsm, _ = make_deterministic_fsm(char_fsm)
idx, _ = create_fsm_index_tokenizer(new_fsm, tokenizer)
```

```mermaid
stateDiagram-v2
  [*] --> InputPrompt: Start
  state "input prompt" as InputPrompt
  state "next-token probability distribution" as GetProb
  state "valid tokens" as ListTokens {
    [*] --> CheckTransitions
    CheckTransitions --> FilterTokens: Get index[0].keys()
    FilterTokens --> [*]
  }
  state "Sample Token" as SampleToken
  state "Update FSM State" as UpdateState
  InputPrompt --> GetProb: "model.generate"
  GetProb --> ListTokens: Get next-token distribution
  ListTokens --> SampleToken: Use filtered token list
  SampleToken --> UpdateState: Selected token X
  UpdateState --> [*]: new_state = index[0]["X"]
```

```python
# decode the token ids in the transition index into strings for inspection
idx_with_tokens = {
    state: {tokenizer.tokenizer.decode([key]): value for key, value in transitions.items()}
    for state, transitions in idx.items()
}
```

> [!note]- example
>
> ```mermaid
> stateDiagram-v2
>     direction LR
>     0 --> 2: n
>     0 --> 1: t
>     1 --> 2: a
>     2 --> 4: na
>     2 --> 3: a
>     3 --> 5: am
>     4 --> 6: me
>     5 --> 6: me
>     2 --> 6: name
>     6 --> 7: e
>     6 --> 8: c
>     7 --> 9: p
>     8 --> 9: p
>     9 --> 11: Paul
>     9 --> 12: Pa
>     9 --> 10: Jo
>     11 --> 13: aul
>     12 --> 14: ul
>     10 --> 26: o
>     26 --> 27: h
>     27 --> 14: n
>     13 --> 14: l
>     14 --> 16: s
>     14 --> 15: s
>     15 --> 17: s
>     16 --> 17: s
>     17 --> 18: a
>     17 --> 19: ag
>     18 --> 20: ge
>     19 --> 20: e
>     20 --> 21: 30
>     20 --> 22: 20
>     21 --> 24: 2
>     22 --> 24: 2
>     22 --> 23: 3
>     24 --> 25: 0
>     25 --> [*]
> ```

_note:_ each state of the FSM represents a forward pass of the LM. In vanilla generation this forward pass is necessary anyway, so the FSM adds no overhead for controlling the generated outputs.

From states 2–6, we observe that there are eight different paths that produce the same generation of `name`. We probably don't need all of them, given that they all yield the result `name`.

Suffice to say, we can hijack this behaviour to accelerate generation by appending any one of the following token sequences to the currently generated sequence:

- ["name"]
- ["n", "a", "m", "e"]
- ["na", "m", "e"]
- ["nam", "e"]
- ["n", "am", "e"]
- ["n", "ame"]
- ["na", "me"]
- ["n", "a", "me"]

A simplified index can be shown as:

```python
simplified_index = {
    0: {'{"': 2},
    2: {"name": 6},
    6: {'":"': 9},
    9: {'Paul': 14, 'John': 14},
    14: {'","': 17},
    17: {'age': 20},
    20: {'":': 22},
    22: {'20': 24, '30': 24},
    24: {'}': 25},
}
```

That's at least a 5x speedup over unstructured generation: of the nine transitions, only two states (9 and 22) offer a real choice, so we only need to call the model twice!

> [!tip]- difference in sampling distribution
>
> All these paths lead to the same string and the same speedup; however, they lead to potentially very different states for the LLM when it reaches state 6. That is, the strings are the same, but each path leads to a different conditional probability distribution at stage 6.
>
> ![](https://aarnphm.xyz/thoughts/vllm/../../thoughts/constrained-decoding/../../thoughts/images/vllm/json-difference-in-sampling-distribution.webp)
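A minimal sketch of the jump-forward walk over `simplified_index` above; `sample_from_model` is a hypothetical stand-in for a real forward pass plus sampling:

```python
def sample_from_model(state, candidates):
    """hypothetical stand-in for a real forward pass + sampling"""
    return next(iter(candidates))

def jump_forward(index, state=0, final=25):
    """Walk the compressed FSM; only call the model when a state has choices."""
    out, calls = [], 0
    while state != final:
        transitions = index[state]
        if len(transitions) == 1:     # forced transition: jump ahead, no forward pass
            token, state = next(iter(transitions.items()))
        else:                         # real choice: ask the model
            token = sample_from_model(state, transitions.keys())
            state = transitions[token]
            calls += 1
        out.append(token)
    return "".join(out), calls

text, n_calls = jump_forward(simplified_index)
print(text, n_calls)  # {"name":"Paul","age":20} 2 -> only two forward passes
```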
### Guided generations with FSM.

([Willard & Louf, 2023](#bib-willard2023efficientguidedgenerationlarge)), implemented at

_assumption: we are building against [autoregressive transformers models](https://aarnphm.xyz/thoughts/vllm/../../thoughts/constrained-decoding/../../thoughts/Autoregressive-models)_

- Let $\mathcal{F} \subset \mathcal{P}(\mathcal{V})$, where $\mathcal{P}$ is the power set operator, be the subset of multi-token strings that end with the token $\text{EOS} \in \mathcal{V}$.
- The text generation task is to draw samples from $\mathcal{F}$

Notable sampling methods include greedy decoding (recursively generating the highest-probability token) and beam search (using a heuristic to find the mode of the distribution) [^smc]

A pseudocode for the sampling procedure is as follows:

```pseudo
\begin{algorithm}
\caption{LLM token sampling}
\begin{algorithmic}
\Function{sample}{$L$}
\State $s \gets ()$
\For{$i \gets 1, L$}
\State $\alpha \gets \text{LM}(s, \theta)$
\State Sample $\tilde{s} \sim \text{Categorical}(\alpha)$
\If{$\tilde{s} = \text{EOS}$}
\State \textbf{break}
\EndIf
\State $s \gets \text{append}(s, \tilde{s})$
\EndFor
\State \Return $s$
\EndFunction
\end{algorithmic}
\end{algorithm}
```

Given that we are dealing with a finite discrete distribution, we can compute an un-normalized conditional distribution by applying a boolean mask $m: \mathcal{P}(\mathcal{V}) \to \{0,1\}^N$, which restricts the support of the original distribution:

$$
\begin{aligned}
\alpha &= \text{LM}(\tilde{S_t}, \theta) \\
\tilde{\alpha} &= m(\tilde{S_t}) \odot \alpha \\
\tilde{s}_{t+1} &\sim \text{Categorical}(\tilde{\alpha})
\end{aligned}
$$

> [!math] augmentation upon sampling algorithm
>
> ```pseudo
> \begin{algorithm}
> \caption{token sampling with masking}
> \begin{algorithmic}
> \Function{sample}{$L$}
> \State $s \gets ()$
> \For{$i \gets 1, L$}
> \State $\alpha \gets \text{LM}(s, \theta)$
> \State Construct the mask m($s$)
> \State $\tilde{\alpha} \gets m \odot \alpha$
> \State Sample $\tilde{s} \sim \text{Categorical}(\tilde{\alpha})$
> \If{$\tilde{s} = \text{EOS}$}
> \State \textbf{break}
> \EndIf
> \State $s \gets \text{append}(s, \tilde{s})$
> \EndFor
> \State \Return $s$
> \EndFunction
> \end{algorithmic}
> \end{algorithm}
> ```

> [!tip] finite automaton
>
> We define a _finite-state machine_, given by $(Q, \Sigma, \delta, q_0, F)$ [^automaton-definition], where the characters comprising the strings in $\mathcal{V}$ are drawn from $\Sigma$, i.e.: $\mathcal{V} \in \mathcal{P}(\Sigma)$
>
> ![](https://aarnphm.xyz/thoughts/vllm/../../thoughts/constrained-decoding/../../thoughts/images/vllm/fsm-iterative-generations.webp)

We define finding the sub-sequences of the FSM $M$ that accept a string $v$ as follows:

```pseudo
\begin{algorithm}
\caption{Find sub-sequences of the FSM $M$ that accept the string $v$}
\begin{algorithmic}
\Function{FindSubSequences}{$M, v$}
\State $M = (Q, \Sigma, \delta, q_0, F)$
\State $\texttt{res} \gets ()$
\For{$r \in \delta^{-1}(\cdot, v_0)$} \Comment{$\text{ Loop through states that read } v_0$}
\State $p \gets (r)$
\For{$i \gets 1, |v| - 1$} \Comment{$\text{ Walk the FSM}$}
\If{$\delta(r, v_i) = \emptyset$} \Comment{$\text{ The FSM does not read } v_i$}
\State $p \gets ()$
\State \textbf{break} \Comment{$\text{ Stop walking and try the next start state}$}
\EndIf
\State $r \gets \delta(r, v_i)$
\State $p \gets \text{append}(p, r)$
\EndFor
\State $\texttt{res} \gets \text{append}(\texttt{res}, p)$
\EndFor
\State \Return $\texttt{res}$
\EndFunction
\end{algorithmic}
\end{algorithm}
```
We can then define the construction of $\sigma$:

```pseudo
\begin{algorithm}
\caption{Construct a map from FSM states to subsets of $\mathcal{V}$}
\begin{algorithmic}
\Function{MapStatesToVocab}{$M, \mathcal{V}$}
\State $M = (Q, \Sigma, \delta, q_0, F)$
\State Initialize the map $\sigma$ with empty sets for each element in $Q$
\For{$v \in \mathcal{V}$} \Comment{$\text{Loop through the vocabulary}$}
\State $Z \gets \text{find\_sub\_sequences}(M, v)$
\For{$z \in Z$} \Comment{$\text{Loop through state sequences accepting } v$}
\State $\sigma(z_0) \gets \sigma(z_0) \cup v$
\EndFor
\EndFor
\State \Return $\sigma$
\EndFunction
\end{algorithmic}
\end{algorithm}
```

[Lien vers l'original](https://aarnphm.xyz/thoughts/vllm/../../thoughts/constrained-decoding)

## References

- Yu, G.-I., Jeong, J. S., Kim, G.-W., Kim, S., & Chun, B.-G. (2022). Orca: A Distributed Serving System for Transformer-Based Generative Models. _16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)_, 521–538.
- Lew, A. K., Zhi-Xuan, T., Grand, G., & Mansinghka, V. K. (2023). _Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs_. arXiv preprint arXiv:2306.03081 [arxiv](https://arxiv.org/abs/2306.03081)
- Willard, B. T., & Louf, R. (2023). _Efficient Guided Generation for Large Language Models_. arXiv preprint arXiv:2307.09702 [arxiv](https://arxiv.org/abs/2307.09702)
- Zheng, L., Yin, L., Xie, Z., Sun, C., Huang, J., Yu, C. H., Cao, S., Kozyrakis, C., Stoica, I., Gonzalez, J. E., Barrett, C., & Sheng, Y. (2024). _SGLang: Efficient Execution of Structured Language Model Programs_. arXiv preprint arXiv:2312.07104 [arxiv](https://arxiv.org/abs/2312.07104)
- Kwon, W., Li, Z., Zhuang, S., Sheng, Y., Zheng, L., Yu, C. H., Gonzalez, J. E., Zhang, H., & Stoica, I. (2023). Efficient Memory Management for Large Language Model Serving with PagedAttention. _Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles_.

[^paper]: The [paper](https://www.usenix.org/conference/osdi22/presentation/yu) and [presentation](https://www.youtube.com/watch?v=Ob9PPLxETYU\&ab_channel=USENIX) for the paper. The most notable open source implementation is [vLLM](https://aarnphm.xyz/thoughts/vllm/../../thoughts/Continuous-batching/../../thoughts/vllm). p/s: actually, I think it was first implemented in [huggingface/tgi](https://github.com/huggingface/text-generation-inference)

[^row-wise]: The current implementation [of logits processors](https://github.com/vllm-project/vllm/blob/246598a6b1e22616630b7f1bf11bd9bcb31dc860/vllm/model_executor/layers/logits_processor.py#L112) mandates that we gather all logits from the hidden state, scale them if needed, then apply the processors. ![flow](https://aarnphm.xyz/thoughts/vllm/../../thoughts/images/vllm/pre-optimized-logit-processor-handling.webp) _reference: [vllm-project/vllm#5329](https://github.com/vllm-project/vllm/pull/5329)_ Note that there is also [vllm-project/vllm#5006](https://github.com/vllm-project/vllm/pull/5006), which improves vLLM's own Outlines implementation of the FSM by halving memory, moving transitions from Python lists to tensors

[^performance]: The benchmark script can be found at [vllm-project/vllm#10046](https://github.com/vllm-project/vllm/pull/10046). Current RFC: [vllm-project/vllm#5423](https://github.com/vllm-project/vllm/issues/5423). Note that `lm-format-enforcer` failed to compile the test schema.
[^samplingpr]: [vllm-project/vllm#6273](https://github.com/vllm-project/vllm/pull/6273) proposed a sampling controller interface, but [**@cadedaniel**](https://github.com/cadedaniel) shared some [concerns](https://github.com/vllm-project/vllm/pull/6273#issuecomment-2243654991) wrt fast-forward tokens

[^coalescence]: this phenomenon is also known as [coalescence](https://aarnphm.xyz/thoughts/vllm/../../thoughts/constrained-decoding/../../thoughts/constrained-decoding#coalescence) in structured generation, where we exploit deterministic structure in the desired outputs to skip expensive forward passes

[^smc]: ([Lew et al., 2023](#bib-lew2023sequentialmontecarlosteering)) recently proposed sequential [Monte Carlo steering](https://aarnphm.xyz/thoughts/vllm/../../thoughts/constrained-decoding/../../thoughts/Monte-Carlo). The idea is to cast causal generation as an _a posteriori_ inference problem in a class of discrete probabilistic sequence models. See also [Feynman-Kac transformers models](https://aarnphm.xyz/thoughts/vllm/../../thoughts/constrained-decoding/../../thoughts/Transformers#feynman-kac)

[^automaton-definition]: [finite state machine](https://aarnphm.xyz/thoughts/vllm/../../thoughts/constrained-decoding/../../thoughts/university/twenty-three-twenty-four/sfwr-2fa3/DFA)

    - $Q$ is a finite set of states
    - $\Sigma$ is a finite alphabet
    - $\delta: Q \times \Sigma \to Q$ is the transition function
    - $q_0 \in Q$ is the start state
    - $F \subseteq Q$ is the set of all accepted states.

---
slug: thoughts/work
tags:
  - evergreen
description: A list of work that I have been doing for the past while.
title: work.
date: 2021-12-22
---

A collection of work I have done for the past while that I'm proud of. A backlog of unfinished ideas can be found [here](https://aarnphm.xyz/thoughts/work/../../ideas).

---

## writing.

You can find my internal monologue under the [posts](https://aarnphm.xyz/thoughts/work/../../posts/) index.

## open source.

- **Quartz**
  - 🌱 a fast, batteries-included static-site generator that transforms Markdown content into fully functional websites (2023-)
  - A set of tools that helps you publish your [digital garden](https://aarnphm.xyz/thoughts/work/../../thoughts/Digital-garden) and notes as a website for free.
  - Improved performance of graph interaction with Canvas [jackyzha0/quartz#1328](https://github.com/jackyzha0/quartz/pull/1328)
  - Added support for PDF in popover modal [jackyzha0/quartz#913](https://github.com/jackyzha0/quartz/pull/913)
  - Implemented font-fetching before runtime [jackyzha0/quartz#817](https://github.com/jackyzha0/quartz/pull/817)
  - Implemented telescope-style search [jackyzha0/quartz#722](https://github.com/jackyzha0/quartz/pull/722), [jackyzha0/quartz#774](https://github.com/jackyzha0/quartz/pull/774), [jackyzha0/quartz#782](https://github.com/jackyzha0/quartz/pull/782)
  - Added sidenotes components, inspired by [Tufte's CSS](https://edwardtufte.github.io/tufte-css/) [jackyzha0/quartz#1555](https://github.com/jackyzha0/quartz/pull/1555), [examples](https://aarnphm.xyz/thoughts/work/../../thoughts/mechanistic-interpretability)
  - Landing page of [this](https://aarnphm.xyz/thoughts/work/../../) website, with custom components, i.e.: [supper club](https://aarnphm.xyz/thoughts/work/../../thoughts/atelier-with-friends/dundurn), [curius](https://aarnphm.xyz/thoughts/work/../../curius), parsing [jupyter notebooks](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/PCA)
  - Source: and [site](https://quartz.jzhao.xyz/)
- **avante.nvim**
  - A [Cursor](https://www.cursor.com/)-like chat IDE for [Neovim](https://aarnphm.xyz/thoughts/work/../../uses#neovim) (2024-)
  - Implemented a bounding UI popover to improve QOL [yetone/avante.nvim#29](https://github.com/yetone/avante.nvim/pull/29)
  - Added support for lazy setup for better load-time improvement [yetone/avante.nvim#14](https://github.com/yetone/avante.nvim/pull/14)
  - Added Rust crates for `.avanterules` templates
  - Source: [](https://aarnphm.xyz/thoughts/work/../../thoughts/images/avante.mp4)
- **tinymorph**
  - An exploration into how we build interfaces for machine-assisted writing tools (2024-)
  - **WARNING**: Currently in research phase.
  - Trained a [sparse autoencoder](https://aarnphm.xyz/thoughts/work/../../thoughts/sparse-autoencoder) to interpret Llama 3.2 features ([Templeton et al., 2024](#bib-templeton2024scaling))
- **OpenLLM**
  - Run any open-source [LLMs](https://aarnphm.xyz/thoughts/work/../../thoughts/LLMs) as OpenAI-compatible API endpoints in the cloud. (2023-)
  - 🔬 Built for fast and production usage
  - 🚂 Supports Llama, Qwen, Gemma, etc., and **[quantized](https://aarnphm.xyz/thoughts/work/../../thoughts/quantization)** versions
  - ⛓️ OpenAI-compatible API
  - 💬 Built-in ChatGPT-like UI
  - 🔥 Accelerated LLM decoding with state-of-the-art [inference](https://aarnphm.xyz/thoughts/work/../../thoughts/Transformers#inference) backends
  - Source: ![](https://aarnphm.xyz/thoughts/work/../../thoughts/images/openllm.gif)
- **BentoML**
  - Build production-grade AI applications (2021-) ([Yang et al., n.d.](#bib-yangbentoml2022))
  - a framework that simplifies [machine learning](https://aarnphm.xyz/thoughts/work/../../thoughts/Machine-learning) model deployment and provides a faster way to ship your model to production. Supports a variety of use cases, from classical ML to [LLMs](https://aarnphm.xyz/thoughts/work/../../thoughts/LLMs) and diffusion models.
  - Built using Python, [BuildKit](https://aarnphm.xyz/thoughts/work/../../thoughts/BuildKit), gRPC
  - Source: , [Documentation](https://docs.bentoml.com)
- **incogni.to**
  - a pseudonymous event platform that curates for those yearning to be seen for who they are, not what they can "sell" (2024)
  - Implemented a [RAG](https://aarnphm.xyz/thoughts/work/../../thoughts/RAG) pipeline for a recommendation system based on user preferences and interests, with [command-r-plus-08-2024](https://huggingface.co/CohereForAI/c4ai-command-r-plus), deployed with [vLLM](https://aarnphm.xyz/thoughts/work/../../thoughts/vllm) and BentoML ([Yang et al., n.d.](#bib-yangbentoml2022))
  - Added semantic search to find relevant events based on a query with [Cohere Rerank](https://cohere.com/rerank)
  - General UI implementation with shadcn/ui and vercel/next.js
  - Demoed at [New Build'24](https://x.com/newsystems_/status/1828455648377327976)
  - Source: [stream](https://x.com/i/broadcasts/1OwxWNvzRejJQ), [posts](https://aarnphm.xyz/thoughts/work/../../posts/new)
- **onw**
  - A real-time navigation tool for safer commutes (2021)
  - Implemented route optimization and heat-map visualization to identify hot zones, plus a peer notification system.
  - Added a heuristic Gaussian Mixture Model to find the safest path between locations, trained on past assault data provided by the Toronto Police Department.
  - Awarded: Finalist at [Hack the North 2021](https://devpost.com/software/twogether).
  - Built using AWS Fargate, React Native, TypeScript, GraphQL, Apache Spark MLlib, Google Maps API
  - Source: , [devpost](https://devpost.com/software/twogether)

## talks.

- OpenLLM, and everything about running LLMs in production at Hack The North (2023)
  - Source: [slides](https://aarnphm.xyz/thoughts/work/../../thoughts/images/htn-openllm.pdf)

![](https://aarnphm.xyz/thoughts/work/../../thoughts/images/htn-2023-speaks.webp)

## companies.

> Im thinking to build a toronto compute company, looking for funding
>
> — aaron (@aarnphm\_) [October 11, 2024](https://twitter.com/aarnphm_/status/1844775079286120682)

## References

- Templeton, A., Conerly, T., Marcus, J., Lindsey, J., Bricken, T., Chen, B., Pearce, A., Citro, C., Ameisen, E., Jones, A., Cunningham, H., Turner, N. L., McDougall, C., MacDiarmid, M., Freeman, C. D., Sumers, T. R., Rees, E., Batson, J., Jermyn, A., … Henighan, T. (2024). Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. _Transformer Circuits Thread_. [\[link\]](https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html)
- Yang, C., Sean, S., Aaron, P., Shenyang, Z., Sauyon, L., Bo, J., Fog, D., Xipeng, G., & Frost, M. (n.d.). _BentoML: The framework for building reliable, scalable and cost-efficient AI application_. [\[GitHub\]](https://github.com/bentoml/BentoML)

---
slug: thoughts/writing
tags:
  - sapling
  - evergreen
description: reconstructed source of "https://aarnphm.xyz/thoughts/writing"
title: Writing
date: 2023-07-29
---

## why writing?

> Writing as _crystallised_ thought, a way of expressing the labyrinth of interconnected, messy, and incoherent ideas in my mind. It is a form of [knowledge distillation](https://jzhao.xyz/thoughts/knowledge-distillation) (Jacky on [writing](https://jzhao.xyz/thoughts/writing))

Writing is an exploration, an excavation of self and the world in this painfully intricate dance. It is a way to bridge the chasm between ideas, to extend a filament of one consciousness to others.
It is an extension of self, a second brain that, though bounded by the constraining nets of syntax and grammar, provides grounds for freedom of expression and for articulating one’s interests and curiosity.

> The thing I like about writing is that it’s quite literally _thinking_—a way for me to access my own interiority and construct something from it. What I write is all mine, it’s a living thing, it’s an extension of me that wanders out into the world. It is desire turned inwards instead of outwards, focused instead of displaced. It’s a way to access self-knowledge and self-respect. (Ava on [how to avoid half-heartedness](https://www.avabear.xyz/p/how-to-avoid-half-heartedness))

The modality of [text](https://aarnphm.xyz/thoughts/writing/../../thoughts/Language#representation) essentially creates a universal interface that allows individuals from diverse backgrounds and contexts to form intricate networks of thoughts.

Andy Matuschak on [books](https://aarnphm.xyz/thoughts/writing/../../books) and writing:

> Writing is perhaps the greatest of human inventions, binding together people who never knew each other, citizens of distant epochs. Books break the shackles of time. A book is proof that humans are capable of working magic.

> What I am doing right now, writing this essay, is, technically, **a linear walk through the [network](https://aarnphm.xyz/thoughts/writing/../../thoughts/Networked-Thoughts) of my ideas**. That is what writing is: turning a net into a line. But it is also very concretely what I do, since I have _externalised_ my ideas in a [note-taking system](https://obsidian.md/) where the thoughts are linked with [hyperlinks](https://aarnphm.xyz/thoughts/writing/../../thoughts/Hypertext) (Henrik Karlsson, [Reader-generated Essay](https://www.lesswrong.com/posts/ZtMsyMP5F7zzP8Gvc/reader-generated-essays))

At its core, writing endeavours to transmute the [chaos](https://aarnphm.xyz/thoughts/writing/../../thoughts/Chaos) of [existence](https://aarnphm.xyz/thoughts/writing/../../thoughts/Existentialism) into discernible narratives, offering a conduit for shared understanding amidst the inherent disarray of life. It embraces a form of [looseness in mutation](https://subconscious.substack.com/p/hypertext-montage).

## as playground.

Writing is also a playground for nurturing your [“baby idea”](https://substack.com/inbox/post/140191029#footnote-5-140191029). The mind is often overwhelming, and I find the act of writing therapeutic: it helps organize and control inner [entropy](https://aarnphm.xyz/thoughts/writing/../../thoughts/Entropy):

> I notice this change most when I try to write. I think working a lot has made me a worse writer. 90 percent of the words I consume and produce in a week are emails, strategy docs, research reports, and documentation—text designed to be as digestible as possible for a busy, distracted end user. My prose has tightened, the excess trimmed. Information efficiency is paramount. I write like the 12 dollar desk salad, the bar that packs 20 grams of protein and plastic into one 200-calorie brick. But good writing, like a good meal, needs fat. It should indulge readers, is meant to be chewed and enjoyed, affording a generous escape from the prosaic and mundane. — [Jasmine Sun](https://jasmine.substack.com/p/audience-of-one)

I write for me, and for me only. I see writing as a [love](https://aarnphm.xyz/thoughts/writing/../../tags/love) letter from my past-self, crafted and permanently available on the internet for my future-self to read.
Writing to me serves as an escape from the realm of the living, venturing into the wonderland. I didn’t grow up writing or reading much, but living [abroad](https://aarnphm.xyz/thoughts/writing/../../posts/Chaos), I found solace in the land of the writers, getting lost in their imagination, envisioning what the world _should_ be.

## paradox.

You see, I think writing is this pursuit of clarity in the midst of chaos, a striving to impose some semblance of order on the boundless and unpredictable swirl of sensations, feelings, and thoughts that define our existence. It morphs into a bridge, spanning the chasm among individuals, carrying across the echo: “This is me, this is the world as I see it, and is it the same for you?” It’s a reverberation in the void, fuelled by a yearning that amidst the boundless human tapestry, there exists a soul that perceives the echo, reciprocates the sentiment, and in doing so, forges a minuscule yet profound filament of comprehension and empathy.

And yet, writing is an acknowledgement of the indelible solitude inherent to human existence. It is a quiet concession to the insurmountable walls that encase individuals’ inner sanctum. The act of writing is both a defiance of and a homage to the impenetrable mystery that shrouds the heart of another. It is a delicate endeavour to articulate the inarticulable, to unveil the veiled, all the while knowing the quest may never consummate in total understanding.

In this paradox lies the profound beauty and torment of writing. It’s a ceaseless sojourn towards the horizon of connection, propelled by a boundless hope and a quixotic resolve, yet shadowed by the solemn acceptance of inherent disconnection. This complex interplay births the agony and the ecstasy of the writing voyage, the ceaseless pull between the allure of communion and the stark reality of intrinsic solitude.

## motivation.

Excerpt from _George Orwell’s Why I Write_

Sheer egoism:

> But there is also the minority of gifted, willful people who are determined to live their own lives to the end, and writers belong in this class. Serious writers, I should say, are on the whole more vain and self-centered than journalists, though less interested in money.

Aesthetic enthusiasm:

- Perception of beauty in the external world, or, on the other hand, in words and their right arrangement
- Pleasure in the impact of one sound on another, in the firmness of good prose or the rhythm of a good story
- Desire to share an experience which one feels is valuable and ought not to be missed.

Historical impulse:

- Seeing things as they are, to find out true facts and store them up for the use of posterity.

Political purpose:

- Desire to push the world in a certain direction, to alter other peoples’ idea of the kind of society that they should strive after.

Orwell is often remembered for his democratic socialism and his opposition to totalitarianism. His [will to truth](https://aarnphm.xyz/thoughts/writing/../../thoughts/Will-to-Truth) is fundamental to his writing, a true [representation](https://aarnphm.xyz/thoughts/writing/../../thoughts/representations) of intellectual honesty. In the context of totalitarianism, Orwell pointed out that such regimes demand the continuous alteration of the past and, probably, a disbelief in the very existence of [objective truth](https://www.goodreads.com/book/show/35610790-orwell-on-truth). He saw the danger in a society that drifts from the truth, stating that it will hate those who speak it.
> If you’re thinking without writing, you only think you’re thinking.
>
> So a world divided into writes and write-nots is more dangerous than it sounds. It will be a world of thinks and think-nots. I know which half I want to be in, and I bet you do too.
>
> — [Writes and Write-Nots](https://paulgraham.com/writes.html)

## query.

_Excerpt from [A blog post is a very long and complex search query to find fascinating people and make them route interesting stuff to your inbox](https://www.henrikkarlsson.xyz/p/search-query)_

> The pleasant parts of the internet seemed to be curated by human beings, not algorithms. For my writing to find its way in this netherworld, I needed to have a rough sense of how information flowed down there. The pattern was this: words flowed from the periphery to the centers. This was a surprisingly rapid stream. Then the words cascaded from the center down in a broader but slower stream to the periphery again.

> When writing in public, there is a common idea that you should make it _accessible_. This is a leftover from mass media. Words addressed to a large and diverse set of people need to be simple and clear and free of jargon. It is valuable to write clearly of course, to a degree. Clear writing is clear thinking.

See also: [this](https://www.youtube.com/watch?v=FGqbUHOTog8&ab_channel=buildspace)

## protocol.

> Why do you build software for writing on top of a protocol such as the file?

---
slug: thoughts/zero-shot-learning
tags:
  - llm
description: reconstructed source of "https://aarnphm.xyz/thoughts/zero-shot-learning"
title: zero-shot prompting
date: 2024-02-12
---

[Source](https://arxiv.org/pdf/2109.01652.pdf)

The paper argues that zero-shot prompting of a smaller, instruction-tuned language model outperforms much larger [LLM](https://aarnphm.xyz/thoughts/zero-shot-learning/../../thoughts/LLMs) systems.

- Instruction-tuning actually improves zero-shot learning performance.
- Mostly tested on FLAN, with comparisons against GPT-3 and results on a few reading-comprehension datasets.

Honorable mentions include prompt tuning and few-shot prompting.
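To make the distinction concrete, here is a minimal sketch (mine, not the paper's) of the two prompting styles: zero-shot supplies only an instruction, while few-shot prepends worked examples. The task and examples below are made up for illustration.

```python
# Hypothetical illustration of zero-shot vs. few-shot prompts.
# Either string would be sent verbatim to a completion endpoint.

# Zero-shot: instruction only. Instruction tuning (e.g. FLAN) is what
# makes this bare form work without in-context examples.
zero_shot = (
    "Classify the sentiment of the review as positive or negative.\n"
    "Review: The pacing dragged, but the ending was worth it.\n"
    "Sentiment:"
)

# Few-shot: the same task, preceded by worked examples in the prompt.
few_shot = (
    "Review: I couldn't put it down.\nSentiment: positive\n\n"
    "Review: A complete waste of an afternoon.\nSentiment: negative\n\n"
    "Review: The pacing dragged, but the ending was worth it.\nSentiment:"
)
```

The paper's claim, roughly, is that after instruction tuning the `zero_shot` form can match or beat much larger models that rely on the `few_shot` form.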
Today, I'm happy to release fluxlens, an interactive interface for exploring SAE features.\ > \ > \- fluxlens\ > \- blog… [pic.twitter.com/Y6vQkRMGrB](https://t.co/Y6vQkRMGrB) > > — NYRE (@sleenyre) [30 octobre 2024](https://twitter.com/sleenyre/status/1851519830375207309) > just open-sourced the training and evaluation code for cde, our state-of-the-art small text embedding model\ > \ > includes code for lots of hard stuff:\ > \* efficient clustering large datasets\ > \* contrastive training for SOTA retrieval models\ > \* our custom two-stage model architecture that… [pic.twitter.com/ZvsssUHL20](https://t.co/ZvsssUHL20) > > — jack morris (@jxmnop) [30 octobre 2024](https://twitter.com/jxmnop/status/1851706815244902691) > Which Moo Deng are you today? [pic.twitter.com/giwJHaHHet](https://t.co/giwJHaHHet) > > — 💖 (@twaniimals) [19 septembre 2024](https://twitter.com/twaniimals/status/1836560827756740626) > This UI lives in my head rent-free. I always come back to it once a year. [pic.twitter.com/iQMDLu8YwQ](https://t.co/iQMDLu8YwQ) > > — Adrien Griveau (@Griveau) [31 octobre 2024](https://twitter.com/Griveau/status/1851937688988889514) > what if your journal visualized your emotions? (little concept inspired by hume and obsidian) [pic.twitter.com/zJe3oe3f5I](https://t.co/zJe3oe3f5I) > > — lele (@CherrilynnZ) [19 septembre 2024](https://twitter.com/CherrilynnZ/status/1836881535154409629) > In 2019, a month after I had joined Shopify, I sent this note to my team. I called it "say the thing" and I think it has stood the test of time well. [pic.twitter.com/8SThyPnNja](https://t.co/8SThyPnNja) > > — Kaz Nejatian (@CanadaKaz) [30 octobre 2024](https://twitter.com/CanadaKaz/status/1851653777633247673) > Anthropic interp put out a post two days ago on crosscoders. [@Connor\_Kissane](https://twitter.com/Connor_Kissane) just put out an open source replication of model diffing crosscoders on Gemma 2 2B!\ > \ > We're excited to enable further research: is model diffing a big deal for safety and do crosscoders help study it? > > — Neel Nanda (@NeelNanda5) [27 octobre 2024](https://twitter.com/NeelNanda5/status/1850656772002120009) --- slug: uses tags: - technical description: Includes the tools I use, workflow, etc. my [submission](https://uses.tech) title: uses. date: 2024-01-22 --- ### hardware. - Macbook Pro (M1 Max, 16-inch, 32GB RAM, 2021) - Apple Trackpad 2 - Logitech Pro X Superlight 2 - LG 4K 32UN880-B x 2 - [beyerdynamic DT 1990 Pro](https://global.beyerdynamic.com/dt-1990-pro.html) - Audioengine A2+ - Yamaha AG03MK2W & Universal Audio Apollo Twin X DUO (Audio Interface) - Shure SM7B w/ Cloudlifter CL-1 (Microphone) - Logitech MX Master 3 - Logitech BRIO 4K - A7III and [FX3](https://www.sony.ca/en/interchangeable-lens-cameras/products/ilme-fx3-body---kit) w/ a bunch of [lenses](https://aarnphm.xyz/thoughts/lenses) and rigs. ### software. - [Alacritty](https://alacritty.org/) - [neovim](https://neovim.io/) and [config](https://github.com/aarnphm/editor) - patched [Berkeley Mono](https://berkeleygraphics.com/typefaces/berkeley-mono/) with Nerd Fonts. - [Obsidian](https://obsidian.md/), hosted with [Quartz](https://quartz.jzhao.xyz) (this site) - GitHub - Apple Notes - [Bitwarden](https://bitwarden.com/) - Safari - [Live 11 Suite](https://www.ableton.com/en/live/) - [Raycast](https://www.raycast.com/) - Rectangle - everything under [expenses.](https://aarnphm.xyz/thoughts/Expenses) ### programming languages. - [Python](https://www.python.org/) for ml work. 
- [Go](https://golang.org/) for infrastructure work.
- [Rust](https://www.rust-lang.org/) for exploration work.

### cookware.

used for [hosting](https://aarnphm.xyz/thoughts/Dishes)

- Made In stainless-steel pan and pot sets.
- 12″ All-Clad sauté pan
- Stand mixer

### keebs.

- Mode Eighty: Signal Tactile, 68g spring, 205g0, Durock V2
- Keychron Q8 Pro: Gateron Ink Black, 205g0 + 105g0, Durock V2, GMK Olivia
- Keychron Q1 Pro: NK Cream, 105g0, 72g spring, Durock V2
- Logitech G515 TKL: Gateron Brown Slim, stock everything else.

### links.

- [toronto coworking spots](https://www.corner.inc/list/02c68af9-8286-474f-91de-0b4e702330e6?sid=49933781-9175-48ae-852b-acb5006e8bca)
- [GNU C Manual Reference](https://www.c-asm.com/gnu-c-manual.html)