--- slug: index tags: - evergreen - fruit description: "Aaron's digital garden" title: "Aaron's notes" date: 2022-04-22 permalink: https://aarnphm.xyz/llms.txt --- Beige and rosé are my two favourite colours. I try to be present, but you will find me either [writing](https://aarnphm.xyz/thoughts/writing#motivation) or [reading](https://aarnphm.xyz/books). I like to take long walks, host [functions](https://aarnphm.xyz/thoughts/atelier-with-friends), and people-watch. Cooking is my [love](https://aarnphm.xyz/tags/love) language; it is also how my mom expresses her love for me. How one cooks their eggs tells a lot about how they treat others. [open-source projects](https://aarnphm.xyz/thoughts/work) are an overall net positive for everyone, so contribute. I believe in [tools](https://aarnphm.xyz/thoughts/papers/Tools-for-Conviviality-by-Ivan-Illich.pdf) that give back [agency](https://aarnphm.xyz/thoughts/Agency) to users and help them fulfil their [desire](https://aarnphm.xyz/thoughts/desire) in life. Understanding the [inner workings](https://aarnphm.xyz/thoughts/mechanistic-interpretability) of large language models would help us do better science. Currently, I’m building [serving infrastructure](https://bentoml.com) for [ml](https://aarnphm.xyz/thoughts/Machine-learning) systems and exploring our interactions through [large language models](https://aarnphm.xyz/thoughts/LLMs). I’m best reached [on twitter](https://twitter.com/aarnphm_) or --- slug: books tags: - evergreen description: "the one with all the books I should read." title: "antilibrary." date: 2022-04-22 permalink: https://aarnphm.xyz/books.html.md --- A (mostly) up-to-date list of books that I have read, want to read, am reading, or have finished. See also: [digital version](https://aarnphm.xyz/curius) > In essence, an [antilibrary](https://nesslabs.com/antilibrary) is a collection of unread books. It represents an ode to the self, reminding you of the topics you want to explore. ## current. | title | author | notes | | --- | --- | --- | | Essay on Love | Alain de Botton | | | [Nietzsche and Philosophy](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche) | Gilles Deleuze | | | [The Gay Science](https://aarnphm.xyz/thoughts/papers/The-Gay-Science-by-Friedrich-Nietzsche.pdf) | Friedrich Nietzsche | | | Beyond Good and Evil | Friedrich Nietzsche | | | Beyond The Pleasure Principle | Sigmund [Freud](https://aarnphm.xyz/thoughts/Freud) | | | The Critique of Pure Reason | Immanuel Kant | | | The Metaphysics of Morals | Immanuel Kant | | | Crime and Punishment | Fyodor Dostoevsky | | | Structure and Interpretation of Computer Programs | Abelson and Sussman | [pdf](https://web.mit.edu/6.001/6.037/sicp.pdf) | | Man and His Symbols | Carl G. Jung | | ## to read.
### [philosophy](https://aarnphm.xyz/tags/philosophy) | title | author | notes | | ------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | A Treatise of Human Nature | David Hume | | | The Evolution of Modern Metaphysics: Making Sense of Things | A. W. Moore | | | [Being and Some Philosophers](https://aarnphm.xyz/thoughts/papers/Being-and-Some-Philosophers.pdf) | Etienne Gilson | | | The Phenomenology of Spirit | G. W. F. Hegel | | | The World as Will and [Representation](https://aarnphm.xyz/thoughts/representations) | Arthur Schopenhauer | | | The Prince | Niccolò Machiavelli | | | Utilitarianism | John Stuart [Mill](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill) | | | Meditations on First Philosophy | René [Descartes](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Descartes), French ed. | | | Existentialism in Social [Pedagogy](https://aarnphm.xyz/thoughts/education) | Søren Kierkegaard | | | [The Will To Believe](https://aarnphm.xyz/thoughts/The-Will-To-Believe) | William James | | | The Care of the Self | Michel Foucault | | | Metaphysical myths, mathematical Practice: The Ontology and [Epistemology](https://aarnphm.xyz/thoughts/Epistemology) of the Exact Science | Michel Foucault | | | Repetition | Kierkegaard | | | On Certainty | Ludwig Wittgenstein | | | The Conquest of Happiness | Bertrand Russell | [html](https://russell-j.com/beginner/COH-TEXT.HTM) | | Being and Time | Heidegger | | | Pensees | Pascal | [html](https://www.gutenberg.org/files/18269/18269-h/18269-h.htm) | | Being and Nothingness | Jean-Paul Sartre | | | Philosophical Investigations | Ludwig Wittgenstein | [pdf](https://static1.squarespace.com/static/54889e73e4b0a2c1f9891289/t/564b61a4e4b04eca59c4d232/1447780772744/Ludwig.Wittgenstein.-.Philosophical.Investigations.pdf) | #### [Nietzsche](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche) - **The Birth of Tragedy** - **The Will to Power** - **Thus Spoke Zarathustra** - **Twilight of the Idols** - **On The Genealogy of Morals** - **Ecce Homo** #### [Kant](https://aarnphm.xyz/thoughts/Philosophy-and-Kant) - **The Critique of Practical Reason** - **Groundwork of the Metaphysics of Morals** #### [Camus](https://aarnphm.xyz/thoughts/Camus) - **The Fall** - **The Rebel** - **The First Man** - **Resistance, Rebellion, and Death** ### non-fiction | title | author | notes | | ------------------------------------------------------ | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- | | Deep Work | Cal Newport | | | Digital Minimalism | Cal Newport | | | Playing Software: Homo Ludens in Computational Culture | Miguel Sicart | | | Reimagining Capitalism in a World on Fire | Rebecca Henderson | | | Principles | Ray Dalio | | | Mindset | Dr. Carol S. Dweck | | | The Pleasure of Finding Things Out | Richard P. Feynman | | | Walden and Civil Disobedience | Henry David Thoreau | | | Deep Sleep | Jade Wu | | | Are We Spiritual Machines? 
| Ray Kurzweil | [html](https://onlinebooks.library.upenn.edu/webbin/book/lookupid?key=olbp56055) | | Free to Choose | Milton Friedman | | | Seduction and Betrayal | Elizabeth Hardwick | [link](https://www.penguinrandomhouse.ca/books/643010/seduction-and-betrayal-by-elizabeth-hardwick-introduction-by-joan-didion/9780940322783) | ### fiction | title | author | | --- | --- | | Recursion | Blake Crouch | | Sea of Tranquility | Emily St. John Mandel | | Oblivion | David Foster Wallace | | The Uninhabitable Earth | David Wallace-Wells | | The Idiot | Fyodor Dostoevsky | | The Brothers Karamazov | Fyodor Dostoevsky | | Fall On Your Knees | Ann-Marie MacDonald | | Foundation series | Isaac Asimov | | The Three-Body Problem | Liu Cixin | | Robinson Crusoe | Daniel Defoe | | The Overstory | Richard Powers | | Rejection | Tony Tulathimutte | | Play It as It Lays | Joan Didion | ### poetry | title | author | | --- | --- | | Dog Songs | Mary Oliver | | Come Home To Yourself | Déjà Rae | --- ## finished. ### 2024 - **The Trial** by Franz Kafka - **The Triple Helix: Gene, Organism, and Environment** by Richard Lewontin - **Fear and Trembling** by Søren Kierkegaard - **Either/Or** by Søren Kierkegaard - **The Lily of the Field and the Bird of the Air** by Søren Kierkegaard - **Meditations** by Marcus Aurelius - **[The Myth of Sisyphus](https://aarnphm.xyz/thoughts/Camus#the-myth-of-sisyphus)** by Albert Camus - **The Stranger** by [Albert Camus](https://aarnphm.xyz/thoughts/Camus) - **The Metamorphosis** by Franz Kafka - **The End of the Affair** by Graham Greene - **The Little Book of [Deep Learning](https://aarnphm.xyz/thoughts/deep-learning)** by [François Fleuret](https://fleuret.org/public/lbdl.pdf) - **[The Ego and the Id](https://aarnphm.xyz/thoughts/Freud#the-ego-and-the-id)** by Sigmund Freud - **Tomorrow, and Tomorrow, and Tomorrow** by Gabrielle Zevin - **[Web Browser Engineering](https://browser.engineering/onepage.html)** by Pavel Panchekha & Chris Harrelson - **1984** by George Orwell ### 2023 - **Why I Write** by George Orwell - **Why I Am So Wise** by Friedrich Nietzsche - **[Civilisation and its Discontents](https://aarnphm.xyz/thoughts/Civilisation-and-its-Discontents)** by Sigmund Freud - **Dopamine Nation** by Dr. Anna Lembke - **The Midnight Library** by Matt Haig - **Out of Love** by Hazel Hayes - **In Emergency, Break Glass: What Nietzsche Can Teach Us About Joyful Living in a Tech-Saturated World** by Nate Anderson - **The Subtle Art of Not Giving a Fuck** by Mark Manson - **[Pretentiousness: Why it Matters](https://aarnphm.xyz/thoughts/fashion#pretentious)** by Dan Fox - **The Republic** by Plato - **Apology** by Plato - **Symposium** by Plato - **Pillow Thoughts IV** by Courtney Peppernell - **Radically Human: How New Technology Is Transforming Business and Shaping Our Future** by Paul Daugherty and H.
James Wilson ### 2022 - **Infinite Jest** by DFW - **Dune** series by Frank Herbert - **Kafka on the Shore** by Haruki Murakami - **21 Lessons for the 21st Century** by Yuval Noah Harari - **The Outsiders: Eight Unconventional CEOs and Their Radically Rational Blueprint for Success** by Will Thorndike ### 2021 - **Working in Public: The Making and Maintenance of Open Source Software** by Nadia Eghbal - **The Death of Ivan Ilyich** by Tolstoy - **The Godfather** and **The Sicilian** by Mario Puzo - **1984** by George Orwell --- slug: cheatsheet tags: - evergreen description: "reconstructed source of https://aarnphm.xyz/cheatsheet" title: "cheatsheet" date: 2024-10-10 permalink: https://aarnphm.xyz/cheatsheet.html.md --- A list of cheatsheets for whatever fits my workflow $$ \begin{aligned} \text{Big O(micron)} &: O \text{ or } \mathcal{O} \\ \text{Big Omega} &: \Omega \\ \text{Big Theta} &: \Theta \\ \text{Small O(micron)} &: o \\ \text{Small Omega} &: \omega \\ \text{On the order of} &: \sim \end{aligned} $$ --- slug: curius tags: - evergreen - hyperlinks description: "curius dot app slash aaron dash pham" title: "curius." date: 2024-01-26 permalink: https://aarnphm.xyz/curius.html.md --- See curius.app/aaron-pham or curius.aarnphm.xyz --- slug: ideas tags: - technical - evergreen description: "Liste de projets, d'idées, d'écrits auxquels on reviendra." title: "ideas." date: 2022-01-25 permalink: https://aarnphm.xyz/ideas.html.md --- ### lettres - love (wip) - self-healing and love - growth after death - education and pedagogical implications on next generations - recommendation system and word2vec - social interactions a la carte. ### projets - LaTeX codeblock renderer for [neovim](https://aarnphm.xyz/uses#neovim), in editor - Support KaTeX, and probably MathJax - Uses `conceallevel` - yet another emulator in Rust - Want to stream current running process and make it clickable? - Vim and Emacs support - multiplexer - stream TTY? ```mermaid flowchart TD 1[GUI] --> 2[tty] --> 3[rsh] 1 --> 5[multiplexer] 2 --> 1 ``` - rsh: new shell language written with Rust-like syntax - I get fed up with bash - Should be cloud-first? - Nix inspiration for caching and package management? - [Rust](https://aarnphm.xyz/thoughts/Rust) key-value store - Think of it as MongoDB but with Redis capability - Dockerfile for LLM - [ollama](https://github.com/ollama/ollama)’s Modelfile. - Dockerfile frontend, [BuildKit](https://aarnphm.xyz/thoughts/BuildKit), [OCI](https://aarnphm.xyz/thoughts/OCI)-compliant frontend.
- Stay away from Docker 😄 - disappearing text - For svg: [codepen](https://codepen.io/Mikhail-Bespalov/pen/yLmpxOG) > Im thinking to build a toronto compute company, looking for funding > > — aaron (@aarnphm\_) [11 octobre 2024](https://twitter.com/aarnphm_/status/1844775079286120682?ref_src=twsrc%5Etfw) ### écriture - bazil: A [Bazel](https://bazel.build/) for the unversed - Bazel is hard to get started with --- slug: infinite-poem tags: - seed description: "reconstructed source of https://aarnphm.xyz/infinite-poem" title: "infinite poem" date: 2024-10-11 permalink: https://aarnphm.xyz/infinite-poem.html.md --- ```js const rules = { start: "$line1\n $line2\n$line3\n $line4\n$line5", line1: "What shall a $dog_breed do?", line2: "$verbs through the $nature_place,", line3: "Then she $verbs her $dog_feature.", line4: "$human_action, I $human_verb", line5: "This $adj $noun of $emotion.", dog_breed: "labrador (4) | terrier | shepherd | beagle | poodle", dog_feature: "floppy ears | wagging tail | wet nose | playful eyes | soft fur", verbs: "runs | leaps | bounds | trots | dashes", nature_place: "meadow | forest | garden | park | beach", human_action: "Watching | Smiling | Laughing | Wondering | Marveling", human_verb: "contemplate | ponder | appreciate | cherish | admire", adj: "simple | joyful | precious | fleeting | eternal", noun: "moment | bond | connection | friendship | companionship", emotion: "love | happiness | wonder | gratitude | peace", } // Generate and print the poem 10 times for (let i = 0; i < 10; i++) { console.log(`Poem ${i + 1}:`) console.log(RiTa.grammar(rules).expand()) console.log() // Add a blank line between poems } ``` --- slug: influence tags: - growth description: "A list of folks that inspire me a bunch" title: "affecter." date: 2024-01-23 permalink: https://aarnphm.xyz/influence.html.md --- I think a lot about this [quote](https://aarnphm.xyz/quotes#life-jobs-smart) from Steve Jobs, and realised that you are who you surround yourself with. Whether online or in daily life, we often populate our minds and time with the people we hang around or work with. People I owe a lot to, including but not limited to: [Jacky](https://jzhao.xyz/), [Chaoyu](https://twitter.com/chaoyu_), [Sean](https://www.linkedin.com/in/ssheng/), [Hank and John](https://www.youtube.com/@vlogbrothers), [Kieran](https://www.fourtet.net/), Nicole, Jesse, [Tommy](https://tommytrinh.me/) --- slug: inspo tags: - technical - seed description: "cool stuff on the internet" title: "website inspiration" date: 2024-10-24 permalink: https://aarnphm.xyz/inspo.html.md --- ## website _see also: [portfolio trail](https://curius.app/aaron-pham/portfolio)_ - Brian Sholis’ website: clean visuals, great content ([link](https://www.sholis.com/)) - Jacky’s website ([link](https://jzhao.xyz/)) - daylightcomputer-inspired, but in pure CSS and [HTML](https://github.com/jackyzha0/sunlit) - Daylight Computer ([link](https://daylightcomputer.com/)) - - - clean aesthetics with nice hierarchical components - - warm, graphics, animation smooth - - cool ascii animations - - - cool visualisation of typing process - - wait terminal go brrr ## essay - - vintage, letter type - ## protocol - Willow: protocol for synchronisable data store ([link](https://willowprotocol.org/specs/index.html#specifications)) ## resources - - --- slug: movies tags: - evergreen description: "reconstructed source of https://aarnphm.xyz/movies" title: "movies."
date: 2024-02-07 permalink: https://aarnphm.xyz/movies.html.md --- A (mostly) up-to-date list of films, movies, and shows that I have watched or put on the watch list. > Similar to an [antilibrary](https://aarnphm.xyz/books), an anti-blockbuster is a collection of movies and short films that represent the art of film-making. Honourable mentions: [mubi](https://mubi.com/en/ca) and [a24](https://a24films.com/) ## to [watch.](https://aarnphm.xyz/thoughts/Cinematography) - [ ] The King of Comedy (1982) - [ ] Dead Poets Society (1989) - [ ] La Haine (1995) - [ ] Flame & Citron (2008) - [ ] Blue is the Warmest Color (2013) - [ ] Frances Ha (2012) - [ ] Dallas Buyers Club (2013) - [ ] Paterson (2016) - [ ] Manchester by the Sea (2016) - [ ] The Killing of a Sacred Deer (2017) - [ ] The Favourite (2018) - [ ] Under The Silver Lake (2018) - [ ] The Father (2020) - [ ] Poor Things (2023) - [ ] Maestro (2023) - [ ] Paris, Texas (1984) - [ ] Before Sunrise (1995) - [ ] The Defiant Ones (1958) ## recurring. ### vintage. - Citizen Kane (1941) - Casablanca (1942) - The Godfather (1972) - Chinatown (1974) - Scarface (1983) - Midnight Run (1988) - Goodfellas (1990) - Schindler’s List (1993) - Pulp Fiction (1994) - Forrest Gump (1994) - Good Will Hunting (1997) - Notting Hill (1999) - Chicago (2002) ### thriller. - The Breakfast Club (1985) - The Silence of the Lambs (1991) - My Cousin Vinny (1992) - The Shawshank Redemption (1994) - No Country for Old Men (2007) - Whiplash (2014) - Fury (2014) - The Revenant (2015) - La La Land (2016) - Hacksaw Ridge (2016) - Joker (2019) - The Banshees of Inisherin (2022) - Dune: Part Two (2024) ### comedy. - Intouchables (2011) - The Intern (2015) - Jojo Rabbit (2019) ### buster. - Saving Private Ryan (1998) - Fight Club (1999) - The Social Network (2010) - Hacksaw Ridge (2016) - Blade Runner 2049 (2017) - John Wick series (2014 - 2022) - Dune (2021) ### a24. - Ex Machina (2015) - Lady Bird (2017) - The Lighthouse (2019) - Uncut Gems (2019) - The Green Knight (2021) - The Tragedy of Macbeth (2021) - Everything Everywhere All at Once (2022) - Causeway (2023) - Past Lives (2023) - The Whale (2023) - Dream Scenario (2023) ### bond. - Dr. No (1962) - Goldfinger (1964) - Never Say Never Again (1983) - Octopussy (1983) - Casino Royale (2006) - Skyfall (2012) - Spectre (2015) ### wes anderson. - Rushmore (1998) - The Royal Tenenbaums (2001) - The Life Aquatic with Steve Zissou (2004) - The Darjeeling Limited (2007) - Fantastic Mr. Fox (2009) - Moonrise Kingdom (2012) - The Grand Budapest Hotel (2014) - Isle of Dogs (2018) - The French Dispatch (2021) - Asteroid City (2023) ### christopher nolan. - Following (1998) - Memento (2000) - Insomnia (2002) - Batman Begins (2005) - The Prestige (2006) - The Dark Knight (2008) - Inception (2010) - The Dark Knight Rises (2012) - Interstellar (2014) - Dunkirk (2017) - Tenet (2020) - Oppenheimer (2023) ### martin scorsese. - Mean Streets (1973) - Taxi Driver (1976) - The Wolf of Wall Street (2013) - The Irishman (2019) - Killers of the Flower Moon (2023) ### short. - The Wonderful Story of Henry Sugar (2023) ### shows. - Black Mirror - Bojack Horseman - True Detective (2014) --- slug: posts/2023 tags: - fruit - growth description: "2023: A letter to myself." title: "2023: a letter." date: 2023-12-31T00:00:00Z permalink: https://aarnphm.xyz/posts/2023.html.md --- _tw: self-harm. This is a public journal entry. Some of the following writing may contain information that you might find disturbing.
Please treat it with kindness and care should you choose to read it._ This is 2023. A letter to myself. I will start with a mere nostalgic reflection filled with the good and the not-so-good, and end with what I wish to accomplish for 2024. --- _To 2023 self,_ Per tradition, your year began with a visit from your parents. We decided to do SF-NY, with good food, shopping, and visiting relatives. Living on a different continent, you’ve longed for the simple joys of home – those weekend returns, the comfort of a home-cooked meal, the tender gesture of cut fruits like in your younger days. Now, these moments are treasured yearly reunions, a home-away-from-home gathering, where relatives journey across the globe to rekindle familial bonds. Despite finding these gatherings overwhelming, in your heart, you cherish these visits, holding onto the warmth of your parents’ presence. Returning to San Francisco was like stepping back into a vibrant painting, a city pulsating with life and a myriad of experiences. It was here, amidst its dynamic streets and scenic vistas, that you found writing and reading, surrounded by a community of inspiring individuals. And it was in San Francisco that you and J crossed paths. Running became a rhythmic solace amidst sleepless nights and turbulent thoughts. It was a discipline that anchored you, a steady presence in the chaos of young adulthood. You would find yourself lacing up for a run, whether it was pushing through work until dawn or embarking on a 5-mile run along the Bay regardless of how tired you were. In a sense, running offered stability, a means to channel your energy and thoughts, though often at the expense of your physical and emotional health, under the guise of youthful resilience. _“You are young; you should be fine,”_ you would tell yourself, perhaps a bit too cavalierly. Then came the spontaneous decision to go backpacking in Yosemite with S. Despite not being in the best shape, the allure of adventure was irresistible. It was a journey of firsts – your inaugural backpacking trip, your first visit to the awe-inspiring Yosemite, and your first encounter with the chill of near-freezing nights outdoors. Each moment was a revelation, an invitation to embrace the unfamiliar and challenging, a vivid reminder to savor every new experience life offered. ![](https://aarnphm.xyz/posts/2023/../../posts/images/2023-collage-yosemite.svg) --- Quoting [Nietzsche](https://aarnphm.xyz/posts/2023/../../thoughts/Philosophy-and-Nietzsche), _“To live is to suffer, to survive is to find some meaning in suffering”_. I don’t know about you, but the moment life hints at normalcy and tranquility, a restless itch starts to stir within me. It’s like K often says with a knowing smile, “you are a messy gyal.” There’s a peculiar comfort in chaos, a familiar embrace in the whirlwind of change that I’ve always gravitated towards. Now, as I stand at this crossroads in the Bay Area, that restlessness is more pronounced than ever. I had taken a leave from school to move to the Bay for work, a decision that now hung in the balance. The Bay Area beckoned me to stay. Here, life was a beautiful mosaic of experiences – doing what I love, being surrounded by friends, and cherishing those weekends with J. Yet, it was shadowed by the looming uncertainties of visa statuses, a constant undercurrent of anxiety about the future. The alternative, returning to Canada, loomed like a storm cloud.
It’s a retreat into a past that’s drenched in discomfort, a reversal into what I’ve always perceived as a life of constraints and unfulfilled potential. The very thought of leaving J, hitting pause on our shared dreams in San Francisco, sends a pang of sorrow through me. Canada isn’t just a different location; it’s a return to a version of myself that I’ve struggled to leave behind. The visa challenges remained, a familiar yet unwelcome companion, no matter which border I call home. In the quiet moments, your mind wrestled with these paths, each fraught with its own set of fears and what-ifs. “Stop, don’t leave. You can do it. Stay,” a voice within you whispered, a blend of hope and desperation. It was a plea to cling to the life you’ve started to build here, to not let go of the joy and love you’ve found. This internal dialogue became your constant soundtrack, a reflection of the turmoil that dances within your heart. It was a Sunday afternoon; you and J were enjoying a cup of coffee in the Marina. Like whispers of the gentlest breeze, the wind danced through J’s hair, each strand a melody, weaving tales of love in the air. It carried J’s scent, a tapestry of [rose](https://aarnphm.xyz/posts/2023/../../thoughts/Scents#le-labos-rose-31) subtly entwined with earth’s warm embrace, a tender symphony barely touching the senses. You whispered in her ear, _“I have to leave for Toronto.”_ Her hair, once a playground for your fingers, now swayed to the rhythm of a compassionate wind, each strand moving with the grace of unshed tears. The air, perfumed with the delicate scent of roses and spices, seemed to hold our memories, cradling them gently as if to soften the blow of parting. Our eyes met, tinged with the inevitable sorrow of farewell. Words were unnecessary; our hearts spoke in silent verses, each beat a soft adieu. It was a parting not of anger or regret, but of two souls acknowledging their journey together had reached a tender, inevitable end. We both sat there and cried in silence. The gentle wind, a compassionate witness to our farewell, carried away the last whispers of a love that was as beautiful as it was ephemeral, leaving behind a calm, poignant tranquility. --- It was now sunny July, and you found yourself back in Canada, slowly acclimating to the new life. The makeshift bed, consisting of two fitted sheets, a duvet, and a pillow while waiting for the furniture to arrive back from SF, offered a modest comfort yet lacked the essence of home — a feeling that remained elusive, a sense of displacement that gently lingered. This wasn’t your first rodeo. Relocation has somewhat become normalcy for you: leaving Hanoi for boarding school seven years ago, then moving across Canada for university and living in campus housing, to moving into student housing, _alone_ amidst 2020’s misfortunes, returning to Vietnam shortly after, then moving back to Canada for online university, living in an overcrowded, unhygienic household filled with strangers, followed by your determination to leave Canada once and for all for SF, to chase the “American dream”. However, this time, the feeling stirred differently. Gone was the wide-eyed public school kid who first stepped onto Canadian soil, filled with aspirations. Faded, too, was the image of the bewildered freshman adrift in a sea of unfamiliar faces at university. And the weary, drained engineer who sought refuge in San Francisco, seeking an escape, had evolved.
Now, as you sat amidst the quiet of your new space, you grappled with a curious blend of familiarity and foreignness, a paradox yet to be unraveled. It was as though each move had subtly reshaped you, leaving you at this juncture—a point where the past’s reflections and the present’s realities were gently converging, weaving a tapestry of your journey, both unique and universal. In this moment, you were at the cusp of reconciling these myriad selves, each a chapter in the unfolding story of your life. Staring into the abyss, you wonder what would unfold in this next chapter of life… --- There, in the quietude of your new surroundings, you embarked on a pilgrimage of the self. It was a journey marked not by physical distances but by the rich, inner landscapes you traversed. In the company of books – those silent yet eloquent companions – you sought refuge. The philosophers, with their timeless musings, the historians narrating tales of yore, and the modern sages offering insights of the present, became your guides in this quest for understanding. You also rekindled old friendships, those that had lain dormant in the wake of your sojourn to San Francisco. It was as if you were gathering scattered pieces of a once-familiar mosaic, each friend a fragment of a life you once knew. There was a sense of quiet accomplishment in the gradual transformation of your apartment. Each piece of furniture was a testament to a life being patiently rebuilt, piece by piece. Physical exertion, too, found its place in your routine – the climbing gym, the disciplined rhythm of your runs, a pursuit of wellness that contrasted with the less tangible journey of the mind. The runs, though lacking the scenic vistas of San Francisco, offered a subtler, more introspective landscape. Work, too, assumed a new significance with [OpenLLM](https://aarnphm.xyz/posts/2023/../../thoughts/work#openllm----serve-fine-tune-and-deploy-llms-in-production). It demanded of you a pace and a depth of understanding that was both exhilarating and daunting. The ability to assimilate, to adapt swiftly, became what you grew accustomed to. Then there was HackTheNorth. Convincing S to sponsor HackTheNorth, and your subsequent workshop on language models, was not merely a professional victory, but a reconnection to a vibrant belief in hacker culture, filled with anticipation and excitement for building technology. ![](https://aarnphm.xyz/posts/2023/../../posts/images/2023-collage-htn.svg) --- ![](https://aarnphm.xyz/posts/2023/../../posts/images/2023-collage-heal.svg) You [showed up](https://x.com/daniellefong/status/1732922352244302196). You showed your love and affection for your friends through the warmth of home-cooked meals. Potlucks, tasting menus - they were your ways of nurturing the bonds of friendship, a respite from the pressures of student life. Remember that Halloween, when you cooked a feast, ensuring your friends were well-fed and ready for a night of revelry? Surrounded by the sizzling skillet and steaming hot mashed potatoes, you found a sense of belonging. You’ve never seen yourself as the quintessential party-goer, often feeling like an observer on the fringes of the festivities. But you went, drawn by the camaraderie, even as a part of you remained reluctant. At the party, a familiar sensation crept in – a detachment, a subtle unravelling of your connection with the scene around you. Your inner id, usually so deeply buried, surfaced to whisper a stark truth: you didn’t quite mesh with this crowd.
This realisation triggered a rush of anxiety, a feeling that swelled like a wave, urging you to escape, to find solace in the quiet of your own space. So, you left. You left the noise, the laughter, and returned to the silence of your home. There, in the aftermath of the evening’s earlier warmth, you were greeted by the remnants of your culinary endeavours – the pots, pans, and utensils bearing testament to the meal shared in love and friendship. In the stillness of your kitchen, a profound sense of loneliness enveloped you. You sat there, amidst the silent witnesses of your earlier joy, and tears began to fall. It was a poignant contrast – the joy of cooking for others and the solitary ache of feeling out of place, misunderstood. --- _Remember that breakup with J?_ The summer was a portrait of heartbreak, painted in shades of sorrow and restless nights. It’s funny how we try to mend ourselves, isn’t it? With a schedule as a plaster over a gaping wound. I had it all mapped out, or so I thought. But life, in its infinite jest, has a way of upending even the best-laid plans. It was on a nondescript day, November 13th, that I found myself on a date with a woman I’d met in the digital maze of online dating. The evening was unremarkable, tinged with the effort of trying to reconnect with the world. We ended up at her place - an encounter that was at best, mediocre. In the midst of the intimacy, memories of J invaded my mind, unbidden, like ghosts from a past life. J and I, we were polyamorous. Unorthodox, yes (but not really in SF), but to each other, we were anchors. My reluctance to move back to Canada was rooted in her – she was my ‘it’, my endgame. And then, as if summoned by the universe, a message from J pierced the night. Her words, simple yet loaded, unravelled me. We had agreed to silence, to give time and space for healing. But there I was, haunted by the love that embraced my most authentic self, the part of me unshielded by the armor I’d forged over the years. That night was a symphony of restlessness, the presence of another unable to fill the void. 3:30 am, my phone shattered the silence – it was J. Panic and longing intertwined as I answered. _What harm could there be?_ What followed was a mosaic of late-night conversations, spanning many weeks. J’s voice, laced with tears, spoke of longing and loss. Our talks were a roller-coaster of emotions – laughter quickly drowned by arguments, smiles eclipsed by sorrow. I was a cocktail of anger and sadness; I had moved on, or so I had convinced myself. Why now, in the midst of this? J’s behaviour was a mystery, a deviation from her usual sensibility. And there I lay, sleep eluding me, troubled by the thought of her distress. It was a pain that seeped deep into my bones, a relentless reminder of a love that refused to be buried. One morning, you found yourself seeking refuge in kitchen. It’s curious how, in times of turmoil, we gravitate towards the mundane, the ritualistic. There’s a certain healing power in cooking – the methodical chopping of vegetables, the hiss and dance of ingredients in the skillet, the rich tapestry of scents that fill the air. But even in this culinary cocoon, the spectre of J haunted you, infusing your silent tears with the bitterness of memory. As you lost yourself in these reflections, a momentary lapse in attention brought a sharp pain – a startling intrusion into your reverie. A drop of blood bloomed on the cutting board, a vivid contrast against the muted colours of the vegetables. 
The sight of it, coupled with the realisation that you had inadvertently cut your finger, brought a wave of lightheadedness. Yet, even as the shock set in, you instinctively reached for a towel, pressing it firmly against the wound. With a calm born of necessity, you navigated your way to the first-aid kit. Your hands, guided by a survival instinct that momentarily eclipsed the overwhelming thoughts of J, worked diligently to clean and dress the wound. After tending to the injury, you slumped against the fridge, your gaze drifting aimlessly to the ceiling. In an instant, a thought flickered through your mind – the notion of ending it all. But just as quickly as it surfaced, it dissipated at the thought of your mother. The image of her, perhaps unaware of the depths of your current struggles, yet invariably intertwined with your existence, acted as a grounding force. In the quiet of your kitchen, with the pain in your finger a sharp but grounding sensation, you were left to confront your ‘ego’ – the pain, the emotion, the longing, the love, and the indomitable will to endure. --- Navigating the aftermath of a first serious relationship is akin to finding one’s way through an uncharted wilderness, especially for someone who had always embraced solitude. My relationship with J was a journey into unexplored emotional depths, a discovery of a love both profound and transformative. Yet, when it ended, I was adrift in a sea of emotions, overwhelmed like a teacup caught in a relentless downpour. In relationships, we often find ourselves surprised by the depths and complexities of those we hold close. J was a revelation in this sense, a mirror to parts of myself I hadn’t known. But as the emotional turbulence continued, my logical self, long subdued, finally asserted itself. It whispered of the need for closure, for the sake of my own well-being. The final call to J was a bridge between past and future, a necessary severance, blocking all lines of communication going forward. This decision, difficult as it was, felt like the only way forward, a path to healing for both of us. Sharing experiences with Mom did lift a weight off your shoulders. It marked a turning point, a chance to truly move on. And before you knew it, Christmas break was upon you. Your return to school was marked by a fresh perspective, one shaped by your stint in SF. School is now a place for you to explore your interests and experience the joy of learning, as it should be. For the first time, you found joy in the very structure of academia. ![](https://aarnphm.xyz/posts/2023/../../posts/images/2023-collage-finals.svg) --- 2023’s Aaron did: - Work-wise, [OpenLLM](https://aarnphm.xyz/posts/2023/../../thoughts/work#openllm----serve-fine-tune-and-deploy-llms-in-production), we actually made revenue this year, and got to work with some very, very cool companies!! You also did [buildspace S4](https://buildspace.so/) - Favourite movie that made me cry has to be [Past Lives](https://www.youtube.com/watch?v=kA244xewjcI\&ab_channel=A24). The quintessential symphony of my journey so far. - Favourite restaurant is [CIMA](http://www.cimaonlocke.ca/). The food is amazing and I love the staff there. I have cried here many, many times. - I found philosophy somewhat cumbersome before, but this one class in university did change my perspective on the subject.
Read [Nietzsche](https://aarnphm.xyz/posts/2023/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche)’s work, explored metaphysics, and found _Beyond Good and Evil_ to be my favourite for 2023. - Expanded my vinyl collection, including Daft Punk, Fleetwood Mac, Led Zeppelin. - Made some house [tunes](https://www.youtube.com/playlist?list=PLsRPzRsbp3lCxe4gXH4S4Zf38X_45Oj6N), very much inspired by Fred again, Four Tet, and Peggy Gou. You are still finding your tone, but keep working at it. The following was hacked together in an afternoon: [](https://aarnphm.xyz/posts/2023/../../posts/images/2023-flac-1.mp3) _ID 1_ - Also, that one YouTube video I kept playing on repeat is [this one from the Lot Radio](https://www.youtube.com/watch?v=hvO0PrMBH9I\&ab_channel=TheLotRadio), or anything from this [query](https://www.youtube.com/results?search_query=four+tet+lot+radio) - Favourite object is this [10-inch pan](https://madeincookware.com/products/stainless-steel-frying-pan/10-inch). I kid you not, having a stainless steel pan feels like a hack. Absolutely love this bad boy. Second favourite object is this [turtleneck](https://www.ralphlauren.ca/men-clothing-sweaters/wool-cashmere-turtleneck-sweater/625236.html?dwvar625236_colorname=Vest%20Olive%20Heather#q=turtleneck\&br=t\&fq=division%253A%2522Men%2522\&start=1). I wore this pretty much everywhere. If you see me IRL, chances are you saw me wearing this. --- Looking back, twenty-twenty-three was filled with moments of joy and sorrow, of love and loss. What I want for 2024: - `atelier with friends`, where you can pay what you think the meal is worth. I want to do at least _10_ this year. - Continue in the rabbit hole of philosophy: Deleuze and Camus - I want to tend to my garden a bit better. There are too many `draft` and `noindex` notes that need to be taken care of. Mainly because 2023 was pretty much turmoil galore 🤗 - Learning to _let go_, and boundaries. - Finish that G2. (_ok I do need to get a driver’s license_) - Apprendre le français Kindly, _Your present self_ --- slug: posts/Chaos tags: - sapling - growth - self description: "on growing one year older. And a few things I learned growing up in a foreign land." title: "Chaos is intuitive yet disheveled." date: 2024-02-18 permalink: https://aarnphm.xyz/posts/Chaos.html.md --- _This is an extension of my notes on [chaos](https://aarnphm.xyz/posts/Chaos/../../thoughts/Chaos)_ ![Passage by Giorgio Morandi](https://aarnphm.xyz/posts/Chaos/../../posts/images/passage-giorgio-m.webp) Passage by Giorgio Morandi Chaos isn’t merely an undercurrent of life; it’s a pervasive force, ever-present, often simmering just beneath the surface, ready to erupt and manifest in myriad forms. It serves not only as a backdrop in the narratives of storytellers and the musings of philosophers but also as a distinct entity with the power to challenge those brave enough to embrace its unpredictability. To move abroad, to step into the unknown, is to court chaos – to acknowledge and accept the inevitability of change and the sharp tang of constant motion. So far, I’ve lived on my own (or far away from family) for a third of my life, having made the leap to Canada at 16. This move, though seemingly late compared to my high school peers, was a turning point. It wasn’t just a change of scenery; rather, it formed a new way of seeing and being in the world for me.
To articulate the essence of moving to a new continent, let alone partaking in the Western [educational](https://aarnphm.xyz/posts/Chaos/../../thoughts/education) system, is, still to this day, a task fraught with complexity that I have yet to comprehend. In the years before my departure, I was enrolled at [Hanoi-Amsterdam](https://en.wikipedia.org/wiki/Hanoi_%E2%80%93_Amsterdam_High_School), which some consider the “crème de la crème” of the public school system in Vietnam. Middle school was pretty much an endless march of memorisation and night classes, all leading up to the high school entrance exams. Within this rigorous routine, there was no room for complaints or questions. I wasn’t content, yet I found a way to push through, not realising the toll it was taking on my mental and physical health. Therapy, attempted much later, didn’t reveal anything new. Perhaps my continued sessions are a search for the external validation that I’ve longed for. My sense of [self](https://aarnphm.xyz/posts/Chaos/../../thoughts/Value) was intertwined with being accepted into this institution. --- ```poetry language=fr Three weeks before the entrance exam, or something like that. Saturday afternoon. ``` The sun blazed down with a ferocity that seemed almost personal, its rays relentless against the backdrop of an afternoon sky devoid of clouds. Inside, within the four walls of the room where I had spent my years growing from a child into something resembling an adult, I sat hunched over my literature review. The task was simple in theory: memorize one of three essential poems. Yet, as the sunlight fought its way through the window, casting a harsh light on the pages before me, the words seemed to dance and dodge my grasp, refusing to be tamed. My focus was a blade, dulling with each failed attempt to carve the verses into my memory. The stillness of the room, a stark contrast to the turmoil within me, was punctuated only by the occasional creak of the house settling, as if it too strained under the weight of the heat. The air was thick, the kind of heat that makes the mind sluggish, the body weary. It was as if the entire world outside had paused, holding its breath, while I waged my silent battle within these familiar walls. Frustration mounted within me, a tide that threatened to breach its banks. I pressed on, the words of the poem blurring before my eyes, each line a testament to my faltering resolve. My mom, ever attuned to my struggles, sensed my distress. Her suggestion to move on was gentle, her words soft, “It’s okay, darling, let’s skip this one.” But to me, they sounded like a verdict, a confirmation of my fears. At 15, her words did not offer the comfort she intended. Instead, they unleashed the floodgates, and tears streamed down my face, a silent scream of defiance and despair. ```poetry language=fr Mom, I can't fail this exam. ``` I managed through sobs, the words thick in my throat. The room, with its memories and familiar comforts, felt suddenly alien, a witness to my vulnerability. In that moment, the outside heat, the oppressive stillness, and the chaos of my inner turmoil melded into a single, inescapable reality. --- Even after securing my place at Hanoi-Amsterdam, my disdain for it grew. The competitive and toxic atmosphere was a far cry from what I expected. It was a battleground for status, with little regard for collaboration or personal growth. My mom, herself an educator, saw the system’s failure to nurture curiosity or critical thinking.
So, when the chance to study abroad presented itself, I seized it, leaving Vietnam behind. This decision marked the start of a tumultuous journey within. [Entropy](https://aarnphm.xyz/posts/Chaos/../../thoughts/Entropy) was seemingly first introduced to me in the form of the Canadian education system. The transition from the rigid, rote-learning environment to the more open, discussion-based system in Canada was jarring. The shift from a public school to a private boarding school was equally disorienting. The culture shock was palpable, and the adjustment period was fraught with challenges. I was a stranger in a strange land, a fish out of water, and the chaos of my new reality was overwhelming. I was completely baffled, destroyed, and up to no good (if you knew me, you know what I’m talking about!). But one thing that I have learnt from all the trauma accumulated throughout my experience at Amsterdam was that “Mama ain’t raised no quitter.” Thus, it was a not-quite-okay-but-found-a-functional-way-to-survive mental model that carried me through high school. Then the rest was history. Seemingly, this untamed, curious inner child, still clinging to my being, propels me forward. It is that inner chaos that encourages me to embark on this journey of understanding. > The world is a scary place, but I’m learning to cope through it. The [Übermensch](https://aarnphm.xyz/posts/Chaos/../../thoughts/Philosophy-and-Nietzsche) crossed over the bridge and guided me through the trenches of life. --- ![that one bar that I visit too often](https://aarnphm.xyz/posts/Chaos/../../posts/images/cima.webp) that one bar that I visit too often > I’m not sure where I want to go from here. Writing it down felt like opening a door I have long left shut. Each word was a step deeper into memories I had neatly folded away, not realising how much they still pulsed with life beneath the surface. Each of them felt like a sword that carved deep into the heart, prying open the floodgates of emotions long buried. It’s one thing to carry your past quietly within you, another entirely to lay it out for the world—and yourself—to see. Suddenly, the chaos I thought I had managed whispered louder, demanding attention. [Equanimity](https://aarnphm.xyz/posts/Chaos/../../thoughts/Chaos#versus-equanimity), that state of calm balance, feels elusive, almost mythical, when you’ve danced with chaos so intimately. It’s as if I’ve befriended the storm, finding a strange comfort in its unpredictability, its relentless energy. This chaos, it doesn’t just disturb; it defines, shaping the contours of who I am, how I see the world. There’s a fear in tranquility, a suspicion of its silence. What does it mean to be at peace when you’ve grown accustomed to the noise? Yet, this journey—my journey—isn’t about conquering the chaos but learning to live with it, to see its patterns and understand its rhythms. Maybe equanimity isn’t about taming the monster but recognizing it as a part of the self, a reflection of the complexities and contradictions that make us human. The pursuit of balance isn’t a battle but a negotiation, a conversation with the parts of ourselves we fear and love in equal measure. Embarking on this exploration of different “entropic phenomena,” as I’ve come to call it, isn’t running away. It’s a [search](https://aarnphm.xyz/posts/Chaos/../../thoughts/Search) for understanding, a way to navigate the tumult with eyes wide open. There’s beauty in the chaos, lessons in the turbulence.
And perhaps, in acknowledging this, I move closer to the equanimity I seek—not as a destination, but as a way of being, fluid and ever-evolving, amidst the storms and stillness alike. --- Last but not least, I would leave you, future Aaron, with a few questions that past-Aaron has long sought to answer. Let us, the duality of self, partake in a [Socratic dialogue](https://aarnphm.xyz/posts/Chaos/../../thoughts/questions); hopefully, through the process, we can find some normalcy within ourselves: ## Q: who are you trying to become? A: Perhaps it is less about becoming and more about unravelling the complexities from within. There is a certain naive desire, a childlike curiosity, that propels me towards the unknown, the seas of uncertainty. In embracing this naive desire, I become a vessel of my own making, navigating the complex seas of existence. As it may be, at the moment, I’m trying to protect that child and shield him from the turbulence and chaos we call life. ## Q: why can’t you move back home? A: Consider the river and the dam. The river, a living artery, courses from its source with a purpose as clear as its waters. It meanders, shaped by the land it traverses, until it reaches the dam. Here, it lies in a deep reservoir, a body of water in waiting, destined to flow through turbines and continue its journey downstream. This cycle is perpetual: the sun draws the water skyward, and it returns as rain, nourishing the earth on its way back to the river. But the droplets that return are transformed, no longer the same entities that once rested in the dam’s embrace. The act of leaving one’s home for foreign shores is akin to such a journey - a voyage of transformation, of encountering new landscapes, of merging with unfamiliar currents. When one leaves home, they embark on a trajectory vastly different from those who stay. The familiar becomes distant, and upon return, the once-known world feels alien. You stand apart, changed in the eyes of those who remember who you once were. “Home” remains a static concept, a memory preserved in amber, while you, like the river, have been irrevocably altered by your experiences. _In other words, this is often known as [the theme of displacement](https://aarnphm.xyz/posts/Chaos/../../thoughts/displacement)_ To return home is to face a poignant paradox: the physical space may be unchanged, the same faces may greet you, the house of your childhood may still nestle in its familiar spot, but your perception of it all has shifted. Gone is the person you once were; now you have become the confluence of experiences that mold the “now” you, just as the returning water is forever changed by its journey. Yet, despite these changes, the essence remains. The being of ‘aqua’ remains unchanged, as the inner child within us persists. It is this unchanging essence that bridges the gap between the person we have become and the place we once called home. The question, then, is not why you cannot move back home, but rather, how can one reconcile the transformed self with a place that is both intimately familiar and strangely foreign, a place etched in memory, unchanged by time yet estranged by the journey’s passage. ## Q: what do you want to achieve? A: I want to achieve a sense of peace, a balance between the chaos and the stillness. Navigating the tumult with grace, and learning to let people in.
I want to look back on what we have gone through: the stillness, the moments of joy and sorrow, and know that I have lived fully, embracing the complexities and contradictions that make me human. I want to settle down, finding a place where you truly find happiness, and sparring partners that will help you enjoy the journey a lil bit more. ## Q: what is next? A: Change is hard; it pushes us from the comfort of our well-defined boundaries, daring us to step beyond the familiar. It whispers of growth, of the necessity to stretch our skins beyond the contours of our current selves. This leap, from one domain to another, is fraught with challenge, yet it pulses with the thrill of exploration. Yet, in this era, the drive for transformation often crashes against the shores of economic reality. Monetary value trickles in sparingly, hardly enough to spark the fires of self-renewal. Chaos, in its disdain for the stagnant, scoffs at the notion of safety. Safety, a gilded cage, stifles growth, ensuring that within its confines, we remain less than what we might become. Life, then, poses its eternal riddles: Why does fear of the unknown paralyze us so? How do we stand firm in the belief that we are not solitary wanderers in this vast expanse? The warmth of unseen affections often goes unnoticed, yet in the heart’s quiet moments, we understand that our absence would echo in spaces we have touched. The world, with its myriad terrors and wonders, unfolds before us, a realm where the overman’s gaze might fall upon us. Yet, this overman, this ideal, is but a mask, a collective facade beneath which we all seek refuge. An unexpected call from a high school friend, a rarity, blooms like a flower in the desert. It’s a testament to the enduring nature of connections, a comforting reminder that amidst the vastness, there are anchors, points of light in the familiarity of shared pasts. But the immensity of it all can be overwhelming. Life teems with endless possibilities, a ceaseless buzzing that fills the mind with anxiety. The world, too large, our time, too fleeting, and the soul, too eager, finds itself adrift in a sea of potential paths. I’ve learned the art of detachment. People, with their inherent unpredictability, often disappoint. By tempering expectations, we shield ourselves from the sting of disillusionment. Camus mused on alienation, a reflection on the distance between the self and the other, a chasm often widened by unmet expectations. What lies ahead is a question that perpetually dances on the edges of my thoughts, a melody whose tune is both haunting and invigorating. Perhaps the answer to this enigma doesn’t reside in a single destination or outcome but rather in the delicate equilibrium between the facets of my being. On one hand, there’s the driven Aaron, fueled by curiosity and a relentless pursuit of excellence. This Aaron is a force, a whirlwind of ambition and determination, always pushing forward, always reaching for the next peak to conquer. On the other hand, there exists another Aaron, one who carries the weight of past hurts and seeks not just to advance but to heal. This Aaron understands that growth isn’t solely about personal achievements but also about nurturing and repairing the web of relationships that envelop him. This version of myself is attuned to the quiet, often overlooked work of mending bridges and soothing wounds, both his own and those of the people around him.
The path forward, then, might not be a straight line but a winding road that requires navigating the complexities of these dual identities. It’s about recognising that the quest for achievement and the journey toward healing are not mutually exclusive but are, in fact, complementary forces. By embracing both the drive to excel and the need to heal, one can forge a way forward that honours the entirety of your aspirations. In this balance, You might find not just the next step but a deeper understanding of what it means to truly live. It’s about making peace with the multifaceted nature of my desires and recognising that every facet, whether driven by ambition or the need for connection, plays a crucial role in defining who I am and who I aspire to be. The road ahead is one of integration, where the driven and the broken parts of me walk hand in hand, each lending strength to the other as I continue to explore the vast landscape of possibilities that life offers. With regards, Anh P. ![young Aaron through time](https://aarnphm.xyz/posts/Chaos/../../posts/images/aaron-younglings.ignore.jpeg) young Aaron through time --- slug: posts/Questions-about-Apology tags: - philosophy - fruit - philos1aa3 description: "Questions about Plato's Apology" title: "Questions about Apology" date: 2023-11-09 permalink: https://aarnphm.xyz/posts/Questions-about-Apology.html.md --- In Plato’s [Apology](https://aarnphm.xyz/posts/Questions-about-Apology/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato#apology), Socrates delineates a distinct boundary between pursuing a life of justice and engaging in politics. He posits that a life devoted to righteousness is fundamentally at odds with the realm of political involvement (Apology, pp. 41-42). Through the Socratic dialogues of Socrates’ own trials, Plato illustrates an examination of the moral and ethical foundations of the Athenian society and political system, underscored by Socrates’ assertions, _“He who will fight for the right, if he would live even for a brief space, must have a private station and not a public one”_. Consequently, I find myself aligning with Socrates’ perspective, asserting that leading a just life is an endeavour incompatible with holding political office. [Socrates](https://aarnphm.xyz/posts/Questions-about-Apology/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates) ascribes his abstention from political engagement to a divine mandate (“a voice”) directing him towards the pursuit of [truth](https://aarnphm.xyz/posts/Questions-about-Apology/../../thoughts/Will-to-Truth) and virtue. At the onset of the trial, Socrates mentions a prophecy from the Oracle of Delphi, which declares that “no man \[is] wiser” than Socrates. Spurred by this proclamation, Socrates engages with those renowned for their wisdom through scrutiny and questioning in an attempt to unravel the oracle’s message. Yet, none could furnish satisfactory answers to his inquires, leading Socrates to a profound realisation that his true wisdom is rooted in recognising his own ignorance for knowing nothing (Apology 21a-23b). Socrates then embarks on a path of reminding those around him always to use intellect to scrutinise their lives and questions whether they live their life truthfully, embodying a commitment to virtuous living (Apology, 23b-23d). This philosophy is further emphasised in his dialogue, where the statement “the unexamined life is not worth living” encapsulates his conviction and mission to lead a life rooted in truth and virtue. 
Socrates’ commitment to a virtuous life often stood in stark contrast to the political pragmatism of Athens, a discrepancy that earned him numerous adversaries over the years as he advocated for this way of life. Throughout his trial, Socrates shed light on the corruption ingrained within the Athenian democratic system, as evidenced by the charges against him—corrupting the youth and displaying impiety towards the Athenian pantheons (Apology 24a-28b). These accusations stemmed from his associations with individuals who had fallen out of political favour in Athens post-Peloponnesian War (Britannica). Holding to the principle that “injustice and disobedience to a better, whether God or man, is evil and dishonourable,” Socrates found the notion of partaking in the “public life” of Athens’ turbulent political scene unpalatable, especially when faced with political decisions (Apology 24d, 25a). Hence, he chose to steer clear of a “public life,” recognising that the political domain, fraught with inherent compromises, could lead individuals towards committing injustices, thereby tarnishing the soul. In Socrates’ view, it was his duty as a philosopher to uphold moral integrity without succumbing to the compromises inherent in politics. Socrates even suggests that death is preferable to a life of dishonesty or moral compromise (38a, 30c-d). His willingness to face death rather than retract his philosophical beliefs during his trial epitomises this stance. By abstaining from political life, Socrates was able to dedicate himself to a life of virtue and truth, even at the cost of his own life. Through this choice, Socrates exemplifies the notion that a life worth living is one committed to higher principles rather than personal or political gain. While I understand this stance, I find it somewhat implausible, as I believe a life worth living necessitates a balance between moral integrity and political engagement rather than solely focusing on maintaining a high moral compass. If one aligns solely with Socrates’ ideas, there’s a risk of being perceived as selfish for not seizing the opportunity to effect positive change. Historical figures like Martin Luther King Jr. embody a different ideology by embracing political engagement to drive substantial changes for the betterment of society (Strauss, B). In conclusion, [Plato](https://aarnphm.xyz/posts/Questions-about-Apology/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato)’s Apology articulates Socrates’ persuasive arguments regarding the dangers inherent in political life for individuals committed to justice and truth. Socrates’ life and trial serve as poignant exemplars of these challenges within the historical context. The core of Socrates’ philosophical inquiries and thought-provoking arguments, which challenge the values and norms of Athenian society, suggests that total withdrawal from public life is the sole path for a philosopher whose mission is to pursue truth and maintain personal integrity, irrespective of the political climate. While these ideas may hold relevance in a specific context, I argue that they are implausible. A nuanced balance between political engagement and moral integrity should be the cornerstone one aims for to lead a life that is truly worth living. ### References Plato. (n.d.). Apology. Translated by Benjamin Jowett. Retrieved from . Encyclopaedia Britannica. (n.d.). Background of the trial - Socrates. Retrieved from . Strauss, B. (n.d.). Martin Luther King Jr. and Socrates.
Retrieved from . \_feedback: criticising Apology arguments, perceived as selfish. Socrates in general with appearance vs. essence (value essence over appearance), so long as you are actually selfish; Allegory of the cave, for example. Devaluation of opinion. Earlier in the Republic, a thought experiment (the just who looks unjust vs. the unjust who looks just) ⇒ the life of the just would be better to live.\_ --- slug: posts/Questions-about-Metaphysics tags: - philosophy - fruit - philos1aa3 description: "Questions about Aristotle's Metaphysics" title: "Questions about Metaphysics" date: 2023-11-16 permalink: https://aarnphm.xyz/posts/Questions-about-Metaphysics.html.md --- In reflecting upon [Aristotle](https://aarnphm.xyz/posts/Questions-about-Metaphysics/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Aristotle)’s [Being qua being](https://aarnphm.xyz/posts/Questions-about-Metaphysics/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Being-qua-being), especially his demarcation of wisdom from mere experiential and technical knowledge, it becomes compelling to juxtapose his perspectives with the more fluid conceptions of knowledge proposed for the modern day by [Nietzsche](https://aarnphm.xyz/posts/Questions-about-Metaphysics/../../thoughts/Philosophy-and-Nietzsche) and [Freud](https://aarnphm.xyz/posts/Questions-about-Metaphysics/../../thoughts/Freud). Nietzsche’s theories, emphasising the subjective nature of knowledge, alongside [Freud](https://aarnphm.xyz/posts/Questions-about-Metaphysics/../../thoughts/Freud)’s insights into the unconscious dimensions of human comprehension, present a stark contrast to Aristotle’s more structured paradigm. Aristotle delineates wisdom as a form of knowledge superior to others, stating, _“For the wise man must not be ordered but must order, and he must not obey another, but the less wise must obey him”_ (Metaphysics, Book 1, Chapter 2). This hierarchical and seemingly rigid distinction appears less pertinent in contemporary discourse, where the boundaries between various domains of knowledge are increasingly permeable and intertwined. My argument posits that while Aristotle’s framework offers a valuable basis for understanding wisdom, a modern interpretation of wisdom should not only incorporate a philosophical understanding of universal truths but also embrace the dynamic and ethical application of knowledge in varied contexts. Wisdom, in today’s world, goes beyond simple comprehension or command; it encapsulates adaptability, cooperative engagement, and the sophisticated application of knowledge in addressing the complex challenges that define our times. [Aristotle](https://aarnphm.xyz/posts/Questions-about-Metaphysics/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Aristotle) establishes a clear hierarchy among experience, knowledge, and wisdom, positing that while experience is valuable for practical action, it falls short of constituting true knowledge or wisdom. He notes, _“With a view to action, experience seems in no respect inferior to art… But yet we think that knowledge and understanding belong to art rather than to experience…”_ (Metaphysics, 132). Here, experience is depicted as the practical application of skills, a necessary but insufficient component of deeper understanding. In contrast, knowledge, particularly in forms like art or technical mastery, is portrayed as encompassing a comprehension of underlying principles and causes.
Furthermore, Aristotle demarcates knowledge as a progression beyond experience, implying a deep understanding of the ‘why’ behind things. He states, _“For men of experience know that the thing is so, but do not know why, while the others know the ‘why’ and the cause”_ (Metaphysics, 132). Here, knowledge represents a transition from simply acknowledging facts to understanding their foundational principles and broader implications. This includes both ‘technē,’ a kind of knowledge relevant to making things (craftsmanship or art), and ‘epistēmē,’ scientific knowledge. These forms of knowledge are not just about knowing facts or processes; they involve understanding the principles and causes behind them. Wisdom (_Sophia_), according to Aristotle, is the pinnacle in this hierarchy. In discussing the nature of sciences and their quest for understanding, he observes, _“Clearly then Wisdom is knowledge about certain principles and causes” (Metaphysics, Book 1, Chapter 1)_. This assertion posits wisdom not as a mere collection of knowledge but as a synthesis of practical know-how, theoretical understanding, and philosophical introspection. It is through this synthesis that one apprehends the fundamental nature of reality. In Aristotle’s philosophical construct, wisdom thus signifies a deep and comprehensive grasp of universal truths and causes, transcending the limitations of both practical experience and technical knowledge. Wisdom is characterised by an ability to teach and understand the causes in every branch of knowledge. He views wisdom as the highest form of knowledge, one that seeks to understand the ultimate causes and principles of all things. In the modern world, the distinction between experiential knowledge and wisdom, as outlined by Aristotle, seems less rigid. This perspective is further challenged by the contributions of thinkers like Nietzsche and Freud, who bring unique insights into the nature of knowledge. Nietzsche’s concept of [perspectivalism](https://aarnphm.xyz/posts/Questions-about-Metaphysics/../../thoughts/Philosophy-and-Nietzsche#thus-spoke-zarathustra) suggests that all knowledge is subjective and shaped by our viewpoints, challenging the idea of objective or absolute wisdom (The Atlas Society; Nietzsche on Truth and Philosophy, Cambridge University Press). Similarly, Freud’s deterministic view of the unconscious mind and the role of instincts in shaping human behaviour highlight the complexities and unconscious elements in our understanding of knowledge and wisdom (Internet Encyclopedia of Philosophy). These perspectives imply that in the modern context, where knowledge is often seen as more fluid and multifaceted, Aristotle’s structured approach to wisdom may not fully encapsulate the diverse and subjective nature of understanding. Technological advancements and the widespread accessibility of information have facilitated the acquisition of deep knowledge in various fields, transcending traditional academic boundaries. This democratisation of knowledge hints at a more integrated relationship between experience, technical expertise, and wisdom, aligning with the contemporary educational emphasis on interdisciplinary approaches and problem-solving skills. Furthermore, Nietzsche’s criticism of the concept of a predetermined human telos or purpose stands in opposition to Aristotle’s view of wisdom as the pursuit of universal and objective truths. 
Nietzsche’s perspective suggests that the potential for human excellence and virtue is not a fixed or singular path but rather a diverse and evolving journey shaped by individual experiences and perspectives. In conclusion, Aristotle’s hierarchical ordering of experience, knowledge, and wisdom in Metaphysics, while foundational, is increasingly at odds with contemporary views and appears less plausible for the modern world. Nietzsche’s critique, especially his rejection of objective moral values and advocacy for individualistic value creation, challenges Aristotle’s wisdom hierarchy (Philosophy Now). The [Übermensch](https://aarnphm.xyz/posts/Questions-about-Metaphysics/../../thoughts/Chaos) concept, focusing on individual value creation through self-justified actions, stands in stark contrast to Aristotle’s view of wisdom as understanding universal principles (Philosophy Now). Thus, a modern reinterpretation is warranted. Contemporary wisdom should merge a philosophical understanding of universal truths with dynamic, ethical knowledge application in various contexts. Wisdom today surpasses mere comprehension or command, embodying adaptability, cooperation, and innovative application of knowledge for complex challenges. My argument therefore aligns more with Nietzsche’s vision, advocating a nuanced, individualistic approach to wisdom for the 21st century. ### References 1. Ansell-Pearson, K. (2012). Nietzsche’s Übermensch: A Hero of Our Time. _Philosophy Now_. Retrieved from 2. Thornton, S. (2020). Sigmund Freud (1856—1939). In _Internet Encyclopedia of Philosophy_. Retrieved from 3. Clark, M. (1990). _Nietzsche on Truth and Philosophy_. Cambridge University Press. Retrieved from --- slug: posts/Questions-about-Spinoza tags: - philosophy - fruit - philos1aa3 description: "Questions about Spinoza's Ethics. In the Appendix to Ethics Part One (pp. 180-85), Spinoza criticizes the idea “that God directs all things to some definite end” and “that God has made all things for man and has made man to worship God.” (181). Why do people believe such things?" title: "Questions about Spinoza" date: 2023-11-30 permalink: https://aarnphm.xyz/posts/Questions-about-Spinoza.html.md --- In delving deeper into the philosophical insights of Baruch [Spinoza](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Sphinoza)’s “Ethics,” particularly his repudiation of teleology and the anthropocentric conception of divine power, it becomes essential to contrast these views with the philosophical tenets found in Friedrich Nietzsche’s “Beyond Good and Evil.” Spinoza, with his staunch rationalism, argues that misconceptions about the divine will give rise to a skewed understanding of morality and aesthetics. He suggests that true morality emerges from comprehending nature and God as entities devoid of human-like intentions or ends. For Spinoza, morality is less about adhering to external moral codes and more about aligning oneself with a profound understanding of God’s nature. [Nietzsche](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/Philosophy-and-Nietzsche), while sharing Spinoza’s scepticism of conventional [morality](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/moral), approaches the subject from a different vantage point.
His concept of perspectivalism, particularly the idea of the [“Will to Power”](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/Will#as-power), challenges traditional notions of morality and [truth](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/Will-to-Truth), recasting them as expressions of an inherent drive in all living beings to assert and maintain influence and perspective. Unlike Spinoza, Nietzsche is more focused on the role of individual power in shaping morals, rejecting the existence of a higher being. While Spinoza prompts us to envision a deterministic universe without divine purpose, urging a more objective approach to morality, Nietzsche confronts us with the nihilistic consequences of such a universe, advocating for the creation of personal values in response. This juxtaposition of ideas is crucial in contemporary philosophical discussions, urging a critical reassessment of our moral beliefs. I posit that our moral values should not only draw strength from personal conviction but also be grounded in a rational understanding of our environment and history, informed by disciplines like anthropology. This balanced approach offers a way to navigate the complex landscape of moral philosophy in the modern world. The genesis of teleological beliefs, Spinoza believed, stemmed from human ignorance and the inherent desire to seek personal advantage. He wrote, _“all men are born ignorant of the causes of things, and that all men want to seek their own advantage and are conscious of wanting this.”_ He asserts that individuals, born ignorant of the causes of things and conscious of their desires, mistake their subjective experiences and desires for universal truths. This ignorance leads them to ascribe purpose and intention to natural phenomena, a projection of their own human-centric perspective. Spinoza argues that people, unable to comprehend the true causes of events, resort to the idea of a purposeful divine intervention, attributing their fortunes and misfortunes to a deity’s will. This anthropocentric view, according to Spinoza, arises not from an understanding of the universe but from a fundamental ignorance about it. The arguments for the fallacy of teleological thinking are multifaceted. Spinoza first argues that attributing purposes to nature inverts the true order of cause and effect. By assuming that events occur for a specific end, people mistakenly elevate what are mere effects to the status of causes. Spinoza also challenges the notion of divine purpose, suggesting that if God created the world for an end, it implies a deficiency in God, contradicting the notion of divine perfection. He asserts that everything in nature occurs out of necessity and follows from God’s nature, not from a divine intention or goal. Spinoza’s argument here is radical for his time, as it removes divine will from the equation of existence, positioning nature and its occurrences as manifestations of a deterministic universe. Spinoza extends his critique to the realm of human morality and [aesthetics](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/aesthetic-value), arguing that the belief in a purposeful universe has led to skewed notions of good and evil, beauty and ugliness. He posits that these concepts are subjective and arise from how things affect individuals personally rather than from any intrinsic quality of the things themselves.
By believing that everything is created for human use, people judge the value of things based on their utility or pleasure. This anthropocentric perspective, according to Spinoza, leads to a distorted understanding of nature and contributes to conflicts and scepticism, as what is considered ‘good’ or ‘beautiful’ varies widely among individuals. Nietzsche’s approach in [Beyond Good and Evil](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/Philosophy-and-Nietzsche#anatomy-of-beyond-good-and-evil) presents a stark divergence from Spinoza’s rationalistic [determinism](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/Determinism). Nietzsche, known for his provocative style and radical ideas, fundamentally challenges the concept of God, dismissing it as a mere human construct. He criticizes the Christian moral framework and the notion of an objective, universal truth. Nietzsche argues that what is often perceived as truth is merely a manifestation of human will and the power dynamics at play in society. In “Beyond Good and Evil,” Nietzsche states, “There is no such thing as moral phenomena, but only a moral interpretation of phenomena” (Beyond Good and Evil, Aphorism 108). This perspective reflects his belief in perspectivalism, the idea that all knowledge is interpretive and contingent upon individual perspectives. Nietzsche’s critique extends to the realm of [metaphysics](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/Metaphysics) and [epistemology](https://aarnphm.xyz/posts/Questions-about-Spinoza/../../thoughts/Epistemology): he views the belief in God and divine teleology as a weakness, a human invention to impose meaning and order in a fundamentally chaotic and purposeless universe. Contrasting with Spinoza’s deterministic view, where everything follows from the necessity of God’s nature, Nietzsche’s perspective is that the universe and human existence lack any inherent meaning or purpose. He posits that moral values are not just human-centric interpretations but fundamental expressions of the “Will to Power”, with subjective interpretation shaping human understanding and morality. For Nietzsche, the universe is not a cosmos ordered by divine providence or natural law but is instead a dynamic play of forces and wills, constantly in flux and beyond any fixed moral categorization. Furthermore, Nietzsche’s critique of divine teleology is intertwined with his broader rejection of traditional metaphysical and moral systems. He perceives these systems as symptomatic of humanity’s fear of facing the existential void – the absence of inherent meaning or purpose in life. In Aphorism 36 of “Beyond Good and Evil,” Nietzsche explains this perspective, highlighting the human tendency to construct metaphysical worlds as a way of coping with the inherent meaninglessness of existence. In contrast to Spinoza’s concepts of morality, Nietzsche presents a more critical analysis of morality and aesthetics. In Nietzsche’s view, moral systems are tools employed by individuals or groups to exert their influence and control over others through “herd instincts” (Nietzsche, “Beyond Good and Evil,” Aphorism 202). This perspective implies that moral and aesthetic judgments are more about asserting dominance and control than about any objective assessment of utility or pleasure.
In synthesizing Spinoza’s rational critique with Nietzsche’s radical perspective, we uncover a comprehensive philosophical framework that profoundly challenges traditional beliefs in divine purpose and absolute morality. Nietzsche extends beyond Spinoza’s critique of anthropocentrism, delving into the deeper power dynamics that shape moral and aesthetic assertions. He presents morality not as a universal truth but as a subjective construct influenced by prevailing power structures and individual wills, reflecting his broader themes of scepticism towards absolute truths and the subjective nature of human experience. This combined perspective of Spinoza’s deterministic view and Nietzsche’s perspectivalism offers a potent critique of human-centric views of the universe and teleological thinking. It underscores the contingency, subjectivity, and influence of desires and power structures in our interpretations and judgments. This dual approach not only remains profoundly relevant in contemporary discourse but also enriches our understanding of philosophical and ethical discussions. Together, Spinoza and Nietzsche compel us to reconsider our notions of the universe, morality, and our place within it, highlighting the necessity of acknowledging the complex interplay of knowledge, power, and subjective human experience in shaping our worldview. --- slug: posts/chatgpt tags: - engineer4a03 - fruit description: "And its implication on how we assess learning. an overview." title: "On ChatGPT and its pedagogical consequences" date: 2024-10-02 permalink: https://aarnphm.xyz/posts/chatgpt.html.md --- _The following is an excerpt of a paper I wrote for my coursework._ > [!question]- Question > > In the context of Gartner’s hype cycle, what has been the trajectory of generative conversational AI? > > Should a format including generative conversational AI be introduced to replace traditional essay assignments in educational settings, and if so, what are some potential implications for student learning and assessment? ([Dwivedi et al., 2023](#bib-dwivedi2023102642)) ## Introduction. Historically, Alan Turing’s seminal work “Computing Machinery and Intelligence” laid the foundation for exploring the possibilities of a thinking machine ([Turing, 1950](#bib-10.1093/mind/lix.236.433)). Subsequently, the development of [AI](https://aarnphm.xyz/posts/chatgpt/../../thoughts/Machine-learning) took a symbolic approach — world representations through systems that utilise high-level symbols and manipulate tokens to arrive at a result, an approach commonly referred to as Good Old-Fashioned AI (GOFAI) ([Haugeland, 1997](#bib-10.7551/mitpress/4626.001.0001)). While GOFAI showed promise through decision-tree [reasoning](https://aarnphm.xyz/posts/chatgpt/../../thoughts/reason), its limitations became apparent in the 1980s when the field entered “AI Winter.” This was likely due to the cynicism within the AI researchers’ community and a reduction in funding, which halted most research and development ([Hendler, 2008](#bib-handler2008avoidanotheraiwinter)). However, given the rise of Moore’s Law and the exponential growth in computing and [data](https://aarnphm.xyz/posts/chatgpt/../../thoughts/data) available, a new approach to [AI](https://aarnphm.xyz/posts/chatgpt/../../thoughts/AGI) arose, focusing on statistical methods and connectionist networks such as artificial neural networks. ([Haugeland, 1997](#bib-10.7551/mitpress/4626.001.0001)) dubbed this approach New Fangled AI (NFAI).
Fast forward to the $21^{\text{st}}$ century: ML has entered the mainstream through the rise of generative AI (GenAI). This paper posits that GenAI currently occupies the “peak of inflated expectations”, approaching the “trough of disillusionment” on Gartner’s hype cycle. It will also examine the implications of machine-assisted interfaces beyond conversational UI and their pedagogical consequences for student learning and assessment. ## Gartner’s hype cycle. For context, applications such as ChatGPT are built on top of the [Transformers](https://aarnphm.xyz/posts/chatgpt/../../thoughts/Transformers) architecture and pre-trained on a large corpus of [text](https://aarnphm.xyz/posts/chatgpt/../../thoughts/Language#representation) ([Brown et al., 2020](#bib-brown2020languagemodelsfewshotlearners)). Given an input sequence of tokens of length $n$, these systems predict the next token at index $n+1$. Most implementations of transformers are autoregressive ([Croft, 2023](#bib-croft2023llm)), meaning that the model predicts the future values (index $n+1 \to \infty$) based on past values (index $0 \to n$); a toy sketch of this decoding loop appears below. However, ([Keles et al., 2022, p. 4](#bib-keles2022computationalcomplexityselfattention)) proved that the computational complexity of self-attention is quadratic; therefore, running these systems in production remains a scaling problem ([Kaplan et al., 2020](#bib-kaplan2020scalinglawsneurallanguage)). The current positioning of GenAI at the peak of inflated expectations aligns with the ([Gartner, 2024](#bib-gartner2024multimodal)) prediction. Three key factors support this assessment: rapid advancement in research, widespread enterprise adoption, and increased public awareness. Ongoing research in GenAI, specifically language models, spans several topics, including mechanistic interpretability ([Nanda, 2023](#bib-nanda2023concrete)), which explores the inner workings of auto-regressive models, information retrieval techniques aimed at improving correctness and reducing hallucinations in LLM systems ([Béchard & Ayala, 2024](#bib-béchard2024reducinghallucinationstructuredoutputs); [Dhuliawala et al., 2023](#bib-dhuliawala2023chainofverificationreduceshallucinationlarge)), as well as growing interest in multimodal applications of transformers ([Xu et al., 2023](#bib-xu2023multimodallearningtransformerssurvey)). Leading research labs, from Anthropic with their interpretability and alignment work ([Bricken et al., 2023](#bib-bricken2023monosemanticity); [Elhage et al., 2022](#bib-elhage2022superposition); [Templeton et al., 2024](#bib-templeton2024scaling)), to AI21’s Jamba with its innovative hybrid transformer architecture ([Team et al., 2024](#bib-jambateam2024jamba15hybridtransformermambamodels)), to open-weights models from [Meta](https://www.llama.com/) and [Google](https://deepmind.google/technologies/gemini/pro/), continue to redefine the boundaries of what these systems are capable of. Enterprise adoption is evident with Salesforce ([Nijkamp et al., 2023](#bib-nijkamp2023xgen7btechnicalreport)), Oracle’s [collaboration with Cohere](https://cohere.com/customer-stories/oracle), and Microsoft’s Copilot for its 365 Product Suite. However, widespread implementation doesn’t necessarily equate to immediate, measurable productivity gains. Integrating these systems effectively into enterprise workflows to deliver tangible business value takes time and effort. Despite the field’s excitement, the current hype and expectations often exceed these systems’ reliable capabilities, especially for complex use cases.
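To make the decoding loop described above concrete, here is a minimal, self-contained sketch in Python. It is an illustration only: the weights are random, the vocabulary is a toy, and the names (`self_attention`, `decode`) are invented for this post rather than taken from any real library. It still shows the two mechanics discussed earlier: generation proceeds one token at a time over a growing prefix, and the attention score matrix has shape $n \times n$, which is where the quadratic cost shows up.

```python
import numpy as np

rng = np.random.default_rng(0)


def self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention over a length-n sequence."""
    n, d = x.shape
    # In a real transformer these projections are learned; here they are random.
    w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d)  # shape (n, n): the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def decode(prompt: list[int], steps: int, vocab: int = 50, d: int = 16) -> list[int]:
    """Greedy autoregressive decoding: token n+1 depends only on tokens 0..n."""
    embed = rng.normal(size=(vocab, d))  # stand-in for a learned embedding table
    tokens = list(prompt)
    for _ in range(steps):
        h = self_attention(embed[tokens])   # re-attends over the whole prefix
        logits = h[-1] @ embed.T            # next-token scores from the last position
        tokens.append(int(logits.argmax()))  # pick the most likely token
    return tokens


print(decode([3, 14, 15], steps=5))
```

Because each step in this naive loop re-attends over the entire prefix, decoding cost grows quickly with sequence length; production serving stacks typically cache the key and value tensors of earlier positions so that each new token only adds one row of attention, which is part of why serving these models is as much a memory problem as a compute one.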
Significant challenges persist, including hallucinations and lack of factual grounding ([Huang et al., 2023, p. 3](#bib-huang2023surveyhallucinationlargelanguage)). We observe such behaviour in ChatGPT, where the knowledge cutoff prevents the system from providing up-to-date information, causing it to “hallucinate” and provide inaccurate answers ([Dwivedi et al., 2023, p. 4.4.9.1.2](#bib-dwivedi2023102642)). As the field progresses towards the “trough of disillusionment” on Gartner’s hype cycle, a more realistic assessment of GenAI’s capabilities will likely emerge, paving the way for more effective applications. ## Implications of machine-assisted interfaces and their pedagogical consequences for student learning and assessment. The proliferation of conversational user interfaces (CUI) is based upon a simple heuristic of how [auto-regressive models](https://aarnphm.xyz/posts/chatgpt/../../thoughts/Autoregressive-models) surface their internal state by generating the next tokens. CUIs often prove frustrating when dealing with tasks requiring larger information sets. Additionally, for tasks that require frequent information retrieval (research, academic writing), CUIs are suboptimal as they compel users to maintain information in their working memory unnecessarily. Pozdniakov et al. proposed a framework that incorporates both application and interaction design, emphasizing manual alignment inputs from end users ([Pozdniakov et al., 2024, p. 3](#bib-pozdniakov2024largelanguagemodelsmeet)). This approach, when applied to replace traditional essay assignments, has two major implications for student learning and assessment: a shift in core competencies and collaborative assessment methods. With machine-assisted interfaces, students will need to develop stronger critical thinking skills to evaluate AI-generated content and formulate precise instructions. The focus will shift towards the process of reaching desired outcomes and improving information retrieval skills. This shift aligns with the potential for machine-assisted proofs to solve novel problems, as discussed by ([Tao, 2024](#bib-tao2024machineassisted)). These new interfaces will require instructors to adapt their evaluation methods. Assessment will need to consider the flexibility of students’ pace and their level of engagement with a given topic. This approach encourages a more holistic, cross-disciplinary understanding, better preparing students for continuous learning in our rapidly evolving technological landscape. ## Bibliographie - Béchard, P., & Ayala, O. M. (2024). _Reducing hallucination in structured outputs via Retrieval-Augmented Generation_. arXiv preprint arXiv:2404.08189 [\[arxiv\]](https://arxiv.org/abs/2404.08189) - Bricken, T., Templeton, A., Batson, J., Chen, B., Jermyn, A., Conerly, T., Turner, N., Anil, C., Denison, C., Askell, A., Lasenby, R., Wu, Y., Kravec, S., Schiefer, N., Maxwell, T., Joseph, N., Hatfield-Dodds, Z., Tamkin, A., Nguyen, K., … Olah, C. (2023). Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. _Transformer Circuits Thread_. [\[link\]](https://transformer-circuits.pub/2023/monosemantic-features/index.html) - Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). _Language Models are Few-Shot Learners_.
arXiv preprint arXiv:2005.14165 [\[arxiv\]](https://arxiv.org/abs/2005.14165) - Croft, B. (2023). _LLM Visualization_. - Dhuliawala, S., Komeili, M., Xu, J., Raileanu, R., Li, X., Celikyilmaz, A., & Weston, J. (2023). _Chain-of-Verification Reduces Hallucination in Large Language Models_. arXiv preprint arXiv:2309.11495 [\[arxiv\]](https://arxiv.org/abs/2309.11495) - Dwivedi, Y. K., Kshetri, N., Hughes, L., Slade, E. L., Jeyaraj, A., Kar, A. K., Baabdullah, A. M., Koohang, A., Raghavan, V., Ahuja, M., Albanna, H., Albashrawi, M. A., Al-Busaidi, A. S., Balakrishnan, J., Barlette, Y., Basu, S., Bose, I., Brooks, L., Buhalis, D., … Wright, R. (2023). Opinion Paper: “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. _International Journal of Information Management_, _71_, 102642. - Elhage, N., Hume, T., Olsson, C., Schiefer, N., Henighan, T., Kravec, S., Hatfield-Dodds, Z., Lasenby, R., Drain, D., Chen, C., Grosse, R., McCandlish, S., Kaplan, J., Amodei, D., Wattenberg, M., & Olah, C. (2022). Toy Models of Superposition. _Transformer Circuits Thread_. [\[link\]](https://transformer-circuits.pub/2022/toy_model/index.html) - Gartner. (2024). _Gartner Predicts 40 Percent of Generative AI Solutions Will Be Multimodal By 2027_. - Haugeland, J. (1997). _Mind Design II: Philosophy, Psychology, and Artificial Intelligence_. The MIT Press. - Hendler, J. (2008). Avoiding Another AI Winter. _IEEE Intelligent Systems_, _23_(2), 2–4. - Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., & Liu, T. (2023). _A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions_. arXiv preprint arXiv:2311.05232 [\[arxiv\]](https://arxiv.org/abs/2311.05232) - Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). _Scaling Laws for Neural Language Models_. arXiv preprint arXiv:2001.08361 [\[arxiv\]](https://arxiv.org/abs/2001.08361) - Keles, F. D., Wijewardena, P. M., & Hegde, C. (2022). _On The Computational Complexity of Self-Attention_. arXiv preprint arXiv:2209.04881 [\[arxiv\]](https://arxiv.org/abs/2209.04881) - Nanda, N. (2023). _Concrete Steps to Get Started in Transformer Mechanistic Interpretability_. - Nijkamp, E., Xie, T., Hayashi, H., Pang, B., Xia, C., Xing, C., Vig, J., Yavuz, S., Laban, P., Krause, B., Purushwalkam, S., Niu, T., Kryściński, W., Murakhovs’ka, L., Choubey, P. K., Fabbri, A., Liu, Y., Meng, R., Tu, L., … Xiong, C. (2023). _XGen-7B Technical Report_. arXiv preprint arXiv:2309.03450 [\[arxiv\]](https://arxiv.org/abs/2309.03450) - Pozdniakov, S., Brazil, J., Abdi, S., Bakharia, A., Sadiq, S., Gasevic, D., Denny, P., & Khosravi, H. (2024). _Large Language Models Meet User Interfaces: The Case of Provisioning Feedback_. arXiv preprint arXiv:2404.11072 [\[arxiv\]](https://arxiv.org/abs/2404.11072) - Tao, T. (2024). _Machine-Assisted Proofs_. - Team, J., Lenz, B., Arazi, A., Bergman, A., Manevich, A., Peleg, B., Aviram, B., Almagor, C., Fridman, C., Padnos, D., Gissin, D., Jannai, D., Muhlgay, D., Zimberg, D., Gerber, E. M., Dolev, E., Krakovsky, E., Safahi, E., Schwartz, E., … Shoham, Y. (2024). _Jamba-1.5: Hybrid Transformer-Mamba Models at Scale_. 
arXiv preprint arXiv:2408.12570 [\[arxiv\]](https://arxiv.org/abs/2408.12570) - Templeton, A., Conerly, T., Marcus, J., Lindsey, J., Bricken, T., Chen, B., Pearce, A., Citro, C., Ameisen, E., Jones, A., Cunningham, H., Turner, N. L., McDougall, C., MacDiarmid, M., Freeman, C. D., Sumers, T. R., Rees, E., Batson, J., Jermyn, A., … Henighan, T. (2024). Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. _Transformer Circuits Thread_. [\[link\]](https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html) - Turing, A. M. (1950). I.—Computing Machinery and Intelligence. _Mind_, _LIX_(236), 433–460. - Xu, P., Zhu, X., & Clifton, D. A. (2023). _Multimodal Learning with Transformers: A Survey_. arXiv preprint arXiv:2206.06488 [\[arxiv\]](https://arxiv.org/abs/2206.06488) --- slug: posts/corporate-personhood tags: - engineer4a03 - fruit description: "and moral responsibilities of corporation." title: "Of Corporations, Courts, Personhood, and Morality" date: 2024-11-19 permalink: https://aarnphm.xyz/posts/corporate-personhood.html.md --- The following is an excerpt of a paper I wrote for my coursework. > [!question]- Question > > Read “Of Corporations, Courts, Personhood, and Morality. Business Ethics Quarterly, 25(4), 415-431.” ([Blair, 2015](#bib-blair2015ofcorporations)) > > After reading this paper and in consideration of objective reality, subjective reality and legal fiction, do you think corporations should be regarded as separate legal “persons”? Do you agree with Prof. Thomas Donaldson’s vision of corporations as “moral” persons? Why or why not? How does the concept of corporate “personhood” influence our thinking about the social responsibilities of corporations? Corporate personhood poses complex philosophical challenges that intersect with practical questions of morality, responsibility and social impact. While corporations have historically been granted legal personhood to facilitate commerce and establish clear rules of operation ([Blair, 2015](#bib-blair2015ofcorporations)), this legal fiction deserves a thorough examination to determine its ethical validity in the 21st century. The essay posits that corporations should not be regarded as separate legal “persons” beyond a practical legal framework, and it strongly opposes the vision of ([Donaldson, 1984](#bib-donaldson1984corporation)) that treats corporations as moral agents. ## Against Donaldson’s argument Legally, corporations are treated as separate entities, allowing them to own property, enter contracts, and be liable for debts independent of their shareholders. ([Blair, 2015](#bib-blair2015ofcorporations)) highlights that this legal fiction facilitates economic growth by encouraging investment and risk-taking. The objective reality is that corporations are collectives of individuals, and legal personhood is a tool for managing complex economic activities. However, conflating this legal construct with moral personhood is problematic. ([Donaldson, 1984](#bib-donaldson1984corporation)) posits that corporations are moral agents capable of ethical reasoning and responsibility. Yet [Kant](https://aarnphm.xyz/posts/corporate-personhood/../../thoughts/Philosophy-and-Kant)’s categorical imperative challenges this very notion. Kantian [ethics](https://aarnphm.xyz/posts/corporate-personhood/../../thoughts/ethics) require autonomous agents capable of rational decision-making and moral consideration for others ([Kant, 1785](#bib-kant1785kangft)).
Corporations, driven primarily by profit maximisation, lack the capacity for moral autonomy. Their decision-making processes are constrained by shareholder interests and market forces, limiting their ability to act out of duty or universal moral laws. The critique of ([Deleuze & Guattari, 1972](#bib-deleuze1972anti)), developed in [Capitalism and Schizophrenia](https://aarnphm.xyz/posts/corporate-personhood/../../thoughts/Giles-Deleuze#capitalism-and-schizophrenia), further elucidates the inherent contradictions within capitalist systems. They argue that capitalism dissolves traditional structures and encourages an unrestrained pursuit of profit and market power. Under this framework, corporations act as agents of “deterritorialisation”—entities that disrupt established social norms in their relentless pursuit of growth. When corporations are granted personhood, they influence and shape the socio-political landscape, often without meaningful accountability. This “schizophrenic” drive for growth highlights the ethical risks associated with conflating corporate interests with those of individuals. ([Chomsky, 1999](#bib-chomsky1999profit)) further argues that corporations, empowered by neo-liberal policies, often operate contrary to the public good, undermining democratic processes and social welfare. This perspective reinforces the view that corporations lack the [moral](https://aarnphm.xyz/posts/corporate-personhood/../../thoughts/moral) orientation necessary to be considered moral persons. While Donaldson’s vision of corporations as “moral persons” attempts to impose ethical obligations on corporate behaviour, it fails to address the fundamental contradiction between a profit-driven corporate structure and genuine moral [agency](https://aarnphm.xyz/posts/corporate-personhood/../../thoughts/Agency). “True” moral personhood requires the capacity for autonomous ethical reasoning and the ability to act against self-interest when morally required. Corporate fiduciary duties to shareholders, as highlighted in the Delaware court decisions discussed in ([Blair, 2015](#bib-blair2015ofcorporations)), structurally prevent this kind of authentic moral reasoning. ## Implications for social responsibility By treating corporations as persons, we risk anthropomorphising entities that are fundamentally tools of capital accumulation. ([Zuboff, 2020](#bib-zuboff2020surveillance)) describes how, with the rise of surveillance capitalism, corporations exploit personal [data](https://aarnphm.xyz/posts/corporate-personhood/../../thoughts/data) for profit, often at the expense of individual privacy and autonomy. Similarly, ([Crawford, 2021](#bib-atlasofai)) demonstrates that the deployment of AI to optimise corporate efficiency often lacks moral oversight, exacerbating inequalities and disproportionately affecting marginalised communities. By recognising that corporations are not moral agents, we shift the onus onto legal frameworks and societal pressures to enforce ethical behaviour. This understanding aligns with the objective reality of corporations as collections of individuals whose actions must be guided by laws and norms rather than assumed moral capacities. In conclusion, corporate personhood should be recognised as a limited legal fiction rather than a morally meaningful status. While legal personhood serves practical functions in commerce and law, extending this to claims of moral personhood obscures the need for external regulation and democratic oversight of corporate power.
Instead of expecting corporations to embody moral principles, society should strengthen regulatory frameworks that ensure corporate actions align with the broader public interest, especially in the era of AI and data capitalism. [^analogy] ## Bibliographie - Blair, M. M. (2015). Of Corporations, Courts, Personhood, and Morality. _Business Ethics Quarterly_, _25_(4), 415–431. - Chomsky, N. (1999). _Profit Over People: Neoliberalism and Global Order_. Seven Stories Press. - Crawford, K. (2021). _The Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence_. Yale University Press. - Deleuze, G., & Guattari, F. (1972). _Anti-Oedipus: Capitalism and Schizophrenia_. Les Editions de Minuit. - Donaldson, T. (1984). Corporations & Morality. _Noûs_, _18_(3), 548–551. - Kant, I. (1785). _Groundwork for the Metaphysics of Morals_ (T. E. Hill & A. Zweig, Eds.). Oxford University Press. - Zuboff, S. (2020). Surveillance Capitalism. _Project Syndicate_. [^analogy]: data capitalism and surveillance capitalism are used interchangeably in this context, as they both refer to the same concept of using personal data for profit. --- slug: posts/index tags: - fruit description: "collections of writing" title: "posts." date: 2024-01-10 permalink: https://aarnphm.xyz/posts/index.html.md --- Collections of writing I really like. Some will also get posted on [chaos of living alone.](https://livingalonealone.com/) --- slug: posts/new tags: - fruit description: "And on perplexity of hackathon." title: "I saw a disstrack dropped at a hackathon." date: 2024-09-30 permalink: https://aarnphm.xyz/posts/new.html.md --- ![Cohere Toronto Office, September night](https://aarnphm.xyz/posts/new/../../posts/images/cohere.webp) Cohere Toronto Office, September night ## feels and results. The train station loomed, a grey monolith against the ever-darkening sky. It was half-past seven on a Sunday, and I was running late for the 20:23 Lakeshore West Train back to Hamilton. Quickly grabbing my laptops from the bags I packed for the weekend away, I hopped back onto the [stream](https://x.com/i/broadcasts/1OwxWNvzRejJQ) to catch others’ presentations. It wasn’t any ordinary Sunday, but rather the demo night of [New Build](https://x.com/newsystems_/status/1828455648377327976). Exhaustion clung to me like a second skin after 48 hours of sleep deprivation and intense focus on hacking on a project. Our team had already finished the demo, yet something gnawed at the corner of my mind. A vague unease, shapeless as the fog, settled over me. I couldn’t shake the feeling of [displacement](https://aarnphm.xyz/posts/new/../../thoughts/displacement) that slipped through my fingers, leaving the aftertaste of a half-remembered dream. ![conversation with K](https://aarnphm.xyz/posts/new/../../posts/images/new-feeling.webp) conversation with K I have done a fair share of [hackathons](https://jzhao.xyz/posts/hackathons) in the past, yet New Build stood apart from most hackathons I have attended. New Build is **the** definition of “unc cracked tpot club” that builds projects over a weekend. It was the distilled essence of Toronto’s raw talent, representing the fast-growing tech scene in Canada. New Build was a multidisciplinary hackathon that combined intensive project development with team formation inspired by the NBA Draft[^1]. One major feature that differentiated New from other hackathons was its draft mechanics. We knew who the team captains were. Lo and behold, yours truly was one of them.
Given the crowd of cracked and brilliant minds participating in this event, the weight of self-imposed expectations hung heavy. I felt compelled to match their prowess, not for their sake but to prove something to myself. Yet beneath it all, a voice whispered a simple desire lingering at the back of my mind - to savour the experience and craft something genuine and [quaint](https://maggieappleton.com/folk-interfaces). I had an idea in mind infused with warmth, a reflection of my inner child, free from the cold glare of corporatism. > I want to play and build something novel! Yet, on Saturday morning, as soon as the clock struck 08:30, my corporate-wired mind took control, drowning out any remnants of authenticity I had. We immediately got carried away into short-term optimisation[^2] of the problem statement, min-maxxing the potential outcomes of the project. Additionally, we were so fixated on the name that we wanted to make it work. > **We have fallen into the trap of corporatisation of hackathons**. ![questions](https://aarnphm.xyz/posts/new/../../posts/images/new-question.webp) questions This mindset got to me, and it showed during the demo. The panel said nothing. No questions, no grilling. Defeat washed over me, heavy as the silence. I felt small, like one of those shuttered storefronts dotting the neighbourhood. On the train home, I watched the city blur past - all grit, neon, and late-night diners. Something shifted, quiet as a whisper: I know my shit. Damn good, actually. The city kept moving, indifferent. And so would I. ## on hacker culture and implications of New Build. _the following is an excerpt from [Hacking the Hackathon](https://jzhao.xyz/posts/hackathons)_ A weird thing about startup/hustle culture: We fetishise exhaustion as a badge of honour. We have collectively decided that bags under our eyes are way cooler than a new iPhone. This behaviour very much stems from Silicon Valley’s [saviorism](https://stanforddaily.com/2018/02/16/silicon-valleys-saviorism-problem/) attitude. The time-boxed nature of hackathons serves as a microcosm of this zeitgeist, compelling participants to push their limits in a 24-36 hour sprint to churn out marketable products. The fundamental issue with this approach is its [reductionist](https://aarnphm.xyz/posts/new/../../thoughts/reductionism) nature. These rapid-fire development sessions rarely build upon existing knowledge or work in the field. More often than not, they ignore crucial context surrounding the complex issues they attempt to address, distilling multifaceted problems into a simple web app[^3]. This methodology prioritizes speed and novelty over depth and nuance, potentially leading to superficial innovations that fail to address root causes or consider long-term implications. “Hackers” are makers compelled to create - not for money or fame, but for the pure joy of bringing something new to life. The congregations of craftsmen eventually led to the formation of hackerspaces such as hackathons – a kind of digital-age speakeasy for the intellectually adventurous. These spaces were initially conceived as the “third space” outside the state’s influence and the capitalist market. Yet, these spaces often struggle to remain true to their vision without intentional intervention. The commercialisation of hackathons can be seen as an unintended consequence of their underlying financial incentives.
Hackathons aren’t cheap to run, so organizers, with the best of intentions, turn to sponsorships to keep the lights on and the Red Bull flowing. But each logo slapped on a banner chips away at the original ethos. It’s a classic chicken-and-egg problem. Hackathons need money, but the incentive structure of those who foot the bill slowly morphs hackathons away from their original purpose. It is tricky, right? How do you keep the spirit of innovation and learning while all these other factors are at play? ```mermaid flowchart LR A[sponsors trying to maximize the benefits] --> B[organizers increase size and scope of events] B --> C[Hackers are incentivized to build] C --> A ``` > It's likely New Builds 2 will happen September 2025.\ > \ > If you're a company, a fund, and institution that wants to get involved to help make that happen, we can start discussing now.\ > \ > So far I know:\ > \ > \- Draft Night should move to Paradise Cinema for scale and theatrics.\ > \ > \- Grand… > > — V (@internetvin) [1 October 2024](https://twitter.com/internetvin/status/1841118814676668585?ref_src=twsrc%5Etfw) I think organizers should emphasize the ethos of hackathons, eliminate the focus on prizes and short-term projects, and replace it with something better. Reclaiming these design spaces means cultivating a culture of [play](https://aarnphm.xyz/posts/new/../../thoughts/play) - a space “for unfettered exploration which gives individuals freedom to explore ideas that might not have clear monetary values.” ```poetry language=fr A hackathon should be the infrastructure layer so that everyone can play. ``` ### implications from New Build. New Build addressed some of these problems and challenges pretty well, such as the [draft mechanics](https://x.com/aarnphm_/status/1839714935963607405), which introduced some [entropy](https://aarnphm.xyz/posts/new/../../thoughts/Entropy), but fell short in terms of prize incentives. _K and I were chatting about how New Build felt like an extended [New Office Hours](https://x.com/aarnphm_/status/1775641922029162773), which is a good first step in cultivating spaces for play_ New Build represents what Toronto has to offer, a first step towards solving the “human capital flight” (often referred to as “brain drain”) in Canada. Looking ahead, I’d love to see New Build create more space for pure play. Maybe even go full retreat-style, similar to [rabbitholeathon](https://www.rabbitholeathon.com/). I have faith in the New Build team. They’ve got good people. And good people are the ultimate moat. ### going forward with hackathons. As for me, I keep saying each hackathon will be my last. The 36-hour coding binges aren’t as appealing as they once were. But I said that last time too, so who knows? There’s something addictive about the energy of a good hackathon[^4]. Here’s the thing about hackathons: they don’t have to choose between being recruiting events and playgrounds for innovation. The best ones are both. But right now, the scales are tipped too far towards recruitment. It’s like optimizing for an acquisition instead of building something people want. The real magic of hackathons happens when you put hackers first. Everything else – the jobs, the networking, the sponsorships – that all follows naturally when you get the core experience right. ## to my teammates. ```poetry language=fr I'm obsessed with your work. I'm so blessed to have a chance to work with you all. I'm sorry that I couldn't do more, but overall it was a net positive. I wouldn't trade anything for it.
Even though we didn't win, I'm glad that we did work on something together. I do hope that we will cross paths again in the future. regards, aaron. ``` [^1]: At a conventional hackathon, one can form a team beforehand with friends or, for the unversed, join a team at the event. [^2]: [Hackathons as Co-optation Ritual: Socializing Workers and Institutionalizing Innovation in the “New” Economy](https://academicworks.cuny.edu/cgi/viewcontent.cgi?article=1575\&context=gc_pubs) by _Sharon Zukin and Max Papadantonakis_ [^3]: One team built AI agents to address public policy problems. Per the demo, it seemed to recommend building “more police stations” to solve Moss Park’s challenges. However, it is not as simple as just “building more police stations”. The judge was pretty firm on this, but the idea was there. [^4]: Honestly, I only did this one because of [Tommy](https://tommytrinh.me/), tyfe. --- slug: posts/occupational-licensure tags: - engineer4a03 - fruit description: "and what it meant to be an engineer" title: "On occupational licensure" date: 2024-11-27 permalink: https://aarnphm.xyz/posts/occupational-licensure.html.md --- _The following is an excerpt of a paper I wrote for my coursework._ > [!question]- Question > > The Professional Engineers Act gives provincial regulators such as Professional Engineers Ontario (PEO) the responsibility to regulate the registration of professional engineers within their mandated jurisdictions. On May 15 2023, PEO implemented new changes in the process for one to become licensed as a Professional Engineer in Ontario. [^links] > > After your review of the Professional Engineers Act, and the licensing procedure, do you think this new procedure strengthens the engineering profession? Does it support what the Professional Engineers Act stands for? Is it equitable for engineers whose educational background is not considered ‘traditional’? For example, one whose undergraduate degree is not in engineering but who has a graduate degree in engineering, or who is an internationally educated engineer? The overthrow of the medieval guild system was an indispensable early step in the rise of freedom in the Western world. -- Milton Friedman According to ([Friedman & Friedman, 1962](#bib-friedman1962capitalism)), while necessary to protect consumers from fraudulent activities, occupational licensure inevitably becomes “a tool in the hand of a special producer group” to entrench monopolistic practices at the cost of public welfare. While the Act mandates that the practice of professional engineers must be upheld by provincial regulators so that “the public interest may be served and protected”, the PEO’s May 15 update may inadvertently restrict new entrants into the field, particularly affecting those whose educational backgrounds do not align with traditional pathways, and exacerbate Canada’s current engineering shortage. This essay will draw on arguments from history to show why these changes seem to undermine the general ethos of the Professional Engineers Act. The Act emphasises inclusivity by providing multiple pathways for qualified individuals to become licensed, including the evaluation of international qualifications and the Engineering Interns program ([Government of Ontario, 1990](#bib-peact1990)). However, the stringent academic requirements may exclude those who hold graduate degrees in engineering but do not have an undergraduate degree in the field, as well as internationally educated engineers who may have equivalent or even superior training.
Wittgenstein’s analysis of [language](https://aarnphm.xyz/posts/occupational-licensure/../../thoughts/Language) games sheds light on how the concept of “professional engineer” is constructed through social and institutional practices ([Wittgenstein, 1953](#bib-wittgenstein1953philosophical)). The concept of “family resemblances” suggests that rigid definitions of what constitutes an “engineer” may miss the essential qualities that make someone capable of performing engineering work effectively. By narrowing the definition of “qualified”, PEO risks excluding individuals who may bring valuable perspectives and skills to the profession. Engineering is a field that benefits from diverse ways of thinking, and a more inclusive approach to licensure would better serve the public interest by allowing a more comprehensive range of qualified individuals to contribute to solving complex engineering problems. From an economic standpoint, licensing requirements often restrict competition more than they protect public safety ([Friedman & Friedman, 1962](#bib-friedman1962capitalism)). Friedman posited that requiring an individual to obtain permission from the state or a governing body to work in his chosen occupation inherently infringed on [individual freedom](https://aarnphm.xyz/posts/occupational-licensure/../../thoughts/Camus#absurd-freedom). When applied to engineering, excessive barriers to entry can drive talented professionals to jurisdictions with more reasonable requirements, particularly the United States, where many engineering roles don’t require formal licensing and one is recruited based on occupational competency rather than constrained by licensure. In the case of engineering in Ontario, the new licensing changes may create an artificial scarcity of engineers, which could lead to higher costs for engineering projects, reduced innovation, and delays in critical infrastructure development. Friedman also emphasised the importance of [free markets](https://aarnphm.xyz/posts/occupational-licensure/../../thoughts/monetary#free-market) and removing unnecessary barriers to economic participation ([Friedman & Friedman, 1962, Chapter V](#bib-friedman1962capitalism)). By imposing stricter licensing requirements, PEO effectively creates a barrier that prevents otherwise capable individuals from contributing to the field. In a sense, this promotes the phenomenon of brain drain, referring to the emigration of talented individuals to jurisdictions with more favourable professional opportunities. These requirements may encourage talented individuals to leave Ontario in favour of jurisdictions where they can more easily practise their profession. This results in a loss of talent and weakens Ontario’s competitiveness in attracting and retaining highly skilled professionals. In conclusion, while maintaining high professional standards is essential, it is equally important to ensure that licensing procedures do not inadvertently exclude capable engineers. From a Friedmanite perspective, alternative approaches that maintain professional standards while avoiding these market distortions should be considered. Friedman advocated for certification systems that would allow consumers to identify qualified practitioners without preventing others from practising.
In engineering, this could mean competency-based assessments rather than rigid credential requirements, recognition of diverse educational and professional pathways, and a tiered certification system recognising different levels of expertise. ## Bibliographie - Friedman, M., & Friedman, R. D. (1962). _Capitalism and Freedom_ (p. 202). University of Chicago Press. - Government of Ontario. (1990). _Professional Engineers Act, R.S.O. 1990, c. P.28_. Government of Ontario. - Wittgenstein, L. (1953). _Philosophical Investigations_ (G. E. M. Anscombe, Trans.). Blackwell. [^links]: See also [Professional Engineers Act](https://www.ontario.ca/laws/statute/90p28) and [PEO Procedure](https://www.peo.on.ca/apply/licensing-changes#:~:text=applicants%20%E2%80%93%20PEO%20has%20launched%20an,Limited%20Licence%20in%20the%20future) --- slug: posts/to-the-past-lovers tags: - sapling - poetry - love description: "on past love." title: "un ancien amour." date: 2024-02-12 permalink: https://aarnphm.xyz/posts/to-the-past-lovers.html.md --- ```poetry language=en Beneath the quiet of night, under the vast sky, where stars whisper stories of ancient light, I find you again in the sigh of the wind, in the gentle caress of the moon's soft beam. Your smile, a memory etched in the stars, a lantern guides me through the harrowed wall, of my heart. Your laughter, a melody that reverberates across the eons, a symphony that lingers in the silence of my mind, keeps me company among the tumultuous life. The miles stretched wide, a chasm of silent cries, a beacon once thought to withstand the test of time. But time, a cruel mistress, adds distance to the miles, a facade of perfection, at best, a jest. No distance too far, no age too enduring, to dim the echo of your laughter, to quell the fire of your touch. Yet, here I stand, reminiscing the good old days, lost in the labyrinth of time. a prisoner of the heart, and a slave to the mind, that misses the idea of you. ``` --- slug: quotes tags: - evergreen description: "A collection of quotes, wisdom, and advice." title: "advice." date: 2024-01-23 permalink: https://aarnphm.xyz/quotes.html.md --- ## On life. Throw me some wisdom, and advices? I have none. — Jesse Your life so far is a drawing canvas. You can’t change what’s already been drawn, but you can always paint a new line. — paraphrased from [@tommytrxnh](https://twitter.com/tommytrxnh) 20 years from now you will be more disappointed by the things that you didn’t do than by the ones you did do. So throw off the bowlines. Sail away from the safe harbour. Catch the trade winds in your sails. Explore. Dream. Discover. — Mark Twain Sometimes, we \[care] too much about potential, less on credentials — Kate ## On bits and bytes. Computer is a bicycle for the mind. — [Steve Jobs](https://www.youtube.com/watch?v=ob_GX50Za6c\&ab_channel=MichaelLawrence) All I can say to the young is close your eyes. — Ted Nelson An expert is a man who has made all the mistakes, which can be made, in a very narrow field. — Niels Bohr Effective system design requires insights drawn from serious [contexts of use](https://notes.andymatuschak.org/z51q8prEJzs5Jqa5WPThYoV?stackedNotes=z7EQ2nVGus5B1rS9CqT18g6) — Andy Matuschak ## On perspectives. Our capacity to deal with [language](https://aarnphm.xyz/thoughts/Language) is a complex, genetically-determined part of our biological endowment. It’s a product of evolution, part of our nature. 
— Noam Chomsky The falseness of an opinion is not for us any objection to it: it is here, perhaps, that our new [language](https://aarnphm.xyz/thoughts/Language) sounds most strangely. The question is, how far an opinion is life-furthering, life-preserving, species-preserving, perhaps species-rearing, and we are fundamentally inclined to maintain that the falsest opinions — that the renunciation of false opinions would be \[a renunciation of life]. — [Friedrich Nietzsche](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche) I always feel happy, you know why? Because I don’t expect anything from anyone. Expectations always hurt. Life is short. So love your life. Be Happy. & Keep smiling. Just live for yourself & before you speak, listen. Before you write, think. Before you spend, earn. Before you pray, forgive. Before you hurt, feel. Before you hate, love. Before you quit, try. Before you die, live. — William Shakespeare A pessimist sees the difficulty in every opportunity; an optimist sees the opportunity in every difficulty. — Winston Churchill People think focus means saying yes to the thing you’ve got to focus on. But that’s not what it means at all. It means saying no to the hundred other good ideas that there are — Steve Jobs _The moral thing that should wish to say is very simple. I should say: _Love is wise, hatred is foolish__ — [Bertrand Russell](https://www.youtube.com/watch?v=ihaB8AFOhZo\&ab_channel=PhilosophieKanal) Craftsman is knowing how to work, Art is knowing when to stop. — Ben Affleck Ask not what your country can do for you - ask what you can do for your country. — J.F.Kennedy ## On drive. _Life can be much broader when you discover one simple fact...that everything around you was made up by people no smarter than you.... Once you learn that, you'll never be the same again._— Steve Jobs I just wondered how things were put together. — Claude Shannon Never stop learning. Assume nothing, question everything. Teach others what you know. Analyze objectively — Richard Feynman The first principle is that you must not fool yourself, and you are the easiest person to fool. — Richard Feynman Success consists of going from failure to failure without loss of enthusiasm. — Winston Churchill \[One] who works with the door open gets all kinds of interruptions, but \[they] also occasionally gets clues as to what the world is and what might be important. — Richard Hamming ## On randomness and fun. I have to be successful because I like expensive things. — some random person on twitter People like you think I get lucky. Here’s the thing, I make my own luck. — Harvey Specter Sticks and stones may breaks my bone, but there will be always something that offend a feminist — Some random British reporter --- slug: thoughts/AGI tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/AGI" title: "AGI" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/AGI.html.md --- The proposal is that such an AGI would be able to understand or learn any intellectual task that a human being can. It would also be able to learn and improve itself, and possibly be able to do things that humans cannot do. 
We saw “some sparks” in [LLMs](https://aarnphm.xyz/thoughts/AGI/../../thoughts/LLMs) suggesting that they can “understand” [natural language](https://aarnphm.xyz/thoughts/AGI/../../thoughts/NLP). See also [Yann’s chat with Lex](https://www.youtube.com/watch?v=5t1vTLU7s40\&ab_channel=LexFridman) --- slug: thoughts/Agency tags: - seed - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/Agency" title: "Agency" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/Agency.html.md --- > The ability and freedom to act in one’s immediate environment. Considered to be a study of [action theory](https://aarnphm.xyz/thoughts/Agency/../../thoughts/action-theory) > Everyone talks about having agency, but when it comes to falling in love, we have none (that’s why it is called falling) [Chaos](https://aarnphm.xyz/thoughts/Agency/../../thoughts/Chaos) allows for agency, but too much [entropy](https://aarnphm.xyz/thoughts/Agency/../../thoughts/Entropy) can create problems. ## Self-determination theory [link](https://selfdeterminationtheory.org/theory/) ## having a shit blog has made me feel abundant source: [Escaping Flatland](https://www.henrikkarlsson.xyz/p/having-a-shit-blog-has-made-me-feel) ## agency as machine see [Direct Manipulation vs Interface Agents](https://dl.acm.org/doi/10.1145/267505.267514) Agency as an extension of end-users rather than of the systems themselves ![](https://aarnphm.xyz/thoughts/Agency/../../thoughts/images/complex-takes-away-agency.webp) > Instruments of _superagency_ --- slug: thoughts/Alignment tags: - seed - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/Alignment" title: "Alignment" date: 2024-03-05 permalink: https://aarnphm.xyz/thoughts/Alignment.html.md --- See also: [Overton Window](https://aarnphm.xyz/thoughts/Alignment/../../thoughts/Overton-Window) and this [blog on alignment research](https://openai.com/blog/our-approach-to-alignment-research) The act of aligning oneself with a particular group or ideology. This can be done for a variety of reasons, including: - To gain social acceptance - To gain power - To gain resources Often framed as a solution to “hallucination” in large models’ token generation. > To align a model is simply to teach it to generate tokens that are within the bounds of the Overton Window. The goal is to build an aligned system that helps us solve other alignment problems > Should we build [ethically](https://aarnphm.xyz/thoughts/Alignment/../../thoughts/ethics) aligned systems, or [morally](https://aarnphm.xyz/thoughts/Alignment/../../thoughts/moral) aligned systems? One of [mechanistic interpretability](https://aarnphm.xyz/thoughts/Alignment/../../thoughts/mechanistic-interpretability)’s goals is to [ablate](https://aarnphm.xyz/thoughts/Alignment/../../thoughts/mechanistic-interpretability#ablation) harmful features ### [design](https://aarnphm.xyz/thoughts/Alignment/../../thoughts/design) See also [Information Theory](https://aarnphm.xyz/thoughts/Alignment/../../thoughts/Information-Theory) --- slug: thoughts/Attention tags: - technical - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Attention" title: "Attention" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/Attention.html.md --- ([Vaswani et al., 2023](#bib-vaswani2023attentionneed)) Attention operates on a sequence of query $Q$, key $K$, and value $V$ vectors.
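Before the exact formulation below, here is a minimal NumPy sketch of the operation (single head, no masking; the shapes and helper names are illustrative assumptions rather than the reference implementation):

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    # Q, K, V have shape (L, d): one head, no causal mask
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (L, L) query-key similarities
    return softmax(scores, axis=-1) @ V  # weighted sum of value vectors

rng = np.random.default_rng(0)
L, d = 4, 8
out = attention(rng.normal(size=(L, d)), rng.normal(size=(L, d)), rng.normal(size=(L, d)))
assert out.shape == (L, d)
```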
The attention matrix of a sequence is then computed as: $$ A(Q, K, V) = \text{softmax}\left(\frac{Q K^{T}}{\sqrt{d}}\right)V \quad \text{ for } Q_{L \times d}, K_{L \times d}, V_{L \times d} $$ ## Multi-head Attention Allows the model to jointly attend to information from different representation subspaces at different positions: $$ \begin{aligned} \text{MHA}(Q,K,V) &= \text{concat}(\text{head}_1, \cdots, \text{head}_n) W^O \\ &\text{where } \space \text{head}_i = \text{A}(QW_i^Q, KW_i^K, VW_i^V) \\ W^O & \in \mathbb{R}^{hd_v \times d_{\text{model}}} \end{aligned} $$ ## Group-Query Attention by ([Ainslie et al., 2023](#bib-ainslie2023gqatraininggeneralizedmultiquery)) idea: reduce the number of KV heads $n_k$ to a fraction $n_k^{'} = \frac{n_q}{k}$ of the number of query heads $n_q$, by evenly dividing the query heads into groups that each share a single KV head. ## RadixAttention Implemented in ([Zheng et al., 2024](#bib-zheng2024sglangefficientexecutionstructured)), which maintains an LRU eviction policy over the relevant [KV cache](https://aarnphm.xyz/thoughts/Attention/../../thoughts/KV-compression) for all requests within a [radix tree](https://aarnphm.xyz/thoughts/Attention/../../thoughts/Radix-tree) radix tree setup: - key: sequence of tokens - value: KV cache tensor (stored on GPU in a paged layout) ![](https://aarnphm.xyz/thoughts/Attention/../../thoughts/images/vllm/radix-attention.webp) _dynamic evolution of the radix tree in response to various requests._ > [!abstract]- explanation of RadixAttention with LRU eviction policy > > These requests include two chat sessions, a batch of few-shot learning inquiries, and a self-consistency sampling. Each tree edge carries a label denoting a substring or a sequence of tokens. The nodes are color-coded to reflect different states: green for newly added nodes, blue for cached nodes accessed during the time point, and red for nodes that have been evicted. > > [full explanation](https://lmsys.org/blog/2024-01-17-sglang/#backend-automatic-kv-cache-reuse-with-radixattention) ### cache-aware scheduling We define the hit rate as $$ \begin{aligned} \text{hit rate} &= \frac{\sum_{r \in R} \text{number of cached prefill tokens in } r}{\sum_{r \in R} \text{number of prefill tokens in } r} \\[8pt] &=1 - \frac{C}{\sum_{r \in R} \text{number of prefill tokens}} \end{aligned} $$ _in batch settings: sort requests by matched prefix length and prioritise those with longer matched prefixes instead of a FIFO schedule._ ```pseudo \begin{algorithm} \caption{Cache-Aware Scheduling for RadixAttention with Continuous Batching} \begin{algorithmic} \State \textbf{Input:} The radix tree $T$, the memory pool $P$, the current running batch $B$, the waiting queue $Q$. \State \textbf{Output:} Finished requests and updated system state.
\State // Get all requests from the waiting queue \State requests $\gets Q.\text{get\_all\_requests}()$ \State // Search for prefix matching for all waiting request \For{req $\in$ requests} \State req.prefix\_node, req.prefix\_len $\gets$ T.match\_prefix(req.input\_tokens) \EndFor \State // Sort the request according to matched prefix lengths \State requests.sort() \State // Select requests for the next batch \State available\_size $\gets$ T.evictable\_size() + P.available\_size() \State current\_size $\gets$ 0 \State new\_batch $\gets$ [] \For{req $\in$ requests} \If{req.size() + current\_size $\le$ available\_size} \State new\_batch.append(req) \State $\delta \gets T.\text{increase\_ref\_counter}(req.\text{prefix\_node})$ \State available\_size $\gets$ available\_size + $\delta$ \EndIf \EndFor \State Q.remove\_requests(new\_batch) \State // Insert requests into the current running batch \State B.merge(new\_batch) \State // Allocate new memory and do eviction if necessary \State needed\_size $\gets$ B.needed\_size() \State success, buffer $\gets$ P.alloc(needed\_size) \If{$\neg \text{success}$} \State T.evict(needed\_size) \State success, buffer $\gets$ P.alloc(needed\_size) \EndIf \State B.run(buffer) \State // Process finished requests \State finished\_requests $\gets$ B.drop\_finished\_requests() \For{req $\in$ finished\_requests} \State T.decrease\_ref\_counter(req.prefix\_node) \State T.insert(req) \EndFor \State \Return finished\_requests \end{algorithmic} \end{algorithm} ``` We got lower bound: $$ C \ge \sum_{e \in \text{edges}(T)} \mid e \mid $$ Consider we visit radix tree $T$ in DFS order. For each edge $e$ of $T$, the first time we compute KV cache associated with $e$, then we will compute the whole subtree of $e$. During computation of $e$ subtree, then edge $e$ will be continuously hit, thus no additional computation will happen. > [!tip] cache hit > > with cache size $\ge$ maximum request length (which will equals to longest path in radix tree), edge $e$ **WILL NOT** be evicted during computation of its subtree since the common prefix including $e$ of the subtree will be continuously hit. We can show that longest-shared-prefix-first order is equivalent to DFS order by induction [^proof] ### compressed FSM for jump-ahead tokens. Implemented in ([Zheng et al., 2024](#bib-zheng2024sglangefficientexecutionstructured)) #### Method 1: [FSM](https://aarnphm.xyz/thoughts/Attention/../../thoughts/constrained-decoding/../../thoughts/constrained-decoding#guided-generations-with-fsm)-based decoding - intuition: Using FSM ([Willard & Louf, 2023](#bib-willard2023efficientguidedgenerationlarge)) to guide generations by increasing logit bias for tokens that conform to given JSON schema. This allows us to track the current state during decoding and filter out invalid tokens by applying logit bias to the output. ![](https://aarnphm.xyz/thoughts/Attention/../../thoughts/constrained-decoding/../../thoughts/images/vllm/constrained-json-fsm.webp) - limitation: we can see that given construction of FSM requires token-level access, it can only transition the state by only _one_ token at a time, resulting in slow decoding. #### Method 2: Interleaved-based - intuition: breaks down JSON schemas, each containing either a chunk prefill part or constrained decoding part. They are then executed interleaved by inference system. Faster than per-token decoding given that chunked prefill components can process multiple tokens per forward pass See also using llama.cpp as backend. 
- limitation: - the interleaved-based approach requires custom syntax, making it less expressive than regex. - it struggles with tokenization boundaries due to conflicts between decode and chunked prefill segments. - frequent communication between the interpreter and the backend adds overhead. #### **_Method 3: Jump-Forward Decoding with compressed FSM_** ![](https://aarnphm.xyz/thoughts/Attention/../../thoughts/constrained-decoding/../../thoughts/images/vllm/jump-forward-decoding-fsm.webp) > [!tip] tokenization boundary handling > > During decoding, it is preferred to combine multiple characters into a single token. > > For example, when decoding `"Hello"` in the context of JSON decoding, the LLM might output the following tokens: `"`, `He`, `llo`, `",` > > This may cause some strange behaviour if we combine the last `"` with `,` (the regex `"[\w\d\s]*"` with the trailing `,` will lead to endless decoding, because the token `",` is not valid even if the LM wants to stop.) Fix: - implement a _re-tokenization_ mechanism during the jump-forward phase (append the string instead of the tokens, followed by re-tokenization of the entire text) $\to$ adds approximately 4% overhead - use a comprehensive regex to guide the decoding phase, instead of employing multiple concatenated regexes [^coalescence] [Lien vers l'original](https://aarnphm.xyz/thoughts/Attention/../../thoughts/constrained-decoding#compressed-fsm-for-jump-ahead-tokens) ## RingAttention ([Liu et al., 2023](#bib-liu2023ringattentionblockwisetransformers)) ## RazorAttention ([Tang et al., 2024](#bib-tang2024razorattentionefficientkvcache)) ## Paged Attention by ([Kwon et al., 2023](#bib-kwon2023efficient)) Used in conjunction with [continuous batching](https://aarnphm.xyz/thoughts/Attention/../../thoughts/Continuous-batching), implemented through [vLLM](https://aarnphm.xyz/thoughts/Attention/../../thoughts/vllm) Reduces memory usage of the attention mechanism by swapping KV cache in and out of memory. A block manager, similar to _virtual memory_ in an OS, handles the mapping. Essentially, it’s a form of **paging**, so the KV cache no longer has to live in one contiguous region of memory. Partitions the KV cache of each sequence into KV blocks. Another optimization is to use [KV compression](https://aarnphm.xyz/thoughts/Attention/../../thoughts/KV-compression) to reduce the size of the KV cache for longer contexts. Given: - each block contains KV vectors for a fixed number of tokens, denoted as block size $B$. - Key block $K_j= (k_{(j-1)B+1}, \ldots, k_{jB})$ - Value block $V_j= (v_{(j-1)B+1}, \ldots, v_{jB})$ $$ A_{ij} = \frac{\exp(q_i^T K_j / \sqrt{d})}{\sum_{t=1}^{i//B} \exp(q_i^T K_t / \sqrt{d})}, \quad o_i = \sum_{j=1}^{i//B} V_j A_{ij}^T $$ where $A_{ij}=(a_{i,(j-1)B+1}, \ldots a_{i,jB})$ is the row vector of attention scores over the j-th KV block. ## Bibliographie - Ainslie, J., Lee-Thorp, J., de Jong, M., Zemlyanskiy, Y., Lebrón, F., & Sanghai, S. (2023). _GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints_. arXiv preprint arXiv:2305.13245 [\[arxiv\]](https://arxiv.org/abs/2305.13245) - Kwon, W., Li, Z., Zhuang, S., Sheng, Y., Zheng, L., Yu, C. H., Gonzalez, J. E., Zhang, H., & Stoica, I. (2023). Efficient Memory Management for Large Language Model Serving with PagedAttention. _Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles_. - Liu, H., Zaharia, M., & Abbeel, P. (2023). _Ring Attention with Blockwise Transformers for Near-Infinite Context_.
arXiv preprint arXiv:2310.01889 [\[arxiv\]](https://arxiv.org/abs/2310.01889) - Tang, H., Lin, Y., Lin, J., Han, Q., Hong, S., Yao, Y., & Wang, G. (2024). _RazorAttention: Efficient KV Cache Compression Through Retrieval Heads_. arXiv preprint arXiv:2407.15891 [\[arxiv\]](https://arxiv.org/abs/2407.15891) - Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2023). _Attention Is All You Need_. arXiv preprint arXiv:1706.03762 [\[arxiv\]](https://arxiv.org/abs/1706.03762) - Zheng, L., Yin, L., Xie, Z., Sun, C., Huang, J., Yu, C. H., Cao, S., Kozyrakis, C., Stoica, I., Gonzalez, J. E., Barrett, C., & Sheng, Y. (2024). _SGLang: Efficient Execution of Structured Language Model Programs_. arXiv preprint arXiv:2312.07104 [\[arxiv\]](https://arxiv.org/abs/2312.07104) [^proof]: _base_: a random request corresponding to node $x \in T$ will be processed. - All requests corresponding to nodes $\{v_{1}, \ldots, v_{n}\}$ on the path $x \gets \text{root}$ don’t need recomputation. - Thus, the computation complexity for requests of nodes $\{v_{1}, \ldots, v_{n}, x\}$ is aligned with DFS _induction_: assume we visit node $y \in T$, and the visited nodes align with DFS order. Let $P$ denote the _path_ $y \gets \text{root}$. - Each node that has not been visited has its lowest common ancestor with the visited nodes on $P$. - Since nodes on $P$ are cached, a node $z$ that has yet to be visited and whose lowest common ancestor lies on $P$ will have the _longest shared prefix_ - longest-shared-prefix-first order will select $z$, which is a valid DFS order q.e.d --- slug: thoughts/Autograd tags: - seed - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/Autograd" title: "Autograd" date: 2021-10-10 permalink: https://aarnphm.xyz/thoughts/Autograd.html.md --- Auto differentiation and [XLA](https://aarnphm.xyz/thoughts/Autograd/../../thoughts/XLA) $f(x) = e^{2x} - x^3 \rightarrow \frac{df}{dx} = 2e^{2x} - 3x^2$ ← manual diff Others: - numerical, symbolic - autodiff - similar to symbolic, but on demand? - instead of an expression → returns a numerical value Forward mode - compute the partial derivative of each scalar wrt each input in a forward pass - each intermediate value is represented as a tuple of the primal $v_i$ and its _tangent_ $\dot{v}_i$: $v_i \rightarrow (v_i, \dot{v}_i)$ - [Jax](https://aarnphm.xyz/thoughts/Autograd/../../thoughts/Jax) uses operator overloading. Reverse mode - store values and dependencies of intermediate variables in memory - after the forward pass, compute the partial derivative of the output wrt each intermediate via the adjoint $\bar{v}$ --- slug: thoughts/Automatic-Differentiation tags: - math description: "resconstructed source of https://aarnphm.xyz/thoughts/Automatic-Differentiation" title: "Automatic Differentiation" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/Automatic-Differentiation.html.md --- see also: [Autograd](https://aarnphm.xyz/thoughts/Automatic-Differentiation/../../thoughts/Autograd) and [Jax](https://aarnphm.xyz/thoughts/Automatic-Differentiation/../../thoughts/Jax) Input: code that computes a function. Output: code that computes the derivative of that function. AD writes a function as a sequence of composed blocks $f(x) = f_n \circ f_{n-1} \circ \ldots \circ f_1(x)$, and then computes the derivative of the function by applying the chain rule.
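A tiny forward-mode sketch of that idea using dual numbers, applied to the $f(x) = e^{2x} - x^3$ example above (a simplification for illustration; the `Dual` class and helpers here are made up, and real systems like Jax overload operators over whole array programs):

```python
from dataclasses import dataclass
import math

@dataclass
class Dual:
    primal: float
    tangent: float  # derivative of this value w.r.t. the chosen input

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    def __sub__(self, other: "Dual") -> "Dual":
        return Dual(self.primal - other.primal, self.tangent - other.tangent)

    def __mul__(self, other: "Dual") -> "Dual":
        # product rule: (uv)' = u'v + uv'
        return Dual(self.primal * other.primal,
                    self.tangent * other.primal + self.primal * other.tangent)

def exp(x: Dual) -> Dual:
    e = math.exp(x.primal)
    return Dual(e, e * x.tangent)  # chain rule: (e^u)' = e^u * u'

def f(x: Dual) -> Dual:
    # f(x) = e^{2x} - x^3, built out of the primitives above
    return exp(Dual(2.0, 0.0) * x) - x * x * x

x = Dual(1.5, 1.0)  # seed the tangent with 1.0 to differentiate w.r.t. x
assert abs(f(x).tangent - (2 * math.exp(2 * 1.5) - 3 * 1.5**2)) < 1e-9
```

Each primitive propagates both the value and its derivative, so the chain rule is applied mechanically as the composition $f_n \circ \ldots \circ f_1$ is evaluated.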
--- slug: thoughts/Autoregressive-models tags: - seed - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/Autoregressive-models" title: "Autoregressive models" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/Autoregressive-models.html.md --- A statistical model is autoregressive if it predicts future values based on past values. For example, an autoregressive model might seek to predict a stock’s future prices based on its past performance. In the context of LLMs, generative pre-trained [transformers](https://aarnphm.xyz/thoughts/Autoregressive-models/../../thoughts/Transformers) (GPTs) are derivations of auto-regressive models: they take an input sequence of $n$ tokens and predict the next token at index $n+1$. “Auto-regressive model” is often considered the more precise terminology when describing text-generation models. --- slug: thoughts/Behavirourism tags: - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/Behavirourism" title: "Behavirourism" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/Behavirourism.html.md --- Positive reinforcement (praise, rewards) strengthens the behaviour and increases the likelihood of it being repeated, whereas punishment makes such behaviour less likely to be repeated. ### critique. - a one-dimensional way to understand human behaviour, as it focuses only on observable behaviours and neglects internal mental processes - deterministic, as it assumes that behaviour is determined by the environment and not by the individual, which induces [confirmation bias](https://aarnphm.xyz/thoughts/Behavirourism/../../thoughts/confirmation-bias) - [Compression](https://aarnphm.xyz/thoughts/Behavirourism/../../thoughts/Compression) problems --- slug: thoughts/BuildKit tags: - seed - container description: "resconstructed source of https://aarnphm.xyz/thoughts/BuildKit" title: "BuildKit" date: 2024-02-08 permalink: https://aarnphm.xyz/thoughts/BuildKit.html.md --- Concurrent, cache-efficient, and secure build system for building [OCI-compliant](https://aarnphm.xyz/thoughts/BuildKit/../../thoughts/OCI) images and artifacts. Containers are a form of [Content-addressable storage](https://aarnphm.xyz/thoughts/BuildKit/../../thoughts/Content-addressable-storage), such that you can run your application within an isolated environment. ### LLB You can think of it this way: LLB is to Dockerfile what LLVM IR is to C. Marshaled as a protobuf message, see [definition](https://github.com/moby/buildkit/blob/master/solver/pb/ops.proto) See also [Flatbuffer](https://aarnphm.xyz/thoughts/BuildKit/../../thoughts/In-memory-representation) --- slug: thoughts/Camus tags: - philosophy description: "Camus, a scattered thoughts and notes." title: "Camus" date: 2024-02-28 permalink: https://aarnphm.xyz/thoughts/Camus.html.md --- Of the absurd [reasoning](https://aarnphm.xyz/thoughts/Camus/../../thoughts/reason) and [existentialism](https://aarnphm.xyz/thoughts/Camus/../../thoughts/Existentialism).
## The Myth of Sisyphus > Influenced by [Nietzsche](https://aarnphm.xyz/thoughts/Camus/../../thoughts/Philosophy-and-Nietzsche), Camus argued that life is inherently meaningless, while human continues to impose order on existence and to look for answers to unanswerable questions ### Absurd and suicide Suicide is the solution for the absurd: - People never died because of ontological arguments - Suicide is often the result of people who didn’t find worth in the living - Life is not worth living therefore I took the easy way out as a paradox: - Suicide is the justification of meaning of life ← the most important question for philosophers - From [Nietzchean](https://aarnphm.xyz/thoughts/Camus/../../thoughts/Philosophy-and-Nietzsche) prose, those who say “no” acts as if they said “yes”: Schopenhauer Fantasise the act of eluding: > Hope of another life one must “deserve” or trickery those who lives not for life itself but for some great idea that will transcend it, refine it, give it meaning, and betray it. Logic is easy, but it is impossible to be logical to bitter end. It is considered truth if one decided to die at the hand of self, but does that mean life itself just have no meaning? > **Absurd reasoning** is based on whether there are [logic](https://aarnphm.xyz/thoughts/Camus/../../thoughts/logic) to reasons for men who died by their “own hands consequently follow to its conclusion of their emotional inclination” The absurd come from the abject at birth, similar to end pages of the books starts from the beginning. To understand absurd is to understand the art of living, the world of intelligence. Seemingly the questions of the absurd stems from question “Why”. The wearing of a normal life, inaugurates the impulse of consciousness. Heidegger: “mere anxiety \[is] a source of everything.” --- ### definition of absurd See also P.17, P.20, P.25 I realise that if through science I can seize phenomena and enumerate them, I cannot, for all that, apprehend the world. Absurd is the confrontation of this irrational call for clarity whose call echoes in the human heart ```poetry language=fr The absurd is measured by the mans in the world ``` The attack of [reasons](https://aarnphm.xyz/thoughts/Camus/../../thoughts/reason) and decency are never stronger than our own Once we recognised the absurd, it becomes passion. How many lives with his passion or not is a different question. Philosophers lives through their lenses of the world such that they ran these experiments and believed so strongly in the results Jaspers despair any ontology because we have lost naïveté Kierkegaard lives the absurd: no truth is absolute and can render satisfactory an existence impossible in itself. The absurd is born from reasons man making sense of the world and the irreparable silence of the universe echoed back to one. > [!note] P.30 > > In all these cases, from the simplest to the most complex, the magnitude of the absurdity will be in direct ratio to the distance between the two terms of my comparison. There are absurd marriages, challenges, rancors, silences, wars, and even peace treaties. For each of them the absurdity springs from a comparison. I am thus justified in saying that the feeling of absurdity does not spring from the mere scrutiny of a fact or an impression, but that it bursts from the comparison between a bare fact and a certain reality, between an action and the world that transcends it. The absurd is essentially a divorce. It lies in neither of the elements compared; it is born of their confrontation. 
In this particular case and on the plane of intelligence, I can therefore say that the Absurd is not in man (if such a metaphor could have a meaning) nor in the world, but in their presence together. For the moment it is the only bond uniting them. If I wish to limit myself to facts, i know what man wants, I know that the world offers him, snd now i can say that i know what links them. Rule of method: A man is always a prey for his truth. Once he has admitted them he cannot free himself from them. a man who become conscious of his absurd is now forever bound by it Indeed, Kierkegaard himself shows us the path taken. > I do not want to suggest anything here, but how can one fail to read in his works the signs of an almost intentional mutilation of the soul to balance the mutilation accepted in regard to the absurd? It is the leitmotiv of the Journal. “What I lacked was the animal which also belongs to human destiny… . But give me a body then.” And further on: “Oh! especially in my early youth what should I not have given to be a man, even for six months … What I lack, basically, is a body and the physical conditions of existence.” > Reconciliation through scandal is still reconciliation. It allows one perhaps, aa can be seen, to derive hope of its contrary, which is death Kierkegaard’s view on despair is that it is not a fact, but a state: the state of sin. For sin is what alienates from God. The absurd, is the metaphysical state of the conscious man, does not lead to God. Therefore, the absurd is the sin without God > [!note] P.44 > > I read merely these assertions of Husserl, apparently parade cal yet rigorously logical if what precedes is accept That which is true is true absolutely, in itself; truth, one, identical with itself, however different the creation who perceive it, men, monsters, angels or gods.” Reason triumphs and trumpets forth with that voice, I cannot, deny. What can its assertions mean in the absurd word The perception of an angel or a god has no meaning for me. That geometrical spot where divine reason ratifies mine will always be incomprehensible to me. There, too, I discern a leap, and though performed in the abstract, it nonetheless means for me forgetting just what I do not want to forget. Husserl exclaims: “If all masses subject to attraction were to disappear, the law of attraction would not be destroyed but would simply remain without any possible application, I know that I am faced with a metaphysic of consolation. And if I want to discover the point where though leaves the path of evidence, I have only to reread the parallel reasoning that H voices regarding the mind: if we could contemplate clearly the exact laws of psychic process, they would be seen to be likewise eternal and invariable, like the basic laws of theoretical science. Hence they would be valid even if there were no psychic process. Even if the mind were not, its law would be, i see then a psychological truth H aims to make a rational rule: after having denied the integrating power of human reason, he leaps this expedient by eternal reason. Husserl’s concrete universe in that all essences are not formal, but some are material, that the first are the object of logic and second of science, this is mere question of definition. I then realize that merely the order of the procession has been changed. This world has ceased to have its reflection in a higher universe, but the heaven of forms is figured in the host of images of this earth. This changes nothing for me. 
Rather than encountering here a taste for the concrete, the meaning of the human condition, I find an intellectualism sufficiently unbridled to generalize the concrete itself. ### absurd freedom If I were a tree among trees, a cat among animals, this life would have a meaning, or rather this problem would not arise, for I should belong to this world. I should be this world to which I am now opposed by my whole consciousness and my whole insistence upon familiarity. This ridiculous reason is what sets me in opposition to all creation. I cannot cross it out with a stroke of the pen. What I believe to be true I must therefore preserve. The absurd is simultaneously the awareness and rejection of death. Suicide as a solution for the absurd, the absolute: man cannot seem to live with his dreadful future, so he chooses suicide as a solution. - Consciousness and revolt as rejections are the contrary of renunciation - The method is a matter of persistence > [!tip] Freedom > > Knowing whether or not a man is free involves knowing whether he can have a master. The paradox of this freedom is that understanding metaphysical liberty takes away its meaning of being free. God and the problem of evil: either we are not free and an all-powerful God is responsible for evil, or we are free and responsible but God is not all-powerful. Freedom cannot be inferred as a general solution; it can only be derived from one’s experience. I don’t inherit freedom from a higher being, as I am the owner of my own thoughts and actions, such that I am responsible for them. If the absurd cancels out the chances of eternal freedom, it restores and magnifies my freedom of action. Man is bound to postulate his freedom based on the illusion of which he was living. Losing oneself in that bottomless certainty, feeling henceforth sufficiently remote from one’s own life to increase it and take a broad view of it - it involves a principle of liberation. Such new independence has a definite time limit, like any freedom of action. ### the absurd man The actor trains himself to feed only on appearances. --- ### Analysis Camus’ argument on the absurd: - the world is full of irrationality and indifference. The world is silent against humanity’s search for the meaning of life. - Meaning and value are constructed by humans, instead of what Kierkegaard implies in putting faith forward as a way to outsource our value system, because eventually life is meaningless - But what Kierkegaard is doing is actually a philosophical suicide. > [!note] Note > > I don’t know whether this world has a meaning that transcends it. But I know that I do not know that meaning and that it is impossible for me just now to know it > [!note] Note > > What can a meaning outside my condition mean to me? I can understand only in human terms. Did he mean the world or the human as absurd? No, because as rational human beings we are programmed to create order and put meaning to life in an indifferent and irrational universe The why arises, and trying to find the rational in an irrational world is absurd. The absurd cannot be negated, meaning we can live either in acceptance of it or in escape from it. Religion is a set of pre-made answers to existential and philosophical questions, and is used as a tool for control. Philosophical suicide is to elude the absurd and try to figure out the meaning of life with a set of man-made beliefs How to live life in a meaningless world? It is to let go of all definitions of meaning and live life fruitfully.
Instead of despairing, see the silver lining, to focus on this life, create [value](https://aarnphm.xyz/thoughts/Camus/../../thoughts/Value) on our own, when our time is limited, with a full perception of it. One should not accept the absurd, we should revolt around it as we have full control of our own actions and freedom. Rebellion: full of thoughts and actions. as rejection of hope. The goal is to live solely with what he know, to accommodate with what is and to bring in nothing? --- slug: thoughts/Capitalism-and-Freedom tags: - book description: "by Milton Friedman" title: "Capitalism and Freedom" date: 2024-11-27 permalink: https://aarnphm.xyz/thoughts/Capitalism-and-Freedom.html.md --- ([Friedman & Friedman, 1962](#bib-friedman1962capitalism)) ### on occupational licensure Three grounds for this arguments: 1. registration 2. certification 3. licensing proper - against fraudulent claims: > It goes still farther in the direction of trenching upon the rights of individuals to enter into voluntary contracts. > The most obvious social cost is that anyone of these measures, whether it be registration, certification, or licensure, almost _inevitably becomes a tool_ in the hands of \[_a special producer group_] to obtain a **monopoly** position at the expense of the rest of the public. ## Bibliographie - Friedman, M., & Friedman, R. D. (1962). _Capitalism and Freedom_ (p. 202). University of Chicago Press. --- slug: thoughts/Cauchy-momentum-equation tags: - physics description: "and fluid dynamics." title: "Cauchy momentum equation" date: 2024-11-27 permalink: https://aarnphm.xyz/thoughts/Cauchy-momentum-equation.html.md --- In convective or Lagrangian form: $$ \begin{aligned} \frac{Du}{Dt} = &\frac{1}{\rho} \nabla \cdot \sigma + \mathbf{f}\\[12pt] \because \space u&: \text{flow velocity} \quad (\text{unit: } m/s) \\ t &: \text{time} \quad (\text{unit: } s) \\ \frac{Du}{Dt} &: \text{material derivative of } \mathbf{u} = \partial_t \mathbf{u} + \mathbf{u} \cdot \nabla u \quad (\text{unit: } m /s^2) \\ \rho &: \text{density at given point of the continuum} \quad (\text{unit: } kg/m^3) \\ \sigma &: \text{stress tensor} \quad (\text{unit: Pa} = N/m^2 = \text{kg} \cdot m^{-1} \cdot s^{-2}) \\[8pt] \mathbf{f} &: \begin{bmatrix} f_x \\ f_y \\ f_z \end{bmatrix} \quad (\text{unit: } m/s^2) \\ \nabla \cdot \boldsymbol{\sigma} &= \begin{bmatrix} \frac{\partial \sigma_{xx}}{\partial x} + \frac{\partial \sigma_{yx}}{\partial y} + \frac{\partial \sigma_{zx}}{\partial z} \\ \frac{\partial \sigma_{xy}}{\partial x} + \frac{\partial \sigma_{yy}}{\partial y} + \frac{\partial \sigma_{zy}}{\partial z} \\ \frac{\partial \sigma_{xz}}{\partial x} + \frac{\partial \sigma_{yz}}{\partial y} + \frac{\partial \sigma_{zz}}{\partial z} \end{bmatrix} \quad (\text{unit: Pa}/m) \\ \end{aligned} $$ NOTE: $\mathbf{f}$ is the _vector containing all accelerations caused by body force_ and $\nabla \cdot \boldsymbol{\sigma}$ is _the [divergence](https://aarnphm.xyz/thoughts/Cauchy-momentum-equation/../../thoughts/Vector-calculus#divergence) of __stress tensor_. 
> [!note] common annotation > > We only use the Cartesian coordinate system (column vectors) for clarity, but the equation is often written using physical components (which are neither covariant (column) nor contra-variant (row)) ## differential derivation > [!abstract] generalized momentum conservation principles > > The change in system momentum is proportional to the resulting force acting on this system $$ \vec{p}(t + \Delta t) - \vec{p}(t) = \Delta t \vec{\overline{F}} $$ where $\vec{p}(t)$ is the momentum at time $t$, and $\vec{\overline{F}}$ is the force averaged over $\Delta t$ ## integral derivation Applying Newton’s second law to a control volume in the continuum being modelled gives $$ ma_i = F_i $$ Then, based on the [Reynolds transport theorem](https://aarnphm.xyz/thoughts/Cauchy-momentum-equation/../../thoughts/Reynolds-transport-theorem) and using the material derivative [^mat-derivative] notation: $$ \begin{align} \int_{\Omega} \rho \frac{D u_i}{D t} \, dV &= \int_{\Omega} \nabla_j \sigma_i^j \, dV + \int_{\Omega} \rho f_i \, dV \\ \int_{\Omega} \left( \rho \frac{D u_i}{D t} - \nabla_j \sigma_i^j - \rho f_i \right) \, dV &= 0 \\ \rho \frac{D u_i}{D t} - \nabla_j \sigma_i^j - \rho f_i &= 0 \\ \frac{D u_i}{D t} - \frac{\nabla_j \sigma_i^j}{\rho} - f_i &= 0 \end{align} $$ where $\Omega$ represents the control volume. ## conservation form $$ \frac{\partial \mathbf{j}}{\partial t} + \nabla \cdot \mathbf{F} = \mathbf{s} $$ where $\mathbf{j}$ is the momentum density at a given space-time point, $\mathbf{F}$ is the _flux_ associated with the momentum density, and $\mathbf{s}$ contains all body forces per unit volume. Assuming conservation of mass, with known properties of divergence and gradient, we can rewrite the conservation form of the equations of motion $$ \frac{\partial}{\partial{t}}(\rho \mathbf{u}) + \nabla \cdot (\rho \mathbf{u} \otimes \mathbf{u}) = - \nabla p + \nabla \cdot \tau + \rho \mathbf{a} $$ where $\otimes$ is the outer product of the flow velocity $\mathbf{u}$: $\mathbf{u} \otimes \mathbf{u} = \mathbf{u} \mathbf{u}^T$ ## convective form $$ \frac{D \mathbf{u}}{Dt} = \frac{1}{\rho} \nabla \cdot \sigma + \mathbf{f} $$ [^mat-derivative]: the definition of the material derivative is as follows: > [!math] definition > > For any [tensor field](https://aarnphm.xyz/thoughts/Cauchy-momentum-equation/../../thoughts/Tensor-field) $y$ that is _macroscopic_, i.e. depends only on position and time coordinates, $y=y(\mathbf{x}, t)$: > > $$ > \frac{Dy}{Dt} = \frac{\partial y}{\partial t} + \mathbf{u} \cdot \nabla y > $$ > > where $\nabla y$ is the covariant derivative of the tensor, and $\mathbf{u}(\mathbf{x}, t)$ is the flow velocity --- slug: thoughts/Cauchy-Schwarz tags: - math description: "resconstructed source of https://aarnphm.xyz/thoughts/Cauchy-Schwarz" title: "Cauchy-Schwarz" date: 2024-11-05 permalink: https://aarnphm.xyz/thoughts/Cauchy-Schwarz.html.md --- _useful for deriving upper bounds, e.g. when analysing the error or convergence rate of an algorithm_ > [!abstract] format > > for all vectors $u$ and $v$ of an inner product space, we have > > $$ > \mid \langle u, v \rangle \mid ^2 \le \langle u, u \rangle \cdot \langle v, v \rangle > $$ In the context of the Euclidean norm: $$ \mid x^T y \mid \le \|x\|_2 \|y\|_2 $$ ## proof _using the Pythagorean theorem_ Special case of $v=0$: then $\langle u, v \rangle = 0$ and $\|u\|\|v\| = 0$, so both sides vanish, the inequality holds with equality, and $u$, $v$ are [linearly dependent](https://aarnphm.xyz/thoughts/Cauchy-Schwarz/../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/tut/tut1#linear-dependence-of-vectors). Assume that $v \neq 0$.
Let $z \coloneqq u - \frac{\langle u, v \rangle}{\langle v, v \rangle} v$ It follows from linearity of inner product that $$ \langle z,v \rangle = \langle u - \frac{\langle u,v \rangle}{\langle v, v \rangle} v,v \rangle = \langle u,v \rangle - \frac{\langle u,v \rangle}{\langle v,v \rangle}\langle v,v \rangle = 0 $$ Therefore $z$ is orthogonal to $v$ (or $z$ is the projection onto the plane orthogonal to $v$). We can then apply Pythagorean theorem for the following: $$ u = \frac{\langle u,v \rangle}{\langle v,v \rangle} v + z $$ which gives $$ \begin{aligned} \|u\|^{2} &= \mid \frac{\langle u,v \rangle}{\langle v,v \rangle} \mid^{2} \|v\|^{2} + \|z\|^2 \\ &=\frac{\mid \langle u,v \rangle \mid^{2}}{(\|v\|^2)^{2}} \|v\|^{2} + \|z\|^2 \\ &= \frac{\mid \langle u, v \rangle\mid^2}{\|v\|^{2} } + \|z\|^2 \ge \frac{\mid \langle u,v \rangle \mid^2}{\|v\|^{2} }\\ \end{aligned} $$ Follows $\|z\|^{2}=0 \implies z=0$, which estabilishes linear dependences between $u$ and $v$. q.e.d --- slug: thoughts/Chaos tags: - philosophy description: "Chaos a la carte." title: "Chaos" date: 2024-01-08 permalink: https://aarnphm.xyz/thoughts/Chaos.html.md --- Full [post](https://aarnphm.xyz/thoughts/Chaos/../../posts/Chaos). > Chaos: a looseness [collection](https://subconscious.substack.com/p/self-organizing-ideas) of one’s [will](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Will) to life. The etymology of chaos traces back to the Greek word χάος (khaos), meaning which means abyss, that which gapes wide open, that which is vast and [empty](https://www.merriam-webster.com/wordplay/chaos-meaning-and-history) ## as system. See also [Chaos as an intermittently forced linear system](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/papers/Chaos-as-an-intermittently-forced-linear-system.pdf) Chaos theory posits that within the apparent randomness of complex system lies an underlying pattern, self-similarity, and self-organization. The amount of time in which a system can be predicted is dependent on the following: - how much uncertainty can be tolerated in the forecast. - accuracy of measuring current state of the system. - A time scale, often known as [Lyapunov time](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Lyapunov-time) We can often see [entropy](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Entropy) as a consequence of chaos. These are often linked, yet distinct concepts. The loss of order induces unpredictability within deterministic systems, or such systems are _sensitive dependent_ on initial condition. Whereas entropy deals with property of how one system can be arranged. We can observe this through Lorenz [attractor](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/attractor) system: $$ \begin{align*} \frac{dx}{dt} &= \sigma(y - x), \\ \frac{dy}{dt} &= x(\rho - z) - y, \\ \frac{dz}{dt} &= xy - \beta z. \end{align*} $$ > Chaos: When the present determines the future, but the approximate present does not approximately determine the future. ## as scale. See also [this tweet](https://twitter.com/eshear/status/1760755072571777412) Often known as cognitive dissonance, or linked with emotional turmoil. The personal traits continuum scale, characterised by Carl Jung suggested that the human psyche lies within the spectrum of extroversion and introversion, rather than a definitive single continuum that modern psychology perceive it to be. _How does Chaos influence the scale of human psyche?_ ## fundamentals. 
What [Nietzsche](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Philosophy-and-Nietzsche) would imply: > Alas! there cometh the time when man will no longer launch the arrow of his longing beyond man—and the string of his bow will have unlearned to whizz! > > I tell you: one must still have chaos in one, to give birth to a dancing star. I tell you: ye have still chaos in you. _extracted from Z V, Death of The God_ Chaos is the essence of one existence. Such that the world is not governed by fixed rules and predetermined order. Nietzsche rejects [transcendentals chaos](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Transcendentals), such construct reality beyond sensory understanding. These truths are [philosophers’ prejudices](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Philosophy-and-Nietzsche#prejudices-of-philosophers) that deny one’s will to [power](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Will#as-power), such that all truth are just one’s perception and experience. - “eternal recurrence” [^1] \- a litmus test for an individual’s capacity to affirm life ← Actively mentioned throughout [Thus Spoke Zarathustra](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Philosophy-and-Nietzsche#thus-spoke-zarathustra). - implies the possibility of composite [self](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/papers/Nietzsche-the-Kantian-Self-and-Eternal-Recurrence.pdf), that the individual remains the same for eternity of life. - allows chaos to remain the force of life, and will to power to be a configuration of chaos. > The self, for Nietzsche, is not just a radically unstable postmodern self. It is such a self, but it is not simply such a self. It also has a stability, sameness, and unity that goes far beyond anything Kant ever imagined in his wildest dreams The Übermensch must find his footing, create his own values through the act of living. Chaos is important for the creation of anything that is truly new and valuable. Chaos, in its many forms, is often seen as a force to be feared or avoided, yet it is also a catalyst for growth. It challenges the boundaries of our comfort zones and compels us to engage with aspects of our lives and selves that we might prefer to ignore. ## versus equanimity. Equanimity should be one to seek, but yet chaos is all I desire. (moment of chaos, moment of equanimity) The rule of a utilitarian is to maximize desire at all cost, therefore, does it mean I should always seek chaos? Nietzsche would argue that the motion of chaos invokes entropy, and entropy induces value, and the Übermensch embarks upon the creation of value. [Taste](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/taste) implies multiplicity of being. It is driven by inner chaos to explore and expands on our [representation](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Language) of the world. Yet ignorance seems to overload chaos, and prevents the maximum utilization of ones potential. I wonder if chaos is just a collection of different entropic phenomena. Equanimity, represents a state of calmness and balance, even in the face of adversity. Achieving it is not about denying chaos or the tumult of emotions it can evoke, but rather about finding a way to navigate through it without being overwhelmed. It’s about learning to coexist with the chaos, recognizing it as a part of the broader tapestry of life and the self. 
Running away from normalcy to seek out “different entropic phenomena” speaks to a deep-seated curiosity and a desire not just for experience, but for understanding the intricate dynamics of life. It’s a testament to the strength and resilience of the human spirit in its quest for meaning, even when faced with the seemingly insurmountable. [^1]: See also [Giles Deleuze’s](https://aarnphm.xyz/thoughts/Chaos/../../thoughts/Giles-Deleuze#nietzsche-and-philosophy) interpretation. --- slug: thoughts/Cholesky-decomposition tags: - math description: "resconstructed source of https://aarnphm.xyz/thoughts/Cholesky-decomposition" title: "Cholesky decomposition" date: 2024-10-28 permalink: https://aarnphm.xyz/thoughts/Cholesky-decomposition.html.md --- decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose. (used for [Monte-Carlo simulations](https://aarnphm.xyz/thoughts/Cholesky-decomposition/../../thoughts/Monte-Carlo#simulations)) $$ A = LL^{*} $$ where $L$ is a lower triangular matrix with real and positive diagonal entries, and $L^{*}$ is the conjugate transpose of $L$. --- slug: thoughts/Cinematography tags: - film - evergreen description: "resconstructed source of https://aarnphm.xyz/thoughts/Cinematography" title: "Cinematography" date: 2023-09-11 permalink: https://aarnphm.xyz/thoughts/Cinematography.html.md --- Notes on format: - Anamorphic [lenses](https://aarnphm.xyz/thoughts/Cinematography/../../thoughts/lenses) Equipment: - A7III - Shallow depth of field - FX3 - larger sensor pixel area > [Lightning](https://aarnphm.xyz/thoughts/Cinematography/../../thoughts/Lighting) is key [Planimetric composition](https://aarnphm.xyz/thoughts/Cinematography/../../thoughts/Planimetric-composition) - Wes Anderson --- slug: thoughts/Civilisation-and-its-Discontents tags: - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/Civilisation-and-its-Discontents" title: "Civilisation and its Discontents" date: 2023-10-10 permalink: https://aarnphm.xyz/thoughts/Civilisation-and-its-Discontents.html.md --- See also: [Freud](https://aarnphm.xyz/thoughts/Civilisation-and-its-Discontents/../../thoughts/Freud) C1: ego and sense of self within the societal context - Oceanic feeling - ignorance for the existence of others - Cant seem to separate himself from the sense of reality C2: the meaning of happiness? - his discontent against personal freedom and societal restrictions - The sense of guilt? Guilty for not following societal norms - Eros and Thanatos C3: What are the core purposes of this biological beings we called self? Freud argues the human psyche is not a single monolith, rather comprises of complex interplay of the following components: Id: primal, instinctive part of self, seeking immediate gratification of pleasure Ego: logical, rational conscious part of the psyche Superego: internalized moral and societal values C5: Emphasis on the construct of human psyche creates internal conflicts, adding civilizations norms which increases the tendency for aggression versus self love --- slug: thoughts/Color tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Color" title: "Color" date: 2024-03-09 permalink: https://aarnphm.xyz/thoughts/Color.html.md --- ### theory. Complementary Analogous Triadic See also [coolors.co](https://aarnphm.xyz/thoughts/Color/../../coolors.co) contrast, combination, thickness 1. background Color 2. 
surface area --- slug: thoughts/Compiler tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Compiler" title: "Compiler" date: 2024-10-07 permalink: https://aarnphm.xyz/thoughts/Compiler.html.md --- ## just-in-time compiler

```mermaid
graph TD
  A[Source Code] --> B[Bytecode / IR]
  B --> C[Interpreter]
  C --> D{Hot Spot?}
  D -->|Yes| E[JIT Compiler]
  D -->|No| C
  E --> F[Native Machine Code]
  F --> G[Execution]
  C --> G
```

See also: [thoughts/jit.py](https://cdn.aarnphm.xyz/assets/thoughts/jit.py) toy example of caching hot inputs (the “hot spot” branch above), i.e. memoisation:

```python
import numpy as np
import numpy.typing as npt

# cache keyed by the input values; a dict so membership checks and lookups work by value
cache: dict[tuple[float, ...], npt.NDArray[np.float32]] = {}


def dct_jit(x: npt.NDArray[np.float32]) -> npt.NDArray[np.float32]:
    x_tuple = tuple(x)
    if x_tuple in cache:
        return cache[x_tuple]  # repeated input: skip recomputation entirely
    N = len(x)
    result = np.zeros(N, dtype=np.float32)
    for k in range(N):
        sum_val = 0.0
        for n in range(N):
            sum_val += x[n] * np.cos(np.pi * k * (2 * n + 1) / (2 * N))
        result[k] = sum_val
    cache[x_tuple] = result
    return result
```

--- slug: thoughts/Complexity tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Complexity" title: "Complexity" date: 2024-12-01 permalink: https://aarnphm.xyz/thoughts/Complexity.html.md --- papers: [Out of the Tar Pit, B. Moseley](https://aarnphm.xyz/thoughts/Complexity/../../thoughts/papers/Out-of-the-Tar-Pit,-Moseley.pdf) ## Cyclomatic > A proxy metric for complexity Think of it in terms of a structured program’s control-flow graph, with an _edge between two blocks if control may pass from the first to the second_ > [!math] complexity > > defined as follows: > > $$ > \begin{aligned} \mathbb{M} &= \mathbb{E} - \mathbb{N} + 2 \mathbb{P} \\[8pt] &\because \mathbb{E} = \text{number of edges in the graph} \\ &\quad \space \mathbb{N} = \text{number of nodes in the graph} \\ &\quad \space \mathbb{P} = \text{number of connected components} \end{aligned} > $$ ## Law of Software Evolution see also: [paper](https://aarnphm.xyz/thoughts/Complexity/../../thoughts/papers/Programs,-Life-Cycles,-and-Laws-of-Software-Evolution---Lehman.pdf) --- slug: thoughts/Compression tags: - seed - technical description: "resconstructed source of https://aarnphm.xyz/thoughts/Compression" title: "Compression" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/Compression.html.md --- --- slug: thoughts/Constructionist tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Constructionist" title: "Constructionist" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/Constructionist.html.md --- Mindstorm and Design Justice --- slug: thoughts/Containers tags: - technical - storage description: "resconstructed source of https://aarnphm.xyz/thoughts/Containers" title: "Containers" date: 2024-02-08 permalink: https://aarnphm.xyz/thoughts/Containers.html.md --- See also [OCI specification](https://aarnphm.xyz/thoughts/Containers/../../thoughts/OCI), [BuildKit](https://aarnphm.xyz/thoughts/Containers/../../thoughts/BuildKit) --- slug: thoughts/Content-addressable-storage tags: - seed - technical description: "resconstructed source of https://aarnphm.xyz/thoughts/Content-addressable-storage" title: "Content-addressable storage" date: 2023-04-15 permalink: https://aarnphm.xyz/thoughts/Content-addressable-storage.html.md --- Content-addressed storage is a mechanism to store information such that it can be retrieved based on its content, not name or location.
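A toy sketch of the idea (the helper names here are made up for illustration, and SHA-256 simply stands in for whatever hash a real system uses):

```python
import hashlib

# the "address" of a blob is derived from its bytes, not from where it is stored
store: dict[str, bytes] = {}

def put(content: bytes) -> str:
    address = hashlib.sha256(content).hexdigest()
    store[address] = content
    return address

def get(address: str) -> bytes:
    content = store[address]
    # integrity check comes for free: re-hashing must reproduce the address
    assert hashlib.sha256(content).hexdigest() == address
    return content

addr = put(b"hello, content addressing")
assert get(addr) == b"hello, content addressing"
```

Moving or renaming the blob never changes its address, which is precisely what the location-addressed scheme in the comparison below cannot guarantee.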
> If you have a book, say “Control Systems Engineer by N.S.Nise, with ISBN: 978-1-119-47422-7”, you can find the book anywhere, including its information and content. > > By contrast, if I use location-addressing to identify the book, say, “the book on the second shelf of the third row in the library”, it would be difficult to find the book if the library is reorganized. | Content-addressed | Location-addressed | | ------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | use cryptographic hash functions[^1] to generate unique keys to retrieved based on contents | e.g: [HTTP](https://aarnphm.xyz/thoughts/Content-addressable-storage/../../thoughts/HTTP), look up content by its location (URI). Thus contents is controlled by the owner of the location | ## Immutable Objects, Mutable References Utilize [Merkle DAG](https://aarnphm.xyz/thoughts/Content-addressable-storage/../../thoughts/Merkle-DAG), immutable content-addressed objects, and mutable pointers to the DAG, which creates a dichotomy presents in many distributed systems. See also [IPFS](https://aarnphm.xyz/thoughts/Content-addressable-storage/../../thoughts/IPFS), [Block-reference mechanism](https://aarnphm.xyz/thoughts/Content-addressable-storage/../../thoughts/Block-reference-mechanism) [^1]: See [cryptographic functions](https://aarnphm.xyz/thoughts/Content-addressable-storage/../../thoughts/cryptography#functions) --- slug: thoughts/Continuous-batching tags: - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/Continuous-batching" title: "Continuous batching" date: 2024-02-08 permalink: https://aarnphm.xyz/thoughts/Continuous-batching.html.md --- ([Yu et al., 2022](#bib-280922)) solves the static batching to reduce cost and improve throughput by appending requests continuously into existing KV cache [^paper] ![](https://aarnphm.xyz/thoughts/Continuous-batching/../../thoughts/images/vllm/continuous-batching.webp) ## Bibliographie - Yu, G.-I., Jeong, J. S., Kim, G.-W., Kim, S., & Chun, B.-G. (2022). Orca: A Distributed Serving System for Transformer-Based Generative Models. _16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)_, 521–538. [^paper]: The [paper](https://www.usenix.org/conference/osdi22/presentation/yu) and [presentation](https://www.youtube.com/watch?v=Ob9PPLxETYU\&ab_channel=USENIX) for the paper. Most notable open source implementation is [vLLM](https://aarnphm.xyz/thoughts/Continuous-batching/../../thoughts/vllm). 
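The iteration-level scheduling idea is small enough to sketch. A toy Python loop, assuming a hypothetical `Request` object that tracks how many decode steps it has left; real systems (Orca, vLLM) additionally handle prefill, paged KV memory, and preemption, all elided here:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
  rid: int
  remaining: int                                 # decode steps left (stand-in for real stopping criteria)
  kv: list[int] = field(default_factory=list)    # stand-in for this request's KV-cache entries

def continuous_batching(incoming: deque[Request], max_batch: int = 4) -> None:
  """Toy scheduler: admit new requests every step instead of waiting for the whole batch to drain."""
  running: list[Request] = []
  step = 0
  while incoming or running:
    # admit as many waiting requests as the batch budget allows
    while incoming and len(running) < max_batch:
      running.append(incoming.popleft())
    # one decode step for every running request, appending to its KV cache
    for r in running:
      r.kv.append(step)
      r.remaining -= 1
    # retire finished requests immediately, freeing slots for the next step
    finished = [r for r in running if r.remaining == 0]
    running = [r for r in running if r.remaining > 0]
    if finished:
      print(f"step {step}: finished {[r.rid for r in finished]}")
    step += 1

continuous_batching(deque(Request(i, remaining=2 + i % 3) for i in range(6)))
```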
p/s: Actually, I think first implemented in [huggingface/tgi](https://github.com/huggingface/text-generation-inference) --- slug: thoughts/Database tags: - technical description: "resconstructed source of https://aarnphm.xyz/thoughts/Database" title: "Database" date: 2024-02-09 permalink: https://aarnphm.xyz/thoughts/Database.html.md --- See also [introduction](https://aarnphm.xyz/thoughts/Database/../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/DBMS) --- slug: thoughts/Determinism tags: - seed - computing description: "resconstructed source of https://aarnphm.xyz/thoughts/Determinism" title: "Determinism" date: 2024-01-08 permalink: https://aarnphm.xyz/thoughts/Determinism.html.md --- The argument from Hume --- slug: thoughts/Digital-garden tags: - seed - pattern description: "resconstructed source of https://aarnphm.xyz/thoughts/Digital-garden" title: "Digital garden" date: 2024-02-09 permalink: https://aarnphm.xyz/thoughts/Digital-garden.html.md --- A collection of notes, thoughts, and ideas that are cultivated and grown over time. It’s a place where you can plant seeds, grow them, and let them bloom. It’s a place where you can let your thoughts grow organically, and where you can let your ideas flourish. In a sense, it is a form of [hypertext](https://aarnphm.xyz/thoughts/Digital-garden/../../thoughts/Hypertext), a personalized Xanadu system. Wikipedia is also considered as society’s digital garden Joel Hooks puts this better than I can ever do: > A garden is usually a place where things grow. > > Gardens can be very personal and full of whimsy or a garden can be a source of food and substance. > > We gather and work together in community gardens to share the labor as well as the rewards of a collective effort. > > It’s a comparison that you can take very far. From “planting seeds” and “pulling weeds” to tending multiple gardens that each serve an individual need or desired outcome. > > Like with real gardens, our digital gardens are a constant ebb and flow towards [entropy](https://aarnphm.xyz/thoughts/Digital-garden/../../thoughts/Entropy). > Nerding hard on digital gardens, personal wikis, and experimental knowledge systems with [@\_jonesian](https://twitter.com/_jonesian?ref_src=twsrc%5Etfw) today.\ > \ > We have an epic collection going, check these out...\ > \ > 1\. [@tomcritchlow](https://twitter.com/tomcritchlow?ref_src=twsrc%5Etfw)'s Wikifolders: [pic.twitter.com/9ri6g9hD93](https://t.co/9ri6g9hD93) > > — Maggie Appleton (@Mappletons) [15 avril 2020](https://twitter.com/Mappletons/status/1250532315459194880?ref_src=twsrc%5Etfw) See also: [post](https://maggieappleton.com/garden-history) and [introduction](https://joelhooks.com/digital-garden) ## The garden and the stream [Source](https://hapgood.us/2015/10/17/the-garden-and-the-stream-a-technopastoral/) --- slug: thoughts/Dishes tags: - evergreen - menu description: "resconstructed source of https://aarnphm.xyz/thoughts/Dishes" title: "Menus" date: 2023-10-26 permalink: https://aarnphm.xyz/thoughts/Dishes.html.md --- A collection of courses. See [atelier with friends](https://aarnphm.xyz/thoughts/Dishes/../../thoughts/atelier-with-friends/) if you are interested to join.. This serves as a ground truth for a collection of dishes throughout. ## italienne. 1. Uovo la Raviolo ### salsa. 1. Marinara 2. Sugo Pomodoro ## le viandier. 1. Soupe à l’Oignon Gratinée 2. Chicken liver paté a la Jacques Pepin 3. La trout meunière a la choux de Bruxelles 4. Canard a l’orange 5. Salade Landaise 6. la charcuterie 7. 
Mousse au chocolat 8. gateau au chocolat - espresso buttercream, honey, sea salt, chocolate ganache. 9. choux au craquelin - matcha cream, powered sugar, matcha powder. --- slug: thoughts/Dysregulation tags: - seed - psychology description: "resconstructed source of https://aarnphm.xyz/thoughts/Dysregulation" title: "Dysregulation" date: 2024-02-12 permalink: https://aarnphm.xyz/thoughts/Dysregulation.html.md --- > That feeling when you want to text that person back, but you are too nervous about why they didn’t then you started forming up scenarios in your head why such things happens. prefrontal cortex goes to sleep and amygdala takes over ⇒ reaffirming core beliefs ⇒ get caught anxiety - [ ] How to deal with it? - [ ] Regulate your emotions, cut through that energy - [ ] Stop and name the feeling, turn on prefrontal cortex for logical brain - [ ] Safety lies within you, not in the other person --- slug: thoughts/Embedding tags: - seed - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/Embedding" title: "Embedding" date: 2024-02-25 permalink: https://aarnphm.xyz/thoughts/Embedding.html.md --- See also [Transformers](https://aarnphm.xyz/thoughts/Embedding/../../thoughts/Transformers#inference) --- slug: thoughts/Entropy tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Entropy" title: "Entropy" date: 2024-01-11 permalink: https://aarnphm.xyz/thoughts/Entropy.html.md --- > In particular, "good, aligned, conversational AI" is just one of many possible different rollouts. Finetuning / alignment tries to "collapse" and control the entropy to that region of the simulator. Jailbreak prompts try to knock the state into other logprob ravines. > > — Andrej Karpathy (@karpathy) [6 mars 2023](https://twitter.com/karpathy/status/1632800082679705600?ref_src=twsrc%5Etfw) $$ S = k_b \ln \Omega $$ --- slug: thoughts/Epistemology tags: - seed - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/Epistemology" title: "Epistemology" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/Epistemology.html.md --- The study of knowledge and justified belief. --- slug: thoughts/Euler's-identity tags: - math description: "resconstructed source of https://aarnphm.xyz/thoughts/Euler's-identity" title: "Euler's identity" date: 2024-11-05 permalink: https://aarnphm.xyz/thoughts/Euler's-identity.html.md --- Probably the most [beautiful](https://aarnphm.xyz/thoughts/Euler's-identity/../../thoughts/aesthetic-value#beauty) equation in mathematics: $$ \begin{aligned} e^{i \pi} &+ 1 = 0 \\ \\ \because e &: \text{Euler's number} \\ i &: \text{imaginary unit satisfies } i^{2} = -1 \\ \pi &: \text{pi} \end{aligned} $$ special case of Euler’s formula: $$ e^{i \theta} = \cos(\theta) + i \sin(\theta) $$ --- slug: thoughts/Existentialism tags: - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/Existentialism" title: "Existentialism" date: 2024-02-29 permalink: https://aarnphm.xyz/thoughts/Existentialism.html.md --- See also [Camus](https://aarnphm.xyz/thoughts/Existentialism/../../thoughts/Camus)’s absurdism. The school of philosophy that emerged as a backdrop of WWII, where entire generation was confronted with the anxiety-provoking given of death, freedom, and meaninglessness. 
Most of frontier were French, most notably Jean-Paul Sartre, Simone de Beauvoir, Albert [Camus](https://aarnphm.xyz/thoughts/Existentialism/../../thoughts/Camus), Gabriel Marcel, and Maurice Merleau-Ponty, the conceptual groundwork of the movement was laid much earlier in the nineteenth century by pioneers like Søren Kierkegaard and Friedrich [Nietzsche](https://aarnphm.xyz/thoughts/Existentialism/../../thoughts/Philosophy-and-Nietzsche) and twentieth-century German philosophers like Edmund Husserl, Martin Heidegger, and Karl Jaspers as well as prominent Spanish intellectuals José Ortega y Gasset and Miguel de Unamuno. See also [definition](https://plato.stanford.edu/entries/existentialism/) --- slug: thoughts/Expenses tags: - evergreen description: "resconstructed source of https://aarnphm.xyz/thoughts/Expenses" title: "Expenses" date: 2024-01-09 permalink: https://aarnphm.xyz/thoughts/Expenses.html.md --- > [!tip] TL;DR > > This is for personal uses, and I fully understand that I’m very fortunate to afford such lifestyle. ### Subscriptions: | Description | \$ | occurrence | Currency | Card | | ----------------------------------------------------------------------- | --------- | ---------- | -------- | ----- | | Apple TV | 9.95 | M | USD | Chase | | Discord Nitro | 9.99 | M | USD | Chase | | Perplexity Pro | 200 | Y | USD | Chase | | bookbear express | 70 | Y | USD | Chase | | Vocabulary | 29.99 | Y | USD | Chase | | Duolingo Max | 149.99 | Y | USD | Chase | | Strava | 79.99 | Y | USD | Chase | | Twitter Premium+ | 210 | Y | USD | Chase | | Uber One | 9.99 | M | USD | Chase | | Youtube Premium Student | 7.99 | M | USD | Chase | | Grammarly (for mom) | 144 | Y | USD | Chase | | [fashion](https://aarnphm.xyz/thoughts/Expenses/../../thoughts/fashion) | recurrent | year | USD | Chase | ### Archive: List of subscription I have stopped using. | Description | \$ | occurrence | Currency | Card | | -------------- | ----- | ---------- | -------- | ----- | | ChatGPT Plus | 20 | M | USD | Chase | | Apple One | 19.95 | M | USD | Chase | | Midjourney | 10 | M | USD | Chase | | Supermaven Pro | 10 | M | USD | Chase | --- slug: thoughts/Fisher-Yates tags: - seed description: "Fisher-Yates shuffle algorithm" title: "Fisher-Yates" date: 2024-01-30 permalink: https://aarnphm.xyz/thoughts/Fisher-Yates.html.md --- Produced an _unbiased_ permutation: every permutation is equally likely. 
Pseudocode: ```pseudo \begin{algorithm} \caption{Fisher-Yates shuffle} \begin{algorithmic} \REQUIRE An array $A$ of length $n$ \FOR{$i = n-1$ \TO $1$} \STATE $j \gets$ random integer such that $0 \leq j \leq i$ \STATE swap $A[i]$ and $A[j]$ \ENDFOR \end{algorithmic} \end{algorithm} ``` Implementation of modern Fisher-Yates algorithm ```js title="FisherYates.js" function sample(obj, n, guard) { if (n == null || guard) { if (!isArrayLike(obj)) obj = values(obj) return obj[random(obj.length - 1)] } var sample = toArray(obj) var length = getLength(sample) n = Math.max(Math.min(n, length), 0) var last = length - 1 for (var index = 0; index < n; index++) { var rand = random(index, last) var temp = sample[index] sample[index] = sample[rand] sample[rand] = temp } return sample.slice(0, n) } ``` --- slug: thoughts/Freud tags: - seed - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/Freud" title: "Sigmund Freud" date: 2023-10-10 permalink: https://aarnphm.xyz/thoughts/Freud.html.md --- ## Beyond the Pleasure Principle ## The Ego and The Id > The state of consciousness is very transitory - [ ] P.16-18 - [ ] The relationship between Pcts and Cs. ### Cs, Pcs, Ucs. Two kinds of consciousness but in a dynamic sense it is one The ego is a coherent organisation of mental processes, that the consciousness is attached to. > But what about those in the processes which we may—roughly and inexactly— up under the name of thought-processes? They represent displacements of mental energy which are effected, where in the interior of the apparatus as this energy proceed on its way towards action. Do they advance to the sur which causes consciousness to be generated? Or does sciousness make its way to them? This is clearly one of the difficulties that arise when one begins to take the spatial or topological idea of mental life logically. Both are equally unimaginable. There must be a third alternative. In itself something unconscious become preconscious such that how can we make something that is repressed (pre)conscious would be answered: Internal perception yields sensation of processes arising in the most diverse strata of the mental apparatus. These sensations are my views about their idea for this. These sensations are multilocular, like external perceptions; they may come from different places simultaneously and may thus have different or even opposite qualities. Sensations of a pleasurable nature have not anything inherently impelling about them, whereas unpleasurable ones have it in the highest degree. The latter impel towards change, towards discharge, and that is why we interpret un-pleasure as implying a heightening and pleasure a lowering of energic cathexis.’ Let us call what becomes conscious as pleasure and unpleasure a quantitative and qualitative ‘something’ in the course of mental events; the question then is whether this ‘something’ can become conscious in the place where it is, or whether it must first be transmitted to the system Pept. Clinical experience decides for the latter. It shows us that this something’ behaves like a repressed impulse. It can exert driving force without the ego noticing the compulsion. > [!tip] The Ego > > The ego is the id modified by influence of perceptual system object-cathexis and Oedipus complex to describe the form of ego ### Object-choices and identification ```poetry language=fr At this point we must widen our range a little. 
We succeeded in explaining the painful disorder of melancholia by supposing that [in those suffering from it] an object which was lost has been set up again inside the ego-that is, that an object-cathexis has been replaced by an identification. ``` At that time, however, we did not appreciate the full significance of this process and did not know how common and how typical it is. Since then we have come to understand that this kind of substitution has a great share in determining the form taken by the ego and that it makes an essential contribution towards building up what is called its ‘character At the very beginning, in the individual’s primitive oral phase, object-cathexis and identification are no doubt indistinguishable from each other. We can only suppose that later on object-cathexis proceed from the id, which feels erotic trends as needs. The ego, which to begin with is still feeble, becomes aware of the object-cathexis, and either acquiesces in them or tries to fend them off by the process of repression. The super-ego originates from the experience that let to totemism Early conflicts of the ego with object-cathexis of the id can be continued in conflicts with their heir, super-ego If the ego has not succeeded in properly mastering the Oedipus complex, the energic cathexis pf the latter, spring from the id will come into operation once more reaction-formation of the ego ideal. --- slug: thoughts/GPU-programming tags: - seed - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/GPU-programming" title: "GPU programming" date: 2023-10-10 permalink: https://aarnphm.xyz/thoughts/GPU-programming.html.md --- ## first principles _blog post: [Making Deep Learning Go Brrrr From First Principles](https://horace.io/brrr_intro.html)_ --- slug: thoughts/Garbage-in-Garbage-out tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Garbage-in-Garbage-out" title: "Garbage in Garbage out" date: 2024-02-08 permalink: https://aarnphm.xyz/thoughts/Garbage-in-Garbage-out.html.md --- There has been this notion of “garbage in, garbage out” in CS which states that bad [data](https://aarnphm.xyz/thoughts/Garbage-in-Garbage-out/../../thoughts/data), inputs will then produce an output that is of equal quality. The problem of [alignment](https://aarnphm.xyz/thoughts/Garbage-in-Garbage-out/../../thoughts/Alignment): How can we ingest [information](https://aarnphm.xyz/thoughts/Garbage-in-Garbage-out/../../thoughts/Information-Theory)into a system to align with our objectives? How one creates agenda-free [representations](https://aarnphm.xyz/thoughts/Garbage-in-Garbage-out/../../thoughts/representations) of a agenda-filled world? --- slug: thoughts/Gestalt-Principles tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Gestalt-Principles" title: "Gestalt Principles" date: 2024-03-09 permalink: https://aarnphm.xyz/thoughts/Gestalt-Principles.html.md --- Relates to how we perceive [composition](https://aarnphm.xyz/thoughts/Gestalt-Principles/../../thoughts/composition) Proximity Common Region --- slug: thoughts/Giles-Deleuze tags: - philosophy - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Giles-Deleuze" title: "Giles Deleuze" date: 2024-02-24 permalink: https://aarnphm.xyz/thoughts/Giles-Deleuze.html.md --- French philosopher, known for his work on the concept of multiplicity, being and affirmation. Also work on critical philosophy and the study of sense and value. 
## [Nietzsche and Philosophy](https://aarnphm.xyz/thoughts/Giles-Deleuze/../../thoughts/Philosophy-and-Nietzsche) The common misunderstanding of power is that it is the object of the will. Instead, Deleuze posits Power as subject of the will, such that [Will to Power](https://aarnphm.xyz/thoughts/Giles-Deleuze/../../thoughts/Will-to-Power) is not a [desire](https://aarnphm.xyz/thoughts/Giles-Deleuze/../../thoughts/desire) for domination, but expressive force that creates values Nietzsche’s genealogy work on [moral](https://aarnphm.xyz/thoughts/Giles-Deleuze/../../thoughts/Philosophy-and-Nietzsche#on-genealogy-of-morals) makes nihilism the presupposition of all metaphysics rather than a particular metaphysics, which allows nihilism to be overcome via the active negation of reactive forces. Deleuze rejects the traditional metaphysical view of being as stable and singular, instead proposing an ontology of difference where being is understood as a dynamic process of becoming. [^1] This process is characterized by the constant creation of new relations and entities, without any predetermined goal or final state. In this framework, the will to power is seen as the differential and generative force that drives the process of becoming, constantly creating new values and ways of being. Deleuze interprets Nietzsche’s “eternal return” as affirmation of becoming: The analogy of a dice throw[^2]: When we throw the dice, the outcome is the combination of chances (randomness) and the necessity (resulting combination that follows the throw). Deleuze infers that necessity is not something separate from chance but is affirmed through chance. Or necessity (outcome of dice throw) is realized through **the act** of throwing the dice. Nietzsche turns chance into an affirmation, identifying it with multiplicity, fragments, parts, and [chaos](https://aarnphm.xyz/thoughts/Giles-Deleuze/../../thoughts/Chaos). The dice throw affirms becoming, and the combination it forms upon falling is the affirmation of necessity. ## active and reactive forces. 
See also: [action theory](https://aarnphm.xyz/thoughts/Giles-Deleuze/../../thoughts/action-theory) ## Capitalism and Schizophrenia [^1]: See [this notes](https://faculty.fordham.edu/tampio/Tampio%20-%20Multiplicity.pdf) [^2]: [chances](https://piratesandrevolutionaries.blogspot.com/2009/05/dicethrow-11-in-deleuze-nietzsche.html?m=1): “Nietzsche identifie le hasard au multiple, aux fragments, aux membres, au chaos: chaos des dés qu’on choque et qu’on lance.” --- slug: thoughts/Group-theory tags: - math description: "resconstructed source of https://aarnphm.xyz/thoughts/Group-theory" title: "group theory" date: 2024-02-26 permalink: https://aarnphm.xyz/thoughts/Group-theory.html.md --- # Graph isomorphism[](#graph-isomorphism) --- slug: thoughts/HTTP tags: - technical description: "resconstructed source of https://aarnphm.xyz/thoughts/HTTP" title: "HTTP" date: 2024-02-08 permalink: https://aarnphm.xyz/thoughts/HTTP.html.md --- --- slug: thoughts/Hegel tags: - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/Hegel" title: "Hegel" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/Hegel.html.md --- ## Phenomenology of Spirit --- slug: thoughts/Helmholtz-decomposition tags: - math description: "resconstructed source of https://aarnphm.xyz/thoughts/Helmholtz-decomposition" title: "Helmholtz decomposition" date: 2024-11-27 permalink: https://aarnphm.xyz/thoughts/Helmholtz-decomposition.html.md --- > certain differentiable vector fields can be resolved into sum of an _irrotational_ vector field and _solenoidal_ vector vield > [!math] definition > > for a vector field $\mathbf{F} \in C^1 (V, \mathbb{R}^n)$ defined on a domain $V \subseteq \mathbb{R}^n$, a Helmholtz decomposition is a pair of vector fields $\mathbf{G} \in C^1 (V, \mathbb{R}^n)$ and $\mathbf{R} \in C^1 (V, \mathbb{R}^n)$ such that: > > $$ > \begin{aligned} \mathbf{F}(\mathbf{r}) &= \mathbf{G}(\mathbf{r}) + \mathbf{R}(\mathbf{r}) \\ \mathbf{G}(\mathbf{r}) &= - \nabla \Phi (\mathbf{r}) \\ \nabla \cdot \mathbf{R}(\mathbf{r}) &= 0 \end{aligned} > $$ Here $\Phi \in C^2(V, \mathbb{R})$ is a scalar potential, $\nabla \Phi$ is its gradient, and $\nabla \cdot \mathbf{R}$ is the [divergence](https://aarnphm.xyz/thoughts/Helmholtz-decomposition/../../thoughts/Vector-calculus#divergence) of the vector field $R$ --- slug: thoughts/Hidden-Markov-model tags: - seed - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/Hidden-Markov-model" title: "Hidden Markov model" date: 2024-10-02 permalink: https://aarnphm.xyz/thoughts/Hidden-Markov-model.html.md --- See also [wikipedia](https://en.wikipedia.org/wiki/Hidden_Markov_model) A Markov model where observations are dependent on a latent [_Markov process_](https://en.wikipedia.org/wiki/Markov_chain) $X$ > an HMM has an additional requirement that the outcome of $Y$ at time $t = t_0$ must be “influenced” exclusively by the outcome of $X$ at $t = t_0$ and that the outcomes of $X$ and $Y$ at $t It is _non-sequential_ writing — text that branches and enable choices to readers, without the need to follow a predetermined path. Hypertext can also be interpreted as a [database](https://aarnphm.xyz/thoughts/Hypertext/../../thoughts/Database) format in which information related to that on a display can be accessed directly from the display ![](https://aarnphm.xyz/thoughts/Hypertext/../../thoughts/images/hypertext.webp) He also brought up the concept of transclusion, which include parts of documents within other documents by reference. 
He envisioned an utopia, a global hypertext system (Xanadu) where all data was stored once, no deletions, and every information can be accessed through a links [^1], and everyone would be paid fairly for their work. ## fiction See also: [url](http://fictionaut.com/blog/2010/02/12/checking-in-with-hypertext-fiction/) non-linear space that use hypertext to explore narrative possibilities [^1]: [Interview with Ted Nelson](https://ics.uci.edu/~ejw/csr/nelson_pg.html) --- slug: thoughts/IPFS tags: - seed - technical description: "resconstructed source of https://aarnphm.xyz/thoughts/IPFS" title: "IPFS" date: 2024-02-08 permalink: https://aarnphm.xyz/thoughts/IPFS.html.md --- IPFS is a decentralized storage and delivery networks which built on top of [p2p](https://aarnphm.xyz/thoughts/IPFS/../../thoughts/p2p) networking and content-based addressing (CID). > Can be seen in [git](https://aarnphm.xyz/thoughts/IPFS/../../thoughts/git) repositories, BitTorent, and most recently Ethereum. Similar to how we can reference an URI, we can look up its content by [content-address](https://aarnphm.xyz/thoughts/IPFS/../../thoughts/Content-addressable-storage) How would we use IPFS to share and publish data? --- slug: thoughts/In-memory-representation tags: - technical - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/In-memory-representation" title: "In memory representation" date: 2022-10-01 permalink: https://aarnphm.xyz/thoughts/In-memory-representation.html.md --- ## flatbuffer _difference_ with protobuf: no unpacking/parsing [Benchmark](https://google.github.io/flatbuffers/flatbuffers_benchmarks.html) zero-mem copy with slightly larger wire format ## protobuf --- slug: thoughts/Information-Theory tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Information-Theory" title: "Information Theory" date: 2024-01-20 permalink: https://aarnphm.xyz/thoughts/Information-Theory.html.md --- See also [pdf](https://fleuret.org/public/EN_essays/fleuret-inf-theory-2024.pdf) > Less horror. Probably full of typo.\ > \ > Source tex there: [pic.twitter.com/9e4FdQol3b](https://t.co/9e4FdQol3b) > > — François Fleuret (@francoisfleuret) [18 janvier 2024](https://twitter.com/francoisfleuret/status/1748011011590799462?ref_src=twsrc%5Etfw) ## hierarchy related to [design](https://aarnphm.xyz/thoughts/Information-Theory/../../thoughts/design) --- slug: thoughts/Intelligence-amplification tags: - seed - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/Intelligence-amplification" title: "Intelligence amplification" date: 2024-01-07 permalink: https://aarnphm.xyz/thoughts/Intelligence-amplification.html.md --- > I’m playing around with calling our tech, as it is today, IA (intelligence amplification) instead of AI. IA have the vibe of tools for thought, needing human interaction, and resemble a lot more what we actually have today. AI feels more like independent long-running agents. > > — Andrej Karpathy (@karpathy) [7 janvier 2024](https://twitter.com/karpathy/status/1744062845426532473?ref_src=twsrc%5Etfw) Intelligence should be thought as a tool for thought, not an independent agent These systems should be built on top of human intelligence, not replace it. Next-token prediction is primitive to call a system intelligent. Can a [transformers](https://aarnphm.xyz/thoughts/Intelligence-amplification/../../thoughts/Transformers) ever be [Turing-complete](https://aarnphm.xyz/thoughts/Intelligence-amplification/../../thoughts/Turing-complete-Transformers)? 
## research area. A lot of alpha in mechanistic analysis of the [representations](https://aarnphm.xyz/thoughts/Intelligence-amplification/../../thoughts/representations) these models exhibit., or “virtual brain analysis”. --- slug: thoughts/Jax tags: - seed - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/Jax" title: "Jax" date: 2022-11-07 permalink: https://aarnphm.xyz/thoughts/Jax.html.md --- Numpy + [Autograd](https://aarnphm.xyz/thoughts/Jax/../../thoughts/Autograd). Use [XLA](https://aarnphm.xyz/thoughts/Jax/../../thoughts/XLA) to compile and run NumPy code on accelerators. Asynchronous dispatch, for sync use `block_until_ready()` ```python import jax.numpy as jnp from jax import random key = random.PRNGKey(0) x = random.normal(key, (10,)) jnp.dot(x, x.T).block_until_ready() ``` - notable function: - `jit()` for compilation of multiple computations - `grad()` for performing transformation (autodiff, [Jacobian](https://aarnphm.xyz/thoughts/Jax/../../thoughts/Vector-calculus#jacobian-matrix)-vector product) - `vmap()` for auto-vectorisation > Arrays are **immutable** in Jax - Treat functions as pure as to compiled with [XLA](https://aarnphm.xyz/thoughts/Jax/../../thoughts/XLA) ```python title="entropix/dslider.py" from functools import partial from typing import NamedTuple, Tuple import jax import jax.numpy as jnp import jax.scipy as jsp @jax.jit def kl_divergence(logp: jnp.ndarray, logq: jnp.ndarray) -> jnp.ndarray: """Compute KL divergence between two log probability distributions.""" p = jnp.exp(logp) return jnp.sum(jnp.where(p > 0, p * (logp - logq), 0.0), axis=-1) @jax.jit def ent_varent(logp: jnp.ndarray) -> Tuple[jnp.ndarray, jnp.ndarray]: """Compute entropy and varentropy from log probabilities.""" p = jnp.exp(logp) ent = -jnp.sum(p * logp, axis=-1) diff = logp + ent[..., None] varent = jnp.sum(p * diff**2, axis=-1) return ent, varent @jax.jit def normalize_logits(logits: jnp.ndarray, noise_floor: float) -> jnp.ndarray: """Normalize logits to log probabilities with noise floor truncation.""" shifted = logits - jnp.max(logits, axis=-1, keepdims=True) normalized = shifted - jax.nn.logsumexp(shifted + EPS, axis=-1, keepdims=True) # noise floor calculated for bfloat16 return jnp.where(normalized < noise_floor, jnp.log(EPS), normalized) ``` _references: [github](https://github.com/xjdr-alt/entropix/blob/main/entropix/dslider.py)_ ## control flow see also [link](https://jax.readthedocs.io/en/latest/notebooks/Common_Gotchas_in_JAX.html#python-control-flow-jit) The following works: ```python @jax.jit def f(x): for i in range(3): x = 2 * x return x print(f(3)) @jax.jit def g(x): y = 0. for i in range(x.shape[0]): y = y + x[i] return y print(g(jnp.array([1., 2., 3.]))) ``` > [!warning]- doesn't work > > ```python {2,4,6} > @jax.jit > def fail(x): > if x < 3: return 3. * x ** 2 > else : return -4 * x > > fail(2) > ``` Reasoning: `jit` traces code on `ShapedArray` abstraction, where each abstract value represents the set of all array values with a fixed shape and dtype > [!tip]+ type coercion tradeoff > > If we trace a Python function on a `ShapedArray((), jnp.float32)` that isn’t committed to a specific concrete value, when we hit a line like if `x < 3`, the expression x < 3 evaluates to an abstract `ShapedArray((), jnp.bool_)` that represents the set `{True, False}`. Fix: you can use `static_argnums` to specify which argument should be treated as static ```python @jit(static_argnums=(0,)) def f(x): if x < 3: return 3. 
* x ** 2 else: return -4 * x ``` ## buffers > [!question] How does JAX handle memory buffers? [fast replay buffers](https://github.com/instadeepai/flashbax) --- slug: thoughts/KV-compression tags: - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/KV-compression" title: "KV compression" date: 2024-10-10 permalink: https://aarnphm.xyz/thoughts/KV-compression.html.md --- see also: [github](https://github.com/October2001/Awesome-KV-Cache-Compression) TLDR: Most algorithm determine importance through aggregating attentions over observed queries ([Liu et al., 2023](#bib-liu2023scissorhandsexploitingpersistenceimportance); [Zhang et al., 2023](#bib-zhang2023h2oheavyhitteroracleefficient)) More recent work aggregated attention from _limited observation windows_ ([Cai et al., 2024](#bib-cai2024pyramidkvdynamickvcache); [Li et al., 2024](#bib-li2024snapkvllmknowslooking)) uses top\_k to find $k$-indices of attentions per head to preserve, and evict the not-so-important ones. ## idea. Look at past attention weights for each pair of key and value vectors (a measure of the degree with which that KV’s representation has been queried during past attention operations) Then select the KV with the least attention to evict Think of LFU (least frequency used) cache management policy the KV cache for each sequence in a particular layer is allocated on the GPU as a _# attention heads $X$ sequence length_ tensor. > [!tip] Important > > total memory allocation scales with the _maximum_ sequence length for all attention heads of the KV cache ## Adaptive KV-cache compression See also [paper](https://arxiv.org/abs/2310.01801) ([Ge et al., 2024](#bib-ge2024modeltellsdiscardadaptive)) ## Streaming LLM _Using attention sink_ see also [paper](https://arxiv.org/abs/2309.17453) ([Xiao et al., 2024](#bib-xiao2024efficientstreaminglanguagemodels)) Ablate attentions among layers that deemed to be less valuable to current generations. ## Pyramid-KV See also [paper](https://arxiv.org/abs/2406.02069) ([Cai et al., 2024](#bib-cai2024pyramidkvdynamickvcache)) ![](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/images/pyramid-kv.webp) ## Snap-KV See also [paper](https://arxiv.org/abs/2404.14469), [github](https://github.com/FasterDecoding/SnapKV) ([Li et al., 2024](#bib-li2024snapkvllmknowslooking)) Voting: calculating attention weights for each query within observation windows across all attention heads, then aggregate to highlight prefix positions. Formally for a single batch: $$ \begin{aligned} C = &\sum_{i=0}^{L_{\text{obs}}} W_{\text{obs}} [:,i,:] \\ I &= \text{Top}_{k}(C, k) \end{aligned} $$ _[hijack for llama\_hijack\_4\_37.py](https://github.com/FasterDecoding/SnapKV/blob/82135ce2cc60f212a9ba918467f3d9c8134e163f/snapkv/monkeypatch/llama_hijack_4_37.py#L19)_ > [!tip] Important > > $k$ is defined as $\lfloor p \times L_{\text{prefix}} \rfloor$, where $p$ is the compression rates. Hit Rate: essentially the attention features above a predefined threshold $\Theta$ to be _important_ features. The idea is to have two stages: - **Vote for important features**: select important features based on important features given fixed windows. - **Update and store the compressed KV**: concat attention features within the windows and update the KV-cache. 
- clustering via pooling ⇒ frequent hit-rate attention ```python attn_cache = pool1d(attn_weights_sum, kernel_size=kernel_size, padding=kernel_size//2, stride=1) ``` ## Ada-KV ideas: instead of uniform eviction for KV cache hit, allocate a certain budget $B_i$ per attention heads to dynamically evict certain heads _built on-top of PyramidKV and SnapKV_ ![](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/images/vllm/ada-kv.webp) > [!note] Note > > With Ada-SnapKV, each attention layers are still assigned with a fixed compression rate (refer to the image example) See also [paper](https://arxiv.org/abs/2407.11550) ([Feng et al., 2024](#bib-feng2024adakvoptimizingkvcache)) ## KIVI link: [github](https://github.com/jy-yuan/KIVI) --- - url: thoughts/vllm - description: KV-Compress ## KV-Compress _variable compression rates per attention head_ source: [github](https://github.com/IsaacRe/vllm-kvcompress) - url: thoughts/KV-compression - description: idea for kv cache ## idea. Look at past attention weights for each pair of key and value vectors (a measure of the degree with which that KV’s representation has been queried during past attention operations) Then select the KV with the least attention to evict Think of LFU (least frequency used) cache management policy the KV cache for each sequence in a particular layer is allocated on the GPU as a _# attention heads $X$ sequence length_ tensor. > [!tip] Important > > total memory allocation scales with the _maximum_ sequence length for all attention heads of the KV cache [Lien vers l'original](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/vllm/../../thoughts/KV-compression#idea) > [!notes] Notes > > A variation of [Ada-SnapKV](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/vllm/../../thoughts/KV-compression#ada-kv) Motivation: - _group-query-compression_: compress KV-cache of [GQA](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/vllm/../../thoughts/Attention#group-query-attention) without repeating it into the dimension of $\sum$ query heads. - Modified `PagedAttention` that compute _against_ KV-cache (contains variable numbers of KVs per head) ![](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/vllm/../../thoughts/images/vllm/kv-compress-vllm.webp) > For vLLM, each cache block stores KV for every attention head of every layer > > For KV-Compress, each block only holds KVs for a single head. Block tables are expanded $l \times H$ so that unique block for each specific KV head and layer can be retrieved ### Query-Group Compression (QGC) KV compression algorithm doesn’t have GQA design in mind. - [Pyramid-KV](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/vllm/../../thoughts/KV-compression#pyramid-kv) cache and compress KV _after_ repetition for alignment with query tensors - Redundancy in cache before compression > modification of eviction-based methods per groups ### Block layout and allocation idea: adapt PagedAttention to page out cache on a _per-head, per-layer–as well as per sequence–basis_ ![](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/vllm/../../thoughts/images/vllm/paged-attention-block-kv-compress.webp) > [!note]- explanation > > A simplified example with two KV heads and a block size of two: > > - KV metrics are visualized for a given cache state, highlighting blocks of a particular sequence in the decoding batch that is scheduled to evict two blocks. > - Logical indices are displayed under the corresponding metrics slot. 
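Before getting to paged eviction, the scoring step itself is easy to sketch. A toy version of the attention-sum voting described in the idea/Snap-KV sections above (aggregate attention from an observation window, keep the top-$k$ prefix keys per head); the shapes and the `keep_indices` helper are illustrative assumptions, not any library's API:

```python
import numpy as np

def keep_indices(attn: np.ndarray, p: float, window: int) -> list[np.ndarray]:
  """Per-head KV eviction: score prefix keys by attention mass from the last `window` queries,
  then keep the top floor(p * prefix_len) keys for each head; everything else would be evicted."""
  n_heads, n_queries, n_keys = attn.shape
  prefix_len = n_keys - window
  k = int(np.floor(p * prefix_len))
  kept = []
  for h in range(n_heads):
    # aggregate attention that the observation window pays to each prefix key
    scores = attn[h, -window:, :prefix_len].sum(axis=0)
    top = np.argsort(scores)[-k:]          # indices with the highest accumulated attention
    kept.append(np.sort(top))
  return kept

rng = np.random.default_rng(0)
attn = rng.random((2, 16, 64))             # (heads, queries, keys); already softmax-ed in practice
print([idx.shape for idx in keep_indices(attn, p=0.25, window=8)])
```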
#### Evict from Paged KV cache > need to evict KV blocks instead of evict single KV attention [Lien vers l'original](https://aarnphm.xyz/thoughts/KV-compression/../../thoughts/vllm#kv-compress) ## Bibliographie - Cai, Z., Zhang, Y., Gao, B., Liu, Y., Liu, T., Lu, K., Xiong, W., Dong, Y., Chang, B., Hu, J., & Xiao, W. (2024). _PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling_. arXiv preprint arXiv:2406.02069 [\[arxiv\]](https://arxiv.org/abs/2406.02069) - Feng, Y., Lv, J., Cao, Y., Xie, X., & Zhou, S. K. (2024). _Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference_. arXiv preprint arXiv:2407.11550 [\[arxiv\]](https://arxiv.org/abs/2407.11550) - Ge, S., Zhang, Y., Liu, L., Zhang, M., Han, J., & Gao, J. (2024). _Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs_. arXiv preprint arXiv:2310.01801 [\[arxiv\]](https://arxiv.org/abs/2310.01801) - Li, Y., Huang, Y., Yang, B., Venkitesh, B., Locatelli, A., Ye, H., Cai, T., Lewis, P., & Chen, D. (2024). _SnapKV: LLM Knows What You are Looking for Before Generation_. arXiv preprint arXiv:2404.14469 [\[arxiv\]](https://arxiv.org/abs/2404.14469) - Liu, Z., Desai, A., Liao, F., Wang, W., Xie, V., Xu, Z., Kyrillidis, A., & Shrivastava, A. (2023). _Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time_. arXiv preprint arXiv:2305.17118 [\[arxiv\]](https://arxiv.org/abs/2305.17118) - Xiao, G., Tian, Y., Chen, B., Han, S., & Lewis, M. (2024). _Efficient Streaming Language Models with Attention Sinks_. arXiv preprint arXiv:2309.17453 [\[arxiv\]](https://arxiv.org/abs/2309.17453) - Zhang, Z., Sheng, Y., Zhou, T., Chen, T., Zheng, L., Cai, R., Song, Z., Tian, Y., Ré, C., Barrett, C., Wang, Z., & Chen, B. (2023). _H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models_. arXiv preprint arXiv:2306.14048 [\[arxiv\]](https://arxiv.org/abs/2306.14048) --- slug: thoughts/LLMs tags: - sapling - ml - llm description: "resconstructed source of https://aarnphm.xyz/thoughts/LLMs" title: "LLMs" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/LLMs.html.md --- [large language](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Machine-learning) models, often implemented as [autoregressive](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Autoregressive-models) [transformers](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Transformers) models. > [!note] GPTs and friends > > Most variants of LLMs are decoder-only ([Radford et al., 2019](#bib-radford2019language)) Have “capabilities” to understand [natural language](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/NLP). Exhibits [emergent behaviour](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/emergent-behaviour) of [intelligence](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/intelligence), but probably not [AGI](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/AGI) due to [observer-expectancy effect](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/observer-expectancy-effect). One way or another is a form of [behaviourism](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Behavirourism), through [reinforcement learning](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Machine-learning). It is being “told” what is good or bad, and thus act accordingly towards the users. 
However, this induces [confirmation bias](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/confirmation-bias) where one aligns and contains his/her prejudices towards the problem. ### Scalability Incredibly hard to scale, mainly due to their [large](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/large-models) memory footprint and tokens memory allocation. ### Optimization See also: [this talk](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/images/htn-openllm.pdf) - [Quantization](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/quantization): reduce computational and memory costs of running inference with representing the weight and activations with low-precision data type - [Continuous batching](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Continuous-batching): Implementing [Paged Attention](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Attention#paged-attention) with custom scheduler to manage swapping kv-cache for better resource utilisation ### on how we are being [taught](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/education#teaching). How would we assess thinking? Similar to calculator, it _simplifies_ and increase accessibility to the masses, but in doing so _lost_ the value in the _action of doing_ math. We do math to internalize the concept, and practice to thinking coherently. Similarly, we [write](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/writing) to help crystalised our ideas, and in the process improve through the act of putting it down. The process of rephrasing and arranging sentences poses a challenges for the writer, and in doing so, teach you how to think coherently. Writing essays is an exercise for students to articulate their thoughts, rather than testing the understanding of the materials. ### on [ethics](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/ethics) See also [Alignment](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Alignment). There are ethical concerns with the act of “hallucinating” content, therefore alignment research is crucial to ensure that the model is not producing harmful content. ### as philosophical tool. To create a better [representations](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/representations) of the world for both humans and machines to understand, we can truly have assistive tools to enhance our understanding of the world surround us ### AI generated content Don’t shit where you eat, **[Garbage in, garbage out](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Garbage-in-Garbage-out)**. The quality of the content is highly dependent on the quality of the data it was trained on, or model are incredibly sensitive to [data](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/data) variances and biases. Bland doublespeak See also: [All the better to see you with](https://www.kernelmag.io/2/all-the-better-to-see-you) > Here's a real problem though. Most people find writing hard and will get AIs to do it for them whenever they can get away with it. Which means bland doublespeak will become the default style of writing. Ugh. > > — Paul Graham (@paulg) [25 février 2024](https://twitter.com/paulg/status/1761801995302662175?ref_src=twsrc%5Etfw) ### machine-assisted writings _source: [`gwern[dot]net`](https://gwern.net/gpt-3)_ Idea: use [sparse autoencoders](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/mechanistic-interpretability#sparse-autoencoders) to guide ideas generations ### Good-enough > "How did we get AI art before self-driving cars?" 
IMHO this is the single best heuristic for predicting the speed at which certain AI advances will happen. [pic.twitter.com/yAo6pwEsxD](https://t.co/yAo6pwEsxD) > > — Joshua Achiam (@jachiam0) [1 décembre 2022](https://twitter.com/jachiam0/status/1598448668537155586?ref_src=twsrc%5Etfw) This only occurs if you only need a “good-enough” item where value outweighs the process. However, one should always consider to put in the work, rather than being “ok” with good enough. In the process of working through a problem, one will learn about bottleneck and problems to be solved, which in turn gain invaluable experience otherwise would not achieved if one fully relies on the interaction with the models alone. ### as [search](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/Search) These models are incredibly useful for summarization and information gathering. With the [taxonomy](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/taxonomy) of [RAG](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/RAG) or any other CoT tooling, you can pretty much augment and produce and improve search-efficiency bu quite a lot. notable mentions: - [perplexity.ai](https://perplexity.ai/): [RAG](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/RAG)-first search engine - [explorer.globe.engineer](https://explorer.globe.engineer/): tree-based [information retrieval](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/information-retrieval) - [Exa labs](https://twitter.com/ExaAiLabs) - [You.com](https://you.com/?chatMode=default) ### Programming Overall should be a net positive, but it’s a double-edged sword. #### as end-users [Source](https://www.geoffreylitt.com/2023/03/25/llm-end-user-programming.html) > I think it’s likely that soon all computer users will have the ability to develop small software tools from scratch, and to describe modifications they’d like made to software they’re already using #### as developers Tool that lower of barrier of entry is always a good thing, but it often will lead to probably even higher discrepancies in quality of software Increased in productivity, but also increased in technical debt, as these generated code are mostly “bad” code, and often we have to nudge and do a lot of **[prompt engineering](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/prompt-engineering)**. --- - url: thoughts/mechanistic-interpretability # mechanistic interpretability [whirlwind tour](https://www.youtube.com/watch?v=veT2VI4vHyU\&ab_channel=FAR%E2%80%A4AI), [initial exploration](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/mechanistic-interpretability/../../thoughts/pdfs/tinymorph-exploration.pdf), [glossary](https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J) > The subfield of alignment that delves into reverse engineering of a neural network, especially [LLMs](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/mechanistic-interpretability/../../thoughts/LLMs) To attack the _curse of dimensionality_, the question remains: __how do we hope to understand a function over such a large space, without an exponential amount of time?__ [^lesswrongarc] ## inference application in the wild: [Goodfire](https://goodfire.ai/) and [Transluce](https://transluce.org/) > [!question]+ How we would do inference with SAE? 
> > > Quick 🧵 and some of quick introspection into how they might run inference > > > > — aaron (@aarnphm\_) [25 septembre 2024](https://twitter.com/aarnphm_/status/1839016131321016380?ref_src=twsrc%5Etfw) idea: treat SAEs as a `logit_processor`, similar to [guided decoding](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/mechanistic-interpretability/../../thoughts/vllm#guided-decoding) Current known bottleneck in vLLM: - `logit_processor` are row-wise, or logits are processed synchronously and blocking [^vllm-caveats] - no SPMD currently implemented ## steering refers to the process of manually modifying certain activations and hidden state of the neural net to influence its outputs For example, the following is a toy example of how a decoder-only transformers (i.e: GPT-2) generate text given the prompt “The weather in California is” ```mermaid flowchart LR A[The weather in California is] --> B[H0] --> D[H1] --> E[H2] --> C[... hot] ``` To steer to model, we modify $H_2$ layers with certain features amplifier with scale 20 (called it $H_{3}$)[^1] ```mermaid flowchart LR A[The weather in California is] --> B[H0] --> D[H1] --> E[H3] --> C[... cold] ``` One usually use techniques such as [sparse autoencoders](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/mechanistic-interpretability/../../thoughts/mechanistic-interpretability#sparse-autoencoders) to decompose model activations into a set of interpretable features. For feature [ablation](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/mechanistic-interpretability/../../thoughts/mechanistic-interpretability#ablation), we observe that manipulation of features activation can be strengthened or weakened to directly influence the model’s outputs A few examples where ([Panickssery et al., 2024](#bib-panickssery2024steeringllama2contrastive)) uses contrastive activation additions to steer Llama 2 ### contrastive activation additions intuition: using a contrast pair for steering vector additions at certain activations layers Uses _mean difference_ which produce difference vector similar to PCA: Given a dataset $\mathcal{D}$ of prompt $p$ with positive completion $c_p$ and negative completion $c_n$, we calculate mean-difference $v_\text{MD}$ at layer $L$ as follow: $$ v_\text{MD} = \frac{1}{\mid \mathcal{D} \mid} \sum_{p,c_p,c_n \in \mathcal{D}} a_L(p,c_p) - a_L(p, c_n) $$ > [!tip] implication > > by steering existing learned representations of behaviors, CAA results in better out-of-distribution generalization than basic supervised finetuning of the entire model. ## sparse autoencoders abbrev: SAE _see also: [landspace](https://docs.google.com/document/d/1lHvRXJsbi41bNGZ_znGN7DmlLXITXyWyISan7Qx2y6s/edit?tab=t.0#heading=h.j9b3g3x1o1z4)_ Often contains one layers of MLP with few linear ReLU that is trained on a subset of datasets the main LLMs is trained on. 
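A minimal numpy sketch of that single-layer ReLU encoder/decoder together with the L1 penalty (the architecture and loss are formalized just below); the weights here are random placeholders rather than trained ones:

```python
import numpy as np

rng = np.random.default_rng(0)
n, M, lam = 512, 4096, 5.0          # activation dim, dictionary size, L1 coefficient

W_enc = rng.normal(0, 0.02, (M, n))
W_dec = rng.normal(0, 0.02, (n, M))
b_enc, b_dec = np.zeros(M), np.zeros(n)

def sae_loss(x: np.ndarray) -> float:
  """Reconstruction + L1 sparsity penalty for a batch of activations x of shape (batch, n)."""
  f = np.maximum(W_enc @ (x - b_dec).T + b_enc[:, None], 0.0)   # f(x) = ReLU(W_enc (x - b_dec) + b_enc)
  x_hat = (W_dec @ f).T + b_dec                                  # x̂(f) = W_dec f(x) + b_dec
  recon = np.sum((x - x_hat) ** 2, axis=-1)                      # ‖x - x̂‖²
  sparsity = lam * np.sum(np.abs(f), axis=0)                     # λ ‖f(x)‖₁ per example
  return float(np.mean(recon + sparsity))

print(sae_loss(rng.normal(size=(8, n))))
```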
> empirical example: if we wish to interpret all features related to the author Camus, we might want to train an SAEs based on all given text of Camus to interpret “similar” features from Llama-3.1 > [!abstract] definition > > We wish to decompose a models’ activitation $x \in \mathbb{R}^n$ into sparse, linear combination of feature directions: > > $$ > \begin{aligned} x \sim x_{0} + &\sum_{i=1}^{M} f_i(x) d_i \\[8pt] \because \quad &d_i M \gg n:\text{ latent unit-norm feature direction} \\ &f_i(x) \ge 0: \text{ corresponding feature activation for }x \end{aligned} > $$ Thus, the baseline architecture of SAEs is a linear autoencoder with L1 penalty on the activations: $$ \begin{aligned} f(x) &\coloneqq \text{ReLU}(W_\text{enc}(x - b_\text{dec}) + b_\text{enc}) \\ \hat{x}(f) &\coloneqq W_\text{dec} f(x) + b_\text{dec} \end{aligned} $$ > training it to reconstruct a large dataset of model activations $x \sim \mathcal{D}$, constraining hidden representation $f$ to be sparse [L1 norm](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-autoencoder/../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/tut/tut1#l1norm) with coefficient $\lambda$ to construct loss during training: $$ \begin{aligned} \mathcal{L}(x) &\coloneqq \| x-\hat{x}(f(x)) \|_2^2 + \lambda \| f(x) \|_1 \\[8pt] &\because \|x-\hat{x}(f(x)) \|_2^2 : \text{ reconstruction loss} \end{aligned} $$ > [!tip] intuition > > We need to reconstruction fidelity at a given sparsity level, as measured by L0 via a mixture of reconstruction fidelity and L1 regularization. We can reduce sparsity loss term without affecting reconstruction by scaling up norm of decoder weights, or constraining norms of columns $W_\text{dec}$ during training Ideas: output of decoder $f(x)$ has two roles - detects what features acre active ⇐ L1 is crucial to ensure sparsity in decomposition - _estimates_ magnitudes of active features ⇐ L1 is unwanted bias ### Gated SAE _uses Pareto improvement over training to reduce L1 penalty_ ([Rajamanoharan et al., 2024](#bib-rajamanoharan2024improvingdictionarylearninggated)) Clear consequence of the bias during training is _shrinkage_ ([Sharkey, 2024](#bib-sharkey2024feature)) [^shrinkage] Idea is to use [gated ReLU](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-autoencoder/../../thoughts/optimization#gated-linear-units-and-variants) encoder ([Dauphin et al., 2017](#bib-dauphin2017languagemodelinggatedconvolutional); [Shazeer, 2020](#bib-shazeer2020gluvariantsimprovetransformer)): $$ \tilde{f}(\mathbf{x}) \coloneqq \underbrace{\mathbb{1}[\underbrace{(\mathbf{W}_{\text{gate}}(\mathbf{x} - \mathbf{b}_{\text{dec}}) + \mathbf{b}_{\text{gate}}) > 0}_{\pi_{\text{gate}}(\mathbf{x})}]}_{f_{\text{gate}}(\mathbf{x})} \odot \underbrace{\text{ReLU}(\mathbf{W}_{\text{mag}}(\mathbf{x} - \mathbf{b}_{\text{dec}}) + \mathbf{b}_{\text{mag}})}_{f_{\text{mag}}(\mathbf{x})} $$ where $\mathbb{1}[\bullet > 0]$ is the (point-wise) Heaviside step function and $\odot$ denotes element-wise multiplication. 
| term | annotations | | -------------------- | ------------------------------------------------------------------------------- | | $f_\text{gate}$ | which features are deemed to be active | | $f_\text{mag}$ | feature activation magnitudes (for features that have been deemed to be active) | | $\pi_\text{gate}(x)$ | $f_\text{gate}$ sub-layer’s pre-activations | to negate the increases in parameters, use _weight sharing_: Scale $W_\text{mag}$ in terms of $W_\text{gate}$ with a vector-valued rescaling parameter $r_\text{mag} \in \mathbb{R}^M$: $$ (W_\text{mag})_{ij} \coloneqq (\exp (r_\text{mag}))_i \cdot (W_\text{gate})_{ij} $$ ![](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-autoencoder/../../thoughts/images/gated-sae-architecture.webp) _Figure 3: Gated SAE with weight sharing between gating and magnitude paths_ ![](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-autoencoder/../../thoughts/images/gated_jump_relu.webp) _Figure 4: A gated encoder become a single layer linear encoder with [JumpReLU](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-autoencoder/../../thoughts/optimization#jumprelu)_ ([Erichson et al., 2019](#bib-erichson2019jumpreluretrofitdefensestrategy)) _activation function_ $\sigma_\theta$ ### feature suppression See also: [link](https://www.alignmentforum.org/posts/3JuSjTZyMzaSeTxKk/addressing-feature-suppression-in-saes) Loss function of SAEs combines a MSE reconstruction loss with sparsity term: $$ \begin{aligned} L(x, f(x), y) &= \|y-x\|^2/d + c\mid f(x) \mid \\[8pt] &\because d: \text{ dimensionality of }x \end{aligned} $$ > the reconstruction is not perfect, given that only one is reconstruction. **For smaller value of $f(x)$, features will be suppressed** > [!note]- illustrated example > > consider one binary feature in one dimension $x=1$ with probability $p$ and $x=0$ otherwise. Ideally, optimal SAE would extract feature activation of $f(x) \in \{0,1\}$ and have decoder $W_d=1$ > > However, if we train SAE optimizing loss function $L(x, f(x), y)$, let say encoder outputs feature activation $a$ if $x=1$ and 0 otherwise, ignore bias term, the optimization problem becomes: > > $$ > \begin{aligned} a &= \argmin p * L(1,a,a) + (1-p) * L(0,0,0) \\ &= \argmin (1-a)^2 + \mid a \mid * c \\ &= \argmin a^2 + (c-2) *a +1 \end{aligned} \Longrightarrow \boxed{a = 1-\frac{c}{2}} > $$ > [!question]+ How do we fix feature suppression in training SAEs? 
> > introduce element-wise scaling factor per feature in-between encoder and decoder, represented by vector $s$: > > $$ > \begin{aligned} f(x) &= \text{ReLU}(W_e x + b_e) \\ f_s(x) &= s \odot f(x) \\ y &= W_d f_s(x) + b_d \end{aligned} > $$ [Lien vers l'original](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/mechanistic-interpretability/../../thoughts/sparse-autoencoder) ## sparse crosscoders > [!tip] maturity > > a research preview from Anthroppic and this is pretty much still a work in progress see also [reproduction on Gemma 2B](https://colab.research.google.com/drive/124ODki4dUjfi21nuZPHRySALx9I74YHj?usp=sharing) and [github](https://github.com/ckkissane/crosscoder-model-diff-replication) A variant of [sparse autoencoder](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-crosscoders/../../thoughts/sparse-autoencoder) where it reads and writes to multiple layers ([Lindsey et al., 2024](#bib-lindsey2024sparsecrosscoders)) Crosscoders produces _shared features across layers and even models_ ## motivations Resolve: - cross-layer features: resolve cross-layer superposition - circuit simplification: remove redundant features from analysis and enable jumping across training many uninteresting identity circuit connections - model diffing: produce shared sets of features across models. This also introduce one model across training, and also completely independent models with different architectures. ### cross-layer [superposition](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-crosscoders/../../thoughts/mechanistic-interpretability#superposition-hypothesis) ![](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-crosscoders/../../thoughts/images/additive-residual-stream-llm.webp) _given the additive properties of transformers’ residual stream, **adjacent layers** in larger transformers can be thought as “almost parallel”_ > [!tip]- intuition > > In basis of superposition hypothesis, a feature is a linear combinations of neurons at any given layers. > > ![](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-crosscoders/../../thoughts/images/feature-neurons.webp) ![](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-crosscoders/../../thoughts/images/one-step-circuit.webp) ![](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-crosscoders/../../thoughts/images/parallel-joint-branch.webp) _if we think of adjacent layers as being “almost parallel branches that potentially have superposition between them”, then we can apply dictionary learning jointly [^jointlysae]_ ### persistent features and complexity Current drawbacks of sparse autoencoders is that we have to train it against certain activations layers to extract features. In terms of the residual stream per layers, we end up having lots of duplicate features across layers. > Crosscoders can simplify the circuit _given that we use an appropriate architecture_ [^risks] ## setup. > Autoencoders and transcoders as special cases of crosscoders. > > - autoencoders: reads and predict the same layers > - transcoders: read from layer $n$ and predict layer $n+1$ Crosscoder read/write to many layers, subject to causality constraints. 
> [!math]+ crosscoders > > Let one compute the vector of feature activations $f(x_j)$ on data point $x_j$ by summing over contributions of activations of different layers $a^l(x_j)$ for layers $l \in L$: > > $$ > \begin{aligned} f(x_j) &= \text{ReLU}(\sum_{l\in L}W_{\text{enc}}^l a^l(x_j) + b_{\text{enc}}) \\[8pt] &\because W^l_{\text{enc}} : \text{ encoder weights at layer } l \\[8pt] &\because a^l(x_j) : \text{ activation on datapoint } x_j \text{ at layer } l \\ \end{aligned} > $$ We have loss $$ L = \sum_{l\in L} \|a^l(x_j) - {a^{l}}'(x_j)\|^2 + \sum_{l\in L}\sum_i f_i(x_j) \|W^l_{\text{dec,i}}\| $$ where ${a^{l}}'(x_j)$ is the reconstruction of $a^l(x_j)$, and the regularization can be rewritten as: $$ \sum_{l\in L}\sum_{i} f_i(x_j) \|W^l_{\text{dec,i}}\| = \sum_{i} f_i(x_j)(\displaystyle\sum_{l \in L} \|W^l_\text{dec,i}\|) $$ _weight the L1 regularization penalty by the L1 norm of per-layer decoder weight norms_ $\sum\limits_{l\in L} \|W^l_\text{dec,i}\|$ [^l2weightnorm] We use the L1 norm because of: - baseline loss comparison: L2 exhibits lower loss than the sum of per-layer SAE losses, as models would effectively obtain a loss “bonus” by spreading features across layers - _layer-wise sparsity surfaces layer-specific features_: based on empirical results of [model diffing](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-crosscoders/../../thoughts/sparse-crosscoders#model-diffing), L1 uncovers a mix of shared and model-specific features, whereas L2 tends to uncover only shared features. ## variants ![](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-crosscoders/../../thoughts/images/crosscoders-variants.webp) good to explore: 1. strictly causal crosscoders to capture MLP computation and treat computation performed by attention layers as linear 2. combine strictly causal crosscoders for MLP outputs with weakly causal crosscoders for attention outputs 3. interpretable attention replacement layers that could be used in combination with strictly causal crosscoders for a “replacement model” ## model diffing see also: [model stitching](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-crosscoders/../../thoughts/model-stiching) and [SVCCA](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-crosscoders/../../thoughts/SVCCA) > ([Laakso & Cottrell, 2000](#bib-doi:10.1080/09515080050002726)) proposes comparing [representations](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-crosscoders/../../thoughts/representations) by transforming them into representations of distances between data points. [^sne] ## questions > How do features change over model training? When do they form? > As we make a model wider, do we get more features? or are they largely the same, packed less densely? [Lien vers l'original](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders) ## superposition hypothesis > [!abstract]+ tl/dr > > the phenomenon where a neural network represents _more_ than $n$ features in an $n$-dimensional space > Linear representations of neurons can represent more features than dimensions. As sparsity increases, models use superposition to represent more [features](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/mechanistic-interpretability/../../thoughts/mechanistic-interpretability#features) than dimensions. > > neural networks “want to represent more features than they have neurons”. When features are sparse, superposition allows compression beyond what a linear model can do, at a cost of interference that requires non-linear filtering.
reasoning: “noisy simulation”, where small neural networks exploit feature sparsity and properties of high-dimensional spaces to approximately simulate much larger much sparser neural networks In a sense, superposition is a form of **lossy [compression](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/mechanistic-interpretability/../../thoughts/Compression)** ### importance - sparsity: how _frequently_ is it in the input? - importance: how useful is it for lowering loss? ### over-complete basis _reasoning for the set of $n$ directions [^direction]_ ## features > A property of an input to the model When we talk about features ([Elhage et al., 2022, p. see “Empirical Phenomena”](#bib-elhage2022superposition)), the theory building around several observed empirical phenomena: 1. Word Embeddings: have direction which corresponding to semantic properties ([Mikolov et al., 2013](#bib-mikolov-etal-2013-linguistic)). For example: ```prolog V(king) - V(man) = V(monarch) ``` 2. Latent space: similar vector arithmetics and interpretable directions have also been found in generative adversarial network. We can define features as properties of inputs which a sufficiently large neural network will reliably dedicate a neuron to represent ([Elhage et al., 2022, p. see “Features as Direction”](#bib-elhage2022superposition)) ## ablation > refers to the process of removing a subset of a model’s parameters to evaluate its predictions outcome. idea: deletes one activation of the network to see how performance on a task changes. - zero ablation or _pruning_: Deletion by setting activations to zero - mean ablation: Deletion by setting activations to the mean of the dataset - random ablation or _resampling_ ## residual stream ```mermaid flowchart LR A[Token] --> B[Embeddings] --> C[x0] C[x0] --> E[H] --> D[x1] C[x0] --> D D --> F[MLP] --> G[x2] D --> G[x2] G --> I[...] --> J[unembed] --> X[logits] ``` residual stream $x_{0}$ has dimension $\mathit{(C,E)}$ where - $\mathit{C}$: the number of tokens in context windows and - $\mathit{E}$: embedding dimension. [Attention](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/mechanistic-interpretability/../../thoughts/Attention) mechanism $\mathit{H}$ process given residual stream $x_{0}$ as the result is added back to $x_{1}$: $$ x_{1} = \mathit{H}{(x_{0})} + x_{0} $$ ## grokking See also: [writeup](https://www.alignmentforum.org/posts/N6WM6hs7RQMKDhYjB/a-mechanistic-interpretability-analysis-of-grokking), [code](https://colab.research.google.com/drive/1F6_1_cWXE5M7WocUcpQWp3v8z4b1jL20), [circuit threads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html) > A phenomena discovered by ([Power et al., 2022](#bib-power2022grokkinggeneralizationoverfittingsmall)) where small algorithmic tasks like modular addition will initially memorise training data, but after a long time ti will suddenly learn to generalise to unseen data > [!tip] empirical claims > > related to phase change [^lesswrongarc]: good read from [Lawrence C](https://www.lesswrong.com/posts/6FkWnktH3mjMAxdRT/what-i-would-do-if-i-wasn-t-at-arc-evals#Ambitious_mechanistic_interpretability) for ambitious mech interp. [^vllm-caveats]: [the benchmark](https://github.com/vllm-project/vllm/pull/10046) was run against `vllm#0.6.3.dev236+g48138a84`, with all configuration specified in the pull request. 
[^1]: An example steering function can be: $$ H_{3} = H_{2} + \text{steering\_strength} * \text{SAE}.W_{\text{dec}}[20] * \text{max\_activation} $$ [^shrinkage]: If we hold $\hat{x}(\bullet)$ fixed, thus L1 pushes $f(x) \to 0$, while reconstruction loss pushes $f(x)$ high enough to produce accurate reconstruction. An optimal value is somewhere between. However, rescaling the [shrink](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/sparse-autoencoder/../../thoughts/mechanistic-interpretability#feature-suppression) feature activations ([Sharkey, 2024](#bib-sharkey2024feature)) is not necessarily enough to overcome bias induced by L1: a SAE might learnt sub-optimal encoder and decoder directions that is not improved by the fixed. [^jointlysae]: ([Gorton, 2024](#bib-gorton2024missingcurvedetectorsinceptionv1)) denotes that cross-branch superposition is significant in interpreting models with parallel branches (InceptionV1) [^risks]: causal description it provides likely differs from that of the underlying model. [^l2weightnorm]: $\|W_\text{dec,i}^l\|$ is the L2 norm of a single feature’s decoder vector at a given layer. In principe, one might have expected to use L2 norm of per-layer norm $\sqrt{\sum_{l \in L} \|W_\text{dec,i}^l\|^2}$ [^sne]: Chris Colah’s [blog post](https://colah.github.io/posts/2015-01-Visualizing-Representations/) explains how t-SNE can be used to visualize collections of networks in a function space. [^direction]: Even though features still correspond to directions, the set of interpretable direction is larger than the number of dimensions [Lien vers l'original](https://aarnphm.xyz/thoughts/LLMs/../../thoughts/mechanistic-interpretability) ## Bibliographie - Dauphin, Y. N., Fan, A., Auli, M., & Grangier, D. (2017). _Language Modeling with Gated Convolutional Networks_. arXiv preprint arXiv:1612.08083 [\[arxiv\]](https://arxiv.org/abs/1612.08083) - Erichson, N. B., Yao, Z., & Mahoney, M. W. (2019). _JumpReLU: A Retrofit Defense Strategy for Adversarial Attacks_. arXiv preprint arXiv:1904.03750 [\[arxiv\]](https://arxiv.org/abs/1904.03750) - Rajamanoharan, S., Conmy, A., Smith, L., Lieberum, T., Varma, V., Kramár, J., Shah, R., & Nanda, N. (2024). _Improving Dictionary Learning with Gated Sparse Autoencoders_. arXiv preprint arXiv:2404.16014 [\[arxiv\]](https://arxiv.org/abs/2404.16014) - Sharkey, L. (2024). _Addressing Feature Suppression in SAEs_. AI Alignment Forum. [\[post\]](https://www.alignmentforum.org/posts/3JuSjTZyMzaSeTxKk/addressing-feature-suppression-in-saes) - Shazeer, N. (2020). _GLU Variants Improve Transformer_. arXiv preprint arXiv:2002.05202 [\[arxiv\]](https://arxiv.org/abs/2002.05202) - Gorton, L. (2024). _The Missing Curve Detectors of InceptionV1: Applying Sparse Autoencoders to InceptionV1 Early Vision_. arXiv preprint arXiv:2406.03662 [\[arxiv\]](https://arxiv.org/abs/2406.03662) - Laakso, A., & Cottrell, G. (2000). Content and cluster analysis: Assessing representational similarity in neural systems. _Philosophical Psychology_, _13_(1), 47–76. - Lindsey, J., Templeton, A., Marcus, J., Conerly, T., Batson, J., & Olah, C. (2024). Sparse Crosscoders for Cross-Layer Features and Model Diffing. _Transformer Circuits Thread_. [\[link\]](https://transformer-circuits.pub/2024/crosscoders/index.html) - Elhage, N., Hume, T., Olsson, C., Schiefer, N., Henighan, T., Kravec, S., Hatfield-Dodds, Z., Lasenby, R., Drain, D., Chen, C., Grosse, R., McCandlish, S., Kaplan, J., Amodei, D., Wattenberg, M., & Olah, C. (2022). Toy Models of Superposition. 
_Transformer Circuits Thread_. [\[link\]](https://transformer-circuits.pub/2022/toy_model/index.html) - Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations. In L. Vanderwende, H. Daumé III, & K. Kirchhoff (Eds.), _Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies_ (pp. 746–751). Association for Computational Linguistics. - Panickssery, N., Gabrieli, N., Schulz, J., Tong, M., Hubinger, E., & Turner, A. M. (2024). _Steering Llama 2 via Contrastive Activation Addition_. arXiv preprint arXiv:2312.06681 [\[arxiv\]](https://arxiv.org/abs/2312.06681) - Power, A., Burda, Y., Edwards, H., Babuschkin, I., & Misra, V. (2022). _Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets_. arXiv preprint arXiv:2201.02177 [\[arxiv\]](https://arxiv.org/abs/2201.02177) - Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). _Language Models are Unsupervised Multitask Learners_. --- slug: thoughts/Language tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Language" title: "Language" date: 2024-01-08 permalink: https://aarnphm.xyz/thoughts/Language.html.md --- > Language as a public tool to understand the private life. important to our self-knowledge ⇒ emphasise through reading books. ## communication. Notably through the work of “Philosophical Investigations” by Ludwig Wittgenstein - Concept of “language-game” - The idea that each of us construct a pictures that we see the world through language. - Conflict arose when pictures are not aligned, often lead to context collapse. Possibly the most salient feature of [LLMs](https://aarnphm.xyz/thoughts/Language/../../thoughts/LLMs) is that the system is surprising patient per each interactions with humans. ## [representations](https://aarnphm.xyz/thoughts/Language/../../thoughts/representations). [Language models](https://aarnphm.xyz/thoughts/Language/../../thoughts/LLMs) is a representation of our knowledge. Techniques such as [deep learning](https://aarnphm.xyz/thoughts/Language/../../thoughts/deep-learning) has risen to prominence due to its ability to learn from data, and in doing so, it has the capability to represent the world in a way that is more similar to how we perceive it. --- slug: thoughts/Lighting tags: - seed - film description: "resconstructed source of https://aarnphm.xyz/thoughts/Lighting" title: "Lighting" date: 2023-11-11 permalink: https://aarnphm.xyz/thoughts/Lighting.html.md --- ### Key light - Book light - Key source ⇒ bounce towards a diffusers - Spot light ⇒ Soft and dim contrast to the shot --- slug: thoughts/Low-rank-adapters tags: - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/Low-rank-adapters" title: "Low rank adapters" date: 2024-02-08 permalink: https://aarnphm.xyz/thoughts/Low-rank-adapters.html.md --- --- slug: thoughts/Lyapunov-time tags: - seed - math description: "resconstructed source of https://aarnphm.xyz/thoughts/Lyapunov-time" title: "Lyapunov time" date: 2024-02-25 permalink: https://aarnphm.xyz/thoughts/Lyapunov-time.html.md --- --- slug: thoughts/Machine-learning tags: - ml - sapling description: "resconstructed source of https://aarnphm.xyz/thoughts/Machine-learning" title: "Machine learning" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/Machine-learning.html.md --- Detects pattern within data and use it to make useful prediction. 
Generally AI $\subset$ ML $\subset$ [DL](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/deep-learning) Some main exploration: - [Transformers](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/Transformers) - CNN - [Optimization](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/optimization) - Gradient descent - hyperparameter tuning - Recommender systems - Reinforcement learning - Q-learning - Policy Gradient - [Monte-Carlo](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/Monte-Carlo) Tree Search - Generative Models - GAN - VAE - Autoencoder - Supervised Q-learning - [Low-rank adapters](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/Low-rank-adapters) Fields - [mechanistic interpretability](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/mechanistic-interpretability) Related: - [linear algebra](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm#linear-algebra-review). - [autograd](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/Automatic-Differentiation) - [supervised machine learning](https://aarnphm.xyz/thoughts/Machine-learning/../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm). --- slug: thoughts/Merkle-DAG tags: - seed - technical description: "resconstructed source of https://aarnphm.xyz/thoughts/Merkle-DAG" title: "Merkle DAG" date: 2024-02-08 permalink: https://aarnphm.xyz/thoughts/Merkle-DAG.html.md --- It is a directed acyclic [graph](https://aarnphm.xyz/thoughts/Merkle-DAG/../../thoughts/university/twenty-three-twenty-four/sfwr-2c03/Graphs) where each node is a version of the content and edges represents the change (diffs) Each node has an identifier which is the results of hashing the content. Merkle DAG nodes are _immutable_ and _[content-addressable](https://aarnphm.xyz/thoughts/Merkle-DAG/../../thoughts/Content-addressable-storage)_. Any changes in the node would alter its identifier thus affect all ascendants, which create a different DAG. Examples of the DAG in action: - [IPFS](https://aarnphm.xyz/thoughts/Merkle-DAG/../../thoughts/IPFS) - [Containers](https://aarnphm.xyz/thoughts/Merkle-DAG/../../thoughts/Containers) - [git](https://aarnphm.xyz/thoughts/Merkle-DAG/../../thoughts/git) --- slug: thoughts/Metaphysics tags: - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/Metaphysics" title: "Metaphysics" date: 2024-02-09 permalink: https://aarnphm.xyz/thoughts/Metaphysics.html.md --- See also: [The Evolution of Modern Metaphysics](https://aarnphm.xyz/thoughts/Metaphysics/../../books#tagsphilosophy-philosophy) Gentle introduction from [Aristotle](https://aarnphm.xyz/thoughts/Metaphysics/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Aristotle), with [Being qua being](https://aarnphm.xyz/thoughts/Metaphysics/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Being-qua-being) --- slug: thoughts/Misra-Gries-heavy-hitters-algorithm tags: - algorithm description: "extends Boyer-Moore finding algorithm" title: "Misra-Gries heavy-hitters algorithm" date: 2024-10-11 permalink: https://aarnphm.xyz/thoughts/Misra-Gries-heavy-hitters-algorithm.html.md --- one of the earliest [data](https://aarnphm.xyz/thoughts/Misra-Gries-heavy-hitters-algorithm/../../thoughts/data) streaming algorithm. ## problem. > Given the bag $b$ of $n$ elements and an integer $k \geq 2$. 
Find the values that occur more than $n/k$ times in $b$ idea: two passes over the values in $b$, while storing at most $k$ values from $b$ and their number of occurrences. Assume the bag is available in array $b[0:n-1]$ of $n$ elements, then a __heavy-hitter__ of bag $b$ is a value that occurs more than $n/k$ times in $b$ for some integer $k \geq 2$ ## pseudocode. Here $t$ is a bag (multiset): the `Else` branch adds another occurrence of $b[i]$, and once $t$ holds $k$ distinct values, deleting $k$ distinct values removes one occurrence of each (dropping values whose count reaches zero), after which $d$ is updated to the number of distinct values remaining. ```pseudo \begin{algorithm} \caption{Misra--Gries} \begin{algorithmic} \State $t \gets \{\}$ \State $d \gets 0$ \For{$i \gets 0$ to $n-1$} \If{$b[i] \notin t$} \State $t \gets t \cup \{b[i]\}$ \State $d \gets d + 1$ \Else \State $t \gets t \cup \{b[i]\}$ \EndIf \If{$d = k$} \State Delete $k$ distinct values from $t$ \State Update $d$ \EndIf \EndFor \end{algorithmic} \end{algorithm} ``` --- slug: thoughts/Monte-Carlo tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Monte-Carlo" title: "Monte-Carlo methods" date: 2024-04-12 permalink: https://aarnphm.xyz/thoughts/Monte-Carlo.html.md --- ## tree search. a [search](https://aarnphm.xyz/thoughts/Monte-Carlo/../../thoughts/Search) algorithm based on random sampling of the search space. - Selection: start from root $R$ and select successive child nodes until a leaf $L$ is reached. - The root is the current game state and a leaf is any node that has a potential child from which no simulation has yet been run - Expansion: Unless $L$ ends the game decisively for either player, create one (or more) child nodes and choose node $C$ from one of them. - Simulation: Complete **one** random playout from node $C$. - Backpropagation: Use the result of the playout to update information in the nodes on the path from $C$ to $R$. ## simulations --- slug: thoughts/NLP tags: - seed - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/NLP" title: "NLP" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/NLP.html.md --- See also: [LLMs](https://aarnphm.xyz/thoughts/NLP/../../thoughts/LLMs) ### CoT prompting arxiv: [2201.11903](https://arxiv.org/abs/2201.11903) --- slug: thoughts/Nagle-and-TCP-Cork tags: - seed - networking description: "resconstructed source of https://aarnphm.xyz/thoughts/Nagle-and-TCP-Cork" title: "Nagle's algorithm and TCP_CORK" date: 2022-07-01 permalink: https://aarnphm.xyz/thoughts/Nagle-and-TCP-Cork.html.md --- ### Nagle’s algorithm and Delay ACK - _small packets_ → not for TCP → Nagle algorithm: `Maximize ratio of packets - data content` → Delay ACK: `silly window` ```prolog if available_data & window_size > MSS send payload on wire else if unconfirmed_data queue else send ``` ### Cork algorithm --- slug: thoughts/Navier-Stokes-equations tags: - physics - fluid-dynamics description: "partial differential equations describing the motion of fluid substances. One of seven $1M problems in mathematics" title: "Navier-Stokes equations" date: 2024-11-27 permalink: https://aarnphm.xyz/thoughts/Navier-Stokes-equations.html.md --- Express momentum balance for Newtonian fluids making use of conservation of mass.
## derivations derived as a particular form of the [Cauchy momentum equation](https://aarnphm.xyz/thoughts/Navier-Stokes-equations/../../thoughts/Cauchy-momentum-equation) - url: thoughts/Cauchy-momentum-equation - description: convective form ## convective form $$ \frac{D \mathbf{u}}{Dt} = \frac{1}{\rho} \nabla \cdot \sigma + \mathbf{f} $$ [Lien vers l'original](https://aarnphm.xyz/thoughts/Navier-Stokes-equations/../../thoughts/Cauchy-momentum-equation#convective-form) By setting Cauchy stress tensor $\sigma$ to viscosity term $\tau$ (deviatoric stress) and pressure term $-p \mathbf{I}$ (volumetric stress), we have $$ \rho \frac{D\mathbf{u}}{Dt} = - \nabla p + \nabla \cdot \tau + \rho \mathbf{a} $$ where: - $\frac{D}{Dt}$ is the [material derivative](https://aarnphm.xyz/thoughts/Navier-Stokes-equations/../../thoughts/Cauchy-momentum-equation#matderivative) - et al. ## assumption upon Cauchy stress tensor 1. stress is Galilean invariant [^galilean-invariant], or it doesn’t depend directly on the flow velocity, but the spatial derivatives of the flow velocity > [!tip] tensor gradient > > rate-of-strain tensor: $\boldsymbol{\varepsilon} (\nabla \mathbf{u}) \equiv \frac{1}{2} \nabla \mathbf{u} + \frac{1}{2} (\nabla \mathbf{u})^T$ 2) Deviatoric stress is **linear** in this variable $\sigma (\varepsilon) = -p \mathbf{I} + \mathbf{C} : \varepsilon$, - where $p$ is independent on the strain rate tensor - $\mathbf{C}$ is the fourth-order tensor for constant of proportionality (viscosity tensor) - $:$ is the double-dot product 3) fluid is assumed to be isotropic, and consequently $\mathbf{C}$ is an isotropic tensor. Furthermore, the deviatoric stress tensor is symmetric by [Helmholtz decomposition](https://aarnphm.xyz/thoughts/Navier-Stokes-equations/../../thoughts/Helmholtz-decomposition), expressed in terms of two Lamé parameters, second viscosity $\lambda$ and dynamic viscosity $\mu$: $$ \sigma (\varepsilon) = -p \mathbf{I} + \lambda \text{tr}(\varepsilon)\mathbf{I} + 2 \mu \varepsilon $$ Where $\mathbf{I}$ is the identity tensor and $\text{tr}(\varepsilon)$ is the trace of the rate-of-strain tensor. 
Thus we can rewrite as: $$ \sigma = -p \mathbf{I} + \lambda (\nabla \cdot \mathbf{u}) \mathbf{I} + \mu (\nabla \mathbf{u} + (\nabla \mathbf{u})^T) $$ Given trace of the rate of strain tensor in three dimension is the _[divergence](https://aarnphm.xyz/thoughts/Navier-Stokes-equations/../../thoughts/Vector-calculus#divergence) of the flow (rate of expansion):_ $$ \text{tr}(\varepsilon) = \nabla \cdot \mathbf{u} $$ - trace of the stress tensor then becomes $\text{tr}(\sigma) = -3p + (3 \lambda + 2 \mu) \nabla \cdot \mathbf{u}$ (trace of identity tensor is 3) - alternatively decomposing stress tensor into **isotropic** and **deviatoric** part in fluid dynamic: $$ \boldsymbol{\sigma} = -\left[ p - \left( \lambda + \frac{2}{3} \mu \right) (\nabla \cdot \mathbf{u}) \right] \mathbf{I} + \mu \left( \nabla \mathbf{u} + (\nabla \mathbf{u})^T - \frac{2}{3} (\nabla \cdot \mathbf{u}) \mathbf{I} \right) $$ Introduce bulk viscosity $\zeta$: $$ \zeta \equiv \lambda + \frac{2}{3} \mu $$ We now have the following linear stress equation: > [!math] linear stress constitutive equation > > $$ > \boldsymbol{\sigma} = -\left[ p - \zeta (\nabla \cdot \mathbf{u}) \right] \mathbf{I} + \mu \left[ \nabla \mathbf{u} + (\nabla \mathbf{u})^T - \frac{2}{3} (\nabla \cdot \mathbf{u}) \mathbf{I} \right] > $$ ## Compressible flow Convective form $$ \begin{aligned} &\rho \frac{D \mathbf{u}}{D t} = \rho \left( \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla) \mathbf{u} \right) \\ &= -\nabla p + \nabla \cdot \left\{ \mu \left[ \nabla \mathbf{u} + (\nabla \mathbf{u})^T - \frac{2}{3} (\nabla \cdot \mathbf{u}) \mathbf{I} \right] \right\} + \nabla \left[ \zeta (\nabla \cdot \mathbf{u}) \right] + \rho \mathbf{a}. \end{aligned} $$ With index notation: $$ \begin{aligned} \rho \left( \frac{\partial u_i}{\partial t} + u_k \frac{\partial u_i}{\partial x_k} \right) &= -\frac{\partial p}{\partial x_i} \\ &+ \frac{\partial}{\partial x_k} \left[ \mu \left( \frac{\partial u_i}{\partial x_k} + \frac{\partial u_k}{\partial x_i} - \frac{2}{3} \delta_{ik} \frac{\partial u_l}{\partial x_l} \right) \right] \\ &+ \frac{\partial}{\partial x_i} \left( \zeta \frac{\partial u_l}{\partial x_l} \right) \\ &+ \rho a_i. \end{aligned} $$ Conservation form $$ \begin{equation} \begin{aligned} \frac{\partial}{\partial t} (\rho \mathbf{u}) &+ \nabla \cdot \Bigg( \rho \mathbf{u} \otimes \mathbf{u} + \Big[ p - \zeta (\nabla \cdot \mathbf{u}) \Big] \mathbf{I} \\ &\quad - \mu \Big[ \nabla \mathbf{u} + (\nabla \mathbf{u})^T - \frac{2}{3} (\nabla \cdot \mathbf{u}) \mathbf{I} \Big] \Bigg) \\ &= \rho \mathbf{a}. \end{aligned} \end{equation} $$ ## Incompressible flow [^galilean-invariant]: Implies the laws of motion are the same in all _inertial frames of references_ Often refers to this principle as applied to Newtonian mechanics, that is Newton’s laws of motion hold in all frames related to one another by a Galilean transformation. --- slug: thoughts/Nesterov-momentum tags: - ml - optimization description: "resconstructed source of https://aarnphm.xyz/thoughts/Nesterov-momentum" title: "Nesterov momentum" date: 2024-11-11 permalink: https://aarnphm.xyz/thoughts/Nesterov-momentum.html.md --- See also [paper](http://www.cs.toronto.edu/%7Ehinton/absps/momentum.pdf), [momentum](https://aarnphm.xyz/thoughts/Nesterov-momentum/../../thoughts/optimization#momentum) idea: - first take a step in the direction of accumulated momentum - computes gradient at “lookahead” position, - make the update using this gradient. 
> [!abstract] definition > > For a parameter vector $\theta$, the update can be expressed as > > $$ > \begin{aligned} v_t &= \mu v_{t-1} + \nabla L(\theta_t + \mu v_{t-1}) \\ \theta_{t+1} &= \theta_t - \alpha v_t \end{aligned} > $$ Achieves better convergence rates | function type | gradient descent | Nesterove AG | | ------------------------ | ---------------------------------- | --------------------------------------- | | Smooth | $\theta(\frac{1}{T})$ | $\theta(\frac{1}{T^{2}})$ | | Smooth & Strongly Convex | $\theta(\exp (-\frac{T}{\kappa}))$ | $\theta(\exp -\frac{T}{\sqrt{\kappa}})$ | > [!math] optimal assignments for parameters > > $$ > \alpha = \frac{1}{\lambda_{\text{max}}}, \beta = \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} > $$ --- slug: thoughts/Networked-Thoughts tags: - seed - pattern description: "resconstructed source of https://aarnphm.xyz/thoughts/Networked-Thoughts" title: "Networked Thoughts" date: 2024-02-09 permalink: https://aarnphm.xyz/thoughts/Networked-Thoughts.html.md --- --- slug: thoughts/OCI tags: - seed - container description: "resconstructed source of https://aarnphm.xyz/thoughts/OCI" title: "OCI Format" date: 2023-08-10 permalink: https://aarnphm.xyz/thoughts/OCI.html.md --- A standard for packaging and running containerized applications. [Specification](https://github.com/opencontainers/image-spec): ### Layout Directory structure for [location-addressable](https://aarnphm.xyz/thoughts/OCI/../../thoughts/Content-addressable-storage) blobs --- slug: thoughts/Orwellian tags: - seed - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/Orwellian" title: "Orwellian" date: 2024-10-02 permalink: https://aarnphm.xyz/thoughts/Orwellian.html.md --- Described a situation, idea, or societal condition that George Orwell identified as being destructive to the welfare of a free and open society. --- slug: thoughts/Overton-Window tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Overton-Window" title: "Overton Window" date: 2024-03-05 permalink: https://aarnphm.xyz/thoughts/Overton-Window.html.md --- _also known as window of discourse_ > A window into the ideas that frames ideas that people are prepared to entertain. All ideas outside the window are not seriously considered. More prominent in the land of policy-making, but also apply to general idea perception. To move the window requires people, ideas outside of the window to shift what is considered “generally” acceptable by the public. --- slug: thoughts/PJRT tags: - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/PJRT" title: "PJRT" date: 2024-03-04 permalink: https://aarnphm.xyz/thoughts/PJRT.html.md --- Blog [post](https://opensource.googleblog.com/2023/05/pjrt-simplifying-ml-hardware-and-framework-integration.html) and [source](https://github.com/openxla/xla/tree/main/xla/pjrt) Lower stack layer for framework and hardware communication. 
As abstraction to transpile to different hardware targets: [TPU](https://aarnphm.xyz/thoughts/PJRT/../../thoughts/TPU), [GPU](https://aarnphm.xyz/thoughts/PJRT/../../thoughts/GPU-programming) --- slug: thoughts/PageRank tags: - seed - algorithm description: "resconstructed source of https://aarnphm.xyz/thoughts/PageRank" title: "PageRank" date: 2024-09-04 permalink: https://aarnphm.xyz/thoughts/PageRank.html.md --- --- slug: thoughts/Pavlovian-scale tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Pavlovian-scale" title: "Pavlovian scale" date: 2024-09-25 permalink: https://aarnphm.xyz/thoughts/Pavlovian-scale.html.md --- Also known as classical conditioning > a biologically potent stimulus is paired with a neutral stimulus --- slug: thoughts/Philosophy-and-Kant tags: - philosophy - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Philosophy-and-Kant" title: "Philosophy and Kant" date: 2023-12-04 permalink: https://aarnphm.xyz/thoughts/Philosophy-and-Kant.html.md --- ### ontology framework. ### critique. --- slug: thoughts/Philosophy-and-Nietzsche tags: - philosophy - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche" title: "Philosophy and Nietzsche" date: 2023-12-04 permalink: https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche.html.md --- See also: Nietzsche’s [Life](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche) and overall influence ## Nietzsche and Philosophy _by [Giles Deleuze](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/Giles-Deleuze)_ The decadence of modern philosophy is the theory of value imposes conformism and a new form of submission Philosophy of sense and values has to be a critique ### Value Problem with [Kant](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/Philosophy-and-Kant): failed to pose the problem of critique in terms of values Notion of [aesthetic value](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/aesthetic-value) implies critical reversal Critical philosophy has two inseparable moments: the referring back of all things and any kind of origin to values, but also the referring back of these values to something which is, as it were, the origin and determines their value. This is Nietzsche’s twofold struggle: - against those who remove values from criticism, contenting themselves with producing inventories of existing values or we criticising things in the name of established values (the “philosophy labourers”, Kant and Schopenhauer, [BGE ](#anatomy-of-beyond-good-and-evil)211) - against those who criticise, or respect, values by deriving them from simple facts, from so-called “objective facts” (the utilitarians, the “scholars”, BGE Part 6). Nietzsche attacks both the “high” idea of foundation which leaves values indifferent to their own origin and the idea of a simple causal derivation or smooth beginning which suggests an indifferent origin of values Genealogy: substitute pathos of difference or distance for both Kantian principle of universality and the principle of resemblance dear to utilitarianism (GM I) ### Sense - there are no def of sense - We don’t know where the force come from - Philosophy is symptomatology, not semeiology - To interpret and to evaluate is to weigh causal and effects. Force is not a cause, but a symptom. 
### Against [dialectics](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/dialectics) Theory of force Life struggles with another form of life it affirms its own difference and enjoys this difference. The negative is not present in the essence as that from which force draws its activity: on the contrary it is a result of activity, of the existence of an active force and the affirmation of its difference. The negative is a product of existence itself: the aggression necessarily linked to an active existence, the aggression of an affirmation. As for negation as a concept, “it is only a subsequently-invented pale contrasting image in relation to its positive basic concept - filled with life and passion through and through” (GM I 10 p. 37). For the speculative element of negation, opposition or contradiction Nietzsche substitutes the practical element of difference, the object of affirmation and enjoyment. It is in this sense that there is a Nietzschean empiricism. The question which Nietzsche constantly repeats, “what does a will want, what does this one or that one want?”, must not be understood as the search for a goal, a motive or an object for this will. What a will wants is to affirm its difference. In its essential relation with the “other” a will makes its difference an object of affirmation. > “The pleasure of knowing oneself different”, the enjoyment of difference (BGE 260); This is the new, aggressive and elevated conceptual element that empiricism substitutes for the heavy notions of the dialectic and above all, as the dialectician puts it, for the labour of the negative. ### Tragedy > [!tip] Tragic > > The linking among contradictions, negatives, and opposition Tragedy has three ways of dying: - Socrates’ dialectics, or Euripidean death - Christianity - Modern dialectics and Wagner 1. BT emphasizes the contradiction is between primitive unity and individuality 2. Reflected in the opposition of Dionysus and Apollo - Apollo overcomes the suffering of the individual by the radiant glorification of the eternity of the phenomenon: construct appearances of appearance, thus freed from suffering - Dionysus shatters the individual, absorbing him into original being ⇒ reproduces contradictions as pain of individual and introduces into higher pleasure 3. Two antithesis ways of solving tragedy 4. Reconciliation dominated by Dionysus ### Nietzsche’s Evolution Tragic in totality lies within its contradiction, Dionysus’ resolutions and expressions of such solutions Characteristic of tragic culture, as seem in Kant, Schopenhauer, Wagner, as in trying to solve it - wisdom takes the place of science as the highest end. ### Existence and Innocence Necessary to disperse the universe, to lose respect for the whole > Innocence is the game of existence, of force and will Existence affirmed and appreciated, force not separated, the will not divided in two - first approximation of innocence Mentioned Heraclitus = tragic thinker H understood existence on the basis of an instinct of [play](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/play) Existence as an [aesthetic](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/aesthetic-value) phenomenon rather than moral or religious Affirmation of being Heraclitus denied the duality of worlds, “he denied being itself’. Moreover he made an affirmation of becoming. We have to reflect for a long time to understand what it means to make an affirmation of becoming. 
In the first place it is doubtless to say that there is only becoming. No doubt it is also to affirm becoming. But we also affirm the being of becoming, we say that becoming affirms being or that being is affirmed in becoming. Heraclitus has two thoughts which are like ciphers: according to one there is no being, everything is becoming; according to the other, being is the being of becoming as such. A working thought which affirms becoming and a contemplative thought which affirms the being of becoming. These two ways of thinking are inseparable, they are the thought of a single element, as Fire and Dike, as Physis and Logos. For there is no being beyond becoming, nothing beyond multiplicity; neither multiplicity nor becoming are appearances or illusions Multiplicity is the inseparable manifestation, essential transformation and constant symptom of unity Affirming being of becoming and affirming becoming are two return state Eternal return is distinct return of outward movement, distinct contemplation of action ### The dice-throw The game as two set of movement Earth is where the dice is thrown and sky is when the dice is thrown back The dice-throw affirm becoming and it affirms the being of becoming Not a large number of throws produce the repetition of combinations but rather the number of combinations which produce the repetition of the dice throw Dice that are thrown once is the affirmation of chance Combination of dice that are thrown is the affirmation of necessity Necessity is affirmed by chances and chances id being affirmed by the act of necessity ### Nietzsche and Mallermé 1. To think is to send out a dice-throw 2. Man does not know how to play 3. To throw a dice is not only irrational, but also constitute to the tragic attempt and tragic thought par excellence Necessity is the abomination of chance ### Tragic thoughts Spirit of revenge as in different form nihilism takes place It is a type, but not separable from typology The Touchstone Relates to other tragic philosopher, but shan’t take this at face value Tragedy in Nietzsche philosophy, one must ask: - How does this other think? - How much ressentiment and bad conscience remains in his thoughts? Zarathustra opposes playing to betting, dancing to leaping --- ## _Anatomy_ of Beyond Good and Evil ### Prejudices of Philosophers [Source](https://www.marxists.org/reference/archive/nietzsche/1886/beyond-good-evil/ch01.htm) - Begins by critiquing the traditional approaches of [truth](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/Will-to-Truth) and morality, deemed it “hazardous enterprise”. From a perspective of a philosophers, who are “deemed to pursue the truth” doesn’t seem to fully understand why ### The Free Spirit [Source](https://www.marxists.org/reference/archive/nietzsche/1886/beyond-good-evil/ch02.htm) > [!note] Aphorism 24 > > What strange simplification and falsification mankind lives! One can never cease wondering once one has acquired eyes for this marvel! How we have made everything around us clear and free and easy and simple! How we have been able to give our senses a passport to everything superficial, our thoughts a godlike desire for wanton gambolling and false conclusions! - How from the beginning, we have contrived to retain our ignorance as to enjoy an almost inconceivable freedom, frivolity, impetuosity, bravery, cheerfulness of life, so as to enjoy life! Man lives in blissful ignorance, and it is this ignorance that allows him to enjoy life. 
Contains a deliberate overlooking or misunderstanding of complexity and depth of reality, such that one grant one’s thoughts the freedom to roam superficially. > And only on this solidified, granite-like foundation of ignorance could knowledge rear itself hitherto, the will to knowledge on the foundation of a far more powerful will, the will to ignorance, to the uncertain, to the untrue! Not as its opposite, but — as its refinement! Nietzsche posits that humans have contrived to retain their ignorance in order to enjoy life with freedom, lack of scruple, heartiness, and gaiety. This foundation of ignorance allows knowledge to rise, but it does so on the foundation of a far more powerful [will](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/Will) --- will to ignorance to uncertainty, to the untrue. Nietzsche juxtaposes a paradox at our existence: a foundation of ignorance is actually built upon our will to knowledge. Will to knowledge is not opposed to ignorance, rather a refinement. Will to ignorance is actually a strategy of [power](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/Will-to-Power), as it motivates [force](https://aarnphm.xyz/thoughts/Philosophy-and-Nietzsche/../../thoughts/Giles-Deleuze#active-and-reactive-forces). --- ## The Gay Science Mentions the Death of God and start the introduction to the doctrine of eternal occurrence > [!note] The connotation of "gay" in Nietzsche's dialectics > > The original title was “la gayza scienza”, and “gay” doesn’t necessarily means homosexuality, rather flexible and joyful. If word for word to be transcribed, it would meant The Joyful Science. --- ## On Genealogy of Morals --- ## Thus Spoke Zarathustra Consciousness is what you make of it. The values you gather through experience are curated largely based on your environment, and Zarathustra guides you on acting morally. People are innately good, but circumstances make them act a certain way. --- slug: thoughts/Planimetric-composition tags: - film description: "resconstructed source of https://aarnphm.xyz/thoughts/Planimetric-composition" title: "Planimetric composition" date: 2023-08-11 permalink: https://aarnphm.xyz/thoughts/Planimetric-composition.html.md --- --- slug: thoughts/Progressive-disclosure tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Progressive-disclosure" title: "Progressive disclosure" date: 2024-09-02 permalink: https://aarnphm.xyz/thoughts/Progressive-disclosure.html.md --- > make complexity easier to learn, but still enables power users to discover all workflows. 
--- slug: thoughts/PyTorch tags: - ml - framework description: "tidbits from PyTorch" title: "PyTorch" date: 2024-11-11 permalink: https://aarnphm.xyz/thoughts/PyTorch.html.md --- see also: [unstable docs](https://pytorch.org/docs/main/) ## `MultiMarginLoss` Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch `Tensor`) and target $y$ (a 1D tensor of target class indices, $0 \le y \le \text{x}.\text{size}(1) -1$): For each mini-batch sample, the loss in terms of the 1D input $x$ and scalar target $y$ is: $$ \text{loss}(x,y) = \frac{\sum_{i} \max(0, \text{margin} - x[y] + x[i])^p}{x.\text{size}(0)} \\ \because i \in \{0, \ldots, x.\text{size}(0)-1\} \text{ and } i \neq y $$ --- slug: thoughts/RAG tags: - technical - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/RAG" title: "RAG" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/RAG.html.md --- Retrieval-Augmented Generation paper: [arxiv](https://arxiv.org/abs/2005.11401) Since models have finite memory and limited context windows, generation often leads to “hallucinations” and a lack of cohesion. The idea of RAG is to combine a pretrained retriever and a seq2seq model, and fine-tune them end-to-end. Two core components include [embeddings](https://aarnphm.xyz/thoughts/RAG/../../thoughts/Embedding) and vector databases. --- slug: thoughts/Radix-tree tags: - technical description: "resconstructed source of https://aarnphm.xyz/thoughts/Radix-tree" title: "Radix tree" date: 2024-11-18 permalink: https://aarnphm.xyz/thoughts/Radix-tree.html.md --- A prefix [trie](https://aarnphm.xyz/thoughts/Radix-tree/../../thoughts/university/twenty-three-twenty-four/sfwr-2c03/Hash-tables) in which each node that is the only child is merged with its parent. ![](https://aarnphm.xyz/thoughts/Radix-tree/../../thoughts/images/Patricia_trie.svg) _By Claudio Rocchini - Own work, CC BY 2.5, [wikimedia](https://commons.wikimedia.org/w/index.php?curid=2118795)_ result: the number of children of every internal node is at most the radix $r$ of the tree, where $r=2^{x}$ for some integer $x \ge 1$ Edges can be labelled with sequences of elements as well as single elements.
key at each node is compared chunk-of-bits, where quantity of bits in any given chunk is the radix $r$ of the radix tree: - $r=2$ then radix trie is binary, which minimise sparsity at the expense of maximising trie-depth - $r \ge 4$ is a power of two, then it is a r-ary trie, which lessen the depth at the expense of some sparseness **Lookup pseudocode**: ```pseudo \begin{algorithm} \caption{Lookup} \begin{algorithmic} \State $\text{traverseNode} \gets \text{root}$ \State $\text{elementsFound} \gets 0$ \While{traverseNode $\neq \text{null} \land \neg \text{traverseNode}.\text{isLeaf}() \land \text{elementsFound} < \text{length}(x)$} \State nextEdge $\gets$ select edge from traverseNode.edges where edge.label is a prefix of $x.\text{suffix}(\text{elementsFound})$ \If{nextEdge $\neq \text{null}$} \State traverseNode $\gets$ nextEdge.targetNode \State elementsFound $\gets$ elementsFound + length(nextEdge.label) \Else \State traverseNode $\gets$ null \EndIf \EndWhile \State \Return traverseNode $\neq \text{null} \land \text{traverseNode}.\text{isLeaf}() \land \text{elementsFound} = \text{length}(x)$ \end{algorithmic} \end{algorithm} ``` ## complexity Permits lookup, deletion, insertion in $O(k)$ rather than $O(\log n)$ Normally $k \ge \log n$, but in a balanced tree every comparison is a string comparison requires $O(k)$ worse-case time. Whereas in a trie all comparison require constant times, but takes $m$ comparisons to look up a string length $m$ --- slug: thoughts/Reynolds-transport-theorem tags: - math - calculus description: "resconstructed source of https://aarnphm.xyz/thoughts/Reynolds-transport-theorem" title: "Reynolds transport theorem" date: 2024-11-27 permalink: https://aarnphm.xyz/thoughts/Reynolds-transport-theorem.html.md --- Also known as _Leibniz-Reynolds transport therem_ A three-dimensional generalization of Leibniz integral rule > [!math] theorem > > Consider integrating $\mathbf{f} = \mathbf{f}(x, t)$ over time-dependent region $\Omega (t)$ that has boundary $\partial \Omega (t)$ then take derivative w\.r.t time: > > $$ > \frac{d}{dt} \int_{\Omega (t)} \mathbf{f} dV > $$ ## general form $$ \frac{d}{dt} \int_{\Omega(t)} \mathbf{f} dV = \int_{\Omega (t)} \frac{\partial{\mathbf{f}}}{\partial{t}} dV + \int_{\partial{\Omega (t)}}(\mathbf{v}_b \cdot \mathbf{n}) \mathbf{f} dA $$ where: - $\mathbf{n}(\mathbf{x},t)$ is the outward-pointing unit normal vector - $\mathbf{x}$ is the variable of integrations - $dV$ and $dA$ are volume and surface elements at $\mathbf{x}$ - $\mathbf{v}_b(\mathbf{x},t)$ is the velocity of the area element. 
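As a quick worked example (my own addition, standard textbook material rather than something from this note): take $\mathbf{f} = \rho$ over a material region that moves with the fluid, so $\mathbf{v}_b = \mathbf{u}$. Conservation of mass says the total mass in $\Omega(t)$ is constant, and the theorem together with the divergence theorem gives

$$ 0 = \frac{d}{dt} \int_{\Omega(t)} \rho \, dV = \int_{\Omega(t)} \frac{\partial \rho}{\partial t} \, dV + \int_{\partial \Omega(t)} \rho \, (\mathbf{u} \cdot \mathbf{n}) \, dA = \int_{\Omega(t)} \left[ \frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) \right] dV $$

Since $\Omega(t)$ is arbitrary, $\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0$, the continuity equation used in the Navier-Stokes note above.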
--- slug: thoughts/Routh-Hurwitz-criterion tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Routh-Hurwitz-criterion" title: "Routh-Hurwitz criterion" date: 2024-02-06 permalink: https://aarnphm.xyz/thoughts/Routh-Hurwitz-criterion.html.md --- > Condition for the stability of linear time-invariant (LTI) [control system](https://aarnphm.xyz/thoughts/Routh-Hurwitz-criterion/../../tags/sfwr3dx4) > [!tip] sufficient condition for Stability > > All coefficients in the first column complete Routh array are the same sign For a system with transfer function $\hat{G}(s) = \frac{\mathcal{N}(s)}{\mathcal{D}(s)}$ Input-output stability implies that all root of $\mathcal{d}(s)$ are in the Left Half Plane (LHP) --- slug: thoughts/Rust tags: - seed - technical description: "resconstructed source of https://aarnphm.xyz/thoughts/Rust" title: "Rust" date: 2022-10-29 permalink: https://aarnphm.xyz/thoughts/Rust.html.md --- Ownership and Borrowing - Stack and heaps ```rust fn main() { let s = String::from("Hello"); } ``` borrow mutable ONCE - long running owners - refcount Foreign-Function Interfaces (FFI) --- slug: thoughts/SVCCA tags: - ml - interp description: "resconstructed source of https://aarnphm.xyz/thoughts/SVCCA" title: "SVCCA" date: 2024-11-04 permalink: https://aarnphm.xyz/thoughts/SVCCA.html.md --- ([Raghu et al., 2017](#bib-raghu2017svccasingularvectorcanonical)) proposed a way to compare two representations that is both invariant to affine transform and fast to compute [^explain] > based on canonical correlation analysis which was invariant to linear transformation. > [!abstract] definition > > Given a dataset $X = \{x_{1},\cdots, x_m\}$ and a neuron $i$ on layer $l$, we define $z_i^l$ to be the _vector_ of outputs on $X$, or: > > $$ > z^l_i = (z^l_i(x_1), \cdots, z^l_i(x_m)) > $$ SVCCA proceeds as following: 1. **Input**: takes as input two (not necessary different) sets of neurons $l_{1} = \{z_1^{l_{1}}, \cdots, z_{m_{1}}^{l_1}\}$ and $l_{2} = \{z_1^{l_2}, \cdots, z_{m_2}^{l_{2}}\}$ 2. **Step 1**: Perform [SVD](https://aarnphm.xyz/thoughts/SVCCA/../../thoughts/Singular-Value-Decomposition) of each subspace to get subspace $l^{'}_1 \subset l_1, l^{'}_2 \subset l_2$ 3. **Step 2**: Compute Canonical Correlation similarity between $l^{'}_1, l^{'}_2$, that is maximal correlations between $X,Y$ can be expressed as: $$ \max \frac{a^T \sum_{XY}b}{\sqrt{a^T \sum_{XX}a}\sqrt{b^T \sum_{YY}b}} $$ where $\sum_{XX}, \sum_{XY}, \sum_{YX}, \sum_{YY}$ are covariance and cross-variance terms. By performing change of basis $\tilde{x_{1}} = \sum_{xx}^{\frac{1}{2}} a$ and $\tilde{y_1}=\sum_{YY}^{\frac{1}{2}} b$ and Cauchy-Schwarz we recover an eigenvalue problem: $$ \tilde{x_{1}} = \argmax [\frac{x^T \sum_{X X}^{\frac{1}{2}} \sum_{XY} \sum_{YY}^{-1} \sum_{YX} \sum_{XX}^{-\frac{1}{2}}x}{\|x\|}] $$ 4. **Output**: aligned directions $(\tilde{z_i^{l_{1}}}, \tilde{z_i^{l_{2}}})$ and correlations $\rho_i$ > [!tip] distributed representations > > SVCCA has no preference for representations that are neuron (axed) aligned. [^testnet] ## Bibliographie - Raghu, M., Gilmer, J., Yosinski, J., & Sohl-Dickstein, J. (2017). _SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability_. 
arXiv preprint arXiv:1706.05806 [\[arxiv\]](https://arxiv.org/abs/1706.05806) [^explain]: means allowing comparison between different layers of network and more comparisons to be calculated than with previous methods [^testnet]: Experiments were conducted with a convolutional network followed by a residual network: convnet: `conv --> conv --> bn --> pool --> conv --> conv --> conv --> conv --> bn --> pool --> fc --> bn --> fc --> bn --> out` resnet: `conv --> (x10 c/bn/r block) --> (x10 c/bn/r block) --> (x10 c/bn/r block) --> bn --> fc --> out` Note that SVD and CCA works with $\text{span}(z_1, \cdots, z_m)$ instead of being axis aligned to $z_i$ directions. This is important if representations are distributed across many dimensions, which we observe in cross-branch superpositions! --- slug: thoughts/Scents tags: - evergreen description: "resconstructed source of https://aarnphm.xyz/thoughts/Scents" title: "Scents" date: 2024-01-07 permalink: https://aarnphm.xyz/thoughts/Scents.html.md --- A (mostly) up-to-date scents that I use/like/prefer. See [antilibrary](https://aarnphm.xyz/thoughts/Scents/../../books) for reading list. ### like. - Maison Margiela’s _Lazy Sunday Morning_ - Maison Francis Kurkdjian’s _OUD satin mood_ - Tom Ford’s _Noir de Noir_ ### current. #### [Le Labo’s Rose 31](https://www.lelabofragrances.ca/rose-31.html?bypass=true\®ion=CA\&locale=EN\&gad_source=1) - Definitely a winter/spring scent. - If you like the smell of roses. Alternatives are Matcha 26, or Fleurs d’Oranger 27. #### [Le Labo’s Labdanum 18](https://www.lelabofragrances.ca/labdanum-18.html?bypass=true\®ion=CA\&locale=EN\&gad_source=1) - warm and sweet scent, good for a summer, fall night. - definitely stays a lot longer comparing to rose. --- slug: thoughts/Search tags: - seed - technical description: "resconstructed source of https://aarnphm.xyz/thoughts/Search" title: "Search" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/Search.html.md --- ## Engine A search engine is essentially query processing. It is a form of [information retrieval](https://aarnphm.xyz/thoughts/Search/../../thoughts/information-retrieval) that helps one to answer [questions](https://aarnphm.xyz/thoughts/Search/../../thoughts/questions) The search results are generally presented in a line of results, often referred to as search engine results pages (SERPs). Some search engines also mine [data](https://aarnphm.xyz/thoughts/Search/../../thoughts/data) available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler. 
## query See also [PageRank](https://aarnphm.xyz/thoughts/Search/../../thoughts/PageRank) ### HITS algorithm --- slug: thoughts/Singular-Value-Decomposition tags: - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/Singular-Value-Decomposition" title: "Singular Value Decomposition" date: 2024-10-21 permalink: https://aarnphm.xyz/thoughts/Singular-Value-Decomposition.html.md --- $$ \begin{aligned} X &= \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_m \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_1 & \cdots & x_m \end{bmatrix} = U \Sigma V^T \\ &= \begin{bmatrix} 1 & 1 & \cdots & 1 \\ u_{1} & u_{2} & \cdots & u_n \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} \sigma_1 & \cdots & \cdots & \cdots \\ \vdots & \sigma_2 & \cdots & \cdots \\ \vdots & \cdots & \ddots & \cdots \\ \vdots & \cdots & \cdots & \sigma_m \\ 0 & 0 & 0 & 0 \\ \end{bmatrix} {\begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ v_{1} & v_{2} & \cdots & v_n \\ \vdots & \vdots & \vdots & \vdots \end{bmatrix}}^T \\ \\ x_k &\in \mathbb{R}^n \\ \\ \text{U, V } &: \text{unitary matrices} \\ \Sigma &: \text{diagonal matrix} \end{aligned} $$ where $\begin{bmatrix} 1 \\ u_{1} \\ \vdots \\ 1 \end{bmatrix}$ are “eigen-faces” $U$ is orthonormal, meaning: $$ \begin{aligned} U U^T &= U^T U = \mathbb{I}_{n \times n} \\ V V^T &= V^T V = \mathbb{I}_{m \times m} \\ \\ \Sigma &: \text{diagonal} \quad \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_m \geq 0 \end{aligned} $$ --- slug: thoughts/TPU tags: - hardware description: "resconstructed source of https://aarnphm.xyz/thoughts/TPU" title: "TPU" date: 2024-03-04 permalink: https://aarnphm.xyz/thoughts/TPU.html.md --- See also: [XLA](https://aarnphm.xyz/thoughts/TPU/../../thoughts/XLA), and [architecture](https://cloud.google.com/tpu/docs/system-architecture-tpu-vm) --- slug: thoughts/Tensor-field tags: - math description: "a gentle introduction into tensor analysis" title: "Tensor field" date: 2024-11-27 permalink: https://aarnphm.xyz/thoughts/Tensor-field.html.md --- > a function assign a tensor to each point of a region of a mathematical space (typically a _Euclidean space_ or a [manifold](https://aarnphm.xyz/thoughts/Tensor-field/../../thoughts/manifold)) > [!math] Definition > > Let $M$ be a manifold, for instance the Euclidean plane $\mathbb{R}^n$ > > Then a tensor field of type $(p, q)$ is a section > > $$ > T \in \Gamma(M, V^{\otimes p} \otimes (V^{*})^{\otimes q}) > $$ > > where $V$ is a [vector bundle](https://aarnphm.xyz/thoughts/Tensor-field/../../thoughts/Tensor-field#vector-bundle) on $M$, $V^{*}$ is its [dual](https://aarnphm.xyz/thoughts/Tensor-field/../../thoughts/Tensor-field#dual) and $\otimes$ is the tensor product of vector bundles ## via coordinate transitions See also ([McConnell, 2014](#bib-mcconnell2014applications); [Schouten, 1951](#bib-schouten1951tensor)) --- ## appendix _a few math definitions_ ### metric tensors > A tangent space is a $n$-dimensional differentiable manifold $M$ associated with each point $p$. a non-degenerate, smooth, symmetric bilinear map that assigns a real number to pairs of tangent vectors at each tangent space of the manifold. 
> [!math] metric tensor > > $$ > g: T_p M \times T_p M \to \mathbb{R} > $$ The map is symmetric and bilinear, so if $X, Y, Z \in T_p M$ are tangent vectors at point $p$ to the manifold $M$ then we have: $$ \begin{aligned} g(X,Y) &= g(Y,X) \\ g(aX + Y, Z) &= ag(X,Z) + g(Y,Z) \end{aligned} $$ for any real number $a \in \mathbb{R}$ > $g$ is _non-degenerate_ means there is no non-zero $X \in T_p M$ such that $g(X,Y)=0 \forall \space Y \in T_p M$ ### vector bundle a topological construction that makes precise the idea of of a family of vector space parameterised by another space $X$ ex: $X$ could be a topological space, a [manifold](https://aarnphm.xyz/thoughts/Tensor-field/../../thoughts/manifold) [](https://aarnphm.xyz/thoughts/Tensor-field/../../thoughts/images/MobiusStrip.mp4) _Möbius strip_ > to every point $x$ of the space $X$ we “attach” a vector space $V(x)$ in such a way that these vector space fits together to form another space of the same kind as $X$ > [!math] definition > > A **real vector bundle** consists of > > - topological spaces $X$ (base space) and $E$ (total space) > - a continuous surjection $\pi: E \rightarrow X$ (bundle projection) > - For every $x$ in $X$ the structure of a _finite-dimensional real vector space_ on the [fiber](https://aarnphm.xyz/thoughts/Tensor-field/../../thoughts/Tensor-field#fiber) $\pi^{-1}(\{x\})$ > [!tip] compatibility condition > > For every point $p$ in $X$, there is an _open neighborhood_ $U \subseteq X$ of $p$ and a **[homeomorphism](https://aarnphm.xyz/thoughts/Tensor-field/../../thoughts/homeomorphism)** > > $$ > \varphi : U \times \mathbb{R}^k \rightarrow \pi^{-1}(U) > $$ > > such that for all $x$ in $U$: > > - $(\pi \circ \varphi)(x,v)=x$ for all vectors $v$ in $\mathbb{R}^k$ > - the map $v \mapsto \varphi(x,v)$ is a _linear isomorphism_ between vector spaces $\mathbb{R}^k$ and $\pi^{-1}(\{x\})$ #### properties - open neighborhood $U$ together with the hoemomorphism $\varphi$ is called a _local trivialisation_ of the vector bundle [^local-trivial] * every fiber $\pi^{-1}(\{x\})$ is a finite-dimensional real vector space and hence has a _dimension_ $k_x$ * function $x \to k_x$ is locally constant, and therefore constant on each _connected component_ of $X$ > [!note] rank of the vector bundle > > if $k_x$ is equal to constant $k$ on all of $X$, then $k$ is the rank of the vector bundle, and $E$ is a **vector bundle of rank** $k$ > [!math] trivial bundle > > The Cartesian product $X \times \mathbb{R}^k$ equipped with the projection $X \times \mathbb{R}^k \to X$ is considered as the _trivial bundle_ of rank $k$ over $X$ ### dual operations on vector bundle extending the operation of duality for vector space. 
> [!math] definition > > a _dual bundle_ of a vector bundle $\pi : E \rightarrow X$ is the vector bundle $\pi^{*}: E^{*} \rightarrow X$ whose fiber are the dual spaces to fibers of $E$ Equivalently, $E^{*}$ can be defined as the Hom bundle $\text{Hom}(E, \mathbb{R} \times X)$, the vector bundle of morphisms from $E$ to the trivial line bundle $\mathbb{R} \times X \rightarrow X$ ### fiber _a space that is _locally_ a product space, but _globally_ may have different topological structure_ > [!math] definition > > A fiber bundle is a structure $(E, B, \pi, F)$ where: > > - $E, B, F$ are topological space > - $\pi: E \rightarrow B$ is a _continuous surjection_ satisfying _local triviality_ condition $B$ is considered as _base space_, $E$ is **total space**, and $F$ is the _fiber space_ the map $\pi$ is called the **projection map** > [!abstract] consequences > > we require that for every $x \in B$, there is an open neighborhood $U \subseteq B$ of $x$ such that there is a [homeomorphism](https://aarnphm.xyz/thoughts/Tensor-field/../../thoughts/homeomorphism) $\varphi: \pi^{-1}(U) \rightarrow U \times F$ such that a way $\pi$ agrees with the projection onto the first factor. [^annotation] _source code_ where $\text{proj}_1: U \times F \rightarrow U$ is the natural projection and $\varphi : \pi^{-1}(U) \rightarrow U \times F$ is a homeomorphism. > The set of all $\{(U_i, \varphi_i)\}$ is called a **local trivialization** of the bundle Therefore, for any $p \in B$, the _preimage_ $\pi^{-1}(\{p\})$ is _homeomorphic_ to $F$ [^true] and is called the _fiber over_ p > [!note] annotation > > a fiber bundle $(E, B, \pi, F)$ is often denoted as > > $$ > F \to E \xrightarrow{\pi} B > $$ #### bundle map Suppose that $M$ and $N$ are base space, and $\pi_E: E \to M$ and $\pi_F: F \to N$ are fiber bundles over $M$ and $N$ respectively. > [!math] definition > > **bundle map/morphism** consists of a pair of continuous functions > > $$ > \varphi: E \to F, f: M \to N > $$ > > such that $\pi_F \circ \varphi = f \circ \pi_E$. That is the following is commutative: > > _source code_ ## Bibliographie - McConnell, A. J. (2014). _Applications of Tensor Analysis_. Dover Publications. - Schouten, J. A. (1951). _Tensor Analysis for Physicists_. Oxford University Press. [^local-trivial]: shows that _locally_ the map $\pi$ “looks like” the projection of $U \times \mathbb{R}^k$ on $U$ [^annotation]: $\pi^{-1}(U)$ is the given subspace topology, and $U \times F$ is the product space [^true]: since this is true of $\text{proj}_1^{-1}(\{p\})$ --- slug: thoughts/The-Prisoner's-Dilemma tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/The-Prisoner's-Dilemma" title: "The Prisoner's Dilemma" date: 2024-04-12 permalink: https://aarnphm.xyz/thoughts/The-Prisoner's-Dilemma.html.md --- a [game theory](https://aarnphm.xyz/thoughts/The-Prisoner's-Dilemma/../../thoughts/game-theory) thought experiment involves two rational agents, each of whom can cooperate for mutual benefit or “defect” for individual reward. --- slug: thoughts/The-Will-To-Believe tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/The-Will-To-Believe" title: "The Will To Believe" date: 2024-02-08 permalink: https://aarnphm.xyz/thoughts/The-Will-To-Believe.html.md --- Book: [web](https://www.gutenberg.org/files/26659/26659-h/26659-h.htm) ## rationality > But this relief seems to be a negative rather than a positive character. 
Shall we then say that the feeling of rationality is constituted merely by the absence of any feeling of irrationality? Just as we feel no particular pleasure when we breathe freely, but a very intense feeling of distress when the respiratory motions are prevented,—so any unobstructed tendency to action discharges itself without the production of much cogitative accompaniment, and any perfectly fluent course of thought awakens but little feeling; but when the movement is inhibited, or when the thought meets with difficulties, we experience distress. It is only when the distress is upon us that we can be said to strive, to crave, or to aspire. > All feeling whatever, in the light of certain recent psychological speculations, seems to depend for its physical condition not on simple discharge of nerve-currents, but on their discharge under arrest, impediment, or resistance. --- slug: thoughts/Transcendentals tags: - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/Transcendentals" title: "Transcendentals" date: 2024-01-14 permalink: https://aarnphm.xyz/thoughts/Transcendentals.html.md --- > properties of being that are universal to all beings ### truth. Kant’s [transcendental idealism](https://aarnphm.xyz/thoughts/Transcendentals/../../thoughts/Philosophy-and-Kant). --- slug: thoughts/Transformers tags: - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/Transformers" title: "Transformers" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/Transformers.html.md --- See also: [LLMs](https://aarnphm.xyz/thoughts/Transformers/../../thoughts/LLMs), [embedding](https://aarnphm.xyz/thoughts/Transformers/../../thoughts/Embedding), [visualisation from Brendan Bycroft](https://bbycroft.net/llm) > A multi-layer perceptron (MLP) architecture built on top of a [multi-head attention](https://aarnphm.xyz/thoughts/Transformers/../../thoughts/Attention#muti-head-attention) mechanism ([Vaswani et al., 2023](#bib-vaswani2023attentionneed)) to signal high entropy tokens to be amplified and less important tokens to be diminished. ELI5: Mom often creates a food list consists of $n$ of items to buy. Your job is to guess what the last item on this list would be. Most implementations are [autoregressive](https://aarnphm.xyz/thoughts/Transformers/../../thoughts/Autoregressive-models). Most major SOTA are decoder-only, as encoder-decoder models has lack behind due to their expensive encoding phase. [state-space models](https://aarnphm.xyz/thoughts/Transformers/../../thoughts/state-space-models) which address transformers’ [efficiency issues](https://arxiv.org/pdf/2009.06732) in attention layers within information-dense data ## memory limitations. _excerpt from [arxiv](https://arxiv.org/html/2403.14123)_ > "How is LLaMa.cpp possible?"\ > great post by [@finbarrtimbers](https://twitter.com/finbarrtimbers?ref_src=twsrc%5Etfw) \ > \ > llama.cpp surprised many people (myself included) with how quickly you can run large LLMs on small computers, e.g. 7B runs @ \~16 tok/s on a MacBook. Wait don't you need supercomputers to work… [pic.twitter.com/EIp9iPkZ6x](https://t.co/EIp9iPkZ6x) > > — Andrej Karpathy (@karpathy) [15 août 2023](https://twitter.com/karpathy/status/1691571869051445433?ref_src=twsrc%5Etfw) ## inference. Either compute-bound (batch inference, saturated usage) or memory-bound (latency) [speculative decoding](https://aarnphm.xyz/thoughts/Transformers/../../thoughts/vllm#speculative-decoding) ⇒ memory-bound (to saturate FLOPs) ### next-token prediction. 
Sampling: we essentially look forward K-tokens, and then we sample from the distribution of the next token. ## Feynman-Kac Let $\mathcal{V}$ be the vocab of given transformers model, and $\mathcal{S} = \mathcal{V}^{*}$ the set of multi-token strings. Assume $\mathcal{V}$ contains token `EOS` and write $\mathcal{F} \subseteq \mathcal{S}$ for the set of `EOS`-terminated strings. > [!definition] > > is a tuple $(s_{0}, \{M_t\}_{t\ge 1}, \{G_t\}_{t\ge 1})$ where: > > - $s_{0} \in \mathcal{S}$ is an _initial state_, which will take as empty string $\epsilon$ > - $M_t(s_t \mid s_{t-1}, f_\theta)$ is a _Markov kernel_ from $s_{t-1} \in \mathcal{F}^c$ to $s_t \in \mathcal{S}$, parameterised by a transformer network $f_\theta: \mathcal{F}^c \to \mathbb{R}^{\mid \mathcal{V} \mid}$ mapping non-`EOS`-terminated strings to vectors of logits > - $G_t(s_{t-1}, s_t, f_\theta)$ is a _potential function_, mapping a pair $(s_{t-1}, s_t) \in \mathcal{F}^c \times \mathcal{S}$ to a real-valued non-negative score. Goal: generate from distribution $\mathbb{P}$ that reweights Markove chain $\mathbb{M}$ by potential functions $G_t$. We define __step-t filtering posteriors_:_ $$ P_t(s_t) = \frac{\mathbb{E}_\mathbb{M} \left[ \prod_{i=1}^{t \wedge T} G_i(S_{i-1}, S_i, f_\theta) \cdot [S_t = s_t] \right]}{\mathbb{E}_\mathbb{M} \left[ \prod_{i=1}^{t \wedge T} G_i(S_{i-1}, S_i, f_\theta) \right]} $$ _Given that $T$ is mostly finite_ we can then define _overall posterior_ $\mathbb{P}(s) = \lim_{t \to \infty} \mathbb{P}_t(s)$ ([Lew et al., 2023, p. see 2.2 for examples](#bib-lew2023sequentialmontecarlosteering)) ```pseudo \begin{algorithm} \caption{Sequential Monte Carlo Transformer Steering} \begin{algorithmic} \State \textbf{Input:} $N$ (\# particles), $K$ (factor), Feynman-Kac Transformer model $\{s_0, \{M_t\}_{t \geq 1}, \{G_t\}_{t \geq 1}\}$ \State \textbf{Output:} Weighted particle approximation $\{(x_i, w_i)\}_{i=1,\ldots,N}$ of the posterior $\mathbb{P}$ \\ \State \textbf{Output:} Unbiased estimate $\hat{Z}$ of the partition function $Z = \mathbb{E}_\mathbb{M}[\prod_{t=1}^T G_t(s_t, s_{t-1}, f_\theta)]$ \\ \State Initialize $f_\theta \gets \texttt{CachedTransformer}()$ \State Initialize $(x_i, w_i) \gets (s_0, 1)$ for $i = 1, \ldots, N$ \State Initialize $t \gets 1$ \While{$x_i \not\in \mathcal{F}$ for some $i \in \{1, \ldots, N\}$} \State $K_i \gets K (1 - \mathbb{1}_{\mathcal{F}}(x_i)) + \mathbb{1}_{\mathcal{F}}(x_i)$ for $i = 1, \ldots, N$ \State $N' \gets \sum_{i=1}^N K_i$ \For{$i \in \{1, \ldots, N\}$} \If{$x_i \in \mathcal{F}$} \State Set $(x_{i,1}, w_{i,1}) \gets (x_i, w_i \cdot \frac{N'}{N})$ \Else \State Generate $x_{i,k} \sim M_t(\cdot \mid x_i, f_\theta)$ for $k = 1, \ldots, K$ \State Set $w_{i,k} \gets w_i \cdot G_t(x_i, x_{i,k}, f_\theta) \cdot \frac{N'}{K N}$ for $k = 1, \ldots, K$ \EndIf \EndFor \State Set normalized weights $\hat{w}_{i,k} \gets \frac{w_{(i,k)}}{\sum_{j=1}^N \sum_{l=1}^{K_j} w_{(j,l)}}$ for $i = 1, \ldots, N$ and $k = 1, \ldots, K_i$ \State Set $c^* \gets \inf\{c \in \mathbb{R}_{> 0} \mid \sum_{i=1}^N \sum_{k=1}^{K_i} (\mathbb{1} \wedge c \hat{w}_{(i,k)}) > N\}$ \State Set $(I_\text{det}, I_\text{stoch}, I_\text{strat}) \gets (\{(i,k) \mid c^{*} \hat{w}_{i,k} \geq 1\}, \{(i,k) \mid c^{*} \cdot \hat{w}_{i,k} < 1\}, \{\})$ \State Set $\alpha \gets \frac{\sum_{i \in I_\text{stoch}} \hat{w}_i}{|I_\text{det}|}$ and generate $U \sim \text{Uniform}([0, \alpha])$ \For{$i \in I_\text{stoch}$} \State Set $U \gets U - \hat{w}_i$ \If{$U < 0$} \State Set $I_\text{strat} \gets I_\text{strat} \cup 
\{i\}$ \State Set $U \gets U + \alpha$ \EndIf \EndFor \State Set particles $\{(x_i, w_i)\}_{i=1,\ldots,|I_\text{det}|} \gets \{(x_j, w_j \cdot \frac{N}{N'}) \mid j \in I_\text{det}\}$ \State Set particles $\{(x_i, w_i)\}_{i=|I_\text{det}|+1,\ldots,N} \gets \{(x_j, \frac{N}{c^* N'} \sum_{l=1}^{N} \sum_{k=1}^{K_l} w_{(j,k)}) \mid j \in I_\text{strat}\}$ \EndWhile \State \Return $\left((x_i, w_i)_{i=1,\ldots,N}, \hat{Z} = \frac{1}{N} \sum_{i=1}^N w_i \right)$ \end{algorithmic} \end{algorithm} ``` ## Bibliographie - Lew, A. K., Zhi-Xuan, T., Grand, G., & Mansinghka, V. K. (2023). _Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs_. arXiv preprint arXiv:2306.03081 [\[arxiv\]](https://arxiv.org/abs/2306.03081) - Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2023). _Attention Is All You Need_. arXiv preprint arXiv:1706.03762 [\[arxiv\]](https://arxiv.org/abs/1706.03762) --- slug: thoughts/Turing-complete-Transformers tags: - seed - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/Turing-complete-Transformers" title: "Turing-complete Transformers" date: 2024-01-30 permalink: https://aarnphm.xyz/thoughts/Turing-complete-Transformers.html.md --- > Turing Complete Transformers: Two Transformers Are More Powerful Than One\ > "We prove transformers are not Turing complete, propose a new architecture that is Turing complete, and empirically demonstrate that the new architecture can generalize more effectively than transformers."… [pic.twitter.com/LGVlZt0afu](https://t.co/LGVlZt0afu) > > — Burny — Effective Omni (@burny\_tech) [7 janvier 2024](https://twitter.com/burny_tech/status/1744100637187461455?ref_src=twsrc%5Etfw) The idea is to combine two small [transformers](https://aarnphm.xyz/thoughts/Turing-complete-Transformers/../../thoughts/Transformers) rather than one [large model](https://aarnphm.xyz/thoughts/Turing-complete-Transformers/../../thoughts/large-models): each is more specialised on a given task, and the pair is claimed to be Turing-complete. --- slug: thoughts/Value tags: - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/Value" title: "Value" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/Value.html.md --- Encapsulates the following branches: - [Moral](https://aarnphm.xyz/thoughts/Value/../../thoughts/moral) - [Aesthetic](https://aarnphm.xyz/thoughts/Value/../../thoughts/aesthetic-value) > [!tip] Axiology > > concerns the goodness of all varieties, encompassing the nature of value and where it comes from. --- slug: thoughts/Vector-calculus tags: - math description: "just enough vector calculus to be dangerous" title: "Vector calculus" date: 2024-11-27 permalink: https://aarnphm.xyz/thoughts/Vector-calculus.html.md --- ## divergence operates on a vector field, producing a scalar field that gives the quantity of the vector field's source at each point. > represents the volume density of the outward flux of a vector field from an infinitesimal volume around a given point.
> [!math] definition > > the divergence of a vector field $\mathbf{F}(\mathbf{x})$ at point $x_{0}$ is defined as _the limit of the ratio_ of the surface integral of $\mathbf{F}$ out of the closed surface of a volume $V$ enclosing $x_0$ to the volume of $V$, as $V$ shrinks to zero $$ \operatorname{div} \mathbf{F} \big|_{\mathbf{x}_0} = \lim_{V \to 0} \frac{1}{|V|} \oiint_{S(V)} \mathbf{F} \cdot \hat{\mathbf{n}} \, dS $$ where $|V|$ is the volume of $V$, $S(V)$ is the boundary of $V$ and $\hat{\mathbf{n}}$ is the outward unit normal to that surface. ### Cartesian coordinates for a continuously differentiable vector field $\mathbf{F} = F_x \mathbf{i} + F_y \mathbf{j} + F_z \mathbf{k}$, divergence is defined as the scalar-valued function: $$ \begin{aligned} \operatorname{div} \mathbf{F} = \nabla \cdot \mathbf{F} &= \left( \frac{\partial}{\partial{x}}, \frac{\partial}{\partial{y}}, \frac{\partial}{\partial{z}} \right) \cdot \left( F_x, F_y, F_z \right) \\ &=\frac{\partial{F_x}}{\partial{x}} + \frac{\partial{F_y}}{\partial{y}} + \frac{\partial{F_z}}{\partial{z}} \end{aligned} $$ ## Jacobian matrix Suppose a function $\mathbf{f}: \mathbf{R}^n \to \mathbf{R}^m$ is a function such that each of its first-order partial derivatives exists on $\mathbf{R}^n$, then the Jacobian matrix of $\mathbf{f}$ is defined as follows: $$ \begin{equation} \begin{aligned} \mathbf{J}_{\mathbf{f}} &= \begin{bmatrix} \frac{\partial \mathbf{f}}{\partial x_1} & \cdots & \frac{\partial \mathbf{f}}{\partial x_n} \end{bmatrix} \\ &= \begin{bmatrix} \nabla^T f_1 \\ \vdots \\ \nabla^T f_m \end{bmatrix} \\ &= \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}. \end{aligned} \end{equation} $$ > [!math] Jacobian determinant > > When $m = n$, the Jacobian matrix is a square, so its determinant is a well-defined function of $x$ [^conjecture] When $m=1$, or $f: \mathbf{R}^n \to \mathbf{R}$ is a scalar-valued function, then Jacobian matrix reduced to the row vector $\nabla^T f$, and this row vector of all first-order partial derivatives of $f$ is the _transpose of the [gradient](https://aarnphm.xyz/thoughts/Vector-calculus/../../thoughts/Vector-calculus#gradient) of _$f$, or $\mathbf{J}_f = \nabla^T f$ ## gradient a vector field $\nabla f$ whose value at a point $p$ gives the direction and the rate of fastest increase. 
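Before the coordinate-free description of the gradient below, the Cartesian formulas above are easy to sanity-check numerically. A small sketch (illustrative only; central finite differences, made-up test functions):

```python
import numpy as np

h = 1e-5

def jacobian(f, x):
    """numerical Jacobian J_f(x) via central differences; rows are outputs, columns are inputs"""
    fx = np.atleast_1d(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (np.atleast_1d(f(x + e)) - np.atleast_1d(f(x - e))) / (2 * h)
    return J

# F(x, y, z) = (xy, yz, xz)  =>  div F = y + z + x, i.e. the trace of the Jacobian
F = lambda p: np.array([p[0] * p[1], p[1] * p[2], p[0] * p[2]])
p = np.array([1.0, 2.0, 3.0])
assert np.isclose(np.trace(jacobian(F, p)), p.sum(), atol=1e-4)

# scalar f(x, y, z) = x^2 + y^2 + z^2: the 1 x n Jacobian is the transposed gradient, here 2p
f = lambda q: q @ q
assert np.allclose(jacobian(f, p)[0], 2 * p, atol=1e-4)
```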
In coordinate-free term, the gradient of a function $f(\mathbf{r})$ maybe defined by: $$ df = \nabla f \cdot d \mathbf{r} $$ where $df$ is the infinitesimal change in $f$ for an infinitesimal displacement $d \mathbf{r}$, and is seen to be maximal when $d \mathbf{r}$ is in the direction of the gradient $\nabla f$ > [!math] definition > > the gradient of $f$ ($\nabla f$) is defined as the unique vector field whose dot product with any vector $\mathbf{v}$ at each point $x$ is the directional derivative of $f$ along $\mathbf{v}$, such that: [^grad-annotation] > > $$ > (\nabla f(x)) \cdot \mathbf{v} = D_v f(x) > $$ ```tikz \usepackage{pgfplots} \usepackage{tikz-3dplot} \pgfplotsset{compat=1.16} \begin{document} \begin{tikzpicture} \begin{axis}[ view={25}{30}, xlabel=$x$, ylabel=$y$, zlabel=$z$, xmin=-80, xmax=80, ymin=-80, ymax=80, zmin=-4, zmax=0, grid=major ] \addplot3[ surf, domain=-80:80, y domain=-80:80, samples=40, samples y=40, faceted color=orange, fill opacity=0.7, mesh/interior colormap={autumn}{color=(yellow) color=(orange)}, shader=flat ] {-(cos(x)^2 + cos(y)^2)^2}; \addplot3[ ->, blue, quiver={ u={4*cos(x)*sin(x)*(cos(x)^2 + cos(y)^2)}, v={4*cos(y)*sin(y)*(cos(x)^2 + cos(y)^2)}, w=0 }, samples=15, samples y=15, domain=-80:80, y domain=-80:80, ] {-4}; \end{axis} \end{tikzpicture} \end{document} ``` [^conjecture]: See also [Jacobian conjecture](https://en.wikipedia.org/wiki/Jacobian_conjecture) [^grad-annotation]: another annotation often used in [Machine learning](https://aarnphm.xyz/thoughts/Vector-calculus/../../thoughts/Machine-learning) is `grad(f)`. See also [autograd](https://aarnphm.xyz/thoughts/Vector-calculus/../../thoughts/Automatic-Differentiation) --- slug: thoughts/Vietnamese-poem tags: - seed - poem description: "dedicated to my roots, Vietnamese born." title: "Vietnamese poem" date: 2024-11-18 permalink: https://aarnphm.xyz/thoughts/Vietnamese-poem.html.md --- ## Tố Hữu Nguyễn Kim Thành (alias: Tố Hữu) was born in Phù Lai Village, near cố đô Huế. He was considered to be one of the frontier in contemporary Vietnamese literature ### Vú em ```poetry language=vi Nàng gửi con về nương xóm cũ Nghẹn ngào trở lại đẩy xe nôi Rồi từ hôm ấy, ôm con chủ Trong cánh tay êm, luống ngậm ngùi Nàng nhớ con nằm trong tổ lạnh Không chăn, không nệm ấm, không màn. Biết đâu trong những giờ hiu quạnh Nó gọi tên nàng tiếng đã khan! Rồi từ hôm ấy, dưới đêm sâu Hồi hộp nàng ra vịn cửa lầu Nhìn xuống ven trời dày bóng nặng Tìm nghe trong gió tiếng con đâu Gió vẫn vô tình lơ đãng bay Những tàu cau yếu sẽ lung lay Xạc xào động cánh đau lòng mẹ Nghe tiếng lòng con vẳng tới đây! Ta thấy nàng nghiêng mình rũ rượi Gục đầu thổn thức trong bàn tay... Bạn ơi, nguồn thảm sầu kia bởi Số phận hay do chế độ này? 
Huế, tháng 5-1938 ``` --- slug: thoughts/Will-to-Power tags: - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/Will-to-Power" title: "Will to Power" date: 2024-02-24 permalink: https://aarnphm.xyz/thoughts/Will-to-Power.html.md --- --- slug: thoughts/Will-to-Truth tags: - seed - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/Will-to-Truth" title: "Will to Truth" date: 2023-10-24 permalink: https://aarnphm.xyz/thoughts/Will-to-Truth.html.md --- See also: [Philosophy and Nietzche](https://aarnphm.xyz/thoughts/Will-to-Truth/../../thoughts/Philosophy-and-Nietzsche) _excerpt from [Nietzsche](https://aarnphm.xyz/thoughts/Will-to-Truth/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche)’s Beyond Good and Evil_ > The will to truth, which is still going to tempt us to many a hazardous enterprise; that celebrated veracity of which all philosophers have hitherto spoken with reverence: what questions this will to truth has already set before us! What strange, wicked, questionable questions! It is already a long story — yet does it not seem as if it has only just begun? Is it any wonder we should at last grow distrustful, lose our patience, turn impatiently away? That this sphinx should teach us too to ask questions? Who really is it that here questions us? What really is it in us that wants ‘the truth’? Nietzsche critiques the traditional approaches of truth and morality, deemed it “hazardous enterprise”. From a perspective of a philosophers, who are “deemed to pursue the truth” doesn’t seem to fully understand why > We did indeed pause for a long time before the question of the origin of this will — until finally we came to a complete halt before an even more fundamental question. We asked after the value of this will. Granted we want truth: why not rather untruth? And uncertainty? Even ignorance? — The problem of the value of truth stepped before us — or was it we who stepped before this problem? Which of us is Oedipus here? Which of us sphinx? It is, it seems, a rendezvous of questions and question-marks. And, would you believe it, it has finally almost come to seem to us that this problem has never before been posed — that we have been the first to see it, to fix our eye on it, to hazard it? For there is a hazard in it and perhaps there exists no greater hazard --- slug: thoughts/Will tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Will" title: "Will" date: 2024-01-14 permalink: https://aarnphm.xyz/thoughts/Will.html.md --- ## truth. [Nietzsche](https://aarnphm.xyz/thoughts/Will/../../thoughts/Philosophy-and-Nietzsche) critiques the traditional approaches of [truth](https://aarnphm.xyz/thoughts/Will/../../thoughts/Will-to-Truth). Nietzsche argues that philosophical thinking, like all conscious thinking, is driven by “instinctive” psychological forces, underneath which lie “valuations or, more clearly, physiological demands for the preservation of a certain type of life.” What we really value is not truth, but survival, he says. He resists “accustomed value feelings,” and wants to go “beyond good and evil” (201) ## power. - url: thoughts/The-Will-To-Believe - description: believe ## rationality > But this relief seems to be a negative rather than a positive character. Shall we then say that the feeling of rationality is constituted merely by the absence of any feeling of irrationality? 
Just as we feel no particular pleasure when we breathe freely, but a very intense feeling of distress when the respiratory motions are prevented,—so any unobstructed tendency to action discharges itself without the production of much cogitative accompaniment, and any perfectly fluent course of thought awakens but little feeling; but when the movement is inhibited, or when the thought meets with difficulties, we experience distress. It is only when the distress is upon us that we can be said to strive, to crave, or to aspire. > All feeling whatever, in the light of certain recent psychological speculations, seems to depend for its physical condition not on simple discharge of nerve-currents, but on their discharge under arrest, impediment, or resistance. [Lien vers l'original](https://aarnphm.xyz/thoughts/Will/../../thoughts/The-Will-To-Believe#rationality) --- slug: thoughts/XLA tags: - seed - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/XLA" title: "XLA" date: 2022-12-23 permalink: https://aarnphm.xyz/thoughts/XLA.html.md --- - Accelerated Algebra - Developed from Tensorflow ```python def calc(x, y, z): return tf.reduce_sum(x + y * z) ``` Optimise compute graph via single kernel launch vs. launching three separate kernel See also [PJRT](https://aarnphm.xyz/thoughts/XLA/../../thoughts/PJRT) --- slug: thoughts/Zipf's-Law tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/Zipf's-Law" title: "Zipf's Law" date: 2024-12-01 permalink: https://aarnphm.xyz/thoughts/Zipf's-Law.html.md --- Applies to frequency table of word in corpus of [language](https://aarnphm.xyz/thoughts/Zipf's-Law/../../thoughts/Language): $$ \text{word frequency} \propto \frac{1}{\text{word rank}} $$ Empirically: - the most common word occurs approximately twice as often as the next common one, three times as often as the third most common, and so on. also known in _Zipf-Mandelbrot’s_ law: $$ \begin{aligned} \text{frequency} &\propto \frac{1}{(\text{rank} + b)^a} \\[8pt] &\because a,b: \text{fitted parameters with } a \approx 1 \text{ and } b \approx 2.7 \end{aligned} $$ ## definition > [!math] Zipf distribution > > the distribution on $N$ elements assign to element of rank $k$ (counting from 1) the probability: > > $$ > \begin{aligned} f(k;N) &= \begin{cases} \frac{1}{H_N} \frac{1}{k}, & \text{if } 1 \leq k \leq N, \\ 0, & \text{if } k < 1 \text{ or } N < k. \end{cases} \\[12pt] &\because H_N \equiv \sum_{k=1}^{N} \frac{1}{k}. 
(\text{normalisation constant}) \end{aligned} > $$ --- slug: thoughts/action-theory tags: - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/action-theory" title: "action theory" date: 2024-02-22 permalink: https://aarnphm.xyz/thoughts/action-theory.html.md --- There is a huge difference between activity and passivity --- slug: thoughts/aesthetic-value tags: - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/aesthetic-value" title: "aesthetic value" date: 2024-01-30 permalink: https://aarnphm.xyz/thoughts/aesthetic-value.html.md --- Also known as [taste](https://aarnphm.xyz/thoughts/aesthetic-value/../../thoughts/taste), under the scope of [value](https://aarnphm.xyz/thoughts/aesthetic-value/../../thoughts/Value) [Source](https://plato.stanford.edu/entries/aesthetic-concept/) > - whether artworks are necessarily aesthetic objects; > - how to square the allegedly perceptual basis of aesthetic judgments with the fact that we give [reasons](https://aarnphm.xyz/thoughts/aesthetic-value/../../thoughts/reason) in support of them; > - how best to capture the elusive contrast between an aesthetic attitude and a practical one; > - whether to define aesthetic experience according to its phenomenological or [representational](https://aarnphm.xyz/thoughts/aesthetic-value/../../thoughts/representations) content; > - how best to understand the relation between aesthetic value and aesthetic experience ## beauty --- slug: thoughts/algebraic-geometry tags: - math - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/algebraic-geometry" title: "Algebraic geometry" date: 2024-05-22 permalink: https://aarnphm.xyz/thoughts/algebraic-geometry.html.md --- See also [git source](https://github.com/stacks/stacks-project) and [web view](https://stacks.math.columbia.edu/) --- slug: thoughts/atelier-with-friends/dundurn tags: - menu description: "atelier with friends deux - orangeville" title: "dundurn." date: 2024-03-23 permalink: https://aarnphm.xyz/thoughts/atelier-with-friends/dundurn.html.md --- ## _entrée._ ### Soupe à l’Oignon Gratinée oignons caramélisés, bouillon de bœuf, gruyère, baguette. ## _plat principal._ ### [poissons.](https://aarnphm.xyz/thoughts/atelier-with-friends/dundurn/../../../../thoughts/atelier-with-friends/images/dundurn-1.webp) flétan, sauce au beurre citronné, carottes anciennes rôties, purée de carottes. ## _dessert._ ### tiramisu. espresso, mascarpone, biscuits à la cuillère, cacao. --- slug: thoughts/atelier-with-friends/index tags: - menu - evergreen description: "resconstructed source of https://aarnphm.xyz/thoughts/atelier-with-friends/index" title: "atelier with friends." date: 2024-03-07 permalink: https://aarnphm.xyz/thoughts/atelier-with-friends/index.html.md --- Somewhat impromptu supper club hosted by yours truly. See also [dishes](https://aarnphm.xyz/thoughts/atelier-with-friends/index/../../../../thoughts/Dishes) for a comprehensive repertoire. --- slug: thoughts/atelier-with-friends/orangeville tags: - menu description: "atelier with friends uno - orangeville" title: "orangeville." 
date: 2024-03-08 permalink: https://aarnphm.xyz/thoughts/atelier-with-friends/orangeville.html.md --- ## _pasta._ ### [Uovo la Raviolo](https://aarnphm.xyz/thoughts/atelier-with-friends/orangeville/../../../../thoughts/atelier-with-friends/images/orangeville-1.webp) uovo, beurre noisette, salvia, parmigiano reggiano, ricotta ripieno, noce moscata ### [Pomodori alla fetuccine](https://aarnphm.xyz/thoughts/atelier-with-friends/orangeville/../../../../thoughts/atelier-with-friends/images/orangeville-2.webp) bucatini, marinara, olio oliva ### Aglio e Olio bucatini, olio oliva, aglio ### Pesto alla bucatini bucatini, pesto, pepe ## _salsa._ ### Marinara pomodori, cipolla, basilico, aglio confit, fiocchi di peperoncino, origano, aceto di vino bianco. ### Pesto basilico, olio extra vergine di oliva, pinoli tostati, parmigiano reggiano, aglio tritato. --- slug: thoughts/attractor tags: - math - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/attractor" title: "Attractor" date: 2024-03-25 permalink: https://aarnphm.xyz/thoughts/attractor.html.md --- A set of points described by a dynamical system. Some exhibits [chaotic](https://aarnphm.xyz/thoughts/attractor/../../thoughts/Chaos) behaviour, see also [Paul Bourke’s work](https://paulbourke.net/fractals/) Often create visually appealing patterns, but its applications range from physics to biology: how we understand weather patterns, bird migration patterns, quantum phenomena. --- slug: thoughts/being tags: - seed - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/being" title: "being." date: 2024-06-12 permalink: https://aarnphm.xyz/thoughts/being.html.md --- What is being as a part of [epistemology](https://aarnphm.xyz/thoughts/being/../../thoughts/Epistemology)? ### why do we practice art? Practice anything, no matter how well or how bad, we practice art as a act of becoming, for our soul to grow. --- slug: thoughts/composition tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/composition" title: "composition" date: 2024-03-09 permalink: https://aarnphm.xyz/thoughts/composition.html.md --- How we combine elements to make a comprehensive model. Uses [Color](https://aarnphm.xyz/thoughts/composition/../../thoughts/Color) --- slug: thoughts/computational-poem tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/computational-poem" title: "computational poem" date: 2024-10-11 permalink: https://aarnphm.xyz/thoughts/computational-poem.html.md --- Workshop by [Alicia Guo](https://www.aliciaguo.com/) See also [thoughts/code/poem.js](https://cdn.aarnphm.xyz/assets/thoughts/code/poem.js) ## text generation with grammars So what shapes languages? Grammars do. ## context-free grammars --- slug: thoughts/confirmation-bias tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/confirmation-bias" title: "confirmation bias" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/confirmation-bias.html.md --- --- slug: thoughts/constrained-decoding tags: - ml - proposal description: "structured generations in vLLM a la carte" title: "constrained decoding" date: 2024-11-18 permalink: https://aarnphm.xyz/thoughts/constrained-decoding.html.md --- The following document describes and summarizes existing works in vLLM to improve general guided decoding performance. [^performance] This design will largely affect how `logit_processor` are currently being handle within the vLLM architecture. 
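For context on the term: a logit processor is essentially a callable that rewrites the next-token logits before sampling; for guided decoding it masks out every token the grammar/FSM disallows. A minimal sketch of the idea (class and function names are made up for illustration, this is not vLLM's actual interface):

```python
import torch

class MaskingLogitsProcessor:
    """illustrative only: push disallowed tokens to -inf so softmax assigns them ~0 probability"""

    def __init__(self, allowed_token_ids_fn):
        # callable mapping the token ids generated so far -> set of allowed next token ids
        self.allowed_token_ids_fn = allowed_token_ids_fn

    def __call__(self, generated_token_ids, logits: torch.Tensor) -> torch.Tensor:
        allowed = list(self.allowed_token_ids_fn(generated_token_ids))
        mask = torch.full_like(logits, float("-inf"))
        mask[allowed] = 0.0
        return logits + mask

# toy usage: pretend the current FSM state only allows token ids 3, 7, 9
processor = MaskingLogitsProcessor(lambda _ids: {3, 7, 9})
masked = processor([1, 5, 2], torch.randn(32_000))
assert torch.isfinite(masked[[3, 7, 9]]).all() and torch.isinf(masked[0])
```

The performance problem discussed below comes from doing exactly this kind of masking statefully, row-by-row and per-request, on the hot path.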
Main mega thread: [vllm-project/vllm#5423](https://github.com/vllm-project/vllm/issues/5423) Goal: - Improve general TPS when using guided decoding. - Standardize logit processor interface [^samplingpr] - separate compute\_logits and preparing logits into two separate steps Orthogonal, but still goals: - [vllm-project/vllm#5006](https://github.com/vllm-project/vllm/pull/5006) - Logit processor plugins, similar to how vLLM plugins are handled. [vllm-project/vllm#4769](https://github.com/vllm-project/vllm/pull/4769) - Scope: `logit_processor`, `sampling controller interface` ## background ![flow](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/images/vllm/pre-optimized-logit-processor-handling.webp) flow _reference: [vllm-project/vllm#5329](https://github.com/vllm-project/vllm/pull/5329)_ Currently, generations with FSM is super slow, even with warmup steps to initialize given FSM. This behaviour is further exemplified when running with context longer than 4096 tokens. Additionally, all outlines logit processors are considered stateful, which slows down the model executor, given in V0 logit processors are applied [row-by-row blocking](https://github.com/vllm-project/vllm/blob/1ea291a4173a82c537ab42487e23375be4926d30/vllm/model_executor/layers/logits_processor.py#L143) Thus comparing to sglang, vLLM v0 is currently not up to par. ## plan Implement structured decoding from scheduler, given that we can compute token bitmask and broadcast towards GPU workers - p1: Implement [jump-ahead decoding](https://lmsys.org/blog/2024-02-05-compressed-fsm/#method-1-finite-state-machine-based) > [**@cadedaniel**](https://github.com/cadedaniel): “tree scoring in \[spec decode] could use the same API as multi-path jump decoding.” > [!question] How should we handle FSM per requests? > > - Currently, users can specify different schemas per request, which means the FSM will be compiled per request. This is suboptimal because it slows down general TTFT. > - For most use cases, we should assume JSON schema similar to how the system prompt is currently being handled (pass during server init) > [!question] Why should we follow the plugins system? > > - If going with the best options, then what is the reasoning behind supporting different backends? > - Agree for extensibility, but seems to add additional overhead. --- ## appendix. The following includes background information about guided generations. ### batched constrained decoding using pushdown automaton Implemented in [mlc-ai/xgrammar](https://github.com/mlc-ai/xgrammar) > [!quote] Quote > > calculate adaptive token bit-mask per batch > [!tip] IMPORTANT > > operating on string level, not `token_id` `GrammarMatcher` ⇒ FSM in xgrammar #### questions - byte-level automaton overhead of token\_id ⇒ string Token for context-independent tokens vs dependent tokens within the generation masks async pre-compile synchronize apply mask for CPU → GPU? How do we apply said masks to GPU block? Zero-overhead generations? > [!question] worst-case scenario for grammar compilation? > > mask gen overhead: 36 $\mu s$ > [!question] time linearly increase for batch size? > > parallelize for compilation. > [!question] do we need to parallelize on vLLM? > > no, xgrammar parallelize it, with `pthread` > [!question] shape of masks? > > bitmask, tensors of vocab size ⇒ concat with recast ⇒ GPU > [!question] supported tokenizers? 
> > GLM yet to be supported (Nov 22nd) > [!question] Given that detokenizer is in a separate process with vLLM, then can we stops duplicating this process? > > Currently with `xgrammar`: detokenizer included in mask generations. > > token\_id ⇒ tokens #### future plans - Function calling support - Support more grammar (CFG, Python grammar) ### compressed FSM for jump-ahead tokens. Implemented in ([Zheng et al., 2024](#bib-zheng2024sglangefficientexecutionstructured)) #### Method 1: [FSM](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/constrained-decoding#guided-generations-with-fsm)-based decoding - intuition: Using FSM ([Willard & Louf, 2023](#bib-willard2023efficientguidedgenerationlarge)) to guide generations by increasing logit bias for tokens that conform to given JSON schema. This allows us to track the current state during decoding and filter out invalid tokens by applying logit bias to the output. ![](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/images/vllm/constrained-json-fsm.webp) - limitation: we can see that given construction of FSM requires token-level access, it can only transition the state by only _one_ token at a time, resulting in slow decoding. #### Method 2: Interleaved-based - intuition: breaks down JSON schemas, each containing either a chunk prefill part or constrained decoding part. They are then executed interleaved by inference system. Faster than per-token decoding given that chunked prefill components can process multiple tokens per forward pass See also using llama.cpp as backend. - limitation: - interleaved-based require custom syntax, making it less expressive compared to regex. - struggles to deal with tokenization boundaries due to conflicts between decode and chunked prefill segments. - frequent communications between interpreter and back-end adds additional overhead. #### **_Method 3: Jump-Forward Decoding with compressed FSM_** ![](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/images/vllm/jump-forward-decoding-fsm.webp) > [!tip] tokenization boundary handling > > During decoding, it is preferred to combine multiple characters into a single tokens. > > For example, when decoding `"Hello"` in context of JSON decoding, LLM might output the following token `"`, `He`, `llo`, `",` > > This may cause some strange behaviour if we combine the last `"` with `,` (this regex `"[\w\d\s]*"` with the last `,` will lead to endless decoding because this token `",` is not valid even if the LM wants to stop.) Fix: - implement _re-tokenization_ mechanism during jump-forward phase (append string instead of the tokens, followed with re-tokenization of the entire text) $\to$ add approximately 4% of overhead - use a comprehensive regex to guide the decoding phase, instead of employing multiple concatenated regex [^coalescence] ### Coalescence intuition: Instead of expanding to $n$ state, we can compress certain chunks into one state to reduce the size of said FSM. 
![](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/images/vllm/part-of-json-fsm.webp) _figure 1: initial FSM state_ ![](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/images/vllm/compressed-fsm-json.webp) _figure 2: compressed FSM state_ A way to adapt character regex to work with tokens in `outlines`: ```python import outlines.fsm as fsm from outlines.fsm.regex import make_deterministic_fsm, create_fsm_index_tokenizer new_fsm, _ = make_deterministic_fsm(fsm) idx, _ = create_fsm_index_tokenizer(new_fsm, tokenizer) ``` ```mermaid stateDiagram-v2 [*] --> InputPrompt: Start state "input prompt" as InputPrompt state "next-token probability distribution" as GetProb state "valid tokens" as ListTokens { [*] --> CheckTransitions CheckTransitions --> FilterTokens: Get index[0].keys() FilterTokens --> [*] } state "Sample Token" as SampleToken state "Update FSM State" as UpdateState InputPrompt --> GetProb: "model.generate" GetProb --> ListTokens: Get next-token distribution ListTokens --> SampleToken: Use filtered token list SampleToken --> UpdateState: Selected token X UpdateState --> [*]: new_state = index[0]["X"] ``` ```python idx_with_tokens = { state: {tokenizer.tokenizer.decode([key]): value for key, value in transitions.items()} for state, transitions in idx.items() } ``` > [!note]- example > > ```mermaid > stateDiagram-v2 > direction LR > 0 --> 2: n > 0 --> 1: t > 1 --> 2: a > 2 --> 4: na > 2 --> 3: a > 3 --> 5: am > 4 --> 6: me > 5 --> 6: me > 2 --> 6: name > 6 --> 7: e > 6 --> 8: c > 7 --> 9: p > 8 --> 9: p > 9 --> 11: Paul > 9 --> 12: Pa > 9 --> 10: Jo > 11 --> 13: aul > 12 --> 14: ul > 10 --> 26: o > 26 --> 27: h > 27 --> 14: n > 13 --> 14: l > 14 --> 16: s > 14 --> 15: s > 15 --> 17: s > 16 --> 17: s > 17 --> 18: a > 17 --> 19: ag > 18 --> 20: ge > 19 --> 20: e > 20 --> 21: 30 > 20 --> 22: 20 > 21 --> 24: 2 > 22 --> 24: 2 > 22 --> 23: 3 > 24 --> 25: 0 > 25 --> [*] > ``` _note:_ each state of FSM represents a forward pass to the LM. In vanilla generation, this is essentially necessary. Thus there is no added overhead of FSM for controlling the generated outputs. From state 2-6, we observer that there are eight different paths to get the same generations of `name`. We probably don’t need to do this, given that it will all give us result `name` But suffice to say, we can hijack this behaviour to accelerate generations by append either of the following tokens **word** to currently generated sequence: - \[”name”] - \[”n”, “a”, “m”, “e”] - \[”na”, “m”, “e”] - \[”nam”, “e”] - \[”n”, “am”, “e”] - \[”n”, “ame”] - \[”na”, “me”] - \[”n”, “a”, “me”] A simplified index can be shown as: ```python simplified_index = { 0: {'{"': 2}, 2: {"name": 6}, 6: {'":"': 9}, 9: {'Paul': 14, 'John': 14}, 14: {'","': 17}, 17: {'age': 20}, 20: {'":': 22}, 22: {'20': 24, '30': 24}, 24: {'}': 25}, } ``` That’s at least a 5x speedup over structured generations, given that out of the 9 tokens, two states are single-state transitions. Therefore we only need to call the model _twice_!! > [!tip]- difference in sampling distribution > > All these paths lead to the same string and the same speedup, however they lead to potentially very different states for the LLM when it reaches state 6. That is, the strings are the same, but each path leads to a different conditional probability distribution in stage 6. > > ![](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/images/vllm/json-difference-in-sampling-distribution.webp) ### Guided generations with FSM. 
([Willard & Louf, 2023](#bib-willard2023efficientguidedgenerationlarge)), implemented at _assumption: we are building against [autoregressive transformers models](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/Autoregressive-models)_ - Let $\mathcal{F} \subset \mathcal{P}(\mathcal{V})$, where $\mathcal{P}$ is the power set operator, be subset of multi-token string that ends with tokens $\text{EOS} \in \mathcal{V}$. - Text generation tasks is to draw samples from $\mathcal{F}$ Notable _sampling_ methods include greedy decoding (generate tokens recursively with highest probability tokens), beam search (but using heuristic to find the mode of distribution) [^smc] A pseudocode for sampling procedure is as follow: ```pseudo \begin{algorithm} \caption{LLM token sampling} \begin{algorithmic} \Function{sample}{$L$} \State $s \gets ()$ \For{$i \gets 1, L$} \State $\alpha \gets \text{LM}(s, \theta)$ \State Sample $s \sim \text{Categorical}(\alpha)$ \If{$s = \text{EOS}$} \State \textbf{break} \EndIf \State $s \gets \text{append}(s, s)$ \EndFor \State \Return $s$ \EndFunction \end{algorithmic} \end{algorithm} ``` Given that we are dealing with finite discrete distribution, we can then compute an un-normalized conditional distribution by applying a boolean mask $m: \mathcal{P}(\mathcal{V}) \to \{0,1\}^N$, which restricts the support of original distribution: $$ \begin{aligned} \alpha &= \text{LM}(\tilde{S_t}, \theta) \\ \tilde{\alpha} &= m(\tilde{S_t}) \odot \alpha \\ \tilde{s_{t+1}} &\approx \text{Categorial}(\tilde{\alpha}) \end{aligned} $$ > [!math] augmentation upon sampling algorithm > > ```pseudo > \begin{algorithm} > \caption{token sampling with masking} > \begin{algorithmic} > \Function{sample}{$L$} > \State $s \gets ()$ > \For{$i \gets 1, L$} > \State $\alpha \gets \text{LM}(s, \theta)$ > \State Construct the mask m($s$) > \State $\tilde{\alpha} \gets m \odot \alpha$ > \State Sample $\tilde{s} \sim \text{Categorical}(\tilde{\alpha})$ > \If{$\tilde{s} = \text{EOS}$} > \State \textbf{break} > \EndIf > \State $s \gets \text{append}(s, \tilde{s})$ > \EndFor > \State \Return $s$ > \EndFunction > \end{algorithmic} > \end{algorithm} > ``` > [!tip] finite automaton > > We define a _finite-state machine_, given by $(Q, \Sigma , \delta, q_0, F)$ [^automaton-definition] where character comprising the strings in $\mathcal{V}$ are drawn from $\Sigma$, i.e: $\mathcal{V} \in \mathcal{P}(\Sigma)$ > > ![](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/images/vllm/fsm-iterative-generations.webp) > _FSM making for regular expression `([0-9]*)?\.?[0-9]*`_ > > > [!note]- example illustration > > > > For simplicity, let the vocabulary $\mathcal{V}$ consists of strings $\{A, ., 42, .2, 1\}$ > > > > - generations start: FSM in state 0, so it masks “A”, since it wouldn’t accepted by the FSM. Then we only sample ”.”, “42”, “.2”, “1” in this case > > - if we sample “.2” then we advance the FSM to state 3. In this case. only “42” and “1” are valid completions, so we mask other values before sampling. If we sample “1” instead, then we advance FSM to state 1, in which case ”.”, “.42”, “.2”, and “1” are valid completions > [!tip] determinism > > Looping through the vocabulary is still the biggest issue. For that, we preprocess the vocabulary using Regex’s FSM and build a index. Thus a proceeding for producing matches starting at any point in the FSM is required. 
We define finding sub-sequences of FSM $M$ that accept string $v$ as follow: ```pseudo \begin{algorithm} \caption{Find sub-sequences of the FSM $M$ that accept the string $v$} \begin{algorithmic} \Function{FindSubSequences}{$M, v$} \State $M = (Q, \Sigma, \delta, q_0, F)$ \State $\texttt{res} \gets ()$ \For{$r \in \delta^{-1}(\cdot, v_0)$} \Comment{$\text{ Loop through states that read } v_0$} \State $p \gets (r)$ \For{$i \gets 1, |v| - 1$} \Comment{$\text{ Walk the FSM}$} \If{$\delta(r, v_i) = \emptyset$} \Comment{$\text{ The FSM does not read } v_i$} \State $p \gets ()$ \State \textbf{break} \Comment{$\text{ Stop walking and try the next start state}$} \EndIf \State $r \gets \delta(r, v_i)$ \State $p \gets \text{append}(p, r)$ \EndFor \State $\texttt{res} \gets \text{append}(\texttt{res}, p)$ \EndFor \State \Return $\texttt{res}$ \EndFunction \end{algorithmic} \end{algorithm} ``` We can then define construction of $\sigma$ ```pseudo \begin{algorithm} \caption{Construct a map from FSM states to subsets of $\mathcal{V}$} \begin{algorithmic} \Function{MapStatesToVocab}{$M, \mathcal{V}$} \State $M = (Q, \Sigma, \delta, q_0, F)$ \State Initialize the map $\sigma$ with empty sets for each element in $Q$ \For{$v \in \mathcal{V}$} \Comment{$\text{Loop through the vocabulary}$} \State $Z \gets \text{find\_sub\_sequences}(M, v)$ \For{$z \in Z$} \Comment{$\text{Loop through state sequences accepting } v$} \State $\sigma(z_0) \gets \sigma(z_0) \cup v$ \EndFor \EndFor \State \Return $\sigma$ \EndFunction \end{algorithmic} \end{algorithm} ``` ## Bibliographie - Lew, A. K., Zhi-Xuan, T., Grand, G., & Mansinghka, V. K. (2023). _Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs_. arXiv preprint arXiv:2306.03081 [\[arxiv\]](https://arxiv.org/abs/2306.03081) - Willard, B. T., & Louf, R. (2023). _Efficient Guided Generation for Large Language Models_. arXiv preprint arXiv:2307.09702 [\[arxiv\]](https://arxiv.org/abs/2307.09702) - Zheng, L., Yin, L., Xie, Z., Sun, C., Huang, J., Yu, C. H., Cao, S., Kozyrakis, C., Stoica, I., Gonzalez, J. E., Barrett, C., & Sheng, Y. (2024). _SGLang: Efficient Execution of Structured Language Model Programs_. arXiv preprint arXiv:2312.07104 [\[arxiv\]](https://arxiv.org/abs/2312.07104) [^performance]: Benchmark script can be found at [vllm-project/vllm#10046](https://github.com/vllm-project/vllm/pull/10046). Current RFC [vllm-project/vllm#5423](https://github.com/vllm-project/vllm/issues/5423) [^samplingpr]: [vllm-project/vllm#6273](https://github.com/vllm-project/vllm/pull/6273) proposed a sampling controller interface, but [**@cadedaniel**](https://github.com/cadedaniel) shares some [concerns](https://github.com/vllm-project/vllm/pull/6273#issuecomment-2243654991) wrt fast-forward tokens [^coalescence]: this phenomena is also known as [coalescence](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/constrained-decoding#coalescence) in structured generations, where it exploit deterministic structures in desired outputs to skip expensive forward pass [^smc]: ([Lew et al., 2023](#bib-lew2023sequentialmontecarlosteering)) recently proposes a sequential [Monte Carlo steering](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/Monte-Carlo). The idea is to classify causal generations as a _posteriori inference_ problem in a class of discrete probabilistic sequence models. 
See also [Feynman-Kac transformers models](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/Transformers#feynman-kac) [^automaton-definition]: [finite state machine](https://aarnphm.xyz/thoughts/constrained-decoding/../../thoughts/university/twenty-three-twenty-four/sfwr-2fa3/DFA) - $Q$ is a finite set of states - $\Sigma$ is a finite alphabet - $\delta: Q \times \Sigma \to Q$ is the transition function - $q_0 \in Q$ is the start state - $F \subseteq Q$ is the set of all accepted states. --- slug: thoughts/cryptography tags: - technical description: "resconstructed source of https://aarnphm.xyz/thoughts/cryptography" title: "cryptography" date: 2024-02-08 permalink: https://aarnphm.xyz/thoughts/cryptography.html.md --- ### functions. See also [Merkle DAG](https://aarnphm.xyz/thoughts/cryptography/../../thoughts/Merkle-DAG) --- slug: thoughts/cyanotype tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/cyanotype" title: "cyanotype" date: 2024-10-03 permalink: https://aarnphm.xyz/thoughts/cyanotype.html.md --- > slow-reacting, economical photographic printing formulation In context of writing, similar to what telescopic writing is. --- slug: thoughts/data tags: - seed - pattern description: "resconstructed source of https://aarnphm.xyz/thoughts/data" title: "data" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/data.html.md --- Representation of information in a formalised manner suitable for communication, interpretation, or processing by humans or by automatic means. ⇒ semanticity Logistic regression: $$ \frac{1}{1 + e^{-(x - \mu)/s}} $$ - schema + relational. ## theory See also [database](https://aarnphm.xyz/thoughts/data/../../tags/sfwr3db3) ## types. nominal data - qualitative data - mutually exclusive - cannot be ranked - $= \neq \in \notin$ ordinal data - represents categories - $= \neq \in \notin > <$ time-series data (interval) - no true zero - $= \neq > < + -$ ratio data - $= \neq > < + - \times \%$ ## dimensionality --- slug: thoughts/deep-learning tags: - ml - framework description: "resconstructed source of https://aarnphm.xyz/thoughts/deep-learning" title: "deep learning" date: 2024-01-11 permalink: https://aarnphm.xyz/thoughts/deep-learning.html.md --- See also: [The Little Book of Deep Learning](https://aarnphm.xyz/thoughts/deep-learning/../../books#2024) ([pdf](https://fleuret.org/public/lbdl.pdf) or [lectures](https://fleuret.org/dlc/)) or this [lecture series at CMU](https://dlsyscourse.org/lectures/) - [PyTorch](https://aarnphm.xyz/thoughts/deep-learning/../../thoughts/PyTorch) - [Jax](https://aarnphm.xyz/thoughts/deep-learning/../../thoughts/Jax): from [autograd](https://github.com/HIPS/autograd) project, by pretty much the same core team --- slug: thoughts/design tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/design" title: "design" date: 2024-03-09 permalink: https://aarnphm.xyz/thoughts/design.html.md --- ### what? --- slug: thoughts/desire tags: - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/desire" title: "Desire" date: 2024-02-08 permalink: https://aarnphm.xyz/thoughts/desire.html.md --- --- slug: thoughts/dialectics tags: - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/dialectics" title: "dialectics" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/dialectics.html.md --- often involves some sort of contradictory between opposing sides. 
### [Hegel](https://aarnphm.xyz/thoughts/dialectics/../../thoughts/Hegel)’s dialectics The opposing sides are dependent on the topic being discussed. In [Phenomenology of Spirit](https://aarnphm.xyz/thoughts/dialectics/../../thoughts/Hegel#phenomenology-of-spirit), which presents his [epistemology](https://aarnphm.xyz/thoughts/dialectics/../../thoughts/Epistemology), the “opposing sides” are different definitions of consciousness and of the object that consciousness is aware of or claims to know. In his work on [logic](https://aarnphm.xyz/thoughts/dialectics/../../thoughts/logic), the opposing sides are logical concepts that are opposed to one another. --- slug: thoughts/displacement tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/displacement" title: "displacement" date: 2024-01-08 permalink: https://aarnphm.xyz/thoughts/displacement.html.md --- Often explored through Graham Greene’s works. --- slug: thoughts/distraction tags: - seed - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/distraction" title: "distraction" date: 2024-03-18 permalink: https://aarnphm.xyz/thoughts/distraction.html.md --- > i think we forget that a core part of the human experience is to create and be creative\ > \ > we create all the time, even in the most mundane ways\ > \ > but creativity isnt about the act of \*producing\* something - its about reaching a state of awareness that allows you to filter out the… [pic.twitter.com/A7PML2zLtt](https://t.co/A7PML2zLtt) > > — harpriya (@harpriiya) [18 mars 2024](https://twitter.com/harpriiya/status/1769532246674022407?ref_src=twsrc%5Etfw) --- slug: thoughts/education tags: - pattern description: "resconstructed source of https://aarnphm.xyz/thoughts/education" title: "education" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/education.html.md --- See more on [the extension](https://aarnphm.xyz/thoughts/education/../../posts/education) ## system The current[^1] education system is not designed to inspire curiosity, and it will stay this way unless the definitions of success and quality move toward more holistic measures. University should be a place for you to think, not to always be right. It should encourage a form of [intellectual playfulness](https://aarnphm.xyz/thoughts/education/../../thoughts/play) and [agency](https://aarnphm.xyz/thoughts/education/../../thoughts/Agency) to explore. ## teaching I do think that professors should use more primary sources and fewer secondary ones. Secondary sources curate and [compress](https://aarnphm.xyz/thoughts/education/../../thoughts/reductionism) the information being given. Compression can lead to [confirmation bias](https://aarnphm.xyz/thoughts/education/../../thoughts/confirmation-bias), but saturation of information also overloads students. ### shortification/tiktok-fication of information > \# on shortification of "learning"\ > \ > There are a lot of videos on YouTube/TikTok etc. that give the appearance of education, but if you look closely they are really just entertainment. This is very convenient for everyone involved : the people watching enjoy thinking they are… > > — Andrej Karpathy (@karpathy) [10 février 2024](https://twitter.com/karpathy/status/1756380066580455557?ref_src=twsrc%5Etfw) The idea of learning is that it is supposed to be mentally challenging, not fun and easy.
In the process of shortification, we are losing the depth of information, as does any form of [compression](https://aarnphm.xyz/thoughts/education/../../thoughts/Compression). Similar to how [LLMs](https://aarnphm.xyz/thoughts/education/../../thoughts/LLMs) is being trained on today. > Learning is not supposed to be fun. It doesn’t have to be actively not fun either, but the primary feeling should be that of effort. It should look a lot less like that “10 minute full body” workout from your local digital media creator and a lot more like a serious session at the gym. You want the mental equivalent of sweating. It’s not that the quickie doesn’t do anything, it’s just that it is wildly suboptimal if you actually care to learn. The process of learning should be enduring, but rewarding. It should be a process of internalizing the concept, and practice to thinking coherently, similar to how we [write](https://aarnphm.xyz/thoughts/education/../../thoughts/writing). ### [Constructionist](https://aarnphm.xyz/thoughts/education/../../thoughts/Constructionist) critique[^2]: Too many tools and too much space: Large space should start small, and widen, rather than having everything readily available [^1]: [WEF on relevance of education system](https://www.weforum.org/agenda/2020/04/our-education-system-is-losing-relevance-heres-how-to-update-it/), written in April 13rd 2020 [^2]: See [here](https://saskschoolboards.ca/wp-content/uploads/97-07.htm#:~:text=Constructivist%20teaching%20is%20based%20on,rather%20than%20passively%20receiving%20information.) --- slug: thoughts/effective-procedure tags: - math description: "resconstructed source of https://aarnphm.xyz/thoughts/effective-procedure" title: "effective procedure" date: 2024-10-08 permalink: https://aarnphm.xyz/thoughts/effective-procedure.html.md --- In [logic](https://aarnphm.xyz/thoughts/effective-procedure/../../thoughts/logic), an effective procedure is a procedure for solving problem by any intuitively ‘effective’ means from a specific class. ## formation rules for propositional calculus (wff: well-formed formula) $$ \begin{aligned} \text{FR1} &. \text{ A variable standing alone is a wff} \\ \text{FR2} &. \text{ If } \alpha \text{ is a wff, so is } \neg \alpha \\ \text{FR3} &. \text{ If } \alpha \text{ and } \beta \text{ are wffs, then } (\alpha \cdot \beta ), (\alpha \space \beta), (\alpha \vee \beta ), (\alpha \supset \beta), \text{ and } (\alpha \equiv \beta) \text{ are wffs} \end{aligned} $$ --- slug: thoughts/emergent-behaviour tags: - seed - psychology description: "resconstructed source of https://aarnphm.xyz/thoughts/emergent-behaviour" title: "emergent behaviour" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/emergent-behaviour.html.md --- > When a complex entity exhibits properties, or behaviours that its parts do not have on their own. Or how can complex properties emerge from simple rules. We observe this from: - [LLMs](https://aarnphm.xyz/thoughts/emergent-behaviour/../../thoughts/LLMs), speculations at most - Ants colonies - mold simulations In context of single agent within multi-agent systems, is it due to the rules itself ([reductionist](https://aarnphm.xyz/thoughts/emergent-behaviour/../../thoughts/reductionism)) or additional factors are involved here? 
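A tiny illustration of the “simple rules, complex behaviour” point (not from the original note): an elementary cellular automaton whose update rule only looks at a cell and its two neighbours, yet rule 110 produces intricate, long-lived structures and is even Turing-complete.

```python
# elementary cellular automaton, rule 110: each cell updates from (left, self, right) only
RULE = 110
rule_bits = [(RULE >> i) & 1 for i in range(8)]

def step(cells):
    n = len(cells)
    return [
        rule_bits[(cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n]]
        for i in range(n)
    ]

cells = [0] * 79 + [1]  # a single live cell
for _ in range(40):
    print("".join("█" if c else " " for c in cells))
    cells = step(cells)
```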
--- slug: thoughts/ethics tags: - philosophy - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/ethics" title: "ethics" date: 2024-03-05 permalink: https://aarnphm.xyz/thoughts/ethics.html.md --- Closely connected to [value](https://aarnphm.xyz/thoughts/ethics/../../thoughts/Value) theory, or [moral](https://aarnphm.xyz/thoughts/ethics/../../thoughts/moral) philosophy. [Kantian](https://aarnphm.xyz/thoughts/ethics/../../thoughts/Philosophy-and-Kant) ethics presupposes that there is a universal moral law that applies to all rational beings; his deontological framework is based on the “categorical imperative”. This is different from [Mill’s utilitarianism](https://aarnphm.xyz/thoughts/ethics/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill), which holds that actions are right insofar as they promote happiness and wrong insofar as they produce the _reverse_ of happiness. [Nietzsche](https://aarnphm.xyz/thoughts/ethics/../../thoughts/Philosophy-and-Nietzsche) critiqued conventional moral theories and argued for a reevaluation of [value](https://aarnphm.xyz/thoughts/ethics/../../thoughts/Value). He believed that traditional morality stifled the full potential of human excellence, as seen in _Beyond Good and Evil_ and “On the Genealogy of Morals”. Arguments in ethics are based on the principles of “good” versus “evil”. What is defined as “good” and “evil”? Is a human whose ideology falls outside of the [Overton Window](https://aarnphm.xyz/thoughts/ethics/../../thoughts/Overton-Window) considered “evil”? That’s why it’s important to understand our [alignment](https://aarnphm.xyz/thoughts/ethics/../../thoughts/Alignment) through anthropological work, so that we don’t repeat history. ## normative. ### consequentialism - Utilitarianism See also [John Stuart Mill](https://aarnphm.xyz/thoughts/ethics/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill) Locke: an action is acceptable if it respects the human rights of everyone involved - Common good ### deontology Duty ethics ### virtue ### care ## meta-ethics. --- slug: thoughts/fashion tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/fashion" title: "fashion." date: 2024-02-19 permalink: https://aarnphm.xyz/thoughts/fashion.html.md --- Fashion, rather a hobby than a need (probably why [expenses](https://aarnphm.xyz/thoughts/fashion/../../thoughts/Expenses) are high, but worth it). It is an art, a form of self-expression and self-care, a way for one to present themselves to the world. > My mantra: “Quality over quantity.” Get a few good pieces that will last you a long time. I do follow trends and fashion shows. A mix of smart casual and streetwear is my comfort zone. Keep it simple and minimalistic. **Less is always more**. ### gender dynamics. See also: [Fashion and the Homospectatorial Look](https://aarnphm.xyz/thoughts/fashion/../../thoughts/papers/Fashion-and-the-Homospectatorial-Look.pdf) and [this video](https://www.youtube.com/watch?v=DA2PqBAyGqI\&t=454s\&ab_channel=oliSUNvia) Fuss’s arguments suggest that contemporary fashion photography does not simply cater to a heterosexual male gaze but also tacitly produces a gaze that, while regulating homosexual desire, provides opportunities for its expression. She argues that fashion photography often presents women in a manner that is eroticized, which can be seen as catering to a male gaze.
However, this same eroticization can also appeal to women, creating a homospectatorial look in which women view other women through a lens that is both homoerotic and commodified. This plays well into spaces that open up more nuanced and subtle expressions of [desire](https://aarnphm.xyz/thoughts/fashion/../../thoughts/desire) and [identity](https://aarnphm.xyz/thoughts/fashion/../../thoughts/identity). [Quiet luxury](https://aarnphm.xyz/thoughts/fashion/../../thoughts/fashion#quiet-luxury)’s emphasis on minimalism and subtle textures mirrors the homospectatorial look. ### why fashion shows matter, and why they don’t. Similar to math or AI conferences, they are a place for people to show off their work and get inspired. Legendary designers, the likes of Ralph Lauren and Dean and Dan Caten, have over the years pretty much shaped and influenced how our jeans and casual style look. Just look down at the pair of jeans or slim-fit trousers you wear: some of the crests or folded textures were probably inspired by one of these designers. It matters in the sense that it drives the industry, but it also doesn’t matter because you can pretty much get the look, or the highlights of what is trendy, from the internet or social media, via the “shortification” of videos and information. I do care about fashion shows, simply because I appreciate the art and the work that go into them. ### pretentious. Extensions from the book _Pretentiousness: Why It Matters_ by Dan Fox. > Pretentiousness is for anyone who has braved being different, whether that’s making a stand against artistic consensus or running the gauntlet of the last bus home dressed differently from everyone else > Calling a person pretentious can be a way of calling out the trappings and absurdities of power. It’s a way of undermining the authority that they have positioned themselves with. Fashion often gets associated with pretentiousness, as it presents a signal of wealth. Simply by their outlandish message, people often get the impression that those who wear these logo-mania brands have a lot of capital in their possession, _which is usually the case_. In reality, it is the middle-class demographic who purchase these products, those who can indulge themselves without compromising their financial well-being. One shouldn’t be judged by the clothes they wear, but rather by the character they possess. If you feel good in a designer dress, or you worked hard for it, then by all means you should enjoy it to the fullest. However, it often signals flex culture, or is read as new money. It is also a matter of [taste](https://aarnphm.xyz/thoughts/fashion/../../thoughts/taste) and [identity](https://aarnphm.xyz/thoughts/fashion/../../thoughts/identity). ### quiet luxury. > you don't actually love "quiet luxury." you're in love with the idea of inherited wealth, spacious uncluttered homes, and enough free time to pursue your hobbies [pic.twitter.com/OwSn6wWxVP](https://t.co/OwSn6wWxVP) > > — derek guy (@dieworkwear) [16 avril 2023](https://twitter.com/dieworkwear/status/1647662031619895296?ref_src=twsrc%5Etfw) I'm not a huge fan of fast fashion; I’d rather spend a bit more on a good pair of jeans that will last me years than on a cheap pair that will last me a few months. > A few exceptions include Uniqlo and Muji, but I wouldn’t consider them fast fashion, because they are actually high-quality products 😄 Don’t buy into maximalist brands. Overpriced, and the churn rate is high.
> Few exceptions: Tom Ford, Maison Margiela, Saint Laurent --- > Go for quiet luxury, aka timeless pieces The following are few of favourite brands, in no particular order: | Brand | Genre of clothing to get | | --------------------------------------- | ----------------------------------------------------------------------------- | | Brunello Cucinelli | Cashmere, wool, and linen 🤌 😗 | | Manière De Voir | Probably the best black fit tee I’ve ever worn. | | Oak + Fort | Minimalistic, clean, and simple. Also not to be confused with _Frank and Oak_ | | Studio Nicholson | Trousers, and shirts, ong their leroy jacky are amazing. | | COS | Basics, and essentials. | | [Stoffa](https://stoffa.co/pages/store) | Custom made, and tailored, wanna know how to style. | | Sefr | Pretty niche, but some of their beige lazaro shirts are nice. | | Ted Baker | Holy crap that’s half of my closets. Trousers, shirts, suits, cargo, etc. | | Ralph Lauren Polo | Them trench coats are nice, daily driver during winter szn. | | Mansuir Gavriel | Their bags are my fav. | | Olend Backpacks | For the love of backpacks. | | Bellroy | Tote, durable, flexible. | | Loro Piana | If you can afford go for it | | Club Monaco | Trench coats, overcoats, too based. | | Brooks Brothers | Suits on special occasions. | | Arcteryx | Technical wear, performance, gears are awesome. | | Timberland | Utility, quality, style and worth for money. | | Banana Republic | Got their cashmere and merino wool sweaters. They are good. | | Abercrombie & Fitch | Baggy jeans, flannels comfy wear. | | Massimo Dutti | Their leather jackets are nice. | | Sezzane | Their blouses are nice. | --- slug: thoughts/friendship tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/friendship" title: "friendship" date: 2024-06-22 permalink: https://aarnphm.xyz/thoughts/friendship.html.md --- > Heartbreaks are also what you make of them. Relationships teach you how to gently treat someone as one of your own; they also bash and crush your heart, as if the world is crumbling in front of your eyes. But it is okay; relationships are what we deem worth giving meaning to the absurdity of life. ## of pleasure. or utility? > the issue with a majority of San Francisco's culture of authentic relating, circling, cuddle parties, "deep convo" events, is that it’s intimacy without relationship. Closeness without friendship.\ > \ > In other words, porn. It may feel good when you’re doing it, but empty… > > — Patricia Mou (@patriciamou\_) [16 février 2024](https://twitter.com/patriciamou_/status/1758354933521478126?ref_src=twsrc%5Etfw) > [!question]- How does one weave that delicate fabric of trust, when so many are ensnared in their own lives? > > To cultivate trust, we must first turn inward, nurturing the quiet confidence of self-trust before extending our hands to others. Yet, amidst the hustle and haste, how do we anchor our souls in authenticity and leave room for empathy? Can we create a sanctuary for trust to blossom, even in the most turbulent of seas? Trust requires vulnerability - the courage to show up as our authentic selves, to share our hopes and fears without guarantee. When we have faith in each other’s core intentions, even as we stumble and err, we weave a web of trust that can withstand the tempests of misunderstanding. > … because old friends may feel like strangers once substantial time has passed. 
Consistent with this possibility, several of the barriers that participants endorsed when thinking about reaching out to old friends are similar to the barriers that make people reluctant to talk to strangers. _People are surprisingly hesitant to reach out to old friends_ - [Communication Psychology](https://www.nature.com/articles/s44271-024-00075-8) reaching out feels… weird. In this age of keeping tally, being genuine would be the ONLY metric that matters for maintaining a relationship. But can love so alloyed be counted as love at all? If each act of kindness conceals a grasping need, each smile a silent plea, then perhaps the better part of friendship has been lost. We trade in a base currency, a barter of pleasure and utility. The sacred alchemy of selfless affection feels beyond our ken. I yearn for something higher, a way of relating that does not reduce us to mere instruments of the other’s satisfaction. But the way is unclear, the path grown over from neglect. Perhaps most importantly, we must learn to give without strings, without ledger. It is ok to reach out to old friends, or go on that walk every Saturday to meet strangers. To offer our time, our energy, our care - not as a loan to be repaid, but as a gift freely bestowed. To delight in the joy of the other without thought to our own gain. None of this is to lay blame on anyone. We are all, to some degree, caught up in this dance of pursuing something greater than ourselves. But even in such a world, we need not resign ourselves to a life of bartered affections. Still, I hold out hope that true friendship may yet be possible - a meeting of souls, freely given, that seeks the good of the other for their own sweet sake. Even if the world declares it folly, still I dream of a love uncorrupted. ## of mutual caring. > To care about something is generally to find it worthwhile or valuable in some way; caring about one’s friend is no exception. A central difference among the various accounts of mutual caring is the way in which these accounts understand the kind of evaluation implicit therein - [SEP](https://plato.stanford.edu/entries/friendship/) However, people are more afraid of commitment, and they are even more afraid of being hurt. Who wouldn’t be? We have seen too much, borne witness to the cruelties that humans can inflict upon one another. It’s no wonder that hearts grow hesitant, that souls recoil from the prospect of vulnerability. And so we retreat, donning armor forged from fear, shielding ourselves from this tumultuous life. But in our haste to protect ourselves, do we not also rob ourselves of life’s greatest joys? In the end, perhaps the greatest fear is not of commitment or solitude, of judgment or pain. Perhaps what we truly fear is the glorious, terrifying possibility of being seen, of being known in all our imperfect beauty. For to be truly seen is to be vulnerable, and to relinquish control of oneself. And friends will be the ones who are there for you along the way. --- slug: thoughts/game-theory tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/game-theory" title: "game theory" date: 2024-04-12 permalink: https://aarnphm.xyz/thoughts/game-theory.html.md --- The field emerged when John von Neumann published the paper _On the Theory of Games of Strategy_; von Neumann’s original proof used Brouwer’s fixed-point theorem on continuous mappings into compact convex sets.
--- slug: thoughts/git tags: - technical description: "resconstructed source of https://aarnphm.xyz/thoughts/git" title: "git" date: 2024-02-08 permalink: https://aarnphm.xyz/thoughts/git.html.md --- That one tool that every developers uses, but no one really understands. See also [The Git Parable](https://tom.preston-werner.com/2009/05/19/the-git-parable) by Tom Preston-Werner. # internals[](#internals) --- slug: thoughts/homeomorphism tags: - math - topology description: "or topological isomorphism." title: "homeomorphism" date: 2024-11-27 permalink: https://aarnphm.xyz/thoughts/homeomorphism.html.md --- alias: _topological isomorphism_, _bicontinuous function_ > bijective and continuous function between topological spaces that has a continuous inverse functions. > [!math] definition > > a function $f: X \rightarrow Y$ between two topological space is a **homeomorphism** if it has the following properties: > > - $f$ is a bijection (one-to-one and onto) > - $f$ is continuous > - $f^{-1}$ as the inverse function is continuous (or $f$ is an open mapping) > [!tip] > > $f^{-1}$ is continuous is _essential_. Consider the following example: > > - $f: \langle 0, 2 \pi ) \rightarrow S^1$ (the unit circle in $\mathbb{R}^2$) defined by $f(\varphi) = (\cos \varphi, \sin \varphi)$ > > - is bijective and continuous > - but not homeomorphism ($S^1$ is compact but $\langle 0, 2 \pi )$ is not) --- slug: thoughts/human-interaction tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/human-interaction" title: "human interaction" date: 2024-02-06 permalink: https://aarnphm.xyz/thoughts/human-interaction.html.md --- --- slug: thoughts/identity tags: - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/identity" title: "identity" date: 2024-02-19 permalink: https://aarnphm.xyz/thoughts/identity.html.md --- ### [Freud](https://aarnphm.xyz/thoughts/identity/../../thoughts/Freud) --- slug: thoughts/index tags: - evergreen - fruit description: "resconstructed source of https://aarnphm.xyz/thoughts/index" title: "thoughts" date: 2024-01-09 permalink: https://aarnphm.xyz/thoughts/index.html.md --- Collection of scattered thoughts and ideas, concepts, thoughts that I entertain quite a lot. Here are some of my favourite [posts](https://aarnphm.xyz/thoughts/index/../../posts/) of [writing](https://aarnphm.xyz/thoughts/index/../../thoughts/writing) --- slug: thoughts/information-retrieval tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/information-retrieval" title: "information retrieval" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/information-retrieval.html.md --- --- slug: thoughts/intelligence tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/intelligence" title: "Intelligence" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/intelligence.html.md --- Lecture from [Hinton](https://www.youtube.com/watch?v=rGgGOccMEiY\&ab_channel=CSERCambridge) ## neuroscience --- slug: thoughts/joininteract tags: - seed - application description: "resconstructed source of https://aarnphm.xyz/thoughts/joininteract" title: "interact cohort 2024" date: 2024-08-23 permalink: https://aarnphm.xyz/thoughts/joininteract.html.md --- > [!question] Someone gives you 50,000 dollars for a project that explicitly can’t be a business. What’s the project you work on and why? 
I want to host [dinner](https://aarnphm.xyz/thoughts/joininteract/../../thoughts/atelier-with-friends) centered around intimacy and cultural curation in different cities. The project would involve hosting a series of 4-course meals for small groups, with each event celebrating the local cuisine and culture of its location. I also want to handcraft unique ceramic dishes for each course, adding my personal touch to the experience. Traveling to various cities would also allow me to explore regional ingredients, cooking techniques, and food traditions. I would document this culinary journey on a website featuring photos, recipes, and behind-the-scenes content from each dinner. At its core, this project stems from my love for people and my belief that cooking is a profound way to show care and strengthen human connection. In an age where superficial aspects of life often dominate, I cherish the authentic stories and bonds that can form when we gather around a shared meal. > [!question] What’s something you accomplished or created in the last year that you’re proud of? In the last year, I learned the quiet courage of loving oneself, albeit through hosting dinners, attuning to my inner child, and letting go. It was a hard-fought lesson, wrested from countless small surrenders to the ache and beauty that comprise this mortal coil. There were days mired in melancholy when I yearned to be someone, anyone else. To slip out of my own skin and leave behind the burdens I carried. But slowly, tentatively, I began to make peace with the face in the mirror - both foreign and familiar, an ally and adversary. I came to understand that even in the midst of pain, pinpricks of light could be found if one only remembered to look. It takes a peculiar kind of bravery to embrace the fullness of who you are, scars and all. To grant yourself grace on the days when you have nothing to give. I am still learning the art of it - how to meet my own gaze without flinching, how to be gentle with the wounded parts of my soul. But I am proud of how far I have come. Of the hard-won compassion I now extend to myself in moments of frailty and despair. There is a hushed valor in choosing to love the unlovely parts of your own being. In that quietude between the shadows and the light, I am beginning to discover the makings of peace. Some days, that is enough. Some days, it is everything. > [!question] Elaine Scarry: “Beauty comes out to greet us and prepares us for the other undertakings, for finding out about truth and committing ourselves to justice.” Agree or disagree? Elaine Scarry’s defence of [beauty](https://aarnphm.xyz/thoughts/joininteract/../../thoughts/papers/On-Beauty-and-Being-Just.pdf) against moral condemnation offers a compelling perspective on its role in our pursuit of higher truths and values. She argues that beauty’s immediate allure and clarity serve as an entry point to deeper understanding and ethical contemplation. The “clear discernibility” of beauty, according to Scarry, introduces us to states of certainty and conviction while simultaneously highlighting our capacity for error. This paradox encourages a nuanced approach to perception and judgment. Beauty’s power to induce “radical decentering” frees our minds from self-preoccupation, allowing us to better perceive the complexities of the world and the subtleties of truth. 
Scarry metaphorically describes beautiful things as “ladders reaching toward the beauty of the world,” suggesting that aesthetic experiences can elevate our consciousness and attune us to broader concepts of goodness and justice. [Kant](https://aarnphm.xyz/thoughts/joininteract/../../thoughts/Philosophy-and-Kant), in his Critique of Judgment, proposed that beauty can be seen as a symbol of the good. However, he cautioned that such an analogy should be approached with an awareness of the aspects in which beauty and goodness differ, as well as the aspects in which they reveal similarities. Ultimately, [beauty](https://aarnphm.xyz/thoughts/joininteract/../../thoughts/beauty) serves as a guide, albeit one shaped by cultural biases and power structures. While it can point us in fruitful directions and attune us to what is worthwhile, to truly arrive at truth, we must critically examine our notions of beauty and remain open to challenging our tastes. > [!question] What qualities or skills best characterize the way you discover and solve problems? My approach to discovering and solving problems is characterized by an intense curiosity and a willingness to dive deep into complex issues. I often cultivate a digital garden, a space where I voraciously explore tangentially related concepts and perspectives. While technology grants me access to a wealth of information and resources, it also exposes me to a world of uncertainty and ambiguity. Embracing the [Socratic](https://aarnphm.xyz/thoughts/joininteract/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates) paradox, I remain acutely aware of the limitations of my knowledge. This [epistemic](https://aarnphm.xyz/thoughts/joininteract/../../Epistemology) humility is not just a philosophical stance but a practical necessity for solving complex problems. It requires an openness to being wrong and a willingness to engage in trial-and-error experimentation. Underpinning this approach is a fundamental belief in [agency](https://aarnphm.xyz/thoughts/joininteract/../../Agency) and self-efficacy. Kant’s exhortation “Sapere aude” (dare to know) from his essay “Answering the Question: What is Enlightenment” resonates deeply with me. It encourages a program of intellectual self-liberation through reason, a path I strive to follow. I believe that with sufficient determination, ingenuity, and grit, we can achieve remarkable things. In this context, neuroticism, often viewed as a liability, becomes a gift when transmuted into dogged persistence. Combined with the humility to recognize the scope of ignorance, it propels a restless journey of discovery, chasing the light of knowledge into the unknown. Through this alchemical process, vices are transformed into virtues in the relentless pursuit of truth. > [!question] Who (between 18 and 23) would you be the most excited to find out was in your Fellowship class? Why? * I would be thrilled to meet individuals who are bridging the understanding gap between foundational models and humans through innovative interfaces and interactions. Language models perceive the world differently than we do, and developing rich interfaces to connect these distinct worldviews could lead to profound insights and a deeper understanding of our world. While techniques like prompting and dimensionality reduction offer glimpses into the possibilities, current interactions remain static. We have yet to experience a true extension of self through these models, as I firmly believe they are magical being. 
Solving this understanding gap would enhance our journey to refine our taste and attune to what is truly meaningful. --- slug: thoughts/large-models tags: - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/large-models" title: "Foundational large models" date: 2024-01-08 permalink: https://aarnphm.xyz/thoughts/large-models.html.md --- Popularized through [LLMs](https://aarnphm.xyz/thoughts/large-models/../../thoughts/LLMs), [GPT-3 paper](https://arxiv.org/abs/2005.14165), See also: 7.1 of [The Little Book of Deep Learning](https://aarnphm.xyz/thoughts/large-models/../../books#2024) Though, it should be thought as [Intelligence amplification](https://aarnphm.xyz/thoughts/large-models/../../thoughts/Intelligence-amplification) rather than “artificial intelligence” system. ## Scaling laws Initial [work](https://arxiv.org/abs/2001.08361) from OpenAI Distributed serving of large models requires cost-efficient methods[^1] - [Petals](https://petals.dev/): a decentralized system that run Llama 2 over internet ### large world models [LWM](https://github.com/LargeWorldModel/LWM): implementation of [RingAttention](https://aarnphm.xyz/thoughts/large-models/../../thoughts/Attention#ringattention) ## visions [^1]: [Distributed Inference and Fine-tuning of Large Language Models over the Internet](https://arxiv.org/abs/2312.08361) --- slug: thoughts/latent-space tags: - seed - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/latent-space" title: "latent space" date: 2024-04-03 permalink: https://aarnphm.xyz/thoughts/latent-space.html.md --- --- slug: thoughts/lenses tags: - seed - film description: "resconstructed source of https://aarnphm.xyz/thoughts/lenses" title: "Lenses" date: 2024-01-22 permalink: https://aarnphm.xyz/thoughts/lenses.html.md --- A collection of lenses I uses for both photos and [videos.](https://aarnphm.xyz/thoughts/lenses/../../thoughts/Cinematography) - Sony 10-18mm f/4 OSS - Sony 16-35mm f/2.8 GM II - Sony 24-70mm f/2.8 GM II - Sony 50mm f/1.8 - Sony 85mm f/1.8 --- slug: thoughts/linguistic tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/linguistic" title: "linguistic" date: 2024-02-12 permalink: https://aarnphm.xyz/thoughts/linguistic.html.md --- --- slug: thoughts/logic tags: - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/logic" title: "logic" date: 2024-03-02 permalink: https://aarnphm.xyz/thoughts/logic.html.md --- --- slug: thoughts/manifold tags: - math description: "resconstructed source of https://aarnphm.xyz/thoughts/manifold" title: "manifold" date: 2024-11-27 permalink: https://aarnphm.xyz/thoughts/manifold.html.md --- a topological space that locally resembles Euclidean space near each point. > an $n$-dimensional manifold is a topological space with the property that each point has a [neighbourhood](https://aarnphm.xyz/thoughts/manifold/../../thoughts/manifold#neighborhood) that is [homeomorphic](https://aarnphm.xyz/thoughts/manifold/../../thoughts/homeomorphism) to an open subset of $n$-dimensional Euclidean space. Formally, a topological manifold is a _second countable Hausdorff space_ that is _locally homeomorphic_ to a Euclidean space. 
> [!abstract] Locally homeomorphic to a Euclidean space > > every point has a neighborhood [homeomorphic](https://aarnphm.xyz/thoughts/manifold/../../thoughts/homeomorphism) to an open subset of the Euclidean space $\mathbb{R}^n$ for some non-negative integer $n$ Implies that either the point is an isolated point $n=0$, or it has a neighborhood homeomorphic to the open ball: $$ \mathbf{B}^n = \{(x_{1},x_{2},\ldots, x_n) \in \mathbb{R}^n : x_1^2 + x_2^2 + \ldots x_n^2 <1\} $$ ## differentiable manifold _a topological manifold with a_ __globally__ defined differential structure. ### Pseudo-Riemannian manifold abbrev: Lorentzian manifold _with a metric tensor that is everywhere non-degenerate_ application used in general relativity is four-dimensional Lorentzian manifold for modeling space-time - url: thoughts/Tensor-field - description: metric tensors ### metric tensors > A tangent space is a $n$-dimensional differentiable manifold $M$ associated with each point $p$. a non-degenerate, smooth, symmetric bilinear map that assigns a real number to pairs of tangent vectors at each tangent space of the manifold. > [!math] metric tensor > > $$ > g: T_p M \times T_p M \to \mathbb{R} > $$ The map is symmetric and bilinear, so if $X, Y, Z \in T_p M$ are tangent vectors at point $p$ to the manifold $M$ then we have: $$ \begin{aligned} g(X,Y) &= g(Y,X) \\ g(aX + Y, Z) &= ag(X,Z) + g(Y,Z) \end{aligned} $$ for any real number $a \in \mathbb{R}$ > $g$ is _non-degenerate_ means there is no non-zero $X \in T_p M$ such that $g(X,Y)=0 \forall \space Y \in T_p M$ [Lien vers l'original](https://aarnphm.xyz/thoughts/manifold/../../thoughts/Tensor-field#metric-tensors) --- ## neighborhood think of open set or interior. intuition: a set of point containing that point where one can move some amount in any direction away from that point without leaving the set. > [!math] definition > > if $X$ is a topological space and $p$ is a point in $X$, then a **neighbourhood** of $p$ is a subset $V$ of $X$ that includes an _open set_ $U$ containing $p$: > > $$ > p \in U \subseteq V \subseteq X > $$ > > This is equivalent to the point $p \in X$ belonging to the topological interior of $V$ in $X$. > [!tip] properties > > the neighbourhood $V$ _need not be an open subset_ of $X$. --- slug: thoughts/mechanistic-interpretability tags: - interp description: "and reverse engineering neural networks." title: "mechanistic interpretability" date: 2024-10-30 permalink: https://aarnphm.xyz/thoughts/mechanistic-interpretability.html.md --- [whirlwind tour](https://www.youtube.com/watch?v=veT2VI4vHyU\&ab_channel=FAR%E2%80%A4AI), [initial exploration](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/pdfs/tinymorph-exploration.pdf), [glossary](https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J) > The subfield of alignment that delves into reverse engineering of a neural network, especially [LLMs](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/LLMs) To attack the _curse of dimensionality_, the question remains: __how do we hope to understand a function over such a large space, without an exponential amount of time?__ [^lesswrongarc] ## inference application in the wild: [Goodfire](https://goodfire.ai/) and [Transluce](https://transluce.org/) > [!question]+ How we would do inference with SAE? 
> > > Quick 🧵 and some of quick introspection into how they might run inference > > > > — aaron (@aarnphm\_) [25 septembre 2024](https://twitter.com/aarnphm_/status/1839016131321016380?ref_src=twsrc%5Etfw) idea: treat SAEs as a `logit_processor`, similar to [guided decoding](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/vllm#guided-decoding) Current known bottlenecks in vLLM: - `logit_processor`s are row-wise, i.e. logits are processed synchronously and blocking [^vllm-caveats] - no SPMD currently implemented ## steering refers to the process of manually modifying certain activations and hidden states of the neural net to influence its outputs. For example, the following is a toy example of how a decoder-only transformer (e.g. GPT-2) generates text given the prompt “The weather in California is” ```mermaid flowchart LR A[The weather in California is] --> B[H0] --> D[H1] --> E[H2] --> C[... hot] ``` To steer the model, we modify the $H_2$ layer by amplifying certain features with scale 20 (call it $H_{3}$)[^1] ```mermaid flowchart LR A[The weather in California is] --> B[H0] --> D[H1] --> E[H3] --> C[... cold] ``` One usually uses techniques such as [sparse autoencoders](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/mechanistic-interpretability#sparse-autoencoders) to decompose model activations into a set of interpretable features. For feature [ablation](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/mechanistic-interpretability#ablation), we observe that feature activations can be strengthened or weakened to directly influence the model’s outputs. A few examples: ([Panickssery et al., 2024](#bib-panickssery2024steeringllama2contrastive)) uses contrastive activation additions to steer Llama 2 ### contrastive activation additions intuition: using a contrast pair for steering vector additions at certain activation layers. It uses the _mean difference_, which produces a difference vector similar to PCA: Given a dataset $\mathcal{D}$ of prompts $p$ with positive completion $c_p$ and negative completion $c_n$, we calculate the mean difference $v_\text{MD}$ at layer $L$ as follows: $$ v_\text{MD} = \frac{1}{\mid \mathcal{D} \mid} \sum_{p,c_p,c_n \in \mathcal{D}} a_L(p,c_p) - a_L(p, c_n) $$ > [!tip] implication > > by steering existing learned representations of behaviors, CAA results in better out-of-distribution generalization than basic supervised finetuning of the entire model. ## sparse autoencoders abbrev: SAE _see also: [landscape](https://docs.google.com/document/d/1lHvRXJsbi41bNGZ_znGN7DmlLXITXyWyISan7Qx2y6s/edit?tab=t.0#heading=h.j9b3g3x1o1z4)_ An SAE often consists of one MLP layer with a few linear/ReLU units, trained on a subset of the dataset the main LLM is trained on.
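To make the steering and SAE pieces above concrete, here is a minimal PyTorch-style sketch (assuming `torch`) of the baseline SAE architecture formalized below, plus the steering update from footnote [^1] ($H_{3} = H_{2} + \text{steering strength} \cdot W_\text{dec}[i] \cdot \text{max activation}$). The class name, layer sizes, `l1_coeff`, and the feature index are illustrative, not taken from any cited implementation; the norm constraint on $W_\text{dec}$ discussed below is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Baseline SAE: ReLU encoder, linear decoder, L1 sparsity penalty (illustrative sizes)."""

    def __init__(self, d_model: int = 768, d_sae: int = 16_384, l1_coeff: float = 5e-4):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))
        self.l1_coeff = l1_coeff

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # f(x) = ReLU(W_enc (x - b_dec) + b_enc)
        return F.relu((x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f: torch.Tensor) -> torch.Tensor:
        # x_hat(f) = W_dec f(x) + b_dec
        return f @ self.W_dec + self.b_dec

    def loss(self, x: torch.Tensor) -> torch.Tensor:
        f = self.encode(x)
        x_hat = self.decode(f)
        reconstruction = (x - x_hat).pow(2).sum(dim=-1).mean()  # ||x - x_hat||_2^2
        sparsity = self.l1_coeff * f.abs().sum(dim=-1).mean()   # lambda * ||f(x)||_1
        return reconstruction + sparsity

def steer(h: torch.Tensor, sae: SparseAutoencoder, feature_idx: int,
          steering_strength: float, max_activation: float) -> torch.Tensor:
    """H3 = H2 + steering_strength * W_dec[feature_idx] * max_activation (hypothetical values)."""
    return h + steering_strength * max_activation * sae.W_dec[feature_idx]
```

In practice the `steer` step would be applied inside a forward hook on the chosen layer during generation; the feature index and strength are whatever the SAE analysis surfaced, not fixed constants.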
> empirical example: if we wish to interpret all features related to the author Camus, we might want to train an SAE on all available text by Camus to interpret “similar” features from Llama-3.1 > [!abstract] definition > > We wish to decompose a model’s activation $x \in \mathbb{R}^n$ into a sparse, linear combination of feature directions: > > $$ > \begin{aligned} x \sim x_{0} + &\sum_{i=1}^{M} f_i(x) d_i \\[8pt] \because \quad &d_i \; (M \gg n):\text{ latent unit-norm feature directions} \\ &f_i(x) \ge 0: \text{ corresponding feature activation for }x \end{aligned} > $$ Thus, the baseline architecture of SAEs is a linear autoencoder with an L1 penalty on the activations: $$ \begin{aligned} f(x) &\coloneqq \text{ReLU}(W_\text{enc}(x - b_\text{dec}) + b_\text{enc}) \\ \hat{x}(f) &\coloneqq W_\text{dec} f(x) + b_\text{dec} \end{aligned} $$ > trained to reconstruct a large dataset of model activations $x \sim \mathcal{D}$, constraining the hidden representation $f$ to be sparse via an [L1 norm](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-autoencoder/../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/tut/tut1#l1norm) penalty with coefficient $\lambda$ in the training loss: $$ \begin{aligned} \mathcal{L}(x) &\coloneqq \| x-\hat{x}(f(x)) \|_2^2 + \lambda \| f(x) \|_1 \\[8pt] &\because \|x-\hat{x}(f(x)) \|_2^2 : \text{ reconstruction loss} \end{aligned} $$ > [!tip] intuition > > We trade off reconstruction fidelity against sparsity (as measured by L0) via a mixture of reconstruction loss and L1 regularization. We could reduce the sparsity loss term without affecting reconstruction by scaling up the norm of the decoder weights, hence the norms of the columns of $W_\text{dec}$ are constrained during training. Idea: the encoder output $f(x)$ has two roles - detects which features are active ⇐ L1 is crucial to ensure sparsity in the decomposition - _estimates_ magnitudes of active features ⇐ L1 is an unwanted bias ### Gated SAE _a Pareto improvement over the baseline that reduces the bias of the L1 penalty_ ([Rajamanoharan et al., 2024](#bib-rajamanoharan2024improvingdictionarylearninggated)) A clear consequence of the bias during training is _shrinkage_ ([Sharkey, 2024](#bib-sharkey2024feature)) [^shrinkage] The idea is to use a [gated ReLU](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-autoencoder/../../thoughts/optimization#gated-linear-units-and-variants) encoder ([Dauphin et al., 2017](#bib-dauphin2017languagemodelinggatedconvolutional); [Shazeer, 2020](#bib-shazeer2020gluvariantsimprovetransformer)): $$ \tilde{f}(\mathbf{x}) \coloneqq \underbrace{\mathbb{1}[\underbrace{(\mathbf{W}_{\text{gate}}(\mathbf{x} - \mathbf{b}_{\text{dec}}) + \mathbf{b}_{\text{gate}}) > 0}_{\pi_{\text{gate}}(\mathbf{x})}]}_{f_{\text{gate}}(\mathbf{x})} \odot \underbrace{\text{ReLU}(\mathbf{W}_{\text{mag}}(\mathbf{x} - \mathbf{b}_{\text{dec}}) + \mathbf{b}_{\text{mag}})}_{f_{\text{mag}}(\mathbf{x})} $$ where $\mathbb{1}[\bullet > 0]$ is the (point-wise) Heaviside step function and $\odot$ denotes element-wise multiplication.
| term | annotations | | -------------------- | ------------------------------------------------------------------------------- | | $f_\text{gate}$ | which features are deemed to be active | | $f_\text{mag}$ | feature activation magnitudes (for features that have been deemed to be active) | | $\pi_\text{gate}(x)$ | $f_\text{gate}$ sub-layer’s pre-activations | to negate the increases in parameters, use _weight sharing_: Scale $W_\text{mag}$ in terms of $W_\text{gate}$ with a vector-valued rescaling parameter $r_\text{mag} \in \mathbb{R}^M$: $$ (W_\text{mag})_{ij} \coloneqq (\exp (r_\text{mag}))_i \cdot (W_\text{gate})_{ij} $$ ![](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-autoencoder/../../thoughts/images/gated-sae-architecture.webp) _Figure 3: Gated SAE with weight sharing between gating and magnitude paths_ ![](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-autoencoder/../../thoughts/images/gated_jump_relu.webp) _Figure 4: A gated encoder become a single layer linear encoder with [JumpReLU](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-autoencoder/../../thoughts/optimization#jumprelu)_ ([Erichson et al., 2019](#bib-erichson2019jumpreluretrofitdefensestrategy)) _activation function_ $\sigma_\theta$ ### feature suppression See also: [link](https://www.alignmentforum.org/posts/3JuSjTZyMzaSeTxKk/addressing-feature-suppression-in-saes) Loss function of SAEs combines a MSE reconstruction loss with sparsity term: $$ \begin{aligned} L(x, f(x), y) &= \|y-x\|^2/d + c\mid f(x) \mid \\[8pt] &\because d: \text{ dimensionality of }x \end{aligned} $$ > the reconstruction is not perfect, given that only one is reconstruction. **For smaller value of $f(x)$, features will be suppressed** > [!note]- illustrated example > > consider one binary feature in one dimension $x=1$ with probability $p$ and $x=0$ otherwise. Ideally, optimal SAE would extract feature activation of $f(x) \in \{0,1\}$ and have decoder $W_d=1$ > > However, if we train SAE optimizing loss function $L(x, f(x), y)$, let say encoder outputs feature activation $a$ if $x=1$ and 0 otherwise, ignore bias term, the optimization problem becomes: > > $$ > \begin{aligned} a &= \argmin p * L(1,a,a) + (1-p) * L(0,0,0) \\ &= \argmin (1-a)^2 + \mid a \mid * c \\ &= \argmin a^2 + (c-2) *a +1 \end{aligned} \Longrightarrow \boxed{a = 1-\frac{c}{2}} > $$ > [!question]+ How do we fix feature suppression in training SAEs? 
> > introduce an element-wise scaling factor per feature between the encoder and decoder, represented by a vector $s$: > > $$ > \begin{aligned} f(x) &= \text{ReLU}(W_e x + b_e) \\ f_s(x) &= s \odot f(x) \\ y &= W_d f_s(x) + b_d \end{aligned} > $$ [Lien vers l'original](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-autoencoder) ## sparse crosscoders > [!tip] maturity > > a research preview from Anthropic; this is pretty much still a work in progress see also [reproduction on Gemma 2B](https://colab.research.google.com/drive/124ODki4dUjfi21nuZPHRySALx9I74YHj?usp=sharing) and [github](https://github.com/ckkissane/crosscoder-model-diff-replication) A variant of the [sparse autoencoder](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/sparse-autoencoder) that reads and writes to multiple layers ([Lindsey et al., 2024](#bib-lindsey2024sparsecrosscoders)) Crosscoders produce _shared features across layers and even models_ ## motivations Resolve: - cross-layer features: resolve cross-layer superposition - circuit simplification: remove redundant features from analysis and enable jumping over many uninteresting identity circuit connections - model diffing: produce shared sets of features across models. This covers one model across training, as well as completely independent models with different architectures. ### cross-layer [superposition](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/mechanistic-interpretability#superposition-hypothesis) ![](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/images/additive-residual-stream-llm.webp) _given the additive properties of transformers’ residual stream, **adjacent layers** in larger transformers can be thought of as “almost parallel”_ > [!tip]- intuition > > On the basis of the superposition hypothesis, a feature is a linear combination of neurons at any given layer. > > ![](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/images/feature-neurons.webp) ![](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/images/one-step-circuit.webp) ![](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/images/parallel-joint-branch.webp) _if we think of adjacent layers as being “almost parallel branches that potentially have superposition between them”, then we can apply dictionary learning jointly [^jointlysae]_ ### persistent features and complexity A current drawback of sparse autoencoders is that we have to train them against specific activation layers to extract features. In terms of the residual stream per layer, we end up with lots of duplicate features across layers. > Crosscoders can simplify the circuit _given that we use an appropriate architecture_ [^risks] ## setup. > Autoencoders and transcoders are special cases of crosscoders. > > - autoencoders: read and predict the same layer > - transcoders: read from layer $n$ and predict layer $n+1$ Crosscoders read from and write to many layers, subject to causality constraints.
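The math callout that follows formalizes the read step; as a quick illustration, here is a minimal sketch (assuming PyTorch) of an encoder that sums per-layer contributions before the ReLU. The name `CrosscoderEncoder`, the layer count, and the dimensions are illustrative, not taken from the cited write-up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrosscoderEncoder(nn.Module):
    """Reads activations a^l(x_j) from several layers and sums their contributions."""

    def __init__(self, d_model: int, d_features: int, n_layers: int):
        super().__init__()
        # one encoder matrix W_enc^l per layer l in L, plus a shared encoder bias
        self.W_enc = nn.Parameter(torch.randn(n_layers, d_model, d_features) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_features))

    def forward(self, acts: torch.Tensor) -> torch.Tensor:
        # acts: (n_layers, batch, d_model) -> features: (batch, d_features)
        contributions = torch.einsum("lbd,ldf->bf", acts, self.W_enc)
        return F.relu(contributions + self.b_enc)
```

The decoder side (one $W^l_\text{dec}$ per layer, reconstructing every layer's activations from the same feature vector) and the per-layer weighted L1 penalty are what the loss below adds on top of this.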
> [!math]+ crosscoders > > We compute the vector of feature activations $f(x_j)$ on data point $x_j$ by summing over contributions from the activations of different layers $a^l(x_j)$ for layers $l \in L$: > > $$ > \begin{aligned} f(x_j) &= \text{ReLU}(\sum_{l\in L}W_{\text{enc}}^l a^l(x_j) + b_{\text{enc}}) \\[8pt] &\because W^l_{\text{enc}} : \text{ encoder weights at layer } l \\[8pt] &\because a^l(x_j) : \text{ activation on datapoint } x_j \text{ at layer } l \\ \end{aligned} > $$ We have loss $$ L = \sum_{l\in L} \|a^l(x_j) - a^{l^{'}}(x_j)\|^2 + \sum_{l\in L}\sum_i f_i(x_j) \|W^l_{\text{dec,i}}\| $$ and the regularization can be rewritten as: $$ \sum_{l\in L}\sum_{i} f_i(x_j) \|W^l_{\text{dec,i}}\| = \sum_{i} f_i(x_j)(\displaystyle\sum_{l \in L} \|W^l_\text{dec,i}\|) $$ _i.e. we weight the L1 regularization penalty by the L1 norm of the per-layer decoder weight norms_ $\sum\limits_{l\in L} \|W^l_\text{dec,i}\|$ [^l2weightnorm] We use L1 because: - baseline loss comparison: with L2, the crosscoder exhibits lower loss than the sum of per-layer SAE losses, as it would effectively obtain a loss “bonus” by spreading features across layers - _layer-wise sparsity surfaces layer-specific features_: based on empirical results of [model diffing](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/sparse-crosscoders#model-diffing), L1 uncovers a mix of shared and model-specific features, whereas L2 tends to uncover only shared features. ## variants ![](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/images/crosscoders-variants.webp) good to explore: 1. strictly causal crosscoders to capture MLP computation and treat computation performed by attention layers as linear 2. combine strictly causal crosscoders for MLP outputs with weakly causal crosscoders for attention outputs 3. interpretable attention replacement layers that could be used in combination with strictly causal crosscoders for a “replacement model” ## model diffing see also: [model stitching](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/model-stiching) and [SVCCA](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/SVCCA) > ([Laakso & Cottrell, 2000](#bib-doi:10.1080/09515080050002726)) proposes comparing [representations](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders/../../thoughts/representations) by transforming them into representations of distances between data points. [^sne] ## questions > How do features change over model training? When do they form? > As we make a model wider, do we get more features, or are they largely the same, packed less densely? [Lien vers l'original](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-crosscoders) ## superposition hypothesis > [!abstract]+ tl/dr > > the phenomenon where a neural network represents _more_ than $n$ features in an $n$-dimensional space > A linear representation can encode more features than it has dimensions. As sparsity increases, models use superposition to represent more [features](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/mechanistic-interpretability#features) than dimensions. > > neural networks “want to represent more features than they have neurons”.
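A tiny numerical sketch of that claim, using plain NumPy with illustrative sizes: pack 8 unit-norm feature directions into 3 dimensions; each feature is still the strongest readout for its own direction, but the nonzero off-diagonal dot products are the interference that a non-linearity has to filter out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_dims = 8, 3                      # more features than dimensions
W = rng.normal(size=(n_features, n_dims))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm feature directions

x = np.eye(n_features)   # one-hot inputs: exactly one feature active at a time
x_hat = x @ W @ W.T      # project into n_dims, then read back out

# each feature is still the largest readout along its own direction...
print((x_hat.argmax(axis=1) == np.arange(n_features)).all())
# ...but other features bleed in: the price of superposition
print(np.abs(x_hat - np.eye(n_features)).max())
```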
When features are sparsed, superposition allows compression beyond what linear model can do, at a cost of interference that requires non-linear filtering. reasoning: “noisy simulation”, where small neural networks exploit feature sparsity and properties of high-dimensional spaces to approximately simulate much larger much sparser neural networks In a sense, superposition is a form of **lossy [compression](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/Compression)** ### importance - sparsity: how _frequently_ is it in the input? - importance: how useful is it for lowering loss? ### over-complete basis _reasoning for the set of $n$ directions [^direction]_ ## features > A property of an input to the model When we talk about features ([Elhage et al., 2022, p. see “Empirical Phenomena”](#bib-elhage2022superposition)), the theory building around several observed empirical phenomena: 1. Word Embeddings: have direction which corresponding to semantic properties ([Mikolov et al., 2013](#bib-mikolov-etal-2013-linguistic)). For example: ```prolog V(king) - V(man) = V(monarch) ``` 2. Latent space: similar vector arithmetics and interpretable directions have also been found in generative adversarial network. We can define features as properties of inputs which a sufficiently large neural network will reliably dedicate a neuron to represent ([Elhage et al., 2022, p. see “Features as Direction”](#bib-elhage2022superposition)) ## ablation > refers to the process of removing a subset of a model’s parameters to evaluate its predictions outcome. idea: deletes one activation of the network to see how performance on a task changes. - zero ablation or _pruning_: Deletion by setting activations to zero - mean ablation: Deletion by setting activations to the mean of the dataset - random ablation or _resampling_ ## residual stream ```mermaid flowchart LR A[Token] --> B[Embeddings] --> C[x0] C[x0] --> E[H] --> D[x1] C[x0] --> D D --> F[MLP] --> G[x2] D --> G[x2] G --> I[...] --> J[unembed] --> X[logits] ``` residual stream $x_{0}$ has dimension $\mathit{(C,E)}$ where - $\mathit{C}$: the number of tokens in context windows and - $\mathit{E}$: embedding dimension. [Attention](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/Attention) mechanism $\mathit{H}$ process given residual stream $x_{0}$ as the result is added back to $x_{1}$: $$ x_{1} = \mathit{H}{(x_{0})} + x_{0} $$ ## grokking See also: [writeup](https://www.alignmentforum.org/posts/N6WM6hs7RQMKDhYjB/a-mechanistic-interpretability-analysis-of-grokking), [code](https://colab.research.google.com/drive/1F6_1_cWXE5M7WocUcpQWp3v8z4b1jL20), [circuit threads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html) > A phenomena discovered by ([Power et al., 2022](#bib-power2022grokkinggeneralizationoverfittingsmall)) where small algorithmic tasks like modular addition will initially memorise training data, but after a long time ti will suddenly learn to generalise to unseen data > [!tip] empirical claims > > related to phase change ## Bibliographie - Dauphin, Y. N., Fan, A., Auli, M., & Grangier, D. (2017). _Language Modeling with Gated Convolutional Networks_. arXiv preprint arXiv:1612.08083 [\[arxiv\]](https://arxiv.org/abs/1612.08083) - Erichson, N. B., Yao, Z., & Mahoney, M. W. (2019). _JumpReLU: A Retrofit Defense Strategy for Adversarial Attacks_. 
arXiv preprint arXiv:1904.03750 [\[arxiv\]](https://arxiv.org/abs/1904.03750) - Rajamanoharan, S., Conmy, A., Smith, L., Lieberum, T., Varma, V., Kramár, J., Shah, R., & Nanda, N. (2024). _Improving Dictionary Learning with Gated Sparse Autoencoders_. arXiv preprint arXiv:2404.16014 [\[arxiv\]](https://arxiv.org/abs/2404.16014) - Sharkey, L. (2024). _Addressing Feature Suppression in SAEs_. AI Alignment Forum. [\[post\]](https://www.alignmentforum.org/posts/3JuSjTZyMzaSeTxKk/addressing-feature-suppression-in-saes) - Shazeer, N. (2020). _GLU Variants Improve Transformer_. arXiv preprint arXiv:2002.05202 [\[arxiv\]](https://arxiv.org/abs/2002.05202) - Gorton, L. (2024). _The Missing Curve Detectors of InceptionV1: Applying Sparse Autoencoders to InceptionV1 Early Vision_. arXiv preprint arXiv:2406.03662 [\[arxiv\]](https://arxiv.org/abs/2406.03662) - Laakso, A., & Cottrell, G. (2000). Content and cluster analysis: Assessing representational similarity in neural systems. _Philosophical Psychology_, _13_(1), 47–76. - Lindsey, J., Templeton, A., Marcus, J., Conerly, T., Batson, J., & Olah, C. (2024). Sparse Crosscoders for Cross-Layer Features and Model Diffing. _Transformer Circuits Thread_. [\[link\]](https://transformer-circuits.pub/2024/crosscoders/index.html) - Elhage, N., Hume, T., Olsson, C., Schiefer, N., Henighan, T., Kravec, S., Hatfield-Dodds, Z., Lasenby, R., Drain, D., Chen, C., Grosse, R., McCandlish, S., Kaplan, J., Amodei, D., Wattenberg, M., & Olah, C. (2022). Toy Models of Superposition. _Transformer Circuits Thread_. [\[link\]](https://transformer-circuits.pub/2022/toy_model/index.html) - Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations. In L. Vanderwende, H. Daumé III, & K. Kirchhoff (Eds.), _Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies_ (pp. 746–751). Association for Computational Linguistics. - Panickssery, N., Gabrieli, N., Schulz, J., Tong, M., Hubinger, E., & Turner, A. M. (2024). _Steering Llama 2 via Contrastive Activation Addition_. arXiv preprint arXiv:2312.06681 [\[arxiv\]](https://arxiv.org/abs/2312.06681) - Power, A., Burda, Y., Edwards, H., Babuschkin, I., & Misra, V. (2022). _Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets_. arXiv preprint arXiv:2201.02177 [\[arxiv\]](https://arxiv.org/abs/2201.02177) [^lesswrongarc]: good read from [Lawrence C](https://www.lesswrong.com/posts/6FkWnktH3mjMAxdRT/what-i-would-do-if-i-wasn-t-at-arc-evals#Ambitious_mechanistic_interpretability) for ambitious mech interp. [^vllm-caveats]: [the benchmark](https://github.com/vllm-project/vllm/pull/10046) was run against `vllm#0.6.3.dev236+g48138a84`, with all configuration specified in the pull request. [^1]: An example steering function can be: $$ H_{3} = H_{2} + \text{steering\_strength} * \text{SAE}.W_{\text{dec}}[20] * \text{max\_activation} $$ [^shrinkage]: If we hold $\hat{x}(\bullet)$ fixed, thus L1 pushes $f(x) \to 0$, while reconstruction loss pushes $f(x)$ high enough to produce accurate reconstruction. An optimal value is somewhere between. 
However, rescaling the [shrink](https://aarnphm.xyz/thoughts/mechanistic-interpretability/../../thoughts/sparse-autoencoder/../../thoughts/mechanistic-interpretability#feature-suppression) feature activations ([Sharkey, 2024](#bib-sharkey2024feature)) is not necessarily enough to overcome bias induced by L1: a SAE might learnt sub-optimal encoder and decoder directions that is not improved by the fixed. [^jointlysae]: ([Gorton, 2024](#bib-gorton2024missingcurvedetectorsinceptionv1)) denotes that cross-branch superposition is significant in interpreting models with parallel branches (InceptionV1) [^risks]: causal description it provides likely differs from that of the underlying model. [^l2weightnorm]: $\|W_\text{dec,i}^l\|$ is the L2 norm of a single feature’s decoder vector at a given layer. In principe, one might have expected to use L2 norm of per-layer norm $\sqrt{\sum_{l \in L} \|W_\text{dec,i}^l\|^2}$ [^sne]: Chris Colah’s [blog post](https://colah.github.io/posts/2015-01-Visualizing-Representations/) explains how t-SNE can be used to visualize collections of networks in a function space. [^direction]: Even though features still correspond to directions, the set of interpretable direction is larger than the number of dimensions --- slug: thoughts/model-stiching tags: - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/model-stiching" title: "model stiching" date: 2024-11-04 permalink: https://aarnphm.xyz/thoughts/model-stiching.html.md --- ([Lenc & Vedaldi, 2015](#bib-lenc2015understandingimagerepresentationsmeasuring)) ## Bibliographie - Lenc, K., & Vedaldi, A. (2015). _Understanding image representations by measuring their equivariance and equivalence_. arXiv preprint arXiv:1411.5908 [\[arxiv\]](https://arxiv.org/abs/1411.5908) --- slug: thoughts/monetary tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/monetary" title: "Monetary" date: 2024-01-20 permalink: https://aarnphm.xyz/thoughts/monetary.html.md --- Karpathy on [AI’s 30 under 30](https://twitter.com/karpathy/status/1748816969858720232) > [Money](https://aarnphm.xyz/thoughts/monetary/../../thoughts/monetary) is an information system for labor allocation. (this [tweet](https://x.com/elonmusk/status/1349977642708168704?s=20)) Money doesn’t have any intrinsic power. You can’t simply throw more money into a system and hope it would fix the problem. [Chaos](https://aarnphm.xyz/thoughts/monetary/../../thoughts/Chaos) is produced from the act of generating wealth. What does it really means by accumulating wealth? If capital gains is a property in the pursuit for [knowledge](https://aarnphm.xyz/thoughts/monetary/../../thoughts/Epistemology), chances are you will enjoy your time. The problems with curiosity without [alignment](https://aarnphm.xyz/thoughts/monetary/../../thoughts/Alignment) of capitalism is that you will run out of time and money sooner or later. ## free market See also [Capitalism and Freedom](https://aarnphm.xyz/thoughts/monetary/../../thoughts/Capitalism-and-Freedom) ([Friedman & Friedman, 1962](#bib-friedman1962capitalism)) ## Bibliographie - Friedman, M., & Friedman, R. D. (1962). _Capitalism and Freedom_ (p. 202). University of Chicago Press. 
--- slug: thoughts/moral tags: - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/moral" title: "Moral" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/moral.html.md --- See also: [Value](https://aarnphm.xyz/thoughts/moral/../../thoughts/Value) > [!tip] Justification > > Provide criteria for judging actions. It might be that the criterion is simple, such as right actions maximize the good, or it may be complex, such as the right action is the one that gives adequate weight to each competing duty Most notable are Kant’s [deontological ethics](https://aarnphm.xyz/thoughts/moral/../../thoughts/Philosophy-and-Kant), utilitarianism, and virtue ethics. Considering what is right? or provides the account of wrongness, permissibility. --- slug: thoughts/music-theory tags: - seed - sapling description: "resconstructed source of https://aarnphm.xyz/thoughts/music-theory" title: "Music theory" date: 2023-09-25 permalink: https://aarnphm.xyz/thoughts/music-theory.html.md --- Half steps → between E-F, B-C Full step from E → F# Minor → Flat major scale Elements per side for a House/Techno: L-R: Perc, piano, strings, pads, guitars, synths, fx, bv M: vocals, snare, bass, kick ### Effects ### Vocals Call and responses ### Mids - Guitar, piano ### Bass Usually with Synths/808 Tools: - Ableton Operator - Serum 1/16 note grid i-iv-vii ### Drums UKG: → Swing (1/16th bar off) #### Kick - Fat 909 #### Hats Jacking hi-hats: closed followed by a open hi-hats or vice versa #### Claps #### Percussion ### Syncopation ### Major third For any given scale, choose an altered chords: - Instead of a III as a minor, play as a major - major III to a minor VI --- slug: thoughts/observer-expectancy-effect tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/observer-expectancy-effect" title: "observer-expectancy effect" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/observer-expectancy-effect.html.md --- The observer’s prejudices influence towards the people she/he is observing. ## Robert Rosenthal paper: **The effect of experimenter bias on the performance of the albino rat** ### Clever Hans ## Operant conditioning Where behaviours are modified through the associations of stimuli with reinforcement or punishment. Thus, operants, or behaviours affected by the environment, are conditioned to happen more or less often based on the environmental consequence of the behaviour. --- slug: thoughts/optimization tags: - ml description: "A list of optimization functions that can be used in ML training to reduce loss, and more." title: "ml optimization" date: 2024-10-31 permalink: https://aarnphm.xyz/thoughts/optimization.html.md --- ## `exp()` see also ([Abdelkhalik et al., 2022](#bib-abdelkhalik2022demystifyingnvidiaamperearchitecture)), [RDNA3 instruction sets of V\_LDEXP\_F32](https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/rdna3-shader-instruction-set-architecture-feb-2023_0.pdf) Usually a lot better comparing to `2**t` simply for [numerical stability](https://aarnphm.xyz/thoughts/optimization/../../thoughts/university/twenty-three-twenty-four/compsci-4x03/Equations) reasons For ARM the design specially [instructions](https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions/FEXPA--Floating-point-exponential-accelerator-) set for it! 
```cpp title="pseudocode-exp-fexpa.cpp"
// Pseudocode representing the computation flow (not a runnable kernel):
float32x4_t exp_sve2(float32x4_t x) {
  // Step 1: Range reduction
  //   N = round(x * log2(e))
  //   r = x - N * ln(2)   [reduced argument]
  float32x4_t N, r; // computed as in the comments above

  // Step 2: FEXPA instruction provides the 2^N approximation
  //   In hardware: FEXPA Z0.S, Z1.S
  float32x4_t exp_approx; // result of FEXPA

  // Step 3: Polynomial evaluation for exp(r)
  //   Typically uses Horner's method with reduced-precision coefficients,
  //   since we already start from a good approximation
  float32x4_t exp_r = evaluate_polynomial(r);

  // Step 4: Combine results: exp(x) = 2^N * exp(r)
  return exp_approx * exp_r;
}
```
Advantages of FEXPA: - single instruction latency for initial approximation - vectorized ops for batch processing On GPU we can utilise a bit-shift `1<<t` instead. ## Gated Linear Units and variants > component-wise product of two linear transformations of the inputs, one of which is sigmoid-activated. ([Shazeer, 2020](#bib-shazeer2020gluvariantsimprovetransformer)) introduces a few GLU variants that yield improvements in the [Transformers](https://aarnphm.xyz/thoughts/optimization/../../thoughts/Transformers) architecture. $$ \begin{aligned} \text{GLU}(x,W,V,b,c) &= \sigma(xW+b) \otimes (xV+c) \\ \text{Bilinear}(x,W,V,b,c) &= (xW+b) \otimes (xV+c) \end{aligned} $$ GLU in other variants: $$ \begin{aligned} \text{ReGLU}(x,W,V,b,c) &= \max(0, xW+b) \otimes (xV+c) \\ \text{GEGLU}(x,W,V,b,c) &= \text{GELU}(xW+b) \otimes (xV+c) \\ \text{SwiGLU}(x,W,V,b,c) &= \text{Swish}_\beta(xW+b) \otimes (xV+c) \end{aligned} $$ FFN for transformers layers would become: $$ \begin{aligned} \text{FFN}_\text{GLU}(x,W,V,W_{2}) &= (\sigma (xW) \otimes xV)W_{2} \\ \text{FFN}_\text{Bilinear}(x,W,V,W_{2}) &= (xW \otimes xV)W_{2} \\ \text{FFN}_\text{ReGLU}(x,W,V,W_{2}) &= (\max(0, xW) \otimes xV)W_{2} \\ \text{FFN}_\text{GEGLU}(x,W,V,W_{2}) &= (\text{GELU}(xW) \otimes xV)W_{2} \\ \text{FFN}_\text{SwiGLU}(x,W,V,W_{2}) &= (\text{Swish}_\beta(xW) \otimes xV)W_{2} \end{aligned} $$ _note_: reduce the number of hidden units $d_\text{ff}$ (second dimension of $W$ and $V$ and the first dimension of $W_{2}$) by a factor of $\frac{2}{3}$ when comparing these layers ## JumpReLU ([Erichson et al., 2019](#bib-erichson2019jumpreluretrofitdefensestrategy)) application: [Gated SAE](https://aarnphm.xyz/thoughts/optimization/../../thoughts/sparse-autoencoder#gated-sae) ([Rajamanoharan et al., 2024](#bib-rajamanoharan2024jumpingaheadimprovingreconstruction)) $$ J(z) \coloneqq z H(z - \kappa) = \begin{cases} 0 & \text{if } z \leq \kappa \\ z & \text{if } z > \kappa \end{cases} $$ [](https://aarnphm.xyz/thoughts/optimization/../../thoughts/images/JumpReLU.mp4) ## momentum See also [Stochastic gradient descent](https://aarnphm.xyz/thoughts/optimization/../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Stochastic-gradient-descent), [Cornell’s CS6787](https://www.cs.cornell.edu/courses/cs6787/2017fa/Lecture3.pdf) > [!math] gradient descent > > $$ > x_{t+1} = x_t - \alpha \nabla f(x_t) > $$ > > _source code_ In the case of the quadratic function $f(x) = \frac{1}{2} x^2$, we get $x_{t+1} = x_t - \alpha x_t = (1-\alpha)x_t$ Think of the convergence rate $$ \mid x_{t+1} - 0 \mid = \mid 1 - \alpha \mid \mid x_t - 0 \mid $$ ![](https://aarnphm.xyz/thoughts/optimization/../../thoughts/images/convergence-vs-step-side-momentum.webp) If we set a different curvature ($f(x) = 2x^2$), then $x_{t+1} = x_t - 4 \alpha x_t = (1-4 \alpha)x_t$ > [!tip] step size > > step size depends on curvature for one-dimensional quadratics > > more curvature means smaller ideal step size _how would this play for general
quadratics?_ for PSD symmetric $A$ $$ f(x) = \frac{1}{2} x^T Ax $$ with gradient descent has update step $$ x_{t+1} = x_t - \alpha A x_t = (I - \alpha A)x_t $$ convergence rate would be $$ \begin{aligned} \max_{x} \frac{\|(I - \alpha A)x\|}{\|x\|} &= \max_{x} \frac{1}{\|x\|} \left\| \left( I - \alpha \sum_{i=1}^{n} \lambda_i u_i u_i^T \right) x \right\| \\[8pt] &= \max_{x} \frac{\|\sum_{i=1}^{n} (1- \alpha \lambda_i) u_i u_i^T x\|}{\|\sum_{i=1}^{n} u_i u_i^T x\|} \\ &= max_i \mid 1- \alpha \lambda_i \mid \\ &=max(1-\alpha \lambda_{\text{min}}, \alpha \lambda_{\text{max}} - 1) \end{aligned} $$ > [!math] optimal convergence rate > > optimal value occurs when > > $$ > 1 - \alpha \lambda_{\text{min}} = \alpha \lambda_{\text{max}} - 1 \Rightarrow \alpha = \frac{2}{\lambda_{\text{max}} + \lambda_{\text{min}}} > $$ > > with rate > > $$ > \frac{\lambda_{\text{max}} - \lambda_{\text{min}}}{\lambda_{\text{max}} + \lambda_{\text{min}}} > $$ We denote $\kappa = \frac{\lambda_{\text{max}}}{\lambda_{\text{min}}}$ as **condition number** of matrix A > [!abstract] poorly conditioned > > Problems with larger condition numbers converge slower. > > Intuitively these are problems that are _highly curved in some directions, but flat others_ ### Polyak abbreviation: “heavy ball method” idea: add an extra momentum term to gradient descent $$ x_{t+1} = x_t - \alpha \nabla f(x_t) + \beta (x_t - x_{t-1}) $$ tl/dr: if current gradient step is in same direction as previous step, then move a little further in the same direction > [!math]- momentum for 1D quadratics > > $$ > f(x) = \frac{\lambda}{2} x^{2} > $$ > > momentum GD gives > > $$ > \begin{aligned} x_{t+1} &= x_t - \alpha \lambda x_t + \beta (x_t - x_{t-1}) \\ &= (1+\beta - \alpha \lambda) x_t - \beta x_{t-1} \end{aligned} > $$ > > characterizing momentum: > > - start with $x_{t+1} = (1+\beta -\alpha \lambda) x_t - \beta x_{t-1}$ > - trick: let $x_t = \beta^{t/2}z_t$ > > $$ > z_{t+1} = \frac{1 + \beta - \alpha \lambda}{\sqrt{\beta}} z_t - z_{t-1} > $$ > > let $u = \frac{1+\beta -\alpha \lambda}{2 \sqrt{\beta}}$, then > > $$ > z_{t+1} = 2 u z_t - z_{t-1} > $$ > > _degree-$\textbf{t}$ polynomial in $\textbf{u}$_ ### Nesterov See also [paper](http://www.cs.toronto.edu/%7Ehinton/absps/momentum.pdf), [momentum](https://aarnphm.xyz/thoughts/optimization/../../thoughts/Nesterov-momentum/../../thoughts/optimization#momentum) idea: - first take a step in the direction of accumulated momentum - computes gradient at “lookahead” position, - make the update using this gradient. > [!abstract] definition > > For a parameter vector $\theta$, the update can be expressed as > > $$ > \begin{aligned} v_t &= \mu v_{t-1} + \nabla L(\theta_t + \mu v_{t-1}) \\ \theta_{t+1} &= \theta_t - \alpha v_t \end{aligned} > $$ Achieves better convergence rates | function type | gradient descent | Nesterove AG | | ------------------------ | ---------------------------------- | --------------------------------------- | | Smooth | $\theta(\frac{1}{T})$ | $\theta(\frac{1}{T^{2}})$ | | Smooth & Strongly Convex | $\theta(\exp (-\frac{T}{\kappa}))$ | $\theta(\exp -\frac{T}{\sqrt{\kappa}})$ | > [!math] optimal assignments for parameters > > $$ > \alpha = \frac{1}{\lambda_{\text{max}}}, \beta = \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} > $$ [Lien vers l'original](https://aarnphm.xyz/thoughts/optimization/../../thoughts/Nesterov-momentum) ## Bibliographie - Abdelkhalik, H., Arafa, Y., Santhi, N., & Badawy, A.-H. (2022). 
_Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis_. arXiv preprint arXiv:2208.11174 [\[arxiv\]](https://arxiv.org/abs/2208.11174) - Erichson, N. B., Yao, Z., & Mahoney, M. W. (2019). _JumpReLU: A Retrofit Defense Strategy for Adversarial Attacks_. arXiv preprint arXiv:1904.03750 [\[arxiv\]](https://arxiv.org/abs/1904.03750) - Rajamanoharan, S., Lieberum, T., Sonnerat, N., Conmy, A., Varma, V., Kramár, J., & Nanda, N. (2024). _Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders_. arXiv preprint arXiv:2407.14435 [\[arxiv\]](https://arxiv.org/abs/2407.14435) - Ramachandran, P., Zoph, B., & Le, Q. V. (2017). _Searching for Activation Functions_. arXiv preprint arXiv:1710.05941 [\[arxiv\]](https://arxiv.org/abs/1710.05941) - Shazeer, N. (2020). _GLU Variants Improve Transformer_. arXiv preprint arXiv:2002.05202 [\[arxiv\]](https://arxiv.org/abs/2002.05202) --- slug: thoughts/papers/index tags: - folder description: "resconstructed source of https://aarnphm.xyz/thoughts/papers/index" title: "papers." date: 2024-01-20 permalink: https://aarnphm.xyz/thoughts/papers/index.html.md --- A somewhat local cache of all papers I’ve read. This is one source of my Zotero [library](https://aarnphm.xyz/thoughts/papers/index/../../../../books). --- slug: thoughts/pdfs/index tags: - folder description: "resconstructed source of https://aarnphm.xyz/thoughts/pdfs/index" title: "pdfs." date: 2024-10-29 permalink: https://aarnphm.xyz/thoughts/pdfs/index.html.md --- The following include a list of PDFs that are pretty cool --- slug: thoughts/personal-computing tags: - seed - computing description: "resconstructed source of https://aarnphm.xyz/thoughts/personal-computing" title: "personal computing" date: 2024-02-25 permalink: https://aarnphm.xyz/thoughts/personal-computing.html.md --- See [this tweet](https://twitter.com/joekndy/status/1761616198482219368) --- slug: thoughts/play tags: - seed - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/play" title: "Play" date: 2023-10-18 permalink: https://aarnphm.xyz/thoughts/play.html.md --- ### play? > intentional activity of doing the thing you want to do ⇒ create share ownership of spaces which competition cannot and have as much fun as we can. Turn life into a canvas, rather a graph with checkpoint. Throw away your 5-year life plan, to create a garden of your curiosity Commercial viability vs. creativity endeavour, [Do thing that don’t scale](https://paulgraham.com/ds.html) ### software. > 🏀 A Note on Playful Software 🏀\ > \ > Playful software != video games. I mean tinkerable, whimsical, playful consumer software: creative software, social networks, dating apps, messengers\ > \ > Play that's not segregated from ordinary life [pic.twitter.com/cHyGpcI9m8](https://t.co/cHyGpcI9m8) > > — XH (@xhfloz) [19 septembre 2023](https://twitter.com/xhfloz/status/1704176399173488823?ref_src=twsrc%5Etfw) Four components - whimsy - new people - surprise - joy Involves freedom of choice - social - about the process ### Create spaces not product > not necessarily meaning you are doing for yourself, but make it possible for others to utilise the space. ### Play as a form of tinkering Internet [playground](https://woolgather.sh/issue/2) Can we shift [education system](https://aarnphm.xyz/thoughts/play/../../thoughts/education#system) away from assessing students to let them explore their own interests? 
[Magic Circle](https://subconscious.substack.com/p/magic-circles) or [from squishy\[dot\]computer](https://newsletter.squishy.computer/p/magic-circles) - is a space which a game takes place. Once we step into it, we suspend the rules of life, allow the rules of the game to take over our interactions - boundaries of magic circle often via ceremonies: - National Anthem before olympics game - Gong before yoga class - Walking down the aisle at a wedding ⇒ similar to the idea of [liminal space](https://en.wikipedia.org/wiki/Liminality) in anthropology, or [Game of life](https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life) Graeber on [What’s the point of we can’t have fun](https://davidgraeber.org/articles/whats-the-point-if-we-cant-have-fun/) > Why does the existence of action carried out for the sheer pleasure of acting, the exertion of powers for the sheer pleasure of exerting them, strike us as mysterious? What does it tell us about ourselves that we instinctively assume that it is? - “Man plays only when he is in the full sense of the word a man” (Friedrich Schiller, 1795) ### [philosophy](https://aarnphm.xyz/thoughts/play/../../tags/philosophy) > [!notes] Philosophy as play > > Involves a form of perspective shifting: trying on or inhabit alternative perspective Intellectual playfulness[^1], loosely, is the disposition to try out new ideas, perspectives and systems of thought (involves perspective shifting) for the sheer joy of it. It is a disposition to explore ideas for the value of exploration itself. - intellectually playful exploration sometimes can better serve the goal of finding the truth, than will exploration that is strictly aimed at finding the truth - it functions against epistemic traps: belief systems that undermine our epistemic efforts, leaving us stuck inside them ### Irony Play involves lightness with rules — the ability to lightly step away from but also the ability to lightly adopt. To be serious about a game is to play it under the idea that its goals are really and genuinely important — as an Olympic athlete does. To be playful about games is neither to be utterly serious, or utterly ironic, but to move easily into and out of commitments to rule-sets > To be playful is to wear the games’ care lightly > To be playful is to be pretentious #### Pretentious: Why It Matters by Daniel Fox - argues that pretentious invokes curiosity and creativity, instead of negative connotation Necessitates freedom, conditional freedom? Play often initiate some sort of pressure, such that it expects us to be a part of the construction. [^1]: excerpt from [Playfulness vs Epistemic Traps](https://philpapers.org/archive/NGUPVE.pdf) --- slug: thoughts/prompt-engineering tags: - seed - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/prompt-engineering" title: "Prompt engineering" date: 2024-02-12 permalink: https://aarnphm.xyz/thoughts/prompt-engineering.html.md --- A constructive way to form communications with [LLMs](https://aarnphm.xyz/thoughts/prompt-engineering/../../thoughts/LLMs). As we improve the quality of prompts, we can expect better results from the models. Similar to [linguistic](https://aarnphm.xyz/thoughts/prompt-engineering/../../thoughts/linguistic), a good prompt is a good form of communication with the system. 
This is different from [zero-shot prompting](https://aarnphm.xyz/thoughts/prompt-engineering/../../thoughts/zero-shot-learning) ## CoT prompting See also: [NLP](https://aarnphm.xyz/thoughts/prompt-engineering/../../thoughts/NLP) You can think of it as explaining a big topics to a five years old. You break down topic into smaller, logic parts that mimics a train of thoughts. ## Least-to-most prompting Prompted to first list the sub-problems to a problem, then solve them in sequence. --- slug: thoughts/quantization tags: - seed - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/quantization" title: "Quantization" date: 2024-02-05 permalink: https://aarnphm.xyz/thoughts/quantization.html.md --- See also: [this talk](https://aarnphm.xyz/thoughts/quantization/../../thoughts/images/htn-openllm.pdf) I gave at Hack the North 2023. > reduce computational and memory costs of running inference with representing the weight and activations with low-precision data type - `int16` - [half precision](https://aarnphm.xyz/thoughts/quantization/../../thoughts/quantization#fp32-to-fp16) - `bfloat16` - `int8` > [!note] Note > > This also applies to post-training quantization, where the methodology is applied after the model has been trained, instead of during load-time. ## `fp32` to `fp16` > Does my operation support `fp16`? - CPU does support saving `fp16` weights, but computations are done in `fp32` > Does my operation _sensitive_ to `fp16`? For example `epsilon` in `LayerNormalization` usually is very small $1e^{-12}$, but smallest value in `fp16` is $\approx 6e^{-5}$, which cause `NaN` issues. ## `fp32` to `int8` Consider a float `x` in `[a, b]`, such that _affine quantization scheme_: $$ x = S \cdot (x_q - Z) $$ where: - $x_q$ is the quantized `int8` associated with `x` - $S$ and $Z$ are scaling and zero-point parameters - $S$ is the scale, positive `float32` - $Z$ is the zero-point, or the `int8` value corresponding to value `0` in `fp32` Thus quantized value $x_q$ is: $x_q = \text{round}(x / S + Z)$ And `fp32` value outside of `[a, b]` is clipped to closest representable value. $$ \forall x \in [a, b] \quad x_q = \text{clip}(\text{round}(x/S + Z), \text{round}(a/S + Z), \text{round}(b/S + Z)) $$ See also: [paper](https://arxiv.org/abs/1712.05877) ## quantization time - Post-training **dynamic quantization**: range of each activation is computed on the fly at _runtime_ - Post-training **static quantization**: range of each activation is computed _offline_ before _runtime_ - Observers are put on activations to collect their value - certain number of forward passes on calibration datasets - range of each computation are computed according to some _calibration technique_ - **Quantization aware training**: range of each activation is computed _during training_ - `fake_quantize` operations are inserted in the computation graph - `fake_quantize` is a no-op during inference, but during training, it simulates the effect of quantization ## Methods and libraries [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) and [GPTQ](https://arxiv.org/abs/2210.17323) --- slug: thoughts/questions tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/questions" title: "questions" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/questions.html.md --- What is questions really? People always say “there is no such thing as a stupid question”, but I do think questions comes from innate ability and desire to learn, not coming from subjective opinion. 
Source: [Ask better questions on Kernel](https://www.kernel.community/en/learn/module-2/better-questions/) ### Socratic method [Socrates](https://aarnphm.xyz/thoughts/questions/../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates) are notoriously known for just asking questions and involve in such dialogue. A method of hypothesis elimination, in that better hypotheses are found by steadily identifying and eliminating those that lead to contradictions. A Socratic Circle is an approach to understanding texts. It is based off the assumption that all knowledge is a posteriori knowledge, all thinking comes from asking questions, and that one question should lead to asking further questions. Students will then often involved in a Socratic [dialects](https://aarnphm.xyz/thoughts/questions/../../thoughts/dialectics), where inner circle will explore and ask questions, the outer circle will then provide feedback and vice versa. --- slug: thoughts/reason tags: - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/reason" title: "reason" date: 2024-02-26 permalink: https://aarnphm.xyz/thoughts/reason.html.md --- ### inductive. ### deductive. --- slug: thoughts/reductionism tags: - seed - psychology description: "resconstructed source of https://aarnphm.xyz/thoughts/reductionism" title: "reductionism" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/reductionism.html.md --- See also: [Compression](https://aarnphm.xyz/thoughts/reductionism/../../thoughts/Compression) Reductionism is the relationship among theories. 1. Ontology: a belief that whole of reality consists of a minimal number of parts 2. Methodology: scientific attemp to provide explanation in terms of ever-smaller entities 3. Theory: suggest newer theory does not replace/absorb older one, but reduces it to more basic terms. --- slug: thoughts/representations tags: - seed - philosophy description: "resconstructed source of https://aarnphm.xyz/thoughts/representations" title: "representations." date: 2024-02-25 permalink: https://aarnphm.xyz/thoughts/representations.html.md --- See also: Edward Tuffe’s \[The Visual Display of Quantitative Information], ISBN-13: 978-1930824133 > [!question] Question > > How do we represent [information](https://aarnphm.xyz/thoughts/representations/../../thoughts/data) when we interact with different mediums, esp text? > Maps shape how we see and understand the world, which in turn shapes how we act within it. ## maps [Linus’ talk on Representation](https://www.media.mit.edu/events/thinking-with-sand-a-virtual-talk-series-exploring-new-software-interfaces-and-tools-for-augmented-thinking-and-creative-exploration/) how it shape our _agency_ ## as technology ([Viégas & Wattenberg, 2023](#bib-viégas2023modelusermodelexploring)) How can we manipulate new ideas and abstractions to create new _notations_? Thesis: Technologies for _representation_ mediate how we see the world. Tenet: 1. A representation must abstract 2. An interface always gives or takes agency. _Be intentional_ about which you choose 3. Represent to communicate, not model. ## internals grokking Though [mechanistic interpretability](https://aarnphm.xyz/thoughts/representations/../../thoughts/mechanistic-interpretability) to invoke agency _Seeing clearly_ to exercise _[agency](https://aarnphm.xyz/thoughts/representations/../../thoughts/Agency)_ ## interface Instrumental use: - frictionless input, low-latency progress: payments. Engaged use: - see clearly from right perspectives. 
Express intent naturally and precisely. - Explore the possible spaces only through engaging with the information Why don’t we do this more? > It is a hyperreal produced from a radiating synthesis of combinatory models in a hyperspace without atmosphere. _Simulacra and Simulation, Jean Baudrillard_ > Art of Cartography attained such Perfection that the map of a single Province occupied the entirely of a City \[…] _saw that that vast Map was Useless..._ _On Exatitude in Science, Jorge Luis Borges_ ## Bibliographie - Viégas, F., & Wattenberg, M. (2023). _The System Model and the User Model: Exploring AI Dashboard Design_. arXiv preprint arXiv:2305.02469 [\[arxiv\]](https://arxiv.org/abs/2305.02469) --- slug: thoughts/scripts/index tags: - folder description: "resconstructed source of https://aarnphm.xyz/thoughts/scripts/index" title: "scripts." date: 2024-10-30 permalink: https://aarnphm.xyz/thoughts/scripts/index.html.md --- A list of tools to be used for this vault. --- slug: thoughts/scripts/manim/index tags: - folder - math description: "manim-related scripts for some visualisation" title: "manim." date: 2024-11-24 permalink: https://aarnphm.xyz/thoughts/scripts/manim/index.html.md --- a list of manim-related code for a few visuals scattered around the garden. --- slug: thoughts/sparse-autoencoder tags: - ml - interp description: "resconstructed source of https://aarnphm.xyz/thoughts/sparse-autoencoder" title: "sparse autoencoder" date: 2024-11-04 permalink: https://aarnphm.xyz/thoughts/sparse-autoencoder.html.md --- abbrev: SAE _see also: [landspace](https://docs.google.com/document/d/1lHvRXJsbi41bNGZ_znGN7DmlLXITXyWyISan7Qx2y6s/edit?tab=t.0#heading=h.j9b3g3x1o1z4)_ Often contains one layers of MLP with few linear ReLU that is trained on a subset of datasets the main LLMs is trained on. > empirical example: if we wish to interpret all features related to the author Camus, we might want to train an SAEs based on all given text of Camus to interpret “similar” features from Llama-3.1 > [!abstract] definition > > We wish to decompose a models’ activitation $x \in \mathbb{R}^n$ into sparse, linear combination of feature directions: > > $$ > \begin{aligned} x \sim x_{0} + &\sum_{i=1}^{M} f_i(x) d_i \\[8pt] \because \quad &d_i M \gg n:\text{ latent unit-norm feature direction} \\ &f_i(x) \ge 0: \text{ corresponding feature activation for }x \end{aligned} > $$ Thus, the baseline architecture of SAEs is a linear autoencoder with L1 penalty on the activations: $$ \begin{aligned} f(x) &\coloneqq \text{ReLU}(W_\text{enc}(x - b_\text{dec}) + b_\text{enc}) \\ \hat{x}(f) &\coloneqq W_\text{dec} f(x) + b_\text{dec} \end{aligned} $$ > training it to reconstruct a large dataset of model activations $x \sim \mathcal{D}$, constraining hidden representation $f$ to be sparse [L1 norm](https://aarnphm.xyz/thoughts/sparse-autoencoder/../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/tut/tut1#l1norm) with coefficient $\lambda$ to construct loss during training: $$ \begin{aligned} \mathcal{L}(x) &\coloneqq \| x-\hat{x}(f(x)) \|_2^2 + \lambda \| f(x) \|_1 \\[8pt] &\because \|x-\hat{x}(f(x)) \|_2^2 : \text{ reconstruction loss} \end{aligned} $$ > [!tip] intuition > > We need to reconstruction fidelity at a given sparsity level, as measured by L0 via a mixture of reconstruction fidelity and L1 regularization. 
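To make the baseline concrete, here is a minimal numpy sketch of the forward pass and training objective above. This is a sketch only: the dimensions, the value of $\lambda$, and names such as `W_enc`/`W_dec` are illustrative and not taken from any particular SAE codebase.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 512, 4096      # model activation dim n, number of SAE features M (M >> n)
lam = 5.0             # L1 sparsity coefficient lambda (illustrative value)

# illustrative parameters; in practice these are learned by gradient descent
W_enc = rng.normal(0.0, 0.01, size=(M, n))
W_dec = rng.normal(0.0, 0.01, size=(n, M))
W_dec /= np.linalg.norm(W_dec, axis=0, keepdims=True)  # unit-norm decoder columns
b_enc, b_dec = np.zeros(M), np.zeros(n)

def sae_loss(x: np.ndarray) -> tuple[float, np.ndarray]:
    f = np.maximum(0.0, W_enc @ (x - b_dec) + b_enc)  # f(x) = ReLU(W_enc(x - b_dec) + b_enc)
    x_hat = W_dec @ f + b_dec                         # x_hat(f) = W_dec f(x) + b_dec
    recon = float(np.sum((x - x_hat) ** 2))           # ||x - x_hat||_2^2
    sparsity = lam * float(np.sum(np.abs(f)))         # lambda * ||f(x)||_1
    return recon + sparsity, f

loss, f = sae_loss(rng.normal(size=n))
print(loss, int((f > 0).sum()))  # training loss and L0 (number of active features)
```

The unit-norm constraint on the decoder columns matters here; without it the sparsity term can be gamed, which is exactly the point made next.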
We can reduce the sparsity loss term without affecting reconstruction by scaling up the norm of the decoder weights; this is countered by constraining the norms of the columns of $W_\text{dec}$ during training. Ideas: the encoder output $f(x)$ has two roles - detects which features are active ⇐ L1 is crucial to ensure sparsity in decomposition - _estimates_ magnitudes of active features ⇐ L1 is an unwanted bias ### Gated SAE _a Pareto improvement over baseline training that reduces the bias of the L1 penalty_ ([Rajamanoharan et al., 2024](#bib-rajamanoharan2024improvingdictionarylearninggated)) A clear consequence of the bias during training is _shrinkage_ ([Sharkey, 2024](#bib-sharkey2024feature)) [^shrinkage] The idea is to use a [gated ReLU](https://aarnphm.xyz/thoughts/sparse-autoencoder/../../thoughts/optimization#gated-linear-units-and-variants) encoder ([Dauphin et al., 2017](#bib-dauphin2017languagemodelinggatedconvolutional); [Shazeer, 2020](#bib-shazeer2020gluvariantsimprovetransformer)): $$ \tilde{f}(\mathbf{x}) \coloneqq \underbrace{\mathbb{1}[\underbrace{(\mathbf{W}_{\text{gate}}(\mathbf{x} - \mathbf{b}_{\text{dec}}) + \mathbf{b}_{\text{gate}}) > 0}_{\pi_{\text{gate}}(\mathbf{x})}]}_{f_{\text{gate}}(\mathbf{x})} \odot \underbrace{\text{ReLU}(\mathbf{W}_{\text{mag}}(\mathbf{x} - \mathbf{b}_{\text{dec}}) + \mathbf{b}_{\text{mag}})}_{f_{\text{mag}}(\mathbf{x})} $$ where $\mathbb{1}[\bullet > 0]$ is the (point-wise) Heaviside step function and $\odot$ denotes element-wise multiplication. | term | annotations | | -------------------- | ------------------------------------------------------------------------------- | | $f_\text{gate}$ | which features are deemed to be active | | $f_\text{mag}$ | feature activation magnitudes (for features that have been deemed to be active) | | $\pi_\text{gate}(x)$ | $f_\text{gate}$ sub-layer’s pre-activations | To negate the increase in parameters, use _weight sharing_: scale $W_\text{mag}$ in terms of $W_\text{gate}$ with a vector-valued rescaling parameter $r_\text{mag} \in \mathbb{R}^M$: $$ (W_\text{mag})_{ij} \coloneqq (\exp (r_\text{mag}))_i \cdot (W_\text{gate})_{ij} $$ ![](https://aarnphm.xyz/thoughts/sparse-autoencoder/../../thoughts/images/gated-sae-architecture.webp) _Figure 3: Gated SAE with weight sharing between gating and magnitude paths_ ![](https://aarnphm.xyz/thoughts/sparse-autoencoder/../../thoughts/images/gated_jump_relu.webp) _Figure 4: A gated encoder becomes a single-layer linear encoder with a [JumpReLU](https://aarnphm.xyz/thoughts/sparse-autoencoder/../../thoughts/optimization#jumprelu)_ ([Erichson et al., 2019](#bib-erichson2019jumpreluretrofitdefensestrategy)) _activation function_ $\sigma_\theta$ ### feature suppression See also: [link](https://www.alignmentforum.org/posts/3JuSjTZyMzaSeTxKk/addressing-feature-suppression-in-saes) The loss function of SAEs combines an MSE reconstruction loss with a sparsity term: $$ \begin{aligned} L(x, f(x), y) &= \|y-x\|^2/d + c\mid f(x) \mid \\[8pt] &\because d: \text{ dimensionality of }x \end{aligned} $$ > the reconstruction is not perfect, given that the loss optimizes for more than reconstruction alone. **For smaller values of $f(x)$, features will be suppressed** > [!note]- illustrated example > > consider one binary feature in one dimension $x=1$ with probability $p$ and $x=0$ otherwise.
Ideally, optimal SAE would extract feature activation of $f(x) \in \{0,1\}$ and have decoder $W_d=1$ > > However, if we train SAE optimizing loss function $L(x, f(x), y)$, let say encoder outputs feature activation $a$ if $x=1$ and 0 otherwise, ignore bias term, the optimization problem becomes: > > $$ > \begin{aligned} a &= \argmin p * L(1,a,a) + (1-p) * L(0,0,0) \\ &= \argmin (1-a)^2 + \mid a \mid * c \\ &= \argmin a^2 + (c-2) *a +1 \end{aligned} \Longrightarrow \boxed{a = 1-\frac{c}{2}} > $$ > [!question]+ How do we fix feature suppression in training SAEs? > > introduce element-wise scaling factor per feature in-between encoder and decoder, represented by vector $s$: > > $$ > \begin{aligned} f(x) &= \text{ReLU}(W_e x + b_e) \\ f_s(x) &= s \odot f(x) \\ y &= W_d f_s(x) + b_d \end{aligned} > $$ ## Bibliographie - Dauphin, Y. N., Fan, A., Auli, M., & Grangier, D. (2017). _Language Modeling with Gated Convolutional Networks_. arXiv preprint arXiv:1612.08083 [\[arxiv\]](https://arxiv.org/abs/1612.08083) - Erichson, N. B., Yao, Z., & Mahoney, M. W. (2019). _JumpReLU: A Retrofit Defense Strategy for Adversarial Attacks_. arXiv preprint arXiv:1904.03750 [\[arxiv\]](https://arxiv.org/abs/1904.03750) - Rajamanoharan, S., Conmy, A., Smith, L., Lieberum, T., Varma, V., Kramár, J., Shah, R., & Nanda, N. (2024). _Improving Dictionary Learning with Gated Sparse Autoencoders_. arXiv preprint arXiv:2404.16014 [\[arxiv\]](https://arxiv.org/abs/2404.16014) - Sharkey, L. (2024). _Addressing Feature Suppression in SAEs_. AI Alignment Forum. [\[post\]](https://www.alignmentforum.org/posts/3JuSjTZyMzaSeTxKk/addressing-feature-suppression-in-saes) - Shazeer, N. (2020). _GLU Variants Improve Transformer_. arXiv preprint arXiv:2002.05202 [\[arxiv\]](https://arxiv.org/abs/2002.05202) [^shrinkage]: If we hold $\hat{x}(\bullet)$ fixed, thus L1 pushes $f(x) \to 0$, while reconstruction loss pushes $f(x)$ high enough to produce accurate reconstruction. An optimal value is somewhere between. However, rescaling the [shrink](https://aarnphm.xyz/thoughts/sparse-autoencoder/../../thoughts/mechanistic-interpretability#feature-suppression) feature activations ([Sharkey, 2024](#bib-sharkey2024feature)) is not necessarily enough to overcome bias induced by L1: a SAE might learnt sub-optimal encoder and decoder directions that is not improved by the fixed. --- slug: thoughts/sparse-crosscoders tags: - ml - interp description: "and how we observe multiple activation layers. SAE is a special case of sparse crosscoders." title: "sparse crosscoders" date: 2024-11-03 permalink: https://aarnphm.xyz/thoughts/sparse-crosscoders.html.md --- > [!tip] maturity > > a research preview from Anthroppic and this is pretty much still a work in progress see also [reproduction on Gemma 2B](https://colab.research.google.com/drive/124ODki4dUjfi21nuZPHRySALx9I74YHj?usp=sharing) and [github](https://github.com/ckkissane/crosscoder-model-diff-replication) A variant of [sparse autoencoder](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/sparse-autoencoder) where it reads and writes to multiple layers ([Lindsey et al., 2024](#bib-lindsey2024sparsecrosscoders)) Crosscoders produces _shared features across layers and even models_ ## motivations Resolve: - cross-layer features: resolve cross-layer superposition - circuit simplification: remove redundant features from analysis and enable jumping across training many uninteresting identity circuit connections - model diffing: produce shared sets of features across models. 
This also introduce one model across training, and also completely independent models with different architectures. ### cross-layer [superposition](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/mechanistic-interpretability#superposition-hypothesis) ![](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/images/additive-residual-stream-llm.webp) _given the additive properties of transformers’ residual stream, **adjacent layers** in larger transformers can be thought as “almost parallel”_ > [!tip]- intuition > > In basis of superposition hypothesis, a feature is a linear combinations of neurons at any given layers. > > ![](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/images/feature-neurons.webp) ![](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/images/one-step-circuit.webp) ![](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/images/parallel-joint-branch.webp) _if we think of adjacent layers as being “almost parallel branches that potentially have superposition between them”, then we can apply dictionary learning jointly [^jointlysae]_ ### persistent features and complexity Current drawbacks of sparse autoencoders is that we have to train it against certain activations layers to extract features. In terms of the residual stream per layers, we end up having lots of duplicate features across layers. > Crosscoders can simplify the circuit _given that we use an appropriate architecture_ [^risks] ## setup. > Autoencoders and transcoders as special cases of crosscoders. > > - autoencoders: reads and predict the same layers > - transcoders: read from layer $n$ and predict layer $n+1$ Crosscoder read/write to many layers, subject to causality constraints. > [!math]+ crosscoders > > Let one compute the vector of feature activation $f_(x_j)$ on data point $x_j$ by summing over contributions of activations of different layers $a^l(x_j)$ for layers $l \in L$: > > $$ > \begin{aligned} f(x_j) &= \text{ReLU}(\sum_{l\in L}W_{\text{enc}}^l a^l(x_j) + b_{\text{enc}}) \\[8pt] &\because W^l_{\text{enc}} : \text{ encoder weights at layer } l \\[8pt] &\because a^l(x_j) : \text{ activation on datapoint } x_j \text{ at layer } l \\ \end{aligned} > $$ We have loss $$ L = \sum_{l\in L} \|a^l(x_j) - a^{l^{'}}(x_j)\|^2 + \sum_{l\in L}\sum_i f_i(x_j) \|W^l_{\text{dec,i}}\| $$ and regularization can be rewritten as: $$ \sum_{l\in L}\sum_{i} f_i(x_j) \|W^l_{\text{dec,i}}\| = \sum_{i} f_i(x_j)(\displaystyle\sum_{l \in L} \|W^l_\text{dec,i}\|) $$ _weight of L1 regularization penalty by L1 norm of per-layer decoder weight norms_ $\sum\limits{l\in L} \|W^l_\text{dec,i}\|$ [^l2weightnorm] We use L1 due to - baseline loss comparison: L2 exhibits lower loss than sum of per-layer SAE losses, as they would effectively obtain a loss “bonus” by spreading features across layers - _layer-wise sparsity surfaces layer-specific features_: based on empirical results of [model diffing](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/sparse-crosscoders#model-diffing), that L1 uncovers a mix of shared and model-specific features, whereas L2 tends to uncover only shared features. ## variants ![](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/images/crosscoders-variants.webp) good to explore: 1. strictly causal crosscoders to capture MLP computation and treat computation performed by attention layers as linear 2. combine strictly causal crosscoders for MLP outputs without weakly causal crosscoders for attention outputs 3. 
interpretable attention replacement layers that could be used in combination with strictly causal crosscoders for a “replacement model” ## model diffing see also: [model stiching](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/model-stiching) and [SVCCA](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/SVCCA) > ([Laakso & Cottrell, 2000](#bib-doi:10.1080/09515080050002726)) proposes compare [representations](https://aarnphm.xyz/thoughts/sparse-crosscoders/../../thoughts/representations) by transforming into representations of distances between data points. [^sne] ## questions > How do features change over model training? When do they form? > As we make a model wider, do we get more features? or they are largely the same, packed less densely? ## Bibliographie - Gorton, L. (2024). _The Missing Curve Detectors of InceptionV1: Applying Sparse Autoencoders to InceptionV1 Early Vision_. arXiv preprint arXiv:2406.03662 [\[arxiv\]](https://arxiv.org/abs/2406.03662) - Laakso, A., & Cottrell, G. (2000). Content and cluster analysis: Assessing representational similarity in neural systems. _Philosophical Psychology_, _13_(1), 47–76. - Lindsey, J., Templeton, A., Marcus, J., Conerly, T., Batson, J., & Olah, C. (2024). Sparse Crosscoders for Cross-Layer Features and Model Diffing. _Transformer Circuits Thread_. [\[link\]](https://transformer-circuits.pub/2024/crosscoders/index.html) [^jointlysae]: ([Gorton, 2024](#bib-gorton2024missingcurvedetectorsinceptionv1)) denotes that cross-branch superposition is significant in interpreting models with parallel branches (InceptionV1) [^risks]: causal description it provides likely differs from that of the underlying model. [^l2weightnorm]: $\|W_\text{dec,i}^l\|$ is the L2 norm of a single feature’s decoder vector at a given layer. In principe, one might have expected to use L2 norm of per-layer norm $\sqrt{\sum_{l \in L} \|W_\text{dec,i}^l\|^2}$ [^sne]: Chris Colah’s [blog post](https://colah.github.io/posts/2015-01-Visualizing-Representations/) explains how t-SNE can be used to visualize collections of networks in a function space. --- slug: thoughts/state-space-models tags: - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/state-space-models" title: "state-space models" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/state-space-models.html.md --- See [state-space/mamba](https://github.com/state-spaces/mamba) and [paper](https://arxiv.org/abs/2312.00752) Mama uses a selective SSM scan. State-space duality (SSD): SSM + attentions layers (SMA, or structured masked [attention](https://aarnphm.xyz/thoughts/state-space-models/../../thoughts/Attention)) --- slug: thoughts/tacit-knowledge tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/tacit-knowledge" title: "tacit knowledge" date: 2024-10-22 permalink: https://aarnphm.xyz/thoughts/tacit-knowledge.html.md --- --- slug: thoughts/taste tags: - seed - pattern description: "resconstructed source of https://aarnphm.xyz/thoughts/taste" title: "taste" date: 2024-02-19 permalink: https://aarnphm.xyz/thoughts/taste.html.md --- ## as guide. [Jacky’s post](https://jzhao.xyz/posts/aesthetics-and-taste) > We have built up an instinctive habit of looking things up and seeing how other people have done it before trying it for ourselves. But the downside is that this habit primes our brains to value our work in the context of the taste of others rather than of our own. 
We have outsourced our [value](https://aarnphm.xyz/thoughts/taste/../../thoughts/Value) systems for what is good and bad (how we may judge [aesthetic value](https://aarnphm.xyz/thoughts/taste/../../thoughts/aesthetic-value)) to other people. > Looking at the history of scientific progress, we see plenty of evidence on how this reliance on the taste of committees and society broadly only serves to inhibit progress. Managed creativity can, at best, produce only what its managers specify. All that remains are the ideas that live in the Overton Window --- slug: thoughts/taxonomy tags: - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/taxonomy" title: "Taxonomy" date: 2024-02-07 permalink: https://aarnphm.xyz/thoughts/taxonomy.html.md --- --- slug: thoughts/university/twenty-four-twenty-five/engineer-4a03/case-study tags: - engineer4a03 description: "a case study into how surveillance capitalism drives one of the most influential controversy in data privacy of the 21st century" title: "Cambridge Analytica, a case study" date: 2024-11-08 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/case-study.html.md --- ## group Cambridge Analytica scandal epitomises a dark reality towards ethical responsibilities of corporations operating within the framework of surveillance capitalism ([Zuboff, 2019](#bib-zuboff2019age)) Through what Zuboff calls “extraction practices,” ([Zuboff, 2015, p. 78](#bib-doi:10.1057/jit.2015.5)) Cambridge Analytica harvested personal data from millions of Facebook users, treating individual privacy not as a right but as a commodity to be seized. [^1] As Zuboff argues, this new economic logic is fundamentally incompatible with democratic norms, as it concentrates unprecedented power in private companies while eliminating traditional reciprocities between corporations and people. The ethical responsibility of Facebook lies in its facilitation of an infrastructure that prioritizes data acquisition over user privacy ([Srnicek, 2017, p. pg.2, see expansion, monopolisation, invulnerabilities](#bib-srnicek2017platformcapitalism)). By designing a platform that encourages extensive data sharing and by failing to enforce strict oversight over third-party data access, Facebook normalized surveillance as a core aspect of its business model ([Couldry & Mejias, 2019](#bib-couldry2019costs)). This aligns with the principles of surveillance capitalism, where the commodification of personal information becomes a driving economic force, often at the expense of individual autonomy and privacy. Cambridge Analytica’s actions further exemplify the perils of surveillance capitalism by demonstrating how personal data can be weaponised to manipulate democratic processes. The firm’s use of regression ML algorithm to influence electoral outcomes highlights a significant ethical breach—transforming citizens from participants in a democracy to subjects of behavioral manipulation ([Susser et al., 2019](#bib-susser2019technology)). This not only undermines individual rights but also poses a threat to the integrity of democratic institutions. In a sense, Chris Wylie assumed significant ethical responsibilities as a whistleblower. By exposing the company’s unethical data practices, Wylie upheld a moral imperative to prevent harm to society and protect democratic processes. 
Whistleblowers often face substantial personal and professional risks, but their actions are vital in bringing unethical practices to light ([Vandekerckhove & Langenberg, 2012](#bib-vandekerckhove2012organize)). Wylie’s decision to reveal the inner workings of Cambridge Analytica provided transparency and prompted a global discourse on data privacy and the dangers of surveillance capitalism. Regulators and policymakers share in the ethical responsibility due to their delayed response to the evolving landscape of data privacy. The lack of robust legal frameworks allowed surveillance capitalism to flourish unchecked, exposing vulnerabilities in data protection and user rights ([Acquisti et al., 2016](#bib-10.1257/jel.54.2.442)). The scandal underscores the urgent need for comprehensive regulations that address the complexities of data commodification in the digital age. ## Bibliographie - Acquisti, A., Taylor, C., & Wagman, L. (2016). The Economics of Privacy. _Journal of Economic Literature_, _54_(2), 442–492. - Couldry, N., & Mejias, U. A. (2019). _The Costs of Connection: How Data Is Colonizing Human Life and Appropriating It for Capitalism_. Stanford University Press. - Srnicek, N. (2017). The challenges of platform capitalism: Understanding the logic of a new business model. _Juncture_, _23_(4), 254–257. - Susser, D., Roessler, B., & Nissenbaum, H. (2019). Technology, autonomy, and manipulation. _Internet Policy Review_, _8_(2). - Vandekerckhove, W., & Langenberg, S. (2012). Can We Organize Courage? Implications from Foucault’s Parrhesia. _Electronic Journal of Business Ethics and Organizational Studies_. - Zuboff, S. (2015). Big other: Surveillance Capitalism and the Prospects of an Information Civilization. _Journal of Information Technology_, _30_(1), 75–89. - Zuboff, S. (2019). _The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power_. PublicAffairs. [^1]: Surveillance capitalism operates by extracting surplus data from individuals—often without their explicit consent—and using it to predict and influence behavior for profit.([Zuboff, 2015, p. 81](#bib-doi:10.1057/jit.2015.5)) Facebook’s business model relied heavily on harvesting vast amounts of user data to drive targeted advertising, creating an environment ripe for exploitation. --- slug: thoughts/university/twenty-four-twenty-five/engineer-4a03/index tags: - university - engineer4a03 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/index" title: "Engineering Ethics" date: 2024-10-29 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/index.html.md --- See also [ethics](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/index/../../../../../../../../thoughts/ethics), [literature review](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/index/../../../../../../../../thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review), [case study](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/index/../../../../../../../../thoughts/university/twenty-four-twenty-five/engineer-4a03/case-study) --- slug: thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review tags: - engineer4a03 description: "How we understand machine learning system is how we can move towards a safe futures, yet the road ahead lies many troubles to overcome. 
A literature review into the inception of the field, as well as where do we go from here." title: "machine learning, as inception of time, a literature review" date: 2024-10-07 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review.html.md --- See also [essays on ChatGPT](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../posts/chatgpt), [case study on Cambridge Analytica](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/university/twenty-four-twenty-five/engineer-4a03/case-study) ## introduction. To understand how AI is fundamentally political, we need to go beyond neural nets and statistical pattern recognition to instead ask _what_ is being optimized, and _for whom_, and _who_ gets to decide. Then we can trace the implications of those choices. -- Kate Crawford, _The Atlas of AI_ 1979’s “Star-Trek: the Motion Picture” centered around the antagonist, V’Ger, an artificial entity that have outgrown its original programs, sought annihilation upon planet Earth. At the core, the movie is mostly fictional, yet its prevalence to our current state of affairs is uncanny. Much in Artificial intelligence (AI) has changed since 1960s, including a shift in symbolic systems to more recent hype about deep connectionist networks. AI has expanded rapidly as a academia field and as a industry[^1]. Yet, the belief of formalising human intelligence and reproduced by machine has always been the core disputes in the history of AI. There has always been two narratives discussed within academia and industry practitioners on how we should approach such systems: The likes of Marvin Minsky claiming “machine can think” ([CRAWFORD, 2021, pp. 5–9](#bib-atlasofai)); while Dreyfus ([Dreyfus, 2008](#bib-dreyfus2008why)) believed in a Heideggerian AI system would dissolve the framing problem[^framing]. Nowadays, this narrative morphs into two verticals: Entities that seek to build systems capable of outperforming at tasks that a human can do at a greater degree of accuracy and efficiency (OpenAI, Anthropic, SSI, many AI labs, etc.[^ssi]), and companies that build AI systems to amplify our abilities to create and improve efficiency for our work (Runway, Cohere, etc.). This literature review aims to provide a comprehensive overview of the current state of AI, through its history and current adoption. It will also include investigations into certain concerns for diversity, equity, and inclusion (DEI) within the field, as well as the ethical implications of AI systems. It will then conclude and posit questions about where we go from here. ## growth. _Mathematicians wish to treat matters of perception mathematically, and make themselves ridiculous \[...] the mind \[...] does it tacitly, naturally, and without technical rules._ -- Pascal, _Pensées_ The inception of [AI](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/Machine-learning) might well begin when the belief of a total formalisation of knowledge must be possible[^2]. From Plato’s dichotomy of the rational soul from the body with its skills and intuition[^3], to Leibniz’s conception of the binary systems as a “universal characteristics” ([Leibniz, 1951, pp. 
15, 25, 38](#bib-leibniz_selections_1951)) that led to Babbage’s design of “Analytic Engine” being recognized as the “first digital computer”, Alan Turing posited that a high-speed digital computer, programmed with rules, might exhibit [emergent behaviour](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/emergent-behaviour) of [intelligence](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/intelligence) ([TURING, 1950](#bib-10.1093/mind/lix.236.433)). Thus, a paradigm among researchers that focused on symbolic [reasoning](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/reason) was born, referred to as Good Old-Fashioned AI (GOFAI) ([Haugeland, 1997](#bib-10.7551/mitpress/4626.001.0001)). GOFAI was built on a high level symbolic representation of the world, popularized through expert systems ([Jackson, 1998](#bib-jackson_introduction_1998)) that tried to mimic human expert on specialized tasks [^4]. Yet, we observed a period of “AI Winter” where most symbolic AI research either reached dead end or funding being dried up ([Hendler, 2008](#bib-handler2008avoidanotheraiwinter)). This is largely due to GOFAI’s semantic representation which were implausible to scale to generalized tasks. Concurrently, Donald Norman’s Parallel Distributed Processing ([Rumelhart et al., 1986](#bib-10.7551/mitpress/5236.001.0001)) group investigated variations of Rosenblatt’s project ([Rosenblatt, 1958](#bib-rosenblatt1958perceptron)), where they proposed intermediate processors within the network (often known as “hidden layers”) alongside with inputs and outputs to extrapolate appropriate responses based on what it had learned during training process. These systems, built on top of statistical methods[^5] and connectionist networks are often referred to by Haugeland as New-Fangled AI (NFAI) ([Haugeland, 1997](#bib-10.7551/mitpress/4626.001.0001)). In retrospect, GOFAI are [deterministic](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/Determinism) in a sense that intentionality is injected within symbolic tokens through explicit programming. Connectionist networks, on the other hand, are often considered as black-box models, given their hidden nature of intermediate representations of perceptron. Unlike GOFAI, its internal representation is determined by the state of the entire network rather than any single unit. Given the rise of Moore’s Law and the exponential amount of computing and data available, we are currently witnessing the dominance of connectionist networks, especially with the injection of LLMs into the mainstream ([Kaplan et al., 2020](#bib-kaplan2020scalinglawsneurallanguage)), where the majority of research are focused on developing artificial neural networks that optimizes around loss functions ([Vaswani et al., 2023](#bib-vaswani2023attentionneed)) ([Srivastava et al., 2014](#bib-srivastava_dropout_2014)). One notable example that combines both GOFAI and NFAI systems is AlphaZero, a connectionist network based Go playing systems, that uses a deep neural networks to assess new positions and Monte-Carlo Tree Search (a GOFAI algorithm) to determine its next move ([Silver et al., 2017](#bib-silver2017masteringchessshogiselfplay)). ## adoption. 
For context, we produce a lot of data: social media consumption, emails transaction, search, online shopping, mainly due to the rise of the internet and Web 2.0 post 9/11. While capitialism has always been a fraught system, there are incentives for harvesting our attention and predict our future behaviour — what Zuboff refers to as “surveillance capitalism” ([Carr, 2019](#bib-carr2019thieves)). In a sense, surveillance capitalism is built on top of the notion of _extraction imperatives_ where the Google and Facebook of the world have to mine as much information as possible [^6]. Machine learning benefited of this phenomenon since statistical methods often predict certain pattern from given data and yield certain predictions/decisions. ML can be categorized into two sub fields, supervised learning (where algorithms are trained on labelled data to provide prediction based on given labels) and unsupervised learning (where algorithms are trained on the basis of “produce _y_ in the form of _x_”)[^7]. Supervised learning methods including Naive Bayes, Decision tree, and other Bayesian models have been well integrated into industries to solve forecasting and classification problems ([Wu et al., 2020](#bib-zhang2020labelingmethod)) ## fairness See also: MIT Press ([Hao et al., 2019](#bib-haokarbuolamwini2019)), Darthmouth investigation in COMPAS system ([Dressel, 2018](#bib-doi:10.1126/sciadv.aao5580)) DEI has become a key aspect of technological progress in the $21^{\text{st}}$ century. This applies to AI, where its black-box nature has proven to be difficult for researchers to align certain bias bugs. Two main DEI methods emerge for addressing given problems: improving data diversity and ensuring fairness during the training procedure. The primary methods on fighting against bias bugs in contemporary AI system includes increase in data diversity. There is a timeless saying in computer science “[Garbage in Garbage out](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/Garbage-in-Garbage-out)”, which essentially states that bad data will produce outputs that’s of equal quality. This is most prevalent in AI, given the existence of these networks within a black-box model. One case of this is the very first iterations of Google Photos’ image recognition where it identified people with darker skins as “gorillas” ([BBC News, 2015](#bib-bbcgoogleapology2015)). Alliances such as The Data & Trust Alliance, including Meta, Nike, CVS Health, are formed to regulate and combat algorithmic bias. The Data & Trust Alliance aims to confront dangers of powerful algorithms in the work force before they can cause harm instead of simply reacting after the damage is done (Lohr, 2021). (Clarke, 2021) proposed that close inspection and regulation of these models should be monitored closely to mitigate misrepresentation of marginalized groups (Khan, 2022). Truth is, data lacks context. A prime example of this US’ COMPAS used by US courts to assess the likelihood of criminal to reoffend. ProPublica concluded that COMPAS was inherently biased towards those of African descent, citing that it overestimated the false positives rate for those of African descent by two folds ([Angwin et al., 2016](#bib-angwinlarsonmattukirchner2016)). 
Interestingly, a study done at Darthmouth showed a surprising accuracy on the rate of recidivism with random volunteers when given the same information as the COMPAS algorithm ([Dressel, 2018](#bib-doi:10.1126/sciadv.aao5580)). The question remains, how do we solve fairness and ensure DEI for marginalized groups when there is obviously prejudice and subjectivity that introduce bias at play? It is not a problem we can’t solve, rather collectively we should define what makes an algorithm **fair**. ## Bibliographie - Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A Learning Algorithm for Boltzmann Machines. _Cognitive Science_, _9_(1), 147–169. - Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). How We Analyzed the COMPAS Recidivism Algorithm. _ProPublica_. - Aristotle. (2009). _Nicomachean Ethics_ (L. Brown, Ed.; W. D. Ross, Trans.). Oxford University Press. - BBC News. (2015). Google apologises for Photos app’s racist blunder. _BBC News_. - Carr, N. (2019). Thieves of Experience: How Google and Facebook Corrupted Capitalism. _Los Angeles Review of Books_. - CRAWFORD, K. (2021). _The Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence_. Yale University Press. - Dressel, J., & Hany Farid. (2018). The accuracy, fairness, and limits of predicting recidivism. _Science Advances_, _4_(1), eaao5580. - Dreyfus, H. L. (1972). _What Computers Can’t Do: A Critique of Artificial Reason_ (1st ed.). Harper & Row. - Dreyfus, H. L. (2008). Why Heideggerian AI Failed and How Fixing It Would Require Making It More Heideggerian. In _The Mechanical Mind in History_ (pp. 331–362). MIT Press. - Hao, K., Kar, J., & Buolamwini, J. (2019). Can you make AI fairer than a judge? Play our courtroom algorithm game. _MIT Technology Review_. - Haugeland, J. (1997). _Mind Design II: Philosophy, Psychology, and Artificial Intelligence_. The MIT Press. - Hendler, J. (2008). Avoiding Another AI Winter. _IEEE Intelligent Systems_, _23_(2), 2–4. - Jackson, P. (1998). _Introduction to Expert Systems_ (3rd ed., p. 542). Addison Wesley. - Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. _Science_, _349_(6245), 255–260. - Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). _Scaling Laws for Neural Language Models_. arXiv preprint arXiv:2001.08361 [\[arxiv\]](https://arxiv.org/abs/2001.08361) - Leibniz, G. W. (1951). _Leibniz Selections_ (P. P. Wiener, Ed.; p. 606). Charles Scribner’s Sons. - McKinsey & Company. (2024). McKinsey technology trends outlook 2024. _McKinsey Digital_. - Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. _Psychological Review_, _65_(6), 386–408. - Rumelhart, D. E., McClelland, J. L., & Group, P. R. (1986). _Parallel Distributed Processing, Volume 1: Explorations in the Microstructure of Cognition: Foundations_. The MIT Press. - Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., & Hassabis, D. (2017). _Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm_. arXiv preprint arXiv:1712.01815 [\[arxiv\]](https://arxiv.org/abs/1712.01815) - Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. 
- Turing, A. M. (1950). Computing Machinery and Intelligence. _Mind_, _LIX_(236), 433–460.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2023). _Attention Is All You Need_. arXiv preprint arXiv:1706.03762 [\[arxiv\]](https://arxiv.org/abs/1706.03762)
- Wu, D., Wang, X., Su, J., Tang, B., & Wu, S. (2020). A Labeling Method for Financial Time Series Prediction Based on Trends. _Entropy_, _22_(10).

[^1]: ([Jordan & Mitchell, 2015](#bib-jordan2015machine)) described the emerging trends within classical machine learning systems, focusing on recommendation systems. In its 2024 technology trends outlook, McKinsey reported around 570bn dollars of equity investment in the adoption of generative AI, notably the integration of LLMs into enterprise use cases ([McKinsey & Company, 2024](#bib-mckinsey2024techtrends))

[^framing]: An intelligent being learns from its experience, then applies such intuition to predict future events. How does one select the appropriate context (frame) for a given situation?\ Dreyfus’ argument is that machines are not yet able to represent humans’ reliance on many unconscious and subconscious processes ([Dreyfus, 1972](#bib-dreyfus1972what)). A Heideggerian AI would exhibit Dasein (being-in-the-world).

[^ssi]: Their goal is to build “artificial super intelligence” (ASI) systems. This target is largely due to a certain observer-expectancy effect we observe in current AI systems.

[^2]: According to [Plato](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato), Socrates asked Euthyphro, a fellow Athenian who is about to turn in his own father for murder in the name of piety: “I want to know what is characteristic of piety which makes all actions pious. \[…] that I may have it to turn to, and to use as a standard whereby to judge your actions and those of other men.” This is Socrates’ version of an [effective procedure](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/engineer-4a03/literature-review/../../../../../../../../thoughts/effective-procedure) for modern-day computer scientists.

[^3]: According to Plato, all knowledge must be universally applicable with explicit definitions; in other words, intuition and feeling do not count as knowing. Aristotle differed from Plato in holding that intuition is necessary for applying theory to practice ([Aristotle, 2009, p. 8, book VI](#bib-aristotle_nicomachean_ethics)). For Plato, cooks, who proceed by taste and intuition, have no understanding because they have no knowledge; intuition is considered a mere belief.

[^4]: Allen Newell and Herbert Simon’s work at RAND initially showed that computers can simulate important aspects of intelligence.

[^5]: Notable figures include John Hopfield, and Hinton’s “A Learning Algorithm for Boltzmann Machines” ([Ackley et al., 1985](#bib-ackley_learning_1985)), which introduced Boltzmann distributions into the training of neural networks, as well as Hinton’s later work on the backpropagation algorithm.

[^6]: Some notable quotes:

    - “Unlike financial derivatives, which they in some ways resemble, these new data derivatives draw their value, parasite-like, from human experience.”
    - “\[Facebook’s algorithm fine-tuning and data wrangling] is aimed at solving one problem: how and when to intervene in the state of play that is your daily life in order to modify your behavior and thus sharply increase the predictability of your actions now, soon, and later.”

[^7]: This is, of course, a simplification of the field; ML researchers also investigate many more specific sub-fields

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/DBMS
tags:
  - sfwr3db3
  - university
description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/DBMS"
title: "DBMS"
date: 2024-09-04
permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/DBMS.html.md
---

Book: Database Management System [ISBN-13:978-0072465631](https://www.amazon.ca/Database-Management-Systems-Raghu-Ramakrishnan/dp/0072465638)

> [!tip] Midterm
>
> Thurs Oct. 24 2024 (during lecture time). Due at 2200, late penalty of 20% per 24h, max 5 days.

```bash
ssh se3db3
```

Relational Model, E-R Model, Views, Indexes, Constraints, Relational Algebra

- 2.5 exabytes of [data](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/DBMS/../../../../../../../../thoughts/data) per day.

## [search](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/DBMS/../../../../../../../../thoughts/Search) vs. query

- indexed keyword
- [PageRank](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/DBMS/../../../../../../../../thoughts/PageRank)
- data independence
- fault tolerant
- concurrency control for transactions
- reliable storage to maintain semantics

## independence

- logical: protection from changes in _logical_ structure
- physical: protection from changes in _physical_ structure

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/Entity-Relationship-Models
tags:
  - sfwr3db3
description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/Entity-Relationship-Models"
title: "Entity-Relationship Models"
date: 2024-09-11
permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/Entity-Relationship-Models.html.md
---

## E/R model

> sketch database schemas including constraints.

- Entity set = rectangle
- Attribute = oval, with a line to the rectangle (representing its entity set)

## relationship

- connects two or more entity sets.
- represented by a _diamond_

The value of a relationship is a **relationship set**

### many-to-many relationship

> an entity of either set can be connected to many entities of the other set.

### many-to-one relationship

> each entity of the first set can be connected to at most one entity of the second set, while an entity of the second set can be connected to any number of entities of the first set.
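To make the mapping concrete, here is a minimal sketch of how these relationships are typically realized in SQL. The table and column names are illustrative only (not from the lecture): a many-to-many relationship becomes its own relation keyed by both entity keys, while a many-to-one relationship collapses into a foreign key on the “many” side.

```sql
-- hypothetical entity sets; names are illustrative, not from the lecture
CREATE TABLE Advisor (
  aid  INTEGER PRIMARY KEY,
  name CHAR(30)
);

-- many-to-one relationship (each student has at most one advisor):
-- realized as a foreign key on the "many" side, no separate relation needed
CREATE TABLE Student (
  sid        INTEGER PRIMARY KEY,
  name       CHAR(30),
  advisor_id INTEGER REFERENCES Advisor(aid)
);

CREATE TABLE Course (
  cid   INTEGER PRIMARY KEY,
  title CHAR(50)
);

-- many-to-many relationship "Takes": its own relation,
-- keyed by the combination of both entity keys
CREATE TABLE Takes (
  sid INTEGER REFERENCES Student(sid),
  cid INTEGER REFERENCES Course(cid),
  PRIMARY KEY (sid, cid)
);
```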
--- slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/Keys-and-Foreign-Keys tags: - sfwr3db3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/Keys-and-Foreign-Keys" title: "Foreign Keys and Relational Models" date: 2024-09-09 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/Keys-and-Foreign-Keys.html.md --- See also [slides](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/Keys-and-Foreign-Keys/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/relationalModel_Sept5.pdf) > A relation is a table Relations are **unordered** ⇒ _relations are sets_ ## tuple and domain constraints - tuple: expresses conditions on the values of each tuple - domain constraint: tuple constrain that involves a single attributes ```sql (GPA <= 4.0) AND (GPA >= 0.0) ``` ## unique identifier > A _superkey_ is a set of attributes for a relation $r$ if $r$ cannot contain two distinct tuples $t_1$ and $t_2$ such that $t_1{[K]} = t_2{[K]}$ > A _(candidate) key_ for $r$ if $K$ is a minimal superkey ex: superkey of `RegNum` ## primary value handles `null` value > Presence of nulls in keys > [!tip] definition > > Each relation must have a **primary key** on which nulls are not allowed. > > notation: the attributes of the primary keys are _underlined_ ⇒ references between relations are realised through primary keys > [!note] Remark > > A set of fields is a _key_ for a relation if: > > 1. No two distinct tuples can have same values in all key fields > 2. This is not true for any subset of the key (minimal) > > If [#2](https://github.com/aarnphm/aarnphm.github.io/issues/2) is false, then a _superkey_ > > If there’s > 1 key for a relation, one of the keys is chosen to be _primary key_ Example: requirements: - For a given student and course, there is a single grade. ```sql CREATE TABLE Enrolled ( sid INTEGER, cid INTEGER, grade INTEGER, PRIMARY KEY (sid, cid), UNIQUE (cid, grade) ); ``` - Students can take only one course, and received a single grade for that courses; further, no two students in a course receive the grade ```sql CREATE TABLE Enrolled ( sid INTEGER, cid INTEGER, grade INTEGER, PRIMARY KEY sid, KEY (cid, grade) ); ``` > IC are validated when data is updated ## interpolation constraints (foreign keys) Referential integrity constraints _are imposed in order to guarantee **values** refer to existing tuples_ > [!note] Definition > > A _foreign key_ requires that the values on a set $X$ of attributes of a relation $R_1$ **must appear as values** for the _primary key_ of another relation $R_2$ Ex: _sid_ is a _foreign key_ referring to _Students_ > If al foreign key constraints are enforced ⇒ referential integrity is enforced ## enforcing referential integrity See also [source](https://www.ibm.com/docs/en/informix-servers/14.10?topic=integrity-referential) --- slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/a1/content tags: - sfwr3db3 - assignment description: "some notes about entity-relationship models and foreign keys" title: "E/R models and keys" date: 2024-09-26 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/a1/content.html.md --- **Problem 1**: Consider the relations `PLAYERS` and `PLAYS` given by the schemas below. 
- `PLAYERS (playerID, firstName, lastName, gender, DOB, height, weight, drafted)` - `PLAYS (playerID, teamID, teamName, number, position, startYear)` PLAYERS provides information on all basketball players in the league, giving the playerID, first name and last name of the player, the gender, the date of birth (DOB), the player’s height and weight, and the year they were drafted into the league. PLAYS provides information about which players play on which teams. A player with playerID plays on a team with a teamID and team name. The player has a number, the position they play on the team, and the year they started playing with this team. For example, playerID 5 plays with teamID 1, the Toronto Raptors, with the number 4, in the point guard position, since 2021. Given these schemas, answer the following questions: > [!question] 1.a (9 marks) > > Identify three candidate keys. For each candidate key, describe the key, and briefly state the assumptions or conditions under which each candidate key would be valid Candidate keys: 1. $\text{playerID}$ in `PLAYERS` relation: - description: playerID contains a sole attribute, so it is minimal superkey. Given that each player will have unique `playerID` - assumption: each players has unique playerID 2. $\{\text{playerID}, \text{teamID}, \text{number}\}$ in `PLAYS` relation: - description: $\{\text{playerID}, \text{teamID}, \text{number}\}$ is minimal superkey given assumption. - assumption: A player uses the same number for their duration at a given team. 3. $\{\text{playerID}, \text{teamID}, \text{startYear}\}$ in `PLAYS` relation: - description: $\{\text{playerID}, \text{teamID}, \text{startYear}\}$ identifies the assumption, making it a minimal superkey. - assumption: A player can only be associated with a team at a given period in time. > [!question] 1.b (6 marks) > > List three integrity constraints that should hold over these relations. For each constraint, describe in one sentence why your constraint is necessary. 1. `playerID` in `PLAYS` references `playerID` in `PLAYERS`: - reason: foreign key constraint is necessary to ensure referential integrity, in other word, every player in `PLAY` must exist in `PLAYERS` 2. `drafted` in `PLAYERS` must be less than or equal to `startYear` in `PLAYS`: - reason: temporal integrity constraint, i.e., a player cannot start playing for a team before they were drafted into the league 3. $\{\text{teamID}, \text{number}\}$ in `PLAYS` table must be unique per `playerID` - reason: uniqueness constraint, i.e., no two players on the same team have the same number at any point in time --- **Problem 2**: You will prepare an E-R diagram describing the schema of airline operations storing information in an airline database. MacAir Aviation manages flight operations, passenger services, fleet maintenance, and staff. The company, henceforth referred to as “MacAir”, has hired you to design their database. MacAir wants to store information about people, where a person is represented with a person ID, name, age, and phone number. There are four types of persons: passenger, pilot, cabin crew, and ground staff: - A passenger has a dietary preference (e.g., ‘Vegan’, ‘Gluten-Free’, ‘Lactose- Free’, etc.). - A pilot, and a cabin crew both have a position (e.g., ‘Captain’, ‘First Officer’, etc.) and a salary. - Ground staff have attributes for salary and department (e.g. Billing and invoicing, Information Technology, etc.). 
An airline ticket has a 13-digit numeric ticket number, a seat number (e.g., 38A, 2E, etc.), and a class (‘E’, ‘B’, or ‘F’, representing economy, business, and first-class, respectively). Passengers book one or more tickets through a travel website (e.g., ‘Expedia’, ‘SkyScanner’, etc.) with an associated price. A ticket is bought by exactly one passenger. MacAir records an airline with an identifying alias, which is a 2-letter alphabetic code (‘AC’ for Air Canada), and the airline name (e.g., ‘Air Canada’). Airplanes have a serial number, a manufacturer, and a model (e.g. 737MAX). A pilot flies many airplanes, however, an airplane must be flown by at least one pilot. A cabin crew member works for at most one airline, and an airline has to have at least one cabin crew member working for it. An airline must own at least one airplane, but an airplane is owned by exactly one airline. A country has a code (a 3-letter alphabetic code, e.g., ‘CAN’ for Canada), a name, and a continent. An airport has an IATA code (International Air Transport Association, 3-letter alphabetic code, e.g., ‘YYZ’ for Toronto Pearson Airport), a name, and a city. A country has zero or more airports, however, an airport must be in exactly one country. An airline belongs to exactly one country, but a country can have many airlines. Ground staff work for at most one airport but an airport must have at least one ground staff. A (flight) route is represented with a numeric ID, the number of stops (e.g., 0 for nonstop), and the duration (in hours). A route contains exactly one source airport and exactly one destination airport (e.g., source airport: ’YYZ’, destination airport: ’MCO’). However, airports serve as the source or destination on many routes. An airline has many routes around the world, and a route is used by many airlines. The entity ‘Scheduled Flights’ contains all flights that serve a route. Scheduled flights are defined via an alpha-numeric flight number, departure date, arrival date, scheduled departure time, scheduled arrival time, actual departure time, and actual arrival time. A scheduled flight contains exactly one route, but a route participates in many (scheduled) flights. For example, the ‘YYZ’ to ‘MCO’ route appears in the scheduled flights for (AC1670, Sept. 13, Sept 13, 17:45, 20:35, 18:00, 20:50) Airlines use at least one scheduled flight to conduct operations, but a scheduled flight is associated to exactly one airline. A ticket is bought for exactly one (scheduled) flight, and there must be at least one ticket purchased for a (scheduled) flight. Baggage is associated to exactly one ticket. We record the type of bags (i.e., carry-on, checked, oversized, strollers), total quantity of bags for each type (e.g., 2 carry-on bags, 2 checked bags, 1 stroller, total weight of all bags for a type (e.g., 30kg for carry-on bags, 60kg for checked bags, 5kg for stroller), and whether the bags (per type) are fragile. A ticket is associated to many (types of) bags. > [!question] 2.a > > Draw the ER diagram capturing the described requirements. You may use any drawing tool of your choice, but please ensure your ER diagram is clearly readable, and the notation you use is clear and consistent (i.e., notation from the lecture slides or textbook). > [!question] 2.b > > Give a brief (one sentence) description of each of your entities and relationships, and any constraints that exist. 
For example, $X$ is a weak entity with attributes $(a, b, c)$, and has a many-one relationship with $Y$ _Person_: denotes the meta definition of a person with attributes $(\text{id [PK], name, age, phone\_number})$ _Baggage_: is an entity with attributes $(\text{type}, \text{quantity}, \text{weight}, \text{is\_fragile})$, has a many-to-one relationship with _Ticket_ _Passenger_: is a subclass of _Person_, with attributes $(\text{dietary\_preference})$, has a one-many relationship with _Ticket_ _Ticket_: is a strong entity with atributes $(\text{ticket\_number [PK]}, \text{seat\_number, class, price, travel\_website})$, having one-to-many relationship with _Baggage_ _Pilot_: is a subclass of _Person_, with attributes $(\text{position},\text{salary})$, has a “fly” one-to-many relationship with _airplane_ _Cabin Crew_: is a subclass of _Person_, with attributes $(\text{position},\text{salary})$, has a “work” many-to-one relationship with _airline_ _Ground Staff_: is a subclass of _Person_, with attributes $(\text{department},\text{salary})$, has a “work” many-to-one relationship with _airport_ _airport_: is a strong entity with attributes $(\text{iata\_code [PK, FK]}, \text{name [PK]}, \text{city})$, has “has” one-to-many relationship with _Ground Staff_ and many-to-one with _country_ _country_: is a strong entity with attributes $(\text{code [PK]}, \text{name}, \text{continent})$, has one-to-many relationship with _airline_ _airline_: is a strong entity with attributes $(\text{name}, \text{alias [PK]})$, has one-to-many relationship with _scheduled\_flight_, and one-to-many with _airplane_ _airplane_: is a strong entity with attributes $(\text{serial\_number [PK]}, \text{manufacturer}, \text{model})$, has many-to-one relationship with _pilot_ _flight\_route_: is a strong entity with attributes $(\text{id [PK]}, \text{stop, duration})$, has one-to-many relationship with _scheduled\_flight_ and one-to-one with _airport_ through relationship `source` and `dest` _scheduled\_flight_: is a strong entity with attributes: $$ \begin{aligned} (\text{flight\_number [PK]}, \text{departure\_date}, \text{arrival\_date} & \\ \text{scheduled\_departure\_time}, & \text{scheduled\_arrival\_time}, \\ \text{actual\_departure\_time}, & \text{actual\_arrival\_time}) \end{aligned} $$ has one-to-many relationship with _flight\_route_ and one-to-many with _airport_ through relationship `source` Constraints: - All person id are unique. - An airline must own at least one airplane and have at least one cabin crew member. - An airplane must be flown by at least one pilot. - An airport must have at least one ground staff. - A scheduled flight must have at least one ticket purchased for it. - A country can have zero or more airports, but an airport must be in exactly one country. - An airline belongs to exactly one country. - A route contains exactly one source airport and one destination airport. - A scheduled flight contains exactly one route and is associated with exactly one airline. - A ticket is bought for exactly one scheduled flight and by exactly one passenger. > [!question] 2.c > > Provide the corresponding DB2 `CREATE TABLE`` statements describing the relational schema. Please include all your statements in an executable script `airline.ddl\` that can be run on the DB2 command line, in a single command. Ensure that your script runs on the CAS DB2 server. 
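The full script is the linked `airline.ddl` below; as an illustrative excerpt only (using the table and column names that the Assignment 2 queries later assume, e.g. `Country(Code)`, `Airline(Alias, CountryCode)`, `Airplane(SerialNo, AirlineAlias)`; the actual file may differ), the DDL might start like this:

```sql
-- hypothetical excerpt; the authoritative version is the linked airline.ddl
CREATE TABLE Country (
  Code      CHAR(3)     NOT NULL PRIMARY KEY,
  Name      VARCHAR(50),
  Continent VARCHAR(20)
);

CREATE TABLE Airline (
  Alias       CHAR(2)     NOT NULL PRIMARY KEY,
  Name        VARCHAR(50),
  CountryCode CHAR(3)     NOT NULL REFERENCES Country(Code)
);

CREATE TABLE Airplane (
  SerialNo     INTEGER     NOT NULL PRIMARY KEY,
  Manufacturer VARCHAR(30),
  Model        VARCHAR(20),
  AirlineAlias CHAR(2)     NOT NULL REFERENCES Airline(Alias)
);
```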
See also: [airline.ddl](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/a1/airline.ddl) --- slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/a2/content tags: - sfwr3db3 - assignment description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/a2/content" title: "SQL and Relational Algebra" date: 2024-11-11 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/a2/content.html.md --- ## 1. SQL > [!question] Q1 > > Find all passengers, between the ages of 20 and 30 (inclusive), who have a “Vegan” or “Vegetarian” dietary preference. Return their ID, name, and age. ```sql SELECT p.personid AS id, p.name, p.age FROM person p JOIN passenger pass ON p.personid = pass.personid WHERE p.age BETWEEN 20 AND 30 AND pass.dietarypref IN ('Vegan', 'Vegetarian') ORDER BY p.personid; ``` > [!question] Q2 > > a. Find the number of airplanes that exist for each model. Return the model and the count for each model. b. Extend your query from (a) to find the number of airplanes in each model for any of the following airlines: ‘Air Canada’, ‘Etihad Airways’, or ‘United Airlines’. Return the name of the airline, the model, and the number of airplanes. ```sql -- Q2a SELECT model, COUNT(*) AS numairplanes FROM airplane GROUP BY model ORDER BY model; -- Q2b SELECT a.name AS airlinename, p.model, COUNT(*) AS numairplanes FROM airplane p JOIN airline a ON p.airlinealias = a.alias WHERE a.name IN ('Air Canada', 'Etihad Airways', 'United Airlines') GROUP BY a.name, p.model ORDER BY a.name, p.model; ``` > [!question] Q3 > > a. For each “Air Canada” ticket, find the average of the total weight, for all baggage associated to the ticket. Return the ticket number, and the average total (baggage) weight. b. Find all tickets with “Oversized”, non-fragile baggage with a total weight (strictly) greater than 90 lbs, during the holiday season from Dec. 10, 2023 to Jan. 3, 2024 (inclusive). Return all qualifying ticket numbers, and the total `(Oversized)` baggage weight. ```sql -- Q3a SELECT t.ticketno, AVG(b.totalweight) AS AverageBaggageWeight FROM ticket t JOIN scheduledflight sf ON t.flightno = sf.flightno AND t.flightdepdate = sf.depdate JOIN airline a ON sf.airlinealias = a.alias LEFT JOIN baggage b ON t.ticketno = b.ticketno WHERE a.name = 'Air Canada' GROUP BY t.ticketno ORDER BY t.ticketno; -- Q3b SELECT b.ticketno, b.totalweight AS OversizedBaggageWeight FROM baggage b JOIN ticket t ON b.ticketno = t.ticketno JOIN scheduledflight sf ON t.flightno = sf.flightno AND t.flightdepdate = sf.depdate WHERE b.bagtype = 'Oversized' AND b.fragile = FALSE AND b.totalweight > 90 AND sf.depdate BETWEEN '2023-12-10' AND '2024-01-03' ORDER BY b.ticketno; ``` > [!question] Q4 > > Where and when are the cheapest tickets for flights from Toronto “YYZ” to Orlando “MCO”? Return the ticket number, the date of departure, the minimum price (rename to min-Price), and the website where the ticket(s) were purchased. 
```sql WITH MinPriceFlights AS ( -- First find the minimum price for this route SELECT MIN(b.Price) as min_price FROM Route r JOIN ScheduledFlight sf ON r.RouteID = sf.RouteID JOIN Ticket t ON sf.FlightNo = t.FlightNo AND sf.DepDate = t.FlightDepDate JOIN Book b ON t.TicketNo = b.TicketNo WHERE r.srcAirport = 'YYZ' AND r.dstAirport = 'MCO' ) SELECT t.TicketNo, sf.DepDate as DepartureDate, b.Price as minPrice, b.Website FROM Route r JOIN ScheduledFlight sf ON r.RouteID = sf.RouteID JOIN Ticket t ON sf.FlightNo = t.FlightNo AND sf.DepDate = t.FlightDepDate JOIN Book b ON t.TicketNo = b.TicketNo CROSS JOIN MinPriceFlights mpf WHERE r.srcAirport = 'YYZ' AND r.dstAirport = 'MCO' AND b.Price = mpf.min_price ORDER BY sf.DepDate; ``` > [!question] Q5 > > a. Which routes are served by at least three airlines? Return the routeID, and display your results in descending order by the number of airlines. b. Which routes are not served by any airline? Return the routeID, the source and destination airports ```sql -- Q5a SELECT u.RouteID, COUNT(DISTINCT u.AirlineAlias) as NumAirlines FROM Use u GROUP BY u.RouteID HAVING COUNT(DISTINCT u.AirlineAlias) >= 3 ORDER BY NumAirlines DESC; -- Q5b SELECT r.RouteID, r.srcAirport as SourceAirport, r.dstAirport as DestinationAirport FROM Route r LEFT JOIN Use u ON r.RouteID = u.RouteID WHERE u.AirlineAlias IS NULL ORDER BY r.RouteID; ``` > [!question] Q6 > > a. Find the number of distinct passengers who also work as either a pilot, cabin crew, or ground staff. Rename this result as NumStaffPassengers. b. For each airline, how many pilots or cabin crew are also passengers? Return the airline (alias), and the corresponding count ```sql -- Q6a SELECT COUNT(DISTINCT p.PersonID) as NumStaffPassengers FROM Passenger p WHERE p.PersonID IN ( SELECT PersonID FROM Pilot UNION SELECT PersonID FROM CabinCrew UNION SELECT PersonID FROM GroundStaff ); -- Q6b SELECT a.Alias as AirlineAlias, COUNT(DISTINCT p.PersonID) as StaffPassengerCount FROM Airline a LEFT JOIN ( -- Get all pilots and cabin crew SELECT PersonID, AirlineAlias FROM CabinCrew UNION -- For pilots, we need to get their airline through the planes they fly SELECT DISTINCT pi.PersonID, ap.AirlineAlias FROM Pilot pi JOIN Flies f ON pi.PersonID = f.PilotID JOIN Airplane ap ON f.AirplaneSNo = ap.SerialNo ) AS staff ON a.Alias = staff.AirlineAlias -- Join with Passenger to check which staff are also passengers JOIN Passenger pass ON staff.PersonID = pass.PersonID GROUP BY a.Alias ORDER BY a.Alias; ``` > [!question] Q7 > > a. Find all the one-way routes operated by airline “ACA”, i.e., airline alias = ‘ACA’. In this context, a one-way route is where the airline serves from a source airport to a destination airport, but not in the reverse direction. Return the route ID, and the corresponding source and destination airports, respectively. b. Find the most popular route where the departure date lies between “2023-12-01” to “2023-12-31” (inclusive). Popularity is defined as the maximum number of tickets purchased during this time duration. Return the route ID, the corresponding source and destination air- ports, and number of tickets sold along this route. 
```sql -- Q7a SELECT r1.RouteID, r1.srcAirport as SourceAirport, r1.dstAirport as DestinationAirport FROM Route r1 JOIN Use u1 ON r1.RouteID = u1.RouteID WHERE u1.AirlineAlias = 'ACA' AND NOT EXISTS ( -- Check if reverse route exists SELECT 1 FROM Route r2 JOIN Use u2 ON r2.RouteID = u2.RouteID WHERE u2.AirlineAlias = 'ACA' AND r2.srcAirport = r1.dstAirport AND r2.dstAirport = r1.srcAirport ) ORDER BY r1.RouteID; -- Q7b WITH RouteTickets AS ( -- Count tickets per route in December 2023 SELECT r.RouteID, r.srcAirport, r.dstAirport, COUNT(*) as TicketCount FROM Route r JOIN ScheduledFlight sf ON r.RouteID = sf.RouteID JOIN Ticket t ON sf.FlightNo = t.FlightNo AND sf.DepDate = t.FlightDepDate WHERE sf.DepDate BETWEEN '2023-12-01' AND '2023-12-31' GROUP BY r.RouteID, r.srcAirport, r.dstAirport ), MaxTickets AS ( -- Find the maximum ticket count SELECT MAX(TicketCount) as MaxCount FROM RouteTickets ) SELECT rt.RouteID, rt.srcAirport as SourceAirport, rt.dstAirport as DestinationAirport, rt.TicketCount as NumberOfTickets FROM RouteTickets rt, MaxTickets mt WHERE rt.TicketCount = mt.MaxCount ORDER BY rt.RouteID; ``` > [!question] Q8 > > a. Which Air Canada (alias “ACA”) flights from source airport “YYZ” to destination airport “MCO” have “First” class tickets? Return all satisfying flight numbers. b. Find all airlines that are unique to their country (i.e., they are the only airline for their country). Return the airline alias, airline name, and the country name ```sql -- Q8a WITH AirlinesPerCountry AS ( -- Count airlines per country SELECT c.Code as CountryCode, c.Name as CountryName, COUNT(*) as AirlineCount FROM Country c JOIN Airline a ON c.Code = a.CountryCode GROUP BY c.Code, c.Name HAVING COUNT(*) = 1 ) SELECT a.Alias as AirlineAlias, a.Name as AirlineName, apc.CountryName FROM Airline a JOIN AirlinesPerCountry apc ON a.CountryCode = apc.CountryCode ORDER BY apc.CountryName, a.Name; -- Q8b SELECT a1.Alias as AirlineAlias, a1.Name as AirlineName, c.Name as CountryName FROM Airline a1 JOIN Country c ON a1.CountryCode = c.Code WHERE NOT EXISTS ( SELECT 1 FROM Airline a2 WHERE a2.CountryCode = a1.CountryCode AND a2.Alias != a1.Alias ) ORDER BY c.Name, a1.Name; ``` ## 2. Relational Algebra > [!question] Question > > For queries Q1 - Q6, give the corresponding relational algebra expression ### Q1 $$ \begin{align} & R_1 = \text{Person} \bowtie_{\text{Person.PersonID} = \text{Passenger.PersonID}} \text{Passenger} \\[6pt] & R_2 = \sigma_{\substack{ \text{Age} \geq 20 \\ \wedge \, \text{Age} \leq 30 \\ \wedge \, \big(\text{DietaryPref} = \text{'Vegan'} \\ \phantom{\wedge \,} \vee \, \text{DietaryPref} = \text{'Vegetarian'}\big) }} (R_1) \\[6pt] & \text{Result} = \pi_{\text{PersonID}, \, \text{Name}, \, \text{Age}} (R_2) \end{align} $$ ### Q2 a. $$ \gamma_{\text{Model}, \text{count}(*) \rightarrow \text{NumAirplanes}}(\text{Airplane}) $$ b. $$ \begin{align} & R_1 = \text{Airplane} \bowtie_{\text{AirlineAlias = Alias}} \text{Airline} \\[6pt] & R_2 = \sigma_{\substack{ \text{Name} = \text{'Air Canada'} \\ \vee \, \text{Name} = \text{'Etihad Airways'} \\ \vee \, \text{Name} = \text{'United Airlines'} }} (R_1) \\[6pt] & \text{Result} = \gamma_{\substack{ \text{Name}, \text{Model}, \\ \text{count}(*) \rightarrow \text{NumAirplanes} }} (R_2) \end{align} $$ ### Q3 a. 
$$ \begin{align} & R_1 = \text{Ticket} \bowtie_{ \substack{ \text{FlightNo = FlightNo} \\ \wedge \, \text{FlightDepDate = DepDate} }} \text{ScheduledFlight} \\[6pt] & R_2 = R_1 \bowtie_{\text{AirlineAlias = Alias}} \text{Airline} \\[6pt] & R_3 = R_2 \Join_{\text{Ticket.TicketNo = Baggage.TicketNo}} \text{Baggage} \\[6pt] & R_4 = \sigma_{\text{Name} = \text{'Air Canada'}} (R_3) \\[6pt] & R_5 = \pi_{\text{TicketNo}, \text{TotalWeight}} (R_4) \\[6pt] & \text{Result} = \\ & \quad \gamma_{\text{TicketNo}, \, \text{avg}(\text{TotalWeight}) \rightarrow \text{AverageBaggageWeight}} (R_5) \end{align} $$ _NOTE_: R2 should “\leftouterjoin” instead (but current limitation of LaTeX renderer) b. $$ \begin{align} & R_1 = \text{Ticket} \bowtie_{ \substack{ \text{FlightNo = FlightNo} \\ \wedge \, \text{FlightDepDate = DepDate} } } \text{ScheduledFlight} \\[6pt] & R_2 = \text{Baggage} \bowtie_{\text{TicketNo = TicketNo}} R_1 \\[6pt] & R_3 = \sigma_{\substack{ \text{BagType} = \text{'Oversized'} \\ \wedge \, \text{Fragile} = \text{False} \\ \wedge \, \text{TotalWeight} > 90 \\ \wedge \, \text{DepDate} \geq \text{'2023-12-10'} \\ \wedge \, \text{DepDate} \leq \text{'2024-01-03'} }} (R_2) \\[6pt] & \text{Result} = \pi_{\text{TicketNo, TotalWeight}} (R_3) \end{align} $$ ### Q4 $$ \begin{align} & R_1 = \sigma_{\substack{\text{srcAirport} = \text{'YYZ'} \\ \land \, \text{dstAirport} = \text{'MCO'}}} (\text{Route}) \\[6pt] & R_2 = R_1 \bowtie_{\text{Route.RouteID} = \text{ScheduledFlight.RouteID}} \text{ScheduledFlight} \\[6pt] & R_3 = R_2 \bowtie_{ \substack{ \text{ScheduledFlight.FlightNo} = \text{Ticket.FlightNo} \\ \land \, \text{ScheduledFlight.DepDate} = \text{Ticket.FlightDepDate} }} \text{Ticket} \\[6pt] & R_4 = R_3 \bowtie_{\text{Ticket.TicketNo} = \text{Book.TicketNo}} \text{Book} \\[6pt] & \text{MinPrice} = \mathcal{G}_{\emptyset, \, \text{min\_price} \leftarrow \text{MIN(Price)}} \Big( \Pi_{\text{Price}} (R_4) \Big) \\[6pt] & \text{Result} = \\ & \quad \Pi_{ \substack{ \text{TicketNo}, \, \text{DepDate} \rightarrow \text{DepartureDate}, \\ \text{Price} \rightarrow \text{minPrice}, \, \text{Website} }} \Big( \sigma_{\text{Price} = \text{min\_price}} (R_4 \times \text{MinPrice}) \Big) \end{align} $$ ### Q5 a. $$ \begin{align} R_1 &= \Pi_{\text{RouteID}, \text{AirlineAlias}} (\text{Use}) \\[8pt] R_2 &= \mathcal{G}_{\text{RouteID}, \text{NumAirlines} \leftarrow \text{COUNT}(\text{AirlineAlias})} (R_1) \\ \text{Result} &= \Pi_{\text{RouteID}} (\sigma_{\text{NumAirlines} \geq 3} (R_2)) \end{align} $$ b. $$ \begin{align} R_1 &= \text{Route} \: \Join_{\text{Route.RouteID = Use.RouteID}} \: \text{Use} \\[6pt] R_2 &= \sigma_{\text{AirlineAlias} \: \text{IS} \: \text{NULL}} (R_1) \\[6pt] \text{Result} &= \\ & \quad \Pi_{\text{RouteID}, \, \substack{ \text{srcAirport} \rightarrow \text{SourceAirport}, \\ \text{dstAirport} \rightarrow \text{DestinationAirport} }} (R_2) \end{align} $$ _NOTE_: Route should “\leftouterjoin” instead (but current limitation of LaTeX renderer) ### Q6 a. $$ \begin{align} & \text{Staff} = \\ & \quad \Pi_{\text{PersonID}} (\text{Pilot}) \space \cup \\ & \quad \Pi_{\text{PersonID}} (\text{CabinCrew}) \space \cup \\ & \quad \Pi_{\text{PersonID}} (\text{GroundStaff}) \\[6pt] & \text{StaffPassengers} = \\ & \quad \Pi_{\text{PersonID}} (\text{Passenger}) \cap \text{Staff} \\[6pt] & \text{Result} = \\ & \quad \mathcal{G}_{\emptyset, \, \text{NumStaffPassengers} \leftarrow \text{COUNT(PersonID)}} (\text{StaffPassengers}) \end{align} $$ b. 
$$ \begin{align} & \text{CabinCrewWithAirline} = \\ & \quad \Pi_{\text{PersonID}, \, \text{AirlineAlias}} (\text{CabinCrew}) \\[6pt] & \text{PilotsWithPlanes} = \\ & \quad \Pi_{\text{PersonID}, \, \text{AirlineAlias}} (\\ & \qquad \text{Pilot} \bowtie_{\text{Pilot.PersonID} = \text{Flies.PilotID}} \text{Flies} \\ & \qquad \bowtie_{\text{Flies.AirplaneSNo} = \text{Airplane.SerialNo}} \text{Airplane}\\ & \quad ) \\[6pt] & \text{AllStaffWithAirline} = \\ & \quad \text{CabinCrewWithAirline} \cup \text{PilotsWithPlanes} \\[6pt] & \text{StaffPassengers} = \\ & \quad \text{AllStaffWithAirline} \bowtie_{\text{PersonID}} \Pi_{\text{PersonID}} (\text{Passenger}) \\[6pt] & \text{Result} = \\ & \quad \mathcal{G}_{\text{AirlineAlias}, \, \text{StaffPassengerCount} \leftarrow \text{COUNT(PersonID)}} (\text{StaffPassengers}) \end{align} $$ ## 3. Indexes The following includes two possible indexes: ### $\text{(FlightNo, DeptDate)}$ on `ScheduledFlight` table - Attributes: (FlightNo, DeptDate) on `ScheduledFlight` table - Properties: composite index on both attributes , clustered index respectively - Benefits - Q3, Q4, Q7b given these queries heavily join with ScheduledFlight and filter on depature dates - composite nature supports queries that use both FlightNo and DepDate in joins (frequently due to the foreign key relationship with Ticket table) - Since these fields are part of the primary key of ScheduledFlight and are frequently used in joins with Ticket - help with range scan on DepDate ### $\text{(RouteID, AirlineAlias)}$ on `Use` table - Attributes: (RouteID, AirlineAlias) on `Use` table - Properties: composite index., unclustered index respectively - Benefits: - Q5a, Q5b, Q7a and indirect Q4 - given these rely on route-airline relationship - Q5a needs to count distinct airlines per route, so this index eliminate this scan - Q7a looks for ACA airline routes, so this will provide direct access - Being unclustered is appropriate as `Use` is frequently accessed for lookups but doesn’t require physical ordering --- slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/a3/content tags: - sfwr3db3 - assignment description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/a3/content" title: "DB design, concurrency and transaction" date: 2024-11-24 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/a3/content.html.md --- ## Database Design ### Finding Keys > [!question] a > > Consider a relation schema $R(A, B, C, D, E)$ and the set of functional dependencies > > $$ > \mathbf{F} = \{ A \rightarrow BC, \, CD \rightarrow E, \, B \rightarrow D, \, E \rightarrow A \} > $$ > > Find all candidate keys (minimal keys) of relation $R$. Show all the steps you took to derive each key, and clearly state which of Armstrong’s axioms are used in each step. 
**Closure of $A$**

$A \rightarrow BC$ gives $A^{+} = \{A, B, C\}$ (decomposition)

$B \rightarrow D$ gives $A^{+} = \{A, B, C, D\}$ (transitivity)

$CD \rightarrow E$ gives $A^{+} = \{A, B, C, D, E\}$ (transitivity, since $C, D \in A^{+}$)

> $A$ is a candidate key

**Closure of $B$**

$B^{+} = \{B\}$

$B \rightarrow D$ gives $B^{+} = \{B, D\}$

No other FD can be applied, thus $B$ is _not_ a key

**Closure of $C$**

$C^{+} = \{C\}$: no FD has a left-hand side contained in $\{C\}$, so we can’t derive all attributes; thus $C$ is _not_ a key

**Closure of $D$**

No applicable dependencies, thus $D$ is _not_ a key

**Closure of $E$**

$E \rightarrow A$ gives $E^{+} = \{E, A\}$ (FD)

$A \rightarrow BC$ gives $E^{+} = \{E, A, B, C\}$ (decomposition and transitivity)

$B \rightarrow D$ gives $E^{+} = \{E, A, B, C, D\}$ (transitivity)

> $E$ is a candidate key

**Closure of $CD$**

Initially $(CD)^{+} = \{C, D\}$

$CD \rightarrow E$ gives $(CD)^{+} = \{C, D, E\}$ (FD)

$E \rightarrow A$ gives $(CD)^{+} = \{C, D, E, A\}$ (transitivity)

$A \rightarrow BC$ gives $(CD)^{+} = \{C, D, E, A, B\}$ (transitivity and decomposition)

> $CD$ is a candidate key

**Closure of $BC$**

Initially $(BC)^{+} = \{B, C\}$

$B \rightarrow D$ gives $(BC)^{+} = \{B, C, D\}$ (FD)

$CD \rightarrow E$ gives $(BC)^{+} = \{B, C, D, E\}$ (transitivity)

$E \rightarrow A$ gives $(BC)^{+} = \{B, C, D, E, A\}$ (transitivity)

> $BC$ is a candidate key

> [!quote] conclusion
>
> Final candidate keys are $A, E, CD, BC$

> [!question] b
>
> Is the FD $AB \rightarrow C$ entailed by **F**? Show your work that supports your answer

We will find the closure of $AB$ under $F$: initially $(AB)^+ = \{A, B\}$.

$A \rightarrow BC$ entails $A \rightarrow B$ and $A \rightarrow C$ (decomposition); augmenting $A \rightarrow C$ with $B$ gives $AB \rightarrow CB$, and decomposing $AB \rightarrow CB$ gives $AB \rightarrow C$.

Therefore $AB \rightarrow C$ is entailed by $F$.

### Minimal Cover

> [!question] Question
>
> Given the relational schema $T(A, B, C, D)$ and the FDs: $F = \{ABC \rightarrow D, CD \rightarrow A, CA \rightarrow B, AD \rightarrow C, CD \rightarrow B \}$, compute the minimal cover $F^{'}$ of $F$. Show all your work (derivation) to compute $F^{'}$

We have the following FDs

$$
\begin{aligned}
ABC &\rightarrow D \\
CD &\rightarrow A \\
CA &\rightarrow B \\
AD &\rightarrow C \\
CD &\rightarrow B
\end{aligned}
$$

1. Split the RHS of each FD into single attributes
   - Already in this form
2. Minimize the LHS by removing extraneous attributes

FD1: $ABC \rightarrow D$

- $B$ is extraneous given that $AC \rightarrow D$ holds:
  - $(AC)^{+} = \{A, C\}$
  - $CA \rightarrow B$ then adds $B$ to the closure
  - $ABC \rightarrow D$ then adds $D$ to the closure

> Update FD1: $AC \rightarrow D$

FD2: $CD \rightarrow A$

- no FD applies if either $C$ or $D$ is assumed extraneous, so it remains unchanged

FD3: $CA \rightarrow B$

- can’t reduce $CA \rightarrow B$ given that neither $C \rightarrow B$ nor $A \rightarrow B$ holds

FD4: $AD \rightarrow C$

- can’t reduce $AD \rightarrow C$ given that neither $A \rightarrow C$ nor $D \rightarrow C$ holds

FD5: $CD \rightarrow B$

- can’t reduce $CD \rightarrow B$ given that neither $C \rightarrow B$ nor $D \rightarrow B$ holds
3. Remove redundant FDs

FD5: $CD \rightarrow B$ can be derived from $CD \rightarrow A$ and $CA \rightarrow B$: without FD5, the closure of $CD$ starts as $(CD)^{+} = \{C,D\}$; $CD \rightarrow A$ gives $\{C,D,A\}$, and $CA \rightarrow B$ then gives $\{C,D,A,B\}$, so FD5 is redundant.

> [!quote] Final minimal cover is
>
> $F^{'} = \{AC \rightarrow D, CD \rightarrow A, CA \rightarrow B, AD \rightarrow C\}$

### Armstrong’s Axioms

> [!question] a
>
> Given the relational schema $R(A,B, C, D, E, F)$ and FDs $F_1: \{AB \rightarrow C, A \rightarrow D, CD \rightarrow EF\}$. Show that $AB \rightarrow F$

$$
\begin{align}
A &\rightarrow D &\text{(given)} \\
AB &\rightarrow DB &\text{(augmentation with } B) \\
AB &\rightarrow D \text{ and } AB \rightarrow B &\text{(decomposition)} \\
AB &\rightarrow CD &\text{(union of } AB \rightarrow C \text{ and } AB \rightarrow D) \\
AB &\rightarrow EF &\text{(transitivity with } CD \rightarrow EF) \\
AB &\rightarrow F &\text{(decomposition)}
\end{align}
$$

> [!question] b
>
> Given the relational schema $R(A,B, C, D, E, F)$ and FDs $F_1: \{C \rightarrow D, BE \rightarrow A, BEF \rightarrow C \}$. Show that $BEF$ is a key

Proof: $(BEF)^{+} = \{A,B,C,D,E,F\}$ and $BEF$ is minimal

1. Proving the closure of $BEF$

We start with $(BEF)^{+} = \{B,E,F\}$

$BEF \rightarrow C$ and $B, E, F \in (BEF)^{+}$ by reflexivity, so we add $C$ to the closure: $(BEF)^{+} = \{B,E,F,C\}$

$C \rightarrow D$ with transitivity on $BEF \rightarrow C$ gives $BEF \rightarrow D$; add $D$ to the closure: $(BEF)^{+} = \{B,E,F,C,D\}$

$BE \rightarrow A$, thus add $A$ to the closure: $(BEF)^{+} = \{B,E,F,C,D,A\}$

Therefore, by union, the closure of $BEF$ covers all attributes

2. Minimality of $BEF$

Case 1: Remove $B$ from $BEF$:

- Computing $(EF)^{+}$, no FD applies, therefore $EF$ does not determine all attributes

Case 2: Remove $E$ from $BEF$:

- Computing $(BF)^{+}$, no FD applies, therefore $BF$ does not determine all attributes

Case 3: Remove $F$ from $BEF$:

- Closure of $BE$ is $(BE)^{+} = \{B, E\}$
- $BE \rightarrow A$ means $(BE)^{+} = \{B, E, A\}$
- Nothing further can be added

Therefore $BEF$ is minimal

> $BEF$ is a key

### 3NF, BCNF

> [!question] 1
>
> List all functional dependencies and keys that can be inferred from this information

1. **functional dependencies**

For the Company table:

FD1: $\text{companyID} \rightarrow \text{companyName, cityName, country, assets}$

FD2: $\text{companyName, cityName} \rightarrow \text{companyID, country, assets}$

Candidate keys: $\text{companyID}$ (minimal key based on FD1) and $\text{companyName, cityName}$ (based on FD2)

For the Department table:

FD3: $\text{deptID} \rightarrow \text{deptName, companyID, cityName, country, deptMgrID}$

FD4: $\text{companyID, deptName} \rightarrow \text{deptID, cityName, country, deptMgrID}$

FD5: $\text{deptMgrID} \rightarrow \text{deptID}$

Candidate keys: $\text{deptID}$ and $\text{companyID, deptName}$

For the City table:

FD6: $\text{cityID} \rightarrow \text{cityName, country}$

FD7: $\text{cityName, country} \rightarrow \text{cityID}$

Candidate keys: $\text{cityID}$ and $\text{cityName, country}$

> [!question] 2
>
> Do the schemas satisfy either BCNF or 3NF?

Note that both the Company and City tables satisfy BCNF;
however, for the Department table:

For FD3, $\text{deptID}$ is a candidate key, thus it satisfies BCNF

For FD4, $\text{companyID, deptName}$ is a candidate key, thus it satisfies BCNF

But FD5 violates BCNF, given that $\text{deptMgrID}$ is not a candidate key

**Improvement**

- Create a new table DeptManager $(\text{deptMgrID, deptID})$ by decomposition
- remove $\text{deptMgrID}$ from the original table (now $\text{deptName, companyID, cityName, country}$)

This should satisfy BCNF

## Transactions and Concurrency

### Schedules

Consider schedules $S_{1}, S_{2}, S_{3}$. State which of the following properties holds (or not) for each schedule: _strict, avoid cascading aborts, recoverability_. Provide a brief justification for each answer.

> [!question] a
>
> $S_{1}: \text{r1(X); r2(Z); r1(Z); r3(X); r3(Y); w1(X); c1; w3(Y); c3; r2(Y); w2(Z); w2(Y); c2}$

- strict: _yes_, because each write is immediately followed by its commit ($\text{w1(X)}$ by $\text{c1}$, $\text{w3(Y)}$ by $\text{c3}$), so no transaction reads or overwrites uncommitted data; note that $\text{r3(X)}$ occurs before $\text{w1(X)}$ and therefore reads the original value
- avoid cascading aborts: yes, because every read sees either the initial value or a committed write ($\text{r2(Y)}$ follows $\text{c3}$)
- recoverability: yes, since $T_{2}$ reads data written by $T_{3}$ only after $T_{3}$ has committed

> [!question] b
>
> $S_{2}: \text{r1(X); r2(Z); r1(Z); r3(X); r3(Y); w1(X); w3(Y); r2(Y); w2(Z); w2(Y); c1; c2; c3}$

- strict: no, because $T_{2}$ reads uncommitted data from $T_{3}$ ($\text{r2(Y)}$ precedes $\text{c3}$)
- avoid cascading aborts: no, because $\text{r2(Y)}$ reads an uncommitted value from $T_{3}$
- recoverability: no, because $T_{2}$ reads Y written by $T_{3}$ but commits before $T_{3}$ commits

> [!question] c
>
> $S_{3}: \text{r1(X); r2(Z); r3(X); r1(Z); r2(Y); r3(Y); w1(X); w2(Z); w3(Y); w2(Y); c3; c1; c2}$

- strict: no, because $T_{2}$ writes to $Y$ after it has been modified by the uncommitted $T_{3}$
- avoid cascading aborts: yes, because all reads are from the initial state, not from an uncommitted transaction
- recoverability: yes, because $T_{2}$ commits after $T_3$

### Serialisability

Which of the following schedules is (conflict) serializable? For each serializable schedule, find the equivalent serial schedules.

> [!question] a
>
> $\text{r1(X); r3(X); w1(X); r2(X); w3(X)}$

$$
\begin{align*}
r_1(X) \rightarrow w_3(X) &: T_1 \text{ reads } X \text{ before } T_3 \text{ writes it} \implies T_1 \rightarrow T_3 \\
r_3(X) \rightarrow w_1(X) &: T_3 \text{ reads } X \text{ before } T_1 \text{ writes it} \implies T_3 \rightarrow T_1 \\
w_1(X) \rightarrow r_2(X) &: T_1 \text{ writes } X \text{ before } T_2 \text{ reads it} \implies T_1 \rightarrow T_2 \\
w_1(X) \rightarrow w_3(X) &: T_1 \text{ writes } X \text{ before } T_3 \text{ writes it} \implies T_1 \rightarrow T_3 \\
r_2(X) \rightarrow w_3(X) &: T_2 \text{ reads } X \text{ before } T_3 \text{ writes it} \implies T_2 \rightarrow T_3
\end{align*}
$$

The precedence graph contains a cycle between $T_{1}$ and $T_{3}$, thus this is **not conflict serializable**

> [!question] b
>
> $\text{r3(X); r2(X); w3(X); r1(X); w1(X)}$

$T_{2}$ only reads, but its read of $X$ still conflicts with the later writes: $r_2(X)$ precedes $w_3(X)$ and $w_1(X)$, giving edges $T_2 \rightarrow T_3$ and $T_2 \rightarrow T_1$, and $T_3$’s operations precede $w_1(X)$, giving $T_3 \rightarrow T_1$. The precedence graph is acyclic, so this is **conflict serializable**, with the single equivalent serial schedule:

$$
T_{2} \to T_{3} \to T_{1}
$$

### Locking

> [!question] Question
>
> Consider the following locking protocol:
>
> - Before a transaction T writes a data object A, T has to obtain an exclusive lock on A.
> - For a transaction T, we hold these exclusive locks until the end of the transaction.
> - If a transaction T reads a data object A, no lock on A is obtained.
>
> State which of the following properties are ensured by this locking protocol: serializability, conflict-serializability, recoverability, avoids cascading aborts, avoids deadlock. Explain and justify your answer for each property.

1. serializability
   - Not ensured: reads aren’t controlled by locks, so two transactions can read the same data item and then write it in conflicting orders (example: $\text{r1(X); r2(X); w2(X); w1(X)}$)
2. conflict-serializability
   - Not ensured, for the same reason as above
3. recoverability
   - Not ensured: if a transaction $T_j$ reads data written by $T_i$, then $T_j$ should commit only after $T_i$ commits; in this protocol, however, a transaction can read uncommitted data because reads take no locks (dirty reads), and it may commit before the writer does
4. avoids cascading aborts
   - Not ensured, given that dirty reads can happen (example: $\text{w1(X); r2(X)}$; if $T_{1}$ aborts, $T_{2}$ must also abort, causing cascading aborts)
5. avoids deadlock
   - Not ensured: reads never wait, but two transactions that each hold an exclusive lock on a different object and then request the other’s object will wait for each other in a cycle (e.g., $T_1$ locks X, $T_2$ locks Y, then $T_1$ requests Y and $T_2$ requests X)

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/index
tags:
  - university
  - sfwr3db3
description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/index"
title: "Databases"
date: 2024-10-29
permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/index.html.md
---

See also [databases](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/index/../../../../../../../../thoughts/databases)

---
slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm
tags:
  - sfwr3db3
description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm"
title: "databases internals"
date: 2024-10-23
permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm.html.md
---

## Practice

Q1.

- a. F
- b. F (wrong: must be T; a relation R(A,B,C) **may** have at most three minimal keys, not superkeys)
- c. T
- d. T
- e. T (any operation involving a null yields null)
- f. F (DML: data manipulation, not management)
- g. F (a weak entity set has one or more many-many relationship)
- h. F

Q3.
```prolog Product(maker, model, price) PC(model, speed) Printer(model, type) ``` - model is PK for all relations - `type` are “laser” and “ink-jet” - every PC model and every printer model is a Product model (every PC/printer must be referenced in relation to Product) - price of a product should not be more than 10% higher than the average price of all product (average price of all product is given value avgPrice) - model and price are int, all other attributes of type char(20) ```sql title="create schema" create table Product( model INTEGER PRIMARY KEY NOT NULL; maker CHAR(20), price INTEGER (CHECK price <= (SELECT AVG(price)*1.10 FROM Product)) ); create table PC( model INTEGER PRIMARY KEY NOT NULL; speed CHAR(20), FOREIGN KEY(model) REFERENCES Product(model) ); create table Printer( model INTEGER PRIMARY KEY NOT NULL; type CHAR(20) (CHECK (type IN ('laser', 'ink-jet'))) FOREIGN KEY(model) REFERENCES Product(model) ); ``` ```sql title="find makers from whom a combination (PC and Printer) can be bought for less than 2000" SELECT DISTINCT p1.maker FROM Product p WHERE EXISTS ( SELECT * FROM PC pc, Printer pr, Product p1, Product p2 WHERE p1.model = pc.model and p2.model = pr.model and p1.price + p2.price < 2000 and p1.maker = p.maker and p2.maker = p.maker ) ``` ```sql title="For each maker, find the min and max price of a (PC, ink-jet printer) combination" SELECT p1.maker, min(p1.price+p2.price), max(p1.price+p2.price) FROM Product p1, Product p2, PC pc, Printer pr WHERE pr.type = 'ink-jet' AND p1.model = pc.model AND p2.model = pr.model and p1.maker = p2.maker ORDER BY p1.maker; ``` Q4. a. (1,3) b. cartesian products - url: thoughts/.../Keys-and-Foreign-Keys - description: Keys and Foreign Keys # Foreign Keys and Relational Models See also [slides](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/Keys-and-Foreign-Keys/../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/relationalModel_Sept5.pdf) > A relation is a table Relations are **unordered** ⇒ _relations are sets_ ## tuple and domain constraints - tuple: expresses conditions on the values of each tuple - domain constraint: tuple constrain that involves a single attributes ```sql (GPA <= 4.0) AND (GPA >= 0.0) ``` ## unique identifier > A _superkey_ is a set of attributes for a relation $r$ if $r$ cannot contain two distinct tuples $t_1$ and $t_2$ such that $t_1{[K]} = t_2{[K]}$ > A _(candidate) key_ for $r$ if $K$ is a minimal superkey ex: superkey of `RegNum` ## primary value handles `null` value > Presence of nulls in keys > [!tip] definition > > Each relation must have a **primary key** on which nulls are not allowed. > > notation: the attributes of the primary keys are _underlined_ ⇒ references between relations are realised through primary keys > [!note] Remark > > A set of fields is a _key_ for a relation if: > > 1. No two distinct tuples can have same values in all key fields > 2. This is not true for any subset of the key (minimal) > > If [#2](https://github.com/aarnphm/aarnphm.github.io/issues/2) is false, then a _superkey_ > > If there’s > 1 key for a relation, one of the keys is chosen to be _primary key_ Example: requirements: - For a given student and course, there is a single grade. 
```sql CREATE TABLE Enrolled ( sid INTEGER, cid INTEGER, grade INTEGER, PRIMARY KEY (sid, cid), UNIQUE (cid, grade) ); ``` - Students can take only one course, and received a single grade for that courses; further, no two students in a course receive the grade ```sql CREATE TABLE Enrolled ( sid INTEGER, cid INTEGER, grade INTEGER, PRIMARY KEY sid, KEY (cid, grade) ); ``` > IC are validated when data is updated ## interpolation constraints (foreign keys) Referential integrity constraints _are imposed in order to guarantee **values** refer to existing tuples_ > [!note] Definition > > A _foreign key_ requires that the values on a set $X$ of attributes of a relation $R_1$ **must appear as values** for the _primary key_ of another relation $R_2$ Ex: _sid_ is a _foreign key_ referring to _Students_ > If al foreign key constraints are enforced ⇒ referential integrity is enforced ## enforcing referential integrity See also [source](https://www.ibm.com/docs/en/informix-servers/14.10?topic=integrity-referential) [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/Keys-and-Foreign-Keys) ## [ER Model](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/Entity-Relationship-Models) > A weak entity doesn’t have enough information to have its own PK and relies on supporting entity for unique identification > [!tip] Weak Entity > > _weak_ identity we need one (or more) many-to-one (_supporting_) relationship(s) to other (supporting) entity sets ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/weak-entity.webp) Role - entity set may appear more than once in a relationship (label the edge between relationship) ## sql. ```sql create table Beers ( name CHAR(20) PRIMARY KEY, -- fixed-length of $n$ character manf VARCHAR(20), -- variable length of $n$ character ) create table Sells ( bar CHAR(20), beer CHAR(20) REFERENCES Beers(name), price REAL NOT NULL, PRIMARY KEY (bar, beer) ) -- or create table Sells ( bar CHAR(20), beer CHAR(20), price REAL NOT NULL, PRIMARY KEY (bar, beer), FOREIGN KEY(beer) REFERENCES Beers(name) ) ``` > [!tip] values > > any values can be `NULL`, unless specified otherwise > [!tip] PRIMARY KEYS vs. UNIQUE. > > - 1 PK for a relation, but several UNIQUE > - No attributes of PK can be NULL > - Attributes declared UNIQUE may have NULL ### DATE and TIME ```sql DATE("yyyy-mm-dd") TIME("hh:mm:ss") ``` ### constraints. - keys - foreign keys - domain - tuple-based - assertions - `REFERENCES` attribute _**must be** _`PRIMARY KEY` or `UNIQUE` ```prolog FOREIGN KEYS REFERENCES (attributes) ``` **enforcing** constraints from relation $R$ to relation $S$, the following violation are possible: 1. insert/update $R$ introduces values not found in $S$ 2. deletion/update to $S$ causes tuple of $R$ to “dangle” ex: suppose $R=\text{Sell} \cap S=\text{Beer}$ _delete or update to $S$ that removes a beer value found in some tuples of $R$_ actions: 1. _Default_: reject modification 2. `CASCADE`: make the same changes in Sells - Delete beer: delete Sells tuple - Update beer: change value in Sells 3. 
`SET NULL`: change beer to `NULL` > Can choose either `CASCADE` or `SET NULL` as policy, otherwise reject as default ```sql create table Sells ( bar CHAR(20), beer CHAR(20) CHECK (beer IN (SELECT name FROM Beers)), price REAL CHECK (price <= 5.00), FOREIGN KEY(beer) REFERENCES Beers(name) ON DELETE SET NULL ON UPDATE CASCADE ) ``` > [!tip] attributed-based check > > `CHECK()`: cond may use name of attribute, but **any other relation/attribute name MUST BE IN subquery** > > `CHECK` only runs when a value for that attribute is inserted or updated. > [!note] Tuple-based checks > > added as a relation-schema element > > check on insert or update only ```sql create table Sells ( bar CHAR(20), beer CHAR(20), price REAL, CHECK (bar = 'Joe''s Bar' OR price <= 5.00), ) ``` ### queries ```sql SELECT name FROM Beers WHERE manf = 'Anheuser-Busch'; SELECT t.name FROM Beers t WHERE t.manf = 'Anheuser-Busch'; SELECT * FROM Beers WHERE manf = 'Anheuser-Busch'; SELECT name AS beer, manf FROM Beers WHERE manf = 'Anheuser-Busch'; SELECT bar, beer, price*95 AS priceInYen FROM Sells; -- constants as expr (using Likes(drinker,beer)) SELECT drinker, 'likes Bud' as whoLikesBud FROM Likes WHERE beer = 'Bud'; ``` > [!note] patterns > > `%` is any string, and `_` is any character > > ```sql > SELECT name FROM Drinkers > WHERE phone LIKE '%555-_ _ _ _'; > ``` > In sql, logics are 3-valued: TRUE, FALSE, UNKNOWN > > - comparing any value with `NULL` yields `UNKNOWN` > - A tuple in a query answer iff the `WHERE` is `TRUE` `ANY()` and `ALL()` ensures anyof or allof relations. > [!tip] > > `IN` is concise > > ```sql > SELECT * FROM Cartoons WHERE LastName IN ('Simpsons', 'Smurfs', 'Flintstones') > ``` IN is a predicate about `R` tuples ```sql -- (1,2) satisfies the condition, 1 is output once SELECT a FROM R -- loop once where b in (SELECT b FROM S); -- (1,2) with (2,5) and (1,2) with (2,6) both satisfies the condition, 1 is output twice SELECT a FROM R, S -- double loop WHERE R.b = S.b; ``` > NOT EQUAL operator in SQL is `<>` > [!note] Difference between > > - `ANY` means not = a, _or_ not = b, _or_ not = c > - `NOT IN` means not = a, _and_ not = b, _and_ not = c. (analogous to `ALL`) > [!note] > > `EXISTS()` is true iff subquery result is not empty. > [!note] > > structure: `()()` ### bag > or a multiset, is like a set, but an element may appear more than once. 
- Force results to be a set with `SELECT DISTINCT`
- Force results to be a bag with `UNION ALL`

`ORDER BY` can be followed by `DESC` for descending order (ascending is the default)

### insert, update, delete

```sql
INSERT INTO Likes VALUES('Sally', 'Bud');
-- or
INSERT INTO Likes(beer, drinker) VALUES('Bud', 'Sally');
```

add a `DEFAULT` value during `CREATE TABLE` (the `DEFAULT` value will be used if the inserted tuple has no value for the given attribute)

```sql
create table Drinkers (
  name CHAR(30) PRIMARY KEY,
  addr CHAR(50) DEFAULT '123 Sesame Street',
  phone CHAR(16)
);

-- in this case, this will use the DEFAULT value for addr
-- | name  | address           | phone |
-- | Sally | 123 Sesame Street | NULL  |
INSERT INTO Drinkers(name) VALUES('Sally');
```

_Those drinkers who frequent at least one bar that Sally also frequents_

```sql
INSERT INTO Buddies
  (SELECT d2.drinker
   FROM Frequents d1, Frequents d2
   WHERE d1.drinker = 'Sally'
     AND d2.drinker <> 'Sally'
     AND d1.bar = d2.bar)
```

`DELETE FROM`:

```sql
-- remove matching tuples
DELETE FROM Beers WHERE name = 'Bud';

-- remove all tuples of a relation
DELETE FROM Likes;

-- Delete from Beers(name, manf) all beers for which there is another beer by the same manufacturer
DELETE FROM Beers b
WHERE EXISTS (
  SELECT name FROM Beers
  WHERE manf = b.manf AND name <> b.name
)
```

`UPDATE` schema:

```sql
UPDATE <relation> SET <new-value assignments> WHERE <condition>
```

### aggregations

`SUM`, `AVG`, `COUNT`, `MIN`, `MAX` can be applied to a column in the `SELECT` clause

`COUNT(*)` counts the number of tuples

```sql
-- find average price of Bud
SELECT AVG(price) FROM Sells WHERE beer = 'Bud';

-- to count only distinct values, use DISTINCT
SELECT COUNT(DISTINCT price) FROM Sells WHERE beer = 'Bud';
```

> `NULL` _never_ contributes to a sum, average, or count
>
> however, if all values in a column are `NULL`, then the aggregation is `NULL`
>
> exception: `COUNT` of an empty set is 0

`GROUP BY`: group according to the values of all those attributes, and any aggregation is applied only within each group:

```sql
-- find the youngest employee per rating
SELECT rating, MIN(age) FROM Employees GROUP BY rating;

-- find for each drinker the average price of Bud at the bars they frequent
SELECT drinker, AVG(price)
FROM Frequents, Sells
WHERE beer = 'Bud' AND Frequents.bar = Sells.bar
GROUP BY drinker;
```

> [!tip] restriction on `SELECT` with `GROUP BY`
>
> each element of `SELECT` must be either:
>
> 1. Aggregated
> 2. An attribute on the `GROUP BY` list
>
> > [!warning]- illegal example
> >
> > ```sql
> > SELECT bar, beer, AVG(price) FROM Sells GROUP BY bar
> > -- only one tuple out for each bar, no unique way to select which beer to output
> > ```

`HAVING` _may_ follow `GROUP BY`

> If so, the condition applies to each group, and groups not satisfying the condition are eliminated.

```sql
-- average price per beer, for beers sold at three or more bars
-- or manufactured by Pete's
SELECT beer, AVG(price) FROM Sells
GROUP BY beer
HAVING COUNT(bar) >= 3 OR beer IN (SELECT name FROM Beers WHERE manf = 'Pete''s');
```

requirements on `HAVING`:

- Anything goes in a subquery
- Outside subqueries they may refer to attributes only if they are either:
  - A grouping attribute
  - aggregated

### cross product (cartesian product)

```sql
-- Frequents x Sells
-- (Bar) | Beer | Price | Drinker | (Bar)
-- Joe   | Bud  | 3.00  | Aaron   | Joe
-- Joe   | Bud  | 3.00  | Mary    | Jane
SELECT drinker
FROM Frequents, Sells
WHERE beer = 'Bud' AND Frequents.bar = Sells.bar;
```

Also known as **join operations** ⇒ all join operations are considered cartesian products.
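As a sketch, the same drinker query can be written with explicit `JOIN ... ON` syntax; the `ON` condition plays the role of the `WHERE` equality in the cartesian-product form above:

```sql
SELECT drinker
FROM Frequents
JOIN Sells ON Frequents.bar = Sells.bar
WHERE beer = 'Bud';
```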
Outer join preserves dangling tuples by padding with `NULL`

> A tuple of $R$ that has no tuple of $S$ with which it joins is said to be `dangling`

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/left-outer-join.webp) _Left outer join_

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/right-outer-join.webp) _Right outer join_

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/full-outer-join.webp) _Full outer join_

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/inner-join.webp) _Inner join_

```sql
R [NATURAL] [LEFT|RIGHT|FULL] OUTER JOIN S [ON <condition>]

-- example
R NATURAL FULL OUTER JOIN S
```

- natural: check equality on all common attributes && no two attributes with the same name
- left: padding dangling tuples of R only
- right: padding dangling tuples of S only
- full: padding both (default)

## views

- many views (how users see data), a single _logical schema (logical structure)_ and _physical schema (files and indexes used)_

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/view-abstraction.webp)

_virtual_ views are _not stored in the database_ (think of a query for constructing the relation)

_materialized_ views are constructed and stored in the DB.

```sql title="views default to virtual"
CREATE [MATERIALIZED] VIEW <name> AS <query>;

-- example: CanDrink(drinker, beer)
create view CanDrink AS
  SELECT drinker, beer
  FROM Frequents f, Sells s
  WHERE f.bar = s.bar;
```

> Usually one shouldn’t update a view, as it simply doesn’t exist.

## index

idea: a data structure (DS) to speed up access to tuples of a relation; organize records via trees or hashing

DS: B+ Tree Index or Hash-based Index

### B+ Tree

note: each node is at least 50% full

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3db3/b-plus-tree.webp)

> [!tip] cost
>
> tree is “height-balanced”
>
> insert/delete at $\log_{F}N$ cost
>
> min 50% occupancy, each node contains $d \leq m \leq 2d$ entries, where $d$ is the _order of the tree_

#### insert a data entry

- find correct leaf $L$
- put data entry onto $L$
- if $L$ has enough space ⇒ done!
- otherwise, `split` $L$
  - redistribute entries evenly, `copy up` the middle key
  - insert an index entry pointing to $L_{2}$ in the parent of $L$

> splits grow the tree; a root split increases the height

#### delete a data entry

- find leaf $L$ where the entry belongs
- remove the entry
- if L is at least half-full ⇒ done!
- if not - redistribute, borrowing from _sibling_ (adjacent node with same parent of $L$) - if fails, _merge_ and sibling - merge occured then delete entry (point to $L$ or sibling) from parent of $L$ > merge propagate root, decrease heights ### Hash-based Index - index is a collection of _buckets_ Insert: if bucket is full ⇒ `split` ### Alternatives for data entries | | How | | --------------------- | --------------------------------------------------------------------- | | By Value | record contents are stored in index file (no need to follow pointers) | | By Reference | \ | | By List of References | \ | --- slug: thoughts/university/twenty-four-twenty-five/sfwr-3db3/tut/t1 tags: - sfwr3db3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/tut/t1" title: "/squeel/" date: 2024-09-13 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3db3/tut/t1.html.md --- TA: [Jongjun Park](mailto:parkj182@mcmaster.ca) ```sql db2 connect to se3db3 ``` ```bash scp /path/to/.ddl macid@se3db3.cas.mcmaster.ca:/home/macid/workspace/.dll # on server db2 -tnf .ddl ``` --- slug: thoughts/university/twenty-four-twenty-five/sfwr-3ra3/Stakeholders tags: - sfwr3ra3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/Stakeholders" title: "Stakeholders" date: 2024-09-13 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/Stakeholders.html.md --- ## stakeholders help to _identify_ the problem Eliciting SSON to validate first hypothesis ## personas are V.A.R.I.E.D 1. Vivid 2. Actionable 3. Real 4. Identifiable 5. Exact 6. Detailed ex: Personal Floation Device (PFD) Which Personas could be relevant? - Cabin Crew - Frequent traveller - Traveller with small children - Traveller with Disability --- slug: thoughts/university/twenty-four-twenty-five/sfwr-3ra3/acme-visits tags: - sfwr3ra3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/acme-visits" title: "acme visits" date: 2024-09-12 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/acme-visits.html.md --- See also [description](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/acme-visits/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3ra3/acme/requirements) --- slug: thoughts/university/twenty-four-twenty-five/sfwr-3ra3/index tags: - sfwr3ra3 - university description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/index" title: "Software Requirements and Security Considerations" date: 2024-09-04 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/index.html.md --- prof: [Dr. Sébastien Mosser](https://mosser.github.io/teaching/) See also ### requirements. functional vs. non-functional: IEEE 29148 --- slug: thoughts/university/twenty-four-twenty-five/sfwr-3ra3/midterm tags: - sfwr3ra3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/midterm" title: "requirements notes" date: 2024-10-21 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/midterm.html.md --- > People do not buy products, they buy solutions to their problems. 
A good requirements is - necessary - verifiable - attainable Goals: desired results for target organisation - obstacles: property to be overcome Behaviour: - functional: outcome produced by _system_ - non-functional: property of how system achieves outcomes Constraints: - imposed by environment VARIED framework: Vivid: actually meet the persona Actionable: it should _help the team_ to build the product Real: where user are, observe and interact Identifiable: dog-food Exact: Be specific Detailed: Good personas are substantials > [!abstract] Requirement Engineering > > is a **human activity** based on cognitive psychology, anthropology, sociology, and linguistics ## SSON (Single Statement of Need) > clear, concise statement about _system's overall goals_ and _how it will accomplish_ those goals - describe what **capability the system** being developed will provide ## goal. > convey **intention/rationale/objective** of stakholders - support _elicitation, analysis, and provide inputs for specification_ Actor: act within a system to achieve the goal Agent: act on behalf of other actors Role: an actor can play $0 \cdots n$ roles Position: consistent roles that are cohesive > [!note] Hard Goal > > Can **measure**, quantify and describe in their entirety > [!note] Soft Goal > > We know we need but cannot describe fully Resource: Can be used to achieve goals by an actor Plans: how actor will execute actions ### resolving soft goals 1. Definitions: convert soft ⇒ hard 2. Contributions: create sub goals to solve soft goals as functions 3. Decomposition: decompose soft to multiple sub goals ## Risks. flexible, adaptive, and changeable > potential events can impact your project progress Issues: known problems can be identified Risk (what if) ⇒ issues (current) who to contact how to mitigate such risks > calm, figure out root cause, and come up with solutions ### RACI matrix Responsible Accountable Consulted Informed ### fish-bone diagram scope creep risk register risk assessment ### probability and impact matrix inherent risk: measure of a risk, calculated by its probability and impact time risks budget risks scope risks: not be able to deliver milestones external risks: Single point of failure: risk that has potential to cause a catastrophic failure. dependency: relations between different tasks ### mitigation strategies - avoid - accept - reduce and control (use decision tree) - transfer ## Requirements of Oppression socio-technical, DEI ⇒ social infrastructure that reflect, reinforce, and amplify the matrix of oppression - Gender - Ability - Race Bias in data 1. Center the margins, or increase more diversity 2. social conflict? 3. 
Human-centric ## [tacit knowledge](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/midterm/../../../../../../../../thoughts/tacit-knowledge) ## modelling > conceptual representation of something --- slug: thoughts/university/twenty-four-twenty-five/sfwr-3ra3/t2 tags: - sfwr3ra3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/t2" title: "Identifying Stakeholders" date: 2024-09-09 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/t2.html.md --- See also [instruction](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-3ra3/t2/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-3ra3/T02.pdf) ## case study - As people go green, there is an increased need for information on the facilities for cycling and pedestrian traffic in cities. - The tutorials for this course will develop an application that allows citizens to find information about these facilities. - Assume that you have been hired as a consulting company for the city of Hamilton to provide a mobile application (codename: BikeTour) for these facilities > [!question] Task 1 > > Identify the stakeholders of the following software system. > > > Brainstorm a collection of stakeholders that you should consult for this application. > > > > Differentiate the direct stakeholders from the indirect ones. Reminder: A stakeholder is any individual/group/org with a vested interest in your product 1. direct: - City of Hamilton - Hamilton cyclist - Pedestrians and other active transportation users 2. indirect - Hamilton City Council - decision on infrastructure billing - Government - properties developers - Nearby municipalities with connecting transportation (i.e: Dundas, Burlington, etc.) - Local academia institution who might benefit from this infrastructures. - Cycling advocacy groups > [!question] Task 2 > > What other requirements sources could be used to develop that product? > > > Brainstorm a collection of requirements sources for this application. - regulations and industry “best-practice” - municipals bylaws and official plans - accesibility guidelines - cycling apps engagement strategies - social media - local newspapers - similar functionalities - existing transport data - cycling volumes, traffic, collisions stats - bike share data, transit network data (enable multi-modal planning trips) - User research / stakeholders input - Surveys, “blind studies”, interviews with users - Testing and feedback loop - refinement of functionality - Technical requirements > [!question] Task 3 > > Perform an analysis of what types of elicitation methods would be appropriate for your identified stakeholders. - Interviews - Focus on the needs of the stakeholders (one-on-one), allowing detailed discussions with subject matters experts opinions - Focus groups - Organize focus groups with Hamilton cyclists and pedestrians to get direct feedback on their needs, experiences, and expectations from the app - Surveys - Gather feedback from a large number of users through surveys to gather requirements. - Building knowledge base, existing documents (i.e: city bylaws, municipality plans, etc.) ⇒ inform elicitation process - Brainstorming sessions - conduct sessions with diverse stakeholders to generate possible solutions > [!question] Task 4 > > Identify your “most-valuable” stakeholder(s) and the most valuable feature(s) BikeTour can bring to them Write a couple of scenarios 1. 
discovering new routes Sarah is an avid cyclist living in Hamilton. She commutes to work by bike daily but is getting bored of her usual route. She opens up the BikeTour app, selects her starting point and destination, and browses the suggested route options. She filters for routes that are scenic but still bike-friendly and direct enough for commuting. The app displays several new route options with elevation profiles, estimated trip duration, and route difficulty ratings sourced from other local cyclists’ trip data. 2. data-driven cycling infrastructure planning Trevor is a PM in City of Hamilton’s Sustainable Mobility Department. He is working on prioritising new cycling infrastuctures projects for the coming calendar year. Trevor logs into the BikeTour admin dashboard and views the aggregated, anonymized trip data from Hamilton cyclists using the app. A heatmap shows the most popular cycling routes, while another data visualization identifies “problem areas” with frequent cyclist-reported issues like potholes, close calls with cars, or inadequate bike parking. By cross-referencing this crowdsourced data from actual Hamilton cyclists with the city’s existing cycling network data, Trevor can easily identify key gaps and safety hotspots. --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4aa4/index tags: - sfwr4aa4 - university description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/index" title: "Real-time system a la carte" date: 2024-09-06 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/index.html.md --- 1. Soft RT system are those which do not : False 2. A good scheduling algorithm for hard real time: False 3. (continuous graph): B --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab4/content tags: - lab - sfwr4aa4 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab4/content" title: "Threaded LED" date: 2024-10-04 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab4/content.html.md --- See [part1](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab4/part1/main.c) See [part2](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab4/part2/main.c) See [part3](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab4/part3/main.c) --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab5/content tags: - lab - sfwr4aa4 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab5/content" title: "external LED" date: 2024-10-04 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab5/content.html.md --- See [part1](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab5/part1/main.c) See [part2](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab5/part2/main.c) --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/lab8 tags: - sfwr4aa4 - lab description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/lab8" title: "PWM and Shared Memory" date: 2024-11-08 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/lab8.html.md --- $$ f_\text{PWM} = \frac{f_\text{clk}}{N(X+1)} $$ See 
[thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part1.c](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part1.c) See [thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part2.c](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part2.c) and [thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part2\_application.c](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part2_application.c) See [thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part3.c](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part3.c) and [thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part3\_application.c](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab8/pwm/part3_application.c) --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/content tags: - sfwr4aa4 - lab description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/content" title: "PID Controller from input signals" date: 2024-11-15 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/content.html.md --- Transfer function for angular speed: $$ \frac{A}{1 + \tau s} $$ The input signal that begins at time $t_{0}$ and its minimum and maximum values are given by $u_\text{min}, u_\text{max}$. The resulting output signal is initially at $y_{0}$ and eventually settles down for a steady state value of $y_\text{ss}$. The steady state gain $A$ is given by: $$ A = \frac{y_\text{ss} - y_0}{u_\text{max} - u_\text{min}} = \frac{\triangle y}{\triangle u} $$ Time constant $\tau$ is time required for output to increase from initial value to $0.632 \times \triangle y$ Let $t_1$ is time when change in output is $0.632 \times \triangle y$: $$ \begin{aligned} y(t_{1}) &= 0.632 \times (y_\text{ss} -y_{0}) + y_{0} \\[8pt] \tau &= t_{1} - t_{0} \end{aligned} $$ ## find the transfer function ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/first-graph.webp) $$ \begin{align*} \triangle y &= 3V \\[6pt] \triangle v &= 8.285 - 2.454 = 5.831 \text{rad/s} \\[6pt] A &= \frac{5.831}{3} = 1.9436666667 \text{rad/s} \\[12pt] \text{target velocity} &= 2.454 + 0.632 * 5.831 = 6.139192 \text{rad/s} \\[8pt] \tau \approx 0.029 \text{secs} \end{align*} $$ _note: reach it at around 5.029 sec_ ## graphs see [simulink file](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/lab9part2.slx) ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/graph-p2.webp) ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/lab9/pid-setup.webp) _proportional_ $P = 350 * \frac{\pi}{180}$ --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm tags: - sfwr4aa4 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm" title: "rt_system items" date: 2024-10-22 
permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm.html.md
---

correctness: $|C(t) - C_s(t)| < \epsilon$

**drift** is the rate of change of the clock value relative to a perfect clock. Given a clock with bounded drift $\rho$, then

$$
\left| \frac{dC(t)}{dt} - 1 \right| < \rho
$$

Monotonicity: $\forall t_{2} > t_{1}: C(t_{2}) > C(t_{1})$

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/rt-sys-failure.webp)

## kernels

`syscall` in kernel: User space and Kernel Space are in different address spaces

```mermaid
graph LR
A[procedure] -- parameters --> B[TRAP]
B --> C[Kernel]
C --> B --> A
```

> [!tip] Important
>
> a user process becomes a kernel process when it _executes a syscall_

Scheduling ensures fairness, min response time, max throughput

|              | OS                | RTOS                                     |
| ------------ | ----------------- | ---------------------------------------- |
| philosophy   | time-sharing      | event-driven                             |
| requirements | high-throughput   | schedulability (meet all hard deadlines) |
| metrics      | fast avg-response | ensured worst-case response              |
| overload     | fairness          | meet critical deadlines                  |

> Kernel programs can always preempt user-space programs

Kernel program example:

```c
#include <linux/init.h>   /* Required by macros */
#include <linux/kernel.h> /* KERN_INFO needs it */
#include <linux/module.h>

static char *my_string __initdata = "dummy";
static int my_int __initdata = 4;

/* Init function with user defined name */
static int __init hello_4_init(void) {
  printk(KERN_INFO "Hello %s world, number %d\n", my_string, my_int);
  return 0;
}

/* Exit function with user defined name */
static void __exit hello_4_exit(void) {
  printk(KERN_INFO "Goodbye cruel world 4\n");
}

/* Macros to be used after defining init and exit functions */
module_init(hello_4_init);
module_exit(hello_4_exit);
```

## **preemption** && `syscall`

> The act of temporarily interrupting a currently scheduled task for higher priority tasks.

> NOTE: `make` doesn’t recompile if the DAG is not changed.

## process

- independent execution, logical unit of work scheduled by the OS
- in virtual memory:
  - Stack: stores local variables and function arguments
  - Heap: dynamically allocated (think of `malloc`, `calloc`)
  - BSS segment: uninitialized data
  - Data segment: initialized data (global & static variables)
  - Text: read-only region containing program instructions

|          | stack                                       | heap                        |
| -------- | ------------------------------------------- | --------------------------- |
| creation | `Member m`                                  | `Member* m = new Member()`  |
| lifetime | function runs to completion                 | until delete/free is called |
| grow     | fixed                                       | dynamically added by OS     |
| err      | stack overflow                              | heap fragmentation          |
| when     | size of memory is known, data size is small | large-scale dynamic memory  |

## `fork()`

- creates a `child` process that is identical to its parent; returns `0` to the child process and the child's pid to the parent
- adds a lot of overhead since the address space is duplicated. **Data space is not shared**

> variables initialized before `fork()` will be duplicated in both parent and child.
```c
#include <stdio.h>
#include <unistd.h>

int main(int argc, char** argv) {
  int child = fork();
  int c = 0;
  if (child)
    c += 5;
  else {
    child = fork();
    c += 5;
    if (child)
      c += 5;
  }
  printf("%d ", c); // parent prints 5, first child prints 10, grandchild prints 5
}
```

## threads

- program-wide resources: global data & instructions
- execution state of a control stream
- shared address space for faster context switching

> - Needs synchronisation (global variables are shared between threads)
> - lack robustness (one thread can crash the whole program)

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/mem-layout-threaded.webp)

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/single-vs-multithreaded.webp)

```c
#include <pthread.h>

void *foo(void *args) {}

pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_t thread;
// pthread_create(&thread, &attr, function, arg);
```

To solve race conditions, use semaphores.

## polling and interrupt

- polling: reading a memory location to receive updates of an event; think of

  ```prolog
  while (true) {
    if (event) {
      process_data()
      event = 0;
    }
  }
  ```

- interrupt: receive an interrupt signal; think of

  ```prolog
  signal(SIGNAL, handler)

  void handler(int sig) {
    process_data()
  }

  int main() {
    while (1) { do_work() }
  }
  ```

|              | interrupt | polling |
| ------------ | --------- | ------- |
| speed        | fast      | slow    |
| efficiency   | good      | poor    |
| cpu-waste    | low       | high    |
| multitasking | yes       | yes     |
| complexity   | high      | low     |
| debug        | difficult | easy    |

## process priority

`nice`: change process priority

- 0-99: RT tasks
- 100-139: Users

> lower the NICE value, higher the priority

```c
#include <sys/resource.h>

int getpriority(int which, id_t who);
int setpriority(int which, id_t who, int value);
```

set scheduling policy: `sched_setscheduler(pid, SCHED_FIFO | SCHED_RR | SCHED_DEADLINE, &param)`

## scheduling

1. Priority-based preemptive scheduling

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/pbps.webp)

Temporal parameters: Let the following be the scheduling parameters:

| desc                 | var                   |
| -------------------- | --------------------- |
| # of tasks           | $n$                   |
| release/arrival-time | $r_{i,j}$             |
| absolute deadline    | $d_i$                 |
| relative deadline    | $D_i = d_i - r_{i,j}$ |
| execution time       | $e_i$                 |
| response time        | $R_i$                 |

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/abs-rel-deadline.webp)

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/resp-time-exec-time.webp)

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/resp-time-preempted-exec.webp) _response time when execution is preempted_

> Period $p_i$ of a periodic task $T_i$ is the **min length** of all time intervals between release times of consecutive tasks.
> Phase of a Task $\phi_i$ is the release time $r_{i,1}$ of a task $T_i$, or $\phi_i = r_{i,1}$
>
> _in phase_: the first instances of several tasks are released simultaneously

> [!tip] Representation
>
> a periodic task $T_i$ can be represented by:
>
> - 4-tuple $(\phi_i, P_i, e_i, D_i)$
> - 3-tuple $(P_i, e_i, D_i)$, or $(0, P_i, e_i, D_i)$
> - 2-tuple $(P_i, e_i)$, or $(0, P_i, e_i, P_i)$

> [!tip] Utilisation factor
>
> for a task $T_i$ with execution time $e_i$ and period $p_i$ is given by
>
> $$
> u_i = \frac{e_i}{p_i}
> $$

For a system with $n$ tasks, the overall system utilisation is $U = \sum_{i=1}^{n}{u_i}$

## cyclic executive

assume tasks are non-preemptive, and job parameters with hard deadlines are known.

- no race conditions, no deadlocks, just function calls
- however, very brittle: the number of frames $F$ can be large, and release times of tasks must be fixed

### _hyperperiod_

> is the least common multiple (lcm) of the periods.

> [!tip] maximum number of arriving jobs
>
> $$
> N = \sum_{i=1}^{n} \frac{H}{p_i}
> $$

**Frames**: each task must fit within a single frame with size $f$ ⇒ number of frames $F = \frac{H}{f}$

C1: A job must fit in a frame, or $f \geq \max e_i \ \forall \ 1\leq i \leq n$ for all tasks

C2: the hyperperiod has an integer number of frames, or $\frac{H}{f} = \text{integer}$

C3: $2f - \gcd(P_i, f) \leq D_i$ per task.

### task slices

idea: if the frame-size constraints can't be met, then “slice” a task into smaller sub-tasks

$T_3=(20, 5)$ becomes $T_{3_{1}}=(20,1)$, $T_{3_{2}}=(20,3)$ and $T_{3_{3}}=(20, 1)$

### Flow Graph for hyper-period

- Denote all jobs in the hyperperiod of $F$ frames as $J_{1} \cdots J_{F}$
- Vertices:
  - $k$ job vertices $J_{1},J_{2},\cdots,J_{k}$
  - $F$ frame vertices $x,y,\cdots,z$
- Edges:
  - $(\text{source}, J_i)$ with capacity $C_i=e_i$
    - encodes jobs’ compute requirements
  - $(J_i, x)$ with capacity $f$ iff $J_i$ can be scheduled in frame $x$
    - encodes periods and deadlines
    - an edge connects a job node and a frame node iff:
      1. the job arrives **before** or at the starting time of the frame
      2. the job’s absolute deadline is **larger** than or equal to the ending time of the frame
  - $(x, \text{sink})$ with capacity $f$
    - encodes the limited computational capacity in each frame

![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/flow-graph-hyperperiod.webp)

## static priority assignment

For higher priority:

- shorter period tasks (rate monotonic RM)
- tasks with shorter relative deadlines (deadline monotonic DM)

### rate-monotonic

- running on a uniprocessor, tasks are preemptive, no OS overhead for preemption

> task $T_i$ has higher priority than task $T_j$ if $p_i < p_j$

> [!tip] schedulability test for RM (Test 1)
>
> Given $n$ periodic processes, independent and preemptable, $D_i \geq p_i$ for all processes, **periods of all tasks are _integer_ multiples of each other**
>
> a sufficient condition for tasks to be scheduled on a uniprocessor: $U = \sum_{i=1}^{n}\frac{e_i}{p_i} \leq 1$

> [!tip] schedulability test for RM (Test 2)
>
> A _sufficient_ but not necessary condition is $U \leq n \cdot (2^{\frac{1}{n}} - 1)$ for $n$ periodic tasks
>
> for $n \to \infty$, we have $U < \ln(2) \approx 0.693$

> [!tip] schedulability test for RM (Test 3)
>
> Consider a set of tasks $(T_{1},T_{2},\cdots,T_i)$ with $p_{1} < p_{2} < \cdots < p_{i}$.
>
> Suppose $T_2$ finishes at $t$.
Total number of isntances of task $T_1$ released over time interval $[0; t)$ is $\lceil \frac{t}{p_{1}} \rceil$ > > Thus the following condition must be met for every instance of task $T_1$ released during tim interval $(0;t)$: > > $$ > t = \lceil \frac{t}{p_{1}} \rceil \space e_1 + e_2 > $$ idea: find $k$ such that time $t = k \times p_1 \geq k * e_1 + e_2$ and $k\times p_1 \leq p_2$ for task 2 > [!tip] general solution for RM-schedulability > > The time demand function for task $i; 1 \leq i \leq n$: > > $$ > \begin{aligned} \omega_i(t) &= \sum_{k=1}^{i} \lceil \frac{t}{p_k} \rceil \space e_k \leq t \\ \\ &\because 0 \leq t \leq p_i \end{aligned} > $$ > > holds a time instant $t$ chosen as $t=k_j p_j, (j=1,\cdots,i)$ and $k_j = 1, \cdots, \lfloor \frac{p_i}{p_j} \rfloor$ ### deadline-monotonic - if every task has period equal to relative deadline, same as RM - arbitrary deadlines then DM performs better than RM - **RM always fails if DM fails** ## dynamic priority assignment ### earliest-deadline first (EDF) _depends on closeness of absolute deadlines_ > [!tip] EDF schedulability test 1 > > set of $n$ periodic tasks, each whose relative deadline is equal to or greater than its period iff $\sum_{i=1}^{n}(\frac{e_i}{p_i}) \leq 1$ > [!tip] EDF schedulability test 2 > > relative deadlines are not equal to or greater than their periods > > $$ > \sum_{i=1}^{n}(\frac{e_i}{\text{min}(D_i, p_i)}) \leq 1 > $$ ## Priority Inversion **critical sections** to avoid **race condition** > Higher priority task can be blocked by a lower priority task due to resource contention shows how resource contention can delay completion of higher priority tasks - access shared resources guarded by Mutex or semaphores - access non-preemptive subsystems (storage, networks) Resource Access Control ### mutex serially reusable: a resource cannot be interrupted > If a job wants to use $k_i$ units of resources $R_i$, it executes a lock $L(R_i; k_i)$, and unlocks $U(R_i; k_i)$ once it finished ### Non-preemptive Critical Section Protocol (NPCS) idea: schedule all critical sections non-preemptively **While a task holds a resource it executes at a priority higher than the priorities of all tasks** **a higher priority task is blocked only when some lower priority job is in critical section** pros: - zk about resource requirements of tasks cons: - task can be blocked by a lower priority task for a long time even without resource conflict ### Priority Inheritance Protocol (PIP) idea: increase the priorites only upon resource contention avoid NPCS drawback would still run into deadlock (think of RR task resource access) ### Priority Ceiling Protocol (PCP) idea: extends PIP to prevent deadlocks - assigned priorities are fixed - resource requirements of all the tasks that will request a resource $R$ is known `ceiling(R)`: highest priority. 
Each resource has fixed priority ceiling --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3 tags: - sfwr4aa4 - quiz description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3" title: "q3" date: 2024-09-27 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3.html.md --- ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3-1.webp) Answer: B ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3-2.webp) Answer: 8 ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3-3.webp) Answer: 70 ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3-4.webp) Answer: D ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3-5.webp) Answer: B ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3-6.webp) Answer: B ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3-7.webp) Answer: C ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4aa4/q3-8.webp) Answer: B --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4aa4/w2 tags: - sfwr4aa4 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/w2" title: "Fork and threads" date: 2024-09-13 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4aa4/w2.html.md --- Q1: T Q2: T Q3: F Q4: User Q5: Save memory space Q6: Yes, but the modification can be only seen in child process, and value in parents process cannot be changed. --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4g06ab/index tags: - university - sfwr4g06ab description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4g06ab/index" title: "Software Enginering Capstone a la carte." date: 2024-09-04 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4g06ab/index.html.md --- ## projects See [tinymorph](https://tinymorph.aarnphm.xyz) ## statement and goals. 1. natural-language driven terminal Possible prof: [Emil Sekerinski](https://www.cas.mcmaster.ca/~emil/) or [Richard Paige](https://www.google.com/search?q=Richard+Paige\&sourceid=chrome\&ie=UTF-8) - [warp](https://www.warp.dev) as an example, but closed source - So you can think it like [Alacritty](https://github.com/alacritty/alacritty) but with async command-runner - voice-driven assistant: real-time transcribe ⇒ generate commands from language to shell commands - voice → natural language - natural language → commands - Configuration, maybe in Lua - stretch goal: new shell based on rust syntax and borrowing concept of variables. 2. 
WYSIWYG editor (choosen, see [docs](https://tinymorph.aarnphm.xyz)) - Markdown renderer - train [SAE](https://transformer-circuits.pub/2023/monosemantic-features/index.html) for specific type of writing tonality ⇒ manual steering for text generation on creative writing - exploration of internals writing features based on text - inspired by [Prism](https://x.com/thesephist/status/1747099907016540181) 3. Infrastructure and AI Companion for Engineering Knowledge Management (19) - [Quartz](https://quartz.jzhao.xyz/) + similarity search + ANN for reranking --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept tags: - sfwr4ml3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept" title: "Bias and intercept" date: 2024-09-16 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept.html.md --- See also: [slides 3](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture3.pdf), [slides 4](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture4.pdf), [slides 5](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture5.pdf) ## adding bias in D-dimensions OLS $$ X^{'}_{n \times (d+1)} = \begin{pmatrix} x_1^{1} & \cdots & x_1^{d} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ x_n^{1} & \cdots & x_n^{d} & 1 \end{pmatrix} $$ and $$ W_{(d+1) \times 1} = \begin{pmatrix} w_1 \\ \vdots \\ w_d \\ w_0 \end{pmatrix} $$ Add an new auxiliary dimension to the input data, $x_{d+1} = 1$ Solve OLS: $$ \min\limits{W \in \mathbb{R}^{d \times 1}} \|XW - Y\|_2^2 $$ Gradient for $f: \mathbb{R}^d \rightarrow \mathbb{R}$ $$ \triangledown_{w} \space f = \begin{bmatrix} \frac{\partial f}{\partial w_1} \\ \vdots \\ \frac{\partial f}{\partial w_d} \\ \end{bmatrix} $$ [Jacobian](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept/../../../../../../../../thoughts/Vector-calculus#jacobian-matrix) for $g: \mathbb{R}^m \rightarrow \mathbb{R}^n$ $$ \begin{aligned} \triangledown_{w} \space g &= \begin{bmatrix} \frac{\partial g_1}{\partial w_1} & \cdots & \frac{\partial g_1}{\partial w_d} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_n}{\partial w_1} & \cdots & \frac{\partial g_n}{\partial w_d} \end{bmatrix}_{n \times m} \\ \\ &u, t \in \mathbb{R}^d \\ &\because g(u) = u^T v \implies \triangledown_{w} \space g = v \text{ (gradient) } \\ \\ &A \in \mathbb{R}^{n \times n}; u \in \mathbb{R}^n \\ &\because g(u) = u^T A u \implies \triangledown_{w} \space g = (A + A^T) u^T \text{ (Jacobian) } \end{aligned} $$ > [!tip] result > > $$ > W^{\text{LS}} = (X^T X)^{-1} X^T Y > $$ ## non-linear data Idea is to include adding an additional padding ## multivariate polynomials. > question the case of multivariate polynomials > > - Assume $M >> d$ > - Number of terms (monomials): $\approx (\frac{M}{d})^d$ > - `#` of training samples $\approx$ `#` parameters An example of `Curse of dimensionality` ## overfitting. 
strategies to avoid: - add more training data - L1 (Lasso) or L2 (Ridge) regularization - add a penalty term to the objective function - L1 makes sparse models, since it forces some parameters to be zero (robust to outliers). Since having the absolute value to the weights, forcing some model coefficients to become exactly 0. $$ \text{Loss}(w) = \text{Error} + \lambda \times \| w \| $$ - L2 is better for feature interpretability, for higher non-linear. Since it doesn’t perform feature selection, since weights are only reduced near 0 instead of exactly 0 like L1 $$ \text{Loss}(w) = \text{Error} + \lambda \times w^2 $$ - Cross-validation - split data into k-fold - early stopping - dropout, see [example](https://keras.io/api/layers/regularization_layers/dropout/) - randomly selected neurons are ignored ⇒ makes network less sensitive **sample complexity** of learning multivariate polynomials ## regularization. L2 regularization: $$ \text{min}_{W \in \mathbb{R}^{d}} \| XW - Y \|^{2}_{2} + \lambda \| W \|_{2}^{2} $$ > [!tip] Solving > > Solve that > > $$ > W^{\text{RLS}} = (X^T X + \lambda I)^{-1} X^T Y > $$ > > Inverse exists as long as $\lambda > 0$ ## polynomial curve-fitting revisited feature map: $\phi{(x)}: R^{d_1} \rightarrow R^{d_2}$ where $d_{2} >> d_{1}$ training: - $W^{*} = \min\limits{W} \| \phi W - Y \|^{2}_{2} + \lambda \| W \|_{2}^{2}$ - $W^{*} = (\phi^T \phi + \lambda I)^{-1} \phi^T Y$ prediction: - $\hat{y} = \langle{W^{*}, \phi{(x)}} \rangle = {W^{*}}^T \phi(x)$ > [!abstract] choices of > > - Gaussian basis functions: $\phi(x) = \exp{(-\frac{\| x - \mu \|^{2}}{2\sigma^{2}})}$ > - Polynomial basis functions: $\phi(x) = \{1, x, x^{2}, \ldots, x^{d}\}$ > - Fourier basis functions: DFT, FFT ## computational complexity calculate $W^{\text{RLS}} = (\phi^T \phi + \lambda I)^{-1} \phi^T Y$ matmul: - Native: $O(d^3)$ - Strassen’s algorithm: $O(d^{2.81})$ - Copper-Smith-Winograd: $O(d^{2.376})$ matrix inversion: - Gaussian elimination: $O(d^3)$ - [Cholesky decomposition](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept/../../../../../../../../thoughts/Cholesky-decomposition): $O(d^3)$ (involved around $\frac{1}{3}n^3$ FLOPs) ## kernels compute higher dimension inner products $$ K(x^i, x^j) = \langle \phi(x^i), \phi(x^j) \rangle $$ Polynomial kernels of degree 2: $$ k(x^i, x^j) = (1 + (x^i)^T x^j)^2 = (1 + \langle{x^i, x^j} \rangle)^2 \\ \\ \because O(d) \text{ operations} $$ > [!abstract] degree M polynomial > > $$ > k(x^i, x^j) = (1 + (x^i)^T x^j)^M > $$ How many operations? 
- improved: $d + \log M$ ops --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression tags: - sfwr4ml3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression" title: "Linear regression" date: 2024-09-10 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression.html.md --- See also [slides for curve fitting](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture1.pdf), [regression](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture2.pdf), [colab link](https://colab.research.google.com/drive/1eljHSwYJSR5ox6bB9zopalZmMSJoNl4v?usp=sharing) python: [ols\_and\_kls.py](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/code/ols_and_kls.py) ## curve fitting. > [!question] how do we fit a distribution of data over a curve? > > Given a set of $n$ data points $S=\set{(x^i, y^i)}^{n}_{n=1}$ - $x \in \mathbb{R}^{d}$ - $y \in \mathbb{R}$ (or $\mathbb{R}^{k}$) ## ols. > [!tip] Ordinary Least Squares (OLS) > > Let $\hat{y^i}$ be the prediction of a model $X$, $d^i = \| y^i - \hat{y^i} \|$ is the error, minimize $\sum_{i=1}^{n} (y^i - \hat{y^i})^2$ In the case of 1-D ordinary least square, the problems equates find $a,b \in \mathbb{R}$ to minimize $\min\limits_{a,b} \sum_{i=1}^{n} (ax^i + b - y^i)^2$ ### optimal solution $$ \begin{aligned} a &= \frac{\overline{xy} - \overline{x} \cdot \overline{y}}{\overline{x^2} - (\overline{x})^2} = \frac{\text{COV}(x,y)}{\text{Var}(x)} \\ b &= \overline{y} - a \overline{x} \end{aligned} $$ where $\overline{x} = \frac{1}{N} \sum{x^i}$, $\overline{y} = \frac{1}{N} \sum{y^i}$, $\overline{xy} = \frac{1}{N} \sum{x^i y^i}$, $\overline{x^2} = \frac{1}{N} \sum{(x^i)^2}$ ### hyperplane > [!abstract] Hyperplane equation > > $$ > \hat{y} = w_{0} + \sum_{j=1}^{d}{w_j x_j} \\ \because w_0: \text{the y-intercept (bias)} > $$ Homogeneous hyperplane: $$ \begin{aligned} w_{0} & = 0 \\ \hat{y} &= \sum_{j=1}^{d}{w_j x_j} = \langle{w,x} \rangle \\ &= w^Tx \end{aligned} $$ Matrix form OLS: $$ X_{n\times d} = \begin{pmatrix} x_1^1 & \cdots & x_d^1 \\ \vdots & \ddots & \vdots \\ x_1^n & \cdots & x_d^n \end{pmatrix}, Y_{n\times 1} = \begin{pmatrix} y^1 \\ \vdots \\ y^n \end{pmatrix}, W_{d\times 1} = \begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix} $$ $$ \begin{aligned} \text{Obj} &: \sum_{i=1}^n (\hat{y}^i - y^i)^2 = \sum_{i=1}^n (\langle w, x^i \rangle - y^i)^2 \\ &\\\ \text{Def} &: \Delta = \begin{pmatrix} \Delta_1 \\ \vdots \\ \Delta_n \end{pmatrix} = \begin{pmatrix} x_1^1 & \cdots & x_d^1 \\ \vdots & \ddots & \vdots \\ x_1^n & \cdots & x_d^n \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix} - \begin{pmatrix} y^1 \\ \vdots \\ y^n \end{pmatrix} = \begin{pmatrix} \hat{y}^1 - y^1 \\ \vdots \\ \hat{y}^n - y^n \end{pmatrix} \end{aligned} $$ > [!question] minimize > > $$ > \min\limits_{W \in \mathbb{R}^{d \times 1}} \|XW - Y\|_2^2 > $$ > [!abstract] OLS solution > > $$ > W^{\text{LS}} = (X^T X)^{-1}{X^T Y} > $$ Example: $$ \hat{y} = w_{0} + w_{1} \cdot x_{1} + w_{2} \cdot x_{2} $$ With $$ X_{n \times 2} = \begin{pmatrix} x^{1}_{1} & x^{1}_{2} \\ x^{2}_{1} & x^{2}_{2} \\ x^{3}_{1} & x^{3}_{2} \end{pmatrix} $$ and $$ X^{'}_{n \times 3} = 
\begin{pmatrix} x^{1}_{1} & x^{1}_{2} & 1 \\ x^{2}_{1} & x^{2}_{2} & 1 \\ x^{3}_{1} & x^{3}_{2} & 1 \end{pmatrix} $$ With $$ W = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} $$ and $$ W^{'} = \begin{pmatrix} w_1 \\ w_2 \\ w_0 \end{pmatrix} $$ thus $$ X^{'} \times W = \begin{pmatrix} w_0 + \sum{w_i \times x_i^{1}} \\ \vdots \\ w_0 + \sum{w_i \times x_i^{n}} \end{pmatrix} $$ See also [Bias and intercept](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept) --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Stochastic-gradient-descent tags: - sfwr4ml3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Stochastic-gradient-descent" title: "Stochastic gradient descent" date: 2024-11-11 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Stochastic-gradient-descent.html.md --- See also [SGD and ODEs](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Stochastic-gradient-descent/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/A4) [Nesterov momentum](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Stochastic-gradient-descent/../../../../../../../../thoughts/Nesterov-momentum) is based on [On the importance of initialization and momentum in deep learning](http://www.cs.toronto.edu/%7Ehinton/absps/momentum.pdf) ```pseudo \begin{algorithm} \caption{SGD} \begin{algorithmic} \State \textbf{input:} $\gamma$ (lr), $\theta_0$ (params), $f(\theta)$ (objective), $\lambda$ (weight decay), \State $\mu$ (momentum), $\tau$ (dampening), nesterov, maximize \For{$t = 1$ to $...$} \State $g_t \gets \nabla_\theta f_t(\theta_{t-1})$ \If{$\lambda \neq 0$} \State $g_t \gets g_t + \lambda\theta_{t-1}$ \EndIf \If{$\mu \neq 0$} \If{$t > 1$} \State $b_t \gets \mu b_{t-1} + (1-\tau)g_t$ \Else \State $b_t \gets g_t$ \EndIf \If{$\text{nesterov}$} \State $g_t \gets g_t + \mu b_t$ \Else \State $g_t \gets b_t$ \EndIf \EndIf \If{$\text{maximize}$} \State $\theta_t \gets \theta_{t-1} + \gamma g_t$ \Else \State $\theta_t \gets \theta_{t-1} - \gamma g_t$ \EndIf \EndFor \State \textbf{return} $\theta_t$ \end{algorithmic} \end{algorithm} ``` --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Support-Vector-Machine tags: - sfwr4ml3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Support-Vector-Machine" title: "Support Vector Machine" date: 2024-11-11 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Support-Vector-Machine.html.md --- idea: maximizes margin and more robust to “perturbations” Euclidean distance between two points $x$ and the hyperplane parametrized by $W$ is: $$ \frac{\mid W^T x + b \mid }{\|W\|_2} $$ > Assuming $\| W \|_2=1$ then the distance is $\mid W^T x + b \mid$ ## maximum margin hyperplane $W$ has $\gamma$ margin if $$ \begin{aligned} W^T x + b \ge \gamma \space &\forall \text{ blue x} \\ W^T x +b \le - \gamma \space &\forall \text{ red x} \end{aligned} $$ Margin: $$ Z = \{(x^{i}, y^{i})\}_{i=1}^{n}, y \in \{-1, 1\}, \|W\|_2 = 1 $$ ## hard-margin SVM ```pseudo \begin{algorithm} \caption{Hard-SVM} \begin{algorithmic} \REQUIRE Training set $(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)$ \STATE \textbf{solve:} $(w_{0},b_{0}) = \argmin\limits_{(w,b)} \|w\|^2 \text{ s.t } \forall i, 
y_{i}(\langle{w,x_i} \rangle + b) \ge 1$ \STATE \textbf{output:} $\hat{w} = \frac{w_0}{\|w_0\|}, \hat{b} = \frac{b_0}{\|w_0\|}$ \end{algorithmic} \end{algorithm} ``` note that this version is sensitive to outliers ## soft-margin SVM ```pseudo \begin{algorithm} \caption{Soft-SVM} \begin{algorithmic} \REQUIRE Input $(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)$ \STATE \textbf{parameter:} $\lambda > 0$ \STATE \textbf{solve:} $\min_{\mathbf{w}, b, \boldsymbol{\xi}} \left( \lambda \|\mathbf{w}\|^2 + \frac{1}{m} \sum_{i=1}^m \xi_i \right)$ \STATE \textbf{s.t: } $\forall i, \quad y_i (\langle \mathbf{w}, \mathbf{x}_i \rangle + b) \geq 1 - \xi_i \quad \text{and} \quad \xi_i \geq 0$ \STATE \textbf{output:} $\mathbf{w}, b$ \end{algorithmic} \end{algorithm} ``` --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content tags: - sfwr4ml3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content" title: "Least Squared Regression" date: 2024-10-07 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content.html.md --- See also [jupyter notebook](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/LSR), [pdf](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/assignment.pdf), [solutions](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/solution.pdf) ## question 1. ### problem 1. > [!question]- part 1 > > 1. Divide the dataset into three parts: 1800 samples for training, 200 samples for validation, and 200 samples for testing. Perform linear OLS (without regularization) on the training samples twice—first with a homogeneous model (i.e., where the y-intercepts are zero) and then with a non-homogeneous model (allowing for a non-zero y-intercept). Report the MSE on both the training data and the validation data for each model > 2. Compare the results. Which approach performs better? Why? Apply the better-performing approach to the test set and report the MSE. > 3. Do you observe significant overfitting in any of the cases? 1. For homogeneous model, the MSE on training data is 26.1649 and on validation data is 77.0800 ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q1-p1-1.webp) Whereas with non-homogeneous model, the MSE on training data is 2.5900 and on validation data is 8.8059 ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q1-p1-12.webp) 2. We can observe that non-homogeneous model clearly performs better than the homogeneous models, given a significantly lower MSE (indicates that predictions are closer to the actual value). We can also see the difference between training and validation sets for non-homogeneous models shows better consistency, or better generalisation. 
Test set MSE for non-homogeneous model is 2.5900 ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q1-p1-2.webp) 3. We observe in both cases that the training MSE is significantly lower than the validation MSE, indicating overfitting. The non-homogeneous model shows a lower difference between training and validation MSE, which suggest there were some overfitting. The homogeneous models show more severe overfitting due to its constraints (forcing intercept to zero). > [!question]- part 2 > > 1. Divide the dataset into three parts: 200 samples for training, 1800 samples for validation, and 200 samples for testing. Perform linear OLS (without regularization) on the training samples twice—first with a homogeneous model (i.e., where the y-intercepts are zero) and then with a non-homogeneous model (allowing for a non-zero y-intercept). Report the MSE on both the training data and the validation data for each model > 2. Compare these results with those from the previous part. Do you observe less overfitting or more overfitting? How did you arrive at this conclusion? 1. For homogeneous model, the MSE on training data is 0.000 and on validation data is 151.2655 ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q1-p2-1.webp) Whereas with non-homogeneous model, the MSE on training data is 0.000 and on validation data is 15.8158 ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q1-p2-nhom.webp) 2. We observe an increased in overfitting, given the perfit fit in training data versus validation MSE for both model. We can still see that non-homogeneous models outperform homogeneous models, but the difference between training and validation MSE is significantly higher than the previous case. This is largely due to smaller training set (200 training samples versus 1800 training samples), models have less data to train on. ### problem 2. > [!question]- part 1 > > 1. Divide the Dataset into Three Parts: > > - **Training Data**: Select **200 data points**. > > - **Validation Data**: Assign **1800 data points**. > > - **Testing Data**: Set aside the **remaining 200 data points** for testing. > > 2. Run Regularized Least Squares (non-homogeneous) using 200 training data points. Choose various values of lambda within the range `{exp(-2), exp(-1.5), exp(-1), …, exp(3.5), exp(4)}`. This corresponds to $\lambda$ values ranging from exp(-2) to exp(4) with a step size of 0.5. For each value of $\lambda$, Run Regularized Least Squares (non-homogeneous) using 200 training data points. Compute the Training MSE and Validation MSE. > 3. Plot the Training MSE and Validation MSE as functions of lambda. The following is the graph for Training and Validation MSE as functions of lambda. ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q2-p2-g.webp) > [!question]- part 2 > > 1. What is the best value for lambda? Why? > 2. Use the best value of lambda to report the results on the test set. 1. Best $\lambda$ would be the one corresponding to lowest point on the validation MSE curve, as it is the one that minimizes the validation MSE. 
From the graph, we observe it is around $\lambda \approx 7.3891$ 2. Using $\lambda \approx 7.3891$, we get the following Test MSE around 1.3947 ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q1-p2-rls-test.webp) ### problem 3. > [!question]- part 1 > > Choose a preprocessing approach (i.e., select a mapping) that transforms the 900-dimensional data points (900 pixels) into a new space. This new space can be either lower-dimensional or higher-dimensional. Clearly explain your preprocessing approach. We will use 2D Discrete Cosine Transform (DCT) to transform our data, followed by feature selection to reduce dimensionality by selecting a top-k coefficient. Reason: 1. DCT is mostly used in image compression (think of JPEG). Transform image from spatial to frequency domain. 2. Reduce dimensionality to help with overfitting, given we will only use 200 samples for training. In this case, we will choose `n_coeffs=100` > [!question]- part 2 > > implement your preprocessing approach. See the [jupyter notebook](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/LSR) for more information > [!question] part 3 > > Report the MSE on the training and validation sets for different values of lambda and plot it. **As mentioned, it should perform better for getting points.** choose the best value of lambda, apply your preprocessing approach to the test set, and then report the MSE after running RLS. The following graph shows the Training and Validation MSE as functions of $\lambda$. The optimal alpha is found to be $\lambda \approx 4.4817$ ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q1-dct-preprocess.webp) The given Test MSE is found to be around 3.2911 ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a1/q1-test-dct.webp) --- ## question 2. > [!question] problem statement > > In this question, we will use least squares to find the best line ($\hat{y}=ax + b$) that fits a non-linear function, namedly $f(x) = 2x - x^3 -1$ > > For this, assume that you are given a set of $n$ training point $\{ (x^i, y^i)\}^{n}_{i=1} = \{(({i}/{n}), 2({i}/{n})- ({i}/{n})^3- 1)\}^{n}_{i=1}$. > > Find a line (i.e $a,b \in \mathbb{R}$) that fits the training data the best when $n \to \infty$. Write down your calculations as well as the final values for $a$ and $b$. > > Additional notes: $n \to \infty$ assumption basically means that we are dealing with an integral rather than a finite summation. 
You can also assume $x$ is uniformly distributed on \[0, 1]. We need to minimize the expected squared error: $$ MSE(a,b) = \int_{0}^{1}(ax + b - f(x))^2 dx $$ We can compute $\mu_{x}, \mu_{y}$: $$ \begin{aligned} \mu_{x} &= \int_{0}^{1}x dx = \frac{1}{2} \\ \mu_{y} &= \int_{0}^{1}f(x) dx = \int_{0}^{1}(2x - x^3 - 1) dx = [x^2]^{1}_{0} - [\frac{x^4}{4}]^{1}_{0} - [x]^{1}_{0} = - \frac{1}{4} \end{aligned} $$ $$ \begin{aligned} \text{Var}(x) &= E[x^2] - (E[x])^2 = \int_{0}^{1}x^2 dx - (\frac{1}{2})^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12} \\ \text{Cov}(x,y) &= E[xy] - E[x]E[y] = \int_{0}^{1}x(2x - x^3 - 1) dx - (\frac{1}{2})(-\frac{1}{4}) \end{aligned} $$ Compute $E[xy] = \int_{0}^{1}(2x^2-x^4-x)dx = \frac{2}{3} - \frac{1}{5} - \frac{1}{2} = - \frac{1}{30}$. Therefore the covariance is: $$ \text{Cov}(x,y) = - \frac{1}{30} + \frac{1}{8} = \frac{11}{120} $$ Slope $a$ and intercept $b$ can then be computed as: $$ \begin{aligned} a &= \frac{\text{Cov}(x,y)}{\text{Var}(x)} = \frac{11}{120} \times 12 = 1.1 \\ b &= \mu_{y} - a\mu_{x} = - \frac{1}{4} - \frac{11}{10} \times \frac{1}{2} = - \frac{4}{5} = -0.8 \end{aligned} $$ Thus, the best-fitting line is $\hat{y} = ax + b = \frac{11}{10}x - \frac{4}{5}$ ## question 3. > [!question] problem statement > > In this question, we would like to fit a line with zero y-intercept ($\hat{y} = ax$) to the curve $y=x^2$. However, instead of minimising the sum of squares of errors, we want to minimise the following objective function: > > $$ > \sum_{i} [\log {\frac{\hat{y}^i}{y^i}}]^2 > $$ > > Assume that the distribution of $x$ is uniform on \[2, 4]. What is the optimal value for $a$? Show your work. _assumption: $\log$ denotes the natural logarithm (the optimal $a$ turns out to be the same for any base)_ We need to minimize the objective function $$ \text{Objective}(a) = \sum_{i} [\log {\frac{\hat{y}^i}{y^i}}]^2 $$ where $\hat{y}^i = ax^i$ and $y^i=(x^i)^2$. Given $x$ is uniformly distributed on \[2, 4], we can express the sum as an integral: $$ \begin{aligned} \text{Objective}(a) &= \int_{2}^{4} [\log {\frac{ax}{x^2}}]^2 dx \\ &= \int_{2}^{4} [\log(a) + \log(x) - 2 \log(x)]^2 dx \\ &= \int_{2}^{4} [\log(a) - \log(x)]^2 dx \end{aligned} $$ Let $\ell = \log(a)$; we can rewrite the objective function as: $$ \begin{aligned} \text{Objective}(\ell) &= \int_{2}^{4} [\ell - \log(x)]^2 dx \\ &= \int_{2}^{4} [\ell^2 - 2\ell \log(x) + \log^2(x)] dx \\ &= \ell^2 \int_{2}^{4} dx - 2\ell \int_{2}^{4} \log(x) dx + \int_{2}^{4} \log^2(x) dx \end{aligned} $$ Compute each integral: $$ \begin{aligned} I_0 &= \int_{2}^{4} dx = 4 - 2 = 2 \\ I_1 &= \int_{2}^{4} \log(x) dx = [x \log(x) - x]^{4}_{2} = 4 \log(4) - 4 - 2 \log(2) + 2 = 8 \log(2) - 2 \log(2) - 2 = 6 \log(2) - 2 \\ I_2 &= \int_{2}^{4} \log^2(x) dx \quad (\text{constant in } \ell\text{, so not needed}) \end{aligned} $$ Given we are only interested in finding the optimal $a$, we take the derivative of the objective function with respect to $\ell$: $$ \frac{\partial}{\partial \ell} \text{Objective}(\ell) = \frac{\partial}{\partial \ell} (\ell^2 I_0 - 2 \ell I_1 + I_2) = 2\ell I_0 - 2I_1 $$ Set to zero to find the minimising $\ell$: $\log(a) = \ell = \frac{I_1}{I_0} = \frac{6 \log(2) - 2}{2} = 3\log(2) - 1$ Therefore, $a_{\text{opt}} = e^{\ell} = e^{3 \log(2) - 1} = e^{3 \log(2)} \times \frac{1}{e} = \frac{8}{e}$ Thus, the optimal value for $a$ is $a=8/e$. --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content tags: - sfwr4ml3 description: "implementation of PCA on LFW and TNC datasets" title: "PCA and Kernels, from scratch" date: 2024-10-21 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content.html.md --- See also [jupyter
notebook](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/PCA), [pdf](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/assignment.pdf), [solutions](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/solution.pdf) ## question 1. ### task 1: eigenfaces implementation of `centeralize_data()` and `pca_components()` ```python def centeralize_data(data): return data - (data_mean := np.mean(data, axis=0).reshape(1, -1)), data_mean # fmt: off def pca_components(Vt, n_components): return Vt[:n_components] # fmt: on ``` Yields the following when running `plot_class_representatives`: [result](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-t1.webp) ### task 2: PCA transformation and reconstructing > [!question] part A > > Implement `pca_tranform` ```python def pca_transform(X, n_components): U, s, *result = normalized_svd(X) return U[:, :n_components] * s[:n_components], *result ``` > [!question] part B > > Implement `pca_inverse_transform` ```python def pca_inverse_transform(transformed_data, Vt, n_components, data_mean): return transformed_data @ pca_components(Vt, n_components) + data_mean ``` Which yields the following for TNC visualisation: ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-tnc-viz.webp) and LFW visualisation: ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-lfw-viz.webp) We also expect some loss in information while reconstructing: ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-bush-loss-info.webp) ### task 3: average reconstruction error for LFW $$ \text{error}=\frac{1}{n}\sum_{i=1}^n||x_i-\text{reconstruct}(pca(x_i))||^2_2 $$ > [!question] part A > > plot average reconstruction error on training and testing data points Training code: ```python # Define the number of components to test in PCA c_components = [2, 10, 30, 60, 100] # Initialize lists to store the reconstruction errors for training and testing data train_errors, test_errors = [], [] # Initialize deterministic seed SEED = 42 X_train, X_test = train_test_split(X_bush, train_size=400, random_state=SEED) # \text{error}=\frac{1}{n}\sum_{i=1}^n||x_i-\text{reconstruct}(pca(x_i))||^2_2 def mse(train_data, reconstructed): return np.mean(np.sum((train_data - reconstructed) ** 2, axis=1)) # Loop through each specified number of components for PCA for n_components in c_components: # Apply PCA and then inverse PCA to the training data transformed_train, Vt_train, mean_train = pca_transform(X_train, n_components) # Calculate the Mean Squared Error (MSE) as the reconstruction error for the training set train_errors.append(mse(X_train, pca_inverse_transform(transformed_train, Vt_train, n_components, mean_train))) # Normalize the test data. 
Transform the test data using the train data's PCA components # and reconstruct the test data. # Calculate the Mean Squared Error (MSE) as the reconstruction error for the test set test_errors.append(mse(X_test, pca_inverse_transform((X_test - mean_train) @ pca_components(Vt_train, n_components).T, Vt_train, n_components, mean_train))) # fmt: skip # Print the average reconstruction errors for each number of components for i, n_components in enumerate(c_components): print(f'Components: {n_components}\n\tTrain Error: {train_errors[i]:.4f}\n\tTest Error: {test_errors[i]:.4f}') ``` yields the following observation ```prolog Components: 2 Train Error: 40.2048 Test Error: 44.1277 Components: 10 Train Error: 21.6275 Test Error: 25.1425 Components: 30 Train Error: 11.6392 Test Error: 15.6092 Components: 60 Train Error: 6.6892 Test Error: 11.4092 Components: 100 Train Error: 3.7635 Test Error: 8.7075 ``` The eval results graph: ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-t3-eval.webp) > [!question] part B > > 1. Explains the difference between the two graphs > 2. What would the error be if we compute it for the TNC dataset while using two components and 2000 samples? 1. The following observation can be made: - Both decreases as the number of components increases (lower means better reconstruction quality). However, we observe test error line (red) is higher than train error (blue). This shows some overfitting given smaller training data size (400) against LFW dataset (which includes 1288 entries) - Both show diminishing returns, yet this effect is more pronounced on test error - As `n_components` increases, we see a decreases in bias (improving reconstruction for both train and test data). However, test error decreases more slowly given later components are less effective in reconstructing features for unseen data 2. Error for average reconstruction error for TNC is shown below: ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-t3-tnc-reconstruct-error.webp) ### task 4: Kernel PCA > [!question] part A > > Apply Kernel PCA and plot transformed Data Applied a `StandardScaler` to `X_TNC` and plot 3x4 grid with the (1,1) being the original data plot, followed by 11 slots for `gamma` from $[ 0.0001 \cdots 1 ]$. 
Run on `n_components=2` ```python gamma_values = [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1] n_components = 2 # Standardize the features scaler = StandardScaler() X_TNC_scaled = scaler.fit_transform(X_TNC) # Create subplots to visualize the transformed data for each gamma plt.figure(figsize=(20, 15)) # Plot the original data before applying Kernel PCA plt.subplot(3, 4, 1) plt.scatter(X_TNC_scaled[:, 0], X_TNC_scaled[:, 1], c=Y_TNC, cmap='bwr') plt.title('Original Data') plt.xlabel('coord_x') plt.ylabel('coord_y') # Set the limits for the x and y axes x_limits = (-4, 4) y_limits = (-4, 4) # Apply Kernel PCA for each gamma value for idx, gamma in enumerate(gamma_values): # Apply Kernel PCA kpca = KernelPCA(n_components=n_components, kernel='rbf', gamma=gamma) X_kpca = kpca.fit_transform(X_TNC_scaled) # Plot the transformed data plt.subplot(3, 4, idx + 2) plt.scatter(X_kpca[:, 0], X_kpca[:, 1], c=Y_TNC, cmap='bwr') plt.title(f'Gamma = {gamma}') plt.xlabel('First principal component') plt.ylabel('Second principal component') # Set fixed x and y axis limits plt.xlim(x_limits) plt.ylim(y_limits) plt.tight_layout() plt.show() ``` Yield the following graph: ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-t4-kernel-pca-n-2.webp) > [!question] part B > > Based on your observations, how does Kernel PCA compare to Linear PCA on this dataset with red and blue labels? In what ways does Kernel PCA affect the distribution of the data points, particularly in terms of how well the red and blue points are organized? Choose the best value(s) for `gamma` and report it (them). What criteria did you use to determine the optimal `gamma` value? **Comparison**: - Kernel PCA is more effective in capturing the non-linear relationships in the data, in which we see the spread between blue and red circles, which modify the data distribution. Whereas with linear PCA, it maintains the circular structure, meaning linear PCA doesn’t alter data distribution that much **Effects**: - For small value of gamma $[ 0.0001, 0.0005, 0.001 ]$ the points are highly concentrated, meaning kernels is too wide (this makes sense given that `gamma` is the inverse of standard deviations) - For gamma $[ 0.005, \cdots 0.05 ]$, we notice a separation between blue and red circles. - For gamma $[0.1, 0.2]$ , we start to see similar features from original data entries, albeit scaled down given RBF kernels. - At gamma $[0.5, 1]$, we notice datasets to spread out, forming elongated features. > For gamma $[ 0.1, 0.2 ]$ seems to provide best representation of the original data **Criteria**: - class separation: how well the blue and red circles are separated from each other - compact: how tightly clustered the points within each classes are. - structure preservation: how well the circular nature of the original datasets are preserved. - dimensionality reduction: how well the data is projected in lower dimensions space > [!question] part C > > Find best values for reconstruction error of kernel PCA training loop yields the following: ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-t4-part-b-opt-kernel.webp) > [!question] part D > > 1. Visualisation of Reconstruction Error > 2. How does kernel PCA compare to Linear PCA on this dataset? 
If Kernel PCA shows improved performance, please justify your answer. If Linear PCA performs better, explain the reasons for its effectiveness. Reconstruction Error from kernel PCA as well as linear PCA: ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a2/q1-t4-reconstruct-err-pca-kernels.webp) **Performance**: - Linear PCA has significantly better reconstruction error than kernel PCA (6.68 of linear PCA against 47.48 at $\text{gamma}=0.01$ of kernel PCA) - Regardless of `gamma`, Kernel PCA shows a lot higher error **Reasoning for Linear PCA**: 1. Data characteristic: most likely LFW contains mostly linear relationship between features (face images have strong linear correlations in pixel intensities and structures) 2. Dimensionality: This aligns with Task 3 Part B where we observe same value with `n_components=60` for linear PCA 3. Overfitting: less prone to overfitting, given that Kernel PCA might find local optima that overfit given patterns of data (in this case face features). Additionally, RBF is more sensitive to outliers Explanation why Kernel PCA doesn’t work as well: 1. Kernel: RBF assumes local, non-linear relationships. This might not work with facial data given strong linear correlation among facial features. 2. Gamma: We notice that with $\text{gamma}=0.01$ achieve lowest error, still underperformed comparing to linear PCA. 3. Noise: non-linear kernel mapping are more prone to capture noise or irrelevant patterns in facial images. --- ## question 2. > [!note] problem statement > > “Driving high” s prohibited in the city, and the police have started using a tester that shows whether a driver is high on cannabis. The tester is a binary classifier (1 for positive result, and 0 for negative result) which is not accurate all the time: > > - if the driver is truly high, then the test will be positive with probability $1 - \beta_1$ and negative with probability $\beta_1$ (so the probability of wrong result is $\beta_1$ in this case) > - if the driver is not high, then the test will be positive with probability $\beta_2$ and negative with probability $1-\beta_2$ (so the probability of wrong result is $\beta_2$ in this case) > > Assume the probability of (a randomly selected driver from the population) being “truly high” is $\alpha$ > [!question] part 1 > > What is the probability that the tester shows a positive result for a (randomly selected) driver? (write your answer in terms of $\alpha, \beta_1, \beta_2$) Probability of a driver being truly high: $P(\text{High}) = \alpha$ Probability of a driver not being high: $P(\text{Not High}) = 1- \alpha$ Probability of a positive test given the dirver is high: $P(\text{Positive} | \text{High}) = 1 - \beta_1$ Probability of a positive test given the dirver is not high: $P(\text{Positive} | \text{Not High}) = \beta_2$ _using law of total probability to find overall probability of a positive test result:_ $$ \begin{aligned} P(\text{Positive}) &= P(\text{Positive} | \text{High}) \cdot P(\text{High}) + P(\text{Positive} | \text{Not High}) P(\text{Not High}) \\ &= (1 - \beta_1) \cdot \alpha + (\beta_2) \cdot (1 - \alpha) \end{aligned} $$ > [!question] part 2 > > The police have collected test results for n randomly selected drivers (i.i.d. samples). What is the likelihood that there are exactly $n_{+}$ positive samples among the $n$ samples? 
Write your solution in terms of $\alpha, \beta_1, \beta_2, n_{+}, n$ Let the probability of a positive test result for a randomly selected driver be $$ p = P(\text{Positive}) = (1 - \beta_1) \cdot \alpha + (\beta_2) \cdot (1 - \alpha) $$ Now, apply the binomial probability to find the likelihood of $n_{+}$ positive samples among $n$ samples: $$ \begin{aligned} P(X=n_{+}) &= \binom{n}{n_{+}} \cdot p^{n_{+}} \cdot (1-p)^{n-n_{+}} \\ &= \binom{n}{n_{+}} \cdot [(1 - \beta_1) \cdot \alpha + (\beta_2) \cdot (1 - \alpha)]^{n_{+}} \\ &\quad \quad \quad \quad \cdot (1 - ((1 - \beta_1) \cdot \alpha + (\beta_2) \cdot (1 - \alpha)))^{n-n_{+}} \\ &= \binom{n}{n_{+}} \cdot [(1 - \beta_1 - \beta_2) \cdot \alpha + \beta_2]^{n_{+}} \cdot (1 - \beta_2 + \alpha \cdot (\beta_1 + \beta_2 - 1))^{n-n_{+}} \\ \end{aligned} $$ > [!question] part 3 > > What is the maximum likelihood estimate of $\alpha$ given a set of $n$ random samples from which $n_{+}$ are positive results? In this part, you can assume that $\beta_1$ and $\beta_2$ are fixed and given. Simplify your final result in terms of $n, n_{+}, \beta_1, \beta_2$ _Assumption: using natural log `ln`_ _MLE of $\alpha$_ Let the likelihood function be $L(\alpha)$: $$ \begin{aligned} L(\alpha) &= \binom{n}{n_{+}} \cdot p(\alpha)^{n_{+}} \cdot (1-p(\alpha))^{n-n_{+}} \\ \\ \because &\quad p(\alpha) = (1 - \beta_1) \cdot \alpha + \beta_2 \cdot (1-\alpha) \end{aligned} $$ Take the log of both sides and drop the constant term: $$ \ln L(\alpha ) = n_{+} \ln [p(\alpha)] + (n-n_{+}) \ln [1-p(\alpha)] $$ To find the maximum likelihood, we differentiate with respect to $\alpha$ and set it to zero: $$ \begin{aligned} n_{+} \cdot \frac{p^{'}(\alpha)}{p(\alpha )} &- (n-n_{+}) \cdot \frac{p^{'}(\alpha)}{1-p(\alpha )} = 0 \\ \\ \because &\quad p'(\alpha ) = 1 - \beta_1 - \beta_2 \\ \\ n_{+} \cdot \frac{1 - \beta_1 - \beta_2}{p(\alpha )} &= (n-n_{+}) \cdot \frac{1 - \beta_1 - \beta_2}{1-p(\alpha )} \\ \\ n_{+} - n_{+} p(\alpha ) &= n p(\alpha) - n_{+} p(\alpha) \\ n_{+} &= np(\alpha) \end{aligned} $$ Substituting $p(\alpha) = (1 - \beta_1) \cdot \alpha + \beta_2 \cdot (1-\alpha)$: $$ \begin{aligned} n_{+} &= n [(1-\beta_1) \cdot \alpha + \beta_2 \cdot (1-\alpha)] \\ \frac{n_{+}}{n} &= (1-\beta_1-\beta_2) \cdot \alpha + \beta_2 \\ \\ \text{MLE for } \hat{\alpha} &= \frac{\frac{n_{+}}{n} - \beta_2}{1 - \beta_{1} - \beta_{2}} \\ &= \frac{n_{+} - n \cdot \beta_{2}}{n - n\cdot \beta_{1} - n\cdot \beta_{2}} \end{aligned} $$ > [!question] part 4 > > What will be the maximum likelihood estimate of $\alpha$ for the special cases of > > - $(i) \beta_{1} = \beta_{2} = 0$ > - $(ii) \beta_{1} = \beta_{2} = 0.5$ > - $(iii) \beta_{1} = 0.2, \beta_{2} = 0.3$ For $(i) \beta_{1} = \beta_{2} = 0$: $\hat{\alpha} = \frac{n_{+}}{n}$ For $(ii) \beta_{1} = \beta_{2} = 0.5$: $\hat{\alpha} = \text{undefined}$ _note: this makes sense, given that when the test is completely random, it carries no information about the true proportion of high drivers._ For $(iii) \beta_{1} = 0.2, \beta_{2} = 0.3$: $\hat{\alpha} = \frac{n_+ - 0.3n}{0.5n} = \frac{2n_{+}}{n} - \frac{3}{5} = \frac{2n_+}{n} - 0.6$ --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content tags: - sfwr4ml3 description: "implementation in pure PyTorch" title: "SVM and Logistic Regression" date: 2024-11-11 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content.html.md --- See also [jupyter
notebook](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/svm) ## task 1: linear [SVM](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Support-Vector-Machine) for MNIST classification > [!question] part a > > Is the implementation of the multi-class linear SVM similar to the end-to-end multi-class SVM that we learned in the class? Are there any significant differences? | Differences | multi-class linear SVM | end-to-end multi-class SVM | | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ | | Loss function | Uses `MultiMarginLoss`, which creates a criterion that optimises a multi-class classification hinge loss [^multiloss] | multi-vector encoding where $h(x) = \arg\max_{y} $ | | Architecture | Through a single linear layers based on given input\_size and `num_classes` | optimized over pairs of class scores with multi-vector encoding | | Parameter Learning | Uses [SGD](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Stochastic-gradient-descent) with minibatches to optimize MML | Whereas we show a theoretical formulation of optimizing over multi-vector encoded space [^theoretical] | > [!question] part B > > 1. Compute the accuracy on the train and test set after each epoch in the training. Plot these accuracies as a function of the epoch number and include it in the report (include only the plot in your report, not all the 2\*100 numbers). > 2. Compute the hinge loss on the train and test set after each epoch in the training. Plot these loss values as a function of the epoch number and include it in the report.(include only the plot in your report, not all the 2\*100 numbers) > 3. Report the last epoch results (including loss values and accuracies) for both train and test sets. > 4. Does the model shows significant overfitting? Or do you think there might be other factors that are more significant in the mediocre performance of the model? The following includes graph for both accuracy and loss on train/test sets after 100 epochs ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/t1-partb.webp) Last epoch results for both train and test sets: ```prolog ------------------------------------------------------------- Epoch 100 - Train loss: 0.016170, Train accuracy: 100.00% - Test loss: 0.165001, Test accuracy: 78.50% ------------------------------------------------------------- ``` We observe training accuracy continuing to improve, while test accuracy plateaus. Same observation can be made for in `Loss vs. 
Epochs` graph, where gap between training and test loss increases as epochs increase __While this shows evidence of overfitting, one can argue there are factors affecting model performance:__ **Liminal training data**: - we are currently only use 0.25% of MNIST dataset (which is around 150 samples) [^size] - This makes it difficult for the model to learn generalizable patterns **Model limitation**: - Linear SVM can only learn linear decision boundaries - MNIST datasets requires non-linear decision boundaries to achieve high performance (we observe this through relatively quick plateau test accuracy after 78.5%) > We don’t observe in degrading test performance, which is not primarily behaviour of overfitting. > [!question] part c > > Weight decay works like regularization. Set weight decay to each of the values (0.1, 1, 10) during defining the SGD optimizer (see [SGD optimizer documentation](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html) for how to do that). > > Plot the train/test losses and accuracies per epoch. Also report the last epoch results (loss and accuracy for both train and test) . > > > [!tip] Important > > > > Does weight decay help in this case? Justify the results. The following are logs for set of weight decay from (0.1, 1, 10) ```text Training with weight decay = 0.1 ============================================================= Epoch 020 - Train loss: 0.1048, Train accuracy: 94.67% - Test loss: 0.2342, Test accuracy: 75.30% ------------------------------------------------------------- Epoch 040 - Train loss: 0.0638, Train accuracy: 98.00% - Test loss: 0.2072, Test accuracy: 78.60% ------------------------------------------------------------- Epoch 060 - Train loss: 0.0520, Train accuracy: 98.67% - Test loss: 0.2034, Test accuracy: 79.10% ------------------------------------------------------------- Epoch 080 - Train loss: 0.0447, Train accuracy: 99.33% - Test loss: 0.2043, Test accuracy: 80.00% ------------------------------------------------------------- Epoch 100 - Train loss: 0.0422, Train accuracy: 99.33% - Test loss: 0.2051, Test accuracy: 79.60% ------------------------------------------------------------- ``` ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/t1-partc-wd-point1.webp) ```text Training with weight decay = 1 ============================================================= Epoch 020 - Train loss: 0.2499, Train accuracy: 90.67% - Test loss: 0.3714, Test accuracy: 73.00% ------------------------------------------------------------- Epoch 040 - Train loss: 0.2374, Train accuracy: 89.33% - Test loss: 0.3621, Test accuracy: 73.30% ------------------------------------------------------------- Epoch 060 - Train loss: 0.2416, Train accuracy: 87.33% - Test loss: 0.3646, Test accuracy: 72.80% ------------------------------------------------------------- Epoch 080 - Train loss: 0.2367, Train accuracy: 90.67% - Test loss: 0.3621, Test accuracy: 74.70% ------------------------------------------------------------- Epoch 100 - Train loss: 0.2366, Train accuracy: 90.67% - Test loss: 0.3592, Test accuracy: 74.20% ------------------------------------------------------------- ``` ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/t1-partc-wd-1.webp) ```text Training with weight decay = 10 
============================================================= Epoch 020 - Train loss: 0.7413, Train accuracy: 33.33% - Test loss: 0.7881, Test accuracy: 23.10% ------------------------------------------------------------- Epoch 040 - Train loss: 0.7422, Train accuracy: 37.33% - Test loss: 0.7906, Test accuracy: 22.00% ------------------------------------------------------------- Epoch 060 - Train loss: 0.7437, Train accuracy: 33.33% - Test loss: 0.7938, Test accuracy: 18.50% ------------------------------------------------------------- Epoch 080 - Train loss: 0.7316, Train accuracy: 26.67% - Test loss: 0.7883, Test accuracy: 16.90% ------------------------------------------------------------- Epoch 100 - Train loss: 0.7415, Train accuracy: 24.00% - Test loss: 0.7953, Test accuracy: 13.70% ------------------------------------------------------------- ``` ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/t1-partc-wd-10.webp) ```text final results comparison: ====================================================================== weight decay train loss test loss train acc test acc ---------------------------------------------------------------------- 0.1 0.0422 0.2051 99.33% 79.60% 1.0 0.2366 0.3592 90.67% 74.20% 10.0 0.7415 0.7953 24.00% 13.70% ``` Yes, but the result is highly sensitive based on given weight decay value. 1. with `weight_decay = 0.1` we observe the best performance, with training accuracy reaches to 99.33%, smaller gap between train and test loss. Smooth learning curves with stable conversion. 2. with `weight_decay = 1` we saw a decrease in training accuracy, larger gap between training and test loss, training become a bit unstable with fluctuation in accuracy, and regularisation is too strong, which affect learning 3. with `weight_decay = 10`, we saw it severely impairs model performance, given that it is too restrictive. Unstable training, high loss values, regularisation is too aggressive. > Small dataset makes the model more sensitive to regularisation. Linearity makes it lax to require regularisation. > Weight decay does help when properly tuned, and make learning a bit more stable. ## task 2: Logistic Regression for MNIST classification > [!question] part a > > Use Cross Entropy Loss (rather than Hinge loss) to implement logistic regression _context_: - Hinge Loss: it penalized predictions that are not sufficiently confident. Only cares about correct classification with sufficient margin - cross-entropy: For binary loss is defined: $$ L(y, p(x)) = -(y * \log(p(x)) + (1-y) * \log (1-p(x))) $$ For multi-class is defined: $$ L(y, p(x)) = - \sum y_i * \log(p_i(x)) $$ > [!question] part b > > 1. Compute the accuracy on the train and test set after each epoch in the training. Plot these accuracies as a function of the epoch number. > 2. Compute the cross-entropy loss on the train and test set after each epoch in the training. Plot these loss values as a function of the epoch number. > 3. Report the last epoch results (including loss values and accuracies) for both train and test sets. > 4. Does the model shows significant overfitting? Or do you think there might be other factors that are more significant in the mediocre performance of the model? 
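Before the results, a small illustration of the contrast drawn in part a, on toy logits of my own (not assignment data): PyTorch's `nn.MultiMarginLoss` gives zero loss once the correct class wins by the margin, while `nn.CrossEntropyLoss` keeps rewarding higher confidence.

```python
import torch
import torch.nn as nn

# two toy 10-class logit rows, both predicting class 3 correctly:
# row 0 wins by a small margin, row 1 wins by a large margin
logits = torch.zeros(2, 10)
logits[0, 3], logits[0, 4] = 0.4, 0.3
logits[1, 3], logits[1, 4] = 5.0, 0.3
targets = torch.tensor([3, 3])

hinge = nn.MultiMarginLoss(reduction='none')   # multi-class hinge loss, as in task 1
xent = nn.CrossEntropyLoss(reduction='none')   # cross-entropy, as used in task 2

print(hinge(logits, targets))  # row 0 penalised (margin < 1), row 1 exactly 0 (margin > 1)
print(xent(logits, targets))   # both positive; keeps shrinking as the correct logit grows
```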
The following is the graph entails both accuracy and loss on train/test dataset: ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/t2-partb.webp) ```text ------------------------------------------------------------- Epoch 100 - Train loss: 2.3271, Train accuracy: 8.67% - Test loss: 2.3272, Test accuracy: 8.20% ------------------------------------------------------------- ``` No sign of overfitting, given training/test accuracy are very close together. Training loss and test loss curves are pretty close The reason for poor performance are as follow: - random chance baseline: for 10-class problem, random guessing would give \~10% accuracy, so it perform a bit worse. - The model doesn’t seem to learn at all. It perform significantly worse than SVM. - Cross-entropy loss might need additional tuning. - Non-linearity: Given that MNIST data contains non-linear features, it might be hard for LR to capture all information from training dataset. > [!question] part c > > Does it work better, worse, or similar? Significantly worse, due to the difference in loss function. ## task 3: non-linearity > [!question] part a > > Add a hidden layer with 5000 neurons and a RELU layer for both logistic regression and SVM models in Task 1 and Task 2. > > 1. For both models, plot the train loss and the test loss. > 2. For both models, plot the train and test accuracies. > 3. For both models, report the loss and accuracy for both train and test sets. The following is the modified version of LinearSVM with hidden layers: ```python class ModifiedModel(nn.Module): def __init__(self, input_size, hidden_size, num_classes): super().__init__() self.fc1 = nn.Linear(input_size, hidden_size) self.relu = nn.ReLU() self.fc2 = nn.Linear(hidden_size, num_classes) def forward(self, x): x = x.view(-1, input_size) x = self.fc1(x) x = self.relu(x) return self.fc2(x) ``` With training/test accuracy and loss graph: ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/t3-parta.webp) Final epoch result: ```text ------------------------------------------------------------ Epoch 100: Train Loss: 0.0033, Train Accuracy: 100.00% Test Loss: 0.1723, Test Accuracy: 78.10% ------------------------------------------------------------ ``` Modified version of `LogisticRegression` with hidden layers: ```python class ModifiedLogisticModel(nn.Module): def __init__(self, input_size, hidden_size, num_classes): super().__init__() self.fc1 = nn.Linear(input_size, hidden_size) self.relu = nn.ReLU() self.fc2 = nn.Linear(hidden_size, num_classes) def forward(self, x): x = x.view(-1, input_size) x = self.fc1(x) x = self.relu(x) return self.fc2(x) ``` With training/test accuracy and loss graph: ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/t3-partb-lr.webp) Final epoch result: ```text ------------------------------------------------------------ Epoch 100: Train Loss: 0.1133, Train Accuracy: 100.00% Test Loss: 0.6675, Test Accuracy: 78.70% ------------------------------------------------------------ ``` > [!question] part b > > Compare the results with the linear model (without weight decay, to keep the comparison fair). Which approach works better? Why? 
Which appproach is more prone to overfitting? Explain your findings and justify it. Linear model works better in this case, even thought it achieve lower loss, similar test accuracy. The added complexity of the hidden layer and [ReLU](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/optimization#relu) activation didn’t improve the model’s performance given the dataset size (too small) The problem set might be linearly separable enough such that the model simply learns to generalise overall behaviour of the whole dataset (also known as grokking [^grokking]). > Note that overfitting suggests that there weren’t enough data in given training sets, given we observe similar test metrics for both `LinearSVM` and `ModifiedModel` (with ReLU and hidden layers) So it is not necessary “which works better”, rather it should be about limited training data rather than architectural options. ## task 4: data augmentation > [!note]+ instruction > > In this task, we will explore the concept of data augmentation, which is a powerful technique used to enhance the diversity of our training dataset without collecting new data. By applying various transformations to the original training images, we can create modified versions of these images. We can then use these modified images to train our model with a “richer” set of examples. The use of data augmentation helps to improve the robustness and generalization of our models. Data augmentation is particularly beneficial in tasks like image classification, where we expect the model to be invariant to slight variations of images (e.g., rotation, cropping, blurring, etc.) > > For this task, you are given a code that uses Gaussian Blur augmentation, which applies a Gaussian filter to slightly blur the images. If you run the code, you will see that this type of augmentation actually makes the model less accurate (compared with Task 3, SVM test accuracy) > > For this task, you must explore other types of data augmentation and find one that improves the test accuracy by at least 1 percent compared with not using any augmentation (i.e., compared with Task 3, SVM test accuracy). Only change the augmentation approach, and keep the other parts of the code unchanged. Read the PyTorch documentation on different augmentation techniques [here](https://pytorch.org/vision/stable/transforms.html), and then try to identify a good augmentation method from them. > > Report the augmentation approach that you used, and explain why you think it helps. Also include train/test accuracy plots per epoch, and the train/test accuracy at the final epoch. The following augmentation achieves higher test accuracy comparing to `ModifiedModel` without any transformation ```python augmentation = transforms.Compose([ # Small random rotation with higher probability of small angles transforms.RandomRotation(degrees=3, fill=0), # Even more conservative rotation # Very subtle random perspective transforms.RandomPerspective(distortion_scale=0.15, p=0.3, fill=0), # Convert to tensor transforms.ToTensor(), # Normalize to improve training stability transforms.Normalize((0.1307,), (0.3081,)), # MNIST mean and std # Extremely subtle random noise transforms.RandomAdjustSharpness(sharpness_factor=1.2, p=0.3) ]) ``` ### **Explanation** `ToTensor` is self-explanatory. 
Additional augmentation playground can also be found in the [jupyter notebook](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/svm) #### `RandomRotation` - we use $+-3$ degrees given that digits can appear at slightly different angles in said dataset - small rotation preserves readability, while increase variety - fill set to 0 to preserve black background #### `RandomPerspective` - add a small distortion scale to simulate viewing angle variations. - help with robustness to viewpoint change #### `Normalise` - Add MNIST mean and std to normalise training - make it more stable #### `RandomAdjustSharpness` - Simulate some random noise - One can also use `RandomErasing`, but the essentially work the same ### results The following is the final epoch result: ```text ------------------------------------------------------------- Epoch 100 - Train loss: 0.015159, Train accuracy: 99.33% - Test loss: 0.183071, Test accuracy: 81.10% ------------------------------------------------------------- ``` With graphs: ![](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/t4-highest.webp) [^multiloss]: [Loss](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/PyTorch#multimarginloss) is defined as: $\text{loss}(x,y) = \frac{\sum_{i} \max{0, \text{margin} - x[y] + x[i]}^p}{x.\text{size}(0)}$ [^theoretical]: Given input $(x_1, y_1), \ldots, (x_m, y_m)$ parameters: - regularization parameter $\lambda > 0$ - loss function $\delta: \mathcal{Y} \times \mathcal{Y} \rightarrow \mathbb{R}_{+}$ - class sensitive feature mapping $\Psi: \mathcal{X} \times \mathcal{Y} \rightarrow \mathbb{R}^d$ In this case, we solve for $$ \min_{w \in \mathbb{R}^d} (\lambda \|w\|^2 + \frac{1}{m} \sum_{i=1}^{m} \max_{y^{'} \in \mathcal{Y}}(\delta (y^{'}, y_i) + \langle w, \Psi (x_i, y^{'}) - \Psi (x_i, y_i) \rangle)) $$ [^size]: MNIST datasets are [60000](https://keras.io/api/datasets/mnist/) 28x28 grayscale images, therefore $0.25/100 * 60000 = 150$ samples being used [^grokking]: [grokking](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a3/content/../../../../../../../../../../thoughts/mechanistic-interpretability#grokking) is a process where neural network learns a pattern in the data, and it “memorize” this pattern to generalize to all unseen dataset, in which improve generalisation performance from random chance to perfect generalisation! Though, this phenomena is often observed in larger networks beyond overfitting. --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/content tags: - sfwr4ml3 description: "and image processing." 
title: "Application of Convolutional Neural Network" date: 2024-11-11 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/content.html.md --- See also [jupyter notebook](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/CNN) and [Kaggle](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/kaggle) ## Task 1: SVHN Image Classification Using CNN ```python class SVHNClassifier(nn.Module, PretrainedMixin): def __init__(self): super(SVHNClassifier, self).__init__() # not specified in spec, but add dropout for stability self.convblock1 = nn.Sequential( nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(kernel_size=2, stride=2), ) self.convblock2 = nn.Sequential( nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(kernel_size=2, stride=2), ) self.convblock3 = nn.Sequential( nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1), nn.BatchNorm2d(128), nn.ReLU(), nn.MaxPool2d(kernel_size=2, stride=2), ) # Calculate input size for the first fully connected layer # Input image: 32x32 # After 3 max pooling layers (32 -> 16 -> 8 -> 4) # With 128 channels: 128 * 4 * 4 = 2048 self.fc = nn.Sequential(nn.Linear(128 * 4 * 4, 128), nn.ReLU(), nn.Linear(128, 10)) def forward(self, x): x = self.convblock1(x) x = self.convblock2(x) x = self.convblock3(x) x = x.view(x.size(0), -1) x = self.fc(x) return x ``` Note that we include a small serialisation helpers `PretrainedMixin` using `safetensors`: ```python class PretrainedMixin: @classmethod def from_pretrained(cls, filepath, device='cuda'): model = cls().to(device) load_model(model, filepath) model.eval() return model def save_pretrained(self, base_path='./model'): save_pretrained(self, name=self.__class__.__qualname__, base_path=base_path) ``` Plot for training metrics can be found as follow: ![Accuracy over epochs for SVHN classifier](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/accuracy-epochs-svhn-simple.webp) Accuracy over epochs for SVHN classifier ![loss over epochs for SVHN classifier](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/loss-epochs-svhn-simple.webp) loss over epochs for SVHN classifier ## Task 2: CNN for Image Denoising ```python class ImageDenoisingCNN(nn.Module, PretrainedMixin): def __init__(self): super(ImageDenoisingCNN, self).__init__() # First Convolutional Layer # Input: 32x32x3 -> Output: 32x32x30 self.conv1 = nn.Conv2d(in_channels=3, out_channels=30, kernel_size=3, padding=1, stride=1) self.relu = nn.ReLU() # Second Convolutional Layer # Input: 32x32x30 -> Output: 32x32x3 self.conv2 = nn.Conv2d(in_channels=30, out_channels=3, kernel_size=3, padding=1, stride=1) self.sigmoid = nn.Sigmoid() def forward(self, x): # First conv layer with ReLU x = self.conv1(x) x = self.relu(x) # Second conv layer with Sigmoid x = self.conv2(x) x = self.sigmoid(x) return x ``` training and eval loop: ```python def train(train_loader, test_loader, model, epochs, loss_function, optimizer, 
device='cuda'): """ Train the model on the training dataset and evaluate it on the test dataset. """ # Move model to the specified device model = model.to(device) train_loss_epochs = [] test_loss_epochs = [] for epoch in range(epochs): model.train() train_loss_batches = [] # Use context manager for batch progress bar with tqdm( enumerate(train_loader), total=len(train_loader), desc=f'epoch {epoch + 1}/{epochs}', ncols=100 ) as batch_pbar: for batch_idx, (clean_images, noisy_images) in batch_pbar: # Move data to device clean_images = clean_images.to(device) noisy_images = noisy_images.to(device) # Zero the gradients optimizer.zero_grad() # Forward pass denoised_images = model(noisy_images) loss = loss_function(denoised_images, clean_images) # Backward pass and optimize loss.backward() optimizer.step() # Track batch loss train_loss_batches.append(loss.item()) batch_pbar.set_postfix({'batch_loss': loss.item()}) # Display sample results every 5 epochs, at the last batch if epoch % 5 == 0 and batch_idx == len(train_loader) - 1: show_images_grid2(clean_images[:5].detach().cpu(), title='Clean', cols=5) show_images_grid2(noisy_images[:5].detach().cpu(), title='Noisy', cols=5) show_images_grid2(denoised_images[:5].detach().cpu(), title='Denoised', cols=5) # Calculate average training loss for the epoch train_loss_epoch = np.mean(train_loss_batches) train_loss_epochs.append(train_loss_epoch) # Evaluate model on test set test_loss_epoch = evaluate(test_loader, model, loss_function, epoch + 1, num_epochs, device=device) test_loss_epochs.append(test_loss_epoch) return train_loss_epochs, test_loss_epochs def evaluate(dataloader, model, loss_function, epoch, num_epochs, device='cuda'): """ Evaluate the model on the test dataset and return the average loss. """ model.eval() test_losses = [] with torch.no_grad(): with tqdm(dataloader, desc=f'eval {epoch}/{num_epochs}', ncols=100) as eval_pbar: for clean_images, noisy_images in eval_pbar: # Move data to device clean_images = clean_images.to(device) noisy_images = noisy_images.to(device) # Forward pass denoised_images = model(noisy_images) loss = loss_function(denoised_images, clean_images) # Track batch loss test_losses.append(loss.item()) return np.mean(test_losses) ``` Last sample for this training loop: ![last sample of this training epochs](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/last-sample-training-epochs.webp) last sample of this training epochs ```text epoch 96/100: 100%|██████████████████████████████| 24/24 [00:00<00:00, 34.90it/s, batch_loss=0.0027] eval 96/100: 100%|█████████████████████████████████████████████████| 24/24 [00:00<00:00, 80.53it/s] epoch 97/100: 100%|█████████████████████████████| 24/24 [00:00<00:00, 70.63it/s, batch_loss=0.00307] eval 97/100: 100%|█████████████████████████████████████████████████| 24/24 [00:00<00:00, 78.39it/s] epoch 98/100: 100%|█████████████████████████████| 24/24 [00:00<00:00, 69.79it/s, batch_loss=0.00271] eval 98/100: 100%|█████████████████████████████████████████████████| 24/24 [00:00<00:00, 79.21it/s] epoch 99/100: 100%|█████████████████████████████| 24/24 [00:00<00:00, 70.38it/s, batch_loss=0.00367] eval 99/100: 100%|█████████████████████████████████████████████████| 24/24 [00:00<00:00, 79.09it/s] epoch 100/100: 100%|████████████████████████████| 24/24 [00:00<00:00, 70.95it/s, batch_loss=0.00302] eval 100/100: 100%|████████████████████████████████████████████████| 24/24 
[00:00<00:00, 78.81it/s] ``` ### visualisation ```python # Create the plot plt.figure(figsize=(10, 6)) # Plot training and test losses epochs = range(1, len(train_loss_epochs) + 1) plt.plot(epochs, train_loss_epochs, label='Training Loss', color='blue', linestyle='-') plt.plot(epochs, test_loss_epochs, label='Test Loss', color='red', linestyle='-') # Customize the plot plt.title('Training and Test Losses Over Time', fontsize=14, pad=15) plt.xlabel('Epochs', fontsize=12) plt.ylabel('Loss (MSE)', fontsize=12) plt.grid(True, linestyle='--', alpha=0.7) plt.legend(fontsize=10) # Add minor gridlines plt.minorticks_on() plt.grid(True, which='minor', linestyle=':', alpha=0.4) # Adjust layout and display plt.tight_layout() plt.show() # Print final losses print(f'Final Training Loss: {train_loss_epochs[-1]:.6f}') print(f'Final Test Loss: {test_loss_epochs[-1]:.6f}') ``` yields the following: ```text Final Training Loss: 0.003326 Final Test Loss: 0.003811 ``` ![training and test loss of denoising image over time](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/training-test-loss-over-time.webp) training and test loss of denoising image over time ### denoising last five samples ```text Average Test Loss on classes 5-9: 0.003754 ``` ![denoising last five samples](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/content/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/denoising-last-five-examples.webp) denoising last five samples --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/kaggle tags: - sfwr4ml3 - competition description: "CIFAR Challenge: Classify the World of Objects!" title: "CIFAR100 with CNN" date: 2024-12-03 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/kaggle.html.md --- See also [jupyter notebook](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/kaggle/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/Kaggle) Kaggle username: aar0npham Last attempt: 0.4477 on CIFAR100 ## training spec ```python num_epochs = 30 batch_size = 128 optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4) criterion = nn.CrossEntropyLoss(label_smoothing=0.1) scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=5, T_mult=2, eta_min=1e-6) ``` Transformations for train and test respectively: ```python train = transforms.Compose([ transforms.RandomCrop(32, padding=4), transforms.RandomHorizontalFlip(), transforms.ToTensor(), transforms.Normalize(CIFAR100_MEAN, CIFAR100_STD), ]) test = transforms.Compose([transforms.ToTensor(), transforms.Normalize(CIFAR100_MEAN, CIFAR100_STD)]) ``` Model: fine tuned version of EfficientNetV2 trained on ImageNet21k from ([Tan & Le, 2021](#bib-tan2021efficientnetv2smallermodelsfaster)) ## reasoning reference: [paper](https://arxiv.org/pdf/2104.00298) EfficientNetV2 includes a optimisations to make training a lot faster while keeping the model relatively lean. They were built on top of a limited search space and a fused conv layers called Fused-MBConv. 
![Fused-MBConv block](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/kaggle/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/fused-mbconv.webp) Fused-MBConv block I attempted to replicate the paper’s dropout and adaptive regularization but didn’t see a lot of benefits as mentioned from the paper itself. ![training metadata](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/kaggle/../../../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/a4/loss-acc-efficientnetv2.webp) training metadata Improvement: - Could have probably run on a longer epochs training durations - I tried `AdamW` but results in overfitting way too fast comparing to `SGD` ## code ```python # uv pip install pandas safetensors torch scipy tqdm torchvision torch torchinfo timm tensorboard import os, inspect from datetime import datetime from typing import Literal import torch import torch.nn as nn import torch.nn.functional as F import torch.optim as optim import torchvision import pandas as pd import numpy as np import matplotlib.pyplot as plt from tqdm import tqdm from PIL import Image from torchinfo import summary from torchvision import datasets, transforms, models from torch.utils.data import DataLoader, Dataset, random_split, Subset from safetensors.torch import save_model, load_model # Define CIFAR-100 mean and std CIFAR100_MEAN = (0.5071, 0.4867, 0.4408) CIFAR100_STD = (0.2675, 0.2565, 0.2761) # Hyperparameters num_epochs = 30 lr = 0.001 weight_decay = 1e-4 batch_size = 128 model_prefix = f'efficientnet_v2_{lr}_{num_epochs}' device = 'cuda' if torch.cuda.is_available() else 'cpu' ncols = 100 # CIFAR-100 dataset (download and create DataLoader) def get_dataloaders(batch_size): transform_train = transforms.Compose([ transforms.RandomCrop(32, padding=4), transforms.RandomHorizontalFlip(), transforms.ToTensor(), transforms.Normalize(CIFAR100_MEAN, CIFAR100_STD), ]) transform_test = transforms.Compose([transforms.ToTensor(), transforms.Normalize(CIFAR100_MEAN, CIFAR100_STD)]) train_dataset = torchvision.datasets.CIFAR100(root='./data', train=True, download=True, transform=transform_train) train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4) test_dataset = torchvision.datasets.CIFAR100(root='./data', train=False, download=True, transform=transform_test) test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=4) return train_loader, test_loader def _holistic_patch(model, num_features=100):model.classifier[1]=nn.Linear(model.classifier[1].in_features, num_features) # Load EfficientNetV2 model def init_model(variants: Literal['S', 'M', 'L'] = 'S', patch=_holistic_patch): if variants == 'S' : model = models.efficientnet_v2_s(weights=models.EfficientNet_V2_S_Weights.DEFAULT) elif variants == 'M': model = models.efficientnet_v2_m(weights=models.EfficientNet_V2_M_Weights.DEFAULT) elif variants == 'L': model = models.efficientnet_v2_l(weights=models.EfficientNet_V2_L_Weights.DEFAULT) patch(model) model.variants = variants model = model.to(device) return model # Load model if exists def load_checkpoint(filepath, model=None, variants='S'): if model is None: model = init_model(variants) load_model(model, filepath) model.eval() return model # Save model to safetensors def save_checkpoint(model, accuracy, model_prefix, basedir="./model"): timestamp = datetime.now().strftime('%Y%m%d_%H%M%S') os.makedirs(basedir, exist_ok=True) variants = "default" 
if hasattr(model, "variants"): variants = model.variants filepath = os.path.join(basedir, f'{model_prefix}_{variants}_{accuracy:.2f}_{timestamp}.safetensors') save_model(model, filepath) print(f'Model checkpoint saved to {filepath}.') # Train the model def train(model, train_loader, criterion, optimizer, scheduler, num_epochs, *, ncols=100): best_accuracy = 0.0 train_losses = [] train_accuracies = [] for epoch in range(num_epochs): model.train() running_loss = 0.0 correct = 0 total = 0 with tqdm(enumerate(train_loader), total=len(train_loader), ncols=ncols) as bar: for i, (images, labels) in bar: images, labels = images.to(device), labels.to(device) # Forward pass outputs = model(images) loss = criterion(outputs, labels) # Backward pass and optimization optimizer.zero_grad() loss.backward() optimizer.step() running_loss += loss.item() _, predicted = torch.max(outputs.data, 1) total += labels.size(0) correct += (predicted == labels).sum().item() bar.set_description(f'Epoch [{epoch + 1}/{num_epochs}]') bar.set_postfix(loss=loss.item()) scheduler.step() epoch_loss = running_loss / len(train_loader) epoch_acc = 100 * correct / total train_losses.append(epoch_loss) train_accuracies.append(epoch_acc) print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {epoch_loss:.4f}, Accuracy: {epoch_acc:.2f}%') # Evaluate the model on test set after each epoch test_acc = evaluate(model, test_loader) if test_acc > best_accuracy: best_accuracy = test_acc save_checkpoint(model, best_accuracy, model_prefix) # Plotting training history plot_training_history(train_losses, train_accuracies) # Evaluate the model def evaluate(model, test_loader): model.eval() correct = 0 total = 0 with torch.no_grad(): for images, labels in test_loader: images, labels = images.to(device), labels.to(device) outputs = model(images) _, predicted = torch.max(outputs.data, 1) total += labels.size(0) correct += (predicted == labels).sum().item() accuracy = 100 * correct / total print(f'Test Accuracy: {accuracy:.2f}%') return accuracy # Plot training history def plot_training_history(train_losses, train_accuracies): plt.figure(figsize=(12, 5)) # Plot training loss plt.subplot(1, 2, 1) plt.plot(train_losses, label='Training Loss') plt.xlabel('Epoch') plt.ylabel('Loss') plt.title('Training Loss over Epochs') plt.legend() # Plot training accuracy plt.subplot(1, 2, 2) plt.plot(train_accuracies, label='Training Accuracy') plt.xlabel('Epoch') plt.ylabel('Accuracy (%)') plt.title('Training Accuracy over Epochs') plt.legend() plt.show() def validations(model, test_loader, classes, num_examples=16): model.eval() SAMPLES, PREDS, LABELS = [], [], [] with torch.no_grad(): for _ in range(num_examples): idx = np.random.randint(len(test_loader.dataset)) sample_image, actual_label = test_loader.dataset[idx] sample_image = sample_image.unsqueeze(0).to(device) SAMPLES.append(sample_image.squeeze(0)) LABELS.append(actual_label) output = F.softmax(model(sample_image), dim=-1) pred_values, pred_labels = output.max(-1) PREDS.append(round(float(pred_values), 4)) LABELS.append(int(pred_labels)) fig, ax = plt.subplots(nrows=4, ncols=4, figsize=(21, 19)) i = 0 for R in range(4): for C in range(4): image_np = SAMPLES[i].cpu().numpy().transpose(1, 2, 0) image_np = (image_np * np.array((0.2675, 0.2565, 0.2761)) + np.array((0.5071, 0.4867, 0.4408))) # Unnormalize image_np = np.clip(image_np, 0, 1) ax[R, C].imshow(image_np) ax[R, C].set_title('Actual: ' + classes[LABELS[i]], fontsize=16).set_color('k') ax[R, C].set_ylabel(PREDS[i], fontsize=16, rotation=0, 
            if PRED_LABELS[i] == LABELS[i]:
                ax[R, C].set_xlabel('Predicted: ' + classes[PRED_LABELS[i]], fontsize=16).set_color('b')
            else:
                ax[R, C].set_xlabel('Predicted: ' + classes[PRED_LABELS[i]], fontsize=16).set_color('r')
            ax[R, C].set_xticks([])
            ax[R, C].set_yticks([])
            i += 1
    plt.show()


if __name__ == "__main__":
    # The original listing never constructed the loaders; build them here so the script runs end-to-end.
    train_loader, test_loader = get_dataloaders(batch_size)
    model = init_model(variants="L")
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=weight_decay)
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # Add label smoothing
    scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=5, T_mult=2, eta_min=1e-6)
    train(model, train_loader, criterion, optimizer, scheduler, num_epochs, ncols=ncols)
    evaluate(model, test_loader)
```

## Bibliographie

- Tan, M., & Le, Q. V. (2021). _EfficientNetV2: Smaller Models and Faster Training_. arXiv preprint arXiv:2104.00298 [\[arxiv\]](https://arxiv.org/abs/2104.00298)

--- slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/index tags: - university - sfwr4ml3 - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/index" title: "Introduction to Machine Learning" date: 2024-09-10 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/index.html.md --- See also [machine learning](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/index/../../../../../../../../thoughts/Machine-learning) and [introduction](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/index/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture0.pdf) For annotated slides check out [annotated folders](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/index/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec) Books: - “Pattern Recognition and Machine Learning” by Christopher M. Bishop - [Understanding Machine Learning](https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf) by Shai Shalev-Shwartz and Shai Ben-David. Generative-adversarial networks: [github](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) similar courses offered by [Cornell](https://www.cs.cornell.edu/courses/cs4780/2024sp/) --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/index tags: - sfwr4ml3 - folder description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/index" title: "annotated slides." date: 2024-11-01 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/index.html.md --- Slides for all lectures with annotations. --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/likelihood tags: - sfwr4ml3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/likelihood" title: "likelihood" date: 2024-10-07 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/likelihood.html.md --- ## maximum likelihood estimation $$ \begin{aligned} \alpha &= \argmax P(X | \alpha) \\ &= \argmin - \sum_{i} \log (P(x^i | \alpha)) \end{aligned} $$ $P(\alpha)$ captures the a priori distribution of $\alpha$. $P(\alpha | X)$ is the posterior distribution of $\alpha$ given $X$.
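To make the estimator concrete, here is a minimal sketch (my own illustration, not from the course notes): it minimises the negative log-likelihood $-\sum_{i} \log P(x^i \mid \alpha)$ over a grid for a Bernoulli model of coin flips, using the common convention $P(x=1)=\alpha$. The simulated data, the grid, and the `neg_log_likelihood` helper are all assumptions made for illustration.

```python
# Sketch: maximum likelihood for a Bernoulli parameter by minimising the
# negative log-likelihood, matching the argmin form above. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=200)  # simulated coin flips; true parameter is 0.3


def neg_log_likelihood(alpha: float, x: np.ndarray) -> float:
    # -sum_i log P(x^i | alpha) for a Bernoulli(alpha) model with P(x=1) = alpha
    eps = 1e-12  # guard against log(0) at the boundary of [0, 1]
    return float(-np.sum(x * np.log(alpha + eps) + (1 - x) * np.log(1 - alpha + eps)))


alphas = np.linspace(0.001, 0.999, 999)
nll = np.array([neg_log_likelihood(a, x) for a in alphas])
alpha_ml = alphas[np.argmin(nll)]
print(f"alpha_ML ~ {alpha_ml:.3f}, sample mean = {x.mean():.3f}")  # the two agree
```

The grid minimiser recovers the closed-form answer (the sample mean), which is the point of the $\argmax P(X \mid \alpha) = \argmin - \sum_{i} \log P(x^i \mid \alpha)$ identity.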
## maximum a posteriori estimation $$ \begin{aligned} \alpha^{\text{MAP}} &= \argmax P(\alpha | X) \\ &= \argmax_{\alpha} \frac{P(X|\alpha)P(\alpha)}{P(X)} \\ &= \argmin_{\alpha}(-\log P(\alpha)) - \sum_{i=1}^{n} \log P(x^i | \alpha) \end{aligned} $$ $$ \begin{aligned} \argmax_{W} P(x | \alpha) P (\alpha) &= \argmax_{W} [\log P(\alpha) + \sum_{i} \log (x^i, y^i | W)] \\ &= \argmax_{W} [\ln \frac{1}{\beta} - \lambda {\parallel W \parallel}_{2}^{2} - \frac{({x^i}^T W - y^i)^2}{\sigma^2}] \end{aligned} $$ $$ P(W) = \frac{1}{\beta} e^{\lambda \parallel W \parallel_{2}^{2}} $$ > [!question] What if we have > > $$ > P(W) = \frac{1}{\beta} e^{\frac{\lambda \parallel W \parallel_{2}^{2}}{r^2}} > $$ $$ \argmax_{W} P(Z | \alpha) = \argmax_{W} \sum \log P(x^i, y^i | W) $$ $$ P(y | x, W) = \frac{1}{\gamma} e^{-\frac{(x^T W-y)^2}{2 \sigma^2}} $$ ## expected error minimisation Squared loss: $l(\hat{y},y)=(y-\hat{y})^2$ solution to $y^* = \argmin_{\hat{y}} E_{X,Y}(Y-\hat{y}(X))^2$ is $E[Y | X=x]$ Instead we have $Z = \{(x^i, y^i)\}^n_{i=1}$ ### error decomposition $$ \begin{aligned} &E_{x,y}(y-\hat{y_Z}(x))^2 \\ &= E_{xy}(y-y^{*}(x))^2 + E_x(y^{*}(x) - \hat{y_Z}(x))^2 \\ &= \text{noise} + \text{estimation error} \end{aligned} $$ ### bias-variance decompositions For linear estimator: $$ \begin{aligned} E_Z&E_{x,y}(y-(\hat{y}_Z(x)\coloneqq W^T_Zx))^2 \\ =& E_{x,y}(y-y^{*}(x))^2 \quad \text{noise} \\ &+ E_x(y^{*}(x) - E_Z(\hat{y_Z}(x)))^2 \quad \text{bias} \\ &+ E_xE_Z(\hat{y_Z}(x) - E_Z(\hat{y_Z}(x)))^2 \quad \text{variance} \end{aligned} $$ --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm tags: - sfwr4ml3 - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm" title: "Supervised machine learning" date: 2024-10-28 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm.html.md --- See also: [book](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Understand-Machine-Learning.pdf) ## probability density function if $X$ is a random variable, the probability density function (pdf) is a function $f(x)$ such that: $$ P(a \leq X \leq b) = \int_{a}^{b} f(x) dx $$ if distribution of $X$ is uniform over $[a,b]$, then $f(x) = \frac{1}{b-a}$ - url: thoughts/.../Linear-regression - description: curve fitting ## curve fitting. > [!question] how do we fit a distribution of data over a curve? 
> > Given a set of $n$ data points $S=\set{(x^i, y^i)}^{n}_{n=1}$ - $x \in \mathbb{R}^{d}$ - $y \in \mathbb{R}$ (or $\mathbb{R}^{k}$) [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression#curve-fitting) - url: thoughts/.../Linear-regression - description: 1D OLS In the case of 1-D ordinary least square, the problems equates find $a,b \in \mathbb{R}$ to minimize $\min\limits_{a,b} \sum_{i=1}^{n} (ax^i + b - y^i)^2$ [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression#1dols) > [!question]+ minimize > > $$ > \begin{aligned} \frac{\partial f}{\partial a} &= 2 \sum^{n}_{i=1}{(ax^i + b - y^i)} x^{i} = 0 \\ \frac{\partial f}{\partial b} &= 2 \sum^{n}_{i=1}{(ax^i + b - y^i)} = 0 \\ \\ \implies 2nb + 2a \sum_{i=1}^{n} x^i &= 2 \sum_{i=1}^{n} y^i \\ \implies b + a \overline{x} &= \overline{y} \\ \implies b &= \overline{y} - a \overline{x} \\ \\ \because \overline{y} &= \frac{1}{n} \sum_{i=1}^{n} y^{i} \\ \overline{x} &= \frac{1}{n} \sum_{i=1}^{n} x^{i} \end{aligned} > $$ - url: thoughts/.../Linear-regression - description: optimal solution ### optimal solution $$ \begin{aligned} a &= \frac{\overline{xy} - \overline{x} \cdot \overline{y}}{\overline{x^2} - (\overline{x})^2} = \frac{\text{COV}(x,y)}{\text{Var}(x)} \\ b &= \overline{y} - a \overline{x} \end{aligned} $$ where $\overline{x} = \frac{1}{N} \sum{x^i}$, $\overline{y} = \frac{1}{N} \sum{y^i}$, $\overline{xy} = \frac{1}{N} \sum{x^i y^i}$, $\overline{x^2} = \frac{1}{N} \sum{(x^i)^2}$ [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression#optimal-solution) - url: thoughts/.../Linear-regression - description: hyperplane ### hyperplane > [!abstract] Hyperplane equation > > $$ > \hat{y} = w_{0} + \sum_{j=1}^{d}{w_j x_j} \\ \because w_0: \text{the y-intercept (bias)} > $$ Homogeneous hyperplane: $$ \begin{aligned} w_{0} & = 0 \\ \hat{y} &= \sum_{j=1}^{d}{w_j x_j} = \langle{w,x} \rangle \\ &= w^Tx \end{aligned} $$ Matrix form OLS: $$ X_{n\times d} = \begin{pmatrix} x_1^1 & \cdots & x_d^1 \\ \vdots & \ddots & \vdots \\ x_1^n & \cdots & x_d^n \end{pmatrix}, Y_{n\times 1} = \begin{pmatrix} y^1 \\ \vdots \\ y^n \end{pmatrix}, W_{d\times 1} = \begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix} $$ $$ \begin{aligned} \text{Obj} &: \sum_{i=1}^n (\hat{y}^i - y^i)^2 = \sum_{i=1}^n (\langle w, x^i \rangle - y^i)^2 \\ &\\\ \text{Def} &: \Delta = \begin{pmatrix} \Delta_1 \\ \vdots \\ \Delta_n \end{pmatrix} = \begin{pmatrix} x_1^1 & \cdots & x_d^1 \\ \vdots & \ddots & \vdots \\ x_1^n & \cdots & x_d^n \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix} - \begin{pmatrix} y^1 \\ \vdots \\ y^n \end{pmatrix} = \begin{pmatrix} \hat{y}^1 - y^1 \\ \vdots \\ \hat{y}^n - y^n \end{pmatrix} \end{aligned} $$ > [!question] minimize > > $$ > \min\limits_{W \in \mathbb{R}^{d \times 1}} \|XW - Y\|_2^2 > $$ > [!abstract] OLS solution > > $$ > W^{\text{LS}} = (X^T X)^{-1}{X^T Y} > $$ Example: $$ \hat{y} = w_{0} + w_{1} \cdot x_{1} + w_{2} \cdot x_{2} $$ With $$ X_{n \times 2} = \begin{pmatrix} x^{1}_{1} & x^{1}_{2} \\ x^{2}_{1} & x^{2}_{2} \\ x^{3}_{1} & x^{3}_{2} \end{pmatrix} $$ and $$ X^{'}_{n \times 3} = \begin{pmatrix} x^{1}_{1} & 
x^{1}_{2} & 1 \\ x^{2}_{1} & x^{2}_{2} & 1 \\ x^{3}_{1} & x^{3}_{2} & 1 \end{pmatrix} $$ With $$ W = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} $$ and $$ W^{'} = \begin{pmatrix} w_1 \\ w_2 \\ w_0 \end{pmatrix} $$ thus $$ X^{'} \times W = \begin{pmatrix} w_0 + \sum{w_i \times x_i^{1}} \\ \vdots \\ w_0 + \sum{w_i \times x_i^{n}} \end{pmatrix} $$ See also [Bias and intercept](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression/../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept) [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Linear-regression#hyperplane) - url: thoughts/.../Bias-and-intercept - description: adding-bias-in-d-dimensions-ols ## adding bias in D-dimensions OLS $$ X^{'}_{n \times (d+1)} = \begin{pmatrix} x_1^{1} & \cdots & x_1^{d} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ x_n^{1} & \cdots & x_n^{d} & 1 \end{pmatrix} $$ and $$ W_{(d+1) \times 1} = \begin{pmatrix} w_1 \\ \vdots \\ w_d \\ w_0 \end{pmatrix} $$ Add an new auxiliary dimension to the input data, $x_{d+1} = 1$ Solve OLS: $$ \min\limits{W \in \mathbb{R}^{d \times 1}} \|XW - Y\|_2^2 $$ Gradient for $f: \mathbb{R}^d \rightarrow \mathbb{R}$ $$ \triangledown_{w} \space f = \begin{bmatrix} \frac{\partial f}{\partial w_1} \\ \vdots \\ \frac{\partial f}{\partial w_d} \\ \end{bmatrix} $$ [Jacobian](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept/../../../../../thoughts/Vector-calculus#jacobian-matrix) for $g: \mathbb{R}^m \rightarrow \mathbb{R}^n$ $$ \begin{aligned} \triangledown_{w} \space g &= \begin{bmatrix} \frac{\partial g_1}{\partial w_1} & \cdots & \frac{\partial g_1}{\partial w_d} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_n}{\partial w_1} & \cdots & \frac{\partial g_n}{\partial w_d} \end{bmatrix}_{n \times m} \\ \\ &u, t \in \mathbb{R}^d \\ &\because g(u) = u^T v \implies \triangledown_{w} \space g = v \text{ (gradient) } \\ \\ &A \in \mathbb{R}^{n \times n}; u \in \mathbb{R}^n \\ &\because g(u) = u^T A u \implies \triangledown_{w} \space g = (A + A^T) u^T \text{ (Jacobian) } \end{aligned} $$ > [!tip] result > > $$ > W^{\text{LS}} = (X^T X)^{-1} X^T Y > $$ [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept#adding-bias-in-d-dimensions-ols) - url: thoughts/.../Bias-and-intercept - description: overfitting ## overfitting. strategies to avoid: - add more training data - L1 (Lasso) or L2 (Ridge) regularization - add a penalty term to the objective function - L1 makes sparse models, since it forces some parameters to be zero (robust to outliers). Since having the absolute value to the weights, forcing some model coefficients to become exactly 0. $$ \text{Loss}(w) = \text{Error} + \lambda \times \| w \| $$ - L2 is better for feature interpretability, for higher non-linear. 
Since it doesn’t perform feature selection, since weights are only reduced near 0 instead of exactly 0 like L1 $$ \text{Loss}(w) = \text{Error} + \lambda \times w^2 $$ - Cross-validation - split data into k-fold - early stopping - dropout, see [example](https://keras.io/api/layers/regularization_layers/dropout/) - randomly selected neurons are ignored ⇒ makes network less sensitive **sample complexity** of learning multivariate polynomials [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept#overfitting) - url: thoughts/.../Bias-and-intercept - description: regularization ## regularization. L2 regularization: $$ \text{min}_{W \in \mathbb{R}^{d}} \| XW - Y \|^{2}_{2} + \lambda \| W \|_{2}^{2} $$ > [!tip] Solving > > Solve that > > $$ > W^{\text{RLS}} = (X^T X + \lambda I)^{-1} X^T Y > $$ > > Inverse exists as long as $\lambda > 0$ [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept#regularization) - url: thoughts/.../Bias-and-intercept - description: polynomial-curve-fitting-revisited ## polynomial curve-fitting revisited feature map: $\phi{(x)}: R^{d_1} \rightarrow R^{d_2}$ where $d_{2} >> d_{1}$ training: - $W^{*} = \min\limits{W} \| \phi W - Y \|^{2}_{2} + \lambda \| W \|_{2}^{2}$ - $W^{*} = (\phi^T \phi + \lambda I)^{-1} \phi^T Y$ prediction: - $\hat{y} = \langle{W^{*}, \phi{(x)}} \rangle = {W^{*}}^T \phi(x)$ > [!abstract] choices of > > - Gaussian basis functions: $\phi(x) = \exp{(-\frac{\| x - \mu \|^{2}}{2\sigma^{2}})}$ > - Polynomial basis functions: $\phi(x) = \{1, x, x^{2}, \ldots, x^{d}\}$ > - Fourier basis functions: DFT, FFT [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept#polynomial-curve-fitting-revisited) - url: thoughts/.../Bias-and-intercept - description: kernels ## kernels compute higher dimension inner products $$ K(x^i, x^j) = \langle \phi(x^i), \phi(x^j) \rangle $$ Polynomial kernels of degree 2: $$ k(x^i, x^j) = (1 + (x^i)^T x^j)^2 = (1 + \langle{x^i, x^j} \rangle)^2 \\ \\ \because O(d) \text{ operations} $$ > [!abstract] degree M polynomial > > $$ > k(x^i, x^j) = (1 + (x^i)^T x^j)^M > $$ How many operations? - improved: $d + \log M$ ops [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Bias-and-intercept#kernels) ## kernel least squares Steps: - $W^{*} = \min\limits_{W} \|\phi W - Y\|_2^2 + \lambda \| W \|_2^2$ - shows that $\exists \space a \in \mathbb{R}^n \mid W^{*} = \phi^T a$, or $W^{*} = \sum a_i \phi(x^i)$ > [!note]- proof > > $$ > \begin{aligned} 0 &= \frac{\partial}{\partial W} (\|\phi W - Y\|_2^2 + \lambda \| W \|_2^2) \\ &= 2 W^T (\phi^T \phi) - 2 Y^T \phi + 2 \lambda W^T \\ &\implies \lambda W = \phi^T Y - \phi^T \phi W \\ &\implies \lambda W = \phi^T \frac{(Y - \phi W)}{\lambda} \\ \end{aligned} > $$ - Uses $W^{*} = \sum a_i \phi(x^i)$ to form the dual representation of the problem. 
$$ \min\limits_{\overrightarrow{a} \in \mathbb{R}^n} \| Ka - Y \|_2^2 + \lambda a^T K a \\ \because \hat{Y} = \phi \phi^T a = K_{n \times n} \dots a_{n \times 1} $$ Solution: $$ a^{*} = (K + \lambda I)^{-1} Y $$ ### choices - polynomial kernel: $K(x, z) = (1 + x^T z)^d$ - Gaussian kernel: $K(x, z) = e^{-\frac{\|x-z\|_2^2}{2\sigma^2}} = e^{-\alpha \|x-z\|^2_2}$ ## mapping high-dimensional data - url: thoughts/.../principal-component-analysis - description: minimising reconstruction error ## minimising reconstruction error - Given $X \in \mathbb{R}^{d \times n}$, find $A$ that minimises the reconstruction error: $$ \min\limits_{A,B} \sum_{i} \| x^i - B A x^i \|_2^2 $$ > if $q=d$, then error is zero. Solution: - $B = A^T$ - $\min\limits_{A} \sum_i \| x^i - A^T A x^i \|^2$ is subjected to $A A^T = I_{q \times q}$ - assuming data is centered, or $\frac{1}{n} \sum\_{i} x^i = \begin{bmatrix} 0 & \cdots & 0 \end{bmatrix}^T $ [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/principal-component-analysis#minimising-reconstruction-error) - url: thoughts/.../principal-component-analysis - description: eigenvalue decomposition ## eigenvalue decomposition $$ \begin{aligned} X^T X \mathcal{u} &= \lambda \mathcal{u} \\ X^T X &= U^T \Lambda U \\ \\ \\ \because \Lambda &= \text{diag}(\lambda_1, \lambda_2, \cdots, \lambda_d) \\ &= \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_q \end{bmatrix} \end{aligned} $$ [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/principal-component-analysis#eigenvalue-decomposition) - url: thoughts/.../principal-component-analysis - description: pca ## pca Idea: given input $x^1, \cdots, x^n \in \mathbb{R}^d$, $\mu = \frac{1}{n} \sum_{i} x^i$ Thus $$ C = \sum (x^i - \mu)(x^i - \mu)^T $$ Find the eigenvectors/values of $C$: $$ C = U^T \Lambda U $$ Optimal $A$ is: $$ A = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_q^T \end{bmatrix} $$ [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/principal-component-analysis#pca) ## bayes rules and chain rules Joint distribution: $P(X,Y)$ Conditional distribution of $X$ given $Y$: $P(X|Y) = \frac{P(X,Y)}{P(Y)}$ Bayes rule: $P(X|Y) = \frac{P(Y|X)P(X)}{P(Y)}$ Chain rule: $P(X_1, X_2, \ldots , X_k) = P(X_1)P(X_2|X_1)P(X_3|X_2,X_1)\ldots P(X_k|X_1,X_2,\ldots,X_{k-1})$ > [!note] i.i.d assumption > > assume underlying distribution $D$, that train and test sets are independent and identically distributed (i.i.d) Example: flip a coin Outcome $H=0$ or $T=1$ with $P(H) = p$ and $P(T) = 1-p$, or $x \in \{0,1\}$, $x$ is the Bernoulli random variable. 
$P(x=0)=\alpha$ and $P(x=1)=1-\alpha$ Would be [maximum likelihood estimate](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/likelihood) $$ \alpha^{\text{ML}} = \argmax P(X | \alpha) = \argmin_{\alpha} - \sum_{i} \log (P(x^i | \alpha)) $$ - url: thoughts/.../likelihood - description: maximum a posteriori estimation ## maximum a posteriori estimation $$ \begin{aligned} \alpha^{\text{MAP}} &= \argmax P(\alpha | X) \\ &= \argmax_{\alpha} \frac{P(X|\alpha)P(\alpha)}{P(X)} \\ &= \argmin_{\alpha}(-\log P(\alpha)) - \sum_{i=1}^{n} \log P(x^i | \alpha) \end{aligned} $$ $$ \begin{aligned} \argmax_{W} P(x | \alpha) P (\alpha) &= \argmax_{W} [\log P(\alpha) + \sum_{i} \log (x^i, y^i | W)] \\ &= \argmax_{W} [\ln \frac{1}{\beta} - \lambda {\parallel W \parallel}_{2}^{2} - \frac{({x^i}^T W - y^i)^2}{\sigma^2}] \end{aligned} $$ $$ P(W) = \frac{1}{\beta} e^{\lambda \parallel W \parallel_{2}^{2}} $$ > [!question] What if we have > > $$ > P(W) = \frac{1}{\beta} e^{\frac{\lambda \parallel W \parallel_{2}^{2}}{r^2}} > $$ $$ \argmax_{W} P(Z | \alpha) = \argmax_{W} \sum \log P(x^i, y^i | W) $$ $$ P(y | x, W) = \frac{1}{\gamma} e^{-\frac{(x^T W-y)^2}{2 \sigma^2}} $$ [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/likelihood#maximum-a-posteriori-estimation) - url: thoughts/.../likelihood - description: expected error minimisation ## expected error minimisation Squared loss: $l(\hat{y},y)=(y-\hat{y})^2$ solution to $y^* = \argmin_{\hat{y}} E_{X,Y}(Y-\hat{y}(X))^2$ is $E[Y | X=x]$ Instead we have $Z = \{(x^i, y^i)\}^n_{i=1}$ ### error decomposition $$ \begin{aligned} &E_{x,y}(y-\hat{y_Z}(x))^2 \\ &= E_{xy}(y-y^{*}(x))^2 + E_x(y^{*}(x) - \hat{y_Z}(x))^2 \\ &= \text{noise} + \text{estimation error} \end{aligned} $$ ### bias-variance decompositions For linear estimator: $$ \begin{aligned} E_Z&E_{x,y}(y-(\hat{y}_Z(x)\coloneqq W^T_Zx))^2 \\ =& E_{x,y}(y-y^{*}(x))^2 \quad \text{noise} \\ &+ E_x(y^{*}(x) - E_Z(\hat{y_Z}(x)))^2 \quad \text{bias} \\ &+ E_xE_Z(\hat{y_Z}(x) - E_Z(\hat{y_Z}(x)))^2 \quad \text{variance} \end{aligned} $$ [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/likelihood#expected-error-minimisation) - url: thoughts/.../nearest-neighbour # nearest neighbour See also: [slides 13](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture13.pdf), [slides 14](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture14.pdf), [slides 15](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture15.pdf) - url: thoughts/.../likelihood - description: expected error minimisation ## expected error minimisation Squared loss: 
$l(\hat{y},y)=(y-\hat{y})^2$ solution to $y^* = \argmin_{\hat{y}} E_{X,Y}(Y-\hat{y}(X))^2$ is $E[Y | X=x]$ Instead we have $Z = \{(x^i, y^i)\}^n_{i=1}$ ### error decomposition $$ \begin{aligned} &E_{x,y}(y-\hat{y_Z}(x))^2 \\ &= E_{xy}(y-y^{*}(x))^2 + E_x(y^{*}(x) - \hat{y_Z}(x))^2 \\ &= \text{noise} + \text{estimation error} \end{aligned} $$ ### bias-variance decompositions For linear estimator: $$ \begin{aligned} E_Z&E_{x,y}(y-(\hat{y}_Z(x)\coloneqq W^T_Zx))^2 \\ =& E_{x,y}(y-y^{*}(x))^2 \quad \text{noise} \\ &+ E_x(y^{*}(x) - E_Z(\hat{y_Z}(x)))^2 \quad \text{bias} \\ &+ E_xE_Z(\hat{y_Z}(x) - E_Z(\hat{y_Z}(x)))^2 \quad \text{variance} \end{aligned} $$ [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/likelihood#expected-error-minimisation) ## accuracy zero-one loss: $$ l^{0-1}(y, \hat{y}) = 1_{y \neq \hat{y}}= \begin{cases} 1 & y \neq \hat{y} \\\ 0 & y = \hat{y} \end{cases} $$ ## linear classifier $$ \begin{aligned} \hat{y}_W(x) &= \text{sign}(W^T x) = 1_{W^T x \geq 0} \\[8pt] &\because \hat{W} = \argmin_{W} L_{Z}^{0-1} (\hat{y}_W) \end{aligned} $$ ## surrogate loss functions _assume_ classifier returns a discrete value $\hat{y}_W = \text{sign}(W^T x) \in \{0,1\}$ > [!question] What if classifier's output is continuous? > > $\hat{y}$ will also capture the “confidence” of the classifier. Think of contiguous loss function: margin loss, cross-entropy/negative log-likelihood, etc. ## linearly separable data > [!math] linearly separable > > A binary classification data set $Z=\{(x^i, y^i)\}_{i=1}^{n}$ is linearly separable if there exists a $W^{*}$ such that: > > - $\forall i \in [n] \mid \text{SGN}() = y^i$ > - Or, for every $i \in [n]$ we have $(W^{* T}x^i)y^i > 0$ ## linear programming $$ \begin{aligned} \max_{W \in \mathbb{R}^d} &\langle{u, w} \rangle = \sum_{i=1}^{d} u_i w_i \\ &\text{s.t } A w \ge v \end{aligned} $$ Given that data is _linearly separable_ $$ \begin{aligned} \exists \space W^{*} &\mid \forall i \in [n], ({W^{*}}^T x^i)y^i > 0 \\ \exists \space W^{*}, \gamma > 0 &\mid \forall i \in [n], ({W^{*}}^T x^i)y^i \ge \gamma \\ \exists \space W^{*} &\mid \forall i \in [n], ({W^{*}}^T x^i)y^i \ge 1 \end{aligned} $$ ## LP for linear classification - Define $A = [x_j^iy^i]_{n \times d}$ - find optimal $W$ equivalent to $$ \begin{aligned} \max_{w \in \mathbb{R}^d} &\langle{\vec{0}, w} \rangle \\ & \text{s.t. } Aw \ge \vec{1} \end{aligned} $$ ## perceptron Rosenblatt’s perceptron algorithm ```pseudo \begin{algorithm} \caption{Batch Perceptron} \begin{algorithmic} \REQUIRE Training set $(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)$ \STATE Initialize $\mathbf{w}^{(1)} = (0,\ldots,0)$ \FOR{$t = 1,2,\ldots$} \IF{$(\exists \space i \text{ s.t. 
} y_i\langle\mathbf{w}^{(t)}, \mathbf{x}_i\rangle \leq 0)$} \STATE $\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} + y_i\mathbf{x}_i$ \ELSE \STATE \textbf{output} $\mathbf{w}^{(t)}$ \ENDIF \ENDFOR \end{algorithmic} \end{algorithm} ``` ### greedy update $$ \begin{aligned} W_{\text{new}}^T x^i y^i &= \langle W_{\text{old}}+ y^i x^i, x^i \rangle y^i \\ &=W_{\text{old}}^T x^{i} y^{i} + \|x^i\|_2^2 y^{i} y^{i} \end{aligned} $$ ### proof See also ([Novikoff, 1962](#bib-novikoff1962convergence)) > [!math] Theorem > > Assume there exists some parameter vector $\underline{\theta}^{*}$ such that $\|\underline{\theta}^{*}\| = 1$ and $\exists \space \upgamma > 0 \text{ s.t }$ > > $$ > y_t(\underline{x_t} \cdot \underline{\theta^{*}}) \ge \upgamma > $$ > > _Assumption_: $\forall \space t= 1 \ldots n, \|\underline{x_t}\| \le R$ > > Then _perceptron makes at most $\frac{R^2}{\upgamma^2}$ errors_ _proof by induction_ > [!abstract] definition of > > to be parameter vector where algorithm makes $k^{\text{th}}$ error. _Note_ that we have $\underline{\theta^{1}}=\underline{0}$ Assume that $k^{\text{th}}$ error is made on example $t$, or $$ \begin{align} \underline{\theta^{k+1}} \cdot \underline{\theta^{*}} &= (\underline{\theta^k} + y_t \underline{x_t}) \cdot \underline{\theta^{*}} \\ &= \underline{\theta^k} \cdot \underline{\theta^{*}} + y_t \underline{x^t} \cdot \underline{\theta^{*}} \\ &\ge \underline{\theta^k} \cdot \underline{\theta^{*}} + \upgamma \\[12pt] &\because \text{ Assumption: } y_t \underline{x_t} \cdot \underline{\theta^{*}} \ge \upgamma \end{align} $$ Follows up by induction on $k$ that $$ \underline{\theta^{k+1}} \cdot \underline{\theta^{*}} \ge k \upgamma $$ Using [Cauchy-Schwarz](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../thoughts/Cauchy-Schwarz) we have $\|\underline{\theta^{k+1}}\| \times \|\underline{\theta^{*}}\| \ge \underline{\theta^{k+1}} \cdot \underline{\theta^{*}}$ $$ \begin{align} \|\underline{\theta^{k+1}}\| &\ge k \upgamma \\[16pt] &\because \|\underline{\theta^{*}}\| = 1 \end{align} $$ In the second part, we will find upper bound for (5): $$ \begin{align} \|\underline{\theta^{k+1}}\|^2 &= \|\underline{\theta^k} + y_t \underline{x_t}\|^2 \\ &= \|\underline{\theta^k}\|^2 + y_t^2 \|\underline{x_t}\|^2 + 2 y_t \underline{x_t} \cdot \underline{\theta^k} \\ &\le \|\underline{\theta^k}\|^2 + R^2 \end{align} $$ (9) is due to: - $y_t^2 \|\underline{x_t}^2\|^2 = \|\underline{x_t}^2\| \le R^2$ by assumption of theorem - $y_t \underline{x_t} \cdot \underline{\theta^k} \le 0$ given parameter vector $\underline{\theta^k}$ gave error at $t^{\text{th}}$ example. 
Follows with induction on $k$ that $$ \begin{align} \|\underline{\theta^{k+1}}\|^2 \le kR^2 \end{align} $$ from (5) and (10) gives us $$ \begin{aligned} k^2 \upgamma^2 &\le \|\underline{\theta^{k+1}}\|^2 \le kR^2 \\ k &\le \frac{R^2}{\upgamma^2} \end{aligned} $$ --- - url: thoughts/.../Support-Vector-Machine - description: SVM # Support Vector Machine idea: maximizes margin and more robust to “perturbations” Euclidean distance between two points $x$ and the hyperplane parametrized by $W$ is: $$ \frac{\mid W^T x + b \mid }{\|W\|_2} $$ > Assuming $\| W \|_2=1$ then the distance is $\mid W^T x + b \mid$ ## maximum margin hyperplane $W$ has $\gamma$ margin if $$ \begin{aligned} W^T x + b \ge \gamma \space &\forall \text{ blue x} \\ W^T x +b \le - \gamma \space &\forall \text{ red x} \end{aligned} $$ Margin: $$ Z = \{(x^{i}, y^{i})\}_{i=1}^{n}, y \in \{-1, 1\}, \|W\|_2 = 1 $$ ## hard-margin SVM ```pseudo \begin{algorithm} \caption{Hard-SVM} \begin{algorithmic} \REQUIRE Training set $(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)$ \STATE \textbf{solve:} $(w_{0},b_{0}) = \argmin\limits_{(w,b)} \|w\|^2 \text{ s.t } \forall i, y_{i}(\langle{w,x_i} \rangle + b) \ge 1$ \STATE \textbf{output:} $\hat{w} = \frac{w_0}{\|w_0\|}, \hat{b} = \frac{b_0}{\|w_0\|}$ \end{algorithmic} \end{algorithm} ``` note that this version is sensitive to outliers ## soft-margin SVM ```pseudo \begin{algorithm} \caption{Soft-SVM} \begin{algorithmic} \REQUIRE Input $(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)$ \STATE \textbf{parameter:} $\lambda > 0$ \STATE \textbf{solve:} $\min_{\mathbf{w}, b, \boldsymbol{\xi}} \left( \lambda \|\mathbf{w}\|^2 + \frac{1}{m} \sum_{i=1}^m \xi_i \right)$ \STATE \textbf{s.t: } $\forall i, \quad y_i (\langle \mathbf{w}, \mathbf{x}_i \rangle + b) \geq 1 - \xi_i \quad \text{and} \quad \xi_i \geq 0$ \STATE \textbf{output:} $\mathbf{w}, b$ \end{algorithmic} \end{algorithm} ``` [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Support-Vector-Machine) ## Bibliographie - Novikoff, A. B. J. (1962). On Convergence Proofs for Perceptrons. _Proceedings of the Symposium on the Mathematical Theory of Automata_, _12_, 615–622. [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour) ## linear algebra review. Diagonal matrix: every entry except the diagonal is zero. $$ A = \begin{bmatrix} a_{1} & 0 & \cdots & 0 \\ 0 & a_{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{n} \end{bmatrix} $$ trace: sum of the entries in main diagonal: $\text{tr}(A) = \sum_{i=1}^{n} a_{ii}$ Properties of transpose: $$ \begin{aligned} (A^T)^T &= A \\ (A + B)^T &= A^T + B^T \\ (AB)^T &= B^T A^T \end{aligned} $$ Properties of inverse: $$ \begin{aligned} (A^{-1})^{-1} &= A \\ (AB)^{-1} &= B^{-1} A^{-1} \\ (A^T)^{-1} &= (A^{-1})^T \end{aligned} $$ > [!tip] Inverse of a matrix > > if a matrix $A^{-1}$ exists, mean A is _invertible_ (non-singular), and vice versa. 
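As a quick numerical sanity check of the identities above (my own snippet, not part of the original notes; the shapes and random matrices are arbitrary choices), NumPy confirms the transpose, inverse, and trace properties; a random Gaussian matrix is invertible with probability 1, so the inverses below exist almost surely.

```python
# Numerical check of the transpose/inverse identities and the trace definition above.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))  # random Gaussian matrices are invertible w.p. 1
B = rng.standard_normal((3, 3))

assert np.allclose((A + B).T, A.T + B.T)                    # (A + B)^T = A^T + B^T
assert np.allclose((A @ B).T, B.T @ A.T)                    # (AB)^T = B^T A^T
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))     # (AB)^{-1} = B^{-1} A^{-1}
assert np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T)  # (A^T)^{-1} = (A^{-1})^T
assert np.isclose(np.trace(A), np.sum(np.diag(A)))          # tr(A) = sum of diagonal entries
print("all identities hold numerically")
```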
### quadratic form > Given a square matrix $A \in \mathbb{R}^{n \times n}$, the quadratic form is defined as: $x^TAx \in \mathbb{R}$ $$ x^TAx = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j $$ ### norms A function $f : \mathbb{R}^n \Rightarrow \mathbb{R}$ is a norm if it satisfies the following properties: - non-negativity: $\forall x \in \mathbb{R}^n, f(x) > 0$ - definiteness: $f(x) = 0 \iff x=0$ - Homogeneity: $\forall x \in \mathbb{R}^n, t\in \mathbb{R}, f(tx) \leq \mid t\mid f(x)$ - triangle inequality: $\forall x, y \in \mathbb{R}^n, f(x+y) \leq f(x) + f(y)$ ### symmetry > A square matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $A = A^T \mid A \in \mathbb{S}^n$ > > Anti-semi-symmetric if $A = -A^T \mid A$ Given any square matrix $A \in \mathbb{R}^{n \times n}$, the matrix $A + A^T$ is symmetric, and $A - A^T$ is anti-symmetric. > $A = \frac{1}{2}(A+A^T) + \frac{1}{2}(A-A^T)$ > [!tip] positive definite > > $A$ is positive definite if $x^TAx > 0 \forall x \in \mathbb{R}^n$. > > - It is denoted by $A \succ 0$. > - The set of all positive definite matrices is denoted by $\mathbb{S}^n_{++}$ > [!tip] positive semi-definite > > $A$ is positive semi-definite if $x^TAx \geq 0 \forall x \in \mathbb{R}^n$. > > - It is denoted by $A \succeq 0$. > - The set of all positive semi-definite matrices is denoted by $\mathbb{S}^n_{+}$ > [!tip] negative definite > > $A$ is negative definite if $x^TAx < 0 \forall x \in \mathbb{R}^n$. > > - It is denoted by $A \prec 0$. > - The set of all negative definite matrices is denoted by $\mathbb{S}^n_{--}$ > [!tip] negative semi-definite > > $A$ is negative semi-definite if $x^TAx \leq 0 \forall x \in \mathbb{R}^n$. > > - It is denoted by $A \preceq 0$. > - The set of all negative semi-definite matrices is denoted by $\mathbb{S}^n_{-}$ A symmetric matrix $A \in \mathbb{S}^n$ is _indefinite_ if it is neither positive semi-definite or negative semi-definite. $$ \exists x_1, x_2 \in \mathbb{R}^n \space \mid \space x_1^TAx_1 > 0 \space and \space x_2^TAx_2 < 0 $$ > Given **any** matrix $A \in \mathbb{R}^{m \times n}$, the matrix $G = A^TA$ is always positive semi-definite (known as the Gram matrix) > > Proof: $x^TGx = x^TA^TAx = (Ax)^T(Ax) = \|Ax\|_2^2 \geq 0$ ### eigenvalues and eigenvectors The non-zero vector $x \in \mathbb{C}^n$ is an eigenvector of A and $\lambda \in \mathbb{C}$ is called the eigenvalue of A if: $$ Ax = \lambda x $$ > [!note] finding eigenvalues > > $$ > \begin{aligned} \exists \text{ non-zero eigenvector } x \in \mathbb{C} & \iff \text{ null space of } (A - \lambda I) \text{ is non-empty} \\ \implies \mid A - \lambda I \mid \text{ is singular } \\ \mid A - \lambda I \mid &= 0 \end{aligned} > $$ > > Solving eigenvectors via $(A-\lambda_{i}I)x_i=0$ See also [matrix cookbook](https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf) ## matrix representation of a system of linear equations $$ \begin{aligned} x_1 + x_2 + x_3 &= 5 \\ x_1 - 2x_2 - 3x_3 &= -1 \\ 2x_1 + x_2 - x_3 &= 3 \end{aligned} $$ Equivalent matrix representation of $Ax = b$ $$ \begin{aligned} A &= \begin{bmatrix} 1 & 1 & 1 \\ 1 & -2 & -3 \\ 2 & 1 & -1 \end{bmatrix} \\ x &= \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \\ b &= \begin{bmatrix} 5 \\ -1 \\ 3 \end{bmatrix} \end{aligned} \because A \in R^{m \times n}, x \in R^n, b \in R^m $$ > [!tip] Transpose of a matrix > > $A \in R^{m \times n}$ and $A^T \in R^{n \times m}$ ## dot product. 
$$ \begin{aligned} \langle x, y \rangle &= \sum_{i=1}^{n} x_i y_i \\ &= \sum_{i=1}^{n} x_i \cdot y_i \end{aligned} $$ ## linear combination of columns Let $A \in R^{m \times n}$, $X \in R^n$, $Ax \in R^n$ Then $Ax = \sum_{i=1}^{n}{\langle a_i \rangle} x_i \in R^n$ ## inverse of a matrix The inverse of a square matrix $A \in R^{n \times n}$ is a **unique** matrix denoted by $A^{-1} \in \mathbb{R}^{n\times{n}}$ $$ A^{-1} A = I = A A^{-1} $$ ## euclidean norm $L_{2}$ norm: $$ \| x \|_{2} = \sqrt{\sum_{i=1}^{n}{x_i^2}} = X^TX $$ L1 norm: $\| x \|_{1} = \sum_{i=1}^{n}{|x_i|}$ $L_{\infty}$ norm: $\| x \|_{\infty} = \max_{i}{|x_i|}$ p-norm: $\| x \|_{p} = (\sum_{i=1}^{n}{|x_i|^p})^{1/p}$ > [!tip] Comparison > > $ \|x\|_{\infty} \leq \|x\|_{2} \leq \|x\|\_{1}$ > One can prove this with Cauchy-Schwarz inequality ## linear dependence of vectors Given $\{x_1, x_2, \ldots, x_n\} \subseteq \mathbb{R}^d$ and $\alpha_1, \alpha_2, \ldots, \alpha_n \in \mathbb{R}$ $$ \forall i \in [ n ], \forall \{a_1, a_2, \ldots, a_n\} \subseteq \mathbb{R}^d \space s.t. \space x_i \neq \sum_{j=1}^{n}{a_j x_j} $$ ## Span > Given a set of vectors $\{x_1, x_2, \ldots, x_n\} \subseteq \mathbb{R}^d$, the span of the set is the set of all possible linear combinations of the vectors. > > $$ > \text{span}(\{x_1, x_2, \ldots, x_n\}) = \{ y: y = \sum_{i=1}^{n}{\alpha_i x_i} \mid \alpha_i \in \mathbb{R} \} > $$ If $x_{1}, x_{2}, \ldots, x_{n}$ are linearly independent, then the span of the set is the entire space $\mathbb{R}^d$ ## Rank For a matrix $A \in \mathbb{R}^{m \times n}$: - column rank: max number of linearly independent columns of $A$ - row rank: max number of linearly independent rows of $A$ If $\text{rank}(A) \leq m$, then the rows are linearly independent. If $\text{rank}(A) \leq n$, then the columns are linearly independent. 
> rank of a matrix $A$ is the number of linearly independent columns of $A$: > > - if $A$ is full rank, then $\text{rank}(A) = \min(m, n)$ ($\text{rank}(A) \leq \min(m, n)$) > - $\text{rank}(A) = \text{rank}(A^T)$ ## solving linear system of equations If $A \in \mathbb{R}^{n}$ is invertible, there exists a solution: $$ x = A^{-1}b $$ ## Range and Projection Given a matrix $A \in \mathbb{R}^{m \times n}$, the range of $A$, denoted by $\mathcal{R}(A)$ is the span of columns of $A$: $$ \mathcal{R}(A) = \{ y \in \mathbb{R}^m \mid y = Ax \mid x \in \mathbb{R}^m \} $$ Projection of a vector $y \in \mathbb{R}^m$ onto $\text{span}(\{x_1, \cdots, x_n\})$, $x_i \in \mathbb{R}^m$ is a vector in the span that is as close as possible to $y$ wrt $l_2$ norm $$ \text{Proj}(y; \{x_{1}, \cdots, x_n\}) = \argmin_{{v \in \text{span}(\{x_1, \cdots, x_n\})}} \| y - v \|_2 $$ ## Null space of $A$ is the set of all vectors that satisfies the following: $$ \mathcal{N}(A) = \{ x \in \mathbb{R}^n \mid Ax = 0 \} $$ [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/midterm/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/tut/tut1) ## probability theory With Bayes rules we have $$ P(Y|X) = \frac{P(X|Y)P(Y)}{P(X)} $$ Chain rule states for event $A_1, \ldots A_n$: $$ \begin{aligned} P(A_1 \cap A_2 \cap \ldots \cap A_n) &= P(A_n|A_{n-1} \cap \ldots \cap A_1)P(A_{n-1} \cap \ldots \cap A_1) \\ &= P(A_1) \prod_{i=2}^{n} P(A_i|\cap_{j=1}^{i-1} A_j) \end{aligned} $$ > [!tip] Law of Total Probability > > If $B_{1}, \ldots , B_{n}$ are finite partition of the same space, or $\forall i \neq j, B_i \cap B_j = \emptyset \land \cup_{i=1}^{n} B_i = \Omega$, then _law of total probability_ state that for an event A > > $$ > P(A) = \sum_{i=1}^{n} P(A|B_i)P(B_i) > $$ ### cumulative distribution function For a random variable X, a CDF $F_X(x): \mathbb{R} \rightarrow [0,1]$ is defined as: $$ F_X(x) \coloneqq P(X \leq x) $$ - $0 0$ $$ \begin{aligned} p_X(x) &= \frac{e^{-\lambda} \lambda^x}{x!} \\ \mathbb{E}[X] &= \lambda \\ \text{Var}(X) &= \lambda \end{aligned} $$ ### continuous random variables Uniform distribution: $X \sim \text{Unif}(a,b), a \le b$ $$ \begin{aligned} f_X(x) &= \begin{cases} \frac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases} \\ \\ \mathbb{E}[X] &= \frac{a+b}{2} \\ \text{Var}(X) &= \frac{(b-a)^2}{12} \end{aligned} $$ Exponential distribution: $X \sim \text{Exp}(\lambda), \lambda > 0$ $$ \begin{aligned} f_X(x) = \lambda e^{-\lambda x} \\ \\ \mathbb{E}[X] &= \frac{1}{\lambda} \\ \text{Var}(X) &= \frac{1}{\lambda^2} \end{aligned} $$ Gaussian distribution: $X \sim \mathcal{N}(\mu, \sigma^2), -\infty < \mu < \infty, \sigma^2 > 0$ $$ \begin{aligned} p_X(x) &= \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \\ \\ \mathbb{E}[X] &= \mu \\ \text{Var}(X) &= \sigma^2 \end{aligned} $$ --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour tags: - sfwr4ml3 - ml description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour" title: "nearest neighbour" date: 2024-10-28 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour.html.md --- See also: [slides 13](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture13.pdf), [slides 
14](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture14.pdf), [slides 15](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/lec/Lecture15.pdf) - url: thoughts/.../likelihood - description: expected error minimisation ## expected error minimisation Squared loss: $l(\hat{y},y)=(y-\hat{y})^2$ solution to $y^* = \argmin_{\hat{y}} E_{X,Y}(Y-\hat{y}(X))^2$ is $E[Y | X=x]$ Instead we have $Z = \{(x^i, y^i)\}^n_{i=1}$ ### error decomposition $$ \begin{aligned} &E_{x,y}(y-\hat{y_Z}(x))^2 \\ &= E_{xy}(y-y^{*}(x))^2 + E_x(y^{*}(x) - \hat{y_Z}(x))^2 \\ &= \text{noise} + \text{estimation error} \end{aligned} $$ ### bias-variance decompositions For linear estimator: $$ \begin{aligned} E_Z&E_{x,y}(y-(\hat{y}_Z(x)\coloneqq W^T_Zx))^2 \\ =& E_{x,y}(y-y^{*}(x))^2 \quad \text{noise} \\ &+ E_x(y^{*}(x) - E_Z(\hat{y_Z}(x)))^2 \quad \text{bias} \\ &+ E_xE_Z(\hat{y_Z}(x) - E_Z(\hat{y_Z}(x)))^2 \quad \text{variance} \end{aligned} $$ [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/likelihood#expected-error-minimisation) ## accuracy zero-one loss: $$ l^{0-1}(y, \hat{y}) = 1_{y \neq \hat{y}}= \begin{cases} 1 & y \neq \hat{y} \\\ 0 & y = \hat{y} \end{cases} $$ ## linear classifier $$ \begin{aligned} \hat{y}_W(x) &= \text{sign}(W^T x) = 1_{W^T x \geq 0} \\[8pt] &\because \hat{W} = \argmin_{W} L_{Z}^{0-1} (\hat{y}_W) \end{aligned} $$ ## surrogate loss functions _assume_ classifier returns a discrete value $\hat{y}_W = \text{sign}(W^T x) \in \{0,1\}$ > [!question] What if classifier's output is continuous? > > $\hat{y}$ will also capture the “confidence” of the classifier. Think of contiguous loss function: margin loss, cross-entropy/negative log-likelihood, etc. ## linearly separable data > [!math] linearly separable > > A binary classification data set $Z=\{(x^i, y^i)\}_{i=1}^{n}$ is linearly separable if there exists a $W^{*}$ such that: > > - $\forall i \in [n] \mid \text{SGN}() = y^i$ > - Or, for every $i \in [n]$ we have $(W^{* T}x^i)y^i > 0$ ## linear programming $$ \begin{aligned} \max_{W \in \mathbb{R}^d} &\langle{u, w} \rangle = \sum_{i=1}^{d} u_i w_i \\ &\text{s.t } A w \ge v \end{aligned} $$ Given that data is _linearly separable_ $$ \begin{aligned} \exists \space W^{*} &\mid \forall i \in [n], ({W^{*}}^T x^i)y^i > 0 \\ \exists \space W^{*}, \gamma > 0 &\mid \forall i \in [n], ({W^{*}}^T x^i)y^i \ge \gamma \\ \exists \space W^{*} &\mid \forall i \in [n], ({W^{*}}^T x^i)y^i \ge 1 \end{aligned} $$ ## LP for linear classification - Define $A = [x_j^iy^i]_{n \times d}$ - find optimal $W$ equivalent to $$ \begin{aligned} \max_{w \in \mathbb{R}^d} &\langle{\vec{0}, w} \rangle \\ & \text{s.t. } Aw \ge \vec{1} \end{aligned} $$ ## perceptron Rosenblatt’s perceptron algorithm ```pseudo \begin{algorithm} \caption{Batch Perceptron} \begin{algorithmic} \REQUIRE Training set $(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)$ \STATE Initialize $\mathbf{w}^{(1)} = (0,\ldots,0)$ \FOR{$t = 1,2,\ldots$} \IF{$(\exists \space i \text{ s.t. 
} y_i\langle\mathbf{w}^{(t)}, \mathbf{x}_i\rangle \leq 0)$} \STATE $\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} + y_i\mathbf{x}_i$ \ELSE \STATE \textbf{output} $\mathbf{w}^{(t)}$ \ENDIF \ENDFOR \end{algorithmic} \end{algorithm} ``` ### greedy update $$ \begin{aligned} W_{\text{new}}^T x^i y^i &= \langle W_{\text{old}}+ y^i x^i, x^i \rangle y^i \\ &=W_{\text{old}}^T x^{i} y^{i} + \|x^i\|_2^2 y^{i} y^{i} \end{aligned} $$ ### proof See also ([Novikoff, 1962](#bib-novikoff1962convergence)) > [!math] Theorem > > Assume there exists some parameter vector $\underline{\theta}^{*}$ such that $\|\underline{\theta}^{*}\| = 1$ and $\exists \space \upgamma > 0 \text{ s.t }$ > > $$ > y_t(\underline{x_t} \cdot \underline{\theta^{*}}) \ge \upgamma > $$ > > _Assumption_: $\forall \space t= 1 \ldots n, \|\underline{x_t}\| \le R$ > > Then _perceptron makes at most $\frac{R^2}{\upgamma^2}$ errors_ _proof by induction_ > [!abstract] definition of > > to be parameter vector where algorithm makes $k^{\text{th}}$ error. _Note_ that we have $\underline{\theta^{1}}=\underline{0}$ Assume that $k^{\text{th}}$ error is made on example $t$, or $$ \begin{align} \underline{\theta^{k+1}} \cdot \underline{\theta^{*}} &= (\underline{\theta^k} + y_t \underline{x_t}) \cdot \underline{\theta^{*}} \\ &= \underline{\theta^k} \cdot \underline{\theta^{*}} + y_t \underline{x^t} \cdot \underline{\theta^{*}} \\ &\ge \underline{\theta^k} \cdot \underline{\theta^{*}} + \upgamma \\[12pt] &\because \text{ Assumption: } y_t \underline{x_t} \cdot \underline{\theta^{*}} \ge \upgamma \end{align} $$ Follows up by induction on $k$ that $$ \underline{\theta^{k+1}} \cdot \underline{\theta^{*}} \ge k \upgamma $$ Using [Cauchy-Schwarz](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../../../../thoughts/Cauchy-Schwarz) we have $\|\underline{\theta^{k+1}}\| \times \|\underline{\theta^{*}}\| \ge \underline{\theta^{k+1}} \cdot \underline{\theta^{*}}$ $$ \begin{align} \|\underline{\theta^{k+1}}\| &\ge k \upgamma \\[16pt] &\because \|\underline{\theta^{*}}\| = 1 \end{align} $$ In the second part, we will find upper bound for (5): $$ \begin{align} \|\underline{\theta^{k+1}}\|^2 &= \|\underline{\theta^k} + y_t \underline{x_t}\|^2 \\ &= \|\underline{\theta^k}\|^2 + y_t^2 \|\underline{x_t}\|^2 + 2 y_t \underline{x_t} \cdot \underline{\theta^k} \\ &\le \|\underline{\theta^k}\|^2 + R^2 \end{align} $$ (9) is due to: - $y_t^2 \|\underline{x_t}^2\|^2 = \|\underline{x_t}^2\| \le R^2$ by assumption of theorem - $y_t \underline{x_t} \cdot \underline{\theta^k} \le 0$ given parameter vector $\underline{\theta^k}$ gave error at $t^{\text{th}}$ example. 
Follows with induction on $k$ that $$ \begin{align} \|\underline{\theta^{k+1}}\|^2 \le kR^2 \end{align} $$ from (5) and (10) gives us $$ \begin{aligned} k^2 \upgamma^2 &\le \|\underline{\theta^{k+1}}\|^2 \le kR^2 \\ k &\le \frac{R^2}{\upgamma^2} \end{aligned} $$ --- - url: thoughts/.../Support-Vector-Machine - description: SVM # Support Vector Machine idea: maximizes margin and more robust to “perturbations” Euclidean distance between two points $x$ and the hyperplane parametrized by $W$ is: $$ \frac{\mid W^T x + b \mid }{\|W\|_2} $$ > Assuming $\| W \|_2=1$ then the distance is $\mid W^T x + b \mid$ ## maximum margin hyperplane $W$ has $\gamma$ margin if $$ \begin{aligned} W^T x + b \ge \gamma \space &\forall \text{ blue x} \\ W^T x +b \le - \gamma \space &\forall \text{ red x} \end{aligned} $$ Margin: $$ Z = \{(x^{i}, y^{i})\}_{i=1}^{n}, y \in \{-1, 1\}, \|W\|_2 = 1 $$ ## hard-margin SVM ```pseudo \begin{algorithm} \caption{Hard-SVM} \begin{algorithmic} \REQUIRE Training set $(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)$ \STATE \textbf{solve:} $(w_{0},b_{0}) = \argmin\limits_{(w,b)} \|w\|^2 \text{ s.t } \forall i, y_{i}(\langle{w,x_i} \rangle + b) \ge 1$ \STATE \textbf{output:} $\hat{w} = \frac{w_0}{\|w_0\|}, \hat{b} = \frac{b_0}{\|w_0\|}$ \end{algorithmic} \end{algorithm} ``` note that this version is sensitive to outliers ## soft-margin SVM ```pseudo \begin{algorithm} \caption{Soft-SVM} \begin{algorithmic} \REQUIRE Input $(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)$ \STATE \textbf{parameter:} $\lambda > 0$ \STATE \textbf{solve:} $\min_{\mathbf{w}, b, \boldsymbol{\xi}} \left( \lambda \|\mathbf{w}\|^2 + \frac{1}{m} \sum_{i=1}^m \xi_i \right)$ \STATE \textbf{s.t: } $\forall i, \quad y_i (\langle \mathbf{w}, \mathbf{x}_i \rangle + b) \geq 1 - \xi_i \quad \text{and} \quad \xi_i \geq 0$ \STATE \textbf{output:} $\mathbf{w}, b$ \end{algorithmic} \end{algorithm} ``` [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/nearest-neighbour/../../../../../../../../thoughts/university/twenty-four-twenty-five/sfwr-4ml3/Support-Vector-Machine) ## Bibliographie - Novikoff, A. B. J. (1962). On Convergence Proofs for Perceptrons. _Proceedings of the Symposium on the Mathematical Theory of Automata_, _12_, 615–622. --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/principal-component-analysis tags: - sfwr4ml3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/principal-component-analysis" title: "principal component analysis" date: 2024-10-07 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/principal-component-analysis.html.md --- ## problem statement - map $x \in R^d$ to $z \in \mathbb{R}^q$ with $q < d$ - A $q \times d$ matrix can represent a linear mapping: $$ z = Ax $$ - Assume that $A A^T = I$ (orthonormal matrix) ## minimising reconstruction error - Given $X \in \mathbb{R}^{d \times n}$, find $A$ that minimises the reconstruction error: $$ \min\limits_{A,B} \sum_{i} \| x^i - B A x^i \|_2^2 $$ > if $q=d$, then error is zero. 
Solution: - $B = A^T$ - $\min\limits_{A} \sum_i \| x^i - A^T A x^i \|^2$ is subjected to $A A^T = I_{q \times q}$ - assuming data is centered, or $\frac{1}{n} \sum\_{i} x^i = \begin{bmatrix} 0 & \cdots & 0 \end{bmatrix}^T $ ## eigenvalue decomposition $$ \begin{aligned} X^T X \mathcal{u} &= \lambda \mathcal{u} \\ X^T X &= U^T \Lambda U \\ \\ \\ \because \Lambda &= \text{diag}(\lambda_1, \lambda_2, \cdots, \lambda_d) \\ &= \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_q \end{bmatrix} \end{aligned} $$ ## pca Idea: given input $x^1, \cdots, x^n \in \mathbb{R}^d$, $\mu = \frac{1}{n} \sum_{i} x^i$ Thus $$ C = \sum (x^i - \mu)(x^i - \mu)^T $$ Find the eigenvectors/values of $C$: $$ C = U^T \Lambda U $$ Optimal $A$ is: $$ A = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_q^T \end{bmatrix} $$ --- slug: thoughts/university/twenty-four-twenty-five/sfwr-4ml3/tut/tut1 tags: - sfwr4ml3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/tut/tut1" title: "linalg review" date: 2024-09-11 permalink: https://aarnphm.xyz/thoughts/university/twenty-four-twenty-five/sfwr-4ml3/tut/tut1.html.md --- See also [matrix cookbook](https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf) ## matrix representation of a system of linear equations $$ \begin{aligned} x_1 + x_2 + x_3 &= 5 \\ x_1 - 2x_2 - 3x_3 &= -1 \\ 2x_1 + x_2 - x_3 &= 3 \end{aligned} $$ Equivalent matrix representation of $Ax = b$ $$ \begin{aligned} A &= \begin{bmatrix} 1 & 1 & 1 \\ 1 & -2 & -3 \\ 2 & 1 & -1 \end{bmatrix} \\ x &= \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \\ b &= \begin{bmatrix} 5 \\ -1 \\ 3 \end{bmatrix} \end{aligned} \because A \in R^{m \times n}, x \in R^n, b \in R^m $$ > [!tip] Transpose of a matrix > > $A \in R^{m \times n}$ and $A^T \in R^{n \times m}$ ## dot product. $$ \begin{aligned} \langle x, y \rangle &= \sum_{i=1}^{n} x_i y_i \\ &= \sum_{i=1}^{n} x_i \cdot y_i \end{aligned} $$ ## linear combination of columns Let $A \in R^{m \times n}$, $X \in R^n$, $Ax \in R^n$ Then $Ax = \sum_{i=1}^{n}{\langle a_i \rangle} x_i \in R^n$ ## inverse of a matrix The inverse of a square matrix $A \in R^{n \times n}$ is a **unique** matrix denoted by $A^{-1} \in \mathbb{R}^{n\times{n}}$ $$ A^{-1} A = I = A A^{-1} $$ ## euclidean norm $L_{2}$ norm: $$ \| x \|_{2} = \sqrt{\sum_{i=1}^{n}{x_i^2}} = X^TX $$ L1 norm: $\| x \|_{1} = \sum_{i=1}^{n}{|x_i|}$ $L_{\infty}$ norm: $\| x \|_{\infty} = \max_{i}{|x_i|}$ p-norm: $\| x \|_{p} = (\sum_{i=1}^{n}{|x_i|^p})^{1/p}$ > [!tip] Comparison > > $ \|x\|_{\infty} \leq \|x\|_{2} \leq \|x\|\_{1}$ > One can prove this with Cauchy-Schwarz inequality ## linear dependence of vectors Given $\{x_1, x_2, \ldots, x_n\} \subseteq \mathbb{R}^d$ and $\alpha_1, \alpha_2, \ldots, \alpha_n \in \mathbb{R}$ $$ \forall i \in [ n ], \forall \{a_1, a_2, \ldots, a_n\} \subseteq \mathbb{R}^d \space s.t. \space x_i \neq \sum_{j=1}^{n}{a_j x_j} $$ ## Span > Given a set of vectors $\{x_1, x_2, \ldots, x_n\} \subseteq \mathbb{R}^d$, the span of the set is the set of all possible linear combinations of the vectors. 
> > $$ > \text{span}(\{x_1, x_2, \ldots, x_n\}) = \{ y: y = \sum_{i=1}^{n}{\alpha_i x_i} \mid \alpha_i \in \mathbb{R} \} > $$ If $x_{1}, x_{2}, \ldots, x_{n}$ are linearly independent, then the span of the set is the entire space $\mathbb{R}^d$ ## Rank For a matrix $A \in \mathbb{R}^{m \times n}$: - column rank: max number of linearly independent columns of $A$ - row rank: max number of linearly independent rows of $A$ If $\text{rank}(A) \leq m$, then the rows are linearly independent. If $\text{rank}(A) \leq n$, then the columns are linearly independent. > rank of a matrix $A$ is the number of linearly independent columns of $A$: > > - if $A$ is full rank, then $\text{rank}(A) = \min(m, n)$ ($\text{rank}(A) \leq \min(m, n)$) > - $\text{rank}(A) = \text{rank}(A^T)$ ## solving linear system of equations If $A \in \mathbb{R}^{n}$ is invertible, there exists a solution: $$ x = A^{-1}b $$ ## Range and Projection Given a matrix $A \in \mathbb{R}^{m \times n}$, the range of $A$, denoted by $\mathcal{R}(A)$ is the span of columns of $A$: $$ \mathcal{R}(A) = \{ y \in \mathbb{R}^m \mid y = Ax \mid x \in \mathbb{R}^m \} $$ Projection of a vector $y \in \mathbb{R}^m$ onto $\text{span}(\{x_1, \cdots, x_n\})$, $x_i \in \mathbb{R}^m$ is a vector in the span that is as close as possible to $y$ wrt $l_2$ norm $$ \text{Proj}(y; \{x_{1}, \cdots, x_n\}) = \argmin_{{v \in \text{span}(\{x_1, \cdots, x_n\})}} \| y - v \|_2 $$ ## Null space of $A$ is the set of all vectors that satisfies the following: $$ \mathcal{N}(A) = \{ x \in \mathbb{R}^n \mid Ax = 0 \} $$ --- slug: thoughts/university/twenty-three-twenty-four/astron-2e03/Atmosphere tags: - astron2e03 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/Atmosphere" title: "Atmospheric properties for exoplanets" date: 2024-03-07 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/Atmosphere.html.md --- Ref: [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/Atmosphere/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/06_Atmospheres_2024.pdf) ### features. $$ H = \frac{k_BT}{\omega m_H g} $$ 1. solid or dash solid q 2. larger mean molecular weights? B: shallow features ⇒ higher mean molecular weights mass-metallicity trend > [!question] Question > > Can we detect clouds in exoplanets? Clouds suppress atmospheric chemical signatures > [!tip] Important > > Introduced a degeneracy between _cloud-top pressure_ and _mean molecular weight_ ### clouds/winds on giant planets. Wind cells Hadley cells > [!note] Coriolis Effect > > Winds do not follow a straight trajectory ### Winds on tidally-locked exoplanets --- slug: thoughts/university/twenty-three-twenty-four/astron-2e03/Blackbody-Radiation tags: - astron2e03 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/Blackbody-Radiation" title: "Blackbody Radiation" date: 2024-02-06 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/Blackbody-Radiation.html.md --- ### Atmospheric escape > _non-thermal escape_: A physical process that results in the full or partial loss of a planet’s atmosphere. **large-scale magnetic fields** - conductive material - convective motion - has kinetic energy > Mars doesn’t have convective interior, since the core has been cooled off. 
> > radioactive decay within the core ### Stellar winds continuous flow of _ionized particles_ emitted by the Sun and other stars. #### Charge exchange ### Thermal escape #### Jeans escape Given the Maxwell-Boltzmann distribution, the probability of a particle having a certain velocity is given by: $$ \left( \frac{dN}{dv} \right)_{m,T} = v^2 \left( \frac{m}{2 \pi k_B T} \right)^{\frac{3}{2}} \exp \left( -\frac{mv^2}{2k_BT} \right) $$ --- slug: thoughts/university/twenty-three-twenty-four/astron-2e03/Exoplanets tags: - astron2e03 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/Exoplanets" title: "Expolanets" date: 2024-02-02 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/Exoplanets.html.md --- ### Q1) a. _Would you see any of the solar system planets transit?_ For an inclination of $i = 45 \degree$, transits are mostly observed when the orbital plan is edge on to the observer. It is plausible for some planets that is larger sized and orbit closer to ecliptic plane would transit the Sun given the direct line of sight. b. _If you monitored the Sun with radial velocity (RV) measurements and your technology was precise enough that you could measure RV signals down to 1 m/s, show and discuss whether you’re able to detect Venus._ Given the semi-amplitude $K$ of the radial velocity curve is given by $$ K = \frac{M_p \sin i}{(M_{*}+M_p)^{\frac{2}{3}}} \left( \frac{2 \pi G}{P} \right)^{\frac{1}{3}} $$ We have $$ \begin{align*} G &= 6.674 \times 10^{-11} m^3 \text{kg}^{-1} s^{-1} \\\ M_p &= 4.87 \times 10^{24} \text{kg} \\\ M_{*} &= 1.989 \times 10^{30} \text{kg} \\\ P &= 224.7 \text { days} \\\ K & = 4.87 \times 10^{24} \sin 45 \left( \frac{2 \pi G}{224.7 \times 24 \times 3600} \right)^{\frac{1}{3}} \approx 0.061 \text{m/s} \end{align*} $$ Given the precision of the RV measurements is 1 m/s, we can conclude that Venus is not detectable with the current technology. Venus induces a very small motion in the Sun due to gravitation pull, since RV is more sensitive to larger planets closer to their host stars. c. _Using the same RV measurements, show and discuss whether you’re able to detect Jupiter_ For Jupiter, we have $$ \begin{align*} G &= 6.674 \times 10^{-11} m^3 \text{kg}^{-1} s^{-1} \\\ M_p &= 1.898 \times 10^{27} \text{kg} \\\ M_{*} &= 1.989 \times 10^{30} \text{kg} \\\ P &= 224.7 \text { days} \\\ K = 1.898 \times 10^{27} \sin 45 \left( \frac{2 \pi G}{224.7 \times 24 \times 3600} \right)^{\frac{1}{3}} \approx 8.81 \text{m/s} \end{align*} $$ We can conclude that Jupiter is detectable with the current technology. This is due to Jupyter’s significant mass and gravitational pull on the Sun, which induces a larger motion via the Doppler shifts. d. _If you knew that the Sun’s mass is $1 M$ and you successfully detected Venus and/or Jupiter using these RV data, could you measure either planet’s absolute mass and why_ Detecting a planet using RV allows us to measure planet’s minimum mass, not absolute mass. This has to do with the inclination angle of its orbit ($\sin i$) If the orbit is edge-on ($i = 90 \degree$), then RV gives the closest approximation to the planet’s absolute mass. However, in this case our $i = 45 \degree$, so we can only measure the minimum mass of the planet based on the assumption of an edge-on orbit. e. _If you also monitored the Sun with astrometric measurements and your technology was precise enough that you could measure signals down to 10 $\mu \text{as}$ (i.e. 
micro-arcseconds), show and discuss whether you’re able to detect Jupiter_ The amplitude of astrometric signal $a$ is given by $$ a = \frac{m_{p}}{m_{*}} \frac{a_{p}}{d} $$ where $m_{p}$ is the mass of the planet, $m_{*}$ is the mass of the star, $a_{p}$ is the semi-major axis of the planet’s orbit, and $d$ is the distance to the star. For Jupyter, we have $$ \begin{align*} m_{p} &= 1.898 \times 10^{27} \text{kg} \\\ m_{*} &= 1.989 \times 10^{30} \text{kg} \\\ a_{p} &= 5.2 \text{AU} \\\ d &= 10 \text{pc} \\\ a &= \frac{1.898 \times 10^{27}}{1.989 \times 10^{30}} \frac{5.2 \times 1.496 \times 10^{11}}{10 pc} * 1e^6 \approx 496.21 \mu \text{as} \end{align*} $$ Therefore, Jupyter would be easily detectable. The signal is the result of Jupyter’s substantial mass and larger distance from the Sun. f. _Using the same astrometric measurements, show and discuss whether you’re able to detect Venus_ For Venus, we have $$ \begin{align*} m_{p} &= 4.87 \times 10^{24} \text{kg} \\\ m_{*} &= 1.989 \times 10^{30} \text{kg} \\\ a_{p} &= 0.72 \text{AU} \\\ d &= 10 \text{pc} \\\ a &= \frac{4.87 \times 10^{24}}{1.989 \times 10^{30}} \frac{0.72 \times 1.496 \times 10^{11}}{10 pc} * 1e^6 \approx 0.177 \mu \text{as} \end{align*} $$ Therefore, Venus would not be detectable. The signal is the result of Venus’s smaller mass and closer proximity to the Sun, therefore exert a smaller gravitational effect on the Sun’s position. g. _If you knew that the Sun’s mass is 1 M and you successfully detected Venus and/or Jupiter using these astrometric data, could you measure either planet’s absolute mass and why?_ Yes, since astrometric measures the displacement of the star’s position relative to distant background stars as it orbits around. The amplitude of the astrometric signal is directly proportional to the mass of the planet, and inversely proportional to the mass of the star, therefore we can calculate the absolute mass of the planet, given the semi-major axis of its orbits and the mass of the stars (which is 1M in this case here). ### Q2) $$ \begin{align*} L_{\text{orb}} &= \frac{2 \pi a^2 \sqrt{1-e^2}}{P} M \\\ L_{\text{rot}} &= I \omega \\\ I &= \frac{2}{5} M R^2 \\\ \omega &= \frac{2 \pi}{P_{\text{rot}}} \end{align*} $$ a. _Derive the expression for the ratio of orbital to rotational angular momenta. For this exercise, assume a circular orbit_ For ratio $\frac{L_{\text{orb}}}{L_{\text{rot}}}$ we have $$ \begin{align*} L_{\text{orb}} &= \frac{2 \pi a^2}{P} M \\\ L_{\text{rot}} & = I \omega = \frac{2}{5} M R^2 \frac{2 \pi}{P_{\text{rot}}} = \frac{4 \pi M R^2}{5 P_{\text{rot}}} \end{align*} $$ Therefore $\frac{L_{\text{orb}}}{L_{\text{rot}}} = \frac{5 a^2 P_{\text{rot}}}{2 R^2 P}$ b. \_It is a common misconception that the planets in our solar system orbit the Sun. In reality, the planets and the Sun all orbit their common center of mass. As such, the Sun has a non-zero semimajor axis $a_{\odot}$. Let us approximate the solar system as a 1-planet system that contains the Sun and Jupiter. In this scenario, what is the expression for $a_{\odot}$ in terms of Jupiter’s semimajor axis $a_J$ and both objects’ masses?\_ In a two-body system, the formula to derive the distance of the Sun from the barycenter is given by: $$ a_{\odot} = \frac{a_J M_J}{M_{\odot}} $$ where $a_J$ is the semimajor axis of Jupiter, $M_J$ is the mass of Jupiter, and $M_{\odot}$ is the mass of the Sun. 
The total distance $D$ between the Sun and Jupyter is the sum of their distance to the center of mass: $D = a_{\odot} + a_J$ Thus, considering this, the distance of the Sun from the barycenter is given by: $$ a_{\odot} = \frac{a_J M_J}{M_J + M_{\odot}} $$ c. _Using this expression, calculate the value of a in au_ Given that $a_J = 5.2 \text{AU}$, $M_J = 1.898 \times 10^{27} \text{kg}$, and $M_{\odot} = 1.989 \times 10^{30} \text{kg}$, we have $$ a_{\odot} = \frac{5.2 \times 1.898 \times 10^{27}}{1.898 \times 10^{27} + 1.989 \times 10^{30}} \approx 0.00496 \text{AU} $$ d. \_Given your value of $a_\odot$, calculate the ratio of the Sun’s orbital angular momentum to its rotation angular momentum. Is most of the Sun’s angular momentum manifested as orbital or rotational?\_ Using the formula derived in part a, we have $$ \frac{L_{\text{orb}}}{L_{\text{rot}}} = \frac{5 a_{\odot}^2 P_{\text{rot}}}{2 R^2 P} = \frac{5 \times {0.00496 \text{AU}}^2 \times 25 * 86400 \text{ sec}}{2 \times {(6.96 \times 10^8)}^2 \times 11.86 \times 3.153 \times 10^7} \approx 0.0164 $$ This indicates that most of the Sun’s angular momentum is manifested as rotational. e. _Now calculate the ratio of Jupiter’s orbital angular momentum to its rotational angular momentum. Is most of Jupiter’s angular momentum manifested as orbital or rotational?_ Using the formula derived in part a, we have $$ \frac{L_{\text{orb}}}{L_{\text{rot}}} = \frac{5 a_J^2 P_{\text{rot}}}{2 R^2 P} = \frac{5 \times {5.2 \text{AU}}^2 \times 9.93 \times 3600 \text{ sec}}{2 \times {(7.149 \times 10^7)}^2 \times 11.86 \times 3.153 \times 10^7} \approx 28287.8 $$ This indicates that most of Jupiter’s angular momentum is manifested as orbital. f. \_In parts d) and e) above, you should have found that the total angular momenta of both the Sun and Jupiter are heavily dominated by either their own $Li_{\text{orb}}$ or $L_{\text{rot}}$. Using the dominant forms of angular momenta for each body, calculate the ratio $\frac{L_J}{L_\odot}$\_ For Jupyter’s orbital angular momentum $L_{\text{orb}, J}$, we have $L_{\text{orb}, J} = M_J \sqrt{G M_{\odot} a_J}$, and for the Sun’s rotational angular momentum $L_{\text{rot}, \odot} = I_{\odot} \omega_{\odot}$, we have $L_{\text{rot}, \odot} = \frac{2}{5} M_{\odot} R_{\odot}^2 \omega_{\odot} = \frac{2}{5} M_{\odot} R_{\odot}^2 \frac{2 \pi}{P_{\text{rot,} \odot}}$ Thus the ratio $\frac{L_J}{L_\odot}$ is given by $$ \frac{L_J}{L_\odot} = \frac{L_{\text{orb}, J}}{L_{\text{rot}, \odot}} = \frac{M_J \sqrt{G M_{\odot} a_J}}{\frac{2}{5} M_{\odot} R_{\odot}^2 \frac{2 \pi}{P_{\text{rot,} \odot}}} $$ Given that $a_J = 5.2 \text{AU}$, $M_J = 1.898 \times 10^{27} \text{kg}$, $M_{\odot} = 1.989 \times 10^{30} \text{kg}$, $R_{\odot} = 6.96 \times 10^8 \text{m}$, and $P_{\text{rot,} \odot} = 25 \times 86400 \text{sec}$, we have $$ \frac{L_J}{L_\odot} \approx 17.20 $$ g. _Comment on where most of the angular momentum in the solar system is located._ Most of angular momentum in the solar system is located in the orbital motion of the planets, with Jupyter having the most significant contribution to the total angular momentum. This is due to the angular momentum of an orbiting body is proportional to the mass of the body and the distance from the center of mass, and inversely proportional to the period of the orbit. ### Q3) $$ \begin{align} v(\theta) &= \sqrt{GM \left( \frac{2}{r(\theta)} - \frac{1}{a} \right)} \\\ E = K + U &= -\frac{GMm}{2a} \\\ \end{align} $$ a. 
_Use the conservation of angular momentum L and mechanical energy E to derive Eq. 4_ The angular momentum $L$ of a planet in orbit around a larger mass is given by $$ L = mrv_{\perp} $$ where: - $m$ is the mass of the planet - $v_{\perp}$ is the velocity of the planet perpendicular to the vector pointing from the Sun - $r$ is the distance from the planet to the larger mass. In an elliptical orbit, the direction of veloocity changes, but magnitude of angular momentum is conserved due to no external torques. Therefore $$ L = mr(\theta)v(\theta)\sin \phi = \text{constant} $$ The total mechanical energy $E$ of a planet in orbit around a larger mass is given by The kinetic energy $K$ and the potential energy $U$ of a planet in orbit around a larger mass is given by $$ \begin{align} K &= \frac{1}{2}mv(\theta)^2 \\\ U &= -\frac{GMm}{r(\theta)} \end{align} $$ The total mechanical energy $E$ of a planet in orbit around a larger mass is given by $$ E = K + U = = \frac{1}{2}mv(\theta)^2 - \frac{GMm}{r(\theta)} $$ Given that the orbital velocity $v(\theta)$ is given by $$ v(\theta) = \sqrt{GM \left( \frac{2}{r(\theta)} - \frac{1}{a} \right)} $$ We can substitute $v(\theta)$ into the equation for $K$ to get $$ K = GMm \left( \frac{1}{r(\theta)} - \frac{1}{2a} \right) $$ Thus the total mechanical energy $E$ of a planet in orbit around a larger mass is given by $$ \begin{align} E = K + U &= GMm \left( \frac{1}{r(\theta)} - \frac{1}{2a} \right) - \frac{GMm}{r(\theta)} \\\ &= GMm \left( \frac{1}{r(\theta)} - \frac{1}{2a} - \frac{1}{r(\theta)} \right) \\\ &= -\frac{GMm}{2a} \end{align} $$ b. Use Eq. 4 to derive Eq. 3 $$ E = K + U = = \frac{1}{2}mv(\theta)^2 - \frac{GMm}{r(\theta)} $$ Since E remains constant, given that the total energy in a bound orbit is negative, we have $$ E = -\frac{GMm}{2a} $$ where $a$ is the semi-major axis of the orbit. We equate the two equations and solve for $v(\theta)$ to get $$ \begin{align} -\frac{GMm}{2a} &= \frac{1}{2}mv(\theta)^2 - \frac{GMm}{r(\theta)} \\\ v(\theta)^2 &= \frac{GM}{r(\theta)} \left( \frac{2}{r(\theta)} - \frac{1}{a} \right) \\\ v(\theta) &= \sqrt{GM \left( \frac{2}{r(\theta)} - \frac{1}{a} \right)} \end{align} $$ --- slug: thoughts/university/twenty-three-twenty-four/astron-2e03/Heating-Cooling-GH-effect tags: - astron2e03 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/Heating-Cooling-GH-effect" title: "Heating, Cooling, and the Greenhouse Effect" date: 2024-02-26 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/Heating-Cooling-GH-effect.html.md --- Ref: [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/Heating-Cooling-GH-effect/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/06_HeatingCooling_GHeffect_2024.pdf) ### toy modal. 
eq: $T_{\text{surf}} ~ 1.32 \times T_{\text{atm, 2}}$ > [!note] General form of > > $$ > T_{\text{surf}} = {\lbrack \frac{(n+1)S}{\omega} \rbrack}^{\frac{1}{4}} = (n+1)^{\frac{1}{4}} \times T_{\text{atm, n}} > $$ --- slug: thoughts/university/twenty-three-twenty-four/astron-2e03/W1 tags: - astron2e03 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1" title: "Solar systems in the context of exoplanets" date: 2024-01-08 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1.html.md --- Ref: [Solar System Exoplanets 2024](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/Solar-System-Exoplanets-2024.pdf) ## Obj. - content of solar system and orbital properties - Compare properties of Solar System to known exoplanetary - _six_ techniques for exoplanet detection & limitation. --- ## How people learn? > Student enter the classroom with preconceptions about how the world works. If their _initial understanding is not fully engaged, they may fail to grasp new concepts_ _develop competence_ 1. foundation knowledge 2. interrelationships among facts and concepts 3. retrieval and application. ## Solar system Sun → terrestrial planets → asteroid belt → Jovian (gas giants) \~ Ice giant planets → Trans-Neptunian objects (TNOs) (Dwarf planets → Kuiper belt → Oort cloud) > `1 au` (astronomical unit): average distance between Earth and Sun > Planetary orbits are (nearly) _co-planar_ - Dispersion in mutual inclinations: $\Delta{i} \approx 2\text{ deg}$ - Pluto and many other TNOs are \_more highly inclined ## Consequence of **Protoplanetary disks** _from Alma telescope_ - radio images of _warm dust continuum_ ($\leq 10^6\text{ Myrs}$) - Disk sizes $\approx 100\text{ au}$ - Variety of morphologies > [!question] Question > > Concentric gaps opened by _protoplanets_? - Due to active construction of _two protoplanets?_ > [!question] Question > > What other _dynamical properties_ do you expect for planets formed from a disk? - **Keplerian Motion**: Planets formed from a disk are expected to exhibit Keplerian motion → direct consequence rather than properties ## Regular vs. Irregular Satellites (aka, moons) | Regular Satellites | Irregular | | --------------------------------------------------------- | ----------------------------- | | Resemble mini planetary systems | Irregular orbits | | prograde | prograde or retrograde orbits | | low mutual inclinations, e.g: 4 Galilean moons of Jupyter | highly elliptical | | nearly circular orbits | highly inclined | ![Exoplanets discovery technique](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/exoplanets-discovery-technique.webp) Exoplanets discovery technique > Most exoplanetary systems are compact Kepler-11 System ## Transit ![Trasit](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/transit.webp) Trasit - Time-resolved photometry (i.e. stellar brightness) = “light curve” Can measure: - Orbital period - Orbital inclination - Has to be edged-on - **relative to telescope**, not to the _star_ - Reference is line-of-sight to exoplanetary system. - Planet radius ### transit depth. 
$$ \begin{aligned} \mathbf{Z} &= \frac{\text{Area}_{pl}}{\text{Area}_{*}} = (\frac{R_{pl}}{R_*})^2 \\\ &\\\ Z&: \text{transit depth} \\\ R_{pl}&: \text{planet radius} \\\ R_{*}&: \text{stellar radius} \\\ \end{aligned} $$ ### limb-darkening - appears fainter at their edges compared to centres - depends on the **star’s temperature structure** and the **wavelength of the observations** ![Example transit graph](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/transit-graph.webp) Example transit graph > The higher the depth, the larger the planet > Limb-darkening only depends on the stars, and wavelength observing at > Depth **doesn’t depends** on how far away the planets is away from the star (depends on the durations, orbiting more slowly) > Duration is impacted by _period_ and _inclination_ ### known transiting expolanets ![Radius Period diagram](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/radius-period-diagram.webp) Radius Period diagram Geometric transit probability: $$ \begin{align*} P_{tr} &\approx \frac{R_{*}}{a} \\ &= 0.5\% \left( \frac{R_{*}}{R_{\odot}} \right) \left( \frac{a}{a_{\oplus}} \right)^{-1} \end{align*} $$ where $\odot$ and $\oplus$ is the _sun_ and _earth_ respectively ## Transit Timing Variations _oscillating orbits_ ![Transit timing variation example](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/transit-timing-variation.webp) Transit timing variation example > B exhibits larger TTV > A is more massive, since B is influenced by A (pulled by gravitational effect) ## Radial velocity Only sees the bigger stars chemical abundances in star atmosphere → graphs (dotted vertical lines) Time-resolved spectroscopy to measure _Doppler-shifted spectral features_ > Radial velocity shift translates into wavelength shift $$ \frac{\lambda_{obs}}{\lambda_{ref}} = \sqrt{\frac{1+v_{rad}/c}{1-v_{rad}/c}} $$ Can measure - Orbital period - Orbital eccentricity - Planet’s minimum mass semi-amplitude of RV signal _K_ > K depends on the orbital inclination _i_ such that RV method is _sensitive an upper limit on planetary mass_ $$ \begin{align} K &= M_p(\frac{2\pi{G}}{PM_{*}^{2}})^{1/3} \\\ K &= M_p \sin{i} (\frac{2\pi{G}}{PM_{*}^{2}})^{1/3} \end{align} $$ _Derivation_ $$ \begin{align} a_sM_s &= a_pM_p \\\ P^2 &= \frac{4\pi^2}{GM_{*}}a_p^3 \end{align} $$ $M_p$: planet mass, $i$: orbital inclination, $P$: orbital period, $M_{*}$: stellar mass > - Insensitive to face-on, maximally sensitive to edge-on > - Easier to detect big planets Transits + Radial Velocity (Radius + mass) → planet bulk density ## Astrometry > proper motions ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/astrometry.webp) ### Aside - Parallax `1 pc = 1 AU / 1"` 1” arcsec = 1/60 arcminutes = (1/60)/60 degrees parsec is the distance from two planets Consider a star-planet system located at _d_ from us $x=d\theta = 1{AU}(\frac{d}{1pc})(\frac{\theta}{1"})$ $$ \triangle{\theta} = \frac{M_p}{d}(\frac{GP^2}{4\pi^2M^2_{*}})^{1/3} $$ biased on long period ### Gravitational Microlensing > Mass bends spacetime → light ray are bent by a curved spacetime → massive object act as 
_gravitational lens_ ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/W1/../../../../../../../../thoughts/university/twenty-three-twenty-four/astron-2e03/gravitational-microlensing.webp) --- slug: thoughts/university/twenty-three-twenty-four/astron-2e03/index tags: - university - astron2e03 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/index" title: "Planetary Astronomy" date: 2024-01-08 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/astron-2e03/index.html.md --- Dr. [Ryan Cloutier](mailto:ryan.cloutier@mcmaster.ca) or [link](https://avenue.cllmcmaster.ca/d2l/home/598689) Book (optional): [_Fundamental Planetary Science: Physics, Chemistry, and Habitability by Lissauer, J.J. & de Pater, I. (ISBN 9781108411981)_](https://www.cambridge.org/highereducation/books/fundamental-planetary-science/8FD11659BE64C35A172DF0432D7FCFA4#overview) - warm-up quizzes b4 classes, post-lecture quizzes - Participation marks for pre, post will be graded - 3 in-class tests - Jan 25th - Feb 15th - March 21st - 3 take-home assignments, due date as hard copies, at the beginning of the class: - Feb 1st - March 7th - April 4th - 1 finals ## Overview 1. Our solar system in the context of exoplanetary systems 2. Exoplanet discovery techniques - Transits + Transit Timing Variations - Radial velocity - Astrometry - Direct Imaging - Gravitational microlensing 3. Orbital mechanics - Kepler’s laws, - Gravity, angular momentum, and energy - Orbital resonances - 3-body dynamics - Tides 4. Heating & Cooling - Blackbody radiation - Star-planet interactions - Greenhouse effect 5. Planetary Atmospheres - Thermal structure - Energy transport - Cloud formation - Composition - Exoplanet transmission spectra 6. Planetary Interiors - Bulk density and composition - Mass-radius relation 7. Exoplanet formation/demographics - Core accretion - Measuring occurrence rates - Observed occurrence rates and formation inferences --- slug: thoughts/university/twenty-three-twenty-four/commerce-4be3/Defining-Internal-Alignment-and-Job-Analysis tags: - commerce4be3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Defining-Internal-Alignment-and-Job-Analysis" title: "Defining Internal Alignment & Job Analysis" date: 2024-01-24 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Defining-Internal-Alignment-and-Job-Analysis.html.md --- > internal alignment: relationship among different jobs/skills/competencies within a single organisation, job structure also known as internal equity Structure needs - support organisation strategy - support workflow (process which good/services are delivered to the customer) - motivates behaviour (line-of-sight) > **career laddering/progression** ## Internal Pay Structure > refers to the array of pay rates for different work or skills within a single organisation - number of level - pay differentials between levels - criteria or bases used to determine those levels and differentials. ## differentials > pay difference among levels - requiring more skill/experience - performed in unpleasant work conditions - adds more value to the company - motivations ## criteria - content: work performed in a job - value: worth of the work ## structure. 
Job-based structure: work content - tasks, behaviours, responsibilities (engineering teams) Person-based structure: skill, knowledge, competencies focus to employees (lawyer, clientele) ## impact. ### external factors - economic pressures: inflations, COL - Government policies, Laws and Regulations: Pay-Equity Act - Stakeholders: board, employees - Cultures and customs: high-performance and focus internal equity. ### organisation factors - strategy - technology - human capital - HR policy - Employee acceptance - Cost implications ## internal labour markets > rules and procedures that determine pay for different jobs within single organisation and allocate employees among those different jobs. ## Strategy for designing internal structures. | Tailored | Loosely Coupled | Egalitarian | Hierarchical | | -------------------------------------------- | ----------------------------------------------------- | ------------------------------------------------- | -------------------------------------------------------------- | | Adapted by organisation with low costs | Adapted by organisaion require constant innovation | Few levels | multiple levels | | well-defined jobs with detailed paystructure | Pay structure are more loosely linked to organization | smaller differentials | detailed job description | | McDonald | Job are flexible, adaptable and changing | Equal treatment = knowledgable feels underpaid | | | | | higher performance when collaboration is required | higher performance when workflow depends on individual effort. | ## Equity theory: fairness - compare ratio of their own outcomes ## Tournament theory - relationship between motivation and performance ## Institutional theory - Copy others and conform - use “best practices” - align for one organisation might not align with another ## Consequences. _for internally-aligned Pay Structure_ - efficiency - fairness - compliance --- slug: thoughts/university/twenty-three-twenty-four/commerce-4be3/Designing-Pay-Levels--and-Employee-Benefits tags: - commerce4be3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Designing-Pay-Levels--and-Employee-Benefits" title: "Designing Pay Levels and Employee Benefits" date: 2024-02-28 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Designing-Pay-Levels--and-Employee-Benefits.html.md --- See also [Designing Pay Levels, Pay Mix and Pay Structure](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Designing-Pay-Levels--and-Employee-Benefits/../../../../../../../../thoughts/university/twenty-three-twenty-four/commerce-4be3/Designing-Pay-Levels,-Pay-Mix-and-Pay-Structure.pdf) and [Pay Employment Benefits](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Designing-Pay-Levels--and-Employee-Benefits/../../../../../../../../thoughts/university/twenty-three-twenty-four/commerce-4be3/Pay-Employment-Benefits.pdf) ## decision for externally competitive pay levels and structure. - employer’s competitive pay policy - purpose of survey - construct market line. - balance competitiveness with internal alignment through _pay range, flat rates, bands_ ## survey. - adjust pay level - pay mix: stock, benefits - pay structure: job evaluation results. - estimate competitors’ labour costs (competitive intelligence) ### design Which job to include? - benchmark job approach, low-high approach, conversion/survey level What information to collect? 
- organisation data, total compensation data, information about incumbent ### interpretation Verify anomalies, accuracy of match, validation to other trends. ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Designing-Pay-Levels--and-Employee-Benefits/../../../../../../../../thoughts/university/twenty-three-twenty-four/commerce-4be3/survey-data-elements-for-inclusion.webp) ## select relevant market competitors 1. Relevant labor markets 2. Fuzzy markets: new orgs/orgs with unique jobs fuse diverse factors for relevant markets fuzzy > [!question] Question > > What factors determine the relevant market for pay surveys? Why is the definition of the relevant market important? - **Industry and Job Function**: depending on the job sector and industry size. - **Geographic Location**: location-based pay - **Experience and Education Level**: pay for experience and education - **Market trends**: market trends and changes importance because: - **Competitiveness**: to attract and retain employees. - **Fairness and Equity**: enhance satisfaction and reduce turnover. - **Legal compliance**: to avoid discrimination. ### organization | Basic Elements | Examples | Rationale | | --------------------- | ---------------------------------------------- | ---------------------------------------------------------------------------------------- | | Identification | Company name, address, contact person | Further contacts | | Financial performance | Assets, sales, profits (after taxes), cashflow | Indicates nature of the product/service markets, the ability to pay, size and financials | | Size | Profit centres, product lines | Importance of specific job groups to business success | | | Total number of employees | Impact on labour market | | Structure | Organizational charts | Indicates how business is organized and how important managerial jobs are. | ### Total compensation - cash forms used - non-cash forms used | | Advantages | Disadvantages | | ---------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- | | **Base pay** | Tells how competitors are valuing the work in similar jobs. | Fails to include performance incentives and other forms, so will not give true picture if competitors offer low base but high incentives. | | **Total cash** | Tells how competitors are valuing work; also tells the cash pay for performance opportunity in the job. | Not all employees may receive incentives, so it may overstate the competitors’ pay; plus, it does not include long-term incentives. | | **Total compensation (base + bonus + stock options + benefits)** | Tells the total value competitors place on this work. | All employees may not receive all the forms. Don’t set base pay equal to competitors’ total compensation. 
| ### incumbent & jobs | Basic Elements | Examples | Rationale | | -------------- | ---------------------------------------------------------------------------------------- | ----------------------------------------------------- | | Date | Date survey data in effect | Need to update rates to current date | | Job | Match generic job description | Indicates degree of similarity with survey’s key jobs | | Individual | Number of employees supervised and reporting levels | Describes scope of responsibilities | | | Years since degree, education, date of hire | Indicates training and tenure of incumbents | | Pay | Actual rates paid to each individual, total earnings, last increase, bonuses, incentives | | ### hr outcomes. | Basic Elements | Examples | Rationale | | ------------------ | -------------------------------------------------------------------------------------- | ----------------------------------------------- | | Productivity | Revenues to employee ratio, revenues to labour costs ratio | Reflect organization performance and efficiency | | Total labour costs | Number of employees x (average wages and benefits) | Major expense | | Attraction | Yield ratio, number accepting offer to number of job offers ratio | Reveals recruiting success | | Retention | Turnover rate; number of high or low performers who leave to number of employees ratio | Reveals outflow of people | | Employee views | Total pay satisfaction | Reveals what employees think about their pay | ## market pay line > links a company’s benchmark jobs on horizontal axis with market rates paid by competitors on the vertical axis. ### Internal structure and external market rates - pay-policy line - pay ranges #### pay-policy line > percent above or below market line intend to “lead”, “lag”, or “match” rate. > [!tip] Develop grades > > single grade will have same pay range #### pay range - midpoints where pay-policy line crosses centre of grade, minimum and maximum - larger ranges in managerial jobs reflect the greater opportunity for performance variants in the work - firm uses percentiles as maximum and minimums while other establish them separately. > pay disparity among candidates. 1. Internal pressures - recognize performance pay difference with pay - expectations pay over time 2. External pressures - difference in quality among individuals - difference in productivity or value variations - mix of pay forms #### range overlap Overlap ought to be large enough to induce employees to seek promotions. ## Broadbanding > collapse salary grades into a few broad bands, each with a minimum and maximum - flexibility - career growth --- ## Employee Benefits - Flexible hours - WFH: 45% of employees love their jobs (according to Forbes) - Vacation time and PTO: No timeout more prone to burnt out - Pay parental leave > part of compensation package, other than pay for time worked. Growth in Employee Benefits - Cost effectiveness of Benefits - Union - Employer impetus - Government Impetus ## issues. - ensure external competitiveness - adequacy of benefits - Who should be protected? - How much choice should employees have among an array of benefits? - How should benefits be financed? > [!question] Question > > How does external equity differ when pay versus benefits? - Pay is quantifiable regarding monetary values, whereas benefits are objective in terms of equity. 
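The pay-range mechanics above (midpoints read off the pay-policy line, minimums and maximums set by a range spread, and a deliberate amount of overlap between adjacent grades) can be made concrete with a small sketch. The grade midpoints, the 50% spread, and the helper names below are illustrative assumptions, not figures from the course material.

```python
# A minimal sketch of pay-grade ranges built around a market pay-policy line.
# Grade midpoints and the range spread are hypothetical illustration values.

def pay_range(midpoint: float, spread: float = 0.5) -> tuple[float, float]:
    """Min/max of a grade, using spread = (max - min) / min and midpoint = (min + max) / 2."""
    minimum = midpoint / (1 + spread / 2)
    maximum = minimum * (1 + spread)
    return minimum, maximum


def overlap(low_grade: tuple[float, float], high_grade: tuple[float, float]) -> float:
    """Fraction of the lower grade's range that is shared with the next grade up."""
    low_min, low_max = low_grade
    high_min, _ = high_grade
    shared = max(0.0, low_max - high_min)
    return shared / (low_max - low_min)


# Hypothetical midpoints read off a pay-policy line for three benchmark grades.
midpoints = {"Grade 1": 48_000, "Grade 2": 58_000, "Grade 3": 70_000}
ranges = {grade: pay_range(mid) for grade, mid in midpoints.items()}

for grade, (lo, hi) in ranges.items():
    print(f"{grade}: min {lo:,.0f}  mid {midpoints[grade]:,}  max {hi:,.0f}")

print("Grade 1/2 overlap:", round(overlap(ranges["Grade 1"], ranges["Grade 2"]), 2))
```

Wider spreads at managerial grades and enough overlap to make promotions attractive (but not meaningless) are exactly the design levers the note describes.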
--- slug: thoughts/university/twenty-three-twenty-four/commerce-4be3/Final-report tags: - commerce4be3 description: "a Uber compensation policies case study" title: "Uber compensation analysis" date: 2024-03-20 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Final-report.html.md --- --- ## meeting minutes ### 2024-03-20 Evan is missing, everyone else is present Part 1: Company introduction (Evan) Part 2: Identification of issues/problem statement (Vanessa, Josh) Part 3: Analysis of the current compensation system Part 4: Proposals for compensation package and performance criteria (Aaron, Imran) Part 5: Implementation and details of improvement, suggestions Part 6: Conclusion, recommendations --- To establish pay transparency, Uber should disclose to drivers how their pay is calculated, including the commission Uber takes from each fare, typically around 25% (Zinkula. 2024). Uber can provide a detailed breakdown in the driver app and weekly pay statements showing the passenger fare, Uber’s take rate, and the driver payout for each trip. Uber should also publish its average take rates and driver earnings by the city to provide greater transparency and allow drivers to make informed decisions. Uber should ensure drivers do not operate at a loss after accounting for expenses like fuel, insurance, and vehicle maintenance, which is \$0.32 per mile (Zoepf, 2018). Uber should guarantee drivers a minimum hourly earnings rate after accounting for expenses or a minimum rate card per mile and per minute to implement minimum earning guarantees. This will require extensive research on different car models and fuel consumption, as well as constructing statistical models to predict the expected costs for each driver accurately. This will provide greater financial security and help compensate drivers fairly for their time and costs. Uber should reward high-performing drivers with incentives based on metrics like trips completed and utilization rate, in addition to current perks provided through Uber Pro (Uber 2024). Uber should also consider tenure-based increases (an example is their proposed Upfront Driver Pay), such as raising driver rates by 2-3% for each year of service (Sherman, 2024). This will help retain experienced drivers and demonstrate that Uber values its long-term driver partners. Lastly, Uber should expand its driver rewards program, Uber Pro, which offers vehicle maintenance discounts based on points earned for trips (Mishel, 2024). Kessler (2020) reported that while Uber has provided some sick pay and other financial assistance to drivers, but many say it is insufficient during the pandemic. Drivers are classified as independent contractors, lacking benefits like health insurance and paid time off. However, these benefits are still limited compared to employee benefits packages. To provide more security for drivers, Uber should look into offering occupational accident insurance, disability payments, and subsidized health insurance in more markets. 
Uber benefits drivers in European cities like London and Paris (SERUpractice) --- slug: thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-description-exercise tags: - commerce4be3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-description-exercise" title: "exercise" date: 2024-01-31 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-description-exercise.html.md --- 4BE3 Job Description Assignment Factor 1 – Knowledge/Education/Training We agree with the Level 3 decision because the types of duties required of the Customer Service Representative are quite heavy and a person who is still in high school may not fully be able to grasp these ideas or fully understand how to do them. Factor 2 – Skill Gained by Experience We agree with the Level 3 decision (over 3 months and including 6 months) because these tasks may take a bit more time to learn and require more training and support with them. Employee probation is usually done after 3 months so they should be able to do these tasks on their own by that point with minimal support. Factor 3 – Responsibility for Decisions and Skill in Operations We think that Factor 3 should be increased to a Level 4 because the employee has decision-making authority over refunds and payments, and also needs to pay close attention to inventory and ordering to ensure everything is ordered in a timely manner. They need to also manage multiple programs. Factor 4 – Responsibility for Ingenuity and Creativity We agree with the Level 2 decision because the role does not require too much creativity other than the occasional thinking on the spot and coming up with solutions, but these solutions will not be implemented company wide. --- slug: thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-based-Pay-structures-and-Job-Evaluation tags: - commerce4be3 - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-based-Pay-structures-and-Job-Evaluation" title: "Job-based Pay structures and Job Evaluation" date: 2024-01-31 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-based-Pay-structures-and-Job-Evaluation.html.md --- See [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-based-Pay-structures-and-Job-Evaluation/../../../../../../../../thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-based-Pay-Structures-and-Job-Evaluation.pdf) ## Speakers - ask better questions - need to know what you want the compensation philosophy - what is the purpose of the compensation plan? - are we still doing this correct? - questions this philosophy? - ask hard problems, questions - internet is not a good salary source - Good communication and understand what your compensation philosophy is - Keys: - Jobs and hierarchy they brings - skills and competencies they offer - Documents and version control it. - Subjective and Bias builtin - Research the roles within the jobs. 
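Factor-level decisions like the ones above are typically rolled up into a single job score by weighting each compensable factor and converting degrees into points (the point method discussed below). A minimal sketch, with entirely hypothetical weights and a hypothetical 100-points-per-degree scale:

```python
# Minimal point-method sketch: factor levels -> weighted job points.
# Factor weights and the points-per-level scale are hypothetical, for illustration.

FACTOR_WEIGHTS = {                       # hypothetical relative importance, sums to 1.0
    "knowledge/education": 0.35,
    "experience": 0.25,
    "responsibility for decisions": 0.30,
    "ingenuity/creativity": 0.10,
}
POINTS_PER_LEVEL = 100                   # hypothetical: each degree on a factor scale = 100 points

# Levels agreed on for the Customer Service Representative exercise above.
csr_levels = {
    "knowledge/education": 3,
    "experience": 3,
    "responsibility for decisions": 4,   # argued up from Level 3
    "ingenuity/creativity": 2,
}


def job_points(levels: dict[str, int]) -> float:
    """Weighted sum of factor degrees; the resulting ordering slots jobs into a structure."""
    return sum(FACTOR_WEIGHTS[factor] * level * POINTS_PER_LEVEL for factor, level in levels.items())


print(f"CSR job score: {job_points(csr_levels):.0f} points")
```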
## Job Evaluation > systematically determining the relative worth of jobs to create job structure within an organisation > based on combination of job content, skills, values, organisation culture, external market Decision: - purpose - single or multiple plans - among alternative approaches - involvement of relevant stakeholders - evaluate usefulness ## establish purpose aligned if - supports organisation strategy - supports workflow - fair to employees - motivates behaviour toward organisation objectives. ## single vs. multiple plans - evaluation plans for different types of workflow - number of job evaluation plans _depends_ on how detailed it needs to be to make pay decisions ## choices of job evaluation ### simple ranking. - order from highest to lowest based on relative values - advantages: simple, fast, easy to understand and explain to employees; least expensive initially. - disadvantages: - ranking criteria is poorly defined → evaluations become biased - evaluators must be knowledgeable about all jobs - results are difficult to defend and costly solutions maybe be required. | alternatives methods | description | | -------------------- | ---------------------------------------------------------------------------------------------------- | | Alternation | order descriptions alternately at each extreme, evaluators agree on which jobs are the most valuable | | Paired comparison | compare each job with every other job, number of comparisons = n(n-1)/2 | ### classification - series of classes cover the range of jobs - descriptions are labels which capture general nature of work ## point method - assignment of numeric score - procedure results in a relative ordering of jobs based on the number of points that each job “scores”. ### 1. Job Analysis - representative benchmark jobs is drawn for analysis ### 2. Determine Compensable Factors - based on strategy and values of organisation - based on the work performed - acceptable to stakeholders affected by the resulting pay structure challenges: - small numbers - unique criteria ### 3. Scale the Factors - 4 to 8 degrees ### 4. Weigh the Factors - of important - weights reflect the relative importance of each Factors - determined through an advisory committee (a priori judgement approach) ### 5 & 6. Communicate. Who? - employees - consultants - union representatives ### Design process and Job structures provides a hierarchy of work, or a job structure [exercise](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-based-Pay-structures-and-Job-Evaluation/../../../../../../../../thoughts/university/twenty-three-twenty-four/commerce-4be3/Job-description-exercise) ## Skill-based plans - in the trade - link pay to the depth or breath of skills - pay individuals for all relevant skills → wage attach to person ### types. 1. depth - based on knowledge of the person 2. 
generalist/breadth - increased by acquiring new knowledge > [!tip] Purpose of skill-based > > supports organisation’s strategy, workflow, fair to employees, and motivates behaviour ### Outcomes - well accepted by employees and provide strong motivation for individuals to increase new skills - become increasingly expensive - flexibility permits leaner staff - success is determined by how well it aligns ## Competency-based plans - faireness and motivations - skill-based, foundation for successful work - core competencies are often linked to the mission statement - competency set translates core → action - indicators are observable behaviour ### Analysis - core competencies are not unique for each company - differs applies their competencies - verify their possession of that competency - no objective way to certifying competencies - relatively few levels and wide differentials for increased levels ## internal alignment reflected in structures - purposed of job and person-based procedures is to design and manage a pay structure. ## reliability and validity - consultants - improve reliability by using evaluators familiar with the work and trained in job evaluation. - validity refers to degree the evaluation assesses relative job worth. ## Acceptability - formal appeals process → request and re-analysis or skills re-evaluation. - employee attitude surveys assess perceptions of how useful evaluation is as a management tools. ## bias. To ensure bias-free evaluation: - compensable factors and scales to include the content of jobs - factor weights are consistently biased against jobs - apply the plan as bias free a manner as feasible. --- slug: thoughts/university/twenty-three-twenty-four/commerce-4be3/Pay-model tags: - commerce4be3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Pay-model" title: "Pay model" date: 2024-01-10 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Pay-model.html.md --- see also: [Slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/Pay-model/../../../../../../../../thoughts/university/twenty-three-twenty-four/commerce-4be3/The-Pay-Model.pdf) ## compensation. > refer to all forms of financial returns and tangible services and benefits receive as part of an employment relationship 1. societal - pay and benefits as measure of justice - job losses or gains in a country is a function of labor costs 2. stockholders - ESA: employment options plan and stock purchase plan, ISO - executive pays: VPs, higher up. - performance measures 3. managers - major expense that must be managed - major determinant of employee attitudes and behaviours 4. employees - financial freedom - exchange of good - incentive to work a job, and have a reward for having done so. Merit payment: > Total Rewards: RRSP: 401k Health spending account: employment security: union membership > Social capital Employee value proposition Psychological safety: without having retaliation and being safe at work environment. ## total reward. 
### total compensation - include cash payments (IA, CPP) Cash compensation: - Base pay: Job evaluation - merit increases are increments - COLA (cost of living adjustment) - incentives (bonuses) Benefits - health insurance - pension: retirement and saving - allowances ### relational returns > Non-financial returns that substantially impact employee behaviour, such as employment security and learning and developmental opportunities - psychological returns - recognition and status ## pay model. ```mermaid graph LR SP{{Strategic polcies}} --> T{{Techniques}} --> SO{{Strategic objectives}} ``` collective bargaining - objectives - policies that form the foundation of compensation - techniques that make up compensation system. ### internal alignment - comparisons among jobs and skill levels within organization - pertains pay rates both for employees - Pay relationship affect compensations objectives ### external competitiveness - pay comparisons with competitors externally - **market driven** - objectives: - ensure pay is sufficient to attract - control labor cost to ensure competitive pricing of product. ### employee contributions - how employees are rewarded - bases for performance-based evaluations, perceive pay as fair. ### management - right people get the right pay for achieving the right objectives the right way. ## pay techniques - four basic policies - tools and mechanism that are used to achieve objectives. Gender inequality [article](https://web.archive.org/web/20230602214140/https://www.theglobeandmail.com/business/careers/article-not-a-single-large-public-canadian-firm-has-closed-the-gender-pay-gap/) --- slug: thoughts/university/twenty-three-twenty-four/commerce-4be3/index tags: - university - commerce4be3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/index" title: "Compensation" date: 2024-10-29 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/commerce-4be3/index.html.md --- --- slug: thoughts/university/twenty-three-twenty-four/compsci-4x03/A1 tags: - swfr4x03 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A1" title: "Floating points error, Taylor series, and approximation" date: 2023-09-25 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A1.html.md --- **Problem 1 \[5 points]** Consider solving the scalar equation $ax = b$, for given a and b and assume that you have computed $\hat{x}$. To measure the quality of $\hat{x}$, we can compute the residual $r = b − a\hat{x}$. Derive the error in $fl(r)$, that is the relative error in the floating point representation of $r$. Can it be large? Explain. 
_Answer_: Given $r = b - a\hat{x}$, - Let $fl(a)$ is the floating point representation of $a$ - Let $fl(b)$ be the floating point representation of $b$ - Let $fl(\hat{x})$ be the floating point representation of $\hat{x}$ Assuming relative error of $fl(\hat{x})$ is $\delta_{\hat{x}}$ ⇒ $fl(\hat{x}) = \hat{x}_{true}(1+\delta_{\hat{x}})$ Therefore: $a*\hat{x}=a*\hat{x}_{true}(1+\delta_{\hat{x}})$ Assuming relative error of $fl(a\hat{x})$ is $\delta_{m}$ ⇒ $fl(a\hat{x}) = a*\hat{x}_{true}(1+\delta_{\hat{x}})(1+\delta_{m})$ Computed residual $r = b - a*\hat{x}_{true}(1+\delta_{\hat{x}})$ Assuming relative error of $fl(b-a\hat{x})$ is $\delta_{s}$ ⇒ $fl(b-a\hat{x}) = b - a*\hat{x}_{true}(1+\delta_{\hat{x}})(1+\delta_{m})(1+\delta_{s})$ Thus, the error in $fl(r)$ is $\delta_{r} = (1+\delta_{\hat{x}})(1+\delta_{m})(1+\delta_{s}) - 1$ > The error can be large if: > > - the relative error of $\hat{x}$ is large > - significant rounding error in multiplication and subtraction (otherwise $\delta_m$ and $\delta_s$ is large) > - value of $a$ and $b$ such that $b - a\hat{x}$ introduces “catastrophic cancellation”, or $b \approx a\hat{x}$ --- **Problem 2 \[2 points]** Explain the output of the following code ```matlab clear all; x = 10/9; for i=1:20 x = 10*(x-1); end x ``` Is the result accurate? _Answer_: The following includes steps for the above MATLAB code: 1. `clear all` clears all variables in current workspace 2. `x = 10/9` initialise the first value of $x$ to $\frac{10}{9}$ 3. The `for` loop runs for 20 times, where it updates $x$ using the following formula $x:=10*(x-1)$ 4. Finally, `x` prints out the value of `x` into the MATLAB terminal window. The output of the code is not correct, due to floating point errors. Machine epsilon $\epsilon_{mach}$ by default in MATLAB (which is in double precision) is approx. $2.2204e-16$ Since $x$ is a floating point, every iteration in the `for` loop will include a floating point error, and thus after 20 iterations, the results won’t be accurate to its mathematical value. --- **Problem 3 \[3 points]** Suppose you approximate $e^x$ by its truncated Taylor series. For given $x = 0.1$, derive the smallest number of terms of the series needed to achieve accuracy of $10^{−8}$ . Write a Matlab program to check that your approximation is accurate up to $10^{−8}$. Name your program `check_exp.m`. _Answer_: Taylor series of real or complex $f$ at $c$ is defined by $f(x) = \sum^{\inf}_{k=0}\frac{f^{(k)}(c)}{k!}(x-c)^k$ Given $f$ has $n+1$ continuous derivative $[a, b]$, or $f \in C^{n+1}[a, b]$ , then the truncated Taylor series can be defined as $f(x) = \sum^{\inf}_{k=0}\frac{f^{(k)}(c)}{k!}(x-c)^k + E_{n+1}$ where $E_{n+1} = \frac{f^{n+1}(\xi(c, x))}{(n+1)!}(x-c)^{n+1} = \frac{f^{n+1}(\xi)}{(n+1)!}(x-c)^{n+1}$ Hence, with $x := x+h$ we have $f(x+h) = \sum^{\inf}_{k}\frac{f^{(k)}(x)}{k!}(h)^k + E_{n+1}$ where $E_{n+1} = \frac{f^{n+1}(\xi)}{(n+1)!}h^{n+1}$ and $\xi$ is between $x$ and $x+h$ Thus, we need to find $n$ terms such that $| E_{n+1} = \frac{e^x(\xi)}{(n+1)!}x^{n+1} | \le 10^{-8}$ with $\xi$ between 0 and $x$ With $x=0.1$, then $e^0.1 \approx 1.1052$. 
$E_{n+1} = \frac{e^{\xi}}{(n+1)!}x^{n+1} = \frac{1.1052}{(n+1)!}0.1^{n+1} \le 10^{-8} \rightleftharpoons \frac{0.1^{n+1}}{(n+1)!} \le 9.0481e-09$ From the above function, with $n=6$ the Taylor Series will be accurate up to $10^{-8}$ The below is the Matlab to examine the above terms: ```matlab title="check_exp.m" function check_exp() x = 0.1; % Approximation for the first 6 terms of the Taylor series approx = 1 + x + x^2/factorial(2) + x^3/factorial(3) + x^4/factorial(4) + x^5/factorial(5); actual = exp(x); error = abs(approx - actual); % Display the results fprintf('Approximated value: %f\n', approx); fprintf('Actual value: %f\n', actual); fprintf('Error: %e\n', error); % Check if the error is less than 10^-8 if error < 10^-8 disp('The approximation is accurate up to 10^-8.'); else disp('The approximation is NOT accurate up to 10^-8.'); end end ``` --- **Problem 4 \[3 points]** The sine function has the Taylor series expansion $sin(x) = x − \frac{x^3}{3!} + \frac{x^5}{5!} − \frac{x^7}{7!} + · · · +$ Suppose we approximate $sin(x)$ by $x − \frac{x^3}{3!} + \frac{x^5}{5!}$. What are the absolute and relative errors in this approximation for $x = 0.1, 0.5, 1.0$? Write a Matlab program to produce these errors; name it `sin_approx.m`. _Answer_: Assuming $y=sin(x)$ as exact value and $\tilde{y}$ is the approximate value of $sin(x)$, which is $\tilde{y} = x − \frac{x^3}{3!} + \frac{x^5}{5!}$ - Absolute error is given by $|y - \tilde{y}|$ - Relative error is given by $\frac{|y-\tilde{y}|}{y}$ For the following $x \in {0.1, 0.5, 1.0}$, the following table represents the error: | Error | $x=0.1$ | $x=0.5$ | $x=1.0$ | | -------- | ------------ | ------------ | ------------ | | Absolute | 1.983852e-11 | 1.544729e-06 | 1.956819e-04 | | Relative | 1.987162e-10 | 3.222042e-06 | 2.325474e-04 | ```matlab title="sin_approx.m" function sin_approx() % Define the values of x x_values = [0.1, 0.5, 1.0]; % Loop through each value of x to compute the errors for i = 1:length(x_values) x = x_values(i); % Calculate the approximation approx = x - x^3/factorial(3) + x^5/factorial(5); % Calculate the actual value of sin(x) actual = sin(x); % Calculate the absolute error abs_error = abs(approx - actual); % Calculate the relative error rel_error = abs_error / abs(actual); % Display the results for each x fprintf('For x = %f:\n', x); fprintf('Approximated value: %f\n', approx); fprintf('Actual value: %f\n', actual); fprintf('Absolute Error: %e\n', abs_error); fprintf('Relative Error: %e\n\n', rel_error); end end ``` --- **Problem 5 \[2 points]** How many terms are needed in the series $arccot(x) = \frac{π}{2} − x + \frac{x^3}{3} − \frac{x^5}{5} + \frac{x^7}{7} + · · ·$ to compute $arccot(x)$ for $|x| \le 0.5$ accurate to 12 decimal places. 
_Answer_: To calculate $arccot(x)$ for $|x| \le 0.5$ accurate to 12 decimal places, we need to find $n$ such that $|E_{n+1}| < 10^{-12}$ Substitute for error term, needs to find $n$ such that $|\frac{f^{n+1}(\xi)}{(n+1)!}h^{n+1}| < 10^{-12}$ We know that the general term for Taylor series of $arccot(x)$ is $a_n = \frac{(-1)^nx^{2n+1}}{2n+1}$ Since we are considering on interval $|x| \le 0.5$, and `arccot(x)` is an alternating series, the largest possible value of the error term will occur when $x=0.5$ Thus, the equation to solve for $n$ term is $|\frac{(-1)^{n+1}*x^{2n+1}}{(2n+1)*(n+1)!}| < 10^{-12} \rightleftharpoons \frac{x^{2n+1}}{(2n+1)*(n+1)!} < 10^{-12}$ Using the following function `find_nth_term`, we can find that when $n=17$ will ensure the $arccot(x)$ for $|x| \le 0.5$ to be accurate to 12 decimal places. ```python import math def find_nth_terms(x: float, eps: float = 1e-12): n = 0 term = x while abs(term) >= eps: n += 1 term = math.pow(-1, n) * math.pow(x, 2 * n + 1) / (2 * n + 1) return n find_nth_terms(0.5) ``` --- **Problem 6 \[2 points]** Consider the expression $1024 + x$. Derive for what values of $x$ this expression evaluates to 1024. _Answer_: In IEEE 754 double precision, $\epsilon_{mach} = 2^{-52} \approx 2.2*10^{−16}$ From the definition of machine epsilon ($1024 + \epsilon_{mach} > 1024$), the difference between $N$ and the next representable numbers is proportional to $N$, that is $N*\epsilon_{mach}$ Thus the problem implies there is such $x$ that exists within a range such that $x < \frac{1}{2}*\epsilon_{mach}*N$ Substitute value for $N=1024$ and $\epsilon_{mach} \approx 2.2*10^{−16}$ ⇒ $x < \frac{1}{2}*2.2*10^{-16}*1024 \approx 1.1368448×10^{−13}$ > $\forall x \lessapprox 1.1368448×10^{−13} \rightarrow (1024 + x) \: \text{evaluates} \: 1024$ --- **Problem 7 \[2 points]** Give an example in base-10 floating-point arithmetic when a. $(a + b) + c \neq a + (b + c)$ b. $(a ∗ b) ∗ c \neq a ∗ (b ∗ c)$ _Answer_: For the first example $(a + b) + c \neq a + (b + c)$, assuming using double precision: Let: - $a=1.0$ - $b=1.0*10^{-16}$ - $c=-1.0$ ⇒ $(a+b)+c = 0$, whereas $a+(b+c) = 1.11022*10^{-16}$ The explanation from _Problem 6_ can be used to explain that $(a+b) = a$ since $b < 1.1368448×10^{−13}$, therefore $(a+b)+c=0$, whereas in $a+(b+c) \approx 1.0 - 0.999999999 \approx 1.11022*10^{-16}$ due to round up for floating point arithmetic. For the second example $(a ∗ b) ∗ c \neq a ∗ (b ∗ c)$, assuming the following $FP$ system $(10, 3, L, U)$ where $x=\pm{d_0.d_1d_2}*10^e, d_0 \neq 0, e \in [L, U]$ Let: - $a=1.23$ - $b=4.56$ - $c=7.89$ ⇒ $(a*b)*c=44.3$ ($a*b=5.61$ rounded and $5.61*c=44.3$), whereas $a*(b*c)=44.2$ ($b*c=35.9$ rounded and $35.9*a = 44.2$) --- **Problem 8 \[8 points]** Consider a binary floating-point (FP) system with normalised FP numbers and 8 binary digits after the binary point: $x=\pm{1.d_1d_2 · · · d_8 × 2^e}$ For this problem, assume that we do not have a restriction on the exponent $e$. Name this system B8. (a) \[2 points] What is the value (in decimal) of the unit roundoff in B8? (b) (1 point) What is the next binary number after $1.10011001$? (c) \[5 points] The binary representation of the decimal $0.1$ is infinite: $0.00011001100110011001100110011 · · ·$. Assume it is rounded to the nearest FP number in B8. What is this number (in binary)? _Answer_: B8 system can also be defined as $FP(2, 8, L, U)$ (a). 
For a binary FP system with $p$ binary digits after binary point, the unit roundoff $u$ is given by $u=2^{-p}$ With $t=8$, unit roundoff for this system in decimal is $u = 2^{-8} = 0.00390625$ (b). Given $u=2^{-8}=0.00000001$ in binary, the next binary number can be calculated as: ``` 1.10011001 + 0.00000001 = 1.10011010 ``` (c). first 9 digits after the binary point to determine how to round: 0.000110011 Given the unit roundoff is $2^{-8}$ and 9th digit is 1 (or $2^{-9}$) → round up Therefore, 0.1 rounded to nearest FP system in B8 is $0.00011010$ in binary --- **Problem 9 \[10 points]** For a scalar function $f$ consider the derivative approximations $f^{'}(x) \approx g_1(x, h) := \frac{f(x + 2h) − f(x)}{2h}$ and $f^{'}(x) \approx g_2(x, h) := \frac{f(x + h) − f(x − h)}{2h}$ . a. \[4 points] Let $f(x) = e^{sin(x)}$ and $x_0 = \frac{\pi}{4}$. - Write a Matlab program that computes the errors $|f ′(x_0)−g1(x_0, h)|$ and $|f′(x_0)−g_2(x_0, h)|$ for each $h = 10^{−k}, k = 1, 1.5, 2, 2.5 . . . , 16$. - Using `loglog`, plot on the same plot these errors versus $h$. Name your program `derivative_approx.m`. For each of these approximations: b. \[4 points] Derive the value of $h$ for which the error is the smallest. c. \[2 points] What is the smallest error and for what value of $h$ is achieved? How does this value compare to the theoretically “optimum” value? _Answer_: (a). ```matlab title="derivative_approx.m" function derivative_approx() % Define the function f and its derivative f = @(x) exp(sin(x)); df = @(x) cos(x) * exp(sin(x)); % Define the approximation functions g1 and g2 g1 = @(x, h) (f(x + 2*h) - f(x)) / (2*h); g2 = @(x, h) (f(x + h) - f(x - h)) / (2*h); % Define x0 x0 = pi/4; % Define k values and compute h values k_values = 1:0.5:16; h_values = 10.^(-k_values); % Initialize error arrays errors_g1 = zeros(size(h_values)); errors_g2 = zeros(size(h_values)); % Compute errors for each h_value for i = 1:length(h_values) h = h_values(i); errors_g1(i) = abs(df(x0) - g1(x0, h)); errors_g2(i) = abs(df(x0) - g2(x0, h)); end % Find the h value for which the error is the smallest for each approximation [~, idx_min_error_g1] = min(errors_g1); [~, idx_min_error_g2] = min(errors_g2); h_min_error_g1 = h_values(idx_min_error_g1); h_min_error_g2 = h_values(idx_min_error_g2); % Display the h values for the smallest errors fprintf('For g1, the smallest error is at h = %e\n', h_min_error_g1); fprintf('For g2, the smallest error is at h = %e\n', h_min_error_g2); % Plot errors using loglog loglog(h_values, errors_g1, '-o', 'DisplayName', '|f''(x_0) - g_1(x_0, h)|'); hold on; loglog(h_values, errors_g2, '-x', 'DisplayName', '|f''(x_0) - g_2(x_0, h)|'); hold off; % Add labels, title, and legend xlabel('h'); ylabel('Error'); title('Errors in Derivative Approximations'); legend; grid on; end ``` ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A1/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/derivative-approx.svg) (b). 
The Taylor’s series expansion of function $f(x)$ around point $a$ is: $f(x) = \sum_{n=0}^{\inf}{\frac{f^{(n)}(a)}{n!}(x-a)^n} = f(a) + f^{'}(a)(x-a) + \frac{f^{''}(a)}{2!}(x-a)^2 + \frac{f^{'''}(a)}{3!}(x-a)^3 + ...$ For the first approximation $g_1(x, h)$, with Taylor series expansion: $f(x+2h) = f(x) + 2hf^{'}(x) + (2h)^2\frac{f^{''}(x)}{2!}$ for $x \leq \xi \leq x + 2h$ $\rightarrow g_1(x, h) = f^{'}(x) + (2h){f^{''}(\xi)}$ for $x \leq \xi \leq x + 2h$ Hence the error term is $2hf^{''}(\xi)$ ⇒ $h=2*\sqrt{\epsilon_{mach}}*\frac{1}{\sqrt{e^{sin(x)}cos(x)^2−e^{sin(x)}sin(x)}} = \frac{2\sqrt{\epsilon_{mach}}}{\sqrt{\frac{e^{\frac{1}{\sqrt{2}}}}{2} - \frac{e^{\frac{1}{\sqrt{2}}}}{\sqrt{2}}}}$ For the second approximation $g_2(x, h)$: the error term is $-\frac{1}{6}h^2f^{'''}(x)$ (c). For $g_1$, the smallest error is at h = 1.000000e-08 For $g_2$, the smallest error is at h = 3.162278e-06 --- **Problem 10 \[7 points]** In the Patriot disaster example, the decimal value 0.1 was converted to a single precision number with chopping. Suppose that it is converted to a double precision number with chopping. (a). \[5 points] What is the error in this double precision representation of 0.1. (b). \[2 points] What is the error in the computed time after 100 hours? _Answer_: (a). Given the binary representation of $0.1$ in double precision: - Sign: $0$ - Exponent: $0111111101101111111011$, which is 1019 in decimal ⇒ effective exponent is $1029-1023=-4$ - Significand: $10011001100110011001100110011001100110011001100110101001100110011001100110011001100110011001100110011010$ the binary digits will be chopped off at 52 bit. Therefore, $\epsilon_{mach} = 2^{-52}$ and thus $\text{roundoff error} = \frac{1}{2}\epsilon_{mach} = 2^{-53} \approx 1.11×10^{−16}$ (b). After 100 hours: $100 × 60 × 60 × 10 × 1.11 × 10^{−16} \approx 3.996×10^{−10} sec$ --- slug: thoughts/university/twenty-three-twenty-four/compsci-4x03/A2 tags: - swfr4x03 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2" title: "Gaussian elimination, LU decompositions, and errors LS solving" date: 2023-10-24 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2.html.md --- **Problem 1 \[8 points]** Consider the system $Ax = b$, where $A=\begin{bmatrix} 0.1 & 0.3 & 0.9\\ 0.3 & 0.9 & 2.7\\ 0.6 & 0.7 & 0.1 \end{bmatrix}$ and $b = \begin{bmatrix} 1.3 & 3.9 & 1.4\end{bmatrix}^T$ a. \[2 points] Show that $A$ is singular. b. \[2 points] If we were to use Gaussian elimination with partial pivoting to solve this system using exact arithmetic, show where the process fails. c. \[2 points] Solve this system in double precision using partial pivoting. Do not use Matlab’s functions. What is the solution that you obtain? d. \[2 points] Matlab’s `A\b` produces `NaN -Inf Inf` as a solution. Explain why NaN, -Inf and Inf. _Answer_: a. 
_For $A$ to be singular, prove $det(A) = 0$_ _Using Gaussian elimination without partial pivoting_ $$ \begin{aligned} A|b &= \begin{bmatrix} 0.1 & 0.3 & 0.9 & | & 1.3\\ 0.3 & 0.9 & 2.7 & | & 3.9\\ 0.6 & 0.7 & 0.1 & | & 1.4 \end{bmatrix} \\\ R_{2} - R_{1} \rightarrow A|b &= \begin{bmatrix} 0.1 & 0.3 & 0.9 & | & 1.3\\ 0.2 & 0.6 & 1.8 & | & 2.6\\ 0.6 & 0.7 & 0.1 & | & 1.4 \end{bmatrix} \\\ R_{3} - 3*R_{1} \rightarrow A|b &= \begin{bmatrix} 0.1 & 0.3 & 0.9 & | & 1.3\\ 0.2 & 0.6 & 1.8 & | & 2.6\\ 0.3 & -0.2 & -2.6 & | & -2.5 \end{bmatrix} \\\ R_3 - \frac{1}{2}*R_2 \rightarrow A|b &= \begin{bmatrix} 0.1 & 0.3 & 0.9 & | & 1.3\\ 0.2 & 0.6 & 1.8 & | & 2.6\\ 0.2 & -0.5 & -3.5 & | & -3.8 \end{bmatrix} \\\ \text{Thus } \rightarrow A|b \leftarrow &\begin{bmatrix} 0.1 & 0.3 & 0.9 & | & 1.3\\ 0.2 & 0.6 & 1.8 & | & 2.6\\ 0.2 & -0.5 & -3.5 & | & -3.8 \end{bmatrix} \\\ & \\\ det(A) = a(ei−fh)−b(di−fg)+c(dh−eg), A &=\begin{bmatrix} a & b & c \\ d & e & f\\ g & h & i \end{bmatrix} \\\ & \\\ \rightarrow det(A) = 0.1*(-0.6*3.5+1.8*0.5) - & \\\ 0.3*(-0.2*3.5-1.8*0.2) + & \\\ 0.9*(-0.5*0.2-0.6*0.2) &= 0 \end{aligned} $$ > [!tip] Lemma > > **$A$ is singular** b. _With partial pivoting_: $$ \begin{align} A|b &=\begin{bmatrix} 0.1 & 0.3 & 0.9 & | & 1.3\\ 0.3 & 0.9 & 2.7 & | & 3.9\\ 0.6 & 0.7 & 0.1 & | & 1.4 \end{bmatrix} \\\ R3 \leftrightarrow R1 \leftarrow A|b&=\begin{bmatrix} 0.6 & 0.7 & 0.1 & | & 1.4\\ 0.3 & 0.9 & 2.7 & | & 3.9\\ 0.1 & 0.3 & 0.9 & | & 1.3 \end{bmatrix} \\\ R2 - \frac{1}{2}R1 \leftarrow A|b&=\begin{bmatrix} 0.6 & 0.7 & 0.1 & | & 1.4\\ 0 & 0.55 & 2.65 & | & 3.2\\ 0.1 & 0.3 & 0.9 & | & 1.3 \end{bmatrix} \\\ R3 - \frac{1}{6}R1 \leftarrow A|b&=\begin{bmatrix} 0.6 & 0.7 & 0.1 & | & 1.4\\ 0 & 0.55 & 2.65 & | & 3.2\\ 0 & 0.18333333 & 0.88333333 & | & 1.06666667 \end{bmatrix} \\\ R3 - \frac{1}{3}R2 \leftarrow A|b&=\begin{bmatrix} 0.6 & 0.7 & 0.1 & | & 1.4\\ 0 & 0.55 & 2.65 & | & 3.2\\ 0 & 0 & 0 & | & -0.3 \end{bmatrix} \end{align} $$ We notice that $R3-\frac{1}{3}R2 \rightarrow 0=-0.3$, thus invalid. c. _With partial pivoting in double precision_ The $LU$ decomposition of $A=\begin{bmatrix} 0.1 & 0.3 & 0.9\\ 0.3 & 0.9 & 2.7\\ 0.6 & 0.7 & 0.1 \end{bmatrix}$ The following portray steps to calculate $U$ _(lower triangular)_: $$ \begin{aligned} R_3 \leftrightarrow R_1 \rightarrow U &= \begin{bmatrix} 0.6 & 0.7 & 0.1\\ 0.3 & 0.9 & 2.7\\ 0.1 & 0.3 & 0.9 \end{bmatrix}, \quad P_1 = \begin{bmatrix} 0 & 0 & 1\\ 0 & 1 & 0\\ 1 & 0 & 0 \end{bmatrix} \\\ R_2 - \frac{1}{2}R_1 \rightarrow U &= \begin{bmatrix} 0.6 & 0.7 & 0.1\\ 0 & 0.55 & 2.6500000000000004\\ 0.1 & 0.3 & 0.9 \end{bmatrix} \\\ R_3 - \frac{1}{6}R_1 \rightarrow U &= \begin{bmatrix} 0.6 & 0.7 & 0.1\\ 0 & 0.55 & 2.6500000000000004\\ 0 & 0.18333333333333335 & 0.8833333333333333 \end{bmatrix} \\\ R_3 - \frac{1}{3}R_2 \rightarrow U &= \begin{bmatrix} 0.6 & 0.7 & 0.1\\ 0 & 0.55 & 2.6500000000000004\\ 0 & 0 & 4.8109664400423476 \times 10^{-17} \end{bmatrix} \end{aligned} $$ \_note: the $a_{33}$ is close to zero, hence consistent with previous finding\_ $L=\begin{bmatrix} 1 & 0 & 0\\ 0.5 & 1 & 0\\ 0.16666666666666669 & 0.33333333333333326 & 1 \end{bmatrix}$ To solve for $x$ with $LU$ decomposition, We solve $L(Ux)=Pb$ $\rightarrow x=\begin{bmatrix} 14.006993006993 & -10.48951048951048 & 3.3846153846153832\end{bmatrix}$ d. Since A is singular, it doesn’t have an inverse. 
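As a quick numerical cross-check of parts (a)-(c) (an added illustration, not part of the submitted Matlab solution; it assumes NumPy and SciPy are available), the determinant of $A$ is zero up to rounding and the last pivot of the partially pivoted $LU$ factorization is on the order of $10^{-17}$:

```python
# Hypothetical verification script, separate from the assignment's Matlab code.
import numpy as np
from scipy.linalg import lu

A = np.array([[0.1, 0.3, 0.9],
              [0.3, 0.9, 2.7],
              [0.6, 0.7, 0.1]])

print(np.linalg.det(A))   # ~1e-17 in magnitude: zero up to rounding, so A is singular
P, L, U = lu(A)           # LU with partial pivoting, as in part (c)
print(U[2, 2])            # ~4.8e-17: the near-zero last pivot
print(np.linalg.cond(A))  # ~1e17: effectively infinite condition number
```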
Matlab's `A\b` uses $LU$ decomposition with partial pivoting, and as we explored above the last pivot is zero up to rounding (on the order of $10^{-17}$), so Matlab also reports the matrix as singular or badly conditioned. The last row of the triangular system effectively reads $0 \cdot x_3 = \text{nonzero value}$: dividing a nonzero value by this (near-)zero pivot produces `Inf` or `-Inf` depending on the signs of the operands, and the remaining back-substitution steps combine these infinities (e.g. `Inf - Inf`, `0*Inf`), which produces the `NaN` component.

---

**Problem 2 \[2 points]** Apply Gaussian elimination with partial pivoting on the following matrix

$A=\begin{bmatrix} 1 & 0 & 0 & 0 & 1\\ -1 & 1 & 0 & 0 & 1\\ -1 & -1 & 1 & 0 & 1\\ -1 & -1 & -1 & 1 & 1\\ -1 & -1 & -1 & -1 & 1 \end{bmatrix}$

Show all the steps.

_Answer_: At every stage the candidate pivots all have magnitude 1, so partial pivoting performs no row interchanges and the elimination proceeds as follows:

$A=\begin{bmatrix} 1 & 0 & 0 & 0 & 1\\ -1 & 1 & 0 & 0 & 1\\ -1 & -1 & 1 & 0 & 1\\ -1 & -1 & -1 & 1 & 1\\ -1 & -1 & -1 & -1 & 1 \end{bmatrix}$

$R2+R1 \text{ and } R3+R1 \text{ and } R4+R1 \text{ and } R5+R1 \rightarrow A=\begin{bmatrix} 1 & 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0 & 2\\ 0 & -1 & 1 & 0 & 2\\ 0 & -1 & -1 & 1 & 2\\ 0 & -1 & -1 & -1 & 2 \end{bmatrix}$

$R3+R2 \text{ and } R4+R2 \text{ and } R5+R2 \rightarrow A=\begin{bmatrix} 1 & 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0 & 2\\ 0 & 0 & 1 & 0 & 4\\ 0 & 0 & -1 & 1 & 4\\ 0 & 0 & -1 & -1 & 4 \end{bmatrix}$

$R4+R3 \text{ and } R5+R3 \rightarrow A=\begin{bmatrix} 1 & 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0 & 2\\ 0 & 0 & 1 & 0 & 4\\ 0 & 0 & 0 & 1 & 8\\ 0 & 0 & 0 & -1 & 8 \end{bmatrix}$

$R5+R4 \rightarrow A=\begin{bmatrix} 1 & 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0 & 2\\ 0 & 0 & 1 & 0 & 4\\ 0 & 0 & 0 & 1 & 8\\ 0 & 0 & 0 & 0 & 16 \end{bmatrix}$

---

**Problem 3 \[5 points]** (a) (3 points) Let $A$, $B$, and $C$ be $n × n$ matrices, where $B$ and $C$ are nonsingular. For an $n-$vector $b$, describe how you would implement the formula $x = C^{-1} (A + I)(A + B^{-1})b$ without computing any inverses. Here, $I$ is the $n × n$ identity matrix. (b) (2 points) What is the complexity of your approach in terms of big-O notation?

_Answer_: a. _Given $B$ and $C$ are non-singular_

1. Step 1: _compute the $LU$ decomposition of $B$, such that $B=LU$_
2. Step 2: Solve for $y$ in $By=b$ (so that $y=B^{-1}b$)
   1. solve for $u$ in $Lu=b$ via forward substitution
   2. solve for $y$ in $Uy=u$ via backward substitution
3. Step 3: Compute $z=(A+B^{-1})b$
   1. This becomes $z=Ab+y$
4. Step 4: Compute $w = (A+I)z$
   1. Via _matrix multiplication_ $\rightarrow w=Az + z$
5. Step 5: _compute the $LU$ decomposition of $C$, such that $C=LU$_
6. Step 6: Solve for $x$ in $Cx=w$ (so that $x=C^{-1}w$)
   1. Solve for $z'$ in $Lz'=w$ via forward substitution
   2. Solve for $x$ in $Ux=z'$ via backward substitution

Putting the steps together evaluates $x = C^{-1} (A + I)(A + B^{-1})b$ without forming any inverse.

b.
Complexity analysis Let `total_cost` be the big-O notation Step 1 _using $LU$ decomposition of $B$_ $\rightarrow \text{total\_cost}=O(n^3)$ Step 2 _solving each $Lz=b$ and $Uy=z$_ takes $O(n^2)$ each, thus solving $Lz=b$ using $LU$ decomposition takes $O(2n^2)$ $\rightarrow \text{total\_cost}=O(n^3) + O(2n^2)$ Step 3 _Compute $z=(A+B^{-1})b$_ - MatmulOp of $Ab$ is $O(n^2)$ - AddOp of $Ab+y$ is $O(n)$ - Total for this step $O(n^2) + O(n)$ $\rightarrow \text{total\_cost}=O(n^3) + O(3n^2) + O(n)$ Step 4 _Compute $w = (A+I)z$_ - MatmulOp of $Ab$ is $O(n^2)$ - AddOp of $Ab+y$ is $O(n)$ - Total for this step $O(n^2) + O(n)$ $\rightarrow \text{total\_cost}=O(n^3) + O(4n^2) + O(2n)$ Step 5 _using $LU$ decomposition of $C$_ $\rightarrow \text{total\_cost}=O(2n^3) + O(4n^2) + O(2n)$ Step 6 _solving each $Lz'=w$ and $Ux=z'$ using LU composition_ takes $O(2n^2)$ $\rightarrow \text{total\_cost}=O(2n^3) + O(6n^2) + O(2n)$ --- **Problem 4 \[6 points]** An $n × n$ Hilbert matrix, denote it by $H$, has entries $h_{ij} = \frac{1}{(i+j-1)}, i, j = 1, . . . , n.$ For $n = 2, 3, . . .$ , generate the Hilbert matrix of order $n$, and also generate the $n-$vector $b = Hx$, where $x$ is a random vector. Solve the resulting system $Hx = b$ to obtain an approximate solution $\hat{x}$. (See the functions `hilb` and `rand`.) (a) \[2 points] How large can you take $n$ before the error $\frac{\Vert{\hat{x} - x}\Vert}{\Vert{x}\Vert}$ is 100 percent? (b) \[2 points] For $n$ up to the value you find in (a), report $\frac{\Vert{r}\Vert}{\Vert{b}\Vert}$ , where $r = b − H\hat{x}$, and $\frac{\Vert{\hat{x} - x}\Vert}{\Vert{x}\Vert}$. (c) \[2 points] As $n$ increases, how does the number of correct digits in the computed solution relate to the condition number of the matrix? See the `cond` function. Submit your Matlab program producing the above results. Name the Matlab file `hilb_problem.m`. _Answer_: The following `hilb_problem.m` is used: ```matlab title="hilb_problem.m" function hilb_problem() n = 1; while true % Generate Hilbert matrix of order n H = hilb(n); % Generate random vector x x = rand(n, 1); % Compute b = Hx b = H * x; % Solve the system Hx = b x_hat = H \ b; % Compute the relative error error = norm(x_hat - x) / norm(x); fprintf("error=%d, n=%d\n", error, n) % If the error is 100 percent, break if error >= 1 break; end n = n + 1; end fprintf('\n=============\n\nThe largest n before the error is 100 percent is: %d\n\n=============\n', n-1); for i = 1:n-1 H = hilb(i); x = rand(i, 1); b = H * x; x_hat = H \ b; r = b - H * x_hat; rel_resid = norm(r) / norm(b); rel_error = norm(x_hat - x) / norm(x); %fprintf('%d %.16f\n',i, rel_resid) fprintf('| %d | %.32f | %.32f |\n', i, rel_resid, rel_error); end cond_num = cond(H); fprintf('The condition number of the matrix for n = %d is: %f\n', n-1, cond_num); end ``` a. largest $n=12$ before the error $\frac{\Vert{\hat{x} - x}\Vert}{\Vert{x}\Vert}$ is 100 percent. b. 
The following entails the value of $\frac{\Vert{r}\Vert}{\Vert{b}\Vert}$ and $\frac{\Vert{\hat{x} - x}\Vert}{\Vert{x}\Vert}$ | n | $\frac{\Vert{r}\Vert}{\Vert{b}\Vert}$ | $\frac{\Vert{\hat{x} - x}\Vert}{\Vert{x}\Vert}$ | | -- | ------------------------------------- | ----------------------------------------------- | | 1 | 0.00000000000000000000000000000000 | 0.00000000000000000000000000000000 | | 2 | 0.00000000000000000000000000000000 | 0.00000000000000013220372219891702 | | 3 | 0.00000000000000000000000000000000 | 0.00000000000000363350625815651572 | | 4 | 0.00000000000000000000000000000000 | 0.00000000000006709266750580992637 | | 5 | 0.00000000000000007733975117624287 | 0.00000000000747821082933078000054 | | 6 | 0.00000000000000013934207506736382 | 0.00000000023960543432895825359428 | | 7 | 0.00000000000000010660570398371085 | 0.00000000837749558262967895463873 | | 8 | 0.00000000000000007165565184570407 | 0.00000009992506975169996005028294 | | 9 | 0.00000000000000007076549838447114 | 0.00000608952488692639798140973303 | | 10 | 0.00000000000000012662840530707719 | 0.00002450986238666613242472361311 | | 11 | 0.00000000000000011997633780813789 | 0.00379971054180424641297242338567 | | 12 | 0.00000000000000006503338066505365 | 0.25404291536273732043937911839748 | c. _As $n$ increases, the condition number increases, which means the matrix becomes more ill-conditioned. This means fewer digits in the computed solution are correct._ > [!tip] IMPORTANT > > The number of correct digits in the computed solution decreases due to the increase in the condition number as $n$ increases --- **Problem 5 \[4 points]** You have to interpolate $sin(x)$ by a polynomial of degree five using equally spaced points in \[0, 1]. (a) \[2 points] What (absolute) error would you expect if you use this polynomial? (b) \[2 points] Using equally spaced points, what degree polynomial would you use to achieve a maximum error of $10^{-8}$? _Answer_: a. Interpolate $sin(x)$ by a polynomial of degree _five_ using equally spaced on in $[0,1]$, Error as follow $f(x) - p_n(x) = E(x) = \frac{f^{n+1}(\xi)}{(n+1)!}\prod_{i=0}^{n}{(x-x_i)}$ where - $n$ is the degree of the polynomial ($n=5$) - $f^{n+1}(\xi)$ is $(n+1)\text{-th}$ derivate of $f$ Derivate of $sin(x)$ every 4 terms is $sin(x), cos(x), -sin(x), -cos(x)$. Therefore the 6th derivative is $-cos(x)$ Here $h=\frac{b-a}{n}=\frac{1}{5}$ and $M = max_{0\leq t\leq 1}|-cos(t)| = 1 - cos(1) = 2sin^2(\frac{1}{2})$ Therefore $|E(x)| = |f(x) - sin(x)| \leq \frac{M}{4(n+1)}h^{n+1}=\frac{2sin^2(\frac{1}{2})}{4(6)}(1/5)^6 \approx 1.225860517684960×10^{−6}$ b. To achieve maximum error of $10^{-8}$, We have $|f(x) - sin(x)| \leq\frac{max_{0\leq t\leq 1}|sin^{(n+1)}(t)|}{4(n+1)*n^{n+1}} = 10^{-8}$ derivative of $sin(x)$ cycles every 4 term, thus the max value of $|sin^{(n+1)}(t)|$ over $[0,1]$ is 1 Thus we need to solve for $n$ in $\frac{1}{4(n+1)n^{n+1}}=10^{-8} \rightarrow n\approx 7 \text{ (through trial and error)}$ Hence considering to use polynomial degree _seven_ to achieve the desired error bound. --- **Problem 6 \[3 points]** You are given the values of $\sqrt{x}$ at three points | | | | | | ---------- | - | - | - | | x | 1 | 4 | 9 | | $\sqrt{x}$ | 1 | 2 | 3 | (a) \[2 points] Construct the interpolating polynomial interpolating these data. (b) \[1 points] Using this polynomial, approximate $\sqrt{1.5}$. _Answer_: a. 
To construct the interpolating polynomial for these data, we will use _Lagrange basis_ $P(x)=\sum_{i=0}^{n-1}{y_i}{L_i(x)}$ where $L_i(x)$ is the $i\text{-th}$ Lagrange basis polynomial, defined as $L_i(x) = \prod_{j=0,j\neq i}^{n-1}\frac{x-x_j}{x_i-x_j}$ With $y(x) = \sqrt{x}$, and data point $x_0=1,y_0=1;x_1=4,y_1=2;x_2=9,y_2=3$ $P(x)=\sum_{i=0}^{2}{y_i}{L_i(x)} \text{ where } L_i(x) = \prod_{j=0,j\neq i}^{2}\frac{x-x_j}{x_i-x_j}$ $L_0(x) = \frac{(x-x_1)(x-x_2)}{(x_0-x_1)(x_0-x_2)} = \frac{(x-4)(x-9)}{(1-4)(1-9)} = \frac{(x-4)(x-9)}{24}$ $L_1(x) = \frac{(x-x_0)(x-x_2)}{(x_1-x_0)(x_1-x_2)}=\frac{(x-1)(x-9)}{(4-1)(4-9)}=\frac{(x-1)(9-x)}{15}$ $L_2(x) = \frac{(x-x_0)(x-x_1)}{(x_2-x_0)(x_2-x_1)}=\frac{(x-1)(x-4)}{(9-1)(4-1)} = \frac{(x-4)(x-1)}{40}$ $P(x) = y_0L_0(x) + y_1L_1(x) + y_2L_2(x) = 1 * \frac{(x-4)(x-9)}{24} + 2*\frac{(x-1)(9-x)}{15} + 3*\frac{(x-4)(x-1)}{40}$ > [!tip] IMPORTANT > > The interpolating polynomial $P(x)=\frac{(x-4)(x-9)}{24} + \frac{2(x-1)(9-x)}{15} + \frac{3(x-4)(x-1)}{40}$ b. The approximation of $P(\sqrt{1.5})=\frac{(1.5-4)(1.5-9)}{24} + \frac{2(1.5-1)(9-1.5)}{15} + \frac{3(1.5-4)(1.5-1)}{40}=1.1875$ --- **Problem 7 \[7 points]** Let $f(x) = \frac{sin(x)}{(1+20x)^2}$. Interpolate this function over $x \in [−1, 1]$ using (a) \[2 points] polynomial interpolation of degree $n = 15$ at equally spaced points. Then evaluate this polynomial at $N = 100$ equally spaced points. Denote the interpolating polynomial by $p(x)$. Plot - $f(x)$ and $p(x)$ versus $x$ at the interpolation points and at the $N$ points (on the same plot); - $|f(x) − p(x)|$ versus $x$ at the $N$ points. You can use the `polyfit` function. See the `linspace` function. (b) \[2 points] Repeat (a) but now using Chebyshev points. (c) \[2 points] Repeat (a) but now using spline interpolation at $n + 1$ equally spaced points. See the `spline` function. (d) \[1 points] Discuss the accuracies of your results. Submit your plots (6 in total) and the Matlab code producing them. Name your Matlab file `interp_problem.m`. _Answer_ $f(x)$ implementation in matlab are as follow: ```matlab f = @(x) sin(x)./((1 + 20*x).^2); ``` a. The following is a snippet of `interp_problem.m` for polynomial interpolation of degree $n=15$ ```matlab % (a) Polynomial interpolation of degree n = 15 at equally spaced points % Define the number of interpolation points and the degree of the polynomial n = 15; N = 100; % Generate n+1 equally spaced points in the interval [-1, 1] x = linspace(-1, 1, n+1); y = f(x); % Interpolate using polyfit p_coeff = polyfit(x, y, n); % Evaluate the interpolating polynomial at N equally spaced points x_N = linspace(-1, 1, N); p_N = polyval(p_coeff, x_N); % Plot f(x) and p(x) on the same graph figure; plot(x_N, f(x_N), 'b-', x_N, p_N, 'r--', x, y, 'go'); legend('f(x)', 'p(x)', 'Interpolation Points'); title('f(x) and p(x) vs. x'); xlabel('x'); ylabel('y'); % Plot the absolute error |f(x) - p(x)| at the N points figure; plot(x_N, abs(f(x_N) - p_N), 'm-'); title('Absolute Error |f(x) - p(x)| vs. x'); xlabel('x'); ylabel('Error'); ``` ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a2-fig1.webp) ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a2-fig2.webp) b. 
The following is a snippet of `interp_problem.m` for Cheybyshev points ```matlab % (b) Polynomial interpolation using Chebyshev points % Generate Chebyshev points in the interval [-1, 1] x_cheb = cos((2*(1:n+1)-1)*pi/(2*n)); y_cheb = f(x_cheb); % Interpolate using polyfit p_cheb_coeff = polyfit(x_cheb, y_cheb, n); % Evaluate the interpolating polynomial at N equally spaced points p_cheb_N = polyval(p_cheb_coeff, x_N); % Plot f(x) and p(x) using Chebyshev points on the same graph figure; plot(x_N, f(x_N), 'b-', x_N, p_cheb_N, 'r--', x_cheb, y_cheb, 'go'); legend('f(x)', 'p(x) with Chebyshev', 'Interpolation Points'); title('f(x) and p(x) with Chebyshev vs. x'); xlabel('x'); ylabel('y'); % Plot the absolute error |f(x) - p(x)| using Chebyshev points at the N points figure; plot(x_N, abs(f(x_N) - p_cheb_N), 'm-'); title('Absolute Error |f(x) - p(x) with Chebyshev| vs. x'); xlabel('x'); ylabel('Error'); ``` ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a2-fig3.webp) ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a2-fig4.webp) c. The following is a snippet of `interp_problem.m` through spline interpolation at $n + 1$ equally spaced points. ```matlab % (c) Spline interpolation at n+1 equally spaced points % Evaluate the function at n+1 equally spaced points y_spline = f(x); % Use the spline function to get the piecewise polynomial representation pp = spline(x, y_spline); % Evaluate the spline at N equally spaced points spline_N = ppval(pp, x_N); % Plot f(x) and the spline on the same graph figure; plot(x_N, f(x_N), 'b-', x_N, spline_N, 'r--', x, y_spline, 'go'); legend('f(x)', 'spline(x)', 'Interpolation Points'); title('f(x) and spline(x) vs. x'); xlabel('x'); ylabel('y'); % Plot the absolute error |f(x) - spline(x)| at the N points figure; plot(x_N, abs(f(x_N) - spline_N), 'm-'); title('Absolute Error |f(x) - spline(x)| vs. x'); xlabel('x'); ylabel('Error'); ``` ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a2-fig5.webp)![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a2-fig6.webp) d. Discussion 1. The polynomial interpolation using equally spaced points _might show oscillations_ near endpoints due to _Runge phenomenon_ (oscillations near the endpoints of the interpolated interval become pronounced). We saw oscillation in the error graph here. 2. Polynomial interpolation using Chebyshev points should mitigate the oscillations 3. The spline interpolation will provide a piecewise polynomial that should fit the function smoothly and might offer better accuracy than polynomial interpolation --- **Problem 8 \[4 points]** Given the three data points $(−1, 1), (0, 0), (1, 1)$, determine the interpolating polynomial of degree two using: a. \[1 point] monomial basis b. \[1 point] Lagrange basis c. \[1 point] Newton basis \[1 point] Show that the three [representations](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A2/../../../../../../../../thoughts/representations) give the same polynomial. _Answer_: a. 
Monomial basis The monomial basis for a polynomial of degree two is given by: $p(x)=a_0+a_1*x+a_2*x^2$ The linear system as follow $a_0-a_1+a_2=1$ $a_0=0$ $a_0+a_1+a_2=1$ Solving this system to obtain the $a_0=0,a_1=0, a_2=1$ > [!note] NOTE > > Thus _monomial basis_ of this polynomial of degree two is $p(x) = x^2$ b. Lagrange basis The Lagrange basis for a polynomial of degree two is given by: $p(x)=\sum_{j=0}^{2}{y_j}{L_j(x)} = f(x_0)L_0{(x)} + f(x_1)L_1{(x)} + f(x_2)L_2{(x)}$ where $L_0(x) = \frac{(x-x_1)(x-x_2)}{(x_0-x_1)(x_0-x_2)} = \frac{x(x-1)}{2}$ $L_1(x) = \frac{(x-x_0)(x-x_2)}{(x_1-x_0)(x_1-x_2)}=-x(x-1)$ $L_2(x) = \frac{(x-x_0)(x-x_1)}{(x_2-x_0)(x_2-x_1)}=\frac{x(x+1)}{2}$ Thus $p(x) = 1*\frac{x(x-1)}{2} + 0*(-x(x-1)) + \frac{x(x+1)}{2} = x^2$ > [!note] NOTE > > Thus _Lagrange basis_ of this polynomial of degree two is $p(x) = x^2$ c. Newton basis The Newton basis for a polynomial of degree two is given by: $p(x)=f(x_0)+(x-x_0)f[x_0, x_1] + (x-x_0)(x-x_1)f[x_0, x_1, x_2]$ where $f[x_0,x_1]=\frac{f(x_1)-f(x_0)}{x_1-x_0} = \frac{0-1}{0+1} = -1$ $f[x_0,x_1,x_2]=\frac{f[x_1, x_2]-f[x_0, x_1]}{x_2-x_0} = \frac{1+1}{1+1} = 1$ We have $f[x_1,x_2]=\frac{f(x_2)-f(x_1)}{x_2-x_1} = \frac{1-0}{1-0} = 1$ Thus $p(x)=1+(x+1)(−1)+(x+1)(x)*2 =1 - x-1 + (x^2+x)=x^2$ > [!note] NOTE > > Thus _Newton basis_ of this polynomial of degree two is $p(x) = x^2$ Therefore, we prove that all three basis yield the same polynomial for degree two. --- slug: thoughts/university/twenty-three-twenty-four/compsci-4x03/A3 tags: - swfr4x03 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A3" title: "Least squares, Trapezoidal and Simpson's rules" date: 2023-11-30 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A3.html.md --- **Problem 1** a. ```matlab function [q, nfun] = adsimpson(f, a, b, tol) persistent recursion_depth nfun_internal; if isempty(recursion_depth) recursion_depth = 0; end if isempty(nfun_internal) nfun_internal = 0; end recursion_depth = recursion_depth + 1; nfun_internal = nfun_internal + 1; % Increment function evaluations if recursion_depth > 1000 % Check recursion depth error('Maximum recursion depth exceeded.'); end c = (a + b)/2; h = b - a; fa = f(a); fb = f(b); fc = f(c); S = (h/6) * (fa + 4*fc + fb); d = (a + c)/2; e = (c + b)/2; fd = f(d); fe = f(e); Sleft = (h/12) * (fa + 4*fd + fc); Sright = (h/12) * (fc + 4*fe + fb); S2 = Sleft + Sright; if abs(S2 - S) < 15*tol q = S2 + (S2 - S)/15; else mid = (a + b)/2; [q_left, nfun_left] = adsimpson(f, a, mid, tol/2); [q_right, nfun_right] = adsimpson(f, mid, b, tol/2); q = q_left + q_right; nfun_internal = nfun_internal + nfun_left + nfun_right; end if nargout > 1 nfun = nfun_internal; end recursion_depth = recursion_depth - 1; if recursion_depth == 0 nfun_internal = 0; % Reset on the last exit end end ``` b. 
```matlab function q = dsimpson(f, a, b, c, d, tol) function qx = integrand_x(y) [qx, ~] = adsimpson(@(x) f(x, y), a, b, tol); end [q, ~] = adsimpson(@(y) integrand_x(y), c, d, tol); end ``` The output are as follow ```prolog dsimpson 2.9491801536006179e-01 integral2 2.9491801499984915e-01 |dsimpson-integral2| =3.60e-10 ``` --- **Problem 2** ```matlab title="pendulum.m" function pendulum % Define the range for x values x_values = linspace(-0.99, 0.99, 200); % Adjust the number of points for smoothness K_values = zeros(size(x_values)); evals = zeros(size(x_values)); tol = 1e-10; % Define the integrand for the elliptic integral of the first kind for i = 1:length(x_values) x = x_values(i); integrand = @(theta) 1 ./ sqrt(1 - x^2 .* sin(theta).^2); % Use adsimpson to integrate and capture the number of function evaluations [K_values(i), evals(i)] = adsimpson(integrand, 0, pi/2, tol); end % Plot K(x) versus x figure; plot(x_values, K_values); title('Complete Elliptic Integral of the First Kind K(x) versus x'); xlabel('x'); ylabel('K(x)'); % Plot the number of function evaluations versus x figure; plot(x_values, evals); title('Number of Function Evaluations versus x'); xlabel('x'); ylabel('Number of Function Evaluations'); end ``` The following graph are then produced ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A3/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a3-p2-f1.svg) ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A3/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a3-p2-f2.svg) _Explanation_ The graph show extreme spike at both end of the range, close to `+-1` The graph shows an extreme spike in the number of function evaluations at both ends of the $x$ range, close to $\pm1$. This is consistent with the expectation that as $x$ approaches $\pm1$, the integrand of the complete elliptic integral of the first kind, $\frac{d\theta}{\sqrt{1 - x^2 \sin^2 \theta}}$, approaches a singularity for some $theta$ within the interval $[0, \pi/2]$. When $x$ is near $\pm1$, the term $x^2 \sin^2 \theta$ can approach $1$, causing the denominator to approach zero and the integrand to become very large or approach infinity, especially as $\theta$ approaches $\pi/2$. The adaptive Simpson’s method tries to maintain the specified tolerance by increasing the number of intervals (thus function evaluations) where the integrand varies rapidly or becomes difficult to approximate due to singular behavior. Near these singularities, even small intervals can have large differences in the integrand values, leading the adaptive algorithm to recursively subdivide the intervals, resulting in a substantial increase in function evaluations. The sharp increase in function evaluations at the edges of the graph indicates that the algorithm is working as expected, refining the integration intervals to handle the challenging behavior of the integrand near the points where it is not well-behaved. The function evaluations become extremely high as the integrand requires very fine subdivisions to approximate the integral within the specified tolerance near the singular points. 
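The same growth can be reproduced outside Matlab. The sketch below is an added illustration (it is not the submitted `adsimpson.m`/`pendulum.m`; the adaptive Simpson helper is a generic textbook version instrumented with an evaluation counter): the number of integrand evaluations stays small for moderate $x$ and climbs sharply as $x$ approaches $1$.

```python
# Illustrative Python sketch: count integrand evaluations for K(x) as |x| -> 1.
import numpy as np

def adaptive_simpson(f, a, b, tol):
    """Recursive adaptive Simpson's rule; returns (integral, number of f evaluations)."""
    evals = [0]

    def feval(x):
        evals[0] += 1
        return f(x)

    def simpson(fa, fm, fb, a, b):
        return (b - a) / 6.0 * (fa + 4.0 * fm + fb)

    def recurse(a, b, fa, fm, fb, whole, tol):
        m = 0.5 * (a + b)
        flm, frm = feval(0.5 * (a + m)), feval(0.5 * (m + b))
        left = simpson(fa, flm, fm, a, m)
        right = simpson(fm, frm, fb, m, b)
        if abs(left + right - whole) < 15.0 * tol:
            return left + right + (left + right - whole) / 15.0
        return (recurse(a, m, fa, flm, fm, left, tol / 2.0) +
                recurse(m, b, fm, frm, fb, right, tol / 2.0))

    fa, fb, fm = feval(a), feval(b), feval(0.5 * (a + b))
    whole = simpson(fa, fm, fb, a, b)
    return recurse(a, b, fa, fm, fb, whole, tol), evals[0]

for x in (0.0, 0.9, 0.99, 0.999):
    integrand = lambda t, x=x: 1.0 / np.sqrt(1.0 - x**2 * np.sin(t)**2)
    K, n_evals = adaptive_simpson(integrand, 0.0, np.pi / 2.0, 1e-10)
    print(f"x = {x:5.3f}   K(x) = {K:.10f}   integrand evaluations = {n_evals}")
```

As in the Matlab plots above, the adaptive rule holds the tolerance by subdividing aggressively where the integrand becomes nearly singular.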
--- **Problem 3** ### C Trapezoidal rule ```matlab title="trapezoid.m" function I = trapezoid(f, a, b, n) % Composite Trapezoidal Rule x = linspace(a, b, n+1); % Generate n+1 points from a to b y = f(x); dx = (b - a)/n; I = (dx/2) * (y(1) + 2*sum(y(2:end-1)) + y(end)); end ``` ### C Simpson’s rule ```matlab title="simpson.m" function I = simpson(f, a, b, n) % Composite Simpson's Rule % Ensure n is even if mod(n, 2) == 1 warning('Simpson’s rule requires an even number of intervals.'); n = n + 1; end x = linspace(a, b, n+1); % Generate n+1 points from a to b y = f(x); dx = (b - a)/n; I = (dx/3) * (y(1) + 4*sum(y(2:2:end-1)) + 2*sum(y(3:2:end-2)) + y(end)); end ``` a. Given $\int_{0}^{\frac{\pi}{2}}e^xcos(x)dx$ with absolute error of at most $tol=10^{-4}$ #### Trapezoidal The error bound is given by $E_t\leq \frac{(b-a)^3}{12n^3}max_{a\leq x\leq b}|f^{''}(x)|$, where $f(x)=e^xcos(x)$ $f^{''}(x)=e^x(2cos(x) - 2sin(x))$ Since $e^x$ increasing and $|cos(x)-sin(x)|$ maximised at $x=\frac{\pi}{4}$ Therefore $f^{''}(x)$ is maximised at $x=\frac{\pi}{4}$ for interval $[0, \frac{\pi}{2}]$ $max|f^{''}(x)| = |e^{\frac{\pi}{4}}(2cos(\frac{\pi}{4}) - 2sin(\frac{\pi}{4}))| = e^{\frac{\pi}{4}}\sqrt{2}$ Then, we need to solve for $\frac{(\frac{\pi}{2})^3}{12n^2}e^{\frac{\pi}{4}}\sqrt{2} \leq 10^{-4}$ and gives $n \geq 101$ to satisfy the `tol` #### Simpson’s The error bound is given by $E_s \leq \frac{(b-a)^5}{180n^4}max_{a\leq x\leq b}|f^{4}(x)|$ $f^{4}(x)=e^x(-4sin(x) - 4cos(x))$ on interval $[0, \frac{\pi}{2}]$ is approx. 19.2419 Then, we need to solve for $\frac{(\frac{\pi}{2})^5}{180n^4}max|f^{4}(x)| \leq 10^{-4}$, which yields $n \geq 12$ b. #### Trapezoidal Using the following ```matlab f = @(x) exp(x) .* cos(x); a = 0; b = pi/2; tol = 1e-4; % Compute the exact integral value exact_integral = integral(f, a, b); % Initialize n and the approximate integral n = 1; approx_integral = 0; while true n = n + 1; % Increment n % Compute the trapezoidal approximation approx_integral = trapezoid(f, a, b, n); % Calculate the absolute error error = abs(exact_integral - approx_integral); % Check if the error is within the tolerance if error <= tol break; end end % Display the smallest n that meets the tolerance requirement disp(n); ``` yield $n \geq 110$ #### Simpson’s Using the following ```matlab f = @(x) exp(x) .* cos(x); a = 0; b = pi/2; tol = 1e-4; % Compute the exact integral value exact_integral = integral(f, a, b); % Initialize n (must be even for Simpson's rule) and the approximate integral n = 2; % Start with the smallest even number approx_integral = 0; while true % Compute the Simpson's approximation approx_integral = simpson(f, a, b, n); % Calculate the absolute error error = abs(exact_integral - approx_integral); % Check if the error is within the tolerance if error <= tol break; end n = n + 2; % Increment n by 2 to ensure it's even end % Display the smallest n that meets the tolerance requirement disp(['The smallest n for Simpson''s rule is ', num2str(n)]); ``` yield $n \geq 8$ c. 
#### Trapezoidal The following ```matlab f = @(x) exp(x) .* cos(x); a = 0; b = pi/2; n_values = 2:200; % n can be any integer for the trapezoidal rule tol = 1e-4; exact_integral = integral(f, a, b); % Initialize arrays to store the actual errors and theoretical error bounds actual_errors_trap = zeros(size(n_values)); bounds_trap = zeros(size(n_values)); % Compute the second derivative for the trapezoidal rule error bound f_second = @(x) exp(x) .* (cos(x) - sin(x) - sin(x) - cos(x)); % f''(x) max_f_second = max(abs(f_second(linspace(a, b, 1000)))); % Max over [a, b] % Calculate errors and bounds for each n for i = 1:length(n_values) n = n_values(i); % Trapezoidal rule calculations approx_integral_trap = trapezoid(f, a, b, n); actual_errors_trap(i) = abs(exact_integral - approx_integral_trap); bounds_trap(i) = ((b - a)^3 / (12 * n^2)) * max_f_second; end % Plot the error bounds and actual errors on a loglog plot figure; loglog(n_values, bounds_trap, 'r-', n_values, actual_errors_trap, 'b--'); legend('Trapezoid Bound', 'Trapezoid Actual'); title('Error Bounds and Actual Errors for Trapezoidal Rule'); xlabel('n (number of subintervals)'); ylabel('Error'); ``` yields ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A3/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a3-p3-c-trapezoidal.webp) #### Simpson’s The following: ```matlab title="errors.m" f = @(x) exp(x) .* cos(x); a = 0; b = pi/2; n_values = 2:2:200; % Simpson's rule requires an even number of intervals tol = 1e-4; exact_integral = integral(f, a, b); % Initialize arrays to store the actual errors and theoretical error bounds actual_errors_simp = zeros(size(n_values)); bounds_simp = zeros(size(n_values)); % Compute the fourth derivative for Simpson's rule error bound max_f_4th = max(abs(exp(linspace(a, b, 1000)) .* (cos(linspace(a, b, 1000)) - 4.*sin(linspace(a, b, 1000)) - 6.*cos(linspace(a, b, 1000)) - 4.*sin(linspace(a, b, 1000)) + cos(linspace(a, b, 1000))))); % Calculate errors and bounds for each n for i = 1:length(n_values) n = n_values(i); % Simpson's rule calculations approx_integral_simp = simpson(f, a, b, n); actual_errors_simp(i) = abs(exact_integral - approx_integral_simp); bounds_simp(i) = ((b - a)^5 / (180 * n^4)) * max_f_4th; end % Plot the error bounds and actual errors on a loglog plot figure; loglog(n_values, bounds_simp, 'r-', n_values, actual_errors_simp, 'b--'); legend('Simpson Bound', 'Simpson Actual'); title('Error Bounds and Actual Errors for Simpson''s Rule'); xlabel('n (number of subintervals)'); ylabel('Error'); ``` yields ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A3/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a3-p3-c-simpsons.webp) d. #### Trapezoidal Error bound for theoretical is proportional to $\frac{1}{n^2}$, therefore on the `loglog` the theoretical appears to be a straight lines with negative slope. Slope should be `-2`, because the error bound decreases with square of `# n` The actual error observed also diminished as $n$ becomes larger. Similar to error bound, the actual error is expected to decrease with increase in n, but may decrease faster/slower. In `loglog` plot, it then appears to be straight line. #### Simpson’s Error bound for theoretical is proportional to $\frac{1}{n^4}$, therefore on the `loglog` the theoretical appears to be a straight lines with negative slope. 
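These slopes can also be checked numerically. The short Python sketch below is an added illustration (separate from the Matlab scripts above; the exact value $(e^{\pi/2} - 1)/2$ follows from the antiderivative $e^x(\cos x + \sin x)/2$): doubling $n$ should shrink the trapezoidal error by roughly $2^2$ and the Simpson error by roughly $2^4$, i.e. observed orders close to 2 and 4.

```python
# Illustrative check of the convergence orders of the two composite rules.
import numpy as np

f = lambda x: np.exp(x) * np.cos(x)
a, b = 0.0, np.pi / 2
exact = (np.exp(np.pi / 2) - 1.0) / 2.0  # from the antiderivative e^x (cos x + sin x) / 2

def trapezoid(fn, a, b, n):
    y = fn(np.linspace(a, b, n + 1))
    h = (b - a) / n
    return h * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

def simpson(fn, a, b, n):  # n must be even
    y = fn(np.linspace(a, b, n + 1))
    h = (b - a) / n
    return h / 3.0 * (y[0] + 4.0 * y[1:-1:2].sum() + 2.0 * y[2:-2:2].sum() + y[-1])

for n in (8, 16, 32, 64, 128):
    e_t = abs(trapezoid(f, a, b, n) - exact)
    e_s = abs(simpson(f, a, b, n) - exact)
    order_t = np.log2(e_t / abs(trapezoid(f, a, b, 2 * n) - exact))
    order_s = np.log2(e_s / abs(simpson(f, a, b, 2 * n) - exact))
    print(f"n = {n:4d}   trapezoid order ~ {order_t:.2f}   Simpson order ~ {order_s:.2f}")
```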
The actual error observed when using Simpson’s rule also shows a rapid decrease with increasing $n$. The actual error may decrease faster than the error bound predicts because the bound is a worst-case estimate. The true error often is less than this bound, especially for well-behaved functions. The difference in slopes between the actual error curve and the theoretical error bound curve is expected. The theoretical curve represents the maximum possible error, not the exact error, which can be much less depending on how the function behaves within each subinterval. The actual error may flatten as $n$ increases past a certain point. This is due to the limitations of numerical precision in Matlab. --- **Problem 4** ```matlab title="timeadd.m" function timeadd % Define the sizes of the matrices sizes = 500:100:1500; times_addR = zeros(length(sizes), 1); times_addC = zeros(length(sizes), 1); % Time the functions and record the execution times for i = 1:length(sizes) n = sizes(i); A = rand(n, n); B = rand(n, n); f_addR = @() addR(A, B); f_addC = @() addC(A, B); times_addR(i) = timeit(f_addR); times_addC(i) = timeit(f_addC); end % Perform least squares fitting to the model t = cn^2 X = [ones(length(sizes), 1), sizes'.^2]; crow_krow = X \ times_addR; ccol_kcol = X \ times_addC; % Output the constants fprintf('crow: %e\n', crow_krow(1)); fprintf('krow: %e\n', crow_krow(2)); fprintf('ccol: %e\n', ccol_kcol(1)); fprintf('kcol: %e\n', ccol_kcol(2)); % Plot the results figure; loglog(sizes, times_addR, 'o-', 'DisplayName', 'addR'); hold on; loglog(sizes, times_addC, 'o-', 'DisplayName', 'addC'); xlabel('Matrix Size (n)'); ylabel('Time (seconds)'); title('Time Complexity of Matrix Addition'); legend show; grid on; end function C = addR(A, B) [n, ~] = size(A); C = zeros(n, n); for i = 1:n C(i, :) = A(i, :) + B(i, :); end end function C = addC(A, B) [n, ~] = size(A); C = zeros(n, n); for j = 1:n C(:, j) = A(:, j) + B(:, j); end end ``` Yields ```matlab crow: -7.047139e-03 krow: 2.787915e-08 ccol: -4.545719e-04 kcol: 1.913233e-09 ``` ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A3/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a3-p4-timeadd.webp) Reason for $k_{row} \approx 3$ 1. Overhead of function call: we include a lot of measurement noise in the function, so probably will increase system load and other process. 2. `addR` memory access: `addR` is not optimal since MATLAB’s column-major order. Accessing elements row-wise can lead to cache misses and inefficient usage of memory bandwidth. 3. Added overheads, maybe associated with MATLAB’s JIT compilation, memory management. 4. Polynomial fitting: LS model fits a polynomial of form $t=c+kn^2$. If error that increase with $n$, then there is a leading overestimation of the quadratic term. --- **Problem 5** $y=ae^{x^2} + bx^3$ For each datapoint $(x_i, y_i)$, compute the residual as $r_i=ae^{x_i^2}+bx_{i}^{3} - y_i$ Sum of squared residuals $S=\sum_{i=1}^{n}{r_i^{2}}$ Or in this case $S=(ae^{-1}-b-0)+(a-1)^2 + (ae+b-2)^2$ is minimized Or $\frac{\partial S}{\partial a}=0$ and $\frac{\partial S}{\partial b}=0$ which results to $2(ae^{-1}-b)(e^{-1}) + 2(a-1) + 2(ae+b-2)e = 0$ and $-2(ae^{-1} -b) + 2(ae+b-2)=0$ $a=\frac{2e+2e^2+2e^3}{1+4e^2+e^4}$ and $b=\frac{-e^3+2+e+4e^2}{1+4e^2+e^4}$ --- **Problem 6** a. 
$r_k =k(l_k-l_0) -F(l_k)$ $\phi(k)=\sum_{k=1}^{n}[k(l_k-l_0) - F(l_k)]^2$ $\frac{\partial \phi}{\partial k}=\sum_{k=1}^{n}2[k(l_k-l_0) - F(l_k)](l_k-l_0)=0$ Or $k\sum_{k=1}^{n}(l_k-l_0)^2=\sum_{k=1}^{n}F(l_k)(l_k-l_0) \rightarrow k=\frac{\sum_{k=1}^{n}F(l_k)(l_k-l_0)}{\sum_{k=1}^{n}(l_k-l_0)^2}$ $k \approx 0.8996 N/m$ ```python # Given data l_values = [7, 9.4, 12.3] # l values F_values = [2, 4, 6] # F(l) values l0 = 5.3 # Unstretched length of the spring # Calculate the numerator and denominator for the k value numerator = sum([F * (l - l0) for F, l in zip(F_values, l_values)]) denominator = sum([(l - l0)**2 for l in l_values]) # Calculate k k = numerator / denominator ``` b. Using the same logic with additional data, we get $k\approx 0.9052 N/m$ ```python # Additional measurements for part B additional_l_values = [8.3, 11.3, 14.4, 15.9] # Additional l values additional_F_values = [3, 5, 8, 10] # Additional F(l) values # Combine old and new data points all_l_values = l_values + additional_l_values all_F_values = F_values + additional_F_values # Calculate the numerator and denominator for the new k value numerator_all = sum([F * (l - l0) for F, l in zip(all_F_values, all_l_values)]) denominator_all = sum([(l - l0)**2 for l in all_l_values]) # Calculate the new k using all data k_all = numerator_all / denominator_all ``` To determine which constant `k` best fit the dataset, we calculate the sum of squares of residuals `SSR` using entire datasets ```python # Calculate the sum of squares of residuals for the original k and the new k def sum_of_squares(k, l_values, F_values, l0): return sum([(k * (l - l0) - F)**2 for l, F in zip(l_values, F_values)]) # Sum of squares of residuals using k from part A for the whole data SSR_k = sum_of_squares(k, all_l_values, all_F_values, l0) # Sum of squares of residuals using k from part B for the whole data SSR_k_all = sum_of_squares(k_all, all_l_values, all_F_values, l0) SSR_k, SSR_k_all ``` This yield SSR from A is approx. 0.9062, whereas from part B is approx 0.8962. The lower the better here, which means part B is a better fit to the entire data comparing to part A. --- slug: thoughts/university/twenty-three-twenty-four/compsci-4x03/A4 tags: - swfr4x03 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4" title: "SGD, ODEs" date: 2023-11-30 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4.html.md --- ## P1 You are given the file [points.mat](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/points.mat) with training data. There are three kinds of points. The goal is to train with these data and classify the points in $[0, 1] × [0, 1]$. Modify the file [netbpfull.m](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/compsci-4x03/netbpfull.m) such that it works with the three categories of points. • Load the data with `load points.mat`. This will result in an array $x$ containing points in 2D and an array `labels` containing labels. • Modify the function cost such that it returns $\text{accuracy}=\frac{\text{number of points classified correctly}}{\text{total number of points}}*100$ and also returns the indices (in `x`) of training points that are not classified correctly. 
• [netbpfull.m](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/compsci-4x03/netbpfull.m) should plot - accuracy versus number of iterations - cost versus number of iterations and - two plots like in ![this plot](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/Figure1.webp) this plot - The training should stop if accuracy of 95% is reached; otherwise it should continue to `Niter=1e6`. For full marks, you need to achieve 95%. For pretty code, see [net.py](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/compsci-4x03/net.py). _Solution_ The following contains the diff of the original `netbpfull.m` and the modified version ```diff diff --git a/content/thoughts/university/compsci-4x03/netbpfull.m b/content/thoughts/university/compsci-4x03/netbpfull.m index 5a1b6e13..df3714f9 100644 --- a/content/thoughts/university/compsci-4x03/netbpfull.m +++ b/content/thoughts/university/compsci-4x03/netbpfull.m @@ -1,134 +1,225 @@ function netbp_full %NETBP_FULL % Extended version of netbp, with more graphics -% -% Set up data for neural net test -% Use backpropagation to train -% Visualize results -% -% C F Higham and D J Higham, Aug 2017 -% -% -% xcoords, ycoords, targets -x1 = [0.1,0.3,0.1,0.6,0.4,0.6,0.5,0.9,0.4,0.7]; -x2 = [0.1,0.4,0.5,0.9,0.2,0.3,0.6,0.2,0.4,0.6]; -y = [ones(1,5) zeros(1,5); zeros(1,5) ones(1,5)]; - -figure(1) -clf -a1 = subplot(1,1,1); -plot(x1(1:5),x2(1:5),'ro','MarkerSize',12,'LineWidth',4) -hold on -plot(x1(6:10),x2(6:10),'bx','MarkerSize',12,'LineWidth',4) -a1.XTick = [0 1]; -a1.YTick = [0 1]; -a1.FontWeight = 'Bold'; -a1.FontSize = 16; -xlim([0,1]) -ylim([0,1]) - -%print -dpng pic_xy.webp - - -% Initialize weights and biases + +% Load the data +load points.mat x; % This loads 'x' which contains points and 'labels' +load points.mat labels; % This loads 'x' which contains points and 'labels' + +x_mean = mean(x, 2); +x_std = std(x, 0, 2); +x = (x - x_mean) ./ x_std; % Normalize the data + +% Initialize weights and biases for a network with three outputs rng(5000); -W2 = 0.5*randn(2,2); -W3 = 0.5*randn(3,2); -W4 = 0.5*randn(2,3); -b2 = 0.5*randn(2,1); -b3 = 0.5*randn(3,1); -b4 = 0.5*randn(2,1); - - - -% Forward and Back propagate -% Pick a training point at random -eta = 0.05; +num_hidden_1 = 20; % Increased the number of neurons +num_hidden_2 = 20; +W2 = randn(num_hidden_1, 2) * 0.01; +W3 = randn(num_hidden_2, num_hidden_1) * 0.01; +W4 = randn(size(labels, 1), num_hidden_2) * 0.01; +b2 = zeros(num_hidden_1, 1); +b3 = zeros(num_hidden_2, 1); +b4 = zeros(size(labels, 1), 1); + +% Training parameters +eta = 0.001; % Adjusted learning rate +alpha = 0.89; % Momentum term +alpha_leak = 0.01; % Define this once at the beginning of your script +lambda = 0.001; % L2 Regularization strength Niter = 1e6; -savecost = zeros(Niter,1); +batch_size = 16; % Adjusted batch size for batch training +% Learning rate decay +decay_rate = 0.99; +decay_step = 10000; % Apply decay every 10000 iterations + +% buffers +savecost = zeros(Niter, 1); +saveaccuracy = zeros(Niter, 1); +savemisclassified = cell(Niter, 1); + +% Momentum variables +mW2 = zeros(size(W2)); +mW3 = zeros(size(W3)); +mW4 = zeros(size(W4)); +mb2 = zeros(size(b2)); +mb3 = zeros(size(b3)); +mb4 = zeros(size(b4)); + +% Training loop with batch training for counter = 1:Niter - k = randi(10); - x = [x1(k); x2(k)]; - % Forward pass - a2 = activate(x,W2,b2); - a3 = 
activate(a2,W3,b3); - a4 = activate(a3,W4,b4); - % Backward pass - delta4 = a4.*(1-a4).*(a4-y(:,k)); - delta3 = a3.*(1-a3).*(W4'*delta4); - delta2 = a2.*(1-a2).*(W3'*delta3); - % Gradient step - W2 = W2 - eta*delta2*x'; - W3 = W3 - eta*delta3*a2'; - W4 = W4 - eta*delta4*a3'; - b2 = b2 - eta*delta2; - b3 = b3 - eta*delta3; - b4 = b4 - eta*delta4; - % Monitor progress - newcost = cost(W2,W3,W4,b2,b3,b4); % display cost to screen - newcost = cost(W2,W3,W4,b2,b3,b4); % display cost to screen - fprintf("iter=% 5d cost=%e\n", counter, newcost) + % Select a batch of points + batch_indices = randperm(size(x, 2), batch_size); + x_batch = x(:, batch_indices); + labels_batch = labels(:, batch_indices); + + % Initialize gradients for the batch + gradW2 = zeros(size(W2)); + gradW3 = zeros(size(W3)); + gradW4 = zeros(size(W4)); + gradb2 = zeros(size(b2)); + gradb3 = zeros(size(b3)); + gradb4 = zeros(size(b4)); + + % Loop over all examples in the batch + for k = 1:batch_size + xk = x_batch(:, k); + labelk = labels_batch(:, k); + + % Forward pass + a2 = actfn(xk, W2, b2, 'leaky_relu'); + a3 = actfn(a2, W3, b3, 'leaky_relu'); + a4 = actfn(a3, W4, b4, 'sigmoid'); + + % Backward pass + delta4 = (a4 - labelk) .* a4 .* (1 - a4); + delta3 = (W4' * delta4) .* (a3 > 0 + alpha_leak * (a3 <= 0)); % Leaky ReLU derivative + delta2 = (W3' * delta3) .* (a2 > 0 + alpha_leak * (a2 <= 0)); % Leaky ReLU derivative + + % Accumulate gradients over the batch + gradW4 = gradW4 + delta4 * a3'; + gradW3 = gradW3 + delta3 * a2'; + gradW2 = gradW2 + delta2 * xk'; + gradb4 = gradb4 + delta4; + gradb3 = gradb3 + delta3; + gradb2 = gradb2 + delta2; + end + + % Average gradients over the batch + gradW4 = gradW4 + (lambda / batch_size) * W4; + gradW3 = gradW3 + (lambda / batch_size) * W3; + gradW2 = gradW2 + (lambda / batch_size) * W2; + gradb4 = gradb4 / batch_size; + gradb3 = gradb3 / batch_size; + gradb2 = gradb2 / batch_size; + + % Update weights with gradients + mW4 = alpha * mW4 - eta * gradW4; + mW3 = alpha * mW3 - eta * gradW3; + mW2 = alpha * mW2 - eta * gradW2; + mb4 = alpha * mb4 - eta * gradb4; + mb3 = alpha * mb3 - eta * gradb3; + mb2 = alpha * mb2 - eta * gradb2; + + W4 = W4 + mW4; + W3 = W3 + mW3; + W2 = W2 + mW2; + b4 = b4 + mb4; + b3 = b3 + mb3; + b2 = b2 + mb2; + % Calculate cost and accuracy for the whole dataset + [newcost, accuracy, misclassified] = cost(W2, W3, W4, b2, b3, b4, x, labels); savecost(counter) = newcost; + saveaccuracy(counter) = accuracy; + savemisclassified{counter} = misclassified; + + % Apply decay to the learning rate + if mod(counter, decay_step) == 0 + eta = eta * decay_rate; + end + + % Early stopping if accuracy is above 95% + if accuracy >= 95 + fprintf('Achieved 95\n', counter, newcost, accuracy); + end end -figure(2) -clf -semilogy([1:1e4:Niter],savecost(1:1e4:Niter),'b-','LineWidth',2) -xlabel('Iteration Number') -ylabel('Value of cost function') -set(gca,'FontWeight','Bold','FontSize',18) -print -dpng pic_cost.webp - -%%% Display shaded and unshaded regions -N = 500; -Dx = 1/N; -Dy = 1/N; -xvals = [0:Dx:1]; -yvals = [0:Dy:1]; -for k1 = 1:N+1 - xk = xvals(k1); - for k2 = 1:N+1 - yk = yvals(k2); - xy = [xk;yk]; - a2 = activate(xy,W2,b2); - a3 = activate(a2,W3,b3); - a4 = activate(a3,W4,b4); - Aval(k2,k1) = a4(1); - Bval(k2,k1) = a4(2); - end +% After training loop: Plot accuracy vs. number of iterations +figure; +plot(saveaccuracy); +xlabel('Number of Iterations'); +ylabel('Accuracy (%)'); +title('Accuracy vs. Number of Iterations'); + +% Plot cost vs. 
number of iterations +figure; +plot(savecost); +xlabel('Number of Iterations'); +ylabel('Cost'); +title('Cost vs. Number of Iterations'); + +% Plot decision boundaries and points +% First, create a meshgrid to cover the input space +[xv, yv] = meshgrid(linspace(min(x(1,:)), max(x(1,:)), 100), linspace(min(x(2,:)), max(x(2,:)), 100)); +mesh_x = [xv(:)'; yv(:)']; +mesh_a2 = actfn(mesh_x, W2, b2, 'leaky_relu'); +mesh_a3 = actfn(mesh_a2, W3, b3, 'leaky_relu'); +mesh_a4 = actfn(mesh_a3, W4, b4, 'sigmoid'); +[~, mesh_classes] = max(mesh_a4); +mesh_classes = reshape(mesh_classes, size(xv)); + +% Find the misclassified points from the last iteration +misclassified_indices = savemisclassified{end}; +classified_correctly_indices = setdiff(1:size(x, 2), misclassified_indices); + +% First Plot: Decision boundaries and correctly classified points only +figure; +contourf(xv, yv, mesh_classes); +hold on; +gscatter(x(1,classified_correctly_indices), x(2,classified_correctly_indices), vec2ind(labels(:,classified_correctly_indices)), 'rgb', 'osd', 12, 'LineWidth', 4); +title('Decision Boundaries and Correctly Classified Points'); +xlabel('Feature 1'); +ylabel('Feature 2'); +legend('Class 1', 'Class 2', 'Class 3'); +hold off; + +% Second Plot: Decision boundaries and misclassified points only +figure; +contourf(xv, yv, mesh_classes); +hold on; +gscatter(x(1,misclassified_indices), x(2,misclassified_indices), vec2ind(labels(:,misclassified_indices)), 'rgb', 'osd', 12, 'LineWidth', 4); +title('Decision Boundaries and Misclassified Points Only'); +xlabel('Feature 1'); +ylabel('Feature 2'); +legend('Misclassified'); +hold off; + + +% Activation function with switch for ReLU +function z = actfn(x, W, b, activation_type) + if strcmp(activation_type, 'leaky_relu') + % Define the Leaky ReLU slope for negative inputs + alpha_leak = 0.01; + z = max(alpha_leak * (W * x + b), W * x + b); + elseif strcmp(activation_type, 'relu') + z = max(0, W * x + b); + else + z = 1 ./ (1 + exp(-W * x - b)); + end +end + +% Cost function with accuracy and misclassified indices calculation +function [costval, accuracy, misclassified] = cost(W2, W3, W4, b2, b3, b4, x, labels) + misclassified = []; + correct_count = 0; + costval = 0; % Initialize the cost value + + for i = 1:size(x, 2) + input = x(:, i); + target = labels(:, i); + a2 = actfn(input, W2, b2, 'leaky_relu'); + a3 = actfn(a2, W3, b3, 'leaky_relu'); + a4 = actfn(a3, W4, b4, 'sigmoid'); + + % Compute the cross-entropy loss + epsilon = 1e-12; % since it could happen log(0), so set a small epsilon + costval = costval - sum(target .* log(a4 + epsilon) + (1 - target) .* log(1 - a4 + epsilon)); + + [~, predicted_class] = max(a4); + actual_class = find(target == 1); + if predicted_class == actual_class + correct_count = correct_count + 1; + else + misclassified = [misclassified, i]; + end + end + costval = costval / size(x, 2); % Average the cost over all examples + accuracy = (correct_count / size(x, 2)) * 100; end -[X,Y] = meshgrid(xvals,yvals); - -figure(3) -clf -a2 = subplot(1,1,1); -Mval = Aval>Bval; -contourf(X,Y,Mval,[0.5 0.5]) -hold on -colormap([1 1 1; 0.8 0.8 0.8]) -plot(x1(1:5),x2(1:5),'ro','MarkerSize',12,'LineWidth',4) -plot(x1(6:10),x2(6:10),'bx','MarkerSize',12,'LineWidth',4) -a2.XTick = [0 1]; -a2.YTick = [0 1]; -a2.FontWeight = 'Bold'; -a2.FontSize = 16; -xlim([0,1]) -ylim([0,1]) - -print -dpng pic_bdy_bp.webp - - function costval = cost(W2,W3,W4,b2,b3,b4) - - costvec = zeros(10,1); - for i = 1:10 - x =[x1(i);x2(i)]; - a2 = activate(x,W2,b2); - a3 = 
activate(a2,W3,b3); - a4 = activate(a3,W4,b4); - costvec(i) = norm(y(:,i) - a4,2); - end - costval = norm(costvec,2)^2; - end % of nested function end ``` I have done the following changes - While loading `points.mat`, X is now normalized such that training can converge faster - The hidden layers has now been increased to 20 neurons each - `W4` and `b4` are now initialized with the number of classes in the dataset - The weights are initialized with a smaller scale (multiplied by 0.01), likely to maintain a tighter initial distribution, reducing the risk of saturation of neurons if a sigmoid activation function is used. - Hyperparameters tuning: - The initial learning rate has been reduced to `0.001` for more stable training - Momentum (`alpha`) is set to 0.89, which is used to update the weight changes. This is to help the neural net get out of local minima points so that a more important global minimum is found. - Added batch-size of `16` for mini-batch gradient descent, a balance between SGD and single gradient descent. - Added learning rate decay, which reduces the learning rate by a factor of `0.99` every `10000` iterations. This is to help the neural net converge to a global minimum. - Training process: - Introduce batch-aware training, which trains the neural net with a batch of points instead of a single point. This is to help the neural net converge faster. - Updated activation function to provide three options: `sigmoid`, `relu`, and `leaky_relu`. The latter two are used for the hidden layers, while the former is used for the output layer. - Update the backpropagation to compute LeaKy ReLU derivatives for the hidden layers. - Updated gradients over batches, and also update the weights with L2 regularization. - Finally, updated cost functions from norm-based error (MSE) to cross-entropy loss, which is more suitable for classification problems. - Activation function: The leaky ReLU activation function is explicitly defined, which helps to mitigate the “dying ReLU” problem where neurons can become inactive and only output zero. The following contains graphs of the training process: ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p1-acc.webp)![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p1-cost.webp)![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p1-correct.webp)![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p1-failed.webp) ## P2 Implement in Matlab the bisection and Newton’s method for finding roots of scalar equations. Use your implementation of the bisection method to find a root of a. $\frac{1}{x} - exp(2-\sqrt{x})$ in $[0.1, 1]$ b. $x*sin(x) − 1$ in $[0, 2]$. Use your implementation of Newton’s method and Matlab’s `fsolve` to find a root of a. $\frac{1}{x} - exp(2-\sqrt{x})$ with initial guess $x_0 = 1$ b. $x*sin(x) − 1$ with initial guess $x_0 = 2$ For the bisection method, use $\text{tol} = 10*{−10}$. For your Newton and `fsolve`, solve until $|f(x_nn)| \leq 10^{−10}$. If you are obtaining different roots, explain the differences. Also, discuss the number of iterations. 
_Solution_ ### Bisection method ```matlab title="bisection.m" function [root, fval, iter] = bisection(f, a, b, tol) if f(a) * f(b) >= 0 error('f(a) and f(b) must have opposite signs'); end iter = 0; while (b - a) / 2 > tol iter = iter + 1; c = (a + b) / 2; if f(c) == 0 break; end if f(a) * f(c) < 0 b = c; else a = c; end end root = (a + b) / 2; fval = f(root); end f1 = @(x) 1/x - exp(2 - sqrt(x)); a1 = 0.1; b1 = 1; tol = 1e-9; [root_bisect1, fval_bisect1, iter_bisect1] = bisection(f1, a1, b1, tol); f2 = @(x) x*sin(x) - 1; a2 = 0; b2 = 2; [root_bisect_2, fval_bisect_2, iter_bisect_2] = bisection(f2, a2, b2, tol); ``` ### Newton method ```matlab title="newton.m" function [root, fval, iter] = newton(f, df, x0, tol) maxIter = 1000; % Limit number of iterations to prevent infinite loop iter = 0; x = x0; fx = f(x); while abs(fx) > tol && iter < maxIter iter = iter + 1; x = x - fx / df(x); fx = f(x); end root = x; fval = fx; end df1 = @(x) -1/x^2 + (1/(2*sqrt(x))) * exp(2 - sqrt(x)); x1 = 1; [root_newton_1, fval_newton_1, iter_newton_1] = newton(f1, df1, x1, tol); df2 = @(x) sin(x) + x*cos(x); x2 = 2; [root_newton_2, fval_newton_2, iter_newton_2] = newton(f2, df2, x2, tol); ``` ### `fsolve` ```matlab title="fsolve.m" options = optimoptions('fsolve', 'Display', 'off', 'FunctionTolerance', 1e-9); [root_fsolve_1, fval_fsolve_1, exitflag_1, output_1] = fsolve(f1, x1, options); iter_fsolve_1 = output_1.iterations; [root_fsolve_2, fval_fsolve_2, exitflag_2, output_2] = fsolve(f2, x2, options); iter_fsolve_2 = output_2.iterations; ``` ### Table For $\frac{1}{x} - exp(2-\sqrt{x})$ | method | root $r$ | $f(r)$ | num. iterations | | --------- | -------- | ----------- | --------------- | | bisection | 0.2152 | 8.7809e-09 | 29 | | Newton | 28.6942 | -2.5223e-14 | 9 | | `fsolve` | 28.5131 | -3.7357e-04 | 12 | For $x*sin(x) − 1$ | method | root $r$ | $f(r)$ | num. iterations | | --------- | -------- | ----------- | --------------- | | bisection | 1.1142 | 4.3660e-10 | 30 | | Newton | -9.3172 | -2.4834e-11 | 5 | | `fsolve` | 1.1142 | -1.9488e-08 | 3 | ### Analysis For $\frac{1}{x} - exp(2-\sqrt{x})$ 1. Bisection method: The bisection method found a root in the interval $[0.1,1]$ as expected. This method guarantees convergence to a root when it exists within the interval and the function changes sign. However, it is generally slower, as indicated by the higher number of iterations. 2. Newton’s method: converged to a completely different root, which is outside the interval considered for the bisection method. This shows that Newton’s method is highly sensitive to the initial guess. It also converges faster (fewer iterations) but can lead to roots that are far from the initial guess if the function is complex or if the derivative does not behave well. 3. `fsolve`: Similar to Newton’s method, `fsolve` also found a root far from the interval used for the bisection method. Likely uses a variant of Newton’s method or a similar approach, which explains the similar behavior. For $x*sin(x) − 1$ 1. Bisection method: As with the first function, the bisection method finds a root within the specified interval. The method is reliable but slow, as seen from the number of iterations. 2. Newton’s method: converged to a negative root, which is quite far from the interval $[0,2]$. This indicates that for this particular function, the method diverged significantly from the initial guess due to the function’s complex behavior, especially when considering trigonometric functions combined with polynomial terms. 
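To make this sensitivity concrete with something runnable outside Matlab, here is a small Python sketch for $f_1(x) = \frac{1}{x} - e^{2-\sqrt{x}}$ (an added illustration, not the submitted `bisection.m`/`newton.m`; it uses the $\text{tol} = 10^{-10}$ from the problem statement and the same starting points as above): bisection can only return the root bracketed by $[0.1, 1]$, while Newton started at $x_0 = 1$ takes a large first step ($x_1 \approx 5.8$) and ends up at the other root near $28.7$.

```python
# Illustrative comparison of bisection and Newton on f1(x) = 1/x - exp(2 - sqrt(x)).
import math

f1 = lambda x: 1.0 / x - math.exp(2.0 - math.sqrt(x))
df1 = lambda x: -1.0 / x**2 + math.exp(2.0 - math.sqrt(x)) / (2.0 * math.sqrt(x))

def bisection(f, a, b, tol=1e-10):
    """Halve the bracket [a, b] until its half-length drops below tol."""
    fa, iters = f(a), 0
    while (b - a) / 2.0 > tol:
        iters += 1
        c = 0.5 * (a + b)
        if fa * f(c) <= 0.0:
            b = c
        else:
            a, fa = c, f(c)
    return 0.5 * (a + b), iters

def newton(f, df, x0, tol=1e-10, max_iter=100):
    """Iterate x <- x - f(x)/df(x) until |f(x)| <= tol."""
    x, iters = x0, 0
    while abs(f(x)) > tol and iters < max_iter:
        x -= f(x) / df(x)
        iters += 1
    return x, iters

print(bisection(f1, 0.1, 1.0))  # root ~ 0.2152 inside the bracket, after about 33 halvings
print(newton(f1, df1, 1.0))     # converges quickly, but to the far root ~ 28.7
print(newton(f1, df1, 0.3))     # a start inside the bracket recovers the root ~ 0.2152
```

The roots agree qualitatively with the Matlab table above; only the bisection iteration count differs slightly because the table was produced with a looser tolerance (`tol = 1e-9` in the code).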
**Discussion**: - Root Differences: The significant differences in roots, especially for Newton’s method and `fsolve`, highlight the sensitivity of these methods to initial guesses and the nature of the function. For complex functions, especially those with multiple roots, the choice of the initial guess can lead to convergence to entirely different roots. - Number of Iterations: Newton’s method and `fsolve` generally require fewer iterations than the bisection method, demonstrating their faster convergence rate. However, this comes at the cost of potentially finding different roots, as seen in the results. ## P3 The annuity equation is $A =\frac{P}{r}(1 − (1 + r)^{−n})$ where $A$ is borrowed amount, $P$ is the amount of each payment, $r$ is the interest rate per period, and there are $n$ equally spaced payments. - Write Newton’s method for finding $r$. - Implement the function `function r = interest(A, n, P)` which returns the annual interest rate. Your function must call `fsolve`. Ensure that `fsolve` uses the analytical form of the derivative. Report the values of `interest(100000, 20*12, 1000), interest(100000, 20*12, 100)`. Interpret the results. _Solution_ ### Newton’s function for finding $r$ Given $A =\frac{P}{r}(1 − (1 + r)^{−n})$ We have $f(r)=\frac{P}{r}(1 − (1 + r)^{−n}) - A$ Newton’s methods says $r_1 = r_0 - \frac{f(r_0)}{f^{'}(r_0)}$, with $f^{'}(r)=-\frac{P}{r^2}(1-(1+r)^{-n}) + \frac{Pn}{r}(1+r)^{-n-1}$ Thus, $r = r_0 - \frac{\frac{P}{r_0}(1 − (1 + r_0)^{−n}) - A}{-\frac{P}{r_0^2}(1-(1+r_0)^{-n}) + \frac{Pn}{r_0}(1+r_0)^{-n-1}}$ ### Implementation ```matlab title="interest.m" function r = interest(A, n, P) % Define the function f(r) function value = f(r) value = P/r * (1 - (1 + r)^-n) - A; end % Define the derivative f'(r) function value = df(r) value = -P/r^2 * (1 - (1 + r)^-n) + P*n/r * (1 + r)^(-n-1); end % Initial guess for r r_initial = 0.05; % A typical starting value for interest rates % Solve for r using fsolve options = optimoptions('fsolve', 'Display', 'none', 'SpecifyObjectiveGradient', true); r = fsolve(@(r) deal(f(r), df(r)), r_initial, options); end f_1000=interest(100000, 20*12, 1000) f_100=interest(100000, 20*12, 100) ``` From calculation, `f_100=-0.0099` and `f_1000=0.0500`. Given here that we use `r_0=0.05` (or 5% interests rate) For $P=1000$, the interest rate that satisfies the annuity equation is approximately 5% For $P=100$, the interest rate required to satisfy the loan conditions would have to be different from 5%. The negative value of the function (-0.0099) suggests that the actual interest rate required to meet the annuity equation under these conditions is lower than the initial guess of 5%. _Note that the initial value here affects Newton’s approximation vastly. If one changes to `1%` one might observe different value_ ## P4 Consider Newton’s method on $x^5 − x^3 − 4x = 0$ a. How do the computed approximations behave with $x_0 = 1$? b. Try your implementation with $x_0 = 1$ and $x_0 = 1 + 10^{−14}$. Explain why this method behaves differently, when started with $x_0 = 1 + 10^{−14}$, compared to when it is started with $x_0 = 1$. c. Solve also with `fsolve`. Comment on the results. 
_Solution_ Given the Newton’s implementation ```matlab function [root, fval, iter] = newton(f, df, x0, tol) maxIter = 1000; % Limit number of iterations to prevent infinite loop iter = 0; x = x0; fx = f(x); while abs(fx) > tol && iter < maxIter iter = iter + 1; x = x - fx / df(x); fx = f(x); end root = x; fval = fx; end % Define the function and its derivative f = @(x) x.^5 - x.^3 - 4*x; df = @(x) 5*x.^4 - 3*x.^2 - 4; % Initial guess x0 = 1 x0 = 1; tol = 1e-10; root = newton(f, df, x0, tol); disp(root); % Initial guess x0 = 1 + 10^-14 x0 = 1 + 1e-14; root = newton(f, df, x0, tol); disp(root); ``` a. The approximation converges to $x=1.0$. This indicates that the method finds a root at $x=1$. b. Newton’s Method with $x_0=1$: The approximation converges to $x=1.0$. This indicates that the method finds a root at $x=1$. Newton’s Method with $x_0=1+10^{−14}$: The approximation converges to a different value, approximately $x=1.600485180440241$. This suggests that a small change in the initial guess leads Newton’s method to converge to a different root, highlighting the method’s sensitivity to initial conditions. c. Using `fsolve` ```matlab options = optimoptions('fsolve','Display','none'); % Suppress fsolve output root_fsolve = fsolve(f, 1, options); disp(root_fsolve); ``` The fsolve result differs from the Newton’s method result for the same initial guess (yields `0`). This could be due to the inherent differences in the algorithms used by `fsolve`. [fsolve](https://www.mathworks.com/help/optim/ug/fsolve.html) in matlab uses `Levenberg-Marquardt`, which finds roots approximately by minimizing the sum of squares of the function and is quite robust, comparing to the heuristic implementation of the Newton’s implementation. ## P5 Implement Newton’s method for systems of equations. Each of the following systems of nonlinear equations may present some difficulty in computing a solution. Use Matlab’s `fsolve` and your own implementation of Newton’s method to solve each of the systems from the given starting point. In some cases, the nonlinear solver may fail to converge or may converge to a point other than a solution. When this happens, try to explain the reason for the observed behavior. Report for `fsolve` and your implementation of Newton’s method and each of the systems below, the number of iterations needed to achieve accuracy of $10^{−6}$ (if achieved). 
![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p5-a.webp)![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p5-b.webp)![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p5-c.webp)![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p5-d.webp) _Solution_ For all of the following Newton’s method implementation, it will follow the following framework ```matlab function p5 % Define the system of equations F function F = equations(x) end % Define the Jacobian of the system function J = jacobian(x) end % Initial guess x0 = ...; % Tolerance and maximum number of iterations tol = 1e-6; max_iter = 100; % Newton's method x = x0; for iter = 1:max_iter F_val = equations(x); J_val = jacobian(x); delta = -J_val \ F_val; % Solve for the change using the backslash operator x = x + delta; % Update the solution % Check for convergence if norm(delta, Inf) < tol fprintf('Newton''s method: Solution found after %d iterations.\n', iter); fprintf('x1 = %.6f, x2 = %.6f\n', x(1), x(2)); break; end end if iter == max_iter fprintf('Newton''s method: No solution found after %d iterations.\n', max_iter); end % fsolve method options = optimoptions('fsolve', 'Display', 'off', 'TolFun', tol, 'MaxIterations', max_iter); [x_fsolve, ~, exitflag, output] = fsolve(@equations, x0, options); if exitflag > 0 % fsolve converged to a solution fprintf('fsolve: Solution found after %d function evaluations.\n', output.funcCount); fprintf('x1 = %.6f, x2 = %.6f\n', x_fsolve(1), x_fsolve(2)); else fprintf('fsolve: No solution found, exit flag = %d.\n', exitflag); end end ``` Where `equations` is the Newton’s system of equations, `jacobian` is the Jacobian of the system, followed by `x` as the approximation and check for convergence. ### A. 
```matlab function p5 % Define the system of equations function F = equations(x) F = zeros(2,1); % Ensure F is a column vector F(1) = x(1) + x(2)*(x(2)*(5 - x(2)) - 2) - 13; F(2) = x(1) + x(2)*(x(2)*(1 + x(2)) - 14) - 29; end % Define the Jacobian of the system function J = jacobian(x) J = zeros(2,2); % Initialize J as a 2x2 matrix J(1,1) = 1; J(1,2) = (5 - 3*x(2))*x(2) - 2; J(2,1) = 1; J(2,2) = (1 + 3*x(2))*x(2) - 14; end % Initial guess x0 = [15; -2]; % Tolerance and maximum number of iterations tol = 1e-6; max_iter = 100; % Newton's method x = x0; for iter = 1:max_iter F_val = equations(x); J_val = jacobian(x); delta = -J_val \ F_val; % Solve for the change using the backslash operator x = x + delta; % Update the solution % Check for convergence if norm(delta, Inf) < tol fprintf('Newton''s method: Solution found after %d iterations.\n', iter); fprintf('x1 = %.6f, x2 = %.6f\n', x(1), x(2)); break; end end if iter == max_iter fprintf('Newton''s method: No solution found after %d iterations.\n', max_iter); end % fsolve method options = optimoptions('fsolve', 'Display', 'off', 'TolFun', tol, 'MaxIterations', max_iter); [x_fsolve, ~, exitflag, output] = fsolve(@equations, x0, options); if exitflag > 0 % fsolve converged to a solution fprintf('fsolve: Solution found after %d function evaluations.\n', output.iterations); fprintf('x1 = %.6f, x2 = %.6f\n', x_fsolve(1), x_fsolve(2)); else fprintf('fsolve: No solution found, exit flag = %d.\n', exitflag); end end ``` yields ```prolog Newton's method: Solution found after 23 iterations. x1 = 5.000000, x2 = 4.000000 fsolve: No solution found, exit flag = -2. ``` Since Newton’s method is locally convergent, meaning that if the starting point is close enough to the actual solution, it will usually converge quickly, and in this case, it did. This means the initial guess was sufficiently close with the true solution. However, `fsolve` did not converge to a solution. Since `fsolve` uses Levenberg-Marquardt algorithm (This algorithm is a [trust-region type algorithm](https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm), which is a combination of the Gauss-Newton algorithm and the method of gradient descent), it does come with limitation: 1. The Levenberg-Marquardt algorithm can be sensitive to the starting values. If the initial guess is not sufficiently close to the true solution, the algorithm may not converge. (which we observed) 2. Local minima: The algorithm may converge to a local minimum instead of a global minimum, especially if the function landscape is complex with multiple minima. exit flag of -2 means that the two consecutive steps taken by the algorithm were unable to decrease the residual norm, and the algorithm terminated prematurely. 
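As a quick check (this snippet is hypothetical and not part of the submitted code), substituting the point reported by Newton’s method back into system A confirms that it is an exact root, consistent with the claim that the initial guess was close enough for local convergence:

```matlab
% residuals of system A at the reported solution (5, 4)
x = [5; 4];
F1 = x(1) + x(2)*(x(2)*(5 - x(2)) - 2) - 13;   % 5 + 4*(4*(5-4) - 2) - 13 = 0
F2 = x(1) + x(2)*(x(2)*(1 + x(2)) - 14) - 29;  % 5 + 4*(4*(1+4) - 14) - 29 = 0
fprintf('residuals at (5, 4): %g, %g\n', F1, F2);
```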
### B ```matlab function p5 % Define the system of equations function F = equations(x) F = zeros(3,1); % Ensure F is a column vector F(1) = x(1)^2 + x(2)^2 + x(3)^2 - 5; F(2) = x(1) + x(2) - 1; F(3) = x(1) + x(3) - 3; end % Define the Jacobian of the system function J = jacobian(x) J = zeros(3,3); % Initialize J as a 3x3 matrix J(1,1) = 2*x(1); J(1,2) = 2*x(2); J(1,3) = 2*x(3); J(2,1) = 1; J(2,2) = 1; J(2,3) = 0; J(3,1) = 1; J(3,2) = 0; J(3,3) = 1; end % Initial guess x0 = [(1+sqrt(3))/2; (1-sqrt(3))/2; sqrt(3)]; % Tolerance and maximum number of iterations tol = 1e-6; max_iter = 100; % Newton's method x = x0; for iter = 1:max_iter F_val = equations(x); J_val = jacobian(x); delta = -J_val \ F_val; % Solve for the change using the backslash operator x = x + delta; % Update the solution % Check for convergence if norm(delta, Inf) < tol fprintf('Newton''s method: Solution found after %d iterations.\n', iter); fprintf('x1 = %.6f, x2 = %.6f, x3 = %.6f\n', x(1), x(2), x(3)); break; end end if iter == max_iter fprintf('Newton''s method: No solution found after %d iterations.\n', max_iter); end % fsolve method options = optimoptions('fsolve', 'Display', 'off', 'TolFun', tol, 'MaxIterations', max_iter); [x_fsolve, ~, exitflag, output] = fsolve(@equations, x0, options); if exitflag > 0 % fsolve converged to a solution fprintf('fsolve: Solution found after %d function evaluations.\n', output.iterations); fprintf('x1 = %.6f, x2 = %.6f, x3 = %.6f\n', x_fsolve(1), x_fsolve(2), x_fsolve(3)); else fprintf('fsolve: No solution found, exit flag = %d.\n', exitflag); end end ``` yields ```prolog Newton's method: Solution found after 57 iterations. x1 = 1.666667, x2 = -0.666667, x3 = 1.333333 fsolve: Solution found after 6 function evaluations. x1 = 1.000000, x2 = 0.000000, x3 = 2.000000 ``` Newton’s method takes steps based directly on the local derivative information, potentially taking large steps when far from the solution and smaller steps when closer. fsolve, when using the Levenberg-Marquardt algorithm, combines aspects of the gradient descent method (which takes smaller, more cautious steps) with the Gauss-Newton method (which is more aggressive). This can lead to different paths through the solution space and convergence to different solutions fsolve might have found a local minimum, which it mistook for a global minimum, while Newton’s method might have bypassed this due to its larger initial steps. 
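It may help to note why both answers are legitimate: eliminating $x_2 = 1 - x_1$ and $x_3 = 3 - x_1$ (from the second and third equations) reduces the first equation to a quadratic, so the system has exactly two real solutions:

$$
\begin{aligned}
x_1^2 + (1 - x_1)^2 + (3 - x_1)^2 &= 5 \\\
3x_1^2 - 8x_1 + 5 &= 0 \implies x_1 = \tfrac{5}{3} \text{ or } x_1 = 1
\end{aligned}
$$

The first value gives Newton’s $(\tfrac{5}{3}, -\tfrac{2}{3}, \tfrac{4}{3})$ and the second gives `fsolve`’s $(1, 0, 2)$; which one a solver lands on depends on the path it takes from the shared starting point.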
### C ```matlab function p5 % Define the system of equations function F = equations(x) F = zeros(4,1); % Ensure F is a column vector F(1) = x(1) + x(2)*10; F(2) = sqrt(5)*(x(3) - x(4)); F(3) = (x(2)-x(3))^2; F(4) = sqrt(10)*(x(1)-x(4))^2; end % Define the Jacobian of the system function J = jacobian(x) J = zeros(4,4); % Initialize J as a 3x3 matrix J(1, :) = [1, 10, 0, 0]; J(2, :) = [0, 0, sqrt(5), -sqrt(5)]; J(3, :) = [0, 2*(x(2) - x(3)), -2*(x(2) - x(3)), 0]; J(4, :) = [2*sqrt(10)*(x(1) - x(4)), 0, 0, -2*sqrt(10)*(x(1) - x(4))]; end % Initial guess x0 = [1; 2; 1; 1]; % Tolerance and maximum number of iterations tol = 1e-6; max_iter = 100; % Newton's method x = x0; for iter = 1:max_iter F_val = equations(x); J_val = jacobian(x); delta = -J_val \ F_val; % Solve for the change using the backslash operator x = x + delta; % Update the solution % Check for convergence if norm(delta, Inf) < tol fprintf('Newton''s method: Solution found after %d iterations.\n', iter); fprintf('x1 = %.6f, x2 = %.6f, x3 = %.6f, x4 = %.6f\n', x); break; end end if iter == max_iter fprintf('Newton''s method: No solution found after %d iterations.\n', max_iter); end % fsolve method options = optimoptions('fsolve', 'Display', 'iter', 'TolFun', tol, 'MaxIterations', max_iter); [x_fsolve, fval, exitflag, output] = fsolve(@equations, x0, options); if exitflag > 0 % fsolve converged to a solution fprintf('fsolve: Solution found after %d function evaluations.\n', output.funcCount); fprintf('x1 = %.6f, x2 = %.6f, x3 = %.6f, x4 = %.6f\n', x_fsolve); else fprintf('fsolve: No solution found, exit flag = %d.\n', exitflag); end end ``` yields ```prolog Newton's method: No solution found after 100 iterations. fsolve: Solution found after 35 function evaluations. x1 = -0.002673, x2 = 0.000267, x3 = 0.000407, x4 = 0.000407 ``` Newton’s method cannot find convergence after 100 steps because of divergence, if the initial guess is not close to the root, especially in the presence of steep gradients or saddle points. The Jacobian matrix at some point during the iteration may become ill-conditioned, which would lead to large numerical errors in the computation of the inverse or the solution of the linear system in each iteration `fsolve` converged here, meaning Levenberg-Marquardt is probably more robust in converging a local minima in this case. 
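One concrete way to see the ill-conditioning mentioned above: the third and fourth rows of the Jacobian defined in the code are proportional to $(x_2 - x_3)$ and $(x_1 - x_4)$, so they vanish at the root $x = 0$ and become nearly zero as the iterates approach it:

$$
J(0) = \begin{bmatrix} 1 & 10 & 0 & 0 \\ 0 & 0 & \sqrt{5} & -\sqrt{5} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
$$

This singular Jacobian is consistent with Newton’s method stalling near the solution, while the Levenberg-Marquardt step in `fsolve` (which damps the linear solve) can still make progress.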
### D ```matlab function p5 % Define the system of equations function F = equations(x) F = zeros(2,1); % Ensure F is a column vector F(1) = x(1); F(2) = 10*x(1) / (x(1) + 0.1) + 2*x(2)^2; end % Define the Jacobian of the system function J = jacobian(x) J = zeros(2,2); % Initialize J as a 2x2 matrix J(1, :) = [1, 0]; J(2, :) = [10*(0.1)/(x(1) + 0.1)^2, 4*x(2)]; end % Initial guess x0 = [1.8; 0]; % Tolerance and maximum number of iterations tol = 1e-6; max_iter = 100; % Newton's method x = x0; for iter = 1:max_iter F_val = equations(x); J_val = jacobian(x); delta = -J_val \ F_val; % Solve for the change using the backslash operator x = x + delta; % Update the solution % Check for convergence if norm(delta, Inf) < tol fprintf('Newton''s method: Solution found after %d iterations.\n', iter); fprintf('x1 = %.6f, x2 = %.6f\n', x(1), x(2)); break; end end if iter == max_iter fprintf('Newton''s method: No solution found after %d iterations.\n', max_iter); end % fsolve method options = optimoptions('fsolve', 'Display', 'off', 'TolFun', tol, 'MaxIterations', max_iter); [x_fsolve, ~, exitflag, output] = fsolve(@equations, x0, options); if exitflag > 0 % fsolve converged to a solution fprintf('fsolve: Solution found after %d function evaluations.\n', output.iterations); fprintf('x1 = %.6f, x2 = %.6f\n', x_fsolve(1), x_fsolve(2)); else fprintf('fsolve: No solution found, exit flag = %d.\n', exitflag); end end ``` yields ```prolog Newton's method: No solution found after 100 iterations. fsolve: Solution found after 15 function evaluations. x1 = 0.000000, x2 = -0.000316 ``` For Newton’s method: - Non-convergence: The fact that Newton’s method did not converge could be due to several factors such as a poor initial guess, especially since $x_1=0$ is one of the solutions, which may lead to a division by zero or a derivative that does not exist at some point during the iteration. - Sensitive Derivative: The function $\frac{10x_1}{(x_1+0.1)}$ has a derivative that becomes very large as $x_1$ approaches -0.1, and this can cause numerical issues, such as overflow or large rounding errors, which can prevent convergence. - Flat Regions: The method might be getting stuck in a flat region of the function where the gradient is very small, leading to very small steps that do not significantly change the estimate of the solution. For `fsolve` observation, same arguments can be made that of similar to problem C observation, with regards to robustness of Levenberg-Marquardt algorithm in solving this system of non-linear equations. ## P6 You are given the data file [data.txt](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/compsci-4x03/data.txt). Each row contains the 2D coordinates $(x_i, y_i)$ of an object at time $t_i$. This object exhibits a periodic motion. Implement the function `function period = findPeriod(file_name)` that reads the data from a file and computes the period of the periodic motion. The points in time where the object returns to the same position must be determined using fsolve. Report the value for the computed period. 
_Solution_ ```matlab title="findPeriod.m" function period = findPeriod(file_name) % Parse the data from the file data = parse(file_name); % Extract time, x, and y coordinates t = data(:, 1); x = data(:, 2); y = data(:, 3); % Define a tolerance for how close the object needs to be to its initial position tolerance = 1e-9; % Define interpolation functions for x and y, restricted to the range of data x_interp = @(tq) interp1(t, x, tq, 'spline', 'extrap'); y_interp = @(tq) interp1(t, y, tq, 'spline', 'extrap'); % Define the distance function from the initial position distance_from_initial = @(tq) sqrt((x_interp(tq) - x(1))^2 + (y_interp(tq) - y(1))^2); % Initial guess for fsolve - use the midpoint of the time data initial_guess = t(floor(length(t)/2)); % Use fsolve to find the time at which the distance is minimized options = optimoptions('fsolve', 'Display', 'iter', 'TolFun', tolerance, 'MaxFunEvals', 10000); t_period = fsolve(distance_from_initial, initial_guess, options); % Calculate the period period = t_period - t(1); end function data = parse(file_name) % Open the file fid = fopen(file_name, 'rt'); if fid == -1 error('Failed to open file: %s', file_name); end % Read the data from the file % Assuming the data is separated by spaces or tabs data = fscanf(fid, '%f %f %f', [3, Inf]); % Transpose the data to have rows as individual entries data = data'; % Close the file fclose(fid); end ``` yields the computed period of `39.3870` ## P7 Consider two bodies of masses $\mu = 0.012277471$ and $\hat{\mu} = 1 - \mu$ (Earth and Sun) in a planar motion, and a third body of negligible mass (moon) moving in the same plane. The motion is given by $u_1^{''} = u_1 + 2u_2^{'} -\hat{\mu}\frac{u_1+\mu}{((u_1+\mu)^2+u_2^2)^{\frac{3}{2}}}-\mu\frac{(u_1-\hat{\mu})}{((u_1-\hat{\mu})^2+u_2^2)^{\frac{3}{2}}}$ and $u_2^{''} = u_2 - 2u_1^{'} - \hat{\mu}\frac{u_2}{((u_1+\mu)^2 + u_2^2)^{\frac{3}{2}}} - \mu\frac{u_2}{((u_1 - \hat{\mu})^2 + u_2^2)^{\frac{3}{2}}}$ The initial values are $u_1(0) = 0.994$, $u_1^{'}(0) = 0$, $u_2(0) = 0$, $u_2^{'}(0) = −2.001585106379082522420537862224$. Implement the classical Runge-Kutta method of order 4 and integrate this problem on $[0,17.1]$ with uniform stepsize using 100, 1000, 10,000, and 20,000 steps. Plot the orbits for each case. How many uniform steps are needed before the orbit appears to be qualitatively correct? Submit plots and discussion. 
_Solution_ ```matlab title="rk4.m" function rk4 % Constants mu = 0.012277471; mu_hat = 1 - mu; % Initial Conditions u0 = [0.994, 0, 0, -2.001585106379082522420537862224]; % Time Span t_span = [0 17.1]; % Solve for different step sizes step_sizes = [100, 1000, 10000, 20000]; for i = 1:length(step_sizes) solve_with_steps(t_span, u0, step_sizes(i), mu, mu_hat); end end function solve_with_steps(t_span, u0, steps, mu, mu_hat) % RK4 Integration h = (t_span(2) - t_span(1)) / steps; t = linspace(t_span(1), t_span(2), steps); u = zeros(length(u0), length(t)); u(:,1) = u0'; for i = 1:length(t)-1 k1 = equations(t(i), u(:,i), mu, mu_hat); k2 = equations(t(i) + h/2, u(:,i) + h/2*k1, mu, mu_hat); k3 = equations(t(i) + h/2, u(:,i) + h/2*k2, mu, mu_hat); k4 = equations(t(i) + h, u(:,i) + h*k3, mu, mu_hat); u(:,i+1) = u(:,i) + h/6 * (k1 + 2*k2 + 2*k3 + k4); end % Plotting figure; plot(u(1,:), u(3,:)); xlabel('u1'); ylabel('u2'); title(sprintf('Orbit of the Third Body with RK4 (%d Steps)', steps)); grid on; % NOTE: The below is the correct approximation using ode45, % but for the sake of this assignment, we implement RK4 % t_eval = linspace(t_span(1), t_span(2), steps); % [T, U] = ode45(@(t,u) equations(t, u, mu, mu_hat), t_eval, u0); % % Plotting % figure; % plot(U(:,1), U(:,3)); % xlabel('u1'); % ylabel('u2'); % title(sprintf('Orbit of the Third Body (%d Steps)', steps)); % grid on; end function dudt = equations(t, u, mu, mu_hat) u1 = u(1); u1_prime = u(2); u2 = u(3); u2_prime = u(4); delta1 = ((u1 + mu)^2 + u2^2)^1.5; delta2 = ((u1 - mu_hat)^2 + u2^2)^1.5; du1dt = u1_prime; du1_primedt = u1 + 2*u2_prime - mu_hat*(u1 + mu)/delta1 - mu*(u1 - mu_hat)/delta2; du2dt = u2_prime; du2_primedt = u2 - 2*u1_prime - mu_hat*u2/delta1 - mu*u2/delta2; dudt = [du1dt; du1_primedt; du2dt; du2_primedt]; end ``` yields the following graph ### 100 steps ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p7-100.webp) It appears that the plot for the first 100 steps of the three-body problem using the RK4 method in MATLAB shows a spiral pattern rather than the expected closed orbit. This divergence could be due to several factors: Step Size: A step size of 100 may be too large to accurately capture the dynamics of the system, leading to significant numerical errors. The three-body problem is known for its sensitivity to initial conditions and step sizes, and thus requires a smaller step size for a more accurate solution. Numerical Stability: The RK4 method, while fourth-order accurate for each step, is not guaranteed to be stable for all step sizes and problems. ### 1000 steps ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p7-1000.webp) The plot for the 1000 steps case shows a significant improvement over the 100 steps case. This plot demonstrates a more defined and coherent orbit, which suggests that the step size is more appropriate for capturing the dynamics of the system. The spiral pattern from the 100 steps case is less pronounced, and the orbit begins to resemble the expected closed path of the three-body problem. However, there is still some noticeable deviation and distortion in the orbit, which indicates that while the solution is converging towards the correct behavior with a smaller step size, further refinement might be necessary. 
In practice, continuing to reduce the step size can help further improve the accuracy of the orbit.

### 10000 steps

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p7-10000.webp)

The plot for 10,000 steps demonstrates a substantial improvement and now depicts a closed orbit, which is characteristic of the three-body problem when solved with sufficient numerical accuracy. This indicates that a step size small enough to capture the system’s dynamics accurately has been achieved, and the RK4 method is yielding a reliable approximation of the moon’s orbit.

The orbit is smooth and does not exhibit the distortions seen in the plots with fewer steps. This suggests that the numerical integration is now sufficiently resolving the trajectory over the time span of interest. With 10,000 steps, it appears that the orbit is qualitatively correct, showing the expected behavior of a third body under the gravitational influence of the other two massive bodies.

- Sufficient resolution: 10,000 steps seem to provide a high enough resolution for the RK4 method to produce a stable and accurate orbit.
- Numerical accuracy: the smaller step size has reduced the numerical errors to a level where they do not significantly affect the qualitative behavior of the solution.
- Orbit stability: the closed and stable orbit indicates that the solution is likely converging to the true physical behavior of the system.

### 20000 steps

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p7-20000.webp)

The plot for 20,000 steps exhibits a very stable and well-defined orbit, which closely resembles the plot for 10,000 steps. This consistency between the two resolutions suggests that the numerical solution has converged, and increasing the step count further does not result in any significant changes to the orbit’s shape or accuracy.

_NOTE:_ The above implementation codes RK4 manually, since `ode45` is an adaptive method and does not conform to the fixed-step RK4 required here. An equivalent [Python implementation](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/compsci-4x03/rk4.py) using RK45 is also available.

## P8

The following system of ODEs, formulated by [Lorenz](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/Chaos#as-system), represents a crude model of atmospheric circulation:

$$
\begin{align}
y_1^{'} &= \sigma(y_2-y_1) \\\
y_2^{'} &= ry_1 - y_2 - y_1y_3 \\\
y_3^{'} &= y_1y_2 - by_3
\end{align}
$$

Set $\sigma = 10, b = \frac{8}{3}, r = 28$, take initial values $y_1(0) = 15$, $y_2(0) = 15$, and $y_3(0) = 36$, and integrate this ODE from $t = 0$ to $t = 100$ using Matlab’s `ode45`. Plot each component of the solution as a function of $t$. Plot also $(y_1, y_2)$, $(y_1, y_3)$, and $(y_2, y_3)$ (in separate plots). Change the initial values by a tiny amount (e.g. $10^{−10}$) and integrate again. Compare the difference in the computed solutions.
_Solution_ ```matlab title="lorenz.m" function lorenz % Parameters sigma = 10; b = 8/3; r = 28; % Initial conditions y0 = [15; 15; 36]; % Time span tspan = [0 100]; % Solve the ODE [t, Y] = ode45(@(t,y) lorenzODE(t, y, sigma, b, r), tspan, y0); % Plotting the solutions figure; subplot(2, 2, 1); plot(t, Y(:,1), t, Y(:,2), t, Y(:,3)); title('Time Series of y1, y2, y3'); legend('y1', 'y2', 'y3'); xlabel('Time'); ylabel('Values'); subplot(2, 2, 2); plot(Y(:,1), Y(:,2)); title('y1 vs y2'); xlabel('y1'); ylabel('y2'); subplot(2, 2, 3); plot(Y(:,1), Y(:,3)); title('y1 vs y3'); xlabel('y1'); ylabel('y3'); subplot(2, 2, 4); plot(Y(:,2), Y(:,3)); title('y2 vs y3'); xlabel('y2'); ylabel('y3'); % Modify initial conditions and solve again y0_mod = y0 + 1e-10; [t_mod, Y_mod] = ode45(@(t,y) lorenzODE(t, y, sigma, b, r), tspan, y0_mod); % Interpolate Y_mod to match the time points of t Y_mod_interp = interp1(t_mod, Y_mod, t); % Compute the differences Y_diff = Y - Y_mod_interp; % Plot the differences figure; plot(t, Y_diff); title('Difference in Solutions with Modified Initial Conditions'); legend('Δy1', 'Δy2', 'Δy3'); xlabel('Time'); ylabel('Difference in Values'); end function dydt = lorenzODE(t, y, sigma, b, r) % Lorenz system ODEs dydt = [sigma*(y(2) - y(1)); r*y(1) - y(2) - y(1)*y(3); y(1)*y(2) - b*y(3)]; end ``` ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p8-four-graphs.webp) The difference between the time series graph as shown ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p8-delta.webp) and diff between Lorenz are shown ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/a4-p8-delta-funcs.webp) The difference in the time series of $y_1, y_2, y_3$ indicates how sensitive the Lorenz system is to initial conditions. Even though the change in the initial conditions is extremely small (on the order of $10^{-10}$), the differences in the variables grow over time. This divergence is a characteristic of [chaotic](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/A4/../../../../../../../../thoughts/Chaos#as-system) systems and is known as sensitivity to initial conditions or the butterfly effect. Difference in Phase Space Graphs, or the delta plots show that the discrepancies between the two sets of solutions with slightly different initial conditions also exhibit complex behavior. Initially, the differences are small, but as time progresses, they become more pronounced, indicating that the system’s trajectory has deviated significantly from the original path. ## P9 Let A be an $n × n$ singular matrix. Let $F(X) = I − AX$ where $I$ is the $n × n$ identity matrix. When $F(X)$ is the zero $n × n$ matrix, then $X = A^{-1}$. We can use Newton’s method to find $A^{−1}$: $X_{k+1} = X_k + A^{−1}(I − AX_k)$ We replace $A^{-1}$ by $X_k$ to obtain the formula $X_{k+1} = X_k + X_k(I − AX_k)$ (1) a. Write a function to compute the inverse of a given matrix A using (1). You can use as an initial guess $X_0 = \frac{A^T}{{|A|}_1{|A|}_{\infty}}$ Test your program on a few random matrices and report numerical experiments comparing its accuracy and efficiency with Matlab’s inverse function `inv`. b. Does (1) converge quadratically? 
Provide sufficient detail supporting your claim. _Solution_ a. The following entails the MATLAB solution (Python equivalent is [inverse\_newt.py](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/compsci-4x03/inverse_newt.py)) ```matlab title="matrix_inverse_newt.m" function A_inv = matrix_inverse_newt(A) tol = 1e-9; max_iter = 100; n = size(A, 1); I = eye(n); Xk = A' / (norm(A, 1) * norm(A, inf)); for k = 1:max_iter Rk = I - A * Xk; Xk_new = Xk + Xk * Rk; % Stopping criterion based on the norm of the residual matrix if norm(Rk) < tol break; end Xk = Xk_new; end A_inv = Xk; end % Test the function with a random matrix rng(0); % Seed for reproducibility n = 4; % Size of the matrix A = rand(n, n); A_inv = matrix_inverse_newt(A); % Compare with MATLAB's built-in inverse function A_inv_true = inv(A); disp('Inverse using Newton''s method:'); disp(A_inv); disp('True Inverse:'); disp(A_inv_true); disp('Difference:'); disp(A_inv - A_inv_true); ``` yields ```prolog Inverse using Newton's method: -15.2997 3.0761 14.7235 9.6445 -0.2088 -1.8442 1.0366 1.8711 14.5694 -1.9337 -14.6497 -9.0413 -0.3690 0.5345 1.4378 -0.4008 True Inverse: -15.2997 3.0761 14.7235 9.6445 -0.2088 -1.8442 1.0366 1.8711 14.5694 -1.9337 -14.6497 -9.0413 -0.3690 0.5345 1.4378 -0.4008 Difference: 1.0e-09 * 0.6111 -0.1011 -0.6027 -0.3839 0.0353 -0.0058 -0.0348 -0.0222 -0.5881 0.0973 0.5800 0.3694 0.0273 -0.0045 -0.0269 -0.0171 ``` b. For quadratic convergence, we need $lim_{k\to\infty} \frac{|e_{k+1}|}{|e_k^2|} = C$ In this case, we need to check $E_k = X_k - A^{-1}$ as k increases. $X_{k+1} = X_k + A^{−1}(I − AX_k)$ Substitute $E_k$ we have $E_{k+1}=(I-X_kA)E_k$ > Therefore, for quadratic convergence, we need $|E_{k+1}| \leq C|E_k|^2$ for some constant $C$ From this, we have $|E_{k+1}| = |(I-X_kA)E_k| \leq |I-X_kA||E_k|$ The following modification of Python implementation is used to track errors ```python title="newt_err.py" import os, numpy as np def errors(A): A_inv = matrix_inverse_newt(A) # Recalculate the inverse and get the error at each iteration A_inv_newton, errors_newton = matrix_inverse_newt_err(A) # Now we will check for quadratic convergence by calculating the ratio of errors ratios = [] for i in range(1, len(errors_newton)-1): ratios.append(errors_newton[i+1] / errors_newton[i]**2) return ratios def matrix_inverse_newt_err(A, tol=1e-9, max_iter=100): n = A.shape[0] I = np.eye(n) A_inv_true = np.linalg.inv(A) # True inverse for error calculation Xk = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf)) errors = [] # List to track errors over iterations for _ in range(max_iter): Rk = I - np.dot(A, Xk) Xk_new = Xk + np.dot(Xk, Rk) # Calculate and store the current error current_error = np.linalg.norm(Xk_new - A_inv_true) errors.append(current_error) # Stopping criterion based on the norm of the residual matrix if current_error < tol: break Xk = Xk_new return Xk, errors if __name__ == "__main__": # Test the function with a random matrix np.random.seed(420) # Seed for reproducibility n = 4 # Size of the matrix A = np.random.rand(n, n) print(errors(A)) ``` From the Python implementation, we calculated the ratio of the error at step $n+1$ to the square of the error at step $n$, and observed that these ratios seemed to stabilize around a constant value, rather than decreasing to zero. However, the ratios did not significantly deviate, indicating a consistent rate of convergence that could be quadratic. 
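This observation lines up with what the error recursion predicts. Assuming a submultiplicative matrix norm, and using $I - X_kA = -E_kA$ (which follows directly from $E_k = X_k - A^{-1}$), the recursion $E_{k+1} = (I - X_kA)E_k$ gives

$$
|E_{k+1}| = |E_kAE_k| \leq |A|\,|E_k|^2
$$

so the ratio $|E_{k+1}|/|E_k|^2$ should indeed stay bounded (by roughly $|A|$) rather than shrink to zero.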
To assert that the convergence is quadratic, we would expect the ratios to be bounded and for $|E_{k+1}|$ to be significantly less than $|E_k|^2$ as $n$ increases. The results from Python showed that the error does decrease from one iteration to the next, which is consistent with convergence. > The discrepancy between the theoretical expectation of quadratic convergence and the observed stabilisation of the error ratios might suggest that while the Newton’s method for matrix inversion is converging, it may not exhibit pure quadratic convergence in the empirical test we conducted. There could be several reasons for this: - The matrix $A$ used in the test may not meet the conditions required for quadratic convergence throughout the iterations. - The numerical precision and floating-point representation in Python may affect the calculation of the error and its ratios. --- slug: thoughts/university/twenty-three-twenty-four/compsci-4x03/Equations tags: - fruit - swfr4x03 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/Equations" title: "ODEs, Polynomials approx., Linear Least Squares, and Errors" date: 2023-12-06 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/Equations.html.md --- ### Machine epsilon $fl(x) = x(1+\mathbf{\epsilon}) \space\text{where }|\epsilon|\leq{u}$ $|\frac{fl(x)-x}{x}|=|\epsilon|\leq u \space\text{is called relative error.}$ $\text{Cancellations occur when subtracting nearby number containing roundoff.}$ ### Taylor series $$ \begin{aligned} f(x) &= \sum_{k=0}^{\inf}\frac{f^{(k)}(c)}{k!}(x-c)^k\\\ E_{n+1} &= \frac{f^{(n+1)}(\xi)}{(n+1)!}(h:=x-c)^{n+1}\\\ |E_{n+1}| \leq ch^{n+1}\\\ \end{aligned} $$ ### Polynomial Interpolation $$ \begin{aligned} v(x) = &\sum_{j=0}^{n}c_j\phi_{j}(x) \space \rightarrow \text{linearly independent iff} \space v(x) = 0 \space \forall \space x \rightarrow c_j=0 \space \forall \space j)\\\ &\\\ \text{Linear system: } &\begin{bmatrix} \phi_0(x_0) & \phi_1(x_0) & \cdots & \phi_n(x_0) \\ \phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_n(x_1) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_0(x_n) & \phi_1(x_n) & \cdots & \phi_n(x_n) \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{bmatrix} \end{aligned} $$ $$ \begin{aligned} \text{Monomial basis: }&\phi_j(x)=x^j, \space j=0,1,...,n \space \rightarrow v(x)=\sum_{j=0}^{n}c_jx^j\\\ &p_n(x_i) = c_0 + c_1x_i + c_2x_i^2 + \cdots + c_nx_i^n = y_i \\\ &\\\ X: &\text{Vandermonde matrix} \rightarrow \text{det}(X)=\prod_{i=0}^{n-1} \left[ \prod_{j=i+1}^{n} (x_j - x_i) \right]\\\ \text{if } &x_i \space\text{are distinct:}\\\ &\bullet\space \text{det}(X) \neq 0\\\ &\bullet\space X\space \text{is nonsingular}\\\ &\bullet\space \text{system has unique solution}\\\ &\bullet\space \text{unique polynomial of degree}\leq{n}\space \text{that interpolates the data}\\\ &\bullet\space \text{can be poorly conditioned, work is }O(n^3)\\\ \end{aligned} $$ $$ \begin{aligned} \text{Lagrange basis: }&L_j(x_i) = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} \\\ &L_j(x) = \prod_{i=0,i\neq{j}}^{n}\frac{x-x_i}{x_j-x_i}\\\ &p_n(x_i) = \sum_{j=0}^{n} y_jL_j(x_i) = \sum_{j=0}^{i-1} y_jL_j(x_i) + y_iL_i(x_i) + \sum_{j=i+1}^{n} y_jL_j(x_i) = y_i\\\ \end{aligned} $$ $$ \begin{aligned} \text{Newton's basis: }&\phi_j(x)=\prod_{i=0}^{j-1}(x-x_i), j=0:n\\\ &p_n(x_i)=c_0 + c_1(x_i-x_0)+ \cdots + c_n(x_i-x_0)(x_i-x_1)\cdots(x_i-x_{n-1})=f(x_i)\\\ 
\end{aligned} $$ $$ \begin{aligned} &\text{Divided differences: }f[x_i,\cdots,x_j] = \frac{f[x_{i+1},\cdots,x_j]-f[x_i,\cdots,x_{j-1}]}{x_j-x_i}\\\ &\bullet\space\text{at } x=x_0 \text{ then } c_0 = f(x_0) = f[x_0]\\\ &\bullet\space\text{at } x=x_1 \text{ then } c_1 = \frac{f(x_1)-f(x_0)}{x_1-x_0} = f[x_0, x_1]\\\ &\bullet\space\text{at } x=x_2 \text{ then } c_2 = \frac{f(x_2)-c_0-c_1(x_2-x_0)}{(x_2-x_0)(x_2-x_1)} = \frac{\frac{f(x_2)-f(x_1)}{x_2-x_1}-\frac{f(x_1)-f(x_0)}{x_1-x_0}}{x_2-x_0} = f[x_0, x_1, x_2]\\\ &\\\ &\therefore\forall x\in{[a,b]}\space\exists\space\xi=\xi(x)\in(a,b)\space : \space f(x)-p_n(x)=\frac{f^{n+1}(\xi)}{(n+1)!} \prod_{i=0}^{n} (x - x_i)\\\ &\therefore\space\text{Error: } |f(x)-p_n(x)|\leq\frac{M}{4(n+1)}h^{n+1}\\\ &\text{where: }\\\ &\bullet\space M=max_{a\leq{t}\leq{b}}|f^{n+1}(t)|\\\ &\bullet\space h=\frac{b-a}{n}\\\ &\bullet\space x_i=a+ih \text{ for }i=0,1,\cdots,n \end{aligned} $$ ### Basic Numeric Integration $$ \begin{aligned} &I_f = \int_{a}^{b}{f(x)dx} \approx \sum_{j=0}^{n}a_jf(x_j)\space\text{(quadrature rule)}\\\ &\bullet\space x_0,\cdots,x_n\space\text{be distinct points in } [a,b]\\\ &\bullet\space p_n(x)\space\text{be interpolating polynomial of }f\rightarrow\space \int_{a}^{b}f(x)dx\approx\int_{a}^{b}p_n(x)dx\\\ &\bullet\space \text{Uses Lagrange form: }\int_{a}^{b}f(x)dx\approx\sum_{j=0}^{n}f(x_j)\int_{a}^{b}L_j(x)dx=\sum_{j=0}^{n}f(x_j)a_j\\\ \end{aligned} $$ $$ \begin{aligned} \text{Trapezoidal rule: } &f(x) \approx p_1(x)=f(x_0)L_0(x) + f(x_1)L_1(x)\space(n=1, x_0=a,x_1=b)\\\ \therefore\space &I_f=\int_{a}^{b}f(x)dx \approx f(a)\int_{a}^{b}{\frac{x-b}{a-b}dx} + f(b)\int_{a}^{b}{\frac{x-a}{b-a}dx} \\\ &\space\space\space\space=\frac{b-a}{2}[f(a) + f(b)]\\\ \text{Error: } &f(x) - p_1(x) = \frac{1}{2}f^{''}(\xi(x))(x-a)(x-b)\\\ \text{then: }&\int_{a}^{b}{(f(x)-p_1(x))dx} = \frac{1}{2}\int_{a}^{b}{f^{''}(\xi(x))(x-a)(x-b)dx}\\\ \text{From MVT: } &\exists\space\eta\in(a,b) \space : \space \int_{a}^{b}{f^{''}(\xi(x))(x-a)(x-b)dx} = f^{''}(\eta)\int_{a}^{b}{(x-a)(x-b)dx}\\\ \therefore\space&\text{Error of Trapezoidal rule: }\space I_f - I_{trap} = -\frac{f^{''}(\eta)}{12}(b-a)^3\\\ \end{aligned} $$ $$ \begin{aligned} \text{Midpoint rule: } &I_f \approx I_{mid} = (b-a)f(\frac{a+b}{2})\\\ &\text{Let } m=\frac{a+b}{2}\rightarrow f(x)=f(m)+f^{'}(m)(x-m)+\frac{1}{2}f^{''}(\xi(x))(x-m)^2\\\ \therefore\space&I_f = \int_{a}^{b} f(x) = (b - a)f(m) + \frac{1}{2} \int_{a}^{b} f''(\xi(x))(x - m)^2 \, dx\\\ &\exists\space\eta\in(a,b)\space : \space \frac{1}{2} \int_{a}^{b} f''(\xi(x))(x - m)^2 \, dx = \frac{f''(\eta)}{24}(b - a)^3\\\ \therefore\space&\text{Error of Midpoint rule: }\space I_f - I_{mid} = \frac{f^{''}(\eta)}{24}(b-a)^3\\\ \end{aligned} $$ $$ \begin{aligned} \text{Simpson's rule: } &I_f \approx I_{simp} = \frac{b-a}{6}[f(a) + 4f(\frac{a+b}{2}) + f(b)]\\\ &(p_2(x),n=2,x_0=a,x_1=\frac{a+b}{2},x_2=b)\\\ \therefore\space&\text{Error of Simpson's rule: }\space I_f - I_{Simpson} = -\frac{f^{(4)}(\eta)}{90}(\frac{b-a}{2})^5,\space\eta\in(a,b)\\\ \end{aligned} $$ ### Composite Numeric Integration $$ \begin{aligned} &\bullet\space\text{subdivide }[a,b]\space\text{int }r\space\text{subintervals}\\\ &\bullet\space h=\frac{b-a}{r}\space\text{length per interval}\\\ &\bullet\space t_i=a+ih\space\text{for }i=0,1,\cdots,r\\\ &t_0=a,t_r=b\space\rightarrow\space\int_{a}^{b}f(x)\,dx=\sum_{i=1}^{r}\int_{t_{i-1}}^{t_i}f(x)\,dx\\\ \end{aligned} $$ $$ \begin{aligned} \text{Composite Trapezoidal rule: } &I_{cf} = \frac{h}{2} [f(a) + f(b)] + h \sum_{i=1}^{r-1} f(t_i)\\\ 
\text{Error: } &I_f - I_{cf} = -\frac{f^{''}(\mu)}{12}(b-a)h^2\\\ \text{Composite Simpson rule: } &I_{cs} = \frac{h}{3} [f(a) + 2 \sum_{i=1}^{r/2-1} f(t_{2i}) + 4 \sum_{i=1}^{r/2} f(t_{2i-1}) + f(b)]\\\ \text{Error: } &I_f - I_{cs} = -\frac{f^{(4)}(\zeta)}{180}(b-a)h^4\\\ \text{Composite Midpoint rule: } &I_{cm} = h \sum_{i=1}^{r} f(a + (i - 1/2)h)\\\ \text{Error: } &I_f - I_{cm} = -\frac{f^{''}(\eta)}{24}(b-a)h^2\\\ \end{aligned} $$ ### Linear Least Squares \_Find $c_j$ such that $\sum_{k=0}^{m}(v(x*k)-y_k)^2=\sum*{k=0}^{m}(\sum*{j=0}^{n}c_j\phi_j(x_k)-y_k)^2$ is minimised\* Conditions: $\frac{\partial \phi}{\partial a} = 0, \quad \frac{\partial \phi}{\partial b} = 0$ $$ \begin{aligned} \text{Linear fit: } y_k&=ax_k+b,k=1,\cdots,m\\\ \begin{bmatrix} \sum_{k=0}^{m} x_k^2 & \sum_{k=0}^{m} x_k \\ \sum_{k=0}^{m} x_k & m + 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} &= \begin{bmatrix} \sum_{k=0}^{m} x_k y_k \\ \sum_{k=0}^{m} y_k \end{bmatrix}\\\ p &= \sum_{k=0}^{m} x_k, \quad q = \sum_{k=0}^{m} y_k, \quad r = \sum_{k=0}^{m} x_k y_k, \quad s = \sum_{k=0}^{m} x_k^2\\\ \rightarrow\begin{bmatrix} s & p \\ p & m + 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} &= \begin{bmatrix} r \\ q \end{bmatrix}\\\ \leftrightarrow A\mathbf{z} &= \begin{bmatrix} x_0 & 1 \\ x_1 & 1 \\ \vdots & \vdots \\ x_m & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_m \end{bmatrix} = \mathbf{f}\space\text{is overdetermined}\\\ &\\\ \end{aligned} $$ $$ \begin{aligned} \text{Solving linear system: }r &= b - Ax\\\ ||r||_2^2 &= \sum_{i=1}^{m}r_i^2 = \sum_{i=1}^{m}(b_i-\sum_{j=1}^{n}a_{ij}x_j)^2\\\ \text{Let } \phi(x) &= \frac{1}{2}\|r\|^2_2 = \frac{1}{2} \sum_{i=1}^{m} (b_i - \sum_{j=1}^{n} a_{ij}x_j)^2\\\ \text{Conditions}: \frac{\partial \phi}{\partial x_k} &= 0, \quad k = 1, \cdots, n\\\ 0&=\sum_{i=1}^{m}(b_i-\sum_{j=1}^{n}a_{ij}x_j)(-a_{ik})\\\ \rightarrow \sum_{i=1}^{m}a_{ik}\sum_{j=1}^{n}a_{ij}x_j &= \sum_{i=1}^{m}a_{ik}b_i, k=1,\cdots,n\space (\text{equivalent to } A^{T}Ax=A^{T}b)\\\ \end{aligned} $$ $$ \begin{aligned} A^T Ax &= A^T b \space\text{is called the normal equations}\\\ \text{If }A \text{ has a full-column rank}, &\min_{x} \|b - Ax\|_2\space\text{has uniq sol:}\\\ x&=(A^TA)^{-1}A^Tb=A^{+}b\\\ \end{aligned} $$ $$ \begin{aligned} \text{Adaptive Simpson: find } &Q \space : \space |Q - I| \leq \text{tol}\\\ I &= \int_{a}^{b} f(x) \, dx = S(a, b) + E(a, b) \\\ S_1=S(a, b) &= \frac{h}{6} \left[ f(a) + 4f\left( \frac{a + b}{2} \right) + f(b) \right] \\\ E_1=E(a, b) &= -\frac{1}{90} \left( \frac{h}{2} \right)^5 f^{(4)}(\xi), \quad \xi \text{ between } a \text{ and } b\\\ \end{aligned} $$ $$ \begin{aligned} S =\space&\text{quadSimpson}(f, a, b, \text{tol})\\\ &h = b - a, \quad c = \frac{a + b}{2}\\\ &S_1 = \frac{h}{6} [f(a) + 4f\left(\frac{a+b}{2}\right) + f(b)]\\\ &S_2 = \frac{h}{12} [f(a) + 4f\left(\frac{a+c}{2}\right) + 2f(c) + 4f\left(\frac{c+b}{2}\right) + f(b)]\\\ &\tilde{E}_2 = \frac{1}{15}(S_2 - S_1)\\\ &\text{if} |\tilde{E}_2| \leq \text{tol}\\\ &\space\space\text{return } Q = S_2 + \tilde{E}_2 \\\ &\text{else}\\\ &\space\space Q_1 = \text{quadSimpson}(f, a, c, \text{tol}/2)\\\ &\space\space Q_2 = \text{quadSimpson}(f, c, b, \text{tol}/2)\\\ &\space\space\text{return } Q = Q_1 + Q_2 \\\ \end{aligned} $$ ### Newton’s Method for Nonlinear equations $x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}$ Convergence: if $f, f', f''$ are continuous in a neighborhood of a root $r$ of $f$ and $f'(r) \neq 0$, then $\exists\delta\ >0\space : \space |r-x_0|\leq{\delta}$, 
then $\forall x_n\space : \space: |r-x_n|\leq{\delta}, |r-x_{n+1}|\leq c(\delta)|r-x_n|^2$ $|e_{n+1}|\leq c(\delta)|e_n|^2$ (Quadratic convergence, order is 2) Let $c(\delta)=\frac{1}{2}*\frac{\max_{|r-x|\leq{\delta}}|f''(x)|}{\min_{|r-x|\leq{\delta}}|f'(x)|}$ For linear system: denote $\mathbf{x}=(x_1,x_2,\cdots,x_n)^T$ and $\mathbf{F}=(f_1,f_2,\cdots,f_n)$, find $\mathbf{x}^{*}$ such that $F(x^{*})=0$ $$ \begin{aligned} F(x^{(k)}) + F'(x^{(k)})(x^{(k+1)}-x^{(k)}) &= 0\\\ F'(x^{(k)}) \space&\text{is the Jacobian of } \mathbf{F} \space\text{at } x^{(k)}\\\ \text{Let } \mathbf{s} &= \mathbf{x}^{(k+1)} - \mathbf{x}^{(k)}\\\ \therefore\space F'(x^{(k)})s &= -F(x^{(k)})\\\ \mathbf{x}^{(k+1)} &= \mathbf{x} ^{(k)} + \mathbf{s}\\\ \end{aligned} $$ ### IVP in ODEs. $$ \begin{aligned} \text{Given } y'=f(t,y), y(a)=c, \text{ find } y(t) \text{ for } t\in[a,b]\\\ y' &\equiv y'(t) \equiv \frac{dy}{dt}\\\ \text{System of n first-order: } y' &= f(t,y), f: \mathbb{R} \times \mathbb{R}^n \rightarrow \mathbb{R}^n\\\ \end{aligned} $$ $$ \begin{aligned} \text{Forward Euler's method (explicit): } y_{t_{i+1}} &\approx y(t_i) + hf(t_i, y_(t_i))\\\ \text{where: }h &= \frac{b-a}{N}, N > 1\\\ h &= \text{step size}\\\ t_0 &= a, t_i=a+ih, i=1,2,\cdots,N\\\ \end{aligned} $$ $\text{Backward Euler's method (implicit): } y_{i+1} = y_i + hf(t_{i+1}, y_{i+1})$ > Non-linear, then apply Newton’s methods $$ \begin{aligned} \text{FE Stability: } y'&=\lambda{y},y(0)=y_0\\\ \text{Exact sol: } y(t)&=y_0e^{\lambda{t}}\\\ \text{FE sol with constant stepsize h: } y_{i+1}&=(1+h\lambda)y_i=(1+h\lambda)^{i+1}y_0\\\ \text{To be numerically stable: } h&\leq{\frac{2}{|\lambda|}}\\\ &\\\ \text{BE Stability: } y'&=\lambda{y},y(0)=y_0\\\ |y_{i+1}| &= \frac{1}{|1-h\lambda|}|y_i| \leq |y_i|\space\forall\space h > 0 \\\ \end{aligned} $$ ### Order, Error, Convergence and Stiffness $$ \begin{aligned} \text{Local truncation error of FE: } &d_i = \frac{y(t_{i+1}) - y(t_i)}{h} - f(t_i, y(t_i)) = \frac{h}{2}y''(\eta_i)\space\text{(q=1)}\\\ \text{Local truncation error of BE: } &d_i = -\frac{h}{2}y''(\xi_i)\space\text{(q=1)}\\\ \end{aligned} $$ $\text{A method of order }q\space\text{ if} q\text{ is the lowest positive int such that any smooth exact sol of }y(t):\max_{i}|d_i|=O(h^q)$ $$ \begin{aligned} \text{Global error: } e_i &= y(t_i) - y_i, i=0,1,\cdots,N\\\ \text{Consider } u' &= f(t,u), u(t_{i-1}) = y_{i-1}, \space\text{local error: }l_i=u(t_i)\\\ \end{aligned} $$ $$ \begin{aligned} \text{Convergence: } &\max_i e_i = \max_i |y(t_i) - y_i| \rightarrow 0 \text{ as } h \rightarrow 0\\\ \end{aligned} $$ > Stiffness is when the stepsize is restricted by stability rather than accuracy ### Runge-Kutta Methods $$ \begin{aligned} \text{Implicit trapezoidal: } y'(t) &= f(t,y), y(t_i)=y_i\\\ y_{i+1} &= y_i + \frac{h}{2} [f(t_i, y_i) + f(t_{i+1}, y_{i+1})]\\\ d_i = O(h^2) &= \frac{y(t_{i+1})-y(t_i)}{h}-\frac{1}{2}[f(t_i,y(t_i)) + f(t_{i+1},y(t_{i+1}))]\\\ &\\\ \text{Explicit trapezoidal: } Y&=y_i+hf(t_i,y_i)\\\ y_{i+1} &= y_i + \frac{h}{2} [f(t_i, y_i) + f(t_{i+1}, Y)]\\\ d_i = O(h^2) &= \frac{y(t_{i+1})-y(t_i)}{h}-\frac{1}{2}[f(t_i,y(t_i)) + f(t_{i+1},y(t_i)+hf(t_i,y(t_i)))]\\\ &\\\ \text{Implicit midpoint: } y_{i+1} &= y_i + hf(t_i+h/2, (y_i+y_{i+1})/2)\\\ \text{Explicit midpoint: } Y &= y_i + \frac{h}{2}f(t_i, y_i)\\\ \end{aligned} $$ Classical RK4: based on Simpson’s quadrature rule, $O(h^4)$ accuracy $$ \begin{align*} Y_1 &= y_i \\\ Y_2 &= y_i + \frac{h}{2}f(t_i, Y_1) \\\ Y_3 &= y_i + \frac{h}{2}f(t_i + \frac{h}{2}, Y_2) \\\ Y_4 &= y_i + hf(t_i + 
\frac{h}{2}, Y_3) \\\ y_{i+1} &= y_i + \frac{h}{6} [f(t_i, Y_1) + 2f(t_i + \frac{h}{2}, Y_2) + 2f(t_i + \frac{h}{2}, Y_3) + f(t_{i+1}, Y_4)]\\\ \end{align*} $$ --- slug: thoughts/university/twenty-three-twenty-four/compsci-4x03/index tags: - university - swfr4x03 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/index" title: "Scientific Computation" date: 2023-09-04 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/compsci-4x03/index.html.md --- Introduction to Scientific Computation --- slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors tags: - eng3px3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors" title: "Conversion Factors" date: 2024-01-23 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors.html.md --- See also: [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors.pdf) and [this one](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/3PX3-04-Conversion-Factors.pdf) Relevant to economic analysis process must: - explicitly incorporated into [NVF](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Net-value-analysis) by giving it _conversion factor_ - included as a hard constraints > conversion factor: convert benefit and costs into common units Determinants: - time, cost of labour, opportunity cost - marginal NV and quantity-dependent conversion Factors ### cost of labours. - wages - materials - overhead: HR, tools/equipment ### cost of time. - overtime shifts, extra works or outsourcing? - additional factor: happiness, time already spent (context: not all time is equal) ### opportunity cost. > negative impact from having to give up the best alternatives > [!tip] Important > > Should always consider this when going forward with a project. - cost of those forgone alternatives in _conversion units_ - costs for not solving other problems - compare NV for solving the other one. > Double counting: mutually exclusive alternatives that is considered as double-counting in calculating NVF. ### conversion function. - quantity-dependent conversion Factors $$ NV_{\text{oranges}}(x) = B_{\text{oranges}}(x) - C_{\text{oranges}}(x) $$ ### marginal value change. > extra net value obtained for one more item $$ \Delta NV = NV(x+1) - NV(x) $$ ### environmental impact conversion. > externalities: of a decision is an impact (benefit or cost) for people _other_ than decision makers. > externalities doesn’t have the same weight to benefits and costs. (failure of incentives) Correct this failure with policies: - taxes: carbon emission - subsidies ### economic of GHG emission. - changes overtime and relatively hard to calculate accurately. - 2022 study in Nature estimates at \\\frac{185}{\text{tonne}}\$ ### health costs. 
- difficult to answer this, but most common pollutants: $PM_{2.5}$ (fine particulate matter) and $NO$ (Nitrogen oxides) ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/table-health-costs.webp) ### ethical consideration > [!question] ethical > > - What is the cost of negative societal/ethical/equality impact? > - Can you put the price on safety? F-N graph Emission for $PM_{2.5}$ per year is $$ \begin{align*} & = \frac{\text{Health cost per year}}{\text{total emission }} \cdot \text{emission off power generation} \cdot \frac{1}{\text{total annual}} \\\ & = \frac{\$166e9}{3.5e6\space \text{tonne}} * 6000 \text{ tones} * \frac{1}{640e9 \text{ kWh}} \\\ &= \$0.0004446429 \text{ per kWh} \\\ \end{align*} $$ --- slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Finals tags: - eng3px3 description: "Economics for engineer, a guide." title: "Economics for engineer, a guide." date: 2024-04-12 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals.html.md --- ## samples. 4.b 11.e 12.c 13.d 14.b 15.a 16.e 17.c 18.c 19.b 20.c 21.b 22.c 23.b 24.a 25.a 26.e 27.a 28.e 29.a 30.a --- ## [net value function](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Net-Value-Function) $$ \text{NVF} = \text{benefit} - \text{cost} $$ ## conversion factors - url: thoughts/.../Conversion-Factors - description: Conversion Factors ### marginal value change. > extra net value obtained for one more item $$ \Delta NV = NV(x+1) - NV(x) $$ [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Conversion-Factors#marginal-value-change) ## optimisation - url: thoughts/.../Optimization - description: Optimization # model-based[](#model-based) - conclusions from the model of the system Components: - decision variables - constraints - objectives - functions: mathematical function that determines the objective as a function of decision variable $$ \begin{align*} \min_{x} \phi = f(x) & &\leftarrow &\space \text{Objective function} \\\ \text{s.t} & &\leftarrow &\space \text{Constraints} \\\ h(x) = 0 & &\leftarrow &\space \text{Equality constraints} \\\ g(x) \leq 0 & &\leftarrow &\space \text{Inequality constraints} \\\ x_{lb} \leq x \leq x_{ub} & &\leftarrow &\space \text{Bounds} \end{align*} $$ ## decision variables ### discrete. > limited to a fixed or countable set of values $$ x_{\mathcal{D}} \mid a \in \mathcal{I} = \lbrace 1, 2, 3, 4, 5 \rbrace $$ ### continuous. 
> can take any value within a range $$ x_{\mathcal{C}} \subset \mathcal{R} $$ ## constraints - physical limitations: cannot purchase negative raw materials - model assumptions: assumptions about the system > [!tip] > > a decision upper and lower bounds ($x^{\mathcal{U}}$ and $x^{\mathcal{L}}$) > [!note] Properties > > - **Active/binding**: $\exists \space x^{*} \mid g(x^{*}) = 0$ > - **Inactive**: $\exists \space x^{*} \mid g(x^{*}) < 0$ ### graphing models > [!note] feasible set of an optimization model > > The collection of decision variables that satisfy all constraints > > $$ > \mathcal{S} \triangleq \lbrace x : g(x) \leq 0, h(x) = 0, x^L \leq x \leq x^U \rbrace > $$ ## outcomes > [!tip] optimal value > > the optimal value $\phi^{*}$ is the value of the objective at the optimum(s) > > $$ > \phi^{*} \triangleq \phi(x^{*}) > $$ > Constraints satisfy, but it is not binding Linear optimization problems $$ \begin{aligned} \underset{x_1,x_2}{\min} \space \phi &= 50x_1 + 37.5x_2 \\ &\text{s.t} \\\ 0.3x_1 + 0.4x_2 &\geq 2000 \\\ 0.4x_1 + 0.15x_2 &\geq 1500 \\\ 0.2x_1 + 0.35x_2 &\leq 1000, \\\ x_1 &\leq 9000 \\\ x_2 &\leq 6000 \\\ x_i &\geq 0 \end{aligned} $$ See also [Linear Optimization](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization/../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization) [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization#model-based) Linear optimization: - url: thoughts/.../Linear-Optimization - description: Linear Optimization ```math \begin{align*} \min_{x} \phi = c^\mathbf{T} \mathcal{x} & &\leftarrow &\space \text{Objective function} \\\ \text{s.t} & &\leftarrow &\space \text{Constraints} \\\ A_h \mathcal{x} = \mathcal{b}_h & &\leftarrow &\space \text{Equality constraints} \\\ A_g \mathcal{x} \leq \mathcal{b}g \leq 0 & &\leftarrow &\space \text{Inequality constraints} \\\ \mathcal{x}_{lb} \leq \mathcal{x} \leq \mathcal{x}_{ub} & &\leftarrow &\space \text{Variable Bounds} \end{align*} ``` [Lien vers l'original](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization#linops) ## time value of money ### interest Interest $I$ is the compensation for loaning money. > [!tip] interest rate > > $i = \frac{I}{P}$. Thus $F = P(1+i)$ > [!tip] Simple interests > > $I_{\text{each}} = P \times \frac{i}{\text{year}}$, total interest $I = I_{\text{each}} \times N_{\text{year}}$ > > $F_n = P(1 + ni)$ > [!tip] Compound interests > > $F_n = P(1+i)^n$ > [!tip] nominal interest rates > > $r$ is the equivalent yearly rate if interest is withdrawn so it doesn’t compound. (i.e: $r=mi$ where $m$ is the number of compounding periods per year) > [!tip] effective annual interest rates > > $i_{\text{eff}} = (1 + \frac{r}{m})^m - 1$ > [!tip] effective interest rates > > how much interest do you accrue after a year if nominal rate is 12%? 
$F=P(1+i)^m=P(1+\frac{r}{m})^m$ > [!tip] continuous compounding > > $F = P e^{ry}$ ### net present value $$ \text{NPV} = \text{CF}_0 + \sum_{n=1}^{N}{\frac{\text{CF}_n}{(1+i)^n}} $$ where $\text{CF}_0$ is the initial cash flow, $\text{CF}_n$ is the cash flow at the end of the $n^{th}$ period, $i$ is the _effective interest rate_ > [!tip] discount rate > > Present value $PV = \frac{\text{CF}_t}{(1+r_d)^t}$, where $\text{CF}_t$ is cash flow happening in $t$ years in the future, and $r_d$ is the discount rate. > > sources: opportunity cost, inflation, risk, time preference, option premium regular deposit: Future value $FV = A \sum_{k=0}^{n-1}(1+i)^k = A \frac{(1+i)^n - 1}{i}$ where $A$ is the monthly, or time period, deposit. The fraction of the last payment that was interest is $\frac{i}{1+i}$; the principal of the last payment is $A = F_{\text{last}}(1+i)$ > [!tip] geometric series > > $$ > \sum_{k=0}^{n-1}r^k = \frac{1-r^n}{1-r} > $$ ### inflation > [!tip] real vs. nominal > > nominal value refers to the actual cash flow at the time it happens, real value refers to the equivalent amount of value at a reference time, converted using inflation rates. > > real dollar $R = \frac{\text{CF}_n}{(1+r_i)^n}$, where $\text{CF}_n$ is the nominal cash flow at time $n$, and $r_i$ is the effective yearly inflation rate. > [!tip] internal rate of return > > the discount rate that results in an NPV of zero (break-even scenario) > > $$ > \text{CF}_0 + \sum_{n=1}^{N}{\frac{\text{CF}_n}{(1+r_{\text{IRR}})^n}} = 0 > $$ > [!tip] minimum acceptable rate of return > > a rate of return set by stakeholders that must be earned for a project to be accepted > > real vs. nominal MARR: real MARR is MARR if returns are calculated using real dollars, whereas nominal MARR is MARR if returns are calculated using nominal dollars. > > $\text{MARR}_{\text{real}} = \frac{1+\text{MARR}}{1+f} - 1$ where $f$ is the inflation rate ## risk management and stochastic modelling > Convert to dollars/wk to base the calculation on the same unit. Identify uncertainties, evaluate likelihood and potential impact, organize into a risk matrix, determine expected impact, then propose mitigation strategies ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/most-critical-risk.webp) > [!tip] expected impact > > the chance it happens multiplied by the impact it will have if it happens.
$\text{E[NPV]} = \sum_{i}{\text{NPV}(x_i)p(x_i)}$ > > Then use this to create the necessary mitigation ### NPV with risk and uncertainty > [!note] probability distribution > > $p(x)$ of a discrete random variable $x$: Normalization requires that $\sum_{i}{p(x_i)} = 1$ > > PDF (probability density function) $f(x)$ of a continuous random variable $x$: Normalization requires that $\int{f(x)dx} = 1$ > [!tip] expected value for calculating stochastic to deterministic > > of function $f(x)$ is $\text{E}[f] = \sum_{i}{f(x_i)p(x_i)}$ for discrete random variable $x$ with probability distribution $p(x)$ > > of function $f(x)$ is $\text{E}[f] = \int_x{f(x)p(x)dx}$ for continuous random variable $x$ with PDF $p(x)$ > [!note] Normal distribution > > $f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ > > `NORM.DIST(x, mean, stddev, cumulative)`: cumulative is `1` for CDF, `0` for PDF. `NORM.INV(RAND(), 0.5, 0.05)`: draw values from a normal distribution with mean 0.5 and stddev 0.05 ### non-linear deterministic and stochastic models mean value $\mu_{x}$ of a random variable $x$ is its own expected value $\text{E}[x]$, variance $\sigma^2_{x}$ is the expected value of the squared deviation from the mean $\text{E}[(x-\mu_x)^2]$, and stddev $\sigma_x$ > [!tip] central limit theorem > > as the sample size becomes large enough, the distribution of the sample mean will be approximately normally distributed, regardless of the distribution of the population; explore it using [Monte-Carlo](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/Monte-Carlo) simulation > Expected value of linear and nonlinear functions: suppose $x$ and $y$ are independent random variables with means $\mu_x$ and $\mu_y$, and variances $\sigma^2_x$ and $\sigma^2_y$, then $E[x^{2}] = \sigma_x^2 + \mu_x^2$, $E[xy] = \int \int xyp_xp_ydxdy=\int xp_xdx \int yp_ydy=\mu_x \mu_y$ Dealing with 12 months per year: outcomes over a year should be **normally distributed** (CLT), with a mean given by the expected value of the monthly outcome and a stddev given by the stddev of the monthly outcome divided by the square root of the # of rolls ($\sqrt{12}$) --- ## project management and CPM - scope, cost, time to maximize quality WBS (work breakdown structure): hierarchical decomposition of the total scope of work CPM (critical path method): determine the longest path through the network, the critical path, and the shortest time to complete the project ![cpm.webp](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/cpm.webp) crashing a project means using additional resources to shorten a specific task ## supply and demand market equilibrium: where supply and demand curves intersect, quantity demanded equals quantity supplied. shift to right: greater demand, higher price, higher quantity. shift to left: lower demand, lower price, lower quantity. factors of production: land, labour, capital, entrepreneurship determinants of demand: - price: quantity demanded $Q_d$ falls when price $P$ rises and vice versa - prices of related goods: substitutes and complements determinants of supply: - price: quantity supplied $Q_s$ rises when price $P$ rises and vice versa - factors of production - fiscal policies, taxes, regulation > [!tip] elasticity: how responsive quantity demanded or supplied is to a change in price. > > Surplus when $Q_s > Q_d$, shortage when $Q_s < Q_d$.
> > Elasticity of demand: $E_d = \frac{\% \Delta Q_d}{\% \Delta P} = \frac{\mid \frac{P}{Q_D} \mid}{\mid \frac{dP}{dQ_D} \mid}$ > > Elasticity of supply: $E_s = \frac{\% \Delta Q_s}{\% \Delta P} = \frac{\mid \frac{P}{Q_S} \mid}{\mid \frac{dP}{dQ_S} \mid}$ > > a higher slope corresponds to lower elasticity (inelastic); a lower slope corresponds to higher elasticity (elastic) Demand elasticity: $E_D <1$ means if price increases by 5% then demand will decrease by less than 5%, inelastic. $E_D >1$ means if price increases by 5% then demand will decrease by more than 5%, elastic. > [!tip] taxes > > arbitrarily lower the equilibrium quantity, > > how the price seen by consumers vs. suppliers changes depends on the relative elasticities of demand and supply: more of the price change will end up on the consumer side > > how much the quantity changes depends on the total elasticities of demand and supply: more elastic means more quantity change. > [!tip] subsidies > > arbitrarily increase the equilibrium quantity, > > how the price seen by consumers vs. suppliers changes depends on the relative elasticities of demand and supply: more of the price change will end up on the consumer side > > how much the quantity changes depends on the total elasticities of demand and supply: more elastic means more quantity change. ## behavioural economics invisible hand of the market: self-interest of individuals leads to the best outcome for society as a whole, in a free market economy, as rational actors are motivated by incentives. perfect competition: wheat (control of price none, low barrier to entry, high # of producers, products are identical) monopolistic competition: restaurants (control of price low, low barrier to entry, high # of producers, products are similar) oligopoly: airlines (control of price high, high barrier to entry, few producers, products are similar) monopoly: utilities (control of price high, high barrier to entry, one producer, unique product) game theory, most notably [The Prisoner’s Dilemma](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/The-Prisoner's-Dilemma) anti-trust legislation: prevent monopolies, promote competition, protect consumers > behavioural economics: + psychology to look at reasons people make _irrational_ decisions > > “bounded rationality”: you don’t have perfect information, and understand there’s an opportunity cost to get it law of demand and _ultimatum game_: people will pay less for a good if they can get it elsewhere for less, even if they value it more than the price they pay. [Cooperation](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/Cooperation): R. Axelrod’s _The Evolution of Cooperation_ proposes a “strategy”: what you do depends on what the other person does. PPF (production possibility frontier): trade-offs between two goods, given a fixed amount of resources. risk aversion: people prefer a certain outcome to a risky one, even if the expected value of the risky one is higher. ⇒ assume that the given investment is a loss, then calculate based on marginal gains ## tax, incentives and depreciations _income, corporate, property, sales_ personal income tax: progressive tax rate corporate tax: flat tax rate, regardless of income level → net income: subtracting expenses from gross income. profit on investments will be taxed. If one investment yields a loss, offset the loss against the profits from another to pay less tax overall.
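Not part of the original note: to make the elasticity definitions from the supply-and-demand section above concrete, here is a minimal sketch with a made-up linear demand curve (the coefficients `a` and `b` are hypothetical, not course data), computing $E_d = \mid P/Q_D \mid / \mid dP/dQ_D \mid$ at a few prices.

```python
# Minimal sketch, hypothetical coefficients: price elasticity of demand for a
# linear demand curve Q_d = a - b*P, using E_d = |P/Q_d| / |dP/dQ_d|.
def demand(P, a=1000.0, b=20.0):
    """Hypothetical linear demand curve: quantity demanded at price P."""
    return a - b * P

def elasticity_of_demand(P, a=1000.0, b=20.0):
    dQdP = -b                              # slope of the demand curve, dQ_d/dP
    Q = demand(P, a, b)
    return abs(P / Q) / abs(1.0 / dQdP)    # equivalent to |P/Q_d| * |dQ_d/dP|

for P in (10, 30, 45):
    E = elasticity_of_demand(P)
    label = "elastic" if E > 1 else "inelastic"
    print(f"P={P}: Q_d={demand(P):.0f}, E_d={E:.2f} ({label})")
```

Along a linear demand curve the slope is constant, but elasticity still rises as price rises and quantity falls, which is why the same curve comes out inelastic at low prices and elastic at high prices here.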
[optimization](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Finals/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization) strategies: minimize liabilities, timing of expenditures → incorporate into financial models, do sensitivity analysis before-tax MARR: set MARR high enough to include taxes that need to be paid ⇒ for the investment’s gross profit after-tax MARR: if tax is explicitly accounted for in the cash flows of the project, then MARR should be lower ⇒ for final investment decisions $$ MARR_{\text{after-tax}} = MARR_{\text{before-tax}} \times (1 - \text{corporate tax rate}) $$ _incentives_: tax credits, tax reliefs, programs to encourage certain activities _depreciation_: due to use-related physical loss, technological obsolescence, functional loss, market fluctuation. > Depreciation is a non-cash expense, but reduces the taxable income of a business. It can be deducted annually by spreading the cost of an asset over its useful life; it affects NPV (net present value), IRR (internal rate of return), and payback period calculations _Market value_: estimated actual value the asset can be sold for _Book value_: depreciated value of the asset, using a depreciation model _Salvage value_: estimated value of the asset at the end of its useful life > [!tip] value calculations > > Depreciation in year $n$ $D(n)$ is the decline in book value over that year: $BV(n) = BV(n-1) - D(n)$ > > Salvage value $SV$ is the book value at the object’s EOL: $SV = BV(N) = MV(0) - \sum_{n=1}^{N} D(n)$ > [!note] Straight-line depreciation > > spreads uniformly over useful life, SLD of a period $D_{\text{sl}}(n) = \frac{\text{Purchase price}-\text{Salvage value after N periods}}{\text{N periods of useful life}}$. > > book value at end of $n^{th}$ year: $BV_{\text{sl}}(n) = P - n \times \frac{P-S}{N}$ > [!note] Declining-balance depreciation > > different assets are classified into classes: $D_{\text{db}}(n) = BV_{\text{db}}(n-1) \times d$ (where $d$ is the depreciation rate), such that book value at the end of a period $BV_{\text{db}}(n)$ is $BV_{\text{db}}(n) = P(1-d)^n$ > > given salvage value $S$ and period of useful life $N$, depreciation rate $d = 1 - \sqrt[N]{\frac{S}{P}}$ > [!note] Sum-of-years-digits depreciation > > $D_{\text{syd}}(n) = \frac{N-n+1}{\sum_{i=1}^{N} i} \times (P-S)$ > [!note] Unit of production depreciation > > $D_{\text{uop}}(n) = \frac{\text{units produced in period}}{\text{life in \# of units}} \times (P - S)$ > > follows the straight-line idea, but per # of units produced rather than per unit of time.
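Not from the course notes: a minimal numeric sketch (hypothetical asset with made-up purchase price, salvage value, and useful life) that tabulates book value under the straight-line and declining-balance formulas above.

```python
# Minimal sketch, hypothetical numbers: book value under straight-line vs.
# declining-balance depreciation, using the formulas from the callouts above.
P, S, N = 100_000.0, 20_000.0, 5          # purchase price, salvage value, useful life (years)

d_sl = (P - S) / N                        # straight-line depreciation per period
d_db = 1 - (S / P) ** (1 / N)             # declining-balance rate chosen so that BV(N) = S

for n in range(N + 1):
    bv_sl = P - n * d_sl                  # BV_sl(n) = P - n * (P - S) / N
    bv_db = P * (1 - d_db) ** n           # BV_db(n) = P * (1 - d)^n
    print(f"year {n}: straight-line BV = {bv_sl:>9.2f}, declining-balance BV = {bv_db:>9.2f}")
```

Both schedules start at the purchase price and end at the salvage value; the declining-balance schedule front-loads the depreciation, which is why it is often preferred when the goal is to reduce taxable income earlier.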
--- slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization tags: - eng3px3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization" title: "Linear Optimization in Economics Analysis" date: 2024-02-08 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization.html.md --- See also [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/3PX3-08---Linear-Optimization.pdf), [optimization](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization) Linearization around [first order Taylor series](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization/../../../../../../../../thoughts/university/twenty-three-twenty-four/compsci-4x03/Equations#taylor-series) expansions Usage: - Resource allocation - Project selection - Scheduling and Capital budgeting - Energy network optimization > [!tip] Criteria for optimization models > > - comprised of only **continuous variables** > - **linear objective function** > - only **linear** equality or inequality constraints $$ \begin{align*} \min_{x} \phi = c^\mathbf{T} \mathcal{x} & &\leftarrow &\space \text{Objective function} \\\ \text{s.t} & &\leftarrow &\space \text{Constraints} \\\ A_h \mathcal{x} = \mathcal{b}_h & &\leftarrow &\space \text{Equality constraints} \\\ A_g \mathcal{x} \leq \mathcal{b}_g & &\leftarrow &\space \text{Inequality constraints} \\\ \mathcal{x}_{lb} \leq \mathcal{x} \leq \mathcal{x}_{ub} & &\leftarrow &\space \text{Variable Bounds} \end{align*} $$ where: - $\mathcal{x} \rightarrow x_j$: the $j^{\text{th}}$ decision variable - $c \rightarrow c_j$: cost coefficient of the $j^{\text{th}}$ decision variable - $a_{i, j}$: constraint coefficient for variable $j$ in constraint $i$ - $b_i \rightarrow \text{RHS}$: coefficient for constraint $i$ - $(A_k \mid k = \lbrace \mathcal{h}, \mathcal{g} \rbrace)$: matrix of size $\lbrack m_k \times n \rbrack$ ## Sensitivity reports ### Decision variables **Reduced cost**: the amount the objective function will change if the variable’s bounds are tightened **Allowable increase/decrease**: how much an objective coefficient must change before the optimal solution changes. > [!note] > > If there are simultaneous changes to objective coefficients, and $\sum_{\text{each coefficient}}(\frac{\text{Proposed change}}{\text{Allowable change}}) \leq 100 \%$ then the optimal solution _would not change_. ### Constraints **Final value**: the value of the constraint at the optimal solution **Shadow price**: of a constraint is the marginal improvement of the objective function value if the RHS is increased by 1 unit. **Allowable increase/decrease**: how much the constraint can change before the shadow price changes.
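Not part of the original note: a minimal `scipy.optimize.linprog` sketch of the canonical form above, with made-up numbers (a single ≥ requirement, negated into ≤ form because `linprog` expects $A_{ub} x \leq b_{ub}$, plus simple variable bounds). The dual values it reports play the role of the shadow prices described in the sensitivity report section.

```python
# Minimal sketch, hypothetical numbers: canonical LP  min c^T x  s.t.  A_g x <= b_g,
# x_lb <= x <= x_ub, solved with SciPy's linprog (HiGHS backend).
from scipy.optimize import linprog

c = [50.0, 37.5]                  # cost coefficients c_j
A_g = [[-0.3, -0.4]]              # -(0.3 x1 + 0.4 x2) <= -2000  <=>  0.3 x1 + 0.4 x2 >= 2000
b_g = [-2000.0]
bounds = [(0, 4000), (0, 4000)]   # variable bounds x_lb <= x <= x_ub

res = linprog(c, A_ub=A_g, b_ub=b_g, bounds=bounds, method="highs")
print("optimal x:", res.x)        # decision variables at the optimum
print("optimal value:", res.fun)  # phi* = phi(x*)
# Dual values of the inequality constraints (shadow prices, up to sign convention);
# exposed by the HiGHS backend in recent SciPy versions.
print("duals:", res.ineqlin.marginals)
```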
See [lemon\_orange.py](https://cdn.aarnphm.xyz/assets/thoughts/university/twenty-three-twenty-four/eng-3px3/lemon_orange.py) --- slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Net-Value-Function tags: - eng3px3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Net-Value-Function" title: "Net value function" date: 2024-01-09 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Net-Value-Function.html.md --- See [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Net-Value-Function/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/3PX3-Net-Value-Functions.pdf) ## What is economics? > Relocation of resources > everything _has_ a cost Cost-benefit analysis _Jules Dupuit_ See [Economics evolving: a history of economic thought by _Agnar Sandmo_](https://press.princeton.edu/books/paperback/9780691148427/economics-evolving) ## Net Value functions $$ \text{Net value = [Benefit] - [Cost]} $$ - relativity - perspective: $\text{(Benefit - Cost)}_{\text{client}}$ $$ \text{Benefits}_{\text{client}} > \text{Sale Price} > \text{Cost}_{\text{producer}} $$ $$ \text{System Net Value =} \space \text{Benefits}_{\text{client}} - \text{Cost}_{\text{producer}} $$ $$ \text{NVF = Benefits - Cost of space - Cost of time - ...} $$ Unit matching and conversion > [!notes] marginal value, quantity-dependent value > > _marginal net value_ of buying an apple is the change in NV from buying one more apple (slope of NVF wrt number of apple bought) either subsequent items gives more NV or lower costs. ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Net-Value-Function/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/marginal-apple-q.webp) --- slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Net-value-analysis tags: - eng3px3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Net-value-analysis" title: "Net Value Analysis" date: 2024-01-16 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Net-value-analysis.html.md --- See [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Net-value-analysis/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Engineering-Economics--and--Net-Value-Applications.pdf) > determine which options has most positive net value for the lab $$ \text{Net value}_{\text{Lab}} = \text{Benefits}_{\text{Lab}} - \text{Cost}_{\text{Lab}} $$ Simulation: gold nanoparticles $$ \begin{aligned} NV_P \text{(relative to purchasing)} &= NV_P - NV_P = 0 \\\ NV_F \text{(relative to purchasing)} &= NV_F - NV_P = C_P - C_F \\\ NV_{nR} \text{(relative to purchasing)} &= NV_{nR} - NV_P = C_P - C_{nR} \\\ \end{aligned} $$ ### relative to purchasing $$ NV = \$896 \, \text{week}^{-1} - \left( \frac{\$5}{100 \, \text{mL}} q_{\text{ingred}} + \frac{\$12.5}{\text{hr}} t_{\text{FumeHood}} + \frac{\$100}{\text{hr}} t_{\text{SEM}} + \frac{\$15}{\text{hr}} t_{\text{GradStudent}} + C_{\text{other}} \right) $$ --- slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Non-linear-optimization tags: - eng3px3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Non-linear-optimization" title: "Non-linear Optimization" date: 2024-04-12 permalink: 
https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Non-linear-optimization.html.md --- See also [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Non-linear-optimization/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/3PX3-09---Nonlinear-Optimization.pdf) --- slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization tags: - eng3px3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization" title: "Economic Optimization" date: 2024-02-01 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization.html.md --- See also [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/3PX3-07-Optimization-Problem-Formulation.pdf) # model-based[](#model-based) - conclusions from the model of the system Components: - decision variables - constraints - objectives - functions: mathematical function that determines the objective as a function of decision variable $$ \begin{align*} \min_{x} \phi = f(x) & &\leftarrow &\space \text{Objective function} \\\ \text{s.t} & &\leftarrow &\space \text{Constraints} \\\ h(x) = 0 & &\leftarrow &\space \text{Equality constraints} \\\ g(x) \leq 0 & &\leftarrow &\space \text{Inequality constraints} \\\ x_{lb} \leq x \leq x_{ub} & &\leftarrow &\space \text{Bounds} \end{align*} $$ ## decision variables ### discrete. > limited to a fixed or countable set of values $$ x_{\mathcal{D}} \mid a \in \mathcal{I} = \lbrace 1, 2, 3, 4, 5 \rbrace $$ ### continuous. > can take any value within a range $$ x_{\mathcal{C}} \subset \mathcal{R} $$ ## constraints - physical limitations: cannot purchase negative raw materials - model assumptions: assumptions about the system > [!tip] > > a decision upper and lower bounds ($x^{\mathcal{U}}$ and $x^{\mathcal{L}}$) > [!note] Properties > > - **Active/binding**: $\exists \space x^{*} \mid g(x^{*}) = 0$ > - **Inactive**: $\exists \space x^{*} \mid g(x^{*}) < 0$ ### graphing models > [!note] feasible set of an optimization model > > The collection of decision variables that satisfy all constraints > > $$ > \mathcal{S} \triangleq \lbrace x : g(x) \leq 0, h(x) = 0, x^L \leq x \leq x^U \rbrace > $$ ## outcomes > [!tip] optimal value > > the optimal value $\phi^{*}$ is the value of the objective at the optimum(s) > > $$ > \phi^{*} \triangleq \phi(x^{*}) > $$ > Constraints satisfy, but it is not binding Linear optimization problems $$ \begin{aligned} \underset{x_1,x_2}{\min} \space \phi &= 50x_1 + 37.5x_2 \\ &\text{s.t} \\\ 0.3x_1 + 0.4x_2 &\geq 2000 \\\ 0.4x_1 + 0.15x_2 &\geq 1500 \\\ 0.2x_1 + 0.35x_2 &\leq 1000, \\\ x_1 &\leq 9000 \\\ x_2 &\leq 6000 \\\ x_i &\geq 0 \end{aligned} $$ See also [Linear Optimization](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Optimization/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Linear-Optimization) --- slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Sensitivity-analysis tags: - eng3px3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Sensitivity-analysis" title: "Sensitivity analysis" date: 2024-02-01 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Sensitivity-analysis.html.md --- See 
[slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Sensitivity-analysis/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/3PX3-06-Sensitivity-Analysis.pdf) ### Marginal analysis > determining the impact of a decision on net value, especially when the decision is incremental (e.g., change in NV with one more orange) ### Sensitivity analysis > how sensitive the model (i.e., NVF) is to changes in its inputs or parameters (like conversion factors). \= marginal analysis for each variable separately and comparing the results. --- slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Simple-Report tags: - eng3px3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Simple-Report" title: "NVF for affordable housing" date: 2024-02-05 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Simple-Report.html.md --- The high-level net value function for which is defined by performance parameters and conversion factors: $$ \text{NVF} = \text{HouseSalesRevenue} - \text{LabourCost} - \text{EnergyCost} - \text{MaterialsCost} - \text{R\&D} - \text{UpfrontConstructionCost} $$ Where the performance parameters and conversion factors are defined by the following: - **Productivity and Housing construction rate**: This is defined by the rate of construction and the number of prefabricated units sold. The conversion factor is considered by the sold units, and such generate revenue. The assumption here is that there are specific quota of units to be sold that meet the production capacity, as well as the price per unit sold are within market value. - **Labour cost**: This is defined by the operational cost for given number of workers to build the houses and automations. The conversion factors are derived from the average wage per hour, number of working hours per year, as well as number of workers on the project. The assumption here is that workers are paid accordingly to their job and the number of hours they work. (i.e. no overtime, \$35/hour, 45 hours/week) - **Energy cost**: This is defined by the energy consumption for the construction and operation of the houses. The conversion factors are derived from the consumption per square foot, total operational area, energy price. The assumption here is that the energy efficiency of workplace are within the standard, and using market price for energy consumption per kWh. - **Materials cost**: This is defined by the cost of materials for the construction of the houses. The conversion factors are derived from the average cost of material per square foot, and the total area of the houses, and the number of house built per year. The assumption, similar to as above, is that there is a certain quota of houses to be built, cost of raw materials required for construction meet standards and policies from rule makers. - **R\&D**: This is defined by the yearly budget for research and development to innovate and improve both the construction and the prefabricated units. The conversion factors is a lump sump of the budget allocated for R\&D. The assumption for this is that the budget will be able to afford the best team and resources to innovate on current design. - **Upfront construction cost**: This is defined by the initial investment to start the operation, including factory setup and equipment purchases, compliance with building codes, and other considerations. 
The conversion factors is an amortized, one-time cost of the initial investment. This will be reflected as the capital expenditure needed for the project. Some of the following considerations are made for the aforementioned [NVF](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Simple-Report/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/Net-Value-Function#net-value-functions), as well as the performance parameters and conversion factors: - **Environmental**: The focus on material uses should be vital, such that all materials are sustainable and non-toxic, to reduce emissions. Additionally, the energy consumption would also be increased due to utilisation of automation software and robotics. Therefore, The NVF should be reflected where both `EnergyCost` and `MaterialsCost` would be increased from initial assumptions and due diligence. - **Regulatory**: Compliance with building codes and different considerations for prefabricated homes are recognized. Compliance can implies additional costs, and therefore `UpfrontConstructionCost` could increase. This could also affect operational feasibility of the projects if any of regulatory requirements are not met. - **Ethical and DEI**: Possible **DEI** concerns include the wage gap, the small number of workforce due to robots and automation, and the potential displacement of workers. Additionally, DEI are also taken into account to provide affordable housing to different socio-economics classes, such that it aligns with broader social objectives. However, this would then also increase `UpfrontConstructionCost`, similar to regulatory considerations. --- slug: thoughts/university/twenty-three-twenty-four/eng-3px3/Technical-Design tags: - eng3px3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Technical-Design" title: "Technical Design" date: 2024-01-25 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Technical-Design.html.md --- See also: [slides](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/Technical-Design/../../../../../../../../thoughts/university/twenty-three-twenty-four/eng-3px3/3PX3-05-Tech-Design.pdf) > technical analysis: Using science to determine how variables are related in order to draw conclusions in engineering-relevant context - Licensing is not discipline-specific > engineering design: > > - making decisions: _on the basis of engineering principles_ > - create plans: _for someone to create/modify something_ > - benefit of humans ### terms. 1. Decision variables: - could change about the design 2. Performance parameters - describes how well the realised design works that is relevant to the end users - can’t control performance parameters directly ### optimum engineering design. 1. use **technical analysis** to determine decision variables 2. write **NVF** in terms of _decision variables_ 3. 
use **optimisation methods** to determine - optimum set of decision variables - corresponding value of NVF - sensitive the optimum set and resulting NVF are to changes in decision variables and other parameters ### validity and assumptions: - push to one extreme --- slug: thoughts/university/twenty-three-twenty-four/eng-3px3/index tags: - university - eng3px3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/index" title: "Engineering Economics" date: 2024-01-09 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/eng-3px3/index.html.md --- Dr. [Matt Minnick](mailto:prof3px3@mcmaster.ca) Objective: 1. Economic principles to make decisions 2. Formulate Net Value function to evaluate and compare value & cost of alternative engineering decision 3. Make assumption or perform necessary research to cope with ambiguity and uncertainty in required tasks 4. Apply fundamentals of cost, price, present value, and other financial metrics 5. Manage group projects and interpersonal relations 6. Economic analysis Progress check-in: - One-pager what you have completed last week, Gaant chart, progress --- slug: thoughts/university/twenty-three-twenty-four/hci-4hc3/Interaction-Critical-Evaluation tags: - sfwr4hc3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/hci-4hc3/Interaction-Critical-Evaluation" title: "Interaction Critical Evaluation" date: 2023-09-25 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/hci-4hc3/Interaction-Critical-Evaluation.html.md --- The smartphone stands as a quintessential example of human-centered design in modern technology. Its interaction paradigm is built around the principles of affordances and signifiers; the touchscreen affords gestures such as taps, swipes, and pinches, while the graphical user interface is replete with icons and visual elements that signify their function and operation. For instance, a trash bin icon universally suggests deletion, and an envelope icon suggests messaging or email. The design of the smartphone interface also heavily relies on mappings; the spatial arrangement of apps on the home screen often corresponds to their frequency of use or importance, with the most essential apps placed at the bottom within easy reach of the thumb. Feedback is another critical aspect, with the device providing tactile, visual, or auditory responses to interactions, confirming actions such as sending a message or taking a photo. The smartphone’s conceptual model is designed to be intuitive, often mirroring real-world objects and actions, which reduces the learning curve and makes the technology accessible to a broad audience. However, despite the general usability, smartphones can sometimes lead to unintended interactions, such as accidental inputs when the device is in a user’s pocket, commonly referred to as ‘pocket dialing.’ This phenomenon supports the hypothesis that while the design is highly optimized for intentional use, it can occasionally misinterpret unintentional user input as valid. Nonetheless, the smartphone’s design is overwhelmingly helpful and useful, enabling a vast array of tasks to be performed with a single, portable device. It is a powerful testament to human-centered design, with its success lying in its ability to evolve continually, integrating feedback from millions of users to refine its interaction model. 
The smartphone not only accomplishes its intended tasks but also anticipates and adapts to user needs, often extending beyond its basic functions to serve as a camera, a GPS device, a gaming console, and much more, making it an indispensable tool in daily life. The convection oven stove is a staple in many kitchens (including mine), offering a combination of traditional stove top cooking and the advanced technology of convection baking. In terms of affordances, the stove provides clear cues for interaction; burners afford placing pots and pans, and the oven affords inserting food for baking or roasting. The knobs and buttons are signifiers that indicate where to interact to adjust the temperature and settings. The design typically includes mappings that are logical and aligned with the user’s expectations; for instance, turning a knob to the right often increases the heat, which is a standard convention in many cultures. Feedback is immediate and informative; the glow of an electric burner or the ignition click of a gas stove provides a clear indication that the stove is operational, while digital displays on the oven relay the temperature and cooking mode. The conceptual model of a convection oven stove is built upon the user’s familiarity with cooking appliances, leveraging analogies to traditional ovens and stoves while introducing new features like fan-assisted cooking, which improves heat distribution and cooking times. Despite these intuitive design elements, there can be unintended interactions or experiences. For example, There are knobs on my convection oven that are relatively confusing and its software interface are often times to complex for my daily usage. Additionally, the stove’s flat surface can sometimes make it unclear whether a burner is hot, which can be a safety hazard if the only feedback is visual and not tactile. Observations that support these unintended interactions include anecdotal evidence of users accidentally leaving the convection feature on or off, misunderstanding the icons that indicate convection settings, or touching a hot surface without realizing it because the stove lacks adequate warning indicators for residual heat. In conclusion, while the convection oven stove is designed to enhance the cooking experience by providing more uniform heat and faster cooking times, it is not without its usability challenges. The design is generally helpful, facilitating a wide range of cooking tasks, but it requires users to adapt and learn cooking techniques specific to convection as well as the oven specific interface. Improvements could be made to enhance the user experience, such as better signifiers for the convection feature and clearer safety warnings for hot surfaces. Last but not least, a smart fridge represents a leap forward in kitchen appliance technology, integrating features such as inventory tracking, internet connectivity, and even internal cameras. The affordances of a smart fridge are similar to those of traditional refrigerators, such as storing food at cool temperatures, but they also include interactive touch screens and the ability to sync with other smart devices. Signifiers are evident in the design of the touch screen interface, which often uses icons and menus to indicate where to tap to access features like temperature control, shopping lists, or to view the contents of the fridge via an internal camera. 
Mappings in a smart fridge are designed to be intuitive; for instance, adjusting the temperature settings involves sliding a bar, which corresponds with the user’s mental model of up for more and down for less. Feedback is provided through the touch screen with visual confirmation when a setting is changed, or when the fridge door is left open, sometimes accompanied by an auditory alert. The conceptual model of a smart fridge is built upon the idea that a refrigerator can be more than just a cooling appliance; it can be a food management system. It assumes that users will understand and appreciate the additional functionalities, like being able to check the contents of their fridge from their smartphone while at the grocery store. However, smart fridges can introduce unintended interactions. I find the multitude of features overwhelming or non-essential, leading to underutilization of the technology. For instance, if the interface is pretty cluttered or complex, and sometimes I struggle to perform even simple tasks like changing the temperature. Moreover, if the fridge’s software requires regular updates or experiences glitches, it can lead to frustration or even temporary loss of basic functionalities. Observations that support these potential issues include users ignoring smart features and using the fridge as a traditional refrigerator, or instances where a software malfunction may cause the interface to freeze or become unresponsive, requiring a reset or technical support. In assessing the helpfulness and usefulness of the smart fridge’s design, it’s clear that it aims to enhance the user’s experience by integrating with their digital life and providing convenience. However, the design’s success is contingent upon the user’s engagement with the smart features and their tolerance for adopting new technology in a traditionally non-technical space. While the smart fridge is a forward-thinking appliance, it must balance its advanced capabilities with the fundamental requirement of being user-friendly and reliable in performing its primary task of food preservation. --- slug: thoughts/university/twenty-three-twenty-four/hci-4hc3/Interactive-cycle tags: - sfwr4hc3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/hci-4hc3/Interactive-cycle" title: "Interactive cycle" date: 2023-10-10 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/hci-4hc3/Interactive-cycle.html.md --- ```mermaid flowchart TD 1[Computer] --> 2[Interaction] --> 3[User] 4[Input] --> 5[Interface] --> 6[Output] 1 --> 6 --> 3 --> 4 --> 1 ``` --- slug: thoughts/university/twenty-three-twenty-four/hci-4hc3/Psychopathology-of-everything tags: - sfwr4hc3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/hci-4hc3/Psychopathology-of-everything" title: "Psychopathology of everything" date: 2023-10-10 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/hci-4hc3/Psychopathology-of-everything.html.md --- ### The complexity of modern devices > What is a good design? > > - Discoverability > > - possible to figure out what actions are possible > - where and how to perform them > > - Understanding > > - What does it mean? 
> - How is it supposed to be used | Design fields | Purpose | Optimisation target | Users | | ------------- | ----------------------------- | ------------------------------------------------------------------------------ | -------------------- | | Industrial | form & material | function, value, appearance of the product & system | Users & manufacturer | | Interaction | understandability & usability | understanding of technology interaction, drawing upon psychology, design, art | users | | Experience | emotional impact | designing products with focus placed on quality and enjoyment of the total experience | Users | ?: What are the deficiencies in human-machine interaction? - limitations of today's technology - self-imposed restrictions, such as cost - lack of understanding of the design principles > Human Centred Design is an approach that puts human needs, capabilities, and behaviour first, then designs to accommodate those needs, capabilities, and behaviours. | Design approach | Role | | ------------------------------------------------------ | ------------------------------------------------------------------------------------------------- | | Experience design, Industrial design, Interaction design | Areas of focus | | Human-centred design | Process that ensures designs match the needs and capabilities of the people for whom they are intended | ### Fundamental principles of Interaction #### Experience - how fondly people remember their interaction - discoverability - affordances - signifiers - constraints - mappings - feedback - conceptual model of the system
> - Low-fidelity prototype Paradox of technology The same technology that simplifies life by providing more functions in each device also complicates life by making the device harder to learn, harder to use Design challenges - Price - feature parity - reliability - support --- slug: thoughts/university/twenty-three-twenty-four/hci-4hc3/index tags: - university - sfwr4hc3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/hci-4hc3/index" title: "Human Centred Design" date: 2023-09-04 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/hci-4hc3/index.html.md --- The following includes notes for the course 4HC3 - Human-Centred Interface --- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Aristotle tags: - philos1aa3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Aristotle" title: "Aristotle" date: 2023-09-11 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Aristotle.html.md --- Two types of learning - From perception to habit. Animal association of particulars. - From perception to belief. Rational cognition of particulars. This is the learning of experience (empeiria). > being qua being > Metaphysics: The philosophical study of being qua being Against Plato - being = idea - how do ideas cause particulars? - how do ideas cause motion? > that ideas, immaterial and changeless, are real seems unrealistic Form: a thing’s organization or disposition to behave Essence: a principle of reality Matter: potential for a substance to change ### Are Aristotle’s forms the same as Plato’s Ideas? They are similar in what they are expected to do, but they work in different ways. For Plato, the Idea of horse is different from every particular horse. It is a separate entity, immaterial, changeless, and better, more real than any fleshy animal. For Aristotle, forms have no existence separate from the individual substances whose form they are. Where there is a form, there is a particular substance. Light is the actualization of a potential state of a transparent medium. It is an accident of a transparent medium. The medium is a substance: air. It has accidents. One of these accidents is to become illuminated in the presence of colored bodies. Which is what we see as light. ## The Unmoved Mover (Metaphysics, Book 12, Chapters 6-7) The Unmoved Mover causes motion without itself moving. Even without moving, a thing can cause other things to move toward it by causing love or desire. Something loved or desired need make no motion of its own to cause things to move toward it; it initiates motion without moving. That is how the Unmoved Mover moves things—by being the object of love and desire • Necessarily exists (cannot not exist) • The final cause of motion in nature • The comprehensive reason for everything else • Divine • Alive and happy (because imperturbable) --- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Descartes tags: - philos1aa3 - seed description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Descartes" title: "Descartes" date: 2023-12-09 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Descartes.html.md --- > Descartes’s Method of Doubt: Press doubt as far as possible in order to find the boundaries of knowledge.
--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Epicurus tags: - philosophy - philos1aa3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Epicurus" title: "Epicurus" date: 2023-11-09 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Epicurus.html.md --- [Socrates](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Epicurus/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates) ideas: - care for self → happiness and virtue are problems of knowledge - Idealism: Being is Idea; fundamental reality is immaterial, spiritual, and rational - Never by itself leads anyone astray Epicurus - Emphasizes the value of philosophy as care of the self. - Denies Idealism, affirms materialism in the form of atomism ## Desire > completely innocent; if it goes wrong, it is because of its belief May be necessary or not necessary ### Necessary _non-satisfaction brings pain_ - Happiness (philosophy, friends) - Life (food, water) - Untroubled body (law, leisure) ### Not necessary _non-satisfaction not necessarily painful. Any pain of non-satisfaction relieved by other means (change one’s opinion about the object)_ - Natural (sex, immortality) - Conventional (reputation) ## The notoriety of Epicurus - materialism, denying the spiritual in nature - belief in chance and no final purpose - disbelief in afterlife - hedonism: pleasure is the highest good ## Idea of Pleasure - A feeling, not a sensation - An evaluation of sensation - Pleasure and pain are distinct qualities, like the two poles of a magnet. Neither is merely the lack of the other ## Pleasure ### Kinetic _depends on an object and is intermittent or discontinuous_ Shadowed by pain Excess → produces pain ### Katastematic _Continuous, independent of external objects._ Types: - _Aponia_: leisure, physical ease, stressless well-being - _Ataraxia_: untroubled, tranquil mind Un-quantified terms _Reasons for promoting pleasure as the highest good_ 1. Cradle Argument The goodness of pleasure is learned in the cradle. The first good, naturally pursued 2. Conceptual Argument Concept of good becomes meaningless when conceived as independent of pleasure > The more obstructive, the more unpersuasive the definition becomes ## Plato Against Pleasure as the Good - Pleasure is the replenishment of lack - Life spent in pursuit of pleasure constantly tries to fulfill newly arising lack - Any pleasure is made better by adding virtue.
- Pleasure plus wisdom is better than pleasure without wisdom - Pleasure plus courage is better than pleasure without courage > So pleasure cannot be the highest good ⇒ Answer of Epicurus: > Wisdom, courage, and all the virtues _are_ katastematic pleasures Higher and lower hedonism ## Virtues > [!note] NOTE > > Personal qualities that assist us in the pursuit of happiness Katastematic Virtues according to Epicurus ### Prudence, practical wisdom The truly prudent have knowledge of kinetic pleasure ⇒ whether to choose or avoid it ⇒ never interferes with their katastematic pleasure Successful life =: an uncanny, unsuccessful life in the eyes of the world Aim for self-sufficiency, cultivate leisure, prefer private life, private pleasure, low-profile Learn to live on little → circumstances change ⇒ make do with less ### Self-sufficiency ### Frugality Fewer toys ⇒ more katastematic pleasure Wealth shouldn’t be the most important thing Doesn’t advocate poverty → invites us to think about wealth in a new way Wealth is not money, but the means to enjoy life > Wealth is an abundance of katastematic pleasures Basis in nature ⇒ easy to apply ### Friendship Being a friend, having friends ⇒ supports katastematic pleasures Awareness among friends that they _are not alone_ Sense of security := katastematic pleasure of virtues ### Justice Can’t be happy when acting unjustly ⇒ Epicurus believes in a social contract. Humans lived without any organisation 1st Civilisation: agreement to prevent harm among themselves - Why?: The motive is not fear, but the desire for friendship. Unpleasant to prepare to fight at every moment → Violence is not a way of life > [!tip] IMPORTANT > > Justice and pleasures are fundamental building blocks of society Saw the need for a more formal definition of the contract ⇒ Law and Justice > Justice is neither natural nor sheerly conventional. Originated from convention, but the motive is natural (pleasure of security and friendship) - Justice is a conventional good contrived to promote pleasure - Not eternal. Justice changes as circumstances change - Not inherently good. Good as a means to the higher end of pleasure. ## Challenge to Religion - Our world is one of infinite worlds in endless void - Nothing spiritual in nature. Human beings not special in nature. They are animals, systems of matter, like everything else. Death is extinction. - The gods take no interest in human affairs and cannot be moved by sacrifice or prayer. - Religious ceremonies are superstitious. They are the way a powerful few control the rest. The aim of philosophy is to liberate people from superstition. ## Tetrapharmakos _The four-fold remedy_ - The gods present no fears - Death presents no worry - The good is readily attainable - The terrible is readily endurable --- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill tags: - philos1aa3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill" title: "John Stuart Mill" date: 2023-11-30 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill.html.md --- On _Liberty_ > “There are but few persons… whose experiments, if adopted by others, would be likely to be of any improvement on established practice. But these few are the salt of the earth; without them, human life would become a stagnant pool.
Not only is it they who introduce good things which did not before exist; it is they who keep the life in those which already exist.” (252) > “The general tendency of things throughout the world is to render mediocrity the ascendant power among mankind.” (253) > “The initiation of all wise or noble things comes and must come from individuals.” (253) > ""What crushes individuality is despotism” (251) Experiments in living ⇒ praises for these people. Liberty is limited by the requirements of _Do no harm_ Self-regarding actions vs. Other-regarding actions Self: be as self, be as different Other: Do no harm If something is not-moral: Then Moral is wrong Utilitarianism: Individual’s expression - not maximise pleasure - but maximise **progress of humanity** > Harmonious development of humanity Unlimited regards to actions to self-regarding actions, but not other-regarding actions. ### Moral Individuality As individuals as we do, but still be Morals to others [Moral](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill/../../../../../../../../thoughts/moral) is good thing. Ask [Nietzsche](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill/../../../../../../../../thoughts/Philosophy-and-Nietzsche)’s, what is good thing? --- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche tags: - philosophy - philos1aa3 description: "resconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche" title: "Nietzsche's Life" date: 2023-11-30 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche.html.md --- _1844-1900_ Theology, acquainted with Greek philosophy _Twilight of the Idols, 1889_ # Problem of Socrates.[](#problem-of-socrates) **Decadent: in decline, decay. “Doing a bad thing carefully.”** > An unexamined life is not worth living. Only Knowledge alone makes life worth living Importance things in life: goodness, happiness, depends on reasons, arguments → bizarre equations. > Maybe something wrong with this? What drives Socrates from this demands with reasons and knowledge? What are the motives? Hume: Common life Plato: Turns away from appearance, material life to know the true being Artist: playful presentation, loving life for the work Philosophers: Not joy, serious, engaged in serious business, striving to know the truth. Invent nothing, contemplate what is, what is true, “being qua being” > One must be all means stretch out one’s fingers and make the attempt to grasp this amazing finesse, that the value of life cannot be estimated. (269) Life is not a closed books, but evolving still. # God is Dead[](#god-is-dead) _God of the philosophers_ Atheism Denis Diderot: “It is … very important not to mistake hemlock for parsley; but to believe or not to believe in god, is not important at all” Nietzsche: “God is dead”. Means optimism, faith in science, the redemptive power of knowledge is “dead”, that is, unconvicing, hard to take seriously. _Nihilism: The highest values are devaluing themselves_ How do we have duty on truth? Doesn’t need reasons to be atheism? For N, believing god is passing into the past, and have no feature. Science has devitalise god. So to die the superior value of truth. Value of Truth is problematic. ## From _Thus Spoke Zarathustra_ Contrast between Nobility and goodness _noble spirits vs. 
good_ nobility > good noble people: maintain nobility, might considered by other as setback not become “a churl” (churlish, misanthropic, a hater of humanity) ## Morality as Anti-Nature Critiques in Christianity, - Anti-nature because anti-difference, when nature is all difference - Anti-nature because it values people all the same, when in nature, by nature, we are amazingly different. Security with “herd mentality” Regards as sign of decline, docile (democracy, or [John Stuart Mill](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill)’s instution) > Obstacles that are good for us: Becomes who we are Herd = human society - like sheep: unhappy on our own. Watch each other carefully, follow into our line - all herd: All animals all nervously watch each others Peace is overrated, as defeats kind of challenges to grow for leaders herd = security (survival values, of Europe at this time?) Everything modern people thinks good is bad, things judges to be evil could turn out to be good in the future. Decadent: arts and philosophy ⇒ artists wants to play with empiricists, doesn’t care about the truth (philosophers’ motive) Arguments against morality: One rule for everything Judge everything by one rules: Morality reduces human to singularity Object to Kant’s Morals, but not [John Stuart Mill](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/John-Stuart-Mill)’s Utilitarianism Respect for others is imposed on us ⇒ refused to embrace > Morals compromise creativity, and value more creativity more than morals (beyond good and evil) ### What I Owe the Ancients _The Birth of Tragedy, 1872_ Greek tragic drama - Aeschylus, _Oresteia_ - Sophocles, _Oedipus the King_ - Euripides, _Bacchae_ > Why do we enjoy tragedies? Why do we enjoy watching people suffer? > “All becoming and growing - all that guarantees a future - involves pain.” (282) > “Art is worth more than truth.” Shares’ [Plato](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato)’s view of democracy. Values life of creativity, to invent new values. Acknowledges death, suffering, tragedy. Not defects, or overcome by science. Knowledge can demoralise people (Knowledge is good) Knowledge is a very thing that it is good. Knowledge is not the path to virtue and happiness > Science tells us no good in itself, no purposes in itself. Values are selected by us, not stumbled upon. Life has no values, since we cannot see all that life has to offer. It is a place for adventure. Creating values is science not do, but art can ⇒ art is more important than truth. 
--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous tags: - philos1aa3 description: "reconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous" title: "Nous" date: 2023-12-07 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous.html.md ---

Notes: [notes](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/All.pdf)

Reference: [text](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/1A3Reader\(2019\).pdf)

### [Socrates](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates)’s Idea of Good

- Something is good when it contributes to the flourishing of a human being
- Denies democracy
  - the masses are childish
  - unnatural, confuses freedom with lack of restraint
  - inefficient
  - bad at financial management

In [Phaedo](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Phaedo-and-Apology): `The body confuses the soul and does not allow it to acquire truth and wisdom`

_See [Apology](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato#apology) for more information_

_Arguments for Survival_

- All things come into being from their opposite → the living have come from the dead, so a soul _must_ exist despite being dead
- Understanding of perfection is independent of experience
  - To have knowledge independently of experience → the soul must have existed before the body
  - Yes, the soul will survive death, because a soul that exists before birth must come from something dead
  - It does not require a living body to be a living soul

_Against soul scattering_

- What can dissolve and scatter must be composite
- What is composite changes; what is simple does not
- Ideas are simple
- Understanding ideas is a pure power of mind
- Since ideas are simple → the soul that understands them is also simple
- The soul does not consist of parts → it cannot change
- The soul brings life to a body → death changes the body, but the soul lives on
- The idea of the Even cannot become odd, the Hot cannot become cold; the soul, which makes a body alive, cannot die

_See [Republic](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato#republic)_

- Belief is liable to error, knowledge is not.
- Belief can be changed by persuasion, knowledge cannot be.
- Belief does not bring understanding, knowledge does.
- True belief, right opinion, is still essentially belief or opinion, and cannot be knowledge since its truth is accidental.
- Opinion is shameful because it is not a passive thing that innocently occurs to a person.

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/knowledge-map-thoughts.webp)

_Allegory of the Cave_

Plato’s pessimism: We are sunk in error, addicted to opinion, and democracy is hopeless. It is the political expression of minds ruled by opinion, bereft of wise knowledge.

Plato’s optimism: The cosmos is organized by goodness.
By grasping it we understand the world we live in, and by understanding the world we understand how best to live.

> And we can understand that, the idea of the good. At least some of us can. They are the philosophers, masters of the dialectic, and they should govern the rest.

[Aristotle](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Aristotle)

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/aristotle-metaphysics.webp)![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/aristotle-form.webp)

_What is Truth?_

> “To say of what is that it is, and of what is not that it is not, is true.”

Correspondence theory of truth:

> Truth is the correspondence of substance and statement

_Theory of Causes_

- Formal cause: law of change
- Material cause: material persisting through change
- Efficient cause: agent of change
- Final cause (teleological cause, _telos_): purpose of change

[Epicurus](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Epicurus#desire)

- The gods do not concern themselves with human affairs - nothing to fear

> Happiness is uninterrupted tranquility. If the gods intervened, it would come from some disturbance of their tranquility; the existence of evils proves the gods' indifference.

Atomism: not against the soul, only against its immateriality.

[Stoic](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Stoic)

Cynic principles. Materialism without atomism: matter is continuous and without void. No empty space.

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/epistemology.webp)

_Free will and determinism_

> Epictetus: “If a good man could foresee the future, he would cooperate with sickness, death, and mutilation; for he would be aware that this had been ordained by the universal order of things, and that the whole is more important than the parts.”

All causes are either:

1. _Antecedent causes_: events leading up to a change.
2. _Active, operating causes_: immediately produce the effect.

_Moral_

The highest good (= virtue) is right volition. Every act is chosen, voluntary. No moral luck. Whether life goes well or ill is completely in our control. Suffering is a kind of error, a cognitive mistake, due to wrong judgment and false belief.

[Descartes](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Descartes)

Müller-Lyer illusion

> The idea of god is of a perfect being

> I think, therefore I exist

Thinking implies existence; this provides a test for truth, the _cogito_.

Descartes equates material substance (matter, body) with spatial extension. The essence of body, what makes a body corporeal or material, is spatial extension.

_Implications_

1. Primary and secondary qualities ![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/descartes-qualities.webp)
2. Plenum

> If space is identical with matter, then the idea of “empty space” becomes impossible. The physical universe is therefore “filled up,” a plenum, with no empty space.

3. Inertness

- Spatial extension is the whole essence of matter.
- No other quality, except those primary qualities that necessarily accompany extension.
- Motion is not essential to a body. If a body moves, motion was transmitted to it from another moving body.

4. Mind-body problem

- Mental perception vs. physical causation

[Spinoza](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Sphinoza)

Deus sive Natura

Descartes: Mind and body are separate substances. A person is a substantial union of thinking substance and extended substance.

Spinoza: A human being cannot be a substance.

1. Substance cannot **not** exist; it is a necessary, self-caused, _causa sui_ being.
2. No human being is a necessary, self-caused being.
3. Therefore no human being is a substance. Even less can a human being be what Descartes said: a composite of two substances.

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Nous/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/spinoza-knowledge.webp)

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato tags: - philosophy - philos1aa3 description: "reconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato" title: "Plato" date: 2023-11-08 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato.html.md ---

See also [Socrates](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates)

# Apology[](#apology)

- Defence against the charges against him
  - studying things under the earth and in the sky
  - corrupting the youth of Athens → promoting new gods (the trial lasts one day; the penalty is poison)
- Contrary to the charges → he never takes money for instruction
  - available to anyone without any fees → Socrates didn’t teach, he just talked in public
- Converses with the knowledgeable → these people never share their wisdom
- Why does Socrates go after people? Does he need to embarrass them?
- Oracle at Delphi → is anyone wiser than Socrates?
  - No one is wiser than Socrates
  - But Socrates doesn’t feel wise
  - So he enquires about what _wisdom_ is
- Enquires with the knowledgeable → bored
- Enquires with the poets → wisdom doesn’t come from the poets (the poets are inspired by the wisdom of the god)
- Enquires with the technicians → not the right source of knowledge for wisdom
  - The craftsman is incapable of explaining why he does what he does ⇒ verbal explanation is irrelevant

> Wisdom can be used both for good and bad

```poetry language=fr
Wisdom is a penetration into the good, not the bad
```

Oracle at Delphi → a reputation for enigma (ironic, playing with words); the Oracle could be mocking humanity

```poetry language=fr
No one is wiser than Socrates
```

- He is the wisest
- No one is wise (Socrates is as wise as anyone, and he knows he is not wise → no hope)
- He is the wisest because he knows he is not wise

> He does not know anything **fine** and **good**

Socrates possesses no expertise in making goodness.

> Socrates: a mission to relentlessly confront people with their ignorance; take care of your soul

Convicted by the stupidity of them all.

### Dialog

_Dikasts_ → Arguments:

- His behaviour is not subversive, not contrary to their beliefs, and doesn’t harm the city
- His actions are sanctified by the gods → “You don’t understand anything”
- He follows his conscience rather than their democracy
- Contempt towards democracy, mockery
- Democracy is a childish form of government

> My trial will be equivalent to a doctor being prosecuted by a pastry-cook before a jury of children

#### Context of the trial

- Athens → democracy from 508 to 322 BCE
- Peloponnesian War → Sparta defeats Athens
- Alkibiades: friend of Socrates
  - Dissolute; Athens is conflicted about this character
  - They admire his charm and leadership qualities
  - Aristocratic → fear of his ambition, fear of the friends of Alkibiades
  - His family has Spartan ties
  - Charged with treason → sentenced to death → resurfaces in Sparta → “democracy is corrupted”
  - Assassination attempted
  - Returns to Athens as a hero
  - 404 → Athens accepts terms of surrender to Sparta
  - Assassinated while traveling → Athens looks for a probable enemy of democracy (Socrates), someone to blame
  - Because Socrates was a friend of Alkibiades → a scapegoat for a failing democracy

Among the philosophers and poets contemptuous of Athenian democracy:

- The masses are childish, fickle, easily misled
- Unnatural, a tyranny of the weak over the strong
- Confuses freedom with lack of restraint, favors flatterers
- Inefficient

> Government should be efficient → choose the best person to govern

Socratic rulers would not be aristocrats, but rather experts, masters of the acquired art of rulership.

Philosophical rulers →

- Crucial to make them wise and knowledgeable
- Establish education → reason well and follow reasons

[Nietzsche](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Plato/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Nietzsche) → problems with Socrates

- Reasoned and consciously sound
- Is the unexamined life really not worth living?

> What makes wisdom so good?

## Socratic Idea of Good

> Something is good when it contributes to the full flourishing of a human being in all our powers and faculties for the natural duration of life.

Know how to use all things in a way that tends toward good.

What is not good might not be wise? → If he really cares about Athens and wisdom, then why not participate and join the debates?

- Is he afraid of participating in this democracy?
- He heard this voice → it never tells him what to do; it only speaks against things he is considering doing
  - it turned him away from doing wrong things
- Too honest to survive if occupied with public justice → he cannot serve justice through public service → a man who would serve justice must live a private, not a public, life
- Never does the voice tell him to defend himself → he thought about fleeing → the voice told him not to → what happens after the trial may not be a bad thing → death is not a bad thing → Socrates is unmoved after receiving the death sentence

> You too must be of good hope in the face of death. A good man cannot be harmed.

What you think harms me harms you a lot more.

Does he know there is a life after death?

> Divine mission: encourage Athenians in self-reflection toward a good moral life

Difference:

- The gods are perfectly just and follow a moral standard, whereas everyone else doesn’t (Homer shows us gods bringing death)
- Poets: the gods bring meaningless suffering to people → Socrates thinks this is wrong, and that the poets should be faulted for not honouring the gods

Virtuous: masculine, conventionally good; knowledge and wisdom are good only if used wisely.

What does wisdom do for us?

- Knowledge of the good, and the power that comes with it
- Knowing the good universally and philosophically, and acting from that

> A good person cannot be harmed

The unexamined life is not worth living. Doing wrong is worse than suffering wrong. Riches and power contribute nothing to happiness. Only wisdom and virtue matter, and wisdom is the ultimate virtue.

# Phaedo[](#phaedo)

_by Plato_

_a month later, Socrates in jail, waiting for his execution_

Tries to write poetry:

- writes a hymn to the god Apollo (the Oracle at Delphi)
- versifies the fables of Aesop
- a recurring dream across his life → it ends with him hearing a voice: “Socrates, practice art”

_last day of life_

The jurors think death is the worst harm, yet Socrates says:

> That might be true for somebody. Philosophers should fear death less than anyone else.

Philosophy is a practice of, a preparation for, death.

What is death?

- Turning away from images of the body, toward the intellectual form of ideas
- Freeing the soul from the confusion of the body
- The body is an obstacle to knowledge
- Truth is known by intellect and reasoning → it involves the best part of us, ignoring all the senses

> The body confuses the soul and does not allow it to acquire truth and wisdom. As long as we have a body and our soul is fused with such an evil, we shall never adequately attain what we want, which is truth.

> If we are ever to have pure knowledge, we must escape from the body and observe things in themselves with the soul by itself. It seems likely that we shall attain wisdom only when we are dead (65d-66e)

This explains the value of philosophy as a preparation for death: free of deception and the sensory self → attaining the wisdom that waits for us on the other side.

Is there another side?

Last hour. To prove: the soul cannot die.

- Contrived and unconvincing
- Life after death?

Arguments:

1. Arguments for survival
   1. All things come into being from their opposite: the living come from the dead, and the dead come from the living
      - to have come from the dead, a soul must exist even while dead
2. Argument from knowledge of perfection
   1. Understanding of perfection is independent of sense experience
      1. _A priori_ knowledge: independent of experience
      2. _A posteriori_ knowledge: depends on experience
   2. To have knowledge independently of experience → the soul must have been alive prior to bodily life → will it survive death?
   3. Yes, because a soul that exists before birth must come from something dead → association with a living body is not essential to a soul; it does not require a living body to be a living soul
3. Against soul scattering
   - A soul that can dissolve and scatter must be composite → _what is composite changes, what is simple does not change_
   - Ideas like Equality or Justice do not change → ideas are simple, not composite (simple = non-composite)
   - Understanding ideas is a **pure** power of mind and does not depend on the body
   - Since ideas are simple → the soul that understands them must also be simple
   - So a soul does not consist of parts, is indivisible, and therefore cannot change
   - So death cannot change the soul
4. The soul as what brings life
   1. The soul brings life to the body → makes the body alive, in the way that the form of the Even makes six even, and the form of the Hot makes fire hot
   2. The idea of the Even cannot become odd, the idea of the Hot cannot become cold.

> Pythagorean: the soul is more than the sum of what makes up a body → the soul is a being in its own right, a separate entity, detachable from the body, which makes the body alive

No one does evil while knowing that it is evil ⇒ Conclusion: you cannot harm a good man, you cannot kill a soul. Tranquil (“I shall no longer be with you”) as the cup of poison is offered → a reminder of what is important.

His last words, to Crito:

- “We owe a cock to Asclepius. Make the offering and do not forget.” → death as the cure for a disease, release from his body. We cannot get what we need from this world; it lacks what we need → the netherworld is much better.

Plato: Such was the end of our comrade, the best and wisest of the men we knew.

This picture conquered the West → Western civilisation places a great deal of faith in it.

### Symposium

_some time after Socrates’ death_

_Agathon, a poet of tragedies, is the host_

> The soul is an abstraction from empirical knowledge.

Alkibiades, of Socrates:

> This utterly unnatural, this truly extraordinary man … this hopelessly arrogant, this unbelievably insolent man … \[of] amazing arrogance and pride … he is unique

- Seduction is a game with respect to Socrates → Socrates’ life is one big game of irony

Alkibiades: seduction as a trade

- invites Socrates to wrestle in the gym ⇐ doesn’t work
- gets him drunk (no one has ever seen Socrates drunk), lies down next to him → makes a move → nothing

> Finding yourself falling in love with Socrates
>
> - He pretends to fall in love → others fall in love with him

Ordinary irony: say the false ⇒ to imply the opposite. Socratic irony: both is and is not seriously meant ⇒ true in one way, false in another.

> _to Euthyphro_: You think that your knowledge of the divine, and of piety and impiety, is so accurate that … you have no fear of having acted impiously in bringing your father to trial

- The claim to knowledge of the divine is false ⇒ Euthyphro is a self-righteous fool who doesn’t understand anything

Alkibiades is angered when he sees Socrates falling for someone else. The future replicates the past. Love must be more than intense alliances with others.

> Socrates can’t teach anybody anything that they don’t already know
>
> - Alkibiades is so rough beneath the smooth surface → not interesting to bring in as a disciple
> - Alkibiades knows himself too much to love, to be drawn to Socrates
> - Maybe Alkibiades can be redeemed by philosophers

## Republic

1. Metaphysics: philosophical theory of being

> To be is to be an idea.

Idea means ideal form; it is perfect, not known through the senses or the body. An idea is intelligible (grasped by intellect), not sensible (sense-perceptible), being. True reality is the world of ideas: immaterial, changeless, ethereal, fully rational.
> Opinions without knowledge are shameful and ugly things.

Sensible things are copies of ideas ← bad copies.

> The idea of good is the idea that makes things good.

1. The idea of good is the most important thing for everyone to know
   - By knowing it, everything else becomes useful and beneficial
   - If you don’t know the idea of good, everything else is useless

> Obtaining knowledge of the good is the foundation of Platonic philosophy

Merely believed to be true: everyone wants what is good.

> Every soul pursues the good and does what it takes to be good

We can’t understand our own good without knowing how our own good integrates with the good of everything else → it makes people the best of what they can be.

Why be satisfied with opinion when knowledge is out there to be found?

Opinion: doxa. Knowledge: episteme. Understanding: nous.

- Belief is liable to error, knowledge is not.
- Belief can be changed by persuasion, knowledge cannot be.
- Belief does not bring understanding, knowledge does.
- True belief, right opinion, is still essentially belief or opinion, and cannot be knowledge since its truth is accidental.
- Opinion is shameful because it is not a passive thing that innocently occurs to a person.

### Sun : Visible Things : Sight

The idea of good stands to intelligible things as intelligible things stand to understanding. The good illuminates understanding for us:

- cause of being for the ideas
- cause of knowledge

It is virtually all ideas (as white light is virtually all colors) ⇒ it makes all minds true and understanding.

> To understand is to focus the intellect on the form, the idea (stare into the sun and not be blinded)

To understand an idea is to understand what true being is. Knowledge knows what is and that it must be what it is.

Criterion of knowledge:

- Infallibility, the impossibility of error

Understanding (nous) ⇒ Philosophy

- Beauty, justice are entities

Thought (dianoia) ⇒ Science

- Requires some intellectual understanding → thoughts
- Hypothesis: a first principle of a science that remains unexplained

Perception (aisthesis) ⇒ Opinion

> How do we understand philosophy?

## Dialectic

> Inquiry that systematically attempts to grasp, with respect to each thing itself, what the being of it is (that is, the idea)

It does away with presuppositions. It overcomes everything hypothetical in thought and leads to presuppositionless knowledge.

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Presocratics tags: - philos1aa3 - philosophy description: "reconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Presocratics" title: "Presocratics philosophers" date: 2023-09-09 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Presocratics.html.md ---

### Anaximander

> The material cause and first element of things was the unbounded _apeiron_

- the earth is cylindrical in form, and its depth is a third of its breadth
- that which begets hot and cold was separated off from the eternal at the origin of the world. The heavenly bodies are wheels of fire separated off from the fire which encircles the world, enclosed by air.

Living creatures arose from the moist element as it was evaporated by the sun. Man was, in the beginning, like another animal, namely a fish.

Often associated with the _apeiron_, the _infinite_, as the fundamental principle of the universe.

> Humans arose from inside of fish, and became capable of protecting themselves.

### Xenophanes

---

#### Parmenides

> Archaic

The question of beginnings:

```
Do things originate from the gods?
Or are they a natural occurrence?
```

- Pythagoras
- Scientific

> Being is one entity. Not-being cannot be known. | Change involves not-being → logic convinces us that `change is not real`. The appearance of change is an illusion.

> Truth is being → homogeneous, changeless.

Ultimate being is not one →

## Materialism

Atheist?

### Empedocles

A perspective on nature as a configuration of _materialism_.

Elements: Earth, Air, Fire and Water. Elemental forces: Love and Strife.

Real differences among things are manifestations of these real elements.

### Democritus

Atomism. The atomic hypothesis → nature is Atoms and Void.

- Nature is body and void, nothing else; no purpose or design
- The soul is body

The system of the atom: atoms of different sizes and shapes coalesce and accumulate → soul.

> The soul is matter of the finest, smallest kind, coherent through every part of the organism.

- Primary and secondary qualities
  - Primary: of atoms, i.e. size, shape, and weight
  - Secondary: of molecular combinations, e.g. hot/cold, moist/dry

> Sweet exists by convention, bitter by convention, colour by convention.

Division comes to a stop at the atom → the atom is the least divisible factor, the absolute stopping point of destruction → destruction is only a rearrangement of atoms, which are themselves indestructible.

If there are gods → they must have bodies → no atomic body is immortal → the gods are not immortal. For the Greeks, gods and humans are indifferent to each other.

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates tags: - philosophy - philos1aa3 description: "reconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates" title: "Socrates" date: 2023-09-25 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates.html.md ---

The [Pre-Socratics](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Socrates/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/Presocratics) live by materialism; this refers to the Ionians:

> keen on the science of _nature_. Why does it come to be, why does it exist?

Socrates is dissatisfied with natural science.

Apollo’s oracle at Delphi: `Know yourself`

He knows nothing about superior things and the right way to live. He is not wise.

> One cannot be wise about nature without wisdom about ourselves, about how we live and love.

Use the knowledge well.

### Wisdom

> Begins in self-understanding

What is the way we ought to live?

Philo-sophia, love of wisdom: to love wisdom is precisely not to be wise, since we can only love what we lack.

Philosopher vs. fool: the philosopher knows he is not wise, whereas the fool doesn’t.

He longs for wisdom ⇒ only a god is wise; complete wisdom is not for humans.

Learned from Pythagoras: the soul is not the same as human life, a paradox of life. Hermenedies: the Pythagorean cares for the soul (rational, pursuing the rational).

Neither gods nor beasts: spiritual beings operate between gods and beasts; they do not possess wisdom but have the ability to pursue it.

Socrates: What is X?

- True of all cases
- The reason why something is an X

e.g. Logos: why do the gods love some things and not others?

> Platonic Idea (Form): The idea of X is the form all particular Xs share, and which causes them to be X.

Plato ⇒ an Idea for everything Socrates asks about ⇒ more than materialism ⇒ either the Socratic/Platonic view or materialism is wrong.

- The Idea is not material, not a body in space or time
- Not ordinary experience

What were the charges at the Athenian trial? Animosity with regard to his way of being?
--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/Sphinoza tags: - philos1aa3 - philosophy description: "reconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Sphinoza" title: "Spinoza" date: 2023-10-10 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/Sphinoza.html.md ---

## Ethics (1677)

Deus sive Natura

Europeans considered Spinoza an atheist ⇒ but his works constantly mention God.

## God of the Philosophers/Metaphysicians

- The Unmoved Mover of Aristotle (and, before him, the pre-Socratic Ionian Xenophanes) ⇒ a god the philosophers esteem, not the god of the ordinary believer
- Not a personal being with loves or hates, but a more rationalistic god

Monotheistic God ≠ nature ⇒ God creates nature. Spinoza: God = nature, a powerful, perfect being.

Substances:

**Aristotle**: Substance is that which exists in itself and not in another.

**Descartes**: Two substances, mental and physical (dualism).

**Parmenides**: “Say and think only this: Being is.” A single substance, numerically one.

**Spinoza**: Combines Aristotle's conclusion about what substance is with Parmenides' argument that there can be only one _single substance_. What is this? One being that is alive, infinite, intelligent, all-powerful, material, spatial.

> God does not transcend nature and is not a supernatural cause; God is nature.

## Substances (1D3)

Substance is the cause of itself, causa sui. Cause of itself (causa sui) = necessary existence. Something that does not exist cannot bring itself into being. Substance has no external cause; its cause belongs to the substance itself. It has to exist; its essence includes existence.

## Monism

> Substance is unique. There exists only one substance.

Not one _kind_ of substance, but _numerically one single substance_.

1. Something exists
2. Whatever exists has a sufficient cause
3. Therefore a causa sui substance must exist, and
4. There is at most one

Why only one?

1. If there were two causa sui substances, there must be a difference between them
2. If there is a difference, then there must be a cause of it
3. One causa sui substance cannot cause change in another self-caused being

## Aristotle’s Idea of Substance

Color does not exist on its own but in the horse; would the form Horse exist by itself?

> Nature is not a totality of things

## Mode and Attribute

**Attribute**: That which the intellect perceives of substance as constituting its essence (1D4). Substance has infinitely many attributes; each attribute has infinitely many modes.

**Mode**: The affections of a substance; that is, that which is in something else and conceived through something else (1D5). Each mode is connected and can be modified by others.

“Mode” = modification, modality, way. A mode of substance is a modification of it, some way in which substance is modified. “Conceived through” = explained by, made intelligible or reasonable by.

## Spinoza’s idea of Substance (= God)

One single, infinite, eternal, complex substance, comprising infinitely many modes of infinitely many attributes.

Substance > Attribute > Mode

Divisibility implies corruptibility; God can’t have parts and pieces. Is god divisible?

Definition 6: By God I mean an absolutely infinite being; that is, substance.

Proposition 11:

### Method for the proof of god

Ontological Proof: Explains God as a being that cannot not exist. God’s essence includes existence.

Cosmological Proof: God is the first cause, the ultimate cause of everything else. Without God the chain of cause and effect would recede forever, and the world would be without a rational foundation.
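As a rough aid, here is a minimal propositional sketch of the ontological proof in the reductio ad absurdum shape spelled out in the next subsection; the symbols $G$ and $E$ are my own shorthand for the lecture's claims, not Spinoza's notation.

```latex
% Reductio schema: to prove P, assume \neg P and derive a contradiction.
%   G := God (the absolutely infinite substance) exists   (shorthand, mine)
%   E := existence belongs to the essence of substance    (shorthand, mine)
\[
  \frac{\neg P \vdash Q \land \neg Q}{\vdash P}
  \quad\text{(reductio ad absurdum)}
\]
\[
\begin{aligned}
  &1.\ \text{Assume } \neg G.\\
  &2.\ \text{Axiom 7: } \neg G \rightarrow \neg E
       \quad\text{(what can be conceived as not existing has an essence that does not involve existence)}\\
  &3.\ \text{Prop. 7: } E
       \quad\text{(substance is \textit{causa sui}, so its essence involves existence)}\\
  &4.\ \text{From 1--3: } E \land \neg E, \text{ a contradiction.}\\
  &5.\ \text{Therefore } G.
\end{aligned}
\]
```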
Aristotle’s proof of the Unmoved Mover in Metaphysics was a cosmological proof of this type.

Teleological Proof (“Design argument”): Nature shows evidence of intelligent design. ← Spinoza rejects the _Teleological proof_.

His three arguments are in Proposition 11.

#### Reduction to Absurdity (reductio ad absurdum)

```prolog
To prove P:
  assume NP (not-P)
  show that if NP, then Q & NQ
  Q & NQ is a contradiction, which is impossible (contradiction = False)
  so NP is false
  therefore P
```

First proof (ontological argument):

1. Suppose God does not exist.
2. Axiom 7: If a thing can be conceived as not existing, its essence does not involve existence.
3. Prop. 7: Existence belongs to the nature of substance.
   - Why? a. Substance cannot be produced by another. b. So, from def. 1, substance is self-caused, so its essence involves existence.
4. The hypothetical non-existence of God reduces to contradiction.
5. Therefore, God exists.

Second proof (cosmological argument):

1. For everything, there must be a cause, either of its existence or its non-existence.
2. The cause, whether of existence or non-existence, is either in the thing or in another.
3. A thing necessarily exists if no cause prevents its existence.
4. So, if God does not exist, there must be a cause of non-existence, and this cause must be in another.
5. What causes God not to exist must absolutely exclude God from being, and can therefore have nothing in common with God.
6. If two things have nothing in common, one cannot prevent the other’s existence.
7. Therefore, no cause prevents God’s existence.
8. So, God exists.

> There is nothing of which we can be more certain than the existence of an absolutely infinite or perfect entity

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/index tags: - philos1aa3 - university description: "reconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/index" title: "Philosophical Text" date: 2023-09-04 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/index.html.md ---

An introduction to philosophy through the close reading of selected classical texts. Authors to be considered may include Plato, Descartes, Hobbes, Hume, Marx, Mill, Nietzsche.

The full notes can be found [here](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/index/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/All.pdf), with all of the reference [text](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/index/../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/1A3Reader\(2019\).pdf). Tutorial notes can be found [here](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/index/../../../../../../../../thoughts/university/philo-1aa3/tut)

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Being-qua-being tags: - philosophy - philos1aa3 description: "reconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Being-qua-being" title: "Being qua being" date: 2023-09-15 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Being-qua-being.html.md ---

# Wisdom[](#wisdom)

- The most general knowledge
- Not instrumental knowledge
- Practical (Techne) → practical
- Epistemic (Episteme) → theoretical

> Aristotle values episteme, since Techne is dependent on other things

```
Theoretical knowledge is independent of other scenarios
```

Intrinsic value of epistemology?
Instrumental values are only intelligible once intrinsic values are considered → intrinsic values over instrumental values.

- Science of being (`being qua being`) ⇒ is it possible scientifically?
  - Philosophy of science
  - How do we justify that science is possible?
- A unifying principle to account for the variety of things
- Questions with respect to metaphysics
  - A science of metaphysics to study everything (the study of being)

“Being” has many senses.

# Passage[](#passage)

_Bk. 4 Ch. 2 (p. 135)_

Priority of substances. Focal point: “healthy” → functional relativity to ‘health’, using health or different aspects of health. That is, being is related to one central point; all things that are, are of substances:

- affections
- processes
- destructions/privations/qualities
- productive

If there is a science for one → is there a science for all? Metaphysics is the science for this investigation.

What is the focal point of metaphysics? What is the primary sense of being?

> The primary sense of being is being a substance

A substance is a unified thing in the world, e.g. a human is a substance.

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Epicurus tags: - philos1aa3 description: "reconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Epicurus" title: "Epicurus's Hedonism and Materialism" date: 2023-09-11 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Epicurus.html.md ---

# Materialism[](#materialism)

- everything is made of atoms

# Central role of chance[](#central-role-of-chance)

# Denial of afterlife[](#denial-of-afterlife)

- believes in gods and souls, but souls are _material things_
- the soul dies once the body dies

# Hedonism[](#hedonism)

- pleasure as the highest good

Q:

- Why is pleasure the highest good?
- Which pleasures should we choose?
- Do we agree?

Pleasure (good) vs. pain (bad)

A blessed life ⇒ healthy body ⇒ undisturbed soul, leading to pleasure

```
Endure pain for greater pleasure
```

Kinetic vs. katastematic pleasure: object-dependent vs. object-independent.

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Phaedo-and-Apology tags: - philosophy - philos1aa3 description: "Phaedo and Apology" title: "Phaedo and Apology" date: 2023-09-22 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Phaedo-and-Apology.html.md ---

## Structure

Problem ⇒ Thesis ⇒ Structure

## Apology

- An outsider, speaking in a language that is not the court’s
- Knowing what good is, is enough for being good → the connection between wisdom and good action ⇒ to find the truth of what is good

> The Athenians are harming themselves, since Socrates is the only one who is concerned with the truth and with what is truly good.

- “Ancient” / recent accusations; bad association with both
  - Ancient:
    - Physicist → enquires into things in heaven and under the earth
    - Sophist: uses arguments that lead people away from the truth
  - He doesn’t care about physics
  - “human/political virtue” ⇒ people aren’t as wise as they think they are
- Corrupts the youth
  - Hates democracy
  - Does not believe in the gods of the state

⇒ He can’t corrupt the youth: “No one does bad willingly. A corrupt person is more likely to harm people.” ⇒ If you know that I am harming people → you should come to me.

> The unexamined life is not worth living

- Refusing to act on what is good and known → living a life of ignorance → not a life worth living.

## Phaedo

- Pain and pleasure
- How is suicide wrong if facing death is good?
- Do philosophers desire death?
- What if the soul is not immortal?
- Single vs. composite

> All knowing is remembering ⇐ a posteriori?

pp. 62-63:

- We know absolute equality
- Material equalities fall short of absolute equality

> To see inequality → we need to have knowledge of what absolute equality is, prior to experience

For Socrates → ideas are not fluid

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Republic tags: - philosophy - philos1aa3 description: "reconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Republic" title: "Allegory of the cave, Republic" date: 2023-09-13 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Republic.html.md ---

Allegory of the cave:

- The prisoners see the shadows, but not the real world they reflect
- An escaped prisoner → tries to convince the others to go out and see what the real world is
- They keep talking about the shadows (the escapee now identifies the real world instead of the shadows)

### Republic pp. 124-125

State of ignorance? The state of being in the cave = ignorance.

Eyesight ⇒ capacity for learning. Sun = knowledge, the intelligible world. Shadows → the sensible world.

> Those who attain these visions are unwilling to descend to human affairs ⇒ their desire is to stay above (they don’t want to go back down into the cave)

Fear of the unknown? Journey to enlightenment? Enlightenment as a duty?

Why do most of them not want to escape at the beginning? Why does only one of them escape? → fear of the unknown? the form?

See also: [Symposium](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Republic/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Symposium#symposium-pp-105-106)

Ladder of beauty

![](https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Republic/../../../../../../../../../../thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/IMG_0308.webp)

--- slug: thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Spinoza tags: - philos1aa3 - philosophy description: "reconstructed source of https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Spinoza" title: "Arguments regarding Spinoza" date: 2023-10-11 permalink: https://aarnphm.xyz/thoughts/university/twenty-three-twenty-four/philo-1aa3/tut/Spinoza.html.md ---

Substance = God = Nature → there is only one substance.

Many attributes:

- Thought
- Extension

Each attribute has:

- Modes (Aristotle’s substances)

```mermaid
stateDiagram-v2
  state "Substances" as A
  state "Thoughts" as B
  state "Extension" as C
  A --> B
  A --> C
  B --> M1
  B --> M2
  B --> M3
```

_Think Least of Death_ by Nadler

## Principle of S