The following is an excerpt of a paper I wrote for my coursework.

Introduction.

Historically, Alan Turing’s seminal work “Computing Machinery and Intelligence” laid the foundation for exploring the possibilities of a thinking machine (Turing, 1950). Subsequently, the development of AI took a symbolic approach: representing the world through systems that manipulate high-level symbols and tokens to arrive at a result, commonly referred to as Good Old-Fashioned AI (GOFAI) (Haugeland, 1997).

While GOFAI showed promise through decision-tree reasoning, its limitations became apparent in the 1980s when the field entered the “AI Winter.” This was likely due to growing scepticism within the AI research community and a reduction in funding, which halted most research and development (Hendler, 2008).

However, driven by Moore’s Law and the exponential growth in available computing power and data, a new approach to AI arose, focusing on statistical methods and connectionist networks such as artificial neural networks. Haugeland (1997) dubbed this approach New-Fangled AI (NFAI). Fast forward to the twenty-first century, and machine learning (ML) has entered the mainstream through the rise of generative AI (GenAI).

This paper posits that GenAI currently occupies the “peak of inflated expectations” and is approaching the “trough of disillusionment” on Gartner’s hype cycle. It will also examine the implications of machine-assisted interfaces beyond conversational UI and their pedagogical consequences for student learning and assessment.

Gartner’s hype cycle.

For context, applications such as ChatGPT are built on top of the transformer architecture and pre-trained on a large corpus of text (Brown et al., 2020). Given an input sequence of n tokens, these systems predict the next token at index n+1. Most implementations of transformers are autoregressive (Croft, 2023), meaning that the model predicts future values (index t) based only on past values (indices < t). However, Keles et al. (2022, p. 4) proved that the computational complexity of self-attention is quadratic in sequence length; therefore, running these systems in production remains a scaling problem (Kaplan et al., 2020).
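To make this concrete, below is a minimal NumPy sketch of masked scaled dot-product self-attention. The learned query, key, and value projections of a real transformer are omitted for brevity (identity projections are assumed); the point is that the n × n score matrix, in which every position attends to every earlier one, is the source of the quadratic cost.

```python
import numpy as np

def self_attention(X):
    """Masked scaled dot-product self-attention over n token embeddings.

    X: (n, d) array. The learned query/key/value projections of a real
    transformer are omitted here (identity projections assumed).
    """
    n, d = X.shape
    # The score matrix is n x n: every token scores against every position,
    # which is the source of the quadratic cost in sequence length.
    scores = X @ X.T / np.sqrt(d)
    # Causal mask: an autoregressive model may only attend to indices <= t,
    # so future positions are set to -inf before the softmax.
    scores[np.triu(np.ones((n, n), dtype=bool), k=1)] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X  # (n, d) contextualised representations

# Doubling n quadruples the memory and compute spent on the score matrix.
out = self_attention(np.random.randn(8, 16))
```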

The current positioning of GenAI at the peak of inflated expectations aligns with Gartner’s (2024) prediction. Three key factors support this assessment: rapid advancement in research, widespread enterprise adoption, and increased public awareness. Ongoing research in GenAI, specifically language models, spans several topics, including mechanistic interpretability (Nanda, 2023), which explores the inner workings of autoregressive models; information retrieval techniques aimed at improving correctness and reducing hallucinations in LLM systems (Béchard & Ayala, 2024; Dhuliawala et al., 2023); and growing interest in multimodal applications of transformers (Xu et al., 2023). Leading research labs, from Anthropic with its interpretability and alignment work (Elhage et al., 2022; Bricken et al., 2023; Templeton et al., 2024) and AI21 with Jamba’s innovative hybrid transformer architecture (Team et al., 2024) to the open-weights models from Meta and Google, continue to redefine the boundaries of what these systems are capable of.
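As an illustration of the retrieval-based techniques cited above, the following is a minimal sketch of retrieval-augmented generation (RAG); search_index and llm_complete are hypothetical stand-ins for a vector store and an LLM completion call, not any specific library’s API.

```python
def answer_with_retrieval(question, search_index, llm_complete, k=3):
    """Answer a question grounded in retrieved passages rather than relying
    solely on the model's (possibly stale) parametric knowledge."""
    passages = search_index.search(question, top_k=k)  # assumed interface
    context = "\n\n".join(p.text for p in passages)    # assumed attribute
    prompt = (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm_complete(prompt)
```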

Enterprise adoption is evident in Salesforce’s XGen models (Nijkamp et al., 2023), Oracle’s collaboration with Cohere, and Microsoft’s Copilot for its 365 product suite. However, widespread implementation does not necessarily equate to immediate, measurable productivity gains. Integrating these systems effectively into enterprise workflows to deliver tangible business value takes time and effort.

Despite the field’s excitement, the current hype and expectations often exceed GenAI’s reliable capabilities, especially for complex use cases. Significant challenges persist, including hallucinations and a lack of factual grounding (Huang et al., 2023, p. 3). We observe such behaviour in ChatGPT, where the knowledge cutoff prevents the system from providing up-to-date information; instead, it may “hallucinate” and provide inaccurate answers (Dwivedi et al., 2023, sec. 4.4.9.1.2).

As the field progresses towards the “trough of disillusionment” on Gartner’s hype cycle, a more realistic assessment of GenAI’s capabilities will likely emerge, paving the way for more effective applications.

Implications of machine-assisted interfaces and their pedagogical consequences for student learning and assessment.

The proliferation of conversational user interfaces (CUIs) rests on a simple heuristic: autoregressive models surface their internal state by generating the next tokens, one at a time.
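A minimal sketch of this generation loop is shown below; logits_fn is a hypothetical stand-in for a trained model that maps a token-id sequence to next-token logits, so the sketch illustrates only the decoding mechanics, not a particular model’s API.

```python
import numpy as np

def generate(logits_fn, prompt_ids, max_new_tokens=20, eos_id=0):
    """Greedy autoregressive decoding: the model's only output channel is
    the next token, conditioned on everything generated so far."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = logits_fn(ids)           # hypothetical trained-model call
        next_id = int(np.argmax(logits))  # pick the most likely next token
        if next_id == eos_id:             # stop at end-of-sequence
            break
        ids.append(next_id)
    return ids
```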

CUIs often prove frustrating when dealing with tasks requiring larger information sets. Additionally, for tasks that require frequent information retrieval (research, academic writing), CUIs are suboptimal because they compel users to hold information in their working memory unnecessarily. Pozdniakov et al. (2024, p. 3) proposed a framework that incorporates both application and interaction design, emphasising manual alignment inputs from end users. This approach, when applied to replace traditional essay assignments, has two major implications for student learning and assessment: a shift in core competencies and collaborative assessment methods.

With machine-assisted interfaces, students will need to develop stronger critical thinking skills to evaluate AI-generated content and to formulate precise instructions. The focus will shift towards the process of reaching desired outcomes and towards improving information retrieval skills. This shift aligns with the potential for machine-assisted proofs to solve novel problems, as discussed by Tao (2024).

These new interfaces will also require instructors to adapt their evaluation methods. Assessment will need to accommodate students’ flexible pacing and their level of engagement with a given topic. This approach encourages a more holistic, cross-disciplinary understanding, better preparing students for continuous learning in a rapidly evolving technological landscape.

Béchard, P., & Ayala, O. M. (2024). Reducing hallucination in structured outputs via Retrieval-Augmented Generation. https://arxiv.org/abs/2404.08189
Bricken, T., Templeton, A., Batson, J., Chen, B., Jermyn, A., Conerly, T., Turner, N., Anil, C., Denison, C., Askell, A., Lasenby, R., Wu, Y., Kravec, S., Schiefer, N., Maxwell, T., Joseph, N., Hatfield-Dodds, Z., Tamkin, A., Nguyen, K., … Olah, C. (2023). Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. Transformer Circuits Thread.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language Models are Few-Shot Learners. https://arxiv.org/abs/2005.14165
Croft, B. (2023). LLM Visualization. https://bbycroft.net/llm
Dhuliawala, S., Komeili, M., Xu, J., Raileanu, R., Li, X., Celikyilmaz, A., & Weston, J. (2023). Chain-of-Verification Reduces Hallucination in Large Language Models. https://arxiv.org/abs/2309.11495
Dwivedi, Y. K., Kshetri, N., Hughes, L., Slade, E. L., Jeyaraj, A., Kar, A. K., Baabdullah, A. M., Koohang, A., Raghavan, V., Ahuja, M., Albanna, H., Albashrawi, M. A., Al-Busaidi, A. S., Balakrishnan, J., Barlette, Y., Basu, S., Bose, I., Brooks, L., Buhalis, D., … Wright, R. (2023). Opinion Paper: “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. International Journal of Information Management, 71, 102642. https://doi.org/10.1016/j.ijinfomgt.2023.102642
Elhage, N., Hume, T., Olsson, C., Schiefer, N., Henighan, T., Kravec, S., Hatfield-Dodds, Z., Lasenby, R., Drain, D., Chen, C., Grosse, R., McCandlish, S., Kaplan, J., Amodei, D., Wattenberg, M., & Olah, C. (2022). Toy Models of Superposition. Transformer Circuits Thread.
Gartner. (2024). Gartner Predicts 40 Percent of Generative AI Solutions Will Be Multimodal By 2027. https://www.gartner.com/en/newsroom/press-releases/2024-09-09-gartner-predicts-40-percent-of-generative-ai-solutions-will-be-multimodal-by-2027
Haugeland, J. (1997). Mind Design II: Philosophy, Psychology, and Artificial Intelligence. The MIT Press. https://doi.org/10.7551/mitpress/4626.001.0001
Hendler, J. (2008). Avoiding Another AI Winter. IEEE Intelligent Systems, 23(2), 2–4. https://doi.org/10.1109/MIS.2008.20
Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., & Liu, T. (2023). A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. https://arxiv.org/abs/2311.05232
Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling Laws for Neural Language Models. https://arxiv.org/abs/2001.08361
Keles, F. D., Wijewardena, P. M., & Hegde, C. (2022). On The Computational Complexity of Self-Attention. https://arxiv.org/abs/2209.04881
Nanda, N. (2023). Concrete Steps to Get Started in Transformer Mechanistic Interpretability. https://www.neelnanda.io/mechanistic-interpretability/getting-started
Nijkamp, E., Xie, T., Hayashi, H., Pang, B., Xia, C., Xing, C., Vig, J., Yavuz, S., Laban, P., Krause, B., Purushwalkam, S., Niu, T., Kryściński, W., Murakhovs’ka, L., Choubey, P. K., Fabbri, A., Liu, Y., Meng, R., Tu, L., … Xiong, C. (2023). XGen-7B Technical Report. https://arxiv.org/abs/2309.03450
Pozdniakov, S., Brazil, J., Abdi, S., Bakharia, A., Sadiq, S., Gasevic, D., Denny, P., & Khosravi, H. (2024). Large Language Models Meet User Interfaces: The Case of Provisioning Feedback. https://arxiv.org/abs/2404.11072
Tao, T. (2024). Machine-Assisted Proofs. https://terrytao.wordpress.com/wp-content/uploads/2024/03/machine-assisted-proof-notices.pdf
Team, J., Lenz, B., Arazi, A., Bergman, A., Manevich, A., Peleg, B., Aviram, B., Almagor, C., Fridman, C., Padnos, D., Gissin, D., Jannai, D., Muhlgay, D., Zimberg, D., Gerber, E. M., Dolev, E., Krakovsky, E., Safahi, E., Schwartz, E., … Shoham, Y. (2024). Jamba-1.5: Hybrid Transformer-Mamba Models at Scale. https://arxiv.org/abs/2408.12570
Templeton, A., Conerly, T., Marcus, J., Lindsey, J., Bricken, T., Chen, B., Pearce, A., Citro, C., Ameisen, E., Jones, A., Cunningham, H., Turner, N. L., McDougall, C., MacDiarmid, M., Freeman, C. D., Sumers, T. R., Rees, E., Batson, J., Jermyn, A., … Henighan, T. (2024). Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. Transformer Circuits Thread. https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html
Turing, A. M. (1950). Computing Machinery and Intelligence. Mind, LIX(236), 433–460. https://doi.org/10.1093/mind/LIX.236.433
Xu, P., Zhu, X., & Clifton, D. A. (2023). Multimodal Learning with Transformers: A Survey. https://arxiv.org/abs/2206.06488