The following is an excerpt from a paper I wrote for my coursework.

Introduction.

Historically, Alan Turing’s seminal work “Computing Machinery and Intelligence” laid the foundation for exploring the possibilities of a thinking machine (Turing, 1950). Subsequently, the development of AI took a symbolic approach: representing the world through systems that utilise high-level symbols and manipulate tokens to arrive at a result, an approach commonly referred to as Good Old-Fashioned AI (GOFAI) (Haugeland, 1997).

While GOFAI showed promise through decision-tree reasoning, its limitations became apparent in the 1980s when the field entered the “AI Winter.” This was likely due to growing cynicism within the AI research community and a reduction in funding, which halted most research and development (Hendler, 2008).

However, with Moore’s Law driving exponential growth in available computing power and data, a new approach to AI arose, focusing on statistical methods and connectionist networks such as artificial neural networks. Haugeland (1997) dubbed this approach New Fangled AI (NFAI). Fast forward to the 21st century: machine learning (ML) has entered the mainstream through the rise of generative AI (GenAI).

This paper posits that GenAI currently occupies the “peak of inflated expectations” and is approaching the “trough of disillusionment” on Gartner’s hype cycle. It will also examine the implications of machine-assisted interfaces beyond conversational UI and their pedagogical consequences for student learning and assessment.

Gartner’s hype cycle.

For context, applications such as ChatGPT are built on the Transformer architecture and pre-trained on a large corpus of text (Brown et al., 2020). Given an input sequence of n tokens, these systems predict the next token at index n+1. Most Transformer implementations are autoregressive (Croft, 2023), meaning that the model predicts future values (indices n+1 onwards) based on past values (indices 0 to n). However, Keles et al. (2022, p. 4) proved that the computational complexity of self-attention is quadratic in the sequence length; therefore, running these systems in production remains a scaling problem (Kaplan et al., 2020).
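To make this concrete, below is a minimal sketch in Python with NumPy; all weights, sizes, and names are illustrative assumptions (a toy with random parameters, not any production model). It shows single-head causal self-attention, whose n × n score matrix is the source of the quadratic cost, and a greedy autoregressive loop that predicts the token at index n+1 from indices 0 to n.

```python
# A toy sketch of causal self-attention and greedy autoregressive decoding.
# All weights are random placeholders; vocabulary and dimensions are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, D = 50, 16                       # assumed toy vocabulary and embedding sizes
E = rng.normal(size=(VOCAB, D))         # token embedding table
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
Wout = rng.normal(size=(D, VOCAB))      # projection back to vocabulary logits

def causal_self_attention(x):
    """x: (n, D) embeddings -> (n, D). Builds an n x n score matrix,
    hence O(n^2) time and memory in the sequence length n."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(D)       # (n, n): the quadratic part
    n = x.shape[0]
    mask = np.tril(np.ones((n, n), dtype=bool))        # causal mask: no peeking ahead
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v

def greedy_decode(tokens, steps):
    """Autoregressive loop: each new token at index n+1 is predicted
    from all previous tokens at indices 0..n."""
    for _ in range(steps):
        h = causal_self_attention(E[tokens])
        logits = h[-1] @ Wout           # logits for the next token
        tokens = tokens + [int(np.argmax(logits))]
    return tokens

print(greedy_decode([1, 2, 3], steps=5))
```

Because the score matrix grows with the square of the sequence length, doubling the context roughly quadruples the attention computation, which is one reason serving long-context systems in production remains costly.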

The current positioning of GenAI at the peak of inflated expectations aligns with Gartner’s (2024) prediction. Three key factors support this assessment: rapid advancement in research, widespread enterprise adoption, and increased public awareness. Ongoing research in GenAI, specifically language models, spans several topics, including mechanistic interpretability (Nanda, 2023), which explores the inner workings of autoregressive models; information retrieval techniques aimed at improving correctness and reducing hallucinations in LLM systems (Béchard & Ayala, 2024; Dhuliawala et al., 2023); and growing interest in multimodal applications of Transformers (Xu et al., 2023). Leading research labs, from Anthropic with its interpretability and alignment work (Bricken et al., 2023; Elhage et al., 2022; Templeton et al., 2024) and AI21 with Jamba’s innovative hybrid Transformer architecture (Team et al., 2024) to the open-weights models from Meta and Google, continue to redefine the boundaries of what these systems are capable of.

Enterprise adoption is evident in Salesforce’s XGen models (Nijkamp et al., 2023), Oracle’s collaboration with Cohere, and Microsoft’s Copilot for its 365 product suite. However, widespread implementation does not necessarily equate to immediate, measurable productivity gains; integrating these systems effectively into enterprise workflows to deliver tangible business value takes time and effort.

Despite the field’s excitement, the current hype and expectations often exceed the technology’s reliable capabilities, especially for complex use cases. Significant challenges persist, including hallucinations and a lack of factual grounding (Huang et al., 2023, p. 3). We observe such behaviour in ChatGPT, where the knowledge cutoff prevents the system from providing up-to-date information; when asked about more recent events, it may “hallucinate” and provide inaccurate answers (Dwivedi et al., 2023, sec. 4.4.9.1.2).

As the field progresses towards the “trough of disillusionment” on Gartner’s hype cycle, a more realistic assessment of GenAI’s capabilities will likely emerge, paving the way for more effective applications.

Implications of machine-assisted interfaces and their pedagogical consequences for student learning and assessment.

The proliferation of conversational user interfaces (CUIs) rests on a simple heuristic: autoregressive models surface their internal state by generating the next tokens.

CUIs often prove frustrating for tasks involving larger information sets. Additionally, for tasks that require frequent information retrieval (research, academic writing), CUIs are suboptimal because they compel users to hold information in working memory unnecessarily. Pozdniakov et al. (2024, p. 3) proposed a framework that incorporates both application and interaction design, emphasising manual alignment inputs from end users. This approach, when applied to replace traditional essay assignments, has two major implications for student learning and assessment: a shift in core competencies and collaborative assessment methods.

With machine-assisted interfaces, students will need to develop stronger critical thinking skills to evaluate AI-generated content and formulate precise instructions. The focus will shift towards the process of reaching desired outcomes and improving information retrieval skills. This shift aligns with the potential for machine-assisted proofs to solve novel problems, as discussed by Tao (2024).

These new interfaces will also require instructors to adapt their evaluation methods. Assessment will need to account for students’ flexible pacing and their depth of engagement with a given topic. This approach encourages a more holistic, cross-disciplinary understanding, better preparing students for continuous learning in a rapidly evolving technological landscape.

References

  • Béchard, P., & Ayala, O. M. (2024). Reducing hallucination in structured outputs via Retrieval-Augmented Generation. arXiv preprint arXiv:2404.08189.
  • Bricken, T., Templeton, A., Batson, J., Chen, B., Jermyn, A., Conerly, T., Turner, N., Anil, C., Denison, C., Askell, A., Lasenby, R., Wu, Y., Kravec, S., Schiefer, N., Maxwell, T., Joseph, N., Hatfield-Dodds, Z., Tamkin, A., Nguyen, K., … Olah, C. (2023). Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. Transformer Circuits Thread.
  • Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language Models are Few-Shot Learners. arXiv preprint arXiv:2005.14165.
  • Croft, B. (2023). LLM Visualization. https://bbycroft.net/llm
  • Dhuliawala, S., Komeili, M., Xu, J., Raileanu, R., Li, X., Celikyilmaz, A., & Weston, J. (2023). Chain-of-Verification Reduces Hallucination in Large Language Models. arXiv preprint arXiv:2309.11495.
  • Dwivedi, Y. K., Kshetri, N., Hughes, L., Slade, E. L., Jeyaraj, A., Kar, A. K., Baabdullah, A. M., Koohang, A., Raghavan, V., Ahuja, M., Albanna, H., Albashrawi, M. A., Al-Busaidi, A. S., Balakrishnan, J., Barlette, Y., Basu, S., Bose, I., Brooks, L., Buhalis, D., … Wright, R. (2023). Opinion Paper: “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. International Journal of Information Management, 71, 102642. https://doi.org/10.1016/j.ijinfomgt.2023.102642
  • Elhage, N., Hume, T., Olsson, C., Schiefer, N., Henighan, T., Kravec, S., Hatfield-Dodds, Z., Lasenby, R., Drain, D., Chen, C., Grosse, R., McCandlish, S., Kaplan, J., Amodei, D., Wattenberg, M., & Olah, C. (2022). Toy Models of Superposition. Transformer Circuits Thread.
  • Gartner. (2024). Gartner Predicts 40 Percent of Generative AI Solutions Will Be Multimodal By 2027. https://www.gartner.com/en/newsroom/press-releases/2024-09-09-gartner-predicts-40-percent-of-generative-ai-solutions-will-be-multimodal-by-2027
  • Haugeland, J. (1997). Mind Design II: Philosophy, Psychology, and Artificial Intelligence. The MIT Press. https://doi.org/10.7551/mitpress/4626.001.0001
  • Hendler, J. (2008). Avoiding Another AI Winter. IEEE Intelligent Systems, 23(2), 2–4. https://doi.org/10.1109/MIS.2008.20
  • Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., & Liu, T. (2023). A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. arXiv preprint arXiv:2311.05232.
  • Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling Laws for Neural Language Models. arXiv preprint arXiv:2001.08361.
  • Keles, F. D., Wijewardena, P. M., & Hegde, C. (2022). On The Computational Complexity of Self-Attention. arXiv preprint arXiv:2209.04881.
  • Nanda, N. (2023). Concrete Steps to Get Started in Transformer Mechanistic Interpretability. https://www.neelnanda.io/mechanistic-interpretability/getting-started
  • Nijkamp, E., Xie, T., Hayashi, H., Pang, B., Xia, C., Xing, C., Vig, J., Yavuz, S., Laban, P., Krause, B., Purushwalkam, S., Niu, T., Kryściński, W., Murakhovs’ka, L., Choubey, P. K., Fabbri, A., Liu, Y., Meng, R., Tu, L., … Xiong, C. (2023). XGen-7B Technical Report. arXiv preprint arXiv:2309.03450.
  • Pozdniakov, S., Brazil, J., Abdi, S., Bakharia, A., Sadiq, S., Gasevic, D., Denny, P., & Khosravi, H. (2024). Large Language Models Meet User Interfaces: The Case of Provisioning Feedback. arXiv preprint arXiv:2404.11072.
  • Tao, T. (2024). Machine-Assisted Proofs. https://terrytao.wordpress.com/wp-content/uploads/2024/03/machine-assisted-proof-notices.pdf
  • Team, J., Lenz, B., Arazi, A., Bergman, A., Manevich, A., Peleg, B., Aviram, B., Almagor, C., Fridman, C., Padnos, D., Gissin, D., Jannai, D., Muhlgay, D., Zimberg, D., Gerber, E. M., Dolev, E., Krakovsky, E., Safahi, E., Schwartz, E., … Shoham, Y. (2024). Jamba-1.5: Hybrid Transformer-Mamba Models at Scale. arXiv preprint arXiv:2408.12570.
  • Templeton, A., Conerly, T., Marcus, J., Lindsey, J., Bricken, T., Chen, B., Pearce, A., Citro, C., Ameisen, E., Jones, A., Cunningham, H., Turner, N. L., McDougall, C., MacDiarmid, M., Freeman, C. D., Sumers, T. R., Rees, E., Batson, J., Jermyn, A., … Henighan, T. (2024). Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. Transformer Circuits Thread.
  • Turing, A. M. (1950). I.—Computing Machinery and Intelligence. Mind, LIX(236), 433–460. https://doi.org/10.1093/mind/LIX.236.433
  • Xu, P., Zhu, X., & Clifton, D. A. (2023). Multimodal Learning with Transformers: A Survey. arXiv preprint arXiv:2206.06488.