ChatGPT-Generated Article Illustrates Limitations, Challenges With Technology
By W. Frazier Pruitt
The advent of artificial intelligence (AI) threatens to revolutionize various industries, bolstering efficiency, scalability and innovative problem-solving. Nevertheless, its impact on job markets continues to fuel rigorous debate as AI’s capabilities progress. Sectors wrestling with this issue include those in which tasks such as programming, writing simple copy, and collecting and analyzing data are key.
Generative AI models such as OpenAI’s GPT series, which also powers Microsoft’s Bing AI, and Google’s competing Bard have shown potential in these tasks. But the question remains: Could these AI models ultimately supplant jobs?
We’re bombarded with headlines such as “Is AI Coming for Your Job? ChatGPT Renews Fears”1 and “AI Could Replace Equivalent of 300 Million Jobs.”2 These concerns, despite the deplorable fearmongering, are not without justification. An arms race to discover and harness generative AI is underway, and there is no shortage of people betting their capital on success.
What is less frequently discussed in headlines is what effect generative AI will have on quality. Is quality susceptible to the AI waves of change?
AI and quality management
In the long run, it’s likely no industry will be immune. Nevertheless, quality management poses unique challenges due to its intrinsic complexity and the novelty of the problems it tackles. The issues encountered in quality are multifaceted, influenced by a variety of factors such as human motivations and organizational constraints.
Consequently, the tasks involved necessitate a delicate balance of soft and hard skills, many of which AI struggles to emulate. Furthermore, the novel problems in quality management (well illustrated when writing about quality) demand unique insights derived from data and experience, a challenge that AI models have yet to fully overcome.
Yet, writing instructions and analyzing data are key aspects of quality management and seem to align with the strengths of generative AI models. These models include ChatGPT, Bard and other algorithms that have been developed by scouring the internet and consuming tremendous amounts of data. This situation again raises the question: Will these AI technologies replace jobs in quality?
This column examines the promise and limitations of AI, particularly generative AI, by testing its capability in the task of writing about quality. The act of writing about quality mimics the complexities of quality management and also some repetitive tasks, such as updating work instructions. It is, therefore, a fair and insightful—though incomplete—proxy to test the effectiveness of current generative AI on quality management.
To test, the resulting manuscript was submitted earlier this year, with the consent of QP’s editors, to QP’s peer review process for feedback. The results were conclusive: The paper produced did not meet QP’s publication standards. A thematic analysis of the reviewers’ comments identified three major categories: redundancy, superficial content and timeliness of the content. The first two had negative sentiment, and the third was positive.
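The thematic analysis described above can be sketched as a simple keyword tally. This is an illustrative reconstruction, not the method actually used: the theme keywords below are hypothetical stand-ins drawn from the reviewer quotes in this column.

```python
from collections import Counter

# Hypothetical theme keywords, loosely based on the reviewer comments
# quoted in this column; a real thematic analysis would code comments by hand.
THEMES = {
    "redundancy": ["repetition", "repeated", "redundant"],
    "superficial": ["not well-developed", "no concrete examples", "in passing"],
    "timeliness": ["hot", "modern", "encourage discussion"],
}

def classify(comment: str) -> list[str]:
    """Return every theme whose keywords appear in the comment."""
    text = comment.lower()
    return [theme for theme, keys in THEMES.items()
            if any(k in text for k in keys)]

def tally(comments: list[str]) -> Counter:
    """Count how many comments touch each theme."""
    counts = Counter()
    for c in comments:
        counts.update(classify(c))
    return counts
```

Running the tally over the three reviewer quotes in this column would surface one comment per theme, mirroring the categories above.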
Redundancy
Reviewers heavily criticized the manuscript for redundancy, noting the repetition of ideas and sentences throughout the text. The most direct comment regarding redundancy read: “I have never read a manuscript with more repetition of the same ideas and even exact statements.”
This is a strong indictment, but it is not unwarranted. Repetition can frustrate readers and detract from the manuscript’s overall quality. Reviewers noted the manuscript seemed to be AI-generated, a conclusion likely drawn from its repetitive nature, but more direct author editing could potentially have addressed this issue.
Reflecting on the process used to write the manuscript, the heavy repetition likely resulted from the constraints of GPT-3 and a minimalist approach to human intervention. The process used in the test involved only minor editing and redirection at each step.
The final essay targeted 2,600 words, far more than the algorithm was naturally inclined to produce, so the author asked the AI to expand the sections multiple times to reach the length goal.
While GPT-3 added some detail, it was clearly not proportionate to the increase in sheer verbiage. More advanced prompt engineering might have improved the results significantly, along with tighter segmentation of topics and elimination of internal summaries and conclusions within the individual responses.
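The expand-until-long-enough approach described above can be sketched as a short loop. The `generate` callable is a placeholder for any text-generation call; no specific API is assumed, and the stub used for illustration simply stands in for a real model.

```python
def expand_to_length(generate, section: str, target_words: int,
                     max_rounds: int = 5) -> str:
    """Repeatedly ask a model to expand a section until it hits a word target.

    `generate(prompt)` is any callable returning a string, e.g., a wrapper
    around a chat-completion API. This mirrors the minimal-intervention
    approach used in the test: re-prompting for length without supplying
    new facts, which is exactly the recipe for redundancy.
    """
    text = section
    for _ in range(max_rounds):
        if len(text.split()) >= target_words:
            break
        prompt = f"Expand the following section with more detail:\n\n{text}"
        text = generate(prompt)
    return text
```

Because nothing new enters the loop except the model’s own prior output, each round tends to restate rather than deepen, consistent with the reviewers’ redundancy complaint.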
The complaint of redundancy is common to generative AI and one that will persist, though it should improve over time. Larger-capacity algorithms will need less stitching together of independent segment responses and, as a result, will be better able to write elegant transitions between topics. But content still depends on the stock information available to draw from. Depth and detail will continue to be a challenge for cutting-edge or novel research.
Superficial content
Source information also is a key driver of the second most common criticism: lack of depth. One reviewer wrote, “Content is not well-developed. Few specific examples of AI’s applications are provided except for being mentioned in passing. No concrete examples/use cases/case studies are provided.”
This, like the first criticism, is undoubtedly true. In treating this as an experiment, however, the conditions of the test are of paramount importance. To reiterate, the author took a minimalist approach to manual intervention. No effort was made to check or add specific research or details to the manuscript. AI brainstormed the content, drafted the structure of the manuscript and wrote the text. AI has been shown to create impressive results in other areas, but here it was dealing with a relatively new and novel topic. By contrast, on a well-documented subject such as Renaissance literature, it can produce seemingly insightful arguments.
Another limiting factor was the training of the generative AI model, GPT-3, which was not trained on data after 2021 and is unable to search the internet. Consequently, it struggled with depth in the niche, new area of research tested.
As expected, given the nature of large language model generative AI, it was likely just combining quality management and AI topics, constructing words in patterns it deemed probable. This is related to the risk of confabulation: AI making up facts. The most illustrative examples occur when the AI is asked for references. It almost always will create plausible-sounding titles, authors and even entire journals that are completely erroneous.
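One practical guard against confabulated references is to accept an AI-supplied citation only after verifying it against a trusted source. The sketch below uses a small in-memory index as a hypothetical stand-in for a real bibliographic lookup; the only titles in it are the two real articles cited in this column.

```python
# Hypothetical trusted index; in practice this would be a query against a
# bibliographic database or library catalog, not a hard-coded set.
KNOWN_TITLES = {
    "is ai coming for your job? chatgpt renews fears",
    "ai could replace equivalent of 300 million jobs",
}

def is_verified(citation_title: str) -> bool:
    """Accept a citation only if its title appears in the trusted index."""
    return citation_title.strip().lower() in KNOWN_TITLES

def filter_citations(titles: list[str]) -> tuple[list[str], list[str]]:
    """Split AI-supplied titles into verified and suspect lists."""
    verified = [t for t in titles if is_verified(t)]
    suspect = [t for t in titles if not is_verified(t)]
    return verified, suspect
```

Anything landing in the suspect list would need human checking before publication, which is the human-in-the-loop step the minimalist test deliberately omitted.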
The key takeaway from this reflection is that the current technology excels, for a computer, at finding context and summarizing. To do that, however, the technology needs sufficient stock information from which to draw. Issues of confabulation are serious but are getting a lot of attention, and improvements likely will be incremental. In applications of AI, risks must be assessed for suitability, balancing the severity of consequences against the current likelihood of errors. Still, the likelihood of an AI creating genuinely new ideas seems a long way off.
Timeliness
The third theme was the timeliness of the topic. Indeed, with everyone talking about AI and large investments being made, such as Microsoft’s deal with OpenAI, it is reasonable and important that the quality field continue to iterate and press on with generative AI and other Industry 4.0 topics such as the Internet of Things, machine learning and big data. Whether the conclusions “drawn” by the AI have any merit must be investigated with real-world examples and testing.
One reviewer may have summarized it best: “AI is a hot, modern subject that will encourage discussion. People could discuss the four applications provided as well as their own ideas.” AI may not have gotten it right, but the discussion is important for the future of quality.
Limitations—for now
The implications of AI on quality management and our jobs paint a complex and evolving picture. While AI has demonstrated impressive capabilities, the experiment of using generative AI models to write about quality indicates that we are far from a time when AI can fully replace human practitioners in this field.
The criticisms of redundancy and superficial content in the AI-generated content highlight the current limitations of AI in this context, specifically generative AI models. They still are not capable of fully understanding and replicating the complexity and novelty inherent in quality management tasks.
Moreover, the inability of current AI models to access recent and relevant data further limits their utility in a field that requires up-to-date knowledge and understanding of unique organizational contexts.
On the other hand, the recognition of the topic’s timeliness underscores the relevance and importance of continuing to explore the implications of AI on quality management. The picture is still incomplete, but, as the reviewer noted, the topic is “hot,” and the value will come from the confluence of many voices.
As investments in AI continue to grow and the technology continues to evolve, it is crucial for the quality field to adapt and integrate these advancements. It also is essential for practitioners to take a proactive role in shaping how AI is used within the field, ensuring that it serves to enhance, rather than replace, the valuable work performed by humans in quality management.
Although the future of AI in quality management remains uncertain, one thing is clear: The conversation is far from over.
REFERENCES
- Max Zahn, “Is AI Coming for Your Job? ChatGPT Renews Fears,” ABC News, Feb. 14, 2023.
- Chris Vallance, “AI Could Replace Equivalent of 300 Million Jobs—Report,” BBC News, March 28, 2023.
This article was originally published in Quality Progress October 2023