A Review of Advanced Prompting Techniques in Large Language Models (LLMs)

Course: MBAR 661: Academic Research Project
Submitted to: Professor Dr. Mohsen Ghodrat
Submitted by: Sundar Neupane
Student ID:
University Canada West
Date: 26/06/2025

Acknowledgement

I want to thank University Canada West from the bottom of my heart for all the help, advice, and resources provided during this study. I particularly acknowledge Professor Dr. Mohsen Ghodrat, who has been very supportive, given me useful comments, and served as a generous mentor; he has greatly enriched my academic life. I also wish to thank my wife, my mother, and my family and friends, whose faith and support have been my greatest strength. Lastly, I would like to express my appreciation to the wider scholarly community, whose research and hard work have guided and encouraged this study.

Table of Contents

Acknowledgement
List of Tables
List of Figures
Abstract
Introduction
Research Aims and Objectives
Research Questions/Hypotheses
Methodology
    PRISMA Flowchart
    Protocol and Selection Criteria
        Inclusion Criteria
        Exclusion Criteria
    Rationale
        Search Strategy and Databases Used
        Keywords and Boolean Combinations Used
    PRISMA Compliance Summary
    Summary of Selected Literature
    Discussion of Findings
Artificial Intelligence
    Examples and Applications of AI
    A Brief History of AI: Key Milestones
    Limitations and Challenges
Machine Learning (ML)
Deep Learning and Neural Networks
    A Brief History of Deep Learning and Neural Networks
    Key Components and How Deep Neural Networks Learn
    Examples of Deep Learning Architectures
    Applications
    Key Findings
Generative AI
    Historical Development
    Core Models and Examples
    Applications Across Domains
    Key Advantages
    Challenges and Risks
Large Language Models (LLMs)
    The Journey of LLMs: A Historical View
    How Large Language Models (LLMs) Work
    Methods to Improve LLMs Response
        1. Fine Tuning
        2. Retrieval Augmented Generation (RAG)
        3. Prompt Engineering
Prompt Engineering
Prompt Engineering Techniques
    1. Persona Technique
    2. Clarity
    3. Task/Instruction Prompting
    4. Example
    5. Zero-Shot Prompting
    6. Few-Shot Prompting
    7. Chain Prompting
    8. Chain of Thought
    9. Tree of Thought
Ethical Implications and Practical Challenges in Advanced Prompting Techniques
Recommendations
Conclusion
References

List of Tables

Table 1: Boolean Search Query Breakdown
Table 2: PRISMA Compliance Summary
Table 3: Summary of Selected Literature
Table 4: Examples of In-Context Learning Types with Descriptions and Prompts
Table 5: Examples of Chain Prompting in Practice
Table 6: Examples of Chain-of-Thought Prompting in Different Reasoning Tasks

List of Figures

Figure 1: PRISMA Flowchart
Figure 2: Digital Neural Network Representing Artificial Intelligence
Figure 3: Portrait of Alan Turing
Figure 4: Arthur Samuel's Checkers Program on the IBM 701
Figure 5: Deep Neural Network
Figure 6: Illustration of LLMs and GANs in AI Technology
Figure 7: Transformer Architecture Diagram
Figure 8: Evolution of Large Language Models (LLMs)
Figure 9: Visual Breakdown of Prompt Engineering Components
Figure 10: Screenshot of a Response Generated by ChatGPT
Figure 11: Screenshot of ChatGPT's Recommendation of the Personal Development Book
Figure 12: Flowchart Explaining Zero-Shot Learning
Figure 13: Few-Shot Prompt for Sentiment Classification
Figure 14: Few-Shot Prompt for Grammar Correction
Figure 15: Illustration of the Tree of Thoughts (ToT) Framework
Abstract

This thesis provides a detailed discussion of sophisticated prompting techniques in large language models (LLMs) for enhancing task performance, contextual understanding, and reasoning. Knowing how to efficiently engineer the outputs of LLMs such as GPT-4 has become increasingly important as these models grow more popular across numerous applications. This paper examines and categorizes these techniques, including zero-shot and few-shot prompting, Chain-of-Thought (CoT) prompting, and more recent methods such as Tree of Thoughts (ToT) prompting and Persona prompting. For each technique, it considers how the technique works, its advantages and disadvantages, when it is most appropriate, and how effective it is. The thesis also discusses how these ideas can be employed in the real world, the issues that arise, and possible future directions. The findings demonstrate that prompt engineering is decisive in extracting the maximum from LLMs, minimizing hallucinations, and ensuring that outputs conform to the intentions of the user. The study adds to the growing body of literature by providing a systematic, evidence-based review that can support further research and use.

Introduction

Large Language Models (LLMs) like GPT-3.5, GPT-4, and other generative AI tools have emerged as a result of the rapid development of artificial intelligence (AI), particularly in natural language processing (NLP). These models are changing whole industries by letting machines understand and write text similar to what people write. But the way users enter instructions or questions into LLMs, that is, how the models are prompted, has a large effect on how well they work. Without prompt engineering, LLMs often give results that are unclear, do not match what the user expects, or are not factually accurate. Even after extensive training on all kinds of textual corpora, LLMs are intrinsically sensitive to the phrasing of the input and may not reliably recognize the user's intent without further guidance.

Prompt engineering is a systematic way of creating, enhancing, and structuring input prompts to make LLMs provide more precise, relevant, and coordinated answers. The method improves collaboration between people and AI systems by giving the model detail, task specificity, and a rational ordering of material. Prompt engineering is crucial for refining model performance, improving reasoning, reducing hallucinations, and ensuring that output corresponds to its intended objectives. This paper examines the significance of prompt engineering as a fundamental method of maximizing the use of LLMs in the real world. The analysis provides an in-depth view of refined prompting mechanisms that can enhance the practicality, efficiency, and precision of LLM products. Information on prompting approaches such as persona-based prompting, chain-of-thought reasoning, and zero-shot and few-shot prompting is abundant. This review aims to demonstrate the most appropriate methods of carrying out prompt engineering and to discuss the advantages and disadvantages of each in real-life scenarios.

Research Aims and Objectives

The key objective of the research is to determine how improved prompting strategies can help Large Language Models (LLMs) be more effective, reliable, and flexible.
This study aims to reflect the theoretical progress of prompt engineering and its practical applications in business, healthcare, education, and beyond. The study has the following specific goals:

- To examine how prompting methods have changed over the years: the review traces how prompting has moved beyond single questions to more sophisticated methods such as zero-shot, few-shot, Chain-of-Thought (CoT), and Tree of Thoughts (ToT) prompting, and how inputs are growing more contextual and structured.
- To assess how well the newest prompting strategies work: the review examines how advanced prompting techniques improve a model's ability to reason, stay accurate, and fit its context. These methods are often used alongside retrieval-augmented generation, reinforcement learning, and instruction tuning, among others.
- To give helpful guidance for real-world situations: the review offers researchers, developers, and practitioners evidence-based advice on designing and using complex prompting strategies that get the most out of LLM outputs for different use cases, while also considering possible problems and ethical issues.

Research Questions/Hypotheses

The purpose of this study is to find out how advanced prompting strategies change the usefulness, reliability, and ethical use of large language models (LLMs) in different areas. The study is based on the following main questions:

Research Question 1: In real-world settings such as customer service, healthcare, and education, which advanced prompting techniques work best to make large language models perform better?

Research Question 2: How do different prompting methods, such as contextual, few-shot, and zero-shot prompting, affect the accuracy, coherence, and reasoning skills of LLM outputs?

Research Question 3: What ethical and practical problems come up when using advanced prompting techniques, especially in sensitive or high-stakes situations?

Methodology

This study uses a systematic literature review method based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses 2020 (PRISMA) framework to ensure that the academic and industry sources used for advanced prompting techniques in Large Language Models (LLMs) are transparent and accurate.

PRISMA Flowchart

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework guides which studies are included in this systematic review and meta-analysis. PRISMA gives a clear and standard way to identify, screen, and assess the eligibility of published literature, ensuring that evidence synthesis is methodologically sound and reproducible. The structured flow diagram makes it easy to track how studies were included or excluded at each step of the review.

Following the PRISMA process, a total of 112 records were found, 110 from database searches and 2 from other sources. After duplicates were removed, the titles and abstracts of 90 unique records were screened. Thirty studies were then removed because they were not related to large language models (LLMs) or prompt engineering, were blog posts or opinion pieces, or were incomplete or hard to find. After that, 60 full-text articles were assessed for eligibility. Ten were excluded because they lacked methodological detail, their prompting focus was not relevant, or their content was too similar to other articles.
In the end, the systematic review included 74 studies, and 40 of them were of sufficient quality and relevance to be included in the meta-analysis.

Figure 1: PRISMA Flowchart

Protocol and Selection Criteria

This study drew on 74 sources, including peer-reviewed journal articles, arXiv preprints, conference papers, and technical reports published between 2017 and 2025. The review followed a set protocol from the PRISMA 2020 guidelines, which ensured that screening, eligibility evaluation, and thematic classification were all transparent. Sources were selected for their relevance to advanced prompting strategies, to improving the performance of large language models (LLMs), and to foundational AI/ML/DL developments. The variety of sources shows that prompt engineering is a field that crosses into many others, such as healthcare, education, and business AI systems.

Inclusion Criteria

The following standards were used to pick sources that are both useful and high-quality:
1. Peer-reviewed articles, preprints, technical reports, and industry publications published between 2017 and 2025.
2. A focus on prompting strategies, prompt engineering, or improving performance in LLMs.
3. Papers that discuss prompting methods, compare different techniques, or address the ethical issues that arise.
4. Sources in English.

Exclusion Criteria

The following criteria were used to eliminate sources that were not relevant or were of poor quality:
1. Blogs or opinion pieces without technical substance.
2. Research that does not focus on LLMs or NLP-driven prompts.
3. Articles that do not explain their methods well enough.

Rationale

LLM research is mostly published in peer-reviewed journals and in official reports from AI research labs. Open-access research from official sources (such as OpenAI, Meta, Anthropic, and Hugging Face) served as the main sources because it was methodologically clear and useful in practice.

Search Strategy and Databases Used

A systematic search strategy was used to find studies relevant to prompting and large language models. The search covered several databases and platforms, including:
1. Google Scholar,
2. arXiv.org,
3. ScienceDirect,
4. SpringerLink,
5. OpenAI Blog,
6. Meta AI and Hugging Face repositories.

Keywords and Boolean Combinations Used

Keywords were chosen based on how they are commonly used in academic and professional settings, and Boolean operators were used to combine them so that both registers were captured. The search also covered open-source implementations and publications to ensure real-world applicability.

Table 1: Boolean Search Query Breakdown

Component           | Search Terms
Prompting Technique | "prompt engineering" OR "prompt design"
Model Type          | "LLM" OR "large language model"
Prompting Variants  | "few-shot" OR "zero-shot" OR "contextual prompting" OR "chain of thought"
Performance Metrics | "performance" OR "accuracy" OR "alignment" OR "hallucination"
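For illustration, the four components in Table 1 combine with AND into a single query of the following form (an assembled example consistent with the table, not a verbatim record of every search string used):

```
("prompt engineering" OR "prompt design")
AND ("LLM" OR "large language model")
AND ("few-shot" OR "zero-shot" OR "contextual prompting" OR "chain of thought")
AND ("performance" OR "accuracy" OR "alignment" OR "hallucination")
```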
PRISMA Compliance Summary

Table 2: PRISMA Compliance Summary

PRISMA Element               | Compliance Status
Eligibility Criteria Defined | Yes
Search Strategy Transparent  | Yes
Duplicate Removal Performed  | Yes
Records Screened & Filtered  | Yes
Full-Text Assessed           | Yes
Included Studies Counted     | Yes

Summary of Selected Literature

Table 3: Summary of Selected Literature

Category | Number of Papers Cited | Key Contribution
Artificial Intelligence (AI) | 5 | Foundations of AI, Turing test, evolution of intelligent agents
Machine Learning (ML) | 3 | ML algorithms, probabilistic learning, auto-encoding, supervised/unsupervised learning
Deep Learning (DL) | 5 | DNNs, backpropagation, CNN/RNN, variational autoencoders
Generative AI & LLMs Overview | 9 | Evolution of LLMs, generative architectures, GPT models
Zero-shot Prompting | 4 | Prompting without examples; effective with structured instruction
Few-shot Prompting | 4 | In-context learning with examples; generalization without fine-tuning
Chain-of-Thought (CoT) | 6 | Step-by-step logical reasoning; improves answer consistency
Tree-of-Thought (ToT) | 2 | Recursive reasoning with tree-based decision nodes
Prompt Chaining / AI Chains | 2 | Modular workflows; sequential sub-tasks in LLM prompts
Persona Prompting | 2 | Adds tone and domain-awareness to output
Task/Instruction Prompting | 2 | Direct role or task prompting; improves intent alignment
Clarity-based Prompting | 2 | Focuses on rephrasing, simplification, instruction clarity
Example-based Prompting | 2 | Demonstration-based guidance for structured tasks
Self-consistency Prompting | 1 | Sampling multiple reasoning paths to select a stable answer
Prompt Optimization | 2 | Refining inputs to improve consistency and reduce errors
Multimodal Prompting (Vision + Text) | 2 | Vision-language prompts using models like BLIP-2
Retrieval-Augmented Generation (RAG) | 4 | Augments responses with factual external documents
Fine-Tuning (General) | 3 | Domain-specific model tuning and performance improvement
Parameter-Efficient Fine-Tuning (PEFT) | 2 | Low-resource alternatives to full fine-tuning
Instruction Tuning | 1 | Aligns model output with natural-language task guidance
Ethics and Risks in Prompting | 2 | Risk of hallucination, prompt manipulation, persona misuse

Discussion of Findings

The systematic review of 74 chosen studies shows how the field of prompt engineering is changing in relation to Large Language Models (LLMs). Techniques like zero-shot, few-shot, chain-of-thought (CoT), and persona prompting have made LLMs work much better across areas, especially in reasoning, task alignment, and content generation. There is empirical evidence from several studies (e.g., Brown et al., 2020; Wei et al., 2022) that well-structured prompts not only improve the accuracy of responses but also lower the uncertainty and hallucinations of models. However, even though the practical improvements are impressive, ethical issues came up repeatedly. For example, persona prompting was found to improve tone control and domain specificity, but it also made it easier for people to misrepresent themselves, especially when pretending to be experts like doctors or engineers (Kim et al., 2024; Olea et al., 2024).
Also, the fact that the output changes with small changes in prompt wording shows that LLM behaviour is not transparent or easy to understand (Ceurstemont, 2025). This output sensitivity creates difficulties in high-stakes areas like healthcare and finance, where accuracy and accountability are very important. The results also show that prompt engineering is not a stand-alone strategy but is becoming more and more integrated with complementary strategies such as Retrieval-Augmented Generation (RAG), instruction tuning, and parameter-efficient fine-tuning. These approaches seem to work best when used together to slow the rate at which knowledge becomes obsolete, reduce hallucinations, and improve factual accuracy, especially in environments that are constantly changing or lack data (Lewis et al., 2020; Hu et al., 2021).

Artificial Intelligence

Artificial Intelligence (AI) is a field of computer science that looks at how to make machines that can do things that normally require human intelligence. These tasks include thinking, learning, solving problems, perceiving, and understanding language. Researchers often define intelligence as an agent's ability to reach goals in a variety of settings (Legg & Hutter, 2007).

Figure 2: Digital Neural Network Representing Artificial Intelligence. Photo by A. Smith on Unsplash (https://images.unsplash.com/photo-1504384308090-c894fdcc538d). Used under the Unsplash license.

AI uses different methods, such as machine learning, natural language processing, and robotics, to mimic these human cognitive functions. Machine learning (ML) is a large part of AI that helps algorithms learn to find patterns and make decisions on their own using data (Kühl et al., 2020).

Examples and Applications of AI

AI has developed quickly and is now useful in many ways. AI systems have proven that they can make hard choices when they play games: they beat the human world chess champion in 1997 and a human Go champion in 2016 (Gupta, 2023). GPT-4 and other advanced systems are very good at understanding language and reasoning, which makes them well suited for chatbots, translation services, and content-generation tools (Bubeck et al., 2023). AI has also come a long way in computer vision, which helps machines understand visual information for tasks like recognizing faces, driving autonomously, and medical imaging (Gupta, 2023). AI helps doctors by giving them automated diagnostic tools, such as CT-scan analysis that can detect strokes, and it also improves decision support systems (Bubeck et al., 2023). AI is what lets self-driving cars read sensor data, make real-time driving decisions, and get through heavy traffic (Gupta, 2023). AI has also changed consumer products: it runs smart assistants like the Amazon Echo, suggests movies and TV shows on streaming services based on user preferences, and powers virtual customer service agents (Gupta, 2023). These examples show that AI systems are becoming more common in both technical and everyday life, making things work better and faster.

A Brief History of AI: Key Milestones

The concept of AI has been around since the 1940s. The "Turing Test" for machine intelligence was first described in Alan Turing's important 1950 paper (Gupta, 2023). AI has evolved over the years from systems that used symbolic logic to machine learning methods that use data.
Figure 3: Portrait of Alan Turing (https://www.nationalgeographic.com/science/article/alan-turing-test-artificial-intelligence-life-history).

AI has progressed through a number of important stages. Basic work in logic, algorithms, and symbolic AI from the 1950s to the 1970s laid the groundwork for further progress. IBM's Deep Blue beat world chess champion Garry Kasparov in 1997, a big step forward for AI in strategic decision-making. In 2016, DeepMind's AlphaGo surprised everyone by beating a world champion at the game of Go, something many people thought was too hard for machines because of the enormous number of possible moves. In the 2020s, large language models (LLMs) like GPT-4 became popular very quickly; they can reason, converse, and solve problems across many fields (Bubeck et al., 2023). Over time, AI has grown into a field that includes ideas and methods from computer science, neuroscience, statistics, and linguistics (Yu & Kumbier, 2017).

Limitations and Challenges

AI has come a long way, but it still has big problems to work on. One major problem is hallucination, which happens when AI systems give out false or misleading information. Current models also lack long-term memory, so they cannot remember and build on what they have done in the past. Planning remains a challenge because AI struggles with tasks that require multiple steps, foresight, or strategic reasoning. There are also ongoing concerns about bias and transparency, because AI systems often reflect the biases present in their training data and usually have trouble explaining how they make decisions (Bubeck et al., 2023). These limitations show that modern AI systems, even the most advanced large language models, are still far from Artificial General Intelligence (AGI), the ability to do any intellectual task that a human can do.

Machine Learning (ML)

The field of machine learning (ML) is concerned with creating algorithms and statistical models that allow computers to carry out certain tasks without being explicitly programmed to do so (Mahesh, 2020). It lets computers improve at what they do by using experience and learning from data. Goli and Singh (2024) define ML as a field that lets systems learn on their own, which shows how important it is in modern intelligent applications.

Machine learning is part of the larger story of how people have always tried to make difficult tasks easier by making and using tools and machines (Goli & Singh, 2024). Alan Turing is a very important figure on this journey; he is often called the father of modern computer science. In 1936, Turing conceived the "Turing Machine," a theoretical model that clarified the rules of computation and set the stage for algorithmic processing, a key foundation of machine learning (National Institute of Standards and Technology [NIST], n.d.). In 1950, he proposed the famous "Turing Test," a philosophical and technical way to judge how well a machine can act like a person; this standard is still referenced in AI today (New Scientist, n.d.). Arthur Samuel and other pioneers built on these ideas by creating early self-learning systems, such as a checkers-playing program in the 1950s. Samuel's work popularized the idea that machines can learn from experience, and he is credited with defining machine learning as the ability of computers to learn without being explicitly programmed (Goli & Singh, 2024).
These early advances made it possible for today's powerful machine learning systems, like large language models, to exist. These systems are still changing quickly, affecting industries and how people use computers.

Figure 4: Arthur Samuel's Checkers Program on the IBM 701, demonstrated on live television in 1956. This early example of machine learning showcased the potential of computers to learn from experience, a concept central to the development of artificial intelligence. Source: Press, G. (2021, May 28). On thinking machines, machine learning, and how AI took over statistics. Forbes. https://www.forbes.com/sites/gilpress/2021/05/28/on-thinkingmachines-machine-learning-and-how-ai-took-over-statistics/

Deep Learning and Neural Networks

Deep learning (DL) is a powerful branch of machine learning (ML) that uses artificial neural networks (ANNs) as its main computational device to learn hierarchical, abstract representations of data on its own (LeCun et al., 2015). By stacking several layers of neurons on top of each other, deep learning models can automatically learn increasingly complex features from raw data without any manual feature engineering (Goodfellow et al., 2016). These multilayered networks, called deep neural networks (DNNs), can perform a wide range of tasks.

Figure 5: Deep Neural Network. Adapted from Bre, F., Gimenez, J., & Fachinotti, V. (2017). Prediction of wind pressure coefficients on building surfaces using Artificial Neural Networks. Energy and Buildings, 158, 1429–1441. https://doi.org/10.1016/j.enbuild.2017.11.045

A Brief History of Deep Learning and Neural Networks

The main idea behind deep learning comes from early attempts to imitate biological nervous systems. McCulloch and Pitts created the MCP model in 1943, the first mathematical model of a neuron and the starting point for neural networks. Rosenblatt built on this idea in 1958 by creating the perceptron, a two-layer neural network that could perform simple classification tasks, applying the MCP neuron to machine learning for the first time. Minsky and Papert (1969) later pointed out the limits of single-layer perceptrons, especially their inability to solve linearly inseparable problems; interest in artificial neural networks (ANNs) declined in the 1970s after this finding.

When Geoffrey Hinton and his team popularized the backpropagation algorithm in 1986, they made a big leap forward: multilayer perceptrons could now learn complex, nonlinear mappings, which brought neural networks back into the spotlight (Rumelhart et al., 1986). Hinton and his team took another major step in 2006 with a method that began with unsupervised pre-training and ended with supervised fine-tuning; many consider this the beginning of modern deep learning, and it addressed the vanishing gradient problem (Hinton et al., 2006). The field has changed greatly since 2012, driven among other things by large labeled datasets such as ImageNet, parallel-computing hardware such as GPUs, and improved network designs. These changes have made deep learning very popular in both academia and industry (Krizhevsky et al., 2012; LeCun et al., 2015).

Key Components and How Deep Neural Networks Learn

Artificial neural networks are made up of neurons: linked units that process information. These neurons are organized into three kinds of layers: input, hidden, and output (Goodfellow et al., 2016).

Neurons and Layers: Each neuron computes an affine combination of its inputs plus a bias and then applies a nonlinear activation (e.g., ReLU, sigmoid, or tanh), which lets the network detect complex patterns (Nwankpa et al., 2018).

Weights and Biases: These are the trainable parameters the network adjusts during learning to make better predictions (Goodfellow et al., 2016).

Learning Process: The network repeatedly performs forward passes (computing outputs) and weight updates via backpropagation and gradient descent to reduce a loss, which is how it improves over time (Rumelhart et al., 1986).
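To make this learning loop concrete, the following toy sketch shows a forward pass, a squared-error loss, and one backpropagation and gradient-descent update for a two-layer network. It is purely illustrative (random data, arbitrary layer sizes), not code from any of the reviewed studies:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))              # 4 samples, 3 input features
y = rng.normal(size=(4, 1))              # regression targets

W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)   # layer 1 weights and biases
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)   # layer 2 weights and biases

relu = lambda z: np.maximum(z, 0)        # nonlinear activation

# Forward pass: affine combination plus bias, activation, then output.
h = relu(X @ W1 + b1)
pred = h @ W2 + b2
loss = ((pred - y) ** 2).mean()

# Backpropagation: apply the chain rule from the loss to each parameter.
d_pred = 2 * (pred - y) / len(y)
dW2 = h.T @ d_pred
d_h = (d_pred @ W2.T) * (h > 0)          # gradient through the ReLU
dW1 = X.T @ d_h

# Gradient descent: one small step that reduces the loss.
lr = 0.01
W2 -= lr * dW2; b2 -= lr * d_pred.sum(axis=0)
W1 -= lr * dW1; b1 -= lr * d_h.sum(axis=0)
```

Repeating these forward and backward passes over many batches is, in miniature, how deep networks improve over time.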
Examples of Deep Learning Architectures

Researchers have constructed different types of neural network architecture because tasks and data types are not always identical. Multi-layer perceptrons (MLPs) are the simplest feed-forward networks; they consist of a series of fully connected layers that can model complicated nonlinear functions (Goodfellow et al., 2016). Convolutional neural networks (CNNs) are useful for grid-like data such as images; their convolution and pooling layers learn spatial structure on their own. Popular CNN architectures such as LeNet, AlexNet, VGG, and ResNet have had a significant impact on computer vision applications (Krizhevsky et al., 2012; He et al., 2016).

Recurrent neural networks (RNNs), in contrast, are specifically structured to operate sequentially in order to capture time dependencies. Refinements such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs) have made learning long-range dependencies easier (Hochreiter & Schmidhuber, 1997). Another big step forward is the generative adversarial network (GAN): a generator and a discriminator trained in opposition until the generator produces synthetic data realistic enough to be quite deceiving (Goodfellow et al., 2014). Transformers, published in 2017, revolutionized sequence modelling by incorporating self-attention, which allows inputs to be processed concurrently. This has transformed how natural language processing (NLP) is carried out and is why frameworks such as BERT and GPT now work (Vaswani et al., 2017).

These main architectures are not the only ones improving. Autoencoders learn representations without supervision, Graph Neural Networks (GNNs) handle data in graph form, and newer models such as Capsule Networks and Neural Ordinary Differential Equations (Neural ODEs) show that the field remains dynamic.

Applications

Deep learning has transformed many areas by allowing models to learn complicated patterns in data:

Computer Vision: CNNs make image classification, object detection, segmentation, and scene understanding far easier (LeCun et al., 2015).

Natural Language Processing: RNNs and transformers help machines translate languages, understand sentiment, answer questions, and write text (Vaswani et al., 2017).

Healthcare: Medical imaging and analytics, as well as genomic analysis and drug discovery, become more precise and automated thanks to deep learning (Esteva et al., 2017).

Finance: It is applied to fraud detection, algorithmic trading, risk estimation, and forecasting (Feng et al., 2021).
Robotics and Autonomous Systems: Deep learning enables drones and driverless cars to perceive, plan, and act autonomously (Kiran et al., 2021).

Key Findings

Scholars have found that deep learning can automatically extract features from raw data, which greatly reduces the need for manual feature engineering (LeCun et al., 2015). Deep learning also scales well with larger datasets and higher-capacity models; such models tend to be more precise and better at generalizing to new circumstances than older machine learning algorithms (Goodfellow et al., 2016). Deep learning models are often described as black boxes because it is difficult to understand how they work internally, yet they consistently achieve the best performance on many different jobs. This capability has led to new ideas in many fields, including healthcare and finance (LeCun et al., 2015).

Generative AI

Generative AI (GenAI) is a kind of AI that uses machine learning to create new data outputs such as text, pictures, code, audio, video, and simulations. GenAI is not the same as conventional AI because it is made to produce new, useful, and often human-like content (Saini & Sharma, 2024); traditional AI is mostly used to analyze or sort data that already exists. GenAI is a subset of the larger AI field that learns from data and uses that knowledge to create new things (Tiwari & Patel, 2024).

Historical Development

In the 1960s, Joseph Weizenbaum ran early experiments with rule-based chatbots like ELIZA; these were the first steps toward generative AI (Kumar & Sharma, 2024). Generative Adversarial Networks (GANs), created by Goodfellow and others in 2014, were a major step forward because they made highly realistic synthetic content possible. The release of ChatGPT in late 2022 pushed GenAI to the cutting edge, powered by Large Language Models (LLMs) like GPT-3, LLaMA, and ChatGPT (Liu et al., 2024).

Figure 6: Illustration of LLMs and GANs in AI technology. Adapted from "LLMs and GANs: The AI technologies that will create our new reality (Part 4)," by S. Mali, 2023, Medium. https://medium.com/@surabhimali/llms-and-gans-the-ai-technologies-that-will-create-ournew-reality-part-4-aiseries-2d3b2c757b0b

Core Models and Examples

Generative AI is built on a few important model architectures. Generative Adversarial Networks (GANs) are very good at making realistic pictures and videos; a generator and a discriminator are constantly at odds with each other (Liu et al., 2024). Variational Autoencoders (VAEs) are another well-known way to generate data using variational inference; they approximate probability distributions (Kingma & Welling, 2016). The Generative Pre-trained Transformer (GPT) series and other transformer-based models have become the backbone of modern large language models in recent years. These models use self-attention to write text that is readable and rich in context, enabling many generative tasks in natural language processing (Radford et al., 2019; Vaswani et al., 2017). Well-known examples include ChatGPT for conversation, DALL-E for image generation, Codex for code generation, and WaveNet for voice synthesis.

Applications Across Domains

Generative AI is changing many fields as new uses are found for it.
In healthcare, it is used to find new drugs, generate medical images, and create care plans tailored to each patient (Saini & Sharma, 2024). In finance, generative AI writes reports automatically, synthesizes large amounts of data, and runs smart chatbots that help customers (Tiwari & Patel, 2024). The media and entertainment industry uses generative AI to make games, music, videos, and scripts (IJTSRD72647, 2024). In marketing, generative models analyze customer data to produce advertisements and strategies that are more relevant to each customer (Liu et al., 2024). Scientific work has also changed substantially because of generative AI: it has helped scientists generate new hypotheses, run molecular simulations, and create synthetic data for situations that are not well represented (arXiv:2403.04190). Schools and software developers are also gaining AI-powered tutors and programming assistants, whose usefulness will depend on what students need and how skilled they are at using them (IJTSRD72647, 2024).

Key Advantages

The major advantages of GenAI include:

Speed and Scalability: GenAI can produce content much faster and at a much larger scale than humans can (Saini & Sharma, 2024).

Data Augmentation: Synthetic datasets make it possible to train in settings where real data is scarce (arXiv:2001.06937).

Customization and Personalization: GenAI gives each user results and suggestions that are unique to them (Liu et al., 2024).

Challenges and Risks

GenAI has a lot of potential, but it also raises many concerns:

Bias and Hallucination: Outputs can amplify biases in the training data or fabricate false information (Kumar & Sharma, 2024).

Security Risks: GenAI can be used for harmful purposes such as deepfakes, phishing scams, and spreading false information (arXiv:2403.04190).

Legal Risks: Some legal and moral issues remain unresolved, such as copyright, the absence of human authorship, and the ethics of data collection (ResearchGate, 2024).

Environmental Impact: Training large models uses a great deal of computing power and energy (Tiwari & Patel, 2024).

Job Loss: Automation could put jobs in creative and technical fields at risk (Saini & Sharma, 2024).

Large Language Models (LLMs)

Large Language Models (LLMs) are advanced AI systems that can understand, analyze, and create human language. They are like big digital brains trained on vast amounts of text data, which lets them write text that sounds much like what a person would write (Patil & Gudivada, 2024). LLMs are built on the Transformer architecture, a kind of neural network that is very good at working with text and other sequential data. This design lets LLMs predict the next word in a sequence, which helps them understand and produce language that makes sense and fits the situation; this is a central part of their training (Patil & Gudivada, 2024).

Figure 7: Transformer Architecture Diagram. Adapted from Brownlee, J. (2023, October 2). The Transformer Model. Machine Learning Mastery. https://machinelearningmastery.com/the-transformer-model/

Self-supervised learning is how LLMs learn from data that does not have labels. This helps them discover difficult language structures without needing large sets of pre-annotated examples (Patil & Gudivada, 2024). They can apply this skill to many different NLP tasks, such as translation, summarization, and sentiment analysis.
They can understand and write natural language (NLU and NLG), which makes them useful for both work and study. One thing that makes LLMs different is their size: these models often have billions, and sometimes trillions, of parameters (Patil & Gudivada, 2024). This scale not only improves language competence but also gives rise to new abilities, such as reasoning, planning, and in-context learning, in which the model learns how to solve new problems by looking at a few examples given to it (Patil & Gudivada, 2024). GPT-3.5 and GPT-4 from OpenAI are two well-known LLMs that work well across many languages and tasks, showing that AI is moving away from narrow uses and toward systems that can do many different things (Goli & Singh, 2024). These capabilities matter greatly to business owners and decision-makers, especially for improving customer experience, automating tasks, managing knowledge, and going digital.

The Journey of LLMs: A Historical View

The emergence of LLMs is only one aspect of the broader shift in business technology and artificial intelligence. Initially, the most popular approaches in NLP were statistical and rule-based. These systems' syntactic and semantic rules were difficult to establish and lacked flexibility and breadth. NLP was greatly improved by deep learning, particularly the Transformer model released in 2017 (Patil & Gudivada, 2024).

Pre-trained language models (PLMs) were a significant step. Two of the earliest PLMs that let later work build on prior training were BERT and T5. This meant that models trained on general language tasks could be adapted to more specialized tasks, such as answering inquiries or determining an individual's emotional state (Patil & Gudivada, 2024). These basic models led to the LLMs, which were all about making things bigger and more general. The path took a big turn with GPT-3: all a user had to do was give GPT-3 a prompt with instructions or examples, and it could do things it had never been trained for. This was different from earlier models that had to be adjusted for each job; researchers call this "zero-shot" or "few-shot" learning. After this change, prompt engineering came to be seen as the primary way for users and models to communicate, which means AI features can be used by technical and non-technical people alike.

Figure 8: Evolution of Large Language Models (LLMs). Adapted from Huang, L. (2024, April 22). Large Language Model — History. Medium. https://medium.com/@linghuang_76674/llm-history-5db2c9e236f5

Early LLMs also demonstrated some issues. Because training data is incomplete or does not align with user preferences, they may produce inaccurate, biased, or harmful content, so improving alignment techniques was essential. To help models learn to give safe and sensible responses, researchers added reinforcement learning from human feedback (RLHF) and instruction tuning. These techniques play a significant role in the ethical use of AI by companies and in the reduction of hallucinations (Naveed et al., 2024). The idea of model editing has become a trend in recent years because it allows a model's behaviour to be changed without rebuilding everything. As stated by Yao et al. (2023), firms can adjust or improve model outputs at any point in their business processes, making them more responsive and flexible. LLMs can now be used by everyone thanks to open-source models, which enhances openness, academic research, and individualized corporate solutions (Naveed et al., 2024). Over the course of a few years, medium-sized LLMs have become commonplace, which indicates how much they have transformed machine learning and digital approaches in general (Kaddour et al., 2023). With the rise of LLMs, new objectives are fast becoming priorities for business and cross-border operations: deeper understanding, continual learning, resilience, and the ability to work with data in different languages and formats (Patil & Gudivada, 2024).

How Large Language Models (LLMs) Work

Three big concepts underlie Large Language Models (LLMs) such as GPT, BERT, and T5: tokenization, next-token prediction, and training on massive web-scale datasets.

1. Tokenization: Breaking Text into Tokens

LLMs operate on tokens, which are short pieces of text. Such tokens are often word fragments generated by subword algorithms such as SentencePiece, WordPiece, or Byte Pair Encoding (BPE). This approach helps models cope with new words, rare words, and different languages (Devlin et al., 2018; Radford et al., 2019; Understanding the Latest Advances in AI, n.d.; Touvron et al., 2024).

Example:
Input: Transformers are powerful
Tokenized (WordPiece or BPE): ["Trans", "##form", "##ers", "are", "powerful"]
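As a concrete (and purely illustrative) companion to this example, a WordPiece tokenizer can be inspected with the Hugging Face transformers library; the model name below is an assumption chosen only for demonstration:

```python
# pip install transformers
from transformers import AutoTokenizer

# Assumption: bert-base-uncased serves here as an example WordPiece
# tokenizer; any subword tokenizer behaves analogously.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

print(tokenizer.tokenize("Transformers are powerful"))
# Common words usually map to single tokens, while rare or novel words
# split into pieces prefixed with "##"; the exact split depends on the
# model's learned vocabulary.
print(tokenizer.tokenize("untokenizable"))
```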
2. Next-Token Prediction: The Core Learning Task

LLMs are trained to predict the next token in a sequence from the tokens that precede it, an approach known as autoregressive modelling. The model assigns probabilities to the tokens that could come next and picks among the most likely ones. Methods that make the output more random and more interesting include temperature scaling, top-k sampling, and nucleus sampling (top-p) (Brown et al., 2020; Touvron et al., 2024). For example, a temperature of 0 means always choosing the most likely token (deterministic), while a temperature of 0.8 makes the text more random, which makes it more human-like and varied (Understanding the Latest Advances in AI, n.d.).

Example:
Input so far: Transformers are
Model prediction probabilities: "powerful" → 0.70, "great" → 0.20, "fun" → 0.10
Selected output: depends on the sampling strategy.
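The effect of these sampling strategies can be sketched in a few lines. The function below is an illustrative sketch, not code from the reviewed papers; it applies temperature scaling and top-k filtering to a toy distribution matching the example above:

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, top_k=50, rng=None):
    """Pick the next token via temperature + top-k sampling (a sketch)."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float)
    if temperature <= 0:              # temperature 0: greedy (deterministic)
        return int(np.argmax(logits))
    logits = logits / temperature     # higher T flattens the distribution
    if top_k < len(logits):           # drop everything but the top_k tokens
        logits[logits < np.sort(logits)[-top_k]] = -np.inf
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

vocab = ["powerful", "great", "fun"]  # toy vocabulary from the example
logits = np.log([0.70, 0.20, 0.10])
print(vocab[sample_next_token(logits, temperature=0.8, top_k=3)])
```

At temperature 0 this always returns "powerful"; at higher temperatures "great" and "fun" appear more often, which is what makes sampled text more varied.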
3. Training on Web-Scale Data

Reading a large amount of text from the internet is how LLMs learn to understand language. Some of the most common datasets are Wikipedia, BookCorpus, Common Crawl (a huge web scrape), WikiText-103, and open-source code repositories like GitHub. These corpora contain trillions of tokens, which helps models learn grammar, facts, reasoning, and even how to code (Gao et al., 2020; Raffel et al., 2020; Touvron et al., 2024). Data quality, however, is very important: noisy or duplicated data can make the model work less well, so filtering, deduplication, and curation are important steps for effective training (Shoeybi et al., 2019; Touvron et al., 2024).

Example:
Raw text from dataset: "Transformers are powerful models used in natural language processing."
Tokenized: ["Transform", "##ers", "are", "powerful", "models", "used", "in", "natural", "language", "processing", "."]
The model learns: "Transformers are powerful" → "models"; "language" → "processing"

Methods to Improve LLMs Response

1. Fine Tuning

Fine-tuning is an important way to adapt pre-trained Large Language Models (LLMs) to specific tasks or areas by changing their internal weights with task-specific datasets. This process helps LLMs learn the depth of a specific field that general-purpose pre-trained models often miss (Chung et al., 2022). Fine-tuning is not the same as prompt engineering or other methods that change the input without changing the model's parameters. The primary objective is improving performance metrics, such as accuracy or F1 score, on a validation set comparable to the task at hand.

Use Cases and Applications

Fine-tuning has produced significant real-world improvements:

Product Attribute Extraction: After fine-tuning with only 200 labelled samples, the accuracy of attribute extraction (such as product titles and prices) rose from 70% to 88%; returns begin to decline after 6,500 samples (Zhang et al., 2022).

Natural Language Processing in Biomedicine: In domains such as biomedicine, where labelled data is scarce, fine-tuned LLMs have enabled tasks like identifying related clinical texts and answering biomedical questions by equipping the model with field-specific vocabulary and formats (Gu et al., 2021).

General NLP Benchmarks: NLU and NLG tasks such as WikiSQL (generating SQL queries), MultiNLI (natural language inference), and SAMSum (dialogue summarization) have benefited greatly from fine-tuning large models like GPT-3 (175B) (Wei et al., 2022).

Comparison with Prompt Engineering

Prompt engineering rewords inputs so that LLMs give the right outputs, using prompts such as zero-shot, few-shot, and chain-of-thought (CoT). This is convenient, but it is not perfect; for example, CoT reasoning can hurt Visual Question Answering (VQA) tasks when the two types of reasoning do not match (Li et al., 2023). Instruction tuning, which fine-tunes models on natural language task descriptions, has been shown to outperform zero-shot and few-shot prompting: FLAN, a 137B model optimized on instructions, outperformed GPT-3 (175B) on 20 of 25 benchmarks, such as ANLI, RTE, and BoolQ (Chung et al., 2022).

Parameter-Efficient Fine-Tuning (PEFT) Methods

Because fully fine-tuning large models can be expensive, PEFT techniques are helpful:

LoRA (Low-Rank Adaptation): While keeping the pre-trained weights frozen, it adds small trainable matrices to the attention layers. This leads to a roughly three-fold reduction in GPU memory usage and up to a 10,000-fold reduction in trainable parameters without increasing inference latency (Hu et al., 2021).

Adapter Layers: These small modules sit between Transformer layers and are swapped out for each job. Despite their usefulness, adapters slow down inference because they must be processed sequentially (Pfeiffer et al., 2020).

Prefix Tuning: It optimizes a continuous prompt that is always prepended to the input. It works well, but it reduces the number of tokens available for the input and makes optimization more challenging. LoRA avoids these issues without losing performance.
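As an illustration of how lightweight LoRA is in practice, the following minimal sketch uses the Hugging Face peft library; the base model (gpt2), the targeted module, and the hyperparameters are assumptions chosen only for demonstration:

```python
# pip install transformers peft
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative base model

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                         # rank of the trainable low-rank matrices
    lora_alpha=16,               # scaling applied to the LoRA update
    lora_dropout=0.05,
    target_modules=["c_attn"],   # GPT-2's fused attention projection
)

model = get_peft_model(base, config)
# Only the small LoRA matrices are trainable; the pre-trained weights stay frozen.
model.print_trainable_parameters()
```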
2. Retrieval Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is a strategy for increasing the accuracy, transparency, and flexibility of Large Language Models (LLMs). A RAG system is not a typical LLM: it takes useful externally sourced documents into account as part of its input (Lewis et al., 2020). Thanks to its neural retriever and sequence-to-sequence generator, it can use real-time data dynamically and contextually. This hybrid design helps LLMs with problems such as hallucination, outdated knowledge, and over-generalization across subjects.

How RAG Improves LLM Responses

Standard LLMs are implicit parametric storehouses of knowledge: everything they learn is remembered in their weights. With this design it is very difficult to keep that memory current, verify the facts produced, and trace where an answer came from. RAG resolves these issues by keeping memory and model parameters apart. An ordinary LLM would improvise a definition of a concept such as the middle ear, but a RAG-equipped system would first retrieve a medical document providing the correct definition and then propose an answer grounded in it. RAG improves the accuracy of responses and provides links to content that the user can follow or read (Lewis et al., 2020). RAG also helps keep details up to date: there is no need to retrain the model, one may simply swap out the external retrieval corpus. Because RAG is modular, it is particularly helpful in industries that change rapidly, such as healthcare, law, and customer service.

Comparison with Other Methods

The following contrasts RAG with traditional LLMs, fine-tuning, and prompt engineering to highlight the advantages and disadvantages of each.

Traditional LLMs: These models are fast and adept at drawing broad conclusions, but they frequently invent things and grow stale. Their responses also lack a source, which reduces their usefulness for critical assignments.

Fine-tuning: Fine-tuned models are highly specialized and limited to a particular job or style. They must be retrained to keep their knowledge current, which is costly, and they tend to overfit narrow domains (Gu et al., 2021). Fine-tuned models encode domain knowledge in their weights, but those weights do not change after deployment.

RAG: RAG incorporates new information quickly and needs fewer resources to update. For example, a fine-tuned model might need retraining to include the most recent medical guidelines; a RAG model can do this simply by updating its corpus, which is cheaper and easier.

Prompt Engineering: Changing a model's inputs changes its outputs, but prompt engineering alone has no grounding in real-world facts. RAG strengthens prompt engineering by adding real retrieved documents to the input sequence, which improves the accuracy of the responses. By inserting retrieval signals directly into the model's embedding space, R2AG and other sophisticated systems go one step further, tightening the collaboration between generation and retrieval (Zhao et al., 2023).
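The retrieve-then-generate loop described above can be sketched compactly. In this illustrative example, TF-IDF similarity stands in for the neural retriever, the two-document corpus is a toy knowledge base, and llm() is a hypothetical placeholder for any generator call:

```python
# pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy external corpus; in a real system this is the swappable knowledge base.
DOCUMENTS = [
    "The middle ear includes the tympanic cavity and the three ossicles.",
    "RAG pairs a neural retriever with a sequence-to-sequence generator.",
]

def llm(prompt):
    # Hypothetical placeholder: substitute a real LLM call here.
    return "[model answer grounded in]\n" + prompt

def retrieve(query, docs, k=1):
    # TF-IDF cosine similarity stands in for a dense neural retriever.
    vectorizer = TfidfVectorizer().fit(docs + [query])
    scores = cosine_similarity(
        vectorizer.transform([query]), vectorizer.transform(docs)
    )[0]
    return [docs[i] for i in scores.argsort()[::-1][:k]]

def rag_answer(query):
    context = "\n".join(retrieve(query, DOCUMENTS))
    prompt = f"Answer using only the context below.\nContext: {context}\nQuestion: {query}"
    return llm(prompt)

print(rag_answer("What is the middle ear?"))
```

Updating DOCUMENTS updates the system's knowledge immediately, with no retraining, which is the modularity discussed above.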
RAG Applications and Use Cases

RAG is beneficial in numerous ways:

Open-domain Question Answering: RAG has improved the accuracy and usefulness of responses on datasets such as Natural Questions and WebQuestions (Lewis et al., 2020).

Fact Verification: RAG can classify claims correctly in tasks like FEVER by retrieving evidence from Wikipedia, typically without additional supervision.

Customer Service: IBM and Salesforce, for instance, use RAG-integrated chatbots to provide real-time, document-grounded responses, which makes the answers more accurate and reliable.

Content Creation: Jasper.ai and similar platforms use RAG to create fact-based marketing materials tailored to each company.

Autonomous Agents: Agents with access to an LLM and RAG can retrieve information from the internet or long-term memory, enabling better decisions on the fly.

Search Engines: RAG-inspired techniques help tools like Google MUM and Bing Chat generate snippets and answer queries.

General NLP: By injecting factual context into inputs and hidden layers, RAG enhances translation, summarization, dialogue systems, and classification tasks.

RAG is a major step forward in language model design because it combines parametric and non-parametric memory. It helps LLMs give answers that are factual, up to date, and easy to verify, which lets powerful AI systems operate across many fields.

3. Prompt Engineering

One of the most effective ways to improve Large Language Models (LLMs) is to carefully design and refine the input prompts given to them. Well-crafted prompts are the most direct way to make these powerful models produce output that is correct, useful, and coherent. The field has evolved considerably, from simple empirical tricks to a well-organized area of research.

How Prompt Engineering Improves LLM Responses

With prompt engineering, LLMs give better answers because the input is clearer and better organized. Techniques range from simple to advanced:

Basic Methods: These include giving clear instructions, assigning roles (e.g., "You are an AI expert..."), using delimiters to separate parts of a prompt, and comparing candidate outputs to see which works best. Zero-shot, one-shot, and few-shot prompting shape responses by providing different numbers of examples (Brown et al., 2020; Wei et al., 2022).

Advanced Techniques: These include Chain-of-Thought prompting for step-by-step reasoning (Kojima et al., 2022), Self-Consistency for selecting the most consistent answer (Wang et al., 2022), and Prompt Optimization for systematically improving prompts (Zhou et al., 2022). More elaborate methods such as Tree of Thoughts or Decomposed Prompting can further improve problem solving (Yao et al., 2023; Press et al., 2022).

Multimodal Prompting: Prompt engineering also helps models that work with both text and images, such as Vision-Language Models (Zhou et al., 2022), bring together language and visual understanding. A sketch of a basic structured prompt follows.
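As a simple illustration of the basic methods above, the sketch below combines a role, an explicit instruction, and delimiters in one prompt string; the article text, the tag names, and the two-sentence constraint are invented for the example.

# Illustrative structured prompt: role + instruction + delimited input.
article = "Large language models are neural networks trained on text corpora..."

prompt = (
    "You are an AI expert writing for a general audience.\n"
    "Summarize the article between the <article> tags in exactly two sentences.\n"
    f"<article>\n{article}\n</article>\n"
    "Summary:"
)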
Comparison with Other Methods

Prompt Engineering and Fine-Tuning: Fine-tuning changes a model's internal parameters so that it performs certain tasks better, which often demands substantial storage and compute. Prompt engineering, by contrast, only changes the input prompts and leaves the model parameters untouched. This makes it more practical, especially for very large models or API-hosted models that do not expose their weights (Lester et al., 2021). Prompt tuning, a related technique within prompt engineering, requires far fewer parameters (sometimes more than five orders of magnitude fewer) than full model tuning, and is more efficient when data is scarce or the domain has shifted (Zhou et al., 2022).

Prompt Engineering and Retrieval-Augmented Generation (RAG): RAG improves an LLM's performance by identifying external knowledge related to the prompt and introducing it into the input, which helps prevent hallucinations. RAG is not a competing approach; it is commonly layered onto prompt engineering workflows to deepen factual grounding and context (Lewis et al., 2020). The ReAct framework and ART are two examples of how prompt engineering can interact with external knowledge systems (Yao et al., 2022).

Use Cases of Prompt Engineering

Prompt engineering has become a helpful and versatile means of enhancing the performance of large language models (LLMs) across many domains. In education, it enables teaching staff to tailor learning content and provide individual feedback that addresses each student's needs. In content production, it makes it easier to write stories and other coherent texts in more than one language, improving productivity and broadening language choice. In programming, it helps developers write and debug code faster. For mathematical and logical problems, well-designed prompts improve reasoning performance. Prompt engineering is also important for dataset construction, since it can generate synthetic or labelled data for training. In security, it is used to find and mitigate LLM vulnerabilities such as prompt injection and model stealing. Overall, prompt engineering makes LLM outputs safer, more accurate, more creative, and more logical, and it helps users get the most out of modern language models.

Prompt Engineering

As generative AI becomes more common, prompt engineering is becoming an increasingly important skill. It is the process of telling AI systems exactly what to do to obtain the desired results (Patil & Puranik, 2024). More specifically, it means writing clear and simple input prompts so that large language models (LLMs) like ChatGPT can understand what the user wants (Ekin, 2024). The process is both technical and creative; shaping prompts so that the answers are useful, correct, and appropriate for the situation is something of an "art" (Bansal, 2024). In this way, prompt engineering is a critical link between what people want and what machines can do.

Figure 9: Visual breakdown of prompt engineering components, highlighting the importance of large language models (LLMs) trained on extensive data, and the role of instruction and context in shaping AI outputs.
Source: Sahoo, Kumar, & Singh (2024), "A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications," arXiv. https://arxiv.org/abs/2402.07927

Prompt engineering matters because it directly affects the quality and usefulness of AI-generated content. Ceurstemont (2025) notes that even small changes in phrasing can substantially change what an AI produces. For generative AI systems to work well, prompts need to be very clear. When models like ChatGPT and DALL·E receive well-written prompts, they perform better, which makes them more useful for tasks like content creation, data analysis, teaching, and customer support (Ceurstemont, 2025; Ekin, 2024). Prompt engineering is also important for making sure AI answers are useful, relevant, and aligned with an organization's goals (Patil & Puranik, 2024). Finally, prompt engineering is an important part of using AI responsibly.
It helps reduce bias and supports results that are fair, inclusive, and socially aware (Bansal, 2024). Well-constructed prompts can also handle uncertainty, either by asking for more information or by presenting different ways to think about hard questions (Ekin, 2024). As AI systems improve, prompt engineering makes them easier to use and more reliable (Patil & Puranik, 2024).

Users need prompt engineering because today's generative AI systems do not understand natural language perfectly. These systems are very powerful, but they often need carefully chosen inputs to give accurate, coherent, and contextually appropriate answers (Ceurstemont, 2025). Bansal (2024) argues that AI still needs people to specify the task, supply context, and set goals in order to work well. If the prompts are poorly constructed, the model's answers may be unclear, incorrect, or unhelpful.

I believe that prompt engineering is more than a technical fix; it is also a form of deliberate communication. It shows that people need to be more careful when they use intelligent systems. Rather than leaving users as passive consumers of machine-generated content, it turns them into AI partners. This skill will become more important as we move into environments with more AI, and it is not just for engineers or data scientists; anyone who uses AI tools can benefit from it.

Some people think that AI systems will eventually become so capable that they will not need human-crafted prompts (Ceurstemont, 2025). I think prompt engineering will always be needed in some form. Even if an AI were "perfect," people would still need to articulate what they want and how they plan to get it, especially for difficult or moral decisions.

Prompt engineering is especially important when working with AI models that generate human-like text, code, or images (Ceurstemont, 2025), and when deploying AI in business settings such as automated customer service or fraud detection (Ceurstemont, 2025). Companies need to invest in prompt engineering (Patil & Puranik, 2024) to make sure these tools advance their goals and deliver good results. That investment matters most wherever the user and the machine can diverge on intent, a gap that is unlikely to disappear even as AI keeps improving.

In short, prompt engineering is not only a technical need but a strategic one. As AI spreads through schools, businesses, healthcare, and the arts, knowing how to craft prompts is essential to getting real help from these systems.

Prompt Engineering Technique

1. Persona Technique

According to Furukawa (2024, p. 34), a persona in the context of large language models (LLMs) is a personality profile that includes attributes such as age, gender, or occupation. It is a structured and consistent identity that shapes how the model responds (Olea et al., 2024, p. 34). Personas matter for prompt engineering because they steer interactions with LLMs toward particular behaviours or areas of expertise (Furukawa, 2024, p. 34).

How the Technique Works

The persona technique changes how the model responds by embedding a character profile directly in the prompt, for example, "You are a civil engineer" (Kim et al., 2024, p. 35). The method can be applied in three main ways. The first is handcrafted personas, which the user defines explicitly with profiles such as "20s, Executive" or "Act as an intelligent researcher" (Furukawa, 2024, p. 36; Olea et al., 2024, p. 35).
The second approach generates LLM-produced personas on the fly for each query, which makes them adaptable to a variety of situations (Olea et al., 2024, p. 36). Lastly, multi-agent personas give each agent a different expert role, letting them collaborate on a task or reason about it from distinct perspectives (Olea et al., 2024, p. 36).

Why It Is Effective

Persona prompting has been shown to improve task performance in several ways. Combining demographic factors such as age and profession, for example comparing "20s, Engineer" with "60s, Designer," had a significant effect on idea evaluation scores and improved the quality of the evaluations (Furukawa, 2024, pp. 37–38). Expert personas outperformed generic control prompts on tasks requiring creative or subjective reasoning, indicating they are better suited to open-ended tasks (Olea et al., 2024, pp. 39–40). Domain-specific personas, such as a "Mathematician" persona, have also been shown to support logical reasoning on certain tasks, demonstrating the value of tailored persona prompting (Kim et al., 2024, p. 38).

Examples of Application

Idea Evaluation: The "40s, Designer" persona changed how responses weighed creativity and feasibility (Furukawa, 2024, p. 36).

Answering Questions: Auto-generated expert personas made fact-based answers more accurate (Olea et al., 2024, p. 36).

Reasoning Tasks: A "Civil Engineer" persona produced confident but incorrect mathematical answers, showing how powerful and risky persona use can be (Kim et al., 2024, p. 35).

Figure 10: Screenshot of a Response Generated by ChatGPT in the Business Analyst Persona.

Why the Technique Enhances LLM Performance

The persona technique changes the model's tone, depth, and point of view, which changes how it responds. It helps emulate expert thinking, making the results closer to what users expect in specialized tasks. Using the wrong persona, on the other hand, can hurt performance; for instance, applying an engineering persona to mathematics problems can introduce logical errors (Kim et al., 2024, p. 35). The Jekyll & Hyde framework addresses this by generating both persona-based and neutral answers and then using an LLM evaluator to select the better one (Kim et al., 2024, p. 36). Combining LLM-generated personas with this hybrid method made results more stable and accurate (Kim et al., 2024, p. 38). A minimal sketch of persona prompting follows.
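Here is a minimal sketch of handcrafted persona prompting via a system message, assuming the OpenAI Python SDK and a valid API key; the model name and the business-analyst persona are illustrative choices echoing Figure 10, not values taken from the cited studies.

# Persona prompting sketch (assumes: pip install openai, OPENAI_API_KEY set).
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        # The persona is injected as a structured, consistent identity.
        {"role": "system",
         "content": "You are a business analyst in your 40s. Evaluate ideas "
                    "for feasibility, market fit, and risk."},
        {"role": "user",
         "content": "Assess a subscription service that delivers office plants."},
    ],
)
print(response.choices[0].message.content)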
2. Clarity

Making prompts clear is an important part of natural language processing (NLP). It means ensuring that the guidance given to a language model (LM) is concise and unambiguous; a well-calibrated prompt should contain an unmistakable question (Chiodi et al., 2023). The first component of the CLEAR Framework likewise emphasizes that prompts should be brief and precise, so that AI models attend only to the most essential components (Sparks et al., 2023).

How Clarity Works

Clear prompts remove the guesswork about what the user wants, because the AI is told what to prioritize (Klie et al., 2023). Clarity also forces users to articulate what they actually want: the question must be well formed, the desired answer indicated, and the guidelines explicit. It is also worth balancing generality against specificity; precision is essential, yet over-specific prompts may reduce the variability of responses or introduce bias (Ganesan et al., 2024).

Word choice matters as well, since the arrangement of words in a prompt has been shown to affect output quality (Ganesan et al., 2024). A good practice is to add a line at the end of the prompt inviting the model to ask clarifying questions (Chiodi et al., 2023).

Why Clarity is Effective and Helps Get Better Responses

The prompts provided greatly affect the effectiveness of AI models. Prompts structured for clarity are more likely to elicit better answers (Sparks et al., 2023). Accuracy rates have been shown to rise above 98 percent when good prompt engineering, which includes being clear, is applied (Chiodi et al., 2023). Another virtue of a good prompt is that it is easy to comprehend (Ganesan et al., 2024).

Bad prompts often lack sufficient information or are too generic, which makes them harder to use. Unclear language invites vague replies that do not advance the user's objectives. If the AI lacks sufficient context or detail, it may fail to determine what the question actually is, producing stock answers out of tune with what the user needs (Klie et al., 2023). If prompts are too general and insufficiently detailed, the AI may be confused and give generic, unrelated answers (Chiodi et al., 2023).

With clear and direct prompts, the user reduces the risk of receiving empty, generic, or irrelevant answers. Specific, clear prompts yield better and more precise responses, and they help match the AI's capabilities to the user's needs. This is particularly important in applications where the clarity and relevance of answers drives user satisfaction (Sparks et al., 2023). Imprecise prompts can cause incorrect responses and increased bias; clear prompts, conversely, help the AI perform better and provide answers closer to the objective (Ganesan et al., 2024).

Examples

The sources contrast vague or unclear prompts with specific, clear ones:

"Tell me about a good book" is too vague. A better prompt would be, "Can you suggest a non-fiction book about personal growth?" This level of detail removes ambiguity (Chiodi et al., 2023).

Figure 11: Screenshot of ChatGPT's recommendation of the personal development book

For digital marketing, instead of asking, "Tell me about digital marketing," a more effective prompt clearly defines the task: "Write a 200-word summary of the latest trends in digital marketing" (Klie et al., 2023).

For concision, "Explain the process of photosynthesis and its significance" is better than "Can you give me a detailed explanation of the process of photosynthesis and its significance?" Likewise, "Identify factors behind China's recent economic growth" is shorter than asking for an "extensive discussion" of the same topic (Ganesan et al., 2024). A minimal sketch contrasting the two styles follows.
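The contrast can be captured in two strings; this is purely illustrative, and the added constraints (output fields and the closing invitation to ask clarifying questions) are assumptions extending the book example above.

vague_prompt = "Tell me about a good book"

clear_prompt = (
    "Suggest one non-fiction book about personal growth. "
    "Give the title, the author, and a two-sentence reason for the pick. "
    "If anything about my request is ambiguous, ask a clarifying question first."
)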
3. Task/Instruction Prompting

Task/Instruction prompting is a basic and fundamental technique that controls the output of large language models (LLMs) and vision-language models (VLMs) by designing instructions carefully around each task. The approach extends pre-trained models to downstream tasks simply by steering the model to behave in a desired way based on its input, without modifying the core network parameters (Wei et al., 2022). The development of prompt engineering has made LLMs far more flexible, and they now perform well across many tasks and fields, such as question answering and common-sense reasoning (Brown et al., 2020).

The simplest and most important form of prompt engineering is providing directions: communicating unambiguously to the LLM what it must do. When the instructions in a prompt are congruent with the task, they strongly shape the response the LLM will produce (Ouyang et al., 2022). Important elements of Task/Instruction prompting include:

Giving Full Descriptions: Rather than loose or brief task descriptions that invite generic outputs, provide clear and elaborate descriptions; this helps elicit correct and helpful answers. Without specific instructions, LLMs trained on large datasets are more likely to return general answers (Wei et al., 2022).

Clarity and Accuracy: Clear prompts make the model less confusing to use and lead to more precise, well-defined answers. This degree of accuracy makes it easier to meet user expectations (Brown et al., 2020).

Role-Prompting: The model can be given a role, for example a historian or an assistant. Such framing supplies a perspective that alters the tone, depth, and level of detail in its answers (Reynolds & McDonell, 2021).

Structured Formatting: Organized formatting, such as triple quotes or indented sections, helps the model follow instructions, particularly when they are complex or involve multiple steps (White et al., 2023).

Why It Is Effective and Helps Get Better Responses

When researchers gave large language models task instructions combined with strategies such as CoT prompting, the models became more accurate and logical. For example, CoT cues such as "think step by step, please" lead models to lay out stages of reasoning, making their outputs more understandable and structured (Wei et al., 2022). Task-driven instructions are also beneficial because reasoning through the instruction makes models engage with the problem rather than simply emitting an answer, which has proven especially useful for assignments demanding critical or multi-fact reasoning (Ouyang et al., 2022). Such instructions also make models more versatile, so they can apply what they have learned to new tasks with minimal or no additional training (Zhao et al., 2023).

In addition, instruction-based prompts make it possible to trace the steps that logically lead to a model's conclusions, which helps with detecting bugs and assuring quality, and makes the process clearer and more visible (Reynolds & McDonell, 2021). According to White et al. (2023), strategic task instructions significantly improve the performance of large language models in many contexts, including education, healthcare, and financial services, compared with plain "vanilla" prompts that simply pose a question. A short sketch combining these elements follows.
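The sketch below ties the elements together (full description, role framing, structured formatting with delimiters); the analyst role, the numbered steps, and the report text are invented for illustration.

# Instruction-style prompt combining a role, explicit steps, and delimiters.
report = "Q3 revenue rose 12 percent while customer churn increased to 4 percent..."

prompt = (
    "You are a financial analyst.\n"
    "Using the report inside the triple quotes, do the following:\n"
    "1. List the key metrics mentioned.\n"
    "2. Flag any metric that moved in a negative direction.\n"
    "3. Suggest one follow-up question for management.\n"
    f'"""\n{report}\n"""'
)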
4. Example

Example prompting, also termed in-context learning, works by showing the model how to perform a task correctly within the prompt itself. This approach allows large language models, including GPT-3, to perform linguistic tasks they have not previously performed, without updating the model's internal parameters or applying any gradient updates (Brown et al., 2020). In-context learning usually takes three main forms:

Few-Shot Learning: The prompt provides several examples of the task (typically 10–100), usually drawn from the same distribution as the test input.

One-Shot Learning: The prompt shows the task only once, usually alongside an instruction in natural language.

Zero-Shot Learning: The prompt contains no examples, only an instruction.

Fine-tuning is different because it updates a model's weights by training on a large labelled dataset, often requiring thousands of task-specific examples. In-context learning instead uses the model's existing parameters to infer the task without changing them.

Table 4: Examples of In-Context Learning Types with Descriptions and Prompts

Type of In-Context Learning | Description | Example Prompt
Few-shot learning | Prompt contains several examples of the task | "Translate English to French: 1. Cat → Chat 2. Dog → Chien 3. House → Maison. Translate: Bird →"
One-shot learning | Prompt contains one example plus an instruction | "Translate English to French. Example: Cat → Chat. Now translate: Dog →"
Zero-shot learning | Prompt contains only an instruction, no examples | "Translate the word 'Dog' from English to French."

How the Technique Works

Prompt conditioning is what makes in-context learning work. A pre-trained LLM is given a set of examples formatted within the input sequence. These examples act as a kind of "soft training" that lets the model infer what the task is and what the output should look like. Having learned from vast amounts of internet data, the model can apply its general knowledge and pattern-finding ability to complete new tasks by following the structure of the prompt (Min et al., 2022).

For example, when in-context learning is used for Visual Question Answering (VQA) tasks, models like BLIP2 are shown text-only question-answer pairs. No images accompany these demonstrations, but they help the model understand how the question is framed and what the answer should look like, letting it "adapt" to new tasks by predicting continuations consistent with the examples in the prompt (Chen et al., 2023).

Why It Is Effective

There are several reasons why prompting with examples works. First, few-shot learning needs far less labelled data than traditional fine-tuning, making it well suited to low-resource tasks. Second, in-context learning transfers immediately to many different situations, and this ability improves as the model grows larger (Brown et al., 2020). Third, large language models are adaptable in the sense that, much like humans, they can pick up a task from a handful of examples conveying its structure and intent, the way people learn by following worked instructions. Finally, the method scales well: performance improves with more examples and larger models (Wei et al., 2023). A small sketch of few-shot prompt construction follows.
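Building such a prompt programmatically is straightforward; the sketch below assembles the French-translation prompt from Table 4 out of demonstration pairs (the pairs come from the table, while the helper function and its formatting are my own illustration).

# Assemble a few-shot prompt from labelled demonstration pairs.
examples = [("Cat", "Chat"), ("Dog", "Chien"), ("House", "Maison")]

def few_shot_prompt(query: str) -> str:
    lines = ["Translate English to French:"]
    for english, french in examples:      # in-context demonstrations
        lines.append(f"{english} -> {french}")
    lines.append(f"{query} ->")           # the model completes this line
    return "\n".join(lines)

print(few_shot_prompt("Bird"))  # the expected completion is "Oiseau"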
Examples of Effectiveness

GPT-3 demonstrates that this approach generalizes across many scenarios, performing well even in the few-shot setting. Large language models (LLMs) have performed strongly across a broad range of language tasks, sometimes matching or exceeding fine-tuned smaller models. On the LAMBADA dataset, which requires predicting the final word of a sentence from long-range context, GPT-3 reached 86.4 percent accuracy (Brown et al., 2020). On the narrative completion task HellaSwag, GPT-3 also outperformed smaller fine-tuned models, with 79.3% accuracy. It was as good as or better than fine-tuned models at question answering, achieving 71.2% accuracy on TriviaQA with a small number of examples and an 85.0 F1 score on the Conversational Question Answering (CoQA) dataset in few-shot mode, close to human performance. Its few-shot translation abilities produced results comparable to the best unsupervised neural machine translation systems on French-English and English-French, demonstrating the flexibility of LLMs across language pairs.

In reasoning, GPT-3 correctly answered 98.9% of subtraction problems and 100% of two-digit addition problems in few-shot mode. On SAT analogy tasks, it performed better than the majority of college applicants, correctly answering 65.2% of the questions, which indicates strength at word games and analogical reasoning.

Multimodal tasks, such as answering questions about pictures, can also benefit from thoughtful prompt design. When visual cues such as image captions were combined with text-based Q&A examples, BLIP2 performed better, demonstrating the value of mixing prompt types to increase model accuracy (Li et al., 2023).

Why This Method Gets a Better Response

Although learning from a few examples is beneficial, it has drawbacks. One major issue is that the model's performance varies significantly with the selection and ordering of the prompt's examples; depending on the combination, accuracy can range from nearly random to nearly perfect (Zhao et al., 2021). Maintaining high performance is also difficult due to model biases: recency bias (the model favours answers it has seen most recently), majority label bias (it favours the answer types most prevalent in the prompt), and common token bias (it is more likely to produce outputs containing tokens prevalent in its pre-training data, even when they are unrelated to the task).

Several methods increase the dependability of few-shot learning. Contextual calibration detects and corrects output biases by probing the model with a "null" prompt, an empty or content-free input, which makes responses more accurate and more stable. Rewriting instructions so they are easier to read and follow, for instance by using bullet points or segmenting challenging tasks into smaller, more manageable pieces, also helps the model understand and complete the task; this is known as prompt reframing. A numerical sketch of contextual calibration follows.
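The contextual calibration idea can be sketched numerically. In the toy code below, `label_probs` is a stand-in for a real LLM call returning probabilities over the label words; the fixed numbers are invented solely to show how dividing by the content-free ("N/A") baseline can flip a biased prediction (after Zhao et al., 2021).

import numpy as np

LABELS = ["Positive", "Negative"]
CONTEXT = ('Review: "Great food." Sentiment: Positive\n'
           'Review: "Awful service." Sentiment: Negative\n')

def label_probs(prompt: str) -> np.ndarray:
    # Stand-in for an LLM call returning P(label word | prompt).
    # The invented numbers encode a bias toward "Positive".
    return np.array([0.7, 0.3]) if "N/A" in prompt else np.array([0.6, 0.4])

p_null = label_probs(CONTEXT + 'Review: "N/A" Sentiment:')               # bias estimate
p_test = label_probs(CONTEXT + 'Review: "The plot dragged." Sentiment:')

calibrated = p_test / p_null          # divide out the content-free bias
calibrated /= calibrated.sum()        # renormalize
print(LABELS[int(np.argmax(calibrated))])  # "Negative", despite the raw bias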
Adding image captions or similar visual aids to prompts, much like adding Chain-of-Thought justifications to purely text-based prompts, has proved beneficial on visual question-answering tasks by increasing the information available to the model. Such adjustments do more than stabilize performance; they make in-context learning a scalable, low-resource alternative to fine-tuning. As larger models such as GPT-4 and its successors are developed, even a handful of well-selected examples will grow increasingly useful across domains.

5. Zero-Shot Prompting

Zero-shot prompting is a technique applied to Large Language Models (LLMs) that instructs the model to behave in a specific way using natural language alone, without giving it any example to follow (Yin et al., 2023). Because it is a form of in-context learning, the prompt does not depend on worked examples of the input being solved (Zhao et al., 2023). It exploits the enormous volume of data used to train LLMs, which, as Kim et al. (2023) note, allows them to generalize and carry out a wide range of tasks without task-specific configuration or labelled data. Large models exhibit so-called emergent abilities, interactions between learned skills that models with fewer parameters cannot develop (Yin et al., 2023). Such models no longer require examples; they can understand and perform tasks from a well-written instruction alone.

How Zero-Shot Prompting Works

The main aim of zero-shot prompting is to state the task plainly and as simply as possible. The prompt usually contains an instruction and a slot for the input data, and sometimes a slot for the output (Zhao et al., 2023). For instance, "Translate the following English sentence to French: 'Hello, how are you?'" needs no training examples, just a clear instruction.

Figure 12: This flowchart explains Zero-Shot Learning (ZSL), showing how it handles new tasks without labeled data and the challenges of traditional methods. It is useful for AI engineers seeking to grasp ZSL basics.

This mechanism essentially recasts a task so that it resembles the model's pre-training objective (such as next-token prediction or masked language modelling), letting the model infer the right output structure from the language context alone (Kim et al., 2023). Depending on how the task is framed, well-constructed prompts can make the same model give very different, or very structured, answers (Haque et al., 2023).

Why It Is Effective and Helps Get Better Responses

Zero-shot discrete prompts written in plain, human-readable language have many benefits. They are simple to understand, require no labelled data, and are easy to create, since all they need is a task instruction (Yin et al., 2023). The design is flexible, so users can change the input and expected output without extensive prompt engineering. LLMs are helpful here because they can apply what they have learned across many situations. In the Text2SQL task, which turns natural language into SQL queries, prompts that included the database schema and sample content did much better than those that only included the question, even with no examples (Zhao et al., 2023). This shows that providing context is important for good performance, even when there are no examples.
However, prompt design is known to be fragile; even small changes such as reordering words or altering punctuation can have a large impact on output quality (Kim et al., 2023). Zero-shot prompting is therefore very handy, but it performs best when the language is highly explicit.

Examples of Zero-Shot Prompting Strategies

Common zero-shot strategies include:

Simple Instructions: Simply state the task, e.g., "Summarize the article below."

Task and Label Description: Add more information by listing the possible labels or categories, e.g., "Label this review as positive, negative, or neutral."

Reasoning Cues: Insert phrases such as "Let's think step by step" so the model applies its reasoning capabilities, much as in chain-of-thought prompting (Yin et al., 2023; Kim et al., 2023).

Visual Question Answering (VQA) experiments indicate that zero-shot prompt design, such as employing a QA-style template or giving the model explicit instructions, can significantly influence accuracy (Haque et al., 2023). Phrasing also matters: the specific words chosen for class labels can materially help classification tasks.

Impact on Getting Better Responses

Users like zero-shot prompting because it is simple and adaptable. In some cases, well-made zero-shot prompts have outperformed few-shot prompts, especially when examples are irrelevant or add complexity (Zhao et al., 2023). It is not always the best option, however. Where task-specific knowledge is essential, fine-tuned models or few-shot prompting can be more practical than a zero-shot strategy. Including extra information such as paper titles (which the model may have seen during training) usually does not help and is sometimes obstructive (Haque et al., 2023). This indicates that a prompt's clarity and pertinence matter more than its elaborateness.

The result is that zero-shot prompting lets an ordinary user leverage whatever capabilities LLMs can offer without investing in costly data labelling or model fine-tuning. It succeeds when well-thought-out prompts provide clear instructions and context, so that the model's output matches the user's goal as closely as possible. A minimal zero-shot sketch follows.
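A zero-shot prompt reduces to an instruction, an input slot, and optionally an output cue, as in this small sketch built around the translation example above.

# Zero-shot prompt: instruction + input + output cue, with no examples.
sentence = "Hello, how are you?"
prompt = (
    "Translate the following English sentence to French.\n"
    f"Sentence: {sentence}\n"
    "French:"
)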
6. Few-Shot Prompting

Few-shot prompting is a natural language processing (NLP) technique in which the user provides a small number of examples of the desired input-output behaviour in the prompt. The technique allows large language models (LLMs) to perform inference by transferring what they have learned to situations they have not encountered before, generating suitable responses without any fine-tuning. It differs from zero-shot prompting (which provides no examples) and one-shot prompting (which provides one). Few-shot prompting is an excellent way to improve the quality and consistency of responses and keep them on task, because the examples help the model grasp the structure and context far better (Brown et al., 2020; Zhang et al., 2022).

How Few-Shot Prompting Works

Few-shot prompting embeds a few task instances in the prompt itself. The demonstrations are typically input-output pairs matching the format and structure the user wants. By illustrating the pattern, they condition the model to infer the task and repeat it on a new query. The technique relies on in-context learning, an ability of LLMs such as the GPT models: the cues in the prompt enable them to deduce generalization patterns (Min et al., 2022).

The few-shot examples teach the model to reply with the proper content, tone, or format. They make vague instructions concrete, helping the model figure out exactly what is required, and they supply task-specific patterns that the model can apply to new, unseen inputs, improving correctness. The number and quality of the examples strongly affect performance; a carefully chosen set of 3 to 5 examples is usually all the model needs (Brown et al., 2020).

Why Few-Shot Prompting is Effective and Helps Get Better Responses

Few-shot prompting is effective because concrete demonstrations are often clearer than abstract, text-only instructions. It can be especially helpful when the required task is difficult or specialized and the model's general training data may be only loosely relevant (Zhao et al., 2021). Researchers have found few-shot prompting significantly more accurate than zero-shot prompting on sentiment classification, question answering, and summarization tasks (Brown et al., 2020; Zhang et al., 2022). On some natural language processing tasks, GPT-3 performed up to 20 percent better in few-shot mode than in zero-shot mode (Brown et al., 2020).

Few-shot prompting can also reduce misinterpretation errors and make model answers more consistent by reducing their variability. Structured examples keep the model in the correct format and enable multi-turn reasoning. Moreover, few-shot prompting aligns the final result more closely with the user's intent, yielding more satisfying and pertinent output, particularly for structured outputs such as tables, summaries, or decision-making frameworks (Min et al., 2022).

Examples

Example 1 (Sentiment Classification)

Few-shot prompt:
"Review: 'I loved the food and the service was excellent.' Sentiment: Positive
Review: 'The experience was awful and the room was dirty.' Sentiment: Negative
Review: 'The movie was terrible and boring.' Sentiment:"

Figure 13: Few-Shot Prompt for Sentiment Classification. This ChatGPT screenshot demonstrates few-shot learning: by identifying patterns in the previous labelled samples, the model determines the sentiment of the last review. Source: Generated by ChatGPT based on Zhao et al. (2021).

From the pattern, the model infers that the final review should be labelled "Negative" (Zhao et al., 2021).

Example 2 (Grammar Correction)

Few-shot prompt:
"Incorrect: 'He go to store.' Correct: 'He goes to the store.'
Incorrect: 'She not like pizza.' Correct: 'She does not like pizza.'
Incorrect: 'They eats fast.' Correct:"

Figure 14: Few-Shot Prompt for Grammar Correction. This ChatGPT screenshot shows few-shot prompting used to correct grammatical errors.
The model uses the labelled pairs of incorrect and corrected sentences to predict how the new sentence should be written. Source: Generated by ChatGPT based on Brown et al. (2020). It continues: "They eat quickly.", an effective pattern generalization.

Conclusion

Few-shot prompting is a highly beneficial and efficient approach that exploits the capability of large language models to identify patterns and generalize. By providing structured examples to the model, users can help it produce superior outputs with fewer errors. It is particularly useful for custom applications and tasks with special, domain-specific requirements, where examples communicate better than instructions. As language models become more powerful, few-shot prompting remains an important component of reliable and controllable prompt engineering.

7. Chain Prompting

Chain prompting, also known as LLM chaining, is a method for resolving hard questions by breaking them into small, manageable pieces that large language models (LLMs) can then work through sequentially. It is organized so that the output of one task serves as the input of the next, turning the sequence into a workflow. Wu and co-authors (2022) state that this approach aims to promote communication and cooperation between people and AI.

How Chain Prompting Works

The process has a number of important steps. The first is task decomposition, dividing a large task into smaller, well-defined subtasks, each tied to a specific model operation and prompt. These subtasks draw on primitive operations, the basic LLM activities from which chains are built. Typical primitives involve information collection (gathering facts, creating content, and generating ideas), verification and classification (deciding whether a query can be answered), and re-organization (extracting given data, reformatting, turning text into lists, or combining outputs) (Chase et al., 2023; Wu et al., 2022).

Each link in the chain is driven by a natural language prompt, possibly with instructions, background data, and examples. The output of one step, also called a data layer, feeds into the next step, forming a logical pipeline (Wu et al., 2022). Moreover, the interactive interfaces of some of these tools visualize such chains and make it possible to modify the results at every processing step, reorder the computation, and iteratively improve the work, giving modular and flexible control (Chase et al., 2023). A structural sketch of such a chain follows.
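Structurally, a chain is just a series of prompts where each step's output becomes the next step's input. The sketch below mirrors the peer-review rewriting chain described later in this section (Split Points → Ideation → Compose Points); `llm` is a stand-in for any text-generation call and here merely echoes its input so the pipeline runs end to end.

# Chain prompting sketch: three sequential LLM calls forming a pipeline.
def llm(prompt: str) -> str:
    # Stand-in for a real LLM call; echoes the last line so the chain runs.
    return prompt.splitlines()[-1]

def rewrite_review(review: str) -> str:
    # Step 1 (Split Points): isolate the separate criticisms.
    points = llm(f"List the separate criticisms in this review:\n{review}")
    # Step 2 (Ideation): propose a constructive fix for each criticism.
    fixes = llm(f"For each criticism, suggest one constructive fix:\n{points}")
    # Step 3 (Compose Points): merge the fixes into a polished response.
    return llm(f"Combine these fixes into one courteous, clear reply:\n{fixes}")

print(rewrite_review("The method section is vague and the figures are unreadable."))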
Why Chain Prompting Is Effective

Chain prompting improves both the user experience and the performance of the model in several influential ways. It has been found to facilitate classification and summarization; for example, it outperforms zero-shot ChatGPT at classifying legal documents and stepwise prompting at summary writing (Chase et al., 2023). Chain prompting also makes the model's logic clearer and gives the user more control, letting them adjust, say, the temperature during an ideation step to brainstorm, or the order of the rewriting stages, to obtain improved results (Wu et al., 2022). The modular technique also makes bugs easier to track down and resolve: because each chain step acts like a unit test, rerunning or rearranging steps lets the user quickly find, isolate, and correct errors (Chase et al., 2023).

Chain prompting also boosts large language model performance by splitting tasks that mix challenging demands, such as formatting, creativity, and accuracy, into smaller steps so that each step can play to its strengths (Wu et al., 2022). It likewise addresses some of the most common problems with LLMs. It tames multi-step reasoning dilemmas by simplifying them, minimizes exposure bias by posing one question at a time about each new content segment, and mitigates input sensitivity by normalizing inputs through basic operations (Chase et al., 2023). By reducing distractions and preventing task-unrelated errors from propagating through the process, the fixed output and data connections of each step act as barriers that keep the task on course (Wu et al., 2022). Finally, chain prompting facilitates human-like iteration, particularly for tasks like summarization, where a multi-pass workflow of drafting, critiquing, and revising resembles the way editors operate and produces better results (Chase et al., 2023).

Examples of Chain Prompting in Practice

Peer review rewriting turns negative feedback into useful suggestions in an organized way. The first step, Split Points, divides the review into separate issues. The next step, Ideation, generates ideas for fixing those issues. Finally, a Compose Points step assembles these answers into a polished, clear response that improves the overall tone and clarity.

Making personalized flashcards for learning English and French takes a few steps. The Ideation phase identifies meaningful interactions, such as visiting a restaurant. The next step, Generation, produces example sentences in English. The last step, Rewriting, translates these examples into French, yielding flashcards that are useful and contextually coherent.

Debugging visualization code finds and fixes mistakes in Vega-Lite specifications. The first step, Rewriting, converts the code into plain-English descriptions. A Classification step then looks for specification violations and suggests the best fixes. A final Rewriting step regenerates the corrected code and checks that it meets all the requirements.

Assisted text entry helps people by expanding abbreviations or finishing sentences. The process first checks whether the input is shorthand. If it is, the Rewriting step expands it; if not, the Generation step completes the phrase, helping people write faster.

Long legal document classification breaks hard-to-read legal texts into smaller parts. Summary Generation shortens long documents, Semantic Search then retrieves similar samples for context, and finally Label Generation uses in-context learning to assign the right tags, making documents easier to find and organize (Chase et al., 2023).
Table 5: Examples of Chain Prompting in Practice

Use Case | Step 1 | Step 2 | Step 3 | Outcome/Goal
Peer Review Rewriting | Split critical feedback | Ideate constructive fixes | Compose empathetic responses | Improved clarity and tone in reviews
Language Flashcard Creation | Identify context (e.g., dining) | Generate English sentences | Translate into French | Personalized and contextual flashcards
Visualization Code Debugging | Describe code in natural language | Identify spec violations | Generate corrected code | Bug-free Vega-Lite visualizations
Assisted Text Entry | Detect shorthand or partial input | Expand or predict phrases | Finalize sentence | Faster and user-friendly text input
Legal Document Classification | Summarize long documents | Retrieve relevant cases | Assign legal labels using prompts | Efficient legal document organization

Source: Adapted from Chase et al. (2023)

8. Chain of Thought

Chain-of-Thought (CoT) prompting is a further technique that helps large language models (LLMs) reason better by making them generate a sequence of intermediate reasoning steps. These steps, or rationales, resemble the way people think and help the model reach sounder conclusions even on complex assignments (Wei et al., 2022).

Mechanisms of Chain-of-Thought Prompting

Chain-of-Thought (CoT) prompting helps large language models (LLMs) think logically, and it can be implemented in several ways. As a common technique in few-shot prompting, manual CoT provides several worked examples of input, thought process, and output; the instances teach the model the problem-solving procedure by displaying each step (Kojima et al., 2022). Conversely, zero-shot CoT simply appends a phrase such as "Let's think step by step" after the question and provides no examples. It is a simple approach requiring no special skill, yet it greatly improves reasoning tasks (Kojima et al., 2022).

Automatic CoT (Auto-CoT) generates its own reasoning chains, reducing the manual effort of writing prompts. It selects representative samples, clusters questions to add diversity, and instructs LLMs with "Let's think step by step" (Zhou et al., 2023). Faithful CoT and Symbolic CoT (SymbCoT) are more advanced frameworks that split the process in two to incorporate symbolic reasoning: first they translate natural language into symbolic steps (such as PDDL or Python code), then they use deterministic solvers to find the answer. Faithful CoT combines symbolic logic with explanations in everyday language (Zhang et al., 2023), while SymbCoT chains Translator, Planner, Solver, and Verifier modules into fully logical reasoning pipelines.

In our study, we examined several types of CoT prompts:

1. Zero-shot CoT: This technique appends a cue such as "Let's think step by step" to the question, with no examples. Despite its simplicity, zero-shot CoT makes models significantly more competent at reasoning (Kojima et al., 2022). When trying to detect fraud, for instance, asking "Is this transaction suspicious?" followed by "Let's think step by step" leads the model to examine the transaction history and surface more specific signs of fraud.

2. Manual/Few-shot CoT: This variant also includes worked reasoning examples in the prompt so the model learns how to produce intermediate steps. According to Brown et al. (2020), it is effective for specialized, domain-specific tasks such as tax categorization and customer support.
We automated customer service this way with prompts such as: Type: The customer wants a refund because the package was late. CoT: "Compare the policy window with the delivery date to check whether a refund is possible." This made the model's decision process explicit.

3. Auto-CoT (Automatic Chain-of-Thought): Rather than relying on humans, Auto-CoT creates its own diverse reasoning samples by clustering questions and selecting prompts characteristic of each group. It reduces labour cost and generalizes easily (Zhou et al., 2023). A logistics AI answering "What is the best route for delivery?" could be given Auto-CoT examples like "Start from warehouse → Check traffic → Optimize by distance and time → Choose route with lowest delay."

4. CoT with Self-Consistency: This method samples multiple CoT outputs instead of relying on a single reasoning chain, and keeps the most consistent answer. The ensemble-style approach gives results that are more accurate and reliable (Wang et al., 2022). For instance, an AI financial assistant might assess risk along several paths: "Path A: look at debt, then income, then credit score. Path B: review credit history first and compare it to the limit." It then chooses the conclusion the paths agree on. A sketch combining zero-shot CoT with self-consistency follows.
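The sketch below combines zero-shot CoT (technique 1) with self-consistency (technique 4): several reasoning chains are sampled and the majority answer wins. `llm_sample` is a stand-in for a stochastic LLM call; its toy output reuses the cafeteria-apples arithmetic discussed later in this section.

# Zero-shot CoT + self-consistency: sample chains, then majority-vote.
import random
from collections import Counter

def llm_sample(prompt: str) -> str:
    # Stand-in for a temperature > 0 LLM call; returns a toy reasoning chain
    # whose last line is the final answer (occasionally wrong, by design).
    return "23 - 20 = 3\n3 + 6 = 9\nAnswer: " + random.choice(["9", "9", "9", "8"])

def self_consistent_answer(question: str, n: int = 5) -> str:
    prompt = f"{question}\nLet's think step by step."   # zero-shot CoT trigger
    answers = [llm_sample(prompt).splitlines()[-1] for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]        # majority vote

print(self_consistent_answer(
    "The cafeteria had 23 apples. If they used 20 and bought 6 more, how many are left?"))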
Along with these basic methods, we also explored newer CoT variations:

1. Step-Back Prompting: This metacognitive prompting method tells the model to re-examine its original reasoning and results. It helps avoid premature conclusions and mistakes, which supports decisions such as policy setting and product pricing. In retail planning, "lowering the price to boost sales" is a common first rationale; the model is then told to "step back" and reconsider: "But will that change how people see the brand, or how much money it makes?"

2. Analogical Prompting: This method helps the model reach a solution by comparing the problem to an existing case or analogy, which is very helpful for forecasting outcomes in business and legal matters. When a merger is being planned, the model might reason, "This is like the XYZ merger in 2020. What happened next? What made it work or fail?"

All of these methods make LLM responses clearer and sharpen CoT's ability to break hard tasks into logical parts. We compared how well they worked, how easy they were to understand, and how adaptable they were across business functions such as operations, finance, and compliance.

Why CoT Prompting Is Effective

Chain-of-Thought (CoT) prompting works well because of a few key properties. One is the emergence of reasoning skills at scale: CoT usually only works well with very large models, such as those with more than 100 billion parameters. For instance, PaLM-540B does much better than smaller versions like PaLM-62B on multi-step reasoning tasks because it parses semantics better and makes fewer logical mistakes (Chowdhery et al., 2022). Problem decomposition is another important property: it lets models break hard tasks into smaller steps, focusing computational effort and improving both accuracy and interpretability (Wei et al., 2022). CoT outputs also expose explicit reasoning paths that help users see how conclusions are reached, which aids understanding and debugging. Standard CoT chains, however, do not always obey the rules of logic, so the final answer may not strictly follow from the reasoning steps; Faithful CoT addresses this by using executable logic to check the reasoning (Zhou et al., 2023).

CoT prompting applies in many areas, including arithmetic (GSM8K), commonsense reasoning (StrategyQA), and symbolic reasoning (Last Letter Concatenation) (Wei et al., 2022; Kojima et al., 2022). It is also robust to changes in example demonstrations, annotator style, and wording; Auto-CoT stresses the importance of diverse examples to strengthen this robustness further (Zhou et al., 2023). CoT is especially helpful for instruction-tuned models like GPT-4 and Flan-PaLM because they can detect and repair flawed reasoning, remaining strong even when demonstrations contain mistakes (Chowdhery et al., 2022; OpenAI, 2023). Finally, symbolic CoT frameworks like SymbCoT achieve the best results in logical reasoning by making fewer syntax mistakes and producing answers that are more accurate and easier to understand (Zhang et al., 2023).

Practical Examples of CoT Prompting

Chain-of-Thought prompting supports a range of reasoning types across tasks. In arithmetic reasoning, a question like "The cafeteria had 23 apples. If they used 20 and bought 6 more, how many are left?" is solved by breaking the problem into parts: start with 23 apples, subtract the 20 used to get 3, then add the 6 purchased to get 9. This stepwise reasoning reduces calculation errors. In commonsense reasoning, given "Bring me something that isn't a fruit," the model infers that an energy bar fits and plans how to find, pick, and deliver it. CoT also helps symbolic reasoning tasks: the Last Letter Concatenation problem for "Waldo Schmidt" is solved by taking the last letters "o" and "t" from each name and joining them into "ot." In a coin-flip scenario, the model starts from heads up, notes that one flip (an odd number) changes the state to tails, and correctly answers "no." Frameworks like SymbCoT use first-order logic to settle questions such as whether someone is part of a six-way tie, combining multiple symbolic premises that standard CoT might oversimplify or get wrong.

Table 6: Examples of Chain-of-Thought Prompting in Different Reasoning Tasks

Reasoning Type | Example Prompt | Chain-of-Thought Reasoning Steps | Final Answer
Arithmetic Reasoning | "The cafeteria had 23 apples. If they used 20 and bought 6 more, how many are left?" | 23 - 20 = 3; 3 + 6 = 9 | 9
Commonsense Reasoning | "Bring me something that isn't a fruit." | Identifies object types → filters out fruits → chooses 'energy bar' as a suitable item | Energy bar
Symbolic Reasoning | "What is the result of Last Letter Concatenation of 'Waldo Schmidt'?" | Last letters: 'o' from Waldo, 't' from Schmidt → concatenates | "ot"
Coin Flip Logic | "A coin shows heads. You flip it once. Is it still heads?" | Initial = heads → 1 flip = odd = change → now = tails | No
Logical/Symbolic (SymbCoT) | "If 6 people are tied for first place, is Alice included if she won 3 matches?" | Converts to logic form → compares with tie conditions → evaluates Alice's eligibility logically | Depends on logic chain
Source: Adapted from Wei et al. (2022); Kojima et al. (2022); LoBue et al. (2023)

Advantages of CoT Prompting

Chain-of-thought prompting offers several benefits. First, it supports structured reasoning by breaking hard problems into smaller, more tractable ones, which mirrors human thinking and improves performance on difficult tasks (Wei et al., 2022). Second, models perform much better when demonstrations include natural language reasoning steps rather than bare equations, because such naturalistic chains align better with how large language models are trained (Kojima et al., 2022). Third, approaches like Auto-CoT, which select representative and diverse examples, curb the spread of faulty logic, while Faithful CoT makes reasoning processes deterministic, improving both accuracy and transparency (Zhou et al., 2023). Fourth, Chain-of-Thought prompting improves generalization, allowing models to solve problems harder than those they trained on, especially in symbolic reasoning tasks (Zhang et al., 2023). Finally, CoT draws on the prior knowledge of instruction-tuned models to correct mistakes and maintain sound reasoning even when demonstrations are imperfect (OpenAI, 2023).

Limitations

Chain-of-Thought prompting has notable weaknesses alongside its strengths. First, producing high-quality reasoning demonstrations for supervised fine-tuning remains labour-intensive, requiring careful planning and expertise. Second, large language models can fabricate reasoning chains that sound reasonable but are actually wrong or incoherent. Finally, the reasoning process is not always reliable unless symbolic reasoning frameworks are used to verify the logic.

9. Tree of Thought

The Tree of Thoughts (ToT) is a framework for getting large language models (LLMs) to reason more clearly and solve problems better. ToT differs from linear prompting methods like Chain of Thought (CoT), which give the model a single sequence of reasoning steps. Instead, ToT organizes intermediate steps into a tree structure, letting the model consider multiple lines of reasoning (Yao et al., 2023). Each node in the tree stands for a partial solution or thought, and the branches show how the reasoning could continue, making the process more flexible and human-like.

Figure 15: Illustrates the Tree of Thoughts (ToT) framework, showing how multiple reasoning paths are explored as a tree structure. Each node represents a partial solution, and branching enables flexible, human-like problem solving by evaluating various thought sequences before arriving at a final answer.

How the Technique Works

The ToT framework comprises several modules and steps. First, the problem is broken down into a series of intermediate reasoning steps, or "thoughts." For each current state (partial solution), the model generates several candidate thoughts using either sampling or constrained generation. A state evaluator then gives each partial solution a heuristic score, using either scalar value estimation or comparative voting.
Why the Technique is Effective

ToT addresses several major problems with traditional LLM prompting methods. First, it relaxes the linearity constraint by adding a branching structure that lets the model explore alternative ways to solve the problem. Second, the state evaluator and checker module make it possible to validate and correct intermediate results, which stops errors from spreading. Third, the ability to backtrack mirrors how people solve problems and makes the model more flexible when dealing with ambiguous or hard tasks (Yao et al., 2023). Unlike traditional methods, which often rely on quick, intuitive reasoning (like Kahneman's System 1), ToT adds a deliberate and systematic planning layer, like System 2 thinking. This two-step method lets LLMs consider a number of possible continuations, weigh the pros and cons of each, and change their plans on the fly (Kahneman, 2011; Yao et al., 2023).

Examples of Application

ToT has been shown to work in many different areas. In mathematical reasoning tasks like the Game of 24, ToT did much better than CoT, reaching 74% accuracy with GPT-4 compared with only 4% for CoT (Yao et al., 2023). In creative writing tasks, both human and automated evaluators judged ToT outputs to be more coherent and contextually appropriate. Likewise, ToT placed words and letters more accurately in Mini Crossword puzzles than baseline methods. ToT has also outperformed other methods at solving logic-based games like Sudoku and hard benchmark tasks in the Big-Bench Hard (BBH) suite. These results show that ToT handles long-range dependencies and abstract reasoning better than traditional prompting strategies (Long, 2023).

Why This Technique Helps Produce Better Responses

ToT works well because it is structured and modular. By making the model generate many candidate thoughts, scoring them, and allowing it to rethink or change its approach, ToT lowers the risk that an early mistake ruins the final answer. This iterative process supports deeper and broader exploration of the problem. Also, ToT can work with many different search strategies and evaluation methods, so it can be applied to a wide range of tasks, from symbolic reasoning to open-ended creative generation (Yao et al., 2023; Bubeck et al., 2023). CoT, few-shot, and zero-shot prompting modes cannot verify and correct themselves, which means their results degrade when initial assumptions are wrong. The checker and memory modules within ToT, on the other hand, counteract this problem by storing and testing partial solutions. This dynamic, recursive structure enables a kind of algorithmic thinking that closely resembles how people solve problems.
Ethical Implications and Practical Challenges in Advanced Prompting Techniques

Prompt engineering is an emerging method for enhancing the performance of large language models (LLMs); however, it also creates numerous ethical concerns. Advanced prompting strategies such as Chain-of-Thought (CoT), persona prompting, and zero-shot/few-shot learning can make models perform better, reason better, and align more closely with users' wishes. Nevertheless, they can also produce biased, misleading, or unclear results. To ensure that AI is used safely and responsibly, these issues must be addressed, particularly in sensitive sectors such as healthcare, education, legal services, and finance.

1. Stereotyping and Bias Amplification

Prompt engineering can unwittingly amplify biases already present in training data. Methods such as persona prompting, which ask models to act as an engineer, teacher, or therapist, can reproduce stereotypes around gender, race, or age (Furukawa, 2024; Kim et al., 2024). For example, when asked to role-play a nurse, models may inappropriately associate the role with female qualities and the role of engineer with male qualities. This stems from biases in the datasets used to train LLMs, and it raises deep moral concerns about equity, inclusivity, and representation in AI-generated products. Unless such biases are corrected, they may perpetuate systemic inequity and erode confidence in AI systems (Olea et al., 2024).

2. Hallucinations and Wrong Information

Although prompting improves LLMs, they still hallucinate: they may produce grammatically fluent text that is incorrect or outright fictional (Bubeck et al., 2023). This is especially risky in medicine, law, or finance, where people may make serious decisions based on wrong information. Prompt engineering can reduce the incidence of hallucinations by making tasks more precise and supplying proper context, but it cannot fully eliminate the issue. The ethical question is how to balance usability with reliability: what is the point of having a customer trust the results of a very fluent model that is poorly informed about what is true (Ceurstemont, 2025)?

3. Risks of Manipulation and False Information

The precision that prompt engineering makes possible raises the risk that models will be used to create misleading or manipulative content. For instance, prompts can be crafted to make false claims about products, spread political disinformation, or create emotionally persuasive stories that lead people astray (Saini & Sharma, 2024). AI-generated content can be produced at scale with little effort, which raises even sharper ethical concerns than traditional disinformation campaigns. This forces us to ask what developers and users should do to stop LLMs from being used in ways that harm society.

4. Persona Prompting and Misrepresentation

Persona prompting, in which LLMs act out roles like "doctor," "lawyer," or "financial advisor," makes tone and context more consistent, but it also raises the risk of misrepresentation (Olea et al., 2024). People who use a model might mistakenly believe that outputs produced under professional personas amount to expert advice.
This is especially dangerous when the AI gives medical, legal, or psychological advice without the relevant training, which can lead to poor choices and harmful outcomes. Because of this, using AI ethically means making clear that such outputs are AI-generated and should not substitute for professional advice (Patil & Puranik, 2024).

5. Responsibility and Accountability

One of the main ethical problems is determining who is responsible for AI outputs produced with engineered prompts. Because many LLMs are black-box systems accessible only through APIs, it can be hard to assign responsibility for harmful outputs (Patil & Puranik, 2024). Still, users, developers, and organizations that design prompts share responsibility for making sure AI-generated outputs are correct, fair, and safe. The problem worsens when AI is used in automated decision-making systems without human oversight, which raises the question of who is accountable if something goes wrong.

6. Lack of Clarity and Transparency

A persistent problem in prompt engineering is that LLM behaviour is difficult to observe. Users cannot see how trivial changes in prompt wording alter the model's reasoning, even though such changes can significantly change the resulting outputs (Ceurstemont, 2025). When AI outputs are hard to interpret, it becomes harder to audit and verify them or to hold anyone accountable, which undermines trust. The absence of industry guidelines for prompt design compounds this issue, making it difficult to apply ethical norms or compliance standards across AI use cases (Kim et al., 2024).

Altogether, prompt engineering makes LLMs far more helpful and practical, yet it also raises significant ethical and practical concerns that cannot be overlooked. Issues such as bias amplification, hallucination, misinformation, misrepresentation, and the opacity of model behaviour will require close attention from AI developers, researchers, and policymakers. Ethical standards, transparency rules, and accountability mechanisms for prompt engineering will be needed as the use of LLMs in sensitive and high-stakes domains grows. This will help promote responsible uses of AI.

Recommendations

Advanced prompting strategies change quickly, so using them responsibly and effectively requires careful planning, awareness of context, and incremental refinement. Based on the research reviewed and the new ideas emerging from this study, the following evidence-based suggestions are offered for researchers and practitioners:

1. Begin by aligning tasks with goals. To get the best results, users need to match task types with prompting strategies. Chain-of-Thought prompting is best for logical reasoning (Wei et al., 2022), while zero-shot or instruction-based prompting suits classification or generation tasks that need little setup (Zhao et al., 2023; Yin et al., 2023).

2. Use modular testing to refine prompts. Researchers should test prompts iteratively and modularly because model outputs are sensitive to prompt wording (Zhao et al., 2021). Using tools like the OpenAI Playground or LangChain for structured experiments (as explained in the methodology and use cases) makes outputs more reliable by comparing different versions under realistic conditions (Chiodi et al., 2023).
3. Use Retrieval-Augmented Generation (RAG) to keep facts correct. In fields like healthcare or law, where the truthfulness of the output is critical, Retrieval-Augmented Generation (RAG) makes facts more solid and cuts down on hallucinations (Lewis et al., 2020; Zhao et al., 2023). In settings where up-to-date information must be recalled on demand, this method has already been shown to outperform plain LLM prompting; a minimal sketch follows this list.

4. Use Parameter-Efficient Fine-Tuning (PEFT) for resource-light adaptation. LoRA (Hu et al., 2021) and Adapter Layers (Pfeiffer et al., 2020) are two techniques that allow domain-specific tuning without sacrificing efficiency. They can be combined with prompting techniques to customize models for niche or changing use cases without retraining the whole model (Chung et al., 2022).

5. Add ethical safeguards to prompt engineering. The dangers of persona manipulation, hallucination, and bias amplification should be considered (Bansal, 2024; Ceurstemont, 2025). Techniques such as bias-aware prompt design (Ganesan et al., 2024), prompt calibration (Chiodi et al., 2023), and persona validation frameworks (Kim et al., 2024) are very important in sensitive situations.

6. Keep records of prompts and make sure results can be reproduced. By versioning prompts and maintaining well-organized prompt repositories, teams can keep deployments transparent and replicable (Zhao et al., 2021; Ceurstemont, 2025). This is particularly useful in educational, governmental, and collaborative settings.

7. Help non-technical users build prompt literacy. Because people apply AI prompts across many disciplines, researchers should create prompt templates, guides, and tools that show different audiences how to use AI prompts well (Patil & Puranik, 2024; Ekin, 2024). This supports the use of LLMs in education, marketing, journalism, and customer care, where they have become widely available.
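As a concrete illustration of recommendation 3, the sketch below grounds a prompt in retrieved passages before the model answers. To stay self-contained it uses a toy keyword-overlap retriever over a hypothetical three-document policy corpus; a real deployment would use dense embeddings and a vector store (Lewis et al., 2020). The corpus contents and helper names are illustrative assumptions, not part of any cited system.

```python
# A minimal sketch of retrieval-augmented prompting. The corpus and
# helpers are hypothetical; the keyword-overlap retriever is a toy
# stand-in for dense-embedding retrieval in a real RAG pipeline.

CORPUS = [
    "Policy A-12: patient records must be retained for ten years.",
    "Policy B-07: prescriptions require a second pharmacist review.",
    "Policy C-03: telehealth visits must be documented within 24 hours.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by shared-word count with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(
        CORPUS,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Ground the model in retrieved passages to reduce hallucination."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

print(build_prompt("How long must patient records be retained?"))
```

The key design point is the instruction to answer only from the supplied context: it converts an open-ended generation task into a grounded one, which is what makes RAG suitable for high-stakes domains.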
Conclusion

This research examined how advanced prompting techniques influence the performance, reliability, and ethical application of Large Language Models (LLMs). By systematically reviewing peer-reviewed publications, technical reports, and industry research, the paper grouped prompt engineering approaches into four key categories. The first category covers techniques such as persona prompting and instructional cues aimed at clarity and role definition. These methods help clarify intent and align answers with context (Furukawa, 2024); they set the model's tone and make it appear knowledgeable about the subject, which is particularly useful for tasks such as writing essays, assisting customers, and generating materials. The second category involves example-based prompting, encompassing zero-shot, one-shot, and few-shot learning (Brown et al., 2020; Wei et al., 2022). These tools let LLMs learn in context from only a few examples, which makes them essential for deploying LLMs in constantly changing or low-resource environments without retraining. The third category comprises the stepwise reasoning techniques Chain-of-Thought (CoT), Prompt Chaining, and Tree-of-Thought (ToT), which contribute to logical coherence and systematic problem-solving (Wei et al., 2022; Zhou et al., 2022). Such strategies are especially useful for challenging problems that involve several stages of mathematical or logical thinking. The fourth category, knowledge-augmented methods such as Retrieval-Augmented Generation (RAG), strengthens factual consistency by allowing LLMs to obtain information in real time from external data sources (Lewis et al., 2020). Together, these strategies show that prompt engineering is not simply a question of grammatical formulation; it is an exercise in designing systematic, strategic interactions that align LLM capabilities with task demands. The study also emphasized that careful prompt engineering is needed to maximize the benefits of LLMs while minimizing the risks of hallucination, confusion, and bias (Bubeck et al., 2023; Kim et al., 2023). As LLM technologies improve, however, knowing how to use advanced prompting techniques will remain essential for researchers, developers, and practitioners who want to use generative AI responsibly and effectively.

References

Bansal, A. (2024). Prompt engineering for generative AI: A practical guide to unlocking AI's full potential. AI Publications.
Bre, F., Gimenez, J., & Fachinotti, V. (2017). Prediction of wind pressure coefficients on building surfaces using artificial neural networks. Energy and Buildings, 158, 1429–1441. https://doi.org/10.1016/j.enbuild.2017.11.045
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://doi.org/10.48550/arXiv.2005.14165
Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., ... & Zhang, Y. (2023). Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv. https://doi.org/10.48550/arXiv.2303.12712
Ceurstemont, S. (2025). How to talk to AI: The rise of prompt engineering. Nature, 621(7981), 12–14.
Chase, R., Chan, M. C., Zhou, K., Singh, J., & Liang, P. (2023). PromptChainer: Chaining large language model prompts through visual programming. arXiv. https://doi.org/10.48550/arXiv.2305.14218
Chen, P., Li, J., Zhang, Y., & Yu, A. W. (2023). BLIP-2: Bootstrapping language-image pretraining with frozen image encoders and large language models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://arxiv.org/abs/2301.12597
Chiodi, M., Gray, M., Chen, M., Wolf, T., & Pyo, S. (2023). Mastering prompt engineering: A guide to effective AI interaction. ResearchGate. https://www.researchgate.net/publication/383920503
Chowdhery, A., Narang, S., Devlin, J., et al. (2022). PaLM: Scaling language modeling with pathways. arXiv. https://doi.org/10.48550/arXiv.2204.02311
Chung, H. W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., ... & Le, Q. (2022). Scaling instruction-finetuned language models. arXiv. https://arxiv.org/pdf/2210.11416.pdf
Eight things to know about large language models. (n.d.). Critical AI, Duke University Press. https://dukeupress.edu/critical-ai
Ekin, H. (2024). Communicating with AI: The role of prompt engineering in human-AI interaction. AI & Society, 39(1), 77–89.
Furukawa, H. (2024). How does the persona given to large language models affect the idea evaluations? IIAI Letters on Informatics and Interdisciplinary Research, 6, 34–42. https://doi.org/10.52731/iiir.v006.342
Ganesan, D., Grubb, A., & Ravishankar, K. (2024). Advanced prompting techniques and prompt engineering for enterprises: A comprehensive guide. ResearchGate. https://www.researchgate.net/publication/383453095
Goli, A., & Singh, A. (2024). Frontiers: Can large language models capture human preferences? Marketing Science, 43(4), 709–722. https://doi.org/10.1287/mksc.2023.0306
Goli, A., & Singh, S. (2024). Machine learning: Fundamentals and applications. Springer.
Gu, Y., Tinn, R., Cheng, H., Lucas, M., Usuyama, N., Liu, X., ... & Poon, H. (2021). Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare, 3(1), 1–23. https://arxiv.org/pdf/2007.15779.pdf
Hadi, M. U., Qureshi, R., Shah, A., Irfan, M., Zafar, A., Shaikh, M. B., ... & Mirjalili, S. (2023). Large language models: A comprehensive survey of its applications, challenges, limitations, and future prospects. Authorea Preprints, 1–26. https://doi.org/10.48550/arXiv.2305.13172
Haque, A. U., Rust, P., Röder, M., & Hauff, C. (2023). Navigating prompt complexity for zero-shot classification: A study of large language models in computational social science. ResearchGate. https://www.researchgate.net/publication/370981439
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, L., ... & Raj, A. (2021). LoRA: Low-rank adaptation of large language models. arXiv. https://arxiv.org/pdf/2106.09685.pdf
Huang, L. (2024, April 22). Large Language Model — History. Medium. https://medium.com/@linghuang_76674/llm-history-5db2c9e236f5
Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux.
Kim, J., Yang, N., & Jung, K. (2024). Persona is a double-edged sword: Mitigating the negative impact of role-playing prompts in zero-shot reasoning tasks. Proceedings of the International Conference on Machine Learning (ICML), 34–42.
Kim, S., Le, D., Lim, Y., & Kim, G. (2023). Better zero-shot reasoning with self-adaptive prompting. arXiv. https://arxiv.org/pdf/2305.14106.pdf
Kingma, D. P., & Welling, M. (2016). Auto-encoding variational Bayes. arXiv. https://arxiv.org/abs/1312.6114
Klie, T., Zhang, C., & Gurevych, I. (2023). Crafting effective prompts: Enhancing AI performance through structured input design. ResearchGate. https://www.researchgate.net/publication/385591891
Kojima, T., Gu, S. S., Reid, M., et al. (2022). Large language models are zero-shot reasoners. arXiv. https://doi.org/10.48550/arXiv.2205.11916
Larson, A. M. (2025). Large language model (LLM). In Salem Press Encyclopedia of Science. Salem Press.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Riedel, S. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. arXiv. https://arxiv.org/pdf/2005.11401.pdf
Li, J., Hu, R., Zeng, Y., Hu, X., Zhang, L., Liu, P., ... & Shi, J. (2023). BLIP-2: Bootstrapped language-image pretraining with frozen image encoders and large language models. arXiv. https://arxiv.org/pdf/2301.12597.pdf
Li, J., Shen, Y., Chen, Z., Ke, L., & Yu, A. W. (2023). Evaluating the use of chain-of-thought in VQA with BLIP-2. arXiv preprint. https://arxiv.org/abs/2310.09297
Liu, W., Gupta, S., Shi, Y., Kang, Y., Kumar, A., & Song, L. (2024). Generative AI: A survey of its development trends and future outlook. ResearchGate. https://www.researchgate.net/publication/380032572
Long, J. (2023). Improving the reasoning ability of large language models via a tree of thoughts. Microsoft Research Blog. https://www.microsoft.com/en-us/research/blog/improving-the-reasoning-ability-of-large-language-models-via-a-tree-of-thoughts/
Mahesh, B. (2020). Machine learning algorithms: A review. International Journal of Science and Research (IJSR), 9(1), 381–386. https://doi.org/10.21275/ART20203995
National Geographic Society. (n.d.). Alan Turing [Photograph]. National Geographic. https://www.nationalgeographic.com/science/article/alan-turing-test-artificial-intelligence-life-history
National Institute of Standards and Technology. (n.d.). Alan Turing and the beginning of AI. U.S. Department of Commerce. https://www.nist.gov/news-events/news/2021/06/alan-turing-and-beginning-ai
Naveed, H., Khan, A. U., Qiu, S., Saqib, M., Anwar, S., Usman, M., Akhtar, N., Barnes, N., & Mian, A. (2024). A comprehensive overview of large language models. arXiv preprint. https://doi.org/10.48550/arXiv.2307.06435
New Scientist. (n.d.). What is the Turing test? https://www.newscientist.com/definition/turing-test/
Olea, C., Tucker, H., Phelan, J., Pattison, C., Zhang, S., Lieb, M., Schmidt, D., & White, J. (2024). Evaluating persona prompting for question answering tasks. Proceedings of the AAAI Conference on Artificial Intelligence, 34–42.
OpenAI. (2023). GPT-4 technical report. arXiv. https://doi.org/10.48550/arXiv.2303.08774
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., ... & Christiano, P. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744. https://doi.org/10.48550/arXiv.2203.02155
Patil, A., & Puranik, S. (2024). Prompt engineering in the age of generative AI: Enhancing productivity and trust in LLMs. Journal of Artificial Intelligence Research and Applications, 58(2), 105–118.
Patil, R., & Gudivada, V. (2024). A review of current trends, techniques, and challenges in large language models (LLMs). Applied Sciences, 14(5), 2074. https://doi.org/10.3390/app14052074
Pfeiffer, J., Kamath, A., Rücklé, A., Cho, K., & Gurevych, I. (2020). AdapterFusion: Non-destructive task composition for transfer learning. arXiv preprint. https://arxiv.org/pdf/2005.00247.pdf
Press, O., Bar, A., Smith, N. A., & Levy, O. (2022). Measuring and narrowing the compositionality gap in language models. arXiv. https://arxiv.org/abs/2210.03350
PromptHub. (2024, April 30). Chain-of-thought prompting guide: Techniques for improving reasoning in LLMs. https://www.prompthub.us/blog/chain-of-thought-prompting-guide
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI.
Reynolds, L., & McDonell, K. (2021). Prompt programming for large language models: Beyond the few-shot paradigm. arXiv. https://doi.org/10.48550/arXiv.2102.07350
Saghiri, A. M. (2024). Why GPT-based chatbots will be vital applications, challenges, and the shaping of the fragile job market [Flowchart]. ResearchGate. https://www.researchgate.net/publication/378477275
Sahoo, S., Kumar, A., & Singh, P. (2024). A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv. https://arxiv.org/abs/2402.07927
Saini, M., & Sharma, R. (2024). Applications and future of generative AI in various domains. International Journal of Trend in Scientific Research and Development (IJTSRD), 8(2). https://www.ijtsrd.com/papers/ijtsrd72647.pdf
Shanahan, M. (2024). Talking about large language models. Communications of the ACM, 67(2), 68–79. https://doi.org/10.1145/3642084
Smith, A. (2017). Artificial intelligence digital neural network [Photograph]. Unsplash. https://images.unsplash.com/photo-1504384308090-c894fdcc538d
Sparks, R., Koharchik, L., & Meyer, H. (2023). The CLEAR path: A framework for enhancing information literacy through prompt engineering. The Journal of Academic Librarianship, 49(6), 102689. https://doi.org/10.1016/j.acalib.2023.102689
Tiwari, S., & Patel, A. (2024). A review on generative AI: Challenges, opportunities, and applications. arXiv. https://arxiv.org/pdf/2403.04190
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. arXiv. https://arxiv.org/pdf/1706.03762.pdf
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., & Zhou, D. (2022). Self-consistency improves chain of thought reasoning in language models. arXiv. https://arxiv.org/abs/2203.11171
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., ... & Le, Q. V. (2022). Chain of thought prompting elicits reasoning in large language models. arXiv. https://doi.org/10.48550/arXiv.2201.11903
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., ... & Le, Q. V. (2023). Chain-of-thought prompting elicits reasoning in large language models. Nature, 615(7950), 660–665. https://doi.org/10.1038/s41586-023-05886-4
What are large language models. (n.d.). MachineLearningMastery.com. https://machinelearningmastery.com/what-are-large-language-models/
White, T., Zhang, S., Lin, Z., & Dai, Z. (2023). Prompting GPT-3 to be reliable. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). https://doi.org/10.48550/arXiv.2301.11270
Wu, W., Zhou, K., Park, S., & Liang, P. (2022). AI Chains: Transparent and controllable human-AI interaction by chaining large language model prompts. arXiv. https://doi.org/10.48550/arXiv.2209.11302
Yao, S., Ye, D., Liang, P., & Etzioni, O. (2023). Tree of thoughts: Deliberate problem solving with large language models. arXiv. https://arxiv.org/abs/2305.10601
Yao, S., Zhao, J., Yu, D., Anil, R., Yu, Y., Park, J. S., & Cao, Y. (2022). ReAct: Synergizing reasoning and acting in language models. arXiv. https://arxiv.org/abs/2210.03629
Yin, W., Choi, E., & Neubig, G. (2023). A practical survey on zero-shot prompt design for in-context learning. Proceedings of the International Conference RANLP 2023. https://aclanthology.org/2023.ranlp-1.69.pdf
Zhang, R., Liu, L., Hu, Y., & He, X. (2022). Efficient fine-tuning of pretrained language models via low-rank adaptation. Findings of the Association for Computational Linguistics: EMNLP 2022. https://arxiv.org/pdf/2207.01093.pdf
Zhang, Y., Sun, S., Galley, M., Chen, Y. C., Brockett, C., Gao, X., & Dolan, B. (2022). Few-shot learning with retrieval augmented language models. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 1126–1140. https://doi.org/10.18653/v1/2022.acl-long.80
Zhao, C., Haris, N., & Balog, M. (2023). Investigating prompting techniques for zero- and few-shot visual question answering. ResearchGate. https://www.researchgate.net/publication/371684481
Zhao, W., Lin, Z., Song, X., Ren, X., Tan, M., & Qi, P. (2023). InstructGPT meets zero-shot learning: An empirical study. arXiv. https://doi.org/10.48550/arXiv.2305.03079
Zhao, W., Wallace, E., Feng, S., Klein, D., & Singh, S. (2021). Calibrate before use: Improving few-shot performance of language models. International Conference on Machine Learning, 12697–12706. https://doi.org/10.48550/arXiv.2102.09690
Zhao, Z., Yin, W., Li, S., Li, Y., Li, X., & Ma, J. (2023). R2AG: Retrieval-aware augmented generation for knowledge-intensive NLP. arXiv preprint. https://arxiv.org/pdf/2303.12712.pdf
Zhou, H., Liu, J., & Guo, C. (2023). Zero-shot prompting strategies for table question answering with a low-resource language. Emerging Science Journal, 7(5), 1222–1238. https://www.ijournalse.org/index.php/ESJ/article/view/2540/pdf
Zhou, K., Yang, J., Loy, C. C., & Liu, Z. (2022). Learning to prompt for vision-language models. arXiv. https://arxiv.org/abs/2109.01134
Zhou, S., Yang, H., Wang, Y., et al. (2023). Least-to-most prompting enables complex reasoning in large language models. arXiv. https://doi.org/10.48550/arXiv.2205.10625