
Generative AI with LLMs: Fine-Tuning LLMs with Human Feedback: What Are the Action and the Action Space?

Learn what the action and the action space are when fine-tuning large language models (LLMs) with human feedback, a technique that trains a reward model from human feedback and uses it to optimize the LLM’s policy.


Question

Fill in the blanks: When fine-tuning a large language model with human feedback, the action that the agent (in this case the LLM) carries out is ________ and the action space is the _________.

A. Generating the next token, the context window.
B. Calculating the probability distribution, the LLM model weights.
C. Processing the prompt, the context window.
D. Generating the next token, the vocabulary of all tokens.

Answer

D. Generating the next token, the vocabulary of all tokens.

Explanation

The correct answer is D. When fine-tuning a large language model (LLM) with human feedback, the action that the agent (in this case, the LLM) carries out is generating the next token, and the action space is the vocabulary of all tokens.

Fine-tuning an LLM with human feedback, commonly called reinforcement learning from human feedback (RLHF), trains a reward model directly from human preference data and then uses that model as the reward function for optimizing the LLM’s policy with reinforcement learning (RL). The LLM is treated as an agent that interacts with an environment (the context window holding the prompt and the text generated so far) and learns from its own actions (the generated tokens) via a reward signal derived from human feedback. The agent’s goal is to maximize the expected cumulative reward over a sequence of actions (the generated text).
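To make this framing concrete, here is a minimal sketch of one RL episode in plain Python. The vocabulary, policy, and reward model below are toy stand-ins invented for illustration (a real system would use a trained LLM and a learned preference model), not any real library’s API:

```python
import random

# Toy stand-ins for illustration only; none of this is a real model or API.
VOCAB = ["<eos>", "the", "cat", "sat", "on", "mat"]  # hypothetical vocabulary

def policy(state):
    """Toy policy: a probability distribution over VOCAB given the current
    state (the prompt plus the tokens generated so far)."""
    n = len(VOCAB)
    return [1.0 / n] * n  # uniform here; a real LLM conditions on the state

def reward_model(tokens):
    """Toy reward model: in RLHF this network is trained on human preference
    data; here it simply prefers shorter completions."""
    return -float(len(tokens))

def rollout(prompt, max_steps=8):
    """One RL episode: the agent (the LLM) repeatedly takes an action
    (generating the next token) until <eos> or the step limit."""
    state, actions = list(prompt), []
    for _ in range(max_steps):
        probs = policy(state)                            # state -> distribution
        token = random.choices(VOCAB, weights=probs)[0]  # action: next token
        if token == "<eos>":
            break
        actions.append(token)
        state.append(token)
    return actions, reward_model(actions)  # the sequence and its reward

actions, reward = rollout(["the"])
print("generated:", actions, "reward:", reward)
```

A policy-gradient method such as PPO would then adjust the policy so that token sequences earning higher reward become more probable.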

The action that the agent carries out is generating the next token, the basic unit of text generation. The agent chooses the next token based on the current state (the input prompt plus the tokens generated so far) and its policy (a probability distribution over the vocabulary). The action space is the vocabulary of all tokens: the set of possible choices for the next token. The agent can select any token from the vocabulary, but some tokens are more likely or more desirable than others depending on the context and the task. The agent learns to generate better tokens by receiving feedback from the reward model, which is trained to predict human preferences over different outputs.
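As a sketch of a single action: the policy’s output is a distribution over the whole vocabulary (the action space), and generating the next token means sampling from it. The logits below are made up for illustration; in a real LLM they come from the model’s final layer for the current context:

```python
import math
import random

# Made-up logits for illustration; a real LLM produces one logit per
# vocabulary token from its final layer, conditioned on the context.
VOCAB = ["the", "cat", "sat", "mat", "<eos>"]  # the action space
logits = [2.0, 0.5, 1.0, -1.0, 0.1]            # one logit per token

# Softmax turns the logits into the policy: a probability distribution
# over the entire vocabulary.
exps = [math.exp(z) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

# The action: choose the next token by sampling from that distribution.
next_token = random.choices(VOCAB, weights=probs)[0]
print({t: round(p, 3) for t, p in zip(VOCAB, probs)}, "->", next_token)
```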
