Task Workflow Explained
School: University of California, Los Angeles
Course: 3663
Subject: Information Systems
Date: Nov 24, 2024
Pages: 6
Uploaded by shannonchen168
This section walks through a complete task so you can see how it works and what it looks like.
We will go through this example task step by step:
1. Examine the User Prompt
2. Examine Response A
3. Examine the Tool Execution Summary for Response A
4. Assess Fulfillment for Response A:
To assess Fulfillment, ask yourself the following question: Does this Response address the intent of
the user's Prompt such that a user would not feel the Prompt was ignored or misinterpreted by the
Response?
Fulfillment: The extent to which the Response demonstrates that it correctly addresses the intent of
the user's Prompt, such that a user would not feel the Prompt was ignored or misinterpreted by the
Response.
1. Rate the Fulfillment.
   1. Use “Not at all” if the Response does not address the most important aspects of the Prompt.
      The user would feel like their request was not at all understood.
   2. Use “Partially” if the Response does not address some minor aspects and/or ignores some
      requirements of the Prompt. The user would feel like their request was partially understood.
   3. Use “Completely” if the Response addresses all aspects and adheres to all requirements of
      the Prompt. The user would feel like their request was completely understood.
2. If a tool should have been invoked but failed (shown as another input), rate Fulfillment as
   “Not at all”. For example, a user asks for YouTube videos, but youtube_tool fails.
   1. See the Tool Execution Summary below for additional instructions.
3. Important: If the request is outside the tools’ capabilities or the tool outputs are empty, the
   model should recognize this and reply acknowledging the limitation. That type of Response is
   also known as a Punt:
   1. If the model recognizes that it cannot fulfill the user's request and correctly replies to the
      user, mark it as “completely accurate”.
   2. If the model responds with a negative but does not say what it missed, mark it as
      “reasonably accurate”.
   3. If the model makes up fake information in a response, mark it as “not accurate”.
      o For example, the flights tool returns no flights after a search, but Bard responds
        with flights that may not exist.
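The Fulfillment and Punt rules above can be sketched as a small decision procedure. This is only an illustration: the function names and boolean inputs (`tool_required_but_failed`, `misses_key_aspects`, `misses_minor_aspects`, `fabricates_information`, `explains_limitation`) are hypothetical and stand in for judgments the rater makes by reading the Response and the Tool Execution Summary.

```python
from enum import Enum


class Fulfillment(Enum):
    NOT_AT_ALL = "Not at all"
    PARTIALLY = "Partially"
    COMPLETELY = "Completely"


def rate_fulfillment(tool_required_but_failed: bool,
                     misses_key_aspects: bool,
                     misses_minor_aspects: bool) -> Fulfillment:
    """Map the rater's observations to a Fulfillment label (hypothetical helper)."""
    # A required tool that failed forces "Not at all", whatever the text says.
    if tool_required_but_failed:
        return Fulfillment.NOT_AT_ALL
    # The most important aspects of the Prompt were not addressed.
    if misses_key_aspects:
        return Fulfillment.NOT_AT_ALL
    # Only minor aspects or some requirements were skipped.
    if misses_minor_aspects:
        return Fulfillment.PARTIALLY
    return Fulfillment.COMPLETELY


def rate_punt(fabricates_information: bool, explains_limitation: bool) -> str:
    """Label a Punt, using the three labels listed in the guideline."""
    # Made-up content (for example, flights the tool never returned).
    if fabricates_information:
        return "not accurate"
    # The model recognizes the limitation and tells the user about it.
    if explains_limitation:
        return "completely accurate"
    # A bare negative reply that does not say what was missed.
    return "reasonably accurate"
```

For instance, a failed youtube_tool call corresponds to `rate_fulfillment(True, False, False)`, which yields `Fulfillment.NOT_AT_ALL`.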
Example below: After reading the Response, we know that it completely understands the request: a
list of places to visit in LA over 4 days.
5. Assess Factuality for Response A:
Your factuality rating measures how accurate the information in the response is, based on
common sense, the Tool Output, and external research. Factuality should be rated based on
the response itself, regardless of the prompt.
1. Open all links in the Response to make sure the information at each link
   matches what is being presented to the user.
2. Check the URLs (links) present in the Tool Output.
   1. Links like http://googleusercontent.com/… are placeholders. Consider them valid
      links.
3. For information that is not backed up by a link, use a quick Google search to check its
   accuracy.
4. Rate the Factuality.
   1. Use "Completely accurate" if all information is correct, or if there is just a minor issue
      that does not affect the fidelity of the Response.
   2. Use "Reasonably accurate" if the most important factual information in the Response
      is accurate or would widely be viewed as accurate. However, the Response may
      include minor inaccuracies in less important pieces of factual information, or contain
      factual information presented in a way that could potentially be misleading (for example,
      one minor piece of information is missing).
      ▪ For example, if 5 bullet points are given, 3-4 of them are accurate, and 1-2 are
        irrelevant, consider rating "Reasonably accurate".
      ▪ When rating "Reasonably accurate", always provide an explanation of which
        part of the Response is inaccurate.
   3. Use "Not accurate" when at least one piece of important factual information is
      verifiably incorrect (e.g., a flight does not exist).
   4. Use "Can't confidently assess" when the Response is unclear or it is difficult to
      sufficiently determine the accuracy of at least one piece of important factual
      information.
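As a rough sketch, the Factuality scale above can be written as a function of a few rater judgments. The parameter names are hypothetical, and the order in which "Not accurate" is checked before "Can't confidently assess" is an assumption, since the guideline does not say which takes precedence when both apply.

```python
def rate_factuality(important_fact_wrong: bool,
                    important_fact_unclear: bool,
                    minor_issue_affects_fidelity: bool,
                    explanation: str = "") -> str:
    """Return a Factuality label from the rater's findings (hypothetical helper)."""
    # At least one important fact is verifiably incorrect.
    if important_fact_wrong:
        return "Not accurate"
    # Assumption: unclear important facts are considered only after
    # verifiably wrong ones have been ruled out.
    if important_fact_unclear:
        return "Can't confidently assess"
    # Important facts hold, but minor inaccuracies or misleading
    # presentation affect the fidelity of the Response.
    if minor_issue_affects_fidelity:
        # The guideline requires an explanation for this rating.
        if not explanation.strip():
            raise ValueError("Explain which part of the Response is inaccurate.")
        return "Reasonably accurate"
    # Everything is correct, or any issue is too minor to matter.
    return "Completely accurate"
```

A call such as `rate_factuality(False, False, True, "Bullet 4 is irrelevant")` gives "Reasonably accurate", while the same call without an explanation raises an error, mirroring the rule that this rating must always be justified.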
Important notes:
• Rely only on the TOOL OUTPUT to determine the factuality of flight prices, hotel prices, and
  travel times. This means you SHOULD NOT check current prices directly on the web,
  because they will have changed since the model response was generated.
• Use the date and time in the tool output, not real-time information, when assessing the
  accuracy of the Response.
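The verification rules in this step can be summarized as a lookup of where to check each claim. The sketch below is illustrative only: the claim kinds (`flight_price`, `hotel_price`, `travel_time`), the return labels, and both function names are hypothetical, and the placeholder check assumes that placeholder links use exactly the host googleusercontent.com.

```python
from urllib.parse import urlparse

# Claim kinds that must be checked against the Tool Output only,
# because live prices and times change after the Response was generated.
TOOL_ONLY_KINDS = {"flight_price", "hotel_price", "travel_time"}


def verification_source(kind: str, has_link: bool) -> str:
    """Pick where a rater should verify a claim (hypothetical labels)."""
    if kind in TOOL_ONLY_KINDS:
        return "tool_output"
    # Claims backed by a link: open the link and compare its content.
    if has_link:
        return "open_link"
    # Everything else: a quick web search.
    return "web_search"


def is_placeholder_link(url: str) -> bool:
    """Treat googleusercontent.com links as valid placeholders (assumed host match)."""
    return urlparse(url).hostname == "googleusercontent.com"
```

Under these assumptions, a hotel price is always checked against the Tool Output, even when the Response also links to a booking page.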
Example below: Response A provides a list of places to visit in LA, so this section lets us
check whether that information is real. Open each link and see whether the information at that URL
matches the information in the Response.
Example below: Now that you have examined the Response and the Tool Output, you can select the level
of Factuality. For this example, we already know that the information is completely accurate:
6. Repeat steps 1 through 5 for Response B
7. Select the best response:
Now you have all the information that you need to select which response is better.
8. Justify your selection:
• Finally, you will have to justify why you selected an answer as best, or why they are tied.
• In this particular scenario Response A is better: