Task Workflow Explained

School: University of California, Los Angeles

Course: 3663
Subject: Information Systems
Date: Nov 24, 2024

Uploaded by shannonchen168

In this section you will find an explanation of a complete task, so you can understand how it works and what it looks like. We will go through this example task step by step:

1. Examine the User Prompt.
2. Examine Response A.
3. Examine the Tool Execution Summary for Response A.

4. Assess Fulfillment for Response A. To assess Fulfillment, ask yourself: does this response address the intent of the user's Prompt, such that the user would not feel the Prompt was ignored or misinterpreted?

Fulfillment: the extent to which the Response demonstrates that it correctly addresses the intent of the user's Prompt, such that the user would not feel the Prompt was ignored or misinterpreted by the Response.

   1. Rate the Fulfillment.
      1. Use “Not at all” if the Response does not address the most important aspects of the Prompt. The user would feel their request was not understood at all.
      2. Use “Partially” if the Response misses some minor aspects and/or ignores some requirements of the Prompt. The user would feel their query was only partially understood.
      3. Use “Completely” if the Response addresses all aspects and adheres to all requirements of the Prompt. The user would feel their request was completely understood.

   2. When a tool should have been invoked but failed (shown as another input), rate Fulfillment “Not at all”. For example, the user asks for YouTube videos, but youtube_tool fails.
      1. See the tool execution summary below for additional instructions.
   3. Important: If the request is outside the tools’ capabilities, or the tool outputs are empty, the model should recognize this and reply acknowledging the limitation. This type of Response is known as a Punt:
      1. If the model recognizes that it cannot fulfill the user's request and correctly tells the user so, mark it as “completely accurate”.
      2. If the model replies with a negative but does not say what it could not provide, mark it as “reasonably accurate”.
      3. If the model makes up fake information in its response, mark it as “not accurate”.
         o For example, the flights tool returns no flights after a search, but Bard responds with flights that may not exist.

Example: After reading the response, we know that it completely understands the request: a list of places to visit in LA over 4 days.
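The Fulfillment rules in step 4, including the tool-failure override and the Punt labels, can be read as a small decision procedure. The Python sketch below encodes them only as an illustration: the function names and the idea of reducing a rater's judgments to boolean flags are hypothetical assumptions, not part of the task tooling, and the order in which the Punt checks are applied is an assumption as well.

```python
from enum import Enum


class Fulfillment(Enum):
    NOT_AT_ALL = "Not at all"
    PARTIALLY = "Partially"
    COMPLETELY = "Completely"


class Factuality(Enum):
    COMPLETELY_ACCURATE = "Completely accurate"
    REASONABLY_ACCURATE = "Reasonably accurate"
    NOT_ACCURATE = "Not accurate"
    CANT_ASSESS = "Can't confidently assess"


def rate_fulfillment(tool_needed_but_failed: bool,
                     addresses_main_aspects: bool,
                     meets_all_requirements: bool) -> Fulfillment:
    """Hypothetical encoding of the Fulfillment rules (step 4)."""
    # Rule 2: a required tool call that failed always yields "Not at all".
    if tool_needed_but_failed:
        return Fulfillment.NOT_AT_ALL
    # The most important aspects of the Prompt were not addressed.
    if not addresses_main_aspects:
        return Fulfillment.NOT_AT_ALL
    # Main intent addressed, but minor aspects or some requirements were missed.
    if not meets_all_requirements:
        return Fulfillment.PARTIALLY
    return Fulfillment.COMPLETELY


def rate_punt(invents_information: bool, explains_limitation: bool) -> Factuality:
    """Hypothetical encoding of rule 3 (Punt): the request is outside the tools'
    capabilities or the tool outputs are empty."""
    # Made-up content (e.g. flights the tool never returned) is checked first (assumed order).
    if invents_information:
        return Factuality.NOT_ACCURATE
    # The model clearly tells the user it cannot fulfill the request.
    if explains_limitation:
        return Factuality.COMPLETELY_ACCURATE
    # A bare negative that does not say what could not be provided.
    return Factuality.REASONABLY_ACCURATE


if __name__ == "__main__":
    # The LA itinerary example: no tool failure and every requirement met.
    print(rate_fulfillment(False, True, True).value)  # Completely
```

In this sketch the tool-failure check comes first, so a failed tool call overrides everything else, which matches rule 2 as written.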

5. Assess Factuality for Response A. The Factuality rating measures how accurate the information in the response is, based on common sense, the Tool Output, and external research. Rate Factuality on the response itself, regardless of the prompt.
   1. Open all links in the response to make sure the information at each link matches what is presented to the user.
   2. Check the URLs (links) present in the Tool Output.
      1. Links like http://googleusercontent.com/… are placeholders. Consider them valid links.
   3. For information that is not backed by a link, do a quick Google search to check its accuracy.
   4. Rate the Factuality.
      1. Use "Completely accurate" if all information is correct, or if there is just a minor issue that does not affect the fidelity of the response.
      2. Use "Reasonably accurate" if the most important factual information in the Response is accurate or would widely be viewed as accurate, but the Response includes minor inaccuracies in less important factual information (for example, one minor piece of information is missing) or presents factual information in a way that could be misleading.
         ▪ For example, if 5 bullet points are given and 3-4 of them are accurate while 1-2 are irrelevant, consider rating "Reasonably accurate".
         ▪ When rating "Reasonably accurate", always explain which part of the response is inaccurate.
      3. Use “Not accurate” when at least one piece of important factual information is verifiably incorrect (e.g., a flight does not exist).
      4. Use “Can't confidently assess” when the Response is unclear, or when it is difficult to determine the accuracy of at least one piece of important factual information.

Important notes:
• Rely only on the TOOL OUTPUT to judge the factuality of flight prices, hotel prices, and travel times. Do NOT check current prices on the web, because they will have changed since the model response was generated.
• Use the date and time in the tool output, not real-time information, when assessing the accuracy of the response.
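The Factuality checks in step 5 can likewise be sketched as code. The Python below is an illustration only, under stated assumptions: the `Claim` record, the field names in `TOOL_ONLY_FIELDS`, and all function names are hypothetical; "Not accurate" is assumed to take precedence over "Can't confidently assess"; and the rubric's allowance for a single harmless minor issue under "Completely accurate" is left to rater judgment rather than modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Factuality(Enum):
    COMPLETELY_ACCURATE = "Completely accurate"
    REASONABLY_ACCURATE = "Reasonably accurate"
    NOT_ACCURATE = "Not accurate"
    CANT_ASSESS = "Can't confidently assess"


# Placeholder links count as valid without being opened (step 5, item 2).
PLACEHOLDER_PREFIX = "http://googleusercontent.com/"

# Hypothetical names for facts that must be checked against the tool output only.
TOOL_ONLY_FIELDS = {"flight_price", "hotel_price", "travel_time"}


@dataclass
class Claim:
    text: str
    important: bool           # Is this key factual information in the response?
    verified: Optional[bool]  # True = correct, False = incorrect, None = could not determine


def link_is_valid(url: str, content_matches: bool) -> bool:
    """A link passes if it is a placeholder or the linked page matches the response."""
    if url.startswith(PLACEHOLDER_PREFIX):
        return True
    return content_matches


def verification_source(field: str) -> str:
    """Where a rater should look to verify a given kind of fact."""
    if field in TOOL_ONLY_FIELDS:
        return "tool output"  # live prices and travel times will have changed
    return "links, common sense, or a quick web search"


def rate_factuality(claims: List[Claim]) -> Factuality:
    important = [c for c in claims if c.important]
    minor = [c for c in claims if not c.important]
    # A single verifiably wrong piece of important information is enough.
    if any(c.verified is False for c in important):
        return Factuality.NOT_ACCURATE
    # Important information whose accuracy could not be determined.
    if any(c.verified is None for c in important):
        return Factuality.CANT_ASSESS
    # Important facts hold, but some less important fact is wrong.
    if any(c.verified is False for c in minor):
        return Factuality.REASONABLY_ACCURATE
    return Factuality.COMPLETELY_ACCURATE


if __name__ == "__main__":
    # Hypothetical itinerary: four correct key stops, one wrong minor detail.
    claims = [Claim(f"stop {i}", important=True, verified=True) for i in range(4)]
    claims.append(Claim("minor detail", important=False, verified=False))
    print(rate_factuality(claims).value)  # Reasonably accurate
```

Splitting claims into important and minor mirrors the rubric's distinction between "the most important factual information" and "less important pieces of factual information".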

Example: Response A provides a list of places to visit in LA, so this step checks whether that information is real. Open each link and see whether the information at that URL matches the information in the Response.

Example: Now that you have examined the response and the tool output, you can select the level of Factuality. In this example, we already know that the information is completely accurate.

6. Repeat steps 1 through 5 for Response B.

7. Select the best response. You now have all the information you need to select which response is better.

8. Justify your selection.
   • Finally, justify why you selected one response as better, or why the two are tied.
   • In this particular scenario, Response A is better.
