Towards Collaborative Question Answering: A Preliminary Study

Xiangkun Hu^1,*, Hang Yan^2,*, Qipeng Guo^1, Xipeng Qiu^2, Weinan Zhang^3, Zheng Zhang^1

^1 AWS Shanghai AI Lab
^2 School of Computer Science, Fudan University
^3 Shanghai Jiao Tong University

{xiangkhu, gqipeng, zhaz}@amazon.com, {hyan19, xpqiu}@fudan.edu.cn, wnzhang@apex.sjtu.edu.cn
Abstract

Knowledge and expertise in the real world can be disjointedly owned. To solve a complex question, collaboration among experts is often called for. In this paper, we propose CollabQA, a novel QA task in which several expert agents coordinated by a moderator work together to answer questions that cannot be answered by any single agent alone. We construct a synthetic dataset of a large knowledge graph that can be distributed to experts. We define the process of forming a complex question from a ground-truth reasoning path, neural network agent models that can learn to solve the task, and evaluation metrics to check the performance. We show that the problem is challenging without introducing a prior on the collaboration structure, unless experts are perfect and uniform. Based on this experience, we elaborate the extensions needed to approach collaboration tasks in real-world settings.
1 Introduction
One of the fascinating aspects of human activity is
collaboration: despite the limitations of our individ-
ual experience and knowledge, we can collaborate
to solve a problem too challenging for any one per-
son alone. In the context of this paper, we are inter-
ested in collaboration via rounds of questions and
answers
internal
to a panel of experts responding
to an
external
question. Forms of such activities
have broadened into the realm of robots as well.
For instance, customer service is automated with
the backing of machine agents, each holding expert
knowledge in a specific domain.
Figure 1 shows a hypothetical customer service example, where an AI agent is serving a customer who is about to place an order for a mask. Even though the agent has access to the features (e.g., "N95") of the mask in its local database, it cannot answer the question "Can this mask protect me from COVID-19?". Instead of responding with "Sorry, I don't know." as most current QA systems do, it can reroute a new question, "Can the N95 mask prevent COVID-19?", to a human expert.

* Equal contribution.

Figure 1: An example in a hypothetical customer service scenario. The customer asks a question about a feature of the product in the order he/she is about to place. However, the database of the service agent does not contain the information. Instead of responding with something like "Sorry, I don't know", the better way is to get help from human experts, or other QA agents.
We call this task CollabQA: a single agent (human or robot) cannot reason about and respond to a complex question alone, but collectively the agents can. In other words, knowledge is not shared across agents, but its union contains the required reasoning path, which necessitates collaboration.
To solve the problem in its general form is hard.
In this paper, we take a few steps forward by
proposing 1) a simplified version of CollabQA task
where one front-serving agent decomposes an ex-
ternal question into simple ones for the rest of the
experts to answer, 2) a synthetic dataset of a large
knowledge graph that can be distributed to experts,
and 3) a set of baseline models and the associated
evaluation metrics.
Despite this very simple form, we show the prob-
lem can be very challenging. Our overall conclu-
sion is that, even with such a simple setting where
1) knowledge is clearly decomposed, 2) collabora-
tion is passive, and 3) questions and answers are
formed with simple templates and node prediction,
training a good collaboration policy remains chal-
lenging, unless we add a strong prior reflecting the
collaboration structure, and assume collaborators
that are both perfect and uniform. We use these experiences to reflect on how to improve this task so as to gradually approach collaboration tasks with a more real-world flavor.
The rest of the paper is organized as follows. Section 3 formally defines the CollabQA task setting and presents the toy dataset we synthesized for a preliminary study. Section 4 describes the approach we propose for the task. We show experimental results and the lessons they offer in Section 5. Section 6 surveys work related to CollabQA and discusses the key differences. Finally, Section 7 discusses some potential directions for future work.
2 Opening Remarks
This paper was initially submitted to EMNLP 2020 on June 3, 2020. The reviewers' primary concern was that the paper lacked experiments on real data, and the paper was rejected. Since then, we thought we might find time to polish it further, but our team's research direction shifted to other fields, so we did not have the chance to go deeper in this direction. Some of the settings or discussions may still be interesting to the community, so we decided to release this paper on arXiv. Since 2020, we have noticed more related papers; we list some of them in this section for the readers' reference and leave the other parts of this paper almost unaltered from its initial version.
To make agents collaborate, we usually need to decompose a complex task into simpler ones so that different agents can tackle these simple tasks. Wolfson et al. (2020) defined several operators such that a complex question can be decomposed into several sub-queries, each containing only one operator. Based on this principle, Wolfson et al. (2020) annotated a large dataset, BREAK, which can serve as a good starting point for collaborative Question Answering (QA). He et al. (2017) proposed a dataset that requires two people, each with a distinct private list of friends, to find their mutual friends through talking.
Notation     Description
P_i          The i-th panelist.
Q            The external complex question.
q^(t)        Utterance by P_0 at the t-th dialog turn.
u_i^(t)      Response of P_i at the t-th dialog turn.
KG_i         Knowledge graph owned by P_i.
τ(Q)         The reasoning path of Q.
T            Number of dialog turns.

Table 1: Notations of CollabQA.
CEREALBAR, proposed in (Suhr et al., 2019), is a collaborative game which requires an instructor and a follower to collaborate to gather three cards in a virtual environment. The instructor can use natural language to pass messages to the follower, but not vice versa, and has to learn to give better instructions to achieve better scores. Khot et al. (2021a) proposed to use natural language to make several existing QA models collaborate so that together they can solve a question that cannot be solved by any one of them alone. They further proposed a synthetic benchmark, COMMAQA, which can facilitate research on collaborative QA (Khot et al., 2021b).
3 The CollabQA Task

3.1 Notations and Settings
The general setting of CollabQA simulates a group of panelists {P_i}_{i=0}^{n}, out of which P_0 is special: it is the front-serving receptionist and the representative to the external world, and it is also the moderator of the collaboration among {P_i}_{i=1}^{n}, whom we term the panelists. When P_0 receives an external question Q, it broadcasts an utterance q^(1) to the panelists and collects responses {u_i^(1)}_{i=1}^{n} from them. This process continues iteratively, each round being a tuple (q^(t), {u_i^(t)}_{i=1}^{n}), until a maximum of T turns is reached and/or P_0 is able to generate the final response, which may be "UNK", meaning "I don't know". Notations used in this paper are listed in Table 1.
The panelists {P_i}_{i=1}^{n} own a list of knowledge graphs, KG_1, KG_2, ..., KG_n, and their union KG = ∪_{i=1}^{n} KG_i is the total graph. Questions are usually complex in the sense that they cannot be answered by any single agent. However, they are always answerable from KG. In other words, τ(Q), the reasoning path of question Q, can cut across different graphs but is always contained within KG. As such, P_0 must issue multiple polls to the panelists to stitch together τ(Q). Our objective is to minimize the total number of turns while maximizing the success rate.
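The interaction protocol above can be summarized as a simple control loop. Below is a minimal sketch in Python; the objects `moderator` and `panelists` and their methods (`ask`, `answer`, `is_final`, `final_answer`) are hypothetical placeholders for illustration, not part of any released implementation.

```python
def collabqa_dialog(moderator, panelists, Q, max_turns):
    """Hedged sketch of the CollabQA protocol: the moderator P0 broadcasts
    sub-questions, collects panelist responses, and stops when it can answer."""
    history = [Q]                      # dialog history d^(t) starts with the external question
    for t in range(1, max_turns + 1):
        q_t = moderator.ask(history)   # P0 produces q^(t) from the history
        if moderator.is_final(q_t):    # P0 decided it can return the final answer
            return moderator.final_answer(history)
        responses = [p.answer(q_t) for p in panelists]  # each P_i replies with a fact or "UNK"
        history.extend([q_t] + responses)
    return "UNK"                       # give up after T turns
```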
3.2 A Toy Task
Inspired by the bAbI task (
Weston et al.
,
2015
),
we construct a CollabQA dataset, which contains
a series of QA pairs and 3 supporting knowledge
graphs.
We first construct KG_1, KG_2, KG_3, consisting of fabricated person, company and city entities and their relations. They store the knowledge of N_1 persons, N_2 companies and N_3 cities, respectively. The details of the three knowledge graphs are listed in Appendix A. They are assigned to the panelists P_1, P_2 and P_3, respectively, as their knowledge.
Then we synthesize QA pairs from the knowledge graphs as well as the reasoning paths. Each question requires cross-graph multi-hop reasoning. To illustrate the process of creating the dataset examples, we show how to create a 2-hop question: from a node "Person#1" in KG_1, we follow a path with many-to-one or one-to-one types of relations, for example the "birthplace" relation, and get a triplet (Person#1, birthplace, City#4); then we start from node "City#4" in KG_3 and search for a triplet (City#4, largest_company, Company#4). We then combine the two triplets into a reasoning path:

Person#1 --birthplace--> City#4 --largest_company--> Company#4    (1)

so the final answer is the entity Company#4 at the end of the path. The question Q asking about Company#4 following the reasoning path is "What is the largest company in the city where Person#1 was born?", which is generated by templates. A more complex example is shown in the upper part of Figure 2.
The reason we use many-to-one or one-to-one types of relations during the search is that this ensures the entities occurring in the path are unique, so that we can decompose the question into sub-questions, each with a unique answer. In general, to generate an n-hop question, we randomly pick an entity node and perform an n-hop Depth First Search (DFS). Note that multiple edges may exist between a pair of entities (as in a person may live and die in the same city).
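As a concrete illustration of this generation procedure, the sketch below samples an n-hop reasoning path by walking only many-to-one or one-to-one relations and then attaches a template question; the dictionary-based graph and the hard-coded example mirror Equation 1 but are our own simplification (a random walk standing in for the DFS), not the dataset-generation code itself.

```python
import random

def sample_reasoning_path(kg, start, n_hops):
    """Sample an n-hop path over many-to-one / one-to-one relations only,
    so every intermediate entity (and the final answer) is unique."""
    path, node = [], start
    for _ in range(n_hops):
        edges = kg.get(node, [])
        if not edges:
            return None                  # dead end; the caller should retry
        rel, target = random.choice(edges)
        path.append((node, rel, target))
        node = target
    return path                          # the answer is path[-1][2]

# Hypothetical toy graph reproducing the 2-hop example of Equation 1.
kg = {
    "Person#1": [("birthplace", "City#4")],
    "City#4": [("largest_company", "Company#4")],
}
path = sample_reasoning_path(kg, "Person#1", 2)
question = "What is the largest company in the city where Person#1 was born?"
answer = path[-1][2]                     # "Company#4"
```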
We restrict the communication among the panelists to natural language. Therefore, P_0 needs to learn how to ask questions. To alleviate the burden of text generation, we pre-define a set of templates for sub-questions. So, for τ(Q) in Equation 1, the sub-questions are "Which city was Person#1 born in?" for the first hop and "What is the largest company in City#4?" for the second hop. The bottom part of Figure 2 shows an ideal collaborative process.
Table
2
lists the overall statistics of the dataset.
Statistics description                 Value
Train set size                         66,800
Dev set size                           8,350
Test set size                          8,350
# of templates of Q                    49
# of templates of simple questions     28

Table 2: Overall statistics of the CollabQA dataset.
Figure 2: Illustration of the toy CollabQA task: an example QA pair and the ideal collaborative process. The external question Q is "When was the largest company in the city where Person#1 was born established?", with answer 2010.2.8, following the reasoning path Person#1 --birthplace--> City#4 --largest_company--> Company#4 --establish_date--> 2010.2.8. In the ideal dialog, P_0 asks "Which city was Person#1 born in?" in Turn 1 (only P_1 answers, with City#4), "What is the largest company in City#4?" in Turn 2 (only P_3 answers, with Company#4), and "When was Company#4 established?" in Turn 3 (only P_2 answers, with 2010.2.8); in Turn 4, P_0 returns the final answer 2010.2.8. All other panelists respond "UNK" at each turn.
3.3 Links to other QA Tasks
CollabQA can be regarded as a combination of several kinds of QA tasks: knowledge graph question answering (KGQA), multi-hop QA, and multi-turn dialogue.

KGQA   In CollabQA, each panelist is a KGQA system. KGQA assumes that each panelist can answer questions according to its own KG, though this may require one or several steps of reasoning.

Multi-hop QA   In multi-hop QA, the supporting facts of a question are scattered across different sources. Most models for multi-hop QA assume they can access all the sources. Different from multi-hop QA, the supporting facts in CollabQA are separately owned by different panelists, and each supporting fact is accessible only to its owner. Therefore, panelists need to communicate with each other to exchange information.

Multi-turn Dialogue   Multi-turn dialogue usually occurs between a human and an agent. CollabQA aims to develop multi-turn interactions among several agents (panelists).

As such, CollabQA is more challenging than KGQA, multi-hop QA and multi-turn dialogue.
4 Proposed Approach
In our setting, panelists collaborate
passively
in
that they respond with what they know or else with
“UNK”. Therefore,
P
0
leads the process of collabo-
ration. Our general approach consists of two stages:
1) pre-train the panelists with supervised learning;
2) train the collaboration policy with reinforcement
learning.
4.1 Panelists
Panelists share the same model architecture: a Graph Encoder that encodes the knowledge graph into a graph representation matrix H(KG), a Question Encoder that encodes the incoming question q^(t) into h(q^(t)), and a Node Selector that takes both and picks an entity as the answer.
Graph Encoder   Without ambiguity, we denote each knowledge graph owned by an expert agent as KG = (V, E). KG is a heterogeneous graph consisting of different types of entities V and their relations E. Each relation has the form of a triplet (u, rel, v), where u, v ∈ V, rel ∈ R, and R is the set of relation types.
We come up with a modified version of the Relational Graph Convolutional Network (R-GCN) (Schlichtkrull et al., 2018) as the graph encoder. R-GCN encodes the graph by aggregating neighbor and edge information into the nodes. Given a node v in KG, let h_v^(l) denote its representation at the l-th layer of R-GCN; then

h_v^{(l+1)} = \delta\Big( \sum_{rel \in \mathcal{R}} \sum_{u \in \mathcal{N}_v^{rel}} \frac{1}{c_{v,rel}} W_{rel}^{(l)} h_u^{(l)} + W_0^{(l)} h_v^{(l)} \Big),    (2)
where δ is an activation function, N_v^{rel} denotes the neighbors of v that have relation rel with v, W_{rel}^{(l)} is the weight matrix of relation rel at the l-th layer, and 1/c_{v,rel} is a normalization factor. After L aggregations, the final representations of the nodes are H(KG).
However, R-GCN suffers from high GPU memory usage, making it hard to scale to large graphs. The reason is that computing the messages involves a direct tensor operation that produces a very large tensor, especially when the number of relations is large. On the other hand, if we compute the messages with a for-loop, the speed of the aggregation suffers. To get rid of this problem, we make modifications similar to (Vashishth et al., 2020) but simpler and sufficient for the CollabQA dataset: each relation is modelled by a trainable vector h_rel instead of a matrix W_rel, so the aggregation process becomes:

h_v^{(l+1)} = \delta\Big( \frac{1}{c_{v,rel}} \sum_{rel,\, u \in \mathcal{N}_v^{rel}} \mathrm{MLP}^{(l)}\big([h_v^{(l)}, h_{rel}^{(l)}, h_u^{(l)}]\big) \Big).    (3)
In our experiments, we observe significant GPU
memory saving. Our implementation leverages the
DGL package for its superior GPU performance
(
Wang et al.
,
2019
).
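To make this concrete, below is a minimal sketch of the aggregation in Equation 3 written in plain PyTorch (rather than DGL, which our actual implementation uses): each relation contributes a trainable embedding vector, the per-edge message is an MLP over [h_v, h_rel, h_u], and messages are summed per destination node. The tensor layout and class name are our own assumptions for illustration.

```python
import torch
import torch.nn as nn

class RelVectorGraphLayer(nn.Module):
    """One layer of the simplified R-GCN variant (Eq. 3): each relation is a
    trainable vector h_rel, and the message is MLP([h_v, h_rel, h_u])."""
    def __init__(self, dim, num_relations):
        super().__init__()
        self.rel_emb = nn.Embedding(num_relations, dim)
        self.mlp = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU())

    def forward(self, h, edge_src, edge_dst, edge_rel):
        # h: (num_nodes, dim); edge_src/edge_dst/edge_rel: (num_edges,) index tensors
        msg = self.mlp(torch.cat(
            [h[edge_dst], self.rel_emb(edge_rel), h[edge_src]], dim=-1))
        agg = torch.zeros_like(h).index_add_(0, edge_dst, msg)      # sum messages per node
        deg = torch.zeros(h.size(0), 1).index_add_(
            0, edge_dst, torch.ones(edge_src.size(0), 1)).clamp(min=1.0)
        return torch.relu(agg / deg)                                # normalize and activate
```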
Question Encoder   Similarly, we call the question sent to the expert agents q. The representation of a question q is computed by a BiLSTM (Hochreiter and Schmidhuber, 1997):

h(q) = BiLSTM(q).    (4)
Node Selector   The node selector performs an attention operation with h(q) over H(KG), and returns attention scores α(KG) over H(KG) as the likelihood of selecting each node as the answer:

α(KG) = (H(KG))^T W h(q);    (5)

the answer is then the value of the node with the highest attention score.
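A panelist's answer-selection step (Equations 4-5) can be sketched as follows, with a bilinear scoring matrix W and an argmax over nodes; the shapes and the module name are our assumptions rather than the authors' released code.

```python
import torch
import torch.nn as nn

class NodeSelector(nn.Module):
    """Score every node representation against the question vector (Eq. 5)
    and pick the highest-scoring node as the answer."""
    def __init__(self, node_dim, question_dim):
        super().__init__()
        self.W = nn.Linear(question_dim, node_dim, bias=False)

    def forward(self, H_kg, h_q):
        # H_kg: (num_nodes, node_dim), h_q: (question_dim,)
        alpha = H_kg @ self.W(h_q)     # (num_nodes,) attention scores
        return alpha, alpha.argmax()   # scores and the index of the predicted answer node
```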
4.2 Moderator and Collaboration Policy
P_0 coordinates collaboration according to a learned collaboration policy. At turn t, P_0 takes an action a^(t) according to its current state s^(t) = f_s(d^(t)), where d^(t) is the dialog history up to t, d^(t) = [Q, q^(1), {u_i^(1)}_{i=1}^{n}, ..., q^(t-1), {u_i^(t-1)}_{i=1}^{n}]. The state encoder f_s(·) can be any neural model; here we use a BiLSTM.
The action space includes asking a new sub-question or returning the final answer. To alleviate the burden of text generation, P_0 generates a sub-question by selecting a template from a predefined set U. To enable P_0 to determine whether to finish the collaboration and return the answer to Q, we add a special template to U which stands for "finish the collaboration". We use a simple multi-layer perceptron (MLP) to implement the collaboration policy π(a^(t) | s^(t)), which takes s^(t) as input and outputs a probability distribution over the list of templates.

In the CollabQA dataset, at each dialog turn only one of the answers from the panelists is not "UNK". So, once the template is selected, we fill in its placeholder with this answer and update s^(t) to generate q^(t+1) or the final answer.
Reward   We use the number of correct answers as the baseline reward for CollabQA. For each question, getting a correct answer within T_max turns leads to a reward r = +1; otherwise, the reward is r = -1.

To alleviate the problem of reward sparsity, we assign the reward r to all the actions in the trajectory. Besides, we add an entropy regularization term to encourage exploration (Haarnoja et al., 2018). We apply a policy gradient method to train P_0. The gradient of the policy is

\nabla_\theta J = \mathbb{E}_{\tau \sim \pi}\Big[ \sum_{t=1}^{T} \Big( r\, \nabla_\theta \log \pi(a^{(t)} \mid s^{(t)}) + \nabla_\theta \max\big(0,\, C - H(\pi(\cdot \mid s^{(t)}))\big) \Big) \Big],    (6)

where T is the number of dialogue turns, C is a hyper-parameter, and θ stands for the parameters of the policy.
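The corresponding update can be sketched as the following loss over one trajectory: reward-weighted log-probabilities plus a penalty whenever the per-step policy entropy falls below the threshold C (matching the entropy-regularization intent of Equation 6, up to sign conventions). The collection of per-step quantities and the variable names are our assumptions.

```python
import torch

def policy_gradient_loss(log_probs, dists, reward, C):
    """log_probs: list of log pi(a^(t)|s^(t)) tensors for one trajectory;
    dists: list of per-step template distributions; reward: scalar trajectory
    reward; C: entropy threshold. Minimizing this loss performs the update."""
    loss = torch.zeros(())
    for log_p, p in zip(log_probs, dists):
        entropy = -(p * torch.log(p + 1e-8)).sum()
        # REINFORCE term (maximize r * log pi) plus entropy-shortfall penalty
        loss = loss - reward * log_p + torch.clamp(C - entropy, min=0.0)
    return loss
```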
Figure 3: Training curves for P_0 with the baseline and enhanced rewards. The final answer accuracy (EMA) tends to converge, while the reasoning path accuracy (EMP) fluctuates. Adding the prior makes training faster and achieves better final accuracy, but cannot reduce the fluctuation of the reasoning path accuracy.

In our simple setting, we can introduce an inductive bias specifically tailored to improve learning. Since experts do not share knowledge, there should be exactly one response that is not "UNK" in each turn, and we add an extra negative reward β (β < 0) if that is not the case. Therefore, the reward r is re-defined as

r = \begin{cases} -1 + \beta, & \text{if not exactly one answer}, \\ -1, & \text{if wrong answer}, \\ +1, & \text{if right answer}, \end{cases}    (7)

where β is a hyper-parameter. As we add prior information in this setting, we call it the enhanced reward setting.
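The enhanced reward of Equation 7 amounts to the small function below; the argument names and the way the "exactly one non-UNK response" prior is checked are illustrative assumptions.

```python
def enhanced_reward(panelist_responses, predicted, gold, beta=-0.2):
    """Eq. 7: penalize violating the 'exactly one non-UNK answer' prior,
    otherwise reward +1 / -1 for a right / wrong final answer."""
    non_unk = [r for r in panelist_responses if r != "UNK"]
    if len(non_unk) != 1:          # prior violated: add the extra penalty beta
        return -1.0 + beta
    return 1.0 if predicted == gold else -1.0
```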
5 Experiments and Analysis

5.1 Experimental Setup
Pre-training of the Panelists   We first train P_1, P_2, P_3 with the sub-questions and answers that appear in the training set. We then fix the well-trained panelists as the environment while training P_0. Their performance is shown in Table 3.

             P_1     P_2     P_3
Accuracy     99.6    99.6    100

Table 3: Performance of the pre-trained panelists when asked one-hop questions on their domain knowledge.
The hyper-parameters of the model used in our
experiments are listed in Table
4
.
Evaluation Metrics   We evaluate the performance of P_0 with two metrics:

1) EMA: exact match of the final answer;

2) EMP: the extracted reasoning path of P_0 exactly matches the ground-truth path.
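Both metrics reduce to exact-match checks; a hedged sketch, assuming the predicted reasoning path is recorded as the sequence of (sub-question, answer) steps chosen by P_0:

```python
def ema(predicted_answer, gold_answer):
    """Exact match of the final answer."""
    return float(predicted_answer == gold_answer)

def emp(predicted_path, gold_path):
    """Exact match of the whole extracted reasoning path."""
    return float(list(predicted_path) == list(gold_path))
```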
5.2 Results and Error Analysis
The main results are shown in Table 5. In the top row, we show the performance of a random P_0 who picks one question to ask at random at each step.
Hyper-parameter            Value
R-GCN layers               1
R-GCN hidden size          80
Embedding dim.             40
Bi-LSTM hidden size        40
Number of epochs           1000
Batch size                 500
Optimizer                  Adam
Learning rate              3e-3
Entropy threshold C        0.1
Prior penalty reward β     -0.2

Table 4: The hyper-parameters used to learn the panelists and P_0.
                    EMA     EMP
Random              0.0     0.0
Baseline Reward     68.8    45.3
Enhanced Reward     80.1    52.6

Table 5: Main results of the experiments on the test set.
Its EMA is zero, which shows that it is not easy to guess the right answer. The "baseline reward" row presents the results of using Equation 6 as the gradient to optimize the model; this model has one extra termination action available at each step (and it should take the "termination" action only at the 4th turn). The "enhanced reward" row adds the extra penalty given in Equation 7 (i.e., it exploits knowing the exact number of turns and that only one response is not "UNK").
Performance gap between two kinds of rewards   The accuracy difference between the "baseline reward" row and the "enhanced reward" row arises because the "baseline reward" setting has one extra action, so it may terminate too early or fail to stop at the last turn. In our experiments, we found that such improper termination accounts for nearly 9% of the total errors, whereas this kind of error is entirely avoided in the "enhanced reward" setting. The remaining 2.3% performance gap may be attributed to better training under the "enhanced reward". Another noticeable fact is that the EMP drop from "enhanced reward" to "baseline reward" is not as large as the EMA drop; this is because the wrong reasoning paths under "enhanced reward" are also prone to improper termination. The training curves for "baseline reward" and "enhanced reward" are presented in Figure 3; the confidence interval is calculated from 5 experiments. From the figure, the accuracy for EMA is quite stable, while EMP fluctuates. Since the number of distinct samples in our dataset is not very large, the variance should be innately small; however, because of the data bias discussed in the following part, EMP fluctuates without hurting EMA.

Figure 4: An example of the data bias in our data: 99% of persons have the same "live in" and "born in" city.
Fitting the data bias   Interestingly, a high answer accuracy (EMA) does not imply a high reasoning path accuracy (EMP); in both settings there is a large gap between the two accuracies. To understand the reason behind this gap, we compute the performance for each type of Q. We found that the model finds a wrong reasoning path mostly on questions that have the sub-questions "Which city does [PersonName] live in?" and "Which city was [PersonName] born in?". It turns out that, in our dataset, nearly 99% of the time a person's "birthplace" and "live in" place are the same, and the model cannot distinguish between them during training. We observe that nearly all the questions that should be decomposed into "Which city does [PersonName] live in?" have been decomposed into "Which city was [PersonName] born in?" instead. One example is shown in Figure 4.
To further show how this overlap impacts our results, we vary the overlap ratio, i.e., the probability that a person has the same "live in" place and "birthplace", in Figure 5. As the overlap ratio goes up, it becomes harder for P_0 to discern between "live in" place and "birthplace", but the difficulty does not grow linearly: the EMP drops sharply after some point. However, the drop of EMP does not have much negative effect on the EMA, since when the overlap ratio is high the model can still get the right answer with the wrong reasoning path.

Figure 5: The final answer accuracy and reasoning path accuracy with respect to the overlap ratio.
In other words, the model settles on an approximate question decomposition that the final reward cannot distinguish. We note that similar data biases exist in the real world, and the model exploited this one through exploration in our dataset. Fixing it means P_0 should inspect the semantic consistency between the sub-questions and the original question, instead of blindly selecting templates.
Beyond the data bias issue, there are numerous other error cases. For instance, an imperfect expert can pick a wrong answer that is structurally correct (e.g., answering with the birth city when the question asks for the work location), which leads to a correct decomposition but a wrong final answer. Note that our panelists are nearly perfect; thus, even small errors accumulate over the turns and greatly affect P_0's performance.
The problem of group bias   The data bias described above points to another kind of bias: during learning to collaborate, P_0 may fit the bias of the panelists. This is intuitive, since the panelists form the environment, and any bias therein will lead to bias in P_0. What is more interesting in a collaboration setting is that such bias is per group. To verify this assumption, we conduct experiments training 3 groups of panelists with different initializations, which we call Panel^(1), Panel^(2), Panel^(3). Each group has performance similar to that in Table 3. We then train 3 versions of P_0, each paired with a different group of panelists: P_0^(i) trains with Panel^(i). During testing, we pair each P_0 with different groups of panelists. The resulting answer accuracies are listed in Table 6.
            P_0^(1)    P_0^(2)    P_0^(3)
Panel^(1)    81.7       82.5       76.4
Panel^(2)    80.0       84.6       74.2
Panel^(3)    80.8       83.6       81.9

Table 6: Results of pairing each P_0 with a different group of panelists at testing time. Each column shows how one version of P_0 performs when paired with different panels; the diagonal entries are where P_0 is paired with the group it was trained on.

The results show that there is always a performance drop when P_0 is paired with panelists it was not trained with.
6 Related Work
KGQA   In the simplified setting of CollabQA, each panelist is a simple KGQA system. The questions are either simple, or need one step of reasoning to be reformulated into another simple question. KGQA has been widely studied. The most common way of doing KGQA is semantic parsing: a semantic parser maps a natural language question to a formal query such as SPARQL, λ-DCS (Liang et al., 2011) or FunQL (Liang et al., 2011). Previous works on KGQA can be categorized into classification-based, ranking-based and translation-based methods (Chakraborty et al., 2019). The panelist model we propose is most related to classification-based methods, which assume the target formal query has a fixed structure so that the task is to predict its elements. For example, in the SimpleQuestions benchmark (Bordes et al., 2015), all the questions are factoid questions that need one-step reasoning. SimpleQuestions has been approached by various NN models (He and Golub, 2016; Dai et al., 2016; Yin et al., 2016; Yu et al., 2017; Lukovnikov et al., 2017; Mohammed et al., 2018; Petrochuk and Zettlemoyer, 2018; Huang et al., 2019). Another line of KGQA approaches leverages knowledge graph embeddings to make full use of the structural information of KGs (Huang et al., 2019).
Multi-hop QA   To answer a multi-hop question, multiple supporting facts are needed. WikiHop (Welbl et al., 2018) and HotpotQA (Yang et al., 2018) are recently proposed multi-hop QA datasets for text understanding. Different from multi-hop QA, the supporting facts in CollabQA are separately owned by different panelists, and each supporting fact is accessible only to its owner. Therefore, CollabQA is more challenging than multi-hop QA.
Multi-Agent Reinforcement Learning (MARL)   In this paper, the panelists are passive and pre-trained, and we train only the collaboration policy in a single-agent RL setting. However, the general CollabQA should allow the panelists to discuss with each other; in that case, each panelist has its own policy and can update it. Under this general setting, CollabQA naturally falls into the realm of MARL (Buşoniu et al., 2010; Foerster et al., 2016), which is a more challenging task.
7 Discussion
The task of CollabQA as it stands is very simple.
Nevertheless, the experiences are helpful to drive
towards an improved setting that is closer to real-
world scenarios. To put it differently, if we were
to design the task anew, what are the most impor-
tant extensions? We examine three dimensions: 1) the role and capability of participants, 2) the collaboration structure, and 3) scaling to real-world problems.
(1) Role Definition   In the current setting, the moderator P_0 assumes no knowledge of its own, and its capacity is limited to breaking down a complex question. The panelists are domain experts whose knowledge does not overlap; they can only respond with facts, cannot proactively ask questions, and cannot reveal any reasoning path. These are much simplified assumptions that do not reflect reality. Relaxing these constraints is in general challenging; we list some of the issues below.
Consider the issue of common sense knowledge. Although inconsistencies among individuals do exist, common sense is nevertheless the foundation from which collaboration among a collection of human experts can start. Oftentimes, common sense is required to meaningfully decompose a complex question, whether or not the panelists are involved. Take the question "Does Person#1 work in the same city as Person#2?" as an example. P_0 needs to realize that the company entities and their locations are key to solving this question. These missing steps, which are not obvious from the question itself, need to be inserted, and it takes common sense to deduce them, since "working city" is not a relation readily available in our KG.
A debate is interesting when there are gaps
between experts, not because they have non-
overlapping knowledge but more often because
they have different opinions on the same facts.
As such we need to introduce
overlapping knowl-
edge
imbued with different certainty (or reliability).
This, in turn, requires
P
0
to have the capacity to
arbitrate among parallel responses from different
panelists.
(2) Collaboration Structure   The overall structure of a moderator working together with a group of experts is not uncommon. Even within this broad structure, there can be other valid variations. For instance, instead of broadcasting, the moderator can pose a pointed question to one panelist, or more generally to a subset of the panelists. It is also possible that the final response needs a vote when the moderator cannot resolve a difference.
The constraint that panelists can only passively state facts is problematic when a question is ambiguous. Consider the question "Where does [PersonName] work?" There are multiple legitimate responses (e.g., a company, a city, and/or a country). As such, a panelist should be able to ask a clarification question; drawing an exhaustive list from the KG is a possibility, but an unnatural one. As a further extension, clarification questions could be generated and responded to by any of the participants.
(3) Scale to Real-World Scenarios   Despite its simplicity, our current setting is a meaningful step towards real CollabQA tasks. To get there, we believe a few more extensions are necessary. Currently, we assume a complex question is the realization of a unique path. In general this is not true even when the reasoning does take a multi-hop path; multiple edges can exist between a pair of entities. A lazy (or unlucky) P_0 may learn to choose only one of them if the only reward is to get the final answer right. This is one problem we discussed in our experiments, where "work_in" and "live_in" happen to overlap in their end nodes.

In general, reasoning can take a graph (though we can consider a path as a degenerate graph, too). The earlier example ("Does Person#1 work in the same city as Person#2?") can only be solved by a two-level tree with a Boolean comparison at the root. Booking an airline ticket with both pricing and timing constraints, while the required information resides in different KGs, is similar. As a result, generating complex questions requires going beyond the perspective of a single reasoning path.
In our current setting, P_0 selects templates and panelists respond with entities. As such, the action space of P_0 is constrained, and there is very low risk that communications get "lost in translation." Ideally, such communication should use generated natural language; in other words, CollabQA needs natural language generation (NLG) as a component. However, doing so would be prohibitively expensive if we trained from scratch: with |V| valid words, the number of possible sentences of length L is |V|^L, and that is only for one turn. This would exponentially exacerbate the issue of sparse rewards, making training difficult. Still, we believe this is not a fundamental problem: in the context of CollabQA, learning what to ask is more important than how to ask. A more practical approach is to use transfer learning to endow the agents with NLG capability.
However, there should be diversity in surface realization even for semantically identical questions. This is not only a practical requirement, but will also make the system more robust. It can be easily accomplished by adding noise to the templates, provided that the action space stays manageable.
8 Conclusion
The fact that knowledge is not shared gives rise to individual diversity and motivates collaboration. We believe natural-language-based collaboration systems form a domain with practical implications and scientific value. The CollabQA task and dataset we propose in this paper are a small step in that direction.
References
Antoine Bordes, Nicolas Usunier, Sumit Chopra, and
Jason Weston. 2015.
Large-scale simple question
answering with memory networks.
arXiv preprint
arXiv:1506.02075
.
Lucian Buşoniu, Robert Babuška, and Bart De Schut-
ter. 2010.
Multi-agent reinforcement learning: An
overview. In
Innovations in multi-agent systems and
applications-1
, pages 183–221. Springer.
Nilesh Chakraborty, Denis Lukovnikov, Gaurav Ma-
heshwari, Priyansh Trivedi, Jens Lehmann, and Asja
Fischer. 2019. Introduction to neural network based
approaches for question answering over knowledge
graphs.
arXiv preprint arXiv:1907.09361
.
Zihang Dai, Lei Li, and Wei Xu. 2016.
CFO: Condi-
tional focused neural question answering with large-
scale knowledge bases
. In
Proceedings of the 54th
Annual Meeting of the Association for Computa-
tional Linguistics (Volume 1: Long Papers)
, pages
800–810, Berlin, Germany. Association for Compu-
tational Linguistics.
Jakob Foerster, Ioannis Alexandros Assael, Nando
De Freitas, and Shimon Whiteson. 2016.
Learn-
ing to communicate with deep multi-agent reinforce-
ment learning.
In
Advances in neural information
processing systems
, pages 2137–2145.
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and
Sergey Levine. 2018.
Soft actor-critic: Off-policy
maximum entropy deep reinforcement learning with
a stochastic actor.
In
International Conference on
Machine Learning
, pages 1861–1870.
He He, Anusha Balakrishnan, Mihail Eric, and Percy
Liang. 2017.
Learning symmetric collaborative dia-
logue agents with dynamic knowledge graph embed-
dings
. In
Proceedings of the 55th Annual Meeting of
the Association for Computational Linguistics, ACL
2017, Vancouver, Canada, July 30 - August 4, Vol-
ume 1: Long Papers
, pages 1766–1776. Association
for Computational Linguistics.
Xiaodong He and David Golub. 2016.
Character-
level question answering with attention
. In
Proceed-
ings of the 2016 Conference on Empirical Methods
in Natural Language Processing
, pages 1598–1607,
Austin, Texas. Association for Computational Lin-
guistics.
Sepp Hochreiter and Jürgen Schmidhuber. 1997.
Long short-term memory.
Neural computation
,
9(8):1735–1780.
Xiao Huang, Jingyuan Zhang, Dingcheng Li, and Ping
Li. 2019.
Knowledge graph embedding based ques-
tion answering
. In
Proceedings of the Twelfth ACM
International Conference on Web Search and Data
Mining
, WSDM ’19, page 105–113, New York, NY,
USA. Association for Computing Machinery.
Tushar Khot, Daniel Khashabi, Kyle Richardson, Peter
Clark, and Ashish Sabharwal. 2021a.
Text modu-
lar networks: Learning to decompose tasks in the
language of existing models
.
In
Proceedings of
the 2021 Conference of the North American Chap-
ter of the Association for Computational Linguistics:
Human Language Technologies, NAACL-HLT 2021,
Online, June 6-11, 2021
, pages 1264–1279. Associ-
ation for Computational Linguistics.
Tushar Khot, Kyle Richardson, Daniel Khashabi, and Ashish Sabharwal. 2021b. Learning to solve complex tasks by talking to agents. CoRR, abs/2110.08542.
Percy Liang, Michael Jordan, and Dan Klein. 2011.
Learning dependency-based compositional seman-
tics
. In
Proceedings of the 49th Annual Meeting of
the Association for Computational Linguistics: Hu-
man Language Technologies
, pages 590–599, Port-
land, Oregon, USA. Association for Computational
Linguistics.
Denis Lukovnikov, Asja Fischer, Jens Lehmann, and
Sören Auer. 2017.
Neural network-based question
answering over knowledge graphs on word and char-
acter level
. In
Proceedings of the 26th International
Conference on World Wide Web
, WWW ’17, page
1211–1220, Republic and Canton of Geneva, CHE.
International World Wide Web Conferences Steering
Committee.
Salman Mohammed, Peng Shi, and Jimmy Lin. 2018.
Strong baselines for simple question answering over
knowledge graphs with and without neural networks
.
In
Proceedings of the 2018 Conference of the North
American Chapter of the Association for Compu-
tational Linguistics:
Human Language Technolo-
gies, Volume 2 (Short Papers)
, pages 291–296, New
Orleans, Louisiana. Association for Computational
Linguistics.
Michael Petrochuk and Luke Zettlemoyer. 2018.
Sim-
pleQuestions nearly solved: A new upperbound and
baseline approach
. In
Proceedings of the 2018 Con-
ference on Empirical Methods in Natural Language
Processing
, pages 554–558, Brussels, Belgium. As-
sociation for Computational Linguistics.
Michael Schlichtkrull, Thomas N Kipf, Peter Bloem,
Rianne Van Den Berg, Ivan Titov, and Max Welling.
2018. Modeling relational data with graph convolu-
tional networks. In
European Semantic Web Confer-
ence
, pages 593–607. Springer.
Alane Suhr, Claudia Yan, Jacob Schluger, Stanley Yu,
Hadi Khader, Marwa Mouallem, Iris Zhang, and
Yoav Artzi. 2019.
Executing instructions in situ-
ated collaborative interactions
.
In
Proceedings of
the 2019 Conference on Empirical Methods in Nat-
ural Language Processing and the 9th International
Joint Conference on Natural Language Processing,
EMNLP-IJCNLP 2019, Hong Kong, China, Novem-
ber 3-7, 2019
, pages 2119–2130. Association for
Computational Linguistics.
Shikhar Vashishth, Soumya Sanyal, Vikram Nitin, and
Partha Talukdar. 2020.
Composition-based multi-
relational graph convolutional networks
. In
Interna-
tional Conference on Learning Representations
.
Minjie Wang,
Lingfan Yu,
Da Zheng,
Quan Gan,
Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang,
Chao Ma, Ziyue Huang, Qipeng Guo, Hao Zhang,
Haibin Lin, Junbo Zhao, Jinyang Li, Alexander J.
Smola, and Zheng Zhang. 2019.
Deep graph li-
brary: Towards efficient and scalable deep learning
on graphs.
CoRR
, abs/1909.01315.
Johannes
Welbl,
Pontus
Stenetorp,
and
Sebastian
Riedel. 2018.
Constructing datasets for multi-hop
reading comprehension across documents
.
Transac-
tions of the Association for Computational Linguis-
tics
, 6:287–302.
Jason Weston, Antoine Bordes, Sumit Chopra, Alexan-
der M Rush, Bart van Merriënboer, Armand Joulin,
and Tomas Mikolov. 2015.
Towards ai-complete
question answering: A set of prerequisite toy tasks.
arXiv preprint arXiv:1502.05698
.
Tomer Wolfson, Mor Geva, Ankit Gupta, Yoav Gold-
berg, Matt Gardner, Daniel Deutch, and Jonathan
Berant. 2020.
Break it down:
A question under-
standing benchmark
.
Trans. Assoc. Comput. Lin-
guistics
, 8:183–198.
Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio,
William Cohen, Ruslan Salakhutdinov, and Christo-
pher D. Manning. 2018.
HotpotQA: A dataset
for diverse, explainable multi-hop question answer-
ing
. In
Proceedings of the 2018 Conference on Em-
pirical Methods in Natural Language Processing
,
pages 2369–2380, Brussels, Belgium. Association
for Computational Linguistics.
Wenpeng Yin, Mo Yu, Bing Xiang, Bowen Zhou, and
Hinrich Schütze. 2016.
Simple question answering
by attentive convolutional neural network
.
In
Pro-
ceedings of COLING 2016, the 26th International
Conference on Computational Linguistics: Techni-
cal Papers
, pages 1746–1756, Osaka, Japan. The
COLING 2016 Organizing Committee.
Mo Yu, Wenpeng Yin, Kazi Saidul Hasan, Cicero dos
Santos, Bing Xiang, and Bowen Zhou. 2017.
Im-
proved neural relation detection for knowledge base
question answering
. In
Proceedings of the 55th An-
nual Meeting of the Association for Computational
Linguistics (Volume 1:
Long Papers)
, pages 571–
581, Vancouver, Canada. Association for Computa-
tional Linguistics.
Figure 6: Structure and examples of entities in the three proposed knowledge graphs. (a) A Person entity in G_1 (e.g., person_1) with relations gender, height, birthday, birthplace, weight, annual_income, work_in and live_in. (b) A Company entity in G_2 (e.g., company_1) with relations main_business, time_of_establishment, CEO, founder, number_of_employees, locate_in, chairman_of_the_board, market_value and has_service_in. (c) A City entity in G_3 (e.g., city_1) with relations largest_company, population, area, mayor and locate_in.
Appendices

A Details of the CollabQA dataset
Structures of the three KGs   Figure 6 shows the structure of, and examples from, our proposed knowledge graphs. G_1 contains a list of Person entities. The value of each property of an entity is randomly generated within a reasonable range; for example, a person's height is randomly sampled from the range [160 cm, 200 cm]. We add a series of constraints to make the KGs more realistic, such as: a person who does not have a job gets no annual income; a person cannot be a mayor and an employee of a company at the same time; the largest company of a city must be located in that city; and so on.
Statistics of the KGs
The detailed statistics of
the three KGs are shown in Table
7
.
                            G_1                        G_2                          G_3
Overall                     entities: 7541             entities: 7719               entities: 1360
                            relations: 24000           relations: 16000             relations: 1500
Number of different         gender value: 2            CompanyName: 2000            CityName: 300
node types                  PersonName: 3000           date value: 1862             area value: 211
                            height value: 21           number value: 836            number value: 259
                            weight value: 31           PersonName: 2600             PersonName: 285
                            date value: 2597           BusinessName: 20             CompanyName: 300
                            CityName: 300              CityName: 300                StateName: 5
                            CompanyName: 1559          market value: 101
                            annual income value: 31
Number of different         height: 3000               establish date: 2000         area: 300
relation types              weight: 3000               number of employees: 2000    population: 300
                            birthday: 3000             ceo: 2000                    mayor: 300
                            gender: 3000               founder: 2000                largest company: 300
                            birthplace: 3000           main business: 2000          contained by: 300
                            live in: 3000              locate in: 2000
                            work in: 3000              has service in: 2000
                            annual income: 3000        chairman: 2000
                                                       market value: 2000

Table 7: Statistics of the three knowledge graphs used in our experiment.