CANE: Context-Aware Network Embedding for Relation Modeling


Network embedding (NE) is playing a critical role in network analysis, due to its ability to represent vertices with efficient low-dimensional embedding vectors. However, existing NE models aim to learn a fixed context-free embedding for each vertex and neglect the diverse roles it plays when interacting with other vertices. In this paper, we assume that one vertex usually shows different aspects when interacting with different neighbor vertices, and should own different embeddings respectively. Therefore, we present Context-Aware Network Embedding (CANE), a novel NE model to address this issue. CANE learns context-aware embeddings for vertices with a mutual attention mechanism and is expected to model the semantic relationships between vertices more precisely. In experiments, we compare our model with existing NE models on three real-world datasets. Experimental results show that CANE achieves significant improvement over state-of-the-art methods on link prediction and comparable performance on vertex classification. The source code and datasets can be obtained from https://github.com/thunlp/CANE.


1 Introduction

Network embedding (NE), i.e., network representation learning (NRL), aims to map vertices of a network into a low-dimensional space according to their structural roles in the network. NE provides an efficient and effective way to represent and manage large-scale networks, alleviating the computation and sparsity issues of conventional symbol-based representations. Hence, NE has attracted much research interest in recent years (Perozzi et al., 2014; Tang et al., 2015; Grover and Leskovec, 2016), and achieves promising performance on many network analysis tasks including link prediction, vertex classification, and community detection.


In real-world social networks, it is intuitive that one vertex may demonstrate various aspects when interacting with different neighbor vertices. For example, a researcher usually collaborates with various partners on diverse research topics (as illustrated in Fig. 1), a social-media user contacts various friends who share distinct interests, and a web page links to multiple pages for different purposes. However, most existing NE methods only assign one single embedding vector to each vertex, which gives rise to the following two inevitable issues: (1) These methods cannot flexibly cope with the aspect transition of a vertex when interacting with different neighbors. (2) In these models, a vertex tends to force the embeddings of its neighbors close to each other, which may not be the case all the time. For example, the left user and right user in Fig. 1 share few common interests, but are learned to be close to each other since they both link to the middle person. This will accordingly make vertex embeddings indiscriminative.


To address these issues, we aim to propose a Context-Aware Network Embedding (CANE) framework for modeling relationships between vertices precisely. More specifically, we present CANE on information networks, where each vertex also contains rich external information such as text, labels or other meta-data, and the significance of context is more critical for NE in this scenario. Without loss of generality, we implement CANE on text-based information networks in this paper, and it can be easily extended to other types of information networks.


In conventional NE models, each vertex is represented as a static embedding vector, denoted as a context-free embedding. On the contrary, CANE assigns dynamic embeddings to a vertex according to the different neighbors it interacts with, named context-aware embeddings. Take a vertex u and its neighbor vertex v for example. The context-free embedding of u remains unchanged when interacting with different neighbors. On the contrary, the context-aware embedding of u is dynamic when confronting different neighbors.


When u interacts with v, their context embeddings concerning each other are derived from their text information, \(S_u\) and \(S_v\) respectively. For each vertex, we can easily use neural models, such as convolutional neural networks (Blunsom et al., 2014; Johnson and Zhang, 2014; Kim, 2014) and recurrent neural networks (Kiros et al., 2015; Tai et al., 2015), to build context-free text-based embeddings. In order to realize context-aware text-based embeddings, we introduce the selective attention scheme and build mutual attention between u and v into these neural models. The mutual attention is expected to guide neural models to emphasize those words that are focused on by the neighbor vertices and eventually obtain context-aware embeddings.


Both the context-free embeddings and context-aware embeddings of each vertex can be efficiently learned together via concatenation, using existing NE methods such as DeepWalk (Perozzi et al., 2014), LINE (Tang et al., 2015) and node2vec (Grover and Leskovec, 2016).


We conduct experiments on three real-world datasets from different areas. Experimental results on link prediction reveal the effectiveness of our framework as compared to other state-of-the-art methods. The results suggest that context-aware embeddings are critical for network analysis, in particular for those tasks concerning complicated interactions between vertices such as link prediction. We also explore the performance of our framework via vertex classification and case studies, which again confirms the flexibility and superiority of our models.


2 Related Work

With the rapid growth of large-scale social networks, network embedding, i.e., network representation learning, has been proposed as a critical technique for network analysis tasks.


In recent years, there have been a large number of NE models proposed to learn efficient vertex embeddings (Tang and Liu, 2009; Cao et al., 2015; Wang et al., 2016; Tu et al., 2016a). For example, DeepWalk (Perozzi et al., 2014) performs random walks over networks and introduces an efficient word representation learning model, Skip-Gram (Mikolov et al., 2013a), to learn network embeddings. LINE (Tang et al., 2015) optimizes the joint and conditional probabilities of edges in large-scale networks to learn vertex representations. Node2vec (Grover and Leskovec, 2016) modifies the random walk strategy in DeepWalk into biased random walks to explore the network structure more efficiently. Nevertheless, most of these NE models only encode the structural information into vertex embeddings, without considering heterogeneous information accompanied with vertices in real-world social networks.


To address this issue, researchers make great efforts to incorporate heterogeneous information into conventional NE models. For instance, Yang et al. (2015) present text-associated DeepWalk (TADW) to improve matrix factorization based DeepWalk with text information. Tu et al. (2016b) propose max-margin DeepWalk (MMDW) to learn discriminative network representations by utilizing labeling information of vertices. Chen et al. (2016) introduce group-enhanced network embedding (GENE) to integrate existing group information in NE. Sun et al. (2016) regard text content as a special kind of vertices, and propose context-enhanced network embedding (CENE) through leveraging both structural and textual information to learn network embeddings.


To the best of our knowledge, all existing NE models focus on learning context-free embeddings, but ignore the diverse roles a vertex plays when interacting with others. In contrast, we assume that a vertex has different embeddings according to which vertex it interacts with, and propose CANE to learn context-aware vertex embeddings.


3 Problem Formulation

We first give basic notations and definitions used in this work. Suppose there is an information network G = (V, E, T), where V is the set of vertices, E ⊆ V × V is the set of edges between vertices, and T denotes the text information of vertices. Each edge \(e_{u,v} ∈ E\) represents the relationship between two vertices (u, v), with an associated weight \(w_{u,v}\). Here, the text information of a specific vertex v ∈ V is represented as a word sequence \(S_v = (w_1, w_2, \ldots, w_{n_v})\), where \(n_v = |S_v|\). NRL aims to learn a low-dimensional embedding \(v ∈ R^d\) for each vertex v ∈ V according to its network structure and associated information, e.g. text and labels. Note that d ≪ |V| is the dimension of the representation space.


Definition 1. Context-free Embeddings: Conventional NRL models learn a context-free embedding for each vertex. It means the embedding of a vertex is fixed and will not change with respect to its context information (i.e., another vertex it interacts with).

Definition 2. Context-aware Embeddings: Different from existing NRL models that learn context-free embeddings, CANE learns various embeddings for a vertex according to its different contexts. Specifically, for an edge \(e_{u,v}\), CANE learns context-aware embeddings \(u_{(v)}\) and \(v_{(u)}\).


4 The Method

4.1 Overall Framework

To make full use of both network structure and associated text information, we propose two types of embeddings for a vertex v, i.e., the structure-based embedding \(v^s\) and the text-based embedding \(v^t\). Structure-based embedding can capture the information in the network structure, while text-based embedding can capture the textual meanings lying in the associated text information. With these embeddings, we can simply concatenate them and obtain the vertex embedding as \(v = v^s ⊕ v^t\), where ⊕ indicates the concatenation operation. Note that the text-based embedding \(v^t\) can be either context-free or context-aware, which will be introduced in detail in Sections 4.4 and 4.5 respectively. When \(v^t\) is context-aware, the overall vertex embedding v will be context-aware as well.

In the following parts, we introduce the two objectives, i.e., the structure-based objective \(L_{s}(e)\) and the text-based objective \(L_{t}(e)\), in detail.


4.2 Structure-based Objective

Without loss of generality, we assume the network is directed, as an undirected edge can be considered as two directed edges with opposite directions and equal weights. Thus, the structure-based objective aims to measure the log-likelihood of a directed edge using the structure-based embeddings as

\(L_{s}(e) = w_{u,v} log p(v^s|u^s). (3)\)

Here, the conditional probability of v given u is defined with a softmax function over all vertices, i.e.,

\(p(v^s|u^s) = exp(u^s · v^s) / \sum_{z∈V} exp(u^s · z^s). (4)\)

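To make Eqs. (3) and (4) concrete, here is a minimal NumPy sketch that computes the softmax conditional probability and the structure-based objective for a single edge. The vertex count, embedding dimension and random embedding values are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax_cond_prob(u_emb, v_idx, all_embs):
    """p(v^s | u^s): softmax over dot products with every candidate vertex z (Eq. 4)."""
    scores = all_embs @ u_emb              # one score per candidate vertex
    scores -= scores.max()                 # for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[v_idx]

# toy setting: 5 vertices with 4-dimensional structure-based embeddings
rng = np.random.default_rng(0)
structure_embs = rng.normal(size=(5, 4))

u, v, w_uv = 0, 3, 1.0                     # edge (u, v) with weight w_{u,v}
L_s = w_uv * np.log(softmax_cond_prob(structure_embs[u], v, structure_embs))  # Eq. (3)
```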

4.3 Text-based Objective

Vertices in real-world social networks are usually accompanied by associated text information. Therefore, we propose the text-based objective to take advantage of this text information, as well as to learn text-based embeddings for vertices.


The text-based objective \(L_{t}(e)\) can be defined with various measurements. To be compatible with \(L_{s}(e)\), we define \(L_{t}(e)\) as follows:

\(L_{t}(e) = α·L_{tt}(e) + β·L_{ts}(e) + γ·L_{st}(e), (5)\)


where α, β and γ control the weights of various parts, and

\(L_{tt}(e) = w_{u,v} log p(v^t|u^t),\)

\(L_{ts}(e) = w_{u,v} log p(v^t|u^s),\)

\(L_{st}(e) = w_{u,v} log p(v^s|u^t). (6)\)


The conditional probabilities in Eq. (6) map the two types of vertex embeddings into the same representation space, but do not enforce them to be identical, in consideration of their own characteristics. Similarly, we employ the softmax function to calculate these probabilities, as in Eq. (4).

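The three weighted terms of Eqs. (5) and (6) can be combined per edge as in the sketch below, which reuses the same softmax-style conditional probability as Eq. (4). It assumes, as in the full paper, that the per-edge objective sums the structure-based and text-based parts; the weights α, β, γ and the toy embeddings are illustrative, not the tuned values.

```python
import numpy as np

def log_cond_prob(target_idx, context_emb, candidate_embs):
    """log p(target | context) with a softmax over all candidates (cf. Eq. 4)."""
    scores = candidate_embs @ context_emb
    m = scores.max()
    return scores[target_idx] - (m + np.log(np.exp(scores - m).sum()))

def text_objective(u, v, w_uv, s_embs, t_embs, alpha=1.0, beta=0.3, gamma=0.3):
    """L_t(e) = alpha*L_tt + beta*L_ts + gamma*L_st for one edge (Eqs. 5-6); weights are illustrative."""
    L_tt = w_uv * log_cond_prob(v, t_embs[u], t_embs)   # log p(v^t | u^t)
    L_ts = w_uv * log_cond_prob(v, s_embs[u], t_embs)   # log p(v^t | u^s)
    L_st = w_uv * log_cond_prob(v, t_embs[u], s_embs)   # log p(v^s | u^t)
    return alpha * L_tt + beta * L_ts + gamma * L_st

rng = np.random.default_rng(0)
s_embs, t_embs = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
L_edge = (1.0 * log_cond_prob(3, s_embs[0], s_embs)      # structure-based part L_s(e), Eq. (3)
          + text_objective(0, 3, 1.0, s_embs, t_embs))   # text-based part L_t(e), Eq. (5)
```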

The structure-based embeddings are regarded as parameters, the same as in conventional NE models. But for text-based embeddings, we intend to obtain them from the associated text information of vertices. Besides, the text-based embeddings can be obtained either in context-free ways or context-aware ones. In the following sections, we introduce them in detail respectively.


4.4 Context-Free Text Embedding

There has been a variety of neural models to obtain text embeddings from a word sequence, such as convolutional neural networks (CNN) (Blunsom et al., 2014; Johnson and Zhang, 2014; Kim, 2014) and recurrent neural networks (RNN) (Kiros et al., 2015; Tai et al., 2015).


In this work, we investigate different neural networks for text modeling, including CNN, bidirectional RNN (Schuster and Paliwal, 1997) and GRU (Cho et al., 2014), and employ the best-performing CNN, which can capture the local semantic dependency among words.


Taking the word sequence of a vertex as input, CNN obtains the text-based embedding through three layers, i.e. looking-up, convolution and pooling.


Looking-up. Given a word sequence \(S_v = (w_1, w_2, \ldots, w_{n_v})\), the looking-up layer transforms each word \(w_i ∈ S_v\) into its corresponding word embedding \(w_i ∈ R^{d′}\) and obtains the embedding sequence \(S = (w_1, w_2, \ldots, w_{n_v})\). Here, d′ indicates the dimension of word embeddings.

Convolution. After looking-up, the convolution layer extracts local features of the input embedding sequence S. To be specific, it performs the convolution operation over a sliding window of length \(l\) using a convolution matrix \(C ∈ R^{d×(l×d′)}\) as follows:

\(x_i = C·S_{i:i+l−1} + b, (7)\)

where \(S_{i:i+l−1}\) denotes the concatenation of word embeddings within the i-th window and b is the bias vector. Note that, we add zero padding vectors (Hu et al., 2014) at the edge of the sentence.

Max-pooling. To obtain the text embedding \(v^t\), we operate max-pooling and a non-linear transformation over \(\{x_0^i, \ldots, x_n^i\}\), where the subscript indexes convolution windows and the superscript i indexes dimensions, as follows:

\(r_i = tanh(max(x_0^i, \ldots, x_n^i)). (8)\)

At last, we encode the text information of a vertex with CNN and obtain its text-based embedding \(v^t = [r_1, . . . , r_d]^T\) . As \(v^t\) is irrelevant to the other vertices it interacts with, we name it as context-free text embedding.

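The three layers described above can be sketched in a few lines of NumPy. The toy vocabulary, dimensions and random parameters below are illustrative assumptions; a real implementation would learn C, b and the word embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
d_word, d, l = 8, 6, 3                      # word-embedding dim d', output dim d, window length l
vocab = {"network": 0, "embedding": 1, "relation": 2, "modeling": 3}
word_embs = rng.normal(size=(len(vocab), d_word))         # looking-up table
C = rng.normal(size=(d, l * d_word))                      # convolution matrix C in R^{d x (l*d')}
b = np.zeros(d)                                           # bias vector

def context_free_text_embedding(words):
    """Encode a word sequence into the context-free text embedding v^t."""
    S = word_embs[[vocab[w] for w in words]]              # looking-up layer
    pad = np.zeros((l - 1, d_word))
    S = np.vstack([S, pad])                               # zero padding at the sentence edge
    windows = [S[i:i + l].reshape(-1) for i in range(len(words))]
    X = np.stack([C @ w + b for w in windows])            # Eq. (7): one vector x_i per window
    return np.tanh(X.max(axis=0))                         # Eq. (8): max-pool over windows, then tanh

v_t = context_free_text_embedding(["network", "embedding", "relation", "modeling"])
print(v_t.shape)                                          # (d,), the text-based embedding v^t
```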

4.5 Context-Aware Text Embedding

As stated before, we assume that a specific vertex plays different roles when interacting with other vertices. In other words, each vertex should have its own points of focus with respect to a specific neighbor, which leads to its context-aware text embeddings.

To achieve this, we employ mutual attention to obtain context-aware text embedding. It enables the pooling layer in CNN to be aware of the vertex pair in an edge, in a way that text information from a vertex can directly affect the text embedding of the other vertex, and vice versa.


In Fig. 2, we give an illustration of the generating process of context-aware text embedding. Given an edge \(e_{u,v}\) with two corresponding text sequences \(S_u\) and \(S_v\), we can get the matrices \(P∈R^{d×m}\) and \(Q∈R^{d×n}\) through the convolution layer. Here, m and n represent the lengths of \(S_u\) and \(S_v\) respectively. By introducing an attentive matrix \(A∈R^{d×d}\), we compute the correlation matrix \(F∈R^{m×n}\) as follows:

\(F = tanh(P^TAQ). (9)\)

Note that, each element \(F_{i,j}\) in F represents the pair-wise correlation score between two hidden vectors, i.e., \(P_i\) and \(Q_j\).


After that, we conduct pooling operations along the rows and columns of F to generate the importance vectors, named row-pooling and column-pooling respectively. According to our experiments, mean-pooling performs better than max-pooling. Thus, we employ the mean-pooling operation as follows:

\(g_i^p = mean(F_{i,1}, \ldots, F_{i,n}),\)

\(g_i^q = mean(F_{1,i}, \ldots, F_{m,i}). (10)\)

The importance vectors of P and Q are obtained as \(g^p = [g_1^p, \ldots, g_m^p]^T\) and \(g^q = [g_1^q, \ldots, g_n^q]^T\). We then employ the softmax function to transform the importance vectors \(g^p\) and \(g^q\) into the attention vectors \(a^p\) and \(a^q\). For example, the i-th element of \(a^p\) is

\(a_i^p = exp(g_i^p) / \sum_{j∈[1,m]} exp(g_j^p). (11)\)


At last, the context-aware text embeddings of u and v are computed as

\(u_{(v)}^t = Pa^p,\)

\(v_{(u)}^t = Qa^q.\)

Now, given an edge \((u, v)\), we can obtain the context-aware embeddings of vertices with their structure embeddings and context-aware text embeddings as \(u_{(v)} = u^s ⊕ u_{(v)}^t\) and \(v_{(u)} = v^s ⊕ v_{(u)}^t\).

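Putting Eqs. (9)-(11) together, the mutual attention step can be sketched as follows. P, Q and the attentive matrix A are random stand-ins for the convolution outputs and the learned parameter.

```python
import numpy as np

def mutual_attention(P, Q, A):
    """Context-aware text embeddings u_(v)^t, v_(u)^t from convolution outputs P (d x m), Q (d x n)."""
    F = np.tanh(P.T @ A @ Q)                 # Eq. (9): correlation matrix F in R^{m x n}
    g_p = F.mean(axis=1)                     # Eq. (10): row mean-pooling, importance of u's windows
    g_q = F.mean(axis=0)                     # Eq. (10): column mean-pooling, importance of v's windows
    a_p = np.exp(g_p) / np.exp(g_p).sum()    # Eq. (11): softmax attention vectors
    a_q = np.exp(g_q) / np.exp(g_q).sum()
    return P @ a_p, Q @ a_q                  # u_(v)^t = P a^p,  v_(u)^t = Q a^q

rng = np.random.default_rng(0)
d, m, n = 6, 5, 7                            # embedding dim and the two text lengths
P, Q = rng.normal(size=(d, m)), rng.normal(size=(d, n))
A = rng.normal(size=(d, d))                  # attentive matrix A in R^{d x d}
u_v_t, v_u_t = mutual_attention(P, Q, A)     # both have shape (d,)
```

The final context-aware vertex embeddings for the edge then follow from the concatenation with the structure-based embeddings described above.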

4.6 Optimization of CANE

According to Eq. (3) and Eq. (6), CANE aims to maximize several conditional probabilities between \(u ∈ \{u^s, u_{(v)}^t\}\) and \(v ∈ \{v^s, v_{(u)}^t\}\). It is intuitive that optimizing the conditional probability using the softmax function is computationally expensive. Thus, we employ negative sampling (Mikolov et al., 2013b) and transform the objective into the following form:

\(log σ(u^T · v) + \sum_{i=1}^{k} E_{z∼P(v)}[log σ(−u^T · z)],\)

where k is the number of negative samples and σ represents the sigmoid function. \(P(v) ∝ d_v^{3/4}\) denotes the distribution of vertices, where \(d_v\) is the out-degree of v.

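A minimal sketch of the negative-sampling form above, assuming toy embeddings, degrees and vertex count; σ is the sigmoid and negative vertices are drawn from \(P(v) ∝ d_v^{3/4}\).

```python
import numpy as np

def neg_sampling_objective(u_emb, v_emb, all_embs, out_degrees, k=1, rng=None):
    """log sigma(u·v) + sum over k samples z ~ P(v) of log sigma(-u·z)."""
    rng = rng or np.random.default_rng()
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    p = out_degrees ** 0.75
    p = p / p.sum()                                   # vertex distribution P(v)
    obj = np.log(sigmoid(u_emb @ v_emb))              # positive edge term
    for z in rng.choice(len(all_embs), size=k, p=p):  # k negative samples
        obj += np.log(sigmoid(-u_emb @ all_embs[z]))
    return obj

rng = np.random.default_rng(0)
embs = rng.normal(size=(5, 4))
degrees = np.array([3.0, 1.0, 2.0, 4.0, 1.0])
print(neg_sampling_objective(embs[0], embs[3], embs, degrees, k=1, rng=rng))
```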

Afterward, we employ Adam (Kingma and Ba, 2015) to optimize the transformed objective. Note that CANE is capable of handling zero-shot scenarios, by generating text embeddings for new vertices with the well-trained CNN.


5 Experiments

To investigate the effectiveness of CANE on modeling relationships between vertices, we conduct experiments on link prediction on several real-world datasets. Besides, we also employ vertex classification to verify whether the context-aware embeddings of a vertex can compose a high-quality context-free embedding in return.


5.1 Datasets

We select three real-world network datasets as follows:

Cora is a typical paper citation network constructed by McCallum et al. (2000). After filtering out papers without text information, there are 2,277 machine learning papers in this network, which are divided into 7 categories.

HepTh (High Energy Physics Theory) is another citation network from arXiv released by Leskovec et al. (2005). We filter out papers without abstract information and finally retain 1,038 papers.

Zhihu is the largest online Q&A website in China. Users follow each other and answer questions on this site. We randomly crawl 10,000 active users from Zhihu, and take the descriptions of their concerned topics as text information. The detailed statistics are listed in Table 1.


5.2 Baselines

We employ the following methods as baselines:

Structure-only:

MMB (Airoldi et al., 2008) (Mixed Membership Stochastic Blockmodel) is a conventional graphical model of relational data. It allows each vertex to randomly select a different "topic" when forming an edge.

DeepWalk (Perozzi et al., 2014) performs random walks over networks and employs the Skip-Gram model (Mikolov et al., 2013a) to learn vertex embeddings.

LINE (Tang et al., 2015) learns vertex embeddings in large-scale networks using first-order and second-order proximities.

Node2vec (Grover and Leskovec, 2016) proposes a biased random walk algorithm based on DeepWalk to explore the neighborhood structure more efficiently.


Structure and Text:

Naive Combination: We simply concatenate the best-performing structure-based embeddings with CNN-based embeddings to represent the vertices.

TADW (Yang et al., 2015) employs matrix factorization to incorporate text features of vertices into network embeddings.

CENE (Sun et al., 2016) leverages both structural and textual information by regarding text content as a special kind of vertices, and optimizes the probabilities of heterogeneous links.


5.3 Evaluation Metrics and Experiment Settings

For link prediction, we adopt a standard evaluation metric AUC (Hanley and McNeil, 1982), which represents the probability that vertices in a random unobserved link are more similar than those in a random nonexistent link.

For vertex classification, we employ L2-regularized logistic regression (L2R-LR) (Fan et al., 2008) to train classifiers, and evaluate the classification accuracies of various methods.


To be fair, we set the embedding dimension to 200 for all methods. In LINE, we set the number of negative samples to 5; we learn the 100-dimensional first-order and second-order embeddings respectively, and concatenate them to form the 200-dimensional embeddings. In node2vec, we employ grid search and select the best-performing hyper-parameters for training. We also apply grid search to set the hyper-parameters α, β and γ in CANE. Besides, we set the number of negative samples k to 1 in CANE to speed up the training process. To demonstrate the effectiveness of the attention mechanism and the two types of objectives in Eqs. (3) and (6), we design three versions of CANE for evaluation, i.e., CANE with text only, CANE without attention, and CANE.


5.4 Link Prediction

As shown in Table 2, Table 3 and Table 4, we evaluate the AUC values while removing different ratios of edges on Cora, HepTh and Zhihu respectively. Note that, when we only keep 5% of edges for training, most vertices are isolated, which results in the poor and meaningless performance of all the methods. Thus, we omit the results under this training ratio. From these tables, we have the following observations:


(1) Our proposed CANE consistently achieves significant improvement compared to all the baselines on all different datasets and different training ratios. It indicates the effectiveness of CANE when applied to the link prediction task, and verifies that CANE has the capability of modeling relationships between vertices precisely.


(2) What calls for special attention is that both CENE and TADW exhibit unstable performance under various training ratios. Specifically, CENE performs poorly under small training ratios, because it reserves many more parameters (e.g. convolution kernels and word embeddings) than TADW, which need more data for training. Different from CENE, TADW performs much better under small training ratios, because DeepWalk-based methods can explore the sparse network structure well through random walks even with limited edges. However, it achieves poor performance under large ones, due to its simplicity and the limitation of the bag-of-words assumption. On the contrary, CANE has stable performance in various situations. It demonstrates the flexibility and robustness of CANE.


(3) By introducing the attention mechanism, the learnt context-aware embeddings obtain considerable improvements over the ones without attention. It verifies our assumption that a specific vertex should play different roles when interacting with other vertices, and thus benefits the relevant link prediction task.


To summarize, all the above observations demonstrate that CANE can learn high-quality context-aware embeddings, which are conducive to estimating the relationship between vertices precisely. Moreover, the experimental results on the link prediction task demonstrate the effectiveness and robustness of CANE.


5.5 Vertex Classification

In CANE, we obtain various embeddings of a vertex according to the vertex it connects to. It’s intuitive that the obtained context-aware embeddings are naturally applicable to link prediction task. However, network analysis tasks, such as vertex classification and clustering, require a global embedding, rather than several context-aware embeddings for each vertex.


To demonstrate the capability of CANE to solve these issues, we generate the global embedding of a vertex u by simply averaging all of its context-aware embeddings as follows:

\(u = \frac{1}{N} \sum_{v} u_{(v)},\)


where N indicates the number of context-aware embeddings of u.

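As a minimal sketch of this averaging step (the per-neighbor context-aware embeddings below are random placeholders with the 200-dimensional setting used in the experiments):

```python
import numpy as np

def global_embedding(context_aware_embs):
    """Average a vertex's N context-aware embeddings into one global embedding."""
    return np.mean(np.stack(context_aware_embs), axis=0)

rng = np.random.default_rng(0)
u_embs = [rng.normal(size=200) for _ in range(3)]   # u's context-aware embeddings, one per neighbor
u_global = global_embedding(u_embs)                 # fed to the vertex classifier
```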

With the generated global embeddings, we conduct 2-fold cross-validation and report the average accuracy of vertex classification on Cora. As shown in Fig. 3, we observe that:


(1) CANE achieves comparable performance with the state-of-the-art model CENE. This indicates that the learnt context-aware embeddings can be transformed into high-quality context-free embeddings through a simple averaging operation, which can then be employed for other network analysis tasks.


(2) With the introduction of the mutual attention mechanism, CANE achieves an encouraging improvement over the version without attention, which is in accordance with the results of link prediction. It indicates that CANE is flexible for various network analysis tasks.


5.6 Case Study

To demonstrate the significance of mutual attention in selecting meaningful features from text information, we visualize the heat maps of two vertex pairs in Fig. 4. Note that every word in this figure is accompanied by a background color. The stronger the background color is, the larger the weight of this word is. The weight of each word is calculated according to the attention weights as follows.


For each vertex pair, we can get the attention weight of each convolution window according to Eq. (11). To obtain the weights of words, we assign the attention weight to each word in this window, and add the attention weights of a word together as its final weight.

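A sketch of this word-weight computation: each convolution window's attention weight from Eq. (11) is assigned to every word the window covers, and a word's final weight is the sum over all windows containing it. The window length, sentence and attention values are illustrative.

```python
import numpy as np

def word_weights(words, window_attn, l=3):
    """Distribute each window's attention weight to its l words and sum per word."""
    weights = np.zeros(len(words))
    for i, a in enumerate(window_attn):      # window i covers words i .. i+l-1
        weights[i:i + l] += a
    return dict(zip(words, weights))

words = ["complex", "stochastic", "models", "for", "reinforcement", "learning"]
window_attn = np.array([0.3, 0.2, 0.1, 0.1, 0.2, 0.1])   # a^p entries, one per window
print(word_weights(words, window_attn))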

The proposed attention mechanism makes the relations between vertices explicit and interpretable. We select three connected vertices in Cora for example, denoted as A, B and C. From Fig. 4, we observe that, though both B and C have citation relations with the same paper A, they concern different parts of A. The attention weights over A in edge #1 are assigned to "reinforcement learning". On the contrary, the weights in edge #2 are assigned to "machine learning", "supervised learning algorithms" and "complex stochastic models". Moreover, all these key elements in A can find corresponding words in B and C. It is intuitive that these key elements give an exact explanation of the citation relations. The discovered significant correlations between vertex pairs reflect the effectiveness of the mutual attention mechanism, as well as the capability of CANE for modeling relations precisely.


6 Conclusion and Future Work

In this paper, we propose the concept of Context-Aware Network Embedding (CANE) for the first time, which aims to learn various context-aware embeddings for a vertex according to the neighbors it interacts with. Specifically, we implement CANE on text-based information networks with the proposed mutual attention mechanism, and conduct experiments on several real-world information networks. Experimental results on link prediction demonstrate that CANE is effective for modeling the relationship between vertices. Besides, the learnt context-aware embeddings can compose high-quality context-free embeddings.

We will explore the following directions in future work:


(1) We have investigated the effectiveness of CANE on text-based information networks. In future, we will strive to implement CANE on a wider variety of information networks with multi-modal data, such as labels, images and so on.


(2) CANE encodes latent relations between vertices into their context-aware embeddings. Furthermore, there usually exist explicit relations in social networks (e.g., family, friend and colleague relations between social network users), which are expected to be critical to NE. Thus, we want to explore how to incorporate and predict these explicit relations between vertices in NE.


Acknowledgements

This work is supported by the 973 Program (No. 2014CB340501), the National Natural Science Foundation of China (NSFC No. 61572273, 61532010, 61661146007), and Tsinghua University Initiative Scientific Research Program (20151080406).


Paper: Tu et al., ACL 2017. CANE: Context-Aware Network Embedding for Relation Modeling. http://www.aclweb.org/anthology/P17-1158
