# CANE: Context-Aware Network Embedding for Relation Modeling

Network embedding (NE) plays a critical role in network analysis, due to its ability to represent vertices with efficient low-dimensional embedding vectors. However, existing NE models aim to learn a fixed context-free embedding for each vertex and neglect the diverse roles a vertex plays when interacting with other vertices. In this paper, we assume that one vertex usually shows different aspects when interacting with different neighbor vertices, and should own different embeddings respectively. Therefore, we present Context-Aware Network Embedding (CANE), a novel NE model to address this issue. CANE learns context-aware embeddings for vertices with a mutual attention mechanism and is expected to model the semantic relationships between vertices more precisely. In experiments, we compare our model with existing NE models on three real-world datasets. Experimental results show that CANE achieves significant improvement over state-of-the-art methods on link prediction and comparable performance on vertex classification. The source code and datasets can be obtained from https://github.com/thunlp/CANE.

1 Introduction

Network embedding (NE), i.e., network representation learning (NRL), aims to map the vertices of a network into a low-dimensional space according to their structural roles in the network. NE provides an efficient and effective way to represent and manage large-scale networks, alleviating the computation and sparsity issues of conventional symbol-based representations. Hence, NE has attracted much research interest in recent years (Perozzi et al., 2014; Tang et al., 2015; Grover and Leskovec, 2016), and achieves promising performance on many network analysis tasks including link prediction, vertex classification, and community detection.

In real-world social networks, it is intuitive that one vertex may demonstrate various aspects when interacting with different neighbor vertices. For example, a researcher usually collaborates with various partners on diverse research topics (as illustrated in Fig. 1), a social-media user contacts various friends sharing distinct interests, and a web page links to multiple pages for different purposes. However, most existing NE methods assign only a single embedding vector to each vertex, which gives rise to the following two inevitable issues: (1) These methods cannot flexibly cope with the aspect transition of a vertex when it interacts with different neighbors. (2) In these models, a vertex tends to force the embeddings of its neighbors close to each other, which may not always be the case. For example, the left user and the right user in Fig. 1 share few common interests, but are learned to be close to each other since they both link to the middle person. This accordingly makes vertex embeddings indiscriminative.

To address these issues, we propose a Context-Aware Network Embedding (CANE) framework for modeling relationships between vertices precisely. More specifically, we present CANE on information networks, where each vertex also contains rich external information such as text, labels or other metadata, and the significance of context is even more critical for NE in this scenario. Without loss of generality, we implement CANE on text-based information networks in this paper, but it can be easily extended to other types of information networks.

In conventional NE models, each vertex is represented as a static embedding vector, denoted as a context-free embedding. In contrast, CANE assigns dynamic embeddings to a vertex according to the different neighbors it interacts with, named context-aware embeddings. Take a vertex u and its neighbor vertex v for example. The context-free embedding of u remains unchanged when interacting with different neighbors. On the contrary, the context-aware embedding of u is dynamic when confronting different neighbors.

When u interacts with v, their context embeddings concerning each other are derived from their text information, $$S_u$$ and $$S_v$$ respectively. For each vertex, we can easily use neural models, such as convolutional neural networks (Blunsom et al., 2014; Johnson and Zhang, 2014; Kim, 2014) and recurrent neural networks (Kiros et al., 2015; Tai et al., 2015), to build context-free text-based embeddings. In order to realize context-aware text-based embeddings, we introduce a selective attention scheme and build mutual attention between u and v into these neural models. The mutual attention is expected to guide the neural models to emphasize those words that are focused on by the neighbor vertex and eventually obtain context-aware embeddings.

Both context-free embeddings and context-aware embeddings of each vertex can be efficiently learned together via concatenation using existing NE methods such as DeepWalk (Perozzi et al., 2014), LINE (Tang et al., 2015) and node2vec (Grover and Leskovec, 2016).

We conduct experiments on three real-world datasets from different areas. Experimental results on link prediction reveal the effectiveness of our framework compared to other state-of-the-art methods. The results suggest that context-aware embeddings are critical for network analysis, in particular for tasks involving complicated interactions between vertices such as link prediction. We also explore the performance of our framework via vertex classification and case studies, which again confirms the flexibility and superiority of our models.

2 Related Work

With the rapid growth of large-scale social networks, network embedding, i.e., network representation learning, has been proposed as a critical technique for network analysis tasks.

In recent years, a large number of NE models have been proposed to learn efficient vertex embeddings (Tang and Liu, 2009; Cao et al., 2015; Wang et al., 2016; Tu et al., 2016a). For example, DeepWalk (Perozzi et al., 2014) performs random walks over networks and introduces an efficient word representation learning model, Skip-Gram (Mikolov et al., 2013a), to learn network embeddings. LINE (Tang et al., 2015) optimizes the joint and conditional probabilities of edges in large-scale networks to learn vertex representations. Node2vec (Grover and Leskovec, 2016) modifies the random walk strategy in DeepWalk into biased random walks to explore the network structure more efficiently. Nevertheless, most of these NE models only encode structural information into vertex embeddings, without considering the heterogeneous information that accompanies vertices in real-world social networks.

To address this issue, researchers have made great efforts to incorporate heterogeneous information into conventional NE models. For instance, Yang et al. (2015) present text-associated DeepWalk (TADW) to improve matrix-factorization-based DeepWalk with text information. Tu et al. (2016b) propose max-margin DeepWalk (MMDW) to learn discriminative network representations by utilizing the labeling information of vertices. Chen et al. (2016) introduce group-enhanced network embedding (GENE) to integrate existing group information into NE. Sun et al. (2016) regard text content as a special kind of vertex, and propose context-enhanced network embedding (CENE), leveraging both structural and textual information to learn network embeddings.

To the best of our knowledge, all existing NE models focus on learning context-free embeddings, but ignore the diverse roles a vertex plays when interacting with others. In contrast, we assume that a vertex has different embeddings according to which vertex it interacts with, and propose CANE to learn context-aware vertex embeddings.

3 Problem Formulation

We first give the basic notations and definitions used in this work. Suppose there is an information network G = (V, E, T), where V is the set of vertices, E ⊆ V × V is the set of edges between vertices, and T denotes the text information of vertices. Each edge $$e_{u,v} ∈ E$$ represents the relationship between two vertices (u, v), with an associated weight $$w_{u,v}$$. The text information of a specific vertex v ∈ V is represented as a word sequence $$S_v = (w_1, w_2, \ldots, w_{n_v})$$, where $$n_v = |S_v|$$. NRL aims to learn a low-dimensional embedding $$v ∈ R^d$$ for each vertex v ∈ V according to its network structure and associated information, e.g., text and labels. Note that d ≪ |V| is the dimension of the representation space.

Deﬁnition 1. Context-free Embeddings: Conventional NRL models learn context-free embedding for each vertex. It means the embedding of a vertex is ﬁxed and won’t change with respect to its context information (i.e., another vertex it interacts with).

Definition 2. Context-aware Embeddings: Different from existing NRL models that learn context-free embeddings, CANE learns various embeddings for a vertex according to its different contexts. Specifically, for an edge $$e_{u,v}$$, CANE learns context-aware embeddings $$v_{(u)}$$ and $$u_{(v)}$$.

4 The Method

4.1 Overall Framework

To make full use of both network structure and associated text information, we propose two types of embeddings for a vertex v, i.e., the structure-based embedding $$v^s$$ and the text-based embedding $$v^t$$. The structure-based embedding captures the information in the network structure, while the text-based embedding captures the textual meanings lying in the associated text information. With these embeddings, we can simply concatenate them and obtain the vertex embedding as $$v = v^s ⊕ v^t$$, where ⊕ indicates the concatenation operation. Note that the text-based embedding $$v^t$$ can be either context-free or context-aware, which will be introduced in detail in Sections 4.4 and 4.5 respectively. When $$v^t$$ is context-aware, the overall vertex embedding v will be context-aware as well.
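
As a minimal illustration of this composition, the following Python sketch (using NumPy, with hypothetical dimensions) concatenates a structure-based embedding and a text-based embedding into the final vertex embedding:

```python
import numpy as np

# Hypothetical dimensions: both halves are 100-dimensional,
# giving a 200-dimensional vertex embedding as in the experiments.
d_s, d_t = 100, 100

v_s = np.random.randn(d_s)  # structure-based embedding v^s (a learned parameter)
v_t = np.random.randn(d_t)  # text-based embedding v^t (produced by the text encoder)

# v = v^s ⊕ v^t, where ⊕ is concatenation.
v = np.concatenate([v_s, v_t])
assert v.shape == (d_s + d_t,)
```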

The overall objective of CANE maximizes the log-likelihood over all edges, $$L = \sum_{e∈E} L(e)$$, where the log-likelihood of each edge consists of a structure-based part and a text-based part, i.e., $$L(e) = L_s(e) + L_t(e)$$. In the following parts, we introduce the two objectives in detail.

4.2 Structure-based Objective

Without loss of generality, we assume the network is directed, as an undirected edge can be considered as two directed edges with opposite directions and equal weights. Thus, the structure-based objective aims to measure the log-likelihood of a directed edge using the structure-based embeddings as

$$L_{s}(e) = w_{u,v} log p(v^s|u^s), (3)$$

where the conditional probability of v given u is defined with a softmax function over all vertices:

$$p(v^s|u^s) = \frac{exp(u^s·v^s)}{\sum_{z∈V} exp(u^s·z^s)}. (4)$$
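
As a rough illustration, here is a NumPy sketch of this objective under hypothetical settings (a tiny vertex set and randomly initialized structure-based embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
num_vertices, d = 5, 8
emb = rng.normal(size=(num_vertices, d))  # structure-based embeddings, one row per vertex

def log_p_softmax(u, v, emb):
    """log p(v^s | u^s): softmax over all vertices of the dot product (Eq. 4)."""
    scores = emb @ emb[u]                  # u^s · z^s for every candidate vertex z
    return scores[v] - np.log(np.exp(scores).sum())

def structure_objective(u, v, w_uv, emb):
    """L_s(e) = w_{u,v} * log p(v^s | u^s) (Eq. 3)."""
    return w_uv * log_p_softmax(u, v, emb)

print(structure_objective(0, 1, 1.0, emb))
```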

4.3 Text-based Objective

Vertices in real-world social networks are usually accompanied by associated text information. Therefore, we propose the text-based objective to take advantage of this text information, as well as to learn text-based embeddings for vertices.

The text-based objective $$L_{t}(e)$$ can be deﬁned with various measurements. To be compatible with $$L_{s}(e)$$, we deﬁne $$L_{t}(e)$$ as follows:

$$L_{t}(e) = α·L_{tt}(e) + β·L_{ts}(e) + γ·L_{st}(e), (5)$$

where α, β and γ control the weights of various parts, and

$$L_{tt}(e) = w_{u,v} log p(v^t|u^t),$$

$$L_{ts}(e) = w_{u,v} log p(v^t|u^s), (6)$$

$$L_{st}(e) = w_{u,v} log p(v^s|u^t).$$

The conditional probabilities in Eq. (6) map the two types of vertex embeddings into the same representation space, but do not enforce them to be identical, in consideration of their own characteristics. Similarly, we employ the softmax function to calculate the probabilities, as in Eq. (4).
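
The following sketch (NumPy; the weights α, β, γ and the embedding tables are hypothetical placeholders) composes the three terms of Eq. (6) into Eq. (5), reusing the same softmax log-probability for every source/target pairing:

```python
import numpy as np

rng = np.random.default_rng(0)
num_vertices, d = 5, 8
emb_s = rng.normal(size=(num_vertices, d))  # structure-based embeddings v^s
emb_t = rng.normal(size=(num_vertices, d))  # text-based embeddings v^t (from the text encoder)

def log_p(src_vec, tgt_idx, tgt_emb):
    """log p(target | source) with a softmax over all candidate target vertices."""
    scores = tgt_emb @ src_vec
    return scores[tgt_idx] - np.log(np.exp(scores).sum())

def text_objective(u, v, w_uv, alpha=1.0, beta=0.3, gamma=0.3):
    l_tt = w_uv * log_p(emb_t[u], v, emb_t)  # p(v^t | u^t)
    l_ts = w_uv * log_p(emb_s[u], v, emb_t)  # p(v^t | u^s)
    l_st = w_uv * log_p(emb_t[u], v, emb_s)  # p(v^s | u^t)
    return alpha * l_tt + beta * l_ts + gamma * l_st

print(text_objective(0, 1, 1.0))
```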

The structure-based embeddings are regarded as parameters, the same as in conventional NE models. But for text-based embeddings, we intend to obtain them from the associated text information of vertices. Besides, the text-based embeddings can be obtained in either context-free or context-aware ways. In the following sections, we introduce the two ways respectively.

4.4 Context-Free Text Embedding

There has been a variety of neural models to obtain text embeddings from a word sequence, such as convolutional neural networks (CNN) (Blunsom et al., 2014; Johnson and Zhang, 2014; Kim, 2014) and recurrent neural networks (RNN) (Kiros et al., 2015; Tai et al., 2015).

In this work, we investigate different neural networks for text modeling, including CNN, bidirectional RNN (Schuster and Paliwal, 1997) and GRU (Cho et al., 2014), and employ CNN, which performs best and can capture the local semantic dependencies among words.

Taking the word sequence of a vertex as input, CNN obtains the text-based embedding through three layers, i.e. looking-up, convolution and pooling.


Looking-up. Given a word sequence $$S_v = (w_1, w_2, \ldots, w_{n_v})$$, the looking-up layer transforms each word $$w_i ∈ S_v$$ into its corresponding word embedding $$\mathbf{w}_i ∈ R^{d′}$$ and obtains the embedding sequence $$S = (\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_{n_v})$$. Here, d′ indicates the dimension of word embeddings.

Convolution. After looking-up, the convolution layer extracts local features of the input embedding sequence S. To be specific, it performs a convolution operation over a sliding window of length $$l$$ using a convolution matrix $$C ∈ R^{d×(l×d′)}$$ as follows:

$$x_i = C·S_{i:i+l−1} + b, (7)$$

where $$S_{i:i+l−1}$$ denotes the concatenation of word embeddings within the i-th window and b is the bias vector. Note that, we add zero padding vectors (Hu et al., 2014) at the edge of the sentence.

Max-pooling. To obtain the text embedding $$v^t$$, we operate max-pooling and a non-linear transformation over $$\{x_0^i, \ldots, x_n^i\}$$ (the i-th dimension of the convolution outputs) as follows:

$$r_i = tanh(max(x_0^i, \ldots, x_n^i)). (8)$$

At last, we encode the text information of a vertex with CNN and obtain its text-based embedding $$v^t = [r_1, \ldots, r_d]^T$$. As $$v^t$$ does not depend on the other vertices it interacts with, we name it the context-free text embedding.

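As a forward-pass illustration of this encoder, here is a NumPy sketch with hypothetical vocabulary size, window length and dimensions (in the real model, the word embeddings, C and b are learned parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_word, d, l = 1000, 50, 100, 3   # d': word dim, d: text embedding dim, l: window

word_emb = rng.normal(size=(vocab_size, d_word))   # looking-up table
C = rng.normal(size=(d, l * d_word)) * 0.1         # convolution matrix C
b = np.zeros(d)                                    # bias vector

def cnn_text_embedding(word_ids):
    # Looking-up: word ids -> embedding sequence S of shape (n_v, d')
    S = word_emb[word_ids]
    # Zero padding at the edges of the sentence
    pad = np.zeros(((l - 1) // 2, d_word))
    S = np.vstack([pad, S, pad])
    # Convolution (Eq. 7): x_i = C · S_{i:i+l-1} + b
    windows = [S[i:i + l].reshape(-1) for i in range(len(S) - l + 1)]
    X = np.stack([C @ w + b for w in windows])     # shape (n, d)
    # Max-pooling over window positions plus tanh (Eq. 8), per dimension
    return np.tanh(X.max(axis=0))                  # context-free text embedding v^t

v_t = cnn_text_embedding(np.array([4, 8, 15, 16, 23]))
print(v_t.shape)  # (100,)
```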

4.5 Context-Aware Text Embedding

As stated before, we assume that a specific vertex plays different roles when interacting with other vertices. In other words, each vertex has its own points of focus with respect to a specific neighbor vertex, which leads to its context-aware text embeddings.

To achieve this, we employ mutual attention to obtain context-aware text embeddings. It enables the pooling layer of the CNN to be aware of the vertex pair in an edge, so that the text information of one vertex can directly affect the text embedding of the other vertex, and vice versa.

In Fig. 2, we give an illustration of the generation process of context-aware text embeddings. Given an edge $$e_{u,v}$$ with two corresponding text sequences $$S_u$$ and $$S_v$$, we can get the matrices $$P∈R^{d×m}$$ and $$Q∈R^{d×n}$$ through the convolution layer. Here, m and n represent the lengths of $$S_u$$ and $$S_v$$ respectively. By introducing an attentive matrix $$A∈R^{d×d}$$, we compute the correlation matrix $$F∈R^{m×n}$$ as follows:

$$F = tanh(P^TAQ). (9)$$

Note that, each element $$F_{i,j}$$ in F represents the pair-wise correlation score between two hidden vectors, i.e., $$P_i$$ and $$Q_j$$.


After that, we conduct pooling operations along the rows and columns of F to generate the importance vectors, named row-pooling and column-pooling respectively. According to our experiments, mean-pooling performs better than max-pooling. Thus, we employ the mean-pooling operation as follows:

$$g_i^p = mean(F_{i,1}, \ldots, F_{i,n}),$$

$$g_i^q = mean(F_{1,i}, \ldots, F_{m,i}). (10)$$

The importance vectors of P and Q are obtained as $$g^p = [g_1^p, . . . , g_m^p]^T$$ and $$g^q =[g_1^q, . . . , g_n^q]^T$$ .


At last, we employ the softmax function to transform the importance vectors $$g^p$$ and $$g^q$$ into the attention vectors $$a^p$$ and $$a^q$$. For instance, the i-th element of $$a^p$$ is

$$a_i^p = \frac{exp(g_i^p)}{\sum_{j∈[1,m]} exp(g_j^p)}. (11)$$

Then, the context-aware text embeddings of u and v are computed as

$$u_{(v)}^t = Pa^p,$$

$$v_{(u)}^t = Qa^q. (12)$$

Now, given an edge $$(u, v)$$, we can obtain the context-aware embeddings of vertices with their structure embeddings and context-aware text embeddings, i.e., $$u_{(v)} = u^s ⊕ u_{(v)}^t$$ and $$v_{(u)} = v^s ⊕ v_{(u)}^t$$.

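A NumPy sketch of Eqs. (9)-(12) under hypothetical dimensions follows; the attentive matrix A is a learned parameter in the real model, and is randomly initialized here only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 100, 6, 9                      # embedding dim, lengths of S_u and S_v

P = rng.normal(size=(d, m))              # convolution outputs for u's text
Q = rng.normal(size=(d, n))              # convolution outputs for v's text
A = rng.normal(size=(d, d)) * 0.01       # attentive matrix (learned in practice)

F = np.tanh(P.T @ A @ Q)                 # correlation matrix, shape (m, n)  (Eq. 9)

g_p = F.mean(axis=1)                     # row-pooling: importance vector of P (Eq. 10)
g_q = F.mean(axis=0)                     # column-pooling: importance vector of Q

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

a_p, a_q = softmax(g_p), softmax(g_q)    # attention vectors (Eq. 11)

u_t_v = P @ a_p                          # context-aware text embedding u^t_(v)  (Eq. 12)
v_t_u = Q @ a_q                          # context-aware text embedding v^t_(u)
print(u_t_v.shape, v_t_u.shape)          # (100,) (100,)
```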

4.6 Optimization of CANE

According to Eqs. (3) and (6), CANE aims to maximize several conditional probabilities between $$u ∈ \{u^s, u_{(v)}^t\}$$ and $$v ∈ \{v^s, v_{(u)}^t\}$$. It is intuitive that optimizing the conditional probability with the softmax function is computationally expensive. Thus, we employ negative sampling (Mikolov et al., 2013b) and transform the objective into the following form:

$$log σ(u^T·v) + \sum_{i=1}^{k} E_{z∼P(v)}[log σ(−u^T·z)],$$

where k is the number of negative samples and σ represents the sigmoid function. $$P(v)∝d_v^{3/4}$$ denotes the distribution of vertices, where $$d_v$$ is the out-degree of v.

Afterward, we employ Adam (Kingma and Ba, 2015) to optimize the transformed objective. Note that CANE is capable of zero-shot scenarios: the well-trained CNN can generate text embeddings for new vertices.
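
For illustration, a NumPy sketch of the negative-sampled objective under hypothetical settings (a tiny vertex set, with the noise distribution $$P(v)∝d_v^{3/4}$$):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_objective(u_vec, v_vec, emb, out_degree, k=1):
    """log σ(u·v) + Σ E_{z~P(v)}[log σ(-u·z)], with P(v) ∝ d_v^{3/4}."""
    noise = out_degree ** 0.75
    noise /= noise.sum()
    obj = np.log(sigmoid(u_vec @ v_vec))
    for z in rng.choice(len(emb), size=k, p=noise):  # sample k negative vertices
        obj += np.log(sigmoid(-u_vec @ emb[z]))
    return obj

emb = rng.normal(size=(5, 8))
deg = np.array([3.0, 1.0, 4.0, 2.0, 5.0])
print(neg_sampling_objective(emb[0], emb[1], emb, deg, k=1))
```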

5 Experiments

To investigate the effectiveness of CANE in modeling relationships between vertices, we conduct link prediction experiments on several real-world datasets. Besides, we also employ vertex classification to verify whether the context-aware embeddings of a vertex can compose a high-quality context-free embedding in return.

5.1 Datasets

We select three real-world network datasets as follows:

Cora is a typical paper citation network constructed by McCallum et al. (2000). After filtering out papers without text information, 2,277 machine learning papers remain in this network, divided into 7 categories.

HepTh (High Energy Physics Theory) is another citation network from arXiv, released by Leskovec et al. (2005). We filter out papers without abstract information and finally retain 1,038 papers.

Zhihu is the largest online Q&A website in China, where users follow each other and answer questions. We randomly crawl 10,000 active users from Zhihu, and take the descriptions of the topics they follow as text information. The detailed statistics are listed in Table 1.


5.2 Baselines

We employ the following methods as baselines:

Structure-only:

MMB (Mixed Membership Stochastic Blockmodel) (Airoldi et al., 2008) is a conventional graphical model of relational data. It allows each vertex to randomly select a different "topic" when forming an edge.

DeepWalk (Perozzi et al., 2014) performs random walks over networks and employs the Skip-Gram model (Mikolov et al., 2013a) to learn vertex embeddings.

LINE (Tang et al., 2015) learns vertex embeddings in large-scale networks using ﬁrst-order and second-order proximities.

Node2vec (Grover and Leskovec, 2016) proposes a biased random walk algorithm based on DeepWalk to explore the neighborhood structure more efficiently.


Structure and Text:

Naive Combination: We simply concatenate the best-performing structure-based embeddings with the CNN-based embeddings to represent the vertices.

TADW (Yang et al., 2015) employs matrix factorization to incorporate text features of vertices into network embeddings.

CENE (Sun et al., 2016) leverages both structural and textual information by regarding text content as a special kind of vertex, and optimizes the probabilities of heterogeneous links.


5.3 Evaluation Metrics and Experiment Settings

For link prediction, we adopt the standard evaluation metric AUC (Hanley and McNeil, 1982), which represents the probability that the vertices in a random unobserved link are more similar than those in a random nonexistent link.
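
For concreteness, a small Python sketch of this evaluation protocol (using scikit-learn's roc_auc_score; the dot-product similarity and the edge lists are illustrative assumptions):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def link_prediction_auc(emb, test_edges, negative_edges):
    """AUC over held-out (unobserved) edges vs. sampled nonexistent edges.

    Similarity between two vertices is taken as the dot product of embeddings."""
    scores, labels = [], []
    for u, v in test_edges:
        scores.append(emb[u] @ emb[v]); labels.append(1)
    for u, v in negative_edges:
        scores.append(emb[u] @ emb[v]); labels.append(0)
    return roc_auc_score(labels, scores)

emb = np.random.default_rng(0).normal(size=(5, 8))
print(link_prediction_auc(emb, [(0, 1), (1, 2)], [(0, 3), (2, 4)]))
```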

For vertex classiﬁcation, we employ L2-regularized logistic regression (L2R-LR) (Fan et al., 2008) to train classiﬁers, and evaluate the classiﬁcation accuracies of various methods.

For a fair comparison, we set the embedding dimension to 200 for all methods. In LINE, we set the number of negative samples to 5; we learn 100-dimensional first-order and second-order embeddings respectively, and concatenate them to form the 200-dimensional embeddings. In node2vec, we employ grid search and select the best-performing hyper-parameters for training. We also apply grid search to set the hyper-parameters α, β and γ in CANE. Besides, we set the number of negative samples k to 1 in CANE to speed up the training process. To demonstrate the effectiveness of the attention mechanism and the two types of objectives in Eqs. (3) and (6), we design three versions of CANE for evaluation, i.e., CANE with text only, CANE without attention, and CANE.

5.4 Link Prediction

As shown in Tables 2, 3 and 4, we evaluate the AUC values while removing different ratios of edges on Cora, HepTh and Zhihu respectively. Note that, when we only keep 5% of edges for training, most vertices are isolated, which results in poor and meaningless performance for all methods. Thus, we omit the results under this training ratio. From these tables, we have the following observations:

(1) Our proposed CANE consistently achieves significant improvement compared with all the baselines on all datasets and training ratios. This indicates the effectiveness of CANE when applied to the link prediction task, and verifies that CANE has the capability of modeling relationships between vertices precisely.


(2) What calls for special attention is that both CENE and TADW exhibit unstable performance under various training ratios. Specifically, CENE performs poorly under small training ratios, because it has many more parameters (e.g., convolution kernels and word embeddings) than TADW, which need more data for training. Different from CENE, TADW performs much better under small training ratios, because DeepWalk-based methods can explore the sparse network structure well through random walks even with limited edges. However, it achieves poor performance under large ones, due to its simplicity and the limitation of the bag-of-words assumption. On the contrary, CANE performs stably in various situations, which demonstrates its flexibility and robustness.

(3) By introducing the attention mechanism, the learnt context-aware embeddings obtain considerable improvements over the ones without attention. This verifies our assumption that a specific vertex should play different roles when interacting with other vertices, which benefits the relevant link prediction task.


To summarize, all the above observations demonstrate that CANE can learn high-quality context-aware embeddings, which are conducive to estimating the relationships between vertices precisely. Moreover, the experimental results on the link prediction task demonstrate the effectiveness and robustness of CANE.

5.5 Vertex Classiﬁcation

In CANE, we obtain various embeddings of a vertex according to the vertex it connects to. It is intuitive that the obtained context-aware embeddings are naturally applicable to the link prediction task. However, other network analysis tasks, such as vertex classification and clustering, require a global embedding, rather than several context-aware embeddings, for each vertex.

To demonstrate the capability of CANE to solve these issues, we generate the global embedding of a vertex u by simply averaging all of its context-aware embeddings as follows:

$$u = \frac{1}{N} \sum_{e_{u,v}∈E} u_{(v)},$$

where N indicates the number of context-aware embeddings of u.
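
A minimal sketch of this averaging step (NumPy; the per-edge context-aware embeddings are hypothetical inputs):

```python
import numpy as np

def global_embedding(context_aware_embs):
    """Average a vertex's context-aware embeddings (one per incident edge)
    into a single context-free global embedding."""
    return np.mean(np.stack(context_aware_embs), axis=0)

rng = np.random.default_rng(0)
embs_of_u = [rng.normal(size=200) for _ in range(3)]  # u interacted with 3 neighbors
u_global = global_embedding(embs_of_u)
print(u_global.shape)  # (200,)
```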

With the generated global embeddings, we conduct 2-fold cross-validation and report the average accuracy of vertex classiﬁcation on Cora. As shown in Fig. 3, we observe that:

(1) CANE achieves comparable performance with the state-of-the-art model CENE. This shows that the learnt context-aware embeddings can be transformed into high-quality context-free embeddings through a simple averaging operation, which can be further employed in other network analysis tasks.


(2) With the introduction of the mutual attention mechanism, CANE achieves an encouraging improvement over the version without attention, which is in accordance with the results of link prediction. This indicates that CANE is flexible for various network analysis tasks.


5.6 Case Study

To demonstrate the significance of mutual attention in selecting meaningful features from text information, we visualize the heat maps of two vertex pairs in Fig. 4. Note that every word in this figure is accompanied by a background color; the stronger the background color, the larger the weight of the word. The weight of each word is calculated from the attention weights as follows.

For each vertex pair, we can get the attention weight of each convolution window according to Eq. (11). To obtain the weight of a word, we assign the attention weight of a window to each word in that window, and sum the attention weights a word receives as its final weight.
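
A small NumPy sketch of this assignment (the window length and attention values are hypothetical):

```python
import numpy as np

def word_weights(attn, n_words, l):
    """Spread each convolution window's attention weight (Eq. 11) over the
    l words it covers, then sum the contributions each word receives."""
    weights = np.zeros(n_words)
    for i, a in enumerate(attn):       # window i covers words i .. i+l-1
        weights[i:i + l] += a
    return weights

attn = np.array([0.1, 0.6, 0.2, 0.1])  # attention over 4 windows
print(word_weights(attn, n_words=6, l=3))
```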

The proposed attention mechanism makes the relations between vertices explicit and interpretable. We select three connected vertices in Cora as an example, denoted as A, B and C. From Fig. 4, we observe that, though both papers B and C cite the same paper A, they concern different parts of A. The attention weights over A in edge #1 are assigned to "reinforcement learning". On the contrary, the weights in edge #2 are assigned to "machine learning", "supervised learning algorithms" and "complex stochastic models". Moreover, all these key elements in A can find corresponding words in B and C. It is intuitive that these key elements give an exact explanation of the citation relations. The discovered significant correlations between vertex pairs reflect the effectiveness of the mutual attention mechanism, as well as the capability of CANE for modeling relations precisely.

6 Conclusion and Future Work

In this paper, we propose the concept of Context-Aware Network Embedding (CANE) for the first time, which aims to learn various context-aware embeddings for a vertex according to the neighbors it interacts with. Specifically, we implement CANE on text-based information networks with the proposed mutual attention mechanism, and conduct experiments on several real-world information networks. Experimental results on link prediction demonstrate that CANE is effective for modeling the relationships between vertices. Besides, the learnt context-aware embeddings can compose high-quality context-free embeddings.

We will explore the following directions in the future:

(1) We have investigated the effectiveness of CANE on text-based information networks. In the future, we will strive to implement CANE on a wider variety of information networks with multi-modal data, such as labels and images.


(2) CANE encodes latent relations between vertices into their context-aware embeddings. Furthermore, there usually exist explicit relations in social networks (e.g., family, friend and colleague relations between social network users), which are expected to be critical to NE. Thus, we want to explore how to incorporate and predict these explicit relations between vertices in NE.


Acknowledgements

This work is supported by the 973 Program (No. 2014CB340501), the National Natural Science Foundation of China (NSFC No. 61572273, 61532010, 61661146007), and Tsinghua University Initiative Scientiﬁc Research Program (20151080406).
