9. Building the Model
First, load the pretrained model.

```python
from transformers import BertTokenizer, BertModel
```
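A sketch of loading the tokenizer and the pretrained weights, assuming the `bert-base-uncased` checkpoint:

```python
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert = BertModel.from_pretrained('bert-base-uncased')
```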
We build the model from BERT's pretrained word embeddings and a GRU, followed by a fully connected layer. The BERT forward pass is wrapped in `with torch.no_grad()` so that the pretrained embeddings are not modified. The model is implemented as a `BERTGRUSentiment` class (an `nn.Module`), sketched below.
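A sketch of the class under the assumptions above; the constructor signature, the bidirectional GRU, and the dropout handling follow common practice for this architecture rather than anything fixed by the text:

```python
import torch
import torch.nn as nn

class BERTGRUSentiment(nn.Module):
    def __init__(self, bert, hidden_dim, output_dim, n_layers, bidirectional, dropout):
        super().__init__()
        self.bert = bert
        embedding_dim = bert.config.hidden_size
        self.rnn = nn.GRU(embedding_dim,
                          hidden_dim,
                          num_layers=n_layers,
                          bidirectional=bidirectional,
                          batch_first=True,
                          dropout=0 if n_layers < 2 else dropout)
        self.out = nn.Linear(hidden_dim * 2 if bidirectional else hidden_dim, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, text):
        # text: [batch size, sent len]
        with torch.no_grad():
            # keep the pretrained embeddings fixed: no gradients flow into BERT
            embedded = self.bert(text)[0]
        # embedded: [batch size, sent len, emb dim]
        _, hidden = self.rnn(embedded)
        # hidden: [n layers * n directions, batch size, hid dim]
        if self.rnn.bidirectional:
            hidden = self.dropout(torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1))
        else:
            hidden = self.dropout(hidden[-1, :, :])
        # hidden: [batch size, hid dim * n directions]
        return self.out(hidden)  # [batch size, output dim]
```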
Next we instantiate the model with standard hyperparameters, using a hidden dimension of 256.
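A sketch of the instantiation; `HIDDEN_DIM = 256` comes from the text, while the remaining values are typical choices rather than anything the section fixes:

```python
HIDDEN_DIM = 256
OUTPUT_DIM = 1        # a single logit for binary sentiment
N_LAYERS = 2          # assumed
BIDIRECTIONAL = True  # assumed
DROPOUT = 0.25        # assumed

model = BERTGRUSentiment(bert, HIDDEN_DIM, OUTPUT_DIM, N_LAYERS, BIDIRECTIONAL, DROPOUT)
```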
Because training the full transformer is very expensive, we freeze BERT's weights so they are not updated:

```python
for name, param in model.named_parameters():
    # the BERT submodule's parameter names start with 'bert'
    if name.startswith('bert'):
        param.requires_grad = False
```
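A quick sanity check: after freezing, only the GRU and the output layer should still require gradients.

```python
# print the parameters that will actually be updated during training
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)
```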
10. Training the Model
For the optimizer we use Adam, and for the loss function `nn.BCEWithLogitsLoss()`.
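A minimal setup sketch (`device` is the torch device defined earlier in the tutorial):

```python
import torch.optim as optim

optimizer = optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()

# move the model and the loss to the GPU if one is available
model = model.to(device)
criterion = criterion.to(device)
```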
In addition, we define a `binary_accuracy(preds, y)` helper to measure accuracy.
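A sketch using the usual approach of rounding the sigmoid output:

```python
def binary_accuracy(preds, y):
    """Return the fraction of predictions in this batch that match the labels."""
    # round the sigmoid output to 0 or 1 and compare with the labels
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float()
    return correct.sum() / len(correct)
```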
The training function loops over the batches, computes the loss and accuracy, and updates the weights.
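A sketch; apart from `model.train()` and the gradient step, it mirrors the `evaluate` function shown further below:

```python
def train(model, iterator, optimizer, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.train()  # enable dropout
    for batch in iterator:
        optimizer.zero_grad()
        predictions = model(batch.text).squeeze(1)
        loss = criterion(predictions, batch.label)
        acc = binary_accuracy(predictions, batch.label)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
```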
The evaluation function is similar to the training function, except that it:

- does not update the weights;
- has no optimizer.

With both the training and evaluation functions written, we can run the actual training. In each epoch we first update the weights, then validate with the new weights. If the validation loss is lower than the smallest value seen so far, we save the current model.
```python
def evaluate(model, iterator, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.eval()  # disable dropout
    with torch.no_grad():  # no gradients are needed during evaluation
        for batch in iterator:
            predictions = model(batch.text).squeeze(1)
            loss = criterion(predictions, batch.label)
            acc = binary_accuracy(predictions, batch.label)
            epoch_loss += loss.item()
            epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
```

The training process looks like this:
```python
import time

N_EPOCHS = 10

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):
    start_time = time.time()

    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)

    # split the elapsed seconds into minutes and seconds for the log line
    epoch_mins, epoch_secs = divmod(int(time.time() - start_time), 60)

    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'tut6-model.pt')

    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} | Val. Acc: {valid_acc*100:.2f}%')
```
The output:

```
Epoch: 01 | Epoch Time: 7m 5s
Train Loss: 0.468 | Train Acc: 76.80%
Val. Loss: 0.266 | Val. Acc: 89.47%
Epoch: 02 | Epoch Time: 7m 4s
Train Loss: 0.280 | Train Acc: 88.42%
Val. Loss: 0.244 | Val. Acc: 90.20%
Epoch: 03 | Epoch Time: 7m 4s
Train Loss: 0.239 | Train Acc: 90.48%
Val. Loss: 0.220 | Val. Acc: 91.07%
Epoch: 04 | Epoch Time: 7m 4s
Train Loss: 0.211 | Train Acc: 91.66%
Val. Loss: 0.236 | Val. Acc: 90.85%
Epoch: 05 | Epoch Time: 7m 5s
Train Loss: 0.187 | Train Acc: 92.91%
Val. Loss: 0.222 | Val. Acc: 91.12%
Epoch: 06 | Epoch Time: 7m 5s
Train Loss: 0.164 | Train Acc: 93.71%
Val. Loss: 0.251 | Val. Acc: 91.29%
Epoch: 07 | Epoch Time: 7m 4s
Train Loss: 0.137 | Train Acc: 94.94%
Val. Loss: 0.231 | Val. Acc: 90.73%
Epoch: 08 | Epoch Time: 7m 4s
Train Loss: 0.115 | Train Acc: 95.73%
Val. Loss: 0.374 | Val. Acc: 86.99%
Epoch: 09 | Epoch Time: 7m 4s
Train Loss: 0.095 | Train Acc: 96.57%
Val. Loss: 0.259 | Val. Acc: 91.22%
Epoch: 10 | Epoch Time: 7m 5s
Train Loss: 0.078 | Train Acc: 97.30%
Val. Loss: 0.282 | Val. Acc: 91.77%
```

The lowest validation loss appears at epoch 3, so that is the checkpoint that ends up saved.

11. Model Inference
Once the model is trained, we can use it for inference.
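Since the training loop saved the checkpoint with the lowest validation loss, we presumably reload it first:

```python
model.load_state_dict(torch.load('tut6-model.pt'))
```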
The `predict_sentiment` function below returns the raw probability; it could also be post-processed to return positive or negative.

```python
def predict_sentiment(model, tokenizer, sentence):
    model.eval()
    tokens = tokenizer.tokenize(sentence)
    # leave room for the two special tokens added below
    tokens = tokens[:max_input_length - 2]
    indexed = [init_token_idx] + tokenizer.convert_tokens_to_ids(tokens) + [eos_token_idx]
    tensor = torch.LongTensor(indexed).to(device)
    tensor = tensor.unsqueeze(0)  # add a batch dimension
    prediction = torch.sigmoid(model(tensor))
    return prediction.item()

predict_sentiment(model, tokenizer, "This film is terrible")  # 0.021611081436276436
predict_sentiment(model, tokenizer, "This film is great")     # 0.9428628087043762
```
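As noted above, the raw probability can be mapped to a label; a hypothetical `predict_label` wrapper (the name and the 0.5 threshold are our choices, not the tutorial's):

```python
def predict_label(model, tokenizer, sentence, threshold=0.5):
    """Turn the predicted probability into a 'positive'/'negative' string."""
    prob = predict_sentiment(model, tokenizer, sentence)
    return 'positive' if prob >= threshold else 'negative'

predict_label(model, tokenizer, "This film is terrible")  # 'negative'
```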