DL PyTorch 折桂 18: Sentiment Classification with TorchText and transformers (2)

Continuing from the previous post.

9. Building the Model

First, load the pre-trained BERT model.

from transformers import BertTokenizer, BertModel

bert = BertModel.from_pretrained('bert-base-uncased')
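
These snippets continue from part 1, where the matching tokenizer and the special-token indices used later on (init_token_idx, eos_token_idx, max_input_length) were set up. As a reminder, a minimal sketch of that setup, with the variable names assumed to carry over from part 1:

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

init_token_idx = tokenizer.cls_token_id   # [CLS] is prepended to every sequence
eos_token_idx = tokenizer.sep_token_id    # [SEP] is appended to every sequence
max_input_length = tokenizer.max_model_input_sizes['bert-base-uncased']  # 512 for this model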

We build the model from the pre-trained BERT embeddings followed by a GRU and a fully connected output layer. The BERT forward pass is wrapped in with torch.no_grad() so that no gradients flow back into the pre-trained weights.

import torch
import torch.nn as nn

class BERTGRUSentiment(nn.Module):
    def __init__(self,
                 bert,
                 hidden_dim,
                 output_dim,
                 n_layers,
                 bidirectional,
                 dropout):

        super().__init__()

        self.bert = bert
        embedding_dim = bert.config.to_dict()['hidden_size']

        self.rnn = nn.GRU(embedding_dim,
                          hidden_dim,
                          num_layers = n_layers,
                          bidirectional = bidirectional,
                          batch_first = True,
                          dropout = 0 if n_layers < 2 else dropout)

        self.out = nn.Linear(hidden_dim * 2 if bidirectional else hidden_dim, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, text):
        #text = [batch size, sent len]
        with torch.no_grad():
            embedded = self.bert(text)[0]

        #embedded = [batch size, sent len, emb dim]
        _, hidden = self.rnn(embedded)

        #hidden = [n layers * n directions, batch size, hid dim]
        if self.rnn.bidirectional:
            hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1))
        else:
            hidden = self.dropout(hidden[-1,:,:])

        #hidden = [batch size, hid dim]
        output = self.out(hidden)

        #output = [batch size, out dim]
        return output

Next, we instantiate the model with standard hyperparameters.

HIDDEN_DIM = 256
OUTPUT_DIM = 1
N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.25

model = BERTGRUSentiment(bert,
                         HIDDEN_DIM,
                         OUTPUT_DIM,
                         N_LAYERS,
                         BIDIRECTIONAL,
                         DROPOUT)

Since training the transformer itself would be far too expensive, we freeze the BERT weights so they are not updated:

for name, param in model.named_parameters():
    if name.startswith('bert'):
        param.requires_grad = False
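
To check that only the GRU and the linear output layer remain trainable, a small helper (a sketch, in the same spirit as the freeze loop above) can count the parameters that still require gradients:

def count_parameters(model):
    # only parameters with requires_grad=True will be updated by the optimizer
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')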

10. Training the Model

The optimizer is Adam and the loss function is nn.BCEWithLogitsLoss().
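
Neither is created in the snippets of this post, so here is a minimal setup sketch (the device lines are an assumption; they may already have been set up in part 1):

import torch.optim as optim

optimizer = optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()

# move the model and the loss to the GPU if one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
criterion = criterion.to(device)

In addition, we define a function that measures per-batch accuracy: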

def binary_accuracy(preds, y):
    """
    Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8
    """
    #round predictions to the closest integer
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float() #convert into float for division
    acc = correct.sum() / len(correct)
    return acc
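
A quick sanity check with made-up values illustrates the rounding: a logit of 2.0 passes through the sigmoid as roughly 0.88 and rounds to 1, while -1.0 becomes roughly 0.27 and rounds to 0.

preds = torch.tensor([2.0, -1.0])   # raw logits from the model
labels = torch.tensor([1.0, 1.0])
binary_accuracy(preds, labels)      # tensor(0.5000): one of two predictions is correct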

The training function:

def train(model, iterator, optimizer, criterion):
    epoch_loss = 0
    epoch_acc = 0

    model.train()

    for batch in iterator:
        optimizer.zero_grad()
        predictions = model(batch.text).squeeze(1)
        loss = criterion(predictions, batch.label)
        acc = binary_accuracy(predictions, batch.label)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()

    return epoch_loss / len(iterator), epoch_acc / len(iterator)

The evaluation function is similar to the training function, except that:

  1. the weights are not updated;
  2. there is no optimizer.

def evaluate(model, iterator, criterion):
    epoch_loss = 0
    epoch_acc = 0

    model.eval()

    with torch.no_grad():
        for batch in iterator:
            predictions = model(batch.text).squeeze(1)
            loss = criterion(predictions, batch.label)
            acc = binary_accuracy(predictions, batch.label)
            epoch_loss += loss.item()
            epoch_acc += acc.item()

    return epoch_loss / len(iterator), epoch_acc / len(iterator)

With both the training and evaluation functions written, we can run the actual training. In each epoch we first update the weights and then evaluate with the new weights. Whenever the validation loss is lower than the best value so far, we save the current model.

import time

N_EPOCHS = 10
best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):
    start_time = time.time()

    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)

    # measure how long the epoch took so it can be reported below
    epoch_mins, epoch_secs = divmod(int(time.time() - start_time), 60)

    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'tut6-model.pt')

    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} | Val. Acc: {valid_acc*100:.2f}%')
The training log looks like this:

Epoch: 01 | Epoch Time: 7m 5s
    Train Loss: 0.468 | Train Acc: 76.80%
     Val. Loss: 0.266 | Val. Acc: 89.47%
Epoch: 02 | Epoch Time: 7m 4s
    Train Loss: 0.280 | Train Acc: 88.42%
     Val. Loss: 0.244 | Val. Acc: 90.20%
Epoch: 03 | Epoch Time: 7m 4s
    Train Loss: 0.239 | Train Acc: 90.48%
     Val. Loss: 0.220 | Val. Acc: 91.07%
Epoch: 04 | Epoch Time: 7m 4s
    Train Loss: 0.211 | Train Acc: 91.66%
     Val. Loss: 0.236 | Val. Acc: 90.85%
Epoch: 05 | Epoch Time: 7m 5s
    Train Loss: 0.187 | Train Acc: 92.91%
     Val. Loss: 0.222 | Val. Acc: 91.12%
Epoch: 06 | Epoch Time: 7m 5s
    Train Loss: 0.164 | Train Acc: 93.71%
     Val. Loss: 0.251 | Val. Acc: 91.29%
Epoch: 07 | Epoch Time: 7m 4s
    Train Loss: 0.137 | Train Acc: 94.94%
     Val. Loss: 0.231 | Val. Acc: 90.73%
Epoch: 08 | Epoch Time: 7m 4s
    Train Loss: 0.115 | Train Acc: 95.73%
     Val. Loss: 0.374 | Val. Acc: 86.99%
Epoch: 09 | Epoch Time: 7m 4s
    Train Loss: 0.095 | Train Acc: 96.57%
     Val. Loss: 0.259 | Val. Acc: 91.22%
Epoch: 10 | Epoch Time: 7m 5s
    Train Loss: 0.078 | Train Acc: 97.30%
     Val. Loss: 0.282 | Val. Acc: 91.77%
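
Before moving on to inference, the best checkpoint saved above can be reloaded and scored on the held-out test set (a sketch; test_iterator is assumed to come from part 1 and is not shown in this post):

model.load_state_dict(torch.load('tut6-model.pt'))

test_loss, test_acc = evaluate(model, test_iterator, criterion)
print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')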

11. Model Inference

Once the model is trained, we can use it for inference.

def predict_sentiment(model, tokenizer, sentence):
    model.eval()
    tokens = tokenizer.tokenize(sentence)
    tokens = tokens[:max_input_length-2]
    indexed = [init_token_idx] + tokenizer.convert_tokens_to_ids(tokens) + [eos_token_idx]
    tensor = torch.LongTensor(indexed).to(device)
    tensor = tensor.unsqueeze(0)
    prediction = torch.sigmoid(model(tensor))
    return prediction.item()

predict_sentiment(model, tokenizer, "This film is terrible") # 0.021611081436276436
predict_sentiment(model, tokenizer, "This film is great") # 0.9428628087043762
The function above returns the raw probability; it can also be wrapped so that it returns positive or negative instead.
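
A minimal sketch of such a wrapper (the 0.5 threshold and the label strings are choices made here, not part of the original code):

def predict_label(model, tokenizer, sentence, threshold=0.5):
    # convert the probability from predict_sentiment into a class label
    prob = predict_sentiment(model, tokenizer, sentence)
    return 'positive' if prob >= threshold else 'negative'

predict_label(model, tokenizer, "This film is great") # 'positive'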

You are welcome to follow me on my other publishing channels.