
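We will build and train a character-level RNN to classify words. A character-level RNN reads a word as a sequence of characters. At each step it outputs a prediction and a "hidden state", and it feeds its previous hidden state into the next step. We take the output of the final step as the prediction, i.e. the class the word belongs to.

Specifically, we will train on a few thousand surnames from 18 languages of origin, and predict which language a name comes from based on its spelling: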

$ python predict.py Hinton
(-0.47) Scottish
(-1.52) English
(-3.57) Irish
$ python predict.py Schmidhuber
(-0.19) German
(-2.48) Czech
(-2.68) Dutch

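1. Preparing the Data

Download the data (https://download.pytorch.org/tutorial/data.zip) and extract it into the current folder.

The "data/names" directory contains 18 text files named "[Language].txt". Each file contains one name per line, mostly romanized (but we still need to convert them from Unicode to ASCII).

We will end up with a dictionary mapping each language to a list of names, {language: [names ...]}. The generic variables "category" and "line" (for language and name in our case) are used for later extensibility.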

from __future__ import unicode_literals, print_function, division
from io import open
import glob
import os

def findFiles(path): return glob.glob(path)

print(findFiles('data/names/*.txt'))

import unicodedata
import string

all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)

# Turn a Unicode string into plain ASCII, thanks to https://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

print(unicodeToAscii('Ślusàrski'))

# Build the category_lines dictionary, a list of names per language
category_lines = {}
all_categories = []

# Read a file and split it into lines
def readLines(filename):
    with open(filename, encoding='utf-8') as f:
        lines = f.read().strip().split('\n')
    return [unicodeToAscii(line) for line in lines]

for filename in findFiles('data/names/*.txt'):
    category = os.path.splitext(os.path.basename(filename))[0]
    all_categories.append(category)
    lines = readLines(filename)
    category_lines[category] = lines

n_categories = len(all_categories)

Output:

['data/names/French.txt', 'data/names/Czech.txt', 'data/names/Dutch.txt', 'data/names/Polish.txt', 'data/names/Scottish.txt', 'data/names/Chinese.txt', 'data/names/English.txt', 'data/names/Italian.txt', 'data/names/Portuguese.txt', 'data/names/Japanese.txt', 'data/names/German.txt', 'data/names/Russian.txt', 'data/names/Korean.txt', 'data/names/Arabic.txt', 'data/names/Greek.txt', 'data/names/Vietnamese.txt', 'data/names/Spanish.txt', 'data/names/Irish.txt']
Slusarski
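To see why the NFD step works, the short sketch below (assuming Python 3 and only the standard library) repeats unicodeToAscii so it runs on its own, then inspects how an accented letter decomposes: NFD splits 'Ś' into a plain 'S' plus a combining accent of category 'Mn', which the filter discards. The last line shows a limitation of this approach: letters such as 'Ł' have no decomposition, so they are dropped entirely rather than mapped to 'L'.

```python
import string
import unicodedata

all_letters = string.ascii_letters + " .,;'"

def unicodeToAscii(s):
    # Same filter as above: decompose with NFD, drop combining marks ('Mn'),
    # and keep only characters that appear in all_letters
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

# 'Ś' (U+015A) decomposes into 'S' followed by U+0301 COMBINING ACUTE ACCENT
decomposed = unicodedata.normalize('NFD', 'Ś')
print([hex(ord(c)) for c in decomposed])              # ['0x53', '0x301']
print([unicodedata.category(c) for c in decomposed])  # ['Lu', 'Mn']

print(unicodeToAscii('Ślusàrski'))  # Slusarski

# 'Ł' (U+0141) has no NFD decomposition and is not in all_letters, so it is dropped
print(unicodeToAscii('Łukasz'))     # ukasz
```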

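Now we have category_lines, a dictionary mapping each category (language) to its list of lines (names). The variable all_categories is the list of all languages, and n_categories is the number of languages; both are used later.

print(category_lines['Italian'][:5])

Output: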

['Abandonato', 'Abatangelo', 'Abatantuono', 'Abate', 'Abategiovanni']

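Turning Names into Tensors

Now that we have loaded all the names, we need to turn them into Tensors to make any use of them.

To represent a single letter, we use a "one-hot vector" of size <1 x n_letters>. A one-hot vector is filled with 0s except for a 1 at the index of the current letter. For example, "a" has index 0 and "b" has index 1, so "b" = <0 1 0 0 0 ...> (a 1 in the second position, 0 everywhere else).

To represent a word, we stack those vectors into a 3D tensor of size <line_length x 1 x n_letters>.

The extra dimension of size 1 is the batch dimension: PyTorch assumes everything is processed in batches, and here we simply use a batch size of 1.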

import torch

# Find the index of a letter in all_letters, e.g. "a" = 0
def letterToIndex(letter):
    return all_letters.find(letter)

# Just for demonstration, turn a letter into a <1 x n_letters> Tensor
def letterToTensor(letter):
    tensor = torch.zeros(1, n_letters)
    tensor[0][letterToIndex(letter)] = 1
    return tensor

# Turn a line into a <line_length x 1 x n_letters> Tensor,
# i.e. an array of one-hot letter vectors
def lineToTensor(line):
    tensor = torch.zeros(len(line), 1, n_letters)
    for li, letter in enumerate(line):
        tensor[li][0][letterToIndex(letter)] = 1
    return tensor

print(letterToTensor('J'))

print(lineToTensor('Jones').size())

Output:

tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
 0., 0., 0.]])
torch.Size([5, 1, 57])
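The loop in lineToTensor can also be expressed with torch.nn.functional.one_hot. The sketch below is a hypothetical alternative (lineToTensorOneHot is a name introduced here, not part of the tutorial); it rebuilds all_letters and lineToTensor so it runs standalone, and checks that both produce the same <line_length x 1 x n_letters> tensor.

```python
import string

import torch
import torch.nn.functional as F

all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)  # 57

# The tutorial's loop-based version, repeated so the snippet runs on its own
def lineToTensor(line):
    tensor = torch.zeros(len(line), 1, n_letters)
    for li, letter in enumerate(line):
        tensor[li][0][all_letters.find(letter)] = 1
    return tensor

# Hypothetical vectorized alternative: look up all indices, then one-hot them at once
def lineToTensorOneHot(line):
    indices = torch.tensor([all_letters.find(c) for c in line], dtype=torch.long)
    # F.one_hot returns an int64 tensor of shape <line_length x n_letters>;
    # convert to float and insert the batch dimension of size 1 in the middle
    return F.one_hot(indices, num_classes=n_letters).float().unsqueeze(1)

a = lineToTensor('Jones')
b = lineToTensorOneHot('Jones')
print(b.size())           # torch.Size([5, 1, 57])
print(torch.equal(a, b))  # True
```

One behavioral difference matters for user input passed to predict(): for a character outside all_letters, str.find returns -1, so lineToTensor silently sets the last slot (the one for "'"), whereas F.one_hot rejects the negative index with an error.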

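2. Creating the Network

Before autograd, creating a recurrent neural network in Torch involved cloning the parameters of a layer over several time steps. The layers held the hidden state and gradients, which are now handled entirely by the computation graph itself. This means you can implement an RNN in a very "pure" way, just like regular feed-forward layers.

This RNN module (mostly copied from the PyTorch for Torch users tutorial (https://pytorch.org/tutorials/beginner/former_torchies/nn_tutorial.html#example-2-recurrent-net)) is just two linear layers that operate on the input and hidden state, with a LogSoftmax layer after the output.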

import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()

        self.hidden_size = hidden_size

        # Both layers read the concatenation of the current input and the previous hidden state
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)   # next hidden state
        output = self.i2o(combined)
        output = self.softmax(output) # log-probabilities over the categories
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

n_hidden = 128
rnn = RNN(n_letters, n_hidden, n_categories)
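The point that autograd now handles the hidden state and gradients can be checked on a toy recurrence. This sketch (toy sizes and random data, chosen only for illustration) applies one shared nn.Linear five times in a loop, as the RNN above does with i2h, and a single backward() call fills in step.weight.grad with contributions from every time step, with no manual cloning of parameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

input_size, hidden_size, n_steps = 4, 3, 5  # toy sizes for illustration

# One layer shared across all time steps, playing the role of i2h
step = nn.Linear(input_size + hidden_size, hidden_size)

hidden = torch.zeros(1, hidden_size)
inputs = torch.randn(n_steps, 1, input_size)  # <n_steps x 1 x input_size>, batch size 1

for x in inputs:  # each x has shape <1 x input_size>
    # Reusing the same module at every step; autograd records each use in the graph
    hidden = step(torch.cat((x, hidden), 1))

loss = hidden.sum()
loss.backward()

# The gradient accumulates contributions from all n_steps uses of the shared weights
print(step.weight.grad.shape)  # torch.Size([3, 7])
```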

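To run one step of this network, we pass an input (in our case, the Tensor for the current letter) and a previous hidden state (which we initialize as zeros at first). We get back the output (the log-probability of each language) and the next hidden state (which we keep for the next step).

input = letterToTensor('A')
hidden = torch.zeros(1, n_hidden)

output, next_hidden = rnn(input, hidden)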

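For the sake of efficiency we don't want to create a new Tensor for every step, so we use lineToTensor instead of letterToTensor and take slices of the result. This could be further optimized by pre-computing batches of Tensors.

input = lineToTensor('Albert')
hidden = torch.zeros(1, n_hidden)

output, next_hidden = rnn(input[0], hidden)
print(output)

Output: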

tensor([[-2.8857, -2.9005, -2.8386, -2.9397, -2.8594, -2.8785, -2.9361, -2.8270,
 -2.9602, -2.8583, -2.9244, -2.9112, -2.8545, -2.8715, -2.8328, -2.8233,
 -2.9685, -2.9780]], grad_fn=<LogSoftmaxBackward>)

As you can see, the output is a <1 x n_categories> Tensor, where every item is the log-likelihood of the corresponding category (higher means more likely).
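Because the last layer is LogSoftmax, these values are log-probabilities: exponentiating them gives probabilities that sum to 1. As a rough sanity check, an untrained network should be close to uniform over 18 languages, i.e. log(1/18) ≈ -2.89, which is in line with the values around -2.9 printed above. A minimal sketch, using random scores in place of real network output:

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)
n_categories = 18

scores = torch.randn(1, n_categories)      # stand-in for the raw i2o output of one step
log_probs = nn.LogSoftmax(dim=1)(scores)   # the kind of tensor the RNN returns

probs = log_probs.exp()                    # convert log-probabilities back to probabilities
print(probs.sum().item())                  # 1.0 up to floating-point rounding

# Log-probability of a perfectly uniform guess over 18 languages
print(round(math.log(1 / n_categories), 2))  # -2.89
```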

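3. Training

3.1 Preparing for Training

Before going into training, we need a few helper functions.

The first interprets the output of the network, which we know to be a log-likelihood for each category. We can use Tensor.topk to get the index of the greatest value: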

def categoryFromOutput(output):
    top_n, top_i = output.topk(1)
    category_i = top_i[0].item()
    return all_categories[category_i], category_i

print(categoryFromOutput(output))

Output:

('Arabic', 13)
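A toy illustration of what topk returns (the tensor values here are made up): it gives both the largest values and their indices, each keeping the batch dimension, which is why categoryFromOutput indexes top_i[0]. For k = 1 it agrees with argmax; with a larger k it returns the runner-up categories in descending order, which predict() relies on further down.

```python
import torch

output = torch.tensor([[-2.0, -0.5, -1.0]])  # made-up log-probabilities for 3 categories

top_n, top_i = output.topk(1)  # both have shape <1 x 1>
print(top_n)                   # tensor([[-0.5000]])
print(top_i[0].item())         # 1

# For k = 1 this matches argmax over the category dimension
print(output.argmax(dim=1).item())  # 1

# A larger k returns the runner-up categories, sorted from highest to lowest
values, indices = output.topk(3)
print(indices.tolist())        # [[1, 2, 0]]
```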

The second is a quick way to get a training example (a name and the language it belongs to):

import random

def randomChoice(l):
    return l[random.randint(0, len(l) - 1)]

def randomTrainingExample():
    # Pick a language first, then a name from that language
    category = randomChoice(all_categories)
    line = randomChoice(category_lines[category])
    category_tensor = torch.tensor([all_categories.index(category)], dtype=torch.long)
    line_tensor = lineToTensor(line)
    return category, line, category_tensor, line_tensor

for i in range(10):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    print('category=', category, '/ line=', line)

Output:

category=Dutch / line=Tholberg
category=Irish / line=Murphy
category=Vietnamese / line=An
category=German / line=Von essen
category=Polish / line=Kijek
category=Scottish / line=Bell
category=Czech / line=Marik
category=Korean / line=Jeong
category=Korean / line=Choe
category=Portuguese / line=Alves
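randomTrainingExample picks a language first and then a name within it, so every language is drawn about equally often no matter how many names its file contains. The sketch below demonstrates this on hypothetical toy data (the categories 'A' and 'B' and the helper randomTrainingPair are made up for illustration), and contrasts it with drawing uniformly from all (language, name) pairs, where the large category dominates. random.choice from the standard library does the same job as randomChoice.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical toy data: language 'A' has 500 times as many names as 'B'
category_lines = {'A': ['a%d' % i for i in range(1000)], 'B': ['b0', 'b1']}
all_categories = list(category_lines)

def randomChoice(l):
    return l[random.randint(0, len(l) - 1)]

# Tutorial-style sampling: language first, then a name within it
def randomTrainingPair():
    category = randomChoice(all_categories)
    line = randomChoice(category_lines[category])
    return category, line

balanced = Counter(randomTrainingPair()[0] for _ in range(10000))

# For contrast: draw uniformly over every (language, name) pair
pairs = [(c, name) for c, names in category_lines.items() for name in names]
flat = Counter(random.choice(pairs)[0] for _ in range(10000))

print(balanced)  # roughly 5000 for each language
print(flat)      # almost all 'A'
```

The flip side is that each name in a small file is seen far more often than a name in a large file, which may encourage the network to memorize those names.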

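3.2 Training the Network

Now all it takes to train this network is to show it a bunch of examples, have it make guesses, and tell it whether each guess is wrong.

Since the last layer of the RNN is nn.LogSoftmax, nn.NLLLoss is the appropriate loss function.

criterion = nn.NLLLoss()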
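Pairing nn.LogSoftmax with nn.NLLLoss computes the same quantity as applying nn.CrossEntropyLoss to the raw scores, which combines both steps in one module. A quick check with random scores for 18 languages (the target index 4 is arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

scores = torch.randn(1, 18)                   # raw scores for one name over 18 languages
target = torch.tensor([4], dtype=torch.long)  # arbitrary index of the "correct" language

log_probs = nn.LogSoftmax(dim=1)(scores)
nll = nn.NLLLoss()(log_probs, target)         # what the tutorial computes
ce = nn.CrossEntropyLoss()(scores, target)    # LogSoftmax + NLLLoss in one module

print(torch.allclose(nll, ce))                # True
# NLLLoss simply picks the target's log-probability and negates it
print(torch.allclose(nll, -log_probs[0, 4]))  # True
```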

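Each loop of training will:

Create the input and target tensors
Create a zero-initialized hidden state
Read in each letter, and
Pass the current hidden state on to the next letter
Compare the final output to the target
Back-propagate
Update the parameters using their gradients
Return the output and loss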

learning_rate = 0.005  # If you set this too high, it might explode. If too low, it might not learn

def train(category_tensor, line_tensor):
    hidden = rnn.initHidden()

    rnn.zero_grad()

    # Feed the name one letter at a time, carrying the hidden state forward
    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)

    loss = criterion(output, category_tensor)
    loss.backward()

    # Add each parameter's gradient, multiplied by -learning_rate, to its value
    for p in rnn.parameters():
        p.data.add_(p.grad.data, alpha=-learning_rate)

    return output, loss.item()
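The parameter loop at the end of train() is plain stochastic gradient descent. As a sketch, assuming a small nn.Linear model and random data in place of the RNN, the same update can be made through torch.optim.SGD, and both paths end up with identical weights:

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
learning_rate = 0.005

model_a = nn.Linear(4, 3)
model_b = copy.deepcopy(model_a)  # identical starting weights
optimizer = torch.optim.SGD(model_b.parameters(), lr=learning_rate)

x = torch.randn(2, 4)
target = torch.tensor([0, 2])
criterion = nn.NLLLoss()
log_softmax = nn.LogSoftmax(dim=1)

# Manual update, as in train(): p <- p - learning_rate * grad
criterion(log_softmax(model_a(x)), target).backward()
with torch.no_grad():
    for p in model_a.parameters():
        p.add_(p.grad, alpha=-learning_rate)

# The same update through the optimizer
optimizer.zero_grad()
criterion(log_softmax(model_b(x)), target).backward()
optimizer.step()

same = all(torch.allclose(p, q) for p, q in zip(model_a.parameters(), model_b.parameters()))
print(same)  # True
```

With an optimizer, train() would call optimizer.zero_grad() instead of rnn.zero_grad() and optimizer.step() instead of the manual loop.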

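Now we just have to run that with a bunch of examples. Since the train function returns both the output and the loss, we can print its guesses and also keep track of the loss for plotting. Since there are thousands of examples, we print only every print_every iterations, and every plot_every iterations we record the average loss.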

import time
import math

n_iters = 100000
print_every = 5000
plot_every = 1000

# Keep track of losses for plotting
current_loss = 0
all_losses = []

def timeSince(since):
    now = time.time()
    s = now - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

start = time.time()

for iter in range(1, n_iters + 1):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    output, loss = train(category_tensor, line_tensor)
    current_loss += loss

    # Print the iteration number, loss, name and guess
    if iter % print_every == 0:
        guess, guess_i = categoryFromOutput(output)
        correct = '✓' if guess == category else '✗ (%s)' % category
        print('%d %d%% (%s) %.4f %s / %s %s' % (iter, iter / n_iters * 100, timeSince(start), loss, line, guess, correct))

    # Add the current average loss to the list of losses
    if iter % plot_every == 0:
        all_losses.append(current_loss / plot_every)
        current_loss = 0

Output:

5000 5% (0m 8s) 2.7792 Verdon / Scottish ✗ (English)
10000 10% (0m 16s) 2.0748 Campos / Greek ✗ (Portuguese)
15000 15% (0m 25s) 2.0458 Kuang / Vietnamese ✗ (Chinese)
20000 20% (0m 33s) 1.1703 Nghiem / Vietnamese ✓
25000 25% (0m 41s) 2.6035 Boyle / English ✗ (Scottish)
30000 30% (0m 50s) 2.2823 Mozdzierz / Dutch ✗ (Polish)
35000 35% (0m 58s) nan Lagana / Irish ✗ (Italian)
40000 40% (1m 6s) nan Simonis / Irish ✗ (Dutch)
45000 45% (1m 15s) nan Nobunaga / Irish ✗ (Japanese)
50000 50% (1m 23s) nan Ingermann / Irish ✗ (English)
55000 55% (1m 31s) nan Govorin / Irish ✗ (Russian)
60000 60% (1m 39s) nan Janson / Irish ✗ (German)
65000 65% (1m 48s) nan Tsangaris / Irish ✗ (Greek)
70000 70% (1m 56s) nan Vlasenkov / Irish ✗ (Russian)
75000 75% (2m 4s) nan Needham / Irish ✗ (English)
80000 80% (2m 12s) nan Matsoukis / Irish ✗ (Greek)
85000 85% (2m 21s) nan Koo / Irish ✗ (Chinese)
90000 90% (2m 29s) nan Novotny / Irish ✗ (Czech)
95000 95% (2m 37s) nan Dubois / Irish ✗ (French)
100000 100% (2m 45s) nan Padovano / Irish ✗ (Italian)
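In the run recorded above, the loss becomes nan from iteration 35000 on and every guess collapses to Irish, indicating that training diverged. The comment on learning_rate already points to one remedy (a smaller learning rate). Another common safeguard, which the tutorial does not use, is clipping the gradient norm before the update. A minimal sketch with torch.nn.utils.clip_grad_norm_ on a toy model; max_norm = 1.0 is an illustrative value, not a tuned one:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
learning_rate = 0.005
max_norm = 1.0  # illustrative threshold, not a value from the tutorial

model = nn.Linear(10, 3)
x = torch.randn(8, 10) * 100  # deliberately large inputs to produce large gradients
loss = model(x).pow(2).mean()
loss.backward()

# Rescales all gradients in place so their combined L2 norm is at most max_norm,
# and returns the total norm measured before clipping
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
clipped_norm = torch.stack([p.grad.norm() for p in model.parameters()]).norm()

print(total_norm.item() > max_norm)  # True for these large inputs
print(clipped_norm.item())           # no larger than max_norm (up to rounding)

# Then apply the same manual update as train()
with torch.no_grad():
    for p in model.parameters():
        p.add_(p.grad, alpha=-learning_rate)
```

In train(), the clipping call would go right after loss.backward() and before the parameter update loop.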

3.3 Plotting the Results

Plotting the historical loss stored in all_losses shows how the network learned:

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

plt.figure()
plt.plot(all_losses)

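4. Evaluating the Results

To see how well the network performs on different categories, we will create a confusion matrix, indicating for every actual language (rows) which language the network guesses (columns). To fill the confusion matrix, a batch of samples is run through the network with evaluate(), which is the same as train() minus the backpropagation.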

# Keep track of correct guesses in a confusion matrix
confusion = torch.zeros(n_categories, n_categories)
n_confusion = 10000

# Just return an output given a line
def evaluate(line_tensor):
    hidden = rnn.initHidden()

    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)

    return output

# Go through a bunch of examples and record which are correctly guessed
for i in range(n_confusion):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    output = evaluate(line_tensor)
    guess, guess_i = categoryFromOutput(output)
    category_i = all_categories.index(category)
    confusion[category_i][guess_i] += 1

# Normalize by dividing every row by its sum
for i in range(n_categories):
    confusion[i] = confusion[i] / confusion[i].sum()

# Set up the plot
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(confusion.numpy())
fig.colorbar(cax)

# Set up the axes
ax.set_xticklabels([''] + all_categories, rotation=90)
ax.set_yticklabels([''] + all_categories)

# Force a label at every tick
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1))

plt.show()
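The confusion matrix can also be summarized numerically. On a hypothetical 3-language count matrix (the numbers are made up), the diagonal holds the correct guesses, so their sum over the total gives the overall accuracy, and after row normalization each diagonal entry is the fraction of that language's names guessed correctly. Dividing by confusion.sum(dim=1, keepdim=True) normalizes all rows at once, like the loop above.

```python
import torch

# Hypothetical counts for 3 languages: rows = actual language, columns = guess
confusion = torch.tensor([[8., 2., 0.],
                          [1., 6., 3.],
                          [0., 0., 5.]])

# Correct guesses sit on the diagonal: 8 + 6 + 5 = 19 out of 25 samples
overall_accuracy = (confusion.diag().sum() / confusion.sum()).item()
print(overall_accuracy)  # about 0.76

# Normalize each row by its sum (all rows at once via broadcasting)
normalized = confusion / confusion.sum(dim=1, keepdim=True)
per_language = normalized.diag()
print(per_language.tolist())  # about [0.8, 0.6, 1.0]
```

Because the tutorial normalizes confusion in place, the overall accuracy has to be computed from the raw counts before that loop runs.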

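You can pick out bright spots off the main diagonal that show which languages the model guesses incorrectly, e.g. Chinese names predicted as Korean, and Spanish names predicted as Italian. It seems to do very well on Greek and very poorly on English (perhaps because English overlaps more with other languages).

Running on User Input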

def predict(input_line, n_predictions=3):
    print('\n> %s' % input_line)
    with torch.no_grad():
        output = evaluate(lineToTensor(input_line))

        # Get the top N categories
        topv, topi = output.topk(n_predictions, 1, True)
        predictions = []

        for i in range(n_predictions):
            value = topv[0][i].item()
            category_index = topi[0][i].item()
            print('(%.2f) %s' % (value, all_categories[category_index]))
            predictions.append([value, all_categories[category_index]])

predict('Dovesky')
predict('Jackson')
predict('Satoshi')

Output:

> Dovesky
(-0.74) Russian
(-0.77) Czech
(-3.31) English
> Jackson
(-0.80) Scottish
(-1.69) English
(-1.84) Russian
> Satoshi
(-1.16) Japanese
(-1.89) Arabic
(-1.90) Polish

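The final versions of the scripts in the Practical PyTorch repo (https://github.com/spro/practical-pytorch/tree/master/char-rnn-classification) split the above code into a few files:

data.py (loads the files)
model.py (defines the RNN)
train.py (runs training)
predict.py (runs predict() with command-line arguments)
server.py (serves predictions as a JSON API with bottle.py)

Run train.py to train and save the network.

Run predict.py with a name to view predictions: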

$ python predict.py Hazaki
(-0.42) Japanese
(-1.39) Polish
(-3.51) Czech

Run server.py and visit http://localhost:5533/Yourname to get the predictions as JSON.


