#deep-learning #nlp #bert-language-model #huggingface-transformers #question-answering
Question:
I am using a pre-trained BERT model for question answering. It returns the correct answer, but with extra spaces inside the words.
The code is below:
import torch
from transformers import BertForQuestionAnswering, BertTokenizer

def get_answer_using_bert(question, reference_text):
    bert_model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
    bert_tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
    # Encode question and context as one sequence: [CLS] question [SEP] context [SEP]
    input_ids = bert_tokenizer.encode(question, reference_text)
    input_tokens = bert_tokenizer.convert_ids_to_tokens(input_ids)
    # Segment 0 covers everything up to and including the first [SEP]; segment 1 is the rest
    sep_location = input_ids.index(bert_tokenizer.sep_token_id)
    first_seg_len, second_seg_len = sep_location + 1, len(input_ids) - (sep_location + 1)
    seg_embedding = [0] * first_seg_len + [1] * second_seg_len
    model_scores = bert_model(torch.tensor([input_ids]),
                              token_type_ids=torch.tensor([seg_embedding]))
    # Most likely start and end token positions of the answer
    ans_start_loc, ans_end_loc = torch.argmax(model_scores[0]), torch.argmax(model_scores[1])
    result = ' '.join(input_tokens[ans_start_loc:ans_end_loc + 1])
    result = result.replace('#', '')
    return result
The function is then called like this:
reference_text = 'Mukesh Dhirubhai Ambani was born on 19 April 1957 in the British Crown colony of Aden (present-day Yemen) to Dhirubhai Ambani and Kokilaben Ambani. He has a younger brother Anil Ambani and two sisters, Nina Bhadrashyam Kothari and Dipti Dattaraj Salgaonkar. Ambani lived only briefly in Yemen, because his father decided to move back to India in 1958 to start a trading business that focused on spices and textiles. The latter was originally named Vimal but later changed to Only Vimal His family lived in a modest two-bedroom apartment in Bhuleshwar, Mumbai until the 1970s. The family financial status slightly improved when they moved to India but Ambani still lived in a communal society, used public transportation, and never received an allowance. Dhirubhai later purchased a 14-floor apartment block called Sea Wind in Colaba, where, until recently, Ambani and his brother lived with their families on different floors.'
question = 'What is the name of mukesh ambani brother?'
get_answer_using_bert(question, reference_text)
And the result is:
'an il am ban i'
Can anyone help me fix this? It would be really helpful.
Answer #1:
You can simply use the tokenizer's decode function:
bert_tokenizer.decode(input_ids[ans_start_loc:ans_end_loc + 1])
Output:
'anil ambani'
If you don't want to use decode, you can use:
result.replace(' ##', '')
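To see why this replacement works, here is a minimal sketch that needs no model download. It assumes the answer span was tokenized into the WordPiece tokens listed below; that list is a hypothetical reconstruction consistent with the output in the question, not something read from the tokenizer. The helper name `join_wordpieces` is also made up for illustration. Note that `' ##'` has to be removed in place of the `replace('#', '')` step, not after it, and that the result of `replace` must be assigned or returned.

```python
def join_wordpieces(tokens):
    """Join WordPiece tokens, gluing '##' continuation pieces onto the previous token."""
    text = ' '.join(tokens)
    # Remove the separating space together with the '##' marker so sub-words merge.
    return text.replace(' ##', '')


# Hypothetical token list, assumed to match the answer span from the question.
tokens = ['an', '##il', 'am', '##ban', '##i']

# What the question's code does: strips '#' but leaves the spaces between sub-words.
naive = ' '.join(tokens).replace('#', '')
# What the suggested replacement does.
fixed = join_wordpieces(tokens)

print(repr(naive))
print(repr(fixed))
```

With these tokens, `naive` reproduces the spaced-out string from the question and `fixed` gives the merged words.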
Comments:
1. Thank you for the answer, it is really helpful.