Gensim summarizer returning repeated lines as the summary of text documents

#python #nlp #gensim #summarization #summarize

Question:

I am getting repeated lines in the output of my summarizer. I am using gensim in Python to summarize text documents. How can I remove the duplicate lines so that the summarizer's output contains only unique lines? The input file looks like this:

 From: Jos
To: Halley, Ibizo /FR
Cc: pqr Secretariat; Björnsson Ulrika
Subject: [EXTERNAL] pqr Response to Letter of Intent for a Variation WS procedure:SE/H/xxxx/WS/
Date: vendredi 1 juin 2018 13:16:48
Attachments: image001.jpg

A07_SE_xxx yy R&D.PDF

Dear Ibizo,

Thank you for your letter of intent.

The pqr agrees, on the basis of the documentation provided, that the above mentioned work-
sharing application as specified in the enclosed letter of intent is acceptable for submission under
Article 20 of the Commission Regulation (EC) No 1234/2008 of 24 November 2008.

The reference authority for the worksharing procedure will be Sweden and the assigned work sharing
procedure number will be:

A07: SE/H/xxxx/WS/



Please be advised that this confirmation is not to be considered as validation of your application. The
validity of the worksharing application will be checked by the reference authority after submission.

Please liaise with the assigned reference authority for the further proceedings.


Kind regards,


Joe
Assistant Administrator
Parallel Distribution & Certificates
Committees & Inspections Department
Panthers Medicines Agency
30 ABC St, Michigan lane
Fax  44 (0)20 certificate@zz.europa.eu | www.zz.europa.eu


This message and any attachment contain information which may be confidential or otherwise
protected from disclosure. It is intended for the addressee(s) only and should not be relied upon as
legal advice unless it is otherwise stated. If you are not the intended recipient(s) (or authorised by
an addressee who received this message), access to this e-mail, or any disclosure or copying of its
contents, or any action taken (or not taken) in reliance on it is unauthorised and may be unlawful. If
you have received this e-mail in error, please inform the sender immediately.
P Please consider the environment and don't print this e-mail unless you really need to



From: Jos 
Sent: 30 April 2018 11:17
To: Ibizo.Halley@xxx.com
Cc: pqr Secretariat
Subject: RE: Alfuzosin Hydrochloride - Request for Worksharing procedure

Dear Ibizo,
Thank you for your zzil.
The letter of intent will be discussed in the May 2018 pqr meeting and you will receive feedback
within two weeks following the meeting.



Kind regards,


Joe
Assistant Administrator
Parallel Distribution & Certificates
Committees & Inspections Department

mailto:eretta.ab@zz.europa.eu
mailto:Ibizo.Halley@xxx.com
mailto:H-pqrSecretariat@zz.europa.eu
mailto:Ulrika.Bjornsson@mpa.se
mailto:certificate@zz.europa.eu

pqr/162/2010/Rev.2, August 2014 








26 April 2018 

pqr Secretariat 
Panthers Medicines Agency 
30 Bluegoon Place, ABC Wharf 
ABC E14 5EU  
United Kingdom 



Subject: Letter of intent for the submission of a worksharing procedure to the pqr according 


to Article 20 of Commission Regulation (EC) No 1234/2008 



Worksharing Applicant details: 


Name  : xxx-yy R&D 


   Address : 1, lane Pierre Brossolette  
91385 Chilly-Maz 
Sw



Contact person details  
(i.e. name, address, e-mail 
address, phone number, fax 
number) 


: Ibizo Halley 
1, lane Pierre Brossolette  
91385 Chilly-Maz
Sw 
zzil: Ibizo.halley@xxx.com 
Tel :   33 1 60 49 51 61 





Application details: 

This letter of intent for the submission of a Type II following a worksharing procedure according to 
Article 20 of Commission Regulation (EC) No 1234/2008, concerns the following medicinal products 
authorised via MRP and national procedures: 


Products authorized via MRP: 

Alfuzosin 2.5 mg film-coated tablets 

Product name Active 


substance(s) 
MRP number 


XATRAL Alfuzosin 
hydrochloride 


SE/H/0112/001 











mailto:Ibizo.halley@xxx.com





Alfuzosin 5 mg prolonged-release tablets 

Product name Active 


substance(s) 
MRP number 


XATRAL SR 5 MG Alfuzosin 
hydrochloride 


SE/H/0112/002 


XATRAL Alfuzosin 
hydrochloride 


SE/H/0112/002 



Alfuzosin 10 mg prolonged-release tablets 

Product name Active 


substance(s) 
MRP number 


XATRAL UNO       10 MG Alfuzosin 
hydrochloride 


SE/H/0112/003 


ALFUZOSIN WINTHROP 
UNO 10 MG 


Alfuzosin 
hydrochloride 


DE/H/2130/001 


ALFUZOSIN ZENTIVA 10 
MG 


Alfuzosin 
hydrochloride 


DE/H/2131/001/MR 


UROXATRAL Alfuzosin 
hydrochloride 


DE/H/2129/001 


Alfuzosin Zentiva    10 mg 
Retardtabletten 


Alfuzosin 
hydrochloride 


DE/H/2131/001 


XATRAL OD 10 MG Alfuzosin 
hydrochloride 


SE/H/0112/003 




Products authorised via national procedure:  

Alfuzosin 2.5 mg film-coated tablets 

Product name Active 


substance(s) 
National MA 


number 
Member state 


XATRAL Alfuzosin 
hydrochloride 


NO APPLICATION 
CODE -#10600 


Denmark 


XATRAL 2.5 MG Alfuzosin 
hydrochloride 


NL 14785 France 


ALFUZOSIN 
WINTHROP 2.5 MG 


Alfuzosin 
hydrochloride 


32177.00.00 Germany 


UROXATRAL Alfuzosin 
hydrochloride 


18111.00.00 Germany 


XATRAL Alfuzosin 
hydrochloride 


NO APPLICATION 
CODE -#10602 


Greece 


XATRAL 2.5 MG Alfuzosin 
hydrochloride 


PA 540/162/1 Ireland 


XATRAL Alfuzosin 
hydrochloride 


027314018 Italy 


MITTOVAL Alfuzosin 
hydrochloride 


026670024 Italy 


ALFUZOSINA 
ZENTIVA 


Alfuzosin 
hydrochloride 


NO APPLICATION 
CODE -#10163 


Italy 


XATRAL Alfuzosin 
hydrochloride 


RVG 13689 Netherlands 


DALFAZ Alfuzosin 
hydrochloride 


R/6812 Poland 


BENESTAN 2.5 MG Alfuzosin 
hydrochloride 


60031 Spain 


XATRAL 2.5 MG Alfuzosin 
hydrochloride 


PL 04425/0655 United Kingdom 







ALFUZOSIN 
HYDROCHLORIDE 


2.5MG 


Alfuzosin 
hydrochloride 


PL 17780/0220 United Kingdom 






Alfuzosin 5 mg prolonged-release tablets 

Product name Active 


substance(s) 
National MA 


number 
Member state 


XATRAL 5 RETARD Alfuzosin 
hydrochloride 


NAT-H-4908-01 Belgium 


XATRAL Alfuzosin 
hydrochloride 


17139 



Cyprus 


XATRAL LP 5 MG Alfuzosin 
hydrochloride 


NL 19090 France 


ALFUZOSIN 
WINTHROP 5 MG 


Alfuzosin 
hydrochloride 


34637.00.00 Germany 


XATRAL Alfuzosin 
hydrochloride 


NO APPLICATION 
CODE -#10812 


Greece 


ALFETIM SR 5 MG Alfuzosin 
hydrochloride 


OGYI-T-4374/01 Hungary 


ALFUZOSINA 
ZENTIVA 


Alfuzosin 
hydrochloride 


NO APPLICATION 
CODE -#8994 


Italy 


XATRAL 5 RETARD Alfuzosin 
hydrochloride 


583/98/12/4785 Luxembourg 


XATRAL SR 5 MG Alfuzosin 
hydrochloride 


MA082/05001 Malta 


DALFAZ SR Alfuzosin 
hydrochloride 


8127 Poland 


XATRAL LP 5 MG Alfuzosin 
hydrochloride 


1026/2008 Romania 


XATRAL 5-SR Alfuzosin 
hydrochloride 


77/0275/96-S  Slovakia 


BENESTAN 
RETARD 5 MG 


Alfuzosin 
hydrochloride 


60767 Spain 








Alfuzosin 10 mg prolonged-release tablets 

Product name Active 


substance(s) 
National MA 


number 
Member state 


XATRAL UNO       
10 MG 


Alfuzosin 
hydrochloride 


NAT-H-4908-04 Belgium 


XATRAL XL 10 MG Alfuzosin 
hydrochloride 


19244  Cyprus 


XATRAL SR 10 MG Alfuzosin 
hydrochloride 


345201 Estonia 


XATRAL CR 10 MG Alfuzosin 
hydrochloride 


13973 Finland 


ALFUZOSINE 
ZENTIVA LP 10 MG 


Alfuzosin 
hydrochloride 


NL 24407 France 


XATRAL LP 10 MG Alfuzosin 
hydrochloride 


NL 24386 France 


XATRAL OD Alfuzosin 
hydrochloride 


NO APPLICATION 
CODE -#9520 


Greece 







ALFETIM UNO     
10 MG 


Alfuzosin 
hydrochloride 


OGYI-T-8022/01 Hungary 


XATRAL 10 MG Alfuzosin 
hydrochloride 


PA 540/162/3 Ireland 


MITTOVAL Alfuzosin 
hydrochloride 


026670048-051 Italy 


XATRAL 10 MG Alfuzosin 
hydrochloride 


027314044-057 Italy 


ALFUZOSINA 
ZENTIVA 


Alfuzosin 
hydrochloride 


NO APPLICATION 
CODE -#9579 


Italy 


XATRAL SR 10 MG Alfuzosin 
hydrochloride 


99-0702 Latvia 


XATRAL SR 10 MG Alfuzosin 
hydrochloride 


LT-2000/7118/10 Lithuania 


XATRAL UNO       
10 MG 


Alfuzosin 
hydrochloride 


0005/01/09/0045 Luxembourg 


XATRAL XL 10 MG Alfuzosin 
hydrochloride 


MA082/05002 Malta 


XATRAL XR 10 MG Alfuzosin 
hydrochloride 


RVG 23923 Netherlands 


DALFAZ UNO Alfuzosin 
hydrochloride 


8378 Poland 


BENESTAN OD    
10 MG 


Alfuzosin 
hydrochloride 


99/H/0006/01 Portugal 


ALFUZOSINA 
ZENTIVA, 10 MG 


Alfuzosin 
hydrochloride 


99/H/0007/001 Portugal 


XATRAL SR 10 MG Alfuzosin 
hydrochloride 


7893/2006 Romania 


UNIBENESTAN    
10 MG 


Alfuzosin 
hydrochloride 



63605 


Spain 


XATRAL XL 10 MG Alfuzosin 
hydrochloride 


PL 04425/0657 United Kingdom 


BESAVAR XL Alfuzosin 
hydrochloride 


PL 17780/0221 United Kingdom 








The following variation is intended to be part of the work-sharing procedure: 





Number as in the 
classification guideline: 


Title of variation as in the classification 
guideline 


Type of variation: 



C.I.4 



Changes in the Summary of Product 
Characteristics, Labelling or package 
Leaflet due new quality, preclinical, 
clinical or pharmacovigilance data 



Type II 








Justification for worksharing : xxx submitted for alfuzosin hydrochloride separate national and MRP variations for implementation of CCDS V13 including 
among other topics the addition of a contraindication to strong 
CYP3A4 inhibitors in the sections 4.3 and 4.5. 

The MAH received on 04 April 2018 a letter from pqr 
(zz/pqr/195547/2018) requesting to re-submit the variation 
for this contraindication as a work-sharing application including 







all MRP and nationally authorised products to harmonise the 
assessment of the contraindication in section 4.3 and 4.5 of the 
SmPC across the EU (provided in Annex I). 





Justification for grouping :  Not applicable 






Intended submission date : 30 June 2018 





Preferred Reference Authority 



: The Para Medical Products Agency, as RMS of the MRP 


procedure SE/H/0112/001-003 








Explanation that all MAs 
concerned belong to the 
same holder 


: I hereby confirm that all the marketing authorisations, listed in application details (refer above), concerned by the worksharing 
procedure belong to the same marketing authorisation holder, as 
they are part of the same mother company xxx, as per the 
Commission communication 98/C 229/03. 








Yours sincerely, 




Ibizo HALLEY 
xxx-yy R&D, Europe Region 
Global Logistics Affairs Europe  






Please send this letter electronically to the pqr Secretariat (H-pqrSecretariat@zz.europa.eu) 
or RMS as relevant. 











mailto:H-pqrSecretariat@zz.europa.eu

























ANNEX 1 













30 Bluegoon Place ● ABC Wharf ● ABC E14 5EU ● United Kingdom 






Telephone  44 (0)20 3660 6000 Facsimile  44 (0)20 3660 5520 

















Dr.ssa Maty Lecc
xxx S.p.A 


Viale L. Bodio 
20158 AUGB   
Italy 
E-mail: DRA@xxx.com 










4 April 2018 


zz/pqr/195547/2018 





Subject: Request for submission of variation worksharing procedure for Xatral (alfuzosin) 


and related names  





Dear Dr Maty Lecchi, 



During the March meeting, the pqr was informed that separate national and MRP variations have 


been submitted across EU Member States to request the inclusion of the below contraindication for 


Xatral (alfuzosin) and related names: 



Section 4.3 


Concomitant intake of strong inhibitors of CYP3A4 (see paragraph 4.5). 





The parallel submissions in several Member States have led to a disharmonised assessment of the 


contraindication. In the interest of public health across the Panthers Union, the pqr requests xxx 


to re-submit the variation as a worksharing application including all MRP, DCP and nationally 


authorised products to harmonise the assessment of the contraindication in section 4.3 of the SmPC 


across the EU. 


Please note that a separate letter on an independent issue to this has been sent to Esther de Bles, 


xxx-yy Netherlands B.V.. However, there are general concerns by the pqr on the lack of use 


of variation worksharing by xxx-yy in these cases.  



Kind Regards, 







Laura Oliveira Santamaria 


Chair of pqr 




mailto:DRA@xxx.com



        Worksharing Applicant details:

        Name 

        xxx-yy R&D, Europe Region

        Global Logistics Affairs Europe






Panthers Medicines Agency
30 ABC St, Michigan lane
Fax  44 (0)20 3660 5525 certificate@zz.europa.eu | www.zz.europa.eu


This message and any attachment contain information which may be confidential or otherwise
protected from disclosure. It is intended for the addressee(s) only and should not be relied upon as
legal advice unless it is otherwise stated. If you are not the intended recipient(s) (or authorised by
an addressee who received this message), access to this e-mail, or any disclosure or copying of its
contents, or any action taken (or not taken) in reliance on it is unauthorised and may be unlawful. If
you have received this e-mail in error, please inform the sender immediately.
P Please consider the environment and don't print this e-mail unless you really need to



From: Ibizo.Halley@xxx.com [mailto:Ibizo.Halley@xxx.com] 
Sent: 27 April 2018 17:40
To: pqr Secretariat
Subject: Alfuzosin Hydrochloride - Request for Worksharing procedure

Dear Sirs, Madams,

We are pleased to send you a request for the submission of a Type II variation following a worksharing
procedure according to Article 20 of Commission Regulation (EC) No 1234/2008 for Alfuzosin
hydrochloride containing products.
The variation concerns the addition of a contraindication with strong CYP 3A4 inhibitors in section 4.3
and 4.5.
The worksharing procedure has been requested to xxx by the chair of pqr, Mme Oliveira
Santamaria, the letter is attached as Annex of the letter of intent attached.

Thank you in advance for your agreement.

Kind regards,

Ibizo Halley
GEM/EP and OTC switch
EU Regional Logistics Product manager
Global Logistics Affairs
xxx R&D
Phone:  33 1 60 49 51 61



logoGRA 1



________________________________________________________________________

This e-mail has been scanned for all known viruses by Panthers Medicines Agency.
  

Comments:

1. Please provide a short document of about 10 lines that includes duplicate lines for us to use. Nobody has time to read a whole book.

2. My question is about the output of the gensim summarizer, which contains repeated lines: how do I process it so that each line appears only once?

3. @chekmate please see my answer below, and don't forget to like and upvote it if it helped you.

Answer #1:

So your question is "How do I remove repeated sentences from a document?" I suggest using textblob. Here is some sample code.

from textblob import TextBlob

document = 'This is a sentence. This is another sentence. This is a sentence. This is another sentence. This is a third sentence.'

def get_unique_sentences(document):
    """Return the document with repeated sentences removed, keeping first occurrences in order."""
    unique_sentences = []
    # TextBlob splits the text into sentences; .raw is the sentence as a plain string
    for sentence in (sent.raw for sent in TextBlob(document).sentences):
        if sentence not in unique_sentences:
            unique_sentences.append(sentence)
    return ' '.join(unique_sentences)

get_unique_sentences(document)
>>>'This is a sentence. This is another sentence. This is a third sentence.'
  

Let me know if this helps.

Comments:

1. So I need to apply this to the output of my summarizer?

2. Yes. You need to apply the function to the output of your model.

3. I don't have full stops to mark the end of a sentence. The summarizer output contains no full stops at all. How do I find the repeated phrases then?

4. @chekmate please check the textblob page (textblob.readthedocs.io/en/dev/_modules/textblob /…). Sentences do not have to end with a full stop. Questions, for example, end with a question mark.
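
For output with no sentence-ending punctuation at all, one option is to treat each line as the unit of comparison instead of each sentence. The following is a minimal sketch, assuming the summary is a single string with one item per line; `unique_lines` and the sample string are hypothetical names and data introduced only for illustration.

```python
def unique_lines(summary: str) -> str:
    """Drop repeated lines, keeping the first occurrence of each in order."""
    # Strip surrounding whitespace so "Italy " and "Italy" compare equal.
    stripped = (line.strip() for line in summary.splitlines())
    # dict keys keep insertion order (Python 3.7+), so this acts as an ordered set.
    return "\n".join(dict.fromkeys(line for line in stripped if line))


# Hypothetical summarizer output with repeated lines and no full stops.
summary = "Alfuzosin\nhydrochloride\nAlfuzosin\nhydrochloride\nItaly"
print(unique_lines(summary))
```

Blank lines are dropped as well, which seems reasonable here because the sample input contains long runs of them.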

Answer #2:

A simple way would be to just use a set and skip lines that have already been seen.

For example:

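seen_before = set()
lines = []
# `document` is assumed to be an iterable of lines, e.g. text.splitlines();
# iterating over a plain string would yield single characters instead.
for line in document:
    if line in seen_before:
        continue  # skip lines that were already seen
    lines.append(line)
    seen_before.add(line)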
  

The lines variable then contains each distinct line exactly once, in order of first appearance.

Of course, this should be done per document, since you don't want lines seen in other documents to be carried over.
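
The per-document point can be sketched by wrapping the loop in a function, so that every call starts with an empty set. This is only an illustration under the assumption that each document is already a list of lines; `dedupe_lines` and the sample `documents` are hypothetical.

```python
from typing import Iterable, List


def dedupe_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines of one document with repeats removed, order preserved."""
    seen = set()  # created per call, so nothing leaks between documents
    unique = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)
    return unique


# Two hypothetical documents that share the line "a".
documents = [["a", "b", "a"], ["a", "c", "c"]]
cleaned = [dedupe_lines(doc) for doc in documents]
print(cleaned)
```

Because the set lives inside the function, the line "a" survives in both documents even though it was already seen in the first one.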

Comments:

1. I don't have full stops to mark the end of a sentence. The summarizer output contains no full stops at all. How do I find the repeated phrases then?
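
Where repeated phrases differ only in spacing or letter case (the sample input contains entries such as "XATRAL UNO       10 MG" with irregular spacing), exact matching may miss them. One possible approach, sketched here as an assumption rather than as part of either answer, is to compare lines on a normalized key; `normalize` and `unique_by_key` are hypothetical helpers.

```python
import re


def normalize(line: str) -> str:
    """Comparison key: collapse runs of whitespace and ignore case."""
    return re.sub(r"\s+", " ", line).strip().casefold()


def unique_by_key(lines):
    """Keep the first line for each normalized key, preserving order."""
    seen = set()
    kept = []
    for line in lines:
        key = normalize(line)
        if key and key not in seen:  # an empty key means a blank line; skip it
            seen.add(key)
            kept.append(line.strip())
    return kept


# Hypothetical lines that differ only in spacing, case, or trailing blanks.
sample = ["XATRAL UNO       10 MG", "xatral uno 10 mg", "", "Alfuzosin ", "Alfuzosin"]
print(unique_by_key(sample))
```

The original spelling of the first occurrence is kept. Only the comparison uses the normalized form.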