Get rid of strange indentation when getting descriptions in Beautiful Soup

#python #beautifulsoup

Question:

I have a bs4 program that collects descriptions of links. It first checks whether the page has any meta description tags, and if there are none, it takes the description from the <p> tags.

This is the code:

from bs4 import BeautifulSoup
import requests

def find_title(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')

    with open('descrip.txt', 'a', encoding='utf-8') as f:
        description = soup.find('meta', attrs={'name':'og:description'}) or soup.find('meta', attrs={'property':'description'}) or soup.find('meta', attrs={'name':'description'})

        if description:
            desc = description["content"]
        else:
            desc = soup.find_all('p')[0].getText()
            lengths = len(desc)
            index = 0

            while lengths == 1:
                index = index + 1
                desc = soup.find_all('p')[index].getText()
                lengths = len(desc)

                if lengths > 300:
                    desc = soup.find_all('p')[index].getText()[0:300]

                elif lengths < 300:
                    desc = soup.find_all('p')[index].getText()[0:lengths]

        print(desc)
        f.write(desc + '\n')

find_title('https://en.wikipedia.org/wiki/Portal:The_arts')
find_title('https://en.wikipedia.org/wiki/Portal:Biography')
find_title('https://en.wikipedia.org/wiki/Portal:Geography')
find_title('https://en.wikipedia.org/wiki/November_15')
find_title('https://en.wikipedia.org/wiki/November_16')
find_title('https://en.wikipedia.org/wiki/Wikipedia:Selected_anniversaries/November')
find_title('https://lists.wikimedia.org/mailman/listinfo/daily-article-l')
find_title('https://en.wikipedia.org/wiki/List_of_days_of_the_year')
find_title('https://en.wikipedia.org/wiki/File:Proclamação_da_República_by_Benedito_Calixto_1893.jpg')
find_title('https://en.wikipedia.org/wiki/First_Brazilian_Republic')
find_title('https://en.wikipedia.org/wiki/Empire_of_Brazil')
find_title('https://en.wikipedia.org/wiki/Pedro_II_of_Brazil')
find_title('https://en.wikipedia.org/wiki/Benedito_Calixto')
find_title('https://en.wikipedia.org/wiki/Rio_de_Janeiro')
find_title('https://en.wikipedia.org/wiki/Deodoro_da_Fonseca')
  

But the output in descrip.txt has some strange indentation: some descriptions span several lines, and there are blank lines between some of them.
This is the result:

 The arts refers to the theory, human application and physical expression of creativity found in human cultures and societies through skills and imagination in order to produce objects, environments and experiences. Major constituents of the arts include visual arts (including architecture, ceramics,
A biography, or simply bio, is a detailed description of a person's life. It involves more than just the basic facts like education, work, relationships, and death; it portrays a person's experience of these life events. Unlike a profile or curriculum vitae (résumé), a biography presents a subject's
Geography (from Greek: γεωγραφία, geographia, literally "earth description") is a field of science devoted to the study of the lands, features, inhabitants, and phenomena of the Earth and planets. The first person to use the word γεωγραφία was Eratosthenes (276–194 BC). Geography is an all-encompass
November 15 is the 319th day of the year (320th in leap years) in the Gregorian calendar.  46 days remain until the end of the year.

November 16 is the 320th day of the year (321st in leap years) in the Gregorian calendar.  45 days remain until the end of the year.

The arts refers to the theory, human application and physical expression of creativity found in human cultures and societies through skills and imagination in order to produce objects, environments and experiences. Major constituents of the arts include visual arts (including architecture, ceramics,
A biography, or simply bio, is a detailed description of a person's life. It involves more than just the basic facts like education, work, relationships, and death; it portrays a person's experience of these life events. Unlike a profile or curriculum vitae (résumé), a biography presents a subject's
Geography (from Greek: γεωγραφία, geographia, literally "earth description") is a field of science devoted to the study of the lands, features, inhabitants, and phenomena of the Earth and planets. The first person to use the word γεωγραφία was Eratosthenes (276–194 BC). Geography is an all-encompass
November 15 is the 319th day of the year (320th in leap years) in the Gregorian calendar.  46 days remain until the end of the year.

November 16 is the 320th day of the year (321st in leap years) in the Gregorian calendar.  45 days remain until the end of the year.

Selected anniversaries / On this day archive
All · January · February · March · April · May · June · July · August · September · October · November · December



       The sum of all human knowledge.  Delivered to your inbox every day.

      
The following pages list the historical events, births, deaths, and holidays and observances of the specified day of the year:

Original file ‎(5,799 × 3,574 pixels, file size: 15.11 MB, MIME type: image/jpeg)

The First Brazilian Republic or República Velha (Portuguese pronunciation: [ʁeˈpublikɐ ˈvɛʎɐ], "Old Republic"), officially the Republic of the United States of Brazil, refers to the period of Brazilian history from 1889 to 1930. The República Velha ended with the Brazilian Revolution of 1930 that installed Getúlio Vargas as a new president. 

The Empire of Brazil was a 19th-century state that broadly comprised the territories which form modern Brazil and (until 1828) Uruguay. Its government was a representative parliamentary constitutional monarchy under the rule of Emperors Dom Pedro I and his son Dom Pedro II. A colony of the Kingdom of Portugal, Brazil became the seat of the Portuguese colonial Empire in 1808, when the Portuguese Prince regent, later King Dom João VI, fled from Napoleon's invasion of Portugal and established himself and his government in the Brazilian city of Rio de Janeiro. João VI later returned to Portugal, leaving his eldest son and heir, Pedro, to rule the Kingdom of Brazil as regent. On 7 September 1822, Pedro declared the independence of Brazil and, after waging a successful war against his father's kingdom, was acclaimed on 12 October as Pedro I, the first Emperor of Brazil. The new country was huge, sparsely populated and ethnically diverse.

Early life (1825–40)
Consolidation (1840–53)
Growth (1853–64)
Paraguayan War (1864–70)
Apogee (1870–81)
Decline and fall (1881–89)
Exile and death (1889–91)
Legacy

Benedito Calixto de Jesus (14 October 1853 – 31 May 1927) was a Brazilian painter.[1] His works usually depicted figures from Brazil and Brazilian culture, including a famous portrait of the bandeirante Domingos Jorge Velho in 1923,[2] and scenes from the coastline of São Paulo.[3] Unlike many artis
Rio de Janeiro (/ˈriːoʊ di ʒəˈnɛəroʊ, - deɪ -, - də -/; Portuguese: [ˈʁi.u d(ʒi) ʒɐˈne(j)ɾu] (listen);[3]), or simply Rio,[4] is anchor to the Rio de Janeiro metropolitan area and the second-most populous municipality in Brazil and the sixth-most populous in the Americas. Rio de Janeiro is the capit
Manuel Deodoro da Fonseca (Portuguese pronunciation: [mɐnuˈɛw deoˈdɔɾu da fõˈsekɐ]; 5 August 1827 – 23 August 1892) was a Brazilian politician and military officer who served as the first President of Brazil. He took office after heading a military coup that deposed Emperor Pedro II and proclaimed t
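
Where the stray whitespace likely comes from can be checked offline. The sketch below parses a small, invented HTML snippet (not one of the pages above) with the same `html.parser` backend. Called without arguments, `get_text()` returns every text node verbatim, so a `<p>` that holds only a line break has length 1 (the case the `while lengths == 1` loop is trying to skip), and a paragraph wrapped in the page source keeps its leading newline, indentation and inner line breaks. Written to the file with an extra newline, that would plausibly produce output like the result above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an "empty" <p> that holds only a line break, followed by
# a <p> whose text is wrapped across source lines and split by an inline tag.
html = """<div>
<p>
</p>
<p>
  The arts refers to the theory, human application
  and <i>physical expression</i> of creativity.
</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("p")[:2]

# get_text() with no arguments concatenates every text node verbatim,
# whitespace included
first_text = first.get_text()
second_text = second.get_text()

print(repr(first_text))   # '\n'
print(repr(second_text))  # leading newline, indentation and an inner line break
```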
  

Is there any way to fix this?

Answer #1:

Add strip=True to getText() (note: getText() is an alias for get_text()), and pass a space as the separator. For example:

 get_text(strip=True, separator=' ')
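
Applied to the lookup from the question, the call above can be sketched as follows. `clean_text` and `describe` are hypothetical helper names, and the HTML is an invented sample rather than one of the pages from the question. One caveat: `strip=True` only trims the ends of each text fragment, so a line break inside a single fragment (a paragraph wrapped in the page source) survives; the sketch therefore also collapses the remaining whitespace with `' '.join(text.split())`, which goes beyond the answer. It also looks up `property="og:description"`, the attribute Open Graph tags normally use, rather than the question's `name="og:description"`, and truncates to 300 characters like the original code.

```python
from bs4 import BeautifulSoup


def clean_text(tag):
    """Return the text of *tag* on a single line (hypothetical helper)."""
    # strip=True trims every text fragment and drops empty ones;
    # separator=" " puts a single space between the remaining fragments
    text = tag.get_text(strip=True, separator=" ")
    # A line break *inside* one fragment survives strip=True, so collapse any
    # remaining whitespace runs (newlines, tabs, repeated spaces) into one space
    return " ".join(text.split())


def describe(soup, limit=300):
    """Meta description if present, else the first non-empty <p> (hypothetical helper)."""
    meta = (
        soup.find("meta", attrs={"property": "og:description"})
        or soup.find("meta", attrs={"name": "description"})
    )
    if meta and meta.get("content"):
        return " ".join(meta["content"].split())[:limit]
    for p in soup.find_all("p"):
        text = clean_text(p)
        if text:  # skip paragraphs that contained only whitespace
            return text[:limit]
    return ""


html = """<html><body>
<p>
</p>
<p>
  The arts refers to the theory, human application
  and <i>physical expression</i> of creativity.
</p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")
print(describe(soup))
# → The arts refers to the theory, human application and physical expression of creativity.
```

Inside `find_title`, the same idea would amount to writing `describe(soup) + '\n'` to the file, so each description should occupy exactly one line with no blank lines in between.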