#web-scraping #beautifulsoup #python-requests
Question:
I'm using BeautifulSoup and requests to scrape a website, but I'm not getting the correct HTML. This is what I get (some link tags removed from head):
<!DOCTYPE html>
<html dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>
.... title was here ...
</title>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<meta content="IE=Edge" http-equiv="X-UA-Compatible">
<meta content="yes" name="apple-mobile-web-app-capable">
<link href="/img/logo.png" rel="icon" type="image/png"/> </link>
</head>
<style>
form.search-form {
display: block;
}
li, ol {
list-style-type: none;
}
#header {
top: 0;
z-index: 997;
width: 100%;
height: 40px;
background: rgb(18, 18, 18);
position: fixed;
}
.containerv {
margin-right: auto;
margin-left: auto;
padding-left: 15px;
padding-right: 15px;
}
.containerv:before, .nav:before, .navbar-collapse:before, .navbar-collapse:before {
content: " ";
display: table;
}
#rowheader {
width: 100%;
margin: 0;
position: relative;
}
#top-brand {
position: absolute;
left: 60px;
margin: 6px 0 0 -32.5px;
}
.logo-animated {
animation-name: Breathing2;
animation-duration: 3s;
animation-timing-function: ease-in-out;
}
.logo-animated.img {
width: 60px;
border-radius: 60px;
overflow: hidden;
margin-left: 10px;
height: 80px;
margin-top: -10px;
background: url(/img/logon.png);
background-size: 100% 100%;
background-repeat: no-repeat;
}
.logo-animated.img:hover {
-webkit-filter: drop-shadow(2px 2px 4px #53bbf4); /* Safari 6.0 - 9.0 */
filter: drop-shadow(2px 2px 4px #53bbf4);
}
.content-header-bottom {
border-left: 65px solid transparent;
border-right: 65px solid transparent;
border-top: 36.6px solid rgb(18, 18, 18);
position: absolute;
top: 38px;
z-index: -1;
margin-left: auto;
margin-right: auto;
}
.info-not {
margin-top: 40px;
width: 100%;
height: 40px;
float: left;
background: #333333;
overflow: hidden;
z-index: -3;
}
.navbar-nav.brand-left {
position: absolute;
left: 130px;
margin-right: 60px;
margin-top: 0;
}
.navbar-nav>li {
float: left;
position: relative;
display: inline-flex;
}
.navbar-nav>li>a {
padding: 10px 15px;
font-size: 15px;
transition: opacity .5s ease;
font-weight: 700;
}
.nav-random {
position: fixed;
top: 11px;
right: 10px;
z-index: 11;
width: 24px;
height: 24px;
}
svg.n {
fill: #e8e5d3;
height: 18px;
width: 18px;
}
.info-not ol {
width: auto;
float: left;
height: 40px;
}
.info-not ol li {
width: auto;
float: left;
list-style: none;
height: 40px;
background: #333333;
}
.not {
margin-left: 147px;
}
.info-not ol li:first-child a {
padding: 0 15px;
font-weight: 700;
font-size: 13px;
}
.info-not ol li a {
height: 40px;
float: left;
color: #fff;
font-size: 15px;
line-height: 40px;
padding: 0 10px;
letter-spacing: -.7px;
padding-left: 28px;
position: relative;
background: #333333;
}
.info-not ol li:last-child a {
background: 0 0;
margin-left: 1px;
transition: opacity .5s ease;
font-weight: 700;
letter-spacing: 1px;
background: #333333;
}
svg.footer path {
fill: #2e2e2e;
}
.info-not ol li a.noti:before {
border: solid transparent;
border-left-color: #121212;
border-width: 20px;
left: 0;
top: 0;
content: '';
position: absolute;
height: 0;
width: 0;
}
.info-not ol li a.noti:after {
border: solid transparent;
border-left-color: #333333;
border-width: 20px;
left: -1px;
top: 0;
content: '';
position: absolute;
height: 0;
width: 0;
}
.main {
max-width: 1080px;
overflow: hidden;
width: 100%;
margin: 0 auto;
}
.menu{
display: none;
}
.route {
font-size: 13px;
line-height: 30px;
}
nav.sidebar-wrapper {
position: fixed;
width: 250px;
left: 0;
top: 79.5px;
padding: 1rem;
height: 100%;
background: #070707;
color: #fafafa;
overflow-x: hidden;
overflow-y: auto;
-webkit-font-smoothing: antialiased;
transform-origin: 0 0;
transform: translate(-100%,0);
transition: transform .5s cubic-bezier(0.77,.2,.05,1);
z-index: 998;
}
.sidebar-content {
max-height: calc(100% - 30px);
height: calc(100% - 30px);
overflow-y: auto;
position: relative;
}
li.header-menu {
margin: .1em 0;
}
.sidebar-menu ul li a span {
padding: 8px;
font-size: 16px;
}
li.header-menu span {
color: #eeedeb;
display: inline-block;
padding: 10px 15px;
font-size: 16px;
transition: opacity .5s ease;
font-weight: 700;
}
.sidebar-menu ul li a {
display: inline-block;
width: 100%;
text-decoration: none;
position: relative;
padding: 8px 30px 8px 20px;
}
nav.sidebar-wrapper.show {
transform: scale(1,1);
}
.searchModal {
width: 100%;
height: 0%;
position: fixed;
left: 0;
top: 0;
overflow: hidden;
-webkit-transform-origin: 100% 0;
transform-origin: 100% 0;
-webkit-transition-duration: .5s;
transition-duration: .5s;
-webkit-transition-timing-function: cubic-bezier(.7,0,.3,1);
transition-timing-function: cubic-bezier(.7,0,.3,1);
z-index: 999;
bottom: 0;
background: rgba(3, 3, 3, 0.73);
-webkit-transform-origin: 100% 0;
transform-origin: 100% 0;
-webkit-transition-duration: .5s;
transition-duration: .5s;
-webkit-transition-timing-function: cubic-bezier(.7,0,.3,1);
transition-timing-function: cubic-bezier(.7,0,.3,1);
}
.sm-content {
overflow: hidden;
position: fixed;
-webkit-transform-origin: 100% 0;
transform-origin: 100% 0;
-webkit-transition-duration: .5s;
transition-duration: .5s;
-webkit-transition-timing-function: cubic-bezier(.7,0,.3,1);
transition-timing-function: cubic-bezier(.7,0,.3,1);
width: 98%;
left: 1%;
height: 0%;
top: 1%;
-webkit-transition-delay: .4s;
transition-delay: .4s;
}
.close, .searc {
width: 20px;
height: 20px;
cursor: pointer;
border: 0;
background: #e7e7e7;
}
.ads-module {
margin-top: 10px;
overflow: hidden;
width: 100%;
padding: 10px;
height: 475px;
background-color: #23201d;
border: 2px solid #594d42;
}
.close {
margin: 8px;
}
.close svg, .searc svg {
width: 100%;
height: 100%;
}
button.searc svg {
fill: #171717;
}
.close svg {
background: #171717;
fill: #e7e7e7;
}
form#search-form {
text-align: left;
color: #cbcbcb;
border: 0;
width: 100%;
background: #fff;
overflow: hidden;
margin: 0;
height: 36px;
background-color: #e7e7e7!important;
border-bottom: 1px solid #333333;
border-top: 1px solid #3a3939;
border-left: 1px solid #3a3939;
border-right: 1px solid #3a3939;
}
.input-group {
padding: 6px;
width: 100%;
height: 100%;
}
input#search-input {
border: 0;
width: 100%;
border-radius: 5px;
overflow: hidden;
background-color: #e7e7e7!important;
height: 100%;
}
.searchModal.active * {
box-sizing: border-box;
}
.sea-fle, .input-group {
display: flex;
}
.sea-fle {
background-color: #1e4459;
border: 2px solid #4CADE1;
padding: 15px;
}
.searchModal.active {
height: 100%;
}
.searchModal.active>.sm-content {
height: 98%;
padding: 40px 200px;
}
@media (max-width: 960px) {
#main-nav, .nav-random, .info-not ol li {
display: none;
}
a.menu {
color: #e5e5d3;
cursor: pointer;
display: block;
height: 2.5rem;
position: relative;
width: 3.25rem;
margin-left: auto;
}
.menu span:first-child {
background-color: currentColor;
display: block;
height: 1px;
left: calc(50% - 8px);
position: absolute;
transform-origin: center;
transition-duration: 86ms;
transition-property: background-color,opacity,transform,-webkit-transform;
transition-timing-function: ease-out;
width: 16px;
top: calc(50% - 6px);
}
.menu span:nth-child(2) {
background-color: currentColor;
display: block;
height: 1px;
left: calc(50% - 8px);
position: absolute;
transform-origin: center;
transition-duration: 86ms;
transition-property: background-color,opacity,transform,-webkit-transform;
transition-timing-function: ease-out;
width: 16px;
opacity: 1;
top: calc(50% - 1px);
}
.menu span:nth-child(3) {
background-color: currentColor;
display: block;
height: 1px;
left: calc(50% - 8px);
position: absolute;
transform-origin: center;
transition-duration: 86ms;
transition-property: background-color,opacity,transform,-webkit-transform;
transition-timing-function: ease-out;
width: 16px;
top: calc(50% + 4px);
}
.menu.active span:first-child {
similar format continues .....
I expect the body tag to appear after head, whereas Chrome DevTools shows it correctly.
I searched for this, and my first thought was that {'Content-Encoding': 'gzip'} was producing this output. So I changed my code and added {'Accept-Encoding': 'identity'} to the headers.
But the problem persisted.
I also read somewhere that the requests library cannot handle gzip automatically.
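A note on the gzip suspicion: the requests documentation states that gzip and deflate encodings are decoded automatically when you read r.content or r.text, so a 'Content-Encoding': 'gzip' response header by itself is not a sign of trouble. If you still want to rule it out, here is a minimal sketch that checks whether a body is still compressed by looking for the gzip magic bytes. The helper name ensure_decompressed is made up for illustration and is not part of requests.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def ensure_decompressed(body: bytes) -> bytes:
    """Return body as-is unless it still looks gzip-compressed (hypothetical helper)."""
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body


# Simulate both cases locally, without any network access.
html = b"<!DOCTYPE html><html><head></head><body></body></html>"
still_compressed = gzip.compress(html)

print(ensure_decompressed(still_compressed) == html)  # compressed input gets decoded
print(ensure_decompressed(html) == html)              # plain input passes through
```

With a real response you would pass r.content to the helper. If it comes back unchanged, compression was already handled and the missing body has another cause.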
These are the headers:
{'Date': 'Sun, 11 Oct 2020 09:24:27 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Set-Cookie': '__cfduid=d2c982fbe3f6d64b1b4d3a8be7080e90c1602408267; expires=Tue, 10-Nov-20 09:24:27 GMT; path=/; domain=.pantsubase.com; HttpOnly; SameSite=Lax; Secure, PHPSESSID=4e41d27d1428d26da658a9b0e5c0b9ae; path=/', 'Cache-Control': 'no-store, no-cache, must-revalidate', 'Cf-Railgun': 'd561de73b9 stream 0.000000 0210 c794', 'Expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'Pragma': 'no-cache', 'Vary': 'Accept-Encoding', 'CF-Cache-Status': 'DYNAMIC', 'cf-request-id': '05b892d7720000e664e71a7200000001', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Report-To': '{"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report?lkg-colo=21&lkg-time=1602408268"}],"group":"cf-nel","max_age":604800}', 'NEL': '{"report_to":"cf-nel","max_age":604800}', 'Server': 'cloudflare', 'CF-RAY': '5e078738bc97e664-LHR', 'Content-Encoding': 'gzip', 'alt-svc': 'h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400'}
What am I doing wrong, or not doing at all?
Comments:
1. Can you share the URL?
3. Please keep in mind that if the site's content is dynamic, you won't be able to access it with requests.
4. Yes, I understand that now. I'll have to use selenium or something similar, but it's slow. I think in this case I can get that second document with requests alone. Can I? I really don't know, so I'll try that, and if nothing works I'll use something else. Any further help is appreciated.
Answer #1:
Try specifying a different parser (lxml or html5lib) when you create the soup. The markup on this page appears to be "broken":
import requests
from bs4 import BeautifulSoup
url = 'https://old.pantsubase.com/watch/203230-digimon-adventure-episode-19-english-sub'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'lxml')
print(soup.body)
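To see how each parser copes with this kind of markup without hitting the site, here is a small sketch. It assumes bs4 is installed; lxml and html5lib are optional and are reported as None when missing. SAMPLE is a made-up, shortened imitation of the page (a style element sitting between </head> and <body>), and body_found_by_parser is a hypothetical helper name.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Shortened, invented imitation of the page layout: <style> sits between
# </head> and <body>, which is what makes the markup "broken".
SAMPLE = """<!DOCTYPE html>
<html lang="en">
<head><title>demo</title></head>
<style>li { list-style-type: none; }</style>
<body class="list-me"><div id="player"></div></body>
</html>"""


def body_found_by_parser(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Map each parser name to True/False (body found) or None (parser not installed)."""
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # bs4 raises this when the requested parser library is missing.
            results[name] = None
            continue
        results[name] = soup.body is not None
    return results


for parser_name, found in body_found_by_parser(SAMPLE).items():
    print(parser_name, found)
```

If one parser reports False on the real page while another reports True, that points to the parser choice rather than to the download.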
Comments:
1. That worked. But I had tried both parsers earlier. It's soup.body that worked. I was only printing soup.prettify. Can you tell why it doesn't show up there, while soup.body does?
2. @hac_ticc Which version of bs4 are you using? When I use soup.prettify(), I see the <body> tag. Maybe you can run the diagnose() function on your markup.
3. I thought that was the problem, so I had already updated bs4 and requests before asking the question. And I'll try diagnose().
4. @hac_ticc That <div id="video"> tag is created dynamically with JavaScript, so beautifulsoup doesn't see it. I recommend looking at selenium; it will run the JavaScript and you'll be able to load the video.
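A quick way to tell whether an element is in the HTML the server sends or only appears after JavaScript runs is to look it up by id in the downloaded markup. The sketch below uses a made-up fragment modelled on the player container in the response (an empty <div id="player">). The helper name ids_present is hypothetical.

```python
from bs4 import BeautifulSoup

# Invented fragment modelled on the static response: the player container
# is present but empty, and there is no element with id="video".
STATIC_HTML = """<html><body class="list-me">
<div class="player-san">
<div id="player" class="embed-container" style="min-height: 525"></div>
</div>
</body></html>"""


def ids_present(markup, wanted_ids):
    """Report, for each id, whether the static markup contains an element with it."""
    soup = BeautifulSoup(markup, "html.parser")
    return {wanted: soup.find(id=wanted) is not None for wanted in wanted_ids}


print(ids_present(STATIC_HTML, ["player", "video"]))
# → {'player': True, 'video': False}
```

An id that shows up in DevTools but comes back False here is most likely being inserted by a script after the page loads.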
Answer #2:
Try adding a user-agent to the request you make, for example:
import requests
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0'}
r = requests.get('https://old.pantsubase.com/watch/203230-digimon-adventure-episode-19-english-sub', headers=headers)
print(r.content)
Output:
b'<!DOCTYPE html>n<html lang="en" dir="ltr">n<head>n<meta charset="utf-8">n<title>Watch Digimon Adventure: Episode 19 English Sub : Pantsubase xe2x98x85 Moe Moe, Kyun!</title>n<meta name="viewport" content="width=device-width, initial-scale=1">n<meta http-equiv="X-UA-Compatible" content="IE=Edge" />n<meta name="apple-mobile-web-app-capable" content="yes" />n<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">n<meta name="theme-color" content="#53bbf4">n<script src="/cdn-cgi/apps/head/zYR5uwvdEStbpZyYVe_i8NIfIRw.js"></script><link rel="icon" type="image/png" href="/img/logo.png" />n<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>n<link rel="stylesheet" href="/css/style.css">n<link id="theme-changer" rel="stylesheet" type="text/css">n<script src="/js/kuroani.js"></script>n<link href="https://fonts.googleapis.com/css?family=Staatliches" rel="stylesheet">n<link rel="image_src" href="" />n<link rel="apple-touch-icon" sizes="152x152" href="https://pantsubase.com/img/apple-touch-icon.png" />n<link href="https://pantsubase.com/splashscreens/iphone5_splash.png" media="(device-width: 320px) and (device-height: 568px) and (-webkit-device-pixel-ratio: 2)" rel="apple-touch-startup-image" />n<link href="https://pantsubase.com/splashscreens/iphone6_splash.png" media="(device-width: 375px) and (device-height: 667px) and (-webkit-device-pixel-ratio: 2)" rel="apple-touch-startup-image" />n<link href="https://pantsubase.com/splashscreens/iphoneplus_splash.png" media="(device-width: 621px) and (device-height: 1104px) and (-webkit-device-pixel-ratio: 3)" rel="apple-touch-startup-image" />n<link href="https://pantsubase.com/splashscreens/iphonex_splash.png" media="(device-width: 375px) and (device-height: 812px) and (-webkit-device-pixel-ratio: 3)" rel="apple-touch-startup-image" />n<link href="https://pantsubase.com/splashscreens/iphonexr_splash.png" media="(device-width: 414px) and (device-height: 896px) and 
(-webkit-device-pixel-ratio: 2)" rel="apple-touch-startup-image" />n<link href="https://pantsubase.com/splashscreens/iphonexsmax_splash.png" media="(device-width: 414px) and (device-height: 896px) and (-webkit-device-pixel-ratio: 3)" rel="apple-touch-startup-image" />n<link href="https://pantsubase.com/splashscreens/ipad_splash.png" media="(device-width: 768px) and (device-height: 1024px) and (-webkit-device-pixel-ratio: 2)" rel="apple-touch-startup-image" />n<link href="https://pantsubase.com/splashscreens/ipadpro1_splash.png" media="(device-width: 834px) and (device-height: 1112px) and (-webkit-device-pixel-ratio: 2)" rel="apple-touch-startup-image" />n<link href="https://pantsubase.com/splashscreens/ipadpro3_splash.png" media="(device-width: 834px) and (device-height: 1194px) and (-webkit-device-pixel-ratio: 2)" rel="apple-touch-startup-image" />n<link href="https://pantsubase.com/splashscreens/ipadpro2_splash.png" media="(device-width: 1024px) and (device-height: 1366px) and (-webkit-device-pixel-ratio: 2)" rel="apple-touch-startup-image" />n<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.8.1/css/all.css" integrity="sha384-50oBUHEmvpQ 1lW4y57PTFmhCaXp0ML5d60M1M7uH2 nqUivzIebhndOJK28anvf" crossorigin="anonymous">n<link rel="manifest" href="manifest.json" />n<script async src="https://cdn.rawgit.com/GoogleChrome/pwacompat/v2.0.1/pwacompat.min.js"></script>n</head>n<style>nform.search-form {n display: block;n}nli, ol {n list-style-type: none;n}n#header {n top: 0;n z-index: 997;n width: 100%;n height: 40px;n background: rgb(18, 18, 18);n position: fixed;n}n.containerv {n margin-right: auto;n margin-left: auto;n padding-left: 15px;n padding-right: 15px;n}n.containerv:before, .nav:before, .navbar-collapse:before, .navbar-collapse:before {n content: " ";n display: table;n}n#rowheader {n width: 100%;n margin: 0;n position: relative;n}n#top-brand {n position: absolute;n left: 60px;n margin: 6px 0 0 -32.5px;n}n.logo-animated {n animation-name: 
Breathing2;n animation-duration: 3s;n animation-timing-function: ease-in-out;n}n.logo-animated.img {n width: 60px;n border-radius: 60px;n overflow: hidden;n margin-left: 10px;n height: 80px;n margin-top: -10px;n background: url(/img/logon.png);n background-size: 100% 100%;n background-repeat: no-repeat;n}n.logo-animated.img:hover {n -webkit-filter: drop-shadow(2px 2px 4px #53bbf4); /* Safari 6.0 - 9.0 */n filter: drop-shadow(2px 2px 4px #53bbf4);n}n.content-header-bottom {n border-left: 65px solid transparent;n border-right: 65px solid transparent;n border-top: 36.6px solid rgb(18, 18, 18);n position: absolute;n top: 38px;n z-index: -1;n margin-left: auto;n margin-right: auto;n}n.info-not {n margin-top: 40px;n width: 100%;n height: 40px;n float: left;n background: #333333;n overflow: hidden;n z-index: -3;n}n.navbar-nav.brand-left {n position: absolute;n left: 130px;n margin-right: 60px;n margin-top: 0;n}n.navbar-nav>li {n float: left;n position: relative;n display: inline-flex;n}n.navbar-nav>li>a {n padding: 10px 15px;n font-size: 15px;n transition: opacity .5s ease;n font-weight: 700;n}n.nav-random {n position: fixed;n top: 11px;n right: 10px;n z-index: 11;n width: 24px;n height: 24px;n}nsvg.n {n fill: #e8e5d3;n height: 18px;n width: 18px;n}n.info-not ol {n width: auto;n float: left;n height: 40px;n}n.info-not ol li {n width: auto;n float: left;n list-style: none;n height: 40px;n background: #333333;n}n.not {n margin-left: 147px;n}n.info-not ol li:first-child a {n padding: 0 15px;n font-weight: 700;n font-size: 13px;n}n.info-not ol li a {n height: 40px;n float: left;n color: #fff;n font-size: 15px;n line-height: 40px;n padding: 0 10px;n letter-spacing: -.7px;n padding-left: 28px;n position: relative;n background: #333333;n}n.info-not ol li:last-child a {n background: 0 0;n margin-left: 1px;n transition: opacity .5s ease;n font-weight: 700;n letter-spacing: 1px;n background: #333333;n}nsvg.footer path {n fill: #2e2e2e;n}n.info-not ol li a.noti:before {n border: 
solid transparent;n border-left-color: #121212;n border-width: 20px;n left: 0;n top: 0;n content: '';n position: absolute;n height: 0;n width: 0;n}n.info-not ol li a.noti:after {n border: solid transparent;n border-left-color: #333333;n border-width: 20px;n left: -1px;n top: 0;n content: '';n position: absolute;n height: 0;n width: 0;n}n.main {n max-width: 1080px;n overflow: hidden;n width: 100%;n margin: 0 auto;n}n.menu{n display: none;n}n.route {n font-size: 13px;n line-height: 30px;n}nnav.sidebar-wrapper {n position: fixed;n width: 250px;n left: 0;n top: 79.5px;n padding: 1rem;n height: 100%;n background: #070707;n color: #fafafa;n overflow-x: hidden;n overflow-y: auto;n -webkit-font-smoothing: antialiased;n transform-origin: 0 0;n transform: translate(-100%,0);n transition: transform .5s cubic-bezier(0.77,.2,.05,1);n z-index: 998;n}n.sidebar-content {n max-height: calc(100% - 30px);n height: calc(100% - 30px);n overflow-y: auto;n position: relative;n}nli.header-menu {n margin: .1em 0;n}n.sidebar-menu ul li a span {n padding: 8px;n font-size: 16px;n}nli.header-menu span {n color: #eeedeb;n display: inline-block;n padding: 10px 15px;n font-size: 16px;n transition: opacity .5s ease;n font-weight: 700;n}n.sidebar-menu ul li a {n display: inline-block;n width: 100%;n text-decoration: none;n position: relative;n padding: 8px 30px 8px 20px;n}nnav.sidebar-wrapper.show {n transform: scale(1,1);n}n.searchModal {n width: 100%;n height: 0%;n position: fixed;n left: 0;n top: 0;n overflow: hidden;n -webkit-transform-origin: 100% 0;n transform-origin: 100% 0;n -webkit-transition-duration: .5s;n transition-duration: .5s;n -webkit-transition-timing-function: cubic-bezier(.7,0,.3,1);n transition-timing-function: cubic-bezier(.7,0,.3,1);n z-index: 999;n bottom: 0;n background: rgba(3, 3, 3, 0.73);n -webkit-transform-origin: 100% 0;n transform-origin: 100% 0;n -webkit-transition-duration: .5s;n transition-duration: .5s;n -webkit-transition-timing-function: 
cubic-bezier(.7,0,.3,1);n transition-timing-function: cubic-bezier(.7,0,.3,1);n}n.sm-content {n overflow: hidden;n position: fixed;n -webkit-transform-origin: 100% 0;n transform-origin: 100% 0;n -webkit-transition-duration: .5s;n transition-duration: .5s;n -webkit-transition-timing-function: cubic-bezier(.7,0,.3,1);n transition-timing-function: cubic-bezier(.7,0,.3,1);n width: 98%;n left: 1%;n height: 0%;n top: 1%;n -webkit-transition-delay: .4s;n transition-delay: .4s;n}n.close, .searc {n width: 20px;n height: 20px;n cursor: pointer;n border: 0;n background: #e7e7e7;n}n.ads-module {n margin-top: 10px;n overflow: hidden;n width: 100%;n padding: 10px;n height: 475px;n background-color: #23201d;n border: 2px solid #594d42;n}n.close {n margin: 8px;n}n.close svg, .searc svg {n width: 100%;n height: 100%;n}nbutton.searc svg {n fill: #171717;n}n.close svg {n background: #171717;ntfill: #e7e7e7;n}nform#search-form {n text-align: left;n color: #cbcbcb;n border: 0;n width: 100%;n background: #fff;n overflow: hidden;n margin: 0;n height: 36px;n background-color: #e7e7e7!important;n border-bottom: 1px solid #333333;n border-top: 1px solid #3a3939;n border-left: 1px solid #3a3939;n border-right: 1px solid #3a3939;n}n.input-group {n padding: 6px;n width: 100%;n height: 100%;n}ninput#search-input {n border: 0;n width: 100%;n border-radius: 5px;n overflow: hidden;n background-color: #e7e7e7!important;n height: 100%;n}n.searchModal.active * {n box-sizing: border-box;n}n.sea-fle, .input-group {n display: flex;n}n.sea-fle {ntbackground-color: #1e4459;n border: 2px solid #4CADE1;n padding: 15px;n n}n.searchModal.active {n height: 100%;n}n.searchModal.active>.sm-content {n height: 98%;n padding: 40px 200px;n}n@media (max-width: 960px) {n#main-nav, .nav-random, .info-not ol li {n display: none;n}na.menu {n color: #e5e5d3;n cursor: pointer;n display: block;n height: 2.5rem;n position: relative;n width: 3.25rem;n margin-left: auto;n}n.menu span:first-child {n background-color: 
currentColor;n display: block;n height: 1px;n left: calc(50% - 8px);n position: absolute;n transform-origin: center;n transition-duration: 86ms;n transition-property: background-color,opacity,transform,-webkit-transform;n transition-timing-function: ease-out;n width: 16px;n top: calc(50% - 6px);n}n.menu span:nth-child(2) {n background-color: currentColor;n display: block;n height: 1px;n left: calc(50% - 8px);n position: absolute;n transform-origin: center;n transition-duration: 86ms;n transition-property: background-color,opacity,transform,-webkit-transform;n transition-timing-function: ease-out;n width: 16px;n opacity: 1;n top: calc(50% - 1px);n}n.menu span:nth-child(3) {n background-color: currentColor;n display: block;n height: 1px;n left: calc(50% - 8px);n position: absolute;n transform-origin: center;n transition-duration: 86ms;n transition-property: background-color,opacity,transform,-webkit-transform;n transition-timing-function: ease-out;n width: 16px;n top: calc(50% 4px);n}n.menu.active span:first-child {n transform: translateY(5px) rotate(45deg);n}n.menu.active span:nth-child(2) {n opacity: 0;n}n.menu.active span:nth-child(3) {n transform: translateY(-5px) rotate(-45deg);n}nfooter {n padding-left: 8px;n padding-right: 8px;n}nsvg.footer {n display: none;n}n.logoWrap-so {n display: -webkit-box;n display: -ms-flexbox;n display: flex;n margin: 16px 0;n width: 45%;n}na.logo-co {n margin: 0 auto;n}n.navigation {n -ms-flex-align: start;n -ms-flex-flow: row wrap;n -ms-flex-pack: distribute;n -webkit-box-align: start;n -webkit-box-direction: normal;n -webkit-box-orient: horizontal;n align-items: flex-start;n flex-flow: row wrap;n justify-content: space-around;n}n.navigationSection-con {n margin: 16px 0;n text-align: center;n width: 45%;n}n.searchModal.active>.sm-content {n padding: 40px 50px;n}n}n@media (max-width: 550px) {n.searchModal.active>.sm-content {n padding: 40px 20px;n}n}n</style>n<body class="list-me">n<script>if ('serviceWorker' in navigator) {n // 
sw.js can literally be empty, but must existn navigator.serviceWorker.register('/sw.js');n}nnfunction addToHomeScreen() {n let a2hsBtn = document.querySelector(".ad2hs-prompt"); // hide our user interface that shows our A2HS buttonn a2hsBtn.style.display = 'none'; // Show the promptn deferredPrompt.prompt(); // Wait for the user to respond to the promptn deferredPrompt.userChoicen .then(function(choiceResult){n if (choiceResult.outcome === 'accepted') {n console.log('User accepted the A2HS prompt');n } else {n console.log('User dismissed the A2HS prompt');n }n deferredPrompt = null;n });}n function showAddToHomeScreen() {n let a2hsBtn = document.querySelector(".ad2hs-prompt");n a2hsBtn.style.display = "block";n a2hsBtn.addEventListener("click", addToHomeScreen);n }n let deferredPrompt;n window.addEventListener('beforeinstallprompt', function (e) {n // Prevent Chrome 67 and earlier from automatically showing the promptn e.preventDefault();n // Stash the event so it can be triggered later.n deferredPrompt = e;n showAddToHomeScreen();n });n n function showIosInstall() {n let iosPrompt = document.querySelector(".ios-prompt");n iosPrompt.style.display = "block";n iosPrompt.addEventListener("click", () => {n iosPrompt.style.display = "none";n });n }n n // Detects if device is on iOSn const isIos = () => {n const userAgent = window.navigator.userAgent.toLowerCase();n return /iphone|ipad|ipod/.test( userAgent );n }n // Detects if device is in standalone moden const isInStandaloneMode = () => ('standalone' in window.navigator) amp;amp; (window.navigator.standalone);n // Checks if should display install popup notification:n if (isIos() amp;amp; !isInStandaloneMode()) {n // this.setState({ showInstallMessage: true });n showIosInstall();n }n</script>n<div id="fb-root"></div>n<script async defer crossorigin="anonymous" src="https://connect.facebook.net/en_US/sdk.js#xfbml=1amp;version=v3.2amp;appId=400381637444503amp;autoLogAppEvents=1"></script>n<div id="header" 
class="containerv"><div id="rowheader" class="row vertical-center"><div id="top-brand"><a href="/"><div class="logo-animated"><div class="logo-animated img"></div></div></a></div><div class="content-header-bottom"></div><div id="main-nav"><ul class="nav navbar-nav brand-left"><li><a class="inactive-menu" href="/anime-list">Browse</a></li><li><a class="inactive-menu" href="/support">Support</a></li><li><a id="search" class="inactive-menu" href="#search">Search</a></li></ul></div><a class="nav-random" href="https://old.pantsubase.com/anime/3191-robomasters-the-animated-series"><svg viewBox="0 0 24 24" class="n"><path d="M17,3L22.25,7.5L17,12L22.25,16.5L17,21V18H14.26L11.44,15.18L13.56,13.06L15.5,15H17V12L17,9H15.5L6.5,18H2V15H5.26L14.26,6H17V3M2,6H6.5L9.32,8.82L7.2,10.94L5.26,9H2V6Z"></path></svg></a><a role="button" class="menu"><span></span> <span></span> <span></span></a></div></div>n<div class="info-not">n<ol>n<li><a class="not" href="#notice">Notice</a></li>n<li><a class="noti">Join our discord community, We recently just merged with A Shelter For Anime Lovers! 
</a></li>n<li><a class="noti" href="https://discord.gg/zAnR33k"><i class="fab fa-discord" style="font-size:30px;color:#738adb;padding-top:5px;" aria-hidden="true"></i></a></li>n<li><a class="noti" href="https://pantsubase.com">Switch to New Pantsubase</a>n</li>n</ol>n</div>n<div class="clear"></div>n<div class="content"><meta property="og:image" content="//cdn.animeapi.com/images/1/dcd384395f5d_352x220.jpg">n<meta name="description" content="Watch online and download anime Digimon Adventure: Episode 19 english subbed/dubbed in high quality">n<link rel="stylesheet" href="/css/anime-episode.css">n<script>$(document).ready(function(){$("#server").click(function(){$(".server").toggleClass("show");});})</script>n<style>.server.show{display: block;}</style>n<script type="text/javascript" src='//platform-api.sharethis.com/js/sharethis.js#property=5b698620d5484b00116a8fddamp;product=inline-share-buttons' async='async'></script>n<div class="server"><div id="mirrors" class="dropdown-menu p-0" aria-labelledby="btnGroupDrop1"></div></div>n<div class="clear"></div>n<div class="container">n<div class="player-san">n<div id="player" class="embed-container" style="min-height: 525"></div>n<div class="player-menu">n<div class="desd2">n<div id="server"><a href="#"><i class="fa fa-font"></i> | <i class="fas fa-server"></i></a></div>n<div id="refresh" data-index="0"><i class="fas fa-sync-alt"></i> Refresh</div>n</div>n<div class="desd">n<div class="desd3"><a class='btn btn-danger float-left' href='https://old.pantsubase.com/watch/203151-digimon-adventure-episode-18-english-sub'>Prev</a><a class='all-epi btn btn-danger float-left' href='https://old.pantsubase.com/anime/4832-digimon-adventure'><i class='fa fa-list-ol'></i></a></div>n</div>n</div>n<div style="n max-width: 480px;n width: 100%;n margin: 0 auto;n">n</div>n</div>n<div class="episo-detai">n<div class="item meta">n<div class="tb" itemprop="image" itemscope="" itemtype="https://schema.org/ImageObject">n<img width='350' 
height='476' src='https://cdn.animeapi.com/images/anime/4832.jpg' class='attachment-post-thumbnail size-post-thumbnail wp-post-image' alt=''> </div>n<div class="lm">n<h1 itemprop='name'>Digimon Adventure: English Sub</h1><span class='epi-name'><span class='epx'>Episode 19</span> <span class='year'>In the year 2020, technology is everywhere. Every digital device around the world is connected by a singular network where data travels. Unbeknownst to humans, this network has become home to life forms known as amp;#34;Digimon.amp;#34; Fifth-grader Taichi Yagami is preparing for summer camp when strange occurrences begin in Tokyo; certain electronic systems have started going haywire. When he discovers that his sister and mother are trapped on an unstoppable train, he rushes to the nearby station. Suddenly, Taichi is transported to another world where he meets a strange creature by the name of Agumon, who somehow already knows his name. Taichi also receives a strange device called a amp;#34;Digivice,amp;#34; which allows him to communicate with the undigitized world. Taichi discovers he is in the amp;#34;Network,amp;#34; and virus-like Digimon are attacking the areas that maintain Tokyoxe2x80x99s electronic systems. It is up to Taichi and his new partner Agumon to stop these cyberattacks before the whole world is threatened by the actions of mischievous Digimon. 
[Written by MAL Rewrite]</span> </div>n</div>n<div class="sharethis-inline-share-buttons" style="margin-top: 7px;"></div>n<div id="fb-save-button"><div id="save-button" class="fb-save" data-uri="https://old.pantsubase.com/watch/203230-digimon-adventure-episode-19-english-sub" data-size="large"></div></div>n<style>ndiv#fb-save-button {n margin: auto;n margin-top: 5px;n margin-bottom: 5px;n width: 150px;n}</style>n</div>n</div>n<script type='text/javascript'>var episode_videos = [{"host":"trollvid","id":"dcd384395f5d","type":"subbed","date":"2020-10-11 04:50:05"},{"host":"vidstreaming","id":"MTQ2Mjk2","type":"subbed","date":"2020-10-11 04:50:03"}];</script>n<script src="https://code.jquery.com/jquery-3.3.1.min.js"></script>n<script type='text/javascript' src="/js/anime-controls.js?11"></script>n<script type='text/javascript' src="/js/pagination.min.js"></script>n</div>n<nav class="sidebar-wrapper">n<div class="sidebar-content">n<div class="sidebar-menu">n<ul>n<li class="header-menu">n<span>General</span>n</li>n<li>n<a href="/anime-list">n<span>Browse</span>n</a>n</li>n<li>n<a href="/ongoing-anime">n<span>Ongoing</span>n</a>n</li>n<li>n<a href="https://old.pantsubase.com/anime/2746-jungle-wa-itsumo-hare-nochi-guu">n<span>Random</span>n</a>n</li>n<li class="search-menu">n<a id="search-con" href="#search">n<span>Search</span>n</a>n</li>n<li class="header-menu">n<span>Extra</span>n</li>n<li class="header-menu">n<span>Social Networks</span>n</li>n<li>n<a href="https://discord.gg/5tm4C2M">n<span>Discord</span>n</a>n</li>n<li>n<a href="https://www.facebook.com/pantsubesu/">n<span>Facebook</span>n</a>n</li>n<li>n<a href="https://www.facebook.com/groups/pantsubase/">n<span>Join our Group</span>n</a>n</li>n</ul>n</div>n</div>n</nav>n<div class="searchModal">n<div class="sm-content">n<div class="sea-fle">n<form action="/search" method="GET" id="search-form" class="search-form">n<div class="input-group">n<input type="text" class="form-control" name="term" id="search-input" 
placeholder="Search Anime.." aria-label="Search for...">n<button class="searc" type="submit">n<svg viewBox="0 0 24 24"><path d="M13.262,14.868l2.479,2.478c-0.376,0.725-0.415,1.445-0.017,1.843l4.525,4.526 c0.571,0.571,1.812,0.257,2.768-0.7c0.956-0.955,1.269-2.195,0.697-2.766l-4.524-4.526c-0.399-0.398-1.119-0.36-1.842,0.016 l-2.48-2.478L13.262,14.868z M8.5,0C3.806,0,0,3.806,0,8.5C0,13.194,3.806,17,8.5,17S17,13.194,17,8.5C17,3.806,13.194,0,8.5,0z M8.5,15C4.91,15,2,12.09,2,8.5S4.91,2,8.5,2S15,4.91,15,8.5S12.09,15,8.5,15z"></path></svg>n</button>n</div>n</form>n<div class="close"><svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 21.9 21.9" xmlns:xlink="http://www.w3.org/1999/xlink" enable-background="new 0 0 21.9 21.9">n<path d="M14.1,11.3c-0.2-0.2-0.2-0.5,0-0.7l7.5-7.5c0.2-0.2,0.3-0.5,0.3-0.7s-0.1-0.5-0.3-0.7l-1.4-1.4C20,0.1,19.7,0,19.5,0 c-0.3,0-0.5,0.1-0.7,0.3l-7.5,7.5c-0.2,0.2-0.5,0.2-0.7,0L3.1,0.3C2.9,0.1,2.6,0,2.4,0S1.9,0.1,1.7,0.3L0.3,1.7C0.1,1.9,0,2.2,0,2.4 s0.1,0.5,0.3,0.7l7.5,7.5c0.2,0.2,0.2,0.5,0,0.7l-7.5,7.5C0.1,19,0,19.3,0,19.5s0.1,0.5,0.3,0.7l1.4,1.4c0.2,0.2,0.5,0.3,0.7,0.3 s0.5-0.1,0.7-0.3l7.5-7.5c0.2-0.2,0.5-0.2,0.7,0l7.5,7.5c0.2,0.2,0.5,0.3,0.7,0.3s0.5-0.1,0.7-0.3l1.4-1.4c0.2-0.2,0.3-0.5,0.3-0.7 s-0.1-0.5-0.3-0.7L14.1,11.3z" />n</svg></div>n</div>n</div>n</div>n<div class="bar">n<div class="container-dark">n<div class="theme quicktoggler">n<span class="text">Change theme</span>n<div class="bwbox" id="bbox" onclick="changeTheme()">n<div class="bwselect" id="bwsel"></div>n</div>n</div>n</div>n</div>n<footer class="foot">n<div class="foot-con">nAll Rights Reserved Pantsubase Inc.n<script>n $( document ).ready(function() {n $( ".menu" ).click(function() {$( ".sidebar-wrapper" )n .toggleClass( "show" );});});n $( document ).ready(function() {n $( ".menu" ).click(function() {$( ".menu" )n .toggleClass( "active" );});});n$('#search').click(function(){n $('.searchModal').addClass('active')n});n$('#search-con').click(function(){n 
$('.searchModal').addClass('active')n});n$('.close').click(function(){n $('.searchModal').removeClass('active')n});n </script>n</footer>n<script src="/js/theme.js"></script>n<script type='text/javascript' src="/js/controls.js?v=.0005">n </body>n</html>'
Hope this helps!
Comments:
1. Thanks, but I had already added a User-Agent. Without the user-agent it returned 403 Forbidden.
2. Oh... OK. By the way, what do you want to scrape from the website?
3. I was just learning scraping and simply trying to get the video URL. It's nothing specific, though.
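On getting at the video data with requests alone: the response body quoted in this answer contains an inline script that defines an episode_videos array, so that part is available without running JavaScript. Here is a minimal sketch that pulls it out with re and json. The function name extract_episode_videos is made up, and how the site turns a host/id pair into a playable URL is not shown in the response, so the sketch does not guess at it.

```python
import json
import re


def extract_episode_videos(html: str):
    """Return the inline episode_videos array as a list of dicts, or [] if absent."""
    match = re.search(r"var\s+episode_videos\s*=\s*(\[.*?\])\s*;", html, re.DOTALL)
    if match is None:
        return []
    return json.loads(match.group(1))


# The script tag as it appears in the response body quoted above.
sample = (
    "<script type='text/javascript'>var episode_videos = "
    '[{"host":"trollvid","id":"dcd384395f5d","type":"subbed","date":"2020-10-11 04:50:05"},'
    '{"host":"vidstreaming","id":"MTQ2Mjk2","type":"subbed","date":"2020-10-11 04:50:03"}];'
    "</script>"
)

for video in extract_episode_videos(sample):
    print(video["host"], video["id"])
```

In a real run you would pass r.text instead of sample. The regex assumes the array contains no nested square brackets, which holds for the quoted response but may not hold for every page.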