#python #scrapy-splash
Question:
I am trying to render the Coursera login page using the following:
docker run -p 8050:8050 scrapinghub/splash
Log
2021-09-20 18:17:03+0000 [-] Log opened.
2021-09-20 18:17:03.188493 [-] Xvfb is started: ['Xvfb', ':1502023191', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-splash'
2021-09-20 18:17:03.510918 [-] Splash version: 3.5
2021-09-20 18:17:03.701541 [-] Qt 5.14.1, PyQt 5.14.2, WebKit 602.1, Chromium 77.0.3865.129, sip 4.19.22, Twisted 19.7.0, Lua 5.2
2021-09-20 18:17:03.702994 [-] Python 3.6.9 (default, Jul 17 2020, 12:50:27) [GCC 8.4.0]
2021-09-20 18:17:03.704023 [-] Open files limit: 1048576
2021-09-20 18:17:03.704705 [-] Can't bump open files limit
2021-09-20 18:17:03.739538 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2021-09-20 18:17:03.740350 [-] memory cache: enabled, private mode: enabled, js cross-domain access: disabled
2021-09-20 18:17:04.013267 [-] verbosity=1, slots=20, argument_cache_max_entries=500, max-timeout=90.0
2021-09-20 18:17:04.014356 [-] Web UI: enabled, Lua: enabled (sandbox: enabled), Webkit: enabled, Chromium: enabled
2021-09-20 18:17:04.015550 [-] Site starting on 8050
2021-09-20 18:17:04.016283 [-] Starting factory <twisted.web.server.Site object at 0x7f53580415c0>
2021-09-20 18:17:04.017285 [-] Server listening on http://0.0.0.0:8050
Then I go to localhost:8050, paste the URL above (https://www.coursera.org/?authMode=login), and click Render me!. I get the following:
2021-09-20 18:18:21.831799 [-] "172.17.0.1" - - [20/Sep/2021:18:18:21 +0000] "GET / HTTP/1.1" 200 7675 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15"
2021-09-20 18:19:21.399658 [-] "172.17.0.1" - - [20/Sep/2021:18:19:21 +0000] "GET / HTTP/1.1" 200 7675 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15"
2021-09-20 18:19:26.580224 [-] "172.17.0.1" - - [20/Sep/2021:18:19:26 +0000] "GET /info?wait=0.5&images=1&expand=1&timeout=90.0&url=https://www.coursera.org/?authMode=login&lua_source=function main(splash, args)
assert(splash:go(args.url))
assert(splash:wait(0.5))
return {
html = splash:html(),
png = splash:png(),
har = splash:har(),
}
end HTTP/1.1" 200 5622 "http://localhost:8050/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15"
2021-09-20 18:19:33.763611 [events] {"path": "/execute", "rendertime": 7.115760803222656, "maxrss": 292156, "load": [0.11, 0.07, 0.02], "fds": 70, "active": 0, "qsize": 0, "_id": 139995935677240, "method": "POST", "timestamp": 1632161973, "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15", "args": {"url": "https://www.coursera.org/?authMode=login", "wait": 0.5, "resource_timeout": 0, "viewport": "1024x768", "render_all": false, "images": 1, "http_method": "GET", "html5_media": false, "http2": false, "save_args": [], "load_args": {}, "timeout": 90, "request_body": false, "response_body": false, "engine": "webkit", "har": 1, "png": 1, "html": 1, "lua_source": "function main(splash, args)\r\n assert(splash:go(args.url))\r\n assert(splash:wait(0.5))\r\n return {\r\n html = splash:html(),\r\n png = splash:png(),\r\n har = splash:har(),\r\n }\r\nend", "uid": 139995935677240}, "status_code": 200, "client_ip": "172.17.0.1"}
2021-09-20 18:19:33.766020 [-] "172.17.0.1" - - [20/Sep/2021:18:19:33 +0000] "POST /execute HTTP/1.1" 200 1141156 "http://localhost:8050/info?wait=0.5&images=1&expand=1&timeout=90.0&url=https://www.coursera.org/?authMode=login&lua_source=function main(splash, args)
assert(splash:go(args.url))
assert(splash:wait(0.5))
return {
html = splash:html(),
png = splash:png(),
har = splash:har(),
}
end" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15"
2021-09-20 18:19:33.908110 [-] "172.17.0.1" - - [20/Sep/2021:18:19:33 +0000] "GET /_ui/harviewer-2.0.17a/scripts/require.js?_=1632161966613 HTTP/1.1" 200 86262 "http://localhost:8050/info?wait=0.5&images=1&expand=1&timeout=90.0&url=https://www.coursera.org/?authMode=login&lua_source=function main(splash, args)
assert(splash:go(args.url))
assert(splash:wait(0.5))
return {
html = splash:html(),
png = splash:png(),
har = splash:har(),
}
end" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15"
Here is the screenshot
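
For anyone who wants to reproduce this without the web UI, the same render can probably be triggered by posting the Lua script to Splash's /execute endpoint. This is only a minimal sketch, assuming Splash is listening on localhost:8050 as in the log above. The helper names build_payload and render, and the output file name coursera.png, are made up for illustration, and the script drops the har field to keep the response small.

```python
import base64
import json
import urllib.request

# Same script the web UI sends, minus the HAR output; the wait is read from args.
LUA_SOURCE = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(args.wait))
  return {
    html = splash:html(),
    png = splash:png(),
  }
end
"""


def build_payload(url, wait=0.5, timeout=90):
    """Build the JSON body for POST /execute (hypothetical helper)."""
    return {"lua_source": LUA_SOURCE, "url": url, "wait": wait, "timeout": timeout}


def render(url, splash="http://localhost:8050"):
    """POST the script to Splash and return the decoded JSON result (hypothetical helper)."""
    body = json.dumps(build_payload(url)).encode("utf-8")
    request = urllib.request.Request(
        splash + "/execute",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    result = render("https://www.coursera.org/?authMode=login")
    print("HTML length:", len(result["html"]))
    # Splash base64-encodes binary values such as the PNG in JSON responses.
    with open("coursera.png", "wb") as f:
        f.write(base64.b64decode(result["png"]))
```

Running the script against the container above should write the same screenshot the web UI shows, which makes it easier to compare results after changing the wait time or other arguments.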