Single insert works fine, but bulk import fails with "not_x_content_exception"

#elasticsearch #import

Question:

I am trying to import data into Elasticsearch from a JSON file that contains one document per line. Just the data, no metadata.

Here is how I create the index and insert a single document:

DELETE /tests

 PUT /tests
{}

 PUT /tests/test/_mapping
{
  "test":{
    "properties":{
      "env":{"type":"keyword"},
      "uid":{"type":"keyword"},
      "ok":{"type":"boolean"}
    }
  }
}

 POST /tests/test
{"env":"dev", "uid":12346, "ok":true}

 GET /tests/_search
{"query":{"match_all":{}}}

Everything works fine: there are no errors, the document is indexed correctly and can be found in ES.

Now let's try to do the same with elasticdump.

Here is the content of the file I am trying to import:

 cat ./data.json
{"env":"prod","uid":1111,"ok":true}
{"env":"prod","uid":2222,"ok":true}

Here is how I try to import it:

 elasticdump \
    --input="./data.json" \
    --output="http://elk:9200" \
    --output-index="tests/test" \
    --debug \
    --limit=10000 \
    --headers='{"Content-Type": "application/json"}' \
    --type=data

But I got the error "Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes".

Here is the full output:

 root@node-tools:/data# elasticdump \
>     --input="./s.json" \
>     --output="http://elk:9200" \
>     --output-index="tests/test" \
>     --debug \
>     --limit=10000 \
>     --headers='{"Content-Type": "application/json"}' \
>     --type=data
Tue, 16 Apr 2019 16:26:28 GMT | starting dump
Tue, 16 Apr 2019 16:26:28 GMT | got 2 objects from source file (offset: 0)
Tue, 16 Apr 2019 16:26:28 GMT [debug] | discovered elasticsearch output major version: 6
Tue, 16 Apr 2019 16:26:28 GMT [debug] | thisUrl: http://elk:9200/tests/test/_bulk, payload.body: "{"index":{"_index":"tests","_type":"test"}}\nundefined\n{"index":{"_index":"tests","_type":"test"}}\nundefined\n"
{ _index: 'tests',
  _type: 'test',
  _id: 'ndj4JmoBindjidtNmyKf',
  status: 400,
  error:
   { type: 'mapper_parsing_exception',
     reason: 'failed to parse',
     caused_by:
      { type: 'not_x_content_exception',
        reason:
         'Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes' } } }
{ _index: 'tests',
  _type: 'test',
  _id: 'ntj4JmoBindjidtNmyKf',
  status: 400,
  error:
   { type: 'mapper_parsing_exception',
     reason: 'failed to parse',
     caused_by:
      { type: 'not_x_content_exception',
        reason:
         'Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes' } } }
Tue, 16 Apr 2019 16:26:28 GMT | sent 2 objects to destination elasticsearch, wrote 0
Tue, 16 Apr 2019 16:26:28 GMT | got 0 objects from source file (offset: 2)
Tue, 16 Apr 2019 16:26:28 GMT | Total Writes: 0
Tue, 16 Apr 2019 16:26:28 GMT | dump complete

What am I doing wrong? Why does the manual insert work fine while the _bulk request fails? Any ideas?

UPD

I tried the Python elasticsearch_loader and it works fine.

 elasticsearch_loader \
    --es-host="http://elk:9200" \
    --index="tests" \
    --type="test" \
    json --json-lines ./data.json

Some additional information can be found here: https://github.com/taskrabbit/elasticsearch-dump/issues/534

1. May I ask which versions of Elasticsearch and elasticdump you are using?

2. @NikolayVasiliev elasticdump = 4.7, ES = 6.5.1. The problem has already been solved by one of the elasticdump contributors, see github.com/taskrabbit/elasticsearch-dump/issues/534

Answer 1:

Each JSON document must be wrapped in a _source field.

BEFORE: {"env":"prod","uid":1111,"ok":true}

AFTER: {"_source":{"env":"prod","uid":1111,"ok":true}}
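
The debug output in the question hints at why: the _bulk payload carries "undefined" where each document body should be, which suggests that elasticdump takes the body from the _source field of every input line. Below is a minimal Python sketch of that behaviour under that assumption. bulk_body is a hypothetical helper written for illustration, not elasticdump's actual code, and Python renders the missing body as null rather than undefined.

```python
import json


def bulk_body(docs, index, doc_type):
    """Build an NDJSON _bulk body: an action line followed by a source line per document."""
    lines = []
    for doc in docs:
        # Action/metadata line, as seen in the debug output of the question.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Assumption: the body is read from the "_source" key of the input document.
        # A bare document has no such key, so nothing useful is sent.
        lines.append(json.dumps(doc.get("_source")))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


raw = [{"env": "prod", "uid": 1111, "ok": True}]
wrapped = [{"_source": doc} for doc in raw]

print(bulk_body(raw, "tests", "test"))      # second line is "null": no document body
print(bulk_body(wrapped, "tests", "test"))  # second line is the actual document
```

With the bare documents the source line is empty, which would explain why Elasticsearch rejects it as not being valid xcontent.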

The wrapping can be done on the fly with elasticdump's --transform argument:

 elasticdump \
    --input="./data.json" \
    --output="http://elk:9200" \
    --output-index="tests/test" \
    --debug \
    --limit=10000 \
    --type=data \
    --transform="doc._source=Object.assign({},doc)"

Thanks to @ferronrsmith on GitHub.
More details here: https://github.com/taskrabbit/elasticsearch-dump/issues/534
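
If you would rather fix the file itself, the same wrapping could be applied offline before running elasticdump. This is only a sketch: wrap_in_source and convert are hypothetical helper names, and the output path in the usage note is an assumption.

```python
import json


def wrap_in_source(line):
    """Wrap one JSON document (one line of the file) in a {"_source": ...} envelope."""
    return json.dumps({"_source": json.loads(line)}, separators=(",", ":"))


def convert(src_path, dst_path):
    """Rewrite a one-document-per-line JSON file so that every line has a _source field."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(wrap_in_source(line) + "\n")


print(wrap_in_source('{"env":"prod","uid":1111,"ok":true}'))
# prints {"_source":{"env":"prod","uid":1111,"ok":true}}
```

Calling convert("./data.json", "./data.wrapped.json") would then produce a file that can be passed to --input without the --transform argument.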