#solr #apache-tika
Question:
I'm having trouble running a Solr import with Tika: my documents keep failing while web pages are being indexed.
I delete the content of the documents that Tika fails on and restart the import, but this is very tedious, and I obviously lose the content of those documents.
Here is the failure log:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 927
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:130)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@b623d7
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
... 8 more
Caused by: java.lang.NullPointerException
Nov 10, 2011 10:51:29 AM org.apache.solr.common.SolrException log
SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 927
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:130)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@b623d7
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
... 8 more
Caused by: java.lang.NullPointerException
Sample of the failing data:
pageText=pageText(1.0)={<table width="100%" height="100%" border="0" cellpadding="0" cellspacing="0" nodeIndex="3" class="ril_layoutTable">
<tr nodeIndex="2">
<td width="50%" rowspan="3" nodeIndex="1">&nbsp;</td>
<td width="1" rowspan="3" nodeIndex="4"></td>
<td nodeIndex="5">
<!-- ImageReady Slices (headergraphics.psd) -->
<table width="780" border="0" cellpadding="0" cellspacing="0" nodeIndex="8" class="ril_layoutTable">
<tr nodeIndex="7">
<td colspan="9" nodeIndex="6">
<table width="780" height="40" border="0" cellpadding="0" cellspacing="0" nodeIndex="11" class="ril_layoutTable">
<tr nodeIndex="10">
<td width="500" nodeIndex="9">&nbsp;</td>
<td width="135" nodeIndex="12">
<a href="/login.html" nodeIndex="80"></a>
<a href="/login.html" nodeIndex="81"></a>
</td>
<td width="135" nodeIndex="13">&nbsp;</td>
<td nodeIndex="14">&nbsp;</td>
</tr>
</table>
</td>
</tr>
<tr nodeIndex="16">
<td nodeIndex="15"></td>
<td nodeIndex="17" childIsOnlyALink="1">
<a href="/index.html" nodeIndex="84"></a>
</td>
<td nodeIndex="18" childIsOnlyALink="1">
<a href="/history.html" nodeIndex="86"></a>
</td>
<td nodeIndex="19" childIsOnlyALink="1">
<a href="/faq.html" nodeIndex="88"></a>
</td>
<td nodeIndex="20" childIsOnlyALink="1">
<a href="/prep.html" nodeIndex="90"></a>
</td>
<td nodeIndex="21"></td>
<td nodeIndex="22" childIsOnlyALink="1">
<a href="/exercises.html" nodeIndex="93"></a>
</td>
<td nodeIndex="23" childIsOnlyALink="1">
<a href="/faq.html?contact=true" nodeIndex="95"></a>
</td>
<td nodeIndex="24"></td>
</tr>
<tr nodeIndex="26">
<td colspan="9" nodeIndex="25"></td>
</tr>
</table><!-- End ImageReady Slices -->
</td>
<td width="1" rowspan="3" nodeIndex="27"></td>
<td width="50%" rowspan="3" nodeIndex="28">&nbsp;</td>
</tr>
<tr nodeIndex="30">
<td height="100%" valign="top" nodeIndex="29">
<table width="780" border="0" cellpadding="0" cellspacing="0" nodeIndex="33" class="ril_layoutTable">
<tr nodeIndex="32">
<td width="534" valign="top" nodeIndex="31">
<table width="534" border="0" cellpadding="0" cellspacing="0" nodeIndex="36" class="ril_layoutTable">
<tr nodeIndex="35">
<td width="534" valign="top" class="bgdown" nodeIndex="34">
<table cellspacing="0" cellpadding="0" nodeIndex="39" class="ril_layoutTable">
<tr nodeIndex="38">
<td valign="top" width="508" nodeIndex="37">
<!--Begin Content-->
<h2 nodeIndex="40">Welcome to IQTest.com, home of the original online IQ test.</h2>
<p nodeIndex="41" childIsOnlyALink="1">
<a href="/prep.html" nodeIndex="100">Click here</a> to take our free, private, and fun IQ test.</p>
<p nodeIndex="42">
Our original IQ test is the most scientifically valid IQ test available on
the web today. Previously offered only to corporations, schools, and in certified professional applications, it is now available to you. In addition to measuring your general IQ, our exclusive test assesses your performance in 13 different areas of intelligence, revealing your key cognizant
strengths and weaknesses.</p>
<p nodeIndex="43">
Developed by PhDs and statistically sound, our test reflects the best research available.<br nodeIndex="101">
<a href="/prep.html" nodeIndex="102">Click here to begin</a>
<br nodeIndex="103">
<br nodeIndex="104">
</p>
<h2 nodeIndex="44">
<a href="/prep.html" nodeIndex="105">IQTest.com<br nodeIndex="106">
Take the Test</a>
</h2>
<br nodeIndex="107">
<h2 nodeIndex="45">
<strong nodeIndex="108">What is an IQ?
</strong>
</h2>
<p nodeIndex="46">An Intelligence Quotient indicates a person's mental abilities relative to others of approximately the same age. Everyone has hundreds of specific mental
abilities--some can be measured accurately and are reliable predictors of academic and financial success.</p>
<p nodeIndex="47">Read more about <a href="whatisaniqscore.html" nodeIndex="109">Intelligence Testing</a></p>
<!-- End of StatCounter Code -->
<!--End Content-->
<br nodeIndex="113">
<p nodeIndex="48"></p>
</td>
</tr>
</table><!-- </div> -->
</td>
</tr>
<tr nodeIndex="50">
<td nodeIndex="49"></td>
</tr>
</table>
</td>
<!--Begin Sidebar-->
<td height="100%" nodeIndex="51">&nbsp;</td>
<td width="225" valign="top" nodeIndex="52">
<table class="ril_layoutTable" width="225" border="0" cellpadding="0" cellspacing="0" nodeIndex="55">
<tr nodeIndex="54">
<td nodeIndex="53"></td>
</tr>
<tr nodeIndex="57">
<td width="225" valign="top" nodeIndex="56">
<h4 nodeIndex="118">What does my score mean?</h4>
<p nodeIndex="58">Please <a href="whatisaniqscore.html" nodeIndex="119">click here</a> for an explanation of IQ testing and standard deviation.<br nodeIndex="120">
Please <a href="faq.html#chart" nodeIndex="121">click here</a> for a test score comparison chart.<br nodeIndex="122">
Please <a href="history.html" nodeIndex="123">click here</a> for a history of intelligence testing.</p>
<div align="center" margin="0" nodeIndex="59">
</div>
</td>
</tr>
<tr nodeIndex="61">
<td nodeIndex="60"></td>
</tr>
<tr nodeIndex="63">
<td width="225" valign="top" nodeIndex="62">
<h4 nodeIndex="127">What is the Complete Personal Intelligence Profile?</h4>
<p nodeIndex="64">Your Complete Personal Intelligence Profile will give you much greater detail about the range and variety of your mental abilities. <a href="profileexplain.html" nodeIndex="128">Read More...</a></p>
</td>
</tr>
<tr nodeIndex="66">
<td nodeIndex="65"></td>
</tr>
<tr nodeIndex="68">
<td width="225" valign="top" nodeIndex="67">
<h4 nodeIndex="130">Consciousness Exercises</h4>
<p nodeIndex="69">The Consciousness Exercises are a set of entertaining psycho-spiritual games, puzzles, dialogs, and more, which can expand your awareness. <a href="exercises.html" nodeIndex="131">Read More...</a></p>
</td>
</tr>
<tr nodeIndex="71">
<td nodeIndex="70"></td>
</tr>
</table>
</td>
<!--End Sidebar-->
</tr>
</table>
</td>
</tr>
<tr nodeIndex="73">
<td nodeIndex="72">
<table width="780" border="0" cellpadding="0" cellspacing="0" nodeIndex="76" class="ril_layoutTable">
<tr nodeIndex="75">
<td width="780" height="33" align="center" nodeIndex="74">
<a href="/index.html" nodeIndex="133">Home</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<a href="/history.html" nodeIndex="134">History</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<a href="/faq.html" nodeIndex="135">FAQ</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<a href="/prep.html" nodeIndex="136">Test</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<a href="/exercises.html" nodeIndex="137">Consciousness Exercises</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<a href="/faq.html?contact=true" nodeIndex="138">Contact Us</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<a href="/privacy.html" nodeIndex="139">Privacy Policy</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<a href="/remove.html" nodeIndex="140">Unsubscribe</a>
</td>
</tr>
<tr nodeIndex="78">
<td width="780" height="34" align="center" nodeIndex="77">&copy; 2003 -2011 Autumn Group. All rights reserved</td>
</tr>
</table>
</td>
</tr>
Comments:
1. Which version of Solr are you using? Does your document parse with the latest version of Apache Tika when run standalone?
2. Using Solr 3.4.0. Not sure whether standalone Tika parses it…
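
The question mentions that blanking out the failing documents loses their content. One possible stopgap, sketched here only as an assumption and not as a fix for the Tika NullPointerException itself, is to pull the text out of the stored HTML before the import so that some content survives when Tika cannot parse a page. The sketch uses only Python's standard `html.parser`; the names `TextExtractor` and `extract_text` are hypothetical and do not come from Solr or Tika.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes; tags, attributes, and comments are dropped.

    Note: text inside <script> or <style> elements is also reported as
    data by HTMLParser, so it would be kept by this simple sketch.
    """

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()  # also strips non-breaking spaces from &nbsp;
        if text:
            self.chunks.append(text)


def extract_text(html):
    """Return the text nodes of an HTML fragment joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join(parser.chunks)


if __name__ == "__main__":
    sample = (
        '<h2 nodeIndex="40">Welcome to IQTest.com</h2>'
        '<!--Begin Content-->'
        '<p><a href="/prep.html" nodeIndex="100">Click here</a> to begin'
        '<br nodeIndex="101"></p>'
    )
    print(extract_text(sample))
```

The extracted text could then be indexed as a plain string field for the pages that fail, while the underlying parser error is investigated separately, for example by checking the same page against a standalone Tika as the first comment suggests.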