Full index of CUBRID Database using Solr DataImportHandler

This tutorial illustrates an example where sample data is batch indexed from a CUBRID database and POSTed into Solr using DataImportHandler. This is often called a pull approach.

At this point we assume that you have already completed previous steps and have the sample data in your database.

In this tutorial we will create a simple single core server which will be located in /home/solr/apache-solr-4.0.0/example/cubrid-solr-example directory.

If you do not want to copy/paste all of the following code, you can download the archive and extract it under the example/ directory. It contains all files from this tutorial. Make sure that the lib folder contains the JDBC driver corresponding to your CUBRID server version.

First, we need to log in as the solr user we created before and navigate to the example/ directory.

su - solr
cd apache-solr-4.0.0/example/

Now let's create the home directory structure for our example.

mkdir cubrid-solr-example
cd cubrid-solr-example
mkdir conf lib

This will create the following directory structure.

cubrid-solr-example/
├── conf/
└── lib/

Assuming that you are in the cubrid-solr-example home directory, create the main configuration file for Solr.

touch solr.xml

Save the following into this file. This will instruct Solr to create a single core server.

<?xml version="1.0" encoding="UTF-8" ?>

<solr persistent="false">

  <cores adminPath="/admin/cores" defaultCoreName="cubrid-example-core">
    <core name="cubrid-example-core" instanceDir="." />
  </cores>

</solr>

Now create a schema file which reflects the database schema of our sample database.

cd conf/
touch schema.xml

Save the following contents into this file. It instructs Solr to create a schema for a users table which has 3 columns (id, email, join_date).

<?xml version="1.0" encoding="UTF-8" ?>

<schema name="tbl_users" version="1.0">

  <types>
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  </types>

  <fields>
    <field name="id" type="tint" indexed="true" stored="true" multiValued="false" required="true" />
    <field name="email" type="string" indexed="true" stored="true" multiValued="false" required="true" />
    <field name="join_date" type="tint" indexed="true" stored="true" multiValued="false" required="true" />
  </fields>

  <uniqueKey>id</uniqueKey>

  <solrQueryParser defaultOperator="OR"/>

</schema>


Now create a configuration file for this cubrid-example-core in the same directory.


touch solrconfig.xml


Add the following to this file. Most of the settings you see below are defaults, but we will dwell on DataImportHandler and explain it in a bit more detail.

DataImportHandler is used to perform batch indexing of an SQL database table. It is not a native Solr library, so it is not loaded by default; we need to instruct Solr to load it via the <lib ...> tag. The handler ships with the Solr package and is located in the dist/ directory in the root of the package. The path should therefore be relative to the home directory of your example (in our case cubrid-solr-example/), not to this solrconfig.xml file.

Further below we create a separate requestHandler for our DataImportHandler which will respond when you navigate to the http://localhost:8983/solr/dataimport URL. We tell Solr that the configurations for DataImportHandler are located in the /home/solr/apache-solr-4.0.0/example/cubrid-solr-example/conf/data-config.xml file.

<?xml version="1.0" encoding="UTF-8" ?>

<config>
  <luceneMatchVersion>LUCENE_40</luceneMatchVersion>

  <lib dir="../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />

  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.StandardDirectoryFactory}" />

  <updateHandler class="solr.DirectUpdateHandler2" />

  <requestDispatcher handleSelect="true" >
    <requestParsers enableRemoteStreaming="false" />
  </requestDispatcher>

  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />

  <requestHandler name="/update"
                  class="solr.UpdateRequestHandler"
                  startup="lazy" />

  <requestHandler name="/admin/"
                  class="solr.admin.AdminHandlers" />

  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
    <lst name="invariants">
      <str name="qt">search</str>
      <str name="q">solrpingquery</str>
    </lst>
    <lst name="defaults">
      <str name="echoParams">all</str>
    </lst>
  </requestHandler>

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </requestHandler>
</config>



For DataImportHandler to connect to and query a database, it requires the JDBC driver of that database server. CUBRID provides a native JDBC driver. If you have already installed CUBRID on your system, you can find the driver in the jdbc directory of your CUBRID installation. In case of Ubuntu apt-get installation, it is located in /opt/cubrid/jdbc.

If you have not installed CUBRID in your system and plan to connect to a remote CUBRID server, you can download the JDBC driver from the CUBRID downloads page.

Important: you need to download the right JDBC driver, i.e. if the version of your remote CUBRID server is 8.4.1, then download JDBC driver for 8.4.1.

Place the CUBRID JDBC driver into the lib directory of cubrid-solr-example. Solr automatically loads any library located in this directory.
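Assuming a default Ubuntu installation and a driver file named cubrid_jdbc.jar (both the path and the jar name may differ on your system), the copy step could look like this:

```shell
# Hypothetical paths: adjust CUBRID_JDBC to where your driver actually lives
# and SOLR_EXAMPLE to your example home directory.
CUBRID_JDBC=/opt/cubrid/jdbc/cubrid_jdbc.jar
SOLR_EXAMPLE=/home/solr/apache-solr-4.0.0/example/cubrid-solr-example

cp "$CUBRID_JDBC" "$SOLR_EXAMPLE/lib/"
ls -l "$SOLR_EXAMPLE/lib/"   # the driver jar should now be listed here
```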

Now we need to create the data-config.xml file for DataImportHandler, which specifies the authentication information, the exact SELECT query to execute to retrieve the data from the database, etc.

touch data-config.xml

In this file we tell DataImportHandler that:

  • it should use CUBRID's JDBC driver;
  • in the connection URL we indicate that the CUBRID server runs on localhost, the broker port is 33000 (the default), and the database name is sample_db;
  • the user name is dba;
  • no password is needed.

Also we indicate:

  • the exact query which should be executed to obtain the data from our database;
  • the names of the fields we created before in the Solr schema and the corresponding column names of the database table.

Save the following contents into this file.

<dataConfig>
    <dataSource type="JdbcDataSource" driver="cubrid.jdbc.driver.CUBRIDDriver" url="jdbc:cubrid:localhost:33000:sample_db" user="dba" password=""/>
    <document name="users">
        <entity name="user" query="select * from tbl_users">
            <field column="ID" name="id" />
            <field column="EMAIL" name="email" />
            <field column="JOIN_DATE" name="join_date" />
        </entity>
    </document>
</dataConfig>
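Before wiring the query into Solr, you may want to confirm that it returns the expected rows. A quick check with CUBRID's csql shell (assuming the database is running and the default dba user with an empty password):

```shell
# Run the same SELECT that DataImportHandler will execute,
# limited to a few rows for a quick sanity check.
csql -u dba sample_db -c "SELECT id, email, join_date FROM tbl_users LIMIT 5;"
```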

Finally, you should have the following file structure.

cubrid-solr-example/
├── solr.xml
├── conf/
│   ├── schema.xml
│   ├── solrconfig.xml
│   └── data-config.xml
└── lib/
    └── (CUBRID JDBC driver jar)

To confirm that all configurations are correct, we can now start our Solr instance.

cd /home/solr/apache-solr-4.0.0/example
java -Dsolr.solr.home="./cubrid-solr-example/" -jar start.jar

This should start Solr on port 8983. Navigate to http://localhost:8983/solr/dataimport. You should see an XML output indicating that the configurations are correct. If your Solr instance does not start, write to our CUBRID Forum; we will be glad to help you out.
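You can also verify the instance from the command line through the ping handler configured above (assumes Solr is running on the default port 8983):

```shell
# Ask the /admin/ping handler for the core's status;
# a healthy core answers with <str name="status">OK</str> in the XML response.
curl "http://localhost:8983/solr/admin/ping"
```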

To import your data from sample_db, first, make sure your database is running.

cubrid server start sample_db

Then navigate to http://localhost:8983/solr/dataimport?command=full-import to perform full import. If you see an XML output, Solr may have successfully imported all your data from CUBRID database. To confirm this, you can search for data in Solr admin panel at http://localhost:8983/solr/admin/. In the Query String field enter a string which exists in your database. Solr should display the related data if such entry exists.
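The import and a follow-up search can also be done from the command line. The query below uses a wildcard on the email field purely for illustration; substitute a value that exists in your data:

```shell
# Kick off the batch import (the "pull" step).
curl "http://localhost:8983/solr/dataimport?command=full-import"

# Check progress; the response reports how many rows were fetched and indexed.
curl "http://localhost:8983/solr/dataimport?command=status"

# Search the index through the standard request handler.
curl "http://localhost:8983/solr/select?q=email:*"
```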

At this point you have learnt how to perform batch indexing of a CUBRID database using DataImportHandler. For more examples, see Using Solr / Lucene for full text search with CUBRID Database on Ubuntu.
