Important announcements from CouchCMS team
63 posts Page 1 of 7
Previous 1, 2, 3, 4, 5 ... 7 Next
Hi everybody,

A CSV importer to populate Couch pages using existing data (e.g. products etc.) has been a long standing request (http://www.couchcms.com/forum/viewtopic.php?f=3&t=7065).

I started working on it sometime back but had to leave it unfinished and move on to other pressing things.
As happens often, @cheesypoof goaded me back into action by creating one himself and I had to go back and complete my incomplete work taking inspiration from his effort :)

Creating a csv importer will require making use of an addon named 'csv' that is attached herewith.
install.zip
(15.9 KiB) Downloaded 3906 times
https://github.com/CouchCMS/CSV


IMP: This addon will work with Couch v1.4.5 (RC1 at the time of this post and downloadable from http://www.couchcms.com/forum/viewtopic.php?f=5&t=8581)

Extracting the zip attached above will yield a folder named 'install'.

Please copy the 'csv' folder (found below 'install/couch/addons/') and place it within the 'couch/addons' folder of your Couch installation and add the following line to 'couch/addons/kfunctions.php' of your installation (you might have to rename kfunctions.example.php to kfunctions.php for newer installations)

Code: Select all
require_once( K_COUCH_DIR.'addons/csv/csv.php' );

Our importer will also make use of DataBound Forms module so please also uncomment (i.e. remove the leading '//' double-slash) from the following line in kfunctions.php.
Code: Select all
require_once( K_COUCH_DIR.'addons/data-bound-form/data-bound-form.php' );

The 'csv' addon exposes a tag named cms:csv_reader that loops through any csv file specified and makes available each row as variables.

Following is an example showing all the parameters the tag supports. Only the 'file' parameter is mandatory - rest are shown for illustration purpose with their default values. For our examples we'll use a csv file named 'cars.csv' placed within the 'csv' addon folder.
Code: Select all
<cms:csv_reader
        file="<cms:show k_admin_path />addons/csv/cars.csv"
        has_header='1'
        use_cache='0'
        delimiter=','
        enclosure='"'
        prefix=''
       
        count_only='0'
        startcount='1'
        limit='0'
        offset='0'
        paginate='0'
    >
   
    <cms:dump />
   
</cms:csv_reader>

I am sure you'll recognize all the parameters in the second group (cms:pages tag uses the same parameters).
The parameters in the first group will require some explanation -

1. file
absolute path to the csv file.
Can use <cms:show k_admin_path /> if file is placed somewhere within the 'couch' folder or <cms:show k_site_path /> if it is within the site's root.

2. has_header
Set this to '0' if the first row of the csv file does not represent the column names.
Default is '1'.

3. use_cache
If the csv file being imported is very large, set this parameter to '1'.
With this parameter set, this tag caches an index of the file on the first run (i.e while showing the first 'page' of the paginated set
). All subsequent pages will not read the complete file and so the processing will be several magnitudes faster than when caching is not used.
Default value is '0'.

4. delimiter
The single character separating the columns in each row.
Can use '\r\n', '\r' or '\n' if separator is a newline character and '\t' if it is a tab.
Default is a comma (which gives CSV its name).

5. enclosure
The single character used to enclose each column if the value contains newlines, comma or double-quotes.
Default is a double-quote.

6. prefix
By default this tag makes available values contained in each column of the row being iterated as variable named after the column-name as specified in the header (i.e. the very first row of the file). For example, if a column is named 'car_model' in the header row, its value will be vailable as a variable of the same name i.e. 'car_model'.

If the header is missing (i.e. 'has_header' parameter is '0'), the variables are named 'col_1', 'col_2' etc.

Any value specified in the 'prefix' parameter will be appended to these names. For example if 'prefix' is set to '_', the variables will become '_car_model' or '_col_1'. This can be used to disambiguate the variables when using this tag in a context where variables of the same name might exist already (e.g. in the page-view of a template).

In addition to the cms:csv_reader tag explained above, two auxiliary tags are also made available by this module - cms:csv_headers and cms:csv_columns. These are helper tags and are supposed to be used only enclosed within the cms:csv_reader tag. For example -
Code: Select all
<cms:csv_reader 
    file="<cms:show k_admin_path />addons/csv/cars.csv"
    paginate='1'
    limit='10'
>
   
    <cms:if k_paginated_top >
        <cms:csv_headers>
            <cms:dump />
        </cms:csv_headers>
    </cms:if>

    <cms:csv_columns>
        <cms:dump />
    </cms:csv_columns>
   
    <cms:paginator />
   
</cms:csv_reader>

The two auxiliary tags can be helpful in outputting the csv rows as HTML table. For example as follows -
Code: Select all
<cms:csv_reader 
    file="<cms:show k_admin_path />addons/csv/cars.csv"
    paginate='1'
    limit='10'
>
    <cms:if k_paginated_top >
        <table border='0'>
            <thead>
                <tr>
                    <th>No.</th>
                    <cms:csv_headers>
                        <th><cms:show value /></th>
                    </cms:csv_headers>   
                </tr>
            </thead>
            <tbody>
    </cms:if>
   
    <tr>
        <td><cms:show k_current_record /></td>
        <cms:csv_columns>
            <td><cms:show value /></td>
        </cms:csv_columns>   
    </tr>
       
    <cms:if k_paginated_bottom >   
                <tr>
                    <td></td>
                    <td colspan='<cms:show k_csv_header_count />'>
                        <cms:paginator simple='1' />
                    </td>
                </tr>
            </tbody>
        </table>   
    </cms:if>
   
</cms:csv_reader>

Building a CSV Importer:

Couch already has cms:db_persist tag ('data-bound-form' module) that can be used to create new cloned pages.
If we can loop through the rows in the csv file to be imported and feed the values into cms:db_persist, our job is done.

However, those who have worked before with importers (csv, SQL or even XML) will know that this seemingly simple task is fraught with two very sticky problems if the number of records to be imported is large (i.e. the csv file is big in size) -

1. Looping through all the records in a single go is likely to result in either running out of memory or the script timing out - both case resulting in the script getting terminated before the import process is complete.
2. Reading in a large csv file can also result in running out of memory.

One solution to avoid both the aforementioned problems is by working with only a subset of the total records at one time.

If you are familiar with cms:pages tag, you'll know that it can work with potentially millions of pages through 'pagination' i.e. where we divide the recordset into 'pages' where each page contains only a limited subset of the total records. At the bottom we show a 'paginator' to navigate through the 'pages' and thus iterate through all the records by moving from one page to the next.

This cms:csv_reader tag works almost exactly like cms:pages tag and also supports pagination (all the pagination variables that get set while using cms:pages are also available with this tag).

We can use this property to create a 'staggered' importer - the idea is that we use pagination and set a limit (of say 100) on the number of rows fetched on each page. This will divide the csv file in a set of 'pages' and we can then visit the pages one by one thus importing only a small number of records in a single page execution.

This coupled with the 'use_cache' parameter mentioned above, should make it possible to work with very large csv files containing hundreds of thousands of records without risking running out of memory or script timeouts.

So let's get it done.
In the attached files you'll find a template named 'cars.php' and a csv file named 'cars.csv'.
Place cars.php in your site's root and register it (by accessing it as super-admin).
Place cars.csv within the 'csv' addons folder.
We'll be importing the rows in cars.csv into the cars.php template.

To do so, create a new template (say named 'import.php') place the two mandatory lines of PHP within it and register it by accesing it as super-admin.
Code: Select all
<?php require_once( 'couch/cms.php' ); ?>
    <cms:template title='Import' ></cms:template>

<?php COUCH::invoke(); ?>

Within this template we'll place a slightly modified version of the code we have already seen above where we displayed a paginated listing of all rows in the csv file in a tabular format -
Code: Select all
<?php require_once( 'couch/cms.php' ); ?>
    <cms:template title='Import' ></cms:template>

    <cms:set mystart="<cms:gpc 'import' method='get' />" />
   
    <cms:if mystart >
        <cms:csv_reader
            file="<cms:show k_admin_path />addons/csv/cars.csv"
            paginate='1'
            limit='100'
            prefix='_'
            use_cache='0'
        >
            <cms:if k_paginated_top >
           
                <cms:if k_paginate_link_next >
                    <script language="JavaScript" type="text/javascript">
                        var myVar;
                        myVar = window.setTimeout( 'location.href="<cms:show k_paginate_link_next />";', 1000 );
                    </script>
                    <button onclick="clearTimeout(myVar);">Stop</button>
                <cms:else />
                    Done!   
                </cms:if>
               
                <h3><cms:show k_current_page /> / <cms:show k_total_pages /> pages (Total <cms:show k_total_records /> records. Showing <cms:show k_paginate_limit /> records per page)</hr>
               
               
                <table border='0'>
                    <thead>
                        <tr>
                            <th>No.</th>
                            <cms:csv_headers>
                                <th><cms:show value /></th>
                            </cms:csv_headers>   
                        </tr>
                    </thead>
                    <tbody>
            </cms:if>
           
            <tr>
                <td><cms:show k_current_record /></td>
                <cms:csv_columns>
                    <td><cms:show value /></td>
                </cms:csv_columns>   
               
                <!-- database operation here -->
                <cms:ignore>
                    <cms:db_persist
                        _auto_title       = '0'
                        _invalidate_cache = '0'
                        _masterpage       = 'cars.php'
                        _mode             = 'create'

                        k_page_title      = "<cms:show _car_make/> <cms:show _car_model/>"

                        car_make          = _car_make
                        car_model         = _car_model
                        car_type          = _car_type
                        car_origin        = _car_origin
                        car_drivetrain    = _car_drivetrain
                        car_msrp          = _car_msrp
                        car_invoice       = _car_invoice
                        car_engine_size   = _car_engine_size
                        car_cylinders     = _car_cylinders
                        car_horsepower    = _car_horsepower
                        car_mpg_city      = _car_mpg_city
                        car_mpg_highway   = _car_mpg_highway
                        car_weight        = _car_weight
                        car_wheelbase     = _car_wheelbase
                        car_length        = _car_length
                    >
                        <cms:if k_error>
                            <strong style="color:red;">ERROR:</strong> <cms:show k_error/>
                        </cms:if>
                    </cms:db_persist>
                </cms:ignore>
                <!-- end database operation -->
               
            </tr>
               
            <cms:if k_paginated_bottom >   
                        <tr>
                            <td></td>
                            <td colspan='<cms:show k_csv_header_count />'>
                                <cms:paginator simple='1' />
                            </td>
                        </tr>
                    </tbody>
                </table>   
            </cms:if>
           
        </cms:csv_reader>   
    <cms:else/>
        <button onclick='location.href="<cms:add_querystring k_page_link 'import=1' />"'>Start!</button>
    </cms:if>


<?php COUCH::invoke(); ?>

The notable addition to the original table code is the cms:db_persist tag which creates new cloned pages using the data made available by the cms:csv_reader tag. In the code above the cms:db_persist block is commented out using cms:ignore tags to stop it from executing. Once you have tested that everything works as expected, remove the cms:ignore tags to begin the import process. If the csv file you are using is large, make sure to set 'use_cache' to '1' too.

Another (admittedly very crude) addition is the use of JS to automatically navigate through the paginated pages (instead of manually clicking on the paginator buttons). The point is to show how to couple cms:csv_reader tag with cms:db_persist tag and create a 'staggered' importer. I an sure you can create an importer that is more aesthetical (and maybe uses AJAX etc.).

Once the import process is complete, this template can be deleted.

In closing, please note that the cms:csv_reader tag is concerned with only making available the data contained in the specified csv file. What gets done with that data is completely up to you. It is only 'incidental' that we are using cms:db_persist with the data to create cloned pages thus effectively creating an importer.

This also translates to that we can easily extend the functionality of the importer we created by adding our own Couch code. For example, instead of creating new pages we can make it modify existing pages that have data different from what is contained in the csv file or delete pages not found in the csv file etc.

Hope this helps.
Do let us know your thoughts on this addon.

Addendum:
If you are looking for a CSV Exporter, please see
viewtopic.php?f=8&t=11378
Wow - a Christmas present to us all! Thank you. When needed this will prove invaluable.
I am glad that this functionality has been exposed through Couch tags in a very generic manner, consistent with cms:pages usage. I think this will be a big boost for those who work on projects with large existing data sets and/or have requirements for synchronizing data between other systems and Couch.

For those needing to import CSV data on more than the one-off occasion, a formal interface could be provided to admin users. Additionally, as @KK mentioned, this can be extended with JavaScript (specifically AJAX methods) to provide a better user experience through the previewing of records and display of a progress bar to indicate the status of an importation.

Great work @KK ;)
Hi There

I just implemented the plugin, which is very great.

But unfortunately it does not import rows where special characters are used, e.g. ä ö ü, which are common chars in Switzerland.

Does anyone have an idea how to fix this?

Kind regards,
Oliver
Hi Oliver,

Could you please PM me a sample CSV that is displaying the problem?
If you could additionally share with me the importer template, it'd be great.
Hi

The importer script ist as follows (filename: newsletter_import.php):
Code: Select all
<?php require_once( 'admin/cms.php' ); ?>
<cms:template title='Import (Newsletter)' order='500' executable='0' clonable='0'>
   <cms:editable order='1' type='file' name='file' label='Link zur Liste' required='0' desc='Format:csv-file mit Semikolon als Trennzeichen, Erste Zeile Spaltentitel wie folgt: firma|anrede|vorname|nachname|mail|telefon|strasse|plz_ort|webseite. File hochladen, auswählen und Doppelklick. Danach im Textfeld alles links von media entfernen'></cms:editable>
</cms:template>

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
   <head>
      <meta http-equiv="content-type" content="text/html; charset=utf-8">
      <title>CSV Import</title>
   </head>
   <body>
   <cms:if file>
      <cms:csv_reader
         file="<cms:show file />"
         paginate='1'
         delimiter=';'
         enclosure=''
         limit='10'
      >
         <cms:if k_paginated_top >
            <table border='0'>
               <thead>
                  <tr>
                     <th>Nr.</th>
                     <cms:csv_headers>
                        <th><cms:show value /></th>
                     </cms:csv_headers>   
                  </tr>
               </thead>
               <tbody>
         </cms:if>
         
                  <tr>
                     <td><cms:show k_current_record /></td>
                     <cms:csv_columns>
                        <td><cms:show value /></td>
                     </cms:csv_columns>   
                  </tr>

                  <cms:db_persist
                     _auto_title       = '1'
                     _invalidate_cache = '0'
                     _masterpage       = 'newsletter_liste.php'
                     _mode             = 'create'
                     firma             = firma
                     anrede            = anrede
                     vorname           = vorname
                     nachname          = nachname
                     mail              = mail
                     telefon           = telefon
                  >
                     <cms:if k_error>
                        <strong style="color:red;">ERROR:</strong> <cms:show k_error/>
                     </cms:if>
                  </cms:db_persist>
                  <cms:ignore>
                  </cms:ignore>

         <cms:if k_paginated_bottom >   
                  <tr>
                     <td></td>
                     <td colspan='<cms:show k_csv_header_count />'>
                        <cms:paginator simple='1' />
                     </td>
                  </tr>
               </tbody>
            </table>   
         </cms:if>
         
      </cms:csv_reader>
   </cms:if>
   </body>
</html>

<?php COUCH::invoke(); ?>


The template that holds the list data is (filename: newsletter_liste.php):

Code: Select all
<?php require_once( 'admin/cms.php' ); ?>
<cms:template title='Liste (Newsletter)' order='920' executable='0' clonable='1'>

   <cms:editable order='1'  type='text' name='status_gesendet' label='Gesendet' required='0' />
   <cms:editable order='2'  type='text' name='status_gelesen' label='Mail Gelesen' required='0' />
   <cms:editable order='3'  type='text' name='status_webseite' label='Webseite geöffnet' required='0' />
   <cms:editable order='4'  type='text' name='status_file' label='File geöffnet' required='0' />

   <cms:editable order='11'  type='text' name='firma' label='Firma' required='0' />
   <cms:editable order='12'  type='dropdown' name='anrede' label='Anrede' required='0' opt_values='... | Herr | Frau' />
   <cms:editable order='13'  type='text' name='vorname' label='Vorname' required='0' />
   <cms:editable order='14'  type='text' name='nachname' label='Nachname' required='0' />
   <cms:editable order='15'  type='text' name='mail' label='E-Mail' required='0' />
   <cms:editable order='16'  type='text' name='telefon' label='Telefon' required='0' />
   <cms:editable order='17' type='textarea' name='bemerkungen' label='Bemerkungen' required='0' />
   
</cms:template>
<?php COUCH::invoke(); ?>


An example CSV which causes problems:
Code: Select all
firma;anrede;vorname;nachname;mail;telefon;strasse;plz_ort;webseite
Max Muster u. Söhne GmbH;Herr;Fritz;Schulz;info@mustermann.de;011 89 87 12-3;Musterplatz 11;90000 Musterstadt;
Käsemann + Partner GmbH;Frau  ;Stefanie ;Muster-Käsemann;info@kastermann.de;011 123 46 0 - 0;Bahnhostrasse 11;90001 Muserkland;


Fields with Umlaute like ä,ö,ü are not imported.
Thanks.

Sorry to bother you further but could you please PM me the physical CSV file?
It is important because, possibly, there is a encoding mismatch at work and I need to study that with the real file.
Sorry

Didn't first find the attach-option :-)

Attachments

Thanks. I'll test and get back.
This is an amazing piece of software. It is working extremely solid :) First test load of 1000 rows it processed in no time :D

One small trick. It is pretty convenient to create an editable in backend to upload csv files. Csv reader then takes the file and does the job. The small non-comfort thing is that it needs relative path, not link. Let's fix it.

Code: Select all
<cms:set my_csv_file = "
                        <cms:php>
                            $old_path   = '<cms:show csv_file />';
                            $site_link  = '<cms:show k_site_link />';
                            $site_path  = '<cms:show k_site_path />';
                            $new_path = str_replace($site_link, $site_path, $old_path);
                            echo($new_path);
                        </cms:php>
                        "
                        scope='global' />

<cms:if "<cms:not_empty my_csv_file />" ><cms:else /><cms:abort "Let's upload file in CMS." /></cms:if>

<cms:if mystart >
    <cms:csv_reader file="<cms:show my_csv_file />"
.........


This relatively easy to understand code takes link from editable csv_file and creates path which is saved in my_csv_file. Probably there are better and faster ways of doing this, but nothing is cooler than a piece of working and tested code.
Speaking of code, below is definition of editables, just in case.
Code: Select all
 <cms:template title='CSV Operations' access_level='10' executable='1' order='2050'>

    <cms:editable name='csv_file' label='CSV File' desc='Upload file here. No spaces in filename!' type='file' order='10' />
   
    <cms:editable name='delay' label='Pause' desc='X seconds delay between loads' type='text' validator='non_negative_integer | non_zero_integer' order='20' >5</cms:editable>
   
    <cms:editable name='limit' label='Limit' desc='XX rows per load' type='text' validator='non_negative_integer | non_zero_integer'  order='30' >10</cms:editable>
   
  </cms:template>
Previous 1, 2, 3, 4, 5 ... 7 Next
63 posts Page 1 of 7