Hi everybody,
A CSV importer to populate Couch pages using existing data (e.g. products etc.) has been a long standing request (http://www.couchcms.com/forum/viewtopic.php?f=3&t=7065).
I started working on it sometime back but had to leave it unfinished and move on to other pressing things.
As happens often, @cheesypoof goaded me back into action by creating one himself and I had to go back and complete my incomplete work taking inspiration from his effort
Creating a csv importer will require making use of an addon named 'csv' that is attached herewith.
https://github.com/CouchCMS/CSV
IMP: This addon will work with Couch v1.4.5 (RC1 at the time of this post and downloadable from http://www.couchcms.com/forum/viewtopic.php?f=5&t=8581)
Extracting the zip attached above will yield a folder named 'install'.
Please copy the 'csv' folder (found below 'install/couch/addons/') and place it within the 'couch/addons' folder of your Couch installation and add the following line to 'couch/addons/kfunctions.php' of your installation (you might have to rename kfunctions.example.php to kfunctions.php for newer installations)
Our importer will also make use of DataBound Forms module so please also uncomment (i.e. remove the leading '//' double-slash) from the following line in kfunctions.php.
The 'csv' addon exposes a tag named cms:csv_reader that loops through any csv file specified and makes available each row as variables.
Following is an example showing all the parameters the tag supports. Only the 'file' parameter is mandatory - rest are shown for illustration purpose with their default values. For our examples we'll use a csv file named 'cars.csv' placed within the 'csv' addon folder.
I am sure you'll recognize all the parameters in the second group (cms:pages tag uses the same parameters).
The parameters in the first group will require some explanation -
1. file
absolute path to the csv file.
Can use <cms:show k_admin_path /> if file is placed somewhere within the 'couch' folder or <cms:show k_site_path /> if it is within the site's root.
2. has_header
Set this to '0' if the first row of the csv file does not represent the column names.
Default is '1'.
3. use_cache
If the csv file being imported is very large, set this parameter to '1'.
With this parameter set, this tag caches an index of the file on the first run (i.e while showing the first 'page' of the paginated set
). All subsequent pages will not read the complete file and so the processing will be several magnitudes faster than when caching is not used.
Default value is '0'.
4. delimiter
The single character separating the columns in each row.
Can use '\r\n', '\r' or '\n' if separator is a newline character and '\t' if it is a tab.
Default is a comma (which gives CSV its name).
5. enclosure
The single character used to enclose each column if the value contains newlines, comma or double-quotes.
Default is a double-quote.
6. prefix
By default this tag makes available values contained in each column of the row being iterated as variable named after the column-name as specified in the header (i.e. the very first row of the file). For example, if a column is named 'car_model' in the header row, its value will be vailable as a variable of the same name i.e. 'car_model'.
If the header is missing (i.e. 'has_header' parameter is '0'), the variables are named 'col_1', 'col_2' etc.
Any value specified in the 'prefix' parameter will be appended to these names. For example if 'prefix' is set to '_', the variables will become '_car_model' or '_col_1'. This can be used to disambiguate the variables when using this tag in a context where variables of the same name might exist already (e.g. in the page-view of a template).
In addition to the cms:csv_reader tag explained above, two auxiliary tags are also made available by this module - cms:csv_headers and cms:csv_columns. These are helper tags and are supposed to be used only enclosed within the cms:csv_reader tag. For example -
The two auxiliary tags can be helpful in outputting the csv rows as HTML table. For example as follows -
Building a CSV Importer:
Couch already has cms:db_persist tag ('data-bound-form' module) that can be used to create new cloned pages.
If we can loop through the rows in the csv file to be imported and feed the values into cms:db_persist, our job is done.
However, those who have worked before with importers (csv, SQL or even XML) will know that this seemingly simple task is fraught with two very sticky problems if the number of records to be imported is large (i.e. the csv file is big in size) -
1. Looping through all the records in a single go is likely to result in either running out of memory or the script timing out - both case resulting in the script getting terminated before the import process is complete.
2. Reading in a large csv file can also result in running out of memory.
One solution to avoid both the aforementioned problems is by working with only a subset of the total records at one time.
If you are familiar with cms:pages tag, you'll know that it can work with potentially millions of pages through 'pagination' i.e. where we divide the recordset into 'pages' where each page contains only a limited subset of the total records. At the bottom we show a 'paginator' to navigate through the 'pages' and thus iterate through all the records by moving from one page to the next.
This cms:csv_reader tag works almost exactly like cms:pages tag and also supports pagination (all the pagination variables that get set while using cms:pages are also available with this tag).
We can use this property to create a 'staggered' importer - the idea is that we use pagination and set a limit (of say 100) on the number of rows fetched on each page. This will divide the csv file in a set of 'pages' and we can then visit the pages one by one thus importing only a small number of records in a single page execution.
This coupled with the 'use_cache' parameter mentioned above, should make it possible to work with very large csv files containing hundreds of thousands of records without risking running out of memory or script timeouts.
So let's get it done.
In the attached files you'll find a template named 'cars.php' and a csv file named 'cars.csv'.
Place cars.php in your site's root and register it (by accessing it as super-admin).
Place cars.csv within the 'csv' addons folder.
We'll be importing the rows in cars.csv into the cars.php template.
To do so, create a new template (say named 'import.php') place the two mandatory lines of PHP within it and register it by accesing it as super-admin.
Within this template we'll place a slightly modified version of the code we have already seen above where we displayed a paginated listing of all rows in the csv file in a tabular format -
The notable addition to the original table code is the cms:db_persist tag which creates new cloned pages using the data made available by the cms:csv_reader tag. In the code above the cms:db_persist block is commented out using cms:ignore tags to stop it from executing. Once you have tested that everything works as expected, remove the cms:ignore tags to begin the import process. If the csv file you are using is large, make sure to set 'use_cache' to '1' too.
Another (admittedly very crude) addition is the use of JS to automatically navigate through the paginated pages (instead of manually clicking on the paginator buttons). The point is to show how to couple cms:csv_reader tag with cms:db_persist tag and create a 'staggered' importer. I an sure you can create an importer that is more aesthetical (and maybe uses AJAX etc.).
Once the import process is complete, this template can be deleted.
In closing, please note that the cms:csv_reader tag is concerned with only making available the data contained in the specified csv file. What gets done with that data is completely up to you. It is only 'incidental' that we are using cms:db_persist with the data to create cloned pages thus effectively creating an importer.
This also translates to that we can easily extend the functionality of the importer we created by adding our own Couch code. For example, instead of creating new pages we can make it modify existing pages that have data different from what is contained in the csv file or delete pages not found in the csv file etc.
Hope this helps.
Do let us know your thoughts on this addon.
Addendum:
If you are looking for a CSV Exporter, please see
viewtopic.php?f=8&t=11378
A CSV importer to populate Couch pages using existing data (e.g. products etc.) has been a long standing request (http://www.couchcms.com/forum/viewtopic.php?f=3&t=7065).
I started working on it sometime back but had to leave it unfinished and move on to other pressing things.
As happens often, @cheesypoof goaded me back into action by creating one himself and I had to go back and complete my incomplete work taking inspiration from his effort
Creating a csv importer will require making use of an addon named 'csv' that is attached herewith.
https://github.com/CouchCMS/CSV
IMP: This addon will work with Couch v1.4.5 (RC1 at the time of this post and downloadable from http://www.couchcms.com/forum/viewtopic.php?f=5&t=8581)
Extracting the zip attached above will yield a folder named 'install'.
Please copy the 'csv' folder (found below 'install/couch/addons/') and place it within the 'couch/addons' folder of your Couch installation and add the following line to 'couch/addons/kfunctions.php' of your installation (you might have to rename kfunctions.example.php to kfunctions.php for newer installations)
- Code: Select all
require_once( K_COUCH_DIR.'addons/csv/csv.php' );
Our importer will also make use of DataBound Forms module so please also uncomment (i.e. remove the leading '//' double-slash) from the following line in kfunctions.php.
- Code: Select all
require_once( K_COUCH_DIR.'addons/data-bound-form/data-bound-form.php' );
The 'csv' addon exposes a tag named cms:csv_reader that loops through any csv file specified and makes available each row as variables.
Following is an example showing all the parameters the tag supports. Only the 'file' parameter is mandatory - rest are shown for illustration purpose with their default values. For our examples we'll use a csv file named 'cars.csv' placed within the 'csv' addon folder.
- Code: Select all
<cms:csv_reader
file="<cms:show k_admin_path />addons/csv/cars.csv"
has_header='1'
use_cache='0'
delimiter=','
enclosure='"'
prefix=''
count_only='0'
startcount='1'
limit='0'
offset='0'
paginate='0'
>
<cms:dump />
</cms:csv_reader>
I am sure you'll recognize all the parameters in the second group (cms:pages tag uses the same parameters).
The parameters in the first group will require some explanation -
1. file
absolute path to the csv file.
Can use <cms:show k_admin_path /> if file is placed somewhere within the 'couch' folder or <cms:show k_site_path /> if it is within the site's root.
2. has_header
Set this to '0' if the first row of the csv file does not represent the column names.
Default is '1'.
3. use_cache
If the csv file being imported is very large, set this parameter to '1'.
With this parameter set, this tag caches an index of the file on the first run (i.e while showing the first 'page' of the paginated set
). All subsequent pages will not read the complete file and so the processing will be several magnitudes faster than when caching is not used.
Default value is '0'.
4. delimiter
The single character separating the columns in each row.
Can use '\r\n', '\r' or '\n' if separator is a newline character and '\t' if it is a tab.
Default is a comma (which gives CSV its name).
5. enclosure
The single character used to enclose each column if the value contains newlines, comma or double-quotes.
Default is a double-quote.
6. prefix
By default this tag makes available values contained in each column of the row being iterated as variable named after the column-name as specified in the header (i.e. the very first row of the file). For example, if a column is named 'car_model' in the header row, its value will be vailable as a variable of the same name i.e. 'car_model'.
If the header is missing (i.e. 'has_header' parameter is '0'), the variables are named 'col_1', 'col_2' etc.
Any value specified in the 'prefix' parameter will be appended to these names. For example if 'prefix' is set to '_', the variables will become '_car_model' or '_col_1'. This can be used to disambiguate the variables when using this tag in a context where variables of the same name might exist already (e.g. in the page-view of a template).
In addition to the cms:csv_reader tag explained above, two auxiliary tags are also made available by this module - cms:csv_headers and cms:csv_columns. These are helper tags and are supposed to be used only enclosed within the cms:csv_reader tag. For example -
- Code: Select all
<cms:csv_reader
file="<cms:show k_admin_path />addons/csv/cars.csv"
paginate='1'
limit='10'
>
<cms:if k_paginated_top >
<cms:csv_headers>
<cms:dump />
</cms:csv_headers>
</cms:if>
<cms:csv_columns>
<cms:dump />
</cms:csv_columns>
<cms:paginator />
</cms:csv_reader>
The two auxiliary tags can be helpful in outputting the csv rows as HTML table. For example as follows -
- Code: Select all
<cms:csv_reader
file="<cms:show k_admin_path />addons/csv/cars.csv"
paginate='1'
limit='10'
>
<cms:if k_paginated_top >
<table border='0'>
<thead>
<tr>
<th>No.</th>
<cms:csv_headers>
<th><cms:show value /></th>
</cms:csv_headers>
</tr>
</thead>
<tbody>
</cms:if>
<tr>
<td><cms:show k_current_record /></td>
<cms:csv_columns>
<td><cms:show value /></td>
</cms:csv_columns>
</tr>
<cms:if k_paginated_bottom >
<tr>
<td></td>
<td colspan='<cms:show k_csv_header_count />'>
<cms:paginator simple='1' />
</td>
</tr>
</tbody>
</table>
</cms:if>
</cms:csv_reader>
Building a CSV Importer:
Couch already has cms:db_persist tag ('data-bound-form' module) that can be used to create new cloned pages.
If we can loop through the rows in the csv file to be imported and feed the values into cms:db_persist, our job is done.
However, those who have worked before with importers (csv, SQL or even XML) will know that this seemingly simple task is fraught with two very sticky problems if the number of records to be imported is large (i.e. the csv file is big in size) -
1. Looping through all the records in a single go is likely to result in either running out of memory or the script timing out - both case resulting in the script getting terminated before the import process is complete.
2. Reading in a large csv file can also result in running out of memory.
One solution to avoid both the aforementioned problems is by working with only a subset of the total records at one time.
If you are familiar with cms:pages tag, you'll know that it can work with potentially millions of pages through 'pagination' i.e. where we divide the recordset into 'pages' where each page contains only a limited subset of the total records. At the bottom we show a 'paginator' to navigate through the 'pages' and thus iterate through all the records by moving from one page to the next.
This cms:csv_reader tag works almost exactly like cms:pages tag and also supports pagination (all the pagination variables that get set while using cms:pages are also available with this tag).
We can use this property to create a 'staggered' importer - the idea is that we use pagination and set a limit (of say 100) on the number of rows fetched on each page. This will divide the csv file in a set of 'pages' and we can then visit the pages one by one thus importing only a small number of records in a single page execution.
This coupled with the 'use_cache' parameter mentioned above, should make it possible to work with very large csv files containing hundreds of thousands of records without risking running out of memory or script timeouts.
So let's get it done.
In the attached files you'll find a template named 'cars.php' and a csv file named 'cars.csv'.
Place cars.php in your site's root and register it (by accessing it as super-admin).
Place cars.csv within the 'csv' addons folder.
We'll be importing the rows in cars.csv into the cars.php template.
To do so, create a new template (say named 'import.php') place the two mandatory lines of PHP within it and register it by accesing it as super-admin.
- Code: Select all
<?php require_once( 'couch/cms.php' ); ?>
<cms:template title='Import' ></cms:template>
<?php COUCH::invoke(); ?>
Within this template we'll place a slightly modified version of the code we have already seen above where we displayed a paginated listing of all rows in the csv file in a tabular format -
- Code: Select all
<?php require_once( 'couch/cms.php' ); ?>
<cms:template title='Import' ></cms:template>
<cms:set mystart="<cms:gpc 'import' method='get' />" />
<cms:if mystart >
<cms:csv_reader
file="<cms:show k_admin_path />addons/csv/cars.csv"
paginate='1'
limit='100'
prefix='_'
use_cache='0'
>
<cms:if k_paginated_top >
<cms:if k_paginate_link_next >
<script language="JavaScript" type="text/javascript">
var myVar;
myVar = window.setTimeout( 'location.href="<cms:show k_paginate_link_next />";', 1000 );
</script>
<button onclick="clearTimeout(myVar);">Stop</button>
<cms:else />
Done!
</cms:if>
<h3><cms:show k_current_page /> / <cms:show k_total_pages /> pages (Total <cms:show k_total_records /> records. Showing <cms:show k_paginate_limit /> records per page)</hr>
<table border='0'>
<thead>
<tr>
<th>No.</th>
<cms:csv_headers>
<th><cms:show value /></th>
</cms:csv_headers>
</tr>
</thead>
<tbody>
</cms:if>
<tr>
<td><cms:show k_current_record /></td>
<cms:csv_columns>
<td><cms:show value /></td>
</cms:csv_columns>
<!-- database operation here -->
<cms:ignore>
<cms:db_persist
_auto_title = '0'
_invalidate_cache = '0'
_masterpage = 'cars.php'
_mode = 'create'
k_page_title = "<cms:show _car_make/> <cms:show _car_model/>"
car_make = _car_make
car_model = _car_model
car_type = _car_type
car_origin = _car_origin
car_drivetrain = _car_drivetrain
car_msrp = _car_msrp
car_invoice = _car_invoice
car_engine_size = _car_engine_size
car_cylinders = _car_cylinders
car_horsepower = _car_horsepower
car_mpg_city = _car_mpg_city
car_mpg_highway = _car_mpg_highway
car_weight = _car_weight
car_wheelbase = _car_wheelbase
car_length = _car_length
>
<cms:if k_error>
<strong style="color:red;">ERROR:</strong> <cms:show k_error/>
</cms:if>
</cms:db_persist>
</cms:ignore>
<!-- end database operation -->
</tr>
<cms:if k_paginated_bottom >
<tr>
<td></td>
<td colspan='<cms:show k_csv_header_count />'>
<cms:paginator simple='1' />
</td>
</tr>
</tbody>
</table>
</cms:if>
</cms:csv_reader>
<cms:else/>
<button onclick='location.href="<cms:add_querystring k_page_link 'import=1' />"'>Start!</button>
</cms:if>
<?php COUCH::invoke(); ?>
The notable addition to the original table code is the cms:db_persist tag which creates new cloned pages using the data made available by the cms:csv_reader tag. In the code above the cms:db_persist block is commented out using cms:ignore tags to stop it from executing. Once you have tested that everything works as expected, remove the cms:ignore tags to begin the import process. If the csv file you are using is large, make sure to set 'use_cache' to '1' too.
Another (admittedly very crude) addition is the use of JS to automatically navigate through the paginated pages (instead of manually clicking on the paginator buttons). The point is to show how to couple cms:csv_reader tag with cms:db_persist tag and create a 'staggered' importer. I an sure you can create an importer that is more aesthetical (and maybe uses AJAX etc.).
Once the import process is complete, this template can be deleted.
In closing, please note that the cms:csv_reader tag is concerned with only making available the data contained in the specified csv file. What gets done with that data is completely up to you. It is only 'incidental' that we are using cms:db_persist with the data to create cloned pages thus effectively creating an importer.
This also translates to that we can easily extend the functionality of the importer we created by adding our own Couch code. For example, instead of creating new pages we can make it modify existing pages that have data different from what is contained in the csv file or delete pages not found in the csv file etc.
Hope this helps.
Do let us know your thoughts on this addon.
Addendum:
If you are looking for a CSV Exporter, please see
viewtopic.php?f=8&t=11378