Tuesday, September 23, 2014

Easy method to parse data in CSV format into individual fields

Dear Reader

Today we will take a look at how we can very easily parse a line of data in CSV format into its individual columns/fields

We usually use a control file and SQLLDR to load some data from a CSV file into a staging table and then process it. But this involves needing to launch a HOST or SQL*Loader program to perform this task from within your PLSQL program

What if we wanted to keep everything together inside our PLSQL program and not create other programs just to do the load? Simple, follow these steps

1. Put the CSV file in any utl_file_dir location
2. Open it using utl_file and get the data line using utl_file.get_line
3. Now comes the interesting part i.e. parsing the data line. Use the below code and it will return you a PLSQL table containing the individual columns.

DECLARE
   TYPE string_tbl IS TABLE OF VARCHAR2(400) INDEX BY BINARY_INTEGER;
   x_token_tbl string_tbl;
   p_input_string VARCHAR2(2000) := 'Col1,Col2,Col3,Col4';
   gv_delimiter VARCHAR2(1) := ',';
BEGIN
   SELECT   trim(SUBSTR(t1.str, t1.curr_pos + 1, DECODE(t1.next_pos, 0, LENGTH(t1.str) + 2, t1.next_pos)- t1.curr_pos - 1)) tokens
   BULK COLLECT INTO x_token_tbl
   from
   (
         select   ids str,
                  DECODE(level, 1, 0, INSTR(ids, gv_delimiter, 1, level - 1)) as curr_pos,
                  INSTR(ids, gv_delimiter, 1, level) as next_pos
         from     (SELECT p_input_string ids FROM dual)
         connect by level <= LENGTH(ids) - LENGTH(replace(ids,gv_delimiter,'')) + 1
   ) t1;
   FOR i IN x_token_tbl.FIRST..x_token_tbl.LAST LOOP
      dbms_output.put_line('Value at index: ' || i || ' is : ' || x_token_tbl(i));
   END LOOP;

END;

If you run the above script you would see that "Col1" will be at index-1, "Col2" at index-2 and so on.

Now, you have successfully parsed your CSV data line into its pieces and can go ahead and insert into your staging table ... No host program, No SQLLDR, No control file ... simple right?

Hope this helps

Cheers
A

1 comment: