It seems the venerable file format CSV (Character/Comma-separated values) never goes out of style. According to CSV wikipedia page, IBM Fortran started supporting this format in 1967, before many of us were born. CSV has been with us, through thick and thin, silently but steadfastly, ready to spring into action when duty calls. For sure, it’s one of data professionals’ best friends! Often times, we’d convert spreadsheet files or dump data inside a database into a CSV before it can be distributed and consumed downstream.
Major league scripting languages, such as Perl, Python, and Ruby, all have their own way of converting Excel spreadsheet files into CSVs. Here I list their most popular libraries, based on my research: for Perl, there is Spreadsheet::ParseExcel; for Python, there is xlrd; for Ruby, there is roo.
However, none of these addressed a problem I had recently.
Here is my use case:
Given Excel files, in both xls and xlsx format, that have hyperlink columns in them, convert them to CSV. For hyperlink columns, save the text value (also known as Friendly_name) but not its URL. None of the libraries mentioned above can handle it.
So I ended up trying PHP, and found a PHP library called PHPExcel that addressed my needs. Below is a quick CLI PHP program I wrote.
Follow steps below to use it:
Hope it helps!
#!/usr/bin/php -q < ?php require_once('/Directory2PHPExcel/PHPExcel/Classes/PHPExcel.php'); $inputFile = $argv[1]; $outputFile = $argv[2]; Xls2Csv($inputFile,$outputFile); function Xls2Csv($infile,$outfile) { $fileType = PHPExcel_IOFactory::identify($infile); $objReader = PHPExcel_IOFactory::createReader($fileType); $objReader->setReadDataOnly(true); $objPHPExcel = $objReader->load($infile); $objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'CSV'); $objWriter->save($outfile); } ?>