Reading multiple columns from large CSV files in PHP

fgetcsv() gets a line from a file pointer and parses it for CSV fields. It is similar to fgets(), except that fgetcsv() parses the line it reads for fields in CSV format and returns an array containing the fields read. The optional separator parameter sets the field separator (one single-byte character only). The counterpart for writing is fputcsv(), which formats a line as CSV and writes it to a file pointer.
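
A small sketch of the separator parameter in use, reading a semicolon-separated sample from an in-memory stream (php://memory) so no file is needed. The sample data is made up for illustration. Passing '"' and '' for enclosure and escape is a choice made here: an empty escape string (PHP 7.4+) turns off PHP's non-standard backslash escaping.

```php
<?php
declare(strict_types=1);

// Build an in-memory stream so the example needs no file on disk.
$stream = fopen('php://memory', 'r+');
fwrite($stream, "title;id\nAchillea;346346\nSeeds;34643\n");
rewind($stream);

$rows = [];
// Length 0 means "no line-length limit"; ';' is the single-byte separator.
while (($fields = fgetcsv($stream, 0, ';', '"', '')) !== false) {
    $rows[] = $fields; // each line comes back as an array of strings
}
fclose($stream);

print_r($rows);
```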

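$file = fopen('file.csv', 'r');
if ($file === false) {
   throw new RuntimeException('Cannot open file.csv');
}
// fgetcsv() returns false at end of file, so compare strictly
while (($line = fgetcsv($file)) !== FALSE) {
   print_r($line);
}
fclose($file);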
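
When the file is large and you only need a few of its columns, the loop above can be wrapped in a generator so that only one row is held in memory at a time. A minimal sketch, assuming the first row is a header with unique column names; read_columns() is a hypothetical helper, not a PHP built-in.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: lazily yields only the requested columns of a CSV
 * file, keyed by header name. Assumes the first row is the header.
 *
 * @param string[] $columns Header names to keep.
 */
function read_columns(string $path, array $columns, string $separator = ','): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $header = fgetcsv($handle, 0, $separator, '"', '');
        if ($header === false) {
            return; // empty file: nothing to yield
        }
        // Map each wanted column name to its position in the header row.
        $positions = [];
        foreach ($columns as $name) {
            $index = array_search($name, $header, true);
            if ($index === false) {
                throw new InvalidArgumentException("Unknown column: $name");
            }
            $positions[$name] = $index;
        }
        while (($row = fgetcsv($handle, 0, $separator, '"', '')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv() returns [null] for a blank line
            }
            $selected = [];
            foreach ($positions as $name => $index) {
                $selected[$name] = $row[$index] ?? null; // short rows yield null
            }
            yield $selected;
        }
    } finally {
        fclose($handle); // runs when the generator finishes or is destroyed
    }
}

$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "title,link,id\nAchillea,http://www.example.com,346346\nSeeds,http://www.example.com,34643\n");
foreach (read_columns($path, ['title', 'id']) as $row) {
    echo $row['title'], ' => ', $row['id'], PHP_EOL;
}
unlink($path);
```

Because rows are yielded one by one, memory use stays flat regardless of file size; collect them with iterator_to_array() only if you really need them all at once.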

I am trying to parse a CSV file into an array. Unfortunately, one of the columns contains commas and quotes (example below). Any suggestions on how I can avoid breaking that column up into multiple columns?

I have tried changing the delimiter in the fgetcsv function, but that didn't work, so I tried using str_replace to escape all the commas, but that broke the script.

Maybe you should think about using PHP's file() function, which reads your CSV file into an array.

Example of CSV format

title,->link,->description,->id
Achillea,->http://www.example.com,->another,short example "Of the product",->346346
Seeds,->http://www.example.com,->"please see description for more info, thanks",->34643
Ageratum,->http://www.example.com,->this is, a brief description, of the product.,->213421
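
A short sketch of why only some of these rows break, using str_getcsv() on the same lines. It assumes the `->` markers in the sample are not part of the data (the question does not say what they stand for), so they are dropped here. A field wrapped in double quotes may contain commas and doubled quotes; an unquoted field with commas cannot be told apart from separate fields, so the reliable fix is to quote such fields where the file is produced.

```php
<?php
declare(strict_types=1);

// Quoted field: the commas inside the quotes are data, not separators.
$quoted = 'Seeds,http://www.example.com,"please see description for more info, thanks",34643';
print_r(str_getcsv($quoted, ',', '"', ''));

// Unquoted field with commas: the parser splits it into several fields.
$unquoted = 'Ageratum,http://www.example.com,this is, a brief description, of the product.,213421';
print_r(str_getcsv($unquoted, ',', '"', ''));

// Quoting the field and doubling its inner quotes keeps it in one column.
$fixed = 'Achillea,http://www.example.com,"another,short example ""Of the product""",346346';
print_r(str_getcsv($fixed, ',', '"', ''));
```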

// Open the CSV ($fileUrl holds the path or URL of the file)
if (($handle = fopen($fileUrl, "r")) !== FALSE) {

   // Collect the rows here, starting at key 0
   $arrCSV = [];
   $key = 0;

   // Read one row at a time; a length of 0 means no line-length limit, and the separator is ","
   while (($data = fgetcsv($handle, 0, ",")) !== FALSE) {

      // Count the fields in this row
      $c = count($data);

      // Copy each field into the result array
      for ($x = 0; $x < $c; $x++) {
         $arrCSV[$key][$x] = $data[$x];
      }
      $key++;
   } // end while

   // Close the CSV file
   fclose($handle);
}

As performance is your main concern, let's face this first. To complete the example CSV file with ~36k lines, your original script needs around 139 s*.

The main bottlenecks are in_array:

if (!in_array($record, $return_waarde)) {}

and array_combine:

$return_waarde[] = array_combine($header_trimmed, str_getcsv(utf8_encode($record), ';'));

in_array() has to scan every row collected so far, so each duplicate check gets slower as the result grows and the whole loop becomes quadratic. Storing a hash of each row as an array key turns that check into an isset() lookup that takes roughly constant time.

Result

while (false !== ($data = fgetcsv($handle, 1000, ','))) {
   $hash = md5(serialize($data));

   if (!isset($hashes[$hash])) {
      $hashes[$hash] = true;
      $values[] = array_combine($headerUnique, $data);
   }
}

With this improvement the script now processes all 36k lines in ~0.5 s*. Seems a little faster. ;)

In the beginning you create unique names for duplicate entries in the header row:

if (!in_array($value, $header_trimmed)) {
   $header_trimmed[] = $trim;
} else {
   $header_trimmed[] = $trim . "1";
}

If you have a column name more than two times, you'll end up with this, probably unintended, result:

['column', 'column1', 'column1']

You can create a function to make the names truly unique, e.g.:

function unique_columns(array $columns): array {
   $values = [];

   foreach($columns as $value) {
      $count = 0;
      $value = $original = trim($value);

      while (in_array($value, $values)) {
         $value = $original . '-' . ++$count;
      }

      $values[] = $value;
   }

   return $values;
}

This will result in

['column', 'column-1', 'column-2']
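
Since in_array() scans the whole list on every check, the trick used for the rows (array keys plus isset()) also applies to the header. A sketch of a hypothetical variant, unique_columns_fast(), which produces the same names for the examples above. It also compares names exactly, whereas in_array() without its strict flag treats numeric strings such as '1' and '01' as equal.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical variant of unique_columns(): names already used are stored
 * as array keys, so each duplicate check is an isset() lookup.
 */
function unique_columns_fast(array $columns): array
{
    $seen = [];   // names already emitted, stored as keys
    $values = [];

    foreach ($columns as $value) {
        $count = 0;
        $value = $original = trim((string) $value);

        // Append -1, -2, ... until the name has not been used yet.
        while (isset($seen[$value])) {
            $value = $original . '-' . ++$count;
        }

        $seen[$value] = true;
        $values[] = $value;
    }

    return $values;
}

print_r(unique_columns_fast(['column', 'column', 'column']));
```

For a header of a few dozen columns the difference is negligible; the point is that the same lookup pattern scales if the list is long.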

Currently your function read_csv() returns either a string or an array. The function should always return an array. You can even make the parameter and return types stricter:

function read_csv(string $file): array {}

Also try to exit early when something goes wrong, instead of nesting if statements. If you actually want to react to an error, throw an exception:

if (!$file) {
   throw new Exception('File not found: ' . $file);
}
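
A sketch of the early-exit style applied to opening the file: each failure throws immediately, so the rest of the code is not nested inside if blocks. open_csv() is a hypothetical name, and checking is_file()/is_readable() before fopen() is an assumption of this sketch (it also avoids the warning fopen() emits for a missing file). RuntimeException extends Exception, so an existing catch (Exception $e) still catches it.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns an open read handle or throws.
 *
 * @return resource
 */
function open_csv(string $file)
{
    if (!is_file($file) || !is_readable($file)) {
        throw new RuntimeException('File not found or not readable: ' . $file);
    }

    $handle = fopen($file, 'r');
    if ($handle === false) {
        throw new RuntimeException('Could not open file: ' . $file);
    }

    return $handle; // happy path: no nesting needed
}

try {
    open_csv('/path/that/does/not/exist.csv');
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```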

Finally, let's make this function more versatile by adding the line length and delimiter as optional parameters.

function read_csv(string $file, int $length = 1000, string $delimiter = ','): array {
   $handle = fopen($file, 'r');
   $hashes = [];
   $values = [];
   $header = null;
   $headerUnique = null;

   if (!$handle) {
      return $values;
   }

   $header = fgetcsv($handle, $length, $delimiter);

   if (!$header) {
      fclose($handle);
      return $values;
   }

   $headerUnique = unique_columns($header);

   while (false !== ($data = fgetcsv($handle, $length, $delimiter))) {
      $hash = md5(serialize($data));

      if (!isset($hashes[$hash])) {
         $hashes[$hash] = true;
         $values[] = array_combine($headerUnique, $data);
      }
   }

   fclose($handle);

   return $values;
}
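
Putting the pieces together, here is a runnable sketch of read_csv() with unique_columns(), with the missing `$` on `delimiter` restored and the handle closed on the early return. Two additions are assumptions not in the answer: enclosure and escape are passed explicitly ('"' and '', which disables PHP's backslash escaping), and rows whose field count differs from the header, including blank lines that fgetcsv() returns as [null], are skipped, because array_combine() requires both arrays to have the same number of elements.

```php
<?php
declare(strict_types=1);

function unique_columns(array $columns): array {
    $values = [];
    foreach ($columns as $value) {
        $count = 0;
        $value = $original = trim((string) $value);
        while (in_array($value, $values)) {
            $value = $original . '-' . ++$count;
        }
        $values[] = $value;
    }
    return $values;
}

function read_csv(string $file, int $length = 1000, string $delimiter = ','): array {
    $handle = fopen($file, 'r');
    $hashes = [];
    $values = [];

    if (!$handle) {
        return $values;
    }

    $header = fgetcsv($handle, $length, $delimiter, '"', '');
    if (!$header) {
        fclose($handle);
        return $values;
    }

    $headerUnique = unique_columns($header);

    while (false !== ($data = fgetcsv($handle, $length, $delimiter, '"', ''))) {
        // Assumption: skip rows that cannot be combined with the header.
        if (count($data) !== count($headerUnique)) {
            continue;
        }

        // Hash the row so the duplicate check is an isset() lookup.
        $hash = md5(serialize($data));
        if (!isset($hashes[$hash])) {
            $hashes[$hash] = true;
            $values[] = array_combine($headerUnique, $data);
        }
    }

    fclose($handle);
    return $values;
}

$file = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($file, "id,name,name\n1,a,b\n1,a,b\n\n2,c,d\n");
print_r(read_csv($file));
unlink($file);
```

The duplicate row `1,a,b` is kept once, the blank line is skipped, and the second `name` column becomes `name-1`.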

If you want to learn more about the CSV format and MIME types, the RFC 4180 document has more information. In that format, each record is located on a separate line, delimited by a line break (CRLF), for example:

col_11, col_12, col_13 CRLF
col_21, col_22, col_23 CRLF
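
To write files that follow these rules from PHP, fputcsv() encloses any field containing the separator, the enclosure character, a line break, or a space in double quotes, and doubles any embedded quotes. A sketch using an in-memory stream; the sixth argument (eol) requires PHP 8.1 or later, and passing '' as escape is a choice made here so that only the RFC's quote doubling is used.

```php
<?php
declare(strict_types=1);

$stream = fopen('php://memory', 'r+');
$records = [
    ['title', 'description', 'id'],
    ['Seeds', 'please see description for more info, thanks', '34643'],
    ['Achillea', 'another,short example "Of the product"', '346346'],
];
foreach ($records as $record) {
    // separator, enclosure, escape, and CRLF line ending (PHP 8.1+)
    fputcsv($stream, $record, ',', '"', '', "\r\n");
}

rewind($stream);
$csv = stream_get_contents($stream);
fclose($stream);

echo $csv;
```

Written this way, the description fields in the question's sample would come back from fgetcsv() as single columns.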

In this method, we split one CSV file into multiple CSV files based on rows. We filter the rows by the values (Male and Female) of a specific column, Gender, and then write each subset to a CSV file using to_csv in pandas. We can also group by several columns and create one file for each combination of unique values in those columns, for example Gender and Annual Income.
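
The same split by column value can be done in plain PHP, streaming row by row instead of loading the whole file. A minimal sketch, assuming a comma-separated file with a header row and column values that are safe to use in file names; split_by_column() is a hypothetical helper, not a library function.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: copies each row of $source into "<prefix><value>.csv",
 * where <value> is the row's entry in $column (e.g. Gender -> Male/Female).
 *
 * @return string[] paths of the files written, keyed by column value
 */
function split_by_column(string $source, string $column, string $prefix): array
{
    $in = fopen($source, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $source");
    }
    $header = fgetcsv($in, 0, ',', '"', '');
    $index = $header === false ? false : array_search($column, $header, true);
    if ($index === false) {
        fclose($in);
        throw new InvalidArgumentException("Column not found: $column");
    }

    $outputs = []; // value => [path, handle], one open handle per value
    while (($row = fgetcsv($in, 0, ',', '"', '')) !== false) {
        if ($row === [null]) {
            continue; // skip blank lines
        }
        $value = (string) ($row[$index] ?? '');
        if (!isset($outputs[$value])) {
            $path = $prefix . $value . '.csv'; // assumes a file-name-safe value
            $out = fopen($path, 'w');
            if ($out === false) {
                throw new RuntimeException("Cannot write $path");
            }
            fputcsv($out, $header, ',', '"', ''); // repeat the header in each file
            $outputs[$value] = [$path, $out];
        }
        fputcsv($outputs[$value][1], $row, ',', '"', '');
    }
    fclose($in);

    $paths = [];
    foreach ($outputs as $value => [$path, $out]) {
        fclose($out);
        $paths[$value] = $path;
    }
    return $paths;
}
```

Keeping one handle open per distinct value avoids reopening files for every row; with a column that has very many distinct values you may hit the open-file limit, in which case appending with fopen($path, 'a') per row is the slower but safer option.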
