06-09-2019

Laravel Lazy Collections - Import Large dataset from CSV

Laravel Lazy Collections - Import Large dataset from CSV

Lazy Collections, implemented by Joseph Silber, is one of my favourite Laravel 6 new feature.

the LazyCollection class leverages PHP's generators to allow you to work with very large collections of data, including Eloquent model collections while keeping memory usage low.

In addition the query builder's cursor method has been updated to return a LazyCollection instance. This allows you to still only run a single query against the database but also only keep one Eloquent model loaded in memory at a time.

The Lazy Collections class can be very useful when you need to import data from big CSV file with more than 100.000 rows.

So I've tried to import a large CSV file with 300.000 rows into a Mysql table.

Hereafter the code I've used:

use App\Project;
use Illuminate\Support\LazyCollection;


Route::get('/', function () {

LazyCollection::make(function () {
// project.csv with 300.000 rows
$filePath = storage_path('app/public/project.csv');
$handle = fopen($filePath, 'r');
while ($line = fgetcsv($handle)) {
yield $line;
}
})
->chunk(10000) //split in chunk to reduce the number of queries
->each(function ($lines) {

$list = [];
foreach ($lines as $line) {
if (isset($line[1])) {
$list[] = [
'name' => $line[1],
'email' => $line[2],
'status_id' => $line[3]
];
}
}
// insert 10000 rows in one shot
Project::insert($list);

});
/* display memory usage */
echo number_format(memory_get_peak_usage() / 1048576, 2) . ' MB';
});

 

In order to reduce the number of insert queries I've used the chunck method that breaks the data collection into multiple smaller collections of 10000 rows.

In that way you can reduce the number of query to 30.

Running the import on my local machine take less tha 30MB memory usage after disabling DebugBar.

ADDITIONAL RESOURCES 

Lazy Collection Methods from Laravel Documentation

Lazy Collections - Laracasts

Massive DB rows insert in Laravel

Create LazyCollection from API resource