Loading all files from directory to table

dirIn one of my previous articles (Full-Text Search with PDF documents) I loaded a few files into a table using OPENROWSET function. It was a good way to achieve what I wanted to do but I understand it might be a bit painful if there are hundreds or thousands of files to load. A today's question is - is there a way to load all PDF files from a specific directory without loading them one by one? 

 

Goal - load all PDF files from a directory

I have PDF documents in c:\temp\ directory which I would like load into a table in a SQL Server database. There might be many purposes of it, for example to index them for Full-Text Search. If there are a few files only, it can be done by listing them one by one and loading them separately:

INSERT INTO dbo.articles (content, extension, title )
SELECT bulkcolumn, '.pdf', 'DBA presents - Database-independent script in Liquibase'
FROM OPENROWSET (BULK 'c:\temp\DBA presents - Database-independent script in Liquibase.pdf', SINGLE_BLOB ) AS t;

INSERT INTO dbo.articles (content, extension, title )
SELECT bulkcolumn, '.pdf', 'DBA presents - Full-Text Search - practical introduction [video]'
FROM OPENROWSET (BULK 'c:\temp\DBA presents - Full-Text Search - practical introduction [video].pdf', SINGLE_BLOB ) AS t;

INSERT INTO dbo.articles (content, extension, title )
SELECT bulkcolumn, '.pdf', 'DBA presents - Generating XML from SQL - part 1'
FROM OPENROWSET (BULK 'c:\temp\DBA presents - Generating XML from SQL - part 1.pdf', SINGLE_BLOB ) AS t;

A problem emerges when there multiple of documents - hundreds, thousands, millions. Loading them one by one using the above statements might be though.

 

Solution - automate scripts generation

Is there a way to tell SQL Server to load all files from a directory to a table? Hmm ... not an easy one.

 

OPENROWSET with MULTIPLE_BLOBS option?

If you take a look at OPENROWSET documentation, you will not find an option like MULTIPLE_BLOBS next to SINGLE_BLOB that would require a directory location instead of a path to a file. We need to keep looking.

 

Iterate through all files using T-SQL?

It seems possible to write a script that uses xp_DirTree function to return all files from a directory and uses this result set to load each file with OPENROWSET. There is one big BUT - xp_DirTree is undocumented and unsupported. Because of it, I do not recommend that way.

 

Generate T-SQL script using Windows CMD

This is a solution I prefer. It works perfectly if you need to load multiple files once. I create a BAT file (genSQL.bat):

@echo off
echo. > loadFiles.sql
for /r %%i in (*.pdf) do (
	echo INSERT INTO dbo.articles(content, extension, title^)
	echo SELECT bulkcolumn, '.pdf', '%%~ni'
	echo FROM OPENROWSET(BULK '%%i', SINGLE_BLOB^) AS t;
) >> loadFiles.sql

When executed in the Windows CMD:

c:\temp>genSQL.bat

c:\temp>

it iterates through all PDF files in the current directory and prints an insert statement to loadFiles.sql for each one. Then, loadFiles.sql contains a separate insert statement for each document:

INSERT INTO dbo.articles (content, extension, title )
SELECT bulkcolumn, '.pdf', 'DBA presents - Database-independent script in Liquibase'
FROM OPENROWSET (BULK 'c:\temp\DBA presents - Database-independent script in Liquibase.pdf', SINGLE_BLOB ) AS t;

INSERT INTO dbo.articles (content, extension, title )
SELECT bulkcolumn, '.pdf', 'DBA presents - Full-Text Search - practical introduction [video]'
FROM OPENROWSET (BULK 'c:\temp\DBA presents - Full-Text Search - practical introduction [video].pdf', SINGLE_BLOB ) AS t;

INSERT INTO dbo.articles (content, extension, title )
SELECT bulkcolumn, '.pdf', 'DBA presents - Generating XML from SQL - part 1'
FROM OPENROWSET (BULK 'c:\temp\DBA presents - Generating XML from SQL - part 1.pdf', SINGLE_BLOB ) AS t;
...

This content can be executed against a SQL Server database to load all PDF files to a table.

This solution follows a two phase approach: generate SQL script and execute it. It is good enough in many real cases.

Do you have any other idea? Share in the comments.

Do not miss valuable content. You will receive a monthly summary email. You can unsubscribe anytime.

We use cookies

We use cookies on our website. Some of them are essential for the operation of the site, while others help us to improve this site and the user experience (tracking cookies). You can decide for yourself whether you want to allow cookies or not. Please note that if you reject them, you may not be able to use all the functionalities of the site.