This week I was creating a functionality. An independent service to export large part of data to MS Excel file and part of service which retrieve data from database. It is obvious that xls format can contain about 65 thousand of rows, so I decided to use xlsx format which I thought it is unlimited but about this it will be later. My requirement was to export from database to excel a large set of data and not kill application.
First of all I focused on the output. The solution was to not use a XSSFWorkbook but SXSSFWorkbook. In my application currently I use old version of Apache POI v3.7 and there isn't implemented SXSSFWorkbook so in this case there is impossible to solve my problem. SXSSFWorkbook is available from v3.8.
However what I could do after I upgrade a libraries? I checked and it is possible to export huge part of data using less then 64MB heap memory. The SXSSFWorkbook implementation can save simple data in a stream. Process of creating file is split into two phases. In the first phase implementation is saving processed data into temporary file (on linux it is /tmp/.... file). In the second phase temporary xml file is compressed with additional files containing styles and other information into final file.
However what I could do after I upgrade a libraries? I checked and it is possible to export huge part of data using less then 64MB heap memory. The SXSSFWorkbook implementation can save simple data in a stream. Process of creating file is split into two phases. In the first phase implementation is saving processed data into temporary file (on linux it is /tmp/.... file). In the second phase temporary xml file is compressed with additional files containing styles and other information into final file.
By the way I found out that xlsx is not unlimited and every sheet can have maximum a little more then one million rows (2^20) and about 16 thousand columns (2^14).
After I had found out how to export large volume of data to xlsx I looking for solution how to retrieve data from database row by row. I'd like to separate input from output service. I created interface of DataProvider and injected there a RowMapper and other types used in NamedParameterStatement's query method but it doesn't work. Finally I used a ScrollableResultSet with Forward option and limitation of retrieved data at once and it works.