• person rss_feed

    Michał "phoe" Herda’s feed


    • chevron_right

      CFFI arrays versus STATIC-VECTORS: a comparison

      Michał "phoe" Herda · Friday, 21 December, 2018 - 12:37 edit · 3 minutes

    This blogpost describes some details of using CFFI arrays versus static vectors in implementing a wrapper around a foreign library. In our case, it is a LZMA (de)compressor, with original C sources available here and my wrapper library available here. CL-LZMA supports both basic compression and decompression. For brevity, let us discuss decompression only.

    The old version of my code used CFFI to convert between Lisp data and C data, by means of FOREIGN-ARRAY-TO-LISP and LISP-TO-FOREIGN-ARRAY. These functions allocate a new array each time and copy the contents over, which is a slow process.

    This code was also doubly wasteful: not only FOREIGN-ARRAY-TO-LISP copies once by creating a new Lisp array containing the data, but COERCE does a second copy, this time copying the data into an array of proper element type. Back when this code was written, CFFi's FOREIGN-ARRAY-TO-LISP had no means of specifying element type to the created array; I have fixed it with some help from Luís Oliveira.

    However, the current code, uses a different strategy. I no longer use CFFI's LISP-TO-FOREIGN-ARRAY or FOREIGN-ARRAY-TO-LISP to copy array contents. Instead, I create a single static vector in L183.

    The downside of a static vector is that it is not garbage collected, so we have to manually free it when we are done with it; the upside is that we are able to get a C pointer of such a vector that refers to the same memory area that is used by Lisp. This means that a static vector allows us to store a single vector of data that can be freely operated on by C functions (since we have a immobile memory pointer) and by Lisp functions (since we have a reference to the ARRAY object).

    In my case, the LZMA (de)compressor accepts pointers to raw memory. The ability to use static vectors requires your library to be designed around externally supplied buffers; in other words, static vectors are useless if your C code allocates its buffers on its own. In general, it is impossible to create a Lisp vector from a pointer to raw memory without copying the whole vector, so the only way of getting a static vector to work with C code is allocating it inside Lisp and then passing its pointer to whatever C function you are using.

    As for a small speed-test, let us test compressing and decompressing one megabyte of random data on my machine.

    (defun random-vector (length)
      (let ((vector (make-array length :element-type '(unsigned-byte 8)))
            (*random-state* (make-random-state t)))
        (loop for i below (length vector)
              do (setf (aref vector i) (random 256)))
    (defvar *random-data* (random-vector (expt 10 6)))

    New code:

    CL-USER> (time (multiple-value-call #'cl-lzma:lzma-decompress
                     (cl-lzma:lzma-compress *random-data*)))
    Evaluation took:
      0.323 seconds of real time
      0.323035 seconds of total run time (0.318949 user, 0.004086 system)
      100.00% CPU
      682,239,302 processor cycles
      0 bytes consed

    Old code:

    CL-USER> (time (multiple-value-call #'cl-lzma:lzma-decompress
                     (cl-lzma:lzma-compress *random-data*)))
    Evaluation took:
      1.324 seconds of real time
      1.330467 seconds of total run time (1.294264 user, 0.036203 system)
      [ Run times consist of 0.083 seconds GC time, and 1.248 seconds non-GC time. ]
      100.45% CPU
      2,796,481,840 processor cycles
      1,119,702,784 bytes consed

    Not only the new code is four times faster (a third of a second versus four thirds of a second), the new code does not cons a single byte.

    The SBCL profiler tells me that consing is really high in the old code:

    CL-USER> (sb-profile:profile cffi:foreign-array-to-lisp 
    CL-USER> (sb-profile:report)
    measuring PROFILE overhead..done
      seconds  |     gc     |     consed    | calls |  sec/call  |  name  
         0.638 |      0.070 | 1,020,998,800 |     3 |   0.212666 | CFFI:FOREIGN-ARRAY-TO-LISP
         0.365 |      0.006 |    96,665,568 |     3 |   0.121666 | CFFI:LISP-ARRAY-TO-FOREIGN
         0.207 |      0.000 |     1,013,680 |     1 |   0.206998 | CL-LZMA:LZMA-COMPRESS
         0.063 |      0.000 |     1,000,016 |     1 |   0.062998 | CL-LZMA:LZMA-DECOMPRESS
         1.273 |      0.076 | 1,119,678,064 |     8 |            | Total

    This becomes even more evident when we up the ante to one hundred megabytes.

    (setf *random-data* (random-vector (expt 10 8)))

    (Remember to avoid printing the resulting structure. Printing hundreds of megabytes to your SLIME session is a good way to cause yourself problems. Either avoid printing them altogether or set the *print-length* special variable.)

    New code:

    CL-USER> (time (multiple-value-call #'cl-lzma:lzma-decompress
                     (cl-lzma:lzma-compress *random-data*)))
    Evaluation took:
      37.950 seconds of real time
      37.883646 seconds of total run time (37.687516 user, 0.196130 system)
      99.83% CPU
      80,150,688,876 processor cycles
      0 bytes consed

    Old code:

    CL-USER> (time (multiple-value-call #'cl-lzma:lzma-decompress
                     (cl-lzma:lzma-compress *random-data*)))
    ;; Error: Heap exhausted (no more space for allocation).
    ;; 757301248 bytes available, 810862240 requested.
    ;;   [Condition of type SB-KERNEL::HEAP-EXHAUSTED-ERROR]
    ;; Backtrace:
    ;;   (...)
    ;;   10: (CFFI:FOREIGN-ARRAY-TO-LISP #.(SB-SYS:INT-SAP #X7FCE5E1DB010) (:ARRAY :UINT8 101357778))
    ;;   11: (CL-LZMA::%LZMA-COMPRESS #.(SB-SYS:INT-SAP #X7FCE5E1DB010) #.(SB-SYS:INT-SAP #X7FCE7C002410) #.(SB-SYS:INT-SAP #X7FCE760A1010) 100000000)
    ;;   12: (CL-LZMA:LZMA-COMPRESS #(153 48 248 229 45 236 ...))

    I do not think this last result needs any additional comments. :)