We propose several techniques for efficient multi-GPU acceleration of a direct linear system solver designed for finite-difference frequency-domain analysis of photonic structures. The solver is based on the compressed hierarchical Schur (CHiS) method, which avoids redundant computation by exploiting knowledge of duplicated physical structures and of the numerical elimination process. Because computationally intensive matrix operations make up the major workload of the CHiS algorithm, each of them can be divided into multiple panels that are processed by multiple GPUs in parallel. Our implementation uses multithreading to control the GPUs. Performance analysis shows that this workload division scales significantly better on 4 GPUs than naive GPU acceleration.