We report a new multi-GPU-capable ab initio Hartree-Fock/density functional theory implementation integrated into the open-source QUantum Interaction Computational Kernel (QUICK) program. We describe the load balancing algorithms used to distribute electron repulsion integrals and exchange-correlation quadrature across multiple GPUs. Benchmarking studies carried out on up to 4 GPU nodes, each containing 4 NVIDIA V100-SXM2 GPUs, demonstrate that our implementation achieves excellent load balancing and high parallel efficiency. For representative medium to large protein/organic molecular systems, the observed parallel efficiencies remained above 86%. Accelerations on NVIDIA A100, P100, and K80 platforms also realized parallel efficiencies higher than 74%, paving the way for large-scale ab initio electronic structure calculations.