Optimizing Batch HGEMM on Small Sizes Using Tensor Cores