Benignity of loss landscape with weight decay requires both large overparametrization and initialization

Published:
