Summary: This paper develops a second-order optimization method based on the “Hessian-free” approach.

Pros:
1. The presentation covered the most relevant material and skipped unnecessary content, so it finished on time.
2. The speaker answered the questions well.

Comments:
1. [Presentation] The speaker may want to check the pronunciation of some words.
2. [Presentation] It would be better to reorganize the presentation with the audience's varied knowledge backgrounds in mind: the fundamental definitions and concepts should be introduced before the main body of the paper.
3. [Slide] Keep the mathematical notation consistent across the slides and avoid typos. Every piece of content, notation, or definition on a slide should be explained; anything left unexplained should be obvious to the audience. As a general rule, what appears on the slides should be a subset of what you actually explain in the talk.

Selected questions:
1. Is d_{i}, the i-th search direction in Conjugate Gradient Newton's method (CG-Newton), conjugate to all previous search directions, and not only to the adjacent direction d_{i+1}? Proofs can be found in the Numerical Optimization book; a small numerical check is also sketched below.
2. How should one understand the fact that the maximum number of CG-Newton iterations equals the dimension of the input variable x? See the same book.
3. Is there any connection between the eigenvalues of the Hessian matrix H_f and \rho? And what exactly is \rho here?
4. In which situations would the method from this paper work better than SGD? What would happen if we trained a GCN on a large graph with this method?
5. Does the approximation of the Hessian-vector product Hv affect the overall convergence? (A minimal sketch of such an approximation is given below.)
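
Regarding questions 1 and 2, the following is a minimal numerical sketch (my own illustration, not code from the paper): it runs plain conjugate gradient on a small quadratic, reports how far the generated search directions are from pairwise A-conjugacy, and confirms that the residual vanishes after at most n = dim(x) iterations. The matrix A, vector b, and problem size are arbitrary placeholders.

```python
# Hypothetical illustration (not the paper's code): plain conjugate gradient
# on a small quadratic f(x) = 0.5 x^T A x - b^T x.  It checks that the search
# directions are pairwise A-conjugate and that CG needs at most n = dim(x)
# iterations to drive the residual to (numerical) zero.
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # symmetric positive definite
b = rng.standard_normal(n)

x = np.zeros(n)
r = A @ x - b                          # gradient of f at x
d = -r
directions = []
for _ in range(n):                     # at most n steps in exact arithmetic
    alpha = (r @ r) / (d @ A @ d)
    x = x + alpha * d
    r_new = r + alpha * (A @ d)
    directions.append(d)
    if np.linalg.norm(r_new) < 1e-12:
        break
    beta = (r_new @ r_new) / (r @ r)
    d = -r_new + beta * d
    r = r_new

# Largest |d_i^T A d_j| over i != j: should be near machine precision.
conj = max((abs(di @ A @ dj)
            for i, di in enumerate(directions)
            for j, dj in enumerate(directions) if i != j), default=0.0)
print("iterations:", len(directions),
      "final residual:", np.linalg.norm(A @ x - b),
      "max |d_i^T A d_j|:", conj)
```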
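
Regarding question 5, here is a hedged sketch of one common Hessian-free approximation, the finite-difference Hessian-vector product Hv ≈ (∇f(x + εv) − ∇f(x)) / ε, compared against the exact Hv on a toy objective. The objective and the step size ε are assumptions made purely for illustration; the paper may instead use an exact autodiff-based product.

```python
# Hypothetical illustration (not the paper's code): finite-difference
# Hessian-vector product, Hv ~= (grad_f(x + eps*v) - grad_f(x)) / eps,
# compared with the exact Hv on a toy objective
# f(x) = 0.5*||x||^2 + sum(sin(x)), whose Hessian is I - diag(sin(x)).
import numpy as np

def grad_f(x):
    return x + np.cos(x)               # gradient of the toy objective

def hvp_finite_diff(x, v, eps=1e-5):
    return (grad_f(x + eps * v) - grad_f(x)) / eps

def hvp_exact(x, v):
    return v - np.sin(x) * v           # (I - diag(sin(x))) v

rng = np.random.default_rng(0)
x, v = rng.standard_normal(5), rng.standard_normal(5)
err = np.max(np.abs(hvp_finite_diff(x, v) - hvp_exact(x, v)))
print("Hv approximation error:", err)  # O(eps), i.e. roughly 1e-5 here
```

In a Hessian-free method this approximate Hv feeds every inner CG iteration, so its error (O(ε) truncation plus round-off in this finite-difference variant) is precisely what question 5 asks about; an exact product via a second backward pass avoids the truncation error at extra cost.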