Summary: This paper develops a second-order optimization method based on the “Hessian-free” approach.

Pros:
1. The presentation covered the most relevant material and skipped unnecessary content, so it finished on time.
2. The speaker answered the questions well.

Comments:
1. [Presentation] The speaker may want to check the pronunciation of some words.
2. [Presentation] It would be better to reorganize the presentation with the audience's varied knowledge backgrounds in mind: the fundamental definitions and concepts should be introduced before the main body of the paper.
3. [Slide] Keep the mathematical notation consistent across the slides and avoid typos. Every piece of content, notation, or definition on a slide should be explained; anything left unexplained should be obvious to the audience. As a general rule, what appears on the slides should be a subset of what you actually explain in the talk.

Selected questions:
1. Is d_{i}, the i-th search direction in Conjugate Gradient Newton's method (CG-Newton), conjugate to all previous search directions, and not only to the adjacent direction d_{i+1}? Proofs can be found in the Numerical Optimization book; a small numerical check is also sketched below.
2. How should one understand the fact that the maximum number of CG-Newton iterations equals the dimension of the input variable x? See the same book.
3. Is there any connection between the eigenvalues of the Hessian matrix H_f and \rho? And what exactly is \rho here?
4. In which situations would the method from this paper work better than SGD? What would happen if we trained a GCN on a large graph with this method?
5. Does the approximation of the Hessian-vector product Hv affect the overall convergence? (A minimal sketch of such an approximation is given below.)
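
Regarding questions 1 and 2, the following is a minimal numerical sketch (my own illustration, not code from the paper): it runs plain conjugate gradient on a small quadratic, reports how far the generated search directions are from pairwise A-conjugacy, and confirms that the residual vanishes after at most n = dim(x) iterations. The matrix A, vector b, and problem size are arbitrary placeholders.

```python
# Hypothetical illustration (not the paper's code): plain conjugate gradient
# on a small quadratic f(x) = 0.5 x^T A x - b^T x.  It checks that the search
# directions are pairwise A-conjugate and that CG needs at most n = dim(x)
# iterations to drive the residual to (numerical) zero.
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # symmetric positive definite
b = rng.standard_normal(n)

x = np.zeros(n)
r = A @ x - b                          # gradient of f at x
d = -r
directions = []
for _ in range(n):                     # at most n steps in exact arithmetic
    alpha = (r @ r) / (d @ A @ d)
    x = x + alpha * d
    r_new = r + alpha * (A @ d)
    directions.append(d)
    if np.linalg.norm(r_new) < 1e-12:
        break
    beta = (r_new @ r_new) / (r @ r)
    d = -r_new + beta * d
    r = r_new

# Largest |d_i^T A d_j| over i != j: should be near machine precision.
conj = max((abs(di @ A @ dj)
            for i, di in enumerate(directions)
            for j, dj in enumerate(directions) if i != j), default=0.0)
print("iterations:", len(directions),
      "final residual:", np.linalg.norm(A @ x - b),
      "max |d_i^T A d_j|:", conj)
```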
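
Regarding question 5, here is a hedged sketch of one common Hessian-free approximation, the finite-difference Hessian-vector product Hv ≈ (∇f(x + εv) − ∇f(x)) / ε, compared against the exact Hv on a toy objective. The objective and the step size ε are assumptions made purely for illustration; the paper may instead use an exact autodiff-based product.

```python
# Hypothetical illustration (not the paper's code): finite-difference
# Hessian-vector product, Hv ~= (grad_f(x + eps*v) - grad_f(x)) / eps,
# compared with the exact Hv on a toy objective
# f(x) = 0.5*||x||^2 + sum(sin(x)), whose Hessian is I - diag(sin(x)).
import numpy as np

def grad_f(x):
    return x + np.cos(x)               # gradient of the toy objective

def hvp_finite_diff(x, v, eps=1e-5):
    return (grad_f(x + eps * v) - grad_f(x)) / eps

def hvp_exact(x, v):
    return v - np.sin(x) * v           # (I - diag(sin(x))) v

rng = np.random.default_rng(0)
x, v = rng.standard_normal(5), rng.standard_normal(5)
err = np.max(np.abs(hvp_finite_diff(x, v) - hvp_exact(x, v)))
print("Hv approximation error:", err)  # O(eps), i.e. roughly 1e-5 here
```

In a Hessian-free method this approximate Hv feeds every inner CG iteration, so its error (O(ε) truncation plus round-off in this finite-difference variant) is precisely what question 5 asks about; an exact product via a second backward pass avoids the truncation error at extra cost.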