Matrix Triangular Solve GPU op by c0g · Pull Request #5010 · tensorflow/tensorflow

@c0g

Adds a GPU version of the triangular solver op using cuBLAS trsm.

@tensorflow-jenkins

Can one of the admins verify this patch?

c0g mentioned this pull request on Oct 17, 2016.

ebrevdo

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

please don't change whitespace in lines you aren't otherwise modifying

Please undo these changes.

}

int64 GetCostPerUnit(const TensorShapes& input_matrix_shapes) const final
{

you probably need to multiply this by Eigen's AddCost / MulCost to get appropriate estimates

I messed up the formatting on the original op by changing whitespace. Everything in MatrixTriangularSolveOp is original code from TF.

I'm happy to modify the original code to include AddCost/MulCost.

{
double rows = static_cast<double>(input_matrix_shapes[0].dim_size(0));
double num_rhss = static_cast<double>(input_matrix_shapes[1].dim_size(1));
double cost = rows * rows * num_rhss;

why are you doing this in double precision?

This is the original TensorFlow op code (see above). Possibly because double can represent larger numbers than int? I doubt anyone will try a matrix multiply large enough to overflow a 64-bit int, though. Happy to change to int64.


if __name__ == "__main__":
  tf.test.main()
\ No newline at end of file

undo whitespace changes

} else {
output.noalias() = triangle.solve(rhs);
}
trans = perftools::gputools::blas::Transpose::kNoTranspose;

can you use longer variable names instead of 'trans', 'lda', 'ldb', etc?

then in ThenBlasTrsm call, you can do e.g. upper_or_lower /* uplo */, ..., etc.

easier for other users to read.

cublas_m, cublas_n, 1.0, matrix_ptr, lda, &out_ptr,
ldb)
.ok();
// LOG(INFO) << blas_launch_status;

remove commented out code

stream
->ThenBlasTrsm(perftools::gputools::blas::Side::kRight, uplo, trans,
perftools::gputools::blas::Diagonal::kNonUnit,
cublas_m, cublas_n, 1.0, matrix_ptr, lda, &out_ptr,

I think this needs to be Scalar(1) instead of 1.0 (which is a double)

@ebrevdo

Thanks for the PR! A few comments.

ebrevdo

@@ -24,15 +24,17 @@
class MatrixTriangularSolveOpTest(tf.test.TestCase):

@ebrevdo

Ah in that case let's leave it as double. You're right.

On Oct 17, 2016 9:20 AM, "c0g" notifications@github.com wrote:

@c0g commented on this pull request.

In tensorflow/core/kernels/matrix_triangular_solve_op.cc
#5010:

{
  Base::ValidateSquareSolver(context, input_matrix_shapes);
}
TensorShapes GetOutputMatrixShapes(
    const TensorShapes& input_matrix_shapes) const final
{
  return TensorShapes({ TensorShape({ input_matrix_shapes[0].dim_size(1),
      input_matrix_shapes[1].dim_size(1) }) });
}
int64 GetCostPerUnit(const TensorShapes& input_matrix_shapes) const final
{
  double rows = static_cast<double>(input_matrix_shapes[0].dim_size(0));
  double num_rhss = static_cast<double>(input_matrix_shapes[1].dim_size(1));
  double cost = rows * rows * num_rhss;

This is the original tensorflow op code (see above). Possibly because
double can represent larger numbers than int? I doubt anyone will try a
matrix multiply large enough to overflow a 64 bit int though. Happy to
change to int64.



@drpngx

Do we still need changes?

@drpngx

Jenkins, test this please.

@c0g

I have not yet made the requested changes.

On 19 Oct 2016, at 07:03, drpngx notifications@github.com wrote:

Do we still need changes?



@c0g

@ebrevdo : I reverted to the original file to fix white space then made your other requested changes.

@drpngx

I don't see the revert. Please push?

@c0g

@drpngx you're right I have no idea what happened to them. The changes should now be up. Sorry for the lag/commit spam.

@drpngx

Jenkins, test this please.

@alexggmatthews

Seems to be passing now; it only looks to be waiting on verification of the changes @ebrevdo requested.

ebrevdo

double rows = static_cast<double>(input_matrix_shapes[0].dim_size(0));
double num_rhss = static_cast<double>(input_matrix_shapes[1].dim_size(1));
double cost = rows * rows * num_rhss;
double cost = rows * rows * num_rhss *

whitespace?

Sorry! Should be gone now.

ebrevdo

small nit; otherwise LGTM.

drpngx


#if GOOGLE_CUDA
#include "tensorflow/core/platform/stream_executor.h"
#endif // GOOGLE_CUDA

Sorry, one last thing: we need an extra space before the //

done

@drpngx

Jenkins, test this please.