[Bug] [hdfs-connector] Timestamp type column can't be read

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

master branch

orc filetype
A timestamp column written with the hdfs-connector cannot be read back.

error message:

Caused by: java.lang.RuntimeException: ORC split generation failed with exception: org.apache.orc.impl.SchemaEvolution$IllegalEvolutionException: ORC does not support type conversion from file type struct<nanos:int> (1) to reader type timestamp (1)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1851)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1939)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:425)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:395)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:314)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:540)
	... 16 more
Caused by: java.util.concurrent.ExecutionException: org.apache.orc.impl.SchemaEvolution$IllegalEvolutionException: ORC does not support type conversion from file type struct<nanos:int> (1) to reader type timestamp (1)
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1845)
	... 21 more
Caused by: org.apache.orc.impl.SchemaEvolution$IllegalEvolutionException: ORC does not support type conversion from file type struct<nanos:int> (1) to reader type timestamp (1)
	at org.apache.orc.impl.SchemaEvolution.buildConversion(SchemaEvolution.java:559)
	at org.apache.orc.impl.SchemaEvolution.buildConversion(SchemaEvolution.java:528)
	at org.apache.orc.impl.SchemaEvolution.<init>(SchemaEvolution.java:123)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1669)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1533)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2700(OrcInputFormat.java:1329)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1513)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1510)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1510)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1329)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Hive table structure:

CREATE TABLE `1020_test`(
  `create_time` timestamp)
stored as orc

chunjun job JSON:

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1,
        "bytes": 0
      },
      "errorLimit": {
        "record": 0
      },
      "restore": {
        "isStream": false,
        "isRestore": false,
        "restoreColumnName": "",
        "restoreColumnIndex": 0,
        "maxRowNumForCheckpoint": 0
      },
      "log": {
        "isLogger": false,
        "level": "info",
        "path": "",
        "pattern": ""
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "${mysql_username}",
            "password": "${mysql_password}",
            "connection": [
              {
                "jdbcUrl": [
                  "jdbc:mysql://${mysql_server}:3306/test?useUnicode=true&characterEncoding=utf-8&useSSL=false"
                ],
                "table": [
                  "dict"
                ]
              }
            ],
            "column": [
              {
                "name": "create_time",
                "type": "timestamp"
              }
            ],
            "customSql": "",
            "where": "",
            "queryTimeOut": 1000,
            "requestAccumulatorInterval": 2,
            "startLocation": "0",
            "polling": false,
            "pollingInterval": 3000
          }
        },
        "writer": {
          "name": "hdfswriter",
          "parameter": {
            "path": "/user/hive/warehouse/1020_test",
            "column": [
              {
                "name": "create_time",
                "type": "timestamp"
              }
            ],
            "writeMode": "overwrite",
            "fileType": "orc",
            "encoding": "utf-8",
            "fieldDelimiter": "\u0001",
            "defaultFS": "hdfs://127.0.0.1:9000"
          }
        }
      }
    ]
  }
}

What you expected to happen

org.apache.orc.impl.SchemaEvolution$IllegalEvolutionException: ORC does not support type conversion from file type struct<nanos:int> (1) to reader type timestamp (1)

It looks like the Java timestamp type cannot be written directly as the ORC timestamp type, but after some research I did not find a more appropriate type to use.
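For discussion: the error `file type struct<nanos:int>` suggests the writer serialized the Timestamp as a struct of its fields instead of going through ORC's vectorized timestamp column. A minimal sketch of the split that ORC's `TimestampColumnVector.set()` performs (the class and helper names below are hypothetical, not chunjun code):

```java
import java.sql.Timestamp;

// Hypothetical sketch, not chunjun code. ORC's TimestampColumnVector
// stores each row as two fields: epoch milliseconds (time) and
// nanoseconds-of-second (nanos). Writing the Timestamp as a struct of
// its fields instead produces the file type struct<nanos:int>, which
// Hive's timestamp reader rejects.
public class TimestampOrcSketch {

    // Split a java.sql.Timestamp the way TimestampColumnVector.set() does:
    // time[i] = ts.getTime(), nanos[i] = ts.getNanos().
    public static long[] toOrcFields(Timestamp ts) {
        return new long[] { ts.getTime(), ts.getNanos() };
    }

    public static void main(String[] args) {
        Timestamp ts = Timestamp.valueOf("2022-10-20 12:34:56.123456789");
        long[] f = toOrcFields(ts);
        System.out.println("time(ms)=" + f[0] + ", nanos=" + f[1]);
    }
}
```

If the writer fills a `TimestampColumnVector` this way instead of emitting a struct, the ORC file's schema should carry a real `timestamp` type that Hive can read.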

How to reproduce

Run the chunjun job above, then query the Hive table.

Anything else

I'm willing to make a PR, but before I try, I'd like to discuss the approach.

Version

master

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct