Issue33941
Created on 2018-06-22 13:03 by Raghunath Lingutla, last changed 2022-04-11 14:59 by admin.
| Messages (3) | |||
|---|---|---|---|
| msg320233 - (view) | Author: Raghunath Lingutla (Raghunath Lingutla) | Date: 2018-06-22 13:03 | |
strptime does not reject invalid date values for %Y%m%d, %y%m%d, %Y%m%d %H:%M and a few more formats. In Java we have the setLenient option, which lets us validate input against the pattern and convert only strings that match it.
Example: datetime.strptime('181223', '%Y%m%d')
For the above input I get 1812-02-03 00:00:00, but the expected result is an error: ValueError: time data '181223' does not match format '%Y%m%d'
I tested the 4 modules listed below. All of them give the same output:
1) datetime.strptime
2) timestring.Date
3) parser.parse from dateutil
4) dateparser.parse
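The reported behavior can be reproduced with the standard library alone. The snippet below is a minimal sketch; the variable name `parsed` is illustrative and not part of the report.

```python
from datetime import datetime

# '%Y' consumes the four digits '1812'; '%m' and '%d' then accept the
# single digits '2' and '3' instead of rejecting the string.
parsed = datetime.strptime('181223', '%Y%m%d')
print(parsed)  # 1812-02-03 00:00:00
```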
|
|||
| msg320245 - (view) | Author: Chris Wilcox (crwilcox) * | Date: 2018-06-22 17:15 | |
As %m and %d denote the zero-padded forms of month and day, it seems to me this shouldn't match. Executing a small C program `char* ret = strptime("181223", "%Y%m%d", &tm);` confirms that this input is considered invalid in C. The datetime docs indicate that the behavior should match C89, so I would expect Python to raise ValueError here as well. https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
|
|||
| msg320291 - (view) | Author: Chris Wilcox (crwilcox) * | Date: 2018-06-22 23:22 | |
I looked a bit at _strptime.py and the corresponding tests and thought I would share my notes.
The regular expressions clearly allow non-zero-padded values for both %d and %m matches. There is one test that runs the following: time.strptime("Mar 1", "%b %d"). So it seems intentional that %d and %m allow non-zero-padded values.
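A short sketch of that lenient matching, assuming the default C/English locale for `%b`. The second input string is a made-up example, not one taken from the test suite.

```python
import time
from datetime import datetime

# The case exercised by the existing test: a one-digit day after '%b'.
st = time.strptime("Mar 1", "%b %d")
print(st.tm_mon, st.tm_mday)  # 3 1

# Hypothetical input: one-digit month and day are accepted for %m and %d too.
d = datetime.strptime("2018-6-2", "%Y-%m-%d")
print(d.date())  # 2018-06-02
```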
It also just occurred to me that the example '181223' isn't ambiguous as %Y requires 4 digits and months cannot be more than 12. So it seems to me this could only be Y=1812,M=2,D=3.
There do exist cases that are truly ambiguous for non-zero-padded values. For instance, 2018111 could potentially be 2018-Nov-1 or 2018-Jan-11. Python deterministically consumes as many digits as possible for each field in turn, so this parses as November 1, 2018, though I can't see any reason a caller could safely assume that.
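The ambiguous case can be checked directly. This is a minimal sketch with illustrative variable names; it shows that the month field takes two digits when it can, and that zero padding is the only way to express the two readings unambiguously.

```python
from datetime import datetime

# '%m' matches '11' first, leaving a single '1' for '%d'.
ambiguous = datetime.strptime('2018111', '%Y%m%d')
print(ambiguous.date())  # 2018-11-01

# With zero padding, each reading has its own spelling.
nov_1 = datetime.strptime('20181101', '%Y%m%d')
jan_11 = datetime.strptime('20180111', '%Y%m%d')
print(nov_1.date(), jan_11.date())  # 2018-11-01 2018-01-11
```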
The edits required to stop allowing non-zero-padded values were pretty straightforward, and only one unit test (one that verifies 'Mar 1' comes after 'Feb 29') had to be altered. That may say more about a need for additional tests than about whether anyone relies on single-digit day or month values.
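Until the library behavior changes, callers who want strict matching could approximate it themselves. The following is only a sketch of a user-side workaround, not the _strptime.py edit described above: `strptime_strict` is a hypothetical helper that parses the string and then checks that formatting the result reproduces the input. It assumes a format made of zero-padded numeric directives; names such as %b (letter case) or years below 1000 with %Y may not round-trip on every platform.

```python
from datetime import datetime


def strptime_strict(text, fmt):
    """Parse like datetime.strptime, but reject input that does not round-trip.

    Hypothetical helper: assumes every directive in fmt formats back to
    exactly the text it parsed (true for zero-padded numeric fields).
    """
    parsed = datetime.strptime(text, fmt)
    if parsed.strftime(fmt) != text:
        raise ValueError(
            f"time data {text!r} does not strictly match format {fmt!r}"
        )
    return parsed


print(strptime_strict('20181223', '%Y%m%d'))  # 2018-12-23 00:00:00

try:
    strptime_strict('181223', '%Y%m%d')
except ValueError as exc:
    print(exc)
```

With this check, '181223' is rejected for '%Y%m%d' because the parsed value formats back as '18120203', while the same string is still accepted for '%y%m%d'.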
|
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:02 | admin | set | github: 78122 |
| 2018-07-05 15:00:50 | p-ganssle | set | nosy: + p-ganssle |
| 2018-06-22 23:22:55 | crwilcox | set | messages: + msg320291 |
| 2018-06-22 20:50:35 | ned.deily | set | nosy: + belopolsky |
| 2018-06-22 17:15:18 | crwilcox | set | versions: + Python 2.7; nosy: + crwilcox; messages: + msg320245; components: + Library (Lib), - Tests |
| 2018-06-22 13:03:38 | Raghunath Lingutla | create | |