possible problems with gprof clocktick resolution when analyzing the program/library time ratio using gprof
Mark Farnell
mark.farnell@gmail.com
Fri Jun 8 12:23:00 GMT 2007
More information about the Binutils mailing list
Fri Jun 8 12:23:00 GMT 2007
- Previous message (by thread): While compiling binutils-2.17, i get the error "dereferencing type-punned pointer will break strict-aliasing rules"
- Next message (by thread): possible problems with gprof clocktick resolution when analyzing the program/library time ratio using gprof
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I have a problem with the total time reported by gmon which I think may be related to the resolution of gmon.out (10ms). The case is as below: I am undertaking a project which involves implementing a parallel program with different underlying parallel libraries and in each implementation, I would like to see how much time goes to the program itself and how much time goes to the library. Then I compile both the library with -g -pg and I static link when I compile the library as well as linking the client program using -static This causes each process to generate its own gmon.out file. I have arranged the each process to chdir to its own directory at the beginning of the run so that at the end, they will be able to write their own gmon.out file without overwriting each others' file. I made a flat file output by: gprof -b -p <program> gmon.out Then I ran nm on all object files of the client program to extract the list of client functions, which I use to identify the client programs' function and I add up the self-time of these functions as the program's time. I also add up the self-time of the remaining functions as other_time. However the mystery is that the total self-time of all functions are 10 times the actual time my client program took!. For example, when I run with 32 nodes, it took 8 seconds. However gprof says that 30 seconds were spent by the program and 40s spent on elsewhere! Then when I look at the output of gmon flat profile, I saw some function being called in the order of 100k times within 8s, that each call only take very very little time. So do you think it is the resolution problem which makes gprof overestimate the call duration of each call, thus overestimate time taken? OR can you think of any other explanations? Will the ratio calculated be reliable? How can I fix this problem? Alternatively, do you think there are some problems in multiplication of time therefore the reported time is always 10x the actual time? Here is the output of the gmon flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 16.98 14.83 14.83 102672 0.00 0.00 prims_in_box2 14.34 27.35 12.52 select 13.44 39.08 11.73 _IO_vfscanf 8.40 46.41 7.33 __strtod_internal 5.26 51.00 4.59 mprotect 4.87 55.25 4.25 sendmsg 3.79 58.55 3.31 257767 0.00 0.00 VecMatMult 2.74 60.94 2.39 _int_malloc 2.05 62.73 1.79 165 0.01 0.02 Vdc_arrange_view_diff 1.53 64.07 1.34 132414 0.00 0.00 PolyPeIntersect 1.43 65.32 1.25 str_to_mpn 1.31 66.46 1.14 recv 1.13 67.45 0.99 16122 0.00 0.00 Vdc_diff_apply 1.07 68.38 0.93 92396 0.00 0.00 TriElementBoundBox 0.93 69.19 0.81 51336 0.00 0.00 subdiv_bintree 0.92 69.99 0.80 27677 0.00 0.00 push_down_grid 0.73 70.63 0.64 __mpn_lshift 0.70 71.24 0.62 strlen 0.69 71.84 0.60 7404 0.00 0.00 TriTransform 0.68 72.44 0.60 sigprocmask 0.66 73.02 0.58 __strtol_internal 0.65 73.59 0.57 51750 0.00 0.00 bintree_lookup 0.62 74.13 0.54 round_and_return 0.61 74.66 0.53 _IO_sputbackc 0.52 75.11 0.45 read 0.48 75.53 0.42 22018 0.00 0.00 sigio_handler 0.48 75.95 0.42 7404 0.00 0.00 TriDataNormalize 0.47 76.36 0.41 121097 0.00 0.00 step_grid 0.46 76.76 0.40 __profile_frequency 0.44 77.14 0.38 __mpn_mul 0.42 77.51 0.37 1 0.37 1.10 CloseFrameBuffer 0.42 77.88 0.37 204671 0.00 0.00 ObjectMalloc 0.41 78.24 0.36 __mpn_construct_double 0.41 78.59 0.36 7404 0.00 0.00 TriRead 0.40 78.94 0.35 123750 0.00 0.00 lookup_emptycells 0.39 79.28 0.34 108351 0.00 0.00 VecNorm 0.37 79.60 0.32 sqrt 0.33 79.89 0.29 256 0.00 0.00 RunLengthEncode 0.33 80.18 0.29 gettimeofday 0.32 80.46 0.28 hasmntopt 0.32 80.74 0.28 __mpn_mul_1 0.30 81.00 0.26 8147 0.00 0.00 init_ray 0.27 81.24 0.24 vfprintf 0.26 81.47 0.23 54052 0.00 0.00 NormalizeBoundBox 0.25 81.69 0.22 __mpn_divrem 0.25 81.91 0.22 send 0.25 82.13 0.22 7404 0.00 0.00 TriBoundBox 0.25 82.34 0.22 malloc 0.23 82.54 0.20 54941 0.00 0.00 lookup_hashtable 0.23 82.74 0.20 1 0.20 6.65 ReadGeoFile 0.22 82.94 0.20 memcpy 0.19 83.11 0.17 __errno_location 0.19 83.27 0.17 40485 0.00 0.00 next_nonempty_voxel 0.18 83.43 0.16 250449 0.00 0.00 LocalMalloc 0.16 83.57 0.14 16303 0.00 0.00 IntersectHuniformPrimlist 0.15 83.70 0.13 414 0.00 0.04 create_bintree 0.15 83.83 0.13 1 0.13 0.13 OpenFrameBuffer 0.14 83.95 0.12 flockfile 0.14 84.07 0.12 33953 0.00 0.00 next_nonempty_leaf 0.13 84.18 0.11 414 0.00 0.00 deleteBinTree 0.13 84.29 0.11 fscanf 0.13 84.40 0.11 isnan 0.13 84.51 0.11 sYSMALLOc 0.11 84.61 0.10 7135 0.00 0.00 TriPeIntersect 0.11 84.71 0.10 375 0.00 0.00 Vdc_create_diff 0.11 84.81 0.10 1 0.10 17.17 create_grid 0.10 84.90 0.09 1 0.09 0.09 Vdc_page_init 0.09 84.98 0.08 31286 0.00 0.00 insert_in_hashtable 0.09 85.06 0.08 5502 0.00 0.00 Vdc_acquire_view 0.09 85.14 0.08 1 0.08 2.75 NormalizeGeo 0.08 85.21 0.07 1 0.07 0.07 MakeElementArray 0.08 85.28 0.07 strtof 0.08 85.35 0.07 5233 0.00 0.00 AddPixelColor 0.07 85.42 0.07 102762 0.00 0.00 next_voxel 0.07 85.48 0.06 4098 0.00 0.00 shm_resize 0.07 85.54 0.06 1457 0.00 0.00 Shade 0.07 85.60 0.06 _IO_new_file_xsputn 0.07 85.66 0.06 vfscanf 0.06 85.71 0.05 1024 0.00 0.00 PutJob 0.06 85.76 0.05 1 0.05 7.08 RayTrace 0.06 85.81 0.05 __libc_csu_fini 0.06 85.86 0.05 __mpn_rshift 0.06 85.91 0.05 vfork 0.06 85.96 0.05 write 0.05 86.00 0.05 5233 0.00 0.00 PushRayTreeStack 0.05 86.04 0.04 _IO_default_seekoff 0.05 86.08 0.04 __ieee754_fmod 0.05 86.12 0.04 __mpn_construct_float 0.05 86.16 0.04 brk 0.05 86.20 0.04 buffered_vfprintf 0.04 86.24 0.04 5502 0.00 0.00 Vdc_release_view 0.04 86.27 0.04 funlockfile 0.04 86.31 0.04 setitimer 0.04 86.34 0.04 31286 0.00 0.00 mark_nonempty 0.03 86.37 0.03 5502 0.00 0.00 Vdc_create_view 0.03 86.40 0.03 asprintf 0.03 86.43 0.03 mmap 0.03 86.46 0.03 socket 0.03 86.49 0.03 3776 0.00 0.00 ConvertPrimRayJobToRayMsg 0.03 86.52 0.03 2914 0.00 0.00 HuniformShadowIntersect 0.03 86.55 0.03 352 0.00 0.00 Vdc_merge_diff 0.03 86.58 0.03 fmod 0.03 86.61 0.03 7422 0.00 0.00 Vdc_page_req_handler 0.03 86.63 0.03 _IO_default_xsputn 0.02 86.65 0.02 9615 0.00 0.00 DEBUG 0.02 86.67 0.02 9615 0.00 0.00 DebugIsEnabled 0.02 86.69 0.02 5233 0.00 0.00 TraverseHierarchyUniform 0.02 86.71 0.02 377 0.00 0.00 diff_initial 0.02 86.73 0.02 1 0.02 0.17 InitWorkPool 0.02 86.75 0.02 LocalFree 0.02 86.77 0.02 Rotate 0.02 86.79 0.02 Vdc_simple_ack 0.02 86.81 0.02 __find_specmb 0.02 86.83 0.02 pipe 0.02 86.85 0.02 read_int 0.02 86.87 0.02 strcmp 0.02 86.89 0.02 mremap_chunk 0.02 86.90 0.02 perror 0.02 86.92 0.02 strerror_r 0.02 86.93 0.02 375 0.00 0.00 Vdc_merge_dirty_page 0.02 86.95 0.02 31 0.00 0.00 Vdc_accept 0.02 86.96 0.02 pow 0.01 86.97 0.01 35824 0.00 0.00 ma_rayinfo 0.01 86.98 0.01 20464 0.00 0.00 mark_empty 0.01 86.99 0.01 18335 0.00 0.00 free_rayinfo 0.01 87.00 0.01 17756 0.00 0.00 Vdc_buff_alloc 0.01 87.01 0.01 10466 0.00 0.00 CopyRayMsg 0.01 87.02 0.01 7552 0.00 0.00 frand 0.01 87.03 0.01 4098 0.00 0.00 Vdc_malloc 0.01 87.04 0.01 4098 0.00 0.00 shm_malloc 0.01 87.05 0.01 1457 0.00 0.00 SpecularDirection 0.01 87.06 0.01 450 0.00 0.00 PolyElementBoundBox 0.01 87.07 0.01 375 0.00 0.00 Vdc_make_twin 0.01 87.08 0.01 238 0.00 0.00 GetJob 0.01 87.09 0.01 237 0.00 0.00 GetJobs 0.01 87.10 0.01 225 0.00 0.00 PolyDataNormalize 0.01 87.11 0.01 225 0.00 0.00 PolyTransform 0.01 87.12 0.01 143 0.00 0.00 Vdc_gather_view_diff 0.01 87.13 0.01 62 0.00 0.00 Vdc_barrier_req_handler 0.01 87.14 0.01 2 0.01 0.16 Vdc_barrier 0.01 87.15 0.01 1 0.01 0.01 DebugInit 0.01 87.16 0.01 _IO_file_close 0.01 87.17 0.01 _IO_file_read 0.01 87.18 0.01 __ieee754_pow 0.01 87.19 0.01 __new_exitfn 0.01 87.20 0.01 __res_vinit 0.01 87.21 0.01 __uflow 0.01 87.22 0.01 _dl_start 0.01 87.23 0.01 mempcpy 0.01 87.24 0.01 munmap 0.01 87.25 0.01 open 0.01 87.26 0.01 recvfrom 0.01 87.27 0.01 1 0.01 0.01 InitRayTreeStack 0.01 87.27 0.01 1 0.01 32.82 main 0.01 87.28 0.01 TriIntersect 0.01 87.28 0.01 Vdc_view_req_dup_handler 0.01 87.29 0.01 _IO_default_uflow 0.01 87.29 0.01 finite 0.01 87.30 0.01 sigaction 0.01 87.30 0.01 strcasecmp 0.00 87.30 0.00 18335 0.00 0.00 pop_up_a_grid 0.00 87.30 0.00 16294 0.00 0.00 reset_rayinfo 0.00 87.30 0.00 9009 0.00 0.00 PopRayTreeStack 0.00 87.30 0.00 4012 0.00 0.00 GetRayJobFromBundle 0.00 87.30 0.00 3776 0.00 0.00 TransformViewRay 0.00 87.30 0.00 1457 0.00 0.00 PolyNormal 0.00 87.30 0.00 567 0.00 0.00 segv_handler 0.00 87.30 0.00 414 0.00 0.00 init_bintree 0.00 87.30 0.00 375 0.00 0.00 Vdc_free_twin 0.00 87.30 0.00 375 0.00 0.00 get_view_ptr 0.00 87.30 0.00 374 0.00 0.00 Vdc_insert_diff 0.00 87.30 0.00 225 0.00 0.00 PolyBoundBox 0.00 87.30 0.00 225 0.00 0.00 PolyRead 0.00 87.30 0.00 204 0.00 0.00 Vdc_view_req_handler 0.00 87.30 0.00 177 0.00 0.00 alloc_view_buffer 0.00 87.30 0.00 73 0.00 0.00 buff_expand 0.00 87.30 0.00 73 0.00 0.00 buff_initialize 0.00 87.30 0.00 62 0.00 0.00 get_socket_fd 0.00 87.30 0.00 34 0.00 0.00 Vdc_err 0.00 87.30 0.00 32 0.00 0.01 Vdc_acquire_Rview 0.00 87.30 0.00 32 0.00 0.00 Vdc_release_Rview 0.00 87.30 0.00 32 0.00 0.00 add_host 0.00 87.30 0.00 31 0.00 0.00 Vdc_accept_initialize 0.00 87.30 0.00 31 0.00 0.00 Vdc_connect 0.00 87.30 0.00 31 0.00 0.00 get_process_commandline 0.00 87.30 0.00 31 0.00 0.00 start_process 0.00 87.30 0.00 31 0.00 0.00 wait_until_remote_started 0.00 87.30 0.00 19 0.00 0.00 LookupCommand 0.00 87.30 0.00 6 0.00 0.00 timer_out 0.00 87.30 0.00 5 0.00 0.00 MatrixIdentity 0.00 87.30 0.00 5 0.00 0.00 MatrixMult 0.00 87.30 0.00 5 0.00 0.00 VerifyColorRange 0.00 87.30 0.00 4 0.00 0.00 MatrixCopy 0.00 87.30 0.00 2 0.00 0.00 MatrixInverse 0.00 87.30 0.00 2 0.00 0.00 Translate 0.00 87.30 0.00 1 0.00 17.24 BuildHierarchy_Uniform 0.00 87.30 0.00 1 0.00 0.00 CreateViewMatrix 0.00 87.30 0.00 1 0.00 0.00 Huniform_defaults 0.00 87.30 0.00 1 0.00 0.00 InitDisplay 0.00 87.30 0.00 1 0.00 0.00 InitEnv 0.00 87.30 0.00 1 0.00 0.00 MatrixTranspose 0.00 87.30 0.00 1 0.00 0.00 PrintStatistics 0.00 87.30 0.00 1 0.00 0.00 ReadEnvFile 0.00 87.30 0.00 1 0.00 0.00 Scale 0.00 87.30 0.00 1 0.00 0.00 Vdc_buff_init 0.00 87.30 0.00 1 0.00 0.00 Vdc_distribute 0.00 87.30 0.00 1 0.00 0.00 Vdc_exit 0.00 87.30 0.00 1 0.00 0.00 Vdc_free 0.00 87.30 0.00 1 0.00 0.00 Vdc_segv_init 0.00 87.30 0.00 1 0.00 0.00 Vdc_sigio_init 0.00 87.30 0.00 1 0.00 0.11 Vdc_startup 0.00 87.30 0.00 1 0.00 0.00 Vdc_timer_init 0.00 87.30 0.00 1 0.00 0.00 Vdc_view_init 0.00 87.30 0.00 1 0.00 0.00 ViewRotate 0.00 87.30 0.00 1 0.00 0.00 analyze_args 0.00 87.30 0.00 1 0.00 0.00 display_stat 0.00 87.30 0.00 1 0.00 0.00 exit_handler 0.00 87.30 0.00 1 0.00 0.00 init_mask_arrays 0.00 87.30 0.00 1 0.00 0.00 init_masks 0.00 87.30 0.00 1 0.00 0.09 init_memory 0.00 87.30 0.00 1 0.00 0.02 init_network 0.00 87.30 0.00 1 0.00 0.00 init_world_grid 0.00 87.30 0.00 1 0.00 0.00 init_world_voxel 0.00 87.30 0.00 1 0.00 0.00 ma_print 0.00 87.30 0.00 1 0.00 0.00 powint 0.00 87.30 0.00 1 0.00 0.00 read_infofile 0.00 87.30 0.00 1 0.00 0.00 shm_free 0.00 87.30 0.00 1 0.00 0.00 table_init 0.00 87.30 0.00 1 0.00 0.00 test_localhost
- Previous message (by thread): While compiling binutils-2.17, i get the error "dereferencing type-punned pointer will break strict-aliasing rules"
- Next message (by thread): possible problems with gprof clocktick resolution when analyzing the program/library time ratio using gprof
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Binutils mailing list