possible problems with gprof clocktick resolution when analyzing the program/library time ratio using gprof

Mark Farnell mark.farnell@gmail.com
Fri Jun 8 12:23:00 GMT 2007
I have a problem with the total time reported by gmon which I think
may be related to the resolution of gmon.out (10ms).  The case is as
below:

I am undertaking a project which involves implementing a parallel
program with different underlying parallel libraries and in each
implementation, I would like to see how much time goes to the program
itself and how much time goes to the library.

Then I compile both the library with
-g -pg
and I static link when I compile the library as well as linking the
client program using -static

This causes each process to generate its own gmon.out file.  I have
arranged the each process to chdir to its own directory at the
beginning of the run so that at the end, they will be able to write
their own gmon.out file without overwriting each others' file.

I made a flat file output by:

gprof -b -p <program> gmon.out

Then I ran nm on all object files of the client program to extract the
list of client functions, which I use to identify the client programs'
function and I add up the self-time of these functions as the
program's time.  I also add up the self-time of the remaining
functions as other_time.

However the mystery is that the total self-time of all functions are
10 times the actual time my client program took!.  For example, when I
run with 32 nodes, it took 8 seconds.  However gprof says that 30
seconds were spent by the program and 40s spent on elsewhere!

Then when I look at the output of gmon flat profile, I saw some
function being called in the order of 100k times within 8s, that each
call only take very very little time.  So do you think it is the
resolution problem which makes gprof overestimate the call duration of
each call, thus overestimate time taken?  OR can you think of any
other explanations?

Will the ratio calculated be reliable?  How can I fix this problem?

Alternatively, do you think there are some problems in multiplication
of time therefore the reported time is always 10x the actual time?

Here is the output of the gmon flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 16.98     14.83    14.83   102672     0.00     0.00  prims_in_box2
 14.34     27.35    12.52                             select
 13.44     39.08    11.73                             _IO_vfscanf
  8.40     46.41      7.33                             __strtod_internal
  5.26     51.00     4.59                             mprotect
  4.87     55.25     4.25                             sendmsg
  3.79     58.55     3.31   257767     0.00     0.00  VecMatMult
  2.74     60.94     2.39                             _int_malloc
  2.05     62.73     1.79      165     0.01     0.02  Vdc_arrange_view_diff
  1.53     64.07     1.34   132414     0.00     0.00  PolyPeIntersect
  1.43     65.32     1.25                             str_to_mpn
  1.31     66.46     1.14                             recv
  1.13     67.45     0.99    16122     0.00     0.00  Vdc_diff_apply
  1.07     68.38     0.93    92396     0.00     0.00  TriElementBoundBox
  0.93     69.19     0.81    51336     0.00     0.00  subdiv_bintree
  0.92     69.99     0.80    27677     0.00     0.00  push_down_grid
  0.73     70.63     0.64                             __mpn_lshift
  0.70     71.24     0.62                             strlen
  0.69     71.84     0.60     7404     0.00     0.00  TriTransform
  0.68     72.44     0.60                             sigprocmask
  0.66     73.02     0.58                             __strtol_internal
  0.65     73.59     0.57    51750     0.00     0.00  bintree_lookup
  0.62     74.13     0.54                             round_and_return
  0.61     74.66     0.53                             _IO_sputbackc
  0.52     75.11     0.45                             read
  0.48     75.53     0.42    22018     0.00     0.00  sigio_handler
  0.48     75.95      0.42     7404     0.00     0.00  TriDataNormalize
  0.47     76.36     0.41   121097     0.00     0.00  step_grid
  0.46     76.76     0.40                             __profile_frequency
  0.44     77.14     0.38                             __mpn_mul
  0.42     77.51     0.37        1     0.37     1.10  CloseFrameBuffer
  0.42     77.88     0.37   204671     0.00     0.00  ObjectMalloc
  0.41     78.24     0.36                             __mpn_construct_double
  0.41     78.59     0.36     7404     0.00     0.00  TriRead
  0.40     78.94     0.35   123750     0.00     0.00  lookup_emptycells
  0.39     79.28     0.34   108351     0.00     0.00  VecNorm
  0.37     79.60      0.32                             sqrt
  0.33     79.89     0.29      256     0.00     0.00  RunLengthEncode
  0.33     80.18     0.29                             gettimeofday
  0.32     80.46     0.28                             hasmntopt
  0.32     80.74     0.28                             __mpn_mul_1
  0.30     81.00     0.26     8147     0.00     0.00  init_ray
  0.27     81.24     0.24                             vfprintf
  0.26     81.47      0.23    54052     0.00     0.00  NormalizeBoundBox
  0.25     81.69     0.22                             __mpn_divrem
  0.25     81.91     0.22                             send
  0.25     82.13     0.22     7404     0.00     0.00  TriBoundBox
  0.25     82.34     0.22                             malloc
  0.23     82.54     0.20    54941     0.00     0.00  lookup_hashtable
  0.23     82.74     0.20        1     0.20     6.65  ReadGeoFile
  0.22     82.94     0.20                             memcpy
  0.19     83.11     0.17                             __errno_location
  0.19     83.27     0.17    40485     0.00     0.00  next_nonempty_voxel
  0.18     83.43     0.16   250449     0.00     0.00  LocalMalloc
  0.16     83.57     0.14    16303     0.00     0.00  IntersectHuniformPrimlist
  0.15     83.70     0.13      414     0.00     0.04  create_bintree
  0.15     83.83     0.13        1     0.13     0.13  OpenFrameBuffer
  0.14     83.95     0.12                             flockfile
  0.14     84.07     0.12    33953     0.00     0.00  next_nonempty_leaf
  0.13      84.18     0.11      414     0.00     0.00  deleteBinTree
  0.13     84.29     0.11                             fscanf
  0.13     84.40     0.11                             isnan
  0.13     84.51     0.11                             sYSMALLOc
  0.11     84.61     0.10     7135     0.00     0.00  TriPeIntersect
  0.11     84.71     0.10      375     0.00     0.00  Vdc_create_diff
  0.11     84.81     0.10        1     0.10    17.17  create_grid
  0.10      84.90     0.09        1     0.09     0.09  Vdc_page_init
  0.09     84.98     0.08    31286     0.00     0.00  insert_in_hashtable
  0.09     85.06     0.08     5502     0.00     0.00  Vdc_acquire_view
  0.09      85.14     0.08        1     0.08     2.75  NormalizeGeo
  0.08     85.21     0.07        1     0.07     0.07  MakeElementArray
  0.08     85.28     0.07                             strtof
  0.08     85.35     0.07     5233     0.00     0.00  AddPixelColor
  0.07     85.42     0.07   102762     0.00     0.00  next_voxel
  0.07     85.48     0.06     4098     0.00     0.00  shm_resize
  0.07     85.54     0.06     1457     0.00     0.00  Shade
  0.07     85.60     0.06                             _IO_new_file_xsputn
  0.07     85.66     0.06                             vfscanf
  0.06     85.71     0.05     1024     0.00     0.00  PutJob
  0.06     85.76     0.05        1     0.05     7.08  RayTrace
  0.06     85.81     0.05                             __libc_csu_fini
  0.06     85.86     0.05                             __mpn_rshift
  0.06     85.91     0.05                             vfork
  0.06     85.96     0.05                             write
  0.05     86.00     0.05     5233     0.00     0.00  PushRayTreeStack
  0.05     86.04     0.04                             _IO_default_seekoff
  0.05     86.08     0.04                             __ieee754_fmod
  0.05     86.12     0.04                             __mpn_construct_float
  0.05     86.16     0.04                             brk
  0.05      86.20     0.04                             buffered_vfprintf
  0.04     86.24     0.04     5502     0.00     0.00  Vdc_release_view
  0.04     86.27     0.04                             funlockfile
  0.04     86.31     0.04                             setitimer
  0.04     86.34     0.04    31286     0.00     0.00  mark_nonempty
  0.03     86.37     0.03     5502     0.00     0.00  Vdc_create_view
  0.03     86.40     0.03                             asprintf
  0.03     86.43     0.03                             mmap
  0.03     86.46     0.03                             socket
  0.03     86.49     0.03     3776     0.00     0.00   ConvertPrimRayJobToRayMsg
  0.03     86.52     0.03     2914     0.00     0.00  HuniformShadowIntersect
  0.03     86.55     0.03      352     0.00     0.00  Vdc_merge_diff
  0.03     86.58     0.03                             fmod
  0.03     86.61     0.03     7422     0.00     0.00  Vdc_page_req_handler
  0.03     86.63     0.03                             _IO_default_xsputn
  0.02     86.65     0.02     9615     0.00     0.00  DEBUG
  0.02     86.67     0.02     9615     0.00     0.00  DebugIsEnabled
  0.02     86.69     0.02     5233     0.00     0.00  TraverseHierarchyUniform
  0.02     86.71     0.02      377     0.00     0.00  diff_initial
  0.02     86.73     0.02        1     0.02     0.17  InitWorkPool
  0.02     86.75     0.02                             LocalFree
  0.02     86.77     0.02                             Rotate
  0.02     86.79     0.02                             Vdc_simple_ack
  0.02     86.81     0.02                             __find_specmb
  0.02     86.83     0.02                             pipe
  0.02     86.85     0.02                             read_int
  0.02     86.87     0.02                             strcmp
  0.02     86.89     0.02                             mremap_chunk
  0.02     86.90     0.02                             perror
  0.02     86.92     0.02                             strerror_r
  0.02     86.93     0.02      375     0.00     0.00  Vdc_merge_dirty_page
  0.02     86.95     0.02       31     0.00     0.00  Vdc_accept
  0.02     86.96     0.02                             pow
  0.01     86.97     0.01    35824     0.00     0.00  ma_rayinfo
  0.01     86.98     0.01    20464     0.00     0.00  mark_empty
  0.01     86.99     0.01    18335     0.00     0.00  free_rayinfo
  0.01     87.00     0.01    17756     0.00     0.00  Vdc_buff_alloc
  0.01     87.01     0.01    10466     0.00     0.00  CopyRayMsg
  0.01     87.02     0.01     7552     0.00     0.00  frand
  0.01     87.03     0.01     4098     0.00     0.00  Vdc_malloc
  0.01     87.04     0.01     4098     0.00     0.00  shm_malloc
  0.01     87.05     0.01     1457     0.00     0.00  SpecularDirection
  0.01     87.06     0.01      450     0.00     0.00   PolyElementBoundBox
  0.01     87.07     0.01      375     0.00     0.00  Vdc_make_twin
  0.01     87.08     0.01      238     0.00     0.00  GetJob
  0.01     87.09     0.01      237     0.00     0.00  GetJobs
  0.01     87.10     0.01      225     0.00     0.00  PolyDataNormalize
  0.01     87.11     0.01      225     0.00     0.00  PolyTransform
  0.01     87.12     0.01      143     0.00     0.00  Vdc_gather_view_diff
  0.01     87.13     0.01       62     0.00     0.00  Vdc_barrier_req_handler
  0.01     87.14     0.01        2     0.01     0.16  Vdc_barrier
  0.01     87.15     0.01        1     0.01     0.01  DebugInit
  0.01     87.16     0.01                             _IO_file_close
  0.01     87.17     0.01                             _IO_file_read
  0.01     87.18     0.01                             __ieee754_pow
  0.01     87.19     0.01                             __new_exitfn
  0.01     87.20     0.01                             __res_vinit
  0.01     87.21     0.01                             __uflow
  0.01     87.22     0.01                             _dl_start
  0.01     87.23     0.01                             mempcpy
  0.01     87.24     0.01                             munmap
  0.01     87.25     0.01                             open
  0.01     87.26     0.01                             recvfrom
  0.01     87.27     0.01        1     0.01     0.01  InitRayTreeStack
  0.01     87.27     0.01        1     0.01    32.82  main
  0.01     87.28     0.01                             TriIntersect
  0.01     87.28     0.01                             Vdc_view_req_dup_handler
  0.01     87.29     0.01                             _IO_default_uflow
  0.01     87.29     0.01                             finite
  0.01     87.30     0.01                             sigaction
  0.01     87.30     0.01                             strcasecmp
  0.00     87.30     0.00    18335     0.00     0.00  pop_up_a_grid
  0.00     87.30     0.00    16294     0.00     0.00  reset_rayinfo
  0.00     87.30     0.00     9009     0.00     0.00  PopRayTreeStack
  0.00     87.30     0.00     4012     0.00     0.00  GetRayJobFromBundle
  0.00     87.30     0.00     3776     0.00     0.00  TransformViewRay
  0.00     87.30     0.00     1457     0.00     0.00  PolyNormal
  0.00     87.30     0.00      567     0.00     0.00  segv_handler
  0.00     87.30     0.00      414     0.00     0.00  init_bintree
  0.00     87.30     0.00      375     0.00     0.00  Vdc_free_twin
  0.00     87.30     0.00      375     0.00     0.00  get_view_ptr
  0.00     87.30     0.00      374     0.00     0.00  Vdc_insert_diff
  0.00     87.30     0.00      225     0.00     0.00  PolyBoundBox
  0.00     87.30     0.00      225     0.00     0.00  PolyRead
  0.00     87.30     0.00      204     0.00     0.00  Vdc_view_req_handler
  0.00      87.30     0.00      177     0.00     0.00  alloc_view_buffer
  0.00     87.30     0.00       73     0.00     0.00  buff_expand
  0.00     87.30     0.00       73     0.00     0.00  buff_initialize
  0.00     87.30     0.00       62     0.00     0.00  get_socket_fd
  0.00     87.30     0.00       34     0.00     0.00  Vdc_err
  0.00     87.30     0.00       32     0.00     0.01  Vdc_acquire_Rview
  0.00     87.30     0.00       32     0.00     0.00  Vdc_release_Rview
  0.00     87.30     0.00       32     0.00     0.00  add_host
  0.00     87.30     0.00       31     0.00     0.00  Vdc_accept_initialize
  0.00     87.30     0.00        31     0.00     0.00  Vdc_connect
  0.00     87.30     0.00       31     0.00     0.00  get_process_commandline
  0.00     87.30     0.00       31     0.00     0.00  start_process
  0.00     87.30     0.00        31     0.00     0.00  wait_until_remote_started
  0.00     87.30     0.00       19     0.00     0.00  LookupCommand
  0.00     87.30     0.00        6     0.00     0.00  timer_out
  0.00     87.30     0.00         5     0.00     0.00  MatrixIdentity
  0.00     87.30     0.00        5     0.00     0.00  MatrixMult
  0.00     87.30     0.00        5     0.00     0.00  VerifyColorRange
  0.00     87.30     0.00        4     0.00     0.00  MatrixCopy
  0.00     87.30     0.00        2     0.00     0.00  MatrixInverse
  0.00     87.30     0.00        2     0.00     0.00  Translate
  0.00     87.30     0.00        1     0.00    17.24  BuildHierarchy_Uniform
  0.00     87.30     0.00        1     0.00     0.00  CreateViewMatrix
  0.00     87.30     0.00        1     0.00     0.00  Huniform_defaults
  0.00     87.30     0.00        1     0.00     0.00  InitDisplay
  0.00     87.30     0.00        1     0.00     0.00  InitEnv
  0.00     87.30     0.00        1     0.00     0.00  MatrixTranspose
  0.00     87.30     0.00        1     0.00     0.00  PrintStatistics
  0.00     87.30     0.00        1     0.00     0.00  ReadEnvFile
  0.00     87.30     0.00        1     0.00     0.00  Scale
  0.00     87.30     0.00        1     0.00     0.00  Vdc_buff_init
  0.00     87.30     0.00        1     0.00     0.00  Vdc_distribute
  0.00     87.30     0.00        1     0.00     0.00  Vdc_exit
  0.00     87.30     0.00        1     0.00     0.00  Vdc_free
  0.00     87.30     0.00        1     0.00     0.00  Vdc_segv_init
  0.00     87.30     0.00        1     0.00     0.00  Vdc_sigio_init
  0.00     87.30     0.00        1     0.00     0.11  Vdc_startup
  0.00     87.30     0.00        1     0.00     0.00  Vdc_timer_init
  0.00      87.30     0.00        1     0.00     0.00  Vdc_view_init
  0.00     87.30     0.00        1     0.00     0.00  ViewRotate
  0.00     87.30     0.00        1     0.00     0.00  analyze_args
  0.00     87.30     0.00        1     0.00     0.00  display_stat
  0.00     87.30     0.00        1     0.00     0.00  exit_handler
  0.00     87.30     0.00        1     0.00     0.00  init_mask_arrays
  0.00     87.30     0.00        1     0.00     0.00  init_masks
  0.00     87.30     0.00        1     0.00     0.09  init_memory
  0.00     87.30     0.00        1     0.00     0.02  init_network
  0.00     87.30     0.00        1     0.00     0.00  init_world_grid
  0.00     87.30     0.00        1     0.00     0.00  init_world_voxel
  0.00     87.30     0.00        1     0.00     0.00  ma_print
  0.00     87.30     0.00        1     0.00     0.00  powint
  0.00     87.30      0.00        1     0.00     0.00  read_infofile
  0.00     87.30     0.00        1     0.00     0.00  shm_free
  0.00     87.30     0.00        1     0.00     0.00  table_init
  0.00     87.30     0.00        1     0.00     0.00  test_localhost



More information about the Binutils mailing list