\gdef\dst{\displaystyle} \gdef\dg{\dagger}

5th Numerical Lattice Field Theory Training Camp

Lattice QCD Data Analysis


Lattice Parton Collaboration

 

Outline

  • Theory on two-point function

  • Data analysis

    • Programming language/packages

    • iog data format

    • Resampling

  • Least $\chi^2$ fit

  • Linux Basics

  • Introduction to HDF5

 

I. Theory on two-point function

Two-point correlation function (two-point function, 2pt)

$$\left\langle \mathcal{O}(t_\mathrm{snk})\, \mathcal{O}(t_\mathrm{src})^\dg \right\rangle=\sum_{n=0}^\infty \langle 0|\mathcal{O}|n\rangle\langle n|\mathcal{O}^\dg|0\rangle\, e^{-E_n(t_\mathrm{snk}-t_\mathrm{src})} = c_0 e^{-E_0 t} + c_1 e^{-E_1 t} + \cdots,\qquad t \equiv t_\mathrm{snk}-t_\mathrm{src}$$

《格点量子色动力学导论》 (Introduction to Lattice QCD), Chuan Liu, §20

  • For a large enough separation $t = t_\mathrm{snk}-t_\mathrm{src}$, higher excited states decay faster

    $\left\langle \mathcal{O}(t_\mathrm{snk}) \mathcal{O}(t_\mathrm{src})^\dg \right\rangle \approx$

    • $c_0 e^{-E_0 t}$ 🢩 one-state analysis
    • $c_0 e^{-E_0 t} + c_1 e^{-E_1 t}$ 🢩 two-state analysis
  • Determine the hadron mass spectrum from the Lagrangian (first-principles or ab initio calculation)

 

Boundary conditions

  • Open boundary condition

    • $\phi_{i}=0$ if $i\geq N$ or $i<N_0$
  • Periodic/AntiPeriodic boundary condition

    • $\phi_N=\pm\phi_0$, $\phi_{N+i} = \pm\phi_i$ 🢩 $\phi_{i+N}=\pm \phi_{(i+N)\,\mathrm{mod}\,N}$

 
  • Considering periodic boundary condition

    • propagates in both $\pm t$ directions

      $\displaystyle C_2(t) \approx C_0 e^{-\frac{T}{2}E_0}\cosh\left[E_0\left(\tfrac{T}{2}-t\right)\right]$

    《格点量子色动力学导论》 (Introduction to Lattice QCD), Chuan Liu, §20

  • Determine the hadron effective energy $E_0$ ($m_{\mathrm{eff}}$)

    • A glance

      • For $0<t\ll T/2$ 🢩 $e^{-E_0 t}\gg e^{-E_0(T-t)}$ 🢩 $\dst \ln\frac{C_2(t)}{C_2(t+a)}\approx a E_0$
      • $\dst \frac{C_2(t)}{C_2(t+a)} = \frac{\cosh[E_0(T/2-t)]}{\cosh[E_0(T/2-(t+a))]}$ or $\dst \frac{C_2(t+a)+C_2(t-a)}{2C_2(t)}=\cosh(aE_0)$ (a numpy sketch follows below)

      [Plots: effective mass from the log and cosh estimators]

    • A more systematic way 🢩 fit (see Part 3 for details)
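A minimal numpy sketch of the two effective-mass estimators above, assuming `c2pt_cntrl` is the configuration-averaged two-point function as a 1-D array (illustrative names, not part of the provided demo):

    import numpy as np

    # c2pt_cntrl: configuration-averaged two-point function C2(t), t = 0, 1, ..., T-1 (assumed)
    # log effective mass: a*E_eff(t) = ln[C2(t)/C2(t+a)], valid for 0 < t << T/2
    m_eff_log = np.log(c2pt_cntrl[:-1] / c2pt_cntrl[1:])

    # cosh effective mass: cosh(a*E_eff) = [C2(t+a) + C2(t-a)] / [2 C2(t)]
    m_eff_cosh = np.arccosh((c2pt_cntrl[2:] + c2pt_cntrl[:-2]) / (2.0 * c2pt_cntrl[1:-1]))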

 

II. 2pt Data Analysis

Programming Language and Packages

  • Demo provided in python3
  • Python package
    • importing data : ctypes, pandas
    • Data Processing: numpy
    • Fitting: gvar, lsqfit
    • Visualization: matplotlib
  • Useful but not mandatory
    • Regular expression (python package re)
  • Advanced
    • HDF5, h5py
    • MPI
 
 
  • Installation

    • pip3: package installer for python

    • pip3 install package_name --user

    • --user installs under the current user environment and does not need root permission

  • Python devel package (required by iog.so)

    • header /usr/include/python3.6m/
 

Data Format

  • iog: io general, used in LPC

    • You need to write your own parser or dump it as a text file
    • Interfaces to pandas DataFrame and to HDF5 (h5py) are provided
      #!/usr/bin/env python3
      # encoding: utf-8
      from iog_reader import iog_read
      
      iog_file = "./2pt_5350.dat.iog"
      intrptr = ["cnfg","hdrn","t"]
      
      iog = iog_read(iog_file,intrptr)
      print(iog,'\n--------\n')
      
      iog_pi=iog.loc[(iog['hdrn']=='0')]
      print(iog_pi['Re'].to_numpy(),'\n--------\n')
      print(iog_pi['Re'].to_list(),'\n--------\n')
      
  • Detailed usage of pandas.DataFrame.loc: see documentation

 
  • iog_reader.py : iog 🢩 pandas.DataFrame

    #!/usr/bin/env python3
    # encoding: utf-8
    
    from ctypes import *
    import os
    import numpy as np
    import pandas as pd
    
    lib = cdll.LoadLibrary(os.getcwd()+ '/iog.so')  # compiled C helper that parses iog files
    
    def iog_read(iog_fl, intrprtrs):
        iog_file = c_char_p(iog_fl.encode('utf-8'))
        size = lib.getsize(iog_file)  # number of records in the file
    
        # C struct returned by iog.so: label table plus real/imaginary data
        class iog_type(Structure):
            _fields_ = [('info_shape', POINTER(c_int)), ('info', POINTER(c_int)), 
                        ('Re', POINTER(c_double)),  ('Im', POINTER(c_double))]
    
        lib.getdat.restype = POINTER(iog_type)
        iog = lib.getdat(iog_file)
    
        # wrap the C pointers as numpy arrays
        info_shape = np.ctypeslib.as_array(iog.contents.info_shape, shape=[2])
        info = np.ctypeslib.as_array(iog.contents.info, shape=[np.prod(info_shape)])
        re = np.ctypeslib.as_array(iog.contents.Re, shape=[size])
        im = np.ctypeslib.as_array(iog.contents.Im, shape=[size])
        labels = info.reshape(info_shape)
        labels_str = np.vectorize(str)(labels)
    
        # one column per interpreter label (e.g. cnfg, hdrn, t), plus Re and Im
        iog_dict = {}
        for indx in range(0, len(intrprtrs)):
            iog_dict.update({intrprtrs[indx]: labels_str[:, indx]})
    
        iog_dict.update({'Re': re})
        iog_dict.update({'Im': im})
    
        dataFrm = pd.DataFrame(iog_dict)
        return dataFrm
    
    
 
  • In case you need to compile iog.so yourself

  • The source is located under demo/iog

  • Set INCL in the Makefile to the path containing the Python development headers

  • Execute make iog.so

 
  • Output of iog.py

          cnfg hdrn   t             Re            Im
    0     5350    0   0  591279.959489      0.000005
    1     5350    0   1  522915.505562     -0.000003
    2     5350    0   2  482360.005234     -0.000012
    3     5350    0   3  440684.923986     -0.000002
    4     5350    0   4  387859.756810      0.000004
    ...    ...  ...  ..            ...           ...
    1147  5350   15  67   26763.208835   7141.283387
    1148  5350   15  68   42259.980047  12240.936307
    1149  5350   15  69   79407.008466  17624.534133
    1150  5350   15  70  141667.387280  25814.545022
    1151  5350   15  71  245504.136076  36646.858214
    
    [1152 rows x 5 columns]
    --------
    
    [591279.9594888687, 522915.5055618286,...
    --------
    
    [591279.95948887 522915.50556183 482360.00523376 ...
    

Note: the label columns (cnfg, hdrn, t) are stored as strings, hence the .astype(int) below.

  • Filter by value: iog.loc[18<=iog['t'].astype(int)]
  • Use logical operator: iog_pi=iog.loc[(iog['hdrn']=='0') & (iog['t'].astype(int)<18)]
 

What is intrptr ?

  • iog convention

          cnfg hdrn   t             Re            Im
    0     5350    0   0  591279.959489      0.000005
    1     5350    0   1  522915.505562     -0.000003
    2     5350    0   2  482360.005234     -0.000012
    3     5350    0   3  440684.923986     -0.000002
    4     5350    0   4  387859.756810      0.000004
    ...    ...  ...  ..            ...           ...
    
  • cnfg: configuration number

  • hdrn (hadron) : ["PION", "KAON", "PROTON", "RHO", "D", "ETAS", "ETAC", "JPSI", "LAMBDA", "LAMBDAC", "SIGMA", "XI", "XIC", "OMEGA", "DELTA"]

  • t: time slice

  • Re, Im: real and imaginary part of two-point function

    Theoretically, the 2pt function should be real; due to numerical errors, it may have a negligible imaginary part

 
  • A more complicated iog convention (not used in this training camp)

          cnfg tsrc lnk snksmr       hdrn    mntm    t   Re   Im
    0     2125   24   0      0     200505  505050    0  0.0  0.0
    1     2125   24   0      0     200505  505050    1  0.0  0.0
    ...    ...  ...  ..    ...        ...     ...  ...  ...  ...
    4606  2125   24   0      1  100001515  505045  126  0.0  0.0
    4607  2125   24   0      1  100001515  505045  127  0.0  0.0
    
  • cnfg: configuration number

  • tsrc: position of 1st source in time direction

  • snksmr (sink smear):0 🢩 unsmeared sink, 1 🢩 smeared sink

  • lnk: dummy label in two-point data file

  • hdrn (hadron) : 200505 🢩 nucleon, 100001515 🢩 $\pi$

  • mntm (momentum): next slide

  • t: time slices,

  • Re, Im: real and imaginary part of two-point function

 
  • Momentum label (not used in this training camp)

    • momentum : 505050 🢩 50 50 50 🢩 0, 0, 0 🢩 $\vec{P}=(0,0,0)$

      • lattice momentum unit: $\dst \frac{2\pi}{L}$, $L = n a$

        Discretized Fourier transformation

    • $\tilde{p}_x, \tilde{p}_y, \tilde{p}_z$, e.g. 50 50 48

      • $\displaystyle \vec{P}=\left[(50,50,50)-(\tilde{p}_x, \tilde{p}_y, \tilde{p}_z)\right]\times\frac{2\pi}{L} = (0,0,2)\times\frac{2\pi}{L}$ 🢩 2 units of momentum in the $z$-direction (a decoding sketch follows below)
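A minimal sketch of decoding this convention (decode_mntm is a hypothetical helper, not part of the provided scripts; it assumes the two-digits-per-component encoding with base 50 shown above):

    #!/usr/bin/env python3
    # encoding: utf-8
    # Hypothetical helper: decode the 6-digit momentum label into lattice momentum units of 2*pi/L.
    def decode_mntm(label, base=(50, 50, 50)):
        """e.g. '505048' -> (0, 0, 2): 2 units of 2*pi/L in the z-direction."""
        p_tilde = [int(label[i:i + 2]) for i in (0, 2, 4)]
        return tuple(b - p for b, p in zip(base, p_tilde))

    print(decode_mntm("505050"))  # (0, 0, 0)
    print(decode_mntm("505048"))  # (0, 0, 2)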
 

Resampling


  • Conventional resampling method

    • Jack-knife

    • Bootstrap

 

Why Resampling

  • Suppose $C$ is directly calculated on the lattice, and $\mathcal{O}=F(\langle C \rangle)$ is what we are interested in

    $F$ is usually a nonlinear function

  • Usually, $\mathcal{O} = F(\langle C \rangle) \neq \langle F(C) \rangle$

  • Error estimation of $\mathcal{O}$

    • Usually, the target $\mathcal{O}$ has a complicated (usually nonlinear) dependence on the lattice-calculable $C$
    • Error propagation $\sigma_{\mathcal{O}}\approx F'(\langle C \rangle)\,\sigma_{C}$ ?
      • 😭 $F$ is nonlinear
      • 😭 $\sigma_{C}$ is not small enough to neglect the $\mathcal{O}(\sigma_{C}^2)$ terms

Solution: Resampling

 

1. Jack-knife Resampling

  • Original Datasets:

    • cnfg 1   🢩 $\vec{C}^{(1)} = (C_0^{(1)},\cdots ,C_t^{(1)})$
    • $\cdots\ \cdots\ \cdots$
    • cnfg $N$ 🢩 $\vec{C}^{(N)} = (C_0^{(N)},\cdots ,C_t^{(N)})$

      time slices: $0, 1,\cdots, t$

  • $i$-th Jack-knife sample

    • $\tilde{C}^{(i)} = \frac{N}{N-1}\overline{C}-\frac{1}{N-1}\vec{C}^{(i)}$ 🢩 drop cnfg $i$, calculate the mean value of the rest
  • Estimator

    • $\overline{C}=\frac{1}{N}\sum_{i}\vec{C}^{(i)} = \frac{1}{N}\sum_{i}\tilde{C}^{(i)}$
  • $\mathrm{cov} = (N-1)\times\tilde{\mathrm{cov}}$, where $\tilde{\mathrm{cov}}$ is the covariance matrix of the Jack-knife samples

    $\dst\left(\mathrm{cov}\right)_{t,t'}=\frac{1}{N-1}\sum_{i}(C^{(i)}_t-\overline{C}_t)(C^{(i)}_{t'}-\overline{C}_{t'})$

  • $\sigma=\sqrt{N-1}\times\tilde{\sigma}$, with $\tilde{\sigma}$ the standard deviation of the Jack-knife samples (a numpy sketch of these estimators follows below)

    $\dst\tilde{\sigma}^2=\frac{1}{N-1}\sum_{i}\left(\tilde{C}^{(i)}-\overline{C}\right)^2$

 
  • Jack-knife samples are highly correlated
    • Mean of almost identical data
    • $\dst \left(\tilde{C}^{(i)}-\overline{C}\right)^2 = \left(\frac{N}{N-1}\overline{C}-\frac{1}{N-1}\vec{C}^{(i)}-\overline{C}\right)^2=\frac{1}{(N-1)^2}\left(\vec{C}^{(i)}-\overline{C}\right)^2$
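A minimal numpy sketch of the Jack-knife estimators, assuming c2pt is an (Ncnfg, T) array with one configuration per row (illustrative names, matching the demo later in this section):

    import numpy as np

    Ncnfg, T = c2pt.shape                                        # c2pt: (Ncnfg, T) two-point data (assumed)
    c2pt_jcknf = (np.sum(c2pt, axis=0) - c2pt) / (Ncnfg - 1)     # drop cnfg i, average the rest
    c2pt_cntrl = np.mean(c2pt_jcknf, axis=0)                     # central value (equals the original mean)
    c2pt_err = np.sqrt(Ncnfg - 1) * np.std(c2pt_jcknf, axis=0)   # sigma = sqrt(N-1) * sigma~
    c2pt_cov = (Ncnfg - 1) * np.cov(c2pt_jcknf, rowvar=False)    # cov = (N-1) * cov~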
 

Coding tips

  • Vectorize your code instead of using loops

    #!/usr/bin/env python3
    # encoding: utf-8
    
    import numpy as np
    from os import listdir
    from iog_reader import iog_read
    from re import match
    import cProfile
    from functools import reduce
    
    iog_path = "./Data/"
    T = 72
    
    files = listdir(iog_path)
    iogs = [file for file in files if match(r".*2pt.*.dat.iog",file)!=None]
    Ncnfg = len(iogs)
    
    c2pt = np.zeros((Ncnfg,T))
    for indx in range(0,Ncnfg):
        dat=iog_read(iog_path+iogs[indx],["cnfg","hdrn","t"])
        c2pt[indx] = dat.loc[(dat["hdrn"]=="0")]["Re"].to_numpy()
    
    c2pt_jcknf_1 = np.zeros((Ncnfg,T))
    c2pt_jcknf_2 = np.zeros((Ncnfg,T))
    c2pt_jcknf_3 = np.zeros((Ncnfg,T))
    
    def c2pt_jcknf_loop_1():
        for i in range(0,Ncnfg):
            for j in range(0,T):
                c2pt_jcknf_1[i,j] = np.mean(np.delete(c2pt,[i],axis=0)[:,j])
    
    def c2pt_jcknf_loop_2():
        for i in range(0,Ncnfg):
            c2pt_jcknf_2[i] = np.mean(np.delete(c2pt,[i],axis=0),axis=0)
    
    def c2pt_jcknf():
        c2pt_sum = np.sum(c2pt,axis=0)
        global c2pt_jcknf_3 
        c2pt_jcknf_3 = ((c2pt_sum) - c2pt)/(Ncnfg-1)
    
    cProfile.run('c2pt_jcknf_loop_1()')
    print(" ------ \n")
    cProfile.run('c2pt_jcknf_loop_2()')
    print(" ------ \n")
    cProfile.run('c2pt_jcknf()')
    print(" ------ \n")
    
    print(reduce(lambda x,y: x&y,np.fabs((c2pt_jcknf_1-c2pt_jcknf_2)/c2pt_jcknf_2)<1e-8))
    print(reduce(lambda x,y: x&y,np.fabs((c2pt_jcknf_2-c2pt_jcknf_3)/c2pt_jcknf_2)<1e-8))
    
    
 

Coding tips

  • output

             115780 function calls in 0.082 seconds
       Ordered by: standard name
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         4824    0.003    0.000    0.039    0.000 <__array_function__ internals>:177(delete)
         4824    0.002    0.000    0.036    0.000 <__array_function__ internals>:177(mean)
         ......
     ------ 
             1612 function calls in 0.001 seconds
       Ordered by: standard name
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
           67    0.000    0.000    0.001    0.000 <__array_function__ internals>:177(delete)
           67    0.000    0.000    0.001    0.000 <__array_function__ internals>:177(mean)
        ......
    ------
             13 function calls in 0.000 seconds
       Ordered by: standard name
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000    0.000    0.000 <__array_function__ internals>:177(sum)
            1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        ......
    ------
       [ True  True  True  True  True  True  True  True  True  True  True  True ......
    
 

2. Bootstrap Resampling

  • Original Datasets:

    • cnfg 1   🢩 $\vec{C}^{(1)} = (C_0^{(1)},\cdots ,C_t^{(1)})$
    • $\cdots\ \cdots\ \cdots$
    • cnfg $N$ 🢩 $\vec{C}^{(N)} = (C_0^{(N)},\cdots ,C_t^{(N)})$

      time slices: $0, 1,\cdots, t$

  • $i$-th Bootstrap sample

    • Randomly (uniformly) sample $N$ cnfgs from the original dataset, allowing repetition
    • $\tilde{C}^{(i)}$ 🢩 mean of the previously sampled data
  • Repeat the resampling $M$ times, $M > N$, usually $M\sim \mathcal{O}(10^3)$

    • resampled dataset 🢩 $\left(\tilde{C}^{(1)},\cdots,\tilde{C}^{(M)}\right)$
 
  • Estimator
    • $\dst \overline{C}=\frac{1}{N}\sum_{i}\vec{C}^{(i)} \approx \frac{1}{M}\sum_{i}\tilde{C}^{(i)}$
    • $\mathrm{cov} = \tilde{\mathrm{cov}}$, where $\tilde{\mathrm{cov}}$ is the covariance matrix of the bootstrap samples
    • $\sigma=\tilde{\sigma}$ (a numpy sketch follows below)
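A minimal numpy sketch of Bootstrap resampling under the same assumptions (c2pt is an (Ncnfg, T) array; names are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    M = 1000                                            # number of bootstrap samples, M ~ O(10^3)
    Ncnfg = c2pt.shape[0]                               # c2pt: (Ncnfg, T) two-point data (assumed)
    idx = rng.integers(0, Ncnfg, size=(M, Ncnfg))       # sample N cnfgs per bootstrap sample, with repetition
    c2pt_btstrp = np.mean(c2pt[idx], axis=1)            # (M, T): mean of each bootstrap sample
    c2pt_cntrl = np.mean(c2pt, axis=0)                  # central value from the original data
    c2pt_err = np.std(c2pt_btstrp, axis=0)              # sigma = sigma~
    c2pt_cov = np.cov(c2pt_btstrp, rowvar=False)        # cov = cov~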
 

Estimator for $\mathcal{O}=F(\langle C \rangle)$

  • Let $\tilde{C}=\tilde{C}^{(1)},\,\tilde{C}^{(2)},\,\cdots,\tilde{C}^{(N)}$ denote the resampled data

  • Calculate $\tilde{\mathcal{O}}=\tilde{\mathcal{O}}^{(1)},\,\tilde{\mathcal{O}}^{(2)},\,\cdots,\tilde{\mathcal{O}}^{(N)}=F(\tilde{C}^{(1)}),\,F(\tilde{C}^{(2)}),\,\cdots,F(\tilde{C}^{(N)})$

  • Central value: $\overline{\mathcal{O}}=\overline{\tilde{\mathcal{O}}}$

  • Error estimation

    • Jack-knife resample 🢩 $\sigma_{\mathcal{O}} = \sqrt{N-1}\times\sigma_{\tilde{\mathcal{O}}}$

    • Bootstrap resample 🢩 $\sigma_{\mathcal{O}} = \sigma_{\tilde{\mathcal{O}}}$

      $\dst \sigma_{\tilde{\mathcal{O}}}^2=\frac{1}{N-1}\sum_i\left(\tilde{\mathcal{O}}^{(i)}-\overline{\mathcal{O}}\right)^2$

  • Parallelize your code when computing the $F(\tilde{C}^{(i)})$ (a sketch follows below)

    • e.g. MPI: each process calculates one $F(\tilde{C}^{(i)})$
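A minimal sketch of propagating the resampling through a nonlinear $F$ (here $F$ is, purely for illustration, the log effective mass; c2pt_jcknf and Ncnfg are assumed to be defined as in the Jack-knife sketch above):

    import numpy as np

    def F(c2):                                                # illustrative nonlinear F: log effective mass
        return np.log(c2[:-1] / c2[1:])

    O_jcknf = np.array([F(c2) for c2 in c2pt_jcknf])          # apply F to every resampled dataset
    O_cntrl = np.mean(O_jcknf, axis=0)                        # central value of O = F(<C>)
    O_err = np.sqrt(Ncnfg - 1) * np.std(O_jcknf, axis=0)      # Jack-knife error of O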
 

Least $\chi^2$ fit

  • Data:
    • cnfg 1   🢩 $\vec{C}^{(1)} = (C_0^{(1)},\cdots ,C_T^{(1)})$
    • $\cdots\ \cdots\ \cdots$
    • cnfg $N$ 🢩 $\vec{C}^{(N)} = (C_0^{(N)},\cdots ,C_T^{(N)})$

      $0 \cdots T$: time slice indices
      $0\cdots N$: cnfg indices

  • Model: $f(t,\{p_i\})$
    • $t$: time slice (w.r.t. the source)

    • $\{p_i\}$: fit parameters 🢩 $c_0, c_1, E_0, \Delta E$

    • e.g. for the two-point function (two-state fit)

      $f(t,\{p_i\}) \implies c_0 e^{-E_0 t} (1+c_1 e^{-\Delta E t})$

 

$\chi^2$ definition

  • Uncorrelated case

    $$\chi^2 = \sum_t \frac{\left[\overline{C}_t - f(t,\{p_i\})\right]^2}{\sigma_t^2}$$

    • $\overline{C}_t$ 🢩 mean value of the $t$-th time slice
    • $\sigma_t$ 🢩 standard deviation of the $t$-th time slice
    • No interference between different $t$ 🢪🢩 uncorrelated
  • Correlated case

    $$\chi^2 = \sum_{t, t'}\left[\overline{C}_{t}-f(t,\{p_i\}) \right] \left(\mathrm{cov}^{-1}\right)_{t,t'} \left[\overline{C}_{t'}-f(t',\{p_i\})\right]$$

    • $\mathrm{cov}$ : covariance matrix (a numpy sketch of the correlated $\chi^2$ follows below)

      $$\left(\mathrm{cov}\right)_{t,t'}=\frac{1}{N-1}\sum_{i}(C^{(i)}_t-\overline{C}_t)(C^{(i)}_{t'}-\overline{C}_{t'})$$
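A minimal numpy sketch of the correlated $\chi^2$ definition above (lsqfit constructs this internally; the sketch only makes the formula concrete, with illustrative names c2pt_cntrl, c2pt_cov and a model function f):

    import numpy as np

    def chi2(p, t_ary, c2pt_cntrl, c2pt_cov, f):
        """Correlated chi^2 between data and the model f(t, p)."""
        resid = c2pt_cntrl - f(t_ary, p)                   # [ C_t - f(t, {p_i}) ]
        return resid @ np.linalg.solve(c2pt_cov, resid)    # resid . cov^{-1} . resid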

 

Coding Tips

  • Python array indices start from 0
    • e.g. a[2:6] 🢩 $a_2,a_3,a_4,a_5$
  • numpy has built-in mean, std, cov functions
  • Vectorize (and broadcast) your code instead of using loops
  • numpy.mean, numpy.std, numpy.cov support high-rank arrays
    • Set the axis parameter in the numpy function
    • e.g. $\mathrm{R}=R_{i,j,k,l}$, $i=1,...,n_0$, $j=1,...,n_1$, $k=1,...,n_2$, $l=1,...,n_3$
      • Assuming R, R' are numpy arrays
      • S=np.mean(R,axis=1) 🢪🢩 $\dst \mathrm{S}_{i,k,l}=\frac{1}{n_1}\sum_{j}R_{i,j,k,l}$
      • R*R' 🢩 $(\mathrm{R}*\mathrm{R}')_{i,j,k,l}=\mathrm{R}_{i,j,k,l} \times\mathrm{R}'_{i,j,k,l}$
  • Not sure about which axis? Print out the shapes of $\mathrm{R}$ and $\mathrm{S}$: R.shape, S.shape
 
  • $\chi^2$ 🢩 measures the deviation between data and model

  • Least $\chi^2$ fit 🢩 optimize $\{p_i\}$ for a minimal BUT reasonable $\chi^2$

  • lsqfit 🢩 a Python package performing least $\chi^2$ fits

    fit = lsqfit.nonlinear_fit(data=(var_val, dat_val), 
                               fcn=model, prior=ini_val, debug=False)
    
    • var_val: values of the variables 🢩 $x$-axis
    • dat_val: values of the data (with errors) 🢩 $y$-axis (with err/cov)
    • model    : fit model function
    • ini_val : initial values of (also constraints on) the fit parameters in model
 

lsqfit - an example

#!/usr/bin/env python3
# encoding: utf-8

import gvar as gv
import lsqfit
import numpy as np
import matplotlib.pyplot as plt

c2pt_raw = np.load("./data_2pt.npy")
c2pt = c2pt_raw[:, 0, :, -2]
Ncnfg = c2pt.shape[0]
T = c2pt.shape[1]
T_hlf = T//2

#? jack-knife resample 
#! only used in plot , compare original data with fit
c2pt_sum = np.sum(c2pt,axis=0)
c2pt_jcknf = ((c2pt_sum) - c2pt)/(Ncnfg-1) # jack-knife resample
c2pt_cntrl = np.mean(c2pt_jcknf,axis=0) # jack-knife mean
c2pt_err = np.sqrt(Ncnfg-1)*np.std(c2pt_jcknf,axis=0) # jack-knife std

#? drop the first data point, normalized to c2[1] ~ 1
t_ary = np.array(range(1,T))
c2pt_jcknf = c2pt_jcknf/c2pt_cntrl[1]
c2pt_cntrl = np.mean(c2pt_jcknf,axis=0) # jack-knife mean
c2pt_err = np.sqrt(Ncnfg-1)*np.std(c2pt_jcknf,axis=0) # jack-knife std

#? average forward/backward propagation, for better symmetry 
c2pt_jcknf_fwrd = c2pt_jcknf[:,1:T_hlf+1] #? forward propagation part
c2pt_jcknf_bwrd = c2pt_jcknf[:,T_hlf:][:,::-1] #? backward propagation part
c2pt_jcknf_avg = (c2pt_jcknf_fwrd+c2pt_jcknf_bwrd)/2 #? average forward/backward propagation
c2pt_avg_cntrl = np.mean(c2pt_jcknf_avg,axis=0) # jack-knife mean
c2pt_avg_cov = (Ncnfg-1)*np.cov(np.transpose(c2pt_jcknf_avg,axes=(1,0))) # jack-knife covariance

T_strt = 10 #? starting point of the fit
T_end =  30 #? ending point of the fit
tsep_dctnry = {'c2pt': t_ary[T_strt:T_end]}
c2pt_dctnry = {'c2pt': gv.gvar(c2pt_avg_cntrl[T_strt:T_end], c2pt_avg_cov[T_strt:T_end, T_strt:T_end])}

def ft_mdls(t_dctnry, p):
    mdls = {}
    ts = t_dctnry['c2pt']
    mdls['c2pt'] = p['c0']*np.exp(-p['E0']*T_hlf/2)*np.cosh(p['E0']*(ts-T_hlf))
    return mdls

ini_prr = gv.gvar({'c0': '0(5)','E0': '0.2(0.5)'}) #? initial value
fit = lsqfit.nonlinear_fit(data=(tsep_dctnry, c2pt_dctnry), fcn=ft_mdls, prior=ini_prr, debug=True) #? execute the fit
print(fit.format(True)) #? print out fit results

#? time slices used in fit
t_ary = fit.data[0]['c2pt']

#? data to be fitted
c2pt_avg_dat_gvar = fit.data[1]['c2pt'] 
c2pt_avg_dat_cntrl = np.array([c2.mean for c2 in c2pt_avg_dat_gvar])
c2pt_avg_dat_err = np.array([c2.sdev for c2 in c2pt_avg_dat_gvar])

#? fitted function values
t_lst = np.linspace(10, T-10, 50) 
c2pt_fit_fcn_gvar = fit.fcn({'c2pt':t_lst}, fit.p)['c2pt']
c2pt_fit_fcn_cntrl = np.array([c2.mean for c2 in c2pt_fit_fcn_gvar])
c2pt_fit_fcn_err = np.array([c2.sdev for c2 in c2pt_fit_fcn_gvar])

#? plots
fig, ax = plt.subplots(1,1, figsize=(10, 7*0.5))
ax.errorbar(np.array(range(0,T)),c2pt_cntrl,yerr=c2pt_err,fmt='bo',alpha=0.4,label="$C_2$") #? plot original data
ax.errorbar(t_ary,c2pt_avg_dat_cntrl,yerr=c2pt_avg_dat_err,fmt='go',alpha=0.5,label="frwrd/bckwrd avg. $C_2$") #? plot forward/backward averaged data
ax.plot(t_lst,c2pt_fit_fcn_cntrl,color="b",label="best fit") #? plot fitted function
ax.fill_between(t_lst,c2pt_fit_fcn_cntrl-c2pt_fit_fcn_err,c2pt_fit_fcn_cntrl+c2pt_fit_fcn_err,alpha=0.3) #? plot fitted function errorband
plt.yscale("log")
plt.xlabel('t/a')
plt.ylabel('$C_2$')
ax.legend(loc='upper center', fontsize=10, frameon=True, fancybox=True, framealpha=0.8, borderpad=0.3, \
            ncol=1, markerfirst=True, markerscale=1, numpoints=1, handlelength=1.5)
fig.subplots_adjust(left=0.15, right=0.9, top=0.9, bottom=0.15)
fig.savefig("c2pt_fit.png")
 
  • var_val, dat_val, model, ini_val are dictionaries


  • dictionary: {key_1: value_1, key_2: value_2, ... } 🢩 maps keys to values

  • Why wrap data in dictionaries?

    • Consider the case of
      • fitting 2 datasets with 2 models
      • the 2 models may have shared parameters
 

Combined fit

  • Combined fit: e.g. 2 data sets, 2 models (with shared parameters)

    |         | label1 | label2 |
    | ------- | ------ | ------ |
    | data    | $(\vec{x}^{(1)}, \vec{y}^{(1)})$ | $(\vec{x}^{(2)}, \vec{y}^{(2)})$ |
    | model   | $f(x,\{\textcolor{blue}{\vec{p}},\vec{q}\})$ | $g(x,\{\textcolor{blue}{\vec{p}},\vec{r}\})$ |
    | ini_val | $\textcolor{blue}{\vec{p}_0}$, $\vec{q}_0$ | $\textcolor{blue}{\vec{p}_0}$, $\vec{r}_0$ |
    • $\textcolor{blue}{\vec{p}}$ denotes the shared parameters
 
  • Wrapping data for a combined fit

    var_val = {'label1': [x1_1,x1_2,...], 'label2': [x2_1,x2_2,...
    
    dat_val = {'label1': [y1_1,y1_2,...], 'label2': [y2_1,y2_2,...
    
    prior = {'p_1': p_1,'p_2': p_2, ..., 'q_1': q_1, 'q_2': q_2, ... ,'r_1': r_1, 'r_2': r_2}
    
    • y_1,2: gvar class 🢩 y_1 = gvar(y_1_cntrl, y_1_cov) ...

    • p_1,2: gvar class 🢩 p_1 = gvar(p_1_cntrl, p_1_err) ...

    • Feed the var_val, dat_val and prior to lsqfit.nonlinear_fit

  • Wrapping model for combined fit

    def fit_mdls(t_dctnry, p):
      mdls = {}
      mdls['label_1'] = ...
      mdls['label_2'] = ...
      return mdls
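Putting the pieces above together, a minimal runnable combined-fit sketch with synthetic data (the two datasets share E0, while c0 and c1 are dataset-specific; all names and numbers are illustrative, not the training-camp data):

    #!/usr/bin/env python3
    # encoding: utf-8
    import numpy as np
    import gvar as gv
    import lsqfit

    t = np.arange(1, 10)
    var_val = {'label1': t, 'label2': t}
    dat_val = {'label1': gv.gvar(0.5*np.exp(-0.2*t), 0.01*np.exp(-0.2*t)),
               'label2': gv.gvar(1.5*np.exp(-0.2*t), 0.03*np.exp(-0.2*t))}

    def fit_mdls(x_dctnry, p):
        mdls = {}
        mdls['label1'] = p['c0']*np.exp(-p['E0']*x_dctnry['label1'])  # model f, shares E0
        mdls['label2'] = p['c1']*np.exp(-p['E0']*x_dctnry['label2'])  # model g, shares E0
        return mdls

    prior = gv.gvar({'c0': '1(5)', 'c1': '1(5)', 'E0': '0.5(5)'})
    fit = lsqfit.nonlinear_fit(data=(var_val, dat_val), fcn=fit_mdls, prior=prior, debug=True)
    print(fit.format(True))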
    
 


Multiple/Single source, Open/Periodic boundary condition

  • Put multiple sources in the time direction to increase statistics

  • Average over sources within each config (iog file)

  • Example of open boundary condition, two sources

 

 

  • Single source, periodic boundary condition
  • Example
    • $C(T+a)=C(0)$
    • $C(t)\approx C_0 e^{-E_0\frac{T}{2}}\cosh\left[E_0\left(\frac{T}{2}-t\right)\right]$
 

Fit Results - A glance

Least Square Fit:
 chi2/dof [dof] = 0.96 [20]    Q = 0.5    logGBF = 116.43

Parameters:
            c0    0.252 (10)     [  0.0 (5.0) ]  
            E0   0.1090 (27)     [  0.20 (50) ]  
  • Fences: two-point function data
  • Bands: fitted model

Errors have been magnified by a factor of 5 in the figure

  • Fitted effective mass: $E_0=$

    • The fitted $E_0$ is in lattice units.
    • $E_0 t$ in $e^{-E_0 t}$ must be dimensionless 🢩 $\dst E_0 = \frac{E_{0,\mathrm{latt.}}\times 0.1973}{a}\,[\mathrm{GeV}]$, with $a$ in fm and $\hbar c \approx 0.1973\,\mathrm{GeV\cdot fm}$ (a conversion sketch follows below)
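A minimal conversion sketch (a = 0.108 fm is the lattice spacing quoted in the exercise, and the E0 value is taken from the fit log above):

    hbarc = 0.1973      # GeV*fm
    a = 0.108           # lattice spacing in fm (from the exercise)
    E0_latt = 0.1090    # fitted E0 in lattice units (from the fit log above)
    E0_GeV = E0_latt * hbarc / a
    print(f"E0 = {E0_GeV:.3f} GeV")   # ~0.199 GeV, consistent with m_pi ~ 200 MeV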
 

Output log

Least Square Fit:
  chi2/dof [dof] = 0.96 [20]    Q = 0.5    logGBF = 116.43

Parameters:
             c0    0.252 (10)     [  0.0 (5.0) ]  
             E0   0.1090 (27)     [  0.20 (50) ]  

Fit:
       key         y[key]      f(p)[key]
----------------------------------------
    c2pt 0     0.277 (12)    0.2712 (79)  
         1     0.248 (11)    0.2435 (74)  
         2     0.223 (11)    0.2186 (69)  
         3     0.199 (10)    0.1964 (66)  
         4    0.1790 (94)    0.1765 (62)  
         5    0.1610 (87)    0.1586 (59)  
         6    0.1446 (79)    0.1427 (56)  
         7    0.1304 (75)    0.1284 (54)  
         8    0.1180 (71)    0.1157 (51)  
         9    0.1073 (68)    0.1044 (49)  
        10    0.0981 (68)    0.0943 (47)  
        11    0.0900 (69)    0.0853 (44)  
        12    0.0825 (68)    0.0773 (42)  
        13    0.0758 (66)    0.0703 (41)  
        14    0.0701 (64)    0.0641 (39)  
        15    0.0649 (62)    0.0586 (37)  *
        16    0.0603 (61)    0.0539 (36)  *
        17    0.0562 (58)    0.0498 (35)  *
        18    0.0523 (55)    0.0462 (33)  *
        19    0.0490 (52)    0.0433 (32)  *

Settings:
  svdcut/n = 1e-12/0    tol = (1e-08,1e-10,1e-10*)    (itns/time = 9/0.1)
  fitter = scipy_least_squares    method = trf

Print this detailed log by calling print(fit.format(True))

 

Access the fitted model function


fit = lsqfit.nonlinear_fit(data=(tsep_dctnry, c2pt_dctnry), 
fcn=ft_mdls, prior=ini_prr, debug=True)

c2pt_fit_gvar = fit.fcn({'c2pt':t_lst}, fit.p)['c2pt']
  • tsep_dctnry = {'c2pt':np.array(range(T_strt,T_end))}

  • fit.p 🢩 fitted parameters, a dictionary

    • fit.p['E0']
 

What is a reasonable fit?

  • Case 1

    Least Square Fit:
      chi2/dof [dof] = 13 [10]     
      Q = 5.2e-24    
      logGBF = -74.281
    
  • Underfit: the model fails to describe the data

 

What is a reasonable fit?

  • Case 2

    Least Square Fit:
      chi2/dof [dof] = 0.07 [10]   
      Q = 1    
      logGBF = -46.654
    
    • Overfit: trying to extract too much information from the data
 

What is a reasonable fit?

  • Case 3

    Least Square Fit:
      chi2/dof [dof] = 1 [10]      
      Q = 0.41    
      logGBF = -15.05
    
    • Reasonable fit
 

How good is a fit? Interpretation of lsqfit log

Least Square Fit:
  chi2/dof [dof] = 1 [10]    Q = 0.41    logGBF = -15.05
  • chi2/dof: better when it is around 1

  • Q: the probability that the $\chi^2$ from the fit could have been larger, by chance, assuming the best-fit model is correct.

    • Good fits have Q values larger than 0.1 or so
  • logGBF : $\ln P(\mathrm{data}|\mathrm{fit})$ 🢩 the logarithm of the probability of obtaining the data by randomly sampling the fitted model

    • Useful for comparing fits of the same data to different models, with different priors and/or fit functions.
    • The model with the largest value of logGBF is the one preferred by the data.
    • Differences in fit.logGBF smaller than 1 are not very significant.
 

Coding Tips

  • Vectorize your code instead of looping
  • Modularize
    • e.g. in the 2-point analysis demo
      • module 1: extract the required data (hadron id, momentum id, smearing...) from iog/h5, export to a data file
      • module 2: load data from 1, Jack-knife / bootstrap analysis, export results to a data file
      • module 3: load data from 2, perform lsqfit, output results
  • Naming conventions and comments
    • e.g. c2pt_cntrl, c2pt_cov, t_ary vs. a, b, t
  • Version control (git, SVN)
 

Exercise

1. Read the pion two-point function data

  • Create a directory named by your name under the home directory
  • copy /dssg/home/acct-phyww/phyww/qazhang/training_camp/class1_xiong/Exercise_1 to the created directory

2. Find out what boundary condition and how many sources are used

3. Jack-knife analysis of the two-point function

4. Use lsqfit to fit the pion's 2pt data and determine the ground-state energy

5. Compare your fit and the data (for this exercise: lattice spacing $a = 0.108$ fm, $m_\pi \approx 200\,\mathrm{MeV}$)

 

Hint:

  • iog_reader, pandas.DataFrame.loc,pandas.DataFrame.to_numpy,
    numpy.mean, numpy.cov, numpy.std, lsqfit.nonlinear_fit

  • The Data directory contains 50 two-point function iog files, one per configuration (50 configurations)

  • 72 time slices (T = 72) in the time direction

  • iog_reader is provided in iog_read_demo.py

  • x=gv.gvar('2(0.5)') : x.mean 🢩 2.0, x.sdev 🢩 0.5

  • Construct a gvar from a tuple: x=gv.gvar((2, 0.5))

 

Notes

  • Python indices start from 0; Mathematica and Julia start from 1

  • numpy.ndarray slices and range do not include the last index, e.g.

    • a[2:5] 🢩 $[a_2,a_3,a_4]$
    • range(2,5) 🢩 [2,3,4]
  • iog_read returns a pandas.DataFrame; the label columns in this data frame are strings (not integers)

 

III. Linux Basics

Search for help and tips

 

Path

  • Absolute path: starts from /
    • print by pwd (present working directory)
    • e.g. /home/chimaoshu/Nutstore_Files
  • Relative path: relative to pwd

    e.g. ./demo/c2pt_fit.png, ../LPC_trnsvrsty
  • Some special path
    • home directory ~ 🢩 /home/USR
    • PWD .
    • Parent path .. , also in combination ../..
  • Change directory cd path
  • List files and directories under a path ls path
  • Autocomplete/suggestions
    • Press Tab
 

Create/Remove/Move/Copy file, directory

  • create a file
    • touch file_name 🢩 create under PWD
    • touch path/file_name 🢩 create under path
  • create a directory
    • mkdir dir_name 🢩 create under PWD
    • mkdir path/dir_name 🢩 create under path
  • remove (delete) a file
    • rm file_name, rm path/file_name
  • remove a directory (and everything inside)
    • rm -r dir_name rm -r path/dir_name
    • CAUTION when using rm -rf
  • Similar for copy cp and move(rename) mv
 

cat, grep, >

  • cat: show contents of a file
    • e.g. cat ./demo/lsqfit_2pt_iog.py
  • grep: print lines matching a given pattern
    • e.g. cat ./demo/lsqfit_2pt_iog.py | grep gvar

    '|' : pipe

  • >: redirect the output stream
    • e.g. ./lsqfit_2pt.py > lsq.log 2>&1

    2>&1 redirects error messages (stderr) into the same file

Compress and uncompress

  • compress tar
    • try tar --help
 

SSH login

  • On Windows you can use PowerShell's built-in SSH client
  • ssh usr@host
    • e.g. ssh phyzjun-user3@sylogin.hpc.sjtu.edu.cn
    • usr: username, host: remote machine
  • Use key instead of password
    • ssh-keygen: create RSA key (public and private)
    • distribute pubkey id_rsa.pub to others
    • keep your private key id_rsa private
    • ssh-copy-id usr@host: install your pubkey on the host
  • SSH config file
    • ~/.ssh/config
    Host sysjtu
     HostName sylogin.hpc.sjtu.edu.cn
     User phyzjun-user3
     IdentityFile /home/chimaoshu/.ssh/id_rsa
     AddKeysToAgent yes
    
    • e.g. ssh sysjtu
 

SSH via Windows PowerShell (but using VS Code is recommended)

  • PowerShell has built-in ssh.exe, ssh-keygen.exe, scp.exe








  • ssh-copy-id equivalent: type path_to_id_rsa.pub | ssh sysjtu "cat >> ~/.ssh/authorized_keys"
 

Copy between local and remote machines

  • scp source target (scp = ssh + cp), e.g.
    • scp ./demo/lsqfit_2pt.py sysjtu:~/demo
    • scp -r sysjtu:~/demo /home/chimaoshu

    sysjtu could be replaced by usr@host if .ssh/config is not set

 

Modify files on remote machine

  • vi, vim
    🤣 Q: How do you generate a random string?
          A: Put a Windows user in front of vi, and tell them to exit

  • Download to local 🢩 edit locally 🢩 upload to remote 🢩
    compile/execute 🢩 download to local, ...

  • Recommended: VS Code + Remote Development extension

    https://code.visualstudio.com/docs/remote/ssh-tutorial
    it is better to set up .ssh/config first

 

vscode

 

vscode

 

Environment Modules

  • module avail: list available modules
  • module load mod: load mod
  • module list: list currently loaded modules
  • module purge: unload all modules
  • ...
 

A glance on HDF5

  • h5glance: installed via pip: pip3 install h5glance
  • Basic usage:
    • h5glance h5_file [path]
    • path: unix like path to data (string)
  • Advanced usage:
    • show attributes : h5glance --attrs h5_file [path]
    • peek data: h5glance --attrs h5_file path
    • slice data: h5glance -s slice --attrs h5_file path
      • slice: numpy like slice, e.g: 2,10:15
 

IV. Introduction to HDF5

  • HDF5: Hierarchical Data Format Version 5

    High-performance data management and storage suite
    Utilize the HDF5 high performance data software library and file format to manage, process, and store your heterogeneous data. HDF5 is built for fast I/O processing and storage.

  • Open source library

  • Supported by C/C++, Julia, Mathematica, Python ...

    • HDF5 does not provide a native complex type; different software may interpret complex data differently
      • e.g. interpreted as an association (dictionary) in Mathematica
         In[1]:= c = 1 + 0.2*I;
         In[2]:= Export["cmplx.h5", "/cmplx" -> c];
         In[3]:= Import["cmplx.h5", {"Datasets", "/cmplx"}]
        Out[4] = <|"Re" -> 1., "Im" -> 0.2|>
        
 

Other Features

  • Float, integer, string, array, image ... all in a single h5 file
  • Compressed storage, e.g. gzip
    • in h5py, by setting a filter (see the h5py sketch below)
  • Supports metadata (to describe the data)
    • e.g. attributes of data groups and datasets
  • Supports parallel I/O (MPI)
    • tricky if the file system does not support file locking
  • Virtual datasets
    • "symbolic link" to data in other HDF5 files
    • not supported by Mathematica, Julia
  • HDF WebService
    • allows access to data via HTTP, h5serv
  • ...
  • Python and HDF5, Andrew Collette, ISBN: 9781449367831 (2013)
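A minimal h5py sketch of these features, with illustrative file, group and dataset names (gzip-compressed dataset plus attributes used as metadata):

    import numpy as np
    import h5py

    with h5py.File("c2pt.h5", "w") as f:
        grp = f.create_group("pion")
        dset = grp.create_dataset("c2pt", data=np.random.rand(50, 72),
                                  compression="gzip", compression_opts=4)  # compressed storage
        dset.attrs["lattice_spacing_fm"] = 0.108                           # attributes describe the data
        dset.attrs["description"] = "pion 2pt, one row per configuration"

    with h5py.File("c2pt.h5", "r") as f:
        print(f["pion/c2pt"].shape, dict(f["pion/c2pt"].attrs))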

 

iog to h5

  • Based on iog.so and iog_reader.py; requires the h5py Python package (a minimal conversion sketch follows below)

    • iog2h5_2pt.py: convert 2pt iog files to h5

    • iog2h5_3pt.py: convert 3pt iog files to h5

    provided on demand
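A minimal sketch of such a conversion (not the provided iog2h5_2pt.py itself; it simply combines iog_reader with h5py, and the file/group names are illustrative):

    #!/usr/bin/env python3
    # encoding: utf-8
    # Illustrative iog -> h5 conversion: one group per hadron id, Re/Im stored as datasets.
    import h5py
    from iog_reader import iog_read

    iog = iog_read("./2pt_5350.dat.iog", ["cnfg", "hdrn", "t"])

    with h5py.File("2pt_5350.h5", "w") as f:
        for hdrn in iog["hdrn"].unique():
            sel = iog.loc[iog["hdrn"] == hdrn]
            grp = f.create_group("hdrn_" + hdrn)
            grp.create_dataset("Re", data=sel["Re"].to_numpy())
            grp.create_dataset("Im", data=sel["Im"].to_numpy())
            grp.attrs["cnfg"] = sel["cnfg"].iloc[0]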
